# Loading conda environments in jupyter kernels

If you have created a custom conda environment and jupyter kernel, you may have noticed that executables are not added to the shell path inside the notebook. This notebook will show you how to support this type of workflow by demonstrating how to

- Create a conda environment with executables
- Access those tools on the CLI
- Create a Jupyter kernel for this environment
- Clean your local environment

We have taken the time to remove most of the hurdles to this process, but please submit an issue to

https://github.com/SD2E/jupyteruser-sd2e/issues

if you run into any walls.

## Creating a new conda environment

We use the [Anaconda Python distribution](https://www.anaconda.com/what-is-anaconda/) for SD2E because it

- ships with the Intel MKL
- manages both Python and system packages
- creates and manages isolated environments

This lets us support as many development environments as possible. If you would like a brand new environment but do not want it incorporated into the base image (the one you are running right now), you can create a local environment that conda will manage and persist between server restarts.

First, let's see which `conda` environments are currently available:

```
conda env list
```

```
# conda environments:
#
base                  *  /opt/conda
python2                  /opt/conda/envs/python2
```
If you have not modified your environment, you should see two environments:

- **base** - (python3) activated with `source activate base`
- **python2** - activated with `source activate python2`
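In `conda env list` output, the active environment is the row marked with `*`. If you need that name inside a script, a minimal sketch is to let `awk` pick out the starred row. It assumes the layout shown above (name, optional `*`, path) and will not handle environments listed by path only; `active_env` is a hypothetical helper name. Inside an activated shell, conda also exports the `CONDA_DEFAULT_ENV` variable, which is often simpler.

```shell
# Hypothetical helper: print the name of the active environment from
# `conda env list` output read on stdin (assumes the layout shown above).
active_env() {
  # Skip comment lines; the active row has "*" as its second field
  awk '!/^#/ && $2 == "*" { print $1 }'
}

# Demo on the sample output from the cell above
sample_output='# conda environments:
#
base                  *  /opt/conda
python2                  /opt/conda/envs/python2'

active=$(printf '%s\n' "$sample_output" | active_env)
echo "Active environment: $active"

# On a live system you could pipe conda directly:
#   conda env list | active_env
```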

I am developing a new workflow that depends on the BWA aligner inside my Jupyter notebook. While you can `source activate` a different environment in a terminal, the activation does not persist between shell calls made from Python, because each call starts a new shell.
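The underlying reason is that each `!command` line in a Python notebook runs in a fresh child shell, and environment changes made in a child process never flow back to its parent or on to the next child. A minimal sketch that imitates this with two independent `bash -c` calls; `/tmp/hypothetical-env/bin` is a made-up path standing in for an environment's `bin` directory.

```shell
# Made-up directory standing in for an activated environment's bin/ folder
FAKE_BIN=/tmp/hypothetical-env/bin

# First child shell: "activate" by prepending FAKE_BIN, then report the first PATH entry
inside=$(bash -c "export PATH=\"$FAKE_BIN:\$PATH\"; echo \"\${PATH%%:*}\"")

# Second, independent child shell: it inherits the parent's unchanged PATH
after=$(bash -c 'echo "${PATH%%:*}"')

echo "first call sees:  $inside"
echo "second call sees: $after"
```

The second call never sees the change made by the first, which is exactly what happens to a `source activate` issued from one notebook shell call.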

To start, I am going to create a brand new Python 3.6 environment called `bioinfo` containing [**bwa**](https://github.com/lh3/bwa) from the [bioconda](https://bioconda.github.io/) channel using [`conda create`](https://conda.io/docs/commands/conda-create.html). We have also defined a `LOCAL_ENVS` variable to use as the prefix for your environments, and added it to the list of directories conda searches for environments.

```
conda create -y -p $LOCAL_ENVS/bioinfo -c defaults -c conda-forge -c bioconda \
    python=3.6 ipykernel 'bwa==0.7.17'

# -y                 - respond yes to all questions
# -p PATH            - install location (full prefix path) of the environment
# -c defaults        - resolve packages in defaults first
# -c conda-forge     - look for packages in conda-forge second
# -c bioconda        - look for packages in bioconda LAST
# python=3.6         - python3 environment
# ipykernel          - required to make a kernel file later
# 'bwa==0.7.17'      - install bwa
```

```
Solving environment: done


  current version: 4.5.1
  latest version: 4.5.4

Please update conda by running

    $ conda update -n base conda



## Package Plan ##

  environment location: /home/jupyter/tacc-work/jupyter_packages/envs/bioinfo

  added / updated specs: 
    - bwa==0.7.17
    - ipykernel
    - python=3.6


The following NEW packages will be INSTALLED:

    backcall:         0.1.0-py36_0            defaults   
    bwa:              0.7.17-pl5.22.0_2       bioconda   
    ca-certificates:  2018.03.07-0            defaults   
    certifi:          2018.4.16-py36_0        defaults   
    decorator:        4.3.0-py36_0            defaults   
    ipykernel:        4.8.2-py36_0            defaults   
    ipython:          6.4.0-py36_0            defaults   
    ipython_genutils: 0.2.0-py36hb52b0d5_0    defaults   
    jedi:             0.12.0-py36_1           defaults   
    jupyter_client:   5.2.3-py36_0            defaults   
    jupyter_core:     4.4.0-py36h7c827e3_0    def
```

You should now be able to list this new environment:

```
conda env list
```

```
# conda environments:
#
bioinfo                  /home/jupyter/tacc-work/jupyter_packages/envs/bioinfo
base                  *  /opt/conda
python2                  /opt/conda/envs/python2
```

You can now see that the new `bioinfo` environment was created in your `tacc-work` directory, so it will persist between reboots until you delete it.
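Every conda environment prefix contains a `conda-meta` directory, including a `history` file that records what was installed. If you want a quick look at which directories under `$LOCAL_ENVS` are environments without calling `conda`, a rough heuristic is to test for that file. `list_local_envs` is a hypothetical helper, and the demo builds a throwaway directory tree instead of touching your real `$LOCAL_ENVS`.

```shell
# Hypothetical helper: print the name of every directory under $1 that looks
# like a conda environment (heuristic: it contains conda-meta/history).
list_local_envs() {
  local root=$1 dir
  for dir in "$root"/*/; do
    [ -f "${dir}conda-meta/history" ] && basename "$dir"
  done
  return 0
}

# Demo on a throwaway tree rather than your real $LOCAL_ENVS
demo_root=$(mktemp -d)
mkdir -p "$demo_root/bioinfo/conda-meta" "$demo_root/not-an-env"
touch "$demo_root/bioinfo/conda-meta/history"

found=$(list_local_envs "$demo_root")
echo "$found"
rm -rf "$demo_root"

# Against the real directory:  list_local_envs "$LOCAL_ENVS"
```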

## Accessing environment on the CLI

Since `conda` can see your new environment, you just need to activate it to update your environment paths so you can interact with its various:

- executables
- libraries
- python modules
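For executables, the key effect of activation is that the environment's `bin` directory goes to the front of `PATH`, so its programs (like `bwa`) are found before anything with the same name elsewhere. The sketch below reproduces only that one effect with a throwaway prefix and a hypothetical `hello` executable. Real `source activate` also sets variables such as `CONDA_PREFIX` and runs the environment's activation scripts, so treat this as an illustration rather than a replacement.

```shell
# Throwaway "environment" prefix with one hypothetical executable in bin/
prefix=$(mktemp -d)
mkdir -p "$prefix/bin"
printf '#!/bin/sh\necho "hello from the demo env"\n' > "$prefix/bin/hello"
chmod +x "$prefix/bin/hello"

# The PATH-related part of activation: put the env's bin/ first
export PATH="$prefix/bin:$PATH"
hash -r   # forget any previously cached command locations

# `command -v` reports which file the shell will actually run
resolved=$(command -v hello)
echo "hello resolves to: $resolved"
hello
```

After a real `source activate bioinfo`, the same check (`command -v bwa`) should point inside `$LOCAL_ENVS/bioinfo/bin`.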

```
# Activate bioinfo environment
source activate bioinfo

# See that bioinfo is loaded
conda env list
```

```
# conda environments:
#
bioinfo               *  /home/jupyter/tacc-work/jupyter_packages/envs/bioinfo
base                     /opt/conda
python2                  /opt/conda/envs/python2
```

```
# Call bwa
bwa

# Deactivate for good measure
source deactivate bioinfo
```

```
Program: bwa (alignment via Burrows-Wheeler transformation)
Version: 0.7.17-r1188
Contact: Heng Li <lh3@sanger.ac.uk>

Usage:   bwa <command> [options]

Command: index         index sequences in the FASTA format
         mem           BWA-MEM algorithm
         fastmap       identify super-maximal exact matches
         pemerge       merge overlapping paired ends (EXPERIMENTAL)
         aln           gapped/ungapped alignment
         samse         generate alignment (single ended)
         sampe         generate alignment (paired ended)
         bwasw         BWA-SW for long queries

         shm           manage indices in shared memory
         fa2pac        convert FASTA to PAC format
         pac2bwt       generate BWT from PAC
         pac2bwtgen    alternative algorithm for generating BWT
         bwtupdate     update .bwt to the new format
         bwt2sa        generate SA from BWT and Occ

Note: To use BWA, you need to first index the genome with `bwa index'.
```

## Creating a Jupyter kernel that uses the environment

To use your new environment with Jupyter, you first need to create a kernel spec for it. Once again, we created a convenience variable, `JUPYTER_WORK`, whose kernel directory the Jupyter server searches automatically. To install the kernel spec, invoke the Python interpreter inside your new environment
```
$LOCAL_ENVS/bioinfo/bin/python
```
with the following arguments:

```
$LOCAL_ENVS/bioinfo/bin/python -m ipykernel install \
    --prefix $JUPYTER_WORK \
    --name bioinfo \
    --display-name "Bioinformatics"

# !! MUST use full $LOCAL_ENVS/bioinfo/bin/python path !!

# -m ipykernel install              - install a jupyter kernel
# --prefix $JUPYTER_WORK            - kernel installation path
# --name bioinfo                    - name of the kernel (no spaces)
# --display-name "Bioinformatics"   - the name that will display in the drop-down list
```

```
Installed kernelspec bioinfo in /home/jupyter/tacc-work/jupyter_packages/share/jupyter/kernels/bioinfo
```
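To double-check that the new kernel spec points at the environment's interpreter rather than the base one, you can read `argv[0]` from its `kernel.json`. A small sketch that calls Python's standard `json` module from the shell; `kernel_argv0` is a hypothetical helper name, and the demo writes a sample spec to a temporary file.

```shell
# Hypothetical helper: print argv[0] (the program Jupyter launches) from a kernel.json
kernel_argv0() {
  python3 -c 'import json, sys; print(json.load(open(sys.argv[1]))["argv"][0])' "$1"
}

# Demo on a sample spec shaped like the one ipykernel just wrote
spec=$(mktemp)
cat > "$spec" <<'EOF'
{
 "argv": ["/home/jupyter/tacc-work/jupyter_packages/envs/bioinfo/bin/python",
          "-m", "ipykernel_launcher", "-f", "{connection_file}"],
 "display_name": "Bioinformatics",
 "language": "python"
}
EOF
launcher=$(kernel_argv0 "$spec")
echo "$launcher"
rm -f "$spec"

# For the real spec (before it is edited in the next step):
#   kernel_argv0 "$JUPYTER_WORK/share/jupyter/kernels/bioinfo/kernel.json"
```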


You will also need to modify the kernel spec so that it activates your environment before starting the kernel:

```
# Name of the kernel to modify
KERN=bioinfo
# Kernel spec written by "ipykernel install" above
MODK=${JUPYTER_WORK}/share/jupyter/kernels/${KERN}/kernel.json
# Unique temporary file for the edited spec
TMPF=$(mktemp)

echo "Original kernel"
cat "$MODK"

# Replace argv so the kernel is launched through activate_kernel,
# and only overwrite the spec if jq succeeded
echo -e "\n\nModified kernel"
jq ".argv = [\"activate_kernel\", \"${KERN}\", \"{connection_file}\"]" "$MODK" > "$TMPF" \
    && mv "$TMPF" "$MODK" \
    && cat "$MODK"
```

```
Original kernel
{
 "argv": [
  "/home/jupyter/tacc-work/jupyter_packages/envs/bioinfo/bin/python",
  "-m",
  "ipykernel_launcher",
  "-f",
  "{connection_file}"
 ],
 "display_name": "Bioinformatics",
 "language": "python"
}

Modified kernel
{
  "argv": [
    "activate_kernel",
    "bioinfo",
    "{connection_file}"
  ],
  "display_name": "Bioinformatics",
  "language": "python"
}
```

Notebooks that use this kernel will now start with the `bioinfo` environment activated, so all of its paths, including executables such as `bwa`, are correct.
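`activate_kernel` is not a standard conda or Jupyter command; it is presumably provided by the SD2E image, and its contents are not shown here. Conceptually, a launcher of this kind only needs to put the environment on `PATH` and then hand control to `ipykernel_launcher` with the connection file Jupyter passes in. The hypothetical sketch below illustrates that idea under the assumption that environments live under `$LOCAL_ENVS`; it is named `activate_kernel_sketch` so it is not confused with the real script, and it replaces `source activate` with a bare `PATH` update.

```shell
# Hypothetical launcher, illustrating the idea behind activate_kernel.
# Usage: activate_kernel_sketch ENV_NAME CONNECTION_FILE
# The body is in ( ) so it runs in a subshell and `exec` replaces only that subshell.
activate_kernel_sketch() (
  set -eu
  env_name=$1
  conn_file=$2
  prefix="${LOCAL_ENVS:?LOCAL_ENVS must be set}/${env_name}"

  # Minimal stand-in for `source activate`: expose the env's executables
  export CONDA_PREFIX="$prefix"
  export PATH="$prefix/bin:$PATH"

  # Start the kernel with the env's own Python, as the original argv did
  exec "$prefix/bin/python" -m ipykernel_launcher -f "$conn_file"
)

# Demo with a fake environment whose "python" just echoes its arguments
demo_envs=$(mktemp -d)
mkdir -p "$demo_envs/demo/bin"
cat > "$demo_envs/demo/bin/python" <<'EOF'
#!/bin/sh
printf '%s\n' "$*"
EOF
chmod +x "$demo_envs/demo/bin/python"

# LOCAL_ENVS is overridden only for this one call
launched=$(LOCAL_ENVS="$demo_envs" activate_kernel_sketch demo /tmp/kernel-123.json)
echo "$launched"
rm -rf "$demo_envs"
```

Using `exec` means the kernel process replaces the launcher, so Jupyter's signals (interrupt, restart) reach the Python kernel directly.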

### Launch Notebook

<img width="400" alt="launch notebook" src="https://user-images.githubusercontent.com/6790115/40860685-e5f624d6-65ab-11e8-81ca-f4f6f9ba18cd.png">

### Run BWA in python notebook

<img width="700" alt="run bwa" src="https://user-images.githubusercontent.com/6790115/40860716-05dd89f6-65ac-11e8-9cb3-93938eda19f0.png">

## Clean your environment

When you want to de-clutter your development environment and remove both this kernel and environment, you just have to delete two directories:

```
# Conda environment
rm -rf $LOCAL_ENVS/bioinfo
# Jupyter kernel
rm -rf $JUPYTER_WORK/share/jupyter/kernels/bioinfo
```

All traces of this environment should now be gone.

```
conda env list
```

```
# conda environments:
#
base                  *  /opt/conda
python2                  /opt/conda/envs/python2
```
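Because `rm -rf` is unforgiving, it can be worth guarding against an unset or empty variable before deleting anything; with `LOCAL_ENVS` unset, the cleanup cell would otherwise run `rm -rf /bioinfo`. A minimal sketch, assuming the same directory layout as above; `remove_env_and_kernel` is a hypothetical helper, and the `${VAR:?}` expansion makes it stop with an error instead of deleting the wrong path. The demo works only on a temporary directory.

```shell
# Hypothetical helper: delete a local conda env and its kernel spec.
# ${VAR:?message} aborts with an error if VAR is unset or empty, and the
# ( ) body keeps that abort inside a subshell.
remove_env_and_kernel() (
  name=${1:?usage: remove_env_and_kernel NAME}
  rm -rf -- "${LOCAL_ENVS:?LOCAL_ENVS is not set}/$name"
  rm -rf -- "${JUPYTER_WORK:?JUPYTER_WORK is not set}/share/jupyter/kernels/$name"
)

# Demo on a throwaway tree that mimics the real layout
demo=$(mktemp -d)
mkdir -p "$demo/envs/bioinfo" "$demo/work/share/jupyter/kernels/bioinfo"
LOCAL_ENVS="$demo/envs" JUPYTER_WORK="$demo/work" remove_env_and_kernel bioinfo

# With LOCAL_ENVS empty, the guard refuses to delete anything
status=0
LOCAL_ENVS= JUPYTER_WORK="$demo/work" remove_env_and_kernel bioinfo 2>/dev/null || status=$?
echo "exit status with empty LOCAL_ENVS: $status"
```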



## Extra functionality

If you think we should handle additional functionality in the SD2E Jupyter environment, please submit a feature request to

https://github.com/SD2E/jupyteruser-sd2e/issues

Thanks and happy hacking!

-- TACC