# How to build and deploy reproducible environments?

This notebook has been produced for a mini workshop about Conda and Docker. It was presented as a seminar for the evo-adapt scientific animation.

January 31st, 2025

## Install with `conda`

`conda` is a package manager best known in the `python` programming community (historically it was a package manager for Python packages). It is now widely used in bioinformatics and can install a wide range of software with C, C++, Python, or R dependencies.

Conda installs packages into isolated virtual environments. By default they are stored in a `~/.conda` directory, though conda can be configured to store them in your working directory, or in any other directory that makes sense to you.
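As a small illustration of this layout, the sketch below lists the envs found in a directory by looking for the `conda-meta` subdirectory that conda creates inside each env. The helper name `list_conda_envs` is hypothetical, and the default path is assumed to match the setup described here.

```python
from pathlib import Path


def list_conda_envs(envs_dir="~/.conda/envs"):
    """Return the sorted names of the conda envs found directly under envs_dir."""
    root = Path(envs_dir).expanduser()
    if not root.is_dir():
        return []  # nothing to list if the directory does not exist
    # A conda env is a directory that contains a `conda-meta` subdirectory
    return sorted(p.name for p in root.iterdir() if (p / "conda-meta").is_dir())


print(list_conda_envs())
```

On a machine without conda, or with envs stored elsewhere, this simply prints an empty list.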

In [7]:
!ls -lha ~/.conda

# Virtual envs are in separate directories in `~/.conda/envs/`
!ls -lha ~/.conda/envs/ | head -n 8

# Package binaries are downloaded and stored in separate directories in `~/.conda/pkgs/`
!ls -lha ~/.conda/pkgs/ | head -n 8

total 88K
drwxr-xr-x.   4 tbrazier UR1 4,0K 10 oct.  17:22 .
drwx------.  40 tbrazier UR1 4,0K 20 janv. 11:55 ..
-rw-r--r--.   1 tbrazier UR1  563 20 janv. 11:52 environments.txt
drwxrwsr-x.  13 tbrazier UR1 4,0K 20 janv. 11:52 envs
drwxrwsr-x. 646 tbrazier UR1  68K 20 janv. 11:52 pkgs
total 52K
drwxrwsr-x. 13 tbrazier UR1 4,0K 20 janv. 11:52 .
drwxr-xr-x.  4 tbrazier UR1 4,0K 10 oct.  17:22 ..
drwxr-sr-x. 12 tbrazier UR1 4,0K  4 janv. 10:07 bcftools
-rw-r--r--.  1 tbrazier UR1    0 10 oct.  17:22 .conda_envs_dir_test
drwxr-sr-x.  8 tbrazier UR1 4,0K 17 déc.  14:44 goat
drwxr-sr-x. 17 tbrazier UR1 4,0K  4 janv. 13:17 herho
drwxr-sr-x. 17 tbrazier UR1 4,0K 16 janv. 14:13 jasminesv
total 1,7G
drwxrwsr-x. 646 tbrazier UR1  68K 20 janv. 11:52 .
drwxr-xr-x.   4 tbrazier UR1 4,0K 10 oct.  17:22 ..
drwxr-sr-x.   7 tbrazier UR1 4,0K 10 oct.  17:30 alsa-lib-1.2.12-h4ab18f5_0
-rw-r--r--.   1 tbrazier UR1 543K 10 oct.  17:30 alsa-lib-1.2.12-h4ab18f5_0.conda
drwxr-sr-x.   7 tbrazier UR1 4,0K  4 ja

You don't need to manage these directories by hand: all your packages and envs are handled through conda commands. Each `conda <command>` has a specific purpose and its own set of options. The most used commands are `conda create`, `conda install`, `conda remove` and `conda clean`.

In [1]:
# conda commands
!conda --help

usage: conda [-h] [-V] command ...

conda is a tool for managing and deploying applications, environments and packages.

Options:

positional arguments:
  command
    clean        Remove unused packages and caches.
    compare      Compare packages between conda environments.
    config       Modify configuration values in .condarc. This is modeled
                 after the git config command. Writes to the user .condarc
                 file (/home/tbrazier/.condarc) by default.
    create       Create a new conda environment from a list of specified
                 packages.
    info         Display information about current conda install.
    init         Initialize conda for shell interaction.
    install      Installs a list of packages into a specified conda
                 environment.
    list         List linked packages in a conda environment.
    package      Low-level conda package utility. (EXPERIMENTAL)
    remove       Remove a list of packages from a specified conda 

In [None]:
# Create a new virtual env
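%conda create --yes --quiet --name workshop python=3.11 # Can be slow, conda is not fast
# `conda activate` has no lasting effect inside a notebook cell,
# so the env is targeted explicitly with --name in the commands below

# Install a package in the `workshop` env
%conda install -y -q --name workshop numpy

# Remove a package from the `workshop` env
%conda remove -y --name workshop numpy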

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

## Package Plan ##

  environment location: /home/tbrazier/.conda/envs/workshop

  added / updated specs:
    - python=3.11


The following NEW packages will be INSTALLED:

  _libgcc_mutex      pkgs/main/linux-64::_libgcc_mutex-0.1-main
  _openmp_mutex      pkgs/main/linux-64::_openmp_mutex-5.1-1_gnu
  bzip2              pkgs/main/linux-64::bzip2-1.0.8-h5eee18b_6
  ca-certificates    pkgs/main/linux-64::ca-certificates-2024.12.31-h06a4308_0
  ld_impl_linux-64   pkgs/main/linux-64::ld_impl_linux-64-2.40-h12ee557_0
  libffi             pkgs/main/linux-64::libffi-3.4.4-h6a678d5_1
  libgcc-ng          pkgs/main/linux-64::libgcc-ng-11.2.0-h1234567_1
  libgomp            pkgs/main/linux-64::libgomp-11.2.0-h1234567_1
  libstdcxx-ng       pkgs/main/linux-64::libstdcxx-ng-11.2.0-h1234567_1
  libuuid            pkgs/main/linux-64::libuuid-1.41.5-h5eee18b_0
  ncurses            pkgs/ma
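Because a notebook cell cannot keep an env activated, one option is to run commands inside an env with `conda run`. The sketch below builds such a command for the `workshop` env and executes it only if `conda` is found on the machine. The helper name `conda_run_command` is hypothetical.

```python
import shutil
import subprocess


def conda_run_command(env_name, *command):
    """Build the argument list that runs `command` inside the env `env_name`."""
    return ["conda", "run", "--name", env_name, *command]


cmd = conda_run_command("workshop", "python", "--version")
print(" ".join(cmd))

# Only execute when conda is actually available on this machine
if shutil.which("conda"):
    result = subprocess.run(cmd, capture_output=True, text=True)
    # Print whatever conda returned, including an error if the env is missing
    print(result.stdout or result.stderr)
```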

You can also manage your conda envs with `conda env <command>`. This is often easier than using plain `conda` commands, because it lets you define your future virtual env in a `.yaml` file. This `yaml` file is crucial for keeping a record of what has been installed in your env and for replicating it automatically on another machine or for another user.

`yaml` is a simple, human-readable data format. Its structure is used here to define and set up the virtual env.

In [None]:
# Example of a basic yaml file for conda env
name: workshop-test-1 # the name of the virtual env
channels: # The conda channels where to look for packages
    - conda-forge
dependencies: # packages to install
    - python=3.7 # Specify the required version
    - numpy
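The file above has three top-level keys: `name`, `channels` and `dependencies`. As a rough sketch, such a file can also be generated from Python, which may help when many similar envs are needed. The function `render_env_yaml` is a hypothetical helper that only handles plain string entries, not nested `pip` sections.

```python
def render_env_yaml(name, channels, dependencies):
    """Return the text of a minimal conda environment file."""
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    lines += [f"  - {dep}" for dep in dependencies]
    return "\n".join(lines) + "\n"


text = render_env_yaml("workshop-test-1", ["conda-forge"], ["python=3.7", "numpy"])
print(text)
```

Writing `text` to a file such as `workshop-test-1.yaml` gives a definition that `conda env create -f` can read.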

In [11]:
!conda env --help

usage: conda-env [-h] {create,export,list,remove,update,config} ...

positional arguments:
  {create,export,list,remove,update,config}
    create              Create an environment based on an environment
                        definition file. If using an environment.yml file (the
                        default), you can name the environment in the first
                        line of the file with 'name: envname' or you can
                        specify the environment name in the CLI command using
                        the -n/--name argument. The name specified in the CLI
                        will override the name specified in the
                        environment.yml file. Unless you are in the directory
                        containing the environment definition file, use -f to
                        specify the file path of the environment definition
                        file you want to use.
    export              Export a given environment
    list          

In [13]:
# Create a conda virtual env from a file
!conda env create -q -f workshop-test-1.yaml

# Update with a new package
!conda env update -q -f workshop-test-2.yaml

# Remove the virtual env and all its packages
!conda env remove -n workshop-test-1


CondaValueError: prefix already exists: /home/tbrazier/.conda/envs/workshop-test-1

Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... ^C

Remove all packages in environment /home/tbrazier/.conda/envs/workshop-test-1:



After you have installed your `conda` packages, it is good practice to run `conda clean --all` to remove all cached tarballs and unused packages (remember, they are in `~/.conda/pkgs/`). These packages are unpacked into many small files, which can cause quota issues on a cluster.

Note that having all your envs defined in `yaml` files is a real time-saver. You can safely remove envs on your cluster to save space and recreate them easily when needed.

In [14]:
!conda clean -y --all

Will remove 682 (1.79 GB) tarball(s).
Proceed ([y]/n)? ^C

CondaSystemExit: 
Operation aborted.  Exiting.
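To judge whether the package cache is worth cleaning, it can help to count the files it holds, since cluster quotas often limit the number of files as well as the total size. The sketch below is one way to do this with the standard library. The helper name `directory_footprint` is hypothetical, and the path assumes the default `~/.conda/pkgs` location.

```python
import os


def directory_footprint(path):
    """Return (number_of_files, total_size_in_bytes) for everything under path."""
    n_files, total_bytes = 0, 0
    for dirpath, _dirnames, filenames in os.walk(os.path.expanduser(path)):
        for filename in filenames:
            full_path = os.path.join(dirpath, filename)
            if os.path.islink(full_path):
                continue  # symlinks are not counted
            n_files += 1
            total_bytes += os.path.getsize(full_path)
    return n_files, total_bytes


n_files, total_bytes = directory_footprint("~/.conda/pkgs")
print(f"{n_files} files, {total_bytes / 1e9:.2f} GB")
```

Running it before and after `conda clean` shows how many files were freed.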



### Advanced. Complex conda env

With `conda env` and `yaml` files you can configure complex environments. You can include packages installed with `pip` (another Python package manager), `R` packages, `C` libraries, and Unix command-line software.

In [15]:
!cat workshop-test-3.yaml

# Example of a complex yaml file for conda env
name: workshop-test-complex # the name of the virtual env
channels:
  - mamba
  - conda-forge
  - bioconda
dependencies:
  - git # Unix software
  - r-base # R
  - jq
  - bcftools # bioinformatic software
  - vcftools # bioinformatic software
  - samtools # bioinformatic software
  - htslib # bioinformatic library
  - blas
  - cyvcf2
  - gsl # GNU scientific library in C
  - openssl>1.0 # Unix software
  - pip # pip package manager
  - python=3.8
  - pip: # install with pip - not in conda or dependencies for github installs
    - cython
    - msprime
    - numba
    - numpy
    - scikit-learn
    - pandas
    - "--editable=git+https://github.com/popgenmethods/ldpop.git#egg=ldpop" # Install directly from github
    - tskit
    - "--editable=git+https://github.com/popgenmethods/pyrho.git#egg=pyrho" # Install directly from github
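In this file the `dependencies` list mixes two kinds of entries: plain strings, which conda installs, and a nested `pip:` section, whose items are passed to pip. The sketch below shows how the two groups could be separated once the file has been loaded by a YAML parser. The function `split_dependencies` is hypothetical, and the input is a shortened, hand-written version of the list above.

```python
def split_dependencies(dependencies):
    """Separate conda package specs from pip requirements."""
    conda_specs, pip_specs = [], []
    for entry in dependencies:
        if isinstance(entry, dict):
            # The nested `pip:` mapping holds the requirements handed to pip
            pip_specs.extend(entry.get("pip", []))
        else:
            conda_specs.append(entry)
    return conda_specs, pip_specs


# Shortened version of the list above, as a YAML parser would load it
dependencies = ["bcftools", "pip", "python=3.8", {"pip": ["msprime", "tskit"]}]
conda_specs, pip_specs = split_dependencies(dependencies)
print(conda_specs)  # ['bcftools', 'pip', 'python=3.8']
print(pip_specs)    # ['msprime', 'tskit']
```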


### Advanced. Create your own conda recipe

## Containers