# Package Management with Conda and Pip


## Anaconda vs Miniconda vs "conda"
Anaconda is a free and open-source distribution of the Python programming language for scientific computing. Anaconda includes a wide selection of Python packages that are installed by default, with the ability to install more packages using the "conda" package manager program.

Miniconda is a lightweight version of the Anaconda distribution. It provides the "conda" package manager but, unlike Anaconda, does not install a large collection of scientific Python packages by default.

"conda" is simply the package and environment manager program that allows new software to be installed. The "conda" program is available whether you choose to install Anaconda or Miniconda.

## pip

Pip is a more basic package manager than conda that allows you to install software from PyPI (Python Package Index) as well as from GitHub. It works particularly well for pure Python packages, but things can get complicated when compiled code and external (non-Python) dependencies are involved. 

Not all packages are available through conda, so pip is still useful even if you primarily use conda. Every conda environment that has Python installed should also include pip by default.

```
pip install -e 'git+https://github.com/NCAR/esmlab.git#egg=esmlab'
```

`-e` installs in "editable" mode

`git+` at the beginning of a URL installs from a git repository
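Because some packages are not available through conda, one common pattern is to try conda first and fall back to pip inside the same environment. The sketch below illustrates that idea. The `install_pkg` function name is hypothetical (it is not part of conda or pip), and the sketch assumes that the target environment is already activated so that `conda` and `python` both refer to it.

```shell
# Hypothetical helper: try conda first, then fall back to pip.
# Assumes the target conda environment is already activated.
install_pkg() {
    pkg="$1"
    if conda install -y "$pkg"; then
        echo "installed $pkg with conda"
    else
        echo "conda could not install $pkg; falling back to pip"
        # "python -m pip" runs the pip that belongs to the active interpreter
        python -m pip install "$pkg"
    fi
}
```

After defining the function, something like `install_pkg xarray` would attempt the conda install and only invoke pip if conda reports a failure.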

## Installing and testing conda

Download and run the Miniconda installer script for your platform. On macOS (`wget` is not installed there by default; `curl -O` with the same URL also works):
```
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
sh ./Miniconda3-latest-MacOSX-x86_64.sh
```
On Linux:
```
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
sh ./Miniconda3-latest-Linux-x86_64.sh
```

Then open a new terminal and check to ensure the `conda` program exists:

In [None]:
conda --version
which conda

## conda gotchas

The conda installation instructions recommend running `conda init {shell}`. Be aware that this will likely cause the new conda-provided Python to override whichever Python installation was previously used by default. The safest way to install conda without interfering with existing Python installs is to add the directory `/path/to/miniconda/condabin` to your PATH environment variable, which provides just the `conda` program but not `python`.
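As a minimal sketch of the PATH approach, the lines below could go in a shell startup file such as `~/.bashrc`. The `MINICONDA_PREFIX` variable and its `$HOME/miniconda3` default are assumptions made for illustration; substitute the directory where Miniconda was actually installed.

```shell
# Assumed install location; replace with the actual Miniconda prefix.
MINICONDA_PREFIX="${MINICONDA_PREFIX:-$HOME/miniconda3}"

# Append condabin (which provides conda but not python) unless it is
# already on PATH. Appending rather than prepending keeps existing
# executables first in the lookup order.
case ":$PATH:" in
    *":$MINICONDA_PREFIX/condabin:"*) ;;
    *) PATH="$PATH:$MINICONDA_PREFIX/condabin" ;;
esac
export PATH
```

The `case` guard keeps the entry from being added again each time the startup file is sourced.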

Behavior with shells other than `bash` (`tcsh` in particular) is a bit inconsistent. `conda activate` does not seem to work properly in `tcsh`, but you could manually set your PATH environment variable to include the appropriate environment's `bin` directory.

## conda "channels"

"Conda packages are downloaded from remote channels, which are URLs to directories containing conda packages. The conda command searches a default set of channels, and packages are automatically downloaded and updated from https://repo.anaconda.com/pkgs/. You can modify what remote channels are automatically searched. You might want to do this to maintain a private or internal channel. For details, see how to modify your channel lists." - [conda documentation](https://docs.conda.io/projects/conda/en/latest/user-guide/concepts/channels.html)

Generally speaking, conda "channels" are intended to provide packages that are guaranteed to be compatible with each other. Mixing and matching packages between channels is a common source of frustration for users, so using a single source for all of your packages is generally preferred, if possible.

## "conda-forge"

The `conda-forge` channel is a community-led collection of recipes and packages. As of June 4, 2019, there were 6862 repositories (nearly all of which represent unique conda packages) and 1373 members in the conda-forge organization on GitHub.

I usually recommend configuring conda to use `conda-forge` by default:

In [None]:
conda config --add channels conda-forge
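To reduce the mixing of packages between channels described earlier, conda also offers a `channel_priority` setting. The sketch below shows roughly what a configuration that prefers `conda-forge` might look like. It writes to a scratch file with the hypothetical name `condarc-example.yml` instead of the real `~/.condarc`, so running it changes nothing in an existing setup.

```shell
# Write an example configuration to a scratch file rather than ~/.condarc,
# so the real configuration is left untouched.
cat > condarc-example.yml <<'EOF'
channels:
  - conda-forge
  - defaults
channel_priority: strict
EOF

# Roughly equivalent commands against the real configuration would be:
#   conda config --add channels conda-forge
#   conda config --set channel_priority strict
cat condarc-example.yml
```

With strict priority, conda takes a package from the highest-priority channel that provides it, which here is `conda-forge`.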

## Installing packages

In [None]:
conda activate base
conda install -y python=3
conda install -y numpy xarray

## Upgrading packages

In [None]:
conda activate base
conda update -y python numpy xarray

## Conda environments

By default, conda operates in the `base` environment. However, this means that all packages installed in `base` must be compatible with one another, even if they are not all used for the same project. Installing packages into a separate environment for each project or task avoids conflicts between the requirements of different projects.

An environment can also be created from a YAML file that specifies the environment name and the packages to install: `conda env create -f /path/to/environment.yml`
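For illustration, an environment file might look like the one written below. The environment name `demo_env` and the package list are hypothetical, chosen to resemble the other examples in this section.

```shell
# Write a small example environment file in the current directory.
# The name and package list are illustrative only.
cat > environment.yml <<'EOF'
name: demo_env
channels:
  - conda-forge
dependencies:
  - python=3
  - numpy
  - xarray
EOF

# With conda on PATH, the environment would then be created with:
#   conda env create -f environment.yml
cat environment.yml
```

Keeping this file under version control alongside a project makes it easy for collaborators to build the same environment.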

In [None]:
conda create -y --name env1 python=2.7 numpy >/dev/null
conda create -y --name env2 python=3 numpy xarray >/dev/null

## Conda environment demo

In [None]:
conda activate env1
# env1 has no xarray installed, so the final import is expected to raise ImportError
python -c 'from __future__ import print_function;import numpy, platform;print("python version: %s" % platform.python_version());print("numpy version: %s" % numpy.__version__);import xarray;print("xarray version: %s" % xarray.__version__)'
conda deactivate

In [None]:
conda activate env2
python -c 'from __future__ import print_function;import numpy, platform;print("python version: %s" % platform.python_version());print("numpy version: %s" % numpy.__version__);import xarray;print("xarray version: %s" % xarray.__version__)'
conda deactivate
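The version check used in these cells is a long one-liner that stops at the first missing module. As a sketch of a more reusable alternative, the hypothetical `show_versions` helper below (not part of conda) takes module names as arguments and reports each one, using whichever `python` the active environment provides.

```shell
# Hypothetical helper: report the Python version and the version of each
# module named on the command line, without aborting on a missing module.
show_versions() {
    python - "$@" <<'EOF'
from __future__ import print_function
import importlib
import platform
import sys

print("python version: %s" % platform.python_version())
for name in sys.argv[1:]:
    try:
        module = importlib.import_module(name)
    except ImportError:
        print("%s is not installed" % name)
    else:
        version = getattr(module, "__version__", "unknown")
        print("%s version: %s" % (name, version))
EOF
}
```

After `conda activate env1`, running `show_versions numpy xarray` would report xarray as not installed rather than ending in a traceback, while the same call in `env2` would list both versions.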

## Fixing a broken environment

In [None]:
conda deactivate
conda env remove -y -n broken >/dev/null 2>&1
conda env create -f broken.yml >/dev/null 2>&1 # this will fail because there is no broken.yml file included
conda activate broken
echo "broken NCL..."
ncl -V

In [None]:
echo 'running "conda update" to fix NCL'
conda update --all -y >/dev/null 2>&1
echo "fixed NCL"
ncl -V
conda deactivate
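Updating everything is the approach shown above. If an update is what broke the environment, conda's revision history offers another possible route, sketched here. The `rollback_env` function name is hypothetical, and the revision number to restore has to be chosen by reading the history that the first command prints.

```shell
# Hypothetical helper: show an environment's revision history, then
# restore the revision number passed as the second argument.
rollback_env() {
    env_name="$1"
    revision="$2"
    conda list --name "$env_name" --revisions
    conda install --name "$env_name" --revision "$revision" -y
}
```

A call such as `rollback_env broken 1` would print the history of the `broken` environment and then ask conda to return it to revision 1.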

## Reproducible Science

Back up a working production environment using `conda create` with the `--clone` option, update or install packages in the clone as needed, and then test the clone to ensure everything still works as expected. Once the clone has been verified, remove the original environment with `conda env remove`, clone the new environment back to the original name, and verify once more that everything still works.

In [None]:
conda create -y --name original_env python=3 numpy xarray >/dev/null
conda activate original_env
# dask has not been installed at this point, so the final import is expected to fail
python -c 'from __future__ import print_function;import numpy, platform;print("python version: %s" % platform.python_version());print("numpy version: %s" % numpy.__version__);import xarray;print("xarray version: %s" % xarray.__version__);import dask;print("dask version: %s" % dask.__version__)'
conda deactivate

### Clone original environment to temporary environment

In [None]:
conda create -y --name temp_env --clone original_env >/dev/null
conda activate temp_env
conda update -y --all # update any packages
conda install -y dask >/dev/null
# run tests
python -c 'from __future__ import print_function;import numpy, platform;print("python version: %s" % platform.python_version());print("numpy version: %s" % numpy.__version__);import xarray;print("xarray version: %s" % xarray.__version__);import dask;print("dask version: %s" % dask.__version__)'
conda deactivate

### Remove original environment, replace with clone of temporary environment

In [None]:
conda env remove -y --name original_env
conda create -y --name original_env --clone temp_env >/dev/null
conda activate original_env
# run tests again
python -c 'from __future__ import print_function;import numpy, platform;print("python version: %s" % platform.python_version());print("numpy version: %s" % numpy.__version__);import xarray;print("xarray version: %s" % xarray.__version__);import dask;print("dask version: %s" % dask.__version__)'
conda env remove -y --name temp_env
conda deactivate
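Cloning keeps a backup on the same machine. To record an environment in a form that can be shared or archived, one option is to export it to a YAML file, as in the sketch below. The `export_env` function name and the `<name>.yml` output convention are assumptions made for illustration.

```shell
# Hypothetical helper: write an environment's package list to a YAML
# file named after the environment, in the current directory.
export_env() {
    env_name="$1"
    conda env export --name "$env_name" > "${env_name}.yml"
}
```

For example, `export_env original_env` would produce `original_env.yml`, which could later be passed to `conda env create -f original_env.yml` to rebuild the environment elsewhere.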