![IE](../img/ie.png)

# Sessions 4 & 5: pip vs conda

### Juan Luis Cano Rodríguez <jcano@faculty.ie.edu> - Master in Business Analytics and Big Data (2019-04-05)

## Managing Python environments

![Python Comrades](../img/python_comrades.png)

Bootstrapping a working Python installation in a future-proof way can be tough. Linux distributions carry a ([sometimes crippled](https://github.com/pypa/pip/issues/4222#issuecomment-417646535)) Python _that should **never** be used for development_, macOS does the same with an even older version, and getting the `%PATH%` to work on Windows is not exactly newcomer-friendly.

Furthermore, even with a correct Python installation (let's assume Python >= 3.4 for sanity), the native [environment creation utility](https://docs.python.org/3/library/venv.html) has two problems:

1. It's tied to the originating Python version (e.g. if I install Python 3.6, I can't create a Python 3.7 environment)
2. For packages that require compiled extensions (mostly scientific/data packages, but not only), sometimes admin intervention is needed to install some system dependency

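Limitation (1) can be observed directly. The following is a minimal sketch, not part of the original setup: it creates a throwaway environment with the standard-library `venv` module and reads the `version` entry that `venv` records in `pyvenv.cfg`. The directory name `demo-env` and the helper `venv_python_version` are made up for illustration.

```python
import sys
import tempfile
import venv
from pathlib import Path


def venv_python_version(env_dir):
    """Create a bare virtual environment and return the version recorded in pyvenv.cfg."""
    # with_pip=False keeps the demo fast and avoids depending on ensurepip
    venv.create(env_dir, with_pip=False)
    config = Path(env_dir, "pyvenv.cfg").read_text(encoding="utf-8")
    for line in config.splitlines():
        key, _, value = line.partition("=")
        if key.strip() == "version":
            return value.strip()
    raise RuntimeError("no version entry found in pyvenv.cfg")


with tempfile.TemporaryDirectory() as tmp:
    recorded = venv_python_version(Path(tmp) / "demo-env")

running = ".".join(str(part) for part in sys.version_info[:3])
# The environment always reports the version of the interpreter that created it
print(f"interpreter: {running}, environment: {recorded}")
```
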
There are three popular solutions for these problems nowadays:

* Use [pyenv](https://github.com/pyenv/pyenv/) to manage Python versions. It solves (1) by letting several Python interpreters be installed side by side, which is its main advantage, but it doesn't handle (2).
* Use [Docker](https://docs.docker.com/engine/). It solves (1) and (2) by giving you _an entire operating system_, which is a bit overkill, and some people claim [it's optimized for deployment, not development](https://github.com/moby/moby/issues/7198#issuecomment-230965019). The main advantage is its total flexibility.
* **Use [conda](https://conda.io/)**. It solves (1) and (2) by providing a somewhat language-agnostic package and environment manager that does not require admin privileges. We will use this solution, even though it requires some care when mixing it with pip and its performance is (at the time of writing) not very good.

### Summary

> For the user, the most salient distinction is probably this: pip installs _python_ packages within _any_ environment; conda installs _any_ package within _conda_ environments
>
> —Jake Vanderplas

To combine the best of both worlds and minimize risk, our (default) approach will be:

* Use **conda** to _manage environments_ and _install non-Python dependencies_
* Use (recent versions of) **pip** inside conda environments to install Python dependencies

This way we will:

* Avoid the cumbersome multi-step process of pyenv
* Avoid incompatibilities between conda and pip
* Avoid the performance issues of conda while leveraging the improved dependency handling of modern pip (>= 19.0.3) 

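One way to make sure pip installs into the active conda environment is to invoke it through that environment's interpreter (`python -m pip`). A minimal sketch, assuming a hypothetical helper `pip_install_command` and using `requests` purely as a placeholder package name:

```python
import sys


def pip_install_command(*packages):
    """Build a pip command bound to the interpreter that is currently running."""
    # "python -m pip" runs the pip that belongs to this interpreter, so packages
    # land in the active environment rather than wherever a stray `pip` on PATH points
    return [sys.executable, "-m", "pip", "install", *packages]


command = pip_install_command("requests")
print(" ".join(command))
# To actually run it: subprocess.run(command, check=True)
```
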
### References

* Conda: Myths and Misconceptions https://jakevdp.github.io/blog/2016/08/25/conda-myths-and-misconceptions/
* Using pip in a conda environment https://www.anaconda.com/using-pip-in-a-conda-environment/

## conda and conda-forge

### Installation

1. Download Miniconda3 https://docs.conda.io/en/latest/miniconda.html (version is not important, _but_ ignore Python 2 😉)
2. Accept the license terms ([BSD 3-clause](https://tldrlegal.com/license/bsd-3-clause-license-(revised)))
3. Specify a location (recommendation: `/home/userXX/.miniconda37`, although the default one is fine)
4. Do **NOT** initialize Miniconda3 in `.bashrc` (we will learn and understand how it works instead)

If we follow the steps correctly, we will see this message, and we _won't_ be able to run conda yet:

```
...
installation finished.
Do you wish the installer to initialize Miniconda3
in your /home/user00/.bashrc ? [yes|no]
[no] >>> no

You may wish to edit your /home/user00/.bashrc to setup Miniconda3:

source /home/user00/miniconda3/etc/profile.d/conda.sh

Thank you for installing Miniconda3!
user00@ns3003537:~$ conda --version
conda: command not found
```

Notice, though, that Miniconda3 was correctly installed!

```
user00@ns3003537:~$ ~/miniconda3/bin/conda --version
conda 4.5.12
```

To properly initialize conda and avoid having to use the full path every time, we can do:

```
user00@ns3003537:~$ source /home/user00/miniconda3/etc/profile.d/conda.sh
user00@ns3003537:~$ conda --version
conda 4.5.12
```

And, to make it permanent, add it to our `.bashrc`:

```
user00@ns3003537:~$ echo 'source /home/user00/miniconda3/etc/profile.d/conda.sh' >> ~/.bashrc
user00@ns3003537:~$
```
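
Running the `echo ... >>` command twice would add the line twice. As a hedged alternative, the sketch below appends a line only if it is not already present. The helper `ensure_line` is hypothetical, and the demo writes to a temporary file rather than a real `.bashrc`.

```python
import tempfile
from pathlib import Path


def ensure_line(path, line):
    """Append `line` to the file at `path` unless already present. Return True if added."""
    path = Path(path)
    text = path.read_text() if path.exists() else ""
    if line in text.splitlines():
        return False
    # Start on a fresh line if the file does not end with a newline
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    with path.open("a") as handle:
        handle.write(prefix + line + "\n")
    return True


with tempfile.TemporaryDirectory() as tmp:
    fake_bashrc = Path(tmp) / ".bashrc"  # stand-in for ~/.bashrc
    init_line = "source /home/user00/miniconda3/etc/profile.d/conda.sh"
    first = ensure_line(fake_bashrc, init_line)
    second = ensure_line(fake_bashrc, init_line)
    content = fake_bashrc.read_text()

print(first, second, content.count(init_line))  # True False 1
```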

<div class="alert alert-warning">Depending on how the machine was configured, the <code>~/.bashrc</code> trick might not work. There should be a <code>~/.profile</code> or <code>~/.bash_profile</code> telling the system to load it on login.</div>

### Basic usage

Even if we have already initialized conda (remember: **not** on installation, but right after), we won't have access to the Python that comes with it:

```
user00@ns3003537:~$ which python
user00@ns3003537:~$ which python3
/usr/bin/python3
```

The reason is that we have to **activate an environment**. conda comes with an environment called `base` by default, and we can activate it using `conda activate <environment>`:

```
user00@ns3003537:~$ conda info -e  # Lists environments
# conda environments:
#
base                  *  /home/user00/miniconda3

user00@ns3003537:~$ conda activate base  # Notice the prompt change!
(base) user00@ns3003537:~$ which python
/home/user00/miniconda3/bin/python
(base) user00@ns3003537:~$ which python3
/home/user00/miniconda3/bin/python3
(base) user00@ns3003537:~$ which pip
/home/user00/miniconda3/bin/pip
(base) user00@ns3003537:~$ echo $PATH  # Here's the trick!
/home/user00/miniconda3/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/snap/bin
```
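
The `$PATH` trick can be reproduced in isolation. The sketch below is an illustration under stated assumptions, not what conda literally runs: it creates two temporary directories that each contain a fake executable called `demo-tool` (a made-up name) and uses `shutil.which` to show that whichever directory comes first on the search path wins. It assumes a POSIX system where the temporary directory allows executable files.

```python
import os
import shutil
import tempfile


def make_fake_tool(directory, name):
    """Create a tiny executable file called `name` inside `directory`."""
    path = os.path.join(directory, name)
    with open(path, "w") as handle:
        handle.write("#!/bin/sh\n")
    os.chmod(path, 0o755)
    return path


with tempfile.TemporaryDirectory() as system_bin, tempfile.TemporaryDirectory() as env_bin:
    system_tool = make_fake_tool(system_bin, "demo-tool")
    env_tool = make_fake_tool(env_bin, "demo-tool")

    # Before "activation": only the system directory is on the search path
    before = shutil.which("demo-tool", path=system_bin)
    # After "activation": the environment directory is prepended, so it wins
    after = shutil.which("demo-tool", path=os.pathsep.join([env_bin, system_bin]))
    # "Deactivation" removes the prepended entry again
    restored = shutil.which("demo-tool", path=system_bin)

print(before == system_tool, after == env_tool, restored == system_tool)
```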

And we can go back to "normal" using `conda deactivate`:

```
(base) user00@ns3003537:~$ conda deactivate
user00@ns3003537:~$ echo $PATH
/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/snap/bin
```

<div class="alert alert-warning"><strong>Note:</strong> Avoid using the <code>base</code> environment: you will pollute it, and eventually break it. Instead, get used to creating one environment per project.</div>

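A script can guard against accidentally working in `base`. The sketch below assumes that conda exports the `CONDA_DEFAULT_ENV` variable on activation, as recent versions do; the helper names are hypothetical, and the environment dictionaries are hand-written examples.

```python
import os


def active_conda_env(environ=None):
    """Return the name of the active conda environment, or None if there is none."""
    environ = os.environ if environ is None else environ
    # Assumption: conda sets CONDA_DEFAULT_ENV on activation
    return environ.get("CONDA_DEFAULT_ENV") or None


def is_safe_to_install(environ=None):
    """True only when a non-base conda environment is active."""
    name = active_conda_env(environ)
    return name is not None and name != "base"


print(is_safe_to_install({"CONDA_DEFAULT_ENV": "base"}))   # False
print(is_safe_to_install({"CONDA_DEFAULT_ENV": "nlp37"}))  # True
print(is_safe_to_install({}))                              # False
```
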
### Environment creation

To create an environment we use `conda create --name <name> <list-of-packages>`. We don't need to specify all the packages we will need, but it's customary to set the Python version, and sometimes also NumPy ([if you want extra performance](https://www.anaconda.com/tensorflow-in-anaconda/)).

For example, to create an environment for our ie-nlp-utils project using Python 3.7:

```
user00@ns3003537:~$ conda create -n nlp37 python=3.7
Solving environment: /
...
Proceed ([y]/n)? y
...
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate nlp37
#
# To deactivate an active environment, use
#
#     $ conda deactivate

user00@ns3003537:~$ conda activate nlp37
(nlp37) user00@ns3003537:~$ 
```

### Channels

![conda-forge](../img/conda-forge.png)

Anaconda (formerly Continuum Analytics), the company behind the Anaconda product and conda, uploads its conda packages to [their main repository](https://repo.anaconda.com/), so `conda` knows where to look for them. However, there is a [_community repository_](https://anaconda.org/) as well, where anyone can upload any package.

To decide where to download the packages from, `conda` has the concept of **channels**. The `defaults` channel is implicit, but we can decide to install a specific package from a specific channel. The most important of these channels (and almost the only one you should care about) is **conda-forge**:

https://conda-forge.org/

> A community led collection of recipes, build infrastructure and distributions for the conda package manager.

The `defaults` channel does not have _all_ the packages available out there, and it usually doesn't have the latest versions either. The reason is that its maintainers are a bit more conservative, to please corporate users.

To install a package from conda-forge, we can specify the channel in two ways:

* `$ conda install numpy --channel conda-forge`  (or `-c conda-forge`)
* `$ conda install conda-forge::numpy`

To configure `conda` to search `conda-forge` before `defaults`:

```
user00@ns3003537:~$ conda config --prepend channels conda-forge
user00@ns3003537:~$ cat ~/.condarc
channels:
  - conda-forge
  - defaults
```
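
To build intuition for how the channel order and the `channel::package` syntax interact, here is a deliberately simplified toy model. It is not conda's actual solver, which also weighs versions and dependencies. The function `resolve_channel` and the hand-written `index` are hypothetical.

```python
def resolve_channel(spec, channels, index):
    """Return (channel, package) for a spec such as "numpy" or "conda-forge::numpy"."""
    channel, separator, package = spec.partition("::")
    if separator:
        # Explicit "channel::package" form: only that channel is considered
        if package in index.get(channel, set()):
            return channel, package
        raise LookupError(f"{package} not found in {channel}")
    # No explicit channel: walk the configured channels in order, first hit wins
    for candidate in channels:
        if spec in index.get(candidate, set()):
            return candidate, spec
    raise LookupError(f"{spec} not found in any of {channels}")


# Hypothetical, hand-written index; real channels hold thousands of packages
index = {
    "conda-forge": {"numpy", "community-only-pkg"},
    "defaults": {"numpy", "mkl"},
}
channels = ["conda-forge", "defaults"]  # same order as the ~/.condarc above

print(resolve_channel("numpy", channels, index))            # ('conda-forge', 'numpy')
print(resolve_channel("mkl", channels, index))              # ('defaults', 'mkl')
print(resolve_channel("defaults::numpy", channels, index))  # ('defaults', 'numpy')
```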

(See [the tips and tricks](http://conda-forge.org/docs/user/tipsandtricks.html) section of the conda-forge documentation for more information.)

## pip and PyPI

pip is the default Python package installer. By default, it fetches packages from https://pypi.org/, which is the community repository for Python packages. Like its Anaconda counterpart, it is not curated, so anyone can upload anything. However, the concept of channels doesn't exist, so each name belongs to a single project and **there can't be name clashes**.

Several considerations must be taken into account when using `pip`:

* **Never, ever use `sudo pip install`**. You will break your system in very ugly ways. Create a conda environment instead.
* Check the pip version. The latest releases were:
  - 19.x (optimal)
  - 18.x
  - 10.x (yes, they switched to [calendar versioning](http://calver.org/) right after)
  - 9.x (between "old" and "very old")
  - <8.x (avoid like the plague!)
* As a general rule, _don't upgrade straight away_: the developers iron out the issues after each release
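
Checking the pip version can be automated. The sketch below compares a version string against the 19.0.3 threshold mentioned earlier in these notes; the helpers `parse_version` and `is_recent_enough` are hypothetical, and the parsing is intentionally naive (it ignores pre-release suffixes).

```python
import re

MINIMUM_PIP = (19, 0, 3)  # the threshold mentioned earlier in these notes


def parse_version(text):
    """Turn "19.0.3" into (19, 0, 3), stopping at the first non-numeric component."""
    parts = []
    for chunk in text.split("."):
        match = re.match(r"\d+", chunk)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def is_recent_enough(version_text, minimum=MINIMUM_PIP):
    """Compare a pip version string against the minimum, component by component."""
    return parse_version(version_text) >= minimum


print(is_recent_enough("19.0.3"))  # True
print(is_recent_enough("18.1"))    # False
print(is_recent_enough("9.0.1"))   # False
```

The version string itself can be obtained with `python -m pip --version` inside the active environment.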