![IE](../img/ie.png)

# Sessions 3 & 4: pip vs conda

### Juan Luis Cano Rodríguez <jcano@faculty.ie.edu> - Master in Business Analytics and Big Data (2020-05-12)

## Managing Python environments

![Python Comrades](../img/python_comrades.png)

> Simple is better than complex.
>
> Complex is better than complicated.

Packaging in Python has historically been _complicated_, and nowadays it is still _complex_. Getting it wrong is **the most common outcome**, so you will likely run into broken Python installations.

### How do people install and upgrade Python?

https://www.jetbrains.com/lp/python-developers-survey-2019/

![Installation and upgrade](../img/install-upgrade.png)

Let's analyze the most common options one by one:

1. ~~Using the OS-provided Python~~, the most common one, is **the wrong thing to do**. It requires admin privileges, and manipulating it might leave the system in a broken state. On Windows this is not a problem because Python is not pre-installed, but it is on Linux and macOS.
2. **Downloading it from Python.org** works on all operating systems, does not require admin permissions, allows you to choose the version you want, and ships a tool to create development environments (`venv`). However, `venv` cannot create environments with different Python versions (you're tied to the one you downloaded) and certain packages will not be readily available on Windows. Therefore, it is _not for everyone_.
3. **Using Anaconda** has all the advantages of Python.org, and additionally makes it trivial to install common Scientific/Data libraries on Windows using `conda`. However, mixing `conda` with the official Python package installer, `pip`, might produce unexpected results, and requires careful handling. This will be our choice.
4. Using Docker containers provides perfect isolation at the cost of complexity, and in fact some people argue that Docker should be left for _deployment_ rather than _development_. We will not explore this option.

### How do people create isolated development environments?

![Environment isolation](../img/environment-isolation.png)

1. **virtualenv**: in the survey, this answer probably covers both `venv` (standard library) and [`virtualenv`](https://virtualenv.pypa.io/en/stable/) itself (a third-party package with similar functionality). Packages are installed with `pip`. It is useful to know because it's the most common option, but it does not work with `conda`, so we won't use it in this course.
2. **Docker** is a highly popular tool for complex application development and deployment. It is less convenient for day-to-day development.
3. **Conda** is capable of creating environments (like `venv` and `virtualenv`) and installing complex dependencies easily (like `tensorflow`, `xgboost` and others). This will be our choice.

### Summary

> For the user, the most salient distinction is probably this: pip installs _python_ packages within _any_ environment; conda installs _any_ package within _conda_ environments
>
> —Jake Vanderplas

To mix the best of both worlds and minimize risk, our (default) approach will be:

* Use **conda** to _manage environments_ and _install non-Python dependencies_
* Use (recent versions of) **pip** inside conda environments to install Python dependencies

This way we will:

* Minimize incompatibilities between conda and pip
* Avoid the performance issues of conda while leveraging the improved dependency handling of modern pip (>= 19.0.3) 
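
One way to confirm that `pip` will install into the active conda environment, and not somewhere else, is to compare the interpreter's prefix with the `CONDA_PREFIX` variable that `conda activate` exports. The sketch below is only illustrative: the helper name `inside_active_conda_env` is our own and is not part of conda or pip.

```python
import os
import sys


def inside_active_conda_env(prefix=None, environ=None):
    """Return True if `prefix` is the currently active conda environment.

    Assumption: `conda activate` exports CONDA_PREFIX pointing at the
    environment directory. This helper is hypothetical, not a conda API.
    """
    prefix = sys.prefix if prefix is None else prefix
    environ = os.environ if environ is None else environ
    conda_prefix = environ.get("CONDA_PREFIX")
    if not conda_prefix:
        return False  # no conda environment is active
    return os.path.realpath(prefix) == os.path.realpath(conda_prefix)


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Inside active conda env:", inside_active_conda_env())
```

Invoking pip as `python -m pip install <package>` ties the installation to that same interpreter, which helps keep pip and conda pointing at the same environment.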

### References

* Conda: Myths and Misconceptions https://jakevdp.github.io/blog/2016/08/25/conda-myths-and-misconceptions/
* Using pip in a conda environment https://www.anaconda.com/using-pip-in-a-conda-environment/

## pip and PyPI

pip is the default Python installer. By default, it fetches packages from https://pypi.org/, which is the community repository for Python packages. Like its Anaconda counterpart, it is not curated, so anyone can upload anything. However, the concept of channels doesn't exist, so **there can't be name clashes**.

Several considerations must be taken into account while using `pip`:

* **Never, ever use `sudo pip install`**. You will break your system in very ugly ways. Create a conda environment instead.
* Check the pip version. The latest releases were:
  - 19.x (optimal)
  - 18.x
  - 10.x (yes, they switched to [calendar versioning](http://calver.org/) right after)
  - 9.x (between "old" and "very old")
  - <8.x (avoid like the plague!)
* As a general rule, _don't upgrade straight away_: the developers iron out the issues after each release
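
The version check can also be done from Python. The following is a minimal sketch, assuming a simple `major.minor.patch` scheme; `parse_version` and `is_recent_enough` are hypothetical helpers written for this example, and the threshold is the 19.0.3 figure quoted in the summary above.

```python
import re

MINIMUM_PIP = (19, 0, 3)  # threshold quoted in the summary above


def parse_version(text):
    """Turn '19.0.3' into (19, 0, 3), ignoring suffixes such as 'b1'."""
    numbers = []
    for part in text.split("."):
        match = re.match(r"\d+", part)
        if match is None:
            break  # stop at the first non-numeric component
        numbers.append(int(match.group()))
    return tuple(numbers)


def is_recent_enough(version_text, minimum=MINIMUM_PIP):
    """Compare a pip version string against the minimum we want."""
    return parse_version(version_text) >= minimum


if __name__ == "__main__":
    try:
        import pip  # pip exposes its version as pip.__version__
    except ImportError:
        print("pip is not installed for this interpreter")
    else:
        print(pip.__version__, "recent enough:", is_recent_enough(pip.__version__))
```

From the shell, `python -m pip --version` gives the same information and also shows which interpreter pip belongs to.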

## conda and conda-forge

### Installation

1. Download Miniconda3 https://docs.conda.io/en/latest/miniconda.html (version is not important, _but_ ignore Python 2 😉)
2. Accept the license terms ([BSD 3-clause](https://tldrlegal.com/license/bsd-3-clause-license-(revised)))
3. Specify a location (the default one is fine)
4. Do **NOT** initialize Miniconda3 in `.bashrc` (we will learn and understand how it works instead)

If we follow the steps correctly we will see this message, and we _won't_ be able to run conda yet:

```
...
installation finished.
Do you wish the installer to initialize Miniconda3
in your /home/user00/.bashrc ? [yes|no]
[no] >>> no

You may wish to edit your /home/user00/.bashrc to setup Miniconda3:

source /home/user00/miniconda3/etc/profile.d/conda.sh

Thank you for installing Miniconda3!
user00@ns3003537:~$ conda --version
conda: command not found
```

Notice though that Miniconda3 was correctly installed!

```
user00@ns3003537:~$ ~/miniconda3/bin/conda --version
conda 4.5.12
```

To properly initialize conda and avoid having to use the full path every time, we can do:

```
user00@ns3003537:~$ source /home/user00/miniconda3/etc/profile.d/conda.sh
user00@ns3003537:~$ conda --version
conda 4.5.12
```

And, to make it permanent, add it to our `.bashrc`:

```
user00@ns3003537:~$ echo 'source /home/user00/miniconda3/etc/profile.d/conda.sh' >> .bashrc
user00@ns3003537:~$
```

<div class="alert alert-warning">Depending on how the machine was configured, the <code>~/.bashrc</code> trick might not work. There should be a <code>~/.profile</code> or <code>~/.bash_profile</code> telling the system to load it on login.
</div>

<div class="alert alert-warning"><b>ABB</b>: In my case <code>~/.bash_profile</code> has been superseded by <code>~/.zshrc</code>
</div>

### Basic usage

> It is highly recommended to have the [conda cheatsheet](https://docs.conda.io/projects/conda/en/latest/user-guide/cheatsheet.html) at hand.

Even if we initialized conda already (remember: **not** on installation, but right after) we won't have access to the Python that comes with it:

```
user00@ns3003537:~$ which python
user00@ns3003537:~$ which python3
/usr/bin/python3
```

The reason is that we have to **activate an environment**. conda comes with an environment called `base` by default, and we can activate it using `conda activate <environment>`:

```
user00@ns3003537:~$ conda info -e  # Lists environments
# conda environments:
#
base                  *  /home/user00/miniconda3

user00@ns3003537:~$ conda activate base  # Notice the prompt change!
(base) user00@ns3003537:~$ which python
/home/user00/miniconda3/bin/python
(base) user00@ns3003537:~$ which python3
/home/user00/miniconda3/bin/python3
(base) user00@ns3003537:~$ which pip
/home/user00/miniconda3/bin/pip
(base) user00@ns3003537:~$ echo $PATH  # Here's the trick!
/home/user00/miniconda3/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/snap/bin
```

And we can go back to "normal" using `conda deactivate`:

```
(base) user00@ns3003537:~$ conda deactivate
user00@ns3003537:~$ echo $PATH
/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/snap/bin
```
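
The `$PATH` trick shown in the sessions above can be modelled in a few lines of Python. This is a deliberately simplified sketch: the real `conda activate` also sets variables such as `CONDA_PREFIX` and runs activation scripts, and the `activate` and `deactivate` functions here are hypothetical stand-ins, not conda code.

```python
import os


def activate(path, env_bin):
    """Return a PATH string in which `env_bin` is searched first."""
    return os.pathsep.join([env_bin] + path.split(os.pathsep))


def deactivate(path, env_bin):
    """Return a PATH string with `env_bin` removed."""
    return os.pathsep.join(p for p in path.split(os.pathsep) if p != env_bin)


system_path = os.pathsep.join(["/usr/local/bin", "/usr/bin", "/bin"])
env_bin = "/home/user00/miniconda3/bin"

activated = activate(system_path, env_bin)
# The first entry wins when the shell looks up `python` or `pip`.
print(activated.split(os.pathsep)[0])
# Deactivating restores the original search path.
print(deactivate(activated, env_bin) == system_path)
```

Because the shell walks `PATH` from left to right, putting the environment's `bin` directory first is enough to make `python` and `pip` resolve to the environment's copies.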

<div class="alert alert-warning"><strong>Note:</strong> Avoid using the <code>base</code> environment: you will pollute it, and eventually break it. Instead, get used to creating one environment per project.</div>

### Environment creation

To create an environment we use `conda create --name <name> <list-of-packages>`. We don't need to specify all the packages we will need, but it's customary to set the Python version, and sometimes also NumPy ([if you want extra performance](https://www.anaconda.com/tensorflow-in-anaconda/)).

For example, to create an environment for our `ie-nlp-utils` project using Python 3.7:

```
user00@ns3003537:~$ conda create -n nlp37 python=3.7
Solving environment: /
...
Proceed ([y]/n)? y
...
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate nlp37
#
# To deactivate an active environment, use
#
#     $ conda deactivate

user00@ns3003537:~$ conda activate nlp37
(nlp37) user00@ns3003537:~$ 
```
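
If you end up creating one environment per project, it may be convenient to script the command. The sketch below only builds the argument list and prints it; `conda_create_command` is a hypothetical helper, and the `RUN_IT` switch is an assumption added so that nothing is created by accident.

```python
import shutil
import subprocess


def conda_create_command(name, packages):
    """Build the argument list for `conda create`.

    `packages` maps package names to version strings (or None for "any").
    """
    specs = [pkg if ver is None else f"{pkg}={ver}" for pkg, ver in packages.items()]
    # --yes skips the "Proceed ([y]/n)?" confirmation prompt.
    return ["conda", "create", "--yes", "--name", name] + specs


command = conda_create_command("nlp37", {"python": "3.7"})
print(" ".join(command))

# Only run the command when explicitly requested and conda is on PATH.
RUN_IT = False
if RUN_IT and shutil.which("conda"):
    subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems, and `check=True` makes the script fail loudly if conda reports an error.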

### Channels

![conda-forge](../img/conda-forge.png)

Anaconda (formerly Continuum Analytics), the company behind the Anaconda product and conda, uploads all conda packages to [their main repository](https://repo.anaconda.com/), so `conda` knows where to look for them. However, there is a [_community repository_](https://anaconda.org/) as well, where anyone can upload any packages.

To decide where to download the packages from, `conda` has the concept of **channels**. The `defaults` channel is implicit, but we can decide to install a specific package from a specific channel. The most important of these channels (and almost the only one you should care about) is **conda-forge**:

https://conda-forge.org/

> A community led collection of recipes, build infrastructure and distributions for the conda package manager.

The `defaults` channel does not have _all_ the packages available out there, and usually doesn't have the latest versions either. The reason is that its maintainers are a bit more conservative, to please corporate users.

To install a package from conda-forge, we can specify the channel in two ways:

* `$ conda install numpy --channel conda-forge`  (or `-c conda-forge`)
* `$ conda install conda-forge::numpy`

To configure `conda` to use `conda-forge` first:

```
user00@ns3003537:~$ conda config --prepend channels conda-forge
user00@ns3003537:~$ cat ~/.condarc
channels:
  - conda-forge
  - defaults
```

(See [the tips and tricks](http://conda-forge.org/docs/user/tipsandtricks.html) section of the conda-forge documentation for more information.)
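
To build some intuition for why the order of the channel list matters, here is a toy model in Python. It is a sketch under strong assumptions: it only mimics "the first channel that has the package wins" plus the explicit `channel::package` syntax, it ignores versions and dependencies entirely, and the package index is made up. It is not how conda's solver is implemented.

```python
def pick_channel(package, channels, index):
    """Return the first channel in `channels` that offers `package`.

    `index` maps channel name -> set of package names (made-up data).
    This mimics strict channel ordering only; it is not conda's solver.
    """
    channel, sep, name = package.partition("::")
    if sep:  # explicit "channel::package" spec
        return channel if name in index.get(channel, set()) else None
    for candidate in channels:
        if package in index.get(candidate, set()):
            return candidate
    return None


# Hypothetical package index, for illustration only.
index = {
    "conda-forge": {"numpy", "some-community-package"},
    "defaults": {"numpy"},
}
channels = ["conda-forge", "defaults"]  # conda-forge listed first

print(pick_channel("numpy", channels, index))
print(pick_channel("numpy", ["defaults", "conda-forge"], index))
print(pick_channel("defaults::numpy", channels, index))
```

With `conda-forge` listed first, the plain `numpy` request resolves to conda-forge; swapping the order or using the `defaults::numpy` form sends it to `defaults` instead.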