![IE](../img/ie.png)

# Sessions 3 & 4: pip vs conda

### Juan Luis Cano Rodríguez <jcano@faculty.ie.edu> - Master in Business Analytics and Big Data

## Managing Python environments

![Python Comrades](../img/python_comrades.png)

> Simple is better than complex.
>
> Complex is better than complicated.

Packaging in Python has historically been _complicated_, and nowadays it is still _complex_. Getting it wrong is **the most common outcome**, so you will likely run into broken Python installations.

### How do people install and upgrade Python?

https://www.jetbrains.com/lp/python-developers-survey-2020/

![Installation and upgrade](../img/install-upgrade.png)

Let's analyze the most common options one by one:

1. **Downloading it from Python.org**, the most common option, works on all operating systems, does not require admin permissions, allows you to choose the version you want, and ships a tool to create development environments (`venv`). However, `venv` cannot create environments with different Python versions (you're tied to the one you downloaded) and certain packages will require extra steps to be installed. Therefore, it is _not for everyone_.
2. ~Using the OS-provided Python~, the second most common one, applies to Linux and macOS, and it is **the wrong thing to do**. It requires admin privileges, and manipulating the installation might leave the system in a broken state.
3. **Using Anaconda** has all the advantages of Python.org, and additionally makes it trivial to install common Scientific/Data libraries on Windows using `conda`. However, mixing `conda` with the official Python package installer, `pip`, might produce unexpected results, and requires careful handling. This will be our choice.
4. Using Docker containers provides perfect isolation at the cost of complexity, and in fact some people argue that Docker should be left for _deployment_ rather than _development_. We will not explore this option.
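
Whichever option you pick, it helps to know which interpreter is actually running your code. The sketch below uses only the standard library; the helper name `describe_interpreter` is ours and purely illustrative.

```python
import sys


def describe_interpreter():
    """Return basic facts about the Python interpreter running this code."""
    return {
        # Full path of the interpreter binary
        "executable": sys.executable,
        # Version as a (major, minor, micro) tuple
        "version": tuple(sys.version_info[:3]),
        # Root directory of the current installation or environment
        "prefix": sys.prefix,
        # True inside a venv: there, sys.prefix differs from sys.base_prefix
        "in_venv": sys.prefix != sys.base_prefix,
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

Note that `in_venv` only detects `venv`-style environments; inside a conda environment it is expected to be `False`, and the `prefix` entry is the more informative one.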

### How do people create isolated development environments?

![Environment isolation](../img/environment-isolation.png)

1. **`virtualenv`** in the survey probably covers both `venv` (standard library) and [`virtualenv`](https://virtualenv.pypa.io/en/stable/) itself (a third-party package with similar functionality). Packages are installed with `pip`. It is useful to know because it's the most common option, but it does not work with `conda`, so we won't use it in this course.
2. **Docker** is a highly popular tool for deploying complex applications, but it is less convenient for day-to-day development.
3. **Conda** is capable of creating environments (like `venv` and `virtualenv`) and installing complex dependencies easily (like `tensorflow`, `xgboost` and others). Also, _"More than a half of the users of Jupyter Notebook and JupyterLab choose Conda"_. This will be our choice.

### Summary

> For the user, the most salient distinction is probably this: pip installs _python_ packages within _any_ environment; conda installs _any_ package within _conda_ environments
>
> —Jake Vanderplas

To mix the best of both worlds and minimize risk, our (default) approach will be

* Use **conda** to _manage environments_ and _install non-Python dependencies_
* Use (recent versions of) **pip** inside conda environments to install Python dependencies

This way we will:

* Minimize incompatibilities between conda and pip
* Avoid the performance issues of conda while leveraging the improved dependency handling of modern pip (>= 19.0.3) 
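
As a rough sketch of this approach, the snippet below checks whether the running interpreter lives in a conda environment and builds a `pip` command tied to that same interpreter. The helper names are ours, `requests` is just an example package, and the `conda-meta` check is a heuristic (conda environments contain a directory with that name), not an official API.

```python
import os
import sys


def in_conda_env(prefix=sys.prefix):
    """Heuristic: conda environments contain a `conda-meta` directory."""
    return os.path.isdir(os.path.join(prefix, "conda-meta"))


def pip_install_command(packages):
    """Build a pip command bound to the interpreter that runs this code."""
    # `python -m pip` makes pip install into this interpreter's environment,
    # rather than into whichever `pip` happens to be first on PATH
    return [sys.executable, "-m", "pip", "install", *packages]


if __name__ == "__main__":
    print("Inside a conda environment:", in_conda_env())
    print("Command:", " ".join(pip_install_command(["requests"])))
```

The command is only printed, not executed, so it is safe to run anywhere.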

### References

* Conda: Myths and Misconceptions https://jakevdp.github.io/blog/2016/08/25/conda-myths-and-misconceptions/
* Using pip in a conda environment https://www.anaconda.com/using-pip-in-a-conda-environment/

## conda and conda-forge

### Channels

![conda-forge](../img/conda-forge.png)

Anaconda (formerly Continuum Analytics), the company behind the Anaconda product and conda, uploads all conda packages to [their main repository](https://repo.anaconda.com/), so `conda` knows where to look for them. However, there is a [_community repository_](https://anaconda.org/) as well, where anyone can upload any packages.

To decide where to download the packages from, `conda` has the concept of **channels**. The `defaults` channel is implicit, but we can decide to install a specific package from a specific channel. The most important of these channels (and almost the only one you should care about) is **conda-forge**:

https://conda-forge.org/

> A community led collection of recipes, build infrastructure and distributions for the conda package manager.

The `defaults` channel does not have _all_ the packages available out there, and also doesn't usually have the latest versions. The reason is that its maintainers are a bit more conservative to please corporate users.

To install a package from conda-forge, we can specify the channel in two ways:

* `$ conda install numpy --channel conda-forge`  (or `-c conda-forge`)
* `$ conda install conda-forge::numpy`

To configure `conda` to use `conda-forge` first:

```
user00@ns3003537:~$ conda config --prepend channels conda-forge
user00@ns3003537:~$ cat ~/.condarc
channels:
  - conda-forge
  - defaults
```

(See [the tips and tricks](http://conda-forge.org/docs/user/tipsandtricks.html) section of conda-forge documentation for more information)
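
To build some intuition for how the channel order matters, here is a deliberately simplified toy model: the first channel in the list that hosts a package wins, unless the `channel::package` syntax names one explicitly. The real conda solver also weighs versions, dependencies and the `channel_priority` setting, so treat this as an illustration only; the function names and the package index are made up.

```python
def parse_spec(spec):
    """Split 'channel::package' into (channel, package); channel is None if absent."""
    channel, sep, name = spec.partition("::")
    return (channel, name) if sep else (None, spec)


def pick_channel(spec, channels, index):
    """Return the channel a package would come from in this toy model.

    channels: channel names in priority order, as listed in ~/.condarc
    index: mapping from channel name to the set of package names it hosts
    """
    explicit, name = parse_spec(spec)
    # An explicit channel overrides the configured order
    candidates = [explicit] if explicit else channels
    for channel in candidates:
        if name in index.get(channel, set()):
            return channel
    return None


if __name__ == "__main__":
    # Made-up index, for illustration only
    index = {
        "conda-forge": {"numpy", "package-a"},
        "defaults": {"numpy"},
    }
    channels = ["conda-forge", "defaults"]
    print(pick_channel("numpy", channels, index))
    print(pick_channel("defaults::numpy", channels, index))
```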

### Performance

However... conda is famous for being *very slow*.

![conda solving environment](../img/conda-solving-environment.png)

For this reason, **mamba** was created, as a drop-in replacement for the `conda` command-line.

### Installation

**tl;dr:**

```bash
curl -LO https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-Linux-x86_64.sh
bash Mambaforge-Linux-x86_64.sh
exec $SHELL
conda info
```

To install the conda package manager, follow these steps:

1. Download Mambaforge, picking the right version from https://github.com/conda-forge/miniforge/#mambaforge
2. Accept the license terms ([BSD 3-clause](https://tldrlegal.com/license/bsd-3-clause-license-(revised)))
3. Specify a location (the default one is fine)
4. Accept the addition of `conda init`

If we follow the steps correctly we will see this message and we _won't_ be able to run conda, yet:

```bash
...
installation finished.
...
Do you wish the installer to initialize Mambaforge
by running conda init? [yes|no]
[no] >>> yes
...

==> For changes to take effect, close and re-open your current shell. <==

If you'd prefer that conda's base environment not be activated on startup, 
   set the auto_activate_base parameter to false: 

conda config --set auto_activate_base false

Thank you for installing Mambaforge!
```

Instead of closing and opening the shell, you can run:

```bash
$ exec $SHELL
```

and now you will notice that `(base)` appears at the beginning of the prompt, and that `conda info` already works!

```bash
(base) runner@7e9a7e1abd2a:~$ conda info

     active environment : base
    active env location : /home/runner/mambaforge
            shell level : 1
       user config file : /home/runner/.condarc
 populated config files : /home/runner/mambaforge/.condarc
          conda version : 4.9.2
    conda-build version : not installed
         python version : 3.8.6.final.0
       virtual packages : __glibc=2.27=0
                          __unix=0=0
                          __archspec=1=x86_64
       base environment : /home/runner/mambaforge  (writable)
           channel URLs : https://conda.anaconda.org/conda-forge/linux-64
                          https://conda.anaconda.org/conda-forge/noarch
          package cache : /home/runner/mambaforge/pkgs
                          /home/runner/.conda/pkgs
       envs directories : /home/runner/mambaforge/envs
                          /home/runner/.conda/envs
               platform : linux-64
             user-agent : conda/4.9.2 requests/2.25.1 CPython/3.8.6 Linux/5.4.0-1019-gcp ubuntu/18.04.5 glibc/2.27
                UID:GID : 1000:1000
             netrc file : None
           offline mode : False
```
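
The same information can be read programmatically, since `conda info` accepts a `--json` flag. The sketch below extracts a few fields; the key names are the ones we would expect in that JSON output and should be treated as assumptions, which is why the code uses `.get()` and tolerates missing keys. The helper name `summarize_conda_info` is ours.

```python
import json
import shutil
import subprocess


def summarize_conda_info(json_text):
    """Extract a few fields from the output of `conda info --json`."""
    info = json.loads(json_text)
    # Key names are assumed; .get() returns None (or []) if one is missing
    return {
        "conda_version": info.get("conda_version"),
        "active_prefix": info.get("active_prefix"),
        "platform": info.get("platform"),
        "channels": info.get("channels", []),
    }


if __name__ == "__main__":
    if shutil.which("conda") is None:
        print("conda is not on PATH")
    else:
        result = subprocess.run(
            ["conda", "info", "--json"],
            capture_output=True, text=True, check=True,
        )
        print(summarize_conda_info(result.stdout))
```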

### Basic usage

> It is highly recommended to have the [conda cheatsheet](https://docs.conda.io/projects/conda/en/latest/user-guide/cheatsheet.html) at hand.

With the default configuration, if we initialized conda correctly, the `base` environment will be activated for us:

```bash
(base) runner@7e9a7e1abd2a:~$ which python
/home/runner/mambaforge/bin/python
(base) runner@7e9a7e1abd2a:~$ echo $PATH
/home/runner/mambaforge/bin:/home/runner/mambaforge/condabin:/home/runner/.apt/usr/bin:/usr/local/go/bin:/opt/virtualenvs/python3/bin:/usr/GNUstep/System/Tools:/usr/GNUstep/Local/Tools:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
```

We could deactivate it to use the system Python (not recommended at all!) using `conda deactivate`:

```bash
runner@7e9a7e1abd2a:~$ which python
/opt/virtualenvs/python3/bin/python
runner@7e9a7e1abd2a:~$ echo $PATH
/home/runner/mambaforge/condabin:/home/runner/.apt/usr/bin:/usr/local/go/bin:/opt/virtualenvs/python3/bin:/usr/GNUstep/System/Tools:/usr/GNUstep/Local/Tools:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
```
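
The two `PATH` listings above differ only in the leading `mambaforge/bin` entry, and that is essentially all activation needs to change which `python` is found: the first matching directory wins. The following sketch reproduces the lookup with `shutil.which` on two temporary directories. It assumes a Unix-like system, and the helper names are ours.

```python
import os
import shutil
import stat
import tempfile


def make_fake_python(directory):
    """Create a tiny executable file named `python` in `directory`."""
    path = os.path.join(directory, "python")
    with open(path, "w") as handle:
        handle.write("#!/bin/sh\n")
    # Mark the file as executable so that PATH lookup accepts it
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path


def first_python_on(path_entries):
    """Return the `python` that a PATH made of `path_entries` resolves to."""
    return shutil.which("python", path=os.pathsep.join(path_entries))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as env_bin, \
            tempfile.TemporaryDirectory() as system_bin:
        make_fake_python(env_bin)
        make_fake_python(system_bin)
        # "Activated": the environment directory comes first, so it wins
        print(first_python_on([env_bin, system_bin]))
        # "Deactivated": only the system directory is left
        print(first_python_on([system_bin]))
```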

<div class="alert alert-warning"><strong>Note:</strong> Avoid using the <code>base</code> environment: you will pollute it, and eventually break it. Instead, get used to creating one environment per project.</div>

### Environment creation

To create an environment we use `conda create --name <name> <list-of-packages>`. We don't need to specify all the packages we will need, but it's customary to set the Python version, and sometimes also NumPy ([if you want extra performance](https://www.anaconda.com/tensorflow-in-anaconda/)).

For example, to create an environment for our ie-nlp-utils project using Python 3.8:

```bash
runner@7e9a7e1abd2a:~$ conda create -n utils38 python=3.8
Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##
...

Proceed ([y]/n)? y
...
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate utils38
#
# To deactivate an active environment, use
#
#     $ conda deactivate
runner@7e9a7e1abd2a:~$ conda activate utils38
(utils38) runner@7e9a7e1abd2a:~$ which python
/home/runner/mambaforge/envs/utils38/bin/python
```
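
After activating an environment, a quick sanity check from Python itself can confirm that you are running the interpreter you expect. This is a minimal sketch: the helper `matches_environment` is hypothetical, and it relies on named conda environments living in a directory whose last component is the environment name, as in the `which python` output above.

```python
import os
import sys


def matches_environment(name, python_version,
                        prefix=sys.prefix, version=sys.version_info):
    """Check that the interpreter lives in env `name` with the expected Python.

    python_version is a (major, minor) tuple such as (3, 8).
    """
    # Named conda environments are stored as <...>/envs/<name>
    same_name = os.path.basename(os.path.normpath(prefix)) == name
    same_python = tuple(version[:2]) == tuple(python_version)
    return same_name and same_python


if __name__ == "__main__":
    print(matches_environment("utils38", (3, 8)))
```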

## pip and PyPI

pip is the default Python installer. By default, it fetches packages from https://pypi.org/, which is the community repository for Python packages. Like its Anaconda counterpart, it is not curated, so anyone can upload anything. However, the concept of channels doesn't exist, so **there can't be name clashes**.

Several considerations must be taken into account while using `pip`:

* **Never, ever use `sudo pip install`**. You will break your system in very ugly ways. Create a conda environment instead.
* Check the pip version. The latest releases were:
  - 19.x, 20.x, 21.x and onwards (optimal)
  - 18.x (first release with [calendar versioning](http://calver.org/))
  - 10.x
  - 9.x (between "old" and "very old")
  - <8.x (avoid like the plague!)
* As a general rule, _don't upgrade straight away_ - the developers iron out the issues after each release
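
To check which pip you have, one option is to ask Python directly. The sketch below reads the installed version with `importlib.metadata` (available from Python 3.8) and maps it to a rough label following the list above. The function names and the `"outdated"` label for the in-between releases are our own choices.

```python
from importlib import metadata


def pip_major(version_text):
    """Return the major component of a pip version string such as '21.0.1'."""
    return int(version_text.split(".")[0])


def classify_pip(version_text):
    """Map a pip version to a rough label based on the list above."""
    major = pip_major(version_text)
    if major >= 19:
        return "optimal"
    if major < 8:
        return "avoid"
    return "outdated"  # our own label for everything in between


if __name__ == "__main__":
    try:
        installed = metadata.version("pip")
    except metadata.PackageNotFoundError:
        print("pip is not installed in this environment")
    else:
        print(installed, "->", classify_pip(installed))
```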