# Intro to Conda package management, for Python and beyond...
---

Sam Harrison<br>
CEEDS Coding Club - Python Session<br>
11 May 2021

## Outline

- What's a package and why do we need to manage them?
- Why Conda?
- Installing packages using Conda
- Managing Conda environments
- Sharing Conda environments
- It's not all about Python

## What's a package and why do we need them?

![Don't Repeat Yourself](https://miro.medium.com/max/500/1*Heq5WCj-ZOxn3CyUcN2K_Q.png)

- **DRY (Don't Repeat Yourself) code**: a key programming mantra to avoid wasting time writing the same code over and over.
- In scripts/programs, this can be enabled by e.g. writing loops, functions and modules.
- Code can be *packaged* together to make this re-use easier.
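As a minimal illustration of the DRY idea (a made-up example, not taken from any particular package), the repeated formula below is written once as a function and reused in a loop. That function is the kind of thing you would then move into a module or package:

```python
# Without DRY, the same formula is copied for every value:
# temp_a_f = 12.0 * 9 / 5 + 32
# temp_b_f = 18.5 * 9 / 5 + 32

def celsius_to_fahrenheit(celsius: float) -> float:
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

# With DRY: one function, reused for every value
temps_c = [12.0, 18.5, 25.0]
temps_f = [celsius_to_fahrenheit(t) for t in temps_c]
print(temps_f)
```

Saving the function in a file such as `conversions.py` (a hypothetical module name) would let other scripts reuse it with `from conversions import celsius_to_fahrenheit`.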

## Why do we need to manage them?

- **Package management:** Easy installation of packages from online registries: `conda install numpy` or `pip install numpy`.
- Package managers (generally) take care of **dependency management**.
- Ability to create **environments** to share with colleagues and deal with dependency conflicts.

**Dependency management** to install transitive dependencies (the dependencies of your dependencies):

```
my_program
|
├── super_package
|   └── another_package
|
└── useful_package
    ├── awesome_package
    └── mediocre_package
```
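To make "transitive" concrete, here is a toy resolver in Python. It assumes a hypothetical `DEPENDENCIES` dictionary that mirrors the tree above and simply walks the graph depth-first; real package managers such as Conda also have to handle versions, platforms and conflicts, which this sketch ignores:

```python
from __future__ import annotations

# Hypothetical dependency graph mirroring the tree above:
# each package maps to the packages it directly depends on.
DEPENDENCIES = {
    "my_program": ["super_package", "useful_package"],
    "super_package": ["another_package"],
    "useful_package": ["awesome_package", "mediocre_package"],
}

def resolve(package: str, graph: dict[str, list[str]]) -> list[str]:
    """Collect everything `package` needs, directly or transitively,
    listing each dependency before the packages that rely on it."""
    order: list[str] = []
    seen: set[str] = set()  # also stops infinite recursion if the graph has a cycle

    def visit(name: str) -> None:
        if name in seen:
            return
        seen.add(name)
        for dep in graph.get(name, []):  # packages missing from the graph have no deps
            visit(dep)
        order.append(name)  # added only once all of its own dependencies are in

    for dep in graph.get(package, []):
        visit(dep)
    return order

print(resolve("my_program", DEPENDENCIES))
# ['another_package', 'super_package', 'awesome_package', 'mediocre_package', 'useful_package']
```

Listing dependencies first means that installing in this order never installs a package before something it needs.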

**Dependency management** to deal with versioning:

```
my_program
|
├── super_package
|   └── another_package=v2.*
|
└── useful_package
    ├── awesome_package
    ├── another_package=v2.4
    └── mediocre_package
```
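A rough sketch of what the package manager has to do here: find a version of `another_package` that satisfies both `2.*` and `2.4` at once. The matching rules below are a deliberate simplification (loosely modelled on Conda's `=` syntax, not its full version-spec rules), and the list of releases is hypothetical:

```python
from __future__ import annotations

def parse(version: str) -> tuple[int, ...]:
    """Turn '2.4.1' into (2, 4, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))

def matches(version: str, spec: str) -> bool:
    """Toy rules: '2.*' matches any 2.x version; '2.4' matches 2.4 and 2.4.x."""
    prefix = spec[:-2] if spec.endswith(".*") else spec
    return version == prefix or version.startswith(prefix + ".")

def pick_version(available: list[str], specs: list[str]) -> str | None:
    """Return the newest available version satisfying every spec, or None."""
    candidates = [v for v in available if all(matches(v, s) for s in specs)]
    return max(candidates, key=parse) if candidates else None

# Hypothetical releases of another_package
available = ["1.9", "2.0", "2.4", "2.4.1", "2.5"]
print(pick_version(available, ["2.*", "2.4"]))  # both constraints from the tree above
# 2.4.1
```

If no version satisfies every spec, `pick_version` returns `None`; that is the point where a real solver would report a conflict.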

Dealing with **dependency conflicts**, including code requiring different versions of Python.

```
my_program (Python 2)
└── awesome_package=1.*

another_program (Python 3)
└── awesome_package=2.*
```

Using package managers, you can create an "environment" for each of these programs.
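A toy way to see why a single environment can't serve both programs: with some hypothetical `awesome_package` release numbers, the sets of versions each program accepts don't overlap, so each program gets its own environment with its own pick. This uses Python's `fnmatch` to read the `1.*`/`2.*` patterns, which is a simplification of how Conda interprets version specs:

```python
from fnmatch import fnmatch

def version_key(version: str) -> tuple:
    """Compare versions numerically, so '1.10' sorts after '1.9'."""
    return tuple(int(part) for part in version.split("."))

# Hypothetical releases of awesome_package
releases = ["1.0", "1.3", "2.0", "2.1"]

# The pattern each program needs, taken from the example above
requirements = {"my_program": "1.*", "another_program": "2.*"}

# The set of releases each program can accept
accepted = {
    program: {v for v in releases if fnmatch(v, pattern)}
    for program, pattern in requirements.items()
}

shared = set.intersection(*accepted.values())
if shared:
    print("One environment could serve both, with", max(shared, key=version_key))
else:
    # No release satisfies both programs: give each its own environment
    for program, versions in accepted.items():
        print(f"{program}: separate environment with awesome_package {max(versions, key=version_key)}")
```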

![image.png](attachment:image.png)

## Why Conda?

![Conda logo](https://docs.conda.io/en/latest/_images/conda_logo.svg)

Conda provides cross-platform **package**, **dependency** and **environment management** for **any language**.

Installation instructions: https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html

### Conda vs Pip vs Venv?

- `pip` is only a package manager and needs to be hooked up with e.g. `venv` to create environments. `pip` can't install different versions of Python.
- `pip` has limited dependency management (though it is much better since `pip` 20.3 switched to a new dependency resolver).
- `pip` only installs Python packages; Conda can install packages written in any language.
  - Conda is much better at managing external (non-Python) dependencies, e.g. installing Python packages that rely on GDAL is *much* more reliable using Conda.
- Conda environments can contain `pip`, so you can still install `pip` packages in Conda environments.
- Conda is (arguably) more complex and bloated.

### Miniconda vs Anaconda

**Anaconda** is a distribution of data science packages and programs along with Conda. It's basically Conda plus a load of packages. The base distribution is around 3 GB.

![Anaconda](https://www.anaconda.com/imager/assetsdo/Products/8031/open-source-logos2x_680db6b6f11f9cc710dd7defae241cd3.png)

**Miniconda** is a minimal install for Conda. It is a small, bootstrap version of Anaconda that only includes Conda, Python and a small number of other useful packages (like Pip).

*Strong recommendation: Use Miniconda.*

## Installing packages using Conda

### Channels

"Channels" are the online repositories where all the available packages live. There are a lot of channels.

- `defaults` is the Conda channel containing all the packages maintained by Anaconda. Availability of packages depends on the Anaconda team.
- `conda-forge` is a [community-maintained repository of 14,000+ packages](https://anaconda.org/conda-forge) that aims to tackle issues with installing packages across channels by providing a comprehensive repository of up-to-date packages.

*Strong recommendation: Use `conda-forge` for everything you possibly can.* You can set it as the highest-priority channel by running `conda config --add channels conda-forge`.
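What "highest priority" means can be sketched as a toy model of strict channel priority (Conda's `channel_priority` setting also has a more lenient `flexible` mode, not modelled here): a package is taken from the first channel in the list that has it at all, even if a lower-priority channel has a newer version. The package names and channel contents below are hypothetical:

```python
from __future__ import annotations

# Channels in priority order: `conda config --add channels conda-forge`
# puts conda-forge at the top of this list.
channels = ["conda-forge", "defaults"]

# Hypothetical channel contents: package name -> newest version on that channel
index = {
    "conda-forge": {"package_a": "2.0"},
    "defaults": {"package_a": "2.1", "package_b": "1.0"},
}

def choose_channel(package: str) -> tuple[str, str] | None:
    """Toy model of strict channel priority: take the package from the
    first channel that has it, even if a later channel has a newer version."""
    for channel in channels:
        if package in index[channel]:
            return channel, index[channel][package]
    return None  # not found on any configured channel

for pkg in ["package_a", "package_b", "package_c"]:
    print(pkg, "->", choose_channel(pkg))
```

Sticking to one channel for a package keeps all of its builds consistent, which is one reason mixing channels causes fewer problems when `conda-forge` comes first.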

![](https://avatars.githubusercontent.com/u/11897326?s=280&v=4)

### Installing packages (from `conda-forge`)

Conda comes with a `base` environment, and anything installed without creating/activating a new environment will go in there.

To keep things clean, let's create a new environment `myenv` and install some packages in there:

```console
$ conda create -n myenv
$ conda activate myenv
(myenv) $ conda install -c conda-forge numpy
```

What Conda is telling you when you install a new package:

![image.png](attachment:image.png)

Let's see what packages were installed when creating the environment and installing NumPy:

```console
(myenv) $ conda list
```

![image.png](attachment:image.png)

You can **install specific versions of a package**:

```console
(myenv) $ conda install -c conda-forge pandas=1.0.0
```

Conda will helpfully tell you if it can't satisfy the versions you asked for. Here, `pandas 1.0.0` requires a lower version of Python than we have installed in this environment (and Conda won't downgrade Python itself to satisfy dependencies).

![image.png](attachment:image.png)

If we really want `pandas 1.0.0`, we can manually downgrade to an older version of Python:

```console
(myenv) $ conda install -c conda-forge python=3.7
```

![image.png](attachment:image.png)

Now we can install `pandas 1.0.0`:

```console
(myenv) $ conda install -c conda-forge pandas=1.0.0
```
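To double-check from inside Python which version actually ended up in the environment, a small sketch like this works in Python 3.7 too. It assumes the package exposes a `__version__` attribute, which many (including pandas) do; `installed_version` is a hypothetical helper name:

```python
from __future__ import annotations

import importlib

def installed_version(module_name: str) -> str | None:
    """Return module.__version__ for an importable module, or None if it
    isn't installed in the environment running this interpreter
    (or doesn't define __version__)."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)

print(installed_version("pandas"))  # should show 1.0.0 in myenv after the install above
```

Because it imports the module, this checks what the *running* interpreter sees, so run it with the environment activated.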

![image.png](attachment:image.png)

### Useful commands

Search for packages:

```console
(myenv) $ conda search xarray -c conda-forge
```

![image.png](attachment:image.png)

```console
(myenv) $ conda search xarray -c conda-forge --info
```

![image.png](attachment:image.png)

Removing a package:

```console
(myenv) $ conda uninstall pandas
```

---
Setting `conda-forge` as the top-priority channel (recommended):

```console
$ conda config --add channels conda-forge
```

---
Updating all packages within an environment:

```console
(myenv) $ conda update --all
```

---
We can still install Pip packages in a Conda environment (with caution: use Conda where possible!):

```console
(myenv) $ pip install convertbng
```

## Managing Conda environments

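Creating environments (three alternatives, each creating an environment called `myenv`):

```console
$ conda create -n myenv                  # an empty environment
$ conda create -n myenv python=3.7       # with a specific Python version
$ conda create -n myenv scipy pandas     # with some packages already installed
```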

---
We can **list the names and locations of the available environments**:

```console
$ conda env list
```

---
**Activating and deactivating:**

```console
$ conda activate myenv
(myenv) $ conda deactivate
```
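From a Python session started in an activated shell, you can check which environment is active. This sketch relies on the environment variables that `conda activate` sets: `CONDA_DEFAULT_ENV` (typically the environment name, or its path if it was activated by path) and `CONDA_PREFIX` (its location). `active_conda_env` is a hypothetical helper name:

```python
from __future__ import annotations

import os

def active_conda_env() -> str | None:
    """Return the environment activated with `conda activate`,
    or None if no Conda environment is active in this shell."""
    # `conda activate` sets CONDA_DEFAULT_ENV (name) and CONDA_PREFIX (path)
    return os.environ.get("CONDA_DEFAULT_ENV")

print("Active environment:", active_conda_env())
print("Environment path:", os.environ.get("CONDA_PREFIX"))
```

Note that `conda deactivate` from `myenv` often drops you back into `base` rather than no environment at all, in which case this reports `base`.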

## Sharing Conda environments

Say you have an application that depends on a complicated network of packages. You have a Conda environment set up to manage this, but your co-worker needs to be able to run the application too. You can create a **YAML environment file** to easily share this with them.

Creating an environment file from an existing environment:

```console
(myenv) $ conda env export > environment.yaml
```

![image.png](attachment:image.png)

This can easily be shared, e.g. in the GitHub repo for the application.

Co-workers can then **create an environment from a file**:

```console
$ conda env create -f environment.yaml
```

Note that if you leave out `-f`, `conda env create` looks for a file named `environment.yml` in the current directory.

---
You can easily create an environment file from scratch:

```yaml
name: myenv
channels:
  - conda-forge
dependencies:
  - python=3.9
  - numpy
  - pip
  - pip:
    - convertbng
```
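If you generate environment files from a script, a minimal sketch (a hypothetical helper using only the standard library, no YAML package) could build the same text as the file above. It lists `pip` itself as a Conda dependency whenever there are pip packages, because Conda warns if a `pip:` section is present without it:

```python
from __future__ import annotations

def environment_yaml(name: str, channels: list[str],
                     conda_deps: list[str], pip_deps: list[str] | None = None) -> str:
    """Build the text of a Conda environment file (hypothetical helper)."""
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    lines += [f"  - {dep}" for dep in conda_deps]
    if pip_deps:
        if "pip" not in conda_deps:
            lines.append("  - pip")  # pip must be a Conda dependency to use the pip section
        lines.append("  - pip:")
        lines += [f"    - {dep}" for dep in pip_deps]
    return "\n".join(lines) + "\n"

text = environment_yaml("myenv", ["conda-forge"], ["python=3.9", "numpy"], ["convertbng"])
print(text)
```

Saving `text` as `environment.yml` would then let `conda env create` pick it up from the current directory.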

Deleting an environment (the `--all` flag removes the whole environment, not just individual packages):

```console
$ conda remove -n myenv --all
```

## It's not all about Python

Conda makes a great R package manager too. The `conda-forge` and `defaults` channels have a good selection of R packages, all prefixed with `r-`. `conda-forge` is more up to date (R v4.0.3 versus R v3.6.1 on the `defaults` channel, at the time of writing).

To create an R environment:

```console
$ conda create -n renv r-base
```

---
To install tidyverse:

```console
$ conda activate renv
(renv) $ conda install r-tidyverse
```

To **open RStudio using a particular Conda environment**, activate that environment and launch RStudio from the console (you can also install RStudio using Conda):

```console
$ conda activate renv
(renv) $ rstudio
```

You can even use Conda to manage more problematic packages, like NetCDF-Fortran. For example, to install the GFortran runtime library (`libgfortran`) and NetCDF-Fortran:

```console
$ conda install libgfortran netcdf-fortran
```

Caveat: `netcdf-fortran` isn't available on Windows, **yet**. See: https://github.com/conda-forge/netcdf-fortran-feedstock/issues/3.

## Jupyter Notebooks and Conda

You can easily use Conda alongside Jupyter Notebooks. To do so:
- Install `jupyterlab` and `nb_conda_kernels` in your base environment.
- Install `ipykernel` in every Conda environment you want to use Jupyter Notebooks in.
- Run `jupyter notebook` or `jupyter lab` from your base environment.
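Once a notebook is open, you can confirm which environment its kernel is running in. This sketch looks at `sys.prefix` and at the `conda-meta` folder that Conda keeps at the root of every environment; `kernel_environment` is a hypothetical helper name:

```python
import sys
from pathlib import Path

def kernel_environment() -> dict:
    """Describe the environment the current Python kernel is running in."""
    prefix = Path(sys.prefix)
    return {
        "executable": sys.executable,
        "prefix": str(prefix),
        # Folder name, e.g. 'myenv' for .../envs/myenv (for base it is the install folder)
        "env_name": prefix.name,
        # Every Conda environment has a conda-meta folder at its root
        "is_conda": (prefix / "conda-meta").is_dir(),
    }

for key, value in kernel_environment().items():
    print(f"{key}: {value}")
```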

![image.png](attachment:image.png)

## Tips and tricks

- Keep Conda environments small, modular and tidy. Unwieldy environments mean dependency solving takes ages and there are more likely to be conflicts.
- Use `conda clean --all` semi-frequently. It clears the index cache and removes unused packages and downloaded tarballs from Conda's package cache, and can save *a lot* of disk space.
- Having problems with GDAL/PROJ on Windows? [Check out this guide.](https://chrieke.medium.com/howto-install-python-for-geospatial-applications-1dbc82433c05)
- Check out Conda revisions - it's like version control, for your environment. Run `conda list --revisions` to see the revision history, and run `conda install --revision 0` to roll back to a previous revision (in this case, rev 0).

- Always want your new environments to contain certain packages (e.g. Pip, NumPy)? Add a `create_default_packages` list to your `~/.condarc` file:

```yaml
create_default_packages:
  - pip
  - numpy
```

- The `.condarc` file has [lots of useful configuration options.](https://conda.io/projects/conda/en/latest/user-guide/configuration/use-condarc.html)
- Got a package from PyPI that you want to turn into a Conda package? [Check out `conda skeleton`](https://conda.io/projects/conda-build/en/latest/user-guide/tutorials/build-pkgs-skeleton.html)
- It might be worth checking out the new kid on the block, [Mamba](https://github.com/mamba-org/mamba), which is a reimplementation of Conda in C++ and is apparently much faster.

## Next sessions

- 25 May: **Random forests in scikit-learn**, Diarmuid Corr
- 8 June: **Show-and-tell drop-in**

Any more volunteers?

![](https://imgs.xkcd.com/comics/dependency.png)

https://xkcd.com/2347/