# Package Managers

## A package manager handles dependency resolution

In the previous section, we saw that certain versions of packages can have conflicts with each other. So how should you figure out which versions of a package won't cause conflicts? You can use a package manager!

![](../../images/dependency_resolution_package_managers.jpg)

A package manager has two inputs: 1) the packages requested by the user and 2) a database of available packages and versions. Consider the following `pip` command, for example.
```bash
pip install 'scikit-learn' 'numpy>=1.21.6'
```
In this case, the user is asking the `pip` package manager to install `scikit-learn` and a version of `numpy` that is `1.21.6` or newer. In the process, `pip` will search PyPI (the Python Package Index), a database of published Python packages, their versions, and their dependency constraints.

While attempting to install packages, `pip` always has one goal. It tries to answer the question:
> What are the newest versions of the requested packages and their dependencies that can be installed together such that a conflict does not arise?

If it can find a compatible set of versions, it will install them. Otherwise, it will raise an error message.
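
To make that goal concrete, here is a toy sketch of the simplest piece of the problem: picking the newest version that satisfies a single `>=` constraint. The version list is made up for illustration and stands in for PyPI. This is not how `pip` is implemented, since a real resolver also has to satisfy the constraints of every dependency at the same time. The sketch assumes a `sort` that supports `-V` (version sort), such as the one in GNU coreutils.

```bash
# Hypothetical list of available versions (a stand-in for the package index)
available="1.19.5 1.24.3 1.21.6 1.22.4"
# The user's constraint: numpy>=1.21.6
minimum="1.21.6"

newest=$(
    for v in $available; do
        # v satisfies ">= minimum" if minimum sorts first (or ties) in version order
        lowest=$(printf '%s\n%s\n' "$minimum" "$v" | sort -V | head -n 1)
        if [ "$lowest" = "$minimum" ]; then
            echo "$v"
        fi
    done | sort -V | tail -n 1   # keep the newest of the versions that passed
)

echo "$newest"
```

With the made-up list above, this prints `1.24.3`.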

## What about when conflicts are unavoidable? Use environments!

It's common to encounter version conflicts with bioinformatics software. And in some cases, they will be impossible to resolve.

**TODO: check that this example is valid**
For example, let's say that you want to install `numpy==1.14.4`, which only supports Python 2.7 and Python 3.4 through 3.6, but all of your other software requires `python>=3.8`. Your best bet is to install `numpy==1.14.4` in a separate _virtual environment_. This ensures that it lives in a separate place where it won't conflict with your other software.

**TODO: include image here**

## Using `pip` and `venv` for python environment management

`pip` is a package manager exclusively for Python packages. It downloads software from a _package index_ called PyPI, where developers upload their tools.

To create and manage virtual environments consisting of Python packages, you can use a tool called `venv`. It is part of Python 3's standard library, so you don't need to install it separately.

**Command cheat sheet**

Command | Description
--------|-----------
`python -m venv myenv` | create a new environment called `myenv`
`source myenv/bin/activate` | activate the `myenv` environment
`pip install 'pysam>=0.19.1'` | install pysam in the current environment
`pip list` | list packages in the current environment
`pip freeze > requirements.txt` | export the current environment to share with others
`pip install -r requirements.txt` | install packages from an exported environment

The `pip freeze` command prints every package installed in the current environment, pinned to its exact version. Redirecting that output to a `requirements.txt` file gives you something you can share with your collaborators. For example, an environment with `pysam` v0.19.1 and `numpy` v1.14.4 will appear in the `requirements.txt` file like this:

**TODO: include image here instead of code block**
```
pysam==0.19.1
numpy==1.14.4
```

## Conda, the ultimate package and environment manager

`conda` is the ultimate package manager because it can install packages written in *any* language -- not just Python packages. It is also an environment manager! We recommend exclusively using `conda` to install all your software.

**Command cheat sheet**

Command | Description
--------|-----------
`conda create -n myenv` | create a new environment called `myenv`
`conda activate myenv` | activate the `myenv` environment
`conda install -n myenv 'python=3.8'` | install python in the `myenv` environment
`conda install -n myenv -c conda-forge 'python=3.8'` | install python from the conda-forge channel in the `myenv` environment
`conda list` | list packages in the current environment
`conda env export > env.yml` | export the current environment to share with others\* *\[not recommended!\]*
`conda env create -f env.yml` | recreate an exported environment

\**Note:* We do not recommend using `conda env export` to share environments because the environments won't be reproducible. Read on to learn about how to properly share an environment.

### When should I create a new environment?

<span style="color:red">When installing a new package in an existing environment, it will be *more* likely that you encounter a conflict if the existing environment has *many* packages.</span> Refer to the previous chapter of this wiki if you don't yet understand why.

Based on this fact, we've created the following list of best practices.

**Best practices for environment creation**
1. Try to keep your environments small with few existing packages
2. Create a new environment whenever you start a new project or run into a conflict
3. Manually maintain a YAML file that lists each package in your environment in case you need to recreate it. Read on to learn about how to do this
4. Avoid installing packages in the original `base` environment

### Conda uses multiple package indexes called **channels**

Each package that you install with `conda` will belong to a *channel*. A channel consists of a set of packages and their available versions.

Anybody can create a *channel*, so there are many online. Here are the most important ones to know if you're a bioinformatician.

channel name | description
-------------|------------
anaconda | data-science packages that come pre-installed with the Anaconda distribution
r | R packages re-distributed by Anaconda, Inc
defaults | an umbrella channel that refers to anaconda, r, and others created by Anaconda Inc
conda-forge | data-science packages, curated by an open source community
bioconda | bioinformatics packages! open source. Packages have dependencies in conda-forge

We do not recommend using the `anaconda`, `r`, or `defaults` channels. Read on to learn why.

### Best practices for conda channels

It's best to use open-source channels. They're more up-to-date and more comprehensive. So how do you tell `conda` to use `conda-forge` and `bioconda`?

By default, `conda` uses the channels listed in your `~/.condarc` config file. When you first install `conda`, Anaconda Inc channels will be listed there, but channels created by Anaconda Inc are known to **conflict** with conda-forge! For this reason, <span style="color:red">we recommend adding `conda-forge` and `bioconda` to your `~/.condarc` config file and removing the `defaults` channel</span>. In an environment YAML file, you can also list `nodefaults` at the end of the `channels` list to ensure the `defaults` channel is never used.

You should never install packages from the `r` channel because it is part of the `defaults` channel, which conflicts with `conda-forge`. You can find all of the R packages in the `r` channel within either `conda-forge` or `bioconda`, anyway.

Two channels may contain the same package. The order that you specify in your `~/.condarc` config file or environment YAML file determines which channel is preferred. Since many `bioconda` packages have dependencies in `conda-forge`, <span style="color:red">we recommend listing `conda-forge` before `bioconda`</span>.

Here are some bad examples of channel listings, pictured in <span style="color:red">red</span>. There is only one good example, which we've illustrated below in <span style="color:blue">blue</span>. After installing conda, <span style="color:red">you should edit your `~/.condarc` config file to ensure it matches</span> <span style="color:blue">the blue example</span>.

**TODO: add good and bad examples as an image**
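
As a rough sketch of the recommended ordering, the snippet below writes a channel list with `conda-forge` ahead of `bioconda` to a scratch file. The file name `condarc.example` is a placeholder chosen here so that nothing overwrites your real config, and the result may differ in detail from the example the authors have in mind. Copy the contents into `~/.condarc` yourself once you've checked them.

```bash
# Write an example channel list to a scratch file (NOT to ~/.condarc itself).
# "condarc.example" is a hypothetical file name used only for this sketch.
cat > condarc.example <<'EOF'
channels:
  - conda-forge
  - bioconda
EOF

# Print the file so you can compare it against your current ~/.condarc
cat condarc.example
```

Channels are listed from most to least preferred, so `conda-forge` comes first and the `defaults` channel does not appear at all.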

### The best, reproducible env.yml files are created **manually**

If you want to create `env.yml` files that are reproducible, that get installed quickly, that are easy to read and understand, and that are easy to adapt and change in the future, then you need to create your `env.yml` files manually -- **don't use `conda env export`**!
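
A hand-written file can be very short. The sketch below is only an illustration of the idea, not the authors' list of best practices. The environment name and the two packages are borrowed from the cheat sheets earlier on this page, and the version constraints are assumptions that you should adjust for your own project.

```bash
# Write a small, hand-maintained environment file.
# Only the packages you asked for directly are listed; conda resolves the rest.
# The name, packages, and version constraints are illustrative assumptions.
cat > env.yml <<'EOF'
name: myenv
channels:
  - conda-forge
  - bioconda
  - nodefaults
dependencies:
  - python=3.8
  - pysam>=0.19.1
EOF

# Afterwards, the environment can be recreated from the file with:
#   conda env create -f env.yml
```

Because the file names only the packages you care about, it stays readable and is easy to edit when your requirements change.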

**TODO: include image with list of best practices**

### Use `conda-lock` for fully reproducible `.lock` files

### Choosing how much of your conda environment to lock

### The best way to install `conda`

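If you want to use `conda`, you should also install `mamba`, a faster drop-in replacement for the `conda` command. Specifically, you should install it from the Mambaforge distribution or its successor, Miniforge, which now bundles `mamba` as well.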

## Conda, in summary

## FAQs

### Should I install R packages with `conda` or R's built-in `install.packages()`?

You shouldn't use R's `install.packages()` because it isn't a proper package manager. It doesn't report conflicts. Instead, if a conflict arises between a package that you've installed and a package that you'd like to install, `install.packages()` will silently upgrade your old package.

This is dangerous. If you installed a package once but a new version of it makes a change that is backwards-incompatible or introduces a bug, the new version may break your scripts! The broken behavior may even be subtle enough that you don't catch it until much later.

To mitigate the backwards compatibility issue, many R developers will publish versions of their tools with breaking changes as entirely separate packages. In practice, this system can still lead to issues when bugs arise.

**tl;dr** You should exclusively use `conda` to install R packages. Otherwise, you should be prepared for unexpected changes to the software you've installed.

### What if a tool isn't yet installable with `conda`?

### What's the difference between Anaconda and miniconda?