# Setting up Conda and Making a Conda Env
> Matt's notes for issue-free conda env management. These instructions were developed mainly on Linux machines (and on Windows machines in the receding past), so default locations for things may be different on your machine, but default Anaconda or Miniconda installs will be in your home directory (`~`).

- toc: true 
- badges: true
- comments: false
- categories: [jupyter, conda]
- image: images/chart-preview.png

## What is `conda`? Why learn this when I already know `pip`?

Like `pip`, `conda` is a package manager, but unlike `pip`, `conda` can
* install different versions of python itself,
* manage projects containing code written in any language (not just python),
* check that new packages don't have dependency conflicts with existing packages or package versions (although `pip` has done this too since version 20.3, which made its new dependency resolver the default).

[Anaconda](https://anaconda.org/anaconda/repo) offers a service like [PyPI](https://pypi.org/) or [CRAN](https://cran.r-project.org/) that hosts user-created packages, but unlike with PyPI or CRAN, there are many other such repositories for `conda` packages, and these are referred to as **channels**. Some major `conda` channels are:
* [**Conda-Forge**](https://conda-forge.org/) (most popular public channel)
* [**Bioconda**](https://bioconda.github.io/index.html) (mainly used for bioinformatics packages)
* [**R**](https://anaconda.org/r/repo) (unsurprisingly, for **R** packages)

Or you could build your own [private conda channel](https://conda.io/projects/conda/en/latest/user-guide/tasks/create-custom-channels.html).

A benefit of this channel system is that it increases the likelihood that someone has built the most recent version of the package you want. A drawback is that it can slow down the dependency resolver, because there are more places it can check for the requested package. The `conda` configuration recommended by **conda-forge** is to add the **conda-forge** channel and set `channel_priority` to **strict**. These settings are stored in the `.condarc` file in your home directory, which you can view either by

```
!conda config --show-sources
```

```
==> /home/matt/.condarc <==
channel_priority: strict
channels:
  - conda-forge
  - defaults
```

or by simply inspecting the file via

```
!cat /home/matt/.condarc
```

```
channels:
  - conda-forge
  - defaults
channel_priority: strict
```


This config is very easy to recreate, and pedagogically, I think it's just better to show the process, so I'll `rm` (remove, aka delete) my conda config and remake it.

```
!rm /home/matt/.condarc
```

```
!cat /home/matt/.condarc
```

```
cat: /home/matt/.condarc: No such file or directory
```


Ok, now that I've deleted my `.condarc` file, I'll recreate it.

```
!conda config --add channels conda-forge
!conda config --set channel_priority strict
```

```
!cat /home/matt/.condarc
```

```
channels:
  - conda-forge
  - defaults
channel_priority: strict
```
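Notice that `defaults` appears in the recreated file even though only **conda-forge** was added. As far as I understand it, when `conda config --add channels` creates a brand-new `channels` list, it seeds the list with `defaults` and then puts the new channel on top. Adding a channel that's already listed moves it to the top instead of duplicating it, and `--append` does the same thing at the bottom. The function `add_channel` below is a hypothetical name, not part of conda. It is a minimal sketch of that behavior under those assumptions, not conda's actual code.

```python
def add_channel(channels, name, prepend=True):
    """Toy model of `conda config --add/--append channels NAME` (not conda's code).

    `channels` is the current list from .condarc, or None if the key doesn't exist yet.
    """
    # Assumption: a brand-new channels list is seeded with "defaults".
    updated = list(channels) if channels is not None else ["defaults"]
    # Re-adding an existing channel moves it instead of duplicating it.
    if name in updated:
        updated.remove(name)
    if prepend:
        updated.insert(0, name)  # --add / --prepend: highest priority
    else:
        updated.append(name)  # --append: lowest priority
    return updated


print(add_channel(None, "conda-forge"))  # ['conda-forge', 'defaults']
```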


### What does this do?

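Setting `channel_priority` to strict changes how `conda` treats packages that exist in more than one channel. Channels nearer the top of the list have higher priority. If a package is available in a higher-priority channel, `conda` ignores that package in every lower-priority channel, even if a lower channel has a newer version. With the config above, anything **conda-forge** provides always comes from **conda-forge**. The `defaults` channel is only used for packages that **conda-forge** doesn't have at all.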
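Here is a toy resolver for a single package. It compares strict priority with `channel_priority: disabled`, where the newest version wins regardless of channel. The index, package names, and version numbers are all made up for illustration. Real `conda` solves all packages and their dependencies together, so treat this as a sketch of the rule, not of the solver.

The default setting, `flexible`, sits between the two. It prefers higher-priority channels but lets the solver fall back to lower-priority ones when that's needed to satisfy dependencies.

```python
# Hypothetical index: channel -> package name -> available versions (as tuples).
INDEX = {
    "conda-forge": {"examplepkg": [(1, 0, 0), (1, 2, 0)]},
    "defaults": {"examplepkg": [(1, 0, 0), (1, 3, 0)], "defaultsonly": [(2, 1)]},
}


def pick(name, channels, index, strict=True):
    """Return the (channel, version) a simplified resolver would choose, or None."""
    if strict:
        for channel in channels:  # channels are ordered highest priority first
            versions = index.get(channel, {}).get(name)
            if versions:
                # The first channel that has the package wins outright;
                # versions from lower-priority channels are never considered.
                return channel, max(versions)
        return None
    # Priority disabled: the newest version from any channel wins.
    candidates = [
        (version, channel)
        for channel in channels
        for version in index.get(channel, {}).get(name, [])
    ]
    if not candidates:
        return None
    version, channel = max(candidates)
    return channel, version


channels = ["conda-forge", "defaults"]
print(pick("examplepkg", channels, INDEX))                # ('conda-forge', (1, 2, 0))
print(pick("examplepkg", channels, INDEX, strict=False))  # ('defaults', (1, 3, 0))
print(pick("defaultsonly", channels, INDEX))              # ('defaults', (2, 1))
```

The last line is the point worth remembering. Strict priority doesn't block `defaults` entirely. It only stops `defaults` from supplying packages that **conda-forge** also provides.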

## Creating a `conda env`

Now that we've configured `conda` to prefer packages from a specified channel (**conda-forge** in this notebook and in my own typical usage), we can create a `conda env` and be confident that installed packages will be sourced from the intended channel whenever that channel provides them. [Here](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html) is the official `conda env` management documentation, but to simply create a new `env`, the instructions below are sufficient.

1. First, create the `env`. I've set the python version to `3.9`, which conda treats as any 3.9.X release. If you want an exact version, include the full version number (e.g. `python=3.9.7`) or use two equals signs (e.g. `python==3.9.7`).
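The single vs. double equals distinction trips people up, so here is a rough model of it. `matches` is a hypothetical helper, not a conda function, and it only handles plain dotted versions. Real conda match specs also support build strings, ranges, and wildcards, and they use a more elaborate version-ordering scheme.

The rules this sketch encodes are:
* `=3.9` behaves like `3.9.*`, meaning any release in the 3.9 series.
* `==3.9` means exactly 3.9, which conda treats as 3.9.0.

```python
def _parts(version):
    """Split a dotted version, dropping trailing zero components ("3.9.0" -> ["3", "9"])."""
    parts = version.split(".")
    while len(parts) > 1 and parts[-1] == "0":
        parts.pop()
    return parts


def matches(spec, version):
    """Simplified check of whether `version` satisfies a spec like "python=3.9"."""
    if "==" in spec:
        _, wanted = spec.split("==", 1)
        # Exact match (trailing zeros ignored, so "==3.9" accepts "3.9.0").
        return _parts(version) == _parts(wanted)
    _, wanted = spec.split("=", 1)
    wanted_parts = wanted.split(".")
    # Fuzzy match: a prefix match on whole version components, so "3.9"
    # accepts "3.9.7", and "3.1" does not accept "3.10".
    return version.split(".")[: len(wanted_parts)] == wanted_parts


# Prints True, True, True, False for the four specs.
for spec in ["python=3.9", "python=3.9.7", "python==3.9.7", "python==3.9"]:
    print(spec, matches(spec, "3.9.7"))
```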

```
!conda create --name geocoder_env python=3.9 -y
```

```
Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/matt/miniconda3/envs/geocoder_env

  added / updated specs:
    - python=3.9


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    libffi-3.4.2               |       h9c3ff4c_2          60 KB  conda-forge
    libgcc-ng-11.2.0           |       h1d223b6_8         892 KB  conda-forge
    libgomp-11.2.0             |       h1d223b6_8         428 KB  conda-forge
    libstdcxx-ng-11.2.0        |       he4da1e4_8         4.2 MB  conda-forge
    python-3.9.7               |hb7a2778_1_cpython        27.5 MB  conda-forge
    setuptools-58.0.4          |   py39hf3d152e_1         958 KB  conda-forge
    ------------------------------------------------------------
                                           Total:        34.0 MB

The following NEW packages will be INSTALL
```

From the printout above, you can see that 6 packages (or more specifically, 6 builds of packages) would be downloaded, and 21 packages would be installed. Only 6 needed downloading because `conda` keeps every package build it downloads in a single package cache (`/home/matt/miniconda3/pkgs` on my machine), and the other 15 were already there from earlier installs. This cache lets a user have multiple `conda envs` that use the same package versions without downloading and storing multiple copies of the exact same package build. `conda` then installs the selected package versions into the `env` from that directory.
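Avoiding a second download is only half the story, because the files also aren't stored twice. By default, `conda` hard-links files from the package cache into each `env` when the two are on the same filesystem, and falls back to copying when they aren't.

The sketch below reproduces the idea with plain Python in a temporary directory. The `pkgs`/`envs` layout mirrors Miniconda's, but the package and env names are made up. It illustrates hard links, not conda's installer, and it needs a filesystem that supports hard links (typical Linux and macOS setups do).

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # One file in a hypothetical package cache, like <miniconda3>/pkgs/<build>/...
    cached = os.path.join(root, "pkgs", "examplepkg-1.0-0", "module.py")
    os.makedirs(os.path.dirname(cached))
    with open(cached, "w") as f:
        f.write("print('hello')\n")

    # "Install" it into two envs by hard-linking rather than copying.
    installed = []
    for env in ("env_a", "env_b"):
        dest = os.path.join(root, "envs", env, "lib", "module.py")
        os.makedirs(os.path.dirname(dest))
        os.link(cached, dest)  # a new name for the same data on disk
        installed.append(dest)

    # All three paths point at the same inode, so the data exists only once.
    inodes = {os.stat(path).st_ino for path in [cached, *installed]}
    link_count = os.stat(cached).st_nlink

print(len(inodes), link_count)  # 1 3
```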

From the list of packages to install, you can also see that most of the package builds include the string `linux-64` but some have the string `noarch` instead. 
* The `linux-64` builds are built for 64-bit Linux operating systems.
* The `noarch` builds are built for **no** specific **arch**itecture (for python packages, this typically means pure-python code), so the same package build can be installed on any OS.
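Each channel is split into these platform subdirectories, and `conda` reads package metadata from both your platform's subdirectory and `noarch`. The sketch below maps a few common `platform.system()`/`platform.machine()` pairs to subdirectory names. It also builds the metadata URLs for a channel hosted on anaconda.org.

The mapping table and both function names are my own assumptions for illustration. The table covers only a handful of platforms, and `conda` determines its own subdir itself.

```python
import platform

# Partial, hand-written mapping of (OS, CPU) -> conda subdir name.
SUBDIRS = {
    ("Linux", "x86_64"): "linux-64",
    ("Linux", "aarch64"): "linux-aarch64",
    ("Darwin", "x86_64"): "osx-64",
    ("Darwin", "arm64"): "osx-arm64",
    ("Windows", "AMD64"): "win-64",
}


def guess_subdir(system=None, machine=None):
    """Guess the conda subdir for this machine (or for the given system/machine)."""
    system = system or platform.system()
    machine = machine or platform.machine()
    return SUBDIRS.get((system, machine), "unknown")


def repodata_urls(channel, subdir):
    """URLs of the metadata files for a platform subdir and for noarch."""
    base = f"https://conda.anaconda.org/{channel}"
    return [f"{base}/{subdir}/repodata.json", f"{base}/noarch/repodata.json"]


print(repodata_urls("conda-forge", guess_subdir("Linux", "x86_64")))
```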

Going forward, I'm going to execute commands in a terminal and copy the results into markdown, as it's a much more natural workflow than executing shell commands through jupyter notebook cells.

```bash
(base) matt@matt:~$ conda activate geocoder_env
(geocoder_env) matt@matt:~$ conda install -c conda-forge psycopg2
...
```
Enter `y` when prompted (or add the `-y` flag, as in the `conda create` command above, to skip the prompt). You'll have to confirm each install command the same way, but I'm omitting that going forward for brevity. Also (obviously) you can install other packages. I'm making this `conda env` for a geocoder I'm building that will use a PostGIS database, and I'll develop it interactively using **geopandas** (and its dependencies) in **jupyterlab**.

```bash
(geocoder_env) matt@matt:~$ conda install -c conda-forge geopandas
(geocoder_env) matt@matt:~$ conda install -c conda-forge jupyterlab 
```

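To make your new `conda env` accessible as a kernel in **jupyterlab** (or in **jupyter notebooks**), register it with Jupyter using `ipykernel`, the package that provides Jupyter's python kernel. Run the command below from inside the activated env (credit to this [stack overflow answer](https://stackoverflow.com/questions/39604271/conda-environments-not-showing-up-in-jupyter-notebook) that I've visited easily 30 times). If python complains that there is no module named `ipykernel`, install it into the env first with `conda install -c conda-forge ipykernel`.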

```bash
(geocoder_env) matt@matt:~$ python -m ipykernel install --user --name \
    geocoder_env --display-name "Python (geocoder_env)"
```