# 2. Managing packages and environments using conda and pip


First of all, download or clone the GitHub repository for this workshop if you haven't done so already. You will need the material later.
If you're using git:

```
git clone https://github.com/LukasNeugebauer/Python_Workshop
```

If not, start using git in the near future. But for now, go to https://github.com/LukasNeugebauer/Python_workshop, download the whole repository and unzip it somewhere on your computer.

<img src="../img/github_repo.PNG" width=60% />

## 2.1 What do I need conda for?

In MATLAB, almost everything comes from the same source. You can use third-party toolboxes such as SPM, but you have to download and install them manually. Python is a rather slim language if you only consider the core modules. It relies a lot more on packages, so you will need to manage those, i.e. download, install and update them. Most packages depend on other packages. E.g. if you use a package that handles numerical data, it will most likely depend on numpy, so numpy has to be installed as well. `conda` and `pip` are *package managers*. They do this managing for you. So while having to install packages is an additional burden, the implementation is a lot nicer.
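To see such dependencies for yourself, the standard library can read the metadata of an installed package. The snippet below is a minimal sketch, assuming Python 3.8 or newer. `declared_dependencies` is a hypothetical helper name, `pandas` is only an example, and the output depends on what is installed on your machine.

```python
from importlib import metadata


def declared_dependencies(package_name):
    """Return the requirement strings a package declares, or [] if none are found."""
    try:
        requirements = metadata.requires(package_name)
    except metadata.PackageNotFoundError:
        # The package is not installed in the current environment.
        return []
    # requires() returns None when a package declares no dependencies.
    return list(requirements or [])


# pandas is only an example; use any package you have installed.
for requirement in declared_dependencies("pandas"):
    print(requirement)
```

If the package is installed, each printed line names one package it depends on, sometimes with a version constraint attached.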

## 2.2 conda vs. pip

For the package management part, `conda` and `pip` pretty much do the same thing on the surface. `pip` is the original Python package manager. It installs packages from *PyPI* - the Python Package Index. `conda` installs packages from the *Anaconda repositories*. They both look for packages in the respective repositories, install them and their dependencies and try to avoid dependency conflicts. Most packages you will use can be installed using `conda`, but for some you will have to use `pip` - there are a LOT more packages on *PyPI* than in the *Anaconda repositories*. For the most part, you won't run into problems when installing some packages using `conda` and some using `pip`. Sometimes there are differences between the `pip` and the `conda` version, e.g. for `tensorflow`, see [here](https://towardsdatascience.com/stop-installing-tensorflow-using-pip-for-performance-sake-5854f9d9eb0c).
There is a lot more to it, and if you are interested, you can find a short comparison of `conda` and `pip` [here](https://www.anaconda.com/understanding-conda-and-pip/). 

In contrast to `pip`, `conda` does more than just install packages. It can (and absolutely should) be used to manage environments. We'll figure out what that is in a second. `pip` itself doesn't manage environments, but it interfaces very well with `pipenv` and `virtualenv`, which do. Those are perfectly fine options. I started with miniconda and haven't seen a reason to switch. That's why this workshop uses miniconda or anaconda.

Again, there is a lot to be understood here, but to get you going, it's not necessary to go real deep. I use the following approach and I suggest you do the same:
<br/>

**<center>Manage environments with conda. Install packages using conda whenever possible and use pip if that's not possible.</center>**

## 2.3 How do I use conda and pip?

There are two options for conda, a graphical user interface (GUI) version and the command line interface (CLI).

You can use the **Anaconda Navigator**. This is included in the *Anaconda* Python distribution, but it's not included in miniconda. It's a GUI that's meant for people who are hesitant to use a command line interface. As usual, this probably has a gentler learning curve but is less flexible. If you want to use it in the future, feel free. You can also use it for the workshop, but then you have to figure out how to do that by yourself. I never used it. It looks like this:

<img src="../img/anaconda_navigator.PNG" width=70% />

<br/>
The second option is to use a terminal. Technically you could use any terminal that your operating system provides, and on macOS and Linux (which I will jointly refer to as *Unix* from now on) that's what you'll do. But there's a specific version for Windows. If you try to use conda in the normal Windows command line (`cmd`), you will likely find something like this:


<img src="../img/shell_no_work.PNG" width=70% />

Here's why: The shell is looking for programs that are on the PATH, which is an environment variable. It's the same principle as the path variable in MATLAB. If you installed Anaconda using the suggested settings, the Windows shell PATH variable doesn't include Python and conda, so you can't use them.

<img src="../img/shell_no_path.PNG" width=70% />
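If you already have some working Python interpreter, you can look at the PATH mechanism from the inside. This is a minimal sketch using only the standard library. What it prints depends entirely on your machine and on which shell you started Python from.

```python
import os
import shutil

# PATH is one long string; the entries are separated by os.pathsep
# (";" on Windows, ":" on macOS and Linux).
path_entries = os.environ.get("PATH", "").split(os.pathsep)
print(f"{len(path_entries)} directories on PATH")

# shutil.which searches those directories much like the shell does and
# returns None if the program cannot be found.
for program in ("python", "conda"):
    print(program, "->", shutil.which(program))
```

A `None` next to `conda` corresponds to the situation in the screenshot above: none of the PATH directories contains a program with that name.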

For this workshop you'll use the `Anaconda prompt` if you're on Windows, the `Terminal` if you're on macOS and whatever terminal emulator you're using if you're on Linux. You will need a few basic commands to navigate through directories in it. These are different in Windows and Unix. You only really need `cd` and `ls` (`dir` on Windows).

If you're on Windows, hit the Windows key, type `Anaconda prompt` and you should see the icon. If you're on Linux or macOS, the standard terminal works as it is.

Open it up. On Windows, it should look something like this:

<img src="../img/anaconda_prompt.PNG" width=70% />

Make sure that it knows where to find `python` and `conda`.

Windows: `where conda` and `where python`

Unix: `which conda` and `which python`

<img src="../img/check_path.PNG" width=70% />


<br/> 

So now we have a shell open that knows `conda` and we can use it. We can use `pip` from the same shell.


## 2.4 Environments

### What is dependency hell?

Consider the following: You install a package that is supposed to analyze SCR data. It uses a function that is new in **numpy 3.1** (arbitrary number). Afterwards you install a package that parses eyetracking files. It has a dependency on **numpy 3.0**. If you install this package, **conda** might downgrade **numpy** to 3.0 and the SCR package stops working. Plus of course dependencies themselves depend on other packages and if you install enough different things, sooner or later something will not be compatible. Welcome to dependency hell.

If this is too technical, let's try it this way. Going back to the idea of the Python interpreter actually being an interpreter: For some tasks it's not enough that an interpreter only speaks Japanese. He might also need technical knowledge to be able to translate. So we teach it to him, which is the equivalent of installing a package. If we use the same interpreter to translate in all areas, at some point he gets confused. It's better to have dedicated experts than to try to force someone to balance quantum mechanics and social skills. Maybe they're incompatible.
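For those who prefer the technical version, the numpy example above can be written down as a toy model. This is only a sketch and not how conda's solver works. The two packages are hypothetical, the version numbers are the arbitrary ones from the text, and I assume that the eyetracking package accepts exactly numpy 3.0.

```python
# Toy model: each package states which numpy versions it accepts.
# Versions are (major, minor) tuples with the arbitrary numbers from the text.
AVAILABLE_NUMPY = [(3, 0), (3, 1)]


def scr_package_accepts(version):
    # Hypothetical SCR package: needs a function introduced in numpy 3.1.
    return version >= (3, 1)


def eyetracking_package_accepts(version):
    # Hypothetical eyetracking parser: assumed to work with numpy 3.0 only.
    return version == (3, 0)


def compatible_versions(available, *requirements):
    """Return the versions that satisfy every requirement at once."""
    return [v for v in available if all(accepts(v) for accepts in requirements)]


shared = compatible_versions(
    AVAILABLE_NUMPY, scr_package_accepts, eyetracking_package_accepts
)
# An empty list means that no single numpy version works for both packages.
print(shared)
```

Each package is fine on its own. Only the combination in one place has no solution.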

### Avoiding dependency hell using environments

Basically, there are two routes you can take:

   1. You don't bother using environments. For quite a while everything works out. At some point, everything stops working and since you only have one environment you don't have a clean one to go back to. You have no clue where the problem is, StackOverflow is not helping you either. You try for a bit but eventually give up, reinstall Anaconda completely and the first thing you do is create an environment. You may now proceed to the second option. (This is the route I took and I can strongly advise against taking it.)
   
   2. You're smart and use environments from the start. You might consider that you're putting too much effort into this, but since you don't have any problems, it becomes a habit and you stick to it. Very well done!
   
Feel free to choose.

### Okay, I'm convinced. But what is an environment?

Excellent question! The short and simple version (that is surprisingly close to the truth):

Consider an environment one instance of the Python interpreter and a folder in which the packages are stored. They are completely isolated from each other. If you have a package in more than one environment (e.g. every environment will use numpy), you have multiple copies of this package. One per folder/environment. This is of course not parsimonious and not that elegant either. And it is the reason why some system administrators don't like Python on their servers. But it works. Also, it means that if you fuck up beyond repair in one folder, the other folders don't care. 

Activating an environment means changing the PATH variable to tell the shell which version of Python to call. It also makes sure that when you start an instance of the Python interpreter e.g. via IPython or Jupyter, the interpreter knows in which folder to look for packages that you can import.
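You can ask a running interpreter which environment it belongs to and where it looks for packages. The following is a minimal sketch using only the standard library. `CONDA_DEFAULT_ENV` is an environment variable that conda sets on activation; if you are not using conda, it will simply be missing.

```python
import os
import sys

# The Python executable that is running this code.
print("interpreter:", sys.executable)

# The root folder of the environment that the interpreter belongs to.
print("environment folder:", sys.prefix)

# conda sets CONDA_DEFAULT_ENV when an environment is active;
# it is missing if conda is not in use, hence the fallback.
print("active conda environment:", os.environ.get("CONDA_DEFAULT_ENV", "<none>"))

# The folders that are searched, in order, when you import a package.
for folder in sys.path:
    print("  ", folder)
```

Run it once in each of two different environments and compare the output: the interpreter and the package folders should both change.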

Do this from the beginning. Remember - **you** are responsible for housekeeping. Take this seriously and try to stick to **conda** as much as possible. Using **pip** as needed is fine, but if you mix up too many things you might end up with something like this:

<img align="center" src="../img/xkcd_env.PNG" width=50% />

[source](https://xkcd.com/1987/)
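To get a rough idea of how mixed an environment already is, you can count which tool installed each package. Many, but not all, packages record this in a small metadata file called `INSTALLER`, so treat the result as a hint rather than a complete picture. This is a minimal sketch, assuming Python 3.8 or newer; `installer_of` is a hypothetical helper name.

```python
from collections import Counter
from importlib import metadata


def installer_of(dist):
    """Return the name recorded in a package's INSTALLER file, or 'unknown'."""
    name = (dist.read_text("INSTALLER") or "").strip()
    return name or "unknown"


# Count the installed packages of the current environment per installer.
counts = Counter(installer_of(dist) for dist in metadata.distributions())
for installer, number in counts.most_common():
    print(f"{installer}: {number} packages")
```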

## 2.5 Setting up our environment



The basic syntax to create an environment is this:

```
conda create -n name_of_environment
```

There are of course more options. We will only cover a few that you will likely use. 
<br/>

1. First of all, you can specify which version of Python (e.g. 3.7) you want to use in this environment. The default depends on the version of Anaconda/Miniconda you installed. You can use different versions in different environments. E.g. if you want to use someone else's code which is written in Python 2.7 for whatever reason, you can create an environment for this using the following. The name "py_27" is of course arbitrary and you could name it differently:
<br/>

```
conda create -n py_27 python=2.7
```

<br/>

2. There are different channels that conda can use to look for packages. Not every version of every package is in every channel. It's a bit like looking for a movie on Netflix and Amazon Prime at the same time. You can also specify the channel while installing a package. Many of the packages that you will use are on "conda-forge", so it might be worth it to add that:
<br/>

```
conda create -n name_of_environment -c conda-forge
```
<br/>
3. You can specify a list of packages that are supposed to be installed while creating the environment. Just add the names of the packages at the end of the command:
<br/>

```
conda create -n name_of_environment numpy scipy
```


**Exercise:**

Use the Anaconda prompt to create an environment with Python 3.9 or 3.10 for the workshop. Add conda-forge to the list of channels while creating it. I didn't include any of the newest features (like the so-called walrus operator `:=`), so anything from 3.6 or so onwards should be fine. But there is no point in starting with outdated software when you are just starting out and don't have to worry about backwards compatibility.


If you want to use an environment, i.e. the Python version and the packages in it, you need to activate it like so:

```
conda activate name_of_environment
```

You can see the active environment in the prompt. Use `where`/`which` to make sure that your prompt now links to the correct version of the Python executable.

<img align="center" src="../img/prompt_env.PNG" width=70% />
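Once the environment is active, you can also check from inside Python that you got the version you asked for. This is a minimal sketch. The lower bound of 3.6 is taken from the exercise above, and `MINIMUM_VERSION` is just a name chosen for this example.

```python
import sys

# The workshop material should run on Python 3.6 or newer (see the exercise above).
MINIMUM_VERSION = (3, 6)

# sys.version_info starts with the major and minor version numbers.
current = tuple(sys.version_info[:2])
print("running Python", ".".join(str(part) for part in current))

if current < MINIMUM_VERSION:
    print("This environment is too old for the workshop; check which one is active.")
else:
    print("This Python version is fine for the workshop.")
```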

## 2.6 Installing packages

The syntax for installing packages with **conda** is:
```
conda install package1 package2 ... packageN
```

For **pip** it's the same:
```
pip install package1 package2 ... packageN
```


Installing multiple packages at once can be helpful because **conda** then tries to resolve the dependencies of all of them together instead of one package at a time.

**Exercise**

Use conda to install a few packages. The packages I want you to install are the backbone of scientific computing (`numpy`, `scipy` and `matplotlib`), alternative packages for data and plotting that can make things more convenient (`pandas` and `seaborn`), a package for regression models (`statsmodels`) and some packages to use Jupyter Lab (`jupyterlab` and `nb_conda_kernels`).
Here's a list of all the packages:

 * numpy
 * scipy
 * matplotlib
 * pandas
 * pyreadstat (this is an optional dependency for pandas to read e.g. SPSS files)
 * seaborn
 * statsmodels
 * jupyterlab
 * nb_conda_kernels
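After installing, you can check from Python whether everything arrived in the active environment. The snippet below is a minimal sketch. It assumes that each package is imported under the same name it is installed under, which as far as I know holds for this list, and `missing_packages` is a hypothetical helper name.

```python
from importlib.util import find_spec

# Import names of the workshop packages, assumed to match the install names.
WORKSHOP_PACKAGES = [
    "numpy",
    "scipy",
    "matplotlib",
    "pandas",
    "pyreadstat",
    "seaborn",
    "statsmodels",
    "jupyterlab",
    "nb_conda_kernels",
]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    # find_spec locates a top-level package without importing it.
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(WORKSHOP_PACKAGES)
if missing:
    print("still missing:", ", ".join(missing))
else:
    print("all workshop packages were found")
```

If something is listed as missing, first make sure the right environment is active before installing again.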

## 2.7 Removing environments

You can remove an environment like this:

```
conda env remove -n name_of_environment
```

<br/>
This is all you absolutely need to know about using environments and installing packages. There are more commands but as usual you can go a long way with only a few. 

Some more you might find useful in the future:

List all environments:

```
conda env list
```

List all packages in an environment:

```
conda list
```

Update a package to the newest version:

```
conda update package_name
```

Install a certain version of a package, e.g. numpy 1.16.2:

```
conda install numpy=1.16.2
```
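To confirm which version actually ended up in the active environment, you can ask Python directly. This is a minimal sketch, assuming Python 3.8 or newer; `installed_version` is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string, or None if numpy is not installed here.
print("numpy:", installed_version("numpy"))
```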

Search for a package in the channels that conda is configured to use:

```
conda search package_name
```