# 2. Managing packages and environments using conda and pip


First of all, download or clone the GitHub repository for this workshop if you haven't done so already. You will need the material later.
If you're using git:

```
git clone https://github.com/LukasNeugebauer/Python_Workshop
```

If not, start using git in the near future. For now, go to https://github.com/LukasNeugebauer/Python_workshop, download the whole repository and unzip it somewhere on your computer.

<img src="../img/github_repo.PNG" width=60% />


## 2.1 What do I need conda for?

In contrast to MATLAB, in Python you need to manage packages yourself. That means you download, install and update them, sometimes compile code, set path variables, etc. Most packages depend on other packages. E.g. if you use a package that handles numerical data in some way, it will most likely depend on numpy, so numpy has to be installed as well. **conda** and **pip** are **package managers**. They do the package managing part for you.
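
If you're curious which other packages an installed package pulls in, Python's standard library can show you the dependencies that package declares. Below is a minimal sketch using `importlib.metadata` (Python 3.8+). The function name `declared_dependencies` is made up for this example. Conda packages can also depend on non-Python libraries, and those may not show up here.

```python
from importlib.metadata import PackageNotFoundError, requires

def declared_dependencies(dist_name):
    """Return the requirement strings a package declares, or None if it isn't installed."""
    try:
        reqs = requires(dist_name)  # may be None if the package declares no requirements
    except PackageNotFoundError:
        return None
    return list(reqs or [])

# Output depends on what is installed in the active environment, e.g.
# declared_dependencies("matplotlib") typically contains an entry starting with "numpy"
```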

## 2.2 conda vs. pip

Excluding the environment part, **conda** and **pip** pretty much do the same thing on the surface. **pip** is the original Python package manager. It was designed to be used for packages from **PyPI** (the Python Package Index). **conda** installs packages from the **Anaconda repository**. Both look for packages in their respective repositories, install them and check that you have all dependencies installed. If not, they install those for you. Most packages you will use can be installed using **conda**, but for some you will have to use **pip**. For the most part, you won't run into problems when installing some packages using **conda** and some using **pip**. Sometimes there are differences between the **pip** and the **conda** version, e.g. for *tensorflow*, see [here](https://towardsdatascience.com/stop-installing-tensorflow-using-pip-for-performance-sake-5854f9d9eb0c).
There is a lot more to it, and if you are interested, you can find a short comparison of **conda** and **pip** [here](https://www.anaconda.com/understanding-conda-and-pip/). 

**conda** does more than just install packages. It can (and absolutely should) be used to manage environments. We'll figure out what that is in a second. **pip** itself doesn't manage environments, but it interfaces very well with **pipenv** and **virtualenv**, which do.

Again, there is a lot more to understand here, but you don't need it to get going. I use the following approach and I suggest you do the same:
<br/>

**Manage environments with conda. Install packages using conda whenever possible and use pip if that's not possible.**

## 2.3 How do I use conda and pip?

There are two ways to use conda:

You can use the **Anaconda Navigator**, which should have been installed along with Anaconda. It's a GUI meant for people who are afraid of the command line. As usual, it probably has a less steep learning curve but is less flexible. If you want to use it in the future, feel free. You can also use it for the workshop, but then you'll have to figure out how to do things by yourself. I've never used it. It looks like this:

<img src="../img/anaconda_navigator.PNG" width=70% />

<br/>
The second option is to use a shell. Technically we could use any shell that your operating system provides. But if you try to use conda in the Windows shell, you will likely find something like this:

<img src="../img/shell_no_work.PNG" width=70% />

Here's why: the shell has a PATH variable. This is a list of directories in which it looks for programs that match the commands you type. It's the same principle as the path in MATLAB. If you installed Anaconda with the suggested settings, the PATH of the standard Windows shell doesn't include Python and conda, so you can't use them there.

<img src="../img/shell_no_path.PNG" width=70% />
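
You can print the PATH yourself with `echo %PATH%` (Windows) or `echo $PATH` (UNIX). Below is a small sketch that gets the same list from inside Python, one directory per line, in the order they are searched. The helper name `path_entries` is hypothetical.

```python
import os

def path_entries(environ=None):
    """Split PATH into the ordered list of directories the shell searches for commands."""
    environ = os.environ if environ is None else environ
    # os.pathsep is ";" on Windows and ":" on UNIX
    return [entry for entry in environ.get("PATH", "").split(os.pathsep) if entry]

for directory in path_entries():
    print(directory)
```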

For this workshop we'll use the **Anaconda prompt**. You can consider this the same as your standard OS shell, except that the PATH and some settings are optimized to be used with conda, IPython and so on. You will need a few basic commands to navigate through directories in the shell, and these differ between Windows and UNIX. You only really need `cd` (change directory) and `dir` (list the contents of a directory; `ls` on UNIX), but you can find a list [here](https://www.computerhope.com/overview.htm).

If you're on Windows, hit the Windows key and type `Anaconda prompt`; you should see the icon. If you're on Linux or macOS, you should be able to use the *Terminal*.

Open it up. It should look something like this:

<img src="../img/anaconda_prompt.PNG" width=70% />

Make sure that it knows where to find **Python** and **conda**.

Windows: `where conda` and `where python`

UNIX: `which conda` and `which python`

<img src="../img/check_path.PNG" width=70% />
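
The same check can be done from inside a Python script. This is handy if a script calls conda and should fail early when conda isn't reachable. Here is a sketch with a made-up helper name. One difference: `where` on Windows lists every match, while `shutil.which` only returns the first one, which is the one the shell would actually run.

```python
import shutil

def locate(commands=("conda", "python")):
    """Like `where`/`which`: map each command to its first match on the PATH, or None."""
    return {name: shutil.which(name) for name in commands}

for name, location in locate().items():
    print(f"{name}: {location or 'NOT FOUND'}")
```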


<br/> 
So now we have a shell open that knows **conda** and we can use it. We can use **pip** from the same shell.


As a side note: technically, you should make sure that conda is up to date using `conda update conda`. However, the latest conda update is a major one and I've had pretty big problems with it. So for now, we'll use the version that came with the download and ignore warnings telling us to update conda.

## 2.4 Environments

### 2.4.1 What is dependency hell?

Consider the following: You install a package that is supposed to analyze SCR data. It uses a function that is new in **numpy 3.1** (arbitrary number). Afterwards you install a package that parses eyetracking files. It has a dependency on **numpy 3.0**. If you install this package, **conda** might downgrade **numpy** to 3.0 and the SCR package stops working. Plus of course dependencies themselves depend on other packages and if you install enough different things, sooner or later something will not be compatible. Welcome to dependency hell.

If this is too technical, let's try it this way. Going back to the idea of the Python interpreter actually being an interpreter: for some tasks it's not enough that an interpreter speaks Japanese. He might also need technical knowledge to be able to translate. So we teach it to him, which is the equivalent of installing a package. If we use the same interpreter for all areas, at some point he gets confused. It's better to have dedicated experts than to force one person to balance quantum mechanics and social skills. Maybe they're incompatible.
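
The numpy example from the beginning of this section can be turned into a toy check. This is only an illustration of why a conflict has no solution. It is not how conda's solver actually works. The package names and the numpy 3.x versions are made up, just as in the text.

```python
import operator

OPS = {">=": operator.ge, "==": operator.eq, "<": operator.lt}

def satisfies(version, constraints):
    """True if a version tuple such as (3, 1) meets every (operator, bound) constraint."""
    return all(OPS[op](version, bound) for op, bound in constraints)

def compatible_versions(available, requirements):
    """Return the available versions that satisfy the requirements of all packages at once."""
    constraints = [c for reqs in requirements.values() for c in reqs]
    return [v for v in available if satisfies(v, constraints)]

# Hypothetical packages and versions from the example ("numpy 3.x" does not exist)
available_numpy = [(3, 0), (3, 1)]
requirements = {
    "scr_package": [(">=", (3, 1))],          # needs a feature that is new in 3.1
    "eyetracking_package": [("==", (3, 0))],  # pinned to exactly 3.0
}

print(compatible_versions(available_numpy, requirements))  # [] -> no version works for both
print(compatible_versions(available_numpy, {"scr_package": requirements["scr_package"]}))  # [(3, 1)]
```

With two separate environments, each package gets its own numpy and both lists are non-empty.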

### 2.4.2 Avoiding dependency hell using environments

Basically, there are two routes you can take:

   1. You don't bother using environments. For quite a while everything works out. You think back to the workshop and laugh at me because I overcomplicate things. At some point, everything stops working, and since you only have one environment, you don't have a clean one to go back to. You have no clue where the problem is, and StackOverflow isn't helping either. You try for a bit but eventually give up and reinstall Anaconda completely. The first thing you do is create an environment. And a clone. And save it to a .txt-file. You may now proceed to the second option. (This is the route I took, and I strongly advise against it.)
   
   2. You're smart and use environments from the start. Whenever you install a new package that has a lot of dependencies, you clone your working environment and install the package in the clone. If everything works, you install the package in the environment you're actually using. You might feel like you're putting too much effort into this, but since you never run into problems, it becomes a habit and you stick to it. Very well done!
   
Feel free to choose.

### 2.4.3 Okay, I'm convinced. But what is an environment?

Excellent question! The short and simple version (which is surprisingly close to the truth):

Think of an environment as one instance of the Python interpreter plus a folder in which its packages are stored. Environments are completely isolated from each other. Say you have a package in more than one environment, e.g. numpy, which every environment will use. Then you have multiple copies of this package, one per folder/environment. This is of course not parsimonious and not very elegant either, and it's one reason why some system administrators don't like Python on their servers. But it works. It also means that if you fuck up beyond repair in one folder, the other folders don't care.

Activating an environment means telling the **Anaconda prompt** which version of Python to call. It also makes sure that when you start an instance of the Python interpreter e.g. via IPython or Jupyter, the interpreter knows in which folder to look for packages that you can import.

Do this from the beginning. Remember - **you** are responsible for housekeeping. Take this seriously and try to stick to **conda** as much as possible. Using **pip** as needed is fine, but if you mix up too many things you might end up with something like this:

<img align="center" src="../img/xkcd_env.PNG" width=50% />

[source](https://xkcd.com/1987/)
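
To see the "one folder per environment" picture in action, you can ask the running interpreter where it would import a module from. Run the same lines in two different environments and the paths will point into different folders. This is a sketch with a hypothetical helper name, based on `importlib.util.find_spec`.

```python
import importlib.util

def module_location(name):
    """Return the file a top-level module would be imported from, or None if it can't be found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None

print(module_location("json"))   # a standard-library module
print(module_location("numpy"))  # None unless numpy is installed in the active environment
```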

## 2.5 Setting up our environment



The basic syntax to create an environment is this:

```
conda create -n name_of_environment
```
which basically means: 
```
conda, create an environment and name it name_of_environment
```
There are, of course, more options. To keep it concise, we will only cover a few that you will likely use.
<br/>

1. First of all, you can specify which version of Python (e.g. 3.7) you want to use in this environment. The default depends on the version of Anaconda/Miniconda you installed. You can use different versions in different environments. E.g. if you want to use someone else's code that is written in Python 2.7 for whatever reason, you can create a dedicated environment for it like this. The name "py_27" is of course arbitrary and you could name it differently:
<br/>

```
conda create -n py_27 python=2.7
```
<br/>

2. There are different channels that conda can use to look for packages. Not every version of every package is in every channel. It's a bit like looking for a movie on Netflix and Amazon Prime at the same time. You can also specify channels when installing a package. Many of the packages that you will use are on "conda-forge", so it might be worth adding that:
<br/>

```
conda create -n name_of_environment -c conda-forge
```
<br/>
3. You can specify a list of packages to be installed while creating the environment. Just add the names of the packages at the end of the command:
<br/>

```
conda create -n name_of_environment numpy scipy
```
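
After creating and activating an environment, you can double-check from inside Python which version it runs. Below is a minimal sketch; `require_python` is a made-up helper name. It avoids f-strings, so it also runs in a Python 2.7 environment like `py_27`. A check like this at the top of a script makes it fail early with a readable message when it is started with the wrong Python.

```python
import sys

def require_python(minimum):
    """Raise an error if the running Python is older than `minimum`, e.g. (3, 7)."""
    if tuple(sys.version_info[:len(minimum)]) < tuple(minimum):
        raise RuntimeError(
            "Python {} or newer required, found {}".format(
                ".".join(str(part) for part in minimum),
                ".".join(str(part) for part in sys.version_info[:3])
            )
        )

print(sys.version)      # full version string of the interpreter in the active environment
require_python((3, 0))  # raises in the py_27 environment, passes in any Python 3 environment
```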

***

**Exercise:**

Use the Anaconda prompt to create an environment for the workshop. Add conda-forge to the list of channels while creating it.

***

If you want to use an environment, i.e. the Python version and the packages in it, you need to activate it like so:

```
(conda) activate name_of_environment
```

You can see the active environment in the prompt. Use `where`/`which` to make sure that your prompt now links to the correct Python executable.

<img align="center" src="../img/prompt_env.PNG" width=70% />
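
You can run the same check from inside Python (e.g. in IPython or Jupyter) to see which environment the interpreter belongs to. This is a sketch with a hypothetical helper name. `CONDA_DEFAULT_ENV` is an environment variable that conda sets on activation. It is `None` here when no conda environment is active.

```python
import os
import sys

def environment_info():
    """Collect what the running interpreter knows about its environment."""
    return {
        # Name of the active conda environment (set by `conda activate`), else None
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        # Root folder of the environment this interpreter belongs to
        "prefix": sys.prefix,
        # Path to the Python executable itself
        "executable": sys.executable,
    }

for key, value in environment_info().items():
    print(f"{key}: {value}")
```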

<br/><br/>

## 2.6 Installing packages

The syntax for installing packages with **conda** is:
```
conda install package1 package2 ... packageN
```

For **pip** it's the same:
```
pip install package1 package2 ... packageN
```


Installing multiple packages at once can be helpful because **conda** will try to resolve dependency conflicts for you.
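
When you do need **pip**, a common precaution is to call it as `python -m pip install package1`. That way the pip belonging to the active environment's Python is used, not some other pip that happens to be first on the PATH. Below is a sketch of the same idea from inside Python. The helper names are made up, and `"some_package"` is a placeholder.

```python
import subprocess
import sys

def pip_install_command(packages):
    """Build `python -m pip install ...` for the interpreter that runs this code."""
    return [sys.executable, "-m", "pip", "install", *packages]

def pip_install(packages):
    """Run the command; raises subprocess.CalledProcessError if pip fails."""
    subprocess.run(pip_install_command(packages), check=True)

print(pip_install_command(["some_package"]))
```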

***

**Exercise**

Use conda to install `numpy`, `scipy` and `matplotlib` in the activated environment. These are the backbone of scientific computing and you will need them in pretty much every script you write.

***
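
To confirm the installation worked, start `python` in the activated environment and check that the packages can be found. Below is a sketch with a hypothetical helper name. It checks import names, which are not always the same as package names: e.g. the package `scikit-learn` is imported as `sklearn`.

```python
import importlib.util

def missing_modules(names):
    """Return the module names that the active interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]

print(missing_modules(["numpy", "scipy", "matplotlib"]))  # [] once all three are installed
```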

When installing new packages that might lead to dependency conflicts, it's good practice to first clone the environment and install the package in the cloned environment. 

The syntax for this is:

```
conda create -n cloned_environment --clone original_environment 
```
where of course you replace `cloned_environment` with the name you want for the clone and `original_environment` with the name of the environment that you want a copy of.

***

**Exercise**

Create a clone of your environment.

***

We will now install `jupyterlab` and `nb_conda_kernels` in the cloned environment. These should work fine with the packages we've installed so far, but doing it this way shows you what a best-practice workflow looks like.

For this, you have two options. You can open another prompt and activate the cloned environment there; you can have as many prompts open at the same time as you want (or as your computer's memory can handle). Or you can deactivate the currently active environment and then activate the cloned one:

```
conda deactivate
```

***

**Exercise**

Deactivate the active environment. Activate the cloned one. Use conda to install `jupyterlab` and `nb_conda_kernels`. These depend on a few other packages we will need anyway, like `IPython`. **conda** installs all of them for you.

***

We'll talk more about these packages in the future. The short version is that Jupyter Lab is something like a browser-based IDE and nb_conda_kernels is needed so that we can use different environments in the same instance of Jupyter Lab. It's not absolutely essential, but can make your life easier when you use different environments.


Ideally, you would now check that everything runs fine in the cloned environment, then install the packages in the original one and delete the clone. Or you can delete the original and keep the clone. Older conda versions can't rename environments (newer ones provide `conda rename`), so you may then have to live with the clone's name.

## 2.7 Removing and sharing environments

You can remove an environment like this:

```
conda env remove -n name_of_environment
```

If you want to move an environment to another computer, that is also possible, at least to one with the same operating system. For this you basically write a list of all packages in the environment to a file (below, a `.yml` file). This file can be used to create an environment with the exact same packages on any computer running the same operating system. This way you can also keep track of changes to your environment, e.g. in a GitHub repository, and use it on different computers.

***

**Exercise**

First try the following command in the environment that you want to share (i.e. have the environment activated):
```
conda env export
```

This will print a lot of information on the screen: the name of the environment, the version of Python, the conda channels, and all of the packages you have installed. You can redirect the output into a file using the `>` operator. It's a handy little operator and it means: don't print the output in the shell, write it to the file I name next:

```
conda env export > my_environment.yml
```

***
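
The `>` redirect has a direct counterpart in Python's `subprocess` module. This can be handy if you want to script the export, e.g. as part of an analysis pipeline. Below is a sketch that assumes `conda` is found on the PATH. The function name is hypothetical. If it misbehaves on your system (Windows can be picky about how conda is launched), just use the prompt command above.

```python
import shutil
import subprocess

def export_environment(path, conda="conda"):
    """Write the output of `conda env export` to `path`, like `conda env export > path`."""
    executable = shutil.which(conda)
    if executable is None:
        raise FileNotFoundError(f"{conda!r} was not found on the PATH")
    with open(path, "w", encoding="utf-8") as handle:
        # stdout goes into the file instead of the screen, just like `>`
        subprocess.run([executable, "env", "export"], stdout=handle, check=True)

# Usage (with the environment you want to export activated):
# export_environment("my_environment.yml")
```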

Send this file to your collaborators or put it on GitHub. Now everyone using the same OS as you can reproduce your environment. It's also a great way to make your analysis in Python reproducible or to share environments within a group.
Creating an environment from a file is as easy as:

```
conda env create -f my_environment.yml
```


<br/>
This is all you absolutely need to know about using environments and installing packages. There are more commands but as usual you can go a long way with only a few. 

Some more you might find useful in the future:

List all environments:

```
conda env list
```

List all packages in an environment:

```
conda list
```

Update a package to the newest version that is compatible with the rest of the environment:

```
conda update package_name
```

Install a specific version of a package, e.g. numpy 1.16.2:

```
conda install numpy=1.16.2
```

Search for the available versions of a package in your configured channels (add `-c channel_name` to search a specific channel):

```
conda search package_name
```
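
From inside Python, you can get a rough counterpart of `conda list` (or `pip list`). It only covers Python packages. `conda list` also shows non-Python packages, such as compiled libraries, that won't appear here. Below is a sketch using `importlib.metadata` with a made-up helper name.

```python
from importlib.metadata import distributions

def installed_versions():
    """Map each installed Python distribution (lower-cased name) to its version string."""
    versions = {}
    for dist in distributions():
        metadata = dist.metadata
        if metadata is not None and "Name" in metadata:  # skip broken entries without a name
            versions[metadata["Name"].lower()] = dist.version
    return versions

versions = installed_versions()
print(len(versions), "Python packages found")
print("numpy:", versions.get("numpy"))  # None if numpy isn't installed in this environment
```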

***

**Exercise**

Take a few minutes to try a few of them and ask questions if you don't understand something. After that we'll get started using Jupyter Lab.

***