# Getting started with Python, Anaconda and virtualenvs

Author: Gary Marigliano, gary.marigliano@heig-vd.ch

<i> Modified from Tutorial, courtesy of Gary Marigliano  (<a href="mailto:gary.marigliano@heig-vd.ch">gary.marigliano@heig-vd.ch</a>). </i>
- Professor: Carlos Peña (<a href="mailto:carlos.pena@heig-vd.ch">carlos.pena@heig-vd.ch</a>)
- Assistant 2018: Gary Marigliano (<a href="mailto:gary.marigliano@heig-vd.ch">gary.marigliano@heig-vd.ch</a>)
- Assistant 2019: Diogo Leite (<a href="mailto:diogo.leite@heig-vd.ch">diogo.leite@heig-vd.ch</a>)

In this document, you will learn:

- How to install everything you will need for this course
- A basic introduction to Anaconda and Python virtual environments
- How to use and share a reproducible development environment for (your/any) Python projects


Date: Winter 2019

## Anaconda

### What is it ?

Python is a general-purpose programming language. It is popular among data scientists because it is relatively simple to learn, allows for rapid prototyping, and offers great libraries.

Two major versions of Python are currently in use: Python 2.7 and Python 3.X. Legacy applications and labs still use Python 2, but if you are starting a new project, choose Python 3.

To use Python, you can install Python from the official website. However, **for this course we will use Anaconda**.

Anaconda is a Python distribution: it ships its own Python interpreter and the `conda` package manager, and it installs alongside any Python you may already have. **The main selling point here is that Anaconda allows you to create virtualenvs (virtual environments)**. A virtualenv is an isolated place where you install all the dependencies of one Python project. This allows you, for example, to have a Project A with Lib X version 1.0 and a Project B with the same Lib X at version 2.0 on the same computer, without any conflict between the two virtualenvs.
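To make the Project A / Project B situation concrete, here is a minimal sketch that lists the libraries two projects need at different versions. The project dictionaries and the name `libx` are made up for illustration; they are not part of the course material.

```python
def conflicting_pins(project_a, project_b):
    """Return the libs that both projects need, but at different versions."""
    return {
        lib: (project_a[lib], project_b[lib])
        for lib in sorted(set(project_a) & set(project_b))
        if project_a[lib] != project_b[lib]
    }


# Hypothetical dependency lists: {library name: required version}
project_a = {"libx": "1.0", "numpy": "1.13.0"}
project_b = {"libx": "2.0", "numpy": "1.13.0"}

print(conflicting_pins(project_a, project_b))  # {'libx': ('1.0', '2.0')}
```

Any library reported here cannot satisfy both projects from a single shared installation, which is exactly the case where one virtualenv per project helps.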

In short:

- Anaconda is required because it allows the use of virtualenvs
- Virtualenvs are isolated, reproducible and shareable environments where you can install all the dependencies your project needs
- <b>1 Python project/lab = 1 virtualenv</b>

Note to advanced users: Python's own tools (the `venv` module together with pip) also allow you to create virtualenvs, but Anaconda (with the help of... pip) can also install non-Python requirements (such as C/C++ dependencies or a C compiler) needed by libraries that are not written exclusively in Python. You could possibly achieve the same result with only pip and some manual intervention. Anaconda can also download a specific version of Python for a given virtualenv, even if it is not the version of Python installed on your computer. To make things simpler for everyone, we chose to use Anaconda.

### How to install it ?

- Install [Anaconda](https://www.anaconda.com/download/), **32-bit version**, and choose the Python 3 installer. This also applies if you do not have any Python installed yet.
- On Windows, start `Anaconda Prompt`.
- On Linux/macOS, you are a grown person, you will find it yourself :) Just don't forget to add Anaconda (`<path_to_anaconda>/bin`) to your `PATH` variable. <b>The installer offers to do this for you.</b>

### How to use Anaconda to create _virtualenvs_ ?

We will create a "hello world" virtualenv using Anaconda.

Create a folder tree like this:

```
─ hello_virtualenvs/
  ├── main.py
  └── requirements.txt
```

In main.py paste the following content:

``` python
#!/usr/bin/env python
# coding: utf-8

# Note: declaring the file encoding (line above) is recommended on Python 2. Python 3 assumes UTF-8 by default.

# The xkcd module is not part of the standard library; it must be installed separately.
import xkcd

if __name__ == "__main__":
    print("Opening xkcd joke on web browser...")
    xkcd.Comic(353).show()
    
    # FYI, you can achieve the same result with the built-in easter egg in python by calling
    # import antigravity (remove comment...)

```

In requirements.txt paste the following content:

```
xkcd==2.4.2
```

The role of `requirements.txt` is to list all the Python dependencies a project needs. With this single file, you should be able to reproduce a working development environment to run and develop the project.
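To see what such a file contains from a program's point of view, the following sketch reads pinned `name==version` lines into a dictionary. It is a simplification: real requirements files accept more forms (version ranges, URLs, options) that this sketch deliberately rejects, and `parse_requirements` is a hypothetical helper, not a pip function.

```python
def parse_requirements(text):
    """Return a {package: version} dict from pinned 'name==version' lines."""
    pins = {}
    for raw_line in text.splitlines():
        # Drop trailing comments and surrounding whitespace
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue  # skip blank lines and comment-only lines
        if "==" not in line:
            raise ValueError("not an exact pin: %r" % line)
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


example = "xkcd==2.4.2\n"
print(parse_requirements(example))  # {'xkcd': '2.4.2'}
```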

To use `requirements.txt`, i.e. to install all the dependencies, you first need to create a virtualenv. In the same folder, open a terminal (or Anaconda Prompt if you are ~~lame~~ on Windows) and enter the following commands:

```
conda create --name my_hello_env python=3.6
conda activate my_hello_env
```

Your terminal's prompt should have changed, indicating that you are using the `my_hello_env` virtualenv. You can now install all the dependencies specified in `requirements.txt` with this command:

```
pip install -r requirements.txt
```

This installs all the dependencies listed in `requirements.txt` (at the moment, just xkcd at version 2.4.2). The xkcd library is only available in the virtualenv `my_hello_env`, so don't forget to re-activate your virtualenv (`conda activate ...`) before installing new libs or launching your Python scripts.
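If you are unsure which environment a script is running in, you can ask the interpreter itself. The sketch below is one possible check, not an official conda tool: `describe_environment` is a hypothetical helper, and it relies on the `CONDA_DEFAULT_ENV` variable that conda normally sets when an environment is activated.

```python
import importlib.util
import os
import sys


def describe_environment(module_name):
    """Collect facts about the interpreter that is running this script."""
    return {
        # Path of the running python executable; with an activated conda
        # environment it normally lives inside that environment's folder.
        "executable": sys.executable,
        "prefix": sys.prefix,
        # Name of the active conda environment, or None if the variable is unset
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        # True if `import module_name` would find the module
        "module_available": importlib.util.find_spec(module_name) is not None,
    }


for key, value in describe_environment("xkcd").items():
    print("%s: %s" % (key, value))
```

If `module_available` is `False` for `xkcd`, the most likely cause is that `my_hello_env` is not the active environment.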

Now you can start main.py with the following and admire how skilled you just became :-)

``` bash
python main.py
```

Remember:
``` bash
# list your environments
conda info --envs
# activate an environment
conda activate my_hello_env
# deactivate an environment
conda deactivate
```
We will use version <font color='red'>3.6</font> of Python until the end of the course! Not every lab comes with a `requirements.txt` file.
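A script can verify at startup that it runs on the expected Python version. This is a minimal sketch; the names `REQUIRED` and `matches_required` are chosen here for illustration.

```python
import sys

REQUIRED = (3, 6)  # (major, minor) version used in this course


def matches_required(version_info, required=REQUIRED):
    """True if the (major, minor) part of version_info equals `required`."""
    return tuple(version_info[:2]) == tuple(required)


if not matches_required(sys.version_info):
    print("Warning: running Python %d.%d, expected %d.%d"
          % (sys.version_info[0], sys.version_info[1], REQUIRED[0], REQUIRED[1]))
```

The check only warns instead of stopping the program, so the script stays usable if you knowingly run another version.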

### How to generate a requirements.txt to share a reproducible virtualenv with other people ?

Now imagine you are working on a new project or a lab and you want to make sure that others can reproduce (i.e. reexecute in the same conditions) your work. How would you do that ?

Virtualenvs of course ! You need to (1) install your libraries with pip, (2) generate the `requirements.txt` file and (3) send this file alongside your code. (4) ??? (5) Profit

1. Let's say you need to have `numpy` (a very useful lib in Python) installed to run your code. Just activate your virtualenv and run `pip install numpy`. This will make `numpy` available in your virtualenv.
2. To automatically generate the list of all the libraries (with the exact versions!) you installed with pip, run `pip freeze > requirements.txt`

`requirements.txt` will look something like this: 

```
[...]
numpy==1.13.0
[...]
```
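Someone who receives your file may want to confirm that their environment really matches the pinned versions. The sketch below is one way to do that under a few assumptions: `installed_version` and `find_mismatches` are hypothetical helpers, the pins are given as a plain dictionary, and the code falls back to `pkg_resources` (shipped with setuptools) on Python versions older than 3.8, where `importlib.metadata` does not exist.

```python
try:
    from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
except ImportError:
    version = None  # older Python (e.g. 3.6): use pkg_resources instead


def installed_version(package_name):
    """Return the installed version of a package, or None if it is missing."""
    if version is not None:
        try:
            return version(package_name)
        except PackageNotFoundError:
            return None
    import pkg_resources  # provided by setuptools
    try:
        return pkg_resources.get_distribution(package_name).version
    except pkg_resources.DistributionNotFound:
        return None


def find_mismatches(pins):
    """Return {package: (wanted, found)} for every pin that is not satisfied."""
    mismatches = {}
    for name, wanted in pins.items():
        found = installed_version(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


# Example pin taken from the requirements.txt excerpt above
print(find_mismatches({"numpy": "1.13.0"}))
```

An empty dictionary means every pinned library is installed at exactly the requested version.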

## Test TREFLE

Now, create an environment and test the lines below (what they do is not important for now). Don't forget the requirements of lab 000.

``` python
import random
import numpy as np

from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

from trefle.fitness_functions.output_thresholder import round_to_cls
from trefle.trefle_classifier import TrefleClassifier
```

### Summary

1. When you receive some code or a project
    * look for a `requirements.txt` file
    * create a new virtualenv using Anaconda making sure you use the right version of Python with `conda create -n my_virtualenv python=X.Y`
    * activate the virtualenv with `conda activate my_virtualenv`
    * install the libs listed in `requirements.txt` with `pip install -r requirements.txt`
    * run the code
2. When you want to send your code/project
    * Make sure you are in the virtualenv you created for this project: `conda activate my_virtualenv`. If you don't have a virtualenv for this project but already started to code, shame on you: you have won a journey to either write the `requirements.txt` file by hand (because `pip freeze` will not give you the minimal list of dependencies but all the globally installed libs...) or create a new virtualenv and install the libs required by your project one by one until it works as before.
    * Generate `requirements.txt` with `pip freeze > requirements.txt`
    * Send the `requirements.txt` with your code/project next to it