<img src="../assets/images/Cover.png" alt="Cover" title="AI2E Cover" />

## AI2E - 1 - Discover the tools 
This is the first notebook in a series of machine learning workshops. To start our journey in the best way possible, we will get to know the tools we're going to work with: Anaconda and Jupyter Notebook. We will conclude this workshop by going through a small Algerian dataset and getting to know some useful Python packages that will help us explore and study datasets.

### Content 
1. Python Cheat Sheet 
2. What is Anaconda ?  
3. What is Jupyter Notebook ? How can I use it ? 
4. Welcome to NumPy 
5. Data exploration (Part 1) 

## 1. What is Anaconda ? 

[Anaconda](https://www.anaconda.com/) is a distribution of packages built for data science. It comes with conda, a package and environment manager. You'll be using conda to create environments for isolating your projects that use different versions of Python and/or different packages. You'll also use it to install, uninstall, and update packages in your environments. Using Anaconda has made my life working with data much more pleasant. 

When working on different projects on the same computer, you will eventually run into a tricky problem: two projects need the same package but in two different versions (for example, PyTorch 0.4 vs. PyTorch 1.0). One solution is to use [virtualenv](https://virtualenv.pypa.io/en/latest/), which lets you create an isolated Python environment for each project. 
Although virtualenv also works for data science projects, we recommend Anaconda here because it is built specifically for data projects. 
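
To see which environment and which package versions a project is actually using, you can ask the interpreter itself. The sketch below is only an illustration, assuming Python 3.8 or later; `describe_environment` is a helper name made up for this example and is not part of conda or virtualenv.

```python
import sys
from importlib import metadata  # standard library since Python 3.8


def describe_environment(packages):
    """Report the active interpreter and the installed version of each package.

    Hypothetical helper written for this example.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "prefix": sys.prefix,              # root directory of the active environment
        "python": sys.version.split()[0],  # version of the running interpreter
        "packages": versions,
    }


info = describe_environment(["torch", "numpy"])
print(info["prefix"])
for name, version in info["packages"].items():
    print(name, version or "not installed")
```

Running this in two different environments should print two different prefixes and, possibly, two different versions of the same package.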


#### Install Anaconda/Miniconda 

**Download** the latest version of `miniconda` that matches your system.

|        | Linux | Mac | Windows | 
|--------|-------|-----|---------|
| 64-bit | [64-bit (bash installer)][lin64] | [64-bit (bash installer)][mac64] | [64-bit (exe installer)][win64]
| 32-bit | [32-bit (bash installer)][lin32] |  | [32-bit (exe installer)][win32]

[win64]: https://repo.continuum.io/miniconda/Miniconda3-latest-Windows-x86_64.exe
[win32]: https://repo.continuum.io/miniconda/Miniconda3-latest-Windows-x86.exe
[mac64]: https://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
[lin64]: https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
[lin32]: https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86.sh

**Install** [miniconda](http://conda.pydata.org/miniconda.html) on your machine. Detailed instructions:

- **Linux:** http://conda.pydata.org/docs/install/quick.html#linux-miniconda-install
- **Mac:** http://conda.pydata.org/docs/install/quick.html#os-x-miniconda-install
- **Windows:** http://conda.pydata.org/docs/install/quick.html#windows-miniconda-install

#### Create an environment
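Creating an environment with conda is simple. Here is the general structure of the create command with the most important options: 

``` conda create -n [NameofEnv] python=[version] [nameofPackage]=[version] ```

To place the environment in a specific directory instead of giving it a name, replace `-n [NameofEnv]` with `--prefix [path\to\env\directory]`; conda does not accept both options in the same command. 

By default, named environments are stored in the `envs` directory of your conda installation, for example /Users/user-name/miniconda3/envs. 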
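
If you want to script environment creation from Python, one possible approach is to build the argument list first and run it afterwards. This is a minimal sketch, not an official conda API: `build_conda_create` and the environment name `ai2e` are assumptions made for this example. It uses only the `--name`, `--prefix` and `--yes` options of `conda create`, and it refuses to combine `--name` with `--prefix`.

```python
import shlex


def build_conda_create(name=None, prefix=None, python=None, packages=None):
    """Return the argument list for a `conda create` call.

    Hypothetical helper: exactly one of `name` or `prefix` must be given,
    because conda does not accept --name and --prefix together.
    """
    if (name is None) == (prefix is None):
        raise ValueError("pass exactly one of name or prefix")
    cmd = ["conda", "create", "--yes"]
    cmd += ["--name", name] if name is not None else ["--prefix", prefix]
    if python is not None:
        cmd.append(f"python={python}")
    for package, version in (packages or {}).items():
        # A version of None means "let conda pick the version".
        cmd.append(package if version is None else f"{package}={version}")
    return cmd


cmd = build_conda_create(
    name="ai2e", python="3.8", packages={"numpy": "1.19", "pandas": None}
)
print(shlex.join(cmd))
# To actually create the environment: subprocess.run(cmd, check=True)
```

Keeping the command as a list makes it easy to inspect before anything is installed.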

#### What is an environment.yml file ? 
An environment.yml file is a configuration file for a Python environment created with conda. It contains the name of the environment and the list of packages installed in it.

<img src="images/env.png" width="60%">

    channels : the sources from which these packages are downloaded 
    
You can create an environment from a configuration file as follows: 
```conda env create -f environment.yml``` 
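
To make the structure of such a file concrete, here is a small sketch that writes out the three usual sections (`name`, `channels`, `dependencies`) as text. The helper `render_environment_yml`, the environment name `ai2e` and the package list are assumptions chosen for illustration; they are not taken from the workshop's actual file.

```python
def render_environment_yml(name, channels, dependencies):
    """Build the text of a minimal environment.yml file (hypothetical helper)."""
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    lines += [f"  - {dependency}" for dependency in dependencies]
    return "\n".join(lines) + "\n"


text = render_environment_yml(
    name="ai2e",
    channels=["defaults"],
    dependencies=["python=3.8", "numpy", "pandas"],
)
print(text)
```

Saving the printed text as environment.yml gives a file that the command above can read.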

It is also recommended to include a requirements.txt file in your project for those who don't use conda. 
(To generate this file, run ``` pip freeze > requirements.txt ```.)

#### Install additional Packages 
#### Conda Revisions 👍 
Conda not only helps you create isolated environments, it also keeps track of the packages you install. This allows you to roll back to an earlier revision. 

<ins>Example : </ins>

1. We create an environment named 'test' : ``` conda create -n test python=3 ```
2. We activate the environment : ``` conda activate test ```
3. We install numpy : ``` conda install numpy```
4. We install pandas : ``` conda install pandas```
5. We list the revisions : ```conda list --revisions``` 

Each revision is identified by an id. To restore the packages to their state in rev 1, we simply run this command : ```conda install --revision 1``` 

We can also recheck the list of revisions to ensure the changes were made. 

<img src="images/revisions.png" align="center" width="40%">
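
The idea behind revisions can be pictured with a toy model: every change appends a new snapshot of the package set, and rolling back appends one more snapshot equal to an older one. The class below is a conceptual sketch only. It is not how conda is implemented, it ignores versions and dependencies, and modelling a rollback as a new revision is an assumption of this example.

```python
class RevisionLog:
    """Toy model of an environment's revision history (not conda's implementation)."""

    def __init__(self, initial_packages):
        # Revision 0 is the state right after the environment is created.
        self._revisions = [frozenset(initial_packages)]

    @property
    def current(self):
        return self._revisions[-1]

    def install(self, package):
        """Record a new revision containing one more package."""
        self._revisions.append(self.current | {package})
        return len(self._revisions) - 1

    def rollback(self, revision):
        """Restore the package set of an earlier revision as a new revision."""
        self._revisions.append(self._revisions[revision])
        return len(self._revisions) - 1

    def diff(self, revision):
        """Packages added and removed by `revision` relative to the previous one."""
        before = self._revisions[revision - 1] if revision > 0 else frozenset()
        after = self._revisions[revision]
        return sorted(after - before), sorted(before - after)


log = RevisionLog({"python"})  # rev 0: conda create -n test python=3
log.install("numpy")           # rev 1: conda install numpy
log.install("pandas")          # rev 2: conda install pandas
new_rev = log.rollback(1)      # rev 3: conda install --revision 1
print(new_rev, sorted(log.current), log.diff(new_rev))
```

In this model, the rollback leaves the earlier history intact and simply records that pandas was removed.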

#### Additional Information 
The conda documentation is very clear, and we encourage you to consult it whenever necessary : [https://docs.anaconda.com/](https://docs.anaconda.com/) 

Here is a [cheat sheet](https://docs.conda.io/projects/conda/en/4.6.0/_downloads/52a95608c49671267e40c689e0bc00ca/conda-cheatsheet.pdf) gathering all the important commands you need to know. 

If your curiosity hasn’t been satisfied and you still want to know more about virtual environments, we highly recommend : 
* [Some Myths and Misconceptions about Conda](https://jakevdp.github.io/blog/2016/08/25/conda-myths-and-misconceptions/) 😛 
* [Everything you need to know about Anaconda by DZone](https://dzone.com/articles/python-anaconda-tutorial-everything-you-need-to-kn) 
* Get to know the Anaconda Navigator with [this tutorial](https://linuxhint.com/anaconda-python-tutorial/)
    

## What is Jupyter Notebook ? 

A [Jupyter Notebook](https://jupyter.org/) is a web application that allows you to combine explanatory text, math equations, code, and visualizations all in one easily sharable document. The name Jupyter comes from the combination of Julia, Python, and R. If you're interested, here's a [list of available kernels](https://github.com/jupyter/jupyter/wiki/Jupyter-kernels).

In a data science project, a notebook lets you download the data, run the code, and repeat the analysis or the training as often as you want, in whatever order you need. 

Notebooks have quickly become an essential tool when working with data. You'll find them being used for data cleaning and exploration, visualization, machine learning, and big data analysis. In this repository you can find many examples covering the different topics we're going to tackle in AI2E. 


Without notebooks, your project would be split into separate files containing the documentation, the code, and the visualizations. 

*You don't find it cool yet ? 
Notebooks are rendered automatically by GitHub !* 😀


Let's create an example notebook. 
1. Go to the directory where you want the notebook to be.
2. Run the following command : ``` jupyter notebook ``` 
3. Click on New and choose a kernel (Python3 in our case) 
4. Create a cell and set its type to 'Code'
5. Add ``` print('Hello World') ``` to the cell 
6. Run the cell with Shift + Enter or by clicking the Run button at the top. 
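
Behind the interface, the notebook created in the steps above is stored as a JSON document. The following sketch builds a minimal one by hand with the standard library only. It assumes the nbformat 4.4 layout (`cells`, `metadata`, `nbformat`, `nbformat_minor`); `make_notebook` and the file name `hello.ipynb` are made up for this example. In practice Jupyter writes this file for you.

```python
import json


def make_notebook(code):
    """Return a minimal nbformat 4.4 notebook with a single code cell."""
    return {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": None,  # the cell has not been run yet
                "metadata": {},
                "outputs": [],
                "source": code.splitlines(keepends=True),
            }
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


notebook = make_notebook("print('Hello World')")
text = json.dumps(notebook, indent=1)
# Saving `text` as hello.ipynb (hypothetical name) should give a file Jupyter can open.
print(text)
```

Because the file is plain JSON, notebooks can be versioned and shared like any other text file.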



#### More info ! 🤓

* You'll often want to download a notebook as an HTML file to share it with people who aren't using Jupyter. You can also download it as a regular Python file in which all the code runs as usual. The Markdown and reST formats are useful for reusing notebooks in blogs or documentation.

* **Magic Keywords :**
    * %timeit : times the execution of your code.
    * %matplotlib inline : displays your visualizations directly in the output of the cell. 
    * %pdb : turns on automatic debugger (pdb) calling when an exception is raised.
    * [List of all magic keywords](https://ipython.readthedocs.io/en/stable/interactive/magics.html) 
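
Magic keywords only work inside IPython or Jupyter. Outside a notebook, similar timing can be done with the standard-library `timeit` module, as in this small sketch. The function `build_squares` and the repeat counts are arbitrary choices for illustration.

```python
import timeit


def build_squares(n=1000):
    """Small workload to time: a list of the first n squares."""
    return [i * i for i in range(n)]


# Run the function 1000 times per measurement and repeat the measurement 5 times.
timings = timeit.repeat(build_squares, number=1000, repeat=5)
best_per_call = min(timings) / 1000
print(f"best of 5: {best_per_call * 1e6:.1f} microseconds per call")
```

Taking the minimum of several runs reduces the influence of other processes running on the machine.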
    
Let's try the debugging! 


    In [1]:
    %pdb

    Automatic pdb calling has been turned ON


    In [2]:
    i = 'a'
    sum(i)

    TypeError: unsupported operand type(s) for +: 'int' and 'str'

    > <ipython-input-2-59b1bb397993>(2)<module>()
          1 i = 'a'
    ----> 2 sum(i)

    ipdb> i
    'a'
    ipdb> q
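
The error in this session comes from how `sum` works: it starts from 0 and adds each element in turn, so the first operation is `0 + 'a'`. The short sketch below reproduces the same error without the debugger and shows `str.join` as the usual way to combine strings. The variable names are chosen for this example.

```python
value = 'a'

# sum() starts from 0 and adds each element in turn, so the first step is 0 + 'a'.
try:
    sum(value)
except TypeError as error:
    message = str(error)
    print(message)

# Joining strings is the job of str.join, not sum().
joined = "".join(["a", "b", "c"])
print(joined)
```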


We used a Jupyter notebook to create this slideshow ! 😁

Here's the command to create and serve your slideshow : 

```jupyter nbconvert notebook.ipynb --to slides --post serve ``` 


Okay ! Now that we know the fundamental tools we're going to work with, let's discover some important data science packages and how to use them in a concrete example ! 