# Steps to Start an Exploratory Data Analysis

In the following, I will describe a reproducible workflow. For this purpose, we will

- set up a virtual environment using conda or pip (Windows 10)
- register the virtual environment as a Jupyter kernel
- work on a series of Exploratory Data Analysis tasks to investigate sample data and create functions
- package these analyses into a library that we can reuse later
- list our packages and save them to a requirements file
- push everything to a version control platform such as GitHub

## Creating a virtual environment 

I will describe two methods that work on Windows 10. The syntax and procedure are similar on Linux and macOS as well. <br>

--- <br>
### <font color=blue> 1. Using conda  </font>

First, launch a command line (terminal). On some machines, you can directly open an Anaconda Prompt; in that case, you should see ```(base)``` at the beginning of your command line. Otherwise, if you have the regular Windows command line (cmd.exe), you can type ```conda activate``` to activate the base conda environment. <br>
You should see something similar to: <br>

```shell
(base) C:\Users\Computer>
```

Inside your terminal, you can move in and out of folders. You normally start in your user (home) folder; from here, you can type <br>
```shell
(base) C:\Users\Computer> cd Desktop\
```
to move into your desktop and 
```shell
(base) C:\Users\Computer> dir
```
lets you display the directories (and files) in the current folder

**Let's create an environment.** You can choose any name you like in place of NAMEOFENV. <br>
```shell
(base) C:\Users\Computer> conda create -n NAMEOFENV
```
or, if you did not activate base (it does not matter at this point):
```shell
C:\Users\Computer> conda create -n NAMEOFENV
```

Here, ```-n``` (short for ```--name```) is a flag indicating that the following argument is the name of the environment. <br>
Now that we have created the new environment, we can activate it.

```shell
C:\Users\Computer> conda activate NAMEOFENV
```
After executing this, you should see

```shell
(NAMEOFENV) C:\Users\Computer>
```

<font color=green>  You successfully created and activated your fresh new environment! </font>

If we now type ```conda list``` to list the installed packages, we see nothing, as we have not installed anything yet. <br>
Go ahead and try the ```conda activate``` and ```conda deactivate``` commands to switch between your new environment and base. <br>
Notice that plain ```conda activate``` defaults to base, while ```conda activate X``` looks for an environment named X. <br>

Try listing the packages installed in base. In the next step, we will load new packages to our environment.

---


With ```NAMEOFENV``` activated, let's install the packages we need. <br>

```shell
(NAMEOFENV) C:\Users\Computer> conda install matplotlib
```
It will tell you what it is going to install and ask for confirmation. Type ```y``` and continue. Now you have matplotlib and its dependencies: notice that conda installs several packages that matplotlib needs, not only matplotlib itself. <br>
pip also installs the Python packages a library depends on, but unlike conda, it cannot install non-Python libraries or the Python interpreter itself.

> If you have a ```requirements.txt``` file with packages specified, you can install them from this file using
>>```shell
>>(NAMEOFENV) C:\Users\Computer>conda install --file requirements.txt
>>```

> If you want to save your packages as ```requirements.txt``` (conda writes them in its own ```name=version=build``` format, which pip cannot read), use
>>```shell
>>(NAMEOFENV) C:\Users\Computer>conda list --export > requirements.txt
>>```
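
The exported file is what makes the environment reproducible. As a minimal sketch in bash, assuming conda is on the PATH and ```NAMEOFENV``` already exists (```NAMEOFENV-copy``` is a hypothetical name for the rebuilt environment), you can export the package list and pass it straight to ```conda create```:

```shell
#!/usr/bin/env bash
# Sketch only: assumes conda is on the PATH and an environment NAMEOFENV exists.
set -e

if command -v conda >/dev/null 2>&1 && conda env list | grep -q '^NAMEOFENV '; then
  # write one name=version=build line per package installed in NAMEOFENV
  conda list -n NAMEOFENV --export > requirements.txt
  # build a second environment with exactly those package versions
  conda create -y -n NAMEOFENV-copy --file requirements.txt
else
  echo "conda or NAMEOFENV not found; skipping the sketch"
fi
```

Because the export pins platform-specific build strings, the file is best reused on the same operating system.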


**Now, we have two options to make this environment accessible from Jupyter notebooks:**
- Install Jupyter and the conda kernel extension in this environment, then launch Jupyter from it

```shell
(NAMEOFENV) C:\Users\Computer>conda install notebook nb_conda_kernels
(NAMEOFENV) C:\Users\Computer>jupyter notebook
```

- Register this environment as a Jupyter kernel (this requires ```ipykernel``` inside the environment)
```shell
(NAMEOFENV) C:\Users\Computer>conda install ipykernel
(NAMEOFENV) C:\Users\Computer>python -m ipykernel install --user --name NAMEOFENV
```
After this, you can deactivate your environment and run jupyter notebook as you normally do. <br>
When you create a new notebook, NAMEOFENV will be listed as a kernel next to the Python 3 option.

---

You can then work on your project as you like. When you are done and want to remove everything, you can delete the environment folder manually or run:

```shell
(NAMEOFENV) C:\Users\Computer>conda deactivate
C:\Users\Computer>conda env remove -n NAMEOFENV
```

After this, the environment will still be listed among Jupyter's kernels if you registered it with ```ipykernel```. To remove it, run:

```shell
(base) C:\Users\Computer>jupyter kernelspec uninstall NAMEOFENV
```

To list the existing conda environments and Jupyter kernels, respectively:

```shell
(base) C:\Users\Computer>conda info --envs
```

```shell
(base) C:\Users\Computer>jupyter kernelspec list
```


### <font color=blue> 2. Using venv and pip  </font>

- create
```shell
C:\Users\Computer>python -m venv NAMEOFENV
```
- activate
```shell
C:\Users\Computer>NAMEOFENV\Scripts\activate.bat
```
- install packages (once activated, the prompt shows the environment name)
```shell
(NAMEOFENV) C:\Users\Computer>pip install matplotlib
(NAMEOFENV) C:\Users\Computer>pip install ipykernel
```

- add kernel
```shell
(NAMEOFENV) C:\Users\Computer>python -m ipykernel install --user --name NAMEOFENV
```

- to save your packages
```shell
(NAMEOFENV) C:\Users\Computer>pip freeze > requirements.txt
```
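
On Linux or macOS, the venv steps are the same except for activation: the script lives in ```bin/``` instead of ```Scripts\``` and is loaded with ```source``` (or its POSIX spelling ```.```). A minimal sketch, assuming ```python3``` with the standard ```venv``` module (```demo-venv``` is a hypothetical folder name); the install line is commented out because it needs network access:

```shell
#!/usr/bin/env bash
# Sketch only: assumes python3 includes the venv module; "demo-venv" is hypothetical.
set -e

ENV_DIR="demo-venv"

# create: the environment is a folder with its own interpreter and pip
python3 -m venv "$ENV_DIR"

# activate (Linux/macOS path; Windows uses Scripts\activate.bat)
. "$ENV_DIR/bin/activate"

# "python" now points to the environment's interpreter
python -c 'import sys; print(sys.prefix)'

# install stuff (needs network access, so commented out in this sketch)
# python -m pip install matplotlib ipykernel

# save the installed packages in pip's name==version format
python -m pip freeze > requirements.txt

# a fresh environment could later be rebuilt with:
# python -m pip install -r requirements.txt

deactivate
```

Removing such an environment only means deleting its folder (```rm -rf demo-venv```), plus ```jupyter kernelspec uninstall``` if you registered it as a kernel.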

<br>
<br>

In the following notebooks, we will explore data from Kaggle and build a workflow step by step. <br>
In the first notebook, Netflix-Movies.ipynb, we will explore the data and create some functions. Later, we will create a simple _library_ of our own. <br>
Next, we will use this library in Netflix-Movies-2.ipynb to explore the data further.