## Stage 3: What do I need to install?
Typical Python dependency management (https://xkcd.com/1987/):

<img src="https://imgs.xkcd.com/comics/python_environment.png">

Add in data science packages, which bring all sorts of additional non-Python dependencies, and we end up spending more time sorting out our dependencies than doing data science. If you take nothing else home from this tutorial, learn this stage. I promise it will give you, and everyone who works with you, many days of your life back.



### Reproducibility Issues:
* (NO-ENVIRONMENT-INSTRUCTIONS) Chicken-and-egg issue with environments: there is no `environment.yml` file or the like (even if there are some instructions in a notebook, you need a working environment to run the notebook).
* (NO-VERSION-PIN) Versions are not pinned. E.g. the project uses a dev branch without a clear indication of when (or whether) it was released.
* (HARDCODED-PATH) A file contains a hardcoded path, so the project will not run elsewhere without manual editing.
* (IMPOSSIBLE-ENVIRONMENT) Dependencies are not resolvable due to version clashes (e.g. one package needs <=0.48 and another needs >=0.49).
* (ARCH-DIFFERENCE) The same code runs differently on different architectures.
* (MONOLITHIC-ENVIRONMENT) One environment to rule (or fail) them all.
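
To make the IMPOSSIBLE-ENVIRONMENT case concrete, here is a minimal sketch of why `<=0.48` and `>=0.49` cannot both be satisfied. The helper names `parse_version` and `is_satisfiable` are made up for illustration, and the version parsing is deliberately naive (plain dotted integers only); real resolvers such as conda's handle far more than this.

```python
def parse_version(text):
    """Turn a dotted version string such as '0.48' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def is_satisfiable(lower_bound, upper_bound):
    """Return True if some version v satisfies lower_bound <= v <= upper_bound."""
    return parse_version(lower_bound) <= parse_version(upper_bound)


# One package needs >=0.49, another needs <=0.48: no version fits both.
print(is_satisfiable(lower_bound="0.49", upper_bound="0.48"))  # False
# Relaxing the upper bound to <=0.50 makes the pair resolvable.
print(is_satisfiable(lower_bound="0.49", upper_bound="0.50"))  # True
```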


## The Easydata way
We like `conda` for environment management since it's the least bad option for most data science workflows. There are no perfect ways of doing this. Here are some basics.


### Default Better Principles
* **Use (at least) one virtual environment per repo**: Give the environment the same name as the repo.
* **Generate lock files**: Lock files include every single dependency in your dependency chain. Lock files are necessarily platform specific, so you need one per platform that you support. This way you have an exact version pin for the environment as it was at that moment in time.
* **Check in your environment creation instructions**: That means an `environment.yml` file for conda, and its matching lock file(s).
* **Always use relative paths**: All paths that you use in your code should be relative to the project, never hardcoded absolute paths.
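
To get a feel for what "every single dependency" means, the sketch below lists an exact `name==version` pin for each Python distribution installed in the current environment, using only the standard library. This is not a conda lock file: a real lock file also covers non-Python dependencies and is platform specific. The function name `snapshot_pins` is made up for this illustration.

```python
from importlib import metadata


def snapshot_pins():
    """Return sorted 'name==version' strings for every installed distribution."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with missing or broken metadata
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


pins = snapshot_pins()
print(f"{len(pins)} pinned packages")
for line in pins[:5]:
    print(line)
```

Even a small data science environment usually produces a much longer list than the handful of packages you asked for, which is why the pins are generated rather than written by hand.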



First, a cell that lets us reload without having to kill the kernel.

In [None]:
%load_ext autoreload
%autoreload 2

Now, try the following imports.

In [None]:
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
%matplotlib inline

Follow the instructions found in `reference/easydata/conda-environment.md` to update the environment to include all of the above imports (or troubleshoot it if necessary). 
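
If it is not obvious which of the imports are failing, a quick check along these lines can help. It is only a sketch: `missing_modules` is a hypothetical helper, not part of Easydata, and it only tests whether each top-level module can be found, not whether the installed version is the right one.

```python
from importlib.util import find_spec


def missing_modules(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if find_spec(name) is None]


# Top-level modules behind the imports above.
required = ["sklearn", "matplotlib", "pandas", "seaborn"]
missing = missing_modules(required)
print("Missing:", missing if missing else "none")
```

Note that the import name and the package name can differ: `sklearn` is installed as `scikit-learn`, so that is the name to add to the environment file.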

## Local Paths

Easydata makes all paths relative to the base project path using the `pathlib` library and a local configuration file: `catalog/config.ini`. By the way, `pathlib` is also an awesome way to manage paths across platforms.
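
The general idea can be sketched with the standard library alone. Everything specific here is an assumption made for illustration: the `[Paths]` section, the option names, and the `load_paths` helper are hypothetical, and the real `catalog/config.ini` and `src.paths` may work differently.

```python
from configparser import ConfigParser, ExtendedInterpolation
from pathlib import Path

# Hypothetical config contents; the real catalog/config.ini may be laid out differently.
EXAMPLE_CONFIG = """
[Paths]
data_path = ${project_path}/data
raw_data_path = ${data_path}/raw
"""


def load_paths(config_text, project_path):
    """Resolve every entry in the [Paths] section relative to project_path."""
    parser = ConfigParser(
        defaults={"project_path": str(project_path)},
        interpolation=ExtendedInterpolation(),
    )
    parser.read_string(config_text)
    return {key: Path(value) for key, value in parser["Paths"].items()}


# Path.cwd() stands in for the project root in this sketch.
example_paths = load_paths(EXAMPLE_CONFIG, project_path=Path.cwd())
for name, path in example_paths.items():
    print(name, "->", path)
```

Because every entry is derived from `project_path`, moving the project to another machine only changes one value, and nothing in the code needs editing.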

In [None]:
from src import paths

In [None]:
paths

Modify `catalog/config.ini` to include the path to our `quest` directory. That is, we want `paths['quest_path']` to be:

In [None]:
paths['project_path'] / 'quest'

Run `make env_challenge` to complete this Challenge.