In [11]:
from IPython.display import Markdown, display

# Render the course header stored in header.md (path relative to the working directory)
display(Markdown(filename="header.md"))

<div>
    <img src="images/emlyon.png" style="height:60px; float:left; padding-right:10px; margin-top:5px" />
    <span>
        <h1 style="padding-bottom:5px;"> AI Booster Week 02 - Python for Data Science </h1>
        <a href="https://masters.em-lyon.com/fr/msc-in-data-science-artificial-intelligence-strategy">[Emlyon]</a> MSc in Data Science & Artificial Intelligence Strategy (DSAIS) <br/>
         Paris | © Antoine SCHERRER
    </span>
</div>

Please make sure you have a working installation of Jupyter Notebook or JupyterLab, with Python 3.6+ up and running.

## Naming conventions

Since we will implement functions that are already available in the Python standard library or in other libraries, you must prefix every function name with `msds_`.

For instance, the function implementing the mean should be named `msds_mean`.

For every function you write, you will also need to write a test function named `test_msds_[function_name]`.

For instance, the test function for the mean will be `test_msds_mean`.

All function names should be in snake case (no camel case!).

When creating classes, follow these rules:
 - class names should be in camel case
 - method names should be in snake case
 - attribute names should be in snake case
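A minimal sketch of these conventions, assuming the tests are written with `unittest` (which the notebook imports further down). `msds_mean` and `TestMsdsMean` are illustrative names chosen to follow the rules above. The `argv` and `exit` arguments keep `unittest.main` from parsing Jupyter's own command-line arguments and from stopping the kernel once the tests have run.

```python
import unittest


def msds_mean(values):
    """Arithmetic mean of an iterable of numbers (snake case, msds_ prefix)."""
    data = list(values)
    if not data:
        raise ValueError("msds_mean requires at least one data point")
    return sum(data) / len(data)


class TestMsdsMean(unittest.TestCase):
    """Class name in camel case; test method named test_msds_<function_name>."""

    def test_msds_mean(self):
        self.assertEqual(msds_mean([1, 2, 3, 4]), 2.5)
        self.assertEqual(msds_mean((5,)), 5.0)
        with self.assertRaises(ValueError):
            msds_mean([])


# In a notebook: ignore Jupyter's argv and do not exit the kernel after running.
unittest.main(argv=["ignored"], exit=False)
```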

## Exercise difficulty

Every exercise is prefixed with an indication of its difficulty:
 - [easy]: very easy exercises
 - [moderate]: intermediate-level exercises
 - [advanced]: exercises for advanced students

Advanced exercises are not mandatory.


In [7]:
# Initial imports
import unittest
import math

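# import_ipynb lets Python import .ipynb notebooks as if they were modules
# (Session_01.ipynb must be importable, e.g. in the same directory).
# This imports all functions defined in the Session_01 notebook!
import import_ipynb
from Session_01 import *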

# Session 01 - Introduction - Practice

Solving **all** questions from Session_01 is mandatory before attempting these practice exercises.

### [easy] write a function that computes the geometric mean of an iterable given as a parameter.

The geometric mean is defined as:

$\Large {\displaystyle \left(\prod_{i=1}^{n} X_i\right)^{1/n}}$

where $X_i$ is the $i$-th data point and $n$ is the number of data points.
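One possible sketch, assuming all values are strictly positive (data containing zero or negative values needs special handling). It averages logarithms instead of multiplying the values directly: this is mathematically equivalent to the formula above but avoids overflow or underflow of the running product on long data sets. The name `msds_geometric_mean` follows the naming conventions; the comparison with `statistics.geometric_mean` requires Python 3.8+.

```python
import math
import statistics  # statistics.geometric_mean requires Python 3.8+


def msds_geometric_mean(values):
    """Geometric mean of an iterable of strictly positive numbers.

    Computes exp(mean(log(x))), which equals (prod x_i) ** (1 / n)
    but avoids overflow/underflow of the running product.
    """
    data = list(values)
    if not data:
        raise ValueError("msds_geometric_mean requires at least one data point")
    if any(x <= 0 for x in data):
        raise ValueError("this sketch only handles strictly positive values")
    return math.exp(sum(math.log(x) for x in data) / len(data))


def test_msds_geometric_mean():
    # (2 * 8) ** (1 / 2) == 4, up to floating-point rounding
    assert math.isclose(msds_geometric_mean([2, 8]), 4.0)
    data = [1.5, 3.2, 7.9, 12.0, 0.4]
    assert math.isclose(msds_geometric_mean(data), statistics.geometric_mean(data))


test_msds_geometric_mean()
```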


In [1]:
## YOUR CODE HERE

### [easy] write a function that computes the harmonic mean of an iterable given as a parameter.

The harmonic mean is defined as:

$\Large {\displaystyle \frac{n}{\sum_{i=1}^{n} \frac{1}{X_i}}}$

where $X_i$ is the $i$-th data point and $n$ is the number of data points.
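A direct translation of the formula above, sketched under the assumption that every value is strictly positive (a zero would make $1/X_i$ undefined). The name `msds_harmonic_mean` follows the naming conventions; `statistics.harmonic_mean` is used only as a reference in the test.

```python
import statistics  # reference implementation used in the test only


def msds_harmonic_mean(values):
    """Harmonic mean n / sum(1 / x_i) of an iterable of strictly positive numbers."""
    data = list(values)
    if not data:
        raise ValueError("msds_harmonic_mean requires at least one data point")
    if any(x <= 0 for x in data):
        raise ValueError("this sketch only handles strictly positive values")
    return len(data) / sum(1 / x for x in data)


def test_msds_harmonic_mean():
    # 3 / (1/1 + 1/4 + 1/4) = 3 / 1.5 = 2
    assert msds_harmonic_mean([1, 4, 4]) == 2.0
    data = [2.5, 3.0, 10.0, 40.0]
    assert abs(msds_harmonic_mean(data) - statistics.harmonic_mean(data)) < 1e-12


test_msds_harmonic_mean()
```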


In [10]:
## YOUR CODE HERE

### [moderate] write a function that computes all data needed for a box plot

The function should return two lists. The first list contains the values needed to draw a box plot, in this order, where $IQR = Q3 - Q1$ is the interquartile range:
 - lower fence ($Q1 - 1.5\times IQR$)
 - 1st quartile ($Q1$)
 - median ($Q2$)
 - 3rd quartile ($Q3$)
 - upper fence ($Q3 + 1.5\times IQR$)

The second list should contain all outliers (values $v$ with $v <$ lower fence or $v >$ upper fence).
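Quartiles can be computed with several conventions, so this is only a sketch under one assumption: linear interpolation between the closest ranks (the convention NumPy's `percentile` uses by default). The helper `msds_quantile` is hypothetical; if Session_01 already provides a quantile function, use it instead.

```python
import math


def msds_quantile(values, q):
    """q-th quantile (0 <= q <= 1), linear interpolation between closest ranks."""
    data = sorted(values)
    if not data:
        raise ValueError("msds_quantile requires at least one data point")
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    position = (len(data) - 1) * q
    lower = math.floor(position)
    upper = min(lower + 1, len(data) - 1)
    fraction = position - lower
    return data[lower] + (data[upper] - data[lower]) * fraction


def msds_boxplot(values):
    """Return ([lower fence, Q1, median, Q3, upper fence], outliers)."""
    data = sorted(values)
    q1 = msds_quantile(data, 0.25)
    q2 = msds_quantile(data, 0.50)
    q3 = msds_quantile(data, 0.75)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return [lower_fence, q1, q2, q3, upper_fence], outliers


summary, outliers = msds_boxplot([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(summary)   # [-3.5, 3.25, 5.5, 7.75, 14.5]
print(outliers)  # [100]
```

Plotting libraries often draw whiskers at the most extreme data points inside the fences rather than at the fences themselves, so a drawn box plot may not show these exact fence values.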

In [None]:
## YOUR CODE HERE

### [moderate] write a function that computes all deciles

The function should return the list of the 9 deciles (D1, ..., D9).

Write a test function that compares your results with those of a statistics package.

Use the `salary.csv` and `heights_weights.csv` datasets to check your results.
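A sketch of the decile function and its test, reusing a hypothetical linear-interpolation helper `msds_quantile` (repeated here so the cell runs on its own). The reference is `statistics.quantiles` (Python 3.8+) with `method="inclusive"`, which uses the same interpolation; with its default `"exclusive"` method, or with other packages' conventions, results can legitimately differ.

```python
import math
import statistics  # statistics.quantiles requires Python 3.8+


def msds_quantile(values, q):
    """q-th quantile (0 <= q <= 1), linear interpolation between closest ranks."""
    data = sorted(values)
    if not data:
        raise ValueError("msds_quantile requires at least one data point")
    position = (len(data) - 1) * q
    lower = math.floor(position)
    upper = min(lower + 1, len(data) - 1)
    return data[lower] + (data[upper] - data[lower]) * (position - lower)


def msds_deciles(values):
    """Return the 9 deciles D1 ... D9 of the data."""
    data = sorted(values)
    return [msds_quantile(data, k / 10) for k in range(1, 10)]


def test_msds_deciles():
    data = [12, 7, 3, 25, 8, 19, 5, 30, 14, 1, 22]
    result = msds_deciles(data)
    # The "inclusive" method interpolates the same way as msds_quantile.
    expected = statistics.quantiles(data, n=10, method="inclusive")
    assert len(result) == 9
    for got, exp in zip(result, expected):
        assert math.isclose(got, exp), (got, exp)


test_msds_deciles()
```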

In [None]:
## YOUR CODE HERE

### [moderate] Application to data

Using the functions you implemented today, perform a basic statistical description of the `highest_mountains.csv`, `salary.csv` and `heights_weights.csv` datasets.
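A possible structure for this description, sketched with the standard `csv` module. The helpers `msds_read_column` and `msds_describe` are hypothetical, the in-memory CSV and its column names are made up (check the header row of each real file), and the `statistics` functions are placeholders meant to be replaced by your own `msds_*` functions.

```python
import csv
import io
import statistics


def msds_read_column(lines, column):
    """Return the values of one numeric CSV column as floats.

    `lines` can be an open file or any iterable of CSV text lines;
    rows where the column is empty are skipped.
    """
    reader = csv.DictReader(lines)
    return [float(row[column]) for row in reader if row[column].strip()]


def msds_describe(values, stats):
    """Apply every (name -> function) entry of `stats` to the same data."""
    data = list(values)
    return {name: func(data) for name, func in stats.items()}


# Tiny in-memory CSV standing in for a real file; the column names are made up.
sample_csv = io.StringIO("name,height\nA,170\nB,182\nC,165\nD,178\n")
heights = msds_read_column(sample_csv, "height")

# Reference functions as placeholders: swap in your own msds_* implementations.
summary = msds_describe(heights, {
    "count": len,
    "min": min,
    "max": max,
    "mean": statistics.mean,
    "median": statistics.median,
})
print(summary)
```

For the real datasets, open each file with `open(path, newline="")` and pass the file object to `msds_read_column`, using a column name taken from that file's header row.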

In [None]:
## YOUR CODE HERE

### [advanced] Deciles, percentiles, etc.

Write a function that takes a data set and an integer `P` as parameters.
`P` is the number of segments into which we want to slice our data (4 means quartiles, 10 deciles, 100 percentiles).
The function should return all `P - 1` cut values.
It should also print `P` messages like these, replacing each `[...]` with the correct value:
```
[...]% of data values are below [...]
[...]% of data values are between [...] and [...]
...
[...]% of data values are above [...]
```
For instance, if `P=4`, the output should look like this (with `[Q1]`, `[Q2]` and `[Q3]` replaced by actual values):
```
25% of data values are below [Q1]
25% of data values are between [Q1] and [Q2]
25% of data values are between [Q2] and [Q3]
25% of data values are above [Q3]
```

Validate your function on the weights dataset.
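A sketch that follows the below/between/above message template, assuming the same linear-interpolation convention as a hypothetical `msds_quantile` helper (included so the cell runs on its own). The printed percentage is the nominal share `100 / P`; on small samples or data with many ties, the actual fraction of values in each segment can differ.

```python
import math


def msds_quantile(values, q):
    """q-th quantile (0 <= q <= 1), linear interpolation between closest ranks."""
    data = sorted(values)
    if not data:
        raise ValueError("msds_quantile requires at least one data point")
    position = (len(data) - 1) * q
    lower = math.floor(position)
    upper = min(lower + 1, len(data) - 1)
    return data[lower] + (data[upper] - data[lower]) * (position - lower)


def msds_quantiles(values, p):
    """Return the p - 1 cut points splitting the data into p segments, and print a summary."""
    if p < 2:
        raise ValueError("p must be at least 2")
    data = sorted(values)
    cuts = [msds_quantile(data, k / p) for k in range(1, p)]
    share = 100 / p  # nominal percentage of data in each segment
    print(f"{share:g}% of data values are below {cuts[0]:g}")
    for left, right in zip(cuts, cuts[1:]):
        print(f"{share:g}% of data values are between {left:g} and {right:g}")
    print(f"{share:g}% of data values are above {cuts[-1]:g}")
    return cuts


quartiles = msds_quantiles([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], 4)
# 25% of data values are below 3.25
# 25% of data values are between 3.25 and 5.5
# 25% of data values are between 5.5 and 7.75
# 25% of data values are above 7.75
```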

In [None]:
## YOUR CODE HERE

### [advanced] Compute empirical critical values

When performing statistical tests or building confidence intervals, you will sometimes need to compute the critical values for a given significance level $\alpha$ (typically $\alpha=0.05$).

Using the function from the previous question, write a function that computes these critical values (lower and upper) for two-tailed tests: the lower critical value $c_l$ is such that $P(X < c_l) = \alpha / 2$, and the upper critical value $c_h$ is such that $P(X < c_h) = 1 - \alpha / 2$.
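Empirically, $c_l$ and $c_h$ are simply the $\alpha/2$ and $1 - \alpha/2$ quantiles of the data. The sketch below assumes the same hypothetical linear-interpolation helper `msds_quantile` and checks the result on a large simulated standard normal sample, where both values should land close to the theoretical value from `statistics.NormalDist` (Python 3.8+), about 1.96 for $\alpha = 0.05$.

```python
import math
import random
import statistics  # statistics.NormalDist requires Python 3.8+


def msds_quantile(values, q):
    """q-th quantile (0 <= q <= 1), linear interpolation between closest ranks."""
    data = sorted(values)
    if not data:
        raise ValueError("msds_quantile requires at least one data point")
    position = (len(data) - 1) * q
    lower = math.floor(position)
    upper = min(lower + 1, len(data) - 1)
    return data[lower] + (data[upper] - data[lower]) * (position - lower)


def msds_critical_values(values, alpha=0.05):
    """Empirical two-tailed critical values (c_l, c_h) at significance level alpha."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be between 0 and 1")
    data = sorted(values)
    return msds_quantile(data, alpha / 2), msds_quantile(data, 1 - alpha / 2)


# Simulated standard normal sample (fixed seed for reproducibility).
random.seed(42)
sample = [random.gauss(0, 1) for _ in range(100_000)]
c_low, c_high = msds_critical_values(sample, alpha=0.05)

# Theoretical upper critical value of the standard normal distribution.
theoretical = statistics.NormalDist().inv_cdf(1 - 0.05 / 2)
print(c_low, c_high, theoretical)
```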


In [None]:
## YOUR CODE HERE

## Object-oriented programming

### Q6 [advanced] Convert all your functions and organize them in classes using OOP

The idea is that, using OOP, you can build your own statistics package tailored to your needs.
Think carefully about how you will organize your package and how you want to use it.
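One possible layout, not the only one: a class wrapping a data set, with each statistic as a method. `MsdsSample` is a hypothetical name; in this sketch the `msds_` prefix moves to the class name, since methods are already namespaced by their class, which is a design choice you may prefer to handle differently. The data is sorted once in `__init__`, so every order statistic reuses it.

```python
import math


class MsdsSample:
    """A data set plus the statistics computed on it (hypothetical design).

    Class name in camel case; methods and attributes in snake case.
    """

    def __init__(self, values):
        self.values = sorted(values)  # sorted once, reused by the quantile methods
        if not self.values:
            raise ValueError("MsdsSample requires at least one data point")

    def size(self):
        return len(self.values)

    def mean(self):
        return sum(self.values) / self.size()

    def quantile(self, q):
        """q-th quantile (0 <= q <= 1), linear interpolation between closest ranks."""
        if not 0 <= q <= 1:
            raise ValueError("q must be between 0 and 1")
        position = (self.size() - 1) * q
        lower = math.floor(position)
        upper = min(lower + 1, self.size() - 1)
        fraction = position - lower
        return self.values[lower] + (self.values[upper] - self.values[lower]) * fraction

    def median(self):
        return self.quantile(0.5)

    def deciles(self):
        return [self.quantile(k / 10) for k in range(1, 10)]

    def geometric_mean(self):
        # values are sorted, so the first one is the minimum
        if self.values[0] <= 0:
            raise ValueError("geometric mean needs strictly positive values")
        return math.exp(sum(math.log(x) for x in self.values) / self.size())


sample = MsdsSample([4, 1, 3, 2])
print(sample.mean(), sample.median())  # 2.5 2.5
```

Adding a new statistic then means adding a method, and every method automatically shares the same cleaned, sorted data.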


In [2]:
## YOUR CODE HERE