# Coding exercises
Exercises 1-3 are thought exercises that don't require coding. If you need a Python crash-course/refresher, work through the [`python_101.ipynb`](./python_101.ipynb) notebook in chapter 1.

## Exercise 4: Generate the data by running this cell
This will give you a list of numbers to work with in the remaining exercises.

In [57]:
import random

random.seed(0)
salaries = [round(random.random()*1000000, -3) for _ in range(100)]

salaries = sorted(salaries)


## Exercise 5: Calculating statistics and verifying
### mean

In [41]:
def mean(data):
    return sum(data)/len(data)

mean(salaries)


585690.0

### median

In [42]:
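def median(data):
    data = sorted(data)  # the index arithmetic below assumes ordered data
    index = len(data) // 2
    if len(data) % 2:
        # odd length: the middle element is the median
        return data[index]
    # even length: average the two middle elements
    return (data[index - 1] + data[index]) / 2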
    

median(salaries)

589000.0

### mode

In [43]:
from collections import Counter
def mode(data):
    c = Counter(data)
    return c.most_common(1)[0][0]
    
mode(salaries)

477000.0

### sample variance
Remember to use Bessel's correction.

In [101]:
def sample_variance(data):
    n = len(data)
    mean_s = mean(data)
    deviations = [(a - mean_s)**2 for a in data]
    variance = sum(deviations) / (n - 1)
    return variance

sample_variance(salaries)


70664054444.44444
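
To see what Bessel's correction changes, here is a minimal sketch on a small made-up dataset (the `values` list and the variable names are assumptions for illustration, not part of the exercise data). It compares dividing by `n` with dividing by `n - 1` and checks both against the standard library's `statistics.pvariance` and `statistics.variance`.

```python
import math
import statistics

# Hypothetical toy data chosen so the arithmetic is easy to follow by hand
values = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(values)
center = sum(values) / n                                # 5.0
squared_devs = sum((v - center) ** 2 for v in values)   # 32.0

population_variance = squared_devs / n        # divides by n
sample_variance_toy = squared_devs / (n - 1)  # Bessel's correction: divides by n - 1

# The standard library exposes both conventions
assert math.isclose(population_variance, statistics.pvariance(values))
assert math.isclose(sample_variance_toy, statistics.variance(values))
```

The sample variance is always the larger of the two, and the gap shrinks as `n` grows.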

### sample standard deviation
Remember to use Bessel's correction.

In [45]:
import math
def std_dev(data):
    variance = sample_variance(data)
    stdev = math.sqrt(variance)
    return stdev

std_dev(salaries)

265827.1138
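
One way to do the verifying part of this exercise is to compare hand-rolled results against Python's built-in `statistics` module. The sketch below regenerates the Exercise 4 data so it runs on its own; the `manual_*` names and the `checks` dictionary are introduced here for illustration only. It assumes Python 3.8 or later, where `statistics.mode` returns the first mode encountered when there are ties.

```python
import math
import random
import statistics
from collections import Counter

# Regenerate the Exercise 4 data so this block is self-contained
random.seed(0)
salaries = sorted(round(random.random() * 1000000, -3) for _ in range(100))

n = len(salaries)
manual_mean = sum(salaries) / n
manual_median = (salaries[n // 2 - 1] + salaries[n // 2]) / 2  # n is even here
manual_mode = Counter(salaries).most_common(1)[0][0]
manual_variance = sum((x - manual_mean) ** 2 for x in salaries) / (n - 1)
manual_stdev = math.sqrt(manual_variance)

# Pair each manual result with the standard library's answer.
# statistics.variance and statistics.stdev both use the n - 1 divisor.
checks = {
    "mean": (manual_mean, statistics.mean(salaries)),
    "median": (manual_median, statistics.median(salaries)),
    "mode": (manual_mode, statistics.mode(salaries)),
    "variance": (manual_variance, statistics.variance(salaries)),
    "stdev": (manual_stdev, statistics.stdev(salaries)),
}

for name, (ours, reference) in checks.items():
    print(name, math.isclose(ours, reference))
```

Using `math.isclose` rather than `==` avoids false alarms from floating-point rounding in the last digits.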

## Exercise 6: Calculating more statistics
### range

In [46]:
def range_stat(data):
    return max(data) - min(data)

range_stat(salaries)

995000.0

### coefficient of variation
Make sure to use the sample standard deviation.

In [47]:
def coef_var(data):
    return std_dev(data) / mean(data)

coef_var(salaries)

0.45387

### interquartile range

In [48]:
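def inter_quartile(data):
    data = sorted(data)  # quartiles are only meaningful on ordered data
    half = len(data) // 2
    # median-of-halves quartiles; for odd lengths the middle value is left out of both halves
    lowerQ = median(data[:half])
    upperQ = median(data[len(data) - half:])
    return upperQ - lowerQ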

inter_quartile(salaries)

417500.0
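
The median-of-halves approach to quartiles is easier to follow on a tiny example. This is a minimal sketch using a made-up list (`toy`) and a hypothetical helper (`median_sorted`); neither is part of the exercise data.

```python
def median_sorted(values):
    """Median of an already sorted list (hypothetical helper for this sketch)."""
    mid = len(values) // 2
    if len(values) % 2:
        return values[mid]
    return (values[mid - 1] + values[mid]) / 2

toy = [1, 2, 3, 4, 5, 6, 7, 8]   # already sorted, even length
half = len(toy) // 2

q1 = median_sorted(toy[:half])   # median of [1, 2, 3, 4] -> 2.5
q3 = median_sorted(toy[half:])   # median of [5, 6, 7, 8] -> 6.5
iqr = q3 - q1                    # 6.5 - 2.5 = 4.0
```

Other tools may use interpolation-based quartile definitions, so their results can differ slightly from this convention.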

### quartile coefficient of dispersion

In [49]:
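def quar_coef(data):
    data = sorted(data)  # quartiles are only meaningful on ordered data
    half = len(data) // 2
    # median-of-halves quartiles; for odd lengths the middle value is left out of both halves
    lowerQ = median(data[:half])
    upperQ = median(data[len(data) - half:])
    return (upperQ - lowerQ) / (upperQ + lowerQ)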

quar_coef(salaries)

0.3417928776094965
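
Like the coefficient of variation, the quartile coefficient of dispersion is unit-free: rescaling the data by a positive constant should leave it unchanged. The sketch below illustrates this on a made-up list, using a hypothetical helper (`quartile_dispersion`) built on `statistics.median` and the median-of-halves quartiles.

```python
import math
import statistics

def quartile_dispersion(values):
    """(Q3 - Q1) / (Q3 + Q1) using median-of-halves quartiles (illustrative helper)."""
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = statistics.median(ordered[:half])
    q3 = statistics.median(ordered[len(ordered) - half:])
    return (q3 - q1) / (q3 + q1)

toy = [1, 2, 3, 4, 5, 6, 7, 8]
in_thousands = [v * 1000 for v in toy]   # same data expressed in different units

# Q1 = 2.5 and Q3 = 6.5, so the coefficient is 4 / 9 for both versions
assert math.isclose(quartile_dispersion(toy), 4 / 9)
assert math.isclose(quartile_dispersion(toy), quartile_dispersion(in_thousands))
```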

## Exercise 7: Scaling data
### min-max scaling

In [111]:
def min_max_scaling(data):
    # compute the minimum and range once, and loop over the data itself
    # rather than a hard-coded 100 elements
    minimum = min(data)
    data_range = range_stat(data)
    return [(x - minimum) / data_range for x in data]

normalized = min_max_scaling(salaries)
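
A quick sanity check for min-max scaling is that the smallest value maps to 0, the largest maps to 1, and everything else falls in between. The sketch below regenerates the Exercise 4 data so it runs on its own; the names `low`, `high`, and `scaled` are introduced here for illustration.

```python
import random

# Regenerate the Exercise 4 data so this block is self-contained
random.seed(0)
salaries = sorted(round(random.random() * 1000000, -3) for _ in range(100))

low, high = min(salaries), max(salaries)
scaled = [(s - low) / (high - low) for s in salaries]

# The smallest salary maps to 0.0 and the largest to 1.0
print(min(scaled), max(scaled))

assert all(0.0 <= v <= 1.0 for v in scaled)
assert scaled == sorted(scaled)  # scaling preserves the ordering of the data
```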


### standardizing

In [114]:
def z_scores(data):
    # compute the mean and sample standard deviation once, and loop over
    # the data itself rather than a hard-coded 100 elements
    mean_s = mean(data)
    stdev = std_dev(data)
    return [(x - mean_s) / stdev for x in data]

standardized = z_scores(salaries)
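
Standardized data should have a mean of 0 and a sample standard deviation of 1, up to floating-point rounding. The sketch below checks this with the `statistics` module on a regenerated copy of the Exercise 4 data; the names `center`, `spread`, and `z` are introduced here for illustration.

```python
import math
import random
import statistics

# Regenerate the Exercise 4 data so this block is self-contained
random.seed(0)
salaries = sorted(round(random.random() * 1000000, -3) for _ in range(100))

center = statistics.mean(salaries)
spread = statistics.stdev(salaries)   # sample standard deviation (n - 1 divisor)
z = [(s - center) / spread for s in salaries]

# abs_tol is needed because relative tolerance is meaningless when comparing to 0
assert math.isclose(statistics.mean(z), 0.0, abs_tol=1e-9)
assert math.isclose(statistics.stdev(z), 1.0)
```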


## Exercise 8: Calculating covariance and correlation
### covariance

In [115]:
# Sample covariance: sum((x - mean(X)) * (y - mean(Y))) / (n - 1)
def cov(X, Y):
    mean_X = mean(X)
    mean_Y = mean(Y)
    # products of paired deviations from each mean
    products = [(x - mean_X) * (y - mean_Y) for x, y in zip(X, Y)]
    return sum(products) / (len(X) - 1)

cov(normalized,standardized)
    

0.267162928467176

### Pearson correlation coefficient ($\rho$)

In [118]:
def corr(X, Y):
    # covariance divided by the product of the sample standard deviations
    return cov(X, Y) / (std_dev(X) * std_dev(Y))

corr(normalized,standardized)


1.0000000000000004
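
The result is 1 (up to floating-point rounding in the last digit) because `normalized` and `standardized` are both increasing linear transformations of the same data. As a minimal sketch of that idea, the block below uses a made-up list (`base`) and a hypothetical helper (`pearson`) to show that a linear transformation with a positive slope gives a correlation of 1, and one with a negative slope gives -1.

```python
import math

def pearson(xs, ys):
    """Pearson correlation using sample (n - 1) statistics; illustrative helper."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov_xy / (sd_x * sd_y)

base = [3.0, 8.0, 1.0, 9.0, 4.0]          # hypothetical toy data
rescaled = [2 * b + 10 for b in base]     # linear with positive slope
flipped = [-0.5 * b + 1 for b in base]    # linear with negative slope

# Compare with a tolerance: rounding can leave the result a hair off +/-1
assert math.isclose(pearson(base, rescaled), 1.0)
assert math.isclose(pearson(base, flipped), -1.0)
```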
