# Coding exercises
Exercises 1-3 are thought exercises that don't require coding. If you need a Python crash-course/refresher, work through the [`python_101.ipynb`](./python_101.ipynb) notebook in chapter 1.

## Exercise 4: Generate the data by running this cell
This will give you a list of numbers to work with in the remaining exercises.

In [1]:
import random

random.seed(0)
salaries = [round(random.random()*1000000, -3) for _ in range(100)]

## Exercise 5: Calculating statistics and verifying
### mean

In [116]:
from statistics import mean

sum(salaries) / len(salaries) == mean(salaries)

True

### median

In [119]:
from statistics import median

find_median(salaries) == median(salaries)

True

### mode

In [120]:
from statistics import mode
from collections import Counter

Counter(salaries).most_common(1)[0][0] == mode(salaries)

True

### sample variance
Remember to use Bessel's correction.

In [121]:
from statistics import variance

sum([(x - sum(salaries) / len(salaries))**2 for x in salaries]) / (len(salaries) - 1) == variance(salaries)

True

### sample standard deviation
Remember to use Bessel's correction.

In [122]:
from statistics import stdev
import math

math.sqrt(sum([(x - sum(salaries) / len(salaries))**2 for x in salaries]) / (len(salaries) - 1)) == stdev(salaries)

True

## Exercise 6: Calculating more statistics
### range

In [76]:
range = max(salaries)-min(salaries)
print(range)

995000.0


### coefficient of variation
Make sure to use the sample standard deviation.

In [85]:
print(res/mean)

0.4515949370793889


### interquartile range

In [111]:
import math

def quantile(x, pct):
    x.sort()
    index = (len(x) + 1) * pct - 1
    if len(x) % 2:
        # odd, so grab the value at index
        return x[int(index)]
    else:
        return (x[math.floor(index)] + x[math.ceil(index)]) / 2
sum([x < quantile(salaries, 0.25) for x in salaries]) / len(salaries) == 0.25
sum([x < quantile(salaries, 0.75) for x in salaries]) / len(salaries) == 0.75
q3, q1 = quantile(salaries, 0.75), quantile(salaries, 0.25)
iqr = q3 - q1
iqr

417500.0

### quartile coefficent of dispersion

In [112]:
iqr / (q1 + q3)

0.3417928776094965

## Exercise 7: Scaling data
### min-max scaling

In [110]:
min_salary, max_salary = min(salaries), max(salaries)
salary_range = max_salary - min_salary

min_max_scaled = [(x - min_salary) / salary_range for x in salaries]
min_max_scaled[:5]


[0.8472361809045226,
 0.7608040201005025,
 0.4221105527638191,
 0.2592964824120603,
 0.5125628140703518]

### standardizing

In [113]:
from statistics import mean, stdev

mean_salary, std_salary = mean(salaries), stdev(salaries)

standardized = [(x - mean_salary) / std_salary for x in salaries]
standardized[:5]

[-2.199512275430514,
 -2.150608309943509,
 -1.9023266390094862,
 -1.8948029520114855,
 -1.8647082040194827]

## Exercise 8: Calculating covariance and correlation
### covariance

In [114]:
from statistics import mean

running_total = [
    (x - mean(min_max_scaled)) * (y - mean(standardized))
    for x, y in zip(min_max_scaled, standardized)
]

cov = mean(running_total)
cov

0.014510711670882345

### Pearson correlation coefficient ($\rho$)

In [115]:
from statistics import stdev
cov / (stdev(min_max_scaled) * stdev(standardized))

0.054314091233152346

<hr>
<div style="overflow: hidden; margin-bottom: 10px;">
    <div style="float: left;">
        <a href="./python_101.ipynb">
            <button>Python 101</button>
        </a>
    </div>
    <div style="float: right;">
        <a href="../../solutions/ch_01/solutions.ipynb">
            <button>Solutions</button>
        </a>
        <a href="../ch_02/1-pandas_data_structures.ipynb">
            <button>Chapter 2 &#8594;</button>
        </a>
    </div>
</div>
<hr>