# Coding exercises
Exercises 1-3 are thought exercises that don't require coding. If you need a Python crash course or refresher, work through the [`python_101.ipynb`](./python_101.ipynb) notebook in chapter 1.

## Exercise 4: Generate the data by running this cell
This will give you a list of numbers to work with in the remaining exercises.

In [1]:
import random

random.seed(0)
salaries = [round(random.random()*1000000, -3) for _ in range(100)]

## Exercise 5: Calculating statistics and verifying
### mean

In [10]:
sample_mean = sum(salaries) / len(salaries)
sample_mean

585690.0
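
The exercise asks for each statistic to be verified, so the manual mean can be checked against `statistics.mean`, in the same way the median is checked against `statistics.median`. A minimal sketch, regenerating `salaries` with the Exercise 4 seed so that it runs on its own:

```python
import random
from statistics import mean

# Regenerate the data exactly as in Exercise 4
random.seed(0)
salaries = [round(random.random() * 1000000, -3) for _ in range(100)]

# Manual calculation: sum of the values divided by the number of values
sample_mean = sum(salaries) / len(salaries)

# The library result should agree with the manual one
print(sample_mean, mean(salaries))
```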

### median

In [5]:
import math

def find_median(x):
    x.sort()  # sorts in place, so the caller's list stays sorted afterwards
    midpoint = (len(x) + 1) / 2 - 1  # subtract 1 because indexing starts at 0
    if len(x) % 2:
        # x has odd number of values
        return x[int(midpoint)]
    else:
        return (x[math.floor(midpoint)] + x[math.ceil(midpoint)]) / 2

In [6]:
find_median(salaries)

589000.0

In [7]:
from statistics import median

median(salaries)

589000.0

### mode

In [9]:
from statistics import mode

mode(salaries)

477000.0
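
For comparison, here is a sketch of a manual mode calculation using `collections.Counter`, which is not part of the original solution. It assumes Python 3.8 or later, where `statistics.mode` returns the first mode encountered when several values tie, so the result can depend on the order of the list.

```python
import random
from collections import Counter
from statistics import mode

# Regenerate the data exactly as in Exercise 4
random.seed(0)
salaries = [round(random.random() * 1000000, -3) for _ in range(100)]

# Count occurrences of each value; most_common(1) gives [(value, count)]
manual_mode, occurrences = Counter(salaries).most_common(1)[0]

print(manual_mode, occurrences)
print(manual_mode == mode(salaries))
```

Because the block works on the freshly generated (unsorted) list, the value it reports may differ from the one above if several salaries share the highest count.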

### sample variance
Remember to use Bessel's correction.

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

In [11]:
sample_variance = sum([(x - sample_mean)**2 for x in salaries]) / (len(salaries) - 1)
sample_variance

70664054444.44444

In [12]:
from statistics import variance

variance(salaries)

70664054444.44444

### sample standard deviation
Remember to use Bessel's correction.

In [17]:
sample_stddev = sample_variance ** 0.5

In [14]:
from statistics import stdev

stdev(salaries)

265827.11382484
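
To see what Bessel's correction changes, the sample standard deviation can be compared with the population version from `statistics.pstdev`. This is a sketch for illustration only; the names `sample_sd`, `population_sd`, and `correction` are introduced here and are not part of the original solution.

```python
import random
from statistics import pstdev, stdev

# Regenerate the data exactly as in Exercise 4
random.seed(0)
salaries = [round(random.random() * 1000000, -3) for _ in range(100)]
n = len(salaries)

sample_sd = stdev(salaries)       # divides by n - 1 (Bessel's correction)
population_sd = pstdev(salaries)  # divides by n

# The two differ by a factor of sqrt(n / (n - 1))
correction = (n / (n - 1)) ** 0.5
print(sample_sd, population_sd * correction)
```

With 100 observations the correction factor is close to 1, so the two results differ only slightly; the gap grows as the sample gets smaller.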

## Exercise 6: Calculating more statistics
### range

In [16]:
sample_range = max(salaries) - min(salaries)
sample_range

995000.0

### coefficient of variation
Make sure to use the sample standard deviation.

$$CV = \frac{s}{\bar{x}}$$

In [18]:
cv = sample_stddev / sample_mean
cv

0.45386998894439035
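
Because the coefficient of variation divides one quantity in the data's units by another, it is unitless. The sketch below illustrates this by rescaling the salaries to thousands and recomputing; the helper `coefficient_of_variation` and the rescaled list are hypothetical names used only for this example.

```python
import random
from statistics import mean, stdev

# Regenerate the data exactly as in Exercise 4
random.seed(0)
salaries = [round(random.random() * 1000000, -3) for _ in range(100)]


def coefficient_of_variation(data):
    """Sample standard deviation divided by the mean."""
    return stdev(data) / mean(data)


# Hypothetical rescaling: express the same salaries in thousands
salaries_in_thousands = [x / 1000 for x in salaries]

cv_original = coefficient_of_variation(salaries)
cv_rescaled = coefficient_of_variation(salaries_in_thousands)
print(cv_original, cv_rescaled)
```

Both values should match up to floating-point error, which is what makes the statistic useful for comparing the spread of datasets measured on different scales.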

### interquartile range

In [19]:
from statistics import quantiles

In [27]:
sample_quantiles = quantiles(salaries)

In [28]:
sample_quantiles

[400500.0, 589000.0, 822250.0]

In [29]:
Q1 = sample_quantiles[0]
Q3 = sample_quantiles[2]

IQR = Q3 - Q1

In [30]:
IQR

421750.0
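
Note that `statistics.quantiles` defaults to `n=4` (quartiles) and `method='exclusive'`. The sketch below, which assumes Python 3.8 or later, compares this with `method='inclusive'` to show that the quartiles, and therefore the IQR, depend on the interpolation method chosen.

```python
import random
from statistics import quantiles

# Regenerate the data exactly as in Exercise 4
random.seed(0)
salaries = [round(random.random() * 1000000, -3) for _ in range(100)]

# Default: treats the data as a sample from a larger population
q_exclusive = quantiles(salaries, n=4, method='exclusive')
# Alternative: treats the data as the whole population
q_inclusive = quantiles(salaries, n=4, method='inclusive')

iqr_exclusive = q_exclusive[2] - q_exclusive[0]
iqr_inclusive = q_inclusive[2] - q_inclusive[0]
print(q_exclusive, iqr_exclusive)
print(q_inclusive, iqr_inclusive)
```

The middle cut point is the median under both methods; only Q1 and Q3 move.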

### quartile coefficient of dispersion

In [32]:
IQR / (Q1 + Q3)

0.34491923941934166

## Exercise 7: Scaling data
### min-max scaling

In [34]:
x_scaled = [(x - min(salaries)) / sample_range for x in salaries]
x_scaled[:5]

[0.0,
 0.01306532663316583,
 0.07939698492462312,
 0.0814070351758794,
 0.08944723618090453]
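
As a quick sanity check on min-max scaling, the scaled values should run from exactly 0 to exactly 1. A minimal sketch follows; the helper `min_max_scale` is a hypothetical name, and it assumes the data are not all identical (otherwise the range is zero).

```python
import random

# Regenerate the data exactly as in Exercise 4
random.seed(0)
salaries = [round(random.random() * 1000000, -3) for _ in range(100)]


def min_max_scale(data):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(data), max(data)
    # Assumes hi != lo; a constant list would cause a division by zero
    return [(x - lo) / (hi - lo) for x in data]


x_scaled = min_max_scale(salaries)
print(min(x_scaled), max(x_scaled))
```

Computing the minimum and maximum once, outside the list comprehension, also avoids rescanning the list for every element.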

### standardizing

In [35]:
x_standardized = [(x - sample_mean) / sample_stddev for x in salaries]
x_standardized[:5]

[-2.199512275430514,
 -2.150608309943509,
 -1.9023266390094862,
 -1.8948029520114855,
 -1.8647082040194827]
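
Standardized data should have a mean of roughly 0 and a sample standard deviation of roughly 1. The sketch below checks this with the `statistics` module; it recomputes the mean and standard deviation so that it runs on its own.

```python
import random
from statistics import mean, stdev

# Regenerate the data exactly as in Exercise 4
random.seed(0)
salaries = [round(random.random() * 1000000, -3) for _ in range(100)]

sample_mean = mean(salaries)
sample_stddev = stdev(salaries)

# Z-score: distance from the mean in units of standard deviations
x_standardized = [(x - sample_mean) / sample_stddev for x in salaries]

# Expect values very close to 0 and 1 (up to floating-point error)
print(mean(x_standardized), stdev(x_standardized))
```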

## Exercise 8: Calculating covariance and correlation
### covariance

In [36]:
import numpy as np

np.cov(x_scaled, x_standardized)

array([[0.07137603, 0.26716293],
       [0.26716293, 1.        ]])

### Pearson correlation coefficient ($\rho$)

In [39]:
np.cov(x_scaled, x_standardized)[0,1]

0.267162928467176

In [42]:
np.cov(x_scaled, x_standardized)[0, 1] / (stdev(x_scaled) * stdev(x_standardized))

1.0000000000000004
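
The correlation is 1 (up to floating-point error) because both scaled versions are increasing linear transformations of the same data. As a cross-check, `np.corrcoef` returns the correlation matrix directly; the sketch below rebuilds both lists so that it runs on its own.

```python
import random
from statistics import mean, stdev

import numpy as np

# Regenerate the data exactly as in Exercise 4
random.seed(0)
salaries = [round(random.random() * 1000000, -3) for _ in range(100)]

lo, hi = min(salaries), max(salaries)
x_scaled = [(x - lo) / (hi - lo) for x in salaries]

m, s = mean(salaries), stdev(salaries)
x_standardized = [(x - m) / s for x in salaries]

# Off-diagonal entry of the 2x2 correlation matrix is Pearson's coefficient
rho = np.corrcoef(x_scaled, x_standardized)[0, 1]
print(rho)
```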
