# Coding exercises
Exercises 1-3 are thought exercises that don't require coding. If you need a Python crash-course/refresher, work through the [`python_101.ipynb`](./python_101.ipynb) notebook in chapter 1.

## Exercise 4: Generate the data by running this cell
This will give you a list of numbers to work with in the remaining exercises.

In [40]:
import random
from collections import Counter
import statistics

random.seed(0)
salaries = [round(random.random()*1000000, -3) for _ in range(100)]
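
Passing a negative `ndigits` to `round()` rounds to the left of the decimal point, so `-3` gives the nearest thousand. A minimal sketch with a made-up value (`raw` is hypothetical and not part of the exercise data):

```python
# Hypothetical value for illustration; not drawn from the exercise data.
raw = 123456.789

# ndigits=-3 rounds to the nearest 1,000 and keeps the float type.
rounded = round(raw, -3)
print(rounded)
```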

## Exercise 5: Calculating statistics and verifying
### mean

In [60]:
mean_salary = sum(salaries) / len(salaries)
print(mean_salary)

from statistics import mean
print(mean(salaries))


585690.0
585690.0


### median

In [68]:
from statistics import median
import math
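def quantile(x, pct):
    # note: this sorts the list in place, so the caller's list is reordered
    x.sort()
    # zero-based position of the requested quantile
    index = (len(x) + 1) * pct - 1
    if len(x) % 2:
        # odd, so grab the value at index
        return x[int(index)]
    else:
        # even, so average the two values on either side of index
        return (x[math.floor(index)] + x[math.ceil(index)]) / 2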

median_salary = quantile(salaries, .5)

print('The statistics median is ' + str(median(salaries)))
print('I calculated it as ' + str(median_salary))



The statistics median is 589000.0
I calculated it as 589000.0


### mode

In [65]:
cnt = Counter()
for salary in salaries:
    cnt[salary] += 1
mode_salary = cnt.most_common(1)[0][0]
mode_salary

477000.0
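
`Counter.most_common(1)` returns only one value, so a tie for the highest count would go unnoticed. A small sketch with toy data (the `values` list is an assumption, not the salary data) shows how `statistics.multimode()`, available in Python 3.8+, surfaces every tied value:

```python
from collections import Counter
from statistics import multimode

# Hypothetical toy data with two values tied for most frequent.
values = [1, 1, 2, 2, 3]

# most_common(1) reports a single value even when there is a tie.
first_mode = Counter(values).most_common(1)[0][0]

# multimode() returns every value that shares the highest count.
all_modes = multimode(values)

print(first_mode, all_modes)
```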

### sample variance
Remember to use Bessel's correction.

In [66]:
from statistics import variance
print(variance(salaries))

sum_of_salaries = 0
for salary in salaries:
    sum_of_salaries += (salary - mean_salary)**2
variance_of_salaries = sum_of_salaries / (len(salaries) - 1)
print(variance_of_salaries)

70664054444.44444
70664054444.44444
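
Bessel's correction divides the sum of squared deviations by $n - 1$ rather than $n$. As a rough illustration on toy data (the `data` list is hypothetical), `statistics.pvariance()` divides by $n$ while `statistics.variance()` applies the correction:

```python
from statistics import pvariance, variance

# Hypothetical toy data; the mean is 5 and the squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

population_var = pvariance(data)  # divides by n = 8
sample_var = variance(data)       # divides by n - 1 = 7 (Bessel's correction)

print(population_var, sample_var)
```

The sample variance is always the larger of the two, and the gap shrinks as the number of observations grows.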


### sample standard deviation
Remember to use Bessel's correction.

In [67]:
from statistics import stdev
print(stdev(salaries))
sd_of_salaries = variance_of_salaries**(1/2)
print(sd_of_salaries)

265827.11382484
265827.11382484


## Exercise 6: Calculating more statistics
### range

In [75]:
salary_range = max(salaries) - min(salaries)
salary_range

995000.0

### coefficient of variation
Make sure to use the sample standard deviation.

In [45]:
cv = sd_of_salaries / mean_salary
cv

0.45386998894439035
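
Because the coefficient of variation divides the standard deviation by the mean, the units cancel. A minimal sketch, assuming made-up salaries and a hypothetical helper named `coefficient_of_variation`, checks that rescaling the data leaves it unchanged:

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)

# Hypothetical salaries in dollars, then the same salaries in thousands of dollars.
dollars = [40000.0, 55000.0, 62000.0, 91000.0]
thousands = [value / 1000 for value in dollars]

cv_dollars = coefficient_of_variation(dollars)
cv_thousands = coefficient_of_variation(thousands)
print(cv_dollars, cv_thousands)
```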

### interquartile range

In [69]:
salaries.sort()
iqr = quantile(salaries, .75) - quantile(salaries, .25)
iqr

417500.0
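
Quantile definitions differ between libraries, so this IQR depends on the convention used by the `quantile()` helper. For comparison, here is a sketch using `statistics.quantiles()` (Python 3.8+) on toy data. Its default `'exclusive'` method interpolates linearly between neighboring values, so it may not agree with the helper on every dataset:

```python
from statistics import quantiles

# Hypothetical toy data: the integers 1 through 9.
data = list(range(1, 10))

# n=4 asks for the three cut points that split the data into quartiles.
q1, q2, q3 = quantiles(data, n=4)
toy_iqr = q3 - q1
print(q1, q3, toy_iqr)
```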

### quartile coefficient of dispersion

In [71]:
cod = iqr / (quantile(salaries, .75) + quantile(salaries, .25))
cod

0.3417928776094965

## Exercise 7: Scaling data
### min-max scaling

In [83]:
min_salary = min(salaries)  # compute the minimum once instead of on every iteration
min_max_salaries = [(salary - min_salary) / salary_range for salary in salaries]
min_max_salaries[:5]

[0.0,
 0.01306532663316583,
 0.07939698492462312,
 0.0814070351758794,
 0.08944723618090453]

### standardizing

In [84]:
z_score_salary = [(salary - mean_salary) / sd_of_salaries for salary in salaries]
z_score_salary[:5]

[-2.199512275430514,
 -2.150608309943509,
 -1.9023266390094862,
 -1.8948029520114855,
 -1.8647082040194827]
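
One way to sanity-check standardized data is to confirm that it has a mean of 0 and a sample standard deviation of 1, up to floating-point error. A minimal sketch on toy data (the `data` list is hypothetical):

```python
from statistics import mean, stdev

# Hypothetical toy data.
data = [3.0, 7.0, 8.0, 12.0, 20.0]

center = mean(data)
spread = stdev(data)  # sample standard deviation
z_scores = [(value - center) / spread for value in data]

print(mean(z_scores), stdev(z_scores))
```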

## Exercise 8: Calculating covariance and correlation
### covariance

In [85]:
from statistics import covariance  # requires Python 3.10+
covariance(min_max_salaries, z_score_salary)

0.2671629284671759

### Pearson correlation coefficient ($\rho$)

In [89]:
from scipy.stats import pearsonr 
pearsonr(min_max_salaries, z_score_salary)

PearsonRResult(statistic=np.float64(1.0), pvalue=np.float64(0.0))
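
A correlation of 1.0 is expected here, since both lists are linear transformations (with positive slope) of the same salaries. As a sketch of the underlying formula, the coefficient is the sample covariance divided by the product of the sample standard deviations. The toy data and the helper name `pearson_r` are assumptions for illustration:

```python
from statistics import mean, stdev

def pearson_r(x, y):
    """Sample covariance divided by the product of the sample standard deviations."""
    x_bar, y_bar = mean(x), mean(y)
    cov = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

# Hypothetical toy data: y is a linear transformation of x with positive slope.
x = [1.0, 2.0, 4.0, 7.0]
y = [3 * value + 2 for value in x]

r = pearson_r(x, y)
print(r)
```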
