# Coding exercises
Exercises 1-3 are thought exercises that don't require coding. If you need a Python crash-course/refresher, work through the [`python_101.ipynb`](./python_101.ipynb) notebook in chapter 1.

## Exercise 4: Generate the data by running this cell
This will give you a list of numbers to work with in the remaining exercises.

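```python
import random

random.seed(0)  # fixed seed, so everyone gets the same 100 numbers
# 100 salaries in [0, 1,000,000], each rounded to the nearest thousand
salaries = [round(random.random() * 1_000_000, -3) for _ in range(100)]
```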
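A negative second argument to `round()` rounds to the left of the decimal point, which is why every generated salary is a whole number of thousands. A small sketch checking that property on a freshly generated list (these checks are illustrative and not part of the exercise):

```python
import random

random.seed(0)
salaries = [round(random.random() * 1_000_000, -3) for _ in range(100)]

# ndigits=-3 rounds to the nearest multiple of 1,000
print(round(123_456.7, -3))  # 123000.0

# every value is a multiple of 1,000 and lies in [0, 1,000,000]
all_thousands = all(s % 1000 == 0 for s in salaries)
in_bounds = all(0 <= s <= 1_000_000 for s in salaries)
print(len(salaries), all_thousands, in_bounds)
```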

## Exercise 5: Calculating statistics and verifying
### mean

```python
from statistics import mean
sum(salaries) / len(salaries) == mean(salaries)
```

True
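Exact `==` happens to hold for this data, but `statistics.mean` sums the values exactly before dividing, while the built-in `sum()` accumulates floating-point rounding error, so the two can disagree in the last bit. A minimal sketch of a hand-rolled mean (the name `mean_by_hand` is hypothetical) compared with a tolerance instead:

```python
import math
from statistics import mean

def mean_by_hand(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)

data = [0.1, 0.2, 0.3]
# compare floats with a tolerance rather than exact equality
print(mean_by_hand(data), mean(data))
print(math.isclose(mean_by_hand(data), mean(data)))
```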

### median

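```python
from statistics import median

salaries.sort()  # sorts in place, so later outputs list salaries in ascending order
# with an even count (100), the median averages the two middle values
(salaries[49] + salaries[50]) / 2 == median(salaries)
```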

True
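Indices 49 and 50 only work because there are exactly 100 values. A sketch of a general median that handles both odd and even counts and leaves the caller's list unsorted (`median_by_hand` is a hypothetical name):

```python
from statistics import median

def median_by_hand(values):
    """Median of a sequence, without sorting the caller's list in place."""
    ordered = sorted(values)  # sorted() returns a new list
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd count: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the two middle values

print(median_by_hand([3, 1, 2]))     # 2
print(median_by_hand([4, 1, 3, 2]))  # 2.5
```

For n = 100, `mid` is 50, so the even branch averages indices 49 and 50, matching the cell above.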

### mode

```python
from collections import Counter
from statistics import mode

# most_common(1) returns [(value, count)] for the most frequent value
Counter(salaries).most_common(1)[0][0] == mode(salaries)
```

True
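Because the salaries are rounded to thousands, several values could tie for the highest count, and a single-value check hides such ties. A sketch that returns every mode (`all_modes` is a hypothetical helper), compared with `statistics.multimode`:

```python
from collections import Counter
from statistics import mode, multimode

def all_modes(values):
    """Every value that shares the highest count, in first-seen order."""
    counts = Counter(values)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]

data = [3, 1, 3, 1, 2]
print(all_modes(data))  # [3, 1]
print(multimode(data))  # [3, 1]
print(mode(data))       # 3 (the first mode encountered, Python 3.8+)
```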

### sample variance
Remember to use Bessel's correction.

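```python
from statistics import mean, variance
avg = mean(salaries)  # compute the mean once rather than once per element
# Bessel's correction: divide by n - 1 (here 99) instead of n
sum((x - avg) ** 2 for x in salaries) / (len(salaries) - 1) == variance(salaries)
```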

True
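Bessel's correction is the only difference between the population and sample variance, so the two differ by exactly the factor n / (n - 1). A sketch on a small illustrative dataset (`sample_variance` is a hypothetical name):

```python
import math
from statistics import pvariance, variance

def sample_variance(values):
    """Sample variance with Bessel's correction (divide by n - 1)."""
    n = len(values)
    avg = sum(values) / n
    return sum((x - avg) ** 2 for x in values) / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, squared deviations sum to 32
n = len(data)
print(pvariance(data))  # divides by n: 32 / 8 = 4
print(variance(data))   # divides by n - 1: 32 / 7, about 4.571
print(math.isclose(variance(data), pvariance(data) * n / (n - 1)))
```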

### sample standard deviation
Remember to use Bessel's correction.

```python
import math
from statistics import stdev, variance
# the sample standard deviation is the square root of the sample variance
math.sqrt(variance(salaries)) == stdev(salaries)
```

True

## Exercise 6: Calculating more statistics
### range

```python
max(salaries) - min(salaries)
```

986000.0

### coefficient of variation
Make sure to use the sample standard deviation.

```python
from statistics import mean, stdev
# coefficient of variation = sample standard deviation / mean
stdev(salaries) / mean(salaries)
```

0.6279103863019512
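Since the coefficient of variation divides the standard deviation by the mean, any change of units cancels out, which is what makes it useful for comparing spread across datasets. A sketch with hypothetical salary figures (the list `dollars` is made up for illustration):

```python
import math
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation relative to the mean (unitless)."""
    return stdev(values) / mean(values)

dollars = [50_000, 62_000, 71_000, 88_000]  # hypothetical salaries in dollars
thousands = [s / 1000 for s in dollars]     # the same salaries in thousands

# rescaling multiplies both stdev and mean by the same factor, so the ratio is unchanged
print(math.isclose(coefficient_of_variation(dollars),
                   coefficient_of_variation(thousands)))
```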

### interquartile range

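```python
salaries.sort()
# nearest-rank quartiles for n = 100: Q1 is the 25th value (index 24), Q3 the 75th (index 74)
q1 = salaries[24]
q3 = salaries[74]
q3 - q1
```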

453000.0
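Indices 24 and 74 correspond to the nearest-rank percentile method, which takes the value at rank ceil(p * n). A sketch of that method for any list length (`nearest_rank` is a hypothetical helper), alongside `statistics.quantiles`, which interpolates and can therefore return slightly different quartiles:

```python
import math
from statistics import quantiles

def nearest_rank(sorted_values, pct):
    """Nearest-rank percentile: the value at rank ceil(pct * n), 1-indexed."""
    rank = math.ceil(pct * len(sorted_values))
    return sorted_values[rank - 1]  # convert the 1-based rank to a 0-based index

values = list(range(1, 101))  # hypothetical sorted data: 1, 2, ..., 100
q1, q3 = nearest_rank(values, 0.25), nearest_rank(values, 0.75)
print(q1, q3, q3 - q1)  # 25 75 50, i.e. indices 24 and 74

# statistics.quantiles interpolates between neighbors (method='exclusive' by default)
print(quantiles(values, n=4))
```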

### quartile coefficient of dispersion

```python
salaries.sort()
q1 = salaries[24]
q3 = salaries[74]
# quartile coefficient of dispersion = (Q3 - Q1) / (Q3 + Q1)
(q3 - q1) / (q3 + q1)
```

0.5072788353863382

## Exercise 7: Scaling data
### min-max scaling

```python
min_salary, max_salary = min(salaries), max(salaries)  # compute once, not per element
min_max_scaling = [(x - min_salary) / (max_salary - min_salary) for x in salaries]
min_max_scaling
```

[0.0,
 0.010141987829614604,
 0.02028397565922921,
 0.02129817444219067,
 0.02535496957403651,
 0.034482758620689655,
 0.037525354969574036,
 0.0436105476673428,
 0.04462474645030426,
 0.04563894523326572,
 0.056795131845841784,
 0.10040567951318459,
 0.10040567951318459,
 0.10141987829614604,
 0.11663286004056796,
 0.12373225152129817,
 0.1460446247464503,
 0.16937119675456389,
 0.17545638945233266,
 0.18052738336713997,
 0.18559837728194725,
 0.1926977687626775,
 0.1997971602434077,
 0.21196754563894524,
 0.2210953346855984,
 0.22210953346855983,
 0.24036511156186613,
 0.2413793103448276,
 0.24442190669371197,
 0.2525354969574036,
 0.2525354969574036,
 0.2616632860040568,
 0.2829614604462475,
 0.29006085192697767,
 0.3133874239350913,
 0.3356997971602434,
 0.3367139959432049,
 0.33874239350912777,
 0.3448275862068966,
 0.35294117647058826,
 0.35496957403651114,
 0.36004056795131845,
 0.3620689655172414,
 0.3620689655172414,
 0.37322515212981744,
 0.38235294117647056,
 0.3914807302231
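The same scaling as a reusable function. One assumption worth flagging: when every value is identical, max - min is zero, and this sketch (the name `min_max_scale` is hypothetical) chooses to map everything to 0.0 rather than raise `ZeroDivisionError`:

```python
def min_max_scale(values):
    """Rescale values linearly onto [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # every value is identical; avoid dividing by zero (a design choice)
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]

print(min_max_scale([10, 20, 30, 50]))  # [0.0, 0.25, 0.5, 1.0]
```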

### standardizing

```python
from statistics import mean, stdev

avg, std = mean(salaries), stdev(salaries)  # compute once, not per element
standardizing = [(x - avg) / std for x in salaries]
standardizing
```

[-1.58558249867003,
 -1.5505752680014935,
 -1.5155680373329568,
 -1.5120673142661032,
 -1.4980644219986885,
 -1.4665579143970056,
 -1.4560557451964444,
 -1.4350514067953226,
 -1.4315506837284688,
 -1.4280499606616153,
 -1.389542006926225,
 -1.2390109150515174,
 -1.2390109150515174,
 -1.2355101919846636,
 -1.1829993459818586,
 -1.158494284513883,
 -1.0814783770431025,
 -1.000961746505468,
 -0.9799574081043462,
 -0.9624537927700778,
 -0.9449501774358096,
 -0.9204451159678338,
 -0.8959400544998583,
 -0.8539313776976143,
 -0.8224248700959313,
 -0.8189241470290777,
 -0.7559111318257117,
 -0.752410408758858,
 -0.741908239558297,
 -0.7139024550234677,
 -0.7139024550234677,
 -0.6823959474217848,
 -0.6088807630178578,
 -0.5843757015498822,
 -0.5038590710122479,
 -0.4268431635414673,
 -0.4233424404746136,
 -0.41634099434090627,
 -0.3953366559397843,
 -0.36733087140495496,
 -0.3603294252712476,
 -0.3428258099369793,
 -0.335824363803272,
 -0.335824363803272,
 -0.2973164100678817,
 -0.2658099024661
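A quick way to check a standardization is that the resulting z-scores have mean 0 and (sample) standard deviation 1, up to floating-point rounding. A sketch on small illustrative data (`standardize` is a hypothetical name):

```python
import math
from statistics import mean, stdev

def standardize(values):
    """Z-scores: (x - mean) / sample standard deviation."""
    avg, std = mean(values), stdev(values)
    return [(x - avg) / std for x in values]

z = standardize([2, 4, 4, 4, 5, 5, 7, 9])
# mean 0 needs an absolute tolerance, since a relative one is meaningless near zero
print(math.isclose(mean(z), 0.0, abs_tol=1e-12), math.isclose(stdev(z), 1.0))
```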

## Exercise 8: Calculating covariance and correlation
### covariance

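```python
from statistics import mean

# use the list names defined above: min_max_scaling and standardizing
x_mean, y_mean = mean(min_max_scaling), mean(standardizing)  # compute once
running_total = [
    (x - x_mean) * (y - y_mean)
    for x, y in zip(min_max_scaling, standardizing)
]

# averaging the products divides by n, which gives the population covariance
cov = mean(running_total)
cov
```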

0.28681411695734604
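Averaging the products divides by n. The sample covariance applies Bessel's correction instead, just like the sample variance. A sketch with both options (the `covariance` helper and its `sample` flag are hypothetical; Python 3.10+ also provides `statistics.covariance`, which returns the sample version):

```python
import math
from statistics import mean

def covariance(xs, ys, sample=True):
    """Covariance of paired data; sample=True applies Bessel's correction."""
    n = len(xs)
    x_mean, y_mean = mean(xs), mean(ys)
    total = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    return total / (n - 1) if sample else total / n

xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]
print(covariance(xs, ys, sample=False))  # 2.5, divides by n
print(covariance(xs, ys))                # 10 / 3, divides by n - 1
```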

### Pearson correlation coefficient ($\rho$)

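```python
from statistics import pstdev
# cov divides by n, so pair it with the population standard deviations (pstdev)
cov / (pstdev(min_max_scaling) * pstdev(standardizing))
```

This evaluates to 1 up to floating-point rounding, as it should: both lists are positive linear transformations of `salaries`, so they are perfectly correlated. Dividing the population covariance by the *sample* standard deviations (`stdev`, which divides by n - 1) instead gives 0.99, i.e. 99/100.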

<hr>
<div style="overflow: hidden; margin-bottom: 10px;">
    <div style="float: left;">
        <a href="./python_101.ipynb">
            <button>Python 101</button>
        </a>
    </div>
    <div style="float: right;">
        <a href="../../solutions/ch_01/solutions.ipynb">
            <button>Solutions</button>
        </a>
        <a href="../ch_02/1-pandas_data_structures.ipynb">
            <button>Chapter 2 &#8594;</button>
        </a>
    </div>
</div>
<hr>