# Coding exercises
Exercises 1-3 are thought exercises that don't require coding. If you need a Python crash course or refresher, work through the [`python_101.ipynb`](./python_101.ipynb) notebook in chapter 1.

## Exercise 4: Generate the data by running this cell
This will give you a list of numbers to work with in the remaining exercises.

In [1]:
import random

random.seed(0)
salaries = [round(random.random()*1000000, -3) for _ in range(100)]

## Exercise 5: Calculating statistics and verifying
### mean

In [8]:
mean = sum(salaries) / len(salaries)
print(f"Mean salary: ${mean:,.2f}")

Mean salary: $585,690.00
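
The exercise title also asks for verification. One possible check, sketched here, compares the manual calculation with `statistics.mean` from the standard library. The data is regenerated with the same seed as in exercise 4 so that the block runs on its own; the names `manual_mean` and `library_mean` are only illustrative.

```python
import math
import random
from statistics import mean as stats_mean

# regenerate the data from exercise 4 so this block is self-contained
random.seed(0)
salaries = [round(random.random() * 1000000, -3) for _ in range(100)]

manual_mean = sum(salaries) / len(salaries)
library_mean = stats_mean(salaries)

# the two approaches should agree up to floating-point rounding
print(math.isclose(manual_mean, library_mean))
```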


### median

In [14]:
import numpy as np
from statistics import median

print(np.median(salaries))
print(median(salaries))

589000.0
589000.0


In [29]:
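def manual_median(array: list) -> float:
    """Return the median of a non-empty list of numbers."""
    if len(array) == 0:
        raise ValueError("Array is empty")
    sorted_array = sorted(array)
    length = len(sorted_array)
    mid = length // 2  # floor division already yields an integer index
    if length % 2 == 0:
        # even length: average the two middle values
        return (sorted_array[mid - 1] + sorted_array[mid]) / 2
    # odd length: the middle value is the median
    return sorted_array[mid]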

In [27]:
median_salary = manual_median(salaries)

In [28]:
print(f"Median salary: ${median_salary:,.2f}")

Median salary: $589,000.00


### mode

In [30]:
from statistics import mode

mode(salaries)

477000.0

In [39]:
salaries_count = {}
for salary in salaries:
    if salary in salaries_count:
        salaries_count[salary] += 1
    else:
        salaries_count[salary] = 1
# sort dictionary by value
sorted_salaries = sorted(salaries_count.items(), key=lambda x: x[1], reverse=True)
print(sorted_salaries[0])


(477000.0, 3)
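
The counting loop can also be written with `collections.Counter`. The following is a minimal sketch of that alternative, not a replacement for the solution; `top_salary` and `top_count` are illustrative names. If several values tie for the highest count, `most_common` returns the one encountered first.

```python
import random
from collections import Counter

# regenerate the data from exercise 4 so this block is self-contained
random.seed(0)
salaries = [round(random.random() * 1000000, -3) for _ in range(100)]

# most_common(1) returns a one-item list holding the (value, count)
# pair with the highest count
top_salary, top_count = Counter(salaries).most_common(1)[0]
print(top_salary, top_count)
```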


### sample variance
Remember to use Bessel's correction.

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

In [41]:
n = len(salaries)
mean = sum(salaries) / n
var = (sum([(x - mean) ** 2 for x in salaries])) / (n - 1)
print(var)

70664054444.44444


In [42]:
from statistics import variance

variance(salaries)

70664054444.44444
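
To see what Bessel's correction changes, the sketch below compares `statistics.pvariance` (divides by $n$) with `statistics.variance` (divides by $n - 1$). The data is regenerated so the block is self-contained, and the variable names are illustrative.

```python
import math
import random
from statistics import pvariance, variance

# regenerate the data from exercise 4 so this block is self-contained
random.seed(0)
salaries = [round(random.random() * 1000000, -3) for _ in range(100)]

n = len(salaries)
population_var = pvariance(salaries)  # divides by n
sample_var = variance(salaries)       # divides by n - 1 (Bessel's correction)

# rescaling the population variance by n / (n - 1) recovers the sample variance
print(math.isclose(population_var * n / (n - 1), sample_var))
```

With 100 observations the two values differ by about 1%, and the sample variance is always the larger of the two.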

### sample standard deviation
Remember to use Bessel's correction: take the square root of the sample variance, which already divides by $n - 1$.

In [43]:
std_dev = var ** 0.5
std_dev

265827.11382484

In [44]:
from statistics import stdev

stdev(salaries)

265827.11382484

## Exercise 6: Calculating more statistics
### range

In [45]:
salaries_range = max(salaries) - min(salaries)
salaries_range

995000.0

### coefficient of variation
Make sure to use the sample standard deviation.

$$CV = \frac{s}{\bar{x}}$$

In [47]:
cv = std_dev / mean
cv

0.45386998894439035

### interquartile range

In [51]:
from statistics import quantiles

q = quantiles(salaries, n=4)
print(q)
iqr = q[2] - q[0]
print(iqr)

[400500.0, 589000.0, 822250.0]
421750.0
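
Note that `quantiles` uses `method="exclusive"` by default, so other tools may report slightly different quartiles for the same data. As a rough sketch, the hypothetical helper `inclusive_quartiles` below computes quartiles by linear interpolation between the sorted data points, which should correspond to `method="inclusive"` (and, as far as I am aware, to the default behavior of NumPy's `percentile`).

```python
import math
import random
from statistics import quantiles

# regenerate the data from exercise 4 so this block is self-contained
random.seed(0)
salaries = [round(random.random() * 1000000, -3) for _ in range(100)]

def inclusive_quartiles(data):
    """Quartiles by linear interpolation between the sorted data points."""
    ordered = sorted(data)
    last = len(ordered) - 1
    result = []
    for i in (1, 2, 3):
        position = i * last / 4          # fractional index of the i-th quartile
        lower = int(position)            # index of the point just below it
        upper = min(lower + 1, last)     # index of the point just above it
        fraction = position - lower      # how far to move from lower to upper
        result.append(ordered[lower] + (ordered[upper] - ordered[lower]) * fraction)
    return result

manual_q = inclusive_quartiles(salaries)
library_q = quantiles(salaries, n=4, method="inclusive")
print(all(math.isclose(a, b) for a, b in zip(manual_q, library_q)))
```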


### quartile coefficient of dispersion

$$QCD = \frac{\frac{Q_3 - Q_1}{2}}{\frac{Q_1 + Q_3}{2}} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$

In [53]:
qcd = (q[2] - q[0]) / (q[2] + q[0])
print(qcd)

0.34491923941934166


## Exercise 7: Scaling data
### min-max scaling

Min-max scaling maps each value onto the interval $[0, 1]$:

$$x_{scaled}=\frac{x - \min(X)}{\text{range}(X)}$$


In [62]:
min_salary = min(salaries)  # compute once instead of on every iteration
min_max_scaled = [(salary - min_salary) / salaries_range for salary in salaries]
print(min_max_scaled)

[0.8472361809045226, 0.7608040201005025, 0.4221105527638191, 0.2592964824120603, 0.5125628140703518, ...]
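
As a sanity check, the smallest scaled value should be 0 and the largest should be 1. The sketch below wraps the formula in a hypothetical helper, `min_max_scale`, which also guards against a zero range (an assumption added here; the exercise data never triggers it).

```python
import random

# regenerate the data from exercise 4 so this block is self-contained
random.seed(0)
salaries = [round(random.random() * 1000000, -3) for _ in range(100)]

def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lowest, highest = min(values), max(values)
    value_range = highest - lowest
    if value_range == 0:
        raise ValueError("All values are identical, so the range is zero")
    return [(value - lowest) / value_range for value in values]

scaled = min_max_scale(salaries)
print(min(scaled), max(scaled))
```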

### standardizing

Another way to scale the data is to standardize it with a **Z-score**, which subtracts the mean and divides by the sample standard deviation:

$$z_i = \frac{x_i - \bar{x}}{s}$$

In [64]:
z_scores = [(x - mean) / std_dev for x in salaries]
print(z_scores)

[0.9717217942267801, 0.6482032533127501, -0.6195380058503674, -1.228956652688424, -0.28097209094033604, ...]
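
Standardized data should have a mean of 0 and a sample standard deviation of 1, up to floating-point rounding. A minimal check, with the data regenerated so the block runs on its own (`x_bar` and `s` are illustrative names that mirror the formula):

```python
import math
import random
from statistics import mean, stdev

# regenerate the data from exercise 4 so this block is self-contained
random.seed(0)
salaries = [round(random.random() * 1000000, -3) for _ in range(100)]

x_bar = mean(salaries)
s = stdev(salaries)  # sample standard deviation (Bessel's correction)
z_scores = [(x - x_bar) / s for x in salaries]

# compare against 0 with an absolute tolerance, since relative
# tolerance is meaningless when the expected value is zero
print(math.isclose(mean(z_scores), 0, abs_tol=1e-9))
print(math.isclose(stdev(z_scores), 1))
```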

## Exercise 8: Calculating covariance and correlation
### covariance

In [67]:
np.cov(min_max_scaled, z_scores)

array([[0.07137603, 0.26716293],
       [0.26716293, 1.        ]])
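
In the matrix returned by `np.cov`, the diagonal holds each variable's sample variance and the off-diagonal entries hold the covariance between the two. The `1.` in the bottom-right corner is therefore the variance of the Z-scores. For reference, here is a pure-Python sketch of the sample covariance; `sample_covariance` is a hypothetical helper, shown on toy data.

```python
def sample_covariance(x, y):
    """Sample covariance of two equal-length sequences (divides by n - 1)."""
    if len(x) != len(y):
        raise ValueError("Both sequences must have the same length")
    n = len(x)
    if n < 2:
        raise ValueError("At least two observations are required")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # average product of the paired deviations from each mean
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)

print(sample_covariance([1, 2, 3], [2, 4, 6]))  # 2.0
print(sample_covariance([1, 2, 3], [1, 2, 3]))  # 1.0, the sample variance of [1, 2, 3]
```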

### Pearson correlation coefficient ($\rho$)

In [68]:
np.corrcoef(min_max_scaled, z_scores)

array([[1., 1.],
       [1., 1.]])
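
The correlation is exactly 1 because both scaled versions are increasing linear transformations of the same salaries, and the Pearson coefficient is unaffected by shifting or by rescaling with a positive factor. The sketch below illustrates this on toy data with a hypothetical helper, `pearson_r`.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    cross = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    x_ss = sum((a - x_bar) ** 2 for a in x)
    y_ss = sum((b - y_bar) ** 2 for b in y)
    # the (n - 1) factors of the covariance and standard deviations cancel
    return cross / math.sqrt(x_ss * y_ss)

values = [3.0, 1.0, 4.0, 1.0, 5.0]
shifted = [2 * v + 10 for v in values]  # increasing linear transformation
flipped = [-v for v in values]          # decreasing linear transformation

print(math.isclose(pearson_r(values, shifted), 1.0))
print(math.isclose(pearson_r(values, flipped), -1.0))
```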
