# Coding exercises
Exercises 1-3 are thought exercises that don't require coding. If you need a Python crash course or refresher, work through the [`python_101.ipynb`](./python_101.ipynb) notebook in chapter 1.

## Exercise 4: Generate the data by running this cell
This will give you a list of numbers to work with in the remaining exercises.

In [27]:
import random

random.seed(0)
salaries = [round(random.random()*1000000, -3) for _ in range(100)]

In [83]:
import numpy as np
from scipy import stats as st
import pandas as pd
import collections

In [29]:
salaries[0:5]

[844000.0, 758000.0, 421000.0, 259000.0, 511000.0]

## Exercise 5: Calculating statistics and verifying
### mean

In [36]:
avg = np.mean(salaries)
avg

585690.0

### median

In [37]:
np.median(salaries)

589000.0

### mode

In [38]:
st.mode(salaries)

ModeResult(mode=array([477000.]), count=array([3]))

In [90]:
# count each distinct salary and take the single most frequent (value, count) pair
collections.Counter(salaries).most_common(1)[0]

(477000.0, 3)
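
Both approaches above report a single value, which hides ties. As a minimal sketch on made-up data (the `values` list is hypothetical and is not the salaries), the standard library's `statistics.multimode` returns every value that shares the highest count:

```python
import collections
import statistics

# Hypothetical toy data in which 3 and 1 are tied for most frequent
values = [3, 1, 3, 2, 1, 5]

# most_common(1) reports only one of the tied values (the first one encountered)
top_value, top_count = collections.Counter(values).most_common(1)[0]

# multimode returns all tied values, in order of first appearance
all_modes = statistics.multimode(values)

print(top_value, top_count)
print(all_modes)
```

If the data could plausibly be multimodal, checking `len(all_modes)` before quoting "the" mode is a cheap safeguard.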

### sample variance
Remember to use Bessel's correction: divide by $n - 1$ rather than $n$.

In [39]:
svar = np.sum((np.asarray(salaries) - avg)**2)/(len(salaries)-1)
svar

70664054444.44444

In [41]:
np.var(salaries, ddof=1)

70664054444.44444
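
To make the effect of Bessel's correction concrete, here is a small sketch on hypothetical toy data (the `data` list is made up for illustration). It computes both divisors by hand and compares them with NumPy's `ddof` argument and the standard library's `statistics.variance`:

```python
import statistics

import numpy as np

# Hypothetical toy data, not the exercise's salaries
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

n = len(data)
mean = sum(data) / n
sum_sq_dev = sum((x - mean) ** 2 for x in data)

population_var = sum_sq_dev / n       # no correction: divide by n
sample_var = sum_sq_dev / (n - 1)     # Bessel's correction: divide by n - 1

# NumPy divides by n - ddof, and ddof defaults to 0
numpy_population_var = np.var(data)
numpy_sample_var = np.var(data, ddof=1)

# statistics.variance always applies Bessel's correction
stdlib_sample_var = statistics.variance(data)

print(population_var, sample_var)
```

Because `np.var` and `np.std` default to `ddof=0`, leaving the argument out silently gives the population statistic.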

### sample standard deviation
Remember to use Bessel's correction.

In [42]:
sd = np.sqrt(svar)
sd

265827.11382484

In [47]:
np.std(salaries, ddof=1)

265827.11382484

## Exercise 6: Calculating more statistics
### range

In [49]:
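# NOTE: this name shadows Python's built-in range(), so re-running the
# Exercise 4 cell in the same session afterwards would fail
range = np.max(salaries)-np.min(salaries)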
range

995000.0

### coefficient of variation
Make sure to use the sample standard deviation.

In [50]:
sd/avg

0.45386998894439035
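
One reason to report the coefficient of variation is that it is unitless: rescaling the data changes the mean and the standard deviation by the same factor, so their ratio is unchanged. A minimal sketch, assuming a hypothetical helper `coefficient_of_variation` and made-up values:

```python
import numpy as np


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    return np.std(values, ddof=1) / np.mean(values)


# Hypothetical salaries expressed in dollars and in thousands of dollars
dollars = np.array([40_000.0, 52_000.0, 61_000.0, 75_000.0])
thousands = dollars / 1000

cv_dollars = coefficient_of_variation(dollars)
cv_thousands = coefficient_of_variation(thousands)

# The two values agree up to floating-point rounding
print(cv_dollars, cv_thousands)
```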

### interquartile range

In [53]:
Q3 = np.percentile(salaries, 75)
Q1 = np.percentile(salaries, 25)
IQR = Q3-Q1
print(Q1, Q3, IQR)

403500.0 816750.0 413250.0
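
As a cross-check, SciPy provides `scipy.stats.iqr`, which by default uses the 25th and 75th percentiles. The sketch below uses hypothetical toy data (the integers 1 through 9) so that the quartiles are easy to verify by hand:

```python
import numpy as np
from scipy import stats as st

# Hypothetical toy data: the integers 1 through 9
toy = np.arange(1, 10)

# Both quartiles in one call; the default method is linear interpolation
q1, q3 = np.percentile(toy, [25, 75])
iqr_manual = q3 - q1

# SciPy's helper computes the same quantity directly
iqr_scipy = st.iqr(toy)

print(q1, q3, iqr_manual, iqr_scipy)
```

Different percentile interpolation methods can give slightly different quartiles on small samples, so the two results are only guaranteed to match when both use the same method.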


### quartile coefficient of dispersion

In [54]:
QCD = (Q3 - Q1)/(Q3 + Q1)
QCD

0.338660110633067
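
Because it is built from quartiles, the quartile coefficient of dispersion is much less sensitive to extreme values than the coefficient of variation. The following sketch illustrates this on made-up data; the helpers `qcd` and `cv` are hypothetical names introduced here:

```python
import numpy as np


def qcd(values):
    """Quartile coefficient of dispersion: (Q3 - Q1) / (Q3 + Q1)."""
    q1, q3 = np.percentile(values, [25, 75])
    return (q3 - q1) / (q3 + q1)


def cv(values):
    """Coefficient of variation using the sample standard deviation."""
    return np.std(values, ddof=1) / np.mean(values)


# Hypothetical toy data, with and without one extreme value
base = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 900]

qcd_base, qcd_outlier = qcd(base), qcd(with_outlier)
cv_base, cv_outlier = cv(base), cv(with_outlier)

# The quartiles do not move, so the QCD is unchanged, while the CV grows sharply
print(qcd_base, qcd_outlier)
print(cv_base, cv_outlier)
```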

## Exercise 7: Scaling data
### min-max scaling

In [57]:
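# convert the list to an array explicitly, then map the minimum to 0 and the maximum to 1
mm_scaled_salaries = (np.asarray(salaries) - np.min(salaries))/(np.max(salaries) - np.min(salaries))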
mm_scaled_salaries

array([0.84723618, 0.76080402, 0.42211055, 0.25929648, 0.51256281,
       0.40603015, 0.78693467, 0.30351759, 0.47839196, 0.58492462,
       0.91155779, 0.50653266, 0.28241206, 0.75879397, 0.6201005 ,
       0.25125628, 0.91356784, 0.98693467, 0.81306533, 0.90552764,
       0.31055276, 0.73266332, 0.90251256, 0.68643216, 0.47336683,
       0.10050251, 0.43517588, 0.61306533, 0.91658291, 0.97085427,
       0.47839196, 0.86834171, 0.26030151, 0.8080402 , 0.55075377,
       0.01306533, 0.72261307, 0.4       , 0.8281407 , 0.67035176,
       0.        , 0.49547739, 0.87135678, 0.24422111, 0.32562814,
       0.87336683, 0.19095477, 0.56984925, 0.23919598, 0.9718593 ,
       0.80603015, 0.44924623, 0.07939698, 0.32060302, 0.50954774,
       0.93668342, 0.10854271, 0.55276382, 0.70954774, 0.54874372,
       0.81708543, 0.54170854, 0.9678392 , 0.60502513, 0.58994975,
       0.44623116, 0.59798995, 0.38592965, 0.57788945, 0.29045226,
       0.18894472, 0.18693467, 0.61507538, 0.65929648, 0.47839
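
The same calculation is easier to follow on a handful of values. This is a minimal sketch with a hypothetical helper, `min_max_scale`, and made-up inputs; it assumes the values are not all equal, since the divisor would otherwise be zero:

```python
import numpy as np


def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1.

    Hypothetical helper for illustration; assumes the values are not all equal.
    """
    values = np.asarray(values, dtype=float)
    lowest, highest = values.min(), values.max()
    return (values - lowest) / (highest - lowest)


# Hypothetical toy data
toy = [10, 20, 30, 50]
scaled = min_max_scale(toy)

# The scaled values are 0, 0.25, 0.5 and 1
print(scaled)
```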

### standardizing

In [64]:
# z-scores: subtract the mean, then divide by the sample standard deviation
normal_salaries = (np.asarray(salaries) - avg)/sd
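
A quick way to check standardized data is to confirm that its mean is 0 and its sample standard deviation is 1. The sketch below does this on hypothetical toy data and compares the manual result with `scipy.stats.zscore`, passing `ddof=1` so that it also uses the sample standard deviation (its default is `ddof=0`):

```python
import numpy as np
from scipy import stats as st

# Hypothetical toy data
toy = np.array([10.0, 20.0, 30.0, 50.0])

# Manual z-scores using the sample standard deviation
z_manual = (toy - toy.mean()) / toy.std(ddof=1)

# SciPy equivalent; ddof must be set to match
z_scipy = st.zscore(toy, ddof=1)

print(z_manual.mean(), z_manual.std(ddof=1))
```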

## Exercise 8: Calculating covariance and correlation
### covariance

In [71]:
# the long way: sum the products of the deviations, then divide by n - 1
mm_sal_mean = np.mean(mm_scaled_salaries)
norm_sal_mean = np.mean(normal_salaries)
my_cov = np.sum(
    (mm_scaled_salaries - mm_sal_mean) * (normal_salaries - norm_sal_mean)
) / (len(mm_scaled_salaries) - 1)
my_cov

0.2671629284671759

In [72]:
np.cov(np.array(mm_scaled_salaries), np.array(normal_salaries))

array([[0.07137603, 0.26716293],
       [0.26716293, 1.        ]])
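
`np.cov` returns a matrix rather than a single number: the diagonal holds each variable's sample variance and the off-diagonal entries hold the covariance. A small sketch on hypothetical toy data (`x` and `y` are made up) shows how to read it:

```python
import numpy as np

# Hypothetical toy data
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 1.0, 4.0, 3.0])

cov_matrix = np.cov(x, y)   # divides by n - 1 by default

var_x = cov_matrix[0, 0]    # sample variance of x
var_y = cov_matrix[1, 1]    # sample variance of y
cov_xy = cov_matrix[0, 1]   # covariance; equal to cov_matrix[1, 0]

print(cov_matrix)
```

In the salary output above, the lower-right entry is 1 because the standardized salaries have a sample variance of 1 by construction.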

### Pearson correlation coefficient ($\rho$)

In [74]:
mm_sal_sd = np.std(mm_scaled_salaries, ddof=1)
norm_sal_sd = np.std(normal_salaries, ddof=1)
my_cov/(mm_sal_sd*norm_sal_sd)

1.0

In [76]:
np.corrcoef(mm_scaled_salaries, normal_salaries)

array([[1., 1.],
       [1., 1.]])
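
The correlation is exactly 1 because the min-max scaled and standardized salaries are both linear transformations of the same data with a positive slope, and the Pearson coefficient is unchanged by such transformations. A minimal sketch on hypothetical toy data, with slopes and intercepts chosen arbitrarily:

```python
import numpy as np

# Hypothetical toy data
x = np.array([1.0, 2.0, 4.0, 7.0])

y_positive = 3 * x + 5    # positive slope: perfect positive correlation
y_negative = -2 * x + 1   # negative slope: perfect negative correlation

# corrcoef returns a 2x2 matrix; the off-diagonal entry is the coefficient
r_positive = np.corrcoef(x, y_positive)[0, 1]
r_negative = np.corrcoef(x, y_negative)[0, 1]

print(r_positive, r_negative)
```

A coefficient of 1 here says nothing new about the salaries themselves; it only confirms that both scaling methods preserve the ordering and relative spacing of the original values.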
