# Coding exercises
Exercises 1-3 are thought exercises that don't require coding. If you need a Python crash-course/refresher, work through the [`python_101.ipynb`](./python_101.ipynb) notebook in chapter 1.

## Exercise 4: Generate the data by running this cell
This will give you a list of numbers to work with in the remaining exercises.

In [20]:
import random
import statistics as st
import numpy as np
import scipy.stats as sc
from sklearn import preprocessing

random.seed(0)
salaries = [round(random.random()*1000000, -3) for _ in range(100)]

## Exercise 5: Calculating statistics and verifying
### mean

In [2]:
my_mean = sum(salaries) / len(salaries)
my_mean

585690.0

### median

In [3]:
my_median = st.median(salaries)
my_median

589000.0

### mode

In [4]:
my_mode = st.mode(salaries)
my_mode

477000.0
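
Since this exercise is about calculating and then verifying, here is a minimal from-scratch sketch of the median and mode, checked against the `statistics` module. The names `manual_median` and `manual_mode` are illustrative and not part of the original notebook; the data is regenerated with the same seed so the block runs on its own.

```python
import random
import statistics as st
from collections import Counter

# Regenerate the data exactly as in Exercise 4 so this block is self-contained.
random.seed(0)
salaries = [round(random.random() * 1000000, -3) for _ in range(100)]

ordered = sorted(salaries)
n = len(ordered)
mid = n // 2
# Odd n: take the middle value; even n: average the two middle values.
manual_median = ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

# most_common(1) returns the value with the highest count
# (the first one encountered if several values tie).
manual_mode = Counter(salaries).most_common(1)[0][0]

# Both comparisons should print True.
print(manual_median == st.median(salaries))
print(manual_mode == st.mode(salaries))
```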

### sample variance
Remember to use Bessel's correction.

In [5]:
my_var = st.variance(salaries)
my_var

70664054444.44444

### sample standard deviation
Remember to use Bessel's correction.

In [6]:
my_stdev = st.stdev(salaries)
my_stdev

265827.11382484
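
To make Bessel's correction concrete, the sketch below computes the sample variance and sample standard deviation by hand, dividing the sum of squared deviations by $n - 1$ rather than $n$, and compares the results with the `statistics` module. The names `manual_var` and `manual_stdev` are illustrative only.

```python
import math
import random
import statistics as st

# Regenerate the data exactly as in Exercise 4 so this block is self-contained.
random.seed(0)
salaries = [round(random.random() * 1000000, -3) for _ in range(100)]

n = len(salaries)
mean = sum(salaries) / n

# Bessel's correction: divide by n - 1 instead of n.
manual_var = sum((x - mean) ** 2 for x in salaries) / (n - 1)
manual_stdev = math.sqrt(manual_var)

# Compare with a tolerance, since floating-point results can differ in the last digits.
print(math.isclose(manual_var, st.variance(salaries)))
print(math.isclose(manual_stdev, st.stdev(salaries)))
```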

## Exercise 6: Calculating more statistics
### range

In [9]:
my_range = max(salaries) - min(salaries)
my_range

995000.0

### coefficient of variation
Make sure to use the sample standard deviation.

In [12]:
def cv(x):
    # coefficient of variation as a percentage, using the sample standard deviation
    return np.std(x, ddof=1) / np.mean(x) * 100

cv(salaries)

45.38699889443903

### interquartile range

In [17]:
iqr = np.subtract(*np.percentile(salaries, [75, 25]))
print(iqr)
my_iqr = sc.iqr(salaries)
print(my_iqr)

413250.0
413250.0


### quartile coefficient of dispersion

In [19]:
def calc_qcd(data_array) -> float:
    # Calculates the quartile coefficient of dispersion: (Q3 - Q1) / (Q3 + Q1)
    q1, q3 = np.percentile(data_array, [25, 75])
    return (q3 - q1) / (q3 + q1)

my_qcd = calc_qcd(salaries)
print(my_qcd)

0.338660110633067


## Exercise 7: Scaling data
### min-max scaling

In [21]:
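# Note: preprocessing.normalize rescales each row to unit L2 norm, which is not min-max scaling;
# preprocessing.minmax_scale(salaries) would map the values onto [0, 1] instead.
normalized_arr = preprocessing.normalize([salaries])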
print(normalized_arr)


[[0.13133261 0.11795038 0.0655107  0.04030231 0.07951536 0.06302098
  0.12199617 0.04714903 0.07422471 0.09071909 0.14129148 0.07858172
  0.04388128 0.11763916 0.09616535 0.03905745 0.1416027  0.15296203
  0.12604196 0.14035784 0.04823828 0.11359337 0.13989102 0.10643543
  0.07344667 0.01571634 0.06753359 0.0950761  0.14206952 0.15047232
  0.07422471 0.13460037 0.04045791 0.12526392 0.08542844 0.0021785
  0.1120373  0.06208734 0.12837607 0.10394572 0.00015561 0.07687004
  0.13506719 0.0379682  0.05057239 0.1353784  0.02972101 0.08838498
  0.03719016 0.15062792 0.12495271 0.0697121  0.01244859 0.04979435
  0.07904854 0.14518167 0.0169612  0.08573965 0.1100144  0.08511723
  0.12666439 0.08402797 0.15000549 0.09383124 0.09149713 0.06924527
  0.09274199 0.05990883 0.08962984 0.04512613 0.02940979 0.02909858
  0.09538731 0.10223403 0.07422471 0.01400466 0.11795038 0.13646765
  0.14362559 0.1310214  0.13973541 0.14362559 0.08418358 0.06084248
  0.10970319 0.04294763 0.12635318 0.13211065 0.1
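
For comparison, min-max scaling maps each value to $(x - \min) / (\max - \min)$, so the smallest salary becomes 0 and the largest becomes 1. The sketch below does this in plain Python; `min_max_scaled` is an illustrative name, and scikit-learn's `preprocessing.minmax_scale` should give the same values.

```python
import random

# Regenerate the data exactly as in Exercise 4 so this block is self-contained.
random.seed(0)
salaries = [round(random.random() * 1000000, -3) for _ in range(100)]

lo, hi = min(salaries), max(salaries)

# Min-max scaling: shift by the minimum, then divide by the range.
min_max_scaled = [(x - lo) / (hi - lo) for x in salaries]

print(min(min_max_scaled), max(min_max_scaled))  # 0.0 1.0
```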

### standardizing

In [33]:
# sc is already scipy.stats, so zscore can be called on it directly
my_std = sc.zscore(salaries)
print(my_std)

[ 0.97661715  0.65146878 -0.62265912 -1.23514791 -0.28238758 -0.68315184
  0.74976945 -1.06879293 -0.41093461 -0.01017034  1.21858803 -0.30507235
 -1.14818962  0.64390719  0.12215749 -1.26539427  1.22614962  1.50214765
  0.84807012  1.19590326 -1.04232737  0.54560652  1.18456087  0.37168995
 -0.42983858 -1.83251351 -0.57350879  0.09569192  1.237492    1.44165493
 -0.41093461  1.05601384 -1.23136711  0.82916615 -0.13871737 -2.16144268
  0.50779857 -0.70583661  0.90478204  0.31119723 -2.21059301 -0.34666109
  1.06735623 -1.29185983 -0.98561544  1.07491782 -1.49224197 -0.06688226
 -1.31076381  1.44543573  0.82160456 -0.52057766 -1.91191021 -1.00451942
 -0.29372996  1.3131079  -1.80226716 -0.13115578  0.45864824 -0.14627896
  0.8631933  -0.17274452  1.43031255  0.06544556  0.00873364 -0.53192004
  0.03898    -0.75876774 -0.0366359  -1.11794327 -1.49980356 -1.50736515
  0.10325351  0.26960849 -0.41093461 -1.87410226  0.65146878  1.10138338
  1.27529995  0.96905556  1.18078008  1.27529995 -0
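
A z-score is $(x - \bar{x}) / \sigma$. The sketch below computes it by hand using the population standard deviation (dividing by $n$), which matches the default `ddof=0` of `scipy.stats.zscore`. The names `pop_stdev` and `z_scores` are illustrative.

```python
import random
import statistics as st

# Regenerate the data exactly as in Exercise 4 so this block is self-contained.
random.seed(0)
salaries = [round(random.random() * 1000000, -3) for _ in range(100)]

mean = st.mean(salaries)
pop_stdev = st.pstdev(salaries)  # population standard deviation (divides by n)

# Standardize: subtract the mean, then divide by the standard deviation.
z_scores = [(x - mean) / pop_stdev for x in salaries]

# After standardizing, the mean is approximately 0 and the
# population standard deviation is approximately 1.
print(abs(st.mean(z_scores)) < 1e-9, abs(st.pstdev(z_scores) - 1) < 1e-9)
```

Note that this standardization uses the population standard deviation, so the sample variance of the z-scores comes out as $n / (n - 1)$ rather than exactly 1.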

## Exercise 8: Calculating covariance and correlation
### covariance

In [35]:
my_cov = np.cov(my_std, normalized_arr)
print(my_cov)

[[1.01010101 0.04157304]
 [0.04157304 0.00171103]]


### Pearson correlation coefficient ($\rho$)

In [39]:
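# normalized_arr has shape (1, 100), so pass its single row to match the length of my_std.
# Both arrays are positive linear rescalings of salaries, so the coefficient
# should be 1 up to floating-point error.
my_pear = sc.pearsonr(my_std, normalized_arr[0])
print(my_pear)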
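
The Pearson coefficient is the covariance divided by the product of the two standard deviations, $\rho = \mathrm{cov}(x, y) / (s_x s_y)$. The sketch below rebuilds the two rescaled arrays without SciPy or scikit-learn and computes $\rho$ by hand; `sample_cov`, `z_scores`, and `unit_scaled` are illustrative names, not part of the original notebook.

```python
import math
import random
import statistics as st

# Regenerate the data exactly as in Exercise 4 so this block is self-contained.
random.seed(0)
salaries = [round(random.random() * 1000000, -3) for _ in range(100)]

# Two positive linear rescalings of the same data:
# z-scores (population standard deviation) and unit-L2-norm scaling.
mean, pop_stdev = st.mean(salaries), st.pstdev(salaries)
z_scores = [(x - mean) / pop_stdev for x in salaries]
l2_norm = math.sqrt(sum(x ** 2 for x in salaries))
unit_scaled = [x / l2_norm for x in salaries]


def sample_cov(xs, ys):
    """Sample covariance with Bessel's correction (divide by n - 1)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


# Pearson's rho: covariance over the product of the sample standard deviations.
rho = sample_cov(z_scores, unit_scaled) / (st.stdev(z_scores) * st.stdev(unit_scaled))
print(math.isclose(rho, 1.0))
```

Because both arrays are positive linear transformations of `salaries`, they are perfectly correlated, so this check says more about the scaling methods than about the data.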

<hr>
<div style="overflow: hidden; margin-bottom: 10px;">
    <div style="float: left;">
        <a href="./python_101.ipynb">
            <button>Python 101</button>
        </a>
    </div>
    <div style="float: right;">
        <a href="../../solutions/ch_01/solutions.ipynb">
            <button>Solutions</button>
        </a>
        <a href="../ch_02/1-pandas_data_structures.ipynb">
            <button>Chapter 2 &#8594;</button>
        </a>
    </div>
</div>
<hr>