## Activity 01: Use NumPy to compute the Mean, Median, and Variance

In this activity, you will consolidate the skills you acquired in the last exercise and use NumPy to do some basic statistical calculations on our `normal_distribution` dataset.   
NumPy has a consistent API, so it should be easy to transfer your knowledge of `np.mean` to `np.median` and `np.var`.
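
For example, with a small made-up array (not the activity's dataset), the three functions can be called in exactly the same way. This is only a sketch; the values are chosen so the results are easy to check by hand:

```python
import numpy as np

# a small hypothetical 1-D array (made-up values, not the activity's dataset)
sample = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# np.mean, np.median and np.var share the same basic call: pass the array
functions = {"mean": np.mean, "median": np.median, "variance": np.var}
stats = {name: func(sample) for name, func in functions.items()}
print(stats)
```

Here the mean is 5.0, the median is 4.5 (the average of the two middle values, 4 and 5) and the variance is 4.0.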

#### Loading the dataset

In [52]:
# importing the necessary dependencies
import numpy as np

In [53]:
# loading the dataset
dataset = np.genfromtxt('./data/normal_distribution.csv', delimiter=',')
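
`np.genfromtxt` accepts either a file path or a file-like object, and `delimiter=','` splits each line into columns. If the CSV file is not at hand, the following sketch parses a short made-up CSV string through `io.StringIO` instead:

```python
import io

import numpy as np

# hypothetical CSV content standing in for ./data/normal_distribution.csv
csv_text = "99.1,104.0,107.4\n92.0,97.1,99.3\n"

# each line becomes a row; the comma-separated fields become float columns
parsed = np.genfromtxt(io.StringIO(csv_text), delimiter=",")
print(parsed.shape, parsed.dtype)
```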

In [54]:
# looking at the first two rows of the dataset
dataset[0:2]

array([[ 99.14931546, 104.03852715, 107.43534677,  97.85230675,
         98.74986914,  98.80833412,  96.81964892,  98.56783189],
       [ 92.02628776,  97.10439252,  99.32066924,  97.24584816,
         92.9267508 ,  92.65657752, 105.7197853 , 101.23162942]])

---

#### Mean

In [55]:
# calculate the mean of the third row (index 2), once by hand and once with np.mean
total = np.sum(dataset[2])
count = len(dataset[2])
mean = total/count
mean1 = np.mean(dataset[2])
mean, mean1

(100.20466135250001, 100.20466135250001)
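
The cell above checks `np.mean` against the definition of the arithmetic mean, the sum of the values divided by their count. The same check as a small standalone sketch; `manual_mean` is a hypothetical helper, not part of NumPy:

```python
import numpy as np

def manual_mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    values = np.asarray(values, dtype=float)
    return values.sum() / values.size

row = np.array([95.5, 100.0, 104.5])  # made-up values
print(manual_mean(row), np.mean(row))

# compare floats with a tolerance: a different summation order can change the last digits
print(np.isclose(manual_mean(row), np.mean(row)))
```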

In [56]:
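# calculate the mean of the last column, plus the mean of every row and every column

last_col_mean = np.mean(dataset[:, -1])
horizontal_mean = np.mean(dataset, axis=1)  # one mean per row (24 values)
vertical_mean = np.mean(dataset, axis=0)    # one mean per column (8 values)
last_col_mean, vertical_mean, horizontal_mean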

(100.4404927375,
 array([ 99.7674351 ,  99.61229127, 101.14584656, 101.8449316 ,
         99.04871791,  99.67838931,  99.7848489 , 100.44049274]),
 array([100.17764752,  97.27899259, 100.20466135, 100.56785907,
        100.98341406,  97.83018578, 101.49052285,  99.75332252,
        101.89845125,  99.77973914, 101.013081  , 100.54961696,
         98.48256886,  98.49816126, 101.85956927,  97.05201872,
        102.62147483, 101.21177037,  99.58777968,  98.96533534,
        103.85792812, 101.89050288,  99.07192574,  99.34233101]))
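
In the cell above, `axis=0` collapses the rows and returns one mean per column (8 values), while `axis=1` collapses the columns and returns one mean per row (24 values); that is why the last entry of the per-column means equals the mean of the last column. A minimal sketch on a made-up 3x2 array:

```python
import numpy as np

# hypothetical 3x2 array: 3 rows, 2 columns
grid = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 30.0]])

col_means = np.mean(grid, axis=0)  # reduces over the rows: one value per column, shape (2,)
row_means = np.mean(grid, axis=1)  # reduces over the columns: one value per row, shape (3,)

print(col_means, row_means)
print(col_means[-1] == np.mean(grid[:, -1]))  # last column's mean appears in the axis=0 result
```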

In [57]:
# calculate the mean of the intersection of the first 3 rows and first 3 columns
# (np.intersect1d returns the sorted unique values that appear in both slices)

data = np.intersect1d(dataset[:3], dataset[:, :3])
np.mean(data), data

(97.87197312333335,
 array([ 90.93318132,  92.02628776,  95.17750125,  95.66253664,
         97.10439252,  99.14931546,  99.32066924, 104.03852715,
        107.43534677]))
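
Note that `np.intersect1d` does not select a sub-block: it flattens both inputs and returns the sorted, de-duplicated values they have in common. For the first three rows and columns of this dataset the result happens to be the same 9 values as the 3x3 block, but plain slicing such as `dataset[:3, :3]` states the intent directly and keeps repeated values. A sketch with a made-up array that contains a duplicate:

```python
import numpy as np

# hypothetical 3x3 array; 5.0 appears twice in the top-left 2x2 block
grid = np.array([[5.0, 5.0, 1.0],
                 [2.0, 3.0, 7.0],
                 [8.0, 9.0, 4.0]])

block = grid[:2, :2]                            # rows 0-1 and columns 0-1: 5, 5, 2, 3
values = np.intersect1d(grid[:2], grid[:, :2])  # sorted unique common values: 2, 3, 5

print(np.mean(block), np.mean(values))  # the duplicate 5.0 is lost, so the means differ
```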

---

#### Median

In [58]:
# calculate the median of the last row
np.median(dataset[-1])

99.18748092
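
Each row holds 8 values, an even count, so `np.median` sorts them and averages the two middle values. A hand-written sketch of that rule, checked against `np.median` on made-up numbers; `manual_median` is a hypothetical helper:

```python
import numpy as np

def manual_median(values):
    """Median: middle value of the sorted data, or the mean of the two middle values."""
    ordered = np.sort(np.asarray(values, dtype=float))
    n = ordered.size
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

even = [7.0, 1.0, 3.0, 5.0]  # sorted: 1, 3, 5, 7 -> (3 + 5) / 2 = 4
odd = [7.0, 1.0, 3.0]        # sorted: 1, 3, 7 -> 3
print(manual_median(even), np.median(even))
print(manual_median(odd), np.median(odd))
```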

In [59]:
# calculate the median of the last 3 columns
np.median(dataset[:, -3:])

99.47332349999999
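
Without an `axis` argument, `np.median(dataset[:, -3:])` pools all 72 values of the 24x3 slice and returns a single number. A sketch on a made-up array, contrasting that with `axis=0` (one median per column):

```python
import numpy as np

# hypothetical 4x2 array
grid = np.array([[1.0, 40.0],
                 [2.0, 30.0],
                 [3.0, 20.0],
                 [4.0, 10.0]])

overall = np.median(grid)             # all 8 values pooled together
per_column = np.median(grid, axis=0)  # one median per column
print(overall, per_column)
```

Pooled, the sorted values are 1, 2, 3, 4, 10, 20, 30, 40, so the overall median is the average of 4 and 10, i.e. 7.0, which matches neither column's median.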

In [60]:
# calculate the median of each row
np.median(dataset, axis=1)

array([ 98.77910163,  97.17512034,  98.58782879, 100.68449836,
       101.00170737,  97.76908825, 101.85002253, 100.04756697,
       102.24292555,  99.59514997, 100.4955753 ,  99.8860714 ,
        99.00647994,  98.67276177, 102.44376222,  96.61933565,
       104.0968893 , 100.72023043,  98.70877396,  99.75008654,
       104.89344428, 101.00634942,  98.30543801,  99.18748092])

---

#### Variance

In [61]:
# calculate the variance of each column
np.var(dataset, axis=0)

array([23.64757465, 29.78886109, 20.50542011, 26.03204443, 28.38853175,
       19.09960817, 17.67291174, 16.17923204])
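
`np.var` returns the population variance: the mean of the squared deviations from the mean, with `ddof=0` by default. Passing `ddof=1` divides by n - 1 instead, which gives the sample variance. A sketch that computes both by hand on made-up values and compares them with `np.var`:

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # made-up data with mean 3

squared_dev = (values - values.mean()) ** 2         # squared deviations: 4, 1, 0, 1, 4
population_var = squared_dev.sum() / values.size    # matches np.var(values), ddof=0
sample_var = squared_dev.sum() / (values.size - 1)  # matches np.var(values, ddof=1)

print(population_var, np.var(values))
print(sample_var, np.var(values, ddof=1))
```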

In [62]:
# calculate the variance of the intersection of the last 2 rows and first 3 columns
data1 = np.intersect1d(dataset[-2:], dataset[:, :3])
np.var(data1)

13.824340469038788

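The variance values might seem a little strange at first: the variance is the mean of the squared deviations from the mean, so it is expressed in squared units of the data and cannot be compared directly with the values in the dataset.   
You can always go back to the topic that gives you a quick statistical overview to recap what you've learned so far.   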

> **Note:**   
> The variance is not the standard deviation; the standard deviation is the square root of the variance.   

Try calculating the standard deviation with NumPy. Because it is expressed in the same units as the data, it is a more descriptive value to compare with our dataset.

In [63]:
# calculate the standard deviation of the whole dataset (all values pooled together)
np.std(dataset)

4.838197554269257
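
The standard deviation is the square root of the variance, so it is expressed in the same units as the data, which makes it easier to relate to values around 100. A quick check of that relationship on made-up values:

```python
import numpy as np

values = np.array([90.0, 95.0, 100.0, 105.0, 110.0])  # made-up measurements around 100

variance = np.var(values)  # in squared units of the data
std_dev = np.std(values)   # in the same units as the data

print(variance, std_dev, np.sqrt(variance))
```

Here a variance of 50 corresponds to a standard deviation of about 7.07, i.e. a typical distance of roughly 7 units from the mean of 100.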

---

#### Inspecting the dataset


In [64]:
dataset

array([[ 99.14931546, 104.03852715, 107.43534677,  97.85230675,
         98.74986914,  98.80833412,  96.81964892,  98.56783189],
       [ 92.02628776,  97.10439252,  99.32066924,  97.24584816,
         92.9267508 ,  92.65657752, 105.7197853 , 101.23162942],
       [ 95.66253664,  95.17750125,  90.93318132, 110.18889465,
         98.80084371, 105.95297652,  98.37481387, 106.54654286],
       [ 91.37294597, 100.96781394, 100.40118279, 113.42090475,
        105.48508838,  91.6604946 , 106.1472841 ,  95.08715803],
       [101.20862522, 103.5730309 , 100.28690912, 105.85269352,
         93.37126331, 108.57980357, 100.79478953,  94.20019732],
       [102.80387079,  98.29687616,  93.24376389,  97.24130034,
         89.03452725,  96.2832753 , 104.60344836, 101.13442416],
       [106.71751618, 102.97585605,  98.45723272, 100.72418901,
        106.39798503,  95.46493436,  94.35373179, 106.83273763],
       [ 96.02548256, 102.82360856, 106.47551845, 101.34745901,
        102.45651798,  98.7476749

In [65]:
len(dataset)

24
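
`len()` on a 2-D array only reports the length of the first axis, i.e. the number of rows; `shape` and `size` give the full picture. A sketch using a placeholder array with the same 24x8 layout as the dataset (the zeros are not real data):

```python
import numpy as np

# placeholder with the same 24x8 layout as the dataset; the values are just zeros
placeholder = np.zeros((24, 8))

print(len(placeholder))   # 24: the number of rows (first axis)
print(placeholder.shape)  # (24, 8): rows and columns
print(placeholder.size)   # 192: total number of elements
```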

In [66]:
# first column, printed one row at a time
for row in dataset:
    print(row[0])

99.14931546
92.02628776
95.66253664
91.37294597
101.20862522
102.80387079
106.71751618
96.02548256
105.30350449
110.44484313
101.3514185
97.21315663
95.65982034
100.39303522
103.1521596
106.11454989
96.78266211
101.86186193
97.49594839
96.76814836
106.89005002
99.80873105
96.10020311
94.11176915
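
The loop above visits each row and takes its first element. The `dataset[:, 0]` slice used further below returns the same values as one array, without a Python loop. A sketch comparing the two approaches on a made-up array:

```python
import numpy as np

grid = np.arange(12, dtype=float).reshape(4, 3)  # made-up 4x3 array

looped = np.array([row[0] for row in grid])  # one row at a time, as in the loop above
sliced = grid[:, 0]                          # all rows, column 0, in one vectorized step

print(looped, sliced, np.array_equal(looped, sliced))
```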


In [67]:
# first row
dataset[0]

array([ 99.14931546, 104.03852715, 107.43534677,  97.85230675,
        98.74986914,  98.80833412,  96.81964892,  98.56783189])

In [68]:
# first column
dataset[:, 0]

array([ 99.14931546,  92.02628776,  95.66253664,  91.37294597,
       101.20862522, 102.80387079, 106.71751618,  96.02548256,
       105.30350449, 110.44484313, 101.3514185 ,  97.21315663,
        95.65982034, 100.39303522, 103.1521596 , 106.11454989,
        96.78266211, 101.86186193,  97.49594839,  96.76814836,
       106.89005002,  99.80873105,  96.10020311,  94.11176915])

In [69]:
# last 3 rows
dataset[-3:]

array([[ 99.80873105, 101.63973121, 106.46476468, 110.43976681,
        100.69156231,  99.99579473, 101.32113654,  94.76253572],
       [ 96.10020311,  94.57421727, 100.80409326, 105.02389857,
         98.61325194,  95.62359311,  97.99762409, 103.83852459],
       [ 94.11176915,  99.62387832, 104.51786419,  97.62787811,
         93.97853495,  98.75108352, 106.05042487, 100.07721494]])
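
Negative indices count from the end, so `dataset[-3:]` means "from the third-to-last row to the end"; `dataset[:, -3:]` in the median section applies the same idea to columns. A sketch on a made-up array:

```python
import numpy as np

grid = np.arange(20).reshape(5, 4)  # made-up 5x4 array holding 0..19

last_three_rows = grid[-3:]     # same as grid[2:5] here, shape (3, 4)
last_three_cols = grid[:, -3:]  # same as grid[:, 1:4] here, shape (5, 3)

print(last_three_rows.shape, last_three_cols.shape)
print(np.array_equal(last_three_rows, grid[2:5]), np.array_equal(last_three_cols, grid[:, 1:4]))
```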

In [70]:
# intersection of the rows from index 3 onward and the first 3 columns
dataset[3:, :3]

array([[ 91.37294597, 100.96781394, 100.40118279],
       [101.20862522, 103.5730309 , 100.28690912],
       [102.80387079,  98.29687616,  93.24376389],
       [106.71751618, 102.97585605,  98.45723272],
       [ 96.02548256, 102.82360856, 106.47551845],
       [105.30350449,  92.87730812, 103.19258339],
       [110.44484313,  93.87155456, 101.5363647 ],
       [101.3514185 , 100.37372248, 106.6471081 ],
       [ 97.21315663, 107.02874163, 102.17642112],
       [ 95.65982034, 107.22482426, 107.19119932],
       [100.39303522,  92.0108226 ,  97.75887636],
       [103.1521596 , 109.40523174,  93.83969256],
       [106.11454989,  88.80221141,  94.5081787 ],
       [ 96.78266211,  99.84251605, 104.03478031],
       [101.86186193, 103.61720152,  99.57859892],
       [ 97.49594839,  96.59385486, 104.63817694],
       [ 96.76814836,  91.6779221 , 101.79132774],
       [106.89005002, 106.57364584, 102.26648279],
       [ 99.80873105, 101.63973121, 106.46476468],
       [ 96.10020311,  94.57421727, 100.80409326],
       [ 94.11176915,  99.62387832, 104.51786419]])

In [71]:
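# sorted unique values that appear both in the rows from index 3 onward and in the first 3 columns
np.intersect1d(dataset[3:], dataset[:, :3])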


array([ 88.80221141,  91.37294597,  91.6779221 ,  92.0108226 ,
        92.87730812,  93.24376389,  93.83969256,  93.87155456,
        94.11176915,  94.5081787 ,  94.57421727,  95.65982034,
        96.02548256,  96.10020311,  96.59385486,  96.76814836,
        96.78266211,  97.21315663,  97.49594839,  97.75887636,
        98.29687616,  98.45723272,  99.57859892,  99.62387832,
        99.80873105,  99.84251605, 100.28690912, 100.37372248,
       100.39303522, 100.40118279, 100.80409326, 100.96781394,
       101.20862522, 101.3514185 , 101.5363647 , 101.63973121,
       101.79132774, 101.86186193, 102.17642112, 102.26648279,
       102.80387079, 102.82360856, 102.97585605, 103.1521596 ,
       103.19258339, 103.5730309 , 103.61720152, 104.03478031,
       104.51786419, 104.63817694, 105.30350449, 106.11454989,
       106.46476468, 106.47551845, 106.57364584, 106.6471081 ,
       106.71751618, 106.89005002, 107.02874163, 107.19119932,
       107.22482426, 109.40523174, 110.44484313])