## Activity 02: Indexing, Slicing, and Iterating

Our client wants to prove that our dataset is nicely distributed around the mean value of 100.   
They asked us to run some tests on several subsections of it to make sure that no subsection is unrepresentative of our data.

For each subtask, look at the mean value of the selected data.

#### Loading the dataset

In [285]:
# importing the necessary dependencies
import numpy as np

In [286]:
# loading the Dataset
dataset = np.genfromtxt('normal_distribution2.csv', delimiter=',')
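
To see what `np.genfromtxt` returns without needing the CSV file, here is a minimal sketch that parses a small in-memory string instead. The contents of `csv_text` are made up for illustration and are not the values of `normal_distribution2.csv`.

```python
import io
import numpy as np

# Hypothetical stand-in for the CSV file: three rows of three comma-separated values
csv_text = "99.1,104.0,107.4\n92.0,97.1,99.3\n95.7,95.2,90.9\n"

# np.genfromtxt accepts file-like objects as well as file names
sample = np.genfromtxt(io.StringIO(csv_text), delimiter=',')

print(sample.shape)  # (rows, columns)
print(sample.dtype)  # values are parsed as floats by default
```

Checking `shape` right after loading is a quick way to confirm that the delimiter was interpreted as expected.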

---

#### Indexing

Since we need several rows of our dataset to complete the given task, we have to use indexing to get the right rows.   
To recap, we need: 
- the second row 
- the last row
- the first value of the first row
- the last value of the second-to-last row

In [287]:
# indexing the second row of the dataset (index 1, since indices start at 0)
print(dataset[1])


[ 92.02628776  97.10439252  99.32066924  97.24584816  92.9267508
  92.65657752 105.7197853  101.23162942  93.87155456]


In [288]:
# indexing the last element of the dataset (last row)
print(dataset[-1])


[ 94.11176915  99.62387832 104.51786419  97.62787811  93.97853495
  98.75108352 106.05042487 100.07721494 106.89005002]


In [289]:
# indexing the first value of the first row (1st row, 1st value)
print(dataset[0,0])


99.14931546


In [290]:
# indexing the last value of the second to last row (we want to use the combined access syntax here) 
print(dataset[-2,-1])


101.2226037
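
The task asks for the mean of each selection, which the cells above do not compute. A minimal sketch of how that could look follows, using a small made-up array named `sample` in place of the real dataset so that the numbers are easy to check by hand.

```python
import numpy as np

# Hypothetical 4x3 stand-in for the real dataset
sample = np.array([[ 99.0, 104.0, 107.0],
                   [ 92.0,  97.0,  99.0],
                   [ 95.0,  96.0,  91.0],
                   [101.0, 103.0, 100.0]])

second_row = sample[1]                   # index 1 is the second row
last_row = sample[-1]                    # negative indices count from the end
first_value = sample[0, 0]               # row 0, column 0
last_of_second_to_last = sample[-2, -1]  # combined access: row -2, column -1

print(np.mean(second_row))  # (92 + 97 + 99) / 3 = 96.0
print(np.mean(last_row))
print(first_value, last_of_second_to_last)
```

For a single value such as `first_value`, the mean is simply the value itself.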


---

#### Slicing

In addition to single rows and values, we also need some subsets of the dataset.   
Here we want these slices:
- a 2x2 slice covering the first two rows and the first two columns
- every other element of the 5th row
- the first two rows in reversed order, with the elements of each row reversed as well

In [291]:
# slicing an intersection of 4 elements (2x2) of the first two rows and first two columns
print(dataset[:2,:2])


[[ 99.14931546 104.03852715]
 [ 92.02628776  97.10439252]]


##### Why is it not a problem if such a small subsection deviates more strongly from the mean of 100?

In such a small subsection, a few unusually low (or high) values can cluster by chance and pull the mean well away from 100.   
If we make our subsection larger, we have a higher chance of getting a more representative view of our data.
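
The effect of subsection size can be illustrated with a small simulation. This is only a sketch: the standard deviation of 5 is an assumption, since the activity does not state the parameters of the real dataset, and 216 values corresponds to a 24x9 array, the shape suggested by the outputs in this notebook.

```python
import numpy as np

# Assumption: synthetic values with mean 100 and standard deviation 5;
# the parameters of the real dataset are not stated in this activity.
rng = np.random.default_rng(seed=0)

# Draw 1000 subsections of 4 values (like a 2x2 slice) and 1000 of 216 values
means_small = rng.normal(loc=100, scale=5, size=(1000, 4)).mean(axis=1)
means_large = rng.normal(loc=100, scale=5, size=(1000, 216)).mean(axis=1)

# Spread of the subsection means around 100
print(means_small.std())  # roughly 5 / sqrt(4)
print(means_large.std())  # roughly 5 / sqrt(216)
```

The means of the 4-value subsections scatter much more widely around 100 than the means of the 216-value subsections, which is why a single 2x2 slice says little about the dataset as a whole.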

In [292]:
# selecting every second element of the fifth row 
print(dataset[4,::2])


[101.20862522 100.28690912  93.37126331 100.79478953  96.10020311]


In [293]:
# reversing the entry order, selecting the first two rows in reversed order
print(dataset[1::-1,::-1])


[[ 93.87155456 101.23162942 105.7197853   92.65657752  92.9267508
   97.24584816  99.32066924  97.10439252  92.02628776]
 [101.34745901  98.56783189  96.81964892  98.80833412  98.74986914
   97.85230675 107.43534677 104.03852715  99.14931546]]
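
The `start:stop:step` patterns used in this section are easier to follow on an array whose values reveal their own position. The sketch below uses a made-up 3x4 array named `sample`, not the real dataset.

```python
import numpy as np

# Hypothetical 3x4 stand-in array
sample = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

top_left = sample[:2, :2]         # first two rows, first two columns
every_other = sample[2, ::2]      # every second element of the third row
last_reversed = sample[-1, ::-1]  # last row in reversed order
flipped = sample[1::-1, ::-1]     # rows 1 and 0, each read right to left

print(top_left)       # [[0 1] [4 5]]
print(every_other)    # [ 8 10]
print(last_reversed)  # [11 10  9  8]
print(flipped)        # [[7 6 5 4] [3 2 1 0]]
```

A negative step walks backwards, so `1::-1` starts at row 1 and ends at row 0.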


---

#### Splitting

Our client's team only wants to use a small subset of the given dataset.   
Therefore, we first need to split it into three equal pieces and then give them the first half of the first piece.   
They sent us this drawing to show us what they need:
```
1, 2, 3, 4, 5, 6          1, 2     3, 4    5, 6          1, 2  
3, 2, 1, 5, 4, 6    =>    3, 2     1, 5    4, 6    =>    3, 2    =>    1, 2
5, 3, 1, 2, 4, 3          5, 3     1, 2    4, 3                        3, 2
1, 2, 2, 4, 1, 5          1, 2     2, 4    1, 5          5, 3
                                                         1, 2
```
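
The steps in the drawing can be reproduced with `np.hsplit` and `np.vsplit`. This is a sketch of one possible approach on the 4x6 example from the drawing, not necessarily the method the cells below use; the names `drawing`, `thirds`, and `upper_half` are chosen for illustration.

```python
import numpy as np

# The 4x6 example from the drawing
drawing = np.array([[1, 2, 3, 4, 5, 6],
                    [3, 2, 1, 5, 4, 6],
                    [5, 3, 1, 2, 4, 3],
                    [1, 2, 2, 4, 1, 5]])

# np.hsplit cuts column-wise into equally sized pieces
thirds = np.hsplit(drawing, 3)

# np.vsplit cuts row-wise; keep the upper half of the first third
upper_half = np.vsplit(thirds[0], 2)[0]

print(upper_half)
# [[1 2]
#  [3 2]]
```

Both functions raise an error when the array cannot be divided into equal parts; `np.array_split` is the more tolerant alternative in that case.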

> **Note:**   
We are using a very small dataset here, but imagine you have a huge amount of data and only want to look at a small subset of it to tweak your visualizations.

In [294]:
# splitting up our dataset horizontally (column-wise) at one third and two thirds of the columns
ncol = dataset.shape[1]
width = ncol // 3  # number of columns per piece
for start in range(0, ncol, width):
    print(dataset[:, start:start + width])

[[ 99.14931546 104.03852715 107.43534677]
 [ 92.02628776  97.10439252  99.32066924]
 [ 95.66253664  95.17750125  90.93318132]
 [ 91.37294597 100.96781394 100.40118279]
 [101.20862522 103.5730309  100.28690912]
 [102.80387079  98.29687616  93.24376389]
 [106.71751618 102.97585605  98.45723272]
 [ 96.02548256 102.82360856 106.47551845]
 [105.30350449  92.87730812 103.19258339]
 [110.44484313  93.87155456 101.5363647 ]
 [101.3514185  100.37372248 106.6471081 ]
 [ 97.21315663 107.02874163 102.17642112]
 [ 95.65982034 107.22482426 107.19119932]
 [100.39303522  92.0108226   97.75887636]
 [103.1521596  109.40523174  93.83969256]
 [106.11454989  88.80221141  94.5081787 ]
 [ 96.78266211  99.84251605 104.03478031]
 [101.86186193 103.61720152  99.57859892]
 [ 97.49594839  96.59385486 104.63817694]
 [ 96.76814836  91.6779221  101.79132774]
 [106.89005002 106.57364584 102.26648279]
 [ 99.80873105 101.63973121 106.46476468]
 [ 96.10020311  94.57421727 100.80409326]
 [ 94.11176915  99.62387832 104.51

In [295]:
# splitting up our dataset vertically on index 2
print(dataset[:2])
print(dataset[2:])

[[ 99.14931546 104.03852715 107.43534677  97.85230675  98.74986914
   98.80833412  96.81964892  98.56783189 101.34745901]
 [ 92.02628776  97.10439252  99.32066924  97.24584816  92.9267508
   92.65657752 105.7197853  101.23162942  93.87155456]]
[[ 95.66253664  95.17750125  90.93318132 110.18889465  98.80084371
  105.95297652  98.37481387 106.54654286 107.22482426]
 [ 91.37294597 100.96781394 100.40118279 113.42090475 105.48508838
   91.6604946  106.1472841   95.08715803 103.40412146]
 [101.20862522 103.5730309  100.28690912 105.85269352  93.37126331
  108.57980357 100.79478953  94.20019732  96.10020311]
 [102.80387079  98.29687616  93.24376389  97.24130034  89.03452725
   96.2832753  104.60344836 101.13442416  97.62787811]
 [106.71751618 102.97585605  98.45723272 100.72418901 106.39798503
   95.46493436  94.35373179 106.83273763 100.07721494]
 [ 96.02548256 102.82360856 106.47551845 101.34745901 102.45651798
   98.74767493  97.57544275  92.5748759   91.37294597]
 [105.30350449  92.87730

---

#### Iterating

Once you have sent over the dataset, they tell you that they also need a way to iterate over the whole dataset element by element, as if it were a one-dimensional list.   
However, they also want to know the position of each element in the dataset itself.

They send you this piece of code and tell you that it's not working as intended.   
Come up with the right solution for their needs.

In [296]:
# the client's snippet: visits each value in each row, but never reports its position
for x in np.nditer(dataset):
    print(x)

99.14931546
104.03852715
107.43534677
97.85230675
98.74986914
98.80833412
96.81964892
98.56783189
101.34745901
92.02628776
97.10439252
99.32066924
97.24584816
92.9267508
92.65657752
105.7197853
101.23162942
93.87155456
95.66253664
95.17750125
90.93318132
110.18889465
98.80084371
105.95297652
98.37481387
106.54654286
107.22482426
91.37294597
100.96781394
100.40118279
113.42090475
105.48508838
91.6604946
106.1472841
95.08715803
103.40412146
101.20862522
103.5730309
100.28690912
105.85269352
93.37126331
108.57980357
100.79478953
94.20019732
96.10020311
102.80387079
98.29687616
93.24376389
97.24130034
89.03452725
96.2832753
104.60344836
101.13442416
97.62787811
106.71751618
102.97585605
98.45723272
100.72418901
106.39798503
95.46493436
94.35373179
106.83273763
100.07721494
96.02548256
102.82360856
106.47551845
101.34745901
102.45651798
98.74767493
97.57544275
92.5748759
91.37294597
105.30350449
92.87730812
103.19258339
104.40518318
101.29326772
100.85447132
101.2226037
106.03868807
97.8523

In [297]:
# iterating over the whole dataset with indices matching the position in the dataset
# note: the position is printed as (column, row)
for row in range(dataset.shape[0]):
    for col in range(dataset.shape[1]):
        print(dataset[row, col], f"({col}, {row})")

99.14931546 (0, 0)
104.03852715 (1, 0)
107.43534677 (2, 0)
97.85230675 (3, 0)
98.74986914 (4, 0)
98.80833412 (5, 0)
96.81964892 (6, 0)
98.56783189 (7, 0)
101.34745901 (8, 0)
92.02628776 (0, 1)
97.10439252 (1, 1)
99.32066924 (2, 1)
97.24584816 (3, 1)
92.9267508 (4, 1)
92.65657752 (5, 1)
105.7197853 (6, 1)
101.23162942 (7, 1)
93.87155456 (8, 1)
95.66253664 (0, 2)
95.17750125 (1, 2)
90.93318132 (2, 2)
110.18889465 (3, 2)
98.80084371 (4, 2)
105.95297652 (5, 2)
98.37481387 (6, 2)
106.54654286 (7, 2)
107.22482426 (8, 2)
91.37294597 (0, 3)
100.96781394 (1, 3)
100.40118279 (2, 3)
113.42090475 (3, 3)
105.48508838 (4, 3)
91.6604946 (5, 3)
106.1472841 (6, 3)
95.08715803 (7, 3)
103.40412146 (8, 3)
101.20862522 (0, 4)
103.5730309 (1, 4)
100.28690912 (2, 4)
105.85269352 (3, 4)
93.37126331 (4, 4)
108.57980357 (5, 4)
100.79478953 (6, 4)
94.20019732 (7, 4)
96.10020311 (8, 4)
102.80387079 (0, 5)
98.29687616 (1, 5)
93.24376389 (2, 5)
97.24130034 (3, 5)
89.03452725 (4, 5)
96.2832753 (5, 5)
104.60344836 (6
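
As an alternative to the nested loops, NumPy's `np.ndenumerate` yields each value together with its index tuple. The sketch below uses a made-up 2x3 array named `sample`; note that the index tuple is in `(row, column)` order, whereas the loop above prints `(column, row)`.

```python
import numpy as np

# Hypothetical 2x3 stand-in for the real dataset
sample = np.array([[99.0, 104.0, 107.0],
                   [92.0,  97.0,  99.0]])

# np.ndenumerate yields ((row, column), value) pairs in row-major order
positions = []
for index, value in np.ndenumerate(sample):
    positions.append(index)
    print(index, value)
```

This keeps the element-by-element traversal the client asked for while still exposing the position of every value.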