## Activity 02: Indexing, Slicing, and Iterating

Our client wants to verify that our dataset is distributed around a mean value of 100.   
They asked us to run some tests on several subsections of it to make sure they won't receive an unrepresentative section of our data.

Look at the mean value of each subtask.

#### Loading the dataset

In [1]:
# importing the necessary dependencies
import numpy as np

In [19]:
# loading the Dataset
dataset = np.genfromtxt('./data/normal_distribution.csv', delimiter=',')
dataset


array([[ 99.14931546, 104.03852715, 107.43534677,  97.85230675,
         98.74986914,  98.80833412,  96.81964892,  98.56783189,
        101.34745901],
       [ 92.02628776,  97.10439252,  99.32066924,  97.24584816,
         92.9267508 ,  92.65657752, 105.7197853 , 101.23162942,
         93.87155456],
       [ 95.66253664,  95.17750125,  90.93318132, 110.18889465,
         98.80084371, 105.95297652,  98.37481387, 106.54654286,
        107.22482426],
       [ 91.37294597, 100.96781394, 100.40118279, 113.42090475,
        105.48508838,  91.6604946 , 106.1472841 ,  95.08715803,
        103.40412146],
       [101.20862522, 103.5730309 , 100.28690912, 105.85269352,
         93.37126331, 108.57980357, 100.79478953,  94.20019732,
         96.10020311],
       [102.80387079,  98.29687616,  93.24376389,  97.24130034,
         89.03452725,  96.2832753 , 104.60344836, 101.13442416,
         97.62787811],
       [106.71751618, 102.97585605,  98.45723272, 100.72418901,
        106.39798503,  95.4649

---

#### Indexing

Since we need several rows of our dataset to complete the given task, we have to use indexing to get the right rows.   
To recap, we need: 
- the second row 
- the last row
- the first value of the first row
- the last value of the second to the last row

In [3]:
# indexing the second row of the dataset (2nd row)
dataset[1]


array([ 92.02628776,  97.10439252,  99.32066924,  97.24584816,
        92.9267508 ,  92.65657752, 105.7197853 , 101.23162942,
        93.87155456])

In [10]:
# indexing the last value of the last row (dataset[-1] alone would return the whole last row)
dataset[-1,-1]

106.89005002

In [11]:
# indexing the first value of the first row (1st row, 1st value)

dataset[0,0]

99.14931546

In [12]:
# indexing the last value of the second to last row (we want to use the combined access syntax here) 

dataset[-2,-1]

101.2226037
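
As a rough recap of the access patterns used above, here is a minimal sketch on a small stand-in array. The 3x4 array `arr` and the variable names are assumptions made for illustration and are not part of the activity's dataset. It contrasts selecting a whole row with selecting a single value, and takes the mean of one selection.

```python
import numpy as np

# Hypothetical 3x4 stand-in array (values 0..11), not the activity's dataset
arr = np.arange(12).reshape(3, 4)

second_row = arr[1]             # whole second row
last_row = arr[-1]              # whole last row
last_value = arr[-1, -1]        # single value: last row, last column
first_value = arr[0, 0]         # single value: first row, first column
second_to_last_end = arr[-2, -1]  # last value of the second to last row

# mean of one selection, as asked for in each subtask
second_row_mean = np.mean(second_row)
print(second_row, second_row_mean)
```

A single index such as `arr[-1]` returns a full row, while the combined syntax `arr[-1, -1]` returns one element.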

---

#### Slicing

Other than the single rows and values we also need to get some subsets of the dataset.   
Here we want slices:
- a 2x2 slice covering the first two rows and the first two columns
- every other element of the 5th row
- the first two rows with the elements of each row in reversed order

In [13]:
# slicing an intersection of 4 elements (2x2) of the first two rows and first two columns
dataset[0:2,0:2]


array([[ 99.14931546, 104.03852715],
       [ 92.02628776,  97.10439252]])

##### Why is it not a problem if the mean of such a small subsection deviates more from 100?

Several low (or high) values can cluster in such a small subsection, pulling its mean noticeably away from 100.   
If we make our subsection larger, we have a higher chance of getting a more representative view of our data.
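
The effect can be illustrated with synthetic data. The sketch below is an assumption-laden illustration, not a test of the client's dataset: it draws random samples from a normal distribution with mean 100 and an assumed standard deviation of 5, in an assumed 24x9 shape, and compares how far the mean of a 2x2 corner and the mean of the full sample land from 100 on average.

```python
import numpy as np

# Synthetic data only: mean 100, assumed standard deviation 5, assumed shape 24x9
rng = np.random.default_rng(seed=0)
trials = 1000

small_dev = np.empty(trials)
full_dev = np.empty(trials)
for i in range(trials):
    sample = rng.normal(loc=100, scale=5, size=(24, 9))
    # absolute distance of each mean from the target value 100
    small_dev[i] = abs(sample[0:2, 0:2].mean() - 100)
    full_dev[i] = abs(sample.mean() - 100)

# average distance from 100 for the 2x2 slice and for the full sample
print(small_dev.mean(), full_dev.mean())
```

Averaged over many draws, the 2x2 slice should sit further from 100 than the full sample, which is why a single small slice with an off-target mean is not alarming.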

In [21]:
# selecting every second element of the fifth row 

dataset[4, 0::2]

array([101.20862522, 100.28690912,  93.37126331, 100.79478953,
        96.10020311])

In [22]:
# selecting every 3rd element of the fifth row 

dataset[4, 0::3]

array([101.20862522, 105.85269352, 100.79478953])

In [26]:
# selecting the first two rows with the entries of each row in reversed order
dataset[:2,::-1]


array([[101.34745901,  98.56783189,  96.81964892,  98.80833412,
         98.74986914,  97.85230675, 107.43534677, 104.03852715,
         99.14931546],
       [ 93.87155456, 101.23162942, 105.7197853 ,  92.65657752,
         92.9267508 ,  97.24584816,  99.32066924,  97.10439252,
         92.02628776]])
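
The slices in this section all follow the `start:stop:step` pattern, applied per axis. A minimal sketch on a hypothetical 3x4 stand-in array (the array and the variable names are assumptions, chosen so the results are easy to check by eye):

```python
import numpy as np

# Hypothetical 3x4 stand-in array (values 0..11), not the activity's dataset
arr = np.arange(12).reshape(3, 4)

block = arr[0:2, 0:2]               # 2x2 block: first two rows, first two columns
every_other = arr[1, ::2]           # every second element of the second row
last_row_reversed = arr[-1, ::-1]   # last row, elements in reversed order
first_rows_reversed = arr[:2, ::-1] # first two rows, each row reversed

print(block)
print(every_other)
print(last_row_reversed)
print(first_rows_reversed)
```

A negative step such as `::-1` walks the axis backwards, so it reverses the elements within each selected row rather than the order of the rows.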

---

#### Splitting

Our client's team only wants to use a small subset of the given dataset.   
Therefore, we first need to split it horizontally into 3 equal pieces and then give them the first half of the first piece.   
They sent us this drawing to show us what they need:
```
1, 2, 3, 4, 5, 6          1, 2     3, 4    5, 6          1, 2  
3, 2, 1, 5, 4, 6    =>    3, 2     1, 5    4, 6    =>    3, 2    =>    1, 2
5, 3, 1, 2, 4, 3          5, 3     1, 2    4, 3                        3, 2
1, 2, 2, 4, 1, 5          1, 2     2, 4    1, 5          5, 3
                                                         1, 2
```

> **Note:**   
We are using a very small dataset here, but imagine you have a huge amount of data and only want to look at a small subset of it to tweak your visualizations.

In [151]:
# splitting up our dataset horizontally on indices one third and two thirds


In [152]:
# splitting up our dataset vertically on index 2
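
One possible way to approach the two splitting cells, sketched on the 4x6 example from the client's drawing rather than on the real dataset. Using `np.hsplit` and `np.vsplit` here is an assumption about the intended solution, and the variable names are illustrative.

```python
import numpy as np

# The 4x6 example from the drawing, used as a stand-in for the dataset
drawing = np.array([
    [1, 2, 3, 4, 5, 6],
    [3, 2, 1, 5, 4, 6],
    [5, 3, 1, 2, 4, 3],
    [1, 2, 2, 4, 1, 5],
])

# split column-wise into 3 equal pieces (column count must be divisible by 3)
h_pieces = np.hsplit(drawing, 3)
first_piece = h_pieces[0]

# split the first piece row-wise into 2 equal halves (row count must be even)
v_pieces = np.vsplit(first_piece, 2)
result = v_pieces[0]

print(result)
```

With an integer argument, both functions raise a `ValueError` if the axis cannot be divided evenly, so the same calls only carry over to the real dataset if its column count is divisible by 3 and its row count is even.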


---

#### Iterating

Once you have sent over the dataset, they tell you that they also need a way to iterate over the whole dataset element by element, as if it were a one-dimensional list.   
However, they also want to know the position of each element in the dataset itself.

They send you this piece of code and tell you that it's not working as intended.   
Come up with the right solution for their needs.

In [12]:
# iterating over whole dataset (each value in each row)
curr_index = 0
for x in np.nditer(dataset):
    print(x, curr_index)
    curr_index += 1


In [13]:
# iterating over the whole dataset with indices matching the position in the dataset
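
A minimal sketch of one way to get both the value and its position, assuming `np.ndenumerate` fits the client's needs. It is shown on a hypothetical 2x3 stand-in array instead of the real dataset, and the `positions` list exists only to make the result easy to inspect.

```python
import numpy as np

# Hypothetical 2x3 stand-in array, not the activity's dataset
arr = np.array([[10, 20, 30],
                [40, 50, 60]])

positions = []
# np.ndenumerate yields ((row, column), value) pairs in row-major order
for index, value in np.ndenumerate(arr):
    print(index, value)
    positions.append((index, int(value)))
```

Unlike a manually incremented counter, the yielded `index` is a tuple with one entry per dimension, so it can be used directly as `arr[index]`.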
