## Activity 03: Filtering, Sorting, and Reshaping

Following up on the last activity, we have been asked to perform some more complex operations.   
We will therefore continue to work with the same dataset, `normal_distribution.csv`.

#### Loading the dataset

In [1]:
# importing the necessary dependencies
import numpy as np

In [2]:
# loading the Dataset
dataset = np.genfromtxt('./data/normal_distribution.csv', delimiter=',')

---

#### Filtering

To get better insights into our dataset, we want to look only at the values that fulfill certain conditions.   
Our client reaches out to us and asks us to provide lists of values that fulfill these conditions:
- all values greater than 95 (>95)
- all values that are between 80 and 95 (>80 and <95)
- the indices of all values that have a delta of less than 1 to 100 (|x-100| < 1)

In [3]:
# values that are greater than 95
dataset[dataset>95]


array([100.,  96., 100.,  97., 100., 100.])

In [4]:
# values that are between 80 and 95
dataset[(dataset>80) & (dataset<95)]

array([85., 90., 82., 85., 90., 90., 90., 81., 82., 82., 87., 87., 83.,
       82., 89.])

> **Note:**    
Conditional filtering can be done either with the bracket syntax or with NumPy's `extract` function.
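
As a small illustration of the note above, the sketch below applies both forms to a made-up array and checks that they select the same values. The name `sample` and its contents are hypothetical stand-ins for `dataset`, not the real CSV data.

```python
import numpy as np

# hypothetical stand-in for the dataset, not the real CSV contents
sample = np.array([[85., 96., 70.],
                   [100., 82., 97.]])

# bracket (boolean mask) syntax
by_mask = sample[sample > 95]

# np.extract takes the condition first, then the array
by_extract = np.extract(sample > 95, sample)

print(by_mask)
print(by_extract)
```

Both calls return a flat one-dimensional array in row-major order, so the original two-dimensional positions are lost either way.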

In [5]:
# indices of values that have a delta of less than 1 to 100

# Solution 1: np.where returns one index array per axis;
# an explicit loop pairs them up as [row, col]
print("Solution-1:")
rows, cols = np.where(abs(dataset - 100) < 1)
print('row is :',rows)
print('col is :',cols)

my_list = []
for (index,_) in np.ndenumerate(rows):
    my_list.append([rows[index],cols[index]])

print("indices of values that have a delta of less than 1 to 100", my_list)

# Solution 2: the same pairs built with a list comprehension
print("Solution-2:")
one_away_indices = [[rows[index], cols[index]] for (index, _) in np.ndenumerate(rows)]
print(one_away_indices)


Solution-1:
row is : [2 3 4 5]
col is : [2 2 7 7]
indices of values that have a delta of less than 1 to 100 [[2, 2], [3, 2], [4, 7], [5, 7]]
Solution-2:
[[2, 2], [3, 2], [4, 7], [5, 7]]
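
The row and column arrays can also be paired without a Python loop. The sketch below is one possible alternative using `np.argwhere`, shown on a made-up array (`sample` is a hypothetical stand-in for `dataset`); it is not part of the original solution.

```python
import numpy as np

# hypothetical stand-in for the dataset
sample = np.array([[85., 100., 70.],
                   [99.5, 82., 100.]])

mask = np.abs(sample - 100) < 1

# np.argwhere returns one [row, col] pair per matching element
pairs = np.argwhere(mask)
print(pairs.tolist())

# equivalent result built from the np.where output
rows, cols = np.where(mask)
stacked = np.column_stack((rows, cols))
```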


---

#### Sorting

They also want to experiment with some more plotting techniques, so they ask you to deliver these datasets as well:
- values sorted in ascending order for each row
- values sorted in ascending order for each column
- the matrix of indices that would sort each row, i.e. for each sorted position, the index of the value in the original row   
```
[3, 1, 2, 5, 4]  =>  [1, 2, 0, 4, 3]
```

In [6]:
# values sorted for each row
rows_sorted = np.sort(dataset,axis=1)
print(rows_sorted)
 




[[ 59.  65.  71.  72.  75.  75.  85.  90.]
 [ 65.  71.  71.  75.  80.  82.  85.  90.]
 [ 59.  61.  75.  80.  90.  90.  96. 100.]
 [ 63.  64.  77.  81.  82.  95.  97. 100.]
 [ 65.  67.  69.  72.  82.  87.  87. 100.]
 [ 58.  75.  76.  79.  82.  83.  89. 100.]]


> **Note:**   
By default, sorting is done along the last axis. In our case this is axis 1, so each row is sorted.
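
To make the effect of the `axis` argument concrete, here is a minimal sketch on a small made-up array (`sample` is hypothetical, not the dataset). It compares the default call with an explicit `axis=1` and with `axis=None`, which flattens the array before sorting.

```python
import numpy as np

# hypothetical 2x3 array
sample = np.array([[3., 1., 2.],
                   [9., 7., 8.]])

default_sorted = np.sort(sample)          # default axis=-1, the last axis
explicit_axis = np.sort(sample, axis=1)   # same result for a 2-D array
flat_sorted = np.sort(sample, axis=None)  # flattens first, then sorts

print(default_sorted)
print(flat_sorted)
```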

In [7]:
# values sorted for each column
cols_sorted = np.sort(dataset, axis = 0)
print(cols_sorted)

[[ 80.  65.  65.  58.  59.  59.  65.  61.]
 [ 81.  75.  67.  72.  71.  63.  72.  64.]
 [ 82.  75.  79.  75.  82.  69.  87.  71.]
 [ 83.  82.  85.  77.  87.  71.  89.  75.]
 [ 85.  82. 100.  80.  90.  75.  96. 100.]
 [ 90.  90. 100.  90.  95.  76.  97. 100.]]


In [8]:
# indices that would sort each row (argsort uses the last axis by default)
index_sorted = np.argsort(dataset)
print(index_sorted)

[[4 2 5 6 3 7 0 1]
 [6 4 7 5 0 1 2 3]
 [5 7 1 3 0 4 6 2]
 [5 7 3 0 1 4 6 2]
 [1 2 5 3 0 4 6 7]
 [3 1 5 2 4 0 6 7]]
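
The index matrix can be used to rebuild the sorted values. The sketch below first reproduces the one-dimensional example from the task description, then applies per-row indices to a made-up two-dimensional array with `np.take_along_axis` (the array `sample` is hypothetical).

```python
import numpy as np

values = np.array([3, 1, 2, 5, 4])
order = np.argsort(values)   # indices that would sort the array
print(order)                 # [1 2 0 4 3]
print(values[order])         # [1 2 3 4 5]

# for 2-D data, np.take_along_axis applies each row's own indices
sample = np.array([[3., 1., 2.],
                   [9., 7., 8.]])
idx = np.argsort(sample, axis=1)
resorted = np.take_along_axis(sample, idx, axis=1)
print(resorted)
```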


---

#### Combining

After finishing their visualization, they ask you to deliver a way to incrementally add the split parts of the dataset back together, to make sure it works with every subset, too.   
They want you to send them examples for:
- adding the second half of the first column
- adding the second column
- adding the third and last separate column


In [9]:
# split up the dataset as in the previous activity
quarter = np.hsplit(dataset, 4)
print(quarter)

halfed_first = np.vsplit(quarter[0], 2)
print(halfed_first)
# this is the part we sent the client in the previous activity
halfed_first[0]

[array([[85., 90.],
       [80., 82.],
       [90., 75.],
       [81., 82.],
       [82., 65.],
       [83., 75.]]), array([[ 65.,  75.],
       [ 85.,  90.],
       [100.,  80.],
       [100.,  77.],
       [ 67.,  72.],
       [ 79.,  58.]]), array([[59., 71.],
       [71., 75.],
       [90., 59.],
       [95., 63.],
       [87., 69.],
       [82., 76.]]), array([[ 72.,  75.],
       [ 65.,  71.],
       [ 96.,  61.],
       [ 97.,  64.],
       [ 87., 100.],
       [ 89., 100.]])]
[array([[85., 90.],
       [80., 82.],
       [90., 75.]]), array([[81., 82.],
       [82., 65.],
       [83., 75.]])]


array([[85., 90.],
       [80., 82.],
       [90., 75.]])

In [10]:
# adding the second half of the first column to the data
first_col = np.vstack([halfed_first[0], halfed_first[1]])
print(first_col)

[[85. 90.]
 [80. 82.]
 [90. 75.]
 [81. 82.]
 [82. 65.]
 [83. 75.]]


In [12]:
# adding the second column to our combined dataset
first_second_col = np.hstack([first_col, quarter[1]])
print(first_second_col)

[[ 85.  90.  65.  75.]
 [ 80.  82.  85.  90.]
 [ 90.  75. 100.  80.]
 [ 81.  82. 100.  77.]
 [ 82.  65.  67.  72.]
 [ 83.  75.  79.  58.]]


In [14]:
# adding the third column to our combined dataset
first_second_third_col = np.hstack([first_second_col, quarter[2]])
print(first_second_third_col)

[[ 85.  90.  65.  75.  59.  71.]
 [ 80.  82.  85.  90.  71.  75.]
 [ 90.  75. 100.  80.  90.  59.]
 [ 81.  82. 100.  77.  95.  63.]
 [ 82.  65.  67.  72.  87.  69.]
 [ 83.  75.  79.  58.  82.  76.]]


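> **Note:**    
The same results can be achieved with `np.concatenate`, passing `axis=0` in place of `np.vstack` and `axis=1` in place of `np.hstack`.    
`np.stack` also takes an axis, but it joins the arrays along a new axis, so its result has one more dimension than the inputs.   
Depending on your preferences, you might want to use `np.concatenate` instead.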
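
A minimal sketch comparing the joining functions on two made-up blocks (`top` and `bottom` are hypothetical names and values). It also prints the shape that `np.stack` produces, since that function joins along a new axis.

```python
import numpy as np

# two hypothetical 2x2 blocks
top = np.array([[85., 90.],
                [80., 82.]])
bottom = np.array([[81., 82.],
                   [82., 65.]])

v = np.concatenate([top, bottom], axis=0)  # same as np.vstack
h = np.concatenate([top, bottom], axis=1)  # same as np.hstack
s = np.stack([top, bottom], axis=0)        # adds a new leading axis

print(v.shape, h.shape, s.shape)
```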

---

#### Reshaping

For their internal AI algorithms, they need the dataset reshaped so that it has fewer columns.   
They asked us to deliver the whole dataset in the following shapes:
- reshaped into a single list with all values
- reshaped into a matrix with only 2 columns

In [25]:
# reshaping to a single row of values; the result is still 2-D, with shape (1, 48)
print(dataset.shape)
one_dim = dataset.reshape(1,48)
print(one_dim)
print(one_dim.shape)

(6, 8)
[[ 85.  90.  65.  75.  59.  71.  72.  75.  80.  82.  85.  90.  71.  75.
   65.  71.  90.  75. 100.  80.  90.  59.  96.  61.  81.  82. 100.  77.
   95.  63.  97.  64.  82.  65.  67.  72.  87.  69.  87. 100.  83.  75.
   79.  58.  82.  76.  89. 100.]]
(1, 48)
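
Note that `reshape(1, 48)` keeps two dimensions. If a truly one-dimensional array is wanted, the sketch below shows two common options on a made-up array (`sample` is a hypothetical stand-in for `dataset`).

```python
import numpy as np

# hypothetical 2x3 stand-in for the dataset
sample = np.arange(6.).reshape(2, 3)

single_row = sample.reshape(1, 6)  # still 2-D: one row, six columns
flat = sample.reshape(-1)          # 1-D array with all six values
flat_copy = sample.flatten()       # 1-D as well, always a copy

print(single_row.shape, flat.shape, flat_copy.shape)
```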


In [32]:
# reshaping to a matrix with two columns
reshaped = dataset.reshape(24,2)
auto_reshaped = dataset.reshape(24,-1)  # -1 lets NumPy infer the other dimension itself, handy when the size is not known in advance
print(reshaped)
print(reshaped.shape)
print("--------------")
print(auto_reshaped)
print(auto_reshaped.shape)


[[ 85.  90.]
 [ 65.  75.]
 [ 59.  71.]
 [ 72.  75.]
 [ 80.  82.]
 [ 85.  90.]
 [ 71.  75.]
 [ 65.  71.]
 [ 90.  75.]
 [100.  80.]
 [ 90.  59.]
 [ 96.  61.]
 [ 81.  82.]
 [100.  77.]
 [ 95.  63.]
 [ 97.  64.]
 [ 82.  65.]
 [ 67.  72.]
 [ 87.  69.]
 [ 87. 100.]
 [ 83.  75.]
 [ 79.  58.]
 [ 82.  76.]
 [ 89. 100.]]
(24, 2)
--------------
[[ 85.  90.]
 [ 65.  75.]
 [ 59.  71.]
 [ 72.  75.]
 [ 80.  82.]
 [ 85.  90.]
 [ 71.  75.]
 [ 65.  71.]
 [ 90.  75.]
 [100.  80.]
 [ 90.  59.]
 [ 96.  61.]
 [ 81.  82.]
 [100.  77.]
 [ 95.  63.]
 [ 97.  64.]
 [ 82.  65.]
 [ 67.  72.]
 [ 87.  69.]
 [ 87. 100.]
 [ 83.  75.]
 [ 79.  58.]
 [ 82.  76.]
 [ 89. 100.]]
(24, 2)


> **Note:**   
A -1 in the shape tells NumPy to infer that dimension from the array's size and the remaining dimensions.
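
As a sketch of how this inference behaves, the example below uses a made-up array with the same shape as the dataset (`sample` is hypothetical). It fixes the number of columns and lets NumPy work out the rows, then shows that the call fails when the total size is not divisible by the fixed dimension.

```python
import numpy as np

# hypothetical array with the same shape as the dataset (6, 8)
sample = np.arange(48.).reshape(6, 8)

# fix two columns, let NumPy infer the rows: 48 / 2 = 24
two_cols = sample.reshape(-1, 2)
print(two_cols.shape)

# 48 is not divisible by 5, so no valid shape exists
failed = False
try:
    sample.reshape(-1, 5)
except ValueError as err:
    failed = True
    print("reshape failed:", err)
```

Writing `reshape(-1, 2)` states the requirement "only 2 columns" directly, without having to compute the row count by hand.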