## Activity 03: Filtering, Sorting, and Reshaping

Following up on the last activity, we are asked to deliver some more complex operations.   
We will, therefore, continue to work with the same dataset, our `normal_distribution.csv`.

#### Loading the dataset

In [1]:
# importing the necessary dependencies
import numpy as np

In [46]:
# loading the dataset
dataset = np.genfromtxt('./data/normal_distribution.csv', delimiter=',')

In [47]:
# checking the dimensions: 24 rows, 9 columns
dataset.shape

(24, 9)

---

#### Filtering

To get better insights into our dataset, we want to look only at the values that fulfill certain conditions.   
Our client reaches out to us and asks us to provide lists of values that fulfill these conditions:
- all values greater than 105 (>105)
- all values that are between 90 and 95 (>90 and <95)
- the indices of all values that have a delta of less than 1 to 100 (|x-100| < 1)

In [5]:
# values that are greater than 105
values_greater_than_105 = dataset[dataset > 105]
values_greater_than_105


array([107.43534677, 105.7197853 , 110.18889465, 105.95297652,
       106.54654286, 107.22482426, 113.42090475, 105.48508838,
       106.1472841 , 105.85269352, 108.57980357, 106.71751618,
       106.39798503, 106.83273763, 106.47551845, 105.30350449,
       106.03868807, 110.44484313, 106.6471081 , 105.0320535 ,
       107.02874163, 105.07475277, 106.57364584, 107.22482426,
       107.19119932, 108.09423367, 109.40523174, 106.11454989,
       106.57052697, 105.13668343, 105.37011896, 110.44484313,
       105.86078488, 106.89005002, 106.57364584, 107.40064604,
       106.38276709, 106.46476468, 110.43976681, 105.02389857,
       106.05042487, 106.89005002])

In [6]:
# values that are between 90 and 95
values_between_90_and_95 = dataset[(dataset > 90) & (dataset < 95)]
values_between_90_and_95


array([92.02628776, 92.9267508 , 92.65657752, 93.87155456, 90.93318132,
       91.37294597, 91.6604946 , 93.37126331, 94.20019732, 93.24376389,
       94.35373179, 92.5748759 , 91.37294597, 92.87730812, 93.87155456,
       92.75048583, 93.97853495, 91.32093303, 92.0108226 , 93.18884302,
       93.83969256, 94.5081787 , 94.59300658, 93.04610867, 91.6779221 ,
       91.37294597, 94.76253572, 94.57421727, 94.11176915, 93.97853495])

> **Note:**    
Conditional filtering can be done either with the bracket syntax (boolean indexing) or with NumPy's `np.extract` function.
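
As a minimal sketch of the two options mentioned in the note, the snippet below filters a small stand-in array both ways. The array `sample` and its values are hypothetical and only serve as an illustration; they are not taken from `normal_distribution.csv`.

```python
import numpy as np

# Small stand-in array (hypothetical values, not the activity's dataset)
sample = np.array([[ 89.5, 101.2, 106.3],
                   [ 93.1,  99.7, 110.0]])

# Boolean mask with the same shape as the array
condition = sample > 105

# Option 1: bracket syntax (boolean indexing)
by_brackets = sample[condition]

# Option 2: np.extract(condition, array)
by_extract = np.extract(condition, sample)

print(by_brackets)
print(by_extract)
print(np.array_equal(by_brackets, by_extract))
```

Both calls return a flat, one-dimensional array of the matching values, so which one to use is mostly a matter of readability.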

In [7]:
# indices of values that have a delta of less than 1 to 100
indices_delta_less_than_1_to_100 = np.where(np.abs(dataset - 100) < 1)
indices_delta_less_than_1_to_100


(array([ 0,  1,  3,  3,  4,  4,  6,  6,  8,  9, 10, 10, 10, 12, 13, 13, 13,
        14, 14, 15, 16, 16, 17, 17, 18, 18, 20, 21, 21, 21, 22, 23, 23]),
 array([0, 2, 1, 2, 2, 6, 3, 8, 5, 8, 1, 3, 5, 8, 0, 4, 7, 3, 5, 8, 1, 6,
        2, 3, 7, 8, 4, 0, 4, 5, 2, 1, 7]))
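
The result of `np.where` above is a tuple of two arrays: the first holds row indices, the second the matching column indices. The following sketch shows one way to pair them up, using a hypothetical stand-in array `sample` rather than the real dataset.

```python
import numpy as np

# Small stand-in array (hypothetical values, not the activity's dataset)
sample = np.array([[ 99.5, 104.0, 100.9],
                   [ 92.0, 100.2,  98.9]])

# np.where with only a condition returns one index array per dimension
rows, cols = np.where(np.abs(sample - 100) < 1)

# rows[i] and cols[i] together address one matching element
pairs = np.column_stack((rows, cols))
print(pairs)

# The index arrays can be fed back into the array to get the values
matches = sample[rows, cols]
print(matches)
```

If the (row, column) pairs are all that is needed, `np.argwhere` with the same condition returns them directly.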

---

#### Sorting

They also want to experiment with some more plotting techniques, so they ask you to deliver these datasets as well:
- values sorted in ascending order for each row
- values sorted in ascending order for each column
- the matrix of indices that would sort each row, i.e. for every sorted position, the index of the value that belongs there   
```
[3, 1, 2, 5, 4]  =>  [1, 2, 0, 4, 3]
```

In [12]:
import numpy as np

# Original array
arr = np.array([3, 1, 2, 5, 4])

# Argsort to get the indices that would sort the array
sorted_indices = np.argsort(arr)


print(sorted_indices)


[1 2 0 4 3]


> **Note:**   
By default, sorting will always be done along the last axis. In our case this is 1, leading to each row being sorted.
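
To illustrate the note, here is a small sketch that contrasts the default axis with `axis=0`. The array `sample` is a hypothetical stand-in, not the activity's dataset.

```python
import numpy as np

# Small stand-in array (hypothetical values)
sample = np.array([[3, 1, 2],
                   [9, 7, 8],
                   [6, 4, 5]])

# Default axis=-1 (the last axis): every row is sorted on its own
row_sorted = np.sort(sample)

# axis=0: every column is sorted on its own
col_sorted = np.sort(sample, axis=0)

print("Rows sorted:\n", row_sorted)
print("Columns sorted:\n", col_sorted)
```

`np.sort` returns a sorted copy, so `sample` itself is left unchanged.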

In [34]:
# values sorted for each column
import numpy as np

# Create a sample 2D array
array = np.array([[3, 2, 1], 
                  [6, 5, 8], 
                  [9, 8, 7]])

# Sort each column (axis=0 sorts the values within every column)
sorted_columns = np.sort(array, axis=0)

print("Original Array:\n", array)
print("Sorted Columns:\n", sorted_columns)


Original Array:
 [[3 2 1]
 [6 5 8]
 [9 8 7]]
Sorted Columns:
 [[3 2 1]
 [6 5 7]
 [9 8 8]]


In [35]:
# indices of positions for each row
import numpy as np

# Create a sample 2D array
array = np.array([[3, 2, 1], 
                  [6, 55, 4], 
                  [9, 8, 7]])

# Get the indices that would sort each row (axis=1 works within every row)
sorted_indices_rows = np.argsort(array, axis=1)

print("Original Array:\n", array)
print("Sorted Indices for Each Row:\n", sorted_indices_rows)


Original Array:
 [[ 3  2  1]
 [ 6 55  4]
 [ 9  8  7]]
Sorted Indices for Each Row:
 [[2 1 0]
 [2 0 1]
 [2 1 0]]


---

#### Combining

After finishing their visualization, they ask you to deliver a way to incrementally add the split parts of the dataset back together, to make sure it works with every subset, too.   
They want you to send them examples for:
- adding the second half of the first column
- adding the second column
- adding the third and last separate column


In [43]:
dataset_1 = np.random.randint(1,10,size=(6,6))

In [44]:
dataset_1

array([[4, 8, 1, 6, 9, 4],
       [2, 4, 2, 3, 5, 8],
       [2, 6, 3, 9, 9, 9],
       [2, 7, 2, 5, 3, 3],
       [5, 4, 7, 9, 8, 8],
       [3, 4, 8, 9, 1, 8]])

In [45]:
np.hsplit(dataset_1,3)

[array([[4, 8],
        [2, 4],
        [2, 6],
        [2, 7],
        [5, 4],
        [3, 4]]),
 array([[1, 6],
        [2, 3],
        [3, 9],
        [2, 5],
        [7, 9],
        [8, 9]]),
 array([[9, 4],
        [5, 8],
        [9, 9],
        [3, 3],
        [8, 8],
        [1, 8]])]

In [36]:
thirds = np.hsplit(dataset, (3))
thirds

[array([[ 99.14931546, 104.03852715, 107.43534677],
        [ 92.02628776,  97.10439252,  99.32066924],
        [ 95.66253664,  95.17750125,  90.93318132],
        [ 91.37294597, 100.96781394, 100.40118279],
        [101.20862522, 103.5730309 , 100.28690912],
        [102.80387079,  98.29687616,  93.24376389],
        [106.71751618, 102.97585605,  98.45723272],
        [ 96.02548256, 102.82360856, 106.47551845],
        [105.30350449,  92.87730812, 103.19258339],
        [110.44484313,  93.87155456, 101.5363647 ],
        [101.3514185 , 100.37372248, 106.6471081 ],
        [ 97.21315663, 107.02874163, 102.17642112],
        [ 95.65982034, 107.22482426, 107.19119932],
        [100.39303522,  92.0108226 ,  97.75887636],
        [103.1521596 , 109.40523174,  93.83969256],
        [106.11454989,  88.80221141,  94.5081787 ],
        [ 96.78266211,  99.84251605, 104.03478031],
        [101.86186193, 103.61720152,  99.57859892],
        [ 97.49594839,  96.59385486, 104.63817694],
        [ 96 ... (output truncated)

In [18]:
# split up dataset from activity03
thirds = np.hsplit(dataset, (3))
halfed_first = np.vsplit(thirds[0], (2))

# this is the part we've sent the client in activity03
halfed_first[0]

array([[ 99.14931546, 104.03852715, 107.43534677],
       [ 92.02628776,  97.10439252,  99.32066924],
       [ 95.66253664,  95.17750125,  90.93318132],
       [ 91.37294597, 100.96781394, 100.40118279],
       [101.20862522, 103.5730309 , 100.28690912],
       [102.80387079,  98.29687616,  93.24376389],
       [106.71751618, 102.97585605,  98.45723272],
       [ 96.02548256, 102.82360856, 106.47551845],
       [105.30350449,  92.87730812, 103.19258339],
       [110.44484313,  93.87155456, 101.5363647 ],
       [101.3514185 , 100.37372248, 106.6471081 ],
       [ 97.21315663, 107.02874163, 102.17642112]])

In [70]:
# adding the second half of the first column to the data


In [78]:
# adding the second column to our combined dataset


In [79]:
# adding the third column to our combined dataset
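
One possible way to fill in the three steps above is sketched below. It is self-contained and uses a hypothetical stand-in array `data` with the same shape as the dataset, `(24, 9)`, so it runs without the CSV file. The names `first_col`, `first_second_col` and `full_data` are chosen for illustration only.

```python
import numpy as np

# Stand-in with the dataset's shape (24, 9); the values are hypothetical
data = np.arange(24 * 9).reshape(24, 9)

# Split as in the cells above: three column blocks, first block halved by rows
thirds = np.hsplit(data, 3)
halfed_first = np.vsplit(thirds[0], 2)

# Step 1: put the second half of the first column block under the first half
first_col = np.vstack((halfed_first[0], halfed_first[1]))

# Step 2: append the second column block to the right
first_second_col = np.hstack((first_col, thirds[1]))

# Step 3: append the third and last column block
full_data = np.hstack((first_second_col, thirds[2]))

print(full_data.shape)
print(np.array_equal(full_data, data))
```

Comparing the result with the original array is a quick check that no block was added in the wrong order.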


#### Concatenate

In [19]:
a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([[7, 8, 9], [10, 11, 12]])
np.concatenate((a, b), axis=0)


array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

In [20]:
np.concatenate((a, b), axis=1)



array([[ 1,  2,  3,  7,  8,  9],
       [ 4,  5,  6, 10, 11, 12]])

#### Stack

In [48]:
l = np.stack((a, b), axis=0)
l.shape

(2, 2, 3)

In [49]:
l

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

In [50]:
k = np.hstack((a, b))
k.shape

(2, 6)

In [30]:
np.hstack((a, b)), np.vstack((a, b))


(array([[ 1,  2,  3,  7,  8,  9],
        [ 4,  5,  6, 10, 11, 12]]),
 array([[ 1,  2,  3],
        [ 4,  5,  6],
        [ 7,  8,  9],
        [10, 11, 12]]))

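> **Note:**    
For 2D arrays, `np.hstack` and `np.vstack` give the same results as `np.concatenate` with `axis=1` and `axis=0`, respectively.    
`np.stack` behaves differently: it joins the arrays along a new axis, which is why the stacked result above has the shape `(2, 2, 3)`.   
For both `np.concatenate` and `np.stack`, you need to provide the axis along which the arrays are joined.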
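
A short check of how these functions relate, reusing the arrays `a` and `b` from the cells above. The variable names `same_h`, `same_v` and `stacked` are introduced here only for illustration.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([[7, 8, 9], [10, 11, 12]])

# hstack joins along the existing axis 1, vstack along the existing axis 0
same_h = np.array_equal(np.hstack((a, b)), np.concatenate((a, b), axis=1))
same_v = np.array_equal(np.vstack((a, b)), np.concatenate((a, b), axis=0))

# stack adds a new axis instead of extending an existing one
stacked = np.stack((a, b), axis=0)

print(same_h, same_v)
print(stacked.shape)
```

For these 2D inputs the concatenated results stay two-dimensional, while the stacked result gains a third dimension.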

---

#### Reshaping

For their internal AI algorithms, they need the dataset reshaped so that it has fewer columns.   
They asked us to deliver the whole dataset in the following shapes:
- reshaped into a one-dimensional list with all values
- reshaped into a matrix with only 2 columns

In [51]:
# reshaping to a list of values
reshaped_to_list = dataset_1.flatten()
reshaped_to_list


array([4, 8, 1, 6, 9, 4, 2, 4, 2, 3, 5, 8, 2, 6, 3, 9, 9, 9, 2, 7, 2, 5,
       3, 3, 5, 4, 7, 9, 8, 8, 3, 4, 8, 9, 1, 8])

In [56]:
# reshaping to a matrix with two columns
# -1 lets NumPy infer the number of rows (36 values / 2 columns = 18 rows)
reshaped_to_two_columns = dataset_1.reshape(-1, 2)
reshaped_to_two_columns


array([[4, 8],
       [1, 6],
       [9, 4],
       [2, 4],
       [2, 3],
       [5, 8],
       [2, 6],
       [3, 9],
       [9, 9],
       [2, 7],
       [2, 5],
       [3, 3],
       [5, 4],
       [7, 9],
       [8, 8],
       [3, 4],
       [8, 9],
       [1, 8]])

> **Note:**   
Passing -1 for one dimension tells NumPy to infer its size from the total number of elements and the other dimensions. Only one dimension can be -1.
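
The following sketch shows how the -1 placeholder behaves, assuming a hypothetical 6x6 stand-in array `values` with 36 elements (the same size as `dataset_1`).

```python
import numpy as np

# Stand-in array with 36 elements (hypothetical values)
values = np.arange(36).reshape(6, 6)

# Fix the number of columns, let NumPy infer the rows: 36 / 2 = 18
two_cols = values.reshape(-1, 2)

# Fix the number of rows, let NumPy infer the columns: 36 / 9 = 4
nine_rows = values.reshape(9, -1)

print(two_cols.shape)
print(nine_rows.shape)

# If the size is not evenly divisible, the inference fails with a ValueError
try:
    values.reshape(-1, 5)
    reshape_failed = False
except ValueError:
    reshape_failed = True

print(reshape_failed)
```

Putting the -1 first and the desired column count second is what produces a matrix with a fixed number of columns.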