<h1 align = center>NumPy Library - Indexing, Slicing and Filtering Arrays</h1>

**Table of contents**<a id='toc0_'></a>    
- [Indexing Array Elements](#toc1_1_)    
    - [Indexing 1D Arrays](#toc1_1_1_)    
    - [Indexing 2D-Arrays](#toc1_1_2_)    
      - [Accessing rows](#toc1_1_2_1_)    
      - [Accessing Multiple Rows at a Time](#toc1_1_2_2_)    
      - [Accessing Columns](#toc1_1_2_3_)    
      - [Accessing a Particular Set of Elements from ND-Arrays](#toc1_1_2_4_)    
    - [Accessing Elements in a 3D array](#toc1_1_3_)    
      - [Accessing a Complete Inner Array](#toc1_1_3_1_)    
      - [Accessing a Range of Complete Inner Arrays](#toc1_1_3_2_)    
      - [Accessing a Single Row from a Single Inner List](#toc1_1_3_3_)    
      - [Accessing a Column](#toc1_1_3_4_)    
      - [Accessing a Single Element](#toc1_1_3_5_)    
- [Filtering Arrays](#toc1_2_)    
    - [Masking Method](#toc1_2_1_)    
      - [Basic Filtering](#toc1_2_1_1_)    
      - [Compound Conditions](#toc1_2_1_2_)    
      - [Logical Operations](#toc1_2_1_3_)    
    - [The `np.where()` Method](#toc1_2_2_)    
      - [Basic `np.where()` Usage](#toc1_2_2_1_)    
      - [Compound Conditions with `np.where()`](#toc1_2_2_2_)    
      - [Logical Operations with `np.where()`](#toc1_2_2_3_)    
    - [Inverting Filtering Masks](#toc1_2_3_)    
    - [Filtering with Custom Functions](#toc1_2_4_)    
    - [Filtering Based on Row or Column Properties](#toc1_2_5_)    


## <a id='toc1_1_'></a>[Indexing Array Elements](#toc0_)

### <a id='toc1_1_1_'></a>[Indexing 1D Arrays](#toc0_)

In [1]:
import numpy as np

In [2]:
an_array = np.array([1,2,3,4,5])
second_element = an_array[1]
fifth_element = an_array[4]
first_element_from_last = an_array[-1]
third_element_from_last = an_array[-3]


print(f'second element : {second_element}')
print(f'fifth element : {fifth_element}')
print(f'first element from the last : {first_element_from_last}')
print(f'third element from the last : {third_element_from_last}')

second element : 2
fifth element : 5
first element from the last : 5
third element from the last : 3


### <a id='toc1_1_2_'></a>[Indexing 2D-Arrays](#toc0_)

In [3]:
# defining a 2D array
b_array = np.array([
    [1,2,3,4,5],
    [6,7,8,9,10],
    [11,12,13,14,15]
])

b_array.shape

(3, 5)

#### <a id='toc1_1_2_1_'></a>[Accessing rows](#toc0_)

In [4]:
print(f'first row : {   b_array[0]  }')
print(f'second row : {  b_array[1]  }')

first row : [1 2 3 4 5]
second row : [ 6  7  8  9 10]


#### <a id='toc1_1_2_2_'></a>[Accessing Multiple Rows at a Time](#toc0_)
- Multiple rows are accessed using a colon `:`, which denotes a range of rows (or columns).
- __Syntax__
  - `our_array[starting_index : ending_index]`
  - The starting index is included in the selection, but the ending index is not. For example, an ending index of 10 selects rows up to index 9; the row at index 10 is excluded.

In [5]:
# this will select rows at index 0 and 1, where index 2 will not be included. 
print(f'first two rows : \n{b_array[0:2]}') 

first two rows : 
[[ 1  2  3  4  5]
 [ 6  7  8  9 10]]


- A simple colon `:` without any starting or ending index will select all the rows

In [6]:
# selecting all the rows
print(f"all the rows selected with colon `:` \n{    b_array[ : ]}")

all the rows selected with colon `:` 
[[ 1  2  3  4  5]
 [ 6  7  8  9 10]
 [11 12 13 14 15]]
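The half-open rule described above also covers omitted bounds and an optional step. The following is a minimal sketch; the array name `rows` is illustrative and simply holds the same values as `b_array`.

```python
import numpy as np

rows = np.array([
    [1, 2, 3, 4, 5],
    [6, 7, 8, 9, 10],
    [11, 12, 13, 14, 15],
])

first_two = rows[0:2]     # rows at index 0 and 1; index 2 is excluded
from_second = rows[1:]    # omitted end: index 1 through the last row
up_to_second = rows[:2]   # omitted start: same selection as rows[0:2]
every_other = rows[::2]   # step of 2: rows at index 0 and 2
last_row_2d = rows[-1:]   # a slice keeps the row axis, so the shape is (1, 5)

print(first_two.shape)    # (2, 5)
print(every_other)
print(last_row_2d.shape)  # (1, 5)
```

Note that `rows[-1]` (an integer index) would return a 1D array of shape `(5,)`, while `rows[-1:]` (a slice) stays 2D.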


#### <a id='toc1_1_2_3_'></a>[Accessing Columns](#toc0_)
- A `:` in place of the row index selects all the rows. A complete column needs a value from every row, so we use `:` for the rows and give the index of the column we want.
- The column index is written after the row index, and the two are separated by a comma `,`.
- __Syntax__:
  - `array_name[row_indexes, column_indexes]`
  - Using a colon `:` in place of both the row and column indexes fetches all the rows and all the columns.

In [7]:
print(f'second column : {    b_array[:,1]   }')
print(f'fourth column : {   b_array[:,3]    }')
print(f'last column : {     b_array[:,-1]   }')

second column : [ 2  7 12]
fourth column : [ 4  9 14]
last column : [ 5 10 15]


In [8]:
b_array[: , :]

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15]])
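One detail worth seeing in code is the shape of a selected column. The sketch below assumes an array named `grid` (an illustrative name) with the same values as `b_array`; the last line uses a list of indices, which goes slightly beyond the plain slicing shown so far.

```python
import numpy as np

grid = np.arange(1, 16).reshape(3, 5)  # same values as b_array

col_1d = grid[:, 1]          # integer index drops the column axis: shape (3,)
col_2d = grid[:, 1:2]        # slice keeps the column axis: shape (3, 1)
two_cols = grid[:, [0, -1]]  # list of indices: first and last columns

print(col_1d.shape)   # (3,)
print(col_2d.shape)   # (3, 1)
print(two_cols)
```

Use the slice form when later code expects a 2D array, for example when stacking columns back together.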

#### <a id='toc1_1_2_4_'></a>[Accessing a Particular Set of Elements from ND-Arrays](#toc0_)

- 4 elements from top left

In [9]:
b_array[0:2, 0:2]

array([[1, 2],
       [6, 7]])

- Accessing the 3rd and 4th elements of the second and third rows.

In [10]:
b_array[1:3,2:4]

array([[ 8,  9],
       [13, 14]])
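Sub-arrays obtained with slices like these are views onto the original data rather than copies, which matters as soon as you assign to them. A minimal sketch, using an illustrative array `grid` with the same values as `b_array`:

```python
import numpy as np

grid = np.arange(1, 16).reshape(3, 5)  # same values as b_array

block = grid[0:2, 0:2]                 # basic slicing returns a view
print(np.shares_memory(grid, block))   # True

block[0, 0] = 99                       # writing through the view...
print(grid[0, 0])                      # 99: ...changes the original too

safe = grid[1:3, 2:4].copy()           # .copy() makes an independent array
safe[0, 0] = -1
print(grid[1, 2])                      # 8: the original is unchanged
```

Call `.copy()` whenever the selected block will be modified and the source array must stay intact.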

### <a id='toc1_1_3_'></a>[Accessing Elements in a 3D array](#toc0_)

In [11]:
c_array = np.array(
    [

        [
            [1, 2, 3],
            [4, 5, 6]
        ],
        [
            [7, 8, 9],
            [10, 11, 12]
        ],
        [
            [13, 14, 15],
            [16, 17, 18]
        ]

    ]
)

print(c_array)
print(f"array shape : {c_array.shape}")

[[[ 1  2  3]
  [ 4  5  6]]

 [[ 7  8  9]
  [10 11 12]]

 [[13 14 15]
  [16 17 18]]]
array shape : (3, 2, 3)


#### <a id='toc1_1_3_1_'></a>[Accessing a Complete Inner Array](#toc0_)

In [12]:
c_array[0]

array([[1, 2, 3],
       [4, 5, 6]])

#### <a id='toc1_1_3_2_'></a>[Accessing a Range of Complete Inner Arrays](#toc0_)

In [13]:
c_array[1:3] # last two inner arrays

array([[[ 7,  8,  9],
        [10, 11, 12]],

       [[13, 14, 15],
        [16, 17, 18]]])

#### <a id='toc1_1_3_3_'></a>[Accessing a Single Row from a Single Inner List](#toc0_)

In [14]:
'''
First row of the second inner array.

c_array[1 , 0]
The 1 selects the inner array at index 1 (the second 2D block).
The 0 selects the row at index 0 inside that inner array.
'''
c_array[1 , 0]



array([7, 8, 9])

In [15]:
# second row of third inner array
c_array[2 , 1]


array([16, 17, 18])

#### <a id='toc1_1_3_4_'></a>[Accessing a Column](#toc0_)
- Accessing a column is tricky in a 3D array.
- The first index selects which inner arrays to use, the second selects which rows of those arrays, and the third selects which columns.

In [16]:
# 2nd column of every inner array
c_array[: , : , 1]

array([[ 2,  5],
       [ 8, 11],
       [14, 17]])

In [17]:
# 1st column of every inner array
c_array[: , : , 0]

array([[ 1,  4],
       [ 7, 10],
       [13, 16]])
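To make the role of each of the three indexes concrete, the sketch below varies one position at a time and prints the resulting shapes. The name `cube` is illustrative; it holds the same values as `c_array`.

```python
import numpy as np

cube = np.arange(1, 19).reshape(3, 2, 3)  # same values as c_array

print(cube[:, :, 1].shape)  # (3, 2): one value per (inner array, row)
print(cube[..., 1].shape)   # (3, 2): `...` stands in for the leading colons
print(cube[1, :, 1])        # [ 8 11]: column 1 of the second inner array only
print(cube[:, 0, :])        # first row of every inner array
```

Each integer index removes one axis from the result, while each `:` keeps its axis.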

#### <a id='toc1_1_3_5_'></a>[Accessing a Single Element](#toc0_)
- Selecting 14 from the third inner array.

In [18]:
c_array[2 , 0 , 1]

14

## <a id='toc1_2_'></a>[Filtering Arrays](#toc0_)
- There are two main methods to filter NumPy arrays:
    1. Masking Method
    2. `np.where()` Method

### <a id='toc1_2_1_'></a>[Masking Method](#toc0_)
- When we apply conditional operators to a NumPy array, NumPy does not return a modified array with the applied conditions. Instead, it checks each array element against the specified conditions and returns `True` if an element meets the conditions or `False` if it does not. As a result, we get a Boolean NumPy array (of the same shape as the original array) where each element indicates whether the condition was met. This returned array is called a mask.
- This mask can then be used within square brackets of our array in place of row and column indices to select only those elements from our array that correspond to `True` values in the mask array.

#### <a id='toc1_2_1_1_'></a>[Basic Filtering](#toc0_)

In [19]:
my_array = np.arange(10,31)
print(f"The original array : {my_array}")

The original array : [10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30]


- Let's create a mask that checks our array against the condition `element > 20` and returns `True` or `False` for each element.

In [20]:
mask = my_array > 20
print(f"The mask returned : {mask}")

The mask returned : [False False False False False False False False False False False  True
  True  True  True  True  True  True  True  True  True]


- Now we will pass this mask to our array in place of element indices to filter our array.

In [21]:
my_array[mask]

array([21, 22, 23, 24, 25, 26, 27, 28, 29, 30])

- As we can see, only the elements greater than 20 are returned. The array is filtered.

Let's look at another example:
- Given the following 2D array, select all the elements that are less than 50.

In [22]:
our_array = np.array([[34, 55, 72],
                     [49, 50, 88],
                     [65, 22, 11]])

mask = our_array < 50

our_array[mask]

array([34, 49, 22, 11])

In [23]:
print(f"Of course we can sort them : {sorted(our_array[mask])}")

Of course we can sort them : [11, 22, 34, 49]
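A few properties of the mask are easy to check directly. This is a small sketch with an illustrative array name, `values`, holding the same numbers as the 2D example; `np.sort` is used here as the array-returning counterpart of Python's `sorted`.

```python
import numpy as np

values = np.array([[34, 55, 72],
                   [49, 50, 88],
                   [65, 22, 11]])

mask = values < 50
print(mask.dtype, mask.shape)  # bool (3, 3): same shape as the data
print(mask.sum())              # 4: each True counts as 1

selected = values[mask]
print(selected.ndim)           # 1: the 2D layout is not preserved
print(np.sort(selected))       # [11 22 34 49]
```

Because the selected elements no longer form a rectangle, Boolean indexing with a full-shape mask always returns a flat 1D array.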


#### <a id='toc1_2_1_2_'></a>[Compound Conditions](#toc0_)
- It is possible to apply multiple conditions at once.
- The conditions are combined with the element-wise operators `&` (and) and `|` (or), and each condition must be wrapped in parentheses.

Let's create a 1D array with values from 1 to 20 and select the elements that are either less than 5 or greater than 15.

In [24]:
our_array = np.arange(1,21)
our_array

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19, 20])

In [25]:
mask = (our_array < 5) | (our_array > 15)

our_array[mask]

array([ 1,  2,  3,  4, 16, 17, 18, 19, 20])

Let's select all the elements that are greater than or equal to 30 and less than or equal to 60.

In [26]:
our_array = np.array([[25, 40, 35],
                      [50, 28, 65],
                      [20, 45, 58]])
our_array

array([[25, 40, 35],
       [50, 28, 65],
       [20, 45, 58]])

In [27]:
mask = (our_array >= 30) & (our_array <= 60)

our_array[mask]

array([40, 35, 50, 45, 58])
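A common stumbling block with compound conditions is reaching for Python's `and` / `or` keywords instead of `&` / `|`. The sketch below, with an illustrative array `values`, shows what happens in that case.

```python
import numpy as np

values = np.arange(1, 21)

# Parentheses are needed because `&` and `|` bind more tightly than `<` or `>=`.
between = (values >= 5) & (values <= 15)
print(values[between])

# Python's `and` asks for one truth value for the whole array,
# which NumPy refuses to guess, so it raises ValueError.
raised = False
try:
    (values >= 5) and (values <= 15)
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```

The element-wise operators evaluate the condition per element, which is what a mask requires.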

#### <a id='toc1_2_1_3_'></a>[Logical Operations](#toc0_)
- We can also use NumPy's logical functions, such as `np.logical_and()` and `np.logical_or()`, to combine multiple conditions when filtering an array.

Let's create a 1D array of integers from 0 to 50 and select the elements that are divisible by both 3 and 5.

In [28]:
our_array = np.arange(0 , 51)
our_array

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
       34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50])

In [29]:
mask = np.logical_and(our_array % 3 == 0, our_array % 5 == 0)  

our_array[mask]

array([ 0, 15, 30, 45])

Let's select all the elements that are either even or greater than 70.

In [30]:
our_array = np.array([[20, 45, 64, 81],
                      [33, 72, 19, 88],
                      [90, 34, 55, 77]])
our_array

array([[20, 45, 64, 81],
       [33, 72, 19, 88],
       [90, 34, 55, 77]])

In [31]:
mask = np.logical_or(our_array % 2 == 0, our_array > 70)

our_array[mask]

array([20, 64, 81, 72, 88, 90, 34, 77])
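The logical functions produce the same masks as the `&` and `|` operators, and their `reduce` method is one way to combine more than two conditions. A minimal sketch with illustrative names (`values`, `div_3`, `div_5`, `even`):

```python
import numpy as np

values = np.arange(0, 51)

div_3 = values % 3 == 0
div_5 = values % 5 == 0
even = values % 2 == 0

# np.logical_and on Boolean arrays gives the same mask as `&`.
same = np.array_equal(np.logical_and(div_3, div_5), div_3 & div_5)
print(same)                  # True

# reduce() folds a list of masks into one with repeated logical_and.
all_three = np.logical_and.reduce([div_3, div_5, even])
print(values[all_three])     # [ 0 30]
```

Which spelling to use is mostly a matter of readability; the operator form is shorter, while the function form avoids the need for extra parentheses.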

### <a id='toc1_2_2_'></a>[The `np.where()` Method](#toc0_)
- Just like the masking method, we can also filter arrays using the `np.where()` function.
- Unlike the masking method, `np.where()` called with only a condition does not return a Boolean array. Instead, it returns a tuple of index arrays (one per dimension) locating every element that meets the condition.
- It can do more than locate values: when two more arguments are supplied, it builds a new array in which values are chosen according to the condition.
- __Syntax__:
  - `np.where(conditions_to_meet, do_if_conditions_met, do_if_conditions_did_not_meet)`
  - First argument: the condition(s) to check against each element of the array, just like in the masking method.
  - Second argument: the value (or array of values) to use wherever the condition is met.
  - Third argument: the value (or array of values) to use wherever the condition is not met. Passing the original array here leaves those elements unchanged.
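The two calling forms can be compared side by side on a tiny array. This is a minimal sketch; the array `values` is made up for illustration.

```python
import numpy as np

values = np.array([3, 8, 1, 9])

# One argument: a tuple of index arrays, one per dimension.
idx = np.where(values > 4)
print(type(idx), len(idx))   # a tuple holding one array for this 1D input
print(idx[0])                # [1 3]
print(values[idx])           # [8 9]

# Three arguments: a new array assembled element by element.
replaced = np.where(values > 4, 0, values)
print(replaced)              # [3 0 1 0]
print(values)                # [3 8 1 9]: the original is not modified
```

The three-argument form never changes its inputs; it always returns a new array.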

              

#### <a id='toc1_2_2_1_'></a>[Basic `np.where()` Usage](#toc0_)

- Let's create an array.

In [32]:
our_array = np.array([[34, 55, 72],
                     [49, 50, 88],
                     [65, 22, 11]])

print(our_array)

[[34 55 72]
 [49 50 88]
 [65 22 11]]


- Applying a condition using `np.where` and getting the indices of values. 

In [33]:
condition = our_array > 25

indices = np.where(condition)

print(f"returned indices of elements that meet our condition : \n{indices}")

returned indices of elements that meet our condition : 
(array([0, 0, 0, 1, 1, 1, 2], dtype=int64), array([0, 1, 2, 0, 1, 2, 0], dtype=int64))


- Getting the filtered array using the indices

In [34]:
x = our_array[indices]
print(x)
print(f'sorted to check {sorted(our_array[indices])}')

[34 55 72 49 50 88 65]
sorted to check [34, 49, 50, 55, 65, 72, 88]
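For a 2D array, the returned tuple holds the row indices and the column indices separately. The sketch below, using an illustrative array `values` with the same numbers, pairs them up and compares the result with `np.nonzero` and `np.argwhere`, two related NumPy functions.

```python
import numpy as np

values = np.array([[34, 55, 72],
                   [49, 50, 88],
                   [65, 22, 11]])

rows, cols = np.where(values > 25)

# Pair the two index arrays into (row, column) coordinates.
pairs = list(zip(rows.tolist(), cols.tolist()))
print(pairs)

# With only a condition, np.where gives the same result as np.nonzero.
nz_rows, nz_cols = np.nonzero(values > 25)
print(np.array_equal(rows, nz_rows) and np.array_equal(cols, nz_cols))  # True

# np.argwhere returns the coordinates directly, one row per match.
coords = np.argwhere(values > 25)
print(coords.shape)   # (7, 2)
```

`np.argwhere` is convenient for reading coordinates, while the tuple from `np.where` is the form that can be passed straight back into the array as an index.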


- Let's create a 1D array with values from 10 to 30 and use `np.where()` to replace elements that are greater than 20 with -1.

In [35]:
our_array = np.arange(10, 31)
print(f"original array \n{our_array}")

original array 
[10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30]


In [36]:
condition = our_array > 20
replace_if_condition_met = -1
replace_if_condition_did_not_meet = our_array # use the original values, i.e. leave these elements as they are

filtered_array = np.where(condition, replace_if_condition_met, replace_if_condition_did_not_meet)
print(filtered_array)

[10 11 12 13 14 15 16 17 18 19 20 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1]


- Let's take another example: given a 2D array, use `np.where()` to replace all elements that are less than 50 with 0.

In [37]:
our_array = np.array([[34, 55, 72],
                      [49, 50, 88],
                      [65, 22, 11]])

print(our_array)

[[34 55 72]
 [49 50 88]
 [65 22 11]]


In [38]:
new_array = np.where(our_array < 50, 0, our_array)
print(new_array)

[[ 0 55 72]
 [ 0 50 88]
 [65  0  0]]
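The same replacement can also be done by assigning through a Boolean mask. The difference is that mask assignment modifies the array in place, while `np.where()` returns a new one. A minimal sketch with illustrative names (`values`, `in_place`):

```python
import numpy as np

values = np.array([[34, 55, 72],
                   [49, 50, 88],
                   [65, 22, 11]])

# np.where builds a new array and leaves `values` alone.
result = np.where(values < 50, 0, values)

# Mask assignment overwrites the selected elements of the array it is applied to.
in_place = values.copy()
in_place[in_place < 50] = 0

print(np.array_equal(result, in_place))  # True
print(values[0, 0])                      # 34: the source array is unchanged
```

Prefer `np.where()` when the original data must be kept, and mask assignment when an in-place update is acceptable.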


#### <a id='toc1_2_2_2_'></a>[Compound Conditions with `np.where()`](#toc0_)

- Create a 1D array of integers from 0 to 50. Use np.where() to replace elements that are divisible by both 3 and 5 with 1000

In [39]:
our_array = np.arange(0,51)
print(our_array)

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
 48 49 50]


In [40]:
new_array = np.where((our_array % 3 == 0) & (our_array % 5 == 0), 1000, our_array)
print(new_array)

[1000    1    2    3    4    5    6    7    8    9   10   11   12   13
   14 1000   16   17   18   19   20   21   22   23   24   25   26   27
   28   29 1000   31   32   33   34   35   36   37   38   39   40   41
   42   43   44 1000   46   47   48   49   50]


- Using a 2D array, use np.where() to replace all elements that are either even or greater than 70 with -1

In [41]:
our_array = np.array([[20, 45, 64, 81],
                      [33, 72, 19, 88],
                      [90, 34, 55, 77]])
print(our_array)

[[20 45 64 81]
 [33 72 19 88]
 [90 34 55 77]]


In [42]:
new_array = np.where((our_array % 2 == 0) | (our_array > 70), -1, our_array)
print(new_array)

[[-1 45 -1 -1]
 [33 -1 19 -1]
 [-1 -1 55 -1]]


#### <a id='toc1_2_2_3_'></a>[Logical Operations with `np.where()`](#toc0_)

- Create a 1D array with values from 1 to 20. Use np.where() to replace elements that are either less than 5 or greater than 15 with 0.

In [43]:
our_array = np.arange(1,21)
print(our_array)

[ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20]


In [44]:
new_array = np.where(np.logical_or(our_array < 5, our_array > 15), 0 , our_array)
print(new_array)

[ 0  0  0  0  5  6  7  8  9 10 11 12 13 14 15  0  0  0  0  0]


- Given a 2D array, use np.where() to replace all elements that are greater than or equal to 30 and less than or equal to 60 with 999.

In [45]:
our_array = np.array([[25, 40, 35],
                      [50, 28, 65],
                      [20, 45, 58]])
print(our_array)

[[25 40 35]
 [50 28 65]
 [20 45 58]]


In [46]:
new_array = np.where(np.logical_and(our_array >= 30, our_array <= 60), 999, our_array)
print(new_array)

[[ 25 999 999]
 [999  28  65]
 [ 20 999 999]]


### <a id='toc1_2_3_'></a>[Inverting Filtering Masks](#toc0_)
By inverting a mask, we can select exactly the elements that the original mask rejects: first create a mask from a condition, then invert it to pick the elements that do not meet that condition. For example:
- Filter an array to keep only the elements greater than 20, then invert the mask to keep the remaining elements instead.

In [47]:
our_array = np.arange(1,41,3)
print(our_array)

[ 1  4  7 10 13 16 19 22 25 28 31 34 37 40]


In [48]:
mask = our_array > 20 # keeps only the elements greater than 20
print(our_array[mask])

[22 25 28 31 34 37 40]


In [49]:
# invert the Boolean mask with the `~` operator
print(our_array[~mask])

[ 1  4  7 10 13 16 19]
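Two points about `~` are worth checking in code: on a Boolean mask it matches `np.logical_not`, and on an integer array it is a bitwise NOT rather than a logical negation. A small sketch with an illustrative array `values`:

```python
import numpy as np

values = np.arange(1, 41, 3)
mask = values > 20

# On a Boolean array, `~` and np.logical_not give the same result.
print(np.array_equal(~mask, np.logical_not(mask)))   # True

# A mask and its inverse split the array into two non-overlapping parts.
print(len(values[mask]) + len(values[~mask]) == len(values))  # True

# Caution: on integers, `~` flips bits instead of negating truth values.
flipped = ~np.array([0, 1])
print(flipped)   # [-1 -2]
```

For that reason, apply `~` only to arrays that are already Boolean, such as the result of a comparison.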


### <a id='toc1_2_4_'></a>[Filtering with Custom Functions](#toc0_)
We can also create a custom function that returns a Boolean value based on a condition, and use that function to filter an array.
- For example: write a function that checks if a number is divisible by 5 and use it to filter an array to include only those numbers.

In [50]:
# defining an array
our_array = np.arange(1, 100, 4)
print(f"original array\n{our_array}")

# defining a custom function to check if a number is divisible by 5
def divisible_by_five(number):
    if number % 5 == 0:
        return True
    else:
        return False

# testing our function
x = divisible_by_five(10)
print(type(x))

original array
[ 1  5  9 13 17 21 25 29 33 37 41 45 49 53 57 61 65 69 73 77 81 85 89 93
 97]
<class 'bool'>


In [51]:
'''
Filtering with a custom function
1- with the masking method
2- with np.where
'''
# build the mask by calling the function on every element
mask = list(map(divisible_by_five, our_array))
new_array = our_array[mask]
print(new_array)

# np.vectorize wraps the function so that it accepts a whole array and returns a Boolean array for np.where
v_func = np.vectorize(divisible_by_five)
indices = np.where(v_func(our_array))
new_array = our_array[indices]
print(new_array)


[ 5 25 45 65 85]
[ 5 25 45 65 85]
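Two variations on this pattern may be useful. A simple arithmetic test can be written directly as an array expression with no Python function at all, while a test that needs a loop, such as a primality check, is a more natural fit for `np.vectorize`. In the sketch below, `is_prime` is a hypothetical helper written for illustration, and `otypes=[bool]` is passed so the result is a Boolean array.

```python
import math
import numpy as np

values = np.arange(1, 100, 4)

# Simple arithmetic conditions work element-wise without a custom function.
print(values[values % 5 == 0])   # [ 5 25 45 65 85]

def is_prime(number):
    """Hypothetical helper: trial division up to the integer square root."""
    n = int(number)
    if n < 2:
        return False
    return all(n % d != 0 for d in range(2, math.isqrt(n) + 1))

# np.vectorize calls is_prime once per element and collects the results.
prime_mask = np.vectorize(is_prime, otypes=[bool])(values)
primes = values[prime_mask]
print(primes)   # [ 5 13 17 29 37 41 53 61 73 89 97]
```

According to the NumPy documentation, `np.vectorize` is provided mainly for convenience and is essentially a loop, so the array-expression form is the faster choice whenever the condition allows it.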


### <a id='toc1_2_5_'></a>[Filtering Based on Row or Column Properties](#toc0_)
We can filter rows or columns based on properties like sum, mean, or standard deviation.
- For example: select the rows where the sum of the elements is greater than a certain value.


In [53]:
our_array = np.array([[25, 40, 35],
                      [50, 28, 65],
                      [20, 45, 58]])

row_sums = np.sum(our_array, axis = 1)
print(f"sum of each row : {row_sums}")

sum of each row : [100 143 123]


In [59]:
mask = np.sum(our_array, axis = 1) > 100
print(mask, end="\n\n")

new_array = our_array[mask]
print(new_array)

[False  True  True]

[[50 28 65]
 [20 45 58]]


In [62]:
indices = np.where(np.sum(our_array, axis = 1) > 100)
print(f"indices : {indices[0]}", end= '\n\n')

new_array = our_array[indices]
print(new_array)

indices : [1 2]

[[50 28 65]
 [20 45 58]]
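The same idea applies to columns: compute the property along `axis=0` and place the mask in the column position of the index. A minimal sketch, where the array name `values` and the threshold of 35 are chosen purely for illustration:

```python
import numpy as np

values = np.array([[25, 40, 35],
                   [50, 28, 65],
                   [20, 45, 58]])

col_means = values.mean(axis=0)   # one mean per column
keep = col_means > 35             # Boolean mask over the columns
print(keep)                       # [False  True  True]

# `:` keeps every row; the mask selects which columns survive.
print(values[:, keep])
# [[40 35]
#  [28 65]
#  [45 58]]
```

Here `axis=0` collapses the rows to give one value per column, whereas `axis=1` collapses the columns to give one value per row.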
