<a href="https://colab.research.google.com/github/MarcoAbrantes/Numpy/blob/main/5_numpy_array_operations_functions.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 5 Numpy functions that you might not know


### Functions that you might or might not use in everyday Numpy work

> Numpy is a Python library created by Travis Oliphant in 2005. It is an open source project that anyone can use for free. Numpy stands for Numerical Python, and it was created to handle arrays and multidimensional matrices using less memory than plain Python. It offers a large number of functions for working with those structures (arrays and matrices). As a Data Scientist you will often not work with Numpy directly, but through other tools that were built on top of it, like Pandas.

- Function 1 - [numpy.identity](#function1)
- Function 2 - [numpy.rot90](#function2)
- Function 3 - [numpy.sort](#function3)
- Function 4 - [numpy.extract](#function4)
- Function 5 - [numpy.count_nonzero](#function5)

**Note:** Click on a function's link to jump to the corresponding section.

The recommended way to run this notebook is to click the "Run" button at the top of this page, and select "Run on Binder". This will run the notebook on mybinder.org, a free online service for running Jupyter notebooks.

Let's begin by importing Numpy and listing out the functions covered in this notebook.

In [None]:
import numpy as np

In [None]:
# List of functions explained 
function1 = np.identity
function2 = np.rot90
function3 = np.sort
function4 = np.extract
function5 = np.count_nonzero

<a id='function1'></a>
## Function 1 - np.identity

> **np.identity(n, dtype=None)**
>
> Returns a square array with ones on the main diagonal and zeros elsewhere. It builds on the `np.eye` function to simplify the creation of what is known as an identity matrix.


##### For more detail please consult the official documentation [here](https://numpy.org/devdocs/reference/generated/numpy.identity.html#)

In [None]:
# Example 1 - working 

identity_matrix1 = np.identity(3)
print("This is a 3x3 identity matrix with float numbers:\n")
print(identity_matrix1, "\n")

identity_matrix2 = np.identity(4, int)
print("This is a 4x4 identity matrix with integers:\n")
print(identity_matrix2)

This is a 3x3 identity matrix with float numbers:

[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]] 

This is a 4x4 identity matrix with integers:

[[1 0 0 0]
 [0 1 0 0]
 [0 0 1 0]
 [0 0 0 1]]


The examples above show how to create identity matrices of floats and integers. Omitting the second parameter produces a floating point identity matrix.
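As a quick check of the relationship with `np.eye` mentioned earlier, the sketch below compares the two functions and inspects the default data type. The variable names are only illustrative.

```python
import numpy as np

# np.identity(n) and np.eye(n) should produce the same square matrix.
from_identity = np.identity(3)
from_eye = np.eye(3)
same_values = np.array_equal(from_identity, from_eye)
print(same_values)

# Without a dtype argument the result is a floating point array.
print(from_identity.dtype)

# np.eye is more general: it can also build non-square arrays
# and shift the diagonal with its k argument.
shifted = np.eye(3, 4, k=1, dtype=int)
print(shifted)
```

When only a plain square identity matrix is needed, `np.identity` keeps the call shorter.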

In [None]:
# Example 2 - working

identity_matrix3 = np.identity(5, str)
print("This is a 5x5 identity matrix made with strings:\n")
print(identity_matrix3, "\n")

# getting really graphic here
identity_matrix4 = np.identity(10, bool)
print("This is a 10x10 identity matrix made with booleans:\n")
print(identity_matrix4)

This is a 5x5 identity matrix made with strings:

[['1' '' '' '' '']
 ['' '1' '' '' '']
 ['' '' '1' '' '']
 ['' '' '' '1' '']
 ['' '' '' '' '1']] 

This is a 10x10 identity matrix made with booleans:

[[ True False False False False False False False False False]
 [False  True False False False False False False False False]
 [False False  True False False False False False False False]
 [False False False  True False False False False False False]
 [False False False False  True False False False False False]
 [False False False False False  True False False False False]
 [False False False False False False  True False False False]
 [False False False False False False False  True False False]
 [False False False False False False False False  True False]
 [False False False False False False False False False  True]]


These two examples show that the function can also build identity matrices with other data types, here `str` and `bool`.

In [None]:
# Example 3 - breaking (to illustrate when it breaks)

np.identity(3, 3)

TypeError: Cannot interpret '3' as a data type

Simple uses of this function leave little room for error, but passing a number, or anything else that cannot be interpreted as a data type (such as `float`, `int`, `str` or `bool`), as the second parameter raises a `TypeError`. Leaving it as `None` gives the default, `float`.

A `TypeError` is also raised when the function is called with no arguments at all, because `n` is required.
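One way to handle this failure in a script is to catch the exception. The sketch below uses a hypothetical helper, `safe_identity`, which is not part of Numpy and is only meant to illustrate the idea.

```python
import numpy as np

def safe_identity(n, dtype=float):
    """Hypothetical helper: return an identity matrix, or None if dtype is invalid."""
    try:
        return np.identity(n, dtype=dtype)
    except TypeError as err:
        print(f"Could not build the matrix: {err}")
        return None

ok = safe_identity(3, int)   # valid data type
bad = safe_identity(3, 3)    # 3 is not a data type, so the helper returns None
print(ok)
print(bad)
```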

**Conclusion:**
Whenever you need an identity matrix, this is the simplest way to create one. As said in the introduction, this is one of several functions that you might or might not use.

<a id='function2'></a>
## Function 2 - np.rot90

> **np.rot90(m, k=1, axes=(0, 1))**
>
> Rotates an array of two or more dimensions by 90 degrees in the plane specified by axes.
>
> Rotation direction is from the first towards the second axis.

##### For more detail please consult the official documentation [here](https://numpy.org/devdocs/reference/generated/numpy.rot90.html#)

In [None]:
# Example 1 - working
arr = np.arange(9).reshape(3,3) # First create an array or use one already created.
print("Original array:\n")
print(arr, "\n")

rot_arr = np.rot90(arr) # Simplest way to use the function; the 'm' parameter must be at least a 2D array.
print("Rotated array:\n")
print(rot_arr,"\n")

new_arr = np.rot90(rot_arr,-1) # Let's invert the rotation of 'rot_arr' to see what happens.
print("New array (like Original array):\n")
print(new_arr)

Original array:

[[0 1 2]
 [3 4 5]
 [6 7 8]] 

Rotated array:

[[2 5 8]
 [1 4 7]
 [0 3 6]] 

New array (like Original array):

[[0 1 2]
 [3 4 5]
 [6 7 8]]


As demonstrated, this is one of the simplest ways of using this function. It rotates any array (at least 2D) by 90 degrees counterclockwise, because the `k` parameter defaults to `1`. With `k=-1` it rotates 90 degrees clockwise.
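To make the role of `k` more concrete, here is a small sketch with a few other values of `k` applied to the same 3x3 array. The variable names are illustrative.

```python
import numpy as np

arr = np.arange(9).reshape(3, 3)

counterclockwise = np.rot90(arr)   # k=1 (default): 90 degrees counterclockwise
clockwise = np.rot90(arr, k=-1)    # k=-1: 90 degrees clockwise
half_turn = np.rot90(arr, k=2)     # k=2: 180 degrees
full_turn = np.rot90(arr, k=4)     # k=4: four quarter turns, back to the start

print(clockwise)
print(half_turn)
print(np.array_equal(full_turn, arr))
```

Three counterclockwise quarter turns (`k=3`) give the same result as one clockwise turn (`k=-1`).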

In [None]:
# Example 2 - working
arr1 = np.identity(4, int) # Let's play with the axes in an identity matrix
print("Original matrix:\n")
print(arr1,"\n")

rot_arr1 = np.rot90(arr1, 1, (0,1)) # np.rot90(arr1) would result on the same output.
print("Rotated matrix:\n")
print(rot_arr1, "\n")

new_arr1 = np.rot90(rot_arr1, 1, (1,0)) # Swapping the axes from (0,1) to (1,0) has the same result as 'k=-1'
print("New matrix (back to the original form)\n")
print(new_arr1)

Original matrix:

[[1 0 0 0]
 [0 1 0 0]
 [0 0 1 0]
 [0 0 0 1]] 

Rotated matrix:

[[0 0 0 1]
 [0 0 1 0]
 [0 1 0 0]
 [1 0 0 0]] 

New matrix (back to the original form)

[[1 0 0 0]
 [0 1 0 0]
 [0 0 1 0]
 [0 0 0 1]]


The goal here was to explain in a simple, more graphic way how the `axes` parameter works. The bottom line is that `axes=(0,1)` is the default, so `np.rot90(arr1, 1, (0,1))` is the same as `np.rot90(arr1)`. Swapping the axes to `axes=(1,0)` reverses the rotation direction, which is equivalent to using `k=-1` with the default axes. There are more advanced uses of these two parameters with 3D, 4D, etc. arrays, but for easier understanding I used only 2D arrays.
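The identity matrix is symmetric, which can hide the direction of a rotation. The sketch below repeats the comparison on a non-square array and then gives a brief look at a 3D case; the arrays and names are made up for illustration.

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)

# Swapping the order of the axes reverses the rotation direction,
# so axes=(1, 0) with k=1 matches axes=(0, 1) with k=-1.
swapped_axes = np.rot90(arr, k=1, axes=(1, 0))
negative_k = np.rot90(arr, k=-1, axes=(0, 1))
print(np.array_equal(swapped_axes, negative_k))

# On a 3D array, axes selects the plane of rotation;
# the remaining axis is left untouched.
cube = np.arange(24).reshape(2, 3, 4)
rotated = np.rot90(cube, axes=(1, 2))
print(cube.shape, "->", rotated.shape)
```

In the 3D case each 3x4 slice along the first axis is rotated on its own, so the shape changes from `(2, 3, 4)` to `(2, 4, 3)`.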

In [None]:
# Example 3 - breaking (to illustrate when it breaks)

example = np.rot90(np.arange(8).reshape(2,2,2), k=1, axes=(2,3))
print(example)

ValueError: Axes=(2, 3) out of range for array of ndim=3.

The variable `example` uses `np.rot90` to rotate a 3D array with shape 2x2x2. The error is caused by misusing the `axes` parameter: one of its values exceeds the number of axes the array has. The problem is easily solved by using a valid pair, for example `axes=(0,1)`, `axes=(1,2)`, `axes=(0,2)` or `axes=(2,1)`.
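A minimal sketch of how this error could be caught, followed by a call with a valid pair of axes. The flag `failed` is only there to make the outcome visible.

```python
import numpy as np

cube = np.arange(8).reshape(2, 2, 2)

# Both entries of axes must be valid axis indices, i.e. smaller than cube.ndim.
try:
    np.rot90(cube, k=1, axes=(2, 3))
    failed = False
except ValueError as err:
    failed = True
    print(f"Invalid axes: {err}")

# A pair of distinct axes inside the valid range is accepted.
valid = np.rot90(cube, k=1, axes=(1, 2))
print(valid.shape)
```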

**Conclusion:** One thing to add to what has been explained: a common use of this function is rotating images.

<a id='function3'></a>
## Function 3 - np.sort

> **np.sort(a, axis=-1, kind=None, order=None)**
>
> Returns a sorted copy of an array.


##### For more detail please consult the official documentation [here](https://numpy.org/devdocs/reference/generated/numpy.sort.html)

In [None]:
# Example 1 - working
a = np.random.randint(100, size=(5,5)) # Generates a 5x5 matrix with random integers.
print("Preview of array 'a':")
print(a,'\n')
print("Array 'a' sorted by row:")
print(np.sort(a),'\n') # Simplest use of this function

b = np.random.randint(100, size=(5,5))
print("Preview of array 'b':")
print(b,'\n')
print("Array 'b' sorted by column:")
print(np.sort(b, axis=0),'\n') # The axis parameter defines the orientation.

# To sort the entire matrix we have to use a different approach.
c = np.random.randint(100, size=(5,5))
print("Preview of array 'c':")
print(c,'\n')
c_sorted = np.sort(c, axis=None) # This will sort the entire matrix, flattening it.
print("Preview of sorted 'c':")
print(c_sorted,'\n')
# To return to the initial form we have to use the reshape function.
print("Array 'c' totally sorted:")
print(c_sorted.reshape(5,5))

Preview of array 'a':
[[39 22 22 16 31]
 [13 49 79 20 76]
 [63 89 65  6 10]
 [22 87 40 21 32]
 [32 16 38 57 43]] 

Array 'a' sorted by row:
[[16 22 22 31 39]
 [13 20 49 76 79]
 [ 6 10 63 65 89]
 [21 22 32 40 87]
 [16 32 38 43 57]] 

Preview of array 'b':
[[65 31 30 56 93]
 [36 27 41 73  3]
 [95 51 42 24 67]
 [63 29 17 47 98]
 [72 76  8 12 43]] 

Array 'b' sorted by column:
[[36 27  8 12  3]
 [63 29 17 24 43]
 [65 31 30 47 67]
 [72 51 41 56 93]
 [95 76 42 73 98]] 

Preview of array 'c':
[[ 5 50 52 85 41]
 [12 51 13 21 73]
 [20  4 66 89 54]
 [ 9 70 65 27 99]
 [36 44 23 24 78]] 

Preview of sorted 'c':
[ 4  5  9 12 13 20 21 23 24 27 36 41 44 50 51 52 54 65 66 70 73 78 85 89
 99] 

Array 'c' totally sorted:
[[ 4  5  9 12 13]
 [20 21 23 24 27]
 [36 41 44 50 51]
 [52 54 65 66 70]
 [73 78 85 89 99]]


With these examples I intended to demonstrate simple but useful ways of using the `np.sort` function: sorting by row, by column and over the entire array. The comments in the code explain the steps taken for each case.
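Because the examples above use random numbers, the results change on every run. The sketch below repeats the three cases on a small fixed array (chosen here only for illustration) and also shows that `np.sort` returns a copy.

```python
import numpy as np

values = np.array([[9, 1, 8],
                   [3, 7, 2]])

sorted_rows = np.sort(values)              # axis=-1: each row sorted on its own
sorted_cols = np.sort(values, axis=0)      # each column sorted on its own
flat_sorted = np.sort(values, axis=None)   # flattened first, then sorted

print(sorted_rows)   # [[1 8 9], [2 3 7]]
print(sorted_cols)   # [[3 1 2], [9 7 8]]

# np.sort returns a copy, so 'values' keeps its original order.
print(values)

# Restore the original shape after the full sort.
fully_sorted = flat_sorted.reshape(values.shape)
print(fully_sorted)  # [[1 2 3], [7 8 9]]
```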

In [None]:
# Example 2 - working
# A more complex use of this function with different data types:
# Creating an array with info from student data:
students = [('John', 80.7, 35, True), ('Sarah', 91.6, 28, True), ('Abel', 91.8, 34, False), ('Martha', 81.65, 30, True)]
dtype = [('name', 'S10'), ('score', float), ('age', int), ('scholarship', bool)]
student_data = np.array(students, dtype=dtype) # Creates a structured array
print("Preview of Student Data array:")
print(student_data,'\n')

sorted_by_score = np.sort(student_data, order='score')
print("Student Data sorted by 'Score':")
print(sorted_by_score,'\n')

# If you want it inverted you can use that simple trick '[::-1]'
print("Student Data sorted by 'Score' inverted:")
print(sorted_by_score[::-1],'\n')

name_age = np.sort(student_data, order=['name', 'age'])
print("Student Data sorted by 'Name' and 'Age':")
print(name_age)

Preview of Student Data array:
[(b'John', 80.7 , 35,  True) (b'Sarah', 91.6 , 28,  True)
 (b'Abel', 91.8 , 34, False) (b'Martha', 81.65, 30,  True)] 

Student Data sorted by 'Score':
[(b'John', 80.7 , 35,  True) (b'Martha', 81.65, 30,  True)
 (b'Sarah', 91.6 , 28,  True) (b'Abel', 91.8 , 34, False)] 

Student Data sorted by 'Score' inverted:
[(b'Abel', 91.8 , 34, False) (b'Sarah', 91.6 , 28,  True)
 (b'Martha', 81.65, 30,  True) (b'John', 80.7 , 35,  True)] 

Student Data sorted by 'Name' and 'Age':
[(b'Abel', 91.8 , 34, False) (b'John', 80.7 , 35,  True)
 (b'Martha', 81.65, 30,  True) (b'Sarah', 91.6 , 28,  True)]


In this example `np.sort` is used in a more complex way to deal with different data types. As demonstrated, a structured array can be sorted by a single field or by several fields.
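Sorting by several fields matters most when the first field has ties. The sketch below uses made-up records in which two students share a score, and a Unicode field (`'U10'`) instead of `'S10'` so the names print without the `b''` prefix; both choices are assumptions for this illustration.

```python
import numpy as np

# Hypothetical records: two students share the same score.
records = [('Ana', 90.0, 31), ('Bob', 85.5, 25), ('Cid', 90.0, 22)]
fields = [('name', 'U10'), ('score', float), ('age', int)]
table = np.array(records, dtype=fields)

# Sort by score first; ties on score are broken by age.
by_score_then_age = np.sort(table, order=['score', 'age'])
print(by_score_then_age['name'])   # Bob, Cid, Ana

# Reverse the sorted result to get descending order.
descending = by_score_then_age[::-1]
print(descending['name'])          # Ana, Cid, Bob
```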

In [None]:
# Example 3 - breaking (to illustrate when it breaks)
# Reusing the array student_data from the example above:
np.sort(student_data, axis=1, order='scholarship')

AxisError: axis 1 is out of bounds for array of dimension 1

It breaks when the `axis` parameter is outside the dimensions of the array. This call would work on an array with two or more dimensions, but `student_data` is a 1D structured array: each record is a single element, even though it holds several fields. To avoid this error we use `axis=0` or just leave the default value `axis=-1`.
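The number of dimensions can be checked before sorting. This sketch builds a small structured array (with made-up records) and tests whether a given axis is in range; the check itself is an illustration, not something `np.sort` requires.

```python
import numpy as np

records = [('John', 80.7, 35), ('Sarah', 91.6, 28)]
fields = [('name', 'U10'), ('score', float), ('age', int)]
table = np.array(records, dtype=fields)

# One axis only: each record counts as a single element.
print(table.ndim, table.shape)

# A valid axis must lie in the range -ndim .. ndim-1.
axis = 1
axis_is_valid = -table.ndim <= axis < table.ndim
print(axis_is_valid)
```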

**Conclusion:** We use this function when we need to sort any type of array, including structured arrays with fields of different data types. Sorting is done in ascending order, i.e. from the minimum value to the maximum, but we can invert the result as demonstrated in the second example.

<a id='function4'></a>
## Function 4 - np.extract

> **np.extract(condition, arr)**
>
> Returns the selected elements of an array that satisfy some condition.

##### For more detail please consult the official documentation [here](https://numpy.org/devdocs/reference/generated/numpy.extract.html)

In [None]:
# Example 1 - working
arr = np.arange(25).reshape((5, 5))
print("Array Preview:")
print(arr,'\n')

condition = np.mod(arr, 2)==0 # selecting the divisible by 2 (even numbers).
print("Preview of 'condition':")
print(condition,'\n')

result = np.extract(condition,arr)
print("Extracting all even numbers of 'arr':")
print(result)

Array Preview:
[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]
 [20 21 22 23 24]] 

Preview of 'condition':
[[ True False  True False  True]
 [False  True False  True False]
 [ True False  True False  True]
 [False  True False  True False]
 [ True False  True False  True]] 

Extracting all even numbers of 'arr':
[ 0  2  4  6  8 10 12 14 16 18 20 22 24]


In this example we created a condition array of boolean values with the same shape as the original array. `np.extract` picks every item of the original array where the condition is `True` and returns them in a new array. The function doesn't modify the original array.
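The following sketch compares `np.extract` with plain boolean indexing and checks that changing the result leaves the source array untouched. The array and condition are chosen only for illustration.

```python
import numpy as np

arr = np.arange(10)
condition = arr % 3 == 0   # True for multiples of 3

extracted = np.extract(condition, arr)
indexed = arr[condition]
print(extracted)                           # [0 3 6 9]
print(np.array_equal(extracted, indexed))  # True

# The result is a new array: editing it does not touch 'arr'.
extracted[0] = -1
print(arr[0])                              # 0
```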

In [None]:
# Example 2 - working
arr = np.random.randint(100, size=(5,5))
print("Preview of 'arr':")
print(arr,'\n')

# Let's pretend that we need to extract all the values on the diagonal of an array.
condition = np.identity(5, bool) # we already know that this will produce the condition we need for our task.
print("Preview of 'condition':")
print(condition,'\n')

result = np.extract(condition, arr)
print("Extracting the values of the diagonal of 'arr':")
print(result)

Preview of 'arr':
[[98 11 59  9 88]
 [ 3 26 29 58 34]
 [76 65 41 84 50]
 [ 1 75 85 87 74]
 [ 7 53 66 51 99]] 

Preview of 'condition':
[[ True False False False False]
 [False  True False False False]
 [False False  True False False]
 [False False False  True False]
 [False False False False  True]] 

Extracting the values of the diagonal of 'arr':
[98 26 41 87 99]


This simulates a real-life case where we need to extract the values on the diagonal of a matrix. To create the `condition` we used `np.identity`, a function explained earlier in this notebook, which gave us exactly the mask we needed.
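As a sanity check, the mask-based approach can be compared with `np.diagonal`, and the same mask can be rotated with `np.rot90` to reach the anti-diagonal. This is a sketch on a fixed array rather than random data, so the values are predictable.

```python
import numpy as np

arr = np.arange(25).reshape(5, 5)
mask = np.identity(5, bool)

via_extract = np.extract(mask, arr)
via_diagonal = np.diagonal(arr)
print(via_extract)                                # [ 0  6 12 18 24]
print(np.array_equal(via_extract, via_diagonal))  # True

# Rotating the mask moves the True values onto the anti-diagonal.
anti_mask = np.rot90(mask)
anti_diagonal = np.extract(anti_mask, arr)
print(anti_diagonal)                              # [ 4  8 12 16 20]
```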

In [None]:
# Example 3 - breaking (to illustrate when it breaks)
arr = np.arange(9).reshape(3,3)
print(arr,'\n')

condition = np.array([[True,False,False,True],
                     [False,True,True,False],
                     [True,False,True,False]])
print(condition,'\n')

result = np.extract(condition,arr)
print(result)

[[0 1 2]
 [3 4 5]
 [6 7 8]] 

[[ True False False  True]
 [False  True  True False]
 [ True False  True False]] 



IndexError: index 10 is out of bounds for axis 0 with size 9

As demonstrated, the function breaks when the shapes of `condition` and `arr` do not match. Here the 3x4 `condition` has a `True` at flat position 10, while the flattened `arr` only has 9 elements.
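One possible safeguard is to compare the shapes before extracting. The wrapper below, `checked_extract`, is a hypothetical helper written for this notebook, not a Numpy function.

```python
import numpy as np

def checked_extract(condition, arr):
    """Hypothetical wrapper: refuse a condition whose shape differs from arr."""
    condition = np.asarray(condition)
    arr = np.asarray(arr)
    if condition.shape != arr.shape:
        raise ValueError(
            f"condition shape {condition.shape} != array shape {arr.shape}"
        )
    return np.extract(condition, arr)

arr = np.arange(9).reshape(3, 3)

good_mask = arr > 5
good_result = checked_extract(good_mask, arr)
print(good_result)   # [6 7 8]

bad_mask = np.ones((3, 4), dtype=bool)
try:
    checked_extract(bad_mask, arr)
    rejected = False
except ValueError as err:
    rejected = True
    print(f"Rejected: {err}")
```

The explicit check gives a clearer message than the `IndexError` shown above.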

**Conclusion:** This function allows the extraction of selected elements of any array in a graphic way. By marking the elements to extract in the `condition` parameter we can visualize where the values will be taken from. As said before, this function does not alter the original array. Using it is equivalent to `arr[condition]` when `condition` is boolean.

<a id='function5'></a>
## Function 5 - np.count_nonzero

> **np.count_nonzero(a, axis=None, *, keepdims=False)**
>
> Counts the non-zero values in the array `a`. The result is an `int` for the whole array, or an array of counts when `axis` is given.

##### For more detail please consult the official documentation [here](https://numpy.org/devdocs/reference/generated/numpy.count_nonzero.html)

In [None]:
# Example 1 - working
a = np.random.randint(3, size=10)
print("Preview of 'a':")
print(a,'\n')

nonzeros = np.count_nonzero(a)
print(f"There are {nonzeros} non-zero values in 'a'")

Preview of 'a':
[0 0 0 1 1 1 1 0 1 1] 

There are 6 non-zero values in 'a'


In this simple example we create a randomly generated array of 10 numbers, then count its non-zero values using `np.count_nonzero`. For an array with 2 or more dimensions, this simple call counts the non-zero values across the whole array, as if it were flattened.
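A small sketch of the 2D case on a fixed array, plus a common related use: since `True` counts as non-zero, the function can count how many elements satisfy a condition. The array is made up for illustration.

```python
import numpy as np

matrix = np.array([[0, 1, 2],
                   [3, 0, 0]])

# Without axis, every element of the 2D array is considered.
total_nonzero = np.count_nonzero(matrix)
print(total_nonzero)   # 3

# A boolean array works too: each True counts as one non-zero value.
above_one = np.count_nonzero(matrix > 1)
print(above_one)       # 2
```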

In [None]:
# Example 2 - working
arr = np.random.randint(5, size=(5,5))
print("Preview of 'arr':")
print(arr, '\n')

# Counting the non-zero values per column on a matrix.
column_count = np.count_nonzero(arr, axis=0)
print(column_count,'- Array with the counts per column.\n')
print(f"There are {column_count[0]} non-zero values on the first column of 'arr';")
print(f"There are {column_count[1]} non-zero values on the second column of 'arr';")
print(f"There are {column_count[2]} non-zero values on the third column of 'arr';")
print(f"There are {column_count[3]} non-zero values on the fourth column of 'arr';")
print(f"There are {column_count[4]} non-zero values on the fifth column of 'arr'.\n")

# Counting the non-zero values per row on a matrix.
row_count = np.count_nonzero(arr, axis=1, keepdims=True)
print(row_count,'- Array with the counts per row.\n')
print(f"There are {row_count[0][0]} non-zero values on the first row of 'arr';")
print(f"There are {row_count[1][0]} non-zero values on the second row of 'arr';")
print(f"There are {row_count[2][0]} non-zero values on the third row of 'arr';")
print(f"There are {row_count[3][0]} non-zero values on the fourth row of 'arr';")
print(f"There are {row_count[4][0]} non-zero values on the fifth row of 'arr'.")

Preview of 'arr':
[[1 0 4 4 1]
 [3 4 1 4 2]
 [2 3 2 1 2]
 [2 2 0 3 2]
 [4 4 2 1 2]] 

[5 4 4 5 5] - Array with the counts per column.

There are 5 non-zero values on the first column of 'arr';
There are 4 non-zero values on the second column of 'arr';
There are 4 non-zero values on the third column of 'arr';
There are 5 non-zero values on the fourth column of 'arr';
There are 5 non-zero values on the fifth column of 'arr'.

[[4]
 [5]
 [5]
 [4]
 [5]] - Array with the counts per row.

There are 4 non-zero values on the first row of 'arr';
There are 5 non-zero values on the second row of 'arr';
There are 5 non-zero values on the third row of 'arr';
There are 4 non-zero values on the fourth row of 'arr';
There are 5 non-zero values on the fifth row of 'arr'.


This demonstrates a way of using all the parameters of `np.count_nonzero`. With `axis=0` the function counts the non-zero values per column and returns an array with the counts. With `axis=1` it counts the non-zero values per row; adding the parameter `keepdims=True` keeps the result as a column, which makes the values easier to read.
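The effect of `keepdims` is easiest to see in the shapes of the results. The sketch below uses a small fixed array and also shows one thing the kept dimension allows, namely dividing by the row length; that last step is an illustrative assumption about how the counts might be used.

```python
import numpy as np

matrix = np.array([[0, 1, 2],
                   [3, 0, 0]])

per_column = np.count_nonzero(matrix, axis=0)
per_row = np.count_nonzero(matrix, axis=1)
per_row_kept = np.count_nonzero(matrix, axis=1, keepdims=True)

print(per_column, per_column.shape)      # [1 1 1] (3,)
print(per_row, per_row.shape)            # [2 1] (2,)
print(per_row_kept.shape)                # (2, 1)

# The kept axis lines up with the rows of 'matrix',
# e.g. to compute the fraction of non-zero values per row.
fraction = per_row_kept / matrix.shape[1]
print(fraction)
```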

In [None]:
# Example 3 - breaking (to illustrate when it breaks)
np.count_nonzero(a, axis=1)

AxisError: axis 1 is out of bounds for array of dimension 1

This example uses the first array `a`, which has one dimension. Passing `axis=1` as the second parameter goes beyond the dimensions of the array; the same error would occur with `axis=2` on a 2D array.
This problem is easily solved by using a valid value for `axis`, in this case `0`, or by leaving it out, which gives the same count.

**Conclusion:** We should use this function when a count of non-zero values in an array is needed. Alternatively we can use it to determine how many zeros an array contains with a simple calculation: `total elements of the array - np.count_nonzero(array) = number of zeros in the array`.
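The zero count described above can be sketched in two ways, by subtraction or by counting the elements that compare equal to zero. The array is made up for illustration.

```python
import numpy as np

arr = np.array([[0, 4, 0],
                [7, 0, 1]])

# Total number of elements minus the non-zero ones.
zeros_by_subtraction = arr.size - np.count_nonzero(arr)

# Alternative: count the True values of the comparison directly.
zeros_by_mask = np.count_nonzero(arr == 0)

print(zeros_by_subtraction, zeros_by_mask)   # 3 3
```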

## Conclusion

This notebook was created to make it easier to understand how to use 5 Numpy functions that might be less known or less used.

## Reference Links
Here are some links to help find some more info on Numpy:
* Numpy official tutorial : https://numpy.org/doc/stable/user/quickstart.html
* Numpy w3schools.com Tutorial : https://www.w3schools.com/python/numpy/default.asp