<h1 align="center">NumPy Tutorial</h1>

<br>

In this tutorial, I'll cover some useful NumPy functions for machine learning.

<br>

In [1]:
import numpy as np

<h2 align="left">NumPy Arrays</h2>

In [3]:
my_matrix = [[1, 2, 3], [4, 5, 6], [7,8, 9]]

In [4]:
my_matrix

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

In [5]:
np.array(my_matrix)

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

- Passing a nested list to the **np.array()** function produces a two-dimensional array (a matrix).
- Here, we get a 3x3 matrix because the outer list holds three inner lists, each containing three items.
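
The shape of the result can be checked directly. The short sketch below rebuilds the same matrix (the names `nested` and `matrix` are chosen here for illustration) and inspects a few of its attributes.

```python
import numpy as np

nested = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
matrix = np.array(nested)

print(matrix.ndim)   # number of dimensions: 2
print(matrix.shape)  # (rows, columns): (3, 3)
print(matrix.dtype)  # an integer dtype inferred from the list; the exact width depends on the platform
```
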

<br>

In [6]:
np.arange(0, 11)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [7]:
np.arange(0, 20, 2)

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

- **np.arange(start, stop, step)**
- Notice that the stop value is not included in the range.
- The step parameter is 1 by default.
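
A few more calls illustrate the defaults and the exclusive stop. This is a small sketch; the variable names are only for illustration.

```python
import numpy as np

default_step = np.arange(0, 5)        # step defaults to 1
single_arg = np.arange(5)             # with one argument, start defaults to 0
negative_step = np.arange(10, 0, -2)  # counts down; the stop value (0) is still excluded

print(default_step)   # [0 1 2 3 4]
print(single_arg)     # [0 1 2 3 4]
print(negative_step)  # [10  8  6  4  2]
```
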

<br>

**np.zeros()** and **np.ones()** are useful functions, since data science work often calls for large arrays of zeros or ones. The bullets below apply to both np.zeros() and np.ones().

- When passing a single integer, a one-dimensional array (a vector) is returned.
- When passing a tuple of two integers, a two-dimensional array (a matrix) is returned.

In [11]:
print(np.zeros(5))
print(f"Shape of the vector: {np.zeros(5).shape}")

[0. 0. 0. 0. 0.]
Shape of the vector: (5,)


In [9]:
print(np.zeros((3, 5)))
print(f"Shape of the matrix: {np.zeros((3, 5)).shape}")

[[0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]]
Shape of the matrix: (3, 5)


In [12]:
print(np.ones(5))
print(f"Shape of the vector: {np.ones(5).shape}")

[1. 1. 1. 1. 1.]
Shape of the vector: (5,)


In [13]:
print(np.ones((5, 3)))
print(f"Shape of the matrix: {np.ones((5, 3)).shape}")

[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]
Shape of the matrix: (5, 3)


<br>

- The **np.linspace()** function returns a chosen number of evenly spaced values over an interval.
- np.linspace() takes several parameters, including start, stop, and num.
- The 'num' parameter specifies how many evenly spaced values to generate within the range.
- Notice that by default, num=50.

In [12]:
np.linspace(2, 10, 5)

array([ 2.,  4.,  6.,  8., 10.])

In [13]:
np.linspace(0, 10, 11)

array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.])

- Notice that np.linspace() includes the end of the interval (the stop parameter) by default.

In [17]:
# With the default 'num', the returned array should contain 50 values.
len(np.linspace(0, 10))

50

In [18]:
# num=20
len(np.linspace(0, 10, 20))

20

- As seen, the 'num' parameter determines the length of the returned array.
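
Two optional parameters of np.linspace() are worth knowing alongside num: `endpoint` controls whether the stop value is included, and `retstep` also returns the spacing between values. A minimal sketch:

```python
import numpy as np

closed = np.linspace(0, 10, 5)                      # stop included (default)
half_open = np.linspace(0, 10, 5, endpoint=False)   # stop excluded
values, step = np.linspace(0, 10, 5, retstep=True)  # also return the spacing

print(closed)     # [ 0.   2.5  5.   7.5 10. ]
print(half_open)  # [0. 2. 4. 6. 8.]
print(step)       # 2.5
```
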

<br>

The **np.eye()** function creates an identity matrix, which is often needed in linear algebra.

In [14]:
np.eye(5)

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

In [15]:
np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [16]:
np.eye(2)

array([[1., 0.],
       [0., 1.]])
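
The defining property of an identity matrix is that multiplying by it leaves a matrix unchanged. The sketch below checks this with the `@` matrix-multiplication operator; the matrix `A` is an arbitrary example.

```python
import numpy as np

A = np.arange(1, 10).reshape(3, 3)  # an arbitrary 3x3 matrix
I = np.eye(3)                       # 3x3 identity matrix

product = A @ I                     # matrix product, not element-wise multiplication
print(np.array_equal(product, A))   # True: A is unchanged
```
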

<br>

<br>

<h2 align="left">The Random Module</h2>

In [17]:
np.random.rand(5)

array([0.01928686, 0.42582302, 0.82074837, 0.05219083, 0.47577634])

In [18]:
np.random.rand(3, 2)

array([[7.38363030e-04, 9.09800647e-01],
       [5.75220515e-01, 9.15803870e-01],
       [4.93402407e-01, 2.85978222e-01]])

- When a single argument is passed to the **rand()** function, such as 5, it returns 5 random floats drawn uniformly from the interval [0, 1), so 0 can occur but 1 cannot.
- When given two arguments, it returns an m x n matrix, where m is the number of rows and n is the number of columns.

<br>

In [19]:
np.random.randn(3)

array([-0.4305068 , -0.22225605, -0.24559904])

In [20]:
np.random.randn(3, 2)

array([[-0.52701177, -0.24754361],
       [-1.27459705, -0.42479626],
       [-0.08036665, -1.2959277 ]])

- The **randn()** function returns values from a standard normal distribution.
- When given one argument, such as 3, randn() returns a sample of size 3 from the standard normal distribution.
- When given two arguments, an m x n matrix (a two-dimensional array) is returned, where m is the number of rows and n is the number of columns.
- Notice that since np.random.randn() returns samples from a standard normal distribution, floats closer to zero are more likely to appear, because the mean of the standard normal distribution is zero.
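
Samples from a normal distribution with a different mean and standard deviation can be obtained by shifting and scaling the standard normal samples. The values of `mu` and `sigma` below are arbitrary choices for illustration.

```python
import numpy as np

np.random.seed(0)      # fixed seed so the run is repeatable
mu, sigma = 10.0, 2.0  # illustrative target mean and standard deviation

# Scale by sigma and shift by mu to move from N(0, 1) to N(mu, sigma^2)
samples = mu + sigma * np.random.randn(100_000)

print(samples.mean())  # close to 10
print(samples.std())   # close to 2
```
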


<br>

In [21]:
np.random.randint(0, 101, 5)

array([78, 57, 36, 26, 87])

In [22]:
np.random.randint(0, 101, (3, 3))

array([[72, 93, 96],
       [77, 18, 33],
       [16, 90, 13]])

In the calls above, the **randint()** function is given three arguments:

- Start point (inclusive)
- End point (not inclusive)
- How many random integers to return from the specified interval. When the third argument is given as a tuple, the function returns an m x n matrix, where m represents the number of rows and n represents the number of columns.

Only the first argument is required: with a single argument, randint() draws one integer from 0 up to (but not including) that value.
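
As a small illustration of the exclusive end point, simulating rolls of a six-sided die requires an end point of 7, not 6. The seed value and variable names here are arbitrary.

```python
import numpy as np

np.random.seed(1)  # arbitrary seed, only to make the run repeatable

# Integers from 1 to 6: the end point 7 is excluded
rolls = np.random.randint(1, 7, size=1000)
print(rolls.min(), rolls.max())

# With a single argument, values are drawn from 0 up to (not including) it
single = np.random.randint(5)
print(single)
```
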

<br>

- Another important function is the **seed()** function, which fixes the starting state of the random number generator so that the same sequence of random numbers can be reproduced.
- This is essential for repeatable experiments, such as when comparing several models or visualization methods on the same random data.

In [20]:
# The argument inside the seed() function can be any integer
np.random.seed(42)
np.random.rand(5)

array([0.37454012, 0.95071431, 0.73199394, 0.59865848, 0.15601864])

In [24]:
# Without re-seeding, the generator continues its sequence, so a different set of numbers is returned.
np.random.rand(5)

array([0.15599452, 0.05808361, 0.86617615, 0.60111501, 0.70807258])

In [22]:
# However, by setting the seed to the same value (42) as before, 
# the same "random" numbers are returned.
np.random.seed(42)
np.random.rand(5)

array([0.37454012, 0.95071431, 0.73199394, 0.59865848, 0.15601864])
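
NumPy also offers a newer interface, `np.random.default_rng()` (available since NumPy 1.17), which returns a separate generator object instead of relying on one global seed. The sketch below shows that two generators created with the same seed produce the same values. Note that these values are not the same as the ones from `np.random.seed(42)` above, since the two interfaces use different underlying generators.

```python
import numpy as np

rng_a = np.random.default_rng(42)  # independent generator seeded with 42
rng_b = np.random.default_rng(42)  # a second generator with the same seed

first = rng_a.random(5)   # 5 floats in [0, 1)
second = rng_b.random(5)

print(np.array_equal(first, second))  # True: same seed, same sequence
```
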

<br>

<br>

<h3 align="left">Reshaping Arrays</h3>

In [23]:
arr = np.arange(0, 25)

In [24]:
arr

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24])

In [25]:
arr.reshape(5, 5)

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])

- Keep in mind that when specifying the desired dimensions (rows and columns) for reshaping your array, their product must equal the size of the original array. Otherwise, you'll receive a 'ValueError' message indicating, 'cannot reshape array of size x into shape (y, z).'
- You can check the size of your array with the size attribute.

In [26]:
arr.size

25

In [32]:
arr.reshape(-1, 5)

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])

- When you pass **-1** to the reshape method, it acts as a placeholder for an unknown dimension. At most one dimension can be given as -1.
- The code above reshapes arr so that it has 5 columns, and NumPy works out the number of rows from the size of the array: 25 elements divided by 5 columns gives 5 rows, so the total number of elements is unchanged.
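
The following sketch puts both points together: -1 can stand in for either dimension, and a shape whose product does not match the array size raises a ValueError. The shape (4, 6) is chosen here only because 4 * 6 is not 25.

```python
import numpy as np

arr = np.arange(0, 25)

print(arr.reshape(5, -1).shape)  # (5, 5): the column count is inferred
print(arr.reshape(-1).shape)     # (25,): flattens back to one dimension

message = ""
try:
    arr.reshape(4, 6)            # 4 * 6 = 24, which does not equal 25
except ValueError as err:
    message = str(err)
    print(message)
```
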

<br>

<br>

**argmin() & argmax() methods**

In [29]:
ran = np.random.randint(0, 101, 10)

In [30]:
ran

array([21, 52,  1, 87, 29, 37,  1, 63, 59, 20])

In [31]:
ran.max()

87

In [32]:
ran.min()

1

In [33]:
ran.argmax()

3

In [34]:
ran.argmin()

2

- The max() and min() methods return the largest and smallest values in the array, while argmax() and argmin() return the index positions of those values. When the value occurs more than once (as 1 does here, at indices 2 and 6), the index of the first occurrence is returned.
- This matches the mathematical meaning of argmax and argmin: the argument at which a function attains its maximum or minimum.
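
The sketch below reuses the values from the output above as a fixed array, so the results are repeatable. It also shows one way to handle a two-dimensional array, where argmax() returns an index into the flattened array and `np.unravel_index()` converts it back to a (row, column) pair. The 2x5 `grid` is an illustrative reshaping.

```python
import numpy as np

ran = np.array([21, 52, 1, 87, 29, 37, 1, 63, 59, 20])

print(ran.argmin())        # 2: the first of the two positions holding 1
print(ran[ran.argmax()])   # 87: indexing with argmax() recovers the maximum

grid = ran.reshape(2, 5)
flat_index = grid.argmax()                           # index into the flattened array
row, col = np.unravel_index(flat_index, grid.shape)  # convert to (row, column)
print(row, col)            # 0 3
```
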

<br>

<br>

<h2 align="left">NumPy Indexing and Selection</h2>

**Indexing and selection on one-dimensional vectors**

In [34]:
z = np.arange(0, 11)
z

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [35]:
z[0:5]

array([0, 1, 2, 3, 4])

In [38]:
z[:5]

array([0, 1, 2, 3, 4])

In [39]:
z[5:]

array([ 5,  6,  7,  8,  9, 10])

- Slices can retrieve multiple elements from an array.
- z[:5] includes all elements from index 0 to 4.
- On the other hand, z[5:] includes all elements from index 5 to the end of the array.

NumPy arrays also support broadcasting: when a single value is assigned to a slice, NumPy stretches that value across every selected element. **Python lists do not broadcast this way; a list slice can only be assigned another iterable.**

In [36]:
z

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [44]:
# Assign the number 100 to the first 5 elements of the array.
z[:5] = 100

In [45]:
z

array([100, 100, 100, 100, 100,   5,   6,   7,   8,   9,  10])

In [46]:
z2 = np.arange(0, 11)

In [48]:
z2[:5] = [100, 99, 98, 97, 96]

In [49]:
z2

array([100,  99,  98,  97,  96,   5,   6,   7,   8,   9,  10])

In [56]:
z3 = np.arange(0, 11)

In [57]:
z3[:] = 99

In [58]:
z3

array([99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99])

z3[:] selects every element of the array. This way we can assign a new value to all elements at once.
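
To make the contrast with Python lists concrete, the sketch below tries the same slice assignment on a list and on an array. The names `py_list` and `raised` are only for illustration.

```python
import numpy as np

py_list = list(range(11))
raised = False
try:
    py_list[:5] = 100        # a list slice cannot be assigned a single number
except TypeError:
    raised = True

py_list[:5] = [100] * 5      # a list needs an iterable of replacement values

arr = np.arange(0, 11)
arr[:5] = 100                # NumPy broadcasts the single value across the slice

print(raised)    # True
print(py_list)   # [100, 100, 100, 100, 100, 5, 6, 7, 8, 9, 10]
print(arr)
```
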

<br>

**Indexing on matrices (two-dimensional arrays)**

In [37]:
arr_2d = np.array([[5, 10, 15], [20, 25, 30], [35, 40, 45]])

In [38]:
arr_2d

array([[ 5, 10, 15],
       [20, 25, 30],
       [35, 40, 45]])

In [61]:
arr_2d.shape

(3, 3)

- The shape attribute returns the dimensions of the matrix.
- The first element of the returned shape is the number of rows, and the second element is the number of columns.

In [62]:
arr_2d[0]

array([ 5, 10, 15])

- Notice that indexing the matrix with 0 returns the entire first row.
- If we want a single element from the matrix, we'll have to specify the row and column positions. This can be done in two ways, as shown below.

In [63]:
# Return the element from row 2, column 2.
arr_2d[1][1]

25

In [64]:
# Return the element from row 2, column 2.
arr_2d[1, 1]

25

We can also select by slicing from a matrix.

In [65]:
arr_2d[0:2, 1:3]

array([[10, 15],
       [25, 30]])

- This is done by slicing both the desired rows and the desired columns.
- Here we are asking NumPy for the first and second rows (indices 0 and 1) and the second and third columns (indices 1 and 2).
- Notice that when slicing, the end point is not included in the selection.

In [40]:
# Select the first column (returned as a one-dimensional array)
# This is done by selecting all the rows, and specifying which column you wish to select.
arr_2d[:, 0]

array([ 5, 20, 35])

- Notice that the column gets returned as a one-dimensional array, which prints like a row.
- If you wish to return the column as a column vector, you can apply the reshape method to the slicing call.

In [41]:
# Select the first column (as a column vector)
arr_2d[:, 0].reshape(-1, 1)

array([[ 5],
       [20],
       [35]])
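
Besides reshape, there are other ways to keep the column two-dimensional. The sketch below compares the shapes produced by an integer index, a one-column slice, and `np.newaxis`; the variable names are illustrative.

```python
import numpy as np

arr_2d = np.array([[5, 10, 15], [20, 25, 30], [35, 40, 45]])

col_1d = arr_2d[:, 0]                      # integer index drops the column axis
col_slice = arr_2d[:, 0:1]                 # a slice of width 1 keeps the axis
col_newaxis = arr_2d[:, 0][:, np.newaxis]  # add a new axis of length 1

print(col_1d.shape)       # (3,)
print(col_slice.shape)    # (3, 1)
print(col_newaxis.shape)  # (3, 1)
```
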

<br>

**Conditional Selection**

In [42]:
ara_ara = np.arange(1, 11)
ara_ara

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [67]:
# Returns a boolean array where True indicates that the element is greater than 4.
ara_ara > 4

array([False, False, False, False,  True,  True,  True,  True,  True,
        True])

In [68]:
bool_arr = ara_ara > 4

In [69]:
bool_arr

array([False, False, False, False,  True,  True,  True,  True,  True,
        True])

Now that we have this boolean array, we can use it to select from the original array (ara_ara) based on this condition (values greater than 4). This is called *boolean indexing*.

In [70]:
ara_ara[bool_arr]

array([ 5,  6,  7,  8,  9, 10])

As expected, we get all the values from the ara_ara array that are greater than 4.
- This can also be done in a single line of code, as shown below.

In [72]:
ara_ara[ara_ara > 4]

array([ 5,  6,  7,  8,  9, 10])
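
Several conditions can be combined in one selection. With NumPy arrays this is done with the element-wise operators `&` (and), `|` (or), and `~` (not) rather than Python's `and`, `or`, and `not`, and each condition needs its own parentheses. A short sketch:

```python
import numpy as np

ara_ara = np.arange(1, 11)

between = ara_ara[(ara_ara > 4) & (ara_ara < 9)]  # both conditions must hold
outside = ara_ara[(ara_ara < 3) | (ara_ara > 8)]  # either condition may hold
not_gt4 = ara_ara[~(ara_ara > 4)]                 # negate a condition

print(between)  # [5 6 7 8]
print(outside)  # [ 1  2  9 10]
print(not_gt4)  # [1 2 3 4]
```
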

<br>

<br>

<h2 align="left">NumPy Operations</h2>

In [44]:
arr1 = np.arange(0, 10)
arr1

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [77]:
arr1 + 5

array([ 5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

- We can add numbers to arrays. Notice that the number 5 gets added to every element of the array.

We can also perform various arithmetic operations between arrays, such as addition, subtraction, multiplication, and division.

In [78]:
arr1 * arr1

array([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81])

In [79]:
arr1 - arr1

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [80]:
arr2 = np.arange(0, 5)

In [82]:
# ValueError is raised for trying to multiply arr1 and arr2
arr1 * arr2

ValueError: operands could not be broadcast together with shapes (10,) (5,) 

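- Notice that the shapes of the arrays must be compatible when doing element-wise operations, or a ValueError gets raised. For two one-dimensional arrays, this means they must have the same length, unless one of them has length 1.
- This is similar to matrix addition in linear algebra, which is only defined for matrices of the same dimensions.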
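
Shapes do not always have to be identical. Under NumPy's broadcasting rules, a dimension of length 1 is stretched to match the other operand. The sketch below adds a (3, 1) column to a length-4 row to produce a (3, 4) table; the array names are illustrative.

```python
import numpy as np

column = np.arange(3).reshape(3, 1)  # shape (3, 1): values 0, 1, 2 down a column
row = np.arange(4)                   # shape (4,): values 0, 1, 2, 3

# The length-1 dimensions are stretched, giving every pairwise sum
table = column + row

print(table.shape)  # (3, 4)
print(table)
```

Each entry `table[i, j]` equals `column[i, 0] + row[j]`.
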

In [84]:
arr1.max()

9

In [85]:
arr1.var()

8.25

In [86]:
arr1.std()

2.8722813232690143

- When working with one-dimensional arrays, you can calculate basic summary statistics using an appropriate method.

- However, when working with two-dimensional arrays, we need to take the axis into account.
- axis=0 refers to the row dimension, and axis=1 refers to the column dimension. By default axis=None.
- The reasoning behind this is that the shape of a two-dimensional matrix is (rows, columns), so position 0 of the shape counts rows and position 1 counts columns. A reduction such as sum() collapses the axis it is given: axis=0 adds up values down the rows and leaves one result per column, while axis=1 adds up values across the columns and leaves one result per row.

In [89]:
arr_two_dim = np.arange(0, 25).reshape(5, 5)

In [90]:
arr_two_dim

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])

In [92]:
arr_two_dim.shape

(5, 5)

In [94]:
arr_two_dim.sum(axis=None)

300

In [95]:
arr_two_dim.sum(axis=0)

array([50, 55, 60, 65, 70])

In [96]:
arr_two_dim.sum(axis=1)

array([ 10,  35,  60,  85, 110])

With matrices, there are three ways to calculate sums and various other summary statistics:
- axis=None will return the sum of every element in the matrix.
- axis=0 will sum down the rows and return one sum per column (here, 0 + 5 + 10 + 15 + 20 = 50 for the first column).
- axis=1 will sum across the columns and return one sum per row (here, 0 + 1 + 2 + 3 + 4 = 10 for the first row).
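
One way to confirm which axis does what is to compare the results against sums of individual rows and columns. The sketch below does this and also shows the `keepdims` parameter, which keeps the collapsed axis as a dimension of length 1.

```python
import numpy as np

arr_two_dim = np.arange(0, 25).reshape(5, 5)

col_sums = arr_two_dim.sum(axis=0)  # one sum per column
row_sums = arr_two_dim.sum(axis=1)  # one sum per row

# Check against a single column and a single row
print(col_sums[0] == arr_two_dim[:, 0].sum())  # True: 0 + 5 + 10 + 15 + 20 = 50
print(row_sums[0] == arr_two_dim[0, :].sum())  # True: 0 + 1 + 2 + 3 + 4 = 10

# keepdims=True keeps the result two-dimensional
kept = arr_two_dim.sum(axis=1, keepdims=True)
print(kept.shape)  # (5, 1)
```
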