Copyright (c) 2025 aamirmd. All Rights Reserved.

This work is licensed under the MIT License. See LICENSE file for details.

# Numpy tutorial

Welcome!

Pre-requisites:
- Basic python (variables, types, lists, indexing, dicts, etc.) 
- Linear Algebra (only for the linear algebra section, otherwise optional)

## Installing and importing Numpy

Please view official documentation here: https://numpy.org/

In [None]:
# Run this cell to install numpy. Alternatively, run 'pip install numpy' (without the leading '!') on a command line
!pip install numpy

This notebook can be run on Google Colab without needing to install Python on your computer.

Please follow instructions here: [Google Colab Instructions](https://medium.com/@jessica0greene/running-your-notebooks-in-the-cloud-with-google-colab-4387529bfad4)

In [None]:
import numpy as np

## Creating a numpy array

Numpy arrays can be created in a number of ways. One of the ways is by using a python list such as [1,2,3].

In [None]:
list_of_numbers = [1,2,3]
numpy_array_of_list = np.array(list_of_numbers)

print("The numpy array is ", numpy_array_of_list)
print("The type of a numpy array is ", type(numpy_array_of_list))

Here are some other ways to create numpy arrays

In [None]:
# To create an array of zeros
zeros_array = np.zeros(3)
print(f"Array of zeros: {zeros_array}")

# To create an array of ones
ones_array = np.ones(5)
print(f"Array of ones: {ones_array}")

# To create an array from 15 to 25 (25 excluded) with step size 2. The arguments work like those of Python's built-in range().
range_array = np.arange(15, 25, 2)
print(f"Array from range [15,25) with step size 2: {range_array}")

# Note: the 'start' and 'step' parameters are optional. 
range_array_only_stop = np.arange(6)
print(f"Array from [0,6): {range_array_only_stop}")

# To create an array with 6 elements in the range [13,15] which are evenly spaced
linspace_array = np.linspace(13, 15, num=6)
print(f"Array of 6 elements in range [13,15]: {linspace_array}")

# We can also make the range exclude the endpoint. This is an array with 6 elements in the range [13,15)
linspace_array_exclude_endpoint = np.linspace(13, 15, num=6, endpoint=False)
print(f"Array of 6 elements in range [13,15): {linspace_array_exclude_endpoint}")
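
The difference between the two `np.linspace` calls above comes down to the spacing numpy picks. As a small sketch (the variable names here are made up for illustration), the `retstep=True` argument returns that spacing alongside the array:

```python
import numpy as np

# retstep=True makes linspace return (array, spacing) instead of just the array
with_endpoint, step_with = np.linspace(13, 15, num=6, retstep=True)
without_endpoint, step_without = np.linspace(13, 15, num=6, endpoint=False, retstep=True)

# With the endpoint, the interval is split into num - 1 = 5 gaps
print(f"Endpoint included: step = {step_with}, last element = {with_endpoint[-1]}")

# Without the endpoint, the interval is split into num = 6 gaps and 15 is never reached
print(f"Endpoint excluded: step = {step_without}, last element = {without_endpoint[-1]}")
```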

We can also create arrays of different types.
- To view a list of all dtypes: https://numpy.org/doc/2.1/reference/arrays.dtypes.html
- Usually, np.int32 and np.float64 are used. Using regular python data types (int, float, etc.) is also fine.
- We can also change from one type to another.

In [None]:
float_array = np.ones(6, dtype=np.float64)
print(f"Float array: {float_array}")
print(f"Float array cast to int: {float_array.astype(int)}")
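
One detail worth knowing when changing types: casting floats to integers with `astype(int)` drops the fractional part instead of rounding. The following sketch uses made-up values to show the difference. `np.rint` is used here as one possible way to round first.

```python
import numpy as np

float_values = np.array([1.7, -1.7, 2.2])

# astype(int) truncates toward zero: the fractional part is simply dropped
truncated = float_values.astype(int)
print(f"Truncated: {truncated}")

# Rounding to the nearest integer first gives a different result
rounded = np.rint(float_values).astype(int)
print(f"Rounded:   {rounded}")
```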

## Indexing and Slicing

This works similar to regular Python lists.

In [None]:
a = np.array([4,7,2,3,1,6,4,9,0,2,3])
print(f"The element at index 3 is {a[3]}")
print(f"The array from index 2 to 5 (excluding 5) is {a[2:5]}")
print(f"The array from index 3 to 7 with step size 2 is {a[3:7:2]}")
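
Although the slicing syntax matches Python lists, there is one behavioural difference to keep in mind: a slice of a list is a new list, while a slice of a numpy array is a view onto the same data. The sketch below (with illustrative variable names) shows the effect, and uses `.copy()` when an independent array is wanted.

```python
import numpy as np

py_list = [4, 7, 2, 3, 1]
a = np.array([4, 7, 2, 3, 1])

list_slice = py_list[1:3]   # a new, independent list
array_slice = a[1:3]        # a view that shares memory with 'a'

list_slice[0] = 99
array_slice[0] = 99

print(f"List after modifying its slice:  {py_list}")  # unchanged
print(f"Array after modifying its slice: {a}")        # index 1 is now 99

# Use .copy() to get a slice that does not affect the original array
independent_copy = a[1:3].copy()
independent_copy[1] = -1
print(f"Array after modifying a copy:    {a}")        # unchanged by the copy
```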

What if you want elements of indices 2,6,4, and 7?

Numpy supports this: an array can be indexed with a list (or array) of indices.

In [None]:
a = np.array([4,7,2,3,1,6,4,9,0,2,3])
print(f"Elements at indices 2,6,4, and 7: {a[[2,6,4,7]]}")

Arrays can be modified. See below for examples.

In [None]:
a = np.array([4,7,2,3,1,6,4,9,0,2,3])
print(f"Array: {a}")

# To change element at index 3 to 57
a[3] = 57
print(f"After changing index 3 to 57: {a}")

# Changing indices 4,6,7 to 123
a[[4,6,7]] = 123
print(f"After changing indices 4,6,7 to 123: {a}")

#### Masks

- Masks are useful if you want all elements/numbers in an array that satisfy a particular condition.
- A mask is a boolean numpy array where the value at an index is 'True' if the boolean condition is satisfied and 'False' otherwise.
- Common boolean operators should be used as follows (this is different from regular python syntax):
    * & (and)
    * | (or)
    * ~ (not)
- Parentheses are required around each comparison when combining them with '&' and '|', because these operators bind more tightly than comparisons such as '>' and '<'. Without the parentheses, you will likely get an error as follows:
    * `ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()`
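
As a rough illustration of the parentheses rule, the sketch below evaluates the same condition with and without parentheses and catches the resulting error so that the cell still runs to the end.

```python
import numpy as np

a = np.array([4, 7, 2, 3, 1, 6, 4, 9, 0, 2, 3])

# With parentheses, each comparison is evaluated first and '&' combines the two masks
with_parentheses = a[(a >= 3) & (a < 7)]
print(f"With parentheses: {with_parentheses}")

# Without parentheses, Python reads this as a >= (3 & a) < 7, a chained comparison
# that needs a single True/False value from a whole array
error_message = None
try:
    a[a >= 3 & a < 7]
except ValueError as error:
    error_message = str(error)
    print(f"Without parentheses: ValueError: {error_message}")
```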

For example, what if we want all elements that are greater than 3? The boolean condition here is whether an element is greater than 3.

In [None]:
# To get all elements greater than 3
a = np.array([4,7,2,3,1,6,4,9,0,2,3])
mask = a > 3
print(f"Mask for elements >3 in array: {mask}")
print(f"Elements greater than 3: {a[mask]}")
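
If the positions of the matching elements are needed rather than their values, one option is `np.flatnonzero`, which returns the indices where a mask is True. This is a minimal sketch using the same array as above.

```python
import numpy as np

a = np.array([4, 7, 2, 3, 1, 6, 4, 9, 0, 2, 3])
mask = a > 3

# Indices at which the mask is True
indices = np.flatnonzero(mask)
print(f"Indices of elements >3: {indices}")

# Indexing with those indices gives the same elements as indexing with the mask
print(f"Elements at those indices: {a[indices]}")
```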

Masking can sometimes be done in one line of code, without creating a variable called 'mask'.

In [None]:
a = np.array([4,7,2,3,1,6,4,9,0,2,3])
print(f"Elements less than or equal to 6 in the array: {a[a <= 6]}")

Here are more ways boolean conditions can be formulated.

In [None]:
a = np.array([4,7,2,3,1,6,4,9,0,2,3])
print(f"Array: {a}")

# To get all elements not equal to 4. There are two ways to do this.
print(f"All elements not equal to 4: {a[a != 4]}")
print(f"All elements not equal to 4: {a[~(a == 4)]}")

# To get all elements between 3 and 7 excluding 7, i.e. in the interval [3,7) i.e. >=3 and <7
print(f"All elements in range [3,7): {a[(a >= 3) & (a < 7)]}")

# To get all elements not greater than 3 (i.e. <=3) or greater than 7
print(f"All elements that are in range (-inf,3] u (7,inf): {a[~(a > 3) | (a > 7)]}")

## Shape

- The shape of a numpy array is a way to represent the number of dimensions and the number of items in each dimension.
- So far, we have looked at one-dimensional (1-d) arrays, which resemble Python lists. A 1-d array of size 8 has shape (8,).
- Numpy arrays don't have to be 1-dimensional.
- They can have 2, 3, or more dimensions. A 2-d numpy array is a matrix. A 3-d array is like stacking multiple matrices together, like a cube.
- A shape consists of one or more dimensions/axes. For example, if an array is of shape (10,4,3):
    * The array has 3 dimensions.
    * Number of elements across axis 0 is 10
    * Number of elements across axis 1 is 4
    * Number of elements across axis 2 is 3
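
The (10,4,3) example can be checked directly. The short sketch below builds an array of zeros with that shape and reads back its number of dimensions, its shape, and its total number of elements.

```python
import numpy as np

a = np.zeros((10, 4, 3))

print(f"Number of dimensions: {a.ndim}")
print(f"Shape: {a.shape}")
print(f"Number of elements across axis 1: {a.shape[1]}")

# The total number of elements is the product of the shape: 10 * 4 * 3
print(f"Total number of elements: {a.size}")
```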

In [None]:
# To create a 2-d array from a list of lists
a = np.array([
    [1,2,3,1],
    [4,5,6,1],
    [7,8,9,1]
])
print(f"2-d array from list: \n{a}")
print(f"Shape of 2-d array: {a.shape}")
print()

# To create a 3-d array with ones that is like stacking three 5x7 matrices together
a = np.ones(shape=(3,5,7))
print(f"3-d array of ones: \n{a}")
print(f"Shape of array: {a.shape}")

#### Reshape

- A common operation in numpy is 'reshape'. This involves changing the shape of the numpy array. This is useful if you want to change how a numpy array is structured for specific tasks.
- An array of shape (4,2,2) can be thought of as an array of 4 elements where each element is a (2,2) two dimensional array (or matrix).
- An array of shape (10,4,2,2) can be thought of as an array of 10 elements where each element is a (4,2,2) array.
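
The "array of arrays" view described above can be seen by indexing along the first axis. This is a small sketch with illustrative variable names.

```python
import numpy as np

# A (4,2,2) array: 4 elements, each of which is a (2,2) matrix
a = np.arange(16).reshape(4, 2, 2)
second_matrix = a[1]
print(f"Element at index 1: \n{second_matrix}")
print(f"Shape of that element: {second_matrix.shape}")

# A (10,4,2,2) array: 10 elements, each of which is a (4,2,2) array
b = np.arange(160).reshape(10, 4, 2, 2)
print(f"Shape of one element of b: {b[0].shape}")
```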

In [None]:
# Let 'a' be a 1-d array of size 16
a = np.arange(16)
print(f"Array: {a}")
print(f"Shape of array: {a.shape}")
print()

# Let's reshape this array to a 4x4 matrix (2-d array)
a_reshaped_matrix = a.reshape((4,4))
print(f"Array reshaped as a matrix: \n{a_reshaped_matrix}")
print(f"Shape of array: {a_reshaped_matrix.shape}")
print()

# Reshape into 2x8 matrix
a_reshaped_matrix_2 = a.reshape((2,8))
print(f"Array reshaped as a matrix: \n{a_reshaped_matrix_2}")
print(f"Shape of array: {a_reshaped_matrix_2.shape}")

In [None]:
a = np.arange(16)

# Reshape into a 3-d array of shape 4x2x2.
# Please take a moment to look at how the numbers are structured
a_reshaped_3d = a.reshape((4,2,2))
print(f"Array reshaped as 3-d: \n{a_reshaped_3d}")
print(f"Shape of array: {a_reshaped_3d.shape}")
print()

# Reshape into a 3-d array of shape 2x4x2.
# Please take a moment to look at how the numbers are structured
a_reshaped_3d_2 = a.reshape((2,4,2))
print(f"Array reshaped as 3-d: \n{a_reshaped_3d_2}")
print(f"Shape of array: {a_reshaped_3d_2.shape}")

In [None]:
a = np.arange(16)

# Array reshaped as 4-d
a_reshaped_4d = a.reshape((2,2,2,2))
print(f"Array reshaped as 4-d: \n{a_reshaped_4d}")
print(f"Shape of array: {a_reshaped_4d.shape}")

- Sometimes, we may not know the exact shape of the array after reshaping (or we may not feel like calculating it.)
- In this case, we can set one of the dimensions in the argument passed into 'reshape' as -1 and numpy would automatically calculate the shape. Please see example below.
- We cannot set more than one argument as -1 since numpy would not be able to calculate the shape.
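
The value that numpy fills in for -1 is the total number of elements divided by the product of the other dimensions. The sketch below does that calculation by hand for one case and compares it with what `reshape` infers.

```python
import numpy as np

a = np.zeros((10, 6, 8, 8))
print(f"Total number of elements: {a.size}")  # 10 * 6 * 8 * 8

# For reshape(5, -1, 2), the known dimensions multiply to 5 * 2 = 10
inferred_by_hand = a.size // (5 * 2)
reshaped = a.reshape(5, -1, 2)

print(f"Dimension calculated by hand: {inferred_by_hand}")
print(f"Shape inferred by numpy: {reshaped.shape}")
```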

In [None]:
# Create a very large array with random numbers. The arguments are the shape
a = np.random.rand(10,6,8,8)
print(f"Shape of array: {a.shape}")

# Let's reshape it into a 2-d array (or matrix) with 3 rows. We don't know the number of columns
a_reshaped = a.reshape(3, -1)
print(f"Shape of reshaped array: {a_reshaped.shape}")

# Reshaping to 3d array
a_reshaped_3d = a.reshape(5,-1,2)
print(f"Shape of reshaped 3d array: {a_reshaped_3d.shape}")

In [None]:
# More than one argument as -1
## THIS WILL THROW AN ERROR
a = np.random.rand(10,6,8,8)
print(f"Shape of array: {a.shape}")
print(f"Shape of new array: {a.reshape(5,-1,-1)}")

**IMPORTANT**: It is necessary to ensure that the new shape of a numpy array would be "compatible" with the old shape. This means the number of elements in both shapes should be the same. 

In the reshape examples above that start from `np.arange(16)`, every shape had 16 elements: (16,), (4,4), (2,8), (4,2,2), (2,4,2), and (2,2,2,2).
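
One way to check compatibility before reshaping is to compare the array's size with the product of the new shape. The helper below, `is_compatible`, is a hypothetical function written for this tutorial (it is not part of numpy) and it does not handle shapes that contain -1.

```python
import math

import numpy as np


def is_compatible(array, new_shape):
    """Hypothetical helper: True if 'array' can be reshaped to 'new_shape' (no -1 allowed)."""
    return array.size == math.prod(new_shape)


a = np.arange(16)
print(f"(2,8) compatible?   {is_compatible(a, (2, 8))}")
print(f"(4,2,2) compatible? {is_compatible(a, (4, 2, 2))}")
print(f"(4,2,3) compatible? {is_compatible(a, (4, 2, 3))}")  # 24 elements, not 16
```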

In [None]:
# What if we reshape to an invalid shape
## THIS WILL THROW AN ERROR
a = np.arange(16)
print(f"Error case: {a.reshape(4,2,3)}")

## Array operations

- Numpy arrays are very useful for performing element-wise operations (addition, subtraction, etc.) between lists, matrices, 3-d arrays, etc.

In [None]:
# Example of element-wise addition and subtraction
a = np.arange(7)
ones_array = np.ones(7, dtype=np.int32)
sum_of_two_arrays = a + ones_array
difference_of_two_arrays = a - ones_array

print(f"Array = {a}")
print(f"Ones array = {ones_array}")
print()
print(f"Sum = {sum_of_two_arrays}")
print(f"Difference = {difference_of_two_arrays}")

In [None]:
# Element wise multiplication between matrices
a = np.arange(9).reshape(3,3) + np.ones(shape=(3,3), dtype=np.int32)
b = np.arange(9).reshape(3,3)

print(f"Array a = \n{a}")
print(f"Array b = \n{b}\n")

print(f"Element-wise multiplication: a * b = \n{a * b}")

#### Broadcasting

- Broadcasting is useful when you want to perform operations that usually require a loop. For example, adding a constant to all numbers in an array.
- Performing operations using broadcasting is usually faster.
- However, it is important to get the shapes right for broadcasting to work as expected. To do this, we may have to use 'reshape'.
- Please see the examples below to learn what it means for the shapes to align.
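
Shapes align when, comparing them from the last axis backwards, each pair of dimensions is either equal or one of them is 1 (missing leading dimensions count as 1). The sketch below uses `np.broadcast_shapes`, which assumes numpy 1.20 or newer, to check a few shape pairs without creating any arrays.

```python
import numpy as np

# (3,5) with (3,1): the trailing 1 stretches to 5
column_case = np.broadcast_shapes((3, 5), (3, 1))
print(f"(3,5) and (3,1) -> {column_case}")

# (3,5) with (5,): (5,) is treated as (1,5)
row_case = np.broadcast_shapes((3, 5), (5,))
print(f"(3,5) and (5,)  -> {row_case}")

# (3,5) with (3,): the last axes are 5 and 3, which do not match
mismatch_error = None
try:
    np.broadcast_shapes((3, 5), (3,))
except ValueError as error:
    mismatch_error = error
    print(f"(3,5) and (3,)  -> ValueError: {error}")
```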

In [None]:
# Adding a constant to an array
a = np.arange(3, 10)
print(f"Array = {a}")
print(f"After adding 1: {a + 1}")
print(f"After dividing by 6: {a/6}")
print()

# For 2-d arrays
a = np.ones(shape=(2,5))
print(f"Array = \n{a}")
print(f"Multiplying by 8: \n{a * 8}")

- We can also use this to combine 1-d arrays with 2-d arrays on a row or column basis.
- For example, let's say we have a 2-d array called 'A' with shape (3,5) and a 1-d array called 'x' with shape (3,). We can easily multiply each column of 'A' by 'x' without using loops.
    * However, we would have to reshape 'x' from (3,) to (3,1) for the broadcasting to work, since the shape of 'A' is (3,5). 
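
Besides `reshape`, a common way to turn a (3,) array into a (3,1) array is to index it with `np.newaxis`. The sketch below shows that both approaches give the same product.

```python
import numpy as np

A = np.arange(15).reshape(3, 5)
x = np.arange(3)

# Two equivalent ways to turn shape (3,) into shape (3,1)
via_reshape = A * x.reshape(3, 1)
via_newaxis = A * x[:, np.newaxis]

print(f"Shape of x[:, np.newaxis] = {x[:, np.newaxis].shape}")
print(f"A * x[:, np.newaxis] = \n{via_newaxis}")
print(f"Same result as reshape: {np.array_equal(via_reshape, via_newaxis)}")
```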

In [None]:
A = np.arange(15).reshape(3,5)
x = np.arange(3, dtype=np.int32)

print(f"Array A = \n{A}")
print(f"Shape of A = {A.shape}")
print()
print(f"Array x = {x}")
print(f"Shape of x = {x.shape}")
print()

# Reshaping x
x_reshaped = x.reshape(3,1)
print(f"Array x reshaped = \n{x_reshaped}")
print(f"Shape of x reshaped = {x_reshaped.shape}")
print()

# Multiplying
product = A * x_reshaped
print(f"A * x_reshaped = \n{product}")
print(f"Shape of product = {product.shape}")

In [None]:
# Trying to multiply 'A' with 'x' without reshaping would throw an error
## THIS WILL THROW AN ERROR

A = np.arange(15).reshape(3,5)
x = np.arange(3, dtype=np.int32)

print(f"A * x = {A * x}")

What if we had an array of shape (5,) and we want to multiply it with every row of A, which has shape (3,5)?

In [None]:
A = np.arange(15).reshape(3,5)
y = np.arange(5, dtype=np.int32)

print(f"Array A = \n{A}")
print(f"Shape of A = {A.shape}")
print()
print(f"Array y = {y}")
print(f"Shape of y = {y.shape}")
print()

# Reshaping y
y_reshaped = y.reshape(1,5)
print(f"Array y reshaped = {y_reshaped}")
print(f"Shape of y reshaped = {y_reshaped.shape}")
print()

print(f"A * y reshaped = \n{A * y_reshaped}")
print()

# This can also be done without reshaping: broadcasting compares shapes starting from the last axis, so numpy treats the shape (5,) as (1,5)
print(f"A * y = \n{A * y}")
print()

- Let's say we have a (500,2,2) array, which is an array that contains 500 arrays of shape (2,2).
- How do we get an array such that we only retain the diagonal elements in each (2,2) array?

In [None]:
batch = np.random.rand(500,2,2)
i = np.diag(np.ones(2))  # Can also be done using np.eye()

print(f"Array = \n{batch}")
print(f"Shape of array = {batch.shape}")
print()

print(f"i = \n{i}")
print()

print(f"Array with only diagonal elements in (2,2) arrays: \n{batch * i}")
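
With 500 random matrices the effect is hard to see in the printout, so here is the same idea on a small batch with easy-to-read numbers (the names `small_batch` and `identity` are illustrative). The (2,2) identity is broadcast against every (2,2) matrix in the batch, which zeroes the off-diagonal entries. As an aside, `np.diagonal` extracts just the diagonal values if the zeros are not needed.

```python
import numpy as np

small_batch = np.arange(1, 13).reshape(3, 2, 2)  # three (2,2) matrices
identity = np.eye(2)

# (3,2,2) * (2,2): the identity is applied to each matrix in the batch
diagonal_only = small_batch * identity
print(f"First matrix: \n{small_batch[0]}")
print(f"First matrix with only diagonal elements: \n{diagonal_only[0]}")

# Alternative: pull out the diagonal values of each matrix, giving shape (3,2)
diagonal_values = np.diagonal(small_batch, axis1=1, axis2=2)
print(f"Diagonal values of each matrix: \n{diagonal_values}")
```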

How fast is numpy broadcasting compared to a standard for loop?
* Run the cells below to find out. They use the 'batch' and 'i' arrays defined in the cell above.

In [None]:
%%timeit
product = batch * i

In [None]:
%%timeit
product = np.array(batch)
for index in range(product.shape[0]):
    product[index] *= i

#### Aggregation

In [None]:
# Finding sum of all elements in an array
a = np.arange(5)
print(f"Array = {a}")
print(f"Sum of elements in a = {np.sum(a)}")
print()

# What if we want the average of all elements of a 2-d array?
a = np.arange(8).reshape(4,2)
print(f"Array = \n{a}")
print(f"Average of all elements: {np.mean(a)}")

- Axes (or dimensions) are 0-indexed
- If shape is (10,4,3), then the number of elements across each axis is as follows:
    * axis 0 - 10
    * axis 1 - 4
    * axis 2 - 3
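
Aggregating across an axis collapses that axis, so it disappears from the shape of the result. The sketch below illustrates this with an array of ones of shape (10,4,3).

```python
import numpy as np

a = np.ones((10, 4, 3))

sum_axis_0 = np.sum(a, axis=0)  # collapses the axis of length 10
sum_axis_1 = np.sum(a, axis=1)  # collapses the axis of length 4
sum_axis_2 = np.sum(a, axis=2)  # collapses the axis of length 3

print(f"Shape after sum over axis 0: {sum_axis_0.shape}")
print(f"Shape after sum over axis 1: {sum_axis_1.shape}")
print(f"Shape after sum over axis 2: {sum_axis_2.shape}")

# Each entry of sum_axis_0 adds up 10 ones
print(f"One entry of the axis-0 sum: {sum_axis_0[0, 0]}")
```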

In [None]:
# Aggregating across specific dimension (or axis)
a = np.array([
    [4,5,6],
    [2,3,1],
    [9,8,7]
])
print(f"Array = \n{a}")
print()

# Finding max of each row (axis = 1)
print(f"Max across each row: \n{np.max(a, axis=1)}")

# Finding min across each column
print(f"Min across each column: \n{np.min(a, axis=0)}")
print()

# Finding index of max of each row (axis = 1)
# Note, it is 0-based indexing
print(f"Index of max across each row: \n{np.argmax(a, axis=1)}")

# Finding index of min across each column
print(f"Index of min across each column: \n{np.argmin(a, axis=0)}")

Suppose we have an array of shape (7,4,3). We want to do the following:
* Get the maximum of each of the 7 arrays of shape (4,3)
* Average all the maximum values

In [None]:
a = np.random.rand(7,4,3)
# print(f"Array a = \n{a}")
print(f"Shape of array = {a.shape}")

# Changing the inner (4,3) array to a 1-d array
a = a.reshape(a.shape[0], -1)
# print(f"Array reshaped = \n{a}")
print(f"Shape of array after reshape = \n{a.shape}")

# Getting max for each "(4,3)" array
max_values = np.max(a, axis=1)
print(f"Max values = {max_values}")
print(f"Shape of max values = {max_values.shape}")

# Getting average of the max values
print(f"Average of max values = {np.mean(max_values)}")
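
The reshape step above is one way to do this. As an alternative sketch, `np.max` also accepts a tuple of axes, so the maximum over each (4,3) array can be taken directly. A seeded random generator is used here only so that the example is repeatable.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((7, 4, 3))

# Approach from the cell above: flatten each (4,3) array, then take the max per row
via_reshape = np.max(a.reshape(a.shape[0], -1), axis=1)

# Alternative: aggregate over axes 1 and 2 at once
via_axis_tuple = np.max(a, axis=(1, 2))

print(f"Shape of max values = {via_axis_tuple.shape}")
print(f"Both approaches agree: {np.array_equal(via_reshape, via_axis_tuple)}")
print(f"Average of max values = {np.mean(via_axis_tuple)}")
```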

## Linear Algebra

Numpy can also be used to perform linear algebra operations.

In [None]:
# To create a diagonal matrix using a list or array
diagonal_matrix = np.diag([1,2,3,4])
print(f"Diagonal matrix: \n{diagonal_matrix}")
print()

# To create an identity matrix
identity_matrix = np.eye(3)
print(f"Identity matrix of size 3: \n{identity_matrix}")

In [None]:
# To perform a matrix multiplication, there are two ways

A = np.array([   # 2x3 matrix
    [1,2,3],
    [4,5,6]
])

B = np.array([   # 3x4 matrix
    [1,2,3,1],
    [4,5,6,1],
    [7,8,9,1]
])

print(f"A x B = \n{np.matmul(A, B)}\n")
print(f"A x B = \n{A @ B}")
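
To make the shape rule explicit: multiplying a 2x3 matrix by a 3x4 matrix gives a 2x4 matrix, and each entry is the sum of products of a row of A with a column of B. The sketch below checks one entry by hand.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])        # 2x3
B = np.array([[1, 2, 3, 1],
              [4, 5, 6, 1],
              [7, 8, 9, 1]])     # 3x4

product = A @ B
print(f"Shape of A @ B = {product.shape}")  # inner dimensions (3 and 3) match

# Entry [0,0] is row 0 of A times column 0 of B: 1*1 + 2*4 + 3*7
entry_by_hand = sum(A[0, k] * B[k, 0] for k in range(3))
print(f"Entry [0,0] by hand = {entry_by_hand}, from matmul = {product[0, 0]}")
```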

In [None]:
# If the matrix multiplication is invalid (such as if the dimensions don't match) we get the following error
# THIS WILL THROW AN ERROR
print(f"B x A = {B @ A}")

In [None]:
A = np.array([
    [1,2,3],
    [5,6,7],
    [10,0,12]
])

print(f"A = \n{A}\n")

# To get transpose of a matrix
print(f"Transpose of A = \n{np.transpose(A)}")
print(f"Transpose of A = \n{A.T}\n")

# To get inverse of a matrix
print(f"Inverse of A = \n{np.linalg.inv(A)}\n")

# To get eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(A)
print(f"Eigenvalues of A = {eigenvalues}")
print(f"Eigenvectors of A = \n{eigenvectors}")
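
A quick sanity check for these results, sketched below: a matrix times its inverse should be (numerically) the identity, and multiplying A by each eigenvector should scale it by the matching eigenvalue. `np.allclose` is used because floating-point results are rarely exact.

```python
import numpy as np

A = np.array([
    [1, 2, 3],
    [5, 6, 7],
    [10, 0, 12]
])

# A times its inverse should be close to the identity matrix
A_inverse = np.linalg.inv(A)
inverse_ok = np.allclose(A @ A_inverse, np.eye(3))
print(f"A @ inverse(A) is the identity: {inverse_ok}")

# Column j of 'eigenvectors' pairs with eigenvalues[j], so A @ V should equal
# V with each column scaled by its eigenvalue (done here via broadcasting)
eigenvalues, eigenvectors = np.linalg.eig(A)
eigen_ok = np.allclose(A @ eigenvectors, eigenvectors * eigenvalues)
print(f"A @ v equals eigenvalue * v for every pair: {eigen_ok}")
```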