<!--
SPDX-FileCopyrightText: Copyright (c) 2019-2024 Idiap Research Institute <contact@idiap.ch>
SPDX-FileContributor: Olivier Canévet <olivier.canevet@idiap.ch>
-->

# Introduction to NumPy

This notebook is a simple introduction to NumPy. If you are already familiar with NumPy, simply go through the notebook to see if there is anything unknown to you.

Here are more references if you need them:

- [NumPy quick start](https://numpy.org/doc/stable/user/quickstart.html)
- [Another NumPy tutorial](https://www.w3schools.com/python/numpy/numpy_intro.asp)
- [NumPy for MATLAB users](https://numpy.org/doc/stable/user/numpy-for-matlab-users.html)

After completing the notebook, you are invited to try the following quizzes for fun:
- [w3schools NumPy quiz](https://www.w3schools.com/python/numpy/numpy_quiz.asp)
- [w3schools NumPy exercises](https://www.w3schools.com/python/numpy/numpy_exercises.asp)


This notebook was developed at the [Idiap Research Institute](https://www.idiap.ch) by [Olivier Canévet](mailto:olivier.canevet@idiap.ch). Any reproduction or distribution of this document, in whole or in part, is prohibited unless permission is granted by the author.

We import the NumPy module with `import numpy as np` instead of simply `import numpy`, so that we can write `np.something()` instead of `numpy.something()`. This is a common practice in Python.

In [None]:
import time
import numpy as np

## Arrays

Arrays are the core structure in NumPy.

```python
x = np.array([[2.1, 3.2, 1.4], [0.1, 1.7, 8.1]])
print(x)
```

which displays

```
[[2.1 3.2 1.4]
 [0.1 1.7 8.1]]
```

The dimension of the array is given by `shape`:

```python
print(x.shape)
```

which gives

```
(2, 3)
```

meaning an array of 2 rows and 3 columns.

We can specify the type of the values with the `dtype` argument (see all the available types in the [documentation](https://numpy.org/doc/stable/user/basics.types.html)):

```python
x = np.array([0, 1, 0, 0], dtype=bool) # Boolean: [False  True False False]
x = np.array([1, 2, 3, 4], dtype=float) # double precision: [1. 2. 3. 4.]
x = np.array([128, 255, 0, 10], dtype=np.uint8) # Note np. to get unsigned char
```

If `dtype` is not specified, NumPy selects a type that fits your data.
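As a small illustration of this automatic selection, the sketch below builds three arrays without `dtype` and prints the type NumPy picked. The variable names `a`, `b` and `c` are only for illustration, and the exact integer type depends on the platform and NumPy version.

```python
import numpy as np

a = np.array([1, 2, 3])       # only integers
b = np.array([1, 2.5, 3])     # at least one float, so everything becomes float
c = np.array([True, False])   # only Booleans

print(a.dtype)  # a platform-dependent integer type, often int64
print(b.dtype)  # float64
print(c.dtype)  # bool
```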

In [None]:
x = np.array([[2.1, 3.2, 1.4], 
              [0.1, 1.7, 8.1]])
print(x)
print(f"Type of values is {x.dtype}")
print(f"Array x has {x.shape[0]} rows and {x.shape[1]} columns")
x = np.array([1, 2, 3, 4], dtype=float)
print(x)
x = np.array([0, 1, 0, 0], dtype=bool)
print(x)
x = np.array([128, 255, 0, 10], dtype=np.uint8)
print(x)

Create a NumPy array `x` of size `(2,3)` containing the following values:

```
0.07 1.414 2.71828
3.14 1.618 0.69314
```

In [None]:
# Create an array containing the values above
#
# x = ...
#
# (1 line of code)
# YOUR CODE HERE
raise NotImplementedError()
print(x)

Run the next cell to see if your implementation is correct:

In [None]:
assert x.shape == (2, 3), "The shape of `x` should be (2, 3)"
for i, j, v in zip(np.repeat([0,1],[3]), np.tile([0,1,2],[2]), [0.07, 1.414, 2.71828, 3.14, 1.618, 0.69314]):
    assert x[i, j] == v, f"Value at row {i} and column {j} should be {v}"

By using function `np.zeros()`, create an array `x` of size `(3, 2, 4)` full of zeros. See the [documentation](https://numpy.org/doc/stable/reference/generated/numpy.zeros.html) of function `numpy.zeros`.

In [None]:
# Create an array of zeros of shape (3, 2, 4)
# by using np.zeros(...)
#
# x = ...
#
# (1 line of code)
# YOUR CODE HERE
raise NotImplementedError()
print(x)

Run the next cell to see if your implementation is correct:

In [None]:
assert x.shape == (3, 2, 4), "The shape of `x` should be (3, 2, 4)"
assert np.array_equal(x, np.array([[[0.0]*4]*2]*3)), "Array `x` should be full of zeros"

By using function `np.ones()`, create an array `x` of size `(2, 5)` full of ones. See the [documentation](https://numpy.org/doc/stable/reference/generated/numpy.ones.html) of function `numpy.ones`.

In [None]:
# Create an array of ones of shape (2, 5)
# by using np.ones(...)
#
# x = ...
#
# (1 line of code)
# YOUR CODE HERE
raise NotImplementedError()
print(x)

Run the next cell to see if your implementation is correct:

In [None]:
assert x.shape == (2, 5), "The shape of `x` should be (2, 5)"
assert np.array_equal(x, np.array([[1.0]*5]*2)), "Array `x` should be full of ones"

## Accessing the elements of an array

The elements of an array can be accessed with operator `[]` and, as usual in computer science, indices start at `0`:

```python
x = np.array(
  [[3, 5, 1, 3],
   [6, 7, 2, 9]]
)
print(x[0,0]) # Prints 3
print(x[1,2]) # Prints 2
```

This can also be used to change the values of an array:

```python
x[1,0] = 8 # First element of second row is now 8
```

We can also access multiple elements with slices. In the example below, we set the last two elements of rows 0 and 1 to zero:

```python
x[0:2, 2:4] = 0
print(x)
```

displays

```
[[3 5 0 0]
 [8 7 0 0]]
```

and we can select all the elements of a dimension with `:`, for instance, selecting one particular row or column:

```python
print(x[1, :]) # Print second row
print(x[:, 2]) # Print third column
```
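To see what such selections return, here is a minimal sketch on a fresh copy of the initial array. It also shows a slice with a step and negative indices, which follow the same rules as Python lists. The variable names are only for illustration.

```python
import numpy as np

x = np.array(
  [[3, 5, 1, 3],
   [6, 7, 2, 9]]
)
row = x[1, :]            # second row, shape (4,)
col = x[:, 2]            # third column, shape (2,)
every_other = x[:, ::2]  # columns 0 and 2, shape (2, 2)
last = x[-1, -1]         # negative indices count from the end

print(row)          # [6 7 2 9]
print(col)          # [1 2]
print(every_other)  # rows [3 1] and [6 2]
print(last)         # 9
```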

In [None]:
x = np.array(
  [[3, 5, 1, 3],
   [6, 7, 2, 9]]
)
x[1,0] = 8 # First element of second row is now 8
x[0:2, 2:4] = 0 # Selection of rows 0 to 2 (excluded) and columns 2 to 4 (excluded)
print(x)

By using `numpy.full`, slices and [extended slices](https://docs.python.org/2.3/whatsnew/section-slices.html) on the tensor itself (i.e. **don't use a temporary list or `for` loops**), generate the following array:
```
1 3 1 3 1 3 1 3 1 3 1 
2 3 2 3 2 3 2 4 4 4 2 
1 3 1 3 1 3 1 4 4 4 1 
2 3 2 3 2 3 2 3 2 3 2 
1 3 1 3 1 3 1 3 1 3 1 
2 3 2 3 2 3 2 3 2 3 2 
1 3 1 3 1 3 1 3 1 3 1 
2 3 2 3 2 3 2 3 2 3 2 
1 3 1 3 1 3 1 3 4 4 4 
2 3 2 3 2 3 2 3 4 4 4 
1 3 1 3 1 3 1 3 4 4 4 
```

For instance, the following tensor

```
1 2 3 3
2 2 2 2
1 2 3 3
```

can be created like this (among other ways...):

```python
m = np.zeros((3,4), dtype=int)
m[0::2,0] = 1
m[:, 1] = 2
m[1] = 2
m[0::2,2:] = 3
```

> This exercise is meant to make you manipulate slices of arrays.
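If `numpy.full` is new to you, the following sketch shows how it can be combined with slice assignment. The array `g` and its values are made up for illustration and are unrelated to the target array of the exercise.

```python
import numpy as np

g = np.full((3, 5), 7)   # 3x5 array where every element is 7
g[:, ::2] = 0            # every other column (0, 2, 4) is set to 0
g[1, 1:4] = 9            # columns 1 to 3 of the middle row are set to 9
print(g)
```

which displays

```
[[0 7 0 7 0]
 [0 9 9 9 0]
 [0 7 0 7 0]]
```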

In [None]:
# M = ...
#
# (<10 lines of code)
# YOUR CODE HERE
raise NotImplementedError()
print(M)

Run the next cell to see if your implementation is correct:

In [None]:
assert M.shape == (11, 11)
assert np.array_equal(M, np.array(
    [[1, 3, 1, 3, 1, 3, 1, 3, 1, 3, 1],
     [2, 3, 2, 3, 2, 3, 2, 4, 4, 4, 2],
     [1, 3, 1, 3, 1, 3, 1, 4, 4, 4, 1],
     [2, 3, 2, 3, 2, 3, 2, 3, 2, 3, 2],
     [1, 3, 1, 3, 1, 3, 1, 3, 1, 3, 1],
     [2, 3, 2, 3, 2, 3, 2, 3, 2, 3, 2],
     [1, 3, 1, 3, 1, 3, 1, 3, 1, 3, 1],
     [2, 3, 2, 3, 2, 3, 2, 3, 2, 3, 2],
     [1, 3, 1, 3, 1, 3, 1, 3, 4, 4, 4],
     [2, 3, 2, 3, 2, 3, 2, 3, 4, 4, 4],
     [1, 3, 1, 3, 1, 3, 1, 3, 4, 4, 4]])), "The two arrays should be the same"

It can be useful to change the size of an array without changing its elements. We can do that using function `numpy.reshape()` (see the [documentation](https://numpy.org/doc/stable/reference/generated/numpy.reshape.html)) or directly on an array with `the_array.reshape()` (see the [documentation](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.reshape.html)).

```python
x = np.array([[1,2,3],[4,3,2]]) # x is a 2x3 array
y = x.reshape((1,6)) # y is a row 1x6
print(x)
print(y)
```

displays

```
[[1 2 3]
 [4 3 2]]
[[1 2 3 4 3 2]]
```

Note that `reshape()` **does not make a copy when it can avoid one and instead returns a view of the same data**, which means that modifying one variable will also affect the other:

```python
y[0,1] = 9
```

leads to `x` being modified:

```
[[1 9 3]
 [4 3 2]]
[[1 9 3 4 3 2]]
```

**Both arrays `x` and `y` are changed, because the underlying storage is the same**.
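If you need an independent array, one option is to call `.copy()` on the result. The sketch below uses `np.shares_memory` to check whether two arrays use the same storage. The names `view` and `independent` are only for illustration.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 3, 2]])
view = x.reshape((1, 6))                # shares its storage with x
independent = x.reshape((1, 6)).copy()  # owns its own storage

print(np.shares_memory(x, view))         # True
print(np.shares_memory(x, independent))  # False

independent[0, 1] = 9  # does not affect x
print(x[0, 1])         # 2
```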

In [None]:
x = np.array([[1,2,3],[4,3,2]])
y = x.reshape((1,6))
print(x)
print(y)
y[0,1] = 9
print(x)
print(y)

Use function `numpy.arange()`, which creates a sequence of integers (see [its documentation](https://numpy.org/doc/stable/reference/generated/numpy.arange.html)), and then call `reshape()` to create an array which should be equal to:

```
[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]]
```
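As a reminder of how `numpy.arange()` behaves on its own, here is a short sketch with values unrelated to the exercise. Like Python's `range`, the stop value is excluded. The variable names are only for illustration.

```python
import numpy as np

first_five = np.arange(5)         # start defaults to 0
from_two = np.arange(2, 6)        # explicit start and stop
with_step = np.arange(1, 10, 3)   # start, stop and step

print(first_five)  # [0 1 2 3 4]
print(from_two)    # [2 3 4 5]
print(with_step)   # [1 4 7]
```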

In [None]:
# Create an array x using arange and reshape
# which should be equal to the values above.
#
# x = ...
#
# (<3 lines of code)
# YOUR CODE HERE
raise NotImplementedError()
print(x)

Run the next cell to see if your implementation is correct:

In [None]:
assert x.shape == (4,5)
assert np.array_equal(x, np.array(
    [[ 0,  1,  2,  3,  4],
     [ 5,  6,  7,  8,  9],
     [10, 11, 12, 13, 14],
     [15, 16, 17, 18, 19]])), "The two arrays should be the same"

## Operations on arrays

In NumPy, we have **element-wise** operations which should be performed on arrays of the same shape: `+`, `-`, `/`, `*`. For instance, if `A` and `B` are of shape `(m, n)`, then `C = A * B` is the element-wise multiplication:

$$C_{i, j} = A_{i, j} \times B_{i, j}.$$

If we want to do a matrix-matrix multiplication on `A` of shape `(m, n)` and `B` of shape `(n, p)`, we use `C = A @ B` where

$$C_{i, j} = \sum_{k=1}^n A_{i, k}  \times B_{k, j}$$
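The difference between the two products can be seen on a small example. The matrices `P` and `Q` below are made up for illustration.

```python
import numpy as np

P = np.array([[1, 2], [3, 4]])
Q = np.array([[5, 6], [7, 8]])

elementwise = P * Q  # each entry is P[i, j] * Q[i, j]
matmul = P @ Q       # each entry is the sum over k of P[i, k] * Q[k, j]

print(elementwise)
print(matmul)
```

which displays

```
[[ 5 12]
 [21 32]]
[[19 22]
 [43 50]]
```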

Create an array `A` equal to

```
[[ 4.  3.  6. 12.]
 [ 9.  0.  6.  3.]
 [ 1.  2.  3.  8.]]
```

and an array `B` equal to

```
[[2. 2. 4. 6.]
 [3. 9. 6. 1.]
 [2. 2. 2. 4.]]
```

and compute the following element-wise operations: "A plus B", "A minus B", "A times B", and "A divided by B".

In [None]:
# A = ...
# B = ...
# A_plus_B = ...
# A_minus_B = ...
# A_times_B = ...
# A_div_B = ...
#
# (~6 lines of code)
# YOUR CODE HERE
raise NotImplementedError()

Run the next cell to see if your implementation is correct:

In [None]:
assert A.shape == (3,4), "The size of `A` should be (3,4)"
assert B.shape == (3,4), "The size of `B` should be (3,4)"
assert np.array_equal(A_plus_B, np.array([[ 6, 5, 10, 18], [12, 9, 12, 4], [ 3, 4, 5, 12]]))
assert np.array_equal(A_minus_B, np.array([[ 2, 1, 2, 6], [ 6, -9, 0, 2], [-1, 0, 1, 4]]))
assert np.array_equal(A_times_B, np.array([[ 8, 6, 24, 72], [27, 0, 36, 3], [ 2, 4, 6, 32]]))
assert np.array_equal(A_div_B, np.array([[2, 1.5, 1.5, 2], [3, 0, 1, 3], [0.5, 1, 1.5, 2]]))

In this section, you will compare an implementation of a "naive" matrix-matrix multiplication with `for` loops with the matrix-matrix multiplication (operator `@`) of NumPy.

If we want to do a matrix-matrix multiplication of `A` of shape `(m, n)` and `B` of shape `(n, p)`, we use `C = A @ B` where

$$C_{i, j} = \sum_{k=1}^n A_{i, k}  \times B_{k, j}$$

In the cell below, implement function `matrix_multiplicaton_3_for_loops` which should return a matrix equal to the matrix-matrix product of the two input matrices, where you implement the product by yourself. Don't do `return A @ B`...

In [None]:
def matrix_multiplicaton_3_for_loops(A, B):
    """Performs matrix-matrix multiplication AB
    
    Args:
      A: input NumPy array of shape (m, n)
      B: input NumPy array of shape (n, p)
      
    Returns:
      NumPy array of shape (m, p) equal to AB
      
    """
    assert A.shape[1] == B.shape[0], "The number of columns of A should be equal to the number of rows of B"
    C = np.zeros((A.shape[0], B.shape[1]))
    # YOUR CODE HERE
    raise NotImplementedError()
    return C

Run the cell below to check whether your implementation is correct:

In [None]:
assert_A = np.array([[1, 2, 3, 2], [5, 4, 3, 2], [6, 3, 7, 5]])
assert_B = np.array([[3, 2], [9, 8], [7, 6], [3, 4]])
assert np.array_equal(matrix_multiplicaton_3_for_loops(assert_A, assert_B), assert_A @ assert_B)

In the cell below, create two arrays `A` and `B` of shape `(m, n)` and `(n, p)` respectively filled with random values, and compare the time it takes between your function `matrix_multiplicaton_3_for_loops` and the NumPy matrix-matrix operator `@`. Use for instance `m=500`, `n=200`, and `p=300`.

You can measure a duration with for instance:

```python
tic = time.perf_counter()
# Do something
toc = time.perf_counter()
print(f"Took {toc-tic:.4f} seconds")
```
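As an illustration of this timing pattern on a different task, the sketch below compares a Python `for` loop with `ndarray.sum()` on the same data. The array size and variable names are arbitrary choices, and the measured durations will vary from one machine to another.

```python
import time
import numpy as np

values = np.arange(100_000, dtype=float)

# Sum with an explicit Python loop
tic = time.perf_counter()
total_loop = 0.0
for v in values:
    total_loop += v
toc = time.perf_counter()
loop_seconds = toc - tic

# Sum with NumPy
tic = time.perf_counter()
total_numpy = values.sum()
toc = time.perf_counter()
numpy_seconds = toc - tic

print(f"Loop took  {loop_seconds:.4f} seconds")
print(f"NumPy took {numpy_seconds:.4f} seconds")
```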

Create your matrices with `numpy.random.rand()`, see its [documentation](https://numpy.org/doc/stable/reference/random/generated/numpy.random.rand.html).

In [None]:
# Compare matrix_multiplicaton_3_for_loops and @
#
# m = ...
# n = ...
# p = ...
# C_3loops = ...
# C_numpy = ...
#
# (<20 lines of code)
# YOUR CODE HERE
raise NotImplementedError()

Run the cell below to check whether your implementation is correct:

In [None]:
assert np.allclose(C_3loops, C_numpy, atol=1e-06), "The two matrices should be equal"

## Computing the mean across different axes

Reduction operations (such as mean, variance, standard deviation, etc.) are performed across one or several axes.

For instance, you may want to compute the mean of the rows of an array, or the mean image of a data set. Note the following three situations given this input array:

```python
x = np.array(
   [[1, 2, 3, 4, 2],
    [1, 6, 5, 2, 2],
    [1, 4, 9, 2, 2],
    [1, 8, 7, 6, 2]]
)
```

Calling `x.mean()` computes the mean of all the coefficients of the array:

```
3.5
```

Calling `x.mean(axis=0, keepdims=True)` computes the mean of each column:

```
[[1.  5.  6.  3.5 2. ]]
```

By default, `keepdims` is `False`, and the call would return `[1.  5.  6.  3.5 2. ]` (note that this is a 1D array instead of a 2D one).

Calling `x.mean(axis=1, keepdims=True)` computes the mean of each row:

```
[[2.4]
 [3.2]
 [3.6]
 [4.8]]
```
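One reason to use `keepdims=True` is that the result can be combined directly with the original array. The sketch below subtracts the mean of each row from that row. With `keepdims=False`, the row means would have shape `(4,)`, which cannot be subtracted from an array of shape `(4, 5)`. The variable names are only for illustration.

```python
import numpy as np

x = np.array(
   [[1, 2, 3, 4, 2],
    [1, 6, 5, 2, 2],
    [1, 4, 9, 2, 2],
    [1, 8, 7, 6, 2]]
)
row_means_2d = x.mean(axis=1, keepdims=True)
row_means_1d = x.mean(axis=1)
print(row_means_2d.shape)  # (4, 1)
print(row_means_1d.shape)  # (4,)

# Shapes (4, 5) and (4, 1) are compatible: each row mean is
# subtracted from every element of its row
centered = x - row_means_2d
print(centered.mean(axis=1))  # all values are (numerically) zero
```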

In [None]:
x = np.array(
   [[1, 2, 3, 4, 2],
    [1, 6, 5, 2, 2],
    [1, 4, 9, 2, 2],
    [1, 8, 7, 6, 2]]
)
print(x.mean())
print(x.mean(axis=0, keepdims=True))
print(x.mean(axis=1, keepdims=True))

A teacher has 6 students and gave them 4 exams. Here are the grades:

| Name    | Exam 1 | Exam 2 | Exam 3 | Exam 4 |
|---------|--------|--------|--------|--------|
| Joe     |      6 |    5.5 |      4 |      5 |
| William |    5.5 |    5.5 |      4 |      4 |
| Jack    |    4.5 |      3 |    3.5 |      6 |
| Averell |      6 |      6 |    5.5 |      4 |
| Luke    |    4.5 |      5 |    5.5 |      6 |
| Billy   |      3 |    2.5 |      4 |    4.5 |

After creating an array `grades` of shape `(6,4)`, use function `mean(axis=...)` to compute the mean for each exam (stored in `exam_means`) and the mean for each student (stored in `student_means`).

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

Run the cell below to check whether your implementation is correct:

In [None]:
assert grades[0,0] == 6, "The first grade of Joe is 6"
assert grades[1,-1] == 4, "The last grade of William is 4"
assert np.allclose(exam_means.squeeze(), np.array([4.91666667, 4.58333333, 4.41666667, 4.91666667]), atol=0.5), "The exam means are wrong"
assert np.allclose(student_means.squeeze(), np.array([5.125,4.75,4.25,5.375,5.25,3.5]), atol=0.6), "The student means are wrong"

## Plotting with Matplotlib

In this section, we present the basic functions to plot a graph with Matplotlib.

First, let's import Matplotlib (the second line is specific to Jupyter and not to Python).

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

In the next cell, by using `np.random.normal()` (see its [documentation](https://numpy.org/doc/stable/reference/random/generated/numpy.random.normal.html)), create an array `X1` of shape `(100, 2)` representing 100 2D samples following a Gaussian distribution with mean `(0, 0)` and standard deviation `(1, 1)`, and an array `X2` representing 100 2D samples following a Gaussian distribution with mean `(8, 2)` and standard deviation `(2, 0.5)`.

Then, using function `matplotlib.pyplot.plot` (see its [documentation](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.plot.html)), plot these two variables in the 2D plane. You may also want to use `matplotlib.pyplot.legend()` and play with different [markers](https://matplotlib.org/stable/api/markers_api.html). You should obtain something like the following:

<div style="max-width:40%; margin:auto; padding:10px;">

![Gaussian distributions](pics/gauss.png)

</div>

For instance, the following code generates 100 2d points uniformly in $[0, 1]^2$ and plots them:

```python
X = np.random.rand(100,2)
plt.plot(X[:,0], X[:,1], "ro") # ro is for red circle
plt.show()
```

and displays

<div style="max-width:40%; margin:auto; padding:10px;">

![Uniform distribution](pics/uniform.png)

</div>
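To plot several point sets on the same figure, one possible approach is to call `plt.plot` once per set with a `label`, then call `plt.legend()`. The sketch below does this with two uniform point clouds. The arrays `U1` and `U2`, the labels and the markers are arbitrary choices for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

U1 = np.random.rand(50, 2)        # uniform in [0, 1]^2
U2 = np.random.rand(50, 2) + 1.5  # uniform in [1.5, 2.5]^2

plt.plot(U1[:, 0], U1[:, 1], "ro", label="first set")   # red circles
plt.plot(U2[:, 0], U2[:, 1], "b^", label="second set")  # blue triangles
legend = plt.legend()  # uses the labels given to plt.plot
plt.xlabel("x")
plt.ylabel("y")
plt.show()
```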

In [None]:
# X1 = ...
# X2 = ...
# plt.plot(...)
# ...
# plt.show()
#
# (<15 lines of code)
# YOUR CODE HERE
raise NotImplementedError()

Run the cell below to check whether your implementation is correct:

In [None]:
assert X1.shape == (100, 2), "We expect 100 samples of dim 2"
assert X2.shape == (100, 2), "We expect 100 samples of dim 2"
assert np.allclose(X1.mean(axis=0, keepdims=True), (0, 0), atol=0.5), "The mean of X1 should be around (0,0)"
assert np.allclose(X1.std(axis=0, keepdims=True), (1, 1), atol=0.5), "The stdev of X1 should be around (1,1)"
assert np.allclose(X2.mean(axis=0, keepdims=True), (8, 2), atol=0.5), "The mean of X2 should be around (8,2)"
assert np.allclose(X2.std(axis=0, keepdims=True), (2, 0.5), atol=0.5), "The stdev of X2 should be around (2,0.5)"

Finally, we will plot a curve with Matplotlib. Usually in machine learning, we want to plot the loss of the model being trained as a function of the number of epochs.

Given the following variables `epochs` and `loss`, play with the parameters of `plt.plot` and try to generate the following plot:

<div style="max-width:40%; margin:auto; padding:10px;">

![Epoch and loss](pics/epoch-loss.png)

</div>

You may use `plt.title`, `plt.xlabel`, `plt.ylabel`, etc. If you need to save your plot for a report, you can use:

```python
plt.savefig("plot.pdf", bbox_inches="tight")
```


In [None]:
epochs = [0, 1, 2, 3, 4, 5, 6, 7]
loss = [2.3, 0.8, 0.4, 0.2, 0.1, 0.09, 0.08, 0.05]

# plt.plot(...)
# plt.savefig("plot.png")
# plt.show()
#
# (<10 lines of code)
# YOUR CODE HERE
raise NotImplementedError()

Run the cell below to check whether your implementation is correct:

In [None]:
assert len(epochs) == 8, "This is a dummy test as it is difficult to automatically check a plot"

# Feedback
You have now reached the end of this practical session. Please give us your feedback in the block below:
1. What have you learned?
2. What should be improved?
3. What was unclear?
4. Any comment is welcome!

**Do not forget to save your notebook before submitting it, otherwise you will submit the last autosave checkpoint.**

YOUR ANSWER HERE