<h1 align="center"><b>AI Lab: Computer Vision and NLP</b></h1>
<h3 align="center">Lectures 03-04: <code>numpy</code> cheatsheet</h3>

---

The data used throughout this course can, most of the time, be seen as a collection of **arrays**: images, audio files, biological signals, and so on.

No matter the data, it must first be translated into an array in order to be used. The `numpy` package (short for **Num**erical **Py**thon) lets us store and operate on such data. There are two ways to make a `numpy` array: we either create it from scratch or build one from a Python list. Here is an example:

In [1]:
import numpy as np

In [2]:
py_list = [1, 2, 3, 4]
np_list = np.array([3, 4, 5])

print(f"Python list: {py_list}")
print(f"Numpy list: {np_list}")

Python list: [1, 2, 3, 4]
Numpy list: [3 4 5]
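
The cell above builds an array from a Python list. As a small sketch of the other route, creating an array from scratch, the following uses a few standard `numpy` constructors (the variable names are only illustrative):

```python
import numpy as np

# Arrays created from scratch, without going through a Python list
zeros = np.zeros(4)            # four floating-point zeros
ones = np.ones((2, 3))         # 2x3 matrix filled with ones
sevens = np.full((2, 2), 7)    # 2x2 matrix filled with the value 7
steps = np.arange(0, 10, 2)    # evenly spaced integers: 0, 2, 4, 6, 8

print(zeros)
print(ones)
print(sevens)
print(steps)
```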


We can also use `C`/`C++` types for the values stored inside an array. When `numpy` creates an array, it picks a single type that can hold all of its values. For instance, for a list such as $[1, \; 2, \; 3, \; 4]$ the result is the platform's default integer type (typically `int64`), while for a list such as $[3.14, \; 9.71, \; 4]$ it is `float64`. A different type can be requested explicitly through the `dtype` argument.

In [2]:
int_list = np.array([1, 2, 3, 4], dtype="int8")
float_list = np.array([3.14, 9.71, 4], dtype="float")
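
To see which type `numpy` actually picked, we can inspect the `dtype` attribute of an array. A minimal sketch follows; the default integer type is platform-dependent (commonly `int64`), so the first printed value may differ on your machine:

```python
import numpy as np

inferred_int = np.array([1, 2, 3, 4])               # all integers -> default integer type
inferred_float = np.array([3.14, 9.71, 4])          # one float is enough -> float64
forced_int8 = np.array([1, 2, 3, 4], dtype="int8")  # explicit 8-bit integers

print(inferred_int.dtype)
print(inferred_float.dtype)
print(forced_int8.dtype)
```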

[recover stuff]

With `np.random.randint()` we can create a matrix of a given shape filled with random integers. The syntax is the following:

> ```python
> np.random.randint(x, y, (n, m))
> ```
> where:
>  - $x$ stands for the **lowest value** that can be picked (included);
>  - $y$ stands for the **upper bound** of the values that can be picked (excluded, so the largest possible value is $y - 1$);
>  - $n$ stands for the number of **rows**;
>  - $m$ stands for the number of **columns**.

In [19]:
a_list = np.random.randint(0, 10, (3, 3))

print(a_list)

[[1 1 9]
 [2 1 9]
 [6 5 1]]
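
As a quick check of the bounds, the sketch below draws many values and looks at the smallest and largest one. The seed value and the sample size are arbitrary choices made only for this illustration:

```python
import numpy as np

np.random.seed(42)  # arbitrary seed, only to make the sketch repeatable

samples = np.random.randint(0, 10, (1000, 3))

print(samples.shape)  # (1000, 3)
print(samples.min())  # never below 0
print(samples.max())  # never reaches 10
```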


We can access a row or a column of the matrix via the following syntax:
> ```python
> a_row = matrix[x, :]
> a_column = matrix[:, x]
> ```
> where $x$ stands for the index of the row or column that we want to access. The `:` is a slice that selects every element along that axis.

In [23]:
row_col = a_list[1,:]

print(row_col)

[2 1 9]
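
The cell above only extracts a row. A small sketch on a fixed matrix (chosen here just for illustration) shows both a row and a column; note that both come back as one-dimensional arrays:

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

second_row = matrix[1, :]   # row with index 1
second_col = matrix[:, 1]   # column with index 1

print(second_row)  # [4 5 6]
print(second_col)  # [2 5 8]
```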


We can also access a subarray of the initial array by using the syntax
> ```python
> subarray = matrix[a:b, c:d, ...]
> ```
> where:
>  - each pair `a:b` selects the indices from `a` (included) up to `b` (excluded) along the corresponding axis: the first pair slices the rows, the second one the columns, and so on.

The process works as follows: suppose that we want to run the command `subarray = matrix[1:3, 0:2]` on the following matrix. The slice `1:3` keeps the rows with index 1 and 2, and the slice `0:2` keeps the columns with index 0 and 1:

$$\begin{bmatrix}1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9\end{bmatrix} \quad \rightarrow_{\texttt{1:3} \text{ (on rows)}} \quad \begin{bmatrix} \; & \; & \; \\ 4 & 5 & 6 \\ 7 & 8 & 9\end{bmatrix} \quad \rightarrow_{\texttt{0:2} \text{ (on cols)}} \quad \begin{bmatrix} \; & \; & \; \\ 4 & 5 & \; \\ 7 & 8 & \; \end{bmatrix} \quad \rightarrow_{\text{resize}} \quad \begin{bmatrix} 4 & 5 \\ 7 & 8 \end{bmatrix}$$

In [25]:
subarray = a_list[1:3, 0:2]

print(subarray)

[[2 1]
 [6 5]]
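
The slicing rule can be checked on the $3 \times 3$ matrix holding the numbers from 1 to 9. This is only a sketch; the second slice additionally shows that an omitted bound defaults to the start or the end of the axis:

```python
import numpy as np

matrix = np.arange(1, 10).reshape(3, 3)   # [[1 2 3] [4 5 6] [7 8 9]]

subarray = matrix[1:3, 0:2]   # rows 1 and 2, columns 0 and 1
print(subarray)
# [[4 5]
#  [7 8]]

# Omitted bounds default to the start or the end of the axis
top_right = matrix[:2, 1:]    # rows 0 and 1, columns 1 and 2
print(top_right)
# [[2 3]
#  [5 6]]
```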


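We can turn a one-dimensional array into a column vector by indexing it with `np.newaxis`, which adds a new axis of length 1:

In [28]: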
randlist = np.arange(1,10)
print(randlist)

col = randlist[:, np.newaxis]
print(col)

[1 2 3 4 5 6 7 8 9]
[[1]
 [2]
 [3]
 [4]
 [5]
 [6]
 [7]
 [8]
 [9]]
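
To make the effect of `np.newaxis` more explicit, the following sketch compares the shapes before and after. The `reshape(-1, 1)` call is shown as an equivalent alternative; it is not used elsewhere in this cheatsheet:

```python
import numpy as np

vector = np.arange(1, 10)

column = vector[:, np.newaxis]        # shape (9, 1)
row = vector[np.newaxis, :]           # shape (1, 9)
same_column = vector.reshape(-1, 1)   # equivalent: -1 lets numpy infer the length

print(vector.shape, column.shape, row.shape)  # (9,) (9, 1) (1, 9)
print(np.array_equal(column, same_column))    # True
```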


We can also concatenate several arrays into a single one with the `np.concatenate()` function:
> ```python
> np.concatenate([array1, array2, ...])
> ```
> where:
>  - `array1`, `array2`, etc. are the arrays to concatenate. We can concatenate as many arrays as we want.

In [33]:
array1 = [1, 2, 3]
array2 = [4, 5, 6]

concatenated_array = np.concatenate([array1, array2])

print(concatenated_array)

[1 2 3 4 5 6]


Of course, it works for multi-dimensional arrays as well. By default the arrays are joined along the first axis, that is, row after row.

In [38]:
array3 = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
array4 = np.random.randint(1, 10, (2, 4))

another_concatenated_array = np.concatenate([array3, array4])

print(another_concatenated_array)

[[1 2 3 4]
 [5 6 7 8]
 [7 6 2 9]
 [8 6 1 3]]
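
`np.concatenate()` also accepts an `axis` argument that selects the direction of the join. A minimal sketch with two small matrices (hypothetical data chosen for illustration):

```python
import numpy as np

left = np.array([[1, 2], [3, 4]])
right = np.array([[5, 6], [7, 8]])

by_rows = np.concatenate([left, right])           # axis=0 is the default
by_cols = np.concatenate([left, right], axis=1)   # join side by side

print(by_rows)
# [[1 2]
#  [3 4]
#  [5 6]
#  [7 8]]
print(by_cols)
# [[1 2 5 6]
#  [3 4 7 8]]
```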


We can also stack arrays vertically (one on top of the other) or horizontally (side by side), with the `np.vstack()` and `np.hstack()` functions respectively:

In [41]:
array5 = np.array([[1, 2, 3], [4, 5, 6]])
to_stack = np.array([[7], [8]])

h_stacked = np.hstack([array5, to_stack])

print(h_stacked)

[[1 2 3 7]
 [4 5 6 8]]
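
The cell above covers `np.hstack()`. For completeness, here is a comparable sketch for `np.vstack()`, which appends a new row at the bottom (the arrays are illustrative):

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6]])
new_row = np.array([7, 8, 9])

v_stacked = np.vstack([grid, new_row])   # the 1-D row is treated as shape (1, 3)
print(v_stacked)
# [[1 2 3]
#  [4 5 6]
#  [7 8 9]]
```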


We can also stack arrays along a third axis (in depth) by using the `np.dstack()` function:

In [43]:
d_stacked = np.dstack([array5, [[7, 8, 9], [10, 11, 12]]])

print(d_stacked)

[[[ 1  7]
  [ 2  8]
  [ 3  9]]

 [[ 4 10]
  [ 5 11]
  [ 6 12]]]
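
Looking at the shapes may help to read this output. In the sketch below, two $2 \times 3$ matrices become one $2 \times 3 \times 2$ array, and each original matrix can be recovered by indexing the last axis:

```python
import numpy as np

first = np.array([[1, 2, 3], [4, 5, 6]])
second = np.array([[7, 8, 9], [10, 11, 12]])

d_stacked = np.dstack([first, second])

print(first.shape)         # (2, 3)
print(d_stacked.shape)     # (2, 3, 2)
print(d_stacked[0, 0])     # [1 7]
print(d_stacked[:, :, 1])  # same values as `second`
```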


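Arrays can also be split, thanks to the `np.split()` function. Its second argument lists the indices at which the array is cut (along the first axis by default), and these indices are normally given in increasing order. In the cell below they are not, which is why the middle piece, `h_stacked[2:1]`, comes out empty: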

In [51]:
a1, a2, a3 = np.split(h_stacked, [2, 1])

print(h_stacked)
print(a1, a2, a3)

[[1 2 3 7]
 [4 5 6 8]]
[[1 2 3 7]
 [4 5 6 8]] [] [[4 5 6 8]]
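
For comparison, here is a sketch of `np.split()` with the cut points in increasing order, first on a one-dimensional array and then along the columns of a matrix through the `axis` argument (the data are illustrative):

```python
import numpy as np

data = np.arange(1, 9)   # [1 2 3 4 5 6 7 8]

# Cut before index 3 and before index 5 -> three pieces
head, middle, tail = np.split(data, [3, 5])
print(head, middle, tail)   # [1 2 3] [4 5] [6 7 8]

# On a 2-D array, axis=1 splits along the columns
table = np.array([[1, 2, 3, 7], [4, 5, 6, 8]])
left, right = np.split(table, [3], axis=1)
print(left)
# [[1 2 3]
#  [4 5 6]]
print(right)
# [[7]
#  [8]]
```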


Numpy lets us apply an operation to a whole array at once through *universal functions* (`ufuncs`), which perform the element-by-element loop in compiled code. We should avoid explicit Python loops over array elements as much as possible.

Numpy can also generate random numbers. Calling `np.random.seed(x)`, where $x$ is a fixed seed, makes the functions that generate random numbers produce the **same sequence of numbers** every time the code is run.
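
The effect of the seed can be verified with a short sketch: resetting the same seed before each draw yields identical numbers. The seed value 123 is an arbitrary choice for this illustration:

```python
import numpy as np

np.random.seed(123)                       # arbitrary seed value
first_draw = np.random.randint(0, 100, 5)

np.random.seed(123)                       # reset to the same seed
second_draw = np.random.randint(0, 100, 5)

print(np.array_equal(first_draw, second_draw))  # True
```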

Let's try to run the following function, which computes the reciprocal of each element with an explicit Python loop:

In [52]:
np.random.seed(0)

def compute_reciprocals(values):
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output

values = np.random.randint(1, 10, size=500000)
out = compute_reciprocals(values)
print(out)

[0.16666667 1.         0.25       0.25       0.125     ]


We now run the function again, this time measuring how long it takes:

In [57]:
import time

start_time = time.time()
out = compute_reciprocals(values)
end_time = time.time() - start_time

print(out)
print(f"Total time needed (without ufuncs): {end_time}")

[0.08333333 0.02325581 0.04       ... 0.01388889 0.06666667 0.01639344]
Total time needed (without ufuncs): 16.67507004737854


We can now try to use `ufuncs` in order to compare how much faster we can go with `numpy`:

In [56]:
start_time = time.time()
out = 1.0 / values
end_time = time.time() - start_time

print(out)
print(f"Total time needed (with ufuncs): {end_time}")

[0.08333333 0.02325581 0.04       ... 0.01388889 0.06666667 0.01639344]
Total time needed (with ufuncs): 0.07526397705078125
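
A single `time.time()` measurement can be noisy. As an alternative sketch, the standard-library `timeit` module can repeat each measurement and keep the best run. The array size and value range below are assumptions chosen to keep the example quick, so the timings will not match the ones printed above:

```python
import timeit

import numpy as np

def compute_reciprocals(values):
    # Same explicit loop as in the cell above
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output

test_values = np.random.randint(1, 100, size=100_000)  # hypothetical test data

# Best of three runs for each approach
loop_time = min(timeit.repeat(lambda: compute_reciprocals(test_values), number=1, repeat=3))
ufunc_time = min(timeit.repeat(lambda: 1.0 / test_values, number=1, repeat=3))

print(f"Loop:  {loop_time:.4f} s")
print(f"ufunc: {ufunc_time:.4f} s")

# Both approaches compute the same result
same_result = np.allclose(compute_reciprocals(test_values), 1.0 / test_values)
print(same_result)  # True
```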


There is a notable difference: about 16.7 s against about 0.075 s, a speedup of more than 200 times. This is why we should prefer `numpy`'s vectorized operations over explicit Python loops whenever possible. Here are some other examples of `ufuncs`:

In [59]:
addition = 1.0 + out
# Short version of np.add(1.0, out)

print(addition)

[1.08333333 1.02325581 1.04       ... 1.01388889 1.06666667 1.01639344]
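
The other arithmetic operators work the same way. The sketch below pairs a few operators with the `ufunc` each one is shorthand for, on a small illustrative array:

```python
import numpy as np

x = np.array([1, 2, 3, 4])

print(x + 10)          # [11 12 13 14]   same as np.add(x, 10)
print(x * 2)           # [2 4 6 8]       same as np.multiply(x, 2)
print(x ** 2)          # [ 1  4  9 16]   same as np.power(x, 2)
print(x % 2)           # [1 0 1 0]       same as np.mod(x, 2)
print(np.abs(x - 3))   # [2 1 0 1]
```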
