# Worksheet 9A: NumPy

NumPy is one of the most fundamental external packages used by Python users. Its array structure is the core of the package, offering functionality beyond what's possible with Python's built-in data structures in a concise & efficient way.

Let's start by installing the library using `pip` (here the leading `!` tells Jupyter to execute the line as a shell command from within the notebook):

In [None]:
!pip install numpy

If the installation was successful you should be able to import the package:

In [None]:
import numpy as np

---
## Q1: Creation

Arrays are one of the most important data structures in NumPy. Arrays can be initialised through a number of [creation routines](https://numpy.org/doc/stable/reference/routines.array-creation.html). Let's look at a few of them.

Consider the following `list`:

In [None]:
l = [1, 2, 3, 4, 5, 6]

We can create the equivalent NumPy array using the original data as follows:

In [None]:
a = np.array(l)
a

Notice that its type isn't a `list`:

In [None]:
type(a)

### Q1 a

Similarly to how we could create `l` using `range` as follows:

In [None]:
list(range(1, 7))

NumPy also provides an analogous function:

In [None]:
np.arange(1, 7)

Note how we don't need to cast the result to `list`, since an array is immediately returned by `arange`. In fact, if we did this, we would cast the result to a Python `list`.
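
As a small illustration of that last point, the sketch below converts an array back into a list in two ways. The variable names are our own & aren't used elsewhere in this worksheet. It assumes you only care about the element types: `list(...)` keeps each item as a NumPy scalar, whereas the array's `tolist` method converts the items to plain Python numbers.

```python
import numpy as np

arr = np.arange(1, 7)

as_list = list(arr)       # a Python list whose items are NumPy integers
as_plain = arr.tolist()   # a Python list whose items are plain Python ints

print(type(as_list), type(as_list[0]))
print(type(as_plain), type(as_plain[0]))

# Both lists hold the same values, even though the item types differ.
print(as_list == as_plain == [1, 2, 3, 4, 5, 6])
```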

This package also provides the capability of initialising arrays with particular values. The following examples initialise arrays of `3` elements with a specific value:

In [None]:
np.ones(3)

In [None]:
np.zeros(3)

In [None]:
np.full(3, 2)

---
## Shape

Let's talk about an array's [`shape`](https://numpy.org/doc/stable/reference/generated/numpy.shape.html), which is an attribute that can be accessed from an array instance:

In [None]:
a.shape

This gives an output similar to what we would get when checking the `len` of a `list`:

In [None]:
len(l)

The `shape` isn't a single integer though, but a tuple containing one integer per dimension (here just one). The reason for this is that NumPy supports multiple dimensions. Let's construct a $2 \times 3$ array:

In [None]:
m = np.array([[1, 2, 3], [4, 5, 6]])
m

Now the `shape` gives us the dimensions in the form of `(row, column)`, but the length of the list gives us the number of nested lists:

In [None]:
m.shape

The length of the equivalent nested list, by contrast, is just the number of rows:

In [None]:
print(m.tolist())
print(len(m.tolist()))
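
To make the difference between `shape` & `len` concrete, here is a minimal sketch on a separate array (`grid` is a name we introduce only for this example). It also prints the `ndim` & `size` attributes, which aren't covered elsewhere in this worksheet but are closely related to `shape`.

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6]])  # 2 rows, 3 columns

print(grid.shape)  # (rows, columns)
print(grid.ndim)   # number of dimensions, i.e. len(grid.shape)
print(grid.size)   # total number of elements, i.e. rows * columns
print(len(grid))   # length of the first dimension only, i.e. grid.shape[0]
```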

### Q1: Reshaping

We could [`transpose`](https://numpy.org/doc/stable/reference/generated/numpy.transpose.html) a multi-dimensional array, which swaps its rows & columns:

In [None]:
m_T = m.transpose()
m_T

In [None]:
m_T.shape
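
As a side note, arrays also expose a `T` attribute as a shorthand for `transpose()`. The sketch below, which uses an illustrative array of our own called `grid`, checks that both give the same result & that the element at position `(i, j)` ends up at `(j, i)`.

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6]])

flipped = grid.T  # shorthand attribute for grid.transpose()

print(flipped.shape)
print(np.array_equal(flipped, grid.transpose()))
print(grid[0, 2] == flipped[2, 0])  # element (i, j) moves to (j, i)
```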

A more flexible method is [`reshape`](https://numpy.org/doc/stable/reference/generated/numpy.reshape.html), which allows us to specify a custom shape:

In [None]:
m.reshape((6, 1))

The specified shape must hold exactly as many elements as the array has; otherwise, an error is raised:

In [None]:
m.reshape((11, 2))

Note how `m` is still the $2 \times 3$ array we originally had, since the `reshape` method doesn't reshape the array in-place but returns a reshaped array instead:

In [None]:
m

One of the `reshape` arguments can be `-1` when we want NumPy to infer the number of elements along that dimension:

In [None]:
m.reshape(3, -1)

In [None]:
a.reshape((-1, 1))
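
The following sketch spells out how the `-1` is resolved, using a 12-element array of our own (`values`) rather than the arrays above. It assumes nothing beyond `reshape` itself: the inferred dimension is simply the total number of elements divided by the dimensions we did specify, so the division has to be exact.

```python
import numpy as np

values = np.arange(12)            # 12 elements, shape (12,)

wide = values.reshape((2, -1))    # NumPy infers 12 / 2 = 6 columns
tall = values.reshape((-1, 3))    # NumPy infers 12 / 3 = 4 rows

print(wide.shape)
print(tall.shape)
print(values.shape)               # the original array keeps its shape

try:
    values.reshape((5, -1))       # 12 is not divisible by 5
except ValueError as error:
    print("ValueError:", error)
```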

#### Q1 a

Write code to re-create `m` from the contents of `a` using `reshape`:

In [None]:
# answer:


#### Q1 b

Write code to re-create `a` from the contents of `m` using `reshape`:

In [None]:
# answer:


### Q2

We can create multi-dimensional arrays of a custom shape without needing to use `reshape`.

#### Q2 a

Create an array `a1` which is a $3 \times 3$ array using [`ones`](https://numpy.org/doc/stable/reference/generated/numpy.ones.html) (without using `reshape`).

In [None]:
# answer:


#### Q2 b

Use the [`zeros_like`](https://numpy.org/doc/stable/reference/generated/numpy.zeros_like.html) function to create an array `a2` which has the same shape as `a1` (without using `reshape`).

In [None]:
# answer:


---
## Arithmetic & Conditional operations

One of the most powerful features of NumPy arrays is the ease with which we can perform arithmetic & conditional operations.

In [None]:
x = np.arange(1, 4)
x

In [None]:
x - 1

Note that the original contents of `x` aren't modified:

In [None]:
x

For these changes to take effect we have to assign the result to the original variable:

In [None]:
x -= 1
x
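
There is a subtle difference between writing `x = x - 1` & `x -= 1`, which the sketch below tries to illustrate. The names `p`, `q`, & their aliases are hypothetical & only used here. The first form builds a new array & rebinds the name, while the augmented assignment modifies the existing array, so any other name referring to the same array sees the change.

```python
import numpy as np

p = np.arange(1, 4)
p_alias = p          # a second name for the same array object
p = p - 1            # builds a new array and rebinds the name p
print(p, p_alias)    # p_alias still holds the original values

q = np.arange(1, 4)
q_alias = q
q -= 1               # modifies the existing array in place
print(q, q_alias)    # q_alias sees the change, since it is the same object
```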

Conditional operators are also supported. For example:

In [None]:
x > 2

In [None]:
x == 2

In [None]:
x % 2 == 1  # True where the element is odd

Apart from doing these operations against a single value, we can also perform element-wise operations between 2 arrays:

In [None]:
y = np.arange(4, 7)
y

In [None]:
x + y

In [None]:
x - y

In [None]:
x * y

In addition to making the resulting code more concise, the code is also much more efficient thanks to NumPy's built-in vectorisation capabilities. Let's compare this to a list comprehension (feel free to compare it against a `for` loop as well).

In [None]:
n = 10000
# named u & v so that the array `a` created earlier is left untouched
u = np.random.random(n)
v = np.random.random(n)
u.shape

In [None]:
%%timeit
[u_i * v_i for u_i, v_i in zip(u, v)]

In [None]:
%%timeit
u * v
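
The `%%timeit` magic only works inside Jupyter. If you want to repeat the comparison in a plain Python script, a rough equivalent using the standard library's `timeit` module might look like the sketch below. The function names are our own, the seed is only there to make the run repeatable, & the timings will differ from machine to machine.

```python
import timeit

import numpy as np

n = 10000
rng = np.random.default_rng(seed=0)  # seeded only to make the run repeatable
u = rng.random(n)
v = rng.random(n)


def with_comprehension():
    # multiply pair by pair in a Python-level loop
    return [u_i * v_i for u_i, v_i in zip(u, v)]


def with_numpy():
    # let NumPy multiply the whole arrays at once
    return u * v


# Both approaches compute the same products.
print(np.allclose(with_comprehension(), with_numpy()))

# Total time in seconds for 100 calls of each; the numbers depend on the machine.
print(timeit.timeit(with_comprehension, number=100))
print(timeit.timeit(with_numpy, number=100))
```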

These operations also work on multi-dimensional arrays:

In [None]:
z = np.array([[2, 11, 5],
              [10, 12, 13]])
z

In [None]:
z * 2

In [None]:
z > 10

In [None]:
z + z

### Q3: Broadcasting 

Let's now try performing an operation between arrays of a different shape:

In [None]:
z - x

Why does this work? Explain what's happening & in what cases this won't work. Feel free to modify the operation between the arrays to understand what's happening.

*answer:*


### Q4: Methods

There are a number of methods on arrays which perform different computations.

#### Q4 a

The [`min`](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.min.html) method returns the smallest element of an array:

In [None]:
z.min()

But what if we wanted to get the minimum for each column? Provide the code to do this. *Hint: you should be able to do this by specifying an additional parameter to `min`.*

In [None]:
# answer:


#### Q4 b

Use the [`clip`](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.clip.html) method to return an array whose elements are between `0` & `9`.

In [None]:
# answer:


#### Matrix operations

We can perform multiplication between matrices:

In [None]:
z * z

But this is element-wise multiplication & not matrix multiplication. For matrix multiplication we can use `matmul`:

In [None]:
np.matmul(z, z.transpose())

Remember that when performing matrix multiplication between matrices of sizes $n \times m$ & $m \times l$, the result is a matrix of size $n \times l$.
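
The shape rule can be checked directly. In the sketch below, `left` & `right` are illustrative arrays of our own with $n = 2$, $m = 3$, & $l = 4$. It also uses the `@` operator, which isn't introduced elsewhere in this worksheet but performs the same matrix multiplication on arrays.

```python
import numpy as np

left = np.ones((2, 3))    # n x m with n = 2, m = 3
right = np.ones((3, 4))   # m x l with m = 3, l = 4

product = np.matmul(left, right)
print(product.shape)      # n x l

# The @ operator is shorthand for matrix multiplication.
print(np.array_equal(product, left @ right))

try:
    np.matmul(left, left)  # inner dimensions 3 and 2 do not match
except ValueError as error:
    print("ValueError:", error)
```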

In [None]:
m

Let's now compute matrix multiplication between a square matrix & its inverse:

In [None]:
m = np.array([
    [2, 6, 7],
    [2, 6, 5],
    [1, 1, 2],
])

I = np.matmul(m, np.linalg.inv(m))
I

Remember that multiplying a matrix by its inverse gives the identity matrix: $A A^{-1} = I$.

In [None]:
np.identity(3)

If we try this with different values, we might get some rounding errors:

In [None]:
m = np.array([
    [2, 4, 7],
    [3, 6, 9],
    [1, 1, 2],
])

I = np.matmul(m, np.linalg.inv(m))
I

This is a [common issue in floating point implementations](https://stackoverflow.com/q/21895756), but we can round to get the desired output:

In [None]:
np.around(I)

In [None]:
np.array_equal(np.around(I), np.identity(3))
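
Rounding works here because we know the expected values are whole numbers. A more general option, sketched below, is to compare within a small tolerance using `np.allclose` (a function not used elsewhere in this worksheet, called here with its default tolerances). The name `mat` is our own, chosen so as not to overwrite `m`.

```python
import numpy as np

mat = np.array([
    [2, 4, 7],
    [3, 6, 9],
    [1, 1, 2],
])

product = np.matmul(mat, np.linalg.inv(mat))

# Compare within a small tolerance instead of rounding first.
print(np.allclose(product, np.identity(3)))
```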

## Concatenation

### [`append`](https://numpy.org/doc/stable/reference/generated/numpy.append.html)

Similar to `list.append`, we can append items to arrays.

In [None]:
np.append(a, 4)

Notice how the syntax is slightly different to `list`s:

In [None]:
l.append(4)
l

The main reason is that the `append` operation doesn't happen in-place, so the original array wasn't changed by our previous operation:

In [None]:
a

In [None]:
a = np.append(a, 4)  # assign the result to change the array
a

Multiple elements can be added at once by supplying an iterable instead of a single element:

In [None]:
np.append(a, [5, 6])

With `list.append` the list would have been added as a single element in the list, which is why we would use `extend` instead to get the same sequence as above:

In [None]:
l.extend([5, 6])
l

Since NumPy's `append` accepts either a single element or a sequence of elements, the following statements would produce the same sequence:

In [None]:
np.append(a, 5)

In [None]:
np.append(a, [5])

We can also append along a particular dimension for multi-dimensional arrays:

In [None]:
m = np.array([[1, 2, 3], [5, 6, 7]])
m

In [None]:
np.append(m, [[4], [8]], axis=1)

Notice how we are appending a "column" array since we are appending along `axis` `1` of `m`.
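
It's worth seeing what happens when the `axis` argument is left out. In the sketch below (`table` & `extra` are names we made up for this example), omitting `axis` makes `append` flatten both inputs into a one-dimensional array, whereas `axis=1` keeps the two-dimensional layout & adds a column.

```python
import numpy as np

table = np.array([[1, 2, 3], [5, 6, 7]])
extra = [[4], [8]]

flat = np.append(table, extra)            # no axis: everything is flattened
wider = np.append(table, extra, axis=1)   # axis=1: one extra column

print(flat)
print(wider)
```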

### Stacking

While append is generally useful for appending individual elements, stacking is a more versatile functionality which allows us to stack arrays against each other along an axis, to create a larger array. Let's consider the following arrays:

In [None]:
m1 = np.arange(0, 4)
m1

In [None]:
m2 = np.arange(4, 8)
m2

We could stack horizontally:

In [None]:
m3 = np.hstack((m1, m2))
m3

The result is similar to what we would get with `append`.

However, if we stack vertically, we would end up with a multi-dimensional array:

In [None]:
m3 = np.vstack((m1, m2))
m3

Notice how, similar to `append`, both `hstack` & `vstack` don't modify the array in-place, but return a new array.

We can also stack multi-dimensional arrays against each other, as long as their sizes match along the dimension we're not stacking along.

In [None]:
rows, columns = m3.shape

In [None]:
# the number of rows in m4 does not necessarily have to be equal to m3, but the number of columns has to!
m4 = np.arange(8, 32).reshape((-1, columns))
m4

In [None]:
m5 = np.vstack((m3, m4))
m5

In [None]:
# the number of columns in m6 does not necessarily have to be equal to m3, but the number of rows has to!
m6 = np.arange(8, 32).reshape((rows, -1))
m6

In [None]:
m7 = np.hstack((m3, m6))
m7
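
For two-dimensional arrays, both stacking functions can be thought of as special cases of `np.concatenate` with an explicit `axis`. The sketch below checks this on two small arrays of our own (`top` & `bottom`); `concatenate` isn't covered elsewhere in this worksheet, so treat this as an optional aside.

```python
import numpy as np

top = np.arange(0, 8).reshape((2, 4))
bottom = np.arange(8, 16).reshape((2, 4))

# For 2-D arrays, vstack joins along axis 0 (rows) and hstack along axis 1 (columns).
print(np.array_equal(np.vstack((top, bottom)), np.concatenate((top, bottom), axis=0)))
print(np.array_equal(np.hstack((top, bottom)), np.concatenate((top, bottom), axis=1)))

print(np.vstack((top, bottom)).shape)
print(np.hstack((top, bottom)).shape)
```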

---
## Array access

Let's use the following array:

In [None]:
a = np.arange(1, 17)
a

We can access elements in an array similar to a list:

In [None]:
a[0]

In [None]:
a[-1]

In [None]:
a[:2]

In [None]:
a[1:17:2]

In [None]:
a[0:17:2]

Let's now consider multi-dimensional arrays:

In [None]:
m = np.arange(1, 17).reshape((4, 4))
m

Individual elements could be accessed like in nested lists:

In [None]:
m[0][1]

However, we can also specify one index per dimension in one go:

In [None]:
m[0, 1]

This syntax allows us to access multiple elements more easily. For example, if we wanted to access the second column of an array (index `1`), this becomes much more straightforward. Compare the list-style approach with the array syntax:

In [None]:
[x[1] for x in m]

In [None]:
m[:, 1]

Here the `:` indicates that we want the entire slice along that dimension. This is similar to how we can write `a[:2]` or `a[2:]` to indicate "from the start" or "till the end", respectively.
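
The same idea applies to rows & to partial slices. The sketch below uses a separate $3 \times 4$ array (`grid`, a name introduced only for this example) so that the results can be read off from the printed layout in the comments.

```python
import numpy as np

grid = np.arange(12).reshape((3, 4))
# grid is:
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

print(grid[1, :])    # every column of row 1
print(grid[:, 3])    # every row of column 3
print(grid[1:, 3])   # rows 1 onwards of column 3
```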

### Q5

#### Q5 a

Write the code to get the first 2 columns of `m`, by using the array access directly. _Hint: remember that the end index is not inclusive when accessing a slice._

In [None]:
# answer:


#### Q5 b

Write the code to get columns `0` & `2` (the odd numbers) from `m`, by using the array access directly. _Hint: the columns are spaced by 1 column._

In [None]:
# answer:


#### Q5 c

Write the code to get the elements `2`, `4`, `10`, & `12` from `m`, by using the array access directly. *Hint: the elements are spaced by 1 column & 1 row.*

In [None]:
# answer:


### Q6

You can also access elements by specifying a mask:

In [None]:
x = np.arange(10)
x[[True, True, True, False, False, False, True, False, True, True]]

This gives back the elements whose corresponding mask entry is `True`.

Write the code to retrieve the even elements from the array `y` defined below. You should not use a loop to do this & your answer should be generic for any array. _Hint: Remember that we get an array of `bool`s directly by performing conditional operations on an array._

In [None]:
y = np.array([
    [1, 2, 3],
    [2, 4, 6],
    [3, 6, 9],
])

In [None]:
# answer:


## Appendix

This notebook gave you a brief overview of the NumPy package. In this notebook we mostly looked at how we can exploit the vectorisation capabilities to do certain operations with less code & more efficiently.

You are encouraged to go over the documentation & search for tutorials to explore more functionalities that the package has to offer. An exercise you can do is to go over the notebooks where we worked with lists & try replacing each list with a NumPy array to see how you can achieve the same functionality.