# 1) Introduction to NumPy Arrays

## NumPy documentation

**Note: NumPy has very good and extensive documentation, which you can find at https://numpy.org/doc/stable/. If you need further details about NumPy arrays, you can always refer to it.**

## Motivation for arrays

We have seen that for numerical data, we often use NumPy arrays. Why do we need this additional container, and why can't we just use Python lists?

Let's illustrate this with an example. Imagine we have a list containing weights in grams:

In [None]:
gramms = [5400, 3491, 2591, 14100]

Now we want to convert this list into kilograms. With a plain list, we have no choice but to use a for loop (or a list comprehension) to divide each element by 1000:

In [None]:
kilogramms = []
for i in range(len(gramms)):
    new_value = gramms[i]/1000
    kilogramms.append(new_value)
kilogramms

You can imagine much more complex cases, e.g. where we combine multiple lists, which make this style of code cumbersome to write and slow to run. What arrays provide us is **vectorized** computations.
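
To make the "multiple lists" case concrete, here is a minimal sketch in plain Python. The second list, `counts`, and the names `weights_g` and `total_kg` are hypothetical and only serve the illustration. Even with a list comprehension, we still have to pair the elements ourselves and process them one by one:

```python
# Hypothetical example data: item weights in grams and the number of items of each kind
weights_g = [5400, 3491, 2591, 14100]
counts = [2, 1, 4, 1]

# Total weight per kind of item in kilograms.
# zip pairs up the elements of the two lists; the comprehension still loops over them.
total_kg = [w * c / 1000 for w, c in zip(weights_g, counts)]
print(total_kg)
```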

## Creating an array from a list

To see how this works with NumPy, let's create a NumPy array. First of all, let's import NumPy.

In [None]:
import numpy as np

We can easily turn our list from above into an array using the ```np.array``` function:

In [None]:
gramms_array = np.array(gramms)
gramms_array

## Vectorized operations

**Vectorization** means that we can now operate on the array as **one object**, i.e. we can do mathematics with it as we would with a single number. In our example:

In [None]:
kilogramms_array = gramms_array / 1000
kilogramms_array

As mentioned above, this also works if we need to perform a computation that uses multiple arrays. Let's imagine we have a list of prices per $m^2$ and a list of surfaces for a series of apartments:

In [None]:
price_per_m2 = [6, 10.3, 12.4, 10.6, 5.7, 4.3, 14, 0.5, 0.5, 17.8, 12.7, 16, 2.7, 17.5, 5.2, 7.1, 1.2, 7.2, 14.5, 11.9]
surface = [238, 239, 265, 212, 143, 132, 142, 133, 109, 291, 225, 165, 141, 197, 298, 289, 123,  90, 132, 203]

Now if we want to calculate the price of each apartment, we can multiply each price/$m^2$ by the corresponding surface. We could do that by creating a for loop and filling a new list with the values:

In [None]:
price = []
for i in range(len(price_per_m2)):
    current_price = price_per_m2[i] * surface[i]
    price.append(current_price)

In [None]:
price

As you might have guessed, it makes more sense to once more transform the two lists into arrays:

In [None]:
price_per_m2_array = np.array(price_per_m2)
surface_array = np.array(surface)

Instead of having to write a for loop, NumPy now allows us to use a standard mathematical operation where we multiply the two arrays:

In [None]:
price_array = price_per_m2_array * surface_array
price_array

You see that when multiplying two arrays, **NumPy simply multiplies each element of one array by the corresponding element of the other array**.
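
As a small sanity check, the sketch below uses two short made-up arrays (`a` and `b` are illustrative names, not part of the apartment example) and compares the array product with the same computation done pair by pair in plain Python:

```python
import numpy as np

# Two small, made-up arrays of the same length
a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

# Element-wise product: a[0]*b[0], a[1]*b[1], a[2]*b[2]
product = a * b
print(product)

# The same computation done pair by pair in plain Python
loop_product = [x * y for x, y in zip(a.tolist(), b.tolist())]
print(loop_product)
```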

### Advantages of vectorization
There are two main advantages to this approach. First, it makes the code much **simpler**: we achieved in a single line what took a cumbersome for loop with lists (note that a list comprehension would be slightly more efficient than the loop even in plain Python, but still far from NumPy).

Second, it makes our code run much **faster**. When we do a for loop, each operation is done separately, and since Python is dynamically typed (you don't have to declare whether a variable is text or a number), it has to repeatedly check types. In the NumPy vectorized version, all multiplications can be done **in one batch** by optimized compiled code because: 1) the array contains only one type of value, so no per-element type checks are needed, and 2) arrays are efficiently stored as blocks in memory, so individual values don't have to be "searched" for.

With this very simple example, we can compare the execution time using the magic command ```%%timeit```:

In [None]:
%%timeit -n 10000 -r 5 
price = []
for i in range(len(price_per_m2)):
    current_price = price_per_m2[i] * surface[i]
    price.append(current_price)

In [None]:
%%timeit -n 10000 -r 5
price_array = price_per_m2_array * surface_array

## Applying functions to arrays

As we've seen, we can do operations on arrays of the **same size** or with a **single number**. We will see later that there are exceptions to this rule (called *broadcasting*).

We can, however, also do mathematics with just a single array. NumPy implements many standard mathematical functions that you can apply directly to arrays as if you were dealing with a single number. These always have the same syntax as in regular mathematics, $y = f(x)$, except that here $y$ and $x$ are arrays. Here are a few examples, e.g. in trigonometry:

In [None]:
np.cos(price_array)

... or for exponentials and logarithms:

In [None]:
np.exp(price_array)

In [None]:
np.log10(price_array)
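
The outputs above are hard to check by eye, so here is a minimal sketch with made-up inputs whose results are easy to verify (the names `x`, `roots` and `logs` are illustrative only). It suggests that the function is applied to each element separately and that the result has the same shape as the input:

```python
import numpy as np

# Made-up inputs chosen so that the results are easy to verify by hand
x = np.array([1.0, 100.0, 10000.0])

roots = np.sqrt(x)   # square root of each element
logs = np.log10(x)   # base-10 logarithm of each element

print(roots)
print(logs)
print(roots.shape == x.shape)
```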

## Accessing elements by index

### 1D arrays

As with lists, accessing elements is straightforward. The standard way to extract information from an array is to use square bracket notation. If we want, for example, to extract the second element of the array, we write:

In [None]:
price_array[1]

Remember that **we start counting from 0** in Python, which is why the *second* element has index 1.

We can use negative indices to count from the end of the array. For example, if we want to access the last element, we can write:

In [None]:
price_array[-1]
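
The relationship between negative and positive indices can be sketched on a small made-up array (`values` is an illustrative name). Index `-1` refers to the same element as `len(values) - 1`, index `-2` to the one before it, and so on:

```python
import numpy as np

values = np.array([10, 20, 30, 40, 50])

last = values[-1]                       # last element
second_last = values[-2]                # element before the last one
same_as_last = values[len(values) - 1]  # positive index of the last element

print(last, second_last, same_as_last)
```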

### Higher dimensions

When working in higher dimensions, NumPy arrays show their potential. We can simply put the indices of each dimension in the square brackets, separated by commas. For example, if we have a 2D array (a matrix) and want to access the element in the first row and second column, we write:

In [None]:
array_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
array_2d[0, 1]
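
As a further sketch, using a small made-up array named `grid`, the comma notation gives the same element as chaining two pairs of brackets, and a single index on a 2D array returns a whole row:

```python
import numpy as np

grid = np.array([[10, 20, 30], [40, 50, 60]])

element = grid[1, 0]   # second row, first column
chained = grid[1][0]   # same element: first select the row, then the column
first_row = grid[0]    # a single index selects an entire row

print(element, chained)
print(first_row)
```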

## Array dimensions

Arrays are well suited to work in **higher dimensions**. Let's check the number of dimensions, the shape, and the size of the following array:

(If you want, you can try to understand how this array was created, but it is not key at the moment.)

In [None]:
my_array = np.array([[[1, 2, 3], [4, 5, 6], [7, 8, 9]]] * 2)

print("Number of dimensions:", my_array.ndim)
print("Shape:", my_array.shape)
print("Size:", my_array.size)
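
The three attributes are related, which the following sketch tries to show on a small made-up array (`table` is an illustrative name): the number of dimensions is the length of the shape, and the size is the product of the entries of the shape.

```python
import numpy as np

# A made-up 2D array with 2 rows and 3 columns
table = np.array([[1, 2, 3], [4, 5, 6]])

print("Number of dimensions:", table.ndim)
print("Shape:", table.shape)
print("Size:", table.size)

# ndim is the length of the shape tuple, size is the product of its entries
ndim_matches = table.ndim == len(table.shape)
size_matches = table.size == int(np.prod(table.shape))
print(ndim_matches, size_matches)
```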

## Array type

We have mentioned above that computation is fast because the type of the arrays is known. This means that **all the elements of an array** must have the same type. Numpy implements its own types called ```dtype```. We can access the type of an array using the ```dtype``` attribute:

In [None]:
price_per_m2_array.dtype

We see that by default NumPy decided that the price had ```float64``` dtype because some of the numbers we used had a decimal point. Notice it also turned the numbers that **didn't** have a decimal point into floats (like the first element ```6```). Since all elements of an array need to have the same type, NumPy just selects the **most complex one** for the entire array.

Let's see what ```dtype``` the surface array has:

In [None]:
surface_array.dtype

We **only** used integer numbers in that list, and therefore Numpy can use a "simpler" ```dtype``` for that array.

Finally let's see the result of our multiplication:

In [None]:
price_array.dtype

When combining multiple arrays, Numpy always **selects the most complex** ```dtype``` for the output.
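
The same behaviour can be reproduced on small made-up arrays (`ints` and `mixed` are illustrative names). Note that the exact integer dtype depends on the platform, so the sketch only assumes it is *some* integer type:

```python
import numpy as np

ints = np.array([1, 2, 3])     # only integers: an integer dtype (exact width depends on the platform)
mixed = np.array([1, 2.5, 3])  # one float is enough to make the whole array float64

print(ints.dtype)
print(mixed.dtype)

# Combining an integer array with a float array gives a float result
result = ints * mixed
print(result.dtype)
```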

If needed, we can also change the ```dtype``` of an array explicitly using the ```astype``` method. For example, if we want our ```surface_array``` to contain floats instead of integers, we can write:

In [None]:
surface_array_float = surface_array.astype(np.float64)
surface_array_float.dtype

Notice how we had to create a **new** array: by default, most operations on Numpy arrays are **not done in place** i.e. the array itself is not changed.
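
A minimal sketch of this behaviour, on a made-up array named `original`: calling ```astype``` leaves the original array and its ```dtype``` untouched and returns a converted copy. The sketch also converts in the other direction, from float to integer, where the fractional part is dropped rather than rounded.

```python
import numpy as np

original = np.array([1.7, 2.2, 3.9])

# astype returns a new array; converting floats to integers drops the fractional part
as_int = original.astype(np.int64)

print(as_int, as_int.dtype)
print(original, original.dtype)  # the original array is unchanged
```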

# Exercises

1. Create an array with 3 elements and one with 5 elements containing integers

2. Try to multiply the two arrays.

3. You should get an error message. Do you understand the problem? Change the size of one of the arrays so that you can multiply them.

4. Change the ```dtype``` of the output to float32.

In [None]:

### YOUR CODE HERE


5. Getting shape and size of an array: follow the instructions in the comments of the cell below.

In [None]:
# a) Guess the shape, number of dimensions and size of the arrays below

array_1 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

array_2 = np.array([[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]])

array_3 = np.array([1, 2, 3, 4, 5, 6])

array_4 = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])


# b) Check your guesses using the appropriate attributes of the array.

### YOUR CODE HERE


6. Accessing elements in arrays: follow the instructions in the comments of the cell below.

In [None]:
# a) Access the element in the second row and third column of array_2d (should be 6)

array_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

### YOUR CODE HERE


# b) Access the number 7 in array_2d using indexing

### YOUR CODE HERE


# c) Access the number 5 in array_3d using indexing

array_3d = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

### YOUR CODE HERE


7. (Optional) Creating and accessing a 2D array

- Create the array shown below
- Output the dimensions of the array
- Access the elements marked in bold (individually or per list) and output them

| 11 | 3 |
|---|---|
| 5 | **16** |
| 23 | 10 |
| 13 | 14 |
| **7** | 28 |

In [None]:

### YOUR CODE HERE
