# Numpy: How it Works

```{note}
This page was not shared with MUDE students in 2023-2024 (year 2).

It may have been a new page, or a modified page from year 1.

There may be pages in year 1 and year 2 that are nearly identical, or have significant modifications. Modifications usually were to reformat the notebooks to fit in a jupyter book framework better.
```

Introduction

[theory](https://tudelft-citg.github.io/learn-python/05/Theory/01.html)
[quick reference](https://tudelft-citg.github.io/learn-python/05/In_a_Nutshell/01.html)

Exercises: [airplane velocity](https://tudelft-citg.github.io/learn-python/05/Exercises/01.html) and [bending moment](https://tudelft-citg.github.io/learn-python/05/Exercises/02.html).

This notebook is based on the Numpy lesson from [Aalto Scientific Computing: Python for Scientific Computing](https://github.com/AaltoSciComp/python-for-scicomp/) and [W3Schools](https://www.w3schools.com/python/numpy/).

## See also


* NumPy manual <https://numpy.org/doc/stable/reference/>
* Basic array class reference <https://numpy.org/doc/stable/reference/arrays.html>
* Indexing <https://numpy.org/doc/stable/reference/arrays.indexing.html>
* ufuncs <https://numpy.org/doc/stable/reference/ufuncs.html>
* 2020 Nature paper on NumPy's role and basic concepts <https://www.nature.com/articles/s41586-020-2649-2>

## What is an array?

For example, consider `[1, 2.5, 'asdf', False, [1.5, True]]` - this is a Python list but it has different types for every element. When you do math on this, every element has to be handled separately.

Lists may serve the purpose of arrays, but they are slow to process. Numpy aims to provide an array object that is up to 50x faster than traditional Python lists. Numpy is the most used library for scientific computing. Even if you are not using it directly, chances are high that some library uses it in the background.

The array data structure in NumPy is called `ndarray`. It comes with many supporting functions that make working with arrays easy.

An array is a ‘grid’ of values that all have the same type. It is indexed by tuples of non-negative integers and can have any number of dimensions. An array has:

- `dtype` - data type. An array always contains a single type.
- `shape` - shape of the data, for example `(3, 2)` or `(3, 2, 500)`, or even `(500,)` (one-dimensional) or `()` (zero-dimensional).
- `data` - raw data storage in memory. This can be passed to C or Fortran code for efficient calculations.
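
The attributes listed above can be inspected directly. The following is a minimal sketch; the variable names (`mixed`, `arr`, `scalar`) are chosen for illustration only.

```python
import numpy as np

# a Python list may mix types freely
mixed = [1, 2.5, 'asdf', False, [1.5, True]]
print([type(item).__name__ for item in mixed])

# a NumPy array holds a single type for all elements
arr = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
print(arr.dtype)      # float64
print(arr.shape)      # (3, 2): 3 rows, 2 columns
print(arr.ndim)       # 2: the number of dimensions

# a zero-dimensional array has an empty shape tuple
scalar = np.array(7.0)
print(scalar.shape)   # ()
```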

## Performance check

To quickly show how fast NumPy arrays are, we can compare the time a basic operation takes on lists and on arrays. In particular, we will compute the square of 10000 elements.

We first do this using Python lists, by creating a list with values from 0 to 9999, and one ‘empty’ list, to store the result in.

In [None]:
a = list(range(10000))
b = [ 0 ] * 10000

In [None]:
%%timeit
for i in range(len(a)):
    b[i] = a[i]**2

That looks and feels quite fast. But let’s take a look at how NumPy performs for the same task. We first import the `numpy` module, then we create our *a* and *b* containers again, which are now `ndarray` objects. Finally we perform the square operation.

In [None]:
import numpy as np
a = np.arange(10000)
b = np.zeros(10000)

In [None]:
%%timeit
b = a ** 2

We see that working with numpy arrays provides substantial performance improvements.

> **Note**: To evaluate the time of the computation we used the `%%timeit` command. `%%timeit` is a so-called Jupyter notebook *magic command*. Magic commands are initiated with a `%` prefix for line commands or a `%%` prefix for cell commands. A `%%` cell magic has to be the first thing in the Jupyter cell, otherwise it will not work. There are many other interesting magic commands available, such as those shown [here](https://towardsdatascience.com/top-8-magic-commands-in-jupyter-notebook-c1582e813560).
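
Magic commands only work inside Jupyter/IPython. As a rough sketch of the same comparison in a plain Python script, the standard-library `timeit` module can be used instead. The function names `square_list` and `square_array` are made up for this example, and the measured times will differ from machine to machine.

```python
import timeit
import numpy as np

n = 10_000
a_list = list(range(n))
a_arr = np.arange(n)

def square_list():
    # explicit Python loop over a list, as in the list version above
    result = [0] * n
    for i in range(n):
        result[i] = a_list[i] ** 2
    return result

def square_array():
    # one vectorized operation on the whole array
    return a_arr ** 2

runs = 100
t_list = timeit.timeit(square_list, number=runs)
t_arr = timeit.timeit(square_array, number=runs)
print(f'list loop:   {t_list / runs * 1e6:.1f} microseconds per run')
print(f'numpy array: {t_arr / runs * 1e6:.1f} microseconds per run')
```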


## Creating arrays

Arrays can be created using many different functions. This section gives an overview of the most useful ways to create them.

You can create an array from a Python list by using `np.array` and passing a Python list:

>**Note**: To print the values of variables, we will make use of *f-strings*. F-strings were introduced in Python 3.6, and they are recommended for print formatting since they improve code readability and are less prone to errors. We use f-strings by adding the letter *f* before the string we want to print, and then entering the names of the variables within curly brackets `{` and `}`. More info can be found [here](https://www.geeksforgeeks.org/formatted-string-literals-f-strings-python/).

In [None]:
a = np.array([1,2,3])               # 1-dimensional array (rank 1)
b = np.array([[1,2,3],[4,5,6]])     # 2-dimensional array (rank 2)

# the print statements use f-strings to format the print output. 
print(f'a:{a}\n')                                   # \n creates a new line 
print(f'a:\t{a}\n')                                 # \t adds a tab, a specific character for indentation
print(f'b:\n{b}\n')
print(f'shape of a: {a.shape}')                     # the shape: (3,) for a 1-dimensional array
print(f'shape of b: {b.shape}')                     # the shape (# rows, # columns)
print(f'size of a: {a.size}')                       # number of elements in the array a
print(f'size of b: {b.size}')                       # number of elements in the array b

Often it is useful to create an array with constant values; the following functions can be used to achieve this:

In [None]:
print(np.zeros((2, 3)), '\n')           # Create a 2x3 array with all elements set to 0
print(np.ones((1,2)), '\n')             # Create a 1x2 array with all elements set to 1
print(np.full((2,2),7), '\n')           # Create a 2x2 array with all elements set to 7
print(np.eye(2), '\n')                  # Create a 2x2 identity matrix

Other common ways to create an array include generating evenly spaced values in an interval, or specifying the data type explicitly:

In [None]:
a = np.arange(10)              # Evenly spaced values in an interval, with default stepsize 1
b = np.linspace(0,9,10)        # An array with 10 values between 0 and 9  
                               # (check the difference with np.arange in the next section)

c = np.ones((3, 2), bool)      # 3x2 boolean array

print(f'a:\n{a}\n')
print(f'b:\n{b}\n')
print(f'c:\n{c}')

---
## Array Data types

What exactly is the difference between `np.arange(10)` and `np.linspace(0,9,10)`?

- ``np.arange(10)`` results in ``array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])`` with dtype **int64** (the default integer type on most platforms),
- while ``np.linspace(0,9,10)`` results in ``array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])`` with dtype **float64**.

Both ``np.linspace`` and ``np.arange`` take dtype (data type) as an argument and can be adjusted to match each other in that way:

In [None]:
print('As int64:')
print(np.arange(10))
print(np.linspace(0,9,10, dtype=np.int64))
print('\n')

print('As float64:')
print(np.arange(10, dtype=np.float64))
print(np.linspace(0,9,10))

---
On many occasions (especially when something does not behave as expected) it is useful to check, and if necessary change, the data type of an array:

In [None]:
d = np.ones((3, 2), bool)

print(f'd:\n{d}\n')
print(f'datatype of d:\n{d.dtype}\n')

e = d.astype(int)
      
print(f'e:\n{e}\n')
print(f'datatype of e:\n{e.dtype}\n')

When converting floats to integers using `.astype()`, the fractional part of every float in the array is discarded, so values are truncated toward zero. For non-negative floats this gives the largest integer lower than or equal to the float:

In [None]:
nums = np.linspace(0,2,11)
print(f'nums:\n{nums}\n')

numsint = nums.astype(np.int64)
print(f'nums as integer:\n{numsint}\n')
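
For non-negative values, truncating and rounding down give the same result, so the cell above does not show the difference. The sketch below uses a few negative values (chosen only for illustration) to compare the `astype` method with `np.floor` and `np.rint`.

```python
import numpy as np

vals = np.array([-1.7, -0.2, 0.2, 1.7])

truncated = vals.astype(np.int64)             # fractional part dropped (toward zero)
floored = np.floor(vals).astype(np.int64)     # rounded down first, then converted
nearest = np.rint(vals).astype(np.int64)      # rounded to the nearest integer first

print(f'truncated: {truncated}')   # -1, 0, 0, 1
print(f'floored:   {floored}')     # -2, -1, 0, 1
print(f'nearest:   {nearest}')     # -2, 0, 0, 2
```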

Did you notice anything about how `astype` was called in the cells above?

Right! We called the `astype` function not from the `np` module, but from the `ndarray` objects themselves. These are indeed *methods*, rather than *functions*. The main differences are highlighted in the table below.


|Method      | Function|
| :----------- | :-----------|
| is associated with the objects of the class they belong to  | is not associated with any object|
| is called 'on' an object and we cannot invoke it just by its name  | we can invoke a function just by its name.|

Nearly all the method versions do the same thing as the function versions. Choosing the method or the function will usually depend on which one is easier to type or read. Some examples will be provided later in this notebook.
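
As a small illustration of this equivalence, the sketch below calls a few operations once as a function and once as a method. The array `x` is an arbitrary example.

```python
import numpy as np

x = np.array([[3, 1, 2],
              [6, 5, 4]])

# the same operation, as a function (left) and as a method (right)
print(np.max(x), x.max())          # 6 6
print(np.sum(x), x.sum())          # 21 21
print(np.reshape(x, (3, 2)).shape, x.reshape(3, 2).shape)

# some operations, such as np.sqrt, are only available as functions
print(np.sqrt(x))
```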

---
### <font color='red'>Exercise</font>

Create an array with elements ranging from 10 up to 15 (inclusive), with data type=unsigned 8 bit integer. 
Use the following functions:
- Creating a python list and converting it to an array using `np.array()`
- using `np.linspace()`
- using `np.arange()`

In [None]:
print('Your code here')
print('Your code here')
print('Your code here')

## Types of operations

There are different types of standard operations in NumPy:

**ufuncs**, or universal functions, operate on ndarrays in an element-by-element fashion. They can be *unary*, operating on a single input, or *binary*, operating on two inputs.

They are used to implement vectorization in NumPy, which is much faster than iterating over elements in Python. They also provide broadcasting and additional methods such as `reduce` and `accumulate` that are very helpful for computation.
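
A short sketch of these ideas, using `np.negative` as a unary ufunc and `np.add` as a binary ufunc. The arrays `x` and `y` are arbitrary examples.

```python
import numpy as np

x = np.array([1, 2, 3, 4])
y = np.array([10, 20, 30, 40])

print(np.negative(x))          # unary ufunc: one input
print(np.add(x, y))            # binary ufunc: two inputs
print(x + y)                   # the + operator gives the same result for arrays

total = np.add.reduce(x)       # combine all elements with add: 1 + 2 + 3 + 4
running = np.add.accumulate(x) # running sum, keeping every intermediate result
print(total)                   # 10
print(running)                 # 1, 3, 6, 10
```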

ufuncs also take additional arguments, like:

- `where` - boolean array or condition defining where the operations should take place.
- `dtype` - defining the return type of elements.
- `out` - output array where the return value should be copied.
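
The sketch below shows these three arguments with `np.add`. The array names (`result`, `mask`, `masked`) are illustrative. Note that with `where`, positions where the mask is `False` simply keep whatever the `out` array already contained, which is why `masked` is initialised with zeros first.

```python
import numpy as np

x = np.array([1, 2, 3, 4])
y = np.array([10, 20, 30, 40])

# dtype: return the result as float64 instead of the integer input type
as_float = np.add(x, y, dtype=np.float64)
print(as_float.dtype)                      # float64

# out: write the result into an existing array instead of a new one
result = np.zeros(4, dtype=x.dtype)
np.add(x, y, out=result)
print(result)                              # 11, 22, 33, 44

# where: only compute at positions where the mask is True
mask = np.array([True, False, True, False])
masked = np.zeros(4, dtype=x.dtype)
np.add(x, y, out=masked, where=mask)
print(masked)                              # 11, 0, 33, 0
```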

A thorough explanation and a list of ufuncs are available at [W3Schools](https://www.w3schools.com/python/numpy/numpy_ufunc.asp).

There are ufunc equivalents for Python's native arithmetic operators, e.g., the standard addition, subtraction, multiplication, division, negation, exponentiation, and so on. The ufunc however allows for more control, for instance we can use the `out` argument to specify the array where the result of the calculation will be stored (rather than creating a temporary array). This turns out to be particularly useful for large computations.

Example: in-place addition. Create an array and add it to itself using a ufunc.

In [None]:
x = np.array([1, 2, 3])

print(f'x before addition: {x}')
print(f'id before addition: {id(x)}')    # get the memory-ID of x
np.add(x, x, x)                          # Third argument is output array
print(f'x after addition: {x}')
print(f'id after addition: {id(x)}')     # get the memory-ID of x
                                         # - notice  it is the same!

Example: broadcasting. Can you add a 1-dimensional array of shape `(3,)` to a 2-dimensional array of shape `(2, 3)`? With broadcasting you can, and most of the time it happens 'under the hood'.

In [None]:
a = np.array([[1, 2, 3],
             [4, 5, 6]])
print(f'a:\n{a}\n')                         # Print a 

b = np.array([10, 10, 10])
print(f'b:\n{b}\n')                         # Print b

print(f'np.add(a, b):\n{np.add(a, b)}\n')   # add arrays a and b

Broadcasting is smart and consistent about what it does. The basics of broadcasting are [documented here](https://numpy.org/doc/stable/user/basics.broadcasting.html). The basic idea is that the smaller array is virtually stretched along missing or length-1 dimensions so that both arrays have compatible shapes.
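
The following sketch shows which shapes are compatible and what happens when they are not. The arrays `row` and `col` are made-up examples, and `np.broadcast_shapes` (available in NumPy 1.20 and later) is used only to report the resulting shape.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)     # shape (2, 3)
row = np.array([10, 20, 30])       # shape (3,)
col = np.array([[100], [200]])     # shape (2, 1)

print(a + row)     # row is reused for every row of a
print(a + col)     # col is reused for every column of a
print(np.broadcast_shapes(a.shape, row.shape))   # (2, 3)

# shapes (2, 3) and (2,) are not compatible: the last dimensions differ (3 vs 2)
try:
    a + np.array([1, 2])
except ValueError as err:
    print(f'broadcasting failed: {err}')
```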

### Array methods
Array methods also implement useful operations, sometimes similar to the ufuncs.

Remember that array methods are called on the `ndarray` object. You can find the full list of methods [here](https://numpy.org/doc/stable/reference/arrays.ndarray.html), along with all other important information on `ndarray`.

In [None]:
x = np.arange(12)
x.shape = (3, 4)
x                    #  array([[ 0,  1,  2,  3],
                     #         [ 4,  5,  6,  7],
                     #         [ 8,  9, 10, 11]])
x.max()              #  11