<center><img src=img/MScAI_brand.png width=70%></center>

# Numpy

Why Numpy?

* Speed
* Abstraction
* A library of pre-written functions

In [1]:
import numpy as np

A cheat sheet: https://www.dataquest.io/blog/numpy-cheat-sheet/

## This is Numpy

Numpy ("Numerical Python", pronounced NUM-pie, not NUM-pee) is a library used in Python for numerical computing. Most scientific computing work in Python relies on Numpy as a base.

But we can already do numerical calculations in Python, so why does Numpy exist? 

1. **Speed**. Numpy makes many numerical calculations much faster.

2. **Abstraction**. It is very handy to be able to write our equations as (e.g.) $y = \beta x$ rather than $y_0 = \beta x_0$, $y_1 = \beta x_1$, etc., even though they mean the same thing.

3. **A library**. Numpy provides many common functions. "Batteries included."
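As a small illustration of the abstraction point, here is a sketch of $y = \beta x$ written both ways. The value of `beta` and the contents of `x` are arbitrary, chosen only for the example.

```python
import numpy as np

beta = 2.0  # an illustrative coefficient, chosen arbitrarily
x = np.array([1.0, 2.0, 3.0])

# Element by element, as we would write y_0 = beta x_0, y_1 = beta x_1, ...
y_loop = [beta * x_i for x_i in x]

# The same equation written once, for the whole array: y = beta x
y = beta * x
print(y)  # [2. 4. 6.]
```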

In this notebook, we'll start by seeing a nice 3rd-party tutorial which emphasises abstraction, and then fill in some extra details. In later notebooks, we'll see examples, input/output, plotting, and more.

A nice introduction to Numpy basics which focusses on the benefit of *abstraction*:
https://jalammar.github.io/visual-numpy/. We will look at an example calculation of mean squared error.
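A minimal sketch of a vectorised mean squared error, $\frac{1}{n}\sum_i (\hat{y}_i - y_i)^2$, follows. The `predictions` and `labels` values are made up for illustration and are not necessarily the ones used in the tutorial.

```python
import numpy as np

# Illustrative values only
predictions = np.array([1.0, 1.0, 1.0])
labels = np.array([1.0, 2.0, 3.0])

# Element-wise difference, then square, then average: no explicit loop
error = predictions - labels   # [ 0. -1. -2.]
mse = np.mean(error ** 2)      # (0 + 1 + 4) / 3, about 1.667
print(mse)
```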


### Numpy array

"NumPy’s main object is the homogeneous multidimensional array. It is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers" - https://numpy.org/devdocs/user/quickstart.html

In [2]:
a = np.array([4, 5, 6.0]) # make an array, passing in a list
print(a)

[4. 5. 6.]


In [3]:
a.shape

(3,)

In [4]:
a.dtype # data type

dtype('float64')
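The quotation above mentions *multidimensional* arrays indexed by a *tuple* of integers. A short sketch with a 2-D array (the values are arbitrary) shows both ideas, and also that mixing `int`s with one `float` gives a single common dtype:

```python
import numpy as np

# A 2-D array: 2 rows, 3 columns. One float makes the whole array float64.
m = np.array([[1, 2, 3],
              [4, 5, 6.0]])
print(m.shape)   # (2, 3)
print(m.dtype)   # float64
print(m[1, 2])   # 6.0: row 1, column 2, i.e. the index tuple (1, 2)
```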

### Vectorisation

A typical use-case for Numpy: we have a for-loop processing a list of numerical values, and we replace it with a single Numpy line. This is called *vectorisation*. The same concept is essential to good performance in Matlab and R.


In [5]:
L = [4, 5, 6]
for i in range(len(L)):
    L[i] = L[i]**2 # L is a Python list

a = a**2 # a is a Numpy array


From the point of view of *abstraction*, the for-loop is hidden. From the point of view of *speed*, the for-loop is moved from pure Python into an underlying function written in C or Fortran.

* Abstraction/brevity
* Homogeneous operations/no flexibility -> speed
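To see the speed side for ourselves, we can time both versions with the standard-library `timeit` module. This is a rough sketch: the array size, the number of repetitions and the helper names `square_list` and `square_array` are arbitrary choices for illustration, and the measured ratio will vary between machines.

```python
import timeit
import numpy as np

n = 100_000
values = list(range(n))
# Explicit int64 so the squares cannot overflow where the default int is 32-bit
arr = np.arange(n, dtype=np.int64)

def square_list(xs):
    # Pure-Python loop: each element is handled one at a time
    return [x ** 2 for x in xs]

def square_array(a):
    # Vectorised: the loop runs inside Numpy's compiled code
    return a ** 2

t_list = timeit.timeit(lambda: square_list(values), number=10)
t_array = timeit.timeit(lambda: square_array(arr), number=10)
print(f"list: {t_list:.3f} s, array: {t_array:.3f} s, ratio: {t_list / t_array:.0f}x")
```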

When dealing with large data, pure Python can be slow. If we have a list of 10 numbers and we calculate the mean, it is instantaneous. But if we have 10 million numbers, it will take noticeably longer. The reason for this is Python's *flexibility*. Python allows a list to contain any type of value, e.g. we can have a mixed list of `int`s, `float`s, `str`s, other `list`s, and so on. Python has to check what type each value is before deciding how to add it (or whether it even *can* add it).

In a Numpy array, all elements are of the same type, e.g. all `float`. Numpy therefore does not need to check the type of each value, and it can run the whole loop in compiled code. The speed-up is often around a factor of 100, though it depends on the workload.
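We can watch Numpy enforce this homogeneity: when given mixed values, it picks one dtype that can hold all of them, whereas a Python list keeps each element's own type. The exact integer and string dtypes printed depend on the platform and Numpy version, so the comments only indicate the kind of result.

```python
import numpy as np

# Numpy picks a single dtype for every element
print(np.array([1, 2, 3]).dtype)        # an integer type, e.g. int64
print(np.array([1, 2.5, 3]).dtype)      # float64: the ints are converted to floats
print(np.array([1, "two", 3.0]).dtype)  # a Unicode string dtype: everything becomes text

# A Python list keeps each element's own type
mixed = [1, "two", 3.0]
print([type(v).__name__ for v in mixed])  # ['int', 'str', 'float']
```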

Vectorisation is also happening when, e.g., we add two arrays: 

In [6]:
b = np.array([1, 2, 3])
print(a + b)

[17. 27. 39.]


We should become comfortable with the difference between *element-wise arithmetic* and *aggregation*. These look similar, but the result is an `array` in one case and a single value in the other.

In [7]:
x = np.array([4, 9, 16])
print(np.sqrt(x))
print(np.max(x))

[2. 3. 4.]
16
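For a 2-D array the same distinction holds, and aggregations can also be applied along one dimension using the `axis` argument (not covered above; `axis=0` runs down the columns, `axis=1` along the rows). A short sketch with arbitrary values:

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

print(m * 10)         # element-wise: same shape as m, (2, 3)
print(m.sum())        # aggregation over everything: a single value, 21
print(m.sum(axis=0))  # aggregation down each column: [5 7 9]
print(m.sum(axis=1))  # aggregation along each row: [ 6 15]
```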


### More ways of making arrays

* `np.zeros(shape)` and `np.ones(shape)`
* Random numbers: we can use `np.random.random(shape)` for uniform values in $[0, 1)$. We can also generate from other distributions, e.g. using `np.random.normal(loc, scale, size)`, where `loc` is the mean, `scale` is the standard deviation and `size` is the shape.
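A quick sketch of these constructors follows; the shapes and distribution parameters are arbitrary. The last two lines use `np.random.default_rng`, the newer random-number `Generator` interface available in recent Numpy versions, seeded here so the results are reproducible.

```python
import numpy as np

z = np.zeros((2, 3))                                  # 2x3 array of 0.0
o = np.ones(4)                                        # [1. 1. 1. 1.]
u = np.random.random((2, 2))                          # uniform values in [0, 1)
g = np.random.normal(loc=0.0, scale=1.0, size=(3,))   # standard normal samples
print(z.shape, o.shape, u.shape, g.shape)             # (2, 3) (4,) (2, 2) (3,)

# Newer interface: a seeded Generator object
rng = np.random.default_rng(seed=0)
r = rng.normal(10, 10, size=5)                        # mean 10, std 10, 5 samples
```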


### `np.sum(a)` versus `a.sum()`

In several cases, one can write either style. 

In [8]:
a = np.array([4, 9, 16])
print(np.sum(a))
print(a.sum())

29
29
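Other common aggregations come in both styles too. A short sketch, reusing the same array:

```python
import numpy as np

a = np.array([4, 9, 16])

# Each line prints the same value twice: function style, then method style
print(np.mean(a), a.mean())
print(np.max(a), a.max())   # 16 16
print(np.min(a), a.min())   # 4 4
```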


But many mathematical functions, such as `sqrt`, are not available as methods of the array, e.g.:

In [9]:
print(np.sqrt(a))
try:
    print(a.sqrt())
except AttributeError:
    print("That doesn't exist!")

[2. 3. 4.]
That doesn't exist!
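Rather than relying on `try`/`except`, we can ask directly with Python's built-in `hasattr` which names exist as array methods. The list of names below is just a sample. The power operator `** 0.5` is one method-free alternative to `np.sqrt`.

```python
import numpy as np

a = np.array([4, 9, 16])

# Which of these names are methods of the array?
for name in ["sum", "max", "argmax", "sqrt", "exp"]:
    print(name, hasattr(a, name))

# sqrt is a Numpy function, not a method; ** 0.5 gives the same result
print(a ** 0.5)  # [2. 3. 4.]
```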


`argmax` and friends are often overlooked (and eventually re-implemented) by beginners.

In [10]:
x = np.array([4, 5, 6, 1])
# the index where the largest element is
print(x.argmax()) 

2
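A few of the "friends": `argmin` finds the index of the smallest element, `np.argsort` returns the indices that would sort the array, and an index from `argmax` can be used to get the value back.

```python
import numpy as np

x = np.array([4, 5, 6, 1])

print(x.argmin())     # 3: index of the smallest element
print(np.argsort(x))  # [3 0 1 2]: indices that would sort x
print(x[x.argmax()])  # 6: the largest value itself
```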


We have seen how to create Numpy arrays by passing in lists. Of course, another way is to read from a file.
We'll cover file input/output in a later notebook/video. 

### `np.linspace()`

Another handy way to create an array is to create evenly-spaced values. We use `np.linspace`. We have to say where the values start and stop, and how many there should be. `np.linspace` works out the rest:

In [11]:
# start, stop, n_values
grid = np.linspace(0, 10, 11) 
print(grid)

[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10.]
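Because `np.linspace` includes both endpoints, the spacing is $(\text{stop} - \text{start}) / (n - 1)$. Passing `retstep=True` makes it return that spacing as well. For comparison, `np.arange` takes a step size instead of a count and excludes the stop value. A short sketch:

```python
import numpy as np

start, stop, n = 0, 10, 11
grid, step = np.linspace(start, stop, n, retstep=True)
print(step)                      # 1.0
print((stop - start) / (n - 1))  # 1.0: the same spacing, by hand

# np.arange: start, stop, step; the stop value is not included
print(np.arange(0, 10, 1.0))     # [0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
```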


### Exercises

**Exercise**: Using `linspace`, make this array: 

`[ 0. 0.25 0.5 0.75 1. 1.25 1.5 1.75 2.]`

**Exercise**: `np.logspace` does something similar. Use it to make this array: 

`[1.e-06 1.e-05 1.e-04 1.e-03 1.e-02 1.e-01 1.e+00]`

**Exercise**: Generate an array of 20 numbers from a Gaussian (normal) distribution with mean 10 and standard deviation 10. Use `np.mean` and `np.std` to check that the sample statistics are roughly correct (with only 20 samples, they will not match exactly).

**Exercise**: What happens if we try to add two arrays of different lengths?

### Further reading

* Here is a nice textbook reference on Numpy with several longer worked examples: http://www.labri.fr/perso/nrougier/from-python-to-numpy/
* If you're already good at Matlab: https://www.numpy.org/devdocs/user/numpy-for-matlab-users.html
* If you're already good at R: http://mathesaurus.sourceforge.net/r-numpy.html



### Solutions to exercises

In [12]:
import numpy as np
np.linspace(0, 2, 9)

array([0.  , 0.25, 0.5 , 0.75, 1.  , 1.25, 1.5 , 1.75, 2.  ])

In [13]:
np.logspace(-6, 0, 7)

array([1.e-06, 1.e-05, 1.e-04, 1.e-03, 1.e-02, 1.e-01, 1.e+00])

In [14]:
x = np.random.normal(10, 10, 20)
print(x)
print(x.mean(), x.std())

[ 12.13630796   4.65355434  17.79988275  -3.76665774 -10.44058696
  14.65879083  15.96300359  27.71685188   9.54868388   1.61542693
   0.64936588   1.67462488   4.6468321   18.28134721  -4.90962392
   6.7352366    9.40327919   2.81005109   5.0357925    7.90135541]
7.105675919291992 8.784856407519356


In [15]:
np.array([4, 5, 6]) + np.array([1, 2, 3, 4])

ValueError: operands could not be broadcast together with shapes (3,) (4,) 
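The error message mentions *broadcasting*, Numpy's rule for combining arrays of different shapes. Lengths 3 and 4 are incompatible, but some mismatches are allowed: a scalar or a length-1 array is "stretched" to match. A brief sketch (the full rules are in the Numpy documentation):

```python
import numpy as np

a = np.array([4, 5, 6])

print(a + 10)                   # [14 15 16]: a scalar is stretched to length 3
print(a + np.array([10]))       # [14 15 16]: so is a length-1 array
print(a + np.array([1, 2, 3]))  # [5 7 9]: equal lengths work element-wise
```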