# 1. Some information on modern CPUs 
## Single instruction, single data
This is the classic mode of operation: each instruction operates on a single data element.

![pic](Non-SIMD_cpu_diagram1.svg)

*Figures by Decora at English Wikipedia, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=30547549*

## Single instruction, multiple data (SIMD)
This mode of operation is supported through CPU instruction set extensions such as MMX, SSE, AVX, and AVX2. These extensions provide both the vector registers (see the diagram below) and the instructions that operate on those registers (addition, multiplication, ...).

Programming with these technologies in mind is called 'vectorization', and the resulting code is referred to as **vectorized code**. 

GPU programming relies heavily on the same idea of applying one instruction to many data elements at once.


![pic](SIMD_cpu_diagram1.svg)


![pic](intel_pentiun_ii_mmx_logo.png)

# 2. Vectorization in Python

Python is an interpreted language, and plain Python code typically does not make use of SIMD extensions such as SSE and AVX. This can only be achieved through **libraries**.



## NumPy

NumPy is the fundamental package needed for scientific computing with Python. It offers the [n-dimensional array object](https://docs.scipy.org/doc/numpy/reference/arrays.html#arrays) or `ndarray`. 

The `ndarray` offers functionality similar to Matlab arrays.

Since NumPy is a pre-compiled library, it has **two** advantages over native Python:
- its core is written in C and compiled ahead of time, so it is faster than interpreted Python
- it can make use of the SIMD vector extensions of the CPU

In [31]:
import numpy as np

## Speedup example

In [104]:
N = 100000

In [105]:
%%timeit
# Interpreted version
for i in range(N):
    x = i**2

30.1 ms ± 596 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [106]:
%%timeit
# NumPy version
ivec = np.arange(N)
x = ivec**2

124 µs ± 3.8 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [108]:
30 / .125 # approximate speedup

240.0
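
The `%%timeit` magic only works inside IPython or Jupyter. Outside a notebook, a similar measurement could be sketched with the standard-library `timeit` module. The function names `square_loop` and `square_numpy` are illustrative and not part of the original notebook, the loop version builds a list so that the results can be compared, and the measured times will differ between machines.

```python
import timeit

import numpy as np

N = 100000


def square_loop(n):
    """Interpreted version: square each integer in a Python loop."""
    return [i**2 for i in range(n)]


def square_numpy(n):
    """NumPy version: square the whole array in one vectorized operation."""
    # int64 is requested explicitly so that large squares do not overflow
    return np.arange(n, dtype=np.int64) ** 2


# Both versions must produce the same numbers.
assert square_loop(1000) == square_numpy(1000).tolist()

# Take the best of several repeats to reduce timing noise.
t_loop = min(timeit.repeat(lambda: square_loop(N), number=10, repeat=3)) / 10
t_numpy = min(timeit.repeat(lambda: square_numpy(N), number=10, repeat=3)) / 10
print(f"loop:  {t_loop * 1e3:.2f} ms per call")
print(f"numpy: {t_numpy * 1e3:.2f} ms per call")
print(f"approximate speedup: {t_loop / t_numpy:.0f}x")
```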

## NumPy cheatsheet

### Array creation
------

command|meaning
---|---
`np.asarray()`|from list
`np.diag()`|extract the diagonal of a matrix, or build a diagonal matrix from a vector
`np.zeros()`, `np.ones()`|all 0s or 1s
`np.eye()`|identity matrix
`np.linspace(begin, end, N)`|samples domain with `N` points
`np.arange(begin, end, step)`|evenly spaced values from `begin` up to, but not including, `end`
`np.random.rand()`|uniform random values in $[0, 1)$
`np.mgrid[0:5,0:5]`|mesh grid
`np.genfromtxt()`|read array from text file
`np.save()`, `np.load()`|store and load NumPy arrays in binary format

### Other
-------

command|meaning
---|---
`np.pi`, `np.e`|$\pi$, $e$
`np.sqrt()`|square root (element wise)
`np.sin()`, `np.cos()`, `np.tan()`|trigonometric functions (element wise)
`np.power(x,y)`| $x^y$
`np.log(x)`|$\ln(x)$
`np.sum()`|sum all elements in array 
`np.sum(axis=)`|sum elements along the specified dimension(s)
`np.min()`, `np.max()`, `np.mean()`|report min/max/mean over all elements
`np.dot(x,y)`|dot product (matrix product for 2-D arrays)
`np.transpose(x)`, `x.T`|$x^T$
`np.where(cond,x,y)`|element-wise selection: `x` where `cond` is true, otherwise `y`

More commands can be found in the [NumPy Reference](https://docs.scipy.org/doc/numpy/reference/index.html).
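
A few of the commands from the tables above, tried on small inputs. This is only an illustrative sketch; the variable names and values are made up for the example.

```python
import numpy as np

# Array creation
a = np.asarray([1, 2, 3])          # from a list
z = np.zeros((2, 3))               # 2 x 3 array of zeros
I = np.eye(3)                      # 3 x 3 identity matrix
grid = np.linspace(0.0, 1.0, 5)    # 5 points, both end points included
steps = np.arange(0, 10, 2)        # end point excluded
D = np.diag([1, 2, 3])             # diagonal matrix from a 1-D input

print(grid)    # [0.   0.25 0.5  0.75 1.  ]
print(steps)   # [0 2 4 6 8]

# Reductions over all elements or along one axis
B = np.array([[1, 2, 3],
              [4, 5, 6]])
print(B.sum())         # 21
print(B.sum(axis=0))   # [5 7 9]  (sum down the columns)
print(B.sum(axis=1))   # [ 6 15]  (sum along the rows)

# np.where picks from x where cond is True, otherwise from y
clipped = np.where(B > 3, B, 0)
print(clipped)
```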


## Linear algebra with NumPy

In [73]:
A = np.array([[n+m*10 for n in range(5)] for m in range(5)])
A

array([[ 0,  1,  2,  3,  4],
       [10, 11, 12, 13, 14],
       [20, 21, 22, 23, 24],
       [30, 31, 32, 33, 34],
       [40, 41, 42, 43, 44]])

In [51]:
A.T # transpose

array([[ 0, 10, 20, 30, 40],
       [ 1, 11, 21, 31, 41],
       [ 2, 12, 22, 32, 42],
       [ 3, 13, 23, 33, 43],
       [ 4, 14, 24, 34, 44]])

In [52]:
np.diag(A) # Diagonal

array([ 0, 11, 22, 33, 44])

In [58]:
A * A # multiplication, element-wise

array([[   0,    1,    4,    9,   16],
       [ 100,  121,  144,  169,  196],
       [ 400,  441,  484,  529,  576],
       [ 900,  961, 1024, 1089, 1156],
       [1600, 1681, 1764, 1849, 1936]])

In [61]:
np.dot(A,A) # matrix product

array([[ 300,  310,  320,  330,  340],
       [1300, 1360, 1420, 1480, 1540],
       [2300, 2410, 2520, 2630, 2740],
       [3300, 3460, 3620, 3780, 3940],
       [4300, 4510, 4720, 4930, 5140]])

In [62]:
A.dot(A) # matrix product, method form

array([[ 300,  310,  320,  330,  340],
       [1300, 1360, 1420, 1480, 1540],
       [2300, 2410, 2520, 2630, 2740],
       [3300, 3460, 3620, 3780, 3940],
       [4300, 4510, 4720, 4930, 5140]])

In [78]:
np.linalg.det(A) # determinant

0.0

In [79]:
np.linalg.eigvals(A) # eigenvalues

array([ 1.14371710e+02+0.0000000e+00j, -4.37171044e+00+0.0000000e+00j,
       -5.37702895e-15+0.0000000e+00j,  3.16508106e-16+1.1399284e-15j,
        3.16508106e-16-1.1399284e-15j])

More linear algebra routines: https://docs.scipy.org/doc/numpy/reference/routines.linalg.html
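
The zero determinant above is no accident: every row of `A` equals the first row plus a multiple of ten, so the rows are linearly dependent. A short check might look like the sketch below. It uses `np.linalg.matrix_rank` and the `@` operator, which are not used elsewhere in this notebook.

```python
import numpy as np

A = np.array([[n + m * 10 for n in range(5)] for m in range(5)])

# Only two rows are linearly independent, hence det(A) = 0
# and only two eigenvalues differ noticeably from zero.
rank = np.linalg.matrix_rank(A)
print(rank)  # 2

# For 2-D arrays the @ operator gives the same matrix product as np.dot.
print(np.array_equal(A @ A, np.dot(A, A)))  # True
```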

## Broadcasting
The term broadcasting describes how NumPy treats arrays with different shapes during arithmetic operations. Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes. Broadcasting provides a means of vectorizing array operations so that looping occurs in C instead of Python ([source](https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html)).

In [112]:
A + 1 # adding a scalar, works

array([[ 1,  2,  3,  4,  5],
       [11, 12, 13, 14, 15],
       [21, 22, 23, 24, 25],
       [31, 32, 33, 34, 35],
       [41, 42, 43, 44, 45]])

In [121]:
rowmean = A.mean(axis=1)
rowmean

array([ 2., 12., 22., 32., 42.])

In [128]:
A - rowmean # subtracting a vector WRONG: rowmean[j] is subtracted from column j

array([[ -2., -11., -20., -29., -38.],
       [  8.,  -1., -10., -19., -28.],
       [ 18.,   9.,   0.,  -9., -18.],
       [ 28.,  19.,  10.,   1.,  -8.],
       [ 38.,  29.,  20.,  11.,   2.]])

In [130]:
A - rowmean[:,np.newaxis] # subtracting a vector GOOD: rowmean[i] is subtracted from row i

array([[-2., -1.,  0.,  1.,  2.],
       [-2., -1.,  0.,  1.,  2.],
       [-2., -1.,  0.,  1.,  2.],
       [-2., -1.,  0.,  1.,  2.],
       [-2., -1.,  0.,  1.,  2.]])
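
Broadcasting compares shapes starting from the last dimension; two dimensions are compatible when they are equal or when one of them is 1. The sketch below prints the shapes involved in the example above and shows `keepdims=True` as one possible alternative to `np.newaxis`. The names `rowmean_col` and `centered` are introduced here for illustration only.

```python
import numpy as np

A = np.array([[n + m * 10 for n in range(5)] for m in range(5)])
rowmean = A.mean(axis=1)

print(rowmean.shape)                 # (5,)   treated as one row of 5 values
print(rowmean[:, np.newaxis].shape)  # (5, 1) one column of 5 values

# keepdims=True keeps the reduced axis with length 1,
# so the result is stretched across the columns without np.newaxis.
rowmean_col = A.mean(axis=1, keepdims=True)
centered = A - rowmean_col

# Every row of the centered matrix now has zero mean.
print(centered.mean(axis=1))  # [0. 0. 0. 0. 0.]
```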

## Limitations of ndarray

Unlike a Python list, an ndarray can hold only one datatype. Automatic conversions take place when a list or tuple is converted to an ndarray.

In [42]:
x = (4, 5.0)
y = np.array(x)
x, y

((4, 5.0), array([4., 5.]))

In [41]:
type(x[0]) ,type(y[0])

(int, numpy.float64)
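
The single datatype also matters when values are assigned into an existing array. The following sketch, with illustrative variable names, shows the conversion on creation, an explicitly requested `dtype`, and the silent truncation that happens when a float is stored in an integer array.

```python
import numpy as np

# Mixed int/float input is upcast to float
print(np.array([4, 5.0]).dtype)        # float64

# The dtype can be requested explicitly
ints = np.array([4, 5], dtype=np.int64)

# Assigning a float into an integer array truncates it silently
ints[0] = 2.7
print(ints)                            # [2 5]

# astype() returns a converted copy and leaves the original unchanged
floats = ints.astype(np.float64)
floats[0] = 2.7
print(floats, ints)
```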

## Exercise: mask arrays

In the context of NumPy, a **mask** is a boolean array that has the same shape as your data array. Masks are useful for selecting elements in an array or matrix.

**Example.** In climate science, output is usually generated on a latitude-longitude grid with dimensions $N \times M$. Some points of the grid will be part of the land, and some part of the ocean. We can use a land-mask (`mask_land`) to distinguish between the two. Given a field named `precip` that represents precipitation, we'd have

expression|selection
---|---
`precip[:,:]`| all points
`precip[mask_land]`| only land points
`precip[~mask_land]`| only ocean points

A mask can be created by one of the comparison operators `<`, `>`, `>=`, `<=` , `==`, `!=`. For instance, to select all points that have precipitation $> 100$ mm, we could say `mask_100mm = precip > 100`. 

Arrays may be indexed by masks. 
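
A minimal sketch of the land-mask example on a tiny 2 × 3 grid. The precipitation values and the mask are made up for illustration, and `wet_land` is a hypothetical name.

```python
import numpy as np

# Hypothetical precipitation field in mm
precip = np.array([[120.0,  80.0,  30.0],
                   [ 10.0, 150.0,  95.0]])

# Hypothetical land mask: True over land, False over ocean
mask_land = np.array([[True,  True, False],
                      [False, True, False]])

print(precip[mask_land])    # [120.  80. 150.]  land points
print(precip[~mask_land])   # [30. 10. 95.]     ocean points

# A mask from a comparison; True counts as 1 in a sum
mask_100mm = precip > 100
print(mask_100mm.sum())     # 2

# Masks can be combined with & (and) and | (or); parentheses are required
wet_land = mask_land & (precip > 100)
print(precip[wet_land])     # [120. 150.]
```

Note that indexing with a mask returns a flat 1-D array of the selected values, not a 2-D array with holes.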

**Do it Yourself**
1. Create a 5 × 5 matrix `M` filled with random elements, drawn from a uniform distribution over the interval $[−1, 1)$. Use `np.random.rand()`, which draws from the interval $[0, 1)$.
2. Create a mask `mymask` that is equal to `True` where $−0.2 < M_{i,j} \leq 0.2$, and `False` elsewhere.
3. Count the number of $M_{i,j}$ where $−0.2 < M_{i,j} \leq 0.2$ using `np.sum()`. Check by hand.
4. Create a new matrix `P = np.zeros(M.shape)`. Now set $P_{i,j} = M_{i,j}$ where $M_{i,j} > 0$ using a mask.
5. Create a function `blankSubzeroValues()` that takes a matrix as input. The function then sets any values that are < 0 to zero and returns the result. Test by comparing `Q = blankSubzeroValues(M)` to `P`.
6. Extend this function as follows. Use `print()` to report the message:
     `X values have been blanked`
where `X` is of course the actual number of entries that have been set to zero. Only report this message if 2 or more values have been blanked.

## Solutions
Exercise 3.6
https://nbviewer.jupyter.org/github/lvankampenhout/NM2018-python/blob/master/Tutorial_solutions_part3.ipynb

## Other NumPy features not discussed

* soft vs hard copy
* selection and indexing

These are discussed in my notebook "introduction to Python programming" https://github.com/lvankampenhout/NM2018-python


## Further reading

* NumPy reference https://docs.scipy.org/doc/numpy/reference/index.html
* https://realpython.com/numpy-array-programming/