<img src="images/inmas.png" width=130x align='right' />

# Notebook 16 - More Advanced NumPy
Material covered in this notebook:

- Basic statistical functions
- More array manipulations
- Linear algebra with NumPy

### Prerequisite
Notebook 15

----

### Housekeeping for matplotlib

Let's first import matplotlib and make sure that plots are embedded in the notebook:

In [None]:
%matplotlib inline 

import matplotlib.pyplot as plt
import numpy as np

### Basic Statistical Functions
NumPy provides a number of functions to calculate statistics of datasets stored in arrays.

For example, let's calculate some properties from the Stockholm temperature dataset used before:

In [None]:
data = np.genfromtxt('./data/stockholm_td_adj.csv', delimiter=',', skip_header=1)
np.shape(data)

The average temperature can be computed by taking the mean of column 3:

In [None]:
np.mean(data[:,3])

The daily mean temperature in Stockholm over the last 200 years has been about 5.8 C.

### Standard deviation and variance
Let's now compute the standard deviation and the variance of the temperature with `np.std` and `np.var`:

In [None]:
print('std deviation is %.3e and the variance %.3e.' % (np.std(data[:,3]), np.var(data[:,3])))
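The relation between the two quantities can be checked on a small array. The sketch below is only an illustration: the values in `temps` are made up and are not taken from the Stockholm dataset. It also shows the optional `ddof` argument, which switches from the population estimate (the default, `ddof=0`) to the sample estimate.

```python
import numpy as np

# Hypothetical sample values, used only for illustration
temps = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

std_pop = np.std(temps)             # population standard deviation (ddof=0, the default)
var_pop = np.var(temps)             # population variance, equal to std_pop ** 2
std_sample = np.std(temps, ddof=1)  # sample standard deviation (divides by N - 1)

print('population std: %.3f, variance: %.3f' % (std_pop, var_pop))
print('sample std: %.3f' % std_sample)
```

For large datasets such as the temperature series, the difference between the two estimates is negligible.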

### min and max
The minimum and maximum values can be computed either through array methods or NumPy functions:

In [None]:
print('The lowest and highest daily temperatures are %.1fC and %.1fC.' % (data[:,3].min(), data[:,3].max()))
print('The lowest and highest daily temperatures are %.1fC and %.1fC.' % (np.min(data[:,3]), np.max(data[:,3])))
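If we also want to know when the extremes occurred, `np.argmin` and `np.argmax` return the index of the extreme value instead of the value itself. A minimal sketch, assuming a small made-up array `sample` that mimics the layout year, month, day, temperature:

```python
import numpy as np

# Hypothetical rows in the layout: year, month, day, temperature
sample = np.array([
    [1800, 1, 1, -6.1],
    [1800, 1, 2, -15.4],
    [1800, 7, 1, 18.3],
    [1800, 7, 2, 16.0],
])

i_min = np.argmin(sample[:, 3])   # row index of the lowest temperature
i_max = np.argmax(sample[:, 3])   # row index of the highest temperature

print('coldest day (year, month, day):', sample[i_min, :3])
print('warmest day (year, month, day):', sample[i_max, :3])
```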

### Computations on subsets of arrays
- Subsets of the data can be extracted from an array using indexing, fancy indexing, and masking
- We now cover additional methods to extract data from an array

For example, let's go back to the temperature dataset:

In [None]:
# On macOS and Linux uncomment the following line
# !head -n 3 data/stockholm_td_adj.csv
# On Windows use the following commands
!type data\stockholm_td_adj.csv

The data format is: year, month, day, daily average temperature

If we are interested in the average temperature only in a particular month, say February, we can create an index mask and use it to select only the data for that month.

### Other masking techniques
Let's say we want to make sure that months are all between 1 and 12:

In [None]:
months = np.unique(data[:,1]) # the month column takes values from 1 to 12
print('months values are all amongst', months)

If one is interested in the mean temperature in February, a mask for that month can be created:

In [None]:
mask_feb = (data[:,1] == 2)
print('The mean temperature in February is %.2fC.' % np.mean(data[mask_feb,3]))

This can also be done in a single line, demonstrating the power of NumPy:

In [None]:
print('The mean temperature in February is %.2fC.' % np.mean(data[(data[:,1] == 2), 3]))
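Masks can also be combined to select several months at once. The sketch below uses a small made-up array `sample` (an assumption, not the real dataset) with the same column layout. Conditions are combined element-wise with `&` (and) and `|` (or), and each condition must be wrapped in parentheses.

```python
import numpy as np

# Hypothetical rows: year, month, day, temperature
sample = np.array([
    [1800, 1, 15, -5.0],
    [1800, 2, 15, -3.0],
    [1800, 6, 15, 14.0],
    [1800, 7, 15, 17.0],
    [1800, 8, 15, 16.0],
    [1800, 12, 15, -1.0],
])

# June to August: both conditions must hold
mask_summer = (sample[:, 1] >= 6) & (sample[:, 1] <= 8)
# December to February: either condition may hold
mask_winter = (sample[:, 1] == 12) | (sample[:, 1] <= 2)

summer_mean = np.mean(sample[mask_summer, 3])
winter_mean = np.mean(sample[mask_winter, 3])
print('summer mean: %.2fC, winter mean: %.2fC' % (summer_mean, winter_mean))
```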

### Calculations along axes of higher-dimensional data
- When functions such as `min`, `max`, `sum`, etc. are applied to a multidimensional array, it is sometimes useful to apply the calculation to the entire array, and sometimes only on a row or column basis

- Using the `axis` argument we can specify how these functions should behave: 


In [None]:
A = np.random.rand(3, 3)
print('A random square array\n', A)
print('The global maximum is', np.max(A))
print('The maximum in each column is', np.max(A, axis=0))
print('The maximum in each row is', np.max(A, axis=1))

Many other functions and methods in the `ndarray` and `matrix` classes accept the same (optional) `axis` keyword argument
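A useful way to remember the convention is that `axis` names the dimension that is collapsed by the calculation. The sketch below illustrates this with `sum` and `mean` on a small array with known values; the `keepdims=True` option keeps the collapsed axis with length 1, so that the result can be combined with the original array.

```python
import numpy as np

A = np.arange(6).reshape(2, 3)    # [[0, 1, 2], [3, 4, 5]]

col_sums = A.sum(axis=0)          # collapses the rows: one value per column (3, 5, 7)
row_sums = A.sum(axis=1)          # collapses the columns: one value per row (3, 12)
row_means = A.mean(axis=1, keepdims=True)  # shape (2, 1) instead of (2,)

print('column sums:', col_sums)
print('row sums:', row_sums)
print('rows with their mean subtracted:\n', A - row_means)
```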

### Assignment and copies
- To achieve high performance, assignments in NumPy usually do not copy the underlying objects
- Also, when objects are passed between functions, a reference is passed to avoid an excessive amount of memory copying
- This behavior is very similar to that of lists and dictionaries

Let's consider a simple assignment:

In [None]:
A = np.array([[1, 2], [3, 4]])
print('A is a 2x2 array:\n', A)
B = A                            # Now B is referring to the same array data as A
print('B is the same 2x2 array:\n', B)

Now we make some changes to `B`:

In [None]:
B[0, 0] = 10                     # Changing B affects A
print('B after changing B_00 = 10:\n', B)

Changes have also affected `A` as they are the same object!

In [None]:
print('A after changes made on B\n', A)
print("No surprise since A's and B's ids are identical:", id(A), id(B))

### Copying arrays
- Sometimes we want `B` to be a new, completely independent object copied from `A`
- This is done with the function `np.copy()`:

In [None]:
B = np.copy(A)
print('A is a 2x2 array:\n', A)
print('B is a 2x2 array:\n', B)

Now, if we modify `B`, `A` is not affected:

In [None]:
B[0, 0] = -5
print('B after changes made on B is:\n', B)
print('A after changes made on B is:\n', A)

In [None]:
print("No surprise since A's and B's ids are different:", id(A), id(B))
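Comparing ids only tells us whether two names refer to the same array object. Two different array objects can still share the same underlying data, which is what happens with basic slicing. The sketch below uses `np.shares_memory` to check this; the variable names `row_view` and `row_copy` are chosen here for illustration.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])

row_view = A[0, :]          # basic slicing returns a view on A's data
row_copy = A[0, :].copy()   # an explicit, independent copy

print('view shares memory with A:', np.shares_memory(A, row_view))
print('copy shares memory with A:', np.shares_memory(A, row_copy))

row_view[0] = 99            # also changes A[0, 0]
row_copy[1] = -1            # leaves A untouched
print('A after the changes:\n', A)
```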

### Iterating over array elements
- We want to avoid iterating over the elements of arrays whenever we can
- Iterations are slow compared to vectorized operations
- In some cases, however, iterations are necessary

For such cases, the Python `for` loop is the most convenient way to iterate over an array:

In [None]:
v = np.array([1, 2, 3, 4, 5, 6])
for element in v[:-1]:
    print(element, end=', ')
print(v[-1])
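To make the contrast between iteration and vectorized operations concrete, the sketch below computes the same sum of squares both ways. On an array this small the speed difference does not matter; it grows with the array size.

```python
import numpy as np

v = np.arange(1, 7)           # [1, 2, 3, 4, 5, 6]

# Explicit loop: one Python-level operation per element
total_loop = 0
for element in v:
    total_loop += element ** 2

# Vectorized: the same computation in a single NumPy expression
total_vec = np.sum(v ** 2)

print(total_loop, total_vec)
```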

When converted to a list, a multi-dimensional array becomes a list of arrays:

In [None]:
M = np.array([[1,2], [3,4]])
listofM = list(M)
print(listofM)

Iterating over a multi-dimensional array yields its rows one at a time:

In [None]:
M = np.array([[1,2], [3,4]])
for row in M:
    print("row", row)
    
    for element in row:
        print(element)

We needed a second, nested loop because iterating over an N-dimensional array yields objects of N - 1 dimensions: here, each row of the 2-D array `M` is a 1-D array.

### Using `enumerate()`
- When iterating over each element of an array, it is convenient to use the `enumerate` function to obtain both the element and its index in the `for` loop:

In [None]:
for row_idx, row in enumerate(M):
    print("row_idx", row_idx, "row", row)
    for col_idx, element in enumerate(row):
        print("col_idx", col_idx, "element", element)
        M[row_idx, col_idx] = element ** 2                    # update array M: square each element

Each element in M is now squared:

In [None]:
M
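As an alternative to nested `enumerate` loops, `np.ndenumerate` yields the full index tuple together with each element, so a single loop covers all dimensions. The sketch below writes into a separate array `squared_loop` (a name chosen for this example) and compares the result with the vectorized form, which is preferable whenever it is available.

```python
import numpy as np

M = np.array([[1, 2], [3, 4]])
squared_loop = np.empty_like(M)

# np.ndenumerate yields ((row, col), value) pairs
for (row_idx, col_idx), element in np.ndenumerate(M):
    squared_loop[row_idx, col_idx] = element ** 2

squared_vec = M ** 2          # vectorized equivalent
print(np.array_equal(squared_loop, squared_vec))
```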

### Using arrays in conditions
When using arrays in conditions, for example in `if` statements and other boolean expressions, one needs to use `any` or `all`, which require that any or all elements in the array evaluate to `True`:

In [None]:
M

In [None]:
if (M > 5).any():
    print("at least one element in M is larger than 5")
else:
    print("no element in M is larger than 5")

In [None]:
if (M > 5).all():
    print("all elements in M are larger than 5")
else:
    print("not all elements in M are larger than 5")
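The reason `any` or `all` is needed is that a comparison on an array yields one boolean per element, and NumPy refuses to reduce them to a single truth value implicitly. The sketch below shows the resulting `ValueError` and, as an additional option, counts the matching elements with `np.count_nonzero`. The flag `raised` is only there to record that the error occurred.

```python
import numpy as np

M = np.array([[1, 4], [9, 16]])

raised = False
try:
    if M > 5:                 # ambiguous: the comparison yields four booleans, not one
        print("not reached")
except ValueError as err:
    raised = True
    print("ValueError:", err)

n_large = np.count_nonzero(M > 5)   # number of elements satisfying the condition
print(n_large, "elements in M are larger than 5")
```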

### Reshaping, resizing, and stacking arrays
- The shape of a NumPy array can be modified without copying the underlying data, which makes it a fast operation even for large arrays

Let's look at some examples:

In [None]:
A

In [None]:
n, m = A.shape

In [None]:
B = A.reshape((1, n*m))
B

In [None]:
B[0,0:5] = 5 # modify the array

B

In [None]:
A # and the original variable is also changed. B is only a different view of the same data
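Two details of `reshape` are worth a short sketch. First, one dimension can be given as `-1`, in which case NumPy infers it from the total number of elements. Second, a view is only possible when the memory layout allows it: reshaping a transposed array, for example, produces a copy. `np.shares_memory` is used below to tell the two cases apart.

```python
import numpy as np

A = np.arange(6).reshape(2, 3)      # [[0, 1, 2], [3, 4, 5]]

B = A.reshape(-1, 2)                # -1 is inferred as 3, giving shape (3, 2)
print('shape of B:', B.shape)
print('B is a view of A:', np.shares_memory(A, B))

At = A.T.reshape(-1)                # the transpose is not contiguous, so this copies
print('At is a view of A:', np.shares_memory(A, At))
```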

### Flattening arrays
We can also use the method `flatten()` to turn a higher-dimensional array into a vector. Unlike `reshape`, this method always creates a copy of the data.

In [None]:
B = A.flatten()
B

In [None]:
B[0:5] = 10
B

In [None]:
A # now A has not changed, because B's data is a copy of A's, not referring to the same data
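When a copy is not wanted, the related method `ravel()` returns a view whenever the memory layout allows it. A minimal sketch comparing the two, with variable names chosen for this example:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])

flat_copy = A.flatten()       # always a copy
flat_view = A.ravel()         # a view here, because A is contiguous in memory

flat_copy[0] = 10             # A is unchanged
flat_view[1] = 20             # A[0, 1] changes too
print(A)
```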

### Stacking and repeating arrays
- Using functions `repeat`, `tile`, `vstack`, `hstack`, and `concatenate` we can create larger vectors and matrices from smaller ones

Let's look at `tile()` and `repeat()`:

In [None]:
a = np.array([[1, 2], [3, 4]])

In [None]:
# repeat each element 3 times
np.repeat(a, 3)

In [None]:
# tile the matrix 3 times 
np.tile(a, 3)
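Both functions accept further arguments that control the layout of the result. The sketch below shows that `repeat` flattens its input unless an `axis` is given, and that `tile` accepts a tuple of repetitions, one per dimension.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

rep_flat = np.repeat(a, 2)            # no axis: flattened result with 8 elements
rep_rows = np.repeat(a, 2, axis=0)    # each row repeated twice, shape (4, 2)
tile_cols = np.tile(a, 2)             # copies laid side by side, shape (2, 4)
tile_grid = np.tile(a, (2, 2))        # 2x2 grid of copies, shape (4, 4)

for name, arr in [('rep_flat', rep_flat), ('rep_rows', rep_rows),
                  ('tile_cols', tile_cols), ('tile_grid', tile_grid)]:
    print(name, arr.shape)
```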

### Concatenating arrays
Arrays can be concatenated together using the function `concatenate()`:

In [None]:
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])

In [None]:
np.concatenate((a, b), axis=0)

In [None]:
np.concatenate((a, b.T), axis=1)

### Arrays can be stacked horizontally or vertically
Using `hstack()` and `vstack()`, arrays can be stacked as follows:

In [None]:
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])

In [None]:
np.vstack((a, b))

In [None]:
np.hstack((a, b.T))
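For 2-D arrays such as these, `vstack` and `hstack` give the same results as `concatenate` along axis 0 and axis 1. The sketch below checks this and also shows what happens when the shapes do not match along the other axis, which is why `b` has to be transposed for the horizontal case.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])

same_v = np.array_equal(np.vstack((a, b)), np.concatenate((a, b), axis=0))
same_h = np.array_equal(np.hstack((a, b.T)), np.concatenate((a, b.T), axis=1))
print(same_v, same_h)

try:
    np.hstack((a, b))         # shapes (2, 2) and (1, 2) differ along axis 0
except ValueError as err:
    print("ValueError:", err)
```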

### Stacking along a new axis

In [None]:
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

stacked_vertical = np.vstack((arr1, arr2))
print('vertical -> new shape:', stacked_vertical.shape, '\n', stacked_vertical)

stacked_horizontal = np.hstack((arr1, arr2))
print('horizontal -> new shape:', stacked_horizontal.shape, '\n', stacked_horizontal)

stacked_new_axis = np.stack((arr1, arr2), axis=1)
print('new axis -> new shape:', stacked_new_axis.shape, '\n', stacked_new_axis)
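The `axis` argument of `np.stack` gives the position of the new axis in the result. A short sketch comparing `axis=0` and `axis=1` for the same two 1-D arrays:

```python
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

s0 = np.stack((arr1, arr2), axis=0)   # shape (2, 3): the inputs become rows
s1 = np.stack((arr1, arr2), axis=1)   # shape (3, 2): the inputs become columns

print(s0.shape, s1.shape)
print(np.array_equal(s0, np.vstack((arr1, arr2))))   # same as vstack for 1-D inputs
print(np.array_equal(s1, s0.T))                      # the two results are transposes
```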

### Key Points
- As with lists and dictionaries, array assignment does not copy the array
- Avoid using for loops on arrays
    - when necessary, use enumerate
- Slicing and masking are powerful techniques for avoiding loops
- NumPy arrays support basic statistical operations
- The `axis` keyword can select specific axes
- Arrays can be concatenated, tiled, repeated, and stacked


### Further Reading
- NumPy reference [manual](https://numpy.org/doc/stable/reference/)

### What's Next?
- Complete the exercises in the associated exercise notebook [X-16-NumPy3.ipynb](X-16-NumPy3.ipynb)
- Next notebook is [N-17-Matplotlib.ipynb](N-17-Matplotlib.ipynb)