# Computer Class 0b - The NumPy Library

The examples and exercises of this computer class introduce the student to working with the NumPy library. They can be used in conjunction with chapter 4 of the McKinney book.

*Authors: Cees Diks and Bram Wouters, Faculty of Economics and Business, University of Amsterdam (UvA)* <br>
*Copyright (C): UvA (2023)* <br>
*Credits: some of the examples and formulations are taken from McKinney and/or the material of the Computational Finance course by Simon Broda (UvA)*

## Modules

Apart from the built-in objects, functions, etc. of the Python Standard Library, packages and modules for additional functionality need to be imported. Most of the packages/modules relevant for this course come preinstalled with Anaconda.


**Example:** importing modules can be done with `import`. There are some conventions for the shorthands of some packages (e.g., `np` for `numpy`). Following them improves code readability. For the same reason, it is good practice to put your `import` statements at the beginning of your document (which we didn't do in this notebook).

In [1]:
import math  # Importing the math module of the Python Standard Library.
import numpy as np # Importing the NumPy module and giving it the conventional shorthand name np.

print(math.factorial(7)) # Calling functions from the math module requires the "math." prefix.

# Note that the following functions are not the same functions, because they are defined in different modules.
print(2**(1/2)) # Using Python's basic arithmetic functionality
print(math.sqrt(2)) # Using the square root function of math
print(np.sqrt(2)) # Using the square root function of NumPy

5040
1.4142135623730951
1.4142135623730951
1.4142135623730951


**Exercise 1:** you can use *tab completion* to discover which functions are defined by the math module: type `math.` and press the Tab key. Alternatively, use `dir(math)`. Try both options!

**Example:** note that importing the package/module does not bring the functions into the *global namespace*: they need to be called as `module.function()`. It is possible to bring a function into the global namespace (see example below). Heavy use of this option is discouraged, because it can lead to confusion and/or conflicting function names.

In [2]:
from math import factorial # Importing the math-function factorial into the global namespace

factorial(7)

5040
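The risk of conflicting function names can be illustrated with a small sketch. It assumes a hypothetical array `values`; `math.sqrt` and `np.sqrt` share a name but behave differently, which is why keeping the module prefix is helpful.

```python
import math
import numpy as np

values = np.array([1.0, 4.0, 9.0])  # hypothetical example data

# The NumPy version works element-by-element on an array.
roots = np.sqrt(values)
print(roots)  # [1. 2. 3.]

# The math version only accepts a single number; passing an array with
# more than one element raises a TypeError.
try:
    math.sqrt(values)
    math_accepts_arrays = True
except TypeError:
    math_accepts_arrays = False

print(math_accepts_arrays)  # False
```

If both functions were imported into the global namespace as `sqrt`, the last import would silently replace the first.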

## NumPy's ndarray

Arguably the most important object of the NumPy package is its N-dimensional array object, or ndarray. Using them is efficient in two ways:
* they are easy/intuitive to use and very flexible, meaning that there is an implementation for most operations you can think of.
* they are computationally efficient (i.e. much faster than Python's standard objects, like lists and tuples).
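As a minimal sketch of the first point (the names `numbers` and `arr` are chosen here for illustration only): squaring every element of a list requires an explicit loop, whereas an ndarray applies the operation to all elements at once.

```python
import numpy as np

numbers = list(range(1, 6))  # hypothetical example data: [1, 2, 3, 4, 5]
arr = np.array(numbers)

# With a list, an explicit loop (here a list comprehension) is needed.
squares_list = [x**2 for x in numbers]

# With an ndarray, the operation is applied to all elements at once.
squares_arr = arr**2

print(squares_list)  # [1, 4, 9, 16, 25]
print(squares_arr)   # [ 1  4  9 16 25]
```

To get a feeling for the second point, you could time both versions on a much larger input, for instance with the `%timeit` magic command in a notebook.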

**Example:** the NumPy function `array` converts the input data (e.g. list, tuple, array, or other sequence type) to an ndarray.

In [3]:
arr1 = np.array([1.0, 5.1, -8.9, 0.2])
arr2 = np.array([[1,2,3,4],[5,6,7,8]])

arr1

array([ 1. ,  5.1, -8.9,  0.2])

In [4]:
arr2

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

**Example:** the objects `arr1` and `arr2` are of ndarray type.

In [5]:
print(isinstance(arr1, np.ndarray))

type(arr2)

True


numpy.ndarray

**Example:** every ndarray has the attributes `ndim` (number of dimensions), `shape` (size along each dimension) and `dtype` (type of the data it contains).

In [6]:
print(arr1.ndim)
print(arr1.shape)
print(arr1.dtype)

print(arr2.ndim)
print(arr2.shape)
print(arr2.dtype)

1
(4,)
float64
2
(2, 4)
int64


**Example:** an ndarray is a container for homogeneous data, meaning that all elements must be of the same type. It is important to keep this in mind, because NumPy automatically converts the input to a common type when it is loaded into an ndarray. Using the `dtype` keyword in the `array` function, you can override this.

In [7]:
np.array([1, 2, 3, 4.0])

array([1., 2., 3., 4.])

In [8]:
np.array([1, 2, 3, 4.0], dtype=int)

array([1, 2, 3, 4])

In [9]:
np.array([1, 2, 3, '4'])

array(['1', '2', '3', '4'], dtype='<U21')

In [10]:
np.array([1, 2, 3, '4'], dtype=np.float64) # float64 is a NumPy object of double-precision floating-point format

array([1., 2., 3., 4.])
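A conversion can also be applied to an existing ndarray with the method `astype`. The sketch below uses a hypothetical array `mixed`; note that converting floats to integers discards the fractional part rather than rounding.

```python
import numpy as np

mixed = np.array([1, 2, 3, 4.7])  # hypothetical example data
print(mixed.dtype)  # float64: the integers were converted to floats

# astype returns a new array with the requested type.
as_int = mixed.astype(int)
print(as_int)  # [1 2 3 4]: 4.7 is truncated to 4, not rounded to 5
```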

**Example:** other typical ways of creating an ndarray.

In [11]:
print(np.zeros((2,3))) # 2x3 ndarray of zeros (floats).
print(np.ones((1,8))) # 1x8 ndarray of ones (floats).

print(np.identity(5)) # 5x5 identity matrix (floats).

print(np.arange(1, 15, 2)) # Using the arange function with (start, stop[, step]).

arr3 = np.random.randn(2,4) # Using NumPy's random number generator to create a 2x4 ndarray.

print(arr3)

[[0. 0. 0.]
 [0. 0. 0.]]
[[1. 1. 1. 1. 1. 1. 1. 1.]]
[[1. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0.]
 [0. 0. 1. 0. 0.]
 [0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 1.]]
[ 1  3  5  7  9 11 13]
[[ 0.95048318 -1.32839753 -0.31033462 -0.65608246]
 [-0.78131663  0.0540972   0.4330958   1.60287434]]


**Exercise 2:** in contrast to Python's list objects, arithmetic operations on an ndarray are executed element-by-element. Use this to create and print the following ndarrays:
* element-by-element sum of `arr2` and `arr3`.
* element-by-element product of `arr2` and `arr3`.
* elements of `arr3` multiplied by a factor of 10.
* elements of `arr2` to the power 4.

* one divided by the elements of `arr2`.
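The contrast with list objects can be seen in a small sketch. The names `list_a`, `list_b`, `arr_a` and `arr_b` are hypothetical and not related to the arrays of the exercise.

```python
import numpy as np

list_a = [1, 2, 3]
list_b = [10, 20, 30]
arr_a = np.array(list_a)
arr_b = np.array(list_b)

print(list_a + list_b)  # [1, 2, 3, 10, 20, 30]: + concatenates lists.
print(arr_a + arr_b)    # [11 22 33]: + adds ndarrays element-by-element.

print(list_a * 2)       # [1, 2, 3, 1, 2, 3]: * repeats a list.
print(arr_a * 2)        # [2 4 6]: * multiplies every element of an ndarray.
```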

## Indexing and slicing

The basics of selecting elements or slices of an ndarray are the same as for other sequence types. Most of the time (the exception is so-called "fancy indexing"), the extra dimensions are separated by a comma between the brackets `[]`. For the exercises in this subsection, you may want to use pp. 94-105 of McKinney.
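A minimal sketch of the comma notation, using a hypothetical 3x4 array `grid` (keep in mind that indices start at 0):

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)  # hypothetical 3x4 example array
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

single = grid[1, 2]    # Element at row index 1, column index 2.
row = grid[0]          # The row with index 0.
block = grid[:2, 1:3]  # Rows with index 0 and 1, columns with index 1 and 2.
picked = grid[[0, 2]]  # Fancy indexing: the rows with index 0 and 2.

print(single)  # 6
print(row)     # [0 1 2 3]
print(block)
print(picked)
```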

In [12]:
arr1D = np.arange(1,11)
arr2D = np.arange(1,25).reshape(4,6)
arr3D = np.arange(1,13).reshape(2,2,3)

print('arr1D:\n {0}'.format(arr1D))
print('arr2D:\n {0}'.format(arr2D))
print('arr3D:\n {0}'.format(arr3D))

arr1D:
 [ 1  2  3  4  5  6  7  8  9 10]
arr2D:
 [[ 1  2  3  4  5  6]
 [ 7  8  9 10 11 12]
 [13 14 15 16 17 18]
 [19 20 21 22 23 24]]
arr3D:
 [[[ 1  2  3]
  [ 4  5  6]]

 [[ 7  8  9]
  [10 11 12]]]


**Exercise 3:** use the arrays `arr1D`, `arr2D` and `arr3D` defined above to print the following objects.
* for `arr1D`: the fifth element.
* for `arr1D`: the array containing (only) the fifth element.
* for `arr1D`: the array containing the third until the ninth element.
* for `arr1D`: the array containing the first, second and seventh element. (see p.103 of McKinney)


* for `arr2D`: the element in the second row and third column.
* for `arr2D`: the array that is the fourth row.
* for `arr2D`: the array containing the second and the third row.
* for `arr2D`: the array containing the first and third row.
* for `arr2D`: the array containing the first and second row, but only the third until the fifth column thereof.
* for `arr2D`: the array containing the fifth and sixth column.
* for `arr2D`: the array containing the fifth and sixth column, which are represented as rows. (Hint: use `transpose()`, or simply `T`)
* for `arr2D`: the 1-dimensional array containing the elements whose values are 1, 22, 12 and 3. (i.e., the output should be `[1,22,12,3]`) (see p.103 of McKinney, use fancy indexing)
* for `arr2D`: the array containing the first, second and fourth row, but only the fourth and second column thereof (in that order). (see p.103 of McKinney)


* for `arr3D`: the first 2-dimensional array of `arr3D`.
* for `arr3D`: the 2-dimensional array that looks like `[[5,6],[11,12]]`.
* for `arr3D`: the 1-dimensional array containing the elements whose values are 1, 5 and 9 (in that order). (see p.103 of McKinney, use fancy indexing)

**Example:** assigning a single value to a slice of an ndarray propagates that value to the entire selection (the value is said to be "broadcast").

In [13]:
arr = np.arange(10)

arr[3:6] = [10, 10, 10]
print(arr)

arr[3:6] = 12 # Broadcasting to entire selection
print(arr)

[ 0  1  2 10 10 10  6  7  8  9]
[ 0  1  2 12 12 12  6  7  8  9]


**Example:** there is a subtle difference between slicing an ndarray and slicing other sequence types (like a list). A slice of an array is a "view" on the original array. This means that the data is not copied, and any modifications to the view will be reflected in the original array (and vice versa). This is in contrast with a list object for example, in which case a slice is a copy of the original data and hence a new object in the memory of your machine. (see McKinney pp. 94-95 for more information)

In the example below we take slices of a list and an ndarray, change the originals and inspect what happens to the slices. For completeness, we also included the corresponding elements (as opposed to slices). Elements are always copies, regardless of the sequence type.

N.B. If you want a slice to be a copy of the data instead of a view, you can use the ndarray method `copy`.

In [14]:
list0 = [0,1,2,3]
array0 = np.array(list0)

list_element = list0[2]
array_element = array0[2]
list_slice = list0[2:3]
array_slice = array0[2:3]

list0[2] += 10
array0[2] += 10

print('list: {0}'.format(list0))
print('array: {0}'.format(array0))
print('list_element: {0}'.format(list_element))
print('array_element: {0}'.format(array_element))
print('list_slice: {0}'.format(list_slice))
print('array_slice: {0}'.format(array_slice))

list: [0, 1, 12, 3]
array: [ 0  1 12  3]
list_element: 2
array_element: 2
list_slice: [2]
array_slice: [12]
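The effect of the `copy` method mentioned in the N.B. above can be sketched as follows. The names `original`, `view` and `independent` are hypothetical; `np.shares_memory` is used here only to check whether two arrays refer to the same data.

```python
import numpy as np

original = np.arange(5)             # [0 1 2 3 4]
view = original[1:4]                # A view: shares data with original.
independent = original[1:4].copy()  # A copy: has its own data.

original[2] = 99

print(view)         # [ 1 99  3]: the view follows the change.
print(independent)  # [1 2 3]: the copy is unaffected.

print(np.shares_memory(original, view))         # True
print(np.shares_memory(original, independent))  # False
```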


## Boolean indexing

**Example:** suppose 4 people participate three times in a series of 5 experiments. The names of the participants (in order of participation) are in `names`, their test results are in `scores`. Putting an ndarray in a logical statement produces an array with boolean values. Subsequently, this can be used to manipulate (e.g. filter) other ndarray objects.

In [15]:
names = np.array(['Bob', 'Jane', 'Will','Bob','Mary','Mary','Mary','Jane','Jane','Will','Will','Bob'])

scores = np.random.randn(12,5)

print(scores, '\n')

print(names == 'Bob') # An array in a logical statement produces an array with boolean values.

scores[names == 'Bob'] # Selecting the test results of Bob.

[[-0.37394261  0.95842531  1.91995986  0.63723933 -0.89034841]
 [-2.14304098 -0.01535365 -1.63128078  0.53406586 -0.58365284]
 [-1.56363108  0.64181971  0.81384607  0.5943425  -0.91074869]
 [ 1.18539447  0.57578773  0.79923561  0.3232153  -1.11880407]
 [ 1.11524528 -0.05305181 -0.38754277 -1.20608234  0.76470202]
 [ 0.54019374 -0.62588508  0.86108819  1.21899344  0.43297839]
 [ 0.53455918 -0.32528575  0.98036348 -0.11744094 -0.9780744 ]
 [-0.63393008  0.45723257 -1.3898691  -0.72202899 -1.92453906]
 [-0.05258662 -0.02008933  0.3653588   0.9310195   0.46555773]
 [ 0.06666344 -0.45005783  0.48966905  1.42031658 -1.28594306]
 [ 1.15231323 -0.92216861  0.19006232  1.02623619  1.00688032]
 [-0.35147073 -0.81062346  0.16361772  0.97626893  0.31604747]] 

[ True False False  True False False False False False False False  True]


array([[-0.37394261,  0.95842531,  1.91995986,  0.63723933, -0.89034841],
       [ 1.18539447,  0.57578773,  0.79923561,  0.3232153 , -1.11880407],
       [-0.35147073, -0.81062346,  0.16361772,  0.97626893,  0.31604747]])

**Example**: selecting the test results of Bob of only the last two experiments of each series. Note that the slicing of the columns is performed in the same way as before.

In [16]:
scores[names == 'Bob', 3:]

array([[ 0.63723933, -0.89034841],
       [ 0.3232153 , -1.11880407],
       [ 0.97626893,  0.31604747]])

**Exercise #**: create an array with the test results of Jane and Mary. (Hint: use `|` instead of `or` to combine NumPy arrays with an OR-statement.)
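As a sketch of how boolean arrays can be combined, consider the hypothetical arrays `labels` and `values` below. Besides `|` (or), there are `&` (and) and `~` (not). The parentheses around each comparison are needed, because these operators bind more tightly than `==` and `>`.

```python
import numpy as np

labels = np.array(['a', 'b', 'c', 'a', 'b'])  # hypothetical example data
values = np.array([10, 20, 30, 40, 50])

a_or_c = (labels == 'a') | (labels == 'c')
print(values[a_or_c])       # [10 30 40]

not_a = ~(labels == 'a')
print(values[not_a])        # [20 30 50]

a_and_large = (labels == 'a') & (values > 15)
print(values[a_and_large])  # [40]
```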

**Exercise #**: it turns out that something went wrong during Will's experiments. Replace his values by 0.0 (a float) and print the resulting `scores` ndarray. (Hint: use broadcasting)

**Exercise #**: instead of putting Will's scores to 0.0, it makes more sense to remove Will's data completely. Use boolean indexing to remove Will's name from `names` and his test results from `scores`. Make sure the variables `scores` and `names` refer to the newly created Python objects (without Will's data in it). 

**Example:** NumPy's `where` function is the ndarray-version of the expression 'x if condition else y' (see pp.109-111 of McKinney). For NumPy arrays, this becomes `np.where(condition, xarr, yarr)`. It is often very useful. For example, replacing all negative test results by their positive counterparts:

In [17]:
np.where(scores >= 0, scores, -scores)

array([[0.37394261, 0.95842531, 1.91995986, 0.63723933, 0.89034841],
       [2.14304098, 0.01535365, 1.63128078, 0.53406586, 0.58365284],
       [1.56363108, 0.64181971, 0.81384607, 0.5943425 , 0.91074869],
       [1.18539447, 0.57578773, 0.79923561, 0.3232153 , 1.11880407],
       [1.11524528, 0.05305181, 0.38754277, 1.20608234, 0.76470202],
       [0.54019374, 0.62588508, 0.86108819, 1.21899344, 0.43297839],
       [0.53455918, 0.32528575, 0.98036348, 0.11744094, 0.9780744 ],
       [0.63393008, 0.45723257, 1.3898691 , 0.72202899, 1.92453906],
       [0.05258662, 0.02008933, 0.3653588 , 0.9310195 , 0.46555773],
       [0.06666344, 0.45005783, 0.48966905, 1.42031658, 1.28594306],
       [1.15231323, 0.92216861, 0.19006232, 1.02623619, 1.00688032],
       [0.35147073, 0.81062346, 0.16361772, 0.97626893, 0.31604747]])
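The second and third argument of `where` do not have to be arrays; scalars are broadcast to the shape of the condition. A minimal sketch with a hypothetical array `x`:

```python
import numpy as np

x = np.array([-1.5, 0.3, 2.0, -0.1])  # hypothetical example data

# Two scalars: 1 where the condition holds, -1 elsewhere.
signs = np.where(x >= 0, 1, -1)
print(signs)    # [-1  1  1 -1]

# A scalar and an array: negative values are replaced by 0.0.
clipped = np.where(x < 0, 0.0, x)
print(clipped)  # [0.  0.3 2.  0. ]
```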

**Exercise #**: in the cell below two 2-dimensional arrays are defined. Create a new ndarray of the same dimensions. For each element in the new object, choose between the corresponding elements of `arr1` and `arr2` and select the one with the largest absolute value.

In [18]:
arr1 = np.random.randn(3,5)
arr2 = np.random.randn(3,5)



**Exercise #**: in the cell below a third 2-dimensional array is defined. As in the previous exercise, create a new ndarray by choosing between elements of `arr1` and `arr2`. But now `arr3` decides, element-by-element, whether you should pick the element with the largest or smallest absolute value. If the element in `arr3` is positive, select the element with the largest absolute value. If the element in `arr3` is negative, pick the element with the smallest absolute value.

In [19]:
arr3 = np.random.randn(3,5)


## Universal functions

A NumPy universal function (or "ufunc") performs element-by-element operations on data in ndarrays.

**Example:** calculating the exponential of the elements of `arr`. Note that one of the elements is NumPy's NaN (Not a Number). This is a float that represents the absence of a value. A NumPy ufunc does not raise an error if it encounters NaN. Instead, it simply propagates the missing value to the result.

In [20]:
arr = np.array([1.0, -.5, 2.0, 5.9,-2.0, 0.4, -3.1,4.7])
arr[np.random.randint(1,8)] = np.nan

print('arr: {}'.format(arr))

print('exponentials: {0}'.format(np.exp(arr)))

arr: [ 1.  -0.5  nan  5.9 -2.   0.4 -3.1  4.7]
exponentials: [2.71828183e+00 6.06530660e-01            nan 3.65037468e+02
 1.35335283e-01 1.49182470e+00 4.50492024e-02 1.09947172e+02]
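Two further properties of NaN are worth knowing; the sketch below uses a hypothetical array `data`. First, NaN is not equal to anything, not even to itself. Second, NaN also propagates through ordinary reductions such as `sum`, while NumPy offers variants such as `nansum` that skip missing values.

```python
import numpy as np

data = np.array([1.0, np.nan, 3.0])  # hypothetical example data

print(np.nan == np.nan)  # False: NaN is not equal to itself.

total = np.sum(data)
print(total)             # nan: the missing value propagates.

total_ignoring_nan = np.nansum(data)
print(total_ignoring_nan)  # 4.0: the NaN entry is skipped.
```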


**Exercise 4:** replace the NaN entry in `arr` by 0.0 and print the resulting array. (Hint: use the universal function `isnan`)

**Example:** creating a boolean ndarray that shows for which experiments Jane scored higher than Mary. `greater` is an example of a binary universal function, because it performs an operation on two input arrays element-by-element.

In [21]:
np.greater(scores[names == 'Jane'], scores[names == 'Mary'])

array([[False,  True, False,  True, False],
       [False,  True, False, False, False],
       [False,  True, False,  True,  True]])
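Many binary universal functions have an operator shorthand. As a sketch with the hypothetical arrays `x` and `y`: `np.greater(x, y)` gives the same result as `x > y`, and `np.add(x, y)` the same as `x + y`. Other binary ufuncs, such as `maximum`, have no operator equivalent.

```python
import numpy as np

x = np.array([1, 5, 3])  # hypothetical example data
y = np.array([4, 2, 3])

print(np.greater(x, y))  # [False  True False], same as x > y
print(np.add(x, y))      # [5 7 6], same as x + y
print(np.maximum(x, y))  # [4 5 3], the element-by-element maximum
```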

## Reductions

Aggregations (often called reductions) are mathematical functions that compute statistics about an entire array, or about data along an axis of an array.

**Example:** two equivalent syntactical expressions to call a reduction. Let's go back to the three participants in `names` (Bob, Jane and Mary) and their test results in `scores`. Here, we compute the mean score for all experiments.

In [22]:
print(np.mean(scores))

print(scores.mean())

0.049520437518704566
0.049520437518704566


**Example:** with the keyword `axis` we specify the axis over which the aggregation should take place. Here, we compute the total of all scores per experiment.

In [23]:
np.sum(scores, axis=0)

array([-0.52423275, -0.58925019,  3.17450843,  5.61614537, -4.7059446 ])

**Example:** the total of all scores per series of 5 experiments.

In [24]:
np.sum(scores, axis=1)

array([ 2.25133349, -3.83926239, -0.42437149,  1.76482905,  0.23327037,
        2.42736867,  0.09412157, -4.21313465,  1.68926007,  0.24064817,
        2.45332345,  0.29383993])
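The meaning of `axis` is easier to verify by hand on a small array. In the sketch below (with a hypothetical 2x3 array `table`), `axis=0` collapses the rows and yields one value per column, while `axis=1` collapses the columns and yields one value per row.

```python
import numpy as np

table = np.array([[1, 2, 3],
                  [4, 5, 6]])  # hypothetical example data

print(table.sum())        # 21: sum over all elements.
print(table.sum(axis=0))  # [5 7 9]: one total per column.
print(table.sum(axis=1))  # [ 6 15]: one total per row.
```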

**Exercise 5:** for each series of 5 experiments, one can calculate the standard deviation of the test results using the NumPy function `std`. Use this to create a dictionary, with as keys the names of the participants (Bob, Jane and Mary in `names`) and as values the average of the 3 standard deviations of the 3 series of experiments of each participant. (Hint: using a dict comprehension, you only need one line of code.)

**Exercise 6:** reductions interpret boolean values in arrays as 1 (`True`) and 0 (`False`). Use this to count for how many of the 5 experiments Jane had the highest maximum score.
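A minimal sketch of counting with boolean arrays, using a hypothetical array `temps` that is unrelated to the exercise data:

```python
import numpy as np

temps = np.array([[3, -1, 4],
                  [-2, -5, 6]])  # hypothetical example data

positive = temps > 0  # Boolean array with the same shape as temps.

print(positive.sum())        # 3: number of positive elements.
print(positive.sum(axis=1))  # [2 1]: number of positive elements per row.
print(positive.mean())       # 0.5: fraction of positive elements.
```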

## Linear algebra

NumPy contains standard operations for linear algebra. Most of them are located in the submodule `linalg`. Note that one cannot use `*` for multiplication between matrices and/or vectors, since NumPy has reserved this symbol for element-by-element multiplication. Instead, one can use the NumPy-function `dot`.

**Example**: using the NumPy-function `dot` to perform matrix-vector multiplications between a matrix $A$ and a vector $b$.

In [25]:
A = np.random.randint(1,10, size=9).reshape(3,3)
b = np.arange(1,4)

print('matrix A:\n {0}'.format(A) + '\n')
print('vector b:\n {0}'.format(b) + '\n')

print(A.dot(b)) # Computing A*b. Equivalently, one can write "np.dot(A,b)".

print(np.dot(b, A)) # Computing b^T*A. Equivalently, one can write "b.dot(A)".

matrix A:
 [[7 2 3]
 [6 5 7]
 [7 1 8]]

vector b:
 [1 2 3]

[20 37 33]
[40 15 41]


**Example**: instead of using `dot`, one can use the abbreviated ``@``-notation:

In [26]:
print(A @ b)
print(b @ A)

[20 37 33]
[40 15 41]
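The difference between `*` and matrix multiplication, mentioned at the start of this section, can be checked by hand on two small matrices. The names `P` and `Q` are hypothetical.

```python
import numpy as np

P = np.array([[1, 2],
              [3, 4]])  # hypothetical example matrices
Q = np.array([[0, 1],
              [1, 0]])

elementwise = P * Q  # Element-by-element product.
matrix_prod = P @ Q  # Matrix product, same as np.dot(P, Q).

print(elementwise)  # [[0 2]
                    #  [3 0]]
print(matrix_prod)  # [[2 1]
                    #  [4 3]]
```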


**Example:** one can use the NumPy function `outer` to compute the outer product of a column vector with a row vector, resulting in a 2-dimensional ndarray.

In [27]:
c = np.arange(5,7)

print(np.outer(b, c))
print('')
print(np.outer(c, b)) # Changing the order is equivalent to transposing the resulting matrix.

[[ 5  6]
 [10 12]
 [15 18]]

[[ 5 10 15]
 [ 6 12 18]]


**Exercise 7:** the `numpy.linalg` module has a standard set of matrix decompositions and functions calculating things like inverse, trace and determinant. Solve the set of linear equations $A x = b$ for the vector $x$ in two ways:
* use the `linalg`-function `inv()` to calculate the inverse and perform matrix multiplication.
* use the `linalg`-function `solve()`.

Verify the solutions are the same (a neat way of verifying this would be to use NumPy's binary boolean reduction `allclose()`).
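The reason for using `allclose()` rather than `==` is that floating-point computations carry small rounding errors, so two results that are mathematically equal may differ in the last digits. A minimal sketch with the hypothetical arrays `lhs` and `rhs`:

```python
import numpy as np

lhs = np.array([0.1 + 0.2])  # hypothetical example data
rhs = np.array([0.3])

exactly_equal = bool(np.all(lhs == rhs))
close_enough = bool(np.allclose(lhs, rhs))

print(exactly_equal)  # False: 0.1 + 0.2 is not exactly 0.3 in floating point.
print(close_enough)   # True: the difference is within the default tolerance.
```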

**Exercise 8:** write a function called `adjoint` that computes the adjoint of a square matrix and returns it as an ndarray. The (i,j)-th element of the adjoint of a matrix $B$ can be computed with $$(\text{adj}(B))_{ij} = (-1)^{i+j} M_{ji},$$ where $M_{ji}$ is the determinant of the matrix that you get when you remove row $j$ and column $i$ from $B$ (called the $(j,i)$-minor of $B$).

Finally, apply your function to the matrix $A$ that was defined earlier and print the result.

(Hint: this is a hard exercise with many valid solutions. One valid solution uses a nested list comprehension to create the adjoint matrix as a list object, that subsequently can be turned into an ndarray. To keep your code readable, you may want to use a nested function (see below for an explanation) to compute the minors.)
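To check your understanding of the formula, it may help to work out the $2 \times 2$ case by hand (the symbols $a$, $b$, $c$, $d$ are introduced here for illustration only). For
$$B = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$
the minors are $M_{11} = d$, $M_{12} = c$, $M_{21} = b$ and $M_{22} = a$, so that
$$\text{adj}(B) = \begin{pmatrix} (-1)^{1+1} M_{11} & (-1)^{1+2} M_{21} \\ (-1)^{2+1} M_{12} & (-1)^{2+2} M_{22} \end{pmatrix} = \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}.$$
Multiplying $B$ by this matrix gives $(ad - bc)$ times the identity matrix, in line with $\text{det}(B) = ad - bc$.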

**Exercise 9:** run the cell below to test your `adjoint` function for the matrix $A$, making use of the general identity 
$$A^{-1} = \frac{1}{\text{det}(A)}\text{adj}(A).$$ The cell output should be `True`.

In [28]:
try:
    print(np.allclose(np.linalg.inv(A), adjoint(A)/np.linalg.det(A)))
except NameError:
    # adjoint is not defined yet (Exercise 8 has not been completed).
    pass