<center>
    
# R406: Applied Economic Modelling with Python

</center>

<br> <br> 

<center>

## Introduction to NumPy

</center>

<br><br> 

<center>
<b> Andrey Vassilev </b>
</center>



# Outline

1. NumPy basics: arrays and array operations
2. Linear algebra with NumPy
3. A little on random sampling with NumPy

# What is NumPy?

From the [homepage](http://www.numpy.org/) of the NumPy project:

>NumPy is the fundamental package for scientific computing with Python. It contains among other things:
>   - a powerful N-dimensional array object
>   - sophisticated (broadcasting) functions
>   - tools for integrating C/C++ and Fortran code
>   - useful linear algebra, Fourier transform, and random number capabilities

You have briefly seen it before. Traditionally it is imported as follows:

In [None]:
import numpy as np

# NumPy arrays

- The basic NumPy object is the *array*.
- An array in its simplest form is similar to a matrix, i.e. it is a rectangular table of numbers.
- An array can be a higher-dimensional object (think a "cube" of numbers, `n` equally-sized "cubes" etc.).
- An array is an object of class `ndarray`.
- Arrays can hold various objects. We'll focus on the case of numeric values.

An array can be created from different objects:

In [None]:
# one-dimensional array from a list
x1 = np.array([1.0, 3.0, 5.15]) 
x1

In [None]:
print(type(x1))

In [None]:
# two-dimensional array from a list of lists
# (practically a 2 X 3 matrix)
x2 = np.array([[1.0, 3.0, 5.15],[7,6,5]]) 
# types are upcast as needed
x2

In [None]:
# one-dimensional array from a tuple
x3 = np.array((1.0, 3.0, 5.15))
x3

We shall learn more advanced functionality for dealing with data sources but, as a first encounter, you can also import an array from a text file.

In [None]:
%%writefile A.csv
2.04796174,3.90837432,2.59414031,0.66654074,4.63299543,4.55432788,4.61540282
4.9486033,0.72658201,3.17077112,2.46879128,0.4254717,2.50250232,3.78406652
4.4470623,1.8189737,4.91585375,2.99834827,1.63081687,4.8331579,1.14999237

In [None]:
x = np.loadtxt('A.csv',delimiter=',')
x

# NumPy data types

- Unlike, for example, Python lists, NumPy arrays contain data of the same type. 
- The data types (aka `dtypes`) themselves can be different. (See the [docs](https://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html) for a complete description.)
- Examples of `dtypes` include:
   - floats: `float`, `float16`, `float64` etc.
   - ints: `int`, `int32`, `int64` etc.
   - Booleans
- These (and others) can be explicitly specified when constructing an array via the `dtype` argument.

In [None]:
x = np.array([0,1,2,3],dtype = 'int')
x

In [None]:
x = np.array([0,1,2,3],dtype = 'float')
x

In [None]:
x = np.array([0,1,2,3],dtype = 'float32')
x

In [None]:
x = np.array([0,1,2,3],dtype = 'bool')
x
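
As a small illustration of the points above, the sketch below inspects the `dtype` attribute and converts between types with the `astype()` method. The variable names (`mixed`, `as_int`, `as_bool`) are chosen for this example only.

```python
import numpy as np

# Integers and floats mixed in one list are upcast to a common float type
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)

# astype() returns a new array converted to the requested dtype
as_int = mixed.astype('int')                      # fractional parts are dropped
as_bool = np.array([0, 1, 2, 3]).astype('bool')   # only zero maps to False

print(as_int)
print(as_bool)
```

Note that `astype()` leaves the original array untouched and returns a converted copy.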

# Commonly used arrays

- It is also possible to create special kinds of arrays, e.g. arrays filled with ones, zeros or empty arrays of predefined dimensions.
- This is done using special functions from NumPy.

## Range-type arrays

The `np.arange()` function returns a range in the form of an array object:

In [None]:
np.arange(5)

In [None]:
np.arange(3,15)

In [None]:
np.arange(3,15,2)

## Zero arrays

These contain zeros, as the name suggests. They are created with `np.zeros()`, which takes as its argument either an integer or a sequence of integers containing the dimensions:

In [None]:
np.zeros(6)

In [None]:
np.zeros([2,3])

In [None]:
np.zeros((2,3,2))

## Arrays of ones

These are constructed through the function `np.ones()`. It works similarly to `np.zeros()`:

In [None]:
np.ones(3)

In [None]:
np.ones((5,5), dtype='int')

## The identity matrix/array

The function `np.eye()` creates a 2-D identity array, i.e. with ones on the main diagonal and zeros everywhere else. Note that the syntax is different from the preceding examples.

In [None]:
np.eye(3)

In [None]:
np.eye(3,5) # It doesn't have to be square

In [None]:
np.eye(3,5,1) # We can specify upper and lower diagonals 
              # using positive or negative integers

## Empty arrays

We can also construct empty arrays for subsequent use. They are not initialized, so they will contain arbitrary values (whatever was in the memory segment allocated, often zeros). They are (marginally) faster to create than, for example, an array of zeros or ones of the same dimensions.

In [None]:
np.empty((3,4))

## Filling with a value

An array can be filled with a specific value:

In [None]:
np.full((4,5),3.14)

## Arrays of identical shapes

We can also instruct NumPy to create an array of zeros, ones, etc. that has the same dimensions as a pre-specified array.

In [None]:
a = np.array([[1,2,3,4],
              [5,6,7,8]])

In [None]:
np.ones_like(a)

In [None]:
np.zeros_like(a)

# Array operations

- We have seen different arrays but so far we have done little with them.
- Arrays are objects and therefore have attributes:
    - `ndim`, which contains the number of dimensions
    - `shape`, which contains the dimensions themselves as a tuple
    - `size`, which contains the number of elements of the array
    - `itemsize`, which contains the size (in bytes) of each array element, and 
    - `nbytes`, which lists the total size (in bytes) of the array

In [None]:
a = np.array([[1,2,3,4],
              [5,6,7,8]])

In [None]:
a.ndim

In [None]:
a.shape

In [None]:
a.size

In [None]:
a.itemsize

In [None]:
a.nbytes # size * itemsize

## Changing array dimensions

An array can be reshaped by modifying its `shape` attribute...

In [None]:
print(a)

In [None]:
a.shape = (4,2)
print(a)

In [None]:
a.shape = (2,2,2)
print(a)

...or by using the `reshape()` method.

In [None]:
a.reshape((2,4))
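
Unlike assigning to `shape`, the `reshape()` method does not modify the array it is called on; it returns a reshaped result. The following sketch illustrates this, along with the convention that `-1` asks NumPy to infer one dimension from the others. The names `b` and `c` are used only for this example.

```python
import numpy as np

a = np.arange(1, 9)        # eight elements, shape (8,)
b = a.reshape((2, 4))      # reshaped result with shape (2, 4)
c = a.reshape((4, -1))     # -1 lets NumPy infer the missing dimension

print(a.shape)  # the original array keeps its shape
print(b.shape)
print(c.shape)
```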

## Transposing an array

We can access a transposed version of an array via the `.T` attribute...

In [None]:
a = np.array([[1,2,3,4],
              [5,6,7,8]])
a.T

...or via the `transpose()` method.

In [None]:
a.transpose()

## Array indexing

An array can be indexed similarly to a list, with one index per dimension separated by commas:

In [None]:
print(a) # Just to remind ourselves what we are dealing with

In [None]:
a[0,0]

In [None]:
a[1,3]

In [None]:
a[0] # A single index refers to the first dimension (rows here)

## Array slicing

Arrays can also be sliced in the familiar manner:

In [None]:
a[1,0:3]

In [None]:
a[0,2:]

In [None]:
a[1,:]

In [None]:
a[:,2]
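
Slices also accept a step and negative indices, just as with lists. A brief sketch (the variable names are illustrative):

```python
import numpy as np

a = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8]])

every_other = a[:, ::2]    # every second column (columns 0 and 2)
last_col = a[:, -1]        # negative indices count from the end
reversed_rows = a[::-1]    # rows in reverse order

print(every_other)
print(last_col)
print(reversed_rows)
```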

## Array assignment

This works the same way as for lists, with one exception: the array's `dtype` is strictly observed, so assigned values are converted to it.

In [None]:
a = np.array([[1,2,3,4],
              [5,6,7,8]])
a[1,1] = 66.6 # This will be truncated to an integer
a

Slicing works the same way as well:

In [None]:
a[1,1:-1] = np.array([666,777])
a

However, unlike lists, array slices return views instead of copies, meaning that we can modify the original array through them:

In [None]:
print(a)
b = a[1,1:-1]
print(b)
b[0] = 5
print(b)
a

Compare with:

In [None]:
L1 = [1,2,3,4]
L2 = L1[2:]
L2[0] = 666
L1

Use the `copy()` method if you want to override this behaviour.
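
The difference between a view and a copy can be sketched as follows. The names `view` and `indep` are chosen for this example, and `np.shares_memory()` is used only to make the distinction visible.

```python
import numpy as np

a = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8]])

view = a[1, 1:-1]           # a view: shares memory with a
indep = a[1, 1:-1].copy()   # an independent copy

indep[0] = 100   # does not affect a
view[1] = -7     # changes a[1, 2]

print(a)
print(np.shares_memory(a, view), np.shares_memory(a, indep))
```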

## Array concatenation

There are several ways of combining arrays. The simplest is the `np.concatenate()` function.

In [None]:
x1 = np.arange(1,4)
x2 = np.arange(4,9)
x3 = np.concatenate([x1,x2]) # the argument is a tuple or a list
x3

If you need explicit control over the direction of concatenation, you can use `np.hstack()` and `np.vstack()`.

In [None]:
x1 = np.arange(4).reshape(2,2)
x2 = np.arange(4,10).reshape(2,3)
np.hstack((x1,x2))

In [None]:
x1 = np.arange(6).reshape(2,3)
x2 = np.arange(12).reshape(4,3)
x3 = np.arange(3)
np.vstack((x1,x2,x3))
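
For two-dimensional inputs, the same results can be obtained from `np.concatenate()` by passing its `axis` argument. A minimal sketch, with illustrative variable names:

```python
import numpy as np

x1 = np.arange(4).reshape(2, 2)
x2 = np.arange(4, 10).reshape(2, 3)

# axis=1 joins along columns, matching np.hstack for 2-D inputs
side_by_side = np.concatenate([x1, x2], axis=1)

# axis=0 joins along rows, matching np.vstack for 2-D inputs
stacked = np.concatenate([x1, x1], axis=0)

print(side_by_side)
print(stacked)
```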

## Array splitting

You can use the general `np.split()` function or the specialized `np.vsplit()` and `np.hsplit()` functions. Here is how the latter two work:

In [None]:
x4 = np.vstack((x1,x2,x3))
a,b = np.vsplit(x4,[4]) # splits vertically (row-wise)
print("a =",a)
print("b =",b)

In [None]:
x4 = np.vstack((x1,x2,x3))
a,b,c = np.hsplit(x4,[1,2]) # splits horizontally (column-wise)
print("a =",a)
print("b =",b)
print("c =",c)
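
For comparison, here is a sketch of the general `np.split()` function, where the `axis` argument selects the direction of the split. The array `x` and the output names are made up for this example.

```python
import numpy as np

x = np.arange(21).reshape(7, 3)

# Split rows at index 4: same as np.vsplit(x, [4])
top, bottom = np.split(x, [4], axis=0)

# Split columns at indices 1 and 2: same as np.hsplit(x, [1, 2])
left, middle, right = np.split(x, [1, 2], axis=1)

print(top.shape, bottom.shape)
print(left.shape, middle.shape, right.shape)
```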

# UFuncs and vectorized operations

NumPy supports vectorized operations via universal functions or *UFuncs*. In essence this means that you can call such functions on array objects and the respective operations will be performed elementwise by (hidden) compiled routines that are fast and efficient. This, for instance, allows us to perform addition, subtraction and similar operations on arrays.

In [None]:
x1 = np.arange(1,5).reshape(2,2)
x2 = np.arange(10,14).reshape(2,2)
x1 + x2 # The array shapes must be compatible

In [None]:
x1 - x2

Array multiplication, division etc. are performed element-wise, **not** according to matrix operation rules.

In [None]:
x1 * x2

In [None]:
x1 / x2

In [None]:
x1**2

In [None]:
x2**x1
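
To make the idea of vectorization concrete, the sketch below compares an explicit Python loop with the equivalent one-line array expression. The names `looped`, `vectorized` and `shifted` are used only here.

```python
import numpy as np

x1 = np.arange(1, 5).reshape(2, 2)
x2 = np.arange(10, 14).reshape(2, 2)

# Explicit Python loop: element-by-element product
looped = np.empty_like(x1)
for i in range(x1.shape[0]):
    for j in range(x1.shape[1]):
        looped[i, j] = x1[i, j] * x2[i, j]

# Vectorized version: one expression, same result
vectorized = x1 * x2     # equivalent to np.multiply(x1, x2)

# A scalar is combined with every element of the array
shifted = x1 + 10

print(np.array_equal(looped, vectorized))
print(shifted)
```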

# A selection of NumPy functions

There is a rich selection of NumPy functions that work on arrays in a vectorized manner. A small sample follows below.

## Absolute values

In [None]:
x = np.array([[-1,3],[6,-2]])
np.abs(x)

## Exponentiation and logarithms

In [None]:
np.exp(x)

In [None]:
np.log(np.abs(x)) # try np.log(x) to see 
                  # how undefined results are handled

In [None]:
np.log10(np.abs(x))

## Trigonometric functions

In [None]:
np.sin(x)

In [None]:
np.cos(x)

In [None]:
np.sin(x)**2 + np.cos(x)**2

## Sums, maxima and minima of arrays

In [None]:
np.sum(x)

In [None]:
np.prod(x)

In [None]:
np.max(x)

In [None]:
np.min(x)

In [None]:
np.argmin(x) # returns the index of the minimizer in the flattened array

## Other aggregation operations on arrays

In [None]:
np.mean(x)

In [None]:
np.median(x)

In [None]:
np.std(x)

In [None]:
np.var(x)

**Note:** There exist NaN-safe versions of the above functions, e.g. `np.nansum()`, `np.nanmean()` etc.
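
The aggregation functions above also accept an `axis` argument to aggregate along one dimension only. The following sketch shows this, together with a NaN-safe function and `np.unravel_index()`, which converts a flat index such as the one returned by `np.argmin()` into row and column coordinates. The variable names are illustrative.

```python
import numpy as np

x = np.array([[-1, 3], [6, -2]])

col_sums = np.sum(x, axis=0)   # collapse rows: one sum per column
row_max = np.max(x, axis=1)    # collapse columns: one maximum per row

# np.argmin returns a position in the flattened array;
# np.unravel_index converts it to (row, column) coordinates
flat_idx = np.argmin(x)
row, col = np.unravel_index(flat_idx, x.shape)

# NaN-safe aggregation ignores missing values
z = np.array([1.0, np.nan, 3.0])
print(np.mean(z), np.nanmean(z))
```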

# Comparisons, masks and Boolean logic

Comparison operations are also UFuncs in NumPy. They produce arrays of Booleans.

In [None]:
x >= 0

In [None]:
x == -1

We can pass an array of Booleans as an index to extract a subset from another array. This is known as *masking*.

In [None]:
y = np.arange(4)
bl = np.array([True,False,False,True])
y[bl]

In [None]:
y = np.array([[2,1,4],
              [5,6,15],
              [9,8,7]
             ])
bl = np.array([[True,False,True],
              [False,True,True],
              [False,True,True]
             ])
y[bl]

Vectorized comparisons and masking allow us to implement different filtering and subsetting conditions:

In [None]:
print(y)

In [None]:
y[y <= 7]

To construct more complex statements you'll need to use bitwise operators instead of `and`, `or` and `not`. The bitwise counterparts of `and`, `or` and `not` are, respectively, `&`, `|` and `~`. These can be thought of as vectorized versions of the familiar Boolean operations.

In [None]:
y[(y >= 7) & ~(y == 15)]
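
Two practical points are worth a short sketch. First, each comparison should be wrapped in parentheses, because `&` and `|` take precedence over comparison operators. Second, since `True` counts as 1, summing a mask counts the matching elements. The names `extremes` and `n_even` are made up for this example.

```python
import numpy as np

y = np.array([[2, 1, 4],
              [5, 6, 15],
              [9, 8, 7]])

# Elements that are either small or large
extremes = y[(y < 3) | (y > 8)]

# Number of even elements
n_even = np.sum(y % 2 == 0)

print(extremes)
print(n_even)
```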

Checking whether a condition is true for at least one element of an array can be done with the `np.any()` function:

In [None]:
np.any(y > 15)

Checking whether a condition is true for every element of an array is done with the `np.all()` function:

In [None]:
np.all(y <= 15)

# Linear algebra with NumPy: an overview

- NumPy is capable of performing many linear algebra operations. (Even more are available through the SciPy library.)
- NumPy has a special `matrix` class for the purpose.
- You will discover that a lot of the linear algebra functionality is also accessible through the `ndarray` object.
- Working with a dedicated matrix type makes `*` behave as matrix multiplication, which can shield you from some errors. Be aware, however, that the current NumPy documentation no longer recommends `np.matrix` and advises using regular arrays instead.


# Defining matrices

Matrices can be defined by calling the `np.matrix()` function and providing it with an array-like object (an array or a list of lists) or a special string.

In [None]:
a = np.arange(10,19).reshape(3,3)
M1 = np.matrix(a)
M1

In [None]:
M2 = np.matrix([[1,2,3],[4,5,6],[7,8,9]])
M2

In [None]:
# Commas (or spaces) separate entries in rows
# Semicolons denote new rows
M3 = np.matrix("1, 2, 3; 4, 5, 6", dtype = 'float')
M3

# Matrix operations

Matrix addition and subtraction are done as usual:

In [None]:
M1 + M2

In [None]:
M1 - M2

## Matrix multiplication

For objects of class `matrix` the multiplication operation (in the linear algebra sense) is done using the `*` operator.

In [None]:
M1 * M2

Equivalently, you can use the `dot()` method, which also works on arrays:

In [None]:
M1.dot(M2)

In [None]:
np.array(M1).dot(np.array(M2))

Python 3.5 and higher also have the `@` operator, which performs matrix multiplication. It is a convenient way to ensure that you are not accidentally confusing elementwise multiplication with proper matrix multiplication.

In [None]:
M1 @ M2

In [None]:
np.array(M1) @ np.array(M2) # Matrix multiplication on arrays

In [None]:
np.array(M1) * np.array(M2) # Elementwise multiplication on arrays

Be careful with the dimensions of matrices and vectors (a vector here being a (1 `X` N) or (N `X` 1) matrix). These matter in operations:

In [None]:
x = np.matrix([1,2,3])

In [None]:
M2 @ x # shapes do not match

In [None]:
M2 @ x.T

In [None]:
x @ M2

In [None]:
x.T @ M2 # shapes do not match
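
As a complement to the `matrix` examples, here is a sketch of the same products with plain `ndarray` objects. A one-dimensional array has no row or column orientation, so `@` treats it as a row when it appears on the left and as a column when it appears on the right. The names `A1`, `A2` and `v` are used only for this example.

```python
import numpy as np

A1 = np.arange(10, 19).reshape(3, 3)
A2 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

matprod = A1 @ A2     # matrix product
elemprod = A1 * A2    # elementwise product

v = np.array([1, 2, 3])   # 1-D array, shape (3,)
print(A2 @ v)   # v acts as a column vector
print(v @ A2)   # v acts as a row vector
```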

## Inverse matrices

A good part of NumPy linear algebra functionality resides in the `linalg` module. For example, the following computes an inverse matrix.

In [None]:
M = np.matrix('1 2 3; 0 1 4; 5 6 0')
np.linalg.inv(M)
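
A quick way to check an inverse is to multiply it by the original matrix and compare the result with the identity. Because of floating-point rounding, the comparison in this sketch uses `np.allclose()` rather than exact equality. A plain array is used here instead of `np.matrix`.

```python
import numpy as np

M = np.array([[1, 2, 3],
              [0, 1, 4],
              [5, 6, 0]])
M_inv = np.linalg.inv(M)

# The product should equal the identity up to rounding error
print(np.allclose(M @ M_inv, np.eye(3)))
print(np.allclose(M_inv @ M, np.eye(3)))
```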

# Solving linear systems

This is done through `np.linalg.solve()`. Consider the system

$$
\begin{array}{rcr}
x-2y+3z&=&7\\
2x+y+z&=&4\\
-3x+2y-2z&=&-10
\end{array}
$$

In [None]:
A = np.array([[1,-2,3],[2,1,1],[-3,2,-2]])
b = np.array([7,4,-10])
np.linalg.solve(A,b)
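
It is good practice to verify a computed solution by substituting it back into the system. The sketch below does this by checking that the residual is zero up to rounding error; the names `sol` and `residual` are illustrative.

```python
import numpy as np

A = np.array([[1, -2, 3], [2, 1, 1], [-3, 2, -2]])
b = np.array([7, 4, -10])

sol = np.linalg.solve(A, b)
residual = A @ sol - b   # should be (numerically) zero

print(sol)
print(np.allclose(residual, 0))
```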

# Matrix rank

In [None]:
np.linalg.matrix_rank(M3)

In [None]:
np.linalg.matrix_rank(M)

# Computing eigenvalues and eigenvectors

In [None]:
B = np.matrix("-2 -4 2;-2 1 2;4 2 5")
vals,vecs = np.linalg.eig(B)
print(vals)
print(vecs) # normalized to unit length

In [None]:
i = 0
print(B @ vecs[:,i]) 
print(vals[i]*vecs[:,i])
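
The same check can be carried out for all eigenpairs at once. This sketch uses a plain array rather than `np.matrix`, so that multiplying `vecs` by `vals` scales each column by its own eigenvalue.

```python
import numpy as np

B = np.array([[-2, -4, 2],
              [-2, 1, 2],
              [4, 2, 5]])
vals, vecs = np.linalg.eig(B)

# Column i of vecs pairs with vals[i]
print(np.allclose(B @ vecs, vecs * vals))

# Each eigenvector (column) has unit Euclidean length
print(np.linalg.norm(vecs, axis=0))
```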

# Determinants

In [None]:
np.linalg.det(M)

In [None]:
np.linalg.det(A)

# Generating random values

NumPy has some basic facilities for working with random variables. (SciPy has more extended functionality.) Here are some examples.

The function `np.random.rand()` will generate an array of uniformly distributed random values on [0, 1).

In [None]:
np.random.rand(3,5)

## Random normal sampling

The function `np.random.randn()` generates `N(0,1)` distributed variates.

In [None]:
np.random.randn(2,3,5)

## Random permutations of an array


In [None]:
A = np.arange(5,10)
print(A)
print(np.random.permutation(A))

In [None]:
A = np.arange(6).reshape(3,2)
print(A)
print(np.random.permutation(A)) # by default along the first dimension
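
Newer NumPy versions also provide a generator-based interface via `np.random.default_rng()`. The sketch below assumes a NumPy version that includes it and reproduces the examples above with a fixed seed, so that results can be replicated. The seed value 42 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # arbitrary seed, for reproducibility

u = rng.random((3, 5))                  # uniform on [0, 1)
z = rng.standard_normal((2, 3))         # N(0, 1) draws
p = rng.permutation(np.arange(5, 10))   # random permutation

# A second generator with the same seed reproduces the same draws
rng2 = np.random.default_rng(seed=42)
print(np.array_equal(u, rng2.random((3, 5))))
```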

# Creating your own vectorized functions with `np.vectorize()`

- Sometimes you want to apply a function that is designed to work on an individual element of an array to the whole array.
- One approach would be to write a loop that applies the function elementwise.
- However, NumPy offers a convenient shortcut for that: the `vectorize` function.

Consider the following example:

In [None]:
a = np.array([[2,-3,4],[0,6,7]])

def f(x):
    if x < 0:
        return 5
    elif x > 0:
        return -3
    else:
        return 0   

In [None]:
# This raises an error because f() is not designed to work on arrays
f(a)

Now let us apply the `vectorize` function.

It takes as an argument a function that works on individual elements and returns another function that works on arrays by applying the first function elementwise.

**Note: `vectorize` is merely a convenience function that implements a loop under the hood. Don't expect it to deliver performance gains.**

In [None]:
fvec = np.vectorize(f)
fvec(a)
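
For a piecewise rule like this one, a genuinely vectorized alternative is often available. The sketch below uses `np.select()`, which is one possible choice rather than the only one, and checks that it agrees with the `vectorize` result.

```python
import numpy as np

a = np.array([[2, -3, 4], [0, 6, 7]])

def f(x):
    if x < 0:
        return 5
    elif x > 0:
        return -3
    else:
        return 0

via_vectorize = np.vectorize(f)(a)

# Conditions are checked in order; default applies where none holds
via_select = np.select([a < 0, a > 0], [5, -3], default=0)

print(via_select)
print(np.array_equal(via_vectorize, via_select))
```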

# Pointers for additional reading

- The [NumPy and SciPy Documentation](https://docs.scipy.org/doc/) contains all you never wanted to know.
- You may be interested in:
  - iterating over arrays: `np.nditer()`
  - the functions `np.tile()`, `np.unique()`, `np.fliplr()`, `np.flipud()`.
  - broadcasting: see the docs or the corresponding section in *Python Data Science Handbook*
  - financial functions (now residing in the separate numpy-financial package): FV, PV, NPV, IRR etc.