# Introduction to NumPy

* [NumPy](https://docs.scipy.org/doc/numpy/reference/) is a widely used scientific computing package that brings fast array processing to Python

* Runs fast compiled code written in C & Fortran under the hood

Consider the following example: we want to calculate the mean of 10,000 numbers

We will do this in both standard Python and NumPy, and compare computing times

#### Python version

In [1]:
%%timeit

python_list = list(range(10000))
sum(python_list) / len(python_list)

194 µs ± 5.11 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


#### NumPy version

In [2]:
import numpy as np

In [3]:
%%timeit

numpy_array = np.arange(10000)
numpy_array.mean()

17.8 µs ± 103 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


As you can see, the NumPy version is roughly 10 times faster here
* Much of this speed-up is a result of NumPy knowing the **type** of data it is dealing with
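To see the type information NumPy relies on, you can inspect an array's `dtype` and memory layout. This is a minimal sketch; the default integer type (`int64` on most Linux and macOS builds) may differ by platform and NumPy version:

```python
import numpy as np

x = np.arange(10000)

# Every element shares one dtype, so each occupies the same number of bytes
print(x.dtype)       # platform default integer type, e.g. int64
print(x.itemsize)    # bytes per element

# The values sit in one contiguous block: total size = element count * itemsize
print(x.nbytes == x.size * x.itemsize)  # True
print(x.flags["C_CONTIGUOUS"])          # True
```

A Python list, by contrast, stores references to separate Python objects, each of which has to be type-checked as a loop runs over them.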

In [4]:
a = np.array([1.5, 2, 4])
a

array([1.5, 2. , 4. ])

All elements of the array must be of the same type

In [5]:
[type(a_element) for a_element in a]

[numpy.float64, numpy.float64, numpy.float64]
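When the inputs mix types, NumPy upcasts them to a single common type rather than storing mixed types, which is why `2` and `4` above became floats. A short sketch:

```python
import numpy as np

# A mix of ints and a float is upcast to a single float dtype
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64

# The integer inputs are now stored as floats
print(type(mixed[0]))
```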

We can specify the data type of the array

The most common data types are:

* float64: 64 bit floating point number
* int64: 64 bit integer
* bool: 8 bit True or False

In [6]:
a = np.array([1.8, 2, 4], dtype=int)
a

array([1, 2, 4])

In [7]:
[type(a_element) for a_element in a]

[numpy.int64, numpy.int64, numpy.int64]
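Note in the output above that `1.8` became `1`: converting floats to integers truncates toward zero rather than rounding. The `astype` method performs the same conversion on an existing array; a small sketch:

```python
import numpy as np

floats = np.array([1.8, -1.8, 2.5])

# Casting float -> int truncates toward zero; it does not round
truncated = floats.astype(int)
print(truncated)  # [ 1 -1  2]

# Round first if that is what you want (np.round sends halves to the nearest even number)
rounded = np.round(floats).astype(int)
print(rounded)    # [ 2 -2  2]
```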

By construction, one-dimensional NumPy arrays are **flat**

In [8]:
z = np.zeros(10)

In [9]:
z

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [10]:
z.shape

(10,)

However, we can reshape them into "column vectors" or "row vectors" if we wish:

In [11]:
z.shape = (10, 1)
z

array([[0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.]])

In [12]:
z.reshape(2, 5)

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])
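Assigning to `.shape` changes the array in place, whereas `reshape` returns a new array and leaves the original's shape untouched. Passing `-1` for one dimension asks NumPy to infer it from the total size:

```python
import numpy as np

z = np.zeros(10)

m = z.reshape(2, 5)      # new array object with shape (2, 5)
print(z.shape, m.shape)  # (10,) (2, 5)

col = z.reshape(-1, 1)   # column vector: -1 is inferred as 10
row = z.reshape(1, -1)   # row vector: -1 is inferred as 10
print(col.shape, row.shape)  # (10, 1) (1, 10)
```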

## NumPy array basics

A reference sheet can be found at [QuantEcon.cheatsheets](https://cheatsheets.quantecon.org/)

In [20]:
z = np.empty(3)  # Uninitialized memory: the values shown are arbitrary
z

array([1.5, 2. , 4. ])
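Because `np.empty` skips initialization, its contents are whatever happened to be in that memory (above, apparently leftover values from an earlier array). When you need known starting values, request them explicitly:

```python
import numpy as np

zeros = np.zeros(3)         # [0. 0. 0.]
ones = np.ones((2, 3))      # 2x3 array filled with 1.0
sevens = np.full(3, 7.0)    # [7. 7. 7.]

# np.empty is fine when every element is overwritten before it is read
buf = np.empty(3)
buf[:] = [1.0, 2.0, 3.0]
print(buf)                  # [1. 2. 3.]
```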

In [21]:
z = np.linspace(2, 4, 5)  # From 2 to 4, with 5 elements
z

array([2. , 2.5, 3. , 3.5, 4. ])

In [22]:
z = np.eye(5, k=-1)
z

array([[0., 0., 0., 0., 0.],
       [1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.]])

We can build arrays from lists and tuples, like so:

In [23]:
z = np.array([10, 20])

2D array from list of lists:

In [24]:
z = np.array([[1, 2], [3, 4]]) 
z

array([[1, 2],
       [3, 4]])
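Whether the input is a list or a tuple, its nesting depth determines the number of dimensions, which you can check with `ndim` and `shape`:

```python
import numpy as np

v = np.array((10, 20))          # built from a tuple: 1-D
m = np.array([[1, 2], [3, 4]])  # built from a list of lists: 2-D

print(v.ndim, v.shape)  # 1 (2,)
print(m.ndim, m.shape)  # 2 (2, 2)
```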

### Indexing

In [25]:
z = np.linspace(1, 2, 5)
z

array([1.  , 1.25, 1.5 , 1.75, 2.  ])

In [26]:
z[0]

1.0

In [27]:
z[0:2]  # Two elements, starting at element 0 up until (but not including) 2

array([1.  , 1.25])

In [28]:
z[-1]  # Last element

2.0

In [29]:
z[::2]  # Every second element

array([1. , 1.5, 2. ])

In [30]:
z = np.array([[1, 2], [3, 4]])
z

array([[1, 2],
       [3, 4]])

In [31]:
z[0, 1]

2

Selecting rows and columns:

In [32]:
z[0, :]

array([1, 2])

In [33]:
z[:, 1]

array([2, 4])
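Selecting a column with an integer index, as in `z[:, 1]`, drops that axis and returns a flat 1-D array. Slicing with a range instead keeps the result two-dimensional:

```python
import numpy as np

z = np.array([[1, 2], [3, 4]])

col_flat = z[:, 1]    # integer index removes the axis -> shape (2,)
col_2d = z[:, 1:2]    # slice keeps the axis -> shape (2, 1)

print(col_flat.shape, col_2d.shape)  # (2,) (2, 1)
```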

### NumPy array methods

Type `.` after your NumPy array's name and press `Tab` to see the available methods

In [34]:
a = np.array((4, 3, 2, 1))
a

array([4, 3, 2, 1])

In [36]:
a.std()               # Standard deviation

1.118033988749895

In [37]:
a.shape = (2, 2)
a.T                   # Equivalent to a.transpose()

array([[4, 2],
       [3, 1]])
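A few other commonly used methods, applied to the same values as above:

```python
import numpy as np

a = np.array([4, 3, 2, 1])

print(a.sum())      # 10
print(a.mean())     # 2.5
print(a.max())      # 4
print(a.argmax())   # 0 (index of the largest element)
print(a.cumsum())   # [ 4  7  9 10]
```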

### Arithmetic operations

Standard arithmetic operators act **elementwise** on arrays:

In [38]:
a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])
a + b

array([ 6,  8, 10, 12])

In [39]:
a * b

array([ 5, 12, 21, 32])

In [40]:
a + 10

array([11, 12, 13, 14])
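In `a + 10`, NumPy *broadcasts* the scalar across the array: the result is the same as adding an array filled with 10s, without having to build that array yourself. A quick check:

```python
import numpy as np

a = np.array([1, 2, 3, 4])

via_scalar = a + 10
via_array = a + np.full(4, 10)   # explicit array of 10s, same shape as a

print(np.array_equal(via_scalar, via_array))  # True
```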

In [41]:
A = np.ones((2, 2))
B = np.ones((2, 2))
A + B

array([[2., 2.],
       [2., 2.]])

In [42]:
A + 10

array([[11., 11.],
       [11., 11.]])

In [43]:
A

array([[1., 1.],
       [1., 1.]])

In [44]:
B

array([[1., 1.],
       [1., 1.]])

In [45]:
A * B

array([[1., 1.],
       [1., 1.]])

As you can see, `*` multiplies elementwise; it is *not* matrix multiplication.

Use the `@` operator for matrix multiplication:

In [46]:
A = np.ones((2, 2))
B = np.ones((2, 2))
A @ B

array([[2., 2.],
       [2., 2.]])

In [47]:
A = np.array((1, 2))
B = np.array((10, 20))
A @ B

50
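For two 1-D arrays, `@` computes the inner (dot) product, here $1 \cdot 10 + 2 \cdot 20 = 50$. The same value comes from `np.dot` or from summing the elementwise product:

```python
import numpy as np

A = np.array((1, 2))
B = np.array((10, 20))

inner = A @ B             # inner product of two 1-D arrays
via_dot = np.dot(A, B)    # equivalent function form
manual = (A * B).sum()    # elementwise product, then sum

print(inner, via_dot, manual)  # 50 50 50
```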

## Exercise

Write a function `matrix_power` to compute $A^n$
* The function should take a square array and an integer as arguments

### Implications of mutability

* NumPy arrays are mutable, i.e., their contents can be changed in place
* This has an implication that often catches people out:

In [48]:
a = np.ones(3)
a

array([1., 1., 1.])

The next statement binds `b` to the same object

In [49]:
b = a

Now changing `b` mutates the data that `a` points to

In [50]:
b[0] = 0.0
a

array([0., 1., 1.])

In [51]:
a is b

True
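Slicing has a similar, less visible effect: a basic slice such as `a[:2]` is a *view* that shares memory with the original array, so writing to the slice also changes `a`. A sketch using `np.shares_memory` to confirm:

```python
import numpy as np

a = np.ones(3)

s = a[:2]      # basic slice: a view onto a's data, not a copy
s[0] = 5.0     # writing through the view...

print(a)                        # ...changes a too: [5. 1. 1.]
print(s is a)                   # False: s is a different array object
print(np.shares_memory(a, s))   # True: but it uses the same underlying data
```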

To make a separate copy when you need one, use `np.copy`

* Note that copying is more expensive than binding a new name, since the data must be duplicated in memory

In [52]:
a = np.ones(3)
a

array([1., 1., 1.])

In [53]:
b = np.copy(a)
b[0] = 2
b

array([2., 1., 1.])

In [54]:
a

array([1., 1., 1.])
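The `.copy()` method is equivalent to `np.copy(a)`, and `np.shares_memory` confirms that the copy owns its own data:

```python
import numpy as np

a = np.ones(3)

b = a.copy()   # same as np.copy(a)
b[0] = 2.0

print(a)                        # unchanged: [1. 1. 1.]
print(np.shares_memory(a, b))   # False
```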

### Ufuncs

* *Universal functions* (ufuncs) are *vectorized functions* that act elementwise on arrays
* Instead of looping through the array in Python, the loop over elements runs in optimized, compiled C code

In [55]:
z = np.arange(10000)
z_2 = np.empty_like(z)

In [72]:
%%timeit

for i in range(len(z)):
    z_2[i] = z[i]**2  # Python-level loop: one element per iteration


In [73]:
%%timeit

z**2

3.24 µs ± 23 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
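The elementwise square can also be written with the ufunc `np.square`. Like other ufuncs, it accepts an `out=` argument that writes the result into an existing array (here a preallocated `z_2`) instead of allocating a new one:

```python
import numpy as np

z = np.arange(10000)
z_2 = np.empty_like(z)

# Ufunc call that stores the result directly in z_2
np.square(z, out=z_2)

print(np.array_equal(z_2, z**2))  # True
```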


NumPy's scalar functions (`np.sin`, `np.log`, `np.exp`, etc.) are ufuncs: they act on a single number when given a scalar and elementwise when given an array

A list of available ufuncs can be found [here](https://docs.scipy.org/doc/numpy-1.14.0/reference/ufuncs.html#available-ufuncs) 

In [56]:
np.sin(1)

0.8414709848078965

In [57]:
z = np.array([1, 2, 3])
np.sin(z)

array([0.84147098, 0.90929743, 0.14112001])
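This is what distinguishes NumPy's functions from those in Python's standard `math` module: `math.sin` only accepts a single number, so passing it an array raises a `TypeError`:

```python
import math
import numpy as np

z = np.array([1, 2, 3])

print(np.sin(z))  # works elementwise

math_sin_failed = False
try:
    math.sin(z)   # math.sin expects one number, not an array
except TypeError as err:
    math_sin_failed = True
    print("math.sin failed:", err)
```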

### NumPy Subpackages

The `random` subpackage:

In [58]:
z = np.random.randn(5)  # 5 draws from the standard normal distribution N(0, 1)

In [59]:
y = np.random.binomial(10, 0.5, size=1000)  # 1,000 draws from Binomial(10, 0.5)
y.mean()

5.041
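`np.random.randn` and `np.random.binomial` belong to NumPy's legacy random API. Since NumPy 1.17, the documentation recommends creating a `Generator` with `np.random.default_rng`, which also makes results reproducible when given a seed. A sketch (the seed `42` is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # seeded Generator: same draws on every run

z = rng.standard_normal(5)             # 5 draws from N(0, 1)
y = rng.binomial(10, 0.5, size=1000)   # 1,000 draws from Binomial(10, 0.5)

print(y.mean())  # should be close to the theoretical mean 10 * 0.5 = 5
```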

The `linalg` subpackage:

In [60]:
A = np.array([[1, 2], [3, 4]])

np.linalg.det(A)           # Compute the determinant

-2.0000000000000004

In [61]:
np.linalg.inv(A)           # Compute the inverse

array([[-2. ,  1. ],
       [ 1.5, -0.5]])

Solve $Ax = B$

In [62]:
B = np.array([3, 1])
np.linalg.solve(A, B)

array([-5.,  4.])
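You can check the result by substituting it back into $Ax = B$. Computing `np.linalg.inv(A) @ B` gives the same answer here, though `solve` is generally preferred because it avoids forming the inverse explicitly:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([3, 1])

x = np.linalg.solve(A, B)
print(np.allclose(A @ x, B))   # True: A @ x reproduces B up to rounding

x_inv = np.linalg.inv(A) @ B   # same answer via the explicit inverse
print(np.allclose(x, x_inv))   # True
```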

Compute $A^n$

In [63]:
np.linalg.matrix_power(A, 5)

array([[1069, 1558],
       [2337, 3406]])
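Here $A^n$ means the matrix product of $A$ with itself $n$ times, which is different from `A**5`: the `**` operator raises each element to the fifth power.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])

repeated = A @ A @ A @ A @ A   # matrix product, five factors
print(np.array_equal(np.linalg.matrix_power(A, 5), repeated))  # True

print(A**5)   # elementwise power: [[1, 32], [243, 1024]]
```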

## Exercise

Consider a linear regression model

$$
y = X \beta + \varepsilon \quad \quad \varepsilon \sim N(0, 1)
$$

We can estimate $\beta$ as
$$
\hat{\beta} = (X'X)^{-1} X' y
$$

where $X'$ is the transpose of $X$

Given

$$
y =
\begin{bmatrix}
3 \\
7 \\
10 \\
5 \\
\end{bmatrix}
$$

$$
X = 
\begin{bmatrix}
5 & 3 \\
2 & 3 \\
3 & 1 \\
2 & 8 \\
\end{bmatrix}
$$

Compute $\hat{\beta}$

## Exercise

We can represent an AR(1) model in the form

$$
Ay = \varepsilon \quad \quad \varepsilon \sim N(0, 1)
$$

where $A$ is

$$
A = \begin{bmatrix}
1 & 0 & 0 & \cdots & 0 & 0 \\
-\alpha & 1 & 0 & \cdots & 0 & 0 \\
0 & -\alpha & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 & 0 \\
0 & 0 & 0 & \cdots & -\alpha & 1
\end{bmatrix}
$$

and $y$ and $\varepsilon$ are $T \times 1$ vectors

Generate an AR(1) series with $T=100$ and $\alpha = 0.9$ using matrix algebra

Hint: you will need to use `np.eye`, `np.random.randn` and `np.linalg.inv`

### More resources

* [QuantEcon NumPy Tutorial](https://lectures.quantecon.org/py/numpy.html)
* [QuantEcon Numerical Computing Cheatsheet](https://cheatsheets.quantecon.org/)
* [Introduction to Numerical Computing with NumPy - SciPy 2017](https://www.youtube.com/watch?v=lKcwuPnSHIQ&ab_channel=Enthought)
* [NumPy documentation](https://docs.scipy.org/doc/numpy/reference/)