# Number Crunching

Mapping a `lambda` over a sequence can be tidier (and sometimes faster) than a conventional for loop, but for numerical work there is something much faster: NumPy arrays! The NumPy package (Numerical Python) provides a new data structure, the array, which allows efficient linear algebra (vector and matrix) operations.

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Vectors-(1D)" data-toc-modified-id="Vectors-(1D)-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Vectors (1D)</a></span><ul class="toc-item"><li><span><a href="#Array-characteristics" data-toc-modified-id="Array-characteristics-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Array characteristics</a></span></li><li><span><a href="#Creating-and-using-arrays" data-toc-modified-id="Creating-and-using-arrays-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Creating and using arrays</a></span></li><li><span><a href="#Growing-arrays" data-toc-modified-id="Growing-arrays-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>Growing arrays</a></span></li></ul></li><li><span><a href="#Matrices-(2D)" data-toc-modified-id="Matrices-(2D)-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Matrices (2D)</a></span><ul class="toc-item"><li><span><a href="#Indexing-matrices" data-toc-modified-id="Indexing-matrices-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Indexing matrices</a></span></li><li><span><a href="#Basic-matrix-operations" data-toc-modified-id="Basic-matrix-operations-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>Basic matrix operations</a></span></li></ul></li><li><span><a href="#Linear-algebra" data-toc-modified-id="Linear-algebra-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Linear algebra</a></span></li><li><span><a href="#Polynomials" data-toc-modified-id="Polynomials-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Polynomials</a></span></li><li><span><a href="#Other-abilities" data-toc-modified-id="Other-abilities-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Other abilities</a></span></li></ul></div>

In [None]:
import numpy as np

This notebook was written for NumPy version 1.15.4

In [None]:
np.__version__

## Vectors (1D)  

### Array characteristics

An array appears to be very similar to a list, but an array can keep only elements of the same type (whereas a list can mix different kinds of objects). This means arrays are more efficient to store (because we don't need to store the type for every element). It also makes arrays the data structure of choice for numerical calculations.

In [None]:
x1 = np.array([1, 2, 3])
x1

In [None]:
type(x1)
# The nd refers to n-dimensional

In [None]:
# Indexing and slicing work the same as for lists
x1[1:3]

If you use `help` on `np.array`, you'll see this for the `dtype` keyword argument:
> If not given, then the type will
  be determined as the minimum type required to hold the objects in the
  sequence.

In [None]:
# If you include 1 float, all integers will become floats
x2 = np.array([4, 5, 6.0])
x2

In [None]:
# Likewise, all objects here become strings
x3 = np.array([1, 1.5, '2'])
x3
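To check which type NumPy actually chose, you can inspect the `dtype` attribute. The short sketch below repeats the three arrays from above under new, purely illustrative names (`ints`, `mixed`, `strings`, `forced`), and also shows that passing `dtype` explicitly overrides the automatic choice.

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([4, 5, 6.0])
strings = np.array([1, 1.5, '2'])

# dtype.kind is a one-letter code: 'i' = integer, 'f' = float, 'U' = unicode string
for name, arr in [("ints", ints), ("mixed", mixed), ("strings", strings)]:
    print(name, arr.dtype, arr.dtype.kind)

# Passing dtype explicitly overrides the automatic choice
forced = np.array([1, 2, 3], dtype=float)
print(forced, forced.dtype)
```

The exact integer and string widths (for example `int64` versus `int32`) depend on your platform, which is why the sketch prints the `kind` as well.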

In [None]:
# You can still convert between lists and arrays
list(x3)

Unlike scalars, arrays have a _shape_. The elements of the shape tuple give the lengths of the corresponding array dimensions.

In [None]:
np.shape(x3)
# This vector contains 3 elements

The biggest (and most useful) difference between arrays and lists is that you can apply a calculation to every element in the sequence with one statement. When the operands have different shapes, NumPy stretches the smaller one to match (i.e. _broadcasting_)!

Consider this next example. Instead of duplicating the object (as with lists), every element in the array is multiplied. Two dimensions are compatible when they are equal, or when one of them is 1.

In [None]:
x1*2

In [None]:
x1*np.array([2, 3, 5])
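The compatibility rule is easier to see when the two shapes really differ. Here is a small sketch (the names `col`, `row` and `grid` are only for illustration) in which a column of shape `(3, 1)` is added to a vector of shape `(3,)`, followed by a pair of shapes that cannot be combined.

```python
import numpy as np

col = np.array([[0], [10], [20]])   # shape (3, 1)
row = np.array([1, 2, 3])           # shape (3,)

# Shapes are compared from the last dimension backwards:
# (3, 1) with (3,) -> the 1 is stretched to 3, giving a (3, 3) result
grid = col + row
print(grid.shape)
print(grid)

# Incompatible shapes raise a ValueError instead of guessing
try:
    np.array([1, 2, 3]) + np.array([1, 2])
except ValueError as err:
    print("Cannot broadcast:", err)
```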

We use `math` for simple computations with scalars only, but `numpy` for lists, arrays, matrices, or large datasets. NumPy also contains the same functions as `math` (and more).

In [None]:
np.sin(x1) + np.cos(x2)

In [None]:
# Mathematical constants
print(np.pi)
print(np.e)
print(np.inf) # infinity
print(np.nan) # not a number, e.g. np.log(-1)

### Creating and using arrays

There are other ways to create arrays. Instead of using `range()`, we'll use `numpy.arange()`. You can also specify the number of values and their spacing, or use the built-in random subpackage for random values.

In [None]:
a1 = np.arange(0, 10, 2)   # values from 0 up to (but excluding) 10 in steps of 2
a2 = np.linspace(0, 10, 5) # 5 values linearly spaced from 0 to 10 (inclusive)
a3 = np.logspace(0, 10, 5) # 5 values logarithmically spaced from 10**0 to 10**10

print("{}, \n{}, \n{}".format(a1, a2, a3))

In [None]:
a4 = np.random.rand(5)                # 5 uniform floats in [0, 1)
a5 = np.random.randn(5)               # 5 floats from the standard normal distribution
a6 = np.random.randint(1, 7, size=10) # 10 integers from 1 to 6 (7 is excluded)

print("{}, \n{}, \n{}".format(a4, a5, a6))

Here are some array methods. Some are very useful for statistics:

In [None]:
vec = np.arange(2, 11.0) # If any value is a float, the array will only contain floats
vec

In [None]:
vec.mean()

In [None]:
# Standard deviation
vec.std()

In [None]:
print(vec.max())
print(max(vec)) # This also works

In [None]:
# Index of the (first) maximum value
vec.argmax()

<span style="color:red"> **Warning:** </span> Unlike with lists, slicing an array does not create a copy. A slice is a _view_ of the same data, so changes to the original also appear in the slice.

In [None]:
import copy

In [None]:
old = np.array([1, 2, 3, 4])

a = old.copy()
b = old[:]
c = list(old)
d = copy.copy(old)
e = copy.deepcopy(old)

old[-1] = 100

print("original: {}\n array.copy(): {}\n slice: {}\n list(): {}\n copy: {}\n deepcopy: {}"
      .format(old, a, b, c, d, e))
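If you are unsure whether two arrays refer to the same data, `np.shares_memory` can tell you. This is a minimal sketch with illustrative names (`view`, `dup`); it is a quick check rather than a complete account of how NumPy manages memory.

```python
import numpy as np

old = np.array([1, 2, 3, 4])
view = old[:]      # slicing gives a view onto the same data
dup = old.copy()   # .copy() allocates new memory

print(np.shares_memory(old, view))
print(np.shares_memory(old, dup))

# A change to the original shows up in the view, but not in the copy
old[0] = -1
print(view[0], dup[0])
```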

### Growing arrays

<span style="color:red"> **Warning:** </span> When appending only once, using `np.append` on your array should be fine. The drawback of this approach is that memory is allocated for a *completely new array* every time it is called. When growing an array by a significant number of samples, it is better to either:
 + append to a list and convert to an array afterwards, or
 + make an array of only ones or zeros (if you know how long it should be) and then replace the values.
 
(Answer from [StackOverflow](https://stackoverflow.com/questions/7332841/add-single-element-to-array-in-numpy)).

In [None]:
# Try to avoid doing this repeatedly:
x = np.array([10, 20, 30])
x = np.append(x, [40, 50, 60])
x

In [None]:
# Also try to avoid stacking repeatedly:
print(np.hstack([x, x])) 
print(np.vstack([x, x]))

See how the `zeros_like` function creates an array of zeros with the same shape and data type as the input.

In [None]:
print(np.zeros(5))
print(np.ones(5))
print(np.full(5, 23)) # shape of 5, value of 23
print('')

x = np.array([10, 20, 30, 40, 50, 60])
print(x)
print(np.zeros_like(x))
print(np.ones_like(x))
print(np.full_like(x, 43))

In [None]:
lys = []
for i in range(10000):
    lys.append(i)
arr = np.array(lys) # convert the list to an array
arr

In [None]:
# Make an array of the correct length, consisting only of zeros, then change every value in the loop
n = 10000
arr = np.zeros(n)

for i in range(n):
    arr[i] = i
arr
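To see why repeated appending is costly, note that `np.append` never modifies its input; it always returns a new array. The sketch below (variable names are illustrative) checks this, and also shows that when the values follow a simple pattern you may not need a loop at all.

```python
import numpy as np

x = np.array([10, 20, 30])
y = np.append(x, 40)

print(x)                       # x itself is unchanged
print(y)                       # y is a new, longer array
print(np.shares_memory(x, y))  # the two arrays do not share data

# For a simple pattern like 0, 1, 2, ... no loop is needed
n = 10000
arr = np.arange(n)
print(arr[:5], arr[-1])
```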

## Matrices (2D) 

Matrices are essentially 2-dimensional arrays. Here are 5 ways to create them:
 + Turning nested lists into an array (where each list is a row)
 + Creating a matrix of zeros, ones, or other values (as above)
 + Creating a diagonal matrix
 + Reshaping a 1-D array
 + Making a meshgrid (useful for plotting; see next unit)
 
Note: It is no longer recommended to use np.matrix, even for linear algebra. Instead use regular arrays. [The class may be removed in the future.](https://numpy.org/doc/stable/reference/generated/numpy.matrix.html)

In [None]:
A = np.array([[1,2,3], [4,5,6]]) 
A

In [None]:
B = np.zeros((5, 4))
print(B)
print(np.shape(B))

In [None]:
np.full((3,4), 7)

In [None]:
C = np.diag(np.array([1, 2, 3, 4]))
C

In [None]:
D = np.arange(12).reshape(2, 6)
D

To turn a matrix back to a vector:

In [None]:
D.flatten()

Meshgrid forms 2 matrices, as illustrated below. Essentially, if you were to make a 3-D plot, you require a grid of points on the XY-plane. The XX matrix therefore represents the x-coordinates over that grid. 

![](Assets/Meshgrid.png)

(Answer from [StackOverflow](https://stackoverflow.com/questions/36013063/what-is-the-purpose-of-meshgrid-in-python-numpy))

In [None]:
x = [1, 2, 3, 4]
y = [7, 5, 6]
XX, YY = np.meshgrid(x, y)
print(XX)
print('')
print(YY)
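The point of having both grids is that a function of two variables can then be evaluated at every grid point in one statement. A small sketch, using an arbitrary function $z = x^2 + y$ chosen only for illustration:

```python
import numpy as np

x = [1, 2, 3, 4]
y = [7, 5, 6]
XX, YY = np.meshgrid(x, y)

# Both grids have one row per y-value and one column per x-value
print(XX.shape, YY.shape)

# Evaluate the function at every grid point at once
ZZ = XX**2 + YY
print(ZZ)

# ZZ[i, j] belongs to the point (x[j], y[i])
print(ZZ[1, 2] == x[2]**2 + y[1])
```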

### Indexing matrices

Remember how we indexed nested lists in Unit 1? The same principle applies here.

In [None]:
A = np.array([[ 7, 4,  5, 12],
              [-5, 8,  1,  0],
              [-6, 7,  9, 10]])

print(A[0]) # 1st row
print(A[0][0]) # 1st row, 1st column
print(A[0, 0]) # also acceptable
print(A[-1][1:3]) # last row, elements at index 1 and 2 (the stop index 3 is excluded)
print(A[:][0]) # A[:] is the whole matrix, so this is still the 1st row (not the 1st column)

To access columns, use this notation:

In [None]:
print(A[:, 0]) # 1st column
print(A[::, 0]) # also acceptable
print('----')
print(A[0:4, 0:4]) # entire matrix
print(A[:2, :3]) # first 2 rows; first 3 columns 

### Basic matrix operations

Let's consider the addition, multiplication and transposition of matrices.

Remember multiplication is only possible if the number of columns in the first matrix equals the number of rows in the second matrix. 

In [None]:
A = np.array([[2, 4], [1, -3]])
B = np.array([[0, -1], [2, 1]])
print(A)
print('')
print(B)

In [None]:
print(A + B) # element-wise addition

In [None]:
print(A @ B) # matrix multiplication; the @ operator is preferred

print(A.dot(B)) # equivalent to A @ B for 2-D arrays
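A common slip is to use `*` for matrix multiplication. For arrays, `*` multiplies element by element, which is a different operation. The sketch below reuses the same `A` and `B` to show the difference.

```python
import numpy as np

A = np.array([[2, 4], [1, -3]])
B = np.array([[0, -1], [2, 1]])

elementwise = A * B   # multiplies matching entries
matrix_prod = A @ B   # rows of A times columns of B

print(elementwise)
print(matrix_prod)

# Matrix multiplication is generally not commutative
print(np.array_equal(A @ B, B @ A))
```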

In [None]:
# both accepted
print(A.transpose())
print(A.T)

The identity matrix, $I$, is created with `np.eye`. The name is a pun on "I", chosen to prevent confusion 😜

In [None]:
np.eye(5)

## Linear algebra 

Next, let's illustrate the computational capabilities of the `linalg` submodule. Consider the system of linear equations given by

$ Ax = b $

If the system is well-determined, use `LA.solve()` to solve it. If the system is under- or over-determined, use the `LA.lstsq()` function to return the least-squares solution to a linear matrix equation. We will also calculate the determinant, inverse, eigenvalues, eigenvectors and 1-norm (the maximum absolute column sum) of the matrix $A$.

Try changing just 1 element in the system and see how it affects the results!

In [None]:
import numpy.linalg as LA

In [None]:
A = np.array([[2, 3, 4, 1], 
              [1, 1, 2, 1], 
              [2, 4, 5, 2], 
              [1, 2, 3, 4]])
b = np.array([[10], [5], [13], [10]])

In [None]:
LA.solve(A, b)

In [None]:
LA.det(A)

In [None]:
LA.inv(A)

In [None]:
evalues, evectors = LA.eig(A) 
print(evalues)
print(evectors)

In [None]:
LA.norm(A, ord=1)
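It is good practice to substitute a solution back into the system to check it. The sketch below does this for the system above, and then adds a small over-determined example for `LA.lstsq()`. The four data points and the names `xs`, `ys`, `M` and `coef` are made up for illustration: we fit a straight line $y = mx + c$ through 4 points, which gives 4 equations in only 2 unknowns.

```python
import numpy as np
import numpy.linalg as LA

A = np.array([[2, 3, 4, 1],
              [1, 1, 2, 1],
              [2, 4, 5, 2],
              [1, 2, 3, 4]])
b = np.array([[10], [5], [13], [10]])

# Well-determined system: solve, then check that A x reproduces b
x = LA.solve(A, b)
print(np.allclose(A @ x, b))

# Over-determined system (illustrative data): 4 equations, 2 unknowns
xs = np.array([0.0, 1.0, 2.0, 3.0])
ys = np.array([-1.0, 0.2, 0.9, 2.1])
M = np.column_stack([xs, np.ones_like(xs)])   # columns: x and a constant 1

# lstsq returns the solution, the residuals, the rank and the singular values
coef, residuals, rank, sing_vals = LA.lstsq(M, ys, rcond=None)
m, c = coef
print(m, c)
```

Passing `rcond=None` selects the newer default cutoff for small singular values and avoids a warning in some NumPy versions.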

## Polynomials 

For numerical evaluations of polynomials, NumPy provides "convenience classes" for quick calculations, including root-finding and curve-fitting. See the [docs](https://docs.scipy.org/doc/numpy/reference/routines.polynomials.classes.html).

For information on *plotting, symbolic calculations and nonlinear curve-fitting* see the next Units.

First, let's define $ p(x) = 1 + 2x + 4x^3 $

In [None]:
from numpy.polynomial import Polynomial as P

In [None]:
p = P([1,2,0,4])
p

In [None]:
p.roots()
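A `Polynomial` object can be called like a function, which gives a quick way to check the roots. This short sketch also takes the derivative with `deriv()`.

```python
import numpy as np
from numpy.polynomial import Polynomial as P

p = P([1, 2, 0, 4])   # 1 + 2x + 4x^3, lowest power first

print(p(0))           # value at x = 0
print(p(1))           # value at x = 1

# The derivative is another Polynomial: 2 + 12x^2
dp = p.deriv()
print(dp.coef)

# Substituting the roots back should give (approximately) zero
roots = p.roots()
print(np.allclose(p(roots), 0))
```

A cubic has three roots, so expect complex values among them when only one root is real.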

To fit a polynomial to data with `np.polyfit`, specify the x- and y-data, as well as the degree of the polynomial. The output will be the coefficients with the highest power first (the opposite order to the `Polynomial` class above, which takes the lowest power first).

In [None]:
xdata = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0]) 
ydata = np.array([0.0, 0.8, 0.9, 0.1, -0.8, -1.0])
z = np.polyfit(xdata , ydata , 3)
z
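To judge the quality of a fit, evaluate the fitted polynomial at the data points with `np.polyval` and look at the residuals. A minimal sketch using the same data (the names `fitted`, `residuals` and `rmse` are illustrative):

```python
import numpy as np

xdata = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
ydata = np.array([0.0, 0.8, 0.9, 0.1, -0.8, -1.0])
z = np.polyfit(xdata, ydata, 3)

# np.polyval expects the coefficients with the highest power first, as polyfit returns them
fitted = np.polyval(z, xdata)
residuals = ydata - fitted
print(residuals)

# Root-mean-square error as a single summary of the misfit
rmse = np.sqrt(np.mean(residuals**2))
print(rmse)
```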

To see other subpackages in NumPy like `random`, `fft` and `polynomial`, use help.

In [None]:
# np?

## Other abilities

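There are many more functions in the `numpy` library that can't be discussed here; see the links below. Keep in mind that although Python is currently at version 3.7, the modules have their own version numbers, so check that the documentation matches your installed version: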

 + [Mathematical ("universal") functions like arcsin](https://docs.scipy.org/doc/numpy/reference/routines.math.html)
 + [Financial functions like NPV](https://docs.scipy.org/doc/numpy/reference/routines.financial.html) (removed from NumPy in version 1.20; now in the separate `numpy-financial` package)
 + [More statistics](https://docs.scipy.org/doc/numpy/reference/routines.statistics.html)
 + [More linear algebra](https://docs.scipy.org/doc/numpy/reference/routines.linalg.html)
 + [More polynomials](https://docs.scipy.org/doc/numpy/reference/routines.polynomials.html)
 + [Dates and time](https://docs.scipy.org/doc/numpy/reference/arrays.datetime.html)
 + [Discrete (fast) Fourier transform](https://docs.scipy.org/doc/numpy/reference/routines.fft.html)
 + [Sorting, searching, and counting in arrays](https://docs.scipy.org/doc/numpy/reference/routines.sort.html)

# Citation

If NumPy contributes to a scientific publication, you may cite it as follows:

Oliphant, TE (2006) _A guide to NumPy_, Trelgol Publishing, USA.

Van der Walt, S, Colbert, SC and Varoquaux, G (2011) "The NumPy array: A structure for efficient numerical computation", _Computing in Science & Engineering_, 13(2), 22–30, DOI: 10.1109/MCSE.2011.37.

[(Publisher link)](https://aip.scitation.org/doi/abs/10.1109/MCSE.2011.37)