<p style="text-align: center;"><font size="8"><b>Section 3.0: NumPy</b></font><br>


# NumPy

The `NumPy` module (http://www.numpy.org/) is an almost indispensable module for scientific computing. It provides objects such as arrays and matrices, as well as functions spanning linear algebra, Fourier transforms, and statistics, among numerous other things. 

NumPy will be one of the modules you'll use often in this camp (and likely in most other scientific Python codes).

To start with we must import NumPy. To reduce the amount of typing for ourselves later we will rename the module `np` when we import it. Using `np` as shorthand for NumPy is a relatively standard convention in Python programming. 

In [None]:
import numpy as np

## Arrays

One important class that NumPy provides is the `array` class. An array is similar to a `list` in that it is a collection of objects. Typically arrays store numbers. 

NumPy arrays can be initialized in a similar way to lists. Below, we pass a list to the `np.array()` function to create our array object.

In [None]:
a = np.array([1,2.0,3.2])
print(a)
type(a)

[1.  2.  3.2]


numpy.ndarray

You'll notice the type of `a` is `numpy.ndarray`. NumPy arrays can be multidimensional. You can think of a 1D array as a kind of list (but not a Python list) and a 2D array as a kind of grid or table of values. Higher dimensional arrays are certainly possible; for example, you can think of a 3D array as a stack of grids. 

![np array](https://github.com/lukasbystricky/ISC-3313/blob/master/lectures/chapter2/images/np_array.jpg?raw=true)
(Image credit: Dalesha Hemrajani)
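
To make the "stack of grids" picture concrete, here is a minimal sketch of a 3D array built from two small grids. The name `cube` and its values are chosen purely for illustration.

```python
import numpy as np

# Two grids (each 2 rows by 3 columns) stacked into a single 3D array.
# The values are arbitrary illustration data.
cube = np.array([[[1, 2, 3], [4, 5, 6]],
                 [[7, 8, 9], [10, 11, 12]]])

print(cube)       # the two grids are printed one after the other
print(cube.ndim)  # 3
```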


The attribute `ndim` stores the number of dimensions in the array.

In [None]:
a.ndim

1

Multidimensional arrays can be initialized from a list of lists, with one inner list per row. 

In [None]:
b = np.array([[1, 2, 3.0], [1.2,2.2,2]])
print(b)

[[1.  2.  3. ]
 [1.2 2.2 2. ]]


In [None]:
print(b.ndim)

2


The `shape` property tells us how many rows and columns are in our array, while the `size` property tells us the total number of elements in the array.

In [None]:
print(b.shape)
print(b.size)

(2, 3)
6


Here `b.shape` is the tuple (2,3) meaning that `b` has 2 rows and 3 columns.

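We could also try to initialize an array from lists of different sizes. Recent versions of NumPy (1.24 and later) raise a `ValueError` for such ragged input unless we explicitly pass `dtype=object`; older versions only printed a deprecation warning.

In [None]:
c = np.array([[1,2],[3,4,5,6]], dtype=object)  # dtype=object is needed for rows of unequal length


This is valid. What is the dimension, size, and shape of `c`?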

In [None]:
print(c.ndim)
print(c.shape)
print(c.size)
print(c)

1
(2,)
2
[list([1, 2]) list([3, 4, 5, 6])]


The way we are initializing `c`, it looks like we are trying to make a multidimensional array with the first row being [1,2] and the second row being [3,4,5,6]. Clearly since the lengths of these two rows are unequal, we cannot make a grid out of them. 

NumPy recognizes this and, instead of making an array of dimension 2, creates an array of dimension 1. Instead of having 6 elements, it has only 2, each of which is a Python list.
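
As a quick check of the claim that each element is a Python list, the sketch below inspects such an array. It assumes a recent NumPy, where `dtype=object` must be passed explicitly for rows of unequal length.

```python
import numpy as np

# Rows of unequal length: with dtype=object each row is stored as a Python list
c = np.array([[1, 2], [3, 4, 5, 6]], dtype=object)

print(c.dtype)     # object
print(type(c[1]))  # <class 'list'>
print(len(c[1]))   # 4, the length of the second list
```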

### Indexing

It's important to know how arrays are numbered. Like lists and strings, arrays are 0 indexed, meaning the first entry in an array is at index 0. 

Two-dimensional arrays have rows and columns. The entry at index [0,0] (first row, first column) is located at the upper left hand corner of the array. 

![row map](https://github.com/lukasbystricky/ISC-3313/blob/master/lectures/chapter2/images/row_column.gif?raw=true)

Like lists and strings, arrays support indexing and slicing.

In [None]:
a = np.array([1,2.0,3.2])
a[0] # first entry in a

1.0

In [None]:
b = np.array([[1, 2, 3.0], [1.2,2.2,2]])
b[1] # second entry in b, each entry is a row

array([1.2, 2.2, 2. ])

When we have a multidimensional array (or an array of arrays of equal or unequal length) we can access the element at row i and column j using the syntax:

In [None]:
b[1][0] # first element in the second row of b

1.2

Or, for a true multidimensional array (rows of equal length), the equivalent syntax:

In [None]:
b[1,0]

1.2

Note that `b[1][0]` is the element in b at row 1 column 0.

`b[1][0]` can also be thought of as the element at index 0 of `b[1]`.

Note that arrays are mutable. For example we can modify an element of `b`.

In [None]:
b[0][0] = 8
print(b)

[[8.  2.  3. ]
 [1.2 2.2 2. ]]


Slicing is done in exactly the same way.

In [None]:
a = np.array([[1,2,3],[4,5,6],[7,8,9],[10,11,12]])
print(a)

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]


In [None]:
print(a[0,0:2]) # first 2 entries of row 0

[1 2]


In [None]:
print(a[0:2,1:3]) # first 2 rows, second and third columns (indices 1 and 2)

[[2 3]
 [5 6]]


The colon operator by itself means the entire row or column.

In [None]:
print(a[:,1]) # entire second column

[ 2  5  8 11]
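
Since arrays slice like lists, negative indices and step sizes should work too. The following is a small sketch on the same 4-by-3 array; the particular slices are chosen only as examples.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])

print(a[-1])       # last row: [10 11 12]
print(a[::2])      # every other row: rows 0 and 2
print(a[1:, :2])   # rows 1 onward, first 2 columns
```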


## Exercise

Create a NumPy array representing the data:
$$ \begin{bmatrix} 1 & 2 & 3 & 4\\ 5 & 6 & 7 & 8\\ 9 & 10 & 11 & 12\\ 13 & 14 & 15 & 16\end{bmatrix}$$

Now extract the middle 2x2 array from the array you just created, i.e. use slicing to extract the array:
$$ \begin{bmatrix} 6 & 7\\ 10 & 11\end{bmatrix}$$

### Operations

Arrays support several familiar operators. For example you can multiply or divide them by a number.

In [None]:
a = np.array([1,2])
print(2*a)
print(a/2)

[2 4]
[0.5 1. ]


You can also add a number to them. 

To be clear: when you add, subtract, multiply, divide, or exponentiate an array by a single number (known as a scalar), the operation is applied to **every** element of the array.

In [None]:
a = np.array([1,2])
print(a + 1)

[2 3]
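
The same rule holds for the other scalar operations mentioned above. A short sketch, using an arbitrary three-element array:

```python
import numpy as np

a = np.array([1, 2, 3])

print(a - 1)   # [0 1 2]
print(a * 3)   # [3 6 9]
print(a / 2)   # [0.5 1.  1.5]
print(a ** 2)  # [1 4 9]
```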


You can even add two arrays together.

In [None]:
b = np.array([3,4])
print(a + b)

[4 6]


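When you add two arrays together they must have compatible shapes. For two 1D arrays, this means they must be the same size.

In [None]:
a = np.array([1,2])
b = np.array([2,4,5])
print(a + b)

ValueError: operands could not be broadcast together with shapes (2,) (3,) 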

Suppose we want to multiply two NumPy arrays, $a = [a_1,a_2,a_3]$ and $b = [b_1,b_2,b_3]$. How does NumPy do it?


In [None]:
a = np.array([1,2,3])
b = np.array([3,4,5])

print(a*b)

[ 3  8 15]


NumPy does what is called *element-wise multiplication*. In other words, `a*b` is equal to $[a_1 b_1, a_2 b_2, a_3 b_3]$.

Now suppose $A$ is a 2D array and $x$ is a 1D array. What is `A*x` in Python? 

In [None]:
x = np.array([1,2])
A = np.array([[3,2], [1,2]])

print(A*x)

[[3 4]
 [1 4]]


It's still an elementwise multiplication. In NumPy, `A*x` in this case evaluates to:

$$ \begin{bmatrix} A_{11}x_1 & A_{12}x_2\\ A_{21}x_1 & A_{22} x_2\end{bmatrix}$$

We multiply each column of `A` by the corresponding entry of `x`. Note that this is not the matrix-vector product from linear algebra.

Likewise if we compute `A**2`, we get

In [None]:
print(A**2)

[[9 4]
 [1 4]]


which is 
$$ \begin{bmatrix} A_{11}^2 & A_{12}^2\\A_{21}^2 & A_{22}^2\end{bmatrix}.$$
We still get elementwise multiplication.
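
For contrast, here is a brief sketch comparing the elementwise product with the matrix-vector product from linear algebra. The `@` operator is not otherwise covered in this section; it is shown only to highlight the difference.

```python
import numpy as np

x = np.array([1, 2])
A = np.array([[3, 2], [1, 2]])

elementwise = A * x  # each column of A scaled by the matching entry of x
matvec = A @ x       # matrix-vector product: row i of A dotted with x

print(elementwise)   # [[3 4]
                     #  [1 4]]
print(matvec)        # [7 5]
```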

## Exercise

Write a code fragment that adds the arrays:

$$A = \begin{bmatrix} 1 & 2 \\ 3 & 4\end{bmatrix}$$
and 
$$ B = \begin{bmatrix} 5 & 6\\ 7 & 8\end{bmatrix}.$$

## Other Useful Operations


### arange
NumPy provides many useful operations to generate and manipulate arrays. 

For example suppose we want to create an array ranging from 1 to 20. NumPy provides a function `np.arange` that does just that.

In [None]:
a = np.arange(1,21)
print(a)

[ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20]


The arange function works in a similar way to the `range` class we saw earlier. It can take up to three arguments: a starting value, an end value and a step size. It returns a 1D array that starts at the starting value and keeps adding the step size, stopping before it reaches the end value. As with `range`, the end value itself is not included, which is why we wrote `np.arange(1,21)` to get the numbers 1 to 20. 
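
A few integer examples may help illustrate the one-, two-, and three-argument forms. The specific values are arbitrary.

```python
import numpy as np

print(np.arange(5))          # start defaults to 0: [0 1 2 3 4]
print(np.arange(2, 11, 2))   # step of 2: [ 2  4  6  8 10]
print(np.arange(10, 0, -3))  # negative step counts down: [10  7  4  1]
```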


Note that unlike the `range` class, the starting and ending values as well as the step size can be floats. 

In [None]:
a = np.arange(1.1,2.0,0.1)
print(a)

[1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9]


In [None]:
a = np.arange(1,2.2,0.1)
print(a)

[1.  1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.  2.1 2.2]


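You'll notice, however, that when we use floating point numbers, we may or may not have the ending value as part of our array: the first example stops at 1.9, one step before the end value 2.0, while the second includes the end value 2.2. This is due to floating point round-off error. 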

### linspace

This ambiguity in arange can cause problems. Fortunately, NumPy provides a separate function that creates an array of a specified length. The `np.linspace` function takes in a start and end value as well as the number of points. Unless you are dealing with integers, linspace is preferred over arange to generate equally spaced arrays. 

In [None]:
a = np.linspace(0,2.2,5)
print(a)

[0.   0.55 1.1  1.65 2.2 ]


The last parameter in linspace is the number of desired points in the array. Unlike arange, linspace includes the end value by default. If the start value is greater than the end value, then linspace automatically takes negative step sizes.

In [None]:
a = np.linspace(5,1,5)
print(a)

[5. 4. 3. 2. 1.]
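
The spacing linspace uses can be worked out by hand: with `num` points there are `num - 1` gaps between the start and end values. The sketch below checks this for the earlier example; the variable names `start`, `stop`, `num`, and `step` are introduced here only for illustration.

```python
import numpy as np

start, stop, num = 0, 2.2, 5
a = np.linspace(start, stop, num)

# num points leave (num - 1) equal gaps between start and stop
step = (stop - start) / (num - 1)

print(len(a))                         # always exactly num points
print(step)
print(np.allclose(np.diff(a), step))  # True: neighbouring points are one step apart
```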


### reshape

Suppose we wanted to create the array:
$$ A = \begin{bmatrix} 1 & 2 & 3\\ 4 & 5 & 6\\ 7 &8&9\\10 & 11& 12\end{bmatrix}.$$

We could create this matrix by hand, but if it was much larger that would be a pain. What if instead we used arange or linspace to create a 1D array with the same data, and then reshaped into an array with 4 rows and 3 columns? 

NumPy provides the function `np.reshape` that does just that.

In [None]:
a = np.arange(1, 13)
A = np.reshape(a, (4, 3))

print(A)

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]


Note that the second argument to `np.reshape` is a tuple (r,c) where r is the number of rows and c is the number of columns we want the new array to have. 
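
Two details are worth a quick sketch: the new shape must hold exactly as many elements as the original array, and one entry of the shape tuple may be given as `-1` to let NumPy work it out. The shapes below are chosen only as examples.

```python
import numpy as np

a = np.arange(1, 13)        # 12 elements

A = np.reshape(a, (3, -1))  # -1 asks NumPy to infer the number of columns
print(A.shape)              # (3, 4)

try:
    np.reshape(a, (5, 3))   # 15 slots cannot be filled by 12 elements
except ValueError as err:
    print("ValueError:", err)
```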

## Exercise

Using the linspace and reshape functions, create the array:
$$ A = \begin{bmatrix} 0.1 & 0.2 & 0.3 & 0.4 & 0.5\\ 0.6 & 0.7 & 0.8 & 0.9 & 1\end{bmatrix}.$$

### zeros

Sometimes it can be useful to create an array of zeros to fill in with non-zero values later:

$$ A = \begin{bmatrix} 0 & 0 & 0\\ 0 & 0 & 0\\ 0 &0&0\\0 & 0& 0\end{bmatrix}.$$

Typing out rows of zeros by hand is repetitive and, as in the `reshape` example, becomes painful for larger arrays. 

Again NumPy saves the day with `np.zeros((n,m))`, where n is the number of rows and m is the number of columns of zeros that you want. We can even make 1D arrays of zeros with this function, a la `np.zeros(n)`.

In [None]:
a = np.zeros(5)
A = np.zeros((4, 3))
print(a,'\n')
print(A)

[0. 0. 0. 0. 0.] 

[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
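
As a sketch of the "fill in later" pattern, the example below creates a small array of zeros and then overwrites one entry and one whole row. The values written are arbitrary.

```python
import numpy as np

A = np.zeros((2, 3))

A[0, 1] = 5        # overwrite a single entry
A[1] = [1, 2, 3]   # overwrite the whole second row

print(A)
# [[0. 5. 0.]
#  [1. 2. 3.]]
```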


### sum and mean/average

We can also easily get the sum and mean/average of an array using `np.sum(array)` and `np.mean(array)`, respectively. If the array is 2D or higher, you can specify the `axis` over which the sum or mean is taken. For example, with a 2D array, `np.sum(A,axis=0)` sums across the rows (down each column), giving one value per column, while `np.mean(A,axis=1)` takes the mean across the columns (along each row), giving one value per row.

![axis](https://i.stack.imgur.com/Z29Nn.jpg)

In [None]:
a = np.arange(1, 13)
B = np.reshape(a, (4, 3))
print(a)

print("Sum  of a:", np.sum(a))
print("Mean of a:", np.mean(a))

print("\n", B)
print("Sum  of B:", np.sum(B))
print("Mean of B:", np.mean(B))
print("\nSum  across B rows:", np.sum(B,axis=0))
print("Mean across B rows:", np.mean(B,axis=0))
print("\nSum  across B columns:", np.sum(B,axis=1))
print("Mean across B columns:", np.mean(B,axis=1))



[ 1  2  3  4  5  6  7  8  9 10 11 12]
Sum  of a: 78
Mean of a: 6.5

 [[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
Sum  of B: 78
Mean of B: 6.5

Sum  across B rows: [22 26 30]
Mean across B rows: [5.5 6.5 7.5]

Sum  across B columns: [ 6 15 24 33]
Mean across B columns: [ 2.  5.  8. 11.]
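
One way to keep the axes straight is to look at the shape of the result: the axis you sum over is the one that disappears. A minimal sketch with a hypothetical 3-by-2 array `M`:

```python
import numpy as np

M = np.array([[1, 2], [3, 4], [5, 6]])  # 3 rows, 2 columns

col_totals = np.sum(M, axis=0)  # collapses the rows: one total per column
row_totals = np.sum(M, axis=1)  # collapses the columns: one total per row

print(col_totals, col_totals.shape)  # [ 9 12] (2,)
print(row_totals, row_totals.shape)  # [ 3  7 11] (3,)
```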


### max and min

Finding the largest and the smallest values in an array is also very easy with NumPy! One simply needs to use `np.max(Array)` to find the largest value and `np.min(Array)` to find the smallest value.

For 2D or greater arrays, you can get the maximum/minimum values for a given axis of the array, e.g. `np.max(B, axis=0)` would get the maximum values across the rows of `B` (one per column), while `axis=1` would give the maximum values across the columns (one per row).

In [19]:
a = np.arange(1, 13)
B = np.reshape(a, (4, 3))
print(a)
print("Max of a:", np.max(a))
print("Min of a:", np.min(a), "\n")
print(B)
print("Max of B:", np.max(B))
print("Max across B rows:", np.max(B,axis=0))
print("Min across B columns:", np.min(B,axis=1))

[ 1  2  3  4  5  6  7  8  9 10 11 12]
Max of a: 12
Min of a: 1 

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
Max of B: 12
Max across B rows: [10 11 12]
Min across B columns: [ 1  4  7 10]


### where

Suppose we wanted to take all of the values of an array that were larger than 2.5 and make them zero. This is certainly doable by using `for` loops (we'll learn about these later) or simply doing it by hand.

However, `np.where()` makes this very easy. For our example, you would use `np.where(B < 2.5, B, 0.0)`, which translates to "wherever an entry of B is less than 2.5, keep that entry; otherwise use 0.0". Flipping the condition and swapping the last two arguments, as in `np.where(B > 2.5, 0.0, B)`, gives the same result.

In [18]:
B = np.reshape(np.arange(1, 13), (4, 3))
print(B,"\n")
print(np.where(B < 2.5, B, 0.0), "\n")
print(np.where(B > 2.5, 0.0, B), "\n")

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]] 

[[1. 2. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]] 

[[1. 2. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]] 
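
The general pattern is `np.where(condition, value_if_true, value_if_false)`, applied entry by entry. Here is a small 1D sketch; the array `temps` and the labels are made up for illustration.

```python
import numpy as np

temps = np.array([-3, 5, -1, 8])  # hypothetical temperature readings

# Replace negative entries by 0 and keep the rest
clipped = np.where(temps < 0, 0, temps)
print(clipped)  # [0 5 0 8]

# The two choices do not have to be numbers
labels = np.where(temps < 0, "cold", "warm")
print(labels)   # ['cold' 'warm' 'cold' 'warm']
```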



### stack

Finally, we'll go over the `stack` function in NumPy. This function allows us to stack multiple arrays together into a larger, higher dimensional array. Suppose we had three 1D arrays of length 3 each and we wanted to stack them on top of each other to form a single 3-by-3 2D array. `np.stack` provides the means to do this.

In [17]:
a = np.array([1,1,1])
b = np.array([2,2,2])
c = np.array([3,3,3])
print("a = ", a)
print("b = ", b)
print("c = ", c)
D = np.stack((a,b,c),axis=0)
print("\nD = \n",D)




a =  [1 1 1]
b =  [2 2 2]
c =  [3 3 3]

D = 
 [[1 1 1]
 [2 2 2]
 [3 3 3]]
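
Changing the `axis` argument changes how the arrays are arranged. As a sketch, stacking the same three arrays with `axis=1` places them side by side as columns instead of rows; the name `E` is chosen only for this example.

```python
import numpy as np

a = np.array([1, 1, 1])
b = np.array([2, 2, 2])
c = np.array([3, 3, 3])

# With axis=1 each input array becomes a column of the result
E = np.stack((a, b, c), axis=1)
print(E)
# [[1 2 3]
#  [1 2 3]
#  [1 2 3]]
```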


## Exercise

Try constructing the following 2D array

$$ A = \begin{bmatrix} 1 & 2 & 3\\ 4 & 5 & 6\\ 7 & 8 & 9\end{bmatrix}$$

HINT: Constructing this array can be done with some combination of `np.arange`, `np.reshape`, and/or `np.stack`

### Mathematical Operations

We saw the math module earlier. It provides functions like sine, cosine, arc-tangent and so on. These functions only work on single numbers. If we pass in a NumPy array with more than one element, we get an error.

In [None]:
import math

a = np.array([0,1])
print(math.sin(a))

TypeError: ignored

NumPy provides its own implementations of many mathematical functions that take in arrays and perform operations on each element.

In [None]:
a = np.array([0,1])
print(np.sin(a))

[0.         0.84147098]


The fact that the NumPy and math modules provide functions with the same names demonstrates why it is a bad idea to import everything at once from a module.
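
Keeping the module prefixes makes it clear which version of a function is being called. A minimal sketch, with arbitrary input values:

```python
import math
import numpy as np

angles = np.array([0, np.pi / 2, np.pi])

print(np.cos(angles))                 # cosine of every element (middle value is only approximately 0)
print(np.sqrt(np.array([1, 4, 9])))   # [1. 2. 3.]

# With the prefixes there is no ambiguity about which sqrt is used
print(math.sqrt(4), np.sqrt(4))       # both accept a single number
```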

## Exercise

Evaluate the expression:
$ 3a + b,$
where $a = [1, 2, 3, 4, 5]$ and $b = [1.1,1.2,1.3,1.4,1.5].$ Use linspace or arange to create $a$ and $b$.

### Arrays vs. Lists

Arrays and lists are similar in many ways. Both represent a collection of objects. In this camp (and beyond) arrays and lists will be the most common data structures you will use. When should you use one over the other? 

For starters, arrays are mutable; however, they do not support methods such as `pop` or `append`. Once initialized, the size of an array cannot be easily changed. If your application needs to change the size of a collection, lists are the preferred option. 
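
To illustrate the difference, the sketch below grows a list in place and then uses the function `np.append`, which does not resize the array but builds a new, longer one. The variable names are hypothetical.

```python
import numpy as np

values_list = [1, 2]
values_list.append(3)                # the list itself grows

values_array = np.array([1, 2])
longer = np.append(values_array, 3)  # returns a new array; the original is unchanged

print(values_list)   # [1, 2, 3]
print(values_array)  # [1 2]
print(longer)        # [1 2 3]
```

Because a new array is created on every call, repeatedly appending to an array in this way is usually slower than appending to a list.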

Another difference between arrays and lists is how operators are defined. We saw earlier how we can add two arrays together or multiply them by a number. This behaviour is different from how it is handled with lists.

In [None]:
a_array = np.array([1,2])
b_array = np.array([3,4])

a_list = [1,2]
b_list = [3,4]

print("a_array + b_array:", a_array + b_array)
print("a_list + b_list:  ", a_list + b_list)

print("2*a_array:", 2*a_array)
print("2*a_list: ", 2*a_list)

a_array + b_array: [4 6]
a_list + b_list:   [1, 2, 3, 4]
2*a_array: [2 4]
2*a_list:  [1, 2, 1, 2]


It's possible to convert from a list to an array or vice versa. NumPy arrays provide the method `tolist()` which converts an array to a list.

In [None]:
c = a_array.tolist()
type(c)

list

NumPy also provides the function `asarray` that takes a list and returns an array.

In [None]:
d = np.asarray(c)
type(d)

numpy.ndarray

# Documentation - NumPy

But what if you forget how to properly use a function from NumPy? Or maybe you want to look for other functions that might be of use to you? Well, computer programmers have you covered, with **documentation**. Just like the user manuals that come with a new car, microwave, cell phone, etc., Python modules come with their own user manuals! NumPy's documentation can be found here: [NumPy docs link](https://numpy.org/doc/stable/user/index.html). At this website, you'll find a table of contents leading to different helpful resources for using NumPy in Python code, such as their [absolute beginner's guide](https://numpy.org/doc/stable/user/absolute_beginners.html). When in doubt, look at the documentation website!

For completeness, the Python `math` module documentation can be found here [math docs](https://docs.python.org/3/library/math.html#) which is a part of the greater [Python documentation website](https://docs.python.org/3/).
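
Much of this documentation is also available from inside Python itself. As a small sketch, the built-in `help` function prints the manual entry for a function, and the same text is stored in the function's `__doc__` attribute; the example below prints just its first few lines.

```python
import numpy as np

# In a notebook or interactive session, help(np.linspace) prints the full entry.
# The same text is stored on the function itself as a string:
doc = np.linspace.__doc__

first_lines = doc.strip().splitlines()[:3]
print("\n".join(first_lines))
```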

A word of caution: larger projects like NumPy are well supported, but some projects are not so fortunate. Good documentation takes a lot of time and resources to make, so smaller Python modules may not be as well documented. However, there are still ways to find useful information about individual modules and their functions. Oftentimes, Google Search will be your friend when you are stuck on a programming problem, particularly search results that take you to stackoverflow.com.