Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\rightarrow$Run All).

Make sure you fill in any place that says `YOUR CODE HERE` or "YOUR ANSWER HERE", as well as your name and group below:

In [None]:
COURSE = "StatModels_2020_q3"
GROUP = "" # Either D2A or D2B
NAME = "" # Match your GitHub Classroom ID

---

###### Content under Creative Commons Attribution license CC-BY 4.0, code under BSD 3-Clause License © 2017 L.A. Barba, N.C. Clementi
###### Modified (2020) Gonzalo G. Peraza Mues

# Play with NumPy Arrays

Remember, this course assumes no coding experience, so the first three lessons were focused on creating a foundation with Python programming constructs using essentially _no mathematics_. 

In engineering applications, most computing situations benefit from using *arrays*: they are sequences of data all of the _same type_. They behave a lot like lists, except for the constraint in the type of their elements. There is a huge efficiency advantage when you know that all elements of a sequence are of the same type—so equivalent methods for arrays execute a lot faster than those for lists.

The Python language is expanded for special applications, like scientific computing, with **libraries**. The most important library in science and engineering is **NumPy**, providing the _n-dimensional array_ data structure (a.k.a. `ndarray`) and a wealth of functions, operations and algorithms for efficient linear-algebra computations.

In this lesson, you'll start playing with NumPy arrays and discover their power. You'll also meet another widely loved library: **Matplotlib**, for creating two-dimensional plots of data.

## Importing libraries

First, a word on importing libraries to expand your running Python session. Because libraries are large collections of code and are for special purposes, they are not loaded automatically when you launch Python (or IPython, or Jupyter). You have to import a library using the `import` command. For example, to import **NumPy**, with all its linear-algebra goodness, we enter:

```python
import numpy
```

Once you execute that command in a code cell, you can call any NumPy function using the dot notation, prepending the library name. For example, some commonly used functions are:

* [`numpy.linspace()`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.linspace.html)
* [`numpy.ones()`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ones.html#numpy.ones)
* [`numpy.zeros()`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.zeros.html#numpy.zeros)
* [`numpy.empty()`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.empty.html#numpy.empty)
* [`numpy.copy()`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.copy.html#numpy.copy)

Follow the links to explore the documentation for these very useful NumPy functions!

You will find _a lot_ of sample code online that uses a different syntax for importing:
```python
import numpy as np
```
All this does is create an alias for `numpy` with the shorter string `np`, so you then would call a **NumPy** function like this: `np.linspace()`. This is just an alternative way of doing it, but has become the standard, so we'll follow along.

In [None]:
import numpy as np

## Creating arrays

To create a NumPy array from an existing list of (homogeneous) numbers, we call **`numpy.array()`**, like this:

In [None]:
np.array([3, 5, 8, 17])
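Since all elements of an array share one type, it can help to see what NumPy does when the list mixes integers and floats. The short sketch below inspects the `dtype` attribute of two arrays; the variable names `int_array` and `mixed_array` are made up for this illustration.

```python
import numpy as np

# A list of integers becomes an array of integers.
int_array = np.array([3, 5, 8, 17])
print(int_array.dtype)    # an integer type; the exact width depends on your platform

# If the list mixes integers and floats, NumPy converts every element to float
# so that the array still holds a single type.
mixed_array = np.array([3, 5.5, 8, 17])
print(mixed_array)
print(mixed_array.dtype)  # float64
```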

NumPy offers many [ways to create arrays](https://docs.scipy.org/doc/numpy/reference/routines.array-creation.html#routines-array-creation) in addition to this. We already mentioned some of them above. 

Play with `numpy.ones()` and `numpy.zeros()`: they create arrays full of ones and zeros, respectively. We pass as an argument the number of array elements we want. 

In [None]:
np.ones(5)

In [None]:
np.zeros(3)
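Two of the functions listed earlier, `numpy.empty()` and `numpy.copy()`, are easy to misread, so here is a small sketch of how they behave. It also shows that passing a tuple instead of a single number gives a multidimensional array. All variable names below are made up for the example.

```python
import numpy as np

# A tuple as the argument gives a multidimensional array: 2 rows, 3 columns.
zeros_2d = np.zeros((2, 3))
print(zeros_2d)

# np.empty() reserves memory without initializing it, so its starting values
# are arbitrary. Always fill the array before using it.
scratch = np.empty(4)
scratch[:] = 7.0
print(scratch)

# Plain assignment only creates a second name for the same array...
original = np.ones(3)
alias = original
# ...while np.copy() creates an independent array.
duplicate = np.copy(original)

original[0] = 10.0
print(alias)      # the change shows up here too
print(duplicate)  # still all ones
```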

Another useful one: `numpy.arange()` gives an array of evenly spaced values in a defined interval. 

*Syntax:*

`numpy.arange(start, stop, step)`

where `start` by default is zero, `stop` is not inclusive, and the default for `step` is one. Play with it!


In [None]:
np.arange(4)

In [None]:
np.arange(2, 6)

In [None]:
np.arange(2, 6, 2)

In [None]:
np.arange(2, 6, 0.5)

`numpy.linspace()` is similar to `numpy.arange()`, but uses number of samples instead of a step size. It returns an array with evenly spaced numbers over the specified interval.  

*Syntax:*

`numpy.linspace(start, stop, num)`

`stop` is included by default (pass `endpoint=False` to exclude it; see the docs), and `num` by default is 50. 

In [None]:
np.linspace(2.0, 3.0)

In [None]:
len(np.linspace(2.0, 3.0))

In [None]:
np.linspace(2.0, 3.0, 6)

In [None]:
np.linspace(-1, 1, 9)
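Whether `stop` is included is controlled by the `endpoint` keyword of `numpy.linspace()`. A minimal comparison, with variable names chosen just for this sketch:

```python
import numpy as np

# Default behavior: both ends of the interval are included.
with_stop = np.linspace(0, 1, 5)
print(with_stop)

# endpoint=False: still 5 samples, but `stop` itself is left out,
# so the spacing between samples becomes smaller.
without_stop = np.linspace(0, 1, 5, endpoint=False)
print(without_stop)
```

Notice that the number of samples stays the same in both cases; only the spacing changes (0.25 in the first array, 0.2 in the second).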

## Array operations

Let's assign some arrays to variable names and perform some operations with them.

In [None]:
x_array = np.linspace(-1, 1, 9)

Now that we've saved it with a variable name, we can do some computations with the array. E.g., take the square of every element of the array, in one go:

In [None]:
y_array = x_array**2
print(y_array)

We can also take the square root of an array with no negative elements, using the `numpy.sqrt()` function:

In [None]:
z_array = np.sqrt(y_array)
print(z_array)

Now that we have different arrays `x_array`, `y_array` and `z_array`, we can do more computations, like add or multiply them. For example:

In [None]:
add_array = x_array + y_array 
print(add_array)

Array addition is defined element-wise, like when adding two vectors (or matrices). Array multiplication is also element-wise:

In [None]:
mult_array = x_array * z_array
print(mult_array)

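We can also divide arrays, but you have to be careful not to divide by zero. NumPy will still perform the division: it issues a warning instead of stopping with an error, and fills in a special value. Dividing zero by zero results in a **`nan`**, which stands for *Not a Number*; dividing a nonzero number by zero results in **`inf`** or **`-inf`** (infinity).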

Let's see how this might look:

In [None]:
x_array / y_array
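To see the different outcomes of dividing by zero side by side, here is a small sketch with made-up arrays. It uses `numpy.errstate()` to silence the warnings for that one division, and `numpy.isnan()` and `numpy.isfinite()` to locate the problematic entries afterwards.

```python
import numpy as np

numerator = np.array([1.0, 0.0, -1.0, 2.0])
denominator = np.array([0.0, 0.0, 0.0, 4.0])

# np.errstate silences the division warnings inside the `with` block only.
with np.errstate(divide='ignore', invalid='ignore'):
    ratio = numerator / denominator

# The four results are: inf, nan, -inf and 0.5
print(ratio)

# True only where the result is nan (the 0/0 entry)
print(np.isnan(ratio))

# True only where the result is an ordinary number
print(np.isfinite(ratio))
```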

## Multidimensional arrays

### 2D arrays 

NumPy can create arrays of N dimensions.  For example, a 2D array is like a matrix, and is created from a nested list as follows:

In [None]:
array_2d = np.array([[1, 2], [3, 4]])
print(array_2d)

2D arrays can be added, subtracted, and multiplied:

In [None]:
X = np.array([[1, 2], [3, 4]])
Y = np.array([[1, -1], [0, 1]])

The addition of these two matrices works exactly as you would expect:

In [None]:
X + Y

What if we try to multiply arrays using the `'*'` operator?

In [None]:
X * Y

The multiplication using the `'*'` operator is element-wise. If we want to do matrix multiplication we use the `'@'` operator:

In [None]:
X @ Y

Or equivalently we can use `numpy.dot()`:

In [None]:
np.dot(X, Y)
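As a quick check of the difference between the two products, the sketch below computes one entry of the matrix product by hand and then shows that the order of the factors matters for `'@'` but not for `'*'`. The name `entry_01` is made up for this example.

```python
import numpy as np

X = np.array([[1, 2], [3, 4]])
Y = np.array([[1, -1], [0, 1]])

# Entry in row 0, column 1 of the matrix product:
# row 0 of X is [1, 2] and column 1 of Y is [-1, 1].
entry_01 = 1 * (-1) + 2 * 1
print(entry_01)
print(X @ Y)   # compare with the entry in the first row, second column

# The element-wise product does not depend on the order of the factors...
print(np.array_equal(X * Y, Y * X))   # True
# ...but the matrix product does.
print(np.array_equal(X @ Y, Y @ X))   # False
```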

### 3D arrays

Let's create a 3D array by reshaping a 1D array. We can use [`numpy.reshape()`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.reshape.html), where we pass the array we want to reshape and the shape we want to give it, i.e., the number of elements in each dimension. 

*Syntax*
 
`numpy.reshape(array, newshape)`

For example:

In [None]:
a = np.arange(24)

In [None]:
a_3D = np.reshape(a, (2, 3, 4))
print(a_3D)

We can check the shape of a NumPy array using the function `numpy.shape()`:

In [None]:
np.shape(a_3D)
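The same information is also available directly from the array, through its attributes. A short sketch (the name `same_array` is made up here), which also shows that arrays have a `reshape()` method equivalent to `numpy.reshape()`:

```python
import numpy as np

a = np.arange(24)
a_3D = np.reshape(a, (2, 3, 4))

print(a_3D.shape)  # (2, 3, 4): elements along each dimension
print(a_3D.ndim)   # 3: number of dimensions
print(a_3D.size)   # 24: total number of elements

# The array method gives the same result as the function.
same_array = a.reshape(2, 3, 4)
print(np.array_equal(a_3D, same_array))  # True
```

The product of the numbers in the new shape must equal the total number of elements (here, 2 × 3 × 4 = 24); otherwise NumPy raises a `ValueError`.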

Visualizing the dimensions of the `a_3D` array can be tricky, so here is a diagram that will help you to understand how the dimensions are assigned: each dimension is shown as a coordinate axis. For a 3D array, on the "x axis", we have the sub-arrays that themselves are two-dimensional (matrices). We have two of these 2D sub-arrays, in this case; each one has 3 rows and 4 columns. Study this sketch carefully, while comparing with how the array `a_3D` is printed out above. 

<img src="images/3d_array_sketch.png" style="width: 400px;"/> 


When we have multidimensional arrays, we can access slices of their elements by slicing on each dimension. This is one of the advantages of using arrays: we cannot do this with lists. 

Let's access some elements of our 2D array called `X`.

In [None]:
X

In [None]:
# Grab the element in the 1st row and 1st column 
X[0, 0]

In [None]:
# Grab the element in the 1st row and 2nd column 
X[0, 1]

Play with slicing on this array:

In [None]:
# Grab the 1st column
X[:, 0]

When we don't specify the start and/or end point in the slicing, the symbol `':'` means "all". In the example above, we are telling NumPy that we want all the rows (`':'` in the first dimension), but only the 0-th index in the second dimension: that is, the first column.

In [None]:
# Grab the 1st row
X[0, :]
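To see why slicing on each dimension is an advantage over lists, compare how a column is extracted in each case. This is a minimal sketch with made-up data; `nested_list`, `table` and the other names are not part of the lesson.

```python
import numpy as np

nested_list = [[10, 20, 30], [40, 50, 60]]   # made-up data: 2 rows, 3 columns
table = np.array(nested_list)

# With a nested list, the column has to be collected one row at a time.
col_from_list = []
for row in nested_list:
    col_from_list.append(row[2])
print(col_from_list)    # [30, 60]

# With an array, a single slice is enough.
col_from_array = table[:, 2]
print(col_from_array)   # [30 60]
```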

##### Exercises:

From the X array:

1. Grab the 2nd column, store it in `x_c2`.
2. Grab the 2nd row, store it in `x_r_2`.

In [None]:
# Exercise 1

# YOUR CODE HERE
raise NotImplementedError()

# Exercise 2

# YOUR CODE HERE
raise NotImplementedError()

Let's practice with a 3D array. 

In [None]:
a_3D

If we want to grab the first column of both matrices in our `a_3D` array, we do:

In [None]:
a_3D[:, :, 0]

The line above is telling NumPy that we want:

* first `':'` : from the first dimension, grab all the elements (2 matrices).
* second `':'`: from the second dimension, grab all the elements (all the rows).
* `'0'`       : from the third dimension, grab the first element (first column).
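One detail worth checking is the shape of the result. In the sketch below (variable names are made up), indexing a dimension with a single integer removes that dimension, while slicing it with a range keeps it, even if the range contains only one element.

```python
import numpy as np

a_3D = np.reshape(np.arange(24), (2, 3, 4))

# A single integer in the third dimension removes that dimension.
first_col = a_3D[:, :, 0]
print(first_col.shape)        # (2, 3)

# A slice in the third dimension keeps it, here with length 1.
first_col_kept = a_3D[:, :, 0:1]
print(first_col_kept.shape)   # (2, 3, 1)
```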

If we want the first 2 elements of the first column of both matrices: 

In [None]:
a_3D[:, 0:2, 0]

Below, from the first matrix in our `a_3D` array, we will grab the two middle elements (5,6):

In [None]:
a_3D[0, 1, 1:3]

##### Exercises:

From the array named `a_3D`: 

1. Grab the two middle elements (17, 18) from the second matrix, store them in `mid_el`.
2. Grab the last row from both matrices, store it in `last_r`.
3. Grab the elements of the 1st matrix that exclude the first row and the first column, save them in `f_mat`. 
4. Grab the elements of the 2nd matrix that exclude the last row and the last column, save them in `s_mat`. 

In [None]:
# Exercise 1

# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# Exercise 2

# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# Exercise 3

# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# Exercise 4

# YOUR CODE HERE
raise NotImplementedError()

## NumPy == Fast and Clean! 

When we are working with numbers, arrays are a better option because the NumPy library has built-in functions that are optimized, and therefore faster than vanilla Python, especially if we have big arrays. Besides, using NumPy arrays and exploiting their properties makes our code more readable.

For example, if we want to add two lists element-wise, we need to do it with a `for` statement. If we want to add two NumPy arrays, we just use the addition `'+'` symbol!

Below, we will add two lists and two arrays (with random elements) and we'll compare the time it takes to compute each addition.

### Element-wise sum of a Python list

Using the Python library [`random`](https://docs.python.org/3/library/random.html), we will generate two lists with 10,000 pseudo-random elements in the range [0,10000), with no numbers repeated.

In [None]:
#import random library
import random

In [None]:
lst_1 = random.sample(range(10000), 10000)
lst_2 = random.sample(range(10000), 10000)

In [None]:
#print first 10 elements
print(lst_1[0:10])
print(lst_2[0:10])

We need to write a `for` statement, appending the result of the element-wise sum into a new list we call `res_lst`. 

For timing, we can use the IPython "magic" `%%time`. Writing at the beginning of the code cell the command `%%time` will give us the time it takes to execute all the code in that cell. 

In [None]:
%%time
res_lst = []
for i in range(10000):
    res_lst.append(lst_1[i] + lst_2[i])

In [None]:
print(res_lst[0:10])

### Element-wise sum of NumPy arrays

In this case, we generate arrays with random integers using the NumPy function [`numpy.random.randint()`](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.random.randint.html). The arrays we generate with this function are not going to be like the lists: in this case we'll have 10,000 elements in the range [0, 10000) but they can repeat. Our goal is to compare the time it takes to compute addition of a _list_ or an _array_ of numbers, so all that matters is that the arrays and the lists are of the same length and type (integers).

In [None]:
arr_1 = np.random.randint(0, 10000, size=10000)
arr_2 = np.random.randint(0, 10000, size=10000)

In [None]:
#print first 10 elements
print(arr_1[0:10])
print(arr_2[0:10])

Now we can use the `%%time` cell magic, again, to see how long it takes NumPy to compute the element-wise sum.

In [None]:
%%time
arr_res = arr_1 + arr_2

Notice that in the case of arrays, the code is not only more readable (just one line of code), but also faster than with lists. This time advantage will be larger with bigger arrays/lists. 

(Your timing results may differ from the ones we show in this notebook, because you will be computing on a different machine.)
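A single `%%time` measurement of such a short operation can fluctuate a lot from run to run. As an alternative sketch, not part of the original lesson, the standard-library module `timeit` can repeat each operation many times so that we can report an average. The function names `add_lists` and `add_arrays` and the choice of 100 repetitions are assumptions made for this example.

```python
import random
import timeit

import numpy as np

lst_1 = random.sample(range(10000), 10000)
lst_2 = random.sample(range(10000), 10000)
# Build the arrays from the same numbers, so both versions do the same work.
arr_1 = np.array(lst_1)
arr_2 = np.array(lst_2)

def add_lists():
    """Element-wise sum of the two lists with a for statement."""
    result = []
    for i in range(len(lst_1)):
        result.append(lst_1[i] + lst_2[i])
    return result

def add_arrays():
    """Element-wise sum of the two arrays."""
    return arr_1 + arr_2

# Run each function n_runs times and compute the average time per run.
n_runs = 100
t_list = timeit.timeit(add_lists, number=n_runs) / n_runs
t_array = timeit.timeit(add_arrays, number=n_runs) / n_runs

print("list :", t_list, "seconds per run")
print("array:", t_array, "seconds per run")
```

In a notebook, the line magic `%timeit` does this kind of repetition for you.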

##### Exercise

1. Repeat the analysis, but now computing the operation that raises each element of an array/list to the power of two. Use arrays of 10,000 elements. 

In [None]:
%%time
# Time list operations

# YOUR CODE HERE
raise NotImplementedError()

In [None]:
%%time
# Time array operations

# YOUR CODE HERE
raise NotImplementedError()

## Time to Plot

You will love the Python library **Matplotlib**! You'll learn here about its module `pyplot`, which makes line plots. 

We need some data to plot. Let's define a NumPy array, compute derived data using its square, cube and square root (element-wise), and plot these values with the original array on the x-axis. 

In [None]:
xarray = np.linspace(0, 2, 41)
print(xarray)

In [None]:
pow2 = xarray**2
pow3 = xarray**3
pow_half = np.sqrt(xarray)

To plot the resulting arrays as a function of the original one (`xarray`) on the x-axis, we need to import the module `pyplot` from **Matplotlib**.

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

The line `%matplotlib inline` is an instruction to get the output of plotting commands displayed "inline" inside the notebook. Other options for how to deal with plot output are available, but not of interest to you right now. 

We'll use the `pyplot.plot()` function, specifying the line color (`'k'` for black) and line style (`'-'`, `'--'` and `':'` for continuous, dashed and dotted line), and giving each line a label. Note that the values for `color`, `linestyle` and `label` are given in quotes.

In [None]:
#Plot x^2
plt.plot(xarray, pow2, color='k', linestyle='-', label='square')
#Plot x^3
plt.plot(xarray, pow3, color='k', linestyle='--', label='cube')
#Plot sqrt(x)
plt.plot(xarray, pow_half, color='k', linestyle=':', label='square root')
#Plot the legends in the best location
plt.legend(loc='best')

To illustrate other features, we will plot the same data, but varying the colors instead of the line style. We'll also use LaTeX syntax to write formulas in the labels. If you want to know more about LaTeX syntax, there is a [quick guide to LaTeX](https://users.dickinson.edu/~richesod/latex/latexcheatsheet.pdf) available online.

Adding a semicolon (`';'`) to the last line in the plotting code block prevents that ugly output, like `<matplotlib.legend.Legend at 0x7f8c83cc7898>`. Try it.

In [None]:
#Plot x^2
plt.plot(xarray, pow2, color='red', linestyle='-', label='$x^2$')
#Plot x^3
plt.plot(xarray, pow3, color='green', linestyle='-', label='$x^3$')
#Plot sqrt(x)
# Raw string (r'...') so that Python does not treat '\s' as an escape sequence
plt.plot(xarray, pow_half, color='blue', linestyle='-', label=r'$\sqrt{x}$')
#Plot the legends in the best location
plt.legend(loc='best'); 

That's very nice! By now, you are probably imagining all the great stuff you can do with Jupyter notebooks, Python and its scientific libraries **NumPy** and **Matplotlib**. We just saw an introduction to plotting but we will keep learning about the power of **Matplotlib** in the next lesson. 

If you are curious, you can explore all the beautiful plots you can make by browsing the [Matplotlib gallery](http://matplotlib.org/gallery.html).
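A plot is easier to read when the axes are labeled. As a small preview, here is a sketch that combines the color and line-style options used above with the `pyplot` functions `xlabel()`, `ylabel()` and `title()`. The label and title texts are just placeholders chosen for this example.

```python
import numpy as np
import matplotlib.pyplot as plt

xarray = np.linspace(0, 2, 41)

# Two curves, distinguished by both color and line style
plt.plot(xarray, xarray**2, color='red', linestyle='-', label='$x^2$')
plt.plot(xarray, np.sqrt(xarray), color='blue', linestyle='--', label=r'$\sqrt{x}$')

# Label the axes and give the figure a title (placeholder texts)
plt.xlabel('$x$')
plt.ylabel('$f(x)$')
plt.title('Two functions of x')
plt.legend(loc='best');
```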

##### Exercise:

Pick two different operations to apply to the `xarray` and plot the resulting data in the same plot. 

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

## What we've learned

* How to import libraries
* Multidimensional arrays using NumPy
* Accessing values and slicing in NumPy arrays
* `%%time` magic to time cell execution
* Performance comparison: lists vs NumPy arrays
* Basic plotting with `pyplot`

## References

1. _Effective Computation in Physics: Field Guide to Research with Python_ (2015). Anthony Scopatz & Kathryn D. Huff. O'Reilly Media, Inc.

2. _Numerical Python: A Practical Techniques Approach for Industry_ (2015). Robert Johansson. Apress.

3. ["The world of Jupyter"—a tutorial](https://github.com/barbagroup/jupyter-tutorial). Lorena A. Barba - 2016