<div>
<img src="https://ymeglobal.org/wp-content/uploads/2019/02/YME-LOGO-2017.png" width="300px" align = "left"/>
</div>

# YME: Data Science with Machine Learning Workshop

## Introduction to NumPy and Matplotlib
In this notebook, we will learn how to use NumPy and Matplotlib methods to perform multi-dimensional array operations and plot MATLAB-like publication-style figures of data.

#### Required Libraries:
* [NumPy](http://www.numpy.org/)
* [Matplotlib](http://matplotlib.org/)

---
#### Author: Ken Yew Piong

<i class="fa fa-linkedin-square fa-1x" aria-hidden="true"></i> Linkedin: [**@Ken Yew Piong**](https://www.linkedin.com/in/ken-yew-piong/)

<i class="fa fa-github-square fa-1x" aria-hidden="true"></i> GitHub: [**@KenYew**](https://github.com/KenYew)

<i class="fa fa-facebook-square fa-1x" aria-hidden="true"></i> Messenger: [**@kkenyew**](https://m.me/kkenyew)

<i class="fa fa-envelope-square" aria-hidden="true"></i> Mail: josephpiong@live.com


---
## Import Libraries

In [None]:
%%html
<style>
table {float:left}
</style>

In [None]:
# Import NumPy library
import numpy as np 
# Import Matplotlib library
import matplotlib.pyplot as plt

---
## Chapter 1: The NumPy Library
<div>
<img src="https://raw.githubusercontent.com/KenYew/YME-Python-Workshop/master/images/numpy_logo.png" width="500px" align = "left"/>
</div>

#### __NumPy is the fundamental package for scientific computing with Python. It provides a high-performance multidimensional array object, and tools for working with these arrays. The benefits of NumPy are:__

- **Size** - NumPy arrays take up less memory than equivalent Python lists
- **Performance** - The efficient memory storage of NumPy arrays enables faster performance than lists
- **Functionality** - NumPy has optimized built-in mathematical functions on arrays, such as linear algebra operations
- **Flexibility** - High flexibility with N-dimensional arrays


<div>
<img src="https://raw.githubusercontent.com/KenYew/YME-Python-Workshop/master/images/numpy_array_t.png" width="500px" align = "left"/>
</div>

#### __An array is an N-dimensional structure that holds regularly shaped data. Each dimension in NumPy is called an "axis", and axes are numbered from 0.__
#### For instance, in a 2-dimensional array, axis=0 indexes the rows and axis=1 indexes the columns.
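
As a minimal sketch of how axes behave (the array `grid` and the variable names are illustrative, not part of the workshop material), summing along `axis=0` collapses the rows and gives one total per column, while summing along `axis=1` gives one total per row:

```python
import numpy as np

# A small 2-D array: 2 rows (axis 0) and 3 columns (axis 1)
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

col_sums = grid.sum(axis=0)  # collapse the rows    -> one sum per column
row_sums = grid.sum(axis=1)  # collapse the columns -> one sum per row

print(col_sums)  # [5 7 9]
print(row_sums)  # [ 6 15]
```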


---
## 1.1 Array Creation in NumPy
#### __List of NumPy methods__
| Function    |          Description              |
| --------- | --------------------------------- |
| np.zeros   |   n-D array of zeros                |
| np.ones |   n-D array of ones |
| np.full 	  |   n-D array of constant value      |
| np.eye |   Identity matrix of a specific size |

### (A) 1-D Array Creation
#### Explicitly inputting values

In [None]:
input_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
arr = np.array(input_list, dtype=float) # OR arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], dtype=float)
print(arr)

#### Linearly spaced values

In [None]:
arr_lin = np.linspace(start=0, stop=1, num=11, dtype=float)
# arr_lin = np.linspace(0, 1, 11) # This syntax has the same outcome
print(arr_lin)
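
For comparison, here is a small sketch of the difference between `np.linspace`, which takes the number of points and includes the end value, and `np.arange`, which takes a step size and excludes the end value. The variable names are illustrative only.

```python
import numpy as np

# linspace: specify HOW MANY points; the stop value is included
by_count = np.linspace(0, 10, num=6)

# arange: specify the STEP between points; the stop value is excluded
by_step = np.arange(0, 10, 2)

print(by_count)  # [ 0.  2.  4.  6.  8. 10.]
print(by_step)   # [0 2 4 6 8]
```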

#### Random values

In [None]:
arr_rand = np.random.randn(11)
print(arr_rand)
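
Random values change on every run. As a hedged sketch, fixing the seed with `np.random.seed` makes the draws repeatable; the seed value 42 and the variable names are arbitrary choices for illustration.

```python
import numpy as np

np.random.seed(42)               # fix the generator state (42 is arbitrary)
first_draw = np.random.randn(5)

np.random.seed(42)               # reset to the same state
second_draw = np.random.randn(5)

# Both draws are identical because they start from the same seed
print(np.array_equal(first_draw, second_draw))  # True
```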

### (B) 2-D Array Creation
#### Zeros matrix

In [None]:
arr_zeros = np.zeros((3,3))

print(arr_zeros)
print('Datatype:', type(arr_zeros)) # array datatype
print('Array shape:', arr_zeros.shape) # array shape
print('Array dimensions:', arr_zeros.ndim) # array dimension

#### Ones matrix

In [None]:
arr_ones = np.ones((3,3))

print(arr_ones)
print('Datatype:', type(arr_ones)) # array datatype
print('Array shape:', arr_ones.shape) # array shape
print('Array dimensions:', arr_ones.ndim) # array dimension

#### Identity Matrix

In [None]:
arr_id = np.eye(3, 3)

print(arr_id)
print('Datatype:', type(arr_id)) # array datatype
print('Array shape:', arr_id.shape) # array shape
print('Array dimensions:', arr_id.ndim) # array dimension

#### Matrix with same values

In [None]:
arr_fulls = np.full((3, 3), 8) # Create a 3x3 array filled with the value 8

print(arr_fulls)
print('Datatype:', type(arr_fulls)) # array datatype
print('Array shape:', arr_fulls.shape) # array shape
print('Array dimensions:', arr_fulls.ndim) # array dimension

---
## 1.2 Array Manipulation in NumPy
#### __NumPy array arithmetic methods__
```python 
np.add(X1, X2) # Two arrays to be arithmetically computed
```

| Function    |          Description              |
| --------- | --------------------------------- |
| `np.add` |   Add two arrays |
| `np.subtract` |   Subtract two arrays   |
| `np.multiply` |   Multiply two arrays |
| `np.dot` |   Dot product of two arrays   |
| `np.cross` |   Cross product of two arrays |
| `np.divide` |   Divide two arrays   |
| `np.sin` |   Apply sine to the array      |
| `np.cos`    |   Apply cosine to the array   |
| `np.exp` | Apply exponential to the array |
| `np.log` | Apply natural logarithm to the array |

#### __Create 2-D arrays with random integers__

In [None]:
X1 = np.random.randint(low=1, high=5, size=(3, 3)) # Create a 3x3 array with random integers from 1 to 4 (high is exclusive)
X2 = np.random.randint(low=1, high=5, size=(3, 3)) # Create a 3x3 array with random integers from 1 to 4 (high is exclusive)
print('Array 1:\n', X1, '\n')
print('Array 2:\n', X2, '\n')

#### __Array Mathematical Operations__

In [None]:
print('Addition:\n', np.add(X1, X2), '\n')
print('Subtraction:\n', np.subtract(X1, X2), '\n')
print('Multiplication:\n', np.multiply(X1, X2), '\n')
print('Dot Product:\n', np.dot(X1, X2), '\n')
print('Cross Product:\n', np.cross(X1, X2), '\n')
print('Division:\n', np.divide(X1, X2), '\n')
print('Reciprocal:\n', np.reciprocal(X1.astype(float)), '\n') # Cast to float first: on an integer array, np.reciprocal returns integer results
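
It is easy to confuse element-wise multiplication with the dot (matrix) product. The sketch below uses two small fixed arrays (`P` and `Q` are illustrative names) so the results can be checked by hand. It also shows the `*` and `@` operators, which are shorthand for `np.multiply` and, for 2-D arrays, `np.dot`.

```python
import numpy as np

P = np.array([[1, 2],
              [3, 4]])
Q = np.array([[5, 6],
              [7, 8]])

elementwise = P * Q   # same as np.multiply(P, Q): multiplies matching entries
matrix_prod = P @ Q   # same as np.dot(P, Q) for 2-D arrays: rows times columns

print(elementwise)
# [[ 5 12]
#  [21 32]]
print(matrix_prod)
# [[19 22]
#  [43 50]]
```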

---
## 1.3 Array Inspection in NumPy

#### __NumPy array inspection methods__
```python 
my_list = [[1,1,1], [2,2,2], [3,3,3]]
a = np.array(my_list)
a.shape # Call the NumPy method directly on array of interest
```

| Function    |          Description              |
| --------- | --------------------------------- |
| `a.shape` |   Array dimensions |
| `len(a)` |   Length of the first axis (number of rows) |
| `a.ndim` |   Number of array dimensions |
| `a.size` |   Number of array elements |
| `a.dtype` |   Data type of array elements |
| `a.dtype.name` |   Name of data type |
| `a.astype(int)` |   Convert an array to a different type |



#### __Determine size of array__

In [None]:
my_list = [[1,1,1], [2,2,2], [3,3,3]]
a = np.array(my_list)
print(a, '\n')

print('Array shape:', a.shape) # Dimensions of array
print('Array length:', len(a)) # Length of the first axis (3 rows)
print('Array no. of dimensions:', a.ndim) # 2-dimensional array
print('Array no. of elements:', a.size) # Size of 9 elements
print('Array element datatype:', a.dtype) # Data type of array elements
print('Array element datatype name:', a.dtype.name) # Name of the data type
print('Array datatype conversion:\n', a.astype(float)) # Convert array elements data types to float

---
## Your Turn!
First, let's run this code - we shall use it later to visualise our arrays using Matplotlib. Don't worry if you don't understand what it is doing.

In [None]:
def plot_2D_array(my_array,rescale_fig=0.7):
    '''
    Visualise a given my_array 2D numpy data. Use the optional rescale_fig 
    to resize the plot accordingly.
    
    '''
    i_dim, j_dim = my_array.shape

    fig, ax = plt.subplots(figsize=(j_dim*rescale_fig, i_dim*rescale_fig))
    im = ax.imshow(my_array)

    ax.set_xticks(np.arange(j_dim))
    ax.set_yticks(np.arange(i_dim))
    ax.set_xticklabels(np.arange(j_dim))
    ax.set_yticklabels(np.arange(i_dim))
    ax.set_xlabel('j', fontsize=14)
    ax.set_ylabel('i', fontsize=14,rotation=0)

    # Loop over data dimensions and create text annotations.
    for i in range(i_dim):
        for j in range(j_dim):
            text = ax.text(j, i, my_array[i, j],
                           ha="center", va="center", color="w")
    fig.tight_layout()
    plt.show()

We can now call ```plot_2D_array``` and pass it a 2D array for visualisation. For example:

In [None]:
np.random.seed(seed=4400)
# create an array
A = np.random.randint(100,size=(3,4))
print(A)
# visualise it by calling the plot_2D_array function we defined above.
plot_2D_array(A)

In the diagram above, the address or index of each array entry is shown on the axes. To access the i-th row and j-th column, we use ```A[i,j]```.

In [None]:
A[1,3]

We can also access a range of entries in the array, for example:

In [None]:
B = A[0:2,1:3]
B
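
The slicing rules can be tried on a small array first. This is a sketch with illustrative names (`M`, `block` and so on). It shows a full row, a negative index, a stop value counted from the end, and the fact that a basic slice is a view, so assigning to it changes the original array.

```python
import numpy as np

M = np.arange(12).reshape(3, 4)  # 3 rows, 4 columns, values 0..11

second_row = M[1, :]     # every column of row 1        -> [4 5 6 7]
last_col = M[:, -1]      # every row of the last column -> [ 3  7 11]
all_but_two = M[2, :-2]  # row 2 without its last two entries -> [8 9]
block = M[0:2, 1:3]      # rows 0-1, columns 1-2 (stop index is excluded)

block[0, 0] = -1         # a slice is a view: this also changes M[0, 1]
print(M[0, 1])           # -1
```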

## Task 1: Array Manipulation
Below, we create an $8\times8$ array of random integers from 0 to 99. Set the values in each of the following subsets of array A to -50. We've completed the first one for you.
1. ```A[3,:-2]```
1. ```A[0,:-1]```
1. ```A[:,2:5]```
1. ```A[3:6,3:]```

In [None]:
np.random.seed(seed=4400)
# create an array
A = np.random.randint(100,size=(8,8))

# Change the indexing here
A[3,:-2] = -50

# YOUR CODE


# visualise it by calling the plot_2D_array function we defined above.
plot_2D_array(A)

---
## Chapter 2: The Matplotlib Library

<div>
<img src="https://matplotlib.org/_static/logo2_compressed.svg" width="500px" align = "left"/>
</div>

Matplotlib is a multi-platform data visualization library for Python, built on NumPy arrays and used mainly for 2D plots.

Visualization condenses large amounts of data into a form that is easy to digest. Matplotlib supports many plot types, such as line, bar, scatter and histogram plots.

Pyplot is a Matplotlib module which provides a MATLAB-like interface, with the advantage of being free and open-source. 

#### __Some useful pyplot functions__

| Function    |          Description              |
| ----------- | --------------------------------- |
| plt.axhline |   Horizontal line across the axis |
| plt.axvline |   Vertical line across the axis   |
| plt.boxplot |   Box and whisker plot            |
| plt.plot    |   Simple line plot                |
| plt.hist    |   Histogram plot                  |
| plt.scatter |   Scatter plot of y vs. x with varying marker size and/or color |
| plt.xlim    |   Get or set the x limits         |
| plt.xticks  |   Get or set the current tick locations and labels of the x-axis |
| plt.ylabel  |   Set the label for the y-axis |
| plt.ylim    |   Get or set the y limits      |
| plt.yticks  |   Get or set the current tick locations and labels of the y-axis |
| plt.title   |   Set the title of the plot    |

---
## 2.1 Simple 2-D plot
We first create a 1D array called ```time``` using ```np.linspace(0,2,50)```. 

Then create a sinusoidal signal using $signal = 2\times np.sin(2\times np.pi\times time)$. Here, we are using NumPy's built-in ```sin``` function and ```pi``` constant.

In [None]:
time = np.linspace(0,2,50)
signal = 2*np.sin(2*np.pi*time)

Plot ```time``` vs. ```signal```.

In [None]:
plt.plot(time,signal)

## 2.2 Basic Formatting
We now explore some of the typical formatting we need for our plot. First run the following ```example_plot``` function (it won't give any output for now).

In [None]:
def example_plot(x,y):
    '''
    Plots x vs. y values with some pre-determined formatting.  
    '''
    plt.plot(x, y, '--r', linewidth=3, label='red line')

    plt.text(1.25, 1,'Peak')

    plt.title(r'Title with Greek letter $\sigma$', fontsize=15) # raw string so that \s is not treated as an escape sequence

    # axis limit
    plt.xlim(0,2)
    plt.ylim(-2,2)

    # axis labels
    plt.xlabel('x-axis')
    plt.ylabel('y-axis')
    
    # grids
    plt.grid()

    # activate legend
    plt.legend(loc='upper right')       

Then, call ```example_plot``` by passing ```time``` and ```signal``` as `x` and `y`.

Then use ```plt.savefig``` to save the plot. Optionally, call ```plt.tight_layout()``` first to trim the large white margins in the saved plot.

In [None]:
# plot time vs. signal by calling example_plot function
example_plot(time,signal)

# save the figure
plt.tight_layout()
plt.savefig('./my_saved_plot.png')

Check your working folder. You should have a new file called 'my_saved_plot.png'. Here we specified it as '.png' format. Of course, you can also use '.jpg'.

---
## Your Turn!
Now we want to explore some of the formatting ourselves.

Without using the ```example_plot``` function, reproduce the plot above with the following modifications:
1. Change the dashed red line to green stars and label it with ```label='Old signal'```.
2. Add a new plot with ```0.1*signal``` on the y axis. Use a blue line with ```linewidth=5``` and ```label='New signal'```.
3. Add an arrow pointing to the peak at (1.25,2) using ```plt.arrow```.
4. Relabel the axes with 'Time (s)' (x axis) and 'Signal (kV)' (y axis).
5. Change $\sigma$ in the title to $\hat{\omega}_n$ (hint: Google 'Latex greek letters') and change the font size to 20 (hint: use ```fontsize```). 
6. Make sure the legend is not blocking the lines.

Make sure to add the formatting one by one, and check the resulting plot before adding the next item. We have added ```plt.plot(time, signal, '*g')``` for item 1 to get you started.

In [None]:
# e.g. to change the dashed red line to green stars
plt.plot(time, signal, '*g', label='Old signal')

# YOUR CODE


---
## 2.3 Multiple subplots
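We can also put the previous two plots on separate subplots. To do this we can use ```plt.subplot(total_plot_in_y, total_plot_in_x, increment)```, where the first two arguments are the number of rows and columns in the grid and ```increment``` is the index of the current subplot, counting from 1. For example: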

In [None]:
plt.subplot(2, 1, 1)
plt.plot(time, signal, '*g', label='Old signal')
plt.title('Old signal')
plt.xlabel('Time (s)')
plt.ylabel('Signal (kV)')

plt.subplot(2, 1, 2)
plt.plot(time, 0.1*signal, '--k', linewidth=5,label='New signal')
plt.title('New signal')
plt.xlabel('Time (s)')
plt.ylabel('Signal (kV)')

plt.tight_layout()

So far we have been adding plots to figures without explicitly creating a figure object. Most of the time, we can get away with this. However, sometimes we need further control over the figure, for example to set its size in inches. To do this, we create a figure object using ```plt.figure()``` and pass ```figsize=(my_x_length, my_y_length)``` as an argument. For example:

In [None]:
# create a figure object
my_fig = plt.figure(figsize=(8,3)) 
# replot time vs. signal
plt.plot(time,signal)

Similarly, we can explicitly create a figure together with multiple axes objects using ```plt.subplots(total_plot_in_y, total_plot_in_x)```. For example:

In [None]:
# adding multiple axis objects  
my_fig, (my_ax1, my_ax2) = plt.subplots(2,1)

# plot by calling first axis
my_ax1.plot(time, signal, '*g', label='Old signal')
my_ax1.set_title('Old signal')
my_ax1.set_xlabel('Time (s)')
my_ax1.set_ylabel('Signal (kV)')

# plot by calling second axis
my_ax2.plot(time, 0.1*signal, '--k', linewidth=5,label='New signal')
my_ax2.set_title('New signal')
my_ax2.set_xlabel('Time (s)')
my_ax2.set_ylabel('Signal (kV)')

my_fig.tight_layout()

Notice that we need to plot on the axis (e.g. ```my_ax1.plot()``` instead of ```plt.plot()```) and the formatting syntax is a bit different now (e.g. ```my_ax1.set_title``` instead of ```plt.title()```).

---
## 2.4 Histogram

First, we generate 1000 random data points with a mean $\mu$=100 and standard deviation $\sigma$=10.

In [None]:
# Generate random data 
data = np.random.normal(100,10,1000)

Check that the data has the expected statistics by calculating $\mu$ and $\sigma$ with ```np.mean``` and ```np.std```.

In [None]:
# Estimate the mean and standard deviation of the data:
mu, std = np.mean(data), np.std(data)
mu, std

Use ```plt.hist``` to plot the histogram of the data. Use the optional ```bins``` argument to set the number of bins and ```alpha``` to control the transparency of the plot. 

In [None]:
# Plot the histogram.
plt.hist(data, bins=25, alpha=0.5);
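
To compare the histogram with the normal distribution it was drawn from, one option is to normalise the histogram with `density=True` and overlay the normal probability density computed from the estimated mean and standard deviation. This is only a sketch. The seeded generator, the variable names and the plotting choices are assumptions for illustration, and the sample uses the same parameters as the `np.random.normal(100,10,1000)` call above.

```python
import numpy as np
import matplotlib.pyplot as plt

# Draw a reproducible sample (seed 0 is an arbitrary choice)
rng = np.random.default_rng(0)
sample = rng.normal(100, 10, 1000)
mu_hat, sigma_hat = sample.mean(), sample.std()

fig, ax = plt.subplots()
# density=True rescales the bars so that their total area is 1
counts, bin_edges, _ = ax.hist(sample, bins=25, density=True, alpha=0.5)

# Normal probability density evaluated across the range of the histogram
x_grid = np.linspace(bin_edges[0], bin_edges[-1], 200)
pdf = np.exp(-0.5 * ((x_grid - mu_hat) / sigma_hat) ** 2) / (sigma_hat * np.sqrt(2 * np.pi))

ax.plot(x_grid, pdf, 'k', linewidth=2, label='Normal PDF')
ax.legend()
```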

---
## 2.5 3D plot

First we create 3D data in the x, y and z directions.
1. Create the ranges of values on the x and y axes. For example, for values from -5 up to (but not including) 5 with a step of 0.25 between points, we use ```x = np.arange(-5, 5, 0.25)```.
2. Create X and Y coordinates using ```X, Y = np.meshgrid(x, y)```. This will give us two 2D arrays of ```X``` and ```Y```.
3. For a given coordinate in ```X``` and ```Y```, we need to give the value in ```Z``` axis. Let's assume ```Z = np.sin(np.sqrt(X**2 + Y**2))``` (just to get a pretty plot).
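
To see what `np.meshgrid` produces in step 2, here is a sketch on a deliberately tiny grid (the names `xs`, `ys`, `XX`, `YY` and `ZZ` are illustrative). Each output array has one row per y value and one column per x value, so every (x, y) pair appears exactly once.

```python
import numpy as np

xs = np.array([0, 1, 2])   # 3 points along x
ys = np.array([10, 20])    # 2 points along y

XX, YY = np.meshgrid(xs, ys)

print(XX)
# [[0 1 2]
#  [0 1 2]]
print(YY)
# [[10 10 10]
#  [20 20 20]]
print(XX.shape)  # (2, 3): rows follow ys, columns follow xs

# Any element-wise formula now evaluates on the whole grid at once
ZZ = XX + YY
```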

In [None]:
# Make data.
x = np.arange(-5, 5, 0.25)
y = np.arange(-5, 5, 0.25)
X, Y = np.meshgrid(x, y)
Z = np.sin( np.sqrt(X**2 + Y**2) )

To plot ```X```, ```Y``` and ```Z```: 
1. We need to import Axes3D using ```from mpl_toolkits.mplot3d import Axes3D```. 
2. We also need to plot in a dedicated axis object specified as a 3D projection.

3. Then use ```plot_surface(X, Y, Z, cmap=my_cmap)``` to plot X, Y and Z data. ```cmap``` is the type of colour mapping. 

4. We can add the color bar using ```fig.colorbar(my_surface)```. Here we need to explicitly point it to a specific plot, hence the reason we have to give our plot in step 3 a name.

For example, the corresponding code for each step is:

In [None]:
# step 1
from mpl_toolkits.mplot3d import Axes3D

# step 2
my_fig = plt.figure() # instantiate a figure object
my_ax = my_fig.add_subplot(111, projection='3d') # instantiate 3D axis (newer Matplotlib releases no longer accept gca(projection='3d'))

# step 3
my_surface  = my_ax.plot_surface(X, Y, Z, cmap='viridis')

# step 4
my_fig.colorbar(my_surface)