# Session 2: Handling data and plotting

<a rel="license" href="http://creativecommons.org/licenses/by/4.0/"><img alt="Creative Commons Licence" style="border-width:0" src="https://i.creativecommons.org/l/by/4.0/88x31.png" title='This work is licensed under a Creative Commons Attribution 4.0 International License.' align="right"/></a>

Authors: 
- Dr Valentina Erastova 
- Dr Matteo Degiacomi 
- Dr Tom Slater

Email: slatert2@cardiff.ac.uk


## Learning objectives  <a id="learning"></a>

* use the `numpy` library 
* perform mathematical operations on `numpy` arrays in 1D
* access parts of arrays
* save arrays to files and load arrays from files
* plot data using `matplotlib`

Some of the material was adapted from [Python4Science](https://github.com/Degiacomi-Lab/python4science/blob/master/2_Python_numerical_data.ipynb).

## Table of Contents

1. [Arrays and NumPy](#1-arrays-and-numpy)    
    1.1 [1D Arrays](#11-1d-arrays)    
    1.2 [Tasks 1](#tasks-1)    
2. [Mathematical operations on 1D arrays](#2-mathematical-operations-on-1d-arrays)    
3. [Accessing slices of 1D arrays](#3-accessing-slices-of-1d-arrays)  
    3.1 [Loading an array to/from a file](#loading-an-array-to-from-a-file)  
4. [Plotting data](#plotting-data)    
    4.1 [A simple plot](#simple-plot)     
    4.2 [Quick aside on string formatting](#quick-aside-on-string-formatting)     
    4.3 [Object Oriented Plotting](#oo-plotting)    
5. [End of Session Task](#final-task)

**<span style="color:black">Jupyter Cheat Sheet</span>**
- To run the currently highlighted cell and move focus to the next cell, hold <kbd>&#x21E7; Shift</kbd> and press <kbd>&#x23ce; Enter</kbd>;
- To run the currently highlighted cell and keep focus in the same cell, hold <kbd>Ctrl</kbd> and press <kbd>&#x23ce; Enter</kbd>;
- To get help for a specific function, place the cursor within the function's brackets, hold <kbd>&#x21E7; Shift</kbd>, and press <kbd>&#x21E5; Tab</kbd>;

## Links to documentation

You can find useful information about using `numpy` and `matplotlib` at
* [NumPy](https://numpy.org)
* [matplotlib](https://matplotlib.org)


If you are running this notebook in Google Colab, please copy this notebook to your Google Drive (Copy to Drive) in order to save all of your changes. You will also need to run the following code cell in order to use the data later in the notebook.

In [None]:
! git clone https://github.com/TomSlater/Introduction-To-Programming-with-Python
%cd Introduction-To-Programming-with-Python

# 1. Arrays and NumPy <a id="1-arrays-and-numpy"></a>

* An **array** is a smart way of storing multidimensional numerical data. One-dimensional arrays are particularly useful for storing spectroscopy data, while 2D arrays are used for storing images.

* **NumPy**, which stands for *Numerical Python*, is a module consisting of multidimensional array objects and a collection of routines for processing those arrays.

* A module is a collection of code that others have packaged together for us. We can use a pre-built module in our code by using the `import` statement.

* We can use `NumPy` to perform mathematical and logical operations on arrays.

* `NumPy` is a base for many other modules.

### Import the NumPy library

For `NumPy`, the standard-practice alias is `np`:

In [None]:
import numpy as np

## 1.1 1D Arrays <a id="11-1d-arrays"></a>

### Lists

Before we start using NumPy arrays, let's first take a look at a built-in Python object, the `list`. A list is a one-dimensional collection of data whose elements can be of any type.

```python
#This list is just integers.
my_list = [4, 2, 7, 4, 5]

#This list is a mixture of data types (including another list).
my_list = [4, 'Harry', 7.0, [2,"Potter"], 5]

```

NumPy arrays can only contain **one datatype**, i.e. all integers, all floats, etc. For this reason, we'll just be using lists of either integers or floats for the rest of the notebook.
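
The one-datatype rule can be seen in a minimal sketch like the one below. The names `mixed_list` and `mixed_array` are made up for illustration. When a list mixing integers and floats is converted, NumPy picks a single type that can hold every value, which here is a float type.

```python
import numpy as np

# A list may mix integers and floats...
mixed_list = [1, 2.5, 3]

# ...but the array built from it has one data type for all elements.
mixed_array = np.array(mixed_list)

print(mixed_array)        # every element is now stored as a float
print(mixed_array.dtype)  # float64
```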

### Creating 1D arrays 

To create an array of integers (whole numbers like 1, 2, 3, 4, 5), we can convert a list to an array:

```python
import numpy as np

my_list = [1, 2, 3, 4, 5]

my_array = np.array(my_list)
```


### Example 1

In [None]:
# Create a 1D numpy array:
a = [1, 2, 3, 4, 5] # Your list can be of any length
my_array = np.array(a)

### Example 2

Let's look at some of the **properties** of our array. 

How do you get the **dimensions**, **shape**, **size**, and **data type** of an array?

In [None]:
# Check the properties of this 1D array
print(f"Dimensions {my_array.ndim}")
print(f"Shape {my_array.shape}")
print(f"Size {my_array.size}")
print(f"Datatype {my_array.dtype}")

### Example 3

We can also use **functions** to generate arrays.

We can generate one-dimensional arrays of equally-spaced numbers with:
* `np.linspace(start, end, quantity)` or
* `np.arange(start, end, step_size)`

We can also generate multidimensional arrays filled with zeros or ones with NumPy functions:
* `np.zeros(shape)`
* `np.ones(shape)`

where `shape` is an `int` for a 1D array, or a `tuple` such as `(5, 6)` for a 2D array.

**Let's use `np.zeros(shape)` to create a 1D array full of zeros:**

In [None]:
z = np.zeros(10)
print(f"My array of zeros {z} is of type {z.dtype}")
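
As a small sketch of how the `shape` argument works (the variable names here are illustrative only), passing an `int` gives a 1D array, while passing a `tuple` gives one axis per entry in the tuple:

```python
import numpy as np

# An int gives a 1D array; a tuple gives one axis per entry.
zeros_1d = np.zeros(4)
zeros_2d = np.zeros((2, 3))  # 2 rows, 3 columns

print(zeros_1d.shape)  # (4,)
print(zeros_2d.shape)  # (2, 3)
print(zeros_2d.ndim)   # 2
```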

## Tasks 1 <a id="tasks-1"></a>

We will continue to generate 1D arrays, access parts of an array, and perform some mathematical operations on them. 


<div class="alert alert-success">
    <b>Task 1.1 </b> : Generate a 1D array of length 5, filled with ones.
</div>



In [None]:
# FIXME



<details><summary {style="color:green;font-weight:bold"}> Click here to see the solution to Task 1.1 </summary>

```python
ones = np.ones(5)
print("Array of five ones:", ones)
```
</details>

<div class="alert alert-success"><b> Task 1.2: Create an array with `np.arange`</b>

Using `np.arange`, create a 1D array as a sequence from 0 to 20 in steps of 2.

</div>

In [None]:
# FIXME


<details><summary {style="color:green;font-weight:bold"}> Click here to see the solution to Task 1.2</summary>

```python
sequence = np.arange(0, 21, 2) 
print(sequence)
```
</details>

<div class="alert alert-success"><b>Question</b>: What number did you have to stop at to include 20 as a last number? Why?
</div>

<details><summary {style="color:green;font-weight:bold"}>Click here to see the answer to the above question.</summary>

Python starts counting from 0 and in `np.arange(start, stop, step)`, the `stop` value is not inclusive.
</details>

<div class="alert alert-warning"><b> Advanced Task 1.3</b> <a id="task-13"></a>

Find the last number in an array `np.arange(0, 20, 2)`.

Is the answer as you expected?
</div>

In [None]:
# FIXME


<details><summary {style="color:green;font-weight:bold"}>Click here to see the solution to the Advanced task 1.3.</summary>

```python
a = np.arange(0, 20, 2)
last = a[-1]
print(last)
```
</details>

<div class="alert alert-success"><b> Task 1.4: Generate another array</b>

Generate the same sequence as in Task 1.2 (`np.arange(0, 21, 2)`, i.e. 0 to 20 in steps of 2) but this time using `np.linspace(start, stop, n_steps)`.

How do these two functions differ?

</div>

In [None]:
# FIXME


<details><summary {style="color:green; font-weight:bold"}> Click here to see the solution to Task 1.4</summary>

```python
b = np.linspace(0, 20, 11)
print(b)
```

Note that in this case, the end point is included in the generated array. This is also explained in the [documentation](https://numpy.org).
</details>

# 2. Mathematical operations on 1D arrays <a id="2-mathematical-operations-on-1d-arrays"></a>

All mathematical operations between NumPy arrays act element by element. This is not the same for lists, which is why using NumPy is so useful. 

Operations with scalar numbers act on every element of the array. 

For example:

If we define: 
```python
a = np.array([1, 2, 3])
b = np.array([0, 1, 2])
```
then
* `a * b` returns the array `[0, 2, 6]`
* `a - b` returns the array `[1, 1, 1]`
* `a + 1` returns the array `[2, 3, 4]`

Arrays can be used to conduct mathematical operations in a compact way. If we were using *lists*, we would have to loop through each element of the list to perform similar operations.

We will see some examples of this below.
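
To make the contrast with lists concrete, here is a short sketch (variable names are illustrative). The same operators behave very differently on a list and on an array:

```python
import numpy as np

numbers_list = [1, 2, 3]
numbers_array = np.array(numbers_list)

# For lists, + concatenates and * repeats.
print(numbers_list + numbers_list)    # [1, 2, 3, 1, 2, 3]
print(numbers_list * 2)               # [1, 2, 3, 1, 2, 3]

# For arrays, the same operators act element by element.
print(numbers_array + numbers_array)  # [2 4 6]
print(numbers_array * 2)              # [2 4 6]

# The list equivalent needs an explicit loop (here a list comprehension).
doubled_list = [2 * value for value in numbers_list]
print(doubled_list)                   # [2, 4, 6]
```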

<div class="alert alert-success"><b> Task 2.1: Add a scalar to an array </b>

Create an array called `my_array` containing the numbers 3, 6, 7, 2 and 8. Add the number 3 to every number of the array.

</div>

In [None]:
# FIXME


<details><summary {style="color:green; font-weight:bold"}> Click here to see the solution to Task 2.1 </summary>

```python

my_array = np.array([3, 6, 7, 2, 8])

new_array = my_array + 3

print("my_array + 3 =",new_array)
```
</details>

We can also do mathematical operations between two arrays. 

**Note:** the arrays must have the same shape (or shapes that NumPy can match up through its broadcasting rules).
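
The sketch below shows what happens when two 1D arrays of different lengths are combined. The array names are hypothetical, and the error is caught with `try`/`except` only so that the example keeps running:

```python
import numpy as np

three_values = np.array([1, 2, 3])
two_values = np.array([10, 20])

try:
    total = three_values + two_values
except ValueError as error:
    # NumPy cannot pair up 3 elements with 2 elements.
    total = None
    print("Could not add the arrays:", error)
```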

<div class="alert alert-success">
    <b>Task 2.2: Mathematical operations between two arrays.</b>

   Create 2 arrays of your liking and perform mathematical operations.
   
   For example: multiply them, subtract one from another, and add them up.
   
   Print the answers.
</div>




In [None]:
# FIXME
a = 
b = 

print("multiplication a * b = ",)
print("subtraction a - b = ",)
print("addition a + b = ",)


<details><summary {style="color:green; font-weight:bold"}> Click here to see solution to Task 2.2 </summary>

```python
a = np.array([1, 2, 4])
b = np.array([0, 1, 2])

print("multiplication a * b = ",a * b)
print("subtraction a - b = ",a - b)
print("addition a + b = ",a + b)
```
</details>

 <div class="alert alert-success">
    <b>Task 2.3: Square each value in <code>my_array</code></b> 
</div>

<div class="alert alert-info"><b> Hint:</b> you can use <code>**</code> as an operator to raise to a power, i.e. $x^2$ would be written as <code>x**2</code> in Python.
</div>

In [None]:
# FIXME


<details><summary {style="color:green; font-weight:bold"}> Click here to see the solution to Task 2.3. </summary>

```python

my_array = np.array([3, 6, 7, 2, 8])
my_array_squared = my_array ** 2

print(my_array_squared)

```
</details>

### Example 4

What is the difference between using `numpy` and using `math`?

How do you calculate:
* the square-root of a single number?
* the square-root of a list?
* the square-root of an array?

See what happens when you run the code below.

<div class="alert alert-info">
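<b>Note:</b> Here we import the math library under the short alias <code>m</code>. Unlike <code>np</code> for NumPy, there is no widely agreed alias for <code>math</code>, and it is often imported simply with <code>import math</code>.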
</div>

In [None]:
import math as m
import numpy as np

# Square-root of a single number:
# with math
print (m.sqrt(4)) 
# with numpy
print (np.sqrt(4))
# mathematically, by calculating 4^{1/2} 
print (4**0.5) 

# Square-root of a list of numbers
l = [4, 9, 16] 
# numpy: square root of every element 
print (np.sqrt(l)) 
 # Can you use math here?
print (m.sqrt(l)) 

# Square-root of an array
a = np.array(l)
# square root of every element of a numpy array
print(np.sqrt(a)) 
# would this work?
print(m.sqrt(a)) 
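
A cell stops at the first error, so any lines after a failing call never run. One possible way to see the failure without stopping the cell is to wrap the call in `try`/`except`, as in this sketch (the names `values`, `roots` and `math_accepts_list` are illustrative):

```python
import math
import numpy as np

values = [4, 9, 16]

# np.sqrt accepts a list and returns an array of square roots.
roots = np.sqrt(values)
print(roots)  # [2. 3. 4.]

# math.sqrt only accepts a single number, so a list raises a TypeError.
try:
    math.sqrt(values)
    math_accepts_list = True
except TypeError as error:
    math_accepts_list = False
    print("math.sqrt failed:", error)
```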

# 3. Accessing *slices* of 1D arrays <a id="3-accessing-slices-of-1d-arrays"></a>

Slicing an array is the operation of extracting a subset of it, as shown in the figure below.

<img src="images/slicing1.png" width="500">

We will learn about *slicing* in the following task.
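
As a quick reference before the task, slices are written as `array[start:stop:step]`, where `stop` is not included and any of the three parts can be left out. A minimal sketch on a small made-up array:

```python
import numpy as np

values = np.arange(10, 20)  # [10 11 12 13 14 15 16 17 18 19]

print(values[2:5])    # [12 13 14]     indices 2, 3 and 4
print(values[:3])     # [10 11 12]     from the start up to index 2
print(values[::4])    # [10 14 18]     every 4th element
print(values[1:8:2])  # [11 13 15 17]  every 2nd element from index 1 to 7
print(values[-1])     # 19             the last element
```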

<div class="alert alert-success"><b> Task 3.1: Slicing arrays </b>

1. Generate a 1D array of 20 elements and fill it with random numbers.
2. Pick every 3rd value within the first 10 values.
3. Print how many values you get
4. What is the last number in your array? (See [Advanced task 1.3](#task-13))
</div>


<div class="alert alert-info"><b> Hint</b>

 Try executing  `np.random.default_rng(seed)`

This is a random number generator, where the `seed` is used to "initialise" the number generator. You can read more about this in the [Random Generator Documentation from NumPy](https://numpy.org/doc/stable/reference/random/generator.html).

 </div>

In [None]:
# 1. Generate a 1D array of 20 elements and fill it with random numbers.
# FIXME

# 2. Pick every 3rd value within the first 10 values.
# FIXME

# 3. Print how many values you get
# FIXME

# 4. What is the last number in your array?
# FIXME


<details><summary {style="color:green; font-weight:bold"}> Click here to see the solution to Task 3.1.</summary>

```python

# 1. Generate a 1D array of 20 elements and fill it with random numbers.

random_generator = np.random.default_rng(12345)
random_numbers = random_generator.random(20)
print(random_numbers)

# 2. Pick every 3rd value within the first 10 values.
picked = random_numbers[0:10:3]

# 3. Print how many values you get
print(len(random_numbers))
print(len(picked))

# 4. What is the last number in your array?    
last = random_numbers[-1]
print(last)
```
</details>


## 3.1 Loading an array to/from a file <a id="loading-an-array-to-from-a-file"></a>

We can also load arrays from a plain text file using `np.loadtxt`, which has many options for controlling how the file is read.

To load a file `array.txt`: 

```python

loaded_array = np.loadtxt("array.txt")

```

We can skip some lines, for example in the case where the file has a header over the first 5 lines of the file, using the option `skiprows`. 

Similarly, if the file contains comments, we can use the option `comments` to specify the character used for comments, so that these lines also get ignored by NumPy. 
```python
clean_array = np.loadtxt("array.txt", comments="#", skiprows=5)
```

To save the array called `my_array` into the file `my_array.txt`, use `np.savetxt`:

```python
np.savetxt("my_array.txt", my_array)
```
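
The following sketch puts saving and loading together as a round trip. It writes to a temporary folder so that no real data file is touched. The file name `example_array.txt` and the array `measurements` are made up for this illustration.

```python
import os
import tempfile

import numpy as np

measurements = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])

# Write to a temporary folder so no real data file is touched.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "example_array.txt")

# The header is written as a comment line starting with "# ".
np.savetxt(path, measurements, header="x y")

# Lines starting with the comment character are ignored when loading.
reloaded = np.loadtxt(path, comments="#")

print(reloaded.shape)                          # (3, 2)
print(np.array_equal(measurements, reloaded))  # True
```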

<div class="alert alert-success"><b>Task 3.2: Load an array from a file and save it to another file</b>

1. Load in the file `data/slice_me.txt` and skip the first row. (The `data/` part specifies the folder in which the file is.)
2. Print the shape of this data
3. Save this to another file called `data/slice_me_copy.txt`

</div>

In [None]:
# 1. Load in the file data/slice_me.txt and skip the first row.
# FIXME

# 2. Print the shape of this data
# FIXME

# 3. Save this to another file called data/slice_me_copy.txt
# FIXME


<details><summary {style="color:green; font-weight:bold"}> Click here to see the solution to Task 3.2</summary>

```python

# 1. Load in the file data/slice_me.txt and skip the first row.
data = np.loadtxt("data/slice_me.txt", skiprows=1)

# 2. Print the shape of this data
print(data.shape)

# 3. Save this to another file called data/slice_me_copy.txt
np.savetxt("data/slice_me_copy.txt", data)

```
</details>

<div class="alert alert-success"><b> Task 3.3: Slicing data arrays</b> <a id="task-23"></a>

The folder `data` contains a file called `ethyl_cyanoacetate.txt`, which contains NMR data given in two columns: chemical shift and intensity.

1. Read in the file `ethyl_cyanoacetate.txt`
2. Create a sub-sample of the intensities data by extracting every 10th line into a variable called `subdata`.
3. Save the `subdata` into a new file called `sub_intensities.txt` in the `data` folder.

<b>Note:</b> it might be a good idea to print the shapes of `data` and `subdata` to check if your slicing is correct after step 2.
</div>

In [None]:
# 1. Read in the file data/ethyl_cyanoacetate.txt
# FIXME

# 2. Create a sub-sample of the data by extracting every 10th line into a variable called `subdata`.
# FIXME

# 3. Save the intensities column from `subdata` into a new file.
# FIXME


<details><summary {style="color:green; font-weight:bold"}> Click here to see the solution to Task 3.3. </summary>

```python
# 1. Read in the file data/ethyl_cyanoacetate.txt
data = np.loadtxt("data/ethyl_cyanoacetate.txt")

# 2. Create a sub-sample of the data by extracting every 10th line into a variable called `subdata`.
subdata = data[::10, 1]

# Check the shapes of the datasets
print(data.shape)
print(subdata.shape)

# 3. Save the intensities column from `subdata` into a new file.
np.savetxt("data/sub_intensities.txt", subdata)

```
</details>

<div class="alert alert-warning"><b> Advanced Task 3.4</b>

Can you do the above without NumPy, using only built-in Python functionality?
</div>

In [None]:
# FIXME


<details><summary {style="color:green; font-weight:bold"}> Click here to see the solution to the Advanced task 3.4 </summary>

```python

# Read file in line by line
with open("data/ethyl_cyanoacetate.txt", "r") as input_file:
    lines = input_file.readlines()

# Counter for counting every 10th line
counter = 0

# Create an empty list to store intensity values
intensities = []

# Loop over the lines in the file
for line in lines:

    # If counter is divisible by 10
    if counter % 10 == 0:
        # split the line (string) into two columns:
        columns = line.split()

        # the second column is intensity
        intensity = columns[1]

        # append intensity value to intensities list
        intensities.append(intensity)

    # increment the counter
    counter += 1

# Open file for writing:
with open("data/sub_intensities.txt", "w") as output_file:
    # Loop over all the values in the list intensities
    for intensity in intensities:
        # Write each intensity to the file on separate lines
        output_file.write(f"{intensity} \n")

```
</details>

# Key Points <a id="recap"></a>

<div class="alert alert-info">

- NumPy is a Python package for efficiently reading, writing and manipulating numerical data.
- It can handle data of arbitrary size and shape.
- Algebraic operations across arrays take place element by element, i.e. arrays are <b>not</b> matrices.
- NumPy enables applying mathematical operations along desired axes.
</div>
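
The last two key points can be illustrated with a short sketch. It goes slightly beyond the tasks above: the `@` operator and the `axis` argument are standard NumPy features, but the arrays `a`, `b` and `table` are made up for illustration.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([0, 1, 2])

# * multiplies element by element...
print(a * b)  # [0 2 6]
# ...whereas @ gives the matrix-style (dot) product.
print(a @ b)  # 8

# Operations such as sum can be applied along a chosen axis of a 2D array.
table = np.array([[1, 2, 3],
                  [4, 5, 6]])
print(table.sum(axis=0))  # [5 7 9]  sum down each column
print(table.sum(axis=1))  # [ 6 15]  sum along each row
```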

# 4. Plotting data <a id="plotting-data"></a>

We can use the [matplotlib](http://matplotlib.org) package to plot data using Python. 

We first look at the `pyplot` functional interface, which allows us to manipulate a given current figure. `pyplot` is great for quickly visualizing the data we are working with, but it becomes awkward for plots of multiple data quantities, subplots, or more complex customizations. In those cases, an *object-oriented* plotting approach is a better fit. We will discuss object-oriented plotting in [Section 4.3](#oo-plotting).

## 4.1 A simple plot <a id="simple-plot"></a>

### Example 5

As always, we begin with **importing the `matplotlib.pyplot` module** with the alias `plt`. 

This is the community-agreed alias for `matplotlib.pyplot`.

In [None]:
import matplotlib.pyplot as plt

### Example 6

To create a plot, we use the `matplotlib` function `plt.plot()`. 

Load in the file `data/sub_intensities.txt` that you created in Task 3.3.

It is good practice to use `plt.show()` to show the plot, even though the plot will pop up in Jupyter without this as well.

In [None]:
# Read the file
data = np.loadtxt("data/sub_intensities.txt")

# Plot 
plt.plot(data)
plt.show()

**Note:** the plot displayed is generated from the sub-sampled data, which only contains intensities. Because this data does not have the chemical shift column, the x-axis is just the row index.


### Labeling the plot and the data <a class="anchor" id="labelplt"></a>

It is always good practice to **label the plots**.

Use the following commands to add the labels to your plot:
 - `plt.xlabel()`
 - `plt.ylabel()` 
 - `plt.title()`

<div class="alert alert-success">
    <b>Task 4.1</b> : Plot the <code>ethyl_cyanoacetate.txt</code> data as intensity against chemical shift, and label the plot.
        
</div>

In [None]:
# FIXME


<details><summary {style="color:green; font-weight:bold"}> Click here to see the solution to Task 4.1.</summary>

```python
# Load in the data
data = np.loadtxt("data/ethyl_cyanoacetate.txt")

# Assign the columns to 'chemical_shift' and 'intensity'
chemical_shift = data[:,0]
intensity = data[:,1]

# plot intensity against chemical shift
plt.plot(chemical_shift, intensity)

# label the plot
plt.title("NMR spectrum of ethyl cyanoacetate")
plt.xlabel("Chemical Shift (ppm)")
plt.ylabel("Intensity")

# save the plot
plt.savefig("data/myfigure.png")

# show the plot
plt.show()
```
</details>

## 4.2 Quick aside on string formatting <a id="quick-aside-on-string-formatting"></a>

We can use **f-strings** to format strings in a nice way. This is useful for, e.g., labelling scientific plots.

For example, let's say we want to create a plot label for pressure as "Pressure ($\mathrm{N / m}^2$)" in Python:

```python
plt.plot(x, y)
plt.xlabel(f"Pressure (N / m$^2$)")
```

We can do this using LaTeX notation given inside the `$ $` signs. 

[Click here](https://oeis.org/wiki/List_of_LaTeX_mathematical_symbols) for a list of some of the mathematical symbols you can write in this format. 

Some of the most useful ones for chemists are **superscripts** `$^{-2}$` and **subscripts** `$_{\mathrm{exp}}$`. The expression `\mathrm{}` stands for "maths roman", which ensures that the enclosed text is written in upright (non-italic) type. 

You can use this "math mode" in markdown cells in a similar way to write equations. 
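
One thing to watch for when combining the two: inside an f-string, single curly brackets mark a value to insert, so the literal brackets that LaTeX needs have to be doubled. The sketch below uses a made-up `temperature` value and a raw f-string (`rf"..."`) so that backslashes are kept as written.

```python
# Hypothetical value, for illustration only.
temperature = 298.2  # in K

# {temperature:.1f} inserts the value; {{ and }} give the literal
# braces needed by LaTeX. The r prefix keeps the backslashes as-is.
label = rf"$T_{{\mathrm{{exp}}}}$ = {temperature:.1f} K"

print(label)  # $T_{\mathrm{exp}}$ = 298.2 K
```

The resulting string can then be passed to a labelling function such as `plt.xlabel(label)`.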

Another useful feature of f-strings is controlling how numbers are displayed. For example, let's say we want to print the mass of something rounded to 2 decimal places (the format `.2f` fixes the number of decimal places; use `.2g` instead for 2 significant figures):

```python
    mass = 0.198 # in g
    print(f"The final mass is {mass:.2f} g.")
```

which prints: `The final mass is 0.20 g.`
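
For comparison, this short sketch shows a few common format specifiers side by side. The `distance` value is made up for illustration: `f` fixes the number of decimal places, `g` the number of significant figures, and `e` gives scientific notation.

```python
mass = 0.198          # in g
distance = 1234.5678  # in m

print(f"{mass:.2f}")      # 0.20       two decimal places
print(f"{mass:.2g}")      # 0.2        two significant figures
print(f"{distance:.2f}")  # 1234.57    two decimal places
print(f"{distance:.2g}")  # 1.2e+03    two significant figures
print(f"{distance:.3e}")  # 1.235e+03  scientific notation, three decimals
```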

## 4.3 Object Oriented Plotting <a id="oo-plotting"></a>

In [Section 4.1](#simple-plot), we made only very basic plots with the `pyplot` module of the matplotlib package. In this section, we will introduce more complex plotting by adopting a more sophisticated **Object Oriented Plotting** approach. If you are eager to know more, please see the discussion on [PyPlot vs. Object Oriented Interfaces](https://matplotlib.org/matplotblog/posts/pyplot-vs-object-oriented-interface/) on the matplotlib blog.

Object oriented plotting gives us control over all the components of a plot, shown in the figure below.

<img src="images/anatomy-of-a-figure.webp" width="600"></img>

To achieve this, we start by declaring an *object* that acts as a container for all the elements rendered onto it, i.e. our **figure**.

### Declare a figure *object*:

The following command produces a single figure (called `fig`) containing a single Axes (i.e. a single plot, called `ax`, inside the figure):

```python
fig, ax = plt.subplots()
```
With matplotlib, a figure can be created in different ways: 
    
```python
# an empty figure with no Axes
fig = plt.figure()  
# a figure with a single Axes
fig, ax = plt.subplots()  
# a figure with a 2x2 grid of Axes
fig, axs = plt.subplots(2, 2)  
```
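
When a grid is requested, `axs` is a 2D NumPy array of Axes, so an individual plot is picked out with `axs[row, column]`. The sketch below uses made-up data to plot on two of the four panels and to invert the x-axis of one of them:

```python
import matplotlib.pyplot as plt
import numpy as np

fig, axs = plt.subplots(2, 2)

# axs is a 2D NumPy array of Axes, indexed as axs[row, column].
print(axs.shape)  # (2, 2)

x = np.linspace(0, 1, 50)
axs[0, 0].plot(x, x)     # top-left panel
axs[1, 1].plot(x, x**2)  # bottom-right panel

# Methods such as invert_xaxis() act on one individual Axes only.
axs[0, 0].invert_xaxis()

plt.show()
```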

### Add the data onto the axes of the plot with:  

```python
ax.plot(time, distance)
```

We can also include labels, markers, colors:
    
```python
# Plot some data on the axes
ax.plot(x, x, label="linear")  
# Plot more data on the axes, using "x" markers
# (the marker format must come before any keyword arguments)
ax.plot(x, x**2, "x", label="quadratic")  
# ... and some more:
ax.plot(x, x**3, label="cubic", color="orange")
```

### Add other elements, such as labels:

```python
# Add a y-label to the axes.
ax.set_ylabel("Distance (m)")
# Add an x-label to the axes. 
ax.set_xlabel("Time (s)")
# Add a title to the axes.  
ax.set_title("My plot")  
# Add a legend.
ax.legend()  
```

### Adjust figure size and resolution:  

```python
fig.set_size_inches(6,4)
fig.set_dpi(200)
```

### To finish the figure, render it together:

```python
plt.show()
```
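
Putting the steps of this section together, a complete figure might look like the sketch below. The data (`x` and its powers), the labels and the title are made up for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

# Some made-up data to plot
x = np.linspace(0, 2, 21)

# Declare the figure and a single Axes
fig, ax = plt.subplots()

# Add the data, with labels for the legend
ax.plot(x, x, label="linear")
ax.plot(x, x**2, "x", label="quadratic")
ax.plot(x, x**3, color="orange", label="cubic")

# Add axis labels, a title and a legend
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Powers of x")
ax.legend()

# Adjust the figure size and resolution
fig.set_size_inches(6, 4)
fig.set_dpi(200)

# Render the figure
plt.show()
```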

# 5. End of Session Task <a id="final-task"></a>

<div class="alert alert-warning">
    <b>End of Class Task</b> : Plot the NMR spectra contained in <code>ethyl_cyanoacetate.txt</code> and <code>ethyl_phenylcyanoacetate.txt</code> (in the data directory) using object oriented plotting. Here, you should create four subplots in a 2x2 grid. On the two left subplots you should plot the original data for the entire NMR spectrum. On the two right subplots, you should plot a cropped region between 4.45 and 4.15 ppm (the indices for these values are approximately 45100 and 42300). You should appropriately label your plot and subplots (e.g. titles and axes labels).
<br/>
<br/>
When plotting NMR spectra it is common to invert the x-axis. You should do this here using the <code>invert_xaxis()</code> method, which belongs to a single Axes object (e.g. <code>axs[0, 0]</code> in a 2x2 grid created with <code>plt.subplots(2, 2)</code>).
        
</div>