# Jupyter Guidebook for MSE 250 Lab

Jupyter notebooks are a versatile tool that allows you to write and run Python code directly in your browser.

This Guidebook is designed to be user-friendly, and no programming background is required.
You can easily generate plots based on the provided code snippets.

However, you will need to upload your data to your EGR home directory (reflected in the left panel), and manually adjust settings such as line colors, variable names and the like as required.

Please feel free to read along and click ▶ ("Run this cell and advance") at the top to execute each cell and see how it works.
Note that the order of cell execution matters because the internal state (e.g. the values of defined variables) is updated with each cell execution.


## Content

1. [How to upload data](#How-to-upload-data-files)
1. [Required Python modules](#Required-Python-modules)
1. [How to import data](#How-to-import-data)
1. [How to do calculations](#How-to-do-calculations)
1. [How to plot graphs](#How-to-plot-graphs)
    1. [X–Y plots](#X-Y-scatter-and-line-plots)
    1. [Subplots](#Subplots)
    1. [Probability distributions](#Probability-distributions)
1. [Data analysis](#Data-analysis)
    1. [Basic statistics](#Basic-statistics)
    1. [Smoothing](#Smoothing)
    1. [Tangent to a curve](#Tangent-to-a-curve)
1. [Formatting equations with $\LaTeX$](#Formatting-equations-with-$\LaTeX$)

## How to upload data files

Any data files that you want to analyze or visualize have to be transferred to your EGR home directory.
One way to accomplish this is to click ⇪ ("Upload Files") at the top of the left Jupyter panel.
Any other way of copying the files, such as mounting your EGR home directory on your local machine, works equally well.

Once transferred, the files show up in the list of files on the left.

## Required Python modules

Before using Python for data processing and visualization, some required modules need to be imported.

* [pandas](https://pandas.pydata.org/pandas-docs/stable/index.html) for dealing with spreadsheet-like data
* [NumPy](https://numpy.org) for performing calculations on data arrays
* [Seaborn](https://seaborn.pydata.org) for high-level access to plotting functionality
* [Matplotlib](https://matplotlib.org) as base-level plotting engine

Note that we provide (the typical) shorthand names for each of these modules with the `as alias` modifier!

Each of these modules provides an assortment of functions (called "methods") that can be accessed as `module.method()`.
For example, `numpy` provides math functions such as `np.sqrt()` or `np.cos()`, which will calculate the square root and cosine of their argument, respectively.
Similarly, a module can contain variables that control the overall behavior.
For example, `plt.rcParams` is a dictionary of runtime configuration properties that determine the look and feel of Matplotlib.

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt

plt.rcParams['text.usetex'] = True   # typeset all plot text with LaTeX (requires a working LaTeX installation)
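
As a quick illustration of the `module.method()` pattern described above, the following sketch calls a few NumPy functions on single numbers and on an array. The variable names (`root`, `angle`, `values`, ...) are made up for this example.

```python
import numpy as np

root = np.sqrt(16.0)          # square root of a single number
angle = np.pi                 # np.pi is a constant stored in the module
cosine = np.cos(angle)        # trigonometric functions expect radians

values = np.array([1.0, 4.0, 9.0])
roots = np.sqrt(values)       # functions also work element-wise on arrays

print(root, cosine, roots)
```

Working on whole arrays at once is what makes the column calculations later in this Guidebook so compact.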

## How to import data

For demonstration purposes, we will use an example output file from our Instron tensile testing frame.
The file is called "force-displacement.csv" and contains comma-separated data like this:
```
Results Table 1
,Specimen label,Width,Thickness,Length,Tensile strain (Displacement) at Break (Standard),Maximum Tensile stress
,,(mm),(mm),(mm),(%),(MPa)
"1","BrassS4_01","12.73","1.52","78.49","34.28","427.03"

1,Time,Displacement,Force
,(s),(mm),(N)
"","0.0000","0.0000","39.3799"
"","0.1000","0.0042","43.7967"
"","0.2000","0.0175","45.0747"
...
```



### Using `pandas` to read a text file into a spreadsheet (called a pandas "data frame")

Pandas offers a convenient [`read_csv`](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html) function that we are going to use to parse comma-separated values (csv files) into a data frame that we will call `df`.

* Since the first seven lines of the "force-displacement.csv" file contain information that we do not want to keep, we use the `skiprows=7` option to skip them. (They would actually confuse the parser, which relies on the first line to estimate what the rest of the file looks like!)
* Similarly, because the first column is always empty, we only keep the remaining three columns with `usecols=[1,2,3]`.
* Lastly, to specify our own (more meaningful) names for the columns of interest, we provide those with the `names=['Time/s','Displacement/mm','Force/N']` argument.

Note that Python indices start at zero and ranges _exclude_ the last item.
E.g., the range `2:5` contains indices `2, 3, 4`, but _not_ 5!
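
The indexing rule can be tried out on a plain Python list. This is only a small sketch; the list `values` and the other names are invented for the illustration.

```python
values = [10, 11, 12, 13, 14, 15, 16]

first = values[0]        # indices start at zero, so this is the first item
subset = values[2:5]     # items at indices 2, 3, 4 (index 5 is excluded)
head = values[:3]        # omitting the start means "from the beginning"
tail = values[5:]        # omitting the end means "up to the last item"

print(first, subset, head, tail)
```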

In [None]:
df = pd.read_csv('force-displacement.csv',
                 skiprows=7,
                 usecols=[1,2,3],
                 names=['Time/s','Displacement/mm','Force/N'],
                )
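
If you would like to experiment with these options before uploading your own file, the sketch below parses the sample lines shown earlier directly from a text string. It assumes that the string mirrors the layout of the real file; the names `sample` and `demo` are chosen for this example so that the data frame `df` is not overwritten.

```python
import io
import pandas as pd

# A few lines copied from the sample file shown above
sample = '''Results Table 1
,Specimen label,Width,Thickness,Length,Tensile strain (Displacement) at Break (Standard),Maximum Tensile stress
,,(mm),(mm),(mm),(%),(MPa)
"1","BrassS4_01","12.73","1.52","78.49","34.28","427.03"

1,Time,Displacement,Force
,(s),(mm),(N)
"","0.0000","0.0000","39.3799"
"","0.1000","0.0042","43.7967"
"","0.2000","0.0175","45.0747"
'''

demo = pd.read_csv(io.StringIO(sample),      # read from text instead of a file
                   skiprows=7,                # skip the seven lines before the data
                   usecols=[1, 2, 3],         # ignore the empty first column
                   names=['Time/s', 'Displacement/mm', 'Force/N'],
                  )
print(demo)
```

Changing `skiprows` or `usecols` here and re-running the cell is a low-risk way to see how each option affects the result.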


In [None]:
df.head(n=5)          # Display the first 5 rows of the data frame

One can access values in the data frame by first selecting the column and (optionally) a range of rows.
In the example below, we extract the time column for rows 10 through 19 (row 20 is excluded).
The pandas documentation provides a much more involved [explanation of how to access parts of data frames](https://pandas.pydata.org/docs/user_guide/indexing.html) in case you want to learn more.

In [None]:
df['Time/s'][10:20]
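
Two other common ways to select rows are `.iloc` (by position) and `.loc` (by row label). The sketch below uses a made-up data frame `demo` with 30 time stamps to show that they treat the end of a range differently.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data: 30 time stamps, 0.1 s apart
demo = pd.DataFrame({'Time/s': np.arange(30) * 0.1})

by_position = demo['Time/s'].iloc[10:20]   # positions 10..19, end excluded
by_label = demo.loc[10:19, 'Time/s']       # row labels 10..19, end INCLUDED
single = demo['Time/s'].iloc[10]           # one value instead of a range

print(len(by_position), len(by_label), single)
```

Both selections contain the same ten rows here because the row labels of a freshly read data frame simply count up from zero.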

## How to do calculations

Any constants should be declared with a descriptive name. 
Using these names (in contrast to an explicit value) in any subsequent calculations is good practice because it makes the formulas general and easy to understand.
Let's start by defining some named constants:

In [None]:
width = 12.73        # specimen width in mm
thickness = 1.52     # specimen thickness in mm
length = 78.49       # specimen length in mm

inch_to_mm = 25.4    # millimeters per inch

### Example: unit conversions

Suppose you need to perform a unit conversion of `Displacement/mm` from mm to inch and `Force/N` from N to kN.
The results are stored as two new columns in the existing data frame `df`.

In [None]:
df['Displacement/in'] = df['Displacement/mm'] / inch_to_mm     # Converting displacements from mm to inch
df['Force/kN'] = df['Force/N'] * 1e-3                          # Converting forces from N to kN

In [None]:
df.head(n=5)
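
The named constants can be used in the same way for other calculations. As a sketch, the following converts force and displacement into engineering stress and strain. It assumes that `width` and `thickness` describe the initial cross-section and that `length` can serve as the gauge length; please check these assumptions against your lab instructions. The data frame `demo` and the new column names are invented for this example.

```python
import pandas as pd

width = 12.73       # mm
thickness = 1.52    # mm
length = 78.49      # mm, assumed here to be the gauge length

# Three rows copied from the sample file shown earlier
demo = pd.DataFrame({'Displacement/mm': [0.0000, 0.0042, 0.0175],
                     'Force/N': [39.3799, 43.7967, 45.0747]})

area = width * thickness                                    # cross-section in mm^2
demo['Stress/MPa'] = demo['Force/N'] / area                 # 1 N/mm^2 equals 1 MPa
demo['Strain/%'] = demo['Displacement/mm'] / length * 100   # percent of the assumed gauge length

print(demo)
```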

## How to plot graphs

Plotting data with Python can be accomplished in many different ways.
The foundational plotting functionality is provided by [`matplotlib`](https://matplotlib.org/stable/tutorials/pyplot.html) with numerous other modules building on top of it for enhanced ease-of-use.
A useful one is [`seaborn`](https://seaborn.pydata.org/), but a more comprehensive comparison of popular options can be found in
[this article](https://towardsdatascience.com/top-6-python-libraries-for-visualization-which-one-to-use-fe43381cd658).


We will demonstrate how to generate some basic plotting types with [Matplotlib](https://matplotlib.org/stable/tutorials/pyplot.html) and [Seaborn](https://seaborn.pydata.org/) in the following.
Choose the option that best suits your style.

For later use, let us define a basic set of colors...

In [None]:
linecolor = '#FFA500'      # orange
markercolor = '#00a00030'  # a semitransparent shade of green

... and shorthand names for two datasets that we will be frequently using.

In [None]:
x = df['Displacement/in']
y = df['Force/kN']
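
The color strings defined above follow the pattern `#RRGGBB`, optionally followed by two more hexadecimal digits `AA` for the opacity (alpha). The helper `hex_to_rgba` below is a hypothetical function written only to illustrate how such a string is decoded; Matplotlib does this internally and you do not need it for plotting.

```python
def hex_to_rgba(color):
    """Split '#RRGGBB' or '#RRGGBBAA' into four fractions between 0 and 1."""
    digits = color.lstrip('#')
    if len(digits) == 6:          # no alpha given: treat as fully opaque
        digits += 'ff'
    return tuple(int(digits[i:i+2], 16) / 255 for i in (0, 2, 4, 6))

red, green, blue, alpha = hex_to_rgba('#00a00030')
print(f"red={red:.2f} green={green:.2f} blue={blue:.2f} alpha={alpha:.2f}")
```

A small alpha value means that overlapping markers remain visible through each other.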

### X-Y scatter and line plots

#### Matplotlib

The command to draw a line graph or scatter plot is `plt.plot()`, and it comes with a lot of options for adjusting the drawing style.
For more details, please see the [pyplot tutorial](https://matplotlib.org/stable/tutorials/pyplot.html).

In [None]:
plt.plot(x,
         y,
         linestyle='-', color=linecolor, linewidth=1,
         marker='.', markerfacecolor=markercolor, markeredgecolor=markercolor)

plt.title('Exemplary data')
plt.xlabel('Displacement / inch')
plt.ylabel('Force / kN')
plt.show()                 # show graph

#### Seaborn

In [None]:
fig,ax = plt.subplots()
fig.suptitle('Exemplary data')
sns.lineplot(data=df,
             x='Displacement/in',
             y='Force/kN',
             color=linecolor,
             marker='.',
             markerfacecolor=markercolor,
             markeredgecolor=markercolor,
            )
plt.show()

### Subplots

Subplots are a tabular arrangement of individual plots that frequently share one or both axes to offer an easy way to compare multiple aspects of an underlying dataset.

We create a 2 by 2 grid of plots in the examples below.

#### Matplotlib

In [None]:
# The first subplot contains two curves: force versus displacement and a scaled version of these
plt.subplot(2, 2, 1)  # (number of rows, number of columns, subplot number)
plt.plot(x,
         y,
         color='blue')
plt.plot(0.5 * x,
         0.5 * y,
         color='green')

# The second subplot contains the square root of force versus displacement
plt.subplot(2, 2, 2)
plt.plot(x,
         np.sqrt(y),
         color='red')

# The third subplot contains the cube of the force versus displacement
plt.subplot(2, 2, 3)
plt.plot(x,
         np.power(y,3),   # np.power(y,n) raises y to the nth power, here n=3
         color='orange')

# The fourth subplot contains inverse force versus displacement shifted by 3 inches
plt.subplot(2, 2, 4)
plt.plot(3 + x,
         1 / y,
         color='gray')

plt.subplots_adjust(wspace=0.5, hspace=0.5)
plt.show()

#### Seaborn

In [None]:
fig,axes = plt.subplots(2, 2, sharex=True)    # generates a 2 by 2 array of "axes" to hold our plots
fig.suptitle('2 rows x 2 columns axes example')

sns.lineplot(ax=axes[0, 0], data=df, x='Displacement/in', y='Force/kN', color='blue')  # use keys of the data frame
sns.lineplot(ax=axes[0, 0], x=0.5*x, y=0.5*y, color='green')                           # or directly specify the data
sns.lineplot(ax=axes[0, 1], x=x, y=np.sqrt(y), color='red')
sns.lineplot(ax=axes[1, 0], x=x, y=np.power(y,3), color='orange')
sns.lineplot(ax=axes[1, 1], x=3+x, y=1/y, color='gray')

axes[0,1].set_ylabel(r'$\sqrt{\mathrm{Force/kN}}$')
axes[1,0].set_ylabel(r'$\mathrm{(Force/kN)}^3$')
axes[1,1].set_ylabel('1/(Force/kN)')
axes[1,1].set_yscale('log')

plt.subplots_adjust(wspace=0.5, hspace=0.2)    # adjust the spacing between plots
plt.show()

### Probability distributions

#### Seaborn

Seaborn offers a number of ways to illustrate the statistics of a population of data.
Below, the three most commonly used ones are demonstrated.
If in doubt, the "empirical cumulative distribution function" (ecdf) is generally the most useful one: it does not depend on an (arbitrary) binning choice and therefore reflects every single data point.

In [None]:
fig,ax = plt.subplots(figsize=(5,5))
sns.ecdfplot(data=df,x='Force/kN',stat='proportion',ax=ax,color='blue')           # data specified by key
sns.histplot(data=df,x='Force/kN',stat='proportion',ax=ax,color='orange')
sns.rugplot (        x=y,                           ax=ax,color='gray',alpha=0.2) # data specified directly
plt.show()
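
To see why the ecdf needs no binning, it helps to compute one by hand. The sketch below sorts a handful of made-up force readings and assigns to each the fraction of readings that are less than or equal to it. This is only an illustration of the idea (for distinct values), not necessarily how Seaborn implements it.

```python
import numpy as np

# Hypothetical force readings in kN, used only for illustration
forces = np.array([6.1, 7.4, 3.2, 6.8, 5.0, 8.3])

sorted_forces = np.sort(forces)
# Fraction of readings that are less than or equal to each sorted value
proportion = np.arange(1, len(sorted_forces) + 1) / len(sorted_forces)

for value, share in zip(sorted_forces, proportion):
    print(f"{value:.1f} kN -> {share:.2f}")
```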

#### Matplotlib

The command for plotting a histogram is `plt.hist()`.
The number of bins can be specified (as below) or left to be automatically chosen.

In [None]:
plt.hist(y, bins=25,
         edgecolor='black',
         facecolor='orange',
        )

plt.show()

It is possible to have non-equal bins by customizing the bin edges.

In [None]:
edges = [0, 3, 7, 7.5, 9]
plt.hist(y, bins=edges,
         edgecolor='black',
         facecolor='orange',
        )

plt.show()
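
If you want the numbers behind such a histogram, `np.histogram` accepts the same kind of bin edges and returns the counts without drawing anything. The readings below are invented for this sketch. Each bin includes its left edge and excludes its right edge, except for the last bin, which includes both; values outside the outermost edges are ignored.

```python
import numpy as np

# Hypothetical force readings in kN, used only for illustration
forces = np.array([0.5, 2.9, 3.0, 6.9, 7.2, 7.4, 8.0, 9.0, 9.5])
edges = [0, 3, 7, 7.5, 9]

counts, used_edges = np.histogram(forces, bins=edges)

print(counts)          # number of readings per bin
print(counts.sum())    # the reading at 9.5 kN lies outside the edges and is not counted
```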

## Data analysis

### Basic statistics

A `pandas` "Series", i.e. a column of a data frame such as `df['Force/kN']`, offers multiple methods to extract basic statistical information.
A few useful ones are: 

* `Series.max()` to find the maximum
* `Series.min()` to find the minimum
* `Series.mean()` to calculate the mean (average)
* `Series.median()` to calculate the median
* `Series.mode()` to calculate the most frequent value(s)
* `Series.std()` to calculate the standard deviation 
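
A small, made-up series makes it easy to check these methods by hand. Note that `Series.std()` divides by `n - 1` by default (the sample standard deviation), and that `Series.mode()` returns a Series because several values can be equally frequent.

```python
import pandas as pd

# Hypothetical force readings in kN, used only for illustration
force = pd.Series([2.0, 4.0, 4.0, 5.0, 10.0], name='Force/kN')

print(f"max    = {force.max():.2f} kN")
print(f"min    = {force.min():.2f} kN")
print(f"mean   = {force.mean():.2f} kN")
print(f"median = {force.median():.2f} kN")
print(f"mode   = {force.mode().tolist()} kN")
print(f"std    = {force.std():.2f} kN")
```

The format specifier `:.2f` limits the printed output to two decimal places, which is often preferable in a lab report.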

In [None]:
print(f"""
Maximum displacement: {df['Displacement/mm'].max()} mm
Maximum force: {df['Force/kN'].max()} kN
Average force: {df['Force/kN'].mean()} kN
""")

### Smoothing

In cases where the data is too noisy to be useful, it can be helpful to smooth it.

Please note that smoothing is different from curve fitting.
Curve fitting adjusts the parameters of a given function until it fits the observed values as closely as possible based on statistical criteria, and the fitted function can be used to extrapolate outside of the data interval.
Smoothing, on the other hand, only reduces the weight of outlying points and makes the trends in the data more obvious, with very limited possibilities for extrapolation.
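
To make the distinction concrete, here is a minimal sketch of curve fitting with `np.polyfit`. The data are synthetic (a straight line with added random noise), and all names are chosen for this example only. The fitted parameters describe the whole curve, so the model can also be evaluated beyond the measured range.

```python
import numpy as np

rng = np.random.default_rng(seed=0)            # fixed seed for repeatable noise
x_demo = np.linspace(0.0, 1.0, 50)
y_demo = 2.0 * x_demo + 1.0 + rng.normal(scale=0.05, size=x_demo.size)  # synthetic noisy line

# Fit the model y = slope * x + intercept to the noisy points
slope, intercept = np.polyfit(x_demo, y_demo, deg=1)

# The fitted function can be evaluated outside the data interval [0, 1]
y_extrapolated = slope * 1.5 + intercept

print(slope, intercept, y_extrapolated)
```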

One possibility for smoothing a data series that we are demonstrating here is to use the [Savitzky–Golay filter](https://docs.scipy.org/doc/scipy-1.14.0/reference/generated/scipy.signal.savgol_filter.html) from the `scipy.signal` module.

In [None]:
from scipy.signal import savgol_filter

df['Smooth Force/kN'] = savgol_filter(x=df['Force/kN'],
                                      window_length=101, # larger window results in greater smoothing
                                      polyorder=2,
                                     )

fig,ax = plt.subplots()
sns.lineplot(data=df,
             x='Displacement/mm',
             y='Force/kN',
             color='blue',
             ax=ax,
            )
sns.lineplot(data=df,
             x='Displacement/mm',
             y='Smooth Force/kN',
             color='orange',
             ax=ax,
            )
plt.show()
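
When the true signal is known, the effect of the filter can be quantified. The sketch below builds a synthetic sine curve, adds random noise, and compares the scatter around the true curve before and after smoothing. The signal, the noise level, and the window length are arbitrary choices for this illustration.

```python
import numpy as np
from scipy.signal import savgol_filter

rng = np.random.default_rng(seed=0)            # fixed seed for repeatable noise
t = np.linspace(0.0, 2.0 * np.pi, 500)
clean = np.sin(t)                              # the "true" signal
noisy = clean + rng.normal(scale=0.2, size=t.size)

smooth = savgol_filter(noisy, window_length=51, polyorder=2)

error_before = np.std(noisy - clean)           # scatter of the raw data
error_after = np.std(smooth - clean)           # scatter after smoothing
print(error_before, error_after)
```

With real measurements the true signal is unknown, so it is worth trying a few window lengths and checking that genuine features of the curve are not smoothed away.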

### Tangent to a curve

To calculate the tangent at a particular point along a curve, its derivative is required.
One possibility to calculate this derivative is to first smooth the underlying data with a spline representation and then calculate the derivative of that representation.

Such functionality is conveniently provided by the `scipy.interpolate` module.
The method `interpolate.splrep` generates a spline representation and `interpolate.splev` can be used to calculate derivatives at specific points of interest.

In [None]:
from scipy import interpolate

In [None]:
def tangent(points,derivative,d=1):
    """
    Return end points on the local tangent of given points.
    
    Parameters
    ----------
    points : float, shape (...,2)
        Coordinates of the curve.
    derivative : float, shape (...)
        Local derivative of the curve.
    d : float
        Separation between tangent end points (along first coordinate).

    Returns
    -------
    endpoints : coordinates of tangent end points, shape (...,2,2)

    """
    return np.stack((points-0.5*d*np.asarray([1,derivative]),
                     points+0.5*d*np.asarray([1,derivative])))


In [None]:
l = df['Displacement/mm'][:100]  # only use the first 100 data points
F = df['Force/kN'][:100]

F0 = 6.5                         # point of interest for tangent

In [None]:
tck = interpolate.splrep(x=F,y=l)          # spline representation of l(F)
dldF = interpolate.splev(F0,tck,der=1)     # first derivative at F0 based on spline representation
l0 = interpolate.splev(F0,tck)             # interpolated displacement at F0, i.e l0 = l(F0)

tgt = tangent([F0,l0],dldF,d=2)            # tangent end points at (F0,l0)

In [None]:
fig,ax = plt.subplots()
sns.lineplot(x=l,y=F,ax=ax)
sns.lineplot(x=tgt[:,1],
             y=tgt[:,0],
             linewidth=5,
             alpha=0.5,
             ax=ax)
plt.show()
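
A useful sanity check for this approach is to apply it to a curve whose derivative is known. The sketch below uses the parabola $y = x^2$, for which the slope at $x_0 = 1$ should come out close to 2. All names are chosen for this example; with its default settings and no weights, `splrep` is expected to pass exactly through the given points.

```python
import numpy as np
from scipy import interpolate

x_demo = np.linspace(0.0, 2.0, 21)
y_demo = x_demo**2                              # known curve with derivative 2*x

tck_demo = interpolate.splrep(x_demo, y_demo)   # spline representation of y(x)
x0 = 1.0
y0 = interpolate.splev(x0, tck_demo)            # value of the spline at x0
slope = interpolate.splev(x0, tck_demo, der=1)  # first derivative at x0

# Two points on the tangent line y = y0 + slope * (x - x0)
x_line = np.array([x0 - 0.5, x0 + 0.5])
y_line = y0 + slope * (x_line - x0)

print(float(y0), float(slope), y_line)
```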

## Formatting equations with $\LaTeX$

Mathematical formulas can be nicely typeset within a notebook using $\LaTeX$. Its syntax might be off-putting at first, but it becomes easier with some practice.

Some examples to illustrate possible use cases:
* `$\sin^2 \alpha + \cos^2 \alpha = 1$`  
   $\sin^2 \alpha + \cos^2 \alpha = 1$
* `$\sqrt{x+y}$`  
   $\sqrt{x+y}$
* `$\sum_{i=1}^{n} i^2$`  
   $\sum_{i=1}^{n} i^2$
* `$\displaystyle \sum_{i=1}^{n} i^2$`  
   $\displaystyle \sum_{i=1}^{n} i^2$
* `$\displaystyle \int_{a}^{b} x^2 \mathrm{d}x = \frac{b^3 - a^3}{3}$`  
   $\displaystyle \int_{a}^{b} x^2 \mathrm{d}x = \frac{b^3 - a^3}{3}$

A point-and-click online editor that can quickly generate $\LaTeX$ equations can be found at https://editor.codecogs.com.

For more details on how to add math equations in Pages, please check: https://support.apple.com/guide/pages/add-mathematical-equations-tanca5a4fbd9/mac

For more details on how to add math equations in Word, please check: https://support.microsoft.com/en-au/office/linear-format-equations-using-unicodemath-and-latex-in-word-2e00618d-b1fd-49d8-8cb4-8d17f25754f8