# Matplotlib

Matplotlib is a plotting library that provides functionality very similar to that of MATLAB. In this tutorial we give a brief introduction to making simple line and scatter plots, plus a few other plot types. You will be introduced to more advanced plots over the course of the lecture series. The first step is to import the modules used throughout the notebook, including the Matplotlib plotting module ```pyplot```, and to add ```%matplotlib inline``` so that all plots are rendered inline, directly after the cells that create them.

In [6]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

__Note__
* ```%matplotlib inline``` will make your plot outputs appear and be stored within the notebook.
* ```%matplotlib inline``` is a "magic" built-in command for Jupyter/IPython; it is not a Python statement.

## Basic Plotting

Let's start by plotting a simple linear function $y = 2x$.

__Task 1.1__: Plot the function $y = 2x$ using the ```plt.plot``` function.

In [None]:
# Prepare the data
x = np.linspace(-10, 10, 50) # 50 values in the range [-10,10]
print('x = \n', x)

# Create the function
y = 2*x

### (Task 1.1): Plot the data
# Add your code here

# Show the plot
# (not necessary when plotting with %matplotlib inline,
#  but essential when plotting from scripts)
plt.show()

We can label the plot and the axes, and add a legend and a title to the plot.

__Task 1.2__: Add labels to the axes using ```plt.xlabel``` and ```plt.ylabel```

__Task 1.3__: Add a title to your plot using ```plt.title```


In [None]:
# Plot the data
plt.plot(x, y)

### (Task 1.2): label both axes
plt.xlabel('$x$') # The expression $x$ tells matplotlib to format x as an equation (and italicize x); you can use LaTeX-like formatting
# Add your code for the y-axis label here

### (Task 1.3): add a title
# Add your code for the title here

# Legend
plt.legend(['$y = 2x$'], loc='lower right')
# Note: the first argument is a list of strings;
#       the second argument is the desired location for the legend (within the plot)

# Note: Alternative to creating a legend is:
#plt.plot(x, y, label = '$y = 2x$')
#plt.legend(loc='lower right')

plt.show()

**In order to plot multiple functions in the same figure, one simply adds more ```plt.plot``` calls.**

You can also add them in a single line such as: ```plt.plot(x, y, x, y2)```.

Let's look at an example for $y = x^2$:

In [None]:
# Create a new function
y2 = x**2

# Plot y = 2x
plt.plot(x, y)

# Plot y = x^2
plt.plot(x, y2)

# Labels
plt.xlabel('$x$')
plt.ylabel('$y$')

# Legend
plt.legend(['$y = 2x$', '$y = x^{2}$'], loc='lower right')

# Title
plt.title('Two functions')

plt.show()
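The single-call form mentioned above, ```plt.plot(x, y, x, y2)```, can be sketched as follows. The snippet redefines the same data so that it runs on its own; the variable name `lines` is only illustrative. It shows that one call with two (x, y) pairs returns one line object per pair, which can then be passed to the legend.

```python
import numpy as np
import matplotlib.pyplot as plt

# Same data as in the cells above, redefined so the snippet is self-contained
x = np.linspace(-10, 10, 50)
y = 2 * x
y2 = x**2

# One call, two (x, y) pairs: plot returns a list with one line object per pair
lines = plt.plot(x, y, x, y2)

# Pass the line objects explicitly so each label is tied to the right curve
plt.legend(lines, ['$y = 2x$', '$y = x^{2}$'], loc='lower right')
plt.xlabel('$x$')
plt.ylabel('$y$')
plt.title('Two functions, one plot call')
```

Separate ```plt.plot``` calls and the single-call form draw the same curves; separate calls are usually easier to read once each curve needs its own style or label.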

__Task 1.4__: Add a third function $y = \text{exp}(x/2)$. Change the labels to 'x' and 'f(x)' for the x and y axis respectively. Add the third function to the legend of the plot and change the title to 'My three functions'.

In [None]:
### (Task 1.4)

# Create a third function
y3 = np.exp(x/2)

# Plot y_1
plt.plot(x, y)

# Plot y_2
plt.plot(x, y2)

# Plot y_3
# Add your code here

# Labels
# Add your code here

# Legend
plt.legend(['$y = 2x$', '$y = x^2$', '$y = e^{x/2}$'], loc='lower right')

# Title
# Add your code here

plt.show()

You can modify the plot style through a third argument, the format string. This string, modelled on MATLAB conventions, sets the color, the marker and the line style of the plot; you simply combine the respective codes in one string. For example, **'b-' represents a solid blue line**, while 'ro' represents red circles with no connecting line. When no format string is given, Matplotlib draws a solid line in the next color of its default color cycle.
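As a small illustration of how the codes combine, the sketch below uses two format strings that do not appear in the tasks. The data (`t`, sine and cosine) is hypothetical and chosen only for the demonstration.

```python
import numpy as np
import matplotlib.pyplot as plt

t = np.linspace(0, 2 * np.pi, 20)  # hypothetical sample points

# 'k--' = black ('k') + dashed line ('--'), no marker
dashed, = plt.plot(t, np.sin(t), 'k--')

# 'mD' = magenta ('m') + diamond marker ('D'), no connecting line
diamonds, = plt.plot(t, np.cos(t), 'mD')

plt.legend(['$\\sin(t)$', '$\\cos(t)$'], loc='lower left')

# The line objects record which style was parsed from the format string
print(dashed.get_linestyle(), diamonds.get_marker())
```

The order of the codes inside the string is flexible, but each string can hold at most one color, one marker and one line style.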


__Task 1.5__: Change the code below to use **red circles ('ro')** for $y = 2x$ and **green triangles ('g^')** for $y = x^2$.


In [None]:
### (Task 1.5)
plt.plot(x, y, 'b-', x, y2, 'b-') # Edit this line

# Labels
plt.xlabel('$x$')
plt.ylabel('$y$')

# Legend
plt.legend(['$y = 2x$', '$y = x^2$'], loc='lower right')

# Title
plt.title('Different styles')

plt.show()

For many more arguments controlling the properties of ```pyplot``` lines and markers see https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.plot.html

**Task 1.6** Want to save a figure as a file? Use `savefig`

In [None]:
### (Task 1.6)
plt.plot(x, y, 'ro', x, y2, 'g^')
plt.xlabel('$x$')
plt.ylabel('$y$')
plt.legend(['$y = 2x$', '$y = x^2$'], loc='lower right')
plt.title('Different styles')
plt.savefig('saved_plot.png')
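A few optional arguments of `savefig` are often useful. The sketch below writes into a temporary folder (an assumption made here so that nothing is overwritten) and saves the same figure twice: once as a PNG with a chosen resolution and once as a PDF. The output format is inferred from the file extension.

```python
import os
import tempfile

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-10, 10, 50)
plt.plot(x, 2 * x, 'ro')
plt.xlabel('$x$')
plt.ylabel('$y$')

# Hypothetical output location: a fresh temporary directory
out_dir = tempfile.mkdtemp()
png_path = os.path.join(out_dir, 'saved_plot.png')
pdf_path = os.path.join(out_dir, 'saved_plot.pdf')

# dpi sets the resolution of raster output;
# bbox_inches='tight' trims surplus whitespace around the figure
plt.savefig(png_path, dpi=150, bbox_inches='tight')

# Vector output: the .pdf extension selects the PDF writer
plt.savefig(pdf_path)

print(os.listdir(out_dir))
```

When running from a script, call `savefig` before `plt.show()`, since the figure may already be closed once the window is dismissed.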

## Setting ticks and limits

In some cases, it may be necessary to **change the range of the x and y axes and/or change the regularity or spacing of the ticks on each axis**. This can be achieved using the ```xlim/ylim``` and ```xticks/yticks``` functions.
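Before attempting the tasks, here is a minimal sketch of all four functions on a different, hypothetical example (a sine curve). It also uses the optional second argument of ```xticks```, which replaces the numeric tick values with custom text labels.

```python
import numpy as np
import matplotlib.pyplot as plt

t = np.linspace(0, 2 * np.pi, 100)  # hypothetical data for this sketch
plt.plot(t, np.sin(t))

# Axis limits: (lower, upper)
plt.xlim(0, 2 * np.pi)
plt.ylim(-1.5, 1.5)

# Tick positions, optionally followed by the labels to show at those positions
plt.xticks([0, np.pi, 2 * np.pi], ['0', r'$\pi$', r'$2\pi$'])
plt.yticks([-1, 0, 1])

# Called without arguments, xlim returns the current limits
print(plt.xlim())
```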

__Task 2.1__: **Change the axis limits** to $[\text{min}(x)-5, \text{max}(x)+5]$ for the x axis and to $[\text{min}(y)-5, \text{max}(y)+5]$ for the y axis.

*Hint*: use `plt.xlim(lower_limit, upper_limit)` and `plt.ylim(lower_limit, upper_limit)`

In [None]:
### Task 2.1: changing the axis limits
plt.xlim(x.min() - 5, x.max() + 5)
# Add your code for the y-axis here

# Plot data
plt.plot(x, y, 'ro', x, y2, 'g^')

# Labels
plt.xlabel('$x$')
plt.ylabel('$y$')

# Legend
plt.legend(['$y = 2x$', '$y = x^2$'], loc='lower right')

# Title
plt.title('New axis limits')

plt.show()

__Task 2.2__: **Change the axis ticks** to only show $\{-10, 0, 10\}$ for the x axis and $\{0,50,100\}$ for the y axis.

*Hint*: use `plt.xticks(list_of_values)` and `plt.yticks(list_of_values)`

In [None]:
### Task 2.2: changing the axis ticks
plt.xticks([-10, 0, 10])

# Add your code for the y-axis here

# Plot data
plt.plot(x, y, 'ro', x, y2, 'g^')

# Labels
plt.xlabel('$x$')
plt.ylabel('$y$')

# Legend
plt.legend(['y', 'y2'], loc='lower right')

# Title
plt.title('New ticks')

plt.show()

## Subplots

When using MATLAB and pyplot for your plots, you will work with two main components: **"figures"** and **"axes"**.

A **"figure"** refers to the entire window that appears in the user interface. It is essentially the container for your plot.

Within a figure, you can add **axes**. Axes are the specific areas where the data is plotted, for example with `plot()`. Each axes can be customized with its own ticks, labels and title to represent your data accurately.

You have the flexibility to include multiple axes within a single figure. To organize multiple plots neatly, you can use "**subplots**." This allows you to place different plots side by side within the same figure, providing a structured view of various data sets at once.
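The relationship between a figure and its axes can be sketched with ```plt.subplots```, a convenience function that creates the figure and a grid of axes in one call. The variable names below are illustrative only.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-10, 10, 50)

# One figure containing a 1 x 2 grid of axes
fig, axes = plt.subplots(1, 2)

# Each axes is plotted into and titled independently
axes[0].plot(x, 2 * x, 'ro')
axes[0].set_title('$2x$')
axes[1].plot(x, x**2, 'g^')
axes[1].set_title('$x^2$')

# The figure keeps track of the axes it contains
print(len(fig.axes))
```

For a single row or column, `axes` is a one-dimensional NumPy array, so individual subplots are reached by index.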


In [None]:
# Initialize the plot specifying the figure size
fig = plt.figure(figsize=(8,3)) # note the use of figsize to format figure size

# subplot 1: similar to MATLAB: 121 = 1 row, 2 columns, 1st subplot
plt.subplot(121)
plt.plot(x, y, 'ro')
plt.title('$2x$')

# subplot 2: similar to MATLAB: 122 = 1 row, 2 columns, 2nd subplot
plt.subplot(122)
plt.plot(x, y2, 'g^')
plt.title('$x^2$')

plt.show()

Here is another version where the axes are defined using variables:

In [None]:
# Initialize the plot specifying the figure size
fig = plt.figure(figsize=(8,3))

# Subplot 1
ax1 = fig.add_subplot(121)
ax1.plot(x, y, 'ro')
ax1.set_title('$2x$')

# Subplot 2
ax2 = fig.add_subplot(122)
ax2.plot(x, y2, 'g^')
ax2.set_title('$x^2$')

plt.show()

__Note__: Here we have also introduced the function argument ```figsize```.

__Task 3.1__: In the code above, try changing the size dimensions to view the impact on the figure.

__Task 3.2__: Add a third subplot to the code above to also display the function $y=\exp(x/2)$


__Note__: **The default use of ```subplot``` enforces an even share of the figure to each subplot. In order to fit plots of different sizes within each figure consider using ```gridspec```** (https://matplotlib.org/stable/api/gridspec_api.html)
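A minimal sketch of the gridspec approach, assuming a layout with one wide and one narrow panel (the 2:1 ratio and the variable names are choices made for this example):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-10, 10, 50)

fig = plt.figure(figsize=(8, 3))

# 1 row, 2 columns; the first column is twice as wide as the second
gs = fig.add_gridspec(1, 2, width_ratios=[2, 1])

ax_wide = fig.add_subplot(gs[0, 0])
ax_wide.plot(x, x**2, 'g^')
ax_wide.set_title('$x^2$ (wide panel)')

ax_narrow = fig.add_subplot(gs[0, 1])
ax_narrow.plot(x, 2 * x, 'ro')
ax_narrow.set_title('$2x$ (narrow panel)')

# Compare the panel widths (in figure coordinates)
w_wide = ax_wide.get_position().width
w_narrow = ax_narrow.get_position().width
print(w_wide / w_narrow)
```

Grid cells can also be merged by slicing, for example `gs[0, :]` for a panel that spans a whole row.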

## Brief introduction to other plot styles

We'll go over the different types of plots in more detail when we come across them in examples. Just to give you a hint of what's to come, with matplotlib, you can create a variety of plots including __bar charts__, __histograms__ and __box plots__ - plenty to look forward to!

Now, let's get hands-on. Run the following code to load the file named 40subjdata.pkl. This file contains data on 40 students: their height, gender, and the distance they traveled to attend the class.


In [3]:
import os
import requests

# Download a data file from GitHub
def download_data(source, dest):
    # No trailing slash here: the format string below adds the separators
    base_url = 'https://raw.githubusercontent.com'
    owner = 'SirTurtle'
    repo = 'ML-BME-UofA-data'
    branch = 'main'
    url = '{}/{}/{}/{}/{}'.format(base_url, owner, repo, branch, source)
    r = requests.get(url)
    r.raise_for_status()  # fail early instead of saving an error page as data
    # The with-block closes the file even if the write fails
    with open(dest, 'wb') as f:
        f.write(r.content)

# Create the output folder if it does not exist yet
os.makedirs('temp', exist_ok=True)

download_data('Week-1-Python-programming/data/40subjdata.pkl', 'temp/40subjdata.pkl')
download_data('Week-1-Python-programming/data/fetalbrain.jpeg', 'temp/fetalbrain.jpeg')

In [None]:
# Import pandas; we will cover pandas in the next notebook
import pandas as pd

# Load data
df = pd.read_pickle('temp/40subjdata.pkl')

# Inspect data with head()
display(df.head())

# Extract height data
heights = df['height']

# Select the heights of the female students and of the male students
girls_height = df['height'][df['gender']=='F']
boys_height  = df['height'][df['gender']=='M']

### Histogram

Next up, we're going to create a histogram using the 'height' values from our data. To do this, we'll set the optional `bins` parameter to 10, which splits the range of the data into 10 equal-width intervals. Give it a try by running the following code:

__Task 4.1:__ Play with the number of bins

In [None]:
### Task 4.1

plt.hist(heights, bins=10, edgecolor='black', linewidth=1.2)
# The edgecolor and linewidth parameters will add an outline to the histogram bars

# Labels
plt.xlabel('Height (cm)')
plt.ylabel('Number of students')

# Title
plt.title('Student height histogram')

plt.show()
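To see what changing `bins` actually does, it helps to look at the values that `hist` returns. The sketch below uses synthetic heights drawn from a normal distribution (an assumption for illustration, not the student data) and prints the per-bin counts and the bin edges.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in for the height column: 40 values around 170 cm
rng = np.random.default_rng(0)
sample_heights = rng.normal(loc=170, scale=10, size=40)

# hist returns the count in each bin, the bin edges, and the drawn bars
counts, edges, bars = plt.hist(sample_heights, bins=10,
                               edgecolor='black', linewidth=1.2)
plt.xlabel('Height (cm)')
plt.ylabel('Number of samples')

print('counts:', counts)
print('edges: ', np.round(edges, 1))
```

With `bins=10` there are 11 edges, and the counts add up to the number of samples. Fewer bins smooth the shape of the distribution; more bins show more detail but also more noise.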

### Bar chart

Bar charts can be created using the function `bar`. The arguments used here are:
* the locations for the bars on the x-axis
* the height of the bars
* the width of the bars
* the error bars (keyword argument `yerr`)
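The arguments listed above can be sketched on synthetic data as follows. The two groups, their names and their values are hypothetical; `capsize` is an optional extra that adds caps to the error bars.

```python
import numpy as np
import matplotlib.pyplot as plt

# Two hypothetical groups of measurements
rng = np.random.default_rng(1)
group_a = rng.normal(loc=5.0, scale=1.0, size=30)
group_b = rng.normal(loc=7.0, scale=2.0, size=30)

locs = [0, 1]                               # bar locations on the x-axis
means = [group_a.mean(), group_b.mean()]    # bar heights
stds = [group_a.std(), group_b.std()]       # error bar lengths
width = 0.8                                 # bar width

bars = plt.bar(locs, means, width, yerr=stds, capsize=5,
               edgecolor='black', linewidth=1.2)

plt.xticks(locs, ['group A', 'group B'])
plt.ylabel('Mean value')
```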

__Task 4.2__: Let's build a **bar chart** that displays the average height for both girls and boys. To give a clear picture of the variation in heights, we'll also include error bars representing the standard deviation on the y-axis. Here's a start, just fill in the missing parts in the code below to complete it:

See documentation: https://matplotlib.org/api/_as_gen/matplotlib.pyplot.bar.html

In [None]:
### Task 4.2

# Define the locations of the bars on the x-axis
bar_locs = [0,1]

# Set the width of the bars
width = 0.8

# Calculate the average height for girls and for boys
avg_girls = np.mean(girls_height)
avg_boys  = None # Edit this line

# Calculate the standard deviation in the height for girls and for boys
std_girls = None # Edit this line
std_boys  = None # Edit this line

# Plot the bar chart - Uncomment and fill in
# plt.bar(bar_locs, [None, None], width, yerr=[None, None], edgecolor='black', linewidth=1.2)

# Change ticks to show girls and boys
plt.xticks(bar_locs, ('girls','boys'))

# Labels
plt.ylabel('Average height (cm)')

# Title
plt.title('Student height bar chart')

plt.show()

__Task 4.3__: Next, create **boxplots** with the ```boxplot``` function to visually represent the distribution of heights for girls and boys. Go ahead and give it a try!

See documentation: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.boxplot.html

In [None]:
### Task 4.3

# Plot the boxplots - Uncomment and fill in
#plt.boxplot([None, None], vert=True)

# Label the x-axis - the first boxplot is at x=1 and the second at x=2
plt.xticks((1,2), ('girls','boys'))

# Labels
plt.ylabel('Height (cm)')

# Title
plt.title('Student height boxplots')

plt.show()

__Task 4.4__: Create **violin plots** of the girls' and the boys' heights.

See documentation: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.violinplot.html

**Note:** A violin plot is a type of data visualization that **combines aspects of a boxplot and a kernel density plot**. Here's how it breaks down:

**Boxplot component:**

Like a boxplot, it shows the **median of the data**, which is typically marked by a white dot or a horizontal line. It also displays the interquartile range, which represents the middle 50% of the data.

**Kernel Density Plot component:**

This is what makes a violin plot distinctive. On each side of the boxplot, there is a rotated kernel density plot, which gives a visual representation of the data's distribution. This plot shows the density of the data points at different values, creating a shape like a violin, hence the name.

Together, these components allow a violin plot to provide a deeper understanding of the **distribution of data points**, revealing not only the central tendency and spread of the data but also any bimodality, skewness, and other complex structures that might be present.
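The sketch below illustrates this on synthetic data (an assumption for the example): one unimodal and one bimodal sample. Note that Matplotlib's `violinplot` does not draw the median unless asked, so the example passes `showmedians=True`.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)

# Synthetic samples: one with a single peak, one with two peaks
unimodal = rng.normal(loc=0.0, scale=1.0, size=200)
bimodal = np.concatenate([rng.normal(-2.0, 0.5, 100),
                          rng.normal(2.0, 0.5, 100)])

# showmedians=True adds a horizontal line at each median
parts = plt.violinplot([unimodal, bimodal], showmedians=True)

# The violins are placed at x=1 and x=2
plt.xticks([1, 2], ['unimodal', 'bimodal'])
plt.ylabel('Value')

# The returned dictionary holds the drawn elements
print(sorted(parts.keys()))
```

The two samples have similar medians, so boxplots of them would look alike in that respect, while the violin shapes make the two peaks of the second sample visible.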

In [None]:
### Task 4.4

# Plot the violinplots - Uncomment and fill in
# plt.violinplot([None, None], vert=True)

# Label the x-axis - the first violin is at x=1 and the second at x=2
plt.xticks((1,2), ('girls','boys'))

# Labels
plt.ylabel('Height (cm)')

# Title
plt.title('Student height violinplots')

plt.show()

## Images

Images can be loaded and viewed using the `imread` and `imshow` functions.

__Task 5.1__: Load in the image called 'fetalbrain.jpeg' using the `imread` function. Then plot it using the `imshow` function.

See documentation: https://matplotlib.org/api/_as_gen/matplotlib.pyplot.imshow.html

In [None]:
### Task 5.1

# read image
image = plt.imread('temp/fetalbrain.jpeg')

# display image
plt.imshow(image)

plt.show()
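`imread` returns the image as a NumPy array, so `imshow` can display any array, not only files loaded from disk. The sketch below builds a small synthetic grayscale image (an assumption, so that the snippet does not depend on the downloaded file) and shows a few common display options.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic 64 x 64 grayscale image: a horizontal gradient from 0 to 1
row = np.linspace(0, 1, 64)
image = np.tile(row, (64, 1))

# A 2-D array is mapped to colors through a colormap;
# vmin/vmax fix which values correspond to black and white
im = plt.imshow(image, cmap='gray', vmin=0, vmax=1)
plt.colorbar(im)   # show the value-to-color mapping
plt.axis('off')    # hide ticks and frame, which rarely matter for images

print(image.shape, image.dtype)
```

A grayscale image has shape (height, width), while a color image has a third dimension of size 3 (RGB) or 4 (RGBA); checking `image.shape` after `imread` tells you which case you have.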

# Citation

John D. Hunter. Matplotlib: A 2D Graphics Environment, Computing in Science & Engineering, 9, 90-95 (2007), DOI:10.1109/MCSE.2007.55