# Matplotlib

Matplotlib is a widely used library for creating plots. It is typically used through its `pyplot` API, which is conventionally imported as `plt`.  
The fundamental objects are the `Figure` (fig) and the `Axes` (ax), which this notebook also calls a subplot.  
Below is a figure [from the docs](https://matplotlib.org/stable/tutorials/introductory/usage.html) of the common elements of a plot, along with their respective methods.

<img src="assets/matplotlib-anatomy.webp" style="max-width: 700px"/>


You can find a lot of small recipes for creating plots in [my git repository](https://gitlab.com/marvin.vanaalst/matplotlib-cookbook/).  

In [None]:
from __future__ import annotations

from typing import Any, cast

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from numpy.typing import NDArray


def test(x: Any, expected: Any) -> None:
    if isinstance(x, np.ndarray) and isinstance(expected, np.ndarray):
        if not np.array_equal(x, expected):
            raise AssertionError(f"Expected {expected}, got {x}")
    else:
        if x != expected:
            raise AssertionError(f"Expected {expected}, got {x}")
    print("Test passed")


## Single subplots

You can create a figure with a single subplot using `plt.subplots`.  
Note that the first two arguments (`nrows` and `ncols`) are set to `1` by default, so you can leave them out.  
We will use them later to create figures with multiple subplots.  


In [None]:
fig, ax = plt.subplots()
plt.show()


### Exercise: figure size

`plt.subplots` takes an optional keyword argument `figsize=(width, height)`, which you can use to change the size of the figure.  

Create three separate plots with the sizes `(2, 2)`, `(2, 4)` and `(4, 2)`.

In [None]:
fig, ax = plt.subplots(figsize=(2, 2))
plt.show()

fig, ax = plt.subplots(figsize=(2, 4))
plt.show()

fig, ax = plt.subplots(figsize=(4, 2))
plt.show()
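
Matplotlib interprets `figsize` in inches, so the size in pixels also depends on the resolution (`dpi`) of the figure. The sketch below is only an illustration: the `dpi=100` value is an assumption chosen to make the numbers easy to read, not a recommendation.

```python
import matplotlib.pyplot as plt

# figsize is given in inches; dpi is set explicitly so the pixel size is predictable
fig, ax = plt.subplots(figsize=(4, 2), dpi=100)

width_in, height_in = fig.get_size_inches()
print(f"{width_in:.0f} x {height_in:.0f} inches")
print(f"{width_in * fig.dpi:.0f} x {height_in * fig.dpi:.0f} pixels")
plt.show()
```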



## Line plot

We can draw a line-plot using the `.plot` method on a subplot (ax).  
We then can annotate that plot further, using the `set_xlabel`, `set_ylabel` and `set_title` methods on the subplot.  


In [None]:
x = np.linspace(-np.pi, np.pi, 256)
y = np.sin(x)

fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_xlabel("Time / a.u.")
ax.set_ylabel("Value / a.u.")
ax.set_title("My first plot")
plt.show()
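
As a possible shorthand, the three annotation calls above can be combined into a single `ax.set` call, which accepts the same properties as keyword arguments. This is a minimal sketch that reuses the labels from the cell above.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-np.pi, np.pi, 256)

fig, ax = plt.subplots()
ax.plot(x, np.sin(x))
# One call instead of set_xlabel, set_ylabel and set_title
ax.set(xlabel="Time / a.u.", ylabel="Value / a.u.", title="My first plot")
plt.show()
```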

### Exercise

Extend the plot above by adding the cosine as a second line to it.

In [None]:
x = np.linspace(-np.pi, np.pi, 256)

fig, ax = plt.subplots()
ax.plot(x, np.sin(x))
ax.plot(x, np.cos(x))
ax.set_xlabel("Time / a.u.")
ax.set_ylabel("Value / a.u.")
ax.set_title("My first plot")
plt.show()

## Fill below line

You can fill the area between a line and the x-axis (that is, $y = 0$) with `.fill_between(x, y)`.  
This is a useful technique if you want to emphasise the magnitude of the data.  

In [None]:
x = np.linspace(-np.pi, np.pi, 256)
y1 = np.sin(x) + 2

fig, ax = plt.subplots()
ax.plot(x, y1)
ax.fill_between(x, y1, color='black', alpha=0.2)
plt.show()

Similarly, you can fill the area between two lines by supplying a second array to `.fill_between(x, y1, y2)`.

In [None]:
x = np.linspace(-np.pi, np.pi, 256)
y1 = np.sin(x) + 2
y2 = np.sin(x) + 4

fig, ax = plt.subplots()
ax.plot(x, y1)
ax.plot(x, y2)
ax.fill_between(x, y1, y2, color='black', alpha=0.2)
plt.show()
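
`fill_between` also accepts a boolean mask through its `where` keyword, which restricts the fill to the regions where the mask is true. The following sketch, with colors chosen purely for illustration, shades the positive and negative parts of a sine wave differently.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-np.pi, np.pi, 256)
y = np.sin(x)

fig, ax = plt.subplots()
ax.plot(x, y, color="black")
# Fill between the curve and y = 0, but only where the condition holds
ax.fill_between(x, y, 0, where=y >= 0, color="tab:blue", alpha=0.3)
ax.fill_between(x, y, 0, where=y < 0, color="tab:red", alpha=0.3)
plt.show()
```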

### Exercise

Reproduce the plot below

<img src="assets/plot-fill-between.png">

In [None]:
x = np.linspace(-3 / 2 * np.pi, 1 / 2 * np.pi, 256)
y1 = np.sin(x) + 2
y2 = np.cos(x + np.pi / 2) + 4

fig, ax = plt.subplots()
ax.plot(x, y1)
ax.plot(x, y2)
ax.fill_between(x, y1, y2, color="black", alpha=0.2)
plt.show()


## Scatter plot

The simplest way to plot two-dimensional data is a scatter plot, in which a circle is drawn for every pair $(x_i, y_i)$.

In [None]:
n = 256
x = np.random.normal(0, 1, n)
y = np.random.normal(0, 1, n)

fig, ax = plt.subplots()
ax.scatter(x, y)
plt.show()

If more information is available, a third dimension can be mapped onto another visual property. For example, here we use color to convey a third set of data.

In [None]:
# With color (c)

n = 256
x = np.random.normal(0, 1, n)
y = np.random.normal(0, 1, n)

# Distance to center
c = np.sqrt(x ** 2 + y ** 2)

fig, ax = plt.subplots()
ax.scatter(x, y, c=c)
plt.show()
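
When color encodes data, a colorbar helps the reader translate colors back into values. A minimal sketch, assuming a seeded NumPy generator (`np.random.default_rng`) for reproducibility instead of the `np.random.normal` calls used above, and an illustrative label text:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=0)  # seeded only to make the example reproducible
x = rng.normal(0, 1, 256)
y = rng.normal(0, 1, 256)
c = np.sqrt(x ** 2 + y ** 2)  # distance to center

fig, ax = plt.subplots()
points = ax.scatter(x, y, c=c)
# The colorbar needs the object returned by scatter to know the color mapping
fig.colorbar(points, ax=ax, label="Distance to center")
plt.show()
```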

The same can be done with the size of the circles, which is controlled by the `s` argument.

In [None]:
# Change marker size (s)

n = 256
x = np.random.normal(0, 1, n)
y = np.random.normal(0, 1, n)

# Distance to center
s = np.sqrt(x ** 2 + y ** 2) * 15

fig, ax = plt.subplots()
ax.scatter(x, y, s=s)
plt.show()

### Exercise

Again using the "distance to center" information as the third data set, make a scatter plot where that information is used for both the color and the size of the circles.

In [None]:
# Use the distance for both the marker size (s) and the color (c)

n = 256
x = np.random.normal(0, 1, n)
y = np.random.normal(0, 1, n)

# Distance to center
s = np.sqrt(x ** 2 + y ** 2) * 15

fig, ax = plt.subplots()
ax.scatter(x, y, s=s, c=s)
plt.show()

## Bar plots

While line plots are useful for *continuous* data, bar plots are useful for *discrete* data, e.g. when your data falls into categories.

In [None]:
x = [1, 2, 3, 4, 5]
y = [1, 2, 3, 4, 5]

fig, ax = plt.subplots()
ax.bar(x, y)
plt.show()

### Exercise: xtick labels

Currently the plot just shows the coordinates 1, 2, 3, 4, 5 as the labels for each of the bars. Usually that is not what you want to display, but rather a more meaningful name.  

You can set custom ticks using `ax.set_xticks` and labels using `ax.set_xticklabels`.  

**Note that you always need to set the ticks before you set the tick labels.**

Using the x-coordinates as xticks and the labels "a", "b", "c", "d", "e", set custom tick labels on the x-axis of the bar plot above.

In [None]:
x = [1, 2, 3, 4, 5]
y = [1, 2, 3, 4, 5]
xticklabels = ["a", "b", "c", "d", "e"]

fig, ax = plt.subplots()
ax.bar(x, y)
ax.set_xticks(x)
ax.set_xticklabels(xticklabels)
plt.show()
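
As an alternative to setting ticks and labels by hand, Matplotlib also accepts a list of strings as the x-values of a bar plot and treats them as categories. This is a sketch of that approach, not a replacement for the exercise above.

```python
import matplotlib.pyplot as plt

labels = ["a", "b", "c", "d", "e"]
y = [1, 2, 3, 4, 5]

fig, ax = plt.subplots()
# Strings are treated as categories and become the tick labels
ax.bar(labels, y)
plt.show()
```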

## Histogram

Histograms are useful to quickly display the distribution of large data sets.  
In those plots the data is separated into "bins" (e.g. $0 \le x < 1$, then $1 \le x < 2$, etc.). The number of data points in each bin is counted and then displayed as a bar plot.

In [None]:
# Count of values

fig, ax = plt.subplots()
ax.hist(np.random.normal(0, 1, 256), bins=None)
plt.show()
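
To make the binning explicit: `ax.hist` returns the counts per bin and the bin edges, so there is always one more edge than there are bins. The sketch below assumes 20 bins and a seeded generator, both chosen only for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=0)  # seeded only to make the example reproducible
data = rng.normal(0, 1, 256)

fig, ax = plt.subplots()
# hist returns the counts per bin, the bin edges and the drawn bars
counts, edges, bars = ax.hist(data, bins=20)

print("number of bins: ", len(counts))
print("number of edges:", len(edges))
print("total count:    ", counts.sum())
plt.show()
```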


In [None]:
# Normalised

fig, ax = plt.subplots()
ax.hist(np.random.normal(0, 1, 256), bins=None, density=True)
plt.show()
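
With `density=True` the bar heights are scaled so that the total *area* of the bars is 1; the heights themselves do not need to sum to 1. A small check of that property, again with an assumed bin count of 20:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=0)  # seeded only to make the example reproducible
data = rng.normal(0, 1, 256)

fig, ax = plt.subplots()
density, edges, bars = ax.hist(data, bins=20, density=True)

# Area of each bar = height * width; the areas add up to 1
bin_widths = np.diff(edges)
area = np.sum(density * bin_widths)
print("total area:", area)
plt.show()
```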


In [None]:
# Cumulative

fig, ax = plt.subplots()
ax.hist(np.random.normal(0, 1, 256), bins=None, cumulative=True)
plt.show()


In [None]:
# 2d

x = np.random.normal(0, 1, 256)
y = np.random.normal(0, 1, 256)

fig, ax = plt.subplots()
ax.hist2d(x, y)
plt.show()


## Boxplot

In [None]:
data = np.random.normal(0, 1, (50, 10))

fig, ax = plt.subplots()
ax.boxplot(data)
plt.show()
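
For a two-dimensional array, `ax.boxplot` draws one box per column. Each box spans the first to the third quartile, the line inside marks the median, and by default the whiskers extend to the farthest data point within 1.5 times the interquartile range. The sketch below compares the drawn median of the first box with the one computed by NumPy.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=0)  # seeded only to make the example reproducible
data = rng.normal(0, 1, (50, 10))

fig, ax = plt.subplots()
# boxplot returns a dictionary of the drawn artists ("boxes", "medians", ...)
artists = ax.boxplot(data)

drawn_median = artists["medians"][0].get_ydata()[0]
print("median drawn in first box:", drawn_median)
print("median from numpy:        ", np.median(data[:, 0]))
plt.show()
```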

## Adding a legend

You can create and position a legend how you want.  
There is a small catch here though: the position of the legend is defined using a handle (the red square in the figure below), and the position of that handle on the legend changes with the `loc` keyword.

![](assets/matplotlib-legend.png)

The possible positions are: `best`, `upper right`, `upper left`, `lower left`, `lower right`, `right`, `center left`, `center right`, `lower center`, `upper center`, `center`.


In [None]:
# Default legend position (matplotlib chooses the best position)

n = 256
x = np.linspace(-np.pi, np.pi, n)
sin = np.sin(x)
cos = np.cos(x)

fig, ax = plt.subplots()
ax.plot(x, sin, label="sin(x)")
ax.plot(x, cos, label="cos(x)")
ax.legend()
plt.show()

In [None]:
# Specify position using the loc keyword

fig, ax = plt.subplots()
ax.plot([0, 1], [0, 1], color=(1, 0, 0), label="Red")
ax.plot([0, 1], [1, 2], color=(0, 1, 0), label="Green")
ax.plot([0, 1], [2, 3], color=(0, 0, 1), label="Blue")
ax.legend(loc="lower right")
plt.show()

In [None]:
# my personal favorite, place it outside the axes
# note that the "upper left" means that the point (1.01, 1) refers to the upper left corner of the legend
# which is then at x-position 1.01 (so slightly outside the plot) and y-position 1, so at the top

fig, ax = plt.subplots()
ax.plot([0, 1], [0, 1], color=(1, 0, 0), label="Red")
ax.plot([0, 1], [1, 2], color=(0, 1, 0), label="Green")
ax.plot([0, 1], [2, 3], color=(0, 0, 1), label="Blue")
ax.legend(
    loc="upper left",
    bbox_to_anchor=(1.01, 1),  # Anchors the legend's upper left corner at the axes position (1.01, 1)
    borderaxespad=0,  # Removes the padding between the ax and the legend
)
plt.show()
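
A legend placed outside the axes can get cut off when the figure is saved. One common remedy is to pass `bbox_inches="tight"` to `fig.savefig`, which grows the saved area to include it. The sketch below writes to an in-memory buffer so that no file is created; in practice you would pass a file name instead.

```python
import io

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1], [0, 1], color=(1, 0, 0), label="Red")
ax.plot([0, 1], [1, 2], color=(0, 1, 0), label="Green")
ax.legend(loc="upper left", bbox_to_anchor=(1.01, 1), borderaxespad=0)

# bbox_inches="tight" expands the saved region to include the outside legend
buffer = io.BytesIO()
fig.savefig(buffer, format="png", bbox_inches="tight")
print("saved", len(buffer.getvalue()), "bytes")
plt.show()
```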


## Scaling the axes

In [None]:
# You can change the scaling of the x and y axes with set_xscale and set_yscale
# Available scales include linear, log, symlog, logit, ...

x = np.geomspace(1e0, 1e6)
y = x

fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_xscale("log")
ax.set_yscale("log")
plt.show()
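
A logarithmic axis cannot show zero or negative values. For data that spans several orders of magnitude on both sides of zero, the `symlog` scale may be a better fit: it is linear close to zero and logarithmic further out. The sketch below assumes Matplotlib 3.3 or newer, where the width of the linear region is set with `linthresh`; the value `1.0` is chosen only for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-100, 100, 401)
y = x ** 3  # spans negative and positive values over many orders of magnitude

fig, ax = plt.subplots()
ax.plot(x, y)
# Linear between -linthresh and +linthresh, logarithmic outside of that range
ax.set_yscale("symlog", linthresh=1.0)
plt.show()
```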

## Grid

Using `ax.grid` you can set grid-lines at the ticks.  
You can choose whether to use minor, major or **both** ticks (the `which` argument) and whether you want to add the grid to the x, y or **both** axes (the `axis` argument).  
In most cases the default setting looks good.  

In [None]:
n = 256
x = np.linspace(-np.pi, np.pi, n)
sin = np.sin(x)
cos = np.cos(x)

fig, ax = plt.subplots()
ax.plot(x, sin, label="sin(x)")
ax.plot(x, cos, label="cos(x)")
ax.legend()
ax.grid()
plt.show()
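
If the default is not enough, the tick type and the axis can be selected explicitly. The sketch below first switches on minor ticks with `ax.minorticks_on()` (they are off by default on linear axes) and then draws a fainter grid for them on the x-axis only. The line style and transparency are arbitrary choices.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-np.pi, np.pi, 256)

fig, ax = plt.subplots()
ax.plot(x, np.sin(x))
ax.minorticks_on()  # minor ticks are needed before a minor grid can be drawn
ax.grid(which="major", axis="both")
ax.grid(which="minor", axis="x", linestyle=":", alpha=0.5)
plt.show()
```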

## Multiple subplots

You can also create a figure with multiple subplots (axes) by changing the `nrows` and `ncols` arguments of `plt.subplots`.  
Note that the function now returns a tuple of the figure object and an **array** of axes objects.  

You can index into that array directly, but that is rather cumbersome.

In [None]:
n = 256
x = np.linspace(-np.pi, np.pi, n)

fig, axs = plt.subplots(2, 2)

axs[0, 0].plot(x, np.sin(x ** 0))
axs[0, 1].plot(x, np.sin(x ** 1))
axs[1, 0].plot(x, np.sin(x ** 2))
axs[1, 1].plot(x, np.sin(x ** 3))

fig.tight_layout()
plt.show()
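
The shape of the returned array depends on the layout, which is worth knowing before indexing into it. With the default settings, a single subplot is returned as a plain axes object, a single row or column as a 1-dimensional array, and a grid as a 2-dimensional array. A short sketch to inspect this:

```python
import numpy as np
import matplotlib.pyplot as plt

fig_single, ax_single = plt.subplots()      # one Axes object, no array
fig_row, axs_row = plt.subplots(1, 3)       # 1-dimensional array
fig_grid, axs_grid = plt.subplots(2, 2)     # 2-dimensional array

print(isinstance(ax_single, np.ndarray))
print(axs_row.shape)
print(axs_grid.shape)
plt.close("all")  # the empty figures are not needed any further
```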


Since multiple subplots are often 2-dimensional arrays, one particularly handy trick is to *flatten* them before iterating.  
This turns them into a 1-dimensional array, which in this case is often easier to handle.  

In [None]:
x = np.linspace(-np.pi, np.pi, 256)

fig, axs = plt.subplots(2, 2)
for i, ax in enumerate(axs.flatten()):
    ax.plot(x, np.sin(x ** i))

fig.tight_layout()
plt.show()


Another trick is to unpack them directly, but in this case you will need to update the code whenever you change the number of subplots.

In [None]:
x = np.linspace(-np.pi, np.pi, 256)

fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2)

ax1.plot(x, np.sin(x ** 0))
ax2.plot(x, np.sin(x ** 1))
ax3.plot(x, np.sin(x ** 2))
ax4.plot(x, np.sin(x ** 3))

fig.tight_layout()  # otherwise it will look ugly
plt.show()
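
When all subplots show comparable data, it can help to let them share their axis limits using the `sharex` and `sharey` arguments of `plt.subplots`. This makes the panels directly comparable and removes redundant tick labels. A minimal sketch with the same data as above; the titles are added only for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-np.pi, np.pi, 256)

# All four subplots share the same x and y limits
fig, axs = plt.subplots(2, 2, sharex=True, sharey=True)
for i, ax in enumerate(axs.flat):
    ax.plot(x, np.sin(x ** i))
    ax.set_title(f"sin(x^{i})")

fig.tight_layout()
plt.show()
```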


### Exercise: 2D-distributions

Create two arrays (x and y) of 256 normally distributed random points.  
Calculate their distance to the center.  

Create a figure with 2x2 subplots, in which you plot

- a histogram of x
- a histogram of y
- a scatter plot of x and y, colored by their distance to the center
- a 2d-histogram of x and y

Which representation of the data helped you understand its properties the best?  
Can you think of other representations? Try plotting them as well.

In [None]:
n = 256
x = np.random.normal(0, 1, n)
y = np.random.normal(0, 1, n)

# Distance to center
c = np.sqrt(x ** 2 + y ** 2)

fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(14, 8))
ax1.hist(x)
ax2.hist(y)
ax3.scatter(x, y, c=c)
ax4.hist2d(x, y)

fig.tight_layout()
plt.show()
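
One further representation you could try is a hexagonal binning plot (`ax.hexbin`), which works like a 2d-histogram but uses hexagons instead of rectangles. The sketch below is just one possible answer to the question above; the `gridsize` value and the seeded generator are arbitrary choices.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=0)  # seeded only to make the example reproducible
x = rng.normal(0, 1, 256)
y = rng.normal(0, 1, 256)

fig, ax = plt.subplots()
# gridsize sets the number of hexagons in the x-direction
hexagons = ax.hexbin(x, y, gridsize=15)
fig.colorbar(hexagons, ax=ax, label="Count")
plt.show()
```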

## Further learning

Documentation

- [numpy documentation](https://numpy.org/doc/stable/user/index.html#user)
- [pandas documentation](https://pandas.pydata.org/docs/user_guide/index.html)
- [matplotlib documentation](https://matplotlib.org/stable/index.html)

Further packages

- [seaborn](https://seaborn.pydata.org/): package built on top of matplotlib for statistical plots
- [SciPy](https://docs.scipy.org/doc/scipy/tutorial/index.html#user-guide): advanced scientific computing library
- [statsmodels](https://www.statsmodels.org/stable/index.html): statistical models
- [scikit-learn](https://scikit-learn.org/stable/): machine learning library
- [PyTorch](https://pytorch.org/docs/stable/index.html): deep learning library
- [tensorflow](https://www.tensorflow.org/tutorials): deep learning library
- [Keras](https://keras.io/): deep learning library 
- [aesara](https://github.com/aesara-devs/aesara) (used to be Theano): symbolic maths on multi-dimensional arrays
- [JAX](https://github.com/google/jax): Composable transformations of Python & numpy on GPUs 

Books

- [Jake VanderPlas - Python Data Science Handbook](https://jakevdp.github.io/PythonDataScienceHandbook/)
- [Wes McKinney - Python for Data Analysis](https://wesmckinney.com/book/)