# Data Visualization in Python: `matplotlib`

If you want to create reproducible, concise, and effective charts in Python, there is almost no way around one library: `matplotlib`.

`matplotlib` is a Python library for creating static, animated, and interactive visualizations. Its core design philosophy is that you should be able to create simple plots with just one line of code, and complex visualizations with just a few more. `matplotlib` is especially popular for scientific visualizations in research publications, and it can save your visualizations in various formats, such as PNG, SVG, or even PDF. It is so versatile and popular that other data visualization libraries, such as `seaborn` and the plotting functions of `pandas`, use it under the hood!

<div class="alert alert-block alert-info">

**Fun Fact:**

`matplotlib` was originally developed by John D. Hunter, who was using the commercial software MATLAB for his work in neuroscience and grew frustrated with the limitations of its plotting capabilities. He decided to write his own data visualization library for the free and open-source programming language Python, and cheekily put a reference to MATLAB in the name. In fact, `matplotlib`'s interface (how you use its various plotting functions) was heavily inspired by MATLAB. The first version of `matplotlib` was released in 2003.

</div>

In this notebook, we will go over the basic structure of `matplotlib` and define some of the terms that you will see wherever `matplotlib`'s many plotting functions are discussed. All of the information compiled here is based on [`matplotlib`'s official documentation](https://matplotlib.org/stable/), which is a great place to continue your journey afterwards!


## Getting Started

Just like with any other Python library, you first need to install `matplotlib`, for example using `pip`:

```
pip install matplotlib
```


To use any of `matplotlib`'s functions, we first need to import it. The import statement, however, looks a little unusual at first:


In [None]:
import matplotlib.pyplot as plt

You might notice that we are not importing the top-level package `matplotlib`, but a subpackage called `pyplot`. This is because `matplotlib` offers two ways to use its functionality (called interfaces):

- an _explicit_ interface that creates a visualization element by element using an object-oriented paradigm,
- an _implicit_ `pyplot` interface that creates most visualizations in a single line of code (and is very similar to MATLAB's way of doing things).

You are not restricted to using exclusively one or the other, and you will often combine both to create more complex visualizations. Most `matplotlib` journeys start out with the `pyplot` interface, and even code written in the explicit style usually creates its figures through `pyplot` functions such as `plt.subplots()`, so `pyplot` is conventionally imported as above.
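To make the difference concrete, here is the same line plot written once with each interface. This is a minimal sketch: the titles are only there to tell the two figures apart, and `plt.title()` and `ax.set_title()` are the pyplot and object-oriented versions of the same operation.

```python
import matplotlib.pyplot as plt

x = [1, 2, 6, 7]
y = [2, 0, 3, 4]

# Implicit interface: pyplot keeps track of the "current" figure and axes,
# and every plt.* call acts on them behind the scenes.
plt.figure()
plt.plot(x, y)
plt.title("Implicit (pyplot) interface")

# Explicit interface: we hold references to the Figure and Axes objects
# and call methods on them directly.
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_title("Explicit (object-oriented) interface")
```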

Using `pyplot` (under its import alias `plt`), we can now create a very basic line plot like this:


In [None]:
x = [1, 2, 6, 7]
y = [2, 0, 3, 4]

plt.plot(x, y)
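If you pass only one list to `plt.plot()`, `matplotlib` treats it as the y values and uses the indices `0, 1, 2, ...` as the x values. A small sketch to illustrate this, using the same `y` values as above and inspecting the line artist that `plot()` returns:

```python
import matplotlib.pyplot as plt

y = [2, 0, 3, 4]

# With a single argument, plot() generates the x values 0, 1, 2, ... itself.
lines = plt.plot(y)

# plot() returns a list of Line2D artists; look at the data of the first one.
line = lines[0]
print([float(v) for v in line.get_xdata()])  # x values generated by matplotlib
print([float(v) for v in line.get_ydata()])  # the y values we passed in
```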

<div class="alert alert-block alert-info">

`matplotlib` supports a variety of ways to display the created figures, called _backends_. When you use `matplotlib` in a Jupyter notebook, the default backend displays a figure directly below the cell as soon as the cell containing the plotting code is run. In a regular Python script (a `*.py` file), the figure is not displayed until `plt.show()` is called.

You can find more information on backends in [the official docs](https://matplotlib.org/stable/users/explain/figure/index.html).

</div>
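For comparison, a standalone script might look like the following minimal sketch. It also saves the figure with `savefig()`, which picks the output format (PNG, PDF, SVG, ...) from the file extension. The script and output file names are placeholders.

```python
# plot_script.py (hypothetical file name)
import matplotlib.pyplot as plt

x = [1, 2, 6, 7]
y = [2, 0, 3, 4]

fig, ax = plt.subplots()
ax.plot(x, y)

# The output format is inferred from the file extension.
fig.savefig("line-plot.png")
fig.savefig("line-plot.pdf")

# In a script, nothing appears on screen until show() is called.
# Saving first is a common convention: with an interactive backend,
# show() blocks until the window is closed.
plt.show()
```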


### Exercise

Try out some different numbers for `x` and `y`!

1. What happens when the numbers are not strictly increasing?
2. What happens when `x` and `y` do not contain the same number of values?
3. Create a scatter plot of the same data using the function `plt.scatter()`!


In [None]:
x = [1, 2, 6, 7]
y = [2, 0, 3, 4]

plt.scatter(x, y)

## Anatomy of a figure

A good visualization can consist of many different elements: Lines, markers, labels, annotations, a legend, and many more.

In `matplotlib`, these elements are called _Artists_. Here is an overview of the available Artists:

<figure>
<img src="../img/anatomy-of-a-figure.webp" alt="an annotated figure showing the available artists" style="width: 50%;"/>
</figure>
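Every Artist in this diagram corresponds to a Python object that you can inspect and modify. As a small illustration (a sketch, not an exhaustive tour), the snippet below creates a figure and then walks down the hierarchy from the `Figure` to its `Axes`, the `Line2D` artist inside it, and the two `Axis` artists:

```python
import matplotlib.pyplot as plt

x = [1, 2, 6, 7]
y = [2, 0, 3, 4]

fig, ax = plt.subplots()
lines = ax.plot(x, y)  # plot() returns a list of the Line2D artists it created

# The Figure knows about all of its Axes...
print(fig.axes)
# ...each Axes knows about the lines drawn into it...
print(ax.get_lines())
# ...and each Axes contains one Axis artist for x and one for y.
print(type(ax.xaxis).__name__, type(ax.yaxis).__name__)
```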

Let's create a more complex version of our basic example above to showcase how to work with these Artists!


### Creating a figure and axes

We could create a figure and then add an empty axes to it in a separate step:


In [None]:
# Create an empty figure
fig = plt.figure()
# Add a single axes that fills most of the figure
ax = fig.add_subplot()

Since you will rarely want a figure without any axes, it is usually more convenient to create both in one go using `plt.subplots()`:


In [None]:
fig, ax = plt.subplots()

The same function will even let you create a figure with multiple axes at once:


In [None]:
fig, ax = plt.subplots(1, 2)

<div class="alert alert-block alert-info">

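When you create multiple axes this way, the function returns the `Axes` artists in a NumPy array. If the grid has more than one row and more than one column, the array has the same shape as the grid; for a single row or column, the array is one-dimensional by default (see the `squeeze` parameter).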

</div>
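A quick way to see this is to check the `shape` attribute of the returned array. The following sketch compares a grid with 3 rows and 2 columns, a single row, and a single row with `squeeze=False`:

```python
import matplotlib.pyplot as plt

# 3 rows and 2 columns: the Axes come back in a 2D NumPy array.
fig, axs_grid = plt.subplots(3, 2)
print(axs_grid.shape)  # (3, 2)

# A single row: by default, the array is "squeezed" to one dimension.
fig, axs_row = plt.subplots(1, 2)
print(axs_row.shape)  # (2,)

# squeeze=False always returns a 2D array, even for a single row or column.
fig, axs_2d = plt.subplots(1, 2, squeeze=False)
print(axs_2d.shape)  # (1, 2)
```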


#### Exercise

1. Create a figure with six axes arranged in 2 rows and 3 columns.
2. Only label the x axis once per column, and the y axis once per row. **Hint:** Check the [`subplots()` function's documentation](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.subplots.html) for helpful parameters!


In [None]:
fig, ax = plt.subplots(2, 3, sharex=True, sharey=True)

### Adding a plot to an axes

We can plot into a particular axes by calling the plotting function as a method of the corresponding `Axes` artist:


In [None]:
x = [1, 2, 6, 7]
y = [2, 0, 3, 4]


fig, ax = plt.subplots()
# Note that we call `plot()` directly on the axis artist!
ax.plot(x, y)

As mentioned above, `plt.subplots()` returns an array of `Axes` artists when we define multiple axes in a figure.

In this case, we therefore need to use an index to choose the axes we want to plot into.

**Note:** The first element has the index 0, as is common in Python.


In [None]:
fig, axs = plt.subplots(2, 1)

axs[0].plot(x, y)
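With a two-dimensional grid, you pass one index per dimension, e.g. `axs[1, 0]` for the second row and first column. When every axes should get the same treatment, looping over `axs.flat` (a flat iterator that every NumPy array provides) saves you from indexing each one by hand. A minimal sketch, reusing `x` and `y` from above; the titles are arbitrary:

```python
import matplotlib.pyplot as plt

x = [1, 2, 6, 7]
y = [2, 0, 3, 4]

fig, axs = plt.subplots(2, 2)

# Row index first, then column index.
axs[0, 0].plot(x, y)
axs[1, 1].plot(x, y)

# Apply the same setting to every Axes, in row-by-row order.
for i, ax in enumerate(axs.flat):
    ax.set_title(f"Axes {i}")
```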

#### Exercise

1. Add a bar plot of the same data to the second axes (index `1`)!
2. Add a scatter plot to the first axes _in addition to the line plot_!


In [None]:
fig, axs = plt.subplots(2, 1)

axs[0].plot(x, y)
axs[1].bar(x, y)

axs[0].scatter(x, y)

### Customizing a plot's features

To communicate information efficiently in a data visualization, we often want to use multiple [visual variables](https://en.wikipedia.org/wiki/Visual_variable). This includes the style of a line (solid, dashed, dotted, ...), the shape and size of a marker in a scatter plot, the color of an element, and many other things.

In `matplotlib`, you usually set the properties of a feature (a.k.a. an Artist) when you call the function that creates it.

For example, instead of calling `scatter()` separately, we can add markers to the line plot by specifying the `marker` parameter in the call to `plot()`:


In [None]:
fig, ax = plt.subplots()

ax.plot(x, y, marker="o")

Here, we chose a filled circle as the marker using the argument `"o"`. There are many [other marker shapes available](https://matplotlib.org/stable/api/markers_api.html).

We could also change the color of the line in a similar way:


In [None]:
fig, ax = plt.subplots()

ax.plot(x, y, marker="o", color="darkgreen")

You can specify the color using any name from [this list](https://matplotlib.org/stable/gallery/color/named_colors.html#css-colors), or by using a [hex code](https://en.wikipedia.org/wiki/Web_colors#Hex_triplet) (e.g. `"#008080"` or `"#00693E"`).
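All of these are just different ways of writing down a color. The sketch below passes the same teal color in three notations (a hex code, an RGB tuple of floats between 0 and 1, and its CSS name) and uses `matplotlib.colors.to_hex()` to confirm that they resolve to the same value:

```python
import matplotlib.pyplot as plt
from matplotlib import colors

x = [1, 2, 6, 7]
y = [2, 0, 3, 4]

# Three notations for the same color ("teal" is #008080 in the CSS list).
teal_variants = ["#008080", (0, 128 / 255, 128 / 255), "teal"]

fig, ax = plt.subplots()
for offset, color in enumerate(teal_variants):
    # Shift each line up a little so that all three stay visible.
    ax.plot(x, [v + offset for v in y], color=color)

# to_hex() converts any supported color specification to a hex string.
print([colors.to_hex(c) for c in teal_variants])
```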

We can also change the style of the line using the `linestyle` parameter:


In [None]:
fig, ax = plt.subplots()

ax.plot(x, y, marker="o", color="green", linestyle="dashed")

The full list of linestyles can be found [here](https://matplotlib.org/stable/gallery/lines_bars_and_markers/linestyles.html#linestyles).

Since these three features are quite commonly modified, `matplotlib` offers a shorthand notation to change them all with a single parameter called `fmt` (format string). It has the following structure:

```
fmt = "[marker][line][color]"
```

Our example above translates to:


In [None]:
fig, ax = plt.subplots()

ax.plot(x, y, "o--g")  # Note: The format string must be passed positionally, not as fmt=...!

The supported values for each of the three features can be found [here (scroll down to _Notes_)](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.plot.html).
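Any of the three parts can be left out. One detail worth knowing: if the format string specifies a marker but no line style, only the markers are drawn. The sketch below checks this by inspecting the `Line2D` objects that `plot()` returns (the vertical offsets only keep the three plots apart):

```python
import matplotlib.pyplot as plt

x = [1, 2, 6, 7]
y = [2, 0, 3, 4]

fig, ax = plt.subplots()

# Marker, line style, and color: green circles connected by a dashed line.
(full,) = ax.plot(x, y, "o--g")
# Line style and color only: a solid red line without markers.
(line_only,) = ax.plot(x, [v + 1 for v in y], "-r")
# Marker and color only: black triangles with no connecting line.
(markers_only,) = ax.plot(x, [v + 2 for v in y], "^k")

for artist in (full, line_only, markers_only):
    print(artist.get_marker(), artist.get_linestyle())
```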

<div class="alert alert-block alert-info">

Note that this shorthand does not support all customization options, especially with regard to color. If you need more fine-grained control, you need to use the named keyword parameters introduced above.

</div>


#### Exercise

1. Try out some different options for the markers, color, and linestyle using both the named parameters and the format string.
2. Assign a different color to the markers' faces than to the line. **Hint:** This is only possible using named arguments!


In [None]:
# Exercise 1

fig, ax = plt.subplots()

ax.plot(x, y, "x-.k")

In [None]:
# Exercise 2

fig, ax = plt.subplots()

ax.plot(x, y, marker="o", linestyle="--", color="blue", markerfacecolor="black")

### Adding a legend

When you have more than one plot in the same axes, a legend can be very helpful to explain to the viewer what each plot represents.

You can control how each plot is labeled in the legend by specifying the `label` parameter when you call the plotting function.

To actually show the legend, you can call the `legend()` function on the axes where you want it to appear:


In [None]:
fig, ax = plt.subplots()

ax.plot(x, y, "--", label="interpolation")
ax.scatter(x, y, label="data points")

ax.legend()

You can control the location of the legend by adding the parameter `loc`:


In [None]:
fig, ax = plt.subplots()

ax.plot(x, y, "--", label="interpolation")
ax.scatter(x, y, label="data points")

ax.legend(loc="right")

You can see the full list of pre-defined locations, and all other available options for the legend [here](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.legend.html).
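Instead of relying on the `label` parameter, you can also hand the artists and their labels to `legend()` explicitly through its `handles` and `labels` parameters. This is handy when only some of the plots should appear in the legend. A short sketch (the label text is arbitrary):

```python
import matplotlib.pyplot as plt

x = [1, 2, 6, 7]
y = [2, 0, 3, 4]

fig, ax = plt.subplots()

# plot() returns a list of lines; unpack the single Line2D it creates.
(line,) = ax.plot(x, y, "--")
# scatter() returns a single PathCollection artist.
points = ax.scatter(x, y)

# Only the scatter plot gets a legend entry; the dashed line is left out.
legend = ax.legend(handles=[points], labels=["data points"], loc="upper left")

print([text.get_text() for text in legend.get_texts()])
```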


#### Exercise

1. Move the legend to the lower right corner.
2. Make the entries in the legend appear next to each other, i.e., in two columns.
3. Reverse the order of the entries in the legend.


In [None]:
fig, ax = plt.subplots()

ax.plot(x, y, "--", label="interpolation")
ax.scatter(x, y, label="data points")

ax.legend(loc="lower right", ncols=2, reverse=True)

### Customizing labels

Labels add important context information to the raw data in your visualizations. Most labels are defined on the `Axes` artist and can be read or manipulated using so-called _getter_ and _setter_ functions.

For example, to set the title or axis labels, we can use the following functions:


In [None]:
fig, ax = plt.subplots()

ax.plot(x, y)

ax.set_title("My first data visualization in matplotlib")

ax.set_xlabel("The x axis")
ax.set_ylabel("The y axis")

You can retrieve the current value of a label using the corresponding getter function:


In [None]:
ax.get_xlabel()
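If you want to set several properties in one go, the `Axes.set()` method accepts them as keyword arguments, where `title=...` corresponds to `set_title(...)`, `xlabel=...` to `set_xlabel(...)`, and so on. A minimal sketch:

```python
import matplotlib.pyplot as plt

x = [1, 2, 6, 7]
y = [2, 0, 3, 4]

fig, ax = plt.subplots()
ax.plot(x, y)

# Equivalent to calling set_title(), set_xlabel(), and set_ylabel() separately.
ax.set(
    title="My first data visualization in matplotlib",
    xlabel="The x axis",
    ylabel="The y axis",
)

# The getters return the values we just set.
print(ax.get_title(), ax.get_xlabel(), ax.get_ylabel(), sep=" | ")
```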

You can also place text anywhere inside the axes with `text()`, where the first two arguments are the x and y position in data coordinates:


In [None]:
fig, ax = plt.subplots()

ax.plot(x, y)

ax.set_title("My first data visualization in matplotlib")

ax.set_xlabel("The x axis")
ax.set_ylabel("The y axis")
ax.text(5, 1.5, "Look, an annotation!")
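Because `text()` uses data coordinates by default, the text moves along with the data if the axis limits change. Passing `transform=ax.transAxes` switches to axes coordinates, where (0, 0) is the lower-left and (1, 1) the upper-right corner of the axes. For text that refers to a specific data point, `annotate()` can additionally draw an arrow. A sketch; the annotated point and the text positions are arbitrary choices:

```python
import matplotlib.pyplot as plt

x = [1, 2, 6, 7]
y = [2, 0, 3, 4]

fig, ax = plt.subplots()
ax.plot(x, y)

# Data coordinates (default): placed at x=5, y=1.5 on the plot's scales.
ax.text(5, 1.5, "Look, an annotation!")

# Axes coordinates: always near the upper-left corner, whatever the limits.
ax.text(0.05, 0.95, "Top left", transform=ax.transAxes, verticalalignment="top")

# annotate(): text placed at xytext, with an arrow pointing to the data point xy.
arrow = ax.annotate(
    "lowest point",
    xy=(2, 0),
    xytext=(3, 0.5),
    arrowprops=dict(arrowstyle="->"),
)
```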

Similar to the line and marker features above, you can change the properties of the labels when you set them:


In [None]:
fig, ax = plt.subplots()

ax.plot(x, y)

ax.set_title("My first data visualization in matplotlib", weight="bold")

ax.set_xlabel("The x axis", color="green")
ax.set_ylabel("The y axis")
ax.text(5, 1.5, "Look, an annotation!", size=6)

You can learn more about the [available properties here](https://matplotlib.org/stable/users/explain/text/text_props.html#text-props).


#### Exercise

1. Add some annotations to the plot!
2. Change the color of the title to dark green!


In [None]:
fig, ax = plt.subplots()

ax.plot(x, y)

ax.set_title(
    "My first data visualization in matplotlib", color="darkgreen", weight="bold"
)

ax.set_xlabel("The x axis", color="green")
ax.set_ylabel("The y axis")
ax.text(5, 1.5, "Look, an annotation!", size=6)
ax.text(1, 2.5, "Look, another one!", size=6)

### Limits, ticks, and grid lines

`matplotlib` does its best to automatically find suitable limits for the axes and an appropriate number of ticks along each axis. However, you can customize all of these features with a syntax similar to the one for labels:


In [None]:
fig, ax = plt.subplots()

ax.plot(x, y)

ax.set_ylim(-1, 5)

ax.set_xticks([1, 4, 7])

ax.grid()
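`set_xticks()` and `set_yticks()` can also take a `labels` parameter that replaces the numbers with custom text (available in recent `matplotlib` versions; older versions used `set_xticklabels()` for this). The matching getters report the current values. A short sketch with made-up tick labels:

```python
import matplotlib.pyplot as plt

x = [1, 2, 6, 7]
y = [2, 0, 3, 4]

fig, ax = plt.subplots()
ax.plot(x, y)

ax.set_ylim(-1, 5)
# Tick positions plus custom text for each tick (the labels are made up).
ax.set_xticks([1, 4, 7], labels=["start", "middle", "end"])

# Grid lines accept the usual line properties, e.g. a light, dotted style.
ax.grid(linestyle=":", alpha=0.5)

# The getters report what is currently set.
print(ax.get_ylim(), [float(t) for t in ax.get_xticks()])
```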

#### Exercise

1. Change the tick marks on the y-axis so that only 0, 2, and 4 are labeled.
2. Change the limits of the x-axis to the range from 0 to 8.
3. Only show horizontal grid lines. **Hint:** Look for a suitable parameter [in the documentation](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.grid.html).


In [None]:
fig, ax = plt.subplots()

ax.plot(x, y)

ax.set_xlim(0, 8)
ax.set_ylim(-1, 5)

ax.set_xticks([1, 4, 7])
ax.set_yticks([0, 2, 4])


ax.grid(axis="y")

## Next steps

In this notebook, you have learned about the basics of `matplotlib`'s interface and how to construct a visualization by creating and modifying various artists.

In the [next notebook](02-plot-overview.ipynb), we will showcase a variety of plots available in `matplotlib` for different kinds of data.


<table >
<tbody>
  <tr>
    <td style="padding:0px;border-width:0px;vertical-align:center">    
    Created by Simon Stone for Dartmouth College Library under <a href="https://creativecommons.org/licenses/by/4.0/">Creative Commons CC BY-NC 4.0 License</a>.<br>For questions, comments, or improvements, email <a href="mailto:researchdatahelp@groups.dartmouth.edu">Research Data Services</a>.
    </td>
    <td style="padding:0 0 0 1em;border-width:0px;vertical-align:center"><img alt="Creative Commons License" src="https://i.creativecommons.org/l/by/4.0/88x31.png"/></td>
  </tr>
</tbody>
</table>
