# Week 3: matplotlib - Data visualisation in Python

This lecture introduces a common visualisation tool in Python: Matplotlib.

We will also discuss how to present your modelling results and how to use plots to understand your data.

## 1. Introduction

Matplotlib is an excellent 2D and 3D graphics library for generating scientific figures. Some of the many advantages of this library include:

* Easy to get started
* Support for $\LaTeX$ formatted labels and texts
* Great control of every element in a figure, including figure size and DPI. 
* High-quality output in many formats, including PNG, PDF, SVG, EPS, and PGF.
* GUI for interactively exploring figures *and* support for headless generation of figure files (useful for batch jobs).

One key feature of matplotlib that I would like to emphasize, and that I think makes it highly suitable for generating figures for scientific publications, is that all aspects of a figure can be controlled *programmatically*. This is important for reproducibility, and convenient when one needs to regenerate a figure with updated data or change its appearance.

More information at the Matplotlib web page: http://matplotlib.org/

## 2. matplotlib


The main matplotlib interface is object-oriented. The main idea of object-oriented programming is to have objects on which one can call functions and apply actions, with no object or program state held globally (as it is in the MATLAB-like API). The real advantage of this approach becomes apparent when more than one figure is created, or when a figure contains more than one subplot.

To use the object-oriented API, we do not rely on a global figure instance. Instead, we store a reference to the newly created figure in the `fig` variable, and from it we create a new axes instance `axes` using the `add_axes` method of the `Figure` instance `fig`.

### 2.1 matplotlib fundamentals

In [None]:
# This line configures matplotlib to show figures embedded in the notebook, 
# instead of opening a new window for each figure. More about that later. 
# If you are using an old version of IPython, try using '%pylab inline' instead.
%matplotlib inline

# Then we import the matplotlib libraries
import matplotlib
import matplotlib.pyplot as plt


Here we use `pandas`, the library that we learnt last week, and also `NumPy`, another very popular data analysis package in Python.

In [None]:
import pandas as pd
import numpy as np

Let's load our `NZ_cars.csv` data from the Data folder, using the "OBJECTID" column as the index.
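A minimal sketch of the loading step. The real `NZ_cars.csv` is not reproduced here, so the rows below and the column names `Year` and `Power_rating` are made up for illustration; only `OBJECTID` comes from the text. The path in the comment is an assumption about how the Data folder is laid out.

```python
import io

import pandas as pd

# Hypothetical stand-in for NZ_cars.csv: these rows and the column names
# "Year" and "Power_rating" are invented for illustration.
csv_text = """OBJECTID,Year,Power_rating
1,2005,85
2,2010,0
3,2012,110
4,2015,0
5,2018,140
"""

# index_col tells pandas which column to use as the row index.
# With the real file this would look something like:
#   cars = pd.read_csv("Data/NZ_cars.csv", index_col="OBJECTID")
cars = pd.read_csv(io.StringIO(csv_text), index_col="OBJECTID")
print(cars.head())
```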

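`plot` is the main plotting function in matplotlib. Its format string combines a color and a marker: `'.'` draws points only, which gives a scatter plot, and a letter placed before it sets the color, e.g. `r` for red or `b` for blue (so `'r.'` means red points).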
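As a sketch of the format string in action, the snippet below plots made-up data (a stand-in for two numeric columns of the cars table) as red points.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up data standing in for two numeric columns of the cars table.
rng = np.random.default_rng(0)
year = rng.integers(1995, 2020, size=50)
power = rng.integers(0, 200, size=50)

# "r." = red ("r") point markers ("."). No line style is given,
# so only the markers are drawn and the result looks like a scatter plot.
lines = plt.plot(year, power, "r.")
plt.xlabel("Year")
plt.ylabel("Power rating")
```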

As we can see, there are quite a few rows where the Power rating equals zero, perhaps because we have no information on the Power rating of those cars. Let's use what we learnt in the previous week to filter these rows out and plot again.
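The filtering step can be sketched with boolean indexing. The data frame and the column name `Power_rating` below are hypothetical stand-ins for the real table.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up rows; the column name "Power_rating" is hypothetical.
cars = pd.DataFrame(
    {"Year": [2005, 2010, 2012, 2015, 2018],
     "Power_rating": [85, 0, 110, 0, 140]}
)

# Boolean indexing keeps only the rows where the condition is True.
cars_known = cars[cars["Power_rating"] > 0]

plt.plot(cars_known["Year"], cars_known["Power_rating"], "b.")
```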

Much better now. What can you see in the figure above?

The x-axis looks odd now, so let's refine the axis ranges to see the data better.
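One way to refine the ranges with the pyplot interface is `plt.xlim` and `plt.ylim`. The data and the limits below are illustrative values only, not the ones used with the cars data.

```python
import matplotlib.pyplot as plt

# Made-up points for illustration.
year = [2005, 2012, 2018]
power = [85, 110, 140]

plt.plot(year, power, "b.")
plt.xlim(2000, 2020)   # lower and upper limit of the x-axis
plt.ylim(0, 200)       # lower and upper limit of the y-axis

# Calling xlim() without arguments returns the current limits.
x_range = plt.xlim()
```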

Let's look at another way of coding, where we work directly with the axes instead.

Although a little more code is involved, the advantage is that we now have full control of where the plot axes are placed.
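A minimal sketch of the object-oriented style, using a simple quadratic as example data. The numbers passed to `add_axes` are fractions of the figure width and height.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 5, 50)
y = x ** 2

fig = plt.figure()

# [left, bottom, width, height], each a fraction of the figure (0 to 1).
axes = fig.add_axes([0.1, 0.1, 0.8, 0.8])

axes.plot(x, y, "r")
axes.set_xlabel("x")
axes.set_ylabel("y")
axes.set_title("title")
```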

### 2.2 Multiple axes and subplots

And we can easily add more than one axis to the figure:
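For instance, a second call to `add_axes` can place a smaller inset inside the main axes. The positions below are arbitrary example values.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 5, 50)
y = x ** 2

fig = plt.figure()
main_axes = fig.add_axes([0.1, 0.1, 0.8, 0.8])    # main axes
inset_axes = fig.add_axes([0.2, 0.5, 0.4, 0.3])   # smaller inset axes

main_axes.plot(x, y, "r")
main_axes.set_title("main")

# The inset shows the same data with x and y swapped.
inset_axes.plot(y, x, "g")
inset_axes.set_title("inset")
```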

If we don't care about being explicit about where our plot axes are placed in the figure canvas, then we can use one of the many axis layout managers in matplotlib. My favorite is `subplots`, which can be used like this:
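A short sketch of `subplots` with one row and two columns; the curves are example data.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 5, 50)

# One row, two columns: returns the figure and an array of axes.
fig, axes = plt.subplots(nrows=1, ncols=2)

axes[0].plot(x, x ** 2, "r")
axes[0].set_ylabel("y = x squared")

axes[1].plot(x, x ** 3, "b")
axes[1].set_ylabel("y = x cubed")
```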

That was easy, but it isn't so pretty with overlapping figure axes and labels, right?

We can deal with that by using the `fig.tight_layout` method, which automatically adjusts the positions of the axes on the figure canvas so that there is no overlapping content:
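The same kind of two-panel figure, now followed by a call to `fig.tight_layout()`; the data are again only an example.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 5, 50)

fig, axes = plt.subplots(nrows=1, ncols=2)

axes[0].plot(x, x ** 2, "r")
axes[0].set_xlabel("x")
axes[0].set_ylabel("y")

axes[1].plot(x, x ** 3, "b")
axes[1].set_xlabel("x")
axes[1].set_ylabel("y")

# Recompute the axes positions so that labels and titles do not overlap.
fig.tight_layout()
```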

As we can see, `axes` is a NumPy array of axes objects, which we can loop over to draw several plots in the same figure.

In this example we plot exactly the same thing but
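Looping over the axes can be sketched as follows. The three powers and colors are arbitrary choices for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 5, 50)

fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(12, 3))

# Pair each axes object with a power and a color, then plot in a loop.
for ax, power, color in zip(axes, [1, 2, 3], ["r", "g", "b"]):
    ax.plot(x, x ** power, color)
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.set_title(f"power {power}")

fig.tight_layout()
```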

### 2.3 Figure size, aspect ratio and DPI

Matplotlib allows the aspect ratio, DPI and figure size to be specified when the `Figure` object is created, using the `figsize` and `dpi` keyword arguments. `figsize` is a tuple of the width and height of the figure in inches, and `dpi` is the dots per inch (pixels per inch). To create an 800x400 pixel, 100 dots-per-inch figure, we can do:
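A sketch of the arithmetic: 8 inches by 4 inches at 100 dots per inch gives 800 by 400 pixels.

```python
import matplotlib.pyplot as plt

# 8 in x 4 in at 100 dots per inch = 800 x 400 pixels.
fig = plt.figure(figsize=(8, 4), dpi=100)

# Size in pixels = size in inches * dots per inch.
width_px, height_px = fig.get_size_inches() * fig.dpi
print(width_px, height_px)
```

The same keyword arguments can also be passed to layout managers such as `plt.subplots`.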

### 2.4 Saving figures

To save a figure to a file we can use the `savefig` method in the `Figure` class:

Here we can also optionally specify the DPI and choose between different output formats:

You can find the `test_figure.png` file in the same folder as this notebook.
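A minimal sketch of saving a figure. It writes to a temporary directory so that it has no side effects; in the notebook a plain file name such as `"test_figure.png"` is enough. The PDF file name is an illustrative addition.

```python
import tempfile
from pathlib import Path

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])

# Temporary directory used only to keep this sketch self-contained.
out_dir = Path(tempfile.mkdtemp())
png_path = out_dir / "test_figure.png"
pdf_path = out_dir / "test_figure.pdf"   # hypothetical second output

fig.savefig(png_path)            # format is inferred from the extension
fig.savefig(png_path, dpi=200)   # overwrite with a higher resolution
fig.savefig(pdf_path)            # vector output
```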

#### What formats are available and which ones should be used for best quality?

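Matplotlib can generate high-quality output in a number of formats, including PNG, JPG, EPS, SVG, PGF and PDF. Vector formats such as PDF, SVG and EPS scale without loss of quality, so they are usually the better choice for publications; PNG is a good choice when a raster image is needed.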

### 2.5 Legends

Now that we have covered the basics of how to create a figure canvas and add axes instances to the canvas, let's look at how to decorate a figure with legends.

Let's start with an example of some math functions:

$$ y = x^2 $$
and

$$ y = x^3 $$

We can use the `label="label text"` keyword argument when plots or other objects are added to the figure, and then call the `legend` method without arguments to add the legend to the figure:

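Because the labels are attached to the curves themselves, the legend stays in sync when curves are added to or removed from the figure: each call to `legend` picks up the curves that are currently labelled.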
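A sketch of labelled curves for the two functions above. The label strings are plain text here, written the way Python would express the formulas.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2, 50)

fig, ax = plt.subplots()
ax.plot(x, x ** 2, label="y = x**2")
ax.plot(x, x ** 3, label="y = x**3")

# With no arguments, legend() collects every artist that has a label.
legend = ax.legend()
labels = [text.get_text() for text in legend.get_texts()]
```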

The `legend` function takes an optional keyword argument `loc` that specifies where in the figure the legend is drawn. `loc` accepts either a numerical code or the equivalent string (for example `1` or `'upper right'`). See http://matplotlib.org/users/legend_guide.html#legend-location for details. Some of the most common `loc` values are:

Example for legend location
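The snippet below sketches the most common numerical codes. Each call to `legend` replaces the previous legend, so only the last position remains on the figure.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2, 50)

fig, ax = plt.subplots()
ax.plot(x, x ** 2, label="y = x**2")
ax.plot(x, x ** 3, label="y = x**3")

ax.legend(loc=0)  # "best": let matplotlib choose the location
ax.legend(loc=1)  # upper right corner
ax.legend(loc=2)  # upper left corner
ax.legend(loc=3)  # lower left corner
ax.legend(loc=4)  # lower right corner

# The string form is equivalent, e.g. ax.legend(loc="lower right").
```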

**Exercise**

Replot the four figures that we plotted with the NZ cars data and add legends to them.

### 2.6 Formatting text: LaTeX, fontsize, font family

The figure above is functional, but it does not (yet) satisfy the criteria for a figure used in a publication (look at those equations in the legend, e.g. `y= x**2`, they look odd!)

First and foremost, we need to have LaTeX formatted text, and second, we need to be able to adjust the font size to appear right in a publication.

Matplotlib has great support for LaTeX. All we need to do is use dollar signs to encapsulate LaTeX in any text (legend, title, label, etc.). For example, `"$y=x^3$"`.

But here we can run into a slightly subtle problem with LaTeX code and Python text strings. In LaTeX, we frequently use the backslash in commands, for example `\alpha` to produce the symbol $\alpha$. But the backslash already has a meaning in Python strings (the escape character). To avoid Python messing up our LaTeX code, we need to use "raw" text strings. Raw text strings are prefixed with an '`r`', like `r"\alpha"` or `r'\alpha'` instead of `"\alpha"` or `'\alpha'`:
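A sketch of raw strings in labels and legends. The choice of $\alpha$ as the variable name is only for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2, 50)

fig, ax = plt.subplots()

# Raw strings (r"...") keep the backslash, so LaTeX sees \alpha.
ax.plot(x, x ** 2, label=r"$y = \alpha^2$")
ax.plot(x, x ** 3, label=r"$y = \alpha^3$")
ax.set_xlabel(r"$\alpha$", fontsize=18)
ax.set_ylabel(r"$y$", fontsize=18)
ax.legend(loc=2)

# Without the r prefix, "\a" is read as a single escape character,
# so the two strings below differ in length.
plain_length = len("\alpha")
raw_length = len(r"\alpha")
```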

We can also change the global font size and font family, which applies to all text elements in a figure (tick labels, axis labels and titles, legends, etc.):

A good choice of global fonts is the STIX fonts. We can change the overall font in Matplotlib using `matplotlib.rcParams.update`:
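A sketch of both settings. The font size of 18 is an arbitrary example value, and the `saved` dictionary is just one way to remember the previous settings so that they can be restored in a later cell.

```python
import matplotlib
import matplotlib.pyplot as plt

# Remember the current values so they can be restored later.
keys = ["font.size", "font.family", "mathtext.fontset"]
saved = {key: matplotlib.rcParams[key] for key in keys}

# Global font size and family for all text elements.
matplotlib.rcParams.update({"font.size": 18, "font.family": "serif"})

# STIX fonts for both regular text and math text.
matplotlib.rcParams.update(
    {"font.size": 18, "font.family": "STIXGeneral", "mathtext.fontset": "stix"}
)

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4], label=r"$y = x^2$")
ax.set_xlabel(r"$x$")
ax.set_ylabel(r"$y$")
ax.legend(loc=2)

# To restore the previous settings in a later cell:
# matplotlib.rcParams.update(saved)
```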

Or, alternatively, we can request that matplotlib uses LaTeX to render the text elements in the figure:

### 2.7 Setting colors, linewidths, linetypes

#### Colors

With matplotlib, we can define the colors of lines and other graphical elements in a number of ways. First of all, we can use the MATLAB-like syntax where `'b'` means blue, `'g'` means green, etc. The MATLAB-like syntax for selecting line styles is also supported: for example, `'b.-'` means a blue line with dots:

We can also define colors by their names or RGB hex codes and optionally provide an alpha value using the `color` and `alpha` keyword arguments:
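Both ways of specifying colors can be sketched in one figure. The particular colors and hex codes are arbitrary examples.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 5, 10)

fig, ax = plt.subplots()

# MATLAB-like format strings: color letter plus marker and line style.
ax.plot(x, x ** 2, "b.-")   # blue line with dots
ax.plot(x, x ** 3, "g--")   # green dashed line

# Keyword arguments: color names, RGB hex codes, and transparency.
ax.plot(x, x + 1, color="red", alpha=0.5)   # half-transparent red
ax.plot(x, x + 2, color="#1155dd")          # RGB hex code for a bluish color
ax.plot(x, x + 3, color="#15cc55")          # RGB hex code for a greenish color
```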

#### Detailed line and marker styles

To change the line width, we can use the `linewidth` or `lw` keyword argument. The line style can be selected using the `linestyle` or `ls` keyword arguments:
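A sketch of the line width and line style keywords, with a marker added on the last curve. The values are examples only.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 5, 10)

fig, ax = plt.subplots(figsize=(10, 5))

# Line width: "linewidth" and "lw" are interchangeable.
ax.plot(x, x + 1, color="blue", linewidth=0.5)
ax.plot(x, x + 2, color="blue", lw=2.0)

# Line style: "linestyle" and "ls" are interchangeable.
ax.plot(x, x + 3, color="red", lw=2, linestyle="-")
ax.plot(x, x + 4, color="red", lw=2, ls="--")
ax.plot(x, x + 5, color="red", lw=2, ls=":")

# Markers can be combined with any line style.
ax.plot(x, x + 6, color="green", lw=2, ls="-.", marker="o", markersize=4)
```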

### 2.8 Control over axis appearance

The appearance of the axes is an important aspect of a figure that we often need to modify to make publication-quality graphics. We need to be able to control where the ticks and labels are placed, modify the font size and possibly the labels used on the axes. In this section we will look at controlling those properties in a matplotlib figure.

#### Plot range

The first thing we might want to configure is the ranges of the axes. We can do this using the `set_ylim` and `set_xlim` methods of the axes object, or `axis('tight')` for automatically getting "tightly fitted" axes ranges.

Also here we will try to plot multiple lines using a single `plot` function.
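A sketch comparing the default ranges, tightly fitted ranges, and custom ranges. Each `plot` call draws two curves at once by passing the data as `x1, y1, x2, y2`.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 5, 50)

fig, axes = plt.subplots(1, 3, figsize=(12, 4))

# Two curves in a single plot call: (x, x**2) and (x, x**3).
axes[0].plot(x, x ** 2, x, x ** 3)
axes[0].set_title("default axes ranges")

axes[1].plot(x, x ** 2, x, x ** 3)
axes[1].axis("tight")
axes[1].set_title("tight axes")

axes[2].plot(x, x ** 2, x, x ** 3)
axes[2].set_ylim([0, 60])
axes[2].set_xlim([2, 5])
axes[2].set_title("custom axes range")
```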

#### Logarithmic scale

You may have heard about the use of a logarithmic scale to show the number of Covid-19 cases, e.g. https://coronavirus.jhu.edu/data/cumulative-cases.

In Matplotlib, it is also possible to set a logarithmic scale for one or both axes. This functionality is in fact only one application of a more general transformation system in Matplotlib. Each axis's scale is set separately using the `set_xscale` and `set_yscale` methods, which accept one parameter (with the value "log" in this case):
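A sketch with an exponential curve as example data, shown on a linear scale and on a logarithmic y scale.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 5, 50)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

axes[0].plot(x, np.exp(x))
axes[0].set_title("Normal scale")

axes[1].plot(x, np.exp(x))
axes[1].set_yscale("log")   # only the y-axis becomes logarithmic
axes[1].set_title("Logarithmic scale (y)")
```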

### 2.9 Placement of ticks and custom tick labels

We can explicitly determine where we want the axis ticks with `set_xticks` and `set_yticks`, which both take a list of values for where on the axis the ticks are to be placed. We can also use the `set_xticklabels` and `set_yticklabels` methods to provide a list of custom text labels for each tick location:
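A sketch of explicit tick positions with custom labels. The Greek letters and the one-decimal number format are arbitrary choices for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 5, 50)

fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(x, x ** 2, x, x ** 3, lw=2)

# Tick positions on the x-axis, then one custom label per position.
ax.set_xticks([1, 2, 3, 4, 5])
ax.set_xticklabels(
    [r"$\alpha$", r"$\beta$", r"$\gamma$", r"$\delta$", r"$\epsilon$"],
    fontsize=18,
)

# Same idea on the y-axis, with labels built from the tick values.
yticks = [0, 50, 100, 150]
ax.set_yticks(yticks)
ax.set_yticklabels([f"{value:.1f}" for value in yticks], fontsize=18)
```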

There are a number of more advanced methods for controlling major and minor tick placement in matplotlib figures, such as automatic placement according to different policies. See http://matplotlib.org/api/ticker_api.html for details.

### 2.10 Axis number and axis label spacing

#### Axis position adjustments

Unfortunately, when saving figures the labels are sometimes clipped, and it can be necessary to adjust the positions of axes a little bit. This can be done using `subplots_adjust`:
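A sketch of `subplots_adjust`. The margins are given as fractions of the figure size, and the values below are examples only.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 5, 50)

fig, ax = plt.subplots()
ax.plot(x, x ** 2, x, np.exp(x))
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("title")

# Edges of the plotting area as fractions of the figure width and height.
fig.subplots_adjust(left=0.15, right=0.9, bottom=0.1, top=0.9)
```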

### 2.11 Axis grid

With the `grid` method in the axis object, we can turn on and off grid lines. We can also customize the appearance of the grid lines using the same keyword arguments as the `plot` function:
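A sketch of the default grid next to a customized one. The styling values are arbitrary.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 5, 50)

fig, axes = plt.subplots(1, 2, figsize=(10, 3))

# Default grid appearance.
axes[0].plot(x, x ** 2, x, x ** 3, lw=2)
axes[0].grid(True)

# Custom grid appearance, using line keywords as in plot().
axes[1].plot(x, x ** 2, x, x ** 3, lw=2)
axes[1].grid(color="b", alpha=0.5, linestyle="dashed", linewidth=0.5)
```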

### 2.12 Twin axes

Sometimes it is useful to have dual x or y axes in a figure; for example, when plotting curves with different units together. Matplotlib supports this with the `twinx` and `twiny` functions:
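A sketch with `twinx`, which creates a second y-axis that shares the x-axis. The area and volume labels are an illustrative pairing of curves with different units.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 5, 50)

fig, ax1 = plt.subplots()

ax1.plot(x, x ** 2, lw=2, color="blue")
ax1.set_ylabel(r"area $(m^2)$", fontsize=18, color="blue")

# Second axes object sharing the same x-axis, with its own y-axis on the right.
ax2 = ax1.twinx()
ax2.plot(x, x ** 3, lw=2, color="red")
ax2.set_ylabel(r"volume $(m^3)$", fontsize=18, color="red")
```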

### 2.14 Axes where x and y are zero
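One way to make the axes cross at x = 0 and y = 0 is to move the spines. This is a minimal sketch with a cubic as example data, not necessarily the approach used in the lecture.

```python
import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# Hide the right and top borders of the plot.
ax.spines["right"].set_color("none")
ax.spines["top"].set_color("none")

# Move the bottom spine to y = 0 and the left spine to x = 0.
ax.xaxis.set_ticks_position("bottom")
ax.spines["bottom"].set_position(("data", 0))
ax.yaxis.set_ticks_position("left")
ax.spines["left"].set_position(("data", 0))

xx = np.linspace(-0.75, 1.0, 100)
ax.plot(xx, xx ** 3)
```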

### 2.15 Other 2D plot styles

In addition to the regular `plot` method, there are a number of other functions for generating different kinds of plots. See the matplotlib plot gallery for a complete list of available plot types: http://matplotlib.org/gallery.html.

Some of the more useful ones are shown below:
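A sketch of four common plot types. The selection (`scatter`, `step`, `bar`, `fill_between`) and the data are illustrative and may differ from the ones shown in the lecture.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n = np.array([0, 1, 2, 3, 4, 5])
x = np.linspace(0, 5, 50)
xx = np.linspace(-0.75, 1.0, 100)

fig, axes = plt.subplots(1, 4, figsize=(12, 3))

# Scatter plot of noisy points around a straight line.
axes[0].scatter(xx, xx + 0.25 * rng.standard_normal(len(xx)))
axes[0].set_title("scatter")

# Step plot: values held constant between points.
axes[1].step(n, n ** 2, lw=2)
axes[1].set_title("step")

# Bar chart with semi-transparent bars.
axes[2].bar(n, n ** 2, align="center", width=0.5, alpha=0.5)
axes[2].set_title("bar")

# Shaded region between two curves.
axes[3].fill_between(x, x ** 2, x ** 3, color="green", alpha=0.5)
axes[3].set_title("fill_between")

fig.tight_layout()
```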

### 2.16 Text annotation

Annotating text in matplotlib figures can be done using the `text` function. It supports LaTeX formatting just like axis label texts and titles:
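A sketch of `text`, which places a string at the given data coordinates. The positions and colors are example values.

```python
import numpy as np
import matplotlib.pyplot as plt

xx = np.linspace(0, 1, 100)

fig, ax = plt.subplots()
ax.plot(xx, xx ** 2, xx, xx ** 3)

# text(x, y, string): x and y are in data coordinates by default.
t1 = ax.text(0.15, 0.2, r"$y=x^2$", fontsize=20, color="blue")
t2 = ax.text(0.65, 0.1, r"$y=x^3$", fontsize=20, color="green")
```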

### 2.17 Figures with multiple subplots and insets

Axes can be added to a matplotlib Figure canvas manually using `fig.add_axes` or using a sub-figure layout manager such as `subplots`, `subplot2grid`, or `gridspec`:

#### subplots

We had examples using `subplots` before. This is the most common way to place multiple plots in one figure.

#### subplot2grid

A more advanced (and more beautiful) way of plotting is to use `subplot2grid`, which gives us a lot more freedom in how to arrange the plots.
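A sketch of `subplot2grid` on a 3x3 grid, where some axes span several columns or rows. The particular layout is an example.

```python
import matplotlib.pyplot as plt

fig = plt.figure()

# subplot2grid(grid_shape, start_cell, rowspan=..., colspan=...)
ax1 = plt.subplot2grid((3, 3), (0, 0), colspan=3)   # full top row
ax2 = plt.subplot2grid((3, 3), (1, 0), colspan=2)   # two columns wide
ax3 = plt.subplot2grid((3, 3), (1, 2), rowspan=2)   # two rows tall
ax4 = plt.subplot2grid((3, 3), (2, 0))
ax5 = plt.subplot2grid((3, 3), (2, 1))

fig.tight_layout()
```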

#### gridspec

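`gridspec` is more flexible than `subplots`: a `GridSpec` lets us set the relative sizes of rows and columns, and it is the layout machinery that `subplot2grid` builds on.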
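A sketch of a `GridSpec` with unequal row and column sizes. The ratios are arbitrary example values.

```python
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec

fig = plt.figure()

# 2 rows x 3 columns; the ratios set the relative sizes of rows and columns.
gs = gridspec.GridSpec(2, 3, height_ratios=[2, 1], width_ratios=[1, 2, 1])

# Create one axes object per grid cell.
grid_axes = []
for row in range(2):
    for col in range(3):
        grid_axes.append(fig.add_subplot(gs[row, col]))

fig.tight_layout()
```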

## 3. Exercise

Open `N03b - Matplotlib Exercise` and try to finish the questions there. It's challenging, I know, as you'll be plotting real data, but I'd like you to try anyway. We can help.

## Further reading

* http://www.matplotlib.org - The project web page for matplotlib.
* https://github.com/matplotlib/matplotlib - The source code for matplotlib.
* http://matplotlib.org/gallery.html - A large gallery showcasing various types of plots matplotlib can create. Highly recommended!
* http://www.loria.fr/~rougier/teaching/matplotlib - A good matplotlib tutorial.
* http://scipy-lectures.github.io/matplotlib/matplotlib.html - Another good matplotlib reference.
