<img src="images/logo-matplotlib.png" style="float:right; width: 200px; "/>

## Python for Scientific Programming

# Introduction to matplotlib

#### S. Caillou, EIT&AIC Master, 10-2019

In [None]:
%matplotlib inline
%config InlineBackend.figure_format ='retina'
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

pd.options.display.max_rows = 8

import warnings
warnings.filterwarnings('ignore')

## Plan

- Introduction
- Simple line plots
- Simple scatter plots
- Error bars
- Boxplots
- Histograms and Binnings
- Customizing legends
- Multiple Subplots
- Text and annotation
- Settings and Stylesheets
- Three dimensional plotting
- Saving figures
- Interaction with Pandas
- What I didn't talk about
- Other Python Graphics Libraries

## Introduction

## - dry stuff - The matplotlib `Figure`, `axes` and `axis`

At the heart of **every** plot is the figure object. The "Figure" object is the top-level concept, which can be drawn to one of many output formats or simply to the screen. Any object that can be drawn in this way is known as an "Artist" in matplotlib.

Let's create our first artist using pyplot, and then show it:

In [None]:
fig = plt.figure()
plt.show()

On its own, drawing the figure artist is uninteresting and will result in an empty piece of paper (that's why we didn't see anything above).

By far the most useful artist in matplotlib is the "Ax**e**s" artist. The Axes artist represents the "data space" of a typical plot. A rectangular Axes (the most common kind, though not the only one: polar plots are an exception) has two (confusingly named) Ax**i**s artists, which carry the tick marks and tick labels.

There is no limit on the number of Axes artists which can exist on a Figure artist. Let's go ahead and create a figure with a single Axes artist, and show it using pyplot:

In [None]:
ax = plt.axes()

Matplotlib's ``pyplot`` module makes the process of creating graphics easier by allowing us to skip some of the tedious Artist construction. 

For example, we did not need to manually create the Figure artist with ``plt.figure`` because it was implicit that we needed a figure when we created the Axes artist.

Under the hood matplotlib still had to create a Figure artist; it's just that we didn't need to capture it into a variable.

We can access the created object with the "state" functions found in pyplot called **``gcf``** and **``gca``**.
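A minimal check of what these two state functions return (a sketch using only standard pyplot calls; it creates a fresh Axes rather than reusing the one above):

```python
import matplotlib.pyplot as plt

ax = plt.axes()          # pyplot creates a Figure behind the scenes to hold this Axes
fig = plt.gcf()          # "get current figure": the implicitly created Figure
print(plt.gca() is ax)   # "get current axes": the Axes we just created
print(ax.figure is fig)  # every Axes keeps a reference to its parent Figure
```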

<center><img src="./images/410_figure_axes_axis.png" style="width: 700px;"/></center>

- The Figure object:
    - Top-level matplotlib object that represents a figure.
    - Can contain one or more plots.
- The Axes object:
    - Represents a single plotting area (a "plot" or "subplot") with its own data coordinate system
    - Contains most elements of the graph: Axis, Tick, Line2D, Text, Polygon, etc.
    - Warning: This is not an axis!
- Axis objects:
    - Coordinate axis of the graph
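The hierarchy in the list above can be explored directly from Python. A short sketch, assuming nothing beyond a default Matplotlib install; the plotted values are arbitrary:

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()               # one Figure holding one Axes
line, = ax.plot([0, 1, 2], [0, 1, 4])  # a Line2D artist drawn inside the Axes

print(fig.axes)                        # the Figure keeps a list of its Axes
print(ax.xaxis, ax.yaxis)              # a rectangular Axes owns two Axis objects
print(ax.lines)                        # the Axes holds the Line2D (and other) artists
```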

#### MATLAB-style Interface

Matplotlib was originally written as a Python alternative for MATLAB users, and much of its syntax reflects that fact.
The MATLAB-style tools are contained in the pyplot (``plt``) interface.
For example, the following code will probably look quite familiar to MATLAB users:

In [None]:
x = np.linspace(0, 10, 100)  # sample points shared by both panels
plt.figure()  # create a plot figure

# create the first of two panels and set current axis
plt.subplot(2, 1, 1) # (rows, columns, panel number)
plt.plot(x, np.sin(x))

# create the second panel and set current axis
plt.subplot(2, 1, 2)
plt.plot(x, np.cos(x));

It is important to note that this interface is *stateful*: it keeps track of the "current" figure and axes, which are where all ``plt`` commands are applied.
You can get a reference to these using the ``plt.gcf()`` (get current figure) and ``plt.gca()`` (get current axes) routines.

While this stateful interface is fast and convenient for simple plots, it is easy to run into problems.
For example, once the second panel is created, how can we go back and add something to the first?
This is possible within the MATLAB-style interface, but a bit clunky.
Fortunately, there is a better way.
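For reference, one way to do it in the MATLAB-style interface is to keep the Axes handles returned by ``plt.subplot`` and make one current again with ``plt.sca`` (set current axes). A sketch, with arbitrary data and title text:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 100)
plt.figure()
top = plt.subplot(2, 1, 1)     # keep a handle on the first panel
plt.plot(x, np.sin(x))
bottom = plt.subplot(2, 1, 2)  # the second panel is now "current"
plt.plot(x, np.cos(x))

plt.sca(top)                   # make the first panel current again
plt.title("added to the first panel afterwards")
```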

#### Object-oriented interface

The object-oriented interface is available for these more complicated situations, and for when you want more control over your figure.
Rather than depending on some notion of an "active" figure or axes, in the object-oriented interface the plotting functions are *methods* of explicit ``Figure`` and ``Axes`` objects.
To re-create the previous plot using this style of plotting, you might do the following:

In [None]:
x = np.linspace(0, 10, 100)

# First create a grid of plots
# ax will be an array of two Axes objects
fig, ax = plt.subplots(2)

# Call plot() method on the appropriate object
ax[0].plot(x, np.sin(x))
ax[1].plot(x, np.cos(x));

For simpler plots, the choice of which style to use is largely a matter of preference, but the object-oriented approach can become a necessity as plots become more complicated.

## Simple line plots

In [None]:
fig = plt.figure()
ax = plt.axes()

ax.grid(True)

In Matplotlib, the *figure* (an instance of the class ``plt.Figure``) can be thought of as a single container that contains all the objects representing axes, graphics, text, and labels.
The *axes* (an instance of the class ``plt.Axes``) is what we see above: a bounding box with ticks and labels, which will eventually contain the plot elements that make up our visualization.
Throughout this notebook, we'll commonly use the variable name ``fig`` to refer to a figure instance, and ``ax`` to refer to an axes instance or group of axes instances.

Once we have created an axes, we can use the ``ax.plot`` function to plot some data. Let's start with a simple sinusoid:

In [None]:
fig = plt.figure()
ax = plt.axes()

x = np.linspace(0, 10, 1000)
ax.plot(x, np.sin(x));

ax.grid(True)

Alternatively, we can use the pyplot interface and let the figure and axes be created for us in the background:

In [None]:
plt.plot(x, np.sin(x));

ax = plt.gca()
ax.grid(True)

### Adjusting the Plot: Line Colors and Styles

The first adjustment you might wish to make to a plot is to control the line colors and styles.
The ``plt.plot()`` function takes additional arguments that can be used to specify these.
To adjust the color, you can use the ``color`` keyword, which accepts a string argument representing virtually any imaginable color.
The color can be specified in a variety of ways:

In [None]:
plt.plot(x, np.sin(x - 0), color='blue')        # specify color by name
plt.plot(x, np.sin(x - 1), color='g')           # short color code (rgbcmyk)
plt.plot(x, np.sin(x - 2), color='0.75')        # Grayscale between 0 and 1
plt.plot(x, np.sin(x - 3), color='#FFDD44')     # Hex code (RRGGBB from 00 to FF)
plt.plot(x, np.sin(x - 4), color=(1.0,0.2,0.3)) # RGB tuple, values 0 to 1
plt.plot(x, np.sin(x - 5), color='chartreuse'); # all HTML color names supported

If no color is specified, Matplotlib will automatically cycle through a set of default colors for multiple lines.
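The default colors come from the ``axes.prop_cycle`` entry of ``rcParams``, and the strings ``'C0'``, ``'C1'``, ... refer to its entries. A small sketch comparing the colors Matplotlib picked with that cycle (assuming the default style is active):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 100)
fig, ax = plt.subplots()
lines = [ax.plot(x, np.sin(x - k))[0] for k in range(3)]  # no color given
ax.plot(x, np.cos(x), color='C0', linestyle=':')          # 'C0' = first color of the cycle

cycle = plt.rcParams['axes.prop_cycle'].by_key()['color']
print([line.get_color() for line in lines])
print(cycle[:3])
```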

In [None]:
x1 = np.linspace(-2*np.pi, 2*np.pi, 100) # x between -2pi and 2pi

plt.plot(x1, np.sin(x1), x1, np.sin(x1+1), x1, np.sin(x1+2), x1, np.sin(x1+3))
plt.plot(x1, x1*np.sin(x1));

Similarly, the line style can be adjusted using the ``linestyle`` keyword:

In [None]:
plt.plot(x, x + 0, linestyle='solid')
plt.plot(x, x + 1, linestyle='dashed')
plt.plot(x, x + 2, linestyle='dashdot')
plt.plot(x, x + 3, linestyle='dotted');

# For short, you can use the following codes:
plt.plot(x, x + 4, linestyle='-')  # solid
plt.plot(x, x + 5, linestyle='--') # dashed
plt.plot(x, x + 6, linestyle='-.') # dashdot
plt.plot(x, x + 7, linestyle=':');  # dotted

If you would like to be extremely terse, these ``linestyle`` and ``color`` codes can be combined into a single non-keyword argument to the ``plt.plot()`` function:

In [None]:
plt.plot(x, x + 0, '-g')  # solid green
plt.plot(x, x + 1, '--c') # dashed cyan
plt.plot(x, x + 2, '-.k') # dashdot black
plt.plot(x, x + 3, ':r');  # dotted red
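The short format string and the keyword arguments produce the same line properties. A quick sketch to check this (the data are arbitrary):

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba

x = np.linspace(0, 10, 50)
fig, ax = plt.subplots()
short, = ax.plot(x, x, '--c')                          # format string
long_, = ax.plot(x, x + 1, linestyle='--', color='c')  # keyword equivalent
print(short.get_linestyle(), long_.get_linestyle())
print(to_rgba(short.get_color()), to_rgba(long_.get_color()))
```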

### Adjusting the Plot: Axes Limits

Matplotlib does a decent job of choosing default axes limits for your plot, but sometimes it's nice to have finer control.
The most basic way to adjust axis limits is to use the ``plt.xlim()`` and ``plt.ylim()`` functions:

In [None]:
plt.plot(x, np.sin(x))

plt.xlim(-1, 11)
plt.ylim(-1.5, 1.5);

A useful related method is ``plt.axis()`` (note here the potential confusion between *axes* with an *e*, and *axis* with an *i*).
The ``plt.axis()`` method allows you to set the ``x`` and ``y`` limits with a single call, by passing a list which specifies ``[xmin, xmax, ymin, ymax]``:

In [None]:
plt.plot(x, np.sin(x))
plt.axis([-1, 11, -1.5, 1.5]);
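Called with no argument, ``axis()`` returns the current ``[xmin, xmax, ymin, ymax]`` limits, which is handy for checking what was set. A minimal sketch with the object-oriented ``ax.axis``:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 1000)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x))
ax.axis([-1, 11, -1.5, 1.5])  # same [xmin, xmax, ymin, ymax] convention as plt.axis
limits = ax.axis()            # no argument: read the limits back
print(limits)
```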

The ``plt.axis()`` method goes even beyond this, allowing you to do things like automatically tighten the bounds around the current plot:

In [None]:
plt.plot(x, np.sin(x))
plt.axis('tight');

It allows even higher-level specifications, such as ensuring an equal aspect ratio so that on your screen, one unit in ``x`` is equal to one unit in ``y``:

In [None]:
plt.plot(x, np.sin(x))
plt.axis('equal');

### Labeling Plots

Titles and axis labels are the simplest labels to add, and there are functions to set them quickly:

In [None]:
plt.plot(x, np.sin(x))
plt.title("A Sine Curve")
plt.xlabel("x")
plt.ylabel("sin(x)");

The position, size, and style of these labels can be adjusted using optional arguments to the function.
For more information, see the Matplotlib documentation and the docstrings of each of these functions.

### Aside: Matplotlib Gotchas

While most ``plt`` functions translate directly to ``ax`` methods (such as ``plt.plot()`` → ``ax.plot()``, ``plt.legend()`` → ``ax.legend()``, etc.), this is not the case for all commands.

In particular, functions to set limits, labels, and titles are slightly modified.

For transitioning between MATLAB-style functions and object-oriented methods, make the following changes:

- ``plt.xlabel()``  → ``ax.set_xlabel()``
- ``plt.ylabel()`` → ``ax.set_ylabel()``
- ``plt.xlim()``  → ``ax.set_xlim()``
- ``plt.ylim()`` → ``ax.set_ylim()``
- ``plt.title()`` → ``ax.set_title()``

In the object-oriented interface to plotting, rather than calling these functions individually, it is often more convenient to use the ``ax.set()`` method to set all these properties at once:

In [None]:
ax = plt.axes()
ax.plot(x, np.sin(x))
ax.set(xlim=(0, 10), ylim=(-2, 2),
       xlabel='x', ylabel='sin(x)',
       title='A Simple Plot');

## Simple scatter plots

Another commonly used plot type is the simple scatter plot, a close cousin of the line plot.
Instead of points being joined by line segments, here the points are represented individually with a dot, circle, or other shape.

### Scatter Plots with ``plt.plot``

In the previous section we looked at ``plt.plot``/``ax.plot`` to produce line plots.
It turns out that this same function can produce scatter plots as well:

In [None]:
x = np.linspace(0, 10, 30)
y = np.sin(x)

plt.plot(x, y, 'o', color='black');

The third argument in the function call is a character that represents the type of symbol used for the plotting. Just as you can specify options such as ``'-'``, ``'--'`` to control the line style, the marker style has its own set of short string codes. The full list of available symbols can be seen in the documentation of ``plt.plot``, or in Matplotlib's online documentation. Most of the possibilities are fairly intuitive, and we'll show a number of the more common ones here:

In [None]:
rng = np.random.RandomState(0)
for marker in ['o', '.', ',', 'x', '+', 'v', '^', '<', '>', 's', 'd']:
    plt.plot(rng.rand(5), rng.rand(5), marker,
             label="marker='{0}'".format(marker))
plt.legend(numpoints=1)
plt.xlim(0, 1.8);

For even more possibilities, these character codes can be used together with line and color codes to plot points along with a line connecting them:

In [None]:
plt.plot(x, y, '-ok');

Additional keyword arguments to ``plt.plot`` specify a wide range of properties of the lines and markers:

In [None]:
plt.plot(x, y, '-o', color='gray',
         markersize=15, linewidth=4,
         markerfacecolor='white',
         markeredgecolor='gray',
         markeredgewidth=2)
plt.ylim(-1.2, 1.2);

### Scatter Plots with ``plt.scatter``

A second, more powerful method of creating scatter plots is the ``plt.scatter`` function, which can be used very similarly to the ``plt.plot`` function:

In [None]:
plt.scatter(x, y, marker='o');

The primary difference between ``plt.scatter`` and ``plt.plot`` is that ``plt.scatter`` can create scatter plots where the properties of each individual point (size, face color, edge color, etc.) can be individually controlled or mapped to data.

Let's show this by creating a random scatter plot with points of many colors and sizes.
In order to better see the overlapping results, we'll also use the ``alpha`` keyword to adjust the transparency level:

In [None]:
rng = np.random.RandomState(0)
x = rng.randn(100)
y = rng.randn(100)
colors = rng.rand(100)
sizes = 1000 * rng.rand(100)

plt.scatter(x, y, c=colors, s=sizes, alpha=0.3,
            cmap='viridis')
plt.colorbar();  # show color scale

Notice that the color argument is automatically mapped to a color scale (shown here by the ``colorbar()`` command), and that the size argument ``s`` gives the marker area in points squared.
In this way, the color and size of points can be used to convey information in the visualization, in order to visualize multidimensional data.
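A practical consequence: ``markersize`` in ``plt.plot`` is expressed in points while ``s`` in ``plt.scatter`` is an area in points squared, so ``s=markersize**2`` gives markers of about the same size. A sketch (the sizes are illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(5)
fig, ax = plt.subplots()
ax.plot(x, np.zeros(5), 'o', markersize=10)          # markersize: in points
pc = ax.scatter(x, np.ones(5), s=10**2, marker='o')  # s: marker area in points**2
ax.set_ylim(-1, 2)
print(pc.get_sizes())
```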

For example, we might use the Iris data from Scikit-Learn, where each sample is one of three types of flowers that has had the size of its petals and sepals carefully measured:

In [None]:
from sklearn.datasets import load_iris
iris = load_iris()
features = iris.data.T

plt.scatter(features[0], features[1], alpha=0.2,
            s=100*features[3], c=iris.target, cmap='viridis')
plt.xlabel(iris.feature_names[0])
plt.ylabel(iris.feature_names[1]);

We can see that this scatter plot has given us the ability to simultaneously explore four different dimensions of the data:
the (x, y) location of each point corresponds to the sepal length and width, the size of the point is related to the petal width, and the color is related to the particular species of flower.
Multicolor and multifeature scatter plots like this can be useful for both exploration and presentation of data.

## Error bars

For any scientific measurement, accurate accounting for errors is nearly as important as, if not more important than, accurate reporting of the number itself.
For example, imagine that I am using some astrophysical observations to estimate the Hubble Constant, the local measurement of the expansion rate of the Universe.
I know that the current literature suggests a value of around 71 (km/s)/Mpc, and I measure a value of 74 (km/s)/Mpc with my method. Are the values consistent? The only correct answer, given this information, is this: there is no way to know.

Suppose I augment this information with reported uncertainties: the current literature suggests a value of around 71 $\pm$ 2.5 (km/s)/Mpc, and my method has measured a value of 74 $\pm$ 5 (km/s)/Mpc. Now are the values consistent? That is a question that can be quantitatively answered.

In visualization of data and results, showing these errors effectively can make a plot convey much more complete information.

### Basic Error bars

In [None]:
x = np.linspace(0, 10, 50)
dy = 0.8
y = np.sin(x) + dy * np.random.randn(50)

plt.errorbar(x, y, yerr=dy, fmt='.k');
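``yerr`` (and ``xerr``) also accept one value per point, or a pair of arrays ``[lower, upper]`` for asymmetric errors, and the error bars can be styled separately from the points. A sketch with made-up uncertainties (the seed and error values are arbitrary):

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.container import ErrorbarContainer

rng = np.random.RandomState(42)
x = np.linspace(0, 10, 20)
y = np.sin(x) + 0.3 * rng.randn(20)
lower = 0.2 + 0.1 * rng.rand(20)  # distance from each point down to its lower bound
upper = 0.4 + 0.1 * rng.rand(20)  # distance from each point up to its upper bound

fig, ax = plt.subplots()
container = ax.errorbar(x, y, yerr=[lower, upper], fmt='o', color='black',
                        ecolor='lightgray', elinewidth=3, capsize=0)
print(isinstance(container, ErrorbarContainer), container.has_yerr)
```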

### Continuous Errors

In [None]:
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# define the model and draw some data
model = lambda x: x * np.sin(x)
xdata = np.array([1, 3, 5, 6, 8])
ydata = model(xdata)

# Compute the Gaussian process fit
# (the old GaussianProcess class was removed from scikit-learn; an RBF kernel
#  replaces the original cubic correlation, so the band will differ somewhat)
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), n_restarts_optimizer=10)
gp.fit(xdata[:, np.newaxis], ydata)

xfit = np.linspace(0, 10, 1000)
yfit, std = gp.predict(xfit[:, np.newaxis], return_std=True)
dyfit = 2 * std  # 2*sigma ~ 95% confidence region

In [None]:
# Visualize the result
plt.plot(xdata, ydata, 'or')
plt.plot(xfit, yfit, '-', color='gray')

plt.fill_between(xfit, yfit - dyfit, yfit + dyfit,
                 color='gray', alpha=0.2)
plt.xlim(0, 10);
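``fill_between`` only needs the lower and upper curves, so the same kind of band can be drawn without scikit-learn. A sketch in which ``dyfit`` is an assumed, purely illustrative uncertainty rather than a fitted one:

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import PolyCollection

xfit = np.linspace(0, 10, 200)
yfit = xfit * np.sin(xfit)  # stand-in for a model prediction
dyfit = 0.5 + 0.1 * xfit    # assumed uncertainty growing with x (illustrative only)

fig, ax = plt.subplots()
ax.plot(xfit, yfit, '-', color='gray')
band = ax.fill_between(xfit, yfit - dyfit, yfit + dyfit, color='gray', alpha=0.2)
ax.set_xlim(0, 10)
print(isinstance(band, PolyCollection))
```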

## Boxplots

In [None]:
# Fixing random state for reproducibility
np.random.seed(19680801)

# fake up some data
spread = np.random.rand(50) * 100
center = np.ones(25) * 50
flier_high = np.random.rand(10) * 100 + 100
flier_low = np.random.rand(10) * -100
data = np.concatenate((spread, center, flier_high, flier_low))

In [None]:
fig1, ax1 = plt.subplots()
ax1.set_title('Basic Plot')
ax1.boxplot(data);

In [None]:
green_diamond = dict(markerfacecolor='g', marker='D')
fig3, ax3 = plt.subplots()
ax3.set_title('Changed Outlier Symbols')
ax3.boxplot(data, flierprops=green_diamond);

In [None]:
spread = np.random.rand(50) * 100
center = np.ones(25) * 40
flier_high = np.random.rand(10) * 100 + 100
flier_low = np.random.rand(10) * -100
d2 = np.concatenate((spread, center, flier_high, flier_low))

In [None]:
data = [data, d2, d2[::2]]  # three 1-D samples of different sizes (boxplot expects 1-D arrays)
fig7, ax7 = plt.subplots()
ax7.set_title('Multiple Samples with Different sizes')
ax7.boxplot(data)

plt.show()
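``boxplot`` returns a dictionary of the artists it drew, which makes it possible to check what the box shows: the line inside the box is the median and, by default, points more than 1.5 times the interquartile range (IQR) beyond the box are drawn as fliers. A sketch with two outliers added by hand (the seed is arbitrary):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.RandomState(0)
sample = np.concatenate([rng.rand(50) * 100, [250.0, -150.0]])  # two obvious outliers

fig, ax = plt.subplots()
parts = ax.boxplot(sample)  # dict with 'boxes', 'medians', 'whiskers', 'caps', 'fliers', ...

median_y = parts['medians'][0].get_ydata()[0]
flier_y = np.sort(parts['fliers'][0].get_ydata())
print(median_y, np.median(sample))
print(flier_y)              # the points beyond 1.5 * IQR from the box
```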

## Histograms and Binnings

A simple histogram can be a great first step in understanding a dataset.

In [None]:
data = np.random.randn(1000)

In [None]:
plt.hist(data);

In [None]:
mu, sigma = 100, 15
x = mu + sigma * np.random.randn(10000)

n, bins, patches = plt.hist(x, bins=100, density=1, facecolor='b', alpha=0.5);
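``np.histogram`` performs the same binning without drawing, and with ``density=True`` the bar heights are scaled so that the total area is 1. A sketch to check both points (the seed and bin count are arbitrary):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.RandomState(0)
data = rng.randn(1000)

counts, edges = np.histogram(data, bins=30)              # binning only, nothing drawn
fig, ax = plt.subplots()
n, bins, patches = ax.hist(data, bins=30, density=True)  # same bins, drawn and normalized

area = (n * np.diff(bins)).sum()  # sum of bar height * bar width
print(counts.sum(), area)
```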

The ``plt.hist`` docstring has more information on the other customization options available.
I find the combination of ``histtype='stepfilled'`` with some transparency (``alpha``) very useful when comparing histograms of several distributions:

In [None]:
x1 = np.random.normal(0, 0.8, 1000)
x2 = np.random.normal(-2, 1, 1000)
x3 = np.random.normal(3, 2, 1000)

kwargs = dict(histtype='stepfilled', alpha=0.3, density=True, bins=40)  # 'density' replaces the removed 'normed'

plt.hist(x1, **kwargs)
plt.hist(x2, **kwargs)
plt.hist(x3, **kwargs);

### Two-Dimensional Histograms and Binnings

Just as we create histograms in one dimension by dividing the number-line into bins, we can also create histograms in two-dimensions by dividing points among two-dimensional bins.
We'll take a brief look at several ways to do this here.
We'll start by defining some data: ``x`` and ``y`` arrays drawn from a multivariate Gaussian distribution:

In [None]:
mean = [0, 0]
cov = [[1, 1], [1, 2]]
x, y = np.random.multivariate_normal(mean, cov, 10000).T

#### ``plt.hist2d``: Two-dimensional histogram

One straightforward way to plot a two-dimensional histogram is to use Matplotlib's ``plt.hist2d`` function:

In [None]:
plt.hist2d(x, y, bins=30, cmap='Blues')
cb = plt.colorbar()
cb.set_label('counts in bin')
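Similarly, ``np.histogram2d`` computes the counts without drawing, and ``plt.hist2d`` returns the same array as its first output. A sketch (the seed is arbitrary):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.RandomState(1)
x, y = rng.multivariate_normal([0, 0], [[1, 1], [1, 2]], 10000).T

counts, xedges, yedges = np.histogram2d(x, y, bins=30)     # counts only
fig, ax = plt.subplots()
h, xe, ye, image = ax.hist2d(x, y, bins=30, cmap='Blues')  # same binning, drawn
print(counts.shape, counts.sum())
```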

#### ``plt.hexbin``: Hexagonal binnings

The two-dimensional histogram creates a tessellation of squares across the axes.
Another natural shape for such a tessellation is the regular hexagon.
For this purpose, Matplotlib provides the ``plt.hexbin`` routine, which represents a two-dimensional dataset binned within a grid of hexagons:

In [None]:
plt.hexbin(x, y, gridsize=30, cmap='Blues')
cb = plt.colorbar(label='count in bin')
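As with ``hist2d``, the counts can be read back: ``hexbin`` returns a collection whose ``get_array()`` holds one count per hexagon. A sketch (the seed is arbitrary):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.RandomState(1)
x, y = rng.multivariate_normal([0, 0], [[1, 1], [1, 2]], 10000).T

fig, ax = plt.subplots()
hb = ax.hexbin(x, y, gridsize=30, cmap='Blues')
counts = hb.get_array()  # one count per hexagon
print(len(counts), counts.sum())
```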

In [None]:
# create data
x = np.random.normal(size=50000)
y = (x * 3 + np.random.normal(size=50000)) * 5

fig, (ax1, ax2) = plt.subplots(1,2,figsize=(10,4))

# gridsize=(nx, ny) sets the number of hexagons along x and y: coarse (left) vs fine (right)
img1 = ax1.hexbin(x, y, gridsize=(15,15))
fig.colorbar(img1, ax=ax1)

img2 = ax2.hexbin(x, y, gridsize=(150,150))
fig.colorbar(img2, ax=ax2)

plt.show()

#### Kernel density estimation

Another common method of evaluating densities in multiple dimensions is *kernel density estimation* (KDE).
This will be discussed more fully in [In-Depth: Kernel Density Estimation](05.13-Kernel-Density-Estimation.ipynb), but for now we'll simply mention that KDE can be thought of as a way to "smear out" the points in space and add up the result to obtain a smooth function.
One extremely quick and simple KDE implementation exists in the ``scipy.stats`` package.
Here is a quick example of using the KDE on this data:

In [None]:
from scipy.stats import gaussian_kde

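# re-draw the correlated Gaussian sample: x and y were overwritten by the hexbin cell above
x, y = np.random.multivariate_normal([0, 0], [[1, 1], [1, 2]], 10000).T

# fit an array of size [Ndim, Nsamples]
data = np.vstack([x, y])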
kde = gaussian_kde(data)

# evaluate on a regular grid
xgrid = np.linspace(-3.5, 3.5, 40)
ygrid = np.linspace(-6, 6, 40)
Xgrid, Ygrid = np.meshgrid(xgrid, ygrid)
Z = kde.evaluate(np.vstack([Xgrid.ravel(), Ygrid.ravel()]))

# Plot the result as an image
plt.imshow(Z.reshape(Xgrid.shape),
           origin='lower', aspect='auto',
           extent=[-3.5, 3.5, -6, 6],
           cmap='Blues')
cb = plt.colorbar()
cb.set_label("density")
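To make the "smear out and add up" idea concrete, here is a 1-D sketch that places a Gaussian bump of width ``h`` on every point and averages the bumps, using Scott's rule ``h = std * n**(-1/5)``, which is what ``gaussian_kde`` uses by default in one dimension. The sample size and grid are arbitrary:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.RandomState(0)
points = rng.randn(200)                           # 1-D sample
grid = np.linspace(-4, 4, 81)

kde = gaussian_kde(points)                        # Scott's rule bandwidth by default
h = points.std(ddof=1) * len(points) ** (-1 / 5)  # Scott's rule written out in 1-D

# "smear out" each point into a Gaussian bump of width h, then average the bumps
bumps = np.exp(-0.5 * ((grid[:, None] - points[None, :]) / h) ** 2) / (h * np.sqrt(2 * np.pi))
manual = bumps.mean(axis=1)

print(np.allclose(manual, kde(grid)))
```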

## Customizing legends

In [None]:
x = np.linspace(0, 10, 1000)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), '-b', label='Sine')
ax.plot(x, np.cos(x), '--r', label='Cosine')
ax.axis('equal')
leg = ax.legend();

In [None]:
t = np.arange(0.0, 6.0, 0.05)
amplitude = np.exp(-t) * np.cos(2*np.pi*t)
plt.plot(t, amplitude, label=r'$\exp(-t)\cos(2\pi t)$')
plt.plot(t, np.exp(-t), label=r'$\exp(-t)$')

plt.legend(fontsize=16, loc='upper right', framealpha=0.6)

plt.show()
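Not every line has to appear in the legend: lines without a label, or whose label starts with an underscore, are left out by ``legend()``. A sketch (the labels are illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 100)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label='Sine')
ax.plot(x, np.cos(x), label='_hidden')  # leading underscore: excluded from the legend
ax.plot(x, np.sin(x) * np.cos(x))       # no label: also excluded
leg = ax.legend(loc='upper right', frameon=False)
print([text.get_text() for text in leg.get_texts()])
```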

## Multiple Subplots

### ``plt.axes``: Subplots by Hand

In [None]:
ax1 = plt.axes()  # standard axes
ax2 = plt.axes([0.65, 0.65, 0.2, 0.2])
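The list passed to ``plt.axes`` (or ``fig.add_axes``) is ``[left, bottom, width, height]``, expressed as fractions of the figure from 0 to 1. A sketch that reads the position back:

```python
import matplotlib.pyplot as plt

fig = plt.figure()
main = fig.add_axes([0.1, 0.1, 0.8, 0.8])     # [left, bottom, width, height]
inset = fig.add_axes([0.65, 0.65, 0.2, 0.2])  # small inset in the upper-right corner
print(inset.get_position().bounds)            # read back as (left, bottom, width, height)
```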

In [None]:
fig = plt.figure()
ax1 = fig.add_axes([0.1, 0.5, 0.8, 0.4],
                   xticklabels=[], ylim=(-1.2, 1.2))
ax2 = fig.add_axes([0.1, 0.1, 0.8, 0.4],
                   ylim=(-1.2, 1.2))

x = np.linspace(0, 10)
ax1.plot(np.sin(x))
ax2.plot(np.cos(x));

### ``plt.subplot``: Simple Grids of Subplots

In [None]:
for i in range(1, 7):
    plt.subplot(2, 3, i)
    plt.text(0.5, 0.5, str((2, 3, i)),
             fontsize=18, ha='center')

In [None]:
fig = plt.figure()
fig.subplots_adjust(hspace=0.4, wspace=0.4)
for i in range(1, 7):
    ax = fig.add_subplot(2, 3, i)
    ax.text(0.5, 0.5, str((2, 3, i)),
           fontsize=18, ha='center')

### ``plt.subplots``: The Whole Grid in One Go

In [None]:
fig, ax = plt.subplots(2, 3, sharex='col', sharey='row')

In [None]:
# axes are in a two-dimensional array, indexed by [row, col]
for i in range(2):
    for j in range(3):
        ax[i, j].text(0.5, 0.5, str((i, j)),
                      fontsize=18, ha='center')
fig
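A few details of ``plt.subplots`` worth knowing: a single panel is returned as one Axes rather than an array, a single row or column gives a 1-D array, and with ``sharex='col'`` changing the limits of one panel updates the other panels in its column. A sketch:

```python
import numpy as np
import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 3, sharex='col', sharey='row')
print(axes.shape)               # (2, 3): indexed by [row, col]

fig1, single = plt.subplots()   # 1 x 1 grid: a single Axes, not an array
fig2, column = plt.subplots(3)  # 3 x 1 grid: squeezed to a 1-D array
print(isinstance(single, np.ndarray), column.shape)

axes[0, 0].set_xlim(0, 5)       # column 0 shares its x-axis...
print(axes[1, 0].get_xlim())    # ...so the lower panel follows
```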

## Text and annotation

In [None]:
mu, sigma = 180, 15
x = mu + sigma * np.random.randn(10000)

# the histogram of the data
n, bins, patches = plt.hist(x, 50, density=1, facecolor='g', alpha=0.75)

plt.xlabel('data', size=20)
plt.ylabel('Probability', size = 20)
plt.title('Histogram of data', size=20)
plt.text(190, .026, r'$\mu=180,\ \sigma=15$', size=16)
plt.grid(True)

plt.show()
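``plt.text`` positions the text in data coordinates, so the ``(190, .026)`` position above depends on the data. Passing ``transform=ax.transAxes`` uses axes fractions instead, and ``annotate`` adds an arrow from the text to a data point. A sketch (positions and strings are arbitrary):

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2, 3], [0, 1, 0, 1])
corner = ax.text(0.02, 0.95, 'top-left corner', transform=ax.transAxes,
                 va='top')  # axes fractions: stays put whatever the data
note = ax.annotate('local max', xy=(1, 1), xytext=(1.5, 0.5),
                   arrowprops=dict(arrowstyle='->'))  # xy and xytext in data coordinates
```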

In [None]:
#plt.style.use('default')
plt.rcParams['mathtext.fontset'] = 'cm'
plt.rcParams['mathtext.rm'] = 'serif'

x = np.linspace(0, 5, 10)

fig, ax = plt.subplots(figsize=(5,3))

ax.plot(x, x**2, label="$y=x^2$")
ax.plot(x, x**3, label="$y=x^3$")
ax.legend(loc=2, fontsize=16);
ax.set_xlabel('$x$')
ax.set_ylabel('$y$')
ax.set_title('2 plots $ y = f(x)$', fontsize=20);
plt.show()

In [None]:
#from matplotlib import rc, rcParams
#plt.style.use('default')
plt.rcParams['mathtext.fontset'] = 'cm'
plt.rcParams['mathtext.rm'] = 'serif'

t = np.arange(0.0, 2.0, 0.01)
s = np.sin(2*np.pi*t)

fig = plt.figure(figsize = (6,4))
ax = plt.axes()

plt.plot(t,s)
plt.title(r'$\alpha_i > \beta_i$', fontsize=20)
plt.text(1, -0.6, r'$\sum_{i=0}^\infty x_i$', fontsize=20)
plt.text(0.55, 0.6, r'$\mathcal{A}\mathrm{sin}(2 \omega t)$', fontsize=20)
plt.text(1.6, 0.75, '$rendered$ \n$mathtext$', color="purple", style='italic', fontsize=15)
plt.xlabel('time (s)')
plt.ylabel('volts (mV)')
plt.show()

## Settings and Stylesheets

### Changing the Defaults: ``rcParams``

Each time Matplotlib loads, it defines a runtime configuration (rc) containing the default styles for every plot element you create.
This configuration can be adjusted at any time using the ``plt.rc`` convenience routine.
Let's modify the rc parameters to give our default plots a custom look: a light gray background, a white grid, outward ticks and a new color cycle.

We'll start by saving a copy of the current ``rcParams`` dictionary, so we can easily reset these changes in the current session:

In [None]:
IPython_default = plt.rcParams.copy()

In [None]:
from matplotlib import cycler
colors = cycler('color',
                ['#EE6666', '#3388BB', '#9988DD',
                 '#EECC55', '#88BB44', '#FFBBBB'])
plt.rc('axes', facecolor='#E6E6E6', edgecolor='none',
       axisbelow=True, grid=True, prop_cycle=colors)
plt.rc('grid', color='w', linestyle='solid')
plt.rc('xtick', direction='out', color='gray')
plt.rc('ytick', direction='out', color='gray')
plt.rc('patch', edgecolor='#E6E6E6')
plt.rc('lines', linewidth=2)
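As an alternative to copying and restoring the whole dictionary, ``plt.rc_context`` applies settings only inside a ``with`` block. A sketch (the chosen parameters are arbitrary):

```python
import matplotlib as mpl
import matplotlib.pyplot as plt

default_lw = mpl.rcParams['lines.linewidth']
with plt.rc_context({'lines.linewidth': 4, 'axes.grid': True}):
    fig, ax = plt.subplots()
    line, = ax.plot([0, 1, 2], [0, 1, 4])  # created while the temporary settings apply
# the line keeps its width, while rcParams are back to their previous values
print(line.get_linewidth(), mpl.rcParams['lines.linewidth'])
```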

In [None]:
x = np.random.randn(1000)
plt.hist(x);

In [None]:
for i in range(4):
    plt.plot(np.random.rand(10))

In [None]:
print(plt.style.available)

The basic way to switch to a stylesheet is to call

``` python
plt.style.use('stylename')
```

But keep in mind that this will change the style for the rest of the session!
Alternatively, you can use the style context manager, which sets a style temporarily:

``` python
with plt.style.context('stylename'):
    make_a_plot()
```
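The context manager restores the previous settings on exit. A sketch using the built-in ``'ggplot'`` style and one of its parameters:

```python
import matplotlib as mpl
import matplotlib.pyplot as plt

before = mpl.rcParams['axes.facecolor']
with plt.style.context('ggplot'):
    inside = mpl.rcParams['axes.facecolor']  # the ggplot style sets a gray background
after = mpl.rcParams['axes.facecolor']
print(before, inside, after)
```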


In [None]:
def hist_and_lines():
    np.random.seed(0)
    fig, ax = plt.subplots(1, 2, figsize=(11, 4))
    ax[0].hist(np.random.randn(1000))
    for i in range(3):
        ax[1].plot(np.random.rand(10))
    ax[1].legend(['a', 'b', 'c'], loc='lower left')

In [None]:
# reset rcParams
plt.rcParams.update(IPython_default);

In [None]:
hist_and_lines()

In [None]:
with plt.style.context('seaborn-v0_8'):  # named 'seaborn' before Matplotlib 3.6
    hist_and_lines()

In [None]:
with plt.style.context('Solarize_Light2'):
    hist_and_lines()

In [None]:
with plt.style.context('grayscale'):
    hist_and_lines()

In [None]:
with plt.style.context('seaborn-v0_8-whitegrid'):  # named 'seaborn-whitegrid' before Matplotlib 3.6
    hist_and_lines()

In [None]:
with plt.xkcd():
    hist_and_lines()

In [None]:
with plt.xkcd():
    # Based on "Stove Ownership" from XKCD by Randall Munroe
    # https://xkcd.com/418/

    fig = plt.figure()
    ax = fig.add_axes((0.1, 0.2, 0.8, 0.7))
    ax.spines['right'].set_color('none')
    ax.spines['top'].set_color('none')
    ax.set_xticks([])
    ax.set_yticks([])
    ax.set_ylim([-30, 10])

    data = np.ones(100)
    data[70:] -= np.arange(30)

    ax.annotate(
        'THE DAY I BEGAN\nMY MASTER CLASS',
        xy=(70, 1), arrowprops=dict(arrowstyle='->'), xytext=(15, -10))

    ax.plot(data)

    ax.set_xlabel('time')
    ax.set_ylabel('my overall health')

## Three dimensional plotting

Matplotlib was initially designed with only two-dimensional plotting in mind.
Around the time of the 1.0 release, some three-dimensional plotting utilities were built on top of Matplotlib's two-dimensional display, and the result is a convenient (if somewhat limited) set of tools for three-dimensional data visualization.
Three-dimensional plots are enabled by importing the ``mplot3d`` toolkit, which is included with the main Matplotlib installation:

In [None]:
from mpl_toolkits import mplot3d

Once this submodule is imported, a three-dimensional axes can be created by passing the keyword ``projection='3d'`` to any of the normal axes creation routines:

In [None]:
fig = plt.figure()
ax = plt.axes(projection='3d')
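Since Matplotlib 3.2 the ``'3d'`` projection is available without the explicit import, but keeping it is harmless. A minimal sketch drawing a helix (an illustrative curve):

```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d  # noqa: F401  (registers '3d' on older Matplotlib versions)

fig = plt.figure()
ax = fig.add_subplot(projection='3d')  # same keyword as plt.axes(projection='3d')
t = np.linspace(0, 4 * np.pi, 200)
ax.plot(np.cos(t), np.sin(t), t)       # a helix: x, y and z coordinates
ax.set_zlabel('z')
print(ax.name)
```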

In [None]:
#%matplotlib notebook
%matplotlib inline
ax = plt.axes(projection='3d')

# Data for a three-dimensional line
zline = np.linspace(0, 15, 1000)
xline = np.sin(zline)
yline = np.cos(zline)
ax.plot3D(xline, yline, zline, 'gray')

# Data for three-dimensional scattered points
zdata = 15 * np.random.random(100)
xdata = np.sin(zdata) + 0.1 * np.random.randn(100)
ydata = np.cos(zdata) + 0.1 * np.random.randn(100)
ax.scatter3D(xdata, ydata, zdata, c=zdata, cmap='Greens');
plt.show()

In [None]:
def f(x, y):
    return np.sin(np.sqrt(x ** 2 + y ** 2))

x = np.linspace(-6, 6, 30)
y = np.linspace(-6, 6, 30)

X, Y = np.meshgrid(x, y)
Z = f(X, Y)

In [None]:
fig = plt.figure()
ax = plt.axes(projection='3d')
ax.contour3D(X, Y, Z, 50, cmap='binary')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z');

In [None]:
ax.view_init(60, 35)
fig

In [None]:
ax = plt.axes(projection='3d')
ax.plot_surface(X, Y, Z, rstride=1, cstride=1,
                cmap='viridis', edgecolor='none')
ax.set_title('surface');
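``plot_wireframe`` is a lighter alternative to ``plot_surface``, and ``view_init`` takes the elevation and azimuth angles in degrees. A sketch that rebuilds the same ``Z`` grid as above:

```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d  # noqa: F401

x = np.linspace(-6, 6, 30)
X, Y = np.meshgrid(x, x)
Z = np.sin(np.sqrt(X ** 2 + Y ** 2))

fig = plt.figure()
ax = fig.add_subplot(projection='3d')
ax.plot_wireframe(X, Y, Z, rstride=2, cstride=2, color='black', linewidth=0.5)
ax.view_init(elev=60, azim=35)  # elevation above the x-y plane, then rotation about z
print(ax.elev, ax.azim)
```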

### Surface Triangulations

For some applications, the evenly sampled grids required by the above routines are overly restrictive and inconvenient.
In these situations, the triangulation-based plots can be very useful.
What if rather than an even draw from a Cartesian or a polar grid, we instead have a set of random draws?

In [None]:
theta = 2 * np.pi * np.random.random(1000)
r = 6 * np.random.random(1000)
x = np.ravel(r * np.sin(theta))
y = np.ravel(r * np.cos(theta))
z = f(x, y)

In [None]:
ax = plt.axes(projection='3d')
ax.scatter(x, y, z, c=z, cmap='viridis', linewidth=0.5);

In [None]:
ax = plt.axes(projection='3d')
ax.plot_trisurf(x, y, z,
                cmap='viridis', edgecolor='none');
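``plot_trisurf`` relies on a Delaunay triangulation of the ``(x, y)`` points, which ``matplotlib.tri.Triangulation`` exposes directly. A sketch on a smaller random sample (100 points, arbitrary seed):

```python
import numpy as np
import matplotlib.tri as mtri

rng = np.random.RandomState(0)
theta = 2 * np.pi * rng.random_sample(100)
r = 6 * rng.random_sample(100)
x, y = r * np.sin(theta), r * np.cos(theta)

tri = mtri.Triangulation(x, y)  # Delaunay triangulation of the scattered (x, y) points
print(tri.triangles.shape)      # (n_triangles, 3): vertex indices of each triangle
```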

## Interaction with Pandas

What we have been doing while plotting with Pandas:

In [None]:
import pandas as pd

In [None]:
aqdata = pd.read_csv('data/20000101_20161231-NO2.csv', sep=';', skiprows=[1], na_values=['n/d'],
                     index_col=0, parse_dates=True)
aqdata = aqdata["2014":].resample('D').mean()

In [None]:
aqdata.plot()

### Pandas versus matplotlib

#### Comparison 1: single plot

In [None]:
aqdata.plot(figsize=(16, 6)) # press Shift+Tab inside the parentheses to see all the options

Making this with matplotlib...

In [None]:
fig, ax = plt.subplots(figsize=(16, 6))
ax.plot(aqdata.index, aqdata["BASCH"],
        aqdata.index, aqdata["BONAP"], 
        aqdata.index, aqdata["PA18"],
        aqdata.index, aqdata["VERS"])
ax.legend(["BASCH", "BONAP", "PA18", "VERS"])

or...

In [None]:
fig, ax = plt.subplots(figsize=(16, 6))
for station in aqdata.columns:
    ax.plot(aqdata.index, aqdata[station], label=station)
ax.legend()

#### Comparison 2: with subplots

In [None]:
axs = aqdata.plot(subplots=True, sharex=True,
                  figsize=(16, 8), colormap='viridis', # Dark2
                  fontsize=15)

Mimicking this in matplotlib (just as a reference):

In [None]:
from matplotlib import cm
import matplotlib.dates as mdates

colors = [cm.viridis(x) for x in np.linspace(0.0, 1.0, len(aqdata.columns))] # list comprehension to set up the colors

fig, axs = plt.subplots(4, 1, figsize=(16, 8))

for i, (ax, col, station) in enumerate(zip(axs, colors, aqdata.columns)):
    ax.plot(aqdata.index, aqdata[station], label=station, color=col)
    ax.legend()
    if i < len(axs) - 1:  # every subplot except the bottom one (Axes.is_last_row() was removed)
        ax.xaxis.set_ticklabels([])
        ax.xaxis.set_major_locator(mdates.YearLocator())
    else:
        ax.xaxis.set_major_locator(mdates.YearLocator())
        ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y'))
        ax.set_xlabel('Time')
    ax.tick_params(labelsize=15)
fig.autofmt_xdate()

### Best of both worlds...

In [None]:
aqdata.columns

In [None]:
fig, ax = plt.subplots() #prepare a matplotlib figure

aqdata.plot(ax=ax) # use pandas for the plotting

# Provide further adaptations with matplotlib:
ax.set_xlabel("")
ax.tick_params(labelsize=15, pad=8, which='both')
fig.suptitle('Air quality station time series', fontsize=15)
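The ``ax=`` pattern does not depend on the air-quality file. A self-contained sketch on a synthetic DataFrame (the column names ``'A'`` and ``'B'`` and the values are made up) showing that pandas returns the very Axes it was given:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# synthetic stand-in for the air-quality table (hypothetical columns and values)
index = pd.date_range('2014-01-01', periods=100, freq='D')
rng = np.random.RandomState(0)
df = pd.DataFrame({'A': rng.rand(100).cumsum(), 'B': rng.rand(100).cumsum()}, index=index)

fig, ax = plt.subplots()
returned = df.plot(ax=ax)  # pandas draws into the Axes we pass in
ax.set_ylabel('value')     # ...and matplotlib takes over for the fine-tuning
print(returned is ax, len(ax.get_lines()))
```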

In [None]:
fig, (ax1, ax2) = plt.subplots(2, 1) # create two Axes with matplotlib

aqdata[["BASCH", "BONAP"]].plot(ax=ax1) # plot the two timeseries of the same location on the first plot
aqdata["PA18"].plot(ax=ax2) # plot the other station on the second plot

# further adapt with matplotlib
ax1.set_ylabel("BASCH")
ax2.set_ylabel("PA18")
ax2.legend()

<div class="alert alert-info">

 <b>Remember</b>: 

 <ul>
  <li>You can do anything with matplotlib, but at a cost... [stackoverflow!!](http://stackoverflow.com/questions/tagged/matplotlib)</li>
  <li>Pandas' built-in plotting gives enough flexibility for quick analysis and draft reporting, but not for publication-ready figures or heavy customization.</li>
</ul>
<br>


</div>

## Saving figures

In [None]:
import numpy as np
x = np.linspace(0, 10, 100)

fig = plt.figure()
plt.plot(x, np.sin(x), '-')
plt.plot(x, np.cos(4*x), '--');

One nice feature of Matplotlib is the ability to save figures in a wide variety of formats.
Saving a figure can be done using the ``savefig()`` command.
For example, to save the previous figure as a PNG file, you can run this:

In [None]:
fig.savefig('./images/my_figure.png')

In [None]:
# check that the file has been created
%ls -lah ./images/my_figure.png

To confirm that it contains what we think it contains, let's use the IPython ``Image`` object to display the contents of this file:

In [None]:
from IPython.display import Image
Image('./images/my_figure.png')

In [None]:
# savefig has its own default parameter values: see help(fig.savefig)
# Use fig.savefig rather than plt.savefig: with the inline backend the figure is closed once
# its cell has run, so plt.savefig here would save a new, empty figure.
fig.savefig('./images/my_figure1.png')
fig.savefig('./images/my_figure2.png', dpi=600)  # set the resolution (dots per inch)
fig.savefig('./images/my_figure3.png', transparent=True)  # transparent background
fig.savefig('./images/my_figure4.png', facecolor=fig.get_facecolor())  # reuse the figure's own background color
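The effect of `dpi` and `transparent` can be checked without writing to disk. A minimal sketch, assuming matplotlib 3.1 or later (where a bare `matplotlib.figure.Figure` can be saved directly) and using a small helper, `render_png`, defined here only for illustration: with the default `bbox_inches=None`, the saved PNG is `figsize × dpi` pixels, and `transparent=True` sets the background alpha to zero.

```python
import io
import matplotlib.image as mpimg
from matplotlib.figure import Figure

def render_png(**savefig_kwargs):
    """Hypothetical helper: save a 4 x 3 inch figure to an in-memory PNG and read it back."""
    fig = Figure(figsize=(4, 3))
    ax = fig.add_subplot()
    ax.plot([0, 1], [0, 1])
    buf = io.BytesIO()
    fig.savefig(buf, format="png", **savefig_kwargs)
    buf.seek(0)
    return mpimg.imread(buf)  # array of shape (height, width, 4), floats in [0, 1]

low = render_png(dpi=50)
high = render_png(dpi=100)
clear = render_png(dpi=50, transparent=True)

print(low.shape)   # (150, 200, 4): 3 in x 50 dpi rows, 4 in x 50 dpi columns
print(high.shape)  # (300, 400, 4)
print(low[0, 0, 3], clear[0, 0, 3])  # corner alpha: 1.0 (opaque) vs 0.0 (transparent)
```

Doubling `dpi` doubles both pixel dimensions while the figure size in inches, and therefore the layout, stays the same.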

In ``savefig()``, the file format is inferred from the extension of the given filename.
Depending on what backends you have installed, many different file formats are available.
The list of supported file types can be found for your system by using the following method of the figure canvas object:

In [None]:
fig.canvas.get_supported_filetypes()
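To see the extension-based inference at work, the sketch below saves the same figure as `.png`, `.pdf` and `.svg` into a temporary directory and inspects the first bytes of each file. It also shows that an explicit `format=` argument takes precedence over the file name (the file names are arbitrary choices).

```python
import os
import tempfile
from matplotlib.figure import Figure

fig = Figure(figsize=(2, 2))
fig.add_subplot().plot([0, 1], [1, 0])

# the first bytes of each file are a format-specific signature
signatures = {}
with tempfile.TemporaryDirectory() as tmpdir:
    for name in ["fig.png", "fig.pdf", "fig.svg"]:
        path = os.path.join(tmpdir, name)
        fig.savefig(path)  # format chosen from the extension
        with open(path, "rb") as f:
            signatures[name] = f.read(8)

    # an explicit format= wins over an unrecognized extension
    odd_path = os.path.join(tmpdir, "figure.data")
    fig.savefig(odd_path, format="png")
    with open(odd_path, "rb") as f:
        signatures["figure.data"] = f.read(8)

for name, head in signatures.items():
    print(name, head)
```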

## What I didn't talk about

- Density and contour plots
- Customizing colormaps
- Customizing ticks
- Geographic data with Basemap
...

## Other Python Graphics Libraries

This notebook only uses matplotlib (or matplotlib-based plotting). Matplotlib is still the most prominent Python visualization library and the main plotting tool for many scientists, but it is not the only one: several more modern tools are worth exploring as well.

- [Seaborn](https://seaborn.pydata.org/) provides high-level commands to create a variety of plot types useful for statistical data exploration, and even some statistical model fitting.
- [Bokeh](http://bokeh.pydata.org) is a JavaScript visualization library with a Python frontend that creates highly interactive visualizations capable of handling very large and/or streaming datasets. The Python front-end outputs a JSON data structure that can be interpreted by the Bokeh JS engine.
- [Plotly](http://plot.ly) is the eponymous open source product of the Plotly company, and is similar in spirit to Bokeh. Because Plotly is the main product of a startup, it is receiving a high level of development effort. Use of the library is entirely free.
- [Vispy](http://vispy.org/) is an actively developed project focused on dynamic visualizations of very large datasets. Because it is built to target OpenGL and make use of efficient graphics processors in your computer, it is able to render some quite large and stunning visualizations.
- [Vega](https://vega.github.io/) and [Vega-Lite](https://vega.github.io/vega-lite) are declarative graphics representations, and are the product of years of research into the fundamental language of data visualization. The reference rendering implementation is JavaScript, but the API is language agnostic. There is a Python API under development in the [Altair](https://altair-viz.github.io/) package.
- ...

Jake VanderPlas gave a nice overview of the landscape of Python visualization tools at PyCon 2017: https://speakerdeck.com/jakevdp/pythons-visualization-landscape-pycon-2017
 

### Seaborn: Statistical Exploration

In [None]:
import seaborn as sns

* Built on top of Matplotlib, but providing
    1. High level functions
    2. Much cleaner default figures
* Works well with Pandas

We will use the Titanic example again:

In [None]:
titanic = pd.read_csv('data/titanic.csv')

In [None]:
titanic.head()

#### Histograms, KDE, and densities

**Histogram**: the univariate distribution of `Age`

In [None]:
fig, ax = plt.subplots()
age = titanic["Age"].dropna()  # drop missing ages: seaborn's distribution plots do not handle NaN values
sns.histplot(age, stat="density", kde=True, color='#1f77b4', ax=ax)  # density histogram plus a KDE curve

#sns.rugplot(age, color="g", ax=ax) # rugplot draws a tick at each individual data point
ax.set_ylabel("Density")
plt.show()
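A density-normalized histogram, the kind drawn together with a KDE curve above, rescales each bar to count / (sample size × bin width), so the total bar area is 1 and the bars are on the same scale as the KDE. A minimal numpy sketch of that normalization, using made-up ages rather than the Titanic data:

```python
import numpy as np

rng = np.random.default_rng(0)
ages = rng.normal(30, 14, size=500).clip(0, 80)  # synthetic stand-in for titanic["Age"]

counts, edges = np.histogram(ages, bins=20)
density, _ = np.histogram(ages, bins=20, density=True)
widths = np.diff(edges)

# density = count / (number of samples * bin width)
manual = counts / (counts.sum() * widths)
print(np.allclose(manual, density))  # True
print((density * widths).sum())      # total bar area: 1 (up to floating-point rounding)
```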

<div class="alert alert-info">

 <b>Remember</b>: 

As with pandas above, we can set up a matplotlib `figure` and `axes`, draw the seaborn output into them (via `ax=`), and adapt the result afterwards.

</div>

Compare two variables (**scatter-plot**):

In [None]:
g = sns.jointplot(x="Fare", y="Age", 
                  data=titanic, 
                  kind="scatter")  # alternatives: kind="kde" or kind="hex"

In [None]:
data = np.random.multivariate_normal([0, 0], [[5, 2], [2, 2]], size=200)
data = pd.DataFrame(data, columns=['x', 'y'])

In [None]:
sns.jointplot(x="x", y="y", data=data, kind='reg');
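With `kind='reg'`, `jointplot` adds a least-squares regression line of `y` on `x`. For the covariance matrix used above, the population slope of that line is Cov(x, y) / Var(x) = 2 / 5 = 0.4, and the correlation is 2 / √(5 × 2) ≈ 0.63. A quick numpy check on a large sample (the seed and the sample size are arbitrary choices):

```python
import numpy as np

cov = np.array([[5, 2], [2, 2]])
rng = np.random.default_rng(0)
sample = rng.multivariate_normal([0, 0], cov, size=100_000)
x, y = sample[:, 0], sample[:, 1]

slope, intercept = np.polyfit(x, y, 1)  # least-squares fit: y ≈ slope * x + intercept
r = np.corrcoef(x, y)[0, 1]

print(cov[0, 1] / cov[0, 0])                       # theoretical slope: 0.4
print(cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1]))  # theoretical correlation
print(slope, r)                                    # sample estimates, close to the values above
```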

#### Pair plots

In [None]:
iris = sns.load_dataset("iris")
iris.head()

In [None]:
sns.pairplot(iris, hue='species', height=2.5);

#### Categorical plots

Categorical plots (`sns.catplot`, called `sns.factorplot` in seaborn versions before 0.9) show the distribution, or a summary, of one variable within the categories defined by other variables:

In [None]:
sns.catplot(x="Sex", 
            y="Fare", 
            col="Pclass", 
            data=titanic,
            kind="point")  # factorplot's old default; other kinds: "strip", "violin", "box", "bar", ...
plt.show()

In [None]:
sns.catplot(x="Sex", y="Fare", col="Pclass", row="Embarked", 
            data=titanic, kind='bar')
plt.show()
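With `kind='bar'` (and `kind='point'`), each bar or point is by default the mean of `y` within its group, with a bootstrapped confidence interval drawn as the error bar. The numbers behind the bars can be reproduced with a pandas `groupby`; a sketch on a tiny invented table that only borrows the Titanic column names:

```python
import pandas as pd

toy = pd.DataFrame({
    "Sex": ["male", "female", "male", "female", "male", "female"],
    "Pclass": [1, 1, 3, 3, 3, 1],
    "Fare": [80.0, 100.0, 8.0, 12.0, 10.0, 60.0],
})

# one value per (facet, x category): the default estimator is the mean
means = toy.groupby(["Pclass", "Sex"])["Fare"].mean()
print(means)  # (1, female) -> 80.0, (1, male) -> 80.0, (3, female) -> 12.0, (3, male) -> 9.0
```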

In [None]:
g = sns.catplot(x="Survived", y="Fare", hue="Sex",
                col="Embarked", data=titanic, 
                kind="box", height=4, aspect=.5);
g.fig.set_figwidth(15)
g.fig.set_figheight(6)

### Bokeh: Interactive, web-based Visualization

In [None]:
from bokeh.io import output_notebook, show
from bokeh.plotting import figure

output_notebook()

In [None]:
from IPython.display import HTML
HTML('<iframe src="https://bokeh.pydata.org/en/latest/docs/gallery.html" width=1000 height=500></iframe>')

In [None]:
from bokeh.plotting import figure, show

# prepare some data
x = [0.1, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
y0 = [i**2 for i in x]
y1 = [10**i for i in x]
y2 = [10**(i**2) for i in x]

# output to the notebook
output_notebook()

# create a new plot
p = figure(
   tools="pan,box_zoom,reset,save",
   y_axis_type="log", y_range=[0.001, 10**11], title="log axis example",
   x_axis_label='sections', y_axis_label='particles'
)

# add some renderers
p.line(x, x, legend_label="y=x")
p.scatter(x, x, legend_label="y=x", fill_color="white", size=8)
p.line(x, y0, legend_label="y=x^2", line_width=3)
p.line(x, y1, legend_label="y=10^x", line_color="red")
p.scatter(x, y1, legend_label="y=10^x", fill_color="red", line_color="red", size=6)
p.line(x, y2, legend_label="y=10^x^2", line_color="orange", line_dash="4 4")

# show the results
show(p)

In [None]:
p = figure(width=500, height=500)

from sklearn.datasets import make_blobs
X, y = make_blobs(n_samples=1000, 
                  centers=3, 
                  random_state=42, 
                  cluster_std=1)

from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=3, random_state=42)
kmeans.fit(X)

centers = kmeans.cluster_centers_

p.circle(X[:, 0], X[:, 1], size=14, alpha=0.2)
#p.circle(centers[:, 0], centers[:, 1], size=kmeans.inertia_/12, alpha=0.2, color='green') 

show(p)

In [None]:
from bokeh.plotting import figure, show
from bokeh.sampledata.iris import flowers


colormap = {'setosa': 'red', 'versicolor': 'green', 'virginica': 'blue'}
colors = [colormap[x] for x in flowers['species']]

p = figure(title = "Iris Morphology")
p.xaxis.axis_label = 'Petal Length'
p.yaxis.axis_label = 'Petal Width'

p.circle(flowers["petal_length"], flowers["petal_width"],
         color=colors, fill_alpha=0.2, size=10)
show(p)

#### Hexbin histogram 2-D

In [None]:
import numpy as np

from bokeh.io import show
from bokeh.plotting import figure
from bokeh.transform import linear_cmap
from bokeh.util.hex import hexbin

n = 50000
x = np.random.standard_normal(n)
y = np.random.standard_normal(n)

bins = hexbin(x, y, 0.1)

p = figure(tools="wheel_zoom,reset", match_aspect=True, background_fill_color='#440154')
p.grid.visible = False

p.hex_tile(q="q", r="r", size=0.1, line_color=None, source=bins,
           fill_color=linear_cmap('counts', 'Viridis256', 0, max(bins.counts)))

show(p)
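`hexbin` from `bokeh.util.hex` groups the points into hexagonal tiles and returns a table of tile coordinates (`q`, `r`) with a `counts` column. Matplotlib offers a comparable 2-D hexagonal histogram through `Axes.hexbin`. The sketch below uses it in place of Bokeh (a swapped-in equivalent, not the code above; the tile-size parameters of the two libraries differ) to check that every point lands in exactly one tile:

```python
import numpy as np
from matplotlib.figure import Figure

n = 50_000
rng = np.random.default_rng(0)
x = rng.standard_normal(n)
y = rng.standard_normal(n)

fig = Figure()
ax = fig.add_subplot()
hb = ax.hexbin(x, y, gridsize=40, cmap="viridis")  # 40 hexagons across the x range
counts = hb.get_array()                             # one count per hexagon

print(len(counts), counts.sum())  # number of tiles, total count (equal to n)
```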

#### Image rendering

In [None]:
import numpy as np

# set up some data
N = 20
img = np.empty((N,N), dtype=np.uint32)
view = img.view(dtype=np.uint8).reshape((N, N, 4))
for i in range(N):
    for j in range(N):
        view[i, j, 0] = int(i/N*255) # red
        view[i, j, 1] = 158          # green
        view[i, j, 2] = int(j/N*255) # blue
        view[i, j, 3] = 255          # alpha
        
# create a new plot (with a fixed range) using figure
p = figure(x_range=[0,10], y_range=[0,10])

# add an RGBA image renderer
p.image_rgba(image=[img], x=[0], y=[0], dw=[10], dh=[10])

show(p) # show the results
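The nested loops above fill the four bytes (red, green, blue, alpha) of each `uint32` pixel one by one. The same image can be built with NumPy broadcasting, which is much faster for large `N`; this sketch compares both constructions (the function names are illustrative):

```python
import numpy as np

def rgba_loop(N):
    """The nested-loop construction used above."""
    img = np.empty((N, N), dtype=np.uint32)
    view = img.view(dtype=np.uint8).reshape((N, N, 4))
    for i in range(N):
        for j in range(N):
            view[i, j, 0] = int(i / N * 255)  # red
            view[i, j, 1] = 158               # green
            view[i, j, 2] = int(j / N * 255)  # blue
            view[i, j, 3] = 255               # alpha
    return img

def rgba_vectorized(N):
    """Same image, filled with broadcasting instead of Python loops."""
    img = np.empty((N, N), dtype=np.uint32)
    view = img.view(dtype=np.uint8).reshape((N, N, 4))
    ramp = (np.arange(N) / N * 255).astype(np.uint8)  # truncates toward zero, like int()
    view[:, :, 0] = ramp[:, None]  # red varies along rows (i)
    view[:, :, 1] = 158
    view[:, :, 2] = ramp[None, :]  # blue varies along columns (j)
    view[:, :, 3] = 255
    return img

print(np.array_equal(rgba_loop(20), rgba_vectorized(20)))  # True
```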

#### Selecting data

In [None]:
p = figure(width=400, height=400, tools="tap", title="Select a circle")
renderer = p.circle([1, 2, 3, 4, 5], [2, 5, 8, 2, 7], size=50,

                    # set visual properties for selected glyphs
                    selection_color="firebrick",

                    # set visual properties for non-selected glyphs
                    nonselection_fill_alpha=0.2,
                    nonselection_fill_color="grey",
                    nonselection_line_color="firebrick",
                    nonselection_line_alpha=1.0)

show(p)

In [None]:
from bokeh.models.tools import HoverTool
from bokeh.sampledata import download
# download()  # uncomment on first use to fetch Bokeh's sample data sets
from bokeh.sampledata.glucose import data

subset = data.loc['2010-10-06']

x, y = subset.index.to_series(), subset['glucose']

# Basic plot setup
p = figure(width=600, height=300, x_axis_type="datetime", title='Hover over points')

p.line(x, y, line_dash="4 4", line_width=1, color='gray')

cr = p.circle(x, y, size=20,
              fill_color="grey", hover_fill_color="firebrick",
              fill_alpha=0.05, hover_alpha=0.3,
              line_color=None, hover_line_color="white")

p.add_tools(HoverTool(tooltips=None, renderers=[cr], mode='hline'))

show(p)

For a live example of linked selections and histograms, see the Bokeh demo at https://demo.bokeh.org/selection_histogram

In [None]:
import pandas as pd

from bokeh.plotting import figure
from bokeh.sampledata.stocks import AAPL

df = pd.DataFrame(AAPL)
df['date'] = pd.to_datetime(df['date'])

In [None]:
p = figure(width=800, height=250, x_axis_type="datetime")
p.line(df['date'], df['close'], color='navy', alpha=0.5)

show(p)

#### Web app with Bokeh!

In [None]:
%%writefile hello.py
# hello.py 

from bokeh.io import curdoc
from bokeh.layouts import column
from bokeh.models.widgets import TextInput, Button, Paragraph

# create some widgets
button = Button(label="Say HI")
name_input = TextInput(value="Student!")  # avoid shadowing the builtin input()
output = Paragraph()

# add a callback to a widget
def update():
    output.text = "Hello, " + name_input.value
button.on_click(update)

# create a layout for everything
layout = column(button, name_input, output)

# add the layout to curdoc
curdoc().add_root(layout)



In [None]:
!bokeh serve --show hello.py

In [None]:
%%writefile app.py
# app.py

from numpy.random import random

from bokeh.io import curdoc
from bokeh.layouts import column, row
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
from bokeh.models.widgets import Select, TextInput

def get_data(N):
    return dict(x=random(size=N), y=random(size=N), r=random(size=N) * 0.03)

source = ColumnDataSource(data=get_data(200))

p = figure(tools="", toolbar_location=None)
r = p.circle(x='x', y='y', radius='r', source=source,
             color="navy", alpha=0.6, line_color="white")

COLORS = ["black", "firebrick", "navy", "olive", "goldenrod"]
select = Select(title="Color", value="navy", options=COLORS)
n_input = TextInput(title="Number of points", value="200")  # avoid shadowing the builtin input()

def update_color(attrname, old, new):
    r.glyph.fill_color = select.value
select.on_change('value', update_color)

def update_points(attrname, old, new):
    N = int(n_input.value)
    source.data = get_data(N)
n_input.on_change('value', update_points)

layout = column(row(select, n_input, width=400), row(p))

curdoc().add_root(layout)


In [None]:
!bokeh serve --show app.py

In [None]:
%%writefile app2.py
# app2.py

from random import random

from bokeh.layouts import column
from bokeh.models import Button
from bokeh.palettes import RdYlBu3
from bokeh.plotting import figure, curdoc

# create a plot and style its properties
p = figure(x_range=(0, 100), y_range=(0, 100), toolbar_location=None)
p.border_fill_color = 'black'
p.background_fill_color = 'black'
p.outline_line_color = None
p.grid.grid_line_color = None

# add a text renderer to our plot (no data yet)
r = p.text(x=[], y=[], text=[], text_color=[], text_font_size="20pt",
           text_baseline="middle", text_align="center")

i = 0

ds = r.data_source

# create a callback that will add a number in a random location
def callback():
    global i

    # BEST PRACTICE --- update .data in one step with a new dict
    new_data = dict()
    new_data['x'] = ds.data['x'] + [random()*70 + 15]
    new_data['y'] = ds.data['y'] + [random()*70 + 15]
    new_data['text_color'] = ds.data['text_color'] + [RdYlBu3[i%3]]
    new_data['text'] = ds.data['text'] + [str(i)]
    ds.data = new_data

    i = i + 1

# add a button widget and configure with the call back
button = Button(label="Press Me")
button.on_click(callback)

# put the button and plot in a layout and add to the document
curdoc().add_root(column(button, p))

In [None]:
!bokeh serve --show app2.py

In [None]:
%%writefile stream.py
# stream.py
from math import cos, sin

from bokeh.io import curdoc
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

p = figure(x_range=(-1.1, 1.1), y_range=(-1.1, 1.1))
p.circle(x=0, y=0, radius=1, fill_color=None, line_width=2)

# this is the data source we will stream to
source = ColumnDataSource(data=dict(x=[1], y=[0]))
p.circle(x='x', y='y', size=12, fill_color='white', source=source)

def update():
    x, y = source.data['x'][-1], source.data['y'][-1]

    # construct the new values for all columns, and pass to stream
    new_data = dict(x=[x*cos(0.1) - y*sin(0.1)], y=[x*sin(0.1) + y*cos(0.1)])
    source.stream(new_data, rollover=8)

curdoc().add_periodic_callback(update, 150)
curdoc().add_root(p)


In [None]:
!bokeh serve --show stream.py
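In `stream.py`, each periodic callback rotates the last point by 0.1 rad around the origin, and `source.stream(..., rollover=8)` keeps only the 8 most recent points. A minimal pure-Python sketch of that logic without Bokeh, where a `deque(maxlen=8)` stands in for the rollover behavior (an analogy, not Bokeh's implementation):

```python
from collections import deque
from math import atan2, cos, hypot, sin

theta = 0.1                             # rotation step per update (radians), as in stream.py
points = deque([(1.0, 0.0)], maxlen=8)  # maxlen plays the role of rollover=8

def update():
    x, y = points[-1]
    # rotate the last point by theta around the origin
    points.append((x * cos(theta) - y * sin(theta), x * sin(theta) + y * cos(theta)))

for _ in range(20):  # simulate 20 periodic callbacks
    update()

x_last, y_last = points[-1]
print(len(points))               # 8: older points were dropped from the front
print(hypot(x_last, y_last))     # stays on the unit circle (about 1)
print(atan2(y_last, x_last))     # about 20 * 0.1 = 2.0 rad
```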

---
# Acknowledgement


> This notebook is partly based on material of © 2016, Joris Van den Bossche and Stijn Van Hoey (<mailto:jorisvandenbossche@gmail.com>, <mailto:stijnvanhoey@gmail.com>), licensed under [CC BY 4.0 Creative Commons](http://creativecommons.org/licenses/by/4.0/), and partly on material of the Met Office (Copyright (C) 2013 SciTools, GPL licensed): https://github.com/SciTools/courses
