# Python Modules - Matplotlib and Seaborn

*Dr Chas Nelson and Mikolaj Kundegorski*

*Part of https://github.com/ChasNelson1990/python-zero-to-hero-beginners-course*

## Objectives

* Know about the plotting functions provided by Matplotlib (`matplotlib`)
* Know about the plotting functions provided by Seaborn (`seaborn`)
* Know how to plot a scatterplot (with a regression model) with Seaborn
* Know how to plot boxplots with Seaborn
* Know how to edit and save plots with Matplotlib
* See that it is possible to do more complex plotting with Seaborn

## Matplotlib

Matplotlib (`matplotlib`) is the most widely used scientific plotting module in Python. Many other modules are built upon Matplotlib and we will explore one of these in particular: Seaborn.

Matplotlib is a huge module and we will only introduce you to a few plotting tools today.

To make Jupyter display plots inside the notebook, including plots that are only saved with `plt.savefig()`, we use a 'magic' command: `%matplotlib inline`

Most of the functions we will need are in the `matplotlib.pyplot` submodule - so we will only import that today.

<div style="border: 3px solid #d95f02; border-radius: 5px; padding: 10pt"><strong>Exercise 12.1:</strong> In the cell below, add a line to import the <code>matplotlib.pyplot</code> submodule. It is conventional to give this the alias <code>plt</code>. In the same cell import the <code>pandas</code> module.
<br/>
If you get stuck, see the video <a href='https://youtu.be/8eMkAYZYGEs'>here</a> for a walkthrough, which also covers the next task.</div>

<div style="border: 3px solid #d95f02; border-radius: 5px; padding: 10pt"><strong>Exercise 12.2:</strong> Find the Matplotlib Documentation on-line. Can you easily navigate the documentation to find useful functions such as <code>plot</code>?
<br/>
When you've done this and the previous task, or if you get stuck, see the video <a href='https://youtu.be/8eMkAYZYGEs'>here</a> for a walkthrough, which also includes the previous task.</div>

In [None]:
import pandas as pd

# Add your imports here


iris = pd.read_csv("https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv")
display(iris.head())

### Scatter Plotting with Matplotlib

Plotting with Matplotlib is powerful but can be complicated (especially when you first start).

The basic framework for a Matplotlib figure is the following:

```python
plt.figure()

# <FIGURE CODE>: this is where line and scatter plots are added

plt.legend()

plt.title('Plot Title')
plt.xlabel('X Axis Label')
plt.ylabel('Y Axis Label')

plt.savefig('myplot.png')  # To save the plot
plt.show()  # To display the plot

```
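
As a minimal sketch of this framework in use, the cell below fills in the figure code with one line plot and one scatter plot. The data values, colours and the file name `myplot_example.png` are made up for illustration and are not part of the iris dataset. `plt.show()` is left out here because a notebook displays the figure automatically; in a script you would add it after `plt.savefig()`.

```python
import matplotlib.pyplot as plt

# Made-up example data (not from the iris dataset)
x = [1, 2, 3, 4, 5]
y_line = [2, 4, 6, 8, 10]
y_points = [2.3, 3.8, 6.1, 7.7, 10.4]

# Create a figure that's 5 by 5 inches
fig = plt.figure(figsize=[5, 5])

# Figure code: one line plot and one scatter plot, each with a legend label
plt.plot(x, y_line, color="#1b9e77", label="Line")
plt.scatter(x, y_points, color="#d95f02", label="Points")

# The legend uses the 'label' given to each plot above
plt.legend()

plt.title("Plot Title")
plt.xlabel("X Axis Label")
plt.ylabel("Y Axis Label")

# Save the plot to a file
plt.savefig("myplot_example.png")
```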

<div style="border: 3px solid #d95f02; border-radius: 5px; padding: 10pt"><strong>Exercise 12.3:</strong> Run the following cell to show how to create a scatter plot between two variables for each iris species (using a different colour) with a linear regression model fit for each. Don't worry about understanding everything - this is just to show you the complexities of plotting.
<br/>
When you've done this, or if you get stuck, see the video <a href='https://youtu.be/wLPjm0XR7es'>here</a> for a walkthrough.</div>

In [None]:
# Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy.stats

# Create a figure that's 5 by 5 inches
plt.figure(figsize=[5, 5])

# Create a mask for each species
mask_setosa = iris.loc[:, "species"] == "setosa"
mask_versicolor = iris.loc[:, "species"] == "versicolor"
mask_virginica = iris.loc[:, "species"] == "virginica"

# Plot a scatter for each species in a unique colour showing sepal_length against sepal_width
plt.scatter(
    iris.loc[mask_setosa, "sepal_length"],
    iris.loc[mask_setosa, "sepal_width"],
    color="#FF0000",
    label="Setosa",
)
plt.scatter(
    iris.loc[mask_versicolor, "sepal_length"],
    iris.loc[mask_versicolor, "sepal_width"],
    color="#00FF00",
    label="Versicolor",
)
plt.scatter(
    iris.loc[mask_virginica, "sepal_length"],
    iris.loc[mask_virginica, "sepal_width"],
    color="#0000FF",
    label="Virginica",
)

# Calculate a linear regression model for each species
(
    slope_setosa,
    intercept_setosa,
    r_value_setosa,
    p_value_setosa,
    std_err_setosa,
) = scipy.stats.linregress(iris.loc[mask_setosa, "sepal_length"], iris.loc[mask_setosa, "sepal_width"])
(
    slope_versicolor,
    intercept_versicolor,
    r_value_versicolor,
    p_value_versicolor,
    std_err_versicolor,
) = scipy.stats.linregress(iris.loc[mask_versicolor, "sepal_length"], iris.loc[mask_versicolor, "sepal_width"])
(
    slope_virginica,
    intercept_virginica,
    r_value_virginica,
    p_value_virginica,
    std_err_virginica,
) = scipy.stats.linregress(iris.loc[mask_virginica, "sepal_length"], iris.loc[mask_virginica, "sepal_width"])

# Plot a line for each model over the range of sepal lengths using the colours from the appropriate scatter
x_setosa = np.linspace(
    iris.loc[mask_setosa, "sepal_length"].min(),
    iris.loc[mask_setosa, "sepal_length"].max(),
    100,
)
y_setosa = slope_setosa * x_setosa + intercept_setosa
plt.plot(x_setosa, y_setosa, color="#FF0000")
x_versicolor = np.linspace(
    iris.loc[mask_versicolor, "sepal_length"].min(),
    iris.loc[mask_versicolor, "sepal_length"].max(),
    100,
)
y_versicolor = slope_versicolor * x_versicolor + intercept_versicolor
plt.plot(x_versicolor, y_versicolor, color="#00FF00")
x_virginica = np.linspace(
    iris.loc[mask_virginica, "sepal_length"].min(),
    iris.loc[mask_virginica, "sepal_length"].max(),
    100,
)
y_virginica = slope_virginica * x_virginica + intercept_virginica
plt.plot(x_virginica, y_virginica, color="#0000FF")

# Add a legend
plt.legend()

# Add a title and axis labels
plt.title("Sepal length against sepal width")
plt.ylabel("Sepal width")
plt.xlabel("Sepal length")

plt.savefig("my_matplotlib_figure.png")
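
Much of the cell above repeats the same three steps (scatter, regression, line) once per species. One possible way to shorten it is a loop over groups, sketched below. To keep the sketch self-contained it uses a small made-up table with hypothetical column names (`group`, `x`, `y`) rather than the iris data, and the `fits` dictionary and colour choices are assumptions for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scipy.stats

# Made-up data: two groups that each lie exactly on a straight line
data = pd.DataFrame(
    {
        "group": ["a", "a", "a", "b", "b", "b"],
        "x": [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
        "y": [3.0, 5.0, 7.0, 2.0, 1.0, 0.0],
    }
)
colours = {"a": "#FF0000", "b": "#0000FF"}

fig = plt.figure(figsize=[5, 5])

fits = {}  # Stores (slope, intercept) for each group
for name, subset in data.groupby("group"):
    # Scatter plot for this group
    plt.scatter(subset["x"], subset["y"], color=colours[name], label=name)

    # Linear regression for this group
    result = scipy.stats.linregress(subset["x"], subset["y"])
    fits[name] = (result.slope, result.intercept)

    # Regression line over the range of this group's x values
    x_fit = np.linspace(subset["x"].min(), subset["x"].max(), 100)
    plt.plot(x_fit, result.slope * x_fit + result.intercept, color=colours[name])

plt.legend()
```

The same pattern should carry over to the iris data by grouping on `species`, although the explicit version above is easier to read line by line.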

## Seaborn

I'm sure we all agree that that's quite a lot of code - and quite daunting if you've never seen it before. But don't worry! Seaborn is here to make your life easier.

Matplotlib is an extremely powerful module. However, it can be complex, so some packages, like Seaborn, build upon Matplotlib to make plotting a little quicker and easier.

### Scatter Plotting with `seaborn`

Let's start by recreating the plot above.

<div style="border: 3px solid #d95f02; border-radius: 5px; padding: 10pt"><strong>Exercise 12.4:</strong> Run the following cell to show how to import <code>seaborn</code> and create a scatter plot between two variables for each iris species (using a different colour) with a linear regression model fit for each.
<br/>
When you've done this, or if you get stuck, see the video <a href='https://youtu.be/YQrCY9YWUr0'>here</a> for a walkthrough.</div>

In [None]:
# Imports
import seaborn as sns

# Create a plot of sepal_length vs sepal_width where colour (hue) is controlled by the species
#
# 'height' controls the figure height in inches
# 'truncate' prevents the regression extending beyond the data
sns.lmplot(x="sepal_length", y="sepal_width", data=iris, hue="species", height=5, truncate=True)

# Save figure
plt.savefig("my_seaborn_figure.png")

### Faceted plotting with Seaborn

Isn't that a lot simpler?!

Seaborn is doing all the hard work for you - it creates the figure, the scatter plots, the legend and it does the regression and plots the model with error bounds too.

But what if we want to split the data across three plots? Again, Seaborn comes to the rescue.

<div style="border: 3px solid #d95f02; border-radius: 5px; padding: 10pt"><strong>Exercise 12.5:</strong> Compare the following code cell to the code cell above. Can you spot the difference? Run the cell to show how easy it is to create a faceted plot (which is what this is called).
<br/>
When you've done this, or if you get stuck, see the video <a href='https://youtu.be/rLcfMBMMNKM'>here</a> for a walkthrough.</div>

In [None]:
# Imports
import seaborn as sns

# Create a plot of sepal_length vs sepal_width where colour (hue) is controlled by the species
# height controls the figure height in inches
# truncate prevents the regression extending beyond the data
sns.lmplot(
    x="sepal_length",
    y="sepal_width",
    data=iris,
    hue="species",
    col="species",
    height=5,
    truncate=True,
)

# Save figure
plt.savefig("my_faceted_seaborn_figure.png")

## Boxplots

Scatter and line plots are all part of Seaborn's relational plot tools. But sometimes we have categorical data (such as species) and might want to use box plots to explore this data.

<div style="border: 3px solid #d95f02; border-radius: 5px; padding: 10pt"><strong>Exercise 12.6:</strong> Read the cell below. This cell aims to create boxplots using <code>seaborn</code> for each measurement of each species (each species should be a different colour). Create a new Markdown cell below and write down, in plain English, what each line is doing. What does <code>.melt()</code> do and why is it needed?
<br/>
When you've done this, or if you get stuck, see the video <a href='https://youtu.be/8HMumN8EGYo'>here</a> for a walkthrough.</div>

In [None]:
# 'Melt' the data
iris_melted = iris.melt(
    id_vars="species",
    value_vars=["sepal_length", "sepal_width", "petal_length", "petal_width"],
    var_name="measure",
    value_name="measurement",
)

# Plot the melted data
sns.catplot(
    x="species",
    y="measurement",
    col="measure",
    data=iris_melted,
    kind="box",
    height=5,
    aspect=0.5,
)

# Save the plot
plt.savefig("my_seaborn_boxplot.png")
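
If the effect of `.melt()` is hard to picture, it may help to try it on a tiny table first. The sketch below uses a made-up two-row table (the values and the names `wide` and `long_table` are for illustration only) and the same arguments as the cell above, but with just two measurement columns.

```python
import pandas as pd

# Made-up 'wide' table: one row per flower, one column per measurement
wide = pd.DataFrame(
    {
        "species": ["setosa", "virginica"],
        "sepal_length": [5.1, 6.3],
        "sepal_width": [3.5, 3.3],
    }
)

# 'Melt' into a 'long' table: one row per (flower, measurement) pair
long_table = wide.melt(
    id_vars="species",
    value_vars=["sepal_length", "sepal_width"],
    var_name="measure",
    value_name="measurement",
)

print(wide)
print(long_table)
```

Comparing the two printed tables shows that the measurement column names have become values in the `measure` column, which is what lets `sns.catplot` use `col="measure"`.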

## Plotting Contexts

Finally, we often make plots for different purposes, and Seaborn has, yet again, got us covered.

<div style="border: 3px solid #d95f02; border-radius: 5px; padding: 10pt"><strong>Exercise 12.7:</strong> Run the following cell to show how the same scatter plot as above can be easily replicated with subtle display differences for four different contexts.
<br/>
When you've done this, or if you get stuck, see the video <a href='https://youtu.be/HSj180pc0q4'>here</a> for a walkthrough.</div>

In [None]:
with sns.plotting_context("notebook"):
    sns.lmplot(
        x="sepal_length",
        y="sepal_width",
        data=iris,
        hue="species",
        height=5,
        truncate=True,
    )

with sns.plotting_context("paper"):
    sns.lmplot(
        x="sepal_length",
        y="sepal_width",
        data=iris,
        hue="species",
        height=5,
        truncate=True,
    )

with sns.plotting_context("talk"):
    sns.lmplot(
        x="sepal_length",
        y="sepal_width",
        data=iris,
        hue="species",
        height=5,
        truncate=True,
    )

with sns.plotting_context("poster"):
    sns.lmplot(
        x="sepal_length",
        y="sepal_width",
        data=iris,
        hue="species",
        height=5,
        truncate=True,
    )
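
To see what a context actually changes, one option is to inspect the settings that `sns.plotting_context()` returns when it is called with a context name. The sketch below looks only at the `font.size` setting, as one example of the scaled parameters; the `font_sizes` dictionary is just a helper for this illustration.

```python
import seaborn as sns

font_sizes = {}  # Base font size for each context
for context in ["paper", "notebook", "talk", "poster"]:
    # Returns a dictionary of Matplotlib settings for this context
    settings = sns.plotting_context(context)
    font_sizes[context] = settings["font.size"]
    print(context, settings["font.size"])
```

The printed sizes should increase from `paper` through to `poster`, which matches the differences visible in the four plots above.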

## Key Points

* `matplotlib` adds plotting functionality to your Python code
* `seaborn` makes plotting lots of data very quick and easy
* `matplotlib` can be used to modify plots produced by `seaborn`
* `sns.plotting_context()` can be used to create different plots for different purposes
* Knowing how to plot exactly what you want will come with time, practice and a bit of on-line searching!
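
As a rough sketch of the third point, the object returned by `sns.lmplot` gives access to the underlying Matplotlib axes, which can then be edited with ordinary Matplotlib methods. The data, labels and file name below are made up for illustration, and this is only one of several ways to edit a Seaborn plot.

```python
import pandas as pd
import seaborn as sns

# Made-up data for illustration (not the iris dataset)
data = pd.DataFrame(
    {
        "x": [1, 2, 3, 4, 5, 6],
        "y": [1.2, 1.9, 3.2, 3.8, 5.1, 6.2],
    }
)

# Seaborn creates the figure, the scatter plot and the regression line
grid = sns.lmplot(x="x", y="y", data=data, height=5)

# The single Matplotlib axes inside the Seaborn grid
ax = grid.ax

# Edit the plot with Matplotlib methods
ax.set_title("Edited with Matplotlib")
ax.set_xlabel("X value (made-up units)")
ax.set_ylim(0, 8)

# Save the edited figure
grid.savefig("my_edited_seaborn_figure.png")
```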

## Any Bugs/Issues/Comments?

If you've found a bug or have any comments about this notebook, please fill out this on-line form: https://forms.gle/tp2veeF8e7fbQMvY6.

Any feedback we get we will try to correct/implement as soon as possible.