![Matplotlib logo](img/matplotlib.svg)

In [None]:
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from src.training import *
from src.matplotlib_exercises import *

# Subplots and Placement

We created some simple plots in the last module.  Let us start to arrange them with some helpful visual details.  One common device is to create multiple subplots that express different facets or ranges of the same dataset—or of related datasets.

## Using Pandas

Pandas builds in some access to basic subplots.  It is not as customizable as the underlying Matplotlib, but it is worth knowing about.

In [None]:
df = (pd.read_csv('data/verlegenhuken.csv', 
                  parse_dates=True, 
                  index_col='DATE')
      .sort_index()
      .asfreq('D')
      .interpolate()
     )
df.loc[:, ['TEMP', 'DEWP', 'WDSP']].plot();

In [None]:
df.loc[:, ['TEMP', 'DEWP', 'WDSP']].plot(subplots=True);

There are advantages and disadvantages to this style that are very specific to your domain and presentation focus.  But the simplest API is as presented, and it does "pretty well" with almost no work.
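
One detail worth knowing is that `.plot(subplots=True)` hands back the underlying Matplotlib axis objects, so the quick Pandas version can still be adjusted afterwards. The following is a minimal sketch under stated assumptions: the `demo` DataFrame holds synthetic, made-up values that merely stand in for the weather data, and the labels are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter
import numpy as np
import pandas as pd

# Synthetic stand-in for the weather data (values are invented)
rng = np.random.default_rng(0)
index = pd.date_range("2019-01-01", periods=365, freq="D")
season = np.sin(np.linspace(0, 2 * np.pi, 365))
demo = pd.DataFrame(
    {
        "TEMP": 30 + 10 * season,
        "DEWP": 25 + 8 * season,
        "WDSP": rng.uniform(5, 25, size=365),
    },
    index=index,
)

# With subplots=True, .plot() returns an array of Matplotlib axis objects
axes = demo.plot(subplots=True, figsize=(10, 6), sharex=True)

# Each element can be customized like any other axis object
for ax, label in zip(axes, ["Temperature", "Dew point", "Wind speed"]):
    ax.set_ylabel(label)

# The enclosing figure is reachable from any of the axes
fig = axes[0].get_figure()
fig.suptitle("Synthetic weather data")
```

This keeps the one-line convenience of Pandas while leaving room for the kind of per-plot adjustments covered in the rest of this module.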

## Subplots with Matplotlib

Let us turn to using the pure-Matplotlib style for these graphs.  We will do several variations using the same data we quickly plotted with Pandas.

As a baseline, we simply want several arrays to work with, and will explicitly not use any Pandas special capabilities.  When we read in the data, we filled in missing days and missing values.  For the "x" axis of graphs we will convert the datetime value into an integer.  In a later module we look at datetime handling, but for now let us only use numbers.

In [None]:
days = df.index.dayofyear.values  # for a complete year, same as np.arange(1, 366)
temp = df.TEMP.values
dewp = df.DEWP.values
wdsp = df.WDSP.values
days[:5], temp[:5], dewp[:5], wdsp[:5]

### The .subplots() function

The most common way to arrange subplots in a grid is using `plt.subplots()`.  The function returns two things:

1. A figure object that defines the entire "canvas"
2. One or more "axis" objects (Matplotlib calls the class `Axes`) that define the individual plots

In the object-oriented style, you will always work with these, rather than with the general `plt` magic module object. Remembering which methods and objects pertain to each is a matter of experience and reading documentation; but there is *some* sense to the patterns.

In [None]:
# Just one axis, set a size in notional "inches"
fig, ax = plt.subplots(figsize=(10,4))
# Put data in the axis object
ax.plot(days, temp)
ax.plot(days, wdsp)
# Modify label of axis object
ax.set_xlabel("Day of Year")
# Title of the entire graph
fig.suptitle('Temperature and wind speed at Verlegenhuken')
# Position from left/bottom as fraction of the figure
fig.text(0.55, 0.15, '(℉ and mph units occupy similar numeric range)')
plt.show()   # Outside Jupyter use this, harmless here

### Multiple subplots

It may seem odd to use `plt.subplots()` when you are only making one plot, but doing so keeps the interface consistent.  The key difference appears when you specify multiple rows or columns: instead of getting one axis object, you get an array of axis objects.  You may unpack that array immediately using Python tuple unpacking, or keep it and index or loop over it later.  The latter is especially useful when the number and details of the plots are determined dynamically.

In [None]:
fig, (ax1, ax2) = plt.subplots(nrows=2, figsize=(10,4))
# Put data in the axis object
ax1.plot(days, temp, color="darkred")
ax2.plot(days, wdsp)
# Modify label of axis object
ax2.set_xlabel("Day of Year")
# Title of the entire graph
fig.suptitle('Temperature and wind speed at Verlegenhuken')
plt.show()   # Outside Jupyter use this, harmless here

In [None]:
fig, axes = plt.subplots(ncols=2, figsize=(10,4))
# Put data in the axis object
axes[0].plot(days, temp, color="darkred")
axes[0].set_title("Temperature ℉")
axes[1].plot(days, wdsp)
axes[1].set_title("Wind speed (mph)")
# Title of the entire graph
fig.suptitle('Weather at Verlegenhuken')
plt.show()   # Outside Jupyter use this, harmless here
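
The loop-based alternative to tuple unpacking mentioned above might look like the following. This is only a sketch: the `series` dictionary and its values are synthetic placeholders, not the Verlegenhuken data, and the number of rows simply follows however many series the dictionary holds.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-ins for the arrays used in this module (values are invented)
days = np.arange(1, 366)
series = {
    "Temperature": 30 + 10 * np.sin(2 * np.pi * days / 365),
    "Dew point": 25 + 8 * np.sin(2 * np.pi * days / 365),
    "Wind speed": 15 + 5 * np.cos(2 * np.pi * days / 365),
}

# One row per series; squeeze=False always returns a 2-D array of axes,
# so the same code works even when there is only a single series
fig, axes = plt.subplots(nrows=len(series), figsize=(10, 6),
                         sharex=True, squeeze=False)

# .flat iterates over every axis object regardless of grid shape
for ax, (name, values) in zip(axes.flat, series.items()):
    ax.plot(days, values)
    ax.set_title(name)

# Only the bottom plot needs the shared x label
axes[-1, 0].set_xlabel("Day of Year")
fig.tight_layout()
```

Passing `squeeze=False` is a design choice rather than a requirement; without it, a single row or column comes back as a 1-D array and a single plot comes back as a bare axis object.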

We can also specify both rows and columns of miniature plots.  This is often useful in faceting data, for example.

In [None]:
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(10, 8))
axes[0, 0].hist(temp, density=True, bins=20, 
                orientation='horizontal', color='darkred')
axes[0, 0].set_title("Distribution")
axes[0, 0].set_ylabel("Temperature")
axes[0, 0].set_xlabel("Fraction of Days")
axes[0, 1].plot(days, temp, color='darkred')
axes[0, 1].set_title("Day of 2019")
axes[1, 0].hist(wdsp, orientation='horizontal')
axes[1, 0].set_ylabel("Wind Speed")
axes[1, 0].set_xlabel("Count of Days")
axes[1, 1].plot(days, wdsp)
fig.tight_layout()

## Non-Uniform Grids

Notice that we were able to place different kinds of plots inside the various subplot positions.  This applies equally to plots generated by wrappers like Pandas or Seaborn.  We can also create graphs of different sizes and shapes within the same overall figure.

In [None]:
with plt.style.context('ggplot'):
    fig = plt.figure(figsize=(10, 6))
    ax1 = plt.subplot2grid(shape=(2, 3), loc=(0, 0), colspan=3)
    ax1.set_title("Daily Trends over 2019")
    ax1.set_ylabel("temp ℉")
    ax1.plot(days, temp, label="temperature", color="darkred")
    ax1.plot(days, wdsp, label="wind speed", color="steelblue")
    ax1.plot(days, dewp, label="dew point", color="goldenrod")
    ax1.legend(loc=4)  # bottom right legend

    ax2 = plt.subplot2grid(shape=(2, 3), loc=(1, 0))
    ax2.hist(temp, color="darkred")
    ax2.set_title("Temperature Distribution", fontsize=10)

    ax3 = plt.subplot2grid(shape=(2, 3), loc=(1, 1))
    ax3.hist(wdsp, color="steelblue")
    ax3.set_title("Wind Speed Distribution", fontsize=10)
    
    # We use a Seaborn plot in this subplot
    ax4 = plt.subplot2grid(shape=(2, 3), loc=(1, 2))
    sns.kdeplot(dewp, color="goldenrod", fill=True, ax=ax4)  # fill= replaces the deprecated shade=
    ax4.set_title("Dew Point Kernel Density Estimate", fontsize=10)
    
    fig.tight_layout()

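A similar subplot placement manager is `matplotlib.gridspec.GridSpec`.  The `plt.subplot2grid()` function is a convenience wrapper built on top of it; using the class directly takes a little more code but exposes additional options, such as `width_ratios` and `height_ratios` for unequal rows and columns.  Read the documentation to consider this option.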
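
For comparison, here is a rough sketch of a similar two-row layout built with `GridSpec` directly. The arrays are synthetic stand-ins with invented values, and the names `gs`, `ax_top`, and `panels` are just illustrative choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.gridspec import GridSpec

# Synthetic stand-ins for the weather arrays (values are invented)
rng = np.random.default_rng(1)
days = np.arange(1, 366)
temp = 30 + 10 * np.sin(2 * np.pi * days / 365) + rng.normal(0, 2, days.size)
wdsp = rng.uniform(5, 25, days.size)
dewp = temp - rng.uniform(2, 6, days.size)

fig = plt.figure(figsize=(10, 6))
gs = GridSpec(nrows=2, ncols=3, figure=fig)

# Slicing the GridSpec selects a span of cells: the whole first row here
ax_top = fig.add_subplot(gs[0, :])
ax_top.plot(days, temp, color="darkred", label="temperature")
ax_top.plot(days, wdsp, color="steelblue", label="wind speed")
ax_top.plot(days, dewp, color="goldenrod", label="dew point")
ax_top.legend(loc="lower right")

# One histogram per cell of the second row
panels = [
    (temp, "darkred", "Temperature Distribution"),
    (wdsp, "steelblue", "Wind Speed Distribution"),
    (dewp, "goldenrod", "Dew Point Distribution"),
]
for col, (values, color, title) in enumerate(panels):
    ax = fig.add_subplot(gs[1, col])
    ax.hist(values, color=color)
    ax.set_title(title, fontsize=10)

fig.tight_layout()
```

The slice syntax `gs[0, :]` plays the role that `colspan=3` plays in `plt.subplot2grid()`.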

## Special Positions

We can also place axis objects manually within a figure, not following any grid.  This could be used for things like [sparklines](https://en.wikipedia.org/wiki/Sparkline) or other small component graphs. The most common use is probably for insets.

In [None]:
temp.max(), temp.argmax()

In [None]:
with plt.style.context('fivethirtyeight'):
    fig, ax = plt.subplots(figsize=(10, 6))
    ax.plot(days, temp, color="darkred")
    ax.set_title("Daily Temperature Trend, 2019")
    # Position as X, Y, width, height relative to figure size
    inset = fig.add_axes([0.12, 0.65, 0.217, 0.2])
    sns.kdeplot(temp, color="red", fill=True, ax=inset)  # fill= replaces the deprecated shade=
    inset.grid(False)
    inset.yaxis.set_ticklabels([])
    inset.xaxis.set_ticklabels([])
    inset.set_title("KDE", fontsize=10)
    ax.text(213, 44, "← Day 209 was 44.3°", fontsize=10)
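
The label in the cell above uses coordinates typed in by hand after inspecting `temp.max()` and `temp.argmax()`. One possible alternative, sketched here with synthetic data rather than the real series, is to compute the position and text from the arrays so the label follows the data. Note that `argmax()` returns a zero-based array position, which is not the same thing as the day of the year.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-ins for the arrays used above (values are invented)
days = np.arange(1, 366)
temp = 30 + 10 * np.sin(2 * np.pi * (days - 110) / 365)

# argmax() gives a zero-based position; look up the matching day and value
i_max = temp.argmax()
day_max, temp_max = days[i_max], temp[i_max]

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(days, temp, color="darkred")
ax.set_title("Daily Temperature Trend (synthetic)")

# Mark the extreme point and label it, offset slightly from the marker
ax.plot(day_max, temp_max, marker="o", color="black")
ax.annotate(
    f"Day {day_max} was {temp_max:.1f}°",
    xy=(day_max, temp_max),        # the point being described
    xytext=(10, -15),              # label offset, measured in points
    textcoords="offset points",
    fontsize=10,
)
```

Using `ax.annotate()` with `textcoords="offset points"` keeps the label a fixed distance from the marker even if the axis limits change.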

# Exercise

This module will have a single longer exercise, but with some optional elements to refine after the first pass.

* Choose 12 of the 1253 stations in the 2019 NOAA dataset we have worked with.
  * Select 12 in whatever manner you like.  You might consider factors such as:
    * the latitude of each
    * the completeness of data for each
    * the nation(s) each falls within
    * interesting characteristics in the data about that station
  * Extract the data for the various selections into NumPy arrays

* Create 12 subplots arranged into 2 columns and 6 rows.

* Place a graph concerning each station within a subplot.
  * Mean daily temperature?
  * Max daily temperature?
  * Min daily temperature?
  * Other features of the daily data?

* Set appropriate elements of each subplot:
  * Title
  * X axis label
  * Y axis label
  * Legend
  * Tick marks
  * Colors for each
  * Line styles
  * Fonts
  * Etc.

* Create an overall title and/or annotations for the entire figure.

* Create inset plots within each subplot to illustrate another aspect of the data.

* Extra Credit: Identify the highest/lowest temperatures within each series.
  * Create a visual mark to identify the day and value of those extremes
  * Textually label those notable points on the curves

In [None]:
url = ("https://bitbucket.org/davidmertz/sample-data/raw/"
       "61872271984f66e3094c367cf90dfc4875a22e8d/NOAA-2019-partial.csv.gz")
temperatures = pd.read_csv(url)
temperatures['DATE'] = pd.to_datetime(temperatures.DATE, format="%Y-%m-%d")

In [None]:
# Create the compound visualization described
# HINT: Creating small reusable functions to create similar 
#   elements across subplots is excellent programming!
...