<a name="top"></a>
<div style="width:1000 px">

<div style="float:right; width:98 px; height:98px;">
<img src="https://raw.githubusercontent.com/Unidata/MetPy/master/metpy/plots/_static/unidata_150x150.png" alt="Unidata Logo" style="height: 98px;">
</div>

<h1>Basic Time Series Plotting</h1>
<h3>Unidata Python Workshop</h3>

<div style="clear:both"></div>
</div>

<hr style="height:2px;">

<div style="float:right; width:250 px"><img src="http://matplotlib.org/_images/date_demo.png" alt="METAR" style="height: 300px;"></div>


## Overview:

* **Teaching:** 45 minutes
* **Exercises:** 30 minutes

### Questions
1. How can we read data with Pandas?
1. How are plots created in Python?
1. What features does Matplotlib have for improving our time series plots?
1. How can multiple y-axes be used in a single plot?

### Objectives
1. <a href="#loaddata">Reading in data</a>
1. <a href="#basictimeseries">Basic timeseries plotting</a>
1. <a href="#multiy">Multiple y-axes</a>

<a name="loaddata"></a>
## Reading in Data
To learn about time series analysis, we first need to find some data and get it into Python. In this case we're going to use a file that was downloaded from the [National Data Buoy Center](http://www.ndbc.noaa.gov). Specifically, we're going to look at [buoy 41056](http://www.ndbc.noaa.gov/station_page.php?station=41056) as Hurricane Irma passed over it.

We'll use the [pandas](http://pandas.pydata.org) library for our data reading and modification as it provides a convenient way to subset and manipulate data. The data does not come in an easily usable format from the NDBC, so it's a good chance to get our hands dirty with real world data manipulation and time series plotting.

First, let's start out by reading the text file into a pandas dataframe. If we look at the file we can see it's in a "fixed-width" format - i.e. each column always occupies the same number of characters on every row.

```
#YY  MM DD hh mm WDIR WSPD GST  WVHT   DPD   APD MWD   PRES  ATMP  WTMP  DEWP  VIS PTDY  TIDE
#yr  mo dy hr mn degT m/s  m/s     m   sec   sec degT   hPa  degC  degC  degC  nmi  hPa    ft
2017 09 21 19 00 140  8.0 11.0   1.1     6    MM  93 1009.0  28.5    MM    MM   MM -1.0    MM
2017 09 21 18 00 140  8.0 10.0   1.1     6    MM  90 1009.5  28.6    MM    MM   MM -1.3    MM
2017 09 21 17 00 150  8.0 11.0   1.2     7    MM  90 1010.1  28.6    MM    MM   MM -0.4    MM
2017 09 21 16 00 130  8.0 11.0   1.1     6    MM  89 1010.0  28.5    MM    MM   MM -0.4    MM
2017 09 21 15 00 140  9.0 11.0   1.1     6    MM 109 1010.8  28.8    MM    MM   MM +1.0    MM
```

The data columns are year, month, day, hour, minute, wind direction, wind speed, wind gust, wave height, dominant wave period, average wave period, dominant wave direction, pressure, air temperature, water temperature, dewpoint, visibility, pressure tendency, and tide. As you can see, this buoy does not have all of those sensors, so some columns are filled with `MM`, representing missing data.

In [None]:
fname = '41056.txt'

In [None]:
import pandas as pd
df = pd.read_fwf(fname)

In [None]:
df

Getting the data read was pretty easy, but we immediately see that we've got some cleanup to do. The header row contains column names that are less than ideal. The first data row is actually a row of units as well. We also notice that the date is broken up between multiple columns. It would be nice to have that as one timestamp that is a Python datetime object. Finally, we need to replace `MM` with `NaN`. Luckily these tasks are not too onerous with pandas.
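If you don't have the buoy file at hand, here is a minimal sketch of the same cleanup on an in-memory sample. The sample rows and the shortened column list are made up for illustration and are not the real NDBC layout; only `pd.read_fwf` and its `names` and `na_values` arguments come from pandas itself.

```python
import io
import pandas as pd

# Made-up fixed-width sample (not the real buoy file): every field sits in
# the same character positions on each row, and 'MM' marks missing data.
sample = (
    "2017 09 21 19 00  8.0 1009.0   MM\n"
    "2017 09 21 18 00  8.0 1009.5   MM\n"
    "2017 09 21 17 00 10.5 1010.1   MM\n"
)

# Hypothetical, shortened list of column names for the sample above
sample_names = ['year', 'month', 'day', 'hour', 'minute',
                'wind_speed', 'pressure', 'tide']

# names= supplies our own column labels; na_values= turns 'MM' into NaN
sample_df = pd.read_fwf(io.StringIO(sample), names=sample_names, na_values='MM')
print(sample_df)
```

The sample has no header or units rows, so `skiprows` is not needed here; the real file needs it to skip those two lines.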

In [None]:
# Much better column names, remember to be descriptive and use tab completion when using these!
col_names = ['year', 'month', 'day', 'hour', 'minute', 'wind_direction', 'wind_speed',
             'wind_gust', 'wave_height', 'dominant_wave_period', 'average_wave_period',
             'dominant_wave_direction', 'pressure', 'temperature', 'water_temperature', 'dewpoint',
             'visibility', '3hr_pressure_tendency', 'water_level_above_mean']

In [None]:
df = pd.read_fwf(fname, skiprows=2, na_values='MM', names=col_names)

While we're manipulating the data frame, let's get rid of the columns with all missing data. We could use the `drop` method and manually name all of the columns, but that would require us to know which are all `NaN` and that sounds like manual labor - something that programmers hate. Pandas has the `dropna` method that allows us to drop rows or columns where any or all values are `NaN`. In this case, let's drop all columns with all `NaN` values.
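To see the difference between `how='all'` and `how='any'`, here is a small sketch on a toy dataframe. The column names `full`, `partial`, and `empty` are invented for this example.

```python
import numpy as np
import pandas as pd

# Toy dataframe: one complete column, one with a gap, one entirely missing
toy = pd.DataFrame({
    'full': [1.0, 2.0, 3.0],
    'partial': [1.0, np.nan, 3.0],
    'empty': [np.nan, np.nan, np.nan],
})

# how='all' drops a column only when every value in it is NaN
kept_all = toy.dropna(axis='columns', how='all')

# how='any' drops a column as soon as it contains a single NaN
kept_any = toy.dropna(axis='columns', how='any')

print(list(kept_all.columns))
print(list(kept_any.columns))
```

With `how='all'` the partially filled column survives, which is what we want for buoy sensors that only report occasionally.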

In [None]:
df = df.dropna(axis='columns', how='all')

In [None]:
df.head()

Next, let's get the time stamps fixed up nicely. We need to combine the columns `year`, `month`, `day`, `hour`, and `minute` into a single column called `time`. We could cast all of these columns as strings, build the date time stamp string, then parse that, but that's a lot of steps! Looking in the documentation, we see that `pd.to_datetime` can assemble timestamps directly from a dataframe of appropriately named columns. Here's an example of combining the `year`, `month`, and `day` columns.

In [None]:
df = pd.read_fwf(fname, skiprows=2, na_values='MM', names=col_names)
df['time'] = pd.to_datetime(df[['year', 'month', 'day']])

In [None]:
df.head()

<div class="alert alert-success">
    <b>EXERCISE</b>:
     <ul>
      <li>Read the data in again, but this time use all of the time stamp columns.</li>
      <li>Use the <code>drop</code> method to remove the now unused columns for year,
          month, day, hour, and minute. <b>HINT</b>: Look at the <code>axis</code> keyword
          argument in the documentation.</li>
    </ul>
</div>

In [None]:
# Your code goes here


In [None]:
# %load solutions/timeseries_parse_dates.py

<div class="alert alert-info">
    <b>TIP</b>:
    Many of the pandas functions have the <code>inplace</code> keyword argument. This allows us to modify the dataframe without continually needing to reassign it. <code>df = df.command(...)</code> becomes <code>df.command(..., inplace=True)</code>.
</div>
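As a quick sketch of that equivalence, the toy example below drops a column both ways. The dataframe and its column names are made up for illustration.

```python
import pandas as pd

a = pd.DataFrame({'keep': [1, 2], 'unused': [3, 4]})
b = a.copy()

# Reassignment: drop returns a new dataframe that we bind back to the name
a = a.drop(columns=['unused'])

# inplace=True: drop modifies b directly and returns None
result = b.drop(columns=['unused'], inplace=True)

print(a.equals(b))
print(result)
```

Because the in-place call returns `None`, avoid mixing the two styles: `df = df.drop(..., inplace=True)` would leave `df` set to `None`.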

Finally, we need to trim down the data. The file contains 45 days' worth of observations. We don't want to trim it too tightly and miss interesting things surrounding the hurricane's passage, but having all 45 days is a bit overkill. Let's trim the data to the window from the start of 9/6 through the start of 9/8.
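Trimming like this is usually done with a boolean mask on the `time` column. The sketch below uses made-up timestamps to show one detail worth knowing: a `datetime` built from only a year, month, and day means midnight at the start of that day, so an upper bound of `datetime(2017, 9, 8)` keeps nothing later than 00:00 on the 8th.

```python
from datetime import datetime
import pandas as pd

# Made-up timestamps straddling both ends of the window
toy = pd.DataFrame({'time': pd.to_datetime(['2017-09-05 23:00', '2017-09-06 00:00',
                                            '2017-09-07 12:00', '2017-09-08 00:00',
                                            '2017-09-08 01:00'])})

# Each comparison yields a boolean Series; & combines them element-wise.
# The parentheses are required because & binds tighter than >= and <=.
mask = (toy.time >= datetime(2017, 9, 6)) & (toy.time <= datetime(2017, 9, 8))
print(toy[mask])
```

To keep the whole final day, compare against the start of the following day with `<` instead.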

In [None]:
from datetime import datetime
idx = (df.time >= datetime(2017, 9, 6)) & (df.time <= datetime(2017, 9, 8))
df = df[idx]
df.head()

We're almost ready, but now the index column is not that meaningful. It starts at row 306, which made sense for our initial file, but let's re-zero the index so we have a nice clean data frame to start with.
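A minimal sketch of what `reset_index` does with and without `drop=True`, using a toy dataframe whose values are made up:

```python
import pandas as pd

# Toy dataframe whose index starts at 306, like a slice of a larger file
toy = pd.DataFrame({'wind_speed': [8.0, 9.0, 10.5]}, index=[306, 307, 308])

# drop=True discards the old labels and renumbers from zero
renumbered = toy.reset_index(drop=True)

# Without drop=True the old labels are kept as a new column named 'index'
kept_old = toy.reset_index()

print(renumbered)
print(kept_old)
```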

In [None]:
df.reset_index(drop=True, inplace=True)
df.head()

<a href="#top">Top</a>
<hr style="height:2px;">

<a name="basictimeseries"></a>
## Basic Timeseries Plotting

Matplotlib is a Python 2D plotting library that produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms. We're going to learn the basics of creating timeseries plots with matplotlib by plotting buoy wind, gust, and pressure data.

In [None]:
# Convention for import of the pyplot interface
import matplotlib.pyplot as plt

# Set-up to have matplotlib use its support for notebook inline plots
%matplotlib inline

# Register pandas converters with matplotlib
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()

In [None]:
fig, ax = plt.subplots(figsize=(10, 6))

# Specify how our lines should look
ax.plot(df.time, df.wind_speed, color='tab:orange', label='Windspeed')

# Label the axes, then add a title, grid, and legend
ax.set_xlabel('Time')
ax.set_ylabel('Speed')
ax.set_title('Buoy 41056 Wind Data')
ax.grid(True)
ax.legend(loc='upper left')

In [None]:
# Helpers to format and locate ticks for dates
from matplotlib.dates import DateFormatter, DayLocator

# Set the x-axis to do major ticks on the days and label them like '07/20'
ax.xaxis.set_major_locator(DayLocator())
ax.xaxis.set_major_formatter(DateFormatter('%m/%d'))

fig

<div class="alert alert-success">
    <b>EXERCISE</b>:
     <ul>
    <li>Add a yellow line with the gust speed. Set the <code>linestyle</code> keyword argument to <code>--</code>
        to produce a dashed line.</li>
    <li>Redisplay the legend on the plot to show your new wind gust line.</li>
    <li>Change the x-axis major tick labels to read 'Sep DD' where DD is the day number. Look at the
        <a href="https://docs.python.org/3.6/library/datetime.html#strftime-and-strptime-behavior">
            table of formatters</a> for help.</li>
    </ul>
</div>

<div class="alert alert-info">
    <b>Tip</b>:
     If your figure goes sideways as you try multiple things, try running the notebook up to this point again
     by using the Cell -> Run All Above option in the menu bar.
</div>

In [None]:
# Your code goes here


In [None]:
# %load solutions/timeseries_gustplot.py

<a href="#top">Top</a>
<hr style="height:2px;">

<a name="multiy"></a>
## Multiple y-axes
What if we wanted to plot another variable in vastly different units on our plot?

In [None]:
ax.plot(df.time, df.pressure, color='black', label='Pressure')
ax.set_ylabel('Pressure')

ax.legend(loc='upper left')

fig

That is less than ideal. With both variables sharing one y-axis, we can't see any detail in either of them! Instead, we can create a twin axes that shares the x-axis and has its own secondary y-axis on the right side of the plot. We'll create a totally new figure here.

In [None]:
fig, ax = plt.subplots(figsize=(10, 6))
axb = ax.twinx()

# Same as above
ax.set_xlabel('Time')
ax.set_ylabel('Speed')
ax.set_title('Buoy 41056 Wind Data')
ax.grid(True)
ax.legend(loc='upper left')

# Plotting on the first y-axis
ax.plot(df.time, df.wind_speed, color='tab:orange', label='Windspeed')
ax.plot(df.time, df.wind_gust, color='tab:olive', linestyle='--', label='Wind Gust')

# Plotting on the second y-axis
axb.set_ylabel('Pressure')
axb.plot(df.time, df.pressure, color='black', label='pressure')

ax.xaxis.set_major_locator(DayLocator())
ax.xaxis.set_major_formatter(DateFormatter('%b %d'))

axb.legend(loc='upper left')

We're closer, but the lines are drawn over the legend and the wind data are missing from it. That's because a legend only lists the lines from the axes it belongs to. We need to collect the lines and labels from both y-axes and build a single legend from them.

In [None]:
fig, ax = plt.subplots(figsize=(10, 6))
axb = ax.twinx()

# Same as above
ax.set_xlabel('Time')
ax.set_ylabel('Speed')
ax.set_title('Buoy 41056 Wind Data')
ax.grid(True)

# Plotting on the first y-axis
ax.plot(df.time, df.wind_speed, color='tab:orange', label='Windspeed')
ax.plot(df.time, df.wind_gust, color='tab:olive', linestyle='--', label='Wind Gust')

# Plotting on the second y-axis
axb.set_ylabel('Pressure')
axb.plot(df.time, df.pressure, color='black', label='pressure')

ax.xaxis.set_major_locator(DayLocator())
ax.xaxis.set_major_formatter(DateFormatter('%b %d'))

# Handling of getting lines and labels from all axes for a single legend
lines, labels = ax.get_legend_handles_labels()
lines2, labels2 = axb.get_legend_handles_labels()
axb.legend(lines + lines2, labels + labels2, loc='upper left')

<div class="alert alert-success">
    <b>EXERCISE</b>:
    Create your own plot that has the following elements:
     <ul>
    <li>A blue line representing the wave height measurements.</li>
    <li>A green line representing wind speed on a secondary y-axis.</li>
    <li>Proper labels/title.</li>
    <li><b>Bonus</b>: Make the wave height data plot as points only with no line. Look at the documentation for the linestyle and marker arguments.</li>
    </ul>
</div>

In [None]:
# Your code goes here


In [None]:
# %load solutions/timeseries_basicfinalplot.py

<a href="#top">Top</a>
<hr style="height:2px;">