<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Time Series: Decomposition

Splitting a time series into several components is useful both for understanding the data and for choosing an appropriate forecasting model. Each component represents an underlying pattern:

- **Trend**: A trend exists when there is a long-term increase or decrease in the data. It does not have to be linear. Sometimes, we will refer to a trend “changing direction” when, for example, it might go from an increasing trend to a decreasing trend.

- **Seasonal**: A seasonal pattern exists when a series is influenced by seasonal factors (e.g., the quarter of the year, the month, or day of the week). Seasonality is always of a fixed and known period.

- **Residual**: The leftover or error component, i.e., whatever remains after the trend and seasonal components are removed.

A time series can also have **cyclical** components: rises and falls that repeat at irregular intervals, such as what are often called "business cycles." Our decompositions will not include a separate cyclical component.
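As a worked formula, one common way to write this is an *additive* decomposition, in which each observation is the sum of its components. StatsModels' `seasonal_decompose()` uses this form by default (`model="additive"`) and also offers a *multiplicative* form (`model="multiplicative"`), which tends to fit better when the seasonal swings grow with the level of the series:

$$
y_t = T_t + S_t + R_t \quad \text{(additive)}, \qquad y_t = T_t \times S_t \times R_t \quad \text{(multiplicative)},
$$

where $y_t$ is the observed value at time $t$, $T_t$ is the trend, $S_t$ is the seasonal component (repeating with period $m$, so $S_t = S_{t+m}$; $m = 12$ for monthly data with a yearly pattern), and $R_t$ is the residual.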

### Guided Practice

We are going to play around with some bus data from Portland, Oregon. Load in the data set below and check it out.

In [None]:
import datetime
import dateutil

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

In [None]:
bus = pd.read_csv('../assets/data/bus.csv')
bus.head()

In [None]:
bus.tail()

We'll need to clean this data a little: simplify the column names, get rid of the couple of bad rows at the end of the file, and make the `riders` column an integer.
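A minimal sketch of these cleaning steps, run on a tiny made-up CSV rather than `bus.csv`. The header text, values, and junk footer rows below are hypothetical; the real file's contents will differ, so check `bus.head()` and `bus.tail()` to see what you are actually dropping.

```python
import io

import pandas as pd

# Hypothetical stand-in for bus.csv: a long header, three made-up data rows,
# and two junk rows at the bottom that make "riders" load as strings.
raw_csv = io.StringIO(
    "Month,Monthly bus ridership (hypothetical header)\n"
    "1973-01,100\n"
    "1973-02,110\n"
    "1973-03,120\n"
    "Junk footer row,junk\n"
    "Another junk row,junk\n"
)
bus = pd.read_csv(raw_csv)

# Drop the last two rows; .copy() avoids SettingWithCopyWarning on later assignments.
bus = bus.iloc[:-2].copy()

# Replace the long column names with short ones.
bus.columns = ["month", "riders"]

# The junk rows forced an object (string) column; cast the remaining values to int.
bus["riders"] = bus["riders"].astype(int)
```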

In [None]:
# Drop the last two rows


In [None]:
# Change the column names to "month" and "riders"


In [None]:
# Cast "riders" to int


In [None]:
bus.head()

In [None]:
bus.tail()

In [None]:
# Convert "month" to datetime and set it as an unnamed index

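A minimal sketch of this step on a small made-up frame shaped like the cleaned bus data (values hypothetical). The `format="%Y-%m"` argument assumes the months are stored as strings like `1973-01`; adjust it, or drop it and let pandas infer the format, if the real file differs.

```python
import pandas as pd

# Hypothetical cleaned data with "month" still stored as strings.
bus = pd.DataFrame({"month": ["1973-01", "1973-02", "1973-03"],
                    "riders": [100, 110, 120]})

# Parse the strings into Timestamps (each becomes the first day of its month).
bus["month"] = pd.to_datetime(bus["month"], format="%Y-%m")

# Move "month" into the index, then remove the index's name.
bus = bus.set_index("month")
bus.index.name = None
```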

### Plot the raw data.

We can look at the raw data first. Let's plot the time series.
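A minimal sketch using a made-up monthly series (a linear trend plus a yearly sine wave) in place of the cleaned bus data. With a `DatetimeIndex`, pandas' `.plot()` puts the dates on the x-axis automatically.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the cleaned bus data: 4 years of made-up monthly values.
index = pd.date_range("1973-01-01", periods=48, freq="MS")  # month-start dates
t = np.arange(48)
bus = pd.DataFrame({"riders": 100 + 2 * t + 10 * np.sin(2 * np.pi * t / 12)},
                   index=index)

# Line plot of the raw series over time.
ax = bus["riders"].plot(figsize=(12, 4), title="Monthly riders")
ax.set_ylabel("riders")
plt.show()
```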

## Decompose the time series and plot it using the `seasonal_decompose()` function.

The `seasonal_decompose()` function from StatsModels (`statsmodels.tsa.seasonal`) breaks the time series into its constituent parts.

Apply the function to the `riders` data with a seasonal period of `12` (the `period` argument; older StatsModels releases called it `freq`), then plot the result. We use 12 because the data are monthly and we expect the pattern to repeat every year.

The object returned by `seasonal_decompose()` has a `.plot()` method, just like pandas DataFrames.

In [None]:
# Import seasonal_decompose


In [None]:
# Apply seasonal_decompose to the bus data and plot the result


If you don't specify a period, `seasonal_decompose()` will try to infer one from the frequency of the series' `DatetimeIndex`. In this case we clearly expect 12-month seasonality, so setting it by hand makes sense.

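### Plot a single component of the decomposition.

We can pull out individual components of the decomposition: the object returned by `seasonal_decompose()` has `.trend`, `.seasonal`, and `.resid` attributes (plus `.observed` for the original series), each of which can be plotted on its own.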

In [None]:
# Plot just the trend component


In [None]:
# Plot just the seasonal component


In [None]:
# Plot just the residuals


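## Examine the residuals and their ACF and PACF.

Let's examine the residuals by comparing their autocorrelation function (ACF) and partial autocorrelation function (PACF) with those of the original series. Drop the `NaN` values at both ends of the residuals first: the centered moving average used to estimate the trend is undefined there.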

In [None]:
# Import plot_acf and plot_pacf


In [None]:
# Plot the ACF of the residuals


In [None]:
# Compare to ACF of original series


In [None]:
# Plot the PACF of the residuals


In [None]:
# Compare to PACF of original data


We notice that the largest autocorrelations in the original data are mostly absent from the residuals, because the trend and seasonal components of the decomposition have captured them.

# Recap

* Trend is a long-term change in the data. 
* Seasonality is a pattern of a fixed period that repeats in the data. 
* Residuals are the error component: what is left after removing the trend and seasonal components.
* StatsModels contains a `seasonal_decompose()` function that breaks a time series into its components.

# Cool Example

["Slick Time Series Decomposition of the Birthdays Data"](https://andrewgelman.com/2012/06/19/slick-time-series-decomposition-of-the-birthdays-data/)

**Exercise (10 mins., pair programming)**

In [None]:
airline = pd.read_csv('../assets/data/airline.csv')
airline.head()

In [None]:
airline.tail()

- Get rid of the last row

- Rename the columns "month" and "passengers", respectively.

- Cast "passengers" to int

- Convert "month" to datetime.

- Make "month" the index, and drop its name.

- Make a lineplot of the number of passengers over time.

- Decompose the time series using the `seasonal_decompose()` function with a seasonal period of 12, and plot the result.

- Interpret these plots.

$\blacksquare$