# Decomposing Time Series

In [None]:
import matplotlib.pyplot as plt
from statsmodels.datasets import co2
from statsmodels.tsa.seasonal import STL

In [None]:
data = co2.load().data

In [None]:
data

In [None]:
data.plot()

The gaps in the plot above show that the series has some missing values.

In [None]:
data.isna().sum()

In [None]:
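# monthly means of the weekly readings; forward-fill empty months ('ME' replaces 'M' in recent pandas)
data = data.resample('M').mean().ffill()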

In [None]:
data.plot()

In [None]:
data.isna().sum()
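
To see what the resample, mean, and forward-fill chain does, here is a minimal sketch on a made-up weekly series. The values and the toy_ names are hypothetical, and the 'MS' (month-start) alias is used instead of 'M' only because it behaves the same across pandas versions.

```python
import numpy as np
import pandas as pd

# 13 weekly (Saturday) observations covering January to March 2020
toy_index = pd.date_range("2020-01-04", periods=13, freq="W-SAT")
toy_values = [1.0, np.nan, 3.0, 5.0,            # January: one missing week
              np.nan, np.nan, np.nan, np.nan, np.nan,  # February: nothing observed
              7.0, 7.0, 7.0, 7.0]               # March: complete
toy_weekly = pd.Series(toy_values, index=toy_index)

# mean() skips NaN within a month, but a month with no data at all stays NaN
toy_monthly = toy_weekly.resample("MS").mean()

# forward-fill copies the last known monthly value into the empty month
toy_filled = toy_monthly.ffill()

print(toy_monthly)
print(toy_filled)
```

Forward-filling is a simple choice that keeps the series regular; interpolation would be another option if a flat carry-over is not appropriate.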

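The statsmodels library offers several methods for decomposing a time series, including seasonal_decompose and STL. Here we use STL (Seasonal-Trend decomposition using LOESS), which splits the series into trend, seasonal, and residual components.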

In [None]:
res = STL(data).fit()
res.plot()
plt.show()

In [None]:
data

In [None]:
# zoom in on the first two years to see the seasonal cycle
data.iloc[:24].plot()

In [None]:
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf, month_plot, quarter_plot

print('Autocorrelation')
acf_plot = plot_acf(data['co2'], lags=24, title='Autocorrelation')
pacf_plot = plot_pacf(data['co2'], lags=24, title='Partial Autocorrelation')

When you evaluate a model, plot the autocorrelation of its residuals. If the residuals are still correlated with their own past values, the model has left some structure unexplained, which suggests that information or variables may be missing.

The residuals from the STL fit of the CO2 data show no significant autocorrelation.

In [None]:
acf_plot = plot_acf(res.resid, lags=24, title='Autocorrelation')

In one of our earlier examples on fitting data with polynomials, however, the autocorrelation of the residuals shows that one order of polynomial models the data worse than another.

In [None]:
import numpy as np
import seaborn as sns

# generate 100 points from a normal distribution
# that has mean = 0 and std dev = 3.5
np.random.seed(42)
noise = np.random.normal(0,3.5,100)

# make the data arrays (y = x^2 + noise)
x = np.linspace(0,10,100)
y = x**2 + noise
# first-order (linear) fit and its residuals
z1 = np.polyfit(x,y,1)
y1 = np.polyval(z1,x)
resid1 = y - y1
# second-order (quadratic) fit and its residuals
z2 = np.polyfit(x,y,2)
y2 = np.polyval(z2,x)
resid2 = y - y2

# plot them with order 1 and 2 fits
fig,ax = plt.subplots(2,2)

ax[0,0].plot(x,y,'ko')
ax[0,0].plot(x,y1,'b')
ax[0,1].plot(x,resid1,'ko')
ax[1,0].plot(x,y,'ko')
ax[1,0].plot(x,y2,'b')
ax[1,1].plot(x,resid2,'ko')

In [None]:
acf_plot = plot_acf(resid1, lags=24, title='Autocorrelation')

In [None]:
acf_plot = plot_acf(resid2, lags=24, title='Autocorrelation')

Back to the CO2 data.

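We can look at long-term behavior within each shorter-term period. A month plot groups the series by calendar month and shows how each month's values change over the years.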

In [None]:
m_plot = month_plot(data['co2'])
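
The horizontal line that month_plot draws for each month is that month's mean, and the same averages can be computed directly with a groupby. The sketch below uses a synthetic, noise-free monthly series so the result is easy to check; the toy_ names and the shape of the series are assumptions.

```python
import numpy as np
import pandas as pd

# synthetic monthly series: gentle upward trend plus a yearly cycle peaking in April
toy_idx = pd.date_range("2000-01-01", periods=120, freq="MS")
t = np.arange(120)
toy_series = pd.Series(0.1 * t + 3.0 * np.sin(2 * np.pi * t / 12), index=toy_idx)

# average of all Januaries, all Februaries, and so on (index 1..12)
monthly_means = toy_series.groupby(toy_series.index.month).mean()
peak_month = int(monthly_means.idxmax())

print(monthly_means)
print("month with the highest average:", peak_month)
```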

What does this tell us?

How about quarterly data?

In [None]:
q_plot = quarter_plot(data['co2'])

quarter_plot expects a series with quarterly frequency, so the monthly data must be resampled first. (This is a common time-series operation.)

In [None]:
# average the three months in each quarter ('QE' replaces 'Q' in recent pandas)
data_quarterly = data.resample('Q').mean()

In [None]:
data_quarterly
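
As a quick check of what quarterly resampling computes, here is a minimal sketch on twelve made-up monthly values. The toy_ names are hypothetical, and the 'QS' (quarter-start) alias is an assumption chosen because it behaves the same across pandas versions; the notebook itself uses quarter-end labels.

```python
import pandas as pd

# twelve monthly values: 1.0 for January through 12.0 for December
toy_idx = pd.date_range("2020-01-01", periods=12, freq="MS")
toy_monthly = pd.Series(range(1, 13), index=toy_idx, dtype=float)

# each quarter becomes the mean of its three months
toy_quarterly = toy_monthly.resample("QS").mean()

print(toy_quarterly)
```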

In [None]:
q_plot = quarter_plot(data_quarterly['co2'])