# Introduction #

In the previous two lessons, we learned how to model the time-dependent nature of time series through trend and seasonality.

In this lesson, we'll learn how to model the serially dependent nature of time series: how the past can affect the future.

# Cycles #

In addition to trend and seasonality, time series can display **cyclic** behavior. Cycles are fluctuations in a time series associated with how the value in a series at one time depends on values at previous times. Cyclic behavior is characteristic of systems that can affect themselves, like economies, disease outbreaks, animal populations, and natural phenomena.

<figure style="padding: 1em;">
<img src="https://i.imgur.com/rQEt43N.png" width=400 alt="">
<figcaption style="text-align: center; font-style: italic"><center>Time series with cyclic behavior.
</center></figcaption>
</figure>

The fluctuating patterns of seasonality and cyclic behavior appear similar. What distinguishes cyclic behavior is often its irregularity. As you can see from the plots above, cycles can vary in magnitude and duration, apparently at random. Seasonality, on the other hand, tends to conform rigidly to particular dates and times. What happens in a cycle is less about the particular time, and more about what has happened just before. The distinction isn't strict, however, and the fluctuations in some series could just as well be modeled either as seasons or cycles, or even some combination of the two.

Cycles often follow a pattern of accumulation and decay back to a baseline or else fluctuations around a mean. Sometimes, though, a series can continually accumulate change without ever "mean reverting", which makes it difficult to model. Often the changes themselves will show cyclic behavior, and we might prefer to forecast the series of changes instead.

<figure style="padding: 1em;">
<img src="" width=400 alt="">
<figcaption style="text-align: center; font-style: italic"><center>Change series.
</center></figcaption>
</figure>
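
As a minimal sketch of a change series, the cell below builds a synthetic random walk (made-up data, not one of this lesson's datasets) and takes its first difference with pandas `diff`. The level wanders without returning to a mean, while the changes stay centered near zero.

```python
import numpy as np
import pandas as pd

# Synthetic data for illustration only: a random walk accumulates shocks
# and never settles back to a fixed mean.
rng = np.random.default_rng(0)
shocks = pd.Series(rng.normal(size=500))
level = shocks.cumsum()

# The "change series" is the first difference of the level.
changes = level.diff().dropna()

print("std of level:  ", round(level.std(), 2))
print("std of changes:", round(changes.std(), 2))
```

Differencing the cumulative sum recovers the original shocks (apart from the first one), which is why the change series can be easier to forecast than the level.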

# Lag Plots and the Lag Embedding #

<note>Reorganize. Do 1. lag plots, 2. interpret lag plots, 3. lag embedding, 4. diff + lag plot</note>

We can help our model capture serial dependence by using the "lags" of a time series as features. "Lagging" a time series means altering the series so that its observations appear to have occurred with a delay. Shifting the observations forward by one step is equivalent to adding one step to the time index: either operation would give you the "1st lag" of the series.
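
Here is a small sketch of both operations with pandas, using a hypothetical five-week series whose values are chosen only for illustration. `shift(1)` moves the observations, while adding 1 to the `PeriodIndex` moves the time stamps; after realigning to the original weeks, the two agree.

```python
import pandas as pd

# Hypothetical weekly series, values chosen only for illustration.
y = pd.Series(
    [10, 20, 30, 40, 50],
    index=pd.period_range("2020-01-05", periods=5, freq="W"),
    name="y",
)

# Option 1: shift the observations one step later in time.
lag_by_shift = y.shift(1)

# Option 2: add one step to the time index, then realign to the original weeks.
moved = y.copy()
moved.index = y.index + 1
lag_by_index = moved.reindex(y.index)

print(pd.DataFrame({"y": y, "shift": lag_by_shift, "index+1": lag_by_index}))
```

The first row of the lagged series is missing because nothing was observed before the first week.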

<img>lag diagram?</img>

Lagged features make information from previous time steps available in later time steps. The goal is to include as many lags as needed to give each time step a "complete set" of information from the past. Such a complete set of lags is known as a **lag embedding** for the time series.
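
A lag embedding can be sketched as a table with one column per lag. The helper `make_lags` below is a hypothetical name introduced for this illustration (it is not a pandas or statsmodels function), and the series is made up.

```python
import pandas as pd


def make_lags(series, lags):
    """Return a DataFrame with one column per requested lag of `series`."""
    return pd.concat(
        {f"{series.name}_lag_{k}": series.shift(k) for k in lags},
        axis=1,
    )


# Made-up values, for illustration only.
y = pd.Series([3.0, 5.0, 4.0, 6.0, 8.0, 7.0], name="y")
X = make_lags(y, lags=[1, 2, 3])

# The first rows have no complete history, so drop them and align the target.
X = X.dropna()
y_aligned = y.loc[X.index]
print(X)
```

Each remaining row now carries the three previous observations for its time step. Dropping the incomplete rows is one choice among several; filling them with a constant is another.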

<note>table: lag embedding of fluview</note>

In principle, information could persist indefinitely from step to step, but in practice the influence of previous steps usually decays so quickly that only a few lags are needed to capture everything of importance. (Using moving averages as features can be beneficial, too.)

Visualizations can help determine which lags should go in the embedding. A great place to start is with a lag plot. A **lag plot** is simply a scatter plot of the time series against one of its lags:

<figure style="padding: 1em;">
<img src="https://i.imgur.com/wGAtQIe.png" width=400 alt="">
<figcaption style="text-align: center; font-style: italic"><center>
</center></figcaption>
</figure>

To see time-dependent properties (like trend and seasonality), we plotted time series against the time index, like in the moving-average and seasonal plots. The goal of those sections was essentially to fit curves to those plots, which we accomplished by creating new kinds of features. The goal of modeling serial dependence is essentially to fit curves to lag plots.

The **autocorrelation** of a time series is the correlation a series has with its lags. Correlation is a measure of *linear* dependence, so the autocorrelations tell you how well a line can fit each of the lag plots. (We can see in the plots above that this sometimes isn't very well.)
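
To make the definition concrete, the sketch below computes autocorrelations two ways on a synthetic series (an assumed AR(1)-style process with coefficient 0.8, not data from this lesson): with pandas `Series.autocorr`, and as the ordinary correlation between the series and its shifted copy.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
noise = rng.normal(size=400)

# Synthetic series: each value is 0.8 times the previous one plus noise.
values = np.zeros(400)
for t in range(1, 400):
    values[t] = 0.8 * values[t - 1] + noise[t]
y = pd.Series(values)

# Autocorrelation at lag k is the correlation between y and y.shift(k).
for k in (1, 2, 3):
    print(k, round(y.autocorr(lag=k), 3), round(y.corr(y.shift(k)), 3))
```

The two columns match, since `autocorr` is the correlation of the series with its own lag.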

*Non*-linear relationships seem to be the rule rather than the exception in real-world time series, so it's a good idea to look at the complete lag plot when designing your forecasting model, instead of just the autocorrelations. (We'll learn how to capture these kinds of non-linear relationships with XGBoost in Lesson 6.)
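
As a sketch of how autocorrelation can miss strong non-linear dependence, consider the logistic map, a textbook synthetic series (not from this lesson's data) in which each value is an exact quadratic function of the previous one. A straight line fits its lag plot poorly, while a quadratic curve fits it essentially perfectly.

```python
import numpy as np

# Logistic map: each value is fully determined by the previous value.
n = 2000
x = np.empty(n)
x[0] = 0.2
for t in range(1, n):
    x[t] = 4.0 * x[t - 1] * (1.0 - x[t - 1])

lag1, current = x[:-1], x[1:]

# Linear dependence (the lag-1 autocorrelation) is weak...
r = np.corrcoef(lag1, current)[0, 1]

# ...yet a quadratic through the lag plot recovers the rule
# 4x(1 - x) = -4x^2 + 4x.
coefs = np.polyfit(lag1, current, deg=2)
print("lag-1 correlation:", round(r, 3))
print("quadratic coefficients:", np.round(coefs, 3))
```

Looking only at the autocorrelation, we might conclude the first lag is uninformative here, even though it determines the next value completely.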

Whenever changes in a series persist for a while, it can be hard to tell which lags really contain new information, and which just contain information that has carried over from previous lags. A lag plot of *changes* in the series can help distinguish which terms are really informative.

<figure style="padding: 1em;">
<img src="https://i.imgur.com/wGAtQIe.png" width=400 alt="">
<figcaption style="text-align: center; font-style: italic"><center>
</center></figcaption>
</figure>

From these plots it appears the first two lags are the most informative ones.

# Example - Flu Trends #

The *Flu Trends* dataset contains records of doctors' visits for the flu together with the popularity of flu-related search terms as captured by Google Trends, for weeks between 2009 and 2016. Our goal is to forecast the number of flu cases for the coming weeks.

In [None]:
#$HIDE_INPUT$
from pathlib import Path
from warnings import simplefilter

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from scipy.signal import periodogram
from sklearn.linear_model import LinearRegression
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.ar_model import ar_select_order
from statsmodels.tsa.tsatools import lagmat

simplefilter("ignore")

# Set Matplotlib defaults
plt.style.use("seaborn-v0_8-whitegrid")  # named "seaborn-whitegrid" before Matplotlib 3.6
plt.rc("figure", autolayout=True, figsize=(11, 5))
plt.rc(
    "axes",
    labelweight="bold",
    labelsize="large",
    titleweight="bold",
    titlesize=16,
    titlepad=10,
)
plot_params = dict(
    color="0.75",
    style=".-",
    markeredgecolor="0.25",
    markerfacecolor="0.25",
    legend=False,
)


def plot_ccf(x, y, lags=10, ax=None):
    from matplotlib.ticker import MaxNLocator
    if ax is None:
        fig, ax = plt.subplots()
    ax.xcorr(
        x, y, maxlags=lags, usevlines=True, normed=True, lw=2, color="C3",
    )
    ax.xcorr(
        x, y, maxlags=lags, usevlines=False, normed=True, ms=4, color="C3",
    )
    # ax.set_xticks(np.arange(-lags, lags + 1, dtype=np.int8))
    ax.xaxis.set_major_locator(MaxNLocator(integer=True))
    ax.set(
        xlabel="Lag", ylabel="Correlation", title="Cross-Correlation",
    )
    return ax


data_dir = Path("../input/ts-course-data")
flu_trends = pd.read_csv(data_dir / "flu-trends.csv")
flu_trends.set_index(
    pd.PeriodIndex(flu_trends.Week, freq="W"),
    inplace=True,
)
flu_trends.drop("Week", axis=1, inplace=True)

flu_trends.FluVisits.plot(subplots=True, title="Flu Visits", **plot_params);

This data shows irregular cycles instead of a regular seasonality: the peak tends to occur around the new year, but sometimes earlier or later, sometimes larger or smaller. Because of this irregularity, modeling the series with seasonal features is unlikely to be very successful. Instead, we'll use lag features. This way, the model can learn to react as the case numbers rise and fall.

We'll look at two models. In the first, we'll only use lagged values of the series itself as features. In the second, we'll only use lagged values of Google Trends.

Let's take a look at the autocorrelation and partial autocorrelation plots first:

In [None]:
fig, (ax0, ax1) = plt.subplots(2, 1, figsize=(11, 7), sharex=True)
_ = plot_acf(flu_trends.FluVisits, ax=ax0)
_ = plot_pacf(flu_trends.FluVisits, ax=ax1)

These show an AR effect at lags 1, 2, 3, and 4, which we can verify with AR selection:

In [None]:
ar_res = ar_select_order(flu_trends.FluVisits, maxlag=8)
print(ar_res.ar_lags)

Now let's create the corresponding lag features:

In [None]:
from statsmodels.tsa.tsatools import lagmat

y = flu_trends.FluVisits
X = lagmat(y, maxlag=4, use_pandas=True)

X.head()

And now we'll fit a model and look at the regression curve and residuals:

In [None]:
model = LinearRegression()
model.fit(X, y)
y_pred = model.predict(X)
y_pred = pd.Series(y_pred, index=y.index)

ax = y.plot(**plot_params)
_ = y_pred.plot(ax=ax)

The model seems unable to anticipate sudden changes of direction in the case numbers, however, and so its predictions are always a week behind.

In [None]:
ax = y["2014-11-01":"2015-03-01"].plot(**plot_params)
_ = y_pred["2014-11-01":"2015-03-01"].plot(ax=ax)
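
One way to see why a lag-only model tends to trail the series is to fit the same kind of regression on a synthetic, highly persistent series. The sketch below uses a random walk as an assumed stand-in (it is not the flu data) and regresses each value on the previous one.

```python
import numpy as np

# Synthetic, highly persistent series (a random walk), for illustration only.
rng = np.random.default_rng(2)
y = np.cumsum(rng.normal(size=1000))

lag1, current = y[:-1], y[1:]

# Least-squares fit of the current value on the previous one.
slope, intercept = np.polyfit(lag1, current, deg=1)
pred = intercept + slope * lag1

# With a slope near 1, each prediction is almost a copy of the previous
# observation, so the fitted curve looks like the series shifted one step.
gap_to_previous = np.mean(np.abs(pred - lag1))
gap_to_actual = np.mean(np.abs(pred - current))
print("slope:", round(slope, 3))
print("mean |pred - previous value|:", round(gap_to_previous, 3))
print("mean |pred - actual value|:  ", round(gap_to_actual, 3))
```

When the predictions sit closer to the previous observation than to the actual one, the model is essentially repeating the last value it saw, which shows up in a plot as a one-step delay.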

Now let's imagine an alternate scenario. Suppose that data about flu visits is unavailable, and that we instead use the popularity of search terms related to the flu.

A leading indicator could help the model anticipate these sudden changes. Let's look at a cross-correlation plot to see whether the Google Trends features could provide one:

In [None]:
visit_diff = flu_trends.FluVisits.diff().dropna()
symptoms_diff = flu_trends.FluSymptoms.diff().dropna()
_ = plot_ccf(visit_diff, symptoms_diff, lags=12)

Sure enough, the rise in Google Trends tends to lead the rise in doctors' visits by a week or two. 
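
The idea of a leading indicator can be sketched with pandas alone. In the synthetic example below (made-up data, with the lead of two steps built in by construction), we correlate a target with delayed copies of an indicator and look for the delay with the strongest correlation.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)

# Synthetic data: `indicator` is noise, and `target` repeats it two steps
# later with a little extra noise, so the indicator leads by 2.
indicator = pd.Series(rng.normal(size=300))
target = indicator.shift(2) + 0.1 * pd.Series(rng.normal(size=300))

# Correlate the target with the indicator delayed by k steps.
corr_by_lag = pd.Series(
    {k: target.corr(indicator.shift(k)) for k in range(0, 6)}
)
best_lag = corr_by_lag.idxmax()

print(corr_by_lag.round(2))
print("Indicator leads the target by", best_lag, "steps")
```

The delay with the largest correlation suggests which lag of the indicator to try first as a feature.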

Let's create the lag features and fit a model to see how well we can fit case numbers through search-term popularity:

In [None]:
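# `add_lags` is not defined in this lesson. As a stand-in (an assumption
# about what the helper does), shift every search-term column by one week.
search_terms = flu_trends.drop("FluVisits", axis=1)
X = search_terms.shift(1).add_suffix("_lag_1").dropna()
y = flu_trends.FluVisits.loc[X.index]

# The `normalize` argument was removed in scikit-learn 1.2; ordinary least
# squares gives the same predictions without it.
model = LinearRegression()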
model.fit(X, y)
y_pred_2 = model.predict(X)
y_pred_2 = pd.Series(y_pred_2, index=y.index)

ax = y.plot(**plot_params)
_ = y_pred_2.plot(ax=ax)

In [None]:
ax = (y - y_pred).plot()
_ = (y - y_pred_2).plot(ax=ax)

Comparing the residuals, we can see that the fitted predictions...

# Your Turn #