# Introduction #

In the previous two lessons, we learned how to model the temporally-dependent nature of time series through trend and seasonality.

In this lesson, we'll learn how to model the serially-dependent nature of time series: that is, how the past can affect the future.

# Cycles #

In addition to trend and seasonality, time series can display **cyclic** behavior. Cycles are fluctuations in a time series associated with how the value in a series at one time depends on values at previous times.

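Cycles often involve *mean reversion*: after moving away from its long-run average, the series tends to drift back toward it.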

Cyclic behavior is characteristic of a system that reacts on itself.

What distinguishes cycles from seasons is irregularity. Cycles might be irregular either in duration or in magnitude.

<img>examples of cyclic behavior</img>

- business cycles
- predator-prey cycles
- sunspot cycles
- flu cycles
- chemical cycles

It's possible for series to have trend and seasonality in addition to cyclic behavior, though distinguishing the three can sometimes be difficult: very long cycles can resemble trend and very regular cycles can resemble seasons. For the purposes of creating a forecasting model, however, the distinction may not matter so long as you're able to capture the behavior of the series through the forecasting period.

The features we created for trend and seasonality are sometimes called *deterministic*. (You may have noticed they came from the `deterministic` module in statsmodels.) A deterministic feature doesn't need to be forecast: its values are completely determined ahead of time. The trend and seasonality features are deterministic because they only depend on the calendar, which is fixed.

How something is presently can often tell you a lot about what it will be like in the future.

It can also happen that the fluctuations don't decay back towards a mean. In this case, you have a *highly persistent* series. We'll discuss how to handle these kinds of series in Lesson 7.

# Serial Dependence and Lag Features #

The most common kind of nondeterministic feature is the target itself, or rather, values of the target from the past. If you wanted to predict the size of a fish population this year, you could include as a predictor the size of the population the previous year.

<mark>table</mark>

Notice that we've actually created a new column with shifted values of `Population`. The target has now re-entered the model as a feature. Features created from time-shifted series in this way are called **lag** features. Most nondeterministic features are lag features.
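
As a rough sketch of how such a column can be built, pandas' `shift` method moves a series forward in time by a given number of steps. The population numbers below are made up purely for illustration, and the column name `Population_lag_1` is just a label chosen for this example.

```python
import pandas as pd

# Hypothetical yearly fish counts, invented for this example only.
fish = pd.DataFrame(
    {"Population": [120, 135, 128, 150, 161]},
    index=pd.Index([2015, 2016, 2017, 2018, 2019], name="Year"),
)

# shift(1) moves every value one step later in time, so each row also
# holds the previous year's population.
fish["Population_lag_1"] = fish["Population"].shift(1)

print(fish)
```

The first row has no previous year to draw on, so its lag value is missing. Rows like this are usually dropped before fitting a model.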

One way to determine if your target has a temporal relationship to a feature is by plotting it against the lagged feature.

<note>TODO: lag plots</note>
<figure style="padding: 1em;">
<img src="" width="400" alt="">
<figcaption style="text-align: center; font-style: italic"><center></center></figcaption>
</figure>


Correlation of a time series with past values of itself is called *autocorrelation*. A plot of a series' autocorrelation as a function of the lag is called an *ACF plot*. Correlation of a time series with another time series is called *cross-correlation*.
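
To make the definition concrete, here is a small sketch on a synthetic series (an assumption for illustration, not the course data). It computes the lag-1 autocorrelation by correlating the series with a shifted copy of itself, and compares that with pandas' built-in `Series.autocorr`.

```python
import numpy as np
import pandas as pd

# Synthetic series: each value is 0.8 times the previous value plus random
# noise, so neighboring values are correlated.
rng = np.random.default_rng(0)
noise = rng.normal(size=500)
values = np.zeros(500)
for t in range(1, 500):
    values[t] = 0.8 * values[t - 1] + noise[t]
series = pd.Series(values)

# Autocorrelation at lag k is the correlation between the series and a copy
# of itself shifted by k steps.
manual_lag_1 = series.corr(series.shift(1))
builtin_lag_1 = series.autocorr(lag=1)

print(f"lag 1 (manual):   {manual_lag_1:.2f}")
print(f"lag 1 (built-in): {builtin_lag_1:.2f}")
print(f"lag 5 (built-in): {series.autocorr(lag=5):.2f}")
```

In a series like this one, the correlation weakens as the lag grows, because each extra step adds more noise between the two values being compared.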

There are two kinds of plots you can use to determine which lag features to create from the target. The autocorrelation plot simply shows you the correlation a series has with each of its lags.

A *partial autocorrelation* plot, on the other hand, only shows you the *additional* correlation coming from each lag, after accounting for the shorter lags. This makes it a useful guide to which lag features to create.

<note>TODO: acf & pacf</note>
<figure style="padding: 1em;">
<img src="" width="400" alt="">
<figcaption style="text-align: center; font-style: italic"><center></center></figcaption>
</figure>
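
The difference between the two kinds of correlation can be sketched with NumPy alone. The example below assumes a synthetic series in which each value depends directly only on the one before it. The helper names `autocorr` and `partial_autocorr` are hypothetical. The partial autocorrelation is computed here as the coefficient on the longest lag in a least-squares regression on all lags up to it, which is one common definition and not necessarily the estimator `plot_pacf` uses by default.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x = np.zeros(n)
for t in range(1, n):
    # Direct dependence on the previous value only.
    x[t] = 0.8 * x[t - 1] + rng.normal()


def autocorr(x, lag):
    """Plain correlation between the series and its copy shifted by `lag`."""
    return np.corrcoef(x[lag:], x[:-lag])[0, 1]


def partial_autocorr(x, lag):
    """Coefficient on lag `lag` when regressing x on lags 1..lag."""
    rows = len(x) - lag
    # Column j holds the series shifted by j + 1 steps.
    lagged = np.column_stack(
        [x[lag - j - 1 : lag - j - 1 + rows] for j in range(lag)]
    )
    design = np.column_stack([np.ones(rows), lagged])
    coefs, *_ = np.linalg.lstsq(design, x[lag:], rcond=None)
    return coefs[-1]


for lag in (1, 2, 3):
    print(lag, round(autocorr(x, lag), 2), round(partial_autocorr(x, lag), 2))
```

In this setup the plain autocorrelation at lag 2 is still sizeable, but only because lag 2 influences lag 1, which in turn influences the present. The partial autocorrelation at lag 2 should come out close to zero, which suggests that a lag-2 feature would add little beyond lag 1.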

The most useful situation is when you have a time series feature that's correlated with *future* values of the target -- when, in other words, knowing the feature today can tell you about your target tomorrow. These kinds of features are called **leading indicators**. A leading indicator gives you advance notice of potential changes in what you're trying to forecast. For example, online product reviews might be a leading indicator of future business success.

A cross-correlation plot illustrates the relationship:

<figure style="padding: 1em;">
<img src="" width="400" alt="">
<figcaption style="text-align: center; font-style: italic"><center></center></figcaption>
</figure>
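
As a minimal sketch of the idea behind a cross-correlation plot, the example below builds a hypothetical indicator and a hypothetical target that follows it two steps later, then correlates the target with the indicator shifted by several candidate lags. Both series are synthetic and the two-step delay is an assumption built into the data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 300

# Hypothetical indicator: a smoothed random signal.
indicator = pd.Series(rng.normal(size=n)).rolling(5, min_periods=1).mean()
# Hypothetical target: follows the indicator two steps later, plus noise.
target = indicator.shift(2) + 0.1 * rng.normal(size=n)

# Correlate the target with the indicator shifted by each candidate lag.
lag_corr = pd.Series(
    {lag: target.corr(indicator.shift(lag)) for lag in range(0, 7)}
)
best_lag = lag_corr.idxmax()

print(lag_corr.round(2))
print("strongest correlation at lag", best_lag)
```

The lag with the strongest correlation suggests how far ahead the indicator tends to move before the target does, and so which lag of the indicator is worth trying as a feature.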

# Stationarity #

A time series is **stationary** when its statistical properties are unchanging from step to step. Series with trend or seasonality are *not* stationary: trend describes a mean that is moving upwards or downwards, while seasonality describes a mean that varies periodically. (Remember the moving-average and seasonal plots.)
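
A rough way to see the difference is to compare the mean over the first and second halves of a series. The sketch below uses two synthetic series and a hypothetical helper `half_means`. It is only an illustration, not a formal test for stationarity.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400
noise = rng.normal(size=n)               # no trend: mean stays near 0
trended = 0.05 * np.arange(n) + noise    # upward trend: mean keeps rising


def half_means(x):
    """Mean of the first half and of the second half of a series."""
    mid = len(x) // 2
    return x[:mid].mean(), x[mid:].mean()


print("noise:  ", half_means(noise))
print("trended:", half_means(trended))
```

The two halves of the pure-noise series have nearly the same mean, while the trended series has a much higher mean in its second half.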

# Modeling Autocorrelation #

A model that includes past values of the target as features is called an **autoregressive** model. 

- **distributed lag model**: a model that includes lagged values of other series (such as a leading indicator) as features

Extending our linear regression model with a lagged target, we would have:

```
target = a * target_lag_1 + b * trend + c * seasonality
```
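
The following sketch fits a regression of this form to simulated data, using plain NumPy least squares so that it stays self-contained. The coefficients used to generate the data (`a = 0.6`, `b = 0.05`, `c = 2.0`), the 12-step season, and the added intercept are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
trend = np.arange(n, dtype=float)                  # time-step feature
seasonality = np.sin(2 * np.pi * np.arange(n) / 12)  # one seasonal feature

# Simulate a target from made-up coefficients a=0.6, b=0.05, c=2.0.
target = np.zeros(n)
for i in range(1, n):
    target[i] = (
        0.6 * target[i - 1]
        + 0.05 * trend[i]
        + 2.0 * seasonality[i]
        + rng.normal()
    )

# Each row pairs the target at step i with its own value at step i - 1.
features = np.column_stack(
    [target[:-1], trend[1:], seasonality[1:], np.ones(n - 1)]
)
coefs, *_ = np.linalg.lstsq(features, target[1:], rcond=None)
a, b, c, intercept = coefs

print(f"a={a:.2f}  b={b:.3f}  c={c:.2f}  intercept={intercept:.2f}")
```

The fitted values of `a`, `b`, and `c` should land near the values used in the simulation. With real data, the same design matrix could be passed to `LinearRegression` instead.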

# Example - Flu Trends #

The *Flu Trends* dataset contains records of doctors' visits for the flu together with the popularity of flu-related search terms as captured by Google Trends, for weeks between 2009 and 2016. Our goal is to forecast the number of flu cases for the coming weeks.

In [None]:
#$HIDE_INPUT$
from pathlib import Path
from warnings import simplefilter

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from scipy.signal import periodogram
from sklearn.linear_model import LinearRegression
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.ar_model import ar_select_order
from statsmodels.tsa.tsatools import lagmat

simplefilter("ignore")

# Set Matplotlib defaults
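# Matplotlib 3.6 renamed the seaborn styles; on older versions use
# "seaborn-whitegrid" instead.
plt.style.use("seaborn-v0_8-whitegrid")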
plt.rc("figure", autolayout=True, figsize=(11, 5))
plt.rc(
    "axes",
    labelweight="bold",
    labelsize="large",
    titleweight="bold",
    titlesize=16,
    titlepad=10,
)
plot_params = dict(
    color="0.75",
    style=".-",
    markeredgecolor="0.25",
    markerfacecolor="0.25",
    legend=False,
)


def plot_ccf(x, y, lags=10, ax=None):
    from matplotlib.ticker import MaxNLocator
    if ax is None:
        fig, ax = plt.subplots()
    ax.xcorr(
        x, y, maxlags=lags, usevlines=True, normed=True, lw=2, color="C3",
    )
    ax.xcorr(
        x, y, maxlags=lags, usevlines=False, normed=True, ms=4, color="C3",
    )
    # ax.set_xticks(np.arange(-lags, lags + 1, dtype=np.int8))
    ax.xaxis.set_major_locator(MaxNLocator(integer=True))
    ax.set(
        xlabel="Lag", ylabel="Correlation", title="Cross-Correlation",
    )
    return ax


data_dir = Path("../input/ts-course-data")
flu_trends = pd.read_csv(data_dir / "flu-trends.csv")
flu_trends.set_index(
    pd.PeriodIndex(flu_trends.Week, freq="W"),
    inplace=True,
)
flu_trends.drop("Week", axis=1, inplace=True)

flu_trends.FluVisits.plot(subplots=True, title="Flu Visits", **plot_params);

This data shows irregular cycles rather than regular seasonality: the peak tends to occur around the new year, but sometimes earlier or later, and sometimes larger or smaller. Because of this irregularity, seasonal features are unlikely to model the series very well. Instead, we'll use lag features, which let the model learn to react as the case numbers rise and fall.

We'll look at two models. In the first, we'll only use lagged values of the series itself as features. In the second, we'll only use lagged values of Google Trends.

Let's take a look at the autocorrelation and partial autocorrelation plots first:

In [None]:
fig, (ax0, ax1) = plt.subplots(2, 1, figsize=(11, 7), sharex=True)
_ = plot_acf(flu_trends.FluVisits, ax=ax0)
_ = plot_pacf(flu_trends.FluVisits, ax=ax1)

These show an AR effect at lags 1, 2, 3, and 4, which we can verify with AR selection:

In [None]:
ar_res = ar_select_order(flu_trends.FluVisits, maxlag=8)
print(ar_res.ar_lags)

Now let's create the corresponding lag features:

In [None]:
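def add_lags(X, y, lags=(1,)):
    # Build a column for every lag from 1 to max(lags); trim="both" drops the
    # first max(lags) rows, which have no complete history.
    X_lag = lagmat(y, maxlag=max(lags), use_pandas=True, trim="both")
    # Keep only the requested lags. Selecting by position assumes `y` is a
    # single series, so that column k - 1 holds lag k.
    X_lag = X_lag.iloc[:, [lag - 1 for lag in lags]]
    X = X.join(X_lag).dropna()
    y = y.loc[X.index]
    return X, y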


X = pd.DataFrame(index=flu_trends.index)
y = flu_trends.FluVisits

X, y = add_lags(X, y, lags=[1, 2, 3, 4])

X.head()
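
For comparison, similar lag features can be built with pandas alone. The sketch below uses a hypothetical helper named `make_lags` and a toy series standing in for the real data. Its column names are chosen for this example and differ from the ones `lagmat` produces.

```python
import pandas as pd


def make_lags(series, lags):
    """Return a DataFrame with one shifted copy of `series` per lag."""
    return pd.concat(
        {f"{series.name}_lag_{lag}": series.shift(lag) for lag in lags},
        axis=1,
    )


# Toy series standing in for the real data.
visits = pd.Series([10, 12, 15, 11, 9, 14], name="Visits")

# Drop the leading rows that lack a full set of lags, then align the target.
X_toy = make_lags(visits, lags=[1, 2]).dropna()
y_toy = visits.loc[X_toy.index]

print(X_toy)
```

Either approach leaves the first few time steps without a complete set of lags, so those rows are dropped and the target is aligned to the rows that remain.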

And now we'll fit a model and look at the regression curve and residuals:

In [None]:
model = LinearRegression()
model.fit(X, y)
y_pred = model.predict(X)
y_pred = pd.Series(y_pred, index=y.index)

ax = y.plot(**plot_params)
_ = y_pred.plot(ax=ax)

The model seems unable to anticipate sudden changes in direction in the case numbers, however, so its predictions tend to trail the actual values by about a week. Zooming in on a single flu season makes the delay easier to see:

In [None]:
ax = y["2014-11-01":"2015-03-01"].plot(**plot_params)
_ = y_pred["2014-11-01":"2015-03-01"].plot(ax=ax)

Now let's imagine an alternate scenario. Suppose that data about flu visits is unavailable. Instead, we'll use the popularity of search terms related to the flu.

We may be able to anticipate sudden changes in case numbers if we use a leading indicator. Let's look at a cross-correlation plot to see if the Google Trends feature could provide this:

In [None]:
visit_diff = flu_trends.FluVisits.diff().dropna()
symptoms_diff = flu_trends.FluSymptoms.diff().dropna()
_ = plot_ccf(visit_diff, symptoms_diff, lags=12)

Sure enough, the rise in Google Trends tends to lead the rise in doctors' visits by a week or two. 

Let's create the lag features and fit a model to see how well we can fit case numbers through search-term popularity:

In [None]:
X = pd.DataFrame(index=flu_trends.index)
y = flu_trends.FluVisits.copy()

X, _ = add_lags(X, flu_trends.drop("FluVisits", axis=1), lags=[1])
y = y[X.index]

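# The `normalize` argument was removed in scikit-learn 1.2; rescaling the
# features doesn't change ordinary least squares predictions anyway.
model = LinearRegression()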
model.fit(X, y)
y_pred_2 = model.predict(X)
y_pred_2 = pd.Series(y_pred_2, index=y.index)

ax = y.plot(**plot_params)
_ = y_pred_2.plot(ax=ax)

In [None]:
ax = (y - y_pred).plot()
_ = (y - y_pred_2).plot(ax=ax)

Comparing the residuals, we can see that the fitted predictions...
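
One way to put a number on a comparison like this is the root-mean-squared error of each model's residuals. The sketch below defines a hypothetical helper `rmse` and applies it to toy values standing in for `y`, `y_pred`, and `y_pred_2`. Because the two models drop a different number of initial rows, the helper only scores the time steps for which a prediction exists.

```python
import numpy as np
import pandas as pd


def rmse(y_true, y_pred):
    """Root-mean-squared error over the time steps both series share."""
    residuals = (y_true - y_pred).dropna()
    return float(np.sqrt((residuals ** 2).mean()))


# Toy values standing in for the real target and predictions.
actual = pd.Series([3.0, 5.0, 4.0, 6.0])
pred_a = pd.Series([2.0, 5.0, 5.0, 6.0])
pred_b = pd.Series([np.nan, 4.0, 4.0, 8.0])  # first step has no prediction

print(f"model A: {rmse(actual, pred_a):.3f}")
print(f"model B: {rmse(actual, pred_b):.3f}")
```

For a fair comparison between two models, it may be better to restrict both to the same set of dates before computing the error.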

# Your Turn #