# Introduction #

In this lesson and the next, you'll learn how to add trend and seasonal components to any machine learning model through feature engineering. This lesson is about modeling trend.

# Moving Average Plots #

The **trend** of a time series is a long-term change in the *level* of the series (the value of the observations, in other words). For a change to be a part of the trend, it should occur over a longer period than any seasonal effects. If a time series only had weekly seasonality, for instance, any change over a period longer than seven days would be a part of the trend.

We can make the trend in a series more apparent by averaging over the seasonal periods in a **moving average plot**. With a moving average plot, we compute the average of values within a *sliding window*. Each point on the graph represents the average of all the points that fall within the window on either side.

<img>CO2 yearly MA animation</img>

To make the trend in a series apparent, we should choose a window size at least as large as the largest seasonal period. The series in the plot above is observed monthly and has yearly seasonality, so to see its trend we chose a moving window of size 12.
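The window-size rule can be checked on synthetic data. This is a minimal sketch on a made-up monthly series (a straight line plus a sine wave with a 12-month period; the slope and amplitude are arbitrary). It compares centered moving averages with a window of 12, matching the seasonal period, and a window of 6, which is too short.

```python
import numpy as np
import pandas as pd

# Made-up monthly series: linear trend plus yearly (12-month) seasonality
t = np.arange(120)
trend = 0.5 * t
seasonal = 3.0 * np.sin(2 * np.pi * t / 12)
series = pd.Series(trend + seasonal)

leftovers = {}
for window in (12, 6):
    ma = series.rolling(window=window, center=True).mean()
    # Subtract the true trend: whatever remains is seasonality the average failed to remove
    leftovers[window] = (ma - trend).dropna()
    spread = leftovers[window].max() - leftovers[window].min()
    print(f"window={window:2d}: leftover spread = {spread:.4f}")
```

With a 12-step window every average covers exactly one full seasonal cycle, so the sine wave cancels and only a small constant offset remains. With a 6-step window, half a cycle survives in the average.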

Plots like these can help you determine what kind of trend you want your model to capture: whether a straight-line linear trend is sufficient, or whether a curved quadratic trend or a trend with changepoints would fit better.

# Trend Models #

In this course, we'll create features that capture important temporal patterns and use scikit-learn's `LinearRegression` model to combine these features into an additive model, very much like Prophet. With only a trend component, the model has this form:

```
target = trend + remainder
```

A common way to model time dependence is through a **time dummy**, a feature that counts off the time steps in the series from beginning to end. It's essentially an ordinal encoding (also known as label encoding) for a column of timestamps. When creating a time dummy for linear regression, it's important that the codes are spaced in proportion to the timestamps. If your time series skips a date, for instance, your time dummy should skip the corresponding code:

<fig>Show time dummy</fig>
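Here is a minimal sketch of building a time dummy that respects a gap, using made-up dates and values. It counts days elapsed since the first timestamp. Starting the count at 0 rather than 1 is an arbitrary choice here; in a linear model with a level term, shifting the time dummy by a constant only changes the fitted level.

```python
import pandas as pd

# Made-up daily observations; note that 2021-01-03 is missing
dates = pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-04", "2021-01-05"])
df = pd.DataFrame({"value": [10.0, 12.0, 15.0, 17.0]}, index=dates)

# Not suitable here: consecutive row numbers ignore the missing day
df["row_number"] = range(len(df))

# Time dummy: days elapsed since the first timestamp, so the gap skips a code
df["time"] = (df.index - df.index[0]).days

print(df)
```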

Leaving off the remainder, a linear trend model has this form:

```
target = level + slope * time
```
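As a minimal sketch of fitting this form, assuming a made-up series with level 5, slope 2, and a little Gaussian noise, `LinearRegression` recovers the level as its intercept and the slope as the coefficient on the time dummy:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Made-up series: level 5, slope 2, plus a little noise
time = np.arange(100)
target = 5.0 + 2.0 * time + rng.normal(scale=0.5, size=time.size)

# scikit-learn expects a 2-D feature matrix, so the time dummy becomes one column
X = time.reshape(-1, 1)
model = LinearRegression().fit(X, target)

level, slope = model.intercept_, model.coef_[0]
print(f"level ~ {level:.2f}, slope ~ {slope:.3f}")
```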

The *CO2* dataset exhibits a strong linear trend.

<img>CO2 with linear trend</img>

Time series with a curving trend can be modeled with *polynomial features*. Create polynomial features by taking powers of the time dummy. To fit a quadratic trend, for instance, you would add a squared feature.

<img>quadratic trend</img>
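A minimal sketch of a quadratic trend, assuming a made-up, noise-free curved series: the squared time dummy is added as a second column, and the same linear model fits both coefficients.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

time = np.arange(100, dtype=float)
# Made-up, noise-free curved trend: level 1, linear term 0.5, quadratic term 0.02
target = 1.0 + 0.5 * time + 0.02 * time**2

# Polynomial features are powers of the time dummy
X = np.column_stack([time, time**2])
model = LinearRegression().fit(X, target)
print(model.intercept_, model.coef_)
```

The model is still *linear* in its coefficients; only the features are nonlinear in time, which is why `LinearRegression` can fit a curve.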

Higher-order polynomials often change rapidly outside of the data they were fit to. This means it's rarely a good idea to use polynomial trend models with degree higher than 2 or perhaps 3, especially for long-term forecasts.

<img>diverging polynomials</img>
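To see the divergence numerically, this sketch fits polynomials of degree 1, 2, and 9 to a made-up series whose true trend is a straight line plus noise, then forecasts 50 steps past the data. It uses NumPy's `Polynomial.fit` rather than the course's scikit-learn setup, purely for brevity. Inside the training range all three fits look similar, but the degree-9 forecast typically lands far from the true trend.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(42)

# Made-up series: a straight-line trend plus noise
t_train = np.arange(100)
y_train = 10.0 + 0.5 * t_train + rng.normal(scale=2.0, size=t_train.size)

# Forecast 50 steps past the last training point and compare with the true trend
t_future = 149
y_true = 10.0 + 0.5 * t_future

errors = {}
for degree in (1, 2, 9):
    # Polynomial.fit rescales time to [-1, 1] internally for numerical stability
    trend = Polynomial.fit(t_train, y_train, deg=degree)
    errors[degree] = trend(t_future) - y_true
    print(f"degree {degree}: forecast error {errors[degree]:+.1f}")
```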

For time series with changing trajectories, it's better to use a model that can capture changepoints.
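One simple way to let a linear model bend at a changepoint, assumed here for illustration and not necessarily the method used later in the course, is a *hinge* feature that is zero before the changepoint and then grows like the time dummy. The sketch assumes the changepoint location is already known and uses a made-up, noise-free series whose slope changes from 1.0 to -0.5.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

t = np.arange(100)
changepoint = 60  # assumed known; in practice it must be chosen or estimated
# Made-up, noise-free series: slope 1.0 before the changepoint, -0.5 after it
y = np.where(t < changepoint, 1.0 * t, changepoint - 0.5 * (t - changepoint))

X = pd.DataFrame({
    "time": t,
    # Hinge feature: zero before the changepoint, then increases by 1 per step
    "time_after_cp": np.maximum(0, t - changepoint),
})
model = LinearRegression().fit(X, y)

slope_before = model.coef_[0]
slope_after = model.coef_[0] + model.coef_[1]  # the hinge coefficient is the change in slope
print(f"slope before: {slope_before:.2f}, slope after: {slope_after:.2f}")
```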

# Trend with Changepoints #

- *Dutch Sales*

# Other Kinds of Trend #

- *Motorcycles* (irregular)
- *Height of River* (no trend)
- *Pepper Prices* (irregular/no trend)
- *Marathon Times* (saturating)
- *Cement* (with changepoints)

# Example - Tunnel Traffic #

This hidden cell loads the data for this example. Note how we've created a time series by setting the index to a series of dates. Though not required for the exercise, you might like to examine the code for details.
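The same two steps the hidden cell performs (`set_index` on the date column, then `to_period`) can be tried on a few inline rows. The dates and counts below are made up; they only mimic the column layout the cell expects (`Day` and `NumVehicles`).

```python
from io import StringIO

import pandas as pd

# Made-up rows with the same columns as the example's CSV (hypothetical values)
csv = StringIO("Day,NumVehicles\n2003-11-01,100\n2003-11-02,120\n2003-11-03,90\n")
df = pd.read_csv(csv, parse_dates=["Day"])

df = df.set_index("Day")        # the date column becomes the index
print(type(df.index).__name__)  # DatetimeIndex

df = df.to_period()             # frequency is inferred from the dates (daily here)
print(type(df.index).__name__)  # PeriodIndex
print(df.index.freqstr)         # D
```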

```python
#$HIDE_INPUT$
from pathlib import Path
from warnings import simplefilter

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

simplefilter("ignore")

# Set Matplotlib defaults
# (this style was named "seaborn-whitegrid" before Matplotlib 3.6)
plt.style.use("seaborn-v0_8-whitegrid")
plt.rc("figure", autolayout=True, figsize=(11, 5))
plt.rc(
    "axes",
    labelweight="bold",
    labelsize="large",
    titleweight="bold",
    titlesize=14,
    titlepad=10,
)

plot_params = dict(
    color="0.75",
    style=".-",
    markeredgecolor="0.25",
    markerfacecolor="0.25",
    legend=False,
)


data_dir = Path("../input/ts-course-data")
tunnel = pd.read_csv(data_dir / "tunnel.csv", parse_dates=["Day"])
# Create a time series in Pandas by setting the index to a date column
tunnel = tunnel.set_index("Day")
# By default, Pandas creates `DatetimeIndex` with dtype `datetime64`.
# A `PeriodIndex` is easier to work with when creating features, however.
tunnel = tunnel.to_period()

tunnel.head()
```

First we'll look at a moving average plot. Since this series has an annual seasonality, we'll use a 365-day window to estimate the trend. In Pandas, use the `rolling` method to create windowed computations, followed by the `mean` method to compute the average.
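The `center` and `min_periods` arguments used below are easiest to see on a tiny toy series. By default, `rolling` uses a trailing window, so each average is labeled with the last point of its window, which shifts the curve to the right. `center=True` labels each average with the middle of its window, and `min_periods` lets the edges use partial windows instead of producing missing values.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

trailing = s.rolling(window=3).mean()               # window ends at each point
centered = s.rolling(window=3, center=True).mean()  # window is centered on each point
partial = s.rolling(window=3, center=True, min_periods=2).mean()  # edges use partial windows

print(pd.DataFrame({"value": s, "trailing": trailing, "centered": centered, "partial": partial}))
```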

```python
moving_average = tunnel.rolling(
    window=365,
    center=True,  # otherwise the trend will be offset
    min_periods=183,  # choose about half the window size
).mean()

ax = tunnel.plot(style=".", color="0.5")
moving_average.plot(ax=ax, linewidth=3, title="365-Day Moving Average", legend=False);
```

The plot suggests that the trend is fairly close to linear, so that's the kind of feature we'll create.

Now we'll create the trend features. `CalendarTimeTrend` from the `statsmodels` library makes it easy to create trend features in a robust way. In particular, it will give the correct encoding even when a series has missing dates.
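To check the missing-date behavior in isolation, here is a small sketch on a hypothetical four-day index with one day missing. It uses the same `CalendarTimeTrend` arguments as the cell below; the dates are made up.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.deterministic import CalendarTimeTrend

# Hypothetical daily index with 2021-01-03 missing
index = pd.PeriodIndex(["2021-01-01", "2021-01-02", "2021-01-04", "2021-01-05"], freq="D")

ctt = CalendarTimeTrend(freq="D", order=1, constant=True, base_period=index[0])
features = ctt.in_sample(index)
print(features)

# Differences between consecutive `trend` values: the gap appears as a step of 2
print(np.diff(features["trend"].to_numpy()))
```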

```python
from statsmodels.tsa.deterministic import CalendarTimeTrend

ctt = CalendarTimeTrend(
    freq="D",  # daily timestamps
    order=1,  # order 1 for linear, order 2 for quadratic
    constant=True,  # to also include the level (the default)
    base_period=tunnel.index[0],  # what date should be coded as 1
)
X = pd.DataFrame(
    ctt.in_sample(tunnel.index),  # use the `in_sample` method to create the features
    index=tunnel.index,
)
y = tunnel["NumVehicles"]  # the target

X
```

Notice the features `CalendarTimeTrend` created. The `const` feature is what models the level, and the `trend` feature is what models the trend.

Now let's create the trend model. As mentioned before, we'll use `LinearRegression` from scikit-learn to create an additive model. For now, we'll just look at the predictions on the training data, and save forecasting for our in-depth look in Lesson 5.

```python
from sklearn.linear_model import LinearRegression

# X already has a `const` column, which plays the role of the intercept
model = LinearRegression(fit_intercept=False)
model.fit(X, y)

y_pred = pd.Series(
    model.predict(X),
    index=tunnel.index,
)


ax = tunnel.plot(color="0.25", style=".", title="Tunnel Traffic - Linear Trend")
y_pred.plot(ax=ax, linewidth=3, label="Trend");
```

The trend discovered by our `LinearRegression` model is almost identical to the moving average plot, which suggests that a linear trend was the right decision in this case.
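Because the model was fit with `fit_intercept=False`, the fitted level is the coefficient on `const` and the slope (change per day) is the coefficient on `trend`. This self-contained sketch shows how to read them off, using a hypothetical 30-day index and a made-up, noise-free target built directly from the trend features (level 100, slope 3).

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from statsmodels.tsa.deterministic import CalendarTimeTrend

# Hypothetical daily index and a made-up target built from the trend features
index = pd.period_range("2021-01-01", periods=30, freq="D")
ctt = CalendarTimeTrend(freq="D", order=1, constant=True, base_period=index[0])
X = ctt.in_sample(index)
y = 100.0 * X["const"] + 3.0 * X["trend"]  # level 100, slope 3 per day (made up)

model = LinearRegression(fit_intercept=False).fit(X, y)

# Pair each coefficient with the feature it multiplies
coefficients = pd.Series(model.coef_, index=X.columns)
print(coefficients)
```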

# Your Turn #