# Introduction #

In this lesson and the next, you'll learn how to add trend and seasonal components to any machine learning model through feature engineering. This lesson is about modeling trend.

# Trend #

The **trend** of a time series is a long-term change in the mean of the series. For a change to be part of the trend, it should occur over a longer period than any seasonal changes. For example, if a time series had weekly seasonality, any trend would have to be a change over a period longer than seven days.

In addition, we require that the trend depend only on the time the observation was taken: it should be a *time dependent* property of the series, not a serially dependent one. This distinguishes a trend (which should be predictable) from a purely random accumulation (which is not).

<img>Trend vs. random walk</img>
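To make the distinction concrete, here is a minimal sketch on simulated data. The slope, noise scale, and seed are arbitrary assumptions chosen for illustration, not values from the course datasets.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
n = 200
time = np.arange(n)

# Deterministic trend plus noise: the mean depends only on the time step
trend_series = pd.Series(0.5 * time + rng.normal(scale=5.0, size=n))

# Random walk: each value is the previous value plus a random shock,
# so any apparent drift is an accumulation of noise (serial dependence)
random_walk = pd.Series(np.cumsum(rng.normal(scale=5.0, size=n)))

# Removing the known trend 0.5 * t leaves only noise around zero
residual = trend_series - 0.5 * time
print(residual.std())        # close to the noise scale used above
print(random_walk.iloc[-1])  # wherever this particular realization ended up
```

Rerunning with a different seed leaves the trend series rising along the same line, while the random walk can end up almost anywhere.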

# Moving Average Plots #

We can make the trend in a series more apparent by averaging over the seasonal periods in a **moving average plot**. With a moving average plot, we compute the average of values within a *sliding window*. Each point on the graph represents the average of all the points that fall within the window on either side.

<figure style="padding: 1em;">
<img src="https://i.imgur.com/EZOXiPs.gif" width=400 alt="An animated plot showing an undulating curve slowly increasing with a moving average line developing from left to right within a window of 12 points (in red).">
<figcaption style="text-align: center; font-style: italic"><center>
</center></figcaption>
</figure>

Choose a window size at least as long as the longest seasonality in the series. This helps to isolate the trend by smoothing over any seasonal effects. The series above has monthly observations and a yearly seasonality, so we chose a moving window of size 12.


When used for visualization, taking the average at the center of the window is best so the trend doesn't appear lagged. Moving averages are also sometimes used as features. In this case, you would want to take the average only of past values to avoid leaking information backward from the future.
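The two kinds of window can be sketched with pandas as follows. The monthly series here is a made-up example (a linear trend plus a 12-month cycle), and the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly series: linear trend plus a 12-month seasonal cycle
index = pd.period_range("2000-01", periods=48, freq="M")
t = np.arange(48)
series = pd.Series(t + 10 * np.sin(2 * np.pi * t / 12), index=index)

# Centered window: good for plots, but uses values after each time step
centered = series.rolling(window=12, center=True).mean()

# Trailing window: shift by one step first, so the feature at time t
# averages only the 12 values strictly before t
trailing = series.shift(1).rolling(window=12).mean()
```

Shifting before rolling is one simple way to keep the current and future values out of the feature. The cost is that the first 12 entries of `trailing` are missing.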

# Trend Models #

In this course, we'll create features that capture important temporal patterns, and use scikit-learn's `LinearRegression` model to combine these features into an additive model very much like Prophet's.

```
target = trend + remainder
```

Because the trend depends only on time, we can model it by using a time-step feature in our training data. Such a feature is called a **time dummy**, which is essentially an ordinal encoding for a column of timestamps:

| Date    | Time  |
|---------|-------|
| 1959-06 | 1.0   |
| 1959-07 | 2.0   |
| 1959-08 | 3.0   |
| 1959-09 | 4.0   |
| 1959-11 | 6.0   |
| ...     | ...   |

A time dummy implicitly includes all the timestamps within the range of dates of the time series, including any that are missing. Generally, this means that if your time series skips a date, your time dummy should skip the corresponding code. Above, the `Date` column skipped `1959-10`, so `Time` needed to skip `5.0`. This only applies to timestamps that are truly missing, however. Many financial time series only include business days, for instance, and in that case we wouldn't count weekends or holidays in the time dummy because we assume nothing happened then.
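One way to build such a time dummy with pandas is to number every period in the full date range and then keep only the codes for the dates actually observed. This is a sketch using the dates from the table above, not the method the course uses later.

```python
import pandas as pd

# Observed months from the table above; 1959-10 is missing
dates = pd.PeriodIndex(
    ["1959-06", "1959-07", "1959-08", "1959-09", "1959-11"], freq="M"
)

# Every month in the covered range, including the missing one
full_range = pd.period_range(dates.min(), dates.max(), freq="M")

# Number the full range 1, 2, 3, ... and keep the codes for observed dates
codes = pd.Series(
    range(1, len(full_range) + 1), index=full_range, dtype="float64"
)
time_dummy = codes.loc[dates].rename("Time")
print(time_dummy)
```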

Trend in a time series can manifest itself in curves with a variety of shapes. Two simple trend patterns, linear and quadratic, are displayed here:

<figure style="padding: 1em;">
<img src="https://i.imgur.com/KFYlgGm.png" width=400 alt="Above, Cars Sold in Quebec: an undulating plot gradually increasing from 1960-01 to 1968-12 with a linear trend-line superimposed. Below, Plastics Production in Australia: an undulating plot with a concave-up quadratic trend-line superimposed.">
<figcaption style="text-align: center; font-style: italic"><center><strong>Top:</strong> Series with a linear trend. <strong>Bottom:</strong> Series with a quadratic trend.
</center></figcaption>
</figure>

A linear model will create a linear trend line when given the time dummy. To create a quadratic trend line, we only need to include an additional column with the time dummy's squared values:

| Date    | Time  (linear) | Time Squared (quadratic)  |
|---------|----------------|---------------------------|
| 1959-06 | 1.0            | 1.0                       |
| 1959-07 | 2.0            | 4.0                       |
| 1959-08 | 3.0            | 9.0                       |
| 1959-09 | 4.0            | 16.0                      |
| 1959-11 | 6.0            | 36.0                      |
| ...     | ...            | ...                       |

It's possible to create higher-order polynomial trends by taking higher powers of the time dummy, but polynomials of higher order can diverge rapidly outside the training data, making them unreliable when used for forecasting. When the trend can't be fit with a linear or quadratic curve, it's usually best to use an alternative regression algorithm. We'll mention a few of these alternatives in the next section.
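As a rough sketch, the two-column feature table above can be passed straight to `LinearRegression`. The target is simulated with assumed coefficients (0.02 and 0.5) and an arbitrary noise scale, and names such as `X_quad` are only illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)  # fixed seed for reproducibility
time = np.arange(1, 101, dtype=float)

# Hypothetical target: quadratic trend plus noise
y_quad = 0.02 * time**2 + 0.5 * time + rng.normal(scale=2.0, size=time.size)

# The time dummy and its square, as in the table above
X_quad = pd.DataFrame({"Time": time, "Time Squared": time**2})

quad_model = LinearRegression()
quad_model.fit(X_quad, y_quad)

# Fitted coefficients for "Time" and "Time Squared"
print(quad_model.coef_)
```

The model itself is still linear in its features. The curvature comes entirely from the squared column.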

# Other Kinds of Trend #

Trend can manifest in a time series in a variety of ways. 

Here are some time series whose trend doesn't fit the linear or quadratic pattern:

<figure style="padding: 1em;">
<img src="https://i.imgur.com/7cxTLCt.png" width=400 alt="">
<figcaption style="text-align: center; font-style: italic"><center>Trend patterns in four time series. <strong>Top left:</strong> Irregular trend. <strong>Top right:</strong> No trend. <strong>Bottom left:</strong> Trend saturating downwards. <strong>Bottom right:</strong> Trend with changepoints.
</center></figcaption>
</figure>

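For trends like these, two alternative regression algorithms are worth knowing:

- MARS (multivariate adaptive regression splines) for trends with changepoints
- GAMs (generalized additive models) for flexible trends with interactions

You can find GAMs in the `statsmodels` and `pygam` libraries, and MARS in `py-earth`.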

<blockquote style="margin-right:auto; margin-left:auto; background-color: #ebf9ff; padding: 1em; margin:24px;">
<strong>statsmodels</strong>

[`statsmodels`](https://www.statsmodels.org/stable/index.html) is a Python library for statistical models and estimation -- it's like the scikit-learn of classical regression and implements a lot of R's core functionality.

In addition to ordinary regression, `statsmodels` also has a comprehensive set of tools for time series analysis, much of which we'll make use of in this course.
</blockquote>

# Example - Tunnel Traffic #

In this example we'll create a trend model for the *Tunnel Traffic* dataset using scikit-learn's `LinearRegression`. *Tunnel Traffic* contains the number of vehicles traveling through the Baregg Tunnel in Switzerland each day from November 2003 to November 2005. Though not required for the exercise, you might like to examine the code in the hidden cell to see how to use Pandas to prepare a time series.

In [None]:
#$HIDE_INPUT$
from pathlib import Path
from warnings import simplefilter

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

simplefilter("ignore")  # ignore warnings to clean up output cells

# Set Matplotlib defaults
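# Matplotlib 3.6 renamed the seaborn styles; fall back to the old name
# on older versions
style = "seaborn-v0_8-whitegrid"
if style not in plt.style.available:
    style = "seaborn-whitegrid"
plt.style.use(style)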
plt.rc("figure", autolayout=True, figsize=(11, 5))
plt.rc(
    "axes",
    labelweight="bold",
    labelsize="large",
    titleweight="bold",
    titlesize=14,
    titlepad=10,
)
plot_params = dict(
    color="0.75",
    style=".-",
    markeredgecolor="0.25",
    markerfacecolor="0.25",
    legend=False,
)


# Load Tunnel Traffic dataset
data_dir = Path("../input/ts-course-data")
tunnel = pd.read_csv(data_dir / "tunnel.csv", parse_dates=["Day"])

# Create a time series in Pandas by setting the index to a date
# column. We parsed "Day" as a date type by using `parse_dates` when
# loading the data.
tunnel = tunnel.set_index("Day")

# By default, Pandas creates a `DatetimeIndex` with dtype
# `datetime64`. I find that a `PeriodIndex` is easier to work with,
# but this is just a matter of preference. See the Pandas
# documentation for details on the difference.
tunnel = tunnel.to_period()

tunnel.plot(title="Tunnel Traffic", **plot_params);

To see what kind of trend this series has, let's make a moving average plot. The *Tunnel Traffic* series has an annual seasonality, so we'll use a 365-day window. (You'll learn how to estimate seasonality in the next lesson.)

To create a moving average, first use the `rolling` method to begin a windowed computation. Follow this with the `mean` method to compute the average over the window.

In [None]:
moving_average = tunnel.rolling(
    window=365,
    center=True,  # puts the average at the center of the window
    min_periods=183,  # choose about half the window size
).mean()

ax = tunnel.plot(style=".", color="0.5")
moving_average.plot(
    ax=ax, linewidth=3, title="Tunnel Traffic - 365-Day Moving Average", legend=False,
);

`DeterministicProcess` is a utility from the `statsmodels` library for creating time series features. It accommodates missing timestamps and can generate features for times outside of the training data (much like Prophet's `make_future_dataframe` we saw in Lesson 1). The `constant` argument creates a feature for the level, and the `order` argument creates a feature for the trend, where "order" refers to polynomial order: 1 is linear, 2 is quadratic, and so on. The moving average plot suggests that the trend is fairly close to linear, so we'll create a time dummy of order 1.

In [None]:
from statsmodels.tsa.deterministic import DeterministicProcess

dp = DeterministicProcess(
    index=tunnel.index,  # dates from the training data
    constant=True,       # the level
    order=1,             # the trend
    drop=True,           # drop terms to avoid collinearity
)
X = dp.in_sample()  # features for the training data

X.head(10)

(Technical note: A "deterministic process" is a time series that is non-random or completely determined. Our trend and seasonal features are deterministic because they depend only on the dates and times, which are fixed. In contrast, a "stochastic process" is a series that is random or unpredictable to some degree. Anything truly requiring a forecast -- like tunnel traffic -- is stochastic. We'll return to the distinction between deterministic and stochastic in Lesson 4.)
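To see what the `order` argument does, here is a small sketch with `order=2` on a short made-up daily index. The names `index` and `dp2` are hypothetical and are kept separate from the `dp` used for the tunnel data.

```python
import pandas as pd
from statsmodels.tsa.deterministic import DeterministicProcess

# Hypothetical daily index standing in for the training dates
index = pd.period_range("2020-01-01", periods=5, freq="D")

dp2 = DeterministicProcess(
    index=index,
    constant=True,  # the level
    order=2,        # linear and quadratic trend terms
    drop=True,
)

# One column each for the level, the time dummy, and its square
print(dp2.in_sample())

# Features for the three days after the index; the time dummy keeps counting
print(dp2.out_of_sample(steps=3))
```

Because the features depend only on the dates, the out-of-sample rows can be generated without any new observations.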

Now let's create the trend model. As mentioned before, we'll use `LinearRegression` from scikit-learn.

In [None]:
from sklearn.linear_model import LinearRegression

y = tunnel["NumVehicles"]  # the target

model = LinearRegression(fit_intercept=False)  # the intercept is the same as the `const` feature
model.fit(X, y)

y_pred = pd.Series(
    model.predict(X),
    index=X.index,
)


ax = tunnel.plot(style=".", color="0.5", title="Tunnel Traffic - Linear Trend")
y_pred.plot(ax=ax, linewidth=3, label="Trend");

The trend discovered by our `LinearRegression` model is almost identical to the moving average plot, which suggests that a linear trend was the right decision in this case.

To make a forecast, we apply our model to "out of sample" features. "Out of sample" refers to times outside of the observation period of the training data, which would include the future times of a forecast. Here's how we could make a 30-day forecast:

In [None]:
X = dp.out_of_sample(steps=30)

y_fore = pd.Series(
    model.predict(X),
    index=X.index,
)

y_fore.head(10)

Let's plot a portion of the series to see the forecast trend:

In [None]:
ax = tunnel["2005-05":].plot(title="Tunnel Traffic - Linear Trend Forecast", **plot_params)
ax = y_pred["2005-05":].plot(ax=ax, linewidth=3, label="Trend")
ax = y_fore.plot(ax=ax, linewidth=3, label="Trend Forecast", color="C3")
ax.legend();

# Your Turn #
