# What is Seasonality? #

Like trend, seasonality is a kind of time dependence sometimes present in time series that describes a changing mean, and, like trend, we will model it as being completely predictable as a function of time (as being *deterministic*, that is).

We say that a time series exhibits **seasonality** whenever there is a regular, periodic change in the mean of the series. Seasonal changes generally follow the clock and calendar -- repetitions over a day, a week, or a year are common. Seasonality is often driven by the cycles of the natural world over days and years or by conventions of social behavior surrounding dates and times.

<figure style="padding: 1em;">
<img src="https://i.imgur.com/ViYbSxS.png" width=800, alt="">
<figcaption style="textalign: center; font-style: italic"><center>Seasonal patterns in four time series.
</center></figcaption>
</figure>

Many time series have multiple seasonal components. Website visits, for instance, might rise and fall regularly over the course of a day and over the course of a year.

# Seasonal Plots and Seasonal Indicators #

Just like we used a moving average plot to discover the trend in a series, we can use a **seasonal plot** to discover seasonal patterns.

A seasonal plot shows segments of the time series plotted against some common period, the period being the "season" you want to observe. The figure shows a seasonal plot of the daily views of Wikipedia's article on *Trigonometry*: the article's daily views plotted over a common *weekly* period.

<figure style="padding: 1em;">
<img src="https://i.imgur.com/bd7D4NJ.png" width=800, alt="">
<figcaption style="textalign: center; font-style: italic"><center>There is a clear weekly seasonal pattern in this series, higher on weekdays and falling towards the weekend.
</center></figcaption>
</figure>

To help bring out a *yearly* pattern in the *Trigonometry* series, let's plot the series with the daily views summed into a monthly total. (Both Pandas and Seaborn have methods that make this easy.)

<figure style="padding: 1em;">
<img src="https://i.imgur.com/HvUcMut.png" width=800, alt="">
<figcaption style="textalign: center; font-style: italic"><center>When a period contains a large number of observations (like days in a year), aggregating observations through a sum or an average can help make any seasonal patterns more prominent.</center></figcaption>
</figure>
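
As a rough sketch of that aggregation step, the snippet below sums a daily series into monthly totals with pandas. The data is synthetic (random counts standing in for page views, not the real *Trigonometry* series), and the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a daily page-view series (assumption: not real data).
index = pd.date_range("2016-01-01", "2016-12-31", freq="D")
rng = np.random.default_rng(0)
daily_views = pd.Series(rng.poisson(lam=1000, size=len(index)), index=index)

# Group the days by calendar month and sum the observations in each group.
monthly_views = daily_views.groupby(daily_views.index.to_period("M")).sum()

print(monthly_views.head())
```

Replacing `.sum()` with `.mean()` gives a monthly average instead, which can be easier to compare across months of different lengths.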

You might be familiar with *categorical plots*. A seasonal plot applies the same idea to time: the series is split into groups by season, and each group is drawn against the common period.

### Seasonal indicators

**Seasonal indicators** are binary features that represent seasonal differences in the level of a time series. Seasonal indicators are what you get if you treat a seasonal period as a categorical feature and apply one-hot encoding.

By one-hot encoding days of the week, we get weekly seasonal indicators. Creating weekly indicators for the *Trigonometry* series will then give us six new "dummy" features. (Linear regression works best if you drop one of the indicators; we chose Monday in the frame below.)

| Date       | Tuesday | Wednesday | Thursday | Friday | Saturday | Sunday |
|------------|---------|-----------|----------|--------|----------|--------|
| 2016-01-04 | 0.0     | 0.0       | 0.0      | 0.0    | 0.0      | 0.0    |
| 2016-01-05 | 1.0     | 0.0       | 0.0      | 0.0    | 0.0      | 0.0    |
| 2016-01-06 | 0.0     | 1.0       | 0.0      | 0.0    | 0.0      | 0.0    |
| 2016-01-07 | 0.0     | 0.0       | 1.0      | 0.0    | 0.0      | 0.0    |
| 2016-01-08 | 0.0     | 0.0       | 0.0      | 1.0    | 0.0      | 0.0    |
| 2016-01-09 | 0.0     | 0.0       | 0.0      | 0.0    | 1.0      | 0.0    |
| 2016-01-10 | 0.0     | 0.0       | 0.0      | 0.0    | 0.0      | 1.0    |
| 2016-01-11 | 0.0     | 0.0       | 0.0      | 0.0    | 0.0      | 0.0    |
| ...        | ...     | ...       | ...      | ...    | ...      | ...    |
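
A frame like the one above could be built with pandas along the following lines. This is a minimal sketch under our own assumptions: it uses `pd.get_dummies` on the day names and drops Monday by hand, which is not necessarily how the table in this lesson was produced.

```python
import pandas as pd

# The eight dates shown in the table above.
dates = pd.date_range("2016-01-04", "2016-01-11", freq="D")
day_names = pd.Series(dates.day_name(), index=dates)

# One-hot encode the day of the week, then keep every day except Monday,
# which becomes the reference level absorbed by the intercept.
kept_days = ["Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
indicators = pd.get_dummies(day_names, dtype=float)[kept_days]

print(indicators)
```

Rows that fall on a Monday contain only zeros, so the model's intercept represents the Monday level and each indicator's weight is the difference from Monday.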

Adding seasonal indicators to the training data helps models distinguish means within a seasonal period:

<figure style="padding: 1em;">
<img src="https://i.imgur.com/sswiBwZ.png" width=800, alt="">
<figcaption style="textalign: center; font-style: italic"><center>Ordinary linear regression learns the mean values at each time in the season.</center></figcaption>
</figure>

# Fourier Features and the Periodogram #

We've seen how the seasonal components of a time series are periodic, repeating at a frequency that might be daily, weekly, monthly, or some other interval. A plot of the **periodogram** shows you the strength of each of these frequencies in a series:

<figure style="padding: 1em;">
<img src="https://i.imgur.com/PK6WEe3.png" width=800, alt="">
<figcaption style="textalign: center; font-style: italic"><center></center></figcaption>
</figure>

The figure above shows the periodogram of the *Wiki Trigonometry* series. Agreeing with our seasonal plots, we see a strong weekly seasonality, as well as an annual (yearly) seasonality with three intermediate fluctuations; these intermediate fluctuations appear as the large bar between *Semiannual* (two times a year) and *Quarterly* (four times a year).
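
To make the idea concrete, here is a bare-bones periodogram computed with NumPy's FFT on a synthetic daily series. It is only a sketch: the series, its amplitudes, and the scaling are our own choices, and it is a simpler calculation than a library routine such as `scipy.signal.periodogram`.

```python
import numpy as np

fs = 365.0   # assumed sampling rate: 365 daily observations per year
n = 730      # two years of synthetic daily data
t = np.arange(n)

# Synthetic series: a strong weekly cycle, a weaker annual cycle, and noise.
rng = np.random.default_rng(0)
y = (
    10 * np.sin(2 * np.pi * t / 7)
    + 3 * np.sin(2 * np.pi * t / 365)
    + rng.normal(scale=1.0, size=n)
)
y = y - y.mean()  # remove the level so the zero frequency does not dominate

# Squared magnitude of the FFT at each frequency, in cycles per year.
spectrum = np.abs(np.fft.rfft(y)) ** 2 / n
freqs = np.fft.rfftfreq(n, d=1 / fs)

peak = freqs[np.argmax(spectrum)]
print(f"Strongest frequency: about {peak:.1f} cycles per year")
```

The largest value lands near 52 cycles per year, which is the weekly cycle expressed on an annual scale.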

### Fourier features

As an alternative to seasonal indicators, we can use "Fourier features" to model these frequencies directly. The idea is to include in our training data periodic curves having the same frequencies as the seasonal components we are trying to model; specifically, we use curves of the trigonometric functions sine and cosine.

**Fourier features** are pairs of sine and cosine curves, one pair for each frequency in the seasonal component starting with the longest. Fourier pairs modeling annual seasonality would have frequencies: once per year, twice per year, three times per year, and so on.

<figure style="padding: 1em;">
<img src="https://i.imgur.com/bKOjdU7.png" width=600, alt="A top figure and a bottom figure, each showing a sine curve and a cosine curve. The curves in the top plot both have frequency of once per year, while the curves in the bottom plot both have a frequency of twice per year.">
<figcaption style="textalign: center; font-style: italic"><center>The first two Fourier pairs for annual seasonality. <strong>Top: </strong>Frequency of once per year. <strong>Bottom: </strong>Frequency of twice per year.</center></figcaption>
</figure>
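
Written out, the Fourier pairs for a season might look like the following. The notation is our own and not from the lesson: $t$ is the time step, $P$ is the length of the season in time steps (about 365.25 for annual seasonality in daily data), and $K$ is the number of pairs.

```latex
\[
  s_k(t) = \sin\!\left(\frac{2\pi k\, t}{P}\right), \qquad
  c_k(t) = \cos\!\left(\frac{2\pi k\, t}{P}\right), \qquad
  k = 1, 2, \dots, K
\]
```

The pair with $k = 1$ completes one cycle per season, the pair with $k = 2$ completes two, and so on.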

If we add a set of these sine / cosine curves to our training data, the linear regression algorithm will figure out the weights that will fit the seasonal component in the target series. The figure illustrates how linear regression used four Fourier pairs to model the annual seasonality in the *Wiki Trigonometry* series.

<figure style="padding: 1em;">
<img src="https://i.imgur.com/AqPMnVx.png" width=800, alt="">
<figcaption style="textalign: center; font-style: italic"><center><strong>Top: </strong>Curves for four Fourier pairs, a sum of sine and cosine with regression coefficients. Each curve models a different frequency. <strong>Bottom: </strong>The sum of these curves approximates the seasonal pattern.</center></figcaption>
</figure>

Notice that we only needed eight features (four sine / cosine pairs) to get a good estimate of the annual seasonality. Compare this to the seasonal indicator method which would have required hundreds of features (one for each day of the year). By modeling only the "main effect" of the seasonality with Fourier features, you'll usually need to add far fewer features to your training data: this means reduced computation time and less risk of overfitting.
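
The sketch below shows the feature count in practice: a constant plus four sine / cosine pairs (nine columns in total) fitted by least squares to a synthetic annual pattern. The target series and its coefficients are invented for illustration, and `np.linalg.lstsq` stands in for a linear regression model.

```python
import numpy as np

period = 365.25  # assumed season length for daily data
order = 4        # four sine / cosine pairs
t = np.arange(2 * 365, dtype=float)

# Design matrix: a constant column followed by one sin/cos pair per frequency.
columns = [np.ones_like(t)]
for k in range(1, order + 1):
    angle = 2 * np.pi * k * t / period
    columns += [np.sin(angle), np.cos(angle)]
X = np.column_stack(columns)

# Synthetic target: a level, two seasonal terms, and a little noise.
rng = np.random.default_rng(0)
y = (
    50
    + 8 * np.sin(2 * np.pi * t / period)
    - 3 * np.cos(4 * np.pi * t / period)
    + rng.normal(scale=0.5, size=t.size)
)

# Least squares finds the weight on each curve.
weights, *_ = np.linalg.lstsq(X, y, rcond=None)
y_fit = X @ weights

print(X.shape)
print(np.round(weights, 2))
```

The fitted weights on the unused frequencies come out close to zero, so extra pairs cost little when the season is smooth.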

### Computing Fourier features (optional)

Knowing how Fourier features are computed isn't essential to using them, but if seeing the details would clarify things, the hidden cell below illustrates how a set of Fourier features could be derived from the index of a time series. (We'll use a library function from statsmodels for our applications, however.)

In [None]:
#$HIDE_INPUT$
import numpy as np
import pandas as pd  # needed for the DataFrame returned below


def fourier_features(index, freq, order):
    time = np.arange(len(index), dtype=np.float32)
    k = 2 * np.pi * (1 / freq) * time
    features = {}
    for i in range(1, order + 1):
        features.update({
            f"sin_{freq}_{i}": np.sin(i * k),
            f"cos_{freq}_{i}": np.cos(i * k),
        })
    return pd.DataFrame(features, index=index)


# Compute Fourier features to the 4th order (8 new features) for a
# series y with daily observations and annual seasonality:
#
# fourier_features(y.index, freq=365.25, order=4)

# Example - Tunnel Traffic #

We'll continue once more with the *Tunnel Traffic* dataset. This hidden cell loads the data and defines two functions: `seasonal_plot` and `plot_periodogram`.

In [None]:
#$HIDE_INPUT$
from pathlib import Path
from warnings import simplefilter

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn.linear_model import LinearRegression
from statsmodels.tsa.deterministic import CalendarFourier, DeterministicProcess

simplefilter("ignore")

# Set Matplotlib defaults
plt.style.use("seaborn-whitegrid")
plt.rc("figure", autolayout=True, figsize=(11, 5))
plt.rc(
    "axes",
    labelweight="bold",
    labelsize="large",
    titleweight="bold",
    titlesize=16,
    titlepad=10,
)
plot_params = dict(
    color="0.75",
    style=".-",
    markeredgecolor="0.25",
    markerfacecolor="0.25",
    legend=False,
)
%config InlineBackend.figure_format = 'retina'


# annotations: https://stackoverflow.com/a/49238256/5769929
def seasonal_plot(X, y, period, freq, ax=None):
    if ax is None:
        _, ax = plt.subplots()
    palette = sns.color_palette("husl", n_colors=X[period].nunique(),)
    ax = sns.lineplot(
        x=freq,
        y=y,
        hue=period,
        data=X,
        ci=False,
        ax=ax,
        palette=palette,
        legend=False,
    )
    ax.set_title(f"Seasonal Plot ({period}/{freq})")
    for line, name in zip(ax.lines, X[period].unique()):
        y_ = line.get_ydata()[-1]
        ax.annotate(
            name,
            xy=(1, y_),
            xytext=(6, 0),
            color=line.get_color(),
            xycoords=ax.get_yaxis_transform(),
            textcoords="offset points",
            size=14,
            va="center",
        )
    return ax


def plot_periodogram(ts, detrend='linear', ax=None):
    from scipy.signal import periodogram
    # Daily observations per year; a plain number avoids pd.Timedelta("1Y"),
    # which recent pandas versions reject.
    fs = 365.2425
    frequencies, spectrum = periodogram(
        ts,
        fs=fs,
        detrend=detrend,
        window="boxcar",
        scaling='spectrum',
    )
    if ax is None:
        _, ax = plt.subplots()
    ax.step(frequencies, spectrum, color="purple")
    ax.set_xscale("log")
    ax.set_xticks([1, 2, 4, 6, 12, 26, 52, 104])
    ax.set_xticklabels(
        [
            "Annual (1)",
            "Semiannual (2)",
            "Quarterly (4)",
            "Bimonthly (6)",
            "Monthly (12)",
            "Biweekly (26)",
            "Weekly (52)",
            "Semiweekly (104)",
        ],
        rotation=30,
    )
    ax.ticklabel_format(axis="y", style="sci", scilimits=(0, 0))
    ax.set_ylabel("Density")
    ax.set_title("Periodogram")
    return ax


data_dir = Path("../input/ts-course-data")
tunnel = pd.read_csv(data_dir / "tunnel.csv", parse_dates=["Day"])
tunnel = tunnel.set_index("Day").to_period("D")

The easiest way to create a seasonal plot in Python is to create a categorical feature for each period and frequency of interest and use Seaborn to separate the seasons. We've wrapped everything up in a convenience function `seasonal_plot`, defined in the hidden cell.

In [None]:
X = tunnel.copy()

# days within a week
X["day"] = X.index.dayofweek  # the frequency
X["week"] = X.index.week  # the period

# months within a year
X["month"] = X.index.month  # use `dt` instead of `index` if your
                            # timestamps are in a column
X["year"] = X.index.year  # you can also get things like quarter,
                          # weekday, weekofyear, ...

fig, (ax0, ax1) = plt.subplots(2, 1, figsize=(11, 6))
seasonal_plot(X, y="NumVehicles", period="year", freq="month", ax=ax0)
seasonal_plot(X, y="NumVehicles", period="week", freq="day", ax=ax1);
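
If you want the numbers behind a seasonal plot rather than the picture, the same period and frequency columns can feed a pivot table. The sketch below uses a small synthetic frame in place of the tunnel data; the `NumVehicles` values are made up.

```python
import numpy as np
import pandas as pd

# Four weeks of synthetic daily data starting on a Monday (assumption: toy values).
index = pd.period_range("2016-01-04", periods=28, freq="D")
X = pd.DataFrame({"NumVehicles": np.arange(28) % 7 + 100}, index=index)

X["day"] = X.index.dayofweek  # the frequency: 0 is Monday, 6 is Sunday
X["week"] = X.index.week      # the period

# One row per day of the week, one column per week.
table = X.pivot_table(index="day", columns="week", values="NumVehicles")

print(table)
print(table.mean(axis=1))  # average level for each day of the week
```

Each column of `table` corresponds to one line in the weekly seasonal plot.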

Now let's look at a periodogram for this series:

In [None]:
plot_periodogram(tunnel.NumVehicles);

The periodogram agrees with the seasonal plots above: it shows both a weekly and an annual component.

The falling peaks occurring after the 'Annual (1)' and 'Weekly (52)' periods are known as *harmonics*. Harmonics occur at multiples of the dominant frequency and indicate that the seasonal curve differs from a pure sine / cosine curve. When using Fourier features, we want to create enough pairs to capture all of the harmonics of a season. From the periodogram, it appears that the harmonics of the 'Annual (1)' season end just before 'Monthly (12)', so let's use an order 10 approximation (10 sine / cosine pairs).
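
To see where harmonics come from, the sketch below takes a seasonal curve that is deliberately not a sine wave (a sawtooth, chosen only for illustration) and looks at which frequencies carry the most power. The series is synthetic and the frequencies are given as FFT bin numbers, that is, cycles per series length.

```python
import numpy as np

n, cycles = 730, 10
period = n // cycles  # each season lasts 73 time steps
t = np.arange(n)

# A sawtooth season: it ramps up over each period and then resets.
y = (t % period) / period
y = y - y.mean()

# Power at each FFT bin; bin k corresponds to k cycles over the whole series.
power = np.abs(np.fft.rfft(y)) ** 2

# The three bins with the most power, in increasing order.
top = np.sort(np.argsort(power)[-3:])
print(top)
```

The dominant frequency is 10 cycles over the series, and the next strongest peaks sit at its multiples, with power falling off as the multiple grows. A pure sine wave would show only the first peak.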

We'll create our seasonal features using `DeterministicProcess`, the same utility we used in Lesson 2 to create trend features. To use two seasonal periods (weekly and yearly), we'll need to instantiate one of them as an "additional term":

In [None]:
from statsmodels.tsa.deterministic import CalendarFourier, DeterministicProcess

fourier = CalendarFourier(freq="A", order=10)  # 10 sin/cos pairs for Annual seasonality

dp = DeterministicProcess(
    index=tunnel.index,
    constant=True,               # level
    order=1,                     # trend (order 1 means linear)
    seasonal=True,               # weekly seasonality (indicators)
    additional_terms=[fourier],  # annual (yearly) seasonality (fourier)
    drop=True,                   # drop terms to avoid collinearity
)

X = dp.in_sample()  # create features for dates in tunnel.index

You can create additional terms for `DeterministicProcess` using:
- `CalendarTimeTrend` for trend features,
- `CalendarSeasonality` for seasonal indicators, and
- `CalendarFourier` for Fourier features,

all from the `tsa.deterministic` module in `statsmodels`.

With our seasonal features created, we're ready to fit our linear regression model:

In [None]:
y = tunnel.NumVehicles.copy()

model = LinearRegression(fit_intercept=False)
_ = model.fit(X, y)

Let's look at the fitted values to get a sense of how successful we were in capturing our series' seasonality. We'll also add a 90-day forecast to see how our model extrapolates beyond the training data.

In [None]:
#$HIDE_INPUT$
y_pred = pd.Series(model.predict(X), index=y.index)
X_fore = dp.out_of_sample(steps=90)
y_fore = pd.Series(model.predict(X_fore), index=X_fore.index)

ax = y.plot(color='0.25', style='.', title="Tunnel Traffic - Seasonal Forecast")
ax = y_pred.plot(ax=ax, label="Seasonal")
ax = y_fore.plot(ax=ax, label="Seasonal Forecast", color='C3')
_ = ax.legend()

Our latest model -- just linear regression with trend and seasonal features -- appears to be making almost the same predictions as the Prophet model we created in Lesson 1. In fact, Prophet's algorithm uses the same feature engineering techniques that you've just learned. Knowing how to create these features yourself, though, means that you can now turn almost any machine learning model into a time series model.

---

There's still more we can do with time series to improve our forecasts, though. In the next lesson, you'll learn how to use time series themselves as features through *lag features*. Lag features give you a powerful way to capture serial dependence in time series that is not always well modeled by trend or seasonality.

# Your Turn #