# Introduction #

You'll learn how to discover seasonality through data visualization and how to create two kinds of seasonal features.

# Seasonal Plots #

Just like we used a moving-average plot to visualize the trend in a series, we can use a **seasonal plot** to discover seasonal patterns.

When making a seasonal plot, you need to decide on the *period* of seasonality as well as the *frequency* with which to plot points within each period. For example, we might want to know how a website's traffic varies over a year (the period) from month to month (the frequency), or how the traffic varies over a day (the period) from hour to hour (the frequency).

In [None]:
#$HIDE_INPUT$
from pathlib import Path
from warnings import simplefilter

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


# annotations: https://stackoverflow.com/a/49238256/5769929
def seasonal_plot(X, target, period, freq, ax=None):
    if ax is None:
        _, ax = plt.subplots()
    palette = sns.color_palette("husl", n_colors=X[period].nunique(),)
    ax = sns.lineplot(
        x=freq,
        y=target,
        hue=period,
        data=X,
        ci=False,
        ax=ax,
        palette=palette,
        legend=False,
    )
    ax.set_title(f"Seasonal Plot ({period}/{freq})")
    for line, name in zip(ax.lines, X[period].unique()):
        y = line.get_ydata()[-1]
        ax.annotate(
            name,
            xy=(1, y),
            xytext=(6, 0),
            color=line.get_color(),
            xycoords=ax.get_yaxis_transform(),
            textcoords="offset points",
            size=14,
            va="center",
        )
    return ax


simplefilter("ignore")

# Set Matplotlib defaults
plt.style.use("seaborn-whitegrid")
plt.rc("figure", autolayout=True, figsize=(11, 5))
plt.rc(
    "axes",
    labelweight="bold",
    labelsize="large",
    titleweight="bold",
    titlesize=14,
    titlepad=10,
)


data_dir = Path("/home/kaggle/datasets/wikipedia-trends/")
df = pd.read_csv(
    data_dir / "trig.csv",
    parse_dates=["date"],
    usecols=["date", "views"],
    index_col="date",
).to_period("D")

X = df.loc["2016":"2019"]
X = X[X.views < 4000].copy()  # keep days under 4000 views; copy before adding columns
X["dayofweek"] = X.index.dayofweek
X["week"] = X.index.week
X["month"] = X.index.month
X["year"] = X.index.year

fig, (ax0, ax1) = plt.subplots(2, 1, figsize=(11, 7))
ax = seasonal_plot(X, "views", period="week", freq="dayofweek", ax=ax0)
_ = seasonal_plot(X, "views", period="year", freq="month", ax=ax1)

The figure illustrates some of the variety of seasonal patterns you might find in a time series.
- Wiki Trends
- Bank Calls

# Seasonal Indicators #

One way to model seasonality is through *seasonal indicators*. 

The observation to make from the seasonal plots is that each point in the season has a different level.

The set of indicators created by a one-hot encoded category is like a function that...

Adding a set of seasonal indicators means the linear regression model can learn to shift the overall level from season-to-season.
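As a rough illustration of how indicators shift the level, here is a minimal sketch on synthetic data. The series, the weekday offsets, and the column prefix `day` are assumptions made up for this example; they are not taken from any dataset in this lesson.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic daily series (hypothetical data): a base level of 100 plus a
# fixed offset for each day of the week, and a little noise.
rng = np.random.default_rng(0)
index = pd.period_range("2020-01-01", periods=140, freq="D")
offsets = np.array([0.0, 5.0, 10.0, 5.0, 0.0, -20.0, -30.0])  # Mon..Sun
dow = index.dayofweek.to_numpy()
y = pd.Series(100 + offsets[dow] + rng.normal(0, 1, len(index)), index=index)

# One-hot encode the day of the week. Dropping the first category (Monday)
# avoids collinearity with the model's intercept.
dayofweek = pd.Series(dow, index=index, name="dayofweek")
X = pd.get_dummies(dayofweek, prefix="day", drop_first=True).astype(float)

model = LinearRegression().fit(X, y)

# Each coefficient is the learned shift in level relative to Monday.
shifts = pd.Series(model.coef_, index=X.columns).round(1)
print(shifts)
```

The intercept plays the role of the Monday level, and each indicator's coefficient moves the prediction up or down for its own day.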

- CO2

The seasonality accounts for the deviation from the trend-line.

How could we capture this effect in our model?

- Wiki Trends

Seasonal indicators work best when you have only a few categories to encode, like months within a year (12) or days within a week (7). When the seasonality is high-frequency, like days within a year, it's better to use our next method.

# Fourier Features and the Periodogram #

- Wiki Trends

Fourier features let us model a seasonal pattern with far fewer features than would be required with indicators, which reduces the risk of overfitting. An annual seasonality occurring over days would need 365 indicators, but you might be able to model the same pattern with only 10 or 20 Fourier features.

With indicators, we modeled changes in a series over time. Another approach is to model changes over frequency. (timepoint vs. cycle)
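Fourier features are pairs of sine and cosine curves, one pair for each frequency in the season. The sketch below builds them by hand for an annual season on a daily index. The helper `fourier_features` and its column names are hypothetical and only meant to show the idea; later in this lesson the features come from statsmodels instead.

```python
import numpy as np
import pandas as pd


def fourier_features(index, period, order):
    """Return sine/cosine pairs for `order` frequencies of a given period.

    Hypothetical helper for illustration. `period` is the season length
    measured in steps of the index (days here).
    """
    time = np.arange(len(index), dtype=float)
    k = 2 * np.pi * time / period
    features = {}
    for i in range(1, order + 1):
        features[f"sin_{period}_{i}"] = np.sin(i * k)
        features[f"cos_{period}_{i}"] = np.cos(i * k)
    return pd.DataFrame(features, index=index)


index = pd.period_range("2020-01-01", periods=730, freq="D")
X_fourier = fourier_features(index, period=365.25, order=4)
print(X_fourier.shape)  # 4 pairs -> 8 columns, versus 365 indicators
```

A linear regression on these columns fits a smooth periodic curve; raising `order` lets the curve follow faster changes within the season.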


# Example - Tunnel Traffic #

We'll continue once more with the *Tunnel Traffic* dataset. This hidden cell loads the data and defines three functions, `seasonal_plot`, `plot_periodogram` and `add_trend`.

In [None]:
#$HIDE_INPUT$
from pathlib import Path
from warnings import simplefilter

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from scipy.signal import periodogram
from sklearn.linear_model import LinearRegression
from statsmodels.tsa.deterministic import CalendarTimeTrend

simplefilter("ignore")

# Set Matplotlib defaults
plt.style.use("seaborn-whitegrid")
plt.rc("figure", autolayout=True, figsize=(11, 5))
plt.rc(
    "axes",
    labelweight="bold",
    labelsize="large",
    titleweight="bold",
    titlesize=16,
    titlepad=10,
)
plot_params = dict(
    color="0.75",
    style=".-",
    markeredgecolor="0.25",
    markerfacecolor="0.25",
    legend=False,
)


# annotations: https://stackoverflow.com/a/49238256/5769929
def seasonal_plot(X, target, period, freq, ax=None):
    if ax is None:
        _, ax = plt.subplots()
    palette = sns.color_palette("husl", n_colors=X[period].nunique(),)
    ax = sns.lineplot(
        x=freq,
        y=target,
        hue=period,
        data=X,
        ci=False,
        ax=ax,
        palette=palette,
        legend=False,
    )
    ax.set_title(f"Seasonal Plot ({period}/{freq})")
    for line, name in zip(ax.lines, X[period].unique()):
        y = line.get_ydata()[-1]
        ax.annotate(
            name,
            xy=(1, y),
            xytext=(6, 0),
            color=line.get_color(),
            xycoords=ax.get_yaxis_transform(),
            textcoords="offset points",
            size=14,
            va="center",
        )
    return ax

def plot_periodogram(ts, ax=None):
    # fs is the number of daily samples per year, so frequencies are in cycles per year
    frequencies, coefficient = periodogram(ts, fs=365.25)
    if ax is None:
        _, ax = plt.subplots()
    ax.step(frequencies, coefficient, color="purple")
    ax.set_xscale("log")
    ax.set_xticks([1, 2, 4, 6, 12, 26, 52])
    ax.set_xticklabels(
        [
            "Annually",
            "Semiannually",
            "Quarterly",
            "Bimonthly",
            "Monthly",
            "Biweekly",
            "Weekly",
        ],
        rotation=30,
    )
    ax.ticklabel_format(axis="y", style="sci", scilimits=(0, 0))
    ax.set_ylabel("Periodogram Coefficient")
    ax.set_title("Periodogram for an Annual Period with Daily Frequency")
    return ax


def add_trend(X, freq="D", order=1):
    trend = CalendarTimeTrend(
        freq=freq, order=order, constant=False, base_period=X.index[0],
    )
    return X.join(trend.in_sample(X.index))


data_dir = Path("../input/ts-course-data")
tunnel = pd.read_csv(data_dir / "tunnel.csv", parse_dates=["Day"])
tunnel = tunnel.set_index("Day").to_period("D")

The easiest way to create a seasonal plot in Python is to create a categorical feature for each period and frequency of interest and use seaborn to separate the seasons.

In [None]:
X = tunnel.copy()

# days within a week
X["day"] = X.index.dayofweek  # the frequency
X["week"] = X.index.week  # the period

# months within a year
X["month"] = X.index.month  # use `dt` instead of `index` if your
                            # timestamps are in a column
X["year"] = X.index.year  # you can also get things like quarter,
                          # weekday, weekofyear, ...

fig, (ax0, ax1) = plt.subplots(2, 1, figsize=(11, 8))
seasonal_plot(X, target="NumVehicles", period="year", freq="month", ax=ax0)
seasonal_plot(X, target="NumVehicles", period="week", freq="day", ax=ax1);

We have pretty clear seasonality, both yearly (annual) and weekly. Just like for trend features, we'll use a function from statsmodels to create the seasonal indicators.

In [None]:
from statsmodels.tsa.deterministic import CalendarSeasonality

cs = CalendarSeasonality(freq="D", period="W")
X_weekly = cs.in_sample(tunnel.index)

X_weekly.head()

Now let's look at a periodogram for this series:

In [None]:
plot_periodogram(tunnel.NumVehicles);

The periodogram agrees with the seasonal plots above: ...
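To get a feel for how a periodogram picks out a season, here is a small sketch on synthetic data. It assumes a made-up daily series containing only a weekly cycle, and uses `scipy.signal.periodogram` with the same `fs=365.25` convention as `plot_periodogram`, so frequencies are measured in cycles per year.

```python
import numpy as np
from scipy.signal import periodogram

# Synthetic daily series (hypothetical data): four years of a pure weekly cycle.
n_days = 4 * 365
t = np.arange(n_days)
ts = np.sin(2 * np.pi * t / 7)

# fs is the number of samples per year, so frequencies are in cycles per year.
frequencies, spectrum = periodogram(ts, fs=365.25)

# The strongest frequency should sit near 365.25 / 7, about 52 cycles per year.
peak = frequencies[np.argmax(spectrum)]
print(round(peak))  # 52, i.e. the "Weekly" tick
```

A real series will show several peaks of different heights; the highest frequency with a clear peak is a reasonable guide to how many Fourier pairs to include.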

In [None]:
from statsmodels.tsa.deterministic import CalendarFourier

fourier = CalendarFourier(freq="A", order=12)
X_annual = fourier.in_sample(tunnel.index)

X_annual.iloc[:, :6].head()  # the first three pairs of Fourier features

In [None]:
X = X_weekly.join(X_annual)
X = add_trend(X, order=1)
y = tunnel.NumVehicles.copy()

model = LinearRegression()
model.fit(X, y)

y_pred = model.predict(X)
y_pred = pd.Series(y_pred, index=y.index)

ax = y.plot(style=".", color="0.25")
_ = y_pred.plot(ax=ax)

Not too bad! The model we created through feature engineering is functionally almost identical to the model from Lesson 1. In the next lesson, you'll learn how to extend it even further by including time series themselves as features.

# Your Turn #