# Introduction #

In this lesson, we'll learn how to define and evaluate forecasts, and how to address some of the unique challenges that come with forecasting. We'll apply these ideas to the *Electricity Demand* dataset.

# Defining the Forecasting Goal #

Let's review some terminology commonly used in forecasting.

Why wait until now to talk about evaluation? In the previous lessons, we learned how to build models that can make good forecasts. Now we'll be more precise about what defines a forecast, so that we can learn how to evaluate one.

- **Forecast origin**
  - the time at which the forecast is made
  - generally, the final time step in the training period
- **Lead time**
  - how long after the origin the forecast starts
  - also called: gap, delay, or embargo
  - forecasts with a lead time greater than one step are called multi-step *ahead* forecasts
- **Forecast horizon**
  - the time period for which forecasts are made
  - also called: test period
- **One-step and multi-step forecasts**
  - how many time steps are forecast from a given origin
  - equivalently, the length of the forecast horizon

<img>forecast definition diagram</img>
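To make these definitions concrete, here is a small sketch that lists the timestamps a forecast should cover, given its origin, lead time, and horizon. It assumes hourly data and counts lead time in steps, so that a lead time of 1 means the forecast starts at the step right after the origin (under this convention, a gap of `g` skipped steps corresponds to a lead time of `g + 1`). `forecast_window` is a hypothetical helper, not a library function.

```python
import pandas as pd


def forecast_window(origin, lead_time, horizon, step=pd.offsets.Hour()):
    """Timestamps that a forecast made at `origin` should predict.

    Assumed conventions (hypothetical helper):
    lead_time -- steps from the origin to the first forecast (1 = next step)
    horizon   -- number of steps forecast, i.e. the number of model outputs
    """
    first = origin + lead_time * step
    return pd.date_range(first, periods=horizon, freq=step)


origin = pd.Timestamp("2014-01-01 23:00")
print(forecast_window(origin, lead_time=1, horizon=3))   # one-step-ahead start
print(forecast_window(origin, lead_time=25, horizon=2))  # starts a day later
```

With an origin at 23:00, a lead time of 1 and a horizon of 3 give 00:00, 01:00, and 02:00 of the next day; a lead time of 25 pushes the start to 00:00 two days after the origin's date.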

We need to understand the circumstances of the problem. What is the goal? What are the constraints?

Note that there is a difference between a single multi-step forecast (several outputs from one origin) and multiple single-step forecasts (one output from each of several origins).

<img>single-step / multi-step table</img>

The length of the forecast horizon (its number of steps) equals the number of outputs the model should produce.

A multi-step forecast means you predict values for multiple times in the future all from the same forecast origin -- each row in the dataframe has several target values.
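One way to build such a target, sketched here with pandas: shift the series backward once per step, so that row `t` holds `y[t]`, `y[t+1]`, ..., `y[t+steps-1]`. `make_multistep_target` is a helper name chosen for illustration. If the features are lags (values at `t-1` and earlier), the forecast origin for row `t` is `t-1`.

```python
import pandas as pd


def make_multistep_target(ts, steps):
    """Row t holds the targets y[t], y[t+1], ..., y[t+steps-1]."""
    return pd.concat(
        {f"y_step_{i + 1}": ts.shift(-i) for i in range(steps)},
        axis=1,
    )


ts = pd.Series([10, 11, 12, 13, 14])
y = make_multistep_target(ts, steps=3)
print(y)
```

The last `steps - 1` rows have missing targets (there are no future values to fill them), so they would be dropped before training.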

Some machine learning algorithms are capable of producing multiple outputs (including linear regression and neural nets), but there are workarounds for those that can't. You might start with scikit-learn's [`MultiOutputRegressor`](https://scikit-learn.org/stable/modules/multiclass.html#multioutput-regression), but also see [this discussion](https://www.kaggle.com/c/m5-forecasting-accuracy/discussion/151927) from the *M5 Forecasting* competition on the so-called recursive and direct methods.
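Here is a minimal sketch of the *direct* strategy using `MultiOutputRegressor`, which fits one independent copy of a single-output regressor per step. `GradientBoostingRegressor` is used only because it can't produce multiple outputs on its own; the sine-wave series, the four lag features, and the 3-step horizon are assumptions for illustration. (The recursive strategy, which instead feeds one-step predictions back in as lags, isn't shown here.)

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor

STEPS = 3  # forecast horizon: number of outputs per forecast origin

# Synthetic series, for illustration only.
ts = pd.Series(np.sin(np.arange(200) / 5.0))

# Features: lags 1-4, all known at the forecast origin t-1.
X = pd.concat({f"lag_{i}": ts.shift(i) for i in range(1, 5)}, axis=1)
# Targets: y[t], y[t+1], y[t+2].
y = pd.concat({f"y_step_{i + 1}": ts.shift(-i) for i in range(STEPS)}, axis=1)

# Keep only rows where every lag and every target is available.
mask = X.notna().all(axis=1) & y.notna().all(axis=1)
X, y = X[mask], y[mask]

# Chronological split: hold out the last 20 rows.
X_train, X_test = X.iloc[:-20], X.iloc[-20:]
y_train, y_test = y.iloc[:-20], y.iloc[-20:]

# Direct strategy: one separate single-output model per step.
model = MultiOutputRegressor(GradientBoostingRegressor(random_state=0))
model.fit(X_train, y_train)

y_pred = pd.DataFrame(model.predict(X_test), index=y_test.index, columns=y.columns)
print(y_pred.head())
```

Because each step gets its own model, `model.estimators_` holds one fitted regressor per output column.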

# Cross Validation #

What we really care about is how our model performs on *new* data (that is, the out-of-sample or test data).

Performance evaluation is an essential part of the machine learning workflow. Two techniques you might be familiar with are *the hold-out method* and *cross-validation*. (For a refresher, see our [lesson on cross-validation](https://www.kaggle.com/alexisbcook/cross-validation) in the *Intermediate ML* course.) We can use both of these evaluation techniques in forecasting, but the time dependence and serial dependence in time series mean we have to take special care.

How you evaluate a machine learning model should reflect, as much as possible, how you intend to use it. This is the basic principle of model evaluation. The biggest trouble with time series is that their statistical properties can be time dependent: the very thing that your model is trying to learn can shift over time.

Whichever method you use, when a series' properties can shift over time it's best to have the training set come chronologically before the validation set.

When using the hold-out method, the strategy is to choose a validation set from a time that resembles the forecasting period as closely as possible. If you had a 30-day forecast horizon, you might use the 30 days just before the horizon as your hold-out validation set. On the other hand, if your forecast horizon was the last two weeks of December (the holiday season in many parts of the world), it might be better to use the last two weeks of December from the previous year.
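The two hold-out choices above could be sketched like this, assuming an hourly `pd.Series` with a `DatetimeIndex` that ends just before the forecast horizon. The helper names and the synthetic data are hypothetical. In both cases the training data comes strictly before the validation set.

```python
import pandas as pd


def holdout_just_before(y, horizon_start, days):
    """Validation set: the `days` days immediately preceding the horizon."""
    start = horizon_start - pd.Timedelta(days=days)
    valid = y[(y.index >= start) & (y.index < horizon_start)]
    train = y[y.index < start]
    return train, valid


def holdout_same_period_last_year(y, horizon_start, horizon_end):
    """Validation set: the same calendar window, one year earlier."""
    start = horizon_start - pd.DateOffset(years=1)
    end = horizon_end - pd.DateOffset(years=1)
    valid = y[(y.index >= start) & (y.index < end)]
    train = y[y.index < start]
    return train, valid


# Synthetic hourly data ending right before a hypothetical horizon of Dec 18-31, 2014.
index = pd.date_range("2012-01-01", "2014-12-17 23:00", freq=pd.offsets.Hour())
y = pd.Series(range(len(index)), index=index)
horizon_start, horizon_end = pd.Timestamp("2014-12-18"), pd.Timestamp("2015-01-01")

train_recent, valid_recent = holdout_just_before(y, horizon_start, days=30)
train_lastyear, valid_lastyear = holdout_same_period_last_year(y, horizon_start, horizon_end)
```

After choosing a model this way, you would typically refit it on all the available data before making the final forecast.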

<img>time series hold-out</img>

Time-series cross-validation is similar: each fold pairs a fixed-size validation set with a training set made of data from before it. The training set can take the form of either a *rolling window* or an *expanding window*:

<figure style="padding: 1em;">
<img src="https://i.imgur.com/NFnQLwe.png" width=400 alt="Time-series cross-validation with rolling and expanding training windows">
<figcaption style="text-align: center; font-style: italic"><center>
</center></figcaption>
</figure>

You could use a rolling window to see how the model changes as the training data changes, and an expanding window to see how robust your model is to overfitting due to insufficient training data.
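In scikit-learn, `TimeSeriesSplit` produces expanding windows by default; setting `max_train_size` caps the training set, which turns it into a rolling window. The toy 20-step array below exists only so we can print each fold's index ranges.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(20).reshape(-1, 1)  # 20 time steps, in chronological order

# Expanding window (the default): every training set starts at step 0.
expanding = TimeSeriesSplit(n_splits=3, test_size=4)
# Rolling window: keep at most the 8 most recent steps for training.
rolling = TimeSeriesSplit(n_splits=3, test_size=4, max_train_size=8)


def fold_ranges(cv, X):
    """(first train, last train, first test, last test) index of each fold."""
    return [
        (int(tr[0]), int(tr[-1]), int(te[0]), int(te[-1]))
        for tr, te in cv.split(X)
    ]


print("expanding:", fold_ranges(expanding, X))
print("rolling:  ", fold_ranges(rolling, X))
# expanding: [(0, 7, 8, 11), (0, 11, 12, 15), (0, 15, 16, 19)]
# rolling:   [(0, 7, 8, 11), (4, 11, 12, 15), (8, 15, 16, 19)]
```

The two schemes agree on the first fold because only 8 steps of history exist before it; after that, the rolling window drops its oldest steps as it moves forward.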

Time-series cross-validation of this form is sometimes called **backtesting**, especially when the performance is considered over time instead of averaged together into a single number. Backtesting a forecasting model can help you understand how it performs as the behavior of the time series evolves or during extraordinary events. You might backtest forecasts for a utility-demand model to see how it performs under severe weather conditions or as the service population grows.

<img>backtesting</img>
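A backtest keeps the per-fold scores instead of averaging them into one number. Below is a sketch, assuming synthetic hourly data with a hypothetical level shift standing in for an "extraordinary event"; `backtest` is a hypothetical helper, not a library function.

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import TimeSeriesSplit


def backtest(model, X, y, cv):
    """Fit a fresh copy of `model` on each fold; return one row per fold."""
    rows = []
    for train_idx, test_idx in cv.split(X):
        fold_model = clone(model).fit(X.iloc[train_idx], y.iloc[train_idx])
        pred = fold_model.predict(X.iloc[test_idx])
        rows.append({
            "test_start": y.index[test_idx[0]],
            "test_end": y.index[test_idx[-1]],
            "rmse": np.sqrt(mean_squared_error(y.iloc[test_idx], pred)),
        })
    return pd.DataFrame(rows).set_index("test_start")


# Synthetic hourly data: linear trend plus a daily cycle (illustration only).
index = pd.date_range("2020-01-01", periods=24 * 60, freq=pd.offsets.Hour())
t = np.arange(len(index))
daily = 2 * np.pi * t / 24
X = pd.DataFrame({"trend": t, "sin": np.sin(daily), "cos": np.cos(daily)}, index=index)
y = pd.Series(0.1 * t + 5 * np.sin(daily), index=index)
y.iloc[24 * 40:] += 20  # hypothetical event: a level shift starting on day 40

# Rolling window: 14 days of training, 7-day test periods.
cv = TimeSeriesSplit(n_splits=5, test_size=24 * 7, max_train_size=24 * 14)
results = backtest(LinearRegression(), X, y, cv)
print(results)
```

Plotting `results["rmse"]` over `test_start` shows when the model struggled; here, folds whose test or training windows contain the level shift would likely score worse than the earlier ones.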

Time series that are *not* time dependent, whose statistical properties stay the same over time, are known as **stationary** series. With stationary series, there is much less to worry about. You can even use a validation set that comes before your training set since, statistically, the data behaves the same way at all times. The only remaining trouble with stationary series is that they can still have serial dependence: neighboring observations are correlated, so a model can score well on validation points simply because it was trained on their neighbors. This puts your error estimates at risk of being too optimistic. It turns out, though, that with a sufficiently powerful forecasting model even this isn't a danger. (We'll talk about how to create such models in the next lesson.)

In summary:
- In general, order your training and validation splits *chronologically*: training data first, validation data second.
- The *hold-out* strategy is a good default. Try to choose a validation set that looks like the forecasting period -- the time period just before, perhaps.
- Use *time-series cross-validation* or *backtesting* to test how your model behaves under different historical conditions.

<blockquote>
It's worth noting that most top-ranking competitors in Kaggle's recent forecasting competitions have used the hold-out method, which suggests that just using a hold-out validation set should be fine if you're only making a single forecast.
</blockquote>

# Example - Electricity Demand #

The *Electricity Demand* dataset contains hourly demand for electricity.

The hidden cell defines some utility functions from the previous lessons and loads the dataset.

In [None]:
#$HIDE_INPUT$
from pathlib import Path
from warnings import simplefilter

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from statsmodels.tsa.deterministic import (CalendarFourier,
                                           DeterministicProcess)

simplefilter("ignore")

# Set Matplotlib defaults
plt.style.use("seaborn-whitegrid")
plt.rc("figure", autolayout=True, figsize=(11, 5))
plt.rc(
    "axes",
    labelweight="bold",
    labelsize="large",
    titleweight="bold",
    titlesize=16,
    titlepad=10,
)
plot_params = dict(
    color="0.75",
    style=".-",
    markeredgecolor="0.25",
    markerfacecolor="0.25",
    legend=False,
)

# Load data
data_dir = Path("../input/ts-course-data")
elecdemand = pd.read_csv(data_dir / "elecdemand.csv", parse_dates=["Datetime"])
elecdemand = elecdemand.set_index("Datetime").to_period("H")

# Create features

# Data is hourly. There are 168 hours per week, so `fourier` creates
# about half as many features (42 * 2) as indicators would (168 - 1).
fourier = CalendarFourier(freq="W", order=42)

dp = DeterministicProcess(
    index=elecdemand.index,
    constant=True,               # level
    order=2,                     # quadratic trend (order 1 would be linear)
    seasonal=True,               # daily seasonality (indicators)
    additional_terms=[fourier],  # weekly seasonality (fourier)
    drop=True,                   # drop terms to avoid collinearity
)

X = dp.in_sample()
y = elecdemand.Demand.copy()

First, we'll use hold-out validation.

We can use `train_test_split` from scikit-learn to create our data splits. It's important to set `shuffle=False` or else the test set will be sampled at random dates instead of taken as a continuous block at the end.

In [None]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=24 * 14,  # 14 days
    shuffle=False,      # time series should not be shuffled
)

Now we'll create the predictions and look at the train and test error:

In [None]:
import numpy as np
from sklearn.metrics import mean_squared_error

# X already contains a constant column, so the model doesn't need its own intercept
model = LinearRegression(fit_intercept=False)
model.fit(X_train, y_train)

y_fit = pd.Series(
    model.predict(X_train),
    index=X_train.index,
)
y_pred = pd.Series(
    model.predict(X_test),
    index=y_test.index,
)

# RMSE is the square root of the mean squared error
train_rmse = np.sqrt(mean_squared_error(y_train, y_fit))
test_rmse = np.sqrt(mean_squared_error(y_test, y_pred))

print(f"Train RMSE: {train_rmse:.2f}\nTest RMSE: {test_rmse:.2f}")

Next, let's evaluate the same model with time-series cross-validation, using a rolling training window.

In [None]:
import numpy as np
from sklearn.model_selection import TimeSeriesSplit, cross_val_score


def rollingcv(n, train_size, test_size, gap=0):
    # Use as many folds as fit in the data while keeping every
    # training window at its full size (`train_size` samples).
    n_splits = (n - train_size - gap) // test_size
    cv = TimeSeriesSplit(
        n_splits=n_splits,
        max_train_size=train_size,  # fixed size -> rolling window
        gap=gap,                    # samples skipped between train and test
        test_size=test_size,
    )
    return cv


cv = rollingcv(
    n=len(y),
    train_size=24 * 28,  # 28 days
    test_size=24 * 14,   # 14 days
    gap=24,              # 1 day
)

cv_mse = (-1) * cross_val_score(
    LinearRegression(fit_intercept=False),
    X,
    y,
    cv=cv,
    scoring="neg_mean_squared_error",
)
cv_rmse = np.sqrt(cv_mse.mean())  # RMSE pooled over all folds

print(f"Backtest RMSE: {cv_rmse:.2f}")

Finally, let's refit the model on the entire dataset and forecast demand over the next 14 days.

In [None]:
# Refit the model on the entire dataset
model.fit(X, y)

# Create deterministic features for the next 14 days (24 * 14 hours)
X_oos = dp.out_of_sample(steps=24 * 14)

y_forecast = pd.Series(
    model.predict(X_oos),
    index=X_oos.index,
)

In [None]:
ax = y.plot()
_ = y_forecast.plot(ax=ax, color='C3')

# Your Turn #