# Forecasting Environments

In this chapter, we're going to do a tour of forecasting *environments*: that is, the set of slices of time that you might need to put together when doing any forecast exercise. There won't be too much code in this chapter: we're just laying the theoretical groundwork here.

Let's start with some definitions. Features, or regressors, are labelled by $k = 0, \dots, K-1$ and time by $t = 0, \dots, T-1$. Note that this means there are $T$ time periods and $K$ features. We'll label different slices of time (or equivalently models trained on different data) by an index $\mu$ beginning from $\mu=1$, and we'll give these slices of time different labels: IS for in-sample and OS for out-of-sample. $f_\mu$ is the model trained on the $\mu$th set of in-sample data. The target variable (the number we are trying to forecast) is $\left\{y_{t+h}\right\}_{t=0}^{t=T-1}$, where $h$ is the number of time steps ahead we wish to forecast. Let $\left\{x_{tk}\right\}_{t=0}^{t=T-1}$ represent feature (or regressor) $k$. This is a bit simplistic because of real-time data timeliness, but in general a forecast implies $h>0$ (otherwise we're doing a nowcast).

## Two Period Forecast

The simplest forecast you can imagine has two periods (each of which may be made up of more than one value of $t$). The important thing is that the $t$ in the first, "in-sample" period do not overlap with the $t$ in the second, "out-of-sample" period. That's because we *only want to use information in the in-sample period to forecast the out-of-sample period*. There are some cases where you may want to "forecast" the in-sample period, but if you're interested in the real-world performance of your forecast, this is the setup you need.

In this world of a single in-sample period and a single out-of-sample period, $\mu=1$, and we could drop that notation and just use IS and OS. But we'll leave it in to make the contrast with what's to come clearer.

The exercise is to use a model, $f = f_{\mu=1}$, that is trained to predict $\{y_{t+h}, t:\mu_{\text{IS}}=1\}$ using $\{x_{tk}, t:\mu_{\text{IS}}=1\}$ so that it can do an out-of-sample forecast of $\{y_{t+h}, t:\mu_\text{OS}=1\}$ using $\{x_{tk}, t:\mu_{\text{OS}}=1\}$. It's actually easier to see it with a picture:

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Plot settings
plt.style.use(
    "https://github.com/aeturrell/coding-for-economists/raw/main/plot_style.txt"
)
# Set max rows displayed for readability
pd.set_option("display.max_rows", 15)

from plotnine import (
    aes,
    facet_wrap,
    geom_point,
    geom_tile,
    ggplot,
    options,
    scale_size,
    scale_x_continuous,
    scale_y_continuous,
    stat_smooth,
)
from plotnine.scales import scale_fill_brewer

s = 1
alpha = 4
T = 10

options.set_option("figure_size", (12, 3))
options.set_option("dpi", 150)


def in_sample_block_e(mu, s, alpha, T):
    """In-sample time steps for expanding window forecast number mu."""
    xf = pd.DataFrame({"t": np.arange(0, mu * s + alpha, s)})
    # Assign mu as a scalar so the column always matches the number of rows
    xf[r"$\mu$"] = mu
    xf["sample"] = "in-sample"
    return xf


def os_sample_block_e(mu, s, alpha, T):
    """Out-of-sample time steps for expanding window forecast number mu."""
    xf = pd.DataFrame({"t": np.arange(mu * s + alpha, T, s)})
    xf[r"$\mu$"] = mu
    xf["sample"] = "out-of-sample"
    return xf


def in_sample_block_r(mu, s, alpha, T):
    """In-sample time steps for rolling window forecast number mu."""
    xf = pd.DataFrame({"t": np.arange((mu - 1) * s, mu * s + alpha, s)})
    xf[r"$\mu$"] = mu
    xf["sample"] = "in-sample"
    return xf


def os_sample_block_r(mu, s, alpha, T):
    """Out-of-sample time steps for rolling window forecast number mu."""
    xf = pd.DataFrame({"t": np.arange(mu * s + alpha, T, s)})
    xf[r"$\mu$"] = mu
    xf["sample"] = "out-of-sample"
    return xf


(
    ggplot(aes("t", "sample", r"$\mu$"))
    + geom_tile(
        in_sample_block_e(1, s, alpha, T), aes(width=0.95, height=0.95, fill="sample")
    )
    + geom_tile(
        os_sample_block_e(1, s, alpha, T), aes(width=0.95, height=0.95, fill="sample")
    )
    + scale_x_continuous(breaks=range(11))
    + scale_fill_brewer(type="qual", palette="Accent")
)

In this case, the in-sample period runs from $t=0$ to $t=4$, while the out-of-sample period runs from $t=5$ to $t=9$, so $T=10$.

Let's see how it works with some fake data, and we'll assume $h=3$ in this case.

In [None]:
import string

horizon = 3
num = 10 + horizon

df = pd.DataFrame(
    np.array(
        [
            range(num),
            list(string.ascii_lowercase)[:num],
            list(string.ascii_lowercase)[num : 2 * num],
        ]
    ).T,
    columns=["Time", "y", "x"],
)
df[r"$y_{t+h}$"] = df["y"].shift(-horizon)
df

What we have here is a dataset with a time index and y and x values. $y_{t+h}$ has been added as the target column by shifting the $y$ variable back by $h=3$ steps. We're not going to be able to predict those NaN values, so the first thing to do is to drop those rows. Second, we'll want to add in which parts are in-sample and which are out-of-sample.

In [None]:
df = df.dropna()
df["sample"] = df["Time"].apply(lambda x: "IS" if int(x) < 5 else "OS")
df = df.set_index("Time")
df

Note how the sample can be either IS or OS, but not both.

Now, if we were doing a forecast, we'd select the IS part to do our initial model building by running something like

```python
model = ols.fit(X=df.loc[df["sample"]=="IS","x"],
                y=df.loc[df["sample"]=="IS",r"$y_{t+h}$"])
```

This would then be used to predict the out-of-sample values:

```python
y_t_plus_h_os = model.predict(X=df.loc[df["sample"]=="OS","x"])
```

To know how good the forecast was, we'd then compare it with the true values. Typically, a summary statistic such as the root mean square error (RMSE) is used for this, given by

$$
\operatorname{RMSE} = \sqrt{\frac{\sum_{t=0}^{T-1}\left(\hat{y}_{t+h} - y_{t+h}\right)^{2}}{T}},
$$

where $\hat{y}_{t+h}$ is the predicted value, equivalent to the object `y_t_plus_h_os` above. When scoring an out-of-sample forecast, the sum and the divisor run over the out-of-sample periods only rather than over all $T$ periods.
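
As a minimal sketch of the RMSE formula in code, the function below computes it with NumPy. The name `rmse` and the toy numbers are illustrative assumptions, not objects defined elsewhere in this chapter. The average is taken over whatever values are passed in, so passing only out-of-sample values gives an out-of-sample RMSE.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Square the errors, average them, then take the square root
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))


# Toy numbers, made up purely for illustration
example_rmse = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
print(example_rmse)
```

Here only the last prediction is wrong (by 2), so the result is $\sqrt{4/3}$.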

### An Example Two Period Forecast

Let's see an example of this with some numerical data. Our dataframe needs to contain both our features, $x$, and our target, $y_{t+h}$.

In [None]:
T = 10
window_size = 5
horizon = 3


df = pd.DataFrame(
    {
        "x": range(0, T + horizon),
        "y": [0] * horizon
        + list(range(0, window_size))
        + list(range(T - 1, window_size - 1, -1)),
    }
)
df["x"] = df["x"].astype(float)
# Bring y_{t+h} in line with feature x so that models always take data from the same row.
# Note that for a variable to be at t+h today, with h the horizon, we need to bring it *back* by h steps
df[r"$y_{t+h}$"] = df["y"].shift(-horizon)
# Drop y so that we don't accidentally put it into the model as a feature
df = df.drop("y", axis=1)
df

As ever, it's good to look at your data so let's do a quick chart. Note that time and $x$ have the same values here, so this is also the time series of $y_{t+h}$.

In [None]:
df.plot.scatter(x="x", y=r"$y_{t+h}$");

Okay, what could possibly go wrong! Let's now do our forecasting exercise. First, we train or fit the model. Note that label-based slicing with `.loc` in *pandas* is *inclusive* of the end point, so we run to `window_size-1` rather than `window_size` to pick up our 5 in-sample points.

In [None]:
from sklearn.linear_model import LinearRegression

df = df.dropna()
model = LinearRegression(fit_intercept=False).fit(
    X=df.loc[: window_size - 1, ["x"]], y=df.loc[: window_size - 1, [r"$y_{t+h}$"]]
)
# Make in-sample prediction. Note that the predictions of the model come out in
# an array of form [[a], [b], ...] so we flatten this to [a, b, ...]. Also, we pad
# the results we didn't predict with [None]
df[r"$\hat{y}_{t+h}^{\operatorname{IS}}$"] = list(
    model.predict(df.loc[: window_size - 1, ["x"]]).flatten()
) + [None] * (window_size)

Now make the out-of-sample prediction:

In [None]:
df[r"$\hat{y}_{t+h}^{\operatorname{OS}}$"] = [None] * (window_size) + list(
    model.predict(df.loc[window_size:, ["x"]]).flatten()
)
df

In [None]:
fig, ax = plt.subplots()
df.plot.scatter(x="x", y=r"$y_{t+h}$", ax=ax, label=r"$y_{t+h}$", s=80)
df.plot.scatter(
    x="x",
    y=r"$\hat{y}_{t+h}^{\operatorname{OS}}$",
    color="red",
    ax=ax,
    label=r"$\hat{y}_{t+h}^{\operatorname{OS}}$",
    s=100,
)
ax.set_ylabel(r"$y_{t+h}$ and $\hat{y}_{t+h}^{\operatorname{OS}}$")
ax.legend();

Oh dear, it doesn't look like our out-of-sample prediction did very well! Let's see what the RMSE is for both the in-sample and out-of-sample predictions.
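
A minimal sketch of that comparison follows, rebuilding the example data from scratch so that the block stands alone. As an assumption made for self-containment, the no-intercept regression is fitted with `np.linalg.lstsq` rather than scikit-learn's `LinearRegression(fit_intercept=False)`; both solve the same least-squares problem. The names `rmse`, `y_lead`, and `is_mask` are illustrative.

```python
import numpy as np

T, window_size, horizon = 10, 5, 3

# Rebuild the example: y as defined above, then y_{t+h} lined up with x_t
y = np.array(
    [0] * horizon
    + list(range(window_size))
    + list(range(T - 1, window_size - 1, -1)),
    dtype=float,
)
x = np.arange(T, dtype=float)
y_lead = y[horizon : horizon + T]  # y_{t+h} for t = 0, ..., T-1

# True for the in-sample periods, False for the out-of-sample periods
is_mask = np.arange(T) < window_size

# No-intercept least squares using in-sample data only
beta, *_ = np.linalg.lstsq(x[is_mask, None], y_lead[is_mask], rcond=None)
y_hat = beta[0] * x


def rmse(y_true, y_pred):
    """Root mean square error over the values passed in."""
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))


rmse_is = rmse(y_lead[is_mask], y_hat[is_mask])
rmse_os = rmse(y_lead[~is_mask], y_hat[~is_mask])
print(f"In-sample RMSE: {rmse_is:.3f}")
print(f"Out-of-sample RMSE: {rmse_os:.3f}")
```

Under these assumptions the in-sample fit is essentially exact, because the in-sample points lie on a line through the origin, while the out-of-sample RMSE is $\sqrt{8} \approx 2.83$, because the series turns downwards after the training window.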

## Expanding Window

In an expanding window forecast exercise, multiple forecasts are made. In each, the in-sample period grows over time. Necessarily (given finite data), the out-of-sample period shrinks over time. The overall out-of-sample forecast is the union of the out-of-sample forecasts in which each time period is predicted by the model that was trained on the most information. To make this clear, let's first define it mathematically and then with a diagram.

We index the different forecasts (synonymous with different models $f_\mu$) by $\mu$. The starting size of the window will be $s + \alpha$ where $s$ is the step size and $\alpha$ is a parameter that adjusts window size. For arbitrary $z_t$, the $\mu$th in-sample slice of time series data is:
$$
I_{\mu}(\vec{z}) = \left\{z_t  \right\}_{t=0}^{t=\mu\cdot s + \alpha -1}
$$
Applied to the features, this is the slice that will be used to train (aka fit) model $f_\mu$.

Likewise, the out-of-sample set is:
$$
O_{\mu}(\vec{z}) = \left\{z_t \right\}^{t=T-1}_{t=\mu\cdot s + \alpha}
$$
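
As a rough sketch, the two definitions above translate directly into ranges of time indices. The function name `expanding_window` is a hypothetical helper introduced here for illustration; it simply returns the values of $t$ in $I_\mu$ and $O_\mu$.

```python
def expanding_window(mu, s, alpha, T):
    """Return the (in-sample, out-of-sample) time indices for forecast mu."""
    split = mu * s + alpha  # first out-of-sample time step
    in_sample = list(range(0, split))
    out_of_sample = list(range(split, T))
    return in_sample, out_of_sample


s, alpha, T = 1, 2, 10
for mu in range(1, 4):
    in_sample, out_of_sample = expanding_window(mu, s, alpha, T)
    print(mu, in_sample, out_of_sample)
```

Each increase in $\mu$ moves $s$ time steps from the start of the out-of-sample set to the end of the in-sample set, while the in-sample set always starts at $t=0$.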

We can visualise this like so:

In [None]:
# TODO: hide-input tag
from plotnine import ggtitle

s = 1
alpha = 2
T = 10
max_mu = int((T - 1 + 1 - alpha) / s)

(
    ggplot(aes("t", r"$\mu$", "sample"))
    + [
        geom_tile(
            os_sample_block_e(mu, s, alpha, T),
            aes(width=0.95, height=0.95, fill="sample"),
        )
        for mu in range(1, max_mu)
    ]
    + [
        geom_tile(
            in_sample_block_e(mu, s, alpha, T),
            aes(width=0.95, height=0.95, fill="sample"),
        )
        for mu in range(1, max_mu)
    ]
    + scale_x_continuous(breaks=range(T + 1))
    + scale_y_continuous(breaks=range(1, max_mu))
    + scale_fill_brewer(type="qual", palette="Accent")
    + ggtitle("Expanding Window")
)

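Note that in-sample and out-of-sample never overlap in time for the same value of $\mu$. This is important because it ensures that each model $f_\mu$ is only ever scored out-of-sample on time periods that it did not see during training.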

In [None]:
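# Stack the out-of-sample blocks for every mu
script_O = pd.concat(
    [os_sample_block_e(mu, s, alpha, T) for mu in range(1, max_mu)], axis=0
)
script_O = script_O.reset_index(drop=True)
# For each t, flag the last occurrence, i.e. the forecast from the largest mu
# (the model trained on the most data)
script_O["Union"] = False
script_O.loc[script_O[~script_O["t"].duplicated(keep="last")].index, "Union"] = True
script_O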

In [None]:
(
    ggplot(aes("t", r"$\mu$", "Union"))
    + geom_tile(script_O, aes(width=0.95, height=0.95, fill="Union"))
    + scale_x_continuous(breaks=range(T), limits=(-1, T))
    + scale_y_continuous(breaks=range(1, max_mu))
    + scale_fill_brewer(type="qual", palette="Pastel1")
    + ggtitle("Union of Best Expanding Window Out-of-Sample Predictions")
)

In [None]:
script_I = pd.concat(
    [in_sample_block_e(mu, s, alpha, T) for mu in range(1, max_mu)], axis=0
)
script_I = script_I.reset_index(drop=True)
# For each t, flag the first occurrence, i.e. the smallest mu whose window contains it
script_I["Union"] = False
script_I.loc[script_I[~script_I["t"].duplicated(keep="first")].index, "Union"] = True
script_I

In [None]:
(
    ggplot(aes("t", r"$\mu$", "Union"))
    + geom_tile(script_I, aes(width=0.95, height=0.95, fill="Union"))
    + scale_x_continuous(breaks=range(T), limits=(-1, T))
    + scale_y_continuous(breaks=range(1, max_mu))
    + scale_fill_brewer(type="qual", palette="Pastel1")
    + ggtitle("Union of Best Expanding Window In-Sample Predictions")
)

In [None]:
# TODO hide all
# For completeness, this is what the IS and OS time series look like next to each other:

script_I["sample"] = "In-Sample"
script_O["sample"] = "Out-of-Sample"
script_expanding = pd.concat([script_I, script_O], axis=0)
script_expanding = script_expanding[script_expanding["Union"]]

(
    ggplot(aes("t", r"$\mu$", "sample"))
    + geom_tile(script_expanding, aes(width=0.95, height=0.95, fill="sample"))
    + scale_x_continuous(breaks=range(T), limits=(-1, T))
    + scale_y_continuous(breaks=range(1, max_mu))
    + scale_fill_brewer(type="qual", palette="Accent")
    + ggtitle("Best Expanding Window Predictions: In-Sample and Out-of-Sample")
)

## Rolling Window

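In a rolling window forecast exercise, the in-sample window keeps a fixed size of $s + \alpha$ and moves forward by $s$ with each new $\mu$, rather than growing. Below is an example of all the rolling window forecasts in a single forecasting exercise.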

In [None]:
# TODO: hide-input tag
from plotnine import ggtitle

s = 1
alpha = 2
window_size = s + alpha
T = 10
max_mu = int((T - 1 + 1 - alpha) / s)

(
    ggplot(aes("t", r"$\mu$", "sample"))
    + [
        geom_tile(
            in_sample_block_r(mu, s, alpha, T),
            aes(width=0.95, height=0.95, fill="sample"),
        )
        for mu in range(1, max_mu)
    ]
    + [
        geom_tile(
            os_sample_block_r(mu, s, alpha, T),
            aes(width=0.95, height=0.95, fill="sample"),
        )
        for mu in range(1, max_mu)
    ]
    + scale_x_continuous(breaks=range(T + 1))
    + scale_y_continuous(breaks=range(1, max_mu))
    + scale_fill_brewer(type="qual", palette="Accent")
    + ggtitle("Rolling Window")
)
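
The rolling windows plotted above can also be written out as ranges of time indices. This is a minimal sketch that mirrors the start and end points used by `in_sample_block_r` and `os_sample_block_r`, but enumerates every time step; the name `rolling_window` is a hypothetical helper introduced here for illustration.

```python
def rolling_window(mu, s, alpha, T):
    """Return the (in-sample, out-of-sample) time indices for forecast mu."""
    start = (mu - 1) * s  # the window start moves forward by s each time
    split = mu * s + alpha  # first out-of-sample time step
    in_sample = list(range(start, split))
    out_of_sample = list(range(split, T))
    return in_sample, out_of_sample


s, alpha, T = 1, 2, 10
for mu in range(1, 4):
    in_sample, out_of_sample = rolling_window(mu, s, alpha, T)
    print(mu, in_sample, out_of_sample)
```

Unlike the expanding window, the in-sample set always contains $s + \alpha$ time steps, so older observations drop out as $\mu$ increases.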

### Unions

The cells below show how to find the *overall* performance of the forecast by taking unions across the different values of $\mu$. The best possible out-of-sample predictions are the union of the one-step-ahead forecasts (i.e. forecasts that are just outside of the in-sample period). We start with the in-sample union.

In [None]:
script_I = pd.concat(
    [in_sample_block_r(mu, s, alpha, T) for mu in range(1, max_mu)], axis=0
)
script_I = script_I.reset_index(drop=True)
script_I["Union"] = False
script_I.loc[script_I[~script_I["t"].duplicated(keep="first")].index, "Union"] = True
script_I

The chart below gives the best in-sample predictions because it is the union of the last-period within-the-window predictions.

In [None]:
(
    ggplot(aes("t", r"$\mu$", "Union"))
    + geom_tile(script_I, aes(width=0.95, height=0.95, fill="Union"))
    + scale_x_continuous(breaks=range(T), limits=(-1, T))
    + scale_y_continuous(breaks=range(1, max_mu))
    + scale_fill_brewer(type="qual", palette="Pastel1")
    + ggtitle("Union of Best Rolling Window In-Sample Predictions")
)

The code below gives the best out-of-sample predictions because it is the union of the forecasts that are *just* out-of-sample (by one step size).

In [None]:
script_O = pd.concat(
    [os_sample_block_r(mu, s, alpha, T) for mu in range(1, max_mu)], axis=0
)
script_O = script_O.reset_index()
script_O["Union"] = False
script_O.loc[script_O[~script_O["t"].duplicated(keep="last")].index, "Union"] = True
script_O

In [None]:
(
    ggplot(aes("t", r"$\mu$", "Union"))
    + geom_tile(script_O, aes(width=0.95, height=0.95, fill="Union"))
    + scale_x_continuous(breaks=range(T), limits=(-1, T))
    + scale_y_continuous(breaks=range(1, max_mu))
    + scale_fill_brewer(type="qual", palette="Pastel1")
    + ggtitle("Union of Best Rolling Window Out-of-Sample Predictions")
)

In [None]:
# TODO hide all
# For completeness, this is what the IS and OS time series look like next to each other:

script_I["sample"] = "In-Sample"
script_O["sample"] = "Out-of-Sample"
script_rolling = pd.concat([script_I, script_O], axis=0)
script_rolling = script_rolling[script_rolling["Union"]]

(
    ggplot(aes("t", r"$\mu$", "sample"))
    + geom_tile(script_rolling, aes(width=0.95, height=0.95, fill="sample"))
    + scale_x_continuous(breaks=range(T), limits=(-1, T))
    + scale_y_continuous(breaks=range(1, max_mu))
    + scale_fill_brewer(type="qual", palette="Accent")
    + ggtitle("Best Rolling Window Predictions: In-Sample and Out-of-Sample")
)
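
To connect these unions back to forecast performance, here is a hedged sketch of a full rolling window exercise: refit on each window, keep only the forecasts for the $s$ periods immediately after it, and score their union with the RMSE. The function name `rolling_forecast_rmse`, the no-intercept least-squares model fitted with `np.linalg.lstsq`, and the made-up linear data are all assumptions chosen for illustration rather than part of the chapter's own code.

```python
import numpy as np


def rolling_forecast_rmse(x, y_lead, s, alpha):
    """Refit on each rolling window and score the forecasts just outside it."""
    T = len(x)
    max_mu = int((T - alpha) / s)
    times, forecasts = [], []
    for mu in range(1, max_mu):
        start, split = (mu - 1) * s, mu * s + alpha
        # Fit a no-intercept line using the in-sample window only
        beta, *_ = np.linalg.lstsq(
            x[start:split, None], y_lead[start:split], rcond=None
        )
        # Keep only the s periods immediately after the window
        for t in range(split, min(split + s, T)):
            times.append(t)
            forecasts.append(beta[0] * x[t])
    errors = np.array(forecasts) - y_lead[times]
    return times, float(np.sqrt(np.mean(errors**2)))


x = np.arange(10, dtype=float)
y_lead = 2.0 * x  # made-up target that is exactly linear in x
times, overall_rmse = rolling_forecast_rmse(x, y_lead, s=1, alpha=2)
print(times, overall_rmse)
```

With $s=1$ and $\alpha=2$, the scored periods are $t=3,\dots,9$, matching the out-of-sample union shown above. Because the made-up target is exactly linear in $x$, the overall RMSE here is zero up to floating point error; real data would not be so kind.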