# Forecasting Environments

In this chapter, we're going to do a tour of forecasting *environments*: that is, the set of slices of time that you might need to put together when doing any forecast exercise. There won't be too much code in this chapter: we're just laying the theoretical groundwork here.

Let's start with some definitions. Features, or regressors, are labelled by $k = 0, \dots, K-1$ and time by $t = 0, \dots, T-1$. Note that this means there are $T$ time periods and $K$ features. We'll label different slices of time by an index $\mu$ beginning from $\mu=1$, and we'll give these slices of time different labels: IS for in-sample and OS for out-of-sample. The target variable (the number we are trying to forecast) is $\left\{y_{t+h}\right\}_{t=0}^{t=T-1}$, where $h$ is the number of time steps ahead we wish to forecast. Let $\left\{x_{tk}\right\}_{t=0}^{t=T-1}$ represent feature (or regressor) $k$. This is a bit simplistic because of real-time data timeliness, but in general a forecast implies $h>0$ (otherwise we're doing a nowcast).

## Two Period Forecast

The simplest forecast you can imagine has two periods (each of which may be made up of more than one value of $t$). The important thing is that the $t$ in the first, "in-sample" period do not overlap with the $t$ in the second, "out-of-sample" period. That's because we *only want to use information in the in-sample period to forecast the out-of-sample period*. There are some cases where you may want to "forecast" the in-sample period, but if you're interested in the real-world performance of your forecast, this is the setup you need.

In this world of a single in-sample period and a single out-of-sample period, $\mu=1$, and we could drop that notation and just use IS and OS. But we'll leave it in to make the contrast with what's to come clearer.

The exercise is to use a model, $f$, that is trained to predict $\{y_{t+h}, t:\mu_{\text{IS}}=1\}$ using $\{x_{tk}, t:\mu_{\text{IS}}=1\}$ to forecast $\{y_{t+h}, t:\mu_\text{OS}=1\}$ using $\{x_{tk}, t:\mu_{\text{OS}}=1\}$. It's actually easier to see it with a picture:

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Plot settings
plt.style.use(
    "https://github.com/aeturrell/coding-for-economists/raw/main/plot_style.txt"
)
# Set max rows displayed for readability
pd.set_option("display.max_rows", 15)

from plotnine import (
    aes,
    geom_tile,
    ggplot,
    options,
    scale_x_continuous,
    scale_y_continuous,
)

s = 1
alpha = 4
T = 10

options.set_option("figure_size", (12, 3))
options.set_option("dpi", 150)


def in_sample_block_e(mu, s, alpha, T):
    """Expanding window: in-sample periods t = 0, ..., mu * s + alpha - 1."""
    xf = pd.DataFrame({"t": np.arange(0, mu * s + alpha)})
    xf[r"$\mu$"] = mu  # scalar is broadcast to every row
    xf["sample"] = "in-sample"
    return xf


def os_sample_block_e(mu, s, alpha, T):
    """Expanding window: out-of-sample periods t = mu * s + alpha, ..., T - 1."""
    xf = pd.DataFrame({"t": np.arange(mu * s + alpha, T)})
    xf[r"$\mu$"] = mu
    xf["sample"] = "out-of-sample"
    return xf


def in_sample_block_r(mu, s, alpha, T):
    """Rolling window: in-sample periods t = (mu - 1) * s, ..., mu * s + alpha - 1."""
    xf = pd.DataFrame({"t": np.arange((mu - 1) * s, mu * s + alpha)})
    xf[r"$\mu$"] = mu
    xf["sample"] = "in-sample"
    return xf


def os_sample_block_r(mu, s, alpha, T):
    """Rolling window: out-of-sample periods t = mu * s + alpha, ..., T - 1."""
    xf = pd.DataFrame({"t": np.arange(mu * s + alpha, T)})
    xf[r"$\mu$"] = mu
    xf["sample"] = "out-of-sample"
    return xf


(
    ggplot(aes("t", "sample", r"$\mu$"))
    + geom_tile(
        in_sample_block_e(1, s, alpha, T), aes(width=0.95, height=0.95, fill="sample")
    )
    + geom_tile(
        os_sample_block_e(1, s, alpha, T), aes(width=0.95, height=0.95, fill="sample")
    )
    + scale_x_continuous(breaks=range(11))
)

In this case, $T=10$: the in-sample period runs from $t=0$ to $t=4$, while the out-of-sample period runs from $t=5$ to $t=9$.

Let's see how it works with some fake data, and we'll assume $h=3$ in this case.

In [None]:
import string

horizon = 3
num = 10 + horizon

df = pd.DataFrame(
    np.array(
        [
            range(num),
            list(string.ascii_lowercase)[:num],
            list(string.ascii_lowercase)[num : 2 * num],
        ]
    ).T,
    columns=["Time", "y", "x"],
)
df[r"$y_{t+h}$"] = df["y"].shift(-horizon)
df

What we have here is a dataset with a time index and $y$ and $x$ values. $y_{t+h}$ has been added as a column by shifting the $y$ variable back by $h=3$ steps, so that each row pairs $x_t$ with the value of $y$ three periods later. We're not going to be able to predict those NaN values, so the first thing to do is to drop those rows. Second, we'll want to add in which parts are in-sample and which are out-of-sample.

In [None]:
df = df.dropna()
df["sample"] = df["Time"].apply(lambda x: "IS" if int(x) < 5 else "OS")
df = df.set_index("Time")
df

Note how the sample can be either IS or OS, but not both.

Now, if we were doing a forecast, we'd select the IS part to do our initial model building by running something like the following, where `ols` stands in for whatever estimator you are using:

```python
model = ols.fit(X=df.loc[df["sample"]=="IS","x"],
                y=df.loc[df["sample"]=="IS",r"$y_{t+h}$"])
```

This would then be used to predict the out-of-sample values:

```python
y_t_plus_h_os = model.predict(X=df.loc[df["sample"]=="OS","x"])
```

To know how good the estimate was, we'd then compare it with the true value over the out-of-sample period. Typically, a value like the root mean square error is used for this, given by

$$
\operatorname{RMSE} = \sqrt{\frac{1}{N_{\text{OS}}} \sum_{t:\mu_{\text{OS}}=1} \left(\hat{y}_{t+h} - y_{t+h}\right)^{2}},
$$

where $\hat{y}_{t+h}$ is the predicted value, equivalent to the object `y_t_plus_h_os` above, the sum runs over the out-of-sample period only, and $N_{\text{OS}}$ is the number of time periods in it.
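
The letters in the fake data above can't be fed to a model, so here is a minimal numeric sketch of the same fit, predict, and score steps. Everything in it is an assumption made for illustration: the simulated data, the names `demo`, `split`, `train`, and `test`, and the use of plain least squares via `np.linalg.lstsq` in place of the `ols` placeholder.

```python
import numpy as np
import pandas as pd

# Hypothetical data: y at time t+h depends linearly on x at time t, plus noise
rng = np.random.default_rng(0)
h = 3
n_obs = 40
x = rng.normal(size=n_obs)
y = np.empty(n_obs)
y[:h] = rng.normal(size=h)
y[h:] = 0.5 + 2.0 * x[:-h] + 0.1 * rng.normal(size=n_obs - h)

demo = pd.DataFrame({"x": x, "y": y})
demo["y_t_plus_h"] = demo["y"].shift(-h)  # target aligned with x at time t
demo = demo.dropna()  # the last h rows have no target

# In-sample is t < split, out-of-sample is everything after (no overlap)
split = 25
is_mask = demo.index < split
train, test = demo[is_mask], demo[~is_mask]

# Fit y_{t+h} = a + b * x_t by least squares using in-sample rows only
X_is = np.column_stack([np.ones(len(train)), train["x"].to_numpy()])
coef, *_ = np.linalg.lstsq(X_is, train["y_t_plus_h"].to_numpy(), rcond=None)

# Predict the out-of-sample targets from out-of-sample features
X_os = np.column_stack([np.ones(len(test)), test["x"].to_numpy()])
y_hat_os = X_os @ coef

# RMSE averaged over the out-of-sample rows only
rmse = np.sqrt(np.mean((y_hat_os - test["y_t_plus_h"].to_numpy()) ** 2))
print(f"intercept={coef[0]:.2f}, slope={coef[1]:.2f}, OS RMSE={rmse:.3f}")
```

Because the model never sees the out-of-sample rows during fitting, the RMSE here is an estimate of how the forecast would have performed in real time on this simulated series.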

### An Example Two Period Forecast

Let's see a real example of this in action.

## Expanding Window

In an expanding window forecast exercise, multiple forecasts are made. In each, the in-sample period grows over time. Necessarily (given finite data), the out-of-sample period shrinks over time. The overall out-of-sample forecast is given by the unique union of the out-of-sample forecasts that are trained on the most information. To make this clear, let's first define it mathematically and then with a diagram.

We index the different forecasts (synonymous with different models $f_\mu$) by $\mu$. The starting size of the window will be $s + \alpha$ where $s$ is the step size and $\alpha$ is a parameter that adjusts window size. For arbitrary $z_t$, the $\mu$th in-sample slice of time series data is:
$$
I_{\mu}(\vec{z}) = \left\{z_t  \right\}_{t=0}^{t=\mu\cdot s + \alpha -1}
$$
Applied to the features, this is the slice that will be used to train (aka fit) model $f_\mu$.

Likewise, the out-of-sample set is:
$$
O_{\mu}(\vec{z}) = \left\{z_t \right\}^{t=T-1}_{t=\mu\cdot s + \alpha}
$$
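
The two definitions above can be sketched as plain index ranges. The generator name `expanding_window_slices` is hypothetical, and the stopping rule (stop once the out-of-sample slice would be empty) is an assumption rather than something the definitions state.

```python
def expanding_window_slices(T, s, alpha):
    """Yield (mu, in_sample_t, out_of_sample_t) for an expanding window."""
    mu = 1
    # Assumption: stop once the out-of-sample slice would be empty
    while mu * s + alpha <= T - 1:
        cut = mu * s + alpha
        # In-sample: t = 0, ..., cut - 1; out-of-sample: t = cut, ..., T - 1
        yield mu, range(0, cut), range(cut, T)
        mu += 1


for mu, is_t, os_t in expanding_window_slices(T=10, s=1, alpha=2):
    print(mu, list(is_t), list(os_t))
```

Each successive $\mu$ adds $s$ periods to the in-sample range and removes the same periods from the out-of-sample range.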

We can visualise this like so:

In [None]:
# TODO: hide-input tag
from plotnine import ggtitle

s = 1
alpha = 2
T = 10
max_mu = int((T - 1 + 1 - alpha) / s)

(
    ggplot(aes("t", r"$\mu$", "sample"))
    + [
        geom_tile(
            in_sample_block_e(mu, s, alpha, T),
            aes(width=0.95, height=0.95, fill="sample"),
        )
        for mu in range(1, max_mu)
    ]
    + [
        geom_tile(
            os_sample_block_e(mu, s, alpha, T),
            aes(width=0.95, height=0.95, fill="sample"),
        )
        for mu in range(1, max_mu)
    ]
    + scale_x_continuous(breaks=range(T + 1))
    + scale_y_continuous(breaks=range(1, max_mu))
    + ggtitle("Expanding Window")
)
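
To combine the separate out-of-sample slices into one overall forecast, one reading of "trained on the most information" is that each $t$ should take its forecast from the largest $\mu$ whose out-of-sample slice still contains it. The sketch below works under that assumption; the function name `combined_os_index` and the plain `mu` column are hypothetical.

```python
import pandas as pd


def combined_os_index(T, s, alpha):
    """For each out-of-sample t, pick the largest mu whose OS slice contains t."""
    rows = []
    mu = 1
    while mu * s + alpha <= T - 1:  # keep only non-empty out-of-sample slices
        rows.extend({"t": t, "mu": mu} for t in range(mu * s + alpha, T))
        mu += 1
    all_os = pd.DataFrame(rows)
    # After sorting, the last row for each t belongs to the largest mu
    return (
        all_os.sort_values(["t", "mu"])
        .drop_duplicates("t", keep="last")
        .reset_index(drop=True)
    )


print(combined_os_index(T=10, s=1, alpha=2))
```

With $s=1$ every out-of-sample $t$ ends up with its own model; with a larger step, several consecutive $t$ share the same $\mu$.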

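## Rolling Window

In a rolling window forecast exercise, multiple forecasts are again made, but the in-sample period keeps a fixed length of $s + \alpha$ and moves forward by $s$ steps with each forecast, so the oldest observations drop out as new ones come in. Using the same notation as for the expanding window, the $\mu$th in-sample slice is:
$$
I_{\mu}(\vec{z}) = \left\{z_t \right\}_{t=(\mu-1)\cdot s}^{t=\mu\cdot s + \alpha -1}
$$
The out-of-sample slice is the same as in the expanding window case. We can visualise this like so: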

In [None]:
# TODO: hide-input tag
from plotnine import ggtitle

s = 1
alpha = 2
window_size = s + alpha
T = 10
max_mu = int((T - 1 + 1 - alpha) / s)

(
    ggplot(aes("t", r"$\mu$", "sample"))
    + [
        geom_tile(
            in_sample_block_r(mu, s, alpha, T),
            aes(width=0.95, height=0.95, fill="sample"),
        )
        for mu in range(1, max_mu)
    ]
    + [
        geom_tile(
            os_sample_block_r(mu, s, alpha, T),
            aes(width=0.95, height=0.95, fill="sample"),
        )
        for mu in range(1, max_mu)
    ]
    + scale_x_continuous(breaks=range(T + 1))
    + scale_y_continuous(breaks=range(1, max_mu))
    + ggtitle("Rolling Window")
)
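
The rolling-window blocks plotted above can also be written as index ranges. This sketch mirrors the bounds used in `in_sample_block_r` and `os_sample_block_r`; the generator name `rolling_window_slices` is hypothetical, and it assumes consecutive values of $t$ within each window and stops once the out-of-sample slice would be empty.

```python
def rolling_window_slices(T, s, alpha):
    """Yield (mu, in_sample_t, out_of_sample_t) for a rolling window."""
    mu = 1
    # Assumption: stop once the out-of-sample slice would be empty
    while mu * s + alpha <= T - 1:
        start, cut = (mu - 1) * s, mu * s + alpha
        # In-sample always has s + alpha periods; its start moves by s each time
        yield mu, range(start, cut), range(cut, T)
        mu += 1


for mu, is_t, os_t in rolling_window_slices(T=10, s=1, alpha=2):
    print(mu, list(is_t), list(os_t))
```

The only difference from an expanding window is the moving start of the in-sample range, which keeps the amount of training data constant across $\mu$.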

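### Unions

To build the union of the rolling in-sample slices, we stack the slice for every $\mu$ and keep only the first occurrence of each $t$: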

In [None]:
script_I = pd.concat(
    [in_sample_block_r(mu, s, alpha, T) for mu in range(1, max_mu)], axis=0
)
# TODO: need to track which slices get used in the final script_I visually too
# def script_is_e(s, alpha, df):
script_I[~script_I["t"].duplicated(keep="first")]