# Temporal Aggregation with THIEF

> Temporal Hierarchical Forecasting on M3 monthly and quarterly data with THIEF

In this notebook we present an example of how to use `HierarchicalForecast` to produce coherent forecasts between temporal levels. We use the monthly and quarterly time series of the `M3` dataset. We first load the `M3` data and produce base forecasts using an `AutoARIMA` model from `StatsForecast`. Then, we reconcile the forecasts following `THIEF` (Temporal HIerarchical Forecasting), using the reconciliation methods in `HierarchicalForecast` and a specified temporal hierarchy.

### References
[Athanasopoulos, G., Hyndman, R. J., Kourentzes, N., & Petropoulos, F. (2017). Forecasting with temporal hierarchies. European Journal of Operational Research, 262, 60-74](https://www.sciencedirect.com/science/article/pii/S0377221717301911)

You can run these experiments using CPU or GPU with Google Colab.

<a href="https://colab.research.google.com/github/Nixtla/hierarchicalforecast/blob/main/nbs/examples/M3withThief.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
%%capture
!pip install hierarchicalforecast statsforecast datasetsforecast

## 1. Load and Process Data

In this example we use the monthly and quarterly time series of the `M3` competition dataset, which we load with `datasetsforecast`.

The dataset only contains each series at its original (bottom-level) frequency, so we need to create the temporally aggregated series ourselves.

In [None]:
import numpy as np
import pandas as pd

In [None]:
from datasetsforecast.m3 import M3

In [None]:
m3_monthly, _, _ = M3.load(directory='data', group='Monthly')
m3_quarterly, _, _ = M3.load(directory='data', group='Quarterly')



We will aggregate up to the yearly level, so for both the monthly and the quarterly data we make sure the length of each time series is an integer multiple of the number of bottom-level timesteps in a year.

For example, the first time series in `m3_monthly` (with `unique_id='M1'`) has 68 timesteps. This is not a multiple of 12 (12 months in one year), so we would not be able to aggregate all timesteps into full years. Hence, we truncate (remove) the first 8 timesteps, leaving 60 timesteps (5 years) for this series. We do the same for the quarterly data, using a multiple of 4 (4 quarters in one year).

Depending on the highest temporal aggregation in your reconciliation problem, you may have to truncate your data differently.

In [None]:
m3_monthly = m3_monthly.groupby("unique_id", group_keys=False)\
                       .apply(lambda x: x.tail(len(x) //  12 * 12))\
                       .reset_index(drop=True)

m3_quarterly = m3_quarterly.groupby("unique_id", group_keys=False)\
                           .apply(lambda x: x.tail(len(x) //  4 * 4))\
                           .reset_index(drop=True)
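The `groupby(...).apply` in the cell above works, but newer pandas versions may emit a `DeprecationWarning` because the applied function also receives the grouping column. As a sketch of an alternative, the snippet below performs the same truncation with `cumcount` and `transform`, taking the required multiple as the least common multiple of the spec values (12 for the monthly spec used later in this notebook). The toy panel and all variable names are hypothetical:

```python
import math

import pandas as pd

# Hypothetical toy panel in the same long format as the M3 data:
# series "A" has 68 observations (like M1), series "B" has 30.
toy = pd.DataFrame({
    "unique_id": ["A"] * 68 + ["B"] * 30,
    "ds": list(range(68)) + list(range(30)),
    "y": [float(i) for i in range(98)],
})

# Every temporal level must tile the kept history exactly; here lcm(...) == 12.
spec = {"yearly": 12, "semiannually": 6, "quarterly": 3, "bimonthly": 2, "monthly": 1}
multiple = math.lcm(*spec.values())

# Position of each row counted from the end of its series (0 = last row).
pos_from_end = toy.groupby("unique_id").cumcount(ascending=False)
# Rows to keep per series: the largest multiple of `multiple` <= series length.
n_keep = toy.groupby("unique_id")["y"].transform("count") // multiple * multiple
truncated = toy[pos_from_end < n_keep].reset_index(drop=True)

print(truncated.groupby("unique_id").size())
```

This keeps the last 60 rows of the 68-row series and the last 24 rows of the 30-row series, dropping the oldest observations just as `tail` does above.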



## 2. Temporal reconciliation

### 2a. Split Train/Test sets

We use the last 24 observations of each monthly series and the last 8 observations of each quarterly series as test samples.

Note again that, for both problems, the test horizon is a multiple of the highest temporal aggregation we aim for, which in this case is the year.

For example, 24 monthly observations make up 2 years, and 8 quarterly observations also make up 2 years.

In [None]:
horizon_monthly = 24
horizon_quarterly = 8

In [None]:
m3_monthly_test = m3_monthly.groupby("unique_id", as_index=False).tail(horizon_monthly)
m3_monthly_train = m3_monthly.drop(m3_monthly_test.index)

m3_quarterly_test = m3_quarterly.groupby("unique_id", as_index=False).tail(horizon_quarterly)
m3_quarterly_train = m3_quarterly.drop(m3_quarterly_test.index)
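A small sketch of the same idea: the hypothetical helper `years_in_horizon` rejects any horizon that would leave a partial period at some temporal level, and the split above is applied to a toy panel so the resulting train and test lengths can be checked:

```python
import pandas as pd


def years_in_horizon(spec: dict[str, int], horizon: int) -> int:
    """Hypothetical helper: check that `horizon` bottom-level steps tile every
    level of `spec` exactly, and return the number of top-level periods."""
    for level, k in spec.items():
        if horizon % k != 0:
            raise ValueError(f"horizon {horizon} is not a multiple of the {level} factor {k}")
    return horizon // max(spec.values())


spec_temporal_monthly = {"yearly": 12, "semiannually": 6, "quarterly": 3, "bimonthly": 2, "monthly": 1}
spec_temporal_quarterly = {"yearly": 4, "semiannually": 2, "quarterly": 1}
print(years_in_horizon(spec_temporal_monthly, 24))   # 2
print(years_in_horizon(spec_temporal_quarterly, 8))  # 2

# The same split as above, on a hypothetical toy panel of 60 and 36 monthly values.
toy = pd.DataFrame({
    "unique_id": ["A"] * 60 + ["B"] * 36,
    "ds": list(range(60)) + list(range(36)),
    "y": 0.0,
})
toy_test = toy.groupby("unique_id", as_index=False).tail(24)
toy_train = toy.drop(toy_test.index)
print(toy_train.groupby("unique_id").size().tolist())  # [36, 12]
```

Both the train and test parts of every series remain whole numbers of years, so every temporal level can be aggregated without partial periods.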

### 2b. Aggregating the dataset according to the temporal hierarchy

We first define the temporal aggregation spec. It is a dictionary whose keys name the temporal aggregation levels and whose values give the number of bottom-level timesteps aggregated into one period of that level. For example, a year consists of 12 months, so the monthly spec contains `"yearly": 12`, while the quarterly spec contains `"yearly": 4`.

In [None]:
spec_temporal_monthly = {"yearly": 12, "semiannually": 6, "quarterly": 3, "bimonthly": 2, "monthly": 1}
spec_temporal_quarterly = {"yearly": 4, "semiannually": 2, "quarterly": 1}
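To make the meaning of a spec entry concrete, here is a minimal numpy sketch that aggregates a hypothetical two-year monthly series by summing each block of `k` consecutive months. It is not how `aggregate_temporal` is implemented; that function also returns `S`, the tags, and the timestamps:

```python
import numpy as np

spec_temporal_monthly = {"yearly": 12, "semiannually": 6, "quarterly": 3, "bimonthly": 2, "monthly": 1}

# Hypothetical toy series: two full years of monthly values 1, 2, ..., 24.
y_monthly = np.arange(1, 25, dtype=float)

# For each level, sum every block of k consecutive months into one period.
aggregated = {
    level: y_monthly.reshape(-1, k).sum(axis=1)
    for level, k in spec_temporal_monthly.items()
}

for level, values in aggregated.items():
    print(f"{level:>12}: {len(values):2d} periods, first = {values[0]:.0f}")
```

The yearly level ends up with 2 periods (78 and 222), the monthly level keeps all 24 values, and every level sums to the same total.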

We next compute the temporally aggregated train and test sets using the `aggregate_temporal` function. Note that the train and test sets have different aggregation matrices `S`, because the test set covers different time periods, and therefore a different temporal hierarchy, than the train set.

In [None]:
from hierarchicalforecast.utils import aggregate_temporal

In [None]:
# Monthly
Y_monthly_train, S_monthly_train, tags_monthly_train = aggregate_temporal(df=m3_monthly_train, spec=spec_temporal_monthly)
Y_monthly_test, S_monthly_test, tags_monthly_test = aggregate_temporal(df=m3_monthly_test, spec=spec_temporal_monthly)

# Quarterly
Y_quarterly_train, S_quarterly_train, tags_quarterly_train = aggregate_temporal(df=m3_quarterly_train, spec=spec_temporal_quarterly)
Y_quarterly_test, S_quarterly_test, tags_quarterly_test = aggregate_temporal(df=m3_quarterly_test, spec=spec_temporal_quarterly)


Our aggregation matrices aggregate the lowest temporal granularity (months for the monthly data, quarters for the quarterly data) up to years, for both the train and test sets.

In [None]:
Y_monthly_train.query("unique_id == 'M1'")

| | temporal_id | unique_id | ds | y |
|---|---|---|---|---|
| 0 | yearly-1 | M1 | 1991-08-31 | 42960.0 |
| 1 | yearly-2 | M1 | 1992-08-31 | 56640.0 |
| 2 | yearly-3 | M1 | 1993-08-31 | 36120.0 |
| 10631 | semiannually-1 | M1 | 1991-02-28 | 21480.0 |
| 10632 | semiannually-2 | M1 | 1991-08-31 | 21480.0 |
| ... | ... | ... | ... | ... |
| 138234 | yearly-3/semiannually-6/quarterly-11/bimonthly... | M1 | 1993-04-30 | 3840.0 |
| 138235 | yearly-3/semiannually-6/quarterly-11/bimonthly... | M1 | 1993-05-31 | 960.0 |
| 138236 | yearly-3/semiannually-6/quarterly-12/bimonthly... | M1 | 1993-06-30 | 2280.0 |
| 138237 | yearly-3/semiannually-6/quarterly-12/bimonthly... | M1 | 1993-07-31 | 1320.0 |


In [None]:
Y_monthly_test.query("unique_id == 'M1'")

| | temporal_id | unique_id | ds | y |
|---|---|---|---|---|
| 0 | yearly-1 | M1 | 1994-08-31 | 34920.0 |
| 1 | yearly-2 | M1 | 1995-08-31 | 23040.0 |
| 2856 | semiannually-1 | M1 | 1994-02-28 | 21840.0 |
| 2857 | semiannually-2 | M1 | 1994-08-31 | 13080.0 |
| 2858 | semiannually-3 | M1 | 1995-02-28 | 11520.0 |
| 2859 | semiannually-4 | M1 | 1995-08-31 | 11520.0 |
| 8568 | quarterly-1 | M1 | 1993-11-30 | 10920.0 |
| 8569 | quarterly-2 | M1 | 1994-02-28 | 10920.0 |
| 8570 | quarterly-3 | M1 | 1994-05-31 | 7800.0 |
| 8571 | quarterly-4 | M1 | 1994-08-31 | 5280.0 |
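The aggregated values are coherent by construction. Each year equals the sum of its two half-years, and each half-year equals the sum of its two quarters. Here is a quick check on the M1 test values printed above. The numbers are hard-coded from that output and are not recomputed with the library:

```python
# Values copied from the M1 test-set output above (not recomputed here).
yearly = [34920.0, 23040.0]                           # yearly-1, yearly-2
semiannually = [21840.0, 13080.0, 11520.0, 11520.0]   # semiannually-1..4
quarterly = [10920.0, 10920.0, 7800.0, 5280.0]        # quarterly-1..4 (only these are shown)

# Each year is the sum of its two half-years.
for i, year_total in enumerate(yearly):
    assert year_total == semiannually[2 * i] + semiannually[2 * i + 1]

# Each of the first two half-years is the sum of its two quarters.
for i in range(2):
    assert semiannually[i] == quarterly[2 * i] + quarterly[2 * i + 1]

print("The displayed M1 test aggregates are coherent.")
```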


In [None]:
S_monthly_train.iloc[:5, :5]

| | temporal_id | yearly-1/semiannually-1/quarterly-1/bimonthly-1/monthly-1 | yearly-1/semiannually-1/quarterly-1/bimonthly-1/monthly-2 | yearly-1/semiannually-1/quarterly-1/bimonthly-2/monthly-3 | yearly-1/semiannually-1/quarterly-2/bimonthly-2/monthly-4 |
|---|---|---|---|---|---|
| 0 | yearly-1 | 1.0 | 1.0 | 1.0 | 1.0 |
| 1 | yearly-2 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | yearly-3 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | yearly-4 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | yearly-5 | 0.0 | 0.0 | 0.0 | 0.0 |


In [None]:
S_monthly_test.iloc[:5, :5]

| | temporal_id | yearly-1/semiannually-1/quarterly-1/bimonthly-1/monthly-1 | yearly-1/semiannually-1/quarterly-1/bimonthly-1/monthly-2 | yearly-1/semiannually-1/quarterly-1/bimonthly-2/monthly-3 | yearly-1/semiannually-1/quarterly-2/bimonthly-2/monthly-4 |
|---|---|---|---|---|---|
| 0 | yearly-1 | 1.0 | 1.0 | 1.0 | 1.0 |
| 1 | yearly-2 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | semiannually-1 | 1.0 | 1.0 | 1.0 | 1.0 |
| 3 | semiannually-2 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | semiannually-3 | 0.0 | 0.0 | 0.0 | 0.0 |
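The rows of `S_monthly_test` shown above start with `yearly-1` and `yearly-2`, followed by `semiannually-1` onwards. Each row sums the bottom-level months it covers. The sketch below builds a matrix with the same layout for a single series of 24 months by stacking Kronecker products. The helper `temporal_summing_matrix` is hypothetical. The library returns `S` as a DataFrame with a `temporal_id` column and long bottom-level labels, and its set of rows and columns may differ; for example, `S_monthly_train` has rows up to `yearly-5`:

```python
import numpy as np


def temporal_summing_matrix(spec: dict[str, int], n_bottom: int) -> tuple[np.ndarray, list[str]]:
    """Hypothetical helper: summing matrix for one series with `n_bottom`
    bottom-level periods. Rows follow the order of `spec` (coarsest level
    first); columns are the bottom-level periods in time order."""
    blocks, row_ids = [], []
    for level, k in spec.items():
        if n_bottom % k != 0:
            raise ValueError(f"{n_bottom} periods cannot be split into {level} blocks of {k}")
        n_periods = n_bottom // k
        # Row i sums bottom periods i*k .. (i+1)*k - 1, i.e. kron(I_{n/k}, 1_k^T).
        blocks.append(np.kron(np.eye(n_periods), np.ones((1, k))))
        row_ids += [f"{level}-{i + 1}" for i in range(n_periods)]
    return np.vstack(blocks), row_ids


spec_temporal_monthly = {"yearly": 12, "semiannually": 6, "quarterly": 3, "bimonthly": 2, "monthly": 1}
# Two years of monthly test data, like one series of the monthly test set.
S, row_ids = temporal_summing_matrix(spec_temporal_monthly, 24)
print(S.shape)      # (50, 24)
print(row_ids[:5])  # ['yearly-1', 'yearly-2', 'semiannually-1', 'semiannually-2', 'semiannually-3']
print(S[:5, :4])
```

The top-left corner `S[:5, :4]` reproduces the pattern of 1s and 0s in the `S_monthly_test` output above, and the last 24 rows form an identity block for the months themselves.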


### 2c. Computing base forecasts

Next, we compute **base forecasts** for each temporal aggregation in `Y_monthly_train` and `Y_quarterly_train` using the `AutoARIMA` model. Observe that `Y_hats` contains the forecasts, but they are not yet coherent.

Note also that both frequency and horizon differ between temporal aggregations. For the monthly data, the bottom level has a monthly frequency and a horizon of `24` (constituting `2` years), whereas the `yearly` aggregation has a yearly frequency and a horizon of `2`.

It is of course possible to choose a different model for each level in the temporal aggregation - you can be as creative as you like!
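Because the test set spans a whole number of years, the horizon of each level is the bottom-level horizon divided by that level's factor in the spec. The forecasting loop in the next cells derives it from the test timestamps instead. The hypothetical helper `horizons_per_level` below is a sketch of the same arithmetic, assuming the spec values count bottom-level steps per period:

```python
def horizons_per_level(spec: dict[str, int], horizon_bottom: int) -> dict[str, int]:
    """Hypothetical helper: forecast horizon of each temporal level, counted in
    periods of that level, for a test window of `horizon_bottom` bottom-level steps."""
    return {level: horizon_bottom // k for level, k in spec.items()}


spec_temporal_monthly = {"yearly": 12, "semiannually": 6, "quarterly": 3, "bimonthly": 2, "monthly": 1}
spec_temporal_quarterly = {"yearly": 4, "semiannually": 2, "quarterly": 1}

print(horizons_per_level(spec_temporal_monthly, 24))
# {'yearly': 2, 'semiannually': 4, 'quarterly': 8, 'bimonthly': 12, 'monthly': 24}
print(horizons_per_level(spec_temporal_quarterly, 8))
# {'yearly': 2, 'semiannually': 4, 'quarterly': 8}
```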

In [None]:
from statsforecast.models import AutoARIMA
from statsforecast.core import StatsForecast

In [None]:
Y_hats = []
id_cols = ["unique_id", "temporal_id", "ds", "y"]

# We loop over the monthly and quarterly data
for tags_train, tags_test, Y_train, Y_test in zip([tags_monthly_train, tags_quarterly_train], 
                                                  [tags_monthly_test, tags_quarterly_test],
                                                  [Y_monthly_train, Y_quarterly_train], 
                                                  [Y_monthly_test, Y_quarterly_test]):
    # We will train a model for each temporal level
    Y_hats_tags = []
    for level, temporal_ids_train in tags_train.items():
        # Filter the data for the level
        Y_level_train = Y_train.query("temporal_id in @temporal_ids_train")
        temporal_ids_test = tags_test[level]
        Y_level_test = Y_test.query("temporal_id in @temporal_ids_test")
        # For each temporal level we have a different frequency and forecast horizon. We use the timestamps of the first timeseries to automatically derive the frequency & horizon of the temporally aggregated series.
        unique_id = Y_level_train["unique_id"].iloc[0]
        freq_level = pd.infer_freq(Y_level_train.query("unique_id == @unique_id")["ds"])
        if freq_level.find("-") != -1:
            freq_level = freq_level.split("-")[0]
        horizon_level = Y_level_test.query("unique_id == @unique_id")["ds"].nunique()
        # Train a model and create forecasts
        fcst = StatsForecast(models=[AutoARIMA()], freq=freq_level, n_jobs=-1)
        Y_hat_level = fcst.forecast(df=Y_level_train[["ds", "unique_id", "y"]], h=horizon_level)
        assert Y_hat_level.isnull().sum().sum() == 0
        # Add the test set to the forecast
        Y_hat_level = pd.concat([Y_level_test.reset_index(drop=True), Y_hat_level.drop(columns=["unique_id", "ds"])], axis=1)
        # Put cols in the right order (for readability)
        Y_hat_cols = id_cols + [col for col in Y_hat_level.columns if col not in id_cols]
        Y_hat_level = Y_hat_level[Y_hat_cols]
        assert Y_hat_level.isnull().sum().sum() == 0
        # Append the forecast to the list
        Y_hats_tags.append(Y_hat_level)

    Y_hat_tag = pd.concat(Y_hats_tags, ignore_index=True)
    Y_hats.append(Y_hat_tag)

### 2d. Reconcile forecasts

We can use the `HierarchicalReconciliation` class to reconcile the forecasts. In this example we use `BottomUp` and `MinTrace(wls_struct)`. The latter is the 'structural scaling' method introduced in [Forecasting with temporal hierarchies](https://robjhyndman.com/publications/temporal-hierarchies/).

Note that we have to set `temporal=True` in the `reconcile` function.

In [None]:
from hierarchicalforecast.methods import BottomUp, MinTrace
from hierarchicalforecast.core import HierarchicalReconciliation

In [None]:
reconcilers = [
    BottomUp(),
    MinTrace(method="wls_struct"),
]
hrec = HierarchicalReconciliation(reconcilers=reconcilers)
Y_recs = []
for Y_hat, S, tags in zip(Y_hats, 
                          [S_monthly_test, S_quarterly_test], 
                          [tags_monthly_test, tags_quarterly_test]):
    Y_rec = hrec.reconcile(Y_hat_df=Y_hat, S=S, tags=tags, temporal=True)
    Y_recs.append(Y_rec)
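To show what the two reconcilers do, here is a self-contained numpy sketch of the standard formulas on one year of quarterly forecasts. `BottomUp` keeps the bottom-level forecasts `b` and returns `S @ b`. Structural scaling computes `S @ P @ y_hat` with `P = (S' W^-1 S)^-1 S' W^-1` and `W = diag(S @ 1)`, so each row is weighted by the number of bottom-level periods it aggregates. This is an illustration under those assumptions, not the code of `HierarchicalReconciliation`, and the base forecasts are made up:

```python
import numpy as np

# Summing matrix for one year of quarterly data (spec yearly=4, semiannually=2, quarterly=1):
# rows = yearly-1, semiannually-1, semiannually-2, quarterly-1..4; columns = quarters 1..4.
S = np.vstack([
    np.ones((1, 4)),
    np.kron(np.eye(2), np.ones((1, 2))),
    np.eye(4),
])

# Hypothetical, deliberately incoherent base forecasts for the 7 rows.
y_hat = np.array([100.0, 45.0, 52.0, 20.0, 22.0, 27.0, 24.0])

# BottomUp: keep the quarterly (bottom) forecasts and sum them up the hierarchy.
y_bu = S @ y_hat[-4:]

# MinTrace with structural scaling: W = diag(S @ 1).
W_inv = np.diag(1.0 / S.sum(axis=1))
P = np.linalg.solve(S.T @ W_inv @ S, S.T @ W_inv)
y_wls = S @ P @ y_hat

print("BottomUp:  ", y_bu)
print("WLS-struct:", y_wls.round(2))
```

Both results are coherent: the top row equals the sum of the quarters. Reconciling forecasts that are already coherent leaves them unchanged, because `P @ S` is the identity.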

## 3. Evaluation 

The `HierarchicalForecast` package includes the `evaluate` function to evaluate forecasts at each level of a hierarchy.

We evaluate the temporally aggregated forecasts _across all temporal aggregations_. To evaluate in the temporal dimension rather than the cross-sectional dimension, we pass the test-set temporal tags (`tags_monthly_test` or `tags_quarterly_test`) as `tags`, set `id_col='temporal_id'`, and drop the `unique_id` column.

In [None]:
from hierarchicalforecast.evaluation import evaluate
from utilsforecast.losses import mae

### 3a. Monthly 


In [None]:
Y_rec_monthly = Y_recs[0]
evaluation = evaluate(df = Y_rec_monthly.drop(columns = 'unique_id'),
                      tags = tags_monthly_test,
                      metrics = [mae],
                      id_col='temporal_id',
                      benchmark="AutoARIMA")

evaluation.columns = ['level', 'metric', 'Base', 'BottomUp', 'MinTrace(wls_struct)']
numeric_cols = evaluation.select_dtypes(include="number").columns
evaluation[numeric_cols] = evaluation[numeric_cols].map('{:.2f}'.format).astype(np.float64)

evaluation

| | level | metric | Base | BottomUp | MinTrace(wls_struct) |
|---|---|---|---|---|---|
| 0 | yearly | mae-scaled | 1.0 | 0.78 | 0.75 |
| 1 | semiannually | mae-scaled | 1.0 | 0.99 | 0.95 |
| 2 | quarterly | mae-scaled | 1.0 | 0.95 | 0.93 |
| 3 | bimonthly | mae-scaled | 1.0 | 0.96 | 0.94 |
| 4 | yearly/semiannually/quarterly/bimonthly/monthly | mae-scaled | 1.0 | 1.0 | 0.99 |
| 5 | Overall | mae-scaled | 1.0 | 0.94 | 0.91 |


`MinTrace(wls_struct)` is the best method overall. It has the lowest scaled `mae` (relative to the `Base` forecasts) at every temporal level, from the monthly bottom level (0.99) up to the yearly aggregation (0.75), and scores 0.91 across all levels. Both reconciliation methods improve on the `Base` forecasts, with the largest gains at the yearly level. `BottomUp` leaves the bottom-level (monthly) forecasts unchanged, so it scores exactly 1.0 at that level.
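Because `benchmark="AutoARIMA"` is passed to `evaluate`, the reported metric is `mae-scaled`, and the `Base` column equals 1.0. In effect, each method's MAE at a level is divided by the MAE of the base forecasts at that level. Here is a minimal sketch of that ratio on made-up numbers for a single level. The helper `mean_abs_error` is hypothetical, and it is named so as not to shadow the `mae` imported from `utilsforecast`:

```python
import numpy as np

# Hypothetical actuals and forecasts for one temporal level.
y_true = np.array([10.0, 12.0, 9.0, 11.0])
forecasts = {
    "Base": np.array([12.0, 10.0, 11.0, 9.0]),       # plays the role of the AutoARIMA benchmark
    "BottomUp": np.array([11.0, 11.0, 10.0, 10.0]),
}


def mean_abs_error(actual: np.ndarray, pred: np.ndarray) -> float:
    return float(np.mean(np.abs(actual - pred)))


benchmark = mean_abs_error(y_true, forecasts["Base"])
# Scaled MAE: each method's MAE divided by the benchmark's MAE (< 1.0 is better).
scaled = {name: mean_abs_error(y_true, pred) / benchmark for name, pred in forecasts.items()}
print(scaled)  # {'Base': 1.0, 'BottomUp': 0.5}
```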

### 3b. Quarterly

In [None]:
Y_rec_quarterly = Y_recs[1]
evaluation = evaluate(df = Y_rec_quarterly.drop(columns = 'unique_id'),
                      tags = tags_quarterly_test,
                      metrics = [mae],
                      id_col='temporal_id',
                      benchmark="AutoARIMA")

evaluation.columns = ['level', 'metric', 'Base', 'BottomUp', 'MinTrace(wls_struct)']
numeric_cols = evaluation.select_dtypes(include="number").columns
evaluation[numeric_cols] = evaluation[numeric_cols].map('{:.2f}'.format).astype(np.float64)

evaluation

| | level | metric | Base | BottomUp | MinTrace(wls_struct) |
|---|---|---|---|---|---|
| 0 | yearly | mae-scaled | 1.0 | 0.87 | 0.85 |
| 1 | semiannually | mae-scaled | 1.0 | 1.03 | 1.0 |
| 2 | yearly/semiannually/quarterly | mae-scaled | 1.0 | 1.0 | 0.97 |
| 3 | Overall | mae-scaled | 1.0 | 0.97 | 0.94 |
