# HierE2E Baseline

This notebook runs HierE2E as a baseline method and evaluates its predictions.

- It reads a preprocessed hierarchical dataset.
- It fits HierE2E with its best reported configuration.
- It evaluates the HierE2E forecasts with sCRPS, RelMSE, and MSSE.

## References
- [GluonTS, DeepVARHierarchicalEstimator](https://ts.gluon.ai/stable/api/gluonts/gluonts.mx.model.deepvar_hierarchical.html?highlight=deepvarhierarchicalestimator#gluonts.mx.model.deepvar_hierarchical.DeepVARHierarchicalEstimator)
- [Syama Sundar Rangapuram, Lucien D Werner, Konstantinos Benidis, Pedro Mercado, Jan Gasthaus, Tim Januschowski. (2021). End-to-End Learning of Coherent Probabilistic Forecasts for Hierarchical Time Series. Proceedings of the 38th International Conference on Machine Learning (ICML).](https://proceedings.mlr.press/v139/rangapuram21a.html)


<br>
You can run these experiments on a GPU with Google Colab.

<a href="https://colab.research.google.com/github/Nixtla/hierarchicalforecast/blob/main/experiments/hierarchical_baselines/nbs/run_hiere2e.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0


In [2]:
%%capture
!pip install mxnet-cu112

In [3]:
import mxnet as mx

assert mx.context.num_gpus()>0

In [4]:
%%capture
!pip install "gluonts[mxnet,pro]"
!pip install git+https://github.com/Nixtla/hierarchicalforecast.git
!pip install git+https://github.com/Nixtla/datasetsforecast.git

In [5]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from gluonts.mx.trainer import Trainer
from gluonts.dataset.hierarchical import HierarchicalTimeSeries
from gluonts.mx.model.deepvar_hierarchical import DeepVARHierarchicalEstimator

from hierarchicalforecast.evaluation import scaled_crps, rel_mse, msse
from datasetsforecast.hierarchical import HierarchicalInfo, HierarchicalData

## Auxiliary Functions

In [6]:
class HierarchicalDataset(object):
    # Class with loading, processing and
    # prediction evaluation methods for hierarchical data

    available_datasets = ['Labour','Traffic',
                          'TourismSmall','TourismLarge','Wiki2',
                          'OldTraffic', 'OldTourismLarge']

    @staticmethod
    def _get_hierarchical_scrps(hier_idxs, Y, Yq_hat, q_to_pred):
        # We use the indexes obtained from the aggregation tags
        # to compute scaled CRPS across the hierarchy levels
        scrps_list = []
        for idxs in hier_idxs:
            y      = Y[idxs, :]
            yq_hat = Yq_hat[idxs, :, :]
            level_scrps  = scaled_crps(y, yq_hat, q_to_pred)
            scrps_list.append(level_scrps)
        return scrps_list

    @staticmethod
    def _get_hierarchical_msse(hier_idxs, Y, Y_hat, Y_train):
        # We use the indexes obtained from the aggregation tags
        # to compute the mean squared scaled error (MSSE) across the hierarchy levels
        msse_list = []
        for idxs in hier_idxs:
            y       = Y[idxs, :]
            y_hat   = Y_hat[idxs, :]
            y_train = Y_train[idxs, :]
            level_msse = msse(y, y_hat, y_train)
            msse_list.append(level_msse)
        return msse_list

    @staticmethod
    def _get_hierarchical_rel_mse(hier_idxs, Y, Y_hat, Y_train):
        # We use the indexes obtained from the aggregation tags
        # to compute relative MSE across the hierarchy levels
        rel_mse_list = []
        for idxs in hier_idxs:
            y       = Y[idxs, :]
            y_hat   = Y_hat[idxs, :]
            y_train = Y_train[idxs, :]
            level_rel_mse = rel_mse(y, y_hat, y_train)
            rel_mse_list.append(level_rel_mse)
        return rel_mse_list

    @staticmethod
    def _sort_hier_df(Y_df, S_df):
        # NeuralForecast core sorts unique_id lexicographically, which can
        # deviate from the S_df order. Here we sort Y_df so that its
        # unique_id order matches S_df.index (and therefore Y_hat_df).
        Y_df.unique_id = Y_df.unique_id.astype('category')
        Y_df.unique_id = Y_df.unique_id.cat.set_categories(S_df.index)
        Y_df = Y_df.sort_values(by=['unique_id', 'ds'])
        return Y_df

    @staticmethod
    def _nonzero_indexes_by_row(M):
        return [np.nonzero(M[row,:])[0] for row in range(len(M))]

    @staticmethod
    def load_process_data(dataset, directory='./data'):
        # Load data
        data_info = HierarchicalInfo[dataset]
        Y_df, S_df, tags = HierarchicalData.load(directory=directory,
                                                 group=dataset)

        # Parse and augment data
        Y_df['ds'] = pd.to_datetime(Y_df['ds'])
        Y_df = HierarchicalDataset._sort_hier_df(Y_df=Y_df, S_df=S_df)

        # Obtain indexes for plots and evaluation
        hier_levels = ['Overall'] + list(tags.keys())
        hier_idxs = [np.arange(len(S_df))] +\
            [S_df.index.get_indexer(tags[level]) for level in list(tags.keys())]
        hier_linked_idxs = HierarchicalDataset._nonzero_indexes_by_row(S_df.values.T)

        # Final output
        data = dict(Y_df=Y_df, S_df=S_df, tags=tags,
                    # Hierarchical idxs
                    hier_idxs=hier_idxs,
                    hier_levels=hier_levels,
                    hier_linked_idxs=hier_linked_idxs,
                    # Dataset Properties
                    horizon=data_info.papers_horizon,
                    freq=data_info.freq,
                    seasonality=data_info.seasonality)
        return data
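
The index bookkeeping in `load_process_data` can be hard to picture on a real dataset. Below is a minimal sketch on a hypothetical three-series hierarchy (the names `total`, `a`, `b` and the toy `tags` are invented for illustration and are not part of any dataset). It reuses the same two expressions as the class: `get_indexer` to locate each level's rows in `S_df`, and a row-wise `np.nonzero` on the transposed summing matrix.

```python
import numpy as np
import pandas as pd

# Hypothetical toy hierarchy: one total series and two bottom series.
S_df = pd.DataFrame(
    [[1, 1],   # total = a + b
     [1, 0],   # a
     [0, 1]],  # b
    index=["total", "a", "b"],
    columns=["a", "b"],
)
tags = {"Country": np.array(["total"]),
        "Country/State": np.array(["a", "b"])}

# Row positions of all series, then of each level's series, in S_df order.
hier_idxs = [np.arange(len(S_df))] + [
    S_df.index.get_indexer(tags[level]) for level in tags
]

# For each bottom series (a column of S), the rows of S that include it.
S_T = S_df.values.T
hier_linked_idxs = [np.nonzero(S_T[row, :])[0] for row in range(len(S_T))]

print([idx.tolist() for idx in hier_idxs])         # [[0, 1, 2], [0], [1, 2]]
print([idx.tolist() for idx in hier_linked_idxs])  # [[0, 1], [0, 2]]
```

The first entry of `hier_idxs` is the `Overall` level, which is why the evaluation later reports one more row than there are keys in `tags`.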

## Fit/Predict HierE2E

In [7]:
def run_hiere2e(config, data):
    #------------------------- Declare DataLoaders ----------------------------#
    # Parse data and parameters
    S = data['S_df'].values
    bottom_cols = data['S_df'].columns
    prediction_length = data['horizon']

    Y_bottom_df = data['Y_df'].pivot(index='ds', columns='unique_id',values='y')
    Y_bottom_df = Y_bottom_df.loc[:, bottom_cols].to_period()

    hts_train = HierarchicalTimeSeries(
        ts_at_bottom_level=Y_bottom_df.iloc[:-prediction_length, :],
        S=S)
    hts_test = HierarchicalTimeSeries(
        ts_at_bottom_level=Y_bottom_df.iloc[-prediction_length:, :],
        S=S,
    )

    #-------------------------- Fit/Predict HierE2E ---------------------------#
    dataset_train = hts_train.to_dataset()

    estimator = DeepVARHierarchicalEstimator(
        freq=hts_train.freq, # For TourismSmall, set freq='M'; freq='Q' fails
        prediction_length=prediction_length,
        target_dim=hts_train.num_ts,
        S=S,
        trainer=Trainer(ctx = mx.context.gpu(),
                        epochs=config['epochs'],
                        num_batches_per_epoch=config['num_batches_per_epoch'],
                        hybridize=config['hybridize'],
                        learning_rate=config['learning_rate']),
        scaling=config['scaling'],
        pick_incomplete=config['pick_incomplete'],
        batch_size=config['batch_size'],
        num_parallel_samples=config['num_parallel_samples'],
        context_length=config['context_length'],
        num_layers=config['num_layers'],
        num_cells=config['num_cells'],
        coherent_train_samples=config['coherent_train_samples'],
        coherent_pred_samples=config['coherent_pred_samples'],
        likelihood_weight=config['likelihood_weight'],
        CRPS_weight=config['CRPS_weight'],
        num_samples_for_loss=config['num_samples_for_loss'],
        sample_LH=config['sample_LH'],
        seq_axis=config['seq_axis'],
        warmstart_epoch_frac = config['warmstart_epoch_frac'],
    )

    predictor = estimator.train(dataset_train)
    forecast_it = predictor.predict(dataset_train)

    # QUANTILES is a global that must be defined before run_hiere2e is called
    samples = next(forecast_it).samples
    Y_hat  = np.transpose(np.mean(samples, axis=0), (1,0)) # [T,n_series]->[n_series,T]
    Yq_hat = np.quantile(samples, q=QUANTILES, axis=0)
    Yq_hat = np.transpose(Yq_hat, (2,1,0)) # [Q,T,n_series]->[n_series,T,Q]

    Y_test = hts_test.ts_at_all_levels.values
    Y_test = np.transpose(Y_test, (1,0)) # [T,n_series]->[n_series,T]

    Y_train = hts_train.ts_at_all_levels.values
    Y_train = np.transpose(Y_train, (1,0)) # [T,n_series]->[n_series,T]

    return Yq_hat, Y_hat, Y_test, Y_train
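
The post-processing at the end of `run_hiere2e` is mostly axis shuffling. The sketch below replays it on random numbers so the shapes can be checked without training a model. It assumes the sample array is laid out as `[num_samples, T, n_series]`, which is inferred from the transposes in the function rather than stated in the notebook; the sizes and the three-point quantile grid are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
num_samples, horizon, n_series = 200, 12, 5     # hypothetical sizes
samples = rng.normal(size=(num_samples, horizon, n_series))
quantiles = np.array([0.1, 0.5, 0.9])           # hypothetical quantile grid

# Mean over the sample axis: [T, n_series] -> [n_series, T]
Y_hat = np.transpose(np.mean(samples, axis=0), (1, 0))

# Quantiles over the sample axis: [Q, T, n_series] -> [n_series, T, Q]
Yq_hat = np.quantile(samples, q=quantiles, axis=0)
Yq_hat = np.transpose(Yq_hat, (2, 1, 0))

print(Y_hat.shape, Yq_hat.shape)  # (5, 12) (5, 12, 3)
```

Putting the series axis first is what lets the evaluation helpers slice a hierarchy level with `Y[idxs, :]` and `Yq_hat[idxs, :, :]`.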

In [8]:
# Best hyperparameters per dataset, taken from the ICML 2021 code
configs = {
"Labour": {"epochs": 50, "num_batches_per_epoch": 50, "scaling": True, "pick_incomplete": False, "batch_size": 32, "num_parallel_samples": 200, "hybridize": False, "learning_rate": 0.001, "context_length": 24, "rank": 0, "assert_reconciliation": False, "num_deep_models": 1, "num_layers": 2, "num_cells": 40, "coherent_train_samples": True, "coherent_pred_samples": True, "likelihood_weight": 0.0, "CRPS_weight": 1.0, "num_samples_for_loss": 200, "sample_LH": False, "rec_weight": 0.0, "seq_axis": [1], "warmstart_epoch_frac": 0.1},
"Traffic": {"epochs": 50, "num_batches_per_epoch": 50, "scaling": True, "pick_incomplete": False, "batch_size": 32, "num_parallel_samples": 200, "hybridize": False, "learning_rate": 0.001, "context_length": 40, "rank": 0, "assert_reconciliation": False, "num_deep_models": 1, "num_layers": 2, "num_cells": 40, "coherent_train_samples": True, "coherent_pred_samples": True, "likelihood_weight": 1.0, "CRPS_weight": 0.0, "num_samples_for_loss": 50, "sample_LH": True, "seq_axis": [1], "warmstart_epoch_frac": 0.1},
"OldTraffic": {"epochs": 50, "num_batches_per_epoch": 50, "scaling": True, "pick_incomplete": False, "batch_size": 32, "num_parallel_samples": 200, "hybridize": False, "learning_rate": 0.001, "context_length": 40, "rank": 0, "assert_reconciliation": False, "num_deep_models": 1, "num_layers": 2, "num_cells": 40, "coherent_train_samples": True, "coherent_pred_samples": True, "likelihood_weight": 1.0, "CRPS_weight": 0.0, "num_samples_for_loss": 50, "sample_LH": True, "seq_axis": [1], "warmstart_epoch_frac": 0.1},
"TourismSmall": {"epochs": 10, "num_batches_per_epoch": 50, "scaling": True, "pick_incomplete": True, "batch_size": 32, "num_parallel_samples": 200, "hybridize": False, "learning_rate": 0.001, "context_length": 24, "rank": 0, "assert_reconciliation": False, "num_deep_models": 1, "num_layers": 2, "num_cells": 40, "coherent_train_samples": True, "coherent_pred_samples": True, "likelihood_weight": 1.0, "CRPS_weight": 0.0, "num_samples_for_loss": 50, "sample_LH": True, "seq_axis": [], "warmstart_epoch_frac": 0.0},
"TourismLarge": {"epochs": 40, "num_batches_per_epoch": 50, "scaling": True, "pick_incomplete": False, "batch_size": 4, "num_parallel_samples": 200, "hybridize": False, "learning_rate": 0.001, "context_length": 36, "rank": 0, "assert_reconciliation": False, "num_deep_models": 1, "num_layers": 2, "num_cells": 40, "coherent_train_samples": True, "coherent_pred_samples": True, "likelihood_weight": 1.0, "CRPS_weight": 0.0, "num_samples_for_loss": 50, "sample_LH": True, "seq_axis": [1], "warmstart_epoch_frac": 0.0},
"OldTourismLarge": {"epochs": 40, "num_batches_per_epoch": 50, "scaling": True, "pick_incomplete": False, "batch_size": 4, "num_parallel_samples": 200, "hybridize": False, "learning_rate": 0.001, "context_length": 36, "rank": 0, "assert_reconciliation": False, "num_deep_models": 1, "num_layers": 2, "num_cells": 40, "coherent_train_samples": True, "coherent_pred_samples": True, "likelihood_weight": 1.0, "CRPS_weight": 0.0, "num_samples_for_loss": 50, "sample_LH": True, "seq_axis": [1], "warmstart_epoch_frac": 0.0},
"Wiki2": {"epochs": 50, "num_batches_per_epoch": 50, "scaling": True, "pick_incomplete": False, "batch_size": 32, "num_parallel_samples": 200, "hybridize": False, "learning_rate": 0.001, "context_length": 15, "rank": 0, "assert_reconciliation": False, "num_deep_models": 1, "num_layers": 2, "num_cells": 40, "coherent_train_samples": True, "coherent_pred_samples": True, "likelihood_weight": 0.0, "CRPS_weight": 1.0, "num_samples_for_loss": 100, "sample_LH": False, "rec_weight": 0.0, "seq_axis": [1], "warmstart_epoch_frac": 0.1}}

In [9]:
DATASET = 'OldTourismLarge'
LEVEL = np.arange(0, 100, 2)
qs = [[50-lv/2, 50+lv/2] for lv in LEVEL]
QUANTILES = np.sort(np.concatenate(qs)/100)

config = configs[DATASET]
data = HierarchicalDataset.load_process_data(dataset=DATASET)

Yq_hat, Y_hat, Y_test, Y_train = run_hiere2e(config, data)

100%|██████████| 1.30M/1.30M [00:00<00:00, 7.25MiB/s]
100%|██████████| 335k/335k [00:00<00:00, 7.60MiB/s]
100%|██████████| 968k/968k [00:00<00:00, 14.0MiB/s]
100%|██████████| 50/50 [00:14<00:00,  3.42it/s, epoch=1/40, avg_epoch_loss=3.38e+3]
100%|██████████| 50/50 [00:21<00:00,  2.36it/s, epoch=2/40, avg_epoch_loss=3.11e+3]
100%|██████████| 50/50 [00:15<00:00,  3.19it/s, epoch=3/40, avg_epoch_loss=3.03e+3]
100%|██████████| 50/50 [00:19<00:00,  2.53it/s, epoch=4/40, avg_epoch_loss=2.98e+3]
100%|██████████| 50/50 [00:14<00:00,  3.44it/s, epoch=5/40, avg_epoch_loss=2.95e+3]
100%|██████████| 50/50 [00:14<00:00,  3.36it/s, epoch=6/40, avg_epoch_loss=2.93e+3]
100%|██████████| 50/50 [00:20<00:00,  2.46it/s, epoch=7/40, avg_epoch_loss=2.92e+3]
100%|██████████| 50/50 [00:15<00:00,  3.26it/s, epoch=8/40, avg_epoch_loss=2.9e+3] 
100%|██████████| 50/50 [00:16<00:00,  3.08it/s, epoch=9/40, avg_epoch_loss=2.89e+3]
100%|██████████| 50/50 [00:15<00:00,  3.33it/s, epoch=10/40, avg_epoch_loss=2.88e+3]
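
The quantile grid built from `LEVEL` in the cell above is worth inspecting once, since it is passed both to `np.quantile` and to the sCRPS evaluation. The sketch repeats the same three lines and prints a few properties of the result.

```python
import numpy as np

LEVEL = np.arange(0, 100, 2)                       # interval levels 0, 2, ..., 98
qs = [[50 - lv / 2, 50 + lv / 2] for lv in LEVEL]  # lower/upper percentile per level
QUANTILES = np.sort(np.concatenate(qs) / 100)

print(len(QUANTILES))                            # 100
print(QUANTILES[0], QUANTILES[-1])               # 0.01 0.99
print(int(np.sum(np.isclose(QUANTILES, 0.5))))   # 2
```

The grid covers 0.01 to 0.99 in steps of 0.01, and the median appears twice because the level-0 interval contributes 0.5 as both its lower and upper bound. Whether that duplicate matters depends on how the metric averages over quantiles.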

## Evaluate HierE2E

We evaluate the forecasts with the following metrics.

Scaled CRPS (sCRPS), a scaled variation of the CRPS proposed by Rangapuram et al. (2021), measures the accuracy of the predicted quantiles `y_hat` against the observations `y`.

$$ \mathrm{sCRPS}(\hat{F}_{\tau}, \mathbf{y}_{\tau}) = \frac{2}{N} \sum_{i}
\int^{1}_{0}
\frac{\mathrm{QL}(\hat{F}_{i,\tau}, y_{i,\tau})_{q}}{\sum_{i} | y_{i,\tau} |} dq $$
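
As a rough illustration of the formula above, here is a NumPy sketch that approximates the integral over $q$ by an average over a finite quantile grid. It follows the formula as written and is not `hierarchicalforecast`'s `scaled_crps`, whose normalization and averaging may differ, so its values should not be compared with the results table. The function names, the toy data, and the final average over time steps are assumptions made for the example.

```python
import numpy as np

def quantile_loss(y, yq_hat, q):
    # Pinball loss at quantile level q, computed elementwise.
    diff = y - yq_hat
    return np.maximum(q * diff, (q - 1) * diff)

def scaled_crps_sketch(y, yq_hat, quantiles):
    # y: [n_series, T], yq_hat: [n_series, T, Q], quantiles: [Q]
    ql = quantile_loss(y[..., None], yq_hat, quantiles)  # [n_series, T, Q]
    integral = ql.mean(axis=-1)  # mean over the grid approximates the integral on [0, 1]
    n_series = y.shape[0]
    # Per time step: (2 / N) * sum_i integral_i / sum_i |y_i|
    per_step = (2.0 / n_series) * integral.sum(axis=0) / np.abs(y).sum(axis=0)
    return per_step.mean()  # assumption: average the per-step values over T

y = np.array([[10.0, 20.0], [30.0, 40.0]])
quantiles = np.array([0.25, 0.5, 0.75])
perfect = np.repeat(y[..., None], len(quantiles), axis=-1)

print(scaled_crps_sketch(y, perfect, quantiles))        # 0.0
print(scaled_crps_sketch(y, perfect + 1.0, quantiles))  # positive
```

A forecast whose quantiles all equal the observation scores zero, and any shift away from it increases the score.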


Relative mean squared error (RelMSE), as proposed by Hyndman & Koehler (2006) and used in Olivares et al. (2023), compares the forecast's MSE with that of a Naive1 benchmark.

$$ \mathrm{RelMSE}(\mathbf{y}, \mathbf{\hat{y}}, \mathbf{\hat{y}}^{\text{naive1}}) =
\frac{\mathrm{MSE}(\mathbf{y}, \mathbf{\hat{y}})}{\mathrm{MSE}(\mathbf{y}, \mathbf{\hat{y}}^{\text{naive1}})} $$
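
A small sketch of the ratio above, assuming the Naive1 benchmark repeats each series' last training observation across the horizon. That choice of benchmark and the function name are assumptions for illustration; the library's `rel_mse` is what the notebook actually calls.

```python
import numpy as np

def rel_mse_sketch(y, y_hat, y_train):
    # y, y_hat: [n_series, horizon]; y_train: [n_series, T_train]
    # Naive1 benchmark (assumption): repeat the last training value.
    y_naive = np.repeat(y_train[:, [-1]], y.shape[1], axis=1)
    mse_model = np.mean((y - y_hat) ** 2)
    mse_naive = np.mean((y - y_naive) ** 2)
    return mse_model / mse_naive

y_train = np.array([[1.0, 2.0, 3.0]])
y       = np.array([[4.0, 5.0]])
y_hat   = np.array([[4.0, 4.0]])

# Model MSE = (0 + 1) / 2 = 0.5; Naive1 MSE = (1 + 4) / 2 = 2.5
rel = rel_mse_sketch(y, y_hat, y_train)
print(np.isclose(rel, 0.2))  # True
```

Values below 1 mean the model beats the naive benchmark on squared error.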

Mean squared scaled error (MSSE), as proposed by Hyndman & Koehler (2006).

$$ \mathrm{MSSE}(\mathbf{y}, \mathbf{\hat{y}}, \mathbf{y}^{\text{in-sample}}) =
\frac{\frac{1}{h} \sum^{t+h}_{\tau=t+1} (y_{\tau} - \hat{y}_{\tau})^2}{\frac{1}{t-1} \sum^{t}_{\tau=2} (y_{\tau} - y_{\tau-1})^2}
$$
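
For a single series, the MSSE formula above reduces to a few lines of NumPy. This is only a sketch of the formula; the function name is hypothetical, and how the library's `msse` aggregates over several series is not shown here.

```python
import numpy as np

def msse_sketch(y, y_hat, y_train):
    # y, y_hat: [h] out-of-sample values; y_train: [t] in-sample values
    numerator = np.mean((y - y_hat) ** 2)          # forecast MSE over the horizon
    denominator = np.mean(np.diff(y_train) ** 2)   # in-sample one-step naive MSE
    return numerator / denominator

y_train = np.array([1.0, 2.0, 4.0, 7.0])  # squared differences: 1, 4, 9
y       = np.array([8.0, 9.0])
y_hat   = np.array([7.0, 9.0])            # squared errors: 1, 0

value = msse_sketch(y, y_hat, y_train)
print(np.isclose(value, 3.0 / 28.0))  # True: 0.5 / (14 / 3)
```

Because the scale comes from the training data rather than from a benchmark forecast on the test window, MSSE stays defined even when a naive forecast happens to be exact on the test period.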

In [10]:
# Final sCRPS, RelMSE, and MSSE evaluation
_scrps = HierarchicalDataset._get_hierarchical_scrps(Y=Y_test, Yq_hat=Yq_hat,
                                            hier_idxs=data['hier_idxs'],
                                            q_to_pred=QUANTILES)

_msse = HierarchicalDataset._get_hierarchical_msse(Y=Y_test, Y_hat=Y_hat,
                                           Y_train=Y_train,
                                           hier_idxs=data['hier_idxs'])

_rel_mse = HierarchicalDataset._get_hierarchical_rel_mse(Y=Y_test, Y_hat=Y_hat,
                                           Y_train=Y_train,
                                           hier_idxs=data['hier_idxs'])


results_df = pd.DataFrame(dict(level=['Overall']+list(data['tags'].keys())))
results_df['scrps'] = _scrps
results_df['rel_mse'] = _rel_mse
results_df['msse'] = _msse
results_df

| | level | scrps | rel_mse | msse |
|---|---|---|---|---|
| 0 | Overall | 0.147381 | 0.232104 | 0.110713 |
| 1 | Country | 0.07543 | 0.27249 | 0.103729 |
| 2 | Country/State | 0.101628 | 0.287956 | 0.130836 |
| 3 | Country/State/Zone | 0.131646 | 0.364769 | 0.191554 |
| 4 | Country/State/Zone/Region | 0.16939 | 0.374835 | 0.242274 |
| 5 | Country/Purpose | 0.099023 | 0.138338 | 0.075082 |
| 6 | Country/State/Purpose | 0.13742 | 0.174976 | 0.105321 |
| 7 | Country/State/Zone/Purpose | 0.200288 | 0.291827 | 0.193209 |
| 8 | Country/State/Zone/Region/Purpose | 0.26422 | 0.343603 | 0.254339 |
