# • R's Fable/HTS Replication1

<a href="https://colab.research.google.com/github/Nixtla/hierarchicalforecast/blob/main/nbs/examples/AustralianDomesticTourism.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In many cases, only the time series at the lowest level of the hierarchies (bottom time series) are available. `HierarchicalForecast` has tools to create time series for all hierarchies. In this notebook we will see how to do it.

In [1]:

# compute base forecast no coherent
from statsforecast.core import StatsForecast
from statsforecast.models import AutoARIMA, Naive
import pandas as pd

#obtain hierarchical reconciliation methods and evaluation
from hierarchicalforecast.core import HierarchicalReconciliation
from hierarchicalforecast.methods import BottomUp, TopDown, MiddleOut
from datasetsforecast.hierarchical import HierarchicalData
import numpy as np
from statsforecast.models import ETS


  from tqdm.autonotebook import tqdm


## Aggregate bottom time series

In this example we will use the [Tourism](https://otexts.com/fpp3/tourism.html) dataset from the [Forecasting: Principles and Practice](https://otexts.com/fpp3/) book. The dataset only contains the time series at the lowest level, so we need to create the time series for all hierarchies.

In [2]:
# Load TourismSmall dataset
Y_df, S, tags = HierarchicalData.load('./data', 'TourismSmall')
Y_df['ds'] = pd.to_datetime(Y_df['ds'])

In [3]:
Y_df

Unnamed: 0,unique_id,ds,y
0,total,1998-03-31,84503
1,total,1998-06-30,65312
2,total,1998-09-30,72753
3,total,1998-12-31,70880
4,total,1999-03-31,86893
...,...,...,...
3199,nt-oth-noncity,2005-12-31,59
3200,nt-oth-noncity,2006-03-31,25
3201,nt-oth-noncity,2006-06-30,52
3202,nt-oth-noncity,2006-09-30,72


In [4]:
S

Unnamed: 0,nsw-hol-city,nsw-hol-noncity,vic-hol-city,vic-hol-noncity,qld-hol-city,qld-hol-noncity,sa-hol-city,sa-hol-noncity,wa-hol-city,wa-hol-noncity,...,qld-oth-city,qld-oth-noncity,sa-oth-city,sa-oth-noncity,wa-oth-city,wa-oth-noncity,tas-oth-city,tas-oth-noncity,nt-oth-city,nt-oth-noncity
total,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
hol,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
vfr,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
bus,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
oth,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
wa-oth-noncity,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
tas-oth-city,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
tas-oth-noncity,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
nt-oth-city,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0


In [5]:
tags

{'Country': array(['total'], dtype=object),
 'Country/Purpose': array(['hol', 'vfr', 'bus', 'oth'], dtype=object),
 'Country/Purpose/State': array(['nsw-hol', 'vic-hol', 'qld-hol', 'sa-hol', 'wa-hol', 'tas-hol',
        'nt-hol', 'nsw-vfr', 'vic-vfr', 'qld-vfr', 'sa-vfr', 'wa-vfr',
        'tas-vfr', 'nt-vfr', 'nsw-bus', 'vic-bus', 'qld-bus', 'sa-bus',
        'wa-bus', 'tas-bus', 'nt-bus', 'nsw-oth', 'vic-oth', 'qld-oth',
        'sa-oth', 'wa-oth', 'tas-oth', 'nt-oth'], dtype=object),
 'Country/Purpose/State/CityNonCity': array(['nsw-hol-city', 'nsw-hol-noncity', 'vic-hol-city',
        'vic-hol-noncity', 'qld-hol-city', 'qld-hol-noncity',
        'sa-hol-city', 'sa-hol-noncity', 'wa-hol-city', 'wa-hol-noncity',
        'tas-hol-city', 'tas-hol-noncity', 'nt-hol-city', 'nt-hol-noncity',
        'nsw-vfr-city', 'nsw-vfr-noncity', 'vic-vfr-city',
        'vic-vfr-noncity', 'qld-vfr-city', 'qld-vfr-noncity',
        'sa-vfr-city', 'sa-vfr-noncity', 'wa-vfr-city', 'wa-vfr-noncity',
     

### Split Train/Test sets

We use the final horizon as test set.

In [6]:
HORIZON = 8
FREQUENCY = "1Q"

In [7]:
Y_test_df = Y_df.groupby('unique_id').tail(HORIZON)
Y_train_df = Y_df.drop(Y_test_df.index)

In [8]:
Y_test_df = Y_test_df.set_index('unique_id')
Y_train_df = Y_train_df.set_index('unique_id')

In [9]:
Y_train_df.groupby('unique_id').size()

unique_id
bus                28
hol                28
nsw-bus            28
nsw-bus-city       28
nsw-bus-noncity    28
                   ..
wa-oth-city        28
wa-oth-noncity     28
wa-vfr             28
wa-vfr-city        28
wa-vfr-noncity     28
Length: 89, dtype: int64

In [10]:
Y_train_df

Unnamed: 0_level_0,ds,y
unique_id,Unnamed: 1_level_1,Unnamed: 2_level_1
total,1998-03-31,84503
total,1998-06-30,65312
total,1998-09-30,72753
total,1998-12-31,70880
total,1999-03-31,86893
...,...,...
nt-oth-noncity,2003-12-31,132
nt-oth-noncity,2004-03-31,12
nt-oth-noncity,2004-06-30,40
nt-oth-noncity,2004-09-30,186


In [11]:
list(tags["Country"])
Y_train_df.loc[Y_train_df.index == "total"].sort_values(by="ds")

Unnamed: 0_level_0,ds,y
unique_id,Unnamed: 1_level_1,Unnamed: 2_level_1
total,1998-03-31,84503
total,1998-06-30,65312
total,1998-09-30,72753
total,1998-12-31,70880
total,1999-03-31,86893
total,1999-06-30,66866
total,1999-09-30,72182
total,1999-12-31,68318
total,2000-03-31,85651
total,2000-06-30,64467


## Computing base forecasts

The following cell computes the **base forecasts** for each time series in `Y_df` using the `auto_arima` and `naive` models. Observe that `Y_hat_df` contains the forecasts but they are not coherent.

In [12]:
# from mlforecast import MLForecast
# from window_ops.expanding import expanding_mean
# from window_ops.rolling import rolling_mean, rolling_std, seasonal_rolling_mean, seasonal_rolling_std
# import lightgbm as lgb
# from numba import njit

# CONTEXT_LEN = 2*HORIZON

# models = [
#     lgb.LGBMRegressor()
# ]

# @njit
# def rolling_mean_custom(x):
#     return rolling_mean(x, window_size=CONTEXT_LEN//2)

# @njit
# def rolling_std_custom(x):
#     return rolling_std(x, window_size=CONTEXT_LEN//2)

# @njit
# def seasonal_rolling_mean_custom(x):
#     return seasonal_rolling_mean(x, season_length=1, window_size=CONTEXT_LEN//2)

# rolling_feats = {1:  [expanding_mean]}
# for lag in range(1, CONTEXT_LEN//2):
#     rolling_feats[lag] = [rolling_mean_custom, rolling_std_custom, seasonal_rolling_mean_custom]

# num_levels = len(tags)
# all_models = {}
# Y_hat_df = None
# for level, tag in enumerate(tags.keys()):
#     fcst = MLForecast(
#         models=models,
#         freq=FREQUENCY,
#         lags=[_ for _ in range(1, CONTEXT_LEN//2)],
#         lag_transforms=rolling_feats,
#         date_features=['year', 'month', 'day', 'dayofweek', 'quarter', 'week'],
#         differences=[1],
#     )
#     Y_train_df_level = None
#     for _, id_ in enumerate(tags[tag]):
#         Y_train_df_level_ = Y_train_df.loc[Y_train_df.index == tags[tag][_]]
#         if Y_train_df_level is None:
#             Y_train_df_level = Y_train_df_level_
#         else:
#             Y_train_df_level = pd.concat([Y_train_df_level, Y_train_df_level_], axis=0)
#     Y_train_df_level = Y_train_df_level.sort_values(by=["ds"])
#     # display(Y_train_df_level)
#     print(f"Fitting for level = {tag}")
#     fcst.fit(Y_train_df_level, id_col='index', time_col='ds', target_col='y')

#     print(f"Forecasting for level = {tag}")
#     # predictions
#     Y_hat_level = fcst.predict(horizon=HORIZON)
#     if Y_hat_df is None:
#         Y_hat_df = Y_hat_level
#     else:
#         Y_hat_df = pd.concat([Y_hat_df, Y_hat_level], axis=0)


In [13]:
from sklearn.preprocessing import StandardScaler

# Normalize df
class TSStandardScaler:
    def __init__(self) -> None:
        self.scaler = StandardScaler()
        self.ids = []
        self.ts = None

    def _pd2np(self, df):
        X = []
        for k in df.index.unique():
            self.ids.append(k)
            vals_ = df[df.index == k].sort_values(by="ds")["y"]
            ts_ = df[df.index == k].sort_values(by="ds")["ds"]
            X.append(vals_)
        self.ts = ts_
        X = np.array(X)
        return np.transpose(X)

    def _np2pd(self, X):
        df = {"unique_id": [], "ds": [], "y": []}
        for i in range(X.shape[0]):
            vals_ = X[i,:]
            ids_ = [self.ids[i]] * X.shape[1]
            ts_ = self.ts.values
            # print(i, vals_, ids_, ts_)
            df["unique_id"].extend(ids_)
            df["ds"].extend(ts_)
            df["y"].extend(vals_)
        df = pd.DataFrame(df).set_index("unique_id")
        return df

    def fit(self, df):
        self.scaler.fit(self._pd2np(df))
    
    def transform(self, df):
        Y = self.scaler.transform(self._pd2np(df))
        return self._np2pd(np.transpose(Y))

    def inverse_transform(self, df):
        Y = self.scaler.inverse_transform(self._pd2np(df))
        return self._np2pd(np.transpose(Y))
    

scaler = TSStandardScaler()
scaler.fit(Y_train_df)
Y_train_df = scaler.transform(Y_train_df)
Y_test_df = scaler.transform(Y_test_df)

In [14]:
Y_train_df

Unnamed: 0_level_0,ds,y
unique_id,Unnamed: 1_level_1,Unnamed: 2_level_1
total,1998-03-31,1.566369
total,1998-06-30,-1.187098
total,1998-09-30,-0.119486
total,1998-12-31,-0.388218
total,1999-03-31,1.909279
...,...,...
nt-oth-noncity,2003-12-31,-0.073474
nt-oth-noncity,2004-03-31,-1.057025
nt-oth-noncity,2004-06-30,-0.827529
nt-oth-noncity,2004-09-30,0.369124


In [15]:
Y_test_df

Unnamed: 0_level_0,ds,y
unique_id,Unnamed: 1_level_1,Unnamed: 2_level_1
total,2005-03-31,1.780007
total,2005-06-30,-2.001330
total,2005-09-30,-0.967004
total,2005-12-31,-1.462574
total,2006-03-31,1.298641
...,...,...
nt-oth-noncity,2005-12-31,-0.671801
nt-oth-noncity,2006-03-31,-0.950473
nt-oth-noncity,2006-06-30,-0.729174
nt-oth-noncity,2006-09-30,-0.565249


In [16]:
scaler.inverse_transform(Y_train_df)

Unnamed: 0_level_0,ds,y
unique_id,Unnamed: 1_level_1,Unnamed: 2_level_1
total,1998-03-31,84503.0
total,1998-06-30,65312.0
total,1998-09-30,72753.0
total,1998-12-31,70880.0
total,1999-03-31,86893.0
...,...,...
nt-oth-noncity,2003-12-31,132.0
nt-oth-noncity,2004-03-31,12.0
nt-oth-noncity,2004-06-30,40.0
nt-oth-noncity,2004-09-30,186.0


In [17]:
from mlforecast import MLForecast
from window_ops.expanding import expanding_mean
from window_ops.rolling import rolling_mean, rolling_std, seasonal_rolling_mean, seasonal_rolling_std
import lightgbm as lgb
from numba import njit

CONTEXT_LEN = 2*HORIZON

models = [
    lgb.LGBMRegressor()
]

@njit
def rolling_mean_custom(x):
    return rolling_mean(x, window_size=CONTEXT_LEN//2)

@njit
def rolling_std_custom(x):
    return rolling_std(x, window_size=CONTEXT_LEN//2)

@njit
def seasonal_rolling_mean_custom(x):
    return seasonal_rolling_mean(x, season_length=1, window_size=CONTEXT_LEN//2)

rolling_feats = {1:  [expanding_mean]}
for lag in range(1, CONTEXT_LEN//2):
    rolling_feats[lag] = [rolling_mean_custom, rolling_std_custom, seasonal_rolling_mean_custom]


fcst = MLForecast(
    models=models,
    freq=FREQUENCY,
    lags=[_ for _ in range(1, CONTEXT_LEN//2)],
    lag_transforms=rolling_feats,
    date_features=['year', 'month', 'day', 'dayofweek', 'quarter', 'week'],
    differences=[1],
)

fcst.fit(Y_train_df, id_col='index', time_col='ds', target_col='y')
Y_hat_df = fcst.predict(horizon=HORIZON)

In [18]:
# Y_hat_df inverse transform
Y_hat_df["y"] = Y_hat_df["LGBMRegressor"]
Y_hat_df = scaler.inverse_transform(Y_hat_df)
Y_hat_df = Y_hat_df.rename(columns={"y": "LGBMRegressor"})


In [19]:
# id_ = Y_hat_df.index.unique()[12]
# print(id_)
# Y_test_df[Y_test_df.index == id_]
# Y_test_df[Y_test_df.index == id_]["y"].plot(marker="o", label="true")
# Y_hat_df[Y_hat_df.index == id_]["LGBMRegressor"].plot(marker="^", label="predicted")

### Computing in-sample forecasts needed for MinT, ERM methods
Note that the model is already trained on the training part of the data. Now, the in-sample forecasts are obtained by a moving window method. These in-sample forecasts are needed to estimate the residual covariance matrix in MinT and ERM methods.

In [20]:
dates = Y_df.ds.unique()
dates.sort()
dates_train = Y_train_df.ds.unique()
dates_train.sort()
Y_hat_in_sample = None
for i in range(0, len(dates_train)-HORIZON-CONTEXT_LEN+1):
    # print(i, i+CONTEXT_LEN, i+CONTEXT_LEN+HORIZON)
    backtest_history = Y_train_df[(Y_train_df.ds >= dates[i]) & (Y_train_df.ds < dates[i+CONTEXT_LEN])]
    end_pt = i+CONTEXT_LEN+HORIZON
    if end_pt < len(dates_train):
        backtest_test_true = Y_train_df[(Y_train_df.ds >= dates[i+CONTEXT_LEN]) & (Y_train_df.ds < dates[end_pt])]
    else:
        backtest_test_true = Y_train_df[(Y_train_df.ds >= dates[i+CONTEXT_LEN])]
    
    fcst.fit(backtest_history, id_col='index', time_col='ds', target_col='y')
    Y_hat_in_sample_part = fcst.predict(horizon=HORIZON)
    
    test_dates = backtest_test_true["ds"].unique()
    test_dates.sort()
    first_horizon_date = backtest_test_true["ds"].unique()[0]
    if Y_hat_in_sample is None:
        Y_hat_in_sample = Y_hat_in_sample_part[Y_hat_in_sample_part["ds"] == first_horizon_date]
    else:
        if i == len(dates_train)-HORIZON-CONTEXT_LEN:
            Y_hat_in_sample = pd.concat([Y_hat_in_sample, Y_hat_in_sample_part])
        else:
            Y_hat_in_sample = pd.concat([Y_hat_in_sample, Y_hat_in_sample_part[Y_hat_in_sample_part["ds"] == first_horizon_date]])

In [21]:
Y_hat_in_sample

Unnamed: 0_level_0,ds,LGBMRegressor
unique_id,Unnamed: 1_level_1,Unnamed: 2_level_1
bus,2002-03-31,0.415679
hol,2002-03-31,0.275740
nsw-bus,2002-03-31,1.617192
nsw-bus-city,2002-03-31,1.182113
nsw-bus-noncity,2002-03-31,-0.276268
...,...,...
wa-vfr-noncity,2003-12-31,0.964753
wa-vfr-noncity,2004-03-31,1.044365
wa-vfr-noncity,2004-06-30,2.067843
wa-vfr-noncity,2004-09-30,2.743913


## Denormalize everything

In [22]:
Y_train_df = scaler.inverse_transform(Y_train_df)
Y_test_df = scaler.inverse_transform(Y_test_df)

Y_hat_in_sample = Y_hat_in_sample.rename(columns={"LGBMRegressor": "y"})
Y_hat_in_sample = scaler.inverse_transform(Y_hat_in_sample)
Y_hat_in_sample = Y_hat_in_sample.rename(columns={"y": "LGBMRegressor"})

In [23]:
Y_hat_in_sample

Unnamed: 0_level_0,ds,LGBMRegressor
unique_id,Unnamed: 1_level_1,Unnamed: 2_level_1
total,2002-03-31,76482.965367
total,2002-06-30,51665.450702
total,2002-09-30,73395.961154
total,2002-12-31,90468.354427
total,2003-03-31,82519.211576
...,...,...
nt-oth-noncity,2003-12-31,258.670813
nt-oth-noncity,2004-03-31,268.383934
nt-oth-noncity,2004-06-30,393.255336
nt-oth-noncity,2004-09-30,475.740480


In [24]:
# Create Y_df with y_hat_in_sample
Y_train_df_extended = Y_train_df.merge(Y_hat_in_sample, on=["ds", "unique_id"], how="inner")
Y_train_df_extended

Unnamed: 0_level_0,ds,y,LGBMRegressor
unique_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
total,2002-03-31,83938.0,76482.965367
total,2002-06-30,63529.0,51665.450702
total,2002-09-30,75540.0,73395.961154
total,2002-12-31,75663.0,90468.354427
total,2003-03-31,83860.0,82519.211576
...,...,...,...
nt-oth-noncity,2003-12-31,132.0,258.670813
nt-oth-noncity,2004-03-31,12.0,268.383934
nt-oth-noncity,2004-06-30,40.0,393.255336
nt-oth-noncity,2004-09-30,186.0,475.740480


## Reconcile forecasts

The following cell makes the previous forecasts coherent using the `HierarchicalReconciliation` class. Since the hierarchy structure is not strict, we can't use methods such as `TopDown` or `MiddleOut`. In this example we use `BottomUp` and `MinTrace`.

In [25]:
from hierarchicalforecast.methods import BottomUp, MinTrace, ERM

reconcilers = [
    BottomUp(),
    MinTrace(method='mint_shrink'),
    MinTrace(method='ols'),
    ERM(method='reg')
]
hrec = HierarchicalReconciliation(reconcilers=reconcilers)
Y_rec_df = hrec.reconcile(Y_hat_df=Y_hat_df, Y_df=Y_train_df_extended, S=S, tags=tags)

The dataframe `Y_rec_df` contains the reconciled forecasts.

In [26]:
Y_rec_df

Unnamed: 0_level_0,ds,LGBMRegressor,LGBMRegressor/BottomUp,LGBMRegressor/MinTrace_method-mint_shrink,LGBMRegressor/MinTrace_method-ols,LGBMRegressor/ERM_method-reg_lambda_reg-0.01
unique_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
total,2005-03-31,81908.234854,93736.132812,82500.776298,81302.447920,120877.984375
total,2005-06-30,76061.248922,88132.398438,72354.754489,74471.508784,139658.968750
total,2005-09-30,73863.237447,92426.648438,74440.680659,74300.910745,148080.312500
total,2005-12-31,64414.304959,86003.921875,71612.951942,66301.089546,166279.906250
total,2006-03-31,70100.351274,86149.960938,81455.004791,73197.647625,136075.328125
...,...,...,...,...,...,...
nt-oth-noncity,2005-12-31,-474.331918,-474.331909,-349.347439,-138.429598,-343.392609
nt-oth-noncity,2006-03-31,-770.245909,-770.245911,-611.345365,-348.865154,-374.883789
nt-oth-noncity,2006-06-30,-881.960363,-881.960388,-610.592528,-86.430289,-231.292175
nt-oth-noncity,2006-09-30,-875.712914,-875.712891,-569.143836,10.695275,48.188232


## Evaluation 

The `HierarchicalForecast` package includes the `HierarchicalEvaluation` class to evaluate the different hierarchies and also is capable of compute scaled metrics compared to a benchmark model.

In [27]:
from hierarchicalforecast.evaluation import HierarchicalEvaluation

def rmse(y, y_hat):
    return np.mean(np.sqrt(np.mean((y-y_hat)**2, axis=1)))

def mase(y, y_hat, y_insample, seasonality=4):
    errors = np.mean(np.abs(y - y_hat), axis=1)
    scale = np.mean(np.abs(y_insample[:, seasonality:] - y_insample[:, :-seasonality]), axis=1)
    return np.mean(errors / scale)

def rmsse(y, y_hat, y_insample):
    errors = np.mean(np.square(y - y_hat), axis=1)
    scale = np.mean(np.square(y_insample[:, 1:] - y_insample[:, :-1]), axis=1)
    return np.mean(np.sqrt(errors / scale))

eval_tags = {}
eval_tags['Total'] = tags['Country']
eval_tags['Purpose'] = tags['Country/Purpose']
# eval_tags['State'] = tags['Country/State']#np.concatenate([val for key, val in tags.items() if 'State' in key])
# eval_tags['Regions'] = tags['Country/State/Region']
eval_tags['Purpose-State'] = tags['Country/Purpose/State']
# eval_tags['Bottom'] = tags['Country/State/Region/Purpose']
eval_tags['Regions'] = tags['Country/Purpose/State/CityNonCity']
# eval_tags['All'] = np.concatenate(list(tags.values()))

evaluator = HierarchicalEvaluation(evaluators=[rmse, mase, rmsse])
evaluation = evaluator.evaluate(
        Y_hat_df=Y_rec_df, Y_test_df=Y_test_df,
        tags=eval_tags, Y_df=Y_train_df
)
evaluation = evaluation.drop('Overall')
# evaluation.columns = ['Base', 'BottomUp', 'MinTrace(mint_shrink)', 'MinTrace(ols)']
evaluation.columns = ['Base', 'BottomUp', 'MinTrace(ols)', 'MinTrace(mint_shrink)', 'ERM']
evaluation = evaluation.applymap('{:.4f}'.format)

  evaluation = evaluation.drop('Overall')


### RMSE

The following table shows the performance measured using RMSE across levels for each reconciliation method.

In [28]:
score_df = evaluation.query('metric == "rmse"')
score_df

Unnamed: 0_level_0,Unnamed: 1_level_0,Base,BottomUp,MinTrace(ols),MinTrace(mint_shrink),ERM
level,metric,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Total,rmse,15755.1919,16422.048,11123.7182,13450.1368,74791.6893
Purpose,rmse,2877.505,7657.9025,6590.8727,4035.8349,21098.3564
Purpose-State,rmse,1792.8371,2143.7276,1422.5663,1435.6485,3496.5943
Regions,rmse,1177.263,1177.263,879.0429,972.9166,1883.1068


### MASE


The following table shows the performance measured using MASE across levels for each reconciliation method.

In [29]:
evaluation.query('metric == "mase"')

Unnamed: 0_level_0,Unnamed: 1_level_0,Base,BottomUp,MinTrace(ols),MinTrace(mint_shrink),ERM
level,metric,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Total,mase,4.9144,4.9363,3.6551,4.37,27.0992
Purpose,mase,2.2497,6.0074,5.3299,3.257,16.4551
Purpose-State,mase,4.5698,5.257,3.7563,3.7634,7.8353
Regions,mase,4.3553,4.3553,3.3753,3.8753,6.5512


### RMSSE

In [30]:
score_df = evaluation.query('metric == "rmsse"')
score_df

Unnamed: 0_level_0,Unnamed: 1_level_0,Base,BottomUp,MinTrace(ols),MinTrace(mint_shrink),ERM
level,metric,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Total,rmsse,1.3224,1.3783,0.9336,1.1289,6.2775
Purpose,rmsse,0.955,2.1098,1.7059,1.2585,3.7575
Purpose-State,rmsse,2.3257,2.7902,1.9944,2.2197,3.4654
Regions,rmsse,2.596,2.596,2.0994,2.4555,3.381


In [31]:
score_df.astype(float).mean()

Base                     1.799775
BottomUp                 2.218575
MinTrace(ols)            1.683325
MinTrace(mint_shrink)    1.765650
ERM                      4.220350
dtype: float64

### Comparison fable

Observe that we can recover the results reported by the [Forecasting: Principles and Practice](https://otexts.com/fpp3/tourism.html). The original results were calculated using the R package [fable](https://github.com/tidyverts/fable).

![Fable's reconciliation results](./imgs/AustralianDomesticTourism-results-fable.png)

### References
- [Hyndman, R.J., & Athanasopoulos, G. (2021). "Forecasting: principles and practice, 3rd edition: 
Chapter 11: Forecasting hierarchical and grouped series.". OTexts: Melbourne, Australia. OTexts.com/fpp3 
Accessed on July 2022.](https://otexts.com/fpp3/hierarchical.html)
- [Rob Hyndman, Alan Lee, Earo Wang, Shanika Wickramasuriya, and Maintainer Earo Wang (2021). "hts: Hierarchical and Grouped Time Series". URL https://CRAN.R-project.org/package=hts. R package version 0.3.1.](https://cran.r-project.org/web/packages/hts/index.html)
- [Mitchell O’Hara-Wild, Rob Hyndman, Earo Wang, Gabriel Caceres, Tim-Gunnar Hensel, and Timothy Hyndman (2021). "fable: Forecasting Models for Tidy Time Series". URL https://CRAN.R-project.org/package=fable. R package version 6.0.2.](https://CRAN.R-project.org/package=fable)