# Geographical Aggregation (Tourism)

> Geographical Hierarchical Forecasting on Australian Tourism Data

In many applications, a set of time series is hierarchically organized. Examples include the presence of geographic levels, products, or categories that define different types of aggregations. In such scenarios, forecasters are often required to provide predictions for all disaggregate and aggregate series. A natural desire is for those predictions to be **"coherent"**, that is, for the bottom series to add up precisely to the forecasts of the aggregated series.
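
To make the notion of coherence concrete, here is a minimal sketch on a toy hierarchy. The series names, the numbers, and the helper `is_coherent` are hypothetical and are not part of `HierarchicalForecast`; the summing matrix `S` plays the same role as the `S_df` built later in this notebook.

```python
import numpy as np

# Toy hierarchy (hypothetical numbers): Total = A + B.
# Each row of S expresses one series as a sum of the bottom series [A, B].
S = np.array([
    [1, 1],  # Total
    [1, 0],  # A
    [0, 1],  # B
])

# Independent base forecasts for [Total, A, B]; nothing forces them to add up.
y_hat = np.array([105.0, 40.0, 60.0])


def is_coherent(y: np.ndarray, S: np.ndarray) -> bool:
    """Return True when every series equals the aggregation of the bottom series."""
    n_bottom = S.shape[1]
    bottom = y[-n_bottom:]  # bottom series are stored last
    return bool(np.allclose(S @ bottom, y))


print(is_coherent(y_hat, S))                         # base forecasts: 40 + 60 != 105
print(is_coherent(S @ np.array([40.0, 60.0]), S))    # aggregated bottom forecasts
```

The first check fails because the forecast for the total was produced independently of the bottom forecasts; reconciliation adjusts the forecasts so that the check passes.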

In this notebook we present an example of how to use `HierarchicalForecast` to produce coherent forecasts across geographical levels. We will use the classic Australian Domestic Tourism (`Tourism`) dataset, which contains quarterly time series of the number of trips, broken down by state, region, and purpose of travel.

We first load the Tourism data and produce base forecasts with a diverse set of models: `AutoETS` from `StatsForecast`; the machine learning models `lightgbm` and `HistGradientBoostingRegressor`, fitted through `MLForecast`; and the neural network models `MLP` and `NBEATS` from `NeuralForecast`. We then reconcile these base forecasts with several reconciliation algorithms from `HierarchicalForecast`.

Finally, we show that the performance is comparable with the results reported in [Forecasting: Principles and Practice](https://otexts.com/fpp3/tourism.html), which uses the R package [fable](https://github.com/tidyverts/fable).

You can run these experiments using CPU or GPU with Google Colab.

<a href="https://colab.research.google.com/github/Nixtla/hierarchicalforecast/blob/main/nbs/examples/AustralianDomesticTourism.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
%%capture
!pip install hierarchicalforecast statsforecast mlforecast datasetsforecast lightgbm scikit-learn neuralforecast

## 1. Load and Process Data

In this example we will use the [Tourism](https://otexts.com/fpp3/tourism.html) dataset from the [Forecasting: Principles and Practice](https://otexts.com/fpp3/) book.

The dataset only contains the time series at the lowest level, so we need to create the time series for every aggregation level.

In [None]:
import numpy as np
import pandas as pd

In [None]:
Y_df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/tourism.csv')
Y_df = Y_df.rename({'Trips': 'y', 'Quarter': 'ds'}, axis=1)
Y_df.insert(0, 'Country', 'Australia')
Y_df = Y_df[['Country', 'Region', 'State', 'Purpose', 'ds', 'y']]
Y_df['ds'] = Y_df['ds'].str.replace(r'(\d+) (Q\d)', r'\1-\2', regex=True)
Y_df['ds'] = pd.PeriodIndex(Y_df["ds"], freq='Q').to_timestamp()
Y_df.head()

The dataset can be grouped in the following non-strictly hierarchical structure.

In [None]:
spec = [
    ['Country'],
    ['Country', 'State'], 
    ['Country', 'Purpose'], 
    ['Country', 'State', 'Region'], 
    ['Country', 'State', 'Purpose'], 
    ['Country', 'State', 'Region', 'Purpose']
]

Using the `aggregate` function from `HierarchicalForecast` we can get the full set of time series.

In [None]:
from hierarchicalforecast.utils import aggregate

In [None]:
%%capture
Y_df, S_df, tags = aggregate(Y_df, spec)

In [None]:
Y_df.head()

In [None]:
S_df.iloc[:5, :5]

In [None]:
tags['Country/Purpose']
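
As a rough illustration of what the aggregation step produces, the sketch below sums a tiny, made-up bottom-level frame over each grouping in a spec and labels every resulting series with a `/`-joined identifier. The function `toy_aggregate` and the data are hypothetical; this is not the implementation of `aggregate`, which additionally returns the summing matrix and the tags.

```python
import pandas as pd

# Hypothetical bottom-level data: two states, two purposes, one quarter.
bottom = pd.DataFrame({
    'Country': ['Australia'] * 4,
    'State':   ['NSW', 'NSW', 'VIC', 'VIC'],
    'Purpose': ['Business', 'Holiday', 'Business', 'Holiday'],
    'ds':      pd.to_datetime(['1998-01-01'] * 4),
    'y':       [10.0, 20.0, 5.0, 15.0],
})

toy_spec = [
    ['Country'],
    ['Country', 'State'],
    ['Country', 'Purpose'],
    ['Country', 'State', 'Purpose'],
]


def toy_aggregate(df: pd.DataFrame, spec: list) -> pd.DataFrame:
    """Sum `y` over each grouping in `spec` and label series as 'a/b/c'."""
    frames = []
    for levels in spec:
        agg = df.groupby(levels + ['ds'], as_index=False)['y'].sum()
        # Build the series identifier from the grouping columns.
        agg['unique_id'] = agg[levels].apply(lambda row: '/'.join(row), axis=1)
        frames.append(agg[['unique_id', 'ds', 'y']])
    return pd.concat(frames, ignore_index=True)


toy_Y = toy_aggregate(bottom, toy_spec)
print(toy_Y)
```

Note that `Australia/NSW` and `Australia/Holiday` both aggregate bottom series, yet neither is nested inside the other. This is what makes the structure grouped rather than strictly hierarchical.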

### Split Train/Test sets

We use the final two years (8 quarters) as the test set.

In [None]:
Y_test_df = Y_df.groupby('unique_id', as_index=False).tail(8)
Y_train_df = Y_df.drop(Y_test_df.index)

In [None]:
Y_train_df.groupby('unique_id').size()
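
The split relies on two assumptions: rows are sorted by date within each series, and the frame has a unique index, since `drop` removes rows by index label. The sketch below reproduces the same pattern on a small, made-up frame (the names `toy`, `train`, and `test` are hypothetical) and checks that the training data ends before the test data begins.

```python
import pandas as pd

# Hypothetical long-format frame: 2 series x 12 quarters, sorted by date.
dates = pd.date_range('2015-01-01', periods=12, freq='QS')
toy = pd.DataFrame({
    'unique_id': ['a'] * 12 + ['b'] * 12,
    'ds': list(dates) * 2,
    'y': list(range(24)),
})

# Last 8 observations of every series form the test set.
test = toy.groupby('unique_id', as_index=False).tail(8)
# Everything else is training data; this requires a unique index.
train = toy.drop(test.index)

# Sanity check: for every series, training ends before testing starts.
last_train = train.groupby('unique_id')['ds'].max()
first_test = test.groupby('unique_id')['ds'].min()
print((last_train < first_test).all())
```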

## 2. Computing base forecasts

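The following cells compute the **base forecasts** for each time series in `Y_train_df` using the `AutoETS` model. Observe that `Y_hat_df_stats` contains the forecasts, but they are not coherent. We also keep the in-sample fitted values, which `MinTrace(method='mint_shrink')` needs later to estimate the covariance of the forecast errors.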

In [None]:
%%capture
from statsforecast.models import AutoETS
from statsforecast.core import StatsForecast

In [None]:
%%capture
fcst = StatsForecast(models=[AutoETS(season_length=4, model='ZZA')], 
                     freq='QS', n_jobs=-1)
Y_hat_df_stats = fcst.forecast(df=Y_train_df, h=8, fitted=True)
Y_fitted_df_stats = fcst.forecast_fitted_values()

### 2.1 Computing MLForecast models

In [None]:
%%capture
import lightgbm as lgb
from sklearn.ensemble import HistGradientBoostingRegressor
from mlforecast.lag_transforms import ExpandingMean, RollingMean, ExpandingStd
from mlforecast.target_transforms import Differences
from mlforecast import MLForecast

In [None]:
%%capture
mlf = MLForecast(
    models = {
        'lgbm': lgb.LGBMRegressor(verbosity=-1),
        'gbm':HistGradientBoostingRegressor()
    }, 
    freq='QS',
    target_transforms=[Differences([1, 4])],
    lags=[1, 2, 3, 4, 5, 6, 7, 8, 12],
    lag_transforms={  
        1: [ExpandingMean(), RollingMean(window_size=4)],
        4: [ExpandingMean(), RollingMean(window_size=2), RollingMean(window_size=4)],
        8: [RollingMean(window_size=4)]
    },
    date_features=['quarter', 'year']
)
mlf.fit(Y_train_df, fitted=True)
Y_hat_df_ml = mlf.predict(new_df=Y_train_df, h=8)
Y_fitted_df_ml = mlf.forecast_fitted_values()

### 2.2 Computing NeuralForecast models

In [None]:
from neuralforecast import NeuralForecast
from neuralforecast.models import NBEATS, MLP
from neuralforecast.losses.pytorch import MAE

In [None]:
nf = NeuralForecast(
    models=[
        NBEATS(
            h=4,
            input_size=16,
            mlp_units=[[256, 256], [256, 256], [256, 256]],
            learning_rate=1e-3,
            loss=MAE(),
            random_seed=42
        ),
        MLP(
            h=4,
            input_size=16,
            num_layers=2,
            hidden_size=64,
            max_steps=500,
            learning_rate=1e-3,
            loss=MAE(),
            random_seed=42
        )
    ],
    freq='QS'
)
nf.fit(df=Y_train_df, val_size=4)
Y_hat_df_nf = nf.predict()
Y_fitted_df_nf = nf.predict_insample(step_size=4)

In [None]:
%%capture
Y_hat_df = Y_hat_df_stats.merge(Y_hat_df_ml, on=['unique_id', 'ds']).merge(Y_hat_df_nf, on=['unique_id', 'ds'])
Y_fitted_df = Y_fitted_df_stats.merge(Y_fitted_df_ml.drop(columns=['y']), on=['unique_id', 'ds']).merge(Y_fitted_df_nf.drop(columns=['cutoff', 'y']), on=['unique_id', 'ds'], how="left")

## 3. Reconcile forecasts

The following cell makes the previous forecasts coherent using the `HierarchicalReconciliation` class. Since the hierarchy structure is not strict, we can't use methods such as `TopDown` or `MiddleOut`. In this example we use `BottomUp` and `MinTrace`.

In [None]:
from hierarchicalforecast.methods import BottomUp, MinTrace
from hierarchicalforecast.core import HierarchicalReconciliation

In [None]:
reconcilers = [
    BottomUp(),
    MinTrace(method='mint_shrink'),
    MinTrace(method='ols')
]
hrec = HierarchicalReconciliation(reconcilers=reconcilers)
Y_rec_df = hrec.reconcile(Y_hat_df=Y_hat_df, Y_df=Y_fitted_df, S=S_df, tags=tags)

The dataframe `Y_rec_df` contains the reconciled forecasts.

In [None]:
Y_rec_df.head()
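
For intuition, the sketch below applies the textbook formulas behind two of these reconcilers to a toy hierarchy with made-up numbers: bottom-up reconciliation, which aggregates the bottom forecasts with the summing matrix, and the OLS variant of MinTrace, which projects the base forecasts onto the coherent subspace. It is an illustration under those assumptions, not the library's internal implementation. The `mint_shrink` variant replaces the plain projection with one weighted by a shrunk covariance of the in-sample residuals.

```python
import numpy as np

# Toy hierarchy (hypothetical): Total = A + B, series ordered [Total, A, B].
S = np.array([
    [1.0, 1.0],  # Total
    [1.0, 0.0],  # A
    [0.0, 1.0],  # B
])

# Incoherent base forecasts: 40 + 60 != 105.
y_hat = np.array([105.0, 40.0, 60.0])

# Bottom-up: keep the bottom forecasts and aggregate them.
y_bu = S @ y_hat[-2:]

# MinTrace (OLS): y_tilde = S (S'S)^{-1} S' y_hat.
# Solve the normal equations instead of forming the inverse explicitly.
b_ols = np.linalg.solve(S.T @ S, S.T @ y_hat)
y_ols = S @ b_ols

print(y_bu)
print(y_ols)
```

Bottom-up discards the forecast for the total entirely, whereas OLS spreads the 5-unit discrepancy across all three series, so both results are coherent but they differ.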

## 4. Evaluation 

The `HierarchicalForecast` package includes an `evaluate` function that evaluates each level of the hierarchy and can also compute scaled metrics relative to a benchmark model.

In [None]:
from hierarchicalforecast.evaluation import evaluate
from utilsforecast.losses import rmse, mase
from functools import partial
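
As a reminder of what the two metrics measure, here is a minimal NumPy sketch with made-up numbers. The helpers `toy_rmse` and `toy_mase` are hypothetical stand-ins written for illustration; they follow the usual definitions, where MASE divides the forecast MAE by the in-sample MAE of a seasonal naive forecast, and are not the `utilsforecast` implementations.

```python
import numpy as np


def toy_rmse(y: np.ndarray, y_hat: np.ndarray) -> float:
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def toy_mase(y: np.ndarray, y_hat: np.ndarray,
             y_train: np.ndarray, seasonality: int) -> float:
    """MAE scaled by the in-sample MAE of the seasonal naive forecast."""
    scale = np.mean(np.abs(y_train[seasonality:] - y_train[:-seasonality]))
    return float(np.mean(np.abs(y - y_hat)) / scale)


# Hypothetical quarterly series: two years of history, one year of test data.
y_train = np.array([10.0, 20.0, 30.0, 40.0, 12.0, 22.0, 32.0, 42.0])
y_test = np.array([14.0, 24.0, 34.0, 44.0])
y_pred = np.array([13.0, 25.0, 31.0, 44.0])

print(toy_rmse(y_test, y_pred))
print(toy_mase(y_test, y_pred, y_train, seasonality=4))
```

A MASE below 1 means the forecast errors are, on average, smaller than the in-sample errors of the seasonal naive benchmark, which makes the metric comparable across levels with very different scales.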

We will clean up the column names; the following helper function does so in a structured way:

In [None]:
def rename_evaluation_columns(
        evaluation_df: pd.DataFrame,
        base_model_name: str = 'AutoETS',
        other_model_names: tuple = ('AutoETS', 'lgbm', 'gbm', 'NBEATS', 'MLP'),
    ) -> pd.DataFrame:
    """Map raw `model/reconciler` column names to short display names."""

    cleaned_column_mapping = {}

    def _clean_recon_method(raw_name: str) -> str:
        if raw_name.startswith('MinTrace_method-'):
            method_part = raw_name.replace('MinTrace_method-', '')
            return f"MinTrace({method_part})"
        else:
            return raw_name.replace('-', ' ')

    for col in evaluation_df.columns:
        if col in ['level', 'metric']:
            cleaned_column_mapping[col] = col
        elif col == base_model_name:
            cleaned_column_mapping[col] = 'Base'
        elif col.startswith(f'{base_model_name}/'): 
            recon_method_raw = col.split('/', 1)[1]
            cleaned_column_mapping[col] = _clean_recon_method(recon_method_raw)
        else: 
            is_other_base_model = False
            for model in other_model_names:
                if col == model:
                    cleaned_column_mapping[col] = model
                    is_other_base_model = True
                    break
            if is_other_base_model:
                continue
            
            parts = col.split('/', 1)
            if len(parts) == 2:
                model_name = parts[0]
                recon_method_raw = parts[1]
                
                recon_method_clean = _clean_recon_method(recon_method_raw)
                
                new_name = f"{model_name} {recon_method_clean}"
                cleaned_column_mapping[col] = new_name
            else:
                cleaned_column_mapping[col] = col
    
    return evaluation_df.rename(columns=cleaned_column_mapping)

In [None]:
eval_tags = {}
eval_tags['Total'] = tags['Country']
eval_tags['Purpose'] = tags['Country/Purpose']
eval_tags['State'] = tags['Country/State']
eval_tags['Regions'] = tags['Country/State/Region']
eval_tags['Bottom'] = tags['Country/State/Region/Purpose']

df = Y_rec_df.merge(Y_test_df, on=['unique_id', 'ds'])
evaluation = evaluate(df = df,
                      tags = eval_tags,
                      train_df = Y_train_df,
                      metrics = [rmse,
                                 partial(mase, seasonality=4)])

evaluation = rename_evaluation_columns(evaluation)
numeric_cols = evaluation.select_dtypes(include="number").columns
evaluation[numeric_cols] = evaluation[numeric_cols].round(2)

### RMSE

The following table shows the performance measured using RMSE across levels for each reconciliation method.

In [None]:
evaluation.query('metric == "rmse"')

### MASE

The following table shows the performance measured using MASE across levels for each reconciliation method, focusing only on the `AutoETS` model.

In [None]:
evaluation.query('metric == "mase"')[['level', 'metric', 'Base', 'BottomUp', 'MinTrace(mint_shrink)', 'MinTrace(ols)']]

### Comparison with fable

Observe that we can recover the results reported in [Forecasting: Principles and Practice](https://otexts.com/fpp3/tourism.html). The original results were calculated using the R package [fable](https://github.com/tidyverts/fable).

![Fable's reconciliation results](./imgs/AustralianDomesticTourism-results-fable.png)

### References
- [Hyndman, R.J., & Athanasopoulos, G. (2021). "Forecasting: principles and practice, 3rd edition: Chapter 11: Forecasting hierarchical and grouped series". OTexts: Melbourne, Australia. OTexts.com/fpp3. Accessed July 2022.](https://otexts.com/fpp3/hierarchical.html)
- [Rob Hyndman, Alan Lee, Earo Wang, and Shanika Wickramasuriya (2021). "hts: Hierarchical and Grouped Time Series". URL https://CRAN.R-project.org/package=hts. R package version 6.0.2.](https://cran.r-project.org/web/packages/hts/index.html)
- [Mitchell O’Hara-Wild, Rob Hyndman, Earo Wang, Gabriel Caceres, Tim-Gunnar Hensel, and Timothy Hyndman (2021). "fable: Forecasting Models for Tidy Time Series". URL https://CRAN.R-project.org/package=fable. R package version 0.3.1.](https://CRAN.R-project.org/package=fable)