# Advanced Forecasting with Nixtla: Hierarchical Forecasts

Welcome to this tutorial on advanced forecasting techniques using Nixtla's tools. Nixtla provides state-of-the-art libraries for time series forecasting, including neural network-based models and hierarchical forecasting methods.

In this notebook, we will:
- Understand cross‑sectional hierarchical structures and coherent forecasts.
- Train base (unreconciled) forecasts with StatsForecast (AutoARIMA + Naive).
- Reconcile forecasts using BottomUp, TopDown, and MiddleOut.
- Evaluate performance by hierarchy level and compare against a benchmark.
- (Advanced) Explore MinTrace / ERM.

By the end of this tutorial, you will understand these concepts and implement them using Python.

Hierarchical forecasting deals with time series that are organized in a hierarchy (e.g., country → region → store). It ensures coherence between aggregated and disaggregated forecasts.
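
To make coherence concrete, here is a minimal sketch on a made-up hierarchy of one total and two stores. The names, numbers, and the `is_coherent` helper are illustrative assumptions, not part of Nixtla's API or the tutorial's data. It checks whether a set of forecasts satisfies the aggregation constraint.

```python
import numpy as np

# Hypothetical hierarchy: total = store_a + store_b.
# Rows of S are all series (total, store_a, store_b); columns are the bottom-level series.
S = np.array([
    [1, 1],  # total
    [1, 0],  # store_a
    [0, 1],  # store_b
])

def is_coherent(y_hat, S, n_bottom, tol=1e-9):
    """Return True if every series equals the sum of its bottom-level children.

    Assumes the bottom-level forecasts are the last n_bottom entries of y_hat.
    """
    bottom = y_hat[-n_bottom:]
    return bool(np.allclose(S @ bottom, y_hat, atol=tol))

# Independent base forecasts rarely add up: 100 != 60 + 45.
base_forecasts = np.array([100.0, 60.0, 45.0])
# Coherent forecasts satisfy the constraint: 105 == 60 + 45.
coherent_forecasts = np.array([105.0, 60.0, 45.0])

print(is_coherent(base_forecasts, S, n_bottom=2))      # False
print(is_coherent(coherent_forecasts, S, n_bottom=2))  # True
```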

In [1]:
import pandas as pd
import numpy as np

# Data
from datasetsforecast.hierarchical import HierarchicalData

# Base forecasts (unreconciled)
from statsforecast.core import StatsForecast
from statsforecast.models import AutoARIMA, Naive

# Reconciliation & evaluation
from hierarchicalforecast.core import HierarchicalReconciliation
from hierarchicalforecast.methods import BottomUp, TopDown, MiddleOut
from hierarchicalforecast.evaluation import evaluate

# Metric(s)
from utilsforecast.losses import mse



This example is adapted from https://nixtlaverse.nixtla.io/hierarchicalforecast/index.html.

Load the following data:
- Y_df — long-format series (unique_id, ds, y)
- S_df — summing/aggregation matrix encoding the hierarchy
- tags — level name → list of unique_ids for that level

In [2]:
Y_df, S_df, tags = HierarchicalData.load('./data', 'TourismSmall')
Y_df['ds'] = pd.to_datetime(Y_df['ds'])
S_df = S_df.reset_index(names="unique_id")

Let's examine each in turn. First is `Y_df`, which contains three columns: the series label (`unique_id`), the date (`ds`), and the value (`y`).

In [3]:
Y_df

Unnamed: 0,unique_id,ds,y
0,total,1998-03-31,84503
1,total,1998-06-30,65312
2,total,1998-09-30,72753
3,total,1998-12-31,70880
4,total,1999-03-31,86893
...,...,...,...
3199,nt-oth-noncity,2005-12-31,59
3200,nt-oth-noncity,2006-03-31,25
3201,nt-oth-noncity,2006-06-30,52
3202,nt-oth-noncity,2006-09-30,72


The tags describe the levels of the hierarchy: Country → Country/Purpose → Country/Purpose/State → Country/Purpose/State/CityNonCity. Each tag contains the list of labels that pertain to that level.

In [4]:
tags

{'Country': array(['total'], dtype=object),
 'Country/Purpose': array(['hol', 'vfr', 'bus', 'oth'], dtype=object),
 'Country/Purpose/State': array(['nsw-hol', 'vic-hol', 'qld-hol', 'sa-hol', 'wa-hol', 'tas-hol',
        'nt-hol', 'nsw-vfr', 'vic-vfr', 'qld-vfr', 'sa-vfr', 'wa-vfr',
        'tas-vfr', 'nt-vfr', 'nsw-bus', 'vic-bus', 'qld-bus', 'sa-bus',
        'wa-bus', 'tas-bus', 'nt-bus', 'nsw-oth', 'vic-oth', 'qld-oth',
        'sa-oth', 'wa-oth', 'tas-oth', 'nt-oth'], dtype=object),
 'Country/Purpose/State/CityNonCity': array(['nsw-hol-city', 'nsw-hol-noncity', 'vic-hol-city',
        'vic-hol-noncity', 'qld-hol-city', 'qld-hol-noncity',
        'sa-hol-city', 'sa-hol-noncity', 'wa-hol-city', 'wa-hol-noncity',
        'tas-hol-city', 'tas-hol-noncity', 'nt-hol-city', 'nt-hol-noncity',
        'nsw-vfr-city', 'nsw-vfr-noncity', 'vic-vfr-city',
        'vic-vfr-noncity', 'qld-vfr-city', 'qld-vfr-noncity',
        'sa-vfr-city', 'sa-vfr-noncity', 'wa-vfr-city', 'wa-vfr-noncity',
     

`S_df` contains the summing matrix. Each row is one of the labels shown in the tags, each column is a bottom-level series, and a value of 1.0 means that the bottom-level series contributes to the row's series.

In [5]:
S_df

Unnamed: 0,unique_id,nsw-hol-city,nsw-hol-noncity,vic-hol-city,vic-hol-noncity,qld-hol-city,qld-hol-noncity,sa-hol-city,sa-hol-noncity,wa-hol-city,...,qld-oth-city,qld-oth-noncity,sa-oth-city,sa-oth-noncity,wa-oth-city,wa-oth-noncity,tas-oth-city,tas-oth-noncity,nt-oth-city,nt-oth-noncity
0,total,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
1,hol,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,vfr,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,bus,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,oth,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
84,wa-oth-noncity,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
85,tas-oth-city,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
86,tas-oth-noncity,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
87,nt-oth-city,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
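
As a rough illustration of how such a matrix encodes the hierarchy, the sketch below builds a small summing matrix for four made-up bottom-level series. The `membership` helper and its naming rule are assumptions for this toy case only. Multiplying the matrix by the bottom-level values recovers every series in the hierarchy.

```python
import numpy as np
import pandas as pd

# Hypothetical bottom-level series, named like the tutorial's ids (state-purpose-area).
bottom_ids = ['nsw-hol-city', 'nsw-hol-noncity', 'vic-hol-city', 'vic-bus-city']

def membership(agg_id, bottom_id):
    """1.0 if bottom_id contributes to agg_id, else 0.0 (toy naming rule)."""
    state, purpose, _area = bottom_id.split('-')
    if agg_id == 'total':
        return 1.0
    if agg_id == purpose:                 # e.g. 'hol'
        return 1.0
    if agg_id == f'{state}-{purpose}':    # e.g. 'nsw-hol'
        return 1.0
    return float(agg_id == bottom_id)     # bottom series map to themselves

all_ids = ['total', 'hol', 'bus', 'nsw-hol', 'vic-hol', 'vic-bus'] + bottom_ids
S_toy = pd.DataFrame(
    [[membership(a, b) for b in bottom_ids] for a in all_ids],
    index=all_ids, columns=bottom_ids,
)

# Multiplying S by the bottom-level values yields every series in the hierarchy.
bottom_values = np.array([10.0, 20.0, 30.0, 40.0])
all_values = S_toy.to_numpy() @ bottom_values
print(dict(zip(all_ids, all_values)))
```

The last four rows of `S_toy` form an identity block, which matches the pattern visible in the final rows of `S_df` above.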


With these three pieces of information, a hierarchical forecast can be calculated.
Let's hold out the last 4 quarters of each series as a test set and forecast them:

In [6]:
Y_test_df  = Y_df.groupby('unique_id').tail(4)
Y_train_df = Y_df.drop(Y_test_df.index)
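
The split above relies on two assumptions: rows are sorted by date within each series, and the index is unique. A small sketch on made-up data (the `toy` frame is hypothetical) shows what `groupby(...).tail(4)` followed by `drop` produces.

```python
import pandas as pd

# Toy long-format frame: two hypothetical series with six quarters each.
dates = pd.to_datetime([
    '2005-03-31', '2005-06-30', '2005-09-30',
    '2005-12-31', '2006-03-31', '2006-06-30',
])
toy = pd.DataFrame({
    'unique_id': ['a'] * 6 + ['b'] * 6,
    'ds': list(dates) * 2,
    'y': range(12),
})

# Sort and reset the index so that tail() picks the latest rows
# and drop() removes exactly those rows.
toy = toy.sort_values(['unique_id', 'ds']).reset_index(drop=True)

test = toy.groupby('unique_id').tail(4)   # last 4 quarters of each series
train = toy.drop(test.index)              # everything before them

# No training date should fall inside the test period.
assert train['ds'].max() < test['ds'].min()
print(len(train), len(test))
```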

We can fit any forecast models to the data, but since we've looked at statsforecast and baseline models previously, let's start with those:

In [7]:
fcst = StatsForecast(
    models=[AutoARIMA(season_length=4), Naive()],
    freq='QE',
    n_jobs=-1
)
Y_hat_df = fcst.forecast(df=Y_train_df, h=4)
Y_hat_df

Unnamed: 0,unique_id,ds,AutoARIMA,Naive
0,bus,2006-03-31,8918.474609,11547.0
1,bus,2006-06-30,9581.921875,11547.0
2,bus,2006-09-30,11194.672852,11547.0
3,bus,2006-12-31,10678.953125,11547.0
4,hol,2006-03-31,42805.343750,26418.0
...,...,...,...,...
351,wa-vfr-city,2006-12-31,1271.184448,1236.0
352,wa-vfr-noncity,2006-03-31,859.472229,745.0
353,wa-vfr-noncity,2006-06-30,859.472229,745.0
354,wa-vfr-noncity,2006-09-30,859.472229,745.0


Now we can reconcile the forecasts, which means adjusting them so that the forecasts at every level add up consistently. There are a few methods to do this:
- Bottom Up: Takes the lowest-level forecasts and sums them to produce every higher level. Use this reconciler if the bottom-level data is rich and accurate.
- Top Down: Does the opposite: takes the top-level forecast and splits it down the hierarchy using proportions (for example, historical proportions). Use this when the bottom-level data is sparse.
- Middle Out: Picks a level to start from, then moves up the hierarchy with a bottom-up approach and down the hierarchy with a top-down approach. Use this when the best data is at one of the middle layers.

Each of these will make the forecast coherent (i.e., all predictions across all levels of a hierarchy satisfy the aggregation constraints, so that the sum of lower-level forecasts exactly equals the forecast at each higher level).
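
The bottom-up and top-down ideas can be sketched in a few lines of NumPy. This is an illustration on made-up numbers, not the library's implementation: the `bottom_up` and `top_down` helpers are hypothetical, and the top-down variant here uses the average of historical proportions, which is only one of several possible ways to split the total.

```python
import numpy as np

# Hypothetical two-level hierarchy: total = a + b.
S = np.array([[1.0, 1.0],   # total
              [1.0, 0.0],   # a
              [0.0, 1.0]])  # b

# Made-up history for the bottom series (rows: a, b; columns: time).
bottom_history = np.array([[30.0, 40.0, 50.0],
                           [70.0, 60.0, 50.0]])

# Made-up, incoherent one-step base forecasts for total, a, b (120 != 50 + 60).
y_hat = np.array([120.0, 50.0, 60.0])

def bottom_up(y_hat, S):
    """Keep the bottom-level forecasts and sum them up the hierarchy."""
    n_bottom = S.shape[1]
    return S @ y_hat[-n_bottom:]

def top_down(y_hat, S, bottom_history):
    """Split the top-level forecast using average historical proportions."""
    total_history = bottom_history.sum(axis=0)
    proportions = (bottom_history / total_history).mean(axis=1)
    return S @ (proportions * y_hat[0])

print(bottom_up(y_hat, S))                 # total becomes 110 = 50 + 60
print(top_down(y_hat, S, bottom_history))  # approximately [120, 48, 72]
```

Both results are coherent, but they disagree: bottom-up trusts the bottom-level forecasts and changes the total, while top-down trusts the total and changes the bottom level.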

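Now we can reconcile the base forecasts with the three methods above and then evaluate the result against the held-out quarters:

In [8]:
# Reconcile the base forecasts; this defines Y_rec_df, which the evaluation needs.
# 'forecast_proportions' is one top-down option; it needs no in-sample fitted values.
reconcilers = [
    BottomUp(),
    TopDown(method='forecast_proportions'),
    MiddleOut(middle_level='Country/Purpose/State', top_down_method='forecast_proportions'),
]
hrec = HierarchicalReconciliation(reconcilers=reconcilers)
Y_rec_df = hrec.reconcile(Y_hat_df=Y_hat_df, Y_df=Y_train_df, S_df=S_df, tags=tags)

In [9]:
df = Y_rec_df.merge(Y_test_df, on=['unique_id','ds'], how='left')
results = evaluate(df=df, metrics=[mse], tags=tags, benchmark='Naive')
results.sort_values(['metric', 'level'])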
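
To help read the evaluation output, here is a small sketch of what comparing against a benchmark means. It assumes that a benchmark comparison reports a method's error relative to the benchmark model's error. The numbers and the `mean_squared_error` helper are made up for illustration and do not come from the library.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Plain MSE over one series or one hierarchy level."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Made-up actuals and forecasts for one hierarchy level over four quarters.
actual = [100.0, 110.0, 120.0, 130.0]
naive = [90.0, 90.0, 90.0, 90.0]          # benchmark: last observed value repeated
reconciled = [98.0, 108.0, 123.0, 127.0]  # hypothetical reconciled forecasts

mse_naive = mean_squared_error(actual, naive)
mse_model = mean_squared_error(actual, reconciled)

# A ratio below 1 means the model beats the benchmark at this level.
scaled = mse_model / mse_naive
print(mse_naive, mse_model, round(scaled, 4))
```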

### Questions
What does coherence mean in hierarchical forecasting, and why can unreconciled base forecasts violate it? 

Method trade‑offs: When might TopDown outperform BottomUp?

Change h=8. Which reconcilers degrade least as horizon increases? Why?

Replace AutoARIMA with AutoETS (or add it to the list of models) and re‑evaluate. Does reconciliation change which method ranks best at each level?

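Next, let's examine reconcilers that learn from the data how to combine the forecasts.

- MinTrace: minimises the trace of the covariance matrix of the reconciled forecast errors (that is, the total error variance across all series) by weighting the base forecasts according to an estimate of their error covariance.
- ERM (Empirical Risk Minimization): learns the reconciliation weights directly by minimising the in-sample error of the reconciled forecasts. The `reg` variants add an L1 regularisation term to simplify the reconciliation weights. `lambda_reg` sets the strength of that penalty, so it needs careful consideration.

Unlike the approaches mentioned above, these methods use the base forecasts from every level of the hierarchy and combine them to optimise the result.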
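
A minimal NumPy sketch of the MinTrace idea follows, assuming the standard projection form `S (S' W^-1 S)^-1 S' W^-1 y_hat` with a symmetric positive definite `W`. With `W` set to the identity this reduces to an ordinary least squares projection. The `min_trace` helper and the numbers are illustrative only. Shrinkage variants estimate `W` from in-sample residuals, which this sketch does not attempt.

```python
import numpy as np

# Hypothetical two-level hierarchy: total = a + b.
S = np.array([[1.0, 1.0],   # total
              [1.0, 0.0],   # a
              [0.0, 1.0]])  # b

# Made-up, incoherent base forecasts for total, a, b (120 != 50 + 60).
y_hat = np.array([120.0, 50.0, 60.0])

def min_trace(y_hat, S, W=None):
    """Project base forecasts onto the coherent subspace.

    W is the (assumed symmetric positive definite) error covariance;
    the identity gives the OLS special case.
    """
    if W is None:
        W = np.eye(S.shape[0])
    W_inv_S = np.linalg.solve(W, S)                 # W^-1 S
    P = np.linalg.solve(S.T @ W_inv_S, W_inv_S.T)   # (S' W^-1 S)^-1 S' W^-1
    return S @ P @ y_hat

reconciled = min_trace(y_hat, S)
print(reconciled)  # the 10-unit gap is spread across all three series
```

Unlike bottom-up or top-down, every base forecast moves a little: the total comes down and both bottom-level forecasts go up until they agree.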

In [None]:
from hierarchicalforecast.methods import MinTrace, ERM

# MinTrace (mint_shrink) and ERM need in-sample fitted values,
# so refit with fitted=True and collect them
Y_hat_df    = fcst.forecast(df=Y_train_df, h=4, fitted=True)
Y_fitted_df = fcst.forecast_fitted_values()

mint        = MinTrace(method='mint_shrink')
erm_closed  = ERM(method='closed')
erm_reg     = ERM(method='reg', lambda_reg=0.1)
erm_reg_bu  = ERM(method='reg_bu', lambda_reg=0.1)

hrec = HierarchicalReconciliation(reconcilers=[mint, erm_closed, erm_reg, erm_reg_bu])

Y_rec_df = hrec.reconcile(
    Y_hat_df=Y_hat_df,
    Y_df=Y_fitted_df,
    S_df=S_df,
    tags=tags
)

df_eval = Y_rec_df.merge(Y_test_df, on=['unique_id','ds'], how='left')
results = evaluate(df=df_eval, metrics=[mse], tags=tags, benchmark='Naive')
results.sort_values(['metric','level'])

### Questions
Experiment with the `lambda_reg` values for `erm_reg` and `erm_reg_bu`, and see if you can lower the MSE.