# 5. Reconciled Probabilistic Forecasts (Bootstrap)

<a href="https://colab.research.google.com/github/Nixtla/hierarchicalforecast/blob/main/nbs/examples/AustralianDomesticTourism-Bootstraped-Intervals.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In many cases, only the time series at the lowest level of the hierarchy (the bottom time series) are available. `HierarchicalForecast` provides tools to build the time series for every level of the hierarchy and to compute prediction intervals for all of them. This notebook shows how to do both.

In [None]:
%%capture
!pip install hierarchicalforecast
!pip install -U statsforecast numba statsmodels matplotlib

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# hierarchical reconciliation methods and evaluation
from hierarchicalforecast.core import HierarchicalReconciliation
from hierarchicalforecast.evaluation import HierarchicalEvaluation
from hierarchicalforecast.methods import BottomUp, MinTrace
from hierarchicalforecast.utils import aggregate, HierarchicalPlot
# base forecasts (not yet coherent)
from statsforecast.core import StatsForecast
from statsforecast.models import ETS

## Aggregate bottom time series

In this example we will use the [Tourism](https://otexts.com/fpp3/tourism.html) dataset from the [Forecasting: Principles and Practice](https://otexts.com/fpp3/) book. The dataset only contains the time series at the lowest level, so we need to create the time series for all hierarchies.

In [None]:
Y_df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/tourism.csv')
Y_df = Y_df.rename({'Trips': 'y', 'Quarter': 'ds'}, axis=1)
Y_df.insert(0, 'Country', 'Australia')
Y_df = Y_df[['Country', 'Region', 'State', 'Purpose', 'ds', 'y']]
Y_df['ds'] = Y_df['ds'].str.replace(r'(\d+) (Q\d)', r'\1-\2', regex=True)
Y_df['ds'] = pd.to_datetime(Y_df['ds'])
Y_df.head()

The dataset can be grouped in the following non-strictly hierarchical structure.

In [None]:
spec = [
    ['Country'],
    ['Country', 'State'], 
    ['Country', 'Purpose'], 
    ['Country', 'State', 'Region'], 
    ['Country', 'State', 'Purpose'], 
    ['Country', 'State', 'Region', 'Purpose']
]

Using the `aggregate` function from `HierarchicalForecast` we can get the full set of time series.

In [None]:
Y_df, S, tags = aggregate(Y_df, spec)
Y_df = Y_df.reset_index()

In [None]:
Y_df.head()

In [None]:
S.iloc[:5, :5]

In [None]:
tags['Country/Purpose']
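
To make the role of `S` concrete, here is a minimal sketch on a made-up two-series hierarchy. The names `S_toy`, `y_bottom` and `y_all` and all values are hypothetical and unrelated to the Tourism data; the only point is that every series in the hierarchy is obtained by multiplying the summing matrix by the bottom series.

```python
import numpy as np

# Toy hierarchy (hypothetical): Total = A + B, with bottom series A and B.
# Rows of S_toy: Total, A, B. Columns: the bottom series A, B.
S_toy = np.array([
    [1, 1],  # Total
    [1, 0],  # A
    [0, 1],  # B
])

# Made-up bottom-level observations for three time steps, shape (n_bottom, T).
y_bottom = np.array([
    [10.0, 12.0, 11.0],  # A
    [5.0, 4.0, 6.0],     # B
])

# Each row of the result is one series of the hierarchy.
y_all = S_toy @ y_bottom
print(y_all)
```

The first row of `y_all` is the element-wise sum of the two bottom rows, which is the kind of aggregation `aggregate` performs for each level listed in `spec`.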

We can visualize the `S` matrix and the data using the `HierarchicalPlot` class as follows.

In [None]:
hplot = HierarchicalPlot(S=S, tags=tags)

In [None]:
hplot.plot_summing_matrix()

In [None]:
hplot.plot_hierarchically_linked_series(
    bottom_series='Australia/ACT/Canberra/Holiday',
    Y_df=Y_df.set_index('unique_id')
)

### Split Train/Test sets

We use the final two years (8 quarters) of each series as the test set.

In [None]:
Y_df_test = Y_df.groupby('unique_id').tail(8)
Y_df_train = Y_df.drop(Y_df_test.index)

In [None]:
Y_df_test = Y_df_test.set_index('unique_id')
Y_df_train = Y_df_train.set_index('unique_id')

In [None]:
Y_df_train.groupby('unique_id').size()
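
The split pattern above can be checked on a small made-up frame. This is only a sketch: the frame `toy` and its values are hypothetical. It also shows why the row labels need to be unique before calling `drop`, since the test rows are removed by index label.

```python
import pandas as pd

# Toy long-format frame (made-up values): two series with 12 quarters each.
ds = pd.date_range('2015-01-01', periods=12, freq='QS')
toy = pd.concat(
    [pd.DataFrame({'unique_id': uid, 'ds': ds, 'y': range(12)}) for uid in ['A', 'B']],
    ignore_index=True,  # unique row labels, needed for the drop below
)

toy_test = toy.groupby('unique_id').tail(8)  # last 8 rows of each series
toy_train = toy.drop(toy_test.index)         # all remaining rows

print(toy_train.groupby('unique_id').size())
```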

## Computing base forecasts

The following cell computes the **base forecasts** for each time series in `Y_df_train` using the `ETS` model. Observe that `Y_hat_df` contains the forecasts, but they are not coherent. Since we compute prediction intervals by bootstrapping, the only additional output we need from the models is their fitted values, hence `fitted=True`.

In [None]:
fcst = StatsForecast(df=Y_df_train, 
                     models=[ETS(season_length=4, model='ZAA')], 
                     freq='QS', n_jobs=-1)
Y_hat_df = fcst.forecast(h=8, fitted=True)
Y_fitted_df = fcst.forecast_fitted_values()

## Reconcile forecasts

The following cell makes the previous forecasts coherent using the `HierarchicalReconciliation` class. Since the hierarchy structure is not strict, we can't use methods such as `TopDown` or `MiddleOut`. In this example we use `BottomUp` and `MinTrace`. To compute bootstrapped prediction intervals, pass the `level` argument and set `bootstrap=True`, as follows.

In [None]:
reconcilers = [
    BottomUp(),
    MinTrace(method='mint_shrink'),
    MinTrace(method='ols')
]
hrec = HierarchicalReconciliation(reconcilers=reconcilers)
Y_rec_df = hrec.reconcile(Y_hat_df, Y_fitted_df, S, tags, level=[80, 90], bootstrap=True)
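
As a rough illustration of what reconciliation does to point forecasts, the sketch below applies the bottom-up and OLS projections to a made-up three-series hierarchy. It is not the library's code: `S_toy`, `y_hat`, `P_bu` and `P_ols` are hypothetical names, and `mint_shrink` is omitted because it also needs an estimate of the residual covariance.

```python
import numpy as np

# Toy summing matrix (hypothetical): rows are Total, A, B.
S_toy = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])

# Incoherent base forecasts: Total (16) differs from A (10) + B (5).
y_hat = np.array([16.0, 10.0, 5.0])

# Bottom-up: keep the bottom forecasts and re-aggregate them.
P_bu = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
y_bu = S_toy @ P_bu @ y_hat

# OLS: project the base forecasts onto the coherent subspace,
# using P = (S'S)^{-1} S'.
P_ols = np.linalg.solve(S_toy.T @ S_toy, S_toy.T)
y_ols = S_toy @ P_ols @ y_hat

print(y_bu)
print(y_ols)
```

In both cases the first entry of the result equals the sum of the other two. Bottom-up ignores the forecast for the total, whereas OLS adjusts all three forecasts.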

The dataframe `Y_rec_df` contains the reconciled forecasts.

In [None]:
Y_rec_df.head()
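
The intervals requested with `level` and `bootstrap=True` rely on resampling in-sample errors, which is why the fitted values are passed to `reconcile`. The following is a simplified sketch of that idea on a toy hierarchy, not the library's implementation. The residuals and forecasts are made up, the names are hypothetical, and a bottom-up projection is assumed for the reconciliation step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy hierarchy (hypothetical): rows are Total, A, B.
S_toy = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
P_bu = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])  # bottom-up projection
SP = S_toy @ P_bu

h, T, n_samples = 4, 40, 500

# Made-up in-sample residuals (y - fitted), shape (n_series, T).
residuals = rng.normal(scale=[[2.0], [1.5], [1.0]], size=(3, T))

# Made-up incoherent base point forecasts, shape (n_series, h).
y_hat = np.array([[16.0] * h, [10.0] * h, [5.0] * h])

# Draw contiguous blocks of h residuals, using the same time indices
# for every series so that cross-series dependence is preserved.
starts = rng.integers(0, T - h + 1, size=n_samples)
idx = starts[:, None] + np.arange(h)   # (n_samples, h)
blocks = residuals[:, idx]             # (n_series, n_samples, h)

# Simulated base paths, then reconcile each sampled path.
base_paths = y_hat[:, None, :] + blocks
rec_paths = np.einsum('ij,jsh->ish', SP, base_paths)

# Empirical quantiles across samples give the interval for a level.
level = 80
lo_q, hi_q = (100 - level) / 2, 100 - (100 - level) / 2
lo, hi = np.percentile(rec_paths, [lo_q, hi_q], axis=1)  # each (n_series, h)
print(lo.shape, hi.shape)
```

Because every sampled path is reconciled before the quantiles are taken, each sample is coherent.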

## Plot forecasts

We can then plot the probabilistic forecasts. First, we join the observed values and the reconciled forecasts into a single dataframe.

In [None]:
df_plot = pd.concat([Y_df.set_index(['unique_id', 'ds']), 
                     Y_rec_df.set_index('ds', append=True)], axis=1)
df_plot = df_plot.reset_index('ds')

### Plot single time series

In [None]:
hplot.plot_series(
    series='Australia',
    Y_df=df_plot, 
    models=['y', 'ETS', 'ETS/MinTrace_method-ols', 'ETS/MinTrace_method-mint_shrink'],
    level=[80]
)

In [None]:
# Since we are plotting a bottom time series
# the probabilistic and mean forecasts
# differ due to bootstrapping
hplot.plot_series(
    series='Australia/Western Australia/Experience Perth/Visiting',
    Y_df=df_plot, 
    models=['y', 'ETS', 'ETS/BottomUp'],
    level=[80]
)

### Plot hierarchically linked time series

In [None]:
hplot.plot_hierarchically_linked_series(
    bottom_series='Australia/Western Australia/Experience Perth/Visiting',
    Y_df=df_plot, 
    models=['y', 'ETS', 'ETS/MinTrace_method-ols', 'ETS/BottomUp'],
    level=[80]
)

In [None]:
# ACT only has Canberra
hplot.plot_hierarchically_linked_series(
    bottom_series='Australia/ACT/Canberra/Other',
    Y_df=df_plot, 
    models=['y', 'ETS/MinTrace_method-mint_shrink'],
    level=[80, 90]
)
