# DeepAR in Traffic data

<a href="https://colab.research.google.com/github/Nixtla/hierarchicalforecast/blob/main/nbs/examples/AustralianDomesticTourism.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook loads the hierarchical `Traffic` dataset, computes base forecasts for every series with the `Theta` model from `StatsForecast`, reconciles them with `HierarchicalForecast`, and evaluates the results at each level of the hierarchy.

In [22]:

# base forecasting (the resulting forecasts are not coherent yet)
from statsforecast.core import StatsForecast
import pandas as pd
import numpy as np

# hierarchical reconciliation and datasets
from hierarchicalforecast.core import HierarchicalReconciliation
from datasetsforecast.hierarchical import HierarchicalData


## Load the Traffic data

In this example we use the `Traffic` dataset, loaded with `HierarchicalData.load`. The loaded `Y_df` already contains the series for every level of the hierarchy, from `Total` down to `Bottom200`: 207 daily series covering 2008. The loader also returns the summing matrix `S` and the `tags` dictionary that maps each level to its series.

In [23]:
# Load the Traffic dataset
Y_df, S, tags = HierarchicalData.load('./data', 'Traffic')
Y_df['ds'] = pd.to_datetime(Y_df['ds'])

In [24]:
Y_df

| | unique_id | ds | y |
|---|---|---|---|
| 0 | Total | 2008-01-01 | 1536.0182 |
| 1 | Total | 2008-01-02 | 1619.2435 |
| 2 | Total | 2008-01-03 | 1423.6574 |
| 3 | Total | 2008-01-04 | 1096.3325 |
| 4 | Total | 2008-01-05 | 974.5526 |
| ... | ... | ... | ... |
| 75757 | Bottom200 | 2008-12-27 | 13.0458 |
| 75758 | Bottom200 | 2008-12-28 | 11.6035 |
| 75759 | Bottom200 | 2008-12-29 | 13.4012 |
| 75760 | Bottom200 | 2008-12-30 | 13.3731 |


In [25]:
unq_ids = Y_df["unique_id"].unique()
len(unq_ids)

207

In [26]:
len(Y_df[Y_df["unique_id"] == unq_ids[0]])

366

In [27]:
S

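| | Bottom1 | Bottom2 | Bottom3 | Bottom4 | Bottom5 | Bottom6 | Bottom7 | Bottom8 | Bottom9 | Bottom10 | ... | Bottom191 | Bottom192 | Bottom193 | Bottom194 | Bottom195 | Bottom196 | Bottom197 | Bottom198 | Bottom199 | Bottom200 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Total | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | ... | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| y1 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| y2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| y11 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| y12 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Bottom196 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Bottom197 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| Bottom198 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
| Bottom199 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |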


In [28]:
tags

{'Level1': array(['Total'], dtype=object),
 'Level2': array(['y1', 'y2'], dtype=object),
 'Level3': array(['y11', 'y12', 'y21', 'y22'], dtype=object),
 'Level4': array(['Bottom1', 'Bottom2', 'Bottom3', 'Bottom4', 'Bottom5', 'Bottom6',
        'Bottom7', 'Bottom8', 'Bottom9', 'Bottom10', 'Bottom11',
        'Bottom12', 'Bottom13', 'Bottom14', 'Bottom15', 'Bottom16',
        'Bottom17', 'Bottom18', 'Bottom19', 'Bottom20', 'Bottom21',
        'Bottom22', 'Bottom23', 'Bottom24', 'Bottom25', 'Bottom26',
        'Bottom27', 'Bottom28', 'Bottom29', 'Bottom30', 'Bottom31',
        'Bottom32', 'Bottom33', 'Bottom34', 'Bottom35', 'Bottom36',
        'Bottom37', 'Bottom38', 'Bottom39', 'Bottom40', 'Bottom41',
        'Bottom42', 'Bottom43', 'Bottom44', 'Bottom45', 'Bottom46',
        'Bottom47', 'Bottom48', 'Bottom49', 'Bottom50', 'Bottom51',
        'Bottom52', 'Bottom53', 'Bottom54', 'Bottom55', 'Bottom56',
        'Bottom57', 'Bottom58', 'Bottom59', 'Bottom60', 'Bottom61',
        'Bottom62', 

In [29]:
len(tags.keys())

4
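
To make the role of the summing matrix concrete, here is a minimal sketch on a hypothetical toy hierarchy (the names `S_toy` and `y_bottom` are illustrative and not part of the dataset). Each row of the matrix marks which bottom series add up to that node, so multiplying it by the bottom-level values yields the values of every series in the hierarchy.

```python
import numpy as np

# Hypothetical toy hierarchy: Total -> {A, B}, A -> {a1, a2}, B -> {b1}.
# Rows: Total, A, B, a1, a2, b1. Columns: bottom series a1, a2, b1.
S_toy = np.array([
    [1, 1, 1],  # Total = a1 + a2 + b1
    [1, 1, 0],  # A     = a1 + a2
    [0, 0, 1],  # B     = b1
    [1, 0, 0],  # a1
    [0, 1, 0],  # a2
    [0, 0, 1],  # b1
], dtype=float)

# Bottom-level observations for two time steps, shape (n_bottom, n_time).
y_bottom = np.array([
    [10.0, 12.0],  # a1
    [5.0, 6.0],    # a2
    [3.0, 4.0],    # b1
])

# Values of every series in the hierarchy, shape (n_series, n_time).
y_all = S_toy @ y_bottom
print(y_all)
```

The first row of `y_all` is the total and the last three rows reproduce `y_bottom` unchanged, which is the same layout as the `S` matrix printed above.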

### Split Train/Test sets

We hold out the last `HORIZON` observations (7 days) of each series as the test set and train on the rest.

In [30]:
HORIZON = 7
FREQUENCY = "1D"

In [31]:
Y_test_df = Y_df.groupby('unique_id').tail(HORIZON)
Y_train_df = Y_df.drop(Y_test_df.index)

In [32]:
Y_train_df = Y_train_df.set_index("unique_id")
Y_test_df = Y_test_df.set_index("unique_id")

In [33]:
Y_test_df

| unique_id | ds | y |
|---|---|---|
| Total | 2008-12-25 | 1036.8308 |
| Total | 2008-12-26 | 1563.6199 |
| Total | 2008-12-27 | 1606.0017 |
| Total | 2008-12-28 | 1567.3015 |
| Total | 2008-12-29 | 1722.5124 |
| ... | ... | ... |
| Bottom200 | 2008-12-27 | 13.0458 |
| Bottom200 | 2008-12-28 | 11.6035 |
| Bottom200 | 2008-12-29 | 13.4012 |
| Bottom200 | 2008-12-30 | 13.3731 |


In [34]:
Y_train_df.groupby('unique_id').size()

unique_id
Bottom1      359
Bottom10     359
Bottom100    359
Bottom101    359
Bottom102    359
            ... 
y11          359
y12          359
y2           359
y21          359
y22          359
Length: 207, dtype: int64
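
The split pattern used above can be checked on a small frame. The sketch below uses a hypothetical two-series panel (`toy`, `h` and the values are made up for illustration). It relies on the frame having a unique index, so that dropping the test rows by index leaves exactly the training rows.

```python
import pandas as pd

# Hypothetical toy panel: 2 series with 5 daily observations each.
toy = pd.DataFrame({
    "unique_id": ["A"] * 5 + ["B"] * 5,
    "ds": list(pd.date_range("2008-01-01", periods=5, freq="D")) * 2,
    "y": [float(v) for v in range(10)],
})

h = 2  # forecast horizon, plays the role of HORIZON

# Last h rows of every series form the test set ...
toy_test = toy.groupby("unique_id").tail(h)
# ... and everything else is the training set.
toy_train = toy.drop(toy_test.index)

print(toy_train.groupby("unique_id").size())
print(toy_test)
```

Every series keeps its earliest three observations for training, and the two most recent ones per series end up in `toy_test`.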

## Computing base forecasts

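The following cell computes the **base forecasts** for each time series in `Y_train_df` using the `Theta` model with a weekly season length (`season_length=7`) and additive decomposition. Passing `fitted=True` also stores the in-sample fitted values, which are retrieved with `forecast_fitted_values()` and passed to the reconciliation step below. Observe that `Y_hat_df` contains the forecasts, but they are not coherent: the forecast of an aggregated series need not equal the sum of the forecasts of its children.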

In [35]:

from statsforecast import StatsForecast
from statsforecast.models import Theta

fcst = StatsForecast(df=Y_train_df, 
                     models=[Theta(season_length=7, decomposition_type="additive")], 
                     freq=FREQUENCY, n_jobs=-1)
Y_hat_df = fcst.forecast(h=HORIZON, fitted=True)
Y_fitted_df = fcst.forecast_fitted_values()
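
As a rough illustration of what "not coherent" means, the sketch below checks independently produced forecasts against the sums implied by the bottom level. The three-series hierarchy and the numbers are hypothetical, and the check assumes the bottom series are the last rows of the forecast vector, as in the `S` matrix above.

```python
import numpy as np

# Hypothetical hierarchy: Total = a + b. Rows: Total, a, b.
S_toy = np.array([
    [1, 1],
    [1, 0],
    [0, 1],
], dtype=float)

# Base forecasts produced separately for Total, a and b (made-up values).
y_hat = np.array([10.0, 4.0, 5.0])

# Aggregate the bottom forecasts and compare with the base forecasts.
n_bottom = S_toy.shape[1]
implied = S_toy @ y_hat[-n_bottom:]
is_coherent = np.allclose(y_hat, implied)
gap = y_hat - implied

print(is_coherent)  # False: Total is forecast as 10 but a + b = 9
print(gap)
```

A non-zero entry in `gap` marks an aggregated series whose forecast disagrees with its children, which is what reconciliation removes.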

## Reconcile forecasts

The following cell makes the previous forecasts coherent using the `HierarchicalReconciliation` class. In this example we use `BottomUp`, `MinTrace` (with the `mint_shrink` and `ols` methods) and `ERM` (with the `reg` method).

In [36]:
from hierarchicalforecast.methods import BottomUp, MinTrace, ERM

reconcilers = [
    BottomUp(),
    MinTrace(method='mint_shrink'),
    MinTrace(method='ols'),
    ERM(method='reg')
]
hrec = HierarchicalReconciliation(reconcilers=reconcilers)
Y_rec_df = hrec.reconcile(Y_hat_df=Y_hat_df, Y_df=Y_fitted_df, S=S, tags=tags)

The dataframe `Y_rec_df` contains the reconciled forecasts.

In [37]:
Y_rec_df

| unique_id | ds | Theta | Theta/BottomUp | Theta/MinTrace_method-mint_shrink | Theta/MinTrace_method-ols | Theta/ERM_method-reg_lambda_reg-0.01 |
|---|---|---|---|---|---|---|
| Total | 2008-12-25 | 1503.743164 | 1488.312012 | 1501.537615 | 1503.699253 | 1303.095947 |
| Total | 2008-12-26 | 1494.726074 | 1479.193604 | 1492.754115 | 1494.681884 | 1387.072021 |
| Total | 2008-12-27 | 1499.522217 | 1480.290894 | 1501.304389 | 1499.467537 | 1382.727905 |
| Total | 2008-12-28 | 1505.871338 | 1486.347778 | 1507.151456 | 1505.815842 | 1603.329102 |
| Total | 2008-12-29 | 1516.441040 | 1500.258057 | 1514.059711 | 1516.395114 | 1567.219971 |
| ... | ... | ... | ... | ... | ... | ... |
| Bottom200 | 2008-12-27 | 11.527153 | 11.527153 | 11.690545 | 11.663962 | 9.138451 |
| Bottom200 | 2008-12-28 | 11.900398 | 11.900398 | 12.063513 | 12.037117 | 10.323282 |
| Bottom200 | 2008-12-29 | 12.096720 | 12.096720 | 12.202844 | 12.188204 | 10.096326 |
| Bottom200 | 2008-12-30 | 11.867877 | 11.867877 | 11.969750 | 11.955844 | 10.494240 |
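
For intuition about two of the reconcilers, the sketch below applies the textbook formulas on a hypothetical three-series hierarchy: bottom-up keeps the bottom forecasts and aggregates them, while OLS reconciliation projects all base forecasts onto the coherent subspace with $\tilde{y} = S (S^\top S)^{-1} S^\top \hat{y}$. This illustrates the idea under those assumptions and is not the library's internal implementation. The names and numbers are made up.

```python
import numpy as np

# Hypothetical hierarchy: Total = a + b. Rows: Total, a, b.
S_toy = np.array([
    [1, 1],
    [1, 0],
    [0, 1],
], dtype=float)

# Incoherent base forecasts for Total, a and b (made-up values).
y_hat = np.array([10.0, 4.0, 5.0])

# Bottom-up: keep the bottom forecasts and aggregate them with S.
y_bu = S_toy @ y_hat[-2:]

# OLS reconciliation: solve (S'S) b = S' y_hat for the reconciled
# bottom forecasts b, then aggregate them with S.
bottom_ols = np.linalg.solve(S_toy.T @ S_toy, S_toy.T @ y_hat)
y_ols = S_toy @ bottom_ols

print(y_bu)   # Total becomes a + b = 9
print(y_ols)  # the 1-unit gap is spread across all three series
```

Both results are coherent, but they differ in how they treat the disagreement. Bottom-up discards the forecast of the aggregate, which explains why the `Theta/BottomUp` column matches `Theta` exactly on the `Bottom200` rows above, while OLS adjusts every level a little.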


## Evaluation 

The `HierarchicalForecast` package includes the `HierarchicalEvaluation` class to evaluate the forecasts at each level of the hierarchy. It can also compute scaled metrics relative to a benchmark model.

In [38]:
from hierarchicalforecast.evaluation import HierarchicalEvaluation

def rmse(y, y_hat):
    return np.mean(np.sqrt(np.mean((y-y_hat)**2, axis=1)))

def mase(y, y_hat, y_insample, seasonality=4):
    errors = np.mean(np.abs(y - y_hat), axis=1)
    scale = np.mean(np.abs(y_insample[:, seasonality:] - y_insample[:, :-seasonality]), axis=1)
    return np.mean(errors / scale)

def rmsse(y, y_hat, y_insample):
    errors = np.mean(np.square(y - y_hat), axis=1)
    scale = np.mean(np.square(y_insample[:, 1:] - y_insample[:, :-1]), axis=1)
    return np.mean(np.sqrt(errors / scale))

eval_tags = {}
for k in tags.keys():
    eval_tags[k] = tags[k]

evaluator = HierarchicalEvaluation(evaluators=[rmse, mase, rmsse])
evaluation = evaluator.evaluate(
        Y_hat_df=Y_rec_df, Y_test_df=Y_test_df,
        tags=eval_tags, Y_df=Y_train_df
)
evaluation = evaluation.drop('Overall')
# column order follows Y_rec_df: base, BottomUp, MinTrace(mint_shrink), MinTrace(ols), ERM
evaluation.columns = ['Base', 'BottomUp', 'MinTrace(mint_shrink)', 'MinTrace(ols)', 'ERM']
evaluation = evaluation.applymap('{:.4f}'.format)
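
To see how the scaled metric behaves, here is a small worked example of the `rmsse` function defined above, on made-up numbers. The out-of-sample squared error of each series is divided by the in-sample squared error of a one-step naive forecast, so a value below 1 means the forecast beats that naive benchmark.

```python
import numpy as np

def rmsse(y, y_hat, y_insample):
    # Mean squared forecast error per series (rows are series).
    errors = np.mean(np.square(y - y_hat), axis=1)
    # Mean squared one-step naive error on the training data.
    scale = np.mean(np.square(y_insample[:, 1:] - y_insample[:, :-1]), axis=1)
    return np.mean(np.sqrt(errors / scale))

# One hypothetical series: in-sample steps are all +2, so scale = 4.
y_insample = np.array([[1.0, 3.0, 5.0, 7.0]])
y = np.array([[9.0, 11.0]])      # test values
y_hat = np.array([[8.0, 13.0]])  # forecasts with errors 1 and -2

# errors = (1 + 4) / 2 = 2.5, so the score is sqrt(2.5 / 4).
score = rmsse(y, y_hat, y_insample)
print(round(score, 4))  # 0.7906
```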



### RMSE

The following table shows the performance measured using RMSE across levels for each reconciliation method.

In [39]:
score_df = evaluation.query('metric == "rmse"')
score_df

| level | metric | Base | BottomUp | MinTrace(mint_shrink) | MinTrace(ols) | ERM |
|---|---|---|---|---|---|---|
| Level1 | rmse | 231.0258 | 233.6766 | 230.8551 | 231.0316 | 185.7905 |
| Level2 | rmse | 116.6914 | 117.9239 | 116.545 | 116.6936 | 94.9345 |
| Level3 | rmse | 60.0416 | 60.6055 | 60.027 | 60.0425 | 50.8 |
| Level4 | rmse | 2.2658 | 2.2658 | 2.2523 | 2.2617 | 2.311 |


### MASE


The following table shows the performance measured using MASE across levels for each reconciliation method.

In [40]:
evaluation.query('metric == "mase"')

| level | metric | Base | BottomUp | MinTrace(mint_shrink) | MinTrace(ols) | ERM |
|---|---|---|---|---|---|---|
| Level1 | mase | 0.5536 | 0.5773 | 0.5532 | 0.5536 | 0.5151 |
| Level2 | mase | 0.5533 | 0.5767 | 0.5527 | 0.5533 | 0.5149 |
| Level3 | mase | 0.5654 | 0.5913 | 0.5649 | 0.5654 | 0.5305 |
| Level4 | mase | 0.8708 | 0.8708 | 0.8607 | 0.8634 | 0.9333 |


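### RMSSE

The following table shows the performance measured using RMSSE across levels for each reconciliation method.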

In [41]:
score_df = evaluation.query('metric == "rmsse"')
score_df

| level | metric | Base | BottomUp | MinTrace(mint_shrink) | MinTrace(ols) | ERM |
|---|---|---|---|---|---|---|
| Level1 | rmsse | 0.8581 | 0.868 | 0.8575 | 0.8582 | 0.6901 |
| Level2 | rmsse | 0.8624 | 0.8714 | 0.8612 | 0.8624 | 0.7011 |
| Level3 | rmsse | 0.8814 | 0.8897 | 0.8811 | 0.8814 | 0.7464 |
| Level4 | rmsse | 1.1347 | 1.1347 | 1.1266 | 1.1315 | 1.1544 |


In [42]:
score_df.astype(float).mean()

Base                     0.934150
BottomUp                 0.940950
MinTrace(mint_shrink)    0.931600
MinTrace(ols)            0.933375
ERM                      0.823000
dtype: float64


### References
- [Hyndman, R.J., & Athanasopoulos, G. (2021). "Forecasting: principles and practice, 3rd edition: Chapter 11: Forecasting hierarchical and grouped series". OTexts: Melbourne, Australia. OTexts.com/fpp3. Accessed on July 2022.](https://otexts.com/fpp3/hierarchical.html)
- [Rob Hyndman, Alan Lee, Earo Wang, and Shanika Wickramasuriya (2021). "hts: Hierarchical and Grouped Time Series". URL https://CRAN.R-project.org/package=hts. R package version 6.0.2.](https://cran.r-project.org/web/packages/hts/index.html)
- [Mitchell O’Hara-Wild, Rob Hyndman, Earo Wang, Gabriel Caceres, Tim-Gunnar Hensel, and Timothy Hyndman (2021). "fable: Forecasting Models for Tidy Time Series". URL https://CRAN.R-project.org/package=fable. R package version 0.3.1.](https://CRAN.R-project.org/package=fable)