# Temporal hierarchy reconciliation (thief) example


**Data**:

- This example uses the tourism dataset, which records monthly Australian visitor nights from January 1998 to December 2016.
- The dataset is stored in `data/tourism.csv`, and the summing matrix is stored in `data/tourism_S.csv`.
- The dataset contains $304$ time series, which form a cross-sectional hierarchy of $555$ time series when aggregated by geographical location and purpose of travel.
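
As a minimal sketch of what a summing matrix does, the toy example below builds every series of a small hierarchy from its bottom-level series. The two-series hierarchy and the names `S_toy`, `bts_toy`, and `allts_toy` are hypothetical and are not part of the tourism data.

```python
import numpy as np

# Hypothetical toy hierarchy: Total = A + B, with bottom-level series A and B.
# Each row of S_toy is one series of the hierarchy (Total, A, B);
# each column is one bottom-level series (A, B).
S_toy = np.array([
    [1, 1],  # Total
    [1, 0],  # A
    [0, 1],  # B
])

# Three time periods of bottom-level observations, shape (time, n_bottom).
bts_toy = np.array([
    [10.0, 5.0],
    [12.0, 6.0],
    [11.0, 4.0],
])

# All series of the hierarchy, shape (time, n_all).
allts_toy = bts_toy @ S_toy.T
print(allts_toy)
```

The same product, applied to the $304$ bottom-level tourism series, yields the $555$ series of the full hierarchy.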

**Experiment design**
- In this example, we construct a temporal hierarchy (aggregation periods of 1, 2, 3, 4, 6, and 12 months) for each of the $555$ time series.
- The last $12$ months are held out for evaluation.
- We first generate base forecasts for each time series at every aggregation period.
- We then generate reconciled forecasts with `pyhts.TemporalHierarchy` and with the `FoReco::thfrec` function in R, compare their accuracy, and check that the two implementations give consistent results.
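
To illustrate what the temporal hierarchy looks like for a single series, the sketch below sums one year of monthly values into non-overlapping blocks for each aggregation period. It uses plain NumPy rather than `pyhts`, and the helper `temporal_aggregates` and the toy series `monthly` are hypothetical names introduced only for this illustration.

```python
import numpy as np

def temporal_aggregates(y, periods=(1, 2, 3, 4, 6, 12)):
    """Sum a monthly series into non-overlapping blocks of k months for each k."""
    y = np.asarray(y, dtype=float)
    if any(len(y) % k for k in periods):
        raise ValueError("series length must be divisible by every aggregation period")
    # reshape(-1, k) puts k consecutive months on each row, so summing rows aggregates them
    return {k: y.reshape(-1, k).sum(axis=1) for k in periods}

# One year of toy monthly values: 1, 2, ..., 12
monthly = np.arange(1.0, 13.0)
levels = temporal_aggregates(monthly)
for k, values in levels.items():
    print(k, values)
```

Each level holds `12 // k` values per year, which is why a 12-month test set corresponds to `12 // k` forecasts at aggregation period `k`.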

In [1]:
from pyhts import TemporalHierarchy, CrossSectionalHierarchy
import pandas as pd
import numpy as np

ht = TemporalHierarchy.new([1, 2, 3, 4, 6, 12])

In [2]:
# RMSE of the forecasts reconciled by FoReco::thfrec
foreco = pd.read_csv("data/tourism_temporal.csv")
# read the base forecasts; aggregation period i needs 12 // i forecasts to cover the test year
basef = {i: pd.read_csv(f"data/tourism_baseforecast_{i}.csv").values[:12 // i] for i in [1, 2, 3, 4, 6, 12]}
# read the residuals
residuals = {i: pd.read_csv(f"data/tourism_residuals_{i}.csv").values for i in [1, 2, 3, 4, 6, 12]}
# read the summing matrix and the data
S = pd.read_csv("data/tourism_S.csv", index_col=0).values
dt = pd.read_csv("data/tourism.csv")
# skip the first four columns and transpose to shape (time, series)
bts = dt.iloc[:, 4:].values.T

In [3]:
allts = bts.dot(S.T)
tts = allts[216:, :]
tts.shape

(12, 555)

Test the results from pyhts against those from FoReco to ensure that they are consistent.

# Cross-sectional hierarchy example

**Experiment design**:

- Use the last 12 months for evaluation.
- Construct a cross-sectional hierarchy at every temporal aggregation period, i.e. [1, 2, 3, 4, 6, 12].
- Compare the overall RMSE of each set of reconciled forecasts with that of the reconciled forecasts from the `FoReco::htsrec` function.

Note: The saved summing matrix is not identical to the one generated automatically by the `CrossSectionalHierarchy.new()` constructor, so we create the `CrossSectionalHierarchy` directly with `__init__` here. The two summing matrices contain the same aggregated series in a different row order, which does not affect the reconciliation results in practice.
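
The row-order claim can be checked on a toy case. The sketch below assumes plain OLS reconciliation written directly in NumPy, $\tilde{y} = S (S^\top S)^{-1} S^\top \hat{y}$; it is not the `pyhts` implementation, and `ols_reconcile`, `S_toy`, `base`, and `perm` are hypothetical names. Weighted methods behave the same way only if the weights are reordered together with the rows.

```python
import numpy as np

def ols_reconcile(S, base):
    """OLS reconciliation: project base forecasts onto the coherent subspace spanned by S."""
    G = np.linalg.solve(S.T @ S, S.T)  # (S'S)^{-1} S'
    return S @ G @ base

# Hypothetical toy hierarchy: rows are Total, A, B; columns are the bottom series A, B.
S_toy = np.array([[1.0, 1.0],
                  [1.0, 0.0],
                  [0.0, 1.0]])

# Incoherent base forecasts for one horizon: Total (16) differs from A + B (15).
base = np.array([[16.0], [10.0], [5.0]])

# Reorder the rows of the summing matrix and the base forecasts in the same way.
perm = np.array([2, 0, 1])
rec = ols_reconcile(S_toy, base)
rec_perm = ols_reconcile(S_toy[perm], base[perm])

# The reconciled forecasts agree once the same reordering is applied.
print(np.allclose(rec[perm], rec_perm))
```

Because only the row order changes, error measures averaged over all series, such as the overall RMSE, are unaffected.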

In [5]:
from scipy.sparse import csr_matrix

# RMSE of the forecasts reconciled by FoReco::htsrec
foreco = pd.read_csv("data/tourism_crosssectional.csv")
# ht = CrossSectionalHierarchy.new(dt, trees=[("State", "Region", "City"), ("Purpose",)])
ht = CrossSectionalHierarchy(csr_matrix(S), None)

In [6]:
# equivalent method names in pyhts and FoReco
pyhts_method = ["ols", "structural", "wlsv", "shrinkage"]
foreco_method = ["ols", "struc", "wls", "shr"]
for pm, fm in zip(pyhts_method, foreco_method):
    res = []
    for i in [1, 2, 3, 4, 6, 12]:
        recf = ht.reconcile(basef[i], pm, residuals=residuals[i])
        # get the cross-sectional time series at temporal aggregation period i
        tts_i = np.apply_along_axis(lambda x: x.reshape(-1, i).sum(axis=1), 0, bts[-12:, :])
        # overall RMSE at this aggregation period
        res.append(np.sqrt(np.mean((recf - tts_i) ** 2)))
    res = np.array(res)
    # the RMSEs must match those reported by FoReco
    assert np.allclose(res, foreco[fm])
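
The aggregation and error steps in the loop above can be sketched on synthetic data. The example below is illustrative only: `aggregate_columns`, `actual`, and `forecast` are hypothetical names, and the random values stand in for the tourism series.

```python
import numpy as np

def aggregate_columns(y, k):
    """Sum each column of y, shape (T, n), over non-overlapping blocks of k rows."""
    T, n = y.shape
    return y.reshape(T // k, k, n).sum(axis=1)

rng = np.random.default_rng(0)
actual = rng.random((12, 3))  # 12 months, 3 synthetic series

# Aggregate the 12 months into 3 blocks of 4 months.
agg = aggregate_columns(actual, 4)

# The column-wise apply_along_axis formulation gives the same result.
same = np.apply_along_axis(lambda x: x.reshape(-1, 4).sum(axis=1), 0, actual)
assert np.allclose(agg, same)

# Overall RMSE across all series and horizons; these toy forecasts are off by exactly 1.
forecast = agg + 1.0
rmse = np.sqrt(np.mean((forecast - agg) ** 2))
print(rmse)
```

Reshaping to `(T // k, k, n)` avoids the Python-level loop over columns, which may matter for wide hierarchies; both forms assume that `T` is divisible by `k`.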