# Chapter 57: Hierarchical and Grouped Time‑Series

## Learning Objectives

By the end of this chapter, you will be able to:

- Understand the structure of hierarchical and grouped time‑series data and why coherent forecasts are important
- Identify different types of hierarchies (geographical, product, organisational) and their application to financial data
- Implement bottom‑up, top‑down, and middle‑out forecasting methods
- Apply optimal combination (MinT) reconciliation using covariance estimation
- Handle grouped time‑series with multiple cross‑sectional grouping dimensions
- Understand temporal hierarchies and how to reconcile forecasts across different time frequencies
- Evaluate forecast accuracy at both aggregated and disaggregated levels
- Scale hierarchical forecasting to large numbers of series using sparse algebra
- Implement these methods for the NEPSE system, e.g., forecasting the overall market index, sector indices, and individual stocks consistently

---

## Introduction

In many real‑world forecasting problems, time series are naturally organised into hierarchies or groups. For example, sales data may be aggregated by product category, then by region, then by country. In finance, we might have individual stock prices, which aggregate to sector indices, which in turn aggregate to a national market index like the NEPSE index. Decisions often need to be made at different levels: a trader might care about a specific stock, while a fund manager cares about the sector, and a regulator cares about the overall market.

Forecasting such data presents a unique challenge: forecasts made independently at each level will generally not be **coherent**, that is, they will not add up correctly across the hierarchy (e.g., the sum of forecasted sector indices may not equal the forecasted overall index). Incoherent forecasts can lead to inconsistent planning and poor decisions.

**Hierarchical forecasting** aims to produce forecasts that are coherent across all levels. This can be achieved by first generating base forecasts for every series (using any univariate or multivariate model) and then applying a **reconciliation** method to adjust them so that they satisfy the aggregation constraints.

In this chapter, we will explore hierarchical and grouped time‑series, reconciliation methods, and their application to the NEPSE system. We will use the `sktime` library and also implement the core mathematics manually to understand the underlying principles.

---

## 57.1 Hierarchical Structures

A hierarchy is defined by aggregation constraints. For example, consider a three‑level hierarchy:

- Level 0 (top): Total market index (NEPSE)
- Level 1: Sector indices (Banking, Hydropower, Insurance, etc.)
- Level 2: Individual stocks within each sector

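The constraints are: the sector series sum to the total, and within each sector, the stock series (weighted appropriately) sum to the sector series. These constraints hold as plain sums only for additive quantities such as market capitalisation; for index levels or returns, the index weights must first be folded into the bottom‑level series so that each stock enters as its weighted contribution. The constraints are then linear: `y_t^total = sum_{sectors} y_t^sector` and `y_t^sector = sum_{stocks in sector} y_t^stock`.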

Formally, let `y_t` be a vector of all series at time `t`, arranged from top to bottom. There exists a **summing matrix** `S` such that:

`y_t = S b_t`

where `b_t` contains the bottom‑level series only. For example, with 2 sectors and 3 stocks (1 in sector A, 2 in sector B), we have:

`[Total, SectorA, SectorB, StockA1, StockB1, StockB2]' = S * [StockA1, StockB1, StockB2]'`

with `S` having rows that sum the appropriate bottom series.
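
As a minimal sketch, the summing matrix for this six‑series example can be written out with NumPy. The stock values in `b_t` are made up purely for illustration.

```python
import numpy as np

# Rows: Total, SectorA, SectorB, StockA1, StockB1, StockB2
# Columns: StockA1, StockB1, StockB2
S = np.array([
    [1, 1, 1],  # Total   = A1 + B1 + B2
    [1, 0, 0],  # SectorA = A1
    [0, 1, 1],  # SectorB = B1 + B2
    [1, 0, 0],  # StockA1
    [0, 1, 0],  # StockB1
    [0, 0, 1],  # StockB2
])

b_t = np.array([10.0, 20.0, 30.0])  # illustrative bottom-level values
y_t = S @ b_t                       # all six series, coherent by construction
print(y_t)  # [60. 10. 50. 10. 20. 30.]
```

The lower block of `S` is an identity matrix, so the bottom‑level series pass through unchanged while the upper rows produce the aggregates.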

The goal of reconciliation is to adjust any set of base forecasts `\hat{y}_t` (which may be incoherent) so that the adjusted forecasts satisfy `\tilde{y}_t = S \tilde{b}_t` for some bottom‑level forecasts `\tilde{b}_t`.

---

## 57.2 Bottom‑Up Approach

The simplest method is **bottom‑up**: forecast only the bottom‑level series, then aggregate them up the hierarchy. This ensures coherence by construction. It works well if the bottom‑level forecasts are accurate and the hierarchy is not too deep.

For NEPSE, we could forecast each individual stock using a univariate model, then sum them to get sector and total index forecasts.

**Implementation:**

```python
import pandas as pd
import numpy as np
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Assume we have a DataFrame 'stocks' with columns for each stock, index Date
# We'll forecast each stock separately

stock_forecasts = {}
for col in stocks.columns:
    model = ExponentialSmoothing(stocks[col], seasonal_periods=5, trend='add', seasonal='add')
    fit = model.fit()
    forecast = fit.forecast(5)
    stock_forecasts[col] = forecast

# Combine into a DataFrame
bottom_forecasts = pd.DataFrame(stock_forecasts)

# Now aggregate to sectors (if we have sector mapping)
# sector_map: dict {stock: sector}
# Group columns via a transpose; groupby(axis=1) is deprecated in recent pandas
sector_forecasts = bottom_forecasts.T.groupby(sector_map).sum().T
total_forecast = sector_forecasts.sum(axis=1)
```

**Advantages:** Simple, coherent, no information lost at lower levels.
**Disadvantages:** Ignores higher‑level information that might improve bottom forecasts; can be noisy if bottom series are erratic.

---

## 57.3 Top‑Down Approach

**Top‑down** forecasts only the top level, then disaggregates it to lower levels using historical proportions. The proportions can be based on historical averages, or more sophisticated methods like forecast proportions.

For NEPSE, we would forecast the total index, then allocate to sectors and stocks based on their average weight.

**Implementation (historical proportions):**

```python
# Forecast total index (using some model)
total_forecast = forecast_total_index()

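# Historical proportions (e.g., average share of each sector in total)
# Group columns via a transpose; groupby(axis=1) is deprecated in recent pandas
sector_totals = stocks.T.groupby(sector_map).sum().T
sector_props = sector_totals.mean() / stocks.sum(axis=1).mean()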

# Disaggregate
sector_forecasts = pd.DataFrame({sector: total_forecast * prop for sector, prop in sector_props.items()})

# Further disaggregate to stocks within each sector using similar proportions
stock_props_within_sector = {}
for sector in sector_props.index:
    stocks_in_sector = [s for s in stocks.columns if sector_map[s] == sector]
    sector_total = stocks[stocks_in_sector].sum(axis=1).mean()
    props = stocks[stocks_in_sector].mean() / sector_total
    stock_props_within_sector[sector] = props

# Combine
stock_forecasts = pd.DataFrame()
for sector, props in stock_props_within_sector.items():
    sector_fc = sector_forecasts[sector]
    for stock, prop in props.items():
        stock_forecasts[stock] = sector_fc * prop
```

**Advantages:** Simple, stable if proportions are stable.
**Disadvantages:** Ignores dynamics at lower levels; proportions may change over time.
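
Historical proportions can be computed in more than one way. The sketch below contrasts two common choices, the average of historical proportions and the proportions of historical averages, on made‑up data; the series names and the top‑level forecast are hypothetical.

```python
import pandas as pd

# Illustrative history for three bottom-level series (values are made up)
history = pd.DataFrame({
    "A": [10.0, 20.0, 30.0],
    "B": [30.0, 30.0, 30.0],
    "C": [60.0, 70.0, 80.0],
})
total = history.sum(axis=1)  # 100, 120, 140

# Average of historical proportions: mean over time of y_j / total
avg_of_props = history.div(total, axis=0).mean()

# Proportions of historical averages: mean(y_j) / mean(total)
props_of_avgs = history.mean() / total.mean()

top_forecast = 150.0  # hypothetical top-level forecast
bottom_forecasts = top_forecast * props_of_avgs
print(pd.DataFrame({"avg_of_props": avg_of_props, "props_of_avgs": props_of_avgs}))
```

Both sets of proportions sum to one, so either choice yields coherent forecasts; they differ when the shares drift over time, as they do for series `A` here.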

---

## 57.4 Middle‑Out Approach

**Middle‑out** selects an intermediate level to forecast first, then aggregates up and disaggregates down. This can be useful when a particular level is more reliable or easier to forecast.

For NEPSE, we might forecast sector indices directly, then sum to get total, and disaggregate to stocks using proportions within each sector.
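
A minimal sketch of middle‑out, assuming sector‑level base forecasts and within‑sector stock shares are already available. The tickers, shares, and forecast values below are hypothetical.

```python
import pandas as pd

# Hypothetical sector-level base forecasts for two steps ahead
sector_fc = pd.DataFrame({"Banking": [500.0, 510.0], "Hydropower": [200.0, 190.0]})

# Hypothetical historical share of each stock within its sector (shares sum to 1 per sector)
within_sector_share = {
    "Banking": {"BANK1": 0.6, "BANK2": 0.4},
    "Hydropower": {"HYDRO1": 0.25, "HYDRO2": 0.75},
}

# Aggregate up: the total is the sum of the sector forecasts
total_fc = sector_fc.sum(axis=1)

# Disaggregate down: split each sector forecast by its stock shares
stock_fc = pd.DataFrame({
    stock: sector_fc[sector] * share
    for sector, shares in within_sector_share.items()
    for stock, share in shares.items()
})
print(total_fc.tolist())
print(stock_fc)
```

Because the shares sum to one within each sector, the stock forecasts add up to the sector forecasts, which in turn add up to the total.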

---

## 57.5 Optimal Combination (MinT)

The **optimal combination** approach, also known as **MinT** (Minimum Trace) reconciliation, produces forecasts that are coherent and as close as possible to the original base forecasts, in a weighted least squares sense. It uses the covariance matrix of forecast errors to weight the reconciliation.

The reconciled forecasts `\tilde{y}` are given by:

`\tilde{y} = S (S' W^{-1} S)^{-1} S' W^{-1} \hat{y}`

where `\hat{y}` are the base forecasts for all series, `S` is the summing matrix, and `W` is the covariance matrix of the base forecast errors. In practice, `W` is estimated from historical (usually in‑sample, one‑step) forecast errors, or replaced by a simpler matrix (e.g., `W = I` yields ordinary least squares reconciliation).

**Implementation with `sktime`**

The `sktime` library provides hierarchical reconciliation through its `Aggregator` transformer and its `ReconcilerForecaster`.

```python
from sktime.forecasting.base import ForecastingHorizon
from sktime.forecasting.exp_smoothing import ExponentialSmoothing
from sktime.forecasting.reconcile import ReconcilerForecaster
from sktime.transformations.hierarchical.aggregate import Aggregator

# Assume 'y_bottom' is a DataFrame with a row MultiIndex (sector, stock, time)
# and a single column 'value' holding the bottom-level (stock) series.
# The last index level must be the time index.

# Add sector and total series; sktime labels aggregate nodes "__total"
y = Aggregator().fit_transform(y_bottom)

# Hold out the last 5 time points of every series
times = y.index.get_level_values(-1)
cutoff = times.unique().sort_values()[-6]
y_train, y_test = y[times <= cutoff], y[times > cutoff]

# Fit a base forecaster to every series, then reconcile with MinT
forecaster = ReconcilerForecaster(
    forecaster=ExponentialSmoothing(trend='add', seasonal='add', sp=5),
    method='mint_cov',  # MinT with covariance estimated from in-sample residuals
)

# Fit
forecaster.fit(y_train)

# Predict
fh = ForecastingHorizon([1, 2, 3, 4, 5], is_relative=True)
y_pred = forecaster.predict(fh)
```

**Explanation:**  
`Aggregator` adds the aggregated series implied by the index hierarchy, and `ReconcilerForecaster` fits the wrapped forecaster to every series and reconciles the base forecasts with the chosen `method`. The `mint_cov` option estimates the covariance matrix of errors from in‑sample residuals; alternatives include `mint_shrink`, `ols`, `wls_var`, `wls_str`, `bu`, and `td_fcst`. The available options can differ between `sktime` versions, so check the documentation of the installed release. The resulting forecasts are coherent.

**Manual implementation of MinT**

To understand the mechanics, let's implement MinT from scratch.

```python
import numpy as np
import pandas as pd
from scipy.linalg import inv

def reconcile_mint(base_forecasts, S, W=None):
    """
    base_forecasts: array of shape (n_series,) - base forecasts for all nodes
    S: summing matrix of shape (n_series, n_bottom)
    W: covariance matrix of base forecast errors (n_series, n_series). If None, use identity.
    """
    if W is None:
        W = np.eye(len(base_forecasts))
    W_inv = inv(W)
    # Compute G = (S' W^{-1} S)^{-1} S' W^{-1}
    G = inv(S.T @ W_inv @ S) @ S.T @ W_inv
    # Bottom forecasts
    bottom_forecasts = G @ base_forecasts
    # Reconciled all levels
    reconciled = S @ bottom_forecasts
    return reconciled

# Example: simple hierarchy with 3 bottom nodes and 1 top node
# y = [total, bottom1, bottom2, bottom3]
S = np.array([
    [1, 1, 1],  # total = b1+b2+b3
    [1, 0, 0],  # bottom1
    [0, 1, 0],  # bottom2
    [0, 0, 1]   # bottom3
])

base_forecasts = np.array([100, 30, 40, 35])  # total=100, but bottom sum=105 (incoherent)
reconciled = reconcile_mint(base_forecasts, S)
print("Reconciled forecasts:", reconciled)
```

**Explanation:**  
The function computes optimal bottom forecasts that minimise the weighted squared deviation from the base forecasts, subject to coherence. The `G` matrix maps base forecasts to bottom forecasts. The reconciled forecasts are then obtained by aggregating bottom forecasts.
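
To see how the choice of `W` changes the result, the following sketch repeats the example with a diagonal `W` that assigns a larger error variance to the top‑level forecast. The variances are hypothetical, and the helper `reconcile` is a variant of the function above that uses linear solves instead of explicit inverses and assumes `W` is symmetric.

```python
import numpy as np

def reconcile(base, S, W):
    """Generalised least-squares reconciliation; W is a symmetric error covariance."""
    W_inv_S = np.linalg.solve(W, S)                # W^{-1} S without forming W^{-1}
    G = np.linalg.solve(S.T @ W_inv_S, W_inv_S.T)  # (S' W^{-1} S)^{-1} S' W^{-1}
    return S @ (G @ base), G

S = np.array([[1, 1, 1], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
base = np.array([100.0, 30.0, 40.0, 35.0])

# OLS: every series is trusted equally
ols, G_ols = reconcile(base, S, np.eye(4))

# WLS with hypothetical variances: the top-level forecast is four times noisier
wls, G_wls = reconcile(base, S, np.diag([4.0, 1.0, 1.0, 1.0]))

print(np.round(ols, 2))
print(np.round(wls, 2))
```

With `W = I` the reconciled total is 101.25 and each bottom forecast drops by 1.25. With the larger top‑level variance, the total moves further (to about 102.86) and the bottom forecasts move less, because the less reliable series absorbs more of the adjustment. In both cases `G @ S` is the identity, so forecasts that are already coherent are left unchanged.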

---

## 57.6 Grouped Time‑Series

Grouped time‑series are more general than hierarchies: they involve multiple cross‑sectional grouping dimensions. For example, we might have sales data grouped by both product category and region. The constraints are that the sum across categories for a given region equals the region total, and sum across regions for a given category equals the category total, and overall total equals sum of all.

In finance, we could have stocks grouped by sector and also by market capitalisation (large, mid, small). The constraints are more complex but still linear.

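The same reconciliation methods apply, but the summing matrix becomes more intricate because each bottom‑level series contributes to several non‑nested aggregates. Library support for grouped structures varies, so it is worth checking the documentation; the summing matrix can always be built by hand and passed to a manual reconciliation routine such as the one in Section 57.5.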

**Example with two grouping dimensions:**

```python
# Create a MultiIndex for (sector, size)
index = pd.MultiIndex.from_tuples([
    ('Bank', 'Large'), ('Bank', 'Small'),
    ('Hydro', 'Large'), ('Hydro', 'Small')
], names=['sector', 'size'])

# Create a Series of values
y = pd.Series([10, 5, 8, 3], index=index)

# To reconcile, we need to define the summing matrix.
# In practice, use a library like `sktime` or `pyhts`.
```
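
As a hedged sketch, the summing matrix for this sector‑by‑size example can be built by hand: one row for the total, one row per value of each grouping dimension, and an identity block for the bottom series. The row labels are naming choices made here for illustration.

```python
import numpy as np
import pandas as pd

bottom = pd.MultiIndex.from_tuples(
    [("Bank", "Large"), ("Bank", "Small"), ("Hydro", "Large"), ("Hydro", "Small")],
    names=["sector", "size"],
)
b = pd.Series([10.0, 5.0, 8.0, 3.0], index=bottom)

labels = ["Total"]
rows = [np.ones(len(bottom))]
for level in bottom.names:
    values = bottom.get_level_values(level)
    for group in values.unique():
        labels.append(f"{level}={group}")
        rows.append((values == group).astype(float))  # 1 where the bottom series belongs to the group

# Stack aggregate rows on top of an identity block for the bottom series
S = np.vstack(rows + [np.eye(len(bottom))])
labels += ["/".join(key) for key in bottom]

y = pd.Series(S @ b.to_numpy(), index=labels)
print(y)
```

Here the total is 26, the sector aggregates are 15 (Bank) and 11 (Hydro), and the size aggregates are 18 (Large) and 8 (Small). The two grouping dimensions are not nested, yet both are expressed as rows of the same matrix, so the reconciliation formulas from Section 57.5 apply unchanged.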

---

## 57.7 Temporal Hierarchies

Temporal hierarchies involve reconciling forecasts made at different time frequencies. For example, we might have monthly, quarterly, and annual forecasts. The constraints are that the sum of monthly forecasts within a quarter should equal the quarterly forecast, etc. This is analogous to a hierarchy where the bottom level is the highest frequency.
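
The constraint can be checked directly. In this small sketch, all forecast values are made up: monthly forecasts are aggregated to quarters and to the year, then compared with separately produced quarterly and annual base forecasts.

```python
import numpy as np

# Illustrative base forecasts for one year (made-up numbers)
monthly = np.array([8, 9, 10, 11, 12, 13, 12, 11, 10, 9, 8, 7], dtype=float)
quarterly_base = np.array([28.0, 35.0, 34.0, 25.0])  # from a hypothetical quarterly model
annual_base = 125.0                                  # from a hypothetical annual model

# Aggregates implied by the monthly forecasts
quarterly_from_monthly = monthly.reshape(4, 3).sum(axis=1)
annual_from_monthly = monthly.sum()

# Non-zero gaps mean the three sets of forecasts are incoherent
quarterly_gap = quarterly_base - quarterly_from_monthly
annual_gap = annual_base - annual_from_monthly
print(quarterly_gap, annual_gap)
```

The gaps here are 1, -1, 1, 1 for the quarters and 5 for the year, which is exactly the kind of discrepancy that temporal reconciliation removes.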

**Benefits:** Ensures consistency across planning horizons and can improve accuracy by using information from all frequencies.

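**Implementation:** Use the same reconciliation framework but with a summing matrix that aggregates over time. The sketch below builds that matrix directly for monthly, quarterly, and annual forecasts of a single year and applies OLS reconciliation (`W = I`).

```python
import numpy as np

m = 12  # number of months at the bottom level
S = np.vstack([
    np.ones((1, m)),                      # annual = sum of 12 months
    np.kron(np.eye(4), np.ones((1, 3))),  # each quarter = sum of its 3 months
    np.eye(m),                            # monthly series themselves
])

# base_forecasts: length-17 vector ordered [annual, Q1..Q4, M1..M12], where each
# block comes from a model fitted at that frequency (assumed to be available)
G = np.linalg.solve(S.T @ S, S.T)         # (S'S)^{-1} S'
reconciled = S @ (G @ base_forecasts)
```

**Explanation:**  
Each row of `S` sums the months that make up one annual, quarterly, or monthly node, so the reconciled forecasts agree across all three frequencies. Other frequency combinations follow the same pattern as long as the shorter periods nest exactly in the longer ones.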

---

## 57.8 Evaluation of Hierarchical Forecasts

Evaluation must consider all levels. Common metrics:

- **Root Mean Squared Error (RMSE)** at each level.
- **Weighted Average RMSE** across levels, using importance weights.
- **Mean Absolute Scaled Error (MASE)**.
- **Overall coherence** can be measured by the discrepancy between aggregated bottom forecasts and top forecasts (should be zero after reconciliation).
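
MASE and the coherence discrepancy from the list above can be sketched as follows. The helper names `mase` and `coherence_gap` are introduced here for illustration, and `coherence_gap` assumes the bottom‑level series occupy the last rows of the forecast vector, matching the identity block at the bottom of `S`.

```python
import numpy as np

def mase(y_true, y_pred, y_train, m=1):
    """Mean absolute error scaled by the in-sample MAE of the lag-m naive forecast."""
    scale = np.mean(np.abs(y_train[m:] - y_train[:-m]))
    return np.mean(np.abs(y_true - y_pred)) / scale

def coherence_gap(forecasts, S):
    """Largest absolute difference between the forecasts and the aggregation of their bottom rows."""
    n_bottom = S.shape[1]
    return np.max(np.abs(forecasts - S @ forecasts[-n_bottom:]))

# Made-up data for one series
y_train = np.array([10.0, 12.0, 11.0, 13.0])
score = mase(np.array([14.0, 15.0]), np.array([13.0, 15.0]), y_train)

# One total over three bottom series
S = np.array([[1, 1, 1], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
gap_base = coherence_gap(np.array([100.0, 30.0, 40.0, 35.0]), S)
gap_reconciled = coherence_gap(np.array([101.25, 28.75, 38.75, 33.75]), S)
print(score, gap_base, gap_reconciled)
```

A MASE below one means the forecast beats the in‑sample naive forecast on average. The gap is 5 for the incoherent base forecasts and 0 after reconciliation.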

For probabilistic forecasts, scoring rules such as the **mean interval score** can be computed at each level in the same way.

**Example: Compute RMSE at each level**

```python
from sklearn.metrics import mean_squared_error

# y_true: DataFrame with columns for each series (top and bottom)
# y_pred: similarly, reconciled forecasts

rmse = {}
for col in y_true.columns:
    rmse[col] = np.sqrt(mean_squared_error(y_true[col], y_pred[col]))
print(pd.Series(rmse))
```

We can also compute a weighted average where top levels are weighted more if they are more important for the business.
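
One way to form such a weighted average, sketched on made‑up numbers: compute RMSE per series, average within each level, then combine levels with importance weights. The level assignment and the weights are hypothetical choices.

```python
import numpy as np
import pandas as pd

# Made-up actuals and coherent forecasts for a hierarchy with one total and two stocks
y_true = pd.DataFrame({"total": [100.0, 104.0], "s1": [60.0, 62.0], "s2": [40.0, 42.0]})
y_pred = pd.DataFrame({"total": [103.0, 100.0], "s1": [61.0, 61.0], "s2": [42.0, 39.0]})

level_of = {"total": "top", "s1": "bottom", "s2": "bottom"}
level_weight = pd.Series({"top": 0.6, "bottom": 0.4})  # hypothetical importance weights

# RMSE per series, then the mean RMSE within each level
rmse = np.sqrt(((y_true - y_pred) ** 2).mean())
rmse_by_level = rmse.groupby(level_of).mean()

# Weighted average across levels
weighted_rmse = (rmse_by_level * level_weight).sum() / level_weight.sum()
print(rmse_by_level)
print(round(weighted_rmse, 4))
```

Because series at different levels live on different scales, a scale‑free metric such as MASE is often a better input to this kind of weighted average than raw RMSE.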

---

## 57.9 Scalability

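When the number of series is large (e.g., thousands of stocks), estimating the full covariance matrix `W` for MinT becomes impractical: it has `n × n` entries, and the sample estimate is singular whenever the number of series exceeds the number of residual observations. Approximations are used: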

- **W = I** (ordinary least squares) – simple but may not be optimal.
- **Diagonal W** using only variances of each series (ignoring correlations).
- **Structural scaling** – assume each series' error variance is proportional to the number of bottom‑level series it aggregates.
- **Sparse covariance** estimation.
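
The first three approximations, plus a simple shrinkage variant, can be sketched as follows. The residuals are random stand‑ins for real base‑model residuals, and the shrinkage intensity `lam` is hand‑picked here, whereas a proper shrinkage estimator chooses it from the data.

```python
import numpy as np

# Summing matrix for the six-series example: Total, SectorA, SectorB, three stocks
S = np.array([
    [1, 1, 1],
    [1, 0, 0],
    [0, 1, 1],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
], dtype=float)
n = S.shape[0]

# OLS: identity weights
W_ols = np.eye(n)

# Structural scaling: variance proportional to the number of bottom series in each node
W_struct = np.diag(S.sum(axis=1))

# Variance scaling: one residual variance per series, correlations ignored
rng = np.random.default_rng(0)
residuals = rng.normal(size=(250, n))  # stand-in for in-sample residuals
W_var = np.diag(residuals.var(axis=0, ddof=1))

# Shrinkage toward the diagonal with a hand-picked intensity
lam = 0.5
W_full = np.cov(residuals, rowvar=False)
W_shrink = lam * np.diag(np.diag(W_full)) + (1 - lam) * W_full
print(np.diag(W_struct))
```

The diagonal options need only `n` numbers, so they remain usable when the full covariance matrix cannot be estimated reliably.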

In `sktime`, passing `method='mint_shrink'` to `ReconcilerForecaster` uses a shrinkage estimator for the covariance.

For very large hierarchies, bottom‑up may be the only practical option, especially if the bottom series are forecasted with a single model that scales well (e.g., a global model like a neural network).
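
When reconciliation is still wanted at scale, sparse algebra helps. The sketch below uses the constraint form of the projection, `\tilde{y} = \hat{y} - W C' (C W C')^{-1} C \hat{y}` with `C = [I, -A]`, where `A` holds only the aggregate rows of `S`. It assumes a diagonal `W` and series ordered as aggregates first, then bottom series; the helper names and the simulated data are hypothetical.

```python
import numpy as np
from scipy import sparse

def aggregation_matrix(sector_codes):
    """Sparse aggregate rows [total, sectors...] for stocks with integer sector codes."""
    codes = np.asarray(sector_codes)
    n = codes.size
    total = sparse.csr_matrix(np.ones((1, n)))
    sectors = sparse.csr_matrix(
        (np.ones(n), (codes, np.arange(n))), shape=(int(codes.max()) + 1, n)
    )
    return sparse.vstack([total, sectors], format="csr")

def reconcile_wls_sparse(base, A, w_diag):
    """WLS reconciliation with diagonal W; base is ordered [aggregates..., bottom...]."""
    n_agg = A.shape[0]
    C = sparse.hstack([sparse.identity(n_agg), -A], format="csr")  # C y = 0 iff y is coherent
    W = sparse.diags(w_diag)
    CWCt = (C @ W @ C.T).toarray()  # only n_agg x n_agg, so a dense solve is cheap
    correction = W @ (C.T @ np.linalg.solve(CWCt, C @ base))
    return base - correction

rng = np.random.default_rng(0)
codes = np.arange(1000) % 10   # 1000 hypothetical stocks spread over 10 sectors
A = aggregation_matrix(codes)  # 11 aggregate rows: total plus 10 sectors
base = rng.normal(100.0, 10.0, size=A.shape[0] + A.shape[1])  # made-up base forecasts
reconciled = reconcile_wls_sparse(base, A, np.ones(base.size))
print(np.abs(reconciled[:11] - A @ reconciled[11:]).max())
```

The only dense linear system has as many rows as there are aggregate series, which is typically far smaller than the number of bottom‑level series.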

---

## 57.10 Implementation for NEPSE

Let's design a hierarchical forecasting system for NEPSE:

- **Bottom level**: Individual stocks (e.g., 200+ stocks)
- **Middle level**: Sector indices (Banking, Hydropower, Insurance, Manufacturing, Trading, Others)
- **Top level**: NEPSE total index

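We have historical daily closing prices for each stock. We'll convert them to an additive quantity (for example market capitalisation, or weighted return contributions) so that the sector and total series are plain sums of the stock series.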

**Steps:**

1. **Aggregate data** to get sector and total indices. For total index, we may have an official index, or we can compute a market‑cap weighted index. For simplicity, we'll use equal‑weighted or value‑weighted sums.
2. **Create summing matrix** `S` that maps stocks to sectors and to total.
3. **Generate base forecasts** for every series (each stock, each sector, total) using a univariate model (e.g., ARIMA, ETS, or a global LSTM).
4. **Reconcile** using MinT (or another method) to get coherent forecasts.
5. **Evaluate** at all levels.
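
Step 2 can be sketched as a small helper that turns a stock‑to‑sector mapping into a labelled summing matrix. The function name `build_summing_matrix` and the tickers are hypothetical and serve only to illustrate the layout.

```python
import numpy as np
import pandas as pd

def build_summing_matrix(sector_map):
    """Return S with rows [NEPSE, sectors..., stocks...] and one column per stock."""
    stocks = list(sector_map)
    sectors = sorted(set(sector_map.values()))
    sector_rows = [
        [1.0 if sector_map[stock] == sector else 0.0 for stock in stocks]
        for sector in sectors
    ]
    S = np.vstack([
        np.ones((1, len(stocks))),  # total row sums every stock
        np.array(sector_rows),      # one row per sector
        np.eye(len(stocks)),        # identity block for the stocks themselves
    ])
    return pd.DataFrame(S, index=["NEPSE"] + sectors + stocks, columns=stocks)

# Hypothetical tickers and sector assignments, for illustration only
sector_map = {"BANK1": "Banking", "BANK2": "Banking", "HYDRO1": "Hydropower"}
S = build_summing_matrix(sector_map)
print(S)
```

Keeping the row and column labels on `S` makes it easier to verify that base forecasts are stacked in the same order before reconciliation.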

**Example code skeleton:**

```python
import numpy as np
import pandas as pd
from sktime.forecasting.base import ForecastingHorizon
from sktime.forecasting.exp_smoothing import ExponentialSmoothing
from sktime.forecasting.reconcile import ReconcilerForecaster
from sktime.transformations.hierarchical.aggregate import Aggregator

# Load data: a DataFrame with columns: 'stock1', 'stock2', ..., 'sector1', ..., 'total'
# Values must be additive across stocks (e.g. market capitalisation).
df = pd.read_csv('nepse_all_series.csv', index_col=0, parse_dates=True)

# Keep only the bottom-level (stock) columns; the aggregates are rebuilt below.
# sector_map: dict {stock: sector}, assumed to be defined as in Section 57.2
stocks = df[list(sector_map)]

# Trading days are irregularly spaced, so use an integer trading-day counter as time index
stocks = stocks.reset_index(drop=True)

# Restructure into long format with a row MultiIndex (sector, stock, time)
long = stocks.stack().rename('value').reset_index()
long.columns = ['time', 'stock', 'value']
long['sector'] = long['stock'].map(sector_map)
y_bottom = long.set_index(['sector', 'stock', 'time']).sort_index()

# Add sector and total series; aggregate nodes are labelled "__total"
y = Aggregator().fit_transform(y_bottom)

# Train/test split: hold out the last 5 trading days of every series
times = y.index.get_level_values(-1)
cutoff = times.max() - 5
y_train, y_test = y[times <= cutoff], y[times > cutoff]

# Define forecaster ('mint_shrink' is an alternative when series outnumber observations)
forecaster = ReconcilerForecaster(
    forecaster=ExponentialSmoothing(trend='add', seasonal='add', sp=5),  # weekly seasonality?
    method='mint_cov',
)

# Fit
forecaster.fit(y_train)

# Predict
fh = ForecastingHorizon([1, 2, 3, 4, 5], is_relative=True)
y_pred = forecaster.predict(fh)

# Evaluate at each level; index alignment matches forecasts with actuals
sq_err = (y_pred['value'] - y_test['value']) ** 2
sector_lvl = sq_err.index.get_level_values(0)
stock_lvl = sq_err.index.get_level_values(1)
masks = {
    'total': sector_lvl == '__total',
    'sector': (sector_lvl != '__total') & (stock_lvl == '__total'),
    'stock': stock_lvl != '__total',
}
for level, mask in masks.items():
    print(f"{level} RMSE: {np.sqrt(sq_err[mask].mean()):.4f}")
```

**Explanation:**  
We structure the data as a DataFrame with a row MultiIndex (sector, stock, time), where the hierarchy levels are ordered from top to bottom and time comes last. `Aggregator` adds the sector and total series, and `ReconcilerForecaster` derives the summing matrix from the index hierarchy, generates base forecasts for all series, and reconciles them.

---

## Chapter Summary

In this chapter, we explored hierarchical and grouped time‑series forecasting, a crucial area for ensuring coherent predictions across aggregation levels. We covered:

- The structure of hierarchies and the importance of coherence.
- Simple methods: bottom‑up, top‑down, middle‑out.
- Optimal combination (MinT) reconciliation using covariance estimation.
- Grouped time‑series with multiple cross‑sectional dimensions.
- Temporal hierarchies for reconciling across frequencies.
- Evaluation metrics and scalability considerations.
- A practical implementation for the NEPSE system using `sktime`.

By applying hierarchical forecasting to NEPSE, we can produce consistent forecasts for individual stocks, sector indices, and the overall market index, enabling better decision‑making at all levels. In the next chapter, we will discuss **Anomaly Detection**, a critical task for identifying unusual market events and potential data quality issues.

---

**End of Chapter 57**

