In [None]:
#| default_exp methods

# Reconciliation Methods

Large collections of time series organized into structures at different aggregation levels often require their forecasts to follow their aggregation constraints, which poses the challenge of creating novel algorithms capable of producing coherent forecasts. <br><br> The `HierarchicalForecast` package provides the most comprehensive collection of Python implementations of hierarchical forecasting algorithms that follow classic hierarchical reconciliation. All the methods have a `reconcile` function capable of reconciling base forecasts using `numpy` arrays.

Most reconciliation methods can be described by the following convenient linear algebra notation: 

$$\tilde{\mathbf{y}}_{[a,b],\tau} = \mathbf{S}_{[a,b][b]} \mathbf{P}_{[b][a,b]} \hat{\mathbf{y}}_{[a,b],\tau}$$

where $a, b$ represent the aggregate and bottom levels, $\mathbf{S}_{[a,b][b]}$ contains the hierarchical aggregation constraints, and $\mathbf{P}_{[b][a,b]}$ varies across
reconciliation methods. The reconciled predictions are $\tilde{\mathbf{y}}_{[a,b],\tau}$, and the base predictions are $\hat{\mathbf{y}}_{[a,b],\tau}$.
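
As a minimal sketch of the notation above, the snippet below applies $\mathbf{S}\mathbf{P}\hat{\mathbf{y}}$ to a toy hierarchy. The three-series hierarchy, the forecast values, and the choice of a bottom-up $\mathbf{P}$ are illustrative assumptions, not part of the package.

```python
import numpy as np

# Toy hierarchy (an illustrative assumption): total = a + b.
# Rows are ordered [total, a, b]; columns are the bottom series [a, b].
S = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])

# Incoherent base forecasts for a single horizon step: 10 != 4 + 5.
y_hat = np.array([[10.0], [4.0], [5.0]])

# One possible choice of P: keep only the bottom-level forecasts.
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])

# Reconciled forecasts: project to the bottom level, then aggregate with S.
y_tilde = S @ (P @ y_hat)
print(y_tilde.ravel())  # [9. 4. 5.]
```

After reconciliation the top-level forecast equals the sum of the bottom-level forecasts, which is the coherence property that every choice of $\mathbf{P}$ is meant to deliver.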

In [None]:
#| export
import warnings
from collections import OrderedDict
from concurrent.futures import ThreadPoolExecutor, as_completed
from copy import deepcopy
from typing import Optional, Union

import clarabel
import numpy as np
from quadprog import solve_qp
from scipy import sparse

In [None]:
#| export
from hierarchicalforecast.probabilistic_methods import PERMBU, Bootstrap, Normality
from hierarchicalforecast.utils import (
    _construct_adjacency_matrix,
    _is_strictly_hierarchical,
    _lasso,
    _ma_cov,
    _shrunk_covariance_schaferstrimmer_no_nans,
    _shrunk_covariance_schaferstrimmer_with_nans,
    is_strictly_hierarchical,
)

In [None]:
#| hide
from fastcore.test import ExceptionExpected, test_close, test_eq, test_fail
from nbdev.showdoc import add_docs, show_doc

In [None]:
#| exporti
class HReconciler:
    fitted = False
    is_sparse_method = False
    insample = False
    P = None
    sampler = None

    def _get_sampler(
        self,
        intervals_method,
        S,
        P,
        y_hat,
        y_insample,
        y_hat_insample,
        W,
        sigmah,
        num_samples,
        seed,
        tags,
    ):
        if intervals_method == "normality":
            sampler = Normality(S=S, P=P, y_hat=y_hat, W=W, sigmah=sigmah, seed=seed)
        elif intervals_method == "permbu":
            sampler = PERMBU(
                S=S,
                P=P,
                y_hat=(S @ (P @ y_hat)),
                tags=tags,
                y_insample=y_insample,
                y_hat_insample=y_hat_insample,
                sigmah=sigmah,
                num_samples=num_samples,
                seed=seed,
            )
        elif intervals_method == "bootstrap":
            sampler = Bootstrap(
                S=S,
                P=P,
                y_hat=y_hat,
                y_insample=y_insample,
                y_hat_insample=y_hat_insample,
                num_samples=num_samples,
                seed=seed,
            )
        else:
            sampler = None
        return sampler

    def _reconcile(
        self,
        S: np.ndarray,
        P: np.ndarray,
        y_hat: np.ndarray,
        SP: np.ndarray = None,
        level: Optional[list[int]] = None,
        sampler: Optional[Union[Normality, PERMBU, Bootstrap]] = None,
    ):

        # Mean reconciliation
        res = {"mean": (S @ (P @ y_hat))}

        # Probabilistic reconciliation
        if (level is not None) and (sampler is not None):
            # Convert each level into its lower and upper quantile,
            # then add the corresponding predictions to the results dictionary
            quantiles = np.concatenate(
                [[(100 - lv) / 200, ((100 - lv) / 200) + lv / 100] for lv in level]
            )
            quantiles = np.sort(quantiles)
            res = sampler.get_prediction_quantiles(res, quantiles)

        return res

    def predict(
        self, S: np.ndarray, y_hat: np.ndarray, level: Optional[list[int]] = None
    ):
        """Predict using reconciler.

        Predict using fitted mean and probabilistic reconcilers.

        **Parameters:**<br>
        `S`: Summing matrix of size (`base`, `bottom`).<br>
        `y_hat`: Forecast values of size (`base`, `horizon`).<br>
        `level`: float list 0-100, confidence levels for prediction intervals.<br>

        **Returns:**<br>
        `y_tilde`: Reconciled predictions.
        """
        if not self.fitted:
            raise Exception("This model instance is not fitted yet. Call the fit method.")

        return self._reconcile(
            S=S, P=self.P, y_hat=y_hat, sampler=self.sampler, level=level
        )

    def sample(self, num_samples: int):
        """Sample probabilistic coherent distribution.

        Generates `num_samples` samples from a probabilistic coherent distribution.
        The method uses fitted mean and probabilistic reconcilers, defined by
        the `intervals_method` selected during the reconciler's
        instantiation. Currently available: `normality`, `bootstrap`, `permbu`.

        **Parameters:**<br>
        `num_samples`: int, number of samples generated from coherent distribution.<br>

        **Returns:**<br>
        `samples`: Coherent samples of size (`num_series`, `horizon`, `num_samples`).
        """
        if not self.fitted:
            raise Exception("This model instance is not fitted yet. Call the fit method.")
        if self.sampler is None:
            raise ValueError(
                "This model instance does not have a sampler. Call fit with `intervals_method`."
            )

        samples = self.sampler.get_samples(num_samples=num_samples)
        return samples

    def fit(self, *args, **kwargs):

        raise NotImplementedError("This method is not implemented yet.")

    def fit_predict(self, *args, **kwargs):

        raise NotImplementedError("This method is not implemented yet.")

    __call__ = fit_predict
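
The `level` argument of `_reconcile` is turned into a sorted array of quantiles before the sampler is queried. The sketch below isolates that conversion; the helper name `levels_to_quantiles` is hypothetical and only mirrors the expression used in the class above. For example, an 80% level maps to the 0.1 and 0.9 quantiles.

```python
import numpy as np


def levels_to_quantiles(level):
    """Map confidence levels (in percent) to sorted lower/upper quantiles.

    Hypothetical helper that mirrors the expression used in `_reconcile`.
    """
    pairs = [[(100 - lv) / 200, (100 - lv) / 200 + lv / 100] for lv in level]
    return np.sort(np.concatenate(pairs))


# Each level contributes one lower and one upper quantile.
quantiles = levels_to_quantiles([80, 95])
print(np.round(quantiles, 3))
```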

# 1. Bottom-Up
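
As a rough sketch of the bottom-up matrix $[\mathbf{0}\;|\;\mathbf{I}]$, the snippet below builds it with an offset diagonal in `np.eye` and compares it with an explicit block construction. The sizes (3 aggregate and 4 bottom series) are illustrative assumptions.

```python
import numpy as np

n_hiers, n_bottom = 7, 4        # assumed sizes: 3 aggregate + 4 bottom series
n_agg = n_hiers - n_bottom

# np.eye(N, M, k) places ones on the k-th diagonal of an N x M matrix,
# so an offset of n_agg yields the block matrix [0 | I].
P_bu = np.eye(n_bottom, n_hiers, k=n_agg, dtype=np.float64)

# The same matrix assembled explicitly from its two blocks.
P_explicit = np.hstack([np.zeros((n_bottom, n_agg)), np.eye(n_bottom)])
assert np.array_equal(P_bu, P_explicit)

# Applying P_bu to a vector of base forecasts selects the bottom-level entries.
y_hat = np.arange(n_hiers, dtype=np.float64)
bottom_only = P_bu @ y_hat
```

The offset-diagonal form avoids allocating the two blocks separately, which is presumably why it is convenient here.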

In [None]:
#| export
class BottomUp(HReconciler):
    """Bottom Up Reconciliation Class.
    The most basic hierarchical reconciliation is performed using a Bottom-Up strategy. It was first proposed
    by Orcutt et al. in 1968.
    The corresponding hierarchical \"projection\" matrix is defined as:
    $$\mathbf{P}_{\\text{BU}} = [\mathbf{0}_{\mathrm{[b],[a]}}\;|\;\mathbf{I}_{\mathrm{[b][b]}}]$$

    **Parameters:**<br>
    None

    **References:**<br>
    - [Orcutt, G.H., Watts, H.W., & Edwards, J.B. (1968). \"Data aggregation and information loss\". The American
    Economic Review, 58, 773–787](http://www.jstor.org/stable/1815532).
    """

    insample = False

    def _get_PW_matrices(self, S, idx_bottom):
        n_hiers, n_bottom = S.shape
        P = np.eye(n_bottom, n_hiers, n_hiers - n_bottom, np.float64)
        if getattr(self, "intervals_method", False) is None:
            W = None
        else:
            W = np.eye(n_hiers, dtype=np.float64)
        return P, W

    def fit(
        self,
        S: np.ndarray,
        y_hat: np.ndarray,
        idx_bottom: np.ndarray,
        y_insample: Optional[np.ndarray] = None,
        y_hat_insample: Optional[np.ndarray] = None,
        sigmah: Optional[np.ndarray] = None,
        intervals_method: Optional[str] = None,
        num_samples: Optional[int] = None,
        seed: Optional[int] = None,
        tags: Optional[dict[str, np.ndarray]] = None,
    ):
        """Bottom Up Fit Method.

        **Parameters:**<br>
        `S`: Summing matrix of size (`base`, `bottom`).<br>
        `y_hat`: Forecast values of size (`base`, `horizon`).<br>
        `idx_bottom`: Indices corresponding to the bottom level of `S`, size (`bottom`).<br>
        `y_insample`: In-sample values of size (`base`, `insample_size`).<br>
        `y_hat_insample`: In-sample forecast values of size (`base`, `insample_size`).<br>
        `sigmah`: Estimated standard deviation of the conditional marginal distribution.<br>
        `intervals_method`: Sampler for prediction intervals, one of `normality`, `bootstrap`, `permbu`.<br>
        `num_samples`: Number of samples for probabilistic coherent distribution.<br>
        `seed`: Seed for reproducibility.<br>
        `tags`: Each key is a level and each value its `S` indices.<br>

        **Returns:**<br>
        `self`: object, fitted reconciler.
        """
        self.intervals_method = intervals_method
        self.P, self.W = self._get_PW_matrices(S=S, idx_bottom=idx_bottom)
        self.sampler = self._get_sampler(
            S=S,
            P=self.P,
            W=self.W,
            y_hat=y_hat,
            y_insample=y_insample,
            y_hat_insample=y_hat_insample,
            sigmah=sigmah,
            intervals_method=intervals_method,
            num_samples=num_samples,
            seed=seed,
            tags=tags,
        )
        self.fitted = True
        return self

    def fit_predict(
        self,
        S: np.ndarray,
        y_hat: np.ndarray,
        idx_bottom: np.ndarray,
        y_insample: Optional[np.ndarray] = None,
        y_hat_insample: Optional[np.ndarray] = None,
        sigmah: Optional[np.ndarray] = None,
        level: Optional[list[int]] = None,
        intervals_method: Optional[str] = None,
        num_samples: Optional[int] = None,
        seed: Optional[int] = None,
        tags: Optional[dict[str, np.ndarray]] = None,
    ):
        """BottomUp Reconciliation Method.

        **Parameters:**<br>
        `S`: Summing matrix of size (`base`, `bottom`).<br>
        `y_hat`: Forecast values of size (`base`, `horizon`).<br>
        `idx_bottom`: Indices corresponding to the bottom level of `S`, size (`bottom`).<br>
        `y_insample`: In-sample values of size (`base`, `insample_size`).<br>
        `y_hat_insample`: In-sample forecast values of size (`base`, `insample_size`).<br>
        `sigmah`: Estimated standard deviation of the conditional marginal distribution.<br>
        `level`: float list 0-100, confidence levels for prediction intervals.<br>
        `intervals_method`: Sampler for prediction intervals, one of `normality`, `bootstrap`, `permbu`.<br>
        `num_samples`: Number of samples for probabilistic coherent distribution.<br>
        `seed`: Seed for reproducibility.<br>
        `tags`: Each key is a level and each value its `S` indices.<br>

        **Returns:**<br>
        `y_tilde`: Reconciled y_hat using the Bottom Up approach.
        """
        # Fit creates P, W and sampler attributes
        self.fit(
            S=S,
            y_hat=y_hat,
            y_insample=y_insample,
            y_hat_insample=y_hat_insample,
            sigmah=sigmah,
            intervals_method=intervals_method,
            num_samples=num_samples,
            seed=seed,
            tags=tags,
            idx_bottom=idx_bottom,
        )

        return self._reconcile(
            S=S, P=self.P, y_hat=y_hat, sampler=self.sampler, level=level
        )

    __call__ = fit_predict

In [None]:
show_doc(BottomUp, title_level=3)

In [None]:
show_doc(BottomUp.fit, name="BottomUp.fit", title_level=3)

In [None]:
show_doc(BottomUp.predict, name="BottomUp.predict", title_level=3)

In [None]:
show_doc(BottomUp.fit_predict, name="BottomUp.fit_predict", title_level=3)

In [None]:
show_doc(BottomUp.sample, name="BottomUp.sample", title_level=3)

In [None]:
#| export
class BottomUpSparse(BottomUp):
    """BottomUpSparse Reconciliation Class.

    This is an implementation of Bottom Up reconciliation using the sparse
    matrix approach. It is intended to work more efficiently on datasets with
    many time series. Note that this has only been checked up to roughly 20k
    time series, where no real improvement was observed; the benefit is expected
    for much larger collections (e.g. about 1M time series), where the dense S
    matrix no longer fits in memory.

    See the parent class for more details.
    """

    is_sparse_method = True

    def _get_PW_matrices(self, S, idx_bottom):
        n_hiers, n_bottom = S.shape
        P = sparse.eye(n_bottom, n_hiers, n_hiers - n_bottom, np.float64, "csr")
        if getattr(self, "intervals_method", False) is None:
            W = None
        else:
            W = sparse.eye(n_hiers, dtype=np.float64, format="csr")
        return P, W

In [None]:
show_doc(BottomUpSparse, title_level=3)

In [None]:
show_doc(BottomUpSparse.fit, name="BottomUpSparse.fit", title_level=3)

In [None]:
show_doc(BottomUpSparse.predict, name="BottomUpSparse.predict", title_level=3)

In [None]:
show_doc(BottomUpSparse.fit_predict, name="BottomUpSparse.fit_predict", title_level=3)

In [None]:
show_doc(BottomUpSparse.sample, name="BottomUpSparse.sample", title_level=3)
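
The sketch below compares dense and sparse bottom-up reconciliation on a toy hierarchy and checks that both give the same result. The hierarchy and forecast values are illustrative assumptions; the matrix constructions mirror the `_get_PW_matrices` methods above.

```python
import numpy as np
from scipy import sparse

# Toy hierarchy (an illustrative assumption): total = a + b.
S_dense = np.array([[1.0, 1.0],
                    [1.0, 0.0],
                    [0.0, 1.0]])
S_sparse = sparse.csr_matrix(S_dense)
n_hiers, n_bottom = S_dense.shape
n_agg = n_hiers - n_bottom

# Dense and sparse versions of the bottom-up matrix [0 | I].
P_dense = np.eye(n_bottom, n_hiers, k=n_agg, dtype=np.float64)
P_sparse = sparse.eye(n_bottom, n_hiers, k=n_agg, dtype=np.float64, format="csr")

# Base forecasts of size (base, horizon) with horizon = 2.
y_hat = np.array([[10.0, 11.0],
                  [4.0, 5.0],
                  [5.0, 7.0]])

dense_result = S_dense @ (P_dense @ y_hat)
sparse_result = S_sparse @ (P_sparse @ y_hat)
assert np.allclose(dense_result, sparse_result)
```

Multiplying a CSR matrix by a dense array returns a dense array, so the reconciled forecasts have the same type in both cases; only the storage of `S` and `P` differs.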

In [None]:
#| hide
S = np.array(
    [
        [1.0, 1.0, 1.0, 1.0],
        [1.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 1.0],
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]
)
h = 2
_y = np.array([10.0, 5.0, 4.0, 2.0, 1.0])
y_bottom = np.vstack([i * _y for i in range(1, 5)])
y_hat_bottom_insample = np.roll(y_bottom, 1)
y_hat_bottom_insample[:, 0] = np.nan
y_hat_bottom = np.vstack([i * np.ones(h) for i in range(1, 5)])
idx_bottom = [3, 4, 5, 6]
tags = {"level1": np.array([0]), "level2": np.array([1, 2]), "level3": idx_bottom}

# sigmah for all levels in the hierarchy
# sigmah for Naive method
# as calculated here:
# https://otexts.com/fpp3/prediction-intervals.html
y_base = S @ y_bottom
y_hat_base = S @ y_hat_bottom
y_hat_base_insample = S @ y_hat_bottom_insample
sigma = np.nansum((y_base - y_hat_base_insample) ** 2, axis=1) / (y_base.shape[1] - 1)
sigma = np.sqrt(sigma)
sigmah = sigma[:, None] * np.sqrt(
    np.vstack([np.arange(1, h + 1) for _ in range(y_base.shape[0])])
)

In [None]:
#| hide
cls_bottom_up = BottomUp()
test_eq(
    cls_bottom_up(S=S, y_hat=S @ y_hat_bottom, idx_bottom=idx_bottom)["mean"],
    S @ y_hat_bottom,
)

In [None]:
#| hide
# test bottom up forecast recovery
cls_bottom_up = BottomUp()
bu_bootstrap_intervals = cls_bottom_up(
    S=S,
    y_hat=S @ y_hat_bottom,
    idx_bottom=idx_bottom,
)
test_eq(bu_bootstrap_intervals["mean"], S @ y_hat_bottom)

In [None]:
#| hide
# test forecast recovery with fit -> predict
cls_bottom_up = BottomUp()
cls_bottom_up.fit(S=S, y_hat=S @ y_hat_bottom, idx_bottom=idx_bottom)
y_tilde = cls_bottom_up.predict(S=S, y_hat=S @ y_hat_bottom)["mean"]
test_eq(y_tilde, S @ y_hat_bottom)

In [None]:
#| hide
# test not fitted message, for unfitted predict
cls_bottom_up = BottomUp()
test_fail(
    cls_bottom_up.predict,
    contains="not fitted yet",
    args=(S, S @ y_hat_bottom),
)

# 2. Top-Down
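
As a minimal sketch of how the `average_proportions` and `proportion_averages` options differ, the snippet below computes both sets of proportions on a toy history and uses one of them to build a top-down $\mathbf{P}$. The numbers and the three-series hierarchy are illustrative assumptions.

```python
import numpy as np

# In-sample history (illustrative numbers): rows are [total, a, b], columns are time.
y_insample = np.array([[10.0, 20.0],
                       [2.0, 10.0],
                       [8.0, 10.0]])
y_top, y_btm = y_insample[0], y_insample[1:]

# average_proportions: average the per-period shares of the total.
avg_prop = np.mean(y_btm / y_top, axis=1)            # approximately [0.35, 0.65]
# proportion_averages: ratio of the averaged bottom series to the averaged total.
prop_avg = np.mean(y_btm, axis=1) / np.mean(y_top)   # approximately [0.4, 0.6]

# Top-down P: the proportions go in the column of the top node, zeros elsewhere.
S = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])
P = np.zeros_like(S).T
P[:, 0] = avg_prop

# Only the top-level base forecast (30) matters; the other entries are ignored.
y_hat = np.array([[30.0], [99.0], [99.0]])
y_tilde = S @ (P @ y_hat)
```

Both sets of proportions sum to one here, so the reconciled top-level forecast is unchanged; they differ only in how the total is split when the shares vary over time.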

In [None]:
#| hide
assert is_strictly_hierarchical(S, tags)
S_non_hier = np.array(
    [
        [1.0, 1.0, 1.0, 1.0],  # total
        [1.0, 1.0, 0.0, 0.0],  # city 1
        [0.0, 0.0, 1.0, 1.0],  # city 2
        [1.0, 0.0, 1.0, 0.0],  # transgender
        [0.0, 1.0, 0.0, 1.0],  # no transgender
        [1.0, 0.0, 0.0, 0.0],  # city 1 - transgender
        [0.0, 1.0, 0.0, 0.0],  # city 1 - no transgender
        [0.0, 0.0, 1.0, 0.0],  # city 2 - transgender
        [0.0, 0.0, 0.0, 1.0],  # city 2 - no transgender
    ]
)
tags_non_hier = {
    "Country": np.array([0]),
    "Country/City": np.array([2, 1]),
    "Country/Transgender": np.array([3, 4]),
    "Country-City-Transgender": np.array([5, 6, 7, 8]),
}
assert not is_strictly_hierarchical(S_non_hier, tags_non_hier)

In [None]:
#| hide
A = np.array(
    [
        [0, 1, 1, 0, 0, 0, 0],
        [0, 0, 0, 1, 1, 0, 0],
        [0, 0, 0, 0, 0, 1, 1],
    ]
)
test_eq(_construct_adjacency_matrix(sparse.csr_matrix(S), tags).toarray(), A)
assert _is_strictly_hierarchical(sparse.csr_matrix(A, dtype=bool))
A_non_hier = np.array(
    [
        [0, 1, 1, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 1, 1, 0, 0, 0, 0],
        [0, 0, 0, 1, 1, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 1, 0, 1, 0],
        [0, 0, 0, 0, 0, 0, 1, 0, 1],
    ]
)
test_eq(_construct_adjacency_matrix(sparse.csr_matrix(S_non_hier), tags_non_hier).toarray(), A_non_hier)
assert not _is_strictly_hierarchical(sparse.csr_matrix(A_non_hier, dtype=bool))

In [None]:
#| exporti
def _get_child_nodes(
    S: Union[np.ndarray, sparse.csr_matrix], tags: dict[str, np.ndarray]
):
    if isinstance(S, sparse.spmatrix):
        S = S.toarray()
    level_names = list(tags.keys())
    nodes = OrderedDict()
    for i_level, level in enumerate(level_names[:-1]):
        parent = tags[level]
        child = np.zeros_like(S)
        idx_child = tags[level_names[i_level + 1]]
        child[idx_child] = S[idx_child]
        nodes_level = {}
        for idx_parent_node in parent:
            parent_node = S[idx_parent_node]
            idx_node = child * parent_node.astype(bool)
            (idx_node,) = np.where(idx_node.sum(axis=1) > 0)
            nodes_level[idx_parent_node] = [idx for idx in idx_child if idx in idx_node]
        nodes[level] = nodes_level
    return nodes

In [None]:
#| exporti
def _reconcile_fcst_proportions(
    S: np.ndarray,
    y_hat: np.ndarray,
    tags: dict[str, np.ndarray],
    nodes: dict[str, dict[int, np.ndarray]],
    idx_top: int,
):
    reconciled = np.zeros_like(y_hat)
    reconciled[idx_top] = y_hat[idx_top]
    level_names = list(tags.keys())
    for i_level, level in enumerate(level_names[:-1]):
        nodes_level = nodes[level]
        for idx_parent, idx_childs in nodes_level.items():
            fcst_parent = reconciled[idx_parent]
            childs_sum = y_hat[idx_childs].sum()
            for idx_child in idx_childs:
                if np.abs(childs_sum) < 1e-8:
                    n_children = len(idx_childs)
                    reconciled[idx_child] = fcst_parent / n_children
                else:
                    reconciled[idx_child] = y_hat[idx_child] * fcst_parent / childs_sum
    return reconciled

In [None]:
#| export
class TopDown(HReconciler):
    """Top Down Reconciliation Class.

    The Top Down hierarchical reconciliation method distributes the total aggregate predictions
    down the hierarchy using proportions $\mathbf{p}_{\mathrm{[b]}}$, which can be actual historical values
    or estimated.

    $$\mathbf{P}=[\mathbf{p}_{\mathrm{[b]}}\;|\;\mathbf{0}_{\mathrm{[b][a,b\;-1]}}]$$
    **Parameters:**<br>
    `method`: One of `forecast_proportions`, `average_proportions` and `proportion_averages`.<br>

    **References:**<br>
    - [CW. Gross (1990). \"Disaggregation methods to expedite product line forecasting\". Journal of Forecasting, 9, 233–254.
    doi:10.1002/for.3980090304](https://onlinelibrary.wiley.com/doi/abs/10.1002/for.3980090304).<br>
    - [G. Fliedner (1999). \"An investigation of aggregate variable time series forecast strategies with specific subaggregate
    time series statistical correlation\". Computers and Operations Research, 26, 1133–1149.
    doi:10.1016/S0305-0548(99)00017-9](https://doi.org/10.1016/S0305-0548(99)00017-9).
    """

    def __init__(self, method: str):
        self.method = method
        self.insample = method in ["average_proportions", "proportion_averages"]

    def _get_PW_matrices(
        self,
        S: np.ndarray,
        y_hat: np.ndarray,
        y_insample: np.ndarray,
        tags: Optional[dict[str, np.ndarray]] = None,
    ):

        n_hiers, n_bottom = S.shape

        # Check if the data structure is strictly hierarchical.
        if tags is not None:
            if not is_strictly_hierarchical(S, tags):
                raise ValueError(
                    "Top-down reconciliation requires strictly hierarchical structures."
                )
            idx_top = int(S.sum(axis=1).argmax())
            levels_ = dict(sorted(tags.items(), key=lambda x: len(x[1])))
            idx_bottom = levels_[list(levels_)[-1]]
            y_btm = y_insample[idx_bottom]
        else:
            idx_top = 0
            y_btm = y_insample[(n_hiers - n_bottom) :]

        y_top = y_insample[idx_top]

        if self.method == "average_proportions":
            prop = np.nanmean(y_btm / y_top, axis=1)
        elif self.method == "proportion_averages":
            prop = np.nanmean(y_btm, axis=1) / np.nanmean(y_top)
        elif self.method == "forecast_proportions":
            raise NotImplementedError(f"Fit method not implemented for {self.method} yet")
        else:
            raise ValueError(f"Unknown method {self.method}")
        
        if np.isnan(y_btm).any() or np.isnan(y_top).any():
            warnings.warn(
                "Warning: There are NaN values in one or more levels of Y_df. "
                "This may lead to unexpected behavior when computing average proportions "
                "and proportion averages."
            )

        P = np.zeros_like(
            S, np.float64
        ).T  # float 64 if prop is too small, happens with wiki2
        P[:, idx_top] = prop
        W = np.eye(n_hiers, dtype=np.float64)
        return P, W

    def fit(
        self,
        S,
        y_hat,
        y_insample: np.ndarray,
        y_hat_insample: Optional[np.ndarray] = None,
        sigmah: Optional[np.ndarray] = None,
        intervals_method: Optional[str] = None,
        num_samples: Optional[int] = None,
        seed: Optional[int] = None,
        tags: Optional[dict[str, np.ndarray]] = None,
        idx_bottom: Optional[np.ndarray] = None,
    ):
        """TopDown Fit Method.

        **Parameters:**<br>
        `S`: Summing matrix of size (`base`, `bottom`).<br>
        `y_hat`: Forecast values of size (`base`, `horizon`).<br>
        `y_insample`: Insample values of size (`base`, `insample_size`). Optional for `forecast_proportions` method.<br>
        `y_hat_insample`: Insample forecast values of size (`base`, `insample_size`). Optional for `forecast_proportions` method.<br>
        `sigmah`: Estimated standard deviation of the conditional marginal distribution.<br>
        `intervals_method`: Sampler for prediction intervals, one of `normality`, `bootstrap`, `permbu`.<br>
        `num_samples`: Number of samples for probabilistic coherent distribution.<br>
        `seed`: Seed for reproducibility.<br>
        `tags`: Each key is a level and each value its `S` indices.<br>
        `idx_bottom`: Indices corresponding to the bottom level of `S`, size (`bottom`).<br>

        **Returns:**<br>
        `self`: object, fitted reconciler.
        """
        self.intervals_method = intervals_method
        self.P, self.W = self._get_PW_matrices(
            S=S, y_hat=y_hat, tags=tags, y_insample=y_insample
        )
        self.sampler = self._get_sampler(
            S=S,
            P=self.P,
            W=self.W,
            y_hat=y_hat,
            y_insample=y_insample,
            y_hat_insample=y_hat_insample,
            sigmah=sigmah,
            intervals_method=intervals_method,
            num_samples=num_samples,
            seed=seed,
            tags=tags,
        )
        self.fitted = True
        return self

    def fit_predict(
        self,
        S: np.ndarray,
        y_hat: np.ndarray,
        tags: dict[str, np.ndarray],
        idx_bottom: np.ndarray = None,
        y_insample: Optional[np.ndarray] = None,
        y_hat_insample: Optional[np.ndarray] = None,
        sigmah: Optional[np.ndarray] = None,
        level: Optional[list[int]] = None,
        intervals_method: Optional[str] = None,
        num_samples: Optional[int] = None,
        seed: Optional[int] = None,
    ):
        """Top Down Reconciliation Method.

        **Parameters:**<br>
        `S`: Summing matrix of size (`base`, `bottom`).<br>
        `y_hat`: Forecast values of size (`base`, `horizon`).<br>
        `tags`: Each key is a level and each value its `S` indices.<br>
        `idx_bottom`: Indices corresponding to the bottom level of `S`, size (`bottom`).<br>
        `y_insample`: Insample values of size (`base`, `insample_size`). Optional for `forecast_proportions` method.<br>
        `y_hat_insample`: Insample forecast values of size (`base`, `insample_size`). Optional for `forecast_proportions` method.<br>
        `sigmah`: Estimated standard deviation of the conditional marginal distribution.<br>
        `level`: float list 0-100, confidence levels for prediction intervals.<br>
        `intervals_method`: Sampler for prediction intervals, one of `normality`, `bootstrap`, `permbu`.<br>
        `num_samples`: Number of samples for probabilistic coherent distribution.<br>
        `seed`: Seed for reproducibility.<br>

        **Returns:**<br>
        `y_tilde`: Reconciled y_hat using the Top Down approach.
        """
        if self.method == "forecast_proportions":
            idx_top = int(S.sum(axis=1).argmax())
            levels_ = dict(sorted(tags.items(), key=lambda x: len(x[1])))
            if level is not None:
                raise ValueError("Prediction intervals not implemented for `forecast_proportions`")
            nodes = _get_child_nodes(S=S, tags=levels_)
            reconciled = [
                _reconcile_fcst_proportions(
                    S=S,
                    y_hat=y_hat_[:, None],
                    tags=levels_,
                    nodes=nodes,
                    idx_top=idx_top,
                )
                for y_hat_ in y_hat.T
            ]
            reconciled = np.hstack(reconciled)
            return {"mean": reconciled}
        else:
            # Fit creates P, W and sampler attributes
            self.fit(
                S=S,
                y_hat=y_hat,
                y_insample=y_insample,
                y_hat_insample=y_hat_insample,
                sigmah=sigmah,
                intervals_method=intervals_method,
                num_samples=num_samples,
                seed=seed,
                tags=tags,
                idx_bottom=idx_bottom,
            )
            return self._reconcile(
                S=S, P=self.P, y_hat=y_hat, level=level, sampler=self.sampler
            )

    __call__ = fit_predict

In [None]:
show_doc(TopDown, title_level=3)

In [None]:
show_doc(TopDown.fit, name="TopDown.fit", title_level=3)

In [None]:
show_doc(TopDown.predict, name="TopDown.predict", title_level=3)

In [None]:
show_doc(TopDown.fit_predict, name="TopDown.fit_predict", title_level=3)

In [None]:
show_doc(TopDown.sample, name="TopDown.sample", title_level=3)
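
The `forecast_proportions` option splits each parent's reconciled forecast among its children in proportion to the children's base forecasts. The sketch below walks a small tree by hand for a single horizon step; the tree, the node indices, and the forecast values are illustrative assumptions, and the loop is a simplified stand-in for `_reconcile_fcst_proportions`.

```python
import numpy as np

# Hypothetical tree: node 0 is the total, nodes 1-2 are its children,
# nodes 3-4 are children of node 1, and nodes 5-6 are children of node 2.
children = {0: [1, 2], 1: [3, 4], 2: [5, 6]}

# Incoherent base forecasts for one horizon step.
y_hat = np.array([100.0, 30.0, 50.0, 10.0, 10.0, 20.0, 20.0])

y_tilde = np.zeros_like(y_hat)
y_tilde[0] = y_hat[0]  # the top-level forecast is kept as is

# Parents are visited before their children (dicts keep insertion order).
for parent, kids in children.items():
    kids_sum = y_hat[kids].sum()
    for kid in kids:
        if abs(kids_sum) < 1e-8:
            # Degenerate case: split the parent equally among its children.
            y_tilde[kid] = y_tilde[parent] / len(kids)
        else:
            # Share of the parent proportional to the child's base forecast.
            y_tilde[kid] = y_hat[kid] * y_tilde[parent] / kids_sum
```

At every level the children's reconciled forecasts sum to their parent's reconciled forecast, so the result is coherent by construction.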

In [None]:
#| export
class TopDownSparse(TopDown):
    """TopDownSparse Reconciliation Class.

    This is an implementation of top-down reconciliation using the sparse matrix
    approach. It works much more efficiently on data sets with many time series.

    See the parent class for more details.
    """

    is_sparse_method = True
    is_strictly_hierarchical = False

    def _get_PW_matrices(
        self,
        S: sparse.csr_matrix,
        y_hat: np.ndarray,
        y_insample: np.ndarray,
        tags: Optional[dict[str, np.ndarray]] = None,
    ):
        # Avoid a redundant check during middle-out reconciliation.
        if not self.is_strictly_hierarchical:
            # Check if the data structure is strictly hierarchical.
            if tags is not None and not _is_strictly_hierarchical(
                _construct_adjacency_matrix(S, tags)
            ):
                raise ValueError(
                    "Top-down reconciliation requires strictly hierarchical structures."
                )

        # Get the dimensions of the "summing" matrix.
        n_hiers, n_bottom = S.shape

        # Get the in-sample values of the top node and bottom nodes.
        y_top = y_insample[0]
        y_btm = y_insample[(n_hiers - n_bottom) :]

        # Calculate the disaggregation proportions.
        if self.method == "average_proportions":
            prop = np.mean(y_btm / y_top, 1)
        elif self.method == "proportion_averages":
            prop = np.mean(y_btm, 1) / np.mean(y_top)
        elif self.method == "forecast_proportions":
            raise ValueError(f"Fit method not yet implemented for {self.method}.")
        else:
            raise ValueError(f"{self.method} is an unknown disaggregation method.")

        # Instantiate and allocate the "projection" matrix to distribute the
        # disaggregated base forecast of the top node to the bottom nodes.
        P = sparse.csr_matrix(
            (
                prop,
                np.zeros_like(prop, np.uint8),
                np.arange(len(prop) + 1, dtype=np.min_scalar_type(n_bottom)),
            ),
            shape=(n_bottom, n_hiers),
            dtype=np.float64,
        )

        # Instantiate and allocate the "weight" matrix.
        if getattr(self, "intervals_method", False) is None:
            W = None
        else:
            W = sparse.eye(n_hiers, dtype=np.float64, format="csr")

        return P, W

    def fit_predict(
        self,
        S: sparse.csr_matrix,
        y_hat: np.ndarray,
        tags: dict[str, np.ndarray],
        idx_bottom: np.ndarray = None,
        y_insample: Optional[np.ndarray] = None,
        y_hat_insample: Optional[np.ndarray] = None,
        sigmah: Optional[np.ndarray] = None,
        level: Optional[list[int]] = None,
        intervals_method: Optional[str] = None,
        num_samples: Optional[int] = None,
        seed: Optional[int] = None,
    ) -> dict[str, np.ndarray]:
        if self.method == "forecast_proportions":
            # Check if probabilistic reconciliation is required.
            if level is not None:
                raise NotImplementedError(
                    "Prediction intervals are not implemented for `forecast_proportions`."
                )
            # Construct the adjacency matrix.
            A = _construct_adjacency_matrix(S, tags)
            # Avoid a redundant check during middle-out reconciliation.
            if not self.is_strictly_hierarchical:
                # Check if the data structure is strictly hierarchical.
                if tags is not None and not _is_strictly_hierarchical(A):
                    raise ValueError(
                        "Top-down reconciliation requires strictly hierarchical structures."
                    )
            # As we may have zero sibling sums, replace any zeroes with eps.
            y_hat[y_hat == 0.0] = np.finfo(np.float64).eps
            # Calculate the relative proportions for each node.
            with np.errstate(divide="ignore"):
                P = y_hat / ((A.T @ A) @ y_hat)
            # Set the relative proportion of the root node.
            P[P == np.inf] = 1.0
            # Precompute the transpose of the summing matrix.
            S_T = S.T
            # Propagate the relative proportions for the nodes along each leaf
            # node's disaggregation pathway, convert the resultant sparse
            # matrix to a LIL matrix for an efficient dense conversion, stack
            # the lists, calculate the row-wise product to get the forecast
            # proportions, and use these to reconcile the forecasts.
            y_tilde = np.array(
                [
                    S
                    @ (
                        np.prod(np.vstack(S_T.multiply(P[:, i]).tolil().data), 1)
                        * y_hat[0, i]
                    )
                    for i in range(y_hat.shape[1])
                ]
            ).T
            return {"mean": y_tilde}
        else:
            # Fit creates the P, W, and sampler attributes.
            self.fit(
                S=S,
                y_hat=y_hat,
                y_insample=y_insample,
                y_hat_insample=y_hat_insample,
                sigmah=sigmah,
                intervals_method=intervals_method,
                num_samples=num_samples,
                seed=seed,
                tags=tags,
                idx_bottom=idx_bottom,
            )
            return self._reconcile(
                S=S, P=self.P, y_hat=y_hat, level=level, sampler=self.sampler
            )

    __call__ = fit_predict

In [None]:
show_doc(TopDownSparse, title_level=3)

In [None]:
show_doc(TopDownSparse.fit, name="TopDownSparse.fit", title_level=3)

In [None]:
show_doc(TopDownSparse.predict, name="TopDownSparse.predict", title_level=3)

In [None]:
show_doc(TopDownSparse.fit_predict, name="TopDownSparse.fit_predict", title_level=3)

In [None]:
show_doc(TopDownSparse.sample, name="TopDownSparse.sample", title_level=3)
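
The `forecast_proportions` branch of `TopDownSparse.fit_predict` above can be illustrated with a small dense example. This is a minimal sketch, not the library's implementation: the seven-node hierarchy, the forecast values, and the adjacency matrix `A` are made up for illustration, and `A` is assumed to hold `A[i, j] = 1` when node `i` is the parent of node `j`.

```python
import numpy as np

# Hypothetical hierarchy: total -> (A, B) -> (A1, A2, B1, B2).
# Rows of S: total, A, B, A1, A2, B1, B2; columns: the four leaves.
S = np.array(
    [
        [1, 1, 1, 1],
        [1, 1, 0, 0],
        [0, 0, 1, 1],
        [1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 1, 0],
        [0, 0, 0, 1],
    ],
    dtype=float,
)

# Assumed orientation: A[i, j] = 1 when node i is the parent of node j.
A = np.zeros((7, 7))
A[0, [1, 2]] = 1
A[1, [3, 4]] = 1
A[2, [5, 6]] = 1

# Made-up, incoherent base forecasts for a single horizon step.
y_hat = np.array([100.0, 60.0, 30.0, 20.0, 20.0, 5.0, 10.0])

# (A.T @ A) links every node to its siblings (itself included), so this is
# the sum of forecasts within each sibling group; the root has no parent
# and therefore gets a sum of zero.
sibling_sum = (A.T @ A) @ y_hat
with np.errstate(divide="ignore"):
    P = y_hat / sibling_sum
# The root's proportion is 1 by definition.
P[np.isinf(P)] = 1.0

# For each leaf, multiply the proportions of all nodes on its path to the root.
path_props = np.where(S.T == 1, P, 1.0).prod(axis=1)
bottom_tilde = path_props * y_hat[0]
y_tilde = S @ bottom_tilde
```

Because the proportions within each sibling group sum to one, the leaf forecasts add up to the root forecast, which is the only base forecast that top-down reconciliation leaves unchanged.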

In [None]:
#| hide
# we are able to recover forecasts
# from top_down in this example
# because the time series
# share the same proportion
# across time
# but it is not a general case
for method in ["forecast_proportions", "average_proportions", "proportion_averages"]:
    cls_top_down = TopDown(method=method)
    if cls_top_down.insample:
        assert method in ["average_proportions", "proportion_averages"]
        test_close(
            cls_top_down(
                S=S, y_hat=S @ y_hat_bottom, y_insample=S @ y_bottom, tags=tags
            )["mean"],
            S @ y_hat_bottom,
        )
    else:
        test_close(
            cls_top_down(S=S, y_hat=S @ y_hat_bottom, tags=tags)["mean"],
            S @ y_hat_bottom,
        )

In [None]:
cls_top_down(S=S, y_hat=S @ y_hat_bottom, y_insample=S @ y_bottom, tags=tags)["mean"]

In [None]:
#| hide
cls_top_down = TopDownSparse(method="average_proportions")
test_fail(
    cls_top_down,
    contains="Top-down reconciliation requires strictly hierarchical structures.",
    args=(sparse.csr_matrix(S_non_hier), None, tags_non_hier),
)

In [None]:
#| hide
for method in ["forecast_proportions", "average_proportions", "proportion_averages"]:
    cls_top_down = TopDownSparse(method=method)
    if cls_top_down.insample:
        assert method in ["average_proportions", "proportion_averages"]
        test_close(
            cls_top_down(
                S=sparse.csr_matrix(S),
                y_hat=S @ y_hat_bottom,
                y_insample=S @ y_bottom,
                tags=tags,
            )["mean"],
            S @ y_hat_bottom,
        )
    else:
        test_close(
            cls_top_down(S=sparse.csr_matrix(S), y_hat=S @ y_hat_bottom, tags=tags)[
                "mean"
            ],
            S @ y_hat_bottom,
        )

In [None]:
#| hide
for method in ["forecast_proportions", "average_proportions", "proportion_averages"]:
    cls_top_down = TopDown(method=method)
    cls_top_down_sparse = TopDownSparse(method=method)
    if cls_top_down.insample:
        assert method in ["average_proportions", "proportion_averages"]
        test_close(
            cls_top_down(
                S=S,
                y_hat=S @ y_hat_bottom,
                y_insample=S @ y_bottom,
                tags=tags,
            )["mean"],
            cls_top_down_sparse(
                S=sparse.csr_matrix(S),
                y_hat=S @ y_hat_bottom,
                y_insample=S @ y_bottom,
                tags=tags,
            )["mean"],
            np.finfo(np.float64).eps,
        )
    else:
        test_close(
            cls_top_down(S=S, y_hat=S @ y_hat_bottom, tags=tags)["mean"],
            cls_top_down_sparse(
                S=sparse.csr_matrix(S),
                y_hat=S @ y_hat_bottom,
                y_insample=S @ y_bottom,
                tags=tags,
            )["mean"],
            1e-9,
        )

In [None]:
#| hide
for method in ["average_proportions", "proportion_averages"]:
    cls_top_down = TopDown(method=method)
    y_insample_orig = S @ y_bottom
    y_insample_orig[-1, :] = 0
    result_orig = cls_top_down(
        S=S, y_hat=S @ y_hat_bottom, y_insample=y_insample_orig, tags=tags
    )["mean"]
    y_insample_nan = y_insample_orig.copy()
    y_insample_nan[-1, 0] = np.nan
    result_nan = cls_top_down(
        S=S, y_hat=S @ y_hat_bottom, y_insample=y_insample_nan, tags=tags
    )["mean"]
    test_close(result_orig, result_nan)


# 3. Middle-Out
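
Middle-out reconciliation keeps the base forecasts of a chosen middle level, aggregates them upwards, and disaggregates them downwards. The following is a minimal sketch of that idea on a made-up three-level hierarchy, using forecast proportions for the downward step. The hierarchy, the values, and the variable names are assumptions for illustration and do not mirror the `MiddleOut` class internals.

```python
import numpy as np

# Hypothetical hierarchy: total -> (A, B) -> (A1, A2, B1, B2).
# Rows of S: total, A, B, A1, A2, B1, B2; columns: the four bottom series.
S = np.array(
    [
        [1, 1, 1, 1],
        [1, 1, 0, 0],
        [0, 0, 1, 1],
        [1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 1, 0],
        [0, 0, 0, 1],
    ],
    dtype=float,
)
middle_idx = np.array([1, 2])  # rows of the middle level (A and B)
bottom_idx = np.arange(3, 7)  # rows of the bottom level

# Made-up, incoherent base forecasts for a single horizon step.
y_hat = np.array([100.0, 60.0, 30.0, 20.0, 20.0, 5.0, 10.0])

# Membership of each bottom series in its middle-level parent.
S_mid = S[middle_idx]

# Top-down step: split each middle forecast among its children in
# proportion to the children's own base forecasts.
bottom_hat = y_hat[bottom_idx]
sibling_sum = S_mid.T @ (S_mid @ bottom_hat)
parent_forecast = S_mid.T @ y_hat[middle_idx]
bottom_tilde = bottom_hat / sibling_sum * parent_forecast

# Bottom-up step: every level, including the total, is the sum of the
# reconciled bottom series, which reproduces the middle-level forecasts.
y_tilde = S @ bottom_tilde
```

In this example the middle-level forecasts (60 and 30) survive unchanged, while the total base forecast of 100 is replaced by their sum.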

In [None]:
#| export
class MiddleOut(HReconciler):
    """Middle Out Reconciliation Class.

    This method is only available for **strictly hierarchical structures**. It anchors the base predictions
    in a middle level. The levels above the base predictions use the Bottom-Up approach, while the levels
    below use the Top-Down approach.

    **Parameters:**<br>
    `middle_level`: Middle level.<br>
    `top_down_method`: One of `forecast_proportions`, `average_proportions` and `proportion_averages`.<br>

    **References:**<br>
    - [Hyndman, R.J., & Athanasopoulos, G. (2021). \"Forecasting: principles and practice, 3rd edition:
    Chapter 11: Forecasting hierarchical and grouped series.\". OTexts: Melbourne, Australia. OTexts.com/fpp3
    Accessed on July 2022.](https://otexts.com/fpp3/hierarchical.html)

    """

    def __init__(self, middle_level: str, top_down_method: str):
        self.middle_level = middle_level
        self.top_down_method = top_down_method
        self.insample = top_down_method in [
            "average_proportions",
            "proportion_averages",
        ]

    def _get_PW_matrices(self, **kwargs):
        raise NotImplementedError("Not implemented")

    def fit(self, **kwargs):
        raise NotImplementedError("Not implemented")

    def predict(self, **kwargs):
        raise NotImplementedError("Not implemented")

    def fit_predict(
        self,
        S: np.ndarray,
        y_hat: np.ndarray,
        tags: dict[str, np.ndarray],
        y_insample: Optional[np.ndarray] = None,
        level: Optional[list[int]] = None,
        intervals_method: Optional[str] = None,
    ):
        """Middle Out Reconciliation Method.

        **Parameters:**<br>
        `S`: Summing matrix of size (`base`, `bottom`).<br>
        `y_hat`: Forecast values of size (`base`, `horizon`).<br>
        `tags`: Each key is a level and each value its `S` indices.<br>
        `y_insample`: Insample values of size (`base`, `insample_size`). Only used for `average_proportions` and `proportion_averages`.<br>
        `level`: Not supported. <br>
        `intervals_method`: Not supported.<br>

        **Returns:**<br>
        `y_tilde`: Reconciled y_hat using the Middle Out approach.
        """
        if level is not None or intervals_method is not None:
            raise NotImplementedError("Prediction intervals are not implemented for `MiddleOut`")

        if not is_strictly_hierarchical(S, tags):
            raise ValueError(
                "Middle out reconciliation requires strictly hierarchical structures."
            )
        if self.middle_level not in tags.keys():
            raise ValueError("You have to provide a `middle_level` in `tags`.")

        levels_ = dict(sorted(tags.items(), key=lambda x: len(x[1])))
        reconciled = np.full_like(y_hat, fill_value=np.nan)
        cut_nodes = levels_[self.middle_level]
        # bottom up reconciliation
        idxs_bu = []
        for node, idx_node in levels_.items():
            idxs_bu.append(idx_node)
            if node == self.middle_level:
                break
        idxs_bu = np.hstack(idxs_bu)
        # bottom up forecasts
        bu = BottomUp().fit_predict(
            S=np.fliplr(np.unique(S[idxs_bu], axis=1)),
            y_hat=y_hat[idxs_bu],
            idx_bottom=np.arange(len(idxs_bu))[-len(cut_nodes) :],
        )
        reconciled[idxs_bu] = bu["mean"]

        # top down
        child_nodes = _get_child_nodes(S, levels_)
        # parents contains each node in the middle out level
        # as key. The values of each node are the levels that
        # are connected to that node.
        parents = {node: {self.middle_level: np.array([node])} for node in cut_nodes}
        level_names = list(levels_.keys())
        for lv, lv_child in zip(level_names[:-1], level_names[1:]):
            # if lv is not part of the middle out to bottom
            # structure we continue
            if lv not in list(parents.values())[0].keys():
                continue
            for idx_middle_out in parents.keys():
                idxs_parents = parents[idx_middle_out].values()
                complete_idxs_child = []
                for idx_parent, idxs_child in child_nodes[lv].items():
                    if any(idx_parent in val for val in idxs_parents):
                        complete_idxs_child.append(idxs_child)
                parents[idx_middle_out][lv_child] = np.hstack(complete_idxs_child)

        for node, levels_node in parents.items():
            idxs_node = np.hstack(list(levels_node.values()))
            S_node = S[idxs_node]
            S_node = S_node[:, ~np.all(S_node == 0, axis=0)]
            counter = 0
            levels_node_ = deepcopy(levels_node)
            for lv_name, idxs_level in levels_node_.items():
                idxs_len = len(idxs_level)
                levels_node_[lv_name] = np.arange(counter, idxs_len + counter)
                counter += idxs_len
            td = TopDown(self.top_down_method).fit_predict(
                S=S_node,
                y_hat=y_hat[idxs_node],
                y_insample=y_insample[idxs_node] if y_insample is not None else None,
                tags=levels_node_,
            )
            reconciled[idxs_node] = td["mean"]
        return {"mean": reconciled}

    __call__ = fit_predict

In [None]:
show_doc(MiddleOut, title_level=3)

In [None]:
show_doc(MiddleOut.fit, name="MiddleOut.fit", title_level=3)

In [None]:
show_doc(MiddleOut.predict, name="MiddleOut.predict", title_level=3)

In [None]:
show_doc(MiddleOut.fit_predict, name="MiddleOut.fit_predict", title_level=3)

In [None]:
show_doc(MiddleOut.sample, name="MiddleOut.sample", title_level=3)

In [None]:
#| export
class MiddleOutSparse(MiddleOut):
    """MiddleOutSparse Reconciliation Class.

    This is an implementation of middle-out reconciliation using the sparse matrix
    approach. It works much more efficiently on data sets with many time series.

    See the parent class for more details.
    """

    # Although this is a sparse method, we need a dense representation of the
    # "summing" matrix for the required transformations in the fit_predict method
    # prior to bottom-up and top-down reconciliation, so flagging it as non-sparse
    # avoids a redundant conversion.
    is_sparse_method = False

    def fit_predict(
        self,
        S: np.ndarray,
        y_hat: np.ndarray,
        tags: dict[str, np.ndarray],
        y_insample: Optional[np.ndarray] = None,
        level: Optional[list[int]] = None,
        intervals_method: Optional[str] = None,
    ) -> dict[str, np.ndarray]:
        # Check if probabilistic reconciliation is required.
        if level is not None or intervals_method is not None:
            raise NotImplementedError(
                "Prediction intervals are not implemented for `MiddleOutSparse`."
            )
        # Check if the middle level exists in the level to nodes mapping.
        if self.middle_level not in tags.keys():
            raise KeyError(f"{self.middle_level} is not a key in `tags`.")
        # Check if the data structure is strictly hierarchical.
        if not _is_strictly_hierarchical(
            _construct_adjacency_matrix(sparse.csr_matrix(S), tags)
        ):
            raise ValueError(
                "Middle-out reconciliation requires strictly hierarchical structures."
            )

        # Sort the levels by the number of nodes.
        levels = dict(sorted(tags.items(), key=lambda x: len(x[1])))
        # Allocate an array to store the reconciled point forecasts.
        y_tilde = np.full_like(y_hat, np.nan)
        # Find the nodes that constitute the middle level.
        cut_nodes = levels[self.middle_level]

        # Calculate the cut that separates the middle level from the lower levels.
        cut_idx = max(cut_nodes) + 1

        # Perform sparse bottom-up reconciliation from the middle level.
        y_tilde[:cut_idx, :] = BottomUpSparse().fit_predict(
            S=sparse.csr_matrix(np.fliplr(np.unique(S[:cut_idx, :], axis=1))),
            y_hat=y_hat[:cut_idx, :],
            idx_bottom=None,
        )["mean"]

        # Set up the reconciler for top-down reconciliation.
        cls_top_down = TopDownSparse(self.top_down_method)
        cls_top_down.is_strictly_hierarchical = True

        # Perform sparse top-down reconciliation from the middle level.
        for cut_node in cut_nodes:
            # Find the leaf nodes of the subgraph for the cut node.
            leaf_idx = np.flatnonzero(S[cut_node, :])
            # Find all the nodes in the subgraph for the cut node.
            sub_idx = np.hstack(
                (cut_node, cut_idx + np.flatnonzero((np.any(S[cut_idx:, leaf_idx], 1))))
            )

            # Construct the "tags" argument for the cut node.
            if self.insample:
                # It is not required for in-sample disaggregation methods.
                sub_tags = None
            else:
                # Disaggregating using forecast proportions requires the "tags" for
                # the subgraph.
                sub_tags = {}
                acc = 0
                for level_, nodes in levels.items():
                    # Find all the nodes in the subgraph for the level.
                    nodes = np.intersect1d(nodes, sub_idx, True)
                    # Get the number of nodes in the level.
                    n = len(nodes)
                    # Exclude any levels above the cut node or empty ones below.
                    if n > 0:
                        sub_tags[level_] = np.arange(acc, n + acc)
                        acc += n

            # Perform sparse top-down reconciliation from the cut node.
            y_tilde[sub_idx, :] = cls_top_down.fit_predict(
                S=sparse.csr_matrix(S[sub_idx[:, None], leaf_idx]),
                y_hat=y_hat[sub_idx, :],
                y_insample=y_insample[sub_idx, :] if y_insample is not None else None,
                tags=sub_tags,
            )["mean"]

        return {"mean": y_tilde}

    __call__ = fit_predict

In [None]:
show_doc(MiddleOutSparse, title_level=3)

In [None]:
show_doc(MiddleOutSparse.fit, name="MiddleOutSparse.fit", title_level=3)

In [None]:
show_doc(MiddleOutSparse.predict, name="MiddleOutSparse.predict", title_level=3)

In [None]:
show_doc(MiddleOutSparse.fit_predict, name="MiddleOutSparse.fit_predict", title_level=3)

In [None]:
show_doc(MiddleOutSparse.sample, name="MiddleOutSparse.sample", title_level=3)

In [None]:
#| hide
# we are able to recover forecasts
# from middle out in this example
# because the time series
# share the same proportion
# across time
# but it is not a general case
for method in ["forecast_proportions", "average_proportions", "proportion_averages"]:
    cls_middle_out = MiddleOut(middle_level="level2", top_down_method=method)
    if cls_middle_out.insample:
        assert method in ["average_proportions", "proportion_averages"]
        test_close(
            cls_middle_out(
                S=S, y_hat=S @ y_hat_bottom, y_insample=S @ y_bottom, tags=tags
            )["mean"],
            S @ y_hat_bottom,
        )
    else:
        test_close(
            cls_middle_out(S=S, y_hat=S @ y_hat_bottom, tags=tags)["mean"],
            S @ y_hat_bottom,
        )

In [None]:
#| hide
for method in ["forecast_proportions", "average_proportions", "proportion_averages"]:
    cls_middle_out = MiddleOutSparse(middle_level="level2", top_down_method=method)
    if cls_middle_out.insample:
        assert method in ["average_proportions", "proportion_averages"]
        test_close(
            cls_middle_out(
                S=S,
                y_hat=S @ y_hat_bottom,
                y_insample=S @ y_bottom,
                tags=tags,
            )["mean"],
            S @ y_hat_bottom,
        )
    else:
        test_close(
            cls_middle_out(S=S, y_hat=S @ y_hat_bottom, tags=tags)["mean"],
            S @ y_hat_bottom,
        )

In [None]:
#| hide
for method in ["forecast_proportions", "average_proportions", "proportion_averages"]:
    cls_middle_out = MiddleOut(middle_level="level2", top_down_method=method)
    cls_middle_out_sparse = MiddleOutSparse(
        middle_level="level2", top_down_method=method
    )
    if cls_middle_out.insample:
        assert method in ["average_proportions", "proportion_averages"]
        test_close(
            cls_middle_out(
                S=S,
                y_hat=S @ y_hat_bottom,
                y_insample=S @ y_bottom,
                tags=tags,
            )["mean"],
            cls_middle_out_sparse(
                S=S,
                y_hat=S @ y_hat_bottom,
                y_insample=S @ y_bottom,
                tags=tags,
            )["mean"],
            np.finfo(np.float64).eps,
        )
    else:
        test_close(
            cls_middle_out(S=S, y_hat=S @ y_hat_bottom, tags=tags)["mean"],
            cls_middle_out_sparse(
                S=S,
                y_hat=S @ y_hat_bottom,
                y_insample=S @ y_bottom,
                tags=tags,
            )["mean"],
            np.finfo(np.float64).eps,
        )

# 4. Min-Trace
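
MinTrace reconciliation is a generalized least squares projection of the base forecasts onto the coherent subspace. As a minimal sketch, assuming a made-up two-leaf hierarchy and a hypothetical diagonal covariance estimate `W` (similar in spirit to what `wls_var` produces), the closed form $\mathbf{P} = (\mathbf{S}^{\intercal}\mathbf{W}^{-1}\mathbf{S})^{-1}\mathbf{S}^{\intercal}\mathbf{W}^{-1}$ can be evaluated directly with `numpy`:

```python
import numpy as np

# Hypothetical hierarchy: total = A + B. Rows of S: total, A, B.
S = np.array([[1, 1], [1, 0], [0, 1]], dtype=float)

# Hypothetical diagonal error covariance (larger variance = less trusted).
W = np.diag([4.0, 1.0, 2.0])
W_inv = np.linalg.inv(W)

# Closed-form MinT projection: P = (S' W^-1 S)^-1 S' W^-1.
P = np.linalg.solve(S.T @ W_inv @ S, S.T @ W_inv)

# Made-up, incoherent base forecasts for one horizon step (10 != 4 + 5).
y_hat = np.array([10.0, 4.0, 5.0])
y_tilde = S @ P @ y_hat

# The reconciled forecasts are coherent: total equals A + B.
coherent = np.isclose(y_tilde[0], y_tilde[1] + y_tilde[2])
```

The property `P @ S = I` is the unbiasedness condition: forecasts that are already coherent pass through the projection unchanged.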

In [None]:
#| export
class MinTrace(HReconciler):
    """MinTrace Reconciliation Class.

    This reconciliation algorithm, proposed by Wickramasuriya et al., depends on a generalized least squares estimator
    and an estimator of the covariance matrix of the coherency errors $\mathbf{W}_{h}$. The Min Trace algorithm
    minimizes the squared errors for the coherent forecasts under an unbiasedness assumption; the solution has a
    closed form.<br>

    $$
    \mathbf{P}_{\\text{MinT}}=\\left(\mathbf{S}^{\intercal}\mathbf{W}^{-1}_{h}\mathbf{S}\\right)^{-1}
    \mathbf{S}^{\intercal}\mathbf{W}^{-1}_{h}
    $$

    **Parameters:**<br>
    `method`: str, one of `ols`, `wls_struct`, `wls_var`, `mint_shrink`, `mint_cov`.<br>
    `nonnegative`: bool, whether the reconciled forecasts should be nonnegative.<br>
    `mint_shr_ridge`: float=2e-8, ridge term for numerical protection of the MinTrace-shr covariance estimator.<br>
    `num_threads`: int=1, number of threads to use for solving the optimization problems (when nonnegative=True).

    **References:**<br>
    - [Wickramasuriya, S. L., Athanasopoulos, G., & Hyndman, R. J. (2019). \"Optimal forecast reconciliation for
    hierarchical and grouped time series through trace minimization\". Journal of the American Statistical Association,
    114, 804–819. doi:10.1080/01621459.2018.1448825.](https://robjhyndman.com/publications/mint/).
    - [Wickramasuriya, S.L., Turlach, B.A. & Hyndman, R.J. (2020). \"Optimal non-negative
    forecast reconciliation\". Stat Comput 30, 1167–1182,
    https://doi.org/10.1007/s11222-020-09930-0](https://robjhyndman.com/publications/nnmint/).
    """

    def __init__(
        self,
        method: str,
        nonnegative: bool = False,
        mint_shr_ridge: Optional[float] = 2e-8,
        num_threads: int = 1,
    ):
        self.method = method
        self.nonnegative = nonnegative
        self.insample = method in ["wls_var", "mint_cov", "mint_shrink"]
        if method == "mint_shrink":
            self.mint_shr_ridge = mint_shr_ridge
        self.num_threads = num_threads
        if not self.nonnegative and self.num_threads > 1:
            warnings.warn("`num_threads` is only used when `nonnegative=True`")

    def _get_PW_matrices(
        self,
        S: np.ndarray,
        y_hat: np.ndarray,
        y_insample: Optional[np.ndarray] = None,
        y_hat_insample: Optional[np.ndarray] = None,
        idx_bottom: Optional[list[int]] = None,
    ):
        # shape residuals_insample (n_hiers, obs)
        res_methods = ["wls_var", "mint_cov", "mint_shrink"]
        if self.method in res_methods and (y_insample is None or y_hat_insample is None):
            raise ValueError(
                f"Check `Y_df`. For method `{self.method}` you need to pass insample predictions and insample values."
            )
        n_hiers, n_bottom = S.shape
        n_aggs = n_hiers - n_bottom
        # Construct J and U.T
        J = np.concatenate(
            (np.zeros((n_bottom, n_aggs), dtype=np.float64), S[n_aggs:]), axis=1
        )
        Ut = np.concatenate((np.eye(n_aggs, dtype=np.float64), -S[:n_aggs]), axis=1)
        if self.method == "ols":
            W = np.eye(n_hiers)
            UtW = Ut
        elif self.method == "wls_struct":
            Wdiag = np.sum(S, axis=1, dtype=np.float64)
            UtW = Ut * Wdiag
            W = np.diag(Wdiag)
        elif (
            self.method in res_methods
            and y_insample is not None
            and y_hat_insample is not None
        ):
            # Residuals with shape (obs, n_hiers)
            residuals = (y_insample - y_hat_insample).T
            n, _ = residuals.shape

            # Protection: against overfitted model
            residuals_sum = np.sum(residuals, axis=0)
            zero_residual_prc = np.abs(residuals_sum) < 1e-4
            zero_residual_prc = np.mean(zero_residual_prc)
            if zero_residual_prc > 0.98:
                raise Exception(
                    f"Insample residuals close to 0, zero_residual_prc={zero_residual_prc}. Check `Y_df`"
                )

            if self.method == "wls_var":
                Wdiag = (
                    np.nansum(residuals**2, axis=0, dtype=np.float64)
                    / residuals.shape[0]
                )
                Wdiag += np.full(n_hiers, 2e-8, dtype=np.float64)
                W = np.diag(Wdiag)
                UtW = Ut * Wdiag
            elif self.method == "mint_cov":
                # Compute nans
                nan_mask = np.isnan(residuals.T)
                if np.any(nan_mask):
                    W = _ma_cov(residuals.T, ~nan_mask)
                else:
                    W = np.cov(residuals.T)

                UtW = Ut @ W
            elif self.method == "mint_shrink":
                # Compute nans
                nan_mask = np.isnan(residuals.T)
                # Compute shrunk empirical covariance
                if np.any(nan_mask):
                    W = _shrunk_covariance_schaferstrimmer_with_nans(
                        residuals.T, ~nan_mask, self.mint_shr_ridge
                    )
                else:
                    W = _shrunk_covariance_schaferstrimmer_no_nans(
                        residuals.T, self.mint_shr_ridge
                    )

                UtW = Ut @ W
        else:
            raise ValueError(f"Unknown reconciliation method {self.method}")

        try:
            P = (
                J
                - np.linalg.solve(
                    UtW[:, n_aggs:] @ Ut.T[n_aggs:] + UtW[:, :n_aggs],
                    UtW[:, n_aggs:] @ J.T[n_aggs:],
                ).T
                @ Ut
            )
        except np.linalg.LinAlgError:
            if self.method == "mint_shrink":
                raise Exception(
                    f"min_trace ({self.method}) is ill-conditioned. Increase the value of parameter 'mint_shr_ridge' or use another reconciliation method."
                )
            else:
                raise Exception(
                    f"min_trace ({self.method}) is ill-conditioned. Please use another reconciliation method."
                )

        return P, W

    def fit(
        self,
        S,
        y_hat,
        y_insample: Optional[np.ndarray] = None,
        y_hat_insample: Optional[np.ndarray] = None,
        sigmah: Optional[np.ndarray] = None,
        intervals_method: Optional[str] = None,
        num_samples: Optional[int] = None,
        seed: Optional[int] = None,
        tags: Optional[dict[str, np.ndarray]] = None,
        idx_bottom: Optional[np.ndarray] = None,
    ):
        """MinTrace Fit Method.

        **Parameters:**<br>
        `S`: Summing matrix of size (`base`, `bottom`).<br>
        `y_hat`: Forecast values of size (`base`, `horizon`).<br>
        `y_insample`: Insample values of size (`base`, `insample_size`). Only used by `wls_var`, `mint_cov`, `mint_shrink`.<br>
        `y_hat_insample`: Insample forecast values of size (`base`, `insample_size`). Only used by `wls_var`, `mint_cov`, `mint_shrink`.<br>
        `sigmah`: Estimated standard deviation of the conditional marginal distribution.<br>
        `intervals_method`: Sampler for prediction intervals, one of `normality`, `bootstrap`, `permbu`.<br>
        `num_samples`: Number of samples for probabilistic coherent distribution.<br>
        `seed`: Seed for reproducibility.<br>
        `tags`: Each key is a level and each value its `S` indices.<br>
        `idx_bottom`: Indices corresponding to the bottom level of `S`, size (`bottom`).<br>

        **Returns:**<br>
        `self`: object, fitted reconciler.
        """
        self.y_hat = y_hat
        self.P, self.W = self._get_PW_matrices(
            S=S,
            y_hat=y_hat,
            y_insample=y_insample,
            y_hat_insample=y_hat_insample,
            idx_bottom=idx_bottom,
        )

        if self.nonnegative:
            _, n_bottom = S.shape
            W_inv = np.linalg.pinv(self.W)
            negatives = y_hat < 0
            if negatives.any():
                warnings.warn("Replacing negative forecasts with zero.")
                y_hat = np.copy(y_hat)
                y_hat[negatives] = 0.0
            # Quadratic programming formulation
            # here we are solving the quadratic programming problem
            # formulated in the original paper
            # https://robjhyndman.com/publications/nnmint/
            # The library quadprog was chosen
            # based on these benchmarks:
            # https://scaron.info/blog/quadratic-programming-in-python.html
            a = S.T @ W_inv
            G = a @ S
            try:
                _ = np.linalg.cholesky(G)
            except np.linalg.LinAlgError:
                raise Exception(
                    f"min_trace ({self.method}) is ill-conditioned. Try setting nonnegative=False or use another reconciliation method."
                )
            C = np.eye(n_bottom)
            b = np.zeros(n_bottom)
            # the quadratic programming problem
            # returns the forecasts of the bottom series
            if self.num_threads == 1:
                bottom_fcts = np.apply_along_axis(
                    lambda y_hat: solve_qp(G=G, a=a @ y_hat, C=C, b=b)[0],
                    axis=0,
                    arr=y_hat,
                )
            else:
                futures = []
                with ThreadPoolExecutor(self.num_threads) as executor:
                    for j in range(y_hat.shape[1]):
                        future = executor.submit(
                            solve_qp, G=G, a=a @ y_hat[:, j], C=C, b=b
                        )
                        futures.append(future)
                    bottom_fcts = np.hstack([f.result()[0][:, None] for f in futures])
            if not np.all(bottom_fcts > -1e-8):
                raise Exception("nonnegative optimization failed")
            # remove negative values close to zero
            bottom_fcts = np.clip(np.float32(bottom_fcts), a_min=0, a_max=None)
            self.y_hat = S @ bottom_fcts  # Hack

            # Overwrite P, W and sampler attributes with BottomUp's
            self.P, self.W = BottomUp()._get_PW_matrices(S=S, idx_bottom=idx_bottom)

        self.sampler = self._get_sampler(
            S=S,
            P=self.P,
            W=self.W,
            y_hat=y_hat,
            y_insample=y_insample,
            y_hat_insample=y_hat_insample,
            sigmah=sigmah,
            intervals_method=intervals_method,
            num_samples=num_samples,
            seed=seed,
            tags=tags,
        )
        self.fitted = True
        return self

    def fit_predict(
        self,
        S: np.ndarray,
        y_hat: np.ndarray,
        idx_bottom: Optional[np.ndarray] = None,
        y_insample: Optional[np.ndarray] = None,
        y_hat_insample: Optional[np.ndarray] = None,
        sigmah: Optional[np.ndarray] = None,
        level: Optional[list[int]] = None,
        intervals_method: Optional[str] = None,
        num_samples: Optional[int] = None,
        seed: Optional[int] = None,
        tags: Optional[dict[str, np.ndarray]] = None,
    ):
        """MinTrace Reconciliation Method.

        **Parameters:**<br>
        `S`: Summing matrix of size (`base`, `bottom`).<br>
        `y_hat`: Forecast values of size (`base`, `horizon`).<br>
        `idx_bottom`: Indices corresponding to the bottom level of `S`, size (`bottom`).<br>
        `y_insample`: Insample values of size (`base`, `insample_size`). Only used by `wls_var`, `mint_cov`, `mint_shrink`<br>
        `y_hat_insample`: Insample fitted values of size (`base`, `insample_size`). Only used by `wls_var`, `mint_cov`, `mint_shrink`<br>
        `sigmah`: Estimated standard deviation of the conditional marginal distribution.<br>
        `level`: float list 0-100, confidence levels for prediction intervals.<br>
        `intervals_method`: Sampler for prediction intervals, one of `normality`, `bootstrap`, `permbu`.<br>
        `num_samples`: Number of samples for probabilistic coherent distribution.<br>
        `seed`: Seed for reproducibility.<br>
        `tags`: Each key is a level and each value its `S` indices.<br>
        
        **Returns:**<br>
        `y_tilde`: Reconciled y_hat using the MinTrace approach.
        """
        if self.nonnegative:
            if (level is not None) and intervals_method in ["bootstrap", "permbu"]:
                raise ValueError(
                    "nonnegative reconciliation is not compatible with bootstrap or permbu forecasts"
                )
            if idx_bottom is None:
                raise ValueError("`idx_bottom` cannot be None with nonnegative reconciliation")

        # Fit creates P, W and sampler attributes
        self.fit(
            S=S,
            y_hat=y_hat,
            y_insample=y_insample,
            y_hat_insample=y_hat_insample,
            sigmah=sigmah,
            intervals_method=intervals_method,
            num_samples=num_samples,
            seed=seed,
            tags=tags,
            idx_bottom=idx_bottom,
        )

        return self._reconcile(
            S=S, P=self.P, y_hat=self.y_hat, level=level, sampler=self.sampler
        )

    __call__ = fit_predict

In [None]:
show_doc(MinTrace, title_level=3)

In [None]:
show_doc(MinTrace.fit, name="MinTrace.fit", title_level=3)

In [None]:
show_doc(MinTrace.predict, name="MinTrace.predict", title_level=3)

In [None]:
show_doc(MinTrace.fit_predict, name="MinTrace.fit_predict", title_level=3)

In [None]:
show_doc(MinTrace.sample, name="MinTrace.sample", title_level=3)
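
`MinTrace._get_PW_matrices` does not invert $\mathbf{W}$; it builds $\mathbf{P}$ from the matrices `J` and `Ut`, which only requires solving a system of size `n_aggs`. The sketch below checks numerically, on a made-up hierarchy with a random symmetric positive definite `W`, that this form agrees with the direct closed form. It assumes the aggregate rows of `S` come first and the bottom rows form an identity block; the data and the name `P_alt` are illustrative only.

```python
import numpy as np

# Hypothetical hierarchy: total -> (A, B) -> (A1, A2, B1, B2).
S = np.array(
    [
        [1, 1, 1, 1],
        [1, 1, 0, 0],
        [0, 0, 1, 1],
        [1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 1, 0],
        [0, 0, 0, 1],
    ],
    dtype=float,
)
n_hiers, n_bottom = S.shape
n_aggs = n_hiers - n_bottom

# Random symmetric positive definite covariance (illustrative only).
rng = np.random.default_rng(0)
B = rng.normal(size=(n_hiers, n_hiers))
W = B @ B.T + n_hiers * np.eye(n_hiers)

# Direct closed form: P = (S' W^-1 S)^-1 S' W^-1.
W_inv = np.linalg.inv(W)
P_direct = np.linalg.solve(S.T @ W_inv @ S, S.T @ W_inv)

# Alternative form: P = J - J W U (U' W U)^-1 U',
# with J selecting the bottom rows and U' encoding the aggregation constraints.
J = np.concatenate((np.zeros((n_bottom, n_aggs)), S[n_aggs:]), axis=1)
Ut = np.concatenate((np.eye(n_aggs), -S[:n_aggs]), axis=1)
UtW = Ut @ W
P_alt = J - np.linalg.solve(UtW @ Ut.T, UtW @ J.T).T @ Ut

same = np.allclose(P_direct, P_alt)
```

The alternative form solves an `n_aggs` by `n_aggs` system instead of inverting the full `n_hiers` by `n_hiers` covariance, which is usually the smaller problem in deep hierarchies.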

In [None]:
#| export
class MinTraceSparse(MinTrace):
    """MinTraceSparse Reconciliation Class.

    This is the implementation of OLS and WLS estimators using sparse matrices. It is not guaranteed
    to give identical results to the non-sparse version, but works much more efficiently on data sets
    with many time series.<br>

    See the parent class for more details.<br>

    **Parameters:**<br>
    `method`: str, one of `ols`, `wls_struct`, or `wls_var`.<br>
    `nonnegative`: bool, return non-negative reconciled forecasts.<br>
    `num_threads`: int, number of threads to execute non-negative quadratic programming calls.<br>
    `qp`: bool, implement the non-negativity constraint with a quadratic programming approach. Setting
    this to True generally gives better results, but at a higher computational cost.<br>
    """

    is_sparse_method = True

    def __init__(
        self,
        method: str,
        nonnegative: bool = False,
        num_threads: int = 1,
        qp: bool = True,
    ) -> None:
        if method not in ["ols", "wls_struct", "wls_var"]:
            raise NotImplementedError(
                f"`{method}` is not supported for MinTraceSparse. Choose from `ols`, `wls_struct`, or `wls_var`."
            )
        # Call the parent constructor.
        super().__init__(method, nonnegative, num_threads=num_threads)
        # Assign the attributes specific to the sparse class.
        self.qp = qp

    def _get_PW_matrices(
        self,
        S: Union[np.ndarray, sparse.spmatrix],
        y_hat: np.ndarray,
        y_insample: Optional[np.ndarray] = None,
        y_hat_insample: Optional[np.ndarray] = None,
        idx_bottom: Optional[list[int]] = None,
    ):
        # shape residuals_insample (n_hiers, obs)
        res_methods = ["wls_var", "mint_cov", "mint_shrink"]

        S = sparse.csr_matrix(S)

        if self.method in res_methods and (y_insample is None or y_hat_insample is None):
            raise ValueError(
                f"Check `Y_df`. For method `{self.method}` you need to pass insample predictions and insample values."
            )
        n_hiers, n_bottom = S.shape

        if self.method == "ols":
            W_diag = np.ones(n_hiers)
        elif self.method == "wls_struct":
            W_diag = S @ np.ones((n_bottom,))
        elif (
            self.method == "wls_var"
            and y_insample is not None
            and y_hat_insample is not None
        ):
            # Residuals with shape (obs, n_hiers)
            residuals = (y_insample - y_hat_insample).T
            n, _ = residuals.shape

            # Protection: against overfitted model
            residuals_sum = np.sum(residuals, axis=0)
            zero_residual_prc = np.abs(residuals_sum) < 1e-4
            zero_residual_prc = np.mean(zero_residual_prc)
            if zero_residual_prc > 0.98:
                raise Exception(
                    f"Insample residuals close to 0, zero_residual_prc={zero_residual_prc}. Check `Y_df`"
                )

            # Protection: cases where data is unavailable/nan
            # makoren: this masking stuff causes more harm than good, I found the results in the presence
            # of nan-s can often be rubbish, I'd argue it's better to fail than give rubbish results, here
            # the code is simply failing if it encounters nan in the variance vector.
            # masked_res = np.ma.array(residuals, mask=np.isnan(residuals))
            # covm = np.ma.cov(masked_res, rowvar=False, allow_masked=True).data

            W_diag = np.nanvar(residuals, axis=0, ddof=1)
        else:
            raise ValueError(f"Unknown reconciliation method {self.method}")

        if any(W_diag < 1e-8):
            raise Exception(
                f"min_trace ({self.method}) needs covariance matrix to be positive definite."
            )

        if any(np.isnan(W_diag)):
            raise Exception(
                f"min_trace ({self.method}) needs covariance matrix to be positive definite (not nan)."
            )

        M = sparse.spdiags(np.reciprocal(W_diag), 0, W_diag.size, W_diag.size)
        R = sparse.csr_matrix(S.T @ M)

        # The implementation of P acting on a vector:
        def get_P_action(y):
            b = R @ y

            A = sparse.linalg.LinearOperator(
                (b.size, b.size), matvec=lambda v: R @ (S @ v)
            )

            x_tilde, exit_code = sparse.linalg.bicgstab(A, b, atol=1e-5)

            return x_tilde

        P = sparse.linalg.LinearOperator(
            (S.shape[1], y_hat.shape[0]), matvec=get_P_action
        )
        W = sparse.spdiags(W_diag, 0, W_diag.size, W_diag.size)

        return P, W

    def fit(
        self,
        S: sparse.csr_matrix,
        y_hat: np.ndarray,
        y_insample: Optional[np.ndarray] = None,
        y_hat_insample: Optional[np.ndarray] = None,
        sigmah: Optional[np.ndarray] = None,
        intervals_method: Optional[str] = None,
        num_samples: Optional[int] = None,
        seed: Optional[int] = None,
        tags: Optional[dict[str, np.ndarray]] = None,
        idx_bottom: Optional[np.ndarray] = None,
    ) -> "MinTraceSparse":
        """MinTraceSparse Fit Method.

        **Parameters:**<br>
        `S`: Summing matrix of size (`base`, `bottom`).<br>
        `y_hat`: Forecast values of size (`base`, `horizon`).<br>
        `y_insample`: Insample values of size (`base`, `insample_size`). Only used with "wls_var".<br>
        `y_hat_insample`: Insample forecast values of size (`base`, `insample_size`). Only used with "wls_var"<br>
        `sigmah`: Estimated standard deviation of the conditional marginal distribution.<br>
        `intervals_method`: Sampler for prediction intervals, one of `normality`, `bootstrap`, `permbu`.<br>
        `num_samples`: Number of samples for probabilistic coherent distribution.<br>
        `seed`: Seed for reproducibility.<br>
        `tags`: Each key is a level and each value its `S` indices.<br>
        `idx_bottom`: Indices corresponding to the bottom level of `S`, size (`bottom`).<br>

        **Returns:**<br>
        `self`: object, fitted reconciler.
        """
        if self.nonnegative:
            # Clip the base forecasts to align them with their use in practice.
            self.y_hat = np.clip(y_hat, 0, None)
            # Get the number of nodes, leaf nodes, and parent nodes.
            n, n_b = S.shape
            n_a = n - n_b
            # Find the optimal non-negative forecasts.
            if self.qp:
                # Get the diagonal weight matrix, i.e., precision matrix, for
                # the problem.
                if self.method == "ols":
                    W = sparse.eye(n, format="csc")
                elif self.method == "wls_struct":
                    W = sparse.csc_matrix(
                        (
                            (1.0 / S.sum(axis=1)).A1,
                            np.arange(n, dtype=np.min_scalar_type(n - 1)),
                            np.arange(n + 1, dtype=np.min_scalar_type(n)),
                        )
                    )
                elif self.method == "wls_var":
                    # Check that we have the in-sample values.
                    if y_insample is None or y_hat_insample is None:
                        raise ValueError(
                            "`y_insample` and `y_hat_insample` are required to calculate residuals."
                        )
                    # Add a small jitter to the variance to improve the condition
                    # of the variance matrix.
                    W = sparse.csc_matrix(
                        (
                            1.0
                            / (
                                np.nanvar(y_insample - y_hat_insample, 1, ddof=1) + 2e-8
                            ),
                            np.arange(n, dtype=np.min_scalar_type(n - 1)),
                            np.arange(n + 1, dtype=np.min_scalar_type(n)),
                        )
                    )
                # Get the linear constraints matrix by vertically stacking the
                # (n_a x n) constraint matrix in the zero-constrained
                # representation, which has the set of all reconciled forecasts
                # in its null space, and a horizontally stacked (n_b x n_a)
                # zero matrix and a negated (n_b x n_b) identity matrix.
                A = sparse.vstack(
                    (
                        sparse.hstack(
                            (sparse.eye(n_a, format="csc"), -S[:n_a, :].tocsc())
                        ),
                        -sparse.eye(n_b, n, n_a, format="csc"),
                    )
                )
                # Get the linear constraints vector.
                b = np.zeros(n)
                # Get the composition of convex cones to solve the problem.
                cones = [clarabel.ZeroConeT(n_a), clarabel.NonnegativeConeT(n_b)]
                # Set up the settings for the solver.
                settings = clarabel.DefaultSettings()
                settings.verbose = False

                def solve_clarabel(
                    y: np.ndarray,
                    P: sparse.csc_matrix,
                    A: sparse.csc_matrix,
                    b: np.ndarray,
                    cones: list,
                    settings: clarabel.DefaultSettings,
                    n_b: int,
                ) -> tuple[bool, Optional[np.ndarray]]:
                    # Get the linear coefficients, i.e., the cost vector.
                    q = P @ -y
                    # Set up the Clarabel solver.
                    solver = clarabel.DefaultSolver(P, q, A, b, cones, settings)
                    # Solve the problem.
                    solution = solver.solve()
                    # Resolve the solver exit status.
                    status = solution.status == clarabel.SolverStatus.Solved
                    if status:
                        # Return the slice of the primal solution that
                        # represents the optimal non-negative reconciled
                        # bottom level forecasts.
                        return status, solution.x[-n_b:]
                    else:
                        # As the solver failed, discard the empty primal
                        # solution.
                        return status, None

                with ThreadPoolExecutor(self.num_threads) as executor:
                    # Dispatch the jobs.
                    futures = [
                        executor.submit(
                            solve_clarabel, y, W, A, b, cones, settings, n_b
                        )
                        for y in self.y_hat.transpose()
                    ]
                    # Yield the futures as they complete.
                    for future in as_completed(futures):
                        # Return the exit status of the solver and the primal
                        # solution.
                        status, x = future.result()
                        # Check that the problem is successfully solved and the
                        # primal solution is within tolerance.
                        if not (status and np.min(x) > -1e-8):
                            raise Exception("Non-negative optimisation failed.")

                # Extract the optimal non-negative reconciled bottom level
                # forecasts.
                x = np.vstack([future.result()[1] for future in futures]).transpose()
                # Clip the negative forecasts within tolerance.
                x = np.clip(x, 0, None)
                # Aggregate the clipped bottom level forecasts and overwrite
                # the base forecasts with the solution.
                self.y_hat = S @ x
                # Overwrite the attributes for the P and W matrices with those
                # for bottom-up reconciliation to force projection onto the
                # non-negative coherent subspace.
                self.P, self.W = BottomUpSparse()._get_PW_matrices(S=S, idx_bottom=None)
            else:
                # Get the reconciliation matrices.
                self.P, self.W = self._get_PW_matrices(
                    S=S,
                    y_hat=self.y_hat,
                    y_insample=y_insample,
                    y_hat_insample=y_hat_insample,
                    idx_bottom=idx_bottom,
                )
                # Although it would be sufficient to check that all of the
                # entries in P are positive, P is implemented as a linear
                # operator for the iterative method that solves the sparse
                # linear system, so we need to reconcile to find out whether
                # any of the coherent bottom level point forecasts are negative.
                y_tilde = self._reconcile(
                    S=S, P=self.P, y_hat=self.y_hat, level=None, sampler=None
                )["mean"][-n_b:, :]
                # Find if any of the forecasts are negative.
                if np.any(y_tilde < 0):
                    # Clip the negative forecasts.
                    y_tilde = np.clip(y_tilde, 0, None)
                    # Force non-negative coherence by overwriting the base
                    # forecasts with the aggregated, clipped bottom level
                    # forecasts.
                    self.y_hat = S @ y_tilde
                    # Overwrite the attributes for the P and W matrices with
                    # those for bottom-up reconciliation to force projection
                    # onto the non-negative coherent subspace.
                    self.P, self.W = BottomUpSparse()._get_PW_matrices(
                        S=S, idx_bottom=None
                    )
        else:
            # Get the reconciliation matrices.
            self.y_hat = y_hat
            self.P, self.W = self._get_PW_matrices(
                S=S,
                y_hat=self.y_hat,
                y_insample=y_insample,
                y_hat_insample=y_hat_insample,
                idx_bottom=idx_bottom,
            )

        # Get the sampler for probabilistic reconciliation.
        self.sampler = self._get_sampler(
            S=S,
            P=self.P,
            W=self.W,
            y_hat=self.y_hat,
            y_insample=y_insample,
            y_hat_insample=y_hat_insample,
            sigmah=sigmah,
            intervals_method=intervals_method,
            num_samples=num_samples,
            seed=seed,
            tags=tags,
        )
        # Set the instance as fitted.
        self.fitted = True
        return self

In [None]:
show_doc(MinTraceSparse, title_level=3)
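
As a minimal sketch of how the sparse variant can apply $\mathbf{P}$ to a forecast vector without ever forming the matrix, the snippet below solves the normal equations $(\mathbf{S}^{\intercal}\mathbf{W}^{-1}\mathbf{S})\,\mathbf{x} = \mathbf{S}^{\intercal}\mathbf{W}^{-1}\hat{\mathbf{y}}$ with `scipy.sparse.linalg.bicgstab`. The toy hierarchy, the `p_action` helper, the forecast values and the tolerance are assumptions made for illustration; they are not part of the library's API.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import LinearOperator, bicgstab

# Hypothetical hierarchy: total = A + B, A = a1 + a2, B = b1 + b2.
S = sparse.csr_matrix(
    np.vstack([[1, 1, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1], np.eye(4)])
)
n_hiers, n_bottom = S.shape

# Diagonal weights as in `wls_struct`: number of bottom series under each node.
W_diag = np.asarray(S @ np.ones(n_bottom)).ravel()
# R = S^T W^{-1}, kept sparse.
R = (S.T @ sparse.diags(1.0 / W_diag)).tocsr()


def p_action(y):
    """Apply P to one forecast vector by solving (R S) x = R y iteratively."""
    b = R @ y
    A = LinearOperator(
        (n_bottom, n_bottom), matvec=lambda v: R @ (S @ v), dtype=float
    )
    x, exit_code = bicgstab(A, b, atol=1e-10)
    if exit_code != 0:
        raise RuntimeError(f"bicgstab did not converge (exit code {exit_code})")
    return x


# Incoherent base forecasts for a single horizon step (made-up numbers).
y_hat = np.array([10.5, 6.0, 4.0, 3.0, 2.5, 2.0, 2.5])
bottom_tilde = p_action(y_hat)
y_tilde = S @ bottom_tilde

# Dense reference, affordable only because the example is tiny.
P_dense = np.linalg.solve((R @ S).toarray(), R.toarray())
print(np.allclose(bottom_tilde, P_dense @ y_hat, atol=1e-3))
```

Unlike this sketch, the class above does not inspect the solver's exit code, so checking it is a design choice of the example rather than library behavior.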

In [None]:
show_doc(MinTraceSparse.fit, name="MinTraceSparse.fit", title_level=3)
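
The `qp=False` branch of `fit` can be read as a clip, reconcile, clip, aggregate heuristic. The sketch below replays those steps with dense NumPy arrays on a hypothetical one-level hierarchy. The dense OLS `P` stands in for the sparse linear operator, and the forecast values are made up for illustration.

```python
import numpy as np

# Hypothetical hierarchy: one total over three bottom series.
S = np.vstack([np.ones((1, 3)), np.eye(3)])

# Dense OLS projection P = (S^T S)^{-1} S^T (a stand-in for the sparse operator).
P = np.linalg.solve(S.T @ S, S.T)

# Base forecasts for one horizon step, with one negative bottom forecast.
y_hat = np.array([[4.0], [5.0], [0.2], [-1.5]])

# Step 1: clip the base forecasts at zero.
y_hat = np.clip(y_hat, 0, None)
# Step 2: reconcile to get coherent bottom level forecasts.
bottom = P @ y_hat
# Step 3: if any reconciled bottom forecast is negative, clip it.
if np.any(bottom < 0):
    bottom = np.clip(bottom, 0, None)
# Step 4: aggregate the bottom level so the result is coherent and non-negative.
y_tilde = S @ bottom
print(y_tilde.ravel())
```

The result is coherent and non-negative by construction, but it is generally not the minimizer of the weighted least squares objective, which is the trade-off against `qp=True`.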

In [None]:
show_doc(MinTraceSparse.predict, name="MinTraceSparse.predict", title_level=3)
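
The quadratic program in `fit` encodes coherence through the zero-constrained representation $[\mathbf{I}\;\; -\mathbf{S}_{a}]\,\mathbf{x} = \mathbf{0}$. As a rough illustration, the sketch below builds that constraint for a hypothetical hierarchy and solves only the equality-constrained problem through its KKT system; the non-negativity cone is left out and Clarabel is not called. Under these assumptions the solution should coincide with the usual $\mathbf{S}\mathbf{P}\hat{\mathbf{y}}$ projection.

```python
import numpy as np

# Hypothetical aggregation rows: total, A = a1 + a2, B = b1 + b2.
S_agg = np.array(
    [[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
)
n_a, n_b = S_agg.shape
n = n_a + n_b
S = np.vstack([S_agg, np.eye(n_b)])

# Zero-constrained representation: C @ x = 0 holds exactly for coherent x.
C = np.hstack([np.eye(n_a), -S_agg])
# Precision (inverse weight) matrix as in `wls_struct`.
W = np.diag(1.0 / S.sum(axis=1))

# Incoherent base forecasts for one horizon step (made-up numbers).
y = np.array([10.5, 6.0, 4.0, 3.0, 2.5, 2.0, 2.5])

# minimize 0.5 * x^T W x - (W y)^T x  subject to  C x = 0, via the KKT system.
KKT = np.block([[W, C.T], [C, np.zeros((n_a, n_a))]])
rhs = np.concatenate([W @ y, np.zeros(n_a)])
x_qp = np.linalg.solve(KKT, rhs)[:n]

# Projection form of the same solution: x = S (S^T W S)^{-1} S^T W y.
P = np.linalg.solve(S.T @ W @ S, S.T @ W)
x_mint = S @ P @ y
print(np.allclose(x_qp, x_mint))
```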

In [None]:
show_doc(MinTraceSparse.fit_predict, name="MinTraceSparse.fit_predict", title_level=3)

In [None]:
show_doc(MinTraceSparse.sample, name="MinTraceSparse.sample", title_level=3)

In [None]:
#| hide
for method in ["ols", "wls_struct", "wls_var", "mint_shrink"]:
    for nonnegative in [False, True]:
        # test nonnegative behavior
        # we should be able to recover the same forecasts
        # in this example
        cls_min_trace = MinTrace(method=method, nonnegative=nonnegative)
        assert cls_min_trace.nonnegative is nonnegative
        if cls_min_trace.insample:
            assert method in ["wls_var", "mint_cov", "mint_shrink"]
            test_close(
                cls_min_trace(
                    S=S,
                    y_hat=S @ y_hat_bottom,
                    y_insample=S @ y_bottom,
                    y_hat_insample=S @ y_hat_bottom_insample,
                    idx_bottom=idx_bottom if nonnegative else None,
                )["mean"],
                S @ y_hat_bottom,
            )
        else:
            test_close(
                cls_min_trace(
                    S=S,
                    y_hat=S @ y_hat_bottom,
                    idx_bottom=idx_bottom if nonnegative else None,
                )["mean"],
                S @ y_hat_bottom,
            )
mintrace_1thr = MinTrace(method="ols", nonnegative=False, num_threads=1).fit(
    S=S, y_hat=S @ y_hat_bottom
)
mintrace_2thr = MinTrace(method="ols", nonnegative=False, num_threads=2).fit(
    S=S, y_hat=S @ y_hat_bottom
)
np.testing.assert_allclose(mintrace_1thr.y_hat, mintrace_2thr.y_hat)
with ExceptionExpected(regex="min_trace (mint_cov)*"):
    for nonnegative in [False, True]:
        cls_min_trace = MinTrace(method="mint_cov", nonnegative=nonnegative)
        cls_min_trace(
            S=S,
            y_hat=S @ y_hat_bottom,
            y_insample=S @ y_bottom,
            y_hat_insample=S @ y_hat_bottom_insample,
            idx_bottom=idx_bottom if nonnegative else None,
        )

In [None]:
#| hide
# MinTrace-shr covariance's stress test
diff_len_y_insample = S @ y_bottom
diff_len_y_hat_insample = S @ y_hat_bottom_insample
diff_len_y_insample[-1, :-1] = np.nan
diff_len_y_hat_insample[-1, :-1] = np.nan
cls_min_trace = MinTrace(method="mint_shrink")
result_min_trace = cls_min_trace(
    S=S,
    y_hat=S @ y_hat_bottom,
    y_insample=diff_len_y_insample,
    y_hat_insample=diff_len_y_hat_insample,
    idx_bottom=idx_bottom,
)

In [None]:
#| hide
# test levels
for method in ["ols", "wls_struct", "wls_var", "mint_shrink"]:
    for nonnegative in [False, True]:
        cls_min_trace = MinTrace(method=method, nonnegative=nonnegative)
        test_close(
            cls_min_trace(
                S=S,
                y_hat=S @ y_hat_bottom,
                y_insample=S @ y_bottom,
                y_hat_insample=S @ y_hat_bottom_insample,
                idx_bottom=idx_bottom if nonnegative else None,
            )["mean"],
            S @ y_hat_bottom,
        )

In [None]:
#| hide
# Test the sparse functionality.
for method in ["ols", "wls_struct", "wls_var"]:
    # Check both the non-negative heuristic and QP solutions.
    for nonnegative, qp in [(False, False), (True, False), (True, True)]:
        cls_min_trace = MinTraceSparse(method=method, nonnegative=nonnegative, qp=qp)
        test_close(
            cls_min_trace(
                S=sparse.csr_matrix(S),
                y_hat=S @ y_hat_bottom,
                y_insample=S @ y_bottom,
                y_hat_insample=S @ y_hat_bottom_insample,
                idx_bottom=idx_bottom,
            )["mean"],
            S @ y_hat_bottom,
        )

# 5. Optimal Combination

In [None]:
#| export
class OptimalCombination(MinTrace):
    """Optimal Combination Reconciliation Class.

    This reconciliation algorithm was proposed by Hyndman et al. 2011. The method uses a generalized least squares
    estimator based on the covariance matrix of the coherency errors. Given the covariance of the base forecast errors
    $\\textrm{Var}(\epsilon_{h}) = \Sigma_{h}$, the $\mathbf{P}$ matrix of this method is defined by:
    $$ \mathbf{P} = \\left(\mathbf{S}^{\intercal}\Sigma_{h}^{\dagger}\mathbf{S}\\right)^{-1}\mathbf{S}^{\intercal}\Sigma^{\dagger}_{h}$$
    where $\Sigma_{h}^{\dagger}$ denotes the pseudo-inverse of the covariance matrix. The method was later proven
    equivalent to `MinTrace` variants.

    **Parameters:**<br>
    `method`: str, allowed optimal combination methods: 'ols', 'wls_struct'.<br>
    `nonnegative`: bool, reconciled forecasts should be nonnegative?<br>

    **References:**<br>
    - [Rob J. Hyndman, Roman A. Ahmed, George Athanasopoulos, Han Lin Shang (2010). \"Optimal Combination Forecasts for
    Hierarchical Time Series\".](https://robjhyndman.com/papers/Hierarchical6.pdf).<br>
    - [Shanika L. Wickramasuriya, George Athanasopoulos and Rob J. Hyndman (2019). \"Optimal Forecast Reconciliation for
    Hierarchical and Grouped Time Series Through Trace Minimization\".](https://robjhyndman.com/papers/MinT.pdf).<br>
    - [Wickramasuriya, S.L., Turlach, B.A. & Hyndman, R.J. (2020). \"Optimal non-negative
    forecast reconciliation\". Stat Comput 30, 1167–1182,
    https://doi.org/10.1007/s11222-020-09930-0](https://robjhyndman.com/publications/nnmint/).
    """

    def __init__(self, method: str, nonnegative: bool = False, num_threads: int = 1):
        comb_methods = ["ols", "wls_struct"]
        if method not in comb_methods:
            raise ValueError(
                f'Optimal Combination class does not support method: "{method}"'
            )
        super().__init__(
            method=method, nonnegative=nonnegative, num_threads=num_threads
        )
        self.insample = False

In [None]:
show_doc(OptimalCombination, title_level=3)

In [None]:
show_doc(OptimalCombination.fit, name="OptimalCombination.fit", title_level=3)

In [None]:
show_doc(OptimalCombination.predict, name="OptimalCombination.predict", title_level=3)

In [None]:
show_doc(
    OptimalCombination.fit_predict, name="OptimalCombination.fit_predict", title_level=3
)

In [None]:
show_doc(OptimalCombination.sample, name="OptimalCombination.sample", title_level=3)
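
To make the $\mathbf{P}$ matrix of the docstring concrete, the following sketch evaluates $(\mathbf{S}^{\intercal}\Sigma^{\dagger}\mathbf{S})^{-1}\mathbf{S}^{\intercal}\Sigma^{\dagger}$ with NumPy for the two supported weightings. The `combination_matrix` helper, the two-series hierarchy and the forecast values are hypothetical and chosen only for illustration.

```python
import numpy as np


def combination_matrix(S, sigma):
    """Return P = (S^T Sigma^+ S)^{-1} S^T Sigma^+ for a dense covariance."""
    sigma_pinv = np.linalg.pinv(sigma)
    return np.linalg.solve(S.T @ sigma_pinv @ S, S.T @ sigma_pinv)


# Hypothetical hierarchy: total = b1 + b2.
S = np.vstack([np.ones((1, 2)), np.eye(2)])

# `ols` uses the identity; `wls_struct` weights each node by the number of
# bottom series it aggregates.
sigmas = {"ols": np.eye(3), "wls_struct": np.diag(S.sum(axis=1))}

# Incoherent base forecasts (10 != 3 + 5), made up for the example.
y_hat = np.array([10.0, 3.0, 5.0])

reconciled = {
    name: S @ combination_matrix(S, sigma) @ y_hat for name, sigma in sigmas.items()
}
for name, y_tilde in reconciled.items():
    print(name, y_tilde)
```

Both weightings return coherent forecasts; they differ in how much of the discrepancy is absorbed by the total versus the bottom series.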

In [None]:
#| hide
for method in ["ols", "wls_struct"]:
    for nonnegative in [False, True]:
        # test nonnegative behavior
        # we should be able to recover the same forecasts
        # in this example
        cls_optimal_combination = OptimalCombination(
            method=method, nonnegative=nonnegative
        )
        test_close(
            cls_optimal_combination(
                S=S,
                y_hat=S @ y_hat_bottom,
                idx_bottom=idx_bottom if nonnegative else None,
            )["mean"],
            S @ y_hat_bottom,
        )

# 6. Emp. Risk Minimization

In [None]:
#| export
class ERM(HReconciler):
    """Empirical Risk Minimization Reconciliation Class.

    The Empirical Risk Minimization reconciliation strategy relaxes the unbiasedness assumptions from
    previous reconciliation methods like MinT and minimizes the squared errors between the reconciled predictions
    and the validation data to obtain an optimal reconciliation matrix $\mathbf{P}$.

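    The exact solution for $\mathbf{P}$ (`method='closed'`) follows the expression:
    $$\mathbf{P}^{*} = \\left(\mathbf{S}^{\intercal}\mathbf{S}\\right)^{-1}\mathbf{S}^{\intercal}\mathbf{Y}\hat{\mathbf{Y}}^{\intercal}\\left(\hat{\mathbf{Y}}\hat{\mathbf{Y}}^{\intercal}\\right)^{-1}$$
    where $\mathbf{Y}$ and $\hat{\mathbf{Y}}$ hold the validation values and their base forecasts, each of size (`base`, `insample_size`).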

    The alternative Lasso regularized $\mathbf{P}$ solution (`method='reg_bu'`) is useful when the validation
    data has few observations or the exact solution has low numerical stability.
    $$\mathbf{P}^{*} = \\text{argmin}_{\mathbf{P}} ||\mathbf{Y}-\mathbf{S} \mathbf{P} \hat{\mathbf{Y}} ||^{2}_{2} + \lambda ||\mathbf{P}-\mathbf{P}_{\\text{BU}}||_{1}$$

    **Parameters:**<br>
    `method`: str, one of `closed`, `reg` and `reg_bu`.<br>
    `lambda_reg`: float, l1 regularizer for `reg` and `reg_bu`.<br>

    **References:**<br>
    - [Ben Taieb, S., & Koo, B. (2019). Regularized regression for hierarchical forecasting without
    unbiasedness conditions. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge
    Discovery & Data Mining KDD '19 (p. 1337-1347). New York, NY, USA: Association for Computing Machinery.](https://doi.org/10.1145/3292500.3330976).<br>
    """

    def __init__(self, method: str, lambda_reg: float = 1e-2):
        self.method = method
        self.lambda_reg = lambda_reg
        self.insample = True

    def _get_PW_matrices(
        self,
        S: np.ndarray,
        y_hat: np.ndarray,
        y_insample: np.ndarray,
        y_hat_insample: np.ndarray,
        idx_bottom: np.ndarray,
    ):
        n_hiers, n_bottom = S.shape
        # y_hat_insample shape (n_hiers, obs)
        if y_insample is None or y_hat_insample is None:
            raise ValueError(
                "Check `Y_df`. For method `ERM` you need to pass insample predictions and insample values."
            )
        # remove obs with nan values
        nan_idx = np.isnan(y_hat_insample).any(axis=0)
        y_insample = y_insample[:, ~nan_idx]
        y_hat_insample = y_hat_insample[:, ~nan_idx]
        # only using h validation steps to avoid
        # computational burden
        h = min(y_hat.shape[1], y_hat_insample.shape[1])
        y_hat_insample = y_hat_insample[:, -h:]  # shape (n_hiers, h)
        y_insample = y_insample[:, -h:]
        if self.method == "closed":
            B = np.linalg.inv(S.T @ S) @ S.T @ y_insample
            B = B.T
            P = np.linalg.pinv(y_hat_insample.T) @ B
            P = P.T
        elif self.method == "reg":
            X = np.kron(S, y_hat_insample.T)
            z = y_insample.reshape(-1)

            if self.lambda_reg is None:
                lambda_reg = np.max(np.abs(X.T.dot(z)))
            else:
                lambda_reg = self.lambda_reg

            beta = _lasso(X, z, lambda_reg, max_iters=1000, tol=1e-4)
            P = beta.reshape(S.shape).T
        elif self.method == "reg_bu":
            X = np.kron(S, y_hat_insample.T)
            Pbu = np.zeros_like(S)
            Pbu[idx_bottom] = S[idx_bottom]
            z = y_insample.reshape(-1) - X @ Pbu.reshape(-1)

            if self.lambda_reg is None:
                lambda_reg = np.max(np.abs(X.T.dot(z)))
            else:
                lambda_reg = self.lambda_reg

            beta = _lasso(X, z, lambda_reg, max_iters=1000, tol=1e-4)
            P = beta + Pbu.reshape(-1)
            P = P.reshape(S.shape).T
        else:
            raise ValueError(f"Unknown reconciliation method {self.method}")

        W = np.eye(n_hiers, dtype=np.float64)

        return P, W

    def fit(
        self,
        S,
        y_hat,
        y_insample,
        y_hat_insample,
        sigmah: Optional[np.ndarray] = None,
        intervals_method: Optional[str] = None,
        num_samples: Optional[int] = None,
        seed: Optional[int] = None,
        tags: Optional[dict[str, np.ndarray]] = None,
        idx_bottom: Optional[np.ndarray] = None,
    ):
        """ERM Fit Method.

        **Parameters:**<br>
        `S`: Summing matrix of size (`base`, `bottom`).<br>
        `y_hat`: Forecast values of size (`base`, `horizon`).<br>
        `y_insample`: Train values of size (`base`, `insample_size`).<br>
        `y_hat_insample`: Insample train predictions of size (`base`, `insample_size`).<br>
        `sigmah`: Estimated standard deviation of the conditional marginal distribution.<br>
        `intervals_method`: Sampler for prediction intervals, one of `normality`, `bootstrap`, `permbu`.<br>
        `num_samples`: Number of samples for probabilistic coherent distribution.<br>
        `seed`: Seed for reproducibility.<br>
        `tags`: Each key is a level and each value its `S` indices.<br>
        `idx_bottom`: Indices corresponding to the bottom level of `S`, size (`bottom`).<br>
        
        **Returns:**<br>
        `self`: object, fitted reconciler.
        """
        self.P, self.W = self._get_PW_matrices(
            S=S,
            y_hat=y_hat,
            y_insample=y_insample,
            y_hat_insample=y_hat_insample,
            idx_bottom=idx_bottom,
        )
        self.sampler = self._get_sampler(
            S=S,
            P=self.P,
            W=self.W,
            y_hat=y_hat,
            y_insample=y_insample,
            y_hat_insample=y_hat_insample,
            sigmah=sigmah,
            intervals_method=intervals_method,
            num_samples=num_samples,
            seed=seed,
            tags=tags,
        )
        self.fitted = True
        return self

    def fit_predict(
        self,
        S: np.ndarray,
        y_hat: np.ndarray,
        idx_bottom: np.ndarray = None,
        y_insample: Optional[np.ndarray] = None,
        y_hat_insample: Optional[np.ndarray] = None,
        sigmah: Optional[np.ndarray] = None,
        level: Optional[list[int]] = None,
        intervals_method: Optional[str] = None,
        num_samples: Optional[int] = None,
        seed: Optional[int] = None,
        tags: Optional[dict[str, np.ndarray]] = None,
    ):
        """ERM Reconciliation Method.

        **Parameters:**<br>
        `S`: Summing matrix of size (`base`, `bottom`).<br>
        `y_hat`: Forecast values of size (`base`, `horizon`).<br>
        `idx_bottom`: Indices corresponding to the bottom level of `S`, size (`bottom`).<br>
        `y_insample`: Train values of size (`base`, `insample_size`).<br>
        `y_hat_insample`: Insample train predictions of size (`base`, `insample_size`).<br>
        `sigmah`: Estimated standard deviation of the conditional marginal distribution.<br>
        `level`: float list 0-100, confidence levels for prediction intervals.<br>
        `intervals_method`: Sampler for prediction intervals, one of `normality`, `bootstrap`, `permbu`.<br>
        `num_samples`: Number of samples for probabilistic coherent distribution.<br>
        `seed`: Seed for reproducibility.<br>
        `tags`: Each key is a level and each value its `S` indices.<br>

        **Returns:**<br>
        `y_tilde`: Reconciled `y_hat` using the ERM approach.
        """
        # Fit creates P, W and sampler attributes
        self.fit(
            S=S,
            y_hat=y_hat,
            y_insample=y_insample,
            y_hat_insample=y_hat_insample,
            sigmah=sigmah,
            intervals_method=intervals_method,
            num_samples=num_samples,
            seed=seed,
            tags=tags,
            idx_bottom=idx_bottom,
        )

        return self._reconcile(
            S=S, P=self.P, y_hat=y_hat, level=level, sampler=self.sampler
        )

    __call__ = fit_predict

In [None]:
show_doc(ERM, title_level=3)

In [None]:
show_doc(ERM.fit, name="ERM.fit", title_level=3)

In [None]:
show_doc(ERM.predict, name="ERM.predict", title_level=3)

In [None]:
show_doc(ERM.fit_predict, name="ERM.fit_predict", title_level=3)

In [None]:
show_doc(ERM.sample, name="ERM.sample", title_level=3)
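
The `closed` branch of `_get_PW_matrices` can be checked against the normal-equation form $(\mathbf{S}^{\intercal}\mathbf{S})^{-1}\mathbf{S}^{\intercal}\mathbf{Y}\hat{\mathbf{Y}}^{\intercal}(\hat{\mathbf{Y}}\hat{\mathbf{Y}}^{\intercal})^{-1}$, with $\mathbf{Y}$ and $\hat{\mathbf{Y}}$ of size (`base`, `insample_size`). The sketch below does so on synthetic data; the hierarchy, the noise level and the sample size are assumptions for illustration, and it uses all observations, whereas the method above keeps only the last `h` of them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical hierarchy: total = b1 + b2, with 20 validation observations.
S = np.vstack([np.ones((1, 2)), np.eye(2)])
n_hiers, n_bottom = S.shape
obs = 20

# Coherent actuals and noisy, incoherent base forecasts (synthetic data).
y_bottom = rng.uniform(1.0, 10.0, size=(n_bottom, obs))
y_insample = S @ y_bottom
y_hat_insample = y_insample + rng.normal(scale=0.5, size=(n_hiers, obs))

# Same steps as the `closed` branch: project the actuals onto the bottom
# level, then regress them on the base forecasts with a pseudo-inverse.
B = np.linalg.solve(S.T @ S, S.T @ y_insample)  # (n_bottom, obs)
P = (np.linalg.pinv(y_hat_insample.T) @ B.T).T  # (n_bottom, n_hiers)

# Normal-equation form, valid when y_hat_insample has full row rank.
Y, Y_hat = y_insample, y_hat_insample
P_formula = (
    np.linalg.solve(S.T @ S, S.T) @ Y @ Y_hat.T @ np.linalg.inv(Y_hat @ Y_hat.T)
)
print(np.allclose(P, P_formula))
```

The pseudo-inverse route used in the code stays well defined when $\hat{\mathbf{Y}}\hat{\mathbf{Y}}^{\intercal}$ is singular, which is one reason to prefer it over an explicit inverse.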

In [None]:
#| hide
for method in ["reg_bu"]:
    cls_erm = ERM(method=method, lambda_reg=None)
    test_close(
        cls_erm(
            S=S,
            y_hat=S @ y_hat_bottom,
            y_insample=S @ y_bottom,
            y_hat_insample=S @ y_hat_bottom_insample,
            idx_bottom=idx_bottom,
        )["mean"],
        S @ y_hat_bottom,
    )

In [None]:
S @ y_hat_bottom_insample

In [None]:
#| hide
# test intervals
reconciler_args = dict(
    S=S,
    y_hat=y_hat_base,
    y_insample=y_base,
    y_hat_insample=y_hat_base_insample,
    sigmah=sigmah,
    level=[80, 90],
    intervals_method="normality",
    num_samples=200,
    seed=0,
    tags=tags,
    idx_bottom=idx_bottom,
)

In [None]:
#| hide
# test normality prediction intervals
# we should recover the original sigmah
# for the bottom time series
cls_bottom_up = BottomUp()
test_eq(cls_bottom_up(**reconciler_args)["sigmah"][idx_bottom], sigmah[idx_bottom])

In [None]:
#| hide
# test normality interval's names
cls_bottom_up = BottomUp()
bu_normality_intervals = cls_bottom_up(**reconciler_args)
test_eq(["mean", "sigmah", "quantiles"], list(bu_normality_intervals.keys()))

# test PERMBU interval's names
reconciler_args["intervals_method"] = "permbu"
bu_permbu_intervals = cls_bottom_up(**reconciler_args)
test_eq(["mean", "quantiles"], list(bu_permbu_intervals.keys()))

In [None]:
#| hide
# test TopDown + intervals
for method in ["average_proportions", "proportion_averages"]:
    for intervals_method in ["normality", "bootstrap", "permbu"]:
        cls_top_down = TopDown(method=method)
        reconciler_args["intervals_method"] = intervals_method
        cls_top_down(**reconciler_args)

In [None]:
#| hide
# test MinTrace + intervals
nonnegative = False
for method in ["ols", "wls_struct", "wls_var", "mint_shrink"]:
    for intervals_method in ["normality", "bootstrap", "permbu"]:
        cls_min_trace = MinTrace(method=method, nonnegative=nonnegative)
        reconciler_args["intervals_method"] = intervals_method
        cls_min_trace(**reconciler_args)

In [None]:
#| hide
# test OptimalCombination + intervals
nonnegative = False
for method in ["ols", "wls_struct"]:
    for intervals_method in ["normality", "bootstrap", "permbu"]:
        cls_optimal_combination = OptimalCombination(
            method=method, nonnegative=nonnegative
        )
        reconciler_args["intervals_method"] = intervals_method
        if not nonnegative:
            cls_optimal_combination(**reconciler_args)

In [None]:
#| hide
# test ERM + intervals
for method in ["reg_bu"]:
    for intervals_method in ["normality", "bootstrap", "permbu"]:
        cls_erm = ERM(method=method, lambda_reg=None)
        reconciler_args["intervals_method"] = intervals_method
        cls_erm(**reconciler_args)

In [None]:
#| hide
# test coherent sample's shape
reconciler_args = dict(
    S=S,
    y_hat=y_hat_base,
    y_insample=y_base,
    y_hat_insample=y_hat_base_insample,
    sigmah=sigmah,
    intervals_method="bootstrap",
    tags=tags,
    idx_bottom=idx_bottom,
)

cls_bottom_up = BottomUp()
shapes = []
for intervals_method in ["normality", "bootstrap", "permbu"]:
    reconciler_args["intervals_method"] = intervals_method
    cls_bottom_up.fit(**reconciler_args)
    coherent_samples = cls_bottom_up.sample(num_samples=100)
    shapes.append(coherent_samples.shape)
test_eq(shapes[0], shapes[1])
test_eq(shapes[0], shapes[2])

# References

### General Reconciliation
- [Orcutt, G.H., Watts, H.W., & Edwards, J.B. (1968). Data aggregation and information loss. The American 
Economic Review, 58, 773–787.](http://www.jstor.org/stable/1815532)<br>
- [Gross, C.W., & Sohl, J.E. (1990). Disaggregation methods to expedite product line forecasting. Journal of Forecasting, 9, 233–254. 
doi:10.1002/for.3980090304](https://onlinelibrary.wiley.com/doi/abs/10.1002/for.3980090304)<br>
- [Fliedner, G. (1999). An investigation of aggregate variable time series forecast strategies with specific subaggregate 
time series statistical correlation. Computers and Operations Research, 26, 1133–1149. 
doi:10.1016/S0305-0548(99)00017-9.](https://doi.org/10.1016/S0305-0548(99)00017-9)<br>
- [Hyndman, R.J., & Athanasopoulos, G. (2021). "Forecasting: principles and practice, 3rd edition: 
Chapter 11: Forecasting hierarchical and grouped series.". OTexts: Melbourne, Australia. OTexts.com/fpp3 
Accessed on July 2022.](https://otexts.com/fpp3/hierarchical.html)

### Optimal Reconciliation
- [Rob J. Hyndman, Roman A. Ahmed, George Athanasopoulos, Han Lin Shang. "Optimal Combination Forecasts for 
Hierarchical Time Series" (2010).](https://robjhyndman.com/papers/Hierarchical6.pdf)<br>
- [Shanika L. Wickramasuriya, George Athanasopoulos and Rob J. Hyndman. "Optimal Forecast Reconciliation for 
Hierarchical and Grouped Time Series Through Trace Minimization" (2019).](https://robjhyndman.com/papers/MinT.pdf)<br>
- [Ben Taieb, S., & Koo, B. (2019). Regularized regression for hierarchical forecasting without 
unbiasedness conditions. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge 
Discovery & Data Mining KDD '19 (p. 1337-1347). New York, NY, USA: Association for Computing Machinery.](https://doi.org/10.1145/3292500.3330976)<br>

### Hierarchical Probabilistic Coherent Predictions
- [Puwasala Gamakumara Ph. D. dissertation. Monash University, Econometrics and Business Statistics. "Probabilistic Forecast Reconciliation".](https://bridges.monash.edu/articles/thesis/Probabilistic_Forecast_Reconciliation_Theory_and_Applications/11869533)<br>
- [Taieb, Souhaib Ben and Taylor, James W and Hyndman, Rob J. (2017). Coherent probabilistic forecasts for hierarchical time series. International conference on machine learning ICML.](https://proceedings.mlr.press/v70/taieb17a.html)<br>