# Forecast Reconciliation from Scratch in Python
Reconcile hierarchal forecasts into coherent forecasts using Python.

## Introduction
In my experience, it is somewhat unusual to work with real time-series data that does not have underlying "levels". For example, consider the data that might be produced via transactions at a grocery store. Transactions can be described at an individual product level, a shopper level, or at a store level. The products purchased might also be categorized into a specific type of product, and those categories might in turn fall into broader categories. For a business owner, this complex hierarchal structure can make it harder to produce accurate and unbiased forecasts for their business. A data-driven person will most likely produce forecasts for each of the different levels, but those forecasts don't always add up: the sum of the product-level forecasts, for example, may not match the store-level forecast. This is where reconciliation becomes necessary.

The majority of my explanations throughout this post are based on the book [*Forecasting: Principles and Practice*](https://otexts.com/fpp3/) by Rob J. Hyndman and George Athanasopoulos - an excellent resource for forecasting in general and completely free online. I highly recommend that you spend some time reading Hyndman's [more detailed explanation](https://otexts.com/fpp3/hierarchical.html) of reconciling hierarchal and grouped forecasts at some point. It is not my intention here to replace this book, although I still intend to give a comprehensive explanation of forecast reconciliation. My intention in writing this blog post is to provide whoever is reading it with the background needed to develop their own forecast reconciliation code. I have found that the [existing tools]() I have used are not (in my opinion) at a stage in their development where they can be considered reliable in a production environment, or cannot be generalized to every scenario (I know this from personal experience). I have also found that staring at a mathematical formula or applying some pre-defined function that somebody else created is not the best way to gain a firm understanding of what is actually happening. Hopefully, by sharing some short explanations of how one might go about writing their own Python code, I can also provide some additional insight that might not be attained otherwise.

## Hierarchal Time Series
To begin, I will give a brief introduction to the differences between hierarchal and grouped time series (no, they are not quite the same thing). A time series is considered *hierarchal* when each lower level within the hierarchy falls under only one domain. For example, consider the following tree:

```python
from treelib import Tree

tree = Tree()
tree.create_node("Total", "total")  # root node
tree.create_node("Category 1", "cat1", parent="total")
tree.create_node("Category 2", "cat2", parent="total")
tree.create_node("Sub-Category 1", "sub1", parent="cat1")
tree.create_node("Sub-Category 2", "sub2", parent="cat1")
tree.create_node("Sub-Category 3", "sub3", parent="cat2")
tree.create_node("Sub-Category 4", "sub4", parent="cat2")
tree.show()
```

```
Total
├── Category 1
│   ├── Sub-Category 1
│   └── Sub-Category 2
└── Category 2
    ├── Sub-Category 3
    └── Sub-Category 4
```

`Sub-Category 1` falls exclusively under `Category 1`, `Sub-Category 4` falls exclusively under `Category 2`, etc. This *exclusivity* is what defines a hierarchal time series. It is also important to note that mathematically, the sub-categories should add up to the category above them, and `Total` is the sum of all of the bottom-level sub-categories. A mathematical expression might look something like this:

*T* = *C1* + *C2* = *SC1* + *SC2* + *SC3* + *SC4* <br>
*C1* = *SC1* + *SC2* <br>
*C2* = *SC3* + *SC4* <br>
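
To make these constraints concrete, here is a minimal sketch that checks them with NumPy for a single time step. The bottom-level values are made up purely for illustration.

```python
import numpy as np

# Hypothetical bottom-level values for one time step, ordered SC1..SC4.
sc = np.array([10.0, 20.0, 30.0, 40.0])

c1 = sc[0] + sc[1]  # C1 = SC1 + SC2
c2 = sc[2] + sc[3]  # C2 = SC3 + SC4
total = c1 + c2     # T = C1 + C2

# The total must also equal the sum of all bottom-level series.
assert total == sc.sum()
print(c1, c2, total)
```

Any set of values (or forecasts) that satisfies all three equations at once is what the rest of this post calls *coherent*.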

A time series is *grouped* when that exclusivity between sub-domains does not exist. For example, suppose this same tree had an extra node - `Sub-Category 5` - that fell under both `Category 1` and `Category 2`:

```python
tree.create_node("Sub-Category 5", "sub51", parent="cat1")
tree.create_node("Sub-Category 5", "sub52", parent="cat2")
tree.show()
```

```
Total
├── Category 1
│   ├── Sub-Category 1
│   ├── Sub-Category 2
│   └── Sub-Category 5
└── Category 2
    ├── Sub-Category 3
    ├── Sub-Category 4
    └── Sub-Category 5
```

This complexity means that the original formulas used are no longer valid. Hyndman [describes](https://otexts.com/fpp3/hts.html#grouped-time-series) this structural concept as *"not naturally disaggregat[ing] in a unique hierarchical manner."*

The differences between hierarchal and grouped time series are important to understand, because the summing matrix (described in the next section) is dependent on the structure of the time series.
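
As a rough illustration of that dependence, the sketch below derives a summing matrix directly from the structure of the hierarchal tree shown earlier. The `parents` dictionary, the row order in `nodes`, and the helper `ancestors_and_self` are my own hypothetical encoding of that tree, not part of any library.

```python
import numpy as np

# Hypothetical encoding of the hierarchal tree above: child -> parent.
parents = {
    "cat1": "total", "cat2": "total",
    "sub1": "cat1", "sub2": "cat1",
    "sub3": "cat2", "sub4": "cat2",
}
nodes = ["total", "cat1", "cat2", "sub1", "sub2", "sub3", "sub4"]  # row order
bottom = [n for n in nodes if n not in parents.values()]  # leaf nodes


def ancestors_and_self(node):
    """Yield the node itself and every ancestor up to the root."""
    while node is not None:
        yield node
        node = parents.get(node)


# S[i, j] = 1 when bottom-level series j contributes to series i.
S = np.zeros((len(nodes), len(bottom)), dtype=int)
for j, leaf in enumerate(bottom):
    for node in ancestors_and_self(leaf):
        S[nodes.index(node), j] = 1

print(S)
```

Each row corresponds to one series in the tree and each column to one bottom-level series, so changing the tree changes which entries are set to 1.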

## Building Blocks of Coherent Forecasts
Coherent - or reconciled - forecasts are constructed from a few key components:

1) Base Forecasts: forecasts at each hierarchal level represented as an *m*x*n* matrix (*m* rows, *n* columns)
2) Summing Matrix: describes the hierarchal structure of the time series using a binary format ()
3) Mapping Matrix: 
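
To show how these three components might fit together, here is a minimal sketch for the four-sub-category tree from earlier. It assumes that the mapping matrix plays the role of the matrix *G* in Hyndman's formulation, where coherent forecasts are computed as *S* *G* *ŷ*, and it uses the bottom-up choice of *G* simply because it is the easiest to write down. The base forecasts are made-up numbers that are deliberately incoherent.

```python
import numpy as np

# Summing matrix: rows are Total, C1, C2, SC1..SC4; columns are SC1..SC4.
S = np.vstack([
    [1, 1, 1, 1],          # Total
    [1, 1, 0, 0],          # Category 1
    [0, 0, 1, 1],          # Category 2
    np.eye(4, dtype=int),  # each sub-category maps to itself
])

# Made-up, deliberately incoherent base forecasts (one column per horizon step).
y_hat = np.array([
    [105.0, 110.0],  # Total
    [32.0, 33.0],    # Category 1
    [68.0, 71.0],    # Category 2
    [10.0, 11.0],    # SC1
    [20.0, 21.0],    # SC2
    [30.0, 31.0],    # SC3
    [40.0, 42.0],    # SC4
])

# Bottom-up mapping matrix (an assumption): ignore the aggregate forecasts
# and keep only the bottom-level ones.
n_total, n_bottom = S.shape
G = np.hstack([np.zeros((n_bottom, n_total - n_bottom)), np.eye(n_bottom)])

# Coherent forecasts: rebuild every level from the bottom-level forecasts.
y_tilde = S @ G @ y_hat
print(y_tilde)
```

After this step the `Total` and category rows of `y_tilde` are exact sums of the sub-category rows, which was not true of `y_hat`. Other choices of *G* lead to other reconciliation methods.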

## Reconciliation Methods


## Available Reconciliation Tools


## Forecast Reconciliation in Python

```python
# Import libraries
import pandas as pd
import numpy as np

import sys, os
sys.path.append(os.path.join(os.path.dirname('__file__'), '..', 'src'))
import hyperarch
from utils import get_sample_data, plot_single

%matplotlib inline

# Read data
h_df = get_sample_data()
g_df = get_sample_data(agg_type='grouped')
```

```python
hierarchy_df, h_bottom, h_labels = hyperarch.get_hierarchal(h_df, 'category', 'subcategory', agg_type='hierarchy')
grouped_df, g_bottom, g_labels = hyperarch.get_hierarchal(g_df, 'category', 'subcategory', agg_type='grouped')
```