# Non-Negative MinTrace

Large collections of time series are often organized into structures with several aggregation levels. Forecasts for such collections usually need to respect the aggregation constraints (that is, to be coherent) and, in many applications, to be nonnegative. This calls for reconciliation algorithms that produce forecasts that are both coherent and nonnegative.

The `HierarchicalForecast` package provides Python implementations of a wide collection of hierarchical forecasting algorithms, including nonnegative hierarchical reconciliation.

In this notebook, we show how to use the `HierarchicalForecast` package to perform nonnegative reconciliation of forecasts on the `Wiki2` dataset.

You can run these experiments using CPU or GPU with Google Colab.

<a href="https://colab.research.google.com/github/Nixtla/hierarchicalforecast/blob/main/nbs/examples/NonNegativeReconciliation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
%%capture
!pip install hierarchicalforecast statsforecast datasetsforecast

## 1. Load Data

In this example we use the `Wiki2` dataset. The following cells load the time series for every level of the hierarchy (`Y_df`), the summing dataframe `S_df`, which recovers every series from the bottom-level series, and `tags`, which lists the series that belong to each level of the hierarchy.

In [2]:
import numpy as np
import pandas as pd

from datasetsforecast.hierarchical import HierarchicalData

In [3]:
Y_df, S_df, tags = HierarchicalData.load('./data', 'Wiki2')
Y_df['ds'] = pd.to_datetime(Y_df['ds'])

In [4]:
Y_df.head()

Unnamed: 0,unique_id,ds,y
0,Total,2016-01-01,156508
1,Total,2016-01-02,129902
2,Total,2016-01-03,138203
3,Total,2016-01-04,115017
4,Total,2016-01-05,126042


<font color='gold'>This is a representation of the hierarchy: a 1 means that the series named by the column (a bottom-level item; drugs in our case) contributes to the series named by the row. Each row is an aggregate at some level of the hierarchy, one row per node.</font>

In [5]:
S_df.iloc[:5, :5]

Unnamed: 0,de_AAC_AAG_001,de_AAC_AAG_010,de_AAC_AAG_014,de_AAC_AAG_045,de_AAC_AAG_063
Total,1,1,1,1,1
de,1,1,1,1,1
en,0,0,0,0,0
fr,0,0,0,0,0
ja,0,0,0,0,0
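
To make the role of the summing matrix concrete, here is a minimal sketch on a toy hierarchy. The series names and values are hypothetical and are not taken from `Wiki2`; the only point is that multiplying the summing matrix by the bottom-level values recovers every level.

```python
import numpy as np

# Toy hierarchy (hypothetical names): Total -> {A, B}, bottom series A1, A2, B1.
all_ids = ["Total", "A", "B", "A1", "A2", "B1"]

# S[i, j] = 1 when bottom series j contributes to series i.
S = np.array([
    [1, 1, 1],  # Total = A1 + A2 + B1
    [1, 1, 0],  # A     = A1 + A2
    [0, 0, 1],  # B     = B1
    [1, 0, 0],  # A1
    [0, 1, 0],  # A2
    [0, 0, 1],  # B1
])

y_bottom = np.array([5.0, 3.0, 2.0])  # one time step of bottom-level values
y_all = S @ y_bottom                  # every level recovered from the bottom level

for name, value in zip(all_ids, y_all):
    print(f"{name}: {value}")
```

Each row of `S` corresponds to one node of the hierarchy and each column to one bottom-level series, which matches the layout of `S_df` shown above.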


In [41]:
df = pd.read_excel('data/Quarterly_smoothing.xlsx')

save_shape = df.shape
#df = df.loc[:, (df != 0).all()]

#print("Removed", save_shape[1] - df.shape[1], "Columns with at least one 0 in them")
# Add 1 to every value in the series columns (all columns except the first, Month)
for c in df.columns[1:]:
    df[c] = df[c] + 1
df.head()

Unnamed: 0,Month,Дальневосточный ФО - ADRIANOL - Adrianol for adults nasal drops 10 ml #1,Дальневосточный ФО - AGALATES - Agalates tabs 0.5 mg #2,Дальневосточный ФО - AGALATES - Agalates tabs 0.5 mg #8,Дальневосточный ФО - ALMAGEL - Almagel A susp 170 ml #1,Дальневосточный ФО - ALMAGEL - Almagel Neo sachet 10 ml #10,Дальневосточный ФО - ALMAGEL - Almagel Neo susp 170 ml #1,Дальневосточный ФО - ALMAGEL - Almagel sachet 10 ml #10,Дальневосточный ФО - ALMAGEL - Almagel susp 170 ml #1,Дальневосточный ФО - ALMONT - Almont FC tabs 10 mg #28,...,Южный ФО - VELBINE - Velbine solution for inf 10 mg/ml 5ml #1,Южный ФО - VESTIBO - Vestibo tabs 16 mg #30,Южный ФО - VESTIBO - Vestibo tabs 24 mg #30,Южный ФО - VINCRISTINE-TEVA - Vincristine-Teva lyoph for inf 1 mg/ml 1 ml #1,Южный ФО - VINCRISTINE-TEVA - Vincristine-Teva lyoph for inf 1 mg/ml 2 ml #1,Южный ФО - VINORELBINE-TEVA - VINORELBIN-TEVA 50 mg.5 ml,Южный ФО - VINORELBINE-TEVA - VINORELBINE-TEVA concentrate 10 mg.ml 1 ml,Южный ФО - ZINCTERAL - Zincteral-Teva FC tabs 124 mg #150,Южный ФО - ZINCTERAL - Zincteral-Teva FC tabs 124 mg #25,Южный ФО - ZOLEDRONAT-TEVA - Zoledronate-Teva concentrate for inf 4 mg/5ml 5 ml #1
0,2018-03-01,201,1,1,10,1,1,1,949,36,...,106,1,315,401,1,1,1,1,17,9690
1,2018-04-01,1001,1,1,10,1,1,1,1037,36,...,1046,1,248,2161,1,1,1,1,1,11705
2,2018-05-01,1732,1,1,302,1,1,1,1246,94,...,1011,1,248,3246,1,1,1,1,1,15233
3,2018-06-01,2091,1,1,491,1,1,1,1787,184,...,947,1,1,3236,1,1,1,1,1,6586
4,2018-07-01,1548,1,61,491,1,1,1,4132,184,...,46,1,1,5086,1,1,1,7,1,6035


In [42]:
pd.Series([c.split(" - ")[1] for c in df.columns[1:]]).value_counts()

AMBROBENE        77
TROXEVASIN       65
VALZ             53
SUMAMED          48
ALMAGEL          46
                 ..
LONQUEX           6
LOSARTAN-TEVA     5
CLOBIR            5
ESCORDI COR       5
CLOFARABINE       3
Length: 138, dtype: int64

In [43]:
selected_brands = pd.Series([c.split(" - ")[1] for c in df.columns[1:]]).value_counts()[0:3].keys()
toremove = [c for c in df.columns[1:] if c.split(" - ")[1] not in selected_brands]
print(selected_brands)
print(toremove)

Index(['AMBROBENE', 'TROXEVASIN', 'VALZ'], dtype='object')
['Дальневосточный ФО - ADRIANOL - Adrianol for adults nasal drops 10 ml #1', 'Дальневосточный ФО - AGALATES - Agalates tabs 0.5 mg #2', 'Дальневосточный ФО - AGALATES - Agalates tabs 0.5 mg #8', 'Дальневосточный ФО - ALMAGEL - Almagel A susp 170 ml #1', 'Дальневосточный ФО - ALMAGEL - Almagel Neo sachet 10 ml #10', 'Дальневосточный ФО - ALMAGEL - Almagel Neo susp 170 ml #1', 'Дальневосточный ФО - ALMAGEL - Almagel sachet 10 ml #10', 'Дальневосточный ФО - ALMAGEL - Almagel susp 170 ml #1', 'Дальневосточный ФО - ALMONT - Almont FC tabs 10 mg #28', 'Дальневосточный ФО - ALMONT - Almont chew tabs 4 mg #28', 'Дальневосточный ФО - ALMONT - Almont chew tabs 5 mg #28', 'Дальневосточный ФО - ALMONT - Almont chew tabs 5 mg #98', 'Дальневосточный ФО - AMLODIPINE-TEVA - Amlodipine-Teva tabs 10 mg #30', 'Дальневосточный ФО - AMLODIPINE-TEVA - Amlodipine-Teva tabs 5 mg #30', 'Дальневосточный ФО - ANASTROSOLE - Anastrozole-Teva FC tabs 1 mg #

In [44]:
df = df.drop(columns = toremove)
df

Unnamed: 0,Month,Дальневосточный ФО - AMBROBENE - Ambrobene Stoptussin drops 4 mg+100 mg/ml 10 ml #1,Дальневосточный ФО - AMBROBENE - Ambrobene Stoptussin drops 4 mg+100 mg/ml 25 ml #1,Дальневосточный ФО - AMBROBENE - Ambrobene Stoptussin drops 4 mg+100 mg/ml 50 ml #1,Дальневосточный ФО - AMBROBENE - Ambrobene Stoptussin tabs 4+100 mg #20,Дальневосточный ФО - AMBROBENE - Ambrobene oral solution 7.5 mg/ml 100 ml #1,Дальневосточный ФО - AMBROBENE - Ambrobene oral solution 7.5 mg/ml 40 ml #1,Дальневосточный ФО - AMBROBENE - Ambrobene solution for inj 15 mg/2ml 2 ml #5 ampules,Дальневосточный ФО - AMBROBENE - Ambrobene syrup 15 mg/5ml #100 bottle,Дальневосточный ФО - AMBROBENE - Ambrobene tabs 30 mg #20,...,Южный ФО - TROXEVASIN - Troxevasin caps 300 mg #50,Южный ФО - TROXEVASIN - Troxevasin gel 2% 100 g #1,Южный ФО - TROXEVASIN - Troxevasin gel 2% 40 g #1,Южный ФО - VALZ - Valz Combi FC tabs 10 mg + 160 mg #28,Южный ФО - VALZ - Valz Combi FC tabs 5 mg + 160 mg #28,Южный ФО - VALZ - Valz Combi FC tabs 5 mg + 80 mg #28,Южный ФО - VALZ - Valz FC tabs 160 mg #28,Южный ФО - VALZ - Valz FC tabs 80 mg #28,Южный ФО - VALZ - Valz N FC tabs 160 mg/12.5mg #28,Южный ФО - VALZ - Valz N FC tabs 80 mg/12.5mg #28
0,2018-03-01,1,1,1,1,1039,1,1151,1082,4011,...,37,1,98,3,1,1,1,1,1,1
1,2018-04-01,1,1,1,1,3220,1904,1551,9208,10352,...,63,1,527,1,1,1,1,1,1,1
2,2018-05-01,1,1,1,1,4523,2221,1701,10999,13751,...,383,1,1651,1,1,1,1,1,1,1
3,2018-06-01,5,101,1,1,5126,2251,1051,25765,19211,...,462,1,1987,13,1,21,1,1,1,1
4,2018-07-01,25,101,1,1,3634,378,651,20550,20003,...,1181,1,2520,33,1,41,1,1,1,1
5,2018-08-01,27,231,4,1,3121,274,501,21029,18504,...,881,1,3988,33,1,41,1,1,1,1
6,2018-09-01,23,181,4,1,2718,414,181,7344,15135,...,778,1,3743,24,1,21,1,1,1,1
7,2018-10-01,3,181,4,1,3344,440,181,5682,18757,...,133,1,4807,4,1,1,1,1,1,1
8,2018-11-01,1,51,1,1,3659,236,711,11492,21497,...,950,1,2797,7,1,1,1,1,1,1
9,2018-12-01,13,209,1,1,3961,146,716,12803,24887,...,1293,1,2702,14,1,1,1,1,1,1


In [45]:
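# Build a summing matrix S_df for the drugs data

# NOTE: df.columns still includes 'Month', so S_df ends up with a 'Month' row and column
columns = df.columns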

# Initialize a blank dataframe with the columns
S_df = pd.DataFrame(0, index=['Total'], columns=columns)

# Total row
S_df.loc['Total'] = 1

# For each column, determine its hierarchy levels
for col in columns:
    levels = col.split(" - ")
    for level in levels:
        if level not in S_df.index:
            S_df.loc[level] = 0
        S_df.at[level, col] = 1

S_df


Unnamed: 0,Month,Дальневосточный ФО - AMBROBENE - Ambrobene Stoptussin drops 4 mg+100 mg/ml 10 ml #1,Дальневосточный ФО - AMBROBENE - Ambrobene Stoptussin drops 4 mg+100 mg/ml 25 ml #1,Дальневосточный ФО - AMBROBENE - Ambrobene Stoptussin drops 4 mg+100 mg/ml 50 ml #1,Дальневосточный ФО - AMBROBENE - Ambrobene Stoptussin tabs 4+100 mg #20,Дальневосточный ФО - AMBROBENE - Ambrobene oral solution 7.5 mg/ml 100 ml #1,Дальневосточный ФО - AMBROBENE - Ambrobene oral solution 7.5 mg/ml 40 ml #1,Дальневосточный ФО - AMBROBENE - Ambrobene solution for inj 15 mg/2ml 2 ml #5 ampules,Дальневосточный ФО - AMBROBENE - Ambrobene syrup 15 mg/5ml #100 bottle,Дальневосточный ФО - AMBROBENE - Ambrobene tabs 30 mg #20,...,Южный ФО - TROXEVASIN - Troxevasin caps 300 mg #50,Южный ФО - TROXEVASIN - Troxevasin gel 2% 100 g #1,Южный ФО - TROXEVASIN - Troxevasin gel 2% 40 g #1,Южный ФО - VALZ - Valz Combi FC tabs 10 mg + 160 mg #28,Южный ФО - VALZ - Valz Combi FC tabs 5 mg + 160 mg #28,Южный ФО - VALZ - Valz Combi FC tabs 5 mg + 80 mg #28,Южный ФО - VALZ - Valz FC tabs 160 mg #28,Южный ФО - VALZ - Valz FC tabs 80 mg #28,Южный ФО - VALZ - Valz N FC tabs 160 mg/12.5mg #28,Южный ФО - VALZ - Valz N FC tabs 80 mg/12.5mg #28
Total,1,1,1,1,1,1,1,1,1,1,...,1,1,1,1,1,1,1,1,1,1
Month,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
Дальневосточный ФО,0,1,1,1,1,1,1,1,1,1,...,0,0,0,0,0,0,0,0,0,0
AMBROBENE,0,1,1,1,1,1,1,1,1,1,...,0,0,0,0,0,0,0,0,0,0
Ambrobene Stoptussin drops 4 mg+100 mg/ml 10 ml #1,0,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
Ambrobene Stoptussin drops 4 mg+100 mg/ml 25 ml #1,0,0,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
Ambrobene Stoptussin drops 4 mg+100 mg/ml 50 ml #1,0,0,0,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
Ambrobene Stoptussin tabs 4+100 mg #20,0,0,0,0,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
Ambrobene oral solution 7.5 mg/ml 100 ml #1,0,0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
Ambrobene oral solution 7.5 mg/ml 40 ml #1,0,0,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [46]:
def prep_data_for_scikit_hts(df):
    # Beware, chatgpt below :P
    aggregated_df = pd.DataFrame()

    # Split the columns into hierarchical levels by '-'
    columns_split = [col.split(' - ') for col in df.columns]

    # Get the unique top-level classes (regions)
    regions = list(set([col[0] for col in columns_split if len(col) > 1]))

    # AFAIK, HTS needs a 'total' column for each level in the hierarchy. I believe all tree nodes except bottom-most
    # Create a dictionary to represent the hierarchy, starting with 'total'
    hierarchy = {'total': regions}

    # Iterate through regions: Дальневосточный ФО
    for region in regions:
        # Drug Categories - 'ADRIANOL', 'AGALATES', 'ALMAGEL', 'ALMONT', 'AMBROBENE'
        categories = list(set([col[1] for col in columns_split if len(col) > 1 and col[0] == region]))
        region_key = region
        hierarchy[region_key] = [f'{region} - {category}' for category in categories]

        # Aggregate at the region level
        region_columns = [col for col in df.columns if col.startswith(f'{region} - ')]
        aggregated_df[region_key] = df[region_columns].sum(axis=1)

        # Iterate through Drug categories
        for category in categories:
            category_key = f'{region} - {category}'
            products = [col for col in df.columns if col.startswith(f'{region} - {category} - ')]
            hierarchy[category_key] = products

            # Aggregate at the category level
            category_columns = [col for col in df.columns if col.startswith(f'{region} - {category} - ')]
            aggregated_df[category_key] = df[category_columns].sum(axis=1)

    # Concatenate the aggregated columns with the original DataFrame
    df_with_aggregates = pd.concat([df, aggregated_df], axis=1)

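    # Add the "total" column.
    # NOTE: this row-sum runs over the bottom-level columns AND the region and category
    # aggregates added above, so every bottom-level value is counted three times;
    # summing only the original bottom-level columns of `df` would avoid this.
    df_with_aggregates['total'] = df_with_aggregates.sum(axis=1)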

    return df_with_aggregates, hierarchy

In [47]:
df_with_aggregates, hierarchy = prep_data_for_scikit_hts(df)

  df_with_aggregates['total'] = df_with_aggregates.sum(axis=1)


In [48]:
df_with_aggregates.head(5)

Unnamed: 0,Month,Дальневосточный ФО - AMBROBENE - Ambrobene Stoptussin drops 4 mg+100 mg/ml 10 ml #1,Дальневосточный ФО - AMBROBENE - Ambrobene Stoptussin drops 4 mg+100 mg/ml 25 ml #1,Дальневосточный ФО - AMBROBENE - Ambrobene Stoptussin drops 4 mg+100 mg/ml 50 ml #1,Дальневосточный ФО - AMBROBENE - Ambrobene Stoptussin tabs 4+100 mg #20,Дальневосточный ФО - AMBROBENE - Ambrobene oral solution 7.5 mg/ml 100 ml #1,Дальневосточный ФО - AMBROBENE - Ambrobene oral solution 7.5 mg/ml 40 ml #1,Дальневосточный ФО - AMBROBENE - Ambrobene solution for inj 15 mg/2ml 2 ml #5 ampules,Дальневосточный ФО - AMBROBENE - Ambrobene syrup 15 mg/5ml #100 bottle,Дальневосточный ФО - AMBROBENE - Ambrobene tabs 30 mg #20,...,Северо-кавказский ФО - TROXEVASIN,Приволжский ФО,Приволжский ФО - AMBROBENE,Приволжский ФО - VALZ,Приволжский ФО - TROXEVASIN,Северо-западный ФО,Северо-западный ФО - AMBROBENE,Северо-западный ФО - VALZ,Северо-западный ФО - TROXEVASIN,total
0,2018-03-01,1,1,1,1,1039,1,1151,1082,4011,...,257,60959,60334,10,615,21827,20661,7,1159,577968
1,2018-04-01,1,1,1,1,3220,1904,1551,9208,10352,...,553,125205,124159,10,1036,38191,35602,7,2582,1133067
2,2018-05-01,1,1,1,1,4523,2221,1701,10999,13751,...,1229,133062,131532,337,1193,52819,49024,46,3749,1449378
3,2018-06-01,5,101,1,1,5126,2251,1051,25765,19211,...,1660,128820,126354,444,2022,48226,44809,46,3371,1789011
4,2018-07-01,25,101,1,1,3634,378,651,20550,20003,...,2009,93419,91004,585,1830,52565,49111,46,3408,1818312
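
A row-sum over a frame that already contains aggregate columns counts each bottom-level value once per level in which it appears. The sketch below illustrates this on a toy frame and takes the total from the bottom-level columns only. The column names are hypothetical, and identifying bottom-level columns by their two ` - ` separators is an assumption based on the naming convention used in this notebook.

```python
import pandas as pd

# Toy wide frame (hypothetical names) in the "region - brand - product" convention.
wide = pd.DataFrame({
    "Month": pd.to_datetime(["2018-03-01", "2018-04-01"]),
    "North - X - x1": [1, 2],
    "North - X - x2": [3, 4],
    "South - X - x1": [5, 6],
})

# Assumption: bottom-level columns are exactly those with two " - " separators.
bottom_cols = [c for c in wide.columns if c.count(" - ") == 2]

# Add region and region-brand aggregates next to the bottom-level columns.
with_aggregates = wide.copy()
for col in bottom_cols:
    region, brand, _ = col.split(" - ")
    for key in (region, f"{region} - {brand}"):
        previous = with_aggregates[key] if key in with_aggregates else 0
        with_aggregates[key] = previous + wide[col]

# Summing every numeric column counts each value once per level (three times here).
naive_total = with_aggregates.drop(columns="Month").sum(axis=1)

# Summing only the bottom level gives a total that is coherent with the hierarchy.
total = wide[bottom_cols].sum(axis=1)

print(pd.DataFrame({"naive_total": naive_total, "total": total}))
```

In this toy case the naive total is exactly three times the coherent one, because the hierarchy has three levels below the total.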


<font color='cyan'>HierarchicalForecast expects long-format data with one row per series and date (Drug | Date | Sales), rather than one column per drug.</font>

In [49]:
# Melt the DataFrame: turn the column names into rows to match the long format HierarchicalForecast expects
melted_df = df_with_aggregates.melt(id_vars=['Month'], var_name='Drug', value_name='Sales')

# Reorder the columns
melted_df = melted_df[['Drug', 'Month', 'Sales']]

# Rename to the column names the package expects: unique_id, ds, y
melted_df = melted_df.rename(columns={'Drug': 'unique_id', 'Month': 'ds', 'Sales': 'y'})


melted_df


Unnamed: 0,unique_id,ds,y
0,Дальневосточный ФО - AMBROBENE - Ambrobene Sto...,2018-03-01,1
1,Дальневосточный ФО - AMBROBENE - Ambrobene Sto...,2018-04-01,1
2,Дальневосточный ФО - AMBROBENE - Ambrobene Sto...,2018-05-01,1
3,Дальневосточный ФО - AMBROBENE - Ambrobene Sto...,2018-06-01,5
4,Дальневосточный ФО - AMBROBENE - Ambrobene Sto...,2018-07-01,25
...,...,...,...
12991,total,2022-07-01,1473174
12992,total,2022-08-01,1433685
12993,total,2022-09-01,1256082
12994,total,2022-10-01,1287393


In [50]:
hierarchy

{'total': ['Сибирский ФО',
  'Уральский ФО',
  'Дальневосточный ФО',
  'Центральный ФО',
  'Южный ФО',
  'Северо-кавказский ФО',
  'Приволжский ФО',
  'Северо-западный ФО'],
 'Сибирский ФО': ['Сибирский ФО - AMBROBENE',
  'Сибирский ФО - VALZ',
  'Сибирский ФО - TROXEVASIN'],
 'Сибирский ФО - AMBROBENE': ['Сибирский ФО - AMBROBENE - Ambrobene Stoptussin drops 4 mg+100 mg/ml 10 ml #1',
  'Сибирский ФО - AMBROBENE - Ambrobene Stoptussin drops 4 mg+100 mg/ml 25 ml #1',
  'Сибирский ФО - AMBROBENE - Ambrobene Stoptussin drops 4 mg+100 mg/ml 50 ml #1',
  'Сибирский ФО - AMBROBENE - Ambrobene Stoptussin tabs 4+100 mg #20',
  'Сибирский ФО - AMBROBENE - Ambrobene oral solution 7.5 mg/ml 100 ml #1',
  'Сибирский ФО - AMBROBENE - Ambrobene oral solution 7.5 mg/ml 40 ml #1',
  'Сибирский ФО - AMBROBENE - Ambrobene solution for inj 15 mg/2ml 2 ml #5 ampules',
  'Сибирский ФО - AMBROBENE - Ambrobene syrup 15 mg/5ml #100 bottle',
  'Сибирский ФО - AMBROBENE - Ambrobene tabs 30 mg #20',
  'Сибирский

In [51]:
print(Y_df.head(15))
print(Y_df.tail(5))

                                            unique_id         ds     y
0   Дальневосточный ФО - ALMAGEL - Almagel susp 17... 2018-03-01   948
1   Дальневосточный ФО - ALMAGEL - Almagel susp 17... 2018-04-01  1036
2   Дальневосточный ФО - ALMAGEL - Almagel susp 17... 2018-05-01  1245
3   Дальневосточный ФО - ALMAGEL - Almagel susp 17... 2018-06-01  1786
4   Дальневосточный ФО - ALMAGEL - Almagel susp 17... 2018-07-01  4131
5   Дальневосточный ФО - ALMAGEL - Almagel susp 17... 2018-08-01  4508
6   Дальневосточный ФО - ALMAGEL - Almagel susp 17... 2018-09-01  3760
7   Дальневосточный ФО - ALMAGEL - Almagel susp 17... 2018-10-01  2280
8   Дальневосточный ФО - ALMAGEL - Almagel susp 17... 2018-11-01  2094
9   Дальневосточный ФО - ALMAGEL - Almagel susp 17... 2018-12-01  2559
10  Дальневосточный ФО - ALMAGEL - Almagel susp 17... 2019-01-01  1856
11  Дальневосточный ФО - ALMAGEL - Almagel susp 17... 2019-02-01  1731
12  Дальневосточный ФО - ALMAGEL - Almagel susp 17... 2019-03-01   670
13  Да

In [53]:
tags

{'Views': array(['Total'], dtype=object),
 'Views/Country': array(['de', 'en', 'fr', 'ja', 'ru', 'zh'], dtype=object),
 'Views/Country/Access': array(['de_AAC', 'de_DES', 'de_MOB', 'en_AAC', 'en_DES', 'en_MOB',
        'fr_AAC', 'fr_DES', 'fr_MOB', 'ja_AAC', 'ja_DES', 'ja_MOB',
        'ru_AAC', 'ru_DES', 'ru_MOB', 'zh_AAC', 'zh_DES', 'zh_MOB'],
       dtype=object),
 'Views/Country/Access/Agent': array(['de_AAC_AAG', 'de_AAC_SPD', 'de_DES_AAG', 'de_MOB_AAG',
        'en_AAC_AAG', 'en_AAC_SPD', 'en_DES_AAG', 'en_MOB_AAG',
        'fr_AAC_AAG', 'fr_AAC_SPD', 'fr_DES_AAG', 'fr_MOB_AAG',
        'ja_AAC_AAG', 'ja_AAC_SPD', 'ja_DES_AAG', 'ja_MOB_AAG',
        'ru_AAC_AAG', 'ru_AAC_SPD', 'ru_DES_AAG', 'ru_MOB_AAG',
        'zh_AAC_AAG', 'zh_AAC_SPD', 'zh_DES_AAG', 'zh_MOB_AAG'],
       dtype=object),
 'Views/Country/Access/Agent/Topic': array(['de_AAC_AAG_001', 'de_AAC_AAG_010', 'de_AAC_AAG_014',
        'de_AAC_AAG_045', 'de_AAC_AAG_063', 'de_AAC_AAG_100',
        'de_AAC_AAG_110', 'de_AAC
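
The `tags` output above maps each level name to the ids of the series at that level. A minimal sketch of building a mapping with the same shape for ids that follow the `region - brand - product` convention is shown below. The ids and level names are hypothetical, and inferring the depth from the number of ` - ` separators is an assumption about the naming scheme, not something the package does for you.

```python
from collections import defaultdict

# Hypothetical ids in the "region - brand - product" convention.
unique_ids = [
    "total",
    "North", "South",
    "North - X", "South - X",
    "North - X - x1", "North - X - x2", "South - X - x1",
]

# Assumed level names, indexed by depth in the hierarchy.
level_names = ["Total", "Region", "Region/Brand", "Region/Brand/Product"]

tags_sketch = defaultdict(list)
for uid in unique_ids:
    # Depth 0 is the grand total; otherwise count the separators.
    depth = 0 if uid == "total" else uid.count(" - ") + 1
    tags_sketch[level_names[depth]].append(uid)
tags_sketch = dict(tags_sketch)

for level, ids in tags_sketch.items():
    print(level, ids)
```

Note that this is a level-to-members mapping, which differs from a parent-to-children dictionary such as `hierarchy`. The values in the `tags` output above are NumPy arrays, so the lists here may need converting with `np.array` before use.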

We split the dataframe into training and test sets, holding out the last 7 observations of each series for testing.

In [54]:
Y_df = melted_df

In [55]:
Y_test_df = Y_df.groupby('unique_id').tail(7)  # hold out the last 7 observations of each series
Y_train_df = Y_df.drop(Y_test_df.index)

In [56]:
Y_test_df = Y_test_df.set_index('unique_id')
Y_train_df = Y_train_df.set_index('unique_id')
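
The hold-out logic can be checked on a toy frame. The sketch below uses hypothetical data and a hypothetical horizon of 2, and verifies that every training date of a series precedes all of its test dates.

```python
import pandas as pd

# Toy long-format frame (hypothetical data): two series, five monthly points each.
toy = pd.DataFrame({
    "unique_id": ["a"] * 5 + ["b"] * 5,
    "ds": list(pd.date_range("2022-01-01", periods=5, freq="MS")) * 2,
    "y": range(10),
})

horizon = 2  # hypothetical hold-out length

# The last `horizon` rows of each series form the test set; the rest is training data.
test = toy.groupby("unique_id").tail(horizon)
train = toy.drop(test.index)

# Per series, the latest training date must come before the earliest test date.
last_train = train.groupby("unique_id")["ds"].max()
first_test = test.groupby("unique_id")["ds"].min()
print((last_train < first_test).all())
```

This relies on the rows of each series being sorted by date, since `tail` selects by position rather than by timestamp.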

In [57]:
Y_test_df

Unnamed: 0_level_0,ds,y
unique_id,Unnamed: 1_level_1,Unnamed: 2_level_1
Дальневосточный ФО - AMBROBENE - Ambrobene Stoptussin drops 4 mg+100 mg/ml 10 ml #1,2022-05-01,1
Дальневосточный ФО - AMBROBENE - Ambrobene Stoptussin drops 4 mg+100 mg/ml 10 ml #1,2022-06-01,1
Дальневосточный ФО - AMBROBENE - Ambrobene Stoptussin drops 4 mg+100 mg/ml 10 ml #1,2022-07-01,1
Дальневосточный ФО - AMBROBENE - Ambrobene Stoptussin drops 4 mg+100 mg/ml 10 ml #1,2022-08-01,1
Дальневосточный ФО - AMBROBENE - Ambrobene Stoptussin drops 4 mg+100 mg/ml 10 ml #1,2022-09-01,1
...,...,...
total,2022-07-01,1473174
total,2022-08-01,1433685
total,2022-09-01,1256082
total,2022-10-01,1287393


In [58]:
Y_train_df

Unnamed: 0_level_0,ds,y
unique_id,Unnamed: 1_level_1,Unnamed: 2_level_1
Дальневосточный ФО - AMBROBENE - Ambrobene Stoptussin drops 4 mg+100 mg/ml 10 ml #1,2018-03-01,1
Дальневосточный ФО - AMBROBENE - Ambrobene Stoptussin drops 4 mg+100 mg/ml 10 ml #1,2018-04-01,1
Дальневосточный ФО - AMBROBENE - Ambrobene Stoptussin drops 4 mg+100 mg/ml 10 ml #1,2018-05-01,1
Дальневосточный ФО - AMBROBENE - Ambrobene Stoptussin drops 4 mg+100 mg/ml 10 ml #1,2018-06-01,5
Дальневосточный ФО - AMBROBENE - Ambrobene Stoptussin drops 4 mg+100 mg/ml 10 ml #1,2018-07-01,25
...,...,...
total,2021-12-01,5219037
total,2022-01-01,4549113
total,2022-02-01,3128595
total,2022-03-01,2460279


## 2. Base Forecasts

The following cells compute the *base forecast* for each time series using the `ETS` and `Naive` models. Observe that `Y_hat_df` contains the forecasts, but they are not coherent.

In [59]:
%%capture
from statsforecast.models import ETS, Naive
from statsforecast.core import StatsForecast

In [60]:
%%capture
fcst = StatsForecast(
    df=Y_train_df,
    models=[ETS(season_length=7, model='ZAA'), Naive()],
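    # NOTE: 'M' is month-end frequency, so the forecast dates below fall on month ends,
    # while the `ds` values in the data are month starts; 'MS' would keep them aligned
    freq='M',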
    n_jobs=-1
)
Y_hat_df = fcst.forecast(h=7)

Observe that the ETS model computes negative forecasts for some series.

In [65]:
Y_hat_df

Unnamed: 0_level_0,ds,ETS,Naive
unique_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
total,2022-04-30,2.643419e+06,2547954.0
total,2022-05-31,2.786836e+06,2547954.0
total,2022-06-30,2.918572e+06,2547954.0
total,2022-07-31,2.865979e+06,2547954.0
total,2022-08-31,2.775162e+06,2547954.0
...,...,...,...
Южный ФО - VALZ - Valz N FC tabs 80 mg/12.5mg #28,2022-06-30,2.654084e+02,1839.0
Южный ФО - VALZ - Valz N FC tabs 80 mg/12.5mg #28,2022-07-31,1.556661e+02,1839.0
Южный ФО - VALZ - Valz N FC tabs 80 mg/12.5mg #28,2022-08-31,4.382364e+02,1839.0
Южный ФО - VALZ - Valz N FC tabs 80 mg/12.5mg #28,2022-09-30,4.720009e+02,1839.0
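
Because each series is forecast independently, the base forecasts generally do not add up across levels. The sketch below measures that gap on a toy hierarchy. The forecast values are hypothetical, and it assumes the last rows of the summing matrix correspond to the bottom-level series.

```python
import numpy as np

S = np.array([
    [1, 1],  # Total = A + B
    [1, 0],  # A
    [0, 1],  # B
])

# Hypothetical base forecasts for [Total, A, B], each produced independently.
y_hat = np.array([100.0, 70.0, 45.0])

bottom_hat = y_hat[1:]     # forecasts of the bottom-level series
implied = S @ bottom_hat   # values for every series implied by the bottom level
gap = y_hat - implied      # nonzero entries reveal incoherence

print(gap)  # the Total forecast is 15 below the sum of A and B
```

The vector `implied` is also what a bottom-up reconciliation would return: it keeps the bottom-level forecasts and rebuilds every aggregate from them.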


In [69]:
S_df.shape

(40, 196)

## 3. Non-Negative Reconciliation

The following cell makes the previous forecasts coherent and nonnegative using the `HierarchicalReconciliation` class.

In [62]:
from hierarchicalforecast.methods import BottomUp, TopDown, MiddleOut, MinTrace
from hierarchicalforecast.core import HierarchicalReconciliation

In [None]:
reconcilers = [
    BottomUp(),
    TopDown(method='forecast_proportions'),
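    # NOTE: middle_level must be a key of `tags`; 'Country/Purpose/State' is not one of the keys shown above
    MiddleOut(middle_level='Country/Purpose/State',
              top_down_method='forecast_proportions')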
]
hrec = HierarchicalReconciliation(reconcilers=reconcilers)
Y_rec_df = hrec.reconcile(Y_hat_df=Y_hat_df, Y_df=Y_train_df,
                          S=S_df, tags=tags)

In [64]:
%%capture
reconcilers = [
    BottomUp()
    # MinTrace(method='ols'),
    # MinTrace(method='ols', nonnegative=True)
]
hrec = HierarchicalReconciliation(reconcilers=reconcilers)
Y_rec_df = hrec.reconcile(Y_hat_df=Y_hat_df, Y_df=Y_train_df,
                          S=S_df, tags=hierarchy)

Exception: ignored

Observe that the nonnegative reconciliation method yields only nonnegative forecasts: the query below returns no rows.

In [None]:
Y_rec_df.query('`ETS/MinTrace_method-ols_nonnegative-True` < 0')

Unnamed: 0_level_0,ds,ETS,Naive,ETS/MinTrace_method-ols,Naive/MinTrace_method-ols,ETS/MinTrace_method-ols_nonnegative-True,Naive/MinTrace_method-ols_nonnegative-True
unique_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1


The unconstrained reconciliation method, in contrast, produces negative forecasts.

In [None]:
Y_rec_df.query('`ETS/MinTrace_method-ols` < 0')

Unnamed: 0_level_0,ds,ETS,Naive,ETS/MinTrace_method-ols,Naive/MinTrace_method-ols,ETS/MinTrace_method-ols_nonnegative-True,Naive/MinTrace_method-ols_nonnegative-True
unique_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
de_DES,2016-12-25,-2553.932861,495.0,-3468.745214,495.0,2.262540e-15,495.0
de_DES,2016-12-26,-2155.228271,495.0,-2985.587125,495.0,1.356705e-30,495.0
de_DES,2016-12-27,-2720.993896,495.0,-3698.680055,495.0,6.857413e-30,495.0
de_DES,2016-12-29,-3429.432617,495.0,-2965.207609,495.0,2.456449e+02,495.0
de_DES,2016-12-30,-3963.202637,495.0,-3217.360371,495.0,3.646790e+02,495.0
...,...,...,...,...,...,...,...
zh_MOB_AAG_036,2016-12-26,75.298317,115.0,-165.799776,115.0,3.207772e-14,115.0
zh_MOB_AAG_036,2016-12-27,72.895554,115.0,-134.340626,115.0,2.308198e-14,115.0
zh_MOB_AAG_138,2016-12-25,94.796623,65.0,-47.009813,65.0,3.116938e-14,65.0
zh_MOB_AAG_138,2016-12-26,71.293983,65.0,-169.804110,65.0,0.000000e+00,65.0
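
The difference between the two variants can be illustrated on a toy problem. With OLS weights, unconstrained reconciliation is a least-squares fit of the base forecasts onto the summing matrix, and the nonnegative variant adds the constraint that the bottom-level values be nonnegative. The sketch below solves the constrained problem with a simple projected-gradient loop. That solver is an illustrative assumption, not the one used inside `HierarchicalForecast`, and the forecast values are hypothetical.

```python
import numpy as np

S = np.array([
    [1.0, 1.0],  # Total = A + B
    [1.0, 0.0],  # A
    [0.0, 1.0],  # B
])

# Hypothetical base forecasts for [Total, A, B]; the one for B is negative.
y_hat = np.array([10.0, 12.0, -3.0])

# Unconstrained OLS reconciliation: bottom-level values b minimizing ||S b - y_hat||^2.
b_ols, *_ = np.linalg.lstsq(S, y_hat, rcond=None)
y_ols = S @ b_ols

# Nonnegative variant: same objective with b >= 0, solved by projected gradient descent.
step = 1.0 / np.linalg.norm(S, 2) ** 2   # 1 / largest eigenvalue of S^T S
b_nn = np.zeros(S.shape[1])
for _ in range(500):
    grad = S.T @ (S @ b_nn - y_hat)              # gradient of 0.5 * ||S b - y_hat||^2
    b_nn = np.maximum(0.0, b_nn - step * grad)   # gradient step, then clip at zero
y_nn = S @ b_nn

print("OLS:        ", y_ols)
print("nonnegative:", y_nn)
```

Here the unconstrained solution assigns a negative value to series B, while the constrained one sets it to zero. Both results are coherent, because every level is rebuilt from the bottom-level values through `S`.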


## 4. Evaluation

The `HierarchicalForecast` package includes the `HierarchicalEvaluation` class, which evaluates forecasts at each level of the hierarchy and can also compute metrics scaled relative to a benchmark model.

In [None]:
from hierarchicalforecast.evaluation import HierarchicalEvaluation

In [None]:
def mse(y, y_hat):
    return np.mean((y-y_hat)**2)

evaluator = HierarchicalEvaluation(evaluators=[mse])
evaluation = evaluator.evaluate(
        Y_hat_df=Y_rec_df, Y_test_df=Y_test_df,
        tags=tags, benchmark='Naive'
)
evaluation.filter(like='ETS', axis=1).T

level,Overall,Views,Views/Country,Views/Country/Access,Views/Country/Access/Agent,Views/Country/Access/Agent/Topic
metric,mse-scaled,mse-scaled,mse-scaled,mse-scaled,mse-scaled,mse-scaled
ETS,1.011585,0.7358,1.190354,1.103657,1.089515,1.397139
ETS/MinTrace_method-ols,0.979163,0.698355,1.062521,1.143277,1.113349,1.354041
ETS/MinTrace_method-ols_nonnegative-True,0.945075,0.677892,1.004639,1.184719,1.141442,1.158672


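Observe that, overall, the nonnegative reconciliation method performs better than its unconstrained counterpart (0.945 versus 0.979 scaled MSE), although it is slightly worse at the `Views/Country/Access` and `Views/Country/Access/Agent` levels.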
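
To read the `mse-scaled` values, it helps to see how a scaled metric can be formed. The sketch below assumes that "scaled" means the model's error divided by the benchmark's error on the same data, so values below 1 indicate an improvement over the benchmark. The numbers are hypothetical.

```python
import numpy as np

def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)

# Hypothetical actuals, model forecasts and benchmark forecasts.
y = np.array([10.0, 12.0, 11.0])
model_hat = np.array([11.0, 12.0, 10.0])
benchmark_hat = np.array([8.0, 8.0, 8.0])

# Assumed definition: ratio of the model's error to the benchmark's error.
mse_scaled = mse(y, model_hat) / mse(y, benchmark_hat)

print(mse_scaled)  # below 1: the model beats the benchmark on this toy data
```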

### References
- [Hyndman, R.J., & Athanasopoulos, G. (2021). "Forecasting: Principles and Practice, 3rd edition: Chapter 11: Forecasting hierarchical and grouped series". OTexts: Melbourne, Australia. OTexts.com/fpp3. Accessed July 2022.](https://otexts.com/fpp3/hierarchical.html)
- [Wickramasuriya, S.L., Athanasopoulos, G., & Hyndman, R.J. (2019). "Optimal forecast reconciliation for hierarchical and grouped time series through trace minimization". Journal of the American Statistical Association, 114, 804–819. doi:10.1080/01621459.2018.1448825.](https://robjhyndman.com/publications/mint/)
- [Wickramasuriya, S.L., Turlach, B.A., & Hyndman, R.J. (2020). "Optimal non-negative forecast reconciliation". Stat Comput 30, 1167–1182. https://doi.org/10.1007/s11222-020-09930-0](https://robjhyndman.com/publications/nnmint/)