# Local vs Global Temporal Aggregation

> Temporal Hierarchical Aggregation on a local or global level.

In this notebook we explain the difference between temporally aggregating timeseries locally and globally.

You can run these experiments using CPU or GPU with Google Colab.

<a href="https://colab.research.google.com/github/Nixtla/hierarchicalforecast/blob/main/nbs/examples/LocalGlobalAggregation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
%%capture
!pip install hierarchicalforecast utilsforecast

## 1. Generate Data

In this example we will generate synthetic series to explain the difference between local and global temporal aggregation. We will generate 2 series with a daily frequency.

In [None]:
from utilsforecast.data import generate_series

In [None]:
freq = "D"
n_series = 2
df = generate_series(n_series=n_series, 
                     freq=freq, 
                     min_length=2 * 365, 
                     max_length=4 * 365,  
                     equal_ends=True)

Note that our two timeseries do not have the same number of timesteps:

In [None]:
df.groupby('unique_id', observed=True)["ds"].count()

unique_id
0    1414
1    1289
Name: ds, dtype: int64
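Because the series were generated with `equal_ends=True`, they end on the same date, so the difference in length shows up at the start: the shorter series begins later. The standard-library sketch below only does that date arithmetic. The end date is a hypothetical placeholder rather than the one `generate_series` actually produced; only the offset between the two start dates depends on the counts above.

```python
from datetime import date, timedelta

# Hypothetical shared end date (assumption); only the lengths come from the counts above.
end = date(2003, 12, 31)
lengths = {"0": 1414, "1": 1289}

# A daily series with n observations ending at `end` starts n - 1 days earlier.
starts = {uid: end - timedelta(days=n - 1) for uid, n in lengths.items()}

# The shorter series starts later by exactly the difference in lengths.
offset_days = (starts["1"] - starts["0"]).days
print(starts)
print(offset_days)  # 125
```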

We then define a spec for our temporal aggregations.

In [None]:
spec = {"year": 365, "quarter": 91, "month": 30, "week": 7, "day": 1}
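Each entry maps a level name to the number of base-frequency steps (here, days) that form one bucket of that level, so `month` is a fixed block of 30 days and `year` a block of 365 days, not calendar periods. The sketch below uses a hypothetical helper, `complete_buckets`, which is not part of `hierarchicalforecast`, to count how many complete buckets of each level fit into the two series lengths shown above. How `aggregate_temporal` itself handles leftover days that do not fill a whole bucket is not modelled here.

```python
spec = {"year": 365, "quarter": 91, "month": 30, "week": 7, "day": 1}

def complete_buckets(n_steps, spec):
    """Hypothetical helper: complete buckets per level for n_steps daily steps."""
    return {level: n_steps // size for level, size in spec.items()}

print(complete_buckets(1414, spec))
# {'year': 3, 'quarter': 15, 'month': 47, 'week': 202, 'day': 1414}
print(complete_buckets(1289, spec))
# {'year': 3, 'quarter': 14, 'month': 42, 'week': 184, 'day': 1289}
```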

## 2. Local aggregation (default)

In local aggregation, we treat the timestamps of each timeseries individually: the temporal aggregation of a series is based only on that series' own timestamps, disregarding the timestamps of the other series.
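To make the idea concrete, here is a minimal pure-Python sketch of local bucketing on hypothetical toy data (two daily series with equal ends, the second one shorter). It assumes buckets are counted from each series' own first timestamp; `local_bucket_ids` is an illustrative helper, not the `hierarchicalforecast` implementation, whose exact alignment of buckets may differ.

```python
from datetime import date, timedelta

def daily(start, n):
    """n consecutive daily timestamps beginning at `start`."""
    return [start + timedelta(days=i) for i in range(n)]

# Hypothetical toy data: both series end on 2000-04-29; series "1" is 60 days shorter.
series = {
    "0": daily(date(2000, 1, 1), 120),
    "1": daily(date(2000, 3, 1), 60),
}

def local_bucket_ids(timestamps, size):
    """Illustrative: 1-based bucket id per timestamp, counted from this series' own start."""
    return {t: i // size + 1 for i, t in enumerate(sorted(timestamps))}

month = 30  # days per "month" bucket, as in the spec above
for uid, ts in series.items():
    ids = local_bucket_ids(ts, month)
    first = [t for t, b in ids.items() if b == 1]
    print(uid, first[0], first[-1])
# 0 2000-01-01 2000-01-30
# 1 2000-03-01 2000-03-30
```

Bucket 1 covers January for series `0` but March for series `1`: the same bucket id names different periods.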

In [None]:
from hierarchicalforecast.utils import aggregate_temporal

In [None]:
Y_df_local, S_df_local, tags_local = aggregate_temporal(df, spec)


We have created temporal aggregations _per timeseries_: the temporal aggregation `month-1` does not refer to the same period in both timeseries.

In [None]:
Y_df_local.query("temporal_id == 'month-1'")

|    | temporal_id | unique_id | ds         | y         |
|----|-------------|-----------|------------|-----------|
| 39 | month-1     | 0         | 2000-03-16 | 93.574676 |
| 87 | month-1     | 1         | 2000-07-19 | 91.506421 |


## 3. Global aggregation

In global aggregation, we collect the unique timestamps across all timeseries and base the temporal aggregations on that shared list of timestamps, so a given aggregation refers to the same period in every series.
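A matching sketch for global bucketing, on hypothetical toy data (two daily series with equal ends, the second one 60 days shorter): bucket ids are computed once on the sorted union of all timestamps and then looked up for each series. Counting buckets from the earliest timestamp overall is a simplifying assumption, and `global_bucket_ids` is an illustrative helper rather than the library's implementation.

```python
from datetime import date, timedelta

def daily(start, n):
    """n consecutive daily timestamps beginning at `start`."""
    return [start + timedelta(days=i) for i in range(n)]

# Hypothetical toy data: both series end on 2000-04-29; series "1" is 60 days shorter.
series = {
    "0": daily(date(2000, 1, 1), 120),
    "1": daily(date(2000, 3, 1), 60),
}

def global_bucket_ids(series, size):
    """Illustrative: bucket ids computed once on the union of all timestamps."""
    union = sorted(set().union(*series.values()))
    ids = {t: i // size + 1 for i, t in enumerate(union)}
    # Each series only keeps the global buckets its own timestamps fall into.
    return {uid: {t: ids[t] for t in ts} for uid, ts in series.items()}

per_series = global_bucket_ids(series, 30)  # 30-day "month" buckets
for uid, ids in per_series.items():
    print(uid, sorted(set(ids.values())))
# 0 [1, 2, 3, 4]
# 1 [3, 4]
```

Series `1` only appears in buckets 3 and 4, and bucket 3 starts on 2000-03-01 for both series, so a bucket id refers to the same period everywhere.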

In [None]:
Y_df_global, S_df_global, tags_global = aggregate_temporal(df, spec, aggregation_type="global")


We have created temporal aggregations _across all timeseries_: the temporal aggregation `month-1` now refers to the same period for both timeseries. Since the second timeseries is shorter and starts later, it has no data in `month-1`, so there is only one record for this aggregation.

In [None]:
Y_df_global.query("temporal_id == 'month-1'")

|    | temporal_id | unique_id | ds         | y         |
|----|-------------|-----------|------------|-----------|
| 39 | month-1     | 0         | 2000-03-16 | 93.574676 |


For `month-5`, however, we have a record for both timeseries, as the second series has its first datapoint at that date.

In [None]:
Y_df_global.query("temporal_id == 'month-5'")

|    | temporal_id | unique_id | ds         | y         |
|----|-------------|-----------|------------|-----------|
| 43 | month-5     | 0         | 2000-07-14 | 95.169659 |
| 87 | month-5     | 1         | 2000-07-14 | 74.502584 |


## 4. What to choose?

- The default behavior is `local`. This means that the same temporal aggregation (e.g. `month-1`) can refer to different periods in different timeseries, so aggregations can't be compared across timeseries. This behavior is generally safer and is recommended when the timeseries are not necessarily related.
- The `global` behavior is useful when we expect relationships between the timeseries and the timeseries don't all have the same length. An example is forecasting product demand, where individual products don't have sales at every timestep but one is interested in the overall aggregation. The `global` setting leaves more room for error, so check the aggregation result carefully.
- If all timeseries have the same timestamps (for example, the same length and, as here, the same end date), `global` and `local` yield the same results.
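The last point can be checked with a toy sketch: when every series has exactly the same timestamps, the union of timestamps equals each series' own timestamps, so counting buckets per series or on the union gives the same ids. The helpers `local_ids` and `global_ids` are hypothetical and use a simplified alignment (buckets counted from the first timestamp), not `aggregate_temporal` itself.

```python
from datetime import date, timedelta

def daily(start, n):
    """n consecutive daily timestamps beginning at `start`."""
    return [start + timedelta(days=i) for i in range(n)]

def local_ids(series, size):
    """Hypothetical: buckets counted from each series' own first timestamp."""
    return {
        uid: {t: i // size + 1 for i, t in enumerate(sorted(ts))}
        for uid, ts in series.items()
    }

def global_ids(series, size):
    """Hypothetical: buckets counted on the union of all series' timestamps."""
    union = sorted(set().union(*series.values()))
    ids = {t: i // size + 1 for i, t in enumerate(union)}
    return {uid: {t: ids[t] for t in ts} for uid, ts in series.items()}

shared = daily(date(2000, 1, 1), 90)

# Both series share all 90 timestamps: local and global ids coincide.
aligned = {"a": shared, "b": shared}
same = local_ids(aligned, 30) == global_ids(aligned, 30)

# Series "b" drops its first 30 days: its local bucket 1 is global bucket 2.
shifted = {"a": shared, "b": shared[30:]}
differs = local_ids(shifted, 30) != global_ids(shifted, 30)

print(same, differs)  # True True
```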