# Local vs Global Forecasting

The related article is available [here](https://medium.com/towards-data-science/local-vs-global-forecasting-what-you-need-to-know-1cc29e66cae0).

This notebook showcases the difference between the local and global approaches to time-series forecasting.

To install the needed dependencies, you can follow the instructions in the README at the root of the repository.

In [None]:
import pandas as pd
import plotly.graph_objects as go
from lightgbm import LGBMRegressor
from sklearn.preprocessing import MinMaxScaler

## Data preparation

In this example we use the Australian Tourism dataset. 

The dataset consists of quarterly time-series starting in 1998. In this notebook we consider the tourism numbers at the region level.

In [None]:
# Load data.
data = pd.read_csv('https://raw.githubusercontent.com/unit8co/darts/master/datasets/australian_tourism.csv')

# Add time information: quarterly data starting in 1998.
data.index = pd.date_range("1998-01-01",periods = len(data), freq = "3MS")
data.index.name = "time"

# Consider only region-level data.
data = data[['NSW','VIC', 'QLD', 'SA', 'WA', 'TAS', 'NT']]

# Let's give it nicer names.
data = data.rename(columns = {
    'NSW': "New South Wales",
    'VIC': "Victoria", 
    'QLD': "Queensland", 
    'SA': "South Australia", 
    'WA': "Western Australia", 
    'TAS': "Tasmania", 
    'NT': "Northern Territory",
})

## Quick analysis

In [None]:
# Let's visualize the data.
def show_data(data,title=""):
    trace = [go.Scatter(x=data.index,y=data[c],name=c) for c in data.columns]
    go.Figure(trace,layout=dict(title=title)).show()

show_data(data,"Australian Tourism data by Region")

We can see that:
- the data exhibits strong seasonality
- the scale of the time-series differs considerably across regions
- all the time-series have the same length
- there's no missing data

## Data Engineering

Let's predict the value of the next quarter based on:
- the values of the last 8 quarters (2 years), including the current one.
- the current quarter (as a categorical feature).

In [None]:
def build_targets_features(data,lags=range(8),horizon=1):
    features = {}
    targets = {}
    for c in data.columns:
        
        # Build lagged features.
        feat = pd.concat([data[[c]].shift(lag).rename(columns = {c: f"lag_{lag}"}) for lag in lags],axis=1)
        # Build quarter feature.
        feat["quarter"] = [f"Q{int((m-1) / 3 + 1)}" for m in data.index.month]
        feat["quarter"] = feat["quarter"].astype("category")

        # Build target at horizon.
        targ = data[c].shift(-horizon).rename(f"horizon_{horizon}")
        
        # Drop missing values generated by lags/horizon.
        idx = ~(feat.isnull().any(axis=1) | targ.isnull())
        features[c] = feat.loc[idx]
        targets[c] = targ.loc[idx]
        
    return targets,features


# Build targets and features.
targets,features = build_targets_features(data)
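
To make the shifting convention used above concrete, the following sketch applies the same `shift` calls to a small made-up quarterly series (the name `toy` and its values are illustrative assumptions, not part of the dataset). A positive shift brings past values onto the current row, while a negative shift brings in the future value that serves as the target.

```python
import pandas as pd

# Hypothetical toy series: six quarters of made-up values.
index = pd.date_range("2020-01-01", periods=6, freq="3MS")
index.name = "time"
toy = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0, 60.0], index=index, name="toy")

# lag_0 is the current value, lag_1 the value one quarter earlier.
feat = pd.concat([toy.shift(lag).rename(f"lag_{lag}") for lag in range(2)], axis=1)
# The quarter label can also be read directly from the DatetimeIndex.
feat["quarter"] = [f"Q{q}" for q in toy.index.quarter]

# The target is the value one quarter ahead of the current row.
targ = toy.shift(-1).rename("horizon_1")

# Keep only rows where all lags and the target are available.
idx = ~(feat.isnull().any(axis=1) | targ.isnull())
feat, targ = feat.loc[idx], targ.loc[idx]

print(feat.join(targ))
```

The first row is dropped because `lag_1` is missing, and the last row is dropped because there is no next quarter to use as target.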

## Train/Test split

Let's hold out the last 2 years (8 quarters) as the test set.

In [None]:
def train_test_split(targets,features,test_size=8):
    targ_train = {k: v.iloc[:-test_size] for k,v in targets.items()}
    feat_train = {k: v.iloc[:-test_size] for k,v in features.items()}
    targ_test = {k: v.iloc[-test_size:] for k,v in targets.items()}
    feat_test = {k: v.iloc[-test_size:] for k,v in features.items()}
    return targ_train,feat_train,targ_test,feat_test

targ_train,feat_train,targ_test,feat_test = train_test_split(targets,features)

## Model training

Now we estimate forecasting models using two different approaches:
- Local Approach: estimate one model for each time-series
- Global Approach: estimate one model for all time-series

In both cases we use a LightGBM model with default parameters.


### Local Approach

In [None]:
# Instantiate one LightGBM model with default parameters for each target.
local_models = {k: LGBMRegressor() for k in data.columns}

# Fit the models on the training set.
for k in data.columns:
    local_models[k].fit(feat_train[k],targ_train[k])

### Global Approach

The global approach needs a few extra steps.
1. First, since the targets have different scales, we normalize each time-series separately.
2. Then, to allow the model to distinguish between the targets, we add a categorical feature that identifies the time-series each row belongs to.
3. Finally, we concatenate the data of the different targets.
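
The normalization in step 1 is a per-series min-max scaling fitted on the training period only. As a rough illustration, the sketch below applies the min-max formula by hand to made-up numbers (the `train` and `test` values and the helper names are hypothetical). Note that test values above the training maximum map to values greater than 1.

```python
import pandas as pd

# Hypothetical values for a single series.
train = pd.Series([100.0, 150.0, 200.0])
test = pd.Series([175.0, 250.0])

# "Fit": the minimum and the range come from the training period only.
lo, hi = train.min(), train.max()


def scale(x):
    # Map the training range [lo, hi] onto [0, 1].
    return (x - lo) / (hi - lo)


def unscale(x):
    # Inverse of `scale`: go back to the original units.
    return x * (hi - lo) + lo


scaled_train = scale(train)
scaled_test = scale(test)  # Values above the training maximum end up above 1.

print(scaled_train.tolist())
print(scaled_test.tolist())
print(unscale(scaled_test).tolist())
```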

In [None]:
def fit_scalers(feat_train,targ_train):
    feat_scalers = {k: MinMaxScaler().set_output(transform="pandas") for k in feat_train}
    targ_scalers = {k: MinMaxScaler().set_output(transform="pandas") for k in feat_train}
    for k in feat_train:
        feat_scalers[k].fit(feat_train[k].drop(columns="quarter"))
        targ_scalers[k].fit(targ_train[k].to_frame())
    return feat_scalers,targ_scalers
        
        
def scale_features(feat,feat_scalers):
    scaled_feat = {}
    for k in feat:
        df = feat[k].copy()
        cols = [c for c in df.columns if c not in {"quarter"}]
        df[cols] = feat_scalers[k].transform(df[cols])
        scaled_feat[k] = df
    return scaled_feat


def scale_targets(targ,targ_scalers):
    return {k: targ_scalers[k].transform(v.to_frame()) for k,v in targ.items()}


# Fit scalers on numerical features and target on the training period.
feat_scalers,targ_scalers = fit_scalers(feat_train,targ_train)

# Scale train data.
scaled_feat_train = scale_features(feat_train,feat_scalers)
scaled_targ_train = scale_targets(targ_train,targ_scalers)

# Scale test data.
scaled_feat_test = scale_features(feat_test,feat_scalers)
scaled_targ_test = scale_targets(targ_test,targ_scalers)

In [None]:
# Add a `target_name` feature.
def add_target_name_feature(feat):
    for k,df in feat.items():
        df["target_name"] = k

add_target_name_feature(scaled_feat_train)
add_target_name_feature(scaled_feat_test)

In [None]:
# Concatenate the data.
global_feat_train = pd.concat(scaled_feat_train.values())
global_targ_train = pd.concat(scaled_targ_train.values())
global_feat_test = pd.concat(scaled_feat_test.values())
global_targ_test = pd.concat(scaled_targ_test.values())

In [None]:
# Make `target_name` categorical after concatenation.
global_feat_train.target_name = global_feat_train.target_name.astype("category")
global_feat_test.target_name = global_feat_test.target_name.astype("category")
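
One likely reason for casting after concatenation is how pandas combines categorical columns: when the pieces do not share the same set of categories, the concatenated column is no longer categorical. The sketch below illustrates this with two hypothetical per-series columns.

```python
import pandas as pd

# Two hypothetical per-series columns, each categorical with a single category.
a = pd.Series(["Victoria"] * 2, name="target_name").astype("category")
b = pd.Series(["Tasmania"] * 2, name="target_name").astype("category")

# The categories differ, so the concatenated result loses the categorical dtype.
combined_raw = pd.concat([a, b], ignore_index=True)
print(combined_raw.dtype)

# Casting once after concatenation gives a single categorical covering both labels.
combined_cat = combined_raw.astype("category")
print(list(combined_cat.cat.categories))
```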

In [None]:
# Instantiate a single LightGBM model with default parameters.
global_model = LGBMRegressor()

# Fit the model on the training set.
global_model.fit(global_feat_train,global_targ_train)

## Predictions on the test set

In [None]:
# Make predictions with the local models.
pred_local = {k: model.predict(feat_test[k]) for k, model in local_models.items()}

In [None]:
def predict_global_model(global_model, global_feat_test, targ_scalers):
    # Predict.
    pred_global_scaled = global_model.predict(global_feat_test)
    # Re-arrange the predictions: one column per target, indexed by time.
    pred_df_global = global_feat_test[["target_name"]].copy()
    pred_df_global["predictions"] = pred_global_scaled
    pred_df_global = pred_df_global.pivot(
        columns="target_name", values="predictions"
    )
    # Un-scale the predictions. Each column is renamed to the name its scaler
    # was fitted on, so that `inverse_transform` receives matching feature names.
    return {
        k: targ_scalers[k]
        .inverse_transform(
            pred_df_global[[k]].rename(
                columns={k: targ_scalers[k].feature_names_in_[0]}
            )
        )
        .reshape(-1)
        for k in pred_df_global.columns
    }


# Make predictions with the global model.
pred_global = predict_global_model(global_model, global_feat_test, targ_scalers)
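
The re-arrangement step in `predict_global_model` relies on `DataFrame.pivot` to turn the stacked (long) predictions back into one column per time-series, using the shared `time` index as rows. Here is a small sketch with made-up numbers (the values and the two region names are only for illustration).

```python
import pandas as pd

index = pd.date_range("2020-01-01", periods=2, freq="3MS")
index.name = "time"

# Long format: the two series are stacked, so each timestamp appears twice.
long_df = pd.DataFrame(
    {
        "target_name": ["Victoria", "Victoria", "Tasmania", "Tasmania"],
        "predictions": [0.2, 0.4, 0.6, 0.8],
    },
    index=index.append(index),
)

# Wide format: one row per timestamp, one column per series.
wide_df = long_df.pivot(columns="target_name", values="predictions")
print(wide_df)
```

Since no `index` argument is passed, `pivot` uses the existing index, which works as long as each (timestamp, series) pair appears only once.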

## Error Analysis

In [None]:
output = {}
for k in targ_test:
    df = targ_test[k].rename("target").to_frame()
    df["prediction_local"] = pred_local[k]
    df["prediction_global"] = pred_global[k]
    output[k] = df

In [None]:
def print_stats(output):
    output_all = pd.concat(output.values())
    mae_local = (output_all.target - output_all.prediction_local).abs().mean()
    mae_global = (output_all.target - output_all.prediction_global).abs().mean()

    print("                            LOCAL     GLOBAL")
    print(f"MAE overall              :  {mae_local:.1f}     {mae_global:.1f}\n")
    for k,df in output.items():   
        mae_local = (df.target - df.prediction_local).abs().mean()
        mae_global = (df.target - df.prediction_global).abs().mean()
        print(f"MAE - {k:19}:  {mae_local:.1f}     {mae_global:.1f}")

# Let's show some statistics.
print_stats(output)

In [None]:
def show_mae_by_timestep(output):
    output_all = pd.concat(output.values())
    df = pd.concat([
            (output_all.target - output_all.prediction_local).abs().groupby(level="time").mean().rename("mae_local"),
            (output_all.target - output_all.prediction_global).abs().groupby(level="time").mean().rename("mae_global"),
        ], 
        axis=1,
    )
    show_data(df,"MAE by timestep")
    
# Show the mean absolute error per timestep.
show_mae_by_timestep(output)

In [None]:
# Display the predictions.
for k,df in output.items():
    show_data(df,k)

## Conclusions

In this notebook we showcased the local and global approaches to time-series forecasting, using:
- quarterly Australian tourism data
- simple feature engineering
- a LightGBM model with no hyperparameter tuning

In this specific example, we saw that the global approach is superior, leading to a 43% lower mean absolute error than the local one.

In particular, the global approach had a lower MAE on:
- all the timesteps
- all the targets except for Western Australia
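
For reference, a relative figure like the one quoted above comes from comparing the two MAE values: the reduction is `(mae_local - mae_global) / mae_local`. Here is a minimal sketch on made-up numbers (all values below are hypothetical and unrelated to the notebook's results).

```python
import pandas as pd

# Hypothetical targets and predictions, for illustration only.
toy = pd.DataFrame(
    {
        "target": [100.0, 200.0, 300.0, 400.0],
        "prediction_local": [120.0, 180.0, 330.0, 370.0],
        "prediction_global": [110.0, 190.0, 315.0, 385.0],
    }
)

mae_local = (toy.target - toy.prediction_local).abs().mean()
mae_global = (toy.target - toy.prediction_global).abs().mean()

# Relative MAE reduction of the global approach with respect to the local one.
reduction = (mae_local - mae_global) / mae_local
print(f"MAE local: {mae_local:.1f}, MAE global: {mae_global:.1f}, reduction: {reduction:.0%}")
```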

This was somewhat expected, since:
- We are predicting multiple correlated time-series.
- The available history is very short.
- We are using a somewhat complex model for such a short univariate time-series. A classical statistical model such as Exponential Smoothing might be more appropriate in this setting.
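
As a pointer for that last remark, here is a minimal sketch of simple exponential smoothing used as a one-step-ahead baseline. It is written with pandas' `ewm` rather than a dedicated forecasting library. The function name, the smoothing factor `alpha=0.5` and the toy values are arbitrary assumptions. It ignores seasonality, which a seasonal variant (for example Holt-Winters) would need to handle for data like this.

```python
import pandas as pd


def ses_one_step_forecasts(series, alpha=0.5):
    """One-step-ahead forecasts from simple exponential smoothing."""
    # level_t = alpha * y_t + (1 - alpha) * level_{t-1}, with level_0 = y_0.
    level = series.ewm(alpha=alpha, adjust=False).mean()
    # The forecast for step t is the level available at step t - 1.
    return level.shift(1)


# Hypothetical toy series.
toy = pd.Series([10.0, 20.0, 30.0, 40.0])
forecasts = ses_one_step_forecasts(toy)
print(forecasts.tolist())
```

The first forecast is missing because no earlier observation is available to initialise the level.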

