![Forecasting Demo 2: PyTorch Forecasting](https://raw.githubusercontent.com/VectorInstitute/forecasting-bootcamp/media-assets-do-not-merge/forecasting-demo-2.png?token=GHSAT0AAAAAABQMCWQFIGE34XPH3MQR2BBEYRHXQCQ)

[PyTorch Forecasting](https://github.com/jdb78/pytorch-forecasting) is a package that provides convenient implementations of several leading deep learning-based forecasting models, namely [Temporal Fusion Transformers](https://arxiv.org/pdf/1912.09363.pdf), [N-BEATS](http://arxiv.org/abs/1905.10437), and [DeepAR](https://www.sciencedirect.com/science/article/pii/S0169207019301888). PyTorch Forecasting is built on [PyTorch Lightning](https://pytorch-lightning.readthedocs.io/), which makes it easy to train models on multiple GPUs out of the box.

**Note for Colab users:** Run the following cell to install PyTorch Forecasting. After installation completes, you will likely need to restart the Colab runtime. If this is the case, a button `RESTART RUNTIME` will appear at the bottom of the next cell's output.

In [None]:
if 'google.colab' in str(get_ipython()):
    !pip install pytorch-forecasting

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

import torch
import pytorch_lightning as pl
from pytorch_lightning.callbacks import EarlyStopping, LearningRateMonitor, Callback
from pytorch_forecasting import TimeSeriesDataSet, NBeats, Baseline, DeepAR, GroupNormalizer, MultiNormalizer, EncoderNormalizer
from pytorch_forecasting.data import NaNLabelEncoder
from pytorch_forecasting.metrics import MAPE, MAE, MASE, RMSE
from pytorch_forecasting.metrics import NormalDistributionLoss

In [None]:
class EpochCallback(Callback):
    """Print the metrics shown in the progress bar at the end of each validation epoch."""

    def on_validation_epoch_end(self, trainer, module):
        print(trainer.progress_bar_metrics)

In [None]:
if 'google.colab' in str(get_ipython()):
    from google.colab import drive
    drive.mount('/content/drive')

## Data Loading

We will continue with the same example dataset and task as the previous demo.

In [None]:
# data_filename = "/content/drive/MyDrive/bootcamp_datasets/boc_exchange/dataset.csv"
data_filename = "/ssd003/projects/forecasting_bootcamp/bootcamp_datasets/boc_exchange/dataset.csv"
data_df = pd.read_csv(data_filename, index_col=0)
data_df.index = pd.to_datetime(data_df.index)
data_df = data_df.reset_index().rename({'index':'date'}, axis=1)
data_df

## Data Splitting

We immediately split the data into training and test sets. Later, we split the training set further into training and validation sets.

In [None]:
train_size = 0.8

train_df = data_df.iloc[:int(len(data_df)*train_size)]
test_df = data_df.iloc[int(len(data_df)*train_size):]
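
Because the split slices by position with `iloc` instead of sampling at random, every test row comes after every training row, which is what a forecasting evaluation requires. A quick sanity check on a hypothetical toy frame (not the exchange-rate data):

```python
import pandas as pd

# Hypothetical toy data: 10 consecutive daily observations of one series
toy_df = pd.DataFrame({
    "date": pd.date_range("2020-01-01", periods=10, freq="D"),
    "USD_CLOSE": [1.30 + 0.01 * i for i in range(10)],
})

toy_split = int(len(toy_df) * 0.8)  # same 80% rule as above -> 8 rows
toy_train = toy_df.iloc[:toy_split]
toy_test = toy_df.iloc[toy_split:]

# No rows lost or shared, and the test period starts after training ends
print(len(toy_train), len(toy_test))  # 8 2
print(toy_train["date"].max() < toy_test["date"].min())  # True
```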

## Data Formatting

PyTorch Forecasting expects data to be formatted using its own [TimeSeriesDataSet](https://pytorch-forecasting.readthedocs.io/en/latest/data.html) objects. Building a TimeSeriesDataSet begins with a Pandas DataFrame, and, as with Prophet and NeuralProphet, we need to add a few custom columns.

For this demo, we again specify a lead time of 30 days and withhold the last 20% of the data for testing. The code below takes care to ensure that the model has no access to the withheld data during training or validation.

## Time Series Data Formatting

PyTorch Forecasting models can accommodate datasets consisting of multiple, coincident time series in several ways. As per the [documentation](https://pytorch-forecasting.readthedocs.io/en/latest/data.html), the combination of `group_id` and `time_idx` identifies a sample in the data. The documentation also advises, *if we have only one time series, to set* `group_id` *to a constant.*

`time_idx` is an *integer column denoting the time index*. This, as opposed to the `date` column, is used to determine the temporal sequence of samples. 
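
The distinction matters for data such as exchange rates, which are typically published only on business days: the `date` column has gaps (weekends, holidays), while `time_idx` should step by exactly one. A toy sketch with hypothetical dates of how `reset_index` produces such a column, with the test rows continuing the numbering of the training rows, just as in the next cell:

```python
import pandas as pd

# Hypothetical business-day dates starting Friday 2021-01-01 (weekend of Jan 2-3 skipped)
dates = pd.bdate_range("2021-01-01", periods=6)
toy_df = pd.DataFrame({"date": dates, "USD_CLOSE": [1.27, 1.28, 1.27, 1.26, 1.27, 1.28]})

toy_split = int(len(toy_df) * 0.8)  # int(4.8) = 4
toy_train = toy_df.iloc[:toy_split].reset_index().rename({"index": "time_idx"}, axis=1)
toy_test = toy_df.iloc[toy_split:].reset_index().rename({"index": "time_idx"}, axis=1)

# Calendar gaps in `date` (3 days over the weekend), but `time_idx` has none
print(toy_train["date"].diff().dt.days.iloc[1:].tolist())
print(toy_train["time_idx"].tolist(), toy_test["time_idx"].tolist())  # [0, 1, 2, 3] [4, 5]
```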

In [None]:
lag_time = 30
lead_time = 30

train_df = train_df.reset_index().rename({'index':'time_idx'}, axis=1)
test_df = test_df.reset_index().rename({'index':'time_idx'}, axis=1)

train_df['group_ids'] = 0
test_df['group_ids'] = 0

In [None]:
train_df

## Preparing a TimeSeriesDataSet for N-BEATS

N-BEATS is a univariate forecasting method. As such, it can only process one variable at a time.

In [None]:
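# Reshape wide -> long: one row per (time_idx, series); 'date' stays an id column rather than a series
train_df_nbeats = train_df.melt(id_vars=['time_idx', 'date'], value_vars=[col for col in data_df.columns if col != 'date'], var_name='group_ids')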
train_df_nbeats.head()

In [None]:
train_df_nbeats.tail()

The following code defines the training/validation split and two key parameters of the prediction task: the lag time (encoder length) and the lead time (prediction length). Here we specify that 25% of the training data should be used for validation. Overall, this gives a 60/20/20 train/valid/test split (25% of 80% is 20% of the full dataset).
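
The split fractions can be checked directly. The snippet also computes, for a hypothetical training set whose `time_idx` runs from 0 to 99, a cutoff equivalent to the one derived in the next cell (assuming `time_idx` starts at 0 and has no gaps); the real cutoff depends on the actual `time_idx` range.

```python
train_size = 0.8   # share of all data used for training + validation
valid_size = 0.25  # share of that portion held out for validation

train_frac = train_size * (1 - valid_size)  # 0.8 * 0.75
valid_frac = train_size * valid_size        # 0.8 * 0.25
test_frac = 1 - train_size

print(round(train_frac, 2), round(valid_frac, 2), round(test_frac, 2))  # 0.6 0.2 0.2

# Hypothetical training set with time_idx 0..99
toy_max_time_idx = 99
toy_cutoff = int((1 - valid_size) * toy_max_time_idx) - 1  # int(74.25) - 1 = 73
print(toy_cutoff)  # 73
```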

In [None]:
# define dataset
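valid_size = 0.25
max_encoder_length = lag_time
max_prediction_length = lead_time
# Last time_idx used for training; later time steps are held out for validation
# (equivalent to taking the last of the first int(0.75 * max) time indices, since time_idx starts at 0)
training_cutoff = int((1 - valid_size) * train_df_nbeats["time_idx"].max()) - 1
context_length = max_encoder_length
prediction_length = max_prediction_length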

Since N-BEATS is a univariate forecasting model, pytorch-forecasting does not support specifying explanatory variables as `time_varying_unknown_reals`; only the target itself (`value`) is listed there. Earlier, we used the Pandas function `melt` to reshape the data into a single `value` column that is uniquely indexed by pairs of `(time_idx, group_ids)`.
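
A toy illustration of that reshaping, using `USD_CLOSE` plus a hypothetical second series `EUR_CLOSE` (invented for this sketch): after `melt`, each `(time_idx, group_ids)` pair identifies exactly one row, which is the layout `TimeSeriesDataSet` expects.

```python
import pandas as pd

# Hypothetical wide frame: one column per series, plus the id columns
toy_wide = pd.DataFrame({
    "time_idx": [0, 1, 2],
    "date": pd.to_datetime(["2021-01-04", "2021-01-05", "2021-01-06"]),
    "USD_CLOSE": [1.27, 1.28, 1.26],
    "EUR_CLOSE": [1.56, 1.57, 1.55],  # hypothetical second series
})

toy_long = toy_wide.melt(
    id_vars=["time_idx", "date"],
    value_vars=["USD_CLOSE", "EUR_CLOSE"],
    var_name="group_ids",  # the former column name becomes the series identifier
)

print(toy_long.shape)  # (6, 4): 3 time steps x 2 series
print(list(toy_long.columns))  # ['time_idx', 'date', 'group_ids', 'value']
# Each (time_idx, group_ids) pair occurs exactly once
print(toy_long.duplicated(["time_idx", "group_ids"]).any())  # False
```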

In [None]:
training = TimeSeriesDataSet(
    train_df_nbeats.loc[train_df_nbeats.time_idx <= training_cutoff],  # Applying the training cutoff
    time_idx="time_idx",
    target="value",
    categorical_encoders={"group_ids": NaNLabelEncoder().fit(train_df_nbeats.group_ids)},
    group_ids=['group_ids'],
    max_encoder_length=context_length,
    max_prediction_length=max_prediction_length,
    time_varying_unknown_reals=["value"],
)

In [None]:
validation = TimeSeriesDataSet.from_dataset(training, train_df_nbeats, min_prediction_idx=training_cutoff + 1)
batch_size = 128
train_dataloader = training.to_dataloader(train=True, batch_size=batch_size, num_workers=1)
val_dataloader = validation.to_dataloader(train=False, batch_size=batch_size, num_workers=1)
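
To our understanding of `min_prediction_idx`, validation samples are windows whose *predicted* part starts after `training_cutoff`, while their encoder part may still reach back into the training period. The hypothetical helper below enumerates such full-length windows for one contiguous series. It is a plain-Python sketch of the idea, not pytorch-forecasting's internal logic (which also handles variable encoder and prediction lengths).

```python
def enumerate_windows(max_time_idx, encoder_length, prediction_length, min_prediction_idx=0):
    """Hypothetical helper: (encoder_start, prediction_start, prediction_end) for every
    full-length window over time indices 0..max_time_idx (inclusive) whose
    prediction part starts at or after min_prediction_idx."""
    windows = []
    for encoder_start in range(max_time_idx + 1):
        prediction_start = encoder_start + encoder_length
        prediction_end = prediction_start + prediction_length - 1
        if prediction_end > max_time_idx:
            break  # window would run past the end of the series
        if prediction_start >= min_prediction_idx:
            windows.append((encoder_start, prediction_start, prediction_end))
    return windows

# Hypothetical single series with time_idx 0..199, lag = lead = 30
toy_cutoff = int(0.75 * 199) - 1  # 148, derived the same way as training_cutoff
toy_train_windows = enumerate_windows(toy_cutoff, 30, 30)  # must fit entirely up to the cutoff
toy_val_windows = enumerate_windows(199, 30, 30, min_prediction_idx=toy_cutoff + 1)

print(len(toy_train_windows), len(toy_val_windows))  # 90 22
print(toy_val_windows[0])  # (119, 149, 178): the encoder starts inside the training period
```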

## Training the model

In our first effort to train the N-BEATS model, we adopt the default configuration suggested by the pytorch-forecasting [tutorials](https://pytorch-forecasting.readthedocs.io/en/stable/tutorials/ar.html).

In [None]:
pl.seed_everything(42)
early_stop_callback = EarlyStopping(monitor="val_loss", min_delta=1e-4, patience=10, verbose=False, mode="min")
epoch_callback = EpochCallback()

trainer = pl.Trainer(
    max_epochs=8,
    gpus=1,
    weights_summary="top",
    gradient_clip_val=0.01,
    callbacks=[early_stop_callback, epoch_callback],
    limit_train_batches=1.0,
)

net = NBeats.from_dataset(
    training,
    learning_rate=1e-4,
    log_interval=10,
    log_val_interval=1,
    weight_decay=1e-2,
    # widths=[16, 16],
    # backcast_loss_ratio=1.0,
)

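# Dataloaders are passed positionally: the `train_dataloader` keyword was renamed
# to `train_dataloaders` in newer PyTorch Lightning releases
trainer.fit(
    net,
    train_dataloader,
    val_dataloader,
)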

## Testing the model

We can now evaluate the model using the test set. Even though we trained the N-BEATS model on all variables in the training set, we will only evaluate performance on `USD_CLOSE` as a direct comparison to earlier experiments with Prophet and NeuralProphet. The following code creates a dataloader for the test set.

In [None]:
test_dataset = TimeSeriesDataSet(
    test_df,
    time_idx="time_idx",
    target="USD_CLOSE",
    group_ids=['group_ids'],
    max_encoder_length=context_length,
    max_prediction_length=max_prediction_length,
    time_varying_unknown_reals=["USD_CLOSE"],
)

test_dataloader = test_dataset.to_dataloader(train=False, batch_size=1, num_workers=0)
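
Assuming only full-length windows are produced (to our understanding, the default when `min_encoder_length` and `min_prediction_length` are left unset), the number of test samples, and hence the first dimension of `actuals` in the next section, follows from simple arithmetic: a series of `T` rows yields `T - lag_time - lead_time + 1` windows, and the first forecast only becomes available `lag_time` steps into the test set. A sketch with a hypothetical test length:

```python
def count_full_windows(n_rows, encoder_length, prediction_length):
    """Number of full-length (encoder + prediction) sliding windows over n_rows contiguous steps."""
    return max(0, n_rows - encoder_length - prediction_length + 1)

# Hypothetical test set of 500 rows with lag = lead = 30
print(count_full_windows(500, 30, 30))  # 441 -> actuals would have shape (441, 30)
print(count_full_windows(59, 30, 30))   # 0: too short for even one window
```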

## Collect test set predictions

In [None]:
best_model_path = trainer.checkpoint_callback.best_model_path
best_model = NBeats.load_from_checkpoint(best_model_path)

actuals = torch.cat([y[0] for x, y in iter(test_dataloader)])
predictions = best_model.predict(test_dataloader)

## Plot model forecasts at max lead time

As in Demo 1, we plot the predictions at max lead time as well as a grid of individual example forecasts. Feel free to modify the `range(lag_time, len(predictions_np), lead_time)` expression in the second cell to plot different examples.
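
The max-lead-time plot relies on `actuals` and `predictions` being stacked as `(n_windows, lead_time)` arrays, so element `[-1]` of each row is the forecast made `lead_time` steps ahead. On a hypothetical array, the loop used in the next cell is equivalent to a single NumPy slice:

```python
import numpy as np

# Hypothetical stacked forecasts: 4 windows x a lead time of 3 steps
toy_predictions = np.arange(12, dtype=float).reshape(4, 3)

# Loop form, as in the plotting cell
preds_loop = [toy_predictions[i][-1] for i in range(toy_predictions.shape[0])]

# Vectorized form: the last column is the max-lead-time forecast of every window
preds_vec = toy_predictions[:, -1]

print(preds_vec.tolist())  # [2.0, 5.0, 8.0, 11.0]
print(np.allclose(preds_loop, preds_vec))  # True
```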

In [None]:
actuals_np = actuals.detach().numpy()
predictions_np = predictions.detach().numpy()

indexes = []
preds = []
trues = []

for time_idx in range(actuals_np.shape[0]):
    indexes.append(time_idx)
    preds.append(predictions_np[time_idx][-1])
    trues.append(actuals_np[time_idx][-1])

fig, ax = plt.subplots()
ax.plot(indexes, preds, color='red', label='forecast')
ax.plot(indexes, trues, color='blue', label='ground truth')
ax.set_title(f"Forecasts at max lead time ({lead_time} samples) - N-BEATS")
ax.legend()

In [None]:
# plot individual example forecasts on a 4x4 grid

fig, axs = plt.subplots(4, 4, figsize=(20,20))
axs = axs.flatten()

for fig_index, example_index in enumerate(range(lag_time, len(predictions_np), lead_time)):

    if fig_index > len(axs) - 1:
        break

    # Show the actual values of the preceding `lag_time` windows as context
    n_prev_observations = lag_time
    start_index = max(example_index - n_prev_observations, 0)

    ground_truth = actuals_np[example_index]
    forecast = predictions_np[example_index]

    for index in range(start_index, example_index):
        past_values = actuals_np[index]
        # each stored window holds lead_time values, so size the x-range from the data
        axs[fig_index].plot(list(range(index, index + len(past_values))), past_values, color='black')

    # The example window itself starts at example_index
    index = example_index
    axs[fig_index].plot(list(range(index, index+lead_time)), ground_truth, color='blue', label='ground truth')
    axs[fig_index].plot(list(range(index, index+lead_time)), forecast, color='red', label='forecast')
    axs[fig_index].legend()

## Compare results to previous experiment

If you have run the experiments in **Demo 1**, you can load its results file here and continue the comparison.

In [None]:
results_df = pd.read_csv("./exchange_rate_mean_test_metrics.csv", index_col=0)
results_df.sort_values('mae')

To ensure consistency, we will redefine the same metrics from **Demo 1**.

In [None]:
from sklearn.metrics import mean_squared_error, mean_absolute_error, mean_absolute_percentage_error

metrics = {
    'mse': mean_squared_error,
    'rmse': lambda y_true, y_pred: np.sqrt(mean_squared_error(y_true, y_pred)),
    'mae': mean_absolute_error,
    'mape': mean_absolute_percentage_error
}

def compute_error_statistics(error_metrics_dict, exp_name):
    return {
        'mean': pd.DataFrame(error_metrics_dict).mean(axis=0).rename(f'{exp_name}_mean_metrics'),
        'std': pd.DataFrame(error_metrics_dict).std(axis=0).rename(f'{exp_name}_std_metrics'),
        'max': pd.DataFrame(error_metrics_dict).max(axis=0).rename(f'{exp_name}_max_metrics'),
    }

def compute_baseline_error_metrics(predictions, actuals):
    # Score every forecast window separately with each metric
    errors = {metric_name: [] for metric_name in metrics.keys()}

    for i in range(len(predictions)):
        for metric_name, metric_fn in metrics.items():
            errors[metric_name].append(metric_fn(y_true=actuals[i], y_pred=predictions[i]))

    return errors
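
Note that `compute_baseline_error_metrics` scores each forecast window separately and `compute_error_statistics` then aggregates across windows. A self-contained check with hypothetical values (two windows with a lead time of 2) shows what the reported mean looks like, and that for RMSE the mean of per-window scores is not the same as the RMSE pooled over all errors:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical actuals/forecasts: 2 windows with a lead time of 2
toy_actuals = np.array([[1.0, 2.0], [3.0, 4.0]])
toy_preds = np.array([[1.0, 3.0], [3.0, 6.0]])

# Per-window scores, mirroring compute_baseline_error_metrics
window_mae = [float(mean_absolute_error(y_true=a, y_pred=p)) for a, p in zip(toy_actuals, toy_preds)]
window_rmse = [float(np.sqrt(mean_squared_error(y_true=a, y_pred=p))) for a, p in zip(toy_actuals, toy_preds)]

print(window_mae)           # [0.5, 1.0]
print(np.mean(window_mae))  # 0.75, the kind of value reported in the 'mean' statistics

# For RMSE, averaging per-window scores differs from pooling all errors
pooled_rmse = float(np.sqrt(mean_squared_error(toy_actuals.ravel(), toy_preds.ravel())))
print(round(float(np.mean(window_rmse)), 4), round(pooled_rmse, 4))  # 1.0607 1.118
```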

In [None]:
actuals_np = actuals.detach().numpy()
predictions_np = predictions.detach().numpy()

nbeats_errors = compute_baseline_error_metrics(predictions_np, actuals_np)
nbeats_stats = compute_error_statistics(nbeats_errors, 'nbeats')
nbeats_stats['mean']

In [None]:
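# DataFrame.append was removed in pandas 2.0; pd.concat adds the same single row
results_df = pd.concat([results_df, nbeats_stats['mean'].to_frame().T])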
results_df.sort_values('mae')

## DeepAR

We will now add pytorch-forecasting's implementation of DeepAR to our experiment. The implementation is based on the original [paper](https://www.sciencedirect.com/science/article/pii/S0169207019301888). DeepAR is a method for probabilistic forecasting with autoregressive recurrent neural networks. DeepAR learns a *global model* from all historical data in the dataset, similar to N-BEATS. 
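
DeepAR's probabilistic output means each forecast step is a distribution rather than a single number. With `loss=NormalDistributionLoss()`, as configured in the training cell, the network predicts a mean and a standard deviation per step and, as we understand it, is trained by minimizing the Gaussian negative log-likelihood of the observed values. A NumPy sketch of that quantity (ignoring the target scaling that pytorch-forecasting applies internally) shows that both an overconfident and an overly wide predictive distribution score worse than a well-sized one:

```python
import numpy as np

def gaussian_nll(y, mu, sigma):
    """Mean negative log-likelihood of observations y under Normal(mu, sigma**2)."""
    y, mu, sigma = np.asarray(y, float), np.asarray(mu, float), np.asarray(sigma, float)
    return float(np.mean(0.5 * np.log(2 * np.pi * sigma**2) + (y - mu) ** 2 / (2 * sigma**2)))

toy_y = np.array([1.30, 1.32, 1.31])   # hypothetical observed values
toy_mu = np.array([1.30, 1.30, 1.30])  # hypothetical predicted means

# Same means, three different predicted standard deviations (lower NLL is better)
for sigma in (0.001, 0.01, 0.5):
    print(sigma, round(gaussian_nll(toy_y, toy_mu, sigma), 3))
```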

In [None]:
deepar_training = TimeSeriesDataSet(
    train_df_nbeats.loc[train_df_nbeats.time_idx <= training_cutoff],  # Applying the training cutoff
    time_idx="time_idx",
    target="value",
    group_ids=['group_ids'],
    min_encoder_length=context_length,
    max_encoder_length=context_length,
    min_prediction_length=1,
    max_prediction_length=max_prediction_length,
    categorical_encoders={"group_ids": NaNLabelEncoder().fit(train_df_nbeats.group_ids)},
    time_varying_unknown_reals=["value"],
    time_varying_known_reals=["time_idx"],
    target_normalizer=GroupNormalizer(groups=["group_ids"]),
    add_relative_time_idx=False,
    add_target_scales=True,
    randomize_length=None,
)
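
`GroupNormalizer(groups=["group_ids"])` normalizes the target separately for each stacked series, which matters because different currencies can sit on very different scales; `add_target_scales=True` additionally feeds each series' center and scale to the model as static features. As we understand it, the normalizer's default is a per-group standardization. A pandas sketch of that idea on hypothetical data (`JPY_CLOSE` is invented here; this is not the library's exact implementation, which supports other methods and transformations):

```python
import pandas as pd

# Hypothetical long-format data: two series on very different scales
toy_long = pd.DataFrame({
    "group_ids": ["USD_CLOSE"] * 3 + ["JPY_CLOSE"] * 3,  # JPY_CLOSE is a hypothetical series
    "value": [1.25, 1.30, 1.35, 0.0115, 0.0120, 0.0125],
})

# Per-group center and scale, broadcast back to every row of the group
grouped = toy_long.groupby("group_ids")["value"]
toy_long["center"] = grouped.transform("mean")
toy_long["scale"] = grouped.transform("std")  # sample std (ddof=1), a choice made for this sketch

# After normalization both series live on a comparable scale
toy_long["value_normalized"] = (toy_long["value"] - toy_long["center"]) / toy_long["scale"]
print(toy_long)
```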

In [None]:
deepar_validation = TimeSeriesDataSet.from_dataset(deepar_training, 
                                                   train_df_nbeats, 
                                                   min_prediction_idx=training_cutoff + 1)
batch_size = 64
train_dataloader = deepar_training.to_dataloader(train=True, batch_size=batch_size, num_workers=0)
val_dataloader = deepar_validation.to_dataloader(train=False, batch_size=batch_size, num_workers=0)

In [None]:
pl.seed_everything(42)
trainer = pl.Trainer(gpus=1)
net = DeepAR.from_dataset(deepar_training)

In [None]:
early_stop_callback = EarlyStopping(monitor="val_loss", min_delta=1e-4, patience=10, verbose=False, mode="min")
epoch_callback = EpochCallback()

trainer = pl.Trainer(
    max_epochs=8,
    gpus=1,
    gradient_clip_val=0.1,
    # weights_summary="top",
    callbacks=[early_stop_callback, epoch_callback],
    limit_train_batches=1.0,
)

net = DeepAR.from_dataset(
    deepar_training,
    hidden_size=32,
    dropout=0.1,
    loss=NormalDistributionLoss(),
    learning_rate=1e-4,
    log_interval=10,
    log_val_interval=1,
)

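# Dataloaders are passed positionally: the `train_dataloader` keyword was renamed
# to `train_dataloaders` in newer PyTorch Lightning releases
trainer.fit(
    net,
    train_dataloader,
    val_dataloader,
)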

In [None]:
# save datasets
deepar_training.save("training.pkl")
deepar_validation.save("validation.pkl")

In [None]:
best_model_path = trainer.checkpoint_callback.best_model_path
best_model = DeepAR.load_from_checkpoint(best_model_path)
print(best_model_path)

### Load the model from a saved checkpoint (optional)

In [None]:
best_model = DeepAR.load_from_checkpoint(best_model_path)

### Prepare the test dataset with necessary preprocessing options

Since we only want to evaluate on **USD_CLOSE**, the test dataset should only contain data from this series (as opposed to all of them vertically stacked like we did for training).

In [None]:
deep_ar_test_dataset = TimeSeriesDataSet(
    test_df,
    time_idx="time_idx",
    target="USD_CLOSE",
    group_ids=['group_ids'],
    max_encoder_length=context_length,
    max_prediction_length=max_prediction_length,
    categorical_encoders={"group_ids": NaNLabelEncoder().fit(test_df.group_ids)},
    time_varying_unknown_reals=["USD_CLOSE"],
    time_varying_known_reals=["time_idx"],
    target_normalizer=GroupNormalizer(groups=["group_ids"]),
    add_relative_time_idx=False,
    add_target_scales=True,
    randomize_length=None,
)

deepar_test_dataloader = deep_ar_test_dataset.to_dataloader(train=False, batch_size=1, num_workers=0)

### Same visualization and evaluation code that we used previously

In [None]:
actuals = torch.cat([y[0] for x, y in iter(deepar_test_dataloader) if y[0].shape[1] == lead_time])
print(actuals.shape)

In [None]:
predictions = best_model.predict(deepar_test_dataloader)
print(predictions.shape)

In [None]:
actuals_np = actuals.detach().numpy()
predictions_np = predictions.detach().numpy()

indexes = []
preds = []
trues = []

for time_idx in range(actuals_np.shape[0]):
    indexes.append(time_idx)
    preds.append(predictions_np[time_idx][-1])
    trues.append(actuals_np[time_idx][-1])

fig, ax = plt.subplots()
ax.plot(indexes, preds, color='red', label='forecast')
ax.plot(indexes, trues, color='blue', label='ground truth')
ax.set_title(f"Forecasts at max lead time ({lead_time} samples) - DeepAR")
ax.legend()

In [None]:
# plot individual example forecasts on a 4x4 grid

fig, axs = plt.subplots(4, 4, figsize=(20,20))
axs = axs.flatten()

for fig_index, example_index in enumerate(range(lag_time, len(predictions_np), lead_time)):

    if fig_index > len(axs) - 1:
        break

    # Show the actual values of the preceding `lag_time` windows as context
    n_prev_observations = lag_time
    start_index = max(example_index - n_prev_observations, 0)

    ground_truth = actuals_np[example_index]
    forecast = predictions_np[example_index]

    for index in range(start_index, example_index):
        past_values = actuals_np[index]
        # each stored window holds lead_time values, so size the x-range from the data
        axs[fig_index].plot(list(range(index, index + len(past_values))), past_values, color='black')

    # The example window itself starts at example_index
    index = example_index
    axs[fig_index].plot(list(range(index, index+lead_time)), ground_truth, color='blue', label='ground truth')
    axs[fig_index].plot(list(range(index, index+lead_time)), forecast, color='red', label='forecast')
    axs[fig_index].legend()

In [None]:
deepar_errors = compute_baseline_error_metrics(predictions_np, actuals_np)
deepar_stats = compute_error_statistics(deepar_errors, 'deepAR')
deepar_stats['mean']

In [None]:
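# DataFrame.append was removed in pandas 2.0; pd.concat adds the same single row
results_df = pd.concat([results_df, deepar_stats['mean'].to_frame().T])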
results_df.sort_values('mae')