[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://github.com/ourownstory/test-of-time/blob/main/tutorials/BenchmarkingTemplates.ipynb)

# Running benchmarking experiments
Note: The benchmarking framework does not currently support lagged covariates properly when forecasting multiple steps ahead.

In [1]:
if 'google.colab' in str(get_ipython()):
    !pip install git+https://github.com/ourownstory/test-of-time.git # may take a while
    #!pip install neuralprophet # much faster, but may not have the latest upgrades/bugfixes

# we also need prophet for this notebook
# !pip install prophet

import pandas as pd
from neuralprophet import NeuralProphet, set_log_level
from tot.datasets import Dataset
from tot.models.models_neuralprophet import NeuralProphetModel
from tot.models.models_prophet import ProphetModel
from tot.benchmark import SimpleBenchmark, CrossValidationBenchmark
set_log_level("ERROR")

## Load data

In [2]:
data_location = "https://raw.githubusercontent.com/ourownstory/neuralprophet-data/main/datasets/"

air_passengers_df = pd.read_csv(data_location + 'air_passengers.csv')
peyton_manning_df = pd.read_csv(data_location + 'wp_log_peyton_manning.csv')
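
The models used here expect a timestamp column `ds` and a value column `y`, and each `Dataset` below is given a pandas frequency string. As a minimal sketch, the following checks that layout and infers the frequency. It uses a small synthetic frame with made-up values instead of the downloaded files, and `check_timeseries_frame` is a hypothetical helper, not part of `tot`.

```python
import pandas as pd

# Synthetic stand-in for a monthly series; the values are made up for illustration.
monthly_df = pd.DataFrame(
    {
        "ds": pd.date_range("2020-01-01", periods=6, freq="MS"),
        "y": [10.0, 12.0, 11.0, 13.0, 15.0, 14.0],
    }
)


def check_timeseries_frame(df):
    """Hypothetical helper: validate the ds/y layout and infer the frequency."""
    missing = {"ds", "y"} - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    timestamps = pd.DatetimeIndex(pd.to_datetime(df["ds"]))
    # pd.infer_freq returns a pandas offset alias, or None if no regular frequency fits
    return pd.infer_freq(timestamps)


inferred_freq = check_timeseries_frame(monthly_df)
print(inferred_freq)
```

Running the same check on `air_passengers_df` and `peyton_manning_df` is a quick way to confirm which `freq` value to pass to `Dataset`.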

## 0. Configure Datasets and Model Parameters
First, we define the datasets that we would like to benchmark on.
Next, we define the models that we want to evaluate and set their hyperparameters.

In [3]:
dataset_list = [
    Dataset(df = air_passengers_df, name = "air_passengers", freq = "MS"),
    Dataset(df = peyton_manning_df, name = "peyton_manning", freq = "D"),
]
model_classes_and_params = [
    (NeuralProphetModel, {"seasonality_mode": "multiplicative", "learning_rate": 0.1}),
    (ProphetModel, {"seasonality_mode": "multiplicative"})
]

Note: All classes used in the benchmarking framework are dataclasses, 
so they come with a readable printed representation that lets us inspect them:

In [4]:
model_classes_and_params

[(tot.models.models_neuralprophet.NeuralProphetModel,
  {'seasonality_mode': 'multiplicative', 'learning_rate': 0.1}),
 (tot.models.models_prophet.ProphetModel,
  {'seasonality_mode': 'multiplicative'})]
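
Two datasets and two model configurations suggest four dataset/model combinations. The sketch below only illustrates that pairing with plain strings and dictionaries; it assumes each dataset is paired with each model configuration, and it does not reproduce how the framework builds its experiments internally.

```python
from itertools import product

# Plain-string stand-ins for the Dataset objects and model classes defined above
dataset_names = ["air_passengers", "peyton_manning"]
model_configs = [
    ("NeuralProphet", {"seasonality_mode": "multiplicative", "learning_rate": 0.1}),
    ("Prophet", {"seasonality_mode": "multiplicative"}),
]

# Pair every dataset with every model configuration (assumed behavior)
experiment_grid = [
    {"data": data_name, "model": model_name, "params": params}
    for data_name, (model_name, params) in product(dataset_names, model_configs)
]

for entry in experiment_grid:
    print(entry["data"], entry["model"], entry["params"])
```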

## 3. Manual Benchmark
If you need more control over the individual Experiments, you can set them up manually:

In [5]:
from tot.experiment import SimpleExperiment, CrossValidationExperiment
from tot.benchmark import ManualBenchmark, ManualCVBenchmark

### 3.1 ManualBenchmark: Manual SimpleExperiment Benchmark

In [10]:
metrics = ["RMSE", "MAPE"]
experiments = [
    SimpleExperiment(
        model_class=NeuralProphetModel,
        params={"seasonality_mode": "multiplicative", "learning_rate": 0.1},
        data=Dataset(df=air_passengers_df, name="air_passengers", freq="MS"),
        metrics=metrics,
        test_percentage=0.25,
    ),
    SimpleExperiment(
        model_class=ProphetModel,
        params={"seasonality_mode": "multiplicative", },
        data=Dataset(df=air_passengers_df, name="air_passengers", freq="MS"),
        metrics=metrics,
        test_percentage=0.25,
    )
]
benchmark = ManualBenchmark(
    experiments=experiments,
    metrics=metrics,
)
results_train, results_test = benchmark.run()

17:56:04 - cmdstanpy - INFO - Chain [1] start processing
17:56:04 - cmdstanpy - INFO - Chain [1] done processing
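
To make `test_percentage=0.25` concrete, here is a minimal sketch of a chronological holdout split. It assumes the test set is the final 25% of rows and that the row count is rounded down; the framework's own rounding and edge handling may differ. `holdout_split` is a hypothetical helper, and the frame holds made-up values.

```python
import pandas as pd


def holdout_split(df, test_percentage):
    """Hypothetical helper: keep the last `test_percentage` of rows as the test set."""
    if not 0 < test_percentage < 1:
        raise ValueError("test_percentage must be between 0 and 1")
    n_test = max(1, int(len(df) * test_percentage))
    split_at = len(df) - n_test
    # No shuffling: time series must be split in chronological order
    return df.iloc[:split_at], df.iloc[split_at:]


# Synthetic monthly frame with 144 rows (made-up values)
demo_df = pd.DataFrame(
    {"ds": pd.date_range("1949-01-01", periods=144, freq="MS"), "y": range(144)}
)
train_df, test_df = holdout_split(demo_df, test_percentage=0.25)
print(len(train_df), len(test_df))
```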


In [11]:
results_test

| | data | model | params | experiment | RMSE | MAPE |
|---|---|---|---|---|---|---|
| 0 | air_passengers | NeuralProphet | {'seasonality_mode': 'multiplicative', 'learni... | air_passengers_NeuralProphet_seasonality_mode_... | 41.886112 | 9.221118 |
| 1 | air_passengers | Prophet | {'seasonality_mode': 'multiplicative', '_data_... | air_passengers_Prophet_seasonality_mode_multip... | 33.422844 | 7.379519 |
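
For readers unfamiliar with the two metrics, the sketch below computes RMSE and MAPE with NumPy using their textbook definitions. It assumes MAPE is reported in percent, which the magnitudes in the table suggest, and it is not the framework's own implementation. The input values are made up.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of y."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent. Undefined where y_true is 0."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(100.0 * np.mean(np.abs((y_true - y_pred) / y_true)))


# Made-up actuals and forecasts for illustration
actuals = [100.0, 200.0, 400.0]
forecasts = [110.0, 180.0, 400.0]
rmse_value = rmse(actuals, forecasts)
mape_value = mape(actuals, forecasts)
print(rmse_value, mape_value)
```

RMSE penalizes large errors more heavily and depends on the scale of the data, while MAPE is scale-free, so the two can rank models differently.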


### 3.2 ManualCVBenchmark: Manual CrossValidationExperiment Benchmark

In [12]:
air_passengers_df = pd.read_csv(data_location + 'air_passengers.csv')
experiments = [
    CrossValidationExperiment(
        model_class=NeuralProphetModel,
        params={"seasonality_mode": "multiplicative", "learning_rate": 0.1},
        data=Dataset(df=air_passengers_df, name="air_passengers", freq="MS"),
        metrics=metrics,
        test_percentage=0.10,
        num_folds=3,
        fold_overlap_pct=0,
    ),
    CrossValidationExperiment(
        model_class=ProphetModel,
        params={"seasonality_mode": "multiplicative", },
        data=Dataset(df=air_passengers_df, name="air_passengers", freq="MS"),
        metrics=metrics,
        test_percentage=0.10,
        num_folds=3,
        fold_overlap_pct=0,
    ),
]
benchmark_cv = ManualCVBenchmark(
    experiments=experiments,
    metrics=metrics,
)
results_summary, results_train, results_test = benchmark_cv.run()
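
One plausible reading of `test_percentage`, `num_folds`, and `fold_overlap_pct` is a rolling-origin scheme: each fold's test window covers `test_percentage` of the series, consecutive windows are shifted by the window size minus the overlap, and the training set is everything before the window. The sketch below implements that reading on row indices only. It is a simplified assumption, not the framework's splitting code, and `rolling_origin_folds` is a hypothetical helper.

```python
def rolling_origin_folds(n_samples, num_folds, test_percentage, fold_overlap_pct=0.0):
    """Hypothetical helper: return (train_range, test_range) index pairs per fold."""
    fold_size = max(1, int(test_percentage * n_samples))
    overlap = int(fold_overlap_pct * fold_size)
    step = fold_size - overlap
    folds = []
    for i in range(num_folds):
        # The last fold's test window ends at the end of the series
        test_end = n_samples - (num_folds - 1 - i) * step
        test_start = test_end - fold_size
        if test_start <= 0:
            raise ValueError("not enough samples for the requested folds")
        # Train on everything before the test window (expanding window)
        folds.append((range(0, test_start), range(test_start, test_end)))
    return folds


# 144 rows, matching the length of the classic monthly air passengers series
folds = rolling_origin_folds(n_samples=144, num_folds=3, test_percentage=0.10)
for train_idx, test_idx in folds:
    print(len(train_idx), test_idx.start, test_idx.stop)
```

With `fold_overlap_pct=0` the test windows do not share any rows, so each fold is evaluated on distinct data.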

In [13]:
results_summary

| | data | model | params | experiment | RMSE | MAPE | RMSE_std | MAPE_std | split |
|---|---|---|---|---|---|---|---|---|---|
| 0 | air_passengers | NeuralProphet | {'seasonality_mode': 'multiplicative', 'learni... | air_passengers_NeuralProphet_seasonality_mode_... | 16.233139 | 5.936355 | 3.077817 | 0.860741 | train |
| 1 | air_passengers | Prophet | {'seasonality_mode': 'multiplicative', '_data_... | air_passengers_Prophet_seasonality_mode_multip... | 8.608308 | 3.087285 | 1.222968 | 0.241571 | train |
| 0 | air_passengers | NeuralProphet | {'seasonality_mode': 'multiplicative', 'learni... | air_passengers_NeuralProphet_seasonality_mode_... | 35.846287 | 7.116955 | 1.583491 | 1.154174 | test |
| 1 | air_passengers | Prophet | {'seasonality_mode': 'multiplicative', '_data_... | air_passengers_Prophet_seasonality_mode_multip... | 22.995789 | 4.639229 | 4.46579 | 0.721707 | test |
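
The `RMSE_std` and `MAPE_std` columns suggest that the summary reports a mean and a standard deviation over the folds. The sketch below shows that aggregation for a single metric. The per-fold values are made up, and the use of the population standard deviation (`ddof=0`, NumPy's default) is an assumption; the framework may use a different convention.

```python
import numpy as np

# Made-up per-fold RMSE values for one model, one entry per fold
fold_rmse = [10.0, 12.0, 14.0]

summary = {
    "RMSE": float(np.mean(fold_rmse)),
    # ddof=0 (population std) is assumed here; ddof=1 would give the sample std
    "RMSE_std": float(np.std(fold_rmse, ddof=0)),
}
print(summary)
```

A large standard deviation relative to the mean indicates that the model's accuracy varies noticeably between folds, which is worth checking before comparing models on the mean alone.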
