# Quick Start: Running the Chronos model on the gift-eval benchmark

This notebook shows how to run the Chronos model on the gift-eval benchmark.

Before running this notebook, make sure you have downloaded the gift-eval benchmark and set the `GIFT_EVAL` environment variable to point to it.

We will use the `Dataset` class to load the data and run the model. If you have not already, please check out the [dataset.ipynb](./dataset.ipynb) notebook to learn more about the `Dataset` class. For brevity, we run the model on just two datasets, but feel free to run it on any dataset by changing the `short_datasets` and `med_long_datasets` variables below.
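
How the two variables interact is easy to miss: every listed dataset is evaluated on the `short` term, while `medium` and `long` are only run for names that also appear in `med_long_datasets`. A minimal sketch of that selection rule, following the evaluation loop later in this notebook (the helper name `dataset_term_pairs` is hypothetical and not part of gift-eval; the sketch also drops duplicate names, which is an assumption about the intended behaviour):

```python
def dataset_term_pairs(short_datasets: str, med_long_datasets: str):
    """Return the (dataset, term) pairs that would be evaluated."""
    med_long = med_long_datasets.split()
    # Merge both lists, dropping duplicates while keeping the original order
    all_datasets = list(dict.fromkeys(short_datasets.split() + med_long))
    pairs = []
    for ds_name in all_datasets:
        for term in ["short", "medium", "long"]:
            # Medium and long terms are only run for the med/long subset
            if term != "short" and ds_name not in med_long:
                continue
            pairs.append((ds_name, term))
    return pairs


pairs = dataset_term_pairs("m4_weekly", "bizitobs_l2c/H")
for ds_name, term in pairs:
    print(ds_name, term)
```

With the two datasets used in this notebook, this yields four runs: one for `m4_weekly` and three for `bizitobs_l2c/H`.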

Install the Chronos package:
```bash
pip install git+https://github.com/amazon-science/chronos-forecasting.git
```

In [1]:
import json
from dotenv import load_dotenv
# Load environment variables
load_dotenv()

# short_datasets = "m4_yearly m4_quarterly m4_monthly m4_weekly m4_daily m4_hourly electricity/15T electricity/H electricity/D electricity/W solar/10T solar/H solar/D solar/W hospital covid_deaths us_births/D us_births/M us_births/W saugeenday/D saugeenday/M saugeenday/W temperature_rain_with_missing kdd_cup_2018_with_missing/H kdd_cup_2018_with_missing/D car_parts_with_missing restaurant hierarchical_sales/D hierarchical_sales/W LOOP_SEATTLE/5T LOOP_SEATTLE/H LOOP_SEATTLE/D SZ_TAXI/15T SZ_TAXI/H M_DENSE/H M_DENSE/D ett1/15T ett1/H ett1/D ett1/W ett2/15T ett2/H ett2/D ett2/W jena_weather/10T jena_weather/H jena_weather/D bitbrains_fast_storage/5T bitbrains_fast_storage/H bitbrains_rnd/5T bitbrains_rnd/H bizitobs_application bizitobs_service bizitobs_l2c/5T bizitobs_l2c/H"
short_datasets = "m4_weekly"

# med_long_datasets = "electricity/15T electricity/H solar/10T solar/H kdd_cup_2018_with_missing/H LOOP_SEATTLE/5T LOOP_SEATTLE/H SZ_TAXI/15T M_DENSE/H ett1/15T ett1/H ett2/15T ett2/H jena_weather/10T jena_weather/H bitbrains_fast_storage/5T bitbrains_rnd/5T bizitobs_application bizitobs_service bizitobs_l2c/5T bizitobs_l2c/H"
med_long_datasets = "bizitobs_l2c/H"


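# Merge both lists; drop duplicates (keeping order) so no dataset is processed twice
all_datasets = list(dict.fromkeys(short_datasets.split() + med_long_datasets.split()))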

with open('dataset_properties.json') as f:
    dataset_properties_map = json.load(f)

In [2]:
from gluonts.ev.metrics import (
    MSE,
    MAE,
    MASE,
    MAPE,
    SMAPE,
    MSIS,
    RMSE,
    NRMSE,
    ND,
    MeanWeightedSumQuantileLoss,
)

# Instantiate the metrics
metrics = [
    MSE(forecast_type="mean"),
    MSE(forecast_type=0.5),
    MAE(),
    MASE(),
    MAPE(),
    SMAPE(),
    MSIS(),
    RMSE(),
    NRMSE(),
    ND(),
    MeanWeightedSumQuantileLoss(quantile_levels=[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]),
]
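
The last entry in the metrics list, `MeanWeightedSumQuantileLoss`, scores the whole predictive distribution rather than a single point forecast. The sketch below illustrates the idea in plain Python: a pinball loss per quantile level, normalised by the sum of absolute target values, then averaged over the levels. It assumes that definition and is an illustration only, not the GluonTS implementation; the function names and the toy numbers are made up for the example.

```python
def quantile_loss(y_true, y_pred, q):
    """Pinball loss (scaled by 2) summed over all time steps."""
    total = 0.0
    for y, yhat in zip(y_true, y_pred):
        indicator = 1.0 if y <= yhat else 0.0
        total += 2.0 * abs((yhat - y) * (indicator - q))
    return total


def mean_weighted_sum_quantile_loss(y_true, quantile_preds):
    """quantile_preds maps each quantile level to its forecast values."""
    abs_sum = sum(abs(y) for y in y_true)
    # One normalised loss per quantile level, then a plain average
    losses = [
        quantile_loss(y_true, preds, q) / abs_sum
        for q, preds in quantile_preds.items()
    ]
    return sum(losses) / len(losses)


# Toy target and quantile forecasts (illustrative values only)
y_true = [10.0, 12.0, 8.0]
quantile_preds = {
    0.1: [8.0, 9.0, 7.0],
    0.5: [10.0, 11.0, 9.0],
    0.9: [13.0, 14.0, 10.0],
}
score = mean_weighted_sum_quantile_loss(y_true, quantile_preds)
print(score)
```

Lower values are better: a forecast whose quantiles bracket the target tightly is penalised less than one that is too wide or off-centre.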

## Chronos Predictor

For foundation models, we need to implement a wrapper that contains the model and use it to generate predictions.

This is only meant to be a simple wrapper to get you started; feel free to use your own custom implementation to wrap any model.
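
As far as this notebook relies on it, a wrapper only needs a `predict` method that takes the test inputs (entries with a `"target"` field) and returns one forecast per series. A minimal sketch of that shape, using a toy last-value model: `ToyForecast` is a hypothetical stand-in for the `SampleForecast` class from GluonTS, and `LastValuePredictor` is likewise made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class ToyForecast:
    """Hypothetical stand-in for a sample-based forecast object."""
    samples: List[List[float]]  # shape: (num_samples, prediction_length)


class LastValuePredictor:
    """Toy wrapper: repeats the last observed value of each series."""

    def __init__(self, prediction_length: int, num_samples: int = 1):
        self.prediction_length = prediction_length
        self.num_samples = num_samples

    def predict(self, test_data_input: Iterable[Dict]) -> List[ToyForecast]:
        forecasts = []
        for entry in test_data_input:
            last = float(entry["target"][-1])
            # Every sample path is the same constant continuation
            samples = [
                [last] * self.prediction_length
                for _ in range(self.num_samples)
            ]
            forecasts.append(ToyForecast(samples=samples))
        return forecasts


predictor = LastValuePredictor(prediction_length=3, num_samples=2)
forecasts = predictor.predict([{"target": [1.0, 2.0, 5.0]}, {"target": [4.0, 4.5]}])
print(forecasts[0].samples)
```

A real wrapper would replace the constant continuation with a call to the model and return proper GluonTS forecast objects with a start date, as the Chronos wrapper in this notebook does.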

In [3]:
from dataclasses import dataclass, field
from typing import List, Optional

import numpy as np
import torch
from chronos import ChronosPipeline
from gluonts.itertools import batcher
from gluonts.model.forecast import SampleForecast
from tqdm.auto import tqdm

@dataclass
class ModelConfig:
    quantile_levels: Optional[List[float]] = None
    forecast_keys: List[str] = field(init=False)
    statsforecast_keys: List[str] = field(init=False)
    intervals: Optional[List[int]] = field(init=False)

    def __post_init__(self):
        self.forecast_keys = ["mean"]
        self.statsforecast_keys = ["mean"]
        if self.quantile_levels is None:
            self.intervals = None
            return

        intervals = set()

        for quantile_level in self.quantile_levels:
            interval = round(200 * (max(quantile_level, 1 - quantile_level) - 0.5))
            intervals.add(interval)
            side = "hi" if quantile_level > 0.5 else "lo"
            self.forecast_keys.append(str(quantile_level))
            self.statsforecast_keys.append(f"{side}-{interval}")

        self.intervals = sorted(intervals)


class ChronosPredictor:
    def __init__(
        self,
        model_path,
        num_samples: int,
        prediction_length: int,
        *args,
        **kwargs,
    ):
        # print('args:', args)
        # print('kwargs:', kwargs)
        print('prediction_length:', prediction_length)
        self.pipeline = ChronosPipeline.from_pretrained(
            model_path,
            *args,
            **kwargs,
        )
        self.prediction_length = prediction_length
        self.num_samples = num_samples

    # The Chronos prediction-length limit is always disabled below,
    # so no separate flag is exposed for it here.
    def predict(self, test_data_input, batch_size: int = 256):

        pipeline = self.pipeline
        while True:
            try:
                # Generate forecast samples
                forecast_samples = []
                for batch in tqdm(batcher(test_data_input, batch_size=batch_size)):
                    context = [torch.tensor(entry["target"]) for entry in batch]
                    forecast_samples.append(
                        pipeline.predict(
                            context,
                            prediction_length=self.prediction_length,
                            num_samples=self.num_samples,
                            limit_prediction_length=False,  # We disable the limit on prediction length.
                        ).numpy()
                    )
                forecast_samples = np.concatenate(forecast_samples)
                break
            except torch.cuda.OutOfMemoryError:
                if batch_size <= 1:
                    # Cannot shrink any further; surface the original error
                    raise
                print(
                    f"OutOfMemoryError at batch_size {batch_size}, reducing to {batch_size // 2}"
                )
                batch_size //= 2

        # Convert forecast samples into gluonts SampleForecast objects
        sample_forecasts = []
        for item, ts in zip(forecast_samples, test_data_input):
            forecast_start_date = ts["start"] + len(ts["target"])
            sample_forecasts.append(
                SampleForecast(samples=item, start_date=forecast_start_date)
            )

        return sample_forecasts
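
The retry loop inside `predict` halves the batch size whenever the GPU runs out of memory. The same pattern can be isolated into a small helper. The sketch below uses the built-in `MemoryError` as a stand-in for `torch.cuda.OutOfMemoryError` so that it runs without a GPU; `run_with_backoff` and `fake_inference` are hypothetical names introduced only for this example.

```python
def run_with_backoff(run_batches, batch_size, oom_error=MemoryError):
    """Call run_batches(batch_size), halving batch_size after each OOM error."""
    while True:
        try:
            return run_batches(batch_size), batch_size
        except oom_error:
            if batch_size <= 1:
                raise  # nothing left to shrink
            print(f"Out of memory at batch_size {batch_size}, reducing to {batch_size // 2}")
            batch_size //= 2


def fake_inference(batch_size):
    # Hypothetical workload that only fits in memory for batches of 16 or fewer
    if batch_size > 16:
        raise MemoryError
    return f"ok with batch_size={batch_size}"


result, final_batch_size = run_with_backoff(fake_inference, batch_size=256)
print(result, final_batch_size)
```

With PyTorch, the helper would be called with `oom_error=torch.cuda.OutOfMemoryError`. Stopping at a batch size of 1 keeps the loop from retrying forever when even a single series does not fit.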

## Evaluation

Now that we have our predictor class, we can use it to generate forecasts for the gift-eval benchmark datasets. We will use the `evaluate_model` helper function, which evaluates the model on the test data and returns the results keyed by metric name. We follow the naming conventions explained in the [README](../README.md) file and store the results in a CSV file called `all_results.csv` under the `results/chronos` folder.

The first column in the CSV file is the dataset config name, which combines the dataset name, frequency, and term:

```python
f"{dataset_name}/{freq}/{term}"
```
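
Some dataset names already carry their frequency (for example `bizitobs_l2c/H`), while others do not (for example `m4_weekly`) and need it looked up. A minimal sketch of how the config name can be assembled in both cases, following the logic of the evaluation cell below; the helper name `build_ds_config` is hypothetical, and the `properties` dictionary is an assumed excerpt of `dataset_properties.json` that only contains the `"frequency"` field.

```python
def build_ds_config(ds_name: str, term: str, properties: dict) -> str:
    """Compose the `{dataset_name}/{freq}/{term}` key used in the results file."""
    if "/" in ds_name:
        # The frequency is already part of the name, e.g. "bizitobs_l2c/H"
        return f"{ds_name}/{term}"
    # Otherwise look the frequency up in the dataset properties
    freq = properties[ds_name]["frequency"]
    return f"{ds_name}/{freq}/{term}"


# Assumed excerpt of dataset_properties.json; only "frequency" is used here
properties = {"m4_weekly": {"frequency": "W"}}

print(build_ds_config("m4_weekly", "short", properties))      # m4_weekly/W/short
print(build_ds_config("bizitobs_l2c/H", "long", properties))  # bizitobs_l2c/H/long
```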


In [4]:
all_datasets

['m4_weekly', 'bizitobs_l2c/H']

In [6]:
from gluonts.model import evaluate_model
import csv
import os
import time
from gluonts.time_feature import get_seasonality
from gift_eval.data import Dataset
from chronos import ChronosPipeline
import torch 
from tqdm.auto import tqdm
from gluonts.itertools import batcher
from gluonts.model.forecast import SampleForecast

# Iterate over all available datasets

model_name = 'chronos'
output_dir = f"../results/{model_name}"
# Ensure the output directory exists
os.makedirs(output_dir, exist_ok=True)

# Define the path for the CSV file
csv_file_path = os.path.join(output_dir, 'all_results.csv')

with open(csv_file_path, 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    
    # Write the header
    writer.writerow(['dataset', 'model', 'eval_metrics/MSE[mean]', 'eval_metrics/MSE[0.5]', 'eval_metrics/MAE[0.5]', 'eval_metrics/MASE[0.5]', 'eval_metrics/MAPE[0.5]', 'eval_metrics/sMAPE[0.5]', 'eval_metrics/MSIS', 'eval_metrics/RMSE[mean]', 'eval_metrics/NRMSE[mean]', 'eval_metrics/ND[0.5]', 'eval_metrics/mean_weighted_sum_quantile_loss', 'domain', 'num_variates'])
    
for ds_name in all_datasets:
    ds_key = ds_name.split("/")[0]
    print(f"Processing dataset: {ds_name}")
    terms = ["short", "medium", "long"]
    for term in terms:
        if (term == "medium" or term == "long") and ds_name not in med_long_datasets.split():
            continue

        # Initialize the dataset
        # Multivariate datasets are split into univariate series before forecasting
        to_univariate = Dataset(name=ds_name, term=term, to_univariate=False).target_dim != 1
        dataset = Dataset(name=ds_name, term=term, to_univariate=to_univariate)
        season_length = get_seasonality(dataset.freq)
        ds_config = f'{ds_name}/{term}' if '/' in ds_name else f'{ds_name}/{dataset_properties_map[ds_key]["frequency"]}/{term}'

        predictor = ChronosPredictor(
            model_path='amazon/chronos-t5-base',
            num_samples=20,
            prediction_length=dataset.prediction_length,
            device_map='cuda:0'
        )
        # Measure the time taken for evaluation
        tic = time.perf_counter()
        res = evaluate_model(
            predictor,
            test_data=dataset.test_data,
            metrics=metrics,
            batch_size=512,
            axis=None,
            mask_invalid_label=True,
            allow_nan_forecast=False,
            seasonality=season_length,
        )
        toc = time.perf_counter()
        runtime = str(toc - tic)

        # Append the results to the CSV file
        with open(csv_file_path, 'a', newline='') as csvfile:
            writer = csv.writer(csvfile)
            writer.writerow([
                ds_config, model_name,
                res['MSE[mean]'][0], res['MSE[0.5]'][0], res['MAE[0.5]'][0],
                res['MASE[0.5]'][0], res['MAPE[0.5]'][0], res['sMAPE[0.5]'][0],
                res['MSIS'][0], res['RMSE[mean]'][0], res['NRMSE[mean]'][0],
                res['ND[0.5]'][0], res['mean_weighted_sum_quantile_loss'][0],dataset_properties_map[ds_key]["domain"],dataset_properties_map[ds_key]["num_variates"]
            ])

        print(f"Results for {ds_name} have been written to {csv_file_path}")

Processing dataset: m4_weekly
prediction_length: 13


0it [00:00, ?it/s]

OutOfMemoryError at batch_size 256, reducing to 128


0it [00:00, ?it/s]

OutOfMemoryError at batch_size 128, reducing to 64


0it [00:00, ?it/s]

OutOfMemoryError at batch_size 64, reducing to 32


0it [00:00, ?it/s]

OutOfMemoryError at batch_size 32, reducing to 16


0it [00:00, ?it/s]

359it [00:00, 612.03it/s]


Results for m4_weekly have been written to ../results/chronos/all_results.csv
Processing dataset: bizitobs_l2c/H
prediction_length: 48


0it [00:00, ?it/s]

OutOfMemoryError at batch_size 256, reducing to 128


0it [00:00, ?it/s]

OutOfMemoryError at batch_size 128, reducing to 64


0it [00:00, ?it/s]

OutOfMemoryError at batch_size 64, reducing to 32


0it [00:00, ?it/s]

OutOfMemoryError at batch_size 32, reducing to 16


0it [00:00, ?it/s]

OutOfMemoryError at batch_size 16, reducing to 8


0it [00:00, ?it/s]

42it [00:00, 547.41it/s]


Results for bizitobs_l2c/H have been written to ../results/chronos/all_results.csv
prediction_length: 480


0it [00:00, ?it/s]

7it [00:00, 314.21it/s]


Results for bizitobs_l2c/H have been written to ../results/chronos/all_results.csv
prediction_length: 720


0it [00:00, ?it/s]

7it [00:00, 322.75it/s]

Results for bizitobs_l2c/H have been written to ../results/chronos/all_results.csv





## Results

Running the above cell generates a CSV file called `all_results.csv` under the `results/chronos` folder, containing the results for the Chronos model on the gift-eval benchmark. We can display it using the following code:

In [7]:
import pandas as pd 
df = pd.read_csv('../results/chronos/all_results.csv')
df

Unnamed: 0,dataset,model,eval_metrics/MSE[mean],eval_metrics/MSE[0.5],eval_metrics/MAE[0.5],eval_metrics/MASE[0.5],eval_metrics/MAPE[0.5],eval_metrics/sMAPE[0.5],eval_metrics/MSIS,eval_metrics/RMSE[mean],eval_metrics/NRMSE[mean],eval_metrics/ND[0.5],eval_metrics/mean_weighted_sum_quantile_loss,domain,num_variates
0,m4_weekly/W/short,chronos,237841.388472,241514.360403,252.490036,2.06793,0.059826,0.059759,16.460706,487.689849,0.08885,0.046,0.036832,Econ/Fin,1
1,bizitobs_l2c/H/short,chronos,241.933036,264.502294,10.141999,0.977184,0.861773,0.938876,12.577963,15.554197,0.838411,0.54668,0.464979,Web/CloudOps,7
2,bizitobs_l2c/H/medium,chronos,370.699219,386.839881,14.218934,1.325081,1.294125,1.306003,30.169721,19.253551,1.165832,0.860978,0.791883,Web/CloudOps,7
3,bizitobs_l2c/H/long,chronos,325.809474,333.884896,13.517262,1.290158,1.502312,1.120351,31.145662,18.050193,1.102558,0.825673,0.763921,Web/CloudOps,7
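
Because the `dataset` column packs the name, frequency, and term into one string, it often helps to split it back apart before comparing runs. A minimal sketch in plain Python, using the `dataset` and `eval_metrics/MASE[0.5]` values from the table above; `split_config` is a hypothetical helper name.

```python
# (dataset config, MASE[0.5]) pairs copied from the results table above
rows = [
    ("m4_weekly/W/short", 2.06793),
    ("bizitobs_l2c/H/short", 0.977184),
    ("bizitobs_l2c/H/medium", 1.325081),
    ("bizitobs_l2c/H/long", 1.290158),
]


def split_config(ds_config):
    """Split '{name}/{freq}/{term}' from the right into its three parts."""
    name, freq, term = ds_config.rsplit("/", 2)
    return name, freq, term


# Group the MASE values by forecast term
by_term = {}
for ds_config, mase in rows:
    name, freq, term = split_config(ds_config)
    by_term.setdefault(term, []).append((name, mase))

for term, entries in by_term.items():
    print(term, entries)
```

On the loaded DataFrame, `df["dataset"].str.rsplit("/", n=2, expand=True)` should give the same three parts as separate columns.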
