# Example: Forecasting with FM Eval Bench

This notebook demonstrates how to use FM Eval Bench.

There are two ways to use FM Eval Bench.

1. To forecast on datasets that have already been added to FM Eval Bench, you can interact directly with `synthefy-package/src/synthefy_pkg/fm_evals/eval.py`. Use the `--help` flag to see the supported datasets and models, and how to configure a run from the command line. To add your own datasets and forecasting models there, see the [FM Evals Documentation](https://www.notion.so/FM-Evals-Documentation-229a80f2cd7c80afa9d5dcafffde916e?source=copy_link) for instructions.

2. You can use FM Eval Bench's forecasting model API to run inference with any supported forecasting model on your own data, external to the Eval Bench tooling.

This notebook demonstrates the latter.

## 1. Load Example Data

Let's use a synthetic energy forecasting dataset as an example. The task is to predict `energy_production` for each `city`, using `temperature` and `humidity` as correlates.

```python
import pandas as pd

# Reading s3:// paths with pandas requires the s3fs package
synthetic_energy_df = pd.read_csv("s3://synthefy-fm-eval-datasets/gpt-synthetic/energy.csv")
synthetic_energy_df.head()
```

| Unnamed: 0 | timestamp | city | temperature | humidity | energy_production |
|---|---|---|---|---|---|
| 0 | 2023-01-01 | CityA | 22.607403 | 51.696434 | 139.811255 |
| 1 | 2023-01-02 | CityA | 29.03024 | 48.037169 | 148.937308 |
| 2 | 2023-01-03 | CityA | 27.505048 | 49.884857 | 155.140365 |
| 3 | 2023-01-04 | CityA | 29.216618 | 50.576552 | 153.763531 |
| 4 | 2023-01-05 | CityA | 26.27076 | 52.143343 | 148.872769 |
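
Before building history/forecast splits, it can help to confirm that each city's series is daily and gap-free. The sketch below is a hypothetical helper (`summarize_series` is not part of FM Eval Bench) and assumes timestamps fall on midnight of each day; it runs on a tiny stand-in frame here, and the same call could be made on `synthetic_energy_df`.

```python
import pandas as pd


def summarize_series(df, group_col="city", time_col="timestamp"):
    """Per-group first/last timestamp, row count, and whether the series is gap-free daily."""
    rows = []
    for key, group in df.groupby(group_col):
        ts = pd.to_datetime(group[time_col]).sort_values()
        # Every day between the first and last timestamp, inclusive
        expected = pd.date_range(ts.iloc[0], ts.iloc[-1], freq="D")
        rows.append({
            group_col: key,
            "start": ts.iloc[0],
            "end": ts.iloc[-1],
            "rows": len(ts),
            # Unique timestamps that exactly fill the daily range mean no gaps or duplicates
            "gap_free_daily": bool(ts.is_unique and len(ts) == len(expected)),
        })
    return pd.DataFrame(rows)


# Tiny stand-in frame; CityB is missing 2023-01-02
demo = pd.DataFrame({
    "timestamp": ["2023-01-01", "2023-01-02", "2023-01-03", "2023-01-01", "2023-01-03"],
    "city": ["CityA", "CityA", "CityA", "CityB", "CityB"],
    "energy_production": [139.8, 148.9, 155.1, 120.0, 125.0],
})
summary = summarize_series(demo)
print(summary)
```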


## 1.1 Create an Eval Batch

All forecasting models in FM Eval Bench are standardized to expect an `EvalBatchFormat` (`src/synthefy_pkg/fm_evals/formats/eval_batch_format.py`) as input. We provide a simple interface for converting CSV files, Parquet files, arrays, etc. into `EvalBatchFormat` objects.

### 1.1.1 Option 1: Use `EvalBatchFormat.from_dfs()` to convert dataframe(s) into EvalBatchFormat

The `from_dfs()` method splits dataframes into history and forecast segments, handles target, metadata, and leak columns, and supports creating backtesting windows.
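
To make the three column roles concrete, here is a minimal pandas sketch of what a model may see under the definitions given in the `from_dfs()` example that follows: the history of metadata columns is visible, the future of leak columns is visible, and future target values are held out. `split_visible` is a hypothetical illustration, not FM Eval Bench's implementation, and placing rows that fall exactly on the cutoff date in the forecast segment is an assumption.

```python
import pandas as pd


def split_visible(df, timestamp_col, cutoff_date, target_cols, metadata_cols, leak_cols):
    """Illustrative sketch of which values each column role exposes to a model."""
    if not set(leak_cols) <= set(metadata_cols):
        raise ValueError("All leak columns must also be metadata columns")
    ts = pd.to_datetime(df[timestamp_col])
    cutoff = pd.Timestamp(cutoff_date)
    # Assumption: rows on the cutoff date itself fall in the forecast segment
    history, future = df[ts < cutoff], df[ts >= cutoff]
    return {
        # History of the targets and of every metadata column is visible
        "history_inputs": history[[timestamp_col, *target_cols, *metadata_cols]],
        # Over the forecast horizon, only leak columns are visible
        "future_inputs": future[[timestamp_col, *leak_cols]],
        # Future target values are held out for scoring
        "ground_truth": future[[timestamp_col, *target_cols]],
    }


demo = pd.DataFrame({
    "timestamp": pd.date_range("2023-12-30", periods=4, freq="D"),
    "temperature": [20.0, 21.0, 22.0, 23.0],
    "humidity": [50.0, 51.0, 52.0, 53.0],
    "energy_production": [140.0, 141.0, 142.0, 143.0],
})
parts = split_visible(
    demo, "timestamp", "2024-01-01",
    target_cols=["energy_production"],
    metadata_cols=["temperature", "humidity"],
    leak_cols=["temperature"],  # humidity is metadata only in this demo
)
for name, frame in parts.items():
    print(name, list(frame.columns), len(frame))
```

Unlike the notebook's call, this demo leaks only `temperature`, so `humidity` is visible in the history but not over the forecast horizon.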

```python
from synthefy_pkg.fm_evals.formats.eval_batch_format import EvalBatchFormat
```

```python
# First, group synthetic_energy_df by city, since we want to forecast each city separately
groups = [group for _, group in synthetic_energy_df.groupby("city")]

eval_batch = EvalBatchFormat.from_dfs(
    dfs=groups,

    # A timestamp column must be provided
    timestamp_col="timestamp",

    # Everything after 2024-01-01 is part of the forecast; everything before is part of the history
    cutoff_date="2024-01-01",

    # We aim to forecast energy_production
    target_cols=["energy_production"],

    # Metadata columns are columns whose history the model is allowed to see
    metadata_cols=["temperature", "humidity"],

    # Leak columns are columns whose future the model is allowed to see.
    # All leak columns must also be metadata columns.
    leak_cols=["temperature", "humidity"],

    # A forecast window and a stride create backtesting windows:
    # here we forecast 30 days at a time, then advance by 30 days.
    forecast_window="30d",
    stride="30d",
)
```

`from_dfs()` can also split data by row counts instead of at a fixed timestamp. The `from_dfs()` docstring lists the parameters required in that case.
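
The effect of `cutoff_date`, `forecast_window`, and `stride` can be sketched in plain pandas. This is a hypothetical illustration (`backtest_windows` is not part of FM Eval Bench), and its conventions are assumptions: the first window starts at the cutoff date, each window covers a half-open span of `forecast_window`, windows advance by `stride`, the history expands to everything before each window, and a partial trailing window is kept. `from_dfs()` may treat boundaries and partial windows differently.

```python
import pandas as pd


def backtest_windows(timestamps, cutoff_date, forecast_window, stride):
    """List backtesting windows as rows of a DataFrame (illustrative only)."""
    ts = pd.Series(pd.to_datetime(timestamps)).sort_values()
    window, step = pd.Timedelta(forecast_window), pd.Timedelta(stride)
    start = pd.Timestamp(cutoff_date)
    windows = []
    while start <= ts.iloc[-1]:
        end = start + window  # exclusive end of the forecast window
        windows.append({
            "forecast_start": start,
            "forecast_end": end,
            # Expanding history: every row before the window start
            "history_rows": int((ts < start).sum()),
            "forecast_rows": int(((ts >= start) & (ts < end)).sum()),
        })
        start += step
    return pd.DataFrame(windows)


# Daily data from 2023-01-01 through 2024-03-31, split the same way as the notebook
timestamps = pd.date_range("2023-01-01", "2024-03-31", freq="D")
windows = backtest_windows(timestamps, "2024-01-01", "30d", "30d")
print(windows)
```

Under these assumptions the example yields three full 30-day windows plus a one-day trailing window starting on 2024-03-31.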

### 1.1.2 Option 2: Use `EvalBatchFormat.from_arrays()` to convert multidimensional arrays into an EvalBatchFormat

Here, we create an `EvalBatchFormat` from randomly generated arrays, indexed first by sample in the batch and then by correlate within each sample.

```python
import numpy as np
```

```python
batch_size = 8
num_correlates = 20
history_length = 100
forecast_length = 10

# Daily history timestamps, shared by every sample and correlate
history_range = pd.date_range(start="2023-01-01", periods=history_length, freq="D")
# The forecast horizon starts the day after the history ends
target_range = pd.date_range(
    start=history_range[-1] + pd.Timedelta(days=1), periods=forecast_length, freq="D"
)

history_timestamps = np.array([
    [history_range.values for _ in range(num_correlates)]
    for _ in range(batch_size)
])
target_timestamps = np.array([
    [target_range.values for _ in range(num_correlates)]
    for _ in range(batch_size)
])

history_values = np.random.randn(batch_size, num_correlates, history_length)
target_values = np.random.randn(batch_size, num_correlates, forecast_length)

synthetic_eval_batch = EvalBatchFormat.from_arrays(
    # sample_ids identify each correlate. The shape must be (batch_size, num_correlates)
    sample_ids=np.arange(batch_size * num_correlates).reshape(batch_size, num_correlates),

    # History and target timestamps and values must be 3D arrays with shape (batch_size, num_correlates, length)
    history_timestamps=history_timestamps,
    history_values=history_values,
    target_timestamps=target_timestamps,
    target_values=target_values,

    # We can specify whether the model is allowed to see the target values in the history
    # and whether the model is allowed to see the target values in the forecast.
    # These parameters can be vector masks, where True indicates that the model is allowed to see the target values.
    forecast=True,
    metadata=True,
    leak_target=False,
)
```
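
Shape mistakes in `from_arrays()` inputs are easy to make, so a quick check before conversion can help. `check_eval_arrays` below is a hypothetical helper, not part of FM Eval Bench: it verifies the `(batch_size, num_correlates, length)` layout stated in the comments above and, as an assumed convention, that each correlate's forecast starts after its last history timestamp.

```python
import numpy as np


def check_eval_arrays(sample_ids, history_timestamps, history_values, target_timestamps, target_values):
    """Hypothetical sanity check for the array layout described in the from_arrays() comments."""
    batch_size, num_correlates = sample_ids.shape
    arrays = {
        "history_timestamps": history_timestamps,
        "history_values": history_values,
        "target_timestamps": target_timestamps,
        "target_values": target_values,
    }
    for name, arr in arrays.items():
        if arr.ndim != 3 or arr.shape[:2] != (batch_size, num_correlates):
            raise ValueError(f"{name}: expected ({batch_size}, {num_correlates}, length), got {arr.shape}")
    if history_timestamps.shape != history_values.shape:
        raise ValueError("history timestamps and values must have the same shape")
    if target_timestamps.shape != target_values.shape:
        raise ValueError("target timestamps and values must have the same shape")
    # Assumed convention: each forecast starts after its correlate's last history timestamp
    if not np.all(target_timestamps[:, :, 0] > history_timestamps[:, :, -1]):
        raise ValueError("target timestamps overlap the history")
    return batch_size, num_correlates, history_values.shape[2], target_values.shape[2]


b, c, h, f = 2, 3, 5, 2
hist_ts = np.broadcast_to(np.arange("2023-01-01", "2023-01-06", dtype="datetime64[D]"), (b, c, h))
tgt_ts = np.broadcast_to(np.arange("2023-01-06", "2023-01-08", dtype="datetime64[D]"), (b, c, f))
dims = check_eval_arrays(
    np.arange(b * c).reshape(b, c), hist_ts, np.zeros((b, c, h)), tgt_ts, np.zeros((b, c, f))
)
print(dims)  # (2, 3, 5, 2)
```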

## 2. Forecast with models from FM Eval Bench

All forecasting models in FM Eval Bench expose the same simple API: a `fit()` method and a `predict()` method. `predict()` automatically computes metrics on its outputs.

For this example, we compare the AutoARIMA and TabPFN forecasters.
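
As a rough illustration of the `fit()`/`predict()` contract, including `fit()` returning a success flag, here is a toy last-value forecaster. It is hypothetical: it works on plain NumPy arrays of shape `(batch, correlates, length)` rather than on an `EvalBatchFormat`, and it does not subclass any FM Eval Bench base class or compute metrics.

```python
import numpy as np


class NaiveLastValueForecaster:
    """Hypothetical toy forecaster mirroring the fit()/predict() pattern on NumPy arrays."""

    def __init__(self):
        self.last_values = None

    def fit(self, history_values: np.ndarray) -> bool:
        # Report failure instead of raising, mirroring the boolean status used in the notebook
        if history_values.ndim != 3 or history_values.shape[-1] == 0:
            return False
        # "Fitting" just stores the final observed value of each series
        self.last_values = history_values[..., -1]
        return True

    def predict(self, forecast_length: int) -> np.ndarray:
        if self.last_values is None:
            raise RuntimeError("call fit() before predict()")
        # Repeat the last value across the whole horizon
        return np.repeat(self.last_values[..., None], forecast_length, axis=-1)


history = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)
model = NaiveLastValueForecaster()
ok = model.fit(history)
forecast = model.predict(forecast_length=5)
print(ok, forecast.shape)  # True (2, 3, 5)
```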

```python
from synthefy_pkg.fm_evals.forecasting.autoarima_forecaster import (
    AutoARIMAForecaster,
)
from synthefy_pkg.fm_evals.forecasting.tabpfn_forecaster import (
    TabPFNMultivariateForecaster,
)
```

```python
# Some forecasters accept arguments, some don't.
auto_arima_forecaster = AutoARIMAForecaster()
tabpfn_forecaster = TabPFNMultivariateForecaster(future_leak=True, individual_correlate_timestamps=False)

# Fit the forecasters. The fit() method returns a boolean indicating whether the fit was successful.
auto_arima_status = auto_arima_forecaster.fit(eval_batch)
tabpfn_status = tabpfn_forecaster.fit(eval_batch)

if not (auto_arima_status and tabpfn_status):
    raise ValueError("Failed to fit forecasters")

# Run inference.
auto_arima_forecast = auto_arima_forecaster.predict(eval_batch)
tabpfn_forecast = tabpfn_forecaster.predict(eval_batch)
```



## 3. Inspect metrics and plot

The `predict()` method returns a `ForecastOutputFormat` (`src/synthefy_pkg/fm_evals/formats/forecast_output_format.py`).

The `ForecastOutputFormat` returned by `predict()` already has metrics computed and stored in a `ForecastMetrics` object, available through its `metrics` attribute.

```python
print(f"AutoARIMA metrics: \n{auto_arima_forecast.metrics}\n")
print(f"TabPFN metrics: \n{tabpfn_forecast.metrics}")
```

```text
AutoARIMA metrics:
ForecastMetrics(sample_id='batch_aggregated', mae=7.444229620375002, median_mae=6.971561590683069, nmae=0.3684342826717384, median_nmae=0.3409849018393436, mape=0.05345324866855087, median_mape=0.05130042136023271, mse=85.80376670244794, median_mse=74.84331693144117)

TabPFN metrics:
ForecastMetrics(sample_id='batch_aggregated', mae=4.112625359760843, median_mae=4.23722425543008, nmae=0.20343889826101652, median_nmae=0.21144711114527687, mape=0.02965607605727161, median_mape=0.030085233977144906, mse=26.28981897305213, median_mse=26.284062108931824)
```
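
For intuition about the fields above, here is a NumPy sketch of the standard MAE, MAPE, and MSE definitions, computed per sample and then aggregated. How FM Eval Bench aggregates to `batch_aggregated` and how it normalizes `nmae` are not stated here, so this sketch assumes the batch-level values are the mean and median of per-sample metrics, and it leaves NMAE out. The helper names are hypothetical.

```python
import numpy as np


def per_sample_metrics(y_true, y_pred):
    """MAE, MAPE, and MSE for each sample; inputs have shape (num_samples, horizon)."""
    err = y_pred - y_true
    return {
        "mae": np.mean(np.abs(err), axis=1),
        "mape": np.mean(np.abs(err) / np.abs(y_true), axis=1),  # assumes no zero targets
        "mse": np.mean(err ** 2, axis=1),
    }


def aggregate(metrics):
    """Mean and median across samples (assumed aggregation scheme)."""
    out = {}
    for name, values in metrics.items():
        out[name] = float(np.mean(values))
        out[f"median_{name}"] = float(np.median(values))
    return out


y_true = np.array([[100.0, 100.0], [200.0, 200.0], [50.0, 50.0]])
y_pred = np.array([[110.0, 90.0], [200.0, 220.0], [50.0, 50.0]])
agg = aggregate(per_sample_metrics(y_true, y_pred))
print(agg)
```

The gap between `mae` and `median_mae` in this toy example (about 6.67 vs 10.0) shows why reporting both can be informative when a few samples dominate the error.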


FM Eval Bench also provides convenience functions for plotting qualitative samples from each batch.

```python
from synthefy_pkg.fm_evals.visualizations.line_plot import plot_batch_forecasts

plot_batch_forecasts(eval_batch, [auto_arima_forecast, tabpfn_forecast], pdf_path="forecasts.pdf")
```

Forecast outputs can be saved as CSV files using `DatasetResultFormat`.

```python
from synthefy_pkg.fm_evals.formats.dataset_result_format import (
    DatasetResultFormat,
)

# add_batch() can be called many times to keep appending to the same results object.
auto_arima_results = DatasetResultFormat()
auto_arima_results.add_batch(eval_batch, auto_arima_forecast)

tabpfn_results = DatasetResultFormat()
tabpfn_results.add_batch(eval_batch, tabpfn_forecast)

# Results can be saved as CSV, h5, or pkl files; here we save CSVs
auto_arima_results.save_csv("auto_arima_results.csv")
tabpfn_results.save_csv("tabpfn_results.csv")
```
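
The append-then-save pattern behind `add_batch()` can be sketched with pandas alone. `SimpleResults` is a hypothetical stand-in, not `DatasetResultFormat`: it assumes each batch provides sample IDs plus `(num_samples, horizon)` arrays of timestamps, targets, and forecasts, flattens them into long format, and concatenates all batches when saving. The real class's CSV layout may differ.

```python
import numpy as np
import pandas as pd


class SimpleResults:
    """Hypothetical stand-in illustrating the append-then-save pattern; not DatasetResultFormat."""

    COLUMNS = ["sample_id", "timestamp", "target", "forecast"]

    def __init__(self):
        self._frames = []

    def add_batch(self, sample_ids, timestamps, y_true, y_pred):
        # Flatten one batch of (num_samples, horizon) arrays into long format
        num_samples, horizon = y_true.shape
        if len(sample_ids) != num_samples:
            raise ValueError("one sample_id per row of y_true is required")
        self._frames.append(pd.DataFrame({
            "sample_id": np.repeat(sample_ids, horizon),
            "timestamp": np.asarray(timestamps).reshape(-1),
            "target": y_true.reshape(-1),
            "forecast": y_pred.reshape(-1),
        }))

    def to_frame(self):
        if not self._frames:
            return pd.DataFrame(columns=self.COLUMNS)
        return pd.concat(self._frames, ignore_index=True)

    def save_csv(self, path):
        self.to_frame().to_csv(path, index=False)


results = SimpleResults()
ts = np.array([["2024-01-01", "2024-01-02"]] * 2)
results.add_batch(np.array([0, 1]), ts, np.ones((2, 2)), np.zeros((2, 2)))
results.add_batch(np.array([2, 3]), ts, np.ones((2, 2)), np.zeros((2, 2)))
print(results.to_frame())
```

Calling `results.save_csv("my_results.csv")` would then write every accumulated batch to a single file.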