# Data Exploration: Online Conformal

This notebook tests Salesforce's implementations of online conformal prediction models (the `online_conformal` package). It was run with Python 3.8.20 (base environment).

## Preliminaries

In [None]:
# imports
import duckdb
import matplotlib.pyplot as plt
from merlion.models.factory import ModelFactory
from online_conformal.dataset import M4
from online_conformal.visualize import plot_simulated_forecast

## Reading in data

In [None]:
regn_data = duckdb.sql(
    """select * 
       from 'test_data/regn_20230103_to_20230331.csv.gz' 
       where DATE = '2023-01-03'
       order by TIME_M """).df()

In [None]:
regn_data

A quick look at REGN's trade prices and sizes, along with a derived dollar-volume column:

In [None]:
regn_data["dollar_volume"] = regn_data["PRICE"] * regn_data["SIZE"]

In [None]:
regn_data.PRICE.plot()

In [None]:
regn_data.SIZE.plot()

## Test Online Conformal Model

Below is the example from the `online_conformal` README, which we use as a reference.

In [None]:
import pandas as pd
from merlion.models.factory import ModelFactory
from merlion.utils import TimeSeries
from online_conformal.dataset import M4
from online_conformal.saocp import SAOCP

# Get some time series data as pandas.DataFrames
data = M4("Hourly")[0]
train_data, test_data = data["train_data"], data["test_data"]
# Initialize a Merlion model for time series forecasting
model = ModelFactory.create(name="LGBMForecaster")
# Initialize the SAOCP wrapper on top of the model. This splits the data 
# into train/calibration splits, trains the model on the train split, 
# and initializes SAOCP's internal state on the calibration split.
# The target coverage is 90% here, but you can adjust this freely.
# We also do 24-step-ahead forecasting by setting horizon=24.
horizon = 24
saocp = SAOCP(model=model, train_data=train_data, coverage=0.9,
              calib_frac=0.2, horizon=horizon)

# Get the model's 24-step-ahead prediction, and convert it to prediction intervals
yhat, _ = saocp.model.forecast(horizon, time_series_prev=TimeSeries.from_pd(train_data))
delta_lb, delta_ub = zip(*[saocp.predict(horizon=h + 1) for h in range(horizon)])
yhat = yhat.to_pd().iloc[:, 0]
lb, ub = yhat + delta_lb, yhat + delta_ub

# Update SAOCP's internal state based on the next 24 observations
prev = train_data.iloc[:-horizon + 1]
time_series = pd.concat((train_data.iloc[-horizon + 1:], test_data.iloc[:horizon]))
for i in range(len(time_series)):
    # Predict yhat_{t-H+i+1}, ..., yhat_{t-H+i+H} = f(y_1, ..., y_{t-H+i}) 
    y = time_series.iloc[i:i + horizon, 0]
    yhat, _ = saocp.model.forecast(y.index, time_series_prev=TimeSeries.from_pd(prev))
    yhat = yhat.to_pd().iloc[:, 0]
    # Use h-step prediction of yhat_{t-k+h} to update SAOCP's h-step prediction interval
    for h in range(len(y)):
        if i >= h:
            saocp.update(ground_truth=y[h:h + 1], forecast=yhat[h:h + 1], horizon=h + 1)
    prev = pd.concat((prev, time_series.iloc[i:i+1]))
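
To judge whether intervals such as `lb` and `ub` above actually reach the 90% target, they can be compared with the realized values. The following is a minimal sketch of that check. The helper names `empirical_coverage` and `mean_width` and the toy numbers are illustrative assumptions, not part of `online_conformal`.

```python
def empirical_coverage(y_true, lower, upper):
    """Fraction of observations that fall inside [lower, upper]."""
    triples = list(zip(y_true, lower, upper))
    if not triples:
        raise ValueError("need at least one observation")
    hits = sum(1 for y, lo, hi in triples if lo <= y <= hi)
    return hits / len(triples)


def mean_width(lower, upper):
    """Average interval width; narrower is better at equal coverage."""
    widths = [hi - lo for lo, hi in zip(lower, upper)]
    if not widths:
        raise ValueError("need at least one interval")
    return sum(widths) / len(widths)


# Toy values, made up for illustration only
y_true = [10.0, 10.5, 11.2, 9.8, 10.1]
lower = [9.5, 10.0, 10.0, 10.0, 9.0]
upper = [10.5, 11.0, 11.0, 11.0, 11.0]

coverage = empirical_coverage(y_true, lower, upper)
width = mean_width(lower, upper)
print(f"coverage={coverage:.2f}, mean width={width:.2f}")
```

Because the helpers only iterate over their inputs, they should also accept pandas Series; for the README example, a call along the lines of `empirical_coverage(test_data.iloc[:horizon, 0], lb, ub)` would compare the first 24 test observations with their intervals.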

Next, we try the same approach on one day's worth of REGN data.

In [None]:
import logging
logging.basicConfig(level=logging.ERROR)

import matplotlib.pyplot as plt
from merlion.models.factory import ModelFactory
from online_conformal.dataset import M4
from online_conformal.visualize import plot_simulated_forecast
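# evaluate, summarize_results, visualize: presumably from time_series.py in the
# online_conformal repository, which must be importable from the working directory
from time_series import evaluate, summarize_results, visualize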

In [None]:
# number of observations in the training split (first 70% of the day)
train_size = round(len(regn_data) * 0.7)

In [None]:
# build a full timestamp from the DATE and TIME_M columns
regn_data["timestamp"] = pd.to_datetime(regn_data["DATE"]) + pd.to_timedelta(regn_data["TIME_M"].astype(str))

# chronological train-test split on PRICE, indexed by timestamp
prices = regn_data[["timestamp", "PRICE"]].set_index("timestamp")
train_vals, test_vals = prices.iloc[:train_size], prices.iloc[train_size:]
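
The timestamp construction and chronological split above can be checked on a small frame before running them on the full day. This is a minimal sketch on made-up rows; the column names mirror the REGN file, but the values and the `HH:MM:SS.fff` string format of `TIME_M` are assumptions for illustration.

```python
import pandas as pd

# Made-up rows mimicking the DATE / TIME_M / PRICE columns (illustrative values)
toy = pd.DataFrame({
    "DATE": ["2023-01-03"] * 10,
    "TIME_M": [
        "09:30:00.100", "09:30:00.250", "09:30:01.000", "09:30:02.500",
        "09:30:04.000", "09:30:04.750", "09:30:06.000", "09:30:07.200",
        "09:30:08.000", "09:30:09.900",
    ],
    "PRICE": [720.10, 720.15, 720.05, 720.20, 720.30,
              720.25, 720.40, 720.35, 720.50, 720.45],
})

# Date at midnight plus time of day gives a full timestamp
toy["timestamp"] = pd.to_datetime(toy["DATE"]) + pd.to_timedelta(toy["TIME_M"].astype(str))
prices = toy.set_index("timestamp")[["PRICE"]]

# Chronological 70/30 split: no shuffling, the test set strictly follows the train set
n_train = round(len(prices) * 0.7)
train, test = prices.iloc[:n_train], prices.iloc[n_train:]

print(train.index.max(), "<", test.index.min())
```

Splitting by position only respects time order if the rows are already sorted, which is why the query orders by `TIME_M` before the split.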

In [None]:
# Initialize a Merlion model for time series forecasting
model = ModelFactory.create(name="Arima")

In [None]:
# Initialize the SAOCP wrapper on top of the model. This splits the data 
# into train/calibration splits, trains the model on the train split, 
# and initializes SAOCP's internal state on the calibration split.
# The target coverage is 90% here, but you can adjust this freely.
# We also do 24-step-ahead forecasting by setting horizon=24.
horizon = 24
saocp = SAOCP(model=model, train_data=train_vals, coverage=0.9,
              calib_frac=0.2, horizon=horizon)

The base code needs some tweaking before it can be used with pretrained models only, because the current implementation relies on the Merlion code base to train the model. It should be possible to repurpose the Salesforce code, with citations.
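
To illustrate what a wrapper around precomputed forecasts could look like, here is a minimal sketch that never trains a model: it only consumes forecasts and observations. It is not SAOCP and not Salesforce's code. It uses a much simpler online quantile-tracking rule (a gradient step on the pinball loss of the absolute residual), and the class name `OnlineQuantileTracker`, the learning rate, and the simulated data are all assumptions made for illustration.

```python
import random


class OnlineQuantileTracker:
    """Tracks a symmetric interval radius around externally supplied forecasts.

    Hypothetical helper for illustration; simple quantile tracking, not SAOCP.
    """

    def __init__(self, coverage=0.9, lr=0.05, radius=0.0):
        self.coverage = coverage
        self.lr = lr
        self.radius = radius

    def predict(self, forecast):
        # Clamp at zero so the lower bound never exceeds the upper bound
        r = max(self.radius, 0.0)
        return forecast - r, forecast + r

    def update(self, ground_truth, forecast):
        score = abs(ground_truth - forecast)
        if score > self.radius:
            # Missed the observation: widen the interval
            self.radius += self.lr * self.coverage
        else:
            # Covered the observation: narrow the interval slightly
            self.radius -= self.lr * (1 - self.coverage)


# Simulated stream: forecasts stand in for a pretrained model's output
rng = random.Random(0)
tracker = OnlineQuantileTracker(coverage=0.9, lr=0.05)
n_steps, hits = 5000, 0
for t in range(n_steps):
    forecast = 100.0 + 0.01 * t
    truth = forecast + rng.gauss(0.0, 1.0)
    lb, ub = tracker.predict(forecast)
    hits += lb <= truth <= ub
    tracker.update(truth, forecast)

observed_coverage = hits / n_steps
print(f"observed coverage: {observed_coverage:.3f}, final radius: {tracker.radius:.3f}")
```

The `predict`/`update` pair loosely mirrors the call pattern in the README example, so a pretrained model's forecasts could be fed in the same loop; per-horizon intervals would need one tracker per horizon.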