<img src="https://storage.googleapis.com/arize-assets/arize-logo-white.jpg" width="200"/>

# Timeseries Observability in Arize with Quantile Forecasts
Time series forecasting models often produce quantile (upper and lower range) predictions in addition to midpoint predictions. This guide recommends a format for uploading quantile and range forecast data to Arize and shows how to calculate time series evaluation metrics such as Pinball Loss in Arize.
 
In time series forecasting, it is common to forecast upper and lower bounds in addition to the midpoint prediction.
 
This example model produces 30-day revenue forecasts. For each forecast date, the model produces:
* 75% quantile (upper) forecast values.
* midpoint forecast values.
* 25% quantile (lower) forecast values.
 
This tutorial steps through formatting the time series data and sending it to Arize, and then configuring MAE, MAPE, and Pinball Loss metrics in Arize using both the midpoint forecasts and the quantile forecasts.





<img src="https://storage.cloud.google.com/arize-assets/claire/timeseries/timeseries_plot.png" width="1000"/>

In [None]:
!pip install -q arize

In [None]:
import pandas as pd
import numpy as np

from arize.pandas.logger import Client, Schema
from arize.utils.types import ModelTypes, Environments

In [None]:
# Construct the DataFrame.
# For the purpose of this tutorial, we randomly generate time series data to send to Arize.
# The midpoint and quantile forecasts are random integers.
# The timestamp is the date the forecasts are made for (one month-end date per row).
# Prediction IDs must be unique, so we reuse the DataFrame index as the ID.

df = pd.DataFrame()
# 12 month-end dates (2021-11-30 through 2022-10-31). The MonthEnd offset is used
# instead of the "M" alias, which newer pandas versions deprecate.
df["timestamp"] = pd.date_range(
    start="2021-11-06", end="2022-11-06", freq=pd.offsets.MonthEnd()
)

df["prediction"] = np.random.randint(45, 55, size=12)
df["actuals"] = np.random.randint(45, 55, size=12)
df["upper_quantile"] = np.random.randint(75, 85, size=12)
df["lower_quantile"] = np.random.randint(15, 25, size=12)

df.reset_index(inplace=True)
df.rename(columns={"index": "prediction_id"}, inplace=True)
# Cast the IDs to strings so they are treated as identifiers rather than numbers
df["prediction_id"] = df["prediction_id"].astype(str)
print(df.dtypes)
df.head()

Next, we construct a data schema that maps the DataFrame columns to Arize fields. Learn more about the Arize data schema here: https://docs.arize.com/arize/data-ingestion/model-schema

* The prediction is sent as the midpoint forecast.
* Both quantile forecasts are sent as tags.

Sending the quantile forecasts as tags lets us plot the forecasts and calculate the required quantile metrics. Learn more about Arize tags here: https://docs.arize.com/arize/data-ingestion/model-schema/9.-tags

In [None]:
import matplotlib.pyplot as plt

# Use a built-in style to change the look and feel of the plot
plt.style.use("fivethirtyeight")

# Set the figure size to 25 x 10 inches
plt.figure(figsize=(25, 10))

# Label the axes and set a title
plt.xlabel("Date")
plt.ylabel("Values")
plt.title("Simulated Timeseries Data")

# Plot the midpoint predictions, actuals, and both quantile forecasts by forecast date
plt.plot(df["timestamp"], df["prediction"], label="predictions")
plt.plot(df["timestamp"], df["actuals"], label="actuals")
plt.plot(df["timestamp"], df["upper_quantile"], label="75% quantile forecasts")
plt.plot(df["timestamp"], df["lower_quantile"], label="25% quantile forecasts")
plt.legend()
plt.show()

In [None]:
# construct schema
schema = Schema(
    prediction_id_column_name="prediction_id",
    timestamp_column_name="timestamp",
    prediction_label_column_name="prediction",
    actual_label_column_name="actuals",
    tag_column_names=["upper_quantile", "lower_quantile"],
)
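
Before logging, it can be worth checking locally that the DataFrame matches the schema. The sketch below is a minimal pre-flight check under stated assumptions: `check_forecast_frame` and `schema_columns` are hypothetical names (not part of the Arize SDK), and the check looks for missing columns, null values, duplicate prediction IDs, and rows where the lower quantile forecast exceeds the upper one. The Arize client also runs its own validation when logging; this does not replace it.

```python
import pandas as pd


def check_forecast_frame(frame: pd.DataFrame, columns: list) -> list:
    """Return a list of human-readable problems found in a forecast DataFrame."""
    problems = []
    # Every column named in the schema must exist and contain no missing values.
    for col in columns:
        if col not in frame.columns:
            problems.append(f"missing column: {col}")
        elif frame[col].isna().any():
            problems.append(f"null values in: {col}")
    # Prediction IDs must be unique.
    if "prediction_id" in frame.columns and frame["prediction_id"].duplicated().any():
        problems.append("duplicate prediction IDs")
    # Quantile forecasts should not cross: lower <= upper on every row.
    if {"lower_quantile", "upper_quantile"} <= set(frame.columns):
        crossed = frame["lower_quantile"] > frame["upper_quantile"]
        if crossed.any():
            problems.append(f"{int(crossed.sum())} row(s) with lower_quantile > upper_quantile")
    return problems


# Hypothetical list mirroring the column names used in the schema above.
schema_columns = ["prediction_id", "timestamp", "prediction", "actuals",
                  "upper_quantile", "lower_quantile"]

# Small illustrative frame with three deliberate problems.
example = pd.DataFrame({
    "prediction_id": ["0", "1", "1"],
    "timestamp": pd.to_datetime(["2021-11-30", "2021-12-31", "2022-01-31"]),
    "prediction": [50, 48, 52],
    "actuals": [47, None, 51],
    "upper_quantile": [80, 79, 18],
    "lower_quantile": [20, 22, 24],
})
print(check_forecast_frame(example, schema_columns))
```

In the notebook, `check_forecast_frame(df, schema_columns)` should return an empty list for the simulated data, since its IDs are unique, it has no nulls, and its upper quantile range (75 to 84) never overlaps the lower one (15 to 24).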

Log the data to Arize in batch through the Python client. Note that we choose the NUMERIC model type, which is used for regression and forecasting models. Learn more about how to upload regression and forecasting models here: https://docs.arize.com/arize/model-schema-mapping#regression

In [None]:
# Initialize the client; replace the placeholders with your Arize space key and API key
arize_client = Client(space_key="SPACEKEY", api_key="APIKEY")

# send data to arize
response = arize_client.log(
    dataframe=df,
    model_id="timeseries-model-test",
    model_version="1.0",
    model_type=ModelTypes.NUMERIC,
    environment=Environments.PRODUCTION,
    schema=schema,
)
if response.status_code != 200:
    print(
        f"logging failed with response code {response.status_code}, {response.text}"
    )
else:
    print("✅ You have successfully sent data to Arize.")

Now that the data is successfully logged to Arize, go to the Arize UI and check the datasets tab to see the data details. 

## Visualize the midpoint and quantile forecasts in Arize

In the Arize app:
* Navigate to Dashboards.
* Create a new dashboard.
* Create a Timeseries widget.
  * Choose "Data Metrics" as the chart metrics category.
  * Add four plots to the widget, one for each series to display.
  * Plot the Prediction Average, the Actuals Average, and the average of each quantile forecast tag.

<img src="https://storage.cloud.google.com/arize-assets/claire/timeseries/dashboard.png" width="1000"/>

## Configure MAE and MAPE in Arize 
In the Arize app:
* Navigate to the "Config" tab.
  * Set the default metric. You can choose from MAE, MAPE, and others.
  * Set the default evaluation window. We recommend an evaluation window that matches the frequency of the data uploads. In this case, fresh forecasts arrive every 30 days, so 30 days is a sensible evaluation window.
* Navigate to the "Performance Tracing" tab.
* MAE and MAPE are default metrics in Arize. Select them from the dropdown to see the performance of the midpoint forecasts, which are the values sent to the platform as predictions.
 
A comprehensive list of Arize default metrics can be found here https://docs.arize.com/arize/glossary/model-metric-definitions
 
*Note that default metrics are calculated from the prediction data. They are not configurable and cannot be calculated for tag data. For bespoke time series metric calculations, we leverage the Arize User Defined Metrics feature.*
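
To sanity-check the MAE and MAPE values shown in the Performance Tracing tab, they can be computed locally. This is a minimal sketch using the conventional definitions; the function names are illustrative, and Arize's exact conventions (for example, percentage scaling or how zero actuals are handled) are described in the metric glossary linked above.

```python
import numpy as np


def mae(actual, predicted):
    """Mean absolute error between actuals and midpoint predictions."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted)))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Assumes no actual value is zero."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100)


actuals = [50, 40]
predictions = [45, 44]
print(mae(actuals, predictions))   # 4.5
print(mape(actuals, predictions))  # approximately 10.0
```

In the notebook, `mae(df["actuals"], df["prediction"])` gives the MAE over all twelve simulated forecasts. Arize reports metrics per evaluation window, so the values match only when the window covers the same rows.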


<img src="https://storage.cloud.google.com/arize-assets/claire/timeseries/MAE.png" width="1000"/>

<img src="https://storage.cloud.google.com/arize-assets/claire/timeseries/MAPE.png" width="1000"/>

## Configure Pinball Loss in Arize
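Pinball Loss is used to assess the accuracy of an upper or lower quantile forecast. These forecasts are deliberately biased high or low, so symmetric metrics such as MAE are not appropriate: a well-calibrated 75% quantile forecast is expected to sit above roughly 75% of the actuals, and MAE would penalize it for doing exactly that. Pinball Loss accounts for this bias by weighting errors asymmetrically: for a quantile level τ, each unit of under-forecast costs τ and each unit of over-forecast costs 1 − τ.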


Learn more about Pinball Loss here: https://www.lokad.com/pinball-loss-function-definition

<img src="https://storage.cloud.google.com/arize-assets/claire/timeseries/equation.png" width="500"/>
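
For reference, a text form of the standard pinball loss definition (as on the Lokad page above), for quantile level $\tau$, actual value $y$, and quantile forecast $\hat{y}_\tau$, averaged over the $n$ forecasts in an evaluation window. In this tutorial, $\tau = 0.75$ for `upper_quantile` and $\tau = 0.25$ for `lower_quantile`; the last line is a worked example with an actual of 50 and an upper quantile forecast of 80.

```latex
L_\tau(y, \hat{y}_\tau) =
\begin{cases}
\tau \, (y - \hat{y}_\tau) & \text{if } y \ge \hat{y}_\tau \\
(1 - \tau) \, (\hat{y}_\tau - y) & \text{if } y < \hat{y}_\tau
\end{cases}
\qquad
\text{Pinball Loss} = \frac{1}{n} \sum_{i=1}^{n} L_\tau(y_i, \hat{y}_{\tau,i})

L_{0.75}(50, 80) = (1 - 0.75)(80 - 50) = 0.25 \times 30 = 7.5
```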

Because Pinball Loss is calculated from the quantile forecasts, which are stored as tags, it cannot be computed as a default metric. Instead, we use the Arize User Defined Metrics (UDM) feature to calculate it. Once Pinball Loss is defined as a UDM, the metric can be used elsewhere in Arize, such as in dashboards and monitors.

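In the Arize app:
* Navigate to the "Custom Metrics" tab.
* Select "Create Custom Metric" in the upper-right corner.
* Define your custom metrics using the UDM interface, for example one Pinball Loss metric for the upper quantile tag (τ = 0.75) and one for the lower quantile tag (τ = 0.25).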
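
Before defining the custom metrics, it can help to compute Pinball Loss locally so the values reported by Arize can be checked against a reference. This is a minimal NumPy sketch of the standard definition; `pinball_loss` is an illustrative helper (not an Arize API), and the example values are made up.

```python
import numpy as np


def pinball_loss(actual, quantile_forecast, tau):
    """Mean pinball (quantile) loss for forecasts of quantile level tau, 0 < tau < 1."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(quantile_forecast, dtype=float)
    diff = actual - forecast
    # Under-forecasts (diff >= 0) cost tau per unit.
    # Over-forecasts (diff < 0) cost (1 - tau) per unit; (tau - 1) * diff is then positive.
    loss = np.where(diff >= 0, tau * diff, (tau - 1) * diff)
    return float(loss.mean())


print(pinball_loss([50], [80], 0.75))  # 7.5

actuals = [50, 52, 47]
upper = [80, 78, 45]   # 75% quantile forecasts
lower = [20, 23, 49]   # 25% quantile forecasts
print(pinball_loss(actuals, upper, 0.75))
print(pinball_loss(actuals, lower, 0.25))
```

Applied to the notebook's data, `pinball_loss(df["actuals"], df["upper_quantile"], 0.75)` and `pinball_loss(df["actuals"], df["lower_quantile"], 0.25)` give reference values to compare with the two custom metrics over an evaluation window that spans all twelve forecasts.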



<img src="https://storage.cloud.google.com/arize-assets/claire/timeseries/custom_metric.png" width="1000"/>

## Set up monitors in Arize
We will configure performance monitors for MAE, MAPE, and Pinball Loss.
 
 
In the Arize app:
* Navigate to the "Monitors" tab.
* Click "New Monitor" in the upper-right dropdown.
* Select "Create Performance Monitor".
* Configure monitors for the model performance metrics. The default evaluation window set in the Config tab applies here, but it can be edited.
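
Arize evaluates monitors on its side, but the idea behind a static-threshold performance monitor can be sketched locally: compute the metric per evaluation window and flag windows where it exceeds a threshold. In this sketch, the 30-day window mirrors the evaluation window recommended above, the threshold of 5.0 and the data are arbitrary example values, and `windowed_mae` and `breached_windows` are hypothetical helpers; it does not reproduce Arize's monitor configuration.

```python
import pandas as pd


def windowed_mae(frame: pd.DataFrame, window: str = "30D") -> pd.Series:
    """MAE of the midpoint predictions per fixed-length window of the timestamp column."""
    errors = frame.assign(abs_error=(frame["actuals"] - frame["prediction"]).abs())
    return errors.groupby(pd.Grouper(key="timestamp", freq=window))["abs_error"].mean()


def breached_windows(metric_by_window: pd.Series, threshold: float) -> pd.Series:
    """Windows whose metric exceeds a static threshold (higher error is worse)."""
    return metric_by_window[metric_by_window > threshold]


# Four forecasts, 15 days apart: the first two fall in one 30-day window, the last two in the next.
frame = pd.DataFrame({
    "timestamp": pd.date_range("2022-01-01", periods=4, freq="15D"),
    "prediction": [50, 50, 50, 50],
    "actuals": [52, 48, 60, 40],
})
mae_by_window = windowed_mae(frame)
print(mae_by_window)
print(breached_windows(mae_by_window, threshold=5.0))
```

The same pattern applies to MAPE or Pinball Loss by swapping the per-row error column; for Pinball Loss, higher values are also worse, so the same "greater than threshold" check applies.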

<img src="https://storage.cloud.google.com/arize-assets/claire/timeseries/monitors-list.png" width="1000"/>

<img src="https://storage.cloud.google.com/arize-assets/claire/timeseries/monitor.png" width="1000"/>