<img src="https://storage.googleapis.com/arize-assets/arize-logo-white.jpg" width="200"/>

# Getting Started with the Arize Platform: Demand Forecasting for a Retail Company


**In this example, you are part of a team at a retail company that maintains and monitors a demand forecasting regression model, which predicts the one-week unit quantity demanded for items in your stores.** The business objective of your ML model is to let your storefronts stock exactly the number of items demanded, on time, as predicted by your model.

**You have been alerted that there have been calls in the last month about stores overshelving and unhappy customers due to mispredictions by your demand forecasting model, so you turn to Arize to gain insight into why.**


In this walkthrough, we are going to investigate your production demand forecasting model. We will first set up monitors and a dashboard to provide better insight into when these events happened and what happened on the days we had unhappy customers. Then, we will take a deep dive into the root causes of those mispredictions and the insights ML engineers can gain from using Arize features.

**The steps to this tutorial will be:**

1. Log your model data to the Arize platform
2. Set up a Performance Monitor and Dashboard to better understand our model performance
3. Understand when underprediction and overprediction events happen, and what they represent
4. Discover feature drifts corresponding to time periods of performance degradation, and takeaways for ML engineers to act on

The goal is to see how the Arize platform can help your team quickly dive into issues critical to your operations through:
- (1) Model observability & business insights 
- (2) Model performance troubleshooting.

# Part 0: Setup and Getting the Data
The first step is to load our pre-existing dataset, which includes validation and production environments for our demand forecasting example. Using a pre-existing dataset illustrates how simple it is to get started with the Arize platform.

## Install Dependencies and Import Libraries 📚

In [None]:
!pip install -q arize

import pandas as pd
from arize.pandas.logger import Client, Schema
from arize.utils.types import Environments, ModelTypes

## **🌐 Download the Data**
In this walkthrough, we’ll be sending real historical data (with privacy-conscious changes to feature names and values). Note that while feature names and values are made explicit in this dataset, you can achieve the same level of ML observability using obfuscated features.

| Feature | Type | Description |
|:-|:-|:-|
| `item_size` | `int` | physical size of the shelf item |
| `supplier_id` | `int` | unique identifier of the item supplier |
| `avg_historical_sales` | `float` | average sales of the item in the last 6 months |
| `cur_projected_sales` | `float` | sales projected based on seasonality by another time series model |
| `item_new_release_flag` | `int (0 or 1)` | flag indicating whether the item was released in the last 2 months |
| `item_stickyness_factor` | `float` | a number representing how likely an item is to be purchased again by the same customer |
| `item_release_year` | `int` | the year in which the item was released |
| `shelf_life_weeks` | `int` | how long the item is intended to be on sale |


## Inspect the Data 

The data represents a regression model trained to forecast demand for an item one week in advance. The dataset contains one month of data and the performance will be evaluated by comparing:

*   **`predictions`**: Predicted number of items demanded by customers this week (also the number of items we will shelve in operations)
*   **`actuals`**: Actual number of items recorded as demanded within a week

In [None]:
# import dataset into two dataframes for logging
val_data = pd.read_csv(
    "https://storage.googleapis.com/arize-assets/fixtures/demand_forecast_val.csv"
)
prod_data = pd.read_csv(
    "https://storage.googleapis.com/arize-assets/fixtures/demand_forecast_prod.csv"
)
# names of the feature columns, shared by both dataframes
feature_column_names = [
    "item_size",
    "supplier_id",
    "avg_historical_sales",
    "cur_projected_sales",
    "item_new_release_flag",
    "item_stickyness_factor",
    "item_release_year",
    "shelf_life_weeks",
]

print("✅ Dependencies installed and data successfully downloaded!")

In [None]:
# select features, prediction, and actual columns only
prod_data[list(feature_column_names) + ["predictions", "actuals"]]

# Step 1. Sending Data into Arize 💫
First, copy the Arize `API_KEY` and `SPACE_KEY` from your admin page shown below!

<img src="https://storage.googleapis.com/arize-assets/fixtures/copy-keys.png" width="700">

In [None]:
SPACE_KEY = "SPACE_KEY"
API_KEY = "API_KEY"
arize_client = Client(space_key=SPACE_KEY, api_key=API_KEY)

model_id = (
    "demand-forecast-demo-model"  # This is the model name that will show up in Arize
)
model_version = "v1.0"  # Version of model - can be any string

if SPACE_KEY == "SPACE_KEY" or API_KEY == "API_KEY":
    raise ValueError("❌ NEED TO CHANGE SPACE AND/OR API_KEY")
else:
    print("✅ Arize setup complete!")

### Using the Python SDK
For our dataset, we have pre-formatted the feature names and dataframes for logging to Arize using our Python SDK through `arize.pandas.logger`. The `Schema` of your model maps the column names in your logging DataFrame to the fields Arize expects.

Here's a summary below:

| Schema Argument Name | Description |
|:-|:-|
| `feature_column_names` | names of the columns representing features |
| `prediction_id_column_name` | name of the column holding the unique ids used to match each record |
| `prediction_label_column_name` | predictions column name |
| `actual_label_column_name` | actuals column name |
| `timestamp_column_name` | name of the column holding the timestamps for when predictions were made |
For more details on how to send production data to Arize, check out some of our other logging tutorials and the SDK documentation in GitBook.

[![Buttons_OpenOrange.png](https://storage.googleapis.com/arize-assets/fixtures/Buttons_OpenOrange.png)](https://docs.arize.com/arize/sdks-and-integrations/python-sdk/arize.pandas)
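
Before logging, it can help to confirm that the DataFrame actually contains every column the `Schema` names. The following is a minimal pandas-only sketch: `missing_columns` is a hypothetical helper (it is not part of the Arize SDK), and the toy frame merely stands in for `prod_data`.

```python
from typing import List, Sequence

import pandas as pd


def missing_columns(df: pd.DataFrame, required: Sequence[str]) -> List[str]:
    """Return the required column names that are absent from df, in order."""
    return [name for name in required if name not in df.columns]


# Toy frame standing in for prod_data; it deliberately lacks an "actuals" column.
toy = pd.DataFrame(
    {
        "prediction_ids": ["a", "b"],
        "predictions": [10.0, 12.0],
        "item_size": [1, 2],
    }
)

# Column names that a schema like the ones in this tutorial would reference.
required = ["prediction_ids", "predictions", "actuals", "item_size"]

print(missing_columns(toy, required))  # ['actuals']
```

An empty list means every referenced column is present, so a check like this can run right before the logging call.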

## Log Validation & Production Data to Arize 

In [None]:
# Define a Schema() object for Arize to pick up data from the correct columns for logging
validation_schema = Schema(
    feature_column_names=feature_column_names,
    prediction_id_column_name="prediction_ids",
    prediction_label_column_name="predictions",
    actual_label_column_name="actuals",
)

# Logging to Arize platform using arize_client.log
val_response = arize_client.log(
    dataframe=val_data,
    model_id=model_id,
    model_version=model_version,
    batch_id="baseline",
    model_type=ModelTypes.NUMERIC,
    environment=Environments.VALIDATION,
    schema=validation_schema,
)

production_schema = Schema(
    feature_column_names=feature_column_names,
    prediction_id_column_name="prediction_ids",
    timestamp_column_name="prediction_ts",
    prediction_label_column_name="predictions",
    actual_label_column_name="actuals",
)

prod_response = arize_client.log(
    dataframe=prod_data,
    model_id=model_id,
    model_version=model_version,
    model_type=ModelTypes.NUMERIC,
    environment=Environments.PRODUCTION,
    schema=production_schema,
)

# Check both responses to make sure our data was successfully ingested
for name, response in [("validation", val_response), ("production", prod_response)]:
    if response.status_code != 200:
        print(f"❌ {name} logging failed with response code {response.status_code}, {response.text}")
    else:
        print(f"✅ You have successfully logged {name} data to Arize")

# Step 2. Confirm Data in Arize ✅
Note that Arize takes about 10 minutes to index the data. While the model should appear immediately, the data will not show up until indexing is done. Feel free to go grab a cup of coffee as Arize works its magic! 🔮

**The next sections (Step 3 and Step 4) walk through screen captures for setting up the model we just sent in.**

Feel free to follow along and mirror our instructions to set up the dashboards yourself, or simply read the guide below to see how Arize can quickly generate value for demand forecasting models.

**⚠️ DON'T SKIP:**
In order to move on to the next step, make sure your actuals and validation/production sets are loaded into the platform. To check:
1. Navigate to models from the left bar, then locate and click on the model **demand-forecast-demo-model**
2. On the **Overview Tab**, make sure you can see Predictions and Actuals under the **Model Health** section. Once production actuals have been fully recorded on Arize, the row title will change from **0 Actuals** to **Actuals** with summary statistics such as cardinality listed in the tables.
3. Verify the list of **Features** below **Actuals**.

<img src="https://storage.googleapis.com/arize-assets/fixtures/demand-forecast-waiting-actuals.png" width="800">

# Step 3. Improving Model Observability & Business Insight
Now that our data has been logged to the Arize platform, let's investigate the low-performance events we have been hearing about!

## **Baseline Configurations**
We will first need to set up a baseline distribution by clicking on the **Config** button. This will serve as the reference distribution and benchmark for our production data. We will use the validation set we sent in, but you can also choose a production window or training set as your reference distribution.


Here are the steps to follow:
1.   Click on the **Config** button at the top right
2.   Click on the **Configure Baseline** button to select a reference distribution
3.   Select **Validation** for **Version v1.0**



⚠️ We recommend doing this first for all models you track in Arize.

<img src="https://storage.googleapis.com/arize-assets/fixtures/demand-forecast-baseline-configgif.gif" width="1200">

## **Understanding Error Biases**
Each prediction of our model in production translates to an operational decision by our retail company. In this case, **demand forecasting** is only important insofar as it allows better **supply management**.

For our example retail company, **under-forecasting** is much more problematic: when customers can't get what they wanted, we could lose out on customer lifetime value. Thus, we want to monitor our **Mean Error** in addition to **Mean Absolute Error**.

<img src="https://storage.googleapis.com/arize-assets/fixtures/forecasting-bias-problem.png" width="600">
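
To make the two metrics concrete, here is a small sketch on made-up numbers. It assumes Mean Error is computed as prediction minus actual, so a negative value means under-forecasting; the helper names and the data are illustrative only and do not come from the tutorial dataset.

```python
import pandas as pd


def mean_error(predictions: pd.Series, actuals: pd.Series) -> float:
    """Signed average error; negative means we under-forecast on average."""
    return float((predictions - actuals).mean())


def mean_absolute_error(predictions: pd.Series, actuals: pd.Series) -> float:
    """Average error magnitude, ignoring direction."""
    return float((predictions - actuals).abs().mean())


# Made-up week where the model never over-forecasts.
toy = pd.DataFrame(
    {
        "predictions": [8.0, 9.0, 10.0, 7.0],
        "actuals": [12.0, 11.0, 10.0, 13.0],
    }
)

me = mean_error(toy["predictions"], toy["actuals"])
mae = mean_absolute_error(toy["predictions"], toy["actuals"])
print(me, mae)  # -3.0 3.0
```

Here ME equals minus MAE because every error points the same way. MAE alone would report the size of the miss but not that customers are the ones bearing it.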

## **Monitoring Biases with Performance Monitors**

Even if our model is only trained with a **Mean Squared Error** or **Mean Absolute Error** loss function, we still care about the **Mean Error** because it tells us about the biases in our predictions, and these biases have tangibly different impacts on our business.

Let’s set up an Arize **Performance Monitor** to visualize and monitor our performance by following these steps:


1.   Navigate to the **Monitors** tab
2.   Click on **Create Model Performance Monitor**
3.   Select **`Mean Error`** as your **Evaluation Metric**
4.   Select to trigger an alert when the **metric is below -7.5**



**You can click on the GIF to replay it**
[<img src="https://storage.googleapis.com/arize-assets/fixtures/demand-forecast-perf-monitor.gif" width="1200">](https://storage.googleapis.com/arize-assets/fixtures/demand-forecast-perf-monitor.gif)
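
The alert condition above can be mimicked offline to build intuition. The sketch below uses made-up data and assumes Mean Error is prediction minus actual, averaged per calendar day; the timestamps, values, and the `ALERT_THRESHOLD` name are illustrative, and this is not a description of how Arize evaluates monitors internally.

```python
import pandas as pd

ALERT_THRESHOLD = -7.5  # same value as the monitor threshold chosen above

# Made-up predictions over two days.
toy = pd.DataFrame(
    {
        "prediction_ts": pd.to_datetime(
            [
                "2022-01-01 09:00",
                "2022-01-01 15:00",
                "2022-01-02 09:00",
                "2022-01-02 15:00",
            ]
        ),
        "predictions": [20.0, 22.0, 10.0, 12.0],
        "actuals": [21.0, 20.0, 20.0, 19.0],
    }
)

# Signed error per row, then averaged per calendar day.
toy["error"] = toy["predictions"] - toy["actuals"]
daily_me = toy.groupby(toy["prediction_ts"].dt.date)["error"].mean()

# Days on which the daily Mean Error falls below the threshold.
alert_days = daily_me[daily_me < ALERT_THRESHOLD]
print(alert_days)
```

On this toy data only the second day, with a daily Mean Error of -8.5, crosses the threshold.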

### **Summary**
Performance Dashboards can help you monitor production using the metric most important to your business function, such as Mean Error in this case. You can access them under the Performance tab and set up alerts for when the metric rises above or dips below a certain threshold.

## **Arize Dashboard Configurations**
Now that we understand the importance of **Mean Error** as a measure of prediction bias, we also want to monitor and visualize it side by side with **Mean Absolute Error**. Many data scientists reading this section will immediately understand that Mean Error alone isn't that informative, because over- and under-predictions often **coincide** and cancel each other out, resulting in a Mean Error close to zero.

We can avoid this with a side-by-side time series chart in a custom time series widget on our Dashboard.
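
A short sketch on made-up numbers shows the cancellation effect. It assumes error is prediction minus actual; the values are illustrative only.

```python
import pandas as pd

# Made-up period with equally sized over- and under-predictions.
toy = pd.DataFrame(
    {
        "predictions": [15.0, 5.0, 14.0, 6.0],
        "actuals": [10.0, 10.0, 10.0, 10.0],
    }
)

error = toy["predictions"] - toy["actuals"]  # +5, -5, +4, -4
mean_error = float(error.mean())
mean_absolute_error = float(error.abs().mean())

print(mean_error, mean_absolute_error)  # 0.0 4.5
```

A Mean Error of zero suggests an unbiased model, yet the Mean Absolute Error shows every prediction was off by four or five units. That is why the two curves are read together.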

### **1. Performance Dashboard**
Performance Dashboard is a customizable feature where you can monitor time series data, feature/prediction distribution, and model metrics all in one place. You can even monitor only a slice or subset of your production data based on your model performance metric.

Follow these steps:
1.  Click on **Create Dashboard** and select **Regression Performance Dashboard**. This will create a dashboard with many useful default widgets already set up for your regression model.
2.  Under the card **Model Evaluation Metric By Day**, delete the **MAPE** curve and change **RMSE** to **Mean Error**.
3.  Save the widget, and publish the changes to the dashboard.

### **2. Setting up the initial dashboard**
[<img src="https://storage.googleapis.com/arize-assets/fixtures/demand-forecast-dashboard.gif" width="1200">](https://storage.googleapis.com/arize-assets/fixtures/demand-forecast-dashboard.gif)

### **3. Creating a Data Metric Time Series Widget**
Let's also create a data metric time series widget to visualize the average values of our predictions vs actuals. In this way, we can **visualize errors along with actuals** to validate the magnitude of our prediction error.


<img src="https://storage.googleapis.com/arize-assets/fixtures/demand-forecast-data-metrics-create.png" width="1200">

We want to create a new card right under our `Model Evaluation Metrics by Day` card by doing the following:

1.   Select the **Time Series** widget
2.   Select **Data Metrics**
3.   Choose `Prediction/Actual` and `Average` for curves
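
The quantity this widget plots can be approximated locally as the per-day average of predictions and actuals. The following is a minimal sketch on made-up data; the timestamps and values are illustrative, and the `gap` column is an extra added here for readability rather than something the widget shows.

```python
import pandas as pd

# Made-up predictions over two days.
toy = pd.DataFrame(
    {
        "prediction_ts": pd.to_datetime(
            [
                "2022-01-01 09:00",
                "2022-01-01 15:00",
                "2022-01-02 09:00",
                "2022-01-02 15:00",
            ]
        ),
        "predictions": [10.0, 14.0, 8.0, 10.0],
        "actuals": [11.0, 13.0, 15.0, 17.0],
    }
)

# Average prediction and average actual per calendar day.
daily_avg = toy.groupby(toy["prediction_ts"].dt.date)[["predictions", "actuals"]].mean()

# Difference between the two curves; negative means under-prediction.
daily_avg["gap"] = daily_avg["predictions"] - daily_avg["actuals"]
print(daily_avg)
```

On the first toy day the two averages match (12 and 12), while on the second the average prediction (9) sits well below the average actual (16).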

### **4. Interpreting Our Dashboard**
Here's what our final dashboard looks like, including when those error biases happen.

<img src="https://storage.googleapis.com/arize-assets/fixtures/demand-forecast-visualize-bias.png" width="1200">


Now we can clearly visualize prediction biases and overall model accuracy with two charts.

1.   The top chart compares the errors using MAE and ME, showing us the magnitude and direction of our error biases.
2.   The second chart shows the averages of our predictions and actuals, giving us additional information to identify and validate over- or under-estimation events.


## **Observability & Business Insight Summary**
[The Arize platform](https://app.arize.com/) provides the tools for engineers, product teams, and data scientists to quickly gain business insight for better strategic decision making.

In this section we...
1.   Set up a **Baseline** from our validation set to continuously compare it against our production data.
2.   Created a **Mean Error Performance Monitor** so that we will be alerted whenever we detect a negative bias (i.e., turned-away customers).
3.   Customized a time series widget on the **Dashboard** to visualize **Mean Error** side by side with **Mean Absolute Error** to understand both the magnitude and direction of our prediction biases.



# Step 4. Empowering ML Engineers to Troubleshoot
Now that we have identified when our under-prediction and over-prediction events happened, let's take a deep dive to understand why they happened.

Arize can also be used to triage your ML model performance. The model performance troubleshooting tools are designed by ML engineers, for ML engineers, to help you understand and solve your model performance issues.

## **Investigating production windows with low performance**
In our **Drift Tab** overview, we clearly see two time periods where the distribution has changed. You can click on the dates to see the feature distribution and drift of that particular day. You might have noticed that the first drift corresponds to when we observed an over-estimating event in Step 3, and the second corresponds to an under-estimating event.

<img src="https://storage.googleapis.com/arize-assets/fixtures/demand-forecast-whats-drift-tab.png" width="800">

Let's sort by **Drift(PSI)** and click on one of the days of the second prediction drift. We can see that a number of features have drifted during these days.

[<img src="https://storage.googleapis.com/arize-assets/fixtures/demand-forecast-drift.gif" width="1200">](https://storage.googleapis.com/arize-assets/fixtures/demand-forecast-drift.gif)
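
For reference, the Population Stability Index (PSI) compares the binned distribution of a feature in production against the baseline: PSI = sum over bins of (p_i - q_i) * ln(p_i / q_i), where q_i and p_i are the baseline and production fractions in bin i. The sketch below is a minimal NumPy version under assumed choices (10 equal-width bins taken from the baseline, and a small floor to avoid taking the log of zero). Arize's own binning may differ, so values from this sketch will not necessarily match the platform.

```python
import numpy as np


def psi(baseline, production, bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index between two 1-D numeric samples."""
    baseline = np.asarray(baseline, dtype=float)
    production = np.asarray(production, dtype=float)

    # Bin edges come from the baseline so both samples share the same bins.
    edges = np.histogram_bin_edges(baseline, bins=bins)
    # Widen the outer edges so production values outside the baseline range
    # are still counted in the first or last bin.
    edges[0] = min(edges[0], production.min())
    edges[-1] = max(edges[-1], production.max())

    base_counts, _ = np.histogram(baseline, bins=edges)
    prod_counts, _ = np.histogram(production, bins=edges)

    # Convert counts to fractions; floor at eps so the log is always defined.
    base_frac = np.clip(base_counts / base_counts.sum(), eps, None)
    prod_frac = np.clip(prod_counts / prod_counts.sum(), eps, None)

    return float(np.sum((prod_frac - base_frac) * np.log(prod_frac / base_frac)))


# Synthetic data: one production sample like the baseline, one shifted.
rng = np.random.default_rng(0)
baseline = rng.normal(loc=0.0, scale=1.0, size=5000)
similar = rng.normal(loc=0.0, scale=1.0, size=5000)
shifted = rng.normal(loc=1.5, scale=1.0, size=5000)

print("similar:", psi(baseline, similar))
print("shifted:", psi(baseline, shifted))
```

The shifted sample should score far higher than the similar one, which is the pattern sorting by drift surfaces in the Drift Tab.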

## **Deep Dive into Root Cause**
Two features, `item_new_release_flag` and `item_size`, seem to have high drift:
1.   Click on one of the days during the second drift period (underpredicting period)
2.   Click into either feature through the red alert button
3.   Observe that their feature drift timeline coincided with prediction drift

<img src="https://storage.googleapis.com/arize-assets/fixtures/demand-forecast-new-release-drift.png" width="800">

<img src="https://storage.googleapis.com/arize-assets/fixtures/demand-forecast-item-size-drift.png" width="800">
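
For a binary feature such as `item_new_release_flag`, drift shows up as a change in the share of each value. The sketch below compares value shares between a baseline and a production sample using made-up values; the numbers are illustrative and are not taken from the tutorial dataset.

```python
import pandas as pd

# Made-up samples of the flag: mostly 0 in the baseline, mostly 1 in production.
baseline = pd.Series([0, 0, 0, 1, 0, 0, 0, 1, 0, 0], name="item_new_release_flag")
production = pd.Series([1, 1, 0, 1, 1, 0, 1, 1, 1, 0], name="item_new_release_flag")

# Share of each value in both samples, aligned on the flag value.
shares = (
    pd.DataFrame(
        {
            "baseline": baseline.value_counts(normalize=True),
            "production": production.value_counts(normalize=True),
        }
    )
    .fillna(0.0)
    .sort_index()
)
shares["change"] = shares["production"] - shares["baseline"]
print(shares)
```

In this toy example the share of new releases rises from 0.2 to 0.7, the kind of shift a distribution chart for a flag feature would make visible.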


## **Turning Insight into Action**

We used the **Drift Tab** to investigate the dates of feature drifts and prediction drifts. Not all feature drifts are inherently malignant; only some impact our model performance. With the insights provided by Arize, you can dive deep into root causes and quickly build intuition, allowing ML teams to quickly iterate, experiment, and ship new models to production.

In this case, by visualizing the feature drift and understanding the features responsible, our ML engineers now have additional information to work with when troubleshooting and improving our model.

Some possible conclusions and action items for our engineers might be...

1.  Examining possible concept drifts relating to the two features in question
2.  Retraining our model to fit new distributions specific to this drift
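
One way to start on the first action item is to slice the error by a drifting feature and check whether the bias concentrates in one segment. The sketch below does this on made-up rows, again assuming error is prediction minus actual; the values are illustrative and are not results from the tutorial dataset.

```python
import pandas as pd

# Made-up rows: accurate for existing items, under-predicting for new releases.
toy = pd.DataFrame(
    {
        "item_new_release_flag": [0, 0, 0, 1, 1, 1],
        "predictions": [10.0, 12.0, 11.0, 9.0, 8.0, 10.0],
        "actuals": [10.0, 11.0, 12.0, 15.0, 16.0, 17.0],
    }
)
toy["error"] = toy["predictions"] - toy["actuals"]

# Mean Error and row count per value of the drifting feature.
by_flag = toy.groupby("item_new_release_flag")["error"].agg(
    mean_error="mean", rows="count"
)
print(by_flag)
```

In this toy data the Mean Error is 0 for existing items and -7 for new releases. A gap like that in real data would point at the relationship between that feature and demand as the place to look before retraining.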

# 📚 Conclusion 
In this tutorial, we first logged 30 days of production data to Arize using our **Python SDK** and developed an understanding that Mean Error is an important metric for monitoring under-prediction and therefore customer satisfaction.

We then set up a **Performance Monitor** to monitor Mean Error, and set up a **Performance Dashboard** to gain greater performance observability and insight into both MAE and ME side by side for our business objective.

Lastly, we used the **Drift Tab** to investigate potential reasons for our model's under-prediction event. Feature drift in `item_new_release_flag` and `item_size` coincided only with the under-prediction event, so we came up with several conclusions that our ML engineers could draw from this observation to improve and troubleshoot our model in the future.

# About Arize
Arize is an end-to-end ML observability and model monitoring platform. The platform is designed to help ML engineers and data science practitioners surface and fix issues with ML models in production faster with:
- Automated ML monitoring and model monitoring
- Workflows to troubleshoot model performance
- Real-time visualizations for model performance monitoring, data quality monitoring, and drift monitoring
- Model prediction cohort analysis
- Pre-deployment model validation
- Integrated model explainability

### Website
Visit Us At: https://arize.com/model-monitoring/

### Additional Resources
- [What is ML observability?](https://arize.com/what-is-ml-observability/)
- [Playbook to model monitoring in production](https://arize.com/the-playbook-to-monitor-your-models-performance-in-production/)
- [Using statistical distance metrics for ML monitoring and observability](https://arize.com/using-statistical-distance-metrics-for-machine-learning-observability/)
- [ML infrastructure tools for data preparation](https://arize.com/ml-infrastructure-tools-for-data-preparation/)
- [ML infrastructure tools for model building](https://arize.com/ml-infrastructure-tools-for-model-building/)
- [ML infrastructure tools for production](https://arize.com/ml-infrastructure-tools-for-production-part-1/)
- [ML infrastructure tools for model deployment and model serving](https://arize.com/ml-infrastructure-tools-for-production-part-2-model-deployment-and-serving/)
- [ML infrastructure tools for ML monitoring and observability](https://arize.com/ml-infrastructure-tools-ml-observability/)

Visit the [Arize Blog](https://arize.com/blog) and [Resource Center](https://arize.com/resource-hub/) for more resources on ML observability and model monitoring.