# Welcome to the Arize Demand Forecasting Tutorial
Let’s get started using Arize for demand forecasting.

In this example, you run a demand forecasting regression model that predicts how many items will be demanded one week in advance, so that your storefronts can stock them on time. You can think of each data point as one shelf item whose demand we want to forecast for the following week.

Your model is tracked in Arize, and you have been sending production data continuously for 30 days. Both the predictions and the actual demand have been logged to Arize.

This tutorial has 3 parts. Part 1 shows one way of sending data to Arize. Parts 2 and 3 show how Arize can help you improve (2) model observability and business insight and (3) model performance troubleshooting.

# Part 1: Setting up Arize & Data Ingestion
The first step is to set up our Arize client. After that we will log the data.

First, copy your Arize `API_KEY` and `ORGANIZATION_KEY` from the admin page linked below, and paste them into the set-up cell. We will also set up some metadata to use across all logging calls.

[![Button_Open.png](https://storage.googleapis.com/arize-assets/fixtures/Button_Open.png)](https://app.arize.com/admin)

For our validation and production datasets, we have pre-formatted the feature names and dataframes for logging to Arize with our Python SDK through `arize.pandas.logger`. For more details on how to send production data to Arize, check out our other tutorials and the SDK documentation on GitBook.

[![Buttons_OpenOrange.png](https://storage.googleapis.com/arize-assets/fixtures/Buttons_OpenOrange.png)](https://docs.arize.com/arize/sdks-and-integrations/python-sdk/arize.pandas)

In [None]:
!pip install arize -q
import pandas as pd
from arize.pandas.logger import Client, Schema
from arize.utils.types import ModelTypes, Environments

ORGANIZATION_KEY = "ORGANIZATION_KEY"
API_KEY = "API_KEY"
arize_client = Client(organization_key=ORGANIZATION_KEY, api_key=API_KEY)

model_id = "demand-forecast-tutorial"  # This is the model name that will show up in Arize
model_version = "v1.0"  # Version of model - can be any string

if ORGANIZATION_KEY == "ORGANIZATION_KEY" or API_KEY == "API_KEY":
    raise ValueError("❌ NEED TO CHANGE ORGANIZATION AND/OR API_KEY")
else:
    print("✅ Arize setup complete!")

# import dataset
val_data = pd.read_csv("https://storage.googleapis.com/arize-assets/fixtures/demand_forecast_val.csv")
prod_data = pd.read_csv("https://storage.googleapis.com/arize-assets/fixtures/demand_forecast_prod.csv")
# Feature columns to log, given as a plain list of column names
feature_column_names = [
    "item_size", "supplier_id", "avg_historical_sales", "cur_projected_sales",
    "item_new_release_flag", "item_stickyness_factor", "item_release_year", "shelf_life_weeks",
]

✅ Arize setup complete!


In [None]:
# Define a Schema() object for Arize to pick up data from the correct columns for logging
validation_schema = Schema(
    feature_column_names=feature_column_names,
    prediction_id_column_name="prediction_ids",
    prediction_label_column_name="predictions",
    actual_label_column_name="actuals",
)

val_response = arize_client.log(
    dataframe=val_data,
    path="inferences.bin",
    model_id=model_id,
    model_version=model_version,
    batch_id="baseline",
    model_type=ModelTypes.NUMERIC,
    environment=Environments.VALIDATION,
    schema=validation_schema,
)

production_schema = Schema(
    feature_column_names=feature_column_names,
    prediction_id_column_name="prediction_ids",
    timestamp_column_name="prediction_ts",
    prediction_label_column_name="predictions",
    actual_label_column_name="actuals",
)

prod_response = arize_client.log(
    dataframe=prod_data,
    path="inferences.bin",
    model_id=model_id,
    model_version=model_version,
    batch_id=None,
    model_type=ModelTypes.NUMERIC,
    environment=Environments.PRODUCTION,
    schema=production_schema,
)

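if val_response.status_code != 200 or prod_response.status_code != 200:
    # Report each upload that failed, with its status code and response body
    for name, response in (("validation", val_response), ("production", prod_response)):
        if response.status_code != 200:
            print(f"{name} logging failed with response code {response.status_code}, {response.text}")
else:
    print("✅ You have successfully logged data to Arize")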

✅ You have successfully logged data to Arize
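If logging fails, one thing worth ruling out is a schema column that is missing from the dataframe. The sketch below is a minimal, dependency-free check. The helper `missing_columns` and the two `REQUIRED_*` lists are hypothetical names introduced here for illustration; the column names themselves are the ones used in the `Schema` objects above.

```python
# Column names referenced by the validation and production schemas in this tutorial.
REQUIRED_VALIDATION = ["prediction_ids", "predictions", "actuals"]
REQUIRED_PRODUCTION = REQUIRED_VALIDATION + ["prediction_ts"]


def missing_columns(available, required):
    """Return the required column names that are absent, in their original order.

    `missing_columns` is a hypothetical helper, not part of the Arize SDK.
    """
    present = set(available)
    return [name for name in required if name not in present]


# Toy stand-in for `prod_data.columns`; here the timestamp column is absent.
example_columns = ["prediction_ids", "predictions", "actuals", "item_size"]
print(missing_columns(example_columns, REQUIRED_PRODUCTION))
```

In the notebook itself, you could pass `val_data.columns` or `prod_data.columns` as the first argument before calling `arize_client.log`.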


## ✍️ Moving to the Arize Platform
The next sections use screen captures to walk through setting up the model you just sent in. Feel free to follow along and mirror our instructions to set up the dashboards yourself, or simply read the guide below to see an example of how Arize can help troubleshoot your model.

**It may take 10 minutes for data to be indexed and show up on the Arize platform** 

In [None]:
print("☕ Coffee Break: It may take 10 minutes for data to be indexed and show up on the Arize platform")

# Part 2: Improving Model Observability & Business Insight
Arize provides tools for engineers, product teams, and data scientists to gain business insight for better strategic decision making. In this example, **demand forecasting** matters insofar as it enables better supply management. Let’s see how Arize is more than a monitoring platform.

## 2.1 Baseline Configurations
We first need to set up a baseline distribution by clicking the **Config** button. This serves as the reference distribution and benchmark for your production data. We will use the validation set we sent in, but you can also choose a production window or a training set as your reference distribution.

<img src="https://storage.googleapis.com/arize-assets/fixtures/demand-forecast-baseline-configgif.gif" width="1200">

## 2.2 Monitoring Regression Biases with Arize
Even if your model is trained only with a **Mean Squared Error** or **Mean Absolute Error** loss, we often still care about the **Mean Error** because it reveals bias in our predictions.
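For reference, the two metrics can be written as follows, where $\hat{y}_i$ is the predicted demand and $y_i$ the actual demand for item $i$ out of $n$. The sign convention (prediction minus actual) is an assumption, chosen so that a negative Mean Error corresponds to under-forecasting, as it does in the rest of this tutorial.

$$
\mathrm{ME} = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right),
\qquad
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_i - y_i\right|
$$

Mean Error keeps the sign of each error and so measures bias, while Mean Absolute Error discards the sign and measures typical error size.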

<img src="https://storage.googleapis.com/arize-assets/fixtures/forecasting-bias-problem.png" width="600">

In retail demand forecasting, **under-forecasting** is much more problematic because customers do not receive the items they wanted. Thus, we want to monitor our **Mean Error** in addition to **Mean Absolute Error**.

Let’s use Arize’s **Performance Monitor** to visualize our performance.

**✏️ You can click on a GIF to replay it**

[<img src="https://storage.googleapis.com/arize-assets/fixtures/demand-forecast-perf-monitor.gif" width="1200">](https://storage.googleapis.com/arize-assets/fixtures/demand-forecast-perf-monitor.gif)


Here, we can see that there are days where we over-forecasted and days where we under-forecasted. Let’s set up our monitors to alert when Mean Error is significantly negative (predictions well below actual demand), because over-forecasting only results in minor inventory and shelf-life costs, not angry customers!
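To make the alerting idea concrete, here is a minimal sketch of what such a monitor checks. It is not how Arize implements monitors. The threshold value, the function names, and the toy rows are all hypothetical, and Mean Error is assumed to be prediction minus actual.

```python
from collections import defaultdict
from datetime import date

# Hypothetical threshold: alert when a day's Mean Error falls below -5 items.
ME_ALERT_THRESHOLD = -5.0


def daily_mean_error(rows):
    """Group (day, prediction, actual) rows by day and return {day: mean error}.

    Mean Error is taken as prediction minus actual, so negative values
    indicate under-forecasting.
    """
    errors_by_day = defaultdict(list)
    for day, prediction, actual in rows:
        errors_by_day[day].append(prediction - actual)
    return {day: sum(errs) / len(errs) for day, errs in errors_by_day.items()}


def days_to_alert(rows, threshold=ME_ALERT_THRESHOLD):
    """Return the days whose Mean Error is below the threshold, sorted by date."""
    return sorted(day for day, me in daily_mean_error(rows).items() if me < threshold)


# Toy data, not taken from the tutorial dataset.
rows = [
    (date(2021, 6, 1), 100, 98),  # slight over-forecast
    (date(2021, 6, 1), 50, 52),   # slight under-forecast
    (date(2021, 6, 2), 80, 95),   # under-forecast
    (date(2021, 6, 2), 60, 70),   # under-forecast
]
print(daily_mean_error(rows))
print(days_to_alert(rows))
```

Only the second day triggers the alert: its errors are both negative, whereas the first day's small errors cancel out.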

## 2.3 Arize Dashboard Configurations
Now that we understand the importance of **Mean Error** as a measure of prediction bias, we also want to monitor **Mean Absolute Error** side by side. Mean Error alone isn’t always informative: when over- and under-predictions occur in the same window, they cancel out, and the Mean Error can sit near zero even though individual errors are large.
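A small worked example shows the cancellation effect. The numbers are made up for illustration, and Mean Error is again assumed to be prediction minus actual.

```python
def mean_error(predictions, actuals):
    """Signed average error (prediction minus actual); measures bias."""
    return sum(p - a for p, a in zip(predictions, actuals)) / len(actuals)


def mean_absolute_error(predictions, actuals):
    """Average error magnitude; ignores the direction of each miss."""
    return sum(abs(p - a) for p, a in zip(predictions, actuals)) / len(actuals)


# Toy values: two items over-forecast by 20, two under-forecast by 20.
predictions = [120, 80, 100, 60]
actuals = [100, 100, 80, 80]

print(mean_error(predictions, actuals))           # 0.0
print(mean_absolute_error(predictions, actuals))  # 20.0
```

Every forecast here misses by 20 items, yet the Mean Error is exactly zero, which is why the two metrics are worth reading together.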

We can avoid this with a side-by-side time series chart in **Arize Dashboard**.

**✏️ You can click on a GIF to replay it**

[<img src="https://storage.googleapis.com/arize-assets/fixtures/demand-forecast-dashboard.gif" width="1200">](https://storage.googleapis.com/arize-assets/fixtures/demand-forecast-dashboard.gif)

**Arize Dashboard** is a customizable feature where you can monitor time series data, charts, and model metrics all in one place. You can also monitor just a slice of your production data (e.g., only items whose shelf life is under two weeks).
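As a rough offline analogue of monitoring a slice, the sketch below filters rows by `shelf_life_weeks` and computes both metrics on the slice and on the full set. The rows are toy values rather than tutorial data, `slice_metrics` is a hypothetical helper, and the field names mirror the columns used in the logging schema.

```python
def slice_metrics(rows, keep):
    """Return (mean error, mean absolute error) over the rows where keep(row) is true.

    Mean Error is prediction minus actual. Returns None if the slice is empty.
    """
    errors = [r["predictions"] - r["actuals"] for r in rows if keep(r)]
    if not errors:
        return None
    me = sum(errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    return me, mae


# Toy rows, not taken from the tutorial dataset.
rows = [
    {"shelf_life_weeks": 1, "predictions": 10, "actuals": 14},
    {"shelf_life_weeks": 1, "predictions": 20, "actuals": 26},
    {"shelf_life_weeks": 4, "predictions": 30, "actuals": 31},
    {"shelf_life_weeks": 8, "predictions": 40, "actuals": 39},
]

overall = slice_metrics(rows, keep=lambda r: True)
short_shelf_life = slice_metrics(rows, keep=lambda r: r["shelf_life_weeks"] < 2)
print("overall:", overall)
print("shelf life under two weeks:", short_shelf_life)
```

In this toy data the short-shelf-life slice is under-forecast much more heavily than the overall numbers suggest, which is the kind of pattern a sliced view is meant to surface.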


In [None]:
print("Part 2 Finished ✅")

# Part 3: Empowering ML Engineers to Troubleshoot
Arize is an engineer-first tool designed to help you understand and troubleshoot your model performance issues.

## 3.1 Investigating production windows with low performance
In the **Drift Tab** overview, we clearly see two time periods where the distribution has changed. You can click on a date to see the feature distributions and drift for that particular day. The first drift period corresponds to the over-estimating event we observed, and the second to the under-estimating event.

<img src="https://storage.googleapis.com/arize-assets/fixtures/demand-forecast-whats-drift-tab.png" width="800">

Let’s click on one of the days in the second drift period. We can see that a number of features drifted on these days. Clicking into `item_new_release_flag`, we observe a similar drift over this time period.

[<img src="https://storage.googleapis.com/arize-assets/fixtures/demand-forecast-drift.gif" width="1200">](https://storage.googleapis.com/arize-assets/fixtures/demand-forecast-drift.gif)

From this, our ML engineers have additional information to work with when improving the model. Possible conclusions and action items include (1) examining possible concept drift related to this feature or (2) retraining the model to fit the new distributions!
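To get a feel for how a shift in a feature like `item_new_release_flag` can be quantified, the sketch below computes the Population Stability Index (PSI), one common drift statistic for categorical features. It illustrates the idea only and is not necessarily the calculation Arize performs. The function name, the `epsilon` floor, and the toy samples are assumptions.

```python
import math
from collections import Counter


def psi(baseline, production, epsilon=1e-6):
    """Population Stability Index between two samples of a categorical feature.

    Sums (q - p) * ln(q / p) over categories, where p and q are the category
    shares in the baseline and production samples. Shares are floored at
    `epsilon` so that a category missing from one sample does not divide by zero.
    """
    base_counts = Counter(baseline)
    prod_counts = Counter(production)
    total = 0.0
    for category in set(baseline) | set(production):
        p = max(base_counts[category] / len(baseline), epsilon)
        q = max(prod_counts[category] / len(production), epsilon)
        total += (q - p) * math.log(q / p)
    return total


# Toy samples: new releases go from 10% of items in the baseline to 40% in production.
baseline_flags = [0] * 90 + [1] * 10
production_flags = [0] * 60 + [1] * 40

print(round(psi(baseline_flags, production_flags), 2))  # roughly 0.54
print(psi(baseline_flags, baseline_flags))              # 0.0 for identical samples
```

A larger PSI means the production distribution has moved further from the baseline. What counts as a significant value depends on the feature and the use case.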


In [None]:
print("Part 3 Finished ✅")