# Getting Started with the Arize Platform - Comparing Models

**In this walkthrough, we will use Arize performance dashboards to compare two models!**

In this example, you manage the default/fraud detection model for the widely used [Lending Club](https://www.lendingclub.com/). You have been serving two models that performed very similarly in training/validation, and now you want to decide which one to keep serving to customers.

**Specifically, you want to not only investigate model metrics such as accuracy, but you also want to understand model performance in certain segments of your customers to optimize for better customer lifetime value**, so you turn to Arize for investigation.

## The Story: Low Fico Score

In this example, we will choose `fico_score <= 500` as our customer segment of interest. Our data science director wants to extend the product line to be fairer to disadvantaged groups, so correctly predicting loan outcomes for this group is incredibly important.

You want to pick the model with the best overall accuracy, but you also want to protect vulnerable groups.


## Our steps to resolving this issue will be:

1. Log our models onto the Arize platform using the Python SDK
2. Set up Dashboards to view the two models' performance side by side
3. Use dashboard features to home in on specific reasons to select one model over another.

# Step 1: Setup and Getting the Data
### Step 1.1: Loading data from Arize client
We will load in some pre-existing data for the Lending Club model - training data, features, predictions, and class probabilities.

In [None]:
import pandas as pd
import numpy as np
import concurrent.futures as cf
import uuid
import datetime


def unpack_data(data):
    X = data.drop(columns=['label', 'prediction', 'score'])
    y = data['label']
    pred = data['prediction']
    score = data['score']
    return X, y, pred, score

model_a_data = pd.read_csv('https://storage.googleapis.com/arize-assets/fixtures/compare-model-a.csv')
model_b_data = pd.read_csv('https://storage.googleapis.com/arize-assets/fixtures/compare-model-b.csv')

X_m1, y_m1, y_pred_m1, y_pred_score_m1 = unpack_data(model_a_data)
X_m2, y_m2, y_pred_m2, y_pred_score_m2 = unpack_data(model_b_data)
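
Both models are later logged with a single `Schema()`, so it can be worth confirming that the two datasets expose the same feature columns. The sketch below is one possible check and is not part of the Arize SDK: `feature_columns_match` and the toy frames are hypothetical, and the only names taken from this tutorial are the `label`, `prediction`, and `score` columns that `unpack_data` drops.

```python
import pandas as pd

# Columns that unpack_data() treats as model outputs rather than features.
NON_FEATURE_COLUMNS = ["label", "prediction", "score"]


def feature_columns_match(df_a: pd.DataFrame, df_b: pd.DataFrame) -> bool:
    """Return True when both frames have the same feature columns (order ignored)."""
    features_a = set(df_a.columns) - set(NON_FEATURE_COLUMNS)
    features_b = set(df_b.columns) - set(NON_FEATURE_COLUMNS)
    return features_a == features_b


# Toy stand-ins for model_a_data / model_b_data; the values are made up.
toy_a = pd.DataFrame(
    {"fico_score": [480, 650], "label": [1, 0], "prediction": [1, 0], "score": [0.9, 0.2]}
)
toy_b = pd.DataFrame(
    {"fico_score": [480, 650], "label": [1, 0], "prediction": [0, 0], "score": [0.4, 0.1]}
)

print(feature_columns_match(toy_a, toy_b))
```

With the real data, the call would presumably be `feature_columns_match(model_a_data, model_b_data)`.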

### Step 1.2: Setting up Arize Client:
First, copy the Arize `API_KEY` and `SPACE_KEY` from your admin page shown below!

<img src="https://storage.googleapis.com/arize-assets/fixtures/copy-keys.png" width="700">

In [None]:
!pip install arize -q
import concurrent.futures as cf
from arize.pandas.logger import Client, Schema
from arize.utils.types import ModelTypes, Environments

# Step 1.2: Set-up Arize Client and model meta data
SPACE_KEY = 'SPACE_KEY'
API_KEY = 'API_KEY'
arize_client = Client(space_key=SPACE_KEY, api_key=API_KEY)

model_type = ModelTypes.SCORE_CATEGORICAL

if (SPACE_KEY == 'SPACE_KEY' or API_KEY == 'API_KEY'):
    raise ValueError("❌ NEED TO CHANGE SPACE AND/OR API_KEY")
else:
    print("✅ Arize setup complete!")

# Helper functions used in the later steps
def arize_responses_helper(responses):
    # arize.pandas' Client.log returns a single requests.Response (see the
    # comments at the call sites below); an iterable of futures is also accepted.
    if hasattr(responses, 'status_code'):
        results = [responses]
    else:
        results = [future.result() for future in cf.as_completed(responses)]
    for res in results:
        if res.status_code != 200:
            raise ValueError(f'logging failed with response code {res.status_code}, {res.text}')

def simulate_production_timestamps(num_entries, days=30):
    """
    Takes in: the number of entries being logged, and the number of days back we want to simulate
    Returns: a pd.Series of integer Unix timestamps for the prediction timestamp column,
    uniformly distributed over the time period
    """
    current_time = datetime.datetime.now().timestamp()
    earlier_time = (datetime.datetime.now() - datetime.timedelta(days=days)).timestamp()
    optional_prediction_timestamp = np.linspace(earlier_time, current_time, num=num_entries)
    optional_prediction_timestamp = pd.Series(optional_prediction_timestamp.astype(int))
    return optional_prediction_timestamp
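
To see what the timestamp helper produces, here is a small standalone sketch of the same idea. The name `evenly_spaced_timestamps` is hypothetical and the function is only an illustration of the technique (`np.linspace` between "now minus N days" and "now", truncated to integer seconds), not a replacement for the helper above.

```python
import datetime

import numpy as np
import pandas as pd


def evenly_spaced_timestamps(num_entries: int, days: int = 30) -> pd.Series:
    """Return num_entries integer Unix timestamps spread evenly over the last `days` days."""
    now = datetime.datetime.now()
    start = (now - datetime.timedelta(days=days)).timestamp()
    end = now.timestamp()
    return pd.Series(np.linspace(start, end, num=num_entries).astype(int))


timestamps = evenly_spaced_timestamps(5, days=30)

# Gaps between consecutive timestamps; integer truncation can make them differ by one second.
gaps = timestamps.diff().dropna()

print(pd.to_datetime(timestamps, unit="s"))
print(gaps.max() - gaps.min())
```

Because the timestamps are evenly spaced, the simulated prediction volume should look roughly flat over the chosen window in the Arize UI.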

### Step 1.3: Logging Model A and Model B to Arize
First, we take our existing models and log them to Arize using the Python SDK. For more details on how **`arize.pandas.log`** works, visit our documentation page below.

[![Buttons_OpenOrange.png](https://storage.googleapis.com/arize-assets/fixtures/Buttons_OpenOrange.png)](https://docs.arize.com/arize/sdks-and-integrations/python-sdk/arize.pandas)

In [None]:
# Load our predictions, score, and actuals into a pd.DataFrame for Logging
model_a_df = X_m1.copy()
model_a_df['prediction_ids'] = [str(uuid.uuid4()) for _ in range(len(model_a_df))]
model_a_df['predictions'] = y_pred_m1
model_a_df['predictions_score'] = y_pred_score_m1
model_a_df['actuals'] = y_m1
model_a_df['prediction_ts'] = simulate_production_timestamps(len(y_pred_m1), days=30)
feature_column_names = X_m1.columns 

# Define a Schema() object for Arize to pick up data from the correct columns for logging
schema = Schema(
    prediction_id_column_name="prediction_ids",  # REQUIRED
    prediction_label_column_name="predictions",
    prediction_score_column_name="predictions_score",
    actual_label_column_name="actuals",
    timestamp_column_name="prediction_ts",
    # feature_column_names should be specific to your model
    feature_column_names=feature_column_names,
)

# arize_client.log returns a Response object from Python's requests module
log_model_a_responses = arize_client.log(
    dataframe=model_a_df,
    model_id='compare-models-demo-a',
    model_version='1.0',
    model_type=model_type,
    environment=Environments.PRODUCTION,
    schema=schema,
)

arize_responses_helper(log_model_a_responses)
print("✅ Logged Model A to Arize!")

In [None]:
# Load our predictions, score, and actuals into a pd.DataFrame for Logging
model_b_df = X_m2.copy()
model_b_df['prediction_ids'] = [str(uuid.uuid4()) for _ in range(len(model_b_df))]
model_b_df['predictions'] = y_pred_m2
model_b_df['predictions_score'] = y_pred_score_m2
model_b_df['actuals'] = y_m2
model_b_df['prediction_ts'] = simulate_production_timestamps(len(y_pred_m2), days=30)
feature_column_names = X_m2.columns

# Note: We do not have to redefine Schema() since two models have the same Schema()

# arize_client.log returns a Response object from Python's requests module
log_model_b_responses = arize_client.log(
    dataframe=model_b_df,
    model_id='compare-models-demo-b',
    model_version='1.0',
    model_type=model_type,
    environment=Environments.PRODUCTION,
    schema=schema,
)

arize_responses_helper(log_model_b_responses)
print("✅ Logged Model B to Arize!")

## Check Data Ingestion Information

Data will be available in the UI about 10 minutes after it is received. If data for a new model is sent, the model will appear in the Arize platform almost immediately, but its data will not be visible yet. To verify that data has been sent correctly and is being processed, we recommend checking the Data Ingestion tab.

You will be able to see the predictions, actuals, and feature importances that have been sent in the last week, last day or last 30 minutes. 

An example view of the Data Ingestion tab from a model, when data is sent continuously over 30 minutes, is shown in the image below. 

<img src="https://storage.googleapis.com/arize-assets/fixtures/data-ingestion-tab.png" width="700">

## Coffee Time ☕️
Note that Arize takes about 10 minutes to index the data. While the model should appear immediately, the data will not show up until indexing is done. Feel free to go grab a cup of coffee as Arize works its magic! 🔮

Your Prediction Volume may look slightly different!

![image.png](https://storage.googleapis.com/arize-assets/fixtures/waiting-on-data.png)

Actual data will show up under **Model Health**. Once the number changes from **0 Actuals** to **Actuals** (with summary statistics listed in the drop-down), your production actuals will have been fully recorded on Arize!

![image.png](https://storage.googleapis.com/arize-assets/fixtures/waiting-on-actual-data.png)

# Step 2: Set-up Model Performance Dashboard on Arize
**Model Performance Dashboards** are customizable and flexible dashboards that enable live, continuous monitoring of model statistics, time series, and distributions. With Dashboards, you can quickly set up recurring deep dives into slices of your data (e.g., segments of users or business bottlenecks) for purposes such as comparing model performance, validating prediction volumes, and many more.

In this example, we will use the **Template** option, as shown below, for **Compare Model A with Model B**. There are other templates depending on your model type as well (e.g., regression, scored model, binary classification, multi-class).


## 2.1 Setting up Initial Dashboard
First, click on **Template** and then **Compare Model A with Model B**.
1. Set your Dashboard Title; we will call the Dashboard **Compare Performance of A and B**
2. Select `compare-models-demo-a` and `compare-models-demo-b`

You can also select specific model versions to compare (or even compare versions of the same model against each other). In this example, we leave the model version as `All` since we only logged one version!

**✏️ You can click on a GIF to replay it**
[![image.png](https://storage.googleapis.com/arize-assets/fixtures/create-template.gif)](https://storage.googleapis.com/arize-assets/fixtures/create-template.gif)


# Step 3: Use Dashboard Features to Compare Models
Here, we see that the two models are very similar in general distribution and prediction volume. Prediction counts are similar, and the distributions aren't significantly different either.

**Let's dig deeper into some accuracy statistics we are interested in.**

## Step 3.1 Monitoring Overall Accuracy Statistic
Let's try creating some statistics widgets. We will first set up a dashboard to **monitor our accuracy live in production**.

Repeat for **Both Model A and B**:
1. Use the create statistics widget
2. Edit the statistics widget
3. Select the model we logged, and **Evaluation Metrics**
4. Select **Production** as **Model Environment**
5. Finally, select **Accuracy** as your metric.


**✏️ You can click on a GIF to replay it**
[![image.png](https://storage.googleapis.com/arize-assets/fixtures/compare-accuracy.gif)](https://storage.googleapis.com/arize-assets/fixtures/compare-accuracy.gif)

Now we see that the accuracies of Model A and Model B are 0.451 and 0.454, respectively. The difference is too small to be informative on its own.

## Step 3.2 Using Filter in Widgets
It seems like we can't tell much from accuracy alone. Let's use Arize to investigate model performance in **certain important segments of our users**, in line with our business objective of being fairer to those with `fico_score <= 500` (as stated at the beginning of this tutorial).

Repeat for **Both Model A and B**:
1. Use the create statistics widget
2. Follow steps 2-5 from the previous step.
3. At the bottom, under **Filter**, filter on **Feature** with **`fico_score`** **`<=`** a value of `500`.

**✏️ You can click on a GIF to replay it**
[![image.png](https://storage.googleapis.com/arize-assets/fixtures/compare-accuracy-low-fico.gif)](https://storage.googleapis.com/arize-assets/fixtures/compare-accuracy-low-fico.gif)

It seems like our **Model A** is indeed better in this segment of our users. Let's investigate a little more with Arize features!
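
The filtered statistic can also be approximated offline with pandas as a cross-check. The sketch below assumes a DataFrame with the `label` and `prediction` columns used when loading the data and a `fico_score` feature; the helper names and the toy rows are hypothetical and do not reproduce the tutorial's numbers.

```python
import pandas as pd


def accuracy(df: pd.DataFrame) -> float:
    """Fraction of rows where the predicted label equals the actual label."""
    return float((df["prediction"] == df["label"]).mean())


def segment_accuracy(df: pd.DataFrame, feature: str, max_value: float) -> float:
    """Accuracy restricted to rows where `feature` <= `max_value`."""
    return accuracy(df[df[feature] <= max_value])


# Made-up rows for illustration only; the tutorial CSVs are not used here.
toy = pd.DataFrame(
    {
        "fico_score": [450, 480, 500, 620, 700, 710],
        "label": [1, 1, 0, 0, 1, 0],
        "prediction": [0, 0, 0, 0, 0, 0],
    }
)

print("overall accuracy:", accuracy(toy))
print("accuracy for fico_score <= 500:", segment_accuracy(toy, "fico_score", 500))
```

Running the same two calls on each model's data would give a side-by-side view similar to the filtered widgets. Note that an empty segment yields `NaN` rather than an error.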

## Step 3.3 Set up Feature Distribution with Heatmap
Now that we know there is a difference, let's use the **Distribution Widget** to see why accuracy is so low in this segment of users.

Repeat for **Both Model A and Model B**
1. Use the create distribution widget
2. Set **Title** and **Plot 1 Title** as anything you want
3. Select the model and **Model Environment** as **Production**
4. Select **Distribution over** as **Feature** and **`fico_score`**
5. **_Heatmap_**: Select **Distribution of** and **Accuracy**

**Note:** After step 5, you should see that our data distribution has three distinct jumps in total proportion (under 500, 500-600, and above 600). To give the heatmap a better frame of reference, let's _filter under 600_.

6. For the **_Filter_** option, select **Feature**, **`fico_score`**, **`<=`**, and **600**

**✏️ You can click on a GIF to replay it**
[![image.png](https://storage.googleapis.com/arize-assets/fixtures/compare-heatmap-low-fico.gif)](https://storage.googleapis.com/arize-assets/fixtures/compare-heatmap-low-fico.gif)

From the two distributions, we can see that **Model B**'s performance declines sharply once `fico_score` drops below 500. For this reason, we should pick Model A even though its overall accuracy is slightly lower (0.451 vs. 0.454), since we value this customer segment for fairness reasons!
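
An offline analogue of the accuracy heatmap is to bucket `fico_score` and compute accuracy per bucket. The sketch below uses `pd.cut` with edges at 500 and 600 to mirror the three jumps described above. The outer edges (300 and 850), the helper name, and the toy rows are assumptions made for illustration.

```python
import pandas as pd


def accuracy_by_bucket(df: pd.DataFrame, feature: str, edges: list) -> pd.Series:
    """Accuracy per bucket of `feature`; buckets are right-inclusive, e.g. (300, 500]."""
    correct = (df["prediction"] == df["label"]).astype(float)
    buckets = pd.cut(df[feature], bins=edges)
    # observed=False keeps buckets that contain no rows (their accuracy is NaN).
    return correct.groupby(buckets, observed=False).mean()


# Made-up rows for illustration only.
toy = pd.DataFrame(
    {
        "fico_score": [450, 480, 500, 550, 580, 620],
        "label": [1, 1, 0, 0, 1, 0],
        "prediction": [0, 0, 0, 0, 1, 0],
    }
)

per_bucket = accuracy_by_bucket(toy, "fico_score", edges=[300, 500, 600, 850])
print(per_bucket)
```

Computing this series for each model and comparing the lowest bucket is a rough stand-in for reading the two heatmaps side by side.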



# ✏️ Takeaways
In this toy example, **we knew we cared about this segment of users ahead of time,** and one model happened to be better at predicting this segment of users.

![image.png](https://storage.googleapis.com/arize-assets/fixtures/compare-model-heatmap.png)

Using the **heatmap** feature showcased in **Step 3.3**, we can explore which segments of our users (or features) we are underperforming on, and use that observation as a basis for deciding which model to use in production.

### 🚀 In short: Arize allows DS and ML teams to go beyond **_Monitoring_** model statistics and deep dive into truly **_Understanding_** their models in production to better serve their goals.

Equipped with the **Model Performance Dashboard** feature, we can create many continuous monitoring widgets so that we always get the most value out of our ML models in production.



### Overview
Arize is an end-to-end ML observability and model monitoring platform. The platform is designed to help ML engineers and data science practitioners surface and fix issues with ML models in production faster with:
- Automated ML monitoring and model monitoring
- Workflows to troubleshoot model performance
- Real-time visualizations for model performance monitoring, data quality monitoring, and drift monitoring
- Model prediction cohort analysis
- Pre-deployment model validation
- Integrated model explainability

### Website
Visit Us At: https://arize.com/model-monitoring/

### Additional Resources
- [What is ML observability?](https://arize.com/what-is-ml-observability/)
- [Playbook to model monitoring in production](https://arize.com/the-playbook-to-monitor-your-models-performance-in-production/)
- [Using statistical distance metrics for ML monitoring and observability](https://arize.com/using-statistical-distance-metrics-for-machine-learning-observability/)
- [ML infrastructure tools for data preparation](https://arize.com/ml-infrastructure-tools-for-data-preparation/)
- [ML infrastructure tools for model building](https://arize.com/ml-infrastructure-tools-for-model-building/)
- [ML infrastructure tools for production](https://arize.com/ml-infrastructure-tools-for-production-part-1/)
- [ML infrastructure tools for model deployment and model serving](https://arize.com/ml-infrastructure-tools-for-production-part-2-model-deployment-and-serving/)
- [ML infrastructure tools for ML monitoring and observability](https://arize.com/ml-infrastructure-tools-ml-observability/)

Visit the [Arize Blog](https://arize.com/blog) and [Resource Center](https://arize.com/resource-hub/) for more resources on ML observability and model monitoring.
