<img src="https://storage.googleapis.com/arize-assets/arize-logo-white.jpg" width="200"/>

# Getting Started with the Arize Platform - Recommendation Systems

**In this walkthrough, we are going to monitor and investigate the performance of a recommendation system recently pushed into production.**

In this walkthrough you will take on the persona of a machine learning engineer for a premium music service. After spending a great deal of time collecting customer data and training and testing various models, your team has built an ML-powered recommendation engine that gives listeners personalized playlist recommendations based on their most-listened-to songs on their sound cloud. You are now ready to push your model into production and monitor how your recommendations are received by your users. With Arize you can monitor the performance of your model in production and compare it to your selected baseline.

In this use case you will have the opportunity to:
1. Explore the decisions made by your recommendation model with Arize explainability features
2. Monitor feature drift happening in real time as a user's music preferences change
3. Surface drift, data quality, and data consistency issues to perform a root cause analysis
4. Triage the impactful features that lead to drift
5. Make proactive decisions to prevent model degradation

 



**Dataset Generation**

The [Top 5000 Albums of All Time](https://www.kaggle.com/michaelbryantds/top-5000-albums-of-all-time-rateyourmusiccom) dataset contains 5000 ranked albums, including ranking, album name, artist name, release date, genres, descriptors, average rating, number of ratings, and number of reviews. The dataset was acquired via web scraping on 12 October 2021, and the rankings are decided by votes from the users of [rateyourmusic.com](https://rateyourmusic.com).
Although we will not use the full dataset directly, we will use the album name, artist name, release date, and genres to generate user recommendations.

Once your models are deployed into production, ML observability provides a deep understanding of your model's performance and helps you root-cause exactly why it behaves in certain ways. The Arize platform logs model inferences across training, validation, and production environments without needing to store or serve your models.

The inferences for this use case are the probabilities that the user will play the recommended song. Once the recommendations are made for each user, the ground truth collected is an indicator of whether the user listened to the song or skipped it, represented as 1 (played) or 0 (skipped).

For example:

Recommended Song  | Outcome     | Actual Score
-----------------------|---------------------|------------------
Livin' On A Prayer BY Bon Jovi                 | User skipped song   | 0
Paradise City BY Guns N' Roses                   | User listened to song         | 1
Creep (Explicit) BY Radiohead                   | User skipped song   | 0

For a recommendation system, we will use precision as our default metric to monitor model performance, since precision is about retrieving the best items for the user (assuming that there are more useful items available than we can recommend).

Precision is the fraction of selected items that are relevant. Suppose our recommender system selects 3 items to recommend to a user, of which 2 are relevant; precision is then 2/3, or about 67%.

See equation: 

$$Precision = \frac{TP}{TP+FP}$$

TP (true positive) = song recommended and played

FP (false positive) = song recommended and not played
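
As a quick worked example, the sketch below applies the precision formula to the three rows of the example table above (every song was recommended; only one was played). The function name `precision` and the 0/1 encoding of the two lists are assumptions made for illustration, not part of the Arize SDK.

```python
def precision(recommended, played):
    """Fraction of recommended songs that were actually played: TP / (TP + FP)."""
    pairs = list(zip(recommended, played))
    tp = sum(1 for r, p in pairs if r == 1 and p == 1)  # recommended and played
    fp = sum(1 for r, p in pairs if r == 1 and p == 0)  # recommended and skipped
    if tp + fp == 0:
        return 0.0  # convention chosen for this sketch when nothing was recommended
    return tp / (tp + fp)


# Rows of the example table: Bon Jovi (skipped), Guns N' Roses (played), Radiohead (skipped)
recommended = [1, 1, 1]
played = [0, 1, 0]

example_precision = precision(recommended, played)
print(round(example_precision, 3))  # 0.333
```

With one played song out of three recommendations, TP = 1 and FP = 2, so precision is 1/3.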




## **Recommendation Engines**

Recommendation engines do more than rearrange information about user preferences to provide informed recommendations; they are a critical piece of thousands of core business models. If the recommendation systems in your company drift over time, user satisfaction and engagement will decrease, while churn and distrust in your company will increase. Recommendations that are dynamic, adaptable, and explainable are key to customer success and to the success of a company's core business model. In this notebook we will use the user-song play count dataset to uncover different ways to recommend new tracks to different users, and then monitor the performance of our model in Arize.




# Step 0. Setup and Getting the Data
The first step is to load pre-existing datasets for validation and production into the Colab notebook from your data source.



## Install Dependencies and Import Libraries 📚

In [None]:
!pip install arize --upgrade -q

import datetime
import uuid
from datetime import timedelta

import numpy as np
import pandas as pd
from arize.pandas.logger import Client, Schema
from arize.utils.types import Environments, ModelTypes

print("✅ Dependencies installed and libraries imported!")

## **🌐 Download the Data**

In [None]:
# Preparing dataset for this tutorial
valid_df = pd.read_csv(
    "https://storage.googleapis.com/arize-assets/fixtures/Rec-Sys-Use-Case/rec-sys-val.csv"
)
prod_df = pd.read_csv(
    "https://storage.googleapis.com/arize-assets/fixtures/Rec-Sys-Use-Case/rec-sys-prod.csv"
)

print("✅ Data successfully downloaded!")

## Inspect the Data
Take a quick look at the dataset. The data is a combination of song metadata (song title, genre, artist, and year released) and user information (age, location, last genre played). Model performance is measured by the precision of the song predictions: if precision is high, the majority of the songs recommended to the user were played by the user.

**Note**: We also add a timestamp relative to `datetime.now()` so that the data shows up in your UI accordingly.

In [None]:
prod_df.head(10)
# Add a timestamp relative to datetime.now() so that the data shows up in your UI accordingly
prod_df["prediction_ts"] = prod_df["days_ago"].apply(
    lambda days: int((datetime.datetime.now() - timedelta(days=days)).timestamp())
)
prod_df.head()
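
To make the timestamp step concrete, here is a minimal sketch on a toy DataFrame. It assumes, as in the cell above, a `days_ago` column of whole days; the `toy_df` frame and the extra `prediction_time` column are illustrative only and are not part of the tutorial dataset.

```python
import datetime
from datetime import timedelta

import pandas as pd

# Toy stand-in for prod_df: each row was "predicted" some number of days ago
toy_df = pd.DataFrame({"days_ago": [0, 1, 7]})

# Use a single reference time so every row is offset from the same instant
now = datetime.datetime.now()
toy_df["prediction_ts"] = toy_df["days_ago"].apply(
    lambda days: int((now - timedelta(days=int(days))).timestamp())
)

# Convert the Unix seconds back to datetimes (UTC) to eyeball the result
toy_df["prediction_time"] = pd.to_datetime(toy_df["prediction_ts"], unit="s")
print(toy_df)
```

Rows with a larger `days_ago` get an earlier timestamp, which is what spreads the production data across the date range you later select in the UI.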

# Step 1. Sending Data into Arize 💫

## Import and Setup Arize Client

First, copy the Arize `API_KEY` and `SPACE_KEY` from your admin page shown below!



<img src="https://storage.googleapis.com/arize-assets/fixtures/copy-keys.png" width="700">

In [None]:
SPACE_KEY = "SPACE_KEY"
API_KEY = "API_KEY"
arize_client = Client(space_key=SPACE_KEY, api_key=API_KEY)

# Saving model metadata for passing in later
model_id = "rec-sys-demo-model"
model_version = "1.0"

if SPACE_KEY == "SPACE_KEY" or API_KEY == "API_KEY":
    raise ValueError("❌ NEED TO CHANGE SPACE AND/OR API_KEY")
else:
    print("✅ Arize setup complete!")

## Using the Python SDK
For our dataset, we have pre-formatted the feature names and DataFrames for logging to Arize using our Python SDK through `arize.pandas.logger`. The `Schema` of your model maps the column names in your logging DataFrame to the roles Arize expects (features, predictions, actuals, and so on).

Here's a summary below:

| Schema Argument Name | Description |
|:- |:-|
| `feature_column_names`| names of the columns representing features |
| `prediction_id_column_name`| name of the column of unique IDs used to match each record |
| `prediction_label_column_name`| predictions column name |
| `actual_label_column_name`| actuals column name |
| `timestamp_column_name`| name of the column of timestamps for when predictions were made |

For more details on how to send production data to Arize, check out some of our other logging tutorials and the SDK documentation in Gitbook.

[![Buttons_OpenOrange.png](https://storage.googleapis.com/arize-assets/fixtures/Buttons_OpenOrange.png)](https://docs.arize.com/arize/sdks-and-integrations/python-sdk/arize.pandas)
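
Because the schema is just a mapping onto DataFrame columns, a typo in a column name is an easy mistake to make. The sketch below is one way to check for that before logging; it uses only pandas, and the helper `missing_schema_columns` and the toy DataFrame are hypothetical and not part of the Arize SDK.

```python
import pandas as pd


def missing_schema_columns(df, feature_columns, other_columns):
    """Return the schema column names that are not present in the DataFrame."""
    expected = list(feature_columns) + list(other_columns)
    return [name for name in expected if name not in df.columns]


# Toy DataFrame that is missing the actuals column on purpose
toy_df = pd.DataFrame(
    {
        "prediction_id": ["a", "b"],
        "city": ["Austin", "NYC"],
        "Recommended": [1, 0],
        "Confidence": [0.9, 0.2],
    }
)

missing = missing_schema_columns(
    toy_df,
    feature_columns=["city"],
    other_columns=["prediction_id", "Recommended", "Confidence", "Played_or_skipped"],
)
print(missing)  # ['Played_or_skipped']
```

An empty list means every column named in the schema exists in the DataFrame.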




## Log Validation & Production Data to Arize 

In [None]:
# Define a Schema() object for Arize to pick up data from the correct columns for logging
feature_column_names = prod_df.columns[:18]

validation_schema = Schema(
    prediction_id_column_name="prediction_id",
    prediction_label_column_name="Recommended",  # required by score_categorical
    prediction_score_column_name="Confidence",
    actual_label_column_name="Played_or_skipped",
    actual_score_column_name="Actual_Confidence",
    feature_column_names=feature_column_names,
)

val_response = arize_client.log(
    dataframe=valid_df,
    model_id=model_id,
    model_version=model_version,
    batch_id="validation_test",  # provide a batch_id to distinguish from other validation sets
    model_type=ModelTypes.SCORE_CATEGORICAL,
    environment=Environments.VALIDATION,
    schema=validation_schema,
    
)

production_schema = Schema(
    prediction_id_column_name="prediction_id",
    prediction_label_column_name="Recommended",  # required by score_categorical
    prediction_score_column_name="Confidence",
    timestamp_column_name="prediction_ts",
    actual_label_column_name="Played_or_skipped",
    actual_score_column_name="Actual_Confidence",
    feature_column_names=feature_column_names,
)

prod_response = arize_client.log(
    dataframe=prod_df,
    model_id=model_id,
    model_version=model_version,
    model_type=ModelTypes.SCORE_CATEGORICAL,
    environment=Environments.PRODUCTION,
    schema=production_schema,
    
)

# Check both responses to make sure our data was successfully ingested
responses = {"validation": val_response, "production": prod_response}
failed = {name: res for name, res in responses.items() if res.status_code != 200}
for name, res in failed.items():
    print(f"{name} logging failed with response code {res.status_code}, {res.text}")
if not failed:
    print("✅ You have successfully logged data to Arize")

# Step 2. Confirm Data in Arize ✅

Note that Arize takes about 10 minutes to index the data. While the model should appear immediately, the data will not show up until the indexing is complete. Feel free to head over to the **Data Ingestion** tab for your model to watch Arize work its magic!🔮

**The next sections are screen captures that walk through setting up the model we just sent in.**

Feel free to follow and mirror our instructions to set up the dashboards yourself, or simply read the guide below to see how Arize can quickly generate value for recommendation models.

**⚠️ DON'T SKIP:**
In order to move on to the next step, make sure your actuals and validation/production sets are loaded into the platform. To check:
1. Navigate to models from the left bar, locate and click on model **rec-sys-demo-model**
2. On the **Overview Tab**, make sure you can see Predictions and Actuals under the **Model Health** section. Once production actuals have been fully recorded on Arize, the row title will change from **0 Actuals** to **Actuals** with summary statistics such as cardinality listed in the tables.
3. Verify the list of **Features** below **Actuals**.

![image.png](https://storage.googleapis.com/arize-assets/fixtures/Rec-Sys-Use-Case/predictionvolume.png)





# Step 3. Configure Baseline and Monitors
Arize can automatically configure monitors that are best suited to your data. From the banner at the top of the screen, simply click **Set up Model** then select **Validation Version 1.0** and click **NEXT**. Select **Precision** as the **Default Metric**.

![image.png](https://storage.googleapis.com/arize-assets/fixtures/Click-Through%20Rate%20Use-Case/images/initial_setup_banner.png)

You will now see that the baseline has been set and **Drift**, **Data Quality**, and **Performance** monitors have been created!

<img src="https://storage.googleapis.com/arize-assets/fixtures/Rec-Sys-Use-Case/monitors.png" width="300"/>

## Check Triggered Monitors 

You can now check on and edit the pre-configured (i.e., **managed**) monitors on the **Monitors** tab.


To be notified automatically when poor values are introduced in production, create custom monitors for your features. From the model [**Monitors**](https://docs.arize.com/arize/product-guides/monitors) tab, click **New Monitor** and choose a monitor type (drift, performance, or data quality).

# Step 4. First glance
Now that your baseline and monitors are configured, take a quick glance at the **Drift** tab. From the top right of the screen, pick **-30 Days** as the date range. You may notice slight dips in Precision (on 9/29, for example, in this screenshot). We will investigate this change in later sections.

![image.png](https://storage.googleapis.com/arize-assets/fixtures/Rec-Sys-Use-Case/drift-tab.png)



# Step 5. Analyzing feature drift
During initial model setup, Arize automatically created managed monitors for drift, performance, and data quality across feature inputs. As we scroll through the features in the drift tab, we see several triggered monitors. Let's check out the feature corresponding to the drift alert; in this case you should notice the monitor triggered for the **Affiliate Provider** feature.

You can navigate to the **Affiliate Provider** feature from either the model overview page or the **drift** tab --> **feature drift**, then click the feature to see the change in its distribution. Use the PSI (population stability index) graph to select a period of interest. We see that not only has the share of **direct** links decreased in production, but the model has also seen a new input. Notice that when the new affiliate provider **facebook** appears, the default threshold set by Arize is crossed; the feature drift is caused by inputs that were not present in the training baseline. At this point it might be a good time to retrain your model on the new input.

![image.png](https://storage.googleapis.com/arize-assets/fixtures/Rec-Sys-Use-Case/affiliate_drift.png)
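
For intuition about what the PSI graph measures, the sketch below computes a population stability index for a categorical feature using the common textbook definition, PSI = Σ (p − q) · ln(p / q), where q is the baseline share and p the production share of each category. This is an illustration under assumptions: the function `psi`, the smoothing constant `eps`, and the toy category counts are all hypothetical, and Arize's exact computation and default threshold may differ.

```python
import numpy as np
import pandas as pd


def psi(baseline, production, eps=1e-6):
    """Population stability index between two categorical samples."""
    categories = sorted(set(baseline) | set(production))
    q = pd.Series(baseline).value_counts(normalize=True).reindex(categories, fill_value=0.0)
    p = pd.Series(production).value_counts(normalize=True).reindex(categories, fill_value=0.0)
    # Clip zero shares to eps so a category unseen in one sample does not give log(0)
    q = np.clip(q.to_numpy(), eps, None)
    p = np.clip(p.to_numpy(), eps, None)
    return float(np.sum((p - q) * np.log(p / q)))


# Toy data: "facebook" never appears in the baseline but does in production
baseline = ["direct"] * 80 + ["google"] * 20
production = ["direct"] * 50 + ["google"] * 20 + ["facebook"] * 30

psi_same = psi(baseline, baseline)
psi_drift = psi(baseline, production)
print(psi_same, psi_drift)
```

Identical distributions give a PSI of zero. A category that is absent from the baseline contributes a very large term, which is consistent with a new input such as **facebook** pushing the drift value over a threshold.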





# Step 6. Analyzing and Detecting Missing Data
In the production data, values like **facebook** that were not part of the training data are being recorded against the **affiliate_provider** feature. As we scroll down, we see that a data quality monitor has been triggered on the **first_affiliate_tracked** feature because the percentage of empty values in the input data exceeds the alert threshold. We can investigate further by clicking **% Empty**, which takes us to the **Data Quality** tab. Here you see the dramatic increase in missing values over time.

Make sure to view the last 30 days by selecting the correct range in the top right corner of the screen. Arize keeps track of feature cardinality as well as fields with no data, and can pinpoint the exact time that this issue started.

![image.png](https://storage.googleapis.com/arize-assets/fixtures/Rec-Sys-Use-Case/rec-sys-data-quality.gif)
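
The **% Empty** metric can be approximated offline with pandas, which is handy for checking a batch before it is logged. The sketch below assumes a toy DataFrame with a `day` column and hypothetical category values; only the feature name `first_affiliate_tracked` comes from this walkthrough, and the 50% threshold is an arbitrary choice for illustration.

```python
import pandas as pd

# Toy data: the share of missing values jumps on the second day
toy_df = pd.DataFrame(
    {
        "day": ["2021-10-01"] * 4 + ["2021-10-02"] * 4,
        "first_affiliate_tracked": ["linked", "omg", None, "linked", None, None, "omg", None],
    }
)

# Percentage of empty values per day
pct_empty = toy_df["first_affiliate_tracked"].isna().groupby(toy_df["day"]).mean() * 100
print(pct_empty)

# Days that exceed an (arbitrary) 50% alert threshold
alert_days = pct_empty[pct_empty > 50].index.tolist()
print(alert_days)  # ['2021-10-02']
```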




# Step 7. Analyzing model performance

Now that we have looked over data quality and feature drift, we will investigate model performance with Arize's performance dashboards. While you are able to fully customize dashboards for your team, we will use a **scored model** template from the **template** tab in the left-hand sidebar. 

[Dashboards](https://docs.arize.com/arize/product-guides/dashboards) are comprised of widgets designed for different types of analysis across your training, validation, and production environments:
- Distribution Widget for analyzing data distribution changes over Feature, Prediction, and Actuals.
- TimeSeries Widget for analyzing time-based data. 
- Statistic Widget for getting an aggregate statistic. Data Metrics and Evaluation Metrics charts are also available for this widget. 





![image.png](https://storage.googleapis.com/arize-assets/fixtures/Rec-Sys-Use-Case/Screen%20Shot%202021-10-19%20at%207.10.27%20PM.png)

An alternative way to create this dashboard is by navigating through the **Dashboards** tab.

![image.png](https://storage.googleapis.com/arize-assets/fixtures/Rec-Sys-Use-Case/rec-sys-create-dashboard.gif)

Once the dashboard has loaded, the [Performance Dashboard](https://docs.arize.com/arize/product-guides/performance) provides you with aggregate statistics for accuracy, recall, and other customizable evaluation and data metrics. Dashboards facilitate performance troubleshooting by supporting customizable widgets and chainable filters, so you can drill down to specific cohorts and see their model statistics. You can slice and filter dashboards by any model, model version, feature, and/or actual value.



![image.png](https://storage.googleapis.com/arize-assets/fixtures/Rec-Sys-Use-Case/Screen%20Shot%202021-10-19%20at%2010.03.38%20PM.png)

You can also slice and filter dashboards by any model, model version, and model dimension. When we filter on major cities like LA, NYC, and Austin, precision drops from **0.702** (top screen, rightmost widget) to **0.54** (bottom screen, rightmost widget). Something appears to be happening at the cohort level that is causing performance to degrade; we will investigate further.

![image.png](https://storage.googleapis.com/arize-assets/fixtures/Rec-Sys-Use-Case/Screen%20Shot%202021-10-19%20at%2010.08.54%20PM.png)
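
The same cohort comparison can be sketched in pandas. The example below assumes that `Recommended` and `Played_or_skipped` are encoded as 1/0, which may not match the actual encoding in the tutorial CSVs, and it runs on a small made-up DataFrame, so the numbers are not the ones shown in the dashboard.

```python
import pandas as pd


def precision(df):
    """Precision over rows where a song was recommended (assumes 1/0 encoding)."""
    recommended = df[df["Recommended"] == 1]
    if recommended.empty:
        return float("nan")
    return float((recommended["Played_or_skipped"] == 1).mean())


# Toy data: recommendations in Austin and NYC are skipped more often
toy_df = pd.DataFrame(
    {
        "city": ["Austin"] * 4 + ["NYC"] * 2 + ["Boston"] * 4,
        "Recommended": [1] * 10,
        "Played_or_skipped": [0, 0, 0, 1, 1, 0, 1, 1, 1, 1],
    }
)

overall_precision = precision(toy_df)
cohort = toy_df[toy_df["city"].isin(["LA", "NYC", "Austin"])]
cohort_precision = precision(cohort)
print(round(overall_precision, 3), round(cohort_precision, 3))  # 0.6 0.333
```

A gap like this between the overall metric and a filtered cohort is the signal that motivates the slice-level analysis in the next step.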




# Step 8. Heatmap view

The performance dashboards give us a 2D view of the data; the Heatmap view takes the performance analysis a step further with a 3D view. We can also create a **Feature Heatmap Dashboard** from the **Dashboards** tab.

![image.png](https://storage.googleapis.com/arize-assets/fixtures/Rec-Sys-Use-Case/rec-sys-data-create-heatmap.gif)

The [Feature Performance Heatmap](https://docs.arize.com/arize/product-guides/performance) provides you with model performance information across all features, at various feature/value combinations — also known as a slice. Feature Performance Heatmaps also support conditional filters (like Dashboards).

![image.png](https://storage.googleapis.com/arize-assets/fixtures/Rec-Sys-Use-Case/Screen%20Shot%202021-10-19%20at%2010.11.26%20PM.png)
![image.png](https://storage.googleapis.com/arize-assets/fixtures/Rec-Sys-Use-Case/windows.png)
Once you have an idea of your overall model performance across various slices, it's time to dive into which features could be causing this performance degradation. At first glance we see that while most inputs to the **city** feature are performing quite well, the input **Austin** is performing poorly and has a large impact on performance.

![image.png](https://storage.googleapis.com/arize-assets/fixtures/Rec-Sys-Use-Case/Austin.png)
When we scroll down and filter on **genre_last_played**, we see a sharp spike in **Heavy Metal, Hard Rock**. This could be a bug in the data pipeline causing data quality issues, the model being deployed in a new domain, or an outlier event (like a large music festival in one of the input cities). At this point you can save the production data and retrain your model to increase performance on this specific feature.
![image.png](https://storage.googleapis.com/arize-assets/fixtures/Rec-Sys-Use-Case/metal.png)

Heatmaps also rank the worst-performing slices to automatically surface potential root causes of your performance degradation. To access these slices, click the fire icon on the right (a white outline icon) when you're not in edit mode (the screenshot is in edit mode).



This view has automatically identified that a good first step in improving overall model performance begins with improving the model performance for the slices where feature **genre_last_played** = **Heavy Metal, Hard Rock** and **city** = **Austin**. 

![image.png](https://storage.googleapis.com/arize-assets/fixtures/Rec-Sys-Use-Case/slices.png)

Now if we filter down on both **genre_last_played** = **Heavy Metal, Hard Rock** and **city** = **Austin**, we find a precision of 0.035: subscribers in Austin whose last genre played was Heavy Metal, Hard Rock skipped almost every song recommended to them through the platform. Either the data ingestion needs to be fixed or your model needs to be retrained to handle Austin's new Heavy Metal scene!

![image.png](https://storage.googleapis.com/arize-assets/fixtures/Rec-Sys-Use-Case/featuresheatmap.png)
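
The idea of ranking slices by performance can be sketched with a pandas `groupby`. This is not how Arize computes its heatmap; it is a small illustration that assumes 1/0 encoding for `Recommended` and `Played_or_skipped`, and the helper `worst_slices` and the toy rows are hypothetical.

```python
import pandas as pd


def worst_slices(df, feature_cols, min_count=1):
    """Rank feature/value combinations (slices) from lowest to highest precision."""
    recommended = df[df["Recommended"] == 1]
    grouped = recommended.groupby(feature_cols)["Played_or_skipped"].agg(
        precision="mean", count="size"
    )
    grouped = grouped[grouped["count"] >= min_count]  # ignore very small slices
    return grouped.sort_values("precision").reset_index()


# Toy data: Heavy Metal listeners in Austin skip every recommendation
toy_df = pd.DataFrame(
    {
        "city": ["Austin"] * 6 + ["NYC"] * 4,
        "genre_last_played": ["Heavy Metal, Hard Rock"] * 4
        + ["Pop"] * 2
        + ["Heavy Metal, Hard Rock"] * 2
        + ["Pop"] * 2,
        "Recommended": [1] * 10,
        "Played_or_skipped": [0, 0, 0, 0, 1, 1, 1, 0, 1, 1],
    }
)

ranked = worst_slices(toy_df, ["city", "genre_last_played"])
print(ranked)
```

The first row of `ranked` is the worst-performing slice, here **city** = Austin with **genre_last_played** = Heavy Metal, Hard Rock. The `min_count` argument is a simple guard so that slices with very few rows do not dominate the ranking.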









# 📚 Conclusion

In this walkthrough we've shown how Arize can be used to log prediction data for a model, pinpoint model performance degradation, and set up monitors to catch future issues. We have been able to identify 3 areas of concern:

1. We surfaced drift, data quality, and data consistency issues to perform a root cause analysis, as a new domain appeared in our production system that the model was not trained on.
2. We monitored feature drift in real time as users' music preferences changed, and found new feature inputs that appeared in production.
3. We used the performance dashboard and heatmap views to isolate abnormal feature inputs in production.



Though we covered a lot of ground, this is just scratching the surface of what the Arize platform can do. We urge you to explore more of Arize, either on your own or through one of our many other tutorials.

### About Arize
Arize is an end-to-end ML observability and model monitoring platform. The platform is designed to help ML engineers and data science practitioners surface and fix issues with ML models in production faster with:
- Automated ML monitoring and model monitoring
- Workflows to troubleshoot model performance
- Real-time visualizations for model performance monitoring, data quality monitoring, and drift monitoring
- Model prediction cohort analysis
- Pre-deployment model validation
- Integrated model explainability

### Website
Visit Us At: https://arize.com/model-monitoring/

### Additional Resources
- [What is ML observability?](https://arize.com/what-is-ml-observability/)
- [Playbook to model monitoring in production](https://arize.com/the-playbook-to-monitor-your-models-performance-in-production/)
- [Using statistical distance metrics for ML monitoring and observability](https://arize.com/using-statistical-distance-metrics-for-machine-learning-observability/)
- [ML infrastructure tools for data preparation](https://arize.com/ml-infrastructure-tools-for-data-preparation/)
- [ML infrastructure tools for model building](https://arize.com/ml-infrastructure-tools-for-model-building/)
- [ML infrastructure tools for production](https://arize.com/ml-infrastructure-tools-for-production-part-1/)
- [ML infrastructure tools for model deployment and model serving](https://arize.com/ml-infrastructure-tools-for-production-part-2-model-deployment-and-serving/)
- [ML infrastructure tools for ML monitoring and observability](https://arize.com/ml-infrastructure-tools-ml-observability/)

Visit the [Arize Blog](https://arize.com/blog) and [Resource Center](https://arize.com/resource-hub/) for more resources on ML observability and model monitoring.
