<img src="https://storage.googleapis.com/arize-assets/arize-logo-white.jpg" width="200"/>

# Getting Started with the Arize Platform - Credit Card Fraud in the Banking Industry

**You are responsible for a credit card fraud model at a large bank or payment processing company. You have been alerted to a spike in credit card chargebacks, leading you to suspect that fraudsters are committing fraud undetected!** Realizing that this flaw in your model's performance carries a heavy cost for your company and customers, you understand the need for a powerful toolset to troubleshoot (and prevent) costly model degradations. You turn to Arize to find out what changed in your credit card fraud detection model and how you can improve it.

In this walkthrough, we will investigate your production credit card fraud model. We will validate the degradation in model performance, troubleshoot the root cause, set up proactive monitors that flag the model when it is not performing as expected, and come up with actionable steps to improve it.

Our steps to resolving this issue will be:
1. Get our model onto the Arize platform to investigate
2. Set up a performance dashboard to look at prediction performance
3. Understand where the model is underperforming
4. Discover the root cause of why a slice (grouping) of predictions is underperforming
5. Set up proactive monitoring to mitigate the impact of such degradations in the future

The production data covers one month and contains two main issues, which you will identify over the course of this exercise:

1. An upstream data source has introduced bad (null) values for `ENTRY_MODE`.
2. The model is missing fraud for certain merchant types, entry modes, and merchant IDs, especially in certain regions.

# Step 0. Setup and Getting the Data

The first step is to load our preexisting dataset, which includes training and production environments for our credit card fraud model. Using a preexisting dataset illustrates how simple it is to get started with the Arize platform.

## Install Dependencies and Import Libraries 📚

```python
!pip install arize -q

import datetime
import tempfile
import pandas as pd
import requests
from datetime import timedelta
import uuid
import numpy as np

from arize.pandas.logger import Client, Schema
from arize.utils.types import Environments, ModelTypes
```

## **🌐 Download the Data**
In this walkthrough, we’ll be sending real historical data (with privacy-conscious changes to feature names and values). Note that while feature names and values are made explicit in this dataset, you can achieve the same level of ML observability using obfuscated features.



```python
production = pd.read_csv("https://storage.googleapis.com/arize-assets/fixtures/CC_FRAUD/fraud_production_shap.csv", index_col=0)
train = pd.read_csv("https://storage.googleapis.com/arize-assets/fixtures/CC_FRAUD/fraud_train_shap.csv", index_col=0)

print("✅ Dependencies installed and data successfully downloaded!")
```


## Inspect the Data 

Take a quick look at the dataset. The data comes from a model designed and trained to estimate the probability that a credit card transaction is fraudulent, based on features such as `MERCHANT_TYPE`, `MEAN_AMOUNT`, `TX_AMOUNT`, and `ENTRY_MODE`. The dataset contains one month of data, and performance will be evaluated by comparing:

*   **PREDICTION**: the class predicted by the model (`FRAUD` or `NON_FRAUD`); the model's fraud probability is stored in **PREDICTION_SCORE**
*   **ACTUAL**: fraud or not fraud, based on ground truth collected from credit card users

```python
production.head()
```

| | MERCHANT_TYPE | MERCHANT_ID | ENTRY_MODE | STATE | MEAN_AMOUNT | STD_AMOUNT | TX_AMOUNT | VISA_RISK_SCORE | MASTERCARD_RISK_SCORE | AMEX_RISK_SCORE | prediction_id | PREDICTION | PREDICTION_SCORE | ACTUAL_SCORE | day | prediction_ts | ACTUAL | MERCHANT_TYPE_shap | MERCHANT_ID_shap | ENTRY_MODE_shap | STATE_shap | MEAN_AMOUNT_shap | STD_AMOUNT_shap | TX_AMOUNT_shap | VISA_RISK_SCORE_shap | MASTERCARD_RISK_SCORE_shap | AMEX_RISK_SCORE_shap |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Automotive Tire Stores | 0d523 | chip_contactless | NY | 112.7424 | 24.4066 | 181.0421 | 19.0 | 32.0 | 33.0 | 4ead6b2a-e69b-4238-b305-85b78f341ecd | NON_FRAUD | 0.46859 | 0.0 | 1.0 | 1641412634 | NON_FRAUD | 0.166667 | -2.416667 | 1.0 | 0.0 | -5.583333 | 5.0 | -4.083333 | 22.25 | 30.583333 | 57.083332 |
| 1 | Financial Institutions – Merchandise and Services | 9ef24 | chip_contactless | CA | 107.0147 | 25.9146 | 164.6947 | 37.0 | 19.0 | 16.0 | a1bc94de-8860-4e3d-80c7-926a475e1d84 | NON_FRAUD | 0.394723 | 0.0 | 1.0 | 1641412634 | NON_FRAUD | -0.25 | 0.166667 | 0.75 | -0.666667 | 3.5 | 0.5 | -9.5 | 20.5 | 30.583333 | 58.416665 |
| 2 | Business and Secretarial Schools | 7d30f | credential_on_file | CA | 84.5052 | 17.4863 | 39.4607 | 38.0 | 17.0 | 34.0 | 1ee39fff-75b8-4ea7-896c-4ef02f7234e8 | NON_FRAUD | 0.182435 | 0.0 | 1.0 | 1641412634 | NON_FRAUD | 0.0 | 0.0 | 0.5 | -0.583333 | 3.833333 | 0.5 | -3.75 | 18.333333 | 27.249999 | 57.916665 |
| 3 | Money Orders – Wire Transfer | 88309 | manual | FL | 73.4188 | 63.0689 | 143.7292 | 40.0 | 25.0 | 30.0 | a8dbf775-fd32-4f26-b1f1-a6d186e8671a | NON_FRAUD | 0.362903 | 0.0 | 1.0 | 1641412634 | NON_FRAUD | -0.583333 | 0.0 | -1.083333 | 0.166667 | -9.166667 | 12.666666 | -4.416667 | 21.0 | 27.749999 | 57.666665 |
| 4 | Drug Stores and Pharmacies | 05048 | chip_contactless | NY | 140.6726 | 17.5549 | 79.0607 | 52.0 | 67.0 | 48.0 | 8c0d199f-681e-418c-abb1-8dd2bdc14445 | FRAUD | 0.814838 | 0.0 | 1.0 | 1641412634 | NON_FRAUD | -0.166667 | -2.75 | 9.333333 | 0.083333 | -176.583329 | -23.083333 | -4.0 | 24.25 | 19.499999 | 57.416665 |


```python
feature_column_names = [
    "MERCHANT_TYPE",
    "MERCHANT_ID",
    "ENTRY_MODE",
    "STATE",
    "MEAN_AMOUNT",
    "STD_AMOUNT",
    "TX_AMOUNT",
    "VISA_RISK_SCORE",
    "MASTERCARD_RISK_SCORE",
    "AMEX_RISK_SCORE",
]
shap_column_names = [f"{x}_shap" for x in feature_column_names]
```

# Step 1. Sending Data into Arize 💫

Now that we have our dataset imported, we are ready to integrate into Arize. We do this by logging (sending) important data we want to analyze to the platform. There, the data will be easily visualized and troubleshooting workflows will help us find the source of our problem.

For our model, we are going to log:
*   feature data
*   predictions
*   actuals

## Import and Setup Arize Client

The first step is to set up our Arize client. After that, we will log the data.

First, copy your Arize `API_KEY` and `ORGANIZATION_KEY` from the admin page linked below and paste them into the setup cell. We will also set up some metadata to use across all logging.

[![Button_Open.png](https://storage.googleapis.com/arize-assets/fixtures/Button_Open.png)](https://app.arize.com/admin)


```python
ORGANIZATION_KEY = "ORGANIZATION_KEY"
API_KEY = "API_KEY"

# Fail fast if the placeholder keys have not been replaced
if ORGANIZATION_KEY == "ORGANIZATION_KEY" or API_KEY == "API_KEY":
    raise ValueError("❌ NEED TO CHANGE ORGANIZATION AND/OR API_KEY")

arize_client = Client(organization_key=ORGANIZATION_KEY, api_key=API_KEY)

model_id = "arize-demo-creditcard-fraud"  # This is the model name that will show up in Arize
model_version = "v1.0"  # Version of model - can be any string

print("✅ Arize setup complete!")
```

## Log Training & Production Data to Arize 

Now that our Arize client is set up, let's log all of our data to the platform. For more details on how **`arize.pandas.logger`** works, visit our documentation page below.

[![Buttons_OpenOrange.png](https://storage.googleapis.com/arize-assets/fixtures/Buttons_OpenOrange.png)](https://docs.arize.com/arize/sdks-and-integrations/python-sdk/arize.pandas)

Key parameters:

*   **prediction_id_column_name**: tells Arize which column contains the unique ID of each prediction
*   **timestamp_column_name**: tells Arize which column contains the prediction timestamp (used for production data)
*   **prediction_label_column_name**: tells Arize which column contains the predicted class
*   **prediction_score_column_name**: tells Arize which column contains the model's prediction score
*   **actual_label_column_name**: tells Arize which column contains the actual results from field data
*   **actual_score_column_name**: tells Arize which column contains the actual score from field data
*   **feature_column_names**: tells Arize which columns contain the model's features
*   **shap_values_column_names**: maps each feature to the column holding its SHAP value (used for production data)

Because our model outputs both a class label and a score for each prediction, we will use [ModelTypes.SCORE_CATEGORICAL](https://docs.arize.com/arize/product-guides-1/models/model-types) to perform this analysis.
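Before calling `arize_client.log`, it can help to confirm that every column named in a schema actually exists in the DataFrame and that prediction IDs are unique. The sketch below uses plain pandas only; `check_columns` is a hypothetical helper (not part of the Arize SDK), and the toy DataFrame stands in for `train` or `production`.

```python
import pandas as pd


def check_columns(df, required_columns, id_column="prediction_id"):
    """Hypothetical pre-logging check: return a list of problems (empty if none found)."""
    problems = []
    missing = [c for c in required_columns if c not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    # Duplicate prediction IDs would make predictions and actuals ambiguous to join
    if id_column in df.columns and df[id_column].duplicated().any():
        problems.append(f"duplicate values in {id_column}")
    return problems


# Toy stand-in for the real training/production DataFrames
toy = pd.DataFrame(
    {
        "prediction_id": ["a", "b", "b"],
        "PREDICTION": ["FRAUD", "NON_FRAUD", "NON_FRAUD"],
        "PREDICTION_SCORE": [0.81, 0.12, 0.30],
        "ACTUAL": ["NON_FRAUD", "NON_FRAUD", "FRAUD"],
    }
)
required = ["prediction_id", "PREDICTION", "PREDICTION_SCORE", "ACTUAL", "ACTUAL_SCORE"]
issues = check_columns(toy, required)
print(issues)
```

On the real data, a call such as `check_columns(train, ["prediction_id", "PREDICTION", "PREDICTION_SCORE", "ACTUAL", "ACTUAL_SCORE"] + feature_column_names)` would run the same checks.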



## Log Training Data

```python
# Define a Schema() object for Arize to pick up data from the correct columns for logging
training_schema = Schema(
    prediction_id_column_name="prediction_id",
    prediction_label_column_name="PREDICTION",
    prediction_score_column_name="PREDICTION_SCORE",
    actual_label_column_name="ACTUAL",
    actual_score_column_name="ACTUAL_SCORE",
    feature_column_names=feature_column_names,
)

# Log the training DataFrame
training_response = arize_client.log(
    dataframe=train,
    model_id=model_id,
    model_version=model_version,
    model_type=ModelTypes.SCORE_CATEGORICAL,
    environment=Environments.TRAINING,
    schema=training_schema,
)

# If successful, the server will return a status_code of 200
if training_response.status_code != 200:
    print(f"logging failed with response code {training_response.status_code}, {training_response.text}")
else:
    print("✅ You have successfully logged training set to Arize")
```


## Log Production Data


```python
# Shift dates so the data mimics a recent production dataset (easier to visualize)
END_DATE = datetime.date.today().strftime("%Y-%m-%d")
START_DATE = (datetime.date.today() - timedelta(31)).strftime("%Y-%m-%d")


def setPredictionIDandTime(df, start, end):
    """Give each row a timestamp based on its `day` value and a fresh prediction ID."""
    day_dfs = []
    for dt in pd.date_range(start, end).to_pydatetime():
        # Rows are matched to calendar dates by day-of-month
        day_df = df.loc[df["day"] == dt.day].copy()
        # dt.timestamp() is portable; strftime("%s") is not supported on all platforms (e.g. Windows)
        day_df["prediction_ts"] = int(dt.timestamp())
        day_dfs.append(day_df)
    # Concatenate once instead of growing a DataFrame inside the loop
    out_df = pd.concat(day_dfs, ignore_index=True)
    out_df["prediction_id"] = [str(uuid.uuid4()) for _ in range(len(out_df))]
    return out_df.drop(columns="day")


production = setPredictionIDandTime(production, START_DATE, END_DATE)

production_schema = Schema(
    prediction_id_column_name="prediction_id",
    timestamp_column_name="prediction_ts",
    prediction_label_column_name="PREDICTION",
    prediction_score_column_name="PREDICTION_SCORE",
    actual_label_column_name="ACTUAL",
    actual_score_column_name="ACTUAL_SCORE",
    feature_column_names=feature_column_names,
    shap_values_column_names=dict(zip(feature_column_names, shap_column_names)),
)

production_response = arize_client.log(
    dataframe=production,
    model_id=model_id,
    model_version=model_version,
    model_type=ModelTypes.SCORE_CATEGORICAL,
    environment=Environments.PRODUCTION,
    schema=production_schema,
)

if production_response.status_code != 200:
    print(f"logging failed with response code {production_response.status_code}, {production_response.text}")
else:
    print("✅ You have successfully logged production set to Arize")
```
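`setPredictionIDandTime` matches rows to dates by day-of-month. Because the 32-date window always spans two calendar months, some day numbers occur twice (so those rows are duplicated) and, after a short month, some day numbers never occur (so those rows are dropped). The sketch below is an alternative, assuming the `day` column holds 1-based day indices within the month of data: it maps each day to an offset from the start date instead. `assign_timestamps` is a hypothetical helper, and it anchors timestamps at UTC midnight rather than local midnight.

```python
import uuid

import pandas as pd


def assign_timestamps(df, start_date):
    """Hypothetical helper: day 1 -> start_date, day 2 -> start_date + 1 day, and so on."""
    out = df.copy()
    offsets = pd.to_timedelta(out["day"].astype(int) - 1, unit="D")
    dates = pd.Timestamp(start_date) + offsets
    # Unix seconds, treating each date as UTC midnight
    out["prediction_ts"] = (dates - pd.Timestamp("1970-01-01")) // pd.Timedelta(seconds=1)
    out["prediction_id"] = [str(uuid.uuid4()) for _ in range(len(out))]
    return out.drop(columns="day")


toy = pd.DataFrame({"day": [1.0, 2.0, 31.0], "x": [10, 20, 30]})
result = assign_timestamps(toy, "2024-01-01")
print(result[["x", "prediction_ts"]])
```

Under that assumption, `production = assign_timestamps(production, START_DATE)` could replace the `setPredictionIDandTime` call; every row is kept exactly once.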

# Step 2. Confirm Data in Arize ✅

Note that Arize takes about 10 minutes to index the data. While the model should appear immediately, the data will not show up until indexing is complete. Feel free to grab a cup of coffee as Arize works its magic! ☕🔮

**⚠️ DON'T SKIP:**
In order to move on to the next step, make sure your actuals and training/production sets are loaded into the platform. To check:
1. Navigate to models from the left bar, locate and click on model **arize-demo-creditcard-fraud**
2. On the **Overview Tab**, make sure you can see Predictions and Actuals under the **Model Health** section. Once production actuals have been fully recorded on Arize, the row title will change from **0 Actuals** to **Actuals** with summary statistics such as cardinality listed in the tables.
3. Verify the list of **Features** below **Actuals**.

(screen recording)

# Step 3. Set up Model Baseline & Managed Monitors

Now that our data has been logged into the [Arize platform](https://app.arize.com/) we can begin our investigation into our poorly performing fraud detection model. 

Arize will guide you through setting up a **Baseline** (reference environment for comparison) and automatically create **Monitors** for your model in just a few clicks: just follow the blue banner at the top of the page titled "Finish setting up your model".

![image.png](https://storage.googleapis.com/arize-assets/fixtures/Click-Through%20Rate%20Use-Case/images/initial_setup_banner.png)

Arize can automatically configure monitors that are best suited to your data. From the banner at the top of the screen, select the following configurations after clicking the 'Set up Model' button: 

1. Datasets: `Training Version 1.0`
2. Default Metric: `False Negative Rate`, Trigger Alert When: `False Negative Rate is above .2`, Positive Class: `FRAUD`
3. Turn On Monitoring: Drift ✅, Data Quality ✅, Performance ✅ 

You will now see that the baseline has been set and **Drift**, **Data Quality**, and **Performance** monitors have been created!


![image](https://storage.googleapis.com/arize-assets/fixtures/CC_FRAUD/fraud_setupbaseline.gif)

 
To keep expensive chargebacks from slipping past our model undetected, Arize automatically sets up monitors that flag the model when performance, data quality, or drift crosses a threshold, before it becomes a major issue. Visit the **Monitors** tab to filter monitors by category, edit evaluation windows and thresholds, and create custom monitors.
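To make the monitor logic concrete, the sketch below computes the false negative rate per day on toy data and flags days above the 0.2 threshold configured earlier. It is only an illustration of what a performance monitor evaluates, written in plain pandas; it does not reproduce how Arize computes monitors internally, and the `date` column is a hypothetical stand-in for `prediction_ts`.

```python
import pandas as pd


def daily_fnr(df, positive="FRAUD"):
    """False negative rate per day: missed fraud / all actual fraud.

    Days with no actual fraud have an undefined FNR and simply do not appear.
    """
    fraud = df[df["ACTUAL"] == positive]
    missed = fraud["PREDICTION"] != positive
    return missed.groupby(fraud["date"]).mean()


toy = pd.DataFrame(
    {
        "date": ["d1", "d1", "d1", "d2", "d2", "d2"],
        "ACTUAL": ["FRAUD", "FRAUD", "NON_FRAUD", "FRAUD", "FRAUD", "FRAUD"],
        "PREDICTION": ["FRAUD", "FRAUD", "NON_FRAUD", "NON_FRAUD", "FRAUD", "FRAUD"],
    }
)
fnr = daily_fnr(toy)
THRESHOLD = 0.2  # mirrors the "False Negative Rate is above .2" trigger
alerts = fnr[fnr > THRESHOLD]
print(alerts)
```

Here day `d1` catches all of its fraud, while day `d2` misses one of three fraudulent transactions (FNR of about 0.33) and would trigger the alert.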


# Step 4. Setting up proactive custom monitors

**Arize sets up monitors across all features, predictions, and actual values.** In fraud detection it's important to monitor your model's: 


1. False Negative Rate: Chargeback % (fraud transactions that the model classified as non-fraud, leading to a chargeback and an immediate financial loss for the company).
2. False Positive Rate: Upset Customer % (non-fraud transactions that were classified as fraud, leading to an awkward moment at the register and an upset customer).
3. Accuracy: of all predictions, what percent did the model get right? Be careful with this metric, as it can be misleading when fraud is rare. If 1% of transactions are fraud, a model that labels every transaction as non-fraud still achieves 99% accuracy.
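The three metrics above all come from the same confusion-matrix counts. The pure-Python sketch below computes them for a single positive class and reproduces the accuracy trap: with 1% fraud, a "model" that always predicts `NON_FRAUD` misses every fraud case yet scores 99% accuracy. `classification_rates` is an illustrative helper, not an Arize function.

```python
def classification_rates(actual, predicted, positive="FRAUD"):
    """Return (false_negative_rate, false_positive_rate, accuracy) for one positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    # Undefined rates (no positives / no negatives) are reported as 0.0 for simplicity
    fnr = fn / (fn + tp) if fn + tp else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    accuracy = (tp + tn) / len(pairs)
    return fnr, fpr, accuracy


# 1,000 transactions with 1% fraud, and a "model" that predicts NON_FRAUD every time
actual = ["FRAUD"] * 10 + ["NON_FRAUD"] * 990
predicted = ["NON_FRAUD"] * 1000
fnr, fpr, acc = classification_rates(actual, predicted)
print(f"FNR={fnr:.2f} FPR={fpr:.2f} accuracy={acc:.2%}")  # FNR=1.00 FPR=0.00 accuracy=99.00%
```

This is why the monitors in this walkthrough key on false negative rate rather than accuracy.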


![image](https://storage.googleapis.com/arize-assets/fixtures/CC_FRAUD/monitor_FNR.gif)

# Step 5. Exploring Drift


Selecting the **Drift** tab, we see noticeable prediction drift in production compared to our training dataset (baseline) over a period of 4 days. The **Distribution Comparison** beneath the **Prediction Drift over Time** widget shows that the training dataset (Baseline Distribution) has around 1% FRAUD predictions, whereas the production dataset (Current Distribution) peaks at almost 25% FRAUD!
 
 
![image](https://storage.googleapis.com/arize-assets/fixtures/CC_FRAUD/prediction_drift_fraud.gif)
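Drift between two distributions is quantified with a statistical distance. The sketch below computes the Population Stability Index (PSI), one commonly used drift metric, for the prediction-class shift described above (about 1% FRAUD in the baseline versus about 25% at the production peak). The proportions are rounded from the text, and the code is an illustration of the metric rather than Arize's exact computation.

```python
import math


def psi(baseline, current, eps=1e-6):
    """Population Stability Index between two categorical distributions (dicts of proportions)."""
    total = 0.0
    for category in set(baseline) | set(current):
        # Floor tiny proportions to avoid log(0) and division by zero
        b = max(baseline.get(category, 0.0), eps)
        c = max(current.get(category, 0.0), eps)
        total += (c - b) * math.log(c / b)
    return total


baseline = {"FRAUD": 0.01, "NON_FRAUD": 0.99}  # training: ~1% predicted fraud
current = {"FRAUD": 0.25, "NON_FRAUD": 0.75}   # production peak: ~25% predicted fraud
print(round(psi(baseline, current), 3))  # 0.839
```

A frequently cited rule of thumb treats PSI above 0.25 as a significant shift, so a value around 0.84 is well into drift territory.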

# Step 6. Performance Analysis 

Once a performance monitor is triggered, navigate to the **Performance** tab to start troubleshooting and understand what caused the degradation. The false negative rate (our default performance metric) is plotted over the last 30 days, overlaid on bars showing prediction volume. Our model is doing fairly well overall, but there have been a few spikes in false negative rate up to ~0.2, so let's look into what could be driving performance down.

If you scroll down, the **Output Segmentation** section includes a confusion matrix, which is useful here because our model assigns a class to each prediction.

![image](https://storage.googleapis.com/arize-assets/fixtures/CC_FRAUD/perftab_fraud_overview.png)

Scroll down further to the **Performance Breakdown by Feature** section, which is very useful for uncovering low-performing cohorts within a feature. Since this model produces SHAP values for every prediction, Arize can use them to weight performance within each feature and compute a **Performance Impact Score**. Sorting by this score, rather than by feature importance or min/max performance alone, surfaces the features contributing most to decreased performance.
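For a rough offline view of which features drive predictions, the per-prediction SHAP columns can be summarized as mean absolute SHAP value per feature. The sketch below applies this standard global-importance calculation to the first five rows shown in the `production.head()` output; it is not Arize's Performance Impact Score, whose exact formula is not described here.

```python
import pandas as pd

# SHAP values for three features, copied from the five rows of production.head()
shap_df = pd.DataFrame(
    {
        "ENTRY_MODE_shap": [1.0, 0.75, 0.5, -1.083333, 9.333333],
        "MEAN_AMOUNT_shap": [-5.583333, 3.5, 3.833333, -9.166667, -176.583329],
        "STATE_shap": [0.0, -0.666667, -0.583333, 0.166667, 0.083333],
    }
)

# Global importance = mean absolute SHAP value per feature, largest first
importance = (
    shap_df.abs()
    .mean()
    .rename(lambda col: col[: -len("_shap")])  # strip the "_shap" suffix
    .sort_values(ascending=False)
)
print(importance)
```

On the full dataset, `production[shap_column_names].abs().mean()` gives the same summary over every feature.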

At the top of the **Performance Breakdown by Feature** list is `STATE`, so let's expand this feature. It lists the feature's categorical values; hovering over a bar shows how many predictions had that value and the performance metric for that cohort.


![image](https://storage.googleapis.com/arize-assets/fixtures/CC_FRAUD/fraud_state_perf.png)
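The same kind of breakdown can be reproduced offline to double-check a cohort. The sketch below computes, for each value of a feature, the fraud volume and false negative rate among transactions that were actually fraud, and shows how filtering first (for example on `STATE` = `CA`) drills into a single cohort. It uses toy data and a hypothetical helper, `fnr_by_value`; it does not apply Arize's SHAP weighting.

```python
import pandas as pd


def fnr_by_value(df, feature, positive="FRAUD"):
    """Per-value fraud count and false negative rate, over rows whose actual label is `positive`."""
    fraud = df[df["ACTUAL"] == positive].assign(missed=lambda d: d["PREDICTION"] != positive)
    # dropna=False keeps missing values as their own group, similar to the `_NULL` bucket in the UI
    summary = fraud.groupby(feature, dropna=False)["missed"].agg(fraud_count="size", fnr="mean")
    return summary.sort_values("fnr", ascending=False)


toy = pd.DataFrame(
    {
        "STATE": ["CA", "CA", "CA", "NY", "NY", "FL"],
        "ENTRY_MODE": ["manual", "credential_on_file", "credential_on_file", "chip_contactless", None, "manual"],
        "ACTUAL": ["FRAUD"] * 5 + ["NON_FRAUD"],
        "PREDICTION": ["FRAUD", "NON_FRAUD", "NON_FRAUD", "FRAUD", "NON_FRAUD", "NON_FRAUD"],
    }
)
print(fnr_by_value(toy, "STATE"))
# Drill into one cohort, like filtering on STATE = CA in the UI
print(fnr_by_value(toy[toy["STATE"] == "CA"], "ENTRY_MODE"))
```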

**We can start by comparing the production to training dataset.** This can help answer questions such as "Were we seeing this problem in training?" or "Does my new / previous model version perform better?". It can also be helpful to compare to other windows of production.

![image](https://storage.googleapis.com/arize-assets/fixtures/CC_FRAUD/perf_tab_fraud.gif)



Here, we can identify low-performing segments. The performance breakdown by feature lets you dig deeper into which segment within each feature is underperforming. In this case, we filter on `STATE` = `CA` and observe that `ENTRY_MODE`, which has the highest performance impact, rises to the top.

Some of the immediate insights surfaced from the **Feature Performance Tab** are detailed below. 
1. Worst performing `ENTRY_MODE` values include `credential_on_file`, followed by `manual` and `_NULL`
2. Worst performing `MERCHANT_TYPE` values include `Religious Organizations`, `Online Food Delivery`, and `Tax Payments`
3. Worst performing `MERCHANT_ID`s are concentrated in `FL`, `CA`, and `NY`

![image](https://storage.googleapis.com/arize-assets/fixtures/CC_FRAUD/perf_tab_entry_mode_filteron_ca.png)


![image](https://storage.googleapis.com/arize-assets/fixtures/CC_FRAUD/perf_tab_merchant_type_filteron_Ca.png)

![image](https://storage.googleapis.com/arize-assets/fixtures/CC_FRAUD/perf_tab_merchant_id_filteron_ca.png)

Moreover, we can compare the overall production dataset to other windows of production. Here, we can investigate what the production dataset looks like in a low performing segment. 

![image](https://storage.googleapis.com/arize-assets/fixtures/CC_FRAUD/fraud_perftabx2.gif)


# Step 7. Feature Performance Insights and Data Quality Checks

Using the insights gathered from the **Feature Performance Tab**, we can take action to 1) prevent this type of fraud from going undetected in the future and 2) use our findings to improve our model. As noted previously, the slice (feature/value combination) `ENTRY_MODE: credential_on_file` was one of the worst performing segments for our model. Navigate back to the **Drift** tab and click into the `ENTRY_MODE` feature under the **Feature Drift** section.


![image](https://storage.googleapis.com/arize-assets/fixtures/CC_FRAUD/entry_mode.gif)

At first glance, we see that `ENTRY_MODE` starts to drift around day 5. We also notice that our training dataset (Baseline Distribution) did not have any `_NULL` values. Clicking on the **Data Quality** tab confirms that the cardinality of the feature increased on the 5th day and fell back down 6 days later.


![image](https://storage.googleapis.com/arize-assets/fixtures/CC_FRAUD/entry_mode_drift.png)


![image](https://storage.googleapis.com/arize-assets/fixtures/CC_FRAUD/data_quality_check.gif)
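The two data quality signals used above, null values and cardinality, are easy to track per day. The sketch below computes both for one feature on toy data; the `date` column is a hypothetical stand-in for `prediction_ts`, and this is plain pandas rather than the Data Quality tab's implementation.

```python
import pandas as pd


def daily_quality(df, feature, date_col="date"):
    """Per-day null rate and cardinality for one feature."""
    is_null = df[feature].isna()
    return pd.DataFrame(
        {
            "null_rate": is_null.groupby(df[date_col]).mean(),
            # dropna=False counts missing values as one extra category, like a `_NULL` value
            "cardinality": df.groupby(date_col)[feature].nunique(dropna=False),
        }
    )


toy = pd.DataFrame(
    {
        "date": ["day4"] * 3 + ["day5"] * 3,
        "ENTRY_MODE": ["manual", "chip_contactless", "manual", "manual", "chip_contactless", None],
    }
)
report = daily_quality(toy, "ENTRY_MODE")
print(report)
```

In this toy example the appearance of nulls on `day5` raises both the null rate and the cardinality, the same pattern the Data Quality tab shows for `ENTRY_MODE`.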




# Step 8. Model Performance Overview

As we continue to check in on and improve our model's performance, we want to view all of our important model metrics in a single pane. To do this, we'll set up a **Model Performance Dashboard**, which shows the model's most important metrics in a single configurable layout.

Navigate to the **Templates** section on the left sidebar and scroll down to click on the **Scored Model**. From there select your model, features you care to investigate, and positive class `FRAUD`. 

![image](https://storage.googleapis.com/arize-assets/fixtures/CC_FRAUD/model_overview_fraud.gif)

In addition to the default widgets Arize sets up for your dashboard, you can configure custom metrics your team cares about. In a few clicks, we added widgets that give a single-glance view of the model's **Accuracy**, **False Positive Rate**, and **False Negative Rate** as standalone statistics. To visualize these metrics over time, we also created a timeseries widget that overlays all three plots.

![image](https://storage.googleapis.com/arize-assets/fixtures/CC_FRAUD/fraud_model_overview.png)

# Step 9. Business Metrics 

Sometimes we need metrics other than traditional statistical measures to define model performance. Business Impact measures your score model's payoff at different thresholds (i.e., decision boundaries for a score model). The Arize platform lets you enter custom formulas that map model outcomes to your own definition of value. Navigate to the **Business Impact** tab to set up a custom formula that calculates how the model's performance contributes to your company's overall profit/loss. For example, the diagram below estimates the profit/loss of each decision made by our model.

![image](https://storage.googleapis.com/arize-assets/tutorials/use-cases/bizimpact1.png)

Capturing this in our **Business Impact** tab visualizes the profit/loss based on our model's prediction threshold for fraud classification. 

![image](https://storage.googleapis.com/arize-assets/fixtures/CC_FRAUD/business_metrics.png)
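The idea behind a payoff curve can be sketched offline: assign a value to each confusion-matrix outcome, classify every score at or above a threshold as `FRAUD`, and sum the values for each candidate threshold. The per-outcome values and toy scores below are hypothetical placeholders, not the numbers from the diagram, and `payoff` is an illustrative helper rather than the formula syntax used in the Business Impact tab.

```python
def payoff(scores, actuals, threshold, values, positive="FRAUD"):
    """Total payoff if every score >= threshold is classified as `positive`."""
    total = 0.0
    for score, actual in zip(scores, actuals):
        predicted_positive = score >= threshold
        actual_positive = actual == positive
        if predicted_positive and actual_positive:
            total += values["tp"]  # fraud caught
        elif predicted_positive:
            total += values["fp"]  # legitimate transaction declined
        elif actual_positive:
            total += values["fn"]  # fraud missed (chargeback)
        else:
            total += values["tn"]  # legitimate transaction approved
    return total


# Hypothetical per-transaction values; substitute your own business numbers
VALUES = {"tp": 0.0, "fp": -10.0, "fn": -100.0, "tn": 1.0}

scores = [0.05, 0.20, 0.40, 0.60, 0.90]
actuals = ["NON_FRAUD", "NON_FRAUD", "FRAUD", "NON_FRAUD", "FRAUD"]
thresholds = (0.3, 0.5, 0.7)
for t in thresholds:
    print(t, payoff(scores, actuals, t, VALUES))
best = max(thresholds, key=lambda t: payoff(scores, actuals, t, VALUES))
print("best threshold:", best)
```

With these placeholder values a missed fraud costs ten times more than a false decline, so the lowest threshold (0.3) comes out best; different business values would move the optimum.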


# 📚 Conclusion 
In this walkthrough, we've shown how Arize can be used to log prediction data for a model, pinpoint model performance degradations, set up monitors to proactively catch future issues, create dashboards for at-a-glance model understanding, and calculate business impact metrics through custom formulas.

We completed the following tasks: 
1. Uploaded data from a credit card transaction fraud model 
2. Set up a model baseline (Training V1.0) and managed Performance, Data Quality, and Drift monitors
3. Used the **Performance Breakdown by Feature** to surface low-performing cohorts, which narrowed our undetected fraudulent transactions down to certain `ENTRY_MODE`s and `MERCHANT_TYPE`s coming from a few specific `STATE`s.
4. Identified correlations between our model's degrading performance with individual feature drift and distribution variance
5. Created a model Performance Dashboard to visualize important metrics at a glance with custom timeseries metrics for ongoing analysis 
6. Captured our potential profit/loss based on our model's classification threshold using the Business Impact tab

We identified the following areas of concern: 
1. `_NULL` values for the feature `ENTRY_MODE`. This suggests that we need to either upsample this feature/value combination or fix the upstream data source so it no longer emits this undetermined value.
2. Suspicious activity coming from `MERCHANT_TYPE`: `Online Food Delivery`, `Religious Organizations`, `Pawn Shops and Salvage Yards`, `Automotive Tire Stores`, and `Tax Payments`, and from `ENTRY_MODE`: `credential_on_file` and `manual`. Specifically, we noticed drift in these categories fluctuating on certain dates and coming from a small subset of `STATE`s, namely `CA` and `NY`. We must look into the POS systems across these states to understand what type of organized fraud is happening in these regions, while retraining our model to detect anomalous food delivery and erroneous tax payment transactions.
3. We also noticed that our model performs poorly on certain `MERCHANT_ID`s when filtered on `STATE`, namely `CA` and `NY`. A new modeling strategy may be needed to form stronger inferences.

Though we covered a lot of ground, this is just scratching the surface of what the Arize platform can do. We urge you to explore more of Arize, either on your own or through one of our many other tutorials.

# About Arize
Arize is an end-to-end ML observability and model monitoring platform. The platform is designed to help ML engineers and data science practitioners surface and fix issues with ML models in production faster with:
- Automated ML monitoring and model monitoring
- Workflows to troubleshoot model performance
- Real-time visualizations for model performance monitoring, data quality monitoring, and drift monitoring
- Model prediction cohort analysis
- Pre-deployment model validation
- Integrated model explainability

### Website
Visit Us At: https://arize.com/model-monitoring/

### Additional Resources
- [What is ML observability?](https://arize.com/what-is-ml-observability/)
- [Playbook to model monitoring in production](https://arize.com/the-playbook-to-monitor-your-models-performance-in-production/)
- [Using statistical distance metrics for ML monitoring and observability](https://arize.com/using-statistical-distance-metrics-for-machine-learning-observability/)
- [ML infrastructure tools for data preparation](https://arize.com/ml-infrastructure-tools-for-data-preparation/)
- [ML infrastructure tools for model building](https://arize.com/ml-infrastructure-tools-for-model-building/)
- [ML infrastructure tools for production](https://arize.com/ml-infrastructure-tools-for-production-part-1/)
- [ML infrastructure tools for model deployment and model serving](https://arize.com/ml-infrastructure-tools-for-production-part-2-model-deployment-and-serving/)
- [ML infrastructure tools for ML monitoring and observability](https://arize.com/ml-infrastructure-tools-ml-observability/)

Visit the [Arize Blog](https://arize.com/blog) and [Resource Center](https://arize.com/resource-hub/) for more resources on ML observability and model monitoring.
