<img src="https://storage.googleapis.com/arize-assets/arize-logo-white.jpg" width="200"/>

# Getting Started with the Arize Platform - Credit Card Fraud in the Banking Industry

**You are responsible for a credit card fraud model at a large bank or payment processing company. You have been alerted to a spike in credit card chargebacks, leading you to suspect that fraudsters are getting away with committing fraud undetected!** Realizing that this flaw in your model's performance carries a heavy cost for your company and its customers, you understand the need for a powerful toolset to troubleshoot (and prevent) costly model degradations. You turn to Arize to find out what changed in your credit card fraud detection model and how you can improve it.

In this walkthrough, we are going to investigate your production credit card fraud model. We will validate the degradation in model performance, troubleshoot its root cause, come up with actionable steps to improve the model, and set up proactive monitors to mitigate the impact of future degradations.

Our steps to resolving this issue will be:
1. Get our model onto the Arize platform to investigate
2. Set up a performance dashboard to look at prediction performance
3. Understand where the model is underperforming
4. Discover the root cause of why a slice (grouping) of predictions is underperforming
5. Set up proactive monitoring to mitigate the impact of such degradations in the future

The production data contains 1 month of data where 2 main issues exist. You will work on identifying these issues over the course of this exercise.

1. An upstream data source has introduced bad (null) values for ENTRY_MODE 
2. The model is missing fraud for certain merchant types, especially in certain regions. 

# Step 0. Setup and Getting the Data

The first step is to load our preexisting dataset, which includes the training and production environments for our credit card fraud model. Using a preexisting dataset illustrates how simple it is to get started with the Arize platform.

## Install Dependencies and Import Libraries 📚

In [None]:
!pip install arize -q
!pip install tables --upgrade -q

from arize.utils.types import ModelTypes, Environments
from arize.pandas.logger import Client, Schema

import pandas as pd
import datetime, uuid, tempfile
from datetime import timedelta




## **🌐 Download the Data**
In this walkthrough, we'll be sending real historical data (with privacy-conscious changes to feature names and values). Note that while feature names and values are made explicit in this dataset, you can achieve the same level of ML Observability using obfuscated features.



## Features Description 

1. **MEAN_AMOUNT**: mean transaction amount for the given card
2. **STD_AMOUNT**: standard deviation of the transaction amount for the given card
3. **STATE**: US state where the terminal/POS system used for the transaction resides
4. **MERCHANT_TYPE**: merchant classification category
5. **ENTRY_MODE**: mode of card entry for the transaction
6. **TX_AMOUNT**: transaction amount
7. **TX_ID**: unique transaction identifier
8. **TX_TIME**: transaction timestamp
9. **VISA_RISK_SCORE**: risk score from the Visa fraud scoring service for merchants
10. **MASTERCARD_RISK_SCORE**: risk score from the Mastercard fraud scoring service for merchants
11. **AMEX_RISK_SCORE**: risk score from the AmEx fraud scoring service for merchants
12. **PREDICTION**: predicted label, fraud or not fraud
13. **PREDICTION_SCORE**: predicted probability of fraud/not fraud
14. **ACTUAL**: actual value, transaction was fraud or not fraud
15. **ACTUAL_SCORE**: score for the actual outcome (fraud or not fraud)
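
As a rough illustration of how `MEAN_AMOUNT` and `STD_AMOUNT` could be derived, the sketch below computes both from one card's transaction history. The amounts are made up, and the choice of the sample standard deviation is an assumption; the dataset does not state which estimator was used.

```python
import statistics

# Hypothetical transaction amounts for a single card (not taken from the dataset)
card_history = [20.0, 35.0, 50.0, 15.0, 30.0]

mean_amount = statistics.mean(card_history)
# Assumption: sample standard deviation (n - 1 in the denominator)
std_amount = statistics.stdev(card_history)

print(f"MEAN_AMOUNT={mean_amount:.2f}, STD_AMOUNT={std_amount:.2f}")
```

A transaction whose `TX_AMOUNT` sits several standard deviations away from the card's mean is the kind of signal these two features are meant to expose to the model.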




For this historical evaluation case, the best approach to send data into Arize is via the [Python SDK pandas logger](https://docs.arize.com/arize/api-reference/python-sdk/arize.pandas). Therefore, you will need to have a Pandas dataframe for each dataset/environment. 

We have **2 Environments: training and production**. Training and production are two different datasets that correspond to their respective parts of the training/production pipeline. We download each of them, storing them in a dictionary `datasets` for later use. 

In [None]:
environments = ['training', 'production']
datasets = {}

for environment in environments:

  filepath = 'https://storage.googleapis.com/arize-assets/tutorials/use-cases/' + 'creditcard-fraud-' + environment + '.csv'  

  # Create the dataframe and store in dictionary
  datasets[environment] = pd.read_csv(filepath)
  
print("✅ Dependencies installed and data successfully downloaded!")

## Inspect the Data 

Take a quick look at the dataset. The data represents a model designed and trained to evaluate the probability of a credit card transaction being fraudulent based on various features such as `MERCHANT_TYPE`, `MEAN_AMOUNT`, `TX_AMOUNT`, and `ENTRY_MODE`. The dataset contains one month of data, and performance will be evaluated by comparing:

*   **PREDICTION**: The label predicted by the model (fraud or not fraud); the predicted probability is stored in **PREDICTION_SCORE**
*   **ACTUAL**: Fraud or not fraud, based on ground truth collected from credit card users



In [None]:
datasets['training'][:10]
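
Comparing **PREDICTION** against **ACTUAL** boils down to counting the four outcomes of a binary classifier. The sketch below does this in plain Python on made-up labels so it runs without the dataset. The positive class `FRAUD` comes from this walkthrough; the negative label `NOT_FRAUD` and the helper name `confusion_counts` are assumptions for illustration.

```python
from collections import Counter

def confusion_counts(predictions, actuals, positive="FRAUD"):
    """Count TP/FP/FN/TN for a binary label problem."""
    counts = Counter()
    for predicted, actual in zip(predictions, actuals):
        if actual == positive:
            counts["TP" if predicted == positive else "FN"] += 1
        else:
            counts["FP" if predicted == positive else "TN"] += 1
    return counts

# Made-up labels for illustration only
predictions = ["FRAUD", "NOT_FRAUD", "NOT_FRAUD", "NOT_FRAUD", "FRAUD"]
actuals = ["FRAUD", "FRAUD", "NOT_FRAUD", "NOT_FRAUD", "NOT_FRAUD"]

counts = confusion_counts(predictions, actuals)
# False negative rate: share of actual fraud that the model missed
false_negative_rate = counts["FN"] / (counts["FN"] + counts["TP"])
print(dict(counts), false_negative_rate)
```

The false negative rate is the metric we care about most in this use case, because every false negative is a fraudulent transaction that went through undetected.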

# Step 1. Sending Data into Arize 💫

Now that we have our dataset imported, we are ready to integrate into Arize. We do this by logging (sending) important data we want to analyze to the platform. There, the data will be easily visualized and troubleshooting workflows will help us find the source of our problem.

For our model, we are going to log:
*   feature data
*   predictions
*   actuals

## Import and Setup Arize Client

The first step is to set up our Arize client. After that we will log the data.

First, copy the Arize `API_KEY` and `ORGANIZATION_KEY` from your admin page linked below and paste them into the setup cell. We will also set up some metadata to use across all logging.

[![Button_Open.png](https://storage.googleapis.com/arize-assets/fixtures/Button_Open.png)](https://app.arize.com/admin)


In [None]:
# Replace the placeholder values below with the keys from your Arize admin page
ORGANIZATION_KEY = "ORGANIZATION_KEY"
API_KEY = "API_KEY"

# Fail fast, before creating the client, if the placeholders were not replaced
if ORGANIZATION_KEY == "ORGANIZATION_KEY" or API_KEY == "API_KEY":
    raise ValueError("❌ NEED TO CHANGE ORGANIZATION AND/OR API_KEY")

arize_client = Client(organization_key=ORGANIZATION_KEY, api_key=API_KEY)

model_id = "fraud-demo-model"  # This is the model name that will show up in Arize
model_version = "v1.0"  # Version of model - can be any string

print("✅ Arize setup complete!")

## Log Training & Production Data to Arize 

Now that our Arize client is set up, let's go ahead and log all of our data to the platform. For more details on how **`arize.pandas.logger`** works, visit our documentation page below.

[![Buttons_OpenOrange.png](https://storage.googleapis.com/arize-assets/fixtures/Buttons_OpenOrange.png)](https://docs.arize.com/arize/sdks-and-integrations/python-sdk/arize.pandas)

Key parameters:

*   **prediction_label_column_name**: tells Arize which column contains the predicted labels
*   **prediction_score_column_name**: tells Arize which column contains the prediction scores
*   **actual_label_column_name**: tells Arize which column contains the actual results from field data
*   **actual_score_column_name**: tells Arize which column contains the actual scores from field data

Given that our model is predicting between categories, we will use [ModelTypes.SCORE_CATEGORICAL](https://docs.arize.com/arize/product-guides-1/models/model-types) to perform this analysis.
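
Because the schema refers to columns by name, a misspelled or missing column is an easy mistake to make. The following is a minimal pre-flight check, written in plain Python so it runs on its own; `missing_columns` is a hypothetical helper, not part of the Arize SDK, and the column lists are shortened for illustration.

```python
def missing_columns(dataframe_columns, schema_columns):
    """Return schema columns that are absent from the dataframe, in order."""
    available = set(dataframe_columns)
    return [name for name in schema_columns if name not in available]

# Hypothetical column list; in the notebook this would be list(df_training.columns)
dataframe_columns = ["PREDICTION", "PREDICTION_SCORE", "ACTUAL", "ACTUAL_SCORE",
                     "MERCHANT_TYPE", "ENTRY_MODE", "STATE"]
# Names the schema is about to reference (labels, scores, and features)
schema_columns = ["PREDICTION", "PREDICTION_SCORE", "ACTUAL", "ACTUAL_SCORE",
                  "MERCHANT_TYPE", "ENTRY_MODE", "STATE", "TX_AMOUNT"]

print(missing_columns(dataframe_columns, schema_columns))
```

An empty list means every name the schema references is present in the dataframe.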



## Log Training

In [None]:
## send training data
df_training = datasets['training']

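# Midnight today as epoch seconds; .timestamp() is portable, unlike strftime("%s")
df_training["prediction_ts"] = int(datetime.datetime.combine(datetime.date.today(), datetime.time.min).timestamp())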
df_training["prediction_id"] = [str(uuid.uuid4()) for _ in range(df_training.shape[0])]

features = [
    'MERCHANT_TYPE',
    'ENTRY_MODE',
    'STATE',
    'MEAN_AMOUNT',
    'STD_AMOUNT',
    'TX_AMOUNT',
    'VISA_RISK_SCORE',
    'MASTERCARD_RISK_SCORE',
    'AMEX_RISK_SCORE',
]

# Define a Schema() object for Arize to pick up data from the correct columns for logging
training_schema = Schema(
    prediction_id_column_name="prediction_id",
    timestamp_column_name="prediction_ts",
    prediction_label_column_name="PREDICTION",
    prediction_score_column_name="PREDICTION_SCORE",
    actual_label_column_name="ACTUAL",
    actual_score_column_name="ACTUAL_SCORE",
    feature_column_names=features
)

# Logging Training DataFrame
training_response = arize_client.log(
    dataframe=df_training,
    path="inferences.bin",
    model_id=model_id,
    model_version=model_version,
    batch_id=None,
    model_type=ModelTypes.SCORE_CATEGORICAL,
    environment=Environments.TRAINING,
    schema=training_schema,
)

# If successful, the server will return a status_code of 200
if training_response.status_code != 200:
    print(f"logging failed with response code {training_response.status_code}, {training_response.text}")
else:
    print(f"✅ You have successfully logged training set to Arize")

## Log Production Data to Arize


In [None]:
# Change dates for ease of visualization and to mimic a recent production dataset
END_DATE = datetime.date.today().strftime('%Y-%m-%d')
START_DATE = (datetime.date.today() - timedelta(31)).strftime('%Y-%m-%d')

df_production = datasets['production']

def _setPredictionIDandTime (df, start, end):
    out_df = pd.DataFrame()
    dts = pd.date_range(start, end).to_pydatetime().tolist()
    for dt in dts:
        day_df = df.loc[df["day_of_month"] == dt.day].copy()
        # Epoch seconds for this day; .timestamp() is portable, unlike strftime("%s")
        day_df["prediction_ts"] = int(dt.timestamp())
        out_df = pd.concat([out_df, day_df], ignore_index=True)
    out_df["prediction_id"] = [str(uuid.uuid4()) for _ in range(out_df.shape[0])]
    return out_df.drop(columns="day_of_month")

df_production = _setPredictionIDandTime(df_production, START_DATE, END_DATE)

production_schema = Schema(
    prediction_id_column_name="prediction_id",
    timestamp_column_name="prediction_ts",
    prediction_label_column_name="PREDICTION",
    prediction_score_column_name="PREDICTION_SCORE",
    actual_label_column_name="ACTUAL",
    actual_score_column_name="ACTUAL_SCORE",
    feature_column_names=features
)

production_response = arize_client.log(
    dataframe=df_production,
    path="inferences.bin",
    model_id=model_id,
    model_version=model_version,
    batch_id=None,
    model_type=ModelTypes.SCORE_CATEGORICAL,
    environment=Environments.PRODUCTION,
    schema=production_schema,
)

if production_response.status_code != 200:
    print(f"logging failed with response code {production_response.status_code}, {production_response.text}")
else:
    print(f"✅ You have successfully logged production set to Arize")

# Step 2. Confirm Data in Arize ✅

Note that Arize takes about 10 minutes to index the data. While the model should appear immediately, the data will not show up until the indexing is complete. Feel free to go grab a cup of coffee as Arize works its magic! ☕🔮

**⚠️ DON'T SKIP:**
In order to move on to the next step, make sure your actuals and training/production sets are loaded into the platform. To check:
1. Navigate to models from the left bar, then locate and click on the model **fraud-demo-model** (the `model_id` set above)
2. On the **Overview Tab**, make sure you can see Predictions and Actuals under the **Model Health** section. Once production actuals have been fully recorded on Arize, the row title will change from **0 Actuals** to **Actuals** with summary statistics such as cardinality listed in the tables.
3. Verify the list of **Features** below **Actuals**.

![image.png](https://storage.googleapis.com/arize-assets/tutorials/use-cases/checkdata.gif)

# Step 3. Set up Model Baseline & Managed Monitors

Now that our data has been logged into the [Arize platform](https://app.arize.com/) we can begin our investigation into our poorly performing fraud detection model. 

Arize will guide you through setting up a **Baseline** (reference environment for comparison) and automatically create **Monitors** for your model in just a few clicks. Just follow the blue banner at the top of the page titled "Finish setting up your model".

![image.png](https://storage.googleapis.com/arize-assets/fixtures/Click-Through%20Rate%20Use-Case/images/initial_setup_banner.png)

Arize can automatically configure monitors that are best suited to your data. From the banner at the top of the screen, select the following configurations after clicking the 'Set up Model' button: 

1. Datasets: `Training Version 1.0`
2. Default Metric: `False Negative Rate`, Trigger Alert When: `False Negative Rate is above .2`, Positive Class: `FRAUD`
3. Turn On Monitoring: Drift ✅, Data Quality ✅, Performance ✅ 

You will now see that the baseline has been set and **Drift**, **Data Quality**, and **Performance** monitors have been created!!! 

![image](https://storage.googleapis.com/arize-assets/tutorials/use-cases/setup.gif)

To prevent expensive chargebacks from getting by our model undetected, Arize automatically sets up monitors that flag the model when a Performance, Data Quality, or Drift metric crosses a certain threshold, before it becomes a major issue. On the **Monitors** tab you can filter monitors by category, edit evaluation windows and thresholds, and create custom monitors.
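
To make the idea of a performance monitor concrete, here is a minimal sketch of the check such a monitor performs: compute the false negative rate per day and flag days above the `0.2` threshold chosen during setup. This is not how Arize implements monitors; the function name, the `NOT_FRAUD` label, and the rows are made up for illustration.

```python
from collections import defaultdict

FNR_THRESHOLD = 0.2  # same threshold as the performance monitor configured above

def daily_false_negative_rate(records, positive="FRAUD"):
    """Return {day: FNR}, skipping days with no actual positives."""
    missed = defaultdict(int)  # actual fraud predicted as not fraud
    total = defaultdict(int)   # all actual fraud
    for day, predicted, actual in records:
        if actual == positive:
            total[day] += 1
            if predicted != positive:
                missed[day] += 1
    return {day: missed[day] / total[day] for day in sorted(total)}

# Made-up (day, PREDICTION, ACTUAL) rows for illustration only
records = [
    (1, "FRAUD", "FRAUD"), (1, "FRAUD", "FRAUD"), (1, "NOT_FRAUD", "NOT_FRAUD"),
    (2, "NOT_FRAUD", "FRAUD"), (2, "FRAUD", "FRAUD"), (2, "NOT_FRAUD", "FRAUD"),
]

fnr_by_day = daily_false_negative_rate(records)
alert_days = [day for day, fnr in fnr_by_day.items() if fnr > FNR_THRESHOLD]
print(fnr_by_day, alert_days)
```

In this toy data only day 2 crosses the threshold, since two of its three fraudulent transactions were missed.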


# Step 4. Exploring Drift 

Selecting the **Drift** tab we notice that despite the increase in credit card chargebacks, we do not see any noticeable model prediction drift in production compared to our training dataset (baseline).  
It's important to note that while we do not see any prediction drift, we do notice an increase in false negative rate around days 5 and 13. This leads us to believe that our model is missing fraud that it has not seen in training.

![image](https://storage.googleapis.com/arize-assets/tutorials/use-cases/fnr.png)



Let's take a look at our actual ground truth data to see if it varies from what we expect to see based on our training set... Navigate to the model **Overview** page and under **Actuals** click on **actual class**. Immediately we see that the actual data we are able to collect is drifting significantly in waves (fraud pattern?) from our training set. By taking a look at the **Distribution Comparison** beneath the **Drift over Time** widget we notice that in our training dataset (Baseline Distribution) we expect to see 1% `FRAUD` whereas in our production dataset (Current Distribution) we see highs of almost 10% `FRAUD`! 

![image](https://storage.googleapis.com/arize-assets/tutorials/use-cases/actualdrift.png)

This confirms our hypothesis that fraud is getting through our model undetected! 
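
To put a number on the shift in the actual class distribution, one common choice is the population stability index (PSI). Whether PSI is the metric behind the drift chart shown here is an assumption; the sketch below only illustrates the calculation, using the approximate proportions quoted in this step and an assumed `NOT_FRAUD` label for the negative class.

```python
import math

def population_stability_index(baseline, current):
    """PSI = sum over categories of (current - baseline) * ln(current / baseline).

    Both inputs map category -> proportion; every proportion must be non-zero.
    """
    return sum(
        (current[category] - baseline[category])
        * math.log(current[category] / baseline[category])
        for category in baseline
    )

# Roughly 1% fraud in training versus peaks near 10% in production
baseline = {"FRAUD": 0.01, "NOT_FRAUD": 0.99}
current = {"FRAUD": 0.10, "NOT_FRAUD": 0.90}

psi = population_stability_index(baseline, current)
print(f"PSI = {psi:.3f}")
```

Each term is non-negative, so the score grows as the two distributions move apart, and identical distributions score exactly zero.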


# Step 5. Setting up a proactive custom monitor 


![image](https://storage.googleapis.com/arize-assets/tutorials/use-cases/fnrmonitor.png)

# Step 6. Analyzing Root Cause 

Arize helps pinpoint which features, and more specifically which slices (feature/value combinations), could be the culprit of our model's performance degradation. Arize provides a number of flexible dashboard templates for users to get started in creating custom layouts for surfacing model insights. Let's see how in just a few clicks we can quickly identify which features could be causing our model to miss these fraudulent transactions. Navigate to the **Templates** section on the left sidebar and scroll down to click on the **Feature Performance Heatmap**. From there select your model, the features you want to investigate, the metric `False Negative Rate`, and the positive class `FRAUD`.

![image](https://storage.googleapis.com/arize-assets/tutorials/use-cases/perfheatmap.gif)

Arize automatically discovers and surfaces model performance issues across all features and various feature/value combinations. Visual indicators facilitate drill down analysis of the most problematic slices affecting your overall model performance. 

Some of the immediate insights surfaced from our **Feature Performance Heatmap** are detailed below. 
1. Worst performing `ENTRY_MODE` includes `_NULL`, followed by `ecommerce`, and `credentials_on_file`
2. Worst performing `MERCHANT_TYPE` includes `Online Food Delivery` and `Tax Payments`
3. Worst performing `STATE` includes `FL`, `CA`, and `NY`

![image](https://storage.googleapis.com/arize-assets/tutorials/use-cases/1heatmap.png)
![image](https://storage.googleapis.com/arize-assets/tutorials/use-cases/Screen%20Shot%202021-10-03%20at%206.50.16%20PM.png)
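
The ranking behind a heatmap like this can be approximated offline by grouping rows on a feature value and computing the false negative rate per group. The sketch below is an assumption about the general technique, not Arize's implementation; the rows are made up, `None` stands in for the `_NULL` entry mode, and `chip` and `NOT_FRAUD` are hypothetical values.

```python
from collections import defaultdict

def fnr_by_slice(rows, feature, positive="FRAUD"):
    """Return [(feature value, FNR)] sorted from worst to best."""
    missed = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        if row["ACTUAL"] == positive:
            value = row[feature]
            total[value] += 1
            if row["PREDICTION"] != positive:
                missed[value] += 1
    rates = {value: missed[value] / total[value] for value in total}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)

# Made-up rows for illustration only
rows = [
    {"ENTRY_MODE": None, "PREDICTION": "NOT_FRAUD", "ACTUAL": "FRAUD"},
    {"ENTRY_MODE": None, "PREDICTION": "NOT_FRAUD", "ACTUAL": "FRAUD"},
    {"ENTRY_MODE": "ecommerce", "PREDICTION": "NOT_FRAUD", "ACTUAL": "FRAUD"},
    {"ENTRY_MODE": "ecommerce", "PREDICTION": "FRAUD", "ACTUAL": "FRAUD"},
    {"ENTRY_MODE": "chip", "PREDICTION": "FRAUD", "ACTUAL": "FRAUD"},
]

worst_first = fnr_by_slice(rows, "ENTRY_MODE")
print(worst_first)
```

Running the same function with `MERCHANT_TYPE` or `STATE` as the feature would give the corresponding per-slice rankings.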

Additionally, we notice that the model's false negative rate is worst when the `*_RISK_SCORE` features are between `36` and `70`. Note that the risk features are measurements of risk based on various payment processing merchants, i.e. low risk score = low probability of fraud, high risk score = high probability of fraud. Hence, this intuitively makes sense: our model gets confused by risk scores that fall somewhere in the middle, as shown below.

![image](https://storage.googleapis.com/arize-assets/tutorials/use-cases/riskscore.png )

We can further dive into a performance analysis of our model by filtering on various low performing feature/value slices. This will help determine the data needed to upsample and retrain our model to improve performance on these segments. For example, by adding a filter in our **Feature Performance Heatmap** we can narrow down where this type of undetected fraud is originating from and through what entry modes it is most prominent. 

![image](https://storage.googleapis.com/arize-assets/tutorials/use-cases/feature%20filtesr%20.png)

In the case of `Tax Payments`, we notice that most undetected fraud is originating from a `_NULL` entry mode type and especially prominent in `NY` and `FL`. 

# Step 7. Feature Performance Insights --> Actions 

Using the insights gathered from our **Feature Performance Heatmap**, we can take action to 1) prevent this type of fraud abuse from going undetected in the future and 2) use our findings to improve our model. As noted previously, the slice (feature/value combination) `ENTRY_MODE: _NULL` was one of the worst performing segments for our model. Navigate back to the model **Overview** page and click into the `ENTRY_MODE` feature under the **Features** section. 

At first glance, we see that `ENTRY_MODE` is starting to drift around day 15. Furthermore, we notice that our training dataset (Baseline distribution) did not have any `_NULL` values. Clicking on the **Data Quality** tab, we can confirm the cardinality of the feature increased on the 14th day and fell back down 15 days later.

![image](https://storage.googleapis.com/arize-assets/tutorials/use-cases/entrymode.png)

![image](https://storage.googleapis.com/arize-assets/tutorials/use-cases/dquality.png)
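
A cardinality jump like this is straightforward to reproduce offline. The sketch below counts distinct `ENTRY_MODE` values per day and lists values never seen in training. It is a simplified illustration with made-up rows; `chip` is a hypothetical category and `None` stands in for `_NULL`.

```python
from collections import defaultdict

def daily_cardinality(rows, feature):
    """Return {day: number of distinct values}, counting None as its own value."""
    seen = defaultdict(set)
    for day, record in rows:
        seen[day].add(record.get(feature))
    return {day: len(values) for day, values in sorted(seen.items())}

# Made-up rows: a null ENTRY_MODE first appears on day 14
rows = [
    (13, {"ENTRY_MODE": "chip"}), (13, {"ENTRY_MODE": "ecommerce"}),
    (14, {"ENTRY_MODE": "chip"}), (14, {"ENTRY_MODE": "ecommerce"}),
    (14, {"ENTRY_MODE": None}),
]
training_values = {"chip", "ecommerce"}  # hypothetical baseline categories

cardinality = daily_cardinality(rows, "ENTRY_MODE")
# Values present in production that the training baseline never contained
unseen = {record["ENTRY_MODE"] for _, record in rows} - training_values
print(cardinality, unseen)
```

A value that appears in production but not in the baseline is exactly the situation the Data Quality monitor is meant to surface.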

Similarly, by navigating to `MERCHANT_TYPE` we notice significant drift across training and production distributions across `Online Food Delivery` and `Tax Payments`. 

![image](https://storage.googleapis.com/arize-assets/tutorials/use-cases/merch.png)

This troubleshooting flow has led us to understand that retraining the model on these slices of feature/value combinations (namely `ENTRY_MODE`: `_NULL` and `MERCHANT_TYPE`: `Online Food Delivery` and `Tax Payments`) could significantly improve our model's False Negative Rate performance. Additionally, we can take offline action to terminate the POS terminals in `FL`, `CA`, and `NY` where this spike of fraud is originating.
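
One naive way to act on this when preparing retraining data is to repeat the rows that fall into the weak slices. The sketch below shows that idea only; the helper names, the replication factor, and the `Grocery` and `chip` values are assumptions, and a real pipeline would more likely collect additional labeled examples or use a resampling library.

```python
def upsample_slice(rows, matches, factor):
    """Repeat every row for which matches(row) is true so it appears `factor` times."""
    upsampled = []
    for row in rows:
        copies = factor if matches(row) else 1
        upsampled.extend(dict(row) for _ in range(copies))
    return upsampled

# Made-up rows; None stands in for the `_NULL` entry mode
rows = [
    {"ENTRY_MODE": None, "MERCHANT_TYPE": "Tax Payments"},
    {"ENTRY_MODE": "chip", "MERCHANT_TYPE": "Grocery"},
    {"ENTRY_MODE": "chip", "MERCHANT_TYPE": "Online Food Delivery"},
]
weak_merchants = {"Online Food Delivery", "Tax Payments"}

def in_weak_slice(row):
    """True for rows in the slices the heatmap flagged as underperforming."""
    return row["ENTRY_MODE"] is None or row["MERCHANT_TYPE"] in weak_merchants

training_rows = upsample_slice(rows, in_weak_slice, factor=3)
print(len(training_rows))
```

Plain replication like this can encourage overfitting on the repeated rows, so it is best treated as a starting point rather than a recommendation.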

# Step 8. Model Performance Overview

As we continue to check in and improve our model's performance, we want to be able to quickly and efficiently view all our important model metrics in a single pane. In the same way we set up a **Feature Performance Heatmap** we will create a **Model Performance Dashboard** to view our model's most important metrics in a single configurable layout. 

Navigate to the **Templates** section on the left sidebar and scroll down to click on the **Scored Model**. From there select your model, features you care to investigate, and positive class `FRAUD`. 

![image](https://storage.googleapis.com/arize-assets/tutorials/use-cases/scoreperfoverview.gif)

In addition to the default widgets Arize sets up for your dashboard, you can configure custom metrics your team cares about. In only a few clicks I added a few widgets to give me a single-glance view of my model's **Accuracy**, **False Positive Rate**, and **False Negative Rate** as standalone statistics widgets. To visualize these metrics over time, I also created a timeseries widget and overlaid three plots to showcase the fluctuation of my metrics over time.

![image](https://storage.googleapis.com/arize-assets/tutorials/use-cases/customchat.png)


# Step 9. Business Metrics 
Sometimes we need metrics other than traditional statistical measures to define model performance. Business Impact is a way to measure your score model's payoff at different thresholds (i.e., the decision boundary for a score model). The Arize platform allows you to enter custom formulas that map model outcomes to your own definition of performance. Navigate to the **Business Impact** tab to set up a custom formula that calculates the impact of our model's performance on your company's overall profit/loss. For example, the diagram below estimates the profit/loss of a decision made by our model.
![image](https://storage.googleapis.com/arize-assets/tutorials/use-cases/bizimpact1.png)

Capturing this in our **Business Impact** tab visualizes the profit/loss based on our model's prediction threshold for fraud classification. 

![image](https://storage.googleapis.com/arize-assets/tutorials/use-cases/bizimpactform.png)
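
The idea of a payoff that changes with the classification threshold can be sketched in a few lines. The per-decision dollar values below are hypothetical and are not the formula shown in the screenshots; the scores and outcomes are made up as well.

```python
def payoff_at_threshold(scores, actuals, threshold, values):
    """Sum per-decision values when score >= threshold is flagged as fraud."""
    total = 0.0
    for score, is_fraud in zip(scores, actuals):
        flagged = score >= threshold
        if flagged and is_fraud:
            total += values["TP"]
        elif flagged and not is_fraud:
            total += values["FP"]
        elif is_fraud:
            total += values["FN"]
        else:
            total += values["TN"]
    return total

# Hypothetical payoff per decision, in dollars
values = {"TP": 0.0, "FP": -10.0, "FN": -100.0, "TN": 1.0}
scores = [0.9, 0.6, 0.4, 0.2, 0.1]            # made-up PREDICTION_SCORE values
actuals = [True, False, True, False, False]   # True means the transaction was fraud

for threshold in (0.3, 0.5, 0.7):
    print(threshold, payoff_at_threshold(scores, actuals, threshold, values))
```

With a large penalty on missed fraud, lower thresholds tend to pay off better in this toy example, even though they flag more legitimate transactions.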

# 📚 Conclusion 
In this walkthrough we've shown how Arize can be used to log prediction data for a model, pinpoint model performance degradations, set up monitors to proactively catch future issues, create dashboards for at a glance model understanding, and calculate business impact metrics through custom formulas. 

We completed the following tasks: 
1. Uploaded data from a credit card transaction fraud model 
2. Set up a model baseline (Training V1.0) and managed Performance, Data Quality, and Drift monitors
3. Created a Feature Performance Heatmap to surface low performing cohorts impacting our model which helped narrow down our undetected fraudulent transactions to certain `ENTRY_MODE`s and `MERCHANT_TYPE`s coming from a few specific `STATE`s.
4. Identified correlations between our model's degrading performance with individual feature drift and distribution variance
5. Created a model Performance Dashboard to visualize important metrics at a glance with custom timeseries metrics for ongoing analysis 
6. Captured our potential profit/loss based on our model's classification threshold using the Business Impact tab

We identified the following areas of concern: 
1. `_NULL` values for the feature `ENTRY_MODE`. This leads us to believe that we need to upsample this type of feature/value combination or fix an upstream datasource to omit this undetermined value. 
2. Suspicious activity coming from `MERCHANT_TYPE`: `Online Food Delivery` and `Tax Payments`. Specifically we noticed drift of these two categories fluctuating on certain dates (weekends?) and coming from a small subset of `STATES`, namely `FL`, `CA`, and `NY`. We must look into the POS systems across these states to understand what type of organized fraud is happening from these regions while retraining our model to detect anomalous food delivery and erroneous tax payment transactions. 
3. We also noticed that our model performs poorly on transactions where the `*_RISK_SCORE` feature values fall between 35-70. Perhaps a new model strategy is needed to form stronger inferences based on these closely related median risk score values.

Though we covered a lot of ground, this is just scratching the surface of what the Arize platform can do. We urge you to explore more of Arize, either on your own or through one of our many other tutorials.

# About Arize
Arize is an end-to-end ML observability and model monitoring platform. The platform is designed to help ML engineers and data science practitioners surface and fix issues with ML models in production faster with:
- Automated ML monitoring and model monitoring
- Workflows to troubleshoot model performance
- Real-time visualizations for model performance monitoring, data quality monitoring, and drift monitoring
- Model prediction cohort analysis
- Pre-deployment model validation
- Integrated model explainability

### Website
Visit Us At: https://arize.com/model-monitoring/

### Additional Resources
- [What is ML observability?](https://arize.com/what-is-ml-observability/)
- [Playbook to model monitoring in production](https://arize.com/the-playbook-to-monitor-your-models-performance-in-production/)
- [Using statistical distance metrics for ML monitoring and observability](https://arize.com/using-statistical-distance-metrics-for-machine-learning-observability/)
- [ML infrastructure tools for data preparation](https://arize.com/ml-infrastructure-tools-for-data-preparation/)
- [ML infrastructure tools for model building](https://arize.com/ml-infrastructure-tools-for-model-building/)
- [ML infrastructure tools for production](https://arize.com/ml-infrastructure-tools-for-production-part-1/)
- [ML infrastructure tools for model deployment and model serving](https://arize.com/ml-infrastructure-tools-for-production-part-2-model-deployment-and-serving/)
- [ML infrastructure tools for ML monitoring and observability](https://arize.com/ml-infrastructure-tools-ml-observability/)

Visit the [Arize Blog](https://arize.com/blog) and [Resource Center](https://arize.com/resource-hub/) for more resources on ML observability and model monitoring.
