<center><img src="https://storage.googleapis.com/arize-assets/arize-logo-white.jpg" width="200"/></center>

# Fraud Detection in the Financial Services Industry
Walk through how to use Arize for a fraud detection model using an example dataset.

## 📨 Upload Data to Arize
Upload example data to Arize. This example uses the [Python Pandas method](https://docs.arize.com/arize/sending-data-methods/log-directly-via-sdk-api).

In [None]:
# Install and import dependencies

!pip install -q arize
from arize.pandas.logger import Client, Schema
from arize.utils.types import ModelTypes, Environments, Metrics

import pandas as pd
import numpy as np
import uuid

import datetime
from datetime import timedelta

### 🌐 Upload Data to Arize: Download Data
Here are sample CSV files that represent the <strong>training</strong> and <strong>production</strong> data of a model designed to estimate the probability that a credit card transaction is fraudulent, based on features such as `merchant_type`, `mean_amount`, `transaction_amount`, `entry_mode`, etc.


In [None]:
production = pd.read_csv(
    "https://storage.googleapis.com/arize-assets/fixtures/Tags-Demo-Data/credit_card_fraud_production.csv",
)
train = pd.read_csv(
    "https://storage.googleapis.com/arize-assets/fixtures/Tags-Demo-Data/credit_card_fraud_training.csv",
)

### 🤝 Upload Data to Arize: Create Arize Client
Sign up for or log in to your Arize account <a href="https://app.arize.com/auth/login">here</a>. Find your <a href="https://docs.arize.com/arize/api-reference/arize.pandas/client">Space ID and API key</a>, then copy and paste them into the cell below.

In [None]:
SPACE_ID = "SPACE_ID"  # update value here with your Space ID
API_KEY = "API_KEY"  # update value here with your API key

arize_client = Client(space_id=SPACE_ID, api_key=API_KEY)

In [None]:
if SPACE_ID == "SPACE_ID" or API_KEY == "API_KEY":
    raise ValueError("❌ CHANGE SPACE_ID AND/OR API_KEY")
else:
    print(
        "✅ Import and Setup Arize Client Done! Now we can start using Arize!"
    )

### 📋 Upload Data to Arize: Define Schema
Create your <a href="https://docs.arize.com/arize/sending-data-to-arize/model-schema-reference">model schema</a>. First, we'll define the features, SHAP values, and tags.

In [None]:
feature_column_names = [
    "MERCHANT_TYPE",
    "MERCHANT_ID",
    "ENTRY_MODE",
    "STATE",
    "MEAN_AMOUNT",
    "STD_AMOUNT",
    "TX_AMOUNT",
    "VISA_RISK_SCORE",
    "MASTERCARD_RISK_SCORE",
    "AMEX_RISK_SCORE",
]
shap_column_names = [f"{x}_shap" for x in feature_column_names]
tag_column_names = ['Dependents', 'Partner', 'EmploymentStatus', 'LocationCode', 'Education', 'Gender', 'Age']
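Before building a schema, it can help to confirm that every column name listed above actually exists in the DataFrame being logged. The helper below is a minimal sketch; `missing_columns` and the toy DataFrame are hypothetical names and values introduced only for illustration.

```python
import pandas as pd


def missing_columns(df: pd.DataFrame, required: list) -> list:
    """Return the names in `required` that are absent from `df`, in order."""
    return [name for name in required if name not in df.columns]


# Toy frame standing in for `train` / `production` (illustrative values only)
toy = pd.DataFrame({"TX_AMOUNT": [12.5, 80.0], "STATE": ["CA", "NY"]})
print(missing_columns(toy, ["TX_AMOUNT", "STATE", "ENTRY_MODE"]))  # ['ENTRY_MODE']
```

In this notebook, the same check could be run as `missing_columns(train, feature_column_names + tag_column_names)`; an empty list means all schema columns are present.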

### 🪵 Upload Data to Arize: Log Training Data
Define the training schema and log the training data to Arize.



In [None]:
# Define a Schema() object for Arize to pick up data from the correct columns for logging
training_schema = Schema(
    prediction_id_column_name="prediction_id",
    prediction_label_column_name="PREDICTION",
    prediction_score_column_name="PREDICTION_SCORE",
    actual_label_column_name="ACTUAL",
    actual_score_column_name="ACTUAL_SCORE",
    feature_column_names=feature_column_names,
    tag_column_names=tag_column_names,
)
# Logging Training DataFrame
training_response = arize_client.log(
    dataframe=train,
    model_id="fraud-detection-financial-services",
    model_version="1.0",
    model_type=ModelTypes.BINARY_CLASSIFICATION,
    environment=Environments.TRAINING,
    schema=training_schema,
)

### 🪵 Upload Data to Arize: Log Production Data
Define the production schema and log the production data to Arize.


In [None]:
# Shift dates to the last month for ease of visualization / to mimic a recent production dataset
END_DATE = datetime.date.today().strftime("%Y-%m-%d")
START_DATE = (datetime.date.today() - timedelta(31)).strftime("%Y-%m-%d")


def setPredictionIDandTime(df, start, end):
    out_df = pd.DataFrame()
    dts = pd.date_range(start, end).to_pydatetime().tolist()
    for dt in dts:
        day_df = df.loc[df["day"] == dt.day].copy()
        # Epoch seconds; dt.timestamp() is portable, unlike the platform-specific "%s" directive
        day_df["prediction_ts"] = int(dt.timestamp())
        out_df = pd.concat([out_df, day_df], ignore_index=True)
    out_df["prediction_id"] = [str(uuid.uuid4()) for _ in range(out_df.shape[0])]
    return out_df.drop(columns="day")


production = setPredictionIDandTime(production, START_DATE, END_DATE)

production_schema = Schema(
    prediction_id_column_name="prediction_id",
    timestamp_column_name="prediction_ts",
    prediction_label_column_name="PREDICTION",
    prediction_score_column_name="PREDICTION_SCORE",
    actual_label_column_name="ACTUAL",
    actual_score_column_name="ACTUAL_SCORE",
    feature_column_names=feature_column_names,
    shap_values_column_names=dict(zip(feature_column_names, shap_column_names)),
    tag_column_names=tag_column_names,
)

production_response = arize_client.log(
    dataframe=production,
    model_id="fraud-detection-financial-services",
    model_version="1.0",
    model_type=ModelTypes.BINARY_CLASSIFICATION,
    environment=Environments.PRODUCTION,
    schema=production_schema,
)
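
The cells above store the return values of `arize_client.log` but never inspect them. The sketch below shows one way to summarize such a result. It assumes the returned object behaves like an HTTP response with `status_code` and `text` attributes; that is an assumption about the SDK, not something this notebook states. `describe_log_response` is a hypothetical helper, and the stand-in objects exist only so the example runs without network access.

```python
from types import SimpleNamespace


def describe_log_response(response, name: str) -> str:
    """Summarize a response-like object exposing `status_code` and `text`."""
    if response.status_code == 200:
        return f"✅ Successfully logged {name} data to Arize"
    return (
        f"❌ Logging {name} data failed with status "
        f"{response.status_code}: {response.text}"
    )


# Stand-in objects (assumed shape) so the sketch runs offline
ok = SimpleNamespace(status_code=200, text="")
bad = SimpleNamespace(status_code=401, text="unauthorized")
print(describe_log_response(ok, "training"))
print(describe_log_response(bad, "production"))
```

Under that assumption, `describe_log_response(training_response, "training")` and `describe_log_response(production_response, "production")` would report on the two uploads above.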

## 🏃 Follow 'Success!' Link To Arize
Once you've successfully logged your model to Arize, follow the link to set up monitors, uncover problem areas, and more!

<strong>Note</strong>: It might take a few minutes for all the data to index in Arize. If you don't see all 5000 rows immediately, sit back and relax; the data is on its way!

### 🔍 In Arize: Model Setup
Now that we can see our model data in Arize, let's set up the model with some basic configurations.
* Navigate to the 'Config' tab. Select 'Fraud' as the positive class and set 'Accuracy' as the default metric.
* Click 'Configure Baseline' and select 'Pre-Production'

<image src="https://storage.googleapis.com/arize-assets/fixtures/Industry_Use_Case/financial_services_fraud_setup.png" width=800px>

### 🔍 In Arize: Monitor Setup

Let's set up a monitor to get alerted when our model deviates from expected behavior.
* Navigate to the 'Monitors' tab and click 'Enable' on the 'Accuracy' card. Because missed fraud (a false negative) has severe consequences, also enable the 'False Negative Rate' card.

Scroll through the list of other metrics and monitor types, and enable a few that seem interesting!

<image src="https://storage.googleapis.com/arize-assets/fixtures/Industry_Use_Case/financial_services_fraud_monitor.png" width=800px>
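
To make the two monitored metrics concrete, here is a minimal sketch of how accuracy and false negative rate can be computed from actual and predicted labels. The label values `"fraud"` and `"not_fraud"` and the helper name `accuracy_and_fnr` are illustrative assumptions; Arize computes these metrics for you once the monitors are enabled.

```python
import pandas as pd


def accuracy_and_fnr(actual: pd.Series, predicted: pd.Series, positive: str):
    """Return (accuracy, false negative rate) for a binary classifier."""
    accuracy = float((actual == predicted).mean())
    is_positive = actual == positive
    n_positive = int(is_positive.sum())
    if n_positive == 0:
        # FNR is undefined when there are no actual positives
        return accuracy, float("nan")
    false_negatives = int((is_positive & (predicted != positive)).sum())
    return accuracy, false_negatives / n_positive


# Illustrative labels only, not taken from the tutorial dataset
actual = pd.Series(["fraud", "fraud", "not_fraud", "not_fraud", "fraud"])
predicted = pd.Series(["fraud", "not_fraud", "not_fraud", "fraud", "not_fraud"])
acc, fnr = accuracy_and_fnr(actual, predicted, positive="fraud")
print(f"accuracy={acc:.2f}, false negative rate={fnr:.2f}")
```

Here two of the three fraudulent transactions are missed, so the false negative rate is high even though the overall accuracy looks moderate. That gap is why the two metrics are monitored separately.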

### 📈 In Arize: Drift Tracing
Now, let's take a look at the 'Drift Tracing' tab to identify areas to improve and better understand the impact of each feature on our model performance. Since fraud models typically receive delayed ground truth data, looking at model and feature drift helps identify model issues before you receive performance information.

Drift compares our current (production) data against a baseline, in this case our training data.

* Navigate to the 'Drift Tracing' tab



<image src="https://storage.googleapis.com/arize-assets/fixtures/Industry_Use_Case/financial_services_fraud_drift.png" width=800px>
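
As a rough illustration of what a drift score measures, the sketch below computes the population stability index (PSI) between a baseline sample and a current sample. PSI is one common drift measure; this tutorial does not state which metric the chart uses, so treat the choice as an assumption. The function name, binning scheme, and synthetic data are all illustrative.

```python
import numpy as np


def population_stability_index(baseline, current, bins=10, eps=1e-6):
    """PSI between two 1-D numeric samples, using bins that span both samples."""
    baseline = np.asarray(baseline, dtype=float)
    current = np.asarray(current, dtype=float)
    # Shared equal-width bin edges covering the pooled range of both samples
    edges = np.histogram_bin_edges(np.concatenate([baseline, current]), bins=bins)
    base_frac = np.histogram(baseline, bins=edges)[0] / baseline.size
    curr_frac = np.histogram(current, bins=edges)[0] / current.size
    # Clip to avoid log(0) and division by zero for empty bins
    base_frac = np.clip(base_frac, eps, None)
    curr_frac = np.clip(curr_frac, eps, None)
    return float(np.sum((curr_frac - base_frac) * np.log(curr_frac / base_frac)))


# Synthetic transaction amounts (not the tutorial data)
rng = np.random.default_rng(0)
train_amounts = rng.normal(100, 20, 5000)
prod_amounts = rng.normal(130, 20, 5000)
print(population_stability_index(train_amounts, train_amounts))  # identical samples
print(population_stability_index(train_amounts, prod_amounts))   # shifted sample
```

Identical samples give a PSI of zero, and the score grows as the production distribution moves away from the baseline.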

### 📈 In Arize: Drift Tracing

Notice the spike in drift. Click on a data point on the graph to uncover more information about drift during that time.

* Scroll down to the 'Drift Breakdown' card. It looks like 'TX_AMOUNT' contributes significantly to the spike in drift
* Click on 'TX_AMOUNT' to uncover more about the feature drift

Within the feature 'TX_AMOUNT', the specific slice '140-222' has far more data in production than in training. This can indicate evolving trends and point to areas to focus on for feature engineering.

<image src="https://storage.googleapis.com/arize-assets/fixtures/Industry_Use_Case/financial_services_fraud_inspect.png" width=800px>

### 📈 In Arize: Performance Tracing

Since our model has actuals (delayed ground truth) data, take a look at the 'Performance Tracing' tab.

Arize will surface the features that negatively impact your prediction performance the most. Visualize how each component within a given feature impacts your model.

* Navigate to the 'Performance Tracing' tab
* Scroll down to the 'Performance Insights' card and click on the worst-performing slices
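
The idea behind slice-level performance can be sketched locally with pandas: group predictions by the values of one feature and compute a metric per group. The column names `ACTUAL`, `PREDICTION`, and `ENTRY_MODE` come from the schema above; the helper `accuracy_by_slice` and the toy rows (including the entry-mode values) are hypothetical, and this is not a description of how Arize ranks slices internally.

```python
import pandas as pd


def accuracy_by_slice(df, feature, actual_col="ACTUAL", predicted_col="PREDICTION"):
    """Accuracy and row count per value of `feature`, worst-performing first."""
    correct = df[actual_col] == df[predicted_col]
    grouped = correct.groupby(df[feature]).agg(["mean", "count"])
    grouped.columns = ["accuracy", "rows"]
    return grouped.sort_values("accuracy")


# Toy rows for illustration only
toy = pd.DataFrame(
    {
        "ENTRY_MODE": ["chip", "chip", "swipe", "swipe", "swipe", "online"],
        "ACTUAL": ["fraud", "not_fraud", "fraud", "fraud", "not_fraud", "fraud"],
        "PREDICTION": ["fraud", "not_fraud", "not_fraud", "fraud", "fraud", "not_fraud"],
    }
)
slices = accuracy_by_slice(toy, "ENTRY_MODE")
print(slices)
```

Sorting by accuracy puts the weakest slices first, which mirrors the "worst-performing slices" view. The row count helps separate slices that are weak from slices that are merely small.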

## 🚀 Continue Exploring Arize
This tutorial just scratches the surface of what Arize can do. Continue to explore the world of ML Observability with Arize to monitor, troubleshoot, and fine-tune your models!

<strong>Recommended Resources:</strong>
* [Arize Community Slack](https://join.slack.com/t/arize-ai/shared_invite/zt-1is2wp3xv-SQgwwszCEeS06Sm1q4xFFw)
* [Arize Documentation](https://docs.arize.com/arize/)
* [ML Observability Course](https://courses.arize.com/)