<center><img src="https://storage.googleapis.com/arize-assets/arize-logo-white.jpg" width="200"/></center>

# Fraud Detection in the eCommerce Industry
This tutorial walks through how to use Arize to monitor a fraud detection model, using an example dataset.

## 📨 Upload Data to Arize
Upload example data to Arize. This example uses the [Python Pandas method](https://docs.arize.com/arize/sending-data-methods/log-directly-via-sdk-api).

In [None]:
# Install and import dependencies

!pip install -q arize
from arize.pandas.logger import Client, Schema
from arize.utils.types import ModelTypes, Environments

import pandas as pd
import numpy as np
import datetime

### 🌐 Upload Data to Arize: Download Data
Here are sample parquet files that represent the <strong>training</strong> and <strong>production</strong> data of a model designed to evaluate the probability that a merchant chargeback is fraudulent, based on features such as transaction amount, transaction frequency, shipping address discrepancies, purchase history, CVV verification, and appropriate time frames.


In [None]:
production_df = pd.read_parquet(
    "https://storage.googleapis.com/arize-assets/fixtures/Industry_Use_Case/ecommerce_fraud_detection_production.parquet",
)
training_df = pd.read_parquet(
    "https://storage.googleapis.com/arize-assets/fixtures/Industry_Use_Case/ecommerce_fraud_detection_training.parquet",
)

In [None]:
# create timestamp columns
current_time = datetime.datetime.now().timestamp()

earlier_time = (
    datetime.datetime.now() - datetime.timedelta(days=30)
).timestamp()

optional_prediction_timestamps = np.linspace(
    earlier_time, current_time, num=production_df.shape[0]
)

production_df.insert(
    1, "prediction_ts", optional_prediction_timestamps.astype(int)
)
production_df[["prediction_ts"]].head()
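
To see what the cell above does without downloading the data, here is a minimal sketch of the same idea on a toy DataFrame. The `toy_df` frame and its values are made up for illustration; only the `linspace` approach comes from the tutorial. Converting the integers back to datetimes is a quick way to confirm that the timestamps cover roughly the last 30 days.

```python
import datetime

import numpy as np
import pandas as pd

# Toy stand-in for production_df (hypothetical values, not the tutorial data)
toy_df = pd.DataFrame({"prediction_id": ["a", "b", "c", "d", "e"]})

# Capture "now" once so both ends of the window derive from the same instant
now = datetime.datetime.now()
end_ts = now.timestamp()
start_ts = (now - datetime.timedelta(days=30)).timestamp()

# One evenly spaced Unix timestamp (in seconds) per row
toy_ts = np.linspace(start_ts, end_ts, num=len(toy_df)).astype(int)
toy_df.insert(1, "prediction_ts", toy_ts)

# Convert back to datetimes to check that the range looks right
readable = pd.to_datetime(toy_df["prediction_ts"], unit="s")
print(readable.min(), "->", readable.max())
```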

### 🤝 Upload Data to Arize: Create Arize Client
Sign up for or log in to your Arize account <a href="https://app.arize.com/auth/login">here</a>. Find your <a href="https://docs.arize.com/arize/api-reference/arize.pandas/client">Space ID and API key</a>, then copy and paste them into the cell below.

In [None]:
SPACE_ID = "SPACE_ID"  # update value here with your Space ID
API_KEY = "API_KEY"  # update value here with your API key

arize_client = Client(space_id=SPACE_ID, api_key=API_KEY)

In [None]:
if SPACE_ID == "SPACE_ID" or API_KEY == "API_KEY":
    raise ValueError("❌ CHANGE SPACE_ID AND/OR API_KEY")
else:
    print(
        "✅ Import and Setup Arize Client Done! Now we can start using Arize!"
    )
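
If you would rather not paste keys into the notebook, one option is to read them from environment variables. The sketch below is only an illustration: the variable names `ARIZE_SPACE_ID` and `ARIZE_API_KEY` and the helper `read_credential` are assumptions made for this example, not something this tutorial's client is known to read on its own. It applies the same placeholder check as the cell above.

```python
import os


def read_credential(env_name, placeholder, environ=os.environ):
    """Return the value stored under env_name, rejecting missing or placeholder values."""
    value = environ.get(env_name, "")
    if not value or value == placeholder:
        raise ValueError(f"❌ Set {env_name} before continuing")
    return value


# Demo with a plain dict standing in for os.environ (values are made up)
demo_env = {
    "ARIZE_SPACE_ID": "example-space-id",
    "ARIZE_API_KEY": "example-api-key",
}
space_id = read_credential("ARIZE_SPACE_ID", "SPACE_ID", environ=demo_env)
api_key = read_credential("ARIZE_API_KEY", "API_KEY", environ=demo_env)
print("✅ Credentials loaded")
```

In a real session you would drop the `environ=demo_env` argument so the values come from your shell, then pass them to `Client(space_id=..., api_key=...)` as in the cell above.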

### 🪵 Upload Data to Arize: Create Schema & Log Training Data
Create your <a href="https://docs.arize.com/arize/sending-data-to-arize/model-schema-reference">model schema</a>. Define the training schema and log the training data to Arize.



In [None]:
# Define a Schema() object for Arize to pick up data from the correct columns for logging
training_schema = Schema(
    prediction_label_column_name="predicted_fraud",
    actual_label_column_name="actual_fraud",
    feature_column_names=[
        "transaction_amount",
        "transaction_frequency",
        "customer_behavior",
        "shipping_address_discrepancies",
        "velocity_checks",
        "purchase_history",
        "timeframe_analysis",
        "merchant_specific_data",
    ],
)

# Logging Training DataFrame
training_response = arize_client.log(
    dataframe=training_df,
    model_id="fraud-detection-ecommerce",
    model_version="1.0",
    model_type=ModelTypes.BINARY_CLASSIFICATION,
    environment=Environments.TRAINING,
    schema=training_schema,
)
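
Before moving on, it helps to confirm that the upload actually succeeded. The helper below is a small sketch that assumes the object returned by `arize_client.log` exposes `status_code` and `text` attributes, the way a `requests.Response` does. The name `describe_log_response` is hypothetical, and the fake responses exist only so the example runs without an Arize account.

```python
from types import SimpleNamespace


def describe_log_response(response, dataset_name):
    """Summarize a logging response; assumes status_code and text attributes."""
    if response.status_code == 200:
        return f"✅ Successfully logged {dataset_name} data to Arize"
    return (
        f"❌ Logging {dataset_name} data failed with status "
        f"{response.status_code}: {response.text}"
    )


# Fake responses for illustration only
fake_ok = SimpleNamespace(status_code=200, text="")
fake_error = SimpleNamespace(status_code=401, text="unauthorized")

print(describe_log_response(fake_ok, "training"))
print(describe_log_response(fake_error, "training"))
```

With the real client, you would call `print(describe_log_response(training_response, "training"))`.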

### 🪵 Upload Data to Arize: Create Schema & Log Production Data
Create your <a href="https://docs.arize.com/arize/sending-data-to-arize/model-schema-reference">model schema</a>. Define the production schema and log the production data to Arize.


In [None]:
# Define a Schema() object for Arize to pick up data from the correct columns for logging
production_schema = Schema(
    prediction_id_column_name="prediction_id",
    timestamp_column_name="prediction_ts",
    prediction_label_column_name="predicted_fraud",
    feature_column_names=[
        "transaction_amount",
        "transaction_frequency",
        "customer_behavior",
        "shipping_address_discrepancies",
        "velocity_checks",
        "purchase_history",
        "timeframe_analysis",
        "cvv_verification",
        "merchant_specific_data",
    ],
)

# Logging Production DataFrame
production_response = arize_client.log(
    dataframe=production_df,
    model_id="fraud-detection-ecommerce",
    model_version="1.0",
    model_type=ModelTypes.BINARY_CLASSIFICATION,
    environment=Environments.PRODUCTION,
    schema=production_schema,
)

## 🏃 Follow 'Success!' Link To Arize
Once you've successfully logged your model to Arize, follow the link to set up monitors, uncover problem areas, and more!

<strong>Note</strong>: It might take a few minutes for all the data to index in Arize. If you don't see all 5000 rows immediately, sit back and relax; the data is on its way!

### 🔍 In Arize: Model Setup
Now that we can see our model data in Arize, let's set up our model with some basic configurations.

Because we did not send actuals (ground truth) with our production data, we will not be able to see performance metrics in the platform, and some graphs may show 'No Data'.
* Navigate to the 'Config' tab and select 'Fraud' as the positive class
* On the Config page, click 'Configure Baseline' and pick 'Pre-Production'

<image src="https://storage.googleapis.com/arize-assets/fixtures/Industry_Use_Case/ecommerce_fraud_setup.png">

### 🔍 In Arize: Monitor Setup

Let's set up some monitors to get alerted when our model deviates from expected behavior. Since we don't have actuals, we cannot enable performance monitors. Instead, we'll monitor drift as a proxy for performance.
* Navigate to the 'Monitors' tab and scroll down to 'Drift Monitors'
* Click 'Enable' for both the 'Prediction Drift' and 'Feature Drift' cards.

Drift measures how your production data compares to your baseline dataset, which in our case is the training dataset. These monitors will alert us if our predictions or our features deviate from expected values.

<image src="https://storage.googleapis.com/arize-assets/fixtures/Industry_Use_Case/ecommerce_fraud_monitor.png">
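
To build some intuition for what a drift score captures, here is a minimal sketch of the Population Stability Index (PSI), one common way to compare a production distribution against a baseline. This is an illustration under stated assumptions, not Arize's implementation: the binning, the `eps` smoothing, and the synthetic `transaction_amount`-like values are all choices made for this example, and the monitors in the platform may use a different metric or configuration.

```python
import numpy as np


def population_stability_index(baseline, production, bins=10, eps=1e-6):
    """PSI between two numeric samples, using bins derived from the baseline."""
    baseline = np.asarray(baseline, dtype=float)
    production = np.asarray(production, dtype=float)

    # Bin edges come from the baseline so both samples share the same buckets
    edges = np.histogram_bin_edges(baseline, bins=bins)
    # Clip production values into the baseline range so none fall outside the bins
    production = np.clip(production, edges[0], edges[-1])

    base_counts, _ = np.histogram(baseline, bins=edges)
    prod_counts, _ = np.histogram(production, bins=edges)

    # Convert counts to proportions; eps avoids log(0) for empty buckets
    base_pct = np.clip(base_counts / base_counts.sum(), eps, None)
    prod_pct = np.clip(prod_counts / prod_counts.sum(), eps, None)

    return float(np.sum((prod_pct - base_pct) * np.log(prod_pct / base_pct)))


# Synthetic data standing in for a numeric feature (made-up values)
rng = np.random.default_rng(0)
baseline_sample = rng.normal(100, 20, 5000)
similar_sample = rng.normal(100, 20, 5000)
shifted_sample = rng.normal(140, 20, 5000)

psi_similar = population_stability_index(baseline_sample, similar_sample)
psi_shifted = population_stability_index(baseline_sample, shifted_sample)
print(f"PSI, similar distribution: {psi_similar:.4f}")
print(f"PSI, shifted distribution: {psi_shifted:.4f}")
```

The shifted sample produces a much larger score than the similar one, which is the kind of change a drift monitor is meant to flag.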

### 📈 In Arize: Drift Tracing
Now, let's explore our features and prediction drift to better understand how our production data compares to training.
* Navigate to the 'Drift Tracing' tab
* Click on areas of high drift and scroll down to the 'Distribution Comparison' chart  

<image src="https://storage.googleapis.com/arize-assets/fixtures/Industry_Use_Case/ecommerce_fraud_drift.png">

* Scroll down to the 'Drift Breakdown' card, where the features are sorted from highest to lowest drift. It looks like CVV Verification has drifted significantly.
* Click on the feature to analyze its individual drift. It looks like our training data is missing any CVV information, which is a significant issue!

Use this information to build or rebuild your CVV feature to encompass new fraud techniques and outsmart the fraudsters!

<image src="https://storage.googleapis.com/arize-assets/fixtures/Industry_Use_Case/ecommerce_fraud_feature.png">
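
The gap that Drift Tracing surfaced can also be spotted in code by comparing the feature lists from the two schemas defined earlier in this notebook. The sketch below copies those lists verbatim and uses a set difference to find features present in one environment but not the other. The variable names are chosen for this example.

```python
# Feature lists copied from training_schema and production_schema above
training_features = [
    "transaction_amount",
    "transaction_frequency",
    "customer_behavior",
    "shipping_address_discrepancies",
    "velocity_checks",
    "purchase_history",
    "timeframe_analysis",
    "merchant_specific_data",
]
production_features = [
    "transaction_amount",
    "transaction_frequency",
    "customer_behavior",
    "shipping_address_discrepancies",
    "velocity_checks",
    "purchase_history",
    "timeframe_analysis",
    "cvv_verification",
    "merchant_specific_data",
]

# Features logged in one environment but not the other
missing_from_training = sorted(set(production_features) - set(training_features))
missing_from_production = sorted(set(training_features) - set(production_features))

print("Missing from training:", missing_from_training)
print("Missing from production:", missing_from_production)
```

Running a check like this before logging can catch schema mismatches early, before they show up as drift.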

## 🚀 Continue Exploring Arize
This tutorial just scratches the surface of what Arize can do. Continue to explore the world of ML Observability with Arize to monitor, troubleshoot, and fine-tune your models!

<strong>Recommended Resources:</strong>
* [Arize Community Slack](https://join.slack.com/t/arize-ai/shared_invite/zt-1is2wp3xv-SQgwwszCEeS06Sm1q4xFFw)
* [Arize Documentation](https://docs.arize.com/arize/)
* [ML Observability Course](https://courses.arize.com/)