# Fraud Detection Model Evaluation and Monitoring

This notebook evaluates and monitors the performance of our fraud detection model. We compute classification metrics on held-out test data, plot the ROC curve, rank feature importance, and use SHAP values to explain model behavior.

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import joblib
from sklearn.metrics import confusion_matrix, classification_report, roc_curve, auc
import shap
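import warnings

# Silence library warnings to keep the notebook output readable;
# remove this line when debugging unexpected results
warnings.filterwarnings('ignore')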

## Load Model and Test Data

Let's load our trained model and test data to evaluate performance:

In [None]:
# Load the trained model and preprocessor
model = joblib.load('../models/xgboost_model.joblib')
preprocessor = joblib.load('../models/preprocessor.joblib')

# Load test data
test_data = pd.read_csv('../data/test.csv')

# Preprocess test data
X_test = preprocessor.transform(test_data.drop('fraud', axis=1))
y_test = test_data['fraud']

# Get hard class predictions and the predicted probability of the positive class (column 1)
y_pred = model.predict(X_test)
y_pred_proba = model.predict_proba(X_test)[:, 1]
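The difference between hard predictions and probabilities matters later, so here is a minimal sketch of how one maps to the other. For binary classifiers with a scikit-learn-style API, `predict` typically corresponds to a 0.5 cutoff on the positive-class probability; treat that as an assumption to verify for this model. The probabilities and the helper name `apply_threshold` are made up for illustration.

```python
# Toy positive-class probabilities (made-up values, not output from the notebook's model)
probabilities = [0.05, 0.40, 0.55, 0.91]


def apply_threshold(probs, threshold=0.5):
    """Return 1 (flag as fraud) where the probability reaches the threshold, else 0.

    Using >= at the boundary is an arbitrary choice for this sketch.
    """
    return [1 if p >= threshold else 0 for p in probs]


default_labels = apply_threshold(probabilities)            # cutoff of 0.5
lower_cutoff_labels = apply_threshold(probabilities, 0.3)  # lower cutoff flags more cases

print(default_labels)
print(lower_cutoff_labels)
```

Lowering the cutoff flags more transactions, which usually raises recall at the cost of precision.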

## Model Performance Metrics

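Let's look at precision, recall, and F1-score for each class, along with overall accuracy. Fraud data is usually heavily imbalanced, so accuracy alone can look high even when many fraud cases are missed: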

In [None]:
# Print classification report
print("Classification Report:")
print(classification_report(y_test, y_pred))

# Create confusion matrix
cm = confusion_matrix(y_test, y_pred)
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.title('Confusion Matrix')
plt.ylabel('True Label')
plt.xlabel('Predicted Label')
plt.show()
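To make the numbers in the classification report easier to read, the sketch below recomputes the main metrics by hand from the four cells of a binary confusion matrix. For labels 0 and 1, scikit-learn lays the matrix out as `[[TN, FP], [FN, TP]]`, so `cm.ravel()` yields the counts in that order. The counts used here are hypothetical and `binary_metrics` is an illustrative helper, not part of the notebook.

```python
def binary_metrics(tn, fp, fn, tp):
    """Compute accuracy, precision, recall and F1 for the positive class."""
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or never present
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts: 1,000 transactions, of which 20 are fraud
metrics = binary_metrics(tn=970, fp=10, fn=12, tp=8)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, accuracy is high while recall is low, which is the imbalance effect to watch for when reading the report above.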

## ROC Curve Analysis

The ROC curve shows the trade-off between the true positive rate (sensitivity) and the false positive rate (1 - specificity) as the decision threshold varies:

In [None]:
# Calculate ROC curve and AUC
fpr, tpr, _ = roc_curve(y_test, y_pred_proba)
roc_auc = auc(fpr, tpr)

# Plot ROC curve
plt.figure(figsize=(8, 6))
plt.plot(fpr, tpr, color='darkorange', lw=2, label=f'ROC curve (AUC = {roc_auc:.2f})')
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic (ROC) Curve')
plt.legend(loc="lower right")
plt.show()
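One way to interpret the AUC value in the legend: it equals the probability that a randomly chosen fraud case receives a higher score than a randomly chosen legitimate case, with ties counted as half. The sketch below computes that quantity directly on a toy example. The labels, scores, and the helper `pairwise_auc` are illustrative assumptions, and the quadratic loop is only suitable for small inputs.

```python
def pairwise_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for pos_score in positives:
        for neg_score in negatives:
            if pos_score > neg_score:
                wins += 1.0   # pair ranked correctly
            elif pos_score == neg_score:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Toy data (made up): 1 = fraud, 0 = legitimate
labels = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]
auc_value = pairwise_auc(labels, scores)
print(f"AUC = {auc_value:.3f}")
```

Here 6 of the 9 positive-negative pairs are ordered correctly, so the AUC is 6/9.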

## Feature Importance Analysis

Let's examine which features the model relies on most, based on the importance scores stored in the trained model:

In [None]:
# Get feature importance
feature_importance = pd.DataFrame({
    'feature': preprocessor.get_feature_names_out(),
    'importance': model.feature_importances_
})

# Sort by importance
feature_importance = feature_importance.sort_values('importance', ascending=False)

# Plot feature importance
plt.figure(figsize=(12, 6))
sns.barplot(data=feature_importance.head(10), x='importance', y='feature')
plt.title('Top 10 Most Important Features')
plt.xlabel('Feature Importance')
plt.tight_layout()
plt.show()
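The ranking above assumes the feature names and importance scores line up one to one. The sketch below shows the same sort-and-take-top-k logic in plain Python, with an explicit length check and a cumulative share column. The feature names, scores, and the helper `rank_features` are hypothetical and only illustrate the idea.

```python
def rank_features(names, importances, top_k=3):
    """Return (name, score, cumulative_share) for the top_k features."""
    if len(names) != len(importances):
        raise ValueError("names and importances must have the same length")

    total = sum(importances)
    # Sort feature/score pairs from most to least important
    ranked = sorted(zip(names, importances), key=lambda pair: pair[1], reverse=True)

    rows = []
    running = 0.0
    for name, score in ranked[:top_k]:
        running += score
        share = running / total if total else 0.0
        rows.append((name, score, share))
    return rows


# Hypothetical feature names and raw importance scores
names = ["amount", "hour", "merchant_risk", "account_age"]
importances = [50.0, 10.0, 30.0, 10.0]

rows = rank_features(names, importances, top_k=2)
for name, score, share in rows:
    print(f"{name}: score={score}, cumulative share={share:.0%}")
```

The cumulative share gives a quick sense of how concentrated the model's reliance is in its top few features.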

## SHAP Value Analysis

SHAP (SHapley Additive exPlanations) values help us understand how each feature contributes to individual predictions:

In [None]:
# Calculate SHAP values
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)

# Plot SHAP summary
shap.summary_plot(shap_values, X_test, feature_names=preprocessor.get_feature_names_out())
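To build intuition for what the summary plot shows, the sketch below computes exact Shapley values for a tiny two-feature scoring function by averaging each feature's marginal contribution over every feature ordering. This is not how `TreeExplainer` works internally; it only illustrates the Shapley definition and the additivity property, where the contributions sum to the difference between the instance's output and the baseline output. The function `toy_score`, the feature meanings, and the baseline are all made up. For XGBoost classifiers, SHAP values are typically expressed in log-odds rather than probability, which is worth checking before interpreting magnitudes.

```python
from itertools import permutations


def shapley_values(value_fn, instance, baseline):
    """Exact Shapley values relative to a single baseline input.

    Enumerates all feature orderings, so it is only feasible for a few features.
    """
    n_features = len(instance)
    contributions = [0.0] * n_features
    orderings = list(permutations(range(n_features)))

    for order in orderings:
        current = list(baseline)
        previous_output = value_fn(current)
        for index in order:
            # Switch one feature from its baseline value to the instance value
            current[index] = instance[index]
            new_output = value_fn(current)
            contributions[index] += new_output - previous_output
            previous_output = new_output

    return [total / len(orderings) for total in contributions]


def toy_score(x):
    """Made-up score with an interaction between the two features."""
    amount, is_foreign = x
    return 1 + 2 * amount + 3 * is_foreign + 4 * amount * is_foreign


baseline = [0, 0]   # hypothetical reference transaction
instance = [1, 1]   # hypothetical transaction to explain

values = shapley_values(toy_score, instance, baseline)
print(values)
print(sum(values), toy_score(instance) - toy_score(baseline))
```

The interaction term is split evenly between the two features, and the two printed totals match, which is the additivity property that SHAP values are designed to satisfy.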