# Fraud Detection Model Explainability with SHAP and LIME
In this notebook, I explore model explainability techniques using SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations) to interpret a fraud detection model. Explainability is essential for understanding, trusting, and improving machine learning models, especially in fraud detection, where knowing the "why" behind a model's decisions can help identify patterns and unusual behavior in transactions.

This notebook will cover the following tasks:

* Using SHAP for Explainability:
  * **Summary Plot**: Provides a global view of feature importance, helping identify which factors most influence fraud predictions across the dataset.
  * **Force Plot**: Visualizes the contribution of each feature to a specific prediction, making it easier to understand individual instances.
  * **Dependence Plot**: Shows the relationship between individual features and the model output, highlighting feature interactions.
* Using LIME for Explainability:
  * **Feature Importance Plot**: Demonstrates the most influential features for a single prediction, giving insights into why a specific transaction was flagged as potentially fraudulent.
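
To make the idea behind the SHAP plots more concrete, here is a minimal sketch of exact Shapley values computed by brute force for a single prediction. It is not how the `shap` library works internally (it uses much faster, model-specific algorithms); `shapley_values`, `toy_score`, the three feature names, and the baseline row are all made up for illustration. The point it shows is the additivity property that a force plot visualizes: the per-feature contributions sum to the difference between the prediction and the baseline prediction.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one instance against a fixed baseline.

    Features outside a coalition are replaced by their baseline value.
    Cost grows as 2**n, so this is only practical for a handful of features.
    """
    n = len(instance)
    features = range(n)

    def value(coalition):
        # Coalition features come from the instance, the rest from the baseline
        mixed = [instance[i] if i in coalition else baseline[i] for i in features]
        return predict(mixed)

    phi = []
    for i in features:
        others = [j for j in features if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Hypothetical toy fraud score over three made-up features:
# [purchase_value, time_diff_hours, is_new_device]
def toy_score(x):
    purchase_value, time_diff_hours, is_new_device = x
    score = 0.002 * purchase_value + 0.3 * is_new_device
    if time_diff_hours < 1:  # purchase very soon after signup
        score += 0.4 + 0.2 * is_new_device  # interaction with the device flag
    return score


instance = [100.0, 0.5, 1]   # made-up transaction to explain
baseline = [50.0, 48.0, 0]   # made-up "typical" transaction
phi = shapley_values(toy_score, instance, baseline)

print("contributions:", phi)
print("sum of contributions:", sum(phi))
print("prediction - baseline:", toy_score(instance) - toy_score(baseline))
```

The last two printed values should agree up to floating-point error. A summary plot aggregates contributions like these over many rows, and a dependence plot charts one feature's contribution against its value.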

In [None]:
# Import necessary modules
import os
import sys

import pandas as pd
from sklearn.model_selection import train_test_split

# Add the 'src' directory to the Python path for module imports
sys.path.append(os.path.abspath(os.path.join('..', 'src')))

# Project modules, assumed to be defined in src/
from explainability import FraudDetectionExplainer
from logger import SetupLogger
from data_preprocessing import LoadData

# Configure logging
logger = SetupLogger(log_file='../logs/model_XAI.log').get_logger()


In [None]:
# Initialize the LoadData class
fraud_data_init = LoadData(filepath='../data/processed/processed_fraud_data.csv', logger=logger)
fraud_data = fraud_data_init.load_dataset().set_index('user_id')

In [None]:
# Convert signup_time and purchase_time to datetime
fraud_data['signup_time'] = pd.to_datetime(fraud_data['signup_time'])
fraud_data['purchase_time'] = pd.to_datetime(fraud_data['purchase_time'])

# Feature engineering: Calculate the time difference between signup and purchase
fraud_data['time_diff'] = (fraud_data['purchase_time'] - fraud_data['signup_time']).dt.total_seconds()

# Drop unnecessary columns
fraud_data = fraud_data.drop(columns=['Unnamed: 0', 'signup_time', 'purchase_time', 'device_id', 'ip_address'])


# Define target and features
X_fraud = fraud_data.drop(columns=['class'])
y_fraud = fraud_data['class']

# Split the fraud data into training (80%) and test (20%) sets
X_fraud_train, X_fraud_test, y_fraud_train, y_fraud_test = train_test_split(X_fraud, y_fraud, test_size=0.2, random_state=42)
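
As a quick sanity check on the `time_diff` feature computed above, the same steps can be run on a couple of made-up rows. The timestamps below are invented for illustration and are not taken from the fraud dataset; the sketch only confirms that the result is expressed in seconds.

```python
import pandas as pd

# Made-up rows for illustration only
toy = pd.DataFrame({
    "signup_time": ["2015-01-01 00:00:00", "2015-01-01 12:00:00"],
    "purchase_time": ["2015-01-01 00:00:30", "2015-01-03 12:00:00"],
})

# Same conversion and subtraction as in the cell above
toy["signup_time"] = pd.to_datetime(toy["signup_time"])
toy["purchase_time"] = pd.to_datetime(toy["purchase_time"])
toy["time_diff"] = (toy["purchase_time"] - toy["signup_time"]).dt.total_seconds()

print(toy["time_diff"].tolist())
```

The first row is a purchase 30 seconds after signup and the second is exactly two days later, so the feature spans several orders of magnitude, which is worth keeping in mind when reading dependence plots for it.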


In [None]:
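# Load the saved random forest model and pass in the test set to explain
explainer = FraudDetectionExplainer("../best_models/random_forest_fraud_data_best_model.pkl", X_fraud_test)

# Generate explanations for the first test instance (row 0 of X_fraud_test)
explainer.explain(instance_idx=0)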
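
The internals of `FraudDetectionExplainer` are not shown in this notebook. As a rough illustration of what a LIME-style explanation for a single instance involves, the sketch below perturbs one row, queries a black-box model, weights the perturbed samples by proximity, and fits a weighted linear surrogate. It is a simplified stand-in, not the `lime` library's API: `lime_style_weights`, `toy_model`, and all parameter values are hypothetical, and the real library adds steps such as feature discretization and feature selection.

```python
import numpy as np


def lime_style_weights(predict, instance, scale, n_samples=2000,
                       kernel_width=0.75, seed=0):
    """Fit a locally weighted linear surrogate around one instance."""
    rng = np.random.default_rng(seed)
    instance = np.asarray(instance, dtype=float)
    scale = np.asarray(scale, dtype=float)

    # 1. Perturb the instance with Gaussian noise scaled per feature
    Z = rng.normal(size=(n_samples, instance.size))  # standardized offsets
    samples = instance + Z * scale

    # 2. Query the black-box model on the perturbed samples
    y = predict(samples)

    # 3. Weight samples by proximity to the instance (exponential kernel)
    dist = np.linalg.norm(Z, axis=1)
    width = kernel_width * np.sqrt(instance.size)
    w = np.exp(-(dist ** 2) / width ** 2)

    # 4. Weighted least squares on the offsets, with an intercept column
    A = np.hstack([np.ones((n_samples, 1)), Z])
    sw = np.sqrt(w)[:, None]
    coef, *_ = np.linalg.lstsq(A * sw, y * sw[:, 0], rcond=None)
    return coef[1:]  # one local weight per feature


def toy_model(X):
    # Hypothetical stand-in for a classifier's fraud probability:
    # feature 0 pushes the score up, feature 1 pulls it down, feature 2 is unused
    score = 2.0 * X[:, 0] - 0.5 * X[:, 1] + 0.0 * X[:, 2]
    return 1.0 / (1.0 + np.exp(-score))


weights = lime_style_weights(toy_model, instance=[0.0, 0.0, 0.0],
                             scale=[1.0, 1.0, 1.0])
print(weights)
```

The sign and relative size of each weight play the role of the bars in a LIME feature importance plot: they describe the model's behavior near this one instance only, not globally.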