# **XAI** Methods Implementation: **Credit Card Fraud Detection** 

## Overview

This notebook applies cutting-edge Explainable AI (XAI) techniques to analyze credit card fraud detection models. By implementing SHAP, LIME, and Anchors, we transform "black box" neural networks into transparent, interpretable systems that provide clear explanations for fraud predictions. 

All necessary models, datasets, and libraries are imported and ready to use, with pre-trained models available in the `architectures/` folder.

- **GitHub Repo:** [Credit-Card-Transaction-Fraud-Detection-Using-Explainable-AI](https://github.com/ThongLai/Credit-Card-Transaction-Fraud-Detection-Using-Explainable-AI)

- **Run live notebook:** [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/ThongLai/Credit-Card-Transaction-Fraud-Detection-Using-Explainable-AI/main?urlpath=%2Fdoc%2Ftree%2FXAI_methods.ipynb)

- **Models: [architectures](https://github.com/ThongLai/Credit-Card-Transaction-Fraud-Detection-Using-Explainable-AI/tree/main/architectures)**<a name="models" id="models"></a>

---

**Author:** Thong Minh Lai 

**Last Updated:** 04/2025

## Description

**What This Notebook Contains:**

- **Pre-trained Black Box Models**: Collection of pre-trained fraud detection models
- **Global Interpretability**: Understanding overall model behavior and feature importance
- **Local Explanations**: Detailed analysis of individual transaction predictions
- **Visual Insights**: Interactive visualizations showing why specific transactions are flagged
- **Comparative Analysis**: Multiple XAI techniques applied to the same predictions

**Key XAI Techniques Implemented**

1. **SHAP (SHapley Additive exPlanations)** 🎲
   - Measures each feature's contribution to predictions using game theory
   - Provides both global importance and transaction-specific explanations

2. **LIME (Local Interpretable Model-agnostic Explanations)** 🍋
   - Creates simple models that approximate complex models locally
   - Shows which features influenced specific fraud predictions

3. **Anchors** ⚓
   - Generates clear IF-THEN rules that explain model decisions
   - Focuses on high-precision, easy-to-understand explanations

**Why This Matters**

Understanding fraud detection models through explainable AI is essential in today's financial landscape. Financial institutions face increasing regulatory pressure for transparency in automated decision-making [[1](https://ijsra.net/content/explainable-ai-financial-technologies-balancing-innovation-regulatory-compliance)], while simultaneously needing to improve model performance to combat sophisticated fraud techniques [[2](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4980350#:~:text=This%20study%20examines%20the,consisting%20of%20284%2C807%20transactions)]. Explainable fraud detection creates a crucial bridge between complex AI systems and human oversight, enabling compliance officers to validate regulatory adherence and fraud analysts to verify and refine AI-flagged transactions with their domain expertise [[3](https://www.researchgate.net/publication/226538138_Trust_and_Stakeholder_Theory_Trustworthiness_in_the_Organisation-Stakeholder_Relationship#:~:text=Trust%20is%20a%20fundamental,stakeholders%20within%20the%20organization–stakeholder)]. This transparency builds stakeholder trust by demonstrating that fraud decisions aren't emerging from an algorithmic "black box" but are based on identifiable, reasonable patterns that can be communicated to customers, auditors, and management. As financial fraud becomes more sophisticated, this human-AI partnership represents the most effective defense, combining the pattern-recognition capabilities of neural networks with human judgment and regulatory compliance requirements.


## Global Setting Variables

In [1]:
MODEL_PATH = 'architectures/'
DATASET_PATH = 'dataset/'
MAKE_PREDICTIONS = False # Generate predictions from the loaded models; leave as `False` to load them from `predictions.csv` instead
RANDOM_SEED = 42 # Set to `None` for a non-reproducible seed (drawn from the OS or the system clock)

## Importing the necessary packages

In [2]:
# If you are running on `Binder`, the packages are already installed and this step can be skipped
# %pip install -r requirements.txt

# ---OR---

# %pip install tensorflow==2.10.1 numpy==1.26.4 pandas scikit-learn imblearn matplotlib seaborn requests shap lime anchor-exp dice_ml

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

# XAI
import shap
from lime import lime_tabular
from anchor import anchor_tabular
import dice_ml

import os
import time
import utils

np.random.seed(RANDOM_SEED)

## Import test dataset and process data

### Download dataset

In [None]:
utils.download_dataset_from_kaggle('fraudTrain.csv')
utils.download_dataset_from_kaggle('fraudTest.csv')

### Read data

In [None]:
data_train = pd.read_csv(os.path.join(DATASET_PATH, 'fraudTrain.csv'), index_col=0)
data_test = pd.read_csv(os.path.join(DATASET_PATH, 'fraudTest.csv'), index_col=0)

### Process data

In [None]:
data_train = utils.feature_engineering(data_train)
X_train, y_train, data_train, transformations_train = utils.pre_processing(data_train)

data_test = utils.feature_engineering(data_test)
X_test, y_test, data_test, transformations_test = utils.pre_processing(data_test, isTestSet=True)

## Import pre-trained models and get predictions

In [None]:
models = utils.load_models()

In [None]:
if MAKE_PREDICTIONS:
    predictions = pd.DataFrame()
    
    for model_name, model in models.items():
        y_predict = model.predict(X_test)
        predictions[model_name] = y_predict.flatten()

        # Save predictions
        utils.save_predictions(model_name, y_predict)
else:
    predictions = pd.read_csv(os.path.join(DATASET_PATH, 'predictions.csv'))

In [None]:
# Temporary: explain only the first model listed in `predictions`
model_name = predictions.keys()[0]
y_predict = predictions[model_name]
y_predict_binary = np.round(y_predict).astype(int).squeeze()
model = models[model_name]

## Credit Card Fraud Dataset Fields

| Field Name | Description |
|------------|-------------|
| **trans_date_trans_time** | Date and time when transaction occurred |
| **cc_num** | Credit card number of customer |
| **merchant** | Name of merchant where transaction occurred |
| **category** | Category of merchant (e.g., retail, food, etc.) |
| **amt** | Amount of transaction |
| **first** | First name of credit card holder |
| **last** | Last name of credit card holder |
| **gender** | Gender of credit card holder |
| **street** | Street address of credit card holder |
| **city** | City of credit card holder |
| **state** | State of credit card holder |
| **zip** | ZIP code of credit card holder |
| **lat** | Latitude location of credit card holder |
| **long** | Longitude location of credit card holder |
| **city_pop** | Population of credit card holder's city |
| **job** | Occupation of credit card holder |
| **dob** | Date of birth of credit card holder |
| **trans_num** | Transaction number |
| **unix_time** | UNIX timestamp of transaction |
| **merch_lat** | Latitude location of merchant |
| **merch_long** | Longitude location of merchant |
| **is_fraud** | Target class indicating whether transaction is fraudulent (1) or legitimate (0) |

## XAI Methods

### SHAP 🎲

#### Get SHAP values

Using `DeepExplainer` (designed specifically for neural networks)

**For `DeepExplainer`, we need to create a `background` dataset**: This is because deep neural networks are complex and non-linear, so they require reference points (background samples) to understand how the model normally behaves and accurately calculate feature importance.

Passing the entire training dataset as the background gives very accurate expected values but is unreasonably expensive. The standard error of the expectation estimates scales roughly as `1/sqrt(N)` for `N` background samples.

So 100 samples will give a good estimate, and 1000 samples a very good estimate of the expected values.
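
The `1/sqrt(N)` behaviour can be checked on synthetic data. The sketch below is not part of the notebook pipeline. It assumes a made-up array of model outputs (`fake_outputs`) and a hypothetical helper (`baseline_spread`), and measures how much the estimated baseline (the mean output over a background sample) varies across repeated draws.

```python
import numpy as np

def baseline_spread(outputs, background_size, n_repeats=500, seed=0):
    """Std. dev. of the mean output across repeated background samples (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    means = [
        rng.choice(outputs, size=background_size, replace=False).mean()
        for _ in range(n_repeats)
    ]
    return float(np.std(means))

# Synthetic stand-in for model outputs on the training set (assumption, not real predictions)
fake_outputs = np.random.default_rng(42).beta(a=0.5, b=5.0, size=50_000)

spread_100 = baseline_spread(fake_outputs, 100)
spread_1000 = baseline_spread(fake_outputs, 1000)
print(f"N=100:  {spread_100:.4f}")
print(f"N=1000: {spread_1000:.4f}")
print(f"ratio:  {spread_100 / spread_1000:.2f}  (1/sqrt(N) scaling predicts about {np.sqrt(10):.2f})")
```

Going from 100 to 1000 background samples should shrink the spread by roughly a factor of three, which is why the extra cost is often not worth it for a first pass.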

In [None]:
# All features
features = data_test.columns.drop('is_fraud').tolist()

# Collect all categorical features
categorical_features = list(data_test.select_dtypes(include=['bool', 'category', 'object']).columns)

# Collect all misclassified entries (for later explanation of why the model predicted them incorrectly)
misclassified_indices = np.where(y_test != y_predict_binary)[0]
print(f"Found {len(misclassified_indices)} misclassified instances")

In [None]:
def SHAP(model, X_train, X_test, from_idx, to_idx, background_size=100):
    X_train = np.expand_dims(X_train, axis=-1) if X_train.shape[-1] != 1 else X_train
    X_test = np.expand_dims(X_test, axis=-1) if X_test.shape[-1] != 1 else X_test

    background = X_train[np.random.choice(len(X_train), background_size, replace=False)]

    explainer = shap.DeepExplainer(model, background)

    # Slice with from_idx:to_idx+1 instead of X_test[idx]: the model expects a batch of samples,
    # whereas X_test[idx] would drop the leading sample dimension
    shap_values = explainer.shap_values(X_test[from_idx:to_idx+1])

    shap_values = shap_values.squeeze()[np.newaxis, ...] if shap_values.shape[0] == 1 else shap_values.squeeze()

    return explainer, shap_values

#### Global Interpretability (whole test set)

In [None]:
# Call the function to obtain SHAP values.
from_idx = misclassified_indices[0]
to_idx = misclassified_indices[0]
# to_idx = len(X_test)-1

explainer, shap_values = SHAP(model, X_train, X_test, from_idx, to_idx, background_size=100)
shap_values.shape

In [None]:
def get_top_n_features(shap_values, features, n=10):
    mean_shap_values = np.abs(shap_values).mean(axis=0)

    df_shap = pd.DataFrame({
        'feature': features,
        'mean_abs_shap': np.squeeze(mean_shap_values)
    }).set_index('feature')

    df_shap = df_shap.sort_values('mean_abs_shap', ascending=False)

    # Get top n features (use the `n` argument instead of overriding it)
    top_n_features = list(df_shap.head(n).index)

    display(df_shap.head(n))

    return top_n_features

top_n_features = get_top_n_features(shap_values, features, n=10)

#### Visualization
[SHAP Plots Explained](https://www.youtube.com/playlist?list=PLpoCVQU4m6j9HDOzRBL4nX4eol9DrZ3Kd)

#### Summary Plot

In [None]:
shap.summary_plot(shap_values, X_test[from_idx:to_idx+1], features)

#### Force Plot

[How to use Shapley Additive Explanations for Black Box Machine Learning Algorithms](https://www.youtube.com/watch?v=7wnG6Wnm2uU)

In [None]:
# Plot feature contributions for a prediction
shap.initjs()
baseline = explainer.expected_value.numpy()

shap.force_plot(baseline, shap_values, data_test.loc[y_test.index].drop('is_fraud', axis=1).iloc[from_idx:to_idx+1], features)
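
A force plot visualizes the additive property of SHAP values: the baseline plus the sum of all feature contributions equals the model output for that instance. The sketch below illustrates this with brute-force exact Shapley values. It does not use `DeepExplainer`. The scoring function `toy_model`, the helper `exact_shapley`, and all numeric values are assumptions made up for illustration, and a single reference point stands in for the background set.

```python
from itertools import combinations
from math import factorial
import numpy as np

def toy_model(x):
    """Hypothetical scoring function standing in for the network (assumption)."""
    amt, dist, night = x
    return 0.002 * amt + 0.001 * dist + 0.3 * night + 0.0005 * amt * night

def exact_shapley(f, x, reference):
    """Brute-force Shapley values of f at x against a single reference point."""
    n = len(x)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for subset in combinations(others, size):
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                idx = np.array(subset, dtype=int)
                without_i = reference.copy()
                without_i[idx] = x[idx]      # features in the coalition take the instance's values
                with_i = without_i.copy()
                with_i[i] = x[i]             # add feature i to the coalition
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi

x = np.array([200.0, 50.0, 1.0])          # transaction to explain (made-up values)
reference = np.array([50.0, 10.0, 0.0])   # single background point (made-up values)

phi = exact_shapley(toy_model, x, reference)
baseline = toy_model(reference)
print("SHAP values:", phi.round(4))
print("baseline + sum:", round(baseline + phi.sum(), 4), "| model output:", round(toy_model(x), 4))
```

The two printed totals should agree. This is the quantity a force plot decomposes into the red and blue segments.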

### LIME 🍋

#### Local Interpretable Model-Agnostic Explanations (LIME) [Paper, 2016](https://arxiv.org/abs/1602.04938)


> *Interpretable models that are used to explain individual predictions of black box machine learning models (for credit card fraud detection in this project)*

##### LIME Process in Fraud Detection:
1. **Select:** Choose a transaction (e.g., a potential fraud case).
2. **Perturb:** Create variations by slightly altering its features.
3. **Generate:** Build a dataset of these perturbed transactions with their fraud predictions.
4. **Train:** Fit an easy-to-interpret model on this new dataset, giving more weight to samples similar to the original transaction.
5. **Interpret:** Use the simple model to show which features drove the fraud prediction.

##### Technical Implementation:
* **Numerical Features:** Compute statistics (mean, std) and bin values (e.g., into quartiles).
* **Categorical Features:** Calculate the frequency of each category.

##### Key Parameter:
* **Kernel Width:**  
  - **Small width:** Only very similar transactions influence the explanation (high precision).  
  - **Large width:** More diverse transactions are included (wider coverage).

LIME helps make fraud detection more transparent by clarifying why a specific transaction was flagged as suspicious.

[LIME Code Tutorial from original paper authors](https://marcotcr.github.io/lime/tutorials/Tutorial%20-%20continuous%20and%20categorical%20features.html)

#### Example

##### **Original Transaction Instance**

```python
transaction = {
    'amt': 1850.75,        # Transaction amount
    'age': 27,             # Customer age
    'dist': 792.3,         # Distance from home location
    'F': 1,                # Female gender
    'M': 0,                # Male gender
    '20 to 30': 1,         # Age bracket
    'AK': 1,               # State (Alaska)
    'shopping_pos': 1,     # Transaction category
}
```

##### **Step 1: Generate Perturbations**

Creating slightly modified versions (perturbations) of the original transaction by adding small random noise based on the feature's statistics (mean and standard deviation):

```python
perturbed_transactions = [
    {    # Perturbed 1 (`AK` changed, `CA` added)
        'amt': 1750.25, 'age': 27, 'dist': 792.3, 
        'F': 1, 'M': 0, '20 to 30': 1, 
        'AK': 0, 'CA': 1, 'shopping_pos': 1
    },
    {    # Perturbed 2 (`shopping_pos` changed, `grocery_pos` added)
        'amt': 1850.75, 'age': 27, 'dist': 792.3,
        'F': 1, 'M': 0, '20 to 30': 1, 
        'AK': 1, 'shopping_pos': 0, 'grocery_pos': 1
    },
    {    # Perturbed 3 (`dist`, `age` changed)
        'amt': 1850.75, 'age': 42, 'dist': 156.7,
        'F': 1, 'M': 0, '20 to 30': 0, '40 to 50': 1, 
        'AK': 1, 'shopping_pos': 1
    },
    # ... many more variations
]

# Get predictions from the black box model
predictions = [
    0.35,  # Transaction 1 - lower probability of fraud
    0.42,  # Transaction 2
    0.28,  # Transaction 3
    # ... and so on
]
```
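
The sampling step can be sketched in a few lines. This is a simplified illustration, not the `lime` package's internal code: numerical features are drawn from a normal distribution with the training mean and standard deviation, and categorical values are drawn according to their training frequency. The statistics, the frequencies, and the helper `sample_perturbations` are assumptions made up for the example.

```python
import numpy as np

rng = np.random.default_rng(42)

# Made-up training statistics (assumptions for illustration only)
numeric_stats = {"amt": (70.0, 160.0), "dist": (76.0, 29.0)}  # feature: (mean, std)
category_freq = {"category": {"shopping_pos": 0.15, "grocery_pos": 0.22, "other": 0.63}}

def sample_perturbations(n_samples):
    """Draw synthetic neighbours: Gaussian samples for numbers, frequency sampling for categories."""
    samples = {}
    for name, (mean, std) in numeric_stats.items():
        samples[name] = rng.normal(loc=mean, scale=std, size=n_samples)
    for name, freq in category_freq.items():
        values, probs = zip(*freq.items())
        samples[name] = rng.choice(values, size=n_samples, p=probs)
    return samples

perturbed = sample_perturbations(5000)
print({name: column[:3] for name, column in perturbed.items()})
```

Each sampled row would then be passed to the black box model to obtain the fraud probabilities used in the later steps.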

##### **Step 2: Analyze Feature Distributions**
LIME analyzes the distribution of values in the perturbed samples:
```python
# Feature statistics for discretization
amt_stats = {
    'mean': 1523.45,
    'std': 342.87,
    'thresholds': [-0.60, -0.46, -0.32, -0.18, 0.04, 0.26, 0.48]  # normalized
}

dist_stats = {
    'mean': 457.23,
    'std': 389.52,
    'thresholds': [-0.82, -0.51, -0.20, 0.09, 0.41, 0.75, 1.08]  # normalized
}

# ...

# Illustrative assumption for this walkthrough: thresholds for binary categorical features are derived from
# the percent point function (ppf) of the normal distribution, scaled by a small adjustment factor
# (typically around 0.4 to 0.6) and offset by a small shift constant (often between -0.1 and 0.1).

# Categorical feature analysis
AK_stats = {
    'frequency': 0.03,
    'ppf_calculation': -1.88,  # (ppf(0.03) ≈ -1.88) Inverse of standard normal CDF at 0.03 
    'threshold': -0.06,  # -1.88 * 0.03 - 0.00 ≈ -0.06 (Low adjustment factor (0.03) used due to feature rarity; no shift needed)
}

shopping_pos_stats = {
    'frequency': 0.15,
    'ppf_calculation': -1.04, # (ppf(0.15) ≈ -1.04) Inverse of standard normal CDF at 0.15
    'threshold': -0.33,  # -1.04 * 0.3 + 0.0 ≈ -0.312 (Medium adjustment factor (0.3); no shift applied as base calculation was close to desired scale)
}

grocery_pos_stats = {
    'frequency': 0.22,
    'ppf_calculation': -0.77, # (ppf(0.22) ≈ -0.77) Inverse of standard normal CDF at 0.22
    'threshold': -0.43,  # -0.77 * 0.45 - 0.08 ≈ -0.427 (Medium-high adjustment (0.45) with small negative shift to maintain consistent relationship with `shopping_pos`)
}
```

##### **Step 3: Discretize Continuous Features**
LIME converts continuous features into binary features using thresholds:
```python
# Original transaction (normalized)
normalized_transaction = {
    'amt': -0.51,  # standardized amount, (amt - mean) / std (illustrative value)
    'dist': 0.39,  # standardized distance, (dist - mean) / std (illustrative value)
    # ...other features
}

# Binary features after discretization
binary_features = {
    # Active bins for this transaction
    '-0.60 < amt <= -0.46': 1,      # True
    '0.09 < dist <= 0.41': 1,       # True
    'AK <= -0.06': 1,               # True
    'F <= 0.91': 1,                 # True
    'shopping_pos <= -0.33': 1,     # True
    # ...
    # Remaining `amt` bins are inactive
    'amt <= -0.60': 0,
    '-0.46 < amt <= -0.32': 0,
    '-0.32 < amt <= -0.18': 0,
    '-0.18 < amt <= 0.04': 0,
    '0.04 < amt <= 0.26': 0,
    '0.26 < amt <= 0.48': 0,
    'amt > 0.48': 0,
    # ... many more binary features (all other features will be 0)
}
```
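
The binning itself can be reproduced with `np.digitize`. The sketch below reuses the `amt` thresholds from Step 2 and the standardized value `-0.51` from this step. The helpers `bin_labels` and `one_hot_bin` are hypothetical names introduced for illustration and are not part of the `lime` API.

```python
import numpy as np

# Thresholds copied from the `amt_stats` example in Step 2
amt_thresholds = np.array([-0.60, -0.46, -0.32, -0.18, 0.04, 0.26, 0.48])

def bin_labels(name, thresholds):
    """Human-readable labels for the len(thresholds) + 1 bins."""
    labels = [f"{name} <= {thresholds[0]:.2f}"]
    labels += [f"{lo:.2f} < {name} <= {hi:.2f}" for lo, hi in zip(thresholds[:-1], thresholds[1:])]
    labels.append(f"{name} > {thresholds[-1]:.2f}")
    return labels

def one_hot_bin(value, name, thresholds):
    """Map one standardized value to a dict of binary bin indicators."""
    labels = bin_labels(name, thresholds)
    active = int(np.digitize(value, thresholds, right=True))  # right=True gives lo < value <= hi
    return {label: int(i == active) for i, label in enumerate(labels)}

binary_amt = one_hot_bin(-0.51, "amt", amt_thresholds)
for label, flag in binary_amt.items():
    print(f"{flag}  {label}")
```

Exactly one bin is active per continuous feature, so seven thresholds yield eight mutually exclusive binary features.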

##### **Step 4: Apply Kernel Weighting**

LIME uses this kernel weighting formula: 

$$\pi_x(z) = \exp\left(-\frac{D(x, z)^2}{\sigma^2}\right)$$

Where:
- $D(x, z)$ is the distance between original transaction $x$ and perturbed sample $z$ (binary distance will be count of features that differ)
- $\sigma$ controls how quickly weight decays with distance

**Perturbed 1**: (`AK` changed, `CA` added)
$$\pi_x(z_1) = \exp\left(-\frac{2^2}{1.5^2}\right) = \exp(-1.78) = 0.17$$

**Perturbed 2**: (`shopping_pos` changed, `grocery_pos` added)
$$\pi_x(z_2) = \exp\left(-\frac{2^2}{1.5^2}\right) = \exp(-1.78) = 0.17$$

**Perturbed 3**: (`distance` bin, `age` bracket changed, new `age` bracket added)
$$\pi_x(z_3) = \exp\left(-\frac{3^2}{1.5^2}\right) = \exp(-4) = 0.02$$
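
The three weights above can be reproduced directly from the kernel formula. The sketch below also evaluates a second, wider kernel (`sigma = 3.0`, an extra value chosen only for comparison) to show how the width changes the influence of distant samples. The helper name `kernel_weight` is an assumption.

```python
import math

def kernel_weight(distance, sigma):
    """Exponential kernel: exp(-D^2 / sigma^2)."""
    return math.exp(-(distance ** 2) / sigma ** 2)

# Number of binary features that differ from the original transaction
distances = {"Perturbed 1": 2, "Perturbed 2": 2, "Perturbed 3": 3}

for sigma in (1.5, 3.0):  # 1.5 as in the worked example; 3.0 is an extra value for comparison
    weights = {name: round(kernel_weight(d, sigma), 2) for name, d in distances.items()}
    print(f"sigma={sigma}: {weights}")
```

With `sigma = 1.5` this gives 0.17, 0.17, and 0.02, matching the hand calculation. The wider kernel assigns noticeably more weight to the same samples.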


##### **Step 5: Train Local Interpretable Model**

LIME fits a weighted linear model to approximate the black box model locally:

$$g(z) = \beta_0 + \beta_1 z_1 + \beta_2 z_2 + \cdots + \beta_d z_d$$

**Loss Function** (Minimize the weighted squared error):

$$\min_{\beta_0, \beta} \sum_{i=1}^{n} \pi_x(z_i) \left( f(z_i) - \big(\beta_0 + \beta^T z_i\big) \right)^2$$

Where $f(z_i)$ is the black box prediction for perturbed transaction $z_i$.
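
The weighted loss can be minimized in closed form with ordinary least squares on rows scaled by the square root of their weights. The sketch below is a minimal illustration on synthetic data: the binary samples `Z`, the coefficients `true_beta`, and the helper `fit_weighted_linear` are all made up for the example. Note that the `lime` package fits a ridge regression by default, so its coefficients can differ slightly from this unregularized version.

```python
import numpy as np

def fit_weighted_linear(Z, f_z, weights):
    """Solve min over beta of sum_i w_i * (f(z_i) - beta_0 - beta^T z_i)^2."""
    Z1 = np.column_stack([np.ones(len(Z)), Z])  # prepend an intercept column
    sqrt_w = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(Z1 * sqrt_w[:, None], f_z * sqrt_w, rcond=None)
    return coef[0], coef[1:]

rng = np.random.default_rng(0)
Z = rng.integers(0, 2, size=(500, 3)).astype(float)   # binary features of perturbed samples
true_beta = np.array([0.5, 0.2, -0.1])                # made-up effects of the synthetic black box
f_z = 0.1 + Z @ true_beta                             # synthetic black box outputs (noise-free)
distance = (Z != 1).sum(axis=1)                       # features differing from an all-ones original
weights = np.exp(-(distance ** 2) / 1.5 ** 2)         # kernel weights from Step 4

beta_0, beta = fit_weighted_linear(Z, f_z, weights)
print("intercept:", round(float(beta_0), 3), "coefficients:", beta.round(3))
```

Because the synthetic black box is exactly linear, the fit recovers `true_beta`. For a real network the surrogate is only a local approximation.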

##### **Step 6: Interpret Feature Contributions**

The coefficients (β) of the linear model show each feature's contribution:

| **Feature**   | **Contribution** | **Interpretation**            |
|---------------|:----------------:|-------------------------------|
| **AK <= -0.06**        |      **+0.73**   | **Strongly indicates fraud**  |
| **-0.60 < amt <= -0.46**       |      **+0.31**   | Moderately indicates fraud    |
| **0.09 < dist <= 0.41**      |      **+0.26**   | Moderately indicates fraud    |
| shopping_pos <= -0.33  |        +0.12     | Slightly indicates fraud      |
| -0.79 < age <= -0.11           |        -0.08     | Slightly indicates legitimate |
| F <= 0.91             |        -0.04     | Minimal impact                |

#### Implementation

In [None]:
categorical_names = {features.index(col): transformations_test[col].classes_.tolist() for col in categorical_features}
categorical_idx = list(categorical_names.keys())
kernel_width = np.sqrt(len(features)) * 0.75

# LIME explainer
explainer = lime_tabular.LimeTabularExplainer(
    training_data=transformations_train['scaler'].inverse_transform(X_train),
    class_names=['non-fraud', 'fraud'],
    feature_names=features,
    categorical_features=categorical_idx,
    categorical_names=categorical_names,
    kernel_width=kernel_width
)

In [None]:
# Try to explain data entry at `idx`
idx = misclassified_indices[100]
data_entry = transformations_test['scaler'].inverse_transform(X_test[[idx]]).squeeze()

print(f'Considering Index: `{idx}`')

# Create a prediction function that returns probabilities for BOTH classes
def model_predict_fn(x):
    x = transformations_test['scaler'].transform(x)
    preds = model.predict(x)
    two_column_preds = np.concatenate([1 - preds, preds], axis=1)
    return two_column_preds

exp = explainer.explain_instance(
    data_entry,
    model_predict_fn,
    num_features=len(features)
)

exp.show_in_notebook()

data_test.iloc[[idx]]

#### **How to interpret LIME Visualizations**

**Visual Guide:**
* **Left side**: Prediction probability is shown 
* **Blue bars (left)**: Features contributing to "legitimate" prediction
* **Orange bars (right)**: Features contributing to "fraudulent" prediction

**Tabular View:**
* Shows actual value for each feature
* Highlights contribution to each outcome (legitimate or fraudulent)

**Analysis Tips:**
Pay attention to which transaction characteristics most strongly influence the fraud determination. For example:
* Unusually large transaction amount
* Atypical merchant category
* Transaction occurring at unusual time

These insights help financial analysts understand why the model flagged specific transactions, improving both accuracy of manual reviews and overall model transparency.

### Anchors ⚓️

#### High-Precision Model-Agnostic Explanations (Anchors) [Paper, 2018](https://ojs.aaai.org/index.php/AAAI/article/view/11491)

> *Anchors explain individual predictions with simple IF-THEN rules that capture key conditions behind a decision.*

##### **Anchors Process in Fraud Detection:**
1. **Generate Rule Candidates:** Propose simple rules that could explain a model's prediction.
2. **Select Best Anchor:** Identify the rule that best explains the specific transaction.
3. **Validate Precision:** Confirm the rule’s accuracy by testing it on similar cases.
4. **Refine with Search:** Improve the rule using an efficient search algorithm.

##### **Key Insights:**
* **Anchors are IF-THEN rules** that provide clear, high-precision explanations.
* They focus on **accuracy over coverage**: the rule is very reliable when it applies, even if it covers a small group.
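
Precision and coverage of a candidate rule can be computed as follows. This sketch evaluates a fixed rule on a synthetic dataset, whereas the Anchors algorithm estimates precision on perturbed samples around the instance being explained. The data, the `toy_classifier` stand-in, and the rule itself are assumptions made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic transactions (assumption): amount and a night-time flag
amt = rng.exponential(scale=100.0, size=10_000)
is_night = rng.integers(0, 2, size=10_000)

def toy_classifier(amt, is_night):
    """Hypothetical stand-in for the fraud model: flags large night-time transactions."""
    return ((amt > 300) & (is_night == 1)).astype(int)

predictions = toy_classifier(amt, is_night)

# Candidate anchor: IF amt > 350 AND is_night == 1 THEN fraud
rule_applies = (amt > 350) & (is_night == 1)
coverage = rule_applies.mean()                        # share of samples where the rule applies
precision = (predictions[rule_applies] == 1).mean()   # share of those predicted as fraud

print(f"coverage:  {coverage:.3f}")
print(f"precision: {precision:.3f}")
```

The rule is perfectly precise here because it is stricter than the toy classifier, but it applies to only a small fraction of transactions. That is the precision versus coverage trade-off described above.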

[Anchors Code Tutorial from original paper authors](https://github.com/marcotcr/anchor/blob/master/notebooks/Anchor%20on%20tabular%20data.ipynb)

#### Implementation

In [None]:
# Initialize Anchors explainer with the sample
explainer = anchor_tabular.AnchorTabularExplainer(
    train_data=transformations_train['scaler'].inverse_transform(X_train), 
    class_names=['non-fraud', 'fraud'],
    feature_names=features,
    categorical_names=categorical_names
)

In [None]:
# Try to explain data entry at `idx`
idx = misclassified_indices[0]
data_entry = transformations_test['scaler'].inverse_transform(X_test[[idx]]).squeeze()

print(f'Considering Index: `{idx}`')

# Explain the prediction using Anchors
def model_predict_fn(x):
    x = transformations_test['scaler'].transform(x)
    preds = model.predict(x, verbose=0)
    preds_binary = np.round(preds).astype(int).flatten()
    return preds_binary

# Print the anchor explanation
def print_anchor_explanation(exp, instance_prediction, data_entry, feature_names):
    """Anchor explanation printer"""
    print("\n" + "="*50)
    print(f"ANCHOR EXPLANATION")
    print("="*50)
    
    # Print the anchor rules
    if exp.names():
        print("\nIF THESE CONDITIONS ARE MET:")
        for i, condition in enumerate(exp.names(), 1):
            print(f"   {i}. {condition}")
        print(f"\nTHEN: Prediction is `{explainer.class_names[instance_prediction]}`")
    else:
        print(f"No specific rules found. Prediction is `{explainer.class_names[instance_prediction]}`")
    
    # Print metrics
    print("\nEXPLANATION METRICS:")
    print(f"  • Precision: {exp.precision():.2f} → If these conditions are met, the prediction is the same {exp.precision()*100:.1f}% of the time")
    print(f"  • Coverage: {exp.coverage():.2f} → These conditions apply to {exp.coverage()*100:.1f}% of similar instances")
    
    # Print important feature values 
    print("\nKey feature values:")
    for rule in exp.names():
        for i, feat in enumerate(feature_names):
            if feat in rule:
                print(f"  • {feat}: {data_entry[i]}")
    
    print("="*50)

instance_prediction = model_predict_fn(np.expand_dims(data_entry, axis=0))[0] # Get prediction for this instance

# Explain with optimized parameters
print("Generating explanation (this may take some time)...")
exp = explainer.explain_instance(
    data_entry, 
    model_predict_fn, 
    threshold=0.8,       # Lower precision requirement
    delta=0.2,           # More lenient statistical guarantee
    tau=0.2,             # More lenient precision constraint
    batch_size=500,      # Larger batches for efficiency
    max_anchor_size=3,   # Limit complexity
    beam_size=2          # Smaller beam search
)

print_anchor_explanation(exp, instance_prediction, data_entry, features)

# Display the original data
print("\nOriginal data:")
data_test.iloc[[idx]]

#### How to Interpret

This rule means that if these conditions are met, the model’s fraud prediction is highly reliable. This explanation gives fraud analysts a clear rule they can understand and verify, rather than just a list of contributing factors.

### DiCE 🧊

#### DiCE (Diverse Counterfactual Explanations) [Paper, 2019](https://arxiv.org/abs/1905.07697)

> *Generates "what-if" scenarios showing the smallest changes needed to flip a model's prediction, making black box fraud detection decisions actionable and understandable.*

##### DiCE Process in Fraud Detection:
1. **Select:** Choose a flagged fraudulent transaction.
2. **Generate:** Create realistic alternative versions, guided by an objective function, that the model would classify as legitimate.
3. **Optimize:** Find the minimum changes needed to flip the prediction.
4. **Compare:** Show what features would need to change (and by how much) to make the transaction legitimate.
5. **Present:** Provide multiple diverse counterfactual explanations, not just one.

##### Objective function for generating alternative versions:
* **Proximity:** Ensures counterfactuals are close to the original transaction.
* **Sparsity:** Minimizes the number of features that need to change.
* **Diversity:** Provides multiple alternative paths to a different outcome.
* **Feasibility:** Ensures changes are realistic (e.g., can't change transaction date to the future).

##### Key Parameters:
* **Proximity Weight:**
  - **High weight:** Counterfactuals very similar to original transaction.
  - **Low weight:** Allows more significant changes for greater diversity.
* **Feature Weights:**
  - Control which features are easier/harder to change (e.g., time is easier to change than location).



[DiCE Code Tutorial from Microsoft](https://interpret.ml/DiCE/)


#### Example

##### **Original Transaction Instance**

```python
transaction = {
    'merchant': 'Rodriguez, Yost and Jenkins',
    'category': 'misc_net',                   
    'amt': 780.52,                            
    'gender': 'M',                            
    'lat': 42.5545,                           
    'long': -90.3508,                         
    'city_pop': 1306,                         
    'unix_time': 1371853942,                  
    'merch_lat': 42.461127,                   
    'merch_long': -91.147148,                 
    'age': 66,                                
    'age_group': '60-69',                     
    'dist': 66.097917,                        
}
```

The model predicts this transaction as fraudulent with a probability of 0.9999497.

##### **Step 1: Define the Objective Function**

DiCE generates counterfactual examples by minimizing an objective function that balances several factors:

$$
\text{Objective} = \lambda_1 \times \underbrace{d(x,\tilde{x})}_{\text{Proximity}} + \lambda_2 \times \underbrace{\|x - \tilde{x}\|_0}_{\text{Sparsity}} - \lambda_3 \times \underbrace{D(\tilde{x}_i, \tilde{x}_j)}_{\text{Diversity}} + \lambda_4 \times \underbrace{\mathcal{L}(f(\tilde{x}), y_{\text{target}})}_{\text{Prediction Target}}
$$

Where:

- **Proximity:**  
  Measures how similar the counterfactual $\tilde{x}$ is to the original instance $x$ (using metrics such as L1 or L2 distance).  
  *Example:* $L1([780.52, 66.09], [125.75, 3.21]) = |780.52 - 125.75| + |66.09 - 3.21| = 717.65$

- **Sparsity:**  
  Penalizes changing too many features (typically measured by the $L_0$ norm).

- **Diversity:**  
  Encourages generating counterfactuals that are different from one another by measuring distances in feature space.

- **Prediction Target:**  
  Ensures the counterfactual shifts the model's prediction toward the desired outcome, usually quantified by a loss term: $\mathcal{L}(f(\tilde{x}), y_{\text{target}})$

The coefficients $\lambda_1$, $\lambda_2$, $\lambda_3$, and $\lambda_4$ balance these factors to find optimal counterfactuals.

##### **Step 2: Generate Counterfactuals**

Using the above objective function, **DiCE** generates candidate counterfactuals by slightly modifying feature values of the original transaction. These candidates are then evaluated with the objective function to select those that minimally differ from the original while achieving a different prediction.

```python
# Original fraudulent transaction (abbreviated)
original = {
    'merchant': 'Rodriguez, Yost and Jenkins', 
    'category': 'misc_net',
    'amt': 780.52,
    # ... other features
    'dist': 66.10,
    'age': 66,
    'age_group': '60-69',
    'city_pop': 1306
}

# Only showing changed features in each counterfactual
counterfactuals = [
    {   # CF1: Lower amount + different category
        'category': 'grocery_pos',  # Changed from 'misc_net'
        'amt': 125.75              # Changed from 780.52
    },
    {   # CF2: Known merchant + local transaction
        'merchant': 'Walmart',      # Changed from 'Rodriguez, Yost and Jenkins'
        'dist': 3.21               # Changed from 66.10 miles
    },
    {   # CF3: Urban location + younger customer
        'city_pop': 2746388,        # Changed from 1,306
        'age': 42,                  # Changed from 66
        'age_group': '40-49'        # Changed from '60-69'
    }
]

# Model predictions for counterfactuals
predictions = [0.08, 0.12, 0.15]  # All now classified as legitimate
```
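
The proximity and sparsity terms from Step 1 can be evaluated for the counterfactuals listed above. The sketch below covers only the numerical features (`amt`, `dist`, `city_pop`, `age`) and uses raw, unscaled distances as in the L1 example. Categorical changes such as `category` and `merchant` are left out, and the helper names `proximity_l1` and `sparsity_l0` are assumptions.

```python
import numpy as np

feature_names = ["amt", "dist", "city_pop", "age"]
original = np.array([780.52, 66.10, 1306.0, 66.0])

# Counterfactuals from the example, with unchanged features copied from the original
counterfactuals = {
    "CF1 (amt)": np.array([125.75, 66.10, 1306.0, 66.0]),
    "CF2 (dist)": np.array([780.52, 3.21, 1306.0, 66.0]),
    "CF3 (city_pop, age)": np.array([780.52, 66.10, 2746388.0, 42.0]),
}

def proximity_l1(x, x_cf):
    """Raw L1 distance between the original and a counterfactual."""
    return float(np.abs(x - x_cf).sum())

def sparsity_l0(x, x_cf):
    """Number of features whose value changed."""
    return int(np.count_nonzero(x != x_cf))

for name, cf in counterfactuals.items():
    print(f"{name}: L1 = {proximity_l1(original, cf):.2f}, changed features = {sparsity_l0(original, cf)}")
```

The raw L1 distance for CF3 is dominated by `city_pop`, which shows why distances are usually scaled per feature before they are combined. The DiCE paper, for instance, divides each continuous feature's distance by its median absolute deviation.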

##### **Step 3: Actionable Insights**

| **Path** | **Key Changes** | **New Prediction** | **Legitimate Indicators** |
|----------|----------------|-------------------|---------------------|
| **Path 1** | *Category:* misc_net → grocery_pos<br>*Amount:* $780.52 → $125.75 | **0.08** (Legitimate) | In-person transaction<br>Lower, more typical amount |
| **Path 2** | *Merchant:* Rodriguez, Yost and Jenkins → Walmart<br>*Distance:* 66.10 → 3.21 miles | **0.12** (Legitimate) | Well-known merchant<br>Local transaction |
| **Path 3** | *City population:* 1,306 → 2,746,388<br>*Age:* 66 → 42 | **0.15** (Legitimate) | Major city location<br>Lower-risk age demographic |

#### Implementation

In [None]:
# # Specify continuous features based on your dataset
# continuous_features = data_test[features].select_dtypes(include=[float, int]).columns.to_list()

# # Prepare training DataFrame (with original feature values) and add the outcome column
# X_train_df = pd.DataFrame(X_train, columns=features)
# X_train_df['is_fraud'] = y_train

# # Create the DiCE Data object
# dice_data = dice_ml.Data(
#     dataframe=X_train_df,
#     continuous_features=continuous_features,
#     outcome_name='is_fraud'
# )

# # Define a custom prediction function that returns a tf.Tensor.
# @tf.function  # Add TF function decorator for better performance
# def dice_model_predict(x, training=False):
#     # Convert to tensor once if it's not already
#     if not isinstance(x, tf.Tensor):
#         x = tf.convert_to_tensor(x, dtype=tf.float32)
    
#     # Reshape in a single operation
#     x_reshaped = tf.expand_dims(x, axis=-1)
    
#     # Get predictions
#     preds = model(x_reshaped, training=training)
    
#     # Create two-column probabilities
#     return tf.concat([1 - preds, preds], axis=1)

# # Create a DiCE Model object using the custom prediction function, with backend "TF2"
# dice_model = dice_ml.Model(model=dice_model_predict, backend="TF2")

# # Initialize the DiCE Explainer (using the "gradient" method here)
# explainer = dice_ml.Dice(dice_data, dice_model, method="gradient")

# # --- Select a Query Instance ---
# # For example, choose one misclassified instance (ensure shape is (1, n_features))
# idx = misclassified_indices[100]
# data_entry = pd.DataFrame(
#     X_test[[idx]],
#     columns=features
# )

# print(f"Generating counterfactuals for instance at index {idx}...")

# # --- Generate Counterfactuals ---
# # 'total_CFs' defines how many counterfactual candidates to produce,
# # and 'desired_class' being "opposite" instructs DiCE to flip the prediction.
# dice_exp = explainer.generate_counterfactuals(
#     data_entry, total_CFs=3, desired_class="opposite", verbose =True
# )

# # Display the generated counterfactual explanations
# print(dice_exp.cf_examples_list[0].final_cfs_df)


In [None]:

# # Define features
# features = data_test.columns.drop('is_fraud').tolist()

# # Define continuous and categorical features
# continuous_features = data_test[features].select_dtypes(include=['number']).columns.tolist()
# categorical_features = [f for f in features if f not in continuous_features]

# # Choose an instance to explain
# idx = misclassified_indices[100]

# # Get counterfactuals
# dice_explanations = get_dice_counterfactuals(
#     model=model,
#     X_train=X_train,
#     y_train=y_train,
#     X_test=X_test,
#     features=features,
#     idx=idx,
#     misclassified_indices=misclassified_indices,
#     scaler=transformations_test['scaler']  # Pass your scaler if needed
# )


#### How to Interpret

### Single Feature Partial Dependence Plot

[How to Build Shap Single Feature Partial Dependence Plot (PDP Plot)](https://www.youtube.com/watch?v=CgKyAlA-0wA)

In [None]:
# Dependence plot for specific feature
shap.dependence_plot("amt", shap_values, X_test[from_idx:to_idx+1], features)
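
The SHAP dependence plot above shows per-sample SHAP values against the feature value. For comparison, a classic partial dependence curve can be computed by hand: force one feature to each value on a grid for every row, then average the predictions. The following is a minimal sketch under stated assumptions; `toy_predict`, `X_demo`, and `grid` are hypothetical stand-ins and not the notebook's model or data.

```python
import numpy as np

def toy_predict(X):
    """Hypothetical stand-in for model.predict: a logistic score using columns 0 and 1."""
    return 1.0 / (1.0 + np.exp(-(2.0 * X[:, 0] - 0.5 * X[:, 1])))

def partial_dependence(predict_fn, X, feature_idx, grid):
    """Average prediction when column `feature_idx` is forced to each grid value."""
    averages = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature_idx] = value   # overwrite the feature for every row
        averages.append(predict_fn(X_mod).mean())
    return np.array(averages)

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))      # synthetic data, for illustration only
grid = np.linspace(-2, 2, 9)

pd_curve = partial_dependence(toy_predict, X_demo, feature_idx=0, grid=grid)
pd_unused = partial_dependence(toy_predict, X_demo, feature_idx=2, grid=grid)
```

Because `toy_predict` increases with column 0 and ignores column 2, `pd_curve` rises along the grid while `pd_unused` stays flat. With the real model, `predict_fn` would need to wrap `model.predict` so that it returns a 1-D array.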

### Other Visualization

In [None]:
fraud_data = data_test[data_test['is_fraud'] == 1]
non_fraud_data = data_test[data_test['is_fraud'] == 0]

In [None]:
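# Compare fraud vs. non-fraud distributions for the top n features
for feature in top_n_features:
    # Cast to float so boolean columns plot correctly without truncating continuous values
    plt.hist(fraud_data[feature].astype(float), alpha=0.5, label='Fraud', bins=30)
    plt.hist(non_fraud_data[feature].astype(float), alpha=0.5, label='Non-Fraud', bins=30)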
    plt.title(f'Distribution of {feature}')
    plt.legend()
    plt.show()

In [None]:
feature = 'amt'
mean_value = data_test[feature].mean()

fraud_ratio = len(fraud_data[fraud_data[feature] > mean_value]) * 100 / len(data_test[data_test[feature] > mean_value])
legitimate_ratio = 100 - fraud_ratio

plt.pie([fraud_ratio, legitimate_ratio],
        labels=['Fraudulent', 'Legitimate'],
        colors=['crimson', 'lightgreen'],
        autopct='%1.1f%%', startangle=90)
plt.title(f'Fraud Share of Transactions with `{feature}` Above the Dataset Mean ({mean_value:.2f})')

In [None]:
feature = 'shopping_net'

# Select rows where the one-hot category flag is set; comparing with 1 works for both 0/1 and boolean columns
fraud_ratio = len(fraud_data[fraud_data[feature] == 1]) * 100 / len(data_test[data_test[feature] == 1])
legitimate_ratio = 100 - fraud_ratio

plt.pie([fraud_ratio, legitimate_ratio],
        labels=['Fraudulent', 'Legitimate'],
        colors=['crimson', 'lightgreen'],
        autopct='%1.1f%%', startangle=90)
plt.title(f'Fraud Share of `{feature}` Transactions')

In [None]:
feature = 'grocery_pos'

# Select rows where the one-hot category flag is set; comparing with 1 works for both 0/1 and boolean columns
fraud_ratio = len(fraud_data[fraud_data[feature] == 1]) * 100 / len(data_test[data_test[feature] == 1])
legitimate_ratio = 100 - fraud_ratio

plt.pie([fraud_ratio, legitimate_ratio],
        labels=['Fraudulent', 'Legitimate'],
        colors=['crimson', 'lightgreen'],
        autopct='%1.1f%%', startangle=90)
plt.title(f'Fraud Share of `{feature}` Transactions')

In [None]:
feature = 'gas_transport'

# Select rows where the one-hot category flag is set; comparing with 1 works for both 0/1 and boolean columns
fraud_ratio = len(fraud_data[fraud_data[feature] == 1]) * 100 / len(data_test[data_test[feature] == 1])
legitimate_ratio = 100 - fraud_ratio

plt.pie([fraud_ratio, legitimate_ratio],
        labels=['Fraudulent', 'Legitimate'],
        colors=['crimson', 'lightgreen'],
        autopct='%1.1f%%', startangle=90)
plt.title(f'Fraud Share of `{feature}` Transactions')

In [None]:
feature = 'misc_net' # Miscellaneous online transactions

# Select rows where the one-hot category flag is set; comparing with 1 works for both 0/1 and boolean columns
fraud_ratio = len(fraud_data[fraud_data[feature] == 1]) * 100 / len(data_test[data_test[feature] == 1])
legitimate_ratio = 100 - fraud_ratio

plt.pie([fraud_ratio, legitimate_ratio],
        labels=['Fraudulent', 'Legitimate'],
        colors=['crimson', 'lightgreen'],
        autopct='%1.1f%%', startangle=90)
plt.title(f'Fraud Share of `{feature}` Transactions')
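
The category cells above repeat the same ratio computation for each one-hot column. A small helper could factor it out. This is only a sketch: `fraud_share` is a hypothetical name, the `toy` frame is made-up illustration data, and it assumes category columns hold 0/1 (or boolean) flags alongside an `is_fraud` label column.

```python
import pandas as pd

def fraud_share(df, feature, label="is_fraud"):
    """Percentage of fraudulent rows among rows where `feature` is active (1 or True)."""
    active = df[df[feature] == 1]
    if active.empty:
        return float("nan")   # no transactions in this category
    return 100.0 * (active[label] == 1).mean()

# Made-up toy data, not taken from the notebook's dataset
toy = pd.DataFrame({
    "shopping_net": [1, 1, 0, 1, 0, 1],
    "grocery_pos":  [0, 0, 1, 0, 1, 0],
    "is_fraud":     [1, 0, 0, 0, 0, 1],
})

shares = {col: fraud_share(toy, col) for col in ["shopping_net", "grocery_pos"]}
```

With the notebook's data, the same call would be `fraud_share(data_test, 'shopping_net')`, and the pie chart would take `[share, 100 - share]`.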

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

state_columns = ['AK', 'AL', 'AR', 'AZ', 'CA', 'CO', 'CT', 'DC', 'FL', 'GA', 'HI', 'IA', 
                'ID', 'IL', 'IN', 'KS', 'KY', 'LA', 'MA', 'MD', 'ME', 'MI', 'MN', 'MO', 
                'MS', 'MT', 'NC', 'ND', 'NE', 'NH', 'NJ', 'NM', 'NV', 'NY', 'OH', 'OK', 
                'OR', 'PA', 'SC', 'SD', 'TN', 'TX', 'UT', 'VA', 'VT', 'WA', 'WI', 'WV', 'WY']

states = []
fraud_rates = []
transaction_counts = []
fraud_counts = []

# Calculate metrics for each state
for state in state_columns:
    state_transactions = data_test[data_test[state] == 1]
    total = len(state_transactions)
    
    if total > 0:
        fraud_count = state_transactions['is_fraud'].sum()
        fraud_rate = fraud_count / total
        
        states.append(state)
        fraud_rates.append(fraud_rate)
        transaction_counts.append(total)
        fraud_counts.append(fraud_count)

state_fraud_df = pd.DataFrame({
    'State': states,
    'Fraud_Rate': fraud_rates,
    'Transaction_Count': transaction_counts,
    'Fraud_Count': fraud_counts
})

# Sort by fraud rate descending
state_fraud_df = state_fraud_df.sort_values('Fraud_Rate', ascending=False)

plt.figure(figsize=(12, 10))
ax = sns.barplot(x='Fraud_Rate', y='State', data=state_fraud_df, palette='viridis')

plt.title('Fraud Rate by State', fontsize=16)
plt.xlabel('Fraud Rate', fontsize=12)
plt.ylabel('State', fontsize=12)

for i, v in enumerate(state_fraud_df['Fraud_Rate']):
    ax.text(v + 0.005, i, f"{v:.2%}", va='center')

plt.tight_layout()
plt.show()

# Create a second visualization - bubble chart with fraud rates and transaction volume
plt.figure(figsize=(14, 8))

scatter = plt.scatter(state_fraud_df['Transaction_Count'], 
                     state_fraud_df['Fraud_Rate'], 
                     s=state_fraud_df['Fraud_Count']*5,
                     alpha=0.7,
                     c=state_fraud_df['Fraud_Rate'],
                     cmap='Reds')

for _, row in state_fraud_df.head(5).iterrows():
    plt.annotate(row['State'], 
                (row['Transaction_Count'], row['Fraud_Rate']),
                xytext=(5, 5),
                textcoords='offset points',
                fontweight='bold')

plt.xscale('log')  # Use log scale for better visualization if counts vary widely
plt.grid(True, alpha=0.3)
plt.title('Fraud Rate vs Transaction Volume by State', fontsize=16)
plt.xlabel('Number of Transactions (log scale)', fontsize=12)
plt.ylabel('Fraud Rate', fontsize=12)

cbar = plt.colorbar(scatter)
cbar.set_label('Fraud Rate', rotation=270, labelpad=15)

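# Legend for bubble size: marker area is Fraud_Count * 5, so divide by 5 to recover the counts
handles, labels = scatter.legend_elements(prop="sizes", num=4, func=lambda s: s / 5)
legend1 = plt.legend(handles, labels, loc="upper left", title="Fraud Count")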

plt.tight_layout()
plt.show()
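
The per-state loop above can also be expressed without explicit iteration. The sketch below assumes the state columns are 0/1 one-hot flags, as the `== 1` filter in the loop suggests; the `toy` frame and its values are invented for illustration.

```python
import pandas as pd

# Made-up toy data: three one-hot state columns and a fraud label
toy = pd.DataFrame({
    "CA": [1, 1, 0, 0, 0],
    "NY": [0, 0, 1, 1, 1],
    "TX": [0, 0, 0, 0, 0],   # a state with no transactions
    "is_fraud": [1, 0, 0, 0, 1],
})
state_cols = ["CA", "NY", "TX"]

# Column sums of the one-hot flags give the transaction count per state
counts = toy[state_cols].sum()
# Multiplying each flag by the label keeps only fraudulent rows before summing
fraud = toy[state_cols].mul(toy["is_fraud"], axis=0).sum()

summary = pd.DataFrame({"Transaction_Count": counts, "Fraud_Count": fraud})
summary = summary[summary["Transaction_Count"] > 0].copy()   # skip empty states
summary["Fraud_Rate"] = summary["Fraud_Count"] / summary["Transaction_Count"]
summary = summary.sort_values("Fraud_Rate", ascending=False)
```

The resulting frame is indexed by state rather than carrying a `State` column, so `summary.reset_index()` would be needed before passing it to the plotting code.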

### Feature Ablation Study

In [None]:

def feature_ablation_study_global(model, X, y, from_idx, to_idx, selected_features, all_features):
    # Evaluate the baseline on the same slice that is ablated below,
    # so that base_auc and ablated_auc are directly comparable.
    test_set = X[from_idx:to_idx+1]
    base_pred = model.predict(test_set, verbose=0)
    base_auc = roc_auc_score(y[from_idx:to_idx+1], base_pred)

    # Precompute replacement values (median over all of X) for each feature to avoid repeated computation.
    replacement_values = {}
    for idx, feature in enumerate(all_features):
        replacement_values[feature] = np.median(X[:, idx])

    print(f"Global (AUC) Feature Ablation Study from index [{from_idx}] to index [{to_idx}]:")
    records = []
    for idx, feature in enumerate(all_features):
        X_temp = test_set.copy()
        X_temp[:, idx] = replacement_values[feature]

        ablated_pred = model.predict(X_temp, verbose=0)
        ablated_auc = roc_auc_score(y[from_idx:to_idx+1], ablated_pred)

        impact = ((base_auc - ablated_auc) / base_auc) * 100

        records.append({
            'feature': feature,
            'ablation_auc': ablated_auc,
            'impact_score': impact
        })

    df_results = pd.DataFrame(records).sort_values(by='impact_score', ascending=False, key=abs).reset_index(drop=True)
    df_results['ranking'] = df_results.index+1  # Ranking: 1 denotes the highest impact.
    df_results = df_results[df_results['feature'].isin(selected_features)].reset_index(drop=True)

    return df_results

In [None]:
df_ablation_global = feature_ablation_study_global(model, X_test, y_test, from_idx, to_idx, top_n_features, features)
df_ablation_global
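
The impact score used above is the relative AUC drop, `(base_auc - ablated_auc) / base_auc * 100`, after a feature is replaced by its median. The sketch below reproduces that logic on synthetic data. It is an illustration under assumptions: `toy_predict` is a hypothetical stand-in for the model, and `auc_score` is a small rank-based AUC written here so that only NumPy is needed (the notebook itself uses `roc_auc_score`).

```python
import numpy as np

def auc_score(y_true, scores):
    """ROC AUC via the Mann-Whitney U statistic, using average ranks for ties."""
    _, inverse, counts = np.unique(scores, return_inverse=True, return_counts=True)
    avg_rank = np.cumsum(counts) - (counts - 1) / 2.0   # 1-based average rank per tie group
    ranks = avg_rank[inverse]
    n_pos = int(y_true.sum())
    n_neg = len(y_true) - n_pos
    u_stat = ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2.0
    return u_stat / (n_pos * n_neg)

def toy_predict(X):
    """Hypothetical stand-in for model.predict: only column 0 influences the score."""
    return 1.0 / (1.0 + np.exp(-3.0 * X[:, 0]))

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 3))   # synthetic features
y_demo = (X_demo[:, 0] + 0.3 * rng.normal(size=500) > 0).astype(int)

base_auc = auc_score(y_demo, toy_predict(X_demo))

impacts = {}
for j in range(X_demo.shape[1]):
    X_ablated = X_demo.copy()
    X_ablated[:, j] = np.median(X_demo[:, j])   # replace feature j with its median
    ablated_auc = auc_score(y_demo, toy_predict(X_ablated))
    impacts[j] = (base_auc - ablated_auc) / base_auc * 100
```

Ablating column 0 collapses every score to the same value, so the ablated AUC falls to 0.5 and the impact is large. Ablating the unused columns leaves the scores unchanged and yields an impact of zero.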

### Local Interpretation

In [None]:
def feature_ablation_single_entry(model, data_entry, selected_features, all_features):
    # Baseline of zeros: each ablated feature is replaced by 0
    # (which corresponds to the feature mean if the inputs were standardized).
    baseline_values = np.zeros_like(data_entry)

    original_prediction = model.predict(data_entry.reshape(1, -1), verbose=0)

    print("Local Feature Ablation Study:")
    records = []
    for i, feature in enumerate(all_features):
        ablated_entry = data_entry.copy()
        ablated_entry[i] = baseline_values[i]

        # Prediction with feature i replaced by its baseline value
        ablated_pred = model.predict(ablated_entry.reshape(1, -1), verbose=0)

        # Signed change in the prediction caused by the ablation
        impact = original_prediction - ablated_pred

        records.append({
            'feature': feature,
            'ablation_pred': ablated_pred.squeeze(),  # a prediction for this entry, not an AUC
            'impact_score': impact.squeeze()
        })

    df_results = pd.DataFrame(records).sort_values(by='impact_score', ascending=False, key=abs).reset_index(drop=True)
    df_results['ranking'] = df_results.index+1  # Ranking: 1 denotes the highest impact.
    df_results = df_results[df_results['feature'].isin(selected_features)].reset_index(drop=True)

    return df_results, original_prediction


In [None]:
idx = 2
df_ablation_local, original_prob = feature_ablation_single_entry(model, X_test[idx], top_n_features, features)
df_ablation_local
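
To build intuition for the local impact score (original prediction minus ablated prediction), consider a linear scorer: zeroing feature `i` changes the output by exactly `weights[i] * entry[i]`. The sketch below checks this on made-up numbers; `weights`, `toy_score`, and `entry` are hypothetical and unrelated to the notebook's neural network, for which the relationship holds only approximately.

```python
import numpy as np

weights = np.array([2.0, -1.0, 0.0])   # hypothetical linear model coefficients

def toy_score(x):
    """Hypothetical linear scorer standing in for model.predict on one entry."""
    return float(x @ weights)

entry = np.array([0.5, 1.5, 3.0])      # made-up feature values
original = toy_score(entry)

impacts = []
for i in range(len(entry)):
    ablated = entry.copy()
    ablated[i] = 0.0                   # zero baseline, as in the function above
    impacts.append(original - toy_score(ablated))
impacts = np.array(impacts)

# Rank features by absolute impact, largest first
ranking = np.argsort(-np.abs(impacts))
```

Here the second feature ranks first despite its negative sign, which is why the notebook sorts by absolute impact.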

In [None]:
# Plot feature contributions for a prediction
shap.initjs()
baseline = explainer.expected_value.numpy()

shap.force_plot(baseline, shap_values[idx:idx+1], processed_data.loc[y_test.index].drop('is_fraud', axis=1).iloc[idx:idx+1], features)

In [None]:
# TO DO

In [None]:
X_train.shape

## Inactive Code


```
def feature_ablation_study_local(model, X, y, index, selected_features, all_features):
    global records, base_pred, base_diff, ablated_diff
    base_pred = model.predict(X[index:index+1], verbose=0)
    base_diff = y[index:index+1].to_numpy() - base_pred
    
    # Precompute replacement values for each feature to avoid repeated computation.
    replacement_values = {}
    for idx, feature in enumerate(all_features):
        replacement_values[feature] = np.median(X[:, idx])

    print(f"Local Feature Ablation Study of index [{index}]:")
    test_set = X[index:index+1]
    records = []
    for idx, feature in enumerate(all_features):
        X_temp = test_set.copy()
        X_temp[:, idx] = replacement_values[feature]
        
        ablated_pred = model.predict(X_temp, verbose=0)
        ablated_diff = y[index:index+1].to_numpy() - ablated_pred
        
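        # Relative change in residual; parentheses ensure the difference is divided, not just ablated_diff
        impact = np.mean((base_diff - ablated_diff) / base_diff)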
        
        records.append({
            'feature': feature,
            'ablation_diff': np.mean(ablated_diff),
            'impact_score': impact
        })

    df_results = pd.DataFrame(records).sort_values(by='impact_score', ascending=False, key=abs).reset_index(drop=True)
    df_results['ranking'] = df_results.index+1  # Ranking: 1 denotes the highest impact.
    
    df_results = df_results[df_results['feature'].isin(selected_features)].reset_index(drop=True)
    
    return df_results

df_ablation_local = feature_ablation_study_local(model, X_test, y_test, 2, top_n_features, features)
df_ablation_local

```