# Week 2 · Day 4 — Explainability & Stakeholder-Ready Interpretation

## Purpose
Fraud models must be transparent and regulator-friendly. Today we:
1. Generate **transaction-level explanations** (local explainability) using SHAP.
2. Convert explanations into **reason codes** that analysts can review.
3. Identify which features most often cause **false positives** (legit transactions flagged as fraud).

## Outputs
- Notebook: `07_explainability.ipynb`
- Review template: Markdown (can be exported to PDF)

In [1]:
# =========================
# 1) Imports
# =========================
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import shap

from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer



## 2) Load Data (Time-Ordered)
We keep the same time-based split used in prior days to mimic real-world deployment:
- Train = earlier transactions  
- Test  = later transactions  

This avoids leakage and allows us to test stability across time.

In [16]:
# -------------------------
# Config
# -------------------------
DATA_PATH = "../data/processed/cleaned_transactions.csv"
TIMESTAMP_COL = "timestamp"
TARGET_COL = "is_fraud"

# -------------------------
# Load
# -------------------------
df = pd.read_csv(DATA_PATH)
df.columns = df.columns.str.strip()

# Parse timestamp and sort
df[TIMESTAMP_COL] = pd.to_datetime(df[TIMESTAMP_COL], errors="coerce")
df = df.dropna(subset=[TIMESTAMP_COL]).sort_values(TIMESTAMP_COL).reset_index(drop=True)

# -------------------------
# Feature groups
# -------------------------
numeric_features = (
    df.select_dtypes(include=["number"])
      .columns
      .drop([TARGET_COL], errors="ignore")
      .tolist()
)

categorical_features = (
    df.select_dtypes(include=["object", "category", "bool"])
      .columns
      .drop([TARGET_COL, TIMESTAMP_COL], errors="ignore")
      .tolist()
)

# Remove identifier-like columns from both feature groups
# (high-cardinality, not stakeholder-friendly, can cause leakage)
ID_COLS = ["transaction_id", "customer_id", "device_id", "ip_address"]
numeric_features = [c for c in numeric_features if c not in ID_COLS]
categorical_features = [c for c in categorical_features if c not in ID_COLS]

# Build X/y
X = df[numeric_features + categorical_features]
y = df[TARGET_COL].astype(int)

# -------------------------
# Time split (80/20)
# -------------------------
split_index = int(len(df) * 0.8)
X_train_raw, X_test_raw = X.iloc[:split_index], X.iloc[split_index:]
y_train, y_test = y.iloc[:split_index], y.iloc[split_index:]

print("Dropped ID columns:", ID_COLS)
print("Train shape:", X_train_raw.shape, "| Test shape:", X_test_raw.shape)
print("Train fraud rate:", round(y_train.mean(), 4), "| Test fraud rate:", round(y_test.mean(), 4))


Dropped ID columns: ['transaction_id', 'customer_id', 'device_id', 'ip_address']
Train shape: (8832, 22) | Test shape: (2208, 22)
Train fraud rate: 0.0768 | Test fraud rate: 0.1409
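
A quick sanity check on the split can confirm that no test transaction precedes a training transaction. The printed fraud rates also differ noticeably between train (0.0768) and test (0.1409), so the same check compares the two rates. The sketch below uses only the standard library, a hypothetical helper `time_ordered_split`, and toy records; it mirrors the 80/20 positional split rather than reproducing the notebook's DataFrame code.

```python
from datetime import datetime, timedelta

def time_ordered_split(records, train_frac=0.8):
    """Sort records by timestamp and split positionally (hypothetical helper)."""
    ordered = sorted(records, key=lambda r: r["timestamp"])
    cut = int(len(ordered) * train_frac)
    return ordered[:cut], ordered[cut:]

# Toy data: 10 transactions one hour apart, deliberately shuffled.
start = datetime(2024, 1, 1)
toy = [{"timestamp": start + timedelta(hours=h), "is_fraud": int(h % 5 == 0)}
       for h in (3, 0, 9, 1, 7, 2, 8, 4, 6, 5)]

train, test = time_ordered_split(toy)

# Leakage check: the latest training timestamp must not exceed the earliest test timestamp.
latest_train = max(r["timestamp"] for r in train)
earliest_test = min(r["timestamp"] for r in test)
print("No temporal overlap:", latest_train <= earliest_test)

# Fraud-rate drift between the two periods.
train_rate = sum(r["is_fraud"] for r in train) / len(train)
test_rate = sum(r["is_fraud"] for r in test) / len(test)
print("Train fraud rate:", train_rate, "| Test fraud rate:", test_rate)
```

A large gap between the two rates is not a bug in itself, but it is worth recording because it affects how precision and recall on the test period should be read.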


## 3) Train the Model to Explain (Reproducible)
This notebook is separate from Week 2 Day 3, so the trained model objects from that notebook are not available here.
To ensure reproducibility and transparency, we rebuild and refit the selected model pipeline in this notebook before generating explanations.


In [17]:
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.impute import SimpleImputer

numeric_transformer = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler())
])

categorical_transformer = Pipeline([
    ("imputer", SimpleImputer(strategy="most_frequent")),
    ("encoder", OneHotEncoder(handle_unknown="ignore", sparse_output=False))
])

preprocess = ColumnTransformer([
    ("num", numeric_transformer, numeric_features),
    ("cat", categorical_transformer, categorical_features)
], remainder="drop")

## Evaluation helper (optional)
We use the same metric calculation so results are consistent with Day 3.

In [9]:
from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score

def eval_at_threshold(y_true, y_proba, threshold=0.5):
    y_pred = (y_proba >= threshold).astype(int)
    return {
        "threshold": threshold,
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred, zero_division=0),
        "f1": f1_score(y_true, y_pred, zero_division=0),
        "roc_auc": roc_auc_score(y_true, y_proba),
    }
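
To make the threshold logic concrete, here is a hand-computed version of the precision and recall that `eval_at_threshold` reports, using made-up labels and scores. It is an illustrative sketch, not a replacement for the scikit-learn metrics; `confusion_at_threshold` is a hypothetical name.

```python
def confusion_at_threshold(y_true, y_proba, threshold=0.5):
    """Count TP/FP/FN/TN using the same rule as above: flag when proba >= threshold."""
    tp = fp = fn = tn = 0
    for label, p in zip(y_true, y_proba):
        pred = int(p >= threshold)
        if pred == 1 and label == 1:
            tp += 1
        elif pred == 1 and label == 0:
            fp += 1
        elif pred == 0 and label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

# Made-up labels and scores for illustration only.
y_true_toy = [1, 0, 1, 0, 0, 1]
y_proba_toy = [0.9, 0.6, 0.4, 0.2, 0.1, 0.7]

tp, fp, fn, tn = confusion_at_threshold(y_true_toy, y_proba_toy, threshold=0.5)
precision = tp / (tp + fp) if (tp + fp) else 0.0  # mirrors zero_division=0
recall = tp / (tp + fn) if (tp + fn) else 0.0
print(tp, fp, fn, tn, round(precision, 3), round(recall, 3))
```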

## Train the model we will explain (RF + class weights)
We refit the chosen “best” model here so SHAP explanations are generated from an identical pipeline.


In [18]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline

rf_weighted = Pipeline(steps=[
    ("preprocess", preprocess),
    ("model", RandomForestClassifier(
        n_estimators=300,
        random_state=42,
        n_jobs=-1,
        class_weight="balanced_subsample"
    ))
])

rf_weighted.fit(X_train_raw, y_train)

MODEL_PIPELINE = rf_weighted
THRESHOLD = 0.5

print("Model ready:", MODEL_PIPELINE.named_steps["model"].__class__.__name__)

Model ready: RandomForestClassifier


## 4) SHAP Setup — Feature Names After Preprocessing
SHAP explanations must use the **same feature space** that the model was trained on.
Because our pipeline uses OneHotEncoder, we rebuild feature names after preprocessing.


In [19]:
def get_feature_names_from_preprocess(preprocess: ColumnTransformer, numeric_features, categorical_features):
    num_names = list(numeric_features)
    cat_encoder = preprocess.named_transformers_["cat"].named_steps["encoder"]
    cat_names = cat_encoder.get_feature_names_out(categorical_features).tolist()
    return num_names + cat_names
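
One-hot columns such as `location_mismatch_True` are hard for reviewers to read, so it can help to map each encoded column back to its raw feature and sum SHAP values per raw feature. The sketch below assumes the default `OneHotEncoder` naming (`<feature>_<category>`); `raw_feature_for` and `aggregate_by_raw_feature` are hypothetical helpers, and the SHAP numbers are made up.

```python
def raw_feature_for(encoded_name, numeric_features, categorical_features):
    """Return the raw feature an encoded column came from (longest matching prefix wins)."""
    if encoded_name in numeric_features:
        return encoded_name
    matches = [c for c in categorical_features if encoded_name.startswith(c + "_")]
    return max(matches, key=len) if matches else encoded_name

def aggregate_by_raw_feature(encoded_names, shap_row, numeric_features, categorical_features):
    """Sum per-column SHAP values into one contribution per raw feature."""
    totals = {}
    for name, value in zip(encoded_names, shap_row):
        raw = raw_feature_for(name, numeric_features, categorical_features)
        totals[raw] = totals.get(raw, 0.0) + value
    return totals

# Illustrative inputs (not taken from the notebook's results).
num_toy = ["txn_velocity_1h"]
cat_toy = ["kyc_tier", "location_mismatch"]
names_toy = ["txn_velocity_1h", "kyc_tier_low", "kyc_tier_high", "location_mismatch_True"]
shap_toy = [0.10, 0.03, -0.01, 0.02]

totals_toy = aggregate_by_raw_feature(names_toy, shap_toy, num_toy, cat_toy)
print(totals_toy)
```

Summing is reasonable here because SHAP values are additive, so the combined contribution of a categorical feature is the sum over its encoded columns.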


## 5) Build Transaction-Level SHAP Explainer (Local Explanations)
We explain individual transactions in a regulator-friendly way:
- show predicted **fraud probability** (confidence)
- show decision (FLAG/ALLOW) based on threshold
- show top SHAP contributors as **reason codes**


In [20]:
import shap
import numpy as np
import pandas as pd

def make_shap_for_tree_pipeline(trained_pipeline: Pipeline,
                               X_background_raw: pd.DataFrame,
                               X_explain_raw: pd.DataFrame,
                               numeric_features,
                               categorical_features):
    """
    Builds SHAP explanations for a tree model inside a preprocessing pipeline.

    Notes:
    - check_additivity=False skips SHAP's additivity assertion, which can fail
      on small numerical differences; verify additivity separately if needed
    - Uses interventional perturbation with a background dataset
    - Explains in probability space for stakeholder readability
    """
    preprocess = trained_pipeline.named_steps["preprocess"]
    model = trained_pipeline.named_steps["model"]

    feature_names = get_feature_names_from_preprocess(preprocess, numeric_features, categorical_features)

    # Transform background + explain rows into the SAME feature space
    X_bg = preprocess.transform(X_background_raw)
    X_ex = preprocess.transform(X_explain_raw)

    X_bg = X_bg.toarray() if hasattr(X_bg, "toarray") else X_bg
    X_ex = X_ex.toarray() if hasattr(X_ex, "toarray") else X_ex

    X_bg_df = pd.DataFrame(X_bg, columns=feature_names)
    X_ex_df = pd.DataFrame(X_ex, columns=feature_names)

    # ✅ Proper interventional setup + probability output
    explainer = shap.TreeExplainer(
        model,
        data=X_bg_df,
        feature_perturbation="interventional",
        model_output="probability"
    )

    shap_values = explainer.shap_values(X_ex_df, check_additivity=False)

    # Binary classifier output handling
    if isinstance(shap_values, list):
        shap_vals = shap_values[1]  # class 1 = fraud
    else:
        arr = np.array(shap_values)
        shap_vals = arr[:, :, 1] if (arr.ndim == 3 and arr.shape[-1] == 2) else arr

    # Guard against interaction values (n, p, p)
    arr2 = np.array(shap_vals)
    if arr2.ndim == 3 and arr2.shape[1] == arr2.shape[2]:
        raise ValueError("Interaction SHAP detected (n,p,p). Use shap_values, not shap_interaction_values.")

    return explainer, X_ex_df, shap_vals, feature_names
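
Because `check_additivity=False` silences SHAP's built-in consistency check, a manual spot check is worth keeping. For a probability-space explanation, the explainer's base value plus the sum of a row's SHAP values should approximately equal the model's predicted probability for that row. The helper below is a hypothetical sketch with made-up numbers; in the notebook the base value would come from `explainer.expected_value` (which may hold one entry per class) and the prediction from `predict_proba`.

```python
def additivity_gap(base_value, shap_row, predicted_proba):
    """Absolute difference between (base value + sum of SHAP values) and the model output."""
    reconstructed = base_value + sum(shap_row)
    return abs(reconstructed - predicted_proba)

# Made-up numbers for illustration: base value 0.08, three contributions, prediction 0.35.
gap_toy = additivity_gap(0.08, [0.20, 0.10, -0.03], 0.35)
print("Additivity gap:", round(gap_toy, 6))

# The tolerance is an assumption; pick one that suits the model and background size.
TOLERANCE = 1e-3
print("Within tolerance:", gap_toy <= TOLERANCE)
```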

## 6) Analyst View — Convert SHAP into Reason Codes
This function produces a stakeholder-ready explanation for one transaction:
- Fraud probability
- Decision (FLAG/ALLOW)
- Top factors pushing toward fraud (**reason codes**)
- Top factors pushing away from fraud (why it might be legit)


In [21]:
import numpy as np
import pandas as pd

def analyst_view(trained_pipeline: Pipeline,
                 X_background_raw: pd.DataFrame,
                 X_row_raw: pd.DataFrame,
                 numeric_features,
                 categorical_features,
                 threshold=0.5,
                 top_k=8):
    """
    Stakeholder-ready transaction explanation:
    - fraud probability
    - decision using threshold
    - top reason codes (positive SHAP)
    """
    proba = float(trained_pipeline.predict_proba(X_row_raw)[:, 1][0])
    decision = "FLAG (fraud suspected)" if proba >= threshold else "ALLOW (likely legit)"

    _, X_ex_df, shap_vals, feature_names = make_shap_for_tree_pipeline(
        trained_pipeline,
        X_background_raw=X_background_raw,
        X_explain_raw=X_row_raw,
        numeric_features=numeric_features,
        categorical_features=categorical_features
    )

    sv = shap_vals[0]

    contrib = pd.DataFrame({
        "feature": feature_names,
        "feature_value": X_ex_df.iloc[0].values,
        "shap_value": sv,
        "abs_shap": np.abs(sv),
    }).sort_values("abs_shap", ascending=False)

    top = contrib.head(top_k).copy()
    top_pos = top[top["shap_value"] > 0].copy()
    top_neg = top[top["shap_value"] < 0].copy()

    return {
        "fraud_probability": proba,
        "threshold": threshold,
        "decision": decision,
        "top_contributors_all": top,
        "top_positive_reasons": top_pos,
        "top_negative_factors": top_neg,
    }
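
The `top_positive_reasons` table still uses model feature names. One way to turn it into analyst-facing reason codes is a lookup from feature name to a code and a plain-language description. The mapping and the codes (`RC01`, `RC02`, ...) below are invented for illustration; a real catalogue would need to be agreed with compliance and operations.

```python
# Hypothetical reason-code catalogue keyed by model feature name.
REASON_CODES = {
    "txn_velocity_24h": ("RC01", "Unusually high transaction count in the last 24 hours"),
    "txn_velocity_1h": ("RC02", "Burst of transactions in the last hour"),
    "ip_risk_score": ("RC03", "IP address carries an elevated risk score"),
}

def to_reason_codes(contributions, catalogue=REASON_CODES, max_codes=3):
    """Turn (feature, shap_value) pairs into reason codes, strongest positive first."""
    positive = [(f, v) for f, v in contributions if v > 0]
    positive.sort(key=lambda item: item[1], reverse=True)
    codes = []
    for feature, value in positive[:max_codes]:
        # Unmapped features fall back to a generic code so nothing is silently dropped.
        code, text = catalogue.get(feature, ("RC99", f"Model feature '{feature}'"))
        codes.append({"code": code, "description": text, "shap_value": value})
    return codes

# Illustrative input shaped like (feature, shap_value) rows of top_contributors_all.
contrib_toy = [("txn_velocity_24h", 0.11), ("device_trust_score", 0.07),
               ("ip_risk_score", 0.09), ("kyc_tier_high", -0.02)]
for rc in to_reason_codes(contrib_toy):
    print(rc["code"], "-", rc["description"])
```

With the DataFrames returned by `analyst_view`, the input pairs could be built by zipping the `feature` and `shap_value` columns.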

## 7) Example: Explain One Flagged Transaction
We pick a transaction from the test set and generate a local explanation.


In [22]:
# Background sample (reference distribution)
X_bg = X_train_raw.sample(min(500, len(X_train_raw)), random_state=42)

# Predict test probabilities
test_proba = MODEL_PIPELINE.predict_proba(X_test_raw)[:, 1]
pred_flag = (test_proba >= THRESHOLD).astype(int)

flagged_idx = np.where(pred_flag == 1)[0]
if len(flagged_idx) == 0:
    print("No flagged transactions at this threshold.")
else:
    high_conf_idx = flagged_idx[np.argsort(test_proba[flagged_idx])[::-1][:3]]
    borderline_idx = flagged_idx[np.argsort(np.abs(test_proba[flagged_idx] - THRESHOLD))[:3]]

    print("High confidence indices:", high_conf_idx.tolist())
    print("Borderline indices:", borderline_idx.tolist())

    idx = int(high_conf_idx[0])
    row = X_test_raw.iloc[[idx]]

    res = analyst_view(
        trained_pipeline=MODEL_PIPELINE,
        X_background_raw=X_bg,
        X_row_raw=row,
        numeric_features=numeric_features,
        categorical_features=categorical_features,
        threshold=THRESHOLD,
        top_k=10
    )

    print("Decision:", res["decision"])
    print("Fraud probability:", round(res["fraud_probability"], 4))
    display(res["top_positive_reasons"])

High confidence indices: [2030, 364, 1258]
Borderline indices: [1241, 1645, 315]
Decision: FLAG (fraud suspected)
Fraud probability: 1.0


| index | feature | feature_value | shap_value | abs_shap |
|---|---|---|---|---|
| 9 | txn_velocity_24h | 1.828476 | 0.114454 | 0.114454 |
| 8 | txn_velocity_1h | 1.792885 | 0.100015 | 0.100015 |
| 7 | risk_score_internal | 3.221917 | 0.087165 | 0.087165 |
| 3 | ip_risk_score | 2.277841 | 0.087065 | 0.087065 |
| 4 | account_age_days | -0.993465 | 0.067585 | 0.067585 |
| 5 | device_trust_score | -1.545503 | 0.066379 | 0.066379 |
| 6 | chargeback_history_count | 4.07959 | 0.054024 | 0.054024 |
| 11 | amount_src_num | 0.524115 | 0.035543 | 0.035543 |
| 8011 | location_mismatch_True | 1.0 | 0.032933 | 0.032933 |
| 8022 | kyc_tier_low | 1.0 | 0.031923 | 0.031923 |


### Example flagged transaction — reason codes (local SHAP)

For a high-confidence flagged transaction (fraud probability ≈ 1.0), the strongest drivers are listed below. The `feature_value` column shows values after preprocessing, so numeric features are in standardized units (a negative value means below the training mean).

- **High transaction velocity (1h and 24h)**: suggests burst behaviour often associated with fraud rings or account takeover.
- **High risk scores (internal + IP risk)**: suggests the transaction resembles known fraud patterns and has risky network attributes.
- **Low account age**: newer accounts tend to have less trust/history and are more frequently used for fraud.
- **Low device trust + location mismatch**: suggests the transaction is being initiated from an unusual device/location pattern.
- **Low KYC tier**: weaker identity verification increases exposure to fraudulent activity.

These reasons can be surfaced as analyst-facing “reason codes” to support manual review and auditability.
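
Since the planned output is a Markdown review template, the explanation can be rendered into a short Markdown block for the analyst. The layout and the parameters of `render_review_markdown` are assumptions, not a prescribed format, and the example values are placeholders.

```python
def render_review_markdown(txn_ref, probability, threshold, decision, reasons):
    """Render one transaction explanation as a Markdown review snippet.

    reasons: list of (feature, shap_value) pairs, strongest first.
    """
    lines = [
        f"### Transaction review: {txn_ref}",
        "",
        f"- Fraud probability: {probability:.4f}",
        f"- Threshold: {threshold}",
        f"- Decision: {decision}",
        "",
        "| Rank | Feature | SHAP contribution |",
        "|---|---|---|",
    ]
    for rank, (feature, value) in enumerate(reasons, start=1):
        lines.append(f"| {rank} | {feature} | {value:+.4f} |")
    lines += ["", "Analyst decision: [ ] Confirm fraud  [ ] Release  [ ] Escalate"]
    return "\n".join(lines)

# Placeholder values for illustration only.
review_md = render_review_markdown(
    txn_ref="test-row-0",
    probability=0.97,
    threshold=0.5,
    decision="FLAG (fraud suspected)",
    reasons=[("txn_velocity_24h", 0.1145), ("txn_velocity_1h", 0.1000)],
)
print(review_md)
```

Keeping the signed SHAP value next to each feature lets a reviewer see both the direction and the relative size of each contribution.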

In [23]:
# =========================
# False Positive Driver Analysis (SHAP on false positives)
# =========================
# Goal:
#   Identify which features most often push LEGIT transactions (y=0)
#   into being flagged as FRAUD (pred=1).
#
# Output:
#   A ranked table of features by mean(|SHAP|) across false positives.
#   This supports your reflection question for Day 4.
# =========================

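# Seed for background sampling; defined before the function so it does not
# depend on a later cell (otherwise the first non-empty FP run raises NameError).
RANDOM_STATE = 42

def false_positive_driver_analysis(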
    trained_pipeline: Pipeline,
    X_train_raw: pd.DataFrame,
    X_test_raw: pd.DataFrame,
    y_test: pd.Series,
    numeric_features,
    categorical_features,
    threshold=0.5,
    top_k=15,
    background_size=500
):
    # 1) Predict fraud probabilities on the test set
    y_proba = trained_pipeline.predict_proba(X_test_raw)[:, 1]

    # 2) Convert probabilities → predicted labels using chosen threshold
    y_pred = (y_proba >= threshold).astype(int)

    # 3) False positives = predicted fraud (1) but actually legit (0)
    fp_mask = (y_pred == 1) & (y_test.values == 0)
    fp_count = int(fp_mask.sum())

    print(f"Threshold = {threshold}")
    print(f"False positives found: {fp_count}")

    # If no false positives, return early
    if fp_count == 0:
        return None, fp_mask

    # Extract the false positive rows
    X_fp = X_test_raw.loc[fp_mask]

    # 4) Choose background data for SHAP (reference distribution)
    #    Keep it moderate for speed and stability.
    X_bg = X_train_raw.sample(min(background_size, len(X_train_raw)), random_state=RANDOM_STATE)

    # 5) Compute SHAP values on false positives only
    _, X_fp_df, shap_vals_fp, feature_names = make_shap_for_tree_pipeline(
        trained_pipeline=trained_pipeline,
        X_background_raw=X_bg,
        X_explain_raw=X_fp,
        numeric_features=numeric_features,
        categorical_features=categorical_features
    )

    # 6) Aggregate: mean absolute SHAP across all FP cases
    mean_abs = np.abs(shap_vals_fp).mean(axis=0)

    fp_drivers = (
        pd.DataFrame({"feature": feature_names, "mean_abs_shap": mean_abs})
        .sort_values("mean_abs_shap", ascending=False)
        .head(top_k)
        .reset_index(drop=True)
    )

    return fp_drivers, fp_mask


# ===== Run FP analysis =====
fp_drivers, fp_mask = false_positive_driver_analysis(
    trained_pipeline=MODEL_PIPELINE,
    X_train_raw=X_train_raw,
    X_test_raw=X_test_raw,
    y_test=y_test,
    numeric_features=numeric_features,
    categorical_features=categorical_features,
    threshold=THRESHOLD,  # start at 0.5; if no false positives appear, try lower thresholds
    top_k=15
)

display(fp_drivers)

Threshold = 0.5
False positives found: 0


None

In [25]:
RANDOM_STATE = 42

for t in [0.4, 0.3, 0.2, 0.1]:
    fp_drivers, fp_mask = false_positive_driver_analysis(
        trained_pipeline=MODEL_PIPELINE,
        X_train_raw=X_train_raw,
        X_test_raw=X_test_raw,
        y_test=y_test,
        numeric_features=numeric_features,
        categorical_features=categorical_features,
        threshold=t,
        top_k=10
    )
    if fp_drivers is not None:
        print("\nTop FP drivers at threshold", t)
        display(fp_drivers.head(5))
        break

Threshold = 0.4
False positives found: 0
Threshold = 0.3
False positives found: 2

Top FP drivers at threshold 0.3


| index | feature | mean_abs_shap |
|---|---|---|
| 0 | account_age_days | 0.047683 |
| 1 | amount_src_num | 0.036124 |
| 2 | amount_usd_num | 0.035883 |
| 3 | fee | 0.03234 |
| 4 | amount_usd | 0.031099 |


### Reflection — Which features most often lead to false positives and why?

At the default operating threshold (0.5), the model produced **0 false positives**, meaning no legitimate transactions were incorrectly flagged in the test set.  
To understand false-positive risk under a more aggressive policy, we lowered the threshold. At **threshold = 0.3**, we observed **2 false positives**.

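Ranked by mean |SHAP| across these two cases, the top drivers were:
- **account_age_days**
- **amount_src_num / amount_usd_num / amount_usd**
- **fee**

With only two false positives, this ranking is indicative rather than statistically reliable.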

**Interpretation (business context):**
These features are strongly associated with fraud patterns (new accounts and high-value transfers). However, they also occur in legitimate scenarios such as new customers making first-time transfers, urgent family support payments, tuition/rent payments, or other high-value remittances.  
This explains why lowering the threshold increases the likelihood of false positives: the model becomes more sensitive and begins to treat legitimate “high-risk-looking” behaviour as fraud.
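
The trade-off described above can be made visible with a simple threshold sweep that counts false positives and caught fraud at each cut-off. This is a standard-library sketch with made-up scores; `sweep_thresholds` is a hypothetical helper, and with the notebook's arrays the inputs would be `y_test` and `test_proba`.

```python
def sweep_thresholds(y_true, y_proba, thresholds):
    """Return (threshold, false_positives, true_positives) for each threshold."""
    rows = []
    for t in thresholds:
        fp = sum(1 for y, p in zip(y_true, y_proba) if p >= t and y == 0)
        tp = sum(1 for y, p in zip(y_true, y_proba) if p >= t and y == 1)
        rows.append((t, fp, tp))
    return rows

# Made-up labels and scores for illustration only.
y_true_toy = [0, 0, 0, 1, 1, 0, 1]
y_proba_toy = [0.05, 0.35, 0.45, 0.55, 0.90, 0.15, 0.32]

sweep = sweep_thresholds(y_true_toy, y_proba_toy, [0.5, 0.4, 0.3])
for t, fp, tp in sweep:
    print(f"threshold={t}: false positives={fp}, fraud caught={tp}")
```

Reading the two counts side by side shows what each extra false positive buys in additional fraud caught, which is the comparison a threshold policy decision needs.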