# Credit Card Fraud Detection API
API walkthrough covering data preparation, anomaly detectors (Isolation Forest, autoencoder), supervised models (Logistic Regression, RandomForest, XGBoost, CatBoost), ensemble construction, threshold tuning, and What-If Tool (WIT) wiring.


This notebook demonstrates the API for the Credit Card Fraud Detection project.

- **Description of what the notebook does:** Demonstrates the reusable API functions that load, preprocess, train, and evaluate fraud detectors (anomaly and supervised) and wire up WIT for inspection.
- **Point to references:** See [`WIT.API.md`](./WIT.API.md) for detailed API documentation and explanations of each function.
- **Citations:** Dataset source: [Kaggle Credit Card Fraud Detection Dataset](https://www.kaggle.com/mlg-ulb/creditcardfraud).
- **Notebook flow:** Load → Preprocess → Train → Evaluate → Visualize (WIT).
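- **Comments:** Code cells carry inline comments where a step is not self-explanatory (for example, the validation split and the WIT sample).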


## Notebook outline
- Section 1: Setup and imports
- Section 2: Loading, cleaning, and engineering features
- Section 3: Train/validation/test split and scaling
- Section 4: SMOTE-Tomek for class imbalance
- Section 5: Anomaly models (Isolation Forest, autoencoder)
- Section 6: Supervised models (LogReg, RandomForest, XGBoost, CatBoost)
- Section 7: Soft-voting ensemble and validation threshold tuning
- Section 8: WIT configuration on a held-out sample

This notebook is intentionally API-centric: it shows how to call the utilities rather than plotting every diagnostic.

## Goals
- Load and clean the Kaggle credit card data
- Engineer features, scale safely, and stratify splits with a validation fold
- Train anomaly models, supervised baselines, and gradient-boosted models
- Build a weighted soft-voting ensemble and tune decision thresholds on validation
- Expose a WIT widget for decision-boundary inspection and what-if analysis


## Setup

In [1]:
```python
import logging
from pprint import pprint

import pandas as pd

from WIT_utils import (
    load_raw_data,
    clean_data,
    engineer_features,
    split_features_target,
    scale_features,
    balance_with_smote_tomek,
    train_isolation_forest,
    predict_isolation_forest,
    train_autoencoder,
    predict_autoencoder,
    train_supervised_models,
    build_soft_voting_ensemble,
    build_predict_fn,
    build_wit_widget,
    evaluate_binary_classification,
    optimize_threshold,
)

logging.basicConfig(level=logging.INFO)
```


## Configuration

In [2]:

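```python
N_SAMPLE = 100000  # rows to read via nrows; set to None for the full dataset
VAL_SIZE = 0.15    # fraction of the training split held out as the validation fold
SEED = 42          # random seed shared by the validation split and the WIT sample
```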


## Load, clean, and engineer features

In [3]:

```python
raw_df = load_raw_data(nrows=N_SAMPLE)
df = clean_data(engineer_features(raw_df))
print(df.shape)
print(df['Class'].value_counts(normalize=True).rename('fraud_ratio'))
df.head()
```


```
2025-12-11 21:30:52,098 - INFO - Loading dataset from data/raw/creditcard.csv
2025-12-11 21:30:52,564 - INFO - Loaded 100000 rows and 31 columns
2025-12-11 21:30:52,709 - INFO - Dropped 381 duplicate rows
(99619, 34)
Class
0    0.997761
1    0.002239
Name: fraud_ratio, dtype: float64
```


| | Time | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | ... | V24 | V25 | V26 | V27 | V28 | Amount | Class | Hour | Amount_log1p | Amount_per_hour |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | -1.359807 | -0.072781 | 2.536347 | 1.378155 | -0.338321 | 0.462388 | 0.239599 | 0.098698 | 0.363787 | ... | 0.066928 | 0.128539 | -0.189115 | 0.133558 | -0.021053 | 149.62 | 0 | 0 | 5.01476 | 149.62 |
| 1 | 0 | 1.191857 | 0.266151 | 0.16648 | 0.448154 | 0.060018 | -0.082361 | -0.078803 | 0.085102 | -0.255425 | ... | -0.339846 | 0.16717 | 0.125895 | -0.008983 | 0.014724 | 2.69 | 0 | 0 | 1.305626 | 2.69 |
| 2 | 1 | -1.358354 | -1.340163 | 1.773209 | 0.37978 | -0.503198 | 1.800499 | 0.791461 | 0.247676 | -1.514654 | ... | -0.689281 | -0.327642 | -0.139097 | -0.055353 | -0.059752 | 378.66 | 0 | 0 | 5.939276 | 378.66 |
| 3 | 1 | -0.966272 | -0.185226 | 1.792993 | -0.863291 | -0.010309 | 1.247203 | 0.237609 | 0.377436 | -1.387024 | ... | -1.175575 | 0.647376 | -0.221929 | 0.062723 | 0.061458 | 123.5 | 0 | 0 | 4.824306 | 123.5 |
| 4 | 2 | -1.158233 | 0.877737 | 1.548718 | 0.403034 | -0.407193 | 0.095921 | 0.592941 | -0.270533 | 0.817739 | ... | 0.141267 | -0.20601 | 0.502292 | 0.219422 | 0.215153 | 69.99 | 0 | 0 | 4.262539 | 69.99 |
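All three engineered columns come from `Time` and `Amount`. The helper below is a hypothetical reconstruction that matches the five rows shown, since `engineer_features` itself is not shown. `Amount_log1p` is clearly `log1p(Amount)`. The hour-of-day formula and the `Amount / (Hour + 1)` definition of `Amount_per_hour` are assumptions; the sample rows all have `Hour = 0`, so they cannot confirm either one.

```python
import math


def engineer_row(time_s: float, amount: float) -> dict:
    """Hypothetical re-creation of the three engineered columns for one transaction.

    Assumed definitions (not taken from WIT_utils):
      Hour            = hour of day, from `Time` in seconds since the first transaction
      Amount_per_hour = Amount / (Hour + 1), the +1 avoiding division by zero at hour 0
    """
    hour = int(time_s // 3600) % 24
    return {
        "Hour": hour,
        "Amount_log1p": math.log1p(amount),  # log(1 + Amount) compresses the heavy right tail
        "Amount_per_hour": amount / (hour + 1),
    }


print(engineer_row(0, 149.62))  # first table row: Time=0, Amount=149.62
print(engineer_row(2, 69.99))   # fifth table row: Time=2, Amount=69.99
```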


## Split, scale, and create validation fold

In [4]:

```python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = split_features_target(df)
X_train_s, X_test_s, scaler = scale_features(X_train, X_test)

# Carve a stratified validation fold out of the scaled training split; the test set stays untouched.
# Caveat: the scaler was fit on all of X_train, so the validation rows contributed to its mean/variance.
X_tr, X_val, y_tr, y_val = train_test_split(
    X_train_s, y_train, test_size=VAL_SIZE, stratify=y_train, random_state=SEED
)
print("Train/Val/Test sizes:", len(X_tr), len(X_val), len(X_test_s))
```


```
2025-12-11 21:30:52,781 - INFO - Train/Test split: 79695/19924 rows (fraud ratio train=0.0022, test=0.0023)
2025-12-11 21:30:52,807 - INFO - Scaled features with StandardScaler
Train/Val/Test sizes: 67740 11955 19924
```
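The validation fold above is carved out after scaling, so the scaler's statistics include the validation rows. A leak-free ordering splits first and fits the scaler on the training fold alone. The sketch below shows that ordering with a hand-rolled stratified split and standardizer in plain Python. `stratified_split` and `Standardizer` are illustrative stand-ins, not `WIT_utils` functions. In the notebook you would keep `train_test_split(..., stratify=...)` and scikit-learn's `StandardScaler` and only change the order of the calls.

```python
import random
import statistics


def stratified_split(rows, labels, test_size, seed):
    """Split each class separately so both parts keep (approximately) the original class ratio."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_size))  # hold out at least one example of every class
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]
    X_train = [rows[i] for i in train_idx]
    X_test = [rows[i] for i in test_idx]
    y_train = [labels[i] for i in train_idx]
    y_test = [labels[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


class Standardizer:
    """Per-column z-scoring whose mean/std come only from the rows passed to fit()."""

    def fit(self, rows):
        columns = list(zip(*rows))
        self.mean = [statistics.fmean(c) for c in columns]
        self.std = [statistics.pstdev(c) or 1.0 for c in columns]  # avoid dividing by zero on constant columns
        return self

    def transform(self, rows):
        return [[(v - m) / s for v, m, s in zip(r, self.mean, self.std)] for r in rows]


# Toy data: two features, one "fraud" in every ten rows
X = [[float(i), float(i % 7)] for i in range(100)]
y = [1 if i % 10 == 0 else 0 for i in range(100)]

# 1) split first ...
X_tr, X_val, y_tr, y_val = stratified_split(X, y, test_size=0.15, seed=42)
# 2) ... then fit the scaler on the training fold only and reuse it for validation
scaler = Standardizer().fit(X_tr)
X_tr_s, X_val_s = scaler.transform(X_tr), scaler.transform(X_val)
print(len(X_tr), len(X_val), sum(y_tr), sum(y_val))
```

The same ordering applies to the notebook: call `train_test_split` on the unscaled `X_train`, fit the scaler on `X_tr`, then transform `X_val` and `X_test`.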


## Balance training data with SMOTE-Tomek

In [5]:

```python
X_bal, y_bal, sampler = balance_with_smote_tomek(X_tr, y_tr)
print("Balanced train shape:", X_bal.shape, "fraud ratio=", y_bal.mean())
```


```
2025-12-11 21:30:58,821 - INFO - After SMOTE-Tomek: 135178 rows (fraud ratio=0.5000)
Balanced train shape: (135178, 33) fraud ratio= 0.5
```
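SMOTE creates each synthetic fraud row by interpolating between a fraud sample and one of its nearest fraud neighbors. Tomek links are opposite-class pairs that are each other's nearest neighbor; removing them cleans the boundary overlap that synthesis can create. The brute-force sketch below illustrates both steps on tiny 2-D data and drops both members of each link. It is not the `imbalanced-learn` implementation that `balance_with_smote_tomek` presumably wraps; that is an assumption, because the helper's internals are not shown.

```python
import math
import random


def k_nearest(point, candidates, k):
    """The k candidates closest to `point` (Euclidean), excluding `point` itself (brute force)."""
    others = [c for c in candidates if c is not point]
    return sorted(others, key=lambda c: math.dist(point, c))[:k]


def smote(minority, n_new, k=3, seed=0):
    """Synthesize n_new minority rows on segments between a sample and one of its k nearest neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbor = rng.choice(k_nearest(base, minority, k))
        gap = rng.random()  # random position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


def remove_tomek_links(X, y):
    """Drop both points of every opposite-class pair that are each other's nearest neighbor."""
    nn = [min((j for j in range(len(X)) if j != i), key=lambda j: math.dist(X[i], X[j]))
          for i in range(len(X))]
    drop = {i for i in range(len(X)) if nn[nn[i]] == i and y[i] != y[nn[i]]}
    return [x for i, x in enumerate(X) if i not in drop], [c for i, c in enumerate(y) if i not in drop]


# Toy 2-D data: the legitimate point [2.0, 2.0] sits right next to the fraud point [2.1, 2.0]
X = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [0.3, 0.3], [2.0, 2.0],
     [2.1, 2.0], [2.5, 2.4], [2.3, 2.6]]
y = [0, 0, 0, 0, 0, 1, 1, 1]

fraud = [x for x, label in zip(X, y) if label == 1]
synthetic = smote(fraud, n_new=2)                     # step 1: oversample the minority class
X_res, y_res = X + synthetic, y + [1] * len(synthetic)
X_bal, y_bal = remove_tomek_links(X_res, y_res)       # step 2: clean boundary overlap
print(len(X_res), "rows after SMOTE,", len(X_bal), "after Tomek cleaning")
```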


## Anomaly model: Isolation Forest

In [6]:

```python
iso_model = train_isolation_forest(X_tr)
val_pred, val_scores = predict_isolation_forest(iso_model, X_val)
val_metrics = evaluate_binary_classification(y_val, val_pred, val_scores)
test_pred, test_scores = predict_isolation_forest(iso_model, X_test_s)
test_metrics = evaluate_binary_classification(y_test, test_pred, test_scores)
print("Validation:")
pprint({k: v for k, v in val_metrics.items() if k not in ['classification_report', 'confusion_matrix']})
print("Confusion matrix (val):", val_metrics['confusion_matrix'])
print("Test:")
pprint({k: v for k, v in test_metrics.items() if k not in ['classification_report', 'confusion_matrix']})
print("Confusion matrix (test):", test_metrics['confusion_matrix'])
```


```
2025-12-11 21:30:59,867 - INFO - Trained Isolation Forest (contamination=0.00172)
Validation:
{'f1': 0.25,
 'pr_auc': 0.2311338084174197,
 'precision': 0.38461538461538464,
 'recall': 0.18518518518518517,
 'roc_auc': 0.9730792160369625}
Confusion matrix (val): [[11920     8]
 [   22     5]]
Test:
{'f1': 0.2558139534883721,
 'pr_auc': 0.1733418876790203,
 'precision': 0.2682926829268293,
 'recall': 0.24444444444444444,
 'roc_auc': 0.9393799151533444}
Confusion matrix (test): [[19849    30]
 [   34    11]]
```
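Precision, recall, and F1 follow directly from the printed confusion matrices. Their rows are the true classes and their columns the predicted classes, i.e. `[[TN, FP], [FN, TP]]`; the row sums match the 11,928 legitimate and 27 fraud validation rows. Recomputing the Isolation Forest validation numbers by hand is a quick sanity check. It also shows why F1 is low despite a ROC-AUC of 0.97: only 5 of the 27 validation frauds are flagged. The helper name is illustrative.

```python
def metrics_from_confusion(cm):
    """Precision, recall and F1 from a 2x2 confusion matrix laid out as [[TN, FP], [FN, TP]]."""
    (_, fp), (fn, tp) = cm
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0  # equals the harmonic mean of precision and recall
    return {"precision": precision, "recall": recall, "f1": f1}


# Isolation Forest, validation fold (from the output above):
# precision = 5/13, recall = 5/27, f1 = 2*5 / (2*5 + 8 + 22) = 0.25
print(metrics_from_confusion([[11920, 8], [22, 5]]))
```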


## Anomaly model: Autoencoder

In [7]:

```python
ae_model, ae_thresh, _ = train_autoencoder(X_tr.values, y_tr)
ae_pred_val, ae_errors_val = predict_autoencoder(ae_model, X_val.values, ae_thresh)
ae_val_metrics = evaluate_binary_classification(y_val, ae_pred_val, ae_errors_val)
ae_pred_test, ae_errors_test = predict_autoencoder(ae_model, X_test_s.values, ae_thresh)
ae_test_metrics = evaluate_binary_classification(y_test, ae_pred_test, ae_errors_test)
print("Validation:")
pprint({k: v for k, v in ae_val_metrics.items() if k not in ['classification_report', 'confusion_matrix']})
print("Confusion matrix (val):", ae_val_metrics['confusion_matrix'])
print("Test:")
pprint({k: v for k, v in ae_test_metrics.items() if k not in ['classification_report', 'confusion_matrix']})
print("Confusion matrix (test):", ae_test_metrics['confusion_matrix'])
```


```
2025-12-11 21:31:00,550 - INFO - Autoencoder training restricted to normal class (n=67589)
2025-12-11 21:31:03,480 - INFO - Autoencoder trained; anomaly threshold set at 7.003267
Validation:
{'f1': 0.13513513513513514,
 'pr_auc': 0.09160251776799921,
 'precision': 0.10638297872340426,
 'recall': 0.18518518518518517,
 'roc_auc': 0.9729208584842388}
Confusion matrix (val): [[11886    42]
 [   22     5]]
Test:
{'f1': 0.1125,
 'pr_auc': 0.060747705199871674,
 'precision': 0.0782608695652174,
 'recall': 0.2,
 'roc_auc': 0.9361436691986518}
Confusion matrix (test): [[19773   106]
 [   36     9]]
```
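The autoencoder is trained on normal rows only (n=67589) and flags a transaction when its reconstruction error exceeds `ae_thresh`, which was 7.003267 in this run. How `train_autoencoder` picks that value is not shown. A common choice, assumed here, is a high percentile of the reconstruction errors on the normal training rows. The sketch uses per-row mean squared error and a nearest-rank percentile. A hand-made "reconstruction" stands in for a trained network, and all helper names are hypothetical.

```python
import math


def reconstruction_errors(originals, reconstructions):
    """Per-row mean squared error between each input and its reconstruction."""
    return [
        sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
        for x, x_hat in zip(originals, reconstructions)
    ]


def percentile_threshold(errors, pct):
    """Nearest-rank percentile: smallest error with at least pct% of all errors at or below it."""
    ranked = sorted(errors)
    rank = max(1, math.ceil(pct / 100 * len(ranked)))
    return ranked[rank - 1]


def flag_anomalies(errors, threshold):
    """1 = fraud candidate (error above the threshold), 0 = looks normal."""
    return [int(e > threshold) for e in errors]


# Stand-in for the autoencoder: normal training rows are reconstructed almost perfectly
X_normal = [[0.1 * i, -0.1 * i] for i in range(20)]
X_normal_hat = [[v + 0.01 for v in row] for row in X_normal]
threshold = percentile_threshold(reconstruction_errors(X_normal, X_normal_hat), pct=95)

# New rows: the first is reconstructed exactly, the second very poorly
X_new = [[0.5, -0.5], [3.0, 3.0]]
X_new_hat = [[0.5, -0.5], [0.0, 0.0]]
print(flag_anomalies(reconstruction_errors(X_new, X_new_hat), threshold))  # -> [0, 1]
```

Fitting the threshold on normal rows only is what makes this a novelty detector: frauds never shape what "normal" reconstruction error looks like.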


## Supervised models

In [8]:

```python
models = train_supervised_models(X_bal, y_bal)
supervised_val = {}
for name, model in models.items():
    # Score each model on the untouched validation fold at the default 0.5 cut-off
    proba_val = model.predict_proba(X_val)[:, 1]
    pred_val = (proba_val >= 0.5).astype(int)
    supervised_val[name] = evaluate_binary_classification(y_val, pred_val, proba_val)
    print(f"{name} validation f1={supervised_val[name]['f1']:.4f} roc_auc={supervised_val[name]['roc_auc']:.4f}")
```


```
log_reg validation f1=0.1943 roc_auc=0.9330
random_forest validation f1=0.8889 roc_auc=0.9945
xgboost validation f1=0.8889 roc_auc=0.9911
catboost validation f1=0.7869 roc_auc=0.9839
```
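ROC-AUC is high for every model in this notebook (0.93 to 0.99). With roughly 0.2% fraud, however, it is dominated by the huge pool of easy negatives, and the anomaly models' `pr_auc` values (0.06 to 0.23) tell a very different story. Average precision is one common summary of the precision-recall curve. In the minimal sketch below, each fraud contributes the precision at its rank, so a few false positives ranked above a fraud cost a lot. Whether `evaluate_binary_classification` computes `pr_auc` this way (for example with scikit-learn's `average_precision_score`) is an assumption.

```python
def average_precision(y_true, scores):
    """Mean of precision@k over the ranks k at which a positive appears (ties keep input order)."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(y_true)
    hits, total = 0, 0.0
    for k, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / k  # precision at this cut-off
    return total / n_pos if n_pos else 0.0


def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# 1 fraud among 1,000 rows; the fraud is out-ranked by only 4 legitimate rows
y = [1] + [0] * 999
scores = [0.9] + [0.95] * 4 + [0.1] * 995
print(f"ROC-AUC={roc_auc(y, scores):.4f}  AP={average_precision(y, scores):.4f}")
# -> ROC-AUC=0.9960  AP=0.2000
```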


## Weighted soft-voting ensemble with threshold tuning

In [9]:

```python
# Fit the weighted soft-voting ensemble on the balanced training data
ensemble = build_soft_voting_ensemble(models)
ensemble.fit(X_bal, y_bal)

# Choose the decision threshold on the validation fold only
proba_val = ensemble.predict_proba(X_val)[:, 1]
best_threshold, best_stats = optimize_threshold(y_val.values, proba_val)
print("Best validation threshold:", best_threshold)
print("Validation precision/recall/f1 at best threshold:", best_stats)

# Apply the frozen threshold to the untouched test set
proba_test = ensemble.predict_proba(X_test_s)[:, 1]
pred_test = (proba_test >= best_threshold).astype(int)
ensemble_metrics = evaluate_binary_classification(y_test, pred_test, proba_test)
print("Ensemble test metrics:")
pprint({k: v for k, v in ensemble_metrics.items() if k not in ['classification_report', 'confusion_matrix']})
print("Confusion matrix (test):", ensemble_metrics['confusion_matrix'])
```


```
Best validation threshold: 0.7711071100177596
Validation precision/recall/f1 at best threshold: {'best_precision': 0.96, 'best_recall': 0.8888888888888888, 'best_f1': 0.9230769225776627}
Ensemble test metrics:
{'f1': 0.8536585365853658,
 'pr_auc': 0.8601393578483686,
 'precision': 0.9459459459459459,
 'recall': 0.7777777777777778,
 'roc_auc': 0.969294230092057}
Confusion matrix (test): [[19877     2]
 [   10    35]]
```
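Soft voting averages each model's fraud probability, optionally with weights. `optimize_threshold` then searches for the cut-off that maximizes validation F1, which was about 0.771 in this run. The pure-Python sketch below shows both steps. The weights, the candidate grid (every distinct predicted probability), the toy scores, and the function names are assumptions. `build_soft_voting_ensemble` may instead wrap scikit-learn's `VotingClassifier(voting="soft", weights=...)`.

```python
def soft_vote(prob_lists, weights):
    """Weighted average of per-model fraud probabilities, row by row."""
    total = sum(weights)
    return [sum(w * p for w, p in zip(weights, row)) / total for row in zip(*prob_lists)]


def best_f1_threshold(y_true, proba):
    """Try every distinct probability as a threshold; keep the lowest one reaching the highest F1."""
    best = (0.5, 0.0)
    for t in sorted(set(proba)):
        tp = sum(1 for y, p in zip(y_true, proba) if p >= t and y == 1)
        fp = sum(1 for y, p in zip(y_true, proba) if p >= t and y == 0)
        fn = sum(1 for y, p in zip(y_true, proba) if p < t and y == 1)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best[1]:
            best = (t, f1)
    return best


# Three hypothetical models scoring six validation rows (two frauds)
y_val = [0, 0, 1, 0, 1, 0]
rf = [0.10, 0.20, 0.90, 0.70, 0.80, 0.05]
xgb = [0.05, 0.30, 0.85, 0.60, 0.95, 0.10]
lr = [0.40, 0.60, 0.70, 0.90, 0.60, 0.30]
proba = soft_vote([rf, xgb, lr], weights=[2, 2, 1])  # down-weight the weaker linear model
threshold, f1 = best_f1_threshold(y_val, proba)
print(f"threshold={threshold:.3f}  validation F1={f1:.3f}")  # -> threshold=0.820  validation F1=1.000
```

As in the notebook, the threshold is chosen on validation data and then frozen; re-tuning it on the test set would inflate the reported test metrics.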


## WIT integration

In [10]:

```python
# Build the WIT widget on a 400-row sample of the (scaled) test set, scored by the ensemble
sample_df = pd.concat(
    [X_test_s.reset_index(drop=True), y_test.reset_index(drop=True)], axis=1
).sample(400, random_state=SEED)
feature_cols = [c for c in sample_df.columns if c != 'Class']
predict_fn = build_predict_fn(ensemble, feature_cols)
wit = build_wit_widget(sample_df, feature_cols, predict_fn, target_col='Class')
wit
```


```
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
 1. Downgrade the protobuf package to 3.20.x or lower.
 2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).

More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates.
Install with `pip install witwidget ipywidgets==7.* ipython<9 tensorflow==2.13.0` or on Apple Silicon use `tensorflow-macos==2.13.0` with an ARM Python.
```

The widget did not render in this run. The installed `protobuf` package rejected outdated generated `_pb2` code in the WIT/TensorFlow stack. To fix this, install the pinned versions from the hint or apply one of the listed workarounds.
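WIT calls a custom prediction function with a batch of examples and expects one list of class scores back per example. In the `witwidget` package such a function is registered with `WitConfigBuilder(...).set_custom_predict_fn(...)`. The sketch below shows the kind of adapter `build_predict_fn` presumably returns; its internals are not shown. Three details are assumptions: each example arrives as a plain list of values ordered like the sample's columns (target included), the adapter drops the target, and scores come back in `[P(legit), P(fraud)]` order. A stub model stands in for the ensemble so the adapter runs without TensorFlow or `witwidget`.

```python
from typing import Callable, List, Sequence


def make_predict_fn(model, example_cols: Sequence[str], feature_cols: Sequence[str]
                    ) -> Callable[[List[list]], List[List[float]]]:
    """Hypothetical adapter from WIT examples to per-example class scores.

    Assumes each example is a list of values ordered like `example_cols` (features plus
    the target column) and that model.predict_proba returns [P(legit), P(fraud)] per row,
    as scikit-learn binary classifiers do.
    """
    positions = [list(example_cols).index(c) for c in feature_cols]

    def predict_fn(examples: List[list]) -> List[List[float]]:
        # Keep only model features, in training order; the target column is dropped
        matrix = [[ex[i] for i in positions] for ex in examples]
        return [[float(p0), float(p1)] for p0, p1 in model.predict_proba(matrix)]

    return predict_fn


class StubModel:
    """Stand-in for the fitted ensemble: fraud probability grows with the first feature."""

    def predict_proba(self, matrix):
        return [[1.0 - min(1.0, row[0] / 10), min(1.0, row[0] / 10)] for row in matrix]


example_cols = ["Amount_log1p", "Hour", "Class"]
feature_cols = ["Amount_log1p", "Hour"]
predict_fn = make_predict_fn(StubModel(), example_cols, feature_cols)
print(predict_fn([[5.0, 0, 0], [12.0, 3, 1]]))  # -> [[0.5, 0.5], [0.0, 1.0]]
```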


## Technical Notes
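- **Stratified splitting**: the train/validation/test splits preserve the fraud ratio (about 0.22% in this sample), so every split contains enough frauds for its metrics to be meaningful and comparable.
- **Scaling discipline**: the scaler is fit on the training split only and applied to the test set with `.transform`, so test data never influences preprocessing. The validation fold is carved out after scaling, so its rows do contribute to the scaler's mean and variance; splitting before scaling removes this minor leak.
- **Imbalance handling**: SMOTE synthesizes fraud samples by interpolating between minority neighbors, and Tomek-link removal then drops overlapping cross-class pairs near the decision boundary. Resampling is applied to the training fold only; validation and test keep the real class ratio.
- **Anomaly vs supervised**: Isolation Forest is fully unsupervised, while the autoencoder is trained on normal transactions only (labels are used just to select them). Both serve as baselines and are not part of the ensemble. The supervised models train on the balanced data to improve recall on the fraud class.
- **Threshold tuning**: `optimize_threshold` maximizes F1 on the validation fold (best threshold about 0.771 in this run). That threshold is then applied unchanged to the test set, mimicking production gating.
- **Interpretability**: WIT runs on a 400-row held-out test sample with the ensemble's predict function. Because the sample comes from the scaled `X_test_s`, features such as `Amount_log1p` and `Hour` appear in standardized units. Edit feature values and use the confusion-matrix view to inspect false positives and false negatives.
- **Reproducibility**: `SEED` fixes the validation split and the WIT sample. In the recorded run the WIT cell hit a protobuf incompatibility, so rendering the widget requires the pinned stack from the install hint (`witwidget`, `ipywidgets==7.*`, `ipython<9`, `tensorflow==2.13.0`).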
