# Credit Card Fraud Detection with a PyTorch MLP

This notebook is a **portfolio-style** example of building a **binary classifier** for fraud detection using a
**multi-layer perceptron (MLP)** in **PyTorch**.

It includes:
- data loading and basic cleaning (pandas)
- train/validation/test splits (stratified)
- feature scaling (scikit-learn)
- an MLP with BatchNorm + Dropout
- training with **SGD** vs **Adam**
- evaluation with precision/recall/F1


## Setup
If you're running locally and missing packages, uncomment the install cell.


In [None]:
# Uncomment if needed:
# !pip install -q pandas numpy matplotlib scikit-learn seaborn torch


In [None]:
import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report, confusion_matrix

import matplotlib.pyplot as plt
import seaborn as sns

import torch
import torch.nn as nn
from torch.optim import SGD, Adam
from torch.utils.data import TensorDataset, DataLoader

np.random.seed(42)
torch.manual_seed(42)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device


## Data access
You can load the CSV from disk, or download it from Kaggle.

### Option A — Local file
Place the CSV in the same folder as the notebook and set `file_path` accordingly.

### Option B — Kaggle download (optional)
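For a reproducible download step, you can use the Kaggle API. This requires a Kaggle API token, which the CLI reads from `~/.kaggle/kaggle.json` (or from the `KAGGLE_USERNAME` / `KAGGLE_KEY` environment variables).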


In [None]:
# Optional Kaggle download (requires credentials configured in your environment)
# !pip install -q kaggle
# !kaggle datasets download -d gzdekzlkaya/credit-card-fraud-detection-dataset
# !unzip -o "credit-card-fraud-detection-dataset.zip"


In [None]:
file_path = "creditcard_fraud_detection.csv"  # update if needed
df = pd.read_csv(file_path)
df.head()


In [None]:
df.info()
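
The overview lists basic cleaning, and `df.info()` only reports dtypes and non-null counts. A minimal cleaning pass could drop rows with a missing label and exact duplicate rows before any split, so the same transaction cannot land in both train and test. This is a sketch: `basic_clean` is a hypothetical helper, and whether duplicates should be dropped depends on how the data was collected.

```python
import numpy as np
import pandas as pd


def basic_clean(df: pd.DataFrame, target_col: str = "Class") -> pd.DataFrame:
    """Hypothetical helper: drop unlabeled rows and exact duplicate rows."""
    n_before = len(df)
    out = df.dropna(subset=[target_col])                 # rows without a label cannot be used
    out = out.drop_duplicates().reset_index(drop=True)   # exact duplicates could leak across splits
    print(f"Removed {n_before - len(out)} of {n_before} rows")
    print("Remaining missing values per column:", out.isna().sum().to_dict())
    return out


# Tiny synthetic example (not the real dataset)
toy = pd.DataFrame({
    "Amount": [10.0, 10.0, 5.0, 7.5],
    "Class": [0, 0, 1, np.nan],
})
clean = basic_clean(toy)
```

In the notebook this would run as `df = basic_clean(df, target_col="Class")` right after loading; the label is cast back to `int` later, when `y` is built.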


## Quick exploration
Fraud datasets are typically **highly imbalanced** (very few positives). We'll inspect the class distribution.


In [None]:
target_col = "Class"  # typical for the popular credit card fraud dataset
df[target_col].value_counts().rename({0: "Legit", 1: "Fraud"})
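
To see why accuracy alone is misleading on this data, it helps to compute the fraud rate and the accuracy of a trivial model that always predicts "legit". A small sketch, assuming labels are 0 (legit) and 1 (fraud); `imbalance_summary` is a hypothetical helper, shown here on synthetic labels.

```python
import pandas as pd


def imbalance_summary(y: pd.Series) -> dict:
    """Summarize class balance for 0/1 labels (hypothetical helper)."""
    n_pos = int((y == 1).sum())
    n_neg = int((y == 0).sum())
    return {
        "n_legit": n_neg,
        "n_fraud": n_pos,
        "fraud_rate": n_pos / len(y),
        # Accuracy of always predicting 0 equals the share of legit rows
        "all_legit_accuracy": n_neg / len(y),
        # Negative-to-positive ratio, the usual pos_weight for BCEWithLogitsLoss
        "neg_to_pos_ratio": n_neg / n_pos if n_pos else float("inf"),
    }


# Synthetic labels: 998 legit, 2 fraud
y_demo = pd.Series([0] * 998 + [1] * 2)
summary = imbalance_summary(y_demo)
print(summary)
```

On the real data the call would be `imbalance_summary(df[target_col])`.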


In [None]:
plt.figure(figsize=(5,4))
df[target_col].value_counts().sort_index().plot(kind="bar")
plt.title("Class distribution (0=legit, 1=fraud)")
plt.xlabel("Class")
plt.ylabel("Count")
plt.tight_layout()
plt.show()


## 1) Split and scale (no leakage)
We:
1. Split into train/validation/test using **stratification**.
2. Fit the scaler on **train only**, then transform validation/test.

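Note: in the classic dataset, `V1..V28` are PCA components (already centered), while `Time` and `Amount` are on their raw scales.
Standardizing every column with the same train-fitted scaler is harmless for `V1..V28` and brings `Time` and `Amount` into a comparable range, which helps gradient-based training of an MLP.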
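
"Fit on train only" means validation and test rows are transformed with the training statistics, `z = (x - mean_train) / std_train`. The NumPy sketch below does this by hand on synthetic data to show the idea; it mirrors what `StandardScaler` computes by default (population standard deviation, constant columns left unscaled) but is an illustration, not a replacement.

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(loc=50.0, scale=10.0, size=(1000, 3))  # synthetic features
X_val = rng.normal(loc=55.0, scale=12.0, size=(200, 3))     # slightly shifted distribution

# Statistics come from the training split only (population std, ddof=0)
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)
sigma[sigma == 0] = 1.0  # avoid division by zero for constant columns

X_train_z = (X_train - mu) / sigma
X_val_z = (X_val - mu) / sigma  # reuse train statistics: nothing is learned from X_val

print(X_train_z.mean(axis=0).round(6), X_train_z.std(axis=0).round(6))
print(X_val_z.mean(axis=0).round(2))  # not 0: the validation set keeps its own shift
```

The shifted validation mean is expected: it reflects a real difference in the data, which fitting the scaler on all rows would partly hide.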


In [None]:
X = df.drop(columns=[target_col])
y = df[target_col].astype(int)

# 80% train, 10% validation, 10% test
X_train, X_temp, y_train, y_temp = train_test_split(
    X, y, train_size=0.8, stratify=y, random_state=42
)
X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, train_size=0.5, stratify=y_temp, random_state=42
)

scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_val_s   = scaler.transform(X_val)
X_test_s  = scaler.transform(X_test)

X_train_s.shape, X_val_s.shape, X_test_s.shape
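
A quick sanity check that stratification worked is to compare the positive rate in each split; the rates should be nearly identical. Sketch with synthetic labels and the same 80/10/10 scheme; `positive_rates` is a hypothetical helper that could be called in the notebook as `positive_rates(train=y_train, val=y_val, test=y_test)`.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def positive_rates(**splits) -> dict:
    """Fraction of label-1 rows in each named split (hypothetical helper)."""
    return {name: float(np.mean(np.asarray(y) == 1)) for name, y in splits.items()}


# Synthetic imbalanced labels: 1% positives
y = np.array([1] * 100 + [0] * 9900)
X = np.arange(len(y)).reshape(-1, 1)

# Same 80/10/10 stratified scheme as the cell above
X_tr, X_tmp, y_tr, y_tmp = train_test_split(X, y, train_size=0.8, stratify=y, random_state=42)
X_va, X_te, y_va, y_te = train_test_split(X_tmp, y_tmp, train_size=0.5, stratify=y_tmp, random_state=42)

rates = positive_rates(train=y_tr, val=y_va, test=y_te)
print(rates)
```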


## 2) DataLoaders
We build PyTorch DataLoaders for mini-batch training.

For binary classification with `BCEWithLogitsLoss`, the target must be a float tensor with the same shape as the logits, here `(N, 1)`.


In [None]:
def to_loader(X_np, y_series, batch_size=2048, shuffle=False, drop_last=False):
    # Features as float32; targets as float32 with shape (N, 1) to match the logits
    X_tensor = torch.tensor(X_np.astype(np.float32))
    y_tensor = torch.tensor(y_series.values.astype(np.float32)).unsqueeze(1)
    ds = TensorDataset(X_tensor, y_tensor)
    return DataLoader(ds, batch_size=batch_size, shuffle=shuffle, drop_last=drop_last)

batch_size = 2048
# drop_last=True on the training set avoids a final batch of size 1, which BatchNorm rejects in train mode
train_dl = to_loader(X_train_s, y_train, batch_size=batch_size, shuffle=True, drop_last=True)
val_dl   = to_loader(X_val_s, y_val, batch_size=batch_size, shuffle=False)
test_dl  = to_loader(X_test_s, y_test, batch_size=batch_size, shuffle=False)
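
`BCEWithLogitsLoss` checks that the target and the logits have the same shape, which is why the labels are unsqueezed to `(N, 1)`. A minimal demo with random logits (values are illustrative only):

```python
import torch
import torch.nn as nn

loss_fn = nn.BCEWithLogitsLoss()
logits = torch.randn(4, 1)               # model output shape: (N, 1)
labels = torch.tensor([0, 1, 0, 0])      # integer labels, shape (N,)

target = labels.float().unsqueeze(1)     # float, shape (N, 1): matches the logits
print(target.shape, loss_fn(logits, target).item())

try:
    loss_fn(logits, labels.float())      # shape (N,) vs (N, 1) is rejected
except ValueError as err:
    print("Shape mismatch:", err)
```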


## 3) Model: MLP with BatchNorm + Dropout
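The model outputs **logits** (no final sigmoid) because `BCEWithLogitsLoss` applies the sigmoid internally in a numerically stable way. Sigmoid is applied explicitly only when computing probabilities for predictions and metrics.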
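
To see why the loss should receive logits rather than probabilities: for a confident but wrong prediction, `sigmoid` saturates to exactly 1.0 in float32, so `BCELoss` on the probability hits `log(0)`, which PyTorch documents as being clamped to -100, and the loss caps at 100. `BCEWithLogitsLoss` works on the logit directly. The logit value of 200 is an arbitrary illustration.

```python
import torch
import torch.nn as nn

x = torch.tensor([[200.0]])   # very confident "fraud" logit
y = torch.tensor([[0.0]])     # true label: legit

naive = nn.BCELoss()(torch.sigmoid(x), y)   # sigmoid(x) == 1.0, so log(1 - 1) is clamped
stable = nn.BCEWithLogitsLoss()(x, y)       # computed from the logit, no saturation

print(f"BCELoss(sigmoid(x))  = {naive.item():.1f}")   # capped at 100 by the clamp
print(f"BCEWithLogitsLoss(x) = {stable.item():.1f}")  # about 200, i.e. log(1 + e^200)
```

The capped value understates how wrong the prediction is, so its gradient signal is distorted; the logit-based loss avoids this.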


In [None]:
class FraudMLP(nn.Module):
    def __init__(self, input_size, hidden=(256, 128, 64), p_drop=0.3):
        super().__init__()
        layers = []
        prev = input_size
        for h in hidden:
            layers += [
                nn.Linear(prev, h),
                nn.BatchNorm1d(h),
                nn.ReLU(),
                nn.Dropout(p_drop),
            ]
            prev = h
        layers += [nn.Linear(prev, 1)]  # logits
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)
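
One practical detail of `BatchNorm1d`: in training mode it needs more than one sample per batch to compute batch statistics, so a final training batch of size 1 raises an error. A minimal sketch (the layer width is arbitrary; `last_batch_size` is a hypothetical helper):

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm1d(8)
single = torch.randn(1, 8)  # a batch with one sample

bn.train()
try:
    bn(single)
except ValueError as err:
    print("Train mode, batch of 1:", err)

bn.eval()  # eval mode uses running statistics, so a single sample is fine
print("Eval mode output shape:", tuple(bn(single).shape))


def last_batch_size(n_samples: int, batch_size: int) -> int:
    """Size of the final batch when drop_last=False (hypothetical helper)."""
    remainder = n_samples % batch_size
    return remainder if remainder else batch_size
```

`last_batch_size(len(X_train_s), batch_size)` shows whether the last training batch is at risk; passing `drop_last=True` to the training `DataLoader` removes that batch entirely.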


## 4) Training utilities (weighted loss for imbalance)
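With imbalanced targets, a weighted loss raises the penalty for misclassifying the minority (fraud) class, so the model cannot reach a low loss simply by predicting "legit" everywhere.

For `BCEWithLogitsLoss`, a common choice, computed on the training set, is:
`pos_weight = n_negative / n_positive`.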
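
For reference, PyTorch documents the per-element loss with `pos_weight` p as `loss = -[p * y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x))]`. The NumPy sketch below evaluates that formula in a numerically stable form and shows that with `p = n_negative / n_positive` the positives and negatives carry equal total weight. It is an illustration of the formula, not PyTorch's implementation.

```python
import numpy as np


def softplus(z):
    """Stable log(1 + exp(z))."""
    return np.maximum(z, 0) + np.log1p(np.exp(-np.abs(z)))


def weighted_bce_with_logits(x, y, pos_weight):
    """Mean of -[p*y*log(sigmoid(x)) + (1-y)*log(1-sigmoid(x))], computed stably.

    Uses -log(sigmoid(x)) = softplus(-x) and -log(1 - sigmoid(x)) = softplus(x).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    per_elem = pos_weight * y * softplus(-x) + (1 - y) * softplus(x)
    return per_elem.mean()


# 99 legit rows, 1 fraud row -> pos_weight = 99 / 1
y = np.array([0] * 99 + [1])
pos_weight = (y == 0).sum() / (y == 1).sum()

# Total weight on positives now equals total weight on negatives
print(pos_weight * (y == 1).sum(), (y == 0).sum())  # 99.0 99

# With every logit at 0 (probability 0.5), each term costs log(2) times its weight
x = np.zeros_like(y, dtype=float)
print(weighted_bce_with_logits(x, y, pos_weight))    # equals (99 + 99) * log(2) / 100
```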


In [None]:
def compute_pos_weight(y_series, device):
    counts = y_series.value_counts().to_dict()
    n_pos = counts.get(1, 0)
    n_neg = counts.get(0, 0)
    if n_pos == 0:
        return torch.tensor([1.0], device=device)
    return torch.tensor([n_neg / n_pos], device=device)

def batch_metrics_from_logits(logits, y_true):
    probs = torch.sigmoid(logits)
    preds = (probs >= 0.5).float()
    acc = (preds == y_true).float().mean().item()
    return acc

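def run_epoch(model, dl, loss_fn, optimizer=None):
    is_train = optimizer is not None
    model.train(is_train)  # train mode enables Dropout and batch statistics in BatchNorm

    total_loss, total_correct, n_samples = 0.0, 0.0, 0
    # Track gradients only when training; evaluation skips building the autograd graph
    with torch.set_grad_enabled(is_train):
        for Xb, yb in dl:
            Xb, yb = Xb.to(device), yb.to(device)
            logits = model(Xb)
            loss = loss_fn(logits, yb)

            if is_train:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

            # Weight by batch size so a smaller final batch does not skew the epoch averages
            bs = yb.size(0)
            total_loss += loss.item() * bs
            total_correct += batch_metrics_from_logits(logits, yb) * bs
            n_samples += bs

    return total_loss / n_samples, total_correct / n_samples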

def train_model(model, train_dl, val_dl, loss_fn, optimizer, epochs=20, print_every=5):
    train_losses, val_losses = [], []
    train_accs, val_accs = [], []

    for epoch in range(1, epochs + 1):
        tr_loss, tr_acc = run_epoch(model, train_dl, loss_fn, optimizer=optimizer)
        va_loss, va_acc = run_epoch(model, val_dl, loss_fn, optimizer=None)

        train_losses.append(tr_loss); val_losses.append(va_loss)
        train_accs.append(tr_acc);   val_accs.append(va_acc)

        if epoch == 1 or epoch == epochs or epoch % print_every == 0:
            print(f"Epoch {epoch:>3} | train loss {tr_loss:.5f} acc {tr_acc:.3f} | val loss {va_loss:.5f} acc {va_acc:.3f}")

    return train_losses, val_losses, train_accs, val_accs

def plot_curves(train_vals, val_vals, title, ylabel):
    epochs_axis = range(1, len(train_vals) + 1)  # epochs are numbered from 1, as in the training log
    plt.figure(figsize=(10, 5))
    plt.plot(epochs_axis, train_vals, label="Train")
    plt.plot(epochs_axis, val_vals, label="Validation")
    plt.title(title)
    plt.xlabel("Epoch")
    plt.ylabel(ylabel)
    plt.legend()
    plt.grid(True)
    plt.show()

def get_predictions(model, dl, threshold=0.5):
    """Return true labels, thresholded predictions, and fraud probabilities."""
    model.eval()
    ys, preds, probs_all = [], [], []
    with torch.no_grad():
        for Xb, yb in dl:
            Xb = Xb.to(device)
            logits = model(Xb)
            probs = torch.sigmoid(logits).cpu().numpy().reshape(-1)

            # Predict fraud when the probability reaches the decision threshold
            preds.extend((probs >= threshold).astype(int))
            probs_all.extend(probs)
            ys.extend(yb.cpu().numpy().reshape(-1).astype(int))

    return np.array(ys), np.array(preds), np.array(probs_all)


## 5) Train with SGD (baseline)
SGD can perform well, but it usually needs tuning (learning rate, momentum, learning-rate schedules). As a baseline we use `lr=0.01` with `momentum=0.9` for 15 epochs.
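
One way to add a learning-rate schedule is PyTorch's `CosineAnnealingLR`, stepped once per epoch after that epoch's optimizer updates. The sketch uses a throwaway linear model so it runs on its own; wiring it into `train_model` would need a small change (an extra `scheduler.step()` per epoch), which is an assumption and not part of the original notebook.

```python
import torch.nn as nn
from torch.optim import SGD
from torch.optim.lr_scheduler import CosineAnnealingLR

model = nn.Linear(4, 1)  # stand-in for FraudMLP
optimizer = SGD(model.parameters(), lr=0.01, momentum=0.9)
epochs = 10
scheduler = CosineAnnealingLR(optimizer, T_max=epochs)  # decays lr from 0.01 toward 0

lrs = []
for epoch in range(epochs):
    lrs.append(scheduler.get_last_lr()[0])
    # In the notebook, run_epoch(...) would perform the real optimizer.step() calls here
    optimizer.step()   # placeholder step (no gradients yet) so the call order is correct
    scheduler.step()   # update the learning rate once per epoch

print([round(lr, 5) for lr in lrs])
```

Cosine decay keeps the learning rate high early, when momentum SGD makes fast progress, and shrinks it toward the end to settle into a minimum.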


In [None]:
input_size = X_train_s.shape[1]

pos_weight = compute_pos_weight(y_train, device)
loss_fn = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

model_sgd = FraudMLP(input_size=input_size, hidden=(256, 128, 64), p_drop=0.3).to(device)
optimizer_sgd = SGD(model_sgd.parameters(), lr=0.01, momentum=0.9)

epochs = 15
tr_l, va_l, tr_a, va_a = train_model(model_sgd, train_dl, val_dl, loss_fn, optimizer_sgd, epochs=epochs, print_every=5)

plot_curves(tr_l, va_l, "Loss (SGD)", "BCEWithLogitsLoss")
plot_curves(tr_a, va_a, "Accuracy (SGD)", "Accuracy")


In [None]:
y_true, y_pred, _ = get_predictions(model_sgd, test_dl)
print(classification_report(y_true, y_pred, digits=4))

cm = confusion_matrix(y_true, y_pred)
plt.figure(figsize=(4,4))
sns.heatmap(cm, annot=True, fmt="d", cbar=False)
plt.title("Confusion Matrix (SGD)")
plt.xlabel("Predicted")
plt.ylabel("True")
plt.tight_layout()
plt.show()


## 6) Train with Adam
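Adam often converges faster than plain SGD on MLPs and is less sensitive to the choice of learning rate. Note that this run uses 10 epochs versus 15 for SGD, so the comparison is not strictly like-for-like.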
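
One intuition for why Adam needs less learning-rate tuning: its update is normalized by a running estimate of the gradient's magnitude. On the first step, after bias correction, the update is roughly `lr * sign(g)` however large `g` is, whereas an SGD step scales directly with `g`. A NumPy sketch of a single Adam step with the standard defaults (beta1 = 0.9, beta2 = 0.999, eps = 1e-8), written for illustration rather than taken from PyTorch:

```python
import numpy as np


def adam_first_step(grad, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Parameter update of Adam's first step (t = 1), starting from m = v = 0."""
    g = np.asarray(grad, dtype=float)
    m = (1 - beta1) * g            # first-moment (mean) estimate
    v = (1 - beta2) * g**2         # second-moment (uncentered variance) estimate
    m_hat = m / (1 - beta1**1)     # bias correction for t = 1
    v_hat = v / (1 - beta2**1)
    return -lr * m_hat / (np.sqrt(v_hat) + eps)


# Gradients spanning six orders of magnitude give nearly the same step size
for g in [1e-3, 1.0, 1e3]:
    print(g, adam_first_step(g))
```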


In [None]:
model_adam = FraudMLP(input_size=input_size, hidden=(256, 128, 64), p_drop=0.3).to(device)
optimizer_adam = Adam(model_adam.parameters(), lr=0.001)

epochs = 10
tr_l2, va_l2, tr_a2, va_a2 = train_model(model_adam, train_dl, val_dl, loss_fn, optimizer_adam, epochs=epochs, print_every=5)

plot_curves(tr_l2, va_l2, "Loss (Adam)", "BCEWithLogitsLoss")
plot_curves(tr_a2, va_a2, "Accuracy (Adam)", "Accuracy")


In [None]:
y_true2, y_pred2, _ = get_predictions(model_adam, test_dl)
print(classification_report(y_true2, y_pred2, digits=4))

cm2 = confusion_matrix(y_true2, y_pred2)
plt.figure(figsize=(4,4))
sns.heatmap(cm2, annot=True, fmt="d", cbar=False)
plt.title("Confusion Matrix (Adam)")
plt.xlabel("Predicted")
plt.ylabel("True")
plt.tight_layout()
plt.show()


## Notes
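- For fraud detection, **recall** on the fraud class is often more important than raw accuracy: a model that always predicts "legit" already scores an accuracy equal to the share of legit transactions.
- Consider tuning the decision threshold (e.g., 0.2 instead of 0.5) on the **validation** set, then applying it once to the test set, and tracking PR-AUC (average precision), which is more informative than ROC-AUC under heavy class imbalance.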
- You can also try resampling strategies (undersampling/oversampling) or focal loss.
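
A minimal sketch of threshold tuning, assuming `y_val_true` and `val_probs` come from `get_predictions(model, val_dl)` (its first and third return values); synthetic scores stand in for them below. It picks the threshold that maximizes F1 and reports average precision, scikit-learn's summary of the precision-recall curve. Maximizing F1 is only one criterion; a team might instead fix a minimum fraud recall. `best_f1_threshold` is a hypothetical helper.

```python
import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_curve


def best_f1_threshold(y_true, probs):
    """Return (threshold, f1) maximizing F1 over the precision-recall curve."""
    precision, recall, thresholds = precision_recall_curve(y_true, probs)
    # precision/recall have one more entry than thresholds; drop the final (recall = 0) point
    p, r = precision[:-1], recall[:-1]
    denom = p + r
    f1 = np.divide(2 * p * r, denom, out=np.zeros_like(denom), where=denom > 0)
    best = int(np.argmax(f1))
    return float(thresholds[best]), float(f1[best])


# Synthetic validation scores (not real model output)
y_val_true = np.array([0, 0, 0, 0, 1, 0, 1, 1])
val_probs = np.array([0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.70, 0.90])

threshold, f1 = best_f1_threshold(y_val_true, val_probs)
ap = average_precision_score(y_val_true, val_probs)
print(f"best threshold {threshold:.2f}, F1 {f1:.3f}")
print(f"PR-AUC (average precision): {ap:.3f}")
```

The chosen threshold is then applied once to the test-set probabilities, e.g. `(test_probs >= threshold).astype(int)`, so the test set stays untouched during tuning.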
