
# Financial Fraud Detection — Classical + QUBO on One Dataset (Colab-Ready)

This notebook builds a **fraud scoring** model and an **alert selection** policy using a single synthetic dataset:

1) **Classical path**
   - Train a **logistic regression** (implemented from scratch) to predict fraud probability \( \hat{p}_i \) per transaction.
   - Choose which alerts to **review** with a **MILP** (PuLP): select at most \(B\) transactions (the review **capacity**) to maximize expected utility. A **greedy fallback** is used if PuLP isn't available.
   - Visualize ROC & Precision–Recall.

2) **QUBO path (quantum-ready)**
   - Build a **transaction affinity graph** (shared device/merchant) to capture ring behavior.
   - Select alerts with a **QUBO** that combines: expected utility, a **budget** penalty, and a **pairwise coupling** that promotes selecting **connected suspicious clusters**. Solve with `neal` (simulated annealing).

We reuse the same dataset and compare the review sets and utilities.


In [None]:

# Install dependencies if needed
def _silent_imports():
    flags = {"pulp": False, "dimod": False, "neal": False}
    try:
        import pulp
        flags["pulp"] = True
    except Exception:
        pass
    try:
        import dimod
        flags["dimod"] = True
    except Exception:
        pass
    try:
        import neal
        flags["neal"] = True
    except Exception:
        pass
    return flags

flags = _silent_imports()
if not flags["pulp"]:
    %pip -q install pulp
if not flags["dimod"] or not flags["neal"]:
    %pip -q install dimod neal

flags = _silent_imports()
print("PuLP:", flags["pulp"], "| dimod:", flags["dimod"], "| neal:", flags["neal"])


In [None]:

# ==== One Synthetic Dataset ====
import numpy as np, pandas as pd

rng = np.random.default_rng(404)

N = 3000                           # transactions
review_capacity = 250              # number of cases reviewers can handle
reward_detect = 500.0              # benefit if a fraud is correctly reviewed/caught
cost_false_pos = 25.0              # review cost + customer friction when it's not fraud

# Features: amount, hour, merchant risk, device age, velocity (txn per hour last day), prior_declines
amount = np.exp(rng.normal(3.4, 0.9, size=N))  # log-normal amounts
hour = rng.integers(0, 24, size=N)
merchant_risk = rng.uniform(0, 1, size=N)
device_age_days = rng.integers(0, 400, size=N)
velocity = rng.exponential(scale=2.0, size=N)
prior_declines = rng.binomial(4, 0.2, size=N)

# Latent group effects (rings): shared device_id and merchant_id clusters
n_devices = 450
n_merchants = 220
device_id = rng.integers(0, n_devices, size=N)
merchant_id = rng.integers(0, n_merchants, size=N)

# True fraud logit model with group bumps
w = np.array([0.0025,  # amount
              0.05,    # late-night indicator (hour >= 22 or hour <= 5)
              1.2,     # merchant risk
             -0.004,   # device age (older -> safer)
              0.35,    # velocity
              0.28])   # prior declines
X = np.column_stack([amount, (hour>=22)|(hour<=5), merchant_risk, device_age_days, velocity, prior_declines]).astype(float)

# Group risk terms
dev_risk = rng.normal(0, 0.4, size=n_devices)
mer_risk = rng.normal(0, 0.5, size=n_merchants)
g_bump = dev_risk[device_id] + mer_risk[merchant_id]

logit = X @ w + 0.6*g_bump - 4.8  # base rate low
p_true = 1/(1+np.exp(-logit))
y = rng.binomial(1, p_true)  # labels

df = pd.DataFrame({
    "amount": amount, "is_late": ((hour>=22)|(hour<=5)).astype(int),
    "merchant_risk": merchant_risk, "device_age": device_age_days,
    "velocity": velocity, "prior_declines": prior_declines,
    "device_id": device_id, "merchant_id": merchant_id,
    "label": y
})
print("Fraud rate:", df["label"].mean().round(4), "| N:", N)
df.head()



## Part 1 — Classical: Manual Logistic Regression + MILP Alert Selection


In [None]:

# Train/test split
import numpy as np, pandas as pd

idx = np.arange(len(df))
rng = np.random.default_rng(1)
rng.shuffle(idx)
split = int(0.75*len(df))
train_idx, test_idx = idx[:split], idx[split:]

features = ["amount","is_late","merchant_risk","device_age","velocity","prior_declines"]
X_all = df[features].values.astype(float)
y_all = df["label"].values.astype(int)

X_train, y_train = X_all[train_idx], y_all[train_idx]
X_test, y_test   = X_all[test_idx],  y_all[test_idx]

# Feature standardization (helps gradient descent converge; statistics come from the training set only)
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0) + 1e-9  # small epsilon guards against zero variance
Xtr = (X_train - mu)/sigma
Xte = (X_test  - mu)/sigma

# Manual logistic regression (L2-regularized) via gradient descent
def train_logreg(X, y, lr=0.1, l2=1e-2, iters=800):
    n, d = X.shape
    w = np.zeros(d); b = 0.0
    for t in range(iters):
        z = X@w + b
        p = 1/(1+np.exp(-z))
        grad_w = X.T@(p - y)/n + l2*w
        grad_b = (p - y).mean()
        w -= lr*grad_w
        b -= lr*grad_b
    return w, b

w_hat, b_hat = train_logreg(Xtr, y_train, lr=0.2, l2=5e-3, iters=1000)

def predict_proba(X):
    z = X@w_hat + b_hat
    return 1/(1+np.exp(-z))

p_train = predict_proba(Xtr)
p_test = predict_proba(Xte)

print("Train mean p̂:", round(float(p_train.mean()), 4), "| Test mean p̂:", round(float(p_test.mean()), 4))


In [None]:

# ROC and Precision–Recall
import numpy as np, matplotlib.pyplot as plt

def roc_pr(y, p):
    # One threshold per sample, in descending score order (tied scores are not merged)
    order = np.argsort(-p)
    y_sorted = y[order]; p_sorted = p[order]
    tps = np.cumsum(y_sorted==1)
    fps = np.cumsum(y_sorted==0)
    T = len(y)
    P = (y==1).sum()
    Nn = (y==0).sum()
    # ROC
    TPR = tps / max(P,1)
    FPR = fps / max(Nn,1)
    # PR
    precision = tps / np.maximum((np.arange(T)+1), 1)
    recall = TPR
    return FPR, TPR, precision, recall

FPR, TPR, Pprec, Rrec = roc_pr(y_test, p_test)

plt.figure()
plt.plot(FPR, TPR)
plt.title("ROC (Test)")
plt.xlabel("FPR"); plt.ylabel("TPR"); plt.tight_layout()

plt.figure()
plt.plot(Rrec, Pprec)
plt.title("Precision–Recall (Test)")
plt.xlabel("Recall"); plt.ylabel("Precision"); plt.tight_layout()
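
The ROC curve can be summarized by the area under it (AUC). The following is a minimal sketch that applies the trapezoid rule to curve points built the same way as in `roc_pr`. The helper name `auc_trapezoid` and the toy labels and scores are illustrative assumptions, not part of the notebook.

```python
import numpy as np

def auc_trapezoid(fpr, tpr):
    """Area under a ROC curve by the trapezoid rule; prepends the (0, 0) origin."""
    fpr = np.concatenate([[0.0], np.asarray(fpr, dtype=float)])
    tpr = np.concatenate([[0.0], np.asarray(tpr, dtype=float)])
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

# Toy example (hypothetical labels and scores, no tied scores)
y_toy = np.array([1, 0, 1, 0, 0])
p_toy = np.array([0.9, 0.8, 0.7, 0.3, 0.1])

# Same cumulative-count construction as roc_pr
order = np.argsort(-p_toy)
tps = np.cumsum(y_toy[order] == 1)
fps = np.cumsum(y_toy[order] == 0)
tpr_toy = tps / max((y_toy == 1).sum(), 1)
fpr_toy = fps / max((y_toy == 0).sum(), 1)

auc_toy = auc_trapezoid(fpr_toy, tpr_toy)
print("Toy AUC:", round(auc_toy, 4))
```

In the notebook this would be called as `auc_trapezoid(FPR, TPR)`. When several transactions share the same score, the per-sample curve depends on the sort order within the tie, so the value should be read as an approximation in that case.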



### MILP Alert Selection (capacity \(B\))

We decide which transactions to **review** to maximize **expected utility**:
\[
\max \sum_i z_i \big( \hat{p}_i \cdot R - (1-\hat{p}_i)\cdot C \big)
\quad \text{s.t. } \sum_i z_i \le B,\; z_i\in\{0,1\}.
\]
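Because every review consumes exactly one unit of capacity, the optimum is the (at most) \(B\) transactions with the largest positive utility \(U_i = \hat{p}_i R - (1-\hat{p}_i) C\). If PuLP is unavailable, we fall back to a **greedy** rule that sorts by \(U_i\) and takes the top \(B\).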
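
The sign of the per-transaction utility depends only on whether \(\hat{p}_i\) exceeds the break-even probability \(C/(R+C)\), obtained by setting \(\hat{p} R - (1-\hat{p}) C = 0\). The sketch below works this out for the notebook's values of `reward_detect` and `cost_false_pos`. The helper names and the demo scores are illustrative assumptions.

```python
import numpy as np

def expected_utility(p_hat, reward, cost):
    """Expected utility of reviewing a transaction with fraud probability p_hat."""
    return p_hat * reward - (1.0 - p_hat) * cost

def break_even_probability(reward, cost):
    """Probability at which reviewing has zero expected utility."""
    return cost / (reward + cost)

reward_detect, cost_false_pos = 500.0, 25.0   # values from the dataset cell
p_star = break_even_probability(reward_detect, cost_false_pos)

# Hypothetical scores, for illustration only
p_demo = np.array([0.01, 0.04, 0.05, 0.30])
u_demo = expected_utility(p_demo, reward_detect, cost_false_pos)

print("break-even p*:", round(p_star, 4))
print("utilities:", np.round(u_demo, 2))
```

With these parameters the break-even point is 25/525, a little under 5%, so transactions scored below that level reduce the objective and an optimal solution leaves them out even when capacity remains.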


In [None]:

import numpy as np, pandas as pd

try:
    import pulp
    HAVE_PULP = True
except Exception:
    HAVE_PULP = False

# We'll optimize on the **test** set (as if in production)
util = p_test*reward_detect - (1-p_test)*cost_false_pos

def solve_milp_alerts(util, capacity):
    n = len(util)
    prob = pulp.LpProblem("AlertSelection", pulp.LpMaximize)
    z = [pulp.LpVariable(f"z_{i}", lowBound=0, upBound=1, cat="Binary") for i in range(n)]
    prob += pulp.lpSum(float(util[i]) * z[i] for i in range(n))
    prob += pulp.lpSum(z) <= int(capacity)
    _ = prob.solve(pulp.PULP_CBC_CMD(msg=False))
    status = pulp.LpStatus[prob.status]
    z_sol = np.array([int(round(pulp.value(z[i]) or 0)) for i in range(n)])
    obj = float(pulp.value(prob.objective))
    return status, z_sol, obj

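def greedy_alerts(util, capacity):
    # Take the top-`capacity` transactions by utility, skipping non-positive ones
    # so the result matches the MILP, which gains nothing from a negative-utility alert.
    order = np.argsort(-util)
    top = order[:int(capacity)]
    top = top[util[top] > 0]
    z = np.zeros(len(util), dtype=int)
    z[top] = 1
    return "Heuristic", z, float(util[top].sum())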

if HAVE_PULP:
    status_mip, z_mip, obj_mip = solve_milp_alerts(util, review_capacity)
else:
    status_mip, z_mip, obj_mip = greedy_alerts(util, review_capacity)

print("Alert selection (classical) — status:", status_mip, "| expected utility:", round(obj_mip,2))



## Part 2 — QUBO: Graph-Aware Alert Selection with Budget

We add **pairwise structure** to encourage selecting **connected suspicious rings**.

- Binary \(s_i\): 1 if we **review** transaction \(i\).  
- Linear benefit: \(-U_i s_i\) where \(U_i = \hat{p}_i R - (1-\hat{p}_i)C\) (negated for minimization).  
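- **Budget penalty:** \(\lambda (\sum_i s_i - B)^2\). Unlike the MILP constraint \(\sum_i z_i \le B\), this term penalizes selecting more *or fewer* than \(B\) transactions.  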
- **Graph coupling:** for edges \( (i,j) \) (shared device/merchant), add **negative** weight \(-\gamma w_{ij} s_i s_j\) so selecting both lowers energy (promotes clusters).
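
Since \(s_i^2 = s_i\) for binary variables, the budget penalty expands to \(\lambda(1-2B)\sum_i s_i + 2\lambda\sum_{i<j} s_i s_j + \lambda B^2\), and the constant \(\lambda B^2\) can be dropped. The sketch below builds the resulting QUBO for a tiny, made-up instance and checks it by brute force against the objective written out directly. All names and toy values here are illustrative assumptions.

```python
import itertools
import numpy as np

def build_qubo(U, edges, B, lam, gamma):
    """Upper-triangular QUBO for
    -sum_i U_i s_i + lam*(sum_i s_i - B)^2 - gamma*sum_(i,j) w_ij s_i s_j,
    with the constant lam*B^2 dropped."""
    n = len(U)
    Q = {}
    for i in range(n):
        Q[(i, i)] = -float(U[i]) + lam * (1.0 - 2.0 * B)
        for j in range(i + 1, n):
            Q[(i, j)] = 2.0 * lam
    for i, j, w_ij in edges:
        a, b = min(i, j), max(i, j)
        Q[(a, b)] -= gamma * w_ij
    return Q

def qubo_energy(Q, s):
    """Energy of binary assignment s under QUBO coefficients Q."""
    return sum(c * s[i] * s[j] for (i, j), c in Q.items())

def direct_objective(U, edges, B, lam, gamma, s):
    """The same objective written term by term, constant included."""
    s = np.asarray(s)
    coupling = sum(w_ij * s[i] * s[j] for i, j, w_ij in edges)
    return -float(U @ s) + lam * (s.sum() - B) ** 2 - gamma * coupling

# Toy instance (hypothetical utilities and edges)
U_toy = np.array([120.0, -10.0, 40.0, 5.0])
edges_toy = [(0, 2, 1.0), (1, 3, 0.7)]
B_toy, lam_toy, gamma_toy = 2, 50.0, 1.5
Q_toy = build_qubo(U_toy, edges_toy, B_toy, lam_toy, gamma_toy)

assignments = list(itertools.product([0, 1], repeat=len(U_toy)))
max_gap = max(
    abs(qubo_energy(Q_toy, s) + lam_toy * B_toy ** 2
        - direct_objective(U_toy, edges_toy, B_toy, lam_toy, gamma_toy, s))
    for s in assignments
)
best_s = min(assignments, key=lambda s: qubo_energy(Q_toy, s))
print("max |QUBO + lam*B^2 - direct|:", max_gap)
print("brute-force minimizer:", best_s)
```

Enumeration is only feasible for a handful of variables, but it is a cheap way to catch sign or factor-of-two mistakes before handing a larger QUBO to a sampler.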


In [None]:

from collections import defaultdict
import numpy as np, pandas as pd
import dimod, neal

# We'll build the QUBO on the **test set** for comparability
df_test = df.iloc[test_idx].copy()
U = util  # expected utility vector for test set
n = len(U)

# Build a simple affinity graph (shared device or merchant)
# weight 1.0 for shared device, 0.7 for shared merchant
edges = []
# Map device_id and merchant_id to indices within test set
dev_to_rows = {}
mer_to_rows = {}
for local_row, global_row in enumerate(test_idx):
    dev_to_rows.setdefault(int(df.loc[global_row,"device_id"]), []).append(local_row)
    mer_to_rows.setdefault(int(df.loc[global_row,"merchant_id"]), []).append(local_row)

for rows in dev_to_rows.values():
    if len(rows) > 1:
        for a_idx in range(len(rows)):
            for b_idx in range(a_idx+1, len(rows)):
                edges.append((rows[a_idx], rows[b_idx], 1.0))

for rows in mer_to_rows.values():
    if len(rows) > 1:
        for a_idx in range(len(rows)):
            for b_idx in range(a_idx+1, len(rows)):
                edges.append((rows[a_idx], rows[b_idx], 0.7))

# QUBO parameters
lam = float(5.0 * np.maximum(1.0, np.std(U)))   # budget penalty weight
gamma = 1.5                                      # cluster coupling

Q = defaultdict(float)

# Linear utility (negated for minimization): -U_i
for i in range(n):
    Q[(i,i)] += -float(U[i])

# Budget penalty: lam*(sum s - B)^2
B = float(review_capacity)
for i in range(n):
    Q[(i,i)] += lam*(1 - 2*B)
for i in range(n):
    for j in range(i+1, n):
        Q[(i,j)] += 2*lam

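# Graph coupling: -gamma * w_ij * s_i s_j
# (loop variable named w_ij so it does not overwrite the weight vector `w` from the dataset cell)
for (i, j, w_ij) in edges:
    if i == j: continue
    if i > j: i, j = j, i
    Q[(i,j)] += -gamma * float(w_ij)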

# Solve with SA
bqm = dimod.BinaryQuadraticModel.from_qubo(dict(Q))
sampleset = neal.SimulatedAnnealingSampler().sample(bqm, num_reads=1500)
best = sampleset.first
s_qubo = np.array([best.sample.get(i,0) for i in range(n)], dtype=int)

budget_used = int(s_qubo.sum())
util_qubo = float((U * s_qubo).sum())
print("QUBO — selected:", budget_used, " (target:", int(B), ") | expected utility:", round(util_qubo,2))
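
Because the budget is enforced only through a penalty, the annealer may return more than \(B\) selected transactions. One possible post-processing step, which is not part of the notebook, is to drop the lowest-utility selections until the capacity holds. The sketch below assumes that simple rule and ignores the graph coupling when deciding what to drop. The helper name and demo arrays are hypothetical.

```python
import numpy as np

def repair_to_capacity(selection, utility, capacity):
    """Return a copy of `selection` with at most `capacity` ones.

    Drops the lowest-utility selected items first. Coupling terms are ignored,
    so this is a heuristic repair, not a re-optimization.
    """
    s = np.asarray(selection, dtype=int).copy()
    utility = np.asarray(utility, dtype=float)
    chosen = np.flatnonzero(s == 1)
    excess = len(chosen) - int(capacity)
    if excess > 0:
        worst_first = chosen[np.argsort(utility[chosen])]
        s[worst_first[:excess]] = 0
    return s

# Hypothetical selection that exceeds a capacity of 2
s_demo = np.array([1, 1, 0, 1, 1])
u_demo = np.array([50.0, -5.0, 80.0, 10.0, 30.0])
s_fixed = repair_to_capacity(s_demo, u_demo, capacity=2)
print("repaired selection:", s_fixed.tolist())
```

In the notebook this could be applied as `repair_to_capacity(s_qubo, U, review_capacity)` before comparing against the classical selection. A selection that is already within capacity is returned unchanged.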


In [None]:

# Compare classical vs QUBO selections
import numpy as np, pandas as pd

chosen_classical = np.where(z_mip==1)[0]
chosen_qubo = np.where(s_qubo==1)[0]

overlap = len(set(chosen_classical).intersection(set(chosen_qubo)))
print("Classical selected:", len(chosen_classical), "| QUBO selected:", len(chosen_qubo), "| Overlap:", overlap)

df_compare = pd.DataFrame({
    "idx": np.arange(len(U)),
    "p_hat": p_test,
    "utility": U,
    "classical": z_mip,
    "qubo": s_qubo
}).sort_values("p_hat", ascending=False).head(15)
df_compare



## Wrap-up

- The **classical** pipeline learns a probability model and uses a **MILP** to pick the utility-maximizing alerts under a hard capacity limit.  
- The **QUBO** adds **structure** via a transaction graph, encouraging clusters typical of fraud rings. Its budget is **soft**: the quadratic penalty pulls the selection toward exactly \(B\) alerts, so the final count can differ from \(B\).  
- Try varying `review_capacity`, `reward_detect`, `cost_false_pos`, and the QUBO weights `lam`, `gamma` to see trade-offs between **precision**, **recall**, and **cluster capture**.
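
To make these trade-offs concrete, the two selections can be scored on the same quantities. The sketch below computes precision, recall, and the total weight of affinity edges whose endpoints are both selected, used here as a rough proxy for cluster capture. The helper name, the choice of proxy, and the demo data are assumptions for illustration.

```python
import numpy as np

def selection_metrics(selection, labels, edges):
    """Precision, recall, and internal edge weight of a review set.

    `edges` is a list of (i, j, weight) tuples over the same index space
    as `selection` and `labels`.
    """
    s = np.asarray(selection, dtype=int)
    y = np.asarray(labels, dtype=int)
    caught = int((s * y).sum())
    precision = caught / max(int(s.sum()), 1)
    recall = caught / max(int(y.sum()), 1)
    internal_weight = float(
        sum(w_ij for i, j, w_ij in edges if s[i] == 1 and s[j] == 1)
    )
    return {
        "precision": precision,
        "recall": recall,
        "internal_edge_weight": internal_weight,
    }

# Hypothetical selection, labels, and edges
s_demo = np.array([1, 1, 0, 1, 0])
y_demo = np.array([1, 0, 0, 1, 1])
edges_demo = [(0, 1, 1.0), (0, 3, 0.7), (2, 4, 1.0)]
metrics_demo = selection_metrics(s_demo, y_demo, edges_demo)
print(metrics_demo)
```

In the notebook, `selection_metrics(z_mip, y_test, edges)` and `selection_metrics(s_qubo, y_test, edges)` would give a side-by-side view, since `edges` is already indexed by position within the test set.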
