# Notebook 03 — Error Analysis & Business Impact

Goal:
- Inspect false positives / false negatives at the chosen threshold
- Understand patterns behind model errors
- Translate metrics into business terms (review load, missed fraud loss)
- Produce decision-ready recommendations

In [None]:
import sys
from pathlib import Path
sys.path.append(str(Path("..").resolve()))

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from src.config import CFG
from src.io import load_csv
from src.split import stratified_split
from src.models import train_logreg_baseline
from src.thresholding import find_threshold_min_cost, apply_threshold
from src.evaluation import evaluate_at_threshold

plt.rcParams["figure.figsize"] = (7, 4)

In [None]:
DATA_PATH = "../data/raw/creditcard.csv"
df = load_csv(DATA_PATH)

target_col = CFG.target_col

train_df, test_df = stratified_split(
    df=df,
    target_col=target_col,
    test_size=CFG.test_size,
    seed=CFG.seed,
)

X_train = train_df.drop(columns=[target_col])
y_train = train_df[target_col].values

X_test = test_df.drop(columns=[target_col])
y_test = test_df[target_col].values

trained = train_logreg_baseline(X_train, y_train, seed=CFG.seed)
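# predict_proba may return one column per class; keep the positive (fraud) column
# so that y_prob is 1-D for thresholding and for the analysis DataFrame below.
y_prob = np.asarray(trained.predict_proba(X_test))
if y_prob.ndim == 2:
    y_prob = y_prob[:, 1]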

## Threshold selection

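We analyze errors at a business-driven threshold: the one that minimizes **expected cost**, given the unit costs of a false negative and a false positive in `CFG`.
Note that the threshold is selected on the test set here, so the cost figures below are optimistic; selecting it on a separate validation split would give a fairer estimate.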
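
As a rough illustration of what a minimum-expected-cost search can look like, the sketch below scans a grid of candidate thresholds and keeps the one with the lowest `FN * cost_fn + FP * cost_fp`. This is not the implementation of `find_threshold_min_cost` in `src.thresholding`, whose internals are not shown in this notebook; the helper name `min_cost_threshold_sketch`, the 101-point grid, the `>=` decision rule, and the toy inputs are all assumptions.

```python
import numpy as np


def min_cost_threshold_sketch(y_true, y_prob, cost_fn, cost_fp, grid=None):
    """Return (threshold, cost) minimizing FN * cost_fn + FP * cost_fp.

    Hypothetical helper for illustration; assumes a score >= threshold
    is flagged as fraud.
    """
    y_true = np.asarray(y_true).astype(int)
    y_prob = np.asarray(y_prob, dtype=float)
    if grid is None:
        grid = np.linspace(0.0, 1.0, 101)  # assumed candidate thresholds

    best_t, best_cost = None, np.inf
    for t in grid:
        y_pred = (y_prob >= t).astype(int)
        fp = int(np.sum((y_true == 0) & (y_pred == 1)))
        fn = int(np.sum((y_true == 1) & (y_pred == 0)))
        cost = fn * cost_fn + fp * cost_fp
        if cost < best_cost:  # strict "<" keeps the lowest threshold on ties
            best_t, best_cost = float(t), float(cost)
    return best_t, best_cost


# Toy data, made up for the example.
toy_y = [0, 0, 0, 1, 0, 1]
toy_p = [0.05, 0.10, 0.40, 0.35, 0.20, 0.90]
toy_t, toy_cost = min_cost_threshold_sketch(toy_y, toy_p, cost_fn=10.0, cost_fp=1.0)
print(toy_t, toy_cost)
```

With a false negative ten times as costly as a false positive, the search settles on a threshold low enough to catch both toy fraud cases at the price of one false alert.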

In [None]:
t = find_threshold_min_cost(
    y_true=y_test,
    y_prob=y_prob,
    cost_fn=CFG.cost_false_negative,
    cost_fp=CFG.cost_false_positive,
)

metrics = evaluate_at_threshold(
    y_true=y_test,
    y_prob=y_prob,
    threshold=t,
    cost_fn=CFG.cost_false_negative,
    cost_fp=CFG.cost_false_positive,
)

t, metrics

In [None]:
y_pred = apply_threshold(y_prob, t)

analysis_df = test_df.copy()
analysis_df["y_true"] = y_test
analysis_df["y_prob"] = y_prob
analysis_df["y_pred"] = y_pred

analysis_df.head()

In [None]:
# Confusion-matrix counts. Use the flat [TN, FP, FN, TP] entry when present;
# otherwise fall back to the 2x2 matrix [[TN, FP], [FN, TP]] and flatten it.
key = "TN_FP_FN_TP" if "TN_FP_FN_TP" in metrics else "confusion_matrix"
tn, fp, fn, tp = (int(v) for v in np.ravel(metrics[key]))

tn, fp, fn, tp

In [None]:
fp_df = analysis_df[(analysis_df["y_true"] == 0) & (analysis_df["y_pred"] == 1)]
fn_df = analysis_df[(analysis_df["y_true"] == 1) & (analysis_df["y_pred"] == 0)]
tp_df = analysis_df[(analysis_df["y_true"] == 1) & (analysis_df["y_pred"] == 1)]
tn_df = analysis_df[(analysis_df["y_true"] == 0) & (analysis_df["y_pred"] == 0)]

len(fp_df), len(fn_df), len(tp_df), len(tn_df)

## Error patterns

Because features V1–V28 are PCA-transformed and not interpretable, we focus on:
- Amount patterns (transaction size)
- Probability distributions (model confidence)
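
One way to compare amounts across error groups is to label each row once and summarize with quantiles, which are less sensitive to a few very large transactions than the mean. The sketch below uses a toy frame in place of `analysis_df`; the helper `label_error_groups` and all values are hypothetical.

```python
import numpy as np
import pandas as pd


def label_error_groups(y_true, y_pred):
    """Map each row to TP / FN / FP / TN (hypothetical helper)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    conditions = [
        (y_true == 1) & (y_pred == 1),
        (y_true == 1) & (y_pred == 0),
        (y_true == 0) & (y_pred == 1),
    ]
    return np.select(conditions, ["TP", "FN", "FP"], default="TN")


# Toy frame standing in for analysis_df; values are made up.
toy = pd.DataFrame({
    "Amount": [10.0, 250.0, 3.5, 80.0, 1200.0, 15.0],
    "y_true": [1, 1, 0, 0, 1, 0],
    "y_pred": [1, 0, 1, 0, 1, 0],
})
toy["group"] = label_error_groups(toy["y_true"], toy["y_pred"])

# Quartiles of Amount per group, one row per group.
amount_quantiles = (
    toy.groupby("group")["Amount"]
    .quantile([0.25, 0.5, 0.75])
    .unstack()
)
print(amount_quantiles)
```

A single `group` column also makes it easy to reuse the same `groupby` for `y_prob` or any other column.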

In [None]:
summary = pd.DataFrame({
    "group": ["TP", "FN", "FP", "TN"],
    "count": [len(tp_df), len(fn_df), len(fp_df), len(tn_df)],
    "amount_mean": [tp_df["Amount"].mean(), fn_df["Amount"].mean(), fp_df["Amount"].mean(), tn_df["Amount"].mean()],
    "amount_median": [tp_df["Amount"].median(), fn_df["Amount"].median(), fp_df["Amount"].median(), tn_df["Amount"].median()],
    "prob_mean": [tp_df["y_prob"].mean(), fn_df["y_prob"].mean(), fp_df["y_prob"].mean(), tn_df["y_prob"].mean()],
})

summary

In [None]:
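# Shared bin edges keep the FP and FN histograms directly comparable.
amount_bins = np.histogram_bin_edges(
    np.concatenate([fp_df["Amount"].values, fn_df["Amount"].values]), bins=50
)
plt.hist(fp_df["Amount"], bins=amount_bins, alpha=0.7, label="FP")
plt.hist(fn_df["Amount"], bins=amount_bins, alpha=0.7, label="FN")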
plt.title("Amount distribution: False Positives vs False Negatives")
plt.xlabel("Amount")
plt.ylabel("Count")
plt.legend()
plt.show()

In [None]:
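# Shared bin edges over [0, 1] keep the FP and FN histograms directly comparable.
prob_bins = np.linspace(0, 1, 51)
plt.hist(fp_df["y_prob"], bins=prob_bins, alpha=0.7, label="FP")
plt.hist(fn_df["y_prob"], bins=prob_bins, alpha=0.7, label="FN")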
plt.title("Model confidence for errors")
plt.xlabel("Predicted probability (fraud)")
plt.ylabel("Count")
plt.legend()
plt.show()

## Business impact

We translate the confusion matrix into business terms:
- FP → manual review load / customer friction
- FN → missed fraud loss

We use simplified, flat per-error cost assumptions from `CFG`:
- `cost_false_negative`: cost of one missed fraud
- `cost_false_positive`: cost of one false alert
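
A flat cost per false negative ignores how large each missed transaction was. As a possible alternative (an assumption, not what this notebook computes), the sketch below charges each missed fraud its own `Amount` while keeping a flat review cost per false alert. The helper `amount_weighted_costs` and all inputs are hypothetical.

```python
import numpy as np


def amount_weighted_costs(y_true, y_pred, amount, cost_fp):
    """Hypothetical variant: each missed fraud costs its own Amount."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    amount = np.asarray(amount, dtype=float)

    fp_mask = (y_true == 0) & (y_pred == 1)
    fn_mask = (y_true == 1) & (y_pred == 0)

    review_cost = float(fp_mask.sum() * cost_fp)
    missed_fraud_loss = float(amount[fn_mask].sum())
    return review_cost, missed_fraud_loss, review_cost + missed_fraud_loss


# Illustrative values only; not results from this notebook.
costs = amount_weighted_costs(
    y_true=[0, 1, 1, 0, 1],
    y_pred=[1, 0, 1, 0, 0],
    amount=[20.0, 300.0, 50.0, 10.0, 75.5],
    cost_fp=5.0,
)
print(costs)
```

Comparing this figure with the flat-cost estimate shows how sensitive the conclusion is to the cost model.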

In [None]:
cost_fn = CFG.cost_false_negative
cost_fp = CFG.cost_false_positive

expected_review_cost = fp * cost_fp
expected_missed_fraud_cost = fn * cost_fn
total_expected_cost = expected_review_cost + expected_missed_fraud_cost

expected_review_cost, expected_missed_fraud_cost, total_expected_cost

In [None]:
business_table = pd.DataFrame([
    {"item": "False Positives (false alerts)", "count": int(fp), "unit_cost": cost_fp, "estimated_cost": expected_review_cost},
    {"item": "False Negatives (missed fraud)", "count": int(fn), "unit_cost": cost_fn, "estimated_cost": expected_missed_fraud_cost},
])

business_table

## Decision-ready conclusion

- Selected threshold: **T = {T}**
- Under this threshold:
  - False alerts (FP): **{FP}**
  - Missed fraud (FN): **{FN}**
  - Estimated review cost: **{REV_COST}**
  - Estimated missed-fraud cost: **{FRAUD_COST}**
  - Total expected cost: **{TOTAL}**
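
The placeholders above can be filled from the computed values rather than by hand. A minimal sketch, assuming a hypothetical helper `render_summary` and illustrative inputs (in the notebook, `t`, `fp`, `fn` and the `CFG` costs from the earlier cells would be passed instead):

```python
SUMMARY_TEMPLATE = """\
- Selected threshold: **T = {T:.3f}**
- Under this threshold:
  - False alerts (FP): **{FP:,}**
  - Missed fraud (FN): **{FN:,}**
  - Estimated review cost: **{REV_COST:,.2f}**
  - Estimated missed-fraud cost: **{FRAUD_COST:,.2f}**
  - Total expected cost: **{TOTAL:,.2f}**
"""


def render_summary(t, fp, fn, cost_fp, cost_fn):
    """Hypothetical helper: fill the summary placeholders from computed values."""
    fp, fn = int(fp), int(fn)  # plain ints so the "," format spec applies
    review_cost = fp * cost_fp
    fraud_cost = fn * cost_fn
    return SUMMARY_TEMPLATE.format(
        T=float(t),
        FP=fp,
        FN=fn,
        REV_COST=review_cost,
        FRAUD_COST=fraud_cost,
        TOTAL=review_cost + fraud_cost,
    )


# Illustrative inputs only; not results from this notebook.
summary_md = render_summary(t=0.21, fp=120, fn=8, cost_fp=5.0, cost_fn=500.0)
print(summary_md)
```

In a notebook, the resulting string can be rendered with `IPython.display.Markdown(summary_md)` so the conclusion always matches the latest run.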

### Recommendation
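- Use this threshold if the business objective is to minimize total expected cost under the assumed FN/FP unit costs; if recall or precision matters more, pick the threshold against that target instead.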
- If alert volume is too high, raise the threshold or add a second-stage model.
- If missed fraud is too costly, lower the threshold and accept a higher review load.
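
The trade-off behind these recommendations can be made concrete with a small threshold sweep: raising the threshold reduces false alerts but increases missed fraud. The sketch below is illustrative only; `sweep_thresholds`, the `>=` decision rule, and the toy data are assumptions.

```python
import numpy as np


def sweep_thresholds(y_true, y_prob, thresholds):
    """Hypothetical helper: count alerts and errors at each threshold."""
    y_true = np.asarray(y_true).astype(int)
    y_prob = np.asarray(y_prob, dtype=float)
    rows = []
    for t in thresholds:
        flagged = y_prob >= t  # assumed decision rule
        rows.append({
            "threshold": float(t),
            "alerts": int(flagged.sum()),
            "fp": int(np.sum(flagged & (y_true == 0))),
            "fn": int(np.sum(~flagged & (y_true == 1))),
        })
    return rows


# Toy data, made up for the example.
toy_y = [0, 0, 1, 0, 1, 1]
toy_p = [0.1, 0.3, 0.45, 0.6, 0.7, 0.95]
sweep = sweep_thresholds(toy_y, toy_p, thresholds=[0.2, 0.5, 0.8])
for row in sweep:
    print(row)
```

Running the same sweep on `y_test` and `y_prob` gives stakeholders a table of alert volume against missed fraud to choose from.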

### Next steps
- Try stronger models (e.g. gradient boosting) and compare PR-AUC and cost curves
- Calibrate probabilities (Platt scaling or isotonic regression) so that thresholds are more reliable
- Set up a monitoring plan: weekly alert rate, drift checks, PR-AUC on labeled data
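
Before investing in calibration, a quick reliability table can show whether it is needed: within each probability bin, the mean predicted probability should be close to the observed fraud rate. A minimal sketch with equal-width bins; `reliability_table` and the toy data are hypothetical.

```python
import numpy as np


def reliability_table(y_true, y_prob, n_bins=5):
    """Hypothetical helper: mean predicted vs observed fraud rate per bin."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Bin index in [0, n_bins - 1]; a probability of exactly 1.0 lands in the last bin.
    idx = np.clip(np.digitize(y_prob, edges[1:-1]), 0, n_bins - 1)

    rows = []
    for b in range(n_bins):
        mask = idx == b
        if not mask.any():
            continue  # skip empty bins
        rows.append({
            "bin": (float(edges[b]), float(edges[b + 1])),
            "n": int(mask.sum()),
            "mean_predicted": float(y_prob[mask].mean()),
            "observed_rate": float(y_true[mask].mean()),
        })
    return rows


# Toy data, made up for the example.
toy_y = [0, 0, 1, 1, 1, 0, 1]
toy_p = [0.05, 0.15, 0.10, 0.90, 0.85, 0.95, 0.50]
table = reliability_table(toy_y, toy_p)
for row in table:
    print(row)
```

Large gaps between `mean_predicted` and `observed_rate` suggest the probabilities are miscalibrated. With fraud as rare as it is here, most rows fall in the lowest bin, so quantile-based bins may be more informative than equal-width ones.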