# KDD — Data Mining

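Baseline models: IsolationForest and Local Outlier Factor (LOF) as unsupervised anomaly detectors, plus a gradient boosting machine (GBM) as a supervised classifier.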


In [None]:
# Fit anomaly detectors and classifiers
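
The cell above is still a placeholder. As a minimal sketch of what fitting one detector and one classifier side by side might look like, the block below compares LOF and a GBM by average precision. It runs on synthetic data (an assumption, used so the block is self-contained), and the hyperparameters are illustrative choices, not tuned values from this project.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import LocalOutlierFactor

# Synthetic stand-in for the real data (assumption): 2% anomalies scattered
# widely around a dense Gaussian bulk of normal rows.
rng = np.random.default_rng(0)
X_normal = rng.normal(0.0, 1.0, size=(1960, 5))
X_anom = rng.uniform(-6.0, 6.0, size=(40, 5))
X = np.vstack([X_normal, X_anom])
y = np.r_[np.zeros(1960, dtype=int), np.ones(40, dtype=int)]

Xtr, Xte, ytr, yte = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Unsupervised detector: novelty=True lets LOF score rows it was not fitted on.
lof = LocalOutlierFactor(n_neighbors=20, novelty=True)
lof.fit(Xtr)
lof_scores = -lof.score_samples(Xte)  # negate so higher = more anomalous

# Supervised classifier: the GBM sees the training labels.
gbm = GradientBoostingClassifier(n_estimators=50, random_state=0)
gbm.fit(Xtr, ytr)
gbm_scores = gbm.predict_proba(Xte)[:, 1]

results = {
    "LOF": average_precision_score(yte, lof_scores),
    "GBM": average_precision_score(yte, gbm_scores),
}
for name, ap in results.items():
    print(f"{name} average precision: {ap:.4f}")
```

Scoring both models with the same metric on the same hold-out rows keeps the comparison like for like, even though only the GBM uses labels during training.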


## TODO (phase gates)

- [ ] Algorithm comparison
- [ ] Threshold by capacity
- [ ] Cross-validated PR curves
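
One possible reading of "Threshold by capacity" is that the alert threshold follows from how many cases reviewers can handle, not from a fixed quantile. The sketch below assumes that reading. The helper `threshold_for_capacity` and the example scores are hypothetical and not part of this project.

```python
import numpy as np


def threshold_for_capacity(scores: np.ndarray, capacity: int) -> float:
    """Return the cut-off so that `scores >= cut-off` flags `capacity` rows.

    Hypothetical helper. Ties at the cut-off can flag a few extra rows.
    """
    if capacity <= 0:
        return float("inf")  # nothing can be reviewed, so flag nothing
    if capacity >= scores.size:
        return float(scores.min())  # everything fits within capacity
    # np.partition puts the capacity-th largest score at index -capacity.
    return float(np.partition(scores, -capacity)[-capacity])


# Made-up anomaly scores (higher = more anomalous), capacity of two reviews.
scores = np.array([0.1, 0.9, 0.4, 0.7, 0.2, 0.8])
thr = threshold_for_capacity(scores, capacity=2)
flagged = scores >= thr
print("threshold:", thr, "flagged rows:", int(flagged.sum()))
```

With a capacity of two, the cut-off lands on the second-highest score, so exactly the two highest-scoring rows are flagged.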


## Critic Review Prompt

```
You are Dr. A. Renati, a world-renowned authority on <CRISP-DM|SEMMA|KDD>, author of award-winning books, and a keynote speaker at KDD/Strata.
Goal: ruthlessly critique ONLY the phase just completed, assuming an industry deployment.
Constraints:
- Be specific, actionable, and testable. No platitudes.
- Enforce methodology rigor: required subtasks, artifacts, risks, and acceptance gates.
- Flag data leakage, evaluation pitfalls, and governance/compliance gaps.
- Propose 3–5 experiments that could falsify my current conclusions.
- Rewrite my acceptance criteria to be business-measurable.
Return:
1) Red flags (bulleted, severity-tagged)
2) Missing artifacts (with exact filenames to add)
3) Experiments (name, hypothesis, design, success metric)
4) What to cut/simplify (to ship this week)
5) Final Go/No-Go recommendation for this phase

```

In [None]:
import sys
from pathlib import Path

import joblib
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split

# Make src/ importable; it sits next to the directory this notebook runs in.
sys.path.append(str(Path.cwd().parents[0] / 'src'))
from data_loader import load_creditcard

# Load the data and split off a stratified 20% hold-out set.
df = load_creditcard()
X = df.drop(columns=['Class'])
y = df['Class']
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)

# Fit the unsupervised detector; labels are used only for evaluation.
iso = IsolationForest(n_estimators=200, random_state=42, contamination=0.001)
iso.fit(Xtr)

# score_samples is higher for normal points, so negate it to get an anomaly score.
scores = -iso.score_samples(Xte)
ap = average_precision_score(yte, scores)
print('IsolationForest Average Precision:', round(ap, 4))

# Save a simple threshold (top 0.1% of scores).
# Computed on training scores so the hold-out set is used for evaluation only.
thr = np.quantile(-iso.score_samples(Xtr), 0.999)
out_path = Path('../data/processed') / 'iforest_stub.pkl'
out_path.parent.mkdir(parents=True, exist_ok=True)
joblib.dump({'model': 'IsolationForest', 'threshold': float(thr)}, str(out_path))
print('Saved threshold to ../data/processed/iforest_stub.pkl')
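
The cell above reports average precision from a single hold-out split. As a rough sketch of the "Cross-validated PR curves" TODO item, the block below collects one precision-recall curve and one AP value per stratified fold. It uses synthetic data and illustrative settings (both assumptions) and does not call the project's `load_creditcard`.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import average_precision_score, precision_recall_curve
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for the real data (assumption): 3% scattered anomalies.
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(1940, 4)),
    rng.uniform(-8.0, 8.0, size=(60, 4)),
])
y = np.r_[np.zeros(1940, dtype=int), np.ones(60, dtype=int)]

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
fold_aps, fold_curves = [], []
for train_idx, test_idx in cv.split(X, y):
    # Refit per fold so each curve comes from rows the model has not seen.
    iso = IsolationForest(n_estimators=100, random_state=42)
    iso.fit(X[train_idx])
    fold_scores = -iso.score_samples(X[test_idx])  # higher = more anomalous
    precision, recall, _ = precision_recall_curve(y[test_idx], fold_scores)
    fold_curves.append((precision, recall))
    fold_aps.append(average_precision_score(y[test_idx], fold_scores))

print("AP per fold:", np.round(fold_aps, 4))
print(f"mean AP = {np.mean(fold_aps):.4f}, std = {np.std(fold_aps):.4f}")
```

Each entry in `fold_curves` can be plotted as its own line. The spread of the per-fold AP values gives a rough sense of how much a single-split estimate might vary.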
