{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Day 4 – Threshold Tuning & Calibration\n",
    "\n",
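    "In fraud detection, heavy class imbalance makes the default **threshold = 0.5** a poor decision rule: fraud is rare, so predicted fraud probabilities often sit well below 0.5 and fraudulent transactions can be missed.\n",
    "- We tune the threshold to maximize **F1 for the fraud class**.\n",
    "- We plot **Precision-Recall**, **ROC**, and **calibration curves**.\n",
    "- Threshold tuning improves **decision-making** (when to flag fraud); the calibration curve checks **probability quality** (how far the predicted fraud risk can be trusted)."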
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "execution_count": 1,
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "from pathlib import Path\n",
    "import joblib\n",
    "from sklearn.metrics import precision_recall_curve, average_precision_score, roc_curve, auc\n",
    "from sklearn.calibration import calibration_curve\n",
    "from src.data_prep import load_raw, get_feature_target, build_preprocessor\n",
    "from src.evaluate import evaluate_classification, find_best_threshold"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load dataset and trained model"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "execution_count": 2,
   "outputs": [],
   "source": [
    "df = load_raw(\"../data/raw/creditcard.csv\")\n",
    "X, y = get_feature_target(df)\n",
    "\n",
    "# load trained Random Forest model for demo\n",
    "model_path = Path(\"../models/rf_pipeline.joblib\")\n",
    "model = joblib.load(model_path)\n",
    "\n",
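    "# Score the full dataset. NOTE: if the model was trained on part of this data,\n",
    "# the metrics and tuned threshold below are optimistic; prefer a held-out split.\n",
    "y_proba = model.predict_proba(X)[:, 1]"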
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Baseline metrics (threshold = 0.5)"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "execution_count": 3,
   "outputs": [],
   "source": [
    "metrics_05 = evaluate_classification(y, y_proba, threshold=0.5)\n",
    "metrics_05"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Find best threshold (optimize fraud F1)"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "execution_count": 4,
   "outputs": [],
   "source": [
    "best_thr = find_best_threshold(y, y_proba, metric=\"f1\")\n",
    "best_thr"
   ]
  },
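  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`find_best_threshold` lives in `src.evaluate` and its implementation is not shown in this notebook. As a rough illustration of what an F1-driven search can look like, the next cell is a minimal sketch built on `precision_recall_curve`. The helper name `f1_optimal_threshold` is hypothetical, and the project helper may work differently."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "execution_count": null,
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sklearn.metrics import f1_score, precision_recall_curve\n",
    "\n",
    "\n",
    "def f1_optimal_threshold(y_true, y_score):\n",
    "    # Hypothetical helper (not from src.evaluate): threshold that maximizes positive-class F1.\n",
    "    precision, recall, thresholds = precision_recall_curve(y_true, y_score)\n",
    "    # The final precision/recall pair has no matching threshold, so drop it.\n",
    "    precision, recall = precision[:-1], recall[:-1]\n",
    "    denom = precision + recall\n",
    "    # Define F1 as 0 where precision + recall == 0 to avoid dividing by zero.\n",
    "    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(denom), where=denom > 0)\n",
    "    best = int(np.argmax(f1))\n",
    "    return float(thresholds[best]), float(f1[best])\n",
    "\n",
    "\n",
    "sketch_thr, sketch_f1 = f1_optimal_threshold(y, y_proba)\n",
    "# Compare against the default cut-off on the same scores.\n",
    "f1_at_05 = f1_score(y, (y_proba >= 0.5).astype(int))\n",
    "print(f'threshold={sketch_thr:.4f}  F1={sketch_f1:.4f}  F1@0.5={f1_at_05:.4f}')"
   ]
  },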
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Precision-Recall Curve"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "execution_count": 5,
   "outputs": [],
   "source": [
    "prec, rec, thr = precision_recall_curve(y, y_proba)\n",
    "ap = average_precision_score(y, y_proba)\n",
    "\n",
    "plt.figure(figsize=(6, 4))\n",
    "plt.plot(rec, prec, label=f\"AP = {ap:.4f}\")\n",
    "plt.xlabel(\"Recall\")\n",
    "plt.ylabel(\"Precision\")\n",
    "plt.title(\"Precision-Recall Curve\")\n",
    "plt.legend()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## ROC Curve"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "execution_count": 6,
   "outputs": [],
   "source": [
    "fpr, tpr, _ = roc_curve(y, y_proba)\n",
    "roc_auc = auc(fpr, tpr)\n",
    "\n",
    "plt.figure(figsize=(6, 4))\n",
    "plt.plot(fpr, tpr, label=f\"ROC AUC = {roc_auc:.4f}\")\n",
    "plt.plot([0,1],[0,1], linestyle=\"--\", color=\"gray\")\n",
    "plt.xlabel(\"False Positive Rate\")\n",
    "plt.ylabel(\"True Positive Rate\")\n",
    "plt.title(\"ROC Curve\")\n",
    "plt.legend()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Calibration Curve (probability quality)"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "execution_count": 7,
   "outputs": [],
   "source": [
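    "# Uniform bins: with rare fraud, most scores typically land in the lowest bin, so the\n",
    "# upper bins can be sparse and noisy; strategy=\"quantile\" is an alternative worth comparing.\n",
    "prob_true, prob_pred = calibration_curve(y, y_proba, n_bins=10, strategy=\"uniform\")\n",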
    "\n",
    "plt.figure(figsize=(6, 4))\n",
    "plt.plot(prob_pred, prob_true, marker=\"o\", label=\"Calibration curve\")\n",
    "plt.plot([0,1],[0,1], linestyle=\"--\", color=\"gray\", label=\"Perfect calibration\")\n",
    "plt.xlabel(\"Mean predicted probability\")\n",
    "plt.ylabel(\"Fraction of positives\")\n",
    "plt.title(\"Calibration Curve\")\n",
    "plt.legend()\n",
    "plt.show()"
   ]
  },
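  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The calibration curve is a visual check. One common way to condense it into a single number is the Brier score (mean squared error between predicted probability and observed outcome; lower is better). The next cell is a minimal sketch using scikit-learn's `brier_score_loss`, with a constant base-rate prediction added for reference. Both the metric and the baseline are assumptions added for illustration, not part of the original analysis."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "execution_count": null,
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sklearn.metrics import brier_score_loss\n",
    "\n",
    "brier_model = brier_score_loss(y, y_proba)\n",
    "# Reference point: always predict the observed fraud rate.\n",
    "base_rate = float(np.mean(y))\n",
    "brier_baseline = brier_score_loss(y, np.full(len(y), base_rate))\n",
    "print(f'Brier (model)={brier_model:.6f}  Brier (base rate)={brier_baseline:.6f}')"
   ]
  },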
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 📌 Insights\n",
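    "- The default threshold (0.5) is rarely optimal for imbalanced fraud data.\n",
    "- Tuning the threshold for F1 trades precision against recall to raise **fraud F1**; compare the tuned metrics with the 0.5 baseline to confirm the gain.\n",
    "- The Precision-Recall curve is more informative than ROC on imbalanced datasets, because ROC's false positive rate is dominated by the large legitimate class.\n",
    "- The calibration curve shows whether predicted fraud probabilities are **trustworthy**: points near the diagonal mean the predicted risk matches the observed fraud rate."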
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}

