# Model Evaluation Essentials (Compact but Complete)

This note covers **ROC–AUC**, **Precision–Recall Curves**, **Threshold Tuning**, and an **End-to-End Interview Checklist**. Short, practical, and exam/interview ready.

---

## 1. ROC Curve & AUC

### Intuition

ROC answers one question:

> *How well does the model separate positives from negatives across **all thresholds**?*

Instead of fixing a threshold (like 0.5), ROC evaluates **ranking quality** of predictions.

---

### Definitions

* **TPR (Recall / Sensitivity)**
  $$
  TPR = \frac{TP}{TP + FN}
  $$

* **FPR (Fall-out)**
  $$
  FPR = \frac{FP}{FP + TN}
  $$
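
As a quick worked example of the two definitions above, the sketch below computes TPR and FPR from confusion-matrix counts. The helper name `tpr_fpr` and the counts are illustrative assumptions, not taken from any dataset in this note.

```python
def tpr_fpr(tp, fn, fp, tn):
    """Return (TPR, FPR) from confusion-matrix counts."""
    tpr = tp / (tp + fn) if (tp + fn) else 0.0  # share of actual positives caught
    fpr = fp / (fp + tn) if (fp + tn) else 0.0  # share of actual negatives flagged
    return tpr, fpr

# Illustrative counts: 100 actual positives, 900 actual negatives
tpr, fpr = tpr_fpr(tp=80, fn=20, fp=45, tn=855)
print(f"TPR={tpr:.2f}, FPR={fpr:.2f}")  # TPR=0.80, FPR=0.05
```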

---

### ROC Curve

* X-axis → **FPR**
* Y-axis → **TPR**
* Each point = different probability threshold

**Good model** → curve bends toward top-left

**Random model** → diagonal line
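
To make "each point = different threshold" concrete, here is a minimal pure-Python sketch that sweeps the observed scores as thresholds and records one (FPR, TPR) point per threshold. The function `roc_points` and the toy labels and scores are assumptions for illustration; in practice `sklearn.metrics.roc_curve` does this work.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) pairs, one per distinct threshold, strictest first."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

# Toy data (assumed): three positives, three negatives
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
for fpr, tpr in roc_points(y_true, scores):
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

The curve always starts at (0, 0) and ends at (1, 1); a better model climbs toward TPR = 1 before FPR grows.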

---

### AUC (Area Under Curve)

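$$
\mathrm{AUC} = P\left(\text{score}_{\text{positive}} > \text{score}_{\text{negative}}\right)
$$

Here the probability is taken over a randomly chosen positive example and a randomly chosen negative example.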

Interpretation (rules of thumb; what counts as good depends on the domain):

* **0.5** → random guessing
* **0.7–0.8** → acceptable
* **0.8–0.9** → strong
* **> 0.9** → excellent (or suspicious)
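
The probabilistic reading of AUC can be checked directly by counting pairs. The sketch below is a brute-force illustration under assumed toy data; `pairwise_auc` is a hypothetical helper, and the tie handling (half credit) is a common convention rather than something stated in this note.

```python
def pairwise_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data (assumed): one positive (0.6) is ranked below one negative (0.7)
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(pairwise_auc(y_true, scores))  # 8 of 9 pairs are ordered correctly
```

This loop is quadratic in the number of samples, so it is only meant for intuition; `roc_auc_score` is the practical choice.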

---

### When to Use ROC–AUC

* ✔ Balanced datasets
* ✔ When ranking matters
* ❌ Highly imbalanced datasets (a large negative class keeps FPR low, so the curve can look optimistic)

---

### Code (sklearn)

```python
from sklearn.metrics import roc_curve, roc_auc_score

probs = model.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, probs)
auc = roc_auc_score(y_test, probs)
```

---

## 2. Precision–Recall (PR) Curve

### Why PR Curve?

ROC can look **misleadingly good** on imbalanced data.
PR focuses **only on the positive class**.
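
A small worked calculation shows why. The numbers below are assumed for illustration: an operating point with TPR = 0.90 and FPR = 0.05 looks strong in ROC space, yet with 1% positives most flagged samples are false alarms.

```python
n_total = 10_000
n_pos = 100              # 1% positives (assumed)
n_neg = n_total - n_pos  # 9,900 negatives

tpr, fpr = 0.90, 0.05    # a point that looks strong on a ROC curve

tp = tpr * n_pos         # about 90 true positives
fp = fpr * n_neg         # about 495 false positives
precision = tp / (tp + fp)

print(f"precision = {precision:.3f}")  # precision = 0.154
```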

---

### Definitions

* **Precision**
  $$
  Precision = \frac{TP}{TP + FP}
  $$

* **Recall**
  $$
  Recall = \frac{TP}{TP + FN}
  $$

---

### PR Curve

* X-axis → **Recall**
* Y-axis → **Precision**

Shows trade-off:

* Increase Recall → Precision usually drops
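
The trade-off can be traced by hand. The following sketch, on assumed toy data, sweeps thresholds to produce (recall, precision) points and then sums them step-wise, which is the same definition of average precision that `average_precision_score` documents. Both function names are hypothetical helpers.

```python
def pr_points(y_true, scores):
    """Return (recall, precision) pairs, one per distinct threshold, strictest first."""
    n_pos = sum(y_true)
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        # tp + fp >= 1 because t is an observed score
        points.append((tp / n_pos, tp / (tp + fp)))
    return points

def average_precision(points):
    """Step-wise area under the PR curve: sum of (recall gain) * precision."""
    ap, prev_recall = 0.0, 0.0
    for recall, precision in points:
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap

# Toy data (assumed)
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = pr_points(y_true, scores)
for recall, precision in points:
    print(f"recall={recall:.2f}  precision={precision:.2f}")
print(f"AP = {average_precision(points):.3f}")  # AP = 0.917
```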

---

### Baseline

A no-skill classifier has precision equal to the positive class ratio (prevalence), so that ratio is the baseline a PR curve must beat.

If positives = 5%, baseline = 0.05
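
One way to see this baseline: a classifier that flags every sample as positive reaches recall 1.0 with precision equal to the prevalence. The counts below are assumed for illustration.

```python
n_total, n_pos = 1_000, 50  # 5% positives (assumed counts)

# Flag everything as positive: all positives are caught, all negatives are false alarms.
tp, fp = n_pos, n_total - n_pos
precision = tp / (tp + fp)
recall = tp / n_pos

print(precision, recall)  # 0.05 1.0
```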

---

### When to Use PR Curve

* ✔ Imbalanced datasets
* ✔ Positive class is important (fraud, cancer, spam)

---

### Code

```python
from sklearn.metrics import precision_recall_curve, average_precision_score

precision, recall, thresholds = precision_recall_curve(y_test, probs)
ap = average_precision_score(y_test, probs)
```

---

## 3. Threshold Tuning

### Default Threshold = Bad Idea

Most models use:

```text
probability >= 0.5 → positive
```

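This is **arbitrary**: 0.5 is only a sensible cut-off when the scores are well-calibrated probabilities and false positives and false negatives cost the same.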

---

### Why Tune Threshold?

* Business cost ≠ symmetric
* FP vs FN have different impact

Examples:

* **Cancer detection** → minimize FN → lower threshold
* **Spam filter** → minimize FP → higher threshold
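
The two examples above can be sketched as a cost minimization. This is an illustrative approach, not the only one: `min_cost_threshold`, the cost values, and the validation arrays `y_val` and `p_val` are all assumptions introduced for the example.

```python
def min_cost_threshold(y_true, scores, cost_fp, cost_fn):
    """Return (threshold, total_cost) minimizing cost_fp * FP + cost_fn * FN.

    Candidate thresholds are the observed scores; ties keep the lowest threshold.
    """
    best_t, best_cost = None, float("inf")
    for t in sorted(set(scores)):
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        fn = sum(1 for y, s in zip(y_true, scores) if s < t and y == 1)
        cost = cost_fp * fp + cost_fn * fn
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost

# Hypothetical validation labels and positive-class scores
y_val = [1, 1, 0, 1, 0, 0]
p_val = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

# A miss costs 10x a false alarm (cancer-style): the threshold drops.
print(min_cost_threshold(y_val, p_val, cost_fp=1, cost_fn=10))  # (0.6, 1)
# A false alarm costs 10x a miss (spam-style): the threshold rises.
print(min_cost_threshold(y_val, p_val, cost_fp=10, cost_fn=1))  # (0.8, 1)
```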

---

### Strategy 1: Maximize F1

```python
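from sklearn.metrics import f1_score

# `thresholds` comes from precision_recall_curve above;
# `probs` holds the positive-class scores.
# Ideally run this on a separate validation split: choosing the
# threshold on the test set makes the reported F1 optimistic.
best_t, best_f1 = 0, 0
for t in thresholds:
    preds = (probs >= t).astype(int)  # apply candidate threshold
    f1 = f1_score(y_test, preds)
    if f1 > best_f1:
        best_f1, best_t = f1, t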
```

---

### Strategy 2: Constraint-Based

Examples:

* Recall ≥ 95%
* FPR ≤ 1%

Choose threshold satisfying constraint.
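
For a recall constraint, one reasonable reading is "take the highest threshold that still meets the target", since a higher threshold flags fewer samples. The sketch below assumes at least one positive label; `threshold_for_recall` and the toy validation data are hypothetical.

```python
def threshold_for_recall(y_true, scores, min_recall):
    """Return the highest observed threshold with recall >= min_recall, or None."""
    n_pos = sum(y_true)  # assumed to be at least 1
    for t in sorted(set(scores), reverse=True):  # strictest threshold first
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        if tp / n_pos >= min_recall:
            return t  # recall only grows as t drops, so the first hit is the highest
    return None

# Hypothetical validation labels and positive-class scores
y_val = [1, 1, 0, 1, 0, 0]
p_val = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

print(threshold_for_recall(y_val, p_val, min_recall=0.95))  # 0.6
print(threshold_for_recall(y_val, p_val, min_recall=0.60))  # 0.8
```

An FPR constraint works the same way in reverse: scan from the loosest threshold upward and stop at the first one whose FPR is within the limit.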

---

### Key Insight

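> Threshold tuning changes threshold-dependent **metrics** (precision, recall, F1), not model quality: the ranking of scores, and therefore ROC–AUC and PR–AUC, stays the same.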

---

## 4. Metric Selection Cheat Sheet

| Problem          | Priority       | Metric      |
| ---------------- | -------------- | ----------- |
| Cancer detection | FN costly      | Recall      |
| Spam filter      | FP costly      | Precision   |
| Fraud detection  | Both           | PR–AUC / F1 |
| Balanced data    | Ranking        | ROC–AUC     |
| Imbalanced data  | Positive class | PR–AUC      |

---

## 5. End-to-End Evaluation Checklist (Interview Gold)

### Step 1: Understand Data

* Balanced or imbalanced?
* What is **positive class**?

---

### Step 2: Start Simple

* Confusion Matrix
* Accuracy (only as sanity check)

---

### Step 3: Error Cost Analysis

* Which is worse: FP or FN?
* Real-world impact?

---

### Step 4: Choose Metrics

* Precision / Recall / F1
* ROC–AUC or PR–AUC

---

### Step 5: Tune Threshold

* Use PR / ROC curves
* Optimize business objective

---

### Step 6: Validate

* Cross-validation
* Check metric stability

---

### Step 7: Final Report

* Metric values
* Confusion matrix
* Threshold used
* Why these metrics?

---

## Interview One-Liners

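* **ROC–AUC** measures ranking quality, not classification at a fixed threshold.
* **PR–AUC** is usually more informative for imbalanced datasets.
* **Accuracy** is misleading when classes are skewed: always predicting the majority class already scores high.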
* **Threshold tuning** aligns model with business cost.
* **F1** penalizes imbalance between precision & recall.
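
The F1 one-liner follows from F1 being the harmonic mean of precision and recall. A minimal sketch with assumed values (the helper `f1` is written out by hand here instead of calling `sklearn.metrics.f1_score`):

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(round(f1(0.9, 0.1), 2))  # 0.18, far below the arithmetic mean of 0.5
print(round(f1(0.5, 0.5), 2))  # 0.5, balanced values are not penalized
```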

---

## Final Takeaway

> Good models are trained on data.
> **Great models are selected using the right metrics.**

---



In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)


In [None]:
df = pd.read_csv('/kaggle/input/digit-recognizer/train.csv')


In [None]:
df.head()


In [None]:
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(df.iloc[:,1:],df.iloc[:,0],test_size=0.2,random_state=2)


In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

In [None]:

clf1 = LogisticRegression()
clf2 = DecisionTreeClassifier()

In [None]:
clf1.fit(X_train,y_train)
clf2.fit(X_train,y_train)


In [None]:
from sklearn.metrics import accuracy_score,confusion_matrix

# Import first so later cells can use confusion_matrix even if prediction fails
y_pred1 = clf1.predict(X_test)
y_pred2 = clf2.predict(X_test)
print("Accuracy of Logistic Regression",accuracy_score(y_test,y_pred1))
print("Accuracy of Decision Trees",accuracy_score(y_test,y_pred2))


In [None]:
print("Logistic Regression Confusion Matrix\n")
pd.DataFrame(confusion_matrix(y_test,y_pred1),columns=list(range(0,10)))

Logistic Regression Confusion Matrix




In [None]:
print("Decision Tree Confusion Matrix\n")
pd.DataFrame(confusion_matrix(y_test,y_pred2),columns=list(range(0,10)))

Decision Tree Confusion Matrix




In [None]:
from sklearn.metrics import precision_score,recall_score,f1_score
precision_score(y_test,y_pred1,average='weighted')


In [None]:
recall_score(y_test,y_pred1,average='weighted')


In [None]:
f1_score(y_test,y_pred1,average='weighted')


In [None]:
from sklearn.metrics import classification_report
print(classification_report(y_test,y_pred1))
