# Class Imbalance

This notebook is a **companion to `07_class_imbalance.md`**.

Purpose:
- Illustrate why accuracy is misleading when classes are imbalanced
- Compare accuracy, precision, recall, and F1 under imbalance
- Reinforce interview intuition

---

In [None]:
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix

np.random.seed(42)

## Simulated Imbalanced Dataset

---

In [None]:
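n = 1000
# Draw labels with roughly 95% negatives (0) and 5% positives (1)
y_true = np.random.choice([0, 1], size=n, p=[0.95, 0.05])

# Naive model predicts all zeros (always the majority class)
y_pred_naive = np.zeros_like(y_true)

# Realized class proportions in the sample
pd.Series(y_true).value_counts(normalize=True)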

## Naive Model Performance

---

In [None]:
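# zero_division=0 reports undefined metrics as 0 instead of emitting a warning;
# the naive model never predicts the positive class, so precision is 0/0.
pd.DataFrame({
    'Accuracy': [accuracy_score(y_true, y_pred_naive)],
    'Precision': [precision_score(y_true, y_pred_naive, zero_division=0)],
    'Recall': [recall_score(y_true, y_pred_naive, zero_division=0)],
    'F1': [f1_score(y_true, y_pred_naive, zero_division=0)]
}).T.rename(columns={0: 'Value'})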

## Confusion Matrix

---

In [None]:
cm = confusion_matrix(y_true, y_pred_naive)
pd.DataFrame(cm, index=['Actual 0', 'Actual 1'], columns=['Pred 0', 'Pred 1'])
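The four cells of the matrix are enough to recompute the metrics by hand, which makes it easier to see where the naive model's accuracy comes from. The following is a minimal sketch: it rebuilds the same simulated labels so it runs on its own, and the guards against division by zero are an assumption about how undefined values should be reported, not part of the original cells.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Same setup as the cells above, repeated so this cell runs on its own.
np.random.seed(42)
y_true = np.random.choice([0, 1], size=1000, p=[0.95, 0.05])
y_pred_naive = np.zeros_like(y_true)

# labels=[0, 1] keeps the matrix 2x2; ravel() yields TN, FP, FN, TP in that order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred_naive, labels=[0, 1]).ravel()

# Accuracy counts every correct prediction, so the many true negatives dominate it.
accuracy = (tp + tn) / (tp + tn + fp + fn)

# Recall and precision only look at the positive class.
# Assumption: report 0.0 when the denominator is zero.
recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0

print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")
print(f"accuracy={accuracy:.3f}, precision={precision:.3f}, recall={recall:.3f}")
```

Because the model never predicts 1, the `Pred 1` column is all zeros: every positive becomes a false negative, and accuracy equals the share of negatives in the sample.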

## Interview Takeaways

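- High accuracy can be meaningless: the all-zeros model above scores roughly 95% accuracy while its recall is 0
- Minority class performance usually matters most, so report precision, recall, and F1 for that class
- Metrics must reflect business cost: a missed positive and a false alarm rarely cost the same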
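One way to make the cost point concrete is to weight the error counts by what each mistake costs. This is only a sketch: `COST_FN`, `COST_FP`, the `total_cost` helper, and the "detector" predictions are hypothetical values invented for illustration, not figures from this notebook.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical costs, chosen only for illustration.
COST_FN = 50.0  # cost of missing a positive
COST_FP = 1.0   # cost of a false alarm


def total_cost(y_true, y_pred):
    """Sum of error counts weighted by the hypothetical per-error costs."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return float(fn * COST_FN + fp * COST_FP)


# Deterministic toy labels: 950 negatives followed by 50 positives.
y_true = np.array([0] * 950 + [1] * 50)

# Naive model: always predicts the majority class.
y_pred_naive = np.zeros_like(y_true)

# Hypothetical imperfect detector: 100 false alarms, catches 40 of the 50 positives.
y_pred_detector = np.zeros_like(y_true)
y_pred_detector[:100] = 1     # false alarms on the first 100 negatives
y_pred_detector[950:990] = 1  # hits on 40 of the 50 positives

acc_naive = accuracy_score(y_true, y_pred_naive)
acc_detector = accuracy_score(y_true, y_pred_detector)
cost_naive = total_cost(y_true, y_pred_naive)
cost_detector = total_cost(y_true, y_pred_detector)

print(f"naive:    accuracy={acc_naive:.2f}, cost={cost_naive:.0f}")
print(f"detector: accuracy={acc_detector:.2f}, cost={cost_detector:.0f}")
```

Under these assumed costs the detector has lower accuracy than the naive model but a much lower total cost, which is the situation accuracy alone cannot reveal.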

In interviews, always ask about the base rate (the fraction of positives) before interpreting any metric.
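The reason base rates matter can be sketched with Bayes' rule: for a fixed sensitivity and specificity, precision falls as positives become rarer. The function name `precision_from_base_rate` and the 0.9 values are assumptions made for this example.

```python
def precision_from_base_rate(sensitivity, specificity, base_rate):
    """Precision (positive predictive value) implied by Bayes' rule."""
    true_pos = sensitivity * base_rate                # P(pred=1, actual=1)
    false_pos = (1.0 - specificity) * (1.0 - base_rate)  # P(pred=1, actual=0)
    return true_pos / (true_pos + false_pos)


# Hypothetical classifier quality, held fixed while the base rate changes.
SENSITIVITY = 0.9
SPECIFICITY = 0.9

base_rates = [0.5, 0.05, 0.01]
precisions = [
    precision_from_base_rate(SENSITIVITY, SPECIFICITY, p) for p in base_rates
]

for p, ppv in zip(base_rates, precisions):
    print(f"base rate={p:.2f} -> precision={ppv:.3f}")
```

With these assumed values, precision is 0.9 on balanced data but drops to about 0.32 at a 5% base rate and about 0.08 at 1%, even though the classifier itself has not changed.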

---