# Classification Metrics (Confusion Matrix, ROC-AUC, PR Curve, F1)

This notebook walks through four common ways to evaluate a binary classifier: the **Confusion Matrix, ROC-AUC, Precision-Recall Curve, and F1 Score**. Each is demonstrated on a logistic regression model trained on scikit-learn's breast cancer dataset.

## 1. Import Libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    confusion_matrix,
    ConfusionMatrixDisplay,
    roc_auc_score,
    roc_curve,
    precision_recall_curve,
    f1_score,
    classification_report,
)

## 2. Load Example Dataset
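We'll use the breast cancer dataset from sklearn, a binary classification problem with 30 numeric features. The labels are 0 = malignant and 1 = benign, so the "positive" class in the metrics below is benign.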

In [None]:
data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target, name='target')
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print('Train shape:', X_train.shape)
print('Test shape:', X_test.shape)

## 3. Train a Logistic Regression Model

In [None]:
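# max_iter is raised well above the default (100) to help the solver converge on these unscaled features
clf = LogisticRegression(max_iter=10000)
clf.fit(X_train, y_train)
# Hard class labels (0 = malignant, 1 = benign)
y_pred = clf.predict(X_test)
# Predicted probability of class 1 (benign), used for the ROC and PR curves
y_proba = clf.predict_proba(X_test)[:, 1]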

## 4. Confusion Matrix
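A confusion matrix counts true negatives (TN), false positives (FP), false negatives (FN), and true positives (TP). In scikit-learn, rows are the true classes and columns are the predicted classes, so for labels 0 and 1 the layout is `[[TN, FP], [FN, TP]]`.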

In [None]:
cm = confusion_matrix(y_test, y_pred)
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=data.target_names)
disp.plot(cmap='Blues')
plt.show()
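
To make the four cells concrete, here is a minimal sketch that counts them by hand. The `y_true_toy` and `y_pred_toy` lists are made up for illustration and are not taken from the breast cancer data. The counts are arranged in the same `[[TN, FP], [FN, TP]]` layout described above.

```python
# Toy labels, invented for illustration (1 = positive, 0 = negative).
y_true_toy = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred_toy = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]

pairs = list(zip(y_true_toy, y_pred_toy))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # positives predicted positive
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # negatives predicted negative
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # negatives predicted positive
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # positives predicted negative

# Rows = true class, columns = predicted class.
toy_cm = [[tn, fp], [fn, tp]]
print(toy_cm)  # [[4, 1], [2, 3]]

# Metrics that follow directly from the four counts.
accuracy_toy = (tp + tn) / len(pairs)
precision_toy = tp / (tp + fp)
recall_toy = tp / (tp + fn)
print(accuracy_toy, precision_toy, recall_toy)
```

On the notebook's own data, `tn, fp, fn, tp = cm.ravel()` unpacks the same four counts from `cm`.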

## 5. ROC Curve and AUC
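The **ROC Curve** plots the True Positive Rate, TP / (TP + FN), against the False Positive Rate, FP / (FP + TN), as the decision threshold varies. **AUC** (Area Under the Curve) summarizes the curve in one number: the probability that the model scores a randomly chosen positive example higher than a randomly chosen negative one. A value of 0.5 corresponds to random ranking and 1.0 to perfect separation.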

In [None]:
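# One (FPR, TPR) point per candidate decision threshold
fpr, tpr, roc_thresholds = roc_curve(y_test, y_proba)
auc = roc_auc_score(y_test, y_proba)
plt.plot(fpr, tpr, label=f'ROC curve (AUC = {auc:.2f})')
# Diagonal reference line: the expected ROC of a random classifier
plt.plot([0, 1], [0, 1], 'k--')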
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.legend()
plt.show()
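
The ranking interpretation of AUC can be checked directly. The sketch below uses a hypothetical helper, `pairwise_auc`, and a four-point toy example (neither is part of the notebook's data). It compares every positive score with every negative score and counts how often the positive wins, with ties counted as half a win.

```python
def pairwise_auc(y_true, scores):
    """Fraction of (positive, negative) pairs in which the positive is scored higher."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(pos) * len(neg))

# Toy example: three of the four positive/negative pairs are ranked correctly.
toy_auc = pairwise_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(toy_auc)  # 0.75
```

This brute-force version is quadratic in the number of samples, so it is meant only as an illustration. Calling `pairwise_auc(list(y_test), list(y_proba))` should agree with `roc_auc_score(y_test, y_proba)` up to floating-point error.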

## 6. Precision-Recall (PR) Curve
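The PR curve plots precision, TP / (TP + FP), against recall, TP / (TP + FN), as the decision threshold varies. Lowering the threshold usually raises recall at the cost of precision. Because neither quantity uses true negatives, the PR curve is often more informative than ROC when the positive class is rare.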

In [None]:
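# precision and recall each have one more entry than pr_thresholds (a final point at recall = 0 is appended)
precision, recall, pr_thresholds = precision_recall_curve(y_test, y_proba)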
plt.plot(recall, precision)
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('Precision-Recall Curve')
plt.show()
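
As a rough sketch of how the points on a PR curve arise, the code below sweeps a threshold over each distinct score in a small toy example and records precision and recall at each step. The helpers `pr_points` and `average_precision` are hypothetical names introduced here. The second one computes the step-wise sum of precision weighted by the increase in recall, which is the summary that scikit-learn exposes as `average_precision_score`.

```python
def pr_points(y_true, scores):
    """Return (threshold, precision, recall) for each distinct score, highest first."""
    n_pos = sum(y_true)  # assumes labels are 0/1 and at least one positive exists
    points = []
    for thr in sorted(set(scores), reverse=True):
        # Everything scored at or above the threshold is predicted positive.
        predicted_pos = [t for t, s in zip(y_true, scores) if s >= thr]
        tp = sum(predicted_pos)
        points.append((thr, tp / len(predicted_pos), tp / n_pos))
    return points

def average_precision(points):
    """Sum of precision at each threshold, weighted by the gain in recall."""
    ap, prev_recall = 0.0, 0.0
    for _, precision_at_thr, recall_at_thr in points:
        ap += (recall_at_thr - prev_recall) * precision_at_thr
        prev_recall = recall_at_thr
    return ap

toy_points = pr_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
for thr, p, r in toy_points:
    print(f'threshold={thr:.2f}  precision={p:.2f}  recall={r:.2f}')

toy_ap = average_precision(toy_points)
print(f'Average precision: {toy_ap:.2f}')
```

In the toy output, precision drops from 1.00 to 0.50 when the threshold falls to 0.40 and admits a false positive, then recovers to 0.67 once the second true positive is included at 0.35.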

## 7. F1 Score
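**F1 Score** is the harmonic mean of precision and recall, so it is high only when both are high. By default, `f1_score` reports the value for the positive class (label 1, which is benign here).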
\[
F1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\]

In [None]:
f1 = f1_score(y_test, y_pred)
print(f'F1 Score: {f1:.2f}')
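
The formula above can be evaluated by hand to see why a harmonic mean is used. This is a minimal sketch with made-up precision and recall values. `f1_from` is a hypothetical helper, and the counts `tp_toy`, `fp_toy`, and `fn_toy` are illustrative.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Reasonably balanced precision and recall.
f1_balanced = f1_from(0.75, 0.6)

# Lopsided case: the arithmetic mean looks acceptable, but F1 does not.
f1_lopsided = f1_from(0.9, 0.1)
arithmetic_mean = (0.9 + 0.1) / 2

# Equivalent form written in terms of confusion-matrix counts.
tp_toy, fp_toy, fn_toy = 3, 1, 2  # gives precision 0.75 and recall 0.6
f1_counts = 2 * tp_toy / (2 * tp_toy + fp_toy + fn_toy)

print(f'balanced: {f1_balanced:.3f}, from counts: {f1_counts:.3f}')
print(f'lopsided: F1 = {f1_lopsided:.2f}, arithmetic mean = {arithmetic_mean:.2f}')
```

With precision 0.9 and recall 0.1 the arithmetic mean is 0.50 while F1 is only 0.18, which shows how the harmonic mean is pulled toward the weaker of the two values.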

## 8. Classification Report
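`classification_report` prints precision, recall, F1-score, and support (the number of true samples) for each class, along with overall accuracy and the macro and weighted averages.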

In [None]:
print(classification_report(y_test, y_pred, target_names=data.target_names))
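
The per-class rows and the two averages in the report can be reproduced by treating each class in turn as the "positive" one. The sketch below does this on a small invented label set. The `class_metrics` helper and the toy lists are assumptions for illustration, not part of the notebook's data.

```python
# Toy labels, invented for illustration: 6 samples of class 1 and 4 of class 0.
y_true_rep = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
y_pred_rep = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]

def class_metrics(y_true, y_pred, label):
    """Precision, recall, F1, and support with `label` treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return precision, recall, f1, tp + fn  # support = number of true samples

rows = {label: class_metrics(y_true_rep, y_pred_rep, label) for label in (0, 1)}
for label, (p, r, f1, support) in rows.items():
    print(f'class {label}: precision={p:.2f} recall={r:.2f} f1={f1:.2f} support={support}')

# Macro average: plain mean over classes. Weighted average: mean weighted by support.
macro_f1 = sum(row[2] for row in rows.values()) / len(rows)
total_support = sum(row[3] for row in rows.values())
weighted_f1 = sum(row[2] * row[3] for row in rows.values()) / total_support
print(f'macro F1={macro_f1:.3f}  weighted F1={weighted_f1:.3f}')
```

The two averages differ whenever class sizes differ: the weighted average leans toward the larger class, while the macro average treats both classes equally.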

## 9. Summary
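- **Confusion Matrix**: Counts of TN, FP, FN, and TP at a single decision threshold.
- **ROC-AUC**: How well the predicted scores rank positives above negatives, across all thresholds.
- **PR Curve**: Precision/recall trade-off across thresholds; most useful when the positive class is rare.
- **F1 Score**: Harmonic mean of precision and recall at a single threshold.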