# Exercise: Comparing Classifiers Using ROC Curves and AUC

**Objective:**  
In this exercise, you will practice evaluating and comparing different classification models using ROC curves and AUC scores. You will:
1. Load and explore a dataset
2. Train **two different classifiers** of your choice
3. Generate ROC curves for both models
4. Calculate AUC scores
5. Compare the models and determine which one performs better

**Available Classifiers:**
- Logistic Regression
- Decision Tree
- Random Forest

**What You'll Learn:**
- How to train multiple models on the same dataset
- How to generate and interpret ROC curves
- How to use AUC to compare model performance
- How to make informed decisions about which model is better



## Step 1: Import Libraries

First, import all the necessary libraries you'll need for this exercise.


In [None]:
# Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Import classifiers (you'll use two of these)
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# Import metrics for evaluation
from sklearn.metrics import (
    confusion_matrix, 
    ConfusionMatrixDisplay,
    accuracy_score, 
    precision_score, 
    recall_score, 
    f1_score,
    roc_curve, 
    roc_auc_score
)

print("Libraries imported successfully!")


## Step 2: Load and Explore the Dataset

We'll use the **Breast Cancer Wisconsin dataset**, a binary classification problem with 569 samples and 30 numeric features. The task is to predict whether a tumor is **malignant (cancerous)** or **benign (non-cancerous)** from those features. Note that scikit-learn encodes malignant as 0 and benign as 1.


In [None]:
# Load the breast cancer dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Create a DataFrame for better visualization
df = pd.DataFrame(X, columns=data.feature_names)
df['target'] = y

print("Dataset shape:", X.shape)
print("\nTarget distribution:")
print(pd.Series(y).value_counts())
print("\n0 = Malignant (cancerous)")
print("1 = Benign (non-cancerous)")
print("\nFirst few rows:")
df.head()


## Step 3: Train-Test Split

Split the data into training and testing sets to evaluate model performance on unseen data.


In [None]:
# Split the data: 75% training, 25% testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

print(f"Training set size: {X_train.shape[0]} samples")
print(f"Testing set size: {X_test.shape[0]} samples")
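
To see what `stratify=y` does in the split above, a quick check can compare the class balance in each subset. This is only an illustrative sketch: it repeats the same split so it runs on its own, and the names `full_rate`, `train_rate`, and `test_rate` are introduced here for the example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Because the labels are 0/1, the mean is the share of class 1 (benign)
full_rate = y.mean()
train_rate = y_train.mean()
test_rate = y_test.mean()

print(f"Benign share, full data: {full_rate:.3f}")
print(f"Benign share, training:  {train_rate:.3f}")
print(f"Benign share, testing:   {test_rate:.3f}")
```

With stratification the three shares should be nearly identical, so neither subset is accidentally dominated by one class.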


## Step 4: Train Your First Classifier

**TODO:** Choose your first classifier from the options below and train it on the training data.

**Options:**
- `LogisticRegression(max_iter=10000, random_state=42)`
- `DecisionTreeClassifier(random_state=42, max_depth=5)`
- `RandomForestClassifier(random_state=42, n_estimators=100)`


In [None]:
# TODO: Initialize your first classifier
# Example: model_1 = LogisticRegression(max_iter=10000, random_state=42)
model_1 = None  # Replace None with your chosen classifier

# TODO: Fit the model on the training data
# model_1.fit(X_train, y_train)

# TODO: Make predictions on the test set
# y_pred_1 = model_1.predict(X_test)
# y_prob_1 = model_1.predict_proba(X_test)[:, 1]  # Probabilities for class 1 (benign), the positive class here

print("Model 1 trained successfully!")
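
It can help to confirm which class the second `predict_proba` column refers to before computing ROC curves. The sketch below is one possible check, not part of the required solution; `demo_model` and `y_prob_demo` are hypothetical names used only here, and the decision tree is just one of the three listed options.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.25, random_state=42, stratify=data.target
)

# demo_model is a hypothetical name used only in this sketch
demo_model = DecisionTreeClassifier(random_state=42, max_depth=5)
demo_model.fit(X_train, y_train)

# classes_ lists the labels in the same order as the predict_proba columns
print("Column order:", demo_model.classes_)
print("Label names: ", data.target_names[demo_model.classes_])

proba = demo_model.predict_proba(X_test)
y_prob_demo = proba[:, 1]  # probability of class 1 (benign)
print("Probability matrix shape:", proba.shape)
```

Since column 1 corresponds to the benign class, the ROC curve and AUC in this exercise describe how well the model ranks benign cases above malignant ones.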


## Step 5: Train Your Second Classifier

**TODO:** Choose a **different** classifier and train it on the same training data.


In [None]:
# TODO: Initialize your second classifier (choose a DIFFERENT one from model_1)
# Example: model_2 = DecisionTreeClassifier(random_state=42, max_depth=5)
model_2 = None  # Replace None with your chosen classifier

# TODO: Fit the model on the training data
# model_2.fit(X_train, y_train)

# TODO: Make predictions on the test set
# y_pred_2 = model_2.predict(X_test)
# y_prob_2 = model_2.predict_proba(X_test)[:, 1]

print("Model 2 trained successfully!")


## Step 6: Evaluate Model 1

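**TODO:** Calculate key metrics for your first model. Keep in mind that `precision_score`, `recall_score`, and `f1_score` treat label 1 (benign) as the positive class by default.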


In [None]:
# TODO: Calculate metrics for Model 1
# acc_1 = accuracy_score(y_test, y_pred_1)
# prec_1 = precision_score(y_test, y_pred_1)
# rec_1 = recall_score(y_test, y_pred_1)
# f1_1 = f1_score(y_test, y_pred_1)
# auc_1 = roc_auc_score(y_test, y_prob_1)

# TODO: Print the metrics
# print("Model 1 Performance:")
# print(f"Accuracy: {acc_1:.3f}")
# print(f"Precision: {prec_1:.3f}")
# print(f"Recall: {rec_1:.3f}")
# print(f"F1 Score: {f1_1:.3f}")
# print(f"AUC: {auc_1:.3f}")


## Step 7: Evaluate Model 2

**TODO:** Calculate key metrics for your second model.


In [None]:
# TODO: Calculate metrics for Model 2
# acc_2 = accuracy_score(y_test, y_pred_2)
# prec_2 = precision_score(y_test, y_pred_2)
# rec_2 = recall_score(y_test, y_pred_2)
# f1_2 = f1_score(y_test, y_pred_2)
# auc_2 = roc_auc_score(y_test, y_prob_2)

# TODO: Print the metrics
# print("Model 2 Performance:")
# print(f"Accuracy: {acc_2:.3f}")
# print(f"Precision: {prec_2:.3f}")
# print(f"Recall: {rec_2:.3f}")
# print(f"F1 Score: {f1_2:.3f}")
# print(f"AUC: {auc_2:.3f}")


## Step 8: Plot ROC Curves for Both Models

**TODO:** Generate ROC curves for both models on the same plot to visually compare their performance.


In [None]:
# TODO: Calculate ROC curve points for Model 1
# fpr_1, tpr_1, _ = roc_curve(y_test, y_prob_1)

# TODO: Calculate ROC curve points for Model 2
# fpr_2, tpr_2, _ = roc_curve(y_test, y_prob_2)

# TODO: Create the plot
# plt.figure(figsize=(8, 6))
# plt.plot(fpr_1, tpr_1, label=f'Model 1 (AUC = {auc_1:.2f})', linewidth=2)
# plt.plot(fpr_2, tpr_2, label=f'Model 2 (AUC = {auc_2:.2f})', linewidth=2)
# plt.plot([0, 1], [0, 1], 'k--', label='Random Guessing (AUC = 0.50)')
# plt.xlabel('False Positive Rate', fontsize=12)
# plt.ylabel('True Positive Rate', fontsize=12)
# plt.title('ROC Curve Comparison', fontsize=14)
# plt.legend(fontsize=10)
# plt.grid(alpha=0.3)
# plt.show()
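
Before plotting the real models, it may help to see what `roc_curve` returns on a tiny hand-made example. The four labels and scores below are made up for illustration and are not taken from the dataset; the `toy_` names are hypothetical.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Four hand-made samples: two negatives (0) and two positives (1)
toy_true = np.array([0, 0, 1, 1])
toy_scores = np.array([0.1, 0.4, 0.35, 0.8])

# Each returned point is the (FPR, TPR) pair obtained at one score threshold
toy_fpr, toy_tpr, toy_thresholds = roc_curve(toy_true, toy_scores)
for f, t, th in zip(toy_fpr, toy_tpr, toy_thresholds):
    print(f"threshold={th:.2f}  FPR={f:.2f}  TPR={t:.2f}")

toy_auc = roc_auc_score(toy_true, toy_scores)
print(f"AUC = {toy_auc:.2f}")
```

Each row is one point on the curve: lowering the threshold moves the point up and to the right, and the AUC is the area under the line connecting those points.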


## Step 9: Compare and Interpret

**TODO:** Answer the following questions based on your results:

1. **Which model has a higher AUC score?**
   - Write your answer here:

2. **Looking at the ROC curves, which curve is closer to the upper-left corner?**
   - Write your answer here:

3. **Based on the AUC interpretation guide from the lecture:**
   - Model 1 AUC = _____ → Performance level: _____
   - Model 2 AUC = _____ → Performance level: _____

4. **Which model would you recommend for this task and why?**
   - Write your answer here:

5. **Are there any trade-offs between the two models?** (Consider accuracy, precision, recall, and interpretability)
   - Write your answer here:



## Step 10: Create a Summary Table

**TODO:** Create a comparison table showing all metrics for both models side by side.


In [None]:
# TODO: Create a summary DataFrame
# summary = pd.DataFrame({
#     'Metric': ['Accuracy', 'Precision', 'Recall', 'F1 Score', 'AUC'],
#     'Model 1': [acc_1, prec_1, rec_1, f1_1, auc_1],
#     'Model 2': [acc_2, prec_2, rec_2, f1_2, auc_2]
# })

# TODO: Print the table
# print("\nModel Comparison Summary:")
# print(summary.round(3).to_string(index=False))

# TODO: Highlight which model wins on each metric
# print("\nBest performing model per metric:")
# for idx, row in summary.iterrows():
#     metric = row['Metric']
#     if row['Model 1'] > row['Model 2']:
#         print(f"{metric}: Model 1 ({row['Model 1']:.3f})")
#     elif row['Model 2'] > row['Model 1']:
#         print(f"{metric}: Model 2 ({row['Model 2']:.3f})")
#     else:
#         print(f"{metric}: Tie ({row['Model 1']:.3f})")


## Bonus Challenge (Optional)

**Try the following to deepen your understanding:**

1. **Train a third classifier** and add it to the ROC curve comparison. Which of the three performs best?

2. **Experiment with hyperparameters:**
   - For Decision Tree: try different values of `max_depth` (3, 5, 10, None)
   - For Random Forest: try different values of `n_estimators` (50, 100, 200)
   - For Logistic Regression: try adding `C=0.1` or `C=10` (`C` is the inverse of regularization strength, so smaller values regularize more strongly)
   - How does this affect the AUC?

3. **Create confusion matrices** for both models and compare the distribution of errors. Which model makes more false positives? Which makes more false negatives?

4. **Threshold analysis:** Pick one model and experiment with different decision thresholds (0.3, 0.5, 0.7) to see how precision and recall change.
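
For the threshold analysis in item 4, one possible starting point is sketched below. It assumes Logistic Regression as the chosen model and repeats the data loading so it runs on its own; the names `clf`, `prob_benign`, and `recalls` are introduced only for this example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

clf = LogisticRegression(max_iter=10000, random_state=42)
clf.fit(X_train, y_train)
prob_benign = clf.predict_proba(X_test)[:, 1]  # probability of class 1 (benign)

recalls = []
for threshold in (0.3, 0.5, 0.7):
    # Predict class 1 only when its probability reaches the threshold
    y_pred_t = (prob_benign >= threshold).astype(int)
    prec = precision_score(y_test, y_pred_t)
    rec = recall_score(y_test, y_pred_t)
    recalls.append(rec)
    print(f"threshold={threshold:.1f}  precision={prec:.3f}  recall={rec:.3f}")
```

Raising the threshold can only shrink the set of samples predicted as class 1, so recall never increases as the threshold goes up; precision usually rises, though it is not guaranteed to.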



## Key Takeaways

After completing this exercise, you should understand:

1. **ROC curves visualize classifier performance** across all thresholds, not just one cutoff point.

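2. **AUC provides a single number** to compare models. A higher AUC means the model ranks positive cases above negative ones more reliably, but it says nothing about which threshold to use or about the relative cost of each type of error.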

3. **Different models have different strengths:**
   - Logistic Regression: Simple, interpretable, works well with linear relationships
   - Decision Tree: Captures non-linear patterns, easy to visualize, but can overfit
   - Random Forest: Often high performance, reduces overfitting, but less interpretable

4. **The "best" model depends on context:** Consider not just AUC but also precision, recall, interpretability, and computational cost.

5. **ROC curves help you make informed decisions** about which model to deploy in real-world applications.
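
One way to read the AUC from these takeaways: it equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below checks this on four made-up scores; the data and the name `pairwise_auc` are illustrative only.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Made-up labels and scores, used only to illustrate the ranking view of AUC
labels = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])

pos = scores[labels == 1]
neg = scores[labels == 0]

# Compare every positive score with every negative score; ties count as half
diff = pos[:, None] - neg[None, :]
pairwise_auc = ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size

print(f"Pairwise ranking estimate: {pairwise_auc:.2f}")
print(f"roc_auc_score:             {roc_auc_score(labels, scores):.2f}")
```

Three of the four positive-negative pairs are ordered correctly, so both values come out to 0.75. This is why AUC depends only on how the scores rank the samples, not on their absolute values.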
