# Evaluation
The core objective of classification evaluation metrics is to quantify how well a model predicts class labels. Plain Accuracy can be misleading, especially with imbalanced data: a model that always predicts the majority class can score a high Accuracy while never detecting the minority class. We therefore use metrics derived from the Confusion Matrix.
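
A minimal sketch of this failure mode. It assumes a hypothetical test set with 95 negatives and 5 positives, and a trivial "model" that always predicts the negative class:

```python
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical, heavily imbalanced labels: 95 negatives (0), 5 positives (1)
y_true = [0] * 95 + [1] * 5
# A trivial "model" that always predicts the majority (negative) class
y_pred = [0] * 100

acc = accuracy_score(y_true, y_pred)
rec = recall_score(y_true, y_pred)  # TP / (TP + FN) = 0 / 5

print(f"Accuracy: {acc:.2f}")  # Accuracy: 0.95
print(f"Recall:   {rec:.2f}")  # Recall:   0.00
```

The 95% Accuracy comes entirely from the majority class. Recall shows that not a single positive was found.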
## 1. The Confusion Matrix
All key metrics start here. The Confusion Matrix is a table that summarizes the performance of a classifier by comparing the predicted classes against the actual classes.

|          | Predicted Positive | Predicted Negative |
|----------|----------|----------|
|Actual Positive|True Positive (TP): Correctly predicted positive.|False Negative (FN): Incorrectly predicted negative (a Type II Error).|
|Actual Negative|False Positive (FP): Incorrectly predicted positive (a Type I Error).|True Negative (TN): Correctly predicted negative.|
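
A minimal sketch of reading these four counts with scikit-learn's `confusion_matrix`, using hypothetical labels. By default scikit-learn orders rows (actual) and columns (predicted) by sorted label. With labels `0`/`1` this gives `[[TN, FP], [FN, TP]]`, which is the reverse of the table above. Passing `labels=[1, 0]` puts the positive class first:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels: 1 = positive, 0 = negative
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

# Default: rows/columns follow sorted labels [0, 1] -> [[TN, FP], [FN, TP]]
cm_default = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm_default.ravel()
print(cm_default)      # [[4 2]
                       #  [1 3]]
print(tn, fp, fn, tp)  # 4 2 1 3

# labels=[1, 0] puts the positive class first, matching the table above:
# [[TP, FN], [FP, TN]]
cm_table = confusion_matrix(y_true, y_pred, labels=[1, 0])
print(cm_table)        # [[3 1]
                       #  [2 4]]
```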

## 2. Core Classification Metrics
|Metric|Formula|What it Measures|When to Use|
|----------|----------|----------|----------|
|Accuracy|$$\frac{TP + TN}{TP + TN + FP + FN}$$|The overall proportion of correct predictions.|When the dataset classes are balanced.|
|Precision|$$\frac{TP}{TP + FP}$$|Of all predictions the model made for the Positive class, how many were correct? (Focuses on minimizing False Positives.)|When the cost of a False Positive is high (e.g., classifying a safe email as spam).|
|Recall (Sensitivity)|$$\frac{TP}{TP + FN}$$|Of all actual Positive instances, how many did the model correctly identify? (Focuses on minimizing False Negatives.)|When the cost of a False Negative is high (e.g., failing to detect a disease).|
|F1-Score|$$2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$|The harmonic mean of Precision and Recall.|When you need a single metric that balances Precision and Recall, especially with imbalanced data.|
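
As a worked check of these formulas, the sketch below uses the counts that the notebook output further down reports (TN = 62, FP = 1, FN = 2, TP = 106, with class 1 as positive). It computes each metric by hand. It then compares the results with scikit-learn's metric functions, applied to label arrays that are rebuilt to contain exactly those counts. The rebuilt arrays are an illustrative assumption, not the notebook's real test set:

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Counts taken from the confusion matrix printed later in this notebook
tn, fp, fn, tp = 62, 1, 2, 106

# Formulas from the table above
accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

# Rebuild label arrays containing exactly these counts (1 = positive class)
y_true = [0] * tn + [0] * fp + [1] * fn + [1] * tp
y_pred = [0] * tn + [1] * fp + [0] * fn + [1] * tp

for name, manual, sk in [
    ("Accuracy", accuracy, accuracy_score(y_true, y_pred)),
    ("Precision", precision, precision_score(y_true, y_pred)),
    ("Recall", recall, recall_score(y_true, y_pred)),
    ("F1", f1, f1_score(y_true, y_pred)),
]:
    print(f"{name:<9} manual={manual:.4f} sklearn={sk:.4f}")
```

The hand-computed values are 0.9825, 0.9907, 0.9815 and 0.9860. Rounded to two decimals, Precision, Recall and F1 match the `benign` row of the classification report (0.99 / 0.98 / 0.99).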

## 3. Beyond the Matrix: ROC and AUC
These metrics evaluate the classifier's performance across all possible decision thresholds, not just the default 0.5.
- ROC Curve (Receiver Operating Characteristic): A plot of the True Positive Rate (TPR, i.e. Recall) against the False Positive Rate ($\text{FPR} = \frac{FP}{FP + TN}$) as the decision threshold varies.

- AUC (Area Under the Curve): Represents the probability that the model will rank a randomly chosen positive instance higher than a randomly chosen negative instance.

    - AUC = 1: Perfect classification.

    - AUC = 0.5: No better than random guessing.

Because it summarizes performance across all thresholds, AUC is a useful single measure of how well the model separates the two classes.
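
A minimal sketch of both views of AUC, using hypothetical labels and scores. `roc_curve` returns one (FPR, TPR) point per candidate threshold, and `auc` integrates that curve with the trapezoidal rule. The pairwise loop checks the ranking interpretation directly: it counts the fraction of (positive, negative) pairs in which the positive receives the higher score, with ties counting as half:

```python
from itertools import product

from sklearn.metrics import auc, roc_auc_score, roc_curve

# Hypothetical labels and model scores (e.g. predicted probabilities)
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

# ROC curve: one (FPR, TPR) point per candidate threshold
fpr, tpr, thresholds = roc_curve(y_true, scores)
for f, t in zip(fpr, tpr):
    print(f"FPR={f:.2f}  TPR={t:.2f}")

# AUC as the area under the ROC curve (trapezoidal rule)
area = auc(fpr, tpr)

# AUC as a ranking probability over all (positive, negative) pairs
pos = [s for s, label in zip(scores, y_true) if label == 1]
neg = [s for s, label in zip(scores, y_true) if label == 0]
pairs = list(product(pos, neg))
rank_prob = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in pairs) / len(pairs)

print(f"auc(fpr, tpr)     = {area:.2f}")                             # 0.75
print(f"roc_auc_score     = {roc_auc_score(y_true, scores):.2f}")    # 0.75
print(f"pairwise P(+ > -) = {rank_prob:.2f}")                        # 0.75
```

Three of the four positive/negative pairs are ranked correctly: 0.35 falls below the negative scored 0.4. All three computations therefore agree on 0.75.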

```python
# CODE CELL 1: Setup and Prediction

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    confusion_matrix,
    accuracy_score,
    classification_report
)
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler

# --- 1. Load and Prepare Data ---
# We use the Breast Cancer dataset (binary classification):
# target 0 = malignant, target 1 = benign
data = load_breast_cancer()
X, y = data.data, data.target

# Scaling the features is generally recommended for Logistic Regression.
# Caveat: fitting the scaler on the full dataset before the split lets the
# test set's mean/std leak into training. For a strict evaluation, split
# first, then use scaler.fit_transform(X_train) and scaler.transform(X_test).
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Split data into training (70%) and testing (30%) sets
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.3, random_state=42
)

# --- 2. Train Model ---
model = LogisticRegression(solver='liblinear', random_state=42)
model.fit(X_train, y_train)

# --- 3. Make Predictions ---
# predict() returns hard class labels, using a 0.5 probability cut-off
y_pred = model.predict(X_test)
```
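
Cell 1 fits the `StandardScaler` on the full dataset before splitting, so test-set statistics influence the scaling. A minimal leak-free sketch follows. It assumes the same split and model settings and wraps the scaler and the model in a scikit-learn `Pipeline`, so the scaler only ever sees training data. The names `X_train_raw`, `pipe` and `y_pred_pipe` are illustrative, chosen so they do not overwrite the variables that Cell 2 uses. The resulting accuracy may differ slightly from the one reported below:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X, y = data.data, data.target

# Split the *raw* features first, so the test set stays unseen
X_train_raw, X_test_raw, y_train_raw, y_test_raw = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# The pipeline fits StandardScaler on the training data only, then applies
# the training mean/std to the test data at predict time
pipe = make_pipeline(
    StandardScaler(),
    LogisticRegression(solver="liblinear", random_state=42),
)
pipe.fit(X_train_raw, y_train_raw)
y_pred_pipe = pipe.predict(X_test_raw)

print(f"Accuracy (scaler fit on train only): {accuracy_score(y_test_raw, y_pred_pipe):.4f}")
```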

```python
# CODE CELL 2: Evaluation and Reporting

# 1. Confusion Matrix
# scikit-learn orders rows (actual) and columns (predicted) by sorted label,
# so the layout is [[TN, FP], [FN, TP]] with class 1 ("benign") as the
# positive class. This is the reverse of the table in Section 1, which
# lists the positive class first.
cm = confusion_matrix(y_test, y_pred)
print("--- 1. Confusion Matrix (TN, FP, FN, TP) ---")
print(cm)
print("\nInterpretation:")
print(f"True Negative (TN): {cm[0, 0]} (Correctly predicted negative)")
print(f"False Positive (FP): {cm[0, 1]} (Incorrectly predicted positive - Type I Error)")
print(f"False Negative (FN): {cm[1, 0]} (Incorrectly predicted negative - Type II Error)")
print(f"True Positive (TP): {cm[1, 1]} (Correctly predicted positive)")

# 2. Accuracy Score
accuracy = accuracy_score(y_test, y_pred)
print("\n--- 2. Overall Accuracy ---")
print(f"Accuracy: {accuracy:.4f} (Total correct predictions / Total samples)")

# 3. Classification Report (Precision, Recall, and F1-Score for each class)
print("\n--- 3. Full Classification Report ---")
# The report displays metrics for Class 0 (malignant) and Class 1 (benign)
print(classification_report(y_test, y_pred, target_names=data.target_names))

# Note: In the Classification Report above:
# - Precision for 'benign' (class 1) is calculated as TP / (TP + FP)
# - Recall for 'benign' (class 1) is calculated as TP / (TP + FN)
# - For 'malignant' (class 0) the roles swap:
#   precision = TN / (TN + FN), recall = TN / (TN + FP)
```

```text
--- 1. Confusion Matrix (TN, FP, FN, TP) ---
[[ 62   1]
 [  2 106]]

Interpretation:
True Negative (TN): 62 (Correctly predicted negative)
False Positive (FP): 1 (Incorrectly predicted positive - Type I Error)
False Negative (FN): 2 (Incorrectly predicted negative - Type II Error)
True Positive (TP): 106 (Correctly predicted positive)

--- 2. Overall Accuracy ---
Accuracy: 0.9825 (Total correct predictions / Total samples)

--- 3. Full Classification Report ---
              precision    recall  f1-score   support

   malignant       0.97      0.98      0.98        63
      benign       0.99      0.98      0.99       108

    accuracy                           0.98       171
   macro avg       0.98      0.98      0.98       171
weighted avg       0.98      0.98      0.98       171
```
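
Section 3 describes ROC and AUC, but the cells above only evaluate hard predictions at the default threshold. The sketch below extends the evaluation in two ways. It computes ROC AUC from the predicted probability of class 1 ("benign"). It also shows how precision and recall shift as the threshold moves. It assumes the same split and model settings as Cell 1, with the scaler wrapped in a `Pipeline` (an assumption, so the exact numbers may differ slightly from the report above). The names `clf` and `proba` and the thresholds 0.3 / 0.5 / 0.7 are illustrative choices:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X_tr, X_te, y_tr, y_te = train_test_split(
    data.data, data.target, test_size=0.3, random_state=42
)
clf = make_pipeline(StandardScaler(), LogisticRegression(solver="liblinear", random_state=42))
clf.fit(X_tr, y_tr)

# Probability of the positive class (column 1 = class 1, "benign")
proba = clf.predict_proba(X_te)[:, 1]

# ROC AUC works on the scores directly, so no threshold is involved
auc_value = roc_auc_score(y_te, proba)
print(f"ROC AUC: {auc_value:.4f}")

# Precision/recall at a few illustrative thresholds
# (0.5 reproduces predict() except for exact ties at 0.5)
for threshold in (0.3, 0.5, 0.7):
    y_hat = (proba >= threshold).astype(int)
    print(
        f"threshold={threshold:.1f}  "
        f"precision={precision_score(y_te, y_hat):.3f}  "
        f"recall={recall_score(y_te, y_hat):.3f}"
    )
```

Lowering the threshold labels more samples as positive. This typically raises recall at the cost of precision, which is the trade-off that the ROC curve summarizes.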

