# Model Evaluation and Optimization

## 📚 Learning Objectives

By completing this notebook, you will:
- Conduct experiments and collect performance metrics
- Compare results with baseline or standard models
- Analyze failure cases and identify model weaknesses
- Visualize results using graphs, confusion matrices, and heat maps
- Iteratively improve model parameters or retrain with improved data

## 🔗 Prerequisites

- ✅ Unit 3: Model development completed
- ✅ Trained models ready for evaluation
- ✅ Understanding of evaluation metrics

---

## Official Structure Reference

This notebook covers practical activities from **Course 12, Unit 4**:
- Conducting experiments and collecting performance metrics
- Comparing results with baseline or standard models
- Analyzing failure cases and identifying weaknesses in the model
- Visualizing results using graphs, confusion matrices, or heat maps
- Iteratively improving model parameters or retraining with improved data
- **Source:** `DETAILED_UNIT_DESCRIPTIONS.md` - Unit 4 Practical Content

---

## Introduction

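**Model Evaluation and Optimization** means measuring a model's performance in detail, identifying where it falls short, and refining it iteratively. This notebook compares three classifiers on a synthetic binary classification dataset, then examines the best one's confusion matrix, ROC curve, and failure cases.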


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import (confusion_matrix, classification_report, 
                            roc_curve, auc, precision_recall_curve)
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

print("✅ Libraries imported!")


In [None]:
# Generate data and split
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

print(f"Training: {X_train.shape}, Test: {X_test.shape}")


## Part 1: Model Comparison - Multiple Algorithms

Compare different algorithms to select the best approach.
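
Before comparing algorithms with each other, it helps to know what a trivial model scores on the same split. The sketch below is one possible way to set that floor, assuming scikit-learn's `DummyClassifier` as the baseline (the notebook itself does not name a baseline model) and reusing the same synthetic data and split parameters as the cell above. Logistic regression stands in for any candidate model.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# Same synthetic data and split as the rest of the notebook
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Baseline (assumption): always predict the most frequent training class
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X_train, y_train)
baseline_pred = baseline.predict(X_test)

# Candidate model to compare against the baseline
candidate = LogisticRegression(random_state=42, max_iter=1000)
candidate.fit(X_train, y_train)
candidate_pred = candidate.predict(X_test)

for label, pred in [
    ("Baseline (most frequent)", baseline_pred),
    ("Logistic Regression", candidate_pred),
]:
    acc = accuracy_score(y_test, pred)
    # zero_division=0 avoids a warning if a model never predicts class 1
    f1 = f1_score(y_test, pred, zero_division=0)
    print(f"{label:<26} accuracy={acc:.4f}  f1={f1:.4f}")
```

A candidate that does not clearly beat this baseline has probably not learned anything useful from the features.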


In [None]:
# Train multiple models
models = {
    'Logistic Regression': LogisticRegression(random_state=42, max_iter=1000),
    'Random Forest': RandomForestClassifier(n_estimators=100, random_state=42),
    'SVM': SVC(probability=True, random_state=42)
}

results = {}

print("=" * 60)
print("Model Comparison")
print("=" * 60)

# Import the metric functions once, before the loop
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

for name, model in models.items():
    # Train
    model.fit(X_train, y_train)

    # Predict
    y_pred = model.predict(X_test)
    y_proba = model.predict_proba(X_test)[:, 1] if hasattr(model, 'predict_proba') else None

    # Calculate metrics
    results[name] = {
        'accuracy': accuracy_score(y_test, y_pred),
        'precision': precision_score(y_test, y_pred),
        'recall': recall_score(y_test, y_pred),
        'f1': f1_score(y_test, y_pred),
        'predictions': y_pred,
        'probabilities': y_proba
    }
    
    print(f"\n{name}:")
    print(f"  Accuracy:  {results[name]['accuracy']:.4f}")
    print(f"  Precision: {results[name]['precision']:.4f}")
    print(f"  Recall:    {results[name]['recall']:.4f}")
    print(f"  F1-Score:  {results[name]['f1']:.4f}")

# Find best model
best_model_name = max(results, key=lambda x: results[x]['f1'])
print(f"\n✅ Best Model (F1-Score): {best_model_name} ({results[best_model_name]['f1']:.4f})")


## Part 2: Confusion Matrix and ROC Curve Visualization


In [None]:
# Visualize confusion matrix for best model
best_predictions = results[best_model_name]['predictions']
cm = confusion_matrix(y_test, best_predictions)

print("=" * 60)
print(f"Confusion Matrix: {best_model_name}")
print("=" * 60)
print(f"\n{cm}")
print(f"\nTrue Negatives: {cm[0,0]}, False Positives: {cm[0,1]}")
print(f"False Negatives: {cm[1,0]}, True Positives: {cm[1,1]}")

# ROC Curve (if probabilities available)
if results[best_model_name]['probabilities'] is not None:
    fpr, tpr, thresholds = roc_curve(y_test, results[best_model_name]['probabilities'])
    roc_auc = auc(fpr, tpr)
    print(f"\nROC AUC Score: {roc_auc:.4f}")
    
    print("\n✅ Use matplotlib to visualize:")
    print("  - Confusion matrix heatmap")
    print("  - ROC curves for all models")
    print("  - Precision-Recall curves")
    print("  - Feature importance plots")
else:
    print("\nNote: Some models don't provide probability estimates")
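
The cell above prints the confusion matrix and AUC but does not draw them. A minimal plotting sketch follows, assuming plain matplotlib (seaborn's `heatmap` would work as well) and a Random Forest as a stand-in for whichever model was selected as best. The data and split are regenerated so the block runs on its own.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import auc, confusion_matrix, roc_curve
from sklearn.model_selection import train_test_split

# Same synthetic data and split as the rest of the notebook
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Stand-in for the best model (assumption)
clf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)
y_pred = clf.predict(X_test)
y_score = clf.predict_proba(X_test)[:, 1]  # probability of class 1

cm = confusion_matrix(y_test, y_pred)
fpr, tpr, _ = roc_curve(y_test, y_score)
roc_auc = auc(fpr, tpr)

fig, (ax_cm, ax_roc) = plt.subplots(1, 2, figsize=(11, 4.5))

# Left panel: confusion matrix as a heat map with the counts written in each cell
im = ax_cm.imshow(cm, cmap="Blues")
fig.colorbar(im, ax=ax_cm)
ax_cm.set_xticks([0, 1])
ax_cm.set_yticks([0, 1])
ax_cm.set_xlabel("Predicted label")
ax_cm.set_ylabel("True label")
ax_cm.set_title("Confusion matrix")
for i in range(cm.shape[0]):
    for j in range(cm.shape[1]):
        color = "white" if cm[i, j] > cm.max() / 2 else "black"
        ax_cm.text(j, i, str(cm[i, j]), ha="center", va="center", color=color)

# Right panel: ROC curve with the chance diagonal for reference
ax_roc.plot(fpr, tpr, label=f"ROC (AUC = {roc_auc:.3f})")
ax_roc.plot([0, 1], [0, 1], linestyle="--", color="gray", label="Chance")
ax_roc.set_xlabel("False positive rate")
ax_roc.set_ylabel("True positive rate")
ax_roc.set_title("ROC curve")
ax_roc.legend(loc="lower right")

fig.tight_layout()
```

In a notebook the figure renders inline at the end of the cell; in a script, call `plt.show()` or `fig.savefig(...)` afterwards.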


## Part 3: Failure Case Analysis

Analyze where and why the model fails, so that the findings can guide improvements.


In [None]:
# Analyze failure cases
best_model = models[best_model_name]
predictions = results[best_model_name]['predictions']

# Find misclassified samples
misclassified_mask = predictions != y_test
misclassified_indices = np.where(misclassified_mask)[0]

print("=" * 60)
print("Failure Case Analysis")
print("=" * 60)
print(f"Total test samples: {len(y_test)}")
print(f"Correctly classified: {np.sum(predictions == y_test)}")
print(f"Misclassified: {len(misclassified_indices)} ({len(misclassified_indices)/len(y_test)*100:.2f}%)")

# Analyze false positives and false negatives
false_positives = np.where((predictions == 1) & (y_test == 0))[0]
false_negatives = np.where((predictions == 0) & (y_test == 1))[0]

print(f"\nFalse Positives (Type I errors): {len(false_positives)}")
print(f"False Negatives (Type II errors): {len(false_negatives)}")

# Predicted-probability analysis for failure cases (if probabilities available)
if results[best_model_name]['probabilities'] is not None:
    probs = results[best_model_name]['probabilities']  # P(class 1) for each test sample
    fp_probs = probs[false_positives]
    fn_probs = probs[false_negatives]

    if len(fp_probs) > 0:
        print("\nFalse Positive Analysis:")
        print(f"  Average predicted P(positive): {np.mean(fp_probs):.4f}")
        print("  These samples were predicted as positive but are actually negative")

    if len(fn_probs) > 0:
        print("\nFalse Negative Analysis:")
        # A low P(positive) here means the model was confident in the wrong (negative) class
        print(f"  Average predicted P(positive): {np.mean(fn_probs):.4f}")
        print("  These samples were predicted as negative but are actually positive")

print("\n✅ Use failure case analysis to:")
print("  - Identify data quality issues")
print("  - Improve feature engineering")
print("  - Adjust class weights or thresholds")
print("  - Collect more training data for difficult cases")


## Summary

### Key Evaluation Steps:
1. **Baseline Comparison**: Compare with simple baseline models
2. **Multiple Algorithms**: Try different algorithms to find the best fit
3. **Comprehensive Metrics**: Use accuracy, precision, recall, F1, ROC-AUC
4. **Visualization**: Confusion matrices, ROC curves, precision-recall curves
5. **Failure Analysis**: Understand where and why the model fails
6. **Iterative Improvement**: Refine based on analysis
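
The iterative improvement step can be sketched as a cross-validated hyperparameter search. This is a minimal example, assuming scikit-learn's `GridSearchCV` with a Random Forest; the grid values are hypothetical and chosen only to keep the run short, not tuned recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Same synthetic data and split as the rest of the notebook
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Hypothetical search grid, kept small so the search finishes quickly
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 5],
}

# 5-fold cross-validation on the training split only, scored by F1
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    scoring="f1",
    cv=5,
)
search.fit(X_train, y_train)

print(f"Best parameters: {search.best_params_}")
print(f"Best cross-validated F1: {search.best_score_:.4f}")

# The test split is touched once, after the search, for a final estimate
test_f1 = f1_score(y_test, search.best_estimator_.predict(X_test))
print(f"Test F1 of the refit best model: {test_f1:.4f}")
```

Keeping the search inside cross-validation on the training data means the test score stays an honest estimate rather than something the tuning has already seen.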

### Optimization Strategies:
- **Data**: Collect more data, improve data quality, feature engineering
- **Model**: Try different architectures, ensemble methods
- **Hyperparameters**: Optimize learning rate, regularization, etc.
- **Thresholds**: Adjust decision thresholds for precision/recall trade-off
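
The threshold strategy can be illustrated by sweeping the decision threshold over predicted probabilities. The sketch below assumes logistic regression and three arbitrary threshold values. It sweeps on the test split only to stay short; in practice the threshold would be chosen on a separate validation split.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Same synthetic data and split as the rest of the notebook
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

clf = LogisticRegression(random_state=42, max_iter=1000).fit(X_train, y_train)
proba = clf.predict_proba(X_test)[:, 1]  # probability of class 1

# Illustrative thresholds (assumption); 0.5 matches the default of predict()
thresholds = [0.3, 0.5, 0.7]
precisions, recalls = [], []

for t in thresholds:
    pred = (proba >= t).astype(int)  # predict positive when P(class 1) >= t
    precisions.append(precision_score(y_test, pred, zero_division=0))
    recalls.append(recall_score(y_test, pred))
    print(f"threshold={t:.1f}  precision={precisions[-1]:.4f}  recall={recalls[-1]:.4f}")
```

Raising the threshold can only shrink the set of predicted positives, so recall never increases as the threshold goes up; precision usually rises but is not guaranteed to.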

**Reference:** Course 12, Unit 4: "Evaluation and Optimization" - All practical activities covered
