# **AI TECH INSTITUTE** · *Intermediate AI & Data Science*
### Week 7 - Lab 02: Cross-Validation & Model Comparison
**Instructor:** Amir Charkhi | **Type:** Hands-On Practice

> Practice what you learned in Notebook 03

## 🎯 Lab Objectives

In this lab, you'll practice:
- Implementing K-fold cross-validation
- Using stratified cross-validation
- Comparing multiple models fairly
- Understanding when to use which CV strategy

**Time**: 25-35 minutes  
**Difficulty**: ⭐⭐⭐☆☆ (Intermediate)

---

## 📚 Quick Reference

**Cross-Validation:**
```python
from sklearn.model_selection import cross_val_score, StratifiedKFold

# Simple cross-validation
scores = cross_val_score(model, X, y, cv=5)

# Stratified K-Fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=cv)
```
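As a worked illustration of the reference above, here is a minimal sketch on synthetic data. The dataset from `make_classification` and the `demo_*` names are assumptions for illustration only; they are not part of the lab. Note that when `cv` is an integer and the estimator is a classifier, scikit-learn already uses a stratified splitter without shuffling, so the explicit `StratifiedKFold` mainly adds shuffling and a fixed seed.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in data (an assumption for illustration, not a lab dataset)
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)

demo_model = LogisticRegression(max_iter=1000)
demo_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# cross_val_score returns one accuracy value per fold
demo_scores = cross_val_score(demo_model, X_demo, y_demo, cv=demo_cv)

print(f"Fold scores: {demo_scores.round(3)}")
print(f"Summary: {demo_scores.mean():.3f} ± {demo_scores.std():.3f}")
```

Reporting the mean together with the standard deviation is the pattern the exercises below build on.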

---

In [None]:
# Setup - Run this cell first!
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_wine, load_iris
from sklearn.model_selection import (
    train_test_split, cross_val_score, StratifiedKFold, KFold
)
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
import warnings
warnings.filterwarnings('ignore')

sns.set_style('whitegrid')
print("✅ Setup complete! Let's practice cross-validation!")

---

## 📊 Exercise 1: Basic Cross-Validation

Let's start with simple K-fold cross-validation!

### Task 1.1: Load Data and Simple CV

In [None]:
# Load wine dataset
wine = load_wine()
X = wine.data
y = wine.target

print(f"Dataset: {len(X)} wine samples, {len(np.unique(y))} classes")

# TODO 1.1: Perform 5-fold cross-validation
# Steps:
#   1. Create a LogisticRegression model (max_iter=1000)
#   2. Use cross_val_score with cv=5
#   3. Calculate mean and std of scores

# Your code here:
model = # Create model
cv_scores = # Perform cross-validation

mean_score = # Calculate mean
std_score = # Calculate std

# Validation (Don't modify)
print(f"\n5-Fold CV Scores: {cv_scores}")
print(f"Mean: {mean_score:.4f}")
print(f"Std:  {std_score:.4f}")
print(f"\nResult: {mean_score:.4f} ± {std_score:.4f}")

if len(cv_scores) == 5:
    print("\n✅ Correct! You performed 5-fold CV")
    print("🎉 Task 1.1 Complete!")
else:
    print("\n❌ Should have 5 scores - check your cv parameter")

### Task 1.2: Visualize Fold Performance

In [None]:
# TODO 1.2: Create a bar plot showing performance across folds
# Include a horizontal line showing the mean

# Your code here (plotting):
plt.figure(figsize=(10, 6))
# Create bar plot of cv_scores
# Add a horizontal line at mean_score
# Add labels, title, and grid


# Validation
print("🎉 Task 1.2 Complete!")

---

## 🎯 Exercise 2: Stratified vs Regular K-Fold

See the difference stratification makes!
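To make the idea concrete before the task, the sketch below counts minority-class samples per test fold on a toy label vector. The labels `y_toy` are hypothetical and deliberately sorted and imbalanced; the lab datasets are far milder, so expect a smaller effect there.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Hypothetical imbalanced labels: 90 samples of class 0 followed by 10 of class 1
y_toy = np.array([0] * 90 + [1] * 10)
X_toy = np.zeros((100, 1))  # feature values do not affect how folds are drawn


def minority_counts(splitter):
    """Return the number of class-1 samples in each test fold."""
    return [int(y_toy[test_idx].sum()) for _, test_idx in splitter.split(X_toy, y_toy)]


# Neither splitter shuffles here, so the folds are deterministic
plain_counts = minority_counts(KFold(n_splits=5))
strat_counts = minority_counts(StratifiedKFold(n_splits=5))

print("KFold:          ", plain_counts)
print("StratifiedKFold:", strat_counts)
```

Without shuffling, plain `KFold` puts every minority sample in the last fold, while `StratifiedKFold` spreads them evenly. Shuffling reduces the problem for `KFold` but does not guarantee balanced folds.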

### Task 2.1: Compare Both Approaches

In [None]:
# TODO 2.1: Compare regular K-Fold vs Stratified K-Fold
# Use the same model for both

model = LogisticRegression(max_iter=1000)

# Your code here:
# 1. Regular K-Fold
kfold = # Create KFold(n_splits=5, shuffle=True, random_state=42)
scores_regular = # cross_val_score with kfold

# 2. Stratified K-Fold
kfold_stratified = # Create StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores_stratified = # cross_val_score with kfold_stratified

# Validation (Don't modify)
print("Regular K-Fold:")
print(f"  Scores: {scores_regular}")
print(f"  Mean: {scores_regular.mean():.4f} ± {scores_regular.std():.4f}")

print("\nStratified K-Fold:")
print(f"  Scores: {scores_stratified}")
print(f"  Mean: {scores_stratified.mean():.4f} ± {scores_stratified.std():.4f}")

print(f"\n💡 Difference in std: {abs(scores_regular.std() - scores_stratified.std()):.4f}")
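print("   Stratified often has lower variance, but not on every split")

if scores_stratified.std() <= scores_regular.std():
    print("\n✅ Stratified CV is more stable!")
else:
    print("\n💡 Regular K-Fold happened to be more stable here; this can occur on a small, fairly balanced dataset")
print("🎉 Task 2.1 Complete!")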

### Task 2.2: Check Class Distribution in Folds

In [None]:
# TODO 2.2: Examine class distribution in each fold
# This shows WHY stratification matters

print("Original class distribution:")
print(pd.Series(y).value_counts(normalize=True).sort_index())

print("\nClass distribution in Stratified K-Fold:")

# Your code here:
# Loop through the folds and print class distribution in each test set
for fold, (train_idx, test_idx) in enumerate(kfold_stratified.split(X, y), 1):
    y_test_fold = # Get y values for test_idx
    # Print fold number and class proportions
    

print("\n💡 Notice: All folds have similar class proportions!")
print("🎉 Task 2.2 Complete!")

---

## 🏆 Exercise 3: Comparing Multiple Models

Now let's compare several models using CV!
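A fair comparison means every model is scored on the same folds. The sketch below checks that a splitter with an integer `random_state` produces identical folds each time `split` is called, which is what happens when one `cv` object is passed to several `cross_val_score` calls. The toy arrays and the `cv_shared` name are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical balanced labels; features are irrelevant for splitting
y_toy = np.array([0, 1] * 20)
X_toy = np.zeros((40, 1))

cv_shared = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Each call to split() with an integer random_state yields the same folds
first_pass = [test_idx.tolist() for _, test_idx in cv_shared.split(X_toy, y_toy)]
second_pass = [test_idx.tolist() for _, test_idx in cv_shared.split(X_toy, y_toy)]

print("Identical folds on both passes:", first_pass == second_pass)
```

Because the folds match, differences in mean score reflect the models rather than a lucky or unlucky split.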

### Task 3.1: Evaluate Multiple Models

In [None]:
# TODO 3.1: Compare 4 different models using cross-validation
# Models: Logistic Regression, Decision Tree, Random Forest, KNN

# Load fresh dataset
iris = load_iris()
X = iris.data
y = iris.target

# Define models
models = {
    'Logistic Regression': LogisticRegression(max_iter=200),
    'Decision Tree': DecisionTreeClassifier(max_depth=5, random_state=42),
    'Random Forest': RandomForestClassifier(n_estimators=50, random_state=42),
    'K-Nearest Neighbors': KNeighborsClassifier(n_neighbors=5)
}

# Your code here:
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
results = []

for name, model in models.items():
    # Perform CV and store results
    scores = # cross_val_score
    results.append({
        'Model': name,
        'Mean': # mean of scores
        'Std': # std of scores
    })

# Create DataFrame and sort by mean score
results_df = pd.DataFrame(results).sort_values('Mean', ascending=False)

# Validation (Don't modify)
print("Model Comparison (5-Fold Stratified CV):\n")
print(results_df.to_string(index=False))

best_model = results_df.iloc[0]['Model']
best_score = results_df.iloc[0]['Mean']

print(f"\n🏆 Winner: {best_model}")
print(f"   Score: {best_score:.4f}")
print("\n🎉 Task 3.1 Complete!")

### Task 3.2: Visualize Model Comparison

In [None]:
# TODO 3.2: Create a horizontal bar chart comparing models
# Include error bars showing standard deviation

# Your code here:
plt.figure(figsize=(10, 6))
# Create horizontal bar plot with error bars
# Use results_df data
# Add title, labels, and grid


print("🎉 Task 3.2 Complete!")

---

## 🎯 Exercise 4: Understanding CV Scores

Let's dig deeper into what CV tells us!
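One way to put fold-to-fold spread in context is the ratio std / mean (the coefficient of variation). The sketch below computes it for two sets of fold scores; the numbers are made up for illustration and do not come from any model in this lab.

```python
import numpy as np

# Hypothetical fold scores, invented for illustration
steady_scores = np.array([0.94, 0.95, 0.96, 0.95, 0.95])
shaky_scores = np.array([0.70, 0.95, 0.80, 0.99, 0.85])


def coefficient_of_variation(scores):
    """Spread relative to the average: std / mean."""
    return scores.std() / scores.mean()


steady_ratio = coefficient_of_variation(steady_scores)
shaky_ratio = coefficient_of_variation(shaky_scores)

print(f"Steady model: std/mean = {steady_ratio:.3f}")
print(f"Shaky model:  std/mean = {shaky_ratio:.3f}")
```

The second set of scores swings much more relative to its average, which is the kind of pattern worth investigating before trusting a single mean.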

### Task 4.1: Interpret CV Results

In [None]:
# Let's create two models with different characteristics
model_consistent = RandomForestClassifier(n_estimators=100, random_state=42)
model_unstable = DecisionTreeClassifier(max_depth=20, random_state=42)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

scores_consistent = cross_val_score(model_consistent, X, y, cv=cv)
scores_unstable = cross_val_score(model_unstable, X, y, cv=cv)

# TODO 4.1: Analyze and explain the difference
# Calculate mean and std for both, then interpret

# Your code here:
mean_cons = # Mean of scores_consistent
std_cons = # Std of scores_consistent

mean_unstable = # Mean of scores_unstable
std_unstable = # Std of scores_unstable

# Print comparison
print("Model Comparison:")
print(f"Random Forest:  {mean_cons:.4f} ± {std_cons:.4f}")
print(f"Decision Tree:  {mean_unstable:.4f} ± {std_unstable:.4f}")

# TODO: Fill in this interpretation
# Which model is more stable?
# Which has higher variance?
# Which would you choose?

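if std_cons < std_unstable:
    print("\n✅ Random Forest is more consistent across folds!")
    print("   Lower std = more stable predictions")
else:
    print("\n💡 The single tree was at least as consistent on these folds.")
    print("   On a small, easy dataset the gap can vanish or reverse")
print("🎉 Task 4.1 Complete!")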

### Task 4.2: When is High Variance a Problem?

In [None]:
# TODO 4.2: Multiple choice - Select the correct answer
# When should you be concerned about high variance in CV scores?

# Options:
# A: "When std > 0.05"
# B: "When std is large relative to mean (e.g., std/mean > 0.1)"
# C: "Never - high variance is always fine"
# D: "Only when mean score is low"

your_answer = ""  # Put A, B, C, or D

# Validation
correct_answer = "B"
explanation = """High variance relative to the mean suggests the model's performance 
is inconsistent across different data splits. A std/mean ratio > 0.1 (10% coefficient 
of variation) is a common rule of thumb for concern, not a hard threshold."""

if your_answer.upper() == correct_answer:
    print("✅ Correct!")
    print(explanation)
    print("\n🎉 Task 4.2 Complete!")
else:
    print(f"❌ Incorrect. The answer is {correct_answer}.")
    print(explanation)

---

## 🧪 Exercise 5: Putting It All Together

Final challenge: Complete ML workflow with CV!

### Task 5.1: Complete Evaluation Workflow

In [None]:
# TODO 5.1: Follow the complete best-practice workflow
# This combines everything you've learned!

# Load data
wine = load_wine()
X = wine.data
y = wine.target

print("Complete ML Evaluation Workflow")
print("="*50)

# Step 1: Split data (hold out test set)
# TODO: Split into 80% train, 20% test, stratified, random_state=42
X_train, X_test, y_train, y_test = # Your split here

print(f"\nStep 1: Data Split")
print(f"  Train: {len(X_train)} samples")
print(f"  Test:  {len(X_test)} samples (LOCKED)")

# Step 2: Compare models using CV on training data only!
print(f"\nStep 2: Model Selection (CV on training data)")

models_to_try = {
    'Logistic': LogisticRegression(max_iter=200),
    'Tree': DecisionTreeClassifier(max_depth=5, random_state=42),
    'Forest': RandomForestClassifier(n_estimators=50, random_state=42)
}

# TODO: Use StratifiedKFold, evaluate each model
cv = # Create StratifiedKFold
best_score = 0
best_model_name = None
best_model = None

for name, model in models_to_try.items():
    scores = # CV scores
    mean = scores.mean()
    print(f"  {name}: {mean:.4f} ± {scores.std():.4f}")
    
    if mean > best_score:
        best_score = mean
        best_model_name = name
        best_model = model

print(f"\n  → Selected: {best_model_name}")

# Step 3: Train best model on full training set
print(f"\nStep 3: Training {best_model_name} on full training set")
# TODO: Fit best_model on X_train, y_train


# Step 4: Evaluate ONCE on test set
print(f"\nStep 4: Final Evaluation on Test Set")
# TODO: Make predictions and calculate accuracy
y_pred = # Predict on X_test
test_accuracy = # Calculate accuracy

print(f"  Test Accuracy: {test_accuracy:.4f}")

# Validation
print("\n" + "="*50)
if test_accuracy > 0.8:
    print("\n✅ Excellent workflow! Model performs well.")
    print("\n💡 Key Points:")
    print("  - Used CV to SELECT model (on training data)")
    print("  - Held out test set until the end")
    print("  - Evaluated ONCE on test set")
    print("\n🎉 Task 5.1 Complete!")
    print("🎉 Lab 02 Complete!")
else:
    print(f"\n⚠️ Test accuracy is {test_accuracy:.1%} - check your code")

---

## 🏆 Lab Complete!

### What You Practiced:

✅ **Exercise 1**: Basic K-fold cross-validation  
✅ **Exercise 2**: Stratified vs regular K-fold  
✅ **Exercise 3**: Comparing multiple models  
✅ **Exercise 4**: Interpreting CV results  
✅ **Exercise 5**: Complete evaluation workflow  

### Key Takeaways:

1. **Prefer StratifiedKFold** for classification, especially with imbalanced classes
2. **CV gives mean ± std** - both matter!
3. **High variance across folds = unstable** model
4. **Use CV for selection**, test set for final evaluation
5. **Never touch the test set** during model development

### The Golden Workflow:

```text
1. Split: Train (80%) + Test (20%) - LOCK test set
2. Use CV on training set to compare models
3. Select best model based on CV results
4. Train best model on full training set
5. Evaluate ONCE on test set - this is your final score
```
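The five steps above can be sketched end to end as follows. This is a recap on synthetic data, not a reference solution: the dataset, the two candidate models, and all variable names are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

# Synthetic stand-in data (assumption, not a lab dataset)
X_syn, y_syn = make_classification(n_samples=300, n_features=10, random_state=0)

# 1. Hold out a stratified test set and leave it alone
X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.2, stratify=y_syn, random_state=42
)

# 2-3. Compare candidates with CV on the training data only, then pick the best mean
candidates = {
    "logistic": LogisticRegression(max_iter=1000),
    "forest": RandomForestClassifier(n_estimators=50, random_state=42),
}
splitter = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
cv_means = {
    name: cross_val_score(est, X_tr, y_tr, cv=splitter).mean()
    for name, est in candidates.items()
}
winner = max(cv_means, key=cv_means.get)

# 4. Refit the selected model on the full training set
final_model = candidates[winner].fit(X_tr, y_tr)

# 5. Score once on the held-out test set
final_accuracy = accuracy_score(y_te, final_model.predict(X_te))
print(f"Selected {winner}; test accuracy = {final_accuracy:.3f}")
```

`cross_val_score` fits clones of each estimator, so the objects in `candidates` are still unfitted after step 2, which is why step 4 refits explicitly.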

### Next Steps:

- Try **Lab 03** for mini-project practice
- Experiment with different K values (3, 5, 10)
- Compare models on your own datasets
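
For the K-value experiment, a loop like the following is one possible starting point. It uses synthetic data and a logistic regression purely as placeholders (both are assumptions); swap in your own dataset and model.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Placeholder data (assumption); replace with your own X and y
X_syn, y_syn = make_classification(n_samples=300, n_features=10, random_state=0)

results_by_k = {}
for k in (3, 5, 10):
    splitter = StratifiedKFold(n_splits=k, shuffle=True, random_state=42)
    scores = cross_val_score(LogisticRegression(max_iter=1000), X_syn, y_syn, cv=splitter)
    # Store mean, std, and the number of folds actually scored
    results_by_k[k] = (scores.mean(), scores.std(), len(scores))

for k, (mean, std, n_folds) in results_by_k.items():
    print(f"K={k:>2}: {mean:.3f} ± {std:.3f} over {n_folds} folds")
```

Larger K means more training data per fold but smaller test folds, so per-fold scores tend to vary more even when the mean barely moves.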

**Excellent work! 🎉**