# Lab 1.6.4: Baseline Comparison Framework

**Module:** 1.6 - Classical ML Foundations  
**Time:** 2 hours  
**Difficulty:** ⭐⭐⭐⭐ (Advanced)

---

## 🎯 Learning Objectives

By the end of this notebook, you will:
- [ ] Build a reusable `BaselineExperiment` class for ML comparisons
- [ ] Implement automated model comparison with consistent metrics
- [ ] Generate comprehensive comparison reports
- [ ] Test the framework on multiple real-world datasets
- [ ] Create visualizations for model performance analysis

---

## 📚 Prerequisites

- Completed: Labs 1.6.1, 1.6.2, 1.6.3
- Knowledge of: XGBoost, scikit-learn, cross-validation

---

## 🌍 Real-World Context

**The Scientific Method for ML:**

In industry, every ML project should start with a baseline comparison:
1. **Establish baselines**: Simple models that are hard to beat
2. **Fair comparison**: Same splits, same metrics, same preprocessing
3. **Document everything**: Training time, inference time, memory usage
4. **Reproducibility**: Anyone can replicate your results
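
Points 1 and 2 can be sketched in a few lines. This minimal example assumes scikit-learn's `DummyClassifier` as the "simplest possible" baseline (the lab itself uses XGBoost, Random Forest and linear models as baselines). Both models are scored on the exact same `StratifiedKFold` splits, so any difference comes from the model rather than the data:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# One splitter object reused for every model -> identical folds (fair comparison)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

models = {
    # Always predicts the most frequent class: the floor any real model must beat
    "Majority class": DummyClassifier(strategy="most_frequent"),
    # Scaling lives inside the pipeline so it is refit on each training fold
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
}

scores = {name: cross_val_score(m, X, y, cv=cv, scoring="accuracy") for name, m in models.items()}
for name, s in scores.items():
    print(f"{name:20s} {s.mean():.3f} +/- {s.std():.3f}")
```

A model that cannot clearly beat the majority-class score is not learning anything useful from the features.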

**Why This Matters:**
- In production ML teams, a new model is typically expected to beat the current baseline before it ships
- In Kaggle competitions on tabular data, a well-tuned XGBoost baseline often beats complex neural networks
- In research papers, reviewers expect rigorous baseline comparisons
- For startups, a simple model that works beats a complex one that doesn't

Today we'll build a framework that enforces these best practices automatically!

---

## 🧒 ELI5: Why Baselines Matter

> **Imagine you're trying to run faster...**
>
> You could:
> - Buy $500 running shoes
> - Hire an expensive coach
> - Follow a complex training program
>
> But first, you should know:
> - How fast can you already run? (baseline)
> - How fast does a regular person run? (simple baseline)
> - Did the expensive shoes actually help? (fair comparison)
>
> **In ML terms:**
> - Complex model = expensive running shoes
> - XGBoost baseline = your current running speed
> - If the complex model doesn't beat the baseline, it's not worth the cost!
>
> **The shocking truth:** On medium-sized tabular benchmarks, tree-based models such as XGBoost often outperform deep learning (see Grinsztajn et al., 2022, in Further Reading).

---

## Part 1: Setup and Imports

In [None]:
# Core imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from time import time
from datetime import datetime
from typing import Dict, List, Any, Optional, Union, Callable
from dataclasses import dataclass
import json
import warnings
warnings.filterwarnings('ignore')

# PyTorch for GPU detection
import torch

# scikit-learn
from sklearn.model_selection import cross_val_score, KFold, StratifiedKFold
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score, roc_auc_score,
    mean_squared_error, mean_absolute_error, r2_score
)
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.linear_model import LogisticRegression, Ridge, Lasso
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.datasets import (
    fetch_california_housing, load_breast_cancer, load_wine,
    make_classification, make_regression
)

# XGBoost
import xgboost as xgb

# Plotting
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)

# Set random seed
np.random.seed(42)

# Determine device for XGBoost
XGB_DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'
print("✅ All imports successful!")
print(f"   XGBoost device: {XGB_DEVICE}")

---

## Part 2: Building the BaselineExperiment Class

Let's build a professional-grade experiment framework!

In [None]:
@dataclass
class ModelResult:
    """
    Stores results from a single model evaluation.
    
    Attributes:
        name: Model name
        metrics: Dictionary of metric names to values
        cv_scores: Cross-validation scores array
        train_time: Training time in seconds
        inference_time: Inference time in seconds
        feature_importance: Feature importance array (if available)
        model: The trained model object
    """
    name: str
    metrics: Dict[str, float]
    cv_scores: np.ndarray
    train_time: float
    inference_time: float
    feature_importance: Optional[np.ndarray] = None
    model: Any = None
    
    def __repr__(self):
        return f"ModelResult(name='{self.name}', metrics={self.metrics})"
    
    def to_dict(self) -> Dict:
        """Convert to dictionary (for JSON serialization)."""
        return {
            'name': self.name,
            'metrics': self.metrics,
            'cv_mean': float(self.cv_scores.mean()),
            'cv_std': float(self.cv_scores.std()),
            'train_time': self.train_time,
            'inference_time': self.inference_time
        }

print("✅ ModelResult dataclass defined!")
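
`to_dict()` wraps the CV statistics in `float()` before they reach `json.dump`. The reason: NumPy reductions return NumPy scalars, and while `np.float64` happens to subclass Python's `float`, `np.float32` does not, so it cannot be serialized. A quick check on made-up scores:

```python
import json
import numpy as np

scores = np.array([0.95, 0.97, 0.96], dtype=np.float32)

try:
    json.dumps({"cv_mean": scores.mean()})  # np.float32 is not JSON serializable
    serializable = True
except TypeError:
    serializable = False

# Converting to a built-in float first always works
payload = json.dumps({"cv_mean": float(scores.mean())})
print(serializable, payload)
```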

In [None]:
class BaselineExperiment:
    """
    A reusable framework for comparing ML models on tabular data.
    
    Features:
    - Automatic cross-validation
    - Consistent metrics across models
    - Training and inference timing
    - Feature importance extraction
    - Visualization and reporting
    
    Example:
        >>> exp = BaselineExperiment(
        ...     X=X, y=y,
        ...     task='classification',
        ...     feature_names=['feat1', 'feat2']
        ... )
        >>> exp.add_model('XGBoost', xgb.XGBClassifier())
        >>> exp.add_model('Random Forest', RandomForestClassifier())
        >>> exp.run()
        >>> exp.report()
    """
    
    def __init__(
        self,
        X: np.ndarray,
        y: np.ndarray,
        task: str = 'classification',
        feature_names: Optional[List[str]] = None,
        cv_folds: int = 5,
        test_size: float = 0.2,
        random_state: int = 42,
        scale_features: bool = True
    ):
        """
        Initialize the experiment.
        
        Args:
            X: Feature matrix
            y: Target vector
            task: 'classification' or 'regression'
            feature_names: Optional list of feature names
            cv_folds: Number of cross-validation folds
            test_size: Proportion of data for test set
            random_state: Random seed for reproducibility
            scale_features: Whether to scale features (for non-tree models)
        """
        self.X = X.astype(np.float32)
        self.y = y
        self.task = task
        self.feature_names = feature_names or [f'feature_{i}' for i in range(X.shape[1])]
        self.cv_folds = cv_folds
        self.test_size = test_size
        self.random_state = random_state
        self.scale_features = scale_features
        
        # Store models to evaluate
        self.models: Dict[str, Any] = {}
        self.needs_scaling: Dict[str, bool] = {}
        
        # Results storage
        self.results: List[ModelResult] = []
        
        # Train/test split
        self._setup_data()
        
        # Set up cross-validation
        if task == 'classification':
            self.cv = StratifiedKFold(n_splits=cv_folds, shuffle=True, random_state=random_state)
        else:
            self.cv = KFold(n_splits=cv_folds, shuffle=True, random_state=random_state)
        
        print("✅ BaselineExperiment initialized!")
        print(f"   Task: {task}")
        print(f"   Samples: {len(X):,}")
        print(f"   Features: {X.shape[1]}")
        print(f"   CV Folds: {cv_folds}")
    
    def _setup_data(self):
        """Split data into train and test sets."""
        from sklearn.model_selection import train_test_split
        
        self.X_train, self.X_test, self.y_train, self.y_test = train_test_split(
            self.X, self.y,
            test_size=self.test_size,
            random_state=self.random_state,
            stratify=self.y if self.task == 'classification' else None
        )
        
        # Prepare scaled versions
        if self.scale_features:
            self.scaler = StandardScaler()
            self.X_train_scaled = self.scaler.fit_transform(self.X_train)
            self.X_test_scaled = self.scaler.transform(self.X_test)
        else:
            self.X_train_scaled = self.X_train
            self.X_test_scaled = self.X_test
    
    def add_model(self, name: str, model: Any, needs_scaling: bool = False):
        """
        Add a model to the experiment.
        
        Args:
            name: Display name for the model
            model: sklearn-compatible model instance
            needs_scaling: Whether this model needs scaled features
        """
        self.models[name] = model
        self.needs_scaling[name] = needs_scaling
        print(f"   Added model: {name}")
    
    def add_default_models(self):
        """
        Add default baseline models for the task.
        """
        print("\n📦 Adding default baseline models...")
        
        if self.task == 'classification':
            # XGBoost (with GPU support if available)
            self.add_model(
                'XGBoost',
                xgb.XGBClassifier(
                    n_estimators=100,
                    max_depth=6,
                    learning_rate=0.1,
                    device=XGB_DEVICE,
                    random_state=self.random_state,
                    verbosity=0
                ),
                needs_scaling=False
            )
            
            # Random Forest
            self.add_model(
                'Random Forest',
                RandomForestClassifier(
                    n_estimators=100,
                    max_depth=16,
                    n_jobs=-1,
                    random_state=self.random_state
                ),
                needs_scaling=False
            )
            
            # Logistic Regression
            self.add_model(
                'Logistic Regression',
                LogisticRegression(
                    max_iter=1000,
                    random_state=self.random_state,
                    n_jobs=-1
                ),
                needs_scaling=True
            )
        
        else:  # regression
            # XGBoost (with GPU support if available)
            self.add_model(
                'XGBoost',
                xgb.XGBRegressor(
                    n_estimators=100,
                    max_depth=6,
                    learning_rate=0.1,
                    device=XGB_DEVICE,
                    random_state=self.random_state,
                    verbosity=0
                ),
                needs_scaling=False
            )
            
            # Random Forest
            self.add_model(
                'Random Forest',
                RandomForestRegressor(
                    n_estimators=100,
                    max_depth=16,
                    n_jobs=-1,
                    random_state=self.random_state
                ),
                needs_scaling=False
            )
            
            # Ridge Regression
            self.add_model(
                'Ridge Regression',
                Ridge(alpha=1.0, random_state=self.random_state),
                needs_scaling=True
            )
    
    def _get_metrics(self, y_true: np.ndarray, y_pred: np.ndarray, y_proba: Optional[np.ndarray] = None) -> Dict[str, float]:
        """Calculate metrics for the task."""
        if self.task == 'classification':
            metrics = {
                'accuracy': accuracy_score(y_true, y_pred),
                'precision': precision_score(y_true, y_pred, average='weighted', zero_division=0),
                'recall': recall_score(y_true, y_pred, average='weighted', zero_division=0),
                'f1': f1_score(y_true, y_pred, average='weighted', zero_division=0)
            }
            if y_proba is not None and len(np.unique(y_true)) == 2:
                try:
                    metrics['roc_auc'] = roc_auc_score(y_true, y_proba[:, 1] if y_proba.ndim > 1 else y_proba)
                except ValueError:
                    pass
        else:
            metrics = {
                'rmse': np.sqrt(mean_squared_error(y_true, y_pred)),
                'mae': mean_absolute_error(y_true, y_pred),
                'r2': r2_score(y_true, y_pred)
            }
        return metrics
    
    def _evaluate_model(self, name: str, model: Any) -> ModelResult:
        """Evaluate a single model."""
        # Select appropriate data
        if self.needs_scaling.get(name, False):
            X_train = self.X_train_scaled
            X_test = self.X_test_scaled
        else:
            X_train = self.X_train
            X_test = self.X_test
        
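        # Cross-validation scoring
        if self.task == 'classification':
            scoring = 'accuracy'
        else:
            scoring = 'neg_root_mean_squared_error'
        
        # For scaled models, refit the scaler inside each CV fold (via a Pipeline)
        # so validation folds never leak into the scaler's mean/std
        if self.needs_scaling.get(name, False) and self.scale_features:
            from sklearn.pipeline import make_pipeline
            cv_model = make_pipeline(StandardScaler(), model)
            X_cv = self.X_train
        else:
            cv_model = model
            X_cv = X_train
        
        cv_scores = cross_val_score(cv_model, X_cv, self.y_train, cv=self.cv, scoring=scoring, n_jobs=-1)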
        
        # Training
        start_time = time()
        model.fit(X_train, self.y_train)
        train_time = time() - start_time
        
        # Inference
        start_time = time()
        y_pred = model.predict(X_test)
        inference_time = time() - start_time
        
        # Get probabilities if available
        y_proba = None
        if self.task == 'classification' and hasattr(model, 'predict_proba'):
            try:
                y_proba = model.predict_proba(X_test)
            except Exception:
                pass
        
        # Calculate metrics
        metrics = self._get_metrics(self.y_test, y_pred, y_proba)
        
        # Feature importance
        feature_importance = None
        if hasattr(model, 'feature_importances_'):
            feature_importance = model.feature_importances_
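        elif hasattr(model, 'coef_'):
            coef = np.abs(np.atleast_2d(model.coef_))
            # Multiclass linear models have one coefficient row per class:
            # average them so there is exactly one importance value per feature
            feature_importance = coef.mean(axis=0)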
        
        return ModelResult(
            name=name,
            metrics=metrics,
            cv_scores=cv_scores if self.task == 'classification' else -cv_scores,
            train_time=train_time,
            inference_time=inference_time,
            feature_importance=feature_importance,
            model=model
        )
    
    def run(self, verbose: bool = True):
        """
        Run the experiment on all models.
        """
        print("\n🚀 Running Baseline Experiment...")
        print("=" * 60)
        
        self.results = []
        
        for name, model in self.models.items():
            if verbose:
                print(f"\n📊 Evaluating: {name}")
            
            result = self._evaluate_model(name, model)
            self.results.append(result)
            
            if verbose:
                if self.task == 'classification':
                    print(f"   CV Accuracy: {result.cv_scores.mean():.4f} (+/- {result.cv_scores.std():.4f})")
                    print(f"   Test Accuracy: {result.metrics['accuracy']:.4f}")
                else:
                    print(f"   CV RMSE: {result.cv_scores.mean():.4f} (+/- {result.cv_scores.std():.4f})")
                    print(f"   Test RMSE: {result.metrics['rmse']:.4f}")
                print(f"   Train Time: {result.train_time:.3f}s")
        
        print("\n✅ Experiment complete!")
        return self
    
    def get_best_model(self) -> ModelResult:
        """Return the best performing model."""
        if not self.results:
            raise ValueError("No results yet. Run the experiment first!")
        
        if self.task == 'classification':
            return max(self.results, key=lambda r: r.metrics['accuracy'])
        else:
            return min(self.results, key=lambda r: r.metrics['rmse'])
    
    def report(self) -> pd.DataFrame:
        """
        Generate a comparison report.
        """
        if not self.results:
            raise ValueError("No results yet. Run the experiment first!")
        
        print("\n📋 Baseline Comparison Report")
        print("=" * 70)
        
        # Build comparison DataFrame
        data = []
        for result in self.results:
            row = {
                'Model': result.name,
                'CV Mean': result.cv_scores.mean(),
                'CV Std': result.cv_scores.std(),
                **result.metrics,
                'Train Time (s)': result.train_time,
                'Inference Time (s)': result.inference_time
            }
            data.append(row)
        
        df = pd.DataFrame(data)
        
        # Sort by primary metric
        if self.task == 'classification':
            df = df.sort_values('accuracy', ascending=False)
        else:
            df = df.sort_values('rmse', ascending=True)
        
        print(df.to_string(index=False))
        
        # Best model
        best = self.get_best_model()
        print(f"\n🏆 Best Model: {best.name}")
        
        return df
    
    def plot_comparison(self, save_path: Optional[str] = None):
        """
        Create visualization of model comparison.
        """
        if not self.results:
            raise ValueError("No results yet. Run the experiment first!")
        
        fig, axes = plt.subplots(2, 2, figsize=(14, 10))
        
        names = [r.name for r in self.results]
        
        # 1. Primary metric comparison
        ax1 = axes[0, 0]
        if self.task == 'classification':
            metric_values = [r.metrics['accuracy'] for r in self.results]
            metric_name = 'Accuracy'
        else:
            metric_values = [r.metrics['rmse'] for r in self.results]
            metric_name = 'RMSE'
        
        colors = plt.cm.viridis(np.linspace(0.2, 0.8, len(names)))
        bars = ax1.bar(names, metric_values, color=colors)
        ax1.set_ylabel(metric_name)
        ax1.set_title(f'Model Comparison: {metric_name}')
        ax1.tick_params(axis='x', rotation=15)
        
        # Add value labels
        for bar, val in zip(bars, metric_values):
            ax1.text(bar.get_x() + bar.get_width()/2, bar.get_height(),
                    f'{val:.4f}', ha='center', va='bottom', fontsize=10)
        
        # 2. Cross-validation boxplot
        ax2 = axes[0, 1]
        cv_data = [r.cv_scores for r in self.results]
        bp = ax2.boxplot(cv_data, labels=names, patch_artist=True)
        for patch, color in zip(bp['boxes'], colors):
            patch.set_facecolor(color)
        ax2.set_ylabel('CV Score')
        ax2.set_title('Cross-Validation Score Distribution')
        ax2.tick_params(axis='x', rotation=15)
        
        # 3. Training time comparison
        ax3 = axes[1, 0]
        train_times = [r.train_time for r in self.results]
        bars = ax3.bar(names, train_times, color=colors)
        ax3.set_ylabel('Time (seconds)')
        ax3.set_title('Training Time')
        ax3.tick_params(axis='x', rotation=15)
        
        # 4. Feature importance (for best model)
        ax4 = axes[1, 1]
        best = self.get_best_model()
        if best.feature_importance is not None:
            # Get top 10 features
            top_k = min(10, len(self.feature_names))
            indices = np.argsort(best.feature_importance)[-top_k:]
            top_features = [self.feature_names[i] for i in indices]
            top_importance = best.feature_importance[indices]
            
            ax4.barh(top_features, top_importance, color='steelblue')
            ax4.set_xlabel('Importance')
            ax4.set_title(f'Feature Importance ({best.name})')
        else:
            ax4.text(0.5, 0.5, 'Feature importance\nnot available',
                    ha='center', va='center', transform=ax4.transAxes)
            ax4.set_title('Feature Importance')
        
        plt.tight_layout()
        
        if save_path:
            plt.savefig(save_path, dpi=150, bbox_inches='tight')
            print(f"💾 Saved: {save_path}")
        
        plt.show()
    
    def save_results(self, filepath: str):
        """Save results to JSON file."""
        data = {
            'experiment_time': datetime.now().isoformat(),
            'task': self.task,
            'n_samples': len(self.X),
            'n_features': self.X.shape[1],
            'cv_folds': self.cv_folds,
            'results': [r.to_dict() for r in self.results]
        }
        
        with open(filepath, 'w') as f:
            json.dump(data, f, indent=2)
        
        print(f"💾 Results saved to: {filepath}")

print("✅ BaselineExperiment class defined!")
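
`_get_metrics` reports precision, recall and F1 with `average='weighted'`, which weights each class's score by its support. On imbalanced labels this can look noticeably better than `average='macro'` (an unweighted mean over classes), because the majority class dominates. A toy example with made-up labels:

```python
import numpy as np
from sklearn.metrics import f1_score

# Toy labels: 8 negatives, 2 positives; the model finds only 1 of the positives
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
y_pred = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 0])

weighted = f1_score(y_true, y_pred, average="weighted")
macro = f1_score(y_true, y_pred, average="macro")
print(f"weighted F1 = {weighted:.3f}, macro F1 = {macro:.3f}")
```

If the minority class is the one you care about most, consider reporting macro-averaged scores alongside the weighted ones.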

---

## Part 3: Testing on Classification Dataset

In [None]:
# Test 1: Breast Cancer Classification
print("🔬 Test 1: Breast Cancer Classification")
print("=" * 60)

# Load data
data = load_breast_cancer()
X, y = data.data, data.target

print("Dataset: Breast Cancer Wisconsin (Diagnostic)")
print(f"Samples: {len(X):,}")
print(f"Features: {X.shape[1]}")
print(f"Classes: {np.unique(y)}")
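
`BaselineExperiment` uses `StratifiedKFold` and `stratify=y` for classification. The breast-cancer labels are imbalanced (212 malignant vs 357 benign), so stratification keeps that ratio roughly constant in every fold. A quick check, independent of the class:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold

X_bc, y_bc = load_breast_cancer(return_X_y=True)
counts = np.bincount(y_bc)  # index 0 = malignant, 1 = benign
print("class counts:", counts)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
# Fraction of benign samples in each validation fold
fold_ratios = [y_bc[val_idx].mean() for _, val_idx in skf.split(X_bc, y_bc)]
print("benign fraction per fold:", np.round(fold_ratios, 3))
```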

In [None]:
# Create experiment
exp_cancer = BaselineExperiment(
    X=X, y=y,
    task='classification',
    feature_names=list(data.feature_names),
    cv_folds=5
)

# Add default models
exp_cancer.add_default_models()

In [None]:
# Run experiment
exp_cancer.run()

In [None]:
# Generate report
report_cancer = exp_cancer.report()

In [None]:
# Visualize
exp_cancer.plot_comparison(save_path='cancer_comparison.png')
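
Because every model is scored on the same folds, CV scores can be compared fold by fold rather than only through their means. The minimal sketch below uses two scikit-learn models (not the lab's XGBoost setup). A model that wins on most folds is a more convincing improvement than one whose mean is higher by less than the fold-to-fold noise:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

lr = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
rf = RandomForestClassifier(n_estimators=100, random_state=42)

lr_scores = cross_val_score(lr, X, y, cv=cv)
rf_scores = cross_val_score(rf, X, y, cv=cv)

# Paired differences: same fold, different model
diffs = lr_scores - rf_scores
print("per-fold difference:", np.round(diffs, 3))
print(f"mean {diffs.mean():+.3f}, std {diffs.std():.3f}, LR wins {np.sum(diffs > 0)}/5 folds")
```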

---

## Part 4: Testing on Regression Dataset

In [None]:
# Test 2: California Housing Regression
print("\n🏠 Test 2: California Housing Regression")
print("=" * 60)

# Load data
housing = fetch_california_housing()
X, y = housing.data, housing.target

print(f"Dataset: California Housing")
print(f"Samples: {len(X):,}")
print(f"Features: {X.shape[1]}")

In [None]:
# Create experiment
exp_housing = BaselineExperiment(
    X=X, y=y,
    task='regression',
    feature_names=list(housing.feature_names),
    cv_folds=5
)

# Add default models
exp_housing.add_default_models()

# Run
exp_housing.run()
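
For regression, the class asks `cross_val_score` for `'neg_root_mean_squared_error'` and then negates the result. scikit-learn scorers follow a "greater is better" convention, so error metrics come back negative. This sketch on synthetic `make_regression` data (not California Housing) checks that negating gives the ordinary per-fold RMSE:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, cross_val_score

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
cv = KFold(n_splits=5, shuffle=True, random_state=0)

neg_rmse = cross_val_score(Ridge(alpha=1.0), X, y, cv=cv, scoring="neg_root_mean_squared_error")
rmse = -neg_rmse  # flip sign: lower RMSE is better

# Recompute RMSE by hand on the same folds to confirm the convention
manual = []
for train_idx, val_idx in cv.split(X):
    model = Ridge(alpha=1.0).fit(X[train_idx], y[train_idx])
    pred = model.predict(X[val_idx])
    manual.append(np.sqrt(mean_squared_error(y[val_idx], pred)))

print(np.round(rmse, 3), np.round(manual, 3))
```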

In [None]:
# Report and visualize
report_housing = exp_housing.report()
exp_housing.plot_comparison(save_path='housing_comparison.png')

---

## Part 5: Testing on Large Synthetic Dataset

In [None]:
# Test 3: Large Synthetic Dataset
print("\n📊 Test 3: Large Synthetic Dataset")
print("=" * 60)

# Generate large dataset
X_large, y_large = make_classification(
    n_samples=100_000,
    n_features=50,
    n_informative=30,
    n_redundant=10,
    n_classes=2,
    random_state=42
)

print(f"Samples: {len(X_large):,}")
print(f"Features: {X_large.shape[1]}")

In [None]:
# Create experiment
exp_large = BaselineExperiment(
    X=X_large, y=y_large,
    task='classification',
    cv_folds=3  # Fewer folds for speed
)

# Add default models
exp_large.add_default_models()

# Run
exp_large.run()
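
The `cv_folds=3` setting above trades some estimate stability for speed. In `_evaluate_model`, each model is fit once per CV fold and then once more on the full training split, and each CV fit sees roughly `(k-1)/k` of that split. The helpers below are hypothetical back-of-the-envelope calculators, not part of the class:

```python
def total_fits(n_models: int, cv_folds: int) -> int:
    """Fits done by run(): cv_folds per model for CV, plus 1 final fit on the training split."""
    return n_models * (cv_folds + 1)


def rows_per_cv_fit(n_samples: int, test_size: float, cv_folds: int) -> int:
    """Approximate training rows seen by each CV fit (fold sizes can differ by one)."""
    return round(n_samples * (1 - test_size) * (cv_folds - 1) / cv_folds)


# add_default_models() registers 3 models
for folds in (3, 5, 10):
    print(f"{folds:2d} folds -> {total_fits(3, folds)} fits, "
          f"~{rows_per_cv_fit(100_000, 0.2, folds):,} rows per CV fit")
```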

In [None]:
# Report and visualize
report_large = exp_large.report()
exp_large.plot_comparison(save_path='large_comparison.png')

---

## Part 6: Summary Across All Datasets

In [None]:
# Summary across datasets
print("📊 Summary: XGBoost vs Other Models")
print("=" * 70)

summary_data = []

for exp, name in [(exp_cancer, 'Breast Cancer'), 
                  (exp_housing, 'California Housing'),
                  (exp_large, 'Large Synthetic')]:
    
    best = exp.get_best_model()
    xgb_result = [r for r in exp.results if r.name == 'XGBoost'][0]
    
    # Rank models by the primary test metric (accuracy: higher is better; RMSE: lower is better)
    if exp.task == 'classification':
        ranked = sorted(exp.results, key=lambda r: r.metrics['accuracy'], reverse=True)
    else:
        ranked = sorted(exp.results, key=lambda r: r.metrics['rmse'])
    xgb_rank = [r.name for r in ranked].index('XGBoost') + 1
    
    summary_data.append({
        'Dataset': name,
        'Task': exp.task,
        'Samples': len(exp.X),
        'Best Model': best.name,
        'XGBoost Rank': xgb_rank,
        'XGBoost Time': f"{xgb_result.train_time:.3f}s"
    })

summary_df = pd.DataFrame(summary_data)
print(summary_df.to_string(index=False))

# Key insight
xgb_wins = sum(1 for d in summary_data if d['Best Model'] == 'XGBoost')
print(f"\n🏆 XGBoost won {xgb_wins}/{len(summary_data)} experiments!")

---

## ✋ Try It Yourself

### Exercise 1: Add More Models

Extend the framework to include LightGBM and a simple neural network.

<details>
<summary>💡 Hint</summary>
Use `lightgbm.LGBMClassifier`, and sklearn's `MLPClassifier` for a simple neural network.
</details>

In [None]:
# Exercise 1: Your code here
# Add LightGBM and neural network to the experiment

# import lightgbm as lgb
# from sklearn.neural_network import MLPClassifier
# 
# X_clf, y_clf = load_breast_cancer(return_X_y=True)  # X, y now hold the regression data
# exp = BaselineExperiment(X=X_clf, y=y_clf, task='classification')
# exp.add_default_models()
# exp.add_model('LightGBM', lgb.LGBMClassifier(n_estimators=100, verbose=-1))
# exp.add_model('MLP', MLPClassifier(hidden_layer_sizes=(100, 50), max_iter=500), needs_scaling=True)
# exp.run()
# exp.report()

### Exercise 2: Custom Metrics

Modify the framework to support custom metrics (e.g., profit-based metric for business applications).

<details>
<summary>💡 Hint</summary>
Add a `custom_metrics` parameter that accepts a dictionary of metric functions.
</details>

In [None]:
# Exercise 2: Your code here
# Add custom metric support

# def profit_metric(y_true, y_pred):
#     """Custom profit metric: TP=$100, FP=-$50, FN=-$200"""
#     tp = np.sum((y_true == 1) & (y_pred == 1))
#     fp = np.sum((y_true == 0) & (y_pred == 1))
#     fn = np.sum((y_true == 1) & (y_pred == 0))
#     return 100*tp - 50*fp - 200*fn
# 
# # Add to experiment...

### Exercise 3: Hyperparameter Tuning Integration

Create a method that automatically tunes the best-performing model.

<details>
<summary>💡 Hint</summary>
Use Optuna within a `tune_best_model()` method.
</details>

In [None]:
# Exercise 3: Your code here
# Add automatic tuning for best model

# def tune_best_model(self, n_trials=50):
#     best = self.get_best_model()
#     # Use Optuna to tune the best model
#     ...

---

## ⚠️ Common Mistakes

### Mistake 1: Data Leakage in Preprocessing

In [None]:
# ❌ Wrong: Fitting scaler on all data
# scaler = StandardScaler()
# X_scaled = scaler.fit_transform(X)  # Leaks test info!
# X_train, X_test = train_test_split(X_scaled, ...)

# ✅ Right: Fit only on training data
# X_train, X_test = train_test_split(X, ...)
# scaler = StandardScaler()
# X_train_scaled = scaler.fit_transform(X_train)
# X_test_scaled = scaler.transform(X_test)  # Only transform!

print("💡 Always fit preprocessing on training data only!")
print("   The test set should be 'unseen' in every way.")
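
The same rule applies inside cross-validation: if a scaler is fit on the whole training split before `cross_val_score`, every validation fold has already influenced it. Wrapping the scaler and model in a scikit-learn `Pipeline` makes cross-validation refit the scaler on each training fold only. A minimal sketch:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scaler + model as one estimator: each CV fold fits the scaler on its own training part
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
cv_scores = cross_val_score(pipe, X_train, y_train, cv=cv)

# Final fit on the full training split; the test set is only transformed, never fit
pipe.fit(X_train, y_train)
test_acc = pipe.score(X_test, y_test)
print(f"CV {cv_scores.mean():.4f} +/- {cv_scores.std():.4f}, test {test_acc:.4f}")
```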

### Mistake 2: Inconsistent Evaluation

In [None]:
# ❌ Wrong: Different splits for different models
# model1.fit(X_train1, y_train1)
# model2.fit(X_train2, y_train2)  # Different split!

# ✅ Right: Same splits, same evaluation
# Use our BaselineExperiment class - it handles this automatically!

print("💡 Fair comparison requires identical data splits!")
print("   BaselineExperiment ensures all models see the same data.")
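
"Same splits" comes down to a splitter with a fixed `random_state`: with an integer seed, `KFold` reshuffles identically on every call. A quick check that two splitters built with the same seed produce the same folds, and that the folds partition all samples:

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)


def fold_indices(seed):
    """Return the validation indices of each fold for a given shuffle seed."""
    cv = KFold(n_splits=5, shuffle=True, random_state=seed)
    return [val_idx.tolist() for _, val_idx in cv.split(X)]


same = fold_indices(42) == fold_indices(42)
different = fold_indices(42) != fold_indices(7)
print(same, different)
```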

---

## 🎉 Checkpoint

Congratulations! You've built a professional baseline comparison framework. You've learned:

- ✅ **BaselineExperiment class**: Reusable, extensible experiment framework
- ✅ **Consistent evaluation**: Same data, same metrics, fair comparison
- ✅ **Automatic reporting**: Visualizations and summaries
- ✅ **Best practices**: Proper train/test splits, no data leakage
- ✅ **Key insight**: XGBoost is a powerful baseline for tabular data!

---

## 🚀 Module Complete!

You've finished Module 1.6: Classical ML Foundations! You now know:

1. **Lab 1.6.1**: XGBoost often beats neural networks on tabular data
2. **Lab 1.6.2**: Optuna makes hyperparameter tuning efficient
3. **Lab 1.6.3**: RAPIDS provides 10-100x GPU acceleration
4. **Lab 1.6.4**: Always start with baselines and compare fairly

**Key Takeaway:** Start every ML project with an XGBoost baseline. It's fast, powerful, and often wins!

---

## 📖 Further Reading

- [Why do tree-based models still outperform deep learning on tabular data?](https://arxiv.org/abs/2207.08815)
- [XGBoost Documentation](https://xgboost.readthedocs.io/)
- [scikit-learn User Guide](https://scikit-learn.org/stable/user_guide.html)
- [RAPIDS cuML Documentation](https://docs.rapids.ai/api/cuml/stable/)

---

## 🧹 Cleanup

In [None]:
# Clean up
import gc

del exp_cancer, exp_housing, exp_large
del X_large, y_large

gc.collect()

print("✅ Cleanup complete!")
print("\n🎉 Congratulations on completing Module 1.6!")

---

## ➡️ Next Steps

Continue to **Module 1.7: Capstone - MicroGrad+** to build your own deep learning framework from scratch!