# Module 03: Boosting Fundamentals - AdaBoost

**Difficulty**: ⭐⭐⭐ Advanced

**Estimated Time**: 80 minutes

**Prerequisites**: 
- Module 00: Introduction to Ensemble Learning
- Module 01: Bagging and Bootstrap Aggregation
- Understanding of weighted samples and exponential loss

## Learning Objectives

By the end of this notebook, you will be able to:

1. Understand how boosting differs from bagging (sequential vs parallel)
2. Explain the AdaBoost algorithm and its mathematical foundation
3. Implement AdaBoost from scratch to understand the mechanics
4. Apply AdaBoost to classification problems using scikit-learn
5. Tune AdaBoost hyperparameters for optimal performance
6. Understand when to use boosting vs bagging

---

## 1. Setup and Imports

In [None]:
# Core libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Scikit-learn models and utilities
from sklearn.datasets import make_classification, make_moons, load_breast_cancer
from sklearn.model_selection import (
    train_test_split,
    cross_val_score,
    learning_curve,
    validation_curve
)
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.ensemble import (
    AdaBoostClassifier,
    RandomForestClassifier,
    BaggingClassifier
)
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    roc_auc_score
)

# Configuration
%matplotlib inline
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette('husl')

# Set random seeds for reproducibility
np.random.seed(42)

# Display settings
pd.set_option('display.max_columns', None)
pd.set_option('display.precision', 4)

# Suppress warnings
import warnings
warnings.filterwarnings('ignore')

print("✅ Setup complete!")

## 2. Boosting vs Bagging: Key Differences

### Philosophical Difference

**Bagging (Modules 01-02)**:
- **Democracy approach**: All models vote equally
- Train models independently in parallel
- Each model sees a different bootstrap sample
- Reduces variance; leaves bias largely unchanged
- Use complex base models (deep trees)

**Boosting**:
- **Iterative learning**: Each model corrects previous errors
- Train models sequentially (each depends on previous)
- Each model focuses on hard examples
- Primarily reduces bias; can also reduce variance
- Use simple base models (shallow trees)
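
The structural difference shows up directly in scikit-learn's API. The sketch below is illustrative only: the toy dataset and estimator counts are arbitrary choices, and both ensembles rely on their default base learners (a fully grown tree for `BaggingClassifier`, a depth-1 stump for `AdaBoostClassifier`).

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import train_test_split

# Toy data; noise level and sizes are arbitrary illustration values
X_demo, y_demo = make_moons(n_samples=400, noise=0.3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0
)

# Bagging: trees are independent, so fitting could be spread over
# workers through the n_jobs parameter (left at its default here)
bagging = BaggingClassifier(n_estimators=25, random_state=0)
bagging.fit(X_tr, y_tr)

# Boosting: each stump depends on the weights left by the previous one,
# so the estimator offers no n_jobs parameter
boosting = AdaBoostClassifier(n_estimators=25, random_state=0)
boosting.fit(X_tr, y_tr)

bagging_has_n_jobs = "n_jobs" in bagging.get_params()
boosting_has_n_jobs = "n_jobs" in boosting.get_params()

print(f"Bagging exposes n_jobs:  {bagging_has_n_jobs}")
print(f"Boosting exposes n_jobs: {boosting_has_n_jobs}")
print(f"Bagging test accuracy:   {bagging.score(X_te, y_te):.3f}")
print(f"Boosting test accuracy:  {boosting.score(X_te, y_te):.3f}")
```

The accuracies depend on the data and seeds, so treat them as a sanity check rather than a ranking of the two methods.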

### Visual Comparison

In [None]:
# Create visualization comparing bagging and boosting
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Bagging illustration
ax = axes[0]
ax.text(0.5, 0.95, 'Original Data', ha='center', fontsize=11, fontweight='bold')
for i in range(3):
    # Bootstrap samples
    ax.add_patch(plt.Rectangle((0.15 + i*0.25, 0.75), 0.15, 0.12, 
                                fill=True, facecolor='lightblue', alpha=0.7, edgecolor='black'))
    ax.text(0.225 + i*0.25, 0.81, f'Bootstrap\nSample {i+1}', 
            ha='center', fontsize=8, va='center')
    
    # Models (parallel)
    ax.arrow(0.225 + i*0.25, 0.75, 0, -0.13, head_width=0.025, 
             head_length=0.02, fc='gray', alpha=0.5)
    ax.add_patch(plt.Circle((0.225 + i*0.25, 0.55), 0.06, 
                             fill=True, facecolor='orange', alpha=0.7, edgecolor='black'))
    ax.text(0.225 + i*0.25, 0.55, f'M{i+1}', ha='center', va='center', fontsize=9)
    
    # Arrows to final
    ax.arrow(0.225 + i*0.25, 0.49, (0.5 - (0.225 + i*0.25))*0.8, -0.2, 
             head_width=0.02, head_length=0.02, fc='black', alpha=0.3)

# Final prediction
ax.add_patch(plt.Rectangle((0.35, 0.15), 0.3, 0.12, 
                            fill=True, facecolor='gold', alpha=0.8, edgecolor='black', linewidth=2))
ax.text(0.5, 0.21, 'Average/Vote\n(Equal Weight)', ha='center', va='center', 
        fontsize=9, fontweight='bold')

ax.set_xlim(0, 1)
ax.set_ylim(0, 1)
ax.axis('off')
ax.set_title('BAGGING\n(Parallel, Independent)', fontweight='bold', fontsize=12)

# Boosting illustration
ax = axes[1]
ax.text(0.5, 0.95, 'Original Data', ha='center', fontsize=11, fontweight='bold')

colors = ['lightblue', 'lightcoral', 'lightgreen']
for i in range(3):
    y_pos = 0.75 - i*0.21
    
    # Weighted data
    ax.add_patch(plt.Rectangle((0.05, y_pos), 0.35, 0.12, 
                                fill=True, facecolor=colors[i], alpha=0.7, edgecolor='black'))
    if i == 0:
        ax.text(0.225, y_pos + 0.06, 'All samples\nequal weight', 
                ha='center', va='center', fontsize=7)
    else:
        ax.text(0.225, y_pos + 0.06, 'Reweight\n(focus on errors)', 
                ha='center', va='center', fontsize=7)
    
    # Model
    ax.arrow(0.4, y_pos + 0.06, 0.1, 0, head_width=0.02, 
             head_length=0.02, fc='gray', alpha=0.5)
    ax.add_patch(plt.Circle((0.6, y_pos + 0.06), 0.06, 
                             fill=True, facecolor='orange', alpha=0.7, edgecolor='black'))
    ax.text(0.6, y_pos + 0.06, f'M{i+1}', ha='center', va='center', fontsize=9)
    
    # Weight
    ax.text(0.75, y_pos + 0.06, f'α{i+1}', fontsize=10, fontweight='bold', 
            style='italic', color='red')
    
    # Feedback arrow (except last)
    if i < 2:
        ax.annotate('', xy=(0.1, y_pos - 0.06), xytext=(0.6, y_pos),
                   arrowprops=dict(arrowstyle='->', lw=1.5, color='red', alpha=0.6))
        ax.text(0.35, y_pos - 0.09, 'Update\nweights', fontsize=7, 
                ha='center', color='red', style='italic')

# Final prediction
ax.add_patch(plt.Rectangle((0.25, 0.02), 0.5, 0.1, 
                            fill=True, facecolor='gold', alpha=0.8, edgecolor='black', linewidth=2))
ax.text(0.5, 0.07, 'Weighted Sum\n(α₁M₁ + α₂M₂ + α₃M₃)', 
        ha='center', va='center', fontsize=8, fontweight='bold')

ax.set_xlim(0, 1)
ax.set_ylim(0, 1)
ax.axis('off')
ax.set_title('BOOSTING\n(Sequential, Adaptive)', fontweight='bold', fontsize=12)

plt.tight_layout()
plt.show()

print("\n📊 Key Differences Summary:")
print("=" * 60)
print("\nBAGGING:")
print("  • Models trained in parallel (independent)")
print("  • Each model has equal weight")
print("  • Focus: Reduce variance")
print("  • Best with: Complex models (deep trees)")
print("\nBOOSTING:")
print("  • Models trained sequentially (dependent)")
print("  • Models have different weights based on performance")
print("  • Focus: Primarily reduce bias (can also reduce variance)")
print("  • Best with: Simple models (shallow trees, stumps)")

## 3. AdaBoost Algorithm

### The Name: Adaptive Boosting

**AdaBoost** = **Ada**ptive **Boost**ing
- **Adaptive**: Weights adapt to focus on misclassified examples
- **Boosting**: Sequential combination of weak learners

### The Algorithm (Binary Classification)

**Input**: Training data $(x_1, y_1), ..., (x_n, y_n)$ where $y_i \in \{-1, +1\}$

**Initialize**: Sample weights $w_i^{(1)} = \frac{1}{n}$ for all $i$

**For** $t = 1$ **to** $T$:

1. **Train weak learner**: $h_t(x)$ on weighted dataset

2. **Calculate error**: $\epsilon_t = \sum_{i: h_t(x_i) \neq y_i} w_i^{(t)}$

3. **Calculate model weight**: $\alpha_t = \frac{1}{2} \ln\left(\frac{1 - \epsilon_t}{\epsilon_t}\right)$

4. **Update sample weights**: 
   $$w_i^{(t+1)} = w_i^{(t)} \cdot e^{-\alpha_t y_i h_t(x_i)}$$
   
5. **Normalize**: $w_i^{(t+1)} = \frac{w_i^{(t+1)}}{\sum_j w_j^{(t+1)}}$

**Final Model**: $H(x) = \text{sign}\left(\sum_{t=1}^T \alpha_t h_t(x)\right)$
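
To make one round concrete, here is a small worked example with illustrative numbers: five samples with equal weights $w_i^{(1)} = 0.2$, one of which $h_1$ misclassifies. These values are assumed for illustration and do not come from the datasets used in this notebook.

$$\epsilon_1 = 0.2, \qquad \alpha_1 = \frac{1}{2}\ln\left(\frac{1 - 0.2}{0.2}\right) = \frac{1}{2}\ln 4 = \ln 2 \approx 0.693$$

$$\text{misclassified: } 0.2 \cdot e^{+\alpha_1} = 0.2 \cdot 2 = 0.4, \qquad \text{correct: } 0.2 \cdot e^{-\alpha_1} = 0.2 \cdot \tfrac{1}{2} = 0.1$$

$$\sum_j w_j = 0.4 + 4 \cdot 0.1 = 0.8 \quad\Rightarrow\quad w_{\text{misclassified}}^{(2)} = \frac{0.4}{0.8} = 0.5, \qquad w_{\text{correct}}^{(2)} = \frac{0.1}{0.8} = 0.125$$

After normalization the single misclassified sample carries half of the total weight. Under the new weights $h_1$ therefore has a weighted error of exactly 0.5, so the next learner has to split the data differently to do better than chance.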

### Key Insights

1. **Error-based weighting**: Better models get higher $\alpha$ (more influence)
2. **Misclassification focus**: Misclassified samples get higher weights
3. **Exponential loss**: Severely penalizes misclassifications
4. **Weak learners**: Even slightly-better-than-random models help!
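
The formula for $\alpha_t$ follows from the exponential loss. The following is a short sketch of the standard argument, not a full proof. With normalized weights, the total weight after the update (before renormalizing) is

$$Z_t(\alpha) = \sum_i w_i^{(t)} e^{-\alpha y_i h_t(x_i)} = (1 - \epsilon_t)\, e^{-\alpha} + \epsilon_t\, e^{\alpha}$$

because correctly classified samples have $y_i h_t(x_i) = +1$ and misclassified ones have $y_i h_t(x_i) = -1$. Setting the derivative to zero gives

$$\frac{dZ_t}{d\alpha} = -(1 - \epsilon_t)\, e^{-\alpha} + \epsilon_t\, e^{\alpha} = 0 \quad\Rightarrow\quad e^{2\alpha} = \frac{1 - \epsilon_t}{\epsilon_t} \quad\Rightarrow\quad \alpha_t = \frac{1}{2} \ln\left(\frac{1 - \epsilon_t}{\epsilon_t}\right)$$

So $\alpha_t$ is the step size that minimizes the weighted exponential loss for the current round.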

### Model Weight Interpretation

$$\alpha_t = \frac{1}{2} \ln\left(\frac{1 - \epsilon_t}{\epsilon_t}\right)$$

- If $\epsilon_t = 0.5$ (random): $\alpha_t = 0$ (no contribution)
- If $\epsilon_t < 0.5$ (better than random): $\alpha_t > 0$ (positive contribution)
- If $\epsilon_t > 0.5$ (worse than random): $\alpha_t < 0$ (the learner's vote is inverted)
- If $\epsilon_t \to 0$ (near perfect): $\alpha_t \to \infty$ (dominates the vote)

In [None]:
# Visualize relationship between error and model weight
errors = np.linspace(0.01, 0.99, 100)
alphas = 0.5 * np.log((1 - errors) / errors)

plt.figure(figsize=(10, 6))
plt.plot(errors, alphas, linewidth=2.5, color='darkblue')
plt.axhline(y=0, color='red', linestyle='--', alpha=0.5, label='α = 0 (no contribution)')
plt.axvline(x=0.5, color='orange', linestyle='--', alpha=0.5, label='ε = 0.5 (random)')
plt.xlabel('Model Error (ε)', fontsize=12)
plt.ylabel('Model Weight (α)', fontsize=12)
plt.title('AdaBoost: Model Weight as Function of Error', fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3)
plt.legend(fontsize=10)
plt.xlim(0, 1)
plt.ylim(-3, 3)

# Add annotations
plt.annotate('Perfect model\n(ε → 0, α → ∞)', xy=(0.1, 2), fontsize=9,
            bbox=dict(boxstyle='round', facecolor='lightgreen', alpha=0.7))
plt.annotate('Random model\n(ε = 0.5, α = 0)', xy=(0.5, 0.3), fontsize=9,
            bbox=dict(boxstyle='round', facecolor='lightyellow', alpha=0.7))
plt.annotate('Worse than random\n(ε > 0.5, α < 0)', xy=(0.7, -1.5), fontsize=9,
            bbox=dict(boxstyle='round', facecolor='lightcoral', alpha=0.7))

plt.tight_layout()
plt.show()

print("\n💡 Interpretation:")
print("  • Low error (< 0.5) → Positive weight → Model contributes positively")
print("  • High error (> 0.5) → Negative weight → Model contributes negatively")
print("  • Random (= 0.5) → Zero weight → Model ignored")

### 🎯 Exercise 1: Understanding Sample Reweighting

Implement the sample weight update mechanism:

1. Create a simple dataset with 10 samples
2. Initialize equal weights (1/10 each)
3. Simulate a weak learner that misclassifies 3 samples
4. Calculate $\alpha$ based on error
5. Update and normalize sample weights
6. Visualize how weights change (which samples get emphasized?)

In [None]:
# Your code here


## 4. AdaBoost from Scratch

Let's implement a simplified version of AdaBoost to understand the mechanics:

In [None]:
class SimpleAdaBoost:
    """
    Simplified AdaBoost implementation for binary classification.
    
    This implementation helps understand the core algorithm.
    For production use, always use sklearn's AdaBoostClassifier.
    """
    
    def __init__(self, n_estimators=50):
        self.n_estimators = n_estimators
        self.estimators_ = []
        self.alphas_ = []
        self.weight_history_ = []
    
    def fit(self, X, y):
        """
        Train AdaBoost ensemble.
        
        Args:
            X: Feature matrix (n_samples, n_features)
            y: Target vector (n_samples,), values in {-1, +1}
        """
        n_samples = X.shape[0]
        
        # Initialize weights uniformly
        weights = np.ones(n_samples) / n_samples
        
        for t in range(self.n_estimators):
            # Save weights for visualization
            self.weight_history_.append(weights.copy())
            
            # Train weak learner on weighted data
            # Decision stump (depth 1 tree) is classic weak learner
            stump = DecisionTreeClassifier(max_depth=1, random_state=t)
            stump.fit(X, y, sample_weight=weights)
            
            # Get predictions
            predictions = stump.predict(X)
            
            # Calculate weighted error
            incorrect = (predictions != y)
            error = np.sum(weights * incorrect) / np.sum(weights)
            
            # Avoid division by zero and log(0)
            error = np.clip(error, 1e-10, 1 - 1e-10)
            
            # Calculate model weight (alpha)
            alpha = 0.5 * np.log((1 - error) / error)
            
            # Update sample weights
            weights *= np.exp(-alpha * y * predictions)
            
            # Normalize weights
            weights /= np.sum(weights)
            
            # Store estimator and its weight
            self.estimators_.append(stump)
            self.alphas_.append(alpha)
        
        return self
    
    def predict(self, X):
        """
        Make predictions using weighted majority voting.
        
        Args:
            X: Feature matrix (n_samples, n_features)
        
        Returns:
            predictions: Predicted classes (n_samples,)
        """
        # Get weighted sum of predictions
        weighted_sum = np.zeros(X.shape[0])
        
        for alpha, estimator in zip(self.alphas_, self.estimators_):
            weighted_sum += alpha * estimator.predict(X)
        
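        # Return the sign as -1 or +1 (np.sign would give 0 on an exact tie,
        # which is not a valid class label, so ties go to +1)
        return np.where(weighted_sum >= 0, 1, -1)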
    
    def score(self, X, y):
        """
        Calculate accuracy score.
        """
        predictions = self.predict(X)
        return np.mean(predictions == y)

# Test our implementation
X, y = make_classification(
    n_samples=200,
    n_features=2,
    n_informative=2,
    n_redundant=0,
    n_clusters_per_class=1,
    random_state=42
)

# Convert to {-1, +1}
y = 2 * y - 1

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Train our AdaBoost
ada_custom = SimpleAdaBoost(n_estimators=50)
ada_custom.fit(X_train, y_train)

# Evaluate
train_acc = ada_custom.score(X_train, y_train)
test_acc = ada_custom.score(X_test, y_test)

print("\n📊 Custom AdaBoost Performance:")
print(f"Training Accuracy: {train_acc:.4f}")
print(f"Test Accuracy:     {test_acc:.4f}")
print(f"\nNumber of estimators: {len(ada_custom.estimators_)}")
print(f"\nModel weights (α) range: [{min(ada_custom.alphas_):.4f}, {max(ada_custom.alphas_):.4f}]")

### Visualize AdaBoost Learning Process

In [None]:
# Visualize how sample weights evolve
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
axes = axes.flatten()

# Show weight evolution at different stages
stages = [0, 5, 15, 49]  # Iteration numbers to visualize

for idx, stage in enumerate(stages):
    ax = axes[idx]
    
    # Get weights at this stage
    weights = ada_custom.weight_history_[stage]
    
    # Scale weights for visualization (larger circles = higher weight)
    sizes = weights * 50000
    
    # Plot samples colored by true class, sized by weight
    scatter = ax.scatter(
        X_train[:, 0], 
        X_train[:, 1],
        c=y_train,
        s=sizes,
        alpha=0.6,
        cmap='coolwarm',
        edgecolors='black',
        linewidth=0.5
    )
    
    ax.set_xlabel('Feature 1', fontsize=10)
    ax.set_ylabel('Feature 2', fontsize=10)
    ax.set_title(f'Iteration {stage + 1}\n(Larger circles = Higher weight)', 
                 fontsize=11, fontweight='bold')
    ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\n💡 Observation:")
print("  As iterations progress, misclassified samples (near decision boundary)")
print("  get larger weights (bigger circles), forcing subsequent models to focus on them.")

### 🎯 Exercise 2: Tracking Model Evolution

Analyze how the ensemble improves over iterations:

1. Calculate training and test accuracy after each iteration
2. Plot accuracy curves for both sets
3. Calculate and plot the ensemble's weighted voting margin
4. At what iteration does performance plateau?
5. Does AdaBoost overfit with too many iterations?

In [None]:
# Your code here


## 5. AdaBoost with Scikit-learn

### Using AdaBoostClassifier

In [None]:
# Load real dataset
data = load_breast_cancer()
X_cancer = data.data
y_cancer = data.target

X_train, X_test, y_train, y_test = train_test_split(
    X_cancer, y_cancer, test_size=0.3, random_state=42
)

# Single decision stump (baseline)
stump = DecisionTreeClassifier(max_depth=1, random_state=42)
stump.fit(X_train, y_train)
stump_acc = stump.score(X_test, y_test)

# AdaBoost with default settings
ada_default = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),
    n_estimators=50,
    learning_rate=1.0,
    random_state=42
)
ada_default.fit(X_train, y_train)
ada_acc = ada_default.score(X_test, y_test)

# Random Forest (for comparison)
rf = RandomForestClassifier(n_estimators=50, random_state=42)
rf.fit(X_train, y_train)
rf_acc = rf.score(X_test, y_test)

# Compare
print("\n📊 Performance Comparison:")
print("=" * 50)
print(f"\nSingle Decision Stump: {stump_acc:.4f}")
print(f"AdaBoost (50 stumps):  {ada_acc:.4f}")
print(f"Random Forest:         {rf_acc:.4f}")
print(f"\n✅ AdaBoost improvement over stump: {(ada_acc - stump_acc):.4f}")

# Visualize
models = ['Single\nStump', 'AdaBoost\n(50 stumps)', 'Random\nForest']
accuracies = [stump_acc, ada_acc, rf_acc]
colors = ['lightcoral', 'lightgreen', 'skyblue']

plt.figure(figsize=(10, 6))
bars = plt.bar(models, accuracies, color=colors, edgecolor='black', linewidth=2)
plt.ylabel('Test Accuracy', fontsize=12)
plt.title('Model Comparison: Weak Learner vs Ensembles', fontsize=14, fontweight='bold')
plt.ylim(0.85, 1.0)
plt.grid(True, alpha=0.3, axis='y')

for bar, acc in zip(bars, accuracies):
    height = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2, height + 0.005,
            f'{acc:.4f}', ha='center', va='bottom', fontweight='bold', fontsize=11)

plt.tight_layout()
plt.show()
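
A single train/test split can be noisy on a dataset of this size. As a hedged complement to the comparison above, the sketch below scores the same three model types with 5-fold cross-validation. The fold count and seeds are arbitrary choices, and AdaBoost is left on its default base learner (a depth-1 stump).

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X_cv, y_cv = load_breast_cancer(return_X_y=True)

candidates = {
    "Single stump": DecisionTreeClassifier(max_depth=1, random_state=42),
    "AdaBoost (50 stumps)": AdaBoostClassifier(n_estimators=50, random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=42),
}

cv_results = {}
for name, model in candidates.items():
    # 5-fold accuracy; for classifiers an integer cv uses stratified folds
    scores = cross_val_score(model, X_cv, y_cv, cv=5, scoring="accuracy")
    cv_results[name] = (scores.mean(), scores.std())
    print(f"{name:<22} {scores.mean():.4f} +/- {scores.std():.4f}")
```

The standard deviation across folds gives a rough sense of whether a gap between two models is larger than the split-to-split variation.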

## 6. Hyperparameter Tuning

### Key AdaBoost Hyperparameters

1. **n_estimators**: Number of weak learners
   - More is usually better (with diminishing returns)
   - Can overfit with too many on noisy data
   
2. **learning_rate**: Shrinkage parameter
   - Controls contribution of each weak learner
   - Lower values → more robust but need more estimators
   - Trade-off: n_estimators ↑ + learning_rate ↓
   
3. **estimator**: The weak learner (called `base_estimator` before scikit-learn 1.2)
   - Decision stumps (max_depth=1) are the classic choice
   - Can use deeper trees for more complex patterns
   - Should be simple to avoid overfitting
   
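4. **algorithm**: 'SAMME' or 'SAMME.R' (availability depends on the scikit-learn version)
   - SAMME: Uses class labels (the multiclass generalization of the original AdaBoost)
   - SAMME.R: Uses class probabilities (often faster convergence); it was the default in older releases, was deprecated in scikit-learn 1.4, and recent releases implement only SAMME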
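
Because `n_estimators` and `learning_rate` interact, they are usually tuned together. A minimal sketch, assuming a small illustrative grid and 3-fold cross-validation on the training split only, so the test split stays untouched for the final check. The grid values and variable names are hypothetical, not recommendations.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X_gs, y_gs = load_breast_cancer(return_X_y=True)
X_gs_train, X_gs_test, y_gs_train, y_gs_test = train_test_split(
    X_gs, y_gs, test_size=0.3, random_state=42, stratify=y_gs
)

# Illustrative grid; real searches usually cover more values
param_grid = {
    "n_estimators": [25, 50, 100],
    "learning_rate": [0.1, 0.5, 1.0],
}

search = GridSearchCV(
    AdaBoostClassifier(random_state=42),  # default base learner is a stump
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X_gs_train, y_gs_train)

print("Best parameters:", search.best_params_)
print(f"Best CV accuracy:       {search.best_score_:.4f}")
print(f"Held-out test accuracy: {search.score(X_gs_test, y_gs_test):.4f}")
```

Selecting hyperparameters by cross-validation rather than by test accuracy keeps the final test score an honest estimate of generalization.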

In [None]:
# Study effect of n_estimators and learning_rate
n_estimators_range = [10, 25, 50, 100, 200, 500]
learning_rates = [0.01, 0.1, 0.5, 1.0, 2.0]

results = []

for lr in learning_rates:
    scores = []
    for n_est in n_estimators_range:
        ada = AdaBoostClassifier(
            estimator=DecisionTreeClassifier(max_depth=1),
            n_estimators=n_est,
            learning_rate=lr,
            random_state=42
        )
        ada.fit(X_train, y_train)
        scores.append(ada.score(X_test, y_test))
    results.append(scores)

# Visualize
plt.figure(figsize=(12, 7))

for lr, scores in zip(learning_rates, results):
    plt.plot(n_estimators_range, scores, marker='o', linewidth=2, 
             markersize=8, label=f'learning_rate={lr}')

plt.xlabel('Number of Estimators', fontsize=12)
plt.ylabel('Test Accuracy', fontsize=12)
plt.title('AdaBoost: Effect of n_estimators and learning_rate', 
          fontsize=14, fontweight='bold')
plt.legend(fontsize=10, loc='lower right')
plt.grid(True, alpha=0.3)
plt.xscale('log')
plt.tight_layout()
plt.show()

print("\n💡 Key Observations:")
print("  • Higher learning_rate converges faster but may overfit")
print("  • Lower learning_rate is more stable but needs more estimators")
print("  • Optimal: Balance between n_estimators and learning_rate")
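
Refitting a new ensemble for every value of `n_estimators` repeats work, since boosting builds its estimators one round at a time. scikit-learn's `staged_score` yields the accuracy after each boosting round from a single fit. A minimal sketch, assuming the breast cancer data, an arbitrary cap of 200 rounds, and an arbitrary `learning_rate` of 0.5:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X_st, y_st = load_breast_cancer(return_X_y=True)
X_st_train, X_st_test, y_st_train, y_st_test = train_test_split(
    X_st, y_st, test_size=0.3, random_state=42
)

staged_model = AdaBoostClassifier(
    n_estimators=200, learning_rate=0.5, random_state=42
)
staged_model.fit(X_st_train, y_st_train)

# One accuracy value per boosting round that was actually fitted
train_curve = list(staged_model.staged_score(X_st_train, y_st_train))
test_curve = list(staged_model.staged_score(X_st_test, y_st_test))

# First round (1-based) at which the test curve reaches its maximum
best_round = max(range(len(test_curve)), key=test_curve.__getitem__) + 1

print(f"Rounds fitted: {len(test_curve)}")
print(f"Final train accuracy: {train_curve[-1]:.4f}")
print(f"Final test accuracy:  {test_curve[-1]:.4f}")
print(f"Highest test accuracy first reached at round {best_round}")
```

Reading the best round off the test curve is fine for inspection, but an actual choice of `n_estimators` should come from a validation set or cross-validation.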

### 🎯 Exercise 3: Base Estimator Complexity

Investigate effect of base estimator complexity:

1. Train AdaBoost with different max_depth values [1, 2, 3, 5, 10]
2. For each depth, vary n_estimators [10, 50, 100, 200]
3. Create heatmap showing test accuracy
4. Which combination works best?
5. Why might very deep trees perform worse?

In [None]:
# Your code here


## 7. When to Use AdaBoost

### ✅ Use AdaBoost When:

1. **Simple base models underfit**
   - Single decision stump too weak
   - Need to reduce bias

2. **Interpretability somewhat important**
   - Can analyze individual stumps
   - More interpretable than deep trees

3. **Clean, well-labeled data**
   - AdaBoost sensitive to noisy labels
   - Outliers get high weights

4. **Binary classification**
   - Original algorithm designed for this
   - SAMME extension handles multiclass
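
To illustrate the interpretability point, a fitted scikit-learn ensemble exposes its weak learners through `estimators_` and their voting weights through `estimator_weights_`. The sketch below prints the feature and threshold that each of the first few stumps splits on. The dataset choice and the helper name `describe_stump` are assumptions made for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier

data_demo = load_breast_cancer()
ada_demo = AdaBoostClassifier(n_estimators=20, random_state=42)
ada_demo.fit(data_demo.data, data_demo.target)

def describe_stump(stump, feature_names):
    """Return (feature name, threshold) for the root split of a depth-1 tree.

    Hypothetical helper for this example; returns (None, None) when the
    tree consists of a single leaf and therefore has no split.
    """
    tree = stump.tree_
    if tree.node_count == 1:
        return None, None
    return str(feature_names[tree.feature[0]]), float(tree.threshold[0])

stump_summary = []
for stump, weight in zip(ada_demo.estimators_, ada_demo.estimator_weights_):
    name, threshold = describe_stump(stump, data_demo.feature_names)
    stump_summary.append((name, threshold, float(weight)))

for i, (name, threshold, weight) in enumerate(stump_summary[:5], start=1):
    if name is None:
        print(f"Stump {i}: no split (weight {weight:.3f})")
    else:
        print(f"Stump {i}: split on '{name}' at {threshold:.3f} (weight {weight:.3f})")
```

Depending on the scikit-learn version and boosting variant, the reported weights may all equal 1, in which case the split features are the more informative part of the output.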

### ❌ Avoid AdaBoost When:

1. **Data is very noisy**
   - Focuses on hard examples (including noise/outliers)
   - Can overfit to noise

2. **Need fast, parallel training**
   - Estimators are fit one after another, so training is typically slower than Random Forest
   - Training can't be parallelized across estimators

3. **High-dimensional sparse data**
   - Gradient boosting (XGBoost, LightGBM) often better
   - AdaBoost can struggle with sparsity

### AdaBoost vs Random Forest

| Aspect | AdaBoost | Random Forest |
|--------|----------|---------------|
| **Training** | Sequential | Parallel |
| **Base learner** | Shallow trees (stumps) | Deep trees |
| **Focus** | Reduce bias | Reduce variance |
| **Noise sensitivity** | High | Low |
| **Overfitting risk** | Medium | Low |
| **Speed** | Slower | Faster (parallel) |
| **Best for** | Clean data, need accuracy | Noisy data, need robustness |

### 🎯 Exercise 4: Robustness to Noise

Test how AdaBoost and Random Forest handle noisy labels:

1. Create a clean dataset
2. Flip random percentage of labels (0%, 5%, 10%, 20%, 30%)
3. Train both AdaBoost and Random Forest on each
4. Evaluate on clean test set
5. Plot performance vs noise level
6. Which is more robust to label noise?

In [None]:
# Your code here


## 8. Summary and Next Steps

### 🎓 Key Takeaways

1. **Boosting Philosophy**:
   - Sequential learning: each model corrects previous errors
   - Adaptive weighting: hard examples get more attention
   - Weak learners: even simple models help when combined

2. **AdaBoost Algorithm**:
   - Adaptively reweight samples based on errors
   - Weight models based on performance
   - Combine via weighted majority voting

3. **Key Hyperparameters**:
   - `n_estimators`: Number of weak learners (50-500 typical)
   - `learning_rate`: Shrinkage factor (0.1-1.0)
   - Base estimator: Usually decision stumps (max_depth=1)

4. **When to Use**:
   - ✅ Clean data, need to reduce bias
   - ❌ Noisy data, need robustness

5. **Advantages**:
   - Simple and effective
   - Often achieves high accuracy
   - Works with various base learners
   - Some interpretability (can analyze stumps)

6. **Limitations**:
   - Sensitive to noisy data and outliers
   - Can overfit with too many estimators
   - Sequential training (estimators can't be fit in parallel)
   - Requires proper tuning of learning_rate

### 📚 What's Next?

- **Module 04**: Gradient Boosting Machines (more general framework)
- **Module 05**: XGBoost (optimized, regularized boosting)
- **Module 06**: LightGBM (fast, efficient boosting)
- **Module 07**: CatBoost (categorical feature handling)

### 🎯 Practice Recommendations

1. Apply AdaBoost to your own classification problem
2. Compare with Random Forest on same data
3. Experiment with different base learners (not just trees)
4. Analyze which samples get highest weights (are they outliers?)
5. Try on a Kaggle dataset

### 📖 Additional Resources

- **Original Paper**: Freund & Schapire (1997). "A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting"
- **Tutorial**: "A Short Introduction to Boosting" by Freund & Schapire
- **Sklearn Guide**: https://scikit-learn.org/stable/modules/ensemble.html#adaboost
- **Book**: "The Elements of Statistical Learning" Chapter 10

---

**🚀 Ready for more powerful boosting? Let's explore Gradient Boosting in Module 04!**