# Module 8: Ensemble Methods

---

Ensemble methods combine multiple models into a single predictor that is often more accurate and more robust than any of its members. This module covers the three main ensemble strategies: **bagging**, **boosting**, and **stacking**.

---

## Table of Contents

1. [Why Ensembles Work](#1.-Why-Ensembles-Work)
2. [Bagging and Random Forests](#2.-Bagging-and-Random-Forests)
3. [Boosting: AdaBoost and Gradient Boosting](#3.-Boosting:-AdaBoost-and-Gradient-Boosting)
4. [XGBoost](#4.-XGBoost)
5. [Stacking](#5.-Stacking)
6. [Comprehensive Comparison](#6.-Comprehensive-Comparison)
7. [Exercises](#7.-Exercises)
8. [Summary and Further Reading](#8.-Summary-and-Further-Reading)

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.datasets import load_breast_cancer, make_classification
from sklearn.model_selection import train_test_split, cross_val_score, StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, classification_report

from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import (BaggingClassifier, RandomForestClassifier,
                               AdaBoostClassifier, GradientBoostingClassifier,
                               StackingClassifier, VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

plt.style.use('seaborn-v0_8-whitegrid')
np.random.seed(42)

# Prepare data
cancer = load_breast_cancer()
X, y = cancer.data, cancer.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)

---

## 1. Why Ensembles Work

The core idea behind ensemble methods is the **wisdom of crowds**: combining many imperfect models often produces a result that is better than any single model.

| Strategy | Description | Key Idea |
|----------|------------|----------|
| **Bagging** | Train many models on bootstrap samples, average/vote | Reduces variance |
| **Boosting** | Train models sequentially, each fixing previous errors | Reduces bias |
| **Stacking** | Use a meta-model to combine predictions of base models | Leverages diversity |
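
The "wisdom of crowds" effect can be made concrete with a short calculation. The sketch below assumes an idealized setting that real ensembles only approximate: `n_models` classifiers that each have the same accuracy `p` and make fully independent errors. The helper name `majority_vote_accuracy` is illustrative and not part of any library.

```python
from math import comb


def majority_vote_accuracy(n_models: int, p: float) -> float:
    """Probability that a strict majority of n independent classifiers is correct.

    Assumes n_models is odd (so there are no ties) and that every classifier
    is correct with the same probability p, independently of the others.
    """
    k_min = n_models // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n_models, k) * p**k * (1 - p) ** (n_models - k)
        for k in range(k_min, n_models + 1)
    )


# Each individual model is only slightly better than chance (p = 0.6)
for n in (1, 5, 25, 101):
    print(f"n={n:>3}: majority-vote accuracy = {majority_vote_accuracy(n, 0.6):.4f}")
```

In practice the errors of the base models are correlated, so the gain is smaller than this idealized figure suggests. That is why bagging and random forests put effort into making the base models differ from one another.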

In [None]:
# Demonstrate that a single weak classifier has high variance
# while an ensemble of weak classifiers is more stable

n_trials = 20
single_accuracies = []
ensemble_accuracies = []

for i in range(n_trials):
    # Bootstrap resample: 70% of the training-set size, drawn with replacement
    idx = np.random.choice(len(X_train_s), size=int(0.7 * len(X_train_s)), replace=True)
    
    # Single tree
    dt = DecisionTreeClassifier(max_depth=3, random_state=i)
    dt.fit(X_train_s[idx], y_train[idx])
    single_accuracies.append(dt.score(X_test_s, y_test))
    
    # Bagging ensemble (50 trees)
    bag = BaggingClassifier(estimator=DecisionTreeClassifier(max_depth=3),
                            n_estimators=50, random_state=i)
    bag.fit(X_train_s[idx], y_train[idx])
    ensemble_accuracies.append(bag.score(X_test_s, y_test))

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(range(n_trials), single_accuracies, 'o-', label=f'Single Tree (std={np.std(single_accuracies):.3f})',
        color='#FF5722', linewidth=1.5)
ax.plot(range(n_trials), ensemble_accuracies, 's-', label=f'Ensemble of 50 Trees (std={np.std(ensemble_accuracies):.3f})',
        color='#2196F3', linewidth=1.5)
ax.set_xlabel('Trial', fontsize=13)
ax.set_ylabel('Test Accuracy', fontsize=13)
ax.set_title('Variance Reduction Through Ensembling', fontsize=14, fontweight='bold')
ax.legend(fontsize=11)
plt.tight_layout()
plt.show()

print(f"Single tree — Mean: {np.mean(single_accuracies):.4f}, Std: {np.std(single_accuracies):.4f}")
print(f"Ensemble    — Mean: {np.mean(ensemble_accuracies):.4f}, Std: {np.std(ensemble_accuracies):.4f}")
print("\nCompare the two rows: the ensemble typically shows a higher mean and a smaller std across trials.")

---

## 2. Bagging and Random Forests

**Bagging (Bootstrap Aggregating)** trains multiple models, each on a bootstrap sample of the training data (rows drawn at random with replacement). The final prediction is a majority vote (classification) or an average (regression).
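
To make the two steps (bootstrap, then aggregate) explicit, here is a minimal hand-rolled sketch. It runs on synthetic data from `make_classification`, and names such as `X_demo` and `bagged_pred` are chosen only for this illustration. Note that scikit-learn's `BaggingClassifier` averages predicted probabilities when the base estimator provides them, so its results will not match this hard-vote version exactly.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data (an assumption made for illustration only)
X_demo, y_demo = make_classification(n_samples=600, n_features=20, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X_demo, y_demo, test_size=0.3, random_state=0)

rng = np.random.default_rng(0)
n_models = 25  # odd, so a majority vote cannot tie
trees = []
for _ in range(n_models):
    # Bootstrap: draw len(Xtr) row indices with replacement
    idx = rng.integers(0, len(Xtr), size=len(Xtr))
    trees.append(DecisionTreeClassifier(random_state=0).fit(Xtr[idx], ytr[idx]))

# Aggregate: every tree votes 0 or 1; a mean vote above 0.5 means class 1 wins
votes = np.stack([tree.predict(Xte) for tree in trees])  # shape (n_models, n_test)
bagged_pred = (votes.mean(axis=0) > 0.5).astype(int)

bagged_acc = (bagged_pred == yte).mean()
mean_single_acc = np.mean([(v == yte).mean() for v in votes])
print(f"Mean accuracy of individual trees: {mean_single_acc:.3f}")
print(f"Accuracy of the majority vote:     {bagged_acc:.3f}")
```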

**Random Forest** is bagging applied to decision trees, with an additional twist: at each split, only a random subset of features is considered. This further decorrelates the trees.
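
One rough way to see the decorrelation is to measure how often two trees in the same forest disagree. The sketch below compares a forest that considers all features at each split (`max_features=None`, which amounts to plain bagging of trees) with one that considers `sqrt(n_features)`. The helper `mean_pairwise_disagreement` and the synthetic data are assumptions made for illustration, and the exact numbers depend on the dataset.

```python
from itertools import combinations

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X_demo, y_demo = make_classification(
    n_samples=600, n_features=20, n_informative=5, random_state=0
)
Xtr, Xte, ytr, yte = train_test_split(X_demo, y_demo, test_size=0.3, random_state=0)


def mean_pairwise_disagreement(forest, X):
    """Average fraction of samples on which two trees of the forest disagree."""
    preds = [tree.predict(X) for tree in forest.estimators_]
    return float(np.mean([(a != b).mean() for a, b in combinations(preds, 2)]))


disagreement = {}
for label, max_features in [("all features", None), ("sqrt(n_features)", "sqrt")]:
    forest = RandomForestClassifier(
        n_estimators=30, max_features=max_features, random_state=0
    ).fit(Xtr, ytr)
    disagreement[label] = mean_pairwise_disagreement(forest, Xte)
    print(f"{label:>18}: disagreement={disagreement[label]:.3f}, "
          f"test accuracy={forest.score(Xte, yte):.3f}")
```

A higher disagreement rate suggests the trees are less correlated, which gives the vote more independent opinions to average over.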

In [None]:
# Random Forest with different numbers of trees
n_trees_range = [1, 5, 10, 25, 50, 100, 200, 300]
rf_accuracies = []

for n_trees in n_trees_range:
    rf = RandomForestClassifier(n_estimators=n_trees, random_state=42, n_jobs=-1)
    rf.fit(X_train_s, y_train)
    rf_accuracies.append(rf.score(X_test_s, y_test))

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(n_trees_range, rf_accuracies, 'o-', linewidth=2, color='#4CAF50', markersize=8)
ax.set_xlabel('Number of Trees', fontsize=13)
ax.set_ylabel('Test Accuracy', fontsize=13)
ax.set_title('Random Forest — Accuracy vs Number of Trees', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

print("Accuracy typically improves quickly at first, then plateaus; small wiggles are noise from the small test set.")
print("More trees rarely hurt accuracy but increase computation time.")

In [None]:
# Feature importance from Random Forest
rf_final = RandomForestClassifier(n_estimators=200, random_state=42, n_jobs=-1)
rf_final.fit(X_train_s, y_train)

importance = pd.DataFrame({
    'Feature': cancer.feature_names,
    'Importance': rf_final.feature_importances_
}).sort_values('Importance', ascending=False).head(15)

fig, ax = plt.subplots(figsize=(10, 7))
ax.barh(importance['Feature'], importance['Importance'], color='#4CAF50', edgecolor='white')
ax.set_xlabel('Feature Importance (Mean Decrease in Impurity)', fontsize=13)
ax.set_title('Top 15 Features — Random Forest', fontsize=14, fontweight='bold')
ax.invert_yaxis()
plt.tight_layout()
plt.show()

print(f"\nRandom Forest Test Accuracy: {rf_final.score(X_test_s, y_test):.4f}")

---

## 3. Boosting: AdaBoost and Gradient Boosting

**Boosting** trains models sequentially, where each new model focuses on the mistakes of the previous models.

- **AdaBoost**: Adjusts sample weights — misclassified samples get higher weight.
- **Gradient Boosting**: Fits each new model to the residual errors of the current ensemble (more generally, to the negative gradient of the loss).
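
The residual-fitting idea behind gradient boosting can be sketched in a few lines. The example below is a simplified regression version with squared-error loss on synthetic data. It is not how `GradientBoostingClassifier` is implemented internally, and all variable names are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic 1-D regression problem: a noisy sine wave (illustration only)
rng = np.random.default_rng(0)
X_demo = rng.uniform(-3, 3, size=(200, 1))
y_demo = np.sin(X_demo[:, 0]) + rng.normal(scale=0.2, size=200)

learning_rate = 0.1
n_stages = 50

# Stage 0: start from a constant prediction (the mean minimizes squared error)
pred = np.full_like(y_demo, y_demo.mean())
mse_history = [np.mean((y_demo - pred) ** 2)]
stages = []

for _ in range(n_stages):
    # For squared-error loss, the negative gradient is simply the residual
    residuals = y_demo - pred
    tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X_demo, residuals)
    # Move the ensemble prediction a small step toward the residuals
    pred += learning_rate * tree.predict(X_demo)
    stages.append(tree)
    mse_history.append(np.mean((y_demo - pred) ** 2))

print(f"Training MSE before boosting:  {mse_history[0]:.4f}")
print(f"Training MSE after {n_stages} stages: {mse_history[-1]:.4f}")
```

The `learning_rate` shrinks each tree's contribution, so more stages are needed but each step is less likely to overshoot. AdaBoost follows the same sequential pattern but changes the sample weights instead of the regression targets.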

In [None]:
# AdaBoost
ada = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # weak learners (stumps)
    n_estimators=200,
    learning_rate=0.5,
    random_state=42
)
ada.fit(X_train_s, y_train)

# Gradient Boosting
gb = GradientBoostingClassifier(
    n_estimators=200,
    learning_rate=0.1,
    max_depth=3,
    random_state=42
)
gb.fit(X_train_s, y_train)

print("Boosting Results — Breast Cancer Dataset")
print("=" * 50)
print(f"AdaBoost:          Train={ada.score(X_train_s, y_train):.4f}  Test={ada.score(X_test_s, y_test):.4f}")
print(f"Gradient Boosting: Train={gb.score(X_train_s, y_train):.4f}  Test={gb.score(X_test_s, y_test):.4f}")

In [None]:
# Staged prediction — how accuracy improves as trees are added
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# AdaBoost staged
ada_staged_train = list(ada.staged_score(X_train_s, y_train))
ada_staged_test = list(ada.staged_score(X_test_s, y_test))
axes[0].plot(range(1, len(ada_staged_train)+1), ada_staged_train, label='Train', color='#2196F3', linewidth=2)
axes[0].plot(range(1, len(ada_staged_test)+1), ada_staged_test, label='Test', color='#FF5722', linewidth=2)
axes[0].set_xlabel('Number of Estimators', fontsize=12)
axes[0].set_ylabel('Accuracy', fontsize=12)
axes[0].set_title('AdaBoost — Staged Performance', fontsize=13, fontweight='bold')
axes[0].legend(fontsize=11)

# Gradient Boosting staged
gb_staged_train = list(gb.staged_score(X_train_s, y_train))
gb_staged_test = list(gb.staged_score(X_test_s, y_test))
axes[1].plot(range(1, len(gb_staged_train)+1), gb_staged_train, label='Train', color='#2196F3', linewidth=2)
axes[1].plot(range(1, len(gb_staged_test)+1), gb_staged_test, label='Test', color='#FF5722', linewidth=2)
axes[1].set_xlabel('Number of Estimators', fontsize=12)
axes[1].set_ylabel('Accuracy', fontsize=12)
axes[1].set_title('Gradient Boosting — Staged Performance', fontsize=13, fontweight='bold')
axes[1].legend(fontsize=11)

plt.tight_layout()
plt.show()

---

## 4. XGBoost

**XGBoost (Extreme Gradient Boosting)** is an optimized gradient boosting library widely used in machine learning competitions and industry. It adds explicit L1/L2 regularization on leaf weights, handles missing values natively, and is engineered for speed.

Note: XGBoost is an external library. If not installed, run `pip install xgboost`.

In [None]:
try:
    from xgboost import XGBClassifier
    
    xgb = XGBClassifier(
        n_estimators=200,
        learning_rate=0.1,
        max_depth=3,
        random_state=42,
        eval_metric='logloss'
    )
    xgb.fit(X_train_s, y_train)
    
    print("XGBoost Results:")
    print(f"  Train accuracy: {xgb.score(X_train_s, y_train):.4f}")
    print(f"  Test accuracy:  {xgb.score(X_test_s, y_test):.4f}")
    
    # Feature importance
    xgb_importance = pd.DataFrame({
        'Feature': cancer.feature_names,
        'Importance': xgb.feature_importances_
    }).sort_values('Importance', ascending=False).head(10)
    
    fig, ax = plt.subplots(figsize=(10, 5))
    ax.barh(xgb_importance['Feature'], xgb_importance['Importance'], color='#9C27B0', edgecolor='white')
    ax.set_xlabel('Feature Importance', fontsize=13)
    ax.set_title('Top 10 Features — XGBoost', fontsize=14, fontweight='bold')
    ax.invert_yaxis()
    plt.tight_layout()
    plt.show()
    
except ImportError:
    print("XGBoost is not installed. To install, run: pip install xgboost")
    print("Skipping this section.")

---

## 5. Stacking

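**Stacking** trains a meta-model on the predictions of several base models. The idea is that different models capture different patterns, and a meta-learner can learn how much to trust each one. In scikit-learn's `StackingClassifier`, the meta-model is trained on cross-validated (out-of-fold) predictions, controlled by `cv`, so it never learns from predictions the base models made on their own training data.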
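
The mechanics can be sketched by hand with `cross_val_predict`. This is a simplified illustration on synthetic data with two arbitrarily chosen base models. It mirrors the general idea rather than the exact internals of `StackingClassifier`, and names such as `meta_train` are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X_demo, y_demo = make_classification(n_samples=600, n_features=20, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0, stratify=y_demo
)

base = [
    DecisionTreeClassifier(max_depth=3, random_state=0),
    KNeighborsClassifier(n_neighbors=5),
]

# Level 0: out-of-fold P(class 1) from each base model. Every training row is
# predicted by a model copy that did not see that row during fitting.
meta_train = np.column_stack([
    cross_val_predict(model, Xtr, ytr, cv=5, method="predict_proba")[:, 1]
    for model in base
])

# Refit each base model on the full training set for use at prediction time
for model in base:
    model.fit(Xtr, ytr)
meta_test = np.column_stack([model.predict_proba(Xte)[:, 1] for model in base])

# Level 1: the meta-model learns how to weigh the base-model probabilities
meta_model = LogisticRegression().fit(meta_train, ytr)
stack_acc = meta_model.score(meta_test, yte)
print(f"Hand-rolled stacking test accuracy: {stack_acc:.3f}")
print(f"Meta-model coefficients (tree, knn): {meta_model.coef_.ravel().round(2)}")
```

Training the meta-model on in-sample predictions instead would reward base models that overfit, which is the main reason for the out-of-fold step.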

In [None]:
# Stacking ensemble
base_models = [
    ('rf', RandomForestClassifier(n_estimators=100, random_state=42)),
    ('knn', KNeighborsClassifier(n_neighbors=5)),
    ('svm', SVC(kernel='rbf', probability=True, random_state=42)),
]

stacking = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(max_iter=5000),
    cv=5
)

stacking.fit(X_train_s, y_train)
print("Stacking Ensemble:")
print(f"  Train accuracy: {stacking.score(X_train_s, y_train):.4f}")
print(f"  Test accuracy:  {stacking.score(X_test_s, y_test):.4f}")

---

## 6. Comprehensive Comparison

In [None]:
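# Compare the tree-based ensembles against a single decision tree
# (XGBoost and stacking are evaluated in their own sections above)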
all_models = {
    'Single Decision Tree': DecisionTreeClassifier(max_depth=5, random_state=42),
    'Bagging (50 Trees)': BaggingClassifier(n_estimators=50, random_state=42),
    'Random Forest (200)': RandomForestClassifier(n_estimators=200, random_state=42),
    'AdaBoost (200)': AdaBoostClassifier(n_estimators=200, random_state=42),
    'Gradient Boosting': GradientBoostingClassifier(n_estimators=200, random_state=42),
}

results = []
cv_fold = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

for name, model in all_models.items():
    scores = cross_val_score(model, X_train_s, y_train, cv=cv_fold, scoring='accuracy')
    model.fit(X_train_s, y_train)
    test_acc = model.score(X_test_s, y_test)
    results.append({
        'Model': name,
        'CV Mean': scores.mean(),
        'CV Std': scores.std(),
        'Test Acc': test_acc
    })

results_df = pd.DataFrame(results).sort_values('Test Acc', ascending=False)
print("Ensemble Methods Comparison — Breast Cancer Dataset")
print("=" * 70)
print(results_df.to_string(index=False))

# Bar chart
fig, ax = plt.subplots(figsize=(12, 6))
x = np.arange(len(results_df))
ax.bar(x, results_df['Test Acc'], color=['#FF5722', '#2196F3', '#4CAF50', '#FF9800', '#9C27B0'],
       edgecolor='white')
ax.set_xticks(x)
ax.set_xticklabels(results_df['Model'], rotation=20, ha='right')
ax.set_ylabel('Test Accuracy', fontsize=13)
ax.set_title('Ensemble Methods — Test Accuracy Comparison', fontsize=14, fontweight='bold')
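# Zoom in on the top of the range without clipping the lowest bar
ax.set_ylim(max(0.0, results_df['Test Acc'].min() - 0.03), 1.0)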
for i, v in enumerate(results_df['Test Acc']):
    ax.text(i, v + 0.002, f'{v:.3f}', ha='center', fontsize=11)
plt.tight_layout()
plt.show()

---

## 7. Exercises

### Exercise 1: Random Forest Tuning

In [None]:
# Exercise 1: Tune a Random Forest using GridSearchCV
# Search over:
#   n_estimators: [50, 100, 200]
#   max_depth: [3, 5, 10, None]
#   min_samples_split: [2, 5, 10]
#
# Report the best parameters and best CV score.
# Compare the tuned model's test accuracy with a default Random Forest.

from sklearn.model_selection import GridSearchCV

# Your code here:


### Exercise 2: Boosting Learning Rate Analysis

In [None]:
# Exercise 2: Analyze the effect of learning_rate on Gradient Boosting
# 1. Try learning rates: [0.01, 0.05, 0.1, 0.3, 0.5, 1.0]
# 2. Use n_estimators=200, max_depth=3
# 3. Record train and test accuracy for each
# 4. Plot learning_rate vs accuracy
# 5. What is the relationship between learning_rate and overfitting?

# Your code here:


### Exercise 3: Voting Classifier

In [None]:
# Exercise 3: Build a VotingClassifier
# 1. Combine: LogisticRegression, RandomForest, SVM
# 2. Try both 'hard' voting (majority) and 'soft' voting (probability-based)
# 3. Compare their test accuracies
# 4. Is the voting classifier better than any individual model?

# Your code here:


---

## 8. Summary and Further Reading

### What We Covered

| Method | Strategy | Key Benefit | When to Use |
|--------|----------|-------------|-------------|
| Bagging | Parallel, bootstrap | Reduces variance | High-variance models (deep trees) |
| Random Forest | Bagging + random features | Robust, feature importance | General-purpose, strong baseline |
| AdaBoost | Sequential, reweight samples | Focuses on hard examples | Simple base learners such as decision stumps |
| Gradient Boosting | Sequential, fit residuals | Often best accuracy | Tabular data competitions |
| XGBoost | Optimized gradient boosting | Speed, regularization | Industry standard for tabular data |
| Stacking | Meta-learner | Leverages model diversity | When you have diverse base models |

### Recommended Reading

- [Scikit-learn Ensemble Methods](https://scikit-learn.org/stable/modules/ensemble.html)
- [XGBoost Documentation](https://xgboost.readthedocs.io/)
- Chapter 7 of Aurélien Géron, *Hands-On Machine Learning* (Ensemble Learning and Random Forests)

### Next Module

In **Module 9: Neural Networks and Deep Learning**, we will explore the fundamentals of neural networks, from perceptrons to convolutional networks, and build models using Keras/TensorFlow.

---