# Ensemble Techniques: Bagging and Boosting

## 📚 Learning Objectives

By completing this notebook, you will:
- Understand ensemble methods (Bagging, Boosting)
- Experiment with ensemble techniques to improve model performance
- Compare bagging vs boosting approaches
- Apply ensemble methods to classification problems

## 🔗 Prerequisites

- ✅ Understanding of decision trees and classification
- ✅ Python 3.8+ installed

---

## Official Structure Reference

This notebook covers practical activities from **Course 04, Unit 3**:
- Experimenting with ensemble techniques (Bagging, Boosting) to improve prediction accuracy
- **Source:** `DETAILED_UNIT_DESCRIPTIONS.md` - Unit 3 Practical Content

---

## Introduction to Ensemble Methods

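**Ensemble methods** combine the predictions of multiple models to improve on a single model:
- **Bagging**: Trains models independently (so training can run in parallel) on bootstrap samples of the training data and combines their votes; mainly reduces variance (e.g., Random Forest, which also samples features at each split)
- **Boosting**: Trains models sequentially, each one concentrating on the errors of the models before it; mainly reduces bias (e.g., AdaBoost, Gradient Boosting)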
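
To make the bagging idea concrete, here is a minimal hand-rolled sketch: draw bootstrap samples, fit one tree per sample, and take a majority vote. The dataset size, the seeds, the choice of 25 trees, and the names `X_demo`, `votes`, and `manual_bagging_pred` are assumptions made for illustration only; in practice `BaggingClassifier` does this for you.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Illustrative data; sizes and seeds are assumptions for this sketch.
X_demo, y_demo = make_classification(n_samples=500, n_features=10, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)

rng = np.random.default_rng(0)
n_trees = 25  # odd, so a binary majority vote cannot tie
votes = np.zeros((n_trees, len(Xte)), dtype=int)

for i in range(n_trees):
    # Bootstrap sample: draw len(Xtr) row indices with replacement
    idx = rng.integers(0, len(Xtr), size=len(Xtr))
    tree = DecisionTreeClassifier(random_state=i).fit(Xtr[idx], ytr[idx])
    votes[i] = tree.predict(Xte)

# Majority vote across trees (labels are 0/1, so the mean is the share of 1-votes)
manual_bagging_pred = (votes.mean(axis=0) > 0.5).astype(int)
manual_bagging_acc = (manual_bagging_pred == yte).mean()
print(f"Hand-rolled bagging accuracy: {manual_bagging_acc:.4f}")
```

Each tree sees a slightly different resampling of the training set, so their individual errors are only partly correlated, and voting tends to smooth them out.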


In [1]:
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score, classification_report

print("✅ Libraries imported successfully!")


✅ Libraries imported successfully!


In [2]:
# Generate dataset
X, y = make_classification(n_samples=1000, n_features=10, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(f"Training set: {X_train.shape}, Test set: {X_test.shape}")


Training set: (800, 10), Test set: (200, 10)


## Part 1: Base Model (Single Decision Tree)


In [3]:
# Base model: Single decision tree
base_tree = DecisionTreeClassifier(random_state=42)
base_tree.fit(X_train, y_train)
base_pred = base_tree.predict(X_test)
base_acc = accuracy_score(y_test, base_pred)

print("=" * 60)
print("Base Model: Single Decision Tree")
print("=" * 60)
print(f"Accuracy: {base_acc:.4f}")


Base Model: Single Decision Tree
Accuracy: 0.8400


## Part 2: Bagging (Random Forest is a type of Bagging)


In [4]:
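# Bagging Classifier: 100 decision trees, each fit on a bootstrap sample
# Note: the `estimator` keyword requires scikit-learn >= 1.2
# (older releases call it `base_estimator`)
bagging = BaggingClassifier(
    estimator=DecisionTreeClassifier(),
    n_estimators=100,
    random_state=42
)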
bagging.fit(X_train, y_train)
bagging_pred = bagging.predict(X_test)
bagging_acc = accuracy_score(y_test, bagging_pred)

print("=" * 60)
print("Bagging Classifier (100 trees)")
print("=" * 60)
print(f"Accuracy: {bagging_acc:.4f}")
print(f"Improvement over base: {bagging_acc - base_acc:.4f}")


Bagging Classifier (100 trees)
Accuracy: 0.8950
Improvement over base: 0.0550


## Part 3: Boosting Methods


In [5]:
# AdaBoost (default weak learner: a depth-1 decision tree, i.e. a stump)
adaboost = AdaBoostClassifier(n_estimators=100, random_state=42)
adaboost.fit(X_train, y_train)
adaboost_pred = adaboost.predict(X_test)
adaboost_acc = accuracy_score(y_test, adaboost_pred)

# Gradient Boosting (defaults: depth-3 regression trees, learning_rate=0.1)
gb = GradientBoostingClassifier(n_estimators=100, random_state=42)
gb.fit(X_train, y_train)
gb_pred = gb.predict(X_test)
gb_acc = accuracy_score(y_test, gb_pred)

print("=" * 60)
print("Boosting Methods:")
print("=" * 60)
print(f"AdaBoost Accuracy: {adaboost_acc:.4f}")
print(f"Gradient Boosting Accuracy: {gb_acc:.4f}")

print("\n" + "=" * 60)
print("Comparison Summary:")
print("=" * 60)
print(f"Base Tree:        {base_acc:.4f}")
# The :+ format spec prints the sign itself, so a drop below the base shows as "-"
print(f"Bagging:          {bagging_acc:.4f} ({bagging_acc - base_acc:+.4f})")
print(f"AdaBoost:         {adaboost_acc:.4f} ({adaboost_acc - base_acc:+.4f})")
print(f"Gradient Boost:   {gb_acc:.4f} ({gb_acc - base_acc:+.4f})")


Boosting Methods:
AdaBoost Accuracy: 0.8500
Gradient Boosting Accuracy: 0.9000

Comparison Summary:
Base Tree:        0.8400
Bagging:          0.8950 (+0.0550)
AdaBoost:         0.8500 (+0.0100)
Gradient Boost:   0.9000 (+0.0600)
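
The accuracies above come from a single 80/20 split with only 200 test samples, so a gap of a point or two between models may be noise. One way to get a steadier comparison is k-fold cross-validation. The sketch below is a suggestion rather than part of the original exercise: the 5-fold setting and the names `X_cv`, `models`, and `cv_results` are assumptions. The tree is passed to `BaggingClassifier` positionally so the call does not depend on whether the installed scikit-learn names that parameter `estimator` or `base_estimator`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Same synthetic dataset as the notebook, regenerated so the block is self-contained
X_cv, y_cv = make_classification(n_samples=1000, n_features=10, n_classes=2, random_state=42)

models = {
    "Base Tree": DecisionTreeClassifier(random_state=42),
    "Bagging": BaggingClassifier(DecisionTreeClassifier(), n_estimators=100, random_state=42),
    "AdaBoost": AdaBoostClassifier(n_estimators=100, random_state=42),
    "Gradient Boost": GradientBoostingClassifier(n_estimators=100, random_state=42),
}

cv_results = {}
for name, model in models.items():
    # Five accuracy scores, one per held-out fold
    scores = cross_val_score(model, X_cv, y_cv, cv=5, scoring="accuracy")
    cv_results[name] = (scores.mean(), scores.std())
    print(f"{name:<16} {scores.mean():.4f} +/- {scores.std():.4f}")
```

Comparing the mean scores together with their spread gives a better sense of which differences are likely to hold up on new data.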


## Summary

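### Key Concepts:
1. **Bagging**: Trains models independently on bootstrap samples and combines their predictions by voting or averaging (reduces variance)
2. **Boosting**: Trains models sequentially, with each new model focusing on the errors of the previous ones (reduces bias)
3. **Ensemble Benefit**: A well-chosen ensemble often outperforms a single model, but the gain varies by method; here AdaBoost with default settings improved on the base tree by only 0.01
4. **When to use**: Bagging for high-variance models (e.g., deep, unpruned trees), Boosting for high-bias models (e.g., shallow trees or stumps)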
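
The "focuses on errors" step in item 2 can be sketched as a bare-bones discrete AdaBoost loop: after each round, misclassified samples get larger weights, so the next stump concentrates on them. This is a simplified illustration under stated assumptions (binary labels recoded to -1/+1, 20 rounds, depth-1 stumps, hypothetical names such as `y_pm` and `boosted_predict`), not scikit-learn's actual implementation, and the accuracies it computes are on the training data only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Illustrative data; sizes and seeds are assumptions for this sketch.
X_demo, y_demo = make_classification(n_samples=500, n_features=10, random_state=0)
y_pm = 2 * y_demo - 1  # recode labels {0, 1} as {-1, +1}

n_rounds = 20
w = np.full(len(X_demo), 1.0 / len(X_demo))  # start with uniform sample weights
stumps, alphas = [], []

for _ in range(n_rounds):
    # Fit a stump that pays more attention to heavily weighted samples
    stump = DecisionTreeClassifier(max_depth=1, random_state=0)
    stump.fit(X_demo, y_pm, sample_weight=w)
    pred = stump.predict(X_demo)

    # Weighted error, clipped to keep the logarithm finite
    err = np.clip(w[pred != y_pm].sum(), 1e-10, 1 - 1e-10)
    alpha = 0.5 * np.log((1 - err) / err)  # this stump's say in the final vote

    # Misclassified samples (y * pred == -1) are scaled up, correct ones down
    w *= np.exp(-alpha * y_pm * pred)
    w /= w.sum()

    stumps.append(stump)
    alphas.append(alpha)

def boosted_predict(X):
    """Weighted vote of all stumps, returned as -1/+1 labels."""
    score = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
    return np.where(score >= 0, 1, -1)

first_stump_acc = (stumps[0].predict(X_demo) == y_pm).mean()
boosted_train_acc = (boosted_predict(X_demo) == y_pm).mean()
print(f"First stump (train):    {first_stump_acc:.4f}")
print(f"Boosted stumps (train): {boosted_train_acc:.4f}")
```

A single stump is a high-bias model; stacking many reweighted stumps is how boosting chips away at that bias.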

**Reference:** Course 04, Unit 3: "Experimenting with ensemble techniques (Bagging, Boosting)"
