# Ensemble Techniques: Bagging and Boosting

## 📚 Learning Objectives

By completing this notebook, you will:
- Understand ensemble methods (Bagging, Boosting)
- Experiment with ensemble techniques to improve model performance
- Compare bagging vs boosting approaches
- Apply ensemble methods to classification problems

## 🔗 Prerequisites

- ✅ Understanding of decision trees and classification
- ✅ Python 3.8+ installed

---

## Official Structure Reference

This notebook covers practical activities from **Course 04, Unit 3**:
- Experimenting with ensemble techniques (Bagging, Boosting) to improve prediction accuracy
- **Source:** `DETAILED_UNIT_DESCRIPTIONS.md` - Unit 3 Practical Content

---

## Introduction to Ensemble Methods

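**Ensemble methods** combine the predictions of multiple models to improve performance:
- **Bagging**: Trains models in parallel on bootstrap samples of the data, reducing variance (e.g., Random Forest)
- **Boosting**: Trains models sequentially, each one correcting the errors of its predecessors, reducing bias (e.g., AdaBoost, Gradient Boosting)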
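
To make the bagging recipe concrete, here is a minimal hand-rolled sketch. It is an illustration, not how `BaggingClassifier` is implemented internally: it trains 25 trees, each on a bootstrap sample drawn with replacement, and combines them by majority vote. The tree count, the random seed, and the variable names (`votes`, `bagged_pred`) are illustrative choices; the dataset setup mirrors the one used later in this notebook.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, n_features=10, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

rng = np.random.default_rng(0)
n_models = 25  # odd, so a binary majority vote can never tie
votes = np.zeros((n_models, len(X_test)), dtype=int)

for i in range(n_models):
    # Bootstrap sample: len(X_train) row indices drawn *with* replacement
    idx = rng.integers(0, len(X_train), size=len(X_train))
    tree = DecisionTreeClassifier(random_state=i)
    tree.fit(X_train[idx], y_train[idx])
    votes[i] = tree.predict(X_test)

# Majority vote: labels are 0/1, so a mean above 0.5 means most trees voted 1
bagged_pred = (votes.mean(axis=0) > 0.5).astype(int)
bagged_acc = accuracy_score(y_test, bagged_pred)
print(f"Hand-rolled bagging accuracy: {bagged_acc:.4f}")
```

Each tree is trained independently of the others, which is why bagging parallelizes easily.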


```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score, classification_report

print("✅ Libraries imported successfully!")
```


```python
# Generate a synthetic binary classification dataset (1000 samples, 10 features)
X, y = make_classification(n_samples=1000, n_features=10, n_classes=2, random_state=42)
# Hold out 20% of the samples as a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(f"Training set: {X_train.shape}, Test set: {X_test.shape}")
```


## Part 1: Base Model (Single Decision Tree)
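
A single decision tree with no depth limit is a natural candidate for bagging because it is a high-variance model: it tends to memorize its training data, and trees trained on different subsets of the data can disagree on many test points. The sketch below measures both effects on the same synthetic data; `tree_a` and `tree_b` are hypothetical names for two trees trained on disjoint halves of the training set.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# No max_depth limit: the tree keeps splitting until its leaves are pure
tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
train_acc = tree.score(X_train, y_train)
test_acc = tree.score(X_test, y_test)
print(f"Depth: {tree.get_depth()}, leaves: {tree.get_n_leaves()}")
print(f"Train accuracy: {train_acc:.4f}  Test accuracy: {test_acc:.4f}")

# Variance in action: two trees fit to disjoint halves of the training data
half = len(X_train) // 2
tree_a = DecisionTreeClassifier(random_state=0).fit(X_train[:half], y_train[:half])
tree_b = DecisionTreeClassifier(random_state=0).fit(X_train[half:], y_train[half:])
disagreement = (tree_a.predict(X_test) != tree_b.predict(X_test)).mean()
print(f"Fraction of test points where the two trees disagree: {disagreement:.4f}")
```

On data like this, the unpruned tree typically reaches (near-)perfect training accuracy while scoring noticeably lower on the test split.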


```python
# Base model: a single, fully grown decision tree
base_tree = DecisionTreeClassifier(random_state=42)
base_tree.fit(X_train, y_train)
base_pred = base_tree.predict(X_test)
base_acc = accuracy_score(y_test, base_pred)

print("=" * 60)
print("Base Model: Single Decision Tree")
print("=" * 60)
print(f"Accuracy: {base_acc:.4f}")
```


## Part 2: Bagging (Random Forest is a type of Bagging)


```python
# Bagging: 100 decision trees, each trained on a bootstrap sample of the training set
# (`estimator` requires scikit-learn >= 1.2; older versions call it `base_estimator`)
bagging = BaggingClassifier(
    estimator=DecisionTreeClassifier(),
    n_estimators=100,
    random_state=42,
)
bagging.fit(X_train, y_train)
bagging_pred = bagging.predict(X_test)
bagging_acc = accuracy_score(y_test, bagging_pred)

print("=" * 60)
print("Bagging Classifier (100 trees)")
print("=" * 60)
print(f"Accuracy: {bagging_acc:.4f}")
print(f"Improvement over base: {bagging_acc - base_acc:+.4f}")
```


## Part 3: Boosting Methods
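
The core idea of boosting is easiest to see in a from-scratch version of discrete AdaBoost for two classes. The sketch below is a simplified illustration, not scikit-learn's `AdaBoostClassifier` (whose algorithm options and `learning_rate` parameter differ in detail, so scores will not match exactly). It fits 50 decision stumps in sequence and, after each round, up-weights the training samples the current stump misclassified so that the next stump concentrates on them. The names `alphas`, `stumps`, and `manual_acc` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, n_features=10, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Discrete AdaBoost works with labels in {-1, +1}
y_train_pm = np.where(y_train == 1, 1, -1)

n_rounds = 50
weights = np.full(len(X_train), 1.0 / len(X_train))  # start with uniform sample weights
stumps, alphas = [], []

for m in range(n_rounds):
    # Weak learner: a depth-1 tree ("stump") fit under the current sample weights
    stump = DecisionTreeClassifier(max_depth=1, random_state=m)
    stump.fit(X_train, y_train_pm, sample_weight=weights)
    pred = stump.predict(X_train)

    # Weighted error rate, clipped to avoid division by zero and log(0)
    err = np.sum(weights[pred != y_train_pm]) / np.sum(weights)
    err = np.clip(err, 1e-10, 1 - 1e-10)
    alpha = 0.5 * np.log((1 - err) / err)  # vote weight of this stump

    # Misclassified samples (y * pred = -1) gain weight, correct ones lose weight
    weights = weights * np.exp(-alpha * y_train_pm * pred)
    weights /= weights.sum()

    stumps.append(stump)
    alphas.append(alpha)

# Final prediction: sign of the alpha-weighted sum of stump votes, mapped back to 0/1
scores = sum(a * s.predict(X_test) for a, s in zip(alphas, stumps))
adaboost_manual_pred = np.where(scores >= 0, 1, 0)
manual_acc = accuracy_score(y_test, adaboost_manual_pred)
print(f"Hand-rolled AdaBoost accuracy: {manual_acc:.4f}")
```

Unlike the bagging loop, each round here depends on the weights produced by the previous one, which is why boosting is inherently sequential.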


```python
# AdaBoost: 100 sequentially trained weak learners (depth-1 trees by default)
adaboost = AdaBoostClassifier(n_estimators=100, random_state=42)
adaboost.fit(X_train, y_train)
adaboost_pred = adaboost.predict(X_test)
adaboost_acc = accuracy_score(y_test, adaboost_pred)

# Gradient Boosting: 100 shallow trees (depth 3 by default), each fit to the
# negative gradient of the loss of the ensemble built so far
gb = GradientBoostingClassifier(n_estimators=100, random_state=42)
gb.fit(X_train, y_train)
gb_pred = gb.predict(X_test)
gb_acc = accuracy_score(y_test, gb_pred)

print("=" * 60)
print("Boosting Methods:")
print("=" * 60)
print(f"AdaBoost Accuracy: {adaboost_acc:.4f}")
print(f"Gradient Boosting Accuracy: {gb_acc:.4f}")

# The :+ format spec shows the sign, so a drop below the base tree prints as negative
print("\n" + "=" * 60)
print("Comparison Summary:")
print("=" * 60)
print(f"Base Tree:        {base_acc:.4f}")
print(f"Bagging:          {bagging_acc:.4f} ({bagging_acc - base_acc:+.4f})")
print(f"AdaBoost:         {adaboost_acc:.4f} ({adaboost_acc - base_acc:+.4f})")
print(f"Gradient Boost:   {gb_acc:.4f} ({gb_acc - base_acc:+.4f})")
```


## Summary

### Key Concepts:
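1. **Bagging**: Trains models in parallel on bootstrap samples and averages (or votes on) their predictions, which mainly reduces variance
2. **Boosting**: Trains models sequentially, each one concentrating on the errors of the ensemble so far, which mainly reduces bias
3. **Ensemble Benefit**: A combination of many models often outperforms any single member, but the gain depends on the data and should be verified on held-out data, as in the comparison above
4. **When to use**: Bagging for high-variance models (e.g., deep decision trees), Boosting for high-bias models (e.g., decision stumps)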
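
The variance-reduction claim for bagging can be quantified with a standard result. Suppose the $B$ models' predictions $f_1(x), \dots, f_B(x)$ at a point $x$ are identically distributed, each with variance $\sigma^2$ and pairwise correlation $\rho$. Expanding the variance of the sum into $B$ variance terms and $B(B-1)$ covariance terms gives

$$
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B} f_b(x)\right)
= \frac{1}{B^2}\left(B\sigma^2 + B(B-1)\rho\sigma^2\right)
= \rho\sigma^2 + \frac{1-\rho}{B}\,\sigma^2 .
$$

As $B$ grows, the second term vanishes but the first does not: adding more trees only helps down to the floor $\rho\sigma^2$. This is why Random Forest additionally decorrelates its trees (lowering $\rho$) by sampling features at each split.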

**Reference:** Course 04, Unit 3: "Experimenting with ensemble techniques (Bagging, Boosting)"
