## Day 35 — Ensemble Learning: Boosting, AdaBoost & Gradient Boosting

This notebook is part of my **Machine Learning Learning Journey** and focuses on
**Boosting-based ensemble methods**.

The session builds on Bagging and Random Forest concepts and explains:
- Why boosting is needed
- Sequential ensemble learning
- Bias reduction using boosting
- AdaBoost (Adaptive Boosting)
- Gradient Boosting intuition
- Overfitting control and regularization


## 1. Recap: Ensemble Learning

Ensemble Learning:
> Combine multiple learners to build a strong model

Key goals:
- Improve generalization
- Reduce bias or variance
- Stabilize predictions

Types:
- Bagging → mainly reduces variance
- Boosting → mainly reduces bias


## 2. Why Boosting?

Single models may suffer from:
- High bias (underfitting)
- Inability to learn complex patterns

Boosting:
- Learners are trained **sequentially**
- Each new learner focuses on **previous errors**
- Overall bias is reduced


## 3. Bagging vs Boosting

| Aspect | Bagging | Boosting |
|------|--------|----------|
| Training | Parallel | Sequential |
| Goal | Reduce variance | Reduce bias |
| Data sampling | Bootstrap sampling | Reweighted samples |
| Dependency | Independent learners | Dependent learners |


## 4. Sequential Learning in Boosting

Workflow:
1. Train weak learner 1
2. Identify misclassified points
3. Increase importance of misclassified samples
4. Train next learner
5. Repeat for a fixed number of learners, or until the error stops improving


## 5. Weak Learners

Weak learner:
- Slightly better than random guessing
- High bias model

Common choice:
- Decision Tree with depth = 1
- Also called **Decision Stump**
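
A quick sketch of what a decision stump looks like in practice, compared against a majority-class baseline. The dataset and the `DummyClassifier` baseline are assumptions added here for illustration; on an easy dataset a single stump can do far better than "slightly better than random".

```python
from sklearn.datasets import load_breast_cancer
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Baseline: always predict the most frequent training class
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)

# Decision stump: a tree limited to a single split
stump = DecisionTreeClassifier(max_depth=1, random_state=0).fit(X_train, y_train)

baseline_acc = baseline.score(X_test, y_test)
stump_acc = stump.score(X_test, y_test)

print(f"Majority-class baseline accuracy: {baseline_acc:.3f}")
print(f"Decision stump accuracy:          {stump_acc:.3f}")
print("Stump depth:", stump.get_depth(), "| leaves:", stump.get_n_leaves())
```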


## 6. AdaBoost (Adaptive Boosting)

AdaBoost is a **sequential ensemble algorithm** where:
- Each learner focuses on previous mistakes
- Misclassified samples get higher weight
- Correctly classified samples get lower weight

Base learner:
- Decision Stump (1 split decision tree)


## 7. AdaBoost Training Intuition

Steps:
1. Assign equal weights to all samples
2. Train weak learner
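3. Compute the weighted error rate and the learner's importance (α)
4. Increase the weights of misclassified points and decrease the weights of correctly classified ones
5. Train the next learner on the reweighted data
6. Repeat for a fixed number of rounds, or stop early if a learner fits the weighted data perfectly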
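
The steps above can be sketched from scratch. This is a minimal illustration that assumes the classic discrete AdaBoost update with labels recoded to {-1, +1}; it is not how scikit-learn's `AdaBoostClassifier` is implemented internally, and names such as `n_rounds` and `predict` are hypothetical.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
y = np.where(y == 1, 1, -1)  # discrete AdaBoost works with labels in {-1, +1}
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

n_rounds = 50
weights = np.full(len(y_train), 1.0 / len(y_train))  # step 1: equal weights
stumps, alphas = [], []

for _ in range(n_rounds):
    # steps 2 and 5: fit a stump on the currently weighted data
    stump = DecisionTreeClassifier(max_depth=1, random_state=0)
    stump.fit(X_train, y_train, sample_weight=weights)
    pred = stump.predict(X_train)

    # step 3: weighted error rate (weights sum to 1) and learner importance
    err = weights[pred != y_train].sum()
    if err >= 0.5:
        break  # no better than chance on the weighted data
    err = max(err, 1e-10)  # guard against division by zero
    alpha = 0.5 * np.log((1 - err) / err)

    # step 4: raise weights of mistakes, lower the rest, then renormalize
    weights *= np.exp(-alpha * y_train * pred)
    weights /= weights.sum()

    stumps.append(stump)
    alphas.append(alpha)


def predict(X_new):
    """Weighted vote of all stumps; ties are assigned to class +1."""
    votes = sum(a * s.predict(X_new) for a, s in zip(alphas, stumps))
    return np.where(votes >= 0, 1, -1)


scratch_acc = np.mean(predict(X_test) == y_test)
print(f"From-scratch AdaBoost accuracy: {scratch_acc:.3f}")
```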


## 8. AdaBoost Model Representation

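Final model:
\[
F(x) = \alpha_1 h_1(x) + \alpha_2 h_2(x) + \dots + \alpha_n h_n(x)
\]

Where:
- \( h_i(x) \) = prediction of the \( i \)-th weak learner
- \( \alpha_i \) = importance (weight) of that learner; more accurate learners get a larger \( \alpha_i \)
- \( n \) = number of weak learners

For binary classification with labels in \( \{-1, +1\} \), the predicted class is \( \mathrm{sign}(F(x)) \).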
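
One common way to compute the learner weights, assuming the classic discrete AdaBoost formulation with labels \( y_j \in \{-1, +1\} \) and sample weights \( w_j \) normalized to sum to 1. The notes above do not fix a formula, and library implementations may scale \( \alpha_i \) differently.

\[
\varepsilon_i = \sum_j w_j \, \mathbb{1}[h_i(x_j) \neq y_j], \qquad
\alpha_i = \frac{1}{2} \ln \frac{1 - \varepsilon_i}{\varepsilon_i}
\]

\[
w_j \leftarrow \frac{w_j \, e^{-\alpha_i y_j h_i(x_j)}}{\sum_k w_k \, e^{-\alpha_i y_k h_i(x_k)}}
\]

For example, a learner with weighted error \( \varepsilon_i = 0.5 \) gets \( \alpha_i = 0 \) (no say in the vote), while \( \varepsilon_i = 0.1 \) gives \( \alpha_i = \tfrac{1}{2} \ln 9 \approx 1.10 \).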


## 9. Boosting & Imbalanced Data

Boosting can help with class imbalance, although it is not a dedicated fix:
- Minority samples that are misclassified receive higher weights
- Later learners therefore pay more attention to minority patterns
- This can reduce the bias toward the majority class


## 10. Gradient Boosting Motivation

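Instead of reweighting samples,
Gradient Boosting:
- Fits each new model to the **residual errors** of the current ensemble
- Minimizes a loss function using **gradient descent** in function space (for squared error, the residuals are exactly the negative gradient)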
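
To make the link between residuals and gradients concrete, here is the squared-error case. The symbols \( L \) (loss), \( F_{m-1} \) (model after \( m-1 \) rounds), \( h_m \) (new learner) and \( \eta \) (learning rate) are introduced here for illustration.

\[
L(y, F) = \tfrac{1}{2} (y - F)^2
\quad \Rightarrow \quad
-\left. \frac{\partial L}{\partial F} \right|_{F = F_{m-1}(x)} = y - F_{m-1}(x)
\]

\[
F_m(x) = F_{m-1}(x) + \eta \, h_m(x)
\]

The new learner \( h_m \) is fit to the negative gradient, which for this loss is simply the residual. For other losses the same recipe applies, but the targets are no longer plain residuals.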


## 11. Gradient Boosting Intuition

Steps:
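1. Start with a constant model (for regression, the mean of the target)
2. Compute residuals (actual value minus current prediction)
3. Train a shallow decision tree on the residuals
4. Add the new tree's prediction to the model, scaled by the learning rate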
5. Repeat sequentially
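
The steps above can be sketched for regression with squared error. This is a minimal illustration, not scikit-learn's implementation; the synthetic dataset and the names `learning_rate`, `n_trees`, and `predict` are assumptions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

learning_rate = 0.1
n_trees = 100

# Step 1: start from a constant prediction (the training mean)
f0 = y_train.mean()
train_pred = np.full(len(y_train), f0)
trees = []

for _ in range(n_trees):
    residuals = y_train - train_pred                      # step 2
    tree = DecisionTreeRegressor(max_depth=3, random_state=0)
    tree.fit(X_train, residuals)                          # step 3
    train_pred += learning_rate * tree.predict(X_train)   # step 4
    trees.append(tree)                                    # step 5: repeat


def predict(X_new):
    """Constant start plus the scaled contribution of every tree."""
    pred = np.full(X_new.shape[0], f0)
    for tree in trees:
        pred += learning_rate * tree.predict(X_new)
    return pred


baseline_mse = np.mean((y_test - f0) ** 2)
boosted_mse = np.mean((y_test - predict(X_test)) ** 2)
print(f"Test MSE, mean only: {baseline_mse:.1f}")
print(f"Test MSE, boosted:   {boosted_mse:.1f}")
```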


## 12. Learning Rate (η)

Learning rate controls:
- Contribution of each tree

Small η:
- Slower learning (more trees are needed to reach the same training error)
- Usually better generalization

Large η:
- Faster learning (fewer trees needed)
- Higher risk of overfitting
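
A small experiment can show the effect of η. The sketch below assumes the breast cancer dataset and three arbitrary learning rates with a fixed budget of 100 trees; the train/test gap it prints is only indicative for this one split.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

results = {}
for eta in (1.0, 0.1, 0.01):
    gb = GradientBoostingClassifier(
        n_estimators=100,
        learning_rate=eta,  # scales the contribution of every tree
        max_depth=3,
        random_state=0,
    )
    gb.fit(X_train, y_train)
    results[eta] = (gb.score(X_train, y_train), gb.score(X_test, y_test))
    print(f"eta={eta:<5} train={results[eta][0]:.3f}  test={results[eta][1]:.3f}")
```

With a fixed number of trees, a very small η may underfit; in practice η and the number of trees are tuned together.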


## 13. Overfitting in Gradient Boosting

Causes:
- Too many trees
- Deep trees
- High learning rate

Solutions:
- Shallow trees
- Low learning rate
- Tuning the number of trees, tree depth, and learning rate together (for example, with cross-validation)
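
These controls map onto `GradientBoostingClassifier` parameters. The sketch below also uses early stopping (`n_iter_no_change`, `validation_fraction`) and row subsampling (`subsample`), which are not mentioned in the notes above and are shown here as optional extras; all parameter values are illustrative assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

gb = GradientBoostingClassifier(
    n_estimators=500,         # upper bound on the number of trees
    learning_rate=0.05,       # low learning rate
    max_depth=2,              # shallow trees
    subsample=0.8,            # each tree sees a random 80% of the rows
    validation_fraction=0.1,  # held-out share of the training data
    n_iter_no_change=10,      # stop if validation loss stalls for 10 rounds
    random_state=0,
)
gb.fit(X_train, y_train)

test_acc = gb.score(X_test, y_test)
print("Trees actually fitted:", gb.n_estimators_)
print(f"Test accuracy: {test_acc:.3f}")
```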


## 14. Extreme Gradient Boosting (XGBoost)

XGBoost improves Gradient Boosting by:
- Parallelized split finding within each tree (the trees themselves are still added sequentially)
- Regularization (L1 & L2)
- Better handling of overfitting
- GPU acceleration

Used widely in competitions and industry.
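
A minimal usage sketch, assuming the separate `xgboost` package is installed and using its scikit-learn style `XGBClassifier` wrapper. The parameter values are illustrative, and the import is guarded so the snippet still runs when the package is missing.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

try:
    from xgboost import XGBClassifier  # third-party package, not part of scikit-learn
except ImportError:
    XGBClassifier = None

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

xgb_acc = None
if XGBClassifier is None:
    print("xgboost is not installed; skipping this example")
else:
    xgb = XGBClassifier(
        n_estimators=100,
        learning_rate=0.1,
        max_depth=3,
        reg_alpha=0.0,   # L1 regularization strength
        reg_lambda=1.0,  # L2 regularization strength
        n_jobs=2,        # threads used while building each tree
        random_state=42,
    )
    xgb.fit(X_train, y_train)
    xgb_acc = accuracy_score(y_test, xgb.predict(X_test))
    print("XGBoost Accuracy:", xgb_acc)
```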


## AdaBoost Example

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Binary classification dataset (malignant vs benign tumors)
X, y = load_breast_cancer(return_X_y=True)

# Hold out 20% of the data for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 50 decision stumps trained sequentially
# (the `estimator` parameter name requires scikit-learn 1.2 or newer)
ada = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),
    n_estimators=50,
    learning_rate=1.0
)

ada.fit(X_train, y_train)
print("AdaBoost Accuracy:", accuracy_score(y_test, ada.predict(X_test)))
```

Output:

```text
AdaBoost Accuracy: 0.9736842105263158
```


## Gradient Boosting Example

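```python
from sklearn.ensemble import GradientBoostingClassifier

# Reuses X_train, X_test, y_train, y_test and accuracy_score
# from the AdaBoost example above
gb = GradientBoostingClassifier(
    n_estimators=100,   # number of trees
    learning_rate=0.1,  # contribution of each tree
    max_depth=3         # shallow trees
)

gb.fit(X_train, y_train)
print("Gradient Boosting Accuracy:", accuracy_score(y_test, gb.predict(X_test)))
```

Output:

```text
Gradient Boosting Accuracy: 0.956140350877193
```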


## Summary

- Boosting is a sequential ensemble technique
- Focuses on reducing bias
- AdaBoost reweights misclassified samples
- Gradient Boosting fits models on residuals
- Learning rate controls contribution of trees
- XGBoost adds regularization and parallelism
- Boosting is powerful but prone to overfitting if not tuned
