# Ensemble Methods

Ensemble methods combine multiple models to improve accuracy and robustness. The main types are:

1. Bagging (Bootstrap Aggregating)
2. Boosting (Sequential Learning)
3. Stacking (Blending Multiple Models)

We'll demonstrate these using sklearn on the breast cancer dataset.

Let's load the data and preprocess it.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier, GradientBoostingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load dataset
data = load_breast_cancer()
X, y = data.data, data.target

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

<span style="color: #00008B;">**Bagging (Bootstrap Aggregating)**</span>

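Bagging reduces variance by training multiple models on bootstrap samples (random subsets of the training data drawn with replacement) and aggregating their predictions, typically by majority vote.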

In [3]:
# Bagging using Decision Tree
bagging_clf = BaggingClassifier(estimator=DecisionTreeClassifier(), n_estimators=50, random_state=42)
bagging_clf.fit(X_train, y_train)
y_pred_bagging = bagging_clf.predict(X_test)

# Evaluate
print(f"Bagging Accuracy: {accuracy_score(y_test, y_pred_bagging):.2f}")

Bagging Accuracy: 0.96


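This is how bagging works:

- Uses multiple independent models (often of the same type), each trained on a different bootstrap sample.
- Reduces variance (helps with overfitting).

Example: Random Forest is an extension of bagging that also considers only a random subset of features at each split.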
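
The bootstrap-and-vote idea can also be written out by hand. The following is a minimal sketch, not how `BaggingClassifier` is implemented internally; the names `X_tr`, `n_models`, and `y_pred_manual` are assumptions made for illustration, and the data is reloaded so the snippet runs on its own.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

rng = np.random.default_rng(42)
n_models = 25  # illustrative choice; odd, so a binary vote cannot tie
all_preds = []

for _ in range(n_models):
    # Bootstrap sample: draw as many rows as the training set, with replacement
    idx = rng.integers(0, len(X_tr), size=len(X_tr))
    tree = DecisionTreeClassifier(random_state=0).fit(X_tr[idx], y_tr[idx])
    all_preds.append(tree.predict(X_te))

# Aggregate: majority vote across the trees (labels are 0/1)
votes = np.mean(all_preds, axis=0)
y_pred_manual = (votes > 0.5).astype(int)

manual_accuracy = np.mean(y_pred_manual == y_te)
print(f"Manual bagging accuracy: {manual_accuracy:.2f}")
```

Each tree sees a slightly different dataset, so their individual errors partly cancel out when the votes are averaged.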


<span style="color: #00008B;">**Boosting (Sequential Learning)**</span>

Boosting trains models sequentially, where each model corrects previous errors.

a. AdaBoost (Adaptive Boosting)

AdaBoost assigns higher weights to misclassified instances.

In [4]:
# AdaBoost (the default base estimator is a depth-1 decision tree, i.e. a stump)
adaboost_clf = AdaBoostClassifier(n_estimators=50, random_state=42)
adaboost_clf.fit(X_train, y_train)
y_pred_adaboost = adaboost_clf.predict(X_test)

# Evaluate
print(f"AdaBoost Accuracy: {accuracy_score(y_test, y_pred_adaboost):.2f}")

AdaBoost Accuracy: 0.96
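
To make the reweighting concrete, here is a minimal sketch of a single round of the classic textbook (discrete) AdaBoost update on five toy instances. The labels and the weak learner's predictions are made up for illustration, and scikit-learn's internal update may differ in its details.

```python
import numpy as np

# Toy labels in {-1, +1} and a hypothetical weak learner's predictions
y_true = np.array([1, 1, -1, -1, 1])
y_hat = np.array([1, -1, -1, -1, 1])  # the second instance is misclassified

# Start with uniform instance weights
w = np.full(len(y_true), 1 / len(y_true))

# Weighted error of the weak learner
miss = y_hat != y_true
err = np.sum(w[miss])

# Weight of this learner in the final vote (classic discrete AdaBoost formula)
alpha = 0.5 * np.log((1 - err) / err)

# Scale up misclassified instances, scale down the rest, then renormalize
w_new = w * np.exp(-alpha * y_true * y_hat)
w_new /= w_new.sum()

print(np.round(w_new, 3))
```

After the update the single misclassified instance carries half of the total weight (0.5), while each correctly classified instance drops from 0.2 to 0.125, so the next weak learner is pushed to get that instance right.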


b. Gradient Boosting

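Gradient Boosting builds models sequentially, fitting each new model to the negative gradient of the loss function with respect to the current predictions (for squared error, these are simply the residuals). This amounts to gradient descent in function space.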

In [6]:
# Gradient Boosting Classifier
gb_clf = GradientBoostingClassifier(n_estimators=50, random_state=42)
gb_clf.fit(X_train, y_train)
y_pred_gb = gb_clf.predict(X_test)

# Evaluate
print(f"Gradient Boosting Accuracy: {accuracy_score(y_test, y_pred_gb):.2f}")

Gradient Boosting Accuracy: 0.96
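
The gradient-descent view is easiest to see in a regression setting with squared error, where the negative gradient equals the residual. The sketch below uses a synthetic toy dataset and hand-picked values for `learning_rate` and `n_rounds`; all of these are assumptions for illustration, and `GradientBoostingClassifier` itself optimizes a classification loss rather than squared error.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic toy regression problem (illustrative only)
rng = np.random.default_rng(0)
X_toy = rng.uniform(-3, 3, size=(200, 1))
y_toy = np.sin(X_toy[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.1  # illustrative step size
n_rounds = 50

# Start from a constant prediction (the mean minimizes squared error)
pred = np.full_like(y_toy, y_toy.mean())
mse_history = [np.mean((y_toy - pred) ** 2)]

for _ in range(n_rounds):
    # For squared error, the negative gradient is the residual
    residual = y_toy - pred
    tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X_toy, residual)
    # Take a small step in the direction the tree predicts
    pred += learning_rate * tree.predict(X_toy)
    mse_history.append(np.mean((y_toy - pred) ** 2))

print(f"Training MSE: {mse_history[0]:.3f} -> {mse_history[-1]:.3f}")
```

Each round nudges the predictions a little closer to the targets, which is why a smaller learning rate usually needs more rounds.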


This is how boosting works:

- Boosting focuses on hard-to-classify examples.
- It is often more accurate than bagging but more prone to overfitting, especially on noisy data.

Examples: XGBoost, LightGBM, CatBoost.

<span style="color: #00008B;">**Stacking (Blending Multiple Models)**</span>

Stacking combines different models and uses a meta-model for final predictions.

In [8]:
# Define base models
estimators = [
    ('dt', DecisionTreeClassifier()),
    ('svc', SVC(probability=True))
]

# Stacking Classifier with Logistic Regression as meta-model
stacking_clf = StackingClassifier(estimators=estimators, final_estimator=LogisticRegression())
stacking_clf.fit(X_train, y_train)
y_pred_stacking = stacking_clf.predict(X_test)

# Evaluate
print(f"Stacking Accuracy: {accuracy_score(y_test, y_pred_stacking):.2f}")

Stacking Accuracy: 0.97


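Stacking combines predictions from different models by feeding them to a meta-model (e.g., logistic regression) that makes the final prediction. It can reach the highest prediction accuracy of the three approaches, but it is computationally expensive because every base model is trained several times during cross-validation.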
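
The two-level structure can be sketched by hand with `cross_val_predict`, which produces out-of-fold predictions for the meta-model to train on. This is a simplified illustration under assumed names (`base_models`, `meta_train`, `meta_test`), not a reproduction of what `StackingClassifier` does internally; the data is reloaded so the snippet runs on its own.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

base_models = [
    DecisionTreeClassifier(random_state=0),
    make_pipeline(StandardScaler(), SVC(probability=True, random_state=0)),
]

# Level 0: out-of-fold probabilities become the meta-model's training features,
# so the meta-model never sees a prediction made on data the base model was fit on
meta_train = np.column_stack([
    cross_val_predict(m, X_tr, y_tr, cv=5, method="predict_proba")[:, 1]
    for m in base_models
])

# Refit each base model on the full training set to build test-time features
meta_test = np.column_stack([
    m.fit(X_tr, y_tr).predict_proba(X_te)[:, 1] for m in base_models
])

# Level 1: the meta-model learns how to weigh the base models
meta_model = LogisticRegression().fit(meta_train, y_tr)
stack_accuracy = meta_model.score(meta_test, y_te)
print(f"Manual stacking accuracy: {stack_accuracy:.2f}")
```

Using out-of-fold predictions matters: if the meta-model were trained on predictions the base models made for their own training data, it would learn to trust whichever base model overfits the most.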

| Method | Description | Example Models |
|:------:|:------:|:-------:|
| Bagging | Uses multiple independent models to reduce variance. | Random Forest |
| Boosting | Sequentially learns from errors to improve accuracy. | AdaBoost, Gradient Boosting, XGBoost |
| Stacking | Combines different models using a meta-learner. | Blending multiple classifiers |


When faced with an ML problem, here is a brief guide to choosing a method:

- Bagging: If overfitting is an issue (Random Forest).
- Boosting: If high accuracy is required (XGBoost, LightGBM).
- Stacking: If multiple diverse models work well together.

However, try experimenting with different models for better results! 🚀😊