# Ensemble Learning with Python

Ensemble learning methods combine multiple base models, which often yields better predictive performance than any single estimator achieves on its own. In this document, we explore popular ensemble methods and implement an example of each using scikit-learn.

## Bagging

Bagging (bootstrap aggregating) trains multiple models, typically of the same type, on different bootstrap samples of the training dataset, that is, samples drawn at random with replacement. Their predictions are then aggregated (e.g., by majority vote for classification or by averaging for regression) to produce the final prediction.

In [4]:
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create a bagging classifier with decision trees
bagging = BaggingClassifier(
    estimator=DecisionTreeClassifier(),
    n_estimators=10,
    random_state=42
)

# Train the classifier and make predictions
bagging.fit(X_train, y_train)
y_pred = bagging.predict(X_test)

# Evaluate the model
print("Bagging Accuracy:", accuracy_score(y_test, y_pred))


Bagging Accuracy: 1.0
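
To make the mechanism concrete, here is a minimal from-scratch sketch of bagging. It is an illustration, not scikit-learn's internal implementation: it assumes the same iris split as above, and the names `rng`, `trees`, `all_preds`, and `manual_pred` are hypothetical. scikit-learn's `BaggingClassifier` may aggregate differently (for example, by averaging predicted probabilities when the base estimator provides them).

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Same data and split as in the example above
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

rng = np.random.default_rng(42)
n_models = 10  # mirrors n_estimators=10 above
n_samples = X_train.shape[0]

# Fit each tree on a bootstrap sample: n_samples rows drawn with replacement
trees = []
for _ in range(n_models):
    idx = rng.integers(0, n_samples, size=n_samples)
    tree = DecisionTreeClassifier(random_state=0)
    trees.append(tree.fit(X_train[idx], y_train[idx]))

# Per-tree predictions, shape (n_models, n_test_samples)
all_preds = np.stack([tree.predict(X_test) for tree in trees])

# Majority vote per test sample (ties go to the lowest class label)
n_classes = len(np.unique(y_train))
votes = np.apply_along_axis(np.bincount, 0, all_preds, minlength=n_classes)
manual_pred = votes.argmax(axis=0)

manual_accuracy = (manual_pred == y_test).mean()
print("Manual bagging accuracy:", manual_accuracy)
```

Because each tree sees a slightly different resampling of the training data, the individual trees disagree on some samples, and the vote smooths out those disagreements.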


## Boosting

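Boosting is a sequential ensemble technique in which each new model attempts to correct the errors of its predecessors. One of the most popular boosting methods is AdaBoost, which increases the weights of misclassified training samples so that later models concentrate on them.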

In [5]:
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Create an AdaBoost classifier with decision trees as base estimators
ada = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),
    n_estimators=50,
    random_state=42
)

# Train the AdaBoost classifier and make predictions
ada.fit(X_train, y_train)
y_pred_ada = ada.predict(X_test)

# Evaluate the model
print("AdaBoost Accuracy:", accuracy_score(y_test, y_pred_ada))


AdaBoost Accuracy: 1.0
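
The sequential error-correcting idea can be sketched by hand. The following is a minimal illustration of the multi-class SAMME variant of AdaBoost with decision stumps, written under the assumption that class labels are the integers 0, 1, 2 (true for iris). It is not scikit-learn's implementation, and the names `weights`, `stumps`, `alphas`, and `scores` are hypothetical.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

n_classes = len(np.unique(y_train))
n_rounds = 50
# Start with uniform sample weights that sum to 1
weights = np.full(len(y_train), 1.0 / len(y_train))
stumps, alphas = [], []

for _ in range(n_rounds):
    # Fit a depth-1 tree (a stump) on the currently weighted samples
    stump = DecisionTreeClassifier(max_depth=1, random_state=0)
    stump.fit(X_train, y_train, sample_weight=weights)
    miss = stump.predict(X_train) != y_train
    err = weights[miss].sum()  # weighted training error
    # Stop if the stump is perfect or no better than random guessing
    if err <= 0 or err >= 1 - 1 / n_classes:
        break
    # SAMME learner weight: larger for more accurate stumps
    alpha = np.log((1 - err) / err) + np.log(n_classes - 1)
    # Increase the weights of misclassified samples, then renormalize
    weights = weights * np.exp(alpha * miss)
    weights /= weights.sum()
    stumps.append(stump)
    alphas.append(alpha)

# Final prediction: weighted vote of all stumps
scores = np.zeros((len(y_test), n_classes))
for stump, alpha in zip(stumps, alphas):
    scores[np.arange(len(y_test)), stump.predict(X_test)] += alpha
manual_pred = scores.argmax(axis=1)

print("Rounds used:", len(stumps))
print("Manual AdaBoost accuracy:", (manual_pred == y_test).mean())
```

Each round fits a stump to the reweighted data, so samples that earlier stumps got wrong carry more influence on the next split.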


## Voting Classifier

A voting classifier combines conceptually different machine learning models and makes the final decision either by a majority vote over their predicted class labels (hard voting) or by averaging their predicted class probabilities and choosing the most probable class (soft voting).

In [6]:
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, VotingClassifier

# Initialize base classifiers
clf1 = LogisticRegression(random_state=42)
clf2 = RandomForestClassifier(random_state=42)
# probability=True is only required for soft voting; hard voting uses predicted labels
clf3 = SVC(probability=True, random_state=42)

# Combine classifiers using hard voting
voting = VotingClassifier(
    estimators=[('lr', clf1), ('rf', clf2), ('svc', clf3)],
    voting='hard'
)

# Train the voting classifier and make predictions
voting.fit(X_train, y_train)
y_pred_voting = voting.predict(X_test)

# Evaluate the model
print("Voting Classifier Accuracy:", accuracy_score(y_test, y_pred_voting))

Voting Classifier Accuracy: 1.0
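
Hard and soft voting can disagree. The short worked example below uses made-up probabilities for a single sample (the array `probas` is hypothetical, not output from the classifiers above) to show how the two rules are computed.

```python
import numpy as np

# Hypothetical class probabilities from three classifiers for ONE sample
# and three classes; each row sums to 1.
probas = np.array([
    [0.45, 0.55, 0.00],  # classifier A: narrowly prefers class 1
    [0.40, 0.60, 0.00],  # classifier B: prefers class 1
    [0.90, 0.10, 0.00],  # classifier C: strongly prefers class 0
])

# Hard voting: each classifier casts one vote for its most probable class
individual_votes = probas.argmax(axis=1)
hard_winner = np.bincount(individual_votes, minlength=probas.shape[1]).argmax()

# Soft voting: average the probabilities, then pick the most probable class
mean_proba = probas.mean(axis=0)
soft_winner = mean_proba.argmax()

print("Individual votes:", individual_votes)   # [1 1 0]
print("Hard voting picks class", hard_winner)  # 1
print("Soft voting picks class", soft_winner)  # 0
```

Here two classifiers narrowly prefer class 1, so hard voting selects it, while the third classifier's high confidence in class 0 tips the averaged probabilities the other way.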


## Stacking

Stacking (stacked generalization) trains a new model, the meta-learner, to combine the predictions of several base models: the base models' predictions serve as the input features of the meta-learner.

In [7]:
from sklearn.ensemble import StackingClassifier

# Define the base estimators for stacking
estimators = [
    ('lr', LogisticRegression(random_state=42)),
    ('rf', RandomForestClassifier(random_state=42))
]

# Define a stacking classifier with a Logistic Regression meta-classifier
stacking = StackingClassifier(
    estimators=estimators,
    final_estimator=LogisticRegression()
)

# Train the stacking classifier and make predictions
stacking.fit(X_train, y_train)
y_pred_stack = stacking.predict(X_test)

# Evaluate the model
print("Stacking Classifier Accuracy:", accuracy_score(y_test, y_pred_stack))


Stacking Classifier Accuracy: 1.0
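
The idea of feeding base-model predictions into a meta-learner can be sketched manually. This is a simplified illustration, not a reproduction of `StackingClassifier` internals: it assumes 5-fold out-of-fold class probabilities as meta-features and raises `max_iter` for logistic regression, and the names `base_models`, `meta_train`, `meta_test`, and `meta_model` are hypothetical.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

base_models = [
    LogisticRegression(max_iter=1000, random_state=42),
    RandomForestClassifier(random_state=42),
]

# Out-of-fold probabilities: each training row is predicted by a model
# that did not see it, which keeps the meta-learner from learning on leaked labels
meta_train = np.hstack([
    cross_val_predict(model, X_train, y_train, cv=5, method="predict_proba")
    for model in base_models
])

# Refit each base model on the full training set to build test meta-features
meta_test = np.hstack([
    model.fit(X_train, y_train).predict_proba(X_test)
    for model in base_models
])

# The meta-learner is trained on base-model probabilities, not raw features
meta_model = LogisticRegression(max_iter=1000)
meta_model.fit(meta_train, y_train)

manual_stack_acc = meta_model.score(meta_test, y_test)
print("Meta-feature shape:", meta_train.shape)
print("Manual stacking accuracy:", manual_stack_acc)
```

With two base models and three classes, each sample is described to the meta-learner by six probability columns.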
