# Ensemble Learning

Ensemble learning is a machine learning paradigm in which multiple models (often called "base learners" or "weak learners") are combined into a single, more robust model. The goal is to improve accuracy and generalization compared to any individual base learner.

# Key Concepts in Ensemble Learning

1- Bagging (Bootstrap Aggregating): 

    a- In bagging, multiple models are trained independently on different subsets of the training data (obtained through bootstrap sampling). 

    b- The final prediction is typically made by averaging (for regression) or voting (for classification) the predictions of all the models. 

    c- Random Forest is a popular bagging technique that also considers only a random subset of features at each split.

2- Boosting: 

    a- Boosting sequentially trains models, where each model attempts to correct the errors of the previous model. 

    b- This focuses later models on the difficult cases in the training set. 

    c- Popular boosting algorithms include AdaBoost, Gradient Boosting, and XGBoost.

3- Stacking: 

    a- Stacking involves training multiple models (base learners) and then using another model (meta-learner) to combine their predictions.

    b- The meta-learner typically takes the predictions of the base learners as input features. 




In [1]:
# Import necessary libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report

In [2]:
# Load the Iris dataset
data = load_iris()
X = data.data
y = data.target

In [3]:
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Random Forest Classifier

Key Parameters of Random Forest

1- n_estimators: The number of trees in the forest. More trees generally improve performance but also increase computation time.

2- max_depth: The maximum depth of each tree. Limiting the depth can help prevent overfitting.

3- min_samples_split: The minimum number of samples required to split an internal node.

4- min_samples_leaf: The minimum number of samples required to be at a leaf node.

5- max_features: The number of features to consider when looking for the best split.
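
To make the parameters above concrete, here is a minimal sketch that sets all five of them explicitly. The values are illustrative assumptions, not tuned recommendations, and the name `rf_tuned` is introduced only for this example.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Illustrative values only; they are not tuned for this dataset
rf_tuned = RandomForestClassifier(
    n_estimators=200,      # more trees: steadier predictions, longer training
    max_depth=4,           # cap tree depth to limit overfitting
    min_samples_split=4,   # a node needs at least 4 samples to be split
    min_samples_leaf=2,    # every leaf keeps at least 2 samples
    max_features="sqrt",   # features considered at each split
    random_state=42,
)
rf_tuned.fit(X_train, y_train)

# Every fitted tree respects the max_depth cap
depths = [tree.get_depth() for tree in rf_tuned.estimators_]
print(f"Number of trees: {len(rf_tuned.estimators_)}")
print(f"Deepest tree: {max(depths)}")
print(f"Test accuracy: {rf_tuned.score(X_test, y_test):.3f}")
```

In practice these values would usually be chosen with cross-validation rather than set by hand.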

In [4]:
# Initialize the Random Forest classifier
rf = RandomForestClassifier(n_estimators=100, random_state=42)

# Fit the model
rf.fit(X_train, y_train)


In [5]:
# Make predictions on the test set
y_pred = rf.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.3f}')
print('Classification Report:\n', classification_report(y_test, y_pred))

Accuracy: 1.000
Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30



# AdaBoost Classifier

Key Parameters of AdaBoost

1- estimator: The base estimator from which the boosted ensemble is built. Commonly used base estimators include decision trees.

2- n_estimators: The maximum number of estimators at which boosting is terminated. In case of perfect fit, the learning procedure is stopped early.

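3- learning_rate: The weight applied to each classifier at each boosting iteration. A lower value shrinks each classifier's contribution, so there is a trade-off between learning_rate and n_estimators.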

In [6]:

# Initialize the base estimator
base_estimator = DecisionTreeClassifier(max_depth=1)  # decision stump (a one-level tree)

# Initialize the AdaBoost classifier
ada = AdaBoostClassifier(estimator=base_estimator, n_estimators=50, random_state=42)

# Fit the model
ada.fit(X_train, y_train)

# Make predictions on the test set
y_pred = ada.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.3f}')
print('Classification Report:\n', classification_report(y_test, y_pred))

Accuracy: 0.933
Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       0.89      0.89      0.89         9
           2       0.91      0.91      0.91        11

    accuracy                           0.93        30
   macro avg       0.93      0.93      0.93        30
weighted avg       0.93      0.93      0.93        30



# Bagging (Bootstrap Aggregating)

Concept:

Bagging aims to reduce the variance of a model by training multiple models (usually of the same type) on different subsets of the training data and then combining their predictions.

Process:

Data Sampling: Create multiple subsets of the training data by randomly sampling with replacement (bootstrap sampling).

Model Training: Train a model (base learner) on each of these subsets independently.

Prediction Aggregation: 

    Aggregate the predictions of all the models. 

    For classification, this is typically done by voting (majority rule). 

    For regression, this is usually done by averaging the predictions.
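
The three steps above can be sketched by hand with NumPy and plain decision trees. This is a simplified illustration, not how `BaggingClassifier` is implemented internally. The number of models and the variable names are assumptions, and the vote counting relies on the Iris class labels being 0, 1, and 2.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

rng = np.random.default_rng(42)
n_models = 25
n_samples = len(X_train)

# Data sampling + model training: one tree per bootstrap sample
trees = []
for _ in range(n_models):
    idx = rng.integers(0, n_samples, size=n_samples)  # sample with replacement
    tree = DecisionTreeClassifier(random_state=0)
    trees.append(tree.fit(X_train[idx], y_train[idx]))

# Prediction aggregation: majority vote across the trees
all_preds = np.stack([t.predict(X_test) for t in trees])  # (n_models, n_test)
n_classes = len(np.unique(y))
# votes[c, i] = number of trees that predicted class c for test sample i
votes = np.array([(all_preds == c).sum(axis=0) for c in range(n_classes)])
bagged_pred = votes.argmax(axis=0)  # valid because labels are 0..n_classes-1

print(f"Bagged accuracy: {(bagged_pred == y_test).mean():.3f}")
```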

Advantages:

Reduces overfitting by combining multiple models.

Effective for high variance models like decision trees.

Example: Random Forest is a well-known bagging technique in which multiple decision trees are trained on different bootstrap samples of the data, and their predictions are combined by voting or averaging.

In [7]:
# Initialize the base estimator
base_estimator = DecisionTreeClassifier()

# Initialize the Bagging classifier
bagging = BaggingClassifier(estimator=base_estimator, n_estimators=50, random_state=42)

# Fit the model
bagging.fit(X_train, y_train)

# Make predictions on the test set
y_pred = bagging.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.3f}')

Accuracy: 1.000


# Boosting

Concept:

Boosting aims to convert weak learners into strong learners by sequentially training models, each trying to correct the errors of the previous one.

Process:

Initialize Weights: Assign equal weights to all training samples.

Sequential Training: Train a model on the weighted dataset. Adjust the weights of the samples based on the errors. Samples that were incorrectly predicted gain more weight so that the next model focuses more on these hard cases.

Model Combination: Combine the models' predictions using a weighted sum (for regression) or weighted vote (for classification).
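
The weight-update loop described above can be sketched as a simplified, two-class version of discrete AdaBoost. This is an illustration under stated assumptions, not scikit-learn's implementation: only Iris classes 1 and 2 are kept, labels are recoded to -1 and +1, and the names `X_bin`, `y_bin`, `stumps`, and `alphas` are introduced for this example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
mask = y != 0                          # keep classes 1 and 2 only
X_bin = X[mask]
y_bin = np.where(y[mask] == 1, -1, 1)  # recode labels to {-1, +1}

n_rounds = 10
weights = np.full(len(y_bin), 1 / len(y_bin))  # initialize equal weights
stumps, alphas = [], []

for _ in range(n_rounds):
    # Train a stump on the weighted dataset
    stump = DecisionTreeClassifier(max_depth=1, random_state=0)
    stump.fit(X_bin, y_bin, sample_weight=weights)
    pred = stump.predict(X_bin)

    # Weighted error (weights sum to 1), clipped to keep the log finite
    err = np.clip(np.sum(weights[pred != y_bin]), 1e-10, 1 - 1e-10)
    alpha = 0.5 * np.log((1 - err) / err)  # this model's say in the final vote

    # Misclassified samples (y * pred == -1) gain weight, correct ones lose it
    weights *= np.exp(-alpha * y_bin * pred)
    weights /= weights.sum()

    stumps.append(stump)
    alphas.append(alpha)

# Model combination: weighted vote of all stumps
scores = sum(a * s.predict(X_bin) for a, s in zip(alphas, stumps))
ensemble_pred = np.where(scores >= 0, 1, -1)

print(f"Training accuracy: {(ensemble_pred == y_bin).mean():.3f}")
```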

Advantages:

    Primarily reduces bias, and can also reduce variance.

    Can improve performance on complex datasets.

Example: AdaBoost and Gradient Boosting are popular boosting techniques.

In [8]:
# Initialize the base estimator
base_estimator = DecisionTreeClassifier(max_depth=1)

# Initialize the AdaBoost classifier
ada = AdaBoostClassifier(estimator=base_estimator, n_estimators=50, random_state=42)

# Fit the model
ada.fit(X_train, y_train)

# Make predictions on the test set
y_pred = ada.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.3f}')

Accuracy: 0.933


# Stacking

Concept:

Stacking combines multiple models (base learners) by training a meta-learner to aggregate their predictions. The base learners are typically different types of models.

Process:

Train Base Learners: Train several base learners on the training data.

Generate Meta-Features: Use the base learners to make predictions on a hold-out validation set or via cross-validation. These predictions become the input features for the meta-learner.

Train Meta-Learner: Train a meta-learner on the predictions of the base learners.
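
The process above can be sketched by hand using `cross_val_predict` to build the meta-features. This is a minimal illustration in the spirit of what `StackingClassifier` automates, not a copy of its internals. The choice of two base models, 5-fold cross-validation, and the names `meta_train` and `meta_test` are assumptions.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

base_models = [DecisionTreeClassifier(random_state=42), KNeighborsClassifier()]

# Generate meta-features: out-of-fold class probabilities, so the
# meta-learner never sees predictions made on data a model was trained on
meta_train = np.hstack([
    cross_val_predict(m, X_train, y_train, cv=5, method="predict_proba")
    for m in base_models
])

# Train base learners on the full training set for use at prediction time
for m in base_models:
    m.fit(X_train, y_train)
meta_test = np.hstack([m.predict_proba(X_test) for m in base_models])

# Train the meta-learner on the base learners' predictions
meta_learner = LogisticRegression(max_iter=1000)
meta_learner.fit(meta_train, y_train)

print(f"Meta-feature shape: {meta_train.shape}")
print(f"Stacked accuracy: {meta_learner.score(meta_test, y_test):.3f}")
```

Using out-of-fold predictions rather than predictions on the training data itself keeps the meta-learner from simply trusting whichever base model overfits the most.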

Advantages:

    Can leverage the strengths of different models.

    Often improves predictive performance by combining diverse models.

Example: A stacking ensemble might combine a decision tree, a logistic regression, and a k-nearest neighbors classifier, with a logistic regression model acting as the meta-learner.

In [9]:
# Define base learners
base_learners = [
    ('dt', DecisionTreeClassifier(random_state=42)),
    ('knn', KNeighborsClassifier()),
    ('lr', LogisticRegression())
]

# Initialize the Stacking classifier
stacking = StackingClassifier(estimators=base_learners, final_estimator=LogisticRegression())

# Fit the model
stacking.fit(X_train, y_train)

# Make predictions on the test set
y_pred = stacking.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.3f}')

Accuracy: 1.000


# Voting Classifier

Hard Voting

    In hard voting, each individual classifier votes for a class, and the class that receives the most votes is chosen as the final prediction. Essentially, it’s a majority vote system.

    Mechanism: Each classifier in the ensemble outputs its predicted class label. The final prediction is the class that has the majority of the votes.

    Use Case: Hard voting is straightforward and works well when you have classifiers that perform well individually.

Soft Voting

    In soft voting, each classifier outputs the probabilities of each class, and the class with the highest average probability is chosen as the final prediction.

    Mechanism: Each classifier in the ensemble outputs a probability distribution over the classes. These probabilities are averaged (or combined as a weighted average if weights are specified), and the final prediction is the class with the highest average probability.

    Use Case: Soft voting often performs better than hard voting when the individual classifiers are well-calibrated and their probability estimates are accurate.
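
The difference between the two schemes is easiest to see on a single sample where they disagree. The probabilities below are made-up values for three hypothetical classifiers, chosen only to illustrate the arithmetic.

```python
import numpy as np

# Hypothetical class probabilities for one sample
# (rows: classifiers A, B, C; columns: class 0, class 1)
probas = np.array([
    [0.55, 0.45],  # A leans slightly toward class 0
    [0.60, 0.40],  # B leans slightly toward class 0
    [0.10, 0.90],  # C is very confident in class 1
])

# Hard voting: each classifier casts one vote for its most likely class
labels = probas.argmax(axis=1)            # votes: [0, 0, 1]
hard_pred = np.bincount(labels).argmax()  # class 0 wins two votes to one

# Soft voting: average the probabilities, then pick the largest
mean_proba = probas.mean(axis=0)          # about [0.417, 0.583]
soft_pred = mean_proba.argmax()           # class 1 has the higher average

print(f"Hard voting predicts class {hard_pred}")
print(f"Soft voting predicts class {soft_pred}")
```

Here hard voting picks class 0 because two classifiers narrowly prefer it, while soft voting picks class 1 because the third classifier's confidence outweighs them in the average.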

In [10]:
from sklearn.ensemble import VotingClassifier
from sklearn.tree import DecisionTreeClassifier

# Define base learners
model1 = LogisticRegression(random_state=1)
model2 = DecisionTreeClassifier(random_state=1)

# Create a VotingClassifier from the models
model = VotingClassifier(estimators=[('lr', model1), ('dt', model2)], voting='hard')

# Fit the model
model.fit(X_train, y_train)

# Evaluate the model's performance on the test set
model.score(X_test, y_test)

1.0

In [11]:
# Imports are repeated here so this cell can run on its own
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Define individual classifiers
clf1 = LogisticRegression(random_state=1)
clf2 = RandomForestClassifier(n_estimators=50, random_state=1)
clf3 = SVC(probability=True, random_state=1)

# Hard Voting Classifier
hard_voting_clf = VotingClassifier(estimators=[('lr', clf1), ('rf', clf2), ('svc', clf3)], voting='hard')

# Soft Voting Classifier
soft_voting_clf = VotingClassifier(estimators=[('lr', clf1), ('rf', clf2), ('svc', clf3)], voting='soft')

# Train the hard voting classifier
hard_voting_clf.fit(X_train, y_train)
y_pred_hard = hard_voting_clf.predict(X_test)

# Train the soft voting classifier
soft_voting_clf.fit(X_train, y_train)
y_pred_soft = soft_voting_clf.predict(X_test)

# Evaluate the classifiers
accuracy_hard = accuracy_score(y_test, y_pred_hard)
accuracy_soft = accuracy_score(y_test, y_pred_soft)

print(f'Hard Voting Classifier Accuracy: {accuracy_hard * 100:.2f}%')
print(f'Soft Voting Classifier Accuracy: {accuracy_soft * 100:.2f}%')

Hard Voting Classifier Accuracy: 100.00%
Soft Voting Classifier Accuracy: 100.00%


# Summary

Bagging: Reduces variance by training multiple models on different subsets of the data and combining their predictions. 

    Example: Random Forest.

Boosting: Converts weak learners into strong learners by sequentially training models, each correcting the errors of the previous ones. 

    Example: AdaBoost, Gradient Boosting, XGBoost.

Stacking: Combines multiple models by training a meta-learner on their predictions to improve overall performance. 

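    Example: StackingClassifier.

Voting: Combines the predictions of several different models by majority vote (hard voting) or by averaging their predicted probabilities (soft voting). 

    Example: VotingClassifier.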