# Ensemble Learning

Ensemble learning aggregates the predictions of a group of predictors; the aggregated result is often better than that of the best individual predictor. The group of predictors is called an ensemble, and the technique is called ensemble learning.

For example, we can train an ensemble of Decision Tree classifiers, each on a different subset of the training dataset. The class that receives the most votes becomes the final prediction of the ensemble. Such an ensemble of trees is called a Random Forest.

Ensemble methods give better predictions when the predictors are as independent as possible. One way to get independent predictors is to use different algorithms, because different algorithms make different types of errors.
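
The effect of independence can be illustrated with a small simulation. This is a minimal sketch under idealized assumptions: a binary problem, 101 hypothetical predictors that are each correct 60% of the time, and errors that are fully independent (real predictors trained on the same data are correlated, so the gain is smaller in practice).

```python
import numpy as np

rng = np.random.default_rng(42)

n_predictors = 101   # odd, so a binary majority vote cannot tie
n_samples = 10_000   # simulated test instances
p_correct = 0.6      # assumed accuracy of each individual predictor

# correct[i, j] is True when predictor i classifies instance j correctly.
# Every entry is drawn independently, which is the idealized assumption.
correct = rng.random((n_predictors, n_samples)) < p_correct

individual_accuracy = correct.mean()

# The majority vote is right when more than half of the predictors are right.
majority_correct = correct.sum(axis=0) > n_predictors / 2
ensemble_accuracy = majority_correct.mean()

print(f"Average individual accuracy: {individual_accuracy:.3f}")
print(f"Majority-vote accuracy:      {ensemble_accuracy:.3f}")
```

The majority vote ends up far more accurate than any single simulated predictor, which is why reducing the correlation between predictors matters.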

In [25]:
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import (
    accuracy_score,
    precision_score,
    recall_score,
    f1_score,
    classification_report,
    confusion_matrix,
)
from sklearn.datasets import load_wine

In [26]:
dataset = load_wine()
X = dataset.data
y = dataset.target

In [27]:
wine_df = pd.DataFrame(data=dataset.data, columns=dataset.feature_names)
wine_df["target"] = dataset.target
wine_df

Unnamed: 0,alcohol,malic_acid,ash,alcalinity_of_ash,magnesium,total_phenols,flavanoids,nonflavanoid_phenols,proanthocyanins,color_intensity,hue,od280/od315_of_diluted_wines,proline,target
0,14.23,1.71,2.43,15.6,127.0,2.80,3.06,0.28,2.29,5.64,1.04,3.92,1065.0,0
1,13.20,1.78,2.14,11.2,100.0,2.65,2.76,0.26,1.28,4.38,1.05,3.40,1050.0,0
2,13.16,2.36,2.67,18.6,101.0,2.80,3.24,0.30,2.81,5.68,1.03,3.17,1185.0,0
3,14.37,1.95,2.50,16.8,113.0,3.85,3.49,0.24,2.18,7.80,0.86,3.45,1480.0,0
4,13.24,2.59,2.87,21.0,118.0,2.80,2.69,0.39,1.82,4.32,1.04,2.93,735.0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
173,13.71,5.65,2.45,20.5,95.0,1.68,0.61,0.52,1.06,7.70,0.64,1.74,740.0,2
174,13.40,3.91,2.48,23.0,102.0,1.80,0.75,0.43,1.41,7.30,0.70,1.56,750.0,2
175,13.27,4.28,2.26,20.0,120.0,1.59,0.69,0.43,1.35,10.20,0.59,1.56,835.0,2
176,13.17,2.59,2.37,20.0,120.0,1.65,0.68,0.53,1.46,9.30,0.60,1.62,840.0,2


In [28]:
# Split into training and testing dataset
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

In [29]:
# apply StandardScaler to dataset
scaler = StandardScaler()
scaler.fit(X_train)

In [30]:
# We transform the training and test data
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

In [31]:
print("Mean of features before scaling:")
print(X_train.mean(axis=0))
print("Standard deviation of features before scaling:")
print(X_train.std(axis=0))

print("Mean of features after scaling:")
print(X_train_scaled.mean(axis=0))
print("Standard deviation of features after scaling:")
print(X_train_scaled.std(axis=0))

Mean of features before scaling:
[1.29710084e+01 2.41336134e+00 2.37210084e+00 1.95773109e+01
 1.00857143e+02 2.27831933e+00 2.01453782e+00 3.64033613e-01
 1.58873950e+00 5.02512605e+00 9.56352941e-01 2.59378151e+00
 7.41588235e+02]
Standard deviation of features before scaling:
[8.48387888e-01 1.10317287e+00 2.67128754e-01 3.49197007e+00
 1.51037230e+01 6.60139175e-01 1.01585312e+00 1.26665837e-01
 5.76250638e-01 2.19464278e+00 2.35998383e-01 7.31504713e-01
 3.06349842e+02]
Mean of features after scaling:
[ 9.14301314e-16 -9.35817717e-16  2.92856309e-15  1.02625658e-16
 -2.54931463e-16  2.28715273e-15 -3.88344818e-17 -4.68812664e-16
  6.26016512e-16  5.39251183e-16  1.70825072e-15  2.16446842e-16
 -1.01692697e-16]
Standard deviation of features after scaling:
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]


## 1. Voting Classifiers

A voting classifier trains several predictors and aggregates their predictions; the final prediction is the class with the most votes. This method is called Hard Voting Classification.

In [32]:
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.ensemble import VotingClassifier

lr_classifier = LogisticRegression(max_iter=10000)
rf_classifier = RandomForestClassifier()
svc = SVC()

# build VotingClassifier using these 3 classifiers
voting_classifier = VotingClassifier(
    estimators=[("lr", lr_classifier), ("rf", rf_classifier), ("svc", svc)],
    voting="hard",
)

The [VotingClassifier](https://www.geeksforgeeks.org/ml-voting-classifier-using-sklearn/) class takes a list of (name, estimator) pairs as input. In this case, the list contains three estimators: a LogisticRegression classifier, a RandomForestClassifier, and an SVC classifier. The voting parameter specifies the voting strategy. Here we use the "hard" voting strategy, which means that the class with the most votes is the predicted class.

In [33]:
# train the new VotingClassifier
voting_classifier.fit(X_train, y_train)

In [34]:
# select a metric and print the result
y_pred = voting_classifier.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")

Accuracy: 1.00


Because the dataset is very small and the test set holds only 59 samples, a perfect score is plausible here. In most real-world scenarios it is unusual to achieve 100% accuracy, and a score this high deserves a second look: check for data leakage, and use cross-validation to confirm that the result is not an artifact of one particular train/test split.

In [35]:
# We calculate and print precision, recall, and F1-score
precision = precision_score(y_test, y_pred, average="weighted")
recall = recall_score(y_test, y_pred, average="weighted")
f1 = f1_score(y_test, y_pred, average="weighted")

print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print(f"F1-score: {f1:.2f}")

# Print classification report
print("Classification Report:")
print(classification_report(y_test, y_pred))

# Print confusion matrix
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))

Precision: 1.00
Recall: 1.00
F1-score: 1.00
Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        20
           1       1.00      1.00      1.00        24
           2       1.00      1.00      1.00        15

    accuracy                           1.00        59
   macro avg       1.00      1.00      1.00        59
weighted avg       1.00      1.00      1.00        59

Confusion Matrix:
[[20  0  0]
 [ 0 24  0]
 [ 0  0 15]]


Another type of voting has each classifier estimate the class probabilities, which are then averaged over all the individual classifiers; the final prediction is the class with the highest average probability. This method is called Soft Voting Classification. In sklearn we just need to change the voting argument from 'hard' to 'soft'; every estimator must then be able to estimate class probabilities (for SVC, this means setting probability=True).
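
A minimal sketch of soft voting on the same wine split follows. The variable names (`soft_voting_classifier`, `avg_proba`, `y_pred_soft`) and the `random_state` values are illustrative choices, not part of the original notebook, and the data is reloaded so that the block runs on its own.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

soft_voting_classifier = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=10000)),
        ("rf", RandomForestClassifier(random_state=42)),
        # SVC only provides predict_proba when probability=True
        ("svc", SVC(probability=True, random_state=42)),
    ],
    voting="soft",
)
soft_voting_classifier.fit(X_train, y_train)

# Averaged class probabilities: one row per test sample, one column per class
avg_proba = soft_voting_classifier.predict_proba(X_test)
y_pred_soft = soft_voting_classifier.predict(X_test)

print(f"Soft voting accuracy: {accuracy_score(y_test, y_pred_soft):.2f}")
```

Soft voting gives more weight to confident predictors, so it can differ from hard voting when the individual classifiers disagree.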

## 2. Bagging and Pasting

We can also use the same classifier for every predictor but train each one on a different random subset of the training dataset. If the sampling is done with replacement, the method is called **bagging**; if it is done without replacement, the method is called **pasting**. The final prediction is made by aggregating the predictions of all predictors (for classification, the most frequent prediction).

In [36]:
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression

# build BaggingClassifier using LogisticRegression

# We create an instance of the base estimator (Logistic Regression):
base_estimator = LogisticRegression(max_iter=10000)

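# We create a BaggingClassifier using the base estimator.
# The estimator is passed positionally because its keyword was renamed
# from `base_estimator` to `estimator` in scikit-learn 1.2:
bagging_classifier = BaggingClassifier(
    base_estimator, n_estimators=10, random_state=42
)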

# We fit the BaggingClassifier:
bagging_classifier.fit(X_train, y_train)

# We make predictions:
y_pred = bagging_classifier.predict(X_test)

In [37]:
# train the new BaggingClassifier

In [38]:
# select a metric and print the result

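We can switch from bagging to pasting by changing the **bootstrap** argument from True to False and setting **max_samples** below 1.0, so that each predictor still sees a different subset of the training data.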

Besides training the classifiers on different subsets of the training instances, we can also train them on different subsets of the features, using the **max_features** and **bootstrap_features** arguments.
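
The three sampling variants can be compared side by side. This is a sketch only: the `variants` dictionary, the 0.8 and 0.5 fractions, and the use of scaled features are illustrative assumptions, and the accuracies on such a small test set should not be over-interpreted.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Settings that differ between the variants (values are illustrative)
variants = {
    "bagging": dict(bootstrap=True, max_samples=0.8),
    "pasting": dict(bootstrap=False, max_samples=0.8),
    "random feature subsets": dict(
        bootstrap=True, max_samples=0.8, max_features=0.5, bootstrap_features=False
    ),
}

scores = {}
for name, params in variants.items():
    model = BaggingClassifier(
        LogisticRegression(max_iter=10000),  # estimator passed positionally
        n_estimators=10,
        random_state=42,
        **params,
    )
    model.fit(X_train_scaled, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test_scaled))
    print(f"{name}: accuracy = {scores[name]:.2f}")
```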

## 3. Random Forests

A Random Forest is an ensemble of Decision Tree classifiers trained using the bagging (or sometimes pasting) method. The Random Forest Classifier has the hyperparameters of both the Decision Tree Classifier and the bagging method. Instead of searching for the best split among all features at each node, as a single Decision Tree does, a Random Forest searches for the best split among a random subset of the features, which introduces more randomness.

In [39]:
from sklearn.ensemble import RandomForestClassifier

# build a RandomForestClassifier

In [40]:
# train the new RandomForestClassifier

In [41]:
# select a metric and print the result
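
One possible way to fill in the three Random Forest cells is sketched below. The hyperparameter values and variable names are illustrative assumptions rather than tuned choices, and the data is reloaded so that the block runs on its own.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

# Build the forest
rf_classifier = RandomForestClassifier(
    n_estimators=100,     # number of trees
    max_features="sqrt",  # size of the random feature subset tried at each split
    bootstrap=True,       # sample training instances with replacement
    random_state=42,
)

# Train it
rf_classifier.fit(X_train, y_train)

# Evaluate it with accuracy as the chosen metric
y_pred_rf = rf_classifier.predict(X_test)
rf_accuracy = accuracy_score(y_test, y_pred_rf)
print(f"Random Forest accuracy: {rf_accuracy:.2f}")
```

Decision trees do not depend on feature scale, so the unscaled features are used here.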

## 4. Boosting

Boosting ensemble methods train the predictors sequentially, with each predictor trying to correct its predecessor.

### 4.1 AdaBoost

One of the most popular boosting methods is AdaBoost. Each new predictor pays more attention to the training instances that its predecessor underfit: the relative weights of the misclassified instances are increased, and the next predictor is trained using the updated weights. This process is repeated until the last predictor has been trained.

In [42]:
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import LogisticRegression

# build an AdaBoostClassifier using LogisticRegression

In [43]:
# train the new AdaBoostClassifier

In [44]:
# select a metric and print the result
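
The AdaBoost cells could be filled in along the following lines. This is a sketch under assumptions: the features are standardized first, `n_estimators=50` and `learning_rate=1.0` are illustrative values, and the base estimator is passed positionally because its keyword name differs between scikit-learn versions. Boosting may stop early, so fewer than 50 predictors may actually be trained.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Build the ensemble; LogisticRegression accepts the sample weights
# that AdaBoost updates after every round
ada_classifier = AdaBoostClassifier(
    LogisticRegression(max_iter=10000),  # base estimator, passed positionally
    n_estimators=50,
    learning_rate=1.0,
    random_state=42,
)

# Train it
ada_classifier.fit(X_train_scaled, y_train)

# Evaluate it with accuracy as the chosen metric
y_pred_ada = ada_classifier.predict(X_test_scaled)
ada_accuracy = accuracy_score(y_test, y_pred_ada)
print(f"AdaBoost accuracy: {ada_accuracy:.2f}")
print(f"Predictors actually trained: {len(ada_classifier.estimators_)}")
```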

### 4.2 Gradient Boosting

Just like AdaBoost, Gradient Boosting trains the predictors sequentially, with each predictor trying to correct its predecessor. But rather than updating the instance weights, Gradient Boosting fits each new predictor to the residual errors made by the previous predictor.
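
The residual-fitting idea is easiest to see in a regression setting with squared error. The sketch below is a hand-rolled illustration on a synthetic toy dataset, not scikit-learn's implementation; the names `X_toy`, `y_toy`, `learning_rate`, and `n_stages` are hypothetical. For classification, the "residuals" are replaced by the gradient of the classification loss.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)

# Synthetic 1-D regression problem: a noisy parabola
X_toy = rng.uniform(-1.0, 1.0, size=(200, 1))
y_toy = 3.0 * X_toy[:, 0] ** 2 + rng.normal(scale=0.05, size=200)

learning_rate = 0.5
n_stages = 3

# Stage 0: start from a constant prediction (the mean of the targets)
prediction = np.full_like(y_toy, y_toy.mean())
mse_per_stage = [np.mean((y_toy - prediction) ** 2)]
trees = []

for _ in range(n_stages):
    # Residual errors left over by the ensemble built so far
    residuals = y_toy - prediction
    # Fit the next small tree to those residuals, not to the targets
    tree = DecisionTreeRegressor(max_depth=2, random_state=42)
    tree.fit(X_toy, residuals)
    # Add a damped version of the correction to the running prediction
    prediction = prediction + learning_rate * tree.predict(X_toy)
    trees.append(tree)
    mse_per_stage.append(np.mean((y_toy - prediction) ** 2))

for stage, mse in enumerate(mse_per_stage):
    print(f"after {stage} trees: training MSE = {mse:.4f}")
```

Each added tree lowers the training error a little further, which is also why the number of stages and the learning rate need to be chosen with overfitting in mind.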

In [45]:
from sklearn.ensemble import GradientBoostingClassifier

# build a GradientBoostingClassifier

In [46]:
# train the new GradientBoostingClassifier

In [47]:
# select a metric and print the result
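
The Gradient Boosting cells could be completed as sketched below. The hyperparameter values and variable names are illustrative assumptions, and the data is reloaded so that the block runs on its own.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

# Build the ensemble of shallow regression trees
gb_classifier = GradientBoostingClassifier(
    n_estimators=100,   # number of boosting stages
    learning_rate=0.1,  # shrinks the contribution of each stage
    max_depth=3,        # depth of each individual tree
    random_state=42,
)

# Train it
gb_classifier.fit(X_train, y_train)

# Evaluate it with accuracy as the chosen metric
y_pred_gb = gb_classifier.predict(X_test)
gb_accuracy = accuracy_score(y_test, y_pred_gb)
print(f"Gradient Boosting accuracy: {gb_accuracy:.2f}")
```

A lower `learning_rate` usually needs more stages to reach the same training error, so the two values are best tuned together.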