# **Introduction**


Similarly, if we aggregate the predictions of a group of models (such as classifiers or regressors), we will often get better predictions than with the best individual predictor. A group of predictors is called an **ensemble**; thus, this technique is called **ensemble learning**, and an ensemble learning algorithm is called an **ensemble method**.

As an example of an ensemble method, we can train a **group of decision tree classifiers**, each on a random subset of the training data. **Such an ensemble of decision trees is called a random forest**. Despite its simplicity, this is one of the most powerful machine learning algorithms available today. In this chapter, we will discuss the most famous ensemble learning methods, including: **Bagging, Boosting, & Stacking.**

# **Voting Classifiers**

Suppose we have trained a few classifiers, each achieving 80% accuracy. A very simple way to create an even better classifier is to aggregate the predictions of all our classifiers and choose the class that gets the most votes.

**Majority voting classification is called Hard Voting**

![](https://drive.google.com/uc?export=view&id=1Y01QJdvZ4mucKd2HIfZjnZPPdn35ISIc)
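To make the majority-vote rule concrete, here is a minimal pure-Python sketch. The helper name `hard_vote` and the toy predictions are illustrative assumptions, not part of scikit-learn. In this sketch a tie goes to the label that was seen first, which may differ from how a library breaks ties.

```python
from collections import Counter


def hard_vote(predictions):
    """Return the majority label for each sample.

    `predictions` holds one list of predicted labels per classifier,
    all of the same length (hypothetical helper, for illustration only).
    """
    majority = []
    # zip(*predictions) groups together the votes that each sample received.
    for votes in zip(*predictions):
        # most_common(1) returns [(label, count)] for the top label;
        # on a tie, the label encountered first wins.
        label, _count = Counter(votes).most_common(1)[0]
        majority.append(label)
    return majority


# Toy predictions from three classifiers on four samples.
clf_a = [1, 0, 1, 1]
clf_b = [1, 1, 0, 1]
clf_c = [0, 0, 1, 1]

ensemble_prediction = hard_vote([clf_a, clf_b, clf_c])
print(ensemble_prediction)
```

Each classifier is wrong on a different sample here, so the vote can correct individual mistakes. That is the intuition behind the need for diversity.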

Somewhat surprisingly, this voting classifier often achieves a higher accuracy than the best predictor in the ensemble. Even if each classifier is a weak learner (one that does only slightly better than random guessing), the ensemble can still be a strong learner, provided there are enough weak learners and they are sufficiently diverse.

Due to the law of large numbers, if we build an ensemble of 1,000 binary classifiers, each with an individual accuracy of $51\%$, and predict the class that receives the majority of the votes, we can hope for up to $75\%$ accuracy.

This holds only if all classifiers are completely independent and make uncorrelated errors, which is clearly not the case here because they are trained on the same data.
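Under that idealized independence assumption, the accuracy of the majority vote is a binomial tail probability, so we can check the figure numerically. The sketch below uses only the standard library; the function name `majority_vote_accuracy` is a hypothetical helper, it assumes `0 < p_correct < 1`, and it counts ties as errors.

```python
import math


def majority_vote_accuracy(n_classifiers, p_correct):
    """P(a strict majority of independent classifiers is correct).

    Assumes 0 < p_correct < 1 and fully independent errors.
    Ties (possible when n_classifiers is even) count as errors.
    """
    log_p = math.log(p_correct)
    log_q = math.log(1.0 - p_correct)
    total = 0.0
    # Sum the binomial pmf over every outcome with more than half correct.
    for k in range(n_classifiers // 2 + 1, n_classifiers + 1):
        # Work in log space so large binomial coefficients do not overflow.
        log_pmf = (
            math.lgamma(n_classifiers + 1)
            - math.lgamma(k + 1)
            - math.lgamma(n_classifiers - k + 1)
            + k * log_p
            + (n_classifiers - k) * log_q
        )
        total += math.exp(log_pmf)
    return total


for n in (1, 11, 101, 1000):
    print(n, round(majority_vote_accuracy(n, 0.51), 3))
```

The printed values should grow with the number of voters. Because ties are counted as errors in this sketch, the result for 1,000 voters should land slightly below the 75% figure.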

One way to get diverse classifiers is to use a different algorithm for each of them and to train them on different subsets of the training data.

Let's implement a hard voting ensemble learner using scikit-learn:

**Python implementation**

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sklearn

In [2]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

In [3]:
log_clf = LogisticRegression(solver='lbfgs')
rf_clf = RandomForestClassifier(n_estimators=100)
svm_clf = SVC(gamma='scale')

In [4]:
from sklearn import datasets
from sklearn.model_selection import train_test_split

In [5]:
X, y = datasets.make_moons(n_samples=10000, noise=0.5)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.33)

In [6]:
X_train.shape, y_train.shape, X_val.shape, y_val.shape


((6700, 2), (6700,), (3300, 2), (3300,))

In [7]:
voting_clf = VotingClassifier(estimators=[('lr', log_clf), ('rf', rf_clf), ('svc', svm_clf)], voting='hard')

In [8]:
voting_clf.fit(X_train, y_train)


VotingClassifier(estimators=[('lr',
                              LogisticRegression(C=1.0, class_weight=None,
                                                 dual=False, fit_intercept=True,
                                                 intercept_scaling=1,
                                                 l1_ratio=None, max_iter=100,
                                                 multi_class='auto',
                                                 n_jobs=None, penalty='l2',
                                                 random_state=None,
                                                 solver='lbfgs', tol=0.0001,
                                                 verbose=0, warm_start=False)),
                             ('rf',
                              RandomForestClassifier(bootstrap=True,
                                                     ccp_alpha=0.0,
                                                     class_weight=None,
                                             

Let's take a look at the performance of each classifier and of the ensemble on the validation set:



In [9]:
from sklearn.metrics import accuracy_score


In [10]:
for clf in [log_clf, rf_clf, svm_clf, voting_clf]:
    clf.fit(X_train, y_train)
    y_hat = clf.predict(X_val)
    print(clf.__class__.__name__, accuracy_score(y_val, y_hat))

LogisticRegression 0.8151515151515152
RandomForestClassifier 0.803939393939394
SVC 0.8303030303030303
VotingClassifier 0.8254545454545454


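In this run, the voting classifier (82.5%) beats logistic regression (81.5%) and the random forest (80.4%) but falls just short of the SVC (83.0%). Because neither the dataset nor the train/validation split is seeded, the exact numbers will vary from run to run.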

If all classifiers in the ensemble can estimate class probabilities (i.e., they have a `predict_proba()` method), we can average their predicted probabilities for each class and then predict the class with the highest average probability. This is called **soft voting**. It often yields better results than hard voting because it gives more weight to highly confident votes.
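The difference between the two voting schemes is easiest to see on a single sample where they disagree. The following is a minimal pure-Python sketch; the helper `soft_vote` and the probability values are made up for illustration and are not taken from the experiment above.

```python
def soft_vote(probabilities):
    """Average per-class probabilities across classifiers and pick the argmax.

    `probabilities` holds one list of class probabilities per classifier
    for a single sample (hypothetical helper, for illustration only).
    """
    n_classifiers = len(probabilities)
    n_classes = len(probabilities[0])
    averaged = [
        sum(clf_probs[c] for clf_probs in probabilities) / n_classifiers
        for c in range(n_classes)
    ]
    predicted_class = max(range(n_classes), key=lambda c: averaged[c])
    return predicted_class, averaged


# [P(class 0), P(class 1)] from three classifiers for one sample.
probs = [
    [0.55, 0.45],  # weakly prefers class 0
    [0.55, 0.45],  # weakly prefers class 0
    [0.05, 0.95],  # strongly prefers class 1
]

# Hard voting: each classifier casts one vote for its most likely class.
hard_votes = [max(range(2), key=lambda c: p[c]) for p in probs]
hard_prediction = max(set(hard_votes), key=hard_votes.count)

soft_prediction, averaged = soft_vote(probs)
print("hard:", hard_prediction, "soft:", soft_prediction, averaged)
```

Two lukewarm votes for class 0 win under hard voting, while the single confident vote for class 1 wins once probabilities are averaged. In scikit-learn, soft voting is selected with `voting='soft'` on `VotingClassifier`; note that `SVC` only exposes `predict_proba()` when it is created with `probability=True`.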

# **References**

[Chapter 7. Ensemble Learning & Random Forests](https://github.com/Akramz/Hands-on-Machine-Learning-with-Scikit-Learn-Keras-and-TensorFlow/blob/master/07.Ensembles_RFs.ipynb)