# Chapter 7
## Ensemble Learning and Random Forests

By aggregating the responses of a group of predictors, we can generally get better predictions than if we were using a single predictor.
Such a group of predictors is called an *ensemble*; thus this technique is called **Ensemble Learning**.

In the last exercise of Chapter 6, we created an ensemble of Decision Trees, which formed a *Random Forest*.

Ensemble methods are typically used near the end of a project, once you have already built a few good predictors, to combine them into an even better predictor.

In this chapter we will discuss the most popular Ensemble methods, including *bagging*, *boosting* and *stacking*. We will also explore Random Forests.

## Voting Classifiers

Suppose we have trained a Logistic Regression classifier, an SVM classifier and a Random Forest classifier, each achieving about 80% accuracy.

A very simple way of creating a better classifier is to aggregate the predictions of each classifier and predict the class that gets the most votes. This majority-vote classifier is called a *hard voting classifier*. This type of voting classifier often achieves higher accuracy than the best classifier in the ensemble.
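To make the mechanism concrete, here is a minimal NumPy sketch of hard voting, assuming each classifier's predictions are already available as integer class indices. The `hard_vote` function and the prediction matrix are made up for illustration; this is not Scikit-Learn's implementation. In this sketch, ties go to the smallest class index because `argmax` returns the first maximum.

```python
import numpy as np

def hard_vote(predictions):
    """Majority vote over class labels (hypothetical helper).

    predictions: int array of shape (n_classifiers, n_samples) holding each
    classifier's predicted class index for each sample.
    """
    n_classes = predictions.max() + 1
    # Count the votes each class receives for every sample: (n_classes, n_samples)
    votes = np.stack([(predictions == c).sum(axis=0) for c in range(n_classes)])
    # Pick the class with the most votes (first maximum wins a tie)
    return votes.argmax(axis=0)

preds = np.array([
    [0, 1, 1, 0, 2],   # classifier A
    [0, 1, 0, 0, 2],   # classifier B
    [1, 1, 0, 2, 1],   # classifier C
])
print(hard_vote(preds))  # [0 1 0 0 2]
```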

In fact, even if each classifier is a *weak learner* (i.e., only slightly better than random guessing), the ensemble can still be a *strong learner* (achieving high accuracy), provided there are enough weak learners and they are sufficiently diverse.

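This is due to the law of large numbers: a slightly biased coin that lands heads 51% of the time becomes more and more likely to show a majority of heads as the number of tosses grows (see the coin-tossing explanation on p. 191 of the book).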

Similarly, suppose we build an ensemble of 1,000 classifiers that are individually correct only 51% of the time. By predicting the majority-voted class, we can hope for up to 75% accuracy.
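If the classifiers are perfectly independent, the number of correct votes among n classifiers follows a binomial distribution, so the majority-vote accuracy can be computed directly. The sketch below (the function name `majority_vote_accuracy` is hypothetical) sums the binomial probabilities of every winning vote count using only the standard library. It counts a tie as a wrong answer, so for 1,000 classifiers it comes out a little under the 75% quoted above.

```python
import math

def majority_vote_accuracy(n_classifiers, p_correct):
    """Probability that a strict majority of n independent classifiers,
    each correct with probability p_correct, picks the right class
    (binary task; a tie counts as a wrong answer)."""
    log_p = math.log(p_correct)
    log_q = math.log(1.0 - p_correct)
    log_n_fact = math.lgamma(n_classifiers + 1)
    total = 0.0
    # Sum the binomial probabilities P(K = k) for every winning count k
    for k in range(n_classifiers // 2 + 1, n_classifiers + 1):
        log_pmf = (log_n_fact
                   - math.lgamma(k + 1)
                   - math.lgamma(n_classifiers - k + 1)
                   + k * log_p
                   + (n_classifiers - k) * log_q)
        total += math.exp(log_pmf)  # working in log space avoids overflow
    return total

for n in (1, 100, 1000, 10000):
    print(n, round(majority_vote_accuracy(n, 0.51), 3))
```

The printed values grow with n, which is the law of large numbers at work: the more independent 51% voters there are, the more reliably the majority lands on the right answer.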

However, this is only true if all classifiers are perfectly independent, making uncorrelated errors, which is clearly not the case because they are trained on the same data. They are likely to make the same types of errors, so there will be many majority votes for the wrong class, reducing the ensemble's accuracy.
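The effect of correlated errors can be illustrated with a toy simulation. The error model is an assumption made for illustration, not something from the book: in each trial, every classifier either copies a single shared outcome (with probability `shared`) or is right independently with probability 51%, so each classifier on its own is still right 51% of the time. The function name `simulate_ensemble` is hypothetical.

```python
import numpy as np

def simulate_ensemble(n_classifiers=1000, p_correct=0.51, shared=0.0,
                      n_trials=2000, seed=42):
    """Estimate majority-vote accuracy on a binary task by simulation.

    Hypothetical error model: with probability `shared`, a classifier copies
    one 'common' outcome drawn once per trial (the same mistake as the others);
    otherwise it is right independently with probability p_correct.
    """
    rng = np.random.default_rng(seed)
    # One shared outcome per trial (True = correct)
    common = rng.random(n_trials) < p_correct
    # An independent outcome for every classifier in every trial
    own = rng.random((n_trials, n_classifiers)) < p_correct
    # Decide which classifiers copy the shared outcome
    copies = rng.random((n_trials, n_classifiers)) < shared
    correct = np.where(copies, common[:, None], own)
    # Fraction of trials where a strict majority of votes is correct
    return (correct.sum(axis=1) > n_classifiers / 2).mean()

for shared in (0.0, 0.1, 0.5):
    print(shared, simulate_ensemble(shared=shared))
```

With `shared=0.0` the estimate should land near the binomial figure for 1,000 independent voters, while in this toy model even a 10% chance of copying the shared outcome pulls the ensemble back to roughly the 51% of a single classifier, because the majority simply follows the shared outcome.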

**Note:** Ensemble methods work best when predictors are as independent from one another as possible. A good way to achieve this is to train them using very different algorithms, increasing the chance they will make very different types of errors, thus improving the ensemble's accuracy.

Example of a hard voting classifier

```python
# Generate data: 500 noisy "moons" points, split 75% train / 25% test (default)
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_moons

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
```

```python
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Three very different classifiers
log_clf = LogisticRegression()
rnd_clf = RandomForestClassifier()  # no random_state: scores may vary between runs
svm_clf = SVC()

# Hard voting: predict the class that gets the most votes
voting_clf = VotingClassifier(
    estimators=[('lr', log_clf), ('rf', rnd_clf), ('svc', svm_clf)],
    voting='hard')
```

```python
from sklearn.metrics import accuracy_score

# Fit each classifier and the ensemble, then report test-set accuracy
for clf in (log_clf, rnd_clf, svm_clf, voting_clf):
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    print(clf.__class__.__name__, accuracy_score(y_test, y_pred))
```

```
LogisticRegression 0.864
RandomForestClassifier 0.904
SVC 0.896
VotingClassifier 0.912
```


If all the classifiers can estimate class probabilities (i.e., they have a `predict_proba()` method), we can tell Scikit-Learn to predict the class with the highest class probability, averaged over all the individual classifiers. This is called *soft voting* (set `voting='soft'`).

It often achieves higher performance than hard voting because it gives more weight to highly confident votes. 
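A small made-up example shows how the two schemes can disagree: three classifiers score one sample, two of them lean weakly toward class 0 and one is very confident in class 1. The probabilities are hypothetical and only NumPy is assumed.

```python
import numpy as np

# Hypothetical class probabilities predicted by three classifiers for ONE
# sample in a binary problem: columns are P(class 0), P(class 1).
probas = np.array([
    [0.10, 0.90],   # classifier A: very confident in class 1
    [0.60, 0.40],   # classifier B: mildly prefers class 0
    [0.55, 0.45],   # classifier C: mildly prefers class 0
])

# Hard voting: each classifier casts one vote for its most likely class
votes = probas.argmax(axis=1)                 # [1, 0, 0]
hard = np.bincount(votes, minlength=2).argmax()

# Soft voting: average the probabilities, then pick the most likely class
avg = probas.mean(axis=0)                     # ≈ [0.417, 0.583]
soft = avg.argmax()

print("hard:", hard, "soft:", soft)           # hard: 0 soft: 1
```

Hard voting ignores how sure classifier A is, so the two lukewarm votes win; soft voting lets A's high confidence outweigh them.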

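In the example above, `SVC` does not output class probabilities by default, but we can set its **probability** hyperparameter to `True`. This makes it estimate class probabilities using cross-validation (which slows down training) and gives it a `predict_proba()` method, so we can use soft voting.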
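Putting this together, a soft-voting version of the earlier ensemble might look like the sketch below. It repeats the data setup so that it runs on its own; the `random_state` arguments on the Random Forest and the SVM are additions for repeatability, not part of the original cells, and no particular accuracy is claimed here.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Same moons data as in the hard-voting example
X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

log_clf = LogisticRegression()
rnd_clf = RandomForestClassifier(random_state=42)
# probability=True gives SVC a predict_proba() method (training gets slower)
svm_clf = SVC(probability=True, random_state=42)

soft_voting_clf = VotingClassifier(
    estimators=[('lr', log_clf), ('rf', rnd_clf), ('svc', svm_clf)],
    voting='soft')  # average the predicted class probabilities
soft_voting_clf.fit(X_train, y_train)

y_pred = soft_voting_clf.predict(X_test)
print(accuracy_score(y_test, y_pred))
```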