# Ensemble Learning

"In this chapter, we will discuss the most popular Ensemble methods, including *bagging*, *boosting*, *stacking*, and a few others. We will also explore Random Forests."

#### Voting Classifiers

If you have a few classifiers trained on the same data, say a Logistic Regression classifier, an SVM classifier, and a Random Forest classifier, each achieving about 80% accuracy on the test set, you can aggregate them into a single *voting classifier*. Such a voting classifier often scores higher than even the best classifier in the group.
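
A quick calculation can show why a vote can beat its members. It rests on a strong assumption that real classifiers trained on the same data rarely satisfy: their errors are fully independent. The sketch also assumes a binary task, an odd number n of classifiers, and that each classifier is correct with probability p = 0.8. The ensemble is then correct whenever a strict majority is correct, which is a binomial tail probability. The function name `majority_vote_accuracy` is just an illustrative helper.

```python
from math import comb

def majority_vote_accuracy(p, n):
    """Probability that a strict majority of n independent classifiers,
    each correct with probability p, is correct (binary task, n odd).
    Assumes fully independent errors, which real ensembles only approximate."""
    k_min = n // 2 + 1  # smallest number of correct votes that wins
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

for n in (1, 3, 5):
    print(n, round(majority_vote_accuracy(0.8, n), 3))
```

This prints 0.8, 0.896 and 0.942 for 1, 3 and 5 classifiers. Correlated errors shrink this gain, which is why diversity matters.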

The simplest way to aggregate them is to take the mode of their predictions, that is, to predict whichever class receives the most votes. "This majority-vote classifier is called a **hard voting** classifier."
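
Hard voting can be sketched in a few lines of NumPy. This is a minimal illustration, not Scikit-Learn's implementation. The prediction matrix is made up, and the sketch assumes the class labels are the integers 0, 1, 2, and so on.

```python
import numpy as np

# Hypothetical predictions: rows are classifiers, columns are test instances
predictions = np.array([
    [1, 0, 1, 1, 0],  # e.g. a Logistic Regression classifier
    [1, 1, 0, 1, 0],  # e.g. an SVM classifier
    [0, 0, 1, 1, 1],  # e.g. a Random Forest classifier
])

def hard_vote(preds):
    """Return the most frequent class label in each column (the mode).
    Assumes labels are non-negative integers 0..K-1."""
    n_classes = preds.max() + 1
    # Count votes per class for every instance: shape (n_classes, n_instances)
    counts = np.array([(preds == c).sum(axis=0) for c in range(n_classes)])
    # argmax returns the first maximum, so ties go to the lowest class label
    return counts.argmax(axis=0)

print(hard_vote(predictions))  # [1 0 1 1 0]
```

Each column holds the three votes for one instance. The last instance, for example, receives votes 0, 0 and 1, so it is assigned class 0.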

#### Important Note about Voting Classifiers

"Ensemble methods work best when the predictors are as independent from one another as possible. One way to get diverse classifiers is to train them using very different algorithms. This increases the chance that they will make very different types of errors, improving the ensemble's accuracy."

"The following code creates and trains a voting classifier in Scikit-Learn, composed of three diverse classifiers:"

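```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split

# Noisy two-moons dataset, split 75/25 into training and test sets
x, y = make_moons(n_samples=1000, noise=0.4)
x_train, x_test, y_train, y_test = train_test_split(x, y)
y_train[0]  # peek at the first training label
```

```
1
```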

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Three diverse classifiers
log_clf = LogisticRegression()
rnd_clf = RandomForestClassifier()
svm_clf = SVC()

# Hard voting: predict the class that receives the most votes
voting_clf = VotingClassifier(
    estimators=[('lr', log_clf), ('rf', rnd_clf), ('svc', svm_clf)],
    voting='hard')
voting_clf.fit(x_train, y_train)
```

```python
from sklearn.metrics import accuracy_score

# Fit each classifier and compare its test-set accuracy with the ensemble's
for clf in (log_clf, rnd_clf, svm_clf, voting_clf):
    clf.fit(x_train, y_train)
    y_pred = clf.predict(x_test)
    print(clf.__class__.__name__, accuracy_score(y_test, y_pred))
```

```
LogisticRegression 0.86
RandomForestClassifier 0.86
SVC 0.876
VotingClassifier 0.88
```

"There you have it! The voting classifier slightly outperforms all the individual classifiers."

"If all classifiers are able to estimate class probabilities (i.e., they have a `predict_proba()` method), then you can tell Scikit-Learn to predict the class with the highest class probability, averaged over all the individual classifiers. This is called **soft voting**. It often achieves higher performance than hard voting because it gives more weight to highly confident votes. All you need to do is replace `voting='hard'` with `voting='soft'` and ensure that all classifiers can estimate class probabilities. This is not the case for the `SVC` class by default, so you need to set its `probability` hyperparameter to `True` (this will make the SVC class use cross-validation to estimate class probabilities, slowing down training, and it will add a `predict_proba()` method). If you modify the preceding code to use soft voting, you will find that the voting classifier achieves over 91% accuracy!"

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

log_clf = LogisticRegression()
rnd_clf = RandomForestClassifier()
# probability=True gives SVC a predict_proba() method (slower training)
svm_clf = SVC(probability=True)

# Soft voting: average the predicted class probabilities, pick the highest
voting_clf = VotingClassifier(
    estimators=[('lr', log_clf), ('rf', rnd_clf), ('svc', svm_clf)],
    voting='soft')
voting_clf.fit(x_train, y_train)
```

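```python
from sklearn.metrics import accuracy_score

# Refit each classifier and compare it with the soft-voting ensemble
for clf in (log_clf, rnd_clf, svm_clf, voting_clf):
    clf.fit(x_train, y_train)
    y_pred = clf.predict(x_test)
    print(clf.__class__.__name__, accuracy_score(y_test, y_pred))
```

```
LogisticRegression 0.86
RandomForestClassifier 0.852
SVC 0.876
VotingClassifier 0.868
```

In this run, soft voting did not reach the 91% quoted above. It scored 0.868, below both the SVC on its own (0.876) and the earlier hard-voting ensemble (0.88). The quoted figure comes from a different experiment. How much soft voting helps depends on the data and on how well each classifier's probability estimates are calibrated.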
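
Soft voting's extra weight on confident votes is easiest to see on a single instance. The sketch below uses made-up probabilities from three hypothetical classifiers and computes both voting rules by hand with NumPy. It illustrates the arithmetic only and is not how `VotingClassifier` is implemented internally.

```python
import numpy as np

# Hypothetical class probabilities for ONE instance from three classifiers
# (columns: P(class 0), P(class 1))
probas = np.array([
    [0.45, 0.55],  # barely favours class 1
    [0.40, 0.60],  # mildly favours class 1
    [0.95, 0.05],  # very confident in class 0
])

# Hard voting: each classifier votes for its own most likely class
votes = probas.argmax(axis=1)                    # [1, 1, 0]
hard = np.bincount(votes, minlength=2).argmax()

# Soft voting: average the probabilities, then pick the highest average
avg = probas.mean(axis=0)                        # approximately [0.6, 0.4]
soft = avg.argmax()

print("hard:", hard, "soft:", soft)              # hard: 1 soft: 0
```

Two classifiers lean weakly toward class 1, so hard voting picks class 1. The third classifier is 95% confident in class 0, which pulls the averaged probabilities to roughly [0.6, 0.4], so soft voting picks class 0.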




#### Bagging and Pasting

Instead of using very different training algorithms to obtain a diverse set of predictors, we can use the same algorithm for every predictor and train each one on a different random subset of the training set.

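**Bagging** (short for *bootstrap aggregating*) samples the training instances with replacement, so the same instance can appear several times in one predictor's training subset. **Pasting** samples without replacement, so each instance appears at most once per predictor. In both cases, an instance can still be sampled by several different predictors.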
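
The two sampling schemes differ only in whether an index can be drawn twice. Here is a minimal NumPy sketch. It assumes a toy training set of 10 instances and a subset size of 6, both chosen arbitrarily.

```python
import numpy as np

rng = np.random.default_rng(42)
n_train = 10       # size of a toy training set (arbitrary)
subset_size = 6    # instances drawn for one predictor (arbitrary)

# Bagging: sample indices WITH replacement (duplicates are possible)
bag_idx = rng.choice(n_train, size=subset_size, replace=True)
# Pasting: sample indices WITHOUT replacement (each index at most once)
paste_idx = rng.choice(n_train, size=subset_size, replace=False)

print("bagging:", np.sort(bag_idx))
print("pasting:", np.sort(paste_idx))
```

Each predictor in an ensemble would get its own draw, so different predictors see different subsets.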

Using bagging/pasting, "once all predictors are trained, the ensemble can make a prediction for a new instance by simply aggregating the predictions of all predictors. The aggregation function is typically the *statistical mode*... for classification, or the average for regression. Each individual predictor has a higher bias than if it were trained on the original training set, but the aggregation reduces both bias and variance. Generally, the net result is that the ensemble has a similar bias but a lower variance than a single predictor trained on the original training set."
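
The sketch below builds bagging from scratch with Decision Trees on the same kind of moons data, to show the train-then-aggregate procedure. It makes four assumptions:

* 101 trees, an odd number so a binary vote cannot tie.
* Full-size bootstrap samples.
* Fixed random seeds.
* 0/1 labels.

It is meant to show the mechanics, not to replace `BaggingClassifier`.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=1000, noise=0.4, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

rng = np.random.default_rng(42)
n_estimators = 101  # odd, so a binary majority vote cannot tie

all_preds = []
for _ in range(n_estimators):
    # Bootstrap sample: same size as the training set, drawn with replacement
    idx = rng.integers(0, len(X_train), size=len(X_train))
    tree = DecisionTreeClassifier(random_state=0).fit(X_train[idx], y_train[idx])
    all_preds.append(tree.predict(X_test))
all_preds = np.array(all_preds)  # shape (n_estimators, n_test)

# Aggregate with the statistical mode: for 0/1 labels the majority class
# is 1 exactly when more than half of the trees predict 1
ensemble_pred = (all_preds.mean(axis=0) > 0.5).astype(int)

single_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("single tree: ", accuracy_score(y_test, single_tree.predict(X_test)))
print("bagged trees:", accuracy_score(y_test, ensemble_pred))
```

For regression, the aggregation step would average the predictions instead (Decision Tree regressors and `all_preds.mean(axis=0)`). The exact accuracies depend on the seeds.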

Because the predictors are independent of one another, they can all be trained in parallel, on different CPU cores or even different servers, and predictions can be made in parallel as well. This scalability is one reason bagging and pasting are so popular.

#### Bagging and Pasting in Scikit-Learn

"Scikit-Learn offers a simple API for both bagging and pasting with the `BaggingClassifier` class (or `BaggingRegressor` for regression). The following code trains an ensemble of 500 Decision Tree classifiers, each trained on 100 training instances randomly sampled from the training set with replacement (this is an example of bagging, but if you want to use pasting instead, just set `bootstrap=False`)."

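```python
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# 500 Decision Trees, each trained on 100 instances sampled with replacement
# (bagging); n_jobs=-1 trains them in parallel on all available CPU cores
bag_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=500,
    max_samples=100, bootstrap=True, n_jobs=-1)
bag_clf.fit(x_train, y_train)
y_pred = bag_clf.predict(x_test)
```

```python
accuracy_score(y_test, y_pred)
```

```
0.884
```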
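
Switching to pasting only requires `bootstrap=False`. The sketch below trains both variants side by side with the same settings as the cell above (500 trees, 100 instances each). Two things differ from that cell. The fixed `random_state` values are an addition for repeatability, and `n_jobs` is left at its default so the example runs anywhere. On a dataset this small, the two accuracies may or may not differ noticeably.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=1000, noise=0.4, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

results = {}
for name, bootstrap in (("bagging", True), ("pasting", False)):
    # Same ensemble either way; only the sampling scheme changes
    clf = BaggingClassifier(
        DecisionTreeClassifier(), n_estimators=500,
        max_samples=100, bootstrap=bootstrap, random_state=42)
    clf.fit(X_train, y_train)
    results[name] = accuracy_score(y_test, clf.predict(X_test))

print(results)
```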

#### Out-of-Bag Evaluation

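With bagging, when each bootstrap sample is the same size as the training set (the default `max_samples=1.0` in `BaggingClassifier`), each predictor sees only about 63% of the training instances on average. The remaining ~37% are that predictor's **out-of-bag** (oob) instances. A different subset is left out for each predictor, so the ensemble as a whole is likely to see all of the data. Since a predictor never sees its oob instances during training, it can be evaluated on them "without the need for a separate validation set or cross-validation." In Scikit-Learn, setting `oob_score=True` makes the ensemble predict each training instance using only the predictors that did not train on it, and `oob_score_` reports the resulting accuracy.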
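
The ~63% figure comes from a short probability argument. A given instance is missed by one draw with probability 1 - 1/m. It is therefore missed by all m draws of a size-m bootstrap sample with probability (1 - 1/m)^m, which approaches 1/e ≈ 0.368 as m grows. Here is a quick NumPy check, assuming m = 10,000.

```python
import numpy as np

m = 10_000                      # training-set size = bootstrap sample size
rng = np.random.default_rng(0)

# Analytical: P(an instance is never drawn in m draws) = (1 - 1/m)^m -> 1/e
p_missed = (1 - 1 / m) ** m
print(f"analytical fraction seen: {1 - p_missed:.3f}")   # 0.632

# Empirical: draw one bootstrap sample and count the distinct instances
idx = rng.integers(0, m, size=m)
seen = len(np.unique(idx)) / m
print(f"empirical fraction seen:  {seen:.3f}")
```

The empirical fraction varies slightly with the seed but stays close to 0.632. The figure only holds when each sample is as large as the training set. With the earlier `max_samples=100` ensemble, each tree sees a much smaller share.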

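```python
# Same kind of ensemble, but each tree gets a full-size bootstrap sample
# (max_samples defaults to the training-set size) and oob_score=True
# scores the ensemble on its out-of-bag instances
bag_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=500,
    bootstrap=True, n_jobs=-1, oob_score=True)
bag_clf.fit(x_train, y_train)
bag_clf.oob_score_
```

```
0.848
```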

The out-of-bag score is usually a reasonable estimate of how well the whole ensemble will do on the test set.
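
One way to check this is to compute both numbers for the same ensemble. The sketch below fixes the random seeds, which the cells above do not. It also inspects `oob_decision_function_`, which holds each training instance's class probabilities as estimated only by the predictors that did not train on it.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=1000, noise=0.4, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

bag_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=500,
    bootstrap=True, oob_score=True, random_state=42)
bag_clf.fit(X_train, y_train)

test_acc = accuracy_score(y_test, bag_clf.predict(X_test))
print(f"oob score:     {bag_clf.oob_score_:.3f}")
print(f"test accuracy: {test_acc:.3f}")

# Per-instance class probabilities from the oob predictors only
oob_proba = bag_clf.oob_decision_function_
print(oob_proba.shape)   # (750, 2): 750 training instances, 2 classes
print(oob_proba[:3])
```

Expect the two scores to be close but not identical, since both are estimates from finite samples.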