If you aggregate the predictions of a group of predictors (such as classifiers or regressors), you will often get better predictions than with the best individual predictor. A group of predictors is called an ensemble; thus, this technique is called Ensemble Learning, and an Ensemble Learning algorithm is called an Ensemble method.
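To see why aggregating can beat the best individual predictor, here is a back-of-the-envelope calculation. It is a sketch that is not part of the original notebook. It assumes binary classification, an odd number n of classifiers, and, crucially, that each classifier is correct with probability p *independently* of the others. `majority_vote_accuracy` is a hypothetical helper name.

```python
from math import exp, lgamma, log


def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that a strict majority of n classifiers votes for the right class.

    Assumes binary classification, odd n (no ties), and that every classifier
    is correct with probability p independently of the others.
    """
    if n % 2 == 0:
        raise ValueError("use an odd number of classifiers to avoid ties")
    total = 0.0
    for k in range(n // 2 + 1, n + 1):  # k = number of correct votes
        # log of the binomial probability C(n, k) * p^k * (1 - p)^(n - k),
        # computed in log space to avoid overflow/underflow for large n
        log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                   + k * log(p) + (n - k) * log(1 - p))
        total += exp(log_pmf)
    return total


for n in (1, 11, 101, 1001):
    print(n, round(majority_vote_accuracy(0.51, n), 3))
```

Under the independence assumption, 1001 classifiers that are each right only 51% of the time produce a majority vote that is right roughly three times out of four. Real predictors trained on the same data make correlated errors, so the actual gain is smaller; ensembles work best when their members are as diverse as possible.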

Random Forests:

Train a group of Decision Tree classifiers, each on a different random subset of the training set. To make a prediction, each tree votes for a class, and the class with the most votes wins.

# Voting Classifiers

Hard voting classifier: each classifier votes for a class, and the class with the most votes (the majority vote) wins.
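A hard-voting rule can be sketched in a few lines of plain Python. `hard_vote` and the prediction lists below are hypothetical, and the tie-breaking rule (the class encountered first wins) is an arbitrary choice that may differ from scikit-learn's `VotingClassifier`.

```python
from collections import Counter
from typing import Hashable, Sequence


def hard_vote(predictions: Sequence[Sequence[Hashable]]) -> list:
    """Combine per-classifier predictions by majority vote.

    predictions[i][j] is classifier i's predicted class for instance j.
    Ties go to the class encountered first (Counter.most_common keeps
    insertion order for equal counts).
    """
    n_instances = len(predictions[0])
    combined = []
    for j in range(n_instances):
        votes = Counter(clf_preds[j] for clf_preds in predictions)
        combined.append(votes.most_common(1)[0][0])
    return combined


# Three hypothetical classifiers predicting 4 instances (classes 0/1)
preds = [
    [1, 0, 1, 0],  # e.g. logistic regression
    [1, 1, 0, 0],  # e.g. random forest
    [0, 1, 1, 1],  # e.g. SVM
]
print(hard_vote(preds))  # [1, 1, 1, 0]
```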

In [82]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split

X, y = make_moons(n_samples=500, noise=0.3, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

In [83]:
log_clf = LogisticRegression(solver="lbfgs", random_state=42)
rnd_clf = RandomForestClassifier(n_estimators=100, random_state=42)
svm_clf = SVC(gamma="scale", random_state=42)


voting_clf = VotingClassifier(
    estimators=[('lr', log_clf), ('rf', rnd_clf), ('svc', svm_clf)], voting='hard')

voting_clf.fit(X_train, y_train)

VotingClassifier(estimators=[('lr', LogisticRegression(random_state=42)),
                             ('rf', RandomForestClassifier(random_state=42)),
                             ('svc', SVC(random_state=42))])

In [84]:
from sklearn.metrics import accuracy_score
for clf in (log_clf, rnd_clf, svm_clf, voting_clf):
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    print(clf.__class__.__name__, accuracy_score(y_test, y_pred))

LogisticRegression 0.864
RandomForestClassifier 0.896
SVC 0.896
VotingClassifier 0.912


# Bagging and Pasting

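Bagging (short for bootstrap aggregating): each predictor is trained on a random sample of the training set drawn with replacement (`bootstrap=True`)

Pasting: the samples are drawn without replacement (`bootstrap=False`)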


Predictors can all be trained in parallel, via different CPU cores or even different servers. Similarly, predictions can be made in parallel. This is one of the reasons bagging and pasting are such popular methods: they scale very well.
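The difference between the two sampling schemes, and why the predictors are independent of each other, can be sketched in plain Python. Everything here is illustrative: `draw_subset` and `fit_mean_predictor` are hypothetical helpers, and the "predictor" is just the mean target of its subset rather than a Decision Tree. Threads only demonstrate that each predictor depends on nothing but its own subset; for real CPU speed-ups, scikit-learn's `n_jobs=-1` uses all available cores.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def draw_subset(n: int, size: int, bootstrap: bool, rng: random.Random) -> list[int]:
    """Return the training-set indices for one predictor.

    bootstrap=True  -> bagging: sample with replacement (duplicates allowed)
    bootstrap=False -> pasting: sample without replacement (no duplicates)
    """
    if bootstrap:
        return rng.choices(range(n), k=size)
    return rng.sample(range(n), k=size)


def fit_mean_predictor(y: list[float], indices: list[int]) -> float:
    """A deliberately trivial 'predictor': the mean target of its subset."""
    return mean(y[i] for i in indices)


rng = random.Random(42)
y = [rng.gauss(0.0, 1.0) for _ in range(1000)]
# Draw the subsets up front from one seeded RNG so the run is reproducible
subsets = [draw_subset(len(y), 100, bootstrap=True, rng=rng) for _ in range(50)]

# Each predictor depends only on its own subset, so they can be fitted in parallel
with ThreadPoolExecutor() as pool:
    predictors = list(pool.map(lambda idx: fit_mean_predictor(y, idx), subsets))

# Aggregate: average for regression (a classifier would use a majority vote)
print(mean(predictors))
```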

In [85]:
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

bag_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=500,
    max_samples=100, bootstrap=True, n_jobs=-1
)

bag_clf.fit(X_train, y_train)
y_pred = bag_clf.predict(X_test)

In [87]:
accuracy_score(y_test, y_pred)

0.912

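## Out-of-Bag Evaluation

With bagging, each predictor is trained on a bootstrap sample, so some training instances are never sampled for that predictor (with `max_samples=1.0`, about 37% of them on average). These are its out-of-bag (oob) instances. Since a predictor never sees its oob instances during training, it can be evaluated on them, without the need for a separate validation set.

Use `oob_score=True`; the resulting score is then available as `oob_score_`.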
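Where the "about 37%" comes from: a bootstrap sample of size n drawn from n instances misses any given instance with probability (1 - 1/n)^n, which tends to 1/e ≈ 0.368 as n grows. The sketch below, using a hypothetical `oob_indices` helper, checks this empirically.

```python
import random


def oob_indices(n: int, rng: random.Random) -> set[int]:
    """Indices NOT drawn in one bootstrap sample of size n (the predictor's oob set)."""
    in_bag = set(rng.choices(range(n), k=n))  # sample with replacement
    return set(range(n)) - in_bag


rng = random.Random(0)
n = 10_000
fractions = [len(oob_indices(n, rng)) / n for _ in range(20)]
avg_oob_fraction = sum(fractions) / len(fractions)
print(avg_oob_fraction)  # should be close to 1 - 1/e ≈ 0.368
```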

In [98]:
bag_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=500,
    bootstrap=True, n_jobs=1, oob_score=True)

bag_clf.fit(X_train, y_train)
bag_clf.oob_score_

0.8986666666666666

In [99]:
from sklearn.metrics import accuracy_score
y_pred = bag_clf.predict(X_test)
accuracy_score(y_test, y_pred)

0.904

# Random Forests

In [101]:
from sklearn.ensemble import RandomForestClassifier

rnd_clf = RandomForestClassifier(n_estimators=500, max_leaf_nodes=16, n_jobs=-1)
rnd_clf.fit(X_train, y_train)

y_pred_rf = rnd_clf.predict(X_test)

In [103]:
accuracy_score(y_test, y_pred_rf)

0.912

In [104]:
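# It is roughly equivalent to this: a bagging ensemble of trees that, like a
# Random Forest, search for the best split among a random subset of the features
bag_clf = BaggingClassifier(
    DecisionTreeClassifier(max_features="sqrt", max_leaf_nodes=16),
    n_estimators=500, max_samples=1.0, bootstrap=True, n_jobs=-1)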

# Boosting

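Boosting trains predictors sequentially, each new predictor trying to correct its predecessor. In AdaBoost, for example, this is done by increasing the weights of the training instances that the predecessor underfitted, so the next predictor focuses on them: the weights get boosted.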

## AdaBoost

"Adaptive Boosting"

In [108]:
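from sklearn.ensemble import AdaBoostClassifier

# Decision stumps (max_depth=1) as weak learners.
# Note: algorithm="SAMME.R" is deprecated since scikit-learn 1.4 and removed in
# later releases; use algorithm="SAMME" there.
ada_clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1), n_estimators=200,
    algorithm="SAMME.R", learning_rate=0.5)
ada_clf.fit(X_train, y_train)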

AdaBoostClassifier(base_estimator=DecisionTreeClassifier(max_depth=1),
                   learning_rate=0.5, n_estimators=200)
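The AdaBoost weight update can be sketched in plain Python. This is a simplified illustration assuming hard class predictions: the predictor weight is alpha = learning_rate * log((1 - r) / r), where r is the predictor's weighted error rate, and misclassified instances have their weights multiplied by exp(alpha) before renormalizing. For two classes this matches the SAMME variant; the `"SAMME.R"` option used in the cell above relies on class probabilities instead. `adaboost_round`, `weighted_vote` and the clipping of r are hypothetical choices for this sketch.

```python
import math


def adaboost_round(weights, y_true, y_pred, learning_rate=1.0):
    """One AdaBoost round: return (predictor_weight, new_instance_weights)."""
    total = sum(weights)
    # Weighted error rate r of this predictor
    r = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p) / total
    r = min(max(r, 1e-10), 1 - 1e-10)  # guard against log(0) and division by zero
    alpha = learning_rate * math.log((1 - r) / r)  # predictor weight
    # Boost the weights of misclassified instances, leave the others unchanged
    new_w = [w * math.exp(alpha) if t != p else w
             for w, t, p in zip(weights, y_true, y_pred)]
    s = sum(new_w)
    return alpha, [w / s for w in new_w]  # renormalize so the weights sum to 1


def weighted_vote(alphas, predictions):
    """Final prediction for one instance: each predictor votes with weight alpha."""
    scores = {}
    for a, p in zip(alphas, predictions):
        scores[p] = scores.get(p, 0.0) + a
    return max(scores, key=scores.get)


y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 1, 0, 0]  # instance 3 is misclassified
alpha, new_weights = adaboost_round([0.2] * 5, y_true, y_pred)
print(alpha, new_weights)  # the misclassified instance now carries half the total weight
```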

## Gradient Boosting

Each new predictor is fitted to the residual errors made by the previous predictor, instead of reweighting the instances as AdaBoost does.

In [112]:
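from sklearn.tree import DecisionTreeRegressor

# First tree fitted on the targets (here the 0/1 moons labels, treated as numbers)
tree_reg1 = DecisionTreeRegressor(max_depth=2)
tree_reg1.fit(X, y)

# Second tree fitted on the residual errors of the first
y2 = y - tree_reg1.predict(X)
tree_reg2 = DecisionTreeRegressor(max_depth=2)
tree_reg2.fit(X, y2)

# Third tree fitted on the residual errors of the second
y3 = y2 - tree_reg2.predict(X)
tree_reg3 = DecisionTreeRegressor(max_depth=2)
tree_reg3.fit(X, y3)

# The ensemble's prediction is the sum of all the trees' predictions
y_pred = sum(tree.predict(X_test) for tree in (tree_reg1, tree_reg2, tree_reg3))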

y_pred

array([ 0.7362266 ,  0.1015663 ,  0.1015663 ,  0.85124217,  1.02257171,
        0.93722015,  0.1015663 ,  0.1015663 , -0.1231484 ,  0.1015663 ,
        1.0457984 ,  0.1015663 ,  0.93722015,  0.7658906 ,  0.89856769,
        0.1015663 ,  0.1015663 ,  0.89856769,  0.89856769,  0.1015663 ,
        0.1015663 ,  1.19073666,  0.7362266 , -0.1231484 ,  0.1015663 ,
        0.1015663 ,  0.7362266 ,  0.1015663 ,  0.96044684,  0.1015663 ,
        0.96044684,  1.0457984 , -0.06976324,  0.1015663 ,  0.85124217,
        0.56489706,  0.1015663 ,  0.85124217,  0.85124217,  0.96044684,
        0.26390739,  0.93722015,  0.56489706,  0.1015663 ,  0.1015663 ,
        0.1015663 ,  0.7658906 ,  0.7362266 ,  1.02257171,  0.7362266 ,
        0.7362266 ,  1.0457984 ,  0.1015663 ,  0.1015663 ,  0.7362266 ,
        0.1015663 ,  0.7362266 ,  0.85124217,  0.1015663 ,  0.96044684,
        0.1015663 ,  0.7658906 ,  0.89856769,  0.1015663 ,  0.89856769,
       -0.06976324, -0.06976324,  0.56489706,  0.1015663 , ...

In [113]:
from sklearn.ensemble import GradientBoostingRegressor
gbrt = GradientBoostingRegressor(max_depth=2, n_estimators=3, learning_rate=1.0)
gbrt.fit(X, y)

GradientBoostingRegressor(learning_rate=1.0, max_depth=2, n_estimators=3)
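To see what `n_estimators` and `learning_rate` control, here is a plain-Python sketch of the same residual-fitting loop on 1-D data, with hand-written depth-1 regression trees (stumps) instead of scikit-learn trees. `fit_stump`, `gradient_boost`, `mse` and the toy data are hypothetical. `learning_rate` scales each stump's contribution (shrinkage), so smaller values need more estimators to fit the training set.

```python
def fit_stump(xs, ys):
    """Fit a depth-1 regression tree (one split) on 1-D inputs by least squares.

    Needs at least two distinct x values.
    """
    pts = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pts)):
        thr = (pts[i - 1][0] + pts[i][0]) / 2
        left = [y for x, y in pts if x <= thr]
        right = [y for x, y in pts if x > thr]
        if not left or not right:
            continue  # duplicate x values can leave one side empty
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lm, rm)
    _, thr, lm, rm = best
    return lambda x: lm if x <= thr else rm


def gradient_boost(xs, ys, n_estimators=3, learning_rate=1.0):
    """Each new stump is fitted to the residuals left by the ensemble so far."""
    stumps = []
    residuals = list(ys)
    for _ in range(n_estimators):
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Shrink the stump's contribution by learning_rate, then update the residuals
        residuals = [r - learning_rate * stump(x) for x, r in zip(xs, residuals)]
    return lambda x: sum(learning_rate * s(x) for s in stumps)


def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [i / 10 for i in range(20)]
ys = [x ** 2 for x in xs]
for n in (1, 2, 3, 10):
    print(n, mse(gradient_boost(xs, ys, n_estimators=n), xs, ys))
```

With `learning_rate=1.0`, each extra stump can only lower (or keep) the training error; on unseen data, too many estimators eventually overfit.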

# Stacking

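Instead of aggregating the predictions with a trivial function such as hard voting, train a model (called a blender, or meta learner) to perform the aggregation, using the base predictors' outputs as its input features.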
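A minimal sketch with scikit-learn's `StackingClassifier` (available since scikit-learn 0.22), reusing the moons data and the three classifiers from the voting section. The choice of `LogisticRegression` as the blender and `cv=5` are assumptions, not taken from this notebook. `StackingClassifier` trains the blender on cross-validated (out-of-fold) predictions of the base estimators, then refits the base estimators on the full training set.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Same toy data as at the top of the notebook
X, y = make_moons(n_samples=500, noise=0.3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

stacking_clf = StackingClassifier(
    estimators=[
        ("lr", LogisticRegression(random_state=42)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
        ("svc", SVC(probability=True, random_state=42)),  # exposes predict_proba
    ],
    final_estimator=LogisticRegression(),  # the blender (an assumed choice)
    cv=5,  # blender is trained on out-of-fold predictions of the base models
)
stacking_clf.fit(X_train, y_train)
print(stacking_clf.score(X_test, y_test))
```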