# Chapter 7: Ensemble Learning and Random Forests

## Exercise 1
If you have trained five different models on the exact same training data, and they all achieve 95% precision, is there any chance that you can combine these models to get better results? If so, how? If not, why?

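Yes, there is a good chance. You can combine the models into a voting ensemble, which aggregates the predictions of all five models before making a final prediction, and this will often outperform each individual model. The gain is largest when the models are very different from one another (for example, an SVM, a decision tree, and a logistic regression classifier), because they then tend to make different kinds of errors. It is better still if they are trained on different subsets of the data, but the ensemble can help even when they are not.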
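
As a minimal sketch of this idea, the snippet below combines three different model families with Scikit-Learn's VotingClassifier and prints each member's test accuracy next to the ensemble's. The toy dataset (make_moons) and the particular choice of models are assumptions made for illustration; they are not part of the exercise, and the ensemble is not guaranteed to win on every dataset.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Toy dataset (an assumption for illustration; not part of the exercise).
X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Three deliberately different model families, so their errors are less correlated.
voting_clf = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(random_state=42)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
        ("svc", SVC(random_state=42)),
    ],
    voting="hard",
)
voting_clf.fit(X_train, y_train)

# Compare each fitted member against the ensemble on held-out data.
scores = {name: clf.score(X_test, y_test)
          for name, clf in voting_clf.named_estimators_.items()}
scores["voting"] = voting_clf.score(X_test, y_test)
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```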

## Exercise 2
What is the difference between hard and soft voting classifiers?

A hard voting classifier counts the votes of the classifiers in the ensemble and predicts the class that receives the most votes. A soft voting classifier instead averages the estimated class probabilities of the classifiers and predicts the class with the highest average probability. This gives more weight to highly confident votes and often improves performance, but it only works if every classifier can estimate class probabilities (for example, a Scikit-Learn SVC must be created with probability=True).
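
The difference can be worked through by hand for a single instance. In the sketch below, the probability values are made up for illustration: two classifiers weakly prefer class 0 and one strongly prefers class 1, so the two schemes disagree.

```python
import numpy as np

# Hypothetical class-probability estimates from three classifiers for ONE instance
# (rows: classifiers, columns: classes 0 and 1). The numbers are made up.
probas = np.array([
    [0.55, 0.45],  # classifier A: weakly prefers class 0
    [0.60, 0.40],  # classifier B: weakly prefers class 0
    [0.05, 0.95],  # classifier C: strongly prefers class 1
])

# Hard voting: each classifier casts one vote for its most likely class.
votes = probas.argmax(axis=1)
hard_prediction = np.bincount(votes, minlength=probas.shape[1]).argmax()

# Soft voting: average the probabilities, then pick the most likely class.
mean_probas = probas.mean(axis=0)
soft_prediction = mean_probas.argmax()

print("hard voting predicts class", hard_prediction)  # class 0 (two votes to one)
print("soft voting predicts class", soft_prediction)  # class 1 (mean probability 0.6)
```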

## Exercise 3
Is it possible to speed up training of a bagging ensemble by distributing it across multiple servers? What about pasting ensembles, boosting ensembles, random forest, or stacking ensembles?

Bagging ensembles, pasting ensembles, and random forests can be distributed across multiple servers, since each predictor in the ensemble is independent of the others. Boosting ensembles cannot be distributed, because each predictor is built based on the previous predictor, so training is necessarily sequential. Finally, stacking ensembles can only be distributed within a given layer: the predictors of one layer are independent of each other, but they all depend on the previous layer, which must be trained first.
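
As a rough illustration of why bagging parallelizes so easily, the sketch below fits ten trees on separate bootstrap samples using a local thread pool. The thread pool merely stands in for separate servers, and the helper fit_one_tree and the toy dataset are hypothetical names and data introduced for this example.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Toy binary classification data (an assumption for illustration).
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

def fit_one_tree(seed):
    """Fit one tree on its own bootstrap sample; it needs nothing from the other trees."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(X), size=len(X))  # sample with replacement (bagging)
    return DecisionTreeClassifier(random_state=seed).fit(X[idx], y[idx])

# Each task is independent, so the tasks could just as well run on separate servers.
with ThreadPoolExecutor(max_workers=4) as pool:
    trees = list(pool.map(fit_one_tree, range(10)))

# Aggregate by majority vote over the trees' 0/1 predictions (ties go to class 1).
all_preds = np.array([tree.predict(X) for tree in trees])
ensemble_pred = (all_preds.mean(axis=0) >= 0.5).astype(int)
print("training accuracy:", (ensemble_pred == y).mean())
```

A boosting ensemble could not be written this way, because fitting predictor number k requires the errors or instance weights produced by predictor k - 1.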

## Exercise 4
What is the benefit of out-of-bag evaluation?

With out-of-bag evaluation, each predictor in a bagging ensemble is evaluated using instances that it was not trained on. This makes it possible to have a fairly unbiased evaluation of the ensemble without the need for an additional validation set. Thus, you have more instances available for training, and your ensemble can perform slightly better.
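
In Scikit-Learn, out-of-bag evaluation can be requested with oob_score=True, as in the sketch below. The dataset and hyperparameter values are assumptions chosen for illustration.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Toy dataset (an assumption for illustration).
X, y = make_moons(n_samples=500, noise=0.30, random_state=42)

bag_clf = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=200,
    bootstrap=True,   # out-of-bag instances only exist when sampling with replacement
    oob_score=True,   # evaluate each instance with the predictors that never saw it
    random_state=42,
)
bag_clf.fit(X, y)  # all 500 instances are used; no validation split is needed

print("OOB accuracy estimate:", bag_clf.oob_score_)
# Per-instance class probabilities, computed from out-of-bag predictors only.
print(bag_clf.oob_decision_function_[:3])
```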

## Exercise 5
What makes Extra-Trees more random than regular Random Forests? How can this extra randomness help? Are Extra-Trees slower or faster than regular Random Forests?

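Like regular Random Forests, Extra-Trees consider a random subset of features at each node, but they also use a random threshold for each candidate feature instead of searching for the best possible threshold. This extra randomness acts as a form of regularization: it trades more bias for a lower variance. Extra-Trees are faster to train than regular Random Forests, since finding the best possible threshold at each node is one of the most time-consuming parts of growing a tree.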
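
A quick, informal way to check the speed claim is to time both classes on the same data, as sketched below. The dataset and settings are assumptions for illustration, and the timings depend on the machine and the data, so treat the numbers as indicative only.

```python
import time

from sklearn.datasets import load_digits
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier

X, y = load_digits(return_X_y=True)

fit_times = {}
for Model in (RandomForestClassifier, ExtraTreesClassifier):
    clf = Model(n_estimators=100, random_state=42)
    start = time.perf_counter()
    clf.fit(X, y)
    fit_times[Model.__name__] = time.perf_counter() - start

for name, seconds in fit_times.items():
    print(f"{name}: {seconds:.3f} s")
```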

## Exercise 6
If your AdaBoost ensemble underfits the training data, what hyperparameters should you tweak and how?

You can try increasing the number of estimators or reducing the regularization hyperparameters of the base estimator (for example, allowing deeper trees). You can also try slightly increasing the learning rate.
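
The sketch below contrasts a deliberately constrained AdaBoost ensemble with one whose hyperparameters are loosened: more estimators, a less regularized base estimator, and a higher learning rate. The specific values and the toy dataset are assumptions for illustration, not recommended settings.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Toy dataset (an assumption for illustration).
X, y = make_moons(n_samples=500, noise=0.30, random_state=42)

# Constrained ensemble: few decision stumps and a small learning rate.
weak = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1),
    n_estimators=5,
    learning_rate=0.1,
    random_state=42,
)

# Loosened ensemble: more estimators, deeper base trees, higher learning rate.
stronger = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=3),
    n_estimators=200,
    learning_rate=1.0,
    random_state=42,
)

# Underfitting shows up on the training set, so compare training accuracy.
weak_score = weak.fit(X, y).score(X, y)
stronger_score = stronger.fit(X, y).score(X, y)
print(f"constrained: {weak_score:.3f}, loosened: {stronger_score:.3f}")
```

A higher training score only shows that the underfitting is reduced; a validation set is still needed to check that the loosened model does not overfit instead.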

## Exercise 7
If your Gradient Boosting ensemble overfits the training set, should you increase or decrease the learning rate?

You can try decreasing the learning rate, or using early stopping to find the right number of predictors.
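
One way to apply both ideas in Scikit-Learn is to combine a small learning rate with the built-in early stopping of GradientBoostingRegressor, as sketched below. The dataset and the hyperparameter values are assumptions for illustration, and training may or may not stop before the cap on a given dataset.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Toy regression data (an assumption for illustration).
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=42)

gbrt = GradientBoostingRegressor(
    learning_rate=0.05,       # smaller steps: each tree contributes less
    n_estimators=500,         # upper bound on the number of trees
    n_iter_no_change=10,      # stop if the validation score stalls for 10 iterations
    validation_fraction=0.2,  # share of the training data held out for that check
    random_state=42,
)
gbrt.fit(X, y)

# Number of trees actually fitted (at most n_estimators).
print("trees used:", gbrt.n_estimators_)
```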

## Exercise 8
Load the MNIST data, and split it into a training set, a validation set, and a test set. Then train various classifiers, such as a Random Forest classifier, an Extra-Trees classifier, and an SVM. Next, try to combine them into an ensemble that outperforms them all on the validation set, using a soft or hard voting classifier. Once you have found one, try it on the test set. How much better does it perform compared to the individual classifiers?

### Getting the data

In [59]:
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

import numpy as np


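RANDOM_SEED = 42

# Note: load_digits is Scikit-Learn's small 8x8 digits dataset (1,797 images),
# used here as a lightweight stand-in for the full MNIST dataset.
digits = load_digits()
X, y = digits['data'], digits['target']
# Hold out 20% as the test set. No separate validation set is created:
# the cross-validation inside GridSearchCV (used below) plays that role.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=RANDOM_SEED)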

### Training the classifiers

In [67]:
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from sklearn.model_selection import GridSearchCV

# SVM: probability=True enables predict_proba, which soft voting needs later.
params_svm = {
    'C': [0.11, 0.33, 1, 3, 10, 33, 100],
    'gamma': ['auto', 0.11, 0.33, 1, 3, 10, 33, 100],
    'kernel': ('rbf', 'linear', 'poly'),
    'probability': [True]
}
svm_clf = SVC()
svm_grid = GridSearchCV(svm_clf, params_svm)
svm_grid.fit(X_train, y_train)

# Random Forest: search over ensemble size, leaf regularization, and bootstrapping.
params_rf = {
    'n_estimators': [3, 10, 20, 42, 75, 100],
    'min_weight_fraction_leaf': [0.01, 0.03, 0.1, 0.33],
    'bootstrap': [False, True]
}
rf_clf = RandomForestClassifier()
rf_grid = GridSearchCV(rf_clf, params_rf)
rf_grid.fit(X_train, y_train)

# Extra-Trees: reuse the Random Forest grid, since both accept these hyperparameters.
params_et = params_rf
et_clf = ExtraTreesClassifier()
et_grid = GridSearchCV(et_clf, params_et)
et_grid.fit(X_train, y_train)

GridSearchCV(cv=None, error_score='raise',
       estimator=ExtraTreesClassifier(bootstrap=False, class_weight=None, criterion='gini',
           max_depth=None, max_features='auto', max_leaf_nodes=None,
           min_impurity_decrease=0.0, min_impurity_split=None,
           min_samples_leaf=1, min_samples_split=2,
           min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
           oob_score=False, random_state=None, verbose=0, warm_start=False),
       fit_params=None, iid=True, n_jobs=1,
       param_grid={'n_estimators': [3, 10, 20, 42, 75, 100], 'min_weight_fraction_leaf': [0.01, 0.03, 0.1, 0.33], 'bootstrap': [False, True]},
       pre_dispatch='2*n_jobs', refit=True, return_train_score='warn',
       scoring=None, verbose=0)

### Measuring the performance of the best classifiers

In [68]:
# best_score_ is the mean cross-validated accuracy of the best hyperparameter combination.
best_svm = svm_grid.best_estimator_
best_rf = rf_grid.best_estimator_
best_et = et_grid.best_estimator_
print('Score of SVM classifier: {:.3f}'.format(svm_grid.best_score_))
print('Score of Random Forest classifier: {:.3f}'.format(rf_grid.best_score_))
print('Score of Extra-Trees classifier: {:.3f}'.format(et_grid.best_score_))

Score of SVM classifier: 0.985
Score of Random Forest classifier: 0.951
Score of Extra-Trees classifier: 0.955


### Training the Voting classifier

In [69]:
from sklearn.ensemble import VotingClassifier

# Combine the three tuned classifiers; VotingClassifier refits clones of them.
estimators = [('svm', best_svm), ('rf', best_rf), ('et', best_et)]

# Search over the voting scheme and the per-classifier weights (order: svm, rf, et).
params_voting = {
    'voting': ('soft', 'hard'),
    'weights': [[2, 1, 1], [1, 2, 2], [1, 2, 1], [1, 1, 2], [2, 2, 1], [2, 1, 2]],
}
voting_clf = VotingClassifier(estimators)
voting_clf_grid = GridSearchCV(voting_clf, params_voting)
voting_clf_grid.fit(X_train, y_train)

GridSearchCV(cv=None, error_score='raise',
       estimator=VotingClassifier(estimators=[('svm', SVC(C=0.11, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto', kernel='poly',
  max_iter=-1, probability=True, random_state=None, shrinking=True,
  tol=0.001, verbose=False)), ('rf', RandomForestClassi...bose=0, warm_start=False))],
         flatten_transform=None, n_jobs=1, voting='hard', weights=None),
       fit_params=None, iid=True, n_jobs=1,
       param_grid={'voting': ('soft', 'hard'), 'weights': [[2, 1, 1], [1, 2, 2], [1, 2, 1], [1, 1, 2], [2, 2, 1], [2, 1, 2]]},
       pre_dispatch='2*n_jobs', refit=True, return_train_score='warn',
       scoring=None, verbose=0)

### Measuring the performance of the voting classifier

In [70]:
best_voting = voting_clf_grid.best_estimator_
print('Score of Voting classifier: {:.3f}'.format(voting_clf_grid.best_score_))

Score of Voting classifier: 0.985


## Exercise 9
Run the individual classifiers from the previous exercise to make predictions on the validation set, and create a new training set with the resulting predictions: each training instance is a vector containing the set of predictions from all your classifiers for an image, and the target is the image's class. Train a classifier on this new training set: this is the blender, and together with the individual classifiers it forms a stacking ensemble. Then evaluate the ensemble on the test set: for each image in the test set, make predictions with all your classifiers, then feed the predictions to the blender to get the ensemble's predictions. How does it compare to the voting classifier you trained earlier?
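
One possible way to approach this exercise is sketched below. It is self-contained and makes several assumptions: it uses load_digits as in the Exercise 8 code rather than the full MNIST dataset, it uses untuned classifiers with default settings, it picks a Random Forest as the blender, and the split sizes and the helper stack_predictions are choices made for illustration.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# Three-way split: training set, validation set (for the blender), and test set.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=42)

# First layer: individual classifiers trained on the training set only.
classifiers = [
    RandomForestClassifier(n_estimators=100, random_state=42),
    ExtraTreesClassifier(n_estimators=100, random_state=42),
    SVC(random_state=42),
]
for clf in classifiers:
    clf.fit(X_train, y_train)

def stack_predictions(models, X):
    """Return one column of predicted classes per model (hypothetical helper)."""
    return np.column_stack([model.predict(X) for model in models])

# New training set: the classifiers' predictions on the validation set.
X_val_preds = stack_predictions(classifiers, X_val)

# Second layer: the blender learns to map the three predictions to the true class.
blender = RandomForestClassifier(n_estimators=200, random_state=42)
blender.fit(X_val_preds, y_val)

# Evaluation: predict with every classifier, then feed the predictions to the blender.
X_test_preds = stack_predictions(classifiers, X_test)
blender_score = blender.score(X_test_preds, y_test)
print(f"Stacking ensemble test accuracy: {blender_score:.3f}")
for clf in classifiers:
    print(f"{type(clf).__name__}: {clf.score(X_test, y_test):.3f}")
```

Training the blender on validation-set predictions, rather than on predictions for the training set, keeps it from learning on outputs the first-layer classifiers produced for instances they had already seen.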