## Ensemble Learning and Random Forests
A hard voting classifier aggregates the predictions of several classifiers and predicts the class that gets the most votes. Ensemble methods work best when the predictors are as independent
from one another as possible. One way to get diverse classifiers
is to train them using very different algorithms. This increases the
chance that they will make very different types of errors, improving
the ensemble’s accuracy.
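
To make the majority vote concrete, here is a minimal sketch of hard voting done by hand with NumPy. The prediction matrix is made up for illustration (it does not come from the classifiers trained in this notebook), and the shortcut used here assumes binary labels and an odd number of voters.

```python
import numpy as np

# Hypothetical predictions from three classifiers on five instances
# (rows = classifiers, columns = instances); the values are made up.
predictions = np.array([
    [1, 0, 1, 1, 0],   # classifier A
    [1, 1, 0, 1, 0],   # classifier B
    [0, 0, 1, 1, 1],   # classifier C
])

# Hard voting: for each instance, keep the class with the most votes.
# With binary labels and an odd number of voters, that is the class
# predicted by more than half of the classifiers.
votes_for_positive = predictions.sum(axis=0)
majority = (votes_for_positive > predictions.shape[0] / 2).astype(int)
print(majority)
```

Each classifier is wrong on a different instance in this toy matrix, which is the situation where voting helps most.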

### Importing libraries

In [8]:
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import VotingClassifier

### Preparing the data

In [12]:
from sklearn import datasets
import pandas as pd
import numpy as np

iris = datasets.load_iris()
X = iris["data"][:, (2, 3)] # petal length, petal width
y = (iris["target"] == 2).astype(np.float64) # Iris-Virginica

In [13]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

### Running the voting classifier model

In [14]:
log_clf = LogisticRegression()
rnd_clf = RandomForestClassifier()
svm_clf = SVC()

In [15]:
voting_clf = VotingClassifier(estimators=[('lr', log_clf), ('rf', rnd_clf), ('svc', svm_clf)], voting='hard')
voting_clf.fit(X_train, y_train)

In [16]:
# Let’s look at each classifier’s accuracy on the test set
from sklearn.metrics import accuracy_score

for clf in (log_clf, rnd_clf, svm_clf, voting_clf):
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    print(clf.__class__.__name__, accuracy_score(y_test, y_pred))

LogisticRegression 1.0
RandomForestClassifier 1.0
SVC 1.0
VotingClassifier 1.0


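On another dataset, the same setup gave LogisticRegression 0.864, RandomForestClassifier 0.872, SVC 0.888, and VotingClassifier 0.896. There, the voting classifier slightly outperforms all the individual classifiers. On this small Iris subset every model already reaches 1.0, so the benefit is not visible.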

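To use soft voting, replace voting="hard" with voting="soft" and ensure that all classifiers can estimate class probabilities. SVC does not do this by default, so you need to set its probability hyperparameter to True. This adds a predict_proba() method to the SVC (at the cost of slower training), and the voting classifier can then average the class probabilities.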
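
The switch to soft voting could look like the sketch below. It rebuilds the same Iris split so that it runs on its own; the name `soft_voting_clf` and the `random_state` values are choices made for this example.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = datasets.load_iris()
X = iris["data"][:, 2:4]                 # petal length, petal width
y = (iris["target"] == 2).astype(int)    # Iris-Virginica or not
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

soft_voting_clf = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression()),
        ("rf", RandomForestClassifier(random_state=42)),
        # probability=True gives SVC a predict_proba() method
        ("svc", SVC(probability=True, random_state=42)),
    ],
    voting="soft",
)
soft_voting_clf.fit(X_train, y_train)

# Averaged class probabilities for the first three test instances
proba = soft_voting_clf.predict_proba(X_test[:3])
print(proba)
```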

### Bagging and Pasting in Scikit-Learn
Another approach is to use the same training algorithm for every
predictor, but to train them on different random subsets of the training set. When
sampling is performed with replacement, this method is called bagging (short for
bootstrap aggregating). When sampling is performed without replacement, it is called
pasting.
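
The difference between the two sampling schemes can be sketched with NumPy alone. The sizes below are toy values chosen for illustration; this only shows how the index subsets are drawn, not how Scikit-Learn implements it internally.

```python
import numpy as np

rng = np.random.default_rng(42)
n_train, n_subset = 10, 6   # toy sizes chosen for illustration

# Bagging: sample with replacement, so an index may appear more than once.
bagging_idx = rng.choice(n_train, size=n_subset, replace=True)

# Pasting: sample without replacement, so every index is distinct.
pasting_idx = rng.choice(n_train, size=n_subset, replace=False)

print("bagging:", bagging_idx)
print("pasting:", pasting_idx)
```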

The following code trains an
ensemble of 500 Decision Tree classifiers, each trained on 100 training instances randomly
sampled from the training set with replacement (this is an example of bagging,
but if you want to use pasting instead, just set bootstrap=False). The n_jobs parameter
tells Scikit-Learn the number of CPU cores to use for training and predictions
(-1 tells Scikit-Learn to use all available cores). The BaggingClassifier automatically performs soft voting
instead of hard voting if the base classifier can estimate class probabilities (i.e., if it has a predict_proba() method), which is the case with Decision Tree classifiers.

**This is a bagging ensemble of 500 trees**

In [17]:
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

bag_clf = BaggingClassifier(DecisionTreeClassifier(), n_estimators=500,
                            max_samples=100, bootstrap=True, n_jobs=-1)
bag_clf.fit(X_train, y_train)
y_pred2 = bag_clf.predict(X_test)

In [18]:
accuracy_score(y_test, y_pred2)

1.0

### Out-of-Bag Evaluation
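With bagging, some instances may be sampled several times for any given predictor,
while others may not be sampled at all.
The training instances that are not sampled for a given predictor are called out-of-bag (oob) instances. Since a predictor never sees its oob instances during training, it can be evaluated on them, without the need for a separate validation split. Setting oob_score=True requests this evaluation automatically after training.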
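
How many instances end up out-of-bag can be estimated with a quick simulation. This sketch assumes a hypothetical training set of `m = 10_000` instances and a bootstrap sample of the same size (the cells in this notebook draw only 100 instances per predictor, so the proportion there differs).

```python
import numpy as np

rng = np.random.default_rng(0)
m = 10_000  # hypothetical training set size

# One bootstrap sample: draw m indices with replacement.
sample = rng.integers(0, m, size=m)
sampled_fraction = np.unique(sample).size / m
oob_fraction = 1.0 - sampled_fraction

# Theory: P(instance never drawn) = (1 - 1/m)^m, which approaches exp(-1).
expected_oob = (1 - 1 / m) ** m
print(f"sampled: {sampled_fraction:.3f}, out-of-bag: {oob_fraction:.3f}")
print(f"expected out-of-bag: {expected_oob:.3f}")
```

Under these assumptions roughly 63% of the instances are sampled for a given predictor and about 37% are left out-of-bag.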

In [21]:
bag_clf2 = BaggingClassifier(DecisionTreeClassifier(), n_estimators=500,
                            max_samples=100, bootstrap=True, n_jobs=-1, oob_score=True)
bag_clf2.fit(X_train, y_train)
bag_clf2.oob_score_

0.9553571428571429

According to this oob evaluation, this BaggingClassifier is likely to achieve about
95.5% accuracy on the test set. Let’s verify this:

In [22]:
from sklearn.metrics import accuracy_score

y_pred3 = bag_clf2.predict(X_test)
accuracy_score(y_test, y_pred3)

1.0

### Random Forests
A Random Forest is an ensemble of Decision Trees, generally
trained via the bagging method (or sometimes pasting), typically with max_samples
set to the size of the training set. Instead of building a BaggingClassifier and passing
it a DecisionTreeClassifier, you can instead use the RandomForestClassifier
class, which is more convenient and optimized for Decision Trees (similarly, there is
a RandomForestRegressor class for regression tasks). The following code trains a
Random Forest classifier with 500 trees (each limited to a maximum of 16 leaf nodes), using
all available CPU cores:

In [23]:
from sklearn.ensemble import RandomForestClassifier

rnd_clf = RandomForestClassifier(n_estimators=500, max_leaf_nodes=16, n_jobs=-1)
rnd_clf.fit(X_train, y_train)
y_pred_rf = rnd_clf.predict(X_test)

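The Random Forest algorithm introduces extra randomness when growing trees: instead of searching for the very best feature when splitting a node, it searches for the best feature among a random subset of features. This results in
greater tree diversity, which (once again) trades a higher bias for a lower variance,
generally yielding an overall better model. The following BaggingClassifier is
roughly equivalent to the previous RandomForestClassifier: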

In [None]:
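# max_features="sqrt" makes each split consider a random subset of features,
# which is what RandomForestClassifier does by default for classification
bag_clf = BaggingClassifier(DecisionTreeClassifier(max_features="sqrt", max_leaf_nodes=16),
                            n_estimators=500, max_samples=1.0, bootstrap=True, n_jobs=-1)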

It is hard to tell in advance whether a RandomForestClassifier
will perform better or worse than an ExtraTreesClassifier (an ensemble of
Extremely Randomized Trees, which also use random split thresholds instead of
searching for the best one). Generally, the only way to know is to try both and
compare them using cross-validation (and tuning the hyperparameters using grid
search).
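
Such a comparison could be sketched as follows, on the same two petal features used earlier. The number of trees, the 5-fold split, and the `random_state` values are arbitrary choices for this example, and no hyperparameter tuning is done.

```python
from sklearn import datasets
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

iris = datasets.load_iris()
X = iris["data"][:, 2:4]                 # petal length, petal width
y = (iris["target"] == 2).astype(int)    # Iris-Virginica or not

models = {
    "RandomForestClassifier": RandomForestClassifier(n_estimators=100, random_state=42),
    "ExtraTreesClassifier": ExtraTreesClassifier(n_estimators=100, random_state=42),
}

cv_scores = {}
for name, model in models.items():
    # 5-fold cross-validated accuracy for each ensemble
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    cv_scores[name] = scores.mean()
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```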

### AdaBoost (Adaptive Boosting)
The general idea of most
boosting methods is to train predictors sequentially, each trying to correct its predecessor.
Scikit-Learn actually uses a multiclass version of AdaBoost called SAMME; when there are just two classes, SAMME is equivalent to AdaBoost. Moreover, if the predictors can estimate class probabilities, Scikit-Learn can use a variant of SAMME called SAMME.R (the R stands
for “Real”), which relies on class probabilities rather than predictions and generally
performs better.


In [24]:
from sklearn.ensemble import AdaBoostClassifier

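# Note: algorithm="SAMME.R" is deprecated in recent Scikit-Learn releases;
# if it raises an error on your version, drop the argument.
ada_clf = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1), n_estimators=200,
                             algorithm="SAMME.R", learning_rate=0.5)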
ada_clf.fit(X_train, y_train)

The code above trains an AdaBoost classifier based on 200 Decision Stumps using
Scikit-Learn’s AdaBoostClassifier class (as you might expect, there is also an
AdaBoostRegressor class). A Decision Stump is a Decision Tree with max_depth=1; in
other words, a tree composed of a single decision node plus two leaf nodes. This is
the default base estimator for the AdaBoostClassifier class.

If your AdaBoost ensemble is overfitting the training set, you can
try reducing the number of estimators or more strongly regularizing
the base estimator.
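
One way to judge how many estimators are actually needed is to score the ensemble after each boosting stage on held-out data. The sketch below uses the `staged_score()` method of AdaBoostClassifier; the validation split and the variable names are assumptions of this example, and the `algorithm` argument is left at its default.

```python
from sklearn import datasets
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = datasets.load_iris()
X = iris["data"][:, 2:4]                 # petal length, petal width
y = (iris["target"] == 2).astype(int)    # Iris-Virginica or not
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=42)

ada_clf = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                             n_estimators=200, learning_rate=0.5,
                             random_state=42)
ada_clf.fit(X_train, y_train)

# staged_score() yields the accuracy after 1, 2, 3, ... boosting stages
val_scores = list(ada_clf.staged_score(X_val, y_val))
best_n = val_scores.index(max(val_scores)) + 1   # stages are 1-based
print(f"best validation accuracy {max(val_scores):.3f} with {best_n} estimators")
```

If the best score is reached with far fewer than 200 stumps, retraining with a smaller n_estimators is a cheap form of regularization.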

### Gradient Boosting
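Instead of tweaking the instance weights at every
iteration like AdaBoost does, Gradient Boosting tries to fit each new predictor to the residual
errors made by the previous predictor. The learning_rate hyperparameter scales the contribution of each tree: with a small learning rate, you need more trees in the ensemble to fit the training set.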
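
The residual-fitting idea can be sketched by hand with three regression trees. The noisy quadratic toy data below is made up for illustration and is unrelated to the Iris data used in the cells; each tree is trained on the errors left by the trees before it, and the ensemble prediction is the sum of the three trees' predictions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy regression data (an assumption of this sketch): noisy quadratic
rng = np.random.default_rng(42)
X_toy = rng.uniform(-0.5, 0.5, size=(100, 1))
y_toy = 3 * X_toy[:, 0] ** 2 + 0.05 * rng.normal(size=100)

# First tree fits the targets directly
tree1 = DecisionTreeRegressor(max_depth=2, random_state=42).fit(X_toy, y_toy)
residuals1 = y_toy - tree1.predict(X_toy)

# Second tree fits the residual errors of the first tree
tree2 = DecisionTreeRegressor(max_depth=2, random_state=42).fit(X_toy, residuals1)
residuals2 = residuals1 - tree2.predict(X_toy)

# Third tree fits the residual errors left after the second tree
tree3 = DecisionTreeRegressor(max_depth=2, random_state=42).fit(X_toy, residuals2)

# The ensemble prediction is the sum of the individual predictions
X_new = np.array([[0.4]])
y_pred_new = sum(tree.predict(X_new) for tree in (tree1, tree2, tree3))

mse_after_1 = np.mean(residuals1 ** 2)
mse_after_2 = np.mean(residuals2 ** 2)
print(y_pred_new, mse_after_1, mse_after_2)
```

This corresponds to a learning rate of 1.0, where each tree's prediction is added in full.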

In [25]:
from sklearn.ensemble import GradientBoostingRegressor

gbrt = GradientBoostingRegressor(max_depth=2, n_estimators=3, learning_rate=1.0)
gbrt.fit(X_train, y_train)

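In order to find the optimal number of trees, you can use early stopping. A simple way to implement this is to use the staged_predict() method, which returns an iterator over the predictions made by the ensemble at each stage of training (with one tree, two trees, and so on):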

In [26]:
from sklearn.metrics import mean_squared_error

gbrt2 = GradientBoostingRegressor(max_depth=2, n_estimators=120)
gbrt2.fit(X_train, y_train)

errors = [mean_squared_error(y_test, y_pred0) for y_pred0 in gbrt2.staged_predict(X_test)]

# errors[i] is the error with i + 1 trees, so add 1 to the argmin index
bst_n_estimators = int(np.argmin(errors)) + 1
gbrt_best = GradientBoostingRegressor(max_depth=2, n_estimators=bst_n_estimators)
gbrt_best.fit(X_train, y_train)

### Stacking
Older Scikit-Learn releases did not support stacking directly, although it is not too hard
to roll out your own implementation. Since version 0.22, the sklearn.ensemble module
provides StackingClassifier and StackingRegressor. Alternatively, you
can use an open source implementation such as brew (available at https://github.com/viisar/brew).
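
Newer Scikit-Learn releases (0.22 and later) include a StackingClassifier class in sklearn.ensemble. Assuming such a version is installed, a minimal stacking sketch on the same Iris split could look like this; the choice of base estimators and of LogisticRegression as the final estimator is arbitrary.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = datasets.load_iris()
X = iris["data"][:, 2:4]                 # petal length, petal width
y = (iris["target"] == 2).astype(int)    # Iris-Virginica or not
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

stack_clf = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
        ("svc", SVC(random_state=42)),
    ],
    # The final estimator (the blender) is trained on out-of-fold
    # predictions of the base estimators, produced with 5-fold CV.
    final_estimator=LogisticRegression(),
    cv=5,
)
stack_clf.fit(X_train, y_train)
stack_accuracy = stack_clf.score(X_test, y_test)
print(stack_accuracy)
```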