In [10]:
import os
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
import matplotlib as mpl
import matplotlib.pyplot as plt

#To make the notebook's output stable across runs
np.random.seed(42)

#Uses Jupyter's own backend to plot
%matplotlib inline

#To make pretty figures
mpl.rc("axes", labelsize=14)
mpl.rc("xtick", labelsize=12)
mpl.rc("ytick", labelsize=12)

#Path for saving images
IMAGE_PATH = os.path.join("images")
os.makedirs(IMAGE_PATH, exist_ok=True)

def save_fig(fig_id, tight_layout=True, fig_extension="png", resolution=300):
    path = os.path.join(IMAGE_PATH, fig_id + "." + fig_extension)
    print("Saving figure", fig_id)
    if tight_layout:
        plt.tight_layout()
    plt.savefig(path, format=fig_extension, dpi=resolution)

***Ensemble Learning*** is a technique where you take a group of predictors (such as classifiers or regressors) and aggregate their predictions. The aggregated prediction is often better than that of the best individual predictor

A group of predictors is called an ***ensemble***<br>
An Ensemble Learning algorithm is called an ***Ensemble method***

An example of an **Ensemble method**: train a group of Decision Tree classifiers, each on a different random subset of the training set. To make a prediction, obtain the predictions of all the individual trees, then predict the class that gets the most votes. Such an ensemble of Decision Trees is called a ***Random Forest***. Despite its simplicity, it's one of the most powerful Machine Learning algorithms available today
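The idea above can be sketched by hand. This is only a simplified illustration, not how `RandomForestClassifier` is implemented (a real Random Forest also randomizes the features considered at each split). The number of trees (`n_trees`) and the subset size (`subset_size`) are arbitrary values chosen for this example, and the moons dataset is the same one used later in this notebook.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

rng = np.random.default_rng(42)
n_trees = 25        # assumption: odd number, so a binary vote can never tie
subset_size = 100   # assumption: arbitrary subset size for the illustration

# Train each tree on its own random subset of the training set
trees = []
for _ in range(n_trees):
    idx = rng.choice(len(X_train), size=subset_size, replace=False)
    tree = DecisionTreeClassifier(random_state=42)
    tree.fit(X_train[idx], y_train[idx])
    trees.append(tree)

# Collect every tree's predictions: shape (n_trees, n_test_instances)
all_preds = np.array([tree.predict(X_test) for tree in trees])

# Labels are 0/1, so the majority class is 1 when more than half the trees vote 1
y_pred_ensemble = (all_preds.mean(axis=0) > 0.5).astype(int)

print("Single tree :", accuracy_score(y_test, all_preds[0]))
print("Voting trees:", accuracy_score(y_test, y_pred_ensemble))
```

Comparing the two printed accuracies shows how much the vote helps on this particular split. The result depends on the random subsets drawn.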

**When should I use an Ensemble method?**

Ensemble methods are often used near the end of a project: once you have built a few good predictors, you combine them into an even better one

We will cover the most popular **Ensemble methods**: _bagging_, _boosting_ and _stacking_

# Section: Voting Classifiers

***Hard voting*** is when you aggregate the predictions of several classifiers and predict the class that gets the most votes (a majority-vote classifier)

This voting classifier often achieves a higher accuracy than the best classifier in the ensemble. Even if each classifier is a ***weak learner*** (it does only slightly better than random guessing), the ensemble can still be a ***strong learner*** (achieving high accuracy), provided there is a sufficient number of weak learners and they're sufficiently diverse

**Ensemble methods** work best when the predictors are as independent from one another as possible. One way to get diverse classifiers is to train them using very different algorithms. This increases the chance that they will make very different types of errors, which improves the ensemble's accuracy
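To see why many weak learners can form a strong learner, here is a small simulation. It assumes an idealized setting that real ensembles never fully reach: every classifier is correct with the same probability and its errors are completely independent of the others. The function name `majority_vote_accuracy` and its parameters are made up for this sketch.

```python
import numpy as np

def majority_vote_accuracy(n_classifiers, p_correct, n_trials=100_000, seed=42):
    """Estimate how often a majority vote is correct.

    Assumption: each classifier is right with probability p_correct,
    independently of all the others (an idealization).
    """
    rng = np.random.default_rng(seed)
    # For each trial, draw how many of the classifiers voted for the right class
    correct_votes = rng.binomial(n_classifiers, p_correct, size=n_trials)
    # The ensemble is right when strictly more than half of the votes are right
    return (correct_votes > n_classifiers / 2).mean()

# Odd ensemble sizes avoid tied votes
for n in (1, 101, 1001, 10001):
    print(n, majority_vote_accuracy(n, p_correct=0.51))
```

With a per-classifier accuracy of 51%, the estimated ensemble accuracy climbs as the ensemble grows. In practice the classifiers are trained on the same data, so their errors are correlated and the gain is smaller than this simulation suggests.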

Let's create and train a voting classifier (**hard voting**), composed of three diverse classifiers

<img src="images/Ensemble and RF - Hard voting classifier predictions.png">

In [4]:
X, y = make_moons(n_samples=500, noise=0.3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)


In [18]:
log_clf_hardv = LogisticRegression(solver="lbfgs", random_state=42)
rnd_clf_hardv = RandomForestClassifier(n_estimators=100, random_state=42)
svm_clf_hardv = SVC(gamma="scale", random_state=42)

voting_clf_hardv = VotingClassifier(
    estimators = [("lr", log_clf_hardv), ("rf", rnd_clf_hardv), 
                  ("svc", svm_clf_hardv)],
    voting = "hard"
)

In [19]:
voting_clf_hardv.fit(X_train, y_train)

VotingClassifier(estimators=[('lr', LogisticRegression(random_state=42)),
                             ('rf', RandomForestClassifier(random_state=42)),
                             ('svc', SVC(random_state=42))])

In [20]:
for clf in (log_clf_hardv, rnd_clf_hardv, svm_clf_hardv, voting_clf_hardv):
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    print(clf.__class__.__name__, accuracy_score(y_test, y_pred))

LogisticRegression 0.85
RandomForestClassifier 0.88
SVC 0.87
VotingClassifier 0.87


***Soft voting*** is when you predict the class with the highest class probability, averaged over all the individual classifiers. This requires that all classifiers can estimate class probabilities, i.e. they have a _predict_proba()_ method

It often achieves a higher performance than hard voting because it gives more weight to highly confident votes
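A toy example makes the difference concrete. The probabilities below are invented for illustration: they stand for the _predict_proba()_ outputs of three hypothetical classifiers on a single instance with two classes.

```python
import numpy as np

# Rows: classifiers, columns: estimated probability of class 0 and class 1
# (made-up numbers, not the output of any model in this notebook)
probas = np.array([
    [0.45, 0.55],   # classifier A: weakly prefers class 1
    [0.45, 0.55],   # classifier B: weakly prefers class 1
    [0.95, 0.05],   # classifier C: strongly prefers class 0
])

# Hard voting: each classifier casts one vote for its most likely class
votes = probas.argmax(axis=1)                                        # [1, 1, 0]
hard_pred = np.bincount(votes, minlength=probas.shape[1]).argmax()   # class 1

# Soft voting: average the probabilities, then pick the most likely class
mean_probas = probas.mean(axis=0)                                    # about [0.617, 0.383]
soft_pred = mean_probas.argmax()                                     # class 0

print("Hard voting predicts class", hard_pred)
print("Soft voting predicts class", soft_pred)
```

Two lukewarm votes outnumber one confident vote under hard voting, but the confident vote dominates the average under soft voting.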

Let's create and train a voting classifier (**soft voting**), composed of three diverse classifiers

**Note**: SVC doesn't expose the _predict_proba()_ method by default, so we need to set the _probability=True_ hyperparameter. This makes SVC use cross-validation to estimate class probabilities (slowing down training) and adds a _predict_proba()_ method

In [25]:
log_clf_softv = LogisticRegression(solver="lbfgs", random_state=42)
rnd_clf_softv = RandomForestClassifier(n_estimators=100, random_state=42)
svm_clf_softv = SVC(gamma="scale", probability=True, random_state=42)

voting_clf_softv = VotingClassifier(
    estimators = [("lr", log_clf_softv), ("rf", rnd_clf_softv), 
                  ("svm", svm_clf_softv)],
    voting = "soft"
)
voting_clf_softv.fit(X_train, y_train)

VotingClassifier(estimators=[('lr', LogisticRegression(random_state=42)),
                             ('rf', RandomForestClassifier(random_state=42)),
                             ('svm', SVC(probability=True, random_state=42))],
                 voting='soft')

In [26]:
for clf in (log_clf_softv, rnd_clf_softv, svm_clf_softv, voting_clf_softv):
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    print(clf.__class__.__name__, accuracy_score(y_test, y_pred))

LogisticRegression 0.85
RandomForestClassifier 0.88
SVC 0.87
VotingClassifier 0.89


As we can see, soft voting achieved a higher accuracy: 89% (versus 87% for **hard voting**)

# End Of Section: Voting Classifiers

# Section: Bagging And Pasting

We've just seen that one way to get a diverse set of classifiers is to use very different training algorithms. Another approach is to use the same training algorithm for every predictor (e.g. a classifier) and train each one on a different random subset of the training set

When sampling is performed ***with replacement***, it's called ***bagging*** (short for bootstrap aggregating). When sampling is performed ***without replacement***, it's called ***pasting***

That is, both bagging and pasting allow training instances to be sampled several times across multiple predictors, but only bagging allows training instances to be sampled several times for the same predictor
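The difference is easy to see by drawing the index samples directly with NumPy. This sketch only illustrates the sampling step, not the training. The helper `draw_samples` and the sizes used here are made up for the example.

```python
import numpy as np

n_instances = 10    # assumption: a tiny training set, for readability
n_predictors = 3    # assumption: three predictors in the ensemble
sample_size = 6     # assumption: each predictor sees 6 instances

def draw_samples(replace, seed=42):
    """Draw one index sample per predictor, with or without replacement."""
    rng = np.random.default_rng(seed)
    return [np.sort(rng.choice(n_instances, size=sample_size, replace=replace))
            for _ in range(n_predictors)]

bagging_samples = draw_samples(replace=True)    # bagging: with replacement
pasting_samples = draw_samples(replace=False)   # pasting: without replacement

for name, samples in (("bagging", bagging_samples), ("pasting", pasting_samples)):
    print(name)
    for i, idx in enumerate(samples):
        print(f"  predictor {i}: {idx}")
```

In the pasting samples, an index never repeats within one predictor's sample, although the same index can show up for several predictors. In the bagging samples, an index may also repeat within a single predictor's sample.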

<img src="images/Ensemble and RF - Bagging and pasting on different random samples.png">