In [13]:
import numpy as np
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt

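Ensemble learning methods pool a collection of *diverse* learners (for example, models trained with different algorithms) and compute the final classification by hard or soft voting.
In hard voting, each learner casts one vote and the majority class wins: if two out of three classifiers agree, the ensemble predicts their agreed class. In soft voting, the learners' predicted class probabilities are averaged and the class with the highest average probability wins.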
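
To make the difference between the two voting schemes concrete, here is a minimal NumPy sketch. The probability values are made up for illustration and do not come from any trained model. Hard voting counts each classifier's top choice, while soft voting averages the predicted class probabilities and then picks the largest.

```python
import numpy as np

# Hypothetical class-probability outputs of three classifiers for ONE sample
# (rows: classifiers, columns: classes 0..2). Values are illustrative only.
probas = np.array([
    [0.55, 0.45, 0.00],   # classifier A narrowly favours class 0
    [0.60, 0.40, 0.00],   # classifier B narrowly favours class 0
    [0.05, 0.95, 0.00],   # classifier C strongly favours class 1
])

# Hard voting: every classifier casts one vote for its most probable class.
votes = probas.argmax(axis=1)
hard_winner = np.bincount(votes, minlength=probas.shape[1]).argmax()

# Soft voting: average the probabilities across classifiers, then take the max.
soft_winner = probas.mean(axis=0).argmax()

print("hard vote:", hard_winner)
print("soft vote:", soft_winner)
```

In this constructed case the two schemes disagree: two weakly confident votes win the hard vote for class 0, but the one strongly confident classifier tips the averaged probabilities towards class 1.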

There are several ensemble methodologies:
1. Bagging
2. Boosting
3. Stacking
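
As a rough illustration of the three methodologies listed above, the sketch below builds one scikit-learn estimator for each. The choice of base learners, `n_estimators`, and `random_state=0` are assumptions made for this example, not settings taken from this notebook, and `GradientBoostingClassifier` stands in as just one of several boosting implementations.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import (BaggingClassifier, GradientBoostingClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    # Bagging: the same learner trained on bootstrap resamples of the data.
    "bagging": BaggingClassifier(DecisionTreeClassifier(), n_estimators=20,
                                 random_state=0),
    # Boosting: learners trained sequentially, each correcting its predecessors.
    "boosting": GradientBoostingClassifier(n_estimators=50, random_state=0),
    # Stacking: a final model learns how to combine the base learners' outputs.
    "stacking": StackingClassifier(
        estimators=[("dt", DecisionTreeClassifier(max_depth=3, random_state=0)),
                    ("svc", SVC())],
        final_estimator=LogisticRegression(max_iter=1000),
    ),
}

# Fit each ensemble and record its test-set accuracy.
scores = {name: model.fit(X_tr, y_tr).score(X_te, y_te)
          for name, model in models.items()}
for name, score in scores.items():
    print(name, score)
```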

The following code implements a simple voting classifier constructed from three different learners.

In [49]:
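# Use the Iris data, split into training and test sets (75%/25% by default).
# No random_state is set, so the split and the scores below vary between runs.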
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target)

In [50]:
# Train three different classifiers on this set
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
# Pull in the voting-based learner
from sklearn.ensemble import VotingClassifier

In [51]:
dt_clf = DecisionTreeClassifier(max_depth=3)
log_clf = LogisticRegression()
svc_clf = SVC()
# Combine the three learners in a hard-voting (majority-vote) ensemble
voting_clf = VotingClassifier(
    estimators=[('dt', dt_clf), ('log', log_clf), ('svc', svc_clf)],
    voting='hard')

In [52]:
# Peeking at the performance...
from sklearn.metrics import accuracy_score

for clf in [dt_clf, log_clf, svc_clf, voting_clf]:
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    print(clf.__class__.__name__, accuracy_score(y_test, y_pred))

DecisionTreeClassifier 0.9473684210526315
LogisticRegression 0.8947368421052632
SVC 0.9736842105263158
VotingClassifier 0.9736842105263158


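The results above show that, on this small data set, the `VotingClassifier` matches the best individual learner (`SVC`, 0.974) and outperforms both the `DecisionTreeClassifier` (0.947) and `LogisticRegression` (0.895). With only 38 test samples, these gaps correspond to one to three samples, and because the split is not seeded, the exact figures change from run to run.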
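
The same ensemble can also be run with soft voting. The sketch below is one possible variant, not the notebook's original code. It assumes `SVC(probability=True)` so that the SVM exposes `predict_proba`, and it adds `random_state=0` and `max_iter=1000` purely for reproducibility and convergence. No accuracy is quoted here because it depends on the split.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

soft_clf = VotingClassifier(
    estimators=[
        ("dt", DecisionTreeClassifier(max_depth=3, random_state=0)),
        ("log", LogisticRegression(max_iter=1000)),
        # Soft voting needs class probabilities from every learner.
        ("svc", SVC(probability=True, random_state=0)),
    ],
    voting="soft",
)
soft_clf.fit(X_train, y_train)

# Averaged class probabilities, one row per test sample.
proba = soft_clf.predict_proba(X_test)
soft_acc = accuracy_score(y_test, soft_clf.predict(X_test))
print("soft voting accuracy:", soft_acc)
```

Soft voting tends to help when the base learners produce reasonably calibrated probabilities, since confident learners then carry more weight than uncertain ones.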