# Ensemble Demonstration

In the first example we will look at creating a simple voting ensemble classifier.

We will start by generating the moons dataset and splitting it into training and test sets.

In [1]:
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_moons

X, y = make_moons(n_samples=500, noise=0.30)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

Now we can define a set of individual classifiers and combine them into a `VotingClassifier`. With `voting='hard'`, the ensemble predicts the class that receives the most votes from its members.

In [2]:
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

log_clf = LogisticRegression(solver="liblinear", random_state=42)
svm_clf = SVC(gamma="auto", random_state=42)
dt_clf = DecisionTreeClassifier(max_depth=2, random_state=42)

voting_clf = VotingClassifier(
    estimators=[('lr', log_clf), ('dt', dt_clf), ('svc', svm_clf)],
    voting='hard')

voting_clf.fit(X_train, y_train)

VotingClassifier(estimators=[('lr',
                              LogisticRegression(C=1.0, class_weight=None,
                                                 dual=False, fit_intercept=True,
                                                 intercept_scaling=1,
                                                 l1_ratio=None, max_iter=100,
                                                 multi_class='auto',
                                                 n_jobs=None, penalty='l2',
                                                 random_state=42,
                                                 solver='liblinear', tol=0.0001,
                                                 verbose=0, warm_start=False)),
                             ('dt',
                              DecisionTreeClassifier(ccp_alpha=0.0,
                                                     class_weight=None,
                                                     criterion='gini...
                                        

We can now calculate the test accuracy of each individual classifier and of the ensemble.

In [3]:
from sklearn.metrics import accuracy_score

for clf in (log_clf, dt_clf, svm_clf, voting_clf):
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    print(clf.__class__.__name__, accuracy_score(y_test, y_pred))

LogisticRegression 0.848
DecisionTreeClassifier 0.904
SVC 0.904
VotingClassifier 0.896
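
To make the hard-voting rule concrete, the following is a minimal sketch that reproduces the majority vote by hand and compares it with `VotingClassifier`. It assumes a fixed `random_state=42` on `make_moons` (not set in the cell above), so its data differs from the run shown here; the names `members`, `votes` and `manual_pred` are illustrative only.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Assumption: random_state on make_moons is added here for reproducibility.
X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

members = [
    ("lr", LogisticRegression(solver="liblinear", random_state=42)),
    ("dt", DecisionTreeClassifier(max_depth=2, random_state=42)),
    ("svc", SVC(gamma="auto", random_state=42)),
]

# One column of predicted labels (0 or 1) per classifier.
votes = np.column_stack(
    [clf.fit(X_train, y_train).predict(X_test) for _, clf in members]
)

# With three voters and two classes, class 1 wins when at least two vote for it.
manual_pred = (votes.sum(axis=1) >= 2).astype(int)

# VotingClassifier clones and refits the members internally.
voting_clf = VotingClassifier(estimators=members, voting="hard")
voting_clf.fit(X_train, y_train)
ensemble_pred = voting_clf.predict(X_test)

print("Fraction of matching predictions:", np.mean(manual_pred == ensemble_pred))
```

Because an odd number of classifiers vote on a binary label, ties cannot occur, so the hand-computed vote is expected to agree with the ensemble on every test point.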


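Now we can look at the bagging example. We will create a bagging classifier built from 500 decision trees, each trained on a bootstrap sample of 100 training instances. Out-of-bag (OOB) scoring is enabled to demonstrate this functionality: each tree is evaluated on the training instances it did not see, which gives a validation estimate without a separate hold-out set.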

In [4]:
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

bag_clf = BaggingClassifier(
    DecisionTreeClassifier(random_state=42), n_estimators=500,
    max_samples=100, bootstrap=True, n_jobs=-1, random_state=42,
    oob_score=True)
bag_clf.fit(X_train, y_train)
y_pred = bag_clf.predict(X_test)
print('oob score: ', bag_clf.oob_score_)
print('Test accuracy: ', accuracy_score(y_test, y_pred))


oob score:  0.8986666666666666
Test accuracy:  0.912
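
The out-of-bag score relies on each tree leaving some training instances unseen. As a rough sketch of how many, the simulation below draws bootstrap samples with NumPy only and compares the average left-out fraction with the closed form `(1 - 1/n) ** m`. It assumes 375 training instances (the default 75/25 split of the 500 samples); the variable names are illustrative and it does not inspect `bag_clf` itself.

```python
import numpy as np

rng = np.random.default_rng(42)

n_train = 375       # assumed: 75% of the 500 samples (default train_test_split)
max_samples = 100   # matches max_samples in the BaggingClassifier above
n_estimators = 500  # one bootstrap sample per tree

oob_fractions = []
for _ in range(n_estimators):
    # Draw indices with replacement, as bootstrap=True does.
    in_bag = rng.integers(0, n_train, size=max_samples)
    # Instances never drawn are out-of-bag for this tree.
    oob_fractions.append(1 - np.unique(in_bag).size / n_train)

simulated = float(np.mean(oob_fractions))
# Probability that a given instance is missed by all max_samples draws.
expected = (1 - 1 / n_train) ** max_samples

print(f"simulated OOB fraction: {simulated:.3f}")
print(f"expected OOB fraction:  {expected:.3f}")
```

Under these assumptions each tree leaves roughly three quarters of the training set out-of-bag, so every training instance is scored by many trees that never saw it.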


Let's train a single decision tree for comparison.

In [5]:
tree_clf = DecisionTreeClassifier(random_state=42)
tree_clf.fit(X_train, y_train)
y_pred_tree = tree_clf.predict(X_test)
print(accuracy_score(y_test, y_pred_tree))

0.864
