# Voting Classifier

A voting classifier is an ensembling technique that combines several classifiers: each model classifies an instance, and the ensemble predicts the class that the models choose most often. Let's implement it by hand below:

In [68]:
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
import numpy as np
np.random.seed(42)

X, y = make_moons(n_samples=500, noise=0.3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y)

In [72]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

log_clf = LogisticRegression(solver="lbfgs", random_state=42)
rnd_clf = RandomForestClassifier(n_estimators=100, random_state=42)
svm_clf = SVC(gamma="scale", probability=True, random_state=42)

In [73]:
from sklearn.metrics import accuracy_score

log_clf.fit(X_train, y_train)
log_clf_y_pred = log_clf.predict(X_test)

rnd_clf.fit(X_train, y_train)
rnd_clf_y_pred = rnd_clf.predict(X_test)

svm_clf.fit(X_train, y_train)
svm_clf_y_pred = svm_clf.predict(X_test)

print("Logistic Regression: " + str(accuracy_score(y_test, log_clf_y_pred)))
print("Random Forest Classifier: " + str(accuracy_score(y_test, rnd_clf_y_pred)))
print("Support Vector Machine: " + str(accuracy_score(y_test, svm_clf_y_pred)))

Logistic Regression: 0.864
Random Forest Classifier: 0.896
Support Vector Machine: 0.896


In [74]:
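from scipy.stats import mode

# Majority vote per test instance; np.ravel copes with both the (1, n) and (n,) shapes SciPy may return.
voting_clf_y_pred = np.ravel(mode(np.vstack((log_clf_y_pred, rnd_clf_y_pred, svm_clf_y_pred)), axis=0)[0])
print("Voting Classifier: " + str(accuracy_score(y_test, voting_clf_y_pred)))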

Voting Classifier: 0.912
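
The same hard vote can also be obtained from scikit-learn's `VotingClassifier`, which was imported earlier. The following is a minimal sketch rather than a replacement for the manual version: it rebuilds the data so that it runs on its own, and it passes `random_state=42` to `train_test_split` explicitly (an assumption of this sketch), so the accuracy it prints may not match the numbers above. The names `voting_clf` and `sk_voting_y_pred` are illustrative.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.3, random_state=42)
# Assumption: an explicit random_state here, so the split may differ from the one above.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# VotingClassifier clones and fits each named estimator, then takes the majority label.
voting_clf = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(solver="lbfgs", random_state=42)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
        ("svc", SVC(gamma="scale", random_state=42)),
    ],
    voting="hard",
)
voting_clf.fit(X_train, y_train)
sk_voting_y_pred = voting_clf.predict(X_test)

print("VotingClassifier (hard): " + str(accuracy_score(y_test, sk_voting_y_pred)))
```

Hard voting only needs `predict`, so the `SVC` in this sketch does not set `probability=True`.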


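As we can see, the ensemble (0.912) outperforms each of the individual classifiers on this test set. Above we used hard voting (taking the majority vote), but we can also use soft voting: average the predicted class probabilities across the classifiers and select the class with the highest average. This is why the SVC was created with `probability=True`, which makes `predict_proba` available: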

In [94]:
# Average the per-class probabilities of the three classifiers, then pick the most likely class.
probas = [clf.predict_proba(X_test) for clf in (rnd_clf, svm_clf, log_clf)]
soft_voting_clf_y_pred = np.mean(probas, axis=0).argmax(axis=1)
print("Soft Voting Classifier: " + str(accuracy_score(y_test, soft_voting_clf_y_pred)))

Soft Voting Classifier: 0.92


Soft voting does slightly better still on this split: 0.92, compared with 0.912 for hard voting.
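
To see why the two schemes can disagree, it may help to work through a single instance by hand. The sketch below uses made-up probabilities (not taken from the models above): two classifiers lean weakly towards class 0 while a third is very confident in class 1, so the majority vote and the averaged probabilities point in different directions.

```python
import numpy as np

# Hypothetical probabilities from three classifiers for ONE instance.
# Columns are [P(class 0), P(class 1)]; the numbers are made up for illustration.
probas = np.array([
    [0.55, 0.45],
    [0.55, 0.45],
    [0.05, 0.95],
])

# Hard voting: each classifier casts one vote for its most likely class.
votes = probas.argmax(axis=1)
hard_vote = np.bincount(votes, minlength=2).argmax()

# Soft voting: average the probabilities first, then pick the most likely class.
mean_proba = probas.mean(axis=0)
soft_vote = mean_proba.argmax()

print("Votes:", votes)
print("Hard vote:", hard_vote)
print("Mean probabilities:", mean_proba.round(3))
print("Soft vote:", soft_vote)
```

Soft voting gives more weight to confident classifiers, which only helps when their probability estimates are reasonably well calibrated. scikit-learn's `VotingClassifier` offers the same behaviour through `voting="soft"`.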