# Voting Classifiers

One way to improve on a set of independent classifiers is to aggregate the predictions of each classifier and predict the class that gets the most votes. This prediction by majority vote is called a *hard voting* classifier. Such an ensemble often performs better than the best individual classifier in it. Furthermore, a *strong learner* (a classifier with high accuracy) can be constructed by aggregating *weak learners* (classifiers that perform only slightly better than random guessing) into an ensemble, provided there are enough weak learners and they are sufficiently diverse.

The idea behind the improvement achieved with ensembles of classifiers is the law of large numbers. For example, if an ensemble has 1000 classifiers, each with an accuracy of 51%, the accuracy of the majority-vote prediction should be around 75%. This happens because the more classifiers there are, the closer the fraction of correct votes gets to 51% rather than fluctuating widely around it, so the correct class is more likely to hold the majority. This works only as long as the classifiers in the ensemble are independent of each other. That is not always the case, because classifiers trained on the same data tend to make the same types of errors. This is why very different algorithms are used in an ensemble. That way, the algorithms are more likely to make different errors, which increases the accuracy of the ensemble.
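The effect of the law of large numbers on a majority vote can be checked numerically. The sketch below is not part of the original notebook: it assumes a binary problem, fully independent classifiers that all share the same accuracy, and it counts a tie as a wrong prediction. The function name `majority_vote_accuracy` is made up for this illustration.

```python
from math import exp, lgamma, log


def majority_vote_accuracy(n_classifiers, p_correct):
    """Probability that a strict majority of independent classifiers is right.

    Assumes a binary problem and that every classifier is correct with the
    same probability, independently of the others (an idealized setting).
    """
    total = 0.0
    # Sum the binomial probabilities of every outcome with more than half
    # of the votes correct. Log-space avoids overflow for large ensembles.
    for k in range(n_classifiers // 2 + 1, n_classifiers + 1):
        log_pmf = (
            lgamma(n_classifiers + 1)
            - lgamma(k + 1)
            - lgamma(n_classifiers - k + 1)
            + k * log(p_correct)
            + (n_classifiers - k) * log(1.0 - p_correct)
        )
        total += exp(log_pmf)
    return total


for n in (1, 10, 100, 1000, 10000):
    print(n, round(majority_vote_accuracy(n, 0.51), 3))
```

Under these assumptions the accuracy with 1000 classifiers lands in the low-to-mid 70% range, close to the figure quoted above, and it keeps growing as more classifiers are added. Real classifiers are rarely independent, so actual gains are usually smaller.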

The code below creates a hard voting classifier from three different classifiers, using the iris dataset:

In [9]:
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedShuffleSplit
import numpy as np

iris = load_iris()
X = iris.data
y = iris.target

split = StratifiedShuffleSplit(n_splits = 1, test_size = 0.2, random_state = 42)
for train_index, test_index in split.split(X, y):
    X_train = X[train_index]
    y_train = y[train_index]

    X_test = X[test_index]
    y_test = y[test_index]


from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

log_clf = LogisticRegression()
rnd_clf = RandomForestClassifier()
svm_clf = SVC()

voting_clf = VotingClassifier(
    estimators=[('lr', log_clf), ('rf', rnd_clf), ('svc', svm_clf)],
    voting='hard')

In [10]:
from sklearn.metrics import accuracy_score

for clf in (log_clf, rnd_clf, svm_clf, voting_clf):
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    print(clf.__class__.__name__, accuracy_score(y_test, y_pred))

LogisticRegression 0.966666666667
RandomForestClassifier 0.933333333333
SVC 0.966666666667
VotingClassifier 1.0


If all the classifiers have a `predict_proba()` method to estimate class probabilities, the ensemble can instead predict the class with the highest probability, averaged over the individual classifiers. This method is called *soft voting*, and it often performs better than *hard voting* because it gives more weight to highly confident votes. To use soft voting instead of hard voting, it is enough to set `voting='soft'` and make sure all classifiers in the ensemble can estimate probabilities. For example, `SVC` does not estimate probabilities by default, so it has to be created with `probability=True`.
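A minimal sketch of soft voting on the same iris data follows. It is an illustration rather than the notebook's own cell: the split uses `train_test_split` for brevity, and the `max_iter` and `random_state` values are arbitrary choices added here. The last lines recompute the ensemble prediction by hand, averaging the class probabilities of the fitted estimators, to show what soft voting does.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

soft_voting_clf = VotingClassifier(
    estimators=[
        ('lr', LogisticRegression(max_iter=1000)),
        ('rf', RandomForestClassifier(random_state=42)),
        # SVC only exposes predict_proba() when probability=True
        ('svc', SVC(probability=True, random_state=42)),
    ],
    voting='soft')
soft_voting_clf.fit(X_train, y_train)

y_pred = soft_voting_clf.predict(X_test)
print('soft voting accuracy:', accuracy_score(y_test, y_pred))

# Soft voting by hand: average the class probabilities of the fitted
# estimators, then pick the class with the highest average probability.
avg_proba = np.mean(
    [est.predict_proba(X_test) for est in soft_voting_clf.estimators_], axis=0)
manual_pred = soft_voting_clf.classes_[np.argmax(avg_proba, axis=1)]
print('matches VotingClassifier:', np.array_equal(manual_pred, y_pred))
```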

# Bagging and Pasting

Another way to get diverse classifiers, instead of using different algorithms in the ensemble, is to train the same algorithm on different random subsets of the training set. When sampling is performed with replacement, the method is called **bagging**; when sampling is performed without replacement, it is called **pasting**. That is, only bagging allows a training instance to be sampled several times for the same predictor.
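The difference between the two sampling schemes can be seen on a toy index set. This is a sketch with made-up sizes (`n_instances`, `subset_size`) and an arbitrary seed, using NumPy's random generator:

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for repeatability only
n_instances = 10                # hypothetical training set size
subset_size = 6                 # hypothetical number of instances per predictor

# Bagging: sample with replacement, so an index may appear more than once.
bagging_idx = rng.choice(n_instances, size=subset_size, replace=True)

# Pasting: sample without replacement, so every index appears at most once.
pasting_idx = rng.choice(n_instances, size=subset_size, replace=False)

print('bagging subset:', np.sort(bagging_idx))
print('pasting subset:', np.sort(pasting_idx))
```

Each predictor in the ensemble would get its own subset drawn this way.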

Once the data is sampled, the predictors are trained on the subsets, and the ensemble makes a prediction by aggregating the predictions of all the predictors. For classification, the aggregation function is generally the *statistical mode*, that is, the most frequent prediction. Since each predictor is trained on a subset of the data, it has a higher bias than if it were trained on the whole training set. However, aggregation reduces both bias and variance. In general, the ensemble has a similar bias but a lower variance than a single predictor trained on the whole training set.
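The train-then-aggregate procedure can be sketched by hand. The example below is an illustration, not the notebook's code: it assumes decision trees as the base predictor, 25 predictors, bootstrap subsets the same size as the training set, and the iris data; all of these are choices made here. The aggregation is the statistical mode of the predicted classes.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

rng = np.random.default_rng(42)
n_predictors = 25            # illustrative ensemble size
n_classes = len(np.unique(y))

votes = []
for _ in range(n_predictors):
    # Bagging: draw a bootstrap subset (with replacement) for this predictor.
    idx = rng.choice(len(X_train), size=len(X_train), replace=True)
    tree = DecisionTreeClassifier(random_state=0)
    tree.fit(X_train[idx], y_train[idx])
    votes.append(tree.predict(X_test))
votes = np.array(votes)      # shape: (n_predictors, n_test_instances)

# Statistical mode: count the votes per class for every test instance
# and keep the class with the most votes.
vote_counts = np.stack([(votes == c).sum(axis=0) for c in range(n_classes)])
ensemble_pred = vote_counts.argmax(axis=0)

print('ensemble accuracy:', accuracy_score(y_test, ensemble_pred))
```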

Since the individual predictors can be trained in parallel, and can also make predictions in parallel, bagging and pasting scale very well.
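In scikit-learn, both methods are available through `BaggingClassifier`, where `bootstrap=True` selects bagging, `bootstrap=False` selects pasting, and `n_jobs` controls how many CPU cores are used for training and prediction. The sketch below assumes decision trees as the base predictor on the iris data; the helper `make_ensemble` and the values for `n_estimators` and `max_samples` are illustrative choices, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)


def make_ensemble(bootstrap):
    """Build a bagging (bootstrap=True) or pasting (bootstrap=False) ensemble."""
    return BaggingClassifier(
        DecisionTreeClassifier(random_state=42),  # base predictor
        n_estimators=100,     # number of predictors in the ensemble
        max_samples=0.8,      # each predictor sees 80% of the training set
        bootstrap=bootstrap,  # True: with replacement, False: without
        n_jobs=-1,            # use all available CPU cores
        random_state=42,
    )


scores = {}
for name, bootstrap in (('bagging', True), ('pasting', False)):
    clf = make_ensemble(bootstrap).fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, clf.predict(X_test))
    print(name, scores[name])
```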