# Voting Classifier

- Voting classifiers are a type of ensemble learning algorithm
- The basic goal of ensemble learning is to build multiple models and combine their predictions into a single result
- The idea is that the combined answer from several weaker learners can be better than the result from a single strong learner, provided the learners make different kinds of errors
- Ensembling can also help reduce overfitting
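
To see why combining weaker learners can help, here is a simplified calculation. It assumes a binary task in which every model is correct independently with the same probability `p`, which is a strong assumption that real models rarely satisfy. The function name `majority_accuracy` is hypothetical and only serves this illustration.

```python
from math import comb

def majority_accuracy(n: int, p: float) -> float:
    """Probability that more than half of n models are correct.

    Assumes a binary task and that each model is correct
    independently with probability p (an idealised assumption).
    """
    needed = n // 2 + 1  # smallest number of correct models that forms a majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(needed, n + 1))

# Odd ensemble sizes avoid ties in a binary majority vote
for n in (1, 3, 5, 11):
    print(n, round(majority_accuracy(n, 0.7), 3))
```

Under these assumptions, three models that are each 70% accurate reach a majority accuracy of about 0.784. The gain shrinks when the models make correlated errors.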

There are two kinds of voting classifiers:

1. Hard voting classifier
2. Soft voting classifier

## Hard Voting Classifier

- With hard voting, the ensemble predicts the class that the majority of its models predicted
- Eg: If we have 4 sub-models and they predict [0, 0, 1, 0], the hard voting classifier predicts 0
- This is a winner-takes-all system: each internal model contributes exactly one predicted class, and we choose the most frequent one
- If there is a tie, the output is the first of the tied classes in ascending order of class labels
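
The majority rule and the tie-break described above can be sketched in a few lines of plain Python. This is a minimal illustration, not scikit-learn's implementation, and the function name `hard_vote` is hypothetical.

```python
from collections import Counter

def hard_vote(predictions):
    """Return the most frequent label; ties go to the smallest label."""
    counts = Counter(predictions)
    top = max(counts.values())
    # Among the labels with the highest count, pick the first in ascending order
    return min(label for label, count in counts.items() if count == top)

print(hard_vote([0, 0, 1, 0]))  # 0 (three votes against one)
print(hard_vote([0, 1, 1, 0]))  # 0 (tie, resolved in ascending order)
```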

## Soft Voting Classifier

- In soft voting, each model outputs a confidence (probability) for each class rather than only a single predicted class
- Eg:
    - M1:  1: 0.9,  0: 0.1
    - M2:  1: 0.8,  0: 0.2
    - M3:  1: 0.7,  0: 0.3
    - M4:  1: 0.6,  0: 0.4
- Soft voting averages the probabilities that the sub-models assign to each class and predicts the class with the highest average
- In the above case, the average confidence is 1: 0.75, 0: 0.25, so the soft voting classifier predicts 1
- Additionally, we can provide a weight for each model in both hard and soft voting
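
The averaging step, with optional weights, can be sketched as follows using the four models from the example above. This is a minimal illustration rather than scikit-learn's implementation; the function name `soft_vote` and the weights `[1, 1, 1, 5]` are hypothetical.

```python
def soft_vote(probabilities, weights=None):
    """Average per-class probabilities across models and pick the best class.

    probabilities: one dict per model, mapping class label -> probability.
    weights: optional per-model weights; equal weights if omitted.
    """
    if weights is None:
        weights = [1.0] * len(probabilities)
    total = sum(weights)
    classes = sorted(probabilities[0])
    averaged = {
        c: sum(w * p[c] for w, p in zip(weights, probabilities)) / total
        for c in classes
    }
    # max() returns the first maximum, so ties go to the smallest label
    winner = max(classes, key=lambda c: averaged[c])
    return winner, averaged

models = [
    {1: 0.9, 0: 0.1},  # M1
    {1: 0.8, 0: 0.2},  # M2
    {1: 0.7, 0: 0.3},  # M3
    {1: 0.6, 0: 0.4},  # M4
]

label, averaged = soft_vote(models)
print(label, averaged)  # class 1 wins with an average of about 0.75

# Giving M4 five times the weight of the others pulls the average towards M4
label_w, averaged_w = soft_vote(models, weights=[1, 1, 1, 5])
print(label_w, averaged_w)
```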

# Voting Regressor

- The basic idea of a voting regressor is the same as that of a voting classifier
- Instead of voting on a class, it averages the predictions of the regressors involved
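
For a single sample, the averaging can be sketched as below. The function name `vote_regression` and the three predicted values are hypothetical; the optional `weights` argument is an extension added for illustration.

```python
def vote_regression(predictions, weights=None):
    """Return the (optionally weighted) average of the regressors' predictions."""
    if weights is None:
        weights = [1.0] * len(predictions)
    return sum(w * p for w, p in zip(weights, predictions)) / sum(weights)

# Three regressors predict 20.0, 22.0 and 27.0 for the same sample
print(vote_regression([20.0, 22.0, 27.0]))  # 23.0
```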

# Source:
- https://www.youtube.com/watch?v=dD7gvbfBiyA
- https://scikit-learn.org/stable/modules/ensemble.html#voting-classifier

# Example

### Hard Voting Classifier

In [2]:
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import VotingClassifier

In [3]:
iris = datasets.load_iris()
X, y = iris.data[:, 1:3], iris.target

In [4]:
clf1 = LogisticRegression(random_state=1)
clf2 = RandomForestClassifier(n_estimators=50, random_state=1)
clf3 = GaussianNB()

In [5]:
eclf = VotingClassifier(estimators=[('lr', clf1), ('rf', clf2), ('gnb', clf3)], voting='hard')

In [6]:
for clf, label in zip([clf1, clf2, clf3, eclf], ['Logistic Regression', 'Random Forest', 'naive Bayes', 'Ensemble']):
    scores = cross_val_score(clf, X, y, scoring='accuracy', cv=5)
    print("Accuracy: %0.2f (+/- %0.2f) [%s]" % (scores.mean(), scores.std(), label))

Accuracy: 0.95 (+/- 0.04) [Logistic Regression]
Accuracy: 0.94 (+/- 0.04) [Random Forest]
Accuracy: 0.91 (+/- 0.04) [naive Bayes]
Accuracy: 0.95 (+/- 0.04) [Ensemble]


### Soft Voting Classifier

In [7]:
from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import VotingClassifier

In [12]:
# Loading some example data
iris = datasets.load_iris()
X = iris.data[:, [0, 2]]
y = iris.target

# Training classifiers
clf1 = DecisionTreeClassifier(max_depth=4)
clf2 = KNeighborsClassifier(n_neighbors=7)
clf3 = SVC(kernel='rbf', probability=True)
eclf = VotingClassifier(estimators=[('dt', clf1), ('knn', clf2), ('svc', clf3)], voting='soft', weights=[2, 3, 2])

In [13]:
for clf, label in zip([clf1, clf2, clf3, eclf], ['Decision Tree', 'KNN', 'SVC', 'Ensemble']):
    scores = cross_val_score(clf, X, y, scoring='accuracy', cv=5)
    print("Accuracy: %0.2f (+/- %0.2f) [%s]" % (scores.mean(), scores.std(), label))

Accuracy: 0.93 (+/- 0.07) [Decision Tree]
Accuracy: 0.94 (+/- 0.04) [KNN]
Accuracy: 0.95 (+/- 0.03) [SVC]
Accuracy: 0.95 (+/- 0.05) [Ensemble]


### Voting Regressor

In [14]:
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import VotingRegressor

In [15]:
# Loading some example data (the diabetes dataset bundled with scikit-learn)
X, y = load_diabetes(return_X_y=True)

In [16]:
# Training regressors
reg1 = GradientBoostingRegressor(random_state=1, n_estimators=10)
reg2 = RandomForestRegressor(random_state=1, n_estimators=10)
reg3 = LinearRegression()
ereg = VotingRegressor(estimators=[('gb', reg1), ('rf', reg2), ('lr', reg3)])
ereg = ereg.fit(X, y)
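
Once fitted, the ensemble can be used like any other regressor. The sketch below is self-contained and uses a small synthetic dataset from `make_regression` (an assumption made for illustration, not the dataset loaded above). It checks that the ensemble's prediction matches the plain average of the fitted sub-models' predictions; the `*_demo` names are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    VotingRegressor,
)
from sklearn.linear_model import LinearRegression

# Synthetic data, assumed only for this illustration
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=1)

ereg_demo = VotingRegressor(estimators=[
    ('gb', GradientBoostingRegressor(random_state=1, n_estimators=10)),
    ('rf', RandomForestRegressor(random_state=1, n_estimators=10)),
    ('lr', LinearRegression()),
]).fit(X_demo, y_demo)

# Prediction of the ensemble for the first five samples
ensemble_pred = ereg_demo.predict(X_demo[:5])

# Average of the fitted sub-models' predictions for the same samples
individual_preds = np.array([est.predict(X_demo[:5]) for est in ereg_demo.estimators_])
manual_average = individual_preds.mean(axis=0)

print(np.allclose(ensemble_pred, manual_average))
```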