This notebook takes a detailed look at the **Voting Classifier** in machine learning.

---

# 🔹 What is a Voting Classifier?

A **Voting Classifier** is an **ensemble learning** technique that combines predictions from multiple different machine learning models (called *base learners* or *estimators*) to make a final decision.

The idea is:

* Instead of relying on a single model, combine several models to improve generalization and reduce the risk of overfitting.
* Each model contributes its “vote” to the final prediction.

It is implemented in **`sklearn.ensemble.VotingClassifier`**.

---

# 🔹 Types of Voting

## 1. **Hard Voting (Majority Voting)**

* Each classifier predicts a **class label**.
* The final prediction is the class that receives the **majority of votes**.
* Example:

  * Model A → Class 0
  * Model B → Class 1
  * Model C → Class 1
  * Final prediction → **Class 1** (majority wins).

**Use cases**:

* Works well when base classifiers are accurate and diverse.
* Easy to interpret.
* Good for discrete classification problems.
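As a minimal sketch of majority voting, the snippet below counts votes with NumPy. The prediction matrix is made up for illustration: its first column reproduces the three-model example above, and the remaining columns are hypothetical extra samples. Ties go to the lowest class label here, which is an assumption of this sketch.

```python
import numpy as np

# Hypothetical label predictions: rows are models, columns are samples.
# The first column matches the example above (A -> 0, B -> 1, C -> 1).
predictions = np.array([
    [0, 1, 2, 0],  # Model A
    [1, 1, 2, 2],  # Model B
    [1, 0, 2, 2],  # Model C
])

def hard_vote(preds):
    """Return the most frequent label per column (ties go to the lowest label)."""
    n_classes = int(preds.max()) + 1
    # Count the votes each class receives for every sample (column).
    counts = np.apply_along_axis(np.bincount, 0, preds, minlength=n_classes)
    # counts has shape (n_classes, n_samples); pick the class with most votes.
    return counts.argmax(axis=0)

final = hard_vote(predictions)
print(final)
```

For the first sample, Class 1 collects two of the three votes, so it wins.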

---

## 2. **Soft Voting (Weighted Probability Averaging)**

* Each classifier predicts a **probability distribution** over classes.
* The probabilities are averaged (optionally weighted).
* The class with the **highest average probability** is chosen.
* Example:

  * Model A → [0.7, 0.3]
  * Model B → [0.6, 0.4]
  * Model C → [0.8, 0.2]
  * Average → [0.7, 0.3] → Final prediction → **Class 0**

**Use cases**:

* Often performs better than hard voting, because it uses each model's confidence rather than only its top label.
* Requires classifiers to support `predict_proba()`.
* More robust when models output well-calibrated probabilities.
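The averaging step can be sketched in a few lines of NumPy. The first array reuses the probabilities from the example above. The second array is a hypothetical case, added only to show that soft and hard voting can disagree when one model is far more confident than the others.

```python
import numpy as np

# Probabilities from the example above: rows are models, columns are classes.
probas = np.array([
    [0.7, 0.3],  # Model A
    [0.6, 0.4],  # Model B
    [0.8, 0.2],  # Model C
])
avg = probas.mean(axis=0)           # average probability per class
soft_label = int(avg.argmax())      # class with the highest average

# Hypothetical case: one confident model against two lukewarm ones.
probas_split = np.array([
    [0.90, 0.10],  # strongly favours class 0
    [0.40, 0.60],  # weakly favours class 1
    [0.45, 0.55],  # weakly favours class 1
])
soft_split = int(probas_split.mean(axis=0).argmax())
# Hard voting on the same models: take each model's top label, then count.
hard_split = int(np.bincount(probas_split.argmax(axis=1)).argmax())

print(avg, soft_label)
print(soft_split, hard_split)
```

In the second case two of three models vote for Class 1, but the averaged probabilities favour Class 0.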

---

# 🔹 Weighted Voting

Both **hard** and **soft** voting can use **weights**:

* Assign higher weights to stronger models.
* Example: `weights=[1, 2, 3]` means the third model’s vote or probability counts three times as much as the first model’s.
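A rough sketch of how weights enter both schemes, using plain NumPy rather than scikit-learn. The probabilities and labels are illustrative values, not outputs of real models.

```python
import numpy as np

weights = np.array([1, 2, 3])  # one weight per model

# Soft voting: weighted average of (hypothetical) class probabilities.
probas = np.array([
    [0.7, 0.3],  # Model A, weight 1
    [0.6, 0.4],  # Model B, weight 2
    [0.8, 0.2],  # Model C, weight 3
])
weighted_avg = np.average(probas, axis=0, weights=weights)
soft_label = int(weighted_avg.argmax())

# Hard voting: each model's label is counted `weight` times.
labels = np.array([0, 1, 2])  # hypothetical: the three models all disagree
unweighted_label = int(np.bincount(labels).argmax())                 # three-way tie, lowest label wins
weighted_label = int(np.bincount(labels, weights=weights).argmax())  # heaviest model decides

print(weighted_avg, soft_label)
print(unweighted_label, weighted_label)
```

With equal weights the three-way disagreement is a tie. With `weights=[1, 2, 3]` the third model's label wins.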

---

# 🔹 Why Use Voting Classifiers?

* ✅ **Improved accuracy**: averaging several models mainly reduces variance, and mixing models with different biases can offset individual weaknesses.
* ✅ **Model diversity**: linear, tree-based, and boosting models capture different patterns.
* ✅ **Robustness**: less sensitive to the errors of any single weak model.
* ✅ **Flexibility**: can combine very different algorithms.

---

# 🔹 When Does a Voting Classifier Work Best?

* When base models are **diverse** (different biases).

  * Example: Logistic Regression (linear), Random Forest (tree-based), XGBoost (boosting).
* When base models are **individually strong but not perfect**.
* When their errors are **uncorrelated** — combining helps cancel out mistakes.
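To see why uncorrelated errors matter, consider an idealized binary problem in which every model is correct with the same probability `p` and the models err independently. Real models rarely satisfy this assumption, so treat the numbers as an upper-bound illustration. Under it, the accuracy of a majority vote follows from the binomial distribution:

```python
from math import comb

def majority_vote_accuracy(n_models, p):
    """Probability that a strict majority of n independent models is correct.

    Assumes a binary task, an odd number of models, and that each model is
    correct with probability p independently of the others.
    """
    k_min = n_models // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n_models, k) * p**k * (1 - p) ** (n_models - k)
        for k in range(k_min, n_models + 1)
    )

for n in (1, 3, 5):
    print(n, round(majority_vote_accuracy(n, 0.7), 3))
```

Three independent models that are each 70% accurate give a majority vote that is about 78% accurate, and five give about 84%. If the models always made the same mistakes, the vote would stay at 70%.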

---

# 🔹 Limitations

* ❌ If base models are very similar, voting won’t add much value.
* ❌ Requires more computation (training multiple models).
* ❌ Soft voting requires probability outputs (not all models support this).
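One quick way to check the last point is to see which candidate estimators expose `predict_proba` before putting them into a soft-voting ensemble. The estimators below are chosen only for illustration: `LinearSVC` is a common example of a classifier without probability outputs.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

# Candidate base models (illustrative choice, not from the notebook below).
candidates = {
    "LogisticRegression": LogisticRegression(),
    "RandomForestClassifier": RandomForestClassifier(),
    "LinearSVC": LinearSVC(),
}

# Soft voting needs predict_proba on every base model.
supports_proba = {
    name: hasattr(est, "predict_proba") for name, est in candidates.items()
}
print(supports_proba)
```

Models without `predict_proba` can still be used with hard voting.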

---

# 🔹 Extensions

* **Weighted Voting** → give priority to stronger models.
* **Stacking** → train a meta-model on predictions of base learners (more powerful).
* **Bagging/Boosting** → alternative ensemble approaches.
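For comparison, here is a hedged sketch of stacking with scikit-learn's `StackingClassifier`. The dataset (the built-in Iris data with all four features), the base learners, and the settings are assumptions chosen for a quick demonstration. They are not tuned and do not reproduce the experiments further down.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X_demo, y_demo = load_iris(return_X_y=True)

# Same kind of (name, estimator) list that VotingClassifier expects.
base_learners = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("knn", KNeighborsClassifier()),
]

# The meta-model is trained on out-of-fold predictions of the base learners.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)

stack_scores = cross_val_score(stack, X_demo, y_demo, cv=5, scoring="accuracy")
print(round(stack_scores.mean(), 2))
```

Unlike voting, the way predictions are combined is learned from data, which costs extra training time.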

---

👉 In short:
A **Voting Classifier** is a **simple but powerful ensemble method** that combines multiple models using majority vote (hard) or probability averaging (soft). When the base models are reasonably accurate and make different errors, it often improves performance, stability, and robustness compared with any individual model.

---


In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))


In [None]:
df = pd.read_csv('/kaggle/input/iris/Iris.csv')

In [None]:
df.head()

In [None]:
df.shape

In [None]:
# Drop the first column (Id), which is only a row identifier
df = df.iloc[:,1:]

In [None]:
df.head()

In [None]:
# Label encode Species
from sklearn.preprocessing import LabelEncoder

In [None]:
encoder = LabelEncoder()

In [None]:
df['Species'] = encoder.fit_transform(df['Species'])

In [None]:
df.head()

In [None]:
import seaborn as sns
sns.pairplot(df,hue='Species')

In [None]:
new_df = df[df['Species'] != 0][['SepalLengthCm','SepalWidthCm','Species']]

In [None]:
new_df.head()

In [None]:
new_df.shape

In [None]:
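# Features: the first two columns (sepal length and width); target: the encoded Species.
# Note: this uses the full df (all three classes), not the two-class new_df built above.
X = df.iloc[:,0:2]
y = df.iloc[:,-1]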

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

In [None]:
clf1 = LogisticRegression()
clf2 = RandomForestClassifier()
clf3 = KNeighborsClassifier()

In [None]:
estimators = [('lr',clf1),('rf',clf2),('knn',clf3)]

In [None]:
# 10-fold cross-validated accuracy of each base model on its own
for name, clf in estimators:
    scores = cross_val_score(clf,X,y,cv=10,scoring='accuracy')
    print(name,np.round(np.mean(scores),2))

In [None]:
from sklearn.ensemble import VotingClassifier

In [None]:
# Hard voting: majority vote over the predicted labels
vc = VotingClassifier(estimators=estimators,voting='hard')
scores = cross_val_score(vc,X,y,cv=10,scoring='accuracy')
print(np.round(np.mean(scores),2))

In [None]:
# Soft voting: average the predicted class probabilities
vc1 = VotingClassifier(estimators=estimators,voting='soft')
scores = cross_val_score(vc1,X,y,cv=10,scoring='accuracy')
print(np.round(np.mean(scores),2))

In [175]:
# Try every combination of integer weights 1..3 for (lr, rf, knn)
for i in range(1,4):
    for j in range(1,4):
        for k in range(1,4):
            vc = VotingClassifier(estimators=estimators,voting='soft',weights=[i,j,k])
            scores = cross_val_score(vc,X,y,cv=10,scoring='accuracy')
            print("for i={},j={},k={}".format(i,j,k),np.round(np.mean(scores),2))

for i=1,j=3,k=3 0.75
for i=2,j=1,k=1 0.77
for i=2,j=1,k=2 0.77
for i=2,j=1,k=3 0.77
for i=2,j=2,k=1 0.77
for i=2,j=2,k=2 0.76
for i=2,j=2,k=3 0.75
for i=2,j=3,k=1 0.74
for i=2,j=3,k=2 0.77
for i=2,j=3,k=3 0.76
for i=3,j=1,k=1 0.8
for i=3,j=1,k=2 0.78
for i=3,j=1,k=3 0.79
for i=3,j=2,k=1 0.79
for i=3,j=2,k=2 0.77
for i=3,j=2,k=3 0.77
for i=3,j=3,k=1 0.75
for i=3,j=3,k=2 0.77
for i=3,j=3,k=3 0.77


In [176]:
from sklearn.svm import SVC

In [177]:
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=2)

svm1 = SVC(probability=True, kernel='poly', degree=1)
svm2 = SVC(probability=True, kernel='poly', degree=2)
svm3 = SVC(probability=True, kernel='poly', degree=3)
svm4 = SVC(probability=True, kernel='poly', degree=4)
svm5 = SVC(probability=True, kernel='poly', degree=5)

estimators = [('svm1',svm1),('svm2',svm2),('svm3',svm3),('svm4',svm4),('svm5',svm5)]

# 10-fold cross-validated accuracy of each SVM on its own
for name, clf in estimators:
    scores = cross_val_score(clf,X,y,cv=10,scoring='accuracy')
    print(name,np.round(np.mean(scores),2))

svm1 0.85
svm2 0.85
svm3 0.89
svm4 0.81
svm5 0.86


In [178]:
# Soft voting over the five SVMs
vc1 = VotingClassifier(estimators=estimators,voting='soft')
scores = cross_val_score(vc1,X,y,cv=10,scoring='accuracy')
print(np.round(np.mean(scores),2))

0.93
