# Voting Ensemble Classifier

## What is Voting Ensemble?

A **Voting Classifier** is an ensemble learning method that combines several different machine learning models and predicts either by majority vote (hard voting) or by averaging the models' predicted class probabilities (soft voting).

### Types of Voting:

1. **Hard Voting**: Each classifier votes for a class, and the majority class wins
   - Final prediction = mode of all predictions
   - Example: If 3 models predict [A, B, A] → Final prediction is A

2. **Soft Voting**: Averages the predicted probabilities from all classifiers
   - Final prediction = argmax of averaged probabilities
   - Generally performs better than hard voting when the classifiers are well-calibrated
   - Requires every base model to provide `predict_proba`; for scikit-learn's `SVC` this means setting `probability=True`
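
The two voting rules can disagree on the same inputs. The sketch below uses plain Python and made-up probabilities (the `probas` values and the helper names `hard_vote` and `soft_vote` are illustrative assumptions, not part of scikit-learn) to show a case where two weakly confident models outvote one strongly confident model under hard voting but not under soft voting.

```python
from collections import Counter

# Hypothetical predicted probabilities for classes A and B from three classifiers
probas = [
    {"A": 0.55, "B": 0.45},  # weakly prefers A
    {"A": 0.60, "B": 0.40},  # weakly prefers A
    {"A": 0.05, "B": 0.95},  # strongly prefers B
]

def hard_vote(probas):
    """Each classifier votes for its most probable class; return the mode."""
    votes = [max(p, key=p.get) for p in probas]
    return Counter(votes).most_common(1)[0][0]

def soft_vote(probas):
    """Average the probabilities per class; return the argmax and the averages."""
    classes = probas[0].keys()
    avg = {c: sum(p[c] for p in probas) / len(probas) for c in classes}
    return max(avg, key=avg.get), avg

hard_label = hard_vote(probas)
soft_label, avg = soft_vote(probas)

print(hard_label)  # A (two votes against one)
print(soft_label)  # B (average probability 0.60 against 0.40)
```

Soft voting lets a confident model carry more weight, which is why it depends on the probabilities being reasonably calibrated.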

### Why Use Voting Ensemble?

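- **Reduces variance**: Averaging over diverse models smooths out the errors of any single model, which can reduce overfitting
- **Improves accuracy**: Can outperform the individual models, especially when their errors are not strongly correlated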
- **Robust predictions**: Different models capture different patterns
- **Simple to implement**: Easy to understand and tune
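
A small calculation illustrates the accuracy argument. The sketch below assumes a binary task and three classifiers that are each correct with probability `p` and make errors independently of one another. That independence assumption is an idealization that real models rarely satisfy, so the result is best read as an upper bound on the benefit, not as a prediction. The function name `majority_vote_accuracy` is hypothetical.

```python
from itertools import product

def majority_vote_accuracy(p, n=3):
    """Probability that more than half of n independent classifiers,
    each correct with probability p, give the right answer."""
    total = 0.0
    # Enumerate every correct/incorrect pattern across the n classifiers
    for outcome in product([True, False], repeat=n):
        correct = sum(outcome)
        if correct > n / 2:
            total += p ** correct * (1 - p) ** (n - correct)
    return total

# Three independent classifiers at 70% accuracy:
# 0.7^3 + 3 * 0.7^2 * 0.3 = 0.343 + 0.441 = 0.784
print(round(majority_vote_accuracy(0.7), 3))  # 0.784
```

When the base models make correlated errors, the gain shrinks, which is one reason diverse algorithms are usually preferred as base models.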

## Implementation Steps:

1. Import necessary libraries
2. Load and split data
3. Create individual base models (diverse algorithms work best)
4. Combine them in a `VotingClassifier`
5. Compare the ensemble's performance with that of the individual models

In [1]:
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

## Step 1: Load Data and Create Ensemble

We'll create a voting ensemble with three diverse classifiers:
- **Decision Tree**: Non-linear, tree-based model
- **Logistic Regression**: Linear probabilistic model
- **SVM (Support Vector Machine)**: Kernel-based classifier

Using `voting='soft'` means we'll average the predicted probabilities from all three models.

In [2]:
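# Load the Iris dataset and hold out 20% of the samples for testing
data = load_iris()
x, y = data.data, data.target
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=42
)

# Three base models from different algorithm families
dt = DecisionTreeClassifier(max_depth=3, random_state=42)
lr = LogisticRegression(max_iter=300)
svm = SVC(probability=True, kernel='rbf')  # probability=True enables predict_proba

# Soft voting averages the class probabilities predicted by the base models
voting_clf = VotingClassifier(
    estimators=[('dt', dt), ('lr', lr), ('svm', svm)],
    voting='soft'
)
voting_clf.fit(x_train, y_train)

# Evaluate the ensemble on the held-out test set
y_pred = voting_clf.predict(x_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Ensemble Model Accuracy: {accuracy:.2f}")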

Ensemble Model Accuracy: 1.00
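
To make the soft-voting average concrete, the following sketch refits the same ensemble and checks that its `predict_proba` output matches the plain mean of the base models' probabilities. It is written as a standalone script so it can run on its own; the names `individual`, `manual_avg`, and `manual_pred` are introduced here for illustration, and the check assumes no `weights` argument is passed to `VotingClassifier`.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

x, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=42
)

voting_clf = VotingClassifier(
    estimators=[
        ('dt', DecisionTreeClassifier(max_depth=3, random_state=42)),
        ('lr', LogisticRegression(max_iter=300)),
        ('svm', SVC(probability=True, kernel='rbf')),
    ],
    voting='soft',
)
voting_clf.fit(x_train, y_train)

# estimators_ holds the fitted copies of the base models
individual = np.array([est.predict_proba(x_test) for est in voting_clf.estimators_])
manual_avg = individual.mean(axis=0)  # average over the three models
ensemble_proba = voting_clf.predict_proba(x_test)

# The ensemble's probabilities are the unweighted mean of the base models'
print(np.allclose(manual_avg, ensemble_proba))

# Taking the argmax of the average reproduces the ensemble's predictions
manual_pred = voting_clf.classes_[manual_avg.argmax(axis=1)]
print(np.array_equal(manual_pred, voting_clf.predict(x_test)))
```

Both checks are expected to print `True`. If `weights` were supplied, the ensemble would use a weighted average instead and the first check would generally fail.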


## Step 2: Compare Individual Model Performance

Let's evaluate each base model individually to see how the ensemble compares to its components.

In [3]:
# named_estimators_ maps each name to the fitted copy of that base model
for name, model in voting_clf.named_estimators_.items():
    y_pred_individual = model.predict(x_test)
    accuracy_individual = accuracy_score(y_test, y_pred_individual)
    print(f"{name} Model Accuracy: {accuracy_individual:.2f}")

dt Model Accuracy: 1.00
lr Model Accuracy: 1.00
svm Model Accuracy: 1.00
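
On this split the ensemble and all three base models score 1.00, so the 30-sample test set cannot tell them apart. One possible way to get a finer-grained comparison is cross-validation over the full dataset. The sketch below is a suggestion, not part of the original walkthrough: it uses scikit-learn's `cross_val_score` with 5 folds, and the `models` and `cv_scores` dictionaries are names introduced here for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

x, y = load_iris(return_X_y=True)

# Same base models as in the walkthrough
dt = DecisionTreeClassifier(max_depth=3, random_state=42)
lr = LogisticRegression(max_iter=300)
svm = SVC(probability=True, kernel='rbf')

models = {
    'dt': dt,
    'lr': lr,
    'svm': svm,
    'voting': VotingClassifier(
        estimators=[('dt', dt), ('lr', lr), ('svm', svm)],
        voting='soft',
    ),
}

# cross_val_score clones each model, so nothing is fitted in place
cv_scores = {
    name: cross_val_score(model, x, y, cv=5, scoring='accuracy')
    for name, model in models.items()
}

for name, scores in cv_scores.items():
    print(f"{name}: mean accuracy {scores.mean():.3f} (std {scores.std():.3f})")
```

On a dataset as small and easy as Iris the differences are likely to remain small, so any ranking from this comparison should be treated with caution.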
