# 18. Ensemble Learning: Voting

**Purpose:** Learn and revise **Voting** ensembles in Scikit-learn.

---

## What is a Voting Ensemble?

A **voting ensemble** combines predictions from **several different estimators**:

- **Hard voting (majority):** Each classifier votes for a class; the class with the most votes wins.
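- **Soft voting:** Each classifier outputs **probabilities**; the class with the **highest average probability** wins. This is often better than hard voting because confident predictions count for more, provided the probabilities are reasonably well calibrated.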
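The two rules can disagree on the same sample. Here is a minimal NumPy sketch of both rules. The three probability rows are made-up numbers chosen to show the disagreement; they are not output from any real model.

```python
import numpy as np

# Hypothetical predict_proba outputs of three classifiers for ONE sample;
# columns are [P(class 0), P(class 1)]
probas = np.array([
    [0.55, 0.45],  # classifier A: leans class 0
    [0.55, 0.45],  # classifier B: leans class 0
    [0.05, 0.95],  # classifier C: very confident in class 1
])

# Hard voting: each classifier casts one vote for its most probable class
votes = probas.argmax(axis=1)              # [0, 0, 1]
hard_winner = np.bincount(votes).argmax()  # majority class

# Soft voting: average the probabilities, then take the most probable class
avg_proba = probas.mean(axis=0)            # about [0.383, 0.617]
soft_winner = avg_proba.argmax()

print("hard:", hard_winner, "soft:", soft_winner)  # hard: 0 soft: 1
```

Hard voting ignores how sure classifier C is. Soft voting lets its confidence outweigh two lukewarm votes for the other class.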

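**Key idea:** Use **diverse** models (e.g. a decision tree, an SVM, logistic regression) so that their errors are as weakly correlated as possible. Combining their votes then cancels out some individual mistakes and reduces variance. The ensemble itself learns nothing extra: fitting it just fits each base estimator, and the combination rule (majority or average) is fixed.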
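Why does diversity matter? A back-of-the-envelope calculation helps. It makes an idealised assumption that real models rarely meet: every classifier is correct with the same probability `p`, and their errors are **fully independent**. Under those conditions, the majority vote is correct whenever more than half the classifiers are correct. That probability follows the binomial distribution. The helper `majority_accuracy` is a hypothetical name used only for this sketch.

```python
from math import comb

def majority_accuracy(p: float, n: int) -> float:
    """P(majority of n independent classifiers is correct), each correct with prob p.

    Idealised: assumes n is odd (no ties) and fully independent errors.
    """
    need = n // 2 + 1  # smallest number of correct votes that wins
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))

for n in (1, 3, 5):
    print(n, round(majority_accuracy(0.7, n), 4))
# 1 0.7
# 3 0.784
# 5 0.8369
```

The gain shrinks as errors become correlated. If all three models make exactly the same mistakes, the vote is no better than a single model.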

## Concepts to Remember

| Concept | Description |
|--------|-------------|
| **voting** | 'hard' = majority class; 'soft' = average predicted probabilities (needs predict_proba). |
| **estimators** | List of (name, estimator) tuples; use diverse model types. |
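| **weights** | Optional per-estimator weights: they weight the class votes ('hard') or the predicted probabilities before averaging ('soft'). |
| **When to use** | Quick way to combine different algorithms; can improve over the best single model, but is not guaranteed to. |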
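One way to check what `weights` does with soft voting is to recompute the ensemble's probabilities by hand. This sketch assumes a synthetic dataset from `make_classification` and a Gaussian naive Bayes model, neither of which appears elsewhere in this notebook. The weights `[2, 1, 1]` are arbitrary. The fitted base models are read from `estimators_`, which holds fitted clones in the same order as `estimators`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

weights = [2, 1, 1]  # illustrative: trust logistic regression twice as much
clf = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("tree", DecisionTreeClassifier(max_depth=3, random_state=0)),
        ("nb", GaussianNB()),
    ],
    voting="soft",
    weights=weights,
).fit(X, y)

# Weighted average of each fitted base model's probabilities
probas = np.array([est.predict_proba(X) for est in clf.estimators_])  # (3, n_samples, 2)
manual = np.average(probas, axis=0, weights=weights)

print(np.allclose(manual, clf.predict_proba(X)))  # True
```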

In [1]:
import numpy as np
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

In [2]:
np.random.seed(42)
X = np.random.randn(300, 4)
y = (X[:, 0] ** 2 + X[:, 1] > 0).astype(int)

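X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# No global StandardScaler: the models that need scaling get their own
# scaler inside a Pipeline, so scaling the data here would be redundant.

In [3]:
estimators = [
    # Scale-sensitive models: the StandardScaler is fit on the training data inside each Pipeline
    ("lr", Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression(max_iter=1000))])),
    # Trees split on per-feature thresholds, so they don't need scaling
    ("tree", DecisionTreeClassifier(max_depth=4)),
    # probability=True enables predict_proba, which soft voting requires
    ("svc", Pipeline([("scale", StandardScaler()), ("clf", SVC(probability=True))])),
]
voting = VotingClassifier(estimators=estimators, voting="soft")
voting.fit(X_train, y_train)     # fits each base estimator on the raw training data
y_pred = voting.predict(X_test)  # class with the highest averaged probability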

print("Voting (soft) Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))

Voting (soft) Accuracy: 0.95
              precision    recall  f1-score   support

           0       0.87      0.93      0.90        14
           1       0.98      0.96      0.97        46

    accuracy                           0.95        60
   macro avg       0.92      0.94      0.93        60
weighted avg       0.95      0.95      0.95        60
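Does the ensemble actually beat its members? A natural check is to score each fitted base estimator alongside hard and soft voting on the same split. The sketch below rebuilds the data from the cells above. Two things are additions, not part of the original cells: the `random_state` values on the tree and SVC, and the use of `make_pipeline`. Both are there only for reproducibility and brevity. No accuracies are listed because they depend on the run. The last lines confirm that hard voting is simply the majority of the base predictions.

```python
import numpy as np
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Same synthetic data as in the cells above
np.random.seed(42)
X = np.random.randn(300, 4)
y = (X[:, 0] ** 2 + X[:, 1] > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

def make_estimators():
    # Fresh, unfitted estimators for each ensemble
    return [
        ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        ("tree", DecisionTreeClassifier(max_depth=4, random_state=42)),
        ("svc", make_pipeline(StandardScaler(), SVC(probability=True, random_state=42))),
    ]

hard = VotingClassifier(make_estimators(), voting="hard").fit(X_train, y_train)
soft = VotingClassifier(make_estimators(), voting="soft").fit(X_train, y_train)

# named_estimators_ maps each name to its fitted base estimator
for name, est in hard.named_estimators_.items():
    print(f"{name:>5}: {accuracy_score(y_test, est.predict(X_test)):.3f}")
print(f" hard: {accuracy_score(y_test, hard.predict(X_test)):.3f}")
print(f" soft: {accuracy_score(y_test, soft.predict(X_test)):.3f}")

# Hard voting = majority of the base predictions (3 voters, 2 classes, so no ties)
base_preds = np.array([est.predict(X_test) for est in hard.estimators_])
majority = (base_preds.sum(axis=0) >= 2).astype(int)
```

For the hard-voting ensemble alone, `probability=True` is unnecessary. It is kept here so one helper can build both ensembles.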



## Key Takeaways

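- **VotingClassifier** combines classifiers; **VotingRegressor** averages the predictions of regressors. For classifiers, **voting='soft'** is often better than 'hard' when every model supports **predict_proba** with reasonable probability estimates.
- For SVC to take part in soft voting, set **probability=True**. This fits Platt scaling using internal cross-validation, which makes training slower.
- Put preprocessing (e.g. scaling) in a per-estimator **Pipeline** so that each model needing scaled input gets a scaler fit only on its training data. Scale-invariant models such as trees can skip it.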
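The notebook only demonstrates the classifier, so here is a brief sketch of **VotingRegressor**. There are no votes or probabilities for regression; the ensemble returns the (optionally weighted) mean of the base regressors' predictions. The dataset from `make_regression` and the two base models are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=4, noise=10.0, random_state=0)

reg = VotingRegressor([
    ("lin", LinearRegression()),
    ("tree", DecisionTreeRegressor(max_depth=4, random_state=0)),
]).fit(X, y)

# With no weights, the ensemble prediction is the plain mean of the base predictions
base = np.array([est.predict(X) for est in reg.estimators_])  # shape (2, n_samples)
print(np.allclose(reg.predict(X), base.mean(axis=0)))  # True
```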