# Module 09: Voting Classifiers and Regressors

**Difficulty**: ⭐⭐
**Estimated Time**: 40 minutes
**Prerequisites**: 
- Module 00: Introduction to Ensemble Methods
- Module 01: Bagging and Bootstrap Aggregation
- Module 02: Random Forest

## Learning Objectives
By the end of this notebook, you will be able to:
1. Understand the difference between hard and soft voting
2. Implement VotingClassifier and VotingRegressor with scikit-learn
3. Combine different types of algorithms effectively
4. Tune voting weights for optimal performance
5. Know when voting ensembles help and when they don't
6. Compare voting with stacking and other ensemble methods

## 1. Introduction to Voting Ensembles

### What is Voting?

**Voting** is one of the simplest ensemble methods: several models each make a prediction, and the final output is obtained by combining those predictions.

### Types of Voting:

#### 1. **Hard Voting (Majority Voting)**

Each model casts a vote for a class, and the class with the most votes wins.

```
Example for 3-class classification:

Model 1:  Class A
Model 2:  Class B
Model 3:  Class A    } Hard Vote → Class A wins (3 votes to 2)
Model 4:  Class A
Model 5:  Class B

Final Prediction: Class A
```
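
The tally in this example can be reproduced in a few lines. This is a minimal sketch using the five votes listed above; `hard_vote` is a hypothetical helper name, not a scikit-learn function, and its tie-break rule is an assumption of the sketch.

```python
from collections import Counter

def hard_vote(votes):
    """Return the label with the most votes, plus the full tally.

    Ties go to the label seen first. That tie-break is an assumption of
    this sketch; scikit-learn's VotingClassifier breaks ties by class
    sort order instead.
    """
    tally = Counter(votes)
    winner, _ = tally.most_common(1)[0]
    return winner, dict(tally)

votes = ["A", "B", "A", "A", "B"]  # Models 1-5 from the example above
winner, tally = hard_vote(votes)
print(winner, tally)
```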

#### 2. **Soft Voting (Probability Averaging)**

Each model outputs class probability estimates, which are averaged across models. The class with the highest average probability wins.

```
Example for binary classification:

           Class 0   Class 1
Model 1:    0.4       0.6
Model 2:    0.3       0.7    } Average: 0.37, 0.63
Model 3:    0.4       0.6

Final Prediction: Class 1 (probability ≈ 0.63)
```

**Soft voting often performs better** because it uses more information (probabilities instead of just labels), provided the models' probability estimates are reasonably well calibrated.
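
To see what the extra information changes, here is a small sketch with made-up probabilities (hypothetical values, not taken from any dataset in this notebook) where hard and soft voting disagree: two models lean weakly toward class 0 and one is very confident in class 1.

```python
import numpy as np

# Rows = models, columns = [P(class 0), P(class 1)]; illustrative values only
probas = np.array([
    [0.55, 0.45],   # weakly prefers class 0
    [0.60, 0.40],   # weakly prefers class 0
    [0.10, 0.90],   # strongly prefers class 1
])

# Hard voting: each model votes for its most likely class, majority wins
votes = probas.argmax(axis=1)
hard_winner = np.bincount(votes, minlength=2).argmax()

# Soft voting: average the probabilities first, then pick the largest
avg_proba = probas.mean(axis=0)
soft_winner = avg_proba.argmax()

print("Hard vote:", hard_winner)
print("Average probabilities:", avg_proba.round(3))
print("Soft vote:", soft_winner)
```

Whether the soft-vote answer is the better one depends on how trustworthy the confident model's probabilities are, which is why calibration matters for soft voting.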

### Voting for Regression:

For regression, voting simply **averages the predictions** of all models:

```
Model 1 predicts: 45.2
Model 2 predicts: 47.8
Model 3 predicts: 46.1

Final prediction: (45.2 + 47.8 + 46.1) / 3 = 46.37
```
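
The same arithmetic in NumPy, as a minimal sketch. The weighted variant uses hypothetical weights chosen only to show how a weighted vote differs from the plain mean.

```python
import numpy as np

preds = np.array([45.2, 47.8, 46.1])  # the three model predictions above

# Equal-weight vote: plain mean
equal = preds.mean()

# Weighted vote: these weights are hypothetical, for illustration only
weights = np.array([1.0, 2.0, 1.0])
weighted = np.average(preds, weights=weights)

print(f"Equal weights: {equal:.2f}")
print(f"Weighted:      {weighted:.3f}")
```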

### Why Voting Works:

- **Reduces variance**: Individual model errors partly cancel out when averaged, as long as they are not strongly correlated
- **Robust**: Not dependent on any single model
- **Simple**: Easy to understand and implement
- **Diversity**: Different models make different errors
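
The variance and diversity points can be checked with a small simulation. This sketch assumes each model's error is a unit-variance Gaussian; the correlation level `rho = 0.9` is an arbitrary choice for illustration. For `n` models with pairwise error correlation `rho`, the variance of the averaged error is `rho + (1 - rho) / n`, so averaging helps a lot when errors are independent and very little when they are highly correlated.

```python
import numpy as np

rng = np.random.default_rng(0)
n_models, n_trials = 5, 100_000

# Independent errors: each model draws its own N(0, 1) error
independent = rng.normal(size=(n_trials, n_models))

# Correlated errors: a shared component plus a small private one.
# Each error still has variance 1; pairwise correlation is rho.
rho = 0.9
shared = rng.normal(size=(n_trials, 1))
private = rng.normal(size=(n_trials, n_models))
correlated = np.sqrt(rho) * shared + np.sqrt(1 - rho) * private

var_single = independent[:, 0].var()
var_avg_indep = independent.mean(axis=1).var()
var_avg_corr = correlated.mean(axis=1).var()

print(f"Single model error variance:      {var_single:.3f}")
print(f"Average of 5 independent models:  {var_avg_indep:.3f}")
print(f"Average of 5 correlated models:   {var_avg_corr:.3f}")
```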

## 2. Setup and Imports

In [None]:
# Standard libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Scikit-learn
from sklearn.datasets import make_classification, load_breast_cancer, load_diabetes
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.ensemble import (
    VotingClassifier, VotingRegressor,
    RandomForestClassifier, RandomForestRegressor,
    GradientBoostingClassifier, GradientBoostingRegressor
)
from sklearn.linear_model import LogisticRegression, Ridge
from sklearn.svm import SVC, SVR
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
from sklearn.metrics import (
    accuracy_score, classification_report, confusion_matrix,
    mean_squared_error, r2_score
)

# Boosting libraries
from xgboost import XGBClassifier, XGBRegressor
from lightgbm import LGBMClassifier, LGBMRegressor

# Configuration
%matplotlib inline
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (10, 6)

# Set random seed
np.random.seed(42)

## 3. Hard Voting Classifier

In [None]:
# Load dataset
cancer_data = load_breast_cancer()
X = cancer_data.data
y = cancer_data.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(f"Dataset: {X.shape}")
print(f"Train: {X_train.shape}, Test: {X_test.shape}")

In [None]:
# Create diverse classifiers
clf1 = LogisticRegression(random_state=42, max_iter=1000)
clf2 = RandomForestClassifier(n_estimators=100, random_state=42)
clf3 = GaussianNB()
clf4 = SVC(kernel='rbf', random_state=42)

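# Create hard voting classifier
# Note: with an even number of voters a tie is possible; scikit-learn then
# picks the tied class that comes first in ascending sort order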
hard_voting_clf = VotingClassifier(
    estimators=[('lr', clf1), ('rf', clf2), ('gnb', clf3), ('svc', clf4)],
    voting='hard'
)

# Train
hard_voting_clf.fit(X_train, y_train)

# Predict
y_pred_hard = hard_voting_clf.predict(X_test)
hard_voting_acc = accuracy_score(y_test, y_pred_hard)

print("Hard Voting Classifier:")
print(f"Test Accuracy: {hard_voting_acc:.4f}")

In [None]:
# Compare with individual classifiers
# (VotingClassifier fits clones of its estimators, so clf1..clf4 are still unfitted here)
print("\nIndividual Classifier Performance:")
print("=" * 50)

individual_scores = {}
for name, clf in [('LogisticRegression', clf1), ('RandomForest', clf2), 
                  ('GaussianNB', clf3), ('SVC', clf4)]:
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    individual_scores[name] = acc
    print(f"{name:20s}: {acc:.4f}")

print("=" * 50)
print(f"{'Hard Voting':20s}: {hard_voting_acc:.4f}")
print(f"\nAverage of individual scores: {np.mean(list(individual_scores.values())):.4f}")
print(f"Best individual score: {max(individual_scores.values()):.4f}")

## 4. Soft Voting Classifier

In [None]:
# For soft voting, all classifiers must support predict_proba
# SVC needs probability=True
clf1_soft = LogisticRegression(random_state=42, max_iter=1000)
clf2_soft = RandomForestClassifier(n_estimators=100, random_state=42)
clf3_soft = GaussianNB()
clf4_soft = SVC(kernel='rbf', probability=True, random_state=42)  # Enable probability

# Create soft voting classifier
soft_voting_clf = VotingClassifier(
    estimators=[('lr', clf1_soft), ('rf', clf2_soft), ('gnb', clf3_soft), ('svc', clf4_soft)],
    voting='soft'
)

# Train
soft_voting_clf.fit(X_train, y_train)

# Predict
y_pred_soft = soft_voting_clf.predict(X_test)
soft_voting_acc = accuracy_score(y_test, y_pred_soft)

print("Soft Voting Classifier:")
print(f"Test Accuracy: {soft_voting_acc:.4f}")
print(f"\nComparison:")
print(f"Hard Voting: {hard_voting_acc:.4f}")
print(f"Soft Voting: {soft_voting_acc:.4f}")
print(f"Difference (soft - hard): {soft_voting_acc - hard_voting_acc:+.4f}")
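
To look inside a fitted soft-voting ensemble, the sketch below inspects each member's probabilities and checks that the ensemble output is their plain mean. It is self-contained and uses a smaller, faster trio of models than the cell above; the name `demo_clf`, the choice of models, and the scaling pipeline around logistic regression are assumptions of this sketch, not part of the original notebook.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Hypothetical demo ensemble (models chosen only to keep the sketch fast)
demo_clf = VotingClassifier(
    estimators=[
        ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        ("gnb", GaussianNB()),
        ("tree", DecisionTreeClassifier(max_depth=3, random_state=42)),
    ],
    voting="soft",
)
demo_clf.fit(X_tr, y_tr)

# Per-model probabilities for the first 3 test samples
for name, est in demo_clf.named_estimators_.items():
    print(name, est.predict_proba(X_te[:3]).round(3))

# With no weights given, the ensemble probabilities should equal the mean
manual_avg = np.mean(
    [est.predict_proba(X_te) for est in demo_clf.estimators_], axis=0
)
matches = np.allclose(manual_avg, demo_clf.predict_proba(X_te))
print("Ensemble equals mean of members:", matches)
```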

In [None]:
# Visualize comparison
methods = ['LogReg', 'RF', 'GNB', 'SVC', 'Hard Vote', 'Soft Vote']
scores = list(individual_scores.values()) + [hard_voting_acc, soft_voting_acc]

plt.figure(figsize=(12, 6))
colors = ['skyblue'] * 4 + ['orange', 'green']
bars = plt.bar(methods, scores, color=colors, alpha=0.7)

# Highlight voting methods
bars[4].set_edgecolor('darkorange')
bars[4].set_linewidth(2)
bars[5].set_edgecolor('darkgreen')
bars[5].set_linewidth(2)

plt.ylabel('Accuracy')
plt.title('Voting Ensemble vs Individual Classifiers')
plt.ylim([min(scores) - 0.01, 1.0])
plt.xticks(rotation=45)
plt.grid(axis='y', alpha=0.3)

for i, (method, score) in enumerate(zip(methods, scores)):
    plt.text(i, score + 0.002, f'{score:.4f}', ha='center', va='bottom', fontsize=9)

plt.tight_layout()
plt.show()

## 5. Weighted Voting

We can assign different weights to classifiers based on their performance or reliability.

In [None]:
# Derive heuristic weights from cross-validation scores
# (performance-based, not guaranteed to be optimal)
print("Calculating performance-based weights using cross-validation...\n")

weights = []
for name, clf in [('lr', clf1_soft), ('rf', clf2_soft), ('gnb', clf3_soft), ('svc', clf4_soft)]:
    cv_scores = cross_val_score(clf, X_train, y_train, cv=5, scoring='accuracy')
    mean_score = cv_scores.mean()
    weights.append(mean_score)
    print(f"{name:5s}: CV Accuracy = {mean_score:.4f}")

# Normalize weights (for display only; a weighted average does not depend on their scale)
weights = np.array(weights)
weights_normalized = weights / weights.sum()

print(f"\nNormalized Weights: {weights_normalized}")

In [None]:
# Create weighted voting classifier
weighted_voting_clf = VotingClassifier(
    estimators=[('lr', clf1_soft), ('rf', clf2_soft), ('gnb', clf3_soft), ('svc', clf4_soft)],
    voting='soft',
    weights=weights.tolist()  # Raw CV scores; equivalent to the normalized weights
)

# Train
weighted_voting_clf.fit(X_train, y_train)

# Evaluate
y_pred_weighted = weighted_voting_clf.predict(X_test)
weighted_voting_acc = accuracy_score(y_test, y_pred_weighted)

print("\nWeighted Voting Results:")
print(f"Equal weights (soft voting):    {soft_voting_acc:.4f}")
print(f"Performance-based weights:      {weighted_voting_acc:.4f}")
print(f"Improvement: {weighted_voting_acc - soft_voting_acc:.4f}")

## 6. Voting with Boosting Algorithms

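Next we build a voting ensemble from tree-based ensembles: a random forest (a bagging method, included for contrast) and three gradient boosting implementations. Because these models are closely related, expect smaller gains than from mixing algorithm families.

In [None]:
# Create ensemble of tree-based models (random forest + three boosting libraries)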
boosting_ensemble = VotingClassifier(
    estimators=[
        ('rf', RandomForestClassifier(n_estimators=100, random_state=42)),
        ('gb', GradientBoostingClassifier(n_estimators=100, random_state=42)),
        ('xgb', XGBClassifier(n_estimators=100, random_state=42, eval_metric='logloss')),
        ('lgbm', LGBMClassifier(n_estimators=100, random_state=42, verbose=-1))
    ],
    voting='soft'
)

# Train
boosting_ensemble.fit(X_train, y_train)

# Evaluate
y_pred_boosting = boosting_ensemble.predict(X_test)
boosting_ensemble_acc = accuracy_score(y_test, y_pred_boosting)

print("Boosting Ensemble (Voting):")
print(f"Test Accuracy: {boosting_ensemble_acc:.4f}")

In [None]:
# Compare individual boosting models
print("\nIndividual Boosting Model Performance:")
print("=" * 50)

boosting_models = [
    ('RandomForest', RandomForestClassifier(n_estimators=100, random_state=42)),
    ('GradientBoosting', GradientBoostingClassifier(n_estimators=100, random_state=42)),
    ('XGBoost', XGBClassifier(n_estimators=100, random_state=42, eval_metric='logloss')),
    ('LightGBM', LGBMClassifier(n_estimators=100, random_state=42, verbose=-1))
]

boosting_scores = {}
for name, model in boosting_models:
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    boosting_scores[name] = acc
    print(f"{name:20s}: {acc:.4f}")

print("=" * 50)
print(f"{'Voting Ensemble':20s}: {boosting_ensemble_acc:.4f}")
print(f"\nBest individual: {max(boosting_scores.values()):.4f}")
print(f"Improvement over best: {boosting_ensemble_acc - max(boosting_scores.values()):.4f}")

## 7. Voting Regressor

In [None]:
# Load regression dataset
diabetes = load_diabetes()
X_reg = diabetes.data
y_reg = diabetes.target

X_train_reg, X_test_reg, y_train_reg, y_test_reg = train_test_split(
    X_reg, y_reg, test_size=0.2, random_state=42
)

print(f"Regression dataset: {X_reg.shape}")
print(f"Train: {X_train_reg.shape}, Test: {X_test_reg.shape}")

In [None]:
# Create diverse regressors
reg1 = Ridge(alpha=1.0)
reg2 = RandomForestRegressor(n_estimators=100, random_state=42)
reg3 = GradientBoostingRegressor(n_estimators=100, random_state=42)
reg4 = SVR(kernel='rbf')

# Create voting regressor
voting_reg = VotingRegressor(
    estimators=[('ridge', reg1), ('rf', reg2), ('gb', reg3), ('svr', reg4)]
)

# Train
voting_reg.fit(X_train_reg, y_train_reg)

# Predict
y_pred_voting = voting_reg.predict(X_test_reg)
voting_r2 = r2_score(y_test_reg, y_pred_voting)
voting_mse = mean_squared_error(y_test_reg, y_pred_voting)

print("Voting Regressor:")
print(f"R² Score: {voting_r2:.4f}")
print(f"MSE: {voting_mse:.2f}")

In [None]:
# Compare with individual regressors
print("\nIndividual Regressor Performance:")
print("=" * 60)

reg_scores = {}
for name, model in [('Ridge', reg1), ('RandomForest', reg2), 
                    ('GradientBoosting', reg3), ('SVR', reg4)]:
    model.fit(X_train_reg, y_train_reg)
    y_pred = model.predict(X_test_reg)
    r2 = r2_score(y_test_reg, y_pred)
    mse = mean_squared_error(y_test_reg, y_pred)
    reg_scores[name] = r2
    print(f"{name:20s}: R² = {r2:.4f}, MSE = {mse:.2f}")

print("=" * 60)
print(f"{'Voting Regressor':20s}: R² = {voting_r2:.4f}, MSE = {voting_mse:.2f}")
print(f"\nBest individual R²: {max(reg_scores.values()):.4f}")
print(f"Improvement: {voting_r2 - max(reg_scores.values()):.4f}")

In [None]:
# Visualize predictions
fig, axes = plt.subplots(2, 3, figsize=(18, 10))
axes = axes.ravel()

# Individual models
models = [('Ridge', reg1), ('RF', reg2), ('GB', reg3), ('SVR', reg4)]
for idx, (name, model) in enumerate(models):
    y_pred = model.predict(X_test_reg)
    r2 = r2_score(y_test_reg, y_pred)
    
    axes[idx].scatter(y_test_reg, y_pred, alpha=0.5)
    axes[idx].plot([y_test_reg.min(), y_test_reg.max()],
                   [y_test_reg.min(), y_test_reg.max()],
                   'r--', linewidth=2)
    axes[idx].set_xlabel('Actual')
    axes[idx].set_ylabel('Predicted')
    axes[idx].set_title(f'{name} (R² = {r2:.4f})')

# Voting ensemble
axes[4].scatter(y_test_reg, y_pred_voting, alpha=0.5, color='green')
axes[4].plot([y_test_reg.min(), y_test_reg.max()],
             [y_test_reg.min(), y_test_reg.max()],
             'r--', linewidth=2)
axes[4].set_xlabel('Actual')
axes[4].set_ylabel('Predicted')
axes[4].set_title(f'Voting Ensemble (R² = {voting_r2:.4f})', fontweight='bold')

# Hide last subplot
axes[5].axis('off')

plt.tight_layout()
plt.show()

## 8. When Voting Helps and When It Doesn't

In [None]:
# Experiment 1: Voting with similar models (should not help much)
print("Experiment 1: Voting with Similar Models")
print("=" * 60)

# Multiple random forests with different random seeds
similar_models = VotingClassifier(
    estimators=[
        ('rf1', RandomForestClassifier(n_estimators=100, random_state=1)),
        ('rf2', RandomForestClassifier(n_estimators=100, random_state=2)),
        ('rf3', RandomForestClassifier(n_estimators=100, random_state=3)),
        ('rf4', RandomForestClassifier(n_estimators=100, random_state=4))
    ],
    voting='soft'
)

similar_models.fit(X_train, y_train)
similar_acc = similar_models.score(X_test, y_test)

# Single random forest for comparison
single_rf = RandomForestClassifier(n_estimators=100, random_state=42)
single_rf.fit(X_train, y_train)
single_acc = single_rf.score(X_test, y_test)

print(f"Single Random Forest:           {single_acc:.4f}")
print(f"Voting (4 Random Forests):      {similar_acc:.4f}")
print(f"Difference (voting - single): {similar_acc - single_acc:+.4f}")
print("\nInterpretation: expect little or no gain here, since the forests are too similar.\n")

In [None]:
# Experiment 2: Voting with diverse models (should help)
print("Experiment 2: Voting with Diverse Models")
print("=" * 60)

# Diverse algorithms
diverse_models = VotingClassifier(
    estimators=[
        ('lr', LogisticRegression(random_state=42, max_iter=1000)),
        ('rf', RandomForestClassifier(n_estimators=100, random_state=42)),
        ('xgb', XGBClassifier(n_estimators=100, random_state=42, eval_metric='logloss')),
        ('knn', KNeighborsClassifier(n_neighbors=5))
    ],
    voting='soft'
)

diverse_models.fit(X_train, y_train)
diverse_acc = diverse_models.score(X_test, y_test)

# Best individual from diverse set
best_individual = max([
    ('LogReg', LogisticRegression(random_state=42, max_iter=1000)),
    ('RF', RandomForestClassifier(n_estimators=100, random_state=42)),
    ('XGB', XGBClassifier(n_estimators=100, random_state=42, eval_metric='logloss')),
    ('KNN', KNeighborsClassifier(n_neighbors=5))
], key=lambda x: cross_val_score(x[1], X_train, y_train, cv=3).mean())

best_individual[1].fit(X_train, y_train)
best_acc = best_individual[1].score(X_test, y_test)

print(f"Best individual model:          {best_acc:.4f}")
print(f"Voting (diverse models):        {diverse_acc:.4f}")
print(f"Difference (voting - best individual): {diverse_acc - best_acc:+.4f}")
print("\nInterpretation: diverse models can help, but judge by the numbers above; a single test split is noisy.")

## 9. Exercises

### Exercise 1: Custom Voting Ensemble

Create a voting classifier with at least 5 diverse models. Compare hard voting vs soft voting. Which performs better?

In [None]:
# Your code here


### Exercise 2: Optimal Weight Finding

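Create a voting classifier with 3-4 models. Experiment with different weight combinations to find the weights that maximize validation (or cross-validated) accuracy, then report test accuracy once for the chosen weights. Compare with equal weights. Avoid tuning weights directly on the test set, since that gives an optimistic estimate.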

In [None]:
# Your code here


### Exercise 3: Voting vs Stacking

Using the same base models, compare:
1. Soft voting
2. Stacking with logistic regression meta-learner

Which performs better? Why?

In [None]:
# Your code here


### Exercise 4: Weighted Voting Regressor

Create a voting regressor with weights based on cross-validation R² scores. Compare performance with equal-weight voting.

In [None]:
# Your code here


### Exercise 5: Diversity Analysis

Create two voting ensembles:
1. 5 similar models (e.g., decision trees with different max_depth)
2. 5 diverse models (different algorithm families)

Measure the correlation between predictions. Which ensemble has lower correlation? Which performs better?

In [None]:
# Your code here


## 10. Summary

In this notebook, you learned about Voting Ensembles:

### Key Concepts:

1. **Hard Voting**:
   - Each model votes for a class
   - Majority wins
   - Simple but less informative

2. **Soft Voting**:
   - Models output probability estimates
   - Probabilities are averaged
   - Often performs better than hard voting when probabilities are well calibrated
   - Requires `predict_proba` support

3. **Weighted Voting**:
   - Different models get different weights
   - Weights based on performance or confidence
   - Can help when the models differ clearly in quality; gains are often small otherwise

4. **Voting for Regression**:
   - Simply averages predictions
   - Can use weights for better results

### When Voting Works Best:

✅ **Models are diverse** (different algorithms, different assumptions)
✅ **Models have similar performance** (no one model dominates)
✅ **Errors are uncorrelated** (models make different mistakes)
✅ **Simplicity is valued** (voting is easy to understand and implement)
✅ **Computational budget allows** training multiple models

### When Voting Doesn't Help:

❌ **Models are too similar** (e.g., multiple random forests)
❌ **One model is much better** than others
❌ **Models are highly correlated** (make same mistakes)
❌ **Too few diverse models** (at least 3-5 needed)
❌ **Models are poorly tuned** (garbage in, garbage out)
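
One way to check the "uncorrelated errors" condition is to correlate the 0/1 error indicators of each model. This is a minimal sketch on hand-made labels (hypothetical, not from the notebook's data); `error_correlation` is an illustrative helper name, not a library function.

```python
import numpy as np

def error_correlation(y_true, predictions):
    """Pearson correlation between the 0/1 error indicators of each model.

    `predictions` maps a model name to its predicted labels. Values near 1
    mean two models fail on the same samples; values near 0 or below mean
    their mistakes mostly fall on different samples.
    Caveat: a model with no errors has zero variance and yields NaN.
    """
    names = list(predictions)
    errors = np.array([predictions[n] != y_true for n in names], dtype=float)
    return names, np.corrcoef(errors)

# Tiny hand-made example
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
preds = {
    "model_a": np.array([1, 0, 0, 0, 1, 1, 1, 0]),  # wrong on samples 0 and 7
    "model_b": np.array([1, 0, 0, 0, 1, 1, 1, 0]),  # same mistakes as model_a
    "model_c": np.array([0, 0, 1, 0, 1, 0, 1, 1]),  # wrong on samples 2 and 5
}
names, corr = error_correlation(y_true, preds)
print(names)
print(corr.round(2))
```

Here `model_a` and `model_b` add nothing to each other in a vote, while `model_c` errs on different samples and can outvote or be outvoted on them.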

### Best Practices:

1. **Use Diverse Models**:
   - Mix algorithm families (linear, tree, distance-based, etc.)
   - Avoid multiple instances of the same algorithm
   - Example: LogReg + RF + XGBoost + SVM + KNN

2. **Prefer Soft Voting**:
   - Uses more information (probabilities)
   - Often outperforms hard voting
   - Ensure all models support `predict_proba`

3. **Tune Weights**:
   - Use cross-validation scores as weights
   - Or optimize weights on validation set
   - Equal weights work well when models perform similarly

4. **Check if It Helps**:
   - Compare with best individual model
   - If improvement < 0.5%, may not be worth complexity
   - Use cross-validation for robust comparison
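
The "check if it helps" step could look like the following sketch, which scores each base model and the voting ensemble on the same cross-validation folds. The model choices, the scaling pipeline, and the names `base_models`, `candidates` and `results` are assumptions made for this example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Hypothetical base models, kept small so the comparison runs quickly
base_models = {
    "lr": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "rf": RandomForestClassifier(n_estimators=50, random_state=42),
    "gnb": GaussianNB(),
}
candidates = dict(base_models)
candidates["soft_vote"] = VotingClassifier(
    estimators=list(base_models.items()), voting="soft"
)

# Use the same folds for every candidate so the scores are comparable
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
results = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    results[name] = (scores.mean(), scores.std())
    print(f"{name:10s}: {scores.mean():.4f} +/- {scores.std():.4f}")
```

If the ensemble's mean score sits within the fold-to-fold spread of the best single model, the extra complexity is hard to justify.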

### Comparison with Other Ensemble Methods:

| Method | Complexity | Performance | Interpretability | Training Time |
|--------|-----------|-------------|------------------|---------------|
| **Voting** | Low | Good | High | Moderate |
| **Bagging** | Low | Good | Medium | Fast |
| **Boosting** | Medium | Excellent | Low | Moderate |
| **Stacking** | High | Excellent | Low | Slow |

### Voting vs Stacking:

**Voting**:
- Simple weighted average
- Faster to train
- More interpretable
- Good when models perform similarly

**Stacking**:
- Meta-learner learns optimal combination
- Can discover non-linear combinations
- Often higher performance
- More complex, requires cross-validation

### Practical Applications:

1. **Quick Ensemble**: When you need fast improvement over single model
2. **Model Robustness**: Reduce dependence on single model
3. **Kaggle Competitions**: Simple baseline before complex ensembling
4. **Production Systems**: Easy to understand and maintain

### What's Next?

In the next module, we'll do a **Comprehensive Comparison** of all ensemble methods:
- Performance benchmarks on multiple datasets
- Speed/accuracy trade-offs
- Decision framework for choosing the right method
- Best practices for production deployment

This will help you choose the right ensemble method for your specific problem!

## Additional Resources

- [Scikit-learn Voting Classifier](https://scikit-learn.org/stable/modules/ensemble.html#voting-classifier)
- [Ensemble Learning Tutorial](https://machinelearningmastery.com/voting-ensembles-with-python/)
- [Combining Classifiers](https://sebastianraschka.com/Articles/2014_ensemble_classifier.html)
- [Weighted Voting Strategies](https://www.sciencedirect.com/science/article/pii/S0031320315001831)