# Module 09: Voting Classifiers and Regressors

**Difficulty**: ⭐⭐⭐ Advanced
**Estimated Time**: 70 minutes
**Prerequisites**: 
- Module 02: Random Forests
- Module 05: XGBoost
- Module 08: Stacking and Blending

## Learning Objectives

By the end of this notebook, you will be able to:
1. Understand voting as the simplest ensemble combination method
2. Implement hard voting and soft voting for classification
3. Apply averaging for regression ensembles
4. Find optimal weights for weighted voting using grid search
5. Compare voting with stacking approaches
6. Combine models from different families effectively
7. Determine when voting is preferable to more complex ensembles
8. Apply voting in real-world scenarios

## Setup and Configuration

In [None]:
# Core libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import time
import warnings
from itertools import product
warnings.filterwarnings('ignore')

# Machine learning
from sklearn.datasets import load_breast_cancer, load_diabetes, make_classification
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from sklearn.metrics import (
    accuracy_score, classification_report, confusion_matrix,
    roc_auc_score, roc_curve, log_loss,
    mean_squared_error, r2_score, mean_absolute_error
)

# Voting ensembles
from sklearn.ensemble import VotingClassifier, VotingRegressor

# Base models
from sklearn.ensemble import (
    RandomForestClassifier, RandomForestRegressor,
    GradientBoostingClassifier, GradientBoostingRegressor,
    AdaBoostClassifier, AdaBoostRegressor
)
from sklearn.linear_model import LogisticRegression, Ridge, Lasso
from sklearn.svm import SVC, SVR
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
from sklearn.naive_bayes import GaussianNB

# Gradient boosting libraries
try:
    import xgboost as xgb
    XGB_AVAILABLE = True
except ImportError:
    XGB_AVAILABLE = False

try:
    import lightgbm as lgb
    LGB_AVAILABLE = True
except ImportError:
    LGB_AVAILABLE = False

# Configuration
%matplotlib inline
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette('husl')

# Reproducibility
RANDOM_STATE = 42
np.random.seed(RANDOM_STATE)

# Display settings
pd.set_option('display.max_columns', None)
pd.set_option('display.precision', 4)

print("\nSetup complete! All libraries imported successfully.")

## 1. What is Voting?

### Voting = Democratic Decision Making

Voting is the simplest ensemble combination method:
- Train multiple independent models
- Each model "votes" on the prediction
- Final prediction based on majority or average

### Types of Voting

#### 1.1 Hard Voting (Classification)

Majority vote from predicted class labels:

```
Sample X:
  Model A predicts: Class 1
  Model B predicts: Class 0
  Model C predicts: Class 1
  
Final prediction: Class 1 (majority)
```

**Formula**:
$$\hat{y} = \text{mode}(h_1(x), h_2(x), ..., h_n(x))$$
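
As a minimal sketch of the formula above, the majority vote can be computed directly with NumPy. The helper name `hard_vote` and the vote matrix are illustrative assumptions, not part of scikit-learn; the first column reproduces the three-model example.

```python
import numpy as np

def hard_vote(label_matrix):
    """Majority vote per sample (hypothetical helper for illustration).

    label_matrix: integer array of shape (n_models, n_samples) holding
    each model's predicted class label.
    """
    label_matrix = np.asarray(label_matrix)
    n_classes = label_matrix.max() + 1
    # Count the votes for each class, one column (sample) at a time.
    counts = np.apply_along_axis(np.bincount, 0, label_matrix, minlength=n_classes)
    # argmax breaks ties in favour of the lowest class index.
    return counts.argmax(axis=0)

votes = np.array([
    [1, 0, 1],  # Model A
    [0, 0, 1],  # Model B
    [1, 1, 0],  # Model C
])
hard_pred_demo = hard_vote(votes)
print(hard_pred_demo)
```

With an even number of models, ties are possible; this sketch resolves them toward the lowest class index.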

#### 1.2 Soft Voting (Classification)

Average of predicted probabilities:

```
Sample X:
  Model A predicts: [0.4, 0.6] → Class 1 with 60%
  Model B predicts: [0.7, 0.3] → Class 0 with 70%
  Model C predicts: [0.3, 0.7] → Class 1 with 70%
  
Average: [0.467, 0.533]
Final prediction: Class 1 (higher probability)
```

**Formula**:
$$\hat{y} = \arg\max_c \frac{1}{n} \sum_{i=1}^n p_i(c|x)$$

**Soft voting is often better**, provided the base models output reasonably calibrated probabilities, because it:
- Uses more information (probabilities vs hard decisions)
- Accounts for model confidence
- Smoother decision boundaries
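
The worked example above can be reproduced in a few lines of NumPy. The second array is a made-up case, chosen only to show that soft and hard voting can disagree when one model is much more confident than the others.

```python
import numpy as np

# Rows: models A, B, C. Columns: P(class 0), P(class 1) for one sample.
probas = np.array([
    [0.4, 0.6],
    [0.7, 0.3],
    [0.3, 0.7],
])
avg_proba = probas.mean(axis=0)
soft_label = int(avg_proba.argmax())
print("Average:", avg_proba.round(3), "-> class", soft_label)

# Hypothetical case: two models lean weakly towards class 1,
# while one model is very confident in class 0.
disagree = np.array([
    [0.45, 0.55],
    [0.45, 0.55],
    [0.90, 0.10],
])
hard_label = int(np.bincount(disagree.argmax(axis=1)).argmax())  # majority of labels
soft_label_2 = int(disagree.mean(axis=0).argmax())               # argmax of mean probability
print("Hard vote:", hard_label, "| Soft vote:", soft_label_2)
```

In the second case the majority of labels is class 1, but the averaged probabilities favour class 0. Whether that is desirable depends on how trustworthy the confident model's probabilities are.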

#### 1.3 Averaging (Regression)

Simple average of predictions:

```
Sample X:
  Model A predicts: 105.3
  Model B predicts: 98.7
  Model C predicts: 102.1
  
Final prediction: (105.3 + 98.7 + 102.1) / 3 = 102.0
```

### Weighted Voting

Assign different weights to models:

$$\hat{y} = \arg\max_c \sum_{i=1}^n w_i \cdot p_i(c|x)$$

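where $w_i \ge 0$ is the weight for model $i$ and $\sum_i w_i = 1$. Only the relative sizes of the weights affect the prediction, so unnormalized weights such as 1, 2, 3 select the same class as their normalized versions.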

Better models get higher weights!
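
A small sketch of the weighted formula, reusing the three probability vectors from the soft-voting example. The raw weights `[1, 3, 1]` are an arbitrary assumption chosen so that the up-weighted Model B flips the result.

```python
import numpy as np

probas = np.array([
    [0.4, 0.6],  # Model A
    [0.7, 0.3],  # Model B
    [0.3, 0.7],  # Model C
])
raw_weights = np.array([1.0, 3.0, 1.0])  # hypothetical: trust Model B most

# Normalize so the weights sum to 1, then take the weighted sum per class.
w = raw_weights / raw_weights.sum()
weighted_proba = w @ probas
weighted_label = int(weighted_proba.argmax())

# np.average normalizes internally, so raw weights give the same result.
via_average = np.average(probas, axis=0, weights=raw_weights)

print("Weighted probabilities:", weighted_proba.round(3), "-> class", weighted_label)
```

With equal weights this sample was assigned to class 1; giving Model B three times the weight moves it to class 0.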

### Voting vs Stacking

| Aspect | Voting | Stacking |
|--------|--------|----------|
| **Complexity** | Simple | Complex |
| **Training** | Base models in parallel | Base models, then meta-learner |
| **Combination** | Fixed rule | Learned |
| **Overfitting risk** | Low | Higher |
| **Flexibility** | Limited | High |
| **Performance** | Good | Often better |
| **Interpretability** | High | Lower |

### When to Use Voting

**Best for**:
- Simple, interpretable ensembles
- Production systems (easier to deploy)
- Limited training data
- Models are already strong
- Want robustness over complexity

**Consider stacking if**:
- Need maximum accuracy
- Sufficient training data
- Complexity acceptable
- Models have complex interactions

In [None]:
# Load classification dataset
cancer_data = load_breast_cancer()
X, y = cancer_data.data, cancer_data.target
feature_names = cancer_data.feature_names

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=RANDOM_STATE
)

print(f"Dataset: {len(X_train)} train, {len(X_test)} test, {X.shape[1]} features")
print(f"Classes: {np.unique(y)}, Distribution: {np.bincount(y)}")

## 2. Hard Voting vs Soft Voting

In [None]:
# Define diverse base models
base_classifiers = [
    ('rf', RandomForestClassifier(n_estimators=100, random_state=RANDOM_STATE)),
    ('gb', GradientBoostingClassifier(n_estimators=100, random_state=RANDOM_STATE)),
    ('svm', SVC(probability=True, random_state=RANDOM_STATE)),  # probability=True for soft voting
    ('knn', KNeighborsClassifier(n_neighbors=5))
]

print("Base classifiers:")
for name, clf in base_classifiers:
    print(f"  - {name}: {type(clf).__name__}")

In [None]:
# Hard Voting
print("\nTraining Hard Voting Classifier...")
hard_voting = VotingClassifier(
    estimators=base_classifiers,
    voting='hard'
)
hard_voting.fit(X_train, y_train)
hard_pred = hard_voting.predict(X_test)
hard_acc = accuracy_score(y_test, hard_pred)

print(f"Hard Voting Accuracy: {hard_acc:.4f}")

In [None]:
# Soft Voting
print("\nTraining Soft Voting Classifier...")
soft_voting = VotingClassifier(
    estimators=base_classifiers,
    voting='soft'
)
soft_voting.fit(X_train, y_train)
soft_pred = soft_voting.predict(X_test)
soft_proba = soft_voting.predict_proba(X_test)
soft_acc = accuracy_score(y_test, soft_pred)
soft_auc = roc_auc_score(y_test, soft_proba[:, 1])

print(f"Soft Voting Accuracy: {soft_acc:.4f}")
print(f"Soft Voting AUC-ROC: {soft_auc:.4f}")
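
To make the soft-voting mechanics concrete, the sketch below checks that the `predict_proba` of a soft `VotingClassifier` matches the plain average of its fitted members' probabilities. It uses a small synthetic dataset and three lightweight models (assumptions made for speed, not the notebook's models); the `_demo` names are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)

vote_demo = VotingClassifier(
    estimators=[
        ('lr', LogisticRegression(max_iter=1000)),
        ('nb', GaussianNB()),
        ('dt', DecisionTreeClassifier(max_depth=3, random_state=0)),
    ],
    voting='soft',
).fit(X_demo, y_demo)

# estimators_ holds the fitted clones of the base models.
manual_proba = np.mean(
    [est.predict_proba(X_demo) for est in vote_demo.estimators_], axis=0
)
matches = np.allclose(manual_proba, vote_demo.predict_proba(X_demo))
print("Manual average matches VotingClassifier:", matches)
```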

In [None]:
# Compare with individual models and voting methods
print("\n" + "=" * 70)
print("Performance Comparison")
print("=" * 70)

results = []

# Individual models
for name, clf in base_classifiers:
    clf.fit(X_train, y_train)
    pred = clf.predict(X_test)
    acc = accuracy_score(y_test, pred)
    
    if hasattr(clf, 'predict_proba'):
        proba = clf.predict_proba(X_test)[:, 1]
        auc = roc_auc_score(y_test, proba)
    else:
        auc = np.nan
    
    results.append({'Model': name.upper(), 'Type': 'Base', 'Accuracy': acc, 'AUC': auc})

# Voting ensembles
results.append({'Model': 'HARD VOTING', 'Type': 'Ensemble', 'Accuracy': hard_acc, 'AUC': np.nan})
results.append({'Model': 'SOFT VOTING', 'Type': 'Ensemble', 'Accuracy': soft_acc, 'AUC': soft_auc})

df_results = pd.DataFrame(results)
print(df_results.to_string(index=False))

# Best base model
best_base_acc = df_results[df_results['Type'] == 'Base']['Accuracy'].max()
improvement_hard = (hard_acc - best_base_acc) * 100
improvement_soft = (soft_acc - best_base_acc) * 100

print(f"\nHard Voting vs best base model: {improvement_hard:+.2f} percentage points")
print(f"Soft Voting vs best base model: {improvement_soft:+.2f} percentage points")
print("\n💡 Soft Voting often performs better than Hard Voting")
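
The hard-voting row reports AUC as NaN because a hard `VotingClassifier` only combines labels and does not expose `predict_proba`. A minimal check of that behaviour, on synthetic data with two illustrative models (assumptions, not the notebook's ensemble):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

X_demo, y_demo = make_classification(n_samples=200, n_features=6, random_state=0)
estimators_demo = [
    ('lr', LogisticRegression(max_iter=1000)),
    ('nb', GaussianNB()),
]

hard_demo = VotingClassifier(estimators=estimators_demo, voting='hard').fit(X_demo, y_demo)
soft_demo = VotingClassifier(estimators=estimators_demo, voting='soft').fit(X_demo, y_demo)

# Probability output exists only for soft voting.
hard_has_proba = hasattr(hard_demo, 'predict_proba')
soft_has_proba = hasattr(soft_demo, 'predict_proba')
print("hard:", hard_has_proba, "| soft:", soft_has_proba)
```

If a ranking metric such as AUC-ROC matters, soft voting (or another probability-producing ensemble) is the natural choice.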

In [None]:
# Visualize comparison
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Accuracy
models_base = df_results[df_results['Type'] == 'Base']['Model'].tolist()
acc_base = df_results[df_results['Type'] == 'Base']['Accuracy'].tolist()
models_ensemble = df_results[df_results['Type'] == 'Ensemble']['Model'].tolist()
acc_ensemble = df_results[df_results['Type'] == 'Ensemble']['Accuracy'].tolist()

x_base = np.arange(len(models_base))
x_ensemble = np.arange(len(models_base), len(models_base) + len(models_ensemble))

axes[0].bar(x_base, acc_base, color='steelblue', edgecolor='black', label='Base Models')
axes[0].bar(x_ensemble, acc_ensemble, color=['#e74c3c', '#2ecc71'], 
            edgecolor='black', label='Voting')
axes[0].set_xticks(range(len(df_results)))
axes[0].set_xticklabels(df_results['Model'], rotation=45, ha='right')
axes[0].set_ylabel('Accuracy', fontsize=12)
axes[0].set_title('Model Accuracy Comparison', fontsize=13, fontweight='bold')
axes[0].legend(fontsize=10)
axes[0].grid(axis='y', alpha=0.3)
axes[0].set_ylim([0.9, 1.0])

# AUC (only for models with predict_proba)
df_with_auc = df_results.dropna(subset=['AUC'])
axes[1].bar(range(len(df_with_auc)), df_with_auc['AUC'], 
            color=['steelblue'] * (len(df_with_auc) - 1) + ['#2ecc71'],
            edgecolor='black')
axes[1].set_xticks(range(len(df_with_auc)))
axes[1].set_xticklabels(df_with_auc['Model'], rotation=45, ha='right')
axes[1].set_ylabel('AUC-ROC', fontsize=12)
axes[1].set_title('Model AUC Comparison', fontsize=13, fontweight='bold')
axes[1].grid(axis='y', alpha=0.3)
axes[1].set_ylim([0.9, 1.0])

plt.tight_layout()
plt.show()

## 3. Weighted Voting

Assign different weights to models based on their performance.

In [None]:
# Evaluate individual model performance to determine weights
print("Evaluating individual models with cross-validation...\n")

cv_scores = {}
for name, clf in base_classifiers:
    scores = cross_val_score(clf, X_train, y_train, cv=5, scoring='accuracy')
    cv_scores[name] = scores.mean()
    print(f"{name:10s} - CV Accuracy: {scores.mean():.4f} (+/- {scores.std():.4f})")

# Convert scores to weights (normalize to sum to 1)
total_score = sum(cv_scores.values())
weights = [cv_scores[name] / total_score for name, _ in base_classifiers]

print(f"\nWeights (normalized):")
for (name, _), weight in zip(base_classifiers, weights):
    print(f"  {name:10s}: {weight:.4f}")
print(f"  Sum: {sum(weights):.4f}")
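
One caveat with normalizing raw accuracies: when the models score similarly, the resulting weights are almost equal. The sketch below uses made-up CV accuracies to compare that scheme with inverse-error weighting, one possible alternative that stretches small accuracy gaps. The numbers are purely illustrative.

```python
import numpy as np

# Hypothetical CV accuracies for four models (illustrative numbers only).
cv_acc_demo = np.array([0.96, 0.95, 0.92, 0.93])

# Scheme 1: weights proportional to accuracy.
prop_w = cv_acc_demo / cv_acc_demo.sum()

# Scheme 2: weights proportional to 1 / error rate.
# Undefined if any model has zero error, so guard against that in real use.
inv_err = 1.0 / (1.0 - cv_acc_demo)
inv_w = inv_err / inv_err.sum()

print("accuracy-proportional:", prop_w.round(3))
print("inverse-error:        ", inv_w.round(3))
```

Sharper weights are not automatically better: they lean harder on CV estimates that are themselves noisy.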

In [None]:
# Create weighted voting classifier
weighted_voting = VotingClassifier(
    estimators=base_classifiers,
    voting='soft',
    weights=weights
)

print("Training Weighted Voting Classifier...")
weighted_voting.fit(X_train, y_train)
weighted_pred = weighted_voting.predict(X_test)
weighted_proba = weighted_voting.predict_proba(X_test)
weighted_acc = accuracy_score(y_test, weighted_pred)
weighted_auc = roc_auc_score(y_test, weighted_proba[:, 1])

print(f"\nWeighted Voting Results:")
print(f"Accuracy: {weighted_acc:.4f}")
print(f"AUC-ROC: {weighted_auc:.4f}")

# Compare
print(f"\nComparison:")
print(f"  Equal weights (soft voting): {soft_acc:.4f}")
print(f"  Performance-based weights:   {weighted_acc:.4f}")
print(f"  Difference: {(weighted_acc - soft_acc) * 100:+.2f} percentage points")

## 4. Optimal Weight Finding with Grid Search

In [None]:
# Find optimal weights using GridSearchCV
print("Searching for optimal weights...\n")

# Define weight grid (coarse search)
# Each weight can be 1, 2, or 3 (will be normalized)
weight_options = [1, 2, 3]
param_grid = {
    'weights': [
        [w1, w2, w3, w4] 
        for w1 in weight_options 
        for w2 in weight_options 
        for w3 in weight_options
        for w4 in weight_options
    ]
}

print(f"Testing {len(param_grid['weights'])} weight combinations...")

# Grid search
grid_search = GridSearchCV(
    VotingClassifier(estimators=base_classifiers, voting='soft'),
    param_grid,
    cv=5,
    scoring='accuracy',
    n_jobs=-1
)

grid_search.fit(X_train, y_train)

# Best weights
best_weights = grid_search.best_params_['weights']
best_cv_score = grid_search.best_score_

print(f"\n✅ Best weights found:")
for (name, _), weight in zip(base_classifiers, best_weights):
    print(f"  {name:10s}: {weight}")
print(f"\nBest CV score: {best_cv_score:.4f}")

# Test on test set
best_voting = grid_search.best_estimator_
best_pred = best_voting.predict(X_test)
best_acc = accuracy_score(y_test, best_pred)
best_auc = roc_auc_score(y_test, best_voting.predict_proba(X_test)[:, 1])

print(f"\nTest set performance:")
print(f"Accuracy: {best_acc:.4f}")
print(f"AUC-ROC: {best_auc:.4f}")
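
`GridSearchCV` refits every base model for every weight combination and fold. A cheaper alternative, sketched here under the assumption that only the weights are being tuned, is to compute out-of-fold probabilities once with `cross_val_predict` and then score each weight combination with plain NumPy. This is not the approach used in the cell above, and it scores pooled out-of-fold predictions rather than averaging per-fold accuracy, so the numbers can differ slightly. Dataset and models are synthetic stand-ins.

```python
import numpy as np
from itertools import product
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X_demo, y_demo = make_classification(n_samples=400, n_features=10, random_state=0)
models_demo = [
    LogisticRegression(max_iter=1000),
    GaussianNB(),
    DecisionTreeClassifier(max_depth=3, random_state=0),
]

# Out-of-fold probabilities, computed once per model.
# Shape: (n_models, n_samples, n_classes)
oof = np.stack([
    cross_val_predict(m, X_demo, y_demo, cv=5, method='predict_proba')
    for m in models_demo
])

# Score every weight combination without refitting anything.
search_results = []
for ws in product([1, 2, 3], repeat=len(models_demo)):
    blended = np.average(oof, axis=0, weights=ws)
    acc = float((blended.argmax(axis=1) == y_demo).mean())
    search_results.append((acc, ws))

best_oof_acc, best_ws = max(search_results)
equal_acc = dict((ws, acc) for acc, ws in search_results)[(1, 1, 1)]
print(f"Best weights: {best_ws}, OOF accuracy: {best_oof_acc:.4f} (equal weights: {equal_acc:.4f})")
```

Because the weights are chosen on the same out-of-fold predictions used to score them, the best score is optimistic; a held-out test set is still needed for an honest estimate.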

## 5. Voting Regressor

Apply voting to regression tasks.
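
As a quick sanity check of what "voting" means for regression, the sketch below confirms that an unweighted `VotingRegressor` predicts the arithmetic mean of its fitted members' predictions. The synthetic data and the three models are assumptions chosen to keep the example fast.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import VotingRegressor
from sklearn.linear_model import Ridge
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

reg_demo = VotingRegressor(estimators=[
    ('ridge', Ridge()),
    ('knn', KNeighborsRegressor()),
    ('dt', DecisionTreeRegressor(max_depth=3, random_state=0)),
]).fit(X_demo, y_demo)

# Average the predictions of the fitted clones by hand.
manual_mean = np.mean([est.predict(X_demo) for est in reg_demo.estimators_], axis=0)
reg_matches = np.allclose(manual_mean, reg_demo.predict(X_demo))
print("Manual mean matches VotingRegressor:", reg_matches)
```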

In [None]:
# Load regression dataset
diabetes_data = load_diabetes()
X_reg, y_reg = diabetes_data.data, diabetes_data.target

X_train_reg, X_test_reg, y_train_reg, y_test_reg = train_test_split(
    X_reg, y_reg, test_size=0.2, random_state=RANDOM_STATE
)

print(f"Regression dataset: {len(X_train_reg)} train, {len(X_test_reg)} test")
print(f"Features: {X_reg.shape[1]}")
print(f"Target range: [{y_reg.min():.1f}, {y_reg.max():.1f}]")

In [None]:
# Define base regressors
base_regressors = [
    ('rf', RandomForestRegressor(n_estimators=100, random_state=RANDOM_STATE)),
    ('gb', GradientBoostingRegressor(n_estimators=100, random_state=RANDOM_STATE)),
    ('ridge', Ridge(random_state=RANDOM_STATE)),
    ('svr', SVR())
]

print("Base regressors:")
for name, reg in base_regressors:
    print(f"  - {name}: {type(reg).__name__}")

In [None]:
# Train individual regressors
print("\nTraining individual regressors...\n")

reg_results = []
for name, reg in base_regressors:
    reg.fit(X_train_reg, y_train_reg)
    pred = reg.predict(X_test_reg)
    
    mse = mean_squared_error(y_test_reg, pred)
    rmse = np.sqrt(mse)
    mae = mean_absolute_error(y_test_reg, pred)
    r2 = r2_score(y_test_reg, pred)
    
    reg_results.append({
        'Model': name.upper(),
        'Type': 'Base',
        'RMSE': rmse,
        'MAE': mae,
        'R²': r2
    })
    
    print(f"{name:10s} - RMSE: {rmse:.2f}, MAE: {mae:.2f}, R²: {r2:.4f}")

In [None]:
# Create voting regressor (averaging)
print("\nTraining Voting Regressor...")

voting_reg = VotingRegressor(estimators=base_regressors)
voting_reg.fit(X_train_reg, y_train_reg)
voting_pred = voting_reg.predict(X_test_reg)

voting_mse = mean_squared_error(y_test_reg, voting_pred)
voting_rmse = np.sqrt(voting_mse)
voting_mae = mean_absolute_error(y_test_reg, voting_pred)
voting_r2 = r2_score(y_test_reg, voting_pred)

reg_results.append({
    'Model': 'VOTING',
    'Type': 'Ensemble',
    'RMSE': voting_rmse,
    'MAE': voting_mae,
    'R²': voting_r2
})

print(f"\nVoting Regressor Results:")
print(f"RMSE: {voting_rmse:.2f}")
print(f"MAE: {voting_mae:.2f}")
print(f"R²: {voting_r2:.4f}")

# Compare
df_reg_results = pd.DataFrame(reg_results)
print("\n" + "=" * 70)
print("Regression Results Comparison")
print("=" * 70)
print(df_reg_results.to_string(index=False))

best_base_r2 = df_reg_results[df_reg_results['Type'] == 'Base']['R²'].max()
improvement = voting_r2 - best_base_r2
# Signed format: the difference can be negative if a base model beats the ensemble
print(f"\nVoting vs best base model: {improvement:+.4f} R²")

In [None]:
# Visualize predictions
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Performance comparison
axes[0].bar(range(len(df_reg_results)), df_reg_results['R²'],
            color=['steelblue'] * (len(df_reg_results) - 1) + ['#2ecc71'],
            edgecolor='black')
axes[0].set_xticks(range(len(df_reg_results)))
axes[0].set_xticklabels(df_reg_results['Model'], rotation=45, ha='right')
axes[0].set_ylabel('R² Score', fontsize=12)
axes[0].set_title('Regressor R² Comparison', fontsize=13, fontweight='bold')
axes[0].grid(axis='y', alpha=0.3)

# Prediction scatter
axes[1].scatter(y_test_reg, voting_pred, alpha=0.6, edgecolors='black')
axes[1].plot([y_test_reg.min(), y_test_reg.max()], 
             [y_test_reg.min(), y_test_reg.max()], 
             'r--', linewidth=2, label='Perfect Prediction')
axes[1].set_xlabel('True Values', fontsize=12)
axes[1].set_ylabel('Predicted Values', fontsize=12)
axes[1].set_title('Voting Regressor Predictions', fontsize=13, fontweight='bold')
axes[1].legend(fontsize=10)
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## 6. Voting vs Stacking: Head-to-Head Comparison

In [None]:
from sklearn.ensemble import StackingClassifier

print("Comparing Voting vs Stacking...\n")

# Voting (soft)
voting_model = VotingClassifier(
    estimators=base_classifiers,
    voting='soft'
)

# Stacking
stacking_model = StackingClassifier(
    estimators=base_classifiers,
    final_estimator=LogisticRegression(random_state=RANDOM_STATE),
    cv=5
)

# Train and evaluate
comparison_results = []

for name, model in [('Voting', voting_model), ('Stacking', stacking_model)]:
    print(f"Training {name}...")
    start = time.time()
    model.fit(X_train, y_train)
    train_time = time.time() - start
    
    pred = model.predict(X_test)
    proba = model.predict_proba(X_test)[:, 1]
    
    acc = accuracy_score(y_test, pred)
    auc = roc_auc_score(y_test, proba)
    
    comparison_results.append({
        'Method': name,
        'Training Time': train_time,
        'Accuracy': acc,
        'AUC': auc
    })

df_comparison = pd.DataFrame(comparison_results)
print("\n" + "=" * 70)
print("Voting vs Stacking Comparison")
print("=" * 70)
print(df_comparison.to_string(index=False))

print("\n💡 Analysis:")
print("   - Voting: Simpler, faster, easier to interpret")
print("   - Stacking: Potentially better accuracy, more complex")
print("   - Choice depends on: data size, complexity tolerance, accuracy needs")

## Exercises

### Exercise 1: Heterogeneous Ensemble Design

Create the most diverse voting ensemble possible:

1. Select models from different families:
   - Linear models
   - Tree-based models
   - Instance-based models
   - Probabilistic models
   - Neural networks (if available)
2. For each model:
   - Tune hyperparameters individually
   - Measure diversity (correlation of predictions)
3. Create voting ensembles with different subsets:
   - Most accurate models
   - Most diverse models
   - Balanced (accuracy + diversity)
4. Compare all strategies
5. Determine optimal diversity/accuracy trade-off

In [None]:
# Your code here


### Exercise 2: Dynamic Weighting Strategies

Implement and compare different weighting schemes:

1. **Performance-based**: Weight by cross-validation accuracy
2. **Confidence-based**: Weight by average prediction confidence
3. **Inverse error**: Weight inversely to error rate
4. **Learned weights**: Use optimization (grid search, Bayesian opt)
5. **Adaptive weights**: Different weights for different regions of feature space

For each strategy:
- Implement weighting calculation
- Apply to voting ensemble
- Evaluate on test set
- Compare with equal weights

Determine which weighting strategy works best and why.

In [None]:
# Your code here


### Exercise 3: Voting Ensemble Complexity Analysis

Analyze how ensemble size affects performance:

1. Create pool of 10 diverse models
2. For ensemble sizes from 3 to 10:
   - Try multiple random combinations
   - Measure accuracy, AUC, training time
   - Calculate prediction diversity
3. Plot:
   - Performance vs ensemble size
   - Training time vs ensemble size
   - Diminishing returns curve
4. Find optimal number of models
5. Test hypothesis: "More models → always better?"

In [None]:
# Your code here


### Exercise 4: Production Deployment Simulation

Compare voting vs stacking for production deployment:

1. Train both voting and stacking ensembles
2. Measure production metrics:
   - **Latency**: Single prediction time
   - **Throughput**: Predictions per second
   - **Memory**: Model size and runtime memory
   - **Maintenance**: Complexity score (subjective)
3. Simulate production scenarios:
   - High-throughput (batch predictions)
   - Low-latency (real-time predictions)
   - Resource-constrained (limited memory/CPU)
4. Create deployment recommendation matrix
5. Determine when to use each approach

In [None]:
# Your code here


## Summary

### Key Concepts

1. **Voting = Simple Ensemble Combination**:
   - Train models independently
   - Combine predictions by voting or averaging
   - No learning in combination step
   - Simple, interpretable, effective

2. **Types of Voting**:
   - **Hard Voting**: Majority vote on predicted classes
   - **Soft Voting**: Average predicted probabilities (usually better)
   - **Weighted Voting**: Assign different weights to models
   - **Averaging**: For regression tasks

3. **Soft Voting Advantages**:
   - Uses full probability information
   - Accounts for model confidence
   - Typically outperforms hard voting
   - Smoother decision boundaries

4. **Weighting Strategies**:
   - Equal weights: Simple baseline
   - Performance-based: Weight by accuracy
   - Optimized: Grid search for best weights
   - Gains over equal weights are usually small (roughly 0.5-2 percentage points) and depend on the dataset

5. **Voting vs Stacking**:
   - Voting: Simpler, faster, more interpretable
   - Stacking: More powerful, learned combination
   - Voting: Lower overfitting risk
   - Stacking: Higher performance ceiling

### Best Practices

1. **Model Selection**:
   - Use 3-7 diverse models
   - Different algorithm families
   - Check prediction correlation (lower = better)
   - Balance accuracy and diversity

2. **Voting Type**:
   - Prefer soft voting over hard voting
   - Ensure all models support `predict_proba()`
   - Use hard voting only if necessary, e.g. when a model cannot output probabilities or its probabilities are poorly calibrated

3. **Weighting**:
   - Start with equal weights
   - Try performance-based weights
   - Use grid search if critical
   - Validate on separate data

4. **When to Use Voting**:
   - ✅ Need simple, interpretable ensemble
   - ✅ Production deployment (easier)
   - ✅ Limited training data
   - ✅ Models already well-tuned
   - ✅ Want robustness

5. **When to Use Stacking Instead**:
   - Need maximum accuracy
   - Sufficient training data
   - Can handle complexity
   - Models have complex interactions
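
The "check prediction correlation" practice above can be sketched as a correlation matrix of each model's predicted probability for the positive class on held-out data. The dataset, model choices, and variable names below are illustrative assumptions; other diversity measures (such as pairwise disagreement rate) are equally valid.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X_demo, y_demo = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.3, random_state=0)

models_demo = {
    'lr': LogisticRegression(max_iter=1000),
    'nb': GaussianNB(),
    'knn': KNeighborsClassifier(n_neighbors=5),
    'dt': DecisionTreeClassifier(max_depth=4, random_state=0),
}

# One column per model: P(class 1) on the held-out split.
p1 = np.column_stack([
    m.fit(X_tr, y_tr).predict_proba(X_te)[:, 1] for m in models_demo.values()
])

# Pairwise Pearson correlation between models; lower off-diagonal values
# suggest the models make different kinds of errors.
corr = np.corrcoef(p1, rowvar=False)
print(list(models_demo))
print(corr.round(2))
```

Pairs with correlation close to 1 add little to each other; dropping one of them usually costs less than dropping a weaker but less correlated model.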

### Common Mistakes

❌ **Using highly similar models**
  → Ensure diversity!

❌ **Too many models**
  → 3-7 is usually enough; returns diminish beyond that

❌ **Not using soft voting**
  → Soft voting usually beats hard voting when probabilities are available

❌ **Over-optimizing weights**
  → Can lead to overfitting, validate carefully

❌ **Ignoring base model quality**
  → Garbage in, garbage out: tune base models first

### Performance Expectations

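Rough rules of thumb (actual gains vary with the dataset and the base models, and can be zero or negative):
- **Voting vs best base model**: +0.5 to +2 percentage points of accuracy
- **Soft vs hard voting**: soft voting often +0.2 to +1 points better
- **Optimized vs equal weights**: +0.3 to +1.5 points
- **Stacking vs voting**: stacking often +0.2 to +1 points better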

### Advantages of Voting

1. **Simplicity**: Easy to understand and implement
2. **Parallelization**: Train all models independently
3. **Interpretability**: Clear how decision is made
4. **Robustness**: Averaging reduces variance and dampens individual models' errors
5. **Production-friendly**: Easy to deploy
6. **No hyperparameters**: (except weights)
7. **Low overfitting risk**: No learning in combination

### Disadvantages of Voting

1. **Fixed combination**: Weights are global constants; they cannot adapt to the input
2. **Less powerful**: Stacking often better
3. **Assumes independence**: Doesn't model correlations
4. **Limited flexibility**: Can't capture interactions

### What's Next?

In **Module 10: Model Comparison and Selection**, we'll:
- Benchmark ALL ensemble methods systematically
- Compare single trees, bagging, boosting, stacking, voting
- Analyze trade-offs: speed vs accuracy
- Create decision framework for method selection
- Production deployment considerations

### Additional Resources

- **sklearn**: [Voting Classifier Documentation](https://scikit-learn.org/stable/modules/ensemble.html#voting-classifier)
- **sklearn**: [Voting Regressor Documentation](https://scikit-learn.org/stable/modules/ensemble.html#voting-regressor)
- **Paper**: "A Comparison of Voting and Meta-Learning for Combining Classifiers"
- **Tutorial**: [Ensemble Learning Methods](https://towardsdatascience.com/ensemble-methods-bagging-boosting-and-stacking-c9214a10a205)
- **Book**: "Pattern Recognition and Machine Learning" by Bishop (Chapter on Combining Models)