# Part 6.2: Advanced Topics - Advanced Ensemble Methods

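We have already seen two powerful ensemble techniques: **Bagging** (used by Random Forests) and **Boosting** (used by Gradient Boosting). Both typically combine many copies of the same kind of model. This notebook covers two further methods that can combine different kinds of models: Voting and Stacking.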

In [1]:
from sklearn.ensemble import VotingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, n_informative=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

### Voting Classifier
A Voting Classifier combines the predictions of several models into a single final prediction.
- **Hard Voting**: Each model casts one vote for its predicted class, and the class with the most votes (the mode) wins.
- **Soft Voting**: The predicted class probabilities are averaged across models, and the class with the highest average probability wins. This requires every model to implement `predict_proba`. It often beats hard voting when the classifiers are well-calibrated, because a confident model counts for more than an uncertain one.
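
The two rules can disagree on the same sample. The following is a minimal sketch using made-up probabilities for three hypothetical classifiers (the numbers are illustrative only, not taken from the models in this notebook). Two classifiers lean weakly towards class 0, while the third is very confident in class 1.

```python
import numpy as np

# Hypothetical predicted probabilities from three classifiers for ONE sample.
# Columns are (class 0, class 1). These values are invented for illustration.
probas = np.array([
    [0.55, 0.45],  # classifier A leans slightly towards class 0
    [0.60, 0.40],  # classifier B leans slightly towards class 0
    [0.10, 0.90],  # classifier C is very confident in class 1
])

# Hard voting: each classifier votes for its most likely class; the mode wins.
votes = probas.argmax(axis=1)
hard_prediction = np.bincount(votes, minlength=probas.shape[1]).argmax()

# Soft voting: average the probabilities first, then pick the most likely class.
mean_proba = probas.mean(axis=0)
soft_prediction = mean_proba.argmax()

print(f"Votes per classifier: {votes}")
print(f"Hard voting predicts class {hard_prediction}")
print(f"Average probabilities: {mean_proba.round(4)}")
print(f"Soft voting predicts class {soft_prediction}")
```

Here hard voting picks class 0 (two votes to one), while soft voting picks class 1, because the single confident classifier pulls the average probability for class 1 above 0.5.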

In [2]:
clf1 = LogisticRegression(random_state=42)
clf2 = DecisionTreeClassifier(max_depth=4, random_state=42)
clf3 = SVC(probability=True, random_state=42) # probability=True enables predict_proba, which soft voting requires

voting_clf = VotingClassifier(
    estimators=[('lr', clf1), ('dt', clf2), ('svc', clf3)],
    voting='soft' # Use 'hard' for majority vote
)
voting_clf.fit(X_train, y_train)

print(f"Voting Classifier Accuracy: {voting_clf.score(X_test, y_test):.4f}")

Voting Classifier Accuracy: 0.9100


### Stacking Classifier
Stacking is a more sophisticated ensemble method. It uses a final model (a **meta-model** or **final estimator**) to learn how to best combine the predictions of several base models.

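The process, as implemented by scikit-learn's `StackingClassifier`:
1.  The training set is split into folds (controlled by `cv`), and each base model produces out-of-fold predictions for every training sample, so no prediction comes from a model that saw that sample during fitting.
2.  A new training set is created where the features are these out-of-fold predictions.
3.  The meta-model is trained on this new dataset.
4.  The base models are refit on the full training set so they can be used at prediction time.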
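
The stacking procedure can also be sketched by hand. The example below is an illustrative approximation, not scikit-learn's internal implementation: it uses `cross_val_predict` to build out-of-fold probability features, trains a logistic regression meta-model on them, and then refits the base models on the full training set. The variable names (`base_models`, `oof_features`, `meta_model`, `test_features`) are assumptions introduced here, and the data setup repeats the first cell so the sketch runs on its own.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Same synthetic data as in the first cell of this notebook.
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

base_models = [
    DecisionTreeClassifier(max_depth=4, random_state=42),
    SVC(probability=True, random_state=42),
]

# Out-of-fold class-1 probabilities: each training sample is predicted by a
# copy of the model that was fitted on the other folds only.
oof_features = np.column_stack([
    cross_val_predict(model, X_train, y_train, cv=5, method="predict_proba")[:, 1]
    for model in base_models
])

# The meta-model learns how to combine the base models' predictions.
meta_model = LogisticRegression()
meta_model.fit(oof_features, y_train)

# Refit each base model on the full training set, then build the
# meta-features for the test set from their predictions.
test_features = np.column_stack([
    model.fit(X_train, y_train).predict_proba(X_test)[:, 1]
    for model in base_models
])

manual_accuracy = meta_model.score(test_features, y_test)
print(f"Manual stacking accuracy: {manual_accuracy:.4f}")
```

Training the meta-model on out-of-fold predictions matters: if the base models predicted on the same data they were fitted on, their predictions would look better than they really are, and the meta-model would learn to trust them too much.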

In [3]:
estimators = [
    ('dt', DecisionTreeClassifier(max_depth=4, random_state=42)),
    ('svc', SVC(probability=True, random_state=42))
]

# The final estimator is trained on the predictions of the base models
final_estimator = LogisticRegression()

stacking_clf = StackingClassifier(
    estimators=estimators,
    final_estimator=final_estimator,
    cv=5 # 5-fold cross-validation generates the out-of-fold predictions used to train the meta-model
)
stacking_clf.fit(X_train, y_train)

print(f"Stacking Classifier Accuracy: {stacking_clf.score(X_test, y_test):.4f}")

Stacking Classifier Accuracy: 0.9550
