# Voting, stacking and blending

Ensembling techniques like voting, stacking, and blending are powerful strategies for combining the predictions of multiple base models to improve overall model performance.

```{image} https://miro.medium.com/v2/resize:fit:1200/1*tWdplltkOLK8gY46F-1GiQ.jpeg
:alt: stacking
:class: bg-primary mb-1
:width: 500px
:align: center
```

1. <h4>Voting:</h4>

    A voting ensemble method is a way of making predictions in machine learning by combining the results from multiple different models. Imagine you have a group of friends and you want to decide on a movie to watch. Each friend has their own preferences and suggestions. In a voting ensemble, each "friend" (or model) gets a vote on what movie to watch, and the movie with the most votes is chosen.
    
* Strengths: Voting is easy to implement. It is also more robust than a single model, because combining several models reduces the impact of any one model's mistakes, which often results in better accuracy.
    
* Weaknesses: It may not capture complex interactions between base models as effectively as stacking, because the combining rule is fixed rather than learned. The choice of combining rule (hard or soft voting, and any model weights) is therefore critical.


* For regression tasks, the output is the average of the models’ predictions. For classification tasks, there are two ways to combine the predictions: **Hard voting** and **Soft voting**.<br>
     * **Hard Voting**: Each model in the ensemble predicts a class label, and the final prediction is the label chosen by the majority of models.
     * **Soft Voting**: Each model predicts class probabilities, the probabilities are averaged across models, and the class with the highest average probability is the final prediction.

```{image} https://www.mdpi.com/applsci/applsci-12-07554/article_deploy/html/images/applsci-12-07554-g003-550.jpg
:alt: voting_types
:class: bg-primary mb-1
:width: 500px
:align: center
```
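
To make the difference between the two rules concrete, here is a minimal NumPy sketch. The probabilities are made-up numbers for three hypothetical binary classifiers, not output from any real model; they are chosen so that the two rules disagree on some samples.

```python
import numpy as np

# Hypothetical class-1 probabilities from three models for four samples
# (rows = models, columns = samples); the values are invented for illustration.
proba_class1 = np.array([
    [0.90, 0.40, 0.45, 0.20],
    [0.60, 0.45, 0.40, 0.30],
    [0.30, 0.95, 0.90, 0.10],
])

# Hard voting: each model casts a 0/1 vote, and the majority (2 of 3) wins
votes = (proba_class1 >= 0.5).astype(int)
hard_pred = (votes.sum(axis=0) >= 2).astype(int)

# Soft voting: average the probabilities first, then threshold the mean
mean_proba = proba_class1.mean(axis=0)
soft_pred = (mean_proba >= 0.5).astype(int)

print("hard:", hard_pred)  # hard: [1 0 0 0]
print("soft:", soft_pred)  # soft: [1 1 1 0]
```

For the second and third samples, two models lean weakly towards class 0 while one is very confident in class 1. Hard voting ignores that confidence; soft voting lets it tip the result.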

2. <h4>Stacking (Stacked Generalization):</h4>

* Basic Idea: Stacking is an ensemble method that combines the predictions of multiple base models by training a meta-model (also called a meta-learner) on top of them. The meta-model learns to weigh the predictions of the base models. Here's how it works:

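    * First, you train your base models on the original data. The individual base models don't necessarily have to provide the most accurate predictions; it matters more that they make different kinds of errors.
    * Next, you prepare a new dataset in which each column holds one base model's predictions and each row corresponds to one sample. Ideally these are out-of-fold predictions, so that the meta-model is not trained on predictions the base models made for data they were fitted on.

    * Finally, you train your meta-model on this new dataset. The meta-model learns to weigh the predictions of the base models to make better predictions on unseen data.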

The following image illustrates the process:

```{image} https://cdn.analyticsvidhya.com/wp-content/uploads/2021/03/Screenshot-from-2021-03-30-15-07-08.png
:alt: stacking
:class: bg-primary mb-1
:width: 500px
:align: center
```
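
The three steps above can also be delegated to scikit-learn's `StackingClassifier`, which builds the meta-model's training data from cross-validated predictions internally. The sketch below is only an illustration under assumptions: it uses a synthetic dataset from `make_classification` and an arbitrary pair of base models, neither of which comes from this chapter's Employee example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary classification data (assumption, for illustration only)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
    ],
    final_estimator=LogisticRegression(),  # the meta-model
    cv=5,  # out-of-fold predictions are used to train the meta-model
)
stack.fit(X_train, y_train)

# transform() exposes the meta-features that the meta-model sees:
# for a binary problem, one probability column per base model
meta_features = stack.transform(X_test)
stack_accuracy = stack.score(X_test, y_test)

print("meta-feature shape:", meta_features.shape)
print("stacking accuracy:", stack_accuracy)
```

Because the meta-model is trained only on out-of-fold predictions, it never sees a prediction that a base model made for a sample it was fitted on.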

Stacking offers several advantages over traditional ensemble methods:

  * It can capture complex relationships between base models and often leads to improved performance.
  * It is flexible and can accommodate various base models.

However, it also has drawbacks:
  * It may require more computational resources and tuning compared to simpler ensemble methods.
  * The final predictions might be harder to interpret because of the complexity resulting from combining multiple models and using a meta-learner.

````{admonition} Question
:class: important
Which type of model is typically used as a meta-model in stacking?

```{admonition} Answer
:class: tip, dropdown
A simple model is typically used as the meta-learner; logistic regression is a common choice for classification tasks.
```
````

3. <h4>Blending:</h4>

* Basic Idea: Blending is a simplified version of stacking. Instead of building the combiner's training data with cross-validation, blending implements a **“one-holdout set”** approach: a small portion of the training data (the *validation* set) is set aside, the base models make predictions on it, and these predictions are “*stacked*” to form the training data of the combiner. The combiner can be a meta-model or a simple rule, such as (weighted) averaging.

* Training Process: Blending involves training the base models on the training data that remains after the hold-out set is removed, and then combining their hold-out predictions using a meta-model or a predefined rule.
* Strengths: Blending is easy to implement and can yield improvements by leveraging the diversity of base models.
* Weaknesses: With a simple rule it may not capture complex interactions between base models as effectively as stacking, and the hold-out set leaves less data for training the base models. The choice of the combining rule is critical.


```{image} https://dataaspirant.com/wp-content/uploads/2023/03/1-11.png
:alt: blending
:class: bg-primary mb-1
:width: 500px
:align: center
```
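
The simple-rule variant of blending can be sketched as follows. This is a minimal illustration under assumptions: the data is synthetic, the two base models are arbitrary, and `blend` is a hypothetical helper written for this sketch. The hold-out set is used only to pick the weight of a weighted average; no meta-model is trained.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic data (assumption), split into base-training, hold-out and test parts
X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
X_base, X_hold, y_base, y_hold = train_test_split(
    X_rest, y_rest, test_size=0.3, random_state=0
)

# Base models are fitted on the base-training part only
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_base, y_base)
lr = LogisticRegression(max_iter=1000).fit(X_base, y_base)


def blend(weight, X_part):
    """Hypothetical helper: weighted average of class-1 probabilities, thresholded at 0.5."""
    p_rf = rf.predict_proba(X_part)[:, 1]
    p_lr = lr.predict_proba(X_part)[:, 1]
    return (weight * p_rf + (1 - weight) * p_lr >= 0.5).astype(int)


# Choose the blending weight on the hold-out set, never on the test set
weights = np.linspace(0, 1, 11)
hold_scores = [accuracy_score(y_hold, blend(w, X_hold)) for w in weights]
best_weight = weights[int(np.argmax(hold_scores))]

test_accuracy = accuracy_score(y_test, blend(best_weight, X_test))
print("best weight for the random forest:", best_weight)
print("blended test accuracy:", test_accuracy)
```

A weight of 0 or 1 reduces the blend to a single base model, so the chosen weight also hints at how much each model contributes.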

````{admonition} Question
:class: important
What is model blending in the context of machine learning, and how does it differ from model stacking?

```{admonition} Answer
:class: tip, dropdown
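Blending is a technique derived from stacking that combines predictions from multiple models using a simple rule, a weighted average, or a meta-model trained on a single hold-out set. Stacking, by contrast, typically builds the meta-model's training data from cross-validated predictions. Blending is simpler and often serves as a quick way to combine diverse models.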
```
````

*Blending can use a meta-model or just apply a simple rule such as averaging. The example below uses a logistic regression meta-model trained on a hold-out set.*

4. <h4>Ensemble Characteristics:</h4>

* All three ensembling techniques aim to reduce prediction error (bias, variance, or both), leading to better generalization performance.
* The success of these methods often depends on the diversity of the base models. More diverse models tend to result in better ensembles.
* Careful hyperparameter tuning and cross-validation are crucial for optimizing ensemble performance.
* In practice, the choice between voting, stacking, and blending depends on the specific problem, dataset, and computational resources available. Each method has its own strengths and weaknesses, and the selection should be guided by the problem's requirements and constraints. Experimentation and testing different ensemble approaches are often necessary to determine the most effective strategy for a particular machine learning task.
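
One rough way to check the diversity mentioned above is to measure how often pairs of base models disagree on the same samples. The sketch below assumes synthetic data and an arbitrary trio of models; the pairwise disagreement rate is just one simple diversity indicator among several.

```python
from itertools import combinations

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data (assumption, for illustration only)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "rfc": RandomForestClassifier(n_estimators=50, random_state=0),
    "lr": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(n_neighbors=5),
}

# Fit every model and collect its test-set predictions
preds = {
    name: model.fit(X_train, y_train).predict(X_test)
    for name, model in models.items()
}

# Fraction of test samples on which two models predict different labels
disagreement = {
    (a, b): float(np.mean(preds[a] != preds[b]))
    for a, b in combinations(preds, 2)
}

for pair, rate in disagreement.items():
    print(pair, round(rate, 3))
```

A rate close to zero means the two models make nearly identical predictions, so combining them is unlikely to add much.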

### Python implementation

The following code example demonstrates how to implement voting, stacking, and blending using scikit-learn for a classification problem on the Employee dataset. In this example, we'll use three different base models and ensemble them using these techniques.

In [1]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import VotingClassifier

# Read and preprocess Employee dataset
df = pd.read_csv("../ISLP_datsets/Employee.csv")

en = LabelEncoder()
to_encode = ['Education', 'City', 'Gender', 'EverBenched']

for col in to_encode:
    df[col] = en.fit_transform(df[col])
    
    
X = df.drop(columns=['LeaveOrNot'])
y = df['LeaveOrNot']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


# Initialize base models
model1 = AdaBoostClassifier(n_estimators=50, random_state=42)
model2 = RandomForestClassifier(n_estimators=100, random_state=42)
model3 = LogisticRegression(random_state=42)

# Train models
model1.fit(X_train, y_train)
model2.fit(X_train, y_train)
model3.fit(X_train, y_train)

# Predict
pred1 = model1.predict(X_test)
pred2 = model2.predict(X_test)
pred3 = model3.predict(X_test)

# Voting: soft voting
voting_model = VotingClassifier(
    estimators=[('ada', model1), ('rfc', model2), ('lr', model3)],
    voting='soft'
)

voting_model.fit(X_train, y_train)

ensemble_pred_voting = voting_model.predict(X_test)


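# Stacking
# Build the meta-model's training features from out-of-fold predictions on the
# training set, so the meta-model is never fitted on test data or test labels.
from sklearn.model_selection import cross_val_predict

stacking_X_train = np.column_stack([
    cross_val_predict(model, X_train, y_train, cv=5)
    for model in (model1, model2, model3)
])
stacking_model = LogisticRegression(random_state=42)
stacking_model.fit(stacking_X_train, y_train)

# The base models were fitted on the full training set above; reuse their test predictions
stacking_X_test = np.column_stack((pred1, pred2, pred3))
ensemble_pred_stacking = stacking_model.predict(stacking_X_test)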

# Blending

# Split the training data further into two parts for blending
X_train_base, X_train_blend, y_train_base, y_train_blend = train_test_split(
    X_train, y_train, test_size=0.5, random_state=42)

model1.fit(X_train_base, y_train_base)
model2.fit(X_train_base, y_train_base)
model3.fit(X_train_base, y_train_base)

# Make predictions using base models on the blending set
pred1_blend = model1.predict(X_train_blend)
pred2_blend = model2.predict(X_train_blend)
pred3_blend = model3.predict(X_train_blend)

# Combine predictions of base models on the blending set
blending_X_train = np.column_stack((pred1_blend, pred2_blend, pred3_blend))

blending_model = LogisticRegression(random_state=42)
blending_model.fit(blending_X_train, y_train_blend)

pred1_test = model1.predict(X_test)
pred2_test = model2.predict(X_test)
pred3_test = model3.predict(X_test)

blending_X_test = np.column_stack((pred1_test, pred2_test, pred3_test))
ensemble_pred_blending = blending_model.predict(blending_X_test)

# Evaluate individual models and ensembles
individual_model1_accuracy = accuracy_score(y_test, pred1)
individual_model2_accuracy = accuracy_score(y_test, pred2)
individual_model3_accuracy = accuracy_score(y_test, pred3)
voting_accuracy = accuracy_score(y_test, ensemble_pred_voting)
stacking_accuracy = accuracy_score(y_test, ensemble_pred_stacking)
blending_accuracy = accuracy_score(y_test, ensemble_pred_blending)

print("Individual Model 1 Accuracy:", individual_model1_accuracy)
print("Individual Model 2 Accuracy:", individual_model2_accuracy)
print("Individual Model 3 Accuracy:", individual_model3_accuracy)
print("Voting Accuracy:", voting_accuracy)
print("Stacking Accuracy:", stacking_accuracy)
print("Blending Accuracy:", blending_accuracy)

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


Individual Model 1 Accuracy: 0.8141783029001074
Individual Model 2 Accuracy: 0.849624060150376
Individual Model 3 Accuracy: 0.7142857142857143
Voting Accuracy: 0.8528464017185822
Stacking Accuracy: 0.849624060150376
Blending Accuracy: 0.8506981740064447


Conclusion  
   *  Model 2 (random forest) performs the best individually, with an accuracy of about 0.850.
   *  Voting (about 0.853) and blending (about 0.851) are slightly more accurate than Model 2, while stacking (about 0.850) matches it.
   * The three ensemble techniques show similar performance, and the gain over the best individual model is less than half a percentage point on this single train/test split.

In this code:

* We load the Employee dataset as an example classification dataset and split it into training and testing sets.
* We create three different base models: AdaBoostClassifier, RandomForestClassifier, and LogisticRegression.
* Each base model is trained on the training data, and predictions are made on the test data.
* We demonstrate three ensemble methods:
    * Voting: We aggregate the predictions of the three base models using a soft voting approach for the ensemble.
    * Stacking: We use LogisticRegression as a meta-model to learn to combine the predictions of the base models.
    * Blending: We train a LogisticRegression meta-model on the base models' predictions for a hold-out set split off from the training data.

Finally, we evaluate the accuracy of the individual models and the ensembles. Note that this is a simplified example, and in practice, you may need to fine-tune hyperparameters and use more diverse base models to achieve optimal performance.
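
Since a single train/test split can make small accuracy differences look more meaningful than they are, a cross-validated comparison is one way to follow up. The sketch below is an assumption-laden illustration rather than a rerun of the experiment above: it uses synthetic data instead of the Employee dataset, only two base models, and it scales the inputs for logistic regression, which is one of the remedies suggested by the convergence warning in the output.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption); swap in your own X and y
X, y = make_classification(n_samples=400, n_features=10, random_state=0)

rfc = RandomForestClassifier(n_estimators=50, random_state=0)
# Scaling inside a pipeline keeps the scaler fitted on training folds only
lr = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
voting = VotingClassifier(estimators=[("rfc", rfc), ("lr", lr)], voting="soft")

cv_results = {}
for name, model in {"rfc": rfc, "lr": lr, "voting": voting}.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    cv_results[name] = (scores.mean(), scores.std())
    print(f"{name}: mean={scores.mean():.3f} std={scores.std():.3f}")
```

If the gap between two models is smaller than the spread across folds, the ranking between them is probably not reliable.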
