## Stacking

![Stacking](https://media.geeksforgeeks.org/wp-content/uploads/20190515104518/stacking.png)

![](https://miro.medium.com/v2/resize:fit:1050/1*DM1DhgvG3UCEZTF-Ev5Q-A.png)

Blog: https://medium.com/@brijesh_soni/stacking-to-improve-model-performance-a-comprehensive-guide-on-ensemble-learning-in-python-9ed53c93ce28

### Blending

![Blend](https://www.scaler.com/topics/images/blending-in-machine-learning-1.webp)

Stacking and blending are both ensemble learning techniques used to combine predictions from multiple base models to improve overall performance. While they serve a similar purpose, they have different approaches.

1. **Stacking:**
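   Stacking trains a meta-model (also called a blender) on the predictions made by multiple base models. To keep the meta-model from seeing predictions on rows the base models were fitted on, those predictions are usually generated out-of-fold with K-fold cross-validation. Here's how it typically works:

   - **Step 1: Base Models Training:** 
     - The training set is divided into K folds.
     - Each base model (learner) is trained on K-1 folds and makes predictions on the remaining fold. This is repeated until every training row has an out-of-fold prediction.
     - Each base model is then refit on the full training set for use at prediction time.

   - **Step 2: Meta-Model Training:**
     - The out-of-fold predictions from the base models are used as input features to train a meta-model (or a blender). The original features can optionally be passed to the meta-model as well.
     - The meta-model learns to combine the predictions of the base models to make the final prediction.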

   - **Step 3: Making Predictions:**
     - The trained base models are used to make predictions on unseen data.
     - The predictions are then fed into the trained meta-model to get the final ensemble prediction.

   Stacking allows the meta-model to learn the optimal way to combine predictions from base models. It often leads to improved performance compared to individual models.

2. **Blending:**
   Blending is a simpler version of stacking where a separate holdout dataset is used to train the meta-model. Here's how it works:

   - **Step 1: Base Models Training:** 
     - The dataset is split into a training set and a holdout set.
     - Multiple base models are trained on the training set.
     - Each base model makes predictions on the holdout set.

   - **Step 2: Meta-Model Training:**
     - The predictions from the base models on the holdout set are used as features to train a meta-model.

   - **Step 3: Making Predictions:**
     - The trained base models are used to make predictions on unseen data.
     - The predictions are then fed into the trained meta-model to get the final ensemble prediction.

   Blending is simpler and computationally less expensive than stacking since it doesn't involve cross-validation. However, it may lead to overfitting if the holdout set is too small or not representative of the overall data.

In summary, both stacking and blending aim to combine predictions from multiple models to create a more robust and accurate ensemble model. Stacking is more sophisticated as it uses cross-validation to train the meta-model, while blending is simpler and relies on a holdout dataset for meta-model training.
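The blending steps above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the synthetic data from `make_classification`, the split sizes, the two base models, and the helper `meta_features` are all assumptions chosen for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary-classification data (assumption: stands in for a real dataset).
X, y = make_classification(n_samples=600, n_features=10, n_informative=5, random_state=0)

# Three-way split: train (base models), holdout (meta-model), test (final evaluation).
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)
X_train, X_hold, y_train, y_hold = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=0, stratify=y_rest
)

# Step 1: train the base models on the training set only.
base_models = [
    RandomForestClassifier(n_estimators=50, random_state=0),
    KNeighborsClassifier(n_neighbors=10),
]
for model in base_models:
    model.fit(X_train, y_train)


def meta_features(models, X):
    """Hypothetical helper: one column of positive-class probabilities per base model."""
    return np.column_stack([m.predict_proba(X)[:, 1] for m in models])


# Step 2: train the meta-model on base-model predictions for the holdout set.
meta_model = LogisticRegression()
meta_model.fit(meta_features(base_models, X_hold), y_hold)

# Step 3: predict on unseen data by chaining base models and meta-model.
y_pred = meta_model.predict(meta_features(base_models, X_test))
blend_accuracy = accuracy_score(y_test, y_pred)
print(f"Blending accuracy on the test set: {blend_accuracy:.3f}")
```

Note that the holdout rows are never shown to the base models, which is what keeps the meta-model's training features honest.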

### Problems with stacking


One of the main challenges with stacking is the potential for overfitting. Here are some common issues associated with stacking:

1. **Data Leakage:** When stacking is not implemented carefully, there is a risk of data leakage. This happens when the meta-model is trained on predictions that the base models made for the same rows they were fitted on. Those in-sample predictions look more accurate than predictions on new data will be, so the meta-model learns to over-trust the base models. The result is overly optimistic performance estimates and poor generalization to unseen data.

2. **Complexity:** Stacking involves training multiple base models and a meta-model, which can increase the complexity of the modeling process. Managing multiple models and tuning hyperparameters for each model and the meta-model can be time-consuming and computationally expensive.

3. **Dependency on Base Models:** The performance of a stacked model heavily depends on the quality and diversity of the base models. If the base models are similar or if they all suffer from the same weaknesses, stacking may not lead to significant improvements.

4. **Interpretability:** Stacking can make it challenging to interpret the final model. Unlike individual base models, the combined predictions from multiple models and the meta-model may not provide easily interpretable insights into the relationships between features and the target variable.

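5. **Training Data Requirement:** Stacking needs enough data to fit both the base models and the meta-model. With a hold-out split, each of them sees only part of the data; with K-fold cross-validation, the base models that generate the meta-model's inputs are each trained on only K-1 of the K folds. This can be a limitation if the dataset is small or if obtaining more data is not feasible.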

To mitigate these issues, it's important to implement stacking carefully, use appropriate validation techniques to prevent data leakage, select diverse base models, and perform thorough hyperparameter tuning. Additionally, understanding the trade-offs between model complexity and performance is essential when deciding whether to use stacking or other ensemble techniques.
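To make the data-leakage point concrete, the sketch below compares a base model's predictions on the rows it was fitted on with its out-of-fold predictions for the same rows. The synthetic data, the label noise (`flip_y=0.1`), and the unpruned decision tree are assumptions picked to make the gap easy to see; they are not part of the notebook that follows.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with 10% label noise (assumption: illustration only).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, flip_y=0.1, random_state=0
)

tree = DecisionTreeClassifier(random_state=0)

# Leaky meta-feature: predictions for rows the model has already memorised.
in_sample_pred = tree.fit(X, y).predict(X)

# Leak-free meta-feature: each row is predicted by a model that never saw it.
oof_pred = cross_val_predict(tree, X, y, cv=5)

in_sample_acc = accuracy_score(y, in_sample_pred)
oof_acc = accuracy_score(y, oof_pred)
print(f"in-sample accuracy:   {in_sample_acc:.3f}")
print(f"out-of-fold accuracy: {oof_acc:.3f}")
```

A meta-model trained on the in-sample column would treat the tree as far more reliable than it is on new data, which is the overfitting that the hold-out and K-fold approaches are meant to avoid.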

To reduce this overfitting problem, the meta-model's training data is built in one of two ways: the `Hold Out Approach (Blending)` or the `K Fold Approach (Stacking)`.

The "Hold Out" approach, also known as blending, and the "K-Fold" approach, commonly used in stacking, are both techniques for creating ensembles of models. Here's a breakdown of each:

1. **Hold Out Approach (Blending):**
   - **Methodology:** In the hold-out approach, the dataset is typically split into three parts: a training set, a validation set (holdout set), and a test set. 
   - **Training Base Models:** The base models are trained on the training set.
   - **Generating Predictions:** After training, the base models make predictions on the validation set (holdout set), which they have not seen during training.
   - **Training Meta-Model:** The meta-model is trained using those predictions as features and the holdout labels as the target.
   - **Evaluating Performance:** Finally, the performance of the ensemble is evaluated on the test set.
   - **Advantages:** Simplicity is a key advantage of the hold-out approach. It's straightforward to implement and computationally efficient compared to cross-validation.
   - **Disadvantages:** However, it's susceptible to overfitting if the holdout set is small or not representative of the overall dataset.

2. **K-Fold Approach (Stacking):**
   - **Methodology:** In the K-fold approach, the dataset is divided into K folds or subsets.
   - **Training Base Models:** The base models are trained K times, each time using K-1 folds for training and the remaining fold for validation.
   - **Generating Predictions:** In each round, the base models make predictions on the validation fold (the fold not used for training), so every training row ends up with one out-of-fold prediction per base model.
   - **Training Meta-Model:** The meta-model is trained on the out-of-fold predictions collected across all folds. The base models are then refit on the full training set for use at prediction time.
   - **Evaluating Performance:** The final ensemble is evaluated on a separate test set that took no part in the K-fold procedure.
   - **Advantages:** Every training row contributes to both the base models and the meta-model, so the meta-model is trained on more data than with a single holdout set. This reduces the risk of overfitting to one particular split and lowers the variance in model performance.
   - **Disadvantages:** The K-fold approach is computationally more intensive and can be slower compared to the hold-out approach, especially with a large number of folds or when using computationally expensive models.

In summary, both blending (hold-out) and stacking (K-fold) are techniques for creating ensemble models, each with its own set of advantages and disadvantages. The choice between them depends on factors such as dataset size, computational resources, and the desired trade-off between simplicity and performance estimation accuracy.
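The K-fold approach can also be written out by hand, which makes the out-of-fold step explicit. The following is a sketch under assumptions: synthetic data, two base models, and `cv=5` are illustrative choices, and `cross_val_predict` is used here as a convenient way to collect out-of-fold probabilities.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary-classification data (assumption: stands in for a real dataset).
X, y = make_classification(n_samples=600, n_features=10, n_informative=5, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1, stratify=y
)

base_models = [
    RandomForestClassifier(n_estimators=50, random_state=1),
    KNeighborsClassifier(n_neighbors=10),
]

# Out-of-fold predictions: each training row is predicted by a model
# that was fitted on the other K-1 folds.
meta_train = np.column_stack([
    cross_val_predict(m, X_train, y_train, cv=5, method="predict_proba")[:, 1]
    for m in base_models
])

# The meta-model learns from the out-of-fold columns and the true labels.
meta_model = LogisticRegression().fit(meta_train, y_train)

# Refit every base model on the full training set for prediction time.
for m in base_models:
    m.fit(X_train, y_train)

meta_test = np.column_stack([m.predict_proba(X_test)[:, 1] for m in base_models])
stack_accuracy = accuracy_score(y_test, meta_model.predict(meta_test))
print(f"Stacking accuracy on the test set: {stack_accuracy:.3f}")
```

Compared with the hold-out version, no training rows are set aside for the meta-model, at the cost of fitting each base model K+1 times.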

### Hold Out Approach (Blending)

![](https://vitalflux.com/wp-content/uploads/2020/12/Hold-out-method-Training-Validation-Test-Dataset.png)

![](https://www.researchgate.net/publication/369110587/figure/fig5/AS:11431281125652023@1678378526692/Hold-Out-validation-technique.ppm)

### K-Fold Approach (Stacking)

![](https://miro.medium.com/v2/resize:fit:1400/1*0DKYwS627160j5YMu6aGXw.png)

![](https://www.researchgate.net/publication/354347252/figure/fig3/AS:1064138142932993@1630721712335/Architecture-of-the-stacking-model-Stacking-with-K-fold-cross-validation-where-the.png)

In [1]:
import pandas as pd
import numpy as np

In [2]:
# Load the heart-disease dataset; 'target' is the binary label to predict.
df = pd.read_csv('https://raw.githubusercontent.com/campusx-official/100-days-of-machine-learning/main/day68-stacking-and-blending/heart.csv')
df.head()

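| Unnamed: 0 | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 63 | 1 | 3 | 145 | 233 | 1 | 0 | 150 | 0 | 2.3 | 0 | 0 | 1 | 1 |
| 1 | 37 | 1 | 2 | 130 | 250 | 0 | 1 | 187 | 0 | 3.5 | 0 | 0 | 2 | 1 |
| 2 | 41 | 0 | 1 | 130 | 204 | 0 | 0 | 172 | 0 | 1.4 | 2 | 0 | 2 | 1 |
| 3 | 56 | 1 | 1 | 120 | 236 | 0 | 1 | 178 | 0 | 0.8 | 2 | 0 | 2 | 1 |
| 4 | 57 | 0 | 0 | 120 | 354 | 0 | 1 | 163 | 1 | 0.6 | 2 | 0 | 2 | 1 |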


In [3]:
X = df.drop(columns=['target'])
y = df['target']

In [4]:
X

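| Unnamed: 0 | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 63 | 1 | 3 | 145 | 233 | 1 | 0 | 150 | 0 | 2.3 | 0 | 0 | 1 |
| 1 | 37 | 1 | 2 | 130 | 250 | 0 | 1 | 187 | 0 | 3.5 | 0 | 0 | 2 |
| 2 | 41 | 0 | 1 | 130 | 204 | 0 | 0 | 172 | 0 | 1.4 | 2 | 0 | 2 |
| 3 | 56 | 1 | 1 | 120 | 236 | 0 | 1 | 178 | 0 | 0.8 | 2 | 0 | 2 |
| 4 | 57 | 0 | 0 | 120 | 354 | 0 | 1 | 163 | 1 | 0.6 | 2 | 0 | 2 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 298 | 57 | 0 | 0 | 140 | 241 | 0 | 1 | 123 | 1 | 0.2 | 1 | 0 | 3 |
| 299 | 45 | 1 | 3 | 110 | 264 | 0 | 1 | 132 | 0 | 1.2 | 1 | 0 | 3 |
| 300 | 68 | 1 | 0 | 144 | 193 | 1 | 1 | 141 | 0 | 3.4 | 1 | 2 | 3 |
| 301 | 57 | 1 | 0 | 130 | 131 | 0 | 1 | 115 | 1 | 1.2 | 1 | 1 | 3 |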


In [5]:
from sklearn.model_selection import train_test_split
# Keep 20% of the rows as a test set that the stacked model never trains on.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=8)

In [6]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier

In [7]:
# Base models (level 0): three different learner families for diversity.
estimators = [
    ('rf', RandomForestClassifier(n_estimators=10, random_state=42)),
    ('knn', KNeighborsClassifier(n_neighbors=10)),
    ('gbdt', GradientBoostingClassifier())
]

In [8]:
from sklearn.ensemble import StackingClassifier

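# The base models' out-of-fold predictions (10-fold CV on the training data)
# become the input features of the logistic-regression meta-model.
clf = StackingClassifier(
    estimators=estimators,
    final_estimator=LogisticRegression(),
    cv=10
)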

In [9]:
clf.fit(X_train, y_train)

In [10]:
y_pred = clf.predict(X_test)

In [11]:
from sklearn.metrics import accuracy_score

In [12]:
accuracy_score(y_test, y_pred)

0.8688524590163934
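A natural follow-up is to check each base model alone against the stack, and to try `passthrough=True`, which hands the original features to the meta-model alongside the base-model predictions. The sketch below uses synthetic data rather than `heart.csv` so that it runs without a download; the dataset, the two base models, and `max_iter=1000` are assumptions for illustration, and no particular ranking of the scores is implied.

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the heart dataset (assumption: illustration only).
X, y = make_classification(n_samples=400, n_features=10, n_informative=5, random_state=8)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=8, stratify=y
)

estimators = [
    ("rf", RandomForestClassifier(n_estimators=10, random_state=42)),
    ("knn", KNeighborsClassifier(n_neighbors=10)),
]

# Score each base model on its own, using fresh clones.
scores = {}
for name, est in estimators:
    fitted = clone(est).fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, fitted.predict(X_test))

# passthrough=True appends the original features to the meta-model's inputs.
stack = StackingClassifier(
    estimators=estimators,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
    passthrough=True,
)
stack.fit(X_train, y_train)
scores["stack"] = accuracy_score(y_test, stack.predict(X_test))

# transform() exposes the feature matrix that the meta-model actually receives.
meta_features = stack.transform(X_test)

for name, acc in scores.items():
    print(f"{name:>5}: {acc:.3f}")
print("meta-model input shape:", meta_features.shape)
```

If the stack does not beat its best base model on a comparison like this, the extra complexity may not be worth it for that dataset.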