# Blending in Machine Learning | [Link](https://github.com/AdilShamim8/50-Days-of-Machine-Learning/tree/main/Day%2044%20Stacking%20and%20Blending)

Blending is an ensemble technique that combines predictions from multiple base models by using a hold-out set to train a meta-model. Unlike stacking, which typically relies on cross-validation to generate out-of-fold predictions, blending uses a dedicated subset of the training data to create meta-features.
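
To make the contrast concrete, here is a minimal sketch, assuming the Iris data and a single logistic-regression base model (both chosen only for illustration). It builds meta-features the stacking way, with out-of-fold predictions, and the blending way, with one hold-out split. The variable names `oof_meta` and `holdout_meta` are hypothetical.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split

X, y = load_iris(return_X_y=True)
base = LogisticRegression(max_iter=200)

# Stacking-style meta-features: out-of-fold predictions, one row per training sample.
oof_meta = cross_val_predict(base, X, y, cv=5, method="predict_proba")

# Blending-style meta-features: fit once on T, predict only on the hold-out set H.
X_T, X_H, y_T, y_H = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
holdout_meta = base.fit(X_T, y_T).predict_proba(X_H)

print(oof_meta.shape)      # (150, 3)
print(holdout_meta.shape)  # (38, 3)
```

The meta-model in stacking would see all 150 rows, while the one in blending sees only the 38 hold-out rows.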

## 1. Overview

- **Objective:** Improve predictive performance by combining multiple models.
- **Process:**  
  1. Split the dataset into two parts: a training subset (T) and a hold-out (blending) set (H).
  2. Train several base models on the training subset (T).
  3. Use these trained models to predict on the hold-out set (H) to generate meta-features.
  4. Train a meta-model using these meta-features and the corresponding true labels from H.
  5. For new data, base models predict first, and their outputs are fed into the meta-model to produce the final prediction.

## 2. Key Formulas

Let \( f_1, f_2, \dots, f_K \) be the base models. For a given input \( x \):
- **Base Predictions:**  
$$
  \hat{y}_k = f_k(x) \quad \text{for } k = 1, 2, \dots, K
$$
- **Meta-Model Prediction:**  
  The meta-model \( g \) is trained on the base models' predictions for the hold-out set \( H \). Once trained, it combines the base predictions into the final output:
$$
  \hat{y} = g\left(\hat{y}_1, \hat{y}_2, \dots, \hat{y}_K\right)
$$
  where \( \hat{y}_k = f_k(x) \) are the base-model outputs for the input \( x \). During the training of \( g \), these inputs are the rows of \( H \).
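
As a worked illustration of these formulas, the sketch below assumes a regression setting and a linear meta-model \( g \) fitted by least squares. The base-model outputs are simulated (true target plus noise of different sizes), so `P_H`, `y_H`, and `g` are hypothetical stand-ins rather than real fitted models.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated hold-out targets and base-model outputs (K = 3 models, 200 rows).
y_H = rng.normal(size=200)
noise = rng.normal(size=(200, 3)) * np.array([0.2, 0.5, 1.0])
P_H = y_H[:, None] + noise  # column k plays the role of f_k(X_H)

# Fit g(p) = w0 + w1*p1 + w2*p2 + w3*p3 on the hold-out predictions.
A = np.column_stack([np.ones(len(y_H)), P_H])
w, *_ = np.linalg.lstsq(A, y_H, rcond=None)

def g(preds):
    """Combine K base predictions (last axis) into one blended prediction."""
    preds = np.atleast_2d(preds)
    return w[0] + preds @ w[1:]

blend_mse = np.mean((g(P_H) - y_H) ** 2)
best_base_mse = np.min(np.mean((P_H - y_H[:, None]) ** 2, axis=0))
print(blend_mse <= best_base_mse)  # True
```

The comparison holds by construction on the hold-out rows, because "use only the best base model" is one of the weight vectors least squares can choose. It says nothing about unseen data, which is why a separate test set is still needed.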

## 3. Blending Algorithm Steps

1. **Data Split:**
   - Divide your training data into a training set \( T \) and a hold-out set \( H \).
2. **Train Base Models:**
   - Train each base model \( f_k \) on the training set \( T \).
3. **Generate Meta-Features:**
   - For each model \( f_k \), predict the outcomes on the hold-out set \( H \) to create a new feature matrix:
     $$
     X_{\text{meta}} = \left[ f_1(X_H), f_2(X_H), \dots, f_K(X_H) \right]
     $$
4. **Train Meta-Model:**
   - Use \( X_{\text{meta}} \) along with the true labels \( y_H \) from \( H \) to train the meta-model \( g \).
5. **Final Prediction:**
   - For new data \( x_{\text{new}} \), obtain base predictions \( f_k(x_{\text{new}}) \) and feed them into \( g \) to get the final output:
     $$
     \hat{y}_{\text{new}} = g\left( f_1(x_{\text{new}}), \dots, f_K(x_{\text{new}}) \right)
     $$
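
The five steps can be wrapped in two small helpers. This is a sketch under stated assumptions, not a library API: `fit_blend` and `predict_blend` are hypothetical names, the task is assumed to be classification, and every base model is assumed to implement `predict_proba`.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def fit_blend(base_models, meta_model, X, y, holdout_size=0.25, random_state=0):
    """Steps 1-4: split into T and H, fit base models on T, fit the meta-model on H."""
    X_T, X_H, y_T, y_H = train_test_split(
        X, y, test_size=holdout_size, random_state=random_state, stratify=y
    )
    fitted = [clone(m).fit(X_T, y_T) for m in base_models]
    # X_meta = [f_1(X_H), ..., f_K(X_H)], stacked column-wise.
    X_meta = np.hstack([m.predict_proba(X_H) for m in fitted])
    meta = clone(meta_model).fit(X_meta, y_H)
    return fitted, meta

def predict_blend(fitted, meta, X_new):
    """Step 5: base models predict first, then the meta-model combines them."""
    X_meta_new = np.hstack([m.predict_proba(X_new) for m in fitted])
    return meta.predict(X_meta_new)

# Usage on Iris (illustrative data and model choices).
X, y = load_iris(return_X_y=True)
fitted, meta = fit_blend(
    [LogisticRegression(max_iter=200), DecisionTreeClassifier(random_state=0)],
    LogisticRegression(max_iter=200),
    X,
    y,
)
preds = predict_blend(fitted, meta, X[:5])
print(preds.shape)  # (5,)
```

Using `clone` leaves the estimators passed in by the caller unfitted, so the same model list can be reused with a different split.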


## 4. Python Example: Blending

Below is a Python example that demonstrates blending using scikit-learn. In this example, we use the Iris dataset, train three base models, and then blend their predictions with a meta-model.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the Iris dataset
data = load_iris()
X, y = data.data, data.target

# Step 1: Split the dataset
# Split into main training+validation and test sets
X_main, X_test, y_main, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Further split the main set into training set (T) and hold-out blending set (H)
X_train, X_holdout, y_train, y_holdout = train_test_split(X_main, y_main, test_size=0.25, random_state=42)

# Step 2: Train Base Models on T
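# random_state is fixed on the randomized estimators so the results are reproducible
model_lr = LogisticRegression(max_iter=200).fit(X_train, y_train)
model_dt = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
model_svc = SVC(probability=True, random_state=42).fit(X_train, y_train)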

# Step 3: Generate Meta-Features on H
# Obtain probability predictions for each model on the hold-out set
pred_lr = model_lr.predict_proba(X_holdout)
pred_dt = model_dt.predict_proba(X_holdout)
pred_svc = model_svc.predict_proba(X_holdout)

# Concatenate predictions to form meta-features
X_meta = np.hstack([pred_lr, pred_dt, pred_svc])

# Step 4: Train Meta-Model on the hold-out set's meta-features
meta_model = LogisticRegression(max_iter=200).fit(X_meta, y_holdout)

# Step 5: Final Prediction on Test Set
# Get base model predictions on test data
test_pred_lr = model_lr.predict_proba(X_test)
test_pred_dt = model_dt.predict_proba(X_test)
test_pred_svc = model_svc.predict_proba(X_test)
X_test_meta = np.hstack([test_pred_lr, test_pred_dt, test_pred_svc])

# Meta-model produces the final prediction
y_pred = meta_model.predict(X_test_meta)

# Evaluate blending performance
accuracy = accuracy_score(y_test, y_pred)
print("Blending Accuracy:", accuracy)
```

## 5. Advantages and Limitations of Blending

### Advantages
- **Simplicity:** Easier to implement than stacking, since a single train/hold-out split replaces the cross-validation loop.
- **Speed:** Generally faster to train than stacking, because each base model is fit once rather than once per fold.
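
The speed difference can be put into rough numbers by counting model fits. The helper below is a hypothetical back-of-the-envelope sketch. It assumes the common stacking procedure in which each base model is fit once per fold to produce out-of-fold predictions and then refit once on all training data.

```python
def count_fits(n_base_models: int, n_folds: int = 0) -> int:
    """Return the number of model fits for blending (n_folds=0) or stacking."""
    meta_fits = 1
    if n_folds == 0:
        # Blending: each base model is fit once, on T.
        return n_base_models + meta_fits
    # Stacking: one fit per fold per base model, plus a final refit of each.
    return n_base_models * n_folds + n_base_models + meta_fits

print(count_fits(3))             # 4
print(count_fits(3, n_folds=5))  # 19
```

The count ignores that blending's base models also train on fewer rows, which shortens each individual fit further.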

### Limitations
- **Hold-Out Dependency:** Performance is sensitive to which rows land in the hold-out set; a different random split can yield a different meta-model.
- **Potential Overfitting:** If the hold-out set is small or unrepresentative, the meta-model may overfit to it.
- **Less Robust:** Blending tends to be less robust than stacking because the meta-model learns only from the hold-out rows, whereas stacking's out-of-fold predictions cover the whole training set.
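
One way to gauge the hold-out dependency is to repeat the blend with different T/H splits while keeping the test set fixed. The sketch below assumes Iris, two base models, and ten seeds, all chosen only for illustration. The spread it reports is specific to this small dataset.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
# Fixed test set, so only the T/H split changes between runs.
X_main, X_test, y_main, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

scores = []
for seed in range(10):
    X_T, X_H, y_T, y_H = train_test_split(
        X_main, y_main, test_size=0.25, random_state=seed, stratify=y_main
    )
    base_models = [
        LogisticRegression(max_iter=200).fit(X_T, y_T),
        DecisionTreeClassifier(random_state=0).fit(X_T, y_T),
    ]
    # Meta-features on H train the meta-model; meta-features on the test set score it.
    X_meta = np.hstack([m.predict_proba(X_H) for m in base_models])
    meta_model = LogisticRegression(max_iter=200).fit(X_meta, y_H)
    X_test_meta = np.hstack([m.predict_proba(X_test) for m in base_models])
    scores.append(accuracy_score(y_test, meta_model.predict(X_test_meta)))

scores = np.array(scores)
print(f"min={scores.min():.3f} max={scores.max():.3f} std={scores.std():.3f}")
```

A wide gap between the minimum and maximum suggests the hold-out set is too small for the meta-model, in which case a larger hold-out fraction or stacking may be worth trying.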

## 6. Conclusion

Blending is a practical ensemble method that uses a single train/hold-out split to combine the strengths of several base models. It is simpler to implement and faster to train than stacking, but the hold-out set must be large and representative enough for the meta-model to generalize. Experiment with blending on your own datasets to see whether it improves performance for your specific problem.