# Stacking in Machine Learning |[Link](https://github.com/AdilShamim8/50-Days-of-Machine-Learning/tree/main/Day%2044%20Stacking%20and%20Blending)

Stacking (or stacked generalization) is an ensemble learning technique that combines multiple base models (level-0 models) to improve predictive performance by training a meta-model (level-1 model) on the outputs (predictions) of these base models.

## 1. Overview

- **Goal:** To leverage the strengths of different models to achieve better performance than any single model.
- **Process:** Train several base learners on the training data and then use their predictions as features to train a meta-learner.
- **Applications:** Used in both classification and regression problems.

## 2. How It Works

### Base Models (Level-0)

Assume you have \(K\) base models:
$$
f_1(x), f_2(x), \dots, f_K(x)
$$
For a given input \( x \), each base model produces a prediction:
$$
\hat{y}_k = f_k(x) \quad \text{for } k = 1, 2, \dots, K
$$

### Meta-Model (Level-1)

The meta-model \( g \) takes the base models' predictions as input to produce the final prediction:
$$
\hat{y} = g\left(\hat{y}_1, \hat{y}_2, \dots, \hat{y}_K\right)
$$
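
As an illustration only (the form of \( g \) is not fixed by stacking itself, so the linear form and the weights \( w_0, w_1, \dots, w_K \) are assumptions made here), a linear meta-model would combine the base predictions as a weighted sum:
$$
\hat{y} = w_0 + \sum_{k=1}^{K} w_k \, \hat{y}_k
$$
Simple averaging of the base models is the special case \( w_0 = 0 \) and \( w_k = 1/K \); a trained meta-model instead learns the weights from data.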

In practice, the training of the meta-model is often performed using cross-validation:
1. **Split the training data:** Use cross-validation (for example, 5-fold) to obtain out-of-fold predictions for each base model, so that every training example is predicted by a model that was not fitted on it.
2. **Train meta-model:** Use these out-of-fold predictions as features (often called the “level-1” data) along with the true target values to train the meta-learner.
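
The two steps above can be sketched with scikit-learn's `cross_val_predict`. This is a minimal sketch, not the only way to build level-1 data: the choice of base models, the Iris dataset, and the variable names `oof_blocks` and `level1_X` are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Illustrative base models; any classifiers with predict_proba would do.
base_models = [
    LogisticRegression(max_iter=1000),
    DecisionTreeClassifier(random_state=0),
]

# Each call returns out-of-fold class probabilities: every row is predicted
# by a model that did not see that row during fitting.
oof_blocks = [
    cross_val_predict(model, X, y, cv=5, method="predict_proba")
    for model in base_models
]

# Level-1 data: one block of probability columns per base model.
level1_X = np.hstack(oof_blocks)

# The meta-learner is trained on the level-1 data and the true targets.
meta_model = LogisticRegression(max_iter=1000)
meta_model.fit(level1_X, y)

print(level1_X.shape)
```

Using class probabilities rather than hard labels as meta-features is a design choice; it gives the meta-learner information about how confident each base model is.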

## 3. Algorithm Steps

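1. **Divide the Data:** Split the training data into folds for cross-validation, keeping a separate test set for the final evaluation.
2. **Train Base Models:** For each base model \( f_k \):
   - Use cross-validation to generate out-of-fold predictions: train on all folds but one and predict on the fold that was left out.
   - Collect these predictions to form a new dataset.
   - Refit the model on the full training set so that it can be used at prediction time.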
3. **Form Meta-Features:** Construct a new feature matrix where each column corresponds to the predictions from a base model.
4. **Train the Meta-Model:** Train a model \( g \) on the new dataset (meta-features) with the true target values.
5. **Make Final Predictions:** For a new input \( x \):
   - Obtain base predictions \( \hat{y}_1, \hat{y}_2, \dots, \hat{y}_K \).
   - Pass these to the meta-model \( g \) to obtain the final prediction \( \hat{y} \).
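
The steps above can be written out by hand for a regression task. The following is a minimal sketch under stated assumptions: the synthetic data from `make_regression`, the two base models, the linear meta-model, and names such as `meta_features` and `fitted_base` are all illustrative choices, not part of the algorithm itself.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import KFold, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data, used only for illustration.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
X_train, X_new, y_train, y_new = train_test_split(
    X, y, test_size=0.25, random_state=0
)

base_models = [Ridge(alpha=1.0), DecisionTreeRegressor(max_depth=4, random_state=0)]

# Steps 2-3: out-of-fold predictions become the meta-feature matrix,
# one column per base model.
meta_features = np.zeros((X_train.shape[0], len(base_models)))
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
for col, model in enumerate(base_models):
    for fit_idx, oof_idx in kfold.split(X_train):
        fold_model = clone(model).fit(X_train[fit_idx], y_train[fit_idx])
        meta_features[oof_idx, col] = fold_model.predict(X_train[oof_idx])

# Step 4: train the meta-model g on the meta-features and true targets.
meta_model = LinearRegression().fit(meta_features, y_train)

# Refit each base model on the full training set for use at prediction time.
fitted_base = [clone(model).fit(X_train, y_train) for model in base_models]

# Step 5: base predictions for new inputs, then the meta-model.
new_meta_features = np.column_stack([m.predict(X_new) for m in fitted_base])
y_hat = meta_model.predict(new_meta_features)
print(y_hat.shape)
```

Because every row of `meta_features` comes from a model that never saw that row, the meta-model is trained on predictions that resemble what it will receive for unseen data.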

## 4. Python Implementation Example

Below is an example using scikit-learn's `StackingClassifier` for a classification task on the Iris dataset.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.ensemble import StackingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load dataset
data = load_iris()
X, y = data.data, data.target

# Split dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define base learners (level-0 models)
base_learners = [
    ('lr', LogisticRegression(max_iter=200)),
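    ('dt', DecisionTreeClassifier(random_state=42)),  # fixed seed for reproducible results
    ('svc', SVC(probability=True, random_state=42))  # probability=True enables predict_proba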
]

# Define meta-learner (level-1 model)
meta_learner = LogisticRegression()

# Create the stacking classifier
stacking_clf = StackingClassifier(
    estimators=base_learners,
    final_estimator=meta_learner,
    cv=5  # number of folds for generating out-of-fold predictions
)

# Train the stacking classifier
stacking_clf.fit(X_train, y_train)

# Predict on the test set
y_pred = stacking_clf.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
```
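
To see what the meta-learner actually receives, the fitted ensemble's `transform` method returns the base-model outputs for a given input. The sketch below rebuilds the same kind of ensemble so that it runs on its own; the names `stack` and `meta_features` are illustrative. With three classes and three base models that each expose `predict_proba`, the meta-feature matrix should have nine columns.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

stack = StackingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=200)),
        ("dt", DecisionTreeClassifier(random_state=42)),
        ("svc", SVC(probability=True, random_state=42)),
    ],
    final_estimator=LogisticRegression(),
    cv=5,
)
stack.fit(X_train, y_train)

# transform() returns the base-model outputs that the meta-learner sees:
# here, class probabilities from each of the three base models.
meta_features = stack.transform(X_test)
print(meta_features.shape)

# The fitted meta-learner has one row of coefficients per class
# and one column per meta-feature.
print(stack.final_estimator_.coef_.shape)
```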

## 5. Advantages and Limitations

### Advantages
- **Improved Performance:** A stack of diverse models often outperforms any single one of them, particularly when the base models make different kinds of errors.
- **Flexibility:** Can incorporate a wide range of model types.
- **Robustness:** Errors made by one base model can be offset by the others, which tends to make predictions more stable than those of a single model.

### Limitations
- **Complexity:** More computationally intensive, since every base model is trained once per cross-validation fold and the meta-model is trained on top of them.
- **Interpretability:** Harder to interpret than a single model.
- **Risk of Overfitting:** If the meta-model is trained on in-sample base predictions rather than out-of-fold or hold-out predictions, it can overfit to them.

## 6. Variations and Extensions

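- **Blending:** A variant of stacking in which the base models are trained on one part of the training data and the meta-model is trained on their predictions for a separate hold-out set, instead of on cross-validated predictions.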
- **Multi-level Stacking:** Stacking can be extended to more than two levels, although this increases complexity.
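
Blending can be sketched in a few lines. This is a minimal illustration rather than a reference implementation: the split proportions, the base models, and the helper `to_meta_features` are hypothetical choices made for the example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Three-way split: base-model training, hold-out for the meta-model, test.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
X_base, X_hold, y_base, y_hold = train_test_split(
    X_rest, y_rest, test_size=0.3, stratify=y_rest, random_state=0
)

# Base models are fitted once, on the base-training part only.
base_models = [
    LogisticRegression(max_iter=1000),
    DecisionTreeClassifier(random_state=0),
]
for model in base_models:
    model.fit(X_base, y_base)


def to_meta_features(models, X):
    """Place each model's class probabilities side by side (hypothetical helper)."""
    return np.hstack([m.predict_proba(X) for m in models])


# The meta-model is trained on predictions for the hold-out set,
# which the base models never saw.
meta_model = LogisticRegression(max_iter=1000)
meta_model.fit(to_meta_features(base_models, X_hold), y_hold)

y_pred = meta_model.predict(to_meta_features(base_models, X_test))
print("Blending accuracy:", accuracy_score(y_test, y_pred))
```

Compared with cross-validated stacking, this trains each base model only once, at the cost of giving the meta-model fewer training rows.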