# 🧠 Stacking (Stacked Generalization)
---

## 📌 Definition
Stacking is an **ensemble learning technique** where:
- Multiple base models are trained
- Their predictions are used as features
- A meta-model learns how to combine them

Also called:
- Stacked Generalization
- Level-0 → Level-1 learning

---

## 🔥 Why Stacking Exists
Simple ensembles like:
- Voting
- Averaging

Combine models with fixed weights (equal by default), regardless of how reliable each model actually is.

Stacking instead:
> Learns from data how to weight and combine the base models

---

## 🆚 Voting vs Stacking

### Voting
- Mean (regression)
- Majority vote (classification)
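- No learning: the combination rule is fixed in advance

### Stacking
- Trains another model on the base predictions
- Learns how the base models' outputs relate to the target
- Often higher accuracy, at extra training cost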
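
As a rough illustration of the contrast above, the sketch below fits a `VotingRegressor` and a `StackingRegressor` over the same two base models. The synthetic data and the choice of base models are assumptions made for the example; which ensemble scores better depends on the data.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, StackingRegressor, VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic regression data (an assumption, for illustration only).
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

base_models = [
    ("lr", LinearRegression()),
    ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
]

# Voting: a fixed, unweighted mean of the base predictions (nothing is learned).
voting = VotingRegressor(estimators=base_models).fit(X_train, y_train)

# Stacking: a meta-model is fit on out-of-fold base predictions.
stacking = StackingRegressor(
    estimators=base_models, final_estimator=LinearRegression(), cv=5
).fit(X_train, y_train)

print("voting   R^2:", voting.score(X_test, y_test))
print("stacking R^2:", stacking.score(X_test, y_test))
# The learned combination weights, one per base model.
print("meta-model coefficients:", stacking.final_estimator_.coef_)
```

The printed coefficients show the weights the meta-model assigned to each base model, which is the part voting cannot learn.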

---

## 🧱 Architecture

```
Input Features
   ↓
Base Models (Level-0)
   ↓
Predictions as Features
   ↓
Meta Model (Level-1)
   ↓
Final Prediction
```

---

## 🧪 Example Dataset
Features:
- CGPA
- IQ

Target:
- Salary

---

## ⚙️ Basic Workflow

### Step 1: Train Base Models
Examples:
- Linear Regression
- Random Forest
- Decision Tree
- KNN

---

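### Step 2: Generate Meta Dataset

For each sample, record every base model's prediction next to the true target:

| LR | RF | DT | Actual |
|----|----|----|--------|
| LR prediction | RF prediction | DT prediction | True salary |

This table becomes the training data for the meta-model.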
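
A minimal sketch of what this table might look like in code, using made-up CGPA/IQ/salary numbers (the data-generating formula, sample size, and model settings are all assumptions). The predictions are produced with `cross_val_predict`, so no model predicts a row it was trained on.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeRegressor

# Made-up data: salary as a noisy function of CGPA and IQ (illustrative only).
rng = np.random.default_rng(0)
cgpa = rng.uniform(5.0, 10.0, size=200)
iq = rng.normal(100.0, 15.0, size=200)
salary = 2.0 * cgpa + 0.1 * iq + rng.normal(0.0, 1.0, size=200)
X = np.column_stack([cgpa, iq])

base_models = {
    "LR": LinearRegression(),
    "RF": RandomForestRegressor(n_estimators=50, random_state=0),
    "DT": DecisionTreeRegressor(max_depth=4, random_state=0),
}

# One column of predictions per base model, plus the true target.
meta_df = pd.DataFrame(
    {name: cross_val_predict(model, X, salary, cv=5) for name, model in base_models.items()}
)
meta_df["Actual"] = salary
print(meta_df.head())
```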

---

### Step 3: Train Meta Model
Common choices:
- Logistic Regression (classification)
- Linear Regression (regression)
- Random Forest

Meta-model learns:
- Which model to trust
- When to trust it

---

### Step 4: Prediction
For new input:
1. Get base predictions
2. Feed into meta-model
3. Output final prediction

---

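## ⚠️ Core Problem: Data Leakage

If:
- Base models are trained on the full training data
- The meta-model is trained on their predictions for that same data

Then:
❌ The meta-model overfits to overly optimistic base predictions  
❌ Measured performance is unrealistically high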
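
The effect can be seen with a single base model. In this sketch (synthetic data and an unpruned decision tree, both assumptions chosen to make the gap obvious), in-sample predictions look far better than predictions for rows the model never saw.

```python
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=300, n_features=5, noise=20.0, random_state=0)

tree = DecisionTreeRegressor(random_state=0)

# Leaky meta-feature: the tree predicts rows it was trained on.
in_sample_pred = tree.fit(X, y).predict(X)

# Honest meta-feature: each row is predicted by a tree that never saw it.
out_of_fold_pred = cross_val_predict(tree, X, y, cv=5)

print("in-sample MSE  :", mean_squared_error(y, in_sample_pred))
print("out-of-fold MSE:", mean_squared_error(y, out_of_fold_pred))
```

A meta-model trained on the in-sample column would learn to trust the tree far more than it deserves on new data.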

---

## 🛠️ Solutions to Leakage

### 1️⃣ Holdout Stacking (Blending)
Split data:

```
Train → Base models
Validation → Meta model
```

Simple, but wastes data: the base models never train on the validation rows, and the meta-model trains only on them.
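
A minimal blending sketch under assumed settings (synthetic data, a 60/20/20 split, and two arbitrary base models); the helper `meta_features` is a hypothetical name introduced for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=600, n_features=10, random_state=0)

# Three-way split: base-model training, meta-model training (holdout), final test.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_base, X_hold, y_base, y_hold = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

base_models = [
    RandomForestClassifier(n_estimators=50, random_state=0),
    KNeighborsClassifier(n_neighbors=5),
]
for model in base_models:
    model.fit(X_base, y_base)  # base models never see the holdout rows


def meta_features(X_part):
    """Stack each base model's positive-class probability as one column."""
    return np.column_stack([m.predict_proba(X_part)[:, 1] for m in base_models])


# The meta-model is trained only on predictions for the holdout rows.
meta_model = LogisticRegression().fit(meta_features(X_hold), y_hold)

test_accuracy = meta_model.score(meta_features(X_test), y_test)
print("blended test accuracy:", test_accuracy)
```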

---

### 2️⃣ K-Fold Stacking ⭐ (Industry Standard)

Steps:
1. Split training data into K folds
2. For each fold:
   - Train each base model on the other K-1 folds
   - Predict on the remaining (held-out) fold
3. Combine the held-out predictions so every training row has one prediction per base model
4. Train meta-model on combined predictions

This ensures:
✔ No leakage  
✔ Full data usage  
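
The steps above can be sketched by hand as follows. The data, the three base models, and the linear meta-model are assumptions for illustration; `oof` is a name chosen here for the out-of-fold prediction matrix.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

base_models = [
    LinearRegression(),
    DecisionTreeRegressor(max_depth=4, random_state=0),
    KNeighborsRegressor(n_neighbors=5),
]
kfold = KFold(n_splits=5, shuffle=True, random_state=0)

# One row per training sample, one column per base model.
oof = np.full((len(X), len(base_models)), np.nan)

for train_idx, held_out_idx in kfold.split(X):
    for col, model in enumerate(base_models):
        fold_model = clone(model).fit(X[train_idx], y[train_idx])
        # Each row is filled exactly once, by a model that did not train on it.
        oof[held_out_idx, col] = fold_model.predict(X[held_out_idx])

# The meta-model trains on the combined out-of-fold predictions.
meta_model = LinearRegression().fit(oof, y)
print("meta-model weights:", meta_model.coef_)
```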

---

## 🧮 Training Cost

If:
- M base models
- K folds

Total base trainings:
```
M × K
```

Example:
- 3 models × 5 folds = 15 trainings
- Retraining each base model on the full data afterwards adds M more (3 here)
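
The count can be checked empirically. `CountingRegressor` below is a hypothetical helper written for this sketch (not part of scikit-learn); it wraps a linear model and increments a counter on every `fit` call.

```python
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict


class CountingRegressor(RegressorMixin, BaseEstimator):
    """Hypothetical helper: a linear model that counts every call to fit()."""

    fit_calls = 0  # class attribute, so clones made by scikit-learn share it

    def fit(self, X, y):
        CountingRegressor.fit_calls += 1
        self.model_ = LinearRegression().fit(X, y)
        return self

    def predict(self, X):
        return self.model_.predict(X)


X, y = make_regression(n_samples=100, n_features=3, random_state=0)

M, K = 3, 5
base_models = [CountingRegressor() for _ in range(M)]

# Building out-of-fold predictions fits each base model once per fold.
for model in base_models:
    cross_val_predict(model, X, y, cv=K)

print("base trainings:", CountingRegressor.fit_calls)  # M * K = 15
```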

---

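## 🔁 Retraining Step (Often Missed)

After meta-model training:

You should:
- Retrain the base models on the full training data

Why?
During K-fold stacking each base model saw only K-1 folds. Refitting on all of the training data gives inference the strongest version of each model.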
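
A compact sketch of where the retraining step sits in the whole pipeline. The data, the base models, and the `Ridge` meta-model are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_predict, train_test_split

X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
X_train, X_new, y_train, y_new = train_test_split(X, y, random_state=0)

base_models = [LinearRegression(), RandomForestRegressor(n_estimators=50, random_state=0)]

# 1) Meta-features for training: out-of-fold predictions from fold-specific clones.
oof = np.column_stack([cross_val_predict(m, X_train, y_train, cv=5) for m in base_models])
meta_model = Ridge().fit(oof, y_train)

# 2) Retraining step: refit every base model once on the full training set.
for m in base_models:
    m.fit(X_train, y_train)

# 3) Inference: full-data base models feed the already-trained meta-model.
new_meta_features = np.column_stack([m.predict(X_new) for m in base_models])
final_pred = meta_model.predict(new_meta_features)
print(final_pred[:5])
```

scikit-learn's `StackingClassifier` and `StackingRegressor` follow the same pattern: the stored base estimators are fitted on the full training data, while the final estimator is trained on cross-validated predictions.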

---

## 🧠 Blending vs Stacking

| Feature | Blending | Stacking |
|--------|----------|---------|
| Split | Holdout | K-fold |
| Data efficiency | Low | High |
| Complexity | Low | Higher |
| Industry usage | Medium | High |

---

## 🧩 Differences from Bagging & Boosting

| Method | Idea |
|--------|------|
| Bagging | Parallel training on bootstrap resamples |
| Boosting | Sequential correction of earlier errors |
| Stacking | Meta-learning over base predictions |

Key difference:
Stacking learns the combination explicitly.

---

## 🧬 Advanced Stacking

### Multi-layer Stacking
```
Layer 1 → Many models
Layer 2 → Fewer models
Layer 3 → Meta model
```

Used in:
- Kaggle winners
- AutoML
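
In scikit-learn, one way to get several layers is to pass a stacking estimator as the `final_estimator` of another. The sketch below does this with small, arbitrarily chosen models on synthetic data (all assumptions for illustration).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=10, random_state=0)

# Layer 2 plus the final meta-model: itself a small stack over the layer-1 outputs.
upper_layers = StackingClassifier(
    estimators=[
        ("rf2", RandomForestClassifier(n_estimators=20, random_state=0)),
        ("lr2", LogisticRegression()),
    ],
    final_estimator=LogisticRegression(),
    cv=3,
)

# Layer 1: several base models whose predictions feed the upper layers.
multi_layer = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=20, random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
        ("dt", DecisionTreeClassifier(max_depth=4, random_state=0)),
    ],
    final_estimator=upper_layers,
    cv=3,
)
multi_layer.fit(X, y)
print(multi_layer.predict(X[:5]))
```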

---

### Real Competition Example
Winning pipelines often use:
- 50+ base models
- Multiple stacking layers
- Neural net meta-models

---

## 🧪 sklearn Implementation

### Basic Example

```python
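from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic data so the snippet runs on its own (an assumption; swap in your own X, y).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Level-0 (base) models, given as (name, estimator) pairs.
base_models = [
    ("rf", RandomForestClassifier(random_state=0)),
    ("svm", SVC(probability=True, random_state=0)),
]

# Level-1 (meta) model that combines the base predictions.
meta_model = LogisticRegression()

stack = StackingClassifier(
    estimators=base_models,
    final_estimator=meta_model,
    cv=5,  # meta-features come from 5-fold out-of-fold predictions
)

stack.fit(X_train, y_train)
print(stack.score(X_test, y_test))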
```

---

## 🔧 Important Parameters

### `cv`
- Number of folds, or a cross-validation splitter object
- The meta-model is trained on out-of-fold predictions, which prevents leakage
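
A sketch of passing a splitter object instead of an integer. The imbalanced synthetic data and the model choices are assumptions; for classifiers an integer `cv` already uses stratified folds, so the main gain from an explicit `StratifiedKFold` here is control over shuffling and the random seed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

# Imbalanced synthetic data (roughly 80/20), an assumption for illustration.
X, y = make_classification(n_samples=400, n_features=8, weights=[0.8, 0.2], random_state=0)

# Shuffled, stratified folds: each fold keeps roughly the same class ratio.
folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
    ],
    final_estimator=LogisticRegression(),
    cv=folds,  # a CV splitter object instead of an integer
)
stack.fit(X, y)
print(stack.score(X, y))
```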

---

### `passthrough=True`
Adds the original features to the meta-model's input.

Without (`passthrough=False`, the default):
The meta-model uses only the base predictions.

With:
The meta-model uses:
- Predictions
- Original features
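
The difference shows up in the width of the meta-model's input. This sketch (synthetic data and a hypothetical `build_stack` helper, both assumptions) compares the two settings via `transform`, which returns the features handed to the final estimator.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, n_features=6, random_state=0)


def build_stack(passthrough):
    """Hypothetical helper: the same stack, with passthrough switched on or off."""
    return StackingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=30, random_state=0)),
            ("knn", KNeighborsClassifier(n_neighbors=5)),
        ],
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,
        passthrough=passthrough,
    )


without = build_stack(False).fit(X, y)
with_features = build_stack(True).fit(X, y)

# transform() returns what the meta-model receives as input.
n_without = without.transform(X).shape[1]
n_with = with_features.transform(X).shape[1]
print("meta-features without passthrough:", n_without)
print("meta-features with passthrough   :", n_with)
```

The second number exceeds the first by the number of original feature columns.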

---

### Meta-model choice
Common choices:
- Linear models (stable)
- Logistic regression
- LightGBM (advanced)

Avoid:
- Overly complex meta-models at first; they can overfit the few meta-features

---

## 📈 When Stacking Works Best

✔ Diverse base models  
✔ Medium-sized datasets  
✔ Competition settings  
✔ High accuracy is worth the extra training cost  

---

## ❌ When Stacking Fails

- Small datasets
- Highly correlated models
- Poor CV design
- Leakage mistakes

---

## 🧠 Best Practices

✔ Use diverse algorithms  
✔ Feed the meta-model comparable inputs (class probabilities on a common 0-1 scale)  
✔ Use stratified folds for classification  
✔ Keep the meta-model simple first  
✔ Tune later  

---

## 🚫 Common Mistakes

❌ Training the meta-model on in-sample base predictions  
❌ Using near-identical base models  
❌ Ignoring probability calibration  
❌ Skipping cross-validation  

---

## ⚡ Performance Tips

- Use out-of-fold predictions as meta-features
- Blend with weighted averaging
- Use a stacking + blending hybrid
- Calibrate base-model probabilities (e.g. Platt scaling)
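
A sketch of the calibration tip, assuming a `LinearSVC` base model (which has no `predict_proba` of its own) wrapped in `CalibratedClassifierCV` with `method="sigmoid"`, scikit-learn's name for Platt scaling. The data and the other model choices are also assumptions.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=400, n_features=10, random_state=0)

# Sigmoid (Platt) calibration turns the SVM's decision scores into probabilities.
calibrated_svm = CalibratedClassifierCV(LinearSVC(), method="sigmoid", cv=3)

stack = StackingClassifier(
    estimators=[
        ("svm", calibrated_svm),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ],
    final_estimator=LogisticRegression(),
    cv=5,
    stack_method="predict_proba",  # every base model must expose probabilities
)
stack.fit(X, y)
proba = stack.predict_proba(X[:5])
print(proba)
```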

---

## 📊 Real-World Use Cases

- Kaggle competitions
- Credit scoring
- Medical prediction
- Ad ranking
- Fraud detection

---

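## 🎯 Interview Questions

**Q: What is stacking?**  
An ensemble in which a meta-learner is trained on the base models' predictions.

**Q: Why is it often better than voting?**  
It learns how to combine the models instead of using fixed weights.

**Q: Biggest risk?**  
Data leakage.

**Q: Solution?**  
K-fold stacking with out-of-fold predictions.

**Q: Blending vs stacking?**  
Holdout split vs cross-validation.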

---


In [None]:
import numpy as np
import pandas as pd

In [None]:
df = pd.read_csv('breast_cancer.csv')
df.sample(10)

Unnamed: 0,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9,Unnamed: 10,Unnamed: 11,Unnamed: 12,Unnamed: 13,Unnamed: 14,Unnamed: 15,Unnamed: 16,Unnamed: 17,Unnamed: 18,Unnamed: 19,Unnamed: 20,Unnamed: 21,Unnamed: 22,Unnamed: 23,Unnamed: 24,Unnamed: 25,Unnamed: 26,569,30,malignant,benign
12.3,15.9,78.83,463.7,0.0808,0.07253,0.03844,0.01654,0.1667,0.05474,0.2382,0.8355,1.687,18.32,0.005996,0.02212,0.02117,0.006433,0.02025,0.001725,13.35,19.59,86.65,546.7,0.1096,0.165,0.1423,0.04815,0.2482,0.06306,1
14.97,19.76,95.5,690.2,0.08421,0.05352,0.01947,0.01939,0.1515,0.05266,0.184,1.065,1.286,16.64,0.003634,0.007983,0.008268,0.006432,0.01924,0.00152,15.98,25.82,102.3,782.1,0.1045,0.09995,0.0775,0.05754,0.2646,0.06085,1
13.21,25.25,84.1,537.9,0.08791,0.05205,0.02772,0.02068,0.1619,0.05584,0.2084,1.35,1.314,17.58,0.005768,0.008082,0.0151,0.006451,0.01347,0.001828,14.35,34.23,91.29,632.9,0.1289,0.1063,0.139,0.06005,0.2444,0.06788,1
18.81,19.98,120.9,1102.0,0.08923,0.05884,0.0802,0.05843,0.155,0.04996,0.3283,0.828,2.363,36.74,0.007571,0.01114,0.02623,0.01463,0.0193,0.001676,19.96,24.3,129.0,1236.0,0.1243,0.116,0.221,0.1294,0.2567,0.05737,0
11.45,20.97,73.81,401.5,0.1102,0.09362,0.04591,0.02233,0.1842,0.07005,0.3251,2.174,2.077,24.62,0.01037,0.01706,0.02586,0.007506,0.01816,0.003976,13.11,32.16,84.53,525.1,0.1557,0.1676,0.1755,0.06127,0.2762,0.08851,1
13.66,15.15,88.27,580.6,0.08268,0.07548,0.04249,0.02471,0.1792,0.05897,0.1402,0.5417,1.101,11.35,0.005212,0.02984,0.02443,0.008356,0.01818,0.004868,14.54,19.64,97.96,657.0,0.1275,0.3104,0.2569,0.1054,0.3387,0.09638,1
11.54,14.44,74.65,402.9,0.09984,0.112,0.06737,0.02594,0.1818,0.06782,0.2784,1.768,1.628,20.86,0.01215,0.04112,0.05553,0.01494,0.0184,0.005512,12.26,19.68,78.78,457.8,0.1345,0.2118,0.1797,0.06918,0.2329,0.08134,1
9.029,17.33,58.79,250.5,0.1066,0.1413,0.313,0.04375,0.2111,0.08046,0.3274,1.194,1.885,17.67,0.009549,0.08606,0.3038,0.03322,0.04197,0.009559,10.31,22.65,65.5,324.7,0.1482,0.4365,1.252,0.175,0.4228,0.1175,1
8.671,14.45,54.42,227.2,0.09138,0.04276,0.0,0.0,0.1722,0.06724,0.2204,0.7873,1.435,11.36,0.009172,0.008007,0.0,0.0,0.02711,0.003399,9.262,17.04,58.36,259.2,0.1162,0.07057,0.0,0.0,0.2592,0.07848,1
15.61,19.38,100.0,758.6,0.0784,0.05616,0.04209,0.02847,0.1547,0.05443,0.2298,0.9988,1.534,22.18,0.002826,0.009105,0.01311,0.005174,0.01013,0.001345,17.91,31.67,115.9,988.6,0.1084,0.1807,0.226,0.08568,0.2683,0.06829,0


In [None]:
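# Caution: the CSV's first line (569,30,malignant,benign) looks like metadata read as a header,
# which would explain why X ends up with only 3 columns (see the shape printed below).
X = df.drop(columns=['benign'])
y = df['benign']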

In [None]:
from sklearn.model_selection import train_test_split

# Hold out 20% of the rows for testing; random_state fixes the split.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=8)
print(X_train.shape)

(455, 3)
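
The shape above shows only 3 feature columns, while the breast cancer dataset has 30. One possible way to get all of them, sketched here as an alternative to the CSV (the `_full` variable names are hypothetical), is scikit-learn's bundled loader:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# as_frame=True returns pandas objects with named feature columns.
data = load_breast_cancer(as_frame=True)
X_full = data.data      # 569 rows x 30 feature columns
y_full = data.target    # 0 = malignant, 1 = benign

X_train_full, X_test_full, y_train_full, y_test_full = train_test_split(
    X_full, y_full, test_size=0.2, random_state=8
)
print(X_train_full.shape)
```

Results obtained with all 30 features will differ from the accuracy reported in this notebook, which was computed on the 3-column frame.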


In [None]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier

In [None]:
estimators = [
    ('rf', RandomForestClassifier(n_estimators=10, random_state=42)),
    ('knn', KNeighborsClassifier(n_neighbors=10)),
    ('gbdt', GradientBoostingClassifier())
]

In [None]:
from sklearn.ensemble import StackingClassifier

clf = StackingClassifier(
    estimators=estimators,
    final_estimator=LogisticRegression(),
    cv=10
)

In [None]:
clf.fit(X_train, y_train)

In [None]:
y_pred = clf.predict(X_test)

In [None]:
from sklearn.metrics import accuracy_score
accuracy_score(y_test, y_pred)

0.956140350877193