In [None]:
'''
1) Core Idea:
Instead of simply voting, we train another model to learn how to combine the base models' predictions.

So it's like:
Level 1 → Multiple different base models
Level 2 → One meta-model that learns from their outputs


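2) Why Is Stacking Powerful?
Because different models have different strengths and weaknesses:
Some models are good in linear regions
Some are good in non-linear regions
Some overfit
Some underfit
The meta-model learns how much to trust each base model (and, with a flexible enough meta-model, where each one performs well).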
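A minimal sketch of that idea, using a made-up prediction matrix instead of trained models (all labels and predictions are hypothetical): model A is always right, while models B and C share the same blind spot on the first four samples. Majority voting follows B and C and gets those samples wrong; a LogisticRegression meta-model fitted on the same three columns learns to give A the largest weight. The meta-model is scored on the data it was fitted on here, purely to show the learned weighting; in real stacking its inputs must be out-of-fold predictions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical true labels and base-model predictions (toy data, not real models)
y = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1])
pred_a = y.copy()                 # model A: always right
pred_b = y.copy()
pred_b[:4] = 1 - pred_b[:4]       # model B: wrong on the first 4 samples
pred_c = pred_b.copy()            # model C: same blind spot as B

# Level 1 outputs, one column per model: shape (n_samples, n_models)
P = np.column_stack([pred_a, pred_b, pred_c])

# Level 2, option 1: hard majority vote (at least 2 of 3 models say class 1)
vote = (P.sum(axis=1) >= 2).astype(int)
vote_acc = (vote == y).mean()

# Level 2, option 2: a meta-model learns how much to trust each column
meta = LogisticRegression().fit(P, y)
meta_acc = (meta.predict(P) == y).mean()

print("Majority vote accuracy:", vote_acc)
print("Meta-model accuracy:", meta_acc)
print("Meta-model weights (A, B, C):", meta.coef_.round(2))
```

Because B and C always agree, the vote simply copies their mistakes, while the meta-model can down-weight them.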


3) When To Use What? (rough rules of thumb)
Dataset small → Voting
Dataset medium/complex → Stacking
High-variance models → Bagging
High-bias models → Boosting

Q)
1) What is data leakage?
2) Why do we use cross-validation in stacking?
'''

In [2]:
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, KFold
from sklearn.metrics import accuracy_score

from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

In [3]:
data = load_breast_cancer()
X = data.data
y = data.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

In [None]:
'''
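🧠 Why Do We Need Cross-Validation in Stacking?
If you train the base models on the full training data and then train the meta-model on their predictions for that same data → data leakage: the meta-model sees predictions that are far more accurate than anything the base models will produce on unseen data.

So we:
Use K-Fold
Generate out-of-fold predictions (each sample is predicted by a model that never trained on it)
Train the meta-model on those
This is how stacking is done in practice.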
'''
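A minimal sketch of the leakage problem, reusing the notebook's breast-cancer split: a fully grown DecisionTreeClassifier scored on the rows it was trained on looks (near-)perfect, while its out-of-fold predictions from cross_val_predict show how it really performs on unseen rows. Note that cross_val_predict with cv=5 uses unshuffled stratified folds, which differs from the shuffled KFold used below; random_state=0 is added only for reproducibility.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, cross_val_predict
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Leaky: fit on the full training set, then predict that same training set
tree = DecisionTreeClassifier(random_state=0)
in_sample_preds = tree.fit(X_train, y_train).predict(X_train)

# Honest: each training row is predicted by a model fitted on the other 4 folds
oof_preds = cross_val_predict(
    DecisionTreeClassifier(random_state=0), X_train, y_train, cv=5
)

in_sample_acc = accuracy_score(y_train, in_sample_preds)
oof_acc = accuracy_score(y_train, oof_preds)
print("In-sample accuracy (what a leaky meta-model would see):", in_sample_acc)
print("Out-of-fold accuracy (what the meta-model should see):", oof_acc)
```

A meta-model trained on the in-sample column would learn to trust the tree far more than it deserves.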

In [5]:
# Level 1: diverse base models
base_models = [
    LogisticRegression(max_iter=5000),
    DecisionTreeClassifier(),
    SVC(probability=True)
]

kf = KFold(n_splits=5, shuffle=True, random_state=42)
n_train, n_test = X_train.shape[0], X_test.shape[0]
n_models = len(base_models)

# One column per base model: out-of-fold predictions for the training set
# and fold-averaged predictions for the test set
meta_train = np.zeros((n_train, n_models))
meta_test = np.zeros((n_test, n_models))

In [9]:
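for i, model in enumerate(base_models):
    # Test-set predictions from each fold's model, one column per fold
    meta_test_fold = np.zeros((n_test, kf.get_n_splits()))

    for j, (train_idx, val_idx) in enumerate(kf.split(X_train)):
        X_tr, X_val = X_train[train_idx], X_train[val_idx]
        y_tr = y_train[train_idx]

        # Fit on K-1 folds only
        model.fit(X_tr, y_tr)

        # Out-of-fold predictions: these rows were not seen during fit
        meta_train[val_idx, i] = model.predict(X_val)

        meta_test_fold[:, j] = model.predict(X_test)

    # Average the K fold models' test predictions (fraction of folds predicting class 1)
    meta_test[:, i] = meta_test_fold.mean(axis=1)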
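One detail in the loop above: meta_train holds hard 0/1 labels, while meta_test holds the fraction of fold models that predicted class 1, so the meta-model is fitted and evaluated on features with slightly different distributions. A common alternative, sketched below as a variant (not the notebook's method, and with no accuracy claimed), stacks predicted probabilities on both sides: cross_val_predict(..., method="predict_proba") gives the out-of-fold features, and each base model is refitted on the full training set to produce the test features, which is also the approach scikit-learn's StackingClassifier takes. The random_state values on the tree and SVC are added for reproducibility.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, KFold, cross_val_predict
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

base_models = [
    LogisticRegression(max_iter=5000),
    DecisionTreeClassifier(random_state=0),
    SVC(probability=True, random_state=0),
]
kf = KFold(n_splits=5, shuffle=True, random_state=42)

oof_cols, test_cols = [], []
for model in base_models:
    # Out-of-fold P(class 1) for every training row (cross_val_predict clones the model)
    oof = cross_val_predict(model, X_train, y_train, cv=kf, method="predict_proba")
    oof_cols.append(oof[:, 1])
    # Test features: refit on the full training set, then predict P(class 1)
    test_cols.append(model.fit(X_train, y_train).predict_proba(X_test)[:, 1])

meta_train_proba = np.column_stack(oof_cols)
meta_test_proba = np.column_stack(test_cols)

meta_model = LogisticRegression(max_iter=5000).fit(meta_train_proba, y_train)
print("Stacking (probabilities) accuracy:",
      accuracy_score(y_test, meta_model.predict(meta_test_proba)))
```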

In [10]:
# Level 2: the meta-model is trained on the out-of-fold predictions
meta_model = LogisticRegression(max_iter=5000)
meta_model.fit(meta_train, y_train)

final_predictions = meta_model.predict(meta_test)
print("Stacking Accuracy:",
      accuracy_score(y_test, final_predictions))

Stacking Accuracy: 0.956140350877193


In [11]:
# Baseline: each base model trained on the full training set
for model in base_models:
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    print(type(model).__name__,
          "Accuracy:",
          accuracy_score(y_test, preds))

LogisticRegression Accuracy: 0.956140350877193
DecisionTreeClassifier Accuracy: 0.9473684210526315
SVC Accuracy: 0.9473684210526315


In [None]:
'''
🧠 Step 1: What Is K-Fold?
Imagine you have 100 samples.
Instead of training once, we split the data into parts.
If we use 5-Fold, we divide the data into 5 equal parts:

Part 1
Part 2
Part 3
Part 4
Part 5

Each part has 20 samples.
🔁 How K-Fold Works
We train 5 times:

Round 1
Train on: 2,3,4,5
Test on: 1

Round 2
Train on: 1,3,4,5
Test on: 2

Round 3
Train on: 1,2,4,5
Test on: 3

…and so on.

Every part gets a chance to be the "test set".
That's K-Fold cross-validation.
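The rounds above can be checked with scikit-learn's KFold on 100 dummy samples. Indices are 0-based here, so Part 1 is samples 0-19; without shuffle=True the parts are consecutive blocks.

```python
import numpy as np
from sklearn.model_selection import KFold

X_demo = np.arange(100).reshape(-1, 1)   # 100 dummy samples, one feature each
kf = KFold(n_splits=5)                   # no shuffle: parts are consecutive blocks

times_tested = np.zeros(100, dtype=int)
for round_no, (train_idx, test_idx) in enumerate(kf.split(X_demo), start=1):
    times_tested[test_idx] += 1
    print(f"Round {round_no}: train on {len(train_idx)} samples, "
          f"test on samples {test_idx[0]}-{test_idx[-1]}")

# Every sample should land in the test part exactly once
print("Every sample tested exactly once:", bool((times_tested == 1).all()))
```

The first line printed is `Round 1: train on 80 samples, test on samples 0-19`.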

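🎯 Why Do We Do This?
Because with a single train/test split, the score can depend heavily on which samples happened to land in the test set.

K-Fold:
✔ Uses every sample for training (in K-1 of the K rounds)
✔ Uses every sample for testing (in exactly one round)
✔ Gives a more reliable performance estimate (an average over K scores)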
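A short sketch of the "more reliable estimate" point, assuming the notebook's LogisticRegression and a shuffled 5-fold split on the full breast-cancer data: cross_val_score returns one accuracy per round, so you get both an average and a sense of how much a single split could have varied.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
kf = KFold(n_splits=5, shuffle=True, random_state=42)

# One accuracy score per round; each sample is scored exactly once
scores = cross_val_score(LogisticRegression(max_iter=5000), X, y,
                         cv=kf, scoring="accuracy")
print("Per-fold accuracy:", scores.round(3))
print(f"Mean: {scores.mean():.3f}  Std: {scores.std():.3f}")
```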

🧠 Now Let's Connect It to Stacking
Suppose we are doing stacking.

We have:
Base Model 1
Base Model 2
Meta Model

If we train the base models on the full training data and then predict on that same data:
Those predictions are unrealistically good.
The models have already seen the answers.

That's cheating.
💡 So What Do We Do Instead?

We use K-Fold.
Let‚Äôs say 5-fold again.

For each fold:
Train the base model on 4 parts
Predict on the 1 part it didn't see
Now every sample gets a prediction from a model that never trained on it.

That prediction is called:
👉 Out-of-Fold (OOF) prediction

🔥 Very Small Example
Data:
Sample 1
Sample 2
Sample 3
Sample 4
Sample 5

2-Fold:
Split into:

Fold A → 1,2,3
Fold B → 4,5

First round:
Train on B → Predict on A

Second round:
Train on A → Predict on B

Now:
Every sample has a prediction
from a model that didn't see it.

That‚Äôs what we use to train the meta-model.
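The same 5-sample, 2-fold example with scikit-learn's KFold: with 5 samples and 2 folds, the first fold gets the extra sample, so the unshuffled split reproduces Fold A = 1,2,3 and Fold B = 4,5 exactly.

```python
import numpy as np
from sklearn.model_selection import KFold

samples = np.array([1, 2, 3, 4, 5])   # sample IDs from the example above
kf = KFold(n_splits=2)                # no shuffle: Fold A = first 3, Fold B = last 2

folds = []
for train_idx, pred_idx in kf.split(samples.reshape(-1, 1)):
    folds.append(samples[pred_idx].tolist())
    print("Train on", samples[train_idx].tolist(),
          "-> predict on", samples[pred_idx].tolist())
```

This prints `Train on [4, 5] -> predict on [1, 2, 3]` and then `Train on [1, 2, 3] -> predict on [4, 5]`, matching the two rounds above.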

🧠 Simple Analogy
Think of it like this:
If you ask a friend to solve a problem:
If they already saw the answer → their guess looks perfect but tells you nothing
If they didn't see the answer → their guess is honest

Stacking needs honest guesses.
K-Fold gives honest guesses.

🚀 Why This Matters

If you skip K-Fold:

Meta-model learns from fake-perfect predictions
→ Overfitting
→ Bad real-world performance

If you use K-Fold:

Meta-model learns from realistic predictions
→ Better generalization
'''

In [6]:
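# Built-in stacking: StackingClassifier generates the out-of-fold predictions internally
# (cv=None means 5-fold, stratified for classifiers) and, with the default
# stack_method="auto", stacks predict_proba outputs when a base model supports them
from sklearn.ensemble import StackingClassifier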

stack = StackingClassifier(
    estimators=[
        ('lr', LogisticRegression(max_iter=5000)),
        ('tree', DecisionTreeClassifier()),
        ('svm', SVC(probability=True))
    ],
    final_estimator=LogisticRegression()
)

stack.fit(X_train, y_train)
print("Stacking Accuracy:",
      accuracy_score(y_test, stack.predict(X_test)))


Stacking Accuracy: 0.956140350877193
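A sketch of how to look inside the built-in version, assuming a recent scikit-learn: the same three base models with the cross-validation and stacking method set explicitly, then transform() to see the meta-features and final_estimator_.coef_ to see how much the meta-model trusts each base model. The StratifiedKFold settings and random_state values are illustrative additions, not part of the original cell, and no particular coefficient values are claimed.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, StratifiedKFold
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

stack = StackingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=5000)),
        ("tree", DecisionTreeClassifier(random_state=0)),
        ("svm", SVC(probability=True, random_state=0)),
    ],
    final_estimator=LogisticRegression(),
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=42),  # explicit folds
    stack_method="predict_proba",  # stack probabilities, not hard labels
)
stack.fit(X_train, y_train)

# Meta-features for the test set: one P(class 1) column per base model
# (for a binary target the redundant class-0 column is dropped)
meta_features = stack.transform(X_test)
print("Meta-feature shape:", meta_features.shape)

# The meta-model's coefficients show how much weight each base model gets
for name, coef in zip(["lr", "tree", "svm"], stack.final_estimator_.coef_[0]):
    print(f"{name}: {coef:.2f}")
```

Since every meta-feature is a probability in [0, 1], the coefficients are on a comparable scale, which makes them a rough guide to which base model the meta-model leans on.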
