# **Stacking from scratch**

Explain stacking!

Source: https://www.geeksforgeeks.org/machine-learning/stacking-in-machine-learning/
YouTube: https://www.youtube.com/watch?v=a4IS1Ai7GCI

In [2]:
# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split, KFold
from sklearn.datasets import load_iris
from sklearn.base import clone
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

**Defining the Stack class**

The goal is to build a modular, flexible ensemble learning framework based on stacking. The class combines several base models (base learners) with one meta model (final estimator): the base models' out-of-fold predictions become the input features of the meta model, which learns how to combine them into a more accurate final prediction. The number of cross-validation folds used to build those out-of-fold predictions is set through a parameter. Its main use is as a tool for implementing an ensemble model from scratch instead of relying on scikit-learn's ready-made StackingClassifier (only helpers such as clone and KFold are borrowed), which makes it suitable both for learning the concept and for further machine learning experiments.
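The core mechanism described above can be sketched in a few lines. This is a minimal illustration, not the notebook's own code: it assumes scikit-learn's cross_val_predict to generate the out-of-fold predictions (the Stack class below does the same job by hand with KFold) and uses two arbitrarily chosen base models on the same iris split used later in the notebook.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_predict
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Illustrative choice of base models and meta model (assumption)
base_models = [DecisionTreeClassifier(max_depth=3, random_state=0), SVC()]
meta_model = LogisticRegression(max_iter=1000)

# Level 0: out-of-fold predictions, so every training row is predicted
# by a model that never saw that row during fitting
meta_train = np.column_stack([
    cross_val_predict(m, X_train, y_train, cv=5) for m in base_models
])

# Refit each base model on the full training set to build test meta-features
meta_test = np.column_stack([
    m.fit(X_train, y_train).predict(X_test) for m in base_models
])

# Level 1: the meta model learns how to combine the base predictions
meta_model.fit(meta_train, y_train)
stack_acc = accuracy_score(y_test, meta_model.predict(meta_test))
print(f"Stacking accuracy: {stack_acc:.4f}")
```

Each column of meta_train holds one base model's predictions, so the meta model sees a table with one feature per base learner.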

Create a class named Stack containing:

Attributes:

* base_models: list of base models (base learners)
* meta_model: meta model (final estimator)
* n_folds: number of cross-validation folds used to build the meta-features

Methods:

* fit(X, y): trains the base models with KFold, then trains the meta model on their out-of-fold predictions
* predict(X): predicts with the base models, then with the meta model

In [5]:
# Define the Stack class

class Stack:
    """
    Stacking (ensemble learning) implemented from scratch.

    Attributes:
        base_models : list
            List of base models (base learners)
        meta_model : object
            Meta model (final estimator)
        n_folds : int
            Number of cross-validation folds used to build the meta-features
    """

    def __init__(self, base_models, meta_model, n_folds=5):
        self.base_models = base_models
        self.meta_model = meta_model
        self.n_folds = n_folds

    def fit(self, X, y):
        """Train the base models with KFold, then train the meta model."""
        # Convert to NumPy so positional indexing also works for DataFrames/Series
        X, y = np.asarray(X), np.asarray(y)
        self.base_models_ = [list() for _ in self.base_models]
        self.meta_model_ = clone(self.meta_model)
        kfold = KFold(n_splits=self.n_folds, shuffle=True, random_state=42)

        # Empty array for the out-of-fold meta-features
        out_of_fold_predictions = np.zeros((X.shape[0], len(self.base_models)))

        # Train the base models one at a time
        for i, model in enumerate(self.base_models):
            for train_idx, holdout_idx in kfold.split(X, y):
                instance = clone(model)
                instance.fit(X[train_idx], y[train_idx])
                # Each row is predicted by a model that did not see it in training
                out_of_fold_predictions[holdout_idx, i] = instance.predict(X[holdout_idx])
                self.base_models_[i].append(instance)

        # Train the meta model on the meta-features
        self.meta_model_.fit(out_of_fold_predictions, y)
        return self

    def predict(self, X):
        """Predict with the base models, then with the meta model."""
        X = np.asarray(X)
        meta_features = np.column_stack([
            # Majority vote over the fold models of each base learner
            # (assumes non-negative integer class labels, as in iris)
            np.apply_along_axis(
                lambda row: np.bincount(row.astype(int)).argmax(), 1,
                np.column_stack([model.predict(X) for model in fold_models]),
            )
            for fold_models in self.base_models_
        ])
        return self.meta_model_.predict(meta_features)
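Because fit keeps one fitted model per fold, prediction has to merge the fold models' outputs for each base learner. The sketch below isolates that majority-vote step; majority_vote is a hypothetical helper name, and it assumes non-negative integer class labels such as iris's 0, 1, 2 (ties go to the smallest label). Averaging the labels instead could produce values such as 0.67 that never appear in the integer-valued training meta-features. scikit-learn's StackingClassifier sidesteps the question differently, by refitting each base estimator on the full training set for prediction.

```python
import numpy as np

def majority_vote(fold_predictions):
    """Hypothetical helper: most frequent label in each row.

    fold_predictions has shape (n_samples, n_folds) and must contain
    non-negative integer class labels.
    """
    fold_predictions = np.asarray(fold_predictions, dtype=int)
    # bincount counts each label; argmax picks the most frequent (smallest on ties)
    return np.array([np.bincount(row).argmax() for row in fold_predictions])

# Three samples, each predicted by three fold models
preds = np.array([
    [0, 0, 1],
    [2, 2, 2],
    [1, 0, 1],
])
print(majority_vote(preds))  # [0 2 1]
```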


### **Load the dataset**

In [13]:
# Load the iris dataset from sklearn
iris = load_iris(as_frame=True)
df = iris.frame

# Separate the features and the target
X = iris.data
y = iris.target

# The target is already integer-encoded (0, 1, 2), so no extra encoding is needed

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# **Model training and evaluation of the results**
Stacking and blending do not always beat the strongest individual model, in classification or in regression. This usually has two causes. First, since the meta-features are built from the base models' predictions, weak base models can dilute the contribution of stronger ones and lower the accuracy of the final prediction as a whole. Second, a small amount of training data often leads to overfitting, which in turn reduces prediction accuracy.
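To check the first cause on a concrete split, a minimal sketch (assuming the same iris split and base models as the cells below) scores each base learner on its own next to the stack. If the stack does not beat the best single model, weak learners or too little data are likely suspects. With only 45 test samples the scores are noisy, so small differences should not be over-read.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

base = [
    ('dt', DecisionTreeClassifier(max_depth=3, random_state=0)),
    ('knn', KNeighborsClassifier(n_neighbors=5)),
    ('svm', SVC(probability=True, random_state=0)),
]

scores = {}
# Score every base model on its own to spot weak learners
for name, model in base:
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

# Score the stacked ensemble built from the same base models
# (StackingClassifier clones the estimators, so prior fitting does not matter)
stack = StackingClassifier(
    estimators=base,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)
scores['stack'] = accuracy_score(y_test, stack.predict(X_test))

for name, acc in scores.items():
    print(f"{name:>5}: {acc:.4f}")
```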

The problem can be partly addressed by setting stack_method='predict_proba', so that each base classifier passes class-membership probabilities to the meta model instead of hard class labels. Probabilities also encode how confident each model is, which gives the meta model more information to work with and often makes the ensemble more robust to noise in the data. The stacking models in the cells below use this setting; with a careful choice of models and hyperparameters, accuracy can be pushed further.
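A minimal sketch of the difference, assuming the same iris split and base models as the cells below: only stack_method changes between the two runs. The fitted model's transform method returns the features the final estimator receives, which shows that hard labels give one column per base model while probabilities give one column per model and class.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

base = [
    ('dt', DecisionTreeClassifier(max_depth=3, random_state=0)),
    ('knn', KNeighborsClassifier(n_neighbors=5)),
    ('svm', SVC(probability=True, random_state=0)),
]

results = {}
# Same ensemble each time; only the kind of meta-feature changes
for method in ('predict', 'predict_proba'):
    stack = StackingClassifier(
        estimators=base,
        final_estimator=LogisticRegression(max_iter=1000),
        stack_method=method,
        cv=5,
    )
    stack.fit(X_train, y_train)
    n_meta = stack.transform(X_test).shape[1]
    acc = accuracy_score(y_test, stack.predict(X_test))
    results[method] = (acc, n_meta)
    print(f"{method:>13}: acc={acc:.4f}, meta-features={n_meta}")
```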

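Stacking often performs slightly better than blending because k-fold cross-validation gives the meta model out-of-fold predictions for the whole training set, whereas blending trains the meta model only on a single hold-out split. The difference is usually noticeable only on larger datasets.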
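The data-efficiency argument can be made concrete by counting how many rows each approach gives the meta model to learn from. A minimal sketch, assuming the same iris split and 20% blending hold-out as the cell below, with two arbitrarily chosen base models and cross_val_predict standing in for the manual k-fold loop:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_predict
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
models = [DecisionTreeClassifier(max_depth=3, random_state=0), KNeighborsClassifier(n_neighbors=5)]

# Stacking: out-of-fold class probabilities for every training row
stack_meta = np.hstack([
    cross_val_predict(m, X_train, y_train, cv=5, method='predict_proba') for m in models
])

# Blending: base models are fit on X_t, and meta-features exist only for X_v
X_t, X_v, y_t, y_v = train_test_split(X_train, y_train, test_size=0.2, random_state=42)
blend_meta = np.hstack([m.fit(X_t, y_t).predict_proba(X_v) for m in models])

print("stacking meta-train:", stack_meta.shape)  # (105, 6)
print("blending meta-train:", blend_meta.shape)  # (21, 6)
```

Both tables have the same width (2 models × 3 classes), but the blending meta model learns from 21 rows instead of 105.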

In [17]:
# StackingClassifier and manual blending
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
import numpy as np

# Same data as before
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Base and meta models
base = [
    ('dt', DecisionTreeClassifier(max_depth=3)),
    ('knn', KNeighborsClassifier(n_neighbors=5)),
    ('svm', SVC(probability=True))
]
meta = LogisticRegression()

# Stacking
stack = StackingClassifier(estimators=base, final_estimator=meta, stack_method='predict_proba', cv=5)
stack.fit(X_train, y_train)
stack_acc = accuracy_score(y_test, stack.predict(X_test))

# Manual blending: fit the base models on one part of the training set,
# then fit the meta model on their predictions for the held-out part
X_t, X_v, y_t, y_v = train_test_split(X_train, y_train, test_size=0.2, random_state=42)
models = [clone(m).fit(X_t, y_t) for _, m in base]
# Keep every class-probability column (not just the row maximum),
# otherwise the meta model cannot tell which class a base model favours
blend_train = np.hstack([m.predict_proba(X_v) for m in models])
blend_test = np.hstack([m.predict_proba(X_test) for m in models])
blend_meta = clone(meta).fit(blend_train, y_v)
blend_acc = accuracy_score(y_test, blend_meta.predict(blend_test))

# Results
print(f"Stacking Acc : {stack_acc:.4f}")
print(f"Blending Acc : {blend_acc:.4f}")


**StackingClassifier (scikit-learn)**


* The official scikit-learn implementation of stacking for classification.

* Simplifies ensembling by holding the base learners and the final estimator (meta learner) in a single object.

* Handles cross-validation internally, which reduces the risk of data leakage.

* Supports passthrough=True to feed the original features to the meta learner alongside the meta-features.

* Well suited to pipelines and production use because it is stable and well tested.
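The effect of passthrough can be inspected through the fitted model's transform method, which returns the features the final estimator receives. A minimal sketch with the same base learners as the cell below: with stack_method='predict_proba', three models times three iris classes give 9 meta-features, and passthrough=True appends the 4 original features.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

base_learners = [
    ('dt', DecisionTreeClassifier(max_depth=3, random_state=0)),
    ('knn', KNeighborsClassifier(n_neighbors=5)),
    ('svm', SVC(probability=True, random_state=0)),
]

widths = {}
for passthrough in (False, True):
    model = StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),
        stack_method='predict_proba',
        passthrough=passthrough,
        cv=5,
    )
    model.fit(X_train, y_train)
    # transform() returns the input table of the final estimator
    widths[passthrough] = model.transform(X_test).shape[1]
    print(f"passthrough={passthrough}: {widths[passthrough]} features for the meta learner")
```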

In [19]:
# StackingClassifier (scikit-learn)
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load dataset
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Base learners and meta learner
base_learners = [
    ('dt', DecisionTreeClassifier(max_depth=3)),
    ('knn', KNeighborsClassifier(n_neighbors=5)),
    ('svm', SVC(probability=True))
]
meta_learner = LogisticRegression()

# Build the stacking model
stack_model = StackingClassifier(
    estimators=base_learners,
    final_estimator=meta_learner,
    stack_method='predict_proba',
    passthrough=False,
    cv=5
)

# Train and evaluate
stack_model.fit(X_train, y_train)
y_pred = stack_model.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print(f"Akurasi StackingClassifier: {acc:.4f}")

Akurasi StackingClassifier: 1.0000
