# 10 - Stacking Classifier

Explanation:
1. Base Models (Level-0):
- LogisticRegression: a simple linear classifier.
- RandomForestClassifier: a bagging ensemble of decision trees.
- XGBClassifier: gradient-boosted decision trees.
- SVC: a support vector machine with probability=True, so it can output class probabilities.

2. Meta-Model (Level-1):
- RidgeClassifier: Learns from the predictions of base models to make the final prediction.

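3. Stacking Mechanism:
- Each base model is trained on the training data.
- The meta-model is trained on cross-validated (out-of-fold) predictions of the base models, so it learns how to combine them without seeing predictions made on rows the base models were fitted on.
- At prediction time, the base models' outputs are passed to the meta-model, which makes the final prediction.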
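
The mechanism above can be sketched by hand with cross_val_predict. This is a minimal illustration, not the internals of StackingClassifier verbatim: it uses synthetic data from make_classification instead of the bank dataset, only two base models, and hypothetical names (X_demo, level0, oof_features, meta).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.model_selection import cross_val_predict, train_test_split

# Synthetic stand-in data (assumption: not the bank dataset used in this notebook)
X_demo, y_demo = make_classification(n_samples=400, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0
)

level0 = [
    LogisticRegression(max_iter=1000),
    RandomForestClassifier(n_estimators=50, random_state=0),
]

# Step 1: out-of-fold positive-class probabilities, one column per base model
oof_features = np.column_stack([
    cross_val_predict(m, X_tr, y_tr, cv=5, method="predict_proba")[:, 1]
    for m in level0
])

# Step 2: the meta-model is trained on those out-of-fold predictions
meta = RidgeClassifier().fit(oof_features, y_tr)

# Step 3: base models are refit on the full training set for prediction time
test_features = np.column_stack([
    m.fit(X_tr, y_tr).predict_proba(X_te)[:, 1] for m in level0
])
manual_accuracy = meta.score(test_features, y_te)
print(f"Manual stacking accuracy: {manual_accuracy:.2f}")
```

Training the meta-model on out-of-fold predictions matters because in-sample predictions from flexible models such as random forests are overly optimistic, which would teach the meta-model to trust them too much.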

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from xgboost import XGBClassifier
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score
from sklearn.preprocessing import StandardScaler

In [2]:
# Load dataset
df = pd.read_csv("bank_numeric.csv")

# Define features and target
target_column = "deposit"
X = df.drop(columns=[target_column])
y = df[target_column]

In [3]:
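# Feature scaling (SVM and logistic regression are sensitive to feature scale)
# Note: the scaler is fitted on the full dataset before the split, so test-set
# statistics leak into training; fitting it on the training split only avoids this.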
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

In [4]:
# Split dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, random_state=42
)
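
A leakage-free variant of the scaling and splitting steps might look like the sketch below: split first, then fit the scaler on the training rows only. It runs on synthetic data, and the names with a _demo or _raw suffix are hypothetical; it is not the code that produced the results reported in this notebook.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption: not the bank dataset)
X_demo, y_demo = make_classification(n_samples=500, n_features=8, random_state=42)

# Split first, so the test rows never influence the scaler
X_train_raw, X_test_raw, y_train_demo, y_test_demo = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

scaler_demo = StandardScaler()
X_train_demo = scaler_demo.fit_transform(X_train_raw)  # mean/std learned on training rows only
X_test_demo = scaler_demo.transform(X_test_raw)        # training statistics reused on test rows

# Training columns are centered exactly; test columns only approximately
print(np.abs(X_train_demo.mean(axis=0)).max())
print(np.abs(X_test_demo.mean(axis=0)).max())
```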


In [5]:
# Define base models (Level-0)
base_models = [
    ('log_reg', LogisticRegression(random_state=42)),
    ('random_forest', RandomForestClassifier(n_estimators=100, random_state=42)),
    # use_label_encoder is deprecated in recent XGBoost releases and can be dropped there
    ('xgboost', XGBClassifier(use_label_encoder=False, eval_metric='logloss', random_state=42)),
    ('svm', SVC(probability=True, random_state=42))
]

In [6]:
# Define meta-model (Level-1)
meta_model = RidgeClassifier(random_state=42)

In [7]:
# Create Stacking Classifier
stacked_model = StackingClassifier(
    estimators=base_models,
    final_estimator=meta_model,
    cv=5  # 5-fold CV used to generate the meta-model's training predictions
)

In [8]:
# Train the stacked model
stacked_model.fit(X_train, y_train)
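
After fitting, the stacker can be inspected to see what each base model fed to the meta-model and how the meta-model weighs them. The sketch below does this on synthetic data with two base models; demo_stack and meta_weights are hypothetical names, and the same attributes (stack_method_, final_estimator_) should be available on stacked_model above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression, RidgeClassifier

# Synthetic stand-in data (assumption: not the bank dataset)
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)

demo_stack = StackingClassifier(
    estimators=[
        ("log_reg", LogisticRegression(max_iter=1000)),
        ("random_forest", RandomForestClassifier(n_estimators=50, random_state=0)),
    ],
    final_estimator=RidgeClassifier(),
    cv=5,
)
demo_stack.fit(X_demo, y_demo)

# Method each base model contributed (predict_proba is preferred when available)
print(demo_stack.stack_method_)

# In the binary case the meta-model gets one feature, hence one weight, per base model
names = [name for name, _ in demo_stack.estimators]
meta_weights = dict(zip(names, demo_stack.final_estimator_.coef_.ravel()))
for name, weight in meta_weights.items():
    print(f"{name}: {weight:.3f}")
```

A larger weight suggests the meta-model leans more on that base model, though correlated base models can make individual weights hard to interpret.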

In [9]:
# Make predictions
y_pred = stacked_model.predict(X_test)


In [None]:
# Evaluate model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
print("\nClassification Report:\n", classification_report(y_test, y_pred))

# For comparison, CatBoost reached 0.88 accuracy,
# so this stacking model takes second place in this competition.

Accuracy: 0.87

Classification Report:
               precision    recall  f1-score   support

           0       0.88      0.89      0.89       609
           1       0.85      0.83      0.84       443

    accuracy                           0.87      1052
   macro avg       0.87      0.86      0.86      1052
weighted avg       0.87      0.87      0.87      1052



In [None]:
# Confusion matrix
cm = confusion_matrix(y_test, y_pred)
print("\nConfusion Matrix:\n", cm)

# The confusion matrix is more or less the same as well.


Confusion Matrix:
 [[544  65]
 [ 74 369]]
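
As a cross-check, the headline numbers in the classification report can be recomputed from the confusion matrix printed above. The sketch below hard-codes that matrix (rows are true classes, columns are predicted classes, as scikit-learn lays it out); the variable names are illustrative.

```python
import numpy as np

# Confusion matrix as printed above: rows = true class, columns = predicted class
cm_reported = np.array([[544, 65],
                        [74, 369]])

tn, fp, fn, tp = cm_reported.ravel()

accuracy_check = (tn + tp) / cm_reported.sum()
precision_1 = tp / (tp + fp)  # of the predicted deposits, how many were real
recall_1 = tp / (tp + fn)     # of the real deposits, how many were found
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

print(f"accuracy={accuracy_check:.2f}, precision(1)={precision_1:.2f}, "
      f"recall(1)={recall_1:.2f}, f1(1)={f1_1:.2f}")
# prints: accuracy=0.87, precision(1)=0.85, recall(1)=0.83, f1(1)=0.84
```

These values agree with the row for class 1 and the overall accuracy in the report, and the row sums (609 and 443) match the reported support.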
