# Step 4: Stacking Ensemble Building

## Objective
Build a stacking ensemble classifier using the top 3 models identified in Notebook 03 as base learners, with an optimized meta-learner to combine their predictions.

## Process
1. Load top 3 model names from Notebook 03 evaluation results
2. Map model names to sklearn classifier instances
3. Load engineered features from `data/processed/BRL_X_features.csv`
4. Create base learners from the top 3 models
5. Evaluate multiple meta-learner candidates (Logistic Regression, Random Forest, XGBoost, Gradient Boosting, Extra Trees)
6. Select the best meta-learner by test-set accuracy
7. Train final stacking ensemble with optimal configuration
8. Compare ensemble performance vs individual base models

## Output
- Stacking ensemble classifier using top 3 models as base learners
- Best meta-learner stored in `BEST_META_LEARNER` variable for notebook 05
- Performance comparison: ensemble vs individual models
- Ensemble configuration saved to `data/processed/metrics/stacking_ensemble_config.txt`

## Ensemble Architecture
**Stacking Ensemble** combines multiple models in two layers:
- **Layer 1 (Base Learners)**: Top 3 models from LazyClassifier make predictions
- **Layer 2 (Meta Learner)**: Learns to optimally combine base model predictions
- The meta-learner is trained on out-of-fold (cross-validated) predictions from the base learners, which reduces overfitting
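
The two layers can be sketched by hand to show what the meta-learner actually sees. This is a minimal illustration, not the notebook's pipeline: it assumes synthetic data from `make_classification`, two arbitrary base models, and positive-class probabilities as meta-features. `StackingClassifier` performs the equivalent steps internally.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in data (assumption: not the notebook's BRL features)
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Illustrative base models (assumption: chosen only for this sketch)
base_models = [LinearDiscriminantAnalysis(), GaussianNB()]

# Layer 1: each base model predicts every row while that row is held out
oof_columns = [
    cross_val_predict(model, X, y, cv=5, method="predict_proba")[:, 1]
    for model in base_models
]
meta_features = np.column_stack(oof_columns)  # shape: (n_samples, n_base_models)

# Layer 2: the meta-learner is fitted only on out-of-fold predictions
meta_learner = LogisticRegression(max_iter=1000).fit(meta_features, y)
print(meta_features.shape, meta_learner.coef_.shape)
```

Because every meta-feature comes from a model that never saw that row during fitting, the meta-learner cannot simply learn to trust whichever base model memorized the training set.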

In [1]:
# Import required libraries
import os
import pandas as pd
import numpy as np
from datetime import datetime

# Model selection and evaluation
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Ensemble methods
from sklearn.ensemble import StackingClassifier, RandomForestClassifier, ExtraTreesClassifier
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier, BaggingClassifier

# Linear models
from sklearn.linear_model import LogisticRegression, RidgeClassifier, SGDClassifier
from sklearn.linear_model import PassiveAggressiveClassifier, Perceptron

# Discriminant Analysis
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis

# SVM models
from sklearn.svm import SVC, LinearSVC, NuSVC

# Neighbors
from sklearn.neighbors import KNeighborsClassifier, NearestCentroid

# Naive Bayes
from sklearn.naive_bayes import GaussianNB, BernoulliNB

# Tree-based models
from sklearn.tree import DecisionTreeClassifier, ExtraTreeClassifier

# XGBoost
from xgboost import XGBClassifier

print(f"Stacking ensemble building started at: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

Stacking ensemble building started at: 2025-10-25 11:18:04


In [2]:
# Define configuration parameters
FEATURES_PATH = '../data/processed/BRL_X_features.csv'  # Input file from notebook 02
TOP_MODELS_FILE = '../data/processed/metrics/top_3_model_names.txt'  # Top 3 models from notebook 03

# Train/test split configuration (must match notebook 03)
TEST_SIZE = 0.2        # 20% of data for testing
RANDOM_STATE = 42      # For reproducibility
SHUFFLE = False        # Critical: Do not shuffle time series data

# Stacking ensemble configuration
CV_FOLDS = 5           # Cross-validation folds for meta-learner training
N_JOBS = -1            # Use all CPU cores

print(f"Configuration:")
print(f"  Features: {FEATURES_PATH}")
print(f"  Top Models File: {TOP_MODELS_FILE}")
print(f"  Test Size: {TEST_SIZE * 100}%")
print(f"  CV Folds: {CV_FOLDS}")
print(f"  Shuffle: {SHUFFLE} (preserving time series order)")

Configuration:
  Features: ../data/processed/BRL_X_features.csv
  Top Models File: ../data/processed/metrics/top_3_model_names.txt
  Test Size: 20.0%
  CV Folds: 5
  Shuffle: False (preserving time series order)


In [3]:
# Load top 3 model names from Notebook 03 evaluation results
# These are the best performing models identified by LazyClassifier

print("Loading top 3 models from Notebook 03...")
TOP_3_MODELS = []

with open(TOP_MODELS_FILE, 'r') as f:
    lines = f.readlines()
    for line in lines:
        # Parse lines like "1. QuadraticDiscriminantAnalysis"
        if line.strip() and line[0].isdigit():
            model_name = line.split('.', 1)[1].strip()
            TOP_3_MODELS.append(model_name)

print(f"\nTop 3 Models from Notebook 03:")
for i, model_name in enumerate(TOP_3_MODELS, 1):
    print(f"  {i}. {model_name}")

print(f"\nTotal models loaded: {len(TOP_3_MODELS)}")

Loading top 3 models from Notebook 03...

Top 3 Models from Notebook 03:
  1. QuadraticDiscriminantAnalysis
  2. LinearDiscriminantAnalysis
  3. LinearSVC

Total models loaded: 3
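
The parsing step above relies on every digit-leading line containing a period; a line such as a date would raise `IndexError` on `split('.', 1)[1]`. A slightly more defensive variant is sketched below, assuming the same `N. ModelName` line format. The helper name `parse_ranked_names` and the sample lines are hypothetical.

```python
import re

def parse_ranked_names(lines):
    """Extract names from lines shaped like '1. ModelName'; ignore all other lines."""
    pattern = re.compile(r"^\s*\d+\.\s*(\S.*?)\s*$")
    names = []
    for line in lines:
        match = pattern.match(line)
        if match:
            names.append(match.group(1))
    return names

# Hypothetical file contents, including a header, indentation, and a blank line
sample_lines = [
    "Top models\n",
    "1. QuadraticDiscriminantAnalysis\n",
    "  2. LinearDiscriminantAnalysis\n",
    "\n",
    "3. LinearSVC\n",
]
top_models = parse_ranked_names(sample_lines)
print(top_models)
```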


In [4]:
# Create mapping from model name strings to sklearn classifier instances
# This allows dynamic instantiation of models based on LazyClassifier results

MODEL_MAPPING = {
    # Discriminant Analysis
    'LinearDiscriminantAnalysis': LinearDiscriminantAnalysis(),
    'QuadraticDiscriminantAnalysis': QuadraticDiscriminantAnalysis(),
    
    # SVM Models
    'LinearSVC': LinearSVC(random_state=RANDOM_STATE, max_iter=10000),
    'SVC': SVC(random_state=RANDOM_STATE),
    'NuSVC': NuSVC(random_state=RANDOM_STATE),
    
    # Ensemble Models
    'RandomForestClassifier': RandomForestClassifier(random_state=RANDOM_STATE, n_estimators=100),
    'ExtraTreesClassifier': ExtraTreesClassifier(random_state=RANDOM_STATE, n_estimators=100),
    'AdaBoostClassifier': AdaBoostClassifier(random_state=RANDOM_STATE),
    'GradientBoostingClassifier': GradientBoostingClassifier(random_state=RANDOM_STATE),
    'BaggingClassifier': BaggingClassifier(random_state=RANDOM_STATE),
    
    # Linear Models
    'LogisticRegression': LogisticRegression(random_state=RANDOM_STATE, max_iter=10000),
    'RidgeClassifier': RidgeClassifier(random_state=RANDOM_STATE),
    'SGDClassifier': SGDClassifier(random_state=RANDOM_STATE, max_iter=10000),
    'PassiveAggressiveClassifier': PassiveAggressiveClassifier(random_state=RANDOM_STATE, max_iter=10000),
    'Perceptron': Perceptron(random_state=RANDOM_STATE, max_iter=10000),
    
    # Neighbors
    'KNeighborsClassifier': KNeighborsClassifier(n_neighbors=5),
    'NearestCentroid': NearestCentroid(),
    
    # Naive Bayes
    'GaussianNB': GaussianNB(),
    'BernoulliNB': BernoulliNB(),
    
    # Tree Models
    'DecisionTreeClassifier': DecisionTreeClassifier(random_state=RANDOM_STATE),
    'ExtraTreeClassifier': ExtraTreeClassifier(random_state=RANDOM_STATE),
    
    # XGBoost
    'XGBClassifier': XGBClassifier(random_state=RANDOM_STATE, use_label_encoder=False, eval_metric='logloss'),
}

print(f"Model mapping created with {len(MODEL_MAPPING)} available classifiers")

Model mapping created with 22 available classifiers


In [5]:
# Load engineered features from Notebook 02
df = pd.read_csv(FEATURES_PATH, index_col=0)

# Convert index to datetime for proper time series handling
df.index = pd.to_datetime(df.index)
df.index.name = 'Date'

# Handle missing values (should be minimal after notebook 02 processing)
rows_before = len(df)
df = df.dropna()
rows_after = len(df)

print(f"Data loaded from Notebook 02:")
print(f"  Total records: {rows_after}")
print(f"  Date range: {df.index.min().strftime('%Y-%m-%d')} to {df.index.max().strftime('%Y-%m-%d')}")
print(f"  Rows dropped (NaN): {rows_before - rows_after}")
print(f"  Dataset shape: {df.shape}")

# Separate features and target
X = df.drop(columns=['target'])
y = df['target']

print(f"\nFeatures: {X.shape}")
print(f"Target: {y.shape}")
print(f"\nTarget distribution:")
print(y.value_counts())
print(y.value_counts(normalize=True))

Data loaded from Notebook 02:
  Total records: 4103
  Date range: 2010-01-21 to 2025-10-24
  Rows dropped (NaN): 0
  Dataset shape: (4103, 18)

Features: (4103, 17)
Target: (4103,)

Target distribution:
target
1    2067
0    2036
Name: count, dtype: int64
target
1    0.503778
0    0.496222
Name: proportion, dtype: float64


In [6]:
# Split data into training and testing sets
# CRITICAL: Must match Notebook 03 split for fair comparison
# shuffle=False preserves time series order (random_state is ignored when shuffle=False)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, 
    test_size=TEST_SIZE, 
    random_state=RANDOM_STATE, 
    shuffle=SHUFFLE
)

# Display split information
print(f"Train/Test Split Summary:")
print(f"  Total samples: {len(X)}")
print(f"  Training samples: {len(X_train)} ({len(X_train)/len(X)*100:.1f}%)")
print(f"  Testing samples: {len(X_test)} ({len(X_test)/len(X)*100:.1f}%)")

# Show date ranges for each set (time series context)
train_start = X_train.index.min().strftime('%Y-%m-%d')
train_end = X_train.index.max().strftime('%Y-%m-%d')
test_start = X_test.index.min().strftime('%Y-%m-%d')
test_end = X_test.index.max().strftime('%Y-%m-%d')

print(f"\nTraining period: {train_start} to {train_end}")
print(f"Testing period: {test_start} to {test_end}")

# Verify this matches Notebook 03
print(f"\nNote: This split must match Notebook 03 for fair ensemble comparison")

Train/Test Split Summary:
  Total samples: 4103
  Training samples: 3282 (80.0%)
  Testing samples: 821 (20.0%)

Training period: 2010-01-21 to 2022-08-29
Testing period: 2022-08-30 to 2025-10-24

Note: This split must match Notebook 03 for fair ensemble comparison
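
With `shuffle=False`, the split reduces to a positional cut: the test set is the last `ceil(test_size * n_samples)` rows and the training set is everything before it. The arithmetic can be checked with a short sketch; the helper name `chronological_split_sizes` is hypothetical.

```python
import math

def chronological_split_sizes(n_samples, test_size):
    """Return (n_train, n_test) for an unshuffled split with a fractional test_size."""
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test

n_train, n_test = chronological_split_sizes(4103, 0.2)
print(n_train, n_test)

# The same cut expressed as slicing over row positions
rows = list(range(4103))
train_rows, test_rows = rows[:n_train], rows[n_train:]
```

Every training row precedes every test row, so no future observation leaks into the training period.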


In [7]:
# Create base learners from TOP_3_MODELS
# Dynamically instantiate the models selected in Notebook 03

base_learners = []

print("Creating base learners from TOP_3_MODELS:")
print("="*80)

for i, model_name in enumerate(TOP_3_MODELS, 1):
    if model_name in MODEL_MAPPING:
        model_instance = MODEL_MAPPING[model_name]
        # Create tuple with (name, model_instance) for StackingClassifier
        base_learners.append((f'model_{i}', model_instance))
        print(f"{i}. {model_name} - Successfully instantiated")
    else:
        print(f"{i}. {model_name} - ERROR: Not found in MODEL_MAPPING")
        raise ValueError(f"Model '{model_name}' not found in MODEL_MAPPING. Please add it.")

print("="*80)
print(f"\nBase learners created: {len(base_learners)}")
print(f"Ready for stacking ensemble")

Creating base learners from TOP_3_MODELS:
1. QuadraticDiscriminantAnalysis - Successfully instantiated
2. LinearDiscriminantAnalysis - Successfully instantiated
3. LinearSVC - Successfully instantiated

Base learners created: 3
Ready for stacking ensemble


In [8]:
# Define candidate meta-learners to test
# Meta-learner combines predictions from base learners
# Test multiple algorithms to find the best combination

meta_learner_candidates = [
    ('LogisticRegression', LogisticRegression(random_state=RANDOM_STATE, max_iter=10000)),
    ('RandomForest', RandomForestClassifier(random_state=RANDOM_STATE, n_estimators=100)),
    ('XGBoost', XGBClassifier(random_state=RANDOM_STATE, use_label_encoder=False, eval_metric='logloss')),
    ('GradientBoosting', GradientBoostingClassifier(random_state=RANDOM_STATE)),
    ('ExtraTrees', ExtraTreesClassifier(random_state=RANDOM_STATE, n_estimators=100))
]

print(f"Meta-learner candidates to evaluate: {len(meta_learner_candidates)}")
for name, _ in meta_learner_candidates:
    print(f"  - {name}")

Meta-learner candidates to evaluate: 5
  - LogisticRegression
  - RandomForest
  - XGBoost
  - GradientBoosting
  - ExtraTrees


In [9]:
# Evaluate different meta-learners with stacking ensemble
# Test each meta-learner to find which best combines the base models

print("Evaluating meta-learners with stacking ensemble...")
print("="*80)
print("This may take several minutes as each configuration uses cross-validation.\n")

meta_results = {}
best_accuracy = -1
best_meta_name = None
best_meta_learner_instance = None
best_stacking_clf = None

for meta_name, meta_learner in meta_learner_candidates:
    print(f"Testing meta-learner: {meta_name}...")
    
    # Create stacking classifier with current meta-learner
    stacking_clf = StackingClassifier(
        estimators=base_learners,
        final_estimator=meta_learner,
        cv=CV_FOLDS,           # Cross-validation to prevent overfitting
        n_jobs=N_JOBS          # Use all CPU cores
    )
    
    # Train the stacking ensemble
    stacking_clf.fit(X_train, y_train)
    
    # Make predictions on test set
    y_pred = stacking_clf.predict(X_test)
    
    # Calculate accuracy
    accuracy = accuracy_score(y_test, y_pred)
    meta_results[meta_name] = accuracy
    
    print(f"  {meta_name} Accuracy: {accuracy:.4f}")
    
    # Track best performing meta-learner
    if accuracy > best_accuracy:
        best_accuracy = accuracy
        best_meta_name = meta_name
        best_meta_learner_instance = meta_learner
        best_stacking_clf = stacking_clf

print("\n" + "="*80)
print(f"Meta-learner evaluation completed!")
print(f"\nBest Meta-Learner: {best_meta_name}")
print(f"Best Accuracy: {best_accuracy:.4f}")

Evaluating meta-learners with stacking ensemble...
This may take several minutes as each configuration uses cross-validation.

Testing meta-learner: LogisticRegression...
  LogisticRegression Accuracy: 0.5347
Testing meta-learner: RandomForest...
  RandomForest Accuracy: 0.4689
Testing meta-learner: XGBoost...

Parameters: { "use_label_encoder" } are not used.

  XGBoost Accuracy: 0.4714
Testing meta-learner: GradientBoosting...
  GradientBoosting Accuracy: 0.5104
Testing meta-learner: ExtraTrees...
  ExtraTrees Accuracy: 0.4860

Meta-learner evaluation completed!

Best Meta-Learner: LogisticRegression
Best Accuracy: 0.5347


In [10]:
# Display detailed comparison of all meta-learners
# Show how each meta-learner performed

print("Detailed Meta-Learner Performance Comparison:")
print("="*80)

# Create DataFrame for better visualization
meta_comparison_df = pd.DataFrame(list(meta_results.items()), columns=['Meta-Learner', 'Accuracy'])
meta_comparison_df = meta_comparison_df.sort_values('Accuracy', ascending=False)
meta_comparison_df['Rank'] = range(1, len(meta_comparison_df) + 1)
meta_comparison_df = meta_comparison_df[['Rank', 'Meta-Learner', 'Accuracy']]

print(meta_comparison_df.to_string(index=False))
print("="*80)

Detailed Meta-Learner Performance Comparison:
 Rank       Meta-Learner  Accuracy
    1 LogisticRegression  0.534714
    2   GradientBoosting  0.510353
    3         ExtraTrees  0.485993
    4            XGBoost  0.471376
    5       RandomForest  0.468940


In [11]:
# Train final stacking ensemble with best meta-learner
# Store the best meta-learner in BEST_META_LEARNER variable for notebook 05

# Store best meta-learner for future use
BEST_META_LEARNER = best_meta_learner_instance
BEST_META_LEARNER_NAME = best_meta_name

print(f"Training final stacking ensemble with best configuration...")
print(f"Base Learners: {[name for name, _ in base_learners]}")
print(f"Meta-Learner: {BEST_META_LEARNER_NAME}")
print()

# The best model was already trained during evaluation
final_stacking_clf = best_stacking_clf

# Get predictions on test set
y_pred = final_stacking_clf.predict(X_test)
final_accuracy = accuracy_score(y_test, y_pred)

print(f"Final Stacking Ensemble Performance:")
print(f"  Test Accuracy: {final_accuracy:.4f}")
print(f"\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=['Down (0)', 'Up (1)']))

print(f"\nConfusion Matrix:")
cm = confusion_matrix(y_test, y_pred)
print(cm)
print(f"  True Negatives:  {cm[0][0]}")
print(f"  False Positives: {cm[0][1]}")
print(f"  False Negatives: {cm[1][0]}")
print(f"  True Positives:  {cm[1][1]}")

print(f"\n" + "="*80)
print(f"Variable 'BEST_META_LEARNER' created: {BEST_META_LEARNER_NAME}")
print(f"Variable 'BEST_META_LEARNER_NAME' created: '{BEST_META_LEARNER_NAME}'")
print(f"These variables will be used in notebook 05 for hyperparameter optimization.")

Training final stacking ensemble with best configuration...
Base Learners: ['model_1', 'model_2', 'model_3']
Meta-Learner: LogisticRegression

Final Stacking Ensemble Performance:
  Test Accuracy: 0.5347

Classification Report:
              precision    recall  f1-score   support

    Down (0)       0.59      0.35      0.44       426
      Up (1)       0.51      0.74      0.60       395

    accuracy                           0.53       821
   macro avg       0.55      0.54      0.52       821
weighted avg       0.55      0.53      0.52       821


Confusion Matrix:
[[148 278]
 [104 291]]
  True Negatives:  148
  False Positives: 278
  False Negatives: 104
  True Positives:  291

Variable 'BEST_META_LEARNER' created: LogisticRegression
Variable 'BEST_META_LEARNER_NAME' created: 'LogisticRegression'
These variables will be used in notebook 05 for hyperparameter optimization.
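
The figures in the classification report follow directly from the confusion matrix. The sketch below recomputes them from the four counts printed above, assuming the `[[TN, FP], [FN, TP]]` layout used in this cell; the helper name `binary_report` is hypothetical.

```python
def binary_report(cm):
    """Derive accuracy, precision, and recall from a 2x2 confusion matrix."""
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    return {
        "accuracy": (tn + tp) / total,
        "precision_up": tp / (tp + fp),
        "recall_up": tp / (tp + fn),
        "precision_down": tn / (tn + fn),
        "recall_down": tn / (tn + fp),
    }

report = binary_report([[148, 278], [104, 291]])
for name, value in report.items():
    print(f"{name}: {value:.2f}")
```

The same counts show that the ensemble predicts Up for 569 of the 821 test rows (278 + 291), which explains the high Up recall alongside the low Down recall.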


In [12]:
# Compare ensemble performance vs individual base models
# Evaluate if stacking provides improvement over single models

print("Performance Comparison: Ensemble vs Individual Base Models")
print("="*80)

individual_results = {}

# Evaluate each base model individually
from sklearn.base import clone  # imported once, outside the loop

for model_name, model in base_learners:
    # Get the original model name from TOP_3_MODELS
    original_name = TOP_3_MODELS[int(model_name.split('_')[1]) - 1]
    
    # Train a fresh, unfitted copy of the model
    model_clone = clone(model)
    model_clone.fit(X_train, y_train)
    
    # Make predictions
    y_pred_individual = model_clone.predict(X_test)
    accuracy_individual = accuracy_score(y_test, y_pred_individual)
    
    individual_results[original_name] = accuracy_individual
    print(f"{original_name}: {accuracy_individual:.4f}")

print(f"\nStacking Ensemble ({BEST_META_LEARNER_NAME}): {final_accuracy:.4f}")

# Calculate improvement
avg_individual = np.mean(list(individual_results.values()))
improvement = final_accuracy - avg_individual

print("="*80)
print(f"\nAverage Individual Model Accuracy: {avg_individual:.4f}")
print(f"Stacking Ensemble Accuracy: {final_accuracy:.4f}")
print(f"Improvement: {improvement:+.4f} ({improvement/avg_individual*100:+.2f}%)")

if improvement > 0:
    print(f"\nThe ensemble OUTPERFORMS the average of individual models!")
else:
    print(f"\nThe ensemble does not improve over the average of individual models.")

Performance Comparison: Ensemble vs Individual Base Models
QuadraticDiscriminantAnalysis: 0.5128
LinearDiscriminantAnalysis: 0.5396
LinearSVC: 0.5189

Stacking Ensemble (LogisticRegression): 0.5347

Average Individual Model Accuracy: 0.5238
Stacking Ensemble Accuracy: 0.5347
Improvement: +0.0110 (+2.09%)

The ensemble OUTPERFORMS the average of individual models!
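
Comparing against the average hides how the ensemble fares against the strongest single model. The sketch below repeats the comparison using the rounded accuracies printed above and adds the gap to the best base model plus a rough binomial standard error for the test set. The standard-error check is an addition for context, not part of the notebook, and it treats test rows as independent, which is only an approximation for time series data.

```python
import math

# Rounded accuracies copied from the output above
individual = {
    "QuadraticDiscriminantAnalysis": 0.5128,
    "LinearDiscriminantAnalysis": 0.5396,
    "LinearSVC": 0.5189,
}
ensemble_accuracy = 0.5347
n_test = 821

average = sum(individual.values()) / len(individual)
best_name = max(individual, key=individual.get)

gap_vs_average = ensemble_accuracy - average
gap_vs_best = ensemble_accuracy - individual[best_name]

# Approximate standard error of an accuracy estimate on n_test rows
standard_error = math.sqrt(ensemble_accuracy * (1 - ensemble_accuracy) / n_test)

print(f"Gap vs average: {gap_vs_average:+.4f}")
print(f"Gap vs best ({best_name}): {gap_vs_best:+.4f}")
print(f"Approx. standard error: {standard_error:.4f}")
```

On these numbers the ensemble sits above the average but slightly below LinearDiscriminantAnalysis alone, and both gaps are smaller than one standard error, so the ranking should be read with caution.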




In [13]:
# Save ensemble configuration for documentation
# Store the final configuration for reference and notebook 05

ensemble_config = {
    'base_learners': TOP_3_MODELS,
    'meta_learner': BEST_META_LEARNER_NAME,
    'test_accuracy': final_accuracy,
    'cv_folds': CV_FOLDS,
    'individual_accuracies': individual_results,
    'ensemble_improvement': improvement
}

# Save to metrics directory
METRICS_PATH = '../data/processed/metrics/'
os.makedirs(METRICS_PATH, exist_ok=True)

config_path = os.path.join(METRICS_PATH, 'stacking_ensemble_config.txt')
with open(config_path, 'w') as f:
    f.write(f"Stacking Ensemble Configuration\n")
    f.write(f"Date: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n")
    f.write(f"="*80 + "\n\n")
    
    f.write(f"Base Learners (from Notebook 03):\n")
    for i, model_name in enumerate(TOP_3_MODELS, 1):
        f.write(f"  {i}. {model_name}: {individual_results[model_name]:.4f}\n")
    
    f.write(f"\nMeta-Learner: {BEST_META_LEARNER_NAME}\n")
    f.write(f"Cross-Validation Folds: {CV_FOLDS}\n")
    f.write(f"\nEnsemble Test Accuracy: {final_accuracy:.4f}\n")
    f.write(f"Average Individual Accuracy: {avg_individual:.4f}\n")
    f.write(f"Improvement: {improvement:+.4f} ({improvement/avg_individual*100:+.2f}%)\n")

print(f"Ensemble configuration saved to: {config_path}")

Ensemble configuration saved to: ../data/processed/metrics/stacking_ensemble_config.txt
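
Variables such as `BEST_META_LEARNER_NAME` do not carry over to another notebook's kernel, so a machine-readable copy of the configuration can be easier to reload than the text report. The sketch below assumes a JSON file would be acceptable to the next notebook; the filename `stacking_ensemble_config.json` is hypothetical, and a temporary directory stands in for the metrics folder.

```python
import json
import os
import tempfile

# Values copied from this notebook's output
ensemble_config = {
    "base_learners": [
        "QuadraticDiscriminantAnalysis",
        "LinearDiscriminantAnalysis",
        "LinearSVC",
    ],
    "meta_learner": "LogisticRegression",
    "cv_folds": 5,
    "test_accuracy": 0.5347,
}

with tempfile.TemporaryDirectory() as config_dir:
    # Hypothetical filename; the notebook itself writes a .txt report
    config_path = os.path.join(config_dir, "stacking_ensemble_config.json")

    with open(config_path, "w") as f:
        json.dump(ensemble_config, f, indent=2)

    # What a later notebook would do to restore the configuration
    with open(config_path) as f:
        loaded = json.load(f)

print(loaded["meta_learner"], loaded["base_learners"])
```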


## Summary

Stacking ensemble successfully built and evaluated:
- Loaded top 3 models from Notebook 03 LazyClassifier evaluation
- Base learners dynamically instantiated from model names
- Evaluated 5 different meta-learner candidates
- Selected best meta-learner based on test accuracy
- Compared ensemble performance against individual base models

**Base Learners (from Notebook 03):**
The ensemble uses the top 3 models identified in the previous notebook as base learners in the first layer of the stacking architecture.

**Meta-Learner Selection:**
The `BEST_META_LEARNER` and `BEST_META_LEARNER_NAME` variables store the optimal meta-learner configuration, which will be used in Notebook 05 for hyperparameter optimization.

**Ensemble Architecture:**
- Layer 1: Top 3 models make independent predictions
- Layer 2: Meta-learner combines the base models' out-of-fold predictions
- Training the meta-learner on cross-validated predictions reduces overfitting

**Key Findings:**
- The ensemble reached 0.5347 test accuracy: above the base-model average (0.5238) but below LinearDiscriminantAnalysis alone (0.5396)
- Configuration saved to `data/processed/metrics/` for documentation
- Ready for hyperparameter optimization

## Next Steps
Proceed to `05_stacking_optuna.ipynb` to:
- Load the ensemble configuration (TOP_3_MODELS and BEST_META_LEARNER)
- Use Optuna for Bayesian hyperparameter optimization
- Optimize both base learners and meta-learner parameters
- Find the best hyperparameter combination
- Evaluate optimized ensemble performance