# Module 08: Stacking and Blending

**Difficulty**: ⭐⭐⭐
**Estimated Time**: 55 minutes
**Prerequisites**: 
- Module 00: Introduction to Ensemble Methods
- Module 02: Random Forest
- Module 05: XGBoost
- Module 06: LightGBM
- Module 07: CatBoost

## Learning Objectives
By the end of this notebook, you will be able to:
1. Understand the concept of stacking (stacked generalization)
2. Implement stacking with different base models and meta-learners
3. Use cross-validation properly to avoid overfitting in stacking
4. Understand the difference between stacking and blending
5. Build multi-level stacked ensembles
6. Know when stacking helps and when it doesn't
7. Apply best practices for creating robust stacked models

## 1. Introduction to Stacking

### What is Stacking?

**Stacking** (Stacked Generalization) is an ensemble technique that combines multiple models through a **meta-learner**. Instead of using simple averaging or voting, stacking trains a new model to learn how to best combine the predictions of base models.

### Architecture:

```
Training Data
     |
     +---> Model 1 (e.g., Random Forest)
     +---> Model 2 (e.g., XGBoost)         } Base Models (Level 0)
     +---> Model 3 (e.g., LightGBM)
     |
     v
Predictions from all models
     |
     v
Meta-Learner (Level 1)  <-- Learns to combine predictions
     |
     v
Final Prediction
```

### Key Principles:

1. **Diversity**: Base models should be diverse (different algorithms, different hyperparameters)
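2. **Cross-Validation**: Meta-features must come from out-of-fold predictions (or a holdout set, as in blending) so the meta-learner never sees predictions a base model made on its own training rows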
3. **Meta-Learner**: Usually a simple model (logistic regression, ridge, etc.)
4. **Feature Engineering**: Base model predictions become features for meta-learner
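
To make the cross-validation principle concrete, the sketch below compares in-sample predictions with out-of-fold predictions for one model. It assumes a synthetic dataset with label noise and uses hypothetical names (`X_demo`, `rf_demo`) that do not appear elsewhere in this notebook; it is an illustration, not part of the stacking pipeline itself.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict

# Synthetic data with label noise so that memorization becomes visible (assumption)
X_demo, y_demo = make_classification(
    n_samples=400, n_features=20, n_informative=5, flip_y=0.2, random_state=0
)

rf_demo = RandomForestClassifier(n_estimators=50, random_state=0)

# In-sample: the model predicts the same rows it was trained on
in_sample = rf_demo.fit(X_demo, y_demo).predict_proba(X_demo)[:, 1]

# Out-of-fold: each row is predicted by a model that never saw that row
out_of_fold = cross_val_predict(
    rf_demo, X_demo, y_demo, cv=5, method="predict_proba"
)[:, 1]

in_sample_acc = np.mean((in_sample > 0.5) == y_demo)
oof_acc = np.mean((out_of_fold > 0.5) == y_demo)
print(f"In-sample accuracy:   {in_sample_acc:.3f}")
print(f"Out-of-fold accuracy: {oof_acc:.3f}")
```

A meta-learner trained on the in-sample column would learn to trust this model far more than its real generalization warrants, which is why the out-of-fold column is the one used as a meta-feature.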

### Stacking Process:

**Training Phase:**
1. Split training data into K folds
2. For each base model:
   - Train on K-1 folds
   - Predict on the held-out fold
   - Repeat for all folds (cross-validation)
3. Collect all out-of-fold predictions as "meta-features"
4. Train meta-learner on meta-features
5. Retrain base models on full training set for test predictions

**Prediction Phase:**
1. Each base model makes predictions on test data
2. Meta-learner combines these predictions
3. Output final prediction
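
The training and prediction phases above can be sketched in a few lines with `cross_val_predict`. This is a minimal illustration under stated assumptions: synthetic data, two small scikit-learn base models, and hypothetical variable names (`sketch_models`, `sketch_meta`) chosen so they do not clash with later cells.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X_s, y_s = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_s, y_s, test_size=0.2, random_state=0, stratify=y_s
)

sketch_models = [
    DecisionTreeClassifier(max_depth=3, random_state=0),
    KNeighborsClassifier(n_neighbors=5),
]

# Training steps 1-3: out-of-fold probabilities, one column per base model
oof = np.column_stack([
    cross_val_predict(m, X_tr, y_tr, cv=5, method="predict_proba")[:, 1]
    for m in sketch_models
])

# Training step 4: fit the meta-learner on the out-of-fold meta-features
sketch_meta = LogisticRegression().fit(oof, y_tr)

# Training step 5: refit every base model on the full training set
for m in sketch_models:
    m.fit(X_tr, y_tr)

# Prediction phase: base model predictions become the meta-learner's input
test_meta = np.column_stack([m.predict_proba(X_te)[:, 1] for m in sketch_models])
final_pred = sketch_meta.predict(test_meta)
sketch_acc = np.mean(final_pred == y_te)
print(f"Sketch stacking accuracy: {sketch_acc:.4f}")
```

`cross_val_predict` clones each estimator internally, so the base models are only fitted in place during step 5.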

### Why Stacking Works:

- Different models have different strengths and weaknesses
- Stacking learns **when to trust which model**
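- With a linear meta-learner the combination is a learned weighting; a non-linear meta-learner can also capture interactions between base model predictions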
- Often achieves better performance than any single model

## 2. Setup and Imports

In [None]:
# Standard libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from time import time

# Scikit-learn
from sklearn.datasets import make_classification, load_breast_cancer, load_diabetes
from sklearn.model_selection import train_test_split, cross_val_score, KFold, StratifiedKFold
from sklearn.ensemble import (
    RandomForestClassifier, RandomForestRegressor,
    GradientBoostingClassifier, GradientBoostingRegressor,
    StackingClassifier, StackingRegressor
)
from sklearn.linear_model import LogisticRegression, Ridge
from sklearn.svm import SVC, SVR
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import (
    accuracy_score, classification_report, confusion_matrix,
    mean_squared_error, r2_score, log_loss
)

# Boosting libraries
from xgboost import XGBClassifier, XGBRegressor
from lightgbm import LGBMClassifier, LGBMRegressor
try:
    from catboost import CatBoostClassifier, CatBoostRegressor
    CATBOOST_AVAILABLE = True
except ImportError:
    CATBOOST_AVAILABLE = False
    print("CatBoost not available - will skip CatBoost examples")

# Configuration
%matplotlib inline
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (10, 6)

# Set random seed
np.random.seed(42)

## 3. Basic Stacking Example

Let's start with a simple stacking example using scikit-learn's `StackingClassifier`.

In [None]:
# Load dataset
cancer_data = load_breast_cancer()
X = cancer_data.data
y = cancer_data.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(f"Training set: {X_train.shape}")
print(f"Test set: {X_test.shape}")

In [None]:
# Define base models (diverse algorithms)
base_models = [
    ('rf', RandomForestClassifier(n_estimators=100, random_state=42)),
    ('xgb', XGBClassifier(n_estimators=100, learning_rate=0.1, random_state=42, eval_metric='logloss')),
    ('lgbm', LGBMClassifier(n_estimators=100, learning_rate=0.1, random_state=42, verbose=-1)),
    ('svm', SVC(kernel='rbf', probability=True, random_state=42))
]

# Define meta-learner (simple model)
meta_learner = LogisticRegression(random_state=42)

# Create stacking classifier
stacking_clf = StackingClassifier(
    estimators=base_models,
    final_estimator=meta_learner,
    cv=5,  # Use 5-fold CV for meta-features
    stack_method='auto',  # Tries predict_proba, then decision_function, then predict
    n_jobs=-1
)

print("Stacking Classifier created with:")
print(f"- {len(base_models)} base models")
print(f"- Meta-learner: {meta_learner.__class__.__name__}")
print(f"- CV folds: {stacking_clf.cv}")

In [None]:
# Train stacking classifier
print("Training stacking classifier...")
start_time = time()
stacking_clf.fit(X_train, y_train)
training_time = time() - start_time

# Make predictions
y_pred_stack = stacking_clf.predict(X_test)
stack_acc = accuracy_score(y_test, y_pred_stack)

print(f"\nTraining time: {training_time:.2f} seconds")
print(f"Test Accuracy: {stack_acc:.4f}")
print(f"\nClassification Report:")
print(classification_report(y_test, y_pred_stack, target_names=cancer_data.target_names))

In [None]:
# Compare with individual base models
print("Individual Base Model Performance:")
print("=" * 50)

base_accuracies = {}
for name, model in base_models:
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    base_accuracies[name] = acc
    print(f"{name:10s}: {acc:.4f}")

print("=" * 50)
print(f"{'Stacking':10s}: {stack_acc:.4f}")
print(f"\nImprovement over best base model: {stack_acc - max(base_accuracies.values()):.4f}")

In [None]:
# Visualize comparison
models = list(base_accuracies.keys()) + ['Stacking']
accuracies = list(base_accuracies.values()) + [stack_acc]

plt.figure(figsize=(10, 6))
colors = ['lightblue'] * len(base_accuracies) + ['red']
bars = plt.bar(models, accuracies, color=colors, alpha=0.7)

# Highlight stacking
bars[-1].set_edgecolor('darkred')
bars[-1].set_linewidth(2)

plt.ylabel('Accuracy')
plt.title('Model Performance Comparison')
plt.ylim([min(accuracies) - 0.01, 1.0])
plt.axhline(y=max(base_accuracies.values()), color='green', 
            linestyle='--', label='Best base model')
plt.xticks(rotation=45)
plt.legend()
plt.grid(axis='y', alpha=0.3)

for i, (model, acc) in enumerate(zip(models, accuracies)):
    plt.text(i, acc + 0.002, f'{acc:.4f}', ha='center', va='bottom')

plt.tight_layout()
plt.show()

## 4. Manual Stacking Implementation

Let's implement stacking manually to understand how it works under the hood.

In [None]:
# Step 1: Define base models
base_learners = [
    RandomForestClassifier(n_estimators=100, random_state=42),
    XGBClassifier(n_estimators=100, learning_rate=0.1, random_state=42, eval_metric='logloss'),
    LGBMClassifier(n_estimators=100, learning_rate=0.1, random_state=42, verbose=-1)
]

# Step 2: Generate meta-features using cross-validation
n_folds = 5
kfold = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=42)

# Initialize meta-features array
meta_features_train = np.zeros((len(X_train), len(base_learners)))
meta_features_test = np.zeros((len(X_test), len(base_learners)))

print("Generating meta-features...")
for i, model in enumerate(base_learners):
    print(f"\nProcessing {model.__class__.__name__}...")
    
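    # For the test set, average the predictions of the K fold models (a common shortcut;
    # scikit-learn's StackingClassifier instead refits each base model on all of X_train)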
    test_preds = np.zeros(len(X_test))
    
    # Cross-validation for training meta-features
    for fold_idx, (train_idx, val_idx) in enumerate(kfold.split(X_train, y_train)):
        X_fold_train, X_fold_val = X_train[train_idx], X_train[val_idx]
        y_fold_train, y_fold_val = y_train[train_idx], y_train[val_idx]
        
        # Train on this fold
        model.fit(X_fold_train, y_fold_train)
        
        # Predict on validation fold (out-of-fold predictions)
        meta_features_train[val_idx, i] = model.predict_proba(X_fold_val)[:, 1]
        
        # Predict on test set and accumulate
        test_preds += model.predict_proba(X_test)[:, 1] / n_folds
    
    # Store test predictions
    meta_features_test[:, i] = test_preds
    print(f"  Completed {n_folds} folds")

print("\nMeta-features generated!")
print(f"Train meta-features shape: {meta_features_train.shape}")
print(f"Test meta-features shape: {meta_features_test.shape}")

In [None]:
# Step 3: Train meta-learner
meta_model = LogisticRegression(random_state=42)
meta_model.fit(meta_features_train, y_train)

# Step 4: Make final predictions
y_pred_manual_stack = meta_model.predict(meta_features_test)
manual_stack_acc = accuracy_score(y_test, y_pred_manual_stack)

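# Note: the scikit-learn stack above also includes an SVM and refits its base models
# on the full training set, so the two scores are not expected to match exactly
print(f"Manual Stacking Accuracy: {manual_stack_acc:.4f}")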
print(f"Scikit-learn Stacking Accuracy: {stack_acc:.4f}")
print(f"\nDifference: {abs(manual_stack_acc - stack_acc):.6f}")

In [None]:
# Examine meta-learner coefficients
print("Meta-Learner Coefficients (how much each model is trusted):")
print("=" * 60)
for i, model in enumerate(base_learners):
    coef = meta_model.coef_[0][i]
    print(f"{model.__class__.__name__:25s}: {coef:8.4f}")
print(f"{'Intercept':25s}: {meta_model.intercept_[0]:8.4f}")

print("\nInterpretation:")
print("- Positive coefficient: Model's high prediction increases final prediction")
print("- Negative coefficient: Model's high prediction decreases final prediction")
print("- Larger magnitude: Model has more influence")
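
A fitted scikit-learn `StackingClassifier` can be inspected in a similar way: the trained meta-learner is available as `final_estimator_`, and `transform` returns the base model predictions that feed it. The sketch below assumes a small synthetic dataset and hypothetical names (`X_i`, `inspect_stack`) so that it does not overwrite variables from earlier cells.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

X_i, y_i = make_classification(n_samples=300, n_features=10, random_state=0)

inspect_stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=30, random_state=0)),
        ("nb", GaussianNB()),
    ],
    final_estimator=LogisticRegression(),
    cv=5,
).fit(X_i, y_i)

# transform() returns predictions from the base models refitted on all of X_i;
# for binary predict_proba outputs there is one column per base model
meta_X = inspect_stack.transform(X_i)
print(f"Meta-feature matrix shape: {meta_X.shape}")

# Coefficients of the fitted meta-learner, in the order the estimators were given
for (name, _), coef in zip(inspect_stack.estimators,
                           inspect_stack.final_estimator_.coef_[0]):
    print(f"{name:5s}: {coef:8.4f}")
```

Because correlated base models can share or trade weight between them, these coefficients are best read as a rough indication rather than a precise ranking.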

## 5. Stacking vs Blending

### Key Differences:

**Stacking:**
- Uses cross-validation to generate meta-features
- All training data is used for meta-learner training
- Usually more robust: the meta-learner sees an out-of-fold prediction for every training row
- More computationally expensive

**Blending:**
- Splits data into train and holdout sets
- Base models train on train set
- Predictions on holdout set become meta-features
- Simpler, faster
- Base models train on less data, and the meta-learner sees only the holdout rows

Let's implement blending:

In [None]:
# Blending implementation
# Split training data into train and holdout
X_blend_train, X_blend_hold, y_blend_train, y_blend_hold = train_test_split(
    X_train, y_train, test_size=0.2, random_state=42, stratify=y_train
)

print("Blending Data Split:")
print(f"Train for base models: {X_blend_train.shape}")
print(f"Holdout for meta-model: {X_blend_hold.shape}")
print(f"Test set: {X_test.shape}")

# Step 1: Train base models on blend_train
base_models_blend = [
    RandomForestClassifier(n_estimators=100, random_state=42),
    XGBClassifier(n_estimators=100, learning_rate=0.1, random_state=42, eval_metric='logloss'),
    LGBMClassifier(n_estimators=100, learning_rate=0.1, random_state=42, verbose=-1)
]

blend_meta_train = np.zeros((len(X_blend_hold), len(base_models_blend)))
blend_meta_test = np.zeros((len(X_test), len(base_models_blend)))

for i, model in enumerate(base_models_blend):
    # Train on blend_train
    model.fit(X_blend_train, y_blend_train)
    
    # Predict on holdout set
    blend_meta_train[:, i] = model.predict_proba(X_blend_hold)[:, 1]
    
    # Predict on test set
    blend_meta_test[:, i] = model.predict_proba(X_test)[:, 1]

# Step 2: Train meta-model on holdout predictions
blend_meta_model = LogisticRegression(random_state=42)
blend_meta_model.fit(blend_meta_train, y_blend_hold)

# Step 3: Predict on test set
y_pred_blend = blend_meta_model.predict(blend_meta_test)
blend_acc = accuracy_score(y_test, y_pred_blend)

print(f"\nBlending Accuracy: {blend_acc:.4f}")
print(f"Stacking Accuracy: {stack_acc:.4f}")
print(f"Difference: {abs(blend_acc - stack_acc):.4f}")

## 6. Multi-Level Stacking

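Stacks can themselves be stacked: a stacking ensemble can serve as a base model for another meta-learner. This sometimes yields a small additional gain, at a much higher training cost.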

In [None]:
# Level 0: Diverse base models
level0_models = [
    ('rf', RandomForestClassifier(n_estimators=100, random_state=42)),
    ('xgb', XGBClassifier(n_estimators=100, random_state=42, eval_metric='logloss')),
    ('lgbm', LGBMClassifier(n_estimators=100, random_state=42, verbose=-1)),
    ('knn', KNeighborsClassifier(n_neighbors=5)),
    ('nb', GaussianNB())
]

# Level 1: Intermediate meta-learners
level1_meta = StackingClassifier(
    estimators=level0_models,
    final_estimator=LogisticRegression(random_state=42),
    cv=5
)

# Level 2: Final ensemble combining the Level 1 stack with an additional model
level2_models = [
    ('level1_stack', level1_meta),
    ('gb', GradientBoostingClassifier(n_estimators=100, random_state=42))
]

final_stack = StackingClassifier(
    estimators=level2_models,
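    # StackingClassifier requires a classifier here; Ridge is a regressor and raises a ValueError
    final_estimator=LogisticRegression(random_state=42),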
    cv=3
)

print("Multi-Level Stacking Architecture:")
print(f"Level 0: {len(level0_models)} diverse models")
print(f"Level 1: Stacking ensemble")
print(f"Level 2: Final combination")

# Train
print("\nTraining multi-level stack (this may take a while)...")
start = time()
final_stack.fit(X_train, y_train)
multilevel_time = time() - start

# Evaluate
y_pred_multilevel = final_stack.predict(X_test)
multilevel_acc = accuracy_score(y_test, y_pred_multilevel)

print(f"\nTraining time: {multilevel_time:.2f} seconds")
print(f"Multi-level Stacking Accuracy: {multilevel_acc:.4f}")
print(f"Single-level Stacking Accuracy: {stack_acc:.4f}")
print(f"Improvement: {multilevel_acc - stack_acc:.4f}")

**Note**: Multi-level stacking often provides marginal improvements but at significant computational cost. Use it when:
- You need every bit of accuracy
- You have computational resources
- The problem is complex enough to benefit from it

## 7. Stacking for Regression

In [None]:
# Load regression dataset
diabetes = load_diabetes()
X_reg = diabetes.data
y_reg = diabetes.target

X_train_reg, X_test_reg, y_train_reg, y_test_reg = train_test_split(
    X_reg, y_reg, test_size=0.2, random_state=42
)

print(f"Regression dataset: {X_reg.shape}")
print(f"Train: {X_train_reg.shape}, Test: {X_test_reg.shape}")

In [None]:
# Define base regressors
base_regressors = [
    ('rf', RandomForestRegressor(n_estimators=100, random_state=42)),
    ('xgb', XGBRegressor(n_estimators=100, learning_rate=0.1, random_state=42)),
    ('lgbm', LGBMRegressor(n_estimators=100, learning_rate=0.1, random_state=42, verbose=-1)),
    ('svr', SVR(kernel='rbf'))
]

# Meta-learner for regression
meta_regressor = Ridge(alpha=1.0)

# Create stacking regressor
stacking_reg = StackingRegressor(
    estimators=base_regressors,
    final_estimator=meta_regressor,
    cv=5
)

# Train
stacking_reg.fit(X_train_reg, y_train_reg)

# Evaluate
y_pred_reg = stacking_reg.predict(X_test_reg)
stack_r2 = r2_score(y_test_reg, y_pred_reg)
stack_mse = mean_squared_error(y_test_reg, y_pred_reg)

print("Stacking Regressor Performance:")
print(f"R² Score: {stack_r2:.4f}")
print(f"MSE: {stack_mse:.2f}")

In [None]:
# Compare with individual models
print("\nIndividual Model Performance:")
print("=" * 50)

base_r2_scores = {}
for name, model in base_regressors:
    model.fit(X_train_reg, y_train_reg)
    y_pred = model.predict(X_test_reg)
    r2 = r2_score(y_test_reg, y_pred)
    mse = mean_squared_error(y_test_reg, y_pred)
    base_r2_scores[name] = r2
    print(f"{name:10s}: R² = {r2:.4f}, MSE = {mse:.2f}")

print("=" * 50)
print(f"{'Stacking':10s}: R² = {stack_r2:.4f}, MSE = {stack_mse:.2f}")
print(f"\nImprovement over best base model: {stack_r2 - max(base_r2_scores.values()):.4f}")

In [None]:
# Visualize predictions
fig, axes = plt.subplots(1, 2, figsize=(15, 5))

# Best base model
best_base_name = max(base_r2_scores, key=base_r2_scores.get)
best_base_model = dict(base_regressors)[best_base_name]
y_pred_best = best_base_model.predict(X_test_reg)

axes[0].scatter(y_test_reg, y_pred_best, alpha=0.5)
axes[0].plot([y_test_reg.min(), y_test_reg.max()],
             [y_test_reg.min(), y_test_reg.max()],
             'r--', linewidth=2)
axes[0].set_xlabel('Actual')
axes[0].set_ylabel('Predicted')
axes[0].set_title(f'Best Base Model ({best_base_name})\nR² = {base_r2_scores[best_base_name]:.4f}')

axes[1].scatter(y_test_reg, y_pred_reg, alpha=0.5, color='green')
axes[1].plot([y_test_reg.min(), y_test_reg.max()],
             [y_test_reg.min(), y_test_reg.max()],
             'r--', linewidth=2)
axes[1].set_xlabel('Actual')
axes[1].set_ylabel('Predicted')
axes[1].set_title(f'Stacking Regressor\nR² = {stack_r2:.4f}')

plt.tight_layout()
plt.show()

## 8. Best Practices and Common Pitfalls

### Best Practices:

1. **Use Diverse Base Models**: Different algorithms, different hyperparameters
2. **Always Use Out-of-Fold or Holdout Predictions**: Prevents the meta-learner from overfitting to leaked training labels
3. **Keep Meta-Learner Simple**: Logistic regression, ridge, or linear models work well
4. **Monitor Overfitting**: Check on held-out data (ideally cross-validated) that stacking actually improves performance
5. **Start Simple**: Begin with 2-3 base models, add more if needed
6. **Consider Computational Cost**: Stacking is expensive, use when accuracy is critical

### Common Pitfalls:

1. **Not Using CV**: Training the meta-learner on predictions that base models made for their own training rows → Overfitting
2. **Too Complex Meta-Learner**: Neural nets or deep trees as meta-learners often overfit
3. **Too Many Similar Base Models**: Ten random forests add little; diversity matters
4. **Ignoring Correlation**: Highly correlated base models provide redundant information
5. **Not Checking Improvement**: Sometimes simple averaging works as well as stacking
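
One way to check the correlation pitfall before building a stack is to correlate the out-of-fold predictions of the candidate base models. The sketch below is an assumption-laden illustration on synthetic data; the dictionary `candidates` and its model choices are hypothetical and only meant to contrast two similar models with one different one.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import GaussianNB

X_c, y_c = make_classification(
    n_samples=400, n_features=20, n_informative=5, random_state=0
)

# Two near-identical forests and one model from a different family
candidates = {
    "rf_50": RandomForestClassifier(n_estimators=50, random_state=0),
    "rf_100": RandomForestClassifier(n_estimators=100, random_state=1),
    "nb": GaussianNB(),
}

# Out-of-fold positive-class probabilities, one column per candidate
oof_c = np.column_stack([
    cross_val_predict(m, X_c, y_c, cv=5, method="predict_proba")[:, 1]
    for m in candidates.values()
])

# Pairwise Pearson correlation between the prediction columns
corr = np.corrcoef(oof_c, rowvar=False)
names = list(candidates)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        print(f"{names[i]:7s} vs {names[j]:7s}: {corr[i, j]:.3f}")
```

Pairs with a correlation close to 1 contribute nearly the same information to the meta-learner, so dropping one of them usually costs little.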

In [None]:
# Example: Checking if stacking is worth it
# Compare stacking with simple averaging

# Get probability predictions from each base model
base_probs = []
for name, model in base_models:
    model.fit(X_train, y_train)
    probs = model.predict_proba(X_test)[:, 1]
    base_probs.append(probs)

# Simple average
avg_probs = np.mean(base_probs, axis=0)
y_pred_avg = (avg_probs > 0.5).astype(int)
avg_acc = accuracy_score(y_test, y_pred_avg)

# Weighted average (weights based on individual performance)
weights = np.array([base_accuracies[name] for name, _ in base_models])
weights = weights / weights.sum()  # Normalize
weighted_avg_probs = np.average(base_probs, axis=0, weights=weights)
y_pred_weighted = (weighted_avg_probs > 0.5).astype(int)
weighted_acc = accuracy_score(y_test, y_pred_weighted)

print("Comparison: Averaging vs Stacking")
print("=" * 50)
print(f"Simple Average:    {avg_acc:.4f}")
print(f"Weighted Average:  {weighted_acc:.4f}")
print(f"Stacking:          {stack_acc:.4f}")
print("\nConclusion:")
if stack_acc > max(avg_acc, weighted_acc) + 0.005:
    print("Stacking is ahead on this test split - confirm with cross-validation before adopting it.")
else:
    print("Stacking improvement is marginal - simple averaging might be sufficient.")
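
A single test split of this size can shift by a full percentage point on one sample, so a cross-validated comparison is a safer basis for the decision. The sketch below assumes scikit-learn's `VotingClassifier` with soft voting as the averaging baseline and uses a smaller, hypothetical set of base models (`members`) to keep the run short; it is not the same ensemble as in the cells above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X_v, y_v = load_breast_cancer(return_X_y=True)

members = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("nb", GaussianNB()),
    ("dt", DecisionTreeClassifier(max_depth=4, random_state=0)),
]

# Soft voting averages predicted probabilities; stacking learns the weights
soft_vote = VotingClassifier(estimators=members, voting="soft")
stack_cv = StackingClassifier(
    estimators=members, final_estimator=LogisticRegression(), cv=5
)

# The same outer folds are used for both ensembles so the scores are comparable
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
vote_scores = cross_val_score(soft_vote, X_v, y_v, cv=outer_cv, scoring="accuracy")
stack_scores = cross_val_score(stack_cv, X_v, y_v, cv=outer_cv, scoring="accuracy")

print(f"Soft voting: {vote_scores.mean():.4f} +/- {vote_scores.std():.4f}")
print(f"Stacking:    {stack_scores.mean():.4f} +/- {stack_scores.std():.4f}")
```

If the two means differ by less than the fold-to-fold spread, the simpler averaging ensemble is probably sufficient.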

## 9. Exercises

### Exercise 1: Custom Stacking Ensemble

Create a stacking classifier with:
- At least 5 diverse base models (include at least one from each family: tree-based, linear, distance-based)
- A meta-learner of your choice
- 10-fold cross-validation

Compare its performance with the best base model.

In [None]:
# Your code here


### Exercise 2: Implement Blending from Scratch

Implement a complete blending pipeline:
1. Split data into train (60%), holdout (20%), and test (20%)
2. Train 3 base models on train set
3. Generate predictions on holdout set
4. Train meta-learner on holdout predictions
5. Evaluate on test set

Compare with stacking using the same base models.

In [None]:
# Your code here


### Exercise 3: Meta-Learner Comparison

Using the same set of base models, try different meta-learners:
- Logistic Regression
- Ridge Classifier (`RidgeClassifier`; `StackingClassifier` requires a classifier as its final estimator)
- Random Forest (with small n_estimators)
- Gradient Boosting (with small n_estimators)

Which meta-learner works best? Why?

In [None]:
# Your code here


### Exercise 4: Feature Engineering with Stacking

Create a stacking ensemble that uses:
- Original features AND base model predictions as input to meta-learner
- Set `passthrough=True` in StackingClassifier

Does including original features improve performance?

In [None]:
# Your code here


### Exercise 5: When Stacking Fails

Create a scenario where stacking doesn't help:
1. Use only tree-based models (Random Forest with different n_estimators)
2. Create a stacking ensemble
3. Compare with simple averaging

Explain why stacking doesn't provide much benefit in this case.

In [None]:
# Your code here


## 10. Summary

In this notebook, you learned about Stacking and Blending, advanced ensemble techniques:

### Key Concepts:

1. **Stacking (Stacked Generalization)**:
   - Combines models through a meta-learner
   - Uses cross-validation to generate meta-features
   - Meta-learner learns how to weight base model predictions
   - More flexible than simple averaging, at the cost of extra variance

2. **Architecture**:
   - **Level 0**: Diverse base models (different algorithms)
   - **Level 1**: Meta-learner (simple model)
   - Can extend to multiple levels

3. **Blending**:
   - Simpler alternative to stacking
   - Uses holdout set instead of CV
   - Faster but uses less training data

### Best Practices:

1. **Model Diversity is Crucial**:
   - Mix different algorithm types
   - Avoid too many similar models
   - Uncorrelated errors combine better

2. **Always Use Cross-Validation**:
   - Essential for preventing overfitting
   - 5-10 folds typically work well
   - Don't train the meta-learner on in-sample base model predictions

3. **Keep Meta-Learner Simple**:
   - Logistic/Linear regression often best
   - Complex meta-learners → overfitting
   - Meta-learner should combine, not learn new patterns

4. **Monitor Performance**:
   - Check if stacking actually improves over averaging
   - Marginal gains (<0.5%) may not justify complexity
   - Use validation set to verify improvements

### When to Use Stacking:

**Use Stacking when:**
- ✅ You have diverse base models with different strengths
- ✅ Individual models have similar but not identical performance
- ✅ You need maximum accuracy (competitions, critical applications)
- ✅ Computational resources are available
- ✅ You have sufficient training data (as a rough guide, more than ~10K samples)

**Avoid Stacking when:**
- ❌ Base models are too similar (low diversity)
- ❌ One model significantly outperforms others
- ❌ Training data is very limited (as a rough guide, fewer than ~1K samples)
- ❌ Computational budget is tight
- ❌ Simple averaging works nearly as well
- ❌ Model interpretability is required

### Stacking vs Other Ensemble Methods:

| Method | Complexity | Diversity | Performance | Speed |
|--------|-----------|-----------|-------------|-------|
| **Bagging** | Low | Low (same algorithm) | Good | Fast |
| **Boosting** | Medium | Low (sequential) | Excellent | Medium |
| **Voting** | Low | High (different algorithms) | Good | Fast |
| **Stacking** | High | High (different algorithms) | Often best | Slow |

### Common Applications:

1. **Kaggle Competitions**: Many winning solutions use some form of stacking or blending
2. **Critical Predictions**: Medical diagnosis, fraud detection
3. **Research**: When pushing accuracy boundaries
4. **Ensemble of Ensembles**: Combine XGBoost, LightGBM, CatBoost

### What's Next?

In the next module, we'll explore **Voting Classifiers and Regressors**:
- Hard vs soft voting
- Combining different algorithm types
- Weight tuning for voting
- When voting helps vs when it doesn't

Voting is simpler than stacking but can still provide significant improvements!

## Additional Resources

- [Stacked Generalization (Wolpert, 1992)](http://machine-learning.martinsewell.com/ensembles/stacking/Wolpert1992.pdf) - Original paper
- [Scikit-learn Stacking Guide](https://scikit-learn.org/stable/modules/ensemble.html#stacking)
- [MLxtend Stacking Tutorial](http://rasbt.github.io/mlxtend/user_guide/classifier/StackingCVClassifier/)
- [Kaggle Ensembling Guide](https://mlwave.com/kaggle-ensembling-guide/)
- [Stacking Made Easy](https://www.kdnuggets.com/2017/02/stacking-models-imropved-predictions.html)