## Ensemble stacking (final evaluation)

This notebook builds the final model by stacking multiple tuned base learners.

Workflow:
1. Load best hyperparameters for each base model (saved from the optimisation notebook).
2. Generate **out-of-fold (OOF)** predictions for each base model using K-fold CV.
3. Train a simple **Ridge** stacker on the OOF prediction matrix.
4. Refit each base model on the full training set.
5. Evaluate the stacked ensemble and each base model on the held-out **final test set**.
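
As a rough illustration of step 3, the sketch below fits a scaled Ridge regression on a matrix of base-model predictions. It is an assumption about what a stacker of this kind could look like, not the actual body of `fit_stacker`. The helper name `sketch_fit_stacker` and the toy data are hypothetical. The `StandardScaler` + `Ridge` layout follows the later cell that reads `stacker.named_steps["ridge"]`.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def sketch_fit_stacker(X_meta, y, alpha=1.0):
    """Hypothetical helper: fit a scaled Ridge on base-model predictions."""
    stacker = Pipeline([
        ("scaler", StandardScaler()),   # put all prediction columns on one scale
        ("ridge", Ridge(alpha=alpha)),  # linear blend with L2 shrinkage
    ])
    return stacker.fit(X_meta, y)


# Toy meta-features: two simulated "base models" predicting the same target
rng = np.random.default_rng(0)
y_demo = rng.normal(size=200)
X_meta_demo = np.column_stack([
    y_demo + rng.normal(scale=0.2, size=200),  # accurate base model
    y_demo + rng.normal(scale=1.0, size=200),  # noisier base model
])

demo_stacker = sketch_fit_stacker(X_meta_demo, y_demo)
demo_weights = demo_stacker.named_steps["ridge"].coef_
```

In this toy setup the more accurate column receives the larger coefficient, which is the behaviour one would hope for from a linear blender.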

In [None]:
import numpy as np
import pandas as pd

from mp.models.catboost_fe import CatBoostFEModel
from mp.models.xgb_fe import XGBFEModel
from mp.models.knn_fe import KNNFEModel
from mp.models.nn_fe import NNFEModel 

from mp.models.stacking_fe import (
    build_oof_meta_features,
    fit_stacker,
    fit_base_models_full,
    build_meta_features,
    evaluate_ensemble,
)

from mp.io import load_json

In [None]:
from mp.io import get_repo_root

ROOT = get_repo_root()

cb_params  = load_json(ROOT/"reports/best_params/catboost.json")
xgb_params = load_json(ROOT/"reports/best_params/xgb.json")
knn_params = load_json(ROOT/"reports/best_params/knn.json")
nn_params  = load_json(ROOT/"reports/best_params/nn.json")

In [None]:
TARGET = "mpC"

In [None]:
train_val_data = pd.read_csv(ROOT / "data/processed/train_val.csv", index_col=0)
final_test_df = pd.read_csv(ROOT / "data/processed/final_test.csv", index_col=0)

In [None]:
# 1) Define base models (best params loaded)
base_models = {
    "cat": CatBoostFEModel(**cb_params),
    "xgb": XGBFEModel(**xgb_params),
    "knn": KNNFEModel(**knn_params),
    "nn":  NNFEModel(**nn_params),
}

In [None]:
# 2) Prepare train + test
X_train = train_val_data.drop(columns=[TARGET])
y_train = train_val_data[TARGET].to_numpy()

X_test = final_test_df.drop(columns=[TARGET])
y_test = final_test_df[TARGET].to_numpy()

### Out-of-fold (OOF) predictions

To avoid information leakage when training the meta-model, generate OOF predictions:

- Each base model is trained on K-1 folds and predicts the held-out fold.
- The concatenated OOF predictions form the meta-feature matrix used to fit the stacker.

As a result, every row of the meta-feature matrix holds predictions from models that did not see that row during training, so the stacker is fit on predictions that behave like those it will receive at test time.
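
The OOF procedure can be sketched with scikit-learn's `KFold`. This is a minimal illustration under stated assumptions, not the implementation of `build_oof_meta_features`. The helper `sketch_oof_matrix`, the stand-in models and the toy data are all hypothetical.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsRegressor


def sketch_oof_matrix(X, y, models, n_splits=3, random_state=42):
    """Return (oof, names); oof[i, j] is model j's prediction for row i,
    made by a copy of the model that never saw row i during fitting."""
    names = list(models)
    oof = np.full((len(X), len(names)), np.nan)
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    for train_idx, valid_idx in kf.split(X):
        for j, name in enumerate(names):
            fold_model = clone(models[name])  # fresh, unfitted copy per fold
            fold_model.fit(X[train_idx], y[train_idx])
            oof[valid_idx, j] = fold_model.predict(X[valid_idx])
    return oof, names


# Toy data standing in for the real training set (assumption)
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(60, 4))
y_toy = X_toy @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=60)

toy_models = {
    "lin": LinearRegression(),
    "knn": KNeighborsRegressor(n_neighbors=3),
}
oof_toy, oof_names = sketch_oof_matrix(X_toy, y_toy, toy_models)
```

Because each row falls in exactly one validation fold, the matrix has no missing entries once the loop finishes, and its column order matches `oof_names`.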

In [None]:
# 3) OOF meta-features
X_meta, model_names = build_oof_meta_features(
    X_train,
    y_train,
    base_models,
    n_splits=3,
    random_state=42,
    nn_name="nn",
    verbose=True,
)

# 4) Fit stacker on OOF predictions
stacker = fit_stacker(X_meta, y_train, alpha=1.0, random_state=42)

### Final evaluation on the held-out test set

After fitting the stacker:
- Base models are retrained on the full training set.
- The stacker, fit once on the OOF matrix and not refit, is applied to the refit base models' predictions on the final hold-out test set (`final_test.csv`).

Reports:
- stacked ensemble MAE
- baseline MAE for each base model (same test split)
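
A minimal sketch of the test-time step, assuming meta-features are simply the fitted base models' predictions stacked as columns in the same order used for the OOF matrix, and that the reported metric is plain mean absolute error. The helper `sketch_test_meta_features`, the stand-in models and the toy data are hypothetical. The real `build_meta_features` and `evaluate_ensemble` may differ.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.neighbors import KNeighborsRegressor


def sketch_test_meta_features(X, fitted_models, names):
    """Stack each fitted model's predictions as columns, in `names` order."""
    return np.column_stack([fitted_models[name].predict(X) for name in names])


# Toy train/test split with a noise-free linear target (assumption)
rng = np.random.default_rng(1)
X_tr, X_te = rng.normal(size=(80, 3)), rng.normal(size=(20, 3))
coef = np.array([1.5, -1.0, 0.0])
y_tr, y_te = X_tr @ coef, X_te @ coef

names = ["lin", "knn"]
fitted = {
    "lin": LinearRegression().fit(X_tr, y_tr),
    "knn": KNeighborsRegressor(n_neighbors=3).fit(X_tr, y_tr),
}

meta_te = sketch_test_meta_features(X_te, fitted, names)

# Per-model MAE on the same test rows, one entry per meta-feature column
base_mae = {
    name: mean_absolute_error(y_te, meta_te[:, j])
    for j, name in enumerate(names)
}
```

Keeping the column order identical between the OOF matrix and the test meta-features matters, because the stacker's coefficients are tied to column positions rather than model names.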

In [None]:
# 5) Refit base models on full train
fitted_base_models = fit_base_models_full(
    X_train,
    y_train,
    base_models,
    nn_name="nn",
    verbose=True,
)

In [None]:
# 6) Test meta-features + stacked preds
test_meta = build_meta_features(X_test, fitted_base_models, model_names)
y_test_ens = stacker.predict(test_meta)

ens_mae = evaluate_ensemble(y_test, y_test_ens)
print("Ensemble MAE:", ens_mae)

# 7) Base model predictions (cached)
base_preds = {
    name: fitted_base_models[name].predict(X_test)
    for name in model_names
}

for name, pred in base_preds.items():
    print(f"{name} MAE:", evaluate_ensemble(y_test, pred))

# 8) Export model predictions
assert len(y_test) == len(X_test), "y_test and X_test length mismatch"

preds = pd.DataFrame(index=X_test.index)
preds.index.name = "idx"

preds["y_true"] = np.asarray(y_test)
preds["pred_stack"] = np.asarray(y_test_ens)

for name, pred in base_preds.items():
    preds[f"pred_{name}"] = np.asarray(pred)

# Residuals / abs error (stacked)
preds["resid_stack"] = preds["pred_stack"] - preds["y_true"]
preds["abs_err_stack"] = preds["resid_stack"].abs()

# Residuals / abs error (base models)
for name in model_names:
    resid = preds[f"pred_{name}"] - preds["y_true"]
    preds[f"resid_{name}"] = resid
    preds[f"abs_err_{name}"] = resid.abs()

# Save
pred_dir = ROOT / "reports" / "predictions"
pred_dir.mkdir(parents=True, exist_ok=True)
pred_path = pred_dir / "final_test_predictions.csv"
preds.to_csv(pred_path)
print(f"Saved test predictions to: {pred_path}")

In [None]:
from mp.io import save_json

results = {
    "ensemble_mae": float(ens_mae),
    # Reuse the cached test predictions instead of calling predict() again
    "base_mae": {
        name: float(evaluate_ensemble(y_test, base_preds[name]))
        for name in model_names
    },
}

save_json(results, ROOT / "reports" / "ensemble_results.json")

# Ridge coefficients apply to the StandardScaler-transformed meta-features,
# so compare them relative to each other rather than as raw blend weights
ridge = stacker.named_steps["ridge"]

stacker_weights = {
    "intercept": float(ridge.intercept_),
    "weights": {
        name: float(coef)
        for name, coef in zip(model_names, ridge.coef_)
    }
}

save_json(stacker_weights, ROOT / "reports" / "stacker_weights.json")

#### Export feature importances from the CatBoost model

In [None]:
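def export_catboost_feature_importance_from_wrapper(cat_wrapper, X_for_names, out_path):
    """Save the fitted CatBoost model's feature importances to CSV.

    cat_wrapper : fitted CatBoostFEModel exposing `fe_` (feature transformer)
        and `model_` (underlying CatBoost model).
    X_for_names : DataFrame of raw features, used only to recover the
        engineered column names via `fe_.transform`.
    out_path : pathlib.Path of the CSV file to write.

    Returns the importance table sorted from most to least important.
    """
    # getattr also covers wrappers that only create these attributes in fit()
    if getattr(cat_wrapper, "fe_", None) is None or getattr(cat_wrapper, "model_", None) is None:
        raise RuntimeError("CatBoostFEModel is not fitted (fe_ / model_ missing).")

    # Get the exact engineered feature names CatBoost trained on
    X_trans = cat_wrapper.fe_.transform(X_for_names)
    feature_names = list(X_trans.columns)

    # Get importance values from CatBoost
    importances = np.asarray(cat_wrapper.model_.get_feature_importance()).ravel()
    if len(importances) != len(feature_names):
        raise ValueError("Feature names and importances differ in length.")

    fi = (
        pd.DataFrame({"feature": feature_names, "importance": importances})
        .sort_values("importance", ascending=False)
        .reset_index(drop=True)
    )

    out_path.parent.mkdir(parents=True, exist_ok=True)
    fi.to_csv(out_path, index=False)

    return fi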

In [None]:
cat_wrapper = fitted_base_models["cat"]

fi_path = ROOT / "reports" / "feature_importance" / "catboost_feature_importance.csv"
fi_df = export_catboost_feature_importance_from_wrapper(
    cat_wrapper=cat_wrapper,
    X_for_names=X_train,   # use X_train so columns are guaranteed complete
    out_path=fi_path,
)

print(f"Saved CatBoost feature importance to: {fi_path}")
fi_df.head(20)