# 11 — Final Ensemble & Online Prediction Demo

This notebook demonstrates:
1. Loading the **stacking ensemble** (`models/final_ensemble.joblib`)
2. Reviewing **final evaluation metrics** (`reports/final_metrics.json`)
3. Running **online prediction** on a sample JSON batch

In [None]:
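import sys, pathlib, json

# The project root is taken to be the parent of the current working directory,
# so launch the notebook from a direct subdirectory of the project.
ROOT = pathlib.Path.cwd().parent
if str(ROOT) not in sys.path:
    sys.path.insert(0, str(ROOT))  # makes the `src` package importable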

from IPython.display import display, JSON, Image, Markdown
import pandas as pd
print("Imports OK")

## 1. Final Metrics

In [None]:
metrics = json.loads((ROOT / "reports" / "final_metrics.json").read_text())

# Regression comparison
reg_rows = []
for key, label in [("ensemble_reg", "Ensemble"), ("xgb_reg", "XGBoost"),
                    ("lgb_reg", "LightGBM"), ("ridge_reg", "Ridge")]:
    m = metrics[key]
    reg_rows.append({"Model": label, "MAE": m["mae"], "RMSE": m["rmse"], "R²": m["r2"]})
print("Regression (test set):")
display(pd.DataFrame(reg_rows).set_index("Model").round(4))

# Classification comparison
clf_rows = []
for key, label in [("ensemble_clf", "Ensemble"), ("xgb_clf", "XGBoost"),
                    ("lgb_clf", "LightGBM"), ("lr_clf", "LogReg")]:
    m = metrics[key]
    # A missing metric is shown as NaN rather than a misleading score of 0
    nan = float("nan")
    clf_rows.append({"Model": label, "F1": m.get("f1", nan), "AUC": m.get("roc_auc", nan),
                     "Accuracy": m.get("accuracy", nan)})
print("\nClassification (test set):")
display(pd.DataFrame(clf_rows).set_index("Model").round(4))
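
For reference, the regression columns above follow the standard definitions of MAE, RMSE, and R². The sketch below computes them in plain Python on made-up values; the function name `regression_metrics` and the sample numbers are illustrative assumptions, not part of this project's code or results.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE and R² using their standard definitions."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                   # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)    # total sum of squares
    r2 = 1.0 - ss_res / ss_tot  # undefined when y_true is constant
    return {"mae": mae, "rmse": rmse, "r2": r2}

# Made-up values, for illustration only
demo = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
print(demo)
```

The returned keys mirror the `mae`, `rmse`, and `r2` entries read from `final_metrics.json`, which makes it easy to spot-check a stored value against raw predictions.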

In [None]:
display(Image(str(ROOT / "figures" / "final_metrics.png")))

## 2. Load Ensemble Artifact

In [None]:
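# Not referenced directly below; presumably imported so the custom classes
# stored in the artifact resolve when joblib unpickles it.
from src.features.feature_pipeline import FrequencyEncoder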
from src.models.ensemble import StackingEnsemble
import joblib

ens = joblib.load(ROOT / "models" / "final_ensemble.joblib")
print("Ensemble keys:", list(ens.keys()))
print("Reg base learners:", [type(m).__name__ for m in ens['reg'].base_models])
print("Clf base learners:", [type(m).__name__ for m in ens['clf'].base_models])
print("Meta-learner (reg):", type(ens['reg'].meta_model).__name__)
print("Meta-learner (clf):", type(ens['clf'].meta_model).__name__)
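
The attribute names printed above (`base_models`, `meta_model`) suggest the usual stacking layout: each base learner predicts, and the meta-learner receives those predictions as its input features. The sketch below illustrates that idea with toy stand-ins. `ConstantModel`, `MeanMeta`, and `stack_predict` are hypothetical names and are not the project's `StackingEnsemble`, whose internals this notebook does not show.

```python
class ConstantModel:
    """Toy base learner that always predicts the same value (hypothetical)."""
    def __init__(self, value):
        self.value = value

    def predict(self, X):
        return [self.value for _ in X]


class MeanMeta:
    """Toy meta-learner that averages the base predictions of each row (hypothetical)."""
    def predict(self, Z):
        return [sum(row) / len(row) for row in Z]


def stack_predict(base_models, meta_model, X):
    # One column of predictions per base learner
    columns = [m.predict(X) for m in base_models]
    # Transpose into one row of meta-features per sample
    Z = [list(row) for row in zip(*columns)]
    return meta_model.predict(Z)


X = [[0.1, 0.2], [0.3, 0.4]]  # two made-up samples
out = stack_predict([ConstantModel(1.0), ConstantModel(3.0)], MeanMeta(), X)
print(out)
```

A real meta-learner would be fitted on held-out base predictions rather than averaging them; the averaging here only keeps the example short.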

## 3. Online Prediction Demo

Read `examples/sample_input.json`, run it through the prediction pipeline,
and write the results to `examples/sample_output.json`.

In [None]:
# Show input
inp = json.loads((ROOT / "examples" / "sample_input.json").read_text())
print(f"Input: {len(inp['flows'])} flow(s)")
display(pd.DataFrame(inp["flows"])[["src_device_id", "dst_device_id",
    "traffic_type", "packet_size_bytes", "controller_state", "attack_type"]])

In [None]:
from src.predict.online_predict import read_input, predict, write_output

df_in = read_input(ROOT / "examples" / "sample_input.json")
preds = predict(df_in, ens)

out_path = ROOT / "examples" / "sample_output.json"
write_output(preds, out_path)

display(pd.DataFrame(preds))

In [None]:
# Verify output file
out = json.loads(out_path.read_text())
print("Output file predictions:")
for p in out["predictions"]:
    flag = "VIOLATION" if p["violation_flag"] else "ok"
    print(f"  flow {p['flow_index']}: "
          f"latency={p['predicted_latency_us']:.2f} μs  "
          f"P(violation)={p['violation_probability']:.4f}  [{flag}]")
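
A lightweight sanity check on the written file can catch schema drift early. The sketch below validates the four fields read in the loop above. The helper name `check_predictions` and the sample payload are illustrative assumptions, and the check deliberately assumes no particular decision threshold behind `violation_flag`.

```python
REQUIRED_KEYS = {"flow_index", "predicted_latency_us",
                 "violation_probability", "violation_flag"}

def check_predictions(payload):
    """Return a list of human-readable problems found in a predictions payload."""
    problems = []
    for i, p in enumerate(payload.get("predictions", [])):
        missing = REQUIRED_KEYS - p.keys()
        if missing:
            problems.append(f"prediction {i}: missing {sorted(missing)}")
            continue
        if not 0.0 <= p["violation_probability"] <= 1.0:
            problems.append(f"prediction {i}: probability out of [0, 1]")
    return problems

# Made-up payload; the second entry has an invalid probability on purpose
sample = {"predictions": [
    {"flow_index": 0, "predicted_latency_us": 12.5,
     "violation_probability": 0.03, "violation_flag": False},
    {"flow_index": 1, "predicted_latency_us": 80.0,
     "violation_probability": 1.2, "violation_flag": True},
]}
issues = check_predictions(sample)
print(issues)
```

In the notebook, the same helper could be applied to the `out` dictionary loaded from `sample_output.json`; an empty list would mean every prediction passed both checks.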

## 4. CLI Equivalent

The same prediction can be run from the command line. Run it from the project root so that `src` is importable and the relative paths resolve:

```bash
python -m src.predict.online_predict \
    --input  examples/sample_input.json \
    --output examples/sample_output.json
```
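
The two flags in the command above map naturally onto an `argparse` parser. The following is a minimal sketch of how such an entry point might parse them; `build_parser`, the help strings, and marking both flags as required are assumptions for illustration and may differ from the actual `src.predict.online_predict` module.

```python
import argparse
from pathlib import Path

def build_parser():
    # Hypothetical parser; only the flag names --input and --output come from the command above
    parser = argparse.ArgumentParser(prog="online_predict")
    parser.add_argument("--input", type=Path, required=True,
                        help="JSON file containing the flows to score")
    parser.add_argument("--output", type=Path, required=True,
                        help="JSON file to write predictions to")
    return parser

# Parse the same arguments as the shell command, without touching the filesystem
args = build_parser().parse_args([
    "--input", "examples/sample_input.json",
    "--output", "examples/sample_output.json",
])
print(args.input, args.output)
```

Converting the arguments to `Path` objects at parse time keeps them consistent with the `pathlib`-based calls used throughout the notebook.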