# 04 – CESNET IsolationForest & Regression Residuals

This notebook documents the anomaly-detection results on the CESNET windows, using:
- Gradient boosting regression (residual analysis)
- IsolationForest evaluation (PR-AUC, recall@1% FPR)

This notebook loads the artifacts created by `scripts/build_cesnet_residuals.py` and `src/train_oneclass.py`.
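The residual analysis relies on a simple idea: fit a regressor to predict a window's value, then treat large deviations between observed and predicted values as suspicious. The sketch below illustrates that idea with scikit-learn's `GradientBoostingRegressor` on synthetic data. The features (`hour`, `lag1`), the target formula and the 70/30 chronological split are hypothetical stand-ins, not the actual logic of `scripts/build_cesnet_residuals.py`.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 1500

# Hypothetical features (NOT the real CESNET schema): hour of day and lagged throughput
hour = rng.integers(0, 24, size=n)
lag1 = rng.gamma(2.0, 5.0, size=n)
X = np.column_stack([hour, lag1])

# Synthetic target in "MB": depends on the lag, a daily cycle, plus Gaussian noise
y = 0.8 * lag1 + 3 * np.sin(2 * np.pi * hour / 24) + rng.normal(0, 1, size=n)

# shuffle=False keeps rows in order, mimicking a chronological train/test split
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, shuffle=False)

gbr = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)
residuals = y_te - gbr.predict(X_te)  # observed minus predicted, in MB

print(f"residual mean {residuals.mean():.3f}, std {residuals.std():.3f}")
```

A chronological split is used here so that the model never sees "future" windows during training; whether the real pipeline does the same is an assumption.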

In [None]:
import json
import pandas as pd
from pathlib import Path
from IPython.display import display

data_dir = Path('..') / 'data_processed'
runs_dir = Path('..') / 'runs'

test_path = data_dir / 'cesnet_windows_test.csv'
df_test = pd.read_csv(test_path)

print(f'CESNET test shape: {df_test.shape}')
print(f"Anomaly prevalence: {df_test['is_anom'].mean():.4%}")
print(f"Time range: {df_test['time'].min()} to {df_test['time'].max()}")
display(df_test.head())


## IsolationForest Metrics

In [None]:
iforest_run = runs_dir / '20251008_130459_cesnet_iforest_perm'
metrics_path = iforest_run / 'metrics.json'
if metrics_path.exists():
    # Context manager closes the file handle after reading
    with open(metrics_path) as f:
        metrics = json.load(f)
    display(pd.DataFrame([metrics]))
else:
    print(f'IsolationForest metrics not found at {metrics_path}')
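For reference, the two reported metrics can be recomputed from per-window anomaly scores. The sketch below is a minimal, hypothetical version on synthetic data: it fits an `IsolationForest`, negates `score_samples` so that higher means more anomalous, estimates PR-AUC with `average_precision_score` (one common estimator; `src/train_oneclass.py` may instead integrate the precision-recall curve), and reads recall@1% FPR off the ROC curve. The helper `recall_at_fpr` and the data are illustrative only.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import average_precision_score, roc_curve


def recall_at_fpr(y_true, scores, max_fpr=0.01):
    """Hypothetical helper: highest recall (TPR) achievable with FPR <= max_fpr."""
    # drop_intermediate=False keeps every threshold so no operating point is skipped
    fpr, tpr, _ = roc_curve(y_true, scores, drop_intermediate=False)
    ok = fpr <= max_fpr
    return float(tpr[ok].max()) if ok.any() else 0.0


# Synthetic stand-in data: 2000 normal windows, 120 shifted "anomalies"
rng = np.random.default_rng(0)
n_normal, n_anom = 2000, 120
X = np.vstack([
    rng.normal(0, 1, size=(n_normal, 4)),
    rng.normal(4, 1, size=(n_anom, 4)),
])
y = np.r_[np.zeros(n_normal, dtype=int), np.ones(n_anom, dtype=int)]

model = IsolationForest(n_estimators=200, random_state=0).fit(X)
scores = -model.score_samples(X)  # score_samples is high for normal points, so negate

pr_auc = average_precision_score(y, scores)
rec1 = recall_at_fpr(y, scores, 0.01)
print(f"PR-AUC (average precision): {pr_auc:.3f}")
print(f"Recall @ 1% FPR: {rec1:.3f}")
```

PR-AUC is preferred over ROC-AUC at this prevalence because it is sensitive to false positives among the many normal windows, while recall@1% FPR reflects a fixed alert budget.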


## Figures

In [None]:
from IPython.display import Image
fig_dir = runs_dir / '20251008_140820_cesnet_regression_plots_final' / 'figures'
if fig_dir.exists():
    for fig in sorted(fig_dir.glob('*.png')):
        display(Image(filename=str(fig)))
else:
    print(f'Regression plot folder not found at {fig_dir}')
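If the saved figures are missing, a residual-distribution plot of the same kind can be regenerated along these lines. This is a sketch only: the residuals are a synthetic heavy-tailed stand-in, the output folder `figures_sketch` is hypothetical, and the dashed line marks the 97th percentile (the start of the top 3% tail mentioned in the notes).

```python
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np

# Synthetic heavy-tailed stand-in for real residuals (MB)
rng = np.random.default_rng(7)
residuals = rng.standard_t(df=3, size=5000)

out_dir = Path("figures_sketch")  # hypothetical output folder
out_dir.mkdir(exist_ok=True)

fig, ax = plt.subplots(figsize=(6, 3.5))
ax.hist(residuals, bins=100, log=True)  # log y-axis makes the tail visible
cutoff = np.quantile(residuals, 0.97)   # boundary of the top 3% tail
ax.axvline(cutoff, color="red", linestyle="--", label="top 3% cutoff")
ax.set_xlabel("Residual (MB)")
ax.set_ylabel("Count (log scale)")
ax.legend()
fig.tight_layout()

out_path = out_dir / "residual_hist.png"
fig.savefig(out_path, dpi=120)
plt.close(fig)
```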


## Notes
- Residuals are the differences between observed values and the gradient boosting regressor's predictions (units: MB).
- Anomaly labels combine a per-IP 3σ rule with the top 3% tail of the residual distribution (overall prevalence ≈ 5.8%).
- IsolationForest is evaluated with PR-AUC and recall at 1% FPR.
- The figures show residual distributions and throughput for representative IPs.
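The labeling rule in the notes can be sketched with pandas as follows. This is a hypothetical reconstruction, not the project's code: the column names (`ip`, `throughput_mb`, `residual`) are assumptions, "plus" is read as a union (a window is anomalous if either rule fires, which is consistent with a prevalence above 3%), and the tail is taken on signed residuals (use `.abs()` for a two-sided tail).

```python
import numpy as np
import pandas as pd


def label_anomalies(df, value_col="throughput_mb", resid_col="residual",
                    k_sigma=3.0, tail_frac=0.03):
    """Hypothetical rule: flag a row if its value lies more than k_sigma standard
    deviations from its IP's mean OR its residual is in the top tail_frac overall."""
    grp = df.groupby("ip")[value_col]
    z = (df[value_col] - grp.transform("mean")) / grp.transform("std")
    sigma_flag = z.abs() > k_sigma  # NaN z (e.g. single-row IP) compares as False

    cutoff = df[resid_col].quantile(1 - tail_frac)
    tail_flag = df[resid_col] > cutoff  # upper tail of signed residuals (assumption)

    return (sigma_flag | tail_flag).astype(int)


# Synthetic usage example with three hypothetical IPs
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "ip": rng.choice(["10.0.0.1", "10.0.0.2", "10.0.0.3"], size=3000),
    "throughput_mb": rng.normal(50, 5, size=3000),
    "residual": rng.normal(0, 2, size=3000),
})
df["is_anom"] = label_anomalies(df)
print(f"prevalence: {df['is_anom'].mean():.2%}")
```

On Gaussian synthetic data the 3σ rule adds only a small fraction on top of the 3% tail; the ≈ 5.8% prevalence on CESNET suggests the per-IP rule fires considerably more often on real traffic.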