# Hybrid Anomaly Engine

This notebook presents the final Hybrid Engine, which combines STL decomposition, a Gaussian Process, and an LSTM, with BOCPD for changepoint detection.

## Hybrid Pipeline Runner


## Hybrid Logic
The Hybrid Anomaly Engine combines three signals:
1.  **STL Decomposition**: Robust Z-score of the residuals (captures deviations from trend/seasonality).
2.  **Gaussian Process**: Z-score based on the predictive uncertainty of a GP with a Composite Kernel.
3.  **LSTM**: Z-score based on the prediction error of an LSTM trained on STL residuals.
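
To make the first signal concrete, here is a minimal sketch of a robust Z-score computed from residuals. It assumes the common median/MAD definition; the function name `robust_zscore` and the sample residuals are hypothetical, and the project's own implementation may differ.

```python
import numpy as np


def robust_zscore(residuals):
    """Robust Z-score via median and MAD (assumed definition, for illustration)."""
    r = np.asarray(residuals, dtype=float)
    median = np.median(r)
    mad = np.median(np.abs(r - median))
    # 1.4826 rescales the MAD so it matches the standard deviation for Gaussian data.
    scale = 1.4826 * mad
    if scale == 0:
        # Degenerate case: at least half the residuals equal the median.
        return np.zeros_like(r)
    return (r - median) / scale


# Hypothetical residuals with one obvious outlier at the end.
residuals = np.array([0.1, -0.2, 0.0, 0.3, -0.1, 5.0])
z = robust_zscore(residuals)
print(np.round(z, 2))
```

Because the median and MAD are barely moved by the outlier, its score stays large instead of being diluted by an inflated standard deviation.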

**Ensemble Rule**:
We compute the **Average Z-Score** from all three models:
$$ \text{Score}_{\text{combined}} = \frac{\text{Score}_{\text{STL}} + \text{Score}_{\text{GP}} + \text{Score}_{\text{LSTM}}}{3} $$

We then apply a **Robust Threshold** (Rolling Sigma) and a **Persistence Filter** (min consecutive anomalies) to generate the final flags.
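
The ensemble rule, rolling threshold, and persistence filter can be sketched as follows. This is an illustration under stated assumptions, not the code in `src/`: the function name `combine_and_flag`, the column names, and the parameters `window`, `n_sigma`, and `min_consecutive` are all hypothetical, and the per-model scores are assumed to be non-negative Z-score magnitudes.

```python
import numpy as np
import pandas as pd


def combine_and_flag(scores, window=48, n_sigma=3.0, min_consecutive=2):
    """Average per-model scores, threshold them, and filter short runs.

    `scores` holds one column of non-negative Z-score magnitudes per model.
    All parameter names and defaults here are illustrative assumptions.
    """
    # Ensemble rule: plain average across the model columns.
    combined = scores.mean(axis=1)

    # Rolling-sigma threshold: rolling mean + n_sigma * rolling std,
    # shifted one step so a point never contributes to its own threshold.
    roll = combined.rolling(window, min_periods=window // 2)
    threshold = (roll.mean() + n_sigma * roll.std()).shift(1)

    # Comparisons against NaN (warm-up period) evaluate to False.
    raw = (combined > threshold).astype(int)

    # Persistence filter: label each run of identical flags, then keep
    # only flagged runs that last at least `min_consecutive` steps.
    run_id = raw.diff().ne(0).cumsum()
    run_len = raw.groupby(run_id).transform("sum")
    detected = ((raw == 1) & (run_len >= min_consecutive)).astype(int)
    return combined, threshold, detected


# Synthetic example: a smooth baseline, one isolated spike, one 3-step burst.
n = 200
base = 1.0 + 0.2 * np.sin(np.arange(n))
scores = pd.DataFrame({"stl": base, "gp": base, "lstm": base})
scores.iloc[100] = 10.0        # isolated spike: should be filtered out
scores.iloc[150:153] = 10.0    # sustained burst: should be kept

combined, threshold, detected = combine_and_flag(scores)
print(np.flatnonzero(detected.to_numpy()))
```

In this sketch the isolated spike exceeds the threshold but is dropped by the persistence filter, while the sustained burst survives. Note that a large value also inflates the rolling standard deviation for the following `window` steps, which is one reason a real pipeline might prefer a robust rolling statistic.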



In [None]:
import pandas as pd
import matplotlib.pyplot as plt
from pathlib import Path
import json
from IPython.display import Image, display

# Define paths
# We use relative path from notebooks/ directory
results_dir = Path("../results/hybrid/realKnownCause__nyc_taxi.csv")
pred_file = results_dir / "predictions.csv"
metrics_file = results_dir / "metrics.json"
plot_file = results_dir / "anomaly_scores.png"
forecast_plot = results_dir / "forecast_detected.png"

if pred_file.exists():
    print(f"Loading results from {pred_file}...")
    df = pd.read_csv(pred_file)
    df['timestamp'] = pd.to_datetime(df['timestamp'])
    
    # Display Metrics
    if metrics_file.exists():
        with open(metrics_file, 'r') as f:
            metrics = json.load(f)
        print("\n--- Final Metrics ---")
        print(json.dumps(metrics, indent=2))
    
    # Plot Anomaly Scores (Pre-generated)
    if plot_file.exists():
        print("\n--- Anomaly Scores ---")
        display(Image(filename=plot_file))
        
    # Plot Forecast (Pre-generated)
    if forecast_plot.exists():
        print("\n--- Forecast & Detections ---")
        display(Image(filename=forecast_plot))
    
    # Plot detections over the raw series (zoomable only with an interactive backend, e.g. %matplotlib widget)
    plt.figure(figsize=(15, 6))
    plt.plot(df['timestamp'], df['value'], label='Original Data', color='black', alpha=0.6)
    
    # Highlight Anomalies
    if 'detected' in df.columns:
        anoms = df[df['detected'] == 1]
        plt.scatter(anoms['timestamp'], anoms['value'], color='red', s=50, label='Detected Anomaly', zorder=5)
    
    plt.title("Hybrid Ensemble Detections (NYC Taxi)")
    plt.legend()
    plt.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.show()
    
else:
    print("Results not found! Please run 'python src/run_final_benchmark.py' first.")


In [None]:

# --- Comparative Visualization ---
import matplotlib.pyplot as plt
import pandas as pd
import json
from pathlib import Path

# Relative to the notebooks/ directory, matching the first cell
results_dir = Path('../results')
models = ['kalman', 'bsts', 'lstm', 'gp', 'hybrid']
metrics_data = []

for m in models:
    # Find the first subdirectory (dataset)
    model_dir = results_dir / m
    if model_dir.exists():
        # Take the first dataset subdirectory in sorted order so the choice is deterministic
        subdirs = sorted(d for d in model_dir.iterdir() if d.is_dir())
        if subdirs:
            metric_file = subdirs[0] / 'metrics.json'
            if metric_file.exists():
                with open(metric_file, 'r') as f:
                    data = json.load(f)
                    # Handle different metric structures if needed
                    if 'event_level' in data:
                        evt = data['event_level']
                        metrics_data.append({
                            'Model': m.upper(),
                            'F1': evt.get('f1', 0),
                            'Precision': evt.get('precision', 0),
                            'Recall': evt.get('recall', 0)
                        })

df_metrics = pd.DataFrame(metrics_data)

if not df_metrics.empty:
    # Plot
    fig, ax = plt.subplots(figsize=(10, 6))
    df_metrics.plot(x='Model', y=['F1', 'Precision', 'Recall'], kind='bar', ax=ax, rot=0)
    ax.set_title('Model Performance Comparison (Event-Level)')
    ax.set_ylabel('Score')
    ax.set_ylim(0, 1.1)
    ax.grid(axis='y', linestyle='--', alpha=0.7)
    plt.tight_layout()
    save_dir = results_dir / 'comparison'
    save_dir.mkdir(parents=True, exist_ok=True)
    plt.savefig(save_dir / 'hybrid_comparison.png')
    print(f'Saved plot to {save_dir / "hybrid_comparison.png"}')
    plt.show()
else:
    print("No metrics found to plot.")


In [ ]:

# --- Hero Plot: Hybrid Detection ---
from IPython.display import Image, display
from pathlib import Path

# Results live one level up because the notebook runs from notebooks/
hybrid_dir = Path("../results/hybrid/realKnownCause__nyc_taxi.csv")

# Display the Hybrid Forecast & Detection plot
hybrid_plot_path = hybrid_dir / "forecast_detected.png"
if hybrid_plot_path.exists():
    print("Hybrid Anomaly Detection:")
    display(Image(filename=str(hybrid_plot_path)))
else:
    print("Hybrid plot not found. Please run 'python src/run_final_benchmark.py' first.")

# Display Anomaly Scores
score_plot_path = hybrid_dir / "anomaly_scores.png"
if score_plot_path.exists():
    print("\nAnomaly Scores by Model:")
    display(Image(filename=str(score_plot_path)))




## Conclusion
- **Reduced False Positives**: Averaging the three scores is designed to suppress noise that would trigger a single-model detector on its own (e.g., Kalman).
- **Robustness**: The **Rolling Threshold** adapts to changing variance, and the **Persistence Filter** requires a minimum run of consecutive anomalies; together they guard against floods of false alarms.
- **Changepoints**: The **BOCPD** module (integrated via `src/changepoint.py`) helps identify regime shifts, allowing the system to reset or adapt.

