### Anomaly Detection Modeling

We use scikit-learn's `IsolationForest` to detect anomalous behavior in the synthetic telemetry dataset.  
The model is trained on normalized multivariate input: the raw CPU, latency, and error signals plus derived features (rolling statistics, FFT magnitudes, lags, z-scores, and a rolling CPU-latency correlation).
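The idea behind the anomaly score is that random axis-aligned splits isolate unusual points in fewer steps than typical ones. The original Isolation Forest formulation normalizes a point's average path length `E[h(x)]` by `c(n)`, the expected path length of an unsuccessful binary-search-tree search over `n` points, and defines the score `s(x, n) = 2^(-E[h(x)] / c(n))`. Scores near 1 suggest an anomaly, and scores well below 0.5 suggest a normal point. The following is a minimal sketch of that formula using hypothetical path lengths. scikit-learn's `score_samples` reports the negative of this score, and its default subsample size is at most 256, which is why `n = 256` is used here.

```python
import math

EULER_GAMMA = 0.5772156649015329


def average_path_length(n: int) -> float:
    """Expected path length c(n) of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    # The harmonic number H(n - 1) is approximated by ln(n - 1) + Euler's constant
    harmonic = math.log(n - 1) + EULER_GAMMA
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def anomaly_score(mean_path_length: float, n: int) -> float:
    """s(x, n) = 2 ** (-E[h(x)] / c(n)); values close to 1 suggest an anomaly."""
    return 2.0 ** (-mean_path_length / average_path_length(n))


# Hypothetical average path lengths over trees built on n = 256 subsampled points
n = 256
c_n = average_path_length(n)
print(f"c({n}) = {c_n:.3f}")
print(f"short path (3.0): s = {anomaly_score(3.0, n):.3f}")  # isolated quickly, so score above 0.5
print(f"path equal to c(n): s = {anomaly_score(c_n, n):.3f}")  # equals 0.5 by construction
```

A point isolated after only a few splits receives a score well above 0.5, while a point whose path length matches the expected `c(n)` sits exactly at 0.5.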


```python
from scipy.fft import fft
from scipy.stats import zscore
from sklearn.ensemble import IsolationForest
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

from src.generate_data import generate_synthetic_telemetry
from src.preprocess import normalize_data

# Load the synthetic telemetry
df = generate_synthetic_telemetry()

# Feature engineering (same as in feature-engineering.ipynb)
# Rolling statistics over trailing windows
df['cpu_rolling_mean'] = df['cpu'].rolling(window=20).mean()
df['cpu_rolling_std'] = df['cpu'].rolling(window=20).std()
df['latency_diff'] = df['latency'].diff()
df['errors_rolling_sum'] = df['errors'].rolling(window=10).sum()

# Mean FFT magnitude over the next 64 latency samples;
# NaN where a full window is not available (fixes the off-by-one in `i + 64 < len(df)`)
FFT_WINDOW = 64
latency = df['latency'].to_numpy()
df['latency_fft_mean'] = pd.Series(
    [
        np.mean(np.abs(fft(latency[i:i + FFT_WINDOW])))
        if i + FFT_WINDOW <= len(latency) else np.nan
        for i in range(len(latency))
    ],
    index=df.index,  # align with df even if its index is not a RangeIndex
)

# Z-scores, lags, and rolling CPU-latency correlation
df['cpu_z'] = zscore(df['cpu'], nan_policy='omit')
df['latency_z'] = zscore(df['latency'], nan_policy='omit')
df['cpu_lag1'] = df['cpu'].shift(1)
df['latency_lag3'] = df['latency'].shift(3)
df['errors_lag2'] = df['errors'].shift(2)
df['latency_cpu_corr_20'] = df['latency'].rolling(window=20).corr(df['cpu'])

# Drop rows with NaNs introduced by rolling windows, lags, and the FFT window
df_clean = df.dropna()

# Normalize
df_norm = normalize_data(df_clean)

# Chronological train-test split: shuffling would mix overlapping windows across the split
X_train, X_test = train_test_split(df_norm, test_size=0.2, shuffle=False)

# Train model; contamination=0.05 sets the threshold so ~5% of training rows are flagged
model = IsolationForest(contamination=0.05, random_state=42)
model.fit(X_train)

# Score every row (lower = more anomalous) and label it: -1 = anomaly, 1 = normal
scores = model.decision_function(df_norm)
labels = model.predict(df_norm)

# Plot scores and shade the rows flagged as anomalous
plt.figure(figsize=(12, 4))
plt.plot(scores, label='Anomaly Score')
plt.fill_between(
    np.arange(len(scores)), scores.min(), scores.max(),
    where=(labels == -1), color='red', alpha=0.3, label='Detected Anomaly',
)
plt.title("Isolation Forest Anomaly Detection")
plt.xlabel("Sample index (after dropping NaN rows)")
plt.ylabel("Decision function")
plt.legend()
plt.grid()
plt.show()
```
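The `contamination=0.05` setting is what turns continuous scores into -1/1 labels. According to scikit-learn's documentation, when `contamination` is a float, `fit` chooses `offset_` so that roughly that fraction of training samples ends up with a negative decision value; `decision_function` returns `score_samples(X) - offset_`, and `predict` labels a sample -1 when that value is negative. The NumPy sketch below reproduces this thresholding rule on made-up scores. It illustrates the rule and is not scikit-learn's implementation; the helper names are hypothetical.

```python
import numpy as np


def contamination_threshold(train_scores: np.ndarray, contamination: float) -> float:
    """Hypothetical helper: offset below which ~`contamination` of training scores fall."""
    return float(np.percentile(train_scores, 100.0 * contamination))


def label_from_scores(scores: np.ndarray, offset: float) -> np.ndarray:
    """Hypothetical helper mimicking predict: -1 where score - offset < 0, else 1."""
    decision = scores - offset
    return np.where(decision < 0, -1, 1)


# Made-up raw scores (higher = more normal, the same direction as score_samples)
rng = np.random.default_rng(42)
train_scores = rng.normal(loc=-0.45, scale=0.05, size=1000)

offset = contamination_threshold(train_scores, contamination=0.05)
train_labels = label_from_scores(train_scores, offset)
print(f"offset = {offset:.3f}")
print(f"flagged fraction on training data = {np.mean(train_labels == -1):.3f}")
```

Because the threshold is fixed on the training rows, the fraction flagged on unseen rows can drift above or below 5% if their score distribution differs.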


Red-shaded regions mark the time steps that the model classified as anomalous (`labels == -1`).  
These align with the anomalies injected into the synthetic data, such as:
- Sudden CPU spikes
- Bursts of error logs
- Latency jumps
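The visual alignment can be quantified if the data generator also records where anomalies were injected. The sketch below assumes a hypothetical boolean ground-truth mask (for example an `is_anomaly` column, which `generate_synthetic_telemetry` may or may not provide) aligned row-for-row with `labels` after `dropna()`. It reports point-wise precision and recall, plus an event-level recall that counts an injected event as detected when any flagged sample falls within `tolerance` rows of it. The function names and the tolerance value are illustrative assumptions.

```python
import numpy as np


def pointwise_precision_recall(labels: np.ndarray, true_mask: np.ndarray) -> tuple[float, float]:
    """Precision/recall of IsolationForest-style labels (-1 = anomaly) against a boolean mask."""
    predicted = labels == -1
    tp = np.sum(predicted & true_mask)
    precision = tp / predicted.sum() if predicted.sum() else 0.0
    recall = tp / true_mask.sum() if true_mask.sum() else 0.0
    return float(precision), float(recall)


def event_recall(labels: np.ndarray, true_mask: np.ndarray, tolerance: int = 5) -> float:
    """Fraction of contiguous true-anomaly events with a flagged sample within `tolerance` rows."""
    predicted = labels == -1
    # Find contiguous runs of True in the ground-truth mask
    padded = np.concatenate(([False], true_mask, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    starts, ends = edges[0::2], edges[1::2]  # ends are exclusive
    if len(starts) == 0:
        return 0.0
    hits = [predicted[max(0, s - tolerance):e + tolerance].any() for s, e in zip(starts, ends)]
    return float(np.mean(hits))


# Toy example: two injected events; the model flags the first one and one unrelated row
toy_mask = np.zeros(50, dtype=bool)
toy_mask[10:13] = True
toy_mask[30:32] = True
toy_labels = np.ones(50, dtype=int)
toy_labels[[11, 12, 40]] = -1

precision, recall = pointwise_precision_recall(toy_labels, toy_mask)
print(f"precision = {precision:.3f}, recall = {recall:.3f}")
print(f"event recall (tolerance=5) = {event_recall(toy_labels, toy_mask, tolerance=5):.2f}")
```

Event-level recall is often the more forgiving measure for rolling-window features, since a detector may fire a few rows before or after the injected anomaly.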
