# 🧪 SmartAlert AI - Adaptive Training Injection Harness

This notebook implements an **adaptive learning system** that simulates real-world deployment scenarios where:

- **New log data arrives in batches** (simulating daily/hourly log ingestion)
- **Models are retrained incrementally** as new data becomes available  
- **Performance is tracked over time** to detect model drift
- **Multiple model types** are compared for adaptability
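
The batch-and-retrain loop described above can be sketched as index arithmetic over time-sorted rows. This is only an illustration under stated assumptions: the internals of `run_training_injection_harness` are not shown in this notebook, and the `Round` type and `cumulative_rounds` helper are hypothetical names introduced here. The sketch assumes each round trains on all batches seen so far and evaluates on the next, unseen batch.

```python
from typing import Iterator, NamedTuple


class Round(NamedTuple):
    """Row positions for one training round (end indices are exclusive)."""
    number: int
    train_end: int   # train on rows [0, train_end)
    eval_start: int  # evaluate on rows [eval_start, eval_end)
    eval_end: int


def cumulative_rounds(n_rows: int, batch_size: int, num_batches: int) -> Iterator[Round]:
    """Yield train/eval windows over time-sorted rows.

    Round k trains on the first k batches and evaluates on batch k + 1,
    so the model is always scored on rows it has not seen yet.
    """
    if batch_size <= 0 or num_batches <= 0:
        raise ValueError("batch_size and num_batches must be positive")
    for k in range(1, num_batches + 1):
        train_end = k * batch_size
        eval_end = min(train_end + batch_size, n_rows)
        if train_end >= n_rows:
            break  # no unseen rows left to evaluate on
        yield Round(k, train_end, train_end, eval_end)


# Example: 45 rows, batches of 10, up to 5 rounds (made-up sizes)
rounds = list(cumulative_rounds(n_rows=45, batch_size=10, num_batches=5))
for r in rounds:
    print(f"round {r.number}: train [0, {r.train_end}), eval [{r.eval_start}, {r.eval_end})")
```

With a DataFrame sorted by its timestamp column, these windows map directly onto `df.iloc[:r.train_end]` and `df.iloc[r.eval_start:r.eval_end]`. Evaluating on the next batch rather than a random split keeps the simulation honest about time order.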

## 🎯 Key Features

- **Incremental Learning**: Models adapt to new data patterns over time
- **Drift Detection**: Flags training rounds where model performance degrades (reported as `drift_detected` in the results)
- **Model Comparison**: Compare how well RandomForest, XGBoost, and LightGBM adapt to new data
- **Rich Feature Engineering**: Uses our comprehensive preprocessing pipeline
- **Performance Visualization**: Track metrics across training rounds
- **Deployment Artifacts**: Saves trained models to an output directory (`models/adaptive` in this notebook) for later deployment
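
To make the drift-detection idea concrete, here is a minimal sketch of one common rule: flag a round when its F1 falls more than a fixed tolerance below the best F1 seen so far. This is an assumption for illustration, not necessarily the rule `AdaptiveModelTracker` applies. The `F1DriftMonitor` class and the `tolerance` value are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class F1DriftMonitor:
    """Flag drift when F1 falls more than `tolerance` below the best F1 so far."""
    tolerance: float = 0.05  # hypothetical threshold; tune per dataset
    history: List[float] = field(default_factory=list)

    def update(self, f1: float) -> bool:
        """Record this round's F1 and return True if it counts as drift."""
        best_so_far = max(self.history) if self.history else None
        self.history.append(f1)
        if best_so_far is None:
            return False  # the first round has no baseline to compare against
        return (best_so_far - f1) > self.tolerance


monitor = F1DriftMonitor(tolerance=0.05)
scores = [0.80, 0.82, 0.81, 0.70, 0.83]  # made-up scores for illustration
flags = [monitor.update(s) for s in scores]
print(flags)  # [False, False, False, True, False]
```

Comparing against the best score rather than the previous round avoids missing a slow decline spread over several rounds, at the cost of being more sensitive to a single unusually good round.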

## 📊 Use Cases

1. **Online Learning Simulation**: How do models perform as new incident patterns emerge?
2. **Model Selection**: Which algorithm adapts best to changing log patterns?
3. **Drift Monitoring**: When should we retrain models in production?
4. **Performance Benchmarking**: Compare adaptive vs static training approaches
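
Each of these comparisons rests on the per-round `precision`, `recall`, and `f1` values. As a reminder of what those numbers mean for the binary `critical` target, the following sketch computes them from raw labels. The `binary_metrics` helper and the sample labels are hypothetical. In practice a library routine such as scikit-learn's `precision_recall_fscore_support` would normally be used instead.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute precision, recall, and F1 for 0/1 labels (1 = critical)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # critical, caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarm
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # critical, missed
    # Define each ratio as 0.0 when its denominator is zero
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up labels: 4 critical incidents, 3 caught, 1 false alarm
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

For alerting workloads, recall (missed critical incidents) and precision (false alarms) usually carry different costs, so it helps to look at both rather than F1 alone.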

In [None]:
# Import required libraries
from injection_harness import run_training_injection_harness, AdaptiveModelTracker
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')

# Set up plotting style
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")
plt.rcParams['figure.figsize'] = (12, 8)

print("🧪 SmartAlert AI - Adaptive Training System")
print("=" * 50)
print("📦 All libraries imported successfully!")
print(f"⏰ Session started: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

In [None]:
# 📊 Dataset Analysis
# First, examine the V2 dataset; parse timestamps so the date range below is chronological
df = pd.read_csv("data/splunk_logs_v2.csv", parse_dates=["timestamp"])

print("📈 Dataset Overview:")
print(f"   Total samples: {len(df):,}")
print(f"   Columns: {list(df.columns)}")
print(f"   Date range: {df['timestamp'].min()} to {df['timestamp'].max()}")
print(f"   Critical incidents: {df['critical'].sum():,} ({df['critical'].mean():.1%})")

print("\n🎯 Target Distribution:")
print(df['critical'].value_counts())

print("\n📊 Feature Overview:")
print(f"   Severity levels: {df['severity'].nunique()} ({list(df['severity'].unique())})")
print(f"   Components: {df['component'].nunique()} ({list(df['component'].unique())})")
print(f"   Message length range: {df['message_length'].min()}-{df['message_length'].max()}")

df.head(3)

In [None]:
# 🚀 Run Adaptive Training - Single Model
print("🔄 Running adaptive training with XGBoost...")
print("=" * 50)

# Run adaptive training with XGBoost
results_xgb, tracker_xgb, final_model_xgb = run_training_injection_harness(
    data_path="data/splunk_logs_v2.csv",
    model_name="xgb",
    batch_size=10000,
    num_batches=5,
    target_column="critical",
    timestamp_column="timestamp",
    output_dir="models/adaptive",
    verbose=True
)

print("\n📊 XGBoost Results Summary:")
print(results_xgb[['round', 'precision', 'recall', 'f1', 'auc', 'drift_detected']].round(4))

In [None]:
# 📈 Visualize XGBoost Performance Trends
print("📈 Generating performance visualizations...")

# Use the tracker's built-in plotting function
tracker_xgb.plot_performance()

# Additional custom plots
fig, axes = plt.subplots(2, 2, figsize=(15, 10))

# Performance metrics trend
axes[0, 0].plot(results_xgb['round'], results_xgb['precision'], 'o-', label='Precision', linewidth=2)
axes[0, 0].plot(results_xgb['round'], results_xgb['recall'], 's-', label='Recall', linewidth=2)
axes[0, 0].plot(results_xgb['round'], results_xgb['f1'], '^-', label='F1-Score', linewidth=2)
# Plain-text title: matplotlib's default font has no emoji glyphs
axes[0, 0].set_title('Performance Metrics Over Time')
axes[0, 0].set_xlabel('Training Round')
axes[0, 0].set_ylabel('Score')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)

# Drift detection: mark every round where the tracker flagged drift
drift_rounds = results_xgb.loc[results_xgb['drift_detected'].astype(bool), 'round']
axes[0, 1].plot(results_xgb['round'], results_xgb['f1'], 'b-o', linewidth=2, label='F1-Score')
for i, round_num in enumerate(drift_rounds):
    # Label only the first line so the legend shows a single "Drift Detected" entry
    axes[0, 1].axvline(x=round_num, color='red', linestyle='--', alpha=0.7,
                       label='Drift Detected' if i == 0 else '_nolegend_')
axes[0, 1].set_title('Model Drift Detection')
axes[0, 1].set_xlabel('Training Round')
axes[0, 1].set_ylabel('F1-Score')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)

# Sample growth
axes[1, 0].bar(results_xgb['round'], results_xgb['cumulative_samples'], alpha=0.7, color='green')
axes[1, 0].set_title('Cumulative Training Data')
axes[1, 0].set_xlabel('Training Round')
axes[1, 0].set_ylabel('Total Samples')
axes[1, 0].grid(True, alpha=0.3)

# Critical incident rate per round
axes[1, 1].plot(results_xgb['round'], results_xgb['critical_rate'], 'ro-', linewidth=2)
axes[1, 1].set_title('Critical Incident Rate')
axes[1, 1].set_xlabel('Training Round')
axes[1, 1].set_ylabel('Critical Rate')
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\n✅ XGBoost adaptive training completed!")
print(f"   Best F1-Score: {results_xgb['f1'].max():.4f}")
print(f"   Drift detected in: {len(drift_rounds)} rounds")
print(f"   Final model available: {final_model_xgb is not None}")
