Enterprise-Grade Anomaly Detection and Observability for LLM Applications
LLM-Sentinel is a production-ready, real-time anomaly detection and observability platform designed specifically for Large Language Model (LLM) applications. Built in Rust for performance and memory safety, it provides comprehensive monitoring, statistical anomaly detection, automated alerting, and deep observability to keep your LLM deployments reliable, secure, and cost-effective.
Status: ✅ Production Ready - Full implementation with enterprise-grade deployment infrastructure
- Key Features
- Detection Capabilities
- Deployment Options
- Quick Start
- Architecture
- Observability
- API Reference
- Example Producers
- Configuration
- Performance
- Contributing
Detection algorithms:
- Z-Score Detection: Parametric anomaly detection for normally distributed metrics with configurable thresholds (default: 3.0σ)
- IQR Detection: Non-parametric outlier detection using interquartile range (default: 1.5x multiplier)
- MAD Detection: Robust outlier detection using median absolute deviation (default: 3.5 threshold)
- CUSUM Detection: Cumulative sum change point detection for drift and regime shifts (default: 5.0 threshold, 0.5 drift)
- Multi-Dimensional Baselines: Per-service, per-model statistical baselines with automatic updates
- Configurable Sensitivity: Tune detection sensitivity for your specific use cases
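To make the statistical rules concrete, here is a minimal sketch of the Z-Score check in Rust (illustrative only; the names are hypothetical, not Sentinel's internal API, which lives in sentinel-detection):

```rust
/// Flag a value whose distance from the baseline mean exceeds
/// `threshold` standard deviations (default: 3.0). Sketch only.
fn is_zscore_anomaly(value: f64, mean: f64, stddev: f64, threshold: f64) -> bool {
    if stddev == 0.0 {
        return false; // degenerate baseline: nothing to compare against
    }
    (value - mean).abs() / stddev > threshold
}

fn main() {
    // Baseline: mean 1200 ms latency, stddev 230 ms; observed 15,000 ms.
    assert!(is_zscore_anomaly(15000.0, 1200.0, 230.0, 3.0));
}
```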
Monitor all critical LLM metrics:
- Latency Spikes: Detect unusual response times (P50, P95, P99)
- Token Usage Anomalies: Monitor prompt and completion token consumption patterns
- Cost Anomalies: Track unexpected spending patterns and budget overruns
- Error Rate Spikes: Identify service degradation and failures
- Model Drift: Detect quality degradation over time
- Usage Patterns: Identify suspicious or abnormal usage behavior
- Throughput Changes: Monitor request rate variations
Performance:
- 10,000+ events/second ingestion throughput
- Sub-5ms P50 latency for anomaly detection
- Lock-free concurrent baseline updates using DashMap
- Batch processing for optimal InfluxDB writes (100-record batches)
- Zero-copy parsing for maximum efficiency
- Async/await throughout for non-blocking I/O
- Memory-efficient streaming processing
Reliability:
- Zero unsafe code: Memory safety guaranteed by Rust's type system
- Comprehensive error handling: Type-safe Result propagation with detailed error context
- Graceful shutdown: Proper signal handling (SIGTERM, SIGINT) with resource cleanup
- Health checks: Liveness, readiness, and startup probes for Kubernetes
- Circuit breakers: Automatic failure detection and recovery
- Exponential backoff: Intelligent retry logic for transient failures
- Connection pooling: Efficient resource management
Alerting:
- RabbitMQ Integration: Topic-based routing with severity levels (info, warning, critical)
- Webhook Delivery: HTTP POST with HMAC-SHA256 signatures for verification (see the verification sketch after this list)
- Alert Deduplication: Configurable 5-minute window to prevent alert storms
- Retry Logic: Exponential backoff with configurable max attempts (default: 3)
- Priority Routing: Route critical alerts to different channels
- Batch Alerting: Optional batching for high-volume scenarios
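For the webhook signatures mentioned above, a receiver can verify the HMAC-SHA256 along these lines. This is a sketch using the hmac and sha2 crates; the transport details (header name, hex vs. base64 encoding) are assumptions here, so check your webhook configuration:

```rust
use hmac::{Hmac, Mac};
use sha2::Sha256;

type HmacSha256 = Hmac<Sha256>;

/// Verify an HMAC-SHA256 signature over the raw request body.
/// `signature` is assumed to be the raw MAC bytes; how it is carried
/// (header name, encoding) depends on your webhook setup.
fn verify_webhook(secret: &[u8], body: &[u8], signature: &[u8]) -> bool {
    let mut mac = HmacSha256::new_from_slice(secret).expect("HMAC accepts any key length");
    mac.update(body);
    mac.verify_slice(signature).is_ok() // constant-time comparison
}
```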
Storage and caching:
- InfluxDB v3: Time-series storage for telemetry with automatic downsampling
- Moka Cache: High-performance in-memory cache (10,000 entry capacity)
- Redis Support: Distributed caching for multi-instance deployments
- Persistent Baselines: Baseline persistence to disk for quick restarts
- Query API: REST endpoints for historical data retrieval and analysis
- TTL Management: Automatic expiration of stale data (300s default)
Observability:
- 50+ Prometheus Metrics: Comprehensive instrumentation of all subsystems
- 4 Pre-built Grafana Dashboards:
- Anomaly Detection Dashboard
- System Health Dashboard
- Performance Metrics Dashboard
- Alert Overview Dashboard
- 50+ Alert Rules: Production-ready Prometheus alerting covering all failure modes
- Distributed Tracing: OpenTelemetry support for request tracing
- Structured Logging: JSON logs with configurable levels (trace, debug, info, warn, error)
Deployment:
- Kubernetes-Ready: Production manifests with HPA, PDB, and network policies
- Helm Chart: Parameterized chart for easy deployment and upgrades
- Docker Images: Multi-stage builds with minimal attack surface (<50MB)
- Horizontal Scaling: Support for 3-10+ replicas with auto-scaling
- StatefulSet Support: Optional for baseline persistence
- Service Mesh Compatible: Works with Istio, Linkerd, Consul Connect
Security:
- Non-Root Containers: Runs as UID 1000 with dropped capabilities
- Read-Only Filesystem: Root filesystem mounted read-only
- Network Policies: Restrict ingress/egress to required services only
- Secret Management: Support for Kubernetes secrets and external secret stores
- PII Sanitization: Automatic detection and removal of sensitive data (see the sketch after this list)
- Audit Logging: Complete audit trail of all anomalies and alerts
- SBOM Generation: Software Bill of Materials for vulnerability tracking
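As a rough illustration of the PII sanitization noted above, pattern-based redaction can be as simple as the following (a sketch with the regex crate; Sentinel's actual patterns are driven by ingestion.parsing.sanitize_patterns in the configuration):

```rust
use regex::Regex;

/// Illustrative scrubber: redact email addresses before events are stored.
/// Sketch only; real deployments configure patterns in sentinel.yaml.
fn redact_emails(text: &str) -> String {
    let email = Regex::new(r"[\w.+-]+@[\w-]+\.[\w.]+").expect("valid pattern");
    email.replace_all(text, "[REDACTED]").into_owned()
}

fn main() {
    assert_eq!(redact_emails("contact jane@example.com"), "contact [REDACTED]");
}
```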
LLM-Sentinel provides four complementary detection algorithms that can be enabled individually or in combination:
Z-Score detection. Best for: metrics that follow a normal distribution
```yaml
detection:
  zscore:
    threshold: 3.0        # Standard deviations from mean
    sensitivity: "medium" # low, medium, high
    metrics:
      - latency_ms
      - total_tokens
      - cost_usd
```

Use Cases:
- Latency spike detection
- Token usage monitoring
- Cost anomaly detection
IQR detection. Best for: metrics with skewed distributions or outliers
```yaml
detection:
  iqr:
    multiplier: 1.5 # IQR multiplier for outliers
    metrics:
      - latency_ms
      - total_tokens
```

Use Cases:
- Robust outlier detection
- Handling non-normal distributions
- Resistant to extreme values
MAD detection. Best for: metrics requiring high robustness to outliers
```yaml
detection:
  mad:
    threshold: 3.5 # MAD threshold
    metrics:
      - latency_ms
```

Use Cases:
- Ultra-robust detection
- Minimal false positives
- Gradual baseline shifts
CUSUM detection. Best for: detecting sustained changes and drift
```yaml
detection:
  cusum:
    threshold: 5.0 # Cumulative threshold
    drift: 0.5     # Drift parameter
    metrics:
      - latency_ms
```

Use Cases:
- Model performance degradation
- Service quality drift
- Gradual system changes
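To see how the threshold and drift parameters interact, here is a minimal one-sided CUSUM sketch (illustrative only; Sentinel's implementation in sentinel-detection may differ in detail):

```rust
/// One-sided CUSUM for upward shifts: accumulate standardized deviations
/// beyond the `drift` allowance and alarm once the sum crosses `threshold`.
struct Cusum {
    sum: f64,
    drift: f64,     // e.g. 0.5
    threshold: f64, // e.g. 5.0
}

impl Cusum {
    fn update(&mut self, value: f64, mean: f64, stddev: f64) -> bool {
        let z = (value - mean) / stddev.max(f64::EPSILON);
        self.sum = (self.sum + z - self.drift).max(0.0);
        self.sum > self.threshold
    }
}

fn main() {
    let mut c = Cusum { sum: 0.0, drift: 0.5, threshold: 5.0 };
    // A sustained +1.5 sigma latency shift trips the alarm after ~6 samples.
    assert!((0..10).any(|_| c.update(1500.0, 1200.0, 200.0)));
}
```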
Baseline management:
- Adaptive Baselines: Automatic baseline updates every 60 seconds
- Multi-Dimensional: Separate baselines per service, model, and metric
- Configurable Window: 1000-sample sliding window (configurable)
- Minimum Samples: Require 10+ samples before detection (prevents cold-start false positives)
- Persistence: Save/load baselines from disk for fast restarts
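Conceptually, the baseline store keys statistics by service, model, and metric, and updates entries concurrently. A rough sketch of that shape with DashMap (the types and fields here are illustrative, not Sentinel's actual ones):

```rust
use dashmap::DashMap;

/// Simplified per-key statistics; the real baseline also tracks stddev,
/// quantiles, and a bounded sliding window of samples.
#[derive(Default)]
struct Baseline {
    count: u64,
    mean: f64,
}

fn main() {
    // Key mirrors the "service:model:metric" convention,
    // e.g. "chat-api:gpt-4:latency_ms".
    let baselines: DashMap<String, Baseline> = DashMap::new();

    let mut entry = baselines
        .entry("chat-api:gpt-4:latency_ms".to_string())
        .or_default();
    // Incremental mean update; only this entry's shard is locked.
    entry.count += 1;
    entry.mean += (1234.5 - entry.mean) / entry.count as f64;
}
```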
Docker Compose is perfect for local development and testing:
```bash
# Start full environment (Kafka, InfluxDB, RabbitMQ, Redis, Prometheus, Grafana)
docker-compose up -d

# View logs
docker-compose logs -f sentinel

# Stop environment
docker-compose down
```

Includes:
- Sentinel (3 replicas)
- Kafka cluster (3 brokers)
- InfluxDB v3
- RabbitMQ with management UI
- Redis
- Prometheus
- Grafana with pre-loaded dashboards
The Kubernetes manifests provide a production-grade deployment:
```bash
# Apply all manifests
kubectl apply -f k8s/

# Or using kustomize
kubectl apply -k k8s/

# Check status
kubectl get pods -l app.kubernetes.io/name=sentinel
```

Includes:
- Deployment with 3 replicas
- HorizontalPodAutoscaler (3-10 replicas, CPU/memory targets)
- PodDisruptionBudget (min 2 available)
- NetworkPolicy (restricted ingress/egress)
- ServiceMonitor (Prometheus integration)
- Ingress with TLS
The Helm chart is the easiest production deployment, with full parameterization:
```bash
# Install with default values
helm install sentinel ./helm/sentinel \
  --set secrets.influxdbToken="your-token" \
  --set secrets.rabbitmqPassword="your-password"

# Install with custom values
helm install sentinel ./helm/sentinel -f production-values.yaml

# Upgrade
helm upgrade sentinel ./helm/sentinel

# Uninstall
helm uninstall sentinel
```

Helm Chart Features:
- Full configuration parameterization (335+ options)
- External secrets support (AWS Secrets Manager, Vault, etc.)
- Multiple environment profiles (dev, staging, prod)
- Init containers for dependency checking
- Automatic ConfigMap/Secret generation
- Post-install notes with helpful commands
See Helm Chart README for complete documentation.
For bare-metal deployments, install directly on a Linux server:
```bash
# Build release binary
cargo build --release

# Install
sudo cp target/release/sentinel /usr/local/bin/

# Create config
sudo mkdir -p /etc/sentinel
sudo cp config/sentinel.yaml /etc/sentinel/

# Create systemd service
sudo cp deployments/systemd/sentinel.service /etc/systemd/system/
sudo systemctl enable sentinel
sudo systemctl start sentinel
```

Prerequisites:
- Rust 1.75+ (for building from source)
- Kafka 2.8+ (message queue)
- InfluxDB v3 (time-series database)
- RabbitMQ 3.8+ (optional, for alerting)
- Redis 6.0+ (optional, for distributed caching)
```bash
# Clone repository
git clone https://github.com/llm-devops/llm-sentinel.git
cd llm-sentinel

# Start infrastructure
docker-compose up -d kafka influxdb rabbitmq redis

# Build project
cargo build --release

# Run tests
cargo test --workspace

# Start service
export INFLUXDB_TOKEN="your-token"
export RABBITMQ_PASSWORD="your-password"
./target/release/sentinel --config config/sentinel.yaml
```

```bash
# Create namespace
kubectl create namespace sentinel
# Create secrets
kubectl create secret generic sentinel-secrets \
--from-literal=influxdb-token="your-token" \
--from-literal=rabbitmq-password="your-password" \
-n sentinel
# Deploy with Helm
helm install sentinel ./helm/sentinel \
--namespace sentinel \
--set secrets.influxdbToken="your-token" \
--set secrets.rabbitmqPassword="your-password"
# Verify deployment
kubectl get pods -n sentinel
kubectl logs -f deployment/sentinel -n sentinel
# Access dashboards
kubectl port-forward svc/sentinel 8080:8080 -n sentinel
open http://localhost:8080/metrics
```

```bash
# Check health
curl http://localhost:8080/health/live
curl http://localhost:8080/health/ready
# View metrics
curl http://localhost:8080/metrics
# Query recent anomalies
curl http://localhost:8080/api/v1/anomalies/recent?limit=10
# Query telemetry
curl "http://localhost:8080/api/v1/telemetry?service=chat-api&hours=1"┌────────────────────────────────────────────────────────────────────┐
│ LLM Applications │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ OpenAI │ │ Claude │ │ Llama │ │ Custom │ │
│ │ API │ │ API │ │ API │ │ LLM API │ │
│ └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘ │
└───────┼─────────────┼─────────────┼──────────────┼────────────────┘
│ │ │ │
└─────────────┴─────────────┴──────────────┘
│
▼ (Telemetry Events)
┌─────────────────────────────┐
│ Apache Kafka │
│ (llm.telemetry topic) │
└─────────────┬───────────────┘
│
▼
┌────────────────────────────────────────────────────────────────────┐
│ LLM-SENTINEL │
├────────────────────────────────────────────────────────────────────┤
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ Ingestion Layer (sentinel-ingestion) │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌───────────────┐ │ │
│ │ │ Kafka │───>│ OTLP/JSON │───>│ Validation │ │ │
│ │ │ Consumer │ │ Parsing │ │ & PII Filter │ │ │
│ │ └──────────────┘ └──────────────┘ └───────┬───────┘ │ │
│ └────────────────────────────────────────────────────┼─────────┘ │
│ │ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ Detection Engine (sentinel-detection) │ │
│ │ ┌──────────────────────────────────────────────────────┐ │ │
│ │ │ Baseline Manager (Multi-Dimensional) │ │ │
│ │ │ Per Service × Model × Metric Statistical Baselines │ │ │
│ │ └─────────────────────┬────────────────────────────────┘ │ │
│ │ │ │ │
│ │ ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌──────────┐│ │
│ │ │ Z-Score │ │ IQR │ │ MAD │ │ CUSUM ││ │
│ │ │ Detector │ │ Detector │ │ Detector │ │ Detector ││ │
│ │ └─────┬─────┘ └─────┬─────┘ └─────┬─────┘ └────┬─────┘│ │
│ │ └──────────────┴───────────────┴──────────────┘ │ │
│ │ │ (Anomalies) │ │
│ └────────────────────────┼─────────────────────────────────────┘ │
│ │ │
│ ┌────────────────────────┼─────────────────────────────────────┐ │
│ │ ▼ │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │
│ │ │ InfluxDB │ │ RabbitMQ │ │ Webhook │ │ │
│ │ │ Storage │ │ Alerting │ │ Delivery │ │ │
│ │ │ │ │ │ │ │ │ │
│ │ │ • Telemetry │ │ • Topic │ │ • HMAC │ │ │
│ │ │ • Anomalies │ │ Routing │ │ Signature │ │ │
│ │ │ • Query API │ │ • Severity │ │ • Retry │ │ │
│ │ └──────────────┘ └──────────────┘ └──────────────┘ │ │
│ │ (sentinel-storage) (sentinel-alerting) │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ REST API (sentinel-api) │ │
│ │ ┌────────────┐ ┌────────────┐ ┌────────────┐ │ │
│ │ │ Health │ │ Metrics │ │ Query │ │ │
│ │ │ /health/* │ │ /metrics │ │ /api/v1/* │ │ │
│ │ └────────────┘ └────────────┘ └────────────┘ │ │
│ └──────────────────────────────────────────────────────────────┘ │
└────────────────────────────────────────────────────────────────────┘
│ │
▼ ▼
┌──────────────────┐ ┌──────────────────┐
│ Prometheus │ │ Grafana │
│ (Metrics) │ │ (Dashboards) │
└──────────────────┘  └──────────────────┘
```
The workspace is organized into six library crates plus the main binary:

sentinel-core:
- Error types and handling
- Configuration models
- Telemetry event types
- Shared utilities
sentinel-ingestion:
- Kafka consumer with group management
- OTLP/JSON parsing
- Schema validation
- PII detection and sanitization
- Configurable message handling
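For orientation, consuming the telemetry topic from Rust typically looks like the following (a sketch with the rdkafka crate and tokio; the settings mirror the ingestion.kafka block in the configuration section below):

```rust
use rdkafka::config::ClientConfig;
use rdkafka::consumer::{Consumer, StreamConsumer};
use rdkafka::Message;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let consumer: StreamConsumer = ClientConfig::new()
        .set("bootstrap.servers", "kafka-0:9092")
        .set("group.id", "sentinel-consumer")
        .set("enable.auto.commit", "false")
        .set("auto.offset.reset", "latest")
        .create()?;

    consumer.subscribe(&["llm.telemetry"])?;

    loop {
        let msg = consumer.recv().await?;
        if let Some(Ok(payload)) = msg.payload_view::<str>() {
            // Hand the OTLP/JSON payload off to parsing and validation.
            println!("received {} bytes", payload.len());
        }
    }
}
```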
sentinel-detection:
- Statistical baseline management
- Four detection algorithms (Z-Score, IQR, MAD, CUSUM)
- Multi-dimensional baseline tracking
- Lock-free concurrent updates
- Baseline persistence
sentinel-storage:
- InfluxDB v3 client with batch writes
- In-memory cache (Moka)
- Redis distributed cache
- Query API for historical data
- Automatic TTL management
sentinel-alerting:
- RabbitMQ topic publisher
- Webhook HTTP delivery
- Alert deduplication (5-minute window)
- Exponential backoff retry
- HMAC signature generation
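The deduplication step above boils down to remembering when each alert key last fired. A minimal sketch (hypothetical types; the real implementation also runs periodic cleanup of expired entries):

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Suppress repeat alerts for the same key within `window` (default: 300s).
struct Deduplicator {
    window: Duration,
    last_sent: HashMap<String, Instant>,
}

impl Deduplicator {
    fn should_send(&mut self, key: &str) -> bool {
        let now = Instant::now();
        match self.last_sent.get(key) {
            Some(&t) if now.duration_since(t) < self.window => false, // duplicate
            _ => {
                self.last_sent.insert(key.to_string(), now);
                true
            }
        }
    }
}
```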
sentinel-api:
- REST API server (Axum)
- Health check endpoints
- Prometheus metrics exporter
- Query endpoints for telemetry and anomalies
- CORS support
Four production-ready dashboards included in deployments/grafana/dashboards/:
Anomaly Detection Dashboard:
- Anomaly detection rate by severity (timeseries)
- Total anomaly rate gauge
- Anomalies in last 24h (stat)
- Anomaly types distribution (bar chart)
- Top services by anomalies
- Detection latency percentiles (P50, P95, P99)
System Health Dashboard:
- Service status (up/down)
- Active instance count
- Uptime tracking
- Memory usage by instance
- CPU usage by instance
- Event processing throughput
- Error rates by component
Performance Metrics Dashboard:
- Events processed per second
- Detection P95 latency
- Cache hit rate percentage
- Pipeline throughput (Kafka → Detection → Storage)
- Storage write rate
- API request rate
Alert Overview Dashboard:
- Alerts sent (last hour)
- Alerts deduplicated (last hour)
- Alert failures (last hour)
- RabbitMQ publishes by severity
- Webhook delivery success rate
- Alert distribution by type
50+ metrics exported at /metrics:
Ingestion Metrics:
- `sentinel_events_received_total` - Total events consumed from Kafka
- `sentinel_events_processed_total` - Successfully processed events
- `sentinel_ingestion_errors_total` - Ingestion errors by type
- `sentinel_validation_failures_total` - Validation failures
- `sentinel_kafka_messages_consumed_total` - Kafka messages consumed
- `sentinel_kafka_consumption_errors_total` - Kafka errors
Detection Metrics:
- `sentinel_anomalies_detected_total` - Anomalies by severity and detector
- `sentinel_detection_latency_seconds` - Detection latency histogram
- `sentinel_detection_errors_total` - Detection errors
- `sentinel_baseline_updates_total` - Baseline update count
- `sentinel_baseline_samples` - Current baseline sample counts
Storage Metrics:
- `sentinel_storage_writes_total` - Successful storage writes
- `sentinel_storage_errors_total` - Storage errors
- `sentinel_cache_hits_total` - Cache hits
- `sentinel_cache_misses_total` - Cache misses
- `sentinel_cache_size` - Current cache size
Alerting Metrics:
- `sentinel_alerts_sent_total` - Alerts sent by channel
- `sentinel_alerts_deduplicated_total` - Deduplicated alerts
- `sentinel_alert_failures_total` - Alert delivery failures
- `sentinel_rabbitmq_publishes_total` - RabbitMQ publishes
- `sentinel_webhook_deliveries_total` - Webhook deliveries
- `sentinel_webhook_failures_total` - Webhook failures
50+ production-ready alert rules in deployments/prometheus/alerts/sentinel-alerts.yaml:
Service Health:
- SentinelServiceDown (1m down)
- SentinelHighRestartRate (frequent restarts)
- SentinelInsufficientInstances (<2 instances)
Performance:
- SentinelHighLatency (P95 >50ms)
- SentinelVeryHighLatency (P95 >100ms)
- SentinelLowThroughput (<100 events/sec)
Errors:
- SentinelHighErrorRate (>10 errors/sec)
- SentinelIngestionErrors (>5 errors/sec)
- SentinelDetectionErrors (>5 errors/sec)
- SentinelStorageErrors (>5 errors/sec)
Resources:
- SentinelHighMemoryUsage (>1.5GB)
- SentinelCriticalMemoryUsage (>1.8GB)
- SentinelHighCPUUsage (>80%)
Anomalies:
- SentinelAnomalySpike (>50 anomalies/sec)
- SentinelCriticalAnomalies (>5 critical/sec)
- SentinelNoAnomalies (0 anomalies for 6h - detection health check)
```
GET /health/live

Response: 200 OK
{
  "status": "healthy",
  "timestamp": "2024-11-06T10:30:00Z"
}
```

```
GET /health/ready

Response: 200 OK (when ready to accept traffic)
Response: 503 Service Unavailable (when not ready)
```

```
GET /metrics

Response: 200 OK
# HELP sentinel_events_processed_total Total events processed
# TYPE sentinel_events_processed_total counter
sentinel_events_processed_total{service="chat-api"} 12453
...
```

```
GET /api/v1/telemetry?service={service}&model={model}&hours={hours}
```
Example:
```
GET /api/v1/telemetry?service=chat-api&model=gpt-4&hours=24

Response: 200 OK
{
  "data": [...],
  "count": 1523,
  "timeRange": {
    "start": "2024-11-05T10:30:00Z",
    "end": "2024-11-06T10:30:00Z"
  }
}
```

```
GET /api/v1/anomalies?severity={severity}&hours={hours}&limit={limit}
```
Example:
```
GET /api/v1/anomalies?severity=critical&hours=1&limit=50

Response: 200 OK
{
  "anomalies": [
    {
      "id": "anom-123",
      "timestamp": "2024-11-06T10:25:00Z",
      "service": "chat-api",
      "model": "gpt-4",
      "detector": "zscore",
      "metric": "latency_ms",
      "value": 15234.5,
      "baseline_mean": 1234.5,
      "baseline_stddev": 234.2,
      "z_score": 59.8,
      "severity": "critical"
    }
  ],
  "count": 3
}
```

```
GET /api/v1/anomalies/recent?limit={limit}
```
Example:
```
GET /api/v1/anomalies/recent?limit=10

Response: 200 OK (last 10 anomalies)
```

```
GET /api/v1/baselines?service={service}&model={model}
```
```
Response: 200 OK
{
  "baselines": [
    {
      "key": "chat-api:gpt-4:latency_ms",
      "samples": 1000,
      "mean": 1234.5,
      "stddev": 234.2,
      "median": 1205.3,
      "p95": 1687.2,
      "p99": 2103.4
    }
  ]
}
```
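Since the API is plain HTTP with JSON responses, any client works. As a sketch, fetching recent anomalies from Rust with reqwest (blocking and json features) and serde_json might look like this; the host, port, and the assumption that the recent endpoint shares the anomaly response shape shown above are all deployment-specific:

```rust
use serde_json::Value;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let resp: Value = reqwest::blocking::get(
        "http://localhost:8080/api/v1/anomalies/recent?limit=10",
    )?
    .json()?;

    // Assumes the response shape of the anomaly example above.
    if let Some(anomalies) = resp["anomalies"].as_array() {
        for a in anomalies {
            println!("{} {} {}", a["severity"], a["service"], a["metric"]);
        }
    }
    Ok(())
}
```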
Full-featured Python example for sending telemetry to Kafka:

```bash
cd examples/python
# Install dependencies
pip install -r requirements.txt
# Run with defaults (20 normal + 5 anomalous events)
python producer.py --brokers localhost:9092
# Run continuously
python producer.py --continuous
# Custom configuration
python producer.py \
--brokers kafka-0:9092,kafka-1:9092 \
--topic llm.telemetry \
--normal-events 50 \
  --anomalous-events 10
```

Features:
- Configurable event generation
- Simulates 4 anomaly types (high latency, high tokens, high cost, suspicious patterns)
- Kafka integration with guaranteed delivery
- Continuous mode for load testing
High-performance Go example for production use:
```bash
cd examples/go
# Build
go build -o producer producer.go
# Run with defaults
./producer -brokers localhost:9092
# Run continuously
./producer -continuous
# Custom configuration
./producer \
-brokers kafka-0:9092,kafka-1:9092 \
-topic llm.telemetry \
-normal-events 100 \
  -anomalous-events 20
```

Features:
- Native Go performance (10,000+ events/sec)
- Graceful shutdown (SIGINT/SIGTERM)
- Connection pooling and retries
- Minimal memory footprint (<20MB)
Integrate telemetry into your LLM application:
```python
from llm_sentinel import TelemetryProducer
# Initialize
producer = TelemetryProducer(
brokers=["kafka:9092"],
topic="llm.telemetry"
)
# After each LLM API call
event = producer.create_telemetry_event(
service_name="my-chatbot",
model_name="gpt-4",
latency_ms=response_time,
prompt_tokens=completion.usage.prompt_tokens,
completion_tokens=completion.usage.completion_tokens,
cost_usd=calculated_cost,
user_id=user.id,
session_id=session.id
)
producer.send_event(event)
```
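The event shape is the same regardless of producer language. The fields used in the Python call above map onto a structure like the following (an illustrative Rust sketch with serde's derive feature; the field names are assumptions based on this README, not a schema reference):

```rust
use serde::Serialize;

/// Illustrative telemetry event mirroring the Python example's fields.
#[derive(Serialize)]
struct TelemetryEvent {
    service_name: String,
    model_name: String,
    latency_ms: f64,
    prompt_tokens: u32,
    completion_tokens: u32,
    cost_usd: f64,
    user_id: String,
    session_id: String,
}

fn main() -> serde_json::Result<()> {
    let event = TelemetryEvent {
        service_name: "my-chatbot".into(),
        model_name: "gpt-4".into(),
        latency_ms: 1243.7,
        prompt_tokens: 512,
        completion_tokens: 128,
        cost_usd: 0.032,
        user_id: "user-42".into(),
        session_id: "sess-7".into(),
    };
    println!("{}", serde_json::to_string_pretty(&event)?);
    Ok(())
}
```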
Example configuration:

```yaml
# Ingestion configuration
ingestion:
kafka:
brokers:
- "kafka-0:9092"
- "kafka-1:9092"
- "kafka-2:9092"
topic: "llm.telemetry"
group_id: "sentinel-consumer"
session_timeout_ms: 6000
enable_auto_commit: false
auto_offset_reset: "latest"
max_poll_records: 500
parsing:
max_text_length: 10000
enable_sanitization: true
sanitize_patterns:
- "password"
- "api_key"
- "secret"
validation:
min_latency_ms: 0.1
max_latency_ms: 300000.0 # 5 minutes
max_tokens: 100000
max_cost_usd: 100.0
enable_pii_detection: true
# Detection configuration
detection:
enabled_detectors:
- "zscore"
- "iqr"
- "mad"
- "cusum"
baseline:
window_size: 1000
min_samples: 10
update_interval_secs: 60
enable_persistence: true
persistence_path: "/var/lib/sentinel/baselines"
zscore:
threshold: 3.0
sensitivity: "medium" # low, medium, high
metrics:
- "latency_ms"
- "total_tokens"
- "cost_usd"
iqr:
multiplier: 1.5
metrics:
- "latency_ms"
- "total_tokens"
mad:
threshold: 3.5
metrics:
- "latency_ms"
cusum:
threshold: 5.0
drift: 0.5
metrics:
- "latency_ms"
# Storage configuration
storage:
influxdb:
url: "http://influxdb:8086"
org: "sentinel"
token: "${INFLUXDB_TOKEN}"
telemetry_bucket: "telemetry"
anomaly_bucket: "anomalies"
batch_size: 100
timeout_secs: 10
cache:
max_capacity: 10000
ttl_secs: 300
tti_secs: 60
enable_metrics: true
redis:
enabled: true
url: "redis://redis:6379"
password: "${REDIS_PASSWORD}"
key_prefix: "sentinel:"
ttl_secs: 300
# Alerting configuration
alerting:
rabbitmq:
url: "amqp://rabbitmq:5672"
username: "sentinel"
password: "${RABBITMQ_PASSWORD}"
exchange: "sentinel.alerts"
exchange_type: "topic"
routing_key_prefix: "alert"
persistent: true
timeout_secs: 10
retry_config:
max_attempts: 3
initial_delay_ms: 1000
backoff_multiplier: 2.0
max_delay_ms: 30000
webhook:
enabled: false
url: "https://alerts.example.com/webhook"
method: "POST"
secret: "${WEBHOOK_SECRET}"
timeout_secs: 10
max_retries: 3
retry_delay_ms: 1000
backoff_multiplier: 2.0
deduplication:
enabled: true
window_secs: 300
cleanup_interval_secs: 60
# API configuration
api:
bind_addr: "0.0.0.0:8080"
enable_cors: true
cors_origins:
- "*"
timeout_secs: 30
max_body_size: 10485760 # 10MB
enable_logging: true
  metrics_path: "/metrics"
```

See config/sentinel.yaml for a complete annotated example.
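The retry_config fields above compose in the usual way: each attempt waits initial_delay_ms times backoff_multiplier raised to the attempt number, capped at max_delay_ms. A quick sketch of that schedule:

```rust
use std::time::Duration;

/// Delay before retry attempt `n` (0-based), per the retry_config fields:
/// exponential growth from `initial_ms`, capped at `max_ms`.
fn retry_delay(n: u32, initial_ms: u64, multiplier: f64, max_ms: u64) -> Duration {
    let delay = initial_ms as f64 * multiplier.powi(n as i32);
    Duration::from_millis((delay as u64).min(max_ms))
}

fn main() {
    // Defaults (1000ms, 2.0, 30000ms) yield 1s, 2s, 4s, ... capped at 30s.
    for n in 0..6 {
        println!("attempt {}: {:?}", n + 1, retry_delay(n, 1000, 2.0, 30000));
    }
}
```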
All sensitive configuration can be provided via environment variables:
```bash
export INFLUXDB_TOKEN="your-influxdb-token"
export RABBITMQ_PASSWORD="your-rabbitmq-password"
export REDIS_PASSWORD="your-redis-password"
export WEBHOOK_SECRET="your-webhook-secret"
export SENTINEL_LOG_LEVEL="info"
export SENTINEL_LOG_JSON="true"
export RUST_BACKTRACE="1"
```

Tested on AWS m5.xlarge (4 vCPU, 16GB RAM):
| Operation | Throughput | P50 | P95 | P99 |
|---|---|---|---|---|
| Ingestion | 10,500 events/sec | 2ms | 8ms | 15ms |
| Detection | 5,200 detections/sec | 3ms | 12ms | 25ms |
| Storage (batched) | 8,300 writes/sec | 5ms | 18ms | 35ms |
| API Queries | 2,800 req/sec | 8ms | 25ms | 48ms |
| Configuration | Memory | CPU | Disk I/O |
|---|---|---|---|
| Idle | 180MB | 0.1 cores | 1 MB/s |
| Light load (1k events/sec) | 320MB | 0.8 cores | 15 MB/s |
| Medium load (5k events/sec) | 520MB | 1.8 cores | 42 MB/s |
| Heavy load (10k events/sec) | 850MB | 2.9 cores | 78 MB/s |
- Horizontal: Linear scaling up to 10 replicas
- Vertical: Efficient use of multi-core (up to 8 cores)
- Storage: InfluxDB handles 100k+ writes/sec with proper sizing
- Network: ~10 Mbps at 10k events/sec (depends on event size)
```
llm-sentinel/
├── crates/
│ ├── sentinel-core/ # Core types and error handling (1,350 lines)
│ ├── sentinel-ingestion/ # Kafka consumer and OTLP parsing (1,390 lines)
│ ├── sentinel-detection/ # Anomaly detection algorithms (2,319 lines)
│ ├── sentinel-storage/ # InfluxDB and caching (987 lines)
│ ├── sentinel-alerting/ # RabbitMQ and webhooks (1,645 lines)
│ └── sentinel-api/ # REST API server (1,452 lines)
├── sentinel/ # Main binary (285 lines)
├── config/ # Configuration examples
├── deployments/ # Deployment configurations
│ ├── grafana/ # 4 Grafana dashboards
│ └── prometheus/ # 50+ alert rules
├── docs/ # Complete documentation (27 files)
├── examples/ # Example producers
│ ├── python/ # Python producer example
│ └── go/ # Go producer example
├── helm/ # Helm chart
│ └── sentinel/ # Production-ready chart
├── k8s/ # Kubernetes manifests (13 files)
├── .github/workflows/ # CI/CD pipelines
├── Cargo.toml # Rust workspace
├── docker-compose.yaml # Local development environment
└── Dockerfile                # Multi-stage Docker build
```
Total: ~9,500 lines of production Rust code
```bash
# All tests
cargo test --workspace
# Specific crate
cargo test -p sentinel-detection
# With output
cargo test -- --nocapture
# Integration tests only
cargo test --workspace --test '*'
# With coverage
cargo tarpaulin --out Html --output-dir coverage
```

```bash
# Debug build
cargo build
# Release build (optimized)
cargo build --release
# With specific features
cargo build --features redis
# Docker build
docker build -t llm-sentinel:latest .
# Multi-platform Docker build
docker buildx build --platform linux/amd64,linux/arm64 -t llm-sentinel:latest .
```

```bash
# Format code
cargo fmt
# Lint with clippy
cargo clippy -- -D warnings
# Security audit
cargo audit
# Check dependencies
cargo deny check
```

Full GitHub Actions workflow:
- Format check (rustfmt)
- Lint (clippy with all warnings as errors)
- Build (debug and release)
- Test (all workspaces)
- Coverage (tarpaulin with 80% target)
- Security audit (cargo-audit)
- Dependency check (cargo-deny)
- Docker build (multi-stage)
- SBOM generation (syft)
- Image signing (cosign)
- Vulnerability scan (trivy)
- Push to registry (GHCR)
- Helm package (chart packaging)
- Deploy (to staging/production)
We welcome contributions! Please see CONTRIBUTING.md for:
- Code of Conduct
- Development setup
- Coding standards
- Testing requirements
- Pull request process
- Security reporting
Complete documentation available in /docs:
- Architecture Overview
- Detection Methods
- Deployment Guide
- Integration Examples
- Performance Benchmarks
- Configuration Reference
This project is licensed under the Apache License 2.0 - see LICENSE file for details.
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: Complete docs
- Security: security@llm-devops.io
- Additional detection algorithms (Isolation Forest, AutoEncoder)
- Real-time dashboard (WebSocket streaming)
- Multi-tenant support
- Advanced query DSL
- Machine learning model drift detection
- Automated baseline tuning
- Support for additional message brokers (NATS, Pulsar)
- OpenTelemetry native ingestion
- Grafana plugin for custom visualizations
Status: ✅ Production Ready Version: 0.1.0 License: Apache 2.0 Built with: Rust 🦀 Last Updated: 2025-11-06