Enterprise-Grade Anomaly Detection and Observability for LLM Applications
LLM-Sentinel is a production-ready, real-time anomaly detection and observability platform designed specifically for Large Language Model (LLM) applications. Built in Rust for performance and memory safety, it provides comprehensive monitoring, statistical anomaly detection, automated alerting, and deep observability to keep your LLM deployments reliable, secure, and cost-effective.
Status: ✅ Production Ready - Full implementation with enterprise-grade deployment infrastructure
- Key Features
- Detection Capabilities
- Deployment Options
- Quick Start
- Architecture
- Observability
- API Reference
- Example Producers
- Configuration
- Performance
- Contributing
Detection algorithms:
- Z-Score Detection: Parametric anomaly detection for normally distributed metrics with configurable thresholds (default: 3.0σ)
- IQR Detection: Non-parametric outlier detection using interquartile range (default: 1.5x multiplier)
- MAD Detection: Robust outlier detection using median absolute deviation (default: 3.5 threshold)
- CUSUM Detection: Cumulative sum change point detection for drift and regime shifts (default: 5.0 threshold, 0.5 drift)
- Multi-Dimensional Baselines: Per-service, per-model statistical baselines with automatic updates
- Configurable Sensitivity: Tune detection sensitivity for your specific use cases
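To make the statistical rules concrete, here is a minimal sketch of the Z-Score check in Rust (illustrative only; the names are hypothetical, not Sentinel's internal API, which lives in sentinel-detection):

```rust
/// Flag a value whose distance from the baseline mean exceeds
/// `threshold` standard deviations (default: 3.0). Sketch only.
fn is_zscore_anomaly(value: f64, mean: f64, stddev: f64, threshold: f64) -> bool {
    if stddev == 0.0 {
        return false; // degenerate baseline: nothing to compare against
    }
    (value - mean).abs() / stddev > threshold
}

fn main() {
    // Baseline: mean 1200 ms latency, stddev 230 ms; observed 15,000 ms.
    assert!(is_zscore_anomaly(15000.0, 1200.0, 230.0, 3.0));
}
```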
Monitor all critical LLM metrics:
- Latency Spikes: Detect unusual response times (P50, P95, P99)
- Token Usage Anomalies: Monitor prompt and completion token consumption patterns
- Cost Anomalies: Track unexpected spending patterns and budget overruns
- Error Rate Spikes: Identify service degradation and failures
- Model Drift: Detect quality degradation over time
- Usage Patterns: Identify suspicious or abnormal usage behavior
- Throughput Changes: Monitor request rate variations
Performance:
- 10,000+ events/second ingestion throughput
- Sub-5ms P50 latency for anomaly detection
- Lock-free concurrent baseline updates using DashMap
- Batch processing for optimal InfluxDB writes (100-record batches)
- Zero-copy parsing for maximum efficiency
- Async/await throughout for non-blocking I/O
- Memory-efficient streaming processing
Reliability:
- Zero unsafe code: Memory safety guaranteed by Rust's type system
- Comprehensive error handling: Type-safe Result propagation with detailed error context
- Graceful shutdown: Proper signal handling (SIGTERM, SIGINT) with resource cleanup
- Health checks: Liveness, readiness, and startup probes for Kubernetes
- Circuit breakers: Automatic failure detection and recovery
- Exponential backoff: Intelligent retry logic for transient failures
- Connection pooling: Efficient resource management
Alerting:
- RabbitMQ Integration: Topic-based routing with severity levels (info, warning, critical)
- Webhook Delivery: HTTP POST with HMAC-SHA256 signatures for verification (see the verification sketch after this list)
- Alert Deduplication: Configurable 5-minute window to prevent alert storms
- Retry Logic: Exponential backoff with configurable max attempts (default: 3)
- Priority Routing: Route critical alerts to different channels
- Batch Alerting: Optional batching for high-volume scenarios
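For the webhook signatures mentioned above, a receiver can verify the HMAC-SHA256 along these lines. This is a sketch using the hmac and sha2 crates; the transport details (header name, hex vs. base64 encoding) are assumptions here, so check your webhook configuration:

```rust
use hmac::{Hmac, Mac};
use sha2::Sha256;

type HmacSha256 = Hmac<Sha256>;

/// Verify an HMAC-SHA256 signature over the raw request body.
/// `signature` is assumed to be the raw MAC bytes; how it is carried
/// (header name, encoding) depends on your webhook setup.
fn verify_webhook(secret: &[u8], body: &[u8], signature: &[u8]) -> bool {
    let mut mac = HmacSha256::new_from_slice(secret).expect("HMAC accepts any key length");
    mac.update(body);
    mac.verify_slice(signature).is_ok() // constant-time comparison
}
```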
Storage and caching:
- InfluxDB v3: Time-series storage for telemetry with automatic downsampling
- Moka Cache: High-performance in-memory cache (10,000 entry capacity)
- Redis Support: Distributed caching for multi-instance deployments
- Persistent Baselines: Baseline persistence to disk for quick restarts
- Query API: REST endpoints for historical data retrieval and analysis
- TTL Management: Automatic expiration of stale data (300s default)
Observability:
- 50+ Prometheus Metrics: Comprehensive instrumentation of all subsystems
- 4 Pre-built Grafana Dashboards:
- Anomaly Detection Dashboard
- System Health Dashboard
- Performance Metrics Dashboard
- Alert Overview Dashboard
- 50+ Alert Rules: Production-ready Prometheus alerting covering all failure modes
- Distributed Tracing: OpenTelemetry support for request tracing
- Structured Logging: JSON logs with configurable levels (trace, debug, info, warn, error)
Deployment:
- Kubernetes-Ready: Production manifests with HPA, PDB, and network policies
- Helm Chart: Parameterized chart for easy deployment and upgrades
- Docker Images: Multi-stage builds with minimal attack surface (<50MB)
- Horizontal Scaling: Support for 3-10+ replicas with auto-scaling
- StatefulSet Support: Optional for baseline persistence
- Service Mesh Compatible: Works with Istio, Linkerd, Consul Connect
Security:
- Non-Root Containers: Runs as UID 1000 with dropped capabilities
- Read-Only Filesystem: Root filesystem mounted read-only
- Network Policies: Restrict ingress/egress to required services only
- Secret Management: Support for Kubernetes secrets and external secret stores
- PII Sanitization: Automatic detection and removal of sensitive data (see the sketch after this list)
- Audit Logging: Complete audit trail of all anomalies and alerts
- SBOM Generation: Software Bill of Materials for vulnerability tracking
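As a rough illustration of the PII sanitization noted above, pattern-based redaction can be as simple as the following (a sketch with the regex crate; Sentinel's actual patterns are driven by ingestion.parsing.sanitize_patterns in the configuration):

```rust
use regex::Regex;

/// Illustrative scrubber: redact email addresses before events are stored.
/// Sketch only; real deployments configure patterns in sentinel.yaml.
fn redact_emails(text: &str) -> String {
    let email = Regex::new(r"[\w.+-]+@[\w-]+\.[\w.]+").expect("valid pattern");
    email.replace_all(text, "[REDACTED]").into_owned()
}

fn main() {
    assert_eq!(redact_emails("contact jane@example.com"), "contact [REDACTED]");
}
```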
LLM-Sentinel provides four complementary detection algorithms that can be enabled individually or in combination:
Z-Score detection. Best for: metrics that follow a normal distribution
```yaml
detection:
  zscore:
    threshold: 3.0        # Standard deviations from mean
    sensitivity: "medium" # low, medium, high
    metrics:
      - latency_ms
      - total_tokens
      - cost_usd
```

Use Cases:
- Latency spike detection
- Token usage monitoring
- Cost anomaly detection
IQR detection. Best for: metrics with skewed distributions or outliers
```yaml
detection:
  iqr:
    multiplier: 1.5 # IQR multiplier for outliers
    metrics:
      - latency_ms
      - total_tokens
```

Use Cases:
- Robust outlier detection
- Handling non-normal distributions
- Resistant to extreme values
MAD detection. Best for: metrics requiring high robustness to outliers
```yaml
detection:
  mad:
    threshold: 3.5 # MAD threshold
    metrics:
      - latency_ms
```

Use Cases:
- Ultra-robust detection
- Minimal false positives
- Gradual baseline shifts
CUSUM detection. Best for: detecting sustained changes and drift
```yaml
detection:
  cusum:
    threshold: 5.0 # Cumulative threshold
    drift: 0.5     # Drift parameter
    metrics:
      - latency_ms
```

Use Cases:
- Model performance degradation
- Service quality drift
- Gradual system changes
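To see how the threshold and drift parameters interact, here is a minimal one-sided CUSUM sketch (illustrative only; Sentinel's implementation in sentinel-detection may differ in detail):

```rust
/// One-sided CUSUM for upward shifts: accumulate standardized deviations
/// beyond the `drift` allowance and alarm once the sum crosses `threshold`.
struct Cusum {
    sum: f64,
    drift: f64,     // e.g. 0.5
    threshold: f64, // e.g. 5.0
}

impl Cusum {
    fn update(&mut self, value: f64, mean: f64, stddev: f64) -> bool {
        let z = (value - mean) / stddev.max(f64::EPSILON);
        self.sum = (self.sum + z - self.drift).max(0.0);
        self.sum > self.threshold
    }
}

fn main() {
    let mut c = Cusum { sum: 0.0, drift: 0.5, threshold: 5.0 };
    // A sustained +1.5 sigma latency shift trips the alarm after ~6 samples.
    assert!((0..10).any(|_| c.update(1500.0, 1200.0, 200.0)));
}
```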
Baseline management:
- Adaptive Baselines: Automatic baseline updates every 60 seconds
- Multi-Dimensional: Separate baselines per service, model, and metric
- Configurable Window: 1000-sample sliding window (configurable)
- Minimum Samples: Require 10+ samples before detection (prevents cold-start false positives)
- Persistence: Save/load baselines from disk for fast restarts
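Conceptually, the baseline store keys statistics by service, model, and metric, and updates entries concurrently. A rough sketch of that shape with DashMap (the types and fields here are illustrative, not Sentinel's actual ones):

```rust
use dashmap::DashMap;

/// Simplified per-key statistics; the real baseline also tracks stddev,
/// quantiles, and a bounded sliding window of samples.
#[derive(Default)]
struct Baseline {
    count: u64,
    mean: f64,
}

fn main() {
    // Key mirrors the "service:model:metric" convention,
    // e.g. "chat-api:gpt-4:latency_ms".
    let baselines: DashMap<String, Baseline> = DashMap::new();

    let mut entry = baselines
        .entry("chat-api:gpt-4:latency_ms".to_string())
        .or_default();
    // Incremental mean update; only this entry's shard is locked.
    entry.count += 1;
    entry.mean += (1234.5 - entry.mean) / entry.count as f64;
}
```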
Docker Compose is perfect for local development and testing:
```bash
# Start full environment (Kafka, InfluxDB, RabbitMQ, Redis, Prometheus, Grafana)
docker-compose up -d

# View logs
docker-compose logs -f sentinel

# Stop environment
docker-compose down
```

Includes:
- Sentinel (3 replicas)
- Kafka cluster (3 brokers)
- InfluxDB v3
- RabbitMQ with management UI
- Redis
- Prometheus
- Grafana with pre-loaded dashboards
The Kubernetes manifests provide a production-grade deployment:
```bash
# Apply all manifests
kubectl apply -f k8s/

# Or using kustomize
kubectl apply -k k8s/

# Check status
kubectl get pods -l app.kubernetes.io/name=sentinel
```

Includes:
- Deployment with 3 replicas
- HorizontalPodAutoscaler (3-10 replicas, CPU/memory targets)
- PodDisruptionBudget (min 2 available)
- NetworkPolicy (restricted ingress/egress)
- ServiceMonitor (Prometheus integration)
- Ingress with TLS
The Helm chart is the easiest production deployment, with full parameterization:
```bash
# Install with default values
helm install sentinel ./helm/sentinel \
  --set secrets.influxdbToken="your-token" \
  --set secrets.rabbitmqPassword="your-password"

# Install with custom values
helm install sentinel ./helm/sentinel -f production-values.yaml

# Upgrade
helm upgrade sentinel ./helm/sentinel

# Uninstall
helm uninstall sentinel
```

Helm Chart Features:
- Full configuration parameterization (335+ options)
- External secrets support (AWS Secrets Manager, Vault, etc.)
- Multiple environment profiles (dev, staging, prod)
- Init containers for dependency checking
- Automatic ConfigMap/Secret generation
- Post-install notes with helpful commands
See Helm Chart README for complete documentation.
For bare-metal deployments, install directly on a Linux server:
```bash
# Build release binary
cargo build --release

# Install
sudo cp target/release/sentinel /usr/local/bin/

# Create config
sudo mkdir -p /etc/sentinel
sudo cp config/sentinel.yaml /etc/sentinel/

# Create systemd service
sudo cp deployments/systemd/sentinel.service /etc/systemd/system/
sudo systemctl enable sentinel
sudo systemctl start sentinel
```

Prerequisites:
- Rust 1.75+ (for building from source)
- Kafka 2.8+ (message queue)
- InfluxDB v3 (time-series database)
- RabbitMQ 3.8+ (optional, for alerting)
- Redis 6.0+ (optional, for distributed caching)
```bash
# Clone repository
git clone https://github.com/llm-devops/llm-sentinel.git
cd llm-sentinel

# Start infrastructure
docker-compose up -d kafka influxdb rabbitmq redis

# Build project
cargo build --release

# Run tests
cargo test --workspace

# Start service
export INFLUXDB_TOKEN="your-token"
export RABBITMQ_PASSWORD="your-password"
./target/release/sentinel --config config/sentinel.yaml
```

```bash
# Create namespace
kubectl create namespace sentinel
# Create secrets
kubectl create secret generic sentinel-secrets \
--from-literal=influxdb-token="your-token" \
--from-literal=rabbitmq-password="your-password" \
-n sentinel
# Deploy with Helm
helm install sentinel ./helm/sentinel \
--namespace sentinel \
--set secrets.influxdbToken="your-token" \
--set secrets.rabbitmqPassword="your-password"
# Verify deployment
kubectl get pods -n sentinel
kubectl logs -f deployment/sentinel -n sentinel
# Access dashboards
kubectl port-forward svc/sentinel 8080:8080 -n sentinel
open http://localhost:8080/metrics
```

```bash
# Check health
curl http://localhost:8080/health/live
curl http://localhost:8080/health/ready
# View metrics
curl http://localhost:8080/metrics
# Query recent anomalies
curl http://localhost:8080/api/v1/anomalies/recent?limit=10
# Query telemetry
curl "http://localhost:8080/api/v1/telemetry?service=chat-api&hours=1"┌────────────────────────────────────────────────────────────────────┐
│ LLM Applications │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ OpenAI │ │ Claude │ │ Llama │ │ Custom │ │
│ │ API │ │ API │ │ API │ │ LLM API │ │
│ └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘ │
└───────┼─────────────┼─────────────┼──────────────┼────────────────┘
│ │ │ │
└─────────────┴─────────────┴──────────────┘
│
▼ (Telemetry Events)
┌─────────────────────────────┐
│ Apache Kafka │
│ (llm.telemetry topic) │
└─────────────┬───────────────┘
│
▼
┌────────────────────────────────────────────────────────────────────┐
│ LLM-SENTINEL │
├────────────────────────────────────────────────────────────────────┤
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ Ingestion Layer (sentinel-ingestion) │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌───────────────┐ │ │
│ │ │ Kafka │───>│ OTLP/JSON │───>│ Validation │ │ │
│ │ │ Consumer │ │ Parsing │ │ & PII Filter │ │ │
│ │ └──────────────┘ └──────────────┘ └───────┬───────┘ │ │
│ └────────────────────────────────────────────────────┼─────────┘ │
│ │ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ Detection Engine (sentinel-detection) │ │
│ │ ┌──────────────────────────────────────────────────────┐ │ │
│ │ │ Baseline Manager (Multi-Dimensional) │ │ │
│ │ │ Per Service × Model × Metric Statistical Baselines │ │ │
│ │ └─────────────────────┬────────────────────────────────┘ │ │
│ │ │ │ │
│ │ ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌──────────┐│ │
│ │ │ Z-Score │ │ IQR │ │ MAD │ │ CUSUM ││ │
│ │ │ Detector │ │ Detector │ │ Detector │ │ Detector ││ │
│ │ └─────┬─────┘ └─────┬─────┘ └─────┬─────┘ └────┬─────┘│ │
│ │ └──────────────┴───────────────┴──────────────┘ │ │
│ │ │ (Anomalies) │ │
│ └────────────────────────┼─────────────────────────────────────┘ │
│ │ │
│ ┌────────────────────────┼─────────────────────────────────────┐ │
│ │ ▼ │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │
│ │ │ InfluxDB │ │ RabbitMQ │ │ Webhook │ │ │
│ │ │ Storage │ │ Alerting │ │ Delivery │ │ │
│ │ │ │ │ │ │ │ │ │
│ │ │ • Telemetry │ │ • Topic │ │ • HMAC │ │ │
│ │ │ • Anomalies │ │ Routing │ │ Signature │ │ │
│ │ │ • Query API │ │ • Severity │ │ • Retry │ │ │
│ │ └──────────────┘ └──────────────┘ └──────────────┘ │ │
│ │ (sentinel-storage) (sentinel-alerting) │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ REST API (sentinel-api) │ │
│ │ ┌────────────┐ ┌────────────┐ ┌────────────┐ │ │
│ │ │ Health │ │ Metrics │ │ Query │ │ │
│ │ │ /health/* │ │ /metrics │ │ /api/v1/* │ │ │
│ │ └────────────┘ └────────────┘ └────────────┘ │ │
│ └──────────────────────────────────────────────────────────────┘ │
└────────────────────────────────────────────────────────────────────┘
│ │
▼ ▼
┌──────────────────┐ ┌──────────────────┐
│ Prometheus │ │ Grafana │
│ (Metrics) │ │ (Dashboards) │
└──────────────────┘  └──────────────────┘
```
The workspace is organized into six library crates plus the main binary:

sentinel-core:
- Error types and handling
- Configuration models
- Telemetry event types
- Shared utilities
sentinel-ingestion:
- Kafka consumer with group management
- OTLP/JSON parsing
- Schema validation
- PII detection and sanitization
- Configurable message handling
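For orientation, consuming the telemetry topic from Rust typically looks like the following (a sketch with the rdkafka crate and tokio; the settings mirror the ingestion.kafka block in the configuration section below):

```rust
use rdkafka::config::ClientConfig;
use rdkafka::consumer::{Consumer, StreamConsumer};
use rdkafka::Message;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let consumer: StreamConsumer = ClientConfig::new()
        .set("bootstrap.servers", "kafka-0:9092")
        .set("group.id", "sentinel-consumer")
        .set("enable.auto.commit", "false")
        .set("auto.offset.reset", "latest")
        .create()?;

    consumer.subscribe(&["llm.telemetry"])?;

    loop {
        let msg = consumer.recv().await?;
        if let Some(Ok(payload)) = msg.payload_view::<str>() {
            // Hand the OTLP/JSON payload off to parsing and validation.
            println!("received {} bytes", payload.len());
        }
    }
}
```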
sentinel-detection:
- Statistical baseline management
- Four detection algorithms (Z-Score, IQR, MAD, CUSUM)
- Multi-dimensional baseline tracking
- Lock-free concurrent updates
- Baseline persistence
sentinel-storage:
- InfluxDB v3 client with batch writes
- In-memory cache (Moka)
- Redis distributed cache
- Query API for historical data
- Automatic TTL management
sentinel-alerting:
- RabbitMQ topic publisher
- Webhook HTTP delivery
- Alert deduplication (5-minute window)
- Exponential backoff retry
- HMAC signature generation
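The deduplication step above boils down to remembering when each alert key last fired. A minimal sketch (hypothetical types; the real implementation also runs periodic cleanup of expired entries):

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Suppress repeat alerts for the same key within `window` (default: 300s).
struct Deduplicator {
    window: Duration,
    last_sent: HashMap<String, Instant>,
}

impl Deduplicator {
    fn should_send(&mut self, key: &str) -> bool {
        let now = Instant::now();
        match self.last_sent.get(key) {
            Some(&t) if now.duration_since(t) < self.window => false, // duplicate
            _ => {
                self.last_sent.insert(key.to_string(), now);
                true
            }
        }
    }
}
```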
sentinel-api:
- REST API server (Axum)
- Health check endpoints
- Prometheus metrics exporter
- Query endpoints for telemetry and anomalies
- CORS support
Four production-ready dashboards included in deployments/grafana/dashboards/:
Anomaly Detection Dashboard:
- Anomaly detection rate by severity (timeseries)
- Total anomaly rate gauge
- Anomalies in last 24h (stat)
- Anomaly types distribution (bar chart)
- Top services by anomalies
- Detection latency percentiles (P50, P95, P99)
System Health Dashboard:
- Service status (up/down)
- Active instance count
- Uptime tracking
- Memory usage by instance
- CPU usage by instance
- Event processing throughput
- Error rates by component
Performance Metrics Dashboard:
- Events processed per second
- Detection P95 latency
- Cache hit rate percentage
- Pipeline throughput (Kafka → Detection → Storage)
- Storage write rate
- API request rate
Alert Overview Dashboard:
- Alerts sent (last hour)
- Alerts deduplicated (last hour)
- Alert failures (last hour)
- RabbitMQ publishes by severity
- Webhook delivery success rate
- Alert distribution by type
50+ metrics exported at /metrics:
Ingestion Metrics:
- `sentinel_events_received_total` - Total events consumed from Kafka
- `sentinel_events_processed_total` - Successfully processed events
- `sentinel_ingestion_errors_total` - Ingestion errors by type
- `sentinel_validation_failures_total` - Validation failures
- `sentinel_kafka_messages_consumed_total` - Kafka messages consumed
- `sentinel_kafka_consumption_errors_total` - Kafka errors
Detection Metrics:
- `sentinel_anomalies_detected_total` - Anomalies by severity and detector
- `sentinel_detection_latency_seconds` - Detection latency histogram
- `sentinel_detection_errors_total` - Detection errors
- `sentinel_baseline_updates_total` - Baseline update count
- `sentinel_baseline_samples` - Current baseline sample counts
Storage Metrics:
- `sentinel_storage_writes_total` - Successful storage writes
- `sentinel_storage_errors_total` - Storage errors
- `sentinel_cache_hits_total` - Cache hits
- `sentinel_cache_misses_total` - Cache misses
- `sentinel_cache_size` - Current cache size
Alerting Metrics:
- `sentinel_alerts_sent_total` - Alerts sent by channel
- `sentinel_alerts_deduplicated_total` - Deduplicated alerts
- `sentinel_alert_failures_total` - Alert delivery failures
- `sentinel_rabbitmq_publishes_total` - RabbitMQ publishes
- `sentinel_webhook_deliveries_total` - Webhook deliveries
- `sentinel_webhook_failures_total` - Webhook failures
50+ production-ready alert rules in deployments/prometheus/alerts/sentinel-alerts.yaml:
Service Health:
- SentinelServiceDown (1m down)
- SentinelHighRestartRate (frequent restarts)
- SentinelInsufficientInstances (<2 instances)
Performance:
- SentinelHighLatency (P95 >50ms)
- SentinelVeryHighLatency (P95 >100ms)
- SentinelLowThroughput (<100 events/sec)
Errors:
- SentinelHighErrorRate (>10 errors/sec)
- SentinelIngestionErrors (>5 errors/sec)
- SentinelDetectionErrors (>5 errors/sec)
- SentinelStorageErrors (>5 errors/sec)
Resources:
- SentinelHighMemoryUsage (>1.5GB)
- SentinelCriticalMemoryUsage (>1.8GB)
- SentinelHighCPUUsage (>80%)
Anomalies:
- SentinelAnomalySpike (>50 anomalies/sec)
- SentinelCriticalAnomalies (>5 critical/sec)
- SentinelNoAnomalies (0 anomalies for 6h - detection health check)
```
GET /health/live

Response: 200 OK
{
  "status": "healthy",
  "timestamp": "2024-11-06T10:30:00Z"
}
```

```
GET /health/ready

Response: 200 OK (when ready to accept traffic)
Response: 503 Service Unavailable (when not ready)
```

```
GET /metrics

Response: 200 OK
# HELP sentinel_events_processed_total Total events processed
# TYPE sentinel_events_processed_total counter
sentinel_events_processed_total{service="chat-api"} 12453
...
```

```
GET /api/v1/telemetry?service={service}&model={model}&hours={hours}
```
Example:
```
GET /api/v1/telemetry?service=chat-api&model=gpt-4&hours=24

Response: 200 OK
{
  "data": [...],
  "count": 1523,
  "timeRange": {
    "start": "2024-11-05T10:30:00Z",
    "end": "2024-11-06T10:30:00Z"
  }
}
```

```
GET /api/v1/anomalies?severity={severity}&hours={hours}&limit={limit}
```
Example:
```
GET /api/v1/anomalies?severity=critical&hours=1&limit=50

Response: 200 OK
{
  "anomalies": [
    {
      "id": "anom-123",
      "timestamp": "2024-11-06T10:25:00Z",
      "service": "chat-api",
      "model": "gpt-4",
      "detector": "zscore",
      "metric": "latency_ms",
      "value": 15234.5,
      "baseline_mean": 1234.5,
      "baseline_stddev": 234.2,
      "z_score": 59.8,
      "severity": "critical"
    }
  ],
  "count": 3
}
```

```
GET /api/v1/anomalies/recent?limit={limit}
```
Example:
```
GET /api/v1/anomalies/recent?limit=10

Response: 200 OK (last 10 anomalies)
```

```
GET /api/v1/baselines?service={service}&model={model}
```
```
Response: 200 OK
{
  "baselines": [
    {
      "key": "chat-api:gpt-4:latency_ms",
      "samples": 1000,
      "mean": 1234.5,
      "stddev": 234.2,
      "median": 1205.3,
      "p95": 1687.2,
      "p99": 2103.4
    }
  ]
}
```
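Since the API is plain HTTP with JSON responses, any client works. As a sketch, fetching recent anomalies from Rust with reqwest (blocking and json features) and serde_json might look like this; the host, port, and the assumption that the recent endpoint shares the anomaly response shape shown above are all deployment-specific:

```rust
use serde_json::Value;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let resp: Value = reqwest::blocking::get(
        "http://localhost:8080/api/v1/anomalies/recent?limit=10",
    )?
    .json()?;

    // Assumes the response shape of the anomaly example above.
    if let Some(anomalies) = resp["anomalies"].as_array() {
        for a in anomalies {
            println!("{} {} {}", a["severity"], a["service"], a["metric"]);
        }
    }
    Ok(())
}
```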
Full-featured Python example for sending telemetry to Kafka:

```bash
cd examples/python
# Install dependencies
pip install -r requirements.txt
# Run with defaults (20 normal + 5 anomalous events)
python producer.py --brokers localhost:9092
# Run continuously
python producer.py --continuous
# Custom configuration
python producer.py \
--brokers kafka-0:9092,kafka-1:9092 \
--topic llm.telemetry \
--normal-events 50 \
  --anomalous-events 10
```

Features:
- Configurable event generation
- Simulates 4 anomaly types (high latency, high tokens, high cost, suspicious patterns)
- Kafka integration with guaranteed delivery
- Continuous mode for load testing
High-performance Go example for production use:
```bash
cd examples/go
# Build
go build -o producer producer.go
# Run with defaults
./producer -brokers localhost:9092
# Run continuously
./producer -continuous
# Custom configuration
./producer \
-brokers kafka-0:9092,kafka-1:9092 \
-topic llm.telemetry \
-normal-events 100 \
  -anomalous-events 20
```

Features:
- Native Go performance (10,000+ events/sec)
- Graceful shutdown (SIGINT/SIGTERM)
- Connection pooling and retries
- Minimal memory footprint (<20MB)
Integrate telemetry into your LLM application:
```python
from llm_sentinel import TelemetryProducer
# Initialize
producer = TelemetryProducer(
brokers=["kafka:9092"],
topic="llm.telemetry"
)
# After each LLM API call
event = producer.create_telemetry_event(
service_name="my-chatbot",
model_name="gpt-4",
latency_ms=response_time,
prompt_tokens=completion.usage.prompt_tokens,
completion_tokens=completion.usage.completion_tokens,
cost_usd=calculated_cost,
user_id=user.id,
session_id=session.id
)
producer.send_event(event)
```
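The event shape is the same regardless of producer language. The fields used in the Python call above map onto a structure like the following (an illustrative Rust sketch with serde's derive feature; the field names are assumptions based on this README, not a schema reference):

```rust
use serde::Serialize;

/// Illustrative telemetry event mirroring the Python example's fields.
#[derive(Serialize)]
struct TelemetryEvent {
    service_name: String,
    model_name: String,
    latency_ms: f64,
    prompt_tokens: u32,
    completion_tokens: u32,
    cost_usd: f64,
    user_id: String,
    session_id: String,
}

fn main() -> serde_json::Result<()> {
    let event = TelemetryEvent {
        service_name: "my-chatbot".into(),
        model_name: "gpt-4".into(),
        latency_ms: 1243.7,
        prompt_tokens: 512,
        completion_tokens: 128,
        cost_usd: 0.032,
        user_id: "user-42".into(),
        session_id: "sess-7".into(),
    };
    println!("{}", serde_json::to_string_pretty(&event)?);
    Ok(())
}
```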
Example configuration:

```yaml
# Ingestion configuration
ingestion:
kafka:
brokers:
- "kafka-0:9092"
- "kafka-1:9092"
- "kafka-2:9092"
topic: "llm.telemetry"
group_id: "sentinel-consumer"
session_timeout_ms: 6000
enable_auto_commit: false
auto_offset_reset: "latest"
max_poll_records: 500
parsing:
max_text_length: 10000
enable_sanitization: true
sanitize_patterns:
- "password"
- "api_key"
- "secret"
validation:
min_latency_ms: 0.1
max_latency_ms: 300000.0 # 5 minutes
max_tokens: 100000
max_cost_usd: 100.0
enable_pii_detection: true
# Detection configuration
detection:
enabled_detectors:
- "zscore"
- "iqr"
- "mad"
- "cusum"
baseline:
window_size: 1000
min_samples: 10
update_interval_secs: 60
enable_persistence: true
persistence_path: "/var/lib/sentinel/baselines"
zscore:
threshold: 3.0
sensitivity: "medium" # low, medium, high
metrics:
- "latency_ms"
- "total_tokens"
- "cost_usd"
iqr:
multiplier: 1.5
metrics:
- "latency_ms"
- "total_tokens"
mad:
threshold: 3.5
metrics:
- "latency_ms"
cusum:
threshold: 5.0
drift: 0.5
metrics:
- "latency_ms"
# Storage configuration
storage:
influxdb:
url: "http://influxdb:8086"
org: "sentinel"
token: "${INFLUXDB_TOKEN}"
telemetry_bucket: "telemetry"
anomaly_bucket: "anomalies"
batch_size: 100
timeout_secs: 10
cache:
max_capacity: 10000
ttl_secs: 300
tti_secs: 60
enable_metrics: true
redis:
enabled: true
url: "redis://redis:6379"
password: "${REDIS_PASSWORD}"
key_prefix: "sentinel:"
ttl_secs: 300
# Alerting configuration
alerting:
rabbitmq:
url: "amqp://rabbitmq:5672"
username: "sentinel"
password: "${RABBITMQ_PASSWORD}"
exchange: "sentinel.alerts"
exchange_type: "topic"
routing_key_prefix: "alert"
persistent: true
timeout_secs: 10
retry_config:
max_attempts: 3
initial_delay_ms: 1000
backoff_multiplier: 2.0
max_delay_ms: 30000
webhook:
enabled: false
url: "https://alerts.example.com/webhook"
method: "POST"
secret: "${WEBHOOK_SECRET}"
timeout_secs: 10
max_retries: 3
retry_delay_ms: 1000
backoff_multiplier: 2.0
deduplication:
enabled: true
window_secs: 300
cleanup_interval_secs: 60
# API configuration
api:
bind_addr: "0.0.0.0:8080"
enable_cors: true
cors_origins:
- "*"
timeout_secs: 30
max_body_size: 10485760 # 10MB
enable_logging: true
  metrics_path: "/metrics"
```

See config/sentinel.yaml for a complete annotated example.
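The retry_config fields above compose in the usual way: each attempt waits initial_delay_ms times backoff_multiplier raised to the attempt number, capped at max_delay_ms. A quick sketch of that schedule:

```rust
use std::time::Duration;

/// Delay before retry attempt `n` (0-based), per the retry_config fields:
/// exponential growth from `initial_ms`, capped at `max_ms`.
fn retry_delay(n: u32, initial_ms: u64, multiplier: f64, max_ms: u64) -> Duration {
    let delay = initial_ms as f64 * multiplier.powi(n as i32);
    Duration::from_millis((delay as u64).min(max_ms))
}

fn main() {
    // Defaults (1000ms, 2.0, 30000ms) yield 1s, 2s, 4s, ... capped at 30s.
    for n in 0..6 {
        println!("attempt {}: {:?}", n + 1, retry_delay(n, 1000, 2.0, 30000));
    }
}
```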
All sensitive configuration can be provided via environment variables:
```bash
export INFLUXDB_TOKEN="your-influxdb-token"
export RABBITMQ_PASSWORD="your-rabbitmq-password"
export REDIS_PASSWORD="your-redis-password"
export WEBHOOK_SECRET="your-webhook-secret"
export SENTINEL_LOG_LEVEL="info"
export SENTINEL_LOG_JSON="true"
export RUST_BACKTRACE="1"
```

Tested on AWS m5.xlarge (4 vCPU, 16GB RAM):
| Operation | Throughput | P50 | P95 | P99 |
|---|---|---|---|---|
| Ingestion | 10,500 events/sec | 2ms | 8ms | 15ms |
| Detection | 5,200 detections/sec | 3ms | 12ms | 25ms |
| Storage (batched) | 8,300 writes/sec | 5ms | 18ms | 35ms |
| API Queries | 2,800 req/sec | 8ms | 25ms | 48ms |
| Configuration | Memory | CPU | Disk I/O |
|---|---|---|---|
| Idle | 180MB | 0.1 cores | 1 MB/s |
| Light load (1k events/sec) | 320MB | 0.8 cores | 15 MB/s |
| Medium load (5k events/sec) | 520MB | 1.8 cores | 42 MB/s |
| Heavy load (10k events/sec) | 850MB | 2.9 cores | 78 MB/s |
- Horizontal: Linear scaling up to 10 replicas
- Vertical: Efficient use of multi-core (up to 8 cores)
- Storage: InfluxDB handles 100k+ writes/sec with proper sizing
- Network: ~10 Mbps at 10k events/sec (depends on event size)
```
llm-sentinel/
├── crates/
│ ├── sentinel-core/ # Core types and error handling (1,350 lines)
│ ├── sentinel-ingestion/ # Kafka consumer and OTLP parsing (1,390 lines)
│ ├── sentinel-detection/ # Anomaly detection algorithms (2,319 lines)
│ ├── sentinel-storage/ # InfluxDB and caching (987 lines)
│ ├── sentinel-alerting/ # RabbitMQ and webhooks (1,645 lines)
│ └── sentinel-api/ # REST API server (1,452 lines)
├── sentinel/ # Main binary (285 lines)
├── config/ # Configuration examples
├── deployments/ # Deployment configurations
│ ├── grafana/ # 4 Grafana dashboards
│ └── prometheus/ # 50+ alert rules
├── docs/ # Complete documentation (27 files)
├── examples/ # Example producers
│ ├── python/ # Python producer example
│ └── go/ # Go producer example
├── helm/ # Helm chart
│ └── sentinel/ # Production-ready chart
├── k8s/ # Kubernetes manifests (13 files)
├── .github/workflows/ # CI/CD pipelines
├── Cargo.toml # Rust workspace
├── docker-compose.yaml # Local development environment
└── Dockerfile                # Multi-stage Docker build
```
Total: ~9,500 lines of production Rust code
```bash
# All tests
cargo test --workspace
# Specific crate
cargo test -p sentinel-detection
# With output
cargo test -- --nocapture
# Integration tests only
cargo test --workspace --test '*'
# With coverage
cargo tarpaulin --out Html --output-dir coverage
```

```bash
# Debug build
cargo build
# Release build (optimized)
cargo build --release
# With specific features
cargo build --features redis
# Docker build
docker build -t llm-sentinel:latest .
# Multi-platform Docker build
docker buildx build --platform linux/amd64,linux/arm64 -t llm-sentinel:latest .
```

```bash
# Format code
cargo fmt
# Lint with clippy
cargo clippy -- -D warnings
# Security audit
cargo audit
# Check dependencies
cargo deny check
```

Full GitHub Actions workflow:
- Format check (rustfmt)
- Lint (clippy with all warnings as errors)
- Build (debug and release)
- Test (all workspaces)
- Coverage (tarpaulin with 80% target)
- Security audit (cargo-audit)
- Dependency check (cargo-deny)
- Docker build (multi-stage)
- SBOM generation (syft)
- Image signing (cosign)
- Vulnerability scan (trivy)
- Push to registry (GHCR)
- Helm package (chart packaging)
- Deploy (to staging/production)
We welcome contributions! Please see CONTRIBUTING.md for:
- Code of Conduct
- Development setup
- Coding standards
- Testing requirements
- Pull request process
- Security reporting
Complete documentation available in /docs:
- Architecture Overview
- Detection Methods
- Deployment Guide
- Integration Examples
- Performance Benchmarks
- Configuration Reference
This project is licensed under the Apache License 2.0 - see LICENSE file for details.
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: Complete docs
- Security: security@llm-devops.io
- Additional detection algorithms (Isolation Forest, AutoEncoder)
- Real-time dashboard (WebSocket streaming)
- Multi-tenant support
- Advanced query DSL
- Machine learning model drift detection
- Automated baseline tuning
- Support for additional message brokers (NATS, Pulsar)
- OpenTelemetry native ingestion
- Grafana plugin for custom visualizations
Status: ✅ Production Ready Version: 0.1.0 License: Apache 2.0 Built with: Rust 🦀 Last Updated: 2025-11-06