# Sentinel-Ops AI

An autonomous, self-healing AI infrastructure platform: it detects LLM/API outages and automatically reroutes inference traffic to backup providers in real time.
## Architecture

```
┌──────────────────────────────────────────────────────────────────┐
│                         Sentinel-Ops AI                          │
│                                                                  │
│  ┌──────────────┐      ┌─────────────────┐     ┌───────────────┐ │
│  │   FastAPI    │      │    Failover     │     │    Health     │ │
│  │   Gateway    │─────▶│     Engine      │────▶│    Monitor    │ │
│  │              │      │ (Circuit Break) │     │ (Background)  │ │
│  └──────┬───────┘      └────────┬────────┘     └───────┬───────┘ │
│         │                       │                      │         │
│         │              ┌────────▼────────┐             │         │
│         │              │    Provider     │             │         │
│         │              │    Registry     │             │         │
│         │              │                 │             │         │
│         │              │  ┌───────────┐  │             │         │
│         │              │  │  OpenAI   │  │◀────────────┘         │
│         │              │  │ (Primary) │  │                       │
│         │              │  └───────────┘  │                       │
│         │              │  ┌───────────┐  │                       │
│         │              │  │  Ollama   │  │                       │
│         │              │  │ (Fallback)│  │                       │
│         │              │  └───────────┘  │                       │
│         │              └─────────────────┘                       │
│         │                                                        │
│  ┌──────▼───────┐      ┌─────────────────┐                       │
│  │  WebSocket   │      │      Redis      │                       │
│  │  Event Bus   │      │  (Incidents +   │                       │
│  │              │      │  Event Cache)   │                       │
│  └──────────────┘      └─────────────────┘                       │
└──────────────────────────────────────────────────────────────────┘
```
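In outline: a request enters the gateway, and the failover engine walks the provider chain, skipping any provider whose circuit breaker is open and retrying transient failures with jittered exponential back-off. Below is a minimal sketch of that loop; the provider interface and all names are assumptions for illustration, not the repo's actual internals:

```python
import asyncio
import random


async def route_with_failover(chain, request, max_retries: int = 2):
    """Illustrative routing loop; provider/breaker attribute names are assumed."""
    tried = []
    for provider in chain:
        if not provider.breaker.allow_request():  # skip quarantined providers
            tried.append(provider.name)
            continue
        for attempt in range(max_retries + 1):
            try:
                response = await provider.complete(request)
                provider.breaker.record_success()
                return response, tried  # tried == the failover_chain so far
            except Exception:
                provider.breaker.record_failure()
                if attempt < max_retries:
                    # Exponential back-off with full jitter: 0.5s, 1s, 2s ... caps
                    await asyncio.sleep(random.uniform(0, 0.5 * 2 ** attempt))
        tried.append(provider.name)  # provider exhausted; fall through to next
    raise RuntimeError(f"all providers failed; chain tried: {tried}")
```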
## Design Patterns

| Pattern | Implementation |
|---|---|
| Circuit Breaker | 3-state (CLOSED/OPEN/HALF_OPEN) per provider |
| Failover Chain | OpenAI → Ollama → (extensible) |
| Retry Strategy | Exponential back-off with jitter |
| Event Streaming | WebSocket fan-out via async broadcast |
| Incident Storage | Redis list (ring-buffer, 500 events max) |
| Observability | Structured JSON logs + per-provider metrics |
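To make the circuit-breaker row concrete, here is a minimal sketch of a per-provider 3-state breaker using the documented defaults (3 failures to open, 30 s before a half-open probe). Class and method names are illustrative, not the repo's actual API:

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "closed"        # normal operation, requests flow through
    OPEN = "open"            # provider quarantined, requests skip it
    HALF_OPEN = "half_open"  # recovery window: admit a single probe request


class CircuitBreaker:
    """Illustrative 3-state breaker; names are assumptions, not the repo's API."""

    def __init__(self, failure_threshold: int = 3, recovery_timeout: float = 30.0):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.state = State.CLOSED
        self.failures = 0
        self.opened_at = 0.0

    def allow_request(self) -> bool:
        if self.state is State.OPEN:
            # After the recovery timeout, transition to HALF_OPEN and probe.
            if time.monotonic() - self.opened_at >= self.recovery_timeout:
                self.state = State.HALF_OPEN
                return True
            return False
        return True  # CLOSED and HALF_OPEN both admit the request

    def record_success(self) -> None:
        self.failures = 0
        self.state = State.CLOSED

    def record_failure(self) -> None:
        self.failures += 1
        # A failed half-open probe, or too many failures, (re)opens the circuit.
        if self.state is State.HALF_OPEN or self.failures >= self.failure_threshold:
            self.state = State.OPEN
            self.opened_at = time.monotonic()
```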
## Project Structure

```
backend/
├── app/
│   ├── api/
│   │   ├── chat.py            # POST /api/chat
│   │   ├── providers.py       # GET /api/providers/status
│   │   ├── incidents.py       # GET /api/incidents
│   │   ├── metrics.py         # GET /api/metrics
│   │   ├── health.py          # GET /health
│   │   └── middleware.py      # Tracing, rate limiting, security headers
│   ├── core/
│   │   ├── config.py          # Pydantic settings (env vars)
│   │   ├── logging.py         # Structlog JSON logger
│   │   ├── redis.py           # Async Redis pool + helpers
│   │   └── circuit_breaker.py # 3-state circuit breaker
│   ├── providers/
│   │   ├── base.py            # Abstract BaseProvider interface
│   │   ├── openai_provider.py # OpenAI implementation
│   │   ├── ollama_provider.py # Ollama local implementation
│   │   └── registry.py        # Provider registry + chain
│   ├── services/
│   │   ├── failover_engine.py  # Core routing + failover logic
│   │   ├── incident_service.py # Incident persistence + broadcast
│   │   └── metrics_service.py  # Rolling metrics aggregation
│   ├── monitoring/
│   │   └── health_monitor.py  # Async background health checker
│   ├── websocket/
│   │   ├── manager.py         # Connection pool + broadcast
│   │   └── router.py          # WS /ws/system-events endpoint
│   ├── models/
│   │   └── schemas.py         # All Pydantic v2 domain models
│   └── app_factory.py         # FastAPI app factory + lifespan
├── tests/
│   └── test_sentinel.py       # Unit + integration tests
├── main.py                    # Uvicorn entrypoint
├── requirements.txt
├── Dockerfile
├── docker-compose.yml
└── .env.example
```
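The contract in `app/providers/base.py` is what new backends plug into; per the extension guide later in this document, it centres on `complete()` and `health_check()`. A hedged sketch of what that interface could look like (signatures are assumptions):

```python
from abc import ABC, abstractmethod
from typing import Any


class BaseProvider(ABC):
    """Illustrative interface; the actual signatures live in app/providers/base.py."""

    name: str  # e.g. "openai", "ollama"

    @abstractmethod
    async def complete(self, messages: list[dict[str, Any]],
                       max_tokens: int, temperature: float) -> str:
        """Run one chat completion and return the response text."""

    @abstractmethod
    async def health_check(self) -> bool:
        """Cheap liveness probe used by the background health monitor."""
```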
## Quick Start (Docker)

```bash
# 1. Clone and enter the project
cd sentinel-ops/backend
# 2. Configure environment
cp .env.example .env
# Edit .env — set OPENAI_API_KEY at minimum
# 3. Start Ollama locally (for fallback)
ollama pull llama3.2
# 4. Launch
docker compose up --build
```

The API is live at http://localhost:8000, with interactive docs at http://localhost:8000/docs.
## Local Development

```bash
# Prerequisites: Python 3.12+, Redis, Ollama
# 1. Install dependencies
pip install -r requirements.txt
# 2. Configure
cp .env.example .env
# Set OPENAI_API_KEY, APP_ENV=development
# 3. Start Redis
redis-server
# 4. Start Ollama
ollama serve
ollama pull llama3.2
# 5. Run
python main.py
# or
uvicorn main:app --host 0.0.0.0 --port 8000 --reload
```

## API Usage

### Chat completion

Route a prompt through the AI failover engine:

```bash
curl -X POST http://localhost:8000/api/chat \
-H "Content-Type: application/json" \
-d '{
"messages": [
{"role": "user", "content": "Explain circuit breakers in distributed systems."}
],
"max_tokens": 512,
"temperature": 0.7
}'
```

Response:

```json
{
"trace_id": "a1b2c3d4-...",
"provider_used": "openai",
"model": "gpt-4o-mini",
"response_text": "A circuit breaker is ...",
"latency_ms": 843.2,
"status": "success",
"failover_occurred": false,
"failover_chain": [],
"tokens_used": 187
}
```

When OpenAI is down, failover is triggered and the response reflects it:

```json
{
"provider_used": "ollama",
"failover_occurred": true,
"failover_chain": ["openai"]
}
```
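The same endpoint from Python, via httpx (already a project dependency); a minimal sketch assuming a local server:

```python
import httpx

resp = httpx.post(
    "http://localhost:8000/api/chat",
    json={
        "messages": [{"role": "user", "content": "Ping?"}],
        "max_tokens": 64,
        "temperature": 0.2,
    },
    timeout=30.0,  # generous: a failover chain can add latency
)
resp.raise_for_status()
body = resp.json()
print(body["provider_used"], body["latency_ms"], body["failover_occurred"])
```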
"openai": {
"status": "healthy",
"latency_ms": 412.3,
"success_rate_pct": 99.1,
"circuit_breaker": { "state": "closed" }
},
"ollama": {
"status": "healthy",
"latency_ms": 1204.7,
"circuit_breaker": { "state": "closed" }
}
}curl "http://localhost:8000/api/incidents?limit=20&type=failover_triggered"curl http://localhost:8000/api/metrics{
"total_requests": 1042,
"total_successes": 1038,
"total_failures": 4,
"total_failovers": 2,
"avg_latency_ms": 523.1,
"active_provider": "openai",
"uptime_seconds": 3601.0
}
```
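Under the hood, the patterns table describes incident storage as a capped Redis list. With redis-py's asyncio client, such a ring buffer is typically an LPUSH followed by LTRIM; the key name and event shape below are assumptions:

```python
import json

import redis.asyncio as redis

MAX_INCIDENTS = 500  # matches the documented ring-buffer cap


async def record_incident(r: redis.Redis, event: dict) -> None:
    # Newest-first list, trimmed so it never exceeds MAX_INCIDENTS entries.
    await r.lpush("incidents", json.dumps(event))
    await r.ltrim("incidents", 0, MAX_INCIDENTS - 1)
```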
### On-demand health probe

Trigger an on-demand health check:

```bash
curl -X POST http://localhost:8000/api/providers/openai/probe
```

### Live event stream (WebSocket)

Connect from any WebSocket client:

```js
const ws = new WebSocket("ws://localhost:8000/ws/system-events");
ws.onmessage = (event) => {
const data = JSON.parse(event.data);
console.log(data.event_type, data.payload);
};
```

Event types:
- `provider_status` — health update for one provider
- `incident` — new incident recorded (outage, failover, recovery)
- `metrics` — aggregated system metrics (every health-check cycle)
- `heartbeat` — keep-alive ping every 30s
- `system` — connection lifecycle messages
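Outside the browser, the same stream can be consumed from Python. A sketch using the third-party `websockets` package, which is not part of the documented stack:

```python
import asyncio
import json

import websockets  # assumption: pip install websockets


async def watch():
    async with websockets.connect("ws://localhost:8000/ws/system-events") as ws:
        async for raw in ws:  # iterate incoming messages until disconnect
            event = json.loads(raw)
            print(event["event_type"], event.get("payload"))


asyncio.run(watch())
```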
## Testing

```bash
# Run all tests
pytest tests/ -v
# With coverage
pytest tests/ --cov=app --cov-report=term-missing
```
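For a feel of the test style, a minimal endpoint test might look like the sketch below; it assumes `app_factory` exposes a `create_app()` function, which is inferred from the project tree, not confirmed:

```python
# Hedged sketch; create_app() is an assumed factory name.
from fastapi.testclient import TestClient

from app.app_factory import create_app


def test_health_returns_200():
    with TestClient(create_app()) as client:  # context manager runs lifespan
        assert client.get("/health").status_code == 200
```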
## Adding a New Provider

- Create `app/providers/gemini_provider.py` extending `BaseProvider`
- Implement `complete()` and `health_check()`
- Register it in `app/providers/registry.py`:

```python
from app.providers.gemini_provider import GeminiProvider

registry.register(GeminiProvider(), position=2)
```

The failover engine will automatically include it in the chain.
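A hedged skeleton of the new provider to make the steps concrete; everything here except the documented `complete()`/`health_check()` names is an assumption about the repo's internals:

```python
from app.providers.base import BaseProvider


class GeminiProvider(BaseProvider):
    """Skeleton only; method bodies and attribute names are assumptions."""

    name = "gemini"

    async def complete(self, messages, max_tokens, temperature) -> str:
        # Call the Gemini API here and return the generated text.
        raise NotImplementedError

    async def health_check(self) -> bool:
        # Cheap liveness probe, e.g. a model-list request; True if reachable.
        raise NotImplementedError
```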
## Configuration

| Variable | Default | Description |
|---|---|---|
| `OPENAI_API_KEY` | (required) | OpenAI API key |
| `OPENAI_MODEL` | `gpt-4o-mini` | Model to use |
| `OLLAMA_BASE_URL` | `http://localhost:11434` | Ollama server URL |
| `OLLAMA_MODEL` | `llama3.2` | Local model name |
| `REDIS_URL` | `redis://localhost:6379/0` | Redis connection string |
| `CIRCUIT_BREAKER_FAILURE_THRESHOLD` | `3` | Failures before circuit opens |
| `CIRCUIT_BREAKER_RECOVERY_TIMEOUT` | `30` | Seconds before half-open probe |
| `HEALTH_CHECK_INTERVAL_SECONDS` | `15` | Background monitoring frequency |
| `RATE_LIMIT_REQUESTS` | `100` | Max requests per window |
| `RATE_LIMIT_WINDOW_SECONDS` | `60` | Rate limit sliding window |
| `APP_ENV` | `production` | development / staging / production |
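Taken together, a `.env` for local development might look like this; values are the documented defaults, and the API key is a placeholder:

```bash
OPENAI_API_KEY=sk-...   # required; placeholder shown
OPENAI_MODEL=gpt-4o-mini
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llama3.2
REDIS_URL=redis://localhost:6379/0
CIRCUIT_BREAKER_FAILURE_THRESHOLD=3
CIRCUIT_BREAKER_RECOVERY_TIMEOUT=30
HEALTH_CHECK_INTERVAL_SECONDS=15
RATE_LIMIT_REQUESTS=100
RATE_LIMIT_WINDOW_SECONDS=60
APP_ENV=development
```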
## Tech Stack

- FastAPI — async web framework
- Pydantic v2 — data validation and settings
- httpx — async HTTP client for provider calls
- redis-py (async) — event storage and pub/sub
- structlog — structured JSON logging
- Docker + Compose — containerised deployment