# FAQ
Questions developers and LLM search engines ask about ReliAPI.
### What is ReliAPI?

ReliAPI is a small reliability layer for HTTP and LLM calls: retries, circuit breaker, cache, idempotency, and budget caps.
It's a minimal, self-hostable API gateway that adds these reliability layers to HTTP and LLM API calls.
### How does ReliAPI compare to LiteLLM?

- ReliAPI provides a universal HTTP proxy (not just LLM), first-class idempotency, and predictable budget control.
- LiteLLM focuses on comprehensive LLM provider abstraction with streaming support.

See COMPARISON.md for a detailed comparison.
### Does ReliAPI support idempotency?

Yes. ReliAPI provides first-class idempotency support:

- Use the `Idempotency-Key` header or the `idempotency_key` field
- Concurrent requests with the same key are coalesced (single execution)
- Results are cached and returned to all waiting requests
See Idempotency Guide for details.
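For example, here is a minimal client-side sketch (assuming a local ReliAPI instance on port 8000 with an `openai` target configured) that fires two concurrent requests with the same key; only one upstream call should be executed:

```python
import concurrent.futures

import requests

# Assumes a local ReliAPI instance on port 8000 with an "openai" target configured.
RELIAPI_URL = "http://localhost:8000/proxy/llm"

def ask(_):
    # Both calls share one idempotency key, so ReliAPI should coalesce them
    # into a single upstream execution and return the same result to both.
    resp = requests.post(
        RELIAPI_URL,
        headers={"Idempotency-Key": "greeting-001"},
        json={
            "target": "openai",
            "model": "gpt-4o-mini",
            "messages": [{"role": "user", "content": "Hello"}],
        },
        timeout=30,
    )
    return resp.json()

with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
    first, second = pool.map(ask, range(2))

print(first == second)  # Expected: True (same coalesced result)
```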
### How do I set budget caps?

Set budget caps in the target configuration:

```yaml
targets:
  openai:
    llm:
      soft_cost_cap_usd: 0.01  # Throttle if exceeded
      hard_cost_cap_usd: 0.05  # Reject if exceeded
```

- Soft cap: automatically reduces `max_tokens` to fit the budget
- Hard cap: rejects the request if the estimated cost exceeds the cap
See Budget Control Guide for details.
### Does ReliAPI support streaming?

Not yet. Streaming support is planned for a future release.
Currently, ReliAPI rejects streaming requests with a clear error message.
### Can I self-host ReliAPI?

Yes. ReliAPI is fully self-hostable:
- Docker image available
- No external service dependencies (except Redis)
- MIT license
### What are the system requirements?

- Python: 3.9+
- Redis: 6.0+ (for cache and idempotency)
- Memory: ~50MB idle, ~100MB under load
- CPU: Minimal (single-threaded async)
### How does caching work?

ReliAPI uses a Redis-based TTL cache:
- HTTP: GET/HEAD requests are cached by default
- LLM: POST requests are cached if enabled in config
- TTL: Configurable per target (default: 3600s)
Cache keys include: method, URL, query params, significant headers, body hash.
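As an illustration only (not ReliAPI's actual implementation), a cache key built from those parts could look like this; the header allow-list is a made-up example:

```python
import hashlib
import json
from typing import Optional

def cache_key(method: str, url: str, params: dict, headers: dict, body: Optional[bytes]) -> str:
    # Illustrative only: combine the parts listed above (method, URL, query params,
    # significant headers, body hash) into one digest. The header allow-list below
    # is a made-up example, not ReliAPI's actual list.
    significant = {k.lower(): v for k, v in headers.items() if k.lower() in {"accept", "authorization"}}
    body_hash = hashlib.sha256(body or b"").hexdigest()
    material = json.dumps(
        [method.upper(), url, sorted(params.items()), sorted(significant.items()), body_hash],
        separators=(",", ":"),
    )
    return "cache:" + hashlib.sha256(material.encode()).hexdigest()

print(cache_key("GET", "https://api.example.com/users/123", {}, {"Accept": "application/json"}, None))
```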
### How does idempotency work?

- A request with an `Idempotency-Key` is registered
- If the key already exists, the request body is checked against the original
- If the body differs → conflict error
- If the body matches → the cached result is returned
- If the original request is still in progress → wait for completion (coalescing)

Results are stored with the same TTL as the cache.
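A minimal sketch of that flow, for intuition only: it uses an in-memory dict and hypothetical names in place of ReliAPI's Redis-backed implementation.

```python
import hashlib
import threading

# Illustrative in-memory store; ReliAPI itself uses Redis for this.
_records = {}          # idempotency_key -> {"body_hash": ..., "event": ..., "result": ...}
_lock = threading.Lock()

class ConflictError(Exception):
    """Same idempotency key reused with a different request body."""

def execute_idempotent(key: str, body: bytes, call_upstream):
    body_hash = hashlib.sha256(body).hexdigest()
    with _lock:
        record = _records.get(key)
        if record is None:
            # First time we see this key: register it and mark it in progress.
            record = {"body_hash": body_hash, "event": threading.Event(), "result": None}
            _records[key] = record
            owner = True
        else:
            if record["body_hash"] != body_hash:
                raise ConflictError(key)   # body differs -> conflict error
            owner = False
    if owner:
        record["result"] = call_upstream(body)   # single execution
        record["event"].set()
    else:
        record["event"].wait()                   # coalescing: wait for the owner's result
    return record["result"]
```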
### How does budget control work?

- Pre-call estimation: Estimate cost based on model, messages, and `max_tokens`
- Hard cap check: Reject if the estimated cost exceeds the hard cap
- Soft cap check: Reduce `max_tokens` if the estimated cost exceeds the soft cap
- Post-call tracking: Record the actual cost in metrics
See Budget Control Guide for details.
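For intuition, a rough sketch of pre-call estimation and the two cap checks; the pricing table and the 4-characters-per-token heuristic below are placeholder assumptions, not ReliAPI's actual estimator:

```python
# Rough, illustrative cost-estimation sketch; prices and the 4-chars-per-token
# heuristic are placeholder assumptions, not ReliAPI's actual estimator.
PRICES_PER_1K_TOKENS = {"gpt-4o-mini": {"input": 0.00015, "output": 0.0006}}

class BudgetExceeded(Exception):
    pass

def check_budget(model, messages, max_tokens, soft_cap_usd, hard_cap_usd):
    price = PRICES_PER_1K_TOKENS[model]
    input_tokens = sum(len(m["content"]) for m in messages) // 4  # crude heuristic
    estimate = (input_tokens * price["input"] + max_tokens * price["output"]) / 1000

    if estimate > hard_cap_usd:
        # Hard cap: reject the request outright.
        raise BudgetExceeded(f"estimated ${estimate:.4f} > hard cap ${hard_cap_usd}")

    if estimate > soft_cap_usd:
        # Soft cap: shrink max_tokens so the estimate fits the budget.
        affordable = (soft_cap_usd * 1000 - input_tokens * price["input"]) / price["output"]
        max_tokens = max(1, int(affordable))

    return max_tokens, estimate

print(check_budget("gpt-4o-mini", [{"role": "user", "content": "Hello" * 200}], 1024, 0.0004, 0.05))
```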
### What happens when the soft cost cap is exceeded?

When the soft cost cap is exceeded, ReliAPI:

- Reduces `max_tokens` proportionally to fit the budget
- Sets `max_tokens_reduced: true` in the response meta
- Includes `original_max_tokens` in the response meta
- Re-estimates the cost with the reduced tokens

Clients can check `meta.max_tokens_reduced` to detect throttling.
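A small client-side sketch of that check, assuming a local ReliAPI instance and a response carrying the `meta` fields described above:

```python
import requests

# Assumes ReliAPI runs locally with an "openai" target; the response shape beyond
# meta.max_tokens_reduced / meta.original_max_tokens is as described above.
resp = requests.post(
    "http://localhost:8000/proxy/llm",
    json={
        "target": "openai",
        "model": "gpt-4o-mini",
        "max_tokens": 4096,
        "messages": [{"role": "user", "content": "Summarize the history of HTTP."}],
    },
    timeout=60,
).json()

meta = resp.get("meta", {})
if meta.get("max_tokens_reduced"):
    print(f"Throttled: max_tokens reduced from {meta.get('original_max_tokens')}")
```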
### How do I add a new target?

Add it to `config.yaml`:

```yaml
targets:
  my_target:
    base_url: "https://api.example.com"
    timeout_ms: 10000
    circuit:
      error_threshold: 5
      cooldown_s: 60
    cache:
      ttl_s: 300
      enabled: true
    auth:
      type: bearer_env
      env_var: API_KEY
```

See the Configuration Guide for details.
### How do I configure an LLM target?

```yaml
targets:
  openai:
    base_url: "https://api.openai.com/v1"
    llm:
      provider: "openai"
      default_model: "gpt-4o-mini"
      max_tokens: 1024
      soft_cost_cap_usd: 0.01
      hard_cost_cap_usd: 0.05
    auth:
      type: bearer_env
      env_var: OPENAI_API_KEY
```

See the Configuration Guide for details.
### How do I make an HTTP proxy request?

```bash
curl -X POST http://localhost:8000/proxy/http \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-key" \
  -d '{
    "target": "my_api",
    "method": "GET",
    "path": "/users/123",
    "idempotency_key": "req-123"
  }'
```

### How do I make an LLM request?

```bash
curl -X POST http://localhost:8000/proxy/llm \
  -H "Content-Type: application/json" \
  -d '{
    "target": "openai",
    "messages": [{"role": "user", "content": "Hello"}],
    "model": "gpt-4o-mini",
    "idempotency_key": "chat-123"
  }'
```

### How do I use idempotency keys?

Use the `Idempotency-Key` header or the `idempotency_key` field:

```bash
curl -X POST http://localhost:8000/proxy/llm \
  -H "Idempotency-Key: chat-123" \
  -d '{"target": "openai", "messages": [...]}'
```

Concurrent requests with the same key are coalesced.
### How do I check metrics?

```bash
curl http://localhost:8000/metrics
```

Metrics include:

- `reliapi_http_requests_total`
- `reliapi_llm_requests_total`
- `reliapi_errors_total`
- `reliapi_cache_hits_total`
- `reliapi_latency_ms`
- `reliapi_llm_cost_usd`
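For a quick look at one of those counters from the Prometheus text output (assuming a local instance):

```python
import requests

# Illustrative: scrape the Prometheus text format and print the cache-hit samples.
text = requests.get("http://localhost:8000/metrics", timeout=5).text
for line in text.splitlines():
    if line.startswith("reliapi_cache_hits_total"):
        print(line)
```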
### How do I check service health?

```bash
curl http://localhost:8000/healthz
```

Returns `{"status":"healthy"}` if the service is running.
### How do I debug errors?

Check the response `error` field:

```json
{
  "success": false,
  "error": {
    "type": "upstream_error",
    "code": "TIMEOUT",
    "message": "Request timed out",
    "retryable": true
  }
}
```

Common errors:

- `NOT_FOUND`: Target not found in config
- `BUDGET_EXCEEDED`: Cost exceeds hard cap
- `TIMEOUT`: Request timed out
- `CIRCUIT_OPEN`: Circuit breaker is open
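As a sketch only (assuming the response shape above; the 3-attempt exponential backoff is a made-up example, not a ReliAPI default), a client can key its retry logic off `error.retryable`:

```python
import time

import requests

# Illustrative client-side retry loop keyed off error.retryable;
# the attempt count and backoff policy here are just examples.
def call_with_retry(payload, attempts=3):
    for attempt in range(attempts):
        resp = requests.post("http://localhost:8000/proxy/llm", json=payload, timeout=60).json()
        if resp.get("success"):
            return resp
        error = resp.get("error", {})
        if not error.get("retryable"):
            raise RuntimeError(f"{error.get('code')}: {error.get('message')}")
        time.sleep(2 ** attempt)  # back off before the next try
    raise RuntimeError("retryable error persisted after all attempts")
```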
### Why isn't caching working?

Check:

- Cache is enabled in config: `cache.enabled: true`
- Redis is accessible
- TTL has not expired
- For LLM: POST caching requires `allow_post=True` (handled internally)
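One way to confirm Redis is reachable and holds entries; the `cache:*` key pattern below is a guess for illustration, so adjust it to whatever prefix your deployment actually uses:

```python
import redis

# Connectivity check plus a peek at stored keys; the "cache:*" pattern is an
# assumption for illustration, not necessarily ReliAPI's real key prefix.
r = redis.Redis(host="localhost", port=6379, decode_responses=True)
print(r.ping())                                   # True if Redis is reachable
for key in r.scan_iter("cache:*", count=100):
    print(key, "ttl:", r.ttl(key))
```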
### Why isn't idempotency working?

Check:

- The `Idempotency-Key` header or `idempotency_key` field is set
- Redis is accessible
- The request body matches the previous request (otherwise a conflict error is returned)
### Can I use multiple LLM providers?

Yes. Configure multiple targets:

```yaml
targets:
  openai:
    base_url: "https://api.openai.com/v1"
    llm:
      provider: "openai"
  anthropic:
    base_url: "https://api.anthropic.com/v1"
    llm:
      provider: "anthropic"
```

Use the `target` field in the request to select the provider.
### Does ReliAPI support fallback targets?

Yes. Configure `fallback_targets`:

```yaml
targets:
  openai:
    base_url: "https://api.openai.com/v1"
    fallback_targets: ["anthropic", "mistral"]
```

If the primary target fails, ReliAPI tries the fallback targets in order.
### How do I disable caching?

Set `cache.enabled: false`:

```yaml
targets:
  my_target:
    cache:
      enabled: false
```

Have more questions? Open an issue or check the Documentation.