Stability Definition

arminrad edited this page Mar 16, 2026 · 3 revisions

Reading path: Conceptual Model | Stability Definition (you are here) | Conceptual Model Features | Features | Delta Report | Features-Acceptance-Criteria

Read after: Conceptual Model (so you know what the system is)
Next: Conceptual Model Features (the full feature spec)


TL;DR — Stability = the system keeps its promise under all conditions. Three perspectives: User ("my request works, costs what I expect, I never see infrastructure"), Product Owner ("every feature works correctly and features don't break each other"), Engineer ("every component has a fallback, failures don't cascade, and we can see it happening"). The system has 6 known gaps vs the conceptual model: no guardrails, single region, no health-weighted routing, no SLA credit-backs, no customer webhooks, no traffic splitting.


The Core Promise

The conceptual model makes one commitment:

"One API key, every AI model, automatic reliability, one bill."

Stability is the degree to which the system keeps that promise under all conditions. If a developer integrates once and never has to think about provider outages, billing errors, or routing failures — the system is stable. The moment they write workaround code, manually switch providers, or debug a billing discrepancy — stability has broken.


1. User Perspective

"Stable means I don't have to think about it."

Requests always succeed

  • I send a request, I get a response. Provider outages are invisible to me.
  • My OpenAI/Anthropic SDK works by changing base_url only. Streaming, function calling, JSON mode, logprobs — all work unchanged.
  • Response latency is predictable. The model I asked for is the one that answers.

No surprises

  • Rate limits are communicated via headers (X-RateLimit-Remaining, Retry-After), not discovered through unexplained failures.
  • Credits deduct once per request. Provider errors (5xx) are auto-refunded. My balance matches what I expect.
  • When something is degraded, the status page and health endpoints reflect it.
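
As a rough illustration of how a client might act on these headers, the sketch below turns a response status and its headers into a pause duration. It assumes `Retry-After` carries a number of seconds and `X-RateLimit-Remaining` an integer; the function name `seconds_to_wait` and the one-second default are hypothetical, not part of any SDK.

```python
from typing import Mapping


def seconds_to_wait(status: int, headers: Mapping[str, str]) -> float:
    """Return how long a client should pause before its next request."""
    # Header names are case-insensitive, so normalise them first.
    lowered = {k.lower(): v for k, v in headers.items()}

    if status == 429:
        # Assumption: Retry-After is given in seconds (not an HTTP date).
        try:
            return max(0.0, float(lowered.get("retry-after", "1")))
        except ValueError:
            return 1.0  # unparseable value: fall back to a short pause

    # Not rate-limited yet: only slow down once the budget is exhausted.
    remaining = lowered.get("x-ratelimit-remaining")
    if remaining is not None and remaining.isdigit() and int(remaining) == 0:
        return 1.0
    return 0.0
```

A caller would sleep for the returned duration before retrying, so a 429 becomes a predictable delay instead of an unexplained failure.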

Edge cases handled

  • Trial expiration doesn't brick my app — :free models still work.
  • Streaming connections deliver chunks or time out cleanly; they never hang.
  • IP allowlists enforce exactly what's configured.

Summary

| Experience | What makes it stable |
|---|---|
| Request succeeded | 14-provider failover chain, transparent |
| It was fast | 7 cache layers, tiered health monitoring, concurrency control |
| Cost was correct | Pre-flight credit check, idempotent deduction, auto-refund on 5xx |
| Not blocked unfairly | Three-layer rate limiting, authenticated user bypass, tuned thresholds |
| I can see what happened | Activity logs, usage stats, model health endpoints, status page |

2. Product Owner Perspective

"Stable means every feature works correctly, consistently, and together."

Each functional block holds its promise

The conceptual model defines seven blocks: PROTECT, ROUTE, OPTIMIZE, BILL, CATALOG, PLATFORM, OBSERVE. Stability means all seven hold simultaneously.

| Block | Stable when... |
|---|---|
| PROTECT | Auth never leaks. Rate limits never fail open. Redis down → in-memory fallback activates, not "no limiting." |
| ROUTE | 120+ aliases resolve correctly. Failover triggers on 5xx/402, not on 400/429. Circuit breakers recover after cool-down. |
| OPTIMIZE | Health tiers classify models correctly. Passive monitoring adds zero request latency. Caches serve stale-but-valid data during provider outages. |
| BILL | Credits deduct atomically. Subscription allowance drains before purchased credits. High-value models never served at default pricing. |
| CATALOG | Models without pricing are excluded. Background sync keeps catalog fresh without hitting providers on the hot path. |
| PLATFORM | Chat history persists. Shared links remain accessible. Analytics don't lose data under load. |
| OBSERVE | Prometheus metrics reflect reality. Health dashboards show actual states. Alerts fire on real incidents, not 4xx noise. |

Features interact without conflict

| Interaction | Correct behavior |
|---|---|
| Failover + Billing | Cost calculated using the provider that actually served the request |
| Rate Limiting + Velocity Mode | Paid users get less restriction than free users during velocity mode |
| Circuit Breaker + Health | Open breaker reflected in health data; HALF_OPEN validated by real test request before CLOSED |
| Caching + Credits | Cache hit = no credit deduction. Cache miss = deduction after provider response. |
| Trial + Failover | Trial users benefit from failover. Provider failure doesn't consume trial budget. |
| Catalog + Pricing | Model without pricing blocked from catalog; never served at $0 |

Measured by

  • Conformance tests (25 checks in Testing Plan, Section 25) pass continuously.
  • No revenue leakage: no model served without correct pricing, no double-charges, no missed deductions.
  • Feature completeness vs conceptual model sections 2.2–2.11.

3. Engineer Perspective

"Stable means every component degrades gracefully, recovers automatically, and never cascades failure."

Core principle

From the conceptual model (Section 2.5):

"No cache failure ever blocks a user request."

This applies to every layer. No single component failure produces a user-visible outage.

Layer-by-layer contracts

Ingress (middleware/security_middleware.py)

  • IP rate limiting: residential 300 RPM, datacenter 60 RPM.
  • Velocity mode triggers on 5xx only (not 4xx). Auto-deactivates after 3 min.
  • Authenticated users bypass IP limits.
  • Failure mode: Bad thresholds → legitimate users get 429s. (See issue #1091: 166+ blocked requests from misconfigured velocity mode.)
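
The ingress rules above can be sketched as a sliding-window limiter with a velocity-mode flag. This is a minimal illustration, not the contents of `middleware/security_middleware.py`: the class name `IngressLimiter`, the 60-second error window, and the `error_threshold` trigger value are all hypothetical, since the page does not state what activates velocity mode beyond "5xx only".

```python
import time
from collections import defaultdict, deque

# Limits quoted in the text above (requests per minute).
IP_LIMITS_RPM = {"residential": 300, "datacenter": 60}
VELOCITY_DURATION_S = 180  # velocity mode auto-deactivates after 3 minutes


class IngressLimiter:
    def __init__(self, error_threshold=25, clock=time.monotonic):
        # error_threshold is hypothetical: the source gives no trigger value.
        self.error_threshold = error_threshold
        self.clock = clock
        self.hits = defaultdict(deque)  # ip -> request timestamps, last 60 s
        self.errors = deque()           # timestamps of recent 5xx responses
        self.velocity_until = 0.0

    def allow(self, ip, ip_class, authenticated=False):
        if authenticated:
            return True  # authenticated users bypass IP limits
        now = self.clock()
        window = self.hits[ip]
        while window and now - window[0] >= 60:
            window.popleft()  # drop requests older than one minute
        if len(window) >= IP_LIMITS_RPM[ip_class]:
            return False  # caller responds with 429
        window.append(now)
        return True

    def record_response(self, status):
        if status < 500:
            return  # 4xx and successes never count toward velocity mode
        now = self.clock()
        self.errors.append(now)
        while self.errors and now - self.errors[0] >= 60:
            self.errors.popleft()
        if len(self.errors) >= self.error_threshold:
            self.velocity_until = now + VELOCITY_DURATION_S

    def velocity_mode_active(self):
        return self.clock() < self.velocity_until
```

Passing the clock in as a parameter keeps the thresholds testable without real waiting, which matters because the stated failure mode is badly tuned thresholds.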

Routing (services/provider_failover.py)

  • Failover on: 401, 402, 403, 404, 502, 503, 504.
  • No failover on: 400 (user error), 429 (use backoff).
  • Model-aware: OpenAI models → OpenAI/OpenRouter only. Open-source → all providers.
  • Circuit breaker: 5 consecutive failures → OPEN. 5 min cool-down → HALF_OPEN → test → CLOSED.
  • Failure mode: Wrong failover → format mismatch or cost mismatch. No failover → user sees raw provider errors.
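
The failover rule and breaker states above can be sketched as follows. This is an illustrative model under stated assumptions, not the code in `services/provider_failover.py`: the names `should_failover` and `CircuitBreaker` are hypothetical, and the status set simply mirrors the list as written on this page.

```python
import time

# Status codes copied from the list above.
FAILOVER_STATUSES = {401, 402, 403, 404, 502, 503, 504}


def should_failover(status: int) -> bool:
    """400 (user error) and 429 (use backoff) are deliberately absent."""
    return status in FAILOVER_STATUSES


class CircuitBreaker:
    FAILURE_THRESHOLD = 5  # consecutive failures before OPEN
    COOL_DOWN_S = 300      # 5 minutes before HALF_OPEN

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at = 0.0

    def allow_request(self) -> bool:
        cooled = self.clock() - self.opened_at >= self.COOL_DOWN_S
        if self.state == "OPEN" and cooled:
            # Requests are allowed again; their outcome decides the next state.
            self.state = "HALF_OPEN"
        return self.state != "OPEN"

    def record_success(self):
        self.failures = 0
        self.state = "CLOSED"

    def record_failure(self):
        self.failures += 1
        # A failed test request in HALF_OPEN reopens the breaker at once.
        if self.state == "HALF_OPEN" or self.failures >= self.FAILURE_THRESHOLD:
            self.state = "OPEN"
            self.opened_at = self.clock()
```

A router would call `allow_request()` before trying a provider and skip to the next one in the chain when it returns `False`.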

Health (services/intelligent_health_monitor.py, services/passive_health_monitor.py)

  • Tiered active checks: critical 5min, popular 30min, standard 2h, on-demand 4h.
  • Passive: every real request updates health in a background task, so it adds no latency to the request path.
  • Database-backed persistence (survives restarts).
  • Failure mode: Dead providers stay in the routing chain → requests go into black holes.
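
The tiered schedule above reduces to a small interval table. The sketch below is an assumption about how such a scheduler could decide which models to probe next; the names `CHECK_INTERVAL_S`, `due_for_check`, and `models_due` are hypothetical, and treating a never-checked model as due is a choice made here, not something the page states.

```python
from typing import Dict, List, Optional, Tuple

# Active-check intervals from the list above, in seconds.
CHECK_INTERVAL_S = {
    "critical": 5 * 60,
    "popular": 30 * 60,
    "standard": 2 * 60 * 60,
    "on_demand": 4 * 60 * 60,
}


def due_for_check(tier: str, last_checked_at: Optional[float], now: float) -> bool:
    """A model is due if it was never checked or its tier interval has elapsed."""
    if last_checked_at is None:
        return True
    return now - last_checked_at >= CHECK_INTERVAL_S[tier]


def models_due(models: Dict[str, Tuple[str, Optional[float]]], now: float) -> List[str]:
    """models maps a model id to (tier, last_checked_at)."""
    return sorted(
        model_id
        for model_id, (tier, last) in models.items()
        if due_for_check(tier, last, now)
    )
```

Because `last_checked_at` is plain data, it can be loaded from the database after a restart, which is what the persistence bullet implies.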

Caching (7 layers)

Lookup order: Semantic → Exact-match → External (Butter.dev) → Provider API

Supporting: Auth cache, Catalog L1/L2, DB query cache, Health cache, Local memory fallback.

  • Redis failure → in-memory LRU (500 entries, 15 min TTL).
  • Catalog cache has stampede protection (one rebuild at a time).
  • Failure mode: All caches miss → every request hits DB + provider. Latency spikes 5ms → 500ms. Under load, connection pool exhausts.
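
A minimal sketch of the local fallback and of a lookup chain that tolerates broken layers, assuming each cache layer exposes a `get(key)` method. `LocalFallbackCache` and `cached_lookup` are hypothetical names; only the 500-entry and 15-minute figures come from the text above.

```python
import time
from collections import OrderedDict


class LocalFallbackCache:
    """In-memory LRU with a TTL, used when Redis is unavailable."""

    def __init__(self, max_entries=500, ttl_s=15 * 60, clock=time.monotonic):
        self.max_entries = max_entries
        self.ttl_s = ttl_s
        self.clock = clock
        self._data = OrderedDict()  # key -> (expires_at, value)

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        expires_at, value = item
        if self.clock() >= expires_at:
            del self._data[key]  # expired entries count as misses
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key, value):
        self._data[key] = (self.clock() + self.ttl_s, value)
        self._data.move_to_end(key)
        while len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # evict least recently used


def cached_lookup(key, layers, fetch_from_provider):
    """Try each cache layer in order; fall through to the provider."""
    for layer in layers:
        try:
            value = layer.get(key)
        except Exception:
            continue  # a broken cache layer must never block the request
        if value is not None:
            return value
    return fetch_from_provider(key)
```

Swallowing layer errors inside the loop is what turns "No cache failure ever blocks a user request" into code: the worst case is a slower request, not a failed one.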

Database (config/supabase_config.py)

  • Pool: 80 primary, 30–100 read replica, separate bulk pool.
  • Thread-safe singleton with double-checked locking.
  • Failed init retries after 60s (no reconnect storms).
  • Failure mode: Pool exhaustion → all layers fail (auth, billing, catalog, history).
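
The singleton and retry rules above can be sketched as a lazily initialised holder. This is an illustration under assumptions, not the code in `config/supabase_config.py`: `LazyClient` and its `factory` argument are hypothetical, and raising `RuntimeError` during the 60-second back-off is a choice made for the example.

```python
import threading
import time


class LazyClient:
    """Create one shared client on first use, with a retry back-off."""

    def __init__(self, factory, retry_after_s=60, clock=time.monotonic):
        self._factory = factory          # hypothetical: builds the real client
        self._retry_after_s = retry_after_s
        self._clock = clock
        self._lock = threading.Lock()
        self._client = None
        self._last_failure_at = None

    def get(self):
        if self._client is not None:     # first check, no lock taken
            return self._client
        with self._lock:
            if self._client is not None:  # second check, under the lock
                return self._client
            now = self._clock()
            recently_failed = (
                self._last_failure_at is not None
                and now - self._last_failure_at < self._retry_after_s
            )
            if recently_failed:
                # Refuse to reconnect yet, so callers cannot cause a storm.
                raise RuntimeError("database init failed recently; retry later")
            try:
                self._client = self._factory()
            except Exception:
                self._last_failure_at = now
                raise
            return self._client
```

The unlocked first check keeps the hot path cheap once the client exists, while the locked second check guarantees that only one thread ever runs the factory.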

Concurrency (middleware/concurrency_middleware.py)

  • Semaphore: 20 concurrent, 50 queued, 10s queue timeout.
  • Overload → 503 (not 429).
  • Streaming endpoints exempt.
  • Failure mode: Without this, traffic spikes overwhelm the event loop → death spiral.
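
A minimal `asyncio` sketch of the gate described above, using the same defaults (20 concurrent, 50 queued, 10 s queue timeout). `ConcurrencyGate` and its `(status, body)` return shape are hypothetical, and the streaming exemption is not modelled: exempt endpoints would simply not be routed through the gate.

```python
import asyncio


class ConcurrencyGate:
    def __init__(self, max_concurrent=20, max_queued=50, queue_timeout_s=10.0):
        self._slots = asyncio.Semaphore(max_concurrent)
        self._max_queued = max_queued
        self._queue_timeout_s = queue_timeout_s
        self._waiting = 0  # requests currently queued for a slot

    async def run(self, handler):
        """Run handler() under the gate and return (status, body)."""
        if self._slots.locked():
            if self._waiting >= self._max_queued:
                return 503, "queue full"  # overload is 503, not 429
            self._waiting += 1
            try:
                await asyncio.wait_for(
                    self._slots.acquire(), self._queue_timeout_s
                )
            except asyncio.TimeoutError:
                return 503, "queue timeout"
            finally:
                self._waiting -= 1
        else:
            await self._slots.acquire()  # free slot: returns without waiting
        try:
            return 200, await handler()
        finally:
            self._slots.release()
```

Bounding both the running set and the queue means excess traffic is rejected quickly instead of piling up on the event loop, which is the death spiral the failure mode describes.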

Credits (db/credit_transactions.py, services/pricing.py)

  • Pre-flight: estimate max cost before provider call. Insufficient → 402 immediately.
  • Idempotent: same request ID → single deduction.
  • Atomic: balance + transaction in one DB operation.
  • Auto-refund: 5xx → credits returned. 4xx → credits kept.
  • High-value protection: GPT-4/Claude/Gemini/o-series blocked if pricing falls to default.
  • Failure mode: Double-charging, under-billing expensive models, or negative balances.
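
The credit rules above can be illustrated with an in-memory ledger. This is a sketch under assumptions, not the code in `db/credit_transactions.py`: `CreditLedger` and `InsufficientCredits` are hypothetical names, amounts are integers in an unspecified unit, and a lock stands in for the single database operation that provides atomicity in the real system.

```python
import threading


class InsufficientCredits(Exception):
    """Corresponds to the immediate 402 described above."""


class CreditLedger:
    def __init__(self, balance: int):
        self.balance = balance
        self.charges = {}  # request_id -> amount currently deducted
        self._lock = threading.Lock()  # stand-in for one atomic DB operation

    def preflight(self, max_cost: int) -> None:
        """Reject before any provider call if the worst case is unaffordable."""
        if max_cost > self.balance:
            raise InsufficientCredits("402: insufficient credits")

    def deduct(self, request_id: str, amount: int) -> int:
        with self._lock:
            if request_id in self.charges:
                return self.balance  # idempotent: same request ID, one deduction
            if amount > self.balance:
                raise InsufficientCredits("402: insufficient credits")
            self.balance -= amount
            self.charges[request_id] = amount
            return self.balance

    def settle(self, request_id: str, provider_status: int) -> int:
        """Refund on provider 5xx; keep the charge on 4xx and success."""
        with self._lock:
            charged = self.charges.get(request_id, 0)
            if provider_status >= 500 and charged:
                self.balance += charged
                self.charges[request_id] = 0  # a repeated refund is a no-op
            return self.balance
```

Keying both deduction and refund on the request ID is what prevents the double-charging and negative-balance failure modes when a request is retried.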

Failure isolation matrix

| Component fails | Fallback | User impact |
|---|---|---|
| Redis | In-memory LRU + local rate limiting | None |
| Primary provider | 14-provider failover | None |
| Database (momentary) | Cached auth + cached catalog | None for reads |
| Database (extended) | Degraded; health reports it | Partial (new auth/billing fail) |
| Health monitor | Passive monitoring from real requests | Reduced visibility |
| Prometheus/Grafana | System functions, not observable | None for users |
| Sentry | Errors logged locally | None for users |
| Stripe webhook | Returns 200, retries later | None (eventual consistency) |

Known Gaps vs Conceptual Model

Features described in the conceptual model but not yet fully implemented:

| Gap | Conceptual Model Section | Impact on stability |
|---|---|---|
| Input/Output guardrails (PII, injection, moderation) | 2.2 | No content safety layer yet |
| Multi-region deployment | 2.11 | Single region = single point of geographic failure |
| Health-weighted routing (route to healthiest first) | 2.3 | Always tries primary provider first, even if degraded |
| SLA tracking with credit-back | 2.7 | No automatic compensation for SLA breaches |
| Customer webhooks (credits.low, model.degraded) | 2.7 | Customers can't automate on system events |
| Traffic splitting across providers | 2.3 | Over-reliance on primary provider per model |

One-Line Definitions

  • User: "My request works, costs what I expect, and I never see infrastructure."
  • Product Owner: "Every feature works correctly and features don't break each other."
  • Engineer: "Every component has a fallback, failures don't cascade, and we can see it happening."
