Stability Definition

Reading path: Conceptual Model | Stability Definition (you are here) | Conceptual Model Features | Features | Delta Report | Features-Acceptance-Criteria

Read after: Conceptual Model (so you know what the system is). Next: Conceptual Model Features (the full feature spec).

TL;DR — Stability = the system keeps its promise under all conditions. Three perspectives: User ("my request works, costs what I expect, I never see infrastructure"), Product Owner ("every feature works correctly and features don't break each other"), Engineer ("every component has a fallback, failures don't cascade, and we can see it happening"). The system has 6 known gaps vs the conceptual model: no guardrails, single region, no health-weighted routing, no SLA credit-backs, no customer webhooks, no traffic splitting.

The Core Promise

The conceptual model makes one commitment:

"One API key, every AI model, automatic reliability, one bill."

Stability is the degree to which the system keeps that promise under all conditions. If a developer integrates once and never has to think about provider outages, billing errors, or routing failures — the system is stable. The moment they write workaround code, manually switch providers, or debug a billing discrepancy — stability has broken.

1. User Perspective

"Stable means I don't have to think about it."

Requests always succeed

I send a request, I get a response. Provider outages are invisible to me.
My OpenAI/Anthropic SDK works by changing base_url only. Streaming, function calling, JSON mode, logprobs — all work unchanged.
Response latency is predictable. The model I asked for is the one that answers.

No surprises

Rate limits are communicated via headers (X-RateLimit-Remaining, Retry-After), not discovered through unexplained failures.
Credits deduct once per request. Provider errors (5xx) are auto-refunded. My balance matches what I expect.
When something is degraded, the status page and health endpoints reflect it.
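A client can act on those headers instead of retrying blindly. The sketch below is illustrative only: the function name `seconds_to_wait` and the 1-second defaults are hypothetical, and it assumes `Retry-After` carries a delay in seconds (HTTP also allows a date form, which is not parsed here).

```python
from typing import Mapping, Optional


def seconds_to_wait(status: int, headers: Mapping[str, str]) -> Optional[float]:
    """Return how long to pause before the next request, or None to proceed."""
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}

    retry_after = lowered.get("retry-after")
    if status == 429 and retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            # Assumption: a non-numeric value (e.g. an HTTP date) falls back to 1s.
            return 1.0

    remaining = lowered.get("x-ratelimit-remaining")
    if remaining is not None and remaining.isdigit() and int(remaining) == 0:
        # Budget exhausted but not yet rejected: pause briefly (hypothetical default).
        return 1.0
    return None
```

For example, a 429 response with `Retry-After: 12` yields a 12-second pause, while a 200 response with budget remaining yields `None`.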

Edge cases handled

Trial expiration doesn't brick my app — :free models still work.
Streaming connections deliver chunks or time out cleanly; they never hang.
IP allowlists enforce exactly what's configured.

Summary

| Experience | What makes it stable |
| --- | --- |
| Request succeeded | 14-provider failover chain, transparent |
| It was fast | 7 cache layers, tiered health monitoring, concurrency control |
| Cost was correct | Pre-flight credit check, idempotent deduction, auto-refund on 5xx |
| Not blocked unfairly | Three-layer rate limiting, authenticated user bypass, tuned thresholds |
| I can see what happened | Activity logs, usage stats, model health endpoints, status page |

2. Product Owner Perspective

"Stable means every feature works correctly, consistently, and together."

Each functional block holds its promise

The conceptual model defines seven blocks: PROTECT, ROUTE, OPTIMIZE, BILL, CATALOG, PLATFORM, OBSERVE. Stability means all seven hold simultaneously.

| Block | Stable when... |
| --- | --- |
| PROTECT | Auth never leaks. Rate limits never fail open. Redis down → in-memory fallback activates, not "no limiting." |
| ROUTE | 120+ aliases resolve correctly. Failover triggers on 401/402/403/404 and 502/503/504, not on 400/429. Circuit breakers recover after cool-down. |
| OPTIMIZE | Health tiers classify models correctly. Passive monitoring adds zero request latency. Caches serve stale-but-valid data during provider outages. |
| BILL | Credits deduct atomically. Subscription allowance drains before purchased credits. High-value models are never served at default pricing. |
| CATALOG | Models without pricing are excluded. Background sync keeps the catalog fresh without hitting providers on the hot path. |
| PLATFORM | Chat history persists. Shared links remain accessible. Analytics don't lose data under load. |
| OBSERVE | Prometheus metrics reflect reality. Health dashboards show actual states. Alerts fire on real incidents, not 4xx noise. |

Features interact without conflict

| Interaction | Correct behavior |
| --- | --- |
| Failover + Billing | Cost calculated using the provider that actually served the request |
| Rate Limiting + Velocity Mode | Paid users get less restriction than free users during velocity mode |
| Circuit Breaker + Health | Open breaker reflected in health data; HALF_OPEN validated by a real test request before CLOSED |
| Caching + Credits | Cache hit = no credit deduction. Cache miss = deduction after provider response. |
| Trial + Failover | Trial users benefit from failover. Provider failure doesn't consume trial budget. |
| Catalog + Pricing | Model without pricing is blocked from the catalog and never served at $0 |

Measured by

Conformance tests (25 checks in Testing Plan, Section 25) pass continuously.
No revenue leakage: no model served without correct pricing, no double-charges, no missed deductions.
Feature completeness vs conceptual model sections 2.2–2.11.

3. Technical Perspective

"Stable means every component degrades gracefully, recovers automatically, and never cascades failure."

Core principle

From the conceptual model (Section 2.5):

"No cache failure ever blocks a user request."

This applies to every layer. No single component failure produces a user-visible outage.

Layer-by-layer contracts

Ingress (`middleware/security_middleware.py`)

IP rate limiting: residential 300 RPM, datacenter 60 RPM.
Velocity mode triggers on 5xx only (not 4xx). Auto-deactivates after 3 min.
Authenticated users bypass IP limits.
Failure mode: Bad thresholds → legitimate users get 429s. (See issue #1091: 166+ blocked requests from misconfigured velocity mode.)
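The ingress rules above can be sketched as a small limiter. This is a minimal illustration, not the code in `middleware/security_middleware.py`: the sliding-window algorithm, the class name `IpRateLimiter`, and the helper `counts_toward_velocity_mode` are assumptions, and the velocity-mode trigger threshold is left out because the text does not state it.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict

# Limits from the contract above, in requests per minute.
RPM_LIMITS = {"residential": 300, "datacenter": 60}
WINDOW_SECONDS = 60.0


class IpRateLimiter:
    """Sliding-window limiter keyed by client IP (illustrative only)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, ip: str, ip_class: str, authenticated: bool) -> bool:
        # Authenticated users bypass IP limits entirely.
        if authenticated:
            return True
        now = self._clock()
        hits = self._hits[ip]
        # Drop timestamps that have left the 60-second window.
        while hits and now - hits[0] >= WINDOW_SECONDS:
            hits.popleft()
        if len(hits) >= RPM_LIMITS[ip_class]:
            return False  # the caller would respond with 429
        hits.append(now)
        return True


def counts_toward_velocity_mode(status: int) -> bool:
    """Only 5xx responses feed the velocity-mode trigger; 4xx never do."""
    return 500 <= status <= 599
```

Injecting the clock keeps the window logic testable without sleeping.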

Routing (`services/provider_failover.py`)

Failover on: 401, 402, 403, 404, 502, 503, 504.
No failover on: 400 (user error), 429 (use backoff).
Model-aware: OpenAI models → OpenAI/OpenRouter only. Open-source → all providers.
Circuit breaker: 5 consecutive failures → OPEN. 5 min cool-down → HALF_OPEN → test → CLOSED.
Failure mode: Wrong failover → format mismatch or cost mismatch. No failover → user sees raw provider errors.
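As a rough sketch of the routing contract, the status-code decision and the breaker state machine might look like the following. The names `should_failover` and `CircuitBreaker` are hypothetical, and re-opening the breaker when the HALF_OPEN test request fails is an assumption; the text only describes the success path.

```python
import time
from typing import Callable

FAILOVER_STATUSES = frozenset({401, 402, 403, 404, 502, 503, 504})
FAILURE_THRESHOLD = 5        # consecutive failures before OPEN
COOL_DOWN_SECONDS = 5 * 60   # OPEN -> HALF_OPEN after 5 minutes


def should_failover(status: int) -> bool:
    """400 (user error) and 429 (back off instead) are not in the set."""
    return status in FAILOVER_STATUSES


class CircuitBreaker:
    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._failures = 0
        self._opened_at = 0.0
        self._state = "CLOSED"

    @property
    def state(self) -> str:
        # Promote OPEN to HALF_OPEN lazily once the cool-down has elapsed.
        if self._state == "OPEN" and self._clock() - self._opened_at >= COOL_DOWN_SECONDS:
            self._state = "HALF_OPEN"
        return self._state

    def allow_request(self) -> bool:
        # HALF_OPEN lets a test request through; OPEN rejects.
        return self.state != "OPEN"

    def record_success(self) -> None:
        # A successful request (including the HALF_OPEN test) closes the breaker.
        self._failures = 0
        self._state = "CLOSED"

    def record_failure(self) -> None:
        self._failures += 1
        # Assumption: a failed HALF_OPEN test re-opens immediately.
        if self.state == "HALF_OPEN" or self._failures >= FAILURE_THRESHOLD:
            self._state = "OPEN"
            self._opened_at = self._clock()
```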

Health (`services/intelligent_health_monitor.py`, `services/passive_health_monitor.py`)

Tiered active checks: critical 5min, popular 30min, standard 2h, on-demand 4h.
Passive: every real request updates health in a background task, so it adds no request latency.
Database-backed persistence (survives restarts).
Failure mode: Dead providers stay in the routing chain → requests go into black holes.
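The tiered schedule reduces to a lookup plus a comparison. A minimal sketch, assuming a hypothetical `is_check_due` helper and tier keys chosen for illustration (the real monitor may name and store tiers differently):

```python
from datetime import datetime, timedelta

# Active-check intervals per tier, as listed above.
CHECK_INTERVALS = {
    "critical": timedelta(minutes=5),
    "popular": timedelta(minutes=30),
    "standard": timedelta(hours=2),
    "on_demand": timedelta(hours=4),
}


def is_check_due(tier: str, last_checked: datetime, now: datetime) -> bool:
    """True once the tier's interval has fully elapsed since the last check."""
    return now - last_checked >= CHECK_INTERVALS[tier]
```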

Caching (7 layers)

Lookup order: Semantic → Exact-match → External (Butter.dev) → Provider API

Supporting: Auth cache, Catalog L1/L2, DB query cache, Health cache, Local memory fallback.

Redis failure → in-memory LRU (500 entries, 15 min TTL).
Catalog cache has stampede protection (one rebuild at a time).
Failure mode: All caches miss → every request hits DB + provider. Latency spikes 5ms → 500ms. Under load, connection pool exhausts.
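The in-memory fallback can be pictured as a bounded LRU map with per-entry expiry. The sketch below uses the 500-entry and 15-minute figures from above; the class name `LocalFallbackCache` is hypothetical, and it is not thread-safe as written (a real middleware would need a lock or a per-worker instance).

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Optional, Tuple

MAX_ENTRIES = 500
TTL_SECONDS = 15 * 60


class LocalFallbackCache:
    """Bounded LRU cache with a fixed TTL (illustrative only)."""

    def __init__(self, max_entries: int = MAX_ENTRIES, ttl: float = TTL_SECONDS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max = max_entries
        self._ttl = ttl
        self._clock = clock
        self._data: "OrderedDict[str, Tuple[float, Any]]" = OrderedDict()

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at >= self._ttl:
            del self._data[key]  # expired
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key: str, value: Any) -> None:
        self._data[key] = (self._clock(), value)
        self._data.move_to_end(key)
        while len(self._data) > self._max:
            self._data.popitem(last=False)  # evict the least recently used entry
```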

Database (`config/supabase_config.py`)

Connection pools: 80 primary, 30–100 read replica, plus a separate bulk pool.
Thread-safe singleton with double-checked locking.
Failed init retries after 60s (no reconnect storms).
Failure mode: Pool exhaustion → all layers fail (auth, billing, catalog, history).
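Double-checked locking with a retry window might be sketched as follows. `LazyClient` and its `factory` argument are hypothetical stand-ins for the real client construction in `config/supabase_config.py`; only the 60-second retry figure comes from the text.

```python
import threading
import time
from typing import Any, Callable, Optional

RETRY_AFTER_SECONDS = 60.0


class LazyClient:
    """Lazily built singleton that refuses to re-initialise too eagerly."""

    def __init__(self, factory: Callable[[], Any],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._factory = factory
        self._clock = clock
        self._lock = threading.Lock()
        self._client: Optional[Any] = None
        self._last_failure: Optional[float] = None

    def get(self) -> Any:
        # First check without the lock: the common, already-initialised path.
        client = self._client
        if client is not None:
            return client
        with self._lock:
            # Second check under the lock: another thread may have won the race.
            if self._client is not None:
                return self._client
            if (self._last_failure is not None
                    and self._clock() - self._last_failure < RETRY_AFTER_SECONDS):
                # Fail fast instead of hammering the database (no reconnect storm).
                raise RuntimeError("client unavailable; retry window not elapsed")
            try:
                self._client = self._factory()
            except Exception:
                self._last_failure = self._clock()
                raise
            return self._client
```

The retry window is what prevents a burst of callers from each attempting a fresh connection while the database is down.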

Concurrency (`middleware/concurrency_middleware.py`)

Semaphore: 20 concurrent, 50 queued, 10s queue timeout.
Overload → 503 (not 429).
Streaming endpoints exempt.
Failure mode: Without this, traffic spikes overwhelm the event loop → death spiral.
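A bounded gate of this kind can be built from an `asyncio.Semaphore` plus a waiter count. The sketch below is an assumption about shape, not the code in `middleware/concurrency_middleware.py`: `ConcurrencyGate` and `Overloaded` are hypothetical names, and the streaming exemption is omitted (exempt endpoints would simply skip the gate).

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class Overloaded(Exception):
    """Raised when the gate is saturated; a server would map this to HTTP 503."""


class ConcurrencyGate:
    # Construct inside a running event loop so the semaphore binds to it
    # on older Python versions.
    def __init__(self, max_concurrent: int = 20, max_queued: int = 50,
                 queue_timeout: float = 10.0) -> None:
        self._sem = asyncio.Semaphore(max_concurrent)
        self._max_queued = max_queued
        self._queue_timeout = queue_timeout
        self._waiting = 0

    async def run(self, handler: Callable[[], Awaitable[T]]) -> T:
        if self._sem.locked():
            # All slots busy: join the queue only if it has room.
            if self._waiting >= self._max_queued:
                raise Overloaded("queue full")
            self._waiting += 1
            try:
                await asyncio.wait_for(self._sem.acquire(), self._queue_timeout)
            except asyncio.TimeoutError:
                raise Overloaded("queue timeout") from None
            finally:
                self._waiting -= 1
        else:
            await self._sem.acquire()  # a slot is free, so this returns at once
        try:
            return await handler()
        finally:
            self._sem.release()
```

Rejecting with a distinct exception keeps overload (503) separate from rate limiting (429), matching the contract above.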

Credits (`db/credit_transactions.py`, `services/pricing.py`)

Pre-flight: estimate max cost before provider call. Insufficient → 402 immediately.
Idempotent: same request ID → single deduction.
Atomic: balance + transaction in one DB operation.
Auto-refund: 5xx → credits returned. 4xx → credits kept.
High-value protection: GPT-4/Claude/Gemini/o-series blocked if pricing falls to default.
Failure mode: Double-charging, under-billing expensive models, or negative balances.
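The credit rules can be modelled with an in-memory ledger. This is a sketch under stated assumptions: `CreditLedger` is a hypothetical name, balances are integers in some smallest credit unit, and the two writes in `deduct` stand in for what the text describes as a single atomic DB operation.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


class InsufficientCredits(Exception):
    """Raised before any provider call; a server would map this to HTTP 402."""


@dataclass
class CreditLedger:
    balance: int
    # request_id -> amount charged; doubles as the idempotency record.
    charges: Dict[str, int] = field(default_factory=dict)
    refunded: Set[str] = field(default_factory=set)

    def preflight(self, estimated_max_cost: int) -> None:
        if self.balance < estimated_max_cost:
            raise InsufficientCredits("estimated cost exceeds balance")

    def deduct(self, request_id: str, cost: int) -> bool:
        """Charge once per request ID; return False for a repeat."""
        if request_id in self.charges:
            return False
        # In a real store these two writes would be one atomic operation.
        self.balance -= cost
        self.charges[request_id] = cost
        return True

    def settle(self, request_id: str, provider_status: int) -> None:
        """Refund once on a provider 5xx; keep the charge on 4xx."""
        if not 500 <= provider_status <= 599:
            return
        if request_id in self.charges and request_id not in self.refunded:
            self.balance += self.charges[request_id]
            self.refunded.add(request_id)
```

Keeping the charge record after a refund means a replayed request ID is neither charged nor refunded a second time.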

Failure isolation matrix

| Component fails | Fallback | User impact |
| --- | --- | --- |
| Redis | In-memory LRU + local rate limiting | None |
| Primary provider | 14-provider failover | None |
| Database (momentary) | Cached auth + cached catalog | None for reads |
| Database (extended) | Degraded; health endpoints report it | Partial (new auth/billing fail) |
| Health monitor | Passive monitoring from real requests | Reduced visibility |
| Prometheus/Grafana | System functions, not observable | None for users |
| Sentry | Errors logged locally | None for users |
| Stripe webhook | Returns 200, retries later | None (eventual consistency) |

Known Gaps vs Conceptual Model

Features described in the conceptual model but not yet fully implemented:

| Gap | Conceptual Model Section | Impact on stability |
| --- | --- | --- |
| Input/Output guardrails (PII, injection, moderation) | 2.2 | No content safety layer yet |
| Multi-region deployment | 2.11 | Single region = single point of geographic failure |
| Health-weighted routing (route to healthiest first) | 2.3 | Always tries primary provider first, even if degraded |
| SLA tracking with credit-back | 2.7 | No automatic compensation for SLA breaches |
| Customer webhooks (`credits.low`, `model.degraded`) | 2.7 | Customers can't automate on system events |
| Traffic splitting across providers | 2.3 | Over-reliance on primary provider per model |

One-Line Definitions

User: "My request works, costs what I expect, and I never see infrastructure."
Product Owner: "Every feature works correctly and features don't break each other."
Engineer: "Every component has a fallback, failures don't cascade, and we can see it happening."
