Features Acceptance Criteria

Detailed acceptance criteria for every Gatewayz feature, organized by the Conceptual Model's 10-layer architecture.

Each feature includes: what it must do, what it must NOT do, detailed acceptance criteria with verification methods, code-level references, known issues, and implementation status.

Derived from: Conceptual Model Features, Features, Delta Report, Testing Plan, Test Coverage Audit

Last Updated: 2026-03-09 | Version: 2.0.4

How to Read This Document

Each feature section includes:

Description: What the feature does and its boundaries (from Conceptual Model)
Implementation Status: Current state (Complete / Partial / Not Implemented)
Acceptance Criteria: Numbered, testable statements — a feature is accepted when ALL criteria pass
Code References: File paths and line numbers for verification
Known Issues: Bugs, gaps, or discrepancies found during code investigation
Priority: P0 (must fix before release), P1 (should fix), P2 (nice to have), Deferred (post-release)

Layer 1: Ingress

1.1 API Key Authentication

Status: Complete

What it does: Authenticates every API request using API keys encrypted at rest with Fernet AES-128. Keys are looked up via HMAC-SHA256 hash for O(log n) retrieval. Validates that keys are active, not expired, and not rate-limited.

What it does NOT do: No OAuth/JWT for API requests. No automatic key rotation. No multi-key auth per request.

#	Criterion	Verification	Priority
AC-1.1.1	Valid API key in `Authorization: Bearer gw_*` header returns 200	Send request with valid key	P0
AC-1.1.2	Invalid API key returns 401, never 200 or 500	Send request with `Bearer invalid_key`	P0
AC-1.1.3	Expired API key returns 401 with clear message	Use a key past its `expires_at`	P0
AC-1.1.4	Deactivated API key (`is_active=false`) returns 401	Deactivate key, then use it	P0
AC-1.1.5	API keys in DB are Fernet-encrypted ciphertext, never plaintext	Query `api_keys_new` table directly, verify encrypted_key column is ciphertext	P0
AC-1.1.6	Key lookup uses HMAC-SHA256 hash index, not brute-force decryption of all keys	Verify `key_hash` column is indexed; compare lookup timing with 1 key vs 1000 keys to confirm O(log n) behavior	P0
AC-1.1.7	Key format is `gw_{env}_{43_random_chars}` (e.g., `gw_live_abc123...`)	Create new key, verify format regex	P0
AC-1.1.8	Key creation stores `last4` characters for user-friendly identification	Create key, check `last4` field in response and DB	P1
AC-1.1.9	Authentication is cached (5-min TTL, 512-entry LRU) — second request with same key is faster	Time two consecutive auth calls, second should be <5ms vs 50-150ms	P1
AC-1.1.10	When Redis is down, auth cache falls back to local memory — requests are never blocked	Stop Redis, verify auth still works	P0
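A minimal sketch of the auth cache behavior in AC-1.1.9 (5-minute TTL, 512-entry LRU), assuming an in-process cache keyed by the key's HMAC hash. `TTLLRUCache` and its injectable `clock` are hypothetical names for illustration, not the structure in `src/security/deps.py`; the Redis tier and fallback from AC-1.1.10 are omitted.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Hashable, Optional


class TTLLRUCache:
    """Bounded cache: entries expire after `ttl` seconds; the least recently used entry is evicted when full."""

    def __init__(self, maxsize: int = 512, ttl: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.maxsize = maxsize
        self.ttl = ttl
        self.clock = clock
        self._data: "OrderedDict[Hashable, tuple[float, Any]]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        item = self._data.get(key)
        if item is None:
            return None
        expires_at, value = item
        if self.clock() >= expires_at:
            del self._data[key]  # stale entry: caller falls through to a fresh DB lookup
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key: Hashable, value: Any) -> None:
        self._data[key] = (self.clock() + self.ttl, value)
        self._data.move_to_end(key)
        while len(self._data) > self.maxsize:
            self._data.popitem(last=False)  # evict the least recently used entry
```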

Code References:

src/security/security.py — Fernet encryption, HMAC hashing
src/security/deps.py — get_api_key(), get_current_user(), validate_api_key_security()
src/db/api_keys.py — Key CRUD, key lookup by hash
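The key format (AC-1.1.7), hash-based lookup (AC-1.1.6), and `last4` field (AC-1.1.8) can be sketched with the standard library. `HASH_SECRET`, `generate_api_key`, `key_hash`, and `last4` are hypothetical names, and env values other than `live` are assumptions. Fernet encryption of the stored key (via the `cryptography` package) is a separate step and is not shown.

```python
import hashlib
import hmac
import re
import secrets

# Hypothetical server-side secret used only to derive lookup hashes.
HASH_SECRET = b"replace-with-server-secret"
KEY_PATTERN = re.compile(r"^gw_[a-z]+_[A-Za-z0-9_-]{43}$")


def generate_api_key(env: str = "live") -> str:
    # token_urlsafe(32) encodes 32 random bytes as 43 URL-safe base64 characters.
    return f"gw_{env}_{secrets.token_urlsafe(32)}"


def key_hash(api_key: str) -> str:
    # Deterministic HMAC-SHA256 digest: stored in an indexed column so a key is
    # found with one equality lookup instead of decrypting every stored key.
    return hmac.new(HASH_SECRET, api_key.encode(), hashlib.sha256).hexdigest()


def last4(api_key: str) -> str:
    # Short suffix shown to users to identify a key without revealing it.
    return api_key[-4:]
```

Because `key_hash` is deterministic for a given secret, an indexed `key_hash` column turns authentication into a single indexed equality query, which is what the O(log n) claim relies on.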

1.2 Role-Based Access Control (RBAC)

Status: Complete

What it does: Assigns roles (admin, team, dev, free) to users. Permissions checked at dependency-injection level before route handlers execute. Role changes are audit-logged.

What it does NOT do: No granular per-model permissions. No custom roles. No team-level RBAC. No provider-level permissions.

#	Criterion	Verification	Priority
AC-1.2.1	Non-admin API key returns 403 on ALL `/admin/*` endpoints	`GET /admin/users` with user key	P0
AC-1.2.2	Admin API key returns 200 on admin endpoints	`GET /admin/users` with admin key	P0
AC-1.2.3	Unauthorized admin access attempts are logged via `audit_logger.log_security_violation("UNAUTHORIZED_ADMIN_ACCESS")`	Attempt admin access with user key, check audit log	P0
AC-1.2.4	Role updates require admin auth and are logged with a reason	`POST /admin/roles/update` with user_id, new_role, reason	P0
AC-1.2.5	`GET /admin/roles/permissions/{role}` returns the correct permission set for each role	Check all 4 roles	P1
AC-1.2.6	Role change audit log is retrievable at `GET /admin/roles/audit/log`	Verify entries with timestamps and reasons	P1

Code References:

src/security/deps.py — require_admin dependency
src/routes/admin.py — Admin route handlers
src/db/roles.py — Role management

1.3 Per-Key IP Allowlists

Status: Complete

What it does: Restricts API key usage to specific IP addresses or CIDR ranges. Requests from non-allowlisted IPs are rejected before processing.

What it does NOT do: No geo-based restrictions. No IPv6 ranges. No automatic IP suggestions.

#	Criterion	Verification	Priority
AC-1.3.1	Admin can create IP allowlist entries with IPv4 addresses	`POST /api/admin/ip-whitelist` with `{"ip": "1.2.3.4"}`	P0
AC-1.3.2	Admin can create IP allowlist entries with CIDR notation	`POST /api/admin/ip-whitelist` with `{"ip": "10.0.0.0/24"}`	P0
AC-1.3.3	API key with allowlist rejects requests from non-allowed IPs with 403	Use key from IP not in allowlist	P0
AC-1.3.4	API key with allowlist accepts requests from allowed IPs	Use key from allowlisted IP	P0
AC-1.3.5	`POST /api/admin/ip-whitelist/check` correctly reports allowed vs blocked IPs	Test with both allowed and blocked IPs	P1
AC-1.3.6	Allowlist entries can be listed, updated, and deleted	CRUD operations on `/api/admin/ip-whitelist/*`	P1

Code References:

src/routes/admin.py — IP allowlist endpoints
src/security/deps.py — IP validation in validate_api_key_security()
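The allowlist check in AC-1.3.3 and AC-1.3.4 can be sketched with the standard `ipaddress` module, which treats a bare address as a /32 network so single IPs and CIDR ranges share one code path. IPv6 clients are rejected to match the documented scope; treating an empty allowlist as unrestricted is an assumption, and the function name is hypothetical.

```python
import ipaddress
from typing import Iterable


def ip_allowed(client_ip: str, allowlist: Iterable[str]) -> bool:
    """Return True if client_ip matches any IPv4 address or CIDR entry.

    An empty allowlist is treated as "no IP restriction" (assumption).
    """
    entries = list(allowlist)
    if not entries:
        return True
    try:
        addr = ipaddress.IPv4Address(client_ip)
    except ipaddress.AddressValueError:
        return False  # IPv6 or malformed input: outside the supported scope
    for entry in entries:
        # strict=False lets "1.2.3.4" parse as 1.2.3.4/32.
        if addr in ipaddress.IPv4Network(entry, strict=False):
            return True
    return False
```

A `False` result maps to the 403 required by AC-1.3.3.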

1.4 Domain Restrictions

Status: Complete

What it does: Limits which HTTP referrer domains can use a specific API key. Prevents stolen keys from being used on unauthorized domains.

What it does NOT do: No domain ownership validation. No subdomain wildcards. No restriction on server-side requests (the check applies only when a Referer header is present).

#	Criterion	Verification	Priority
AC-1.4.1	API key with domain restriction rejects requests with wrong Referer header	Send request with `Referer: https://unauthorized.com`	P0
AC-1.4.2	API key with domain restriction accepts requests with correct Referer	Send request with configured Referer domain	P0
AC-1.4.3	Requests without Referer header bypass domain restriction (server-side usage)	Send request without Referer header	P0
AC-1.4.4	Multiple domains can be configured per key	Configure 3 domains, verify all work	P1

Code References:

src/security/deps.py — validate_api_key_security() domain check
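A minimal sketch of the Referer check in AC-1.4.1 through AC-1.4.4, assuming exact hostname matching (no subdomain wildcards, as stated above) and that a missing Referer bypasses the check. The function name and the empty-list behavior are hypothetical.

```python
from typing import Iterable, Optional
from urllib.parse import urlsplit


def referer_allowed(referer: Optional[str], allowed_domains: Iterable[str]) -> bool:
    """Exact-host Referer check; an absent Referer bypasses the restriction (server-side use)."""
    domains = {d.lower() for d in allowed_domains}
    if not domains or not referer:
        return True
    host = (urlsplit(referer).hostname or "").lower()
    # Exact match only: "sub.example.com" does not match "example.com".
    return host in domains
```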

1.5 Three-Layer Rate Limiting

Status: Complete (with known header gap on Layers 2 and 3)

What it does:

Layer 1 (IP): Security middleware with behavioral analysis and velocity detection. 300 RPM limit for unauthenticated requests; authenticated users are exempt.
Layer 2 (API Key): Redis-backed per-key limits tied to plan tier.
Layer 3 (Anonymous): Stricter limits for unauthenticated requests.
Fallback: In-memory rate limiter when Redis is unavailable.

What it does NOT do: No per-model rate limits. No burst handling (token bucket). No cross-instance IP state sharing. No credit charge for rejected requests (they consume zero credits).

#	Criterion	Verification	Priority
AC-1.5.1	Unauthenticated requests exceeding 300 RPM from the same IP receive 429	Send 301 requests from one IP within one minute	P0
AC-1.5.2	Authenticated users are exempt from IP-level rate limiting	Verify no IP block on auth requests	P0
AC-1.5.3	API key exceeding plan RPM receives 429	Exceed per-key limit	P0
AC-1.5.4	Anonymous rate limits are stricter than authenticated limits	Compare thresholds for anon vs auth	P0
AC-1.5.5	Layer 1 429 response includes `Retry-After`, `X-RateLimit-Limit`, `X-RateLimit-Remaining`, `X-RateLimit-Reset`, `X-RateLimit-Reason`, `X-RateLimit-Mode`	Trigger Layer 1 429, inspect headers	P0
AC-1.5.6	Layer 2 429 response includes `Retry-After`, `X-RateLimit-Limit`, `X-RateLimit-Remaining`, `X-RateLimit-Reset`	Trigger Layer 2 429, inspect headers	P0 (KNOWN BUG)
AC-1.5.7	Layer 3 429 response includes `Retry-After` and `X-RateLimit-*` headers	Trigger Layer 3 429, inspect headers	P0 (KNOWN BUG)
AC-1.5.8	When Redis is down, rate limiting continues via in-memory fallback — requests are never blocked	Stop Redis, verify rate limiting works	P0
AC-1.5.9	Velocity mode activates when error rate exceeds 25% and reduces limits to 50%	Trigger >25% error rate, check `GET /velocity-mode-status`	P0
AC-1.5.10	Velocity mode deactivates after 3 minutes of normal error rates	Wait for cooldown, verify normal limits restored	P1
AC-1.5.11	Rate limit configuration viewable at `GET /user/rate-limits`	Check response format	P1
AC-1.5.12	Per-key rate limits updatable via `PUT /user/rate-limits/{key_id}`	Update and verify enforcement	P1
AC-1.5.13	Auth endpoint is rate-limited to 10 requests per 15 minutes per IP	`POST /auth` 11 times, 11th returns 429	P0
AC-1.5.14	Registration is rate-limited to 3 requests per hour per IP	`POST /auth/register` 4 times, 4th returns 429	P0
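A minimal sketch of the velocity-mode behavior in AC-1.5.9 and AC-1.5.10: activate above a 25% error rate, halve limits, and restore them after 3 minutes of normal error rates. How the error rate is measured (window size, which responses count as errors) is not specified here, so the caller passes it in; `VelocityMode` is a hypothetical name.

```python
from typing import Optional


class VelocityMode:
    """Halve rate limits while error rate > 25%; restore after 3 minutes of normal rates."""

    def __init__(self, threshold: float = 0.25, factor: float = 0.5, cooldown_s: float = 180.0):
        self.threshold = threshold
        self.factor = factor
        self.cooldown_s = cooldown_s
        self.active = False
        self._normal_since: Optional[float] = None

    def observe(self, error_rate: float, now: float) -> None:
        if error_rate > self.threshold:
            self.active = True
            self._normal_since = None  # any new spike restarts the cooldown
        elif self.active:
            if self._normal_since is None:
                self._normal_since = now
            elif now - self._normal_since >= self.cooldown_s:
                self.active = False
                self._normal_since = None

    def effective_limit(self, base_limit: int) -> int:
        return int(base_limit * self.factor) if self.active else base_limit
```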

Code References:

src/middleware/security_middleware.py (lines 647-716) — Layer 1, headers present
src/services/rate_limiting.py (lines 78-94) — Layer 2, RateLimitResult dataclass has fields but NOT converted to HTTP headers
src/services/anonymous_rate_limiter.py — Layer 3, NO headers
src/services/rate_limiting_fallback.py — In-memory fallback

Known Issues:

P0-5 (Delta Report): Layer 2 RateLimitResult fields exist but are not converted to HTTP response headers. Layer 3 has no rate limit headers at all. Clients get bare 429 rejections with no retry information.
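One way to close the P0-5 gap is to convert each rate-limit decision into the same header set Layer 1 emits and attach it to every 429 from Layers 2 and 3. This is a sketch: the `RateLimitResult` fields below are hypothetical stand-ins for the real dataclass, `X-RateLimit-Reason` and `X-RateLimit-Mode` are omitted, and `X-RateLimit-Reset` is rendered as a Unix timestamp, which should follow whatever convention Layer 1 already uses.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass
class RateLimitResult:
    # Hypothetical fields; the real dataclass in src/services/rate_limiting.py may differ.
    allowed: bool
    limit: int
    remaining: int
    reset_at: float  # Unix timestamp when the current window resets


def rate_limit_headers(result: RateLimitResult, now: float) -> Dict[str, str]:
    headers = {
        "X-RateLimit-Limit": str(result.limit),
        "X-RateLimit-Remaining": str(max(result.remaining, 0)),
        "X-RateLimit-Reset": str(int(result.reset_at)),
    }
    if not result.allowed:
        # Retry-After is in seconds; round up so clients never retry too early.
        headers["Retry-After"] = str(max(math.ceil(result.reset_at - now), 1))
    return headers
```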

1.6–1.9 Input Guardrails (PII Detection, Prompt Injection, Topic Restrictions, Content Moderation)

Status: Not Implemented (Deferred)

What these would do: PII scanning (phone, SSN, email, credit card), prompt injection pattern detection, per-key topic restrictions, content moderation via external classifiers.

#	Criterion	Verification	Priority
AC-1.6.1	PII detection scans prompts for phone numbers, SSNs, emails, credit card numbers	Send prompt with PII	Deferred
AC-1.7.1	Prompt injection patterns that attempt to override system prompts are detected and blocked	Send known injection pattern	Deferred
AC-1.8.1	Per-API-key topic restrictions limit responses to configured domains	Configure restriction, test out-of-domain	Deferred
AC-1.9.1	Content moderation blocks harmful inputs before reaching providers	Send harmful content	Deferred

Note: These are Conceptual Model features (D-1 through D-4 in Delta Report). Not required for stable release. No code exists.

1.10–1.12 Output Guardrails (Content Filtering, Structured Output Validation, Hallucination Flags)

Status: Not Implemented (Deferred)

#	Criterion	Verification	Priority
AC-1.10.1	Output content filtering scans responses for policy violations before returning	Trigger policy-violating response	Deferred
AC-1.11.1	Structured output validation confirms JSON schema conformance when requested	Request JSON schema output	Deferred (D-5, Small effort)
AC-1.12.1	Provider-side safety metadata (refusals, safety triggers) is surfaced in standardized format	Trigger safety filter, inspect response	Deferred

Layer 2: Core Routing Engine

2.1 Model Resolution Pipeline

Status: Complete

What it does: Three-stage pipeline: alias normalization (120+ aliases) → provider detection (overrides → format rules → mapping tables → org-prefix fallbacks) → model ID transformation (provider-native format).

What it does NOT do: No user-defined aliases. No version/snapshot resolution. No per-modality routing differences.

#	Criterion	Verification	Priority
AC-2.1.1	`gpt-4o` resolves to `openai/gpt-4o`	`POST /v1/chat/completions` with `model: "gpt-4o"`	P0
AC-2.1.2	`r1` resolves to `deepseek/deepseek-r1`	`POST /v1/chat/completions` with `model: "r1"`	P0
AC-2.1.3	Canonical IDs (e.g., `openai/gpt-4o`) work directly without alias resolution	POST with canonical ID	P0
AC-2.1.4	Provider detection correctly routes `google/gemini-*` models to Vertex when credentials available	POST with Gemini model	P0
AC-2.1.5	No alias maps to itself (no self-referencing loops)	Inspect `MODEL_ALIASES` dict for cycles	P0
AC-2.1.6	Fireworks model IDs are transformed to `accounts/fireworks/models/...` format	POST with Fireworks model, verify upstream call format	P1
AC-2.1.7	Nonexistent model returns 400 or 404, not 500	POST with `model: "nonexistent/model"`	P0

Code References:

src/services/models.py — MODEL_ALIASES dict, resolution pipeline
src/services/model_transformations.py — Provider-specific ID transformations
src/services/model_availability.py — Availability checking
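Stage 1 of the pipeline and the no-cycle check in AC-2.1.5 can be sketched as follows. The two aliases come from AC-2.1.1 and AC-2.1.2; whether the real `MODEL_ALIASES` ever chains one alias to another is an assumption, and the helper names are hypothetical.

```python
from typing import Dict, List

# Illustrative subset; the real MODEL_ALIASES in src/services/models.py has 120+ entries.
MODEL_ALIASES: Dict[str, str] = {
    "gpt-4o": "openai/gpt-4o",
    "r1": "deepseek/deepseek-r1",
}


def resolve_alias(model: str, aliases: Dict[str, str] = MODEL_ALIASES) -> str:
    """Follow alias chains to a canonical ID; canonical IDs pass through unchanged."""
    seen = set()
    while model in aliases:
        if model in seen:
            raise ValueError(f"alias cycle involving {model!r}")
        seen.add(model)
        model = aliases[model]
    return model


def find_alias_cycles(aliases: Dict[str, str]) -> List[str]:
    """Return aliases whose resolution never terminates (self-maps included)."""
    bad = []
    for alias in aliases:
        try:
            resolve_alias(alias, aliases)
        except ValueError:
            bad.append(alias)
    return bad
```

Running `find_alias_cycles(MODEL_ALIASES)` in a unit test is one way to automate the AC-2.1.5 inspection.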

2.2 Intelligent Routing — General Router

Status: Complete

What it does: ML-powered model selection via NotDiamond. Four modes: quality (openai/gpt-4o), cost (openai/gpt-4o-mini), latency (groq/llama-3.3-70b-versatile), balanced (anthropic/claude-sonnet-4). Falls back to mode-specific defaults when NotDiamond is unavailable.

What it does NOT do: No user feedback learning. No custom model pools. No routing constraints.

#	Criterion	Verification	Priority
AC-2.2.1	`router:general:quality` selects a high-quality model and returns 200	POST chat with `model: "router:general:quality"`	P0
AC-2.2.2	`router:general:cost` selects a cheaper model than quality mode	Compare selected models for same prompt	P0
AC-2.2.3	`router:general:latency` selects a low-latency model	POST and verify selection	P0
AC-2.2.4	`router:general:balanced` considers quality, cost, and latency	POST and verify selection	P0
AC-2.2.5	When NotDiamond is unavailable, fallback models are used without error	Disable NotDiamond, verify graceful fallback	P0
AC-2.2.6	`GET /general-router/settings/options` returns available strategies and model pools	Inspect response	P1
AC-2.2.7	`POST /general-router/test` returns selected model + reasoning	POST with sample prompt	P1

Code References:

src/services/general_router.py — Routing logic, NotDiamond integration
src/routes/general_router.py — Endpoints

2.3 Intelligent Routing — Code Router

Status: Complete

What it does: Benchmark-driven model selection for coding tasks. 4 tiers by SWE-bench/HumanEval scores. Modes: auto (complexity-based), price, quality, agentic. Static data from code_quality_priors.json.

What it does NOT do: No code execution. No feedback learning. No custom tiers. No language detection.

#	Criterion	Verification	Priority
AC-2.3.1	`router:code:auto` classifies prompt complexity and selects appropriate tier	POST with code prompt	P0
AC-2.3.2	`router:code:quality` selects highest-tier code model	POST and verify	P0
AC-2.3.3	`router:code:price` selects cost-effective code model	POST and verify	P0
AC-2.3.4	`router:code:agentic` selects model optimized for multi-step tool use	POST and verify	P0
AC-2.3.5	`GET /code-router/tiers` returns models with SWE-bench/HumanEval scores	Inspect response	P0
AC-2.3.6	Code router works entirely from in-memory data (no DB/Redis dependency)	Verify response with Redis down	P0
AC-2.3.7	`POST /code-router/test` returns selected model and routing rationale	POST with sample prompt	P1

Code References:

src/services/code_router.py — Routing logic, tier selection
src/services/code_quality_priors.json — Static benchmark data
src/routes/code_router.py — Endpoints

2.4 Provider Failover

Status: Complete

What it does: 14-provider prioritized failover chain. Failover triggers on 401/402/403/404/502/503/504. It does NOT trigger on 400 (user error, returned to the client) or 429 (retried with backoff instead). Model-aware rules: OpenAI models fail over only along OpenAI → OpenRouter, Anthropic models only along Anthropic → OpenRouter, and open-source models across all providers.

What it does NOT do: No mid-stream failover. No user-configured chains. No same-pricing guarantee across providers.

#	Criterion	Verification	Priority
AC-2.4.1	Primary provider 502/503/504 → request succeeds via fallback transparently	Force primary failure, verify success	P0
AC-2.4.2	Provider 401/402/403/404 → failover to next provider	Force auth error, verify failover	P0
AC-2.4.3	Provider 400 (user error) → returns 400 to user, NO failover	Send malformed request	P0
AC-2.4.4	Provider 429 → retries with backoff, does NOT failover	Trigger rate limit, verify retry behavior	P0
AC-2.4.5	OpenAI models only failover to OpenAI → OpenRouter	Inspect failover chain for `openai/gpt-4o`	P0
AC-2.4.6	Anthropic models only failover to Anthropic → OpenRouter	Inspect failover chain for `anthropic/claude-sonnet-4`	P0
AC-2.4.7	Open-source models can failover across all providers	Inspect chain for `meta-llama/llama-3-70b`	P0
AC-2.4.8	Failover chain skips providers with OPEN circuit breakers	Open a breaker, verify provider is skipped	P0
AC-2.4.9	User receives no indication of failover (transparent to client)	Monitor response during failover	P0

Code References:

src/services/provider_failover.py — Failover chain construction, error classification
src/routes/chat.py — build_provider_failover_chain() integration
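The error classification and model-aware chain rules described above can be sketched as two small functions. The provider lists are an illustrative subset, not the real 14-provider order, and status codes this document does not classify (for example 500) return `None` rather than guessing a behavior.

```python
from typing import Dict, List, Optional

FAILOVER_STATUSES = {401, 402, 403, 404, 502, 503, 504}

# Hypothetical, illustrative provider lists; the real chain is built in
# src/services/provider_failover.py.
OPEN_SOURCE_PROVIDERS = ["fireworks", "groq", "openrouter"]
MODEL_RULES = {"openai/": ["openai", "openrouter"], "anthropic/": ["anthropic", "openrouter"]}


def classify_status(status: int) -> Optional[str]:
    if status in FAILOVER_STATUSES:
        return "failover"
    if status == 429:
        return "retry_with_backoff"  # stay on the same provider
    if status == 400:
        return "return_to_user"  # user error: surface it, no failover
    return None  # not classified in this document


def build_chain(model_id: str, breaker_states: Dict[str, str]) -> List[str]:
    providers = next(
        (rule for prefix, rule in MODEL_RULES.items() if model_id.startswith(prefix)),
        OPEN_SOURCE_PROVIDERS,
    )
    # Skip providers whose circuit breaker is OPEN (AC-2.4.8).
    return [p for p in providers if breaker_states.get(p, "CLOSED") != "OPEN"]
```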

2.5 Circuit Breakers

Status: Complete (with timing discrepancy)

What it does: Per-provider circuit breakers. CLOSED → OPEN (5 consecutive failures) → HALF_OPEN (after timeout) → CLOSED (3 consecutive successes) or back to OPEN. Redis + in-memory state.

What it does NOT do: No per-provider threshold configuration. No error-type differentiation. No operator alerts. No persistent state beyond Redis.

#	Criterion	Verification	Priority
AC-2.5.1	New provider starts in CLOSED state	`GET /circuit-breakers/{new_provider}`	P0
AC-2.5.2	After 5 consecutive failures, state transitions to OPEN	Send 5 failing requests, check state	P0
AC-2.5.3	OPEN state prevents requests to that provider	Verify provider is skipped in failover	P0
AC-2.5.4	After timeout period, OPEN transitions to HALF_OPEN	Wait for timeout, check state	P0
AC-2.5.5	In HALF_OPEN, 3 consecutive successful requests transition to CLOSED	Send 3 successes, check state	P0
AC-2.5.6	In HALF_OPEN, a failed request transitions back to OPEN	Send failure, check state	P0
AC-2.5.7	`POST /circuit-breakers/{provider}/reset` resets to CLOSED	Reset and verify	P0
AC-2.5.8	`POST /circuit-breakers/reset-all` resets all breakers	Reset all and verify	P0
AC-2.5.9	Circuit breaker endpoints require NO auth (public)	Verify no auth needed	P1
AC-2.5.10	Prometheus metrics emitted on state transitions	Check `circuit_breaker_state_transitions_total`	P1

Code References:

src/services/circuit_breaker.py (line 67) — Default timeout 60 seconds
Redis keys: circuit_breaker:{provider}:{state|failure_count|success_count|opened_at} (3600s TTL)

Known Issues:

P1-7 (Delta Report): The code uses a 60-second timeout, but both the Conceptual Model and the wiki Testing Plan specify 5 minutes. Either the code or the docs must be updated.
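A minimal in-memory sketch of the state machine in the description: 5 consecutive failures open the breaker, 3 consecutive successes in HALF_OPEN close it, and any HALF_OPEN failure reopens it. The timeout is a constructor argument, so the 60-second vs 5-minute question becomes a configuration choice. Redis persistence and Prometheus metrics are omitted, and the class is hypothetical.

```python
import time
from typing import Callable


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, success_threshold: int = 3,
                 timeout_s: float = 60.0, clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        self.timeout_s = timeout_s  # 60 s matches the code; the docs say 300 s
        self.clock = clock
        self.state = "CLOSED"
        self.failures = 0
        self.successes = 0
        self.opened_at = 0.0

    def allow_request(self) -> bool:
        # An OPEN breaker moves to HALF_OPEN once the timeout has elapsed.
        if self.state == "OPEN" and self.clock() - self.opened_at >= self.timeout_s:
            self.state, self.successes = "HALF_OPEN", 0
        return self.state != "OPEN"

    def record_success(self) -> None:
        if self.state == "HALF_OPEN":
            self.successes += 1
            if self.successes >= self.success_threshold:
                self.state, self.failures = "CLOSED", 0
        else:
            self.failures = 0  # successes reset the consecutive-failure count

    def record_failure(self) -> None:
        if self.state == "HALF_OPEN":
            self._open()
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self._open()

    def _open(self) -> None:
        self.state, self.opened_at = "OPEN", self.clock()
        self.failures = self.successes = 0
```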

2.6 Health-Weighted Load Balancing

Status: Partial

What it does: Checks primary provider health score before routing. Below-threshold providers are demoted in failover chain.

What it does NOT do: No proportional traffic splitting by health score. No per-model health. No predictive health.

#	Criterion	Verification	Priority
AC-2.6.1	When primary provider health is below threshold, a healthier provider is promoted	Degrade a provider, verify chain reordering	P1
AC-2.6.2	Health-based promotion is a binary decision (promote or don't)	Verify no weighted splitting	P1
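The binary promotion rule in AC-2.6.1 and AC-2.6.2 might look like this, assuming 0-100 health scores (AC-3.1.4) and a hypothetical threshold of 50. Providers with no recorded score are treated as healthy, which is also an assumption.

```python
from typing import Dict, List


def promote_if_unhealthy(chain: List[str], health: Dict[str, float],
                         threshold: float = 50.0) -> List[str]:
    """Binary promotion: if the primary scores below `threshold`, move the first
    healthy provider to the front; otherwise leave the chain unchanged."""
    if not chain or health.get(chain[0], 100.0) >= threshold:
        return list(chain)
    for i, provider in enumerate(chain[1:], start=1):
        if health.get(provider, 100.0) >= threshold:
            return [provider] + chain[:i] + chain[i + 1:]
    return list(chain)  # nobody is healthy: keep the original order
```

There is no weighting: a provider is either promoted to the front or left in place, which is the property AC-2.6.2 checks.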

2.7–2.8 Latency-Optimal / Cost-Optimal Selection

Status: Partial

What it does: Routes to the lowest-latency or cheapest provider for the same model. The General Router's "latency" mode is hardcoded to groq/llama-3.3-70b-versatile.

#	Criterion	Verification	Priority
AC-2.7.1	Latency mode selects a low-latency provider	Verify model selection via router	P1
AC-2.8.1	Cost mode selects cheapest capable provider	Compare pricing of selected vs alternatives	P1

Known Issues: No dynamic latency-optimal selection — latency mode hardcodes a specific model rather than measuring real-time latency. Deferred for post-release.

2.9 Traffic Splitting

Status: Not Implemented (Deferred)

#	Criterion	Verification	Priority
AC-2.9.1	Traffic for same model is distributed across providers at configured ratios	Monitor provider selection distribution	Deferred (D-17)

Layer 3: Intelligence

3.1 Tiered Health Monitoring

Status: Complete

What it does: Continuous monitoring at intervals by tier: Critical (5min), Popular (30min), Standard (2-4hr), On-Demand (when requested). Health checks verify availability and latency.

#	Criterion	Verification	Priority
AC-3.1.1	`GET /health` always returns 200, even when dependencies are degraded	Call when DB is down	P0
AC-3.1.2	Health response includes `version`, `status`, and `timestamp`	Inspect response	P0
AC-3.1.3	`GET /health/system` returns memory, CPU, and connection pool stats	Inspect response	P0
AC-3.1.4	Provider health scores are 0-100 per provider	`GET /health/providers`	P0
AC-3.1.5	Model health shows `healthy`, `degraded`, or `down` per model	`GET /health/models`	P0
AC-3.1.6	`GET /health/quick` is sub-millisecond (static response)	Time the endpoint	P1
AC-3.1.7	`GET /health/railway` returns comprehensive check (DB, Redis, providers)	Inspect response	P1
AC-3.1.8	Gateway health dashboard returns HTML and JSON formats	`GET /health/gateways/dashboard` and `/data`	P1
AC-3.1.9	Health insights provide actionable recommendations	`GET /health/insights`	P2
AC-3.1.10	Background monitoring can be started and stopped	`POST /health/monitoring/start`, `/stop`	P1

Code References:

src/services/intelligent_health_monitor.py — Tiered monitoring
src/services/autonomous_monitor.py — Background monitoring
src/routes/health.py — Health endpoints

3.2 Passive Health Capture

Status: Complete

What it does: Every real inference request contributes health data as a background task — success/failure, latency, tokens, provider response codes. Zero overhead on request path.

#	Criterion	Verification	Priority
AC-3.2.1	Health data is captured after response is returned (no latency impact on user)	Verify background task execution	P0
AC-3.2.2	Captured data includes: latency, tokens, status, provider	Inspect health data store	P1

3.3 Incident Management

Status: Complete

What it does: Auto-creates incidents on health degradation. Severity levels, timestamps, captured logs, resolution tracking, MTTR calculation.

#	Criterion	Verification	Priority
AC-3.3.1	Downtime incidents can be listed with filters	`GET /admin/downtime/incidents?status=ongoing`	P0
AC-3.3.2	Incidents can be resolved with notes	`POST /admin/downtime/incidents/{id}/resolve`	P0
AC-3.3.3	Already-resolved incidents reject re-resolution	Attempt to resolve again	P1
AC-3.3.4	Incident analysis shows error patterns and type distribution	`GET /admin/downtime/incidents/{id}/analysis`	P1
AC-3.3.5	MTTR statistics are computed	`GET /admin/downtime/statistics`	P1

Code References:

src/routes/admin.py — Downtime tracking endpoints
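MTTR (AC-3.3.5) is the mean of `resolved_at - started_at` over resolved incidents. The sketch below computes it and also rejects re-resolution as AC-3.3.3 requires. The `Incident` fields are assumptions, not the schema behind `/admin/downtime/incidents`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Incident:
    id: int
    started_at: datetime
    resolved_at: Optional[datetime] = None
    notes: str = ""


def resolve(incident: Incident, at: datetime, notes: str) -> None:
    if incident.resolved_at is not None:
        raise ValueError(f"incident {incident.id} is already resolved")  # AC-3.3.3
    incident.resolved_at, incident.notes = at, notes


def mttr(incidents: List[Incident]) -> Optional[timedelta]:
    """Mean time to resolution over resolved incidents; None if none are resolved."""
    durations = [i.resolved_at - i.started_at for i in incidents if i.resolved_at is not None]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)
```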

3.4 Model Quality Scoring & Benchmarks

Status: Partial

What it does: Hardcoded quality priors for ~20 models (task-specific: simple_qa, code_gen, reasoning, etc.). The Code Router additionally uses SWE-bench/HumanEval scores.

What it does NOT do: Not stored in DB. Not updatable without code change. Missing MMLU, MATH, MT-Bench, LMSYS Arena ELO, LiveBench.

#	Criterion	Verification	Priority
AC-3.4.1	Code router tiers include SWE-bench and HumanEval scores	`GET /code-router/tiers`	P0
AC-3.4.2	Model selector uses quality priors for task-specific routing	Verify `model_selector.py` quality maps	P1

Known Issues: Quality data is static/hardcoded, not from DB. Missing several major benchmarks. No dynamic updating.

3.5 Per-Customer Quality Tracking

Status: Not Implemented (Deferred)

#	Criterion	Verification	Priority
AC-3.5.1	Per-customer success rates are tracked per model	Check customer-model analytics	Deferred (D-19)

3.6 Provider Credit Monitoring

Status: Partial (OpenRouter only)

What it does: Tracks upstream provider credit balances. OpenRouter: full implementation with API call, 15-min cache, threshold alerts (critical $5, warning $20, info $50).

#	Criterion	Verification	Priority
AC-3.6.1	`GET /api/provider-credits/balance` returns credit balances for monitored providers	Inspect response	P0
AC-3.6.2	OpenRouter balance is cached for 15 minutes	Check timing of two consecutive calls	P1
AC-3.6.3	Threshold alerts fire at critical ($5), warning ($20), info ($50)	Verify alert logic	P1

Code References:

src/services/provider_credit_monitor.py (lines 33-138) — OpenRouter implementation
Lines 165-167 — TODO stubs for all other providers

Known Issues:

P1-1 (Delta Report): Only OpenRouter implemented. 29 other providers have TODO stubs. No preemptive deprioritization in failover chain.
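The threshold alerts and 15-minute cache for OpenRouter balances can be sketched as follows. Whether each threshold is inclusive is an assumption, `fetch` is a hypothetical stand-in for the OpenRouter API call, and the names are illustrative.

```python
import time
from typing import Callable, Optional, Tuple

# Thresholds from the section above, ordered most to least severe;
# treating each as "balance at or below" is an assumption.
ALERT_THRESHOLDS: Tuple[Tuple[float, str], ...] = ((5.0, "critical"), (20.0, "warning"), (50.0, "info"))
CACHE_TTL_S = 15 * 60


def alert_level(balance_usd: float) -> Optional[str]:
    for limit, level in ALERT_THRESHOLDS:
        if balance_usd <= limit:
            return level
    return None


class CachedBalance:
    """Calls `fetch` at most once per 15-minute window."""

    def __init__(self, fetch: Callable[[], float], clock: Callable[[], float] = time.monotonic):
        self.fetch, self.clock = fetch, clock
        self._value: Optional[float] = None
        self._fetched_at = float("-inf")

    def get(self) -> float:
        if self._value is None or self.clock() - self._fetched_at >= CACHE_TTL_S:
            self._value, self._fetched_at = self.fetch(), self.clock()
        return self._value
```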

Layer 4: Caching System

4.1 Semantic Cache

Status: Not Implemented (Deferred)

#	Criterion	Verification	Priority
AC-4.1.1	Semantically similar prompts return cached responses (cosine similarity >0.95)	Test with paraphrased prompt	Deferred (D-8)

4.2 Exact-Match Response Cache

Status: Not Implemented (Deferred — infrastructure exists but not wired)

#	Criterion	Verification	Priority
AC-4.2.1	Identical inference requests (same messages + model + params) return cached response	Send same request twice, compare latency	Deferred (D-9)

Code References:

src/services/response_cache.py — SHA-256 hashing, Redis + in-memory fallback exists but NOT wired into inference path
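A sketch of an exact-match cache key as described for `response_cache.py` (SHA-256 over messages + model + params). Canonical JSON makes parameter order irrelevant while keeping message order significant. The `resp:` prefix and the function name are hypothetical, and which parameters belong in the key is left as a design choice.

```python
import hashlib
import json
from typing import Any, Dict, List


def response_cache_key(model: str, messages: List[Dict[str, Any]], params: Dict[str, Any]) -> str:
    """SHA-256 over a canonical JSON encoding of model + messages + params."""
    # sort_keys and fixed separators make logically identical requests
    # serialize to identical bytes regardless of dict key order.
    payload = json.dumps({"model": model, "messages": messages, "params": params},
                         sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return "resp:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()
```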

4.3 External Cache (Butter.dev)

Status: Partial (Ghost Feature — P0 issue)

What it does: Routes all requests through the Butter.dev proxy. User preference endpoints exist, but the stored preference is ignored during inference.

#	Criterion	Verification	Priority
AC-4.3.1	If Butter cache settings endpoints exist, user preference MUST be respected during inference	Set `enable_butter_cache=false`, verify Butter proxy is NOT used	P0 (KNOWN BUG)
AC-4.3.2	OR: Butter cache settings endpoints are removed entirely	Verify endpoints don't exist	P0 Alternative

Code References:

src/routes/users.py (lines 305-408) — GET/PUT /user/cache-settings exist, store preference
src/routes/chat.py (line 697) — Always calls get_butter_pooled_async_client() without checking preference

Known Issues:

P0-1 (Delta Report): Ghost feature. User can toggle a setting that does nothing. Trust-eroding.
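Satisfying AC-4.3.1 amounts to consulting the stored preference before choosing a client. In this sketch `butter_factory` would wrap `get_butter_pooled_async_client()`, and `direct_factory` is a hypothetical non-proxied client factory. Treating a missing setting as enabled is an assumption that preserves today's behavior.

```python
from typing import Any, Callable, Dict


def select_client(user_settings: Dict[str, Any],
                  butter_factory: Callable[[], Any],
                  direct_factory: Callable[[], Any]) -> Any:
    """Honor the stored cache preference instead of always using the Butter proxy."""
    # Missing setting defaults to enabled (assumption).
    if user_settings.get("enable_butter_cache", True):
        return butter_factory()
    return direct_factory()
```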

4.4 Supporting Caches

Status: Complete

#	Criterion	Verification	Priority
AC-4.4.1	Catalog endpoint responds in sub-100ms on cache hit	Time `GET /v1/models` on second request	P0
AC-4.4.2	Auth cache reduces lookup latency from ~100ms to <5ms	Compare first vs second auth timing	P1
AC-4.4.3	When Redis is down, local memory cache activates — no requests blocked	Stop Redis, verify normal operation	P0
AC-4.4.4	Cache invalidation clears all layers	`POST /admin/cache/clear`, verify fresh data	P1
AC-4.4.5	Stampede protection prevents multiple simultaneous cache rebuilds	Concurrent requests to cold cache	P1
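Stampede protection (AC-4.4.5) is commonly implemented with a per-key lock plus a re-check after acquiring it, so only one coroutine rebuilds a cold entry while the others wait for its result. This `asyncio` sketch is hypothetical and omits TTLs, invalidation, and the Redis tier.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict


class SingleFlightCache:
    """On a miss, only one coroutine rebuilds a key; concurrent callers reuse its result."""

    def __init__(self) -> None:
        self._values: Dict[str, Any] = {}
        self._locks: Dict[str, asyncio.Lock] = {}

    async def get(self, key: str, rebuild: Callable[[], Awaitable[Any]]) -> Any:
        if key in self._values:
            return self._values[key]
        lock = self._locks.get(key)
        if lock is None:
            lock = self._locks[key] = asyncio.Lock()
        async with lock:
            # Re-check: another coroutine may have filled the entry while we waited.
            if key not in self._values:
                self._values[key] = await rebuild()
        return self._values[key]
```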

Layer 5: Model Catalog

5.1 Background Model Sync

Status: Complete

#	Criterion	Verification	Priority
AC-5.1.1	Model sync can be triggered incrementally and fully	`POST /admin/model-sync/trigger` and `/all`	P0
AC-5.1.2	If provider API is down, last synced catalog is served	Verify stale catalog on provider failure	P0
AC-5.1.3	Per-provider sync works	`POST /admin/model-sync/provider/{slug}`	P1
AC-5.1.4	Full resync (delete + reimport) works	`POST /admin/model-sync/full`	P1

5.2 Model Metadata Standard

Status: Complete

#	Criterion	Verification	Priority
AC-5.2.1	Every model in `GET /v1/models` has `id`, `name`, `provider_slug`, `context_length`, and pricing	Inspect response schema	P0
AC-5.2.2	No model has null or zero pricing for both prompt and completion	Scan all models in response	P0 (see 5.3)

5.3 Catalog Inclusion Requirements

Status: Partial (gating not enforced at sync)

#	Criterion	Verification	Priority
AC-5.3.1	Models without pricing are rejected during sync (not visible to users)	Check catalog for models with null pricing	P1 (KNOWN BUG)
AC-5.3.2	`GET /v1/models/unique` returns no duplicate model IDs	Check for uniqueness	P0
AC-5.3.3	High-value models without explicit pricing are BLOCKED, not served at default rate	Verify pricing guard for GPT-4, Claude, Gemini	P0

Code References:

src/services/model_catalog_sync.py — extract_pricing() (lines 136-153) returns all None for missing pricing. Line 368 checks if any(pricing.values()) but is non-blocking — models ARE synced without pricing.
src/services/pricing.py (lines 783-839) — HIGH_VALUE_MODEL_PATTERNS guard raises ValueError on default pricing fallback

Known Issues:

P1-3 (Delta Report): Models without pricing are synced into the catalog. Non-high-value models without pricing fall to default ($0.00002/token) — potential under-billing.
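Turning the non-blocking check at line 368 into a gate could look like this: models whose prompt and completion prices are both missing are rejected instead of synced (AC-5.3.1). The dict shapes are assumptions about what `extract_pricing()` returns, and the function names are hypothetical. Zero prices are not rejected by this sketch; how they should be treated is covered by AC-5.2.2.

```python
from typing import Dict, List, Optional, Tuple


def has_pricing(pricing: Optional[Dict[str, Optional[float]]]) -> bool:
    """True if at least one of prompt/completion pricing is present (not None)."""
    return bool(pricing) and any(pricing.get(k) is not None for k in ("prompt", "completion"))


def filter_synced_models(models: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Blocking variant of the sync check: split models into (kept, rejected)."""
    kept, rejected = [], []
    for model in models:
        (kept if has_pricing(model.get("pricing")) else rejected).append(model)
    return kept, rejected
```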

5.4 HuggingFace Enrichment

Status: Complete

#	Criterion	Verification	Priority
AC-5.4.1	Model detail returns HuggingFace data (downloads, likes, parameters) when available	`GET /api/models/detail?model_id=meta-llama/...`	P1
AC-5.4.2	HuggingFace data is cached with TTL	Verify caching on repeated requests	P1

5.5 Model Discovery & Search

Status: Complete

#	Criterion	Verification	Priority
AC-5.5.1	`GET /v1/models?provider=fireworks` returns only Fireworks models	Filter and verify	P0
AC-5.5.2	`GET /v1/models/search?q=llama` returns matching models	Verify results	P0
AC-5.5.3	`GET /v1/models/trending` returns models ranked by usage	Inspect response	P1
AC-5.5.4	`GET /v1/gateways` returns all gateways with name, color, priority, site_url	Inspect response	P0
AC-5.5.5	Model comparison works across providers	`GET /v1/models/{provider}/{model}/compare`	P1

Layer 6: Business

6.1 Credit System

Status: Complete (with atomicity concern on legacy path)

What it does: Credits are the atomic billing unit. Cost = (prompt_tokens × prompt_price) + (completion_tokens × completion_price). Includes pre-flight balance checks, idempotent deductions (UNIQUE constraint + RPC), subscription allowance consumed before purchased credits, and auto-refund on provider errors.

What it does NOT do: No real-time credit streaming during generation. No credit expiration. No rollover. No credit transfers. No multi-currency.

#	Criterion	Verification	Priority
AC-6.1.1	Pre-flight check: user with 0 credits receives 402 BEFORE any provider call	POST with 0-credit user, verify no upstream call	P0
AC-6.1.2	Idempotent deduction: same request ID sent twice deducts credits only once	POST twice with same `X-Request-ID`	P0
AC-6.1.3	Subscription allowance consumed before purchased credits	User with both: make request, verify subscription decreases first	P0
AC-6.1.4	Provider 5xx error → automatic credit refund	Trigger 5xx, verify refund in `credit_transactions`	P0
AC-6.1.5	Provider timeout → automatic credit refund	Trigger timeout, verify refund	P0
AC-6.1.6	Provider 4xx error (user error) → NO refund	Trigger 4xx, verify no refund	P0
AC-6.1.7	High-value models (GPT-4, Claude, Gemini, o1/o3/o4) blocked if pricing falls to default	Verify pricing guard fires for each pattern	P0
AC-6.1.8	Credit transactions logged with request_id, user_id, model, token counts, cost	Check `credit_transactions` table	P0
AC-6.1.9	Balance update and transaction log happen atomically (single DB transaction via RPC)	Verify `atomic_deduct_credits` RPC is used	P0
AC-6.1.10	Legacy fallback path either doesn't exist or handles transaction logging failure safely	Verify legacy path behavior on logging failure	P0 (KNOWN RISK)
AC-6.1.11	Credit transaction history is paginated	`GET /credits/transactions?limit=10`	P1
AC-6.1.12	Admin can add/adjust/refund credits	`POST /credits/add`, `/adjust`, `/refund`	P1
AC-6.1.13	Daily usage cap prevents runaway costs	Exceed daily limit, verify 402	P1
AC-6.1.14	`request_id` has UNIQUE constraint in DB (belt-and-suspenders idempotency)	Check migration `20260223000001_add_request_id_to_credit_transactions.sql`	P0
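A SQLite analogue of the atomic, idempotent deduction in AC-6.1.2, AC-6.1.9, and AC-6.1.14: the transaction row and the balance update commit together or not at all, and a duplicate `request_id` hits the UNIQUE constraint so nothing is charged twice. This is not the `atomic_deduct_credits` RPC itself; the schema and integer micro-dollar amounts are assumptions.

```python
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, balance_micros INTEGER NOT NULL);
        CREATE TABLE credit_transactions (
            request_id TEXT NOT NULL UNIQUE,  -- belt-and-suspenders idempotency
            user_id INTEGER NOT NULL,
            amount_micros INTEGER NOT NULL
        );
    """)


def deduct_credits(conn: sqlite3.Connection, user_id: int, request_id: str, cost_micros: int) -> bool:
    """Charge once per request_id. Returns False if this request was already charged."""
    try:
        with conn:  # single transaction: commit on success, roll back on any exception
            conn.execute(
                "INSERT INTO credit_transactions (request_id, user_id, amount_micros) VALUES (?, ?, ?)",
                (request_id, user_id, -cost_micros),
            )
            cur = conn.execute(
                "UPDATE users SET balance_micros = balance_micros - ? "
                "WHERE id = ? AND balance_micros >= ?",
                (cost_micros, user_id, cost_micros),
            )
            if cur.rowcount != 1:
                raise ValueError("insufficient credits")  # also undoes the INSERT above
    except sqlite3.IntegrityError:
        return False  # duplicate request_id: nothing is deducted a second time
    return True
```

Inserting the transaction row before the balance update means a replayed request fails the UNIQUE check before any balance is touched, and a failed balance update can never leave an orphaned transaction row.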

Code References:

src/db/users.py (lines 701-1106) — Credit deduction
- Atomic RPC path (lines 862-967) — Correct
- Legacy fallback path (lines 987-1096) — Risk: balance update and transaction logging are two separate calls; if logging fails, credits have already been deducted (lines 1077-1082)
src/services/pricing.py (lines 783-839) — HIGH_VALUE_MODEL_PATTERNS
src/routes/chat.py (lines 1670-1742) — Auto-refund logic

Known Issues:

P0-2: Legacy fallback path may create orphaned deductions (balance reduced, no transaction record)
P0-3: Pricing guard needs end-to-end verification — must fire BEFORE provider call
P0-4: Auto-refund needs integration testing for edge cases (partial stream, refund failure)
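The cost formula and the high-value pricing guard can be sketched together so the guard fires before any provider call (P0-3). The regex patterns are hypothetical approximations of `HIGH_VALUE_MODEL_PATTERNS`. `Decimal` is used so per-token prices such as 0.00002 do not accumulate float rounding error.

```python
import re
from decimal import Decimal
from typing import Mapping, Optional

DEFAULT_PRICE = Decimal("0.00002")  # per-token default rate cited in P1-3
# Hypothetical patterns; the real list is HIGH_VALUE_MODEL_PATTERNS in src/services/pricing.py.
HIGH_VALUE = [re.compile(p) for p in (r"gpt-4", r"claude", r"gemini", r"(^|/)o[134](-|$)")]


def request_cost(model_id: str, pricing: Optional[Mapping[str, Decimal]],
                 prompt_tokens: int, completion_tokens: int) -> Decimal:
    """Cost = prompt_tokens * prompt_price + completion_tokens * completion_price.

    Intended to run pre-flight: a high-value model with no explicit pricing
    is blocked instead of being billed at the default rate.
    """
    if not pricing:
        if any(p.search(model_id) for p in HIGH_VALUE):
            raise ValueError(f"no explicit pricing for high-value model {model_id!r}")
        pricing = {"prompt": DEFAULT_PRICE, "completion": DEFAULT_PRICE}
    return prompt_tokens * pricing["prompt"] + completion_tokens * pricing["completion"]
```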

6.2 Plans & Tiers

Status: Complete

#	Criterion	Verification	Priority
AC-6.2.1	New user receives $5 in credits and a trial that expires after 3 days	Register, check balance + `trial_end`	P0 (config mismatch)
AC-6.2.2	Trial user can make requests until credits/limits exhausted	Make requests during trial	P0
AC-6.2.3	Expired trial returns 402 for paid models	POST after trial expiry	P0
AC-6.2.4	Expired trial CAN access `:free` suffix models	POST with `:free` model after expiry	P0
AC-6.2.5	`GET /plans` returns available plan tiers with pricing	Inspect response	P0
AC-6.2.6	`GET /trial/status` returns `active`/`expired` and days remaining	Check response	P0
AC-6.2.7	Unused subscription allowance does NOT roll over (resets monthly)	Verify at month boundary	P1
AC-6.2.8	Purchased credits never expire and survive plan changes	Change plan, verify credits persist	P1
AC-6.2.9	Daily trial limit ($1/day) is enforced	Exceed $1 in trial, verify blocking	P0

Known Issues:

P0-7 (Delta Report): Trial config mismatch — CLAUDE.md says $5, wiki says $10, code says $5. Must reconcile.
src/config/usage_limits.py — Trial: $5, 3 days, $1/day
src/db/trials.py (line 44) — Formula trial_days * 5 suggests variable durations
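The trial rules in AC-6.2.1 through AC-6.2.9 can be expressed as a small gate. This sketch uses the values this page attributes to `src/config/usage_limits.py` ($5, 3 days, $1/day). Rounding days remaining up, returning 402 for the daily limit, and letting `:free` models bypass that limit are assumptions; the function names are hypothetical.

```python
import math
from datetime import datetime, timedelta

TRIAL_DAYS = 3            # per src/config/usage_limits.py, as stated on this page
TRIAL_CREDITS = 5.0
TRIAL_DAILY_LIMIT = 1.0


def start_trial(now: datetime) -> dict:
    """AC-6.2.1: new user gets trial credits and a trial_end TRIAL_DAYS out."""
    return {"credits": TRIAL_CREDITS, "trial_end": now + timedelta(days=TRIAL_DAYS)}


def trial_status(trial_end: datetime, now: datetime) -> dict:
    """AC-6.2.6: report active/expired plus whole days remaining (rounded up)."""
    seconds = (trial_end - now).total_seconds()
    if seconds <= 0:
        return {"status": "expired", "days_remaining": 0}
    return {"status": "active", "days_remaining": math.ceil(seconds / 86400)}


def trial_gate(model: str, trial_end: datetime, now: datetime,
               spent_today: float, cost: float) -> int:
    """Return the HTTP status the gate would produce: 200 proceed, 402 block."""
    if model.endswith(":free"):
        return 200  # AC-6.2.4: free models stay available after expiry
    if trial_status(trial_end, now)["status"] == "expired":
        return 402  # AC-6.2.3
    if spent_today + cost > TRIAL_DAILY_LIMIT:
        return 402  # AC-6.2.9 (status code is an assumption)
    return 200
```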

6.3 Customer Usage Analytics

Status: Partial

#	Criterion	Verification	Priority
AC-6.3.1	User can view activity stats (total requests/tokens/spend by model/provider)	`GET /user/activity/stats`	P0
AC-6.3.2	Activity log is paginated (limit 1-1000)	`GET /user/activity/log?limit=50`	P0
AC-6.3.3	Activity log `total` field returns actual DB total, not page count	Verify `total` vs `count`	P1 (KNOWN BUG)
AC-6.3.4	Per-API-key usage breakdown is available	`GET /user/api-keys/{key_id}/usage`	P2 (NOT IMPLEMENTED)
AC-6.3.5	CSV/JSON export is available	`GET /user/usage/export?format=csv`	P2 (NOT IMPLEMENTED)

Known Issues:

P1-4 (Delta Report): src/routes/users.py (line 515) — `"total": len(transactions)` returns the page count, not the DB total
P2-1: activity_log stores user_id but NOT api_key_id — no per-key breakdown
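The fix P1-4 calls for is to compute `total` with a separate count over the same filter, independent of `limit`/`offset`. The sketch below shows the shape of the response using `sqlite3`. The column names and `get_activity_log` are hypothetical, and the real route may obtain the count differently through its database client.

```python
import sqlite3


def get_activity_log(conn: sqlite3.Connection, user_id: int,
                     limit: int = 50, offset: int = 0) -> dict:
    """Return one page of activity plus the true total row count."""
    if not 1 <= limit <= 1000:
        raise ValueError("limit must be between 1 and 1000")  # AC-6.3.2
    rows = conn.execute(
        "SELECT id, model, cost FROM activity_log WHERE user_id = ? "
        "ORDER BY id DESC LIMIT ? OFFSET ?",
        (user_id, limit, offset),
    ).fetchall()
    # The total comes from its own COUNT(*), not from len(rows).
    (total,) = conn.execute(
        "SELECT COUNT(*) FROM activity_log WHERE user_id = ?", (user_id,)
    ).fetchone()
    return {"items": rows, "count": len(rows), "total": total}
```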

6.4 Customer Webhooks

Status: Not Implemented (Deferred)

#	Criterion	Verification	Priority
AC-6.4.1	Outbound webhook delivery for `credits.low`, `credits.depleted`, `model.degraded` events	Configure webhook, trigger events	Deferred (D-10)

6.5 SLA Tracking

Status: Not Implemented (Deferred)

#	Criterion	Verification	Priority
AC-6.5.1	Per-tier SLA violations are detected with auto credit-back compensation	Monitor SLA metrics	Deferred (D-14)

Layer 7: Developer Platform

7.1 Prompt Management

Status: Not Implemented (Deferred)

#	Criterion	Verification	Priority
AC-7.1.1	Template library with versioning	CRUD on prompt templates	Deferred (D-12)

7.2 Batch / Async Inference

Status: Not Implemented (Deferred)

#	Criterion	Verification	Priority
AC-7.2.1	`POST /v1/batch/jobs` submits bulk workloads	Submit batch job	Deferred (D-11)

7.3 Evaluation & Testing

Status: Not Implemented (Deferred)

#	Criterion	Verification	Priority
AC-7.3.1	Side-by-side model comparison for same prompt	Compare endpoint	Deferred (D-13)

7.4 Playground

Status: Not Implemented (Deferred — frontend-coupled)

#	Criterion	Verification	Priority
AC-7.4.1	Interactive prompt testing UI	Access playground	Deferred

Layer 8: Observability

8.1 Internal Metrics & Dashboards

Status: Complete

#	Criterion	Verification	Priority
AC-8.1.1	`GET /metrics` returns valid Prometheus text format	Parse response	P0
AC-8.1.2	OpenMetrics format with exemplar support is available via content negotiation	`Accept: application/openmetrics-text`	P1
AC-8.1.3	Parsed metrics include p50, p95, p99 latency percentiles	`GET /api/metrics/parsed`	P0
AC-8.1.4	Real-time stats update within 60 seconds of new requests	`GET /api/monitoring/stats/realtime`	P1
AC-8.1.5	Error rates tracked per provider and per model	`GET /api/monitoring/error-rates`	P1
AC-8.1.6	Anomaly detection flags unusual patterns	`GET /api/monitoring/anomalies`	P1
AC-8.1.7	Grafana SimpleJSON datasource protocol fully implemented	`GET /prometheus/datasource` (test), `POST /search`, `/query`	P1
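For AC-8.1.3, if latency is exported as a Prometheus histogram (the bucket layout Gatewayz uses is not given here), percentiles can be estimated from cumulative bucket counts by linear interpolation, in the same spirit as PromQL's `histogram_quantile`. This is an illustrative sketch; `/api/metrics/parsed` may compute its percentiles differently.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate quantile q from cumulative histogram buckets.

    buckets: (upper_bound, cumulative_count) pairs sorted by upper bound and
    ending with the +Inf bucket. Assumes non-negative observations (latency),
    so the first bucket's lower bound is taken as 0.
    """
    if not 0 < q < 1:
        raise ValueError("q must be strictly between 0 and 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower, below = 0.0, 0.0
    for upper, cumulative in buckets:
        if cumulative >= rank:
            if math.isinf(upper):
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return lower
            in_bucket = cumulative - below
            return lower + (upper - lower) * (rank - below) / in_bucket
        lower, below = upper, cumulative
    return lower
```

With buckets `[(0.1, 50), (0.5, 90), (1.0, 100), (inf, 100)]`, p50 is 0.1 s, p95 is 0.75 s, and p99 is 0.95 s.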

8.2 Distributed Tracing

Status: Complete

#	Criterion	Verification	Priority
AC-8.2.1	OpenTelemetry traces are initialized and exportable	`GET /api/instrumentation/health`	P0
AC-8.2.2	Every request gets a trace ID linking middleware → auth → routing → provider → billing	Inspect trace in Tempo	P1
AC-8.2.3	Exemplar linking from metrics to traces works	Verify in Grafana	P2

8.3 Error Tracking

Status: Complete

#	Criterion	Verification	Priority
AC-8.3.1	Autonomous error monitor status is retrievable	`GET /error-monitor/autonomous/status`	P0
AC-8.3.2	Dashboard provides error landscape overview	`GET /error-monitor/dashboard`	P0
AC-8.3.3	Recent errors sorted by recency	`GET /error-monitor/errors/recent`	P0
AC-8.3.4	Critical errors flagged separately	`GET /error-monitor/errors/critical`	P0
AC-8.3.5	Error patterns detect recurring issues	`GET /error-monitor/errors/patterns`	P1
AC-8.3.6	AI fix suggestions generated via Claude	`POST /error-monitor/fixes/generate-for-error`	P2

Note: All error monitor endpoints are public (no authentication required). Error patterns are held in memory only and are lost on restart.

8.4 AI-Specific Tracing

Status: Partial

#	Criterion	Verification	Priority
AC-8.4.1	Arize Phoenix config exists and is functional	Check Arize initialization	P2
AC-8.4.2	OpenTelemetry captures inference metadata (model, tokens, latency)	Inspect trace attributes	P1

Known Issues: Arize Phoenix not exposed via API. Braintrust not integrated. No prompt/response pair recording.

8.5 Profiling

Status: Complete

#	Criterion	Verification	Priority
AC-8.5.1	Pyroscope profiling tags cache/Redis layers with operation context	Verify tag presence in Pyroscope	P1
AC-8.5.2	Profiling does not add measurable latency to requests	Compare request times with/without profiling	P1

8.6 Customer-Facing Observability

Status: Partial

#	Criterion	Verification	Priority
AC-8.6.1	User can view their own usage dashboard data	`GET /user/activity/stats`, `GET /user/monitor`	P0
AC-8.6.2	Model health status visible to users	`GET /v1/model-health`	P0
AC-8.6.3	Public status page with provider/model availability	`GET /v1/status/`, `GET /v1/status/providers`	P0
AC-8.6.4	Latency percentiles exposed to customers	`GET /user/latency?model=...`	P2 (NOT IMPLEMENTED)

Layer 9: API Compatibility

9.1 OpenAI-Compatible API

Status: Complete

What it does: `POST /v1/chat/completions` is a full drop-in replacement for the OpenAI Chat Completions endpoint, supporting streaming SSE, tool/function calling, JSON mode, and logprobs. Any app built on an OpenAI SDK works after changing only the base URL and API key.

#	Criterion	Verification	Priority
AC-9.1.1	Non-streaming returns 200 with `choices[0].message.content`, `usage.prompt_tokens`, `usage.completion_tokens`	POST with `stream: false`	P0
AC-9.1.2	Streaming returns SSE where every event line starts with `data:` and the stream ends with `data: [DONE]`	POST with `stream: true`	P0
AC-9.1.3	`response_format: {"type": "json_object"}` returns valid parseable JSON	POST with JSON mode	P0
AC-9.1.4	`tools` array returns `tool_calls` when model decides to call a tool	POST with tool definitions	P0
AC-9.1.5	`logprobs: true` returns a `logprobs` field	POST with logprobs	P1
AC-9.1.6	OpenAI Python SDK works with zero changes beyond `base_url` and `api_key`	`openai.OpenAI(base_url="$BASE/v1")`	P0
AC-9.1.7	All inference errors use OpenAI-compatible format: `{"error": {"message": "...", "type": "...", "code": "..."}}`	Trigger errors, inspect format	P1 (KNOWN ISSUE)
AC-9.1.8	Unauthenticated request with whitelisted model returns 200	POST without auth header	P0
AC-9.1.9	Unauthenticated request with non-whitelisted model returns 401/403	POST without auth header	P0
AC-9.1.10	Streaming normalization handles OpenAI, Gemini, Anthropic, Fireworks formats	Test stream from each provider type	P0
AC-9.1.11	Unrecognized streaming format logs a warning (not silently dropped)	Check logs for dropped chunks	P1 (KNOWN BUG)
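AC-9.1.2 can be checked mechanically on a captured response body. This hypothetical test helper follows the criterion literally (every non-blank line is a `data:` line, the last is `data: [DONE]`, and each chunk carries `choices`); it does not account for SSE comment lines, which some upstream providers send as keep-alives.

```python
import json


def check_openai_sse(raw: str) -> list[dict]:
    """Validate an OpenAI-style SSE body and return the parsed JSON chunks."""
    lines = [line for line in raw.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty stream")
    for line in lines:
        if not line.startswith("data:"):
            raise ValueError(f"unexpected line: {line!r}")
    if lines[-1].strip() != "data: [DONE]":
        raise ValueError("stream must end with data: [DONE]")
    # Every line before the terminator must be a JSON chunk with `choices`.
    chunks = [json.loads(line[len("data:"):]) for line in lines[:-1]]
    for chunk in chunks:
        if "choices" not in chunk:
            raise ValueError("chunk missing choices")
    return chunks
```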

Known Issues:

P1-2: ~5% of errors use the FastAPI default `{"detail": "..."}` instead of the OpenAI format, which breaks SDK error handling
P1-8: Stream normalizer returns None for unrecognized chunks (silently dropped, no warning)
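Both issues can be addressed with small pure functions: one rewrites a FastAPI-style body into the OpenAI error envelope from AC-9.1.7, and one identifies a chunk's provider format and logs a warning instead of silently returning `None`. This is a hypothetical sketch; the status-to-`type` mapping, using the status code as `code`, and the key-based format detection are assumptions, not values taken from OpenAI or from the Gatewayz normalizer.

```python
import logging
from typing import Optional

logger = logging.getLogger("stream_normalizer")

# Assumed mapping; adjust to whatever type strings the gateway standardizes on.
ERROR_TYPES = {
    400: "invalid_request_error",
    401: "authentication_error",
    402: "insufficient_quota",
    403: "permission_error",
    404: "not_found_error",
    429: "rate_limit_error",
}


def to_openai_error(status: int, body: dict) -> dict:
    """Convert a {"detail": ...} body into {"error": {message, type, code}}."""
    if isinstance(body.get("error"), dict):
        return body  # already OpenAI-shaped
    detail = body.get("detail", "Unknown error")
    if not isinstance(detail, str):
        detail = str(detail)  # e.g. FastAPI validation error lists
    default = "server_error" if status >= 500 else "invalid_request_error"
    error_type = ERROR_TYPES.get(status, default)
    return {"error": {"message": detail, "type": error_type, "code": str(status)}}


def detect_stream_format(chunk: dict) -> Optional[str]:
    """Guess a chunk's provider format; warn (not drop silently) when unknown."""
    if "choices" in chunk:
        return "openai"  # OpenAI and OpenAI-compatible providers such as Fireworks
    if "candidates" in chunk:
        return "gemini"
    if str(chunk.get("type", "")).startswith(("message_", "content_block_")):
        return "anthropic"
    logger.warning("Unrecognized stream chunk with keys %s", sorted(chunk))
    return None
```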

9.2 Anthropic-Compatible API

Status: Complete

#	Criterion	Verification	Priority
AC-9.2.1	Non-streaming returns 200 with `content[0].text`, `usage.input_tokens`, `usage.output_tokens` in Anthropic format	POST `/v1/messages`	P0
AC-9.2.2	Streaming returns SSE in Anthropic format (`message_start`, `content_block_delta`, `message_stop`)	POST with `stream: true`	P0
AC-9.2.3	Credits deducted using Anthropic token counts	Compare balance before/after	P0
AC-9.2.4	Anthropic Python SDK works with zero changes beyond `base_url` and `api_key`	`anthropic.Anthropic(base_url="$BASE")` (the SDK appends `/v1/messages` itself)	P0
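AC-9.2.2 can be verified by extracting the `event:` names from a captured stream and checking their order: `message_start` first, content deltas only inside `content_block_start`/`content_block_stop` pairs, an optional `message_delta`, then `message_stop`. This hypothetical helper skips `ping` keep-alives and treats any other event type, including `error`, as a failure.

```python
def event_types(raw: str) -> list[str]:
    """Extract event names from the `event:` lines of an SSE body."""
    return [line[len("event:"):].strip()
            for line in raw.splitlines() if line.startswith("event:")]


def check_anthropic_events(events: list[str]) -> None:
    """Raise ValueError if the event order is not a valid Messages stream."""
    events = [e for e in events if e != "ping"]  # keep-alives may appear anywhere
    if not events or events[0] != "message_start":
        raise ValueError("stream must begin with message_start")
    if events[-1] != "message_stop":
        raise ValueError("stream must end with message_stop")
    open_block = False
    for event in events[1:-1]:
        if event == "content_block_start":
            if open_block:
                raise ValueError("nested content block")
            open_block = True
        elif event == "content_block_delta":
            if not open_block:
                raise ValueError("delta outside a content block")
        elif event == "content_block_stop":
            if not open_block:
                raise ValueError("content_block_stop without start")
            open_block = False
        elif event == "message_delta":
            if open_block:
                raise ValueError("message_delta inside a content block")
        else:
            raise ValueError(f"unexpected event {event!r}")
    if open_block:
        raise ValueError("unterminated content block")
    if "content_block_delta" not in events:
        raise ValueError("no content_block_delta events")
```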

Layer 10: Infrastructure & Deployment

10.1 Multi-Region Routing

Status: Not Implemented (Deferred)

#	Criterion	Verification	Priority
AC-10.1.1	Requests routed to nearest provider region for lowest latency	Test from different regions	Deferred (D-15)

10.2 Data Residency

Status: Not Implemented (Deferred)

#	Criterion	Verification	Priority
AC-10.2.1	EU customers' requests routed to EU-based providers	Test with EU IP	Deferred (D-16)

10.3 Multi-Target Deployment

Status: Complete

#	Criterion	Verification	Priority
AC-10.3.1	Vercel serverless deployment works via `api/index.py`	Deploy to Vercel	P0
AC-10.3.2	Railway/Docker deployment works via `start.sh`	Deploy to Railway	P0
AC-10.3.3	Dev server starts with `python src/main.py` or `uvicorn src.main:app --reload`	Start locally	P0

Cross-Cutting Features

CC-1: Stripe Payments

Status: Complete

#	Criterion	Verification	Priority
AC-CC.1.1	`GET /api/stripe/credit-packages` returns available packages (public, no auth)	Inspect response	P0
AC-CC.1.2	`POST /api/stripe/checkout-session` returns valid Stripe checkout URL	Create session	P0
AC-CC.1.3	Successful payment webhook adds credits to user's balance	Simulate `payment_intent.succeeded` webhook	P0
AC-CC.1.4	Webhook endpoint ALWAYS returns 200, even if processing fails	Send malformed webhook	P0
AC-CC.1.5	Payment history is paginated with amount, date, status	`GET /api/stripe/payments`	P0
AC-CC.1.6	Subscription checkout creates Stripe subscription and assigns plan	`POST /api/stripe/subscription-checkout`	P0
AC-CC.1.7	Subscription upgrade/downgrade/cancel work	Test each operation	P1
AC-CC.1.8	Webhook handles each of these events: `payment_intent.succeeded`, `charge.succeeded`, `invoice.paid`, `customer.subscription.created`	Test each event type	P0
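AC-CC.1.3, AC-CC.1.4, and AC-CC.1.8 together describe a handler that credits on success, tolerates bad input, and always answers 200. The sketch below assumes the event signature has already been verified (that step is out of scope here), dedupes on the Stripe event `id`, and reads `user_id` and `credits` from payment metadata; those metadata keys and the function name are hypothetical.

```python
import json
import logging

logger = logging.getLogger("stripe_webhook")

LOGGED_ONLY = {"charge.succeeded", "invoice.paid", "customer.subscription.created"}


def handle_stripe_webhook(raw_body: bytes, balances: dict,
                          seen_events: set) -> tuple[int, dict]:
    """Process a signature-verified Stripe event; always return HTTP 200."""
    try:
        event = json.loads(raw_body)
        event_id, event_type = event["id"], event["type"]
        if event_id in seen_events:
            return 200, {"status": "duplicate"}  # redelivery of a handled event
        obj = event["data"]["object"]
        if event_type == "payment_intent.succeeded":
            user_id = obj["metadata"]["user_id"]         # hypothetical metadata key
            credits = float(obj["metadata"]["credits"])  # hypothetical metadata key
            balances[user_id] = balances.get(user_id, 0.0) + credits
        elif event_type in LOGGED_ONLY:
            # Plan/subscription bookkeeping would go here.
            logger.info("received %s for %s", event_type, obj.get("id"))
        seen_events.add(event_id)
        return 200, {"status": "processed"}
    except Exception:  # malformed JSON, missing keys, handler bugs
        logger.exception("stripe webhook processing failed")
        return 200, {"status": "error"}
```

Stripe retries deliveries that do not receive a 2xx response, so an always-200 policy trades automatic retries for stability; failures on this path need to be logged and alerted on, because Stripe will not resend them.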

CC-2: Coupons

Status: Complete

#	Criterion	Verification	Priority
AC-CC.2.1	Valid coupon redeems and adds correct credit amount	`POST /coupons/redeem`	P0
AC-CC.2.2	Expired coupon returns 400	Redeem expired code	P0
AC-CC.2.3	Already-redeemed coupon (same user) returns 400	Redeem twice	P0
AC-CC.2.4	User-specific coupon redeemed by wrong user returns 400/403	Redeem with different user	P0
AC-CC.2.5	`GET /coupons/available` returns global + user-targeted coupons	Inspect response	P1
AC-CC.2.6	Redemption history shows past redemptions	`GET /coupons/history`	P1
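The redemption rules in AC-CC.2.1 through AC-CC.2.4 reduce to an ordered set of checks. This in-memory sketch is hypothetical: the `Coupon` fields are assumptions, and it returns 403 for the wrong-user case, which is one of the two codes AC-CC.2.4 allows.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Coupon:
    code: str
    credits: float
    expires_at: datetime
    target_user_id: Optional[str] = None  # None means a global coupon
    redeemed_by: set = field(default_factory=set)


def redeem(coupon: Coupon, user_id: str, now: datetime, balances: dict) -> int:
    """Return the HTTP status for a redemption and apply credits on success."""
    if now >= coupon.expires_at:
        return 400  # AC-CC.2.2: expired
    if coupon.target_user_id is not None and coupon.target_user_id != user_id:
        return 403  # AC-CC.2.4: coupon targeted at another user
    if user_id in coupon.redeemed_by:
        return 400  # AC-CC.2.3: already redeemed by this user
    coupon.redeemed_by.add(user_id)
    balances[user_id] = balances.get(user_id, 0.0) + coupon.credits  # AC-CC.2.1
    return 200
```

In a database-backed version, the "already redeemed" check would typically be enforced by a unique constraint on (coupon, user) so that two concurrent requests cannot both succeed.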

CC-3: Referrals

Status: Complete

#	Criterion	Verification	Priority
AC-CC.3.1	User generates unique referral code	`POST /referral/generate`	P0
AC-CC.3.2	Referral code validates successfully	`POST /referral/validate`	P0
AC-CC.3.3	Self-referral is prevented	Attempt self-referral	P0
AC-CC.3.4	Referral stats show total referred, conversions, rewards	`GET /referral/stats`	P1
AC-CC.3.5	Successful referral grants $10 credits to both parties on first $10+ purchase	Complete referral flow	P0
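A minimal sketch of the referral rules: unique code generation (AC-CC.3.1), self-referral rejection (AC-CC.3.3), and the $10-to-both bonus on a first purchase of at least $10 (AC-CC.3.5). The 8-character uppercase code format and all function names are hypothetical.

```python
import secrets
import string

REFERRAL_BONUS = 10.0
MIN_QUALIFYING_PURCHASE = 10.0
ALPHABET = string.ascii_uppercase + string.digits


def generate_referral_code(existing: set, length: int = 8) -> str:
    """Draw random codes until one is unused, then reserve it."""
    while True:
        code = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if code not in existing:
            existing.add(code)
            return code


def referral_bonus_on_purchase(referrer: str, referee: str, amount: float,
                               is_first_purchase: bool, balances: dict) -> bool:
    """Grant the bonus to both parties if the purchase qualifies."""
    if referrer == referee:
        return False  # self-referral
    if not is_first_purchase or amount < MIN_QUALIFYING_PURCHASE:
        return False
    for user in (referrer, referee):
        balances[user] = balances.get(user, 0.0) + REFERRAL_BONUS
    return True
```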

CC-4: Chat History & Sessions

Status: Complete

#	Criterion	Verification	Priority
AC-CC.4.1	Sessions can be created, listed, updated, deleted	CRUD on `/v1/chat/sessions/*`	P0
AC-CC.4.2	Messages can be saved individually and in batch	POST single and batch	P0
AC-CC.4.3	Full-text search returns matching sessions	`POST /v1/chat/search`	P0
AC-CC.4.4	Duplicate messages are deduplicated	Save same message twice, verify single entry	P1
AC-CC.4.5	Chat stats return accurate usage data	`GET /v1/chat/stats`	P1
AC-CC.4.6	Share links provide public read-only access	Create share, access without auth	P1
AC-CC.4.7	Feedback CRUD (create, read, update, delete) works per session	CRUD on `/v1/chat/feedback/*`	P1

CC-5: API Key Management

Status: Complete

#	Criterion	Verification	Priority
AC-CC.5.1	Created key is in `gw_{env}_*` format	`POST /user/api-keys`	P0
AC-CC.5.2	Key creation rate-limited to 10 per hour; 11th returns 429	Create 11 keys	P0
AC-CC.5.3	Keys can be listed showing all active keys	`GET /user/api-keys`	P0
AC-CC.5.4	Keys can be updated (name, restrictions)	`PUT /user/api-keys/{key_id}`	P0
AC-CC.5.5	Keys can be deleted	`DELETE /user/api-keys/{key_id}`	P0
AC-CC.5.6	Deleted key no longer authenticates (returns 401)	Use deleted key	P0
AC-CC.5.7	Audit logs record key creation, usage, deletion	`GET /user/api-keys/audit-logs`	P1
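Three mechanics behind these criteria can be sketched with the standard library: generating a `gw_{env}_*` key (AC-CC.5.1), deriving a deterministic HMAC-SHA256 lookup value so the stored key never has to be decrypted to find it (AC-CC.12.2), and a sliding-window limit of 10 creations per hour (AC-CC.5.2). The `env` value `"live"`, the secret, and all names are hypothetical; Fernet encryption of the stored key is not shown.

```python
import hashlib
import hmac
import secrets
import time
from collections import defaultdict, deque

LOOKUP_SECRET = b"hypothetical-server-side-secret"  # would come from config


def create_key(env: str) -> str:
    """Generate a key in the gw_{env}_* format."""
    return f"gw_{env}_{secrets.token_urlsafe(32)}"


def lookup_hash(api_key: str) -> str:
    """Deterministic HMAC-SHA256, suitable for an indexed lookup column."""
    return hmac.new(LOOKUP_SECRET, api_key.encode(), hashlib.sha256).hexdigest()


class CreationLimiter:
    """Allow at most `limit` key creations per `window` seconds per user."""

    def __init__(self, limit: int = 10, window: float = 3600.0, clock=time.monotonic):
        self.limit, self.window, self.clock = limit, window, clock
        self.events = defaultdict(deque)

    def allow(self, user_id: str) -> bool:
        now = self.clock()
        q = self.events[user_id]
        while q and now - q[0] >= self.window:
            q.popleft()  # drop creations older than the window
        if len(q) >= self.limit:
            return False  # caller responds 429
        q.append(now)
        return True
```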

CC-6: Image Generation

Status: Complete

#	Criterion	Verification	Priority
AC-CC.6.1	`POST /v1/images/generations` returns 200 with image data or URL	POST with prompt	P0
AC-CC.6.2	Credits deducted based on image generation pricing	Compare balance before/after	P0
AC-CC.6.3	0-credit user receives 402	POST with 0-credit user	P0

CC-7: Audio Transcription

Status: Complete

#	Criterion	Verification	Priority
AC-CC.7.1	File upload transcription returns 200 with text	POST with audio file	P0
AC-CC.7.2	Base64 transcription returns 200	`POST /v1/audio/transcriptions/base64`	P0
AC-CC.7.3	Unsupported format returns appropriate error	POST with invalid format	P1

CC-8: Server-Side Tools

Status: Complete

#	Criterion	Verification	Priority
AC-CC.8.1	`GET /v1/tools` returns available tools (web_search, text_to_speech)	Inspect response	P0
AC-CC.8.2	Tool definitions in OpenAI function-calling format	`GET /v1/tools/definitions`	P0
AC-CC.8.3	Nonexistent tool returns 404	`GET /v1/tools/fake_tool`	P0
AC-CC.8.4	Web search execution returns results	`POST /v1/tools/execute` with web_search	P0
AC-CC.8.5	SSRF protection blocks internal/private IP ranges	Attempt internal URL in tool execution	P0
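One common shape for the AC-CC.8.5 check is to reject any URL whose host is, or resolves to, a non-public address. This is a general sketch using the standard library's `ipaddress.is_global`, not the gateway's actual implementation. The resolver is injectable for testing.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def _is_public(ip_text: str) -> bool:
    ip = ipaddress.ip_address(ip_text.split("%", 1)[0])  # drop any IPv6 zone id
    return ip.is_global and not ip.is_multicast


def is_url_allowed(url: str, resolve=socket.getaddrinfo) -> bool:
    """Reject non-HTTP(S) URLs and hosts that are or resolve to private IPs."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    host = parsed.hostname
    try:
        return _is_public(host)  # host is an IP literal
    except ValueError:
        pass  # not an IP literal: resolve the hostname
    try:
        infos = resolve(host, None)
    except OSError:
        return False
    addresses = {info[4][0] for info in infos}
    return bool(addresses) and all(_is_public(a) for a in addresses)
```

A check-then-fetch design like this is still exposed to DNS rebinding and to redirects into private ranges, so a production version would typically connect to the validated address and re-check every redirect target.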

CC-9: Partner Trials

Status: Complete

#	Criterion	Verification	Priority
AC-CC.9.1	Partner config is publicly accessible	`GET /partner-trials/config/{code}`	P0
AC-CC.9.2	Partner code check always returns 200 (valid/invalid in body)	`GET /partner-trials/check/{code}`	P0
AC-CC.9.3	Starting partner trial applies partner-specific credits and limits	`POST /partner-trials/start` with known partner code	P0
AC-CC.9.4	Partner trial daily limit is enforced	Exceed daily limit	P0
AC-CC.9.5	Partner trial config is cached in memory for 5 minutes	Fetch the same config twice within 5 minutes and verify the second fetch is served from cache	P1
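The 5-minute in-memory caching in AC-CC.9.5 corresponds to a simple TTL cache. This hypothetical sketch takes an injectable clock, so a test can check the expiry boundary without waiting.

```python
import time


class TTLCache:
    """Minimal in-memory cache whose entries expire after `ttl` seconds."""

    def __init__(self, ttl: float = 300.0, clock=time.monotonic):
        self.ttl, self.clock = ttl, clock
        self._data = {}

    def get_or_load(self, key, loader):
        """Return a fresh cached value, or call loader(key) and cache the result."""
        now = self.clock()
        hit = self._data.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]
        value = loader(key)
        self._data[key] = (now, value)
        return value
```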

CC-10: Notifications

Status: Complete (partial test coverage)

#	Criterion	Verification	Priority
AC-CC.10.1	User can retrieve notification preferences	`GET /user/notifications/preferences`	P0
AC-CC.10.2	Usage report can be triggered on demand	`POST /user/notifications/send-usage-report`	P0
AC-CC.10.3	Test notification sends successfully	`POST /user/notifications/test`	P0
AC-CC.10.4	Notification failure does not crash the system	Disable Resend, verify graceful handling	P0
AC-CC.10.5	Retry logic on notification delivery failure	Verify 2-3 retries with backoff	P2 (NOT IMPLEMENTED)

Known Issues:

P2-4 (Delta Report): No retry logic, no persistent delivery tracking. On failure: logs error, returns False, continues silently.
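AC-CC.10.5 asks for 2-3 retries with backoff. A minimal sketch, with hypothetical names: retry up to three attempts with exponential delays, never raise (so AC-CC.10.4 still holds), and report the outcome to the caller. It does not cover the persistent delivery tracking half of P2-4, and it omits jitter for clarity.

```python
import logging
import time

logger = logging.getLogger("notifications")


def send_with_retry(send, message, attempts: int = 3, base_delay: float = 1.0,
                    sleep=time.sleep) -> bool:
    """Call send(message) up to `attempts` times with exponential backoff.

    Returns True on success and False after the final failure; never raises.
    """
    for attempt in range(1, attempts + 1):
        try:
            send(message)
            return True
        except Exception as exc:
            logger.warning("notification attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt < attempts:
                sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, ...
    return False
```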

CC-11: Admin Operations

Status: Complete

#	Criterion	Verification	Priority
AC-CC.11.1	Non-admin users receive 403 on ALL admin endpoints	Use user key on admin endpoint	P0
AC-CC.11.2	Admin can list, search, view user details	`GET /admin/users`, `/admin/users/{id}`	P0
AC-CC.11.3	Admin credit grants respect per-transaction cap and 24h daily limit	Exceed limits	P0
AC-CC.11.4	Admin can assign plans	`POST /admin/assign-plan`	P0
AC-CC.11.5	System monitor returns user counts, credit totals, API usage	`GET /admin/monitor`	P0
AC-CC.11.6	Cache operations work (status, refresh, clear)	GET/POST cache endpoints	P1
AC-CC.11.7	Model sync can be triggered	`POST /admin/model-sync/trigger`	P1
AC-CC.11.8	`GET /admin/model-sync/providers` requires admin auth	Verify auth enforcement	P0 (KNOWN RISK)
AC-CC.11.9	Bulk user delete by domain respects protected domains (gmail, yahoo, outlook)	Attempt protected domain delete	P0
AC-CC.11.10	Bulk user delete defaults to dry_run=true	Verify default behavior	P0
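AC-CC.11.9 and AC-CC.11.10 can be expressed as a guard plus a safe default. In this hypothetical sketch, the protected list uses assumed exact domain strings for gmail, yahoo, and outlook, matching is exact on the part after `@`, and nothing is removed unless the caller explicitly passes `dry_run=False`.

```python
PROTECTED_DOMAINS = frozenset({"gmail.com", "yahoo.com", "outlook.com"})


def bulk_delete_by_domain(users: list[dict], domain: str, dry_run: bool = True) -> dict:
    """Delete (or, by default, only report) users whose email is on `domain`."""
    domain = domain.lower().lstrip("@")
    if domain in PROTECTED_DOMAINS:
        raise PermissionError(f"{domain} is a protected domain")
    matches = [u for u in users if u["email"].lower().rsplit("@", 1)[-1] == domain]
    if not dry_run:
        ids = {u["id"] for u in matches}
        users[:] = [u for u in users if u["id"] not in ids]  # delete in place
    return {"dry_run": dry_run, "matched": len(matches), "ids": [u["id"] for u in matches]}
```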

Known Issues:

P0-6 (Delta Report): GET /admin/model-sync/providers documented as "No auth enforced" — leaks infrastructure details (33 providers).

CC-12: Security

Status: Complete

#	Criterion	Verification	Priority
AC-CC.12.1	API keys are Fernet-encrypted in DB	Query DB directly	P0
AC-CC.12.2	API key lookup uses HMAC hash, not brute-force decryption	Verify code path	P0
AC-CC.12.3	SQL injection attempts are sanitized/rejected	`'; DROP TABLE users; --` in inputs	P0
AC-CC.12.4	XSS payloads are sanitized/rejected	`<script>alert(1)</script>` in inputs	P0
AC-CC.12.5	Command injection blocked	`; rm -rf /` in inputs	P0
AC-CC.12.6	Path traversal blocked	`../../etc/passwd` in inputs	P0
AC-CC.12.7	Error messages never expose stack traces, internal paths, or sensitive data	Trigger errors, inspect responses	P0
AC-CC.12.8	Admin security violations logged in audit trail	Attempt unauthorized admin access	P0
AC-CC.12.9	Temporary/disposable email domains detected during registration	Register with `user@tempmail.com`	P1
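Two of these criteria have standard-library defenses that a test can exercise directly. Parameterized queries make the `'; DROP TABLE users; --` payload from AC-CC.12.3 inert data rather than rejecting it. Resolving a path and requiring it to stay under a base directory blocks the `../../etc/passwd` case from AC-CC.12.6. These are general techniques with hypothetical function names, not a description of Gatewayz's input sanitizer.

```python
import sqlite3
from pathlib import Path


def safe_join(base: Path, user_path: str) -> Path:
    """Resolve user_path under base; reject anything that escapes base."""
    root = base.resolve()
    candidate = (root / user_path).resolve()
    if not candidate.is_relative_to(root):
        raise ValueError(f"path escapes base directory: {user_path!r}")
    return candidate


def find_user(conn: sqlite3.Connection, email: str):
    """Bind input as a parameter so it is never spliced into the SQL text."""
    return conn.execute("SELECT id FROM users WHERE email = ?", (email,)).fetchone()
```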

CC-13: Google Vertex Function Calling

Status: Partial

#	Criterion	Verification	Priority
AC-CC.13.1	REST path function calling works (OpenAI tools → Vertex functionDeclarations)	POST with tools to Vertex model via REST	P0
AC-CC.13.2	SDK path function calling either works OR is avoided when tools present	POST with tools via SDK path	P1 (KNOWN BUG)
AC-CC.13.3	Tool choice options (auto, required, none) are translated correctly	Test each tool_choice value	P1

Code References:

src/services/google_vertex_client.py (lines 250-402, 662-707) — REST path implemented
- Lines 585-587 — SDK path has TODO: "Function calling may not work correctly"

Known Issues:

P1-5 (Delta Report): SDK path has TODO. If SDK path is used when tools are present, function calling silently fails.
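AC-CC.13.1 and AC-CC.13.3 describe a translation from OpenAI `tools`/`tool_choice` to the Gemini REST request body. The sketch below follows the public Vertex AI Gemini REST field names (`functionDeclarations`, `toolConfig.functionCallingConfig`, modes `AUTO`/`ANY`/`NONE`). It is an illustration under that assumption, not the code in `google_vertex_client.py`. Gemini accepts only a subset of JSON Schema, so real `parameters` may need pruning.

```python
def openai_tools_to_vertex(tools: list[dict], tool_choice="auto") -> dict:
    """Translate OpenAI tools/tool_choice into Gemini REST tools/toolConfig."""
    declarations = []
    for tool in tools:
        if tool.get("type") != "function":
            continue  # only function tools are translated here
        fn = tool["function"]
        decl = {"name": fn["name"], "description": fn.get("description", "")}
        if "parameters" in fn:
            decl["parameters"] = fn["parameters"]
        declarations.append(decl)

    if isinstance(tool_choice, dict):
        # {"type": "function", "function": {"name": ...}} forces one function.
        config = {"mode": "ANY", "allowedFunctionNames": [tool_choice["function"]["name"]]}
    else:
        modes = {"auto": "AUTO", "required": "ANY", "none": "NONE"}
        if tool_choice not in modes:
            raise ValueError(f"unsupported tool_choice: {tool_choice!r}")
        config = {"mode": modes[tool_choice]}

    return {
        "tools": [{"functionDeclarations": declarations}],
        "toolConfig": {"functionCallingConfig": config},
    }
```

Until the SDK path's TODO is resolved, routing any request that carries `tools` through a REST translation like this one is one way to satisfy the "avoided when tools present" branch of AC-CC.13.2.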

Summary Matrix

Layer	Feature	Criteria	Status	Known Issues
1	API Key Auth	10	Complete	—
1	RBAC	6	Complete	—
1	IP Allowlists	6	Complete	—
1	Domain Restrictions	4	Complete	—
1	Three-Layer Rate Limiting	14	Complete	P0-5: Missing headers on L2/L3
1	Input Guardrails (4 features)	4	Not Implemented	Deferred
1	Output Guardrails (3 features)	3	Not Implemented	Deferred
2	Model Resolution	7	Complete	—
2	General Router	7	Complete	—
2	Code Router	7	Complete	—
2	Provider Failover	9	Complete	—
2	Circuit Breakers	10	Complete	P1-7: Timing discrepancy (60s vs 5min)
2	Health-Weighted LB	2	Partial	—
2	Latency/Cost Optimal	2	Partial	Hardcoded latency model
2	Traffic Splitting	1	Not Implemented	Deferred
3	Tiered Health Monitoring	10	Complete	—
3	Passive Health Capture	2	Complete	—
3	Incident Management	5	Complete	—
3	Model Quality Scoring	2	Partial	Static/hardcoded
3	Per-Customer Quality	1	Not Implemented	Deferred
3	Provider Credit Monitoring	3	Partial	P1-1: OpenRouter only
4	Semantic Cache	1	Not Implemented	Deferred
4	Exact-Match Cache	1	Not Implemented	Deferred (infra exists)
4	Butter.dev Cache	2	Partial	P0-1: Ghost feature
4	Supporting Caches	5	Complete	—
5	Background Model Sync	4	Complete	—
5	Model Metadata Standard	2	Complete	—
5	Catalog Inclusion	3	Partial	P1-3: No gating at sync
5	HuggingFace Enrichment	2	Complete	—
5	Model Discovery & Search	5	Complete	—
6	Credit System	14	Complete	P0-2/3/4: Atomicity, pricing guard, refund
6	Plans & Tiers	9	Complete	P0-7: Config mismatch
6	Customer Usage Analytics	5	Partial	P1-4: Pagination bug, P2-1/2: Per-key, export
6	Customer Webhooks	1	Not Implemented	Deferred
6	SLA Tracking	1	Not Implemented	Deferred
7	Prompt Management	1	Not Implemented	Deferred
7	Batch/Async Inference	1	Not Implemented	Deferred
7	Evaluation & Testing	1	Not Implemented	Deferred
7	Playground	1	Not Implemented	Deferred
8	Metrics & Dashboards	7	Complete	—
8	Distributed Tracing	3	Complete	—
8	Error Tracking	6	Complete	—
8	AI-Specific Tracing	2	Partial	Arize/Braintrust gaps
8	Profiling	2	Complete	—
8	Customer Observability	4	Partial	P2-3: No latency API
9	OpenAI-Compatible API	11	Complete	P1-2: Error format, P1-8: Stream drops
9	Anthropic-Compatible API	4	Complete	—
10	Multi-Region Routing	1	Not Implemented	Deferred
10	Data Residency	1	Not Implemented	Deferred
10	Multi-Target Deployment	3	Complete	—
CC	Stripe Payments	8	Complete	—
CC	Coupons	6	Complete	—
CC	Referrals	5	Complete	—
CC	Chat History	7	Complete	—
CC	API Key Management	7	Complete	—
CC	Image Generation	3	Complete	—
CC	Audio Transcription	3	Complete	—
CC	Server-Side Tools	5	Complete	—
CC	Partner Trials	5	Complete	—
CC	Notifications	5	Complete	P2-4: No retry/delivery tracking
CC	Admin Operations	10	Complete	P0-6: Model-sync providers auth
CC	Security	9	Complete	—
CC	Google Vertex FC	3	Partial	P1-5: SDK path TODO
	TOTAL	294

Priority Summary

Priority	Count	Description
P0	7 bugs across 46 criteria	Ghost features, billing atomicity, pricing guard, refund verification, rate limit headers, admin auth, trial config
P1	8 bugs across 28 criteria	Provider monitoring, error format, catalog gating, pagination, Vertex FC, overage, circuit breaker timing, stream normalization
P2	4 gaps across 12 criteria	Per-key usage, export, latency API, notification delivery
Deferred	20 features, 24 criteria	Guardrails, caching, webhooks, batch, prompts, eval, SLA, geo-routing, GDPR, traffic splitting

Source: Conceptual Model Features | Features | Delta Report | Testing Plan | Acceptance Criteria
