Features Acceptance Criteria

Acceptance Criteria

Reading path: Conceptual Model | Stability Definition | Conceptual Model Features | Features | Delta Report | Acceptance Criteria (you are here)

Read after: Delta Report (so you know the gaps and priorities). You're at the end of the reading path! From here, go to the Testing Guide to see how these criteria are tested.

TL;DR — This is the single source of truth for acceptance criteria across all 56 features in the Conceptual Model. Each feature has: what it must do, what it must NOT do, testable acceptance criteria, implementation status, code references, known issues, and priority. Organized by the 10-layer architecture. Use this to verify any feature is "done." For detailed Given/When/Then format criteria and boundary validations, see Conceptual Model Acceptance Criteria.

Consolidation note: This document is the primary acceptance criteria reference. It incorporates the implementation-aware criteria. For the spec-pure criteria (Given/When/Then format with integration requirements), see Conceptual Model Acceptance Criteria. For compact test-plan-linked criteria, see the Testing Plan directly.

How to Read This Document

Each feature section includes:

Description: What the feature does and its boundaries (from Conceptual Model)
Implementation Status: Current state (Complete / Partial / Not Implemented)
Acceptance Criteria: Numbered, testable statements — a feature is accepted when ALL criteria pass
Code References: File paths and line numbers for verification
Known Issues: Bugs, gaps, or discrepancies found during code investigation
Priority: P0 (must fix before release), P1 (should fix), P2 (nice to have), Deferred (post-release)

Layer 1: Ingress

1.1 API Key Authentication

Status: Complete

What it does: Authenticates every API request using API keys encrypted at rest with Fernet AES-128. Keys are looked up via HMAC-SHA256 hash for O(log n) retrieval. Validates that keys are active, not expired, and not rate-limited.

What it does NOT do: No OAuth/JWT for API requests. No automatic key rotation. No multi-key auth per request.

#	Criterion	Verification	Priority
AC-1.1.1	Valid API key in `Authorization: Bearer gw_*` header returns 200	Send request with valid key	P0
AC-1.1.2	Invalid API key returns 401, never 200 or 500	Send request with `Bearer invalid_key`	P0
AC-1.1.3	Expired API key returns 401 with clear message	Use a key past its `expires_at`	P0
AC-1.1.4	Deactivated API key (`is_active=false`) returns 401	Deactivate key, then use it	P0
AC-1.1.5	API keys in DB are Fernet-encrypted ciphertext, never plaintext	Query `api_keys_new` table directly, verify encrypted_key column is ciphertext	P0
AC-1.1.6	Key lookup uses HMAC-SHA256 hash index, not brute-force decryption of all keys	Verify `key_hash` column is indexed, lookup is O(log n) by timing with 1 key vs 1000 keys	P0
AC-1.1.7	Key format is `gw_{env}_{43_random_chars}` (e.g., `gw_live_abc123...`)	Create new key, verify format regex	P0
AC-1.1.8	Key creation stores `last4` characters for user-friendly identification	Create key, check `last4` field in response and DB	P1
AC-1.1.9	Authentication is cached (5-min TTL, 512-entry LRU) — second request with same key is faster	Time two consecutive auth calls, second should be <5ms vs 50-150ms	P1
AC-1.1.10	When Redis is down, auth cache falls back to local memory, so the outage never blocks requests	Stop Redis, verify auth still works	P0

Code References:

src/security/security.py — Fernet encryption, HMAC hashing
src/security/deps.py — get_api_key(), get_current_user(), validate_api_key_security()
src/db/api_keys.py — Key CRUD, key lookup by hash
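
As a rough illustration of AC-1.1.6 through AC-1.1.8, the sketch below generates a key in the `gw_{env}_{43_random_chars}` shape and derives a deterministic HMAC-SHA256 digest suitable for an indexed `key_hash` column. It uses only the standard library. `HMAC_SECRET`, the `test` environment name, the character alphabet, and the function names are assumptions for illustration, not the contents of src/security/security.py. Fernet encryption of the stored key is omitted.

```python
import hashlib
import hmac
import re
import secrets

# Hypothetical server-side secret; a real deployment would load this from config.
HMAC_SECRET = b"example-hmac-secret"

# Assumed alphabet: URL-safe base64 characters, as produced by secrets.token_urlsafe.
KEY_PATTERN = re.compile(r"^gw_(live|test)_[A-Za-z0-9_-]{43}$")


def generate_api_key(env: str = "live") -> str:
    """Return a key shaped like gw_{env}_{43_random_chars}."""
    # 32 random bytes encode to exactly 43 URL-safe base64 characters.
    return f"gw_{env}_{secrets.token_urlsafe(32)}"


def lookup_hash(api_key: str) -> str:
    """Deterministic HMAC-SHA256 digest used as the indexed lookup value."""
    return hmac.new(HMAC_SECRET, api_key.encode("utf-8"), hashlib.sha256).hexdigest()


key = generate_api_key("live")
# Shape of a stored record: the digest is looked up directly, with no decryption.
record = {"key_hash": lookup_hash(key), "last4": key[-4:]}
```

Because the digest is deterministic, an incoming key can be hashed once and matched against the index, which is what makes lookup independent of decrypting every stored key.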

1.2 Role-Based Access Control (RBAC)

Status: Complete

What it does: Assigns roles (admin, team, dev, free) to users. Permissions checked at dependency-injection level before route handlers execute. Role changes are audit-logged.

What it does NOT do: No granular per-model permissions. No custom roles. No team-level RBAC. No provider-level permissions.

#	Criterion	Verification	Priority
AC-1.2.1	Non-admin API key returns 403 on ALL `/admin/*` endpoints	`GET /admin/users` with user key	P0
AC-1.2.2	Admin API key returns 200 on admin endpoints	`GET /admin/users` with admin key	P0
AC-1.2.3	Unauthorized admin access attempts are logged via `audit_logger.log_security_violation("UNAUTHORIZED_ADMIN_ACCESS")`	Attempt admin access with user key, check audit log	P0
AC-1.2.4	Role updates require admin auth and are logged with a reason	`POST /admin/roles/update` with user_id, new_role, reason	P0
AC-1.2.5	`GET /admin/roles/permissions/{role}` returns the correct permission set for each role	Check all 4 roles	P1
AC-1.2.6	Role change audit log is retrievable at `GET /admin/roles/audit/log`	Verify entries with timestamps and reasons	P1

Code References:

src/security/deps.py — require_admin dependency
src/routes/admin.py — Admin route handlers
src/db/roles.py — Role management

1.3 Per-Key IP Allowlists

Status: Complete

What it does: Restricts API key usage to specific IP addresses or CIDR ranges. Requests from non-allowlisted IPs are rejected before processing.

What it does NOT do: No geo-based restrictions. No IPv6 ranges. No automatic IP suggestions.

#	Criterion	Verification	Priority
AC-1.3.1	Admin can create IP allowlist entries with IPv4 addresses	`POST /api/admin/ip-whitelist` with `{"ip": "1.2.3.4"}`	P0
AC-1.3.2	Admin can create IP allowlist entries with CIDR notation	`POST /api/admin/ip-whitelist` with `{"ip": "10.0.0.0/24"}`	P0
AC-1.3.3	API key with allowlist rejects requests from non-allowed IPs with 403	Use key from IP not in allowlist	P0
AC-1.3.4	API key with allowlist accepts requests from allowed IPs	Use key from allowlisted IP	P0
AC-1.3.5	`POST /api/admin/ip-whitelist/check` correctly reports allowed vs blocked IPs	Test with both allowed and blocked IPs	P1
AC-1.3.6	Allowlist entries can be listed, updated, and deleted	CRUD operations on `/api/admin/ip-whitelist/*`	P1

Code References:

src/routes/admin.py — IP allowlist endpoints
src/security/deps.py — IP validation in validate_api_key_security()
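
The allowlist check in AC-1.3.3 and AC-1.3.4 can be sketched with the standard `ipaddress` module. This is a minimal sketch, assuming the allowlist is a list of strings holding either a single IPv4 address or a CIDR range. The function name `ip_is_allowed` is hypothetical and is not the code in validate_api_key_security().

```python
import ipaddress
from typing import Iterable


def ip_is_allowed(client_ip: str, allowlist: Iterable[str]) -> bool:
    """Return True if client_ip matches any IPv4 address or CIDR entry."""
    try:
        addr = ipaddress.IPv4Address(client_ip)
    except ipaddress.AddressValueError:
        return False  # Not a valid IPv4 address; IPv6 is out of scope here.
    for entry in allowlist:
        # A bare address such as "1.2.3.4" parses as a /32 network.
        network = ipaddress.IPv4Network(entry, strict=False)
        if addr in network:
            return True
    return False


allowlist = ["1.2.3.4", "10.0.0.0/24"]
```

How a key with no allowlist entries is treated (presumably unrestricted) is left to the caller in this sketch.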

1.4 Domain Restrictions

Status: Complete

What it does: Limits which HTTP referrer domains can use a specific API key. Prevents stolen keys from being used on unauthorized domains.

What it does NOT do: No domain ownership validation. No subdomain wildcards. No server-side restriction (only applies when Referer header present).

#	Criterion	Verification	Priority
AC-1.4.1	API key with domain restriction rejects requests with wrong Referer header	Send request with `Referer: https://unauthorized.com`	P0
AC-1.4.2	API key with domain restriction accepts requests with correct Referer	Send request with configured Referer domain	P0
AC-1.4.3	Requests without Referer header bypass domain restriction (server-side usage)	Send request without Referer header	P0
AC-1.4.4	Multiple domains can be configured per key	Configure 3 domains, verify all work	P1

Code References:

src/security/deps.py — validate_api_key_security() domain check
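
The Referer rules in AC-1.4.1 through AC-1.4.4 reduce to a small predicate. The sketch below assumes exact hostname matching (consistent with "no subdomain wildcards") and case-insensitive comparison. The function name is hypothetical.

```python
from typing import Iterable, Optional
from urllib.parse import urlparse


def referer_is_allowed(referer: Optional[str], allowed_domains: Iterable[str]) -> bool:
    """Apply the domain restriction only when a Referer header is present."""
    if not referer:
        return True  # Server-side usage: no Referer, so the check is bypassed.
    host = (urlparse(referer).hostname or "").lower()
    # Exact hostname match only; subdomain wildcards are not supported.
    return host in {domain.lower() for domain in allowed_domains}
```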

1.5 Three-Layer Rate Limiting

Status: Complete (with known header gap on Layers 2 and 3)

What it does:

Layer 1 (IP): Security middleware with behavioral analysis, velocity detection. 300 RPM for unauthenticated, authenticated users exempt.
Layer 2 (API Key): Redis-backed per-key limits tied to plan tier.
Layer 3 (Anonymous): Stricter limits for unauthenticated requests.
Fallback: In-memory rate limiter when Redis is unavailable.

What it does NOT do: No per-model rate limits. No burst/token-bucket. No cross-instance IP state sharing. No credits charged for rejected requests.

#	Criterion	Verification	Priority
AC-1.5.1	Unauthenticated requests exceeding 300 RPM from same IP receive 429	Send 301 requests from one IP	P0
AC-1.5.2	Authenticated users are exempt from IP-level rate limiting	Verify no IP block on auth requests	P0
AC-1.5.3	API key exceeding plan RPM receives 429	Exceed per-key limit	P0
AC-1.5.4	Anonymous rate limits are stricter than authenticated limits	Compare thresholds for anon vs auth	P0
AC-1.5.5	Layer 1 429 response includes `Retry-After`, `X-RateLimit-Limit`, `X-RateLimit-Remaining`, `X-RateLimit-Reset`, `X-RateLimit-Reason`, `X-RateLimit-Mode`	Trigger Layer 1 429, inspect headers	P0
AC-1.5.6	Layer 2 429 response includes `Retry-After`, `X-RateLimit-Limit`, `X-RateLimit-Remaining`, `X-RateLimit-Reset`	Trigger Layer 2 429, inspect headers	P0 (KNOWN BUG)
AC-1.5.7	Layer 3 429 response includes `Retry-After` and `X-RateLimit-*` headers	Trigger Layer 3 429, inspect headers	P0 (KNOWN BUG)
AC-1.5.8	When Redis is down, rate limiting continues via in-memory fallback, so the outage itself never blocks requests	Stop Redis, verify rate limiting works	P0
AC-1.5.9	Velocity mode activates when error rate exceeds 25% and reduces limits to 50%	Trigger >25% error rate, check `GET /velocity-mode-status`	P0
AC-1.5.10	Velocity mode deactivates after 3 minutes of normal error rates	Wait for cooldown, verify normal limits restored	P1
AC-1.5.11	Rate limit configuration viewable at `GET /user/rate-limits`	Check response format	P1
AC-1.5.12	Per-key rate limits updatable via `PUT /user/rate-limits/{key_id}`	Update and verify enforcement	P1
AC-1.5.13	Auth endpoint rate-limits to 10 requests per 15 minutes per IP	`POST /auth` 11 times, 11th returns 429	P0
AC-1.5.14	Registration rate-limits to 3 requests per hour per IP	`POST /auth/register` 4 times, 4th returns 429	P0

Code References:

src/middleware/security_middleware.py (lines 647-716) — Layer 1, headers present
src/services/rate_limiting.py (lines 78-94) — Layer 2, RateLimitResult dataclass has fields but NOT converted to HTTP headers
src/services/anonymous_rate_limiter.py — Layer 3, NO headers
src/services/rate_limiting_fallback.py — In-memory fallback

Known Issues:

P0-5 (Delta Report): Layer 2 RateLimitResult fields exist but are not converted to HTTP response headers. Layer 3 has no rate limit headers at all. Clients get bare 429 rejections with no retry information.
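
One possible shape for the P0-5 fix is a small helper that maps a rate limit result onto the headers required by AC-1.5.6. The dataclass below is a hypothetical stand-in: its field names are assumptions and may not match the real RateLimitResult in src/services/rate_limiting.py.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class RateLimitResult:
    """Hypothetical stand-in; field names are assumptions, not the real class."""
    allowed: bool
    limit: int
    remaining: int
    reset_epoch: int   # Unix time when the window resets
    retry_after: int   # Seconds the client should wait


def rate_limit_headers(result: RateLimitResult) -> Dict[str, str]:
    """Map a rate limit result onto the headers named in AC-1.5.6."""
    headers = {
        "X-RateLimit-Limit": str(result.limit),
        "X-RateLimit-Remaining": str(max(result.remaining, 0)),
        "X-RateLimit-Reset": str(result.reset_epoch),
    }
    if not result.allowed:
        # Only a rejected request needs to tell the client when to retry.
        headers["Retry-After"] = str(result.retry_after)
    return headers
```

A 429 response for Layer 2 or Layer 3 could then attach the returned dictionary as response headers.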

1.6–1.9 Input Guardrails (PII Detection, Prompt Injection, Topic Restrictions, Content Moderation)

Status: Not Implemented (Deferred)

What these would do: PII scanning (phone, SSN, email, credit card), prompt injection pattern detection, per-key topic restrictions, content moderation via external classifiers.

#	Criterion	Verification	Priority
AC-1.6.1	PII detection scans prompts for phone numbers, SSNs, emails, credit card numbers	Send prompt with PII	Deferred
AC-1.7.1	Prompt injection patterns that attempt to override system prompts are detected and blocked	Send known injection pattern	Deferred
AC-1.8.1	Per-API-key topic restrictions limit responses to configured domains	Configure restriction, test out-of-domain	Deferred
AC-1.9.1	Content moderation blocks harmful inputs before reaching providers	Send harmful content	Deferred

Note: These are Conceptual Model features (D-1 through D-4 in Delta Report). Not required for stable release. No code exists.

1.10–1.12 Output Guardrails (Content Filtering, Structured Output Validation, Hallucination Flags)

Status: Not Implemented (Deferred)

#	Criterion	Verification	Priority
AC-1.10.1	Output content filtering scans responses for policy violations before returning	Trigger policy-violating response	Deferred
AC-1.11.1	Structured output validation confirms JSON schema conformance when requested	Request JSON schema output	Deferred (D-5, Small effort)
AC-1.12.1	Provider-side safety metadata (refusals, safety triggers) is surfaced in standardized format	Trigger safety filter, inspect response	Deferred

Layer 2: Core Routing Engine

2.1 Model Resolution Pipeline

Status: Complete

What it does: Three-stage pipeline: alias normalization (120+ aliases) → provider detection (overrides → format rules → mapping tables → org-prefix fallbacks) → model ID transformation (provider-native format).

What it does NOT do: No user-defined aliases. No version/snapshot resolution. No per-modality routing differences.

#	Criterion	Verification	Priority
AC-2.1.1	`gpt-4o` resolves to `openai/gpt-4o`	`POST /v1/chat/completions` with `model: "gpt-4o"`	P0
AC-2.1.2	`r1` resolves to `deepseek/deepseek-r1`	`POST /v1/chat/completions` with `model: "r1"`	P0
AC-2.1.3	Canonical IDs (e.g., `openai/gpt-4o`) work directly without alias resolution	POST with canonical ID	P0
AC-2.1.4	Provider detection correctly routes `google/gemini-*` models to Vertex when credentials available	POST with Gemini model	P0
AC-2.1.5	No alias maps to itself (no self-referencing loops)	Inspect `MODEL_ALIASES` dict for cycles	P0
AC-2.1.6	Fireworks model IDs are transformed to `accounts/fireworks/models/...` format	POST with Fireworks model, verify upstream call format	P1
AC-2.1.7	Nonexistent model returns 400 or 404, not 500	POST with `model: "nonexistent/model"`	P0

Code References:

src/services/models.py — MODEL_ALIASES dict, resolution pipeline
src/services/model_transformations.py — Provider-specific ID transformations
src/services/model_availability.py — Availability checking
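
The alias stage and the AC-2.1.5 cycle check can be sketched as follows. Only the two aliases quoted in the criteria are included. The function names are hypothetical, and the provider detection and ID transformation stages are not modeled.

```python
from typing import Dict, List

# Two aliases taken from the criteria above; the real table has 120+ entries.
MODEL_ALIASES: Dict[str, str] = {
    "gpt-4o": "openai/gpt-4o",
    "r1": "deepseek/deepseek-r1",
}


def normalize_alias(model: str, aliases: Dict[str, str] = MODEL_ALIASES) -> str:
    """Return the canonical ID for an alias; canonical IDs pass through."""
    return aliases.get(model, model)


def self_referencing_aliases(aliases: Dict[str, str]) -> List[str]:
    """List aliases that map to themselves or that lead into a cycle."""
    bad = []
    for start in aliases:
        seen = {start}
        current = aliases[start]
        # Follow the chain until it leaves the alias table or revisits a name.
        while current in aliases:
            if current in seen:
                bad.append(start)
                break
            seen.add(current)
            current = aliases[current]
    return bad
```

An empty result from `self_referencing_aliases(MODEL_ALIASES)` is the condition AC-2.1.5 asks for.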

2.2 Intelligent Routing — General Router

Status: Complete

What it does: ML-powered model selection via NotDiamond. Four modes: quality (openai/gpt-4o), cost (openai/gpt-4o-mini), latency (groq/llama-3.3-70b-versatile), balanced (anthropic/claude-sonnet-4). Falls back to mode-specific defaults when NotDiamond unavailable.

What it does NOT do: No user feedback learning. No custom model pools. No routing constraints.

#	Criterion	Verification	Priority
AC-2.2.1	`router:general:quality` selects a high-quality model and returns 200	POST chat with `model: "router:general:quality"`	P0
AC-2.2.2	`router:general:cost` selects a cheaper model than quality mode	Compare selected models for same prompt	P0
AC-2.2.3	`router:general:latency` selects a low-latency model	POST and verify selection	P0
AC-2.2.4	`router:general:balanced` considers quality, cost, and latency	POST and verify selection	P0
AC-2.2.5	When NotDiamond is unavailable, fallback models are used without error	Disable NotDiamond, verify graceful fallback	P0
AC-2.2.6	`GET /general-router/settings/options` returns available strategies and model pools	Inspect response	P1
AC-2.2.7	`POST /general-router/test` returns selected model + reasoning	POST with sample prompt	P1

Code References:

src/services/general_router.py — Routing logic, NotDiamond integration
src/routes/general_router.py — Endpoints
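
The fallback behaviour in AC-2.2.5 amounts to a lookup from mode to default model. The sketch below uses the four defaults listed in the description. The parsing of the `router:general:<mode>` string and both function names are assumptions, and the NotDiamond call itself is not shown.

```python
from typing import Optional

# Mode-specific defaults as listed in the feature description above.
FALLBACK_MODELS = {
    "quality": "openai/gpt-4o",
    "cost": "openai/gpt-4o-mini",
    "latency": "groq/llama-3.3-70b-versatile",
    "balanced": "anthropic/claude-sonnet-4",
}


def parse_router_mode(model: str) -> Optional[str]:
    """Extract the mode from 'router:general:<mode>', or None if not a match."""
    parts = model.split(":")
    if len(parts) == 3 and parts[:2] == ["router", "general"] and parts[2] in FALLBACK_MODELS:
        return parts[2]
    return None


def fallback_model(model: str) -> str:
    """Pick the default used when the ML selector is unavailable."""
    mode = parse_router_mode(model)
    if mode is None:
        raise ValueError(f"not a general router model string: {model!r}")
    return FALLBACK_MODELS[mode]
```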

2.3 Intelligent Routing — Code Router

Status: Complete

What it does: Benchmark-driven model selection for coding tasks. 4 tiers by SWE-bench/HumanEval scores. Modes: auto (complexity-based), price, quality, agentic. Static data from code_quality_priors.json.

What it does NOT do: No code execution. No feedback learning. No custom tiers. No language detection.

#	Criterion	Verification	Priority
AC-2.3.1	`router:code:auto` classifies prompt complexity and selects appropriate tier	POST with code prompt	P0
AC-2.3.2	`router:code:quality` selects highest-tier code model	POST and verify	P0
AC-2.3.3	`router:code:price` selects cost-effective code model	POST and verify	P0
AC-2.3.4	`router:code:agentic` selects model optimized for multi-step tool use	POST and verify	P0
AC-2.3.5	`GET /code-router/tiers` returns models with SWE-bench/HumanEval scores	Inspect response	P0
AC-2.3.6	Code router works entirely from in-memory data (no DB/Redis dependency)	Verify response with Redis down	P0
AC-2.3.7	`POST /code-router/test` returns selected model and routing rationale	POST with sample prompt	P1

Code References:

src/services/code_router.py — Routing logic, tier selection
src/services/code_quality_priors.json — Static benchmark data
src/routes/code_router.py — Endpoints

2.4 Provider Failover

Status: Complete

What it does: 14-provider prioritized failover chain. Failover triggers on 401/402/403/404/502/503/504. Does NOT trigger on 400 (user error) or 429 (retries with backoff). Model-aware rules: OpenAI → OpenRouter only, Anthropic → OpenRouter only, open-source → all providers.

What it does NOT do: No mid-stream failover. No user-configured chains. No same-pricing guarantee across providers.

#	Criterion	Verification	Priority
AC-2.4.1	Primary provider 502/503/504 → request succeeds via fallback transparently	Force primary failure, verify success	P0
AC-2.4.2	Provider 401/402/403/404 → failover to next provider	Force auth error, verify failover	P0
AC-2.4.3	Provider 400 (user error) → returns 400 to user, NO failover	Send malformed request	P0
AC-2.4.4	Provider 429 → retries with backoff, does NOT fail over	Trigger rate limit, verify retry behavior	P0
AC-2.4.5	OpenAI models fail over only along OpenAI → OpenRouter	Inspect failover chain for `openai/gpt-4o`	P0
AC-2.4.6	Anthropic models fail over only along Anthropic → OpenRouter	Inspect failover chain for `anthropic/claude-sonnet-4`	P0
AC-2.4.7	Open-source models can fail over across all providers	Inspect chain for `meta-llama/llama-3-70b`	P0
AC-2.4.8	Failover chain skips providers with OPEN circuit breakers	Open a breaker, verify provider is skipped	P0
AC-2.4.9	User receives no indication of failover (transparent to client)	Monitor response during failover	P0

Code References:

src/services/provider_failover.py — Failover chain construction, error classification
src/routes/chat.py — build_provider_failover_chain() integration
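
The error classification behind AC-2.4.1 through AC-2.4.4 can be written as a single function. This is a sketch, not the logic in src/services/provider_failover.py. The `Action` enum and function name are hypothetical, and the handling of status codes the description does not mention is an assumption.

```python
from enum import Enum

# Status codes that trigger failover, per the feature description.
FAILOVER_STATUSES = frozenset({401, 402, 403, 404, 502, 503, 504})


class Action(Enum):
    FAILOVER = "failover"           # Try the next provider in the chain
    RETRY = "retry_with_backoff"    # Same provider, after a delay
    RETURN = "return_to_client"     # Surface the response as-is


def classify_provider_status(status: int) -> Action:
    """Decide what to do with an upstream provider status code."""
    if status in FAILOVER_STATUSES:
        return Action.FAILOVER
    if status == 429:
        return Action.RETRY
    # Assumption: anything else, including 400 and 2xx, goes back to the client.
    return Action.RETURN
```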

2.5 Circuit Breakers

Status: Complete (with timing discrepancy)

What it does: Per-provider circuit breakers. CLOSED → OPEN (5 consecutive failures) → HALF_OPEN (after timeout) → CLOSED (3 consecutive successes) or back to OPEN. Redis + in-memory state.

What it does NOT do: No per-provider threshold configuration. No error-type differentiation. No operator alerts. No persistent state beyond Redis.

#	Criterion	Verification	Priority
AC-2.5.1	New provider starts in CLOSED state	`GET /circuit-breakers/{new_provider}`	P0
AC-2.5.2	After 5 consecutive failures, state transitions to OPEN	Send 5 failing requests, check state	P0
AC-2.5.3	OPEN state prevents requests to that provider	Verify provider is skipped in failover	P0
AC-2.5.4	After timeout period, OPEN transitions to HALF_OPEN	Wait for timeout, check state	P0
AC-2.5.5	In HALF_OPEN, 3 consecutive successful requests transition to CLOSED	Send 3 successes, check state	P0
AC-2.5.6	In HALF_OPEN, a failed request transitions back to OPEN	Send failure, check state	P0
AC-2.5.7	`POST /circuit-breakers/{provider}/reset` resets to CLOSED	Reset and verify	P0
AC-2.5.8	`POST /circuit-breakers/reset-all` resets all breakers	Reset all and verify	P0
AC-2.5.9	Circuit breaker endpoints require NO auth (public)	Verify no auth needed	P1
AC-2.5.10	Prometheus metrics emitted on state transitions	Check `circuit_breaker_state_transitions_total`	P1

Code References:

src/services/circuit_breaker.py (line 67) — Default timeout 60 seconds
Redis keys: circuit_breaker:{provider}:{state|failure_count|success_count|opened_at} (3600s TTL)

Known Issues:

P1-7 (Delta Report): Code uses a 60-second timeout, but both the Conceptual Model and the wiki Testing Plan specify 5 minutes. Either the code or the docs must be updated.
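
The state machine described in this section can be sketched in memory as below. It uses the thresholds from the description (5 consecutive failures, 3 consecutive successes) and the 60-second timeout found in code. The class is an illustration only: it has no Redis persistence, and the injectable `clock` parameter is an assumption added to make the timeout testable.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Minimal in-memory breaker: CLOSED -> OPEN -> HALF_OPEN -> CLOSED."""

    def __init__(self, failure_threshold: int = 5, success_threshold: int = 3,
                 timeout_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        self.timeout_seconds = timeout_seconds
        self._clock = clock
        self._state = "CLOSED"
        self._failures = 0
        self._successes = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        # OPEN lazily becomes HALF_OPEN once the timeout has elapsed.
        if self._state == "OPEN" and self._clock() - self._opened_at >= self.timeout_seconds:
            self._state = "HALF_OPEN"
            self._successes = 0
        return self._state

    def allow_request(self) -> bool:
        return self.state != "OPEN"

    def record_success(self) -> None:
        if self.state == "HALF_OPEN":
            self._successes += 1
            if self._successes >= self.success_threshold:
                self.reset()
        else:
            self._failures = 0  # A success breaks the consecutive-failure run.

    def record_failure(self) -> None:
        current = self.state
        if current == "OPEN":
            return  # Already open; nothing to count.
        if current == "HALF_OPEN":
            self._open()  # Any failure during probing reopens the breaker.
            return
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._open()

    def _open(self) -> None:
        self._state = "OPEN"
        self._opened_at = self._clock()
        self._failures = 0

    def reset(self) -> None:
        """Force the breaker back to CLOSED, as the reset endpoints do."""
        self._state = "CLOSED"
        self._failures = 0
        self._successes = 0
```

Passing `timeout_seconds=300.0` would model the 5-minute value from the docs instead of the 60-second value in code.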

2.6 Health-Weighted Load Balancing

Status: Partial

What it does: Checks primary provider health score before routing. Below-threshold providers are demoted in failover chain.

What it does NOT do: No proportional traffic splitting by health score. No per-model health. No predictive health.

#	Criterion	Verification	Priority
AC-2.6.1	When primary provider health is below threshold, a healthier provider is promoted	Degrade a provider, verify chain reordering	P1
AC-2.6.2	Health-based promotion is a binary decision (promote or don't)	Verify no weighted splitting	P1

2.7–2.8 Latency-Optimal / Cost-Optimal Selection

Status: Partial

What it does: Routes to the lowest-latency or cheapest provider for the same model. General Router "latency" mode is hardcoded to groq/llama-3.3-70b-versatile.

#	Criterion	Verification	Priority
AC-2.7.1	Latency mode selects a low-latency provider	Verify model selection via router	P1
AC-2.8.1	Cost mode selects cheapest capable provider	Compare pricing of selected vs alternatives	P1

Known Issues: No dynamic latency-optimal selection — latency mode hardcodes a specific model rather than measuring real-time latency. Deferred for post-release.

2.9 Traffic Splitting

Status: Not Implemented (Deferred)

#	Criterion	Verification	Priority
AC-2.9.1	Traffic for same model is distributed across providers at configured ratios	Monitor provider selection distribution	Deferred (D-17)

Layer 3: Intelligence

3.1 Tiered Health Monitoring

Status: Complete

What it does: Continuous monitoring at intervals by tier: Critical (5min), Popular (30min), Standard (2-4hr), On-Demand (when requested). Health checks verify availability and latency.

#	Criterion	Verification	Priority
AC-3.1.1	`GET /health` always returns 200, even when dependencies are degraded	Call when DB is down	P0
AC-3.1.2	Health response includes `version`, `status`, and `timestamp`	Inspect response	P0
AC-3.1.3	`GET /health/system` returns memory, CPU, and connection pool stats	Inspect response	P0
AC-3.1.4	Provider health scores are 0-100 per provider	`GET /health/providers`	P0
AC-3.1.5	Model health shows `healthy`, `degraded`, or `down` per model	`GET /health/models`	P0
AC-3.1.6	`GET /health/quick` is sub-millisecond (static response)	Time the endpoint	P1
AC-3.1.7	`GET /health/railway` returns comprehensive check (DB, Redis, providers)	Inspect response	P1
AC-3.1.8	Gateway health dashboard returns HTML and JSON formats	`GET /health/gateways/dashboard` and `/data`	P1
AC-3.1.9	Health insights provide actionable recommendations	`GET /health/insights`	P2
AC-3.1.10	Background monitoring can be started and stopped	`POST /health/monitoring/start`, `/stop`	P1

Code References:

src/services/intelligent_health_monitor.py — Tiered monitoring
src/services/autonomous_monitor.py — Background monitoring
src/routes/health.py — Health endpoints

3.2 Passive Health Capture

Status: Complete

What it does: Every real inference request contributes health data as a background task — success/failure, latency, tokens, provider response codes. Zero overhead on request path.

#	Criterion	Verification	Priority
AC-3.2.1	Health data is captured after response is returned (no latency impact on user)	Verify background task execution	P0
AC-3.2.2	Captured data includes: latency, tokens, status, provider	Inspect health data store	P1

3.3 Incident Management

Status: Complete

What it does: Auto-creates incidents on health degradation. Severity levels, timestamps, captured logs, resolution tracking, MTTR calculation.

#	Criterion	Verification	Priority
AC-3.3.1	Downtime incidents can be listed with filters	`GET /admin/downtime/incidents?status=ongoing`	P0
AC-3.3.2	Incidents can be resolved with notes	`POST /admin/downtime/incidents/{id}/resolve`	P0
AC-3.3.3	Already-resolved incidents reject re-resolution	Attempt to resolve again	P1
AC-3.3.4	Incident analysis shows error patterns and type distribution	`GET /admin/downtime/incidents/{id}/analysis`	P1
AC-3.3.5	MTTR statistics are computed	`GET /admin/downtime/statistics`	P1

Code References:

src/routes/admin.py — Downtime tracking endpoints

3.4 Model Quality Scoring & Benchmarks

Status: Partial

What it does: Hardcoded quality priors for ~20 models (task-specific: simple_qa, code_gen, reasoning, etc.). SWE-bench/HumanEval in Code Router.

What it does NOT do: Not stored in DB. Not updatable without code change. Missing MMLU, MATH, MT-Bench, LMSYS Arena ELO, LiveBench.

#	Criterion	Verification	Priority
AC-3.4.1	Code router tiers include SWE-bench and HumanEval scores	`GET /code-router/tiers`	P0
AC-3.4.2	Model selector uses quality priors for task-specific routing	Verify `model_selector.py` quality maps	P1

Known Issues: Quality data is static/hardcoded, not from DB. Missing several major benchmarks. No dynamic updating.

3.5 Per-Customer Quality Tracking

Status: Not Implemented (Deferred)

#	Criterion	Verification	Priority
AC-3.5.1	Per-customer success rates are tracked per model	Check customer-model analytics	Deferred (D-19)

3.6 Provider Credit Monitoring

Status: Partial (OpenRouter only)

What it does: Tracks upstream provider credit balances. OpenRouter: full implementation with API call, 15-min cache, threshold alerts (critical $5, warning $20, info $50).

#	Criterion	Verification	Priority
AC-3.6.1	`GET /api/provider-credits/balance` returns credit balances for monitored providers	Inspect response	P0
AC-3.6.2	OpenRouter balance is cached for 15 minutes	Check timing of two consecutive calls	P1
AC-3.6.3	Threshold alerts fire at critical ($5), warning ($20), info ($50)	Verify alert logic	P1

Code References:

src/services/provider_credit_monitor.py (lines 33-138) — OpenRouter implementation
src/services/provider_credit_monitor.py (lines 165-167) — TODO stubs for all other providers

Known Issues:

P1-1 (Delta Report): Only OpenRouter implemented. 29 other providers have TODO stubs. No preemptive deprioritization in failover chain.
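
The threshold logic in AC-3.6.3 can be sketched as a most-severe-first lookup. The dollar amounts come from the description. Whether each comparison is inclusive is not stated, so the `<=` below is an assumption, as is the function name.

```python
from decimal import Decimal
from typing import Optional

# Thresholds in USD from the feature description, most severe first.
THRESHOLDS = [
    ("critical", Decimal("5")),
    ("warning", Decimal("20")),
    ("info", Decimal("50")),
]


def alert_level(balance_usd: Decimal) -> Optional[str]:
    """Return the most severe alert level for a balance, or None."""
    for level, threshold in THRESHOLDS:
        # Inclusive comparison is an assumption; the source gives only the amounts.
        if balance_usd <= threshold:
            return level
    return None
```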

Layer 4: Caching System

4.1 Semantic Cache

Status: Not Implemented (Deferred)

#	Criterion	Verification	Priority
AC-4.1.1	Semantically similar prompts return cached responses (cosine similarity >0.95)	Test with paraphrased prompt	Deferred (D-8)

4.2 Exact-Match Response Cache

Status: Not Implemented (Deferred — infrastructure exists but not wired)

#	Criterion	Verification	Priority
AC-4.2.1	Identical inference requests (same messages + model + params) return cached response	Send same request twice, compare latency	Deferred (D-9)

Code References:

src/services/response_cache.py — SHA-256 hashing, Redis + in-memory fallback exists but NOT wired into inference path
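
For AC-4.2.1, "identical" has to be defined by a canonical serialization before hashing. The sketch below shows one way to derive a SHA-256 cache key from messages, model, and params. The payload layout and function name are assumptions and may differ from what src/services/response_cache.py does.

```python
import hashlib
import json
from typing import Any, Dict, List


def cache_key(model: str, messages: List[Dict[str, Any]], params: Dict[str, Any]) -> str:
    """Hash the request fields that define an 'identical' inference request."""
    payload = {"model": model, "messages": messages, "params": params}
    # sort_keys and fixed separators give a canonical serialization, so
    # dict ordering does not change the digest.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two requests that differ only in the order of their JSON keys produce the same digest, while any change to a message or parameter produces a different one.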

4.3 External Cache (Butter.dev)

Status: Partial (Ghost Feature — P0 issue)

What it does: Butter.dev proxy used for all requests. User preference endpoints exist but are ignored.

#	Criterion	Verification	Priority
AC-4.3.1	If Butter cache settings endpoints exist, user preference MUST be respected during inference	Set `enable_butter_cache=false`, verify Butter proxy is NOT used	P0 (KNOWN BUG)
AC-4.3.2	OR: Butter cache settings endpoints are removed entirely	Verify endpoints don't exist	P0 Alternative

Code References:

src/routes/users.py (lines 305-408) — GET/PUT /user/cache-settings exist, store preference
src/routes/chat.py (line 697) — Always calls get_butter_pooled_async_client() without checking preference

Known Issues:

P0-1 (Delta Report): Ghost feature. User can toggle a setting that does nothing. Trust-eroding.

4.4 Supporting Caches

Status: Complete

#	Criterion	Verification	Priority
AC-4.4.1	Catalog endpoint responds in sub-100ms on cache hit	Time `GET /v1/models` on second request	P0
AC-4.4.2	Auth cache reduces lookup latency from ~100ms to <5ms	Compare first vs second auth timing	P1
AC-4.4.3	When Redis is down, local memory cache activates — no requests blocked	Stop Redis, verify normal operation	P0
AC-4.4.4	Cache invalidation clears all layers	`POST /admin/cache/clear`, verify fresh data	P1
AC-4.4.5	Stampede protection prevents multiple simultaneous cache rebuilds	Concurrent requests to cold cache	P1

Layer 5: Model Catalog

5.1 Background Model Sync

Status: Complete

#	Criterion	Verification	Priority
AC-5.1.1	Model sync can be triggered incrementally and fully	`POST /admin/model-sync/trigger` and `/all`	P0
AC-5.1.2	If provider API is down, last synced catalog is served	Verify stale catalog on provider failure	P0
AC-5.1.3	Per-provider sync works	`POST /admin/model-sync/provider/{slug}`	P1
AC-5.1.4	Full resync (delete + reimport) works	`POST /admin/model-sync/full`	P1

5.2 Model Metadata Standard

Status: Complete

#	Criterion	Verification	Priority
AC-5.2.1	Every model in `GET /v1/models` has `id`, `name`, `provider_slug`, `context_length`, and pricing	Inspect response schema	P0
AC-5.2.2	No model has null or zero pricing for both prompt and completion	Scan all models in response	P0 (see 5.3)

5.3 Catalog Inclusion Requirements

Status: Partial (gating not enforced at sync)

#	Criterion	Verification	Priority
AC-5.3.1	Models without pricing are rejected during sync (not visible to users)	Check catalog for models with null pricing	P1 (KNOWN BUG)
AC-5.3.2	`GET /v1/models/unique` returns no duplicate model IDs	Check for uniqueness	P0
AC-5.3.3	High-value models without explicit pricing are BLOCKED, not served at default rate	Verify pricing guard for GPT-4, Claude, Gemini	P0

Code References:

src/services/model_catalog_sync.py — extract_pricing() (lines 136-153) returns all None for missing pricing. Line 368 checks if any(pricing.values()) but is non-blocking — models ARE synced without pricing.
src/services/pricing.py (lines 783-839) — HIGH_VALUE_MODEL_PATTERNS guard raises ValueError on default pricing fallback

Known Issues:

P1-3 (Delta Report): Models without pricing are synced into the catalog. Non-high-value models without pricing fall to default ($0.00002/token) — potential under-billing.
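
A blocking gate for AC-5.3.1 could look like the sketch below, which partitions models at sync time instead of only logging the missing pricing. The `pricing`, `prompt`, and `completion` field names and both function names are assumptions about the record shape, not the code in src/services/model_catalog_sync.py.

```python
from typing import Dict, List, Optional, Tuple

Pricing = Dict[str, Optional[float]]


def has_usable_pricing(pricing: Pricing) -> bool:
    """True when prompt or completion has a non-null, non-zero price."""
    return any(pricing.get(field) for field in ("prompt", "completion"))


def partition_by_pricing(models: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Split models into (accepted, rejected) so rejects never reach the catalog."""
    accepted: List[dict] = []
    rejected: List[dict] = []
    for model in models:
        pricing = model.get("pricing") or {}
        (accepted if has_usable_pricing(pricing) else rejected).append(model)
    return accepted, rejected
```

Models that are intentionally free, such as those with the `:free` suffix, would need an explicit exemption that this sketch does not model.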

5.4 HuggingFace Enrichment

Status: Complete

#	Criterion	Verification	Priority
AC-5.4.1	Model detail returns HuggingFace data (downloads, likes, parameters) when available	`GET /api/models/detail?model_id=meta-llama/...`	P1
AC-5.4.2	HuggingFace data is cached with TTL	Verify caching on repeated requests	P1

5.5 Model Discovery & Search

Status: Complete

#	Criterion	Verification	Priority
AC-5.5.1	`GET /v1/models?provider=fireworks` returns only Fireworks models	Filter and verify	P0
AC-5.5.2	`GET /v1/models/search?q=llama` returns matching models	Verify results	P0
AC-5.5.3	`GET /v1/models/trending` returns models ranked by usage	Inspect response	P1
AC-5.5.4	`GET /v1/gateways` returns all gateways with name, color, priority, site_url	Inspect response	P0
AC-5.5.5	Model comparison works across providers	`GET /v1/models/{provider}/{model}/compare`	P1

Layer 6: Business

6.1 Credit System

Status: Complete (with atomicity concern on legacy path)

What it does: Atomic billing unit. Cost = (prompt_tokens × prompt_price) + (completion_tokens × completion_price). Pre-flight checks, idempotent deductions (UNIQUE constraint + RPC), subscription allowance consumed first, auto-refund on provider errors.

What it does NOT do: No real-time credit streaming during generation. No credit expiration. No rollover. No credit transfers. No multi-currency.

#	Criterion	Verification	Priority
AC-6.1.1	Pre-flight check: user with 0 credits receives 402 BEFORE any provider call	POST with 0-credit user, verify no upstream call	P0
AC-6.1.2	Idempotent deduction: same request ID sent twice deducts credits only once	POST twice with same `X-Request-ID`	P0
AC-6.1.3	Subscription allowance consumed before purchased credits	User with both: make request, verify subscription decreases first	P0
AC-6.1.4	Provider 5xx error → automatic credit refund	Trigger 5xx, verify refund in `credit_transactions`	P0
AC-6.1.5	Provider timeout → automatic credit refund	Trigger timeout, verify refund	P0
AC-6.1.6	Provider 4xx error (user error) → NO refund	Trigger 4xx, verify no refund	P0
AC-6.1.7	High-value models (GPT-4, Claude, Gemini, o1/o3/o4) blocked if pricing falls to default	Verify pricing guard fires for each pattern	P0
AC-6.1.8	Credit transactions logged with request_id, user_id, model, token counts, cost	Check `credit_transactions` table	P0
AC-6.1.9	Balance update and transaction log happen atomically (single DB transaction via RPC)	Verify `atomic_deduct_credits` RPC is used	P0
AC-6.1.10	Legacy fallback path either doesn't exist or handles transaction logging failure safely	Verify legacy path behavior on logging failure	P0 (KNOWN RISK)
AC-6.1.11	Credit transaction history is paginated	`GET /credits/transactions?limit=10`	P1
AC-6.1.12	Admin can add/adjust/refund credits	`POST /credits/add`, `/adjust`, `/refund`	P1
AC-6.1.13	Daily usage cap prevents runaway costs	Exceed daily limit, verify 402	P1
AC-6.1.14	`request_id` has UNIQUE constraint in DB (belt-and-suspenders idempotency)	Check migration `20260223000001_add_request_id_to_credit_transactions.sql`	P0

Code References:

src/db/users.py (lines 701-1106) — Credit deduction
- Atomic RPC path (lines 862-967) — Correct
- Legacy fallback path (lines 987-1096) — Risk: balance update and transaction logging are two separate calls; if logging fails, credits have already been deducted (lines 1077-1082)
src/services/pricing.py (lines 783-839) — HIGH_VALUE_MODEL_PATTERNS
src/routes/chat.py (lines 1670-1742) — Auto-refund logic

Known Issues:

P0-2: Legacy fallback path may create orphaned deductions (balance reduced, no transaction record)
P0-3: Pricing guard needs end-to-end verification — must fire BEFORE provider call
P0-4: Auto-refund needs integration testing for edge cases (partial stream, refund failure)
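
The refund rule in AC-6.1.4 through AC-6.1.6 reduces to a small decision function. The sketch below is an assumption about how that decision could be expressed; `should_refund` is a hypothetical name and the edge cases listed under P0-4 (partial stream, refund failure) are not covered.

```python
def should_refund(status_code=None, timed_out=False):
    """Return True when a pre-deducted request should be refunded.

    Timeouts and provider 5xx responses are refunded; 4xx responses are
    treated as user errors and are not refunded.
    """
    if timed_out:
        return True
    return status_code is not None and 500 <= status_code <= 599


for case in ({"status_code": 503}, {"timed_out": True}, {"status_code": 400}):
    print(case, should_refund(**case))
```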

6.2 Plans & Tiers

Status: Complete

#	Criterion	Verification	Priority
AC-6.2.1	New user gets $5 credits and trial expiring in 3 days	Register, check balance + `trial_end`	P0 (config mismatch)
AC-6.2.2	Trial user can make requests until credits/limits exhausted	Make requests during trial	P0
AC-6.2.3	Expired trial returns 402 for paid models	POST after trial expiry	P0
AC-6.2.4	Expired trial CAN access `:free` suffix models	POST with `:free` model after expiry	P0
AC-6.2.5	`GET /plans` returns available plan tiers with pricing	Inspect response	P0
AC-6.2.6	`GET /trial/status` returns `active`/`expired` and days remaining	Check response	P0
AC-6.2.7	Unused subscription allowance does NOT roll over (resets monthly)	Verify at month boundary	P1
AC-6.2.8	Purchased credits never expire and survive plan changes	Change plan, verify credits persist	P1
AC-6.2.9	Daily trial limit ($1/day) is enforced	Exceed $1 in trial, verify blocking	P0

Known Issues:

P0-7 (Delta Report): Trial config mismatch — CLAUDE.md says $5, wiki says $10, code says $5. Must reconcile.
src/config/usage_limits.py — Trial: $5, 3 days, $1/day
src/db/trials.py (line 44) — Formula trial_days * 5 suggests variable durations
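
A test for AC-6.2.3, AC-6.2.4, and AC-6.2.6 needs a reference for what "active", "expired", and "days remaining" mean. The sketch below is one possible reading: the response field names, the round-up of partial days, and the helper names are assumptions, and the 3-day value is taken from src/config/usage_limits.py as quoted above.

```python
from datetime import datetime, timedelta, timezone

TRIAL_DAYS = 3  # from the criteria above; assumed to be configurable


def trial_status(trial_end, now=None):
    """Return a status dict shaped like a hypothetical /trial/status body."""
    now = now or datetime.now(timezone.utc)
    remaining = trial_end - now
    if remaining <= timedelta(0):
        return {"status": "expired", "days_remaining": 0}
    # Round partial days up so a trial with one hour left reports 1 day.
    days = -(-remaining // timedelta(days=1))
    return {"status": "active", "days_remaining": days}


def model_allowed(model_id, trial_active):
    """Expired trials may still call models with the `:free` suffix."""
    return trial_active or model_id.endswith(":free")


start = datetime(2026, 1, 1, tzinfo=timezone.utc)
print(trial_status(start + timedelta(days=TRIAL_DAYS), now=start))
print(model_allowed("example-model:free", trial_active=False))
```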

6.3 Customer Usage Analytics

Status: Partial

#	Criterion	Verification	Priority
AC-6.3.1	User can view activity stats (total requests/tokens/spend by model/provider)	`GET /user/activity/stats`	P0
AC-6.3.2	Activity log is paginated (limit 1-1000)	`GET /user/activity/log?limit=50`	P0
AC-6.3.3	Activity log `total` field returns actual DB total, not page count	Verify `total` vs `count`	P1 (KNOWN BUG)
AC-6.3.4	Per-API-key usage breakdown is available	`GET /user/api-keys/{key_id}/usage`	P2 (NOT IMPLEMENTED)
AC-6.3.5	CSV/JSON export is available	`GET /user/usage/export?format=csv`	P2 (NOT IMPLEMENTED)

Known Issues:

P1-4 (Delta Report): src/routes/users.py (line 515) — "total": len(transactions) returns page count, not DB total
P2-1: activity_log stores user_id but NOT api_key_id — no per-key breakdown
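
The P1-4 bug is easiest to see side by side with the intended behaviour. In the sketch below a plain list stands in for the database, so `len(rows)` plays the role of a `COUNT(*)` query; the `paginate` helper and the response keys other than `total` are assumptions.

```python
def paginate(rows, limit, offset=0):
    """Return one page plus the total row count across all pages."""
    if not 1 <= limit <= 1000:
        raise ValueError("limit must be between 1 and 1000")
    page = rows[offset:offset + limit]
    return {
        "transactions": page,
        "count": len(page),  # rows on this page
        "total": len(rows),  # rows in the whole result set, NOT len(page)
    }


result = paginate(list(range(120)), limit=50)
print(result["count"], result["total"])  # 50 120
```

A test for AC-6.3.3 can assert that `total` stays constant while `offset` changes and that it exceeds `count` whenever more than one page exists.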

6.4 Customer Webhooks

Status: Not Implemented (Deferred)

#	Criterion	Verification	Priority
AC-6.4.1	Outbound webhook delivery for `credits.low`, `credits.depleted`, `model.degraded` events	Configure webhook, trigger events	Deferred (D-10)

6.5 SLA Tracking

Status: Not Implemented (Deferred)

#	Criterion	Verification	Priority
AC-6.5.1	Per-tier SLA violations are detected with auto credit-back compensation	Monitor SLA metrics	Deferred (D-14)

Layer 7: Developer Platform

7.1 Prompt Management

Status: Not Implemented (Deferred)

#	Criterion	Verification	Priority
AC-7.1.1	Template library with versioning	CRUD on prompt templates	Deferred (D-12)

7.2 Batch / Async Inference

Status: Not Implemented (Deferred)

#	Criterion	Verification	Priority
AC-7.2.1	`POST /v1/batch/jobs` submits bulk workloads	Submit batch job	Deferred (D-11)

7.3 Evaluation & Testing

Status: Not Implemented (Deferred)

#	Criterion	Verification	Priority
AC-7.3.1	Side-by-side model comparison for same prompt	Compare endpoint	Deferred (D-13)

7.4 Playground

Status: Not Implemented (Deferred — frontend-coupled)

#	Criterion	Verification	Priority
AC-7.4.1	Interactive prompt testing UI	Access playground	Deferred

Layer 8: Observability

8.1 Internal Metrics & Dashboards

Status: Complete

#	Criterion	Verification	Priority
AC-8.1.1	`GET /metrics` returns valid Prometheus text format	Parse response	P0
AC-8.1.2	OpenMetrics format with exemplar support is available via content negotiation	`Accept: application/openmetrics-text`	P1
AC-8.1.3	Parsed metrics include p50, p95, p99 latency percentiles	`GET /api/metrics/parsed`	P0
AC-8.1.4	Real-time stats update within 60 seconds of new requests	`GET /api/monitoring/stats/realtime`	P1
AC-8.1.5	Error rates tracked per provider and per model	`GET /api/monitoring/error-rates`	P1
AC-8.1.6	Anomaly detection flags unusual patterns	`GET /api/monitoring/anomalies`	P1
AC-8.1.7	Grafana SimpleJSON datasource protocol fully implemented	`GET /prometheus/datasource` (test), `POST /search`, `/query`	P1

8.2 Distributed Tracing

Status: Complete

#	Criterion	Verification	Priority
AC-8.2.1	OpenTelemetry traces are initialized and exportable	`GET /api/instrumentation/health`	P0
AC-8.2.2	Every request gets a trace ID linking middleware → auth → routing → provider → billing	Inspect trace in Tempo	P1
AC-8.2.3	Exemplar linking from metrics to traces works	Verify in Grafana	P2

8.3 Error Tracking

Status: Complete

#	Criterion	Verification	Priority
AC-8.3.1	Autonomous error monitor status is retrievable	`GET /error-monitor/autonomous/status`	P0
AC-8.3.2	Dashboard provides error landscape overview	`GET /error-monitor/dashboard`	P0
AC-8.3.3	Recent errors sorted by recency	`GET /error-monitor/errors/recent`	P0
AC-8.3.4	Critical errors flagged separately	`GET /error-monitor/errors/critical`	P0
AC-8.3.5	Error patterns detect recurring issues	`GET /error-monitor/errors/patterns`	P1
AC-8.3.6	AI fix suggestions generated via Claude	`POST /error-monitor/fixes/generate-for-error`	P2

Note: None of the error monitor endpoints require auth (all are public). Error patterns are held in memory only and are lost on restart.

8.4 AI-Specific Tracing

Status: Partial

#	Criterion	Verification	Priority
AC-8.4.1	Arize Phoenix config exists and is functional	Check Arize initialization	P2
AC-8.4.2	OpenTelemetry captures inference metadata (model, tokens, latency)	Inspect trace attributes	P1

Known Issues: Arize Phoenix not exposed via API. Braintrust not integrated. No prompt/response pair recording.

8.5 Profiling

Status: Complete

#	Criterion	Verification	Priority
AC-8.5.1	Pyroscope profiling tags cache/Redis layers with operation context	Verify tag presence in Pyroscope	P1
AC-8.5.2	Profiling does not add measurable latency to requests	Compare request times with/without profiling	P1

8.6 Customer-Facing Observability

Status: Partial

#	Criterion	Verification	Priority
AC-8.6.1	User can view their own usage dashboard data	`GET /user/activity/stats`, `GET /user/monitor`	P0
AC-8.6.2	Model health status visible to users	`GET /v1/model-health`	P0
AC-8.6.3	Public status page with provider/model availability	`GET /v1/status/`, `GET /v1/status/providers`	P0
AC-8.6.4	Latency percentiles exposed to customers	`GET /user/latency?model=...`	P2 (NOT IMPLEMENTED)

Layer 9: API Compatibility

9.1 OpenAI-Compatible API

Status: Complete

What it does: POST /v1/chat/completions — full drop-in replacement. Streaming SSE, tool/function calling, JSON mode, logprobs. Any OpenAI SDK app works by changing base URL.

#	Criterion	Verification	Priority
AC-9.1.1	Non-streaming returns 200 with `choices[0].message.content`, `usage.prompt_tokens`, `usage.completion_tokens`	POST with `stream: false`	P0
AC-9.1.2	Streaming returns SSE where each data line starts with `data:` and the stream ends with `data: [DONE]`	POST with `stream: true`	P0
AC-9.1.3	`response_format: {"type": "json_object"}` returns valid parseable JSON	POST with JSON mode	P0
AC-9.1.4	`tools` array returns `tool_calls` when model decides to call a tool	POST with tool definitions	P0
AC-9.1.5	`logprobs: true` returns a `logprobs` field	POST with logprobs	P1
AC-9.1.6	OpenAI Python SDK works with zero changes beyond `base_url` and `api_key`	`openai.OpenAI(base_url="$BASE/v1")`	P0
AC-9.1.7	All inference errors use OpenAI-compatible format: `{"error": {"message": "...", "type": "...", "code": "..."}}`	Trigger errors, inspect format	P1 (KNOWN ISSUE)
AC-9.1.8	Unauthenticated request with whitelisted model returns 200	POST without auth header	P0
AC-9.1.9	Unauthenticated request with non-whitelisted model returns 401/403	POST without auth header	P0
AC-9.1.10	Streaming normalization handles OpenAI, Gemini, Anthropic, Fireworks formats	Test stream from each provider type	P0
AC-9.1.11	Unrecognized streaming format logs a warning (not silently dropped)	Check logs for dropped chunks	P1 (KNOWN BUG)
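
AC-9.1.2 can be checked mechanically against a captured response body. The helper below is a sketch of such a check, not part of the gateway: `check_sse_stream` is a hypothetical name, and it applies the criterion's strict wording (only `data:` lines), even though the SSE format in general also permits comment and `event:` lines.

```python
def check_sse_stream(raw):
    """Validate an SSE body and return the payload strings before [DONE].

    Every non-blank line must start with `data:` and the last one must be
    `data: [DONE]`; otherwise ValueError is raised.
    """
    lines = [ln for ln in raw.splitlines() if ln.strip()]
    if not lines or lines[-1].strip() != "data: [DONE]":
        raise ValueError("stream did not end with data: [DONE]")
    payloads = []
    for ln in lines[:-1]:
        if not ln.startswith("data:"):
            raise ValueError(f"unexpected SSE line: {ln!r}")
        payloads.append(ln[len("data:"):].strip())
    return payloads


body = 'data: {"a": 1}\n\ndata: {"a": 2}\n\ndata: [DONE]\n\n'
print(check_sse_stream(body))
```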

Known Issues:

P1-2: ~5% of errors use FastAPI default {"detail": "..."} instead of OpenAI format — breaks SDK error handling
P1-8: Stream normalizer returns None for unrecognized chunks (silently dropped, no warning)
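
P1-2 describes a shape mismatch between FastAPI's default `{"detail": "..."}` body and the OpenAI error envelope required by AC-9.1.7. A normalizer along the following lines would close that gap. It is a sketch under assumptions: `to_openai_error` is a hypothetical helper, and the `type` and `code` values are placeholders rather than the gateway's real mapping.

```python
def to_openai_error(body, status_code):
    """Wrap a FastAPI-style {"detail": ...} body in the OpenAI error envelope.

    Bodies that already carry an "error" object pass through unchanged.
    """
    if isinstance(body, dict) and isinstance(body.get("error"), dict):
        return body
    detail = body.get("detail") if isinstance(body, dict) else body
    return {
        "error": {
            "message": str(detail),
            # Placeholder classification: client errors vs. server errors.
            "type": "invalid_request_error" if status_code < 500 else "api_error",
            "code": str(status_code),
        }
    }


print(to_openai_error({"detail": "Insufficient credits"}, 402))
```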

9.2 Anthropic-Compatible API

Status: Complete

#	Criterion	Verification	Priority
AC-9.2.1	Non-streaming returns 200 with `content[0].text`, `usage.input_tokens`, `usage.output_tokens` in Anthropic format	POST `/v1/messages`	P0
AC-9.2.2	Streaming returns SSE in Anthropic format (`message_start`, `content_block_delta`, `message_stop`)	POST with `stream: true`	P0
AC-9.2.3	Credits deducted using Anthropic token counts	Compare balance before/after	P0
AC-9.2.4	Anthropic Python SDK works with zero changes beyond `base_url` and `api_key`	`anthropic.Anthropic(base_url="$BASE/v1")`	P0

Layer 10: Infrastructure & Deployment

10.1 Multi-Region Routing

Status: Not Implemented (Deferred)

#	Criterion	Verification	Priority
AC-10.1.1	Requests routed to nearest provider region for lowest latency	Test from different regions	Deferred (D-15)

10.2 Data Residency

Status: Not Implemented (Deferred)

#	Criterion	Verification	Priority
AC-10.2.1	EU customers' requests routed to EU-based providers	Test with EU IP	Deferred (D-16)

10.3 Multi-Target Deployment

Status: Complete

#	Criterion	Verification	Priority
AC-10.3.1	Vercel serverless deployment works via `api/index.py`	Deploy to Vercel	P0
AC-10.3.2	Railway/Docker deployment works via `start.sh`	Deploy to Railway	P0
AC-10.3.3	Dev server starts with `python src/main.py` or `uvicorn src.main:app --reload`	Start locally	P0

Cross-Cutting Features

CC-1: Stripe Payments

Status: Complete

#	Criterion	Verification	Priority
AC-CC.1.1	`GET /api/stripe/credit-packages` returns available packages (public, no auth)	Inspect response	P0
AC-CC.1.2	`POST /api/stripe/checkout-session` returns valid Stripe checkout URL	Create session	P0
AC-CC.1.3	Successful payment webhook adds credits to user's balance	Simulate `payment_intent.succeeded` webhook	P0
AC-CC.1.4	Webhook endpoint ALWAYS returns 200, even if processing fails	Send malformed webhook	P0
AC-CC.1.5	Payment history is paginated with amount, date, status	`GET /api/stripe/payments`	P0
AC-CC.1.6	Subscription checkout creates Stripe subscription and assigns plan	`POST /api/stripe/subscription-checkout`	P0
AC-CC.1.7	Subscription upgrade/downgrade/cancel work	Test each operation	P1
AC-CC.1.8	Webhook handles all events: `payment_intent.succeeded`, `charge.succeeded`, `invoice.paid`, `customer.subscription.created`	Test each event type	P0
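
AC-CC.1.4 requires the webhook endpoint to answer 200 even when processing fails. The control flow that implies is sketched below. Signature verification is deliberately omitted, and `handle_webhook`, `process_event`, and the response body fields are hypothetical names, not the gateway's handler.

```python
import json
import logging

logger = logging.getLogger("stripe_webhook")


def handle_webhook(raw_body, process_event):
    """Always return status 200; failures are logged instead of surfaced."""
    try:
        event = json.loads(raw_body)
        process_event(event)
        return 200, {"received": True, "processed": True}
    except Exception:
        # Malformed JSON and processing errors both land here.
        logger.exception("webhook processing failed")
        return 200, {"received": True, "processed": False}


print(handle_webhook('{"type": "payment_intent.succeeded"}', lambda event: None))
print(handle_webhook("not json", lambda event: None))
```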

CC-2: Coupons

Status: Complete

#	Criterion	Verification	Priority
AC-CC.2.1	Valid coupon redeems and adds correct credit amount	`POST /coupons/redeem`	P0
AC-CC.2.2	Expired coupon returns 400	Redeem expired code	P0
AC-CC.2.3	Already-redeemed coupon (same user) returns 400	Redeem twice	P0
AC-CC.2.4	User-specific coupon redeemed by wrong user returns 400/403	Redeem with different user	P0
AC-CC.2.5	`GET /coupons/available` returns global + user-targeted coupons	Inspect response	P1
AC-CC.2.6	Redemption history shows past redemptions	`GET /coupons/history`	P1

CC-3: Referrals

Status: Complete

#	Criterion	Verification	Priority
AC-CC.3.1	User generates unique referral code	`POST /referral/generate`	P0
AC-CC.3.2	Referral code validates successfully	`POST /referral/validate`	P0
AC-CC.3.3	Self-referral is prevented	Attempt self-referral	P0
AC-CC.3.4	Referral stats show total referred, conversions, rewards	`GET /referral/stats`	P1
AC-CC.3.5	Successful referral grants $10 credits to both parties on first $10+ purchase	Complete referral flow	P0

CC-4: Chat History & Sessions

Status: Complete

#	Criterion	Verification	Priority
AC-CC.4.1	Sessions can be created, listed, updated, deleted	CRUD on `/v1/chat/sessions/*`	P0
AC-CC.4.2	Messages can be saved individually and in batch	POST single and batch	P0
AC-CC.4.3	Full-text search returns matching sessions	`POST /v1/chat/search`	P0
AC-CC.4.4	Duplicate messages are deduplicated	Save same message twice, verify single entry	P1
AC-CC.4.5	Chat stats return accurate usage data	`GET /v1/chat/stats`	P1
AC-CC.4.6	Share links provide public read-only access	Create share, access without auth	P1
AC-CC.4.7	Feedback CRUD (create, read, update, delete) works per session	CRUD on `/v1/chat/feedback/*`	P1

CC-5: API Key Management

Status: Complete

#	Criterion	Verification	Priority
AC-CC.5.1	Created key is in `gw_{env}_*` format	`POST /user/api-keys`	P0
AC-CC.5.2	Key creation rate-limited to 10 per hour; 11th returns 429	Create 11 keys	P0
AC-CC.5.3	Keys can be listed showing all active keys	`GET /user/api-keys`	P0
AC-CC.5.4	Keys can be updated (name, restrictions)	`PUT /user/api-keys/{key_id}`	P0
AC-CC.5.5	Keys can be deleted	`DELETE /user/api-keys/{key_id}`	P0
AC-CC.5.6	Deleted key no longer authenticates (returns 401)	Use deleted key	P0
AC-CC.5.7	Audit logs record key creation, usage, deletion	`GET /user/api-keys/audit-logs`	P1
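
AC-CC.5.2 caps key creation at 10 per hour. The source does not say whether the window is fixed or sliding, so the sketch below assumes a sliding window; `KeyCreationLimiter` is a hypothetical class, and a `False` result is what the route would translate into a 429.

```python
import time
from collections import deque


class KeyCreationLimiter:
    """Sliding-window limiter: at most `limit` creations per `window` seconds."""

    def __init__(self, limit=10, window=3600.0, clock=time.monotonic):
        self.limit, self.window, self.clock = limit, window, clock
        self._events = {}  # user_id -> deque of creation timestamps

    def try_create(self, user_id):
        """Return True if creation is allowed, False if the caller should get 429."""
        now = self.clock()
        events = self._events.setdefault(user_id, deque())
        # Drop timestamps that have aged out of the window.
        while events and now - events[0] >= self.window:
            events.popleft()
        if len(events) >= self.limit:
            return False
        events.append(now)
        return True


limiter = KeyCreationLimiter()
print([limiter.try_create("user-1") for _ in range(11)])
```

The injectable `clock` lets a test advance time without sleeping for an hour.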

CC-6: Image Generation

Status: Complete

#	Criterion	Verification	Priority
AC-CC.6.1	`POST /v1/images/generations` returns 200 with image data or URL	POST with prompt	P0
AC-CC.6.2	Credits deducted based on image generation pricing	Compare balance before/after	P0
AC-CC.6.3	0-credit user receives 402	POST with 0-credit user	P0

CC-7: Audio Transcription

Status: Complete

#	Criterion	Verification	Priority
AC-CC.7.1	File upload transcription returns 200 with text	POST with audio file	P0
AC-CC.7.2	Base64 transcription returns 200	`POST /v1/audio/transcriptions/base64`	P0
AC-CC.7.3	Unsupported format returns appropriate error	POST with invalid format	P1

CC-8: Server-Side Tools

Status: Complete

#	Criterion	Verification	Priority
AC-CC.8.1	`GET /v1/tools` returns available tools (web_search, text_to_speech)	Inspect response	P0
AC-CC.8.2	Tool definitions in OpenAI function-calling format	`GET /v1/tools/definitions`	P0
AC-CC.8.3	Nonexistent tool returns 404	`GET /v1/tools/fake_tool`	P0
AC-CC.8.4	Web search execution returns results	`POST /v1/tools/execute` with web_search	P0
AC-CC.8.5	SSRF protection blocks internal/private IP ranges	Attempt internal URL in tool execution	P0
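
For AC-CC.8.5, a test needs a definition of "internal/private IP ranges". The sketch below uses Python's standard `ipaddress` classification as one reasonable definition. It is not the gateway's implementation: the helper names are hypothetical, and it does not defend against DNS rebinding between the check and the actual fetch.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_blocked_address(ip_text):
    """True for private, loopback, link-local, and other non-public addresses."""
    ip = ipaddress.ip_address(ip_text)
    return (ip.is_private or ip.is_loopback or ip.is_link_local
            or ip.is_reserved or ip.is_multicast or ip.is_unspecified)


def is_url_allowed(url, resolve=socket.getaddrinfo):
    """Allow only http(s) URLs whose host resolves exclusively to public IPs."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        infos = resolve(parsed.hostname, None)
    except OSError:
        return False
    # Reject if ANY resolved address is internal.
    return bool(infos) and all(not is_blocked_address(info[4][0]) for info in infos)


for addr in ("127.0.0.1", "10.0.0.5", "169.254.169.254", "8.8.8.8"):
    print(addr, is_blocked_address(addr))
```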

CC-9: Partner Trials

Status: Complete

#	Criterion	Verification	Priority
AC-CC.9.1	Partner config is publicly accessible	`GET /partner-trials/config/{code}`	P0
AC-CC.9.2	Partner code check always returns 200 (valid/invalid in body)	`GET /partner-trials/check/{code}`	P0
AC-CC.9.3	Starting partner trial applies partner-specific credits and limits	`POST /partner-trials/start` with known partner code	P0
AC-CC.9.4	Partner trial daily limit is enforced	Exceed daily limit	P0
AC-CC.9.5	Partner trial config is cached (5-min in-memory)	Check timing	P1
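
The "Check timing" verification for AC-CC.9.5 is simpler to reason about with a reference model of a 5-minute in-memory cache. The class below is such a model, written as a sketch; `TTLCache` and `get_or_load` are hypothetical names and the real cache may differ in eviction and thread-safety details.

```python
import time


class TTLCache:
    """In-memory cache whose entries expire `ttl_seconds` after loading."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self.ttl, self.clock = ttl_seconds, clock
        self._store = {}  # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        """Return the cached value, calling `loader(key)` only on miss or expiry."""
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]
        value = loader(key)
        self._store[key] = (now + self.ttl, value)
        return value


cache = TTLCache()
print(cache.get_or_load("PARTNER", lambda code: {"code": code}))
```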

CC-10: Notifications

Status: Complete (partial test coverage)

#	Criterion	Verification	Priority
AC-CC.10.1	User can retrieve notification preferences	`GET /user/notifications/preferences`	P0
AC-CC.10.2	Usage report can be triggered on demand	`POST /user/notifications/send-usage-report`	P0
AC-CC.10.3	Test notification sends successfully	`POST /user/notifications/test`	P0
AC-CC.10.4	Notification failure does not crash the system	Disable Resend, verify graceful handling	P0
AC-CC.10.5	Retry logic on notification delivery failure	Verify 2-3 retries with backoff	P2 (NOT IMPLEMENTED)

Known Issues:

P2-4 (Delta Report): No retry logic, no persistent delivery tracking. On failure: logs error, returns False, continues silently.
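
AC-CC.10.5 is not implemented, so there is no code to point at. As a reference for what "2-3 retries with backoff" could look like, here is a minimal sketch. The `send_with_retry` helper, the base delay, and the doubling schedule are all assumptions.

```python
import time


def send_with_retry(send, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call `send()` up to `attempts` times with exponential backoff.

    Returns True on the first success, False if every attempt fails.
    """
    for attempt in range(attempts):
        try:
            if send():
                return True
        except Exception:
            pass  # treat an exception like a failed delivery
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return False


outcomes = iter([False, True])
print(send_with_retry(lambda: next(outcomes), sleep=lambda seconds: None))
```

Returning `False` after the last attempt keeps the existing "notification failure does not crash the system" behaviour from AC-CC.10.4.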

CC-11: Admin Operations

Status: Complete

#	Criterion	Verification	Priority
AC-CC.11.1	Non-admin users receive 403 on ALL admin endpoints	Use user key on admin endpoint	P0
AC-CC.11.2	Admin can list, search, view user details	`GET /admin/users`, `/admin/users/{id}`	P0
AC-CC.11.3	Admin credit grants respect per-transaction cap and 24h daily limit	Exceed limits	P0
AC-CC.11.4	Admin can assign plans	`POST /admin/assign-plan`	P0
AC-CC.11.5	System monitor returns user counts, credit totals, API usage	`GET /admin/monitor`	P0
AC-CC.11.6	Cache operations work (status, refresh, clear)	GET/POST cache endpoints	P1
AC-CC.11.7	Model sync can be triggered	`POST /admin/model-sync/trigger`	P1
AC-CC.11.8	`GET /admin/model-sync/providers` requires admin auth	Verify auth enforcement	P0 (KNOWN RISK)
AC-CC.11.9	Bulk user delete by domain respects protected domains (gmail, yahoo, outlook)	Attempt protected domain delete	P0
AC-CC.11.10	Bulk user delete defaults to dry_run=true	Verify default behavior	P0
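
AC-CC.11.9 and AC-CC.11.10 combine two safeguards: a protected-domain list and a dry-run default. The sketch below shows how they interact. The function name, the return shape, and the exact domain strings (the table lists only "gmail, yahoo, outlook") are assumptions.

```python
# Assumed full domain names for the protected providers named in AC-CC.11.9.
PROTECTED_DOMAINS = {"gmail.com", "yahoo.com", "outlook.com"}


def bulk_delete_by_domain(users, domain, dry_run=True):
    """Delete users by email domain; refuses protected domains, defaults to dry run."""
    domain = domain.lower().lstrip("@")
    if domain in PROTECTED_DOMAINS:
        raise PermissionError(f"{domain} is a protected domain")
    matched = [u for u in users if u["email"].lower().endswith("@" + domain)]
    if not dry_run:
        for user in matched:
            users.remove(user)
    return {
        "dry_run": dry_run,
        "matched": len(matched),
        "deleted": 0 if dry_run else len(matched),
    }


users = [{"email": "a@spam.example"}, {"email": "b@ok.example"}]
print(bulk_delete_by_domain(users, "spam.example"), len(users))
```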

Known Issues:

P0-6 (Delta Report): GET /admin/model-sync/providers documented as "No auth enforced" — leaks infrastructure details (33 providers).

CC-12: Security

Status: Complete

#	Criterion	Verification	Priority
AC-CC.12.1	API keys are Fernet-encrypted in DB	Query DB directly	P0
AC-CC.12.2	API key lookup uses HMAC hash, not brute-force decryption	Verify code path	P0
AC-CC.12.3	SQL injection attempts are sanitized/rejected	`'; DROP TABLE users; --` in inputs	P0
AC-CC.12.4	XSS payloads are sanitized/rejected	`<script>alert(1)</script>` in inputs	P0
AC-CC.12.5	Command injection blocked	`; rm -rf /` in inputs	P0
AC-CC.12.6	Path traversal blocked	`../../etc/passwd` in inputs	P0
AC-CC.12.7	Error messages never expose stack traces, internal paths, or sensitive data	Trigger errors, inspect responses	P0
AC-CC.12.8	Admin security violations logged in audit trail	Attempt unauthorized admin access	P0
AC-CC.12.9	Temporary/disposable email domains detected during registration	Register with `user@tempmail.com`	P1
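
AC-CC.12.2 distinguishes hash-based lookup from decrypting every stored key. The idea is sketched below with a dictionary standing in for an indexed DB column. The helper names and the secret handling are assumptions, and the Fernet encryption of the stored key itself is left out.

```python
import hashlib
import hmac


def lookup_hash(api_key, secret):
    """Deterministic HMAC-SHA256 of the key, stored alongside the ciphertext."""
    return hmac.new(secret, api_key.encode(), hashlib.sha256).hexdigest()


def find_key(index, presented_key, secret):
    """Look up a key record by hash instead of decrypting every stored key."""
    return index.get(lookup_hash(presented_key, secret))


secret = b"example-secret"  # placeholder; a real secret comes from configuration
index = {lookup_hash("gw_live_abc123", secret): {"key_id": 1, "active": True}}
print(find_key(index, "gw_live_abc123", secret))
print(find_key(index, "gw_live_wrong", secret))
```

Because the hash is keyed, a leaked hash column alone does not let an attacker test candidate keys without also knowing the secret.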

CC-13: Google Vertex Function Calling

Status: Partial

#	Criterion	Verification	Priority
AC-CC.13.1	REST path function calling works (OpenAI tools → Vertex functionDeclarations)	POST with tools to Vertex model via REST	P0
AC-CC.13.2	SDK path function calling either works OR is avoided when tools present	POST with tools via SDK path	P1 (KNOWN BUG)
AC-CC.13.3	Tool choice options (auto, required, none) are translated correctly	Test each tool_choice value	P1

Code References:

src/services/google_vertex_client.py (lines 250-402, 662-707) — REST path implemented
- SDK path (lines 585-587) — has TODO: "Function calling may not work correctly"

Known Issues:

P1-5 (Delta Report): SDK path has TODO. If SDK path is used when tools are present, function calling silently fails.
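
The translation named in AC-CC.13.1 and AC-CC.13.3 can be illustrated as a pure function. The sketch below follows the publicly documented Vertex AI REST request shape (`functionDeclarations`, `functionCallingConfig.mode`); it is not copied from src/services/google_vertex_client.py, the function name is hypothetical, and only the three string values of `tool_choice` are handled.

```python
# OpenAI tool_choice string -> Vertex function-calling mode.
TOOL_CHOICE_TO_MODE = {"auto": "AUTO", "required": "ANY", "none": "NONE"}


def openai_tools_to_vertex(tools, tool_choice="auto"):
    """Translate OpenAI-style `tools` into Vertex `tools` and `toolConfig`."""
    declarations = []
    for tool in tools:
        if tool.get("type") != "function":
            continue  # only function tools are translated in this sketch
        fn = tool["function"]
        decl = {"name": fn["name"], "description": fn.get("description", "")}
        if "parameters" in fn:
            decl["parameters"] = fn["parameters"]
        declarations.append(decl)
    return {
        "tools": [{"functionDeclarations": declarations}],
        "toolConfig": {
            "functionCallingConfig": {"mode": TOOL_CHOICE_TO_MODE[tool_choice]}
        },
    }


tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the weather for a city",
        "parameters": {"type": "object", "properties": {"city": {"type": "string"}}},
    },
}]
print(openai_tools_to_vertex(tools, tool_choice="required"))
```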

Summary Matrix

Layer	Feature	Criteria	Status	Known Issues
1	API Key Auth	10	Complete	—
1	RBAC	6	Complete	—
1	IP Allowlists	6	Complete	—
1	Domain Restrictions	4	Complete	—
1	Three-Layer Rate Limiting	14	Complete	P0-5: Missing headers on L2/L3
1	Input Guardrails (4 features)	4	Not Implemented	Deferred
1	Output Guardrails (3 features)	3	Not Implemented	Deferred
2	Model Resolution	7	Complete	—
2	General Router	7	Complete	—
2	Code Router	7	Complete	—
2	Provider Failover	9	Complete	—
2	Circuit Breakers	10	Complete	P1-7: Timing discrepancy (60s vs 5min)
2	Health-Weighted LB	2	Partial	—
2	Latency/Cost Optimal	2	Partial	Hardcoded latency model
2	Traffic Splitting	1	Not Implemented	Deferred
3	Tiered Health Monitoring	10	Complete	—
3	Passive Health Capture	2	Complete	—
3	Incident Management	5	Complete	—
3	Model Quality Scoring	2	Partial	Static/hardcoded
3	Per-Customer Quality	1	Not Implemented	Deferred
3	Provider Credit Monitoring	3	Partial	P1-1: OpenRouter only
4	Semantic Cache	1	Not Implemented	Deferred
4	Exact-Match Cache	1	Not Implemented	Deferred (infra exists)
4	Butter.dev Cache	2	Partial	P0-1: Ghost feature
4	Supporting Caches	5	Complete	—
5	Background Model Sync	4	Complete	—
5	Model Metadata Standard	2	Complete	—
5	Catalog Inclusion	3	Partial	P1-3: No gating at sync
5	HuggingFace Enrichment	2	Complete	—
5	Model Discovery & Search	5	Complete	—
6	Credit System	14	Complete	P0-2/3/4: Atomicity, pricing guard, refund
6	Plans & Tiers	9	Complete	P0-7: Config mismatch
6	Customer Usage Analytics	5	Partial	P1-4: Pagination bug, P2-1/2: Per-key, export
6	Customer Webhooks	1	Not Implemented	Deferred
6	SLA Tracking	1	Not Implemented	Deferred
7	Prompt Management	1	Not Implemented	Deferred
7	Batch/Async Inference	1	Not Implemented	Deferred
7	Evaluation & Testing	1	Not Implemented	Deferred
7	Playground	1	Not Implemented	Deferred
8	Metrics & Dashboards	7	Complete	—
8	Distributed Tracing	3	Complete	—
8	Error Tracking	6	Complete	—
8	AI-Specific Tracing	2	Partial	Arize/Braintrust gaps
8	Profiling	2	Complete	—
8	Customer Observability	4	Partial	P2-3: No latency API
9	OpenAI-Compatible API	11	Complete	P1-2: Error format, P1-8: Stream drops
9	Anthropic-Compatible API	4	Complete	—
10	Multi-Region Routing	1	Not Implemented	Deferred
10	Data Residency	1	Not Implemented	Deferred
10	Multi-Target Deployment	3	Complete	—
CC	Stripe Payments	8	Complete	—
CC	Coupons	6	Complete	—
CC	Referrals	5	Complete	—
CC	Chat History	7	Complete	—
CC	API Key Management	7	Complete	—
CC	Image Generation	3	Complete	—
CC	Audio Transcription	3	Complete	—
CC	Server-Side Tools	5	Complete	—
CC	Partner Trials	5	Complete	—
CC	Notifications	5	Complete	P2-4: No retry/delivery tracking
CC	Admin Operations	10	Complete	P0-6: Model-sync providers auth
CC	Security	9	Complete	—
CC	Google Vertex FC	3	Partial	P1-5: SDK path TODO
	TOTAL	294

Priority Summary

Priority	Count	Description
P0	7 bugs across 46 criteria	Ghost features, billing atomicity, pricing guard, refund verification, rate limit headers, admin auth, trial config
P1	8 bugs across 28 criteria	Provider monitoring, error format, catalog gating, pagination, Vertex FC, overage, circuit breaker timing, stream normalization
P2	4 gaps across 12 criteria	Per-key usage, export, latency API, notification delivery
Deferred	20 features, 24 criteria	Guardrails, caching, webhooks, batch, prompts, eval, SLA, geo-routing, GDPR, traffic splitting

Source: Conceptual Model Features | Features | Delta Report | Testing Plan | Acceptance Criteria
