Features Acceptance Criteria
Reading path: Conceptual Model | Stability Definition | Conceptual Model Features | Features | Delta Report | Acceptance Criteria (you are here)
Read after: Delta Report (so you know the gaps and priorities) You're at the end of the reading path! From here, go to Testing Guide to see how these criteria are tested.
TL;DR — This is the single source of truth for acceptance criteria across all 56 features in the Conceptual Model. Each feature has: what it must do, what it must NOT do, testable acceptance criteria, implementation status, code references, known issues, and priority. Organized by the 10-layer architecture. Use this to verify any feature is "done." For detailed Given/When/Then format criteria and boundary validations, see Conceptual Model Acceptance Criteria.
Consolidation note: This document is the primary acceptance criteria reference. It incorporates the implementation-aware criteria. For the spec-pure criteria (Given/When/Then format with integration requirements), see Conceptual Model Acceptance Criteria. For compact test-plan-linked criteria, see the Testing Plan directly.
Each feature section includes:
- Description: What the feature does and its boundaries (from Conceptual Model)
- Implementation Status: Current state (Complete / Partial / Not Implemented)
- Acceptance Criteria: Numbered, testable statements — a feature is accepted when ALL criteria pass
- Code References: File paths and line numbers for verification
- Known Issues: Bugs, gaps, or discrepancies found during code investigation
- Priority: P0 (must fix before release), P1 (should fix), P2 (nice to have), Deferred (post-release)
Status: Complete
What it does: Authenticates every API request using API keys encrypted at rest with Fernet AES-128. Keys are looked up via HMAC-SHA256 hash for O(log n) retrieval. Validates that keys are active, not expired, and not rate-limited.
What it does NOT do: No OAuth/JWT for API requests. No automatic key rotation. No multi-key auth per request.
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-1.1.1 | Valid API key in `Authorization: Bearer gw_*` header returns 200 | Send request with valid key | P0 |
| AC-1.1.2 | Invalid API key returns 401, never 200 or 500 | Send request with `Bearer invalid_key` | P0 |
| AC-1.1.3 | Expired API key returns 401 with clear message | Use a key past its `expires_at` | P0 |
| AC-1.1.4 | Deactivated API key (`is_active=false`) returns 401 | Deactivate key, then use it | P0 |
| AC-1.1.5 | API keys in DB are Fernet-encrypted ciphertext, never plaintext | Query `api_keys_new` table directly, verify `encrypted_key` column is ciphertext | P0 |
| AC-1.1.6 | Key lookup uses HMAC-SHA256 hash index, not brute-force decryption of all keys | Verify `key_hash` column is indexed, lookup is O(log n) by timing with 1 key vs 1000 keys | P0 |
| AC-1.1.7 | Key format is `gw_{env}_{43_random_chars}` (e.g., `gw_live_abc123...`) | Create new key, verify format regex | P0 |
| AC-1.1.8 | Key creation stores `last4` characters for user-friendly identification | Create key, check `last4` field in response and DB | P1 |
| AC-1.1.9 | Authentication is cached (5-min TTL, 512-entry LRU): second request with same key is faster | Time two consecutive auth calls, second should be <5ms vs 50-150ms | P1 |
| AC-1.1.10 | When Redis is down, auth cache falls back to local memory: requests are never blocked | Stop Redis, verify auth still works | P0 |
Code References:
- `src/security/security.py`: Fernet encryption, HMAC hashing
- `src/security/deps.py`: `get_api_key()`, `get_current_user()`, `validate_api_key_security()`
- `src/db/api_keys.py`: Key CRUD, key lookup by hash
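To make AC-1.1.6 through AC-1.1.8 concrete, here is a minimal sketch of key generation, the deterministic lookup hash, and the `last4` field. It is not the code in `src/security/security.py`: the `HMAC_SECRET` value, the `live|test` environment names, and all function names are assumptions for illustration.

```python
import hashlib
import hmac
import re
import secrets

# Hypothetical secret; a real deployment would load this from configuration.
HMAC_SECRET = b"example-hmac-secret"

# Assumed environments "live" and "test"; the document only shows "gw_live_".
KEY_PATTERN = re.compile(r"^gw_(live|test)_[A-Za-z0-9_-]{43}$")


def generate_api_key(env: str = "live") -> str:
    # token_urlsafe(32) encodes 32 random bytes as 43 URL-safe characters.
    return f"gw_{env}_{secrets.token_urlsafe(32)}"


def key_hash(api_key: str) -> str:
    # Deterministic HMAC-SHA256 digest, suitable for an indexed lookup column.
    return hmac.new(HMAC_SECRET, api_key.encode(), hashlib.sha256).hexdigest()


def last4(api_key: str) -> str:
    # Short suffix shown to users so they can tell keys apart.
    return api_key[-4:]
```

Because the hash is deterministic, an incoming key can be hashed once and matched against an indexed `key_hash` column, so no stored ciphertext has to be decrypted to find the row.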
Status: Complete
What it does: Assigns roles (admin, team, dev, free) to users. Permissions checked at dependency-injection level before route handlers execute. Role changes are audit-logged.
What it does NOT do: No granular per-model permissions. No custom roles. No team-level RBAC. No provider-level permissions.
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-1.2.1 | Non-admin API key returns 403 on ALL `/admin/*` endpoints | `GET /admin/users` with user key | P0 |
| AC-1.2.2 | Admin API key returns 200 on admin endpoints | `GET /admin/users` with admin key | P0 |
| AC-1.2.3 | Unauthorized admin access attempts are logged via `audit_logger.log_security_violation("UNAUTHORIZED_ADMIN_ACCESS")` | Attempt admin access with user key, check audit log | P0 |
| AC-1.2.4 | Role updates require admin auth and are logged with a reason | `POST /admin/roles/update` with `user_id`, `new_role`, `reason` | P0 |
| AC-1.2.5 | `GET /admin/roles/permissions/{role}` returns the correct permission set for each role | Check all 4 roles | P1 |
| AC-1.2.6 | Role change audit log is retrievable at `GET /admin/roles/audit/log` | Verify entries with timestamps and reasons | P1 |
Code References:
- `src/security/deps.py`: `require_admin` dependency
- `src/routes/admin.py`: Admin route handlers
- `src/db/roles.py`: Role management
Status: Complete
What it does: Restricts API key usage to specific IP addresses or CIDR ranges. Requests from non-allowlisted IPs are rejected before processing.
What it does NOT do: No geo-based restrictions. No IPv6 ranges. No automatic IP suggestions.
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-1.3.1 | Admin can create IP allowlist entries with IPv4 addresses | `POST /api/admin/ip-whitelist` with `{"ip": "1.2.3.4"}` | P0 |
| AC-1.3.2 | Admin can create IP allowlist entries with CIDR notation | `POST /api/admin/ip-whitelist` with `{"ip": "10.0.0.0/24"}` | P0 |
| AC-1.3.3 | API key with allowlist rejects requests from non-allowed IPs with 403 | Use key from IP not in allowlist | P0 |
| AC-1.3.4 | API key with allowlist accepts requests from allowed IPs | Use key from allowlisted IP | P0 |
| AC-1.3.5 | `POST /api/admin/ip-whitelist/check` correctly reports allowed vs blocked IPs | Test with both allowed and blocked IPs | P1 |
| AC-1.3.6 | Allowlist entries can be listed, updated, and deleted | CRUD operations on `/api/admin/ip-whitelist/*` | P1 |
Code References:
- `src/routes/admin.py`: IP allowlist endpoints
- `src/security/deps.py`: IP validation in `validate_api_key_security()`
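The allowlist match in AC-1.3.3 and AC-1.3.4 can be sketched with the standard-library `ipaddress` module. This illustrates the technique only, not the contents of `validate_api_key_security()`; the function name `ip_allowed` is hypothetical.

```python
import ipaddress


def ip_allowed(client_ip: str, allowlist: list[str]) -> bool:
    """Return True if client_ip matches any single address or CIDR entry."""
    addr = ipaddress.ip_address(client_ip)
    for entry in allowlist:
        # strict=False lets a bare address such as "1.2.3.4" parse as a /32 network.
        network = ipaddress.ip_network(entry, strict=False)
        if addr.version == network.version and addr in network:
            return True
    return False
```

A caller would return 403 when the key has an allowlist and `ip_allowed` is false. How a key without any allowlist entries is treated is left to the caller here.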
Status: Complete
What it does: Limits which HTTP referrer domains can use a specific API key. Prevents stolen keys from being used on unauthorized domains.
What it does NOT do: No domain ownership validation. No subdomain wildcards. No server-side restriction (only applies when Referer header present).
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-1.4.1 | API key with domain restriction rejects requests with wrong Referer header | Send request with `Referer: https://unauthorized.com` | P0 |
| AC-1.4.2 | API key with domain restriction accepts requests with correct Referer | Send request with configured Referer domain | P0 |
| AC-1.4.3 | Requests without Referer header bypass domain restriction (server-side usage) | Send request without Referer header | P0 |
| AC-1.4.4 | Multiple domains can be configured per key | Configure 3 domains, verify all work | P1 |
Code References:
- `src/security/deps.py`: `validate_api_key_security()` domain check
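The domain rules in AC-1.4.1 through AC-1.4.4 reduce to a small predicate. The sketch below assumes exact, case-insensitive hostname matching (consistent with "no subdomain wildcards"); the name `referer_allowed` is hypothetical and the real check may normalize hosts differently.

```python
from typing import Optional
from urllib.parse import urlparse


def referer_allowed(referer: Optional[str], allowed_domains: list[str]) -> bool:
    if not referer:
        # No Referer header: treated as server-side usage and allowed (AC-1.4.3).
        return True
    host = (urlparse(referer).hostname or "").lower()
    # Exact hostname match only; subdomain wildcards are not supported.
    return host in {domain.lower() for domain in allowed_domains}
```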
Status: Complete (with known header gap on Layers 2 and 3)
What it does:
- Layer 1 (IP): Security middleware with behavioral analysis, velocity detection. 300 RPM for unauthenticated, authenticated users exempt.
- Layer 2 (API Key): Redis-backed per-key limits tied to plan tier.
- Layer 3 (Anonymous): Stricter limits for unauthenticated requests.
- Fallback: In-memory rate limiter when Redis is unavailable.
What it does NOT do: No per-model rate limits. No burst/token-bucket. No cross-instance IP state sharing. Rejected requests consume zero credits.
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-1.5.1 | Unauthenticated requests exceeding 300 RPM from same IP receive 429 | Send 301 requests from one IP | P0 |
| AC-1.5.2 | Authenticated users are exempt from IP-level rate limiting | Verify no IP block on auth requests | P0 |
| AC-1.5.3 | API key exceeding plan RPM receives 429 | Exceed per-key limit | P0 |
| AC-1.5.4 | Anonymous rate limits are stricter than authenticated limits | Compare thresholds for anon vs auth | P0 |
| AC-1.5.5 | Layer 1 429 response includes `Retry-After`, `X-RateLimit-Limit`, `X-RateLimit-Remaining`, `X-RateLimit-Reset`, `X-RateLimit-Reason`, `X-RateLimit-Mode` | Trigger Layer 1 429, inspect headers | P0 |
| AC-1.5.6 | Layer 2 429 response includes `Retry-After`, `X-RateLimit-Limit`, `X-RateLimit-Remaining`, `X-RateLimit-Reset` | Trigger Layer 2 429, inspect headers | P0 (KNOWN BUG) |
| AC-1.5.7 | Layer 3 429 response includes `Retry-After` and `X-RateLimit-*` headers | Trigger Layer 3 429, inspect headers | P0 (KNOWN BUG) |
| AC-1.5.8 | When Redis is down, rate limiting continues via in-memory fallback: requests are never blocked | Stop Redis, verify rate limiting works | P0 |
| AC-1.5.9 | Velocity mode activates when error rate exceeds 25% and reduces limits to 50% | Trigger >25% error rate, check `GET /velocity-mode-status` | P0 |
| AC-1.5.10 | Velocity mode deactivates after 3 minutes of normal error rates | Wait for cooldown, verify normal limits restored | P1 |
| AC-1.5.11 | Rate limit configuration viewable at `GET /user/rate-limits` | Check response format | P1 |
| AC-1.5.12 | Per-key rate limits updatable via `PUT /user/rate-limits/{key_id}` | Update and verify enforcement | P1 |
| AC-1.5.13 | Auth endpoint rate-limits to 10 requests per 15 minutes per IP | `POST /auth` 11 times, 11th returns 429 | P0 |
| AC-1.5.14 | Registration rate-limits to 3 requests per hour per IP | `POST /auth/register` 4 times, 4th returns 429 | P0 |
Code References:
- `src/middleware/security_middleware.py` (lines 647-716): Layer 1, headers present
- `src/services/rate_limiting.py` (lines 78-94): Layer 2, `RateLimitResult` dataclass has fields but NOT converted to HTTP headers
- `src/services/anonymous_rate_limiter.py`: Layer 3, NO headers
- `src/services/rate_limiting_fallback.py`: In-memory fallback
Known Issues:
- P0-5 (Delta Report): Layer 2 `RateLimitResult` fields exist but are not converted to HTTP response headers. Layer 3 has no rate limit headers at all. Clients get bare 429 rejections with no retry information.
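One possible shape for closing the P0-5 gap is a small helper that maps a rate-limit result onto the headers required by AC-1.5.6. The field names below are assumptions; the actual `RateLimitResult` dataclass in `src/services/rate_limiting.py` may use different ones.

```python
from dataclasses import dataclass


@dataclass
class RateLimitResult:
    # Hypothetical field names, chosen for illustration only.
    limit: int        # requests allowed per window
    remaining: int    # requests left in the current window
    reset_epoch: int  # Unix time at which the window resets
    retry_after: int  # seconds the client should wait before retrying


def rate_limit_headers(result: RateLimitResult) -> dict[str, str]:
    """Build the response headers a 429 should carry."""
    return {
        "Retry-After": str(max(0, result.retry_after)),
        "X-RateLimit-Limit": str(result.limit),
        "X-RateLimit-Remaining": str(max(0, result.remaining)),
        "X-RateLimit-Reset": str(result.reset_epoch),
    }
```

Attaching the returned mapping to the 429 response in both Layer 2 and Layer 3 would give clients the retry information they currently lack.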
Status: Not Implemented (Deferred)
What these would do: PII scanning (phone, SSN, email, credit card), prompt injection pattern detection, per-key topic restrictions, content moderation via external classifiers.
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-1.6.1 | PII detection scans prompts for phone numbers, SSNs, emails, credit card numbers | Send prompt with PII | Deferred |
| AC-1.7.1 | Prompt injection patterns that attempt to override system prompts are detected and blocked | Send known injection pattern | Deferred |
| AC-1.8.1 | Per-API-key topic restrictions limit responses to configured domains | Configure restriction, test out-of-domain | Deferred |
| AC-1.9.1 | Content moderation blocks harmful inputs before reaching providers | Send harmful content | Deferred |
Note: These are Conceptual Model features (D-1 through D-4 in Delta Report). Not required for stable release. No code exists.
Status: Not Implemented (Deferred)
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-1.10.1 | Output content filtering scans responses for policy violations before returning | Trigger policy-violating response | Deferred |
| AC-1.11.1 | Structured output validation confirms JSON schema conformance when requested | Request JSON schema output | Deferred (D-5, Small effort) |
| AC-1.12.1 | Provider-side safety metadata (refusals, safety triggers) is surfaced in standardized format | Trigger safety filter, inspect response | Deferred |
Status: Complete
What it does: Three-stage pipeline: alias normalization (120+ aliases) → provider detection (overrides → format rules → mapping tables → org-prefix fallbacks) → model ID transformation (provider-native format).
What it does NOT do: No user-defined aliases. No version/snapshot resolution. No per-modality routing differences.
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-2.1.1 | `gpt-4o` resolves to `openai/gpt-4o` | `POST /v1/chat/completions` with `model: "gpt-4o"` | P0 |
| AC-2.1.2 | `r1` resolves to `deepseek/deepseek-r1` | `POST /v1/chat/completions` with `model: "r1"` | P0 |
| AC-2.1.3 | Canonical IDs (e.g., `openai/gpt-4o`) work directly without alias resolution | POST with canonical ID | P0 |
| AC-2.1.4 | Provider detection correctly routes `google/gemini-*` models to Vertex when credentials available | POST with Gemini model | P0 |
| AC-2.1.5 | No alias maps to itself (no self-referencing loops) | Inspect `MODEL_ALIASES` dict for cycles | P0 |
| AC-2.1.6 | Fireworks model IDs are transformed to `accounts/fireworks/models/...` format | POST with Fireworks model, verify upstream call format | P1 |
| AC-2.1.7 | Nonexistent model returns 400 or 404, not 500 | POST with `model: "nonexistent/model"` | P0 |
Code References:
- `src/services/models.py`: `MODEL_ALIASES` dict, resolution pipeline
- `src/services/model_transformations.py`: Provider-specific ID transformations
- `src/services/model_availability.py`: Availability checking
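AC-2.1.5 asks for an inspection of the alias table for cycles, which can be automated. The sketch below assumes the table is a flat `dict[str, str]` from alias to target; the real `MODEL_ALIASES` structure may differ, and `find_alias_cycles` is a hypothetical helper.

```python
def find_alias_cycles(aliases: dict[str, str]) -> list[str]:
    """Return aliases whose resolution chain never terminates."""
    cyclic = []
    for start in aliases:
        seen = {start}
        current = aliases[start]
        # Follow the chain until it leaves the alias table or revisits a name.
        while current in aliases:
            if current in seen:
                cyclic.append(start)
                break
            seen.add(current)
            current = aliases[current]
    return cyclic
```

An empty result means every alias resolves to a name that is not itself an alias, which covers both direct self-references and longer loops.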
Status: Complete
What it does: ML-powered model selection via NotDiamond. Four modes: quality (openai/gpt-4o), cost (openai/gpt-4o-mini), latency (groq/llama-3.3-70b-versatile), balanced (anthropic/claude-sonnet-4). Falls back to mode-specific defaults when NotDiamond unavailable.
What it does NOT do: No user feedback learning. No custom model pools. No routing constraints.
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-2.2.1 | `router:general:quality` selects a high-quality model and returns 200 | POST chat with `model: "router:general:quality"` | P0 |
| AC-2.2.2 | `router:general:cost` selects a cheaper model than quality mode | Compare selected models for same prompt | P0 |
| AC-2.2.3 | `router:general:latency` selects a low-latency model | POST and verify selection | P0 |
| AC-2.2.4 | `router:general:balanced` considers quality, cost, and latency | POST and verify selection | P0 |
| AC-2.2.5 | When NotDiamond is unavailable, fallback models are used without error | Disable NotDiamond, verify graceful fallback | P0 |
| AC-2.2.6 | `GET /general-router/settings/options` returns available strategies and model pools | Inspect response | P1 |
| AC-2.2.7 | `POST /general-router/test` returns selected model + reasoning | POST with sample prompt | P1 |
Code References:
- `src/services/general_router.py`: Routing logic, NotDiamond integration
- `src/routes/general_router.py`: Endpoints
Status: Complete
What it does: Benchmark-driven model selection for coding tasks. 4 tiers by SWE-bench/HumanEval scores. Modes: auto (complexity-based), price, quality, agentic. Static data from code_quality_priors.json.
What it does NOT do: No code execution. No feedback learning. No custom tiers. No language detection.
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-2.3.1 | `router:code:auto` classifies prompt complexity and selects appropriate tier | POST with code prompt | P0 |
| AC-2.3.2 | `router:code:quality` selects highest-tier code model | POST and verify | P0 |
| AC-2.3.3 | `router:code:price` selects cost-effective code model | POST and verify | P0 |
| AC-2.3.4 | `router:code:agentic` selects model optimized for multi-step tool use | POST and verify | P0 |
| AC-2.3.5 | `GET /code-router/tiers` returns models with SWE-bench/HumanEval scores | Inspect response | P0 |
| AC-2.3.6 | Code router works entirely from in-memory data (no DB/Redis dependency) | Verify response with Redis down | P0 |
| AC-2.3.7 | `POST /code-router/test` returns selected model and routing rationale | POST with sample prompt | P1 |
Code References:
- `src/services/code_router.py`: Routing logic, tier selection
- `src/services/code_quality_priors.json`: Static benchmark data
- `src/routes/code_router.py`: Endpoints
Status: Complete
What it does: 14-provider prioritized failover chain. Failover triggers on 401/402/403/404/502/503/504. Does NOT trigger on 400 (user error) or 429 (retries with backoff). Model-aware rules: OpenAI → OpenRouter only, Anthropic → OpenRouter only, open-source → all providers.
What it does NOT do: No mid-stream failover. No user-configured chains. No same-pricing guarantee across providers.
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-2.4.1 | Primary provider 502/503/504 → request succeeds via fallback transparently | Force primary failure, verify success | P0 |
| AC-2.4.2 | Provider 401/402/403/404 → failover to next provider | Force auth error, verify failover | P0 |
| AC-2.4.3 | Provider 400 (user error) → returns 400 to user, NO failover | Send malformed request | P0 |
| AC-2.4.4 | Provider 429 → retries with backoff, does NOT failover | Trigger rate limit, verify retry behavior | P0 |
| AC-2.4.5 | OpenAI models only failover to OpenAI → OpenRouter | Inspect failover chain for `openai/gpt-4o` | P0 |
| AC-2.4.6 | Anthropic models only failover to Anthropic → OpenRouter | Inspect failover chain for `anthropic/claude-sonnet-4` | P0 |
| AC-2.4.7 | Open-source models can failover across all providers | Inspect chain for `meta-llama/llama-3-70b` | P0 |
| AC-2.4.8 | Failover chain skips providers with OPEN circuit breakers | Open a breaker, verify provider is skipped | P0 |
| AC-2.4.9 | User receives no indication of failover (transparent to client) | Monitor response during failover | P0 |
Code References:
- `src/services/provider_failover.py`: Failover chain construction, error classification
- `src/routes/chat.py`: `build_provider_failover_chain()` integration
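The error-classification rules in AC-2.4.1 through AC-2.4.4 can be written as a single lookup. This is a sketch, not the code in `src/services/provider_failover.py`; the `Action` enum and function name are hypothetical, and passing any status not named in the criteria straight back to the client is an assumption.

```python
from enum import Enum


class Action(Enum):
    FAILOVER = "failover"
    RETRY_WITH_BACKOFF = "retry_with_backoff"
    RETURN_TO_CLIENT = "return_to_client"


# Statuses that trigger failover, as listed in the feature description.
FAILOVER_STATUSES = frozenset({401, 402, 403, 404, 502, 503, 504})


def classify_provider_status(status: int) -> Action:
    if status in FAILOVER_STATUSES:
        return Action.FAILOVER
    if status == 429:
        # Rate limited: retry the same provider with backoff, no failover.
        return Action.RETRY_WITH_BACKOFF
    # 400 (user error) is returned as-is; other statuses are assumed to be too.
    return Action.RETURN_TO_CLIENT
```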
Status: Complete (with timing discrepancy)
What it does: Per-provider circuit breakers. CLOSED → OPEN (5 consecutive failures) → HALF_OPEN (after timeout) → CLOSED (3 consecutive successes) or back to OPEN. Redis + in-memory state.
What it does NOT do: No per-provider threshold configuration. No error-type differentiation. No operator alerts. No persistent state beyond Redis.
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-2.5.1 | New provider starts in CLOSED state | `GET /circuit-breakers/{new_provider}` | P0 |
| AC-2.5.2 | After 5 consecutive failures, state transitions to OPEN | Send 5 failing requests, check state | P0 |
| AC-2.5.3 | OPEN state prevents requests to that provider | Verify provider is skipped in failover | P0 |
| AC-2.5.4 | After timeout period, OPEN transitions to HALF_OPEN | Wait for timeout, check state | P0 |
| AC-2.5.5 | In HALF_OPEN, 3 consecutive successful requests transition to CLOSED | Send 3 successes, check state | P0 |
| AC-2.5.6 | In HALF_OPEN, a failed request transitions back to OPEN | Send failure, check state | P0 |
| AC-2.5.7 | `POST /circuit-breakers/{provider}/reset` resets to CLOSED | Reset and verify | P0 |
| AC-2.5.8 | `POST /circuit-breakers/reset-all` resets all breakers | Reset all and verify | P0 |
| AC-2.5.9 | Circuit breaker endpoints require NO auth (public) | Verify no auth needed | P1 |
| AC-2.5.10 | Prometheus metrics emitted on state transitions | Check `circuit_breaker_state_transitions_total` | P1 |
Code References:
- `src/services/circuit_breaker.py` (line 67): Default timeout 60 seconds
- Redis keys: `circuit_breaker:{provider}:{state|failure_count|success_count|opened_at}` (3600s TTL)
Known Issues:
- P1-7 (Delta Report): Code uses 60-second timeout, but Conceptual Model says 5 minutes and wiki Testing Plan says 5 minutes. Either code or docs must be updated.
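The state machine in the feature description (5 consecutive failures to open, a timeout before HALF_OPEN, 3 consecutive successes to close) can be sketched in memory as follows. This is not the code in `src/services/circuit_breaker.py`: it omits the Redis-backed state, uses the 60-second timeout from the code reference as its default, and every name is hypothetical.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """In-memory sketch of the CLOSED -> OPEN -> HALF_OPEN -> CLOSED cycle."""

    def __init__(
        self,
        failure_threshold: int = 5,
        success_threshold: int = 3,
        timeout_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        self.timeout_seconds = timeout_seconds
        self.clock = clock  # injectable so tests do not need to sleep
        self.state = "CLOSED"
        self.failures = 0
        self.successes = 0
        self.opened_at: Optional[float] = None

    def _open(self) -> None:
        self.state = "OPEN"
        self.opened_at = self.clock()
        self.successes = 0

    def reset(self) -> None:
        self.state = "CLOSED"
        self.failures = 0
        self.successes = 0
        self.opened_at = None

    def allow_request(self) -> bool:
        if self.state == "OPEN":
            if self.clock() - self.opened_at >= self.timeout_seconds:
                self.state = "HALF_OPEN"  # timeout elapsed: allow probe traffic
                return True
            return False
        return True

    def record_failure(self) -> None:
        if self.state == "OPEN":
            return
        if self.state == "HALF_OPEN":
            self._open()  # any failure while probing reopens the breaker
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self._open()

    def record_success(self) -> None:
        if self.state == "HALF_OPEN":
            self.successes += 1
            if self.successes >= self.success_threshold:
                self.reset()
        elif self.state == "CLOSED":
            self.failures = 0  # only consecutive failures count
```

Passing a fake `clock` makes the timeout transition testable without waiting, which also makes the 60-second versus 5-minute question a one-parameter change.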
Status: Partial
What it does: Checks primary provider health score before routing. Below-threshold providers are demoted in failover chain.
What it does NOT do: No proportional traffic splitting by health score. No per-model health. No predictive health.
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-2.6.1 | When primary provider health is below threshold, a healthier provider is promoted | Degrade a provider, verify chain reordering | P1 |
| AC-2.6.2 | Health-based promotion is a binary decision (promote or don't) | Verify no weighted splitting | P1 |
Status: Partial
What it does: Route to lowest-latency or cheapest provider for same model. General Router "latency" mode hardcodes to groq/llama-3.3-70b-versatile.
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-2.7.1 | Latency mode selects a low-latency provider | Verify model selection via router | P1 |
| AC-2.8.1 | Cost mode selects cheapest capable provider | Compare pricing of selected vs alternatives | P1 |
Known Issues: No dynamic latency-optimal selection — latency mode hardcodes a specific model rather than measuring real-time latency. Deferred for post-release.
Status: Not Implemented (Deferred)
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-2.9.1 | Traffic for same model is distributed across providers at configured ratios | Monitor provider selection distribution | Deferred (D-17) |
Status: Complete
What it does: Continuous monitoring at intervals by tier: Critical (5min), Popular (30min), Standard (2-4hr), On-Demand (when requested). Health checks verify availability and latency.
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-3.1.1 | `GET /health` always returns 200, even when dependencies are degraded | Call when DB is down | P0 |
| AC-3.1.2 | Health response includes `version`, `status`, and `timestamp` | Inspect response | P0 |
| AC-3.1.3 | `GET /health/system` returns memory, CPU, and connection pool stats | Inspect response | P0 |
| AC-3.1.4 | Provider health scores are 0-100 per provider | `GET /health/providers` | P0 |
| AC-3.1.5 | Model health shows healthy, degraded, or down per model | `GET /health/models` | P0 |
| AC-3.1.6 | `GET /health/quick` is sub-millisecond (static response) | Time the endpoint | P1 |
| AC-3.1.7 | `GET /health/railway` returns comprehensive check (DB, Redis, providers) | Inspect response | P1 |
| AC-3.1.8 | Gateway health dashboard returns HTML and JSON formats | `GET /health/gateways/dashboard` and `/data` | P1 |
| AC-3.1.9 | Health insights provide actionable recommendations | `GET /health/insights` | P2 |
| AC-3.1.10 | Background monitoring can be started and stopped | `POST /health/monitoring/start`, `/stop` | P1 |
Code References:
- `src/services/intelligent_health_monitor.py`: Tiered monitoring
- `src/services/autonomous_monitor.py`: Background monitoring
- `src/routes/health.py`: Health endpoints
Status: Complete
What it does: Every real inference request contributes health data as a background task — success/failure, latency, tokens, provider response codes. Zero overhead on request path.
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-3.2.1 | Health data is captured after response is returned (no latency impact on user) | Verify background task execution | P0 |
| AC-3.2.2 | Captured data includes: latency, tokens, status, provider | Inspect health data store | P1 |
Status: Complete
What it does: Auto-creates incidents on health degradation. Severity levels, timestamps, captured logs, resolution tracking, MTTR calculation.
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-3.3.1 | Downtime incidents can be listed with filters | `GET /admin/downtime/incidents?status=ongoing` | P0 |
| AC-3.3.2 | Incidents can be resolved with notes | `POST /admin/downtime/incidents/{id}/resolve` | P0 |
| AC-3.3.3 | Already-resolved incidents reject re-resolution | Attempt to resolve again | P1 |
| AC-3.3.4 | Incident analysis shows error patterns and type distribution | `GET /admin/downtime/incidents/{id}/analysis` | P1 |
| AC-3.3.5 | MTTR statistics are computed | `GET /admin/downtime/statistics` | P1 |
Code References:
- `src/routes/admin.py`: Downtime tracking endpoints
Status: Partial
What it does: Hardcoded quality priors for ~20 models (task-specific: simple_qa, code_gen, reasoning, etc.). SWE-bench/HumanEval in Code Router.
What it does NOT do: Not stored in DB. Not updatable without code change. Missing MMLU, MATH, MT-Bench, LMSYS Arena ELO, LiveBench.
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-3.4.1 | Code router tiers include SWE-bench and HumanEval scores | `GET /code-router/tiers` | P0 |
| AC-3.4.2 | Model selector uses quality priors for task-specific routing | Verify `model_selector.py` quality maps | P1 |
Known Issues: Quality data is static/hardcoded, not from DB. Missing several major benchmarks. No dynamic updating.
Status: Not Implemented (Deferred)
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-3.5.1 | Per-customer success rates are tracked per model | Check customer-model analytics | Deferred (D-19) |
Status: Partial (OpenRouter only)
What it does: Tracks upstream provider credit balances. OpenRouter: full implementation with API call, 15-min cache, threshold alerts (critical $5, warning $20, info $50).
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-3.6.1 | `GET /api/provider-credits/balance` returns credit balances for monitored providers | Inspect response | P0 |
| AC-3.6.2 | OpenRouter balance is cached for 15 minutes | Check timing of two consecutive calls | P1 |
| AC-3.6.3 | Threshold alerts fire at critical ($5), warning ($20), info ($50) | Verify alert logic | P1 |
Code References:
- `src/services/provider_credit_monitor.py` (lines 33-138): OpenRouter implementation
  - Lines 165-167: TODO stubs for all other providers
Known Issues:
- P1-1 (Delta Report): Only OpenRouter implemented. 29 other providers have TODO stubs. No preemptive deprioritization in failover chain.
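The threshold logic behind AC-3.6.3 is small enough to sketch directly. The dollar values come from the feature description; the function name and the choice of an inclusive comparison (`<=`) are assumptions, since the document does not say how a balance exactly at a threshold is treated.

```python
from typing import Optional

# (threshold in USD, alert level), ordered from most to least severe.
THRESHOLDS = [(5.0, "critical"), (20.0, "warning"), (50.0, "info")]


def alert_level(balance_usd: float) -> Optional[str]:
    """Return the most severe alert level the balance triggers, or None."""
    for threshold, level in THRESHOLDS:
        if balance_usd <= threshold:  # inclusive comparison is an assumption
            return level
    return None
```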
Status: Not Implemented (Deferred)
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-4.1.1 | Semantically similar prompts return cached responses (cosine similarity >0.95) | Test with paraphrased prompt | Deferred (D-8) |
Status: Not Implemented (Deferred — infrastructure exists but not wired)
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-4.2.1 | Identical inference requests (same messages + model + params) return cached response | Send same request twice, compare latency | Deferred (D-9) |
Code References:
- `src/services/response_cache.py`: SHA-256 hashing, Redis + in-memory fallback exists but NOT wired into inference path
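For AC-4.2.1, "identical" has to mean the same messages, model, and parameters regardless of dictionary ordering. A minimal sketch of such a key, assuming canonical JSON serialization before SHA-256 hashing; the `response_cache:` prefix and the function name are hypothetical and may not match `response_cache.py`.

```python
import hashlib
import json
from typing import Any


def cache_key(model: str, messages: list[dict[str, Any]], params: dict[str, Any]) -> str:
    # sort_keys and fixed separators make the serialization order-independent.
    payload = json.dumps(
        {"model": model, "messages": messages, "params": params},
        sort_keys=True,
        separators=(",", ":"),
    )
    return "response_cache:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Two requests that differ only in the order of their parameter keys produce the same cache key, while any change to the model, messages, or parameter values produces a different one.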
Status: Partial (Ghost Feature — P0 issue)
What it does: Butter.dev proxy used for all requests. User preference endpoints exist but are ignored.
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-4.3.1 | If Butter cache settings endpoints exist, user preference MUST be respected during inference | Set `enable_butter_cache=false`, verify Butter proxy is NOT used | P0 (KNOWN BUG) |
| AC-4.3.2 | OR: Butter cache settings endpoints are removed entirely | Verify endpoints don't exist | P0 Alternative |
Code References:
- `src/routes/users.py` (lines 305-408): `GET/PUT /user/cache-settings` exist, store preference
- `src/routes/chat.py` (line 697): Always calls `get_butter_pooled_async_client()` without checking preference
Known Issues:
- P0-1 (Delta Report): Ghost feature. User can toggle a setting that does nothing. Trust-eroding.
Status: Complete
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-4.4.1 | Catalog endpoint responds in sub-100ms on cache hit | Time `GET /v1/models` on second request | P0 |
| AC-4.4.2 | Auth cache reduces lookup latency from ~100ms to <5ms | Compare first vs second auth timing | P1 |
| AC-4.4.3 | When Redis is down, local memory cache activates: no requests blocked | Stop Redis, verify normal operation | P0 |
| AC-4.4.4 | Cache invalidation clears all layers | `POST /admin/cache/clear`, verify fresh data | P1 |
| AC-4.4.5 | Stampede protection prevents multiple simultaneous cache rebuilds | Concurrent requests to cold cache | P1 |
Status: Complete
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-5.1.1 | Model sync can be triggered incrementally and fully | `POST /admin/model-sync/trigger` and `/all` | P0 |
| AC-5.1.2 | If provider API is down, last synced catalog is served | Verify stale catalog on provider failure | P0 |
| AC-5.1.3 | Per-provider sync works | `POST /admin/model-sync/provider/{slug}` | P1 |
| AC-5.1.4 | Full resync (delete + reimport) works | `POST /admin/model-sync/full` | P1 |
Status: Complete
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-5.2.1 | Every model in `GET /v1/models` has `id`, `name`, `provider_slug`, `context_length`, and `pricing` | Inspect response schema | P0 |
| AC-5.2.2 | No model has null or zero pricing for both prompt and completion | Scan all models in response | P0 (see 5.3) |
Status: Partial (gating not enforced at sync)
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-5.3.1 | Models without pricing are rejected during sync (not visible to users) | Check catalog for models with null pricing | P1 (KNOWN BUG) |
| AC-5.3.2 | `GET /v1/models/unique` returns no duplicate model IDs | Check for uniqueness | P0 |
| AC-5.3.3 | High-value models without explicit pricing are BLOCKED, not served at default rate | Verify pricing guard for GPT-4, Claude, Gemini | P0 |
Code References:
- `src/services/model_catalog_sync.py`: `extract_pricing()` (lines 136-153) returns all None for missing pricing. Line 368 checks `if any(pricing.values())` but is non-blocking: models ARE synced without pricing.
- `src/services/pricing.py` (lines 783-839): `HIGH_VALUE_MODEL_PATTERNS` guard raises ValueError on default pricing fallback
Known Issues:
- P1-3 (Delta Report): Models without pricing are synced into the catalog. Non-high-value models without pricing fall to default ($0.00002/token) — potential under-billing.
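The difference between AC-5.3.1 and AC-5.3.3 is easier to see as code: high-value models must fail loudly when pricing is missing, while other models currently fall through to the default rate. The sketch below is illustrative only. The regular expressions are guesses based on the model families named in this document (GPT-4, Claude, Gemini, o1/o3/o4), not the actual `HIGH_VALUE_MODEL_PATTERNS`, and `price_per_token` is a hypothetical function.

```python
import re
from typing import Optional

DEFAULT_PRICE_PER_TOKEN = 0.00002  # default rate cited under Known Issues

# Hypothetical patterns; the real list in src/services/pricing.py may differ.
HIGH_VALUE_MODEL_PATTERNS = [
    re.compile(pattern, re.IGNORECASE)
    for pattern in (r"gpt-4", r"claude", r"gemini", r"\bo[134]\b")
]


def price_per_token(model_id: str, explicit_price: Optional[float]) -> float:
    if explicit_price is not None:
        return explicit_price
    if any(p.search(model_id) for p in HIGH_VALUE_MODEL_PATTERNS):
        # Block rather than silently under-bill an expensive model.
        raise ValueError(f"no explicit pricing for high-value model {model_id!r}")
    return DEFAULT_PRICE_PER_TOKEN
```

Running the same check at sync time, and skipping any model for which it raises or falls back, would be one way to enforce gating before a model becomes visible in the catalog.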
Status: Complete
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-5.4.1 | Model detail returns HuggingFace data (downloads, likes, parameters) when available | `GET /api/models/detail?model_id=meta-llama/...` | P1 |
| AC-5.4.2 | HuggingFace data is cached with TTL | Verify caching on repeated requests | P1 |
Status: Complete
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-5.5.1 | `GET /v1/models?provider=fireworks` returns only Fireworks models | Filter and verify | P0 |
| AC-5.5.2 | `GET /v1/models/search?q=llama` returns matching models | Verify results | P0 |
| AC-5.5.3 | `GET /v1/models/trending` returns models ranked by usage | Inspect response | P1 |
| AC-5.5.4 | `GET /v1/gateways` returns all gateways with name, color, priority, site_url | Inspect response | P0 |
| AC-5.5.5 | Model comparison works across providers | `GET /v1/models/{provider}/{model}/compare` | P1 |
Status: Complete (with atomicity concern on legacy path)
What it does: Atomic billing unit. Cost = (prompt_tokens × prompt_price) + (completion_tokens × completion_price). Pre-flight checks, idempotent deductions (UNIQUE constraint + RPC), subscription allowance consumed first, auto-refund on provider errors.
What it does NOT do: No real-time credit streaming during generation. No credit expiration. No rollover. No credit transfers. No multi-currency.
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-6.1.1 | Pre-flight check: user with 0 credits receives 402 BEFORE any provider call | POST with 0-credit user, verify no upstream call | P0 |
| AC-6.1.2 | Idempotent deduction: same request ID sent twice deducts credits only once | POST twice with same `X-Request-ID` | P0 |
| AC-6.1.3 | Subscription allowance consumed before purchased credits | User with both: make request, verify subscription decreases first | P0 |
| AC-6.1.4 | Provider 5xx error → automatic credit refund | Trigger 5xx, verify refund in `credit_transactions` | P0 |
| AC-6.1.5 | Provider timeout → automatic credit refund | Trigger timeout, verify refund | P0 |
| AC-6.1.6 | Provider 4xx error (user error) → NO refund | Trigger 4xx, verify no refund | P0 |
| AC-6.1.7 | High-value models (GPT-4, Claude, Gemini, o1/o3/o4) blocked if pricing falls to default | Verify pricing guard fires for each pattern | P0 |
| AC-6.1.8 | Credit transactions logged with request_id, user_id, model, token counts, cost | Check `credit_transactions` table | P0 |
| AC-6.1.9 | Balance update and transaction log happen atomically (single DB transaction via RPC) | Verify `atomic_deduct_credits` RPC is used | P0 |
| AC-6.1.10 | Legacy fallback path either doesn't exist or handles transaction logging failure safely | Verify legacy path behavior on logging failure | P0 (KNOWN RISK) |
| AC-6.1.11 | Credit transaction history is paginated | `GET /credits/transactions?limit=10` | P1 |
| AC-6.1.12 | Admin can add/adjust/refund credits | `POST /credits/add`, `/adjust`, `/refund` | P1 |
| AC-6.1.13 | Daily usage cap prevents runaway costs | Exceed daily limit, verify 402 | P1 |
| AC-6.1.14 | `request_id` has UNIQUE constraint in DB (belt-and-suspenders idempotency) | Check migration `20260223000001_add_request_id_to_credit_transactions.sql` | P0 |
Code References:
- `src/db/users.py` (lines 701-1106): Credit deduction
  - Atomic RPC path (lines 862-967): Correct
  - Legacy fallback path (lines 987-1096): Risk: two separate calls, if logging fails credits already deducted (lines 1077-1082)
- `src/services/pricing.py` (lines 783-839): `HIGH_VALUE_MODEL_PATTERNS`
- `src/routes/chat.py` (lines 1670-1742): Auto-refund logic
Known Issues:
- P0-2: Legacy fallback path may create orphaned deductions (balance reduced, no transaction record)
- P0-3: Pricing guard needs end-to-end verification — must fire BEFORE provider call
- P0-4: Auto-refund needs integration testing for edge cases (partial stream, refund failure)
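
The idempotency and atomicity criteria (AC-6.1.2, AC-6.1.9, AC-6.1.14) and the orphaned-deduction risk in P0-2 can be illustrated with a small sketch. It assumes an in-memory SQLite database standing in for the real one, and a simplified, hypothetical schema; `make_db` and `deduct_once` are illustrative names, not the `atomic_deduct_credits` RPC itself. The point is that the transaction row and the balance update share one transaction, and a UNIQUE `request_id` rejects a replay.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a simplified, hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, credits_cents INTEGER NOT NULL);
        CREATE TABLE credit_transactions (
            request_id TEXT NOT NULL UNIQUE,  -- one deduction per request
            user_id INTEGER NOT NULL,
            amount_cents INTEGER NOT NULL
        );
        INSERT INTO users (id, credits_cents) VALUES (1, 1000);
        """
    )
    return conn


def deduct_once(conn: sqlite3.Connection, user_id: int, request_id: str, amount_cents: int) -> bool:
    """Return True if credits were deducted, False if request_id was already billed."""
    try:
        with conn:  # single transaction: commit on success, roll back on any exception
            conn.execute(
                "INSERT INTO credit_transactions (request_id, user_id, amount_cents) VALUES (?, ?, ?)",
                (request_id, user_id, amount_cents),
            )
            conn.execute(
                "UPDATE users SET credits_cents = credits_cents - ? WHERE id = ?",
                (amount_cents, user_id),
            )
        return True
    except sqlite3.IntegrityError:
        # Duplicate request_id: the log row was rejected, so the balance is untouched.
        return False


conn = make_db()
first = deduct_once(conn, 1, "req-123", 25)
second = deduct_once(conn, 1, "req-123", 25)  # replay of the same request
balance = conn.execute("SELECT credits_cents FROM users WHERE id = 1").fetchone()[0]
print(first, second, balance)
```

Because the log insert runs first inside the same transaction, a logging failure cannot leave a reduced balance without a matching record, which is the failure mode described for the legacy path.
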
Status: Complete
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-6.2.1 | New user gets $5 credits and trial expiring in 3 days | Register, check balance + `trial_end` | P0 (config mismatch) |
| AC-6.2.2 | Trial user can make requests until credits/limits exhausted | Make requests during trial | P0 |
| AC-6.2.3 | Expired trial returns 402 for paid models | POST after trial expiry | P0 |
| AC-6.2.4 | Expired trial CAN access `:free` suffix models | POST with `:free` model after expiry | P0 |
| AC-6.2.5 | `GET /plans` returns available plan tiers with pricing | Inspect response | P0 |
| AC-6.2.6 | `GET /trial/status` returns active/expired and days remaining | Check response | P0 |
| AC-6.2.7 | Unused subscription allowance does NOT roll over (resets monthly) | Verify at month boundary | P1 |
| AC-6.2.8 | Purchased credits never expire and survive plan changes | Change plan, verify credits persist | P1 |
| AC-6.2.9 | Daily trial limit ($1/day) is enforced | Exceed $1 in trial, verify blocking | P0 |
Known Issues:
- P0-7 (Delta Report): Trial config mismatch — CLAUDE.md says $5, wiki says $10, code says $5. Must reconcile.
- `src/config/usage_limits.py`: Trial: $5, 3 days, $1/day
- `src/db/trials.py` (line 44): Formula `trial_days * 5` suggests variable durations
Status: Partial
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-6.3.1 | User can view activity stats (total requests/tokens/spend by model/provider) | `GET /user/activity/stats` | P0 |
| AC-6.3.2 | Activity log is paginated (limit 1-1000) | `GET /user/activity/log?limit=50` | P0 |
| AC-6.3.3 | Activity log `total` field returns actual DB total, not page count | Verify `total` vs `count` | P1 (KNOWN BUG) |
| AC-6.3.4 | Per-API-key usage breakdown is available | `GET /user/api-keys/{key_id}/usage` | P2 (NOT IMPLEMENTED) |
| AC-6.3.5 | CSV/JSON export is available | `GET /user/usage/export?format=csv` | P2 (NOT IMPLEMENTED) |
Known Issues:
- P1-4 (Delta Report): `src/routes/users.py` (line 515): `"total": len(transactions)` returns page count, not DB total
- P2-1: `activity_log` stores `user_id` but NOT `api_key_id`, so there is no per-key breakdown
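
The distinction behind AC-6.3.3 and P1-4 can be shown in a few lines. This is a sketch under stated assumptions: `paginate` is a hypothetical helper, and the in-memory `rows` list stands in for the full result set, where a real implementation would more likely run a separate count query. The response reports the page length as `count` and the full size as `total`.

```python
from typing import Any, Dict, List


def paginate(rows: List[Dict[str, Any]], limit: int, offset: int = 0) -> Dict[str, Any]:
    """Return one page plus both the page size and the full total."""
    if not 1 <= limit <= 1000:
        raise ValueError("limit must be between 1 and 1000")
    page = rows[offset:offset + limit]
    return {
        "transactions": page,
        "count": len(page),  # number of items on this page
        "total": len(rows),  # size of the full result set, not len(page)
    }


all_rows = [{"id": i} for i in range(120)]
result = paginate(all_rows, limit=50)
print(result["count"], result["total"])
```
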
Status: Not Implemented (Deferred)
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-6.4.1 | Outbound webhook delivery for `credits.low`, `credits.depleted`, `model.degraded` events | Configure webhook, trigger events | Deferred (D-10) |
Status: Not Implemented (Deferred)
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-6.5.1 | Per-tier SLA violations are detected with auto credit-back compensation | Monitor SLA metrics | Deferred (D-14) |
Status: Not Implemented (Deferred)
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-7.1.1 | Template library with versioning | CRUD on prompt templates | Deferred (D-12) |
Status: Not Implemented (Deferred)
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-7.2.1 | `POST /v1/batch/jobs` submits bulk workloads | Submit batch job | Deferred (D-11) |
Status: Not Implemented (Deferred)
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-7.3.1 | Side-by-side model comparison for same prompt | Compare endpoint | Deferred (D-13) |
Status: Not Implemented (Deferred — frontend-coupled)
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-7.4.1 | Interactive prompt testing UI | Access playground | Deferred |
Status: Complete
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-8.1.1 | `GET /metrics` returns valid Prometheus text format | Parse response | P0 |
| AC-8.1.2 | OpenMetrics format with exemplar support is available via content negotiation | `Accept: application/openmetrics-text` | P1 |
| AC-8.1.3 | Parsed metrics include p50, p95, p99 latency percentiles | `GET /api/metrics/parsed` | P0 |
| AC-8.1.4 | Real-time stats update within 60 seconds of new requests | `GET /api/monitoring/stats/realtime` | P1 |
| AC-8.1.5 | Error rates tracked per provider and per model | `GET /api/monitoring/error-rates` | P1 |
| AC-8.1.6 | Anomaly detection flags unusual patterns | `GET /api/monitoring/anomalies` | P1 |
| AC-8.1.7 | Grafana SimpleJSON datasource protocol fully implemented | `GET /prometheus/datasource` (test), `POST /search`, `/query` | P1 |
Status: Complete
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-8.2.1 | OpenTelemetry traces are initialized and exportable | `GET /api/instrumentation/health` | P0 |
| AC-8.2.2 | Every request gets a trace ID linking middleware → auth → routing → provider → billing | Inspect trace in Tempo | P1 |
| AC-8.2.3 | Exemplar linking from metrics to traces works | Verify in Grafana | P2 |
Status: Complete
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-8.3.1 | Autonomous error monitor status is retrievable | `GET /error-monitor/autonomous/status` | P0 |
| AC-8.3.2 | Dashboard provides error landscape overview | `GET /error-monitor/dashboard` | P0 |
| AC-8.3.3 | Recent errors sorted by recency | `GET /error-monitor/errors/recent` | P0 |
| AC-8.3.4 | Critical errors flagged separately | `GET /error-monitor/errors/critical` | P0 |
| AC-8.3.5 | Error patterns detect recurring issues | `GET /error-monitor/errors/patterns` | P1 |
| AC-8.3.6 | AI fix suggestions generated via Claude | `POST /error-monitor/fixes/generate-for-error` | P2 |
Note: All error monitor endpoints require NO auth (all public). Error patterns are in-memory only — lost on restart.
Status: Partial
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-8.4.1 | Arize Phoenix config exists and is functional | Check Arize initialization | P2 |
| AC-8.4.2 | OpenTelemetry captures inference metadata (model, tokens, latency) | Inspect trace attributes | P1 |
Known Issues: Arize Phoenix not exposed via API. Braintrust not integrated. No prompt/response pair recording.
Status: Complete
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-8.5.1 | Pyroscope profiling tags cache/Redis layers with operation context | Verify tag presence in Pyroscope | P1 |
| AC-8.5.2 | Profiling does not add measurable latency to requests | Compare request times with/without profiling | P1 |
Status: Partial
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-8.6.1 | User can view their own usage dashboard data | `GET /user/activity/stats`, `GET /user/monitor` | P0 |
| AC-8.6.2 | Model health status visible to users | `GET /v1/model-health` | P0 |
| AC-8.6.3 | Public status page with provider/model availability | `GET /v1/status/`, `GET /v1/status/providers` | P0 |
| AC-8.6.4 | Latency percentiles exposed to customers | `GET /user/latency?model=...` | P2 (NOT IMPLEMENTED) |
Status: Complete
What it does: POST /v1/chat/completions — full drop-in replacement. Streaming SSE, tool/function calling, JSON mode, logprobs. Any OpenAI SDK app works by changing base URL.
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-9.1.1 | Non-streaming returns 200 with `choices[0].message.content`, `usage.prompt_tokens`, `usage.completion_tokens` | POST with `stream: false` | P0 |
| AC-9.1.2 | Streaming returns SSE where each line starts with `data: `, ends with `data: [DONE]` | POST with `stream: true` | P0 |
| AC-9.1.3 | `response_format: {"type": "json_object"}` returns valid parseable JSON | POST with JSON mode | P0 |
| AC-9.1.4 | `tools` array returns `tool_calls` when model decides to call a tool | POST with tool definitions | P0 |
| AC-9.1.5 | `logprobs: true` returns a `logprobs` field | POST with logprobs | P1 |
| AC-9.1.6 | OpenAI Python SDK works with zero changes beyond `base_url` and `api_key` | `openai.OpenAI(base_url="$BASE/v1")` | P0 |
| AC-9.1.7 | All inference errors use OpenAI-compatible format: `{"error": {"message": "...", "type": "...", "code": "..."}}` | Trigger errors, inspect format | P1 (KNOWN ISSUE) |
| AC-9.1.8 | Unauthenticated request with whitelisted model returns 200 | POST without auth header | P0 |
| AC-9.1.9 | Unauthenticated request with non-whitelisted model returns 401/403 | POST without auth header | P0 |
| AC-9.1.10 | Streaming normalization handles OpenAI, Gemini, Anthropic, Fireworks formats | Test stream from each provider type | P0 |
| AC-9.1.11 | Unrecognized streaming format logs a warning (not silently dropped) | Check logs for dropped chunks | P1 (KNOWN BUG) |
Known Issues:
- P1-2: ~5% of errors use FastAPI default `{"detail": "..."}` instead of OpenAI format, which breaks SDK error handling
- P1-8: Stream normalizer returns `None` for unrecognized chunks (silently dropped, no warning)
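
One way to close the gap described in P1-2 is to wrap FastAPI-style bodies in the OpenAI error envelope required by AC-9.1.7. The sketch below is illustrative only: `to_openai_error` is a hypothetical helper, and the `type` and `code` values it picks are assumptions rather than the gateway's actual mapping.

```python
from typing import Any, Dict


def to_openai_error(body: Dict[str, Any], status_code: int) -> Dict[str, Any]:
    """Wrap a FastAPI-style {"detail": ...} body in the OpenAI error envelope."""
    if isinstance(body.get("error"), dict):
        return body  # already in OpenAI format, pass through unchanged
    detail = body.get("detail", "Unknown error")
    message = detail if isinstance(detail, str) else str(detail)
    # Assumed mapping: client errors vs. everything else.
    error_type = "invalid_request_error" if 400 <= status_code < 500 else "api_error"
    return {"error": {"message": message, "type": error_type, "code": str(status_code)}}


converted = to_openai_error({"detail": "Model not found"}, 404)
print(converted)
```
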
Status: Complete
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-9.2.1 | Non-streaming returns 200 with `content[0].text`, `usage.input_tokens`, `usage.output_tokens` in Anthropic format | `POST /v1/messages` | P0 |
| AC-9.2.2 | Streaming returns SSE in Anthropic format (`message_start`, `content_block_delta`, `message_stop`) | POST with `stream: true` | P0 |
| AC-9.2.3 | Credits deducted using Anthropic token counts | Compare balance before/after | P0 |
| AC-9.2.4 | Anthropic Python SDK works with zero changes beyond `base_url` and `api_key` | `anthropic.Anthropic(base_url="$BASE/v1")` | P0 |
Status: Not Implemented (Deferred)
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-10.1.1 | Requests routed to nearest provider region for lowest latency | Test from different regions | Deferred (D-15) |
Status: Not Implemented (Deferred)
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-10.2.1 | EU customers' requests routed to EU-based providers | Test with EU IP | Deferred (D-16) |
Status: Complete
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-10.3.1 | Vercel serverless deployment works via `api/index.py` | Deploy to Vercel | P0 |
| AC-10.3.2 | Railway/Docker deployment works via `start.sh` | Deploy to Railway | P0 |
| AC-10.3.3 | Dev server starts with `python src/main.py` or `uvicorn src.main:app --reload` | Start locally | P0 |
Status: Complete
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-CC.1.1 | `GET /api/stripe/credit-packages` returns available packages (public, no auth) | Inspect response | P0 |
| AC-CC.1.2 | `POST /api/stripe/checkout-session` returns valid Stripe checkout URL | Create session | P0 |
| AC-CC.1.3 | Successful payment webhook adds credits to user's balance | Simulate `payment_intent.succeeded` webhook | P0 |
| AC-CC.1.4 | Webhook endpoint ALWAYS returns 200, even if processing fails | Send malformed webhook | P0 |
| AC-CC.1.5 | Payment history is paginated with amount, date, status | `GET /api/stripe/payments` | P0 |
| AC-CC.1.6 | Subscription checkout creates Stripe subscription and assigns plan | `POST /api/stripe/subscription-checkout` | P0 |
| AC-CC.1.7 | Subscription upgrade/downgrade/cancel work | Test each operation | P1 |
| AC-CC.1.8 | Webhook handles all events: `payment_intent.succeeded`, `charge.succeeded`, `invoice.paid`, `customer.subscription.created` | Test each event type | P0 |
Status: Complete
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-CC.2.1 | Valid coupon redeems and adds correct credit amount | `POST /coupons/redeem` | P0 |
| AC-CC.2.2 | Expired coupon returns 400 | Redeem expired code | P0 |
| AC-CC.2.3 | Already-redeemed coupon (same user) returns 400 | Redeem twice | P0 |
| AC-CC.2.4 | User-specific coupon redeemed by wrong user returns 400/403 | Redeem with different user | P0 |
| AC-CC.2.5 | `GET /coupons/available` returns global + user-targeted coupons | Inspect response | P1 |
| AC-CC.2.6 | Redemption history shows past redemptions | `GET /coupons/history` | P1 |
Status: Complete
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-CC.3.1 | User generates unique referral code | `POST /referral/generate` | P0 |
| AC-CC.3.2 | Referral code validates successfully | `POST /referral/validate` | P0 |
| AC-CC.3.3 | Self-referral is prevented | Attempt self-referral | P0 |
| AC-CC.3.4 | Referral stats show total referred, conversions, rewards | `GET /referral/stats` | P1 |
| AC-CC.3.5 | Successful referral grants $10 credits to both parties on first $10+ purchase | Complete referral flow | P0 |
Status: Complete
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-CC.4.1 | Sessions can be created, listed, updated, deleted | CRUD on `/v1/chat/sessions/*` | P0 |
| AC-CC.4.2 | Messages can be saved individually and in batch | POST single and batch | P0 |
| AC-CC.4.3 | Full-text search returns matching sessions | `POST /v1/chat/search` | P0 |
| AC-CC.4.4 | Duplicate messages are deduplicated | Save same message twice, verify single entry | P1 |
| AC-CC.4.5 | Chat stats return accurate usage data | `GET /v1/chat/stats` | P1 |
| AC-CC.4.6 | Share links provide public read-only access | Create share, access without auth | P1 |
| AC-CC.4.7 | Feedback CRUD (create, read, update, delete) works per session | CRUD on `/v1/chat/feedback/*` | P1 |
Status: Complete
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-CC.5.1 | Created key is in `gw_{env}_*` format | `POST /user/api-keys` | P0 |
| AC-CC.5.2 | Key creation rate-limited to 10 per hour; 11th returns 429 | Create 11 keys | P0 |
| AC-CC.5.3 | Keys can be listed showing all active keys | `GET /user/api-keys` | P0 |
| AC-CC.5.4 | Keys can be updated (name, restrictions) | `PUT /user/api-keys/{key_id}` | P0 |
| AC-CC.5.5 | Keys can be deleted | `DELETE /user/api-keys/{key_id}` | P0 |
| AC-CC.5.6 | Deleted key no longer authenticates (returns 401) | Use deleted key | P0 |
| AC-CC.5.7 | Audit logs record key creation, usage, deletion | `GET /user/api-keys/audit-logs` | P1 |
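
AC-CC.5.2 (10 key creations per hour, the 11th rejected) can be sketched with a sliding-window counter. The source does not say which limiting algorithm is used, so the sliding window, the `KeyCreationLimiter` class, and its in-memory storage are all assumptions made for illustration.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class KeyCreationLimiter:
    """Allow at most max_keys creations per user within a rolling window."""

    def __init__(self, max_keys: int = 10, window_seconds: float = 3600.0) -> None:
        self.max_keys = max_keys
        self.window_seconds = window_seconds
        self._events: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, user_id: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        events = self._events[user_id]
        # Drop creations that have aged out of the window.
        while events and now - events[0] >= self.window_seconds:
            events.popleft()
        if len(events) >= self.max_keys:
            return False  # the caller would respond with HTTP 429
        events.append(now)
        return True


limiter = KeyCreationLimiter()
results = [limiter.allow("user-1", now=0.0) for _ in range(11)]
print(results.count(True), results[-1])
```
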
Status: Complete
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-CC.6.1 | `POST /v1/images/generations` returns 200 with image data or URL | POST with prompt | P0 |
| AC-CC.6.2 | Credits deducted based on image generation pricing | Compare balance before/after | P0 |
| AC-CC.6.3 | 0-credit user receives 402 | POST with 0-credit user | P0 |
Status: Complete
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-CC.7.1 | File upload transcription returns 200 with text | POST with audio file | P0 |
| AC-CC.7.2 | Base64 transcription returns 200 | `POST /v1/audio/transcriptions/base64` | P0 |
| AC-CC.7.3 | Unsupported format returns appropriate error | POST with invalid format | P1 |
Status: Complete
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-CC.8.1 | `GET /v1/tools` returns available tools (web_search, text_to_speech) | Inspect response | P0 |
| AC-CC.8.2 | Tool definitions in OpenAI function-calling format | `GET /v1/tools/definitions` | P0 |
| AC-CC.8.3 | Nonexistent tool returns 404 | `GET /v1/tools/fake_tool` | P0 |
| AC-CC.8.4 | Web search execution returns results | `POST /v1/tools/execute` with web_search | P0 |
| AC-CC.8.5 | SSRF protection blocks internal/private IP ranges | Attempt internal URL in tool execution | P0 |
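
A test for AC-CC.8.5 needs a clear notion of which targets count as internal. The sketch below uses Python's standard `ipaddress` module to classify literal IP hosts; `is_blocked_host` is a hypothetical helper, not the gateway's implementation. It deliberately does not resolve DNS names, which a production check would also need to do before connecting.

```python
import ipaddress
from urllib.parse import urlparse


def is_blocked_host(url: str) -> bool:
    """Return True when the URL targets a loopback, private, or otherwise internal address."""
    host = urlparse(url).hostname
    if host is None:
        return True  # unparseable target: refuse rather than guess
    if host.lower() == "localhost":
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name. This sketch lets it through; a real check must resolve it
        # and apply the same tests to every resolved address.
        return False
    return (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_reserved
        or ip.is_multicast
        or ip.is_unspecified
    )


for candidate in ("http://127.0.0.1/admin", "http://169.254.169.254/latest/meta-data", "https://8.8.8.8/"):
    print(candidate, is_blocked_host(candidate))
```
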
Status: Complete
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-CC.9.1 | Partner config is publicly accessible | `GET /partner-trials/config/{code}` | P0 |
| AC-CC.9.2 | Partner code check always returns 200 (valid/invalid in body) | `GET /partner-trials/check/{code}` | P0 |
| AC-CC.9.3 | Starting partner trial applies partner-specific credits and limits | `POST /partner-trials/start` with known partner code | P0 |
| AC-CC.9.4 | Partner trial daily limit is enforced | Exceed daily limit | P0 |
| AC-CC.9.5 | Partner trial config is cached (5-min in-memory) | Check timing | P1 |
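
The "check timing" verification for AC-CC.9.5 is easier to reason about against a concrete model of a 5-minute in-memory cache. The following is a minimal sketch: `TTLCache` and its injectable clock are hypothetical and only illustrate the expiry behaviour a test would look for.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """In-memory cache whose entries expire ttl_seconds after they were loaded."""

    def __init__(self, ttl_seconds: float = 300.0, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get_or_load(self, key: str, loader: Callable[[str], Any]) -> Any:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: serve from memory
        value = loader(key)  # missing or expired: reload and restamp
        self._store[key] = (now, value)
        return value


# A fake clock makes the expiry observable without waiting five minutes.
fake_now = [0.0]
load_calls = []


def load_partner_config(code: str) -> dict:
    load_calls.append(code)
    return {"code": code}


cache = TTLCache(ttl_seconds=300.0, clock=lambda: fake_now[0])
cache.get_or_load("partner-a", load_partner_config)
cache.get_or_load("partner-a", load_partner_config)  # served from cache
fake_now[0] = 301.0
cache.get_or_load("partner-a", load_partner_config)  # expired, reloaded
print(len(load_calls))
```
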
Status: Complete (partial test coverage)
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-CC.10.1 | User can retrieve notification preferences | `GET /user/notifications/preferences` | P0 |
| AC-CC.10.2 | Usage report can be triggered on demand | `POST /user/notifications/send-usage-report` | P0 |
| AC-CC.10.3 | Test notification sends successfully | `POST /user/notifications/test` | P0 |
| AC-CC.10.4 | Notification failure does not crash the system | Disable Resend, verify graceful handling | P0 |
| AC-CC.10.5 | Retry logic on notification delivery failure | Verify 2-3 retries with backoff | P2 (NOT IMPLEMENTED) |
Known Issues:
- P2-4 (Delta Report): No retry logic, no persistent delivery tracking. On failure: logs error, returns False, continues silently.
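
AC-CC.10.5 is not implemented, so the following is only a sketch of what "retries with backoff" could look like while still honouring AC-CC.10.4 (a delivery failure must not crash the system). `send_with_retry`, its defaults, and the injectable `sleep` are hypothetical.

```python
import time
from typing import Callable, List


def send_with_retry(
    send: Callable[[], bool],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call send() up to max_attempts times with exponential backoff; never raise."""
    for attempt in range(1, max_attempts + 1):
        try:
            if send():
                return True
        except Exception:
            pass  # count as a failed attempt and keep the caller alive
        if attempt < max_attempts:
            sleep(base_delay * 2 ** (attempt - 1))  # 0.5s, 1.0s, ...
    return False


# Simulated flaky sender: fails twice, then succeeds.
outcomes = iter([False, False, True])
delays: List[float] = []
delivered = send_with_retry(lambda: next(outcomes), sleep=delays.append)
print(delivered, delays)
```
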
Status: Complete
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-CC.11.1 | Non-admin users receive 403 on ALL admin endpoints | Use user key on admin endpoint | P0 |
| AC-CC.11.2 | Admin can list, search, view user details | `GET /admin/users`, `/admin/users/{id}` | P0 |
| AC-CC.11.3 | Admin credit grants respect per-transaction cap and 24h daily limit | Exceed limits | P0 |
| AC-CC.11.4 | Admin can assign plans | `POST /admin/assign-plan` | P0 |
| AC-CC.11.5 | System monitor returns user counts, credit totals, API usage | `GET /admin/monitor` | P0 |
| AC-CC.11.6 | Cache operations work (status, refresh, clear) | GET/POST cache endpoints | P1 |
| AC-CC.11.7 | Model sync can be triggered | `POST /admin/model-sync/trigger` | P1 |
| AC-CC.11.8 | `GET /admin/model-sync/providers` requires admin auth | Verify auth enforcement | P0 (KNOWN RISK) |
| AC-CC.11.9 | Bulk user delete by domain respects protected domains (gmail, yahoo, outlook) | Attempt protected domain delete | P0 |
| AC-CC.11.10 | Bulk user delete defaults to dry_run=true | Verify default behavior | P0 |
Known Issues:
- P0-6 (Delta Report): `GET /admin/model-sync/providers` documented as "No auth enforced"; this leaks infrastructure details (33 providers).
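
The two bulk-delete safeguards in AC-CC.11.9 and AC-CC.11.10 can be sketched together. Everything here is illustrative: the function name, the in-memory user list, and the exact protected domain strings (the criteria only say gmail, yahoo, outlook, so the `.com` suffixes are an assumption).

```python
from typing import Any, Dict, List

# Assumed spellings of the protected domains named in AC-CC.11.9.
PROTECTED_DOMAINS = frozenset({"gmail.com", "yahoo.com", "outlook.com"})


def bulk_delete_by_domain(users: List[Dict[str, Any]], domain: str, dry_run: bool = True) -> Dict[str, Any]:
    """Delete users by email domain; refuses protected domains and defaults to a dry run."""
    domain = domain.strip().lower()
    if domain in PROTECTED_DOMAINS:
        raise PermissionError(f"refusing bulk delete for protected domain: {domain}")
    matches = [u for u in users if u["email"].lower().endswith("@" + domain)]
    if not dry_run:
        for user in matches:
            users.remove(user)
    return {"dry_run": dry_run, "matched": len(matches), "deleted": 0 if dry_run else len(matches)}


users = [
    {"email": "a@spam.example"},
    {"email": "b@spam.example"},
    {"email": "c@gmail.com"},
]
preview = bulk_delete_by_domain(users, "spam.example")  # dry run by default
print(preview, len(users))
```
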
Status: Complete
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-CC.12.1 | API keys are Fernet-encrypted in DB | Query DB directly | P0 |
| AC-CC.12.2 | API key lookup uses HMAC hash, not brute-force decryption | Verify code path | P0 |
| AC-CC.12.3 | SQL injection attempts are sanitized/rejected | `'; DROP TABLE users; --` in inputs | P0 |
| AC-CC.12.4 | XSS payloads are sanitized/rejected | `<script>alert(1)</script>` in inputs | P0 |
| AC-CC.12.5 | Command injection blocked | `; rm -rf /` in inputs | P0 |
| AC-CC.12.6 | Path traversal blocked | `../../etc/passwd` in inputs | P0 |
| AC-CC.12.7 | Error messages never expose stack traces, internal paths, or sensitive data | Trigger errors, inspect responses | P0 |
| AC-CC.12.8 | Admin security violations logged in audit trail | Attempt unauthorized admin access | P0 |
| AC-CC.12.9 | Temporary/disposable email domains detected during registration | Register with `user@tempmail.com` | P1 |
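
AC-CC.12.2 distinguishes hash-based lookup from decrypting every stored key. A minimal sketch of the lookup side, using only the standard library, is shown below. The secret, the sample key, and the dictionary index are hypothetical, and the Fernet encryption of the stored key (AC-CC.12.1) is left out because it needs the third-party `cryptography` package.

```python
import hashlib
import hmac
from typing import Any, Dict, Optional

# Hypothetical server-side secret; a real deployment would load it from configuration.
LOOKUP_SECRET = b"example-lookup-secret"


def key_lookup_hash(api_key: str, secret: bytes = LOOKUP_SECRET) -> str:
    """Deterministic HMAC-SHA256 digest used as the lookup index for a key."""
    return hmac.new(secret, api_key.encode("utf-8"), hashlib.sha256).hexdigest()


# Stand-in for an indexed DB column: digest -> key record.
key_index: Dict[str, Dict[str, Any]] = {
    key_lookup_hash("gw_live_example123"): {"user_id": 1, "is_active": True},
}


def find_key(presented_key: str) -> Optional[Dict[str, Any]]:
    """Hash the presented key once and look it up; no stored key is decrypted."""
    return key_index.get(key_lookup_hash(presented_key))


print(find_key("gw_live_example123"), find_key("gw_live_wrong"))
```
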
Status: Partial
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-CC.13.1 | REST path function calling works (OpenAI tools → Vertex functionDeclarations) | POST with tools to Vertex model via REST | P0 |
| AC-CC.13.2 | SDK path function calling either works OR is avoided when tools present | POST with tools via SDK path | P1 (KNOWN BUG) |
| AC-CC.13.3 | Tool choice options (auto, required, none) are translated correctly | Test each tool_choice value | P1 |
Code References:
- `src/services/google_vertex_client.py` (lines 250-402, 662-707): REST path implemented
  - Lines 585-587: SDK path has TODO: "Function calling may not work correctly"
Known Issues:
- P1-5 (Delta Report): SDK path has TODO. If SDK path is used when tools are present, function calling silently fails.
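
The translation that AC-CC.13.1 and AC-CC.13.3 exercise can be sketched as two small pure functions. This reflects the publicly documented request shapes as best understood here (OpenAI `tools` entries with a `function` object, Vertex `functionDeclarations` and `functionCallingConfig.mode`); the function names are hypothetical and the code in `google_vertex_client.py` may differ.

```python
from typing import Any, Dict, List


def openai_tools_to_vertex(tools: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Map OpenAI-style tool definitions to a Vertex-style functionDeclarations list."""
    declarations = []
    for tool in tools:
        if tool.get("type") != "function":
            continue  # only function tools are translated in this sketch
        fn = tool["function"]
        declaration = {"name": fn["name"], "description": fn.get("description", "")}
        if "parameters" in fn:
            declaration["parameters"] = fn["parameters"]
        declarations.append(declaration)
    return [{"functionDeclarations": declarations}] if declarations else []


# Assumed mapping of OpenAI tool_choice strings to Vertex calling modes.
_TOOL_CHOICE_MODES = {"auto": "AUTO", "required": "ANY", "none": "NONE"}


def tool_choice_to_vertex(tool_choice: str) -> Dict[str, Any]:
    return {"functionCallingConfig": {"mode": _TOOL_CHOICE_MODES[tool_choice]}}


openai_tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather",
        "parameters": {"type": "object", "properties": {"city": {"type": "string"}}},
    },
}]
vertex_tools = openai_tools_to_vertex(openai_tools)
print(vertex_tools[0]["functionDeclarations"][0]["name"], tool_choice_to_vertex("required"))
```
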
| Layer | Feature | Criteria | Status | Known Issues |
|---|---|---|---|---|
| 1 | API Key Auth | 10 | Complete | — |
| 1 | RBAC | 6 | Complete | — |
| 1 | IP Allowlists | 6 | Complete | — |
| 1 | Domain Restrictions | 4 | Complete | — |
| 1 | Three-Layer Rate Limiting | 14 | Complete | P0-5: Missing headers on L2/L3 |
| 1 | Input Guardrails (4 features) | 4 | Not Implemented | Deferred |
| 1 | Output Guardrails (3 features) | 3 | Not Implemented | Deferred |
| 2 | Model Resolution | 7 | Complete | — |
| 2 | General Router | 7 | Complete | — |
| 2 | Code Router | 7 | Complete | — |
| 2 | Provider Failover | 9 | Complete | — |
| 2 | Circuit Breakers | 10 | Complete | P1-7: Timing discrepancy (60s vs 5min) |
| 2 | Health-Weighted LB | 2 | Partial | — |
| 2 | Latency/Cost Optimal | 2 | Partial | Hardcoded latency model |
| 2 | Traffic Splitting | 1 | Not Implemented | Deferred |
| 3 | Tiered Health Monitoring | 10 | Complete | — |
| 3 | Passive Health Capture | 2 | Complete | — |
| 3 | Incident Management | 5 | Complete | — |
| 3 | Model Quality Scoring | 2 | Partial | Static/hardcoded |
| 3 | Per-Customer Quality | 1 | Not Implemented | Deferred |
| 3 | Provider Credit Monitoring | 3 | Partial | P1-1: OpenRouter only |
| 4 | Semantic Cache | 1 | Not Implemented | Deferred |
| 4 | Exact-Match Cache | 1 | Not Implemented | Deferred (infra exists) |
| 4 | Butter.dev Cache | 2 | Partial | P0-1: Ghost feature |
| 4 | Supporting Caches | 5 | Complete | — |
| 5 | Background Model Sync | 4 | Complete | — |
| 5 | Model Metadata Standard | 2 | Complete | — |
| 5 | Catalog Inclusion | 3 | Partial | P1-3: No gating at sync |
| 5 | HuggingFace Enrichment | 2 | Complete | — |
| 5 | Model Discovery & Search | 5 | Complete | — |
| 6 | Credit System | 14 | Complete | P0-2/3/4: Atomicity, pricing guard, refund |
| 6 | Plans & Tiers | 9 | Complete | P0-7: Config mismatch |
| 6 | Customer Usage Analytics | 5 | Partial | P1-4: Pagination bug, P2-1/2: Per-key, export |
| 6 | Customer Webhooks | 1 | Not Implemented | Deferred |
| 6 | SLA Tracking | 1 | Not Implemented | Deferred |
| 7 | Prompt Management | 1 | Not Implemented | Deferred |
| 7 | Batch/Async Inference | 1 | Not Implemented | Deferred |
| 7 | Evaluation & Testing | 1 | Not Implemented | Deferred |
| 7 | Playground | 1 | Not Implemented | Deferred |
| 8 | Metrics & Dashboards | 7 | Complete | — |
| 8 | Distributed Tracing | 3 | Complete | — |
| 8 | Error Tracking | 6 | Complete | — |
| 8 | AI-Specific Tracing | 2 | Partial | Arize/Braintrust gaps |
| 8 | Profiling | 2 | Complete | — |
| 8 | Customer Observability | 4 | Partial | P2-3: No latency API |
| 9 | OpenAI-Compatible API | 11 | Complete | P1-2: Error format, P1-8: Stream drops |
| 9 | Anthropic-Compatible API | 4 | Complete | — |
| 10 | Multi-Region Routing | 1 | Not Implemented | Deferred |
| 10 | Data Residency | 1 | Not Implemented | Deferred |
| 10 | Multi-Target Deployment | 3 | Complete | — |
| CC | Stripe Payments | 8 | Complete | — |
| CC | Coupons | 6 | Complete | — |
| CC | Referrals | 5 | Complete | — |
| CC | Chat History | 7 | Complete | — |
| CC | API Key Management | 7 | Complete | — |
| CC | Image Generation | 3 | Complete | — |
| CC | Audio Transcription | 3 | Complete | — |
| CC | Server-Side Tools | 5 | Complete | — |
| CC | Partner Trials | 5 | Complete | — |
| CC | Notifications | 5 | Complete | P2-4: No retry/delivery tracking |
| CC | Admin Operations | 10 | Complete | P0-6: Model-sync providers auth |
| CC | Security | 9 | Complete | — |
| CC | Google Vertex FC | 3 | Partial | P1-5: SDK path TODO |
| TOTAL | | 323 | | |
| Priority | Count | Description |
|---|---|---|
| P0 | 7 bugs across 46 criteria | Ghost features, billing atomicity, pricing guard, refund verification, rate limit headers, admin auth, trial config |
| P1 | 8 bugs across 28 criteria | Provider monitoring, error format, catalog gating, pagination, Vertex FC, overage, circuit breaker timing, stream normalization |
| P2 | 4 gaps across 12 criteria | Per-key usage, export, latency API, notification delivery |
| Deferred | 20 features, 24 criteria | Guardrails, caching, webhooks, batch, prompts, eval, SLA, geo-routing, GDPR, traffic splitting |
Source: Conceptual Model Features | Features | Delta Report | Testing Plan | Acceptance Criteria