Features Acceptance Criteria
Detailed acceptance criteria for every Gatewayz feature, organized by the Conceptual Model's 10-layer architecture.
Each feature includes: what it must do, what it must NOT do, detailed acceptance criteria with verification methods, code-level references, known issues, and implementation status.
Derived from: Conceptual Model Features, Features, Delta Report, Testing Plan, Test Coverage Audit
Last Updated: 2026-03-09 | Version: 2.0.4
Each feature section includes:
- Description: What the feature does and its boundaries (from Conceptual Model)
- Implementation Status: Current state (Complete / Partial / Not Implemented)
- Acceptance Criteria: Numbered, testable statements — a feature is accepted when ALL criteria pass
- Code References: File paths and line numbers for verification
- Known Issues: Bugs, gaps, or discrepancies found during code investigation
- Priority: P0 (must fix before release), P1 (should fix), P2 (nice to have), Deferred (post-release)
Status: Complete
What it does: Authenticates every API request using API keys encrypted at rest with Fernet AES-128. Keys are looked up via HMAC-SHA256 hash for O(log n) retrieval. Validates that keys are active, not expired, and not rate-limited.
What it does NOT do: No OAuth/JWT for API requests. No automatic key rotation. No multi-key auth per request.
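A minimal sketch of the key format and hash-based lookup, using only the Python standard library. `HASH_SECRET`, `key_table`, and the helper names are hypothetical. Fernet encryption of the stored key (from the `cryptography` package) is omitted. The point is that a deterministic HMAC digest supports an indexed equality lookup, so authentication never has to decrypt every stored key.

```python
import hashlib
import hmac
import re
import secrets

# Hypothetical server-side secret used only for lookup hashing.
HASH_SECRET = b"replace-with-a-real-secret"

# gw_{env}_{43 random chars}: token_urlsafe(32) encodes 32 random bytes
# as 43 URL-safe base64 characters (padding stripped).
KEY_PATTERN = re.compile(r"^gw_[a-z]+_[A-Za-z0-9_-]{43}$")


def generate_api_key(env: str = "live") -> str:
    """Create a key such as gw_live_<43 chars>."""
    return f"gw_{env}_{secrets.token_urlsafe(32)}"


def lookup_hash(api_key: str) -> str:
    """Deterministic HMAC-SHA256 hex digest for an indexed key_hash column.

    The same key always yields the same digest, so the row can be found with
    one indexed equality lookup instead of decrypting stored ciphertexts.
    """
    return hmac.new(HASH_SECRET, api_key.encode("utf-8"), hashlib.sha256).hexdigest()


def last4(api_key: str) -> str:
    """Display suffix stored alongside the key (AC-1.1.8)."""
    return api_key[-4:]


# In-memory stand-in for the api_keys table, keyed by the lookup hash.
key_table: dict = {}


def store_key(api_key: str) -> None:
    key_table[lookup_hash(api_key)] = {"last4": last4(api_key), "is_active": True}


def find_key(api_key: str):
    """Return the row for an active key, or None (the caller responds 401)."""
    row = key_table.get(lookup_hash(api_key))
    return row if row and row["is_active"] else None
```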
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-1.1.1 | Valid API key in `Authorization: Bearer gw_*` header returns 200 | Send request with valid key | P0 |
| AC-1.1.2 | Invalid API key returns 401, never 200 or 500 | Send request with `Bearer invalid_key` | P0 |
| AC-1.1.3 | Expired API key returns 401 with clear message | Use a key past its `expires_at` | P0 |
| AC-1.1.4 | Deactivated API key (`is_active=false`) returns 401 | Deactivate key, then use it | P0 |
| AC-1.1.5 | API keys in DB are Fernet-encrypted ciphertext, never plaintext | Query `api_keys_new` table directly, verify `encrypted_key` column is ciphertext | P0 |
| AC-1.1.6 | Key lookup uses HMAC-SHA256 hash index, not brute-force decryption of all keys | Verify `key_hash` column is indexed; confirm lookup is O(log n) by timing with 1 key vs 1000 keys | P0 |
| AC-1.1.7 | Key format is `gw_{env}_{43_random_chars}` (e.g., `gw_live_abc123...`) | Create new key, verify format regex | P0 |
| AC-1.1.8 | Key creation stores `last4` characters for user-friendly identification | Create key, check `last4` field in response and DB | P1 |
| AC-1.1.9 | Authentication is cached (5-min TTL, 512-entry LRU); a second request with the same key is faster | Time two consecutive auth calls; the second should be <5ms vs 50-150ms | P1 |
| AC-1.1.10 | When Redis is down, auth cache falls back to local memory; requests are never blocked | Stop Redis, verify auth still works | P0 |
Code References:
- `src/security/security.py`: Fernet encryption, HMAC hashing
- `src/security/deps.py`: `get_api_key()`, `get_current_user()`, `validate_api_key_security()`
- `src/db/api_keys.py`: Key CRUD, key lookup by hash
Status: Complete
What it does: Assigns roles (admin, team, dev, free) to users. Permissions checked at dependency-injection level before route handlers execute. Role changes are audit-logged.
What it does NOT do: No granular per-model permissions. No custom roles. No team-level RBAC. No provider-level permissions.
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-1.2.1 | Non-admin API key returns 403 on ALL `/admin/*` endpoints | `GET /admin/users` with user key | P0 |
| AC-1.2.2 | Admin API key returns 200 on admin endpoints | `GET /admin/users` with admin key | P0 |
| AC-1.2.3 | Unauthorized admin access attempts are logged via `audit_logger.log_security_violation("UNAUTHORIZED_ADMIN_ACCESS")` | Attempt admin access with user key, check audit log | P0 |
| AC-1.2.4 | Role updates require admin auth and are logged with a reason | `POST /admin/roles/update` with `user_id`, `new_role`, `reason` | P0 |
| AC-1.2.5 | `GET /admin/roles/permissions/{role}` returns the correct permission set for each role | Check all 4 roles | P1 |
| AC-1.2.6 | Role change audit log is retrievable at `GET /admin/roles/audit/log` | Verify entries with timestamps and reasons | P1 |
Code References:
- `src/security/deps.py`: `require_admin` dependency
- `src/routes/admin.py`: Admin route handlers
- `src/db/roles.py`: Role management
Status: Complete
What it does: Restricts API key usage to specific IP addresses or CIDR ranges. Requests from non-allowlisted IPs are rejected before processing.
What it does NOT do: No geo-based restrictions. No IPv6 ranges. No automatic IP suggestions.
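A minimal sketch of the allowlist check using Python's standard `ipaddress` module. The function names are hypothetical. Treating an empty allowlist as "no restriction" and rejecting IPv6 entries outright (one reading of "No IPv6 ranges") are both assumptions.

```python
import ipaddress


def parse_allowlist(entries: list) -> list:
    """Parse single IPv4 addresses and CIDR ranges into networks.

    A bare address such as "1.2.3.4" becomes a /32 network.
    """
    networks = []
    for entry in entries:
        # strict=False tolerates host bits set, e.g. "10.0.0.5/24".
        net = ipaddress.ip_network(entry, strict=False)
        if net.version != 4:
            raise ValueError(f"IPv6 entries are not supported: {entry}")
        networks.append(net)
    return networks


def is_ip_allowed(client_ip: str, networks: list) -> bool:
    """True if the key has no allowlist or the client IP is inside an entry."""
    if not networks:
        return True  # assumption: empty allowlist means unrestricted
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in networks)
```

A `False` result corresponds to the 403 in AC-1.3.3.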
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-1.3.1 | Admin can create IP allowlist entries with IPv4 addresses | `POST /api/admin/ip-whitelist` with `{"ip": "1.2.3.4"}` | P0 |
| AC-1.3.2 | Admin can create IP allowlist entries with CIDR notation | `POST /api/admin/ip-whitelist` with `{"ip": "10.0.0.0/24"}` | P0 |
| AC-1.3.3 | API key with allowlist rejects requests from non-allowed IPs with 403 | Use key from IP not in allowlist | P0 |
| AC-1.3.4 | API key with allowlist accepts requests from allowed IPs | Use key from allowlisted IP | P0 |
| AC-1.3.5 | `POST /api/admin/ip-whitelist/check` correctly reports allowed vs blocked IPs | Test with both allowed and blocked IPs | P1 |
| AC-1.3.6 | Allowlist entries can be listed, updated, and deleted | CRUD operations on `/api/admin/ip-whitelist/*` | P1 |
Code References:
- `src/routes/admin.py`: IP allowlist endpoints
- `src/security/deps.py`: IP validation in `validate_api_key_security()`
Status: Complete
What it does: Limits which HTTP referrer domains can use a specific API key. Prevents stolen keys from being used on unauthorized domains.
What it does NOT do: No domain ownership validation. No subdomain wildcards. No server-side restriction (only applies when Referer header present).
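A sketch of the Referer rule as described: no Referer means server-side usage and bypasses the check, and there is no subdomain wildcard matching, so the host must match exactly. The function name and the use of `urllib.parse` are assumptions, not the code in `validate_api_key_security()`.

```python
from typing import Optional
from urllib.parse import urlparse


def referer_allowed(referer: Optional[str], allowed_domains: set) -> bool:
    """Apply a per-key domain restriction to the Referer header.

    - No restriction configured: allow.
    - No Referer header: allow (treated as server-side usage, AC-1.4.3).
    - Otherwise the Referer host must exactly equal a configured domain.
    """
    if not allowed_domains or not referer:
        return True
    host = (urlparse(referer).hostname or "").lower()
    return host in {domain.lower() for domain in allowed_domains}
```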
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-1.4.1 | API key with domain restriction rejects requests with wrong Referer header | Send request with `Referer: https://unauthorized.com` | P0 |
| AC-1.4.2 | API key with domain restriction accepts requests with correct Referer | Send request with configured Referer domain | P0 |
| AC-1.4.3 | Requests without Referer header bypass domain restriction (server-side usage) | Send request without Referer header | P0 |
| AC-1.4.4 | Multiple domains can be configured per key | Configure 3 domains, verify all work | P1 |
Code References:
- `src/security/deps.py`: `validate_api_key_security()` domain check
Status: Complete (with known header gap on Layers 2 and 3)
What it does:
- Layer 1 (IP): Security middleware with behavioral analysis and velocity detection. 300 RPM limit for unauthenticated requests; authenticated users are exempt.
- Layer 2 (API Key): Redis-backed per-key limits tied to plan tier.
- Layer 3 (Anonymous): Stricter limits for unauthenticated requests.
- Fallback: In-memory rate limiter when Redis is unavailable.
What it does NOT do: No per-model rate limits. No burst/token-bucket. No cross-instance IP state sharing. Rejected requests consume zero credits.
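A minimal in-memory sketch of the Layer 1 behavior: a fixed one-minute window per IP (consistent with "No burst/token-bucket") plus velocity mode. The class, the fixed-window choice, and the `update_velocity_mode()` hook are assumptions. The 300 RPM limit, 25% error threshold, 50% reduction, and 3-minute cooldown come from AC-1.5.1, AC-1.5.9, and AC-1.5.10.

```python
import time
from collections import defaultdict
from typing import Callable


class InMemoryRateLimiter:
    """Fixed one-minute window counter per client key (e.g. an IP address).

    Sketch only: old windows are never evicted, and state is local to one
    process (the doc notes no cross-instance IP state sharing).
    """

    def __init__(self, limit_per_minute: int = 300,
                 clock: Callable[[], float] = time.time) -> None:
        self.limit = limit_per_minute
        self.clock = clock
        self.velocity_mode = False
        self._last_breach = 0.0
        self._counts = defaultdict(int)

    def effective_limit(self) -> int:
        # Velocity mode reduces limits to 50%.
        return self.limit // 2 if self.velocity_mode else self.limit

    def allow(self, client_key: str) -> bool:
        bucket = (client_key, int(self.clock() // 60))
        if self._counts[bucket] >= self.effective_limit():
            return False  # caller responds 429
        self._counts[bucket] += 1
        return True

    def update_velocity_mode(self, error_rate: float) -> None:
        now = self.clock()
        if error_rate > 0.25:
            # Activate (or stay active) while the error rate exceeds 25%.
            self.velocity_mode = True
            self._last_breach = now
        elif self.velocity_mode and now - self._last_breach >= 180:
            # Deactivate after 3 minutes without a breach.
            self.velocity_mode = False
```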
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-1.5.1 | Unauthenticated requests exceeding 300 RPM from same IP receive 429 | Send 301 requests from one IP | P0 |
| AC-1.5.2 | Authenticated users are exempt from IP-level rate limiting | Verify no IP block on auth requests | P0 |
| AC-1.5.3 | API key exceeding plan RPM receives 429 | Exceed per-key limit | P0 |
| AC-1.5.4 | Anonymous rate limits are stricter than authenticated limits | Compare thresholds for anon vs auth | P0 |
| AC-1.5.5 | Layer 1 429 response includes `Retry-After`, `X-RateLimit-Limit`, `X-RateLimit-Remaining`, `X-RateLimit-Reset`, `X-RateLimit-Reason`, `X-RateLimit-Mode` | Trigger Layer 1 429, inspect headers | P0 |
| AC-1.5.6 | Layer 2 429 response includes `Retry-After`, `X-RateLimit-Limit`, `X-RateLimit-Remaining`, `X-RateLimit-Reset` | Trigger Layer 2 429, inspect headers | P0 (KNOWN BUG) |
| AC-1.5.7 | Layer 3 429 response includes `Retry-After` and `X-RateLimit-*` headers | Trigger Layer 3 429, inspect headers | P0 (KNOWN BUG) |
| AC-1.5.8 | When Redis is down, rate limiting continues via in-memory fallback; requests are never blocked because of the outage | Stop Redis, verify rate limiting works | P0 |
| AC-1.5.9 | Velocity mode activates when error rate exceeds 25% and reduces limits to 50% | Trigger >25% error rate, check `GET /velocity-mode-status` | P0 |
| AC-1.5.10 | Velocity mode deactivates after 3 minutes of normal error rates | Wait for cooldown, verify normal limits restored | P1 |
| AC-1.5.11 | Rate limit configuration viewable at `GET /user/rate-limits` | Check response format | P1 |
| AC-1.5.12 | Per-key rate limits updatable via `PUT /user/rate-limits/{key_id}` | Update and verify enforcement | P1 |
| AC-1.5.13 | Auth endpoint rate-limits to 10 requests per 15 minutes per IP | `POST /auth` 11 times; 11th returns 429 | P0 |
| AC-1.5.14 | Registration rate-limits to 3 requests per hour per IP | `POST /auth/register` 4 times; 4th returns 429 | P0 |
Code References:
- `src/middleware/security_middleware.py` (lines 647-716): Layer 1, headers present
- `src/services/rate_limiting.py` (lines 78-94): Layer 2; the `RateLimitResult` dataclass has the fields, but they are NOT converted to HTTP headers
- `src/services/anonymous_rate_limiter.py`: Layer 3, NO headers
- `src/services/rate_limiting_fallback.py`: In-memory fallback
Known Issues:
- P0-5 (Delta Report): Layer 2 `RateLimitResult` fields exist but are not converted to HTTP response headers. Layer 3 has no rate limit headers at all. Clients get bare 429 rejections with no retry information.
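P0-5 comes down to a missing conversion step. Below is a sketch of that step. It assumes a `RateLimitResult` with `allowed`, `limit`, `remaining`, and `reset_at` (Unix seconds) fields; the real dataclass in `src/services/rate_limiting.py` may name them differently. If the service uses FastAPI (an assumption based on its dependency-injection layout), the returned dict can be passed as the `headers` argument of `HTTPException(status_code=429, ...)`.

```python
import math
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class RateLimitResult:
    """Hypothetical stand-in for the Layer 2 result object."""
    allowed: bool
    limit: int
    remaining: int
    reset_at: float  # Unix timestamp when the current window resets


def rate_limit_headers(result: RateLimitResult, now: Optional[float] = None) -> dict:
    """Build X-RateLimit-* headers, plus Retry-After when the request is rejected."""
    now = time.time() if now is None else now
    headers = {
        "X-RateLimit-Limit": str(result.limit),
        "X-RateLimit-Remaining": str(max(result.remaining, 0)),
        "X-RateLimit-Reset": str(int(result.reset_at)),
    }
    if not result.allowed:
        # Retry-After takes whole seconds; round up so clients never retry early.
        headers["Retry-After"] = str(max(math.ceil(result.reset_at - now), 0))
    return headers
```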
Status: Not Implemented (Deferred)
What these would do: PII scanning (phone, SSN, email, credit card), prompt injection pattern detection, per-key topic restrictions, content moderation via external classifiers.
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-1.6.1 | PII detection scans prompts for phone numbers, SSNs, emails, credit card numbers | Send prompt with PII | Deferred |
| AC-1.7.1 | Prompt injection patterns that attempt to override system prompts are detected and blocked | Send known injection pattern | Deferred |
| AC-1.8.1 | Per-API-key topic restrictions limit responses to configured domains | Configure restriction, test out-of-domain | Deferred |
| AC-1.9.1 | Content moderation blocks harmful inputs before reaching providers | Send harmful content | Deferred |
Note: These are Conceptual Model features (D-1 through D-4 in Delta Report). Not required for stable release. No code exists.
Status: Not Implemented (Deferred)
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-1.10.1 | Output content filtering scans responses for policy violations before returning | Trigger policy-violating response | Deferred |
| AC-1.11.1 | Structured output validation confirms JSON schema conformance when requested | Request JSON schema output | Deferred (D-5, Small effort) |
| AC-1.12.1 | Provider-side safety metadata (refusals, safety triggers) is surfaced in standardized format | Trigger safety filter, inspect response | Deferred |
Status: Complete
What it does: Three-stage pipeline: alias normalization (120+ aliases) → provider detection (overrides → format rules → mapping tables → org-prefix fallbacks) → model ID transformation (provider-native format).
What it does NOT do: No user-defined aliases. No version/snapshot resolution. No per-modality routing differences.
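A trimmed sketch of the three stages, plus the cycle check from AC-2.1.5. The tables are hypothetical stand-ins: the real `MODEL_ALIASES` has 120+ entries, and detection also consults format rules and mapping tables, which are omitted. The DeepSeek and Meta routing to Fireworks is invented for illustration.

```python
# Hypothetical, heavily trimmed tables.
MODEL_ALIASES = {"gpt-4o": "openai/gpt-4o", "r1": "deepseek/deepseek-r1"}
PROVIDER_OVERRIDES = {"openai/gpt-4o": "openai"}
ORG_PREFIX_FALLBACK = {
    "openai": "openai",
    "anthropic": "anthropic",
    "deepseek": "fireworks",    # hypothetical routing choice
    "meta-llama": "fireworks",  # hypothetical routing choice
}


def normalize(model: str) -> str:
    """Stage 1: map an alias to its canonical ID; canonical IDs pass through."""
    return MODEL_ALIASES.get(model, model)


def detect_provider(model_id: str) -> str:
    """Stage 2: explicit overrides first, then the org-prefix fallback."""
    if model_id in PROVIDER_OVERRIDES:
        return PROVIDER_OVERRIDES[model_id]
    org = model_id.split("/", 1)[0]
    if org in ORG_PREFIX_FALLBACK:
        return ORG_PREFIX_FALLBACK[org]
    # Unknown models should surface as a 400/404, never a 500 (AC-2.1.7).
    raise LookupError(f"unknown model: {model_id}")


def transform(model_id: str, provider: str) -> str:
    """Stage 3: rewrite into the provider-native ID format."""
    if provider == "fireworks":
        # Naive: real Fireworks names do not always match the upstream name.
        return "accounts/fireworks/models/" + model_id.split("/", 1)[1]
    return model_id


def resolve(model: str) -> tuple:
    canonical = normalize(model)
    provider = detect_provider(canonical)
    return provider, transform(canonical, provider)


def find_alias_cycles(aliases: dict) -> list:
    """Return aliases whose chain of lookups loops back on itself."""
    cyclic = []
    for start in aliases:
        seen = {start}
        current = aliases[start]
        while current in aliases:
            if current in seen:
                cyclic.append(start)
                break
            seen.add(current)
            current = aliases[current]
    return cyclic
```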
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-2.1.1 | `gpt-4o` resolves to `openai/gpt-4o` | `POST /v1/chat/completions` with `model: "gpt-4o"` | P0 |
| AC-2.1.2 | `r1` resolves to `deepseek/deepseek-r1` | `POST /v1/chat/completions` with `model: "r1"` | P0 |
| AC-2.1.3 | Canonical IDs (e.g., `openai/gpt-4o`) work directly without alias resolution | POST with canonical ID | P0 |
| AC-2.1.4 | Provider detection correctly routes `google/gemini-*` models to Vertex when credentials available | POST with Gemini model | P0 |
| AC-2.1.5 | No alias maps to itself (no self-referencing loops) | Inspect `MODEL_ALIASES` dict for cycles | P0 |
| AC-2.1.6 | Fireworks model IDs are transformed to `accounts/fireworks/models/...` format | POST with Fireworks model, verify upstream call format | P1 |
| AC-2.1.7 | Nonexistent model returns 400 or 404, not 500 | POST with `model: "nonexistent/model"` | P0 |
Code References:
- `src/services/models.py`: `MODEL_ALIASES` dict, resolution pipeline
- `src/services/model_transformations.py`: Provider-specific ID transformations
- `src/services/model_availability.py`: Availability checking
Status: Complete
What it does: ML-powered model selection via NotDiamond. Four modes: quality (openai/gpt-4o), cost (openai/gpt-4o-mini), latency (groq/llama-3.3-70b-versatile), balanced (anthropic/claude-sonnet-4). Falls back to mode-specific defaults when NotDiamond unavailable.
What it does NOT do: No user feedback learning. No custom model pools. No routing constraints.
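A sketch of the mode parsing and graceful fallback, using the mode defaults listed above. `select` is a hypothetical wrapper around the NotDiamond call. This is not the NotDiamond client API, and the way `src/services/general_router.py` invokes it is not shown here.

```python
from typing import Callable, Optional

# Mode-specific defaults listed in this section.
FALLBACK_MODELS = {
    "quality": "openai/gpt-4o",
    "cost": "openai/gpt-4o-mini",
    "latency": "groq/llama-3.3-70b-versatile",
    "balanced": "anthropic/claude-sonnet-4",
}
PREFIX = "router:general:"


def route_general(model_field: str, prompt: str,
                  select: Optional[Callable[[str, str], str]] = None) -> str:
    """Pick a model for router:general:<mode>.

    `select(prompt, mode)` is a hypothetical NotDiamond wrapper; any
    exception it raises falls back to the static default for the mode.
    """
    if not model_field.startswith(PREFIX):
        raise ValueError(f"not a general-router model: {model_field}")
    mode = model_field[len(PREFIX):]
    if mode not in FALLBACK_MODELS:
        raise ValueError(f"unknown mode: {mode}")
    if select is not None:
        try:
            return select(prompt, mode)
        except Exception:
            pass  # NotDiamond unavailable: degrade gracefully (AC-2.2.5)
    return FALLBACK_MODELS[mode]
```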
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-2.2.1 | `router:general:quality` selects a high-quality model and returns 200 | POST chat with `model: "router:general:quality"` | P0 |
| AC-2.2.2 | `router:general:cost` selects a cheaper model than quality mode | Compare selected models for same prompt | P0 |
| AC-2.2.3 | `router:general:latency` selects a low-latency model | POST and verify selection | P0 |
| AC-2.2.4 | `router:general:balanced` considers quality, cost, and latency | POST and verify selection | P0 |
| AC-2.2.5 | When NotDiamond is unavailable, fallback models are used without error | Disable NotDiamond, verify graceful fallback | P0 |
| AC-2.2.6 | `GET /general-router/settings/options` returns available strategies and model pools | Inspect response | P1 |
| AC-2.2.7 | `POST /general-router/test` returns selected model + reasoning | POST with sample prompt | P1 |
Code References:
- `src/services/general_router.py`: Routing logic, NotDiamond integration
- `src/routes/general_router.py`: Endpoints
Status: Complete
What it does: Benchmark-driven model selection for coding tasks. 4 tiers by SWE-bench/HumanEval scores. Modes: auto (complexity-based), price, quality, agentic. Static data from code_quality_priors.json.
What it does NOT do: No code execution. No feedback learning. No custom tiers. No language detection.
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-2.3.1 | `router:code:auto` classifies prompt complexity and selects appropriate tier | POST with code prompt | P0 |
| AC-2.3.2 | `router:code:quality` selects highest-tier code model | POST and verify | P0 |
| AC-2.3.3 | `router:code:price` selects cost-effective code model | POST and verify | P0 |
| AC-2.3.4 | `router:code:agentic` selects model optimized for multi-step tool use | POST and verify | P0 |
| AC-2.3.5 | `GET /code-router/tiers` returns models with SWE-bench/HumanEval scores | Inspect response | P0 |
| AC-2.3.6 | Code router works entirely from in-memory data (no DB/Redis dependency) | Verify response with Redis down | P0 |
| AC-2.3.7 | `POST /code-router/test` returns selected model and routing rationale | POST with sample prompt | P1 |
Code References:
- `src/services/code_router.py`: Routing logic, tier selection
- `src/services/code_quality_priors.json`: Static benchmark data
- `src/routes/code_router.py`: Endpoints
Status: Complete
What it does: 14-provider prioritized failover chain. Failover triggers on 401/402/403/404/502/503/504. Does NOT trigger on 400 (user error) or 429 (retries with backoff). Model-aware rules: OpenAI → OpenRouter only, Anthropic → OpenRouter only, open-source → all providers.
What it does NOT do: No mid-stream failover. No user-configured chains. No same-pricing guarantee across providers.
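A sketch of the documented error classification and model-aware chain building. `ALL_PROVIDERS` is a hypothetical five-entry stand-in for the 14-provider priority list. The backoff parameters are assumptions, and statuses not listed on this page (e.g. 500) are simply returned to the client in this sketch.

```python
# Statuses documented as failover triggers.
FAILOVER_STATUSES = {401, 402, 403, 404, 502, 503, 504}

# Hypothetical subset standing in for the real 14-provider priority list.
ALL_PROVIDERS = ["openrouter", "fireworks", "together", "groq", "deepinfra"]


def classify_error(status: int) -> str:
    """Decide what to do with an upstream error status."""
    if status == 429:
        return "retry"  # retry the same provider with backoff
    if status in FAILOVER_STATUSES:
        return "failover"
    # 400 (user error) and any unlisted status go back to the client.
    return "return_to_client"


def backoff_delays(retries: int = 3, base: float = 0.5) -> list:
    """Exponential backoff schedule in seconds (parameters are assumptions)."""
    return [base * 2 ** i for i in range(retries)]


def build_chain(model_id: str, primary: str, open_breakers: set) -> list:
    """Model-aware failover chain with OPEN circuit breakers skipped."""
    org = model_id.split("/", 1)[0]
    if org in ("openai", "anthropic"):
        # Proprietary models: native provider, then OpenRouter only.
        chain = [org, "openrouter"]
    else:
        # Open-source models may fail over across every provider.
        chain = [primary] + [p for p in ALL_PROVIDERS if p != primary]
    return [p for p in chain if p not in open_breakers]
```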
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-2.4.1 | Primary provider 502/503/504 → request succeeds via fallback transparently | Force primary failure, verify success | P0 |
| AC-2.4.2 | Provider 401/402/403/404 → failover to next provider | Force auth error, verify failover | P0 |
| AC-2.4.3 | Provider 400 (user error) → returns 400 to user, NO failover | Send malformed request | P0 |
| AC-2.4.4 | Provider 429 → retries with backoff, does NOT failover | Trigger rate limit, verify retry behavior | P0 |
| AC-2.4.5 | OpenAI models only failover to OpenAI → OpenRouter | Inspect failover chain for `openai/gpt-4o` | P0 |
| AC-2.4.6 | Anthropic models only failover to Anthropic → OpenRouter | Inspect failover chain for `anthropic/claude-sonnet-4` | P0 |
| AC-2.4.7 | Open-source models can failover across all providers | Inspect chain for `meta-llama/llama-3-70b` | P0 |
| AC-2.4.8 | Failover chain skips providers with OPEN circuit breakers | Open a breaker, verify provider is skipped | P0 |
| AC-2.4.9 | User receives no indication of failover (transparent to client) | Monitor response during failover | P0 |
Code References:
- `src/services/provider_failover.py`: Failover chain construction, error classification
- `src/routes/chat.py`: `build_provider_failover_chain()` integration
Status: Complete (with timing discrepancy)
What it does: Per-provider circuit breakers. CLOSED → OPEN (5 consecutive failures) → HALF_OPEN (after timeout) → CLOSED (3 consecutive successes) or back to OPEN. Redis + in-memory state.
What it does NOT do: No per-provider threshold configuration. No error-type differentiation. No operator alerts. No persistent state beyond Redis.
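A sketch of the state machine described above, with an injectable clock for testing. Thresholds follow the description: 5 consecutive failures to open and 3 consecutive half-open successes to close. The timeout uses the 60-second default currently in code (see P1-7). The counters mirror the Redis key names, but persistence is omitted. AC-2.5.5 as worded closes the breaker after a single success; this sketch follows the description's 3.

```python
import time
from enum import Enum
from typing import Callable


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, success_threshold: int = 3,
                 timeout_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        self.timeout = timeout_seconds
        self.clock = clock
        self.state = State.CLOSED
        self.failure_count = 0
        self.success_count = 0
        self.opened_at = 0.0

    def allow_request(self) -> bool:
        """False means: skip this provider in the failover chain."""
        if self.state is State.OPEN and self.clock() - self.opened_at >= self.timeout:
            self.state = State.HALF_OPEN  # timeout elapsed: allow trial requests
            self.success_count = 0
        return self.state is not State.OPEN

    def record_success(self) -> None:
        if self.state is State.HALF_OPEN:
            self.success_count += 1
            if self.success_count >= self.success_threshold:
                self._transition(State.CLOSED)
        else:
            self.failure_count = 0  # failures must be consecutive

    def record_failure(self) -> None:
        if self.state is State.HALF_OPEN:
            self._transition(State.OPEN)  # any half-open failure re-opens
        elif self.state is State.CLOSED:
            self.failure_count += 1
            if self.failure_count >= self.failure_threshold:
                self._transition(State.OPEN)

    def reset(self) -> None:
        """Sketch counterpart of POST /circuit-breakers/{provider}/reset."""
        self._transition(State.CLOSED)

    def _transition(self, new_state: State) -> None:
        self.state = new_state
        self.failure_count = 0
        self.success_count = 0
        if new_state is State.OPEN:
            self.opened_at = self.clock()
```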
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-2.5.1 | New provider starts in CLOSED state | `GET /circuit-breakers/{new_provider}` | P0 |
| AC-2.5.2 | After 5 consecutive failures, state transitions to OPEN | Send 5 failing requests, check state | P0 |
| AC-2.5.3 | OPEN state prevents requests to that provider | Verify provider is skipped in failover | P0 |
| AC-2.5.4 | After timeout period, OPEN transitions to HALF_OPEN | Wait for timeout, check state | P0 |
| AC-2.5.5 | In HALF_OPEN, a successful request transitions to CLOSED | Send success, check state | P0 |
| AC-2.5.6 | In HALF_OPEN, a failed request transitions back to OPEN | Send failure, check state | P0 |
| AC-2.5.7 | `POST /circuit-breakers/{provider}/reset` resets to CLOSED | Reset and verify | P0 |
| AC-2.5.8 | `POST /circuit-breakers/reset-all` resets all breakers | Reset all and verify | P0 |
| AC-2.5.9 | Circuit breaker endpoints require NO auth (public) | Verify no auth needed | P1 |
| AC-2.5.10 | Prometheus metrics emitted on state transitions | Check `circuit_breaker_state_transitions_total` | P1 |
Code References:
- `src/services/circuit_breaker.py` (line 67): Default timeout 60 seconds
- Redis keys: `circuit_breaker:{provider}:{state|failure_count|success_count|opened_at}` (3600s TTL)
Known Issues:
- P1-7 (Delta Report): Code uses a 60-second timeout, but both the Conceptual Model and the wiki Testing Plan specify 5 minutes. Either the code or the docs must be updated.
Status: Partial
What it does: Checks primary provider health score before routing. Below-threshold providers are demoted in failover chain.
What it does NOT do: No proportional traffic splitting by health score. No per-model health. No predictive health.
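A sketch of the binary promote-or-don't decision in AC-2.6.1 and AC-2.6.2, using the 0-100 health scores from AC-3.1.4. The threshold of 50 and the treatment of providers with no score are hypothetical. The real threshold is not stated on this page.

```python
def promote_healthy(chain: list, health: dict, threshold: float = 50.0) -> list:
    """Binary promotion with no weighted traffic splitting.

    If the primary scores below `threshold`, move the healthiest alternative
    at or above threshold to the front; otherwise return the chain unchanged.
    A primary with no score is assumed healthy; alternatives with no score
    are never promoted (both assumptions).
    """
    if not chain or health.get(chain[0], threshold) >= threshold:
        return list(chain)
    candidates = [p for p in chain[1:] if p in health and health[p] >= threshold]
    if not candidates:
        return list(chain)
    best = max(candidates, key=lambda p: health[p])
    return [best] + [p for p in chain if p != best]
```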
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-2.6.1 | When primary provider health is below threshold, a healthier provider is promoted | Degrade a provider, verify chain reordering | P1 |
| AC-2.6.2 | Health-based promotion is a binary decision (promote or don't) | Verify no weighted splitting | P1 |
Status: Partial
What it does: Routes to the lowest-latency or cheapest provider for the same model. The General Router "latency" mode is hardcoded to `groq/llama-3.3-70b-versatile`.
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-2.7.1 | Latency mode selects a low-latency provider | Verify model selection via router | P1 |
| AC-2.8.1 | Cost mode selects cheapest capable provider | Compare pricing of selected vs alternatives | P1 |
Known Issues: No dynamic latency-optimal selection — latency mode hardcodes a specific model rather than measuring real-time latency. Deferred for post-release.
Status: Not Implemented (Deferred)
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-2.9.1 | Traffic for same model is distributed across providers at configured ratios | Monitor provider selection distribution | Deferred (D-17) |
Status: Complete
What it does: Continuous monitoring at intervals by tier: Critical (5min), Popular (30min), Standard (2-4hr), On-Demand (when requested). Health checks verify availability and latency.
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-3.1.1 | `GET /health` always returns 200, even when dependencies are degraded | Call when DB is down | P0 |
| AC-3.1.2 | Health response includes version, status, and timestamp | Inspect response | P0 |
| AC-3.1.3 | `GET /health/system` returns memory, CPU, and connection pool stats | Inspect response | P0 |
| AC-3.1.4 | Provider health scores are 0-100 per provider | `GET /health/providers` | P0 |
| AC-3.1.5 | Model health shows healthy, degraded, or down per model | `GET /health/models` | P0 |
| AC-3.1.6 | `GET /health/quick` is sub-millisecond (static response) | Time the endpoint | P1 |
| AC-3.1.7 | `GET /health/railway` returns comprehensive check (DB, Redis, providers) | Inspect response | P1 |
| AC-3.1.8 | Gateway health dashboard returns HTML and JSON formats | `GET /health/gateways/dashboard` and `/data` | P1 |
| AC-3.1.9 | Health insights provide actionable recommendations | `GET /health/insights` | P2 |
| AC-3.1.10 | Background monitoring can be started and stopped | `POST /health/monitoring/start`, `/stop` | P1 |
Code References:
- `src/services/intelligent_health_monitor.py`: Tiered monitoring
- `src/services/autonomous_monitor.py`: Background monitoring
- `src/routes/health.py`: Health endpoints
Status: Complete
What it does: Every real inference request contributes health data as a background task — success/failure, latency, tokens, provider response codes. Zero overhead on request path.
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-3.2.1 | Health data is captured after response is returned (no latency impact on user) | Verify background task execution | P0 |
| AC-3.2.2 | Captured data includes: latency, tokens, status, provider | Inspect health data store | P1 |
Status: Complete
What it does: Auto-creates incidents on health degradation. Severity levels, timestamps, captured logs, resolution tracking, MTTR calculation.
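A sketch of the MTTR calculation and the no-re-resolution rule (AC-3.3.3). The incident shape (a dict with `started_at`, `resolved_at`, `notes`) and the function names are hypothetical. Ongoing incidents are excluded from MTTR because they have no resolution time yet.

```python
from datetime import datetime, timedelta
from typing import Optional


def resolve_incident(incident: dict, resolved_at: datetime, notes: str) -> None:
    """Mark an incident resolved; already-resolved incidents are rejected."""
    if incident.get("resolved_at") is not None:
        raise ValueError("incident already resolved")
    incident["resolved_at"] = resolved_at
    incident["notes"] = notes


def mttr(incidents: list) -> Optional[timedelta]:
    """Mean time to resolve across resolved incidents; None if none resolved."""
    durations = [i["resolved_at"] - i["started_at"]
                 for i in incidents if i.get("resolved_at") is not None]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)
```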
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-3.3.1 | Downtime incidents can be listed with filters | `GET /admin/downtime/incidents?status=ongoing` | P0 |
| AC-3.3.2 | Incidents can be resolved with notes | `POST /admin/downtime/incidents/{id}/resolve` | P0 |
| AC-3.3.3 | Already-resolved incidents reject re-resolution | Attempt to resolve again | P1 |
| AC-3.3.4 | Incident analysis shows error patterns and type distribution | `GET /admin/downtime/incidents/{id}/analysis` | P1 |
| AC-3.3.5 | MTTR statistics are computed | `GET /admin/downtime/statistics` | P1 |
Code References:
- `src/routes/admin.py`: Downtime tracking endpoints
Status: Partial
What it does: Hardcoded quality priors for ~20 models (task-specific: simple_qa, code_gen, reasoning, etc.). The Code Router additionally uses SWE-bench/HumanEval scores.
What it does NOT do: Not stored in DB. Not updatable without code change. Missing MMLU, MATH, MT-Bench, LMSYS Arena ELO, LiveBench.
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-3.4.1 | Code router tiers include SWE-bench and HumanEval scores | `GET /code-router/tiers` | P0 |
| AC-3.4.2 | Model selector uses quality priors for task-specific routing | Verify `model_selector.py` quality maps | P1 |
Known Issues: Quality data is static/hardcoded, not from DB. Missing several major benchmarks. No dynamic updating.
Status: Not Implemented (Deferred)
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-3.5.1 | Per-customer success rates are tracked per model | Check customer-model analytics | Deferred (D-19) |
Status: Partial (OpenRouter only)
What it does: Tracks upstream provider credit balances. OpenRouter: full implementation with API call, 15-min cache, threshold alerts (critical $5, warning $20, info $50).
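A sketch of the alert thresholds and the 15-minute balance cache. Reading "fires at $5" as "at or below $5" is an assumption. `BalanceCache` and the `fetch` callable (standing in for the OpenRouter balance API call) are hypothetical.

```python
import time
from typing import Callable, Optional

# Thresholds (USD) from this section, checked from most to least severe.
ALERT_LEVELS = [(5.0, "critical"), (20.0, "warning"), (50.0, "info")]


def alert_level(balance_usd: float) -> Optional[str]:
    for threshold, level in ALERT_LEVELS:
        if balance_usd <= threshold:
            return level
    return None


class BalanceCache:
    """Caches one provider's balance for 15 minutes (900 seconds)."""

    def __init__(self, ttl_seconds: float = 900.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._value: Optional[float] = None
        self._fetched_at: Optional[float] = None

    def get(self, fetch: Callable[[], float]) -> float:
        now = self.clock()
        if self._fetched_at is not None and now - self._fetched_at < self.ttl:
            return self._value  # fresh enough: no upstream call
        self._value = fetch()
        self._fetched_at = now
        return self._value
```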
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-3.6.1 | `GET /api/provider-credits/balance` returns credit balances for monitored providers | Inspect response | P0 |
| AC-3.6.2 | OpenRouter balance is cached for 15 minutes | Check timing of two consecutive calls | P1 |
| AC-3.6.3 | Threshold alerts fire at critical ($5), warning ($20), info ($50) | Verify alert logic | P1 |
Code References:
- `src/services/provider_credit_monitor.py`
  - Lines 33-138: OpenRouter implementation
  - Lines 165-167: TODO stubs for all other providers
Known Issues:
- P1-1 (Delta Report): Only OpenRouter implemented. 29 other providers have TODO stubs. No preemptive deprioritization in failover chain.
Status: Not Implemented (Deferred)
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-4.1.1 | Semantically similar prompts return cached responses (cosine similarity >0.95) | Test with paraphrased prompt | Deferred (D-8) |
Status: Not Implemented (Deferred — infrastructure exists but not wired)
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-4.2.1 | Identical inference requests (same messages + model + params) return cached response | Send same request twice, compare latency | Deferred (D-9) |
Code References:
- `src/services/response_cache.py`: SHA-256 hashing with Redis + in-memory fallback exists but is NOT wired into the inference path
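For AC-4.2.1, "identical" has to be defined so that key order in the request body does not matter. A sketch of one way to derive a SHA-256 cache key from messages + model + params via canonical JSON follows. The `response_cache:` prefix and the function name are hypothetical, and `src/services/response_cache.py` may hash a different payload.

```python
import hashlib
import json


def cache_key(model: str, messages: list, params: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the request.

    sort_keys plus fixed separators make logically identical requests
    serialize to identical bytes regardless of dict insertion order.
    """
    payload = {"model": model, "messages": messages, "params": params}
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False)
    return "response_cache:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```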
Status: Partial (Ghost Feature — P0 issue)
What it does: Butter.dev proxy used for all requests. User preference endpoints exist but are ignored.
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-4.3.1 | If Butter cache settings endpoints exist, user preference MUST be respected during inference | Set `enable_butter_cache=false`, verify Butter proxy is NOT used | P0 (KNOWN BUG) |
| AC-4.3.2 | OR: Butter cache settings endpoints are removed entirely | Verify endpoints don't exist | P0 Alternative |
Code References:
- `src/routes/users.py` (lines 305-408): `GET`/`PUT /user/cache-settings` exist and store the preference
- `src/routes/chat.py` (line 697): Always calls `get_butter_pooled_async_client()` without checking the preference
Known Issues:
- P0-1 (Delta Report): Ghost feature. User can toggle a setting that does nothing. Trust-eroding.
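One way to satisfy AC-4.3.1 is to read the stored preference before choosing the HTTP client. The `CacheSettings` shape and `choose_client()` helper are hypothetical. In `src/routes/chat.py`, the Butter branch would correspond to the existing `get_butter_pooled_async_client()` call. Defaulting to `True` when no preference is stored is an assumption that preserves today's behavior for users who never changed the setting.

```python
from dataclasses import dataclass
from typing import Optional, TypeVar

T = TypeVar("T")


@dataclass
class CacheSettings:
    """Hypothetical shape of the stored /user/cache-settings preference."""
    enable_butter_cache: bool = True


def choose_client(settings: Optional[CacheSettings], butter_client: T, direct_client: T) -> T:
    """Honor the user's preference instead of always using the Butter proxy."""
    # Assumption: no stored preference keeps the current default (Butter on).
    enabled = settings.enable_butter_cache if settings is not None else True
    return butter_client if enabled else direct_client
```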
Status: Complete
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-4.4.1 | Catalog endpoint responds in sub-100ms on cache hit | Time `GET /v1/models` on second request | P0 |
| AC-4.4.2 | Auth cache reduces lookup latency from ~100ms to <5ms | Compare first vs second auth timing | P1 |
| AC-4.4.3 | When Redis is down, local memory cache activates; no requests are blocked | Stop Redis, verify normal operation | P0 |
| AC-4.4.4 | Cache invalidation clears all layers | `POST /admin/cache/clear`, verify fresh data | P1 |
| AC-4.4.5 | Stampede protection prevents multiple simultaneous cache rebuilds | Concurrent requests to cold cache | P1 |
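AC-4.4.5 describes stampede protection without specifying a mechanism. A common technique is "single flight": one coroutine per key rebuilds a cold entry while the others wait for its result. The sketch below uses `asyncio.Lock` and protects a single process only. Cross-instance protection would need a distributed lock (e.g. in Redis), which is not shown. The class name is hypothetical.

```python
import asyncio
from typing import Any, Awaitable, Callable


class SingleFlightCache:
    """Only one coroutine per key rebuilds a cold entry; the rest wait for it."""

    def __init__(self) -> None:
        self._values: dict = {}
        self._locks: dict = {}

    async def get(self, key: str, rebuild: Callable[[], Awaitable[Any]]) -> Any:
        if key in self._values:
            return self._values[key]  # warm hit: no locking
        lock = self._locks.setdefault(key, asyncio.Lock())
        async with lock:
            # Re-check: a concurrent caller may have rebuilt while we waited.
            if key not in self._values:
                self._values[key] = await rebuild()
            return self._values[key]

    def invalidate(self, key: str) -> None:
        self._values.pop(key, None)
```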
Status: Complete
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-5.1.1 | Model sync can be triggered incrementally and fully | `POST /admin/model-sync/trigger` and `/all` | P0 |
| AC-5.1.2 | If provider API is down, last synced catalog is served | Verify stale catalog on provider failure | P0 |
| AC-5.1.3 | Per-provider sync works | `POST /admin/model-sync/provider/{slug}` | P1 |
| AC-5.1.4 | Full resync (delete + reimport) works | `POST /admin/model-sync/full` | P1 |
Status: Complete
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-5.2.1 | Every model in `GET /v1/models` has `id`, `name`, `provider_slug`, `context_length`, and `pricing` | Inspect response schema | P0 |
| AC-5.2.2 | No model has null or zero pricing for both prompt and completion | Scan all models in response | P0 (see 5.3) |
Status: Partial (gating not enforced at sync)
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-5.3.1 | Models without pricing are rejected during sync (not visible to users) | Check catalog for models with null pricing | P1 (KNOWN BUG) |
| AC-5.3.2 | `GET /v1/models/unique` returns no duplicate model IDs | Check for uniqueness | P0 |
| AC-5.3.3 | High-value models without explicit pricing are BLOCKED, not served at default rate | Verify pricing guard for GPT-4, Claude, Gemini | P0 |
Code References:
- `src/services/model_catalog_sync.py`: `extract_pricing()` (lines 136-153) returns all `None` for missing pricing. Line 368 checks `if any(pricing.values())` but is non-blocking; models ARE synced without pricing.
- `src/services/pricing.py` (lines 783-839): `HIGH_VALUE_MODEL_PATTERNS` guard raises `ValueError` on default pricing fallback
Known Issues:
- P1-3 (Delta Report): Models without pricing are synced into the catalog. Non-high-value models without pricing fall to default ($0.00002/token) — potential under-billing.
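A sketch of the two gates discussed above: a blocking sync filter (AC-5.3.1) and a high-value guard on the default-price fallback (AC-5.3.3). The regex patterns are hypothetical stand-ins for `HIGH_VALUE_MODEL_PATTERNS`. The sync filter rejects only all-`None` pricing and keeps explicit zero prices; that is an assumption, made so the `:free` models referenced in AC-6.2.4 are not dropped.

```python
import re
from typing import Optional

# Hypothetical patterns; the real HIGH_VALUE_MODEL_PATTERNS may differ.
HIGH_VALUE_PATTERNS = [re.compile(p) for p in (r"gpt-4", r"claude", r"gemini", r"(^|/)o[134]")]
DEFAULT_PRICE_PER_TOKEN = 0.00002  # default rate cited in P1-3


def is_high_value(model_id: str) -> bool:
    return any(p.search(model_id) for p in HIGH_VALUE_PATTERNS)


def filter_synced_models(models: list) -> tuple:
    """Blocking sync check: skip models whose pricing values are all None."""
    accepted, rejected = [], []
    for model in models:
        pricing = model.get("pricing") or {}
        if any(value is not None for value in pricing.values()):
            accepted.append(model)
        else:
            rejected.append(model["id"])
    return accepted, rejected


def resolve_price(model_id: str, explicit_price: Optional[float]) -> float:
    """Use the default rate only for models that are not high-value."""
    if explicit_price is not None:
        return explicit_price
    if is_high_value(model_id):
        raise ValueError(f"no explicit pricing for high-value model {model_id}")
    return DEFAULT_PRICE_PER_TOKEN
```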
Status: Complete
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-5.4.1 | Model detail returns HuggingFace data (downloads, likes, parameters) when available | `GET /api/models/detail?model_id=meta-llama/...` | P1 |
| AC-5.4.2 | HuggingFace data is cached with TTL | Verify caching on repeated requests | P1 |
Status: Complete
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-5.5.1 | `GET /v1/models?provider=fireworks` returns only Fireworks models | Filter and verify | P0 |
| AC-5.5.2 | `GET /v1/models/search?q=llama` returns matching models | Verify results | P0 |
| AC-5.5.3 | `GET /v1/models/trending` returns models ranked by usage | Inspect response | P1 |
| AC-5.5.4 | `GET /v1/gateways` returns all gateways with `name`, `color`, `priority`, `site_url` | Inspect response | P0 |
| AC-5.5.5 | Model comparison works across providers | `GET /v1/models/{provider}/{model}/compare` | P1 |
Status: Complete (with atomicity concern on legacy path)
What it does: Atomic billing unit. Cost = (prompt_tokens × prompt_price) + (completion_tokens × completion_price). Pre-flight checks, idempotent deductions (UNIQUE constraint + RPC), subscription allowance consumed first, auto-refund on provider errors.
What it does NOT do: No real-time credit streaming during generation. No credit expiration. No rollover. No credit transfers. No multi-currency.
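The cost formula and deduction rules above can be sketched in memory. `Account`, `deduct()`, and `InsufficientCredits` are hypothetical names, and `Decimal` is used to avoid float rounding on per-token prices. In the documented design, the equivalent work runs in a single DB transaction through the atomic RPC, with a UNIQUE `request_id` column providing the idempotency that the `processed_request_ids` set imitates here.

```python
from dataclasses import dataclass, field
from decimal import Decimal


class InsufficientCredits(Exception):
    """Maps to an HTTP 402 response."""


@dataclass
class Account:
    subscription_allowance: Decimal
    purchased_credits: Decimal
    processed_request_ids: set = field(default_factory=set)


def request_cost(prompt_tokens: int, completion_tokens: int,
                 prompt_price: Decimal, completion_price: Decimal) -> Decimal:
    """cost = prompt_tokens * prompt_price + completion_tokens * completion_price"""
    return prompt_tokens * prompt_price + completion_tokens * completion_price


def deduct(account: Account, request_id: str, cost: Decimal) -> bool:
    """Deduct once per request_id, drawing on the subscription allowance first.

    Returns False for a duplicate request_id (nothing is deducted again).
    """
    if request_id in account.processed_request_ids:
        return False
    if cost > account.subscription_allowance + account.purchased_credits:
        raise InsufficientCredits(request_id)
    from_allowance = min(cost, account.subscription_allowance)
    account.subscription_allowance -= from_allowance
    account.purchased_credits -= cost - from_allowance
    account.processed_request_ids.add(request_id)
    return True
```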
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-6.1.1 | Pre-flight check: user with 0 credits receives 402 BEFORE any provider call | POST with 0-credit user, verify no upstream call | P0 |
| AC-6.1.2 | Idempotent deduction: same request ID sent twice deducts credits only once | POST twice with same X-Request-ID | P0 |
| AC-6.1.3 | Subscription allowance consumed before purchased credits | User with both: make request, verify subscription decreases first | P0 |
| AC-6.1.4 | Provider 5xx error → automatic credit refund | Trigger 5xx, verify refund in credit_transactions | P0 |
| AC-6.1.5 | Provider timeout → automatic credit refund | Trigger timeout, verify refund | P0 |
| AC-6.1.6 | Provider 4xx error (user error) → NO refund | Trigger 4xx, verify no refund | P0 |
| AC-6.1.7 | High-value models (GPT-4, Claude, Gemini, o1/o3/o4) blocked if pricing falls to default | Verify pricing guard fires for each pattern | P0 |
| AC-6.1.8 | Credit transactions logged with request_id, user_id, model, token counts, cost | Check credit_transactions table | P0 |
| AC-6.1.9 | Balance update and transaction log happen atomically (single DB transaction via RPC) | Verify atomic_deduct_credits RPC is used | P0 |
| AC-6.1.10 | Legacy fallback path either doesn't exist or handles transaction logging failure safely | Verify legacy path behavior on logging failure | P0 (KNOWN RISK) |
| AC-6.1.11 | Credit transaction history is paginated | GET /credits/transactions?limit=10 | P1 |
| AC-6.1.12 | Admin can add/adjust/refund credits | POST /credits/add, /adjust, /refund | P1 |
| AC-6.1.13 | Daily usage cap prevents runaway costs | Exceed daily limit, verify 402 | P1 |
| AC-6.1.14 | request_id has UNIQUE constraint in DB (belt-and-suspenders idempotency) | Check migration 20260223000001_add_request_id_to_credit_transactions.sql | P0 |
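The refund rule in AC-6.1.4 to AC-6.1.6 reduces to a small decision function, which also makes a convenient table to drive tests from. should_refund is a hypothetical helper, not the logic in src/routes/chat.py. It deliberately ignores the partial-stream and refund-failure edge cases listed under P0-4.

```python
from __future__ import annotations


def should_refund(status_code: int | None, timed_out: bool = False) -> bool:
    """Automatic-refund rule from AC-6.1.4 to AC-6.1.6."""
    if timed_out:
        return True  # AC-6.1.5: provider timeout
    if status_code is None:
        raise ValueError("status_code is required unless the request timed out")
    # 5xx is the provider's fault and is refunded (AC-6.1.4); 4xx is not (AC-6.1.6).
    return 500 <= status_code <= 599
```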
Code References:
- src/db/users.py (lines 701-1106): Credit deduction
  - Atomic RPC path (lines 862-967): Correct
  - Legacy fallback path (lines 987-1096): Risk: two separate calls; if logging fails, credits are already deducted (lines 1077-1082)
- src/services/pricing.py (lines 783-839): HIGH_VALUE_MODEL_PATTERNS
- src/routes/chat.py (lines 1670-1742): Auto-refund logic
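The following sketches the pricing guard from AC-6.1.7. It assumes the guard receives a flag saying whether the model's price came from the default fallback. The regular expressions are illustrative guesses at HIGH_VALUE_MODEL_PATTERNS, not a copy of src/services/pricing.py. The point is that the check raises before any provider call is made, which is what P0-3 asks to verify.

```python
import re

# Illustrative patterns only; the real HIGH_VALUE_MODEL_PATTERNS may differ.
HIGH_VALUE_PATTERNS = [
    re.compile(r"gpt-4", re.IGNORECASE),
    re.compile(r"claude", re.IGNORECASE),
    re.compile(r"gemini", re.IGNORECASE),
    re.compile(r"(^|/)o[134]($|-)", re.IGNORECASE),  # o1, o3, o4 families
]


class PricingGuardError(Exception):
    """Raised before any provider call when a high-value model only has default pricing."""


def check_pricing(model_id: str, pricing_is_default: bool) -> None:
    if pricing_is_default and any(p.search(model_id) for p in HIGH_VALUE_PATTERNS):
        raise PricingGuardError(f"refusing {model_id}: pricing fell back to default")
```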
Known Issues:
- P0-2: Legacy fallback path may create orphaned deductions (balance reduced, no transaction record)
- P0-3: Pricing guard needs end-to-end verification — must fire BEFORE provider call
- P0-4: Auto-refund needs integration testing for edge cases (partial stream, refund failure)
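P0-2 and AC-6.1.9 come down to one property: the balance update and the credit_transactions insert must commit or roll back together. The sketch below uses Python's sqlite3 as a stand-in for Postgres. The schema, the integer-cents column and atomic_deduct are hypothetical, and the production path (the atomic_deduct_credits RPC) may be structured differently. Inserting the log row first means a duplicate request_id fails before the balance is touched, and a failed balance update rolls the log row back.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Hypothetical schema: balances in integer cents plus a transaction log."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE balances (user_id TEXT PRIMARY KEY, credits_cents INTEGER NOT NULL);
        CREATE TABLE credit_transactions (
            request_id TEXT NOT NULL UNIQUE,  -- idempotency backstop (AC-6.1.14)
            user_id TEXT NOT NULL,
            amount_cents INTEGER NOT NULL
        );
        """
    )
    return db


def atomic_deduct(db: sqlite3.Connection, user_id: str, request_id: str, amount_cents: int) -> bool:
    """Log and deduct in ONE transaction; return False if request_id was already billed."""
    try:
        with db:  # commits if the block succeeds, rolls back if it raises
            db.execute(
                "INSERT INTO credit_transactions (request_id, user_id, amount_cents) VALUES (?, ?, ?)",
                (request_id, user_id, -amount_cents),
            )
            cur = db.execute(
                "UPDATE balances SET credits_cents = credits_cents - ? WHERE user_id = ?",
                (amount_cents, user_id),
            )
            if cur.rowcount != 1:
                # Raising here also rolls back the log row inserted above.
                raise LookupError(f"no balance row for {user_id!r}")
    except sqlite3.IntegrityError:
        return False  # duplicate request_id: neither write was kept
    return True
```

The legacy fallback described in P0-2 behaves like running the two writes in separate transactions, which is exactly the case where a deduction can exist without its log row.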
Status: Complete
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-6.2.1 | New user gets $5 credits and trial expiring in 3 days | Register, check balance + trial_end | P0 (config mismatch) |
| AC-6.2.2 | Trial user can make requests until credits/limits exhausted | Make requests during trial | P0 |
| AC-6.2.3 | Expired trial returns 402 for paid models | POST after trial expiry | P0 |
| AC-6.2.4 | Expired trial CAN access :free suffix models | POST with :free model after expiry | P0 |
| AC-6.2.5 | GET /plans returns available plan tiers with pricing | Inspect response | P0 |
| AC-6.2.6 | GET /trial/status returns active/expired and days remaining | Check response | P0 |
| AC-6.2.7 | Unused subscription allowance does NOT roll over (resets monthly) | Verify at month boundary | P1 |
| AC-6.2.8 | Purchased credits never expire and survive plan changes | Change plan, verify credits persist | P1 |
| AC-6.2.9 | Daily trial limit ($1/day) is enforced | Exceed $1 in trial, verify blocking | P0 |
Known Issues:
- P0-7 (Delta Report): Trial config mismatch — CLAUDE.md says $5, wiki says $10, code says $5. Must reconcile.
-
src/config/usage_limits.py— Trial: $5, 3 days, $1/day -
src/db/trials.py(line 44) — Formulatrial_days * 5suggests variable durations
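The sketch below shows how the trial rules in AC-6.2.2 to AC-6.2.4 and AC-6.2.9 might combine, using the values listed for src/config/usage_limits.py ($5, 3 days, $1/day). The constant and function names are hypothetical. Letting :free models bypass the daily cap is an assumption the criteria do not settle.

```python
from __future__ import annotations

from datetime import datetime, timedelta
from decimal import Decimal

# Values listed for src/config/usage_limits.py; these constant names are hypothetical.
TRIAL_CREDITS = Decimal("5")
TRIAL_DAYS = 3
TRIAL_DAILY_LIMIT = Decimal("1")


def trial_status_code(model: str, trial_start: datetime, now: datetime,
                      spent_today: Decimal, balance: Decimal) -> int:
    """HTTP status a trial user's request should receive: 200 or 402."""
    if model.endswith(":free"):
        return 200  # AC-6.2.4; also bypassing the daily cap is an assumption
    if now >= trial_start + timedelta(days=TRIAL_DAYS):
        return 402  # AC-6.2.3: trial expired, paid model
    if balance <= 0 or spent_today >= TRIAL_DAILY_LIMIT:
        return 402  # AC-6.2.2 credits exhausted / AC-6.2.9 $1 daily cap
    return 200
```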
Status: Partial
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-6.3.1 | User can view activity stats (total requests/tokens/spend by model/provider) | GET /user/activity/stats | P0 |
| AC-6.3.2 | Activity log is paginated (limit 1-1000) | GET /user/activity/log?limit=50 | P0 |
| AC-6.3.3 | Activity log total field returns actual DB total, not page count | Verify total vs count | P1 (KNOWN BUG) |
| AC-6.3.4 | Per-API-key usage breakdown is available | GET /user/api-keys/{key_id}/usage | P2 (NOT IMPLEMENTED) |
| AC-6.3.5 | CSV/JSON export is available | GET /user/usage/export?format=csv | P2 (NOT IMPLEMENTED) |
Known Issues:
- P1-4 (Delta Report): src/routes/users.py (line 515): "total": len(transactions) returns the page count, not the DB total
- P2-1: activity_log stores user_id but NOT api_key_id, so there is no per-key breakdown
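The P1-4 fix is to report the total from a separate count over the same filter, instead of len() of the current page. The sketch uses sqlite3 and a hypothetical activity_log schema (id, user_id, model). The real route in src/routes/users.py queries through its own database client.

```python
import sqlite3


def activity_page(db: sqlite3.Connection, user_id: str, limit: int, offset: int = 0) -> dict:
    """One page of a user's activity plus the true total across all pages."""
    if not 1 <= limit <= 1000:
        raise ValueError("limit must be between 1 and 1000")  # AC-6.3.2 bounds
    rows = db.execute(
        "SELECT id, model FROM activity_log WHERE user_id = ? ORDER BY id DESC LIMIT ? OFFSET ?",
        (user_id, limit, offset),
    ).fetchall()
    (total,) = db.execute(
        "SELECT COUNT(*) FROM activity_log WHERE user_id = ?", (user_id,)
    ).fetchone()
    # The P1-4 bug is equivalent to setting "total" to len(rows) here.
    return {"items": rows, "count": len(rows), "total": total}
```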
Status: Not Implemented (Deferred)
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-6.4.1 | Outbound webhook delivery for credits.low, credits.depleted, model.degraded events | Configure webhook, trigger events | Deferred (D-10) |
Status: Not Implemented (Deferred)
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-6.5.1 | Per-tier SLA violations are detected with auto credit-back compensation | Monitor SLA metrics | Deferred (D-14) |
Status: Not Implemented (Deferred)
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-7.1.1 | Template library with versioning | CRUD on prompt templates | Deferred (D-12) |
Status: Not Implemented (Deferred)
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-7.2.1 | POST /v1/batch/jobs submits bulk workloads | Submit batch job | Deferred (D-11) |
Status: Not Implemented (Deferred)
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-7.3.1 | Side-by-side model comparison for same prompt | Compare endpoint | Deferred (D-13) |
Status: Not Implemented (Deferred — frontend-coupled)
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-7.4.1 | Interactive prompt testing UI | Access playground | Deferred |
Status: Complete
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-8.1.1 | GET /metrics returns valid Prometheus text format | Parse response | P0 |
| AC-8.1.2 | OpenMetrics format with exemplar support is available via content negotiation | Accept: application/openmetrics-text | P1 |
| AC-8.1.3 | Parsed metrics include p50, p95, p99 latency percentiles | GET /api/metrics/parsed | P0 |
| AC-8.1.4 | Real-time stats update within 60 seconds of new requests | GET /api/monitoring/stats/realtime | P1 |
| AC-8.1.5 | Error rates tracked per provider and per model | GET /api/monitoring/error-rates | P1 |
| AC-8.1.6 | Anomaly detection flags unusual patterns | GET /api/monitoring/anomalies | P1 |
| AC-8.1.7 | Grafana SimpleJSON datasource protocol fully implemented | GET /prometheus/datasource (test), POST /search, /query | P1 |
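When checking AC-8.1.3, it helps to have an independent reference for p50, p95 and p99. The sketch uses the nearest-rank definition on raw latency samples. The /api/metrics/parsed endpoint may instead derive percentiles from Prometheus histogram buckets, which interpolate within a bucket. Expect approximate agreement and strict ordering (p50 ≤ p95 ≤ p99) rather than identical values.

```python
from __future__ import annotations

import math


def percentile(samples: list[float], p: int) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based; p * n first keeps integer math exact
    return ordered[rank - 1]


def latency_summary(samples: list[float]) -> dict[str, float]:
    """The three percentiles AC-8.1.3 asks for."""
    return {f"p{p}": percentile(samples, p) for p in (50, 95, 99)}
```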
Status: Complete
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-8.2.1 | OpenTelemetry traces are initialized and exportable | GET /api/instrumentation/health | P0 |
| AC-8.2.2 | Every request gets a trace ID linking middleware → auth → routing → provider → billing | Inspect trace in Tempo | P1 |
| AC-8.2.3 | Exemplar linking from metrics to traces works | Verify in Grafana | P2 |
Status: Complete
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-8.3.1 | Autonomous error monitor status is retrievable | GET /error-monitor/autonomous/status | P0 |
| AC-8.3.2 | Dashboard provides error landscape overview | GET /error-monitor/dashboard | P0 |
| AC-8.3.3 | Recent errors sorted by recency | GET /error-monitor/errors/recent | P0 |
| AC-8.3.4 | Critical errors flagged separately | GET /error-monitor/errors/critical | P0 |
| AC-8.3.5 | Error patterns detect recurring issues | GET /error-monitor/errors/patterns | P1 |
| AC-8.3.6 | AI fix suggestions generated via Claude | POST /error-monitor/fixes/generate-for-error | P2 |
Note: All error monitor endpoints are public (no auth required). Error patterns are held in memory only and are lost on restart.
Status: Partial
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-8.4.1 | Arize Phoenix config exists and is functional | Check Arize initialization | P2 |
| AC-8.4.2 | OpenTelemetry captures inference metadata (model, tokens, latency) | Inspect trace attributes | P1 |
Known Issues: Arize Phoenix not exposed via API. Braintrust not integrated. No prompt/response pair recording.
Status: Complete
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-8.5.1 | Pyroscope profiling tags cache/Redis layers with operation context | Verify tag presence in Pyroscope | P1 |
| AC-8.5.2 | Profiling does not add measurable latency to requests | Compare request times with/without profiling | P1 |
Status: Partial
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-8.6.1 | User can view their own usage dashboard data | GET /user/activity/stats, GET /user/monitor | P0 |
| AC-8.6.2 | Model health status visible to users | GET /v1/model-health | P0 |
| AC-8.6.3 | Public status page with provider/model availability | GET /v1/status/, GET /v1/status/providers | P0 |
| AC-8.6.4 | Latency percentiles exposed to customers | GET /user/latency?model=... | P2 (NOT IMPLEMENTED) |
Status: Complete
What it does: POST /v1/chat/completions is a drop-in replacement for the OpenAI Chat Completions API, with streaming SSE, tool/function calling, JSON mode, and logprobs. Any OpenAI SDK app works after changing only the base URL and API key.
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-9.1.1 | Non-streaming returns 200 with choices[0].message.content, usage.prompt_tokens, usage.completion_tokens | POST with stream: false | P0 |
| AC-9.1.2 | Streaming returns SSE where every event line starts with data: and the stream ends with data: [DONE] | POST with stream: true | P0 |
| AC-9.1.3 | response_format: {"type": "json_object"} returns valid parseable JSON | POST with JSON mode | P0 |
| AC-9.1.4 | A tools array yields tool_calls when the model decides to call a tool | POST with tool definitions | P0 |
| AC-9.1.5 | logprobs: true returns a logprobs field | POST with logprobs | P1 |
| AC-9.1.6 | OpenAI Python SDK works with zero changes beyond base_url and api_key | openai.OpenAI(base_url="$BASE/v1") | P0 |
| AC-9.1.7 | All inference errors use OpenAI-compatible format: {"error": {"message": "...", "type": "...", "code": "..."}} | Trigger errors, inspect format | P1 (KNOWN ISSUE) |
| AC-9.1.8 | Unauthenticated request with whitelisted model returns 200 | POST without auth header | P0 |
| AC-9.1.9 | Unauthenticated request with non-whitelisted model returns 401/403 | POST without auth header | P0 |
| AC-9.1.10 | Streaming normalization handles OpenAI, Gemini, Anthropic, Fireworks formats | Test stream from each provider type | P0 |
| AC-9.1.11 | Unrecognized streaming format logs a warning (not silently dropped) | Check logs for dropped chunks | P1 (KNOWN BUG) |
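Below is a small validator for AC-9.1.2 that can run against a captured stream body. It applies the criterion literally: every non-blank line must be a data: line. It would therefore reject SSE comment lines or keep-alives if the gateway ever emits them. validate_openai_sse is a hypothetical helper name.

```python
from __future__ import annotations

import json


def validate_openai_sse(body: str) -> list[dict]:
    """Apply AC-9.1.2 to a captured stream body and return the decoded JSON chunks."""
    lines = [line for line in body.splitlines() if line.strip()]
    if not lines or lines[-1] != "data: [DONE]":
        raise ValueError("stream does not end with 'data: [DONE]'")
    chunks = []
    for line in lines[:-1]:
        if not line.startswith("data: "):
            raise ValueError(f"not an SSE data line: {line!r}")
        # json.JSONDecodeError subclasses ValueError, so callers can catch one type.
        chunk = json.loads(line[len("data: "):])
        if not isinstance(chunk, dict):
            raise ValueError(f"chunk is not a JSON object: {line!r}")
        chunks.append(chunk)
    return chunks
```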
Known Issues:
- P1-2: ~5% of errors use the FastAPI default {"detail": "..."} instead of the OpenAI format, which breaks SDK error handling
- P1-8: Stream normalizer returns None for unrecognized chunks (silently dropped, no warning)
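These are sketches of the two fixes implied by P1-2 and P1-8. to_openai_error rewraps a FastAPI-style detail into the envelope required by AC-9.1.7. The mapping of status codes to type and code values is an assumption, not OpenAI's or Gatewayz's exact taxonomy. normalize_chunk shows only the fallback branch of a stream normalizer: an unrecognized chunk is still dropped, but it now leaves a warning in the logs.

```python
from __future__ import annotations

import logging

logger = logging.getLogger("stream_normalizer")


def to_openai_error(status_code: int, detail: object) -> dict:
    """Rewrap a FastAPI-style {"detail": ...} body in the OpenAI error envelope."""
    # Assumed mapping: client errors vs. server errors.
    error_type = "invalid_request_error" if status_code < 500 else "api_error"
    return {"error": {"message": str(detail), "type": error_type, "code": str(status_code)}}


def normalize_chunk(chunk: dict) -> dict | None:
    """Pass OpenAI-shaped chunks through; warn instead of silently dropping anything else."""
    if "choices" in chunk:
        return chunk
    logger.warning("dropping unrecognized stream chunk with keys %s", sorted(chunk))
    return None
```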
Status: Complete
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-9.2.1 | Non-streaming returns 200 with content[0].text, usage.input_tokens, usage.output_tokens in Anthropic format | POST /v1/messages | P0 |
| AC-9.2.2 | Streaming returns SSE in Anthropic format (message_start, content_block_delta, message_stop) | POST with stream: true | P0 |
| AC-9.2.3 | Credits deducted using Anthropic token counts | Compare balance before/after | P0 |
| AC-9.2.4 | Anthropic Python SDK works with zero changes beyond base_url and api_key | anthropic.Anthropic(base_url="$BASE") (the SDK appends /v1/messages itself) | P0 |
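The sketch below is an event-order check for AC-9.2.2 on a captured stream. Anthropic-format streams also carry content_block_start, content_block_stop, message_delta and ping events. The check therefore only pins the first and last events and requires at least one content_block_delta. Both function names are hypothetical.

```python
def anthropic_event_names(body: str) -> list:
    """Extract the event: names from a captured Anthropic-format SSE body."""
    return [
        line[len("event:"):].strip()
        for line in body.splitlines()
        if line.startswith("event:")
    ]


def check_anthropic_stream(body: str) -> None:
    """AC-9.2.2: begins with message_start, has content deltas, ends with message_stop."""
    names = anthropic_event_names(body)
    if not names or names[0] != "message_start":
        raise ValueError("stream must begin with message_start")
    if "content_block_delta" not in names:
        raise ValueError("stream has no content_block_delta events")
    if names[-1] != "message_stop":
        raise ValueError("stream must end with message_stop")
```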
Status: Not Implemented (Deferred)
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-10.1.1 | Requests routed to nearest provider region for lowest latency | Test from different regions | Deferred (D-15) |
Status: Not Implemented (Deferred)
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-10.2.1 | EU customers' requests routed to EU-based providers | Test with EU IP | Deferred (D-16) |
Status: Complete
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-10.3.1 | Vercel serverless deployment works via api/index.py | Deploy to Vercel | P0 |
| AC-10.3.2 | Railway/Docker deployment works via start.sh | Deploy to Railway | P0 |
| AC-10.3.3 | Dev server starts with python src/main.py or uvicorn src.main:app --reload | Start locally | P0 |
Status: Complete
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-CC.1.1 | GET /api/stripe/credit-packages returns available packages (public, no auth) | Inspect response | P0 |
| AC-CC.1.2 | POST /api/stripe/checkout-session returns valid Stripe checkout URL | Create session | P0 |
| AC-CC.1.3 | Successful payment webhook adds credits to user's balance | Simulate payment_intent.succeeded webhook | P0 |
| AC-CC.1.4 | Webhook endpoint ALWAYS returns 200, even if processing fails | Send malformed webhook | P0 |
| AC-CC.1.5 | Payment history is paginated with amount, date, status | GET /api/stripe/payments | P0 |
| AC-CC.1.6 | Subscription checkout creates Stripe subscription and assigns plan | POST /api/stripe/subscription-checkout | P0 |
| AC-CC.1.7 | Subscription upgrade/downgrade/cancel work | Test each operation | P1 |
| AC-CC.1.8 | Webhook handles all events: payment_intent.succeeded, charge.succeeded, invoice.paid, customer.subscription.created | Test each event type | P0 |
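AC-CC.1.4 implies that the webhook handler catches and logs its own failures rather than propagating them. Stripe retries deliveries that do not receive a 2xx response. The sketch below shows that shape with a hypothetical handler table. It omits Stripe signature verification, which a real endpoint should perform before trusting the body.

```python
from __future__ import annotations

import json
import logging
from typing import Callable

logger = logging.getLogger("stripe_webhook")


def handle_webhook(raw_body: bytes, handlers: dict[str, Callable[[dict], None]]) -> int:
    """Process one Stripe event; the returned HTTP status is always 200 (AC-CC.1.4)."""
    try:
        event = json.loads(raw_body)
        handler = handlers.get(event["type"])
        if handler is None:
            logger.info("ignoring unhandled event type %s", event["type"])
        else:
            handler(event["data"]["object"])
    except Exception:
        # Malformed bodies and handler failures are logged, never surfaced as 4xx/5xx.
        logger.exception("webhook processing failed")
    return 200
```

The trade-off is that a failure inside a handler no longer triggers a Stripe retry. Failed credit grants must then be caught from the logs or by a reconciliation job.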
Status: Complete
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-CC.2.1 | Valid coupon redeems and adds correct credit amount | POST /coupons/redeem | P0 |
| AC-CC.2.2 | Expired coupon returns 400 | Redeem expired code | P0 |
| AC-CC.2.3 | Already-redeemed coupon (same user) returns 400 | Redeem twice | P0 |
| AC-CC.2.4 | User-specific coupon redeemed by wrong user returns 400/403 | Redeem with different user | P0 |
| AC-CC.2.5 | GET /coupons/available returns global + user-targeted coupons | Inspect response | P1 |
| AC-CC.2.6 | Redemption history shows past redemptions | GET /coupons/history | P1 |
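The following sketches the redemption checks in AC-CC.2.1 to AC-CC.2.4. Coupon and redeem are hypothetical, and the (code, user_id) set stands in for the redemption table. Returning 403 for the wrong-user case is one of the two statuses the criterion allows.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal


@dataclass(frozen=True)
class Coupon:
    # Hypothetical shape; field names are not taken from the Gatewayz schema.
    code: str
    credits: Decimal
    expires_at: datetime
    target_user_id: str | None = None  # None means a global coupon


def redeem(coupon: Coupon, user_id: str, now: datetime,
           redeemed: set[tuple[str, str]]) -> tuple[int, Decimal]:
    """Return (HTTP status, credits granted) per AC-CC.2.1 to AC-CC.2.4."""
    if now >= coupon.expires_at:
        return 400, Decimal(0)  # AC-CC.2.2: expired
    if coupon.target_user_id is not None and coupon.target_user_id != user_id:
        return 403, Decimal(0)  # AC-CC.2.4: targeted at someone else
    if (coupon.code, user_id) in redeemed:
        return 400, Decimal(0)  # AC-CC.2.3: already redeemed by this user
    redeemed.add((coupon.code, user_id))
    return 200, coupon.credits  # AC-CC.2.1
```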
Status: Complete
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-CC.3.1 | User generates unique referral code | POST /referral/generate | P0 |
| AC-CC.3.2 | Referral code validates successfully | POST /referral/validate | P0 |
| AC-CC.3.3 | Self-referral is prevented | Attempt self-referral | P0 |
| AC-CC.3.4 | Referral stats show total referred, conversions, rewards | GET /referral/stats | P1 |
| AC-CC.3.5 | Successful referral grants $10 credits to both parties on first $10+ purchase | Complete referral flow | P0 |
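Here is the referral payout rule from AC-CC.3.3 and AC-CC.3.5, written as a pure function. The names are hypothetical. Treating exactly $10.00 as qualifying follows the "$10+" wording.

```python
from __future__ import annotations

from decimal import Decimal

REFERRAL_BONUS = Decimal("10")           # $10 to each party (AC-CC.3.5)
MIN_QUALIFYING_PURCHASE = Decimal("10")  # first purchase must be $10 or more


def referral_bonus(referrer_id: str, referee_id: str, purchase: Decimal,
                   is_first_purchase: bool) -> dict[str, Decimal]:
    """Credits to grant when a referred user makes a purchase; empty if none apply."""
    if referrer_id == referee_id:
        raise ValueError("self-referral is not allowed")  # AC-CC.3.3
    if not is_first_purchase or purchase < MIN_QUALIFYING_PURCHASE:
        return {}
    return {referrer_id: REFERRAL_BONUS, referee_id: REFERRAL_BONUS}
```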
Status: Complete
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-CC.4.1 | Sessions can be created, listed, updated, deleted | CRUD on /v1/chat/sessions/* | P0 |
| AC-CC.4.2 | Messages can be saved individually and in batch | POST single and batch | P0 |
| AC-CC.4.3 | Full-text search returns matching sessions | POST /v1/chat/search | P0 |
| AC-CC.4.4 | Duplicate messages are deduplicated | Save same message twice, verify single entry | P1 |
| AC-CC.4.5 | Chat stats return accurate usage data | GET /v1/chat/stats | P1 |
| AC-CC.4.6 | Share links provide public read-only access | Create share, access without auth | P1 |
| AC-CC.4.7 | Feedback CRUD (create, read, update, delete) works per session | CRUD on /v1/chat/feedback/* | P1 |
Status: Complete
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-CC.5.1 | Created key is in gw_{env}_* format | POST /user/api-keys | P0 |
| AC-CC.5.2 | Key creation rate-limited to 10 per hour; 11th returns 429 | Create 11 keys within one hour | P0 |
| AC-CC.5.3 | Keys can be listed showing all active keys | GET /user/api-keys | P0 |
| AC-CC.5.4 | Keys can be updated (name, restrictions) | PUT /user/api-keys/{key_id} | P0 |
| AC-CC.5.5 | Keys can be deleted | DELETE /user/api-keys/{key_id} | P0 |
| AC-CC.5.6 | Deleted key no longer authenticates (returns 401) | Use deleted key | P0 |
| AC-CC.5.7 | Audit logs record key creation, usage, deletion | GET /user/api-keys/audit-logs | P1 |
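The sketch below covers the two key-creation criteria: the gw_{env}_* shape and the 10-per-hour cap. The suffix generator, the regular expression and the sliding-window limiter are illustrative assumptions. The service may use a fixed window or a different key length.

```python
from __future__ import annotations

import re
import secrets
import time
from collections import deque

KEY_RE = re.compile(r"^gw_[a-z]+_[A-Za-z0-9_-]+$")  # hypothetical reading of gw_{env}_*


def new_api_key(env: str) -> str:
    """Generate a key in gw_{env}_* form; the random suffix length is an assumption."""
    return f"gw_{env}_{secrets.token_urlsafe(32)}"


class KeyCreationLimiter:
    """Sliding one-hour window per user; the 11th creation inside it is refused (429)."""

    def __init__(self, limit: int = 10, window_s: float = 3600.0) -> None:
        self.limit, self.window_s = limit, window_s
        self._events: dict[str, deque[float]] = {}

    def allow(self, user_id: str, now: float | None = None) -> bool:
        now = time.monotonic() if now is None else now
        q = self._events.setdefault(user_id, deque())
        while q and now - q[0] >= self.window_s:
            q.popleft()  # forget creations older than the window
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True
```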
Status: Complete
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-CC.6.1 | POST /v1/images/generations returns 200 with image data or URL | POST with prompt | P0 |
| AC-CC.6.2 | Credits deducted based on image generation pricing | Compare balance before/after | P0 |
| AC-CC.6.3 | 0-credit user receives 402 | POST with 0-credit user | P0 |
Status: Complete
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-CC.7.1 | File upload transcription returns 200 with text | POST with audio file | P0 |
| AC-CC.7.2 | Base64 transcription returns 200 | POST /v1/audio/transcriptions/base64 | P0 |
| AC-CC.7.3 | Unsupported format returns appropriate error | POST with invalid format | P1 |
Status: Complete
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-CC.8.1 | GET /v1/tools returns available tools (web_search, text_to_speech) | Inspect response | P0 |
| AC-CC.8.2 | Tool definitions in OpenAI function-calling format | GET /v1/tools/definitions | P0 |
| AC-CC.8.3 | Nonexistent tool returns 404 | GET /v1/tools/fake_tool | P0 |
| AC-CC.8.4 | Web search execution returns results | POST /v1/tools/execute with web_search | P0 |
| AC-CC.8.5 | SSRF protection blocks internal/private IP ranges | Attempt internal URL in tool execution | P0 |
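One common way to meet AC-CC.8.5 is to resolve the target host and refuse any address in a non-public range before fetching. The sketch uses only the standard library. is_blocked_url is a hypothetical name, and the tool executor's actual checks may differ.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_blocked_url(url: str) -> bool:
    """Return True if a tool should refuse to fetch url.

    Rejects non-HTTP(S) schemes and any host that resolves to a private, loopback,
    link-local, reserved, multicast or unspecified address.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return True
    try:
        infos = socket.getaddrinfo(parts.hostname, None)
    except socket.gaierror:
        return True  # unresolvable: fail closed
    for info in infos:
        addr = ipaddress.ip_address(info[4][0])
        if (addr.is_private or addr.is_loopback or addr.is_link_local
                or addr.is_reserved or addr.is_multicast or addr.is_unspecified):
            return True
    return False
```

Because the actual fetch resolves the hostname again, a hardened implementation should connect to the address it validated, or re-check after connecting, to avoid DNS-rebinding bypasses.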
Status: Complete
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-CC.9.1 | Partner config is publicly accessible | GET /partner-trials/config/{code} | P0 |
| AC-CC.9.2 | Partner code check always returns 200 (valid/invalid in body) | GET /partner-trials/check/{code} | P0 |
| AC-CC.9.3 | Starting partner trial applies partner-specific credits and limits | POST /partner-trials/start with known partner code | P0 |
| AC-CC.9.4 | Partner trial daily limit is enforced | Exceed daily limit | P0 |
| AC-CC.9.5 | Partner trial config is cached (5-min in-memory) | Check timing | P1 |
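Below is a minimal TTL cache of the kind AC-CC.9.5 describes. It takes an injectable clock, so the 5-minute expiry can be tested without waiting. TTLCache is hypothetical and not thread-safe.

```python
from __future__ import annotations

import time
from typing import Any, Callable


class TTLCache:
    """In-memory cache whose entries expire ttl_s seconds after loading (300 s = 5 min)."""

    def __init__(self, ttl_s: float = 300.0, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_s = ttl_s
        self.clock = clock
        self._store: dict[str, tuple[float, Any]] = {}

    def get_or_load(self, key: str, loader: Callable[[str], Any]) -> Any:
        now = self.clock()
        cached = self._store.get(key)
        if cached is not None and now - cached[0] < self.ttl_s:
            return cached[1]  # fresh hit: no database round trip
        value = loader(key)
        self._store[key] = (now, value)
        return value
```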
Status: Complete (partial test coverage)
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-CC.10.1 | User can retrieve notification preferences | GET /user/notifications/preferences | P0 |
| AC-CC.10.2 | Usage report can be triggered on demand | POST /user/notifications/send-usage-report | P0 |
| AC-CC.10.3 | Test notification sends successfully | POST /user/notifications/test | P0 |
| AC-CC.10.4 | Notification failure does not crash the system | Disable Resend, verify graceful handling | P0 |
| AC-CC.10.5 | Retry logic on notification delivery failure | Verify 2-3 retries with backoff | P2 (NOT IMPLEMENTED) |
Known Issues:
- P2-4 (Delta Report): No retry logic, no persistent delivery tracking. On failure: logs error, returns False, continues silently.
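AC-CC.10.5 is not implemented. The sketch below shows one possible shape for it: a bounded number of attempts with exponential backoff that still returns False instead of raising, so AC-CC.10.4 keeps holding. send_with_retry is hypothetical. Because it sleeps between attempts, it should run in a background task rather than in a request handler.

```python
from __future__ import annotations

import logging
import time
from typing import Callable

logger = logging.getLogger("notifications")


def send_with_retry(send: Callable[[], bool], attempts: int = 3, base_delay_s: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call send() up to `attempts` times, doubling the delay after each failure.

    send() returns True on success and may also raise; both failure modes are retried.
    Returns False (never raises) once the last attempt has failed.
    """
    for attempt in range(1, attempts + 1):
        try:
            if send():
                return True
        except Exception:
            logger.exception("notification attempt %d failed", attempt)
        if attempt < attempts:
            sleep(base_delay_s * 2 ** (attempt - 1))
    return False
```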
Status: Complete
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-CC.11.1 | Non-admin users receive 403 on ALL admin endpoints | Use user key on admin endpoint | P0 |
| AC-CC.11.2 | Admin can list, search, view user details | GET /admin/users, /admin/users/{id} | P0 |
| AC-CC.11.3 | Admin credit grants respect per-transaction cap and 24h daily limit | Exceed limits | P0 |
| AC-CC.11.4 | Admin can assign plans | POST /admin/assign-plan | P0 |
| AC-CC.11.5 | System monitor returns user counts, credit totals, API usage | GET /admin/monitor | P0 |
| AC-CC.11.6 | Cache operations work (status, refresh, clear) | GET/POST cache endpoints | P1 |
| AC-CC.11.7 | Model sync can be triggered | POST /admin/model-sync/trigger | P1 |
| AC-CC.11.8 | GET /admin/model-sync/providers requires admin auth | Verify auth enforcement | P0 (KNOWN RISK) |
| AC-CC.11.9 | Bulk user delete by domain respects protected domains (gmail, yahoo, outlook) | Attempt protected domain delete | P0 |
| AC-CC.11.10 | Bulk user delete defaults to dry_run=true | Verify default behavior | P0 |
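The sketch below illustrates the two safety rails in AC-CC.11.9 and AC-CC.11.10. Protected domains are refused outright, and nothing is deleted unless the caller opts out of dry_run. The function name, the user dict shape and the exact protected-domain strings are assumptions.

```python
from __future__ import annotations

PROTECTED_DOMAINS = frozenset({"gmail.com", "yahoo.com", "outlook.com"})  # assumed exact strings


def bulk_delete_by_domain(domain: str, users: list[dict], dry_run: bool = True) -> dict:
    """Plan (and optionally perform) deletion of users whose email is at `domain`."""
    domain = domain.lower().strip()
    if domain in PROTECTED_DOMAINS:
        raise PermissionError(f"{domain} is a protected domain")  # AC-CC.11.9
    matched = [u for u in users if u["email"].lower().rsplit("@", 1)[-1] == domain]
    if not dry_run:  # AC-CC.11.10: deletion only happens when explicitly requested
        ids = {u["id"] for u in matched}
        users[:] = [u for u in users if u["id"] not in ids]
    return {"domain": domain, "dry_run": dry_run, "matched": len(matched)}
```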
Known Issues:
- P0-6 (Delta Report): GET /admin/model-sync/providers is documented as "No auth enforced", which leaks infrastructure details (33 providers).
Status: Complete
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-CC.12.1 | API keys are Fernet-encrypted in DB | Query DB directly | P0 |
| AC-CC.12.2 | API key lookup uses HMAC hash, not brute-force decryption | Verify code path | P0 |
| AC-CC.12.3 | SQL injection attempts are sanitized/rejected | '; DROP TABLE users; -- in inputs | P0 |
| AC-CC.12.4 | XSS payloads are sanitized/rejected | <script>alert(1)</script> in inputs | P0 |
| AC-CC.12.5 | Command injection blocked | ; rm -rf / in inputs | P0 |
| AC-CC.12.6 | Path traversal blocked | ../../etc/passwd in inputs | P0 |
| AC-CC.12.7 | Error messages never expose stack traces, internal paths, or sensitive data | Trigger errors, inspect responses | P0 |
| AC-CC.12.8 | Admin security violations logged in audit trail | Attempt unauthorized admin access | P0 |
| AC-CC.12.9 | Temporary/disposable email domains detected during registration | Register with user@tempmail.com | P1 |
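AC-CC.12.2 relies on a deterministic keyed hash stored next to the Fernet ciphertext, so a presented key can be found with one indexed lookup. The sketch shows only the HMAC-SHA256 side, using the standard library. LOOKUP_SECRET and the dict index are hypothetical stand-ins for a configured secret and an indexed database column. The Fernet encryption from AC-CC.12.1 is not shown.

```python
from __future__ import annotations

import hashlib
import hmac
import secrets

# Hypothetical: a real deployment loads this secret from configuration.
LOOKUP_SECRET = secrets.token_bytes(32)


def lookup_hash(api_key: str) -> str:
    """Deterministic HMAC-SHA256 of the key, suitable for an indexed column."""
    return hmac.new(LOOKUP_SECRET, api_key.encode("utf-8"), hashlib.sha256).hexdigest()


def find_key_row(index: dict[str, dict], presented_key: str) -> dict | None:
    """One indexed lookup; no stored key is decrypted to find a match."""
    return index.get(lookup_hash(presented_key))
```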
Status: Partial
| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-CC.13.1 | REST path function calling works (OpenAI tools → Vertex functionDeclarations) | POST with tools to Vertex model via REST | P0 |
| AC-CC.13.2 | SDK path function calling either works OR is avoided when tools present | POST with tools via SDK path | P1 (KNOWN BUG) |
| AC-CC.13.3 | Tool choice options (auto, required, none) are translated correctly | Test each tool_choice value | P1 |
Code References:
- src/services/google_vertex_client.py (lines 250-402, 662-707): REST path implemented
  - Lines 585-587: SDK path has TODO: "Function calling may not work correctly"
Known Issues:
- P1-5 (Delta Report): SDK path has TODO. If SDK path is used when tools are present, function calling silently fails.
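The sketch below shows the REST-path translation that AC-CC.13.1 and AC-CC.13.3 describe. It uses field names from the public Vertex AI Gemini REST API: functionDeclarations, and toolConfig.functionCallingConfig with the AUTO, ANY and NONE modes. to_vertex_tools is hypothetical. The mapping in src/services/google_vertex_client.py may differ, for instance in how it strips JSON Schema keywords that Vertex does not accept.

```python
from __future__ import annotations

from typing import Any


def to_vertex_tools(tools: list[dict], tool_choice: Any = "auto") -> dict:
    """Translate OpenAI tools/tool_choice into Vertex AI REST tools and toolConfig."""
    declarations = []
    for tool in tools:
        if tool.get("type") != "function":
            continue  # only function tools have a Vertex equivalent here
        fn = tool["function"]
        decl = {"name": fn["name"]}
        if "description" in fn:
            decl["description"] = fn["description"]
        if "parameters" in fn:
            decl["parameters"] = fn["parameters"]
        declarations.append(decl)

    if tool_choice == "auto":
        config = {"mode": "AUTO"}
    elif tool_choice == "required":
        config = {"mode": "ANY"}  # the model must call some function
    elif tool_choice == "none":
        config = {"mode": "NONE"}
    elif isinstance(tool_choice, dict) and tool_choice.get("type") == "function":
        # Forcing one named function: ANY restricted to that single name.
        config = {"mode": "ANY", "allowedFunctionNames": [tool_choice["function"]["name"]]}
    else:
        raise ValueError(f"unsupported tool_choice: {tool_choice!r}")

    return {
        "tools": [{"functionDeclarations": declarations}],
        "toolConfig": {"functionCallingConfig": config},
    }
```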
| Layer | Feature | Criteria | Status | Known Issues |
|---|---|---|---|---|
| 1 | API Key Auth | 10 | Complete | — |
| 1 | RBAC | 6 | Complete | — |
| 1 | IP Allowlists | 6 | Complete | — |
| 1 | Domain Restrictions | 4 | Complete | — |
| 1 | Three-Layer Rate Limiting | 14 | Complete | P0-5: Missing headers on L2/L3 |
| 1 | Input Guardrails (4 features) | 4 | Not Implemented | Deferred |
| 1 | Output Guardrails (3 features) | 3 | Not Implemented | Deferred |
| 2 | Model Resolution | 7 | Complete | — |
| 2 | General Router | 7 | Complete | — |
| 2 | Code Router | 7 | Complete | — |
| 2 | Provider Failover | 9 | Complete | — |
| 2 | Circuit Breakers | 10 | Complete | P1-7: Timing discrepancy (60s vs 5min) |
| 2 | Health-Weighted LB | 2 | Partial | — |
| 2 | Latency/Cost Optimal | 2 | Partial | Hardcoded latency model |
| 2 | Traffic Splitting | 1 | Not Implemented | Deferred |
| 3 | Tiered Health Monitoring | 10 | Complete | — |
| 3 | Passive Health Capture | 2 | Complete | — |
| 3 | Incident Management | 5 | Complete | — |
| 3 | Model Quality Scoring | 2 | Partial | Static/hardcoded |
| 3 | Per-Customer Quality | 1 | Not Implemented | Deferred |
| 3 | Provider Credit Monitoring | 3 | Partial | P1-1: OpenRouter only |
| 4 | Semantic Cache | 1 | Not Implemented | Deferred |
| 4 | Exact-Match Cache | 1 | Not Implemented | Deferred (infra exists) |
| 4 | Butter.dev Cache | 2 | Partial | P0-1: Ghost feature |
| 4 | Supporting Caches | 5 | Complete | — |
| 5 | Background Model Sync | 4 | Complete | — |
| 5 | Model Metadata Standard | 2 | Complete | — |
| 5 | Catalog Inclusion | 3 | Partial | P1-3: No gating at sync |
| 5 | HuggingFace Enrichment | 2 | Complete | — |
| 5 | Model Discovery & Search | 5 | Complete | — |
| 6 | Credit System | 14 | Complete | P0-2/3/4: Atomicity, pricing guard, refund |
| 6 | Plans & Tiers | 9 | Complete | P0-7: Config mismatch |
| 6 | Customer Usage Analytics | 5 | Partial | P1-4: Pagination bug, P2-1/2: Per-key, export |
| 6 | Customer Webhooks | 1 | Not Implemented | Deferred |
| 6 | SLA Tracking | 1 | Not Implemented | Deferred |
| 7 | Prompt Management | 1 | Not Implemented | Deferred |
| 7 | Batch/Async Inference | 1 | Not Implemented | Deferred |
| 7 | Evaluation & Testing | 1 | Not Implemented | Deferred |
| 7 | Playground | 1 | Not Implemented | Deferred |
| 8 | Metrics & Dashboards | 7 | Complete | — |
| 8 | Distributed Tracing | 3 | Complete | — |
| 8 | Error Tracking | 6 | Complete | — |
| 8 | AI-Specific Tracing | 2 | Partial | Arize/Braintrust gaps |
| 8 | Profiling | 2 | Complete | — |
| 8 | Customer Observability | 4 | Partial | P2-3: No latency API |
| 9 | OpenAI-Compatible API | 11 | Complete | P1-2: Error format, P1-8: Stream drops |
| 9 | Anthropic-Compatible API | 4 | Complete | — |
| 10 | Multi-Region Routing | 1 | Not Implemented | Deferred |
| 10 | Data Residency | 1 | Not Implemented | Deferred |
| 10 | Multi-Target Deployment | 3 | Complete | — |
| CC | Stripe Payments | 8 | Complete | — |
| CC | Coupons | 6 | Complete | — |
| CC | Referrals | 5 | Complete | — |
| CC | Chat History | 7 | Complete | — |
| CC | API Key Management | 7 | Complete | — |
| CC | Image Generation | 3 | Complete | — |
| CC | Audio Transcription | 3 | Complete | — |
| CC | Server-Side Tools | 5 | Complete | — |
| CC | Partner Trials | 5 | Complete | — |
| CC | Notifications | 5 | Complete | P2-4: No retry/delivery tracking |
| CC | Admin Operations | 10 | Complete | P0-6: Model-sync providers auth |
| CC | Security | 9 | Complete | — |
| CC | Google Vertex FC | 3 | Partial | P1-5: SDK path TODO |
| TOTAL | | 323 | | |
| Priority | Count | Description |
|---|---|---|
| P0 | 7 bugs across 46 criteria | Ghost features, billing atomicity, pricing guard, refund verification, rate limit headers, admin auth, trial config |
| P1 | 8 bugs across 28 criteria | Provider monitoring, error format, catalog gating, pagination, Vertex FC, overage, circuit breaker timing, stream normalization |
| P2 | 4 gaps across 12 criteria | Per-key usage, export, latency API, notification delivery |
| Deferred | 20 features, 24 criteria | Guardrails, caching, webhooks, batch, prompts, eval, SLA, geo-routing, GDPR, traffic splitting |
Source: Conceptual Model Features | Features | Delta Report | Testing Plan | Acceptance Criteria