
Features Acceptance Criteria

arminrad edited this page Mar 16, 2026 · 2 revisions

Acceptance Criteria

Reading path: Conceptual Model | Stability Definition | Conceptual Model Features | Features | Delta Report | Acceptance Criteria (you are here)

Read after: Delta Report (so you know the gaps and priorities). You're at the end of the reading path! From here, go to Testing Guide to see how these criteria are tested.


TL;DR — This is the single source of truth for acceptance criteria across all 56 features in the Conceptual Model. Each feature has: what it must do, what it must NOT do, testable acceptance criteria, implementation status, code references, known issues, and priority. Organized by the 10-layer architecture. Use this to verify any feature is "done." For detailed Given/When/Then format criteria and boundary validations, see Conceptual Model Acceptance Criteria.


Consolidation note: This document is the primary acceptance criteria reference. It incorporates the implementation-aware criteria. For the spec-pure criteria (Given/When/Then format with integration requirements), see Conceptual Model Acceptance Criteria. For compact test-plan-linked criteria, see the Testing Plan directly.


How to Read This Document

Each feature section includes:

  • Description: What the feature does and its boundaries (from Conceptual Model)
  • Implementation Status: Current state (Complete / Partial / Not Implemented)
  • Acceptance Criteria: Numbered, testable statements — a feature is accepted when ALL criteria pass
  • Code References: File paths and line numbers for verification
  • Known Issues: Bugs, gaps, or discrepancies found during code investigation
  • Priority: P0 (must fix before release), P1 (should fix), P2 (nice to have), Deferred (post-release)

Layer 1: Ingress

1.1 API Key Authentication

Status: Complete

What it does: Authenticates every API request using API keys encrypted at rest with Fernet (AES-128-CBC with HMAC-SHA256 authentication). Keys are looked up via an indexed HMAC-SHA256 hash, giving O(log n) retrieval. Validates that keys are active, not expired, and not rate-limited.

What it does NOT do: No OAuth/JWT for API requests. No automatic key rotation. No multi-key auth per request.

# Criterion Verification Priority
AC-1.1.1 Valid API key in Authorization: Bearer gw_* header returns 200 Send request with valid key P0
AC-1.1.2 Invalid API key returns 401, never 200 or 500 Send request with Bearer invalid_key P0
AC-1.1.3 Expired API key returns 401 with clear message Use a key past its expires_at P0
AC-1.1.4 Deactivated API key (is_active=false) returns 401 Deactivate key, then use it P0
AC-1.1.5 API keys in DB are Fernet-encrypted ciphertext, never plaintext Query api_keys_new table directly, verify encrypted_key column is ciphertext P0
AC-1.1.6 Key lookup uses the HMAC-SHA256 hash index, not brute-force decryption of all keys Verify the key_hash column is indexed and that lookup time stays roughly flat when timed with 1 key vs 1,000 keys P0
AC-1.1.7 Key format is gw_{env}_{43_random_chars} (e.g., gw_live_abc123...) Create new key, verify format regex P0
AC-1.1.8 Key creation stores the last 4 characters (last4) for user-friendly identification Create key, check last4 field in response and DB P1
AC-1.1.9 Authentication is cached (5-min TTL, 512-entry LRU), so a second request with the same key is faster Time two consecutive auth calls: the second should take <5ms vs 50-150ms for the first P1
AC-1.1.10 When Redis is down, auth cache falls back to local memory — requests are never blocked Stop Redis, verify auth still works P0
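The key scheme in AC-1.1.5 through AC-1.1.8 can be sketched with the Python standard library. This is a minimal illustration, not the code in src/security/security.py. `LOOKUP_SECRET`, the function names, and the use of `secrets.token_urlsafe(32)` (which yields exactly 43 URL-safe characters) are assumptions. Fernet encryption of the stored key is omitted; the sketch shows only the deterministic HMAC that makes an indexed lookup possible.

```python
import hashlib
import hmac
import re
import secrets

# Hypothetical server-side secret for the lookup hash; in practice this
# would be loaded from configuration, never hardcoded.
LOOKUP_SECRET = b"example-hmac-secret"

# gw_{env}_{43 URL-safe random characters}
KEY_PATTERN = re.compile(r"^gw_[a-z]+_[A-Za-z0-9_-]{43}$")


def generate_api_key(env: str = "live") -> str:
    """Return a new key such as gw_live_<43 chars>."""
    # token_urlsafe(32) base64url-encodes 32 random bytes without padding: 43 chars.
    return f"gw_{env}_{secrets.token_urlsafe(32)}"


def lookup_hash(api_key: str) -> str:
    """Deterministic HMAC-SHA256 digest stored in an indexed key_hash column."""
    return hmac.new(LOOKUP_SECRET, api_key.encode("utf-8"), hashlib.sha256).hexdigest()


def last4(api_key: str) -> str:
    """Suffix shown to users so they can tell their keys apart."""
    return api_key[-4:]
```

The HMAC is deterministic, so `key_hash` can carry an ordinary index and the gateway never has to decrypt every stored key to find a match. Fernet ciphertext is randomized per encryption and could not serve that purpose.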

Code References:

  • src/security/security.py — Fernet encryption, HMAC hashing
  • src/security/deps.py: get_api_key(), get_current_user(), validate_api_key_security()
  • src/db/api_keys.py — Key CRUD, key lookup by hash

1.2 Role-Based Access Control (RBAC)

Status: Complete

What it does: Assigns roles (admin, team, dev, free) to users. Permissions checked at dependency-injection level before route handlers execute. Role changes are audit-logged.

What it does NOT do: No granular per-model permissions. No custom roles. No team-level RBAC. No provider-level permissions.

# Criterion Verification Priority
AC-1.2.1 Non-admin API key returns 403 on ALL /admin/* endpoints GET /admin/users with user key P0
AC-1.2.2 Admin API key returns 200 on admin endpoints GET /admin/users with admin key P0
AC-1.2.3 Unauthorized admin access attempts are logged via audit_logger.log_security_violation("UNAUTHORIZED_ADMIN_ACCESS") Attempt admin access with user key, check audit log P0
AC-1.2.4 Role updates require admin auth and are logged with a reason POST /admin/roles/update with user_id, new_role, reason P0
AC-1.2.5 GET /admin/roles/permissions/{role} returns the correct permission set for each role Check all 4 roles P1
AC-1.2.6 Role change audit log is retrievable at GET /admin/roles/audit/log Verify entries with timestamps and reasons P1

Code References:

  • src/security/deps.py: require_admin dependency
  • src/routes/admin.py — Admin route handlers
  • src/db/roles.py — Role management

1.3 Per-Key IP Allowlists

Status: Complete

What it does: Restricts API key usage to specific IP addresses or CIDR ranges. Requests from non-allowlisted IPs are rejected before processing.

What it does NOT do: No geo-based restrictions. No IPv6 ranges. No automatic IP suggestions.

# Criterion Verification Priority
AC-1.3.1 Admin can create IP allowlist entries with IPv4 addresses POST /api/admin/ip-whitelist with {"ip": "1.2.3.4"} P0
AC-1.3.2 Admin can create IP allowlist entries with CIDR notation POST /api/admin/ip-whitelist with {"ip": "10.0.0.0/24"} P0
AC-1.3.3 API key with allowlist rejects requests from non-allowed IPs with 403 Use key from IP not in allowlist P0
AC-1.3.4 API key with allowlist accepts requests from allowed IPs Use key from allowlisted IP P0
AC-1.3.5 POST /api/admin/ip-whitelist/check correctly reports allowed vs blocked IPs Test with both allowed and blocked IPs P1
AC-1.3.6 Allowlist entries can be listed, updated, and deleted CRUD operations on /api/admin/ip-whitelist/* P1

Code References:

  • src/routes/admin.py — IP allowlist endpoints
  • src/security/deps.py — IP validation in validate_api_key_security()

1.4 Domain Restrictions

Status: Complete

What it does: Limits which HTTP referrer domains can use a specific API key. Prevents stolen keys from being used on unauthorized domains.

What it does NOT do: No domain ownership validation. No subdomain wildcards. No enforcement for server-side requests (the restriction applies only when a Referer header is present).

# Criterion Verification Priority
AC-1.4.1 API key with domain restriction rejects requests with wrong Referer header Send request with Referer: https://unauthorized.com P0
AC-1.4.2 API key with domain restriction accepts requests with correct Referer Send request with configured Referer domain P0
AC-1.4.3 Requests without Referer header bypass domain restriction (server-side usage) Send request without Referer header P0
AC-1.4.4 Multiple domains can be configured per key Configure 3 domains, verify all work P1
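The Referer rule in AC-1.4.1 to AC-1.4.4 reduces to an exact hostname comparison. This sketch assumes allowed domains are stored as bare hostnames (e.g. `example.com`). The function name is hypothetical, and the real check in `validate_api_key_security()` may normalize values differently.

```python
from typing import Optional
from urllib.parse import urlsplit


def referer_allowed(referer: Optional[str], allowed_domains: list[str]) -> bool:
    """Check a Referer header against a key's configured domains."""
    if not referer:
        # No Referer: treated as server-side usage and not restricted (AC-1.4.3).
        return True
    # urlsplit lowercases the hostname; None if the header has no host part.
    host = urlsplit(referer).hostname or ""
    # Exact match only: "api.example.com" does NOT match "example.com",
    # mirroring the documented absence of subdomain wildcards.
    return host in {d.lower() for d in allowed_domains}
```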

Code References:

  • src/security/deps.py: validate_api_key_security() domain check

1.5 Three-Layer Rate Limiting

Status: Complete (with known header gap on Layers 2 and 3)

What it does:

  • Layer 1 (IP): Security middleware with behavioral analysis and velocity detection. 300 RPM for unauthenticated requests; authenticated users are exempt.
  • Layer 2 (API Key): Redis-backed per-key limits tied to plan tier.
  • Layer 3 (Anonymous): Stricter limits for unauthenticated requests.
  • Fallback: In-memory rate limiter when Redis is unavailable.

What it does NOT do: No per-model rate limits. No burst/token-bucket. No cross-instance IP state sharing. No charging for rejected requests (they consume zero credits).

# Criterion Verification Priority
AC-1.5.1 Unauthenticated requests exceeding 300 RPM from the same IP receive 429 Send 301 requests from one IP within one minute P0
AC-1.5.2 Authenticated users are exempt from IP-level rate limiting Verify no IP block on auth requests P0
AC-1.5.3 API key exceeding plan RPM receives 429 Exceed per-key limit P0
AC-1.5.4 Anonymous rate limits are stricter than authenticated limits Compare thresholds for anon vs auth P0
AC-1.5.5 Layer 1 429 response includes Retry-After, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, X-RateLimit-Reason, X-RateLimit-Mode Trigger Layer 1 429, inspect headers P0
AC-1.5.6 Layer 2 429 response includes Retry-After, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset Trigger Layer 2 429, inspect headers P0 (KNOWN BUG)
AC-1.5.7 Layer 3 429 response includes Retry-After and X-RateLimit-* headers Trigger Layer 3 429, inspect headers P0 (KNOWN BUG)
AC-1.5.8 When Redis is down, rate limiting continues via in-memory fallback — requests are never blocked Stop Redis, verify rate limiting works P0
AC-1.5.9 Velocity mode activates when the error rate exceeds 25% and reduces limits to 50% of normal Trigger >25% error rate, check GET /velocity-mode-status P0
AC-1.5.10 Velocity mode deactivates after 3 minutes of normal error rates Wait for cooldown, verify normal limits restored P1
AC-1.5.11 Rate limit configuration viewable at GET /user/rate-limits Check response format P1
AC-1.5.12 Per-key rate limits updatable via PUT /user/rate-limits/{key_id} Update and verify enforcement P1
AC-1.5.13 Auth endpoint rate-limits to 10 requests per 15 minutes per IP POST /auth 11 times, 11th returns 429 P0
AC-1.5.14 Registration rate-limits to 3 requests per hour per IP POST /auth/register 4 times, 4th returns 429 P0
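AC-1.5.9 and AC-1.5.10 describe a hysteresis loop. Velocity mode turns on when errors exceed 25% and turns off only after 3 minutes of normal error rates. The sketch below shows one way to implement that. The 60-second error-rate window, the 20-sample minimum, and the class name are hypothetical assumptions. The clock is injectable so the cooldown can be tested without waiting.

```python
import time
from collections import deque
from typing import Callable, Optional


class VelocityMode:
    """Sketch: halve limits while the recent error rate exceeds a threshold."""

    def __init__(self, window_s: float = 60.0, threshold: float = 0.25,
                 cooldown_s: float = 180.0, min_samples: int = 20,
                 clock: Callable[[], float] = time.monotonic):
        self.window_s = window_s        # assumed error-rate window
        self.threshold = threshold      # 25% from AC-1.5.9
        self.cooldown_s = cooldown_s    # 3 minutes from AC-1.5.10
        self.min_samples = min_samples  # assumed guard against tiny samples
        self.clock = clock
        self.events: deque = deque()    # (timestamp, is_error)
        self.active = False
        self.normal_since: Optional[float] = None

    def record(self, is_error: bool) -> None:
        now = self.clock()
        self.events.append((now, is_error))
        # Drop outcomes older than the window.
        while self.events and self.events[0][0] < now - self.window_s:
            self.events.popleft()
        rate = sum(1 for _, e in self.events if e) / len(self.events)
        if rate > self.threshold:
            self.normal_since = None  # errors are high again: restart cooldown
            if len(self.events) >= self.min_samples:
                self.active = True
        elif self.active:
            if self.normal_since is None:
                self.normal_since = now  # first normal reading: start cooldown
            elif now - self.normal_since >= self.cooldown_s:
                self.active = False
                self.normal_since = None

    def effective_limit(self, base_rpm: int) -> int:
        return base_rpm // 2 if self.active else base_rpm
```

The minimum sample count keeps a single early failure (a 100% error rate over one request) from halving everyone's limits. The cooldown is re-evaluated only when `record()` is called, so a production version would likely also check it on a timer.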

Code References:

  • src/middleware/security_middleware.py (lines 647-716) — Layer 1, headers present
  • src/services/rate_limiting.py (lines 78-94) — Layer 2, RateLimitResult dataclass has fields but NOT converted to HTTP headers
  • src/services/anonymous_rate_limiter.py — Layer 3, NO headers
  • src/services/rate_limiting_fallback.py — In-memory fallback

Known Issues:

  • P0-5 (Delta Report): Layer 2 RateLimitResult fields exist but are not converted to HTTP response headers. Layer 3 has no rate limit headers at all. Clients get bare 429 rejections with no retry information.
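Fixing P0-5 mostly means serializing fields that Layer 2 already computes. The sketch below rests on two assumptions: the `RateLimitResult` fields shown here, and sending `X-RateLimit-Reset` as a Unix timestamp. Both should be checked against the existing dataclass and against Layer 1's header format. The helper name is hypothetical.

```python
import math
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class RateLimitResult:
    # Assumed fields; match these to the real dataclass in
    # src/services/rate_limiting.py.
    allowed: bool
    limit: int
    remaining: int
    reset_at: float  # Unix timestamp when the current window resets


def rate_limit_headers(result: RateLimitResult,
                       now: Optional[float] = None) -> dict[str, str]:
    """Convert a limiter decision into HTTP response headers."""
    now = time.time() if now is None else now
    headers = {
        "X-RateLimit-Limit": str(result.limit),
        "X-RateLimit-Remaining": str(max(result.remaining, 0)),
        "X-RateLimit-Reset": str(math.ceil(result.reset_at)),
    }
    if not result.allowed:
        # Retry-After as delta-seconds (RFC 9110); rounded up and floored
        # at 1 so clients never retry immediately into the same window.
        headers["Retry-After"] = str(max(1, math.ceil(result.reset_at - now)))
    return headers
```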

1.6–1.9 Input Guardrails (PII Detection, Prompt Injection, Topic Restrictions, Content Moderation)

Status: Not Implemented (Deferred)

What these would do: PII scanning (phone, SSN, email, credit card), prompt injection pattern detection, per-key topic restrictions, content moderation via external classifiers.

# Criterion Verification Priority
AC-1.6.1 PII detection scans prompts for phone numbers, SSNs, emails, credit card numbers Send prompt with PII Deferred
AC-1.7.1 Prompt injection patterns that attempt to override system prompts are detected and blocked Send known injection pattern Deferred
AC-1.8.1 Per-API-key topic restrictions limit responses to configured domains Configure restriction, test out-of-domain Deferred
AC-1.9.1 Content moderation blocks harmful inputs before reaching providers Send harmful content Deferred

Note: These are Conceptual Model features (D-1 through D-4 in Delta Report). Not required for stable release. No code exists.


1.10–1.12 Output Guardrails (Content Filtering, Structured Output Validation, Hallucination Flags)

Status: Not Implemented (Deferred)

# Criterion Verification Priority
AC-1.10.1 Output content filtering scans responses for policy violations before returning Trigger policy-violating response Deferred
AC-1.11.1 Structured output validation confirms JSON schema conformance when requested Request JSON schema output Deferred (D-5, Small effort)
AC-1.12.1 Provider-side safety metadata (refusals, safety triggers) is surfaced in standardized format Trigger safety filter, inspect response Deferred

Layer 2: Core Routing Engine

2.1 Model Resolution Pipeline

Status: Complete

What it does: Three-stage pipeline: alias normalization (120+ aliases) → provider detection (overrides → format rules → mapping tables → org-prefix fallbacks) → model ID transformation (provider-native format).

What it does NOT do: No user-defined aliases. No version/snapshot resolution. No per-modality routing differences.

# Criterion Verification Priority
AC-2.1.1 gpt-4o resolves to openai/gpt-4o POST /v1/chat/completions with model: "gpt-4o" P0
AC-2.1.2 r1 resolves to deepseek/deepseek-r1 POST /v1/chat/completions with model: "r1" P0
AC-2.1.3 Canonical IDs (e.g., openai/gpt-4o) work directly without alias resolution POST with canonical ID P0
AC-2.1.4 Provider detection correctly routes google/gemini-* models to Vertex when credentials available POST with Gemini model P0
AC-2.1.5 No alias maps to itself or forms a cycle through other aliases Inspect MODEL_ALIASES dict for self-references and cycles P0
AC-2.1.6 Fireworks model IDs are transformed to accounts/fireworks/models/... format POST with Fireworks model, verify upstream call format P1
AC-2.1.7 Nonexistent model returns 400 or 404, not 500 POST with model: "nonexistent/model" P0
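AC-2.1.5 can be checked mechanically. This sketch walks each alias until it reaches a name that is not itself an alias. It flags any alias that maps to itself, is part of a loop, or leads into one. It assumes `MODEL_ALIASES` is a plain `dict[str, str]`, and the helper name is hypothetical.

```python
def find_alias_problems(aliases: dict[str, str]) -> list[str]:
    """Return alias keys that map to themselves or lead into a cycle."""
    problems = []
    for start in aliases:
        seen = {start}
        current = aliases[start]
        # Follow the chain while the target is itself an alias.
        while current in aliases:
            if current in seen:
                problems.append(start)
                break
            seen.add(current)
            current = aliases[current]
    return problems
```

In a unit test, `assert find_alias_problems(MODEL_ALIASES) == []` covers both the self-reference case and the multi-hop cycle case.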

Code References:

  • src/services/models.py: MODEL_ALIASES dict, resolution pipeline
  • src/services/model_transformations.py — Provider-specific ID transformations
  • src/services/model_availability.py — Availability checking

2.2 Intelligent Routing — General Router

Status: Complete

What it does: ML-powered model selection via NotDiamond. Four modes: quality (openai/gpt-4o), cost (openai/gpt-4o-mini), latency (groq/llama-3.3-70b-versatile), balanced (anthropic/claude-sonnet-4). Falls back to mode-specific defaults when NotDiamond unavailable.

What it does NOT do: No user feedback learning. No custom model pools. No routing constraints.

# Criterion Verification Priority
AC-2.2.1 router:general:quality selects a high-quality model and returns 200 POST chat with model: "router:general:quality" P0
AC-2.2.2 router:general:cost selects a cheaper model than quality mode Compare selected models for same prompt P0
AC-2.2.3 router:general:latency selects a low-latency model POST and verify selection P0
AC-2.2.4 router:general:balanced considers quality, cost, and latency POST and verify selection P0
AC-2.2.5 When NotDiamond is unavailable, fallback models are used without error Disable NotDiamond, verify graceful fallback P0
AC-2.2.6 GET /general-router/settings/options returns available strategies and model pools Inspect response P1
AC-2.2.7 POST /general-router/test returns selected model + reasoning POST with sample prompt P1

Code References:

  • src/services/general_router.py — Routing logic, NotDiamond integration
  • src/routes/general_router.py — Endpoints

2.3 Intelligent Routing — Code Router

Status: Complete

What it does: Benchmark-driven model selection for coding tasks. 4 tiers by SWE-bench/HumanEval scores. Modes: auto (complexity-based), price, quality, agentic. Static data from code_quality_priors.json.

What it does NOT do: No code execution. No feedback learning. No custom tiers. No language detection.

# Criterion Verification Priority
AC-2.3.1 router:code:auto classifies prompt complexity and selects appropriate tier POST with code prompt P0
AC-2.3.2 router:code:quality selects highest-tier code model POST and verify P0
AC-2.3.3 router:code:price selects cost-effective code model POST and verify P0
AC-2.3.4 router:code:agentic selects model optimized for multi-step tool use POST and verify P0
AC-2.3.5 GET /code-router/tiers returns models with SWE-bench/HumanEval scores Inspect response P0
AC-2.3.6 Code router works entirely from in-memory data (no DB/Redis dependency) Verify response with Redis down P0
AC-2.3.7 POST /code-router/test returns selected model and routing rationale POST with sample prompt P1

Code References:

  • src/services/code_router.py — Routing logic, tier selection
  • src/services/code_quality_priors.json — Static benchmark data
  • src/routes/code_router.py — Endpoints

2.4 Provider Failover

Status: Complete

What it does: 14-provider prioritized failover chain. Failover triggers on 401/402/403/404/502/503/504. It does NOT trigger on 400 (user error) or 429 (retried with backoff instead). Model-aware rules: OpenAI models fail over only to OpenRouter, Anthropic models fail over only to OpenRouter, and open-source models can fail over to all providers.

What it does NOT do: No mid-stream failover. No user-configured chains. No same-pricing guarantee across providers.
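The status-code classification and the model-aware chain rules above can be summarized in a few lines. This is a hedged sketch, not `build_provider_failover_chain()`. The provider names in `ALL_PROVIDERS` are placeholders for the real 14-provider priority list. The section does not cover unlisted statuses such as 500, so treating them as "return to caller" is an assumption.

```python
FAILOVER_STATUSES = {401, 402, 403, 404, 502, 503, 504}

# Placeholder ordering; the real priority list lives in
# src/services/provider_failover.py.
ALL_PROVIDERS = ["fireworks", "together", "groq", "openrouter"]


def classify(status: int) -> str:
    """Decide what to do with an upstream error status."""
    if status in FAILOVER_STATUSES:
        return "failover"
    if status == 429:
        return "retry"   # same provider, with backoff
    return "return"      # e.g. 400: surface to the caller unchanged


def failover_chain(model: str, open_breakers: set[str]) -> list[str]:
    """Build the ordered provider list for a canonical model ID."""
    vendor = model.split("/", 1)[0]
    if vendor in ("openai", "anthropic"):
        # Proprietary models: native provider first, then OpenRouter only.
        chain = [vendor, "openrouter"]
    else:
        chain = list(ALL_PROVIDERS)
    # Skip providers whose circuit breaker is OPEN (AC-2.4.8).
    return [p for p in chain if p not in open_breakers]
```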

# Criterion Verification Priority
AC-2.4.1 Primary provider 502/503/504 → request succeeds via fallback transparently Force primary failure, verify success P0
AC-2.4.2 Provider 401/402/403/404 → fail over to next provider Force auth error, verify failover P0
AC-2.4.3 Provider 400 (user error) → returns 400 to user, NO failover Send malformed request P0
AC-2.4.4 Provider 429 → retries with backoff, does NOT fail over Trigger rate limit, verify retry behavior P0
AC-2.4.5 OpenAI models only fail over along OpenAI → OpenRouter Inspect failover chain for openai/gpt-4o P0
AC-2.4.6 Anthropic models only fail over along Anthropic → OpenRouter Inspect failover chain for anthropic/claude-sonnet-4 P0
AC-2.4.7 Open-source models can fail over across all providers Inspect chain for meta-llama/llama-3-70b P0
AC-2.4.8 Failover chain skips providers with OPEN circuit breakers Open a breaker, verify provider is skipped P0
AC-2.4.9 User receives no indication of failover (transparent to client) Monitor response during failover P0

Code References:

  • src/services/provider_failover.py — Failover chain construction, error classification
  • src/routes/chat.py: build_provider_failover_chain() integration

2.5 Circuit Breakers

Status: Complete (with timing discrepancy)

What it does: Per-provider circuit breakers. CLOSED → OPEN (5 consecutive failures) → HALF_OPEN (after timeout) → CLOSED (3 consecutive successes) or back to OPEN. Redis + in-memory state.

What it does NOT do: No per-provider threshold configuration. No error-type differentiation. No operator alerts. No persistent state beyond Redis.
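The following sketches the state machine described above, in memory only. The real breaker also persists state in Redis under `circuit_breaker:{provider}:*`. Thresholds follow this section: 5 consecutive failures open the breaker, and 3 consecutive successes close it. `open_timeout_s` defaults to the 60 seconds found in code; see P1-7 for the conflict with the documented 5 minutes. Class and method names are hypothetical.

```python
import time
from typing import Callable


class CircuitBreaker:
    """In-memory sketch of CLOSED -> OPEN -> HALF_OPEN -> CLOSED."""

    def __init__(self, failure_threshold: int = 5, success_threshold: int = 3,
                 open_timeout_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        self.open_timeout_s = open_timeout_s
        self.clock = clock
        self.state = "CLOSED"
        self.failures = 0
        self.successes = 0
        self.opened_at = 0.0

    def allow_request(self) -> bool:
        # After the timeout, OPEN lets probe traffic through as HALF_OPEN.
        if self.state == "OPEN" and self.clock() - self.opened_at >= self.open_timeout_s:
            self.state = "HALF_OPEN"
            self.successes = 0
        return self.state != "OPEN"

    def record_success(self) -> None:
        if self.state == "HALF_OPEN":
            self.successes += 1
            if self.successes >= self.success_threshold:
                self.state = "CLOSED"
                self.failures = 0
        else:
            self.failures = 0  # only *consecutive* failures count

    def record_failure(self) -> None:
        if self.state == "HALF_OPEN":
            self._open()  # any failure while probing reopens immediately
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self._open()

    def _open(self) -> None:
        self.state = "OPEN"
        self.opened_at = self.clock()
        self.failures = 0
        self.successes = 0
```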

# Criterion Verification Priority
AC-2.5.1 New provider starts in CLOSED state GET /circuit-breakers/{new_provider} P0
AC-2.5.2 After 5 consecutive failures, state transitions to OPEN Send 5 failing requests, check state P0
AC-2.5.3 OPEN state prevents requests to that provider Verify provider is skipped in failover P0
AC-2.5.4 After timeout period, OPEN transitions to HALF_OPEN Wait for timeout, check state P0
AC-2.5.5 In HALF_OPEN, 3 consecutive successful requests transition the breaker to CLOSED Send 3 successes, check state P0
AC-2.5.6 In HALF_OPEN, a failed request transitions back to OPEN Send failure, check state P0
AC-2.5.7 POST /circuit-breakers/{provider}/reset resets to CLOSED Reset and verify P0
AC-2.5.8 POST /circuit-breakers/reset-all resets all breakers Reset all and verify P0
AC-2.5.9 Circuit breaker endpoints require NO auth (public) Verify no auth needed P1
AC-2.5.10 Prometheus metrics emitted on state transitions Check circuit_breaker_state_transitions_total P1

Code References:

  • src/services/circuit_breaker.py (line 67) — Default timeout 60 seconds
  • Redis keys: circuit_breaker:{provider}:{state|failure_count|success_count|opened_at} (3600s TTL)

Known Issues:

  • P1-7 (Delta Report): The code uses a 60-second timeout, but both the Conceptual Model and the wiki Testing Plan say 5 minutes. Either the code or the docs must be updated.

2.6 Health-Weighted Load Balancing

Status: Partial

What it does: Checks primary provider health score before routing. Below-threshold providers are demoted in failover chain.

What it does NOT do: No proportional traffic splitting by health score. No per-model health. No predictive health.

# Criterion Verification Priority
AC-2.6.1 When primary provider health is below threshold, a healthier provider is promoted Degrade a provider, verify chain reordering P1
AC-2.6.2 Health-based promotion is a binary decision (promote or don't) Verify no weighted splitting P1

2.7–2.8 Latency-Optimal / Cost-Optimal Selection

Status: Partial

What it does: Routes to the lowest-latency or cheapest provider for the same model. The General Router's "latency" mode is hardcoded to groq/llama-3.3-70b-versatile.

# Criterion Verification Priority
AC-2.7.1 Latency mode selects a low-latency provider Verify model selection via router P1
AC-2.8.1 Cost mode selects cheapest capable provider Compare pricing of selected vs alternatives P1

Known Issues: No dynamic latency-optimal selection — latency mode hardcodes a specific model rather than measuring real-time latency. Deferred for post-release.


2.9 Traffic Splitting

Status: Not Implemented (Deferred)

# Criterion Verification Priority
AC-2.9.1 Traffic for same model is distributed across providers at configured ratios Monitor provider selection distribution Deferred (D-17)

Layer 3: Intelligence

3.1 Tiered Health Monitoring

Status: Complete

What it does: Continuous monitoring at intervals by tier: Critical (5min), Popular (30min), Standard (2-4hr), On-Demand (when requested). Health checks verify availability and latency.

# Criterion Verification Priority
AC-3.1.1 GET /health always returns 200, even when dependencies are degraded Call when DB is down P0
AC-3.1.2 Health response includes version, status, and timestamp Inspect response P0
AC-3.1.3 GET /health/system returns memory, CPU, and connection pool stats Inspect response P0
AC-3.1.4 Provider health scores are 0-100 per provider GET /health/providers P0
AC-3.1.5 Model health shows healthy, degraded, or down per model GET /health/models P0
AC-3.1.6 GET /health/quick is sub-millisecond (static response) Time the endpoint P1
AC-3.1.7 GET /health/railway returns comprehensive check (DB, Redis, providers) Inspect response P1
AC-3.1.8 Gateway health dashboard returns HTML and JSON formats GET /health/gateways/dashboard and /data P1
AC-3.1.9 Health insights provide actionable recommendations GET /health/insights P2
AC-3.1.10 Background monitoring can be started and stopped POST /health/monitoring/start, /stop P1

Code References:

  • src/services/intelligent_health_monitor.py — Tiered monitoring
  • src/services/autonomous_monitor.py — Background monitoring
  • src/routes/health.py — Health endpoints

3.2 Passive Health Capture

Status: Complete

What it does: Every real inference request contributes health data as a background task — success/failure, latency, tokens, provider response codes. Zero overhead on request path.
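Zero overhead on the request path means the capture is scheduled, not awaited. Below is a minimal asyncio sketch of that pattern. The function names and the in-memory `health_events` list are hypothetical stand-ins for the real health data store. A web framework's own background-task mechanism would serve the same role.

```python
import asyncio
import time

# Hypothetical stand-in for the real health data store.
health_events: list[dict] = []


async def record_health(provider: str, status: int,
                        latency_ms: float, tokens: int) -> None:
    # A real implementation would write to Redis or the DB.
    health_events.append({"provider": provider, "status": status,
                          "latency_ms": latency_ms, "tokens": tokens})


async def handle_request(background: set) -> str:
    start = time.perf_counter()
    response = "ok"  # placeholder for the upstream provider call
    latency_ms = (time.perf_counter() - start) * 1000
    # Schedule the capture without awaiting it; the caller gets the response now.
    task = asyncio.create_task(
        record_health("example-provider", 200, latency_ms, 42))
    # Keep a strong reference until the task finishes.
    background.add(task)
    task.add_done_callback(background.discard)
    return response
```

Holding a reference in the `background` set matters. The event loop keeps only weak references to tasks, so an unreferenced task can disappear mid-execution.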

# Criterion Verification Priority
AC-3.2.1 Health data is captured after response is returned (no latency impact on user) Verify background task execution P0
AC-3.2.2 Captured data includes: latency, tokens, status, provider Inspect health data store P1

3.3 Incident Management

Status: Complete

What it does: Auto-creates incidents on health degradation. Severity levels, timestamps, captured logs, resolution tracking, MTTR calculation.

# Criterion Verification Priority
AC-3.3.1 Downtime incidents can be listed with filters GET /admin/downtime/incidents?status=ongoing P0
AC-3.3.2 Incidents can be resolved with notes POST /admin/downtime/incidents/{id}/resolve P0
AC-3.3.3 Already-resolved incidents reject re-resolution Attempt to resolve again P1
AC-3.3.4 Incident analysis shows error patterns and type distribution GET /admin/downtime/incidents/{id}/analysis P1
AC-3.3.5 MTTR statistics are computed GET /admin/downtime/statistics P1

Code References:

  • src/routes/admin.py — Downtime tracking endpoints

3.4 Model Quality Scoring & Benchmarks

Status: Partial

What it does: Hardcoded quality priors for ~20 models (task-specific: simple_qa, code_gen, reasoning, etc.). SWE-bench/HumanEval in Code Router.

What it does NOT do: Not stored in DB. Not updatable without code change. Missing MMLU, MATH, MT-Bench, LMSYS Arena ELO, LiveBench.

# Criterion Verification Priority
AC-3.4.1 Code router tiers include SWE-bench and HumanEval scores GET /code-router/tiers P0
AC-3.4.2 Model selector uses quality priors for task-specific routing Verify model_selector.py quality maps P1

Known Issues: Quality data is static/hardcoded, not from DB. Missing several major benchmarks. No dynamic updating.


3.5 Per-Customer Quality Tracking

Status: Not Implemented (Deferred)

# Criterion Verification Priority
AC-3.5.1 Per-customer success rates are tracked per model Check customer-model analytics Deferred (D-19)

3.6 Provider Credit Monitoring

Status: Partial (OpenRouter only)

What it does: Tracks upstream provider credit balances. OpenRouter: full implementation with API call, 15-min cache, threshold alerts (critical $5, warning $20, info $50).
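The three alert thresholds map naturally to a severity lookup. In this sketch, a balance strictly below a threshold triggers that level, so exactly $5.00 reports `warning`, not `critical`. Whether the real monitor uses `<` or `<=` is an assumption worth pinning down in AC-3.6.3. The names are hypothetical.

```python
from typing import Optional

# Thresholds from this section, checked from most to least severe.
# Assumption: a balance strictly below the threshold triggers the level.
THRESHOLDS = [("critical", 5.0), ("warning", 20.0), ("info", 50.0)]


def alert_level(balance_usd: float) -> Optional[str]:
    """Return the most severe alert level for a provider balance, or None."""
    for level, limit in THRESHOLDS:
        if balance_usd < limit:
            return level
    return None
```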

# Criterion Verification Priority
AC-3.6.1 GET /api/provider-credits/balance returns credit balances for monitored providers Inspect response P0
AC-3.6.2 OpenRouter balance is cached for 15 minutes Check timing of two consecutive calls P1
AC-3.6.3 Threshold alerts fire at critical ($5), warning ($20), info ($50) Verify alert logic P1

Code References:

  • src/services/provider_credit_monitor.py (lines 33-138) — OpenRouter implementation
  • Lines 165-167 — TODO stubs for all other providers

Known Issues:

  • P1-1 (Delta Report): Only OpenRouter implemented. 29 other providers have TODO stubs. No preemptive deprioritization in failover chain.

Layer 4: Caching System

4.1 Semantic Cache

Status: Not Implemented (Deferred)

# Criterion Verification Priority
AC-4.1.1 Semantically similar prompts return cached responses (cosine similarity >0.95) Test with paraphrased prompt Deferred (D-8)

4.2 Exact-Match Response Cache

Status: Not Implemented (Deferred — infrastructure exists but not wired)

# Criterion Verification Priority
AC-4.2.1 Identical inference requests (same messages + model + params) return cached response Send same request twice, compare latency Deferred (D-9)

Code References:

  • src/services/response_cache.py — SHA-256 hashing, Redis + in-memory fallback exists but NOT wired into inference path
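`response_cache.py` already hashes requests with SHA-256. AC-4.2.1 depends on logically identical requests producing the same key. One way to get that is to serialize the request to canonical JSON with sorted keys before hashing, so parameter order does not matter. This sketch assumes the key covers model, messages, and sampling parameters. The `resp:` prefix and the function name are hypothetical.

```python
import hashlib
import json


def cache_key(model: str, messages: list[dict], params: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the request."""
    payload = {"model": model, "messages": messages, "params": params}
    # sort_keys makes dict ordering irrelevant; compact separators remove
    # whitespace differences. List order (message order) is preserved.
    canonical = json.dumps(payload, sort_keys=True,
                           separators=(",", ":"), ensure_ascii=False)
    return "resp:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Message order is preserved because it changes the meaning of a conversation. Requests with non-zero temperature are identical inputs but not identical outputs. Whether to cache them is a product decision that the hash cannot settle.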

4.3 External Cache (Butter.dev)

Status: Partial (Ghost Feature — P0 issue)

What it does: The Butter.dev proxy is used for all requests. User preference endpoints exist, but the stored preference is ignored.

# Criterion Verification Priority
AC-4.3.1 If Butter cache settings endpoints exist, user preference MUST be respected during inference Set enable_butter_cache=false, verify Butter proxy is NOT used P0 (KNOWN BUG)
AC-4.3.2 OR: Butter cache settings endpoints are removed entirely Verify endpoints don't exist P0 Alternative

Code References:

  • src/routes/users.py (lines 305-408) — GET/PUT /user/cache-settings exist, store preference
  • src/routes/chat.py (line 697) — Always calls get_butter_pooled_async_client() without checking preference

Known Issues:

  • P0-1 (Delta Report): Ghost feature. User can toggle a setting that does nothing. Trust-eroding.

4.4 Supporting Caches

Status: Complete

# Criterion Verification Priority
AC-4.4.1 Catalog endpoint responds in sub-100ms on cache hit Time GET /v1/models on second request P0
AC-4.4.2 Auth cache reduces lookup latency from ~100ms to <5ms Compare first vs second auth timing P1
AC-4.4.3 When Redis is down, local memory cache activates — no requests blocked Stop Redis, verify normal operation P0
AC-4.4.4 Cache invalidation clears all layers POST /admin/cache/clear, verify fresh data P1
AC-4.4.5 Stampede protection prevents multiple simultaneous cache rebuilds Concurrent requests to cold cache P1
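AC-4.4.5 requires that concurrent misses on a cold key trigger one rebuild, not many. A common way to achieve this in a single asyncio process is "single flight": the first caller rebuilds, and later callers await the same future. This sketch is illustrative only, and the class name is hypothetical. It does not coordinate across instances, which would need a Redis lock or similar.

```python
import asyncio
from typing import Any, Awaitable, Callable


class SingleFlight:
    """Coalesce concurrent rebuilds of the same cache key into one call."""

    def __init__(self) -> None:
        self._inflight: dict[str, asyncio.Future] = {}

    async def do(self, key: str, rebuild: Callable[[], Awaitable[Any]]) -> Any:
        fut = self._inflight.get(key)
        if fut is not None:
            # Someone is already rebuilding this key: share their result.
            return await fut
        fut = asyncio.get_running_loop().create_future()
        self._inflight[key] = fut
        try:
            result = await rebuild()
        except asyncio.CancelledError:
            fut.cancel()  # waiters see the cancellation too
            raise
        except Exception as exc:
            fut.set_exception(exc)
            fut.exception()  # mark retrieved so an unawaited future logs no warning
            raise
        else:
            fut.set_result(result)
            return result
        finally:
            del self._inflight[key]
```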

Layer 5: Model Catalog

5.1 Background Model Sync

Status: Complete

# Criterion Verification Priority
AC-5.1.1 Model sync can be triggered incrementally and fully POST /admin/model-sync/trigger and /all P0
AC-5.1.2 If provider API is down, last synced catalog is served Verify stale catalog on provider failure P0
AC-5.1.3 Per-provider sync works POST /admin/model-sync/provider/{slug} P1
AC-5.1.4 Full resync (delete + reimport) works POST /admin/model-sync/full P1

5.2 Model Metadata Standard

Status: Complete

# Criterion Verification Priority
AC-5.2.1 Every model in GET /v1/models has id, name, provider_slug, context_length, and pricing Inspect response schema P0
AC-5.2.2 No model has null or zero pricing for both prompt and completion Scan all models in response P0 (see 5.3)

5.3 Catalog Inclusion Requirements

Status: Partial (gating not enforced at sync)

# Criterion Verification Priority
AC-5.3.1 Models without pricing are rejected during sync (not visible to users) Check catalog for models with null pricing P1 (KNOWN BUG)
AC-5.3.2 GET /v1/models/unique returns no duplicate model IDs Check for uniqueness P0
AC-5.3.3 High-value models without explicit pricing are BLOCKED, not served at default rate Verify pricing guard for GPT-4, Claude, Gemini P0

Code References:

  • src/services/model_catalog_sync.py: extract_pricing() (lines 136-153) returns all None for missing pricing. Line 368 checks if any(pricing.values()), but the check is non-blocking, so models ARE synced without pricing.
  • src/services/pricing.py (lines 783-839) — HIGH_VALUE_MODEL_PATTERNS guard raises ValueError on default pricing fallback

Known Issues:

  • P1-3 (Delta Report): Models without pricing are synced into the catalog. Non-high-value models without pricing fall to default ($0.00002/token) — potential under-billing.
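Closing P1-3 means turning the non-blocking `any(pricing.values())` check into a gate. The high-value guard stays in place as a second line of defense. Several parts of this sketch are assumptions: the pricing keys `prompt`/`completion`, string-valued prices, the regex standing in for `HIGH_VALUE_MODEL_PATTERNS`, and all function names. The sketch treats a price of `"0"` as missing. If prices arrive as strings, a plain truthiness check would accept `"0"`.

```python
import re

DEFAULT_PRICE_PER_TOKEN = 0.00002  # default rate cited in P1-3

# Illustrative stand-in for HIGH_VALUE_MODEL_PATTERNS in src/services/pricing.py.
HIGH_VALUE = re.compile(r"gpt-4|claude|gemini|(^|/)o[134]($|-)")


def _price(pricing: dict, key: str) -> float:
    value = pricing.get(key)
    return float(value) if value not in (None, "") else 0.0


def has_pricing(pricing: dict) -> bool:
    """True if at least one of prompt/completion price is present and non-zero."""
    return _price(pricing, "prompt") > 0 or _price(pricing, "completion") > 0


def gate_sync(models: list[dict]) -> tuple[list[dict], list[str]]:
    """Blocking sync check: unpriced models never reach the catalog."""
    kept, rejected = [], []
    for m in models:
        if has_pricing(m.get("pricing") or {}):
            kept.append(m)
        else:
            rejected.append(m["id"])
    return kept, rejected


def prompt_price(model_id: str, pricing: dict) -> float:
    """Mirror of the high-value guard: never bill these at the default rate."""
    price = _price(pricing, "prompt")
    if price > 0:
        return price
    if HIGH_VALUE.search(model_id):
        raise ValueError(f"no explicit pricing for high-value model {model_id}")
    return DEFAULT_PRICE_PER_TOKEN
```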

5.4 HuggingFace Enrichment

Status: Complete

# Criterion Verification Priority
AC-5.4.1 Model detail returns HuggingFace data (downloads, likes, parameters) when available GET /api/models/detail?model_id=meta-llama/... P1
AC-5.4.2 HuggingFace data is cached with TTL Verify caching on repeated requests P1

5.5 Model Discovery & Search

Status: Complete

# Criterion Verification Priority
AC-5.5.1 GET /v1/models?provider=fireworks returns only Fireworks models Filter and verify P0
AC-5.5.2 GET /v1/models/search?q=llama returns matching models Verify results P0
AC-5.5.3 GET /v1/models/trending returns models ranked by usage Inspect response P1
AC-5.5.4 GET /v1/gateways returns all gateways with name, color, priority, site_url Inspect response P0
AC-5.5.5 Model comparison works across providers GET /v1/models/{provider}/{model}/compare P1

Layer 6: Business

6.1 Credit System

Status: Complete (with atomicity concern on legacy path)

What it does: Atomic billing unit. Cost = (prompt_tokens × prompt_price) + (completion_tokens × completion_price). Pre-flight checks, idempotent deductions (UNIQUE constraint + RPC), subscription allowance consumed first, auto-refund on provider errors.

What it does NOT do: No real-time credit streaming during generation. No credit expiration. No rollover. No credit transfers. No multi-currency.
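Below is a worked example of the cost formula and the allowance-first rule. It uses `Decimal` to avoid binary floating-point rounding on per-token prices. The prices are illustrative, not real model rates, and the function names are hypothetical.

```python
from decimal import Decimal


def request_cost(prompt_tokens: int, completion_tokens: int,
                 prompt_price: Decimal, completion_price: Decimal) -> Decimal:
    """Cost = prompt_tokens * prompt_price + completion_tokens * completion_price."""
    return prompt_tokens * prompt_price + completion_tokens * completion_price


def split_charge(cost: Decimal, allowance: Decimal,
                 credits: Decimal) -> tuple[Decimal, Decimal]:
    """Draw from the subscription allowance first, then purchased credits."""
    from_allowance = min(cost, allowance)
    from_credits = cost - from_allowance
    if from_credits > credits:
        # The caller would map this to HTTP 402.
        raise ValueError("insufficient credits")
    return from_allowance, from_credits
```

Take 1,000 prompt tokens at $0.000003 and 500 completion tokens at $0.000015. The cost is $0.003 + $0.0075 = $0.0105. With $0.004 of allowance left, $0.004 comes from the allowance and $0.0065 from purchased credits.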

# Criterion Verification Priority
AC-6.1.1 Pre-flight check: user with 0 credits receives 402 BEFORE any provider call POST with 0-credit user, verify no upstream call P0
AC-6.1.2 Idempotent deduction: same request ID sent twice deducts credits only once POST twice with same X-Request-ID P0
AC-6.1.3 Subscription allowance consumed before purchased credits User with both: make request, verify subscription decreases first P0
AC-6.1.4 Provider 5xx error → automatic credit refund Trigger 5xx, verify refund in credit_transactions P0
AC-6.1.5 Provider timeout → automatic credit refund Trigger timeout, verify refund P0
AC-6.1.6 Provider 4xx error (user error) → NO refund Trigger 4xx, verify no refund P0
AC-6.1.7 High-value models (GPT-4, Claude, Gemini, o1/o3/o4) blocked if pricing falls to default Verify pricing guard fires for each pattern P0
AC-6.1.8 Credit transactions logged with request_id, user_id, model, token counts, cost Check credit_transactions table P0
AC-6.1.9 Balance update and transaction log happen atomically (single DB transaction via RPC) Verify atomic_deduct_credits RPC is used P0
AC-6.1.10 Legacy fallback path either doesn't exist or handles transaction logging failure safely Verify legacy path behavior on logging failure P0 (KNOWN RISK)
AC-6.1.11 Credit transaction history is paginated GET /credits/transactions?limit=10 P1
AC-6.1.12 Admin can add/adjust/refund credits POST /credits/add, /adjust, /refund P1
AC-6.1.13 Daily usage cap prevents runaway costs Exceed daily limit, verify 402 P1
AC-6.1.14 request_id has UNIQUE constraint in DB (belt-and-suspenders idempotency) Check migration 20260223000001_add_request_id_to_credit_transactions.sql P0

Code References:

  • src/db/users.py (lines 701-1106) — Credit deduction
    • Atomic RPC path (lines 862-967) — Correct
    • Legacy fallback path (lines 987-1096) — Risk: the balance update and the transaction log are two separate calls, so if logging fails the credits have already been deducted (lines 1077-1082)
  • src/services/pricing.py (lines 783-839) — HIGH_VALUE_MODEL_PATTERNS
  • src/routes/chat.py (lines 1670-1742) — Auto-refund logic

Known Issues:

  • P0-2: Legacy fallback path may create orphaned deductions (balance reduced, no transaction record)
  • P0-3: Pricing guard needs end-to-end verification — must fire BEFORE provider call
  • P0-4: Auto-refund needs integration testing for edge cases (partial stream, refund failure)

6.2 Plans & Tiers

Status: Complete

| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-6.2.1 | New user gets $5 credits and trial expiring in 3 days | Register, check balance + trial_end | P0 (config mismatch) |
| AC-6.2.2 | Trial user can make requests until credits/limits exhausted | Make requests during trial | P0 |
| AC-6.2.3 | Expired trial returns 402 for paid models | POST after trial expiry | P0 |
| AC-6.2.4 | Expired trial CAN access :free suffix models | POST with :free model after expiry | P0 |
| AC-6.2.5 | GET /plans returns available plan tiers with pricing | Inspect response | P0 |
| AC-6.2.6 | GET /trial/status returns active/expired and days remaining | Check response | P0 |
| AC-6.2.7 | Unused subscription allowance does NOT roll over (resets monthly) | Verify at month boundary | P1 |
| AC-6.2.8 | Purchased credits never expire and survive plan changes | Change plan, verify credits persist | P1 |
| AC-6.2.9 | Daily trial limit ($1/day) is enforced | Exceed $1 in trial, verify blocking | P0 |

Known Issues:

  • P0-7 (Delta Report): Trial config mismatch — CLAUDE.md says $5, wiki says $10, code says $5. Must reconcile.
  • src/config/usage_limits.py — Trial: $5, 3 days, $1/day
  • src/db/trials.py (line 44) — Formula trial_days * 5 suggests variable durations
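AC-6.2.3, AC-6.2.4 and AC-6.2.9 combine into a single access check. A minimal sketch, assuming the $1/day value listed for src/config/usage_limits.py and treating any model ID ending in ":free" as free. The function name is hypothetical, and returning 402 for the daily cap is an assumption (AC-6.2.9 only says "blocking").

```python
from datetime import datetime
from decimal import Decimal

TRIAL_DAILY_LIMIT = Decimal("1.00")  # $1/day, per src/config/usage_limits.py


def trial_access_status(model_id: str, trial_end: datetime, now: datetime,
                        spent_today: Decimal, request_cost: Decimal) -> int:
    """Return the HTTP status a trial user's request should get (200 or 402)."""
    if model_id.endswith(":free"):
        return 200  # :free models stay available even after the trial expires
    if now >= trial_end:
        return 402  # expired trial: paid models are blocked
    if spent_today + request_cost > TRIAL_DAILY_LIMIT:
        return 402  # this request would exceed the daily trial cap
    return 200
```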

6.3 Customer Usage Analytics

Status: Partial

| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-6.3.1 | User can view activity stats (total requests/tokens/spend by model/provider) | GET /user/activity/stats | P0 |
| AC-6.3.2 | Activity log is paginated (limit 1-1000) | GET /user/activity/log?limit=50 | P0 |
| AC-6.3.3 | Activity log total field returns actual DB total, not page count | Verify total vs count | P1 (KNOWN BUG) |
| AC-6.3.4 | Per-API-key usage breakdown is available | GET /user/api-keys/{key_id}/usage | P2 (NOT IMPLEMENTED) |
| AC-6.3.5 | CSV/JSON export is available | GET /user/usage/export?format=csv | P2 (NOT IMPLEMENTED) |

Known Issues:

  • P1-4 (Delta Report): src/routes/users.py (line 515) — "total": len(transactions) returns page count, not DB total
  • P2-1: activity_log stores user_id but NOT api_key_id — no per-key breakdown
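The P1-4 fix amounts to reporting the full result size separately from the page slice. In this sketch an in-memory list stands in for the activity_log query. In the real route the total would come from a count over the same filter. The function name and the response shape (including the count field) are hypothetical.

```python
def paginate_activity(rows: list, limit: int, offset: int = 0) -> dict:
    """Return one page of rows plus the total across ALL pages."""
    if not 1 <= limit <= 1000:
        raise ValueError("limit must be between 1 and 1000")  # AC-6.3.2 bounds
    page = rows[offset:offset + limit]
    return {
        "items": page,
        "count": len(page),  # size of this page
        "total": len(rows),  # full result size; the bug reports len(page) here
    }
```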

6.4 Customer Webhooks

Status: Not Implemented (Deferred)

| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-6.4.1 | Outbound webhook delivery for credits.low, credits.depleted, model.degraded events | Configure webhook, trigger events | Deferred (D-10) |

6.5 SLA Tracking

Status: Not Implemented (Deferred)

| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-6.5.1 | Per-tier SLA violations are detected with auto credit-back compensation | Monitor SLA metrics | Deferred (D-14) |

Layer 7: Developer Platform

7.1 Prompt Management

Status: Not Implemented (Deferred)

| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-7.1.1 | Template library with versioning | CRUD on prompt templates | Deferred (D-12) |

7.2 Batch / Async Inference

Status: Not Implemented (Deferred)

| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-7.2.1 | POST /v1/batch/jobs submits bulk workloads | Submit batch job | Deferred (D-11) |

7.3 Evaluation & Testing

Status: Not Implemented (Deferred)

| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-7.3.1 | Side-by-side model comparison for same prompt | Compare endpoint | Deferred (D-13) |

7.4 Playground

Status: Not Implemented (Deferred — frontend-coupled)

| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-7.4.1 | Interactive prompt testing UI | Access playground | Deferred |

Layer 8: Observability

8.1 Internal Metrics & Dashboards

Status: Complete

| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-8.1.1 | GET /metrics returns valid Prometheus text format | Parse response | P0 |
| AC-8.1.2 | OpenMetrics format with exemplar support is available via content negotiation | Accept: application/openmetrics-text | P1 |
| AC-8.1.3 | Parsed metrics include p50, p95, p99 latency percentiles | GET /api/metrics/parsed | P0 |
| AC-8.1.4 | Real-time stats update within 60 seconds of new requests | GET /api/monitoring/stats/realtime | P1 |
| AC-8.1.5 | Error rates tracked per provider and per model | GET /api/monitoring/error-rates | P1 |
| AC-8.1.6 | Anomaly detection flags unusual patterns | GET /api/monitoring/anomalies | P1 |
| AC-8.1.7 | Grafana SimpleJSON datasource protocol fully implemented | GET /prometheus/datasource (test), POST /search, /query | P1 |
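For AC-8.1.3, the nearest-rank percentile gives a quick way to sanity-check reported p50/p95/p99 values against raw latency samples. This sketch is meant for test assertions only. The gateway may compute percentiles differently (for example, from Prometheus histogram buckets, which produce estimates), so allow a tolerance when comparing.

```python
import math


def nearest_rank_percentile(samples: list, pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def latency_summary(samples_ms: list) -> dict:
    """Compute the three percentiles AC-8.1.3 expects."""
    return {f"p{p}": nearest_rank_percentile(samples_ms, p) for p in (50, 95, 99)}
```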

8.2 Distributed Tracing

Status: Complete

| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-8.2.1 | OpenTelemetry traces are initialized and exportable | GET /api/instrumentation/health | P0 |
| AC-8.2.2 | Every request gets a trace ID linking middleware → auth → routing → provider → billing | Inspect trace in Tempo | P1 |
| AC-8.2.3 | Exemplar linking from metrics to traces works | Verify in Grafana | P2 |

8.3 Error Tracking

Status: Complete

| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-8.3.1 | Autonomous error monitor status is retrievable | GET /error-monitor/autonomous/status | P0 |
| AC-8.3.2 | Dashboard provides error landscape overview | GET /error-monitor/dashboard | P0 |
| AC-8.3.3 | Recent errors sorted by recency | GET /error-monitor/errors/recent | P0 |
| AC-8.3.4 | Critical errors flagged separately | GET /error-monitor/errors/critical | P0 |
| AC-8.3.5 | Error patterns detect recurring issues | GET /error-monitor/errors/patterns | P1 |
| AC-8.3.6 | AI fix suggestions generated via Claude | POST /error-monitor/fixes/generate-for-error | P2 |

Note: All error monitor endpoints are public (no authentication required). Error patterns are kept in memory only and are lost on restart.


8.4 AI-Specific Tracing

Status: Partial

| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-8.4.1 | Arize Phoenix config exists and is functional | Check Arize initialization | P2 |
| AC-8.4.2 | OpenTelemetry captures inference metadata (model, tokens, latency) | Inspect trace attributes | P1 |

Known Issues: Arize Phoenix is not exposed via the API, Braintrust is not integrated, and prompt/response pairs are not recorded.


8.5 Profiling

Status: Complete

| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-8.5.1 | Pyroscope profiling tags cache/Redis layers with operation context | Verify tag presence in Pyroscope | P1 |
| AC-8.5.2 | Profiling does not add measurable latency to requests | Compare request times with/without profiling | P1 |

8.6 Customer-Facing Observability

Status: Partial

| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-8.6.1 | User can view their own usage dashboard data | GET /user/activity/stats, GET /user/monitor | P0 |
| AC-8.6.2 | Model health status visible to users | GET /v1/model-health | P0 |
| AC-8.6.3 | Public status page with provider/model availability | GET /v1/status/, GET /v1/status/providers | P0 |
| AC-8.6.4 | Latency percentiles exposed to customers | GET /user/latency?model=... | P2 (NOT IMPLEMENTED) |

Layer 9: API Compatibility

9.1 OpenAI-Compatible API

Status: Complete

What it does: POST /v1/chat/completions is a full drop-in replacement for the OpenAI Chat Completions API, supporting streaming (SSE), tool/function calling, JSON mode, and logprobs. Any app built on an OpenAI SDK works after changing only the base URL and API key.

| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-9.1.1 | Non-streaming returns 200 with choices[0].message.content, usage.prompt_tokens, usage.completion_tokens | POST with stream: false | P0 |
| AC-9.1.2 | Streaming returns SSE in which every event line starts with "data: " and the stream ends with "data: [DONE]" | POST with stream: true | P0 |
| AC-9.1.3 | response_format: {"type": "json_object"} returns valid parseable JSON | POST with JSON mode | P0 |
| AC-9.1.4 | A request with a tools array returns tool_calls when the model decides to call a tool | POST with tool definitions | P0 |
| AC-9.1.5 | logprobs: true returns a logprobs field | POST with logprobs | P1 |
| AC-9.1.6 | OpenAI Python SDK works with zero changes beyond base_url and api_key | openai.OpenAI(base_url="$BASE/v1") | P0 |
| AC-9.1.7 | All inference errors use OpenAI-compatible format: {"error": {"message": "...", "type": "...", "code": "..."}} | Trigger errors, inspect format | P1 (KNOWN ISSUE) |
| AC-9.1.8 | Unauthenticated request with whitelisted model returns 200 | POST without auth header | P0 |
| AC-9.1.9 | Unauthenticated request with non-whitelisted model returns 401/403 | POST without auth header | P0 |
| AC-9.1.10 | Streaming normalization handles OpenAI, Gemini, Anthropic, Fireworks formats | Test stream from each provider type | P0 |
| AC-9.1.11 | Unrecognized streaming format logs a warning (not silently dropped) | Check logs for dropped chunks | P1 (KNOWN BUG) |
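A test for AC-9.1.2 can check the raw stream body line by line. This minimal sketch assumes the body has already been read into a string. It skips blank separator lines and SSE comment lines (those starting with ":"), which the SSE format allows, and parses each JSON payload. check_openai_sse is a hypothetical helper name.

```python
import json


def check_openai_sse(body: str) -> list:
    """Validate an OpenAI-style SSE body and return the parsed JSON chunks.

    Raises ValueError if a non-empty, non-comment line lacks the "data: " prefix,
    if a payload is not valid JSON, or if the stream does not end with [DONE].
    """
    chunks, saw_done = [], False
    for line in body.splitlines():
        if not line or line.startswith(":"):
            continue  # blank separators and SSE comments are allowed
        if saw_done:
            raise ValueError("data after [DONE]")
        if not line.startswith("data: "):
            raise ValueError(f"bad SSE line: {line!r}")
        payload = line[len("data: "):]
        if payload == "[DONE]":
            saw_done = True
        else:
            chunks.append(json.loads(payload))  # JSONDecodeError is a ValueError
    if not saw_done:
        raise ValueError("stream did not end with data: [DONE]")
    return chunks
```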

Known Issues:

  • P1-2: ~5% of errors use FastAPI default {"detail": "..."} instead of OpenAI format — breaks SDK error handling
  • P1-8: Stream normalizer returns None for unrecognized chunks (silently dropped, no warning)
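Both known issues have small, local fixes. The sketch below is hypothetical: normalize_chunk, to_openai_error, the recognized chunk shapes and the use of the status code as the error code are assumptions, not the gateway's stream normalizer. An unrecognized chunk is logged at WARNING level instead of silently returning None. A FastAPI-style {"detail": ...} body is rewritten into the OpenAI error envelope from AC-9.1.7.

```python
import logging
from typing import Optional

logger = logging.getLogger("stream_normalizer")  # hypothetical logger name

# Anthropic stream events that carry no text and can be skipped quietly.
_ANTHROPIC_NON_TEXT = {"message_start", "content_block_start", "content_block_stop",
                       "message_delta", "message_stop", "ping"}


def _openai_chunk(text: str) -> dict:
    return {"object": "chat.completion.chunk",
            "choices": [{"index": 0, "delta": {"content": text}}]}


def normalize_chunk(chunk: dict) -> Optional[dict]:
    """Map a provider chunk to an OpenAI chat.completion.chunk, or warn and drop it."""
    if "choices" in chunk:  # OpenAI and OpenAI-compatible providers (e.g. Fireworks)
        return chunk
    if chunk.get("type") == "content_block_delta":  # Anthropic text delta
        return _openai_chunk(chunk.get("delta", {}).get("text", ""))
    if chunk.get("type") in _ANTHROPIC_NON_TEXT:
        return None  # known event without text: drop without a warning
    if "candidates" in chunk:  # Gemini
        candidates = chunk["candidates"] or [{}]
        parts = candidates[0].get("content", {}).get("parts", [])
        return _openai_chunk("".join(p.get("text", "") for p in parts))
    logger.warning("Dropping unrecognized stream chunk with keys %s", sorted(chunk))
    return None


def to_openai_error(status_code: int, body: dict) -> dict:
    """Rewrite a FastAPI {"detail": ...} body into the OpenAI error envelope."""
    if "error" in body:
        return body  # already in OpenAI format
    error_type = "invalid_request_error" if status_code < 500 else "server_error"
    return {"error": {"message": str(body.get("detail", "Unknown error")),
                      "type": error_type, "code": str(status_code)}}
```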

9.2 Anthropic-Compatible API

Status: Complete

| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-9.2.1 | Non-streaming returns 200 with content[0].text, usage.input_tokens, usage.output_tokens in Anthropic format | POST /v1/messages | P0 |
| AC-9.2.2 | Streaming returns SSE in Anthropic format (message_start, content_block_delta, message_stop) | POST with stream: true | P0 |
| AC-9.2.3 | Credits deducted using Anthropic token counts | Compare balance before/after | P0 |
| AC-9.2.4 | Anthropic Python SDK works with zero changes beyond base_url and api_key | anthropic.Anthropic(base_url="$BASE") (the SDK appends /v1/messages itself, so the base URL must not end in /v1) | P0 |
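For AC-9.2.2, a test can reassemble the text from the Anthropic event stream. This minimal sketch assumes the body has already been read into a string. It relies on the published Anthropic streaming event names (message_start, content_block_delta carrying text_delta deltas, message_stop). collect_anthropic_text is a hypothetical helper.

```python
import json


def collect_anthropic_text(body: str) -> str:
    """Parse Anthropic-format SSE and return the concatenated text deltas.

    Raises ValueError unless the stream starts with message_start and ends
    with message_stop.
    """
    events, current = [], None
    for line in body.splitlines():
        if line.startswith("event: "):
            current = line[len("event: "):]
        elif line.startswith("data: ") and current is not None:
            events.append((current, json.loads(line[len("data: "):])))
            current = None  # each event line pairs with exactly one data line
    names = [name for name, _ in events]
    if not names or names[0] != "message_start" or names[-1] != "message_stop":
        raise ValueError(f"unexpected event sequence: {names}")
    return "".join(
        data["delta"].get("text", "")
        for name, data in events
        if name == "content_block_delta" and data["delta"].get("type") == "text_delta"
    )
```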

Layer 10: Infrastructure & Deployment

10.1 Multi-Region Routing

Status: Not Implemented (Deferred)

| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-10.1.1 | Requests routed to nearest provider region for lowest latency | Test from different regions | Deferred (D-15) |

10.2 Data Residency

Status: Not Implemented (Deferred)

| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-10.2.1 | EU customers' requests routed to EU-based providers | Test with EU IP | Deferred (D-16) |

10.3 Multi-Target Deployment

Status: Complete

| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-10.3.1 | Vercel serverless deployment works via api/index.py | Deploy to Vercel | P0 |
| AC-10.3.2 | Railway/Docker deployment works via start.sh | Deploy to Railway | P0 |
| AC-10.3.3 | Dev server starts with python src/main.py or uvicorn src.main:app --reload | Start locally | P0 |

Cross-Cutting Features

CC-1: Stripe Payments

Status: Complete

| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-CC.1.1 | GET /api/stripe/credit-packages returns available packages (public, no auth) | Inspect response | P0 |
| AC-CC.1.2 | POST /api/stripe/checkout-session returns valid Stripe checkout URL | Create session | P0 |
| AC-CC.1.3 | Successful payment webhook adds credits to user's balance | Simulate payment_intent.succeeded webhook | P0 |
| AC-CC.1.4 | Webhook endpoint ALWAYS returns 200, even if processing fails | Send malformed webhook | P0 |
| AC-CC.1.5 | Payment history is paginated with amount, date, status | GET /api/stripe/payments | P0 |
| AC-CC.1.6 | Subscription checkout creates Stripe subscription and assigns plan | POST /api/stripe/subscription-checkout | P0 |
| AC-CC.1.7 | Subscription upgrade, downgrade, and cancel operations work | Test each operation | P1 |
| AC-CC.1.8 | Webhook handles each of these events: payment_intent.succeeded, charge.succeeded, invoice.paid, customer.subscription.created | Test each event type | P0 |
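AC-CC.1.4 and AC-CC.1.8 together imply a dispatcher that routes known event types and never lets a processing failure change the response status. This framework-free sketch uses hypothetical handler functions and a hypothetical (status, body) return shape. It omits signature verification, which stripe-python provides via stripe.Webhook.construct_event.

```python
import json
import logging

logger = logging.getLogger("stripe_webhook")


# Hypothetical handlers; bodies omitted.
def _handle_payment_intent(obj: dict) -> None: ...
def _handle_charge(obj: dict) -> None: ...
def _handle_invoice_paid(obj: dict) -> None: ...
def _handle_subscription_created(obj: dict) -> None: ...


HANDLERS = {
    "payment_intent.succeeded": _handle_payment_intent,
    "charge.succeeded": _handle_charge,
    "invoice.paid": _handle_invoice_paid,
    "customer.subscription.created": _handle_subscription_created,
}


def handle_webhook(raw_body: bytes) -> tuple:
    """Dispatch a Stripe event; always answer 200 so Stripe does not keep retrying."""
    try:
        event = json.loads(raw_body)
        handler = HANDLERS.get(event["type"])
        if handler is None:
            logger.info("Ignoring unhandled event type %s", event["type"])
        else:
            handler(event["data"]["object"])
    except Exception:
        logger.exception("Webhook processing failed")  # logged, not surfaced to Stripe
    return 200, {"received": True}
```

The trade-off of always returning 200 is that Stripe will not redeliver a failed event, so failures have to be caught through logging and alerting instead.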

CC-2: Coupons

Status: Complete

| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-CC.2.1 | Valid coupon redeems and adds correct credit amount | POST /coupons/redeem | P0 |
| AC-CC.2.2 | Expired coupon returns 400 | Redeem expired code | P0 |
| AC-CC.2.3 | Already-redeemed coupon (same user) returns 400 | Redeem twice | P0 |
| AC-CC.2.4 | User-specific coupon redeemed by wrong user returns 400/403 | Redeem with different user | P0 |
| AC-CC.2.5 | GET /coupons/available returns global + user-targeted coupons | Inspect response | P1 |
| AC-CC.2.6 | Redemption history shows past redemptions | GET /coupons/history | P1 |
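The rejection rules in AC-CC.2.2 to AC-CC.2.4 can be expressed as one validation function. This sketch uses a hypothetical Coupon record and function name. It returns 403 for the wrong-user case, although the criterion accepts either 400 or 403.

```python
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal
from typing import Optional


@dataclass
class Coupon:  # hypothetical shape
    code: str
    credit_amount: Decimal
    expires_at: datetime
    target_user_id: Optional[str] = None  # None means a global coupon


def validate_redemption(coupon: Coupon, user_id: str, now: datetime,
                        redeemed_by_user: set) -> int:
    """Return the HTTP status for a redemption attempt (200 means it may proceed)."""
    if now >= coupon.expires_at:
        return 400  # expired
    if coupon.code in redeemed_by_user:
        return 400  # this user already redeemed the code
    if coupon.target_user_id is not None and coupon.target_user_id != user_id:
        return 403  # user-specific coupon presented by a different user
    return 200
```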

CC-3: Referrals

Status: Complete

| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-CC.3.1 | User generates unique referral code | POST /referral/generate | P0 |
| AC-CC.3.2 | Referral code validates successfully | POST /referral/validate | P0 |
| AC-CC.3.3 | Self-referral is prevented | Attempt self-referral | P0 |
| AC-CC.3.4 | Referral stats show total referred, conversions, rewards | GET /referral/stats | P1 |
| AC-CC.3.5 | Successful referral grants $10 credits to both parties on first $10+ purchase | Complete referral flow | P0 |
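This sketch covers the reward rules in AC-CC.3.3 and AC-CC.3.5. It assumes the bonus is paid once, on the referred user's first qualifying purchase, and that self-referral is re-checked at grant time (the real flow may reject it earlier, at code validation). referral_bonus and its parameters are hypothetical.

```python
from decimal import Decimal
from typing import Optional

REFERRAL_BONUS = Decimal("10")
MIN_QUALIFYING_PURCHASE = Decimal("10")


def referral_bonus(referrer_id: Optional[str], buyer_id: str,
                   purchase_amount: Decimal, bonus_already_paid: bool) -> dict:
    """Return the credits to grant per user ID for this purchase (possibly none)."""
    if referrer_id is None or referrer_id == buyer_id:
        return {}  # no referral, or a self-referral
    if bonus_already_paid or purchase_amount < MIN_QUALIFYING_PURCHASE:
        return {}  # bonus is one-time and needs a purchase of $10 or more
    return {referrer_id: REFERRAL_BONUS, buyer_id: REFERRAL_BONUS}
```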

CC-4: Chat History & Sessions

Status: Complete

| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-CC.4.1 | Sessions can be created, listed, updated, deleted | CRUD on /v1/chat/sessions/* | P0 |
| AC-CC.4.2 | Messages can be saved individually and in batch | POST single and batch | P0 |
| AC-CC.4.3 | Full-text search returns matching sessions | POST /v1/chat/search | P0 |
| AC-CC.4.4 | Duplicate messages are deduplicated | Save same message twice, verify single entry | P1 |
| AC-CC.4.5 | Chat stats return accurate usage data | GET /v1/chat/stats | P1 |
| AC-CC.4.6 | Share links provide public read-only access | Create share, access without auth | P1 |
| AC-CC.4.7 | Feedback CRUD (create, read, update, delete) works per session | CRUD on /v1/chat/feedback/* | P1 |

CC-5: API Key Management

Status: Complete

| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-CC.5.1 | Created key is in gw_{env}_* format | POST /user/api-keys | P0 |
| AC-CC.5.2 | Key creation rate-limited to 10 per hour; 11th returns 429 | Create 11 keys | P0 |
| AC-CC.5.3 | Keys can be listed showing all active keys | GET /user/api-keys | P0 |
| AC-CC.5.4 | Keys can be updated (name, restrictions) | PUT /user/api-keys/{key_id} | P0 |
| AC-CC.5.5 | Keys can be deleted | DELETE /user/api-keys/{key_id} | P0 |
| AC-CC.5.6 | Deleted key no longer authenticates (returns 401) | Use deleted key | P0 |
| AC-CC.5.7 | Audit logs record key creation, usage, deletion | GET /user/api-keys/audit-logs | P1 |
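A key generator and a sliding-window limiter illustrate AC-CC.5.1 and AC-CC.5.2. Both are sketches with several assumptions: the length of the random suffix, the "live" environment label in the usage example, and the in-memory limiter. A multi-instance deployment would keep the window in shared storage such as Redis.

```python
import secrets
import time
from collections import defaultdict, deque
from typing import Callable


def new_api_key(env: str) -> str:
    """Generate a key in the gw_{env}_* format with a URL-safe random suffix."""
    return f"gw_{env}_{secrets.token_urlsafe(32)}"


class KeyCreationLimiter:
    """Allow at most `limit` key creations per user in any rolling `window` seconds."""

    def __init__(self, limit: int = 10, window: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic):
        self.limit, self.window, self.clock = limit, window, clock
        self._events = defaultdict(deque)  # user_id -> creation timestamps

    def check(self, user_id: str) -> int:
        """Return 200 if creation is allowed (and record it), else 429."""
        now = self.clock()
        events = self._events[user_id]
        while events and now - events[0] >= self.window:
            events.popleft()  # forget creations older than the window
        if len(events) >= self.limit:
            return 429
        events.append(now)
        return 200
```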

CC-6: Image Generation

Status: Complete

| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-CC.6.1 | POST /v1/images/generations returns 200 with image data or URL | POST with prompt | P0 |
| AC-CC.6.2 | Credits deducted based on image generation pricing | Compare balance before/after | P0 |
| AC-CC.6.3 | 0-credit user receives 402 | POST with 0-credit user | P0 |

CC-7: Audio Transcription

Status: Complete

| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-CC.7.1 | File upload transcription returns 200 with text | POST with audio file | P0 |
| AC-CC.7.2 | Base64 transcription returns 200 | POST /v1/audio/transcriptions/base64 | P0 |
| AC-CC.7.3 | Unsupported format returns appropriate error | POST with invalid format | P1 |

CC-8: Server-Side Tools

Status: Complete

| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-CC.8.1 | GET /v1/tools returns available tools (web_search, text_to_speech) | Inspect response | P0 |
| AC-CC.8.2 | Tool definitions in OpenAI function-calling format | GET /v1/tools/definitions | P0 |
| AC-CC.8.3 | Nonexistent tool returns 404 | GET /v1/tools/fake_tool | P0 |
| AC-CC.8.4 | Web search execution returns results | POST /v1/tools/execute with web_search | P0 |
| AC-CC.8.5 | SSRF protection blocks internal/private IP ranges | Attempt internal URL in tool execution | P0 |
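For AC-CC.8.5, the core check is to resolve the target host and reject any address outside the public range. This sketch uses Python's ipaddress module. The resolver is injectable so the check can be tested offline, and is_url_allowed is a hypothetical name. A complete defense must also pin the resolved IP for the actual request (to prevent DNS rebinding) and re-check every redirect. This sketch does neither.

```python
import ipaddress
import socket
from typing import Callable, Iterable
from urllib.parse import urlsplit


def _system_resolve(host: str) -> Iterable[str]:
    return {info[4][0] for info in socket.getaddrinfo(host, None)}


def is_url_allowed(url: str,
                   resolve: Callable[[str], Iterable[str]] = _system_resolve) -> bool:
    """Reject non-HTTP(S) URLs and any host that resolves to a non-global address."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        addresses = list(resolve(parts.hostname))
    except OSError:
        return False  # unresolvable hosts are rejected
    if not addresses:
        return False
    for addr in addresses:
        ip = ipaddress.ip_address(addr.split("%")[0])  # strip IPv6 zone IDs
        if not ip.is_global:
            return False  # private, loopback, link-local, reserved, etc.
    return True
```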

CC-9: Partner Trials

Status: Complete

| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-CC.9.1 | Partner config is publicly accessible | GET /partner-trials/config/{code} | P0 |
| AC-CC.9.2 | Partner code check always returns 200 (valid/invalid in body) | GET /partner-trials/check/{code} | P0 |
| AC-CC.9.3 | Starting partner trial applies partner-specific credits and limits | POST /partner-trials/start with known partner code | P0 |
| AC-CC.9.4 | Partner trial daily limit is enforced | Exceed daily limit | P0 |
| AC-CC.9.5 | Partner trial config is cached (5-min in-memory) | Check timing | P1 |
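The 5-minute in-memory cache in AC-CC.9.5 is a plain time-to-live cache. This sketch takes an injectable clock so expiry can be tested without waiting. PartnerConfigCache is a hypothetical class, not taken from any library.

```python
import time
from typing import Any, Callable, Dict, Tuple


class PartnerConfigCache:
    """Minimal in-process cache whose entries expire `ttl` seconds after loading."""

    def __init__(self, ttl: float = 300.0, clock: Callable[[], float] = time.monotonic):
        self.ttl, self.clock = ttl, clock
        self._data: Dict[str, Tuple[float, Any]] = {}  # code -> (loaded_at, config)

    def get_or_load(self, key: str, loader: Callable[[str], Any]) -> Any:
        entry = self._data.get(key)
        now = self.clock()
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # fresh hit
        value = loader(key)  # miss or expired: reload (e.g. from the DB)
        self._data[key] = (now, value)
        return value
```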

CC-10: Notifications

Status: Complete (partial test coverage)

| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-CC.10.1 | User can retrieve notification preferences | GET /user/notifications/preferences | P0 |
| AC-CC.10.2 | Usage report can be triggered on demand | POST /user/notifications/send-usage-report | P0 |
| AC-CC.10.3 | Test notification sends successfully | POST /user/notifications/test | P0 |
| AC-CC.10.4 | Notification failure does not crash the system | Disable Resend, verify graceful handling | P0 |
| AC-CC.10.5 | Retry logic on notification delivery failure | Verify 2-3 retries with backoff | P2 (NOT IMPLEMENTED) |

Known Issues:

  • P2-4 (Delta Report): No retry logic, no persistent delivery tracking. On failure: logs error, returns False, continues silently.
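AC-CC.10.5 is not implemented yet. If it is added, a bounded retry with exponential backoff around the send call would meet the "2-3 retries" wording. In this hypothetical sketch, send is any callable that returns True on success (mirroring the current "returns False on failure" behavior), and sleep is injectable for testing. Persistent delivery tracking would still need a durable store, which this sketch does not provide.

```python
import logging
import time
from typing import Callable

logger = logging.getLogger("notifications")


def send_with_retry(send: Callable[[], bool], retries: int = 3,
                    base_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call send() once, then retry up to `retries` times with 1x, 2x, 4x... base_delay waits."""
    for attempt in range(retries + 1):
        try:
            if send():
                return True
        except Exception:
            logger.exception("Notification attempt %d raised", attempt + 1)
        if attempt < retries:
            sleep(base_delay * 2 ** attempt)  # exponential backoff
    logger.error("Notification failed after %d attempts", retries + 1)
    return False
```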

CC-11: Admin Operations

Status: Complete

| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-CC.11.1 | Non-admin users receive 403 on ALL admin endpoints | Use user key on admin endpoint | P0 |
| AC-CC.11.2 | Admin can list, search, view user details | GET /admin/users, /admin/users/{id} | P0 |
| AC-CC.11.3 | Admin credit grants respect per-transaction cap and 24h daily limit | Exceed limits | P0 |
| AC-CC.11.4 | Admin can assign plans | POST /admin/assign-plan | P0 |
| AC-CC.11.5 | System monitor returns user counts, credit totals, API usage | GET /admin/monitor | P0 |
| AC-CC.11.6 | Cache operations work (status, refresh, clear) | GET/POST cache endpoints | P1 |
| AC-CC.11.7 | Model sync can be triggered | POST /admin/model-sync/trigger | P1 |
| AC-CC.11.8 | GET /admin/model-sync/providers requires admin auth | Verify auth enforcement | P0 (KNOWN RISK) |
| AC-CC.11.9 | Bulk user delete by domain respects protected domains (gmail, yahoo, outlook) | Attempt protected domain delete | P0 |
| AC-CC.11.10 | Bulk user delete defaults to dry_run=true | Verify default behavior | P0 |

Known Issues:

  • P0-6 (Delta Report): GET /admin/model-sync/providers documented as "No auth enforced" — leaks infrastructure details (33 providers).
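The safeguards in AC-CC.11.9 and AC-CC.11.10 can be sketched as a planning function. It refuses protected domains and only reports matches unless dry_run is explicitly False. The function name and return shape are hypothetical. The protected set spells out the examples from AC-CC.11.9 as full domains (an assumption), and the real list may be longer.

```python
PROTECTED_DOMAINS = {"gmail.com", "yahoo.com", "outlook.com"}  # examples from AC-CC.11.9


def plan_bulk_delete(domain: str, user_emails: list, dry_run: bool = True) -> dict:
    """Select users on `domain` for deletion; never touch protected domains."""
    domain = domain.lower().lstrip("@")
    if domain in PROTECTED_DOMAINS:
        raise PermissionError(f"refusing bulk delete for protected domain {domain}")
    matches = [e for e in user_emails if e.lower().rsplit("@", 1)[-1] == domain]
    return {"dry_run": dry_run, "matched": matches,
            "deleted": [] if dry_run else matches}  # dry run deletes nothing
```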

CC-12: Security

Status: Complete

| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-CC.12.1 | API keys are Fernet-encrypted in DB | Query DB directly | P0 |
| AC-CC.12.2 | API key lookup uses HMAC hash, not brute-force decryption | Verify code path | P0 |
| AC-CC.12.3 | SQL injection attempts are sanitized/rejected | `'; DROP TABLE users; --` in inputs | P0 |
| AC-CC.12.4 | XSS payloads are sanitized/rejected | `<script>alert(1)</script>` in inputs | P0 |
| AC-CC.12.5 | Command injection blocked | `; rm -rf /` in inputs | P0 |
| AC-CC.12.6 | Path traversal blocked | `../../etc/passwd` in inputs | P0 |
| AC-CC.12.7 | Error messages never expose stack traces, internal paths, or sensitive data | Trigger errors, inspect responses | P0 |
| AC-CC.12.8 | Admin security violations logged in audit trail | Attempt unauthorized admin access | P0 |
| AC-CC.12.9 | Temporary/disposable email domains detected during registration | Register with user@tempmail.com | P1 |
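AC-CC.12.2 describes a keyed-hash index. The stored record carries HMAC-SHA256(server_secret, api_key) next to the Fernet ciphertext, and authentication looks the key up by that digest, so it never has to decrypt every stored key. This sketch uses only the standard library. The secret handling and the dict standing in for the indexed DB column are assumptions.

```python
import hashlib
import hmac
from typing import Optional


def key_lookup_hash(api_key: str, secret: bytes) -> str:
    """Deterministic HMAC-SHA256 digest used as the indexed lookup column."""
    return hmac.new(secret, api_key.encode("utf-8"), hashlib.sha256).hexdigest()


def find_key_record(api_key: str, secret: bytes, index: dict) -> Optional[dict]:
    """Look up a key record by digest (an index lookup in a real DB, not a scan)."""
    return index.get(key_lookup_hash(api_key, secret))
```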

CC-13: Google Vertex Function Calling

Status: Partial

| # | Criterion | Verification | Priority |
|---|---|---|---|
| AC-CC.13.1 | REST path function calling works (OpenAI tools → Vertex functionDeclarations) | POST with tools to Vertex model via REST | P0 |
| AC-CC.13.2 | SDK path function calling either works OR is avoided when tools present | POST with tools via SDK path | P1 (KNOWN BUG) |
| AC-CC.13.3 | Tool choice options (auto, required, none) are translated correctly | Test each tool_choice value | P1 |

Code References:

  • src/services/google_vertex_client.py (lines 250-402, 662-707) — REST path implemented
    • Lines 585-587 — SDK path has TODO: "Function calling may not work correctly"

Known Issues:

  • P1-5 (Delta Report): SDK path has TODO. If SDK path is used when tools are present, function calling silently fails.
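The REST-path translation in AC-CC.13.1 and AC-CC.13.3 maps OpenAI's tools and tool_choice onto Vertex AI's tools[].functionDeclarations and toolConfig.functionCallingConfig.mode. This sketch of the mapping is based on the public Vertex AI generateContent request schema, not on src/services/google_vertex_client.py. The function name is hypothetical. The named-function case is mapped to ANY plus allowedFunctionNames.

```python
from typing import Any, Dict, List, Optional, Union

_MODE = {"auto": "AUTO", "required": "ANY", "none": "NONE"}


def to_vertex_tools(tools: List[Dict[str, Any]],
                    tool_choice: Optional[Union[str, Dict[str, Any]]] = None) -> Dict[str, Any]:
    """Translate OpenAI `tools`/`tool_choice` into Vertex `tools`/`toolConfig` fields."""
    declarations = []
    for tool in tools:
        if tool.get("type") != "function":
            continue  # only function tools have a Vertex equivalent here
        fn = tool["function"]
        decl = {"name": fn["name"], "description": fn.get("description", "")}
        if "parameters" in fn:
            # Passed through; Vertex accepts an OpenAPI-schema subset, so some
            # JSON Schema keywords may need stripping in practice.
            decl["parameters"] = fn["parameters"]
        declarations.append(decl)
    request: Dict[str, Any] = {"tools": [{"functionDeclarations": declarations}]}
    if isinstance(tool_choice, str):
        request["toolConfig"] = {"functionCallingConfig": {"mode": _MODE[tool_choice]}}
    elif isinstance(tool_choice, dict):  # {"type": "function", "function": {"name": ...}}
        request["toolConfig"] = {"functionCallingConfig": {
            "mode": "ANY",
            "allowedFunctionNames": [tool_choice["function"]["name"]],
        }}
    return request
```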

Summary Matrix

| Layer | Feature | Criteria | Status | Known Issues |
|---|---|---|---|---|
| 1 | API Key Auth | 10 | Complete | |
| 1 | RBAC | 6 | Complete | |
| 1 | IP Allowlists | 6 | Complete | |
| 1 | Domain Restrictions | 4 | Complete | |
| 1 | Three-Layer Rate Limiting | 14 | Complete | P0-5: Missing headers on L2/L3 |
| 1 | Input Guardrails (4 features) | 4 | Not Implemented | Deferred |
| 1 | Output Guardrails (3 features) | 3 | Not Implemented | Deferred |
| 2 | Model Resolution | 7 | Complete | |
| 2 | General Router | 7 | Complete | |
| 2 | Code Router | 7 | Complete | |
| 2 | Provider Failover | 9 | Complete | |
| 2 | Circuit Breakers | 10 | Complete | P1-7: Timing discrepancy (60s vs 5min) |
| 2 | Health-Weighted LB | 2 | Partial | |
| 2 | Latency/Cost Optimal | 2 | Partial | Hardcoded latency model |
| 2 | Traffic Splitting | 1 | Not Implemented | Deferred |
| 3 | Tiered Health Monitoring | 10 | Complete | |
| 3 | Passive Health Capture | 2 | Complete | |
| 3 | Incident Management | 5 | Complete | |
| 3 | Model Quality Scoring | 2 | Partial | Static/hardcoded |
| 3 | Per-Customer Quality | 1 | Not Implemented | Deferred |
| 3 | Provider Credit Monitoring | 3 | Partial | P1-1: OpenRouter only |
| 4 | Semantic Cache | 1 | Not Implemented | Deferred |
| 4 | Exact-Match Cache | 1 | Not Implemented | Deferred (infra exists) |
| 4 | Butter.dev Cache | 2 | Partial | P0-1: Ghost feature |
| 4 | Supporting Caches | 5 | Complete | |
| 5 | Background Model Sync | 4 | Complete | |
| 5 | Model Metadata Standard | 2 | Complete | |
| 5 | Catalog Inclusion | 3 | Partial | P1-3: No gating at sync |
| 5 | HuggingFace Enrichment | 2 | Complete | |
| 5 | Model Discovery & Search | 5 | Complete | |
| 6 | Credit System | 14 | Complete | P0-2/3/4: Atomicity, pricing guard, refund |
| 6 | Plans & Tiers | 9 | Complete | P0-7: Config mismatch |
| 6 | Customer Usage Analytics | 5 | Partial | P1-4: Pagination bug, P2-1/2: Per-key, export |
| 6 | Customer Webhooks | 1 | Not Implemented | Deferred |
| 6 | SLA Tracking | 1 | Not Implemented | Deferred |
| 7 | Prompt Management | 1 | Not Implemented | Deferred |
| 7 | Batch/Async Inference | 1 | Not Implemented | Deferred |
| 7 | Evaluation & Testing | 1 | Not Implemented | Deferred |
| 7 | Playground | 1 | Not Implemented | Deferred |
| 8 | Metrics & Dashboards | 7 | Complete | |
| 8 | Distributed Tracing | 3 | Complete | |
| 8 | Error Tracking | 6 | Complete | |
| 8 | AI-Specific Tracing | 2 | Partial | Arize/Braintrust gaps |
| 8 | Profiling | 2 | Complete | |
| 8 | Customer Observability | 4 | Partial | P2-3: No latency API |
| 9 | OpenAI-Compatible API | 11 | Complete | P1-2: Error format, P1-8: Stream drops |
| 9 | Anthropic-Compatible API | 4 | Complete | |
| 10 | Multi-Region Routing | 1 | Not Implemented | Deferred |
| 10 | Data Residency | 1 | Not Implemented | Deferred |
| 10 | Multi-Target Deployment | 3 | Complete | |
| CC | Stripe Payments | 8 | Complete | |
| CC | Coupons | 6 | Complete | |
| CC | Referrals | 5 | Complete | |
| CC | Chat History | 7 | Complete | |
| CC | API Key Management | 7 | Complete | |
| CC | Image Generation | 3 | Complete | |
| CC | Audio Transcription | 3 | Complete | |
| CC | Server-Side Tools | 5 | Complete | |
| CC | Partner Trials | 5 | Complete | |
| CC | Notifications | 5 | Complete | P2-4: No retry/delivery tracking |
| CC | Admin Operations | 10 | Complete | P0-6: Model-sync providers auth |
| CC | Security | 9 | Complete | |
| CC | Google Vertex FC | 3 | Partial | P1-5: SDK path TODO |
| TOTAL | | 323 | | |

Priority Summary

| Priority | Count | Description |
|---|---|---|
| P0 | 7 bugs across 46 criteria | Ghost features, billing atomicity, pricing guard, refund verification, rate limit headers, admin auth, trial config |
| P1 | 8 bugs across 28 criteria | Provider monitoring, error format, catalog gating, pagination, Vertex FC, overage, circuit breaker timing, stream normalization |
| P2 | 4 gaps across 12 criteria | Per-key usage, export, latency API, notification delivery |
| Deferred | 20 features, 24 criteria | Guardrails, caching, webhooks, batch, prompts, eval, SLA, geo-routing, GDPR, traffic splitting |

Source: Conceptual Model Features | Features | Delta Report | Testing Plan | Acceptance Criteria
