
Features Acceptance Criteria

arminrad edited this page Mar 9, 2026 · 2 revisions


Detailed acceptance criteria for every Gatewayz feature, organized by the Conceptual Model's 10-layer architecture.

Each feature includes: what it must do, what it must NOT do, detailed acceptance criteria with verification methods, code-level references, known issues, and implementation status.

Derived from: Conceptual Model Features, Features, Delta Report, Testing Plan, Test Coverage Audit

Last Updated: 2026-03-09 | Version: 2.0.4


How to Read This Document

Each feature section includes:

  • Description: What the feature does and its boundaries (from Conceptual Model)
  • Implementation Status: Current state (Complete / Partial / Not Implemented)
  • Acceptance Criteria: Numbered, testable statements — a feature is accepted when ALL criteria pass
  • Code References: File paths and line numbers for verification
  • Known Issues: Bugs, gaps, or discrepancies found during code investigation
  • Priority: P0 (must fix before release), P1 (should fix), P2 (nice to have), Deferred (post-release)

Layer 1: Ingress

1.1 API Key Authentication

Status: Complete

What it does: Authenticates every API request using API keys encrypted at rest with Fernet AES-128. Keys are looked up via HMAC-SHA256 hash for O(log n) retrieval. Validates that keys are active, not expired, and not rate-limited.

What it does NOT do: No OAuth/JWT for API requests. No automatic key rotation. No multi-key auth per request.

# | Criterion | Verification | Priority
AC-1.1.1 | Valid API key in Authorization: Bearer gw_* header returns 200 | Send request with valid key | P0
AC-1.1.2 | Invalid API key returns 401, never 200 or 500 | Send request with Bearer invalid_key | P0
AC-1.1.3 | Expired API key returns 401 with clear message | Use a key past its expires_at | P0
AC-1.1.4 | Deactivated API key (is_active=false) returns 401 | Deactivate key, then use it | P0
AC-1.1.5 | API keys in DB are Fernet-encrypted ciphertext, never plaintext | Query api_keys_new table directly, verify encrypted_key column is ciphertext | P0
AC-1.1.6 | Key lookup uses HMAC-SHA256 hash index, not brute-force decryption of all keys | Verify key_hash column is indexed; confirm O(log n) lookup by timing with 1 key vs 1000 keys | P0
AC-1.1.7 | Key format is gw_{env}_{43_random_chars} (e.g., gw_live_abc123...) | Create new key, verify format regex | P0
AC-1.1.8 | Key creation stores last4 characters for user-friendly identification | Create key, check last4 field in response and DB | P1
AC-1.1.9 | Authentication is cached (5-min TTL, 512-entry LRU); second request with same key is faster | Time two consecutive auth calls, second should be <5ms vs 50-150ms | P1
AC-1.1.10 | When Redis is down, auth cache falls back to local memory; requests are never blocked | Stop Redis, verify auth still works | P0

Code References:

  • src/security/security.py — Fernet encryption, HMAC hashing
  • src/security/deps.py: get_api_key(), get_current_user(), validate_api_key_security()
  • src/db/api_keys.py — Key CRUD, key lookup by hash
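
The key format and hash lookup described above can be sketched with the standard library alone. This is a minimal illustration, not the code in src/security/security.py: the function names, the HMAC secret handling, and the assumption that the env segment is "live" or "test" are all hypothetical. Fernet encryption of the stored key is omitted here.

```python
import hashlib
import hmac
import re
import secrets

# Assumption: env is "live" or "test"; the source only shows gw_live_ as an example.
KEY_PATTERN = re.compile(r"^gw_(live|test)_[A-Za-z0-9_-]{43}$")


def generate_api_key(env: str = "live") -> str:
    """Return a key shaped like gw_{env}_{43_random_chars}."""
    # 32 random bytes encode to exactly 43 URL-safe base64 characters (padding stripped).
    return f"gw_{env}_{secrets.token_urlsafe(32)}"


def hash_api_key(api_key: str, hmac_secret: bytes) -> str:
    """Deterministic HMAC-SHA256 digest, suitable for an indexed key_hash column."""
    return hmac.new(hmac_secret, api_key.encode("utf-8"), hashlib.sha256).hexdigest()


def last4(api_key: str) -> str:
    """Last four characters, stored for user-friendly identification."""
    return api_key[-4:]
```

Because the digest is deterministic, a lookup can hash the presented key once and query the index, rather than decrypting every stored key.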

1.2 Role-Based Access Control (RBAC)

Status: Complete

What it does: Assigns roles (admin, team, dev, free) to users. Permissions checked at dependency-injection level before route handlers execute. Role changes are audit-logged.

What it does NOT do: No granular per-model permissions. No custom roles. No team-level RBAC. No provider-level permissions.

# | Criterion | Verification | Priority
AC-1.2.1 | Non-admin API key returns 403 on ALL /admin/* endpoints | GET /admin/users with user key | P0
AC-1.2.2 | Admin API key returns 200 on admin endpoints | GET /admin/users with admin key | P0
AC-1.2.3 | Unauthorized admin access attempts are logged via audit_logger.log_security_violation("UNAUTHORIZED_ADMIN_ACCESS") | Attempt admin access with user key, check audit log | P0
AC-1.2.4 | Role updates require admin auth and are logged with a reason | POST /admin/roles/update with user_id, new_role, reason | P0
AC-1.2.5 | GET /admin/roles/permissions/{role} returns the correct permission set for each role | Check all 4 roles | P1
AC-1.2.6 | Role change audit log is retrievable at GET /admin/roles/audit/log | Verify entries with timestamps and reasons | P1

Code References:

  • src/security/deps.py: require_admin dependency
  • src/routes/admin.py — Admin route handlers
  • src/db/roles.py — Role management

1.3 Per-Key IP Allowlists

Status: Complete

What it does: Restricts API key usage to specific IP addresses or CIDR ranges. Requests from non-allowlisted IPs are rejected before processing.

What it does NOT do: No geo-based restrictions. No IPv6 ranges. No automatic IP suggestions.

# | Criterion | Verification | Priority
AC-1.3.1 | Admin can create IP allowlist entries with IPv4 addresses | POST /api/admin/ip-whitelist with {"ip": "1.2.3.4"} | P0
AC-1.3.2 | Admin can create IP allowlist entries with CIDR notation | POST /api/admin/ip-whitelist with {"ip": "10.0.0.0/24"} | P0
AC-1.3.3 | API key with allowlist rejects requests from non-allowed IPs with 403 | Use key from IP not in allowlist | P0
AC-1.3.4 | API key with allowlist accepts requests from allowed IPs | Use key from allowlisted IP | P0
AC-1.3.5 | POST /api/admin/ip-whitelist/check correctly reports allowed vs blocked IPs | Test with both allowed and blocked IPs | P1
AC-1.3.6 | Allowlist entries can be listed, updated, and deleted | CRUD operations on /api/admin/ip-whitelist/* | P1

Code References:

  • src/routes/admin.py — IP allowlist endpoints
  • src/security/deps.py — IP validation in validate_api_key_security()
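
The allowlist check in AC-1.3.3 and AC-1.3.4 can be sketched with Python's ipaddress module. The function name and the choice to reject unparseable client addresses are assumptions; the actual logic in validate_api_key_security() may differ.

```python
import ipaddress


def ip_is_allowed(client_ip: str, allowlist: list[str]) -> bool:
    """Return True when client_ip matches any IPv4 address or CIDR entry."""
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        return False  # Assumption: an unparseable client address is rejected.
    for entry in allowlist:
        # strict=False lets a bare address such as "1.2.3.4" parse as a /32 network.
        network = ipaddress.ip_network(entry, strict=False)
        if addr in network:
            return True
    return False
```

A caller would return 403 when this function is False for a key that has an allowlist configured.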

1.4 Domain Restrictions

Status: Complete

What it does: Limits which HTTP referrer domains can use a specific API key. Prevents stolen keys from being used on unauthorized domains.

What it does NOT do: No domain ownership validation. No subdomain wildcards. No server-side restriction (only applies when Referer header present).

# | Criterion | Verification | Priority
AC-1.4.1 | API key with domain restriction rejects requests with wrong Referer header | Send request with Referer: https://unauthorized.com | P0
AC-1.4.2 | API key with domain restriction accepts requests with correct Referer | Send request with configured Referer domain | P0
AC-1.4.3 | Requests without Referer header bypass domain restriction (server-side usage) | Send request without Referer header | P0
AC-1.4.4 | Multiple domains can be configured per key | Configure 3 domains, verify all work | P1

Code References:

  • src/security/deps.py: validate_api_key_security() domain check
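
The Referer rules in AC-1.4.1 through AC-1.4.4 can be sketched as follows. The function name is hypothetical, and exact-host matching is an assumption drawn from the statement that subdomain wildcards are not supported.

```python
from typing import Optional
from urllib.parse import urlparse


def referer_is_allowed(referer: Optional[str], allowed_domains: list[str]) -> bool:
    """Apply a per-key domain restriction to the Referer header value."""
    if not referer:
        return True  # No Referer header: treated as server-side usage (AC-1.4.3).
    host = urlparse(referer).hostname
    if host is None:
        return False  # Assumption: a Referer without a parseable host is rejected.
    # Exact match only; subdomain wildcards are not supported.
    return host in {domain.lower() for domain in allowed_domains}
```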

1.5 Three-Layer Rate Limiting

Status: Complete (with known header gap on Layers 2 and 3)

What it does:

  • Layer 1 (IP): Security middleware with behavioral analysis, velocity detection. 300 RPM for unauthenticated, authenticated users exempt.
  • Layer 2 (API Key): Redis-backed per-key limits tied to plan tier.
  • Layer 3 (Anonymous): Stricter limits for unauthenticated requests.
  • Fallback: In-memory rate limiter when Redis is unavailable.

What it does NOT do: No per-model rate limits. No burst/token-bucket. No cross-instance IP state sharing. Does not charge credits for rejected requests.

# | Criterion | Verification | Priority
AC-1.5.1 | Unauthenticated requests exceeding 300 RPM from same IP receive 429 | Send 301 requests from one IP | P0
AC-1.5.2 | Authenticated users are exempt from IP-level rate limiting | Verify no IP block on auth requests | P0
AC-1.5.3 | API key exceeding plan RPM receives 429 | Exceed per-key limit | P0
AC-1.5.4 | Anonymous rate limits are stricter than authenticated limits | Compare thresholds for anon vs auth | P0
AC-1.5.5 | Layer 1 429 response includes Retry-After, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, X-RateLimit-Reason, X-RateLimit-Mode | Trigger Layer 1 429, inspect headers | P0
AC-1.5.6 | Layer 2 429 response includes Retry-After, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset | Trigger Layer 2 429, inspect headers | P0 (KNOWN BUG)
AC-1.5.7 | Layer 3 429 response includes Retry-After and X-RateLimit-* headers | Trigger Layer 3 429, inspect headers | P0 (KNOWN BUG)
AC-1.5.8 | When Redis is down, rate limiting continues via in-memory fallback; requests are never blocked by the outage | Stop Redis, verify rate limiting works | P0
AC-1.5.9 | Velocity mode activates when error rate exceeds 25% and reduces limits to 50% | Trigger >25% error rate, check GET /velocity-mode-status | P0
AC-1.5.10 | Velocity mode deactivates after 3 minutes of normal error rates | Wait for cooldown, verify normal limits restored | P1
AC-1.5.11 | Rate limit configuration viewable at GET /user/rate-limits | Check response format | P1
AC-1.5.12 | Per-key rate limits updatable via PUT /user/rate-limits/{key_id} | Update and verify enforcement | P1
AC-1.5.13 | Auth endpoint rate-limits to 10 requests per 15 minutes per IP | POST /auth 11 times, 11th returns 429 | P0
AC-1.5.14 | Registration rate-limits to 3 requests per hour per IP | POST /auth/register 4 times, 4th returns 429 | P0

Code References:

  • src/middleware/security_middleware.py (lines 647-716) — Layer 1, headers present
  • src/services/rate_limiting.py (lines 78-94) — Layer 2, RateLimitResult dataclass has fields but NOT converted to HTTP headers
  • src/services/anonymous_rate_limiter.py — Layer 3, NO headers
  • src/services/rate_limiting_fallback.py — In-memory fallback

Known Issues:

  • P0-5 (Delta Report): Layer 2 RateLimitResult fields exist but are not converted to HTTP response headers. Layer 3 has no rate limit headers at all. Clients get bare 429 rejections with no retry information.
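
One possible shape for the missing conversion in P0-5 is sketched below. The RateLimitResult fields shown here are hypothetical stand-ins; the real dataclass in src/services/rate_limiting.py may use different names, and the helper function is not part of the codebase.

```python
import math
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class RateLimitResult:
    # Hypothetical fields; the real dataclass may differ.
    allowed: bool
    limit: int
    remaining: int
    reset_at: float  # Unix timestamp at which the window resets


def rate_limit_headers(result: RateLimitResult, now: Optional[float] = None) -> dict[str, str]:
    """Build the headers AC-1.5.6 expects on a Layer 2 response."""
    now = time.time() if now is None else now
    headers = {
        "X-RateLimit-Limit": str(result.limit),
        "X-RateLimit-Remaining": str(max(result.remaining, 0)),
        "X-RateLimit-Reset": str(int(result.reset_at)),
    }
    if not result.allowed:
        # Retry-After is expressed in whole seconds; round up, with a floor of 1.
        headers["Retry-After"] = str(max(1, math.ceil(result.reset_at - now)))
    return headers
```

The same helper could serve Layer 3 if the anonymous limiter exposed comparable fields, which is also an assumption.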

1.6–1.9 Input Guardrails (PII Detection, Prompt Injection, Topic Restrictions, Content Moderation)

Status: Not Implemented (Deferred)

What these would do: PII scanning (phone, SSN, email, credit card), prompt injection pattern detection, per-key topic restrictions, content moderation via external classifiers.

# | Criterion | Verification | Priority
AC-1.6.1 | PII detection scans prompts for phone numbers, SSNs, emails, credit card numbers | Send prompt with PII | Deferred
AC-1.7.1 | Prompt injection patterns that attempt to override system prompts are detected and blocked | Send known injection pattern | Deferred
AC-1.8.1 | Per-API-key topic restrictions limit responses to configured domains | Configure restriction, test out-of-domain | Deferred
AC-1.9.1 | Content moderation blocks harmful inputs before reaching providers | Send harmful content | Deferred

Note: These are Conceptual Model features (D-1 through D-4 in Delta Report). Not required for stable release. No code exists.


1.10–1.12 Output Guardrails (Content Filtering, Structured Output Validation, Hallucination Flags)

Status: Not Implemented (Deferred)

# | Criterion | Verification | Priority
AC-1.10.1 | Output content filtering scans responses for policy violations before returning | Trigger policy-violating response | Deferred
AC-1.11.1 | Structured output validation confirms JSON schema conformance when requested | Request JSON schema output | Deferred (D-5, Small effort)
AC-1.12.1 | Provider-side safety metadata (refusals, safety triggers) is surfaced in standardized format | Trigger safety filter, inspect response | Deferred

Layer 2: Core Routing Engine

2.1 Model Resolution Pipeline

Status: Complete

What it does: Three-stage pipeline: alias normalization (120+ aliases) → provider detection (overrides → format rules → mapping tables → org-prefix fallbacks) → model ID transformation (provider-native format).

What it does NOT do: No user-defined aliases. No version/snapshot resolution. No per-modality routing differences.

# | Criterion | Verification | Priority
AC-2.1.1 | gpt-4o resolves to openai/gpt-4o | POST /v1/chat/completions with model: "gpt-4o" | P0
AC-2.1.2 | r1 resolves to deepseek/deepseek-r1 | POST /v1/chat/completions with model: "r1" | P0
AC-2.1.3 | Canonical IDs (e.g., openai/gpt-4o) work directly without alias resolution | POST with canonical ID | P0
AC-2.1.4 | Provider detection correctly routes google/gemini-* models to Vertex when credentials available | POST with Gemini model | P0
AC-2.1.5 | No alias maps to itself (no self-referencing loops) | Inspect MODEL_ALIASES dict for cycles | P0
AC-2.1.6 | Fireworks model IDs are transformed to accounts/fireworks/models/... format | POST with Fireworks model, verify upstream call format | P1
AC-2.1.7 | Nonexistent model returns 400 or 404, not 500 | POST with model: "nonexistent/model" | P0

Code References:

  • src/services/models.py: MODEL_ALIASES dict, resolution pipeline
  • src/services/model_transformations.py — Provider-specific ID transformations
  • src/services/model_availability.py — Availability checking
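
The inspection in AC-2.1.5 could be automated with a small check like the one below. It assumes MODEL_ALIASES is a flat dict mapping alias strings to target strings; the helper name is hypothetical and the sample entries are the two aliases named in AC-2.1.1 and AC-2.1.2.

```python
def find_alias_problems(aliases: dict[str, str]) -> list[str]:
    """Return aliases that map to themselves, sit in a cycle, or lead into one."""
    problems = []
    for start in aliases:
        seen = {start}
        current = aliases[start]
        # Follow the chain while the target is itself an alias.
        while current in aliases:
            if current in seen:
                problems.append(start)
                break
            seen.add(current)
            current = aliases[current]
    return problems


sample_aliases = {"gpt-4o": "openai/gpt-4o", "r1": "deepseek/deepseek-r1"}
```

An empty result for the real dict would satisfy the criterion.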

2.2 Intelligent Routing — General Router

Status: Complete

What it does: ML-powered model selection via NotDiamond. Four modes: quality (openai/gpt-4o), cost (openai/gpt-4o-mini), latency (groq/llama-3.3-70b-versatile), balanced (anthropic/claude-sonnet-4). Falls back to mode-specific defaults when NotDiamond unavailable.

What it does NOT do: No user feedback learning. No custom model pools. No routing constraints.

# | Criterion | Verification | Priority
AC-2.2.1 | router:general:quality selects a high-quality model and returns 200 | POST chat with model: "router:general:quality" | P0
AC-2.2.2 | router:general:cost selects a cheaper model than quality mode | Compare selected models for same prompt | P0
AC-2.2.3 | router:general:latency selects a low-latency model | POST and verify selection | P0
AC-2.2.4 | router:general:balanced considers quality, cost, and latency | POST and verify selection | P0
AC-2.2.5 | When NotDiamond is unavailable, fallback models are used without error | Disable NotDiamond, verify graceful fallback | P0
AC-2.2.6 | GET /general-router/settings/options returns available strategies and model pools | Inspect response | P1
AC-2.2.7 | POST /general-router/test returns selected model + reasoning | POST with sample prompt | P1

Code References:

  • src/services/general_router.py — Routing logic, NotDiamond integration
  • src/routes/general_router.py — Endpoints

2.3 Intelligent Routing — Code Router

Status: Complete

What it does: Benchmark-driven model selection for coding tasks. 4 tiers by SWE-bench/HumanEval scores. Modes: auto (complexity-based), price, quality, agentic. Static data from code_quality_priors.json.

What it does NOT do: No code execution. No feedback learning. No custom tiers. No language detection.

# | Criterion | Verification | Priority
AC-2.3.1 | router:code:auto classifies prompt complexity and selects appropriate tier | POST with code prompt | P0
AC-2.3.2 | router:code:quality selects highest-tier code model | POST and verify | P0
AC-2.3.3 | router:code:price selects cost-effective code model | POST and verify | P0
AC-2.3.4 | router:code:agentic selects model optimized for multi-step tool use | POST and verify | P0
AC-2.3.5 | GET /code-router/tiers returns models with SWE-bench/HumanEval scores | Inspect response | P0
AC-2.3.6 | Code router works entirely from in-memory data (no DB/Redis dependency) | Verify response with Redis down | P0
AC-2.3.7 | POST /code-router/test returns selected model and routing rationale | POST with sample prompt | P1

Code References:

  • src/services/code_router.py — Routing logic, tier selection
  • src/services/code_quality_priors.json — Static benchmark data
  • src/routes/code_router.py — Endpoints

2.4 Provider Failover

Status: Complete

What it does: 14-provider prioritized failover chain. Failover triggers on 401/402/403/404/502/503/504. Does NOT trigger on 400 (user error) or 429 (retries with backoff). Model-aware rules: OpenAI → OpenRouter only, Anthropic → OpenRouter only, open-source → all providers.

What it does NOT do: No mid-stream failover. No user-configured chains. No same-pricing guarantee across providers.

# | Criterion | Verification | Priority
AC-2.4.1 | Primary provider 502/503/504 → request succeeds via fallback transparently | Force primary failure, verify success | P0
AC-2.4.2 | Provider 401/402/403/404 → failover to next provider | Force auth error, verify failover | P0
AC-2.4.3 | Provider 400 (user error) → returns 400 to user, NO failover | Send malformed request | P0
AC-2.4.4 | Provider 429 → retries with backoff, does NOT fail over | Trigger rate limit, verify retry behavior | P0
AC-2.4.5 | OpenAI models fail over only along OpenAI → OpenRouter | Inspect failover chain for openai/gpt-4o | P0
AC-2.4.6 | Anthropic models fail over only along Anthropic → OpenRouter | Inspect failover chain for anthropic/claude-sonnet-4 | P0
AC-2.4.7 | Open-source models can fail over across all providers | Inspect chain for meta-llama/llama-3-70b | P0
AC-2.4.8 | Failover chain skips providers with OPEN circuit breakers | Open a breaker, verify provider is skipped | P0
AC-2.4.9 | User receives no indication of failover (transparent to client) | Monitor response during failover | P0

Code References:

  • src/services/provider_failover.py — Failover chain construction, error classification
  • src/routes/chat.py: build_provider_failover_chain() integration
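
The error classification and model-aware rules above can be sketched as a pair of pure functions. All names and provider slugs here are illustrative, not the API of src/services/provider_failover.py, and returning any unlisted status code to the client is an assumption.

```python
from enum import Enum


class FailoverAction(Enum):
    FAILOVER = "failover"
    RETRY_WITH_BACKOFF = "retry_with_backoff"
    RETURN_TO_CLIENT = "return_to_client"


# Status codes that trigger failover, as listed in this section.
FAILOVER_STATUS_CODES = frozenset({401, 402, 403, 404, 502, 503, 504})


def classify_provider_status(status_code: int) -> FailoverAction:
    if status_code in FAILOVER_STATUS_CODES:
        return FailoverAction.FAILOVER
    if status_code == 429:
        return FailoverAction.RETRY_WITH_BACKOFF
    # 400 is a user error; treating every other unlisted code the same way is an assumption.
    return FailoverAction.RETURN_TO_CLIENT


def allowed_providers(model_id: str, all_providers: list[str]) -> list[str]:
    """Model-aware chain: proprietary models fall back to OpenRouter only."""
    if model_id.startswith("openai/"):
        return ["openai", "openrouter"]
    if model_id.startswith("anthropic/"):
        return ["anthropic", "openrouter"]
    return list(all_providers)
```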

2.5 Circuit Breakers

Status: Complete (with timing discrepancy)

What it does: Per-provider circuit breakers. CLOSED → OPEN (5 consecutive failures) → HALF_OPEN (after timeout) → CLOSED (3 consecutive successes) or back to OPEN. Redis + in-memory state.

What it does NOT do: No per-provider threshold configuration. No error-type differentiation. No operator alerts. No persistent state beyond Redis.

# | Criterion | Verification | Priority
AC-2.5.1 | New provider starts in CLOSED state | GET /circuit-breakers/{new_provider} | P0
AC-2.5.2 | After 5 consecutive failures, state transitions to OPEN | Send 5 failing requests, check state | P0
AC-2.5.3 | OPEN state prevents requests to that provider | Verify provider is skipped in failover | P0
AC-2.5.4 | After timeout period, OPEN transitions to HALF_OPEN | Wait for timeout, check state | P0
AC-2.5.5 | In HALF_OPEN, a successful request transitions to CLOSED | Send success, check state | P0
AC-2.5.6 | In HALF_OPEN, a failed request transitions back to OPEN | Send failure, check state | P0
AC-2.5.7 | POST /circuit-breakers/{provider}/reset resets to CLOSED | Reset and verify | P0
AC-2.5.8 | POST /circuit-breakers/reset-all resets all breakers | Reset all and verify | P0
AC-2.5.9 | Circuit breaker endpoints require NO auth (public) | Verify no auth needed | P1
AC-2.5.10 | Prometheus metrics emitted on state transitions | Check circuit_breaker_state_transitions_total | P1

Code References:

  • src/services/circuit_breaker.py (line 67) — Default timeout 60 seconds
  • Redis keys: circuit_breaker:{provider}:{state|failure_count|success_count|opened_at} (3600s TTL)

Known Issues:

  • P1-7 (Delta Report): Code uses 60-second timeout, but Conceptual Model says 5 minutes and wiki Testing Plan says 5 minutes. Either code or docs must be updated.
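
The state machine described in this section can be sketched in memory as follows. This is an illustration under stated assumptions, not the implementation in src/services/circuit_breaker.py: the class and method names are hypothetical, Redis persistence is omitted, the 60-second default mirrors the code reference rather than the Conceptual Model, and the success threshold follows the description's "3 consecutive successes".

```python
import time
from enum import Enum
from typing import Callable


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, success_threshold: int = 3,
                 timeout_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        self.timeout_seconds = timeout_seconds
        self.clock = clock  # Injectable so tests do not need to sleep.
        self.state = State.CLOSED
        self.failures = 0
        self.successes = 0
        self.opened_at = 0.0

    def current_state(self) -> State:
        # OPEN lapses into HALF_OPEN once the timeout has elapsed.
        if self.state is State.OPEN and self.clock() - self.opened_at >= self.timeout_seconds:
            self.state = State.HALF_OPEN
            self.successes = 0
        return self.state

    def allow_request(self) -> bool:
        return self.current_state() is not State.OPEN

    def record_success(self) -> None:
        state = self.current_state()
        if state is State.HALF_OPEN:
            self.successes += 1
            if self.successes >= self.success_threshold:
                self.state, self.failures, self.successes = State.CLOSED, 0, 0
        elif state is State.CLOSED:
            self.failures = 0  # Only consecutive failures count.

    def record_failure(self) -> None:
        state = self.current_state()
        if state is State.HALF_OPEN:
            self._open()
        elif state is State.CLOSED:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self._open()

    def _open(self) -> None:
        self.state, self.failures, self.successes = State.OPEN, 0, 0
        self.opened_at = self.clock()
```

Passing timeout_seconds=300 would model the 5-minute value from the Conceptual Model instead.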

2.6 Health-Weighted Load Balancing

Status: Partial

What it does: Checks primary provider health score before routing. Below-threshold providers are demoted in failover chain.

What it does NOT do: No proportional traffic splitting by health score. No per-model health. No predictive health.

# | Criterion | Verification | Priority
AC-2.6.1 | When primary provider health is below threshold, a healthier provider is promoted | Degrade a provider, verify chain reordering | P1
AC-2.6.2 | Health-based promotion is a binary decision (promote or don't) | Verify no weighted splitting | P1

2.7–2.8 Latency-Optimal / Cost-Optimal Selection

Status: Partial

What it does: Route to lowest-latency or cheapest provider for same model. General Router "latency" mode hardcodes to groq/llama-3.3-70b-versatile.

# | Criterion | Verification | Priority
AC-2.7.1 | Latency mode selects a low-latency provider | Verify model selection via router | P1
AC-2.8.1 | Cost mode selects cheapest capable provider | Compare pricing of selected vs alternatives | P1

Known Issues: No dynamic latency-optimal selection — latency mode hardcodes a specific model rather than measuring real-time latency. Deferred for post-release.


2.9 Traffic Splitting

Status: Not Implemented (Deferred)

# | Criterion | Verification | Priority
AC-2.9.1 | Traffic for same model is distributed across providers at configured ratios | Monitor provider selection distribution | Deferred (D-17)

Layer 3: Intelligence

3.1 Tiered Health Monitoring

Status: Complete

What it does: Continuous monitoring at intervals by tier: Critical (5min), Popular (30min), Standard (2-4hr), On-Demand (when requested). Health checks verify availability and latency.

# | Criterion | Verification | Priority
AC-3.1.1 | GET /health always returns 200, even when dependencies are degraded | Call when DB is down | P0
AC-3.1.2 | Health response includes version, status, and timestamp | Inspect response | P0
AC-3.1.3 | GET /health/system returns memory, CPU, and connection pool stats | Inspect response | P0
AC-3.1.4 | Provider health scores are 0-100 per provider | GET /health/providers | P0
AC-3.1.5 | Model health shows healthy, degraded, or down per model | GET /health/models | P0
AC-3.1.6 | GET /health/quick is sub-millisecond (static response) | Time the endpoint | P1
AC-3.1.7 | GET /health/railway returns comprehensive check (DB, Redis, providers) | Inspect response | P1
AC-3.1.8 | Gateway health dashboard returns HTML and JSON formats | GET /health/gateways/dashboard and /data | P1
AC-3.1.9 | Health insights provide actionable recommendations | GET /health/insights | P2
AC-3.1.10 | Background monitoring can be started and stopped | POST /health/monitoring/start, /stop | P1

Code References:

  • src/services/intelligent_health_monitor.py — Tiered monitoring
  • src/services/autonomous_monitor.py — Background monitoring
  • src/routes/health.py — Health endpoints

3.2 Passive Health Capture

Status: Complete

What it does: Every real inference request contributes health data as a background task — success/failure, latency, tokens, provider response codes. Zero overhead on request path.

# | Criterion | Verification | Priority
AC-3.2.1 | Health data is captured after response is returned (no latency impact on user) | Verify background task execution | P0
AC-3.2.2 | Captured data includes: latency, tokens, status, provider | Inspect health data store | P1

3.3 Incident Management

Status: Complete

What it does: Auto-creates incidents on health degradation. Severity levels, timestamps, captured logs, resolution tracking, MTTR calculation.

# | Criterion | Verification | Priority
AC-3.3.1 | Downtime incidents can be listed with filters | GET /admin/downtime/incidents?status=ongoing | P0
AC-3.3.2 | Incidents can be resolved with notes | POST /admin/downtime/incidents/{id}/resolve | P0
AC-3.3.3 | Already-resolved incidents reject re-resolution | Attempt to resolve again | P1
AC-3.3.4 | Incident analysis shows error patterns and type distribution | GET /admin/downtime/incidents/{id}/analysis | P1
AC-3.3.5 | MTTR statistics are computed | GET /admin/downtime/statistics | P1

Code References:

  • src/routes/admin.py — Downtime tracking endpoints

3.4 Model Quality Scoring & Benchmarks

Status: Partial

What it does: Hardcoded quality priors for ~20 models (task-specific: simple_qa, code_gen, reasoning, etc.). SWE-bench/HumanEval in Code Router.

What it does NOT do: Not stored in DB. Not updatable without code change. Missing MMLU, MATH, MT-Bench, LMSYS Arena ELO, LiveBench.

# | Criterion | Verification | Priority
AC-3.4.1 | Code router tiers include SWE-bench and HumanEval scores | GET /code-router/tiers | P0
AC-3.4.2 | Model selector uses quality priors for task-specific routing | Verify model_selector.py quality maps | P1

Known Issues: Quality data is static/hardcoded, not from DB. Missing several major benchmarks. No dynamic updating.


3.5 Per-Customer Quality Tracking

Status: Not Implemented (Deferred)

# | Criterion | Verification | Priority
AC-3.5.1 | Per-customer success rates are tracked per model | Check customer-model analytics | Deferred (D-19)

3.6 Provider Credit Monitoring

Status: Partial (OpenRouter only)

What it does: Tracks upstream provider credit balances. OpenRouter: full implementation with API call, 15-min cache, threshold alerts (critical $5, warning $20, info $50).

# | Criterion | Verification | Priority
AC-3.6.1 | GET /api/provider-credits/balance returns credit balances for monitored providers | Inspect response | P0
AC-3.6.2 | OpenRouter balance is cached for 15 minutes | Check timing of two consecutive calls | P1
AC-3.6.3 | Threshold alerts fire at critical ($5), warning ($20), info ($50) | Verify alert logic | P1

Code References:

  • src/services/provider_credit_monitor.py (lines 33-138) — OpenRouter implementation
  • Lines 165-167 — TODO stubs for all other providers

Known Issues:

  • P1-1 (Delta Report): Only OpenRouter implemented. 29 other providers have TODO stubs. No preemptive deprioritization in failover chain.
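
The alert logic that AC-3.6.3 asks to verify might look like the sketch below. The thresholds come from this section; the function name, the inclusive comparison, and the rule that the most severe matching level wins are assumptions.

```python
from typing import Optional

# Thresholds in USD, ordered from most to least severe.
THRESHOLDS = (("critical", 5.0), ("warning", 20.0), ("info", 50.0))


def alert_level(balance_usd: float) -> Optional[str]:
    """Return the most severe alert level for a balance, or None above all thresholds."""
    for level, threshold in THRESHOLDS:
        # Assumption: an alert fires when the balance is at or below the threshold.
        if balance_usd <= threshold:
            return level
    return None
```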

Layer 4: Caching System

4.1 Semantic Cache

Status: Not Implemented (Deferred)

# | Criterion | Verification | Priority
AC-4.1.1 | Semantically similar prompts return cached responses (cosine similarity >0.95) | Test with paraphrased prompt | Deferred (D-8)

4.2 Exact-Match Response Cache

Status: Not Implemented (Deferred — infrastructure exists but not wired)

# | Criterion | Verification | Priority
AC-4.2.1 | Identical inference requests (same messages + model + params) return cached response | Send same request twice, compare latency | Deferred (D-9)

Code References:

  • src/services/response_cache.py — SHA-256 hashing, Redis + in-memory fallback exists but NOT wired into inference path
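
An exact-match cache key over messages, model, and params can be derived by hashing a canonical JSON encoding with SHA-256. This is a sketch only: the function name and the "response_cache:" prefix are hypothetical, and src/services/response_cache.py may canonicalize requests differently.

```python
import hashlib
import json


def response_cache_key(model: str, messages: list[dict], params: dict) -> str:
    """Derive a deterministic cache key for an inference request."""
    payload = {"model": model, "messages": messages, "params": params}
    # sort_keys and fixed separators make the encoding independent of dict ordering.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return "response_cache:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two requests that differ only in the key order of params produce the same key, while any change to the messages, model, or parameter values produces a different one.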

4.3 External Cache (Butter.dev)

Status: Partial (Ghost Feature — P0 issue)

What it does: Butter.dev proxy used for all requests. User preference endpoints exist but are ignored.

# | Criterion | Verification | Priority
AC-4.3.1 | If Butter cache settings endpoints exist, user preference MUST be respected during inference | Set enable_butter_cache=false, verify Butter proxy is NOT used | P0 (KNOWN BUG)
AC-4.3.2 | OR: Butter cache settings endpoints are removed entirely | Verify endpoints don't exist | P0 Alternative

Code References:

  • src/routes/users.py (lines 305-408) — GET/PUT /user/cache-settings exist, store preference
  • src/routes/chat.py (line 697) — Always calls get_butter_pooled_async_client() without checking preference

Known Issues:

  • P0-1 (Delta Report): Ghost feature. User can toggle a setting that does nothing. Trust-eroding.

4.4 Supporting Caches

Status: Complete

# | Criterion | Verification | Priority
AC-4.4.1 | Catalog endpoint responds in sub-100ms on cache hit | Time GET /v1/models on second request | P0
AC-4.4.2 | Auth cache reduces lookup latency from ~100ms to <5ms | Compare first vs second auth timing | P1
AC-4.4.3 | When Redis is down, local memory cache activates; no requests blocked | Stop Redis, verify normal operation | P0
AC-4.4.4 | Cache invalidation clears all layers | POST /admin/cache/clear, verify fresh data | P1
AC-4.4.5 | Stampede protection prevents multiple simultaneous cache rebuilds | Concurrent requests to cold cache | P1

Layer 5: Model Catalog

5.1 Background Model Sync

Status: Complete

# | Criterion | Verification | Priority
AC-5.1.1 | Model sync can be triggered incrementally and fully | POST /admin/model-sync/trigger and /all | P0
AC-5.1.2 | If provider API is down, last synced catalog is served | Verify stale catalog on provider failure | P0
AC-5.1.3 | Per-provider sync works | POST /admin/model-sync/provider/{slug} | P1
AC-5.1.4 | Full resync (delete + reimport) works | POST /admin/model-sync/full | P1

5.2 Model Metadata Standard

Status: Complete

# | Criterion | Verification | Priority
AC-5.2.1 | Every model in GET /v1/models has id, name, provider_slug, context_length, and pricing | Inspect response schema | P0
AC-5.2.2 | No model has null or zero pricing for both prompt and completion | Scan all models in response | P0 (see 5.3)

5.3 Catalog Inclusion Requirements

Status: Partial (gating not enforced at sync)

# | Criterion | Verification | Priority
AC-5.3.1 | Models without pricing are rejected during sync (not visible to users) | Check catalog for models with null pricing | P1 (KNOWN BUG)
AC-5.3.2 | GET /v1/models/unique returns no duplicate model IDs | Check for uniqueness | P0
AC-5.3.3 | High-value models without explicit pricing are BLOCKED, not served at default rate | Verify pricing guard for GPT-4, Claude, Gemini | P0

Code References:

  • src/services/model_catalog_sync.py: extract_pricing() (lines 136-153) returns all None for missing pricing. Line 368 checks if any(pricing.values()), but the check is non-blocking, so models ARE synced without pricing.
  • src/services/pricing.py (lines 783-839) — HIGH_VALUE_MODEL_PATTERNS guard raises ValueError on default pricing fallback

Known Issues:

  • P1-3 (Delta Report): Models without pricing are synced into the catalog. Non-high-value models without pricing fall to default ($0.00002/token) — potential under-billing.
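
A blocking version of the pricing check could partition models before they reach the catalog, as sketched below. The function names and the model dict shape are assumptions; only the any(pricing.values()) test is taken from the code reference above.

```python
from typing import Optional


def has_usable_pricing(pricing: dict[str, Optional[float]]) -> bool:
    """True when at least one price is present and non-zero."""
    return any(pricing.values())


def partition_for_sync(models: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split models into (accepted, rejected) so unpriced models never reach the catalog."""
    accepted, rejected = [], []
    for model in models:
        pricing = model.get("pricing") or {}
        (accepted if has_usable_pricing(pricing) else rejected).append(model)
    return accepted, rejected
```

Because zero is falsy, intentionally free models would need an explicit exemption under this rule; how the real sync should treat them is left open here.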

5.4 HuggingFace Enrichment

Status: Complete

# | Criterion | Verification | Priority
AC-5.4.1 | Model detail returns HuggingFace data (downloads, likes, parameters) when available | GET /api/models/detail?model_id=meta-llama/... | P1
AC-5.4.2 | HuggingFace data is cached with TTL | Verify caching on repeated requests | P1

5.5 Model Discovery & Search

Status: Complete

# | Criterion | Verification | Priority
AC-5.5.1 | GET /v1/models?provider=fireworks returns only Fireworks models | Filter and verify | P0
AC-5.5.2 | GET /v1/models/search?q=llama returns matching models | Verify results | P0
AC-5.5.3 | GET /v1/models/trending returns models ranked by usage | Inspect response | P1
AC-5.5.4 | GET /v1/gateways returns all gateways with name, color, priority, site_url | Inspect response | P0
AC-5.5.5 | Model comparison works across providers | GET /v1/models/{provider}/{model}/compare | P1

Layer 6: Business

6.1 Credit System

Status: Complete (with atomicity concern on legacy path)

What it does: Atomic billing unit. Cost = (prompt_tokens × prompt_price) + (completion_tokens × completion_price). Pre-flight checks, idempotent deductions (UNIQUE constraint + RPC), subscription allowance consumed first, auto-refund on provider errors.

What it does NOT do: No real-time credit streaming during generation. No credit expiration. No rollover. No credit transfers. No multi-currency.

# | Criterion | Verification | Priority
AC-6.1.1 | Pre-flight check: user with 0 credits receives 402 BEFORE any provider call | POST with 0-credit user, verify no upstream call | P0
AC-6.1.2 | Idempotent deduction: same request ID sent twice deducts credits only once | POST twice with same X-Request-ID | P0
AC-6.1.3 | Subscription allowance consumed before purchased credits | User with both: make request, verify subscription decreases first | P0
AC-6.1.4 | Provider 5xx error → automatic credit refund | Trigger 5xx, verify refund in credit_transactions | P0
AC-6.1.5 | Provider timeout → automatic credit refund | Trigger timeout, verify refund | P0
AC-6.1.6 | Provider 4xx error (user error) → NO refund | Trigger 4xx, verify no refund | P0
AC-6.1.7 | High-value models (GPT-4, Claude, Gemini, o1/o3/o4) blocked if pricing falls to default | Verify pricing guard fires for each pattern | P0
AC-6.1.8 | Credit transactions logged with request_id, user_id, model, token counts, cost | Check credit_transactions table | P0
AC-6.1.9 | Balance update and transaction log happen atomically (single DB transaction via RPC) | Verify atomic_deduct_credits RPC is used | P0
AC-6.1.10 | Legacy fallback path either doesn't exist or handles transaction logging failure safely | Verify legacy path behavior on logging failure | P0 (KNOWN RISK)
AC-6.1.11 | Credit transaction history is paginated | GET /credits/transactions?limit=10 | P1
AC-6.1.12 | Admin can add/adjust/refund credits | POST /credits/add, /adjust, /refund | P1
AC-6.1.13 | Daily usage cap prevents runaway costs | Exceed daily limit, verify 402 | P1
AC-6.1.14 | request_id has UNIQUE constraint in DB (belt-and-suspenders idempotency) | Check migration 20260223000001_add_request_id_to_credit_transactions.sql | P0

Code References:

  • src/db/users.py (lines 701-1106) — Credit deduction
    • Atomic RPC path (lines 862-967) — Correct
    • Legacy fallback path (lines 987-1096) — Risk: two separate calls, if logging fails credits already deducted (lines 1077-1082)
  • src/services/pricing.py (lines 783-839) — HIGH_VALUE_MODEL_PATTERNS
  • src/routes/chat.py (lines 1670-1742) — Auto-refund logic
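
The cost formula and the allowance-first rule (AC-6.1.3) can be worked through in a short sketch. The function names are hypothetical, and Decimal is used here to avoid float rounding; the actual numeric type in src/db/users.py is not stated in this document.

```python
from decimal import Decimal


def request_cost(prompt_tokens: int, completion_tokens: int,
                 prompt_price: Decimal, completion_price: Decimal) -> Decimal:
    """Cost = (prompt_tokens × prompt_price) + (completion_tokens × completion_price)."""
    return Decimal(prompt_tokens) * prompt_price + Decimal(completion_tokens) * completion_price


def split_deduction(cost: Decimal, subscription_allowance: Decimal,
                    purchased_credits: Decimal) -> tuple[Decimal, Decimal]:
    """Return (from_allowance, from_purchased), consuming the allowance first."""
    if cost > subscription_allowance + purchased_credits:
        raise ValueError("insufficient credits")  # A caller would answer with 402.
    from_allowance = min(cost, subscription_allowance)
    return from_allowance, cost - from_allowance
```

For example, 1000 prompt tokens at $0.00002 and 500 completion tokens at $0.00004 cost $0.04; with a $0.03 allowance remaining, $0.03 comes from the allowance and $0.01 from purchased credits.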

Known Issues:

  • P0-2: Legacy fallback path may create orphaned deductions (balance reduced, no transaction record)
  • P0-3: Pricing guard needs end-to-end verification — must fire BEFORE provider call
  • P0-4: Auto-refund needs integration testing for edge cases (partial stream, refund failure)

6.2 Plans & Tiers

Status: Complete

# | Criterion | Verification | Priority
AC-6.2.1 | New user gets $5 credits and trial expiring in 3 days | Register, check balance + trial_end | P0 (config mismatch)
AC-6.2.2 | Trial user can make requests until credits/limits exhausted | Make requests during trial | P0
AC-6.2.3 | Expired trial returns 402 for paid models | POST after trial expiry | P0
AC-6.2.4 | Expired trial CAN access :free suffix models | POST with :free model after expiry | P0
AC-6.2.5 | GET /plans returns available plan tiers with pricing | Inspect response | P0
AC-6.2.6 | GET /trial/status returns active/expired and days remaining | Check response | P0
AC-6.2.7 | Unused subscription allowance does NOT roll over (resets monthly) | Verify at month boundary | P1
AC-6.2.8 | Purchased credits never expire and survive plan changes | Change plan, verify credits persist | P1
AC-6.2.9 | Daily trial limit ($1/day) is enforced | Exceed $1 in trial, verify blocking | P0

Known Issues:

  • P0-7 (Delta Report): Trial config mismatch — CLAUDE.md says $5, wiki says $10, code says $5. Must reconcile.
  • src/config/usage_limits.py — Trial: $5, 3 days, $1/day
  • src/db/trials.py (line 44) — Formula trial_days * 5 suggests variable durations
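
The trial rules above can be expressed as a single gate, which is a useful reference when reconciling the config mismatch. This is a minimal sketch using the values listed for src/config/usage_limits.py ($5, 3 days, $1/day). The names `TrialState`, `new_trial`, and `check_request` are hypothetical, and letting `:free` models through at all times (not only after expiry) is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

TRIAL_CREDITS_USD = 5.0      # value listed for usage_limits.py
TRIAL_DAYS = 3               # value listed for usage_limits.py
TRIAL_DAILY_LIMIT_USD = 1.0  # value listed for usage_limits.py


@dataclass
class TrialState:
    trial_end: datetime
    credits: float
    spent_today: float


def new_trial(now: datetime) -> TrialState:
    """State a newly registered user would start with (AC-6.2.1)."""
    return TrialState(now + timedelta(days=TRIAL_DAYS), TRIAL_CREDITS_USD, 0.0)


def check_request(state: TrialState, model: str, now: datetime) -> str:
    """Return 'allow' or the reason the request is blocked."""
    if model.endswith(":free"):
        return "allow"            # AC-6.2.4 (assumed to hold before expiry too)
    if now >= state.trial_end:
        return "trial_expired"    # AC-6.2.3: surfaces as 402 for paid models
    if state.spent_today >= TRIAL_DAILY_LIMIT_USD:
        return "daily_limit"      # AC-6.2.9
    if state.credits <= 0:
        return "no_credits"       # AC-6.2.2
    return "allow"
```

For example, `check_request(new_trial(datetime.now(timezone.utc)), "some-model", datetime.now(timezone.utc))` allows the request, while the same call four days later reports `trial_expired`.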

6.3 Customer Usage Analytics

Status: Partial

# Criterion Verification Priority
AC-6.3.1 User can view activity stats (total requests/tokens/spend by model/provider) GET /user/activity/stats P0
AC-6.3.2 Activity log is paginated (limit 1-1000) GET /user/activity/log?limit=50 P0
AC-6.3.3 Activity log total field returns actual DB total, not page count Verify total vs count P1 (KNOWN BUG)
AC-6.3.4 Per-API-key usage breakdown is available GET /user/api-keys/{key_id}/usage P2 (NOT IMPLEMENTED)
AC-6.3.5 CSV/JSON export is available GET /user/usage/export?format=csv P2 (NOT IMPLEMENTED)

Known Issues:

  • P1-4 (Delta Report): src/routes/users.py (line 515) — "total": len(transactions) returns page count, not DB total
  • P2-1: activity_log stores user_id but NOT api_key_id — no per-key breakdown
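
The bug in P1-4 is the difference between the size of the returned page and the size of the full result set. The sketch below uses an in-memory list as a stand-in for the database (in the real route the total would come from a count query). `paginate` and the `count` field are hypothetical names.

```python
from typing import Any


def paginate(rows: list[dict[str, Any]], limit: int = 50, offset: int = 0) -> dict[str, Any]:
    """Return one page plus the true total, not the page length."""
    if not 1 <= limit <= 1000:  # AC-6.3.2: limit must be 1-1000
        raise ValueError("limit must be between 1 and 1000")
    page = rows[offset:offset + limit]
    return {
        "items": page,
        "count": len(page),  # what the buggy code reports as "total"
        "total": len(rows),  # AC-6.3.3: size of the full result set
    }
```

A regression test for AC-6.3.3 only needs more rows than one page holds: with 120 rows and `limit=50`, `count` is 50 and `total` must be 120.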

6.4 Customer Webhooks

Status: Not Implemented (Deferred)

# Criterion Verification Priority
AC-6.4.1 Outbound webhook delivery for credits.low, credits.depleted, model.degraded events Configure webhook, trigger events Deferred (D-10)

6.5 SLA Tracking

Status: Not Implemented (Deferred)

# Criterion Verification Priority
AC-6.5.1 Per-tier SLA violations are detected with auto credit-back compensation Monitor SLA metrics Deferred (D-14)

Layer 7: Developer Platform

7.1 Prompt Management

Status: Not Implemented (Deferred)

# Criterion Verification Priority
AC-7.1.1 Template library with versioning CRUD on prompt templates Deferred (D-12)

7.2 Batch / Async Inference

Status: Not Implemented (Deferred)

# Criterion Verification Priority
AC-7.2.1 POST /v1/batch/jobs submits bulk workloads Submit batch job Deferred (D-11)

7.3 Evaluation & Testing

Status: Not Implemented (Deferred)

# Criterion Verification Priority
AC-7.3.1 Side-by-side model comparison for same prompt Compare endpoint Deferred (D-13)

7.4 Playground

Status: Not Implemented (Deferred — frontend-coupled)

# Criterion Verification Priority
AC-7.4.1 Interactive prompt testing UI Access playground Deferred

Layer 8: Observability

8.1 Internal Metrics & Dashboards

Status: Complete

# Criterion Verification Priority
AC-8.1.1 GET /metrics returns valid Prometheus text format Parse response P0
AC-8.1.2 OpenMetrics format with exemplar support is available via content negotiation Accept: application/openmetrics-text P1
AC-8.1.3 Parsed metrics include p50, p95, p99 latency percentiles GET /api/metrics/parsed P0
AC-8.1.4 Real-time stats update within 60 seconds of new requests GET /api/monitoring/stats/realtime P1
AC-8.1.5 Error rates tracked per provider and per model GET /api/monitoring/error-rates P1
AC-8.1.6 Anomaly detection flags unusual patterns GET /api/monitoring/anomalies P1
AC-8.1.7 Grafana SimpleJSON datasource protocol fully implemented GET /prometheus/datasource (test), POST /search, /query P1
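
When verifying AC-8.1.3, it helps to have an independent reference value for p50/p95/p99. The sketch below computes nearest-rank percentiles from raw latency samples. It is a test aid under that assumption, not the method the parsed-metrics endpoint uses (which may derive quantiles from histogram buckets instead), and `percentile` is a hypothetical helper.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of data at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def latency_summary(samples: Sequence[float]) -> dict[str, float]:
    return {name: percentile(samples, p) for name, p in (("p50", 50), ("p95", 95), ("p99", 99))}
```

Values from bucketed histograms are approximations, so a comparison against this reference should allow a tolerance rather than require equality.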

8.2 Distributed Tracing

Status: Complete

# Criterion Verification Priority
AC-8.2.1 OpenTelemetry traces are initialized and exportable GET /api/instrumentation/health P0
AC-8.2.2 Every request gets a trace ID linking middleware → auth → routing → provider → billing Inspect trace in Tempo P1
AC-8.2.3 Exemplar linking from metrics to traces works Verify in Grafana P2

8.3 Error Tracking

Status: Complete

# Criterion Verification Priority
AC-8.3.1 Autonomous error monitor status is retrievable GET /error-monitor/autonomous/status P0
AC-8.3.2 Dashboard provides error landscape overview GET /error-monitor/dashboard P0
AC-8.3.3 Recent errors sorted by recency GET /error-monitor/errors/recent P0
AC-8.3.4 Critical errors flagged separately GET /error-monitor/errors/critical P0
AC-8.3.5 Error patterns detect recurring issues GET /error-monitor/errors/patterns P1
AC-8.3.6 AI fix suggestions generated via Claude POST /error-monitor/fixes/generate-for-error P2

Note: No error monitor endpoint requires auth (all are public). Error patterns are held in memory only and are lost on restart.

8.4 AI-Specific Tracing

Status: Partial

# Criterion Verification Priority
AC-8.4.1 Arize Phoenix config exists and is functional Check Arize initialization P2
AC-8.4.2 OpenTelemetry captures inference metadata (model, tokens, latency) Inspect trace attributes P1

Known Issues: Arize Phoenix is not exposed via the API. Braintrust is not integrated. Prompt/response pairs are not recorded.

8.5 Profiling

Status: Complete

# Criterion Verification Priority
AC-8.5.1 Pyroscope profiling tags cache/Redis layers with operation context Verify tag presence in Pyroscope P1
AC-8.5.2 Profiling does not add measurable latency to requests Compare request times with/without profiling P1

8.6 Customer-Facing Observability

Status: Partial

# Criterion Verification Priority
AC-8.6.1 User can view their own usage dashboard data GET /user/activity/stats, GET /user/monitor P0
AC-8.6.2 Model health status visible to users GET /v1/model-health P0
AC-8.6.3 Public status page with provider/model availability GET /v1/status/, GET /v1/status/providers P0
AC-8.6.4 Latency percentiles exposed to customers GET /user/latency?model=... P2 (NOT IMPLEMENTED)

Layer 9: API Compatibility

9.1 OpenAI-Compatible API

Status: Complete

What it does: POST /v1/chat/completions — full drop-in replacement. Streaming SSE, tool/function calling, JSON mode, logprobs. Any OpenAI SDK app works by changing base URL.

# Criterion Verification Priority
AC-9.1.1 Non-streaming returns 200 with choices[0].message.content, usage.prompt_tokens, usage.completion_tokens POST with stream: false P0
AC-9.1.2 Streaming returns SSE where each event line starts with data: and the stream ends with data: [DONE] POST with stream: true P0
AC-9.1.3 response_format: {"type": "json_object"} returns valid parseable JSON POST with JSON mode P0
AC-9.1.4 tools array returns tool_calls when model decides to call a tool POST with tool definitions P0
AC-9.1.5 logprobs: true returns a logprobs field POST with logprobs P1
AC-9.1.6 OpenAI Python SDK works with zero changes beyond base_url and api_key openai.OpenAI(base_url="$BASE/v1") P0
AC-9.1.7 All inference errors use OpenAI-compatible format: {"error": {"message": "...", "type": "...", "code": "..."}} Trigger errors, inspect format P1 (KNOWN ISSUE)
AC-9.1.8 Unauthenticated request with whitelisted model returns 200 POST without auth header P0
AC-9.1.9 Unauthenticated request with non-whitelisted model returns 401/403 POST without auth header P0
AC-9.1.10 Streaming normalization handles OpenAI, Gemini, Anthropic, Fireworks formats Test stream from each provider type P0
AC-9.1.11 Unrecognized streaming format logs a warning (not silently dropped) Check logs for dropped chunks P1 (KNOWN BUG)
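
AC-9.1.2 can be checked mechanically against a captured response body. The sketch below parses an OpenAI-style SSE stream and fails if a line lacks the `data: ` prefix or the stream does not end with `data: [DONE]`. The function names are hypothetical. It deliberately ignores SSE comment and keep-alive lines, which a production parser may need to accept.

```python
import json


def parse_sse_stream(raw: str) -> list[dict]:
    """Return the JSON chunks of a stream, enforcing the data:/[DONE] framing."""
    chunks: list[dict] = []
    done = False
    for line in raw.splitlines():
        if not line.strip():
            continue  # blank lines separate SSE events
        if done:
            raise ValueError("data received after [DONE]")
        if not line.startswith("data: "):
            raise ValueError(f"unexpected line: {line!r}")
        payload = line[len("data: "):]
        if payload == "[DONE]":
            done = True
            continue
        chunks.append(json.loads(payload))
    if not done:
        raise ValueError("stream did not end with data: [DONE]")
    return chunks


def collect_text(chunks: list[dict]) -> str:
    """Concatenate choices[0].delta.content across chunks."""
    parts = []
    for chunk in chunks:
        choices = chunk.get("choices") or []
        if choices:
            parts.append(choices[0].get("delta", {}).get("content") or "")
    return "".join(parts)
```

Running the same check against a stream from each provider type gives a basic harness for AC-9.1.10.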

Known Issues:

  • P1-2: ~5% of errors use FastAPI default {"detail": "..."} instead of OpenAI format — breaks SDK error handling
  • P1-8: Stream normalizer returns None for unrecognized chunks (silently dropped, no warning)
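
One way to close P1-2 is to normalize every error body at the edge. The sketch below wraps a FastAPI-style `{"detail": ...}` body in the OpenAI error envelope from AC-9.1.7. `to_openai_error` is a hypothetical helper, and the `type` and `code` values it picks are assumptions, not values taken from the codebase.

```python
import json


def to_openai_error(body: dict, status_code: int) -> dict:
    """Return body in {"error": {"message", "type", "code"}} form."""
    existing = body.get("error")
    if isinstance(existing, dict) and "message" in existing:
        return body  # already in OpenAI format
    detail = body.get("detail", "Unknown error")
    # FastAPI validation errors carry a list/dict in "detail"; serialize those.
    message = detail if isinstance(detail, str) else json.dumps(detail)
    error_type = "invalid_request_error" if 400 <= status_code < 500 else "api_error"
    return {"error": {"message": message, "type": error_type, "code": str(status_code)}}
```

A test for AC-9.1.7 can then assert that no inference error response contains a top-level `detail` key.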

9.2 Anthropic-Compatible API

Status: Complete

# Criterion Verification Priority
AC-9.2.1 Non-streaming returns 200 with content[0].text, usage.input_tokens, usage.output_tokens in Anthropic format POST /v1/messages P0
AC-9.2.2 Streaming returns SSE in Anthropic format (message_start, content_block_delta, message_stop) POST with stream: true P0
AC-9.2.3 Credits deducted using Anthropic token counts Compare balance before/after P0
AC-9.2.4 Anthropic Python SDK works with zero changes beyond base_url and api_key anthropic.Anthropic(base_url="$BASE/v1") P0

Layer 10: Infrastructure & Deployment

10.1 Multi-Region Routing

Status: Not Implemented (Deferred)

# Criterion Verification Priority
AC-10.1.1 Requests routed to nearest provider region for lowest latency Test from different regions Deferred (D-15)

10.2 Data Residency

Status: Not Implemented (Deferred)

# Criterion Verification Priority
AC-10.2.1 EU customers' requests routed to EU-based providers Test with EU IP Deferred (D-16)

10.3 Multi-Target Deployment

Status: Complete

# Criterion Verification Priority
AC-10.3.1 Vercel serverless deployment works via api/index.py Deploy to Vercel P0
AC-10.3.2 Railway/Docker deployment works via start.sh Deploy to Railway P0
AC-10.3.3 Dev server starts with python src/main.py or uvicorn src.main:app --reload Start locally P0

Cross-Cutting Features

CC-1: Stripe Payments

Status: Complete

# Criterion Verification Priority
AC-CC.1.1 GET /api/stripe/credit-packages returns available packages (public, no auth) Inspect response P0
AC-CC.1.2 POST /api/stripe/checkout-session returns valid Stripe checkout URL Create session P0
AC-CC.1.3 Successful payment webhook adds credits to user's balance Simulate payment_intent.succeeded webhook P0
AC-CC.1.4 Webhook endpoint ALWAYS returns 200, even if processing fails Send malformed webhook P0
AC-CC.1.5 Payment history is paginated with amount, date, status GET /api/stripe/payments P0
AC-CC.1.6 Subscription checkout creates Stripe subscription and assigns plan POST /api/stripe/subscription-checkout P0
AC-CC.1.7 Subscription upgrade/downgrade/cancel work Test each operation P1
AC-CC.1.8 Webhook handles all events: payment_intent.succeeded, charge.succeeded, invoice.paid, customer.subscription.created Test each event type P0
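
AC-CC.1.4 means a processing failure must be logged and swallowed rather than surfaced as a non-200 status. The sketch below shows that shape in plain Python. `handle_webhook` and the `process` callback are hypothetical, and signature verification, which a real Stripe endpoint needs, is omitted for brevity.

```python
import json
import logging
from typing import Callable

logger = logging.getLogger("stripe_webhook")

# Event types listed in AC-CC.1.8.
HANDLED_EVENTS = {
    "payment_intent.succeeded",
    "charge.succeeded",
    "invoice.paid",
    "customer.subscription.created",
}


def handle_webhook(raw_body: bytes, process: Callable[[dict], None]) -> tuple[int, dict]:
    """Always return 200; report whether the event was actually processed."""
    try:
        event = json.loads(raw_body)
        if event["type"] in HANDLED_EVENTS:
            process(event)
            return 200, {"received": True, "processed": True}
        return 200, {"received": True, "processed": False}
    except Exception:
        # Malformed payloads and handler failures are logged, never re-raised.
        logger.exception("webhook processing failed")
        return 200, {"received": True, "processed": False}
```

Returning 200 on failure stops the sender from retrying, so any event that fails here has to be recoverable from the log or a reconciliation job.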

CC-2: Coupons

Status: Complete

# Criterion Verification Priority
AC-CC.2.1 Valid coupon redeems and adds correct credit amount POST /coupons/redeem P0
AC-CC.2.2 Expired coupon returns 400 Redeem expired code P0
AC-CC.2.3 Already-redeemed coupon (same user) returns 400 Redeem twice P0
AC-CC.2.4 User-specific coupon redeemed by wrong user returns 400/403 Redeem with different user P0
AC-CC.2.5 GET /coupons/available returns global + user-targeted coupons Inspect response P1
AC-CC.2.6 Redemption history shows past redemptions GET /coupons/history P1

CC-3: Referrals

Status: Complete

# Criterion Verification Priority
AC-CC.3.1 User generates unique referral code POST /referral/generate P0
AC-CC.3.2 Referral code validates successfully POST /referral/validate P0
AC-CC.3.3 Self-referral is prevented Attempt self-referral P0
AC-CC.3.4 Referral stats show total referred, conversions, rewards GET /referral/stats P1
AC-CC.3.5 Successful referral grants $10 credits to both parties on first $10+ purchase Complete referral flow P0

CC-4: Chat History & Sessions

Status: Complete

# Criterion Verification Priority
AC-CC.4.1 Sessions can be created, listed, updated, deleted CRUD on /v1/chat/sessions/* P0
AC-CC.4.2 Messages can be saved individually and in batch POST single and batch P0
AC-CC.4.3 Full-text search returns matching sessions POST /v1/chat/search P0
AC-CC.4.4 Duplicate messages are deduplicated Save same message twice, verify single entry P1
AC-CC.4.5 Chat stats return accurate usage data GET /v1/chat/stats P1
AC-CC.4.6 Share links provide public read-only access Create share, access without auth P1
AC-CC.4.7 Feedback CRUD (create, read, update, delete) works per session CRUD on /v1/chat/feedback/* P1

CC-5: API Key Management

Status: Complete

# Criterion Verification Priority
AC-CC.5.1 Created key is in gw_{env}_* format POST /user/api-keys P0
AC-CC.5.2 Key creation rate-limited to 10 per hour; 11th returns 429 Create 11 keys P0
AC-CC.5.3 Keys can be listed showing all active keys GET /user/api-keys P0
AC-CC.5.4 Keys can be updated (name, restrictions) PUT /user/api-keys/{key_id} P0
AC-CC.5.5 Keys can be deleted DELETE /user/api-keys/{key_id} P0
AC-CC.5.6 Deleted key no longer authenticates (returns 401) Use deleted key P0
AC-CC.5.7 Audit logs record key creation, usage, deletion GET /user/api-keys/audit-logs P1
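
The limit in AC-CC.5.2 (10 creations per hour, 429 on the 11th) can be modelled with a per-user sliding window. Whether the real limiter uses a sliding or a fixed window is not stated here, so the sketch below is one assumed interpretation and `KeyCreationLimiter` is a hypothetical class.

```python
from collections import defaultdict, deque


class KeyCreationLimiter:
    """Sliding-window limiter: at most max_per_window creations per window."""

    def __init__(self, max_per_window: int = 10, window_seconds: float = 3600.0):
        self.max_per_window = max_per_window
        self.window_seconds = window_seconds
        self._events: dict[str, deque] = defaultdict(deque)

    def try_create(self, user_id: str, now: float) -> int:
        """Return the HTTP status the create endpoint would answer with."""
        events = self._events[user_id]
        # Drop timestamps that have left the window.
        while events and now - events[0] >= self.window_seconds:
            events.popleft()
        if len(events) >= self.max_per_window:
            return 429
        events.append(now)
        return 200
```

Passing `now` explicitly keeps the limiter testable without sleeping for an hour.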

CC-6: Image Generation

Status: Complete

# Criterion Verification Priority
AC-CC.6.1 POST /v1/images/generations returns 200 with image data or URL POST with prompt P0
AC-CC.6.2 Credits deducted based on image generation pricing Compare balance before/after P0
AC-CC.6.3 0-credit user receives 402 POST with 0-credit user P0

CC-7: Audio Transcription

Status: Complete

# Criterion Verification Priority
AC-CC.7.1 File upload transcription returns 200 with text POST with audio file P0
AC-CC.7.2 Base64 transcription returns 200 POST /v1/audio/transcriptions/base64 P0
AC-CC.7.3 Unsupported format returns appropriate error POST with invalid format P1

CC-8: Server-Side Tools

Status: Complete

# Criterion Verification Priority
AC-CC.8.1 GET /v1/tools returns available tools (web_search, text_to_speech) Inspect response P0
AC-CC.8.2 Tool definitions in OpenAI function-calling format GET /v1/tools/definitions P0
AC-CC.8.3 Nonexistent tool returns 404 GET /v1/tools/fake_tool P0
AC-CC.8.4 Web search execution returns results POST /v1/tools/execute with web_search P0
AC-CC.8.5 SSRF protection blocks internal/private IP ranges Attempt internal URL in tool execution P0
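
A test for AC-CC.8.5 needs a clear definition of "internal/private". The sketch below classifies a URL using the standard-library `ipaddress` module. `is_url_allowed` and `is_blocked_host` are hypothetical helpers. The sketch only inspects literal IPs and the name `localhost`, so a complete defense would also resolve hostnames and check every resolved address.

```python
import ipaddress
from urllib.parse import urlsplit


def is_blocked_host(host: str) -> bool:
    """True for loopback, private, link-local, and other non-public targets."""
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal. DNS resolution is out of scope for this sketch.
        lowered = host.lower()
        return lowered == "localhost" or lowered.endswith(".localhost")
    return (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_reserved
        or ip.is_multicast
        or ip.is_unspecified
    )


def is_url_allowed(url: str) -> bool:
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    return not is_blocked_host(parts.hostname)
```

Useful negative cases for the test suite include `http://127.0.0.1/`, `http://10.0.0.5/`, `http://169.254.169.254/` (cloud metadata), and `http://[::1]/`.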

CC-9: Partner Trials

Status: Complete

# Criterion Verification Priority
AC-CC.9.1 Partner config is publicly accessible GET /partner-trials/config/{code} P0
AC-CC.9.2 Partner code check always returns 200 (valid/invalid in body) GET /partner-trials/check/{code} P0
AC-CC.9.3 Starting partner trial applies partner-specific credits and limits POST /partner-trials/start with known partner code P0
AC-CC.9.4 Partner trial daily limit is enforced Exceed daily limit P0
AC-CC.9.5 Partner trial config is cached (5-min in-memory) Check timing P1
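
The "check timing" verification for AC-CC.9.5 is easier to reason about with a concrete model of a 5-minute in-memory cache. The sketch below is a generic TTL cache with an injectable clock. `TTLCache` and `get_or_load` are hypothetical names and do not describe the actual partner-trial cache.

```python
import time
from typing import Any, Callable


class TTLCache:
    """In-memory cache whose entries expire ttl_seconds after being loaded."""

    def __init__(self, ttl_seconds: float = 300.0, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: dict[str, tuple[float, Any]] = {}

    def get_or_load(self, key: str, loader: Callable[[str], Any]) -> Any:
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # fresh entry: no loader call
        value = loader(key)  # missing or expired: reload and restamp
        self._store[key] = (now, value)
        return value
```

With a fake clock, a test can assert that two lookups inside five minutes hit the loader once and that a lookup after 300 seconds hits it again, without real waiting.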

CC-10: Notifications

Status: Complete (partial test coverage)

# Criterion Verification Priority
AC-CC.10.1 User can retrieve notification preferences GET /user/notifications/preferences P0
AC-CC.10.2 Usage report can be triggered on demand POST /user/notifications/send-usage-report P0
AC-CC.10.3 Test notification sends successfully POST /user/notifications/test P0
AC-CC.10.4 Notification failure does not crash the system Disable Resend, verify graceful handling P0
AC-CC.10.5 Retry logic on notification delivery failure Verify 2-3 retries with backoff P2 (NOT IMPLEMENTED)

Known Issues:

  • P2-4 (Delta Report): No retry logic, no persistent delivery tracking. On failure: logs error, returns False, continues silently.
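
AC-CC.10.5 asks for 2-3 retries with backoff, which is not implemented today. The sketch below shows one possible shape, assuming a `send` callable that returns True on success. `send_with_retry` and its parameters are hypothetical, and the delays are illustrative.

```python
import time
from typing import Callable


def send_with_retry(
    send: Callable[[], bool],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try send() up to max_attempts times with exponential backoff.

    Never raises: a failing notification must not crash the caller (AC-CC.10.4).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            if send():
                return True
        except Exception:
            pass  # treat an exception like a failed attempt
        if attempt < max_attempts:
            sleep(base_delay * 2 ** (attempt - 1))  # 0.5s, 1s, 2s, ...
    return False
```

Injecting `sleep` lets a test record the backoff schedule instead of waiting for it.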

CC-11: Admin Operations

Status: Complete

# Criterion Verification Priority
AC-CC.11.1 Non-admin users receive 403 on ALL admin endpoints Use user key on admin endpoint P0
AC-CC.11.2 Admin can list, search, view user details GET /admin/users, /admin/users/{id} P0
AC-CC.11.3 Admin credit grants respect per-transaction cap and 24h daily limit Exceed limits P0
AC-CC.11.4 Admin can assign plans POST /admin/assign-plan P0
AC-CC.11.5 System monitor returns user counts, credit totals, API usage GET /admin/monitor P0
AC-CC.11.6 Cache operations work (status, refresh, clear) GET/POST cache endpoints P1
AC-CC.11.7 Model sync can be triggered POST /admin/model-sync/trigger P1
AC-CC.11.8 GET /admin/model-sync/providers requires admin auth Verify auth enforcement P0 (KNOWN RISK)
AC-CC.11.9 Bulk user delete by domain respects protected domains (gmail, yahoo, outlook) Attempt protected domain delete P0
AC-CC.11.10 Bulk user delete defaults to dry_run=true Verify default behavior P0

Known Issues:

  • P0-6 (Delta Report): GET /admin/model-sync/providers documented as "No auth enforced" — leaks infrastructure details (33 providers).
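
AC-CC.11.9 and AC-CC.11.10 describe two independent safety rails for bulk delete: a protected-domain check and a dry-run default. The sketch below models both against an in-memory user list. `bulk_delete_by_domain` is a hypothetical function, and reading "gmail, yahoo, outlook" as the `.com` domains is an assumption.

```python
# Assumed expansion of the protected list named in AC-CC.11.9.
PROTECTED_DOMAINS = {"gmail.com", "yahoo.com", "outlook.com"}


def bulk_delete_by_domain(users: list[dict], domain: str, dry_run: bool = True) -> dict:
    """Delete users whose email is at `domain`; only report unless dry_run=False."""
    domain = domain.lower().lstrip("@")
    if domain in PROTECTED_DOMAINS:
        raise PermissionError(f"{domain} is a protected domain")
    matches = [u for u in users if u["email"].lower().endswith("@" + domain)]
    if not dry_run:
        for user in matches:
            users.remove(user)
    return {
        "dry_run": dry_run,
        "matched": len(matches),
        "deleted": 0 if dry_run else len(matches),
    }
```

Calling the function without the `dry_run` argument must leave the user list unchanged, which is what the "verify default behavior" check should assert.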

CC-12: Security

Status: Complete

# Criterion Verification Priority
AC-CC.12.1 API keys are Fernet-encrypted in DB Query DB directly P0
AC-CC.12.2 API key lookup uses HMAC hash, not brute-force decryption Verify code path P0
AC-CC.12.3 SQL injection attempts are sanitized/rejected '; DROP TABLE users; -- in inputs P0
AC-CC.12.4 XSS payloads are sanitized/rejected <script>alert(1)</script> in inputs P0
AC-CC.12.5 Command injection blocked ; rm -rf / in inputs P0
AC-CC.12.6 Path traversal blocked ../../etc/passwd in inputs P0
AC-CC.12.7 Error messages never expose stack traces, internal paths, or sensitive data Trigger errors, inspect responses P0
AC-CC.12.8 Admin security violations logged in audit trail Attempt unauthorized admin access P0
AC-CC.12.9 Temporary/disposable email domains detected during registration Register with user@tempmail.com P1
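
AC-CC.12.2 depends on a deterministic keyed hash being stored next to the encrypted key, so that lookup is an index hit rather than a decrypt-and-compare scan. The sketch below shows the hashing step with the standard-library `hmac` module. The function names, the secret, and the dict used as an index are hypothetical stand-ins.

```python
import hashlib
import hmac


def lookup_hash(api_key: str, secret: bytes) -> str:
    """Keyed HMAC-SHA256 digest used as the lookup column value."""
    return hmac.new(secret, api_key.encode("utf-8"), hashlib.sha256).hexdigest()


def find_key_record(api_key: str, secret: bytes, index: dict[str, dict]):
    """Look up a record by hash; `index` stands in for an indexed DB column."""
    return index.get(lookup_hash(api_key, secret))
```

Because the digest is keyed, someone who obtains the table but not the secret cannot precompute hashes for guessed keys.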

CC-13: Google Vertex Function Calling

Status: Partial

# Criterion Verification Priority
AC-CC.13.1 REST path function calling works (OpenAI tools → Vertex functionDeclarations) POST with tools to Vertex model via REST P0
AC-CC.13.2 SDK path function calling either works OR is avoided when tools present POST with tools via SDK path P1 (KNOWN BUG)
AC-CC.13.3 Tool choice options (auto, required, none) are translated correctly Test each tool_choice value P1

Code References:

  • src/services/google_vertex_client.py (lines 250-402, 662-707) — REST path implemented
  • Lines 585-587 — SDK path has TODO: "Function calling may not work correctly"

Known Issues:

  • P1-5 (Delta Report): SDK path has TODO. If SDK path is used when tools are present, function calling silently fails.
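
For AC-CC.13.1 and AC-CC.13.3, the translation under test maps OpenAI `tools` and `tool_choice` to Vertex `functionDeclarations` and a function-calling mode. The sketch below reflects the publicly documented Vertex REST field names as understood here. It is not the code in src/services/google_vertex_client.py. `openai_tools_to_vertex` is a hypothetical name, and only the three string values of `tool_choice` are handled.

```python
# OpenAI tool_choice -> Vertex functionCallingConfig.mode (assumed mapping).
_MODE_BY_TOOL_CHOICE = {"auto": "AUTO", "required": "ANY", "none": "NONE"}


def openai_tools_to_vertex(tools: list[dict], tool_choice: str = "auto") -> dict:
    """Build the tools/toolConfig fragment of a Vertex REST request body."""
    if tool_choice not in _MODE_BY_TOOL_CHOICE:
        raise ValueError(f"unsupported tool_choice: {tool_choice!r}")
    declarations = []
    for tool in tools:
        if tool.get("type") != "function":
            continue  # only function tools are translated in this sketch
        fn = tool["function"]
        decl = {"name": fn["name"]}
        if "description" in fn:
            decl["description"] = fn["description"]
        if "parameters" in fn:
            decl["parameters"] = fn["parameters"]  # JSON Schema passed through as-is
        declarations.append(decl)
    return {
        "tools": [{"functionDeclarations": declarations}],
        "toolConfig": {
            "functionCallingConfig": {"mode": _MODE_BY_TOOL_CHOICE[tool_choice]}
        },
    }
```

A test for AC-CC.13.3 can call the translator once per `tool_choice` value and compare the resulting mode against the expected one.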

Summary Matrix

Layer Feature Criteria Status Known Issues
1 API Key Auth 10 Complete
1 RBAC 6 Complete
1 IP Allowlists 6 Complete
1 Domain Restrictions 4 Complete
1 Three-Layer Rate Limiting 14 Complete P0-5: Missing headers on L2/L3
1 Input Guardrails (4 features) 4 Not Implemented Deferred
1 Output Guardrails (3 features) 3 Not Implemented Deferred
2 Model Resolution 7 Complete
2 General Router 7 Complete
2 Code Router 7 Complete
2 Provider Failover 9 Complete
2 Circuit Breakers 10 Complete P1-7: Timing discrepancy (60s vs 5min)
2 Health-Weighted LB 2 Partial
2 Latency/Cost Optimal 2 Partial Hardcoded latency model
2 Traffic Splitting 1 Not Implemented Deferred
3 Tiered Health Monitoring 10 Complete
3 Passive Health Capture 2 Complete
3 Incident Management 5 Complete
3 Model Quality Scoring 2 Partial Static/hardcoded
3 Per-Customer Quality 1 Not Implemented Deferred
3 Provider Credit Monitoring 3 Partial P1-1: OpenRouter only
4 Semantic Cache 1 Not Implemented Deferred
4 Exact-Match Cache 1 Not Implemented Deferred (infra exists)
4 Butter.dev Cache 2 Partial P0-1: Ghost feature
4 Supporting Caches 5 Complete
5 Background Model Sync 4 Complete
5 Model Metadata Standard 2 Complete
5 Catalog Inclusion 3 Partial P1-3: No gating at sync
5 HuggingFace Enrichment 2 Complete
5 Model Discovery & Search 5 Complete
6 Credit System 14 Complete P0-2/3/4: Atomicity, pricing guard, refund
6 Plans & Tiers 9 Complete P0-7: Config mismatch
6 Customer Usage Analytics 5 Partial P1-4: Pagination bug, P2-1/2: Per-key, export
6 Customer Webhooks 1 Not Implemented Deferred
6 SLA Tracking 1 Not Implemented Deferred
7 Prompt Management 1 Not Implemented Deferred
7 Batch/Async Inference 1 Not Implemented Deferred
7 Evaluation & Testing 1 Not Implemented Deferred
7 Playground 1 Not Implemented Deferred
8 Metrics & Dashboards 7 Complete
8 Distributed Tracing 3 Complete
8 Error Tracking 6 Complete
8 AI-Specific Tracing 2 Partial Arize/Braintrust gaps
8 Profiling 2 Complete
8 Customer Observability 4 Partial P2-3: No latency API
9 OpenAI-Compatible API 11 Complete P1-2: Error format, P1-8: Stream drops
9 Anthropic-Compatible API 4 Complete
10 Multi-Region Routing 1 Not Implemented Deferred
10 Data Residency 1 Not Implemented Deferred
10 Multi-Target Deployment 3 Complete
CC Stripe Payments 8 Complete
CC Coupons 6 Complete
CC Referrals 5 Complete
CC Chat History 7 Complete
CC API Key Management 7 Complete
CC Image Generation 3 Complete
CC Audio Transcription 3 Complete
CC Server-Side Tools 5 Complete
CC Partner Trials 5 Complete
CC Notifications 5 Complete P2-4: No retry/delivery tracking
CC Admin Operations 10 Complete P0-6: Model-sync providers auth
CC Security 9 Complete
CC Google Vertex FC 3 Partial P1-5: SDK path TODO
TOTAL 323

Priority Summary

Priority Count Description
P0 7 bugs across 46 criteria Ghost features, billing atomicity, pricing guard, refund verification, rate limit headers, admin auth, trial config
P1 8 bugs across 28 criteria Provider monitoring, error format, catalog gating, pagination, Vertex FC, overage, circuit breaker timing, stream normalization
P2 4 gaps across 12 criteria Per-key usage, export, latency API, notification delivery
Deferred 20 features, 24 criteria Guardrails, caching, webhooks, batch, prompts, eval, SLA, geo-routing, GDPR, traffic splitting

Source: Conceptual Model Features | Features | Delta Report | Testing Plan | Acceptance Criteria
