Acceptance Criteria

arminrad edited this page Mar 8, 2026 · 2 revisions

Formal acceptance criteria for every Gatewayz feature area, derived from the Conceptual Model, Features, and Testing Plan.

Each feature has: a description of what "done" looks like, measurable acceptance criteria, and the corresponding test plan references.

Last updated: 2026-03-08


How to Read This Document

Each feature section follows this format:

  • Description: What the feature does (from the Conceptual Model)
  • Acceptance Criteria: Numbered, testable statements. A feature is accepted when ALL criteria pass.
  • Test Plan Refs: Links to specific test cases in the Testing Plan
  • Automated Tests: Whether automated tests exist (from Test Coverage Audit)

1. Authentication & Authorization

Description

Users authenticate via Privy (email, phone, OAuth) and receive an API key. API keys are encrypted at rest with Fernet (AES-128). Keys are looked up via HMAC-SHA256 hash. Role-based access control (admin, developer, free) governs endpoint access.
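The hash-based lookup described above can be illustrated with the standard library alone. This is a minimal sketch, not the Gatewayz implementation: `LOOKUP_SECRET`, `lookup_hash`, `key_index`, and `find_user` are hypothetical names, and a plain dict stands in for an indexed database column. Fernet encryption of the stored key is omitted.

```python
import hashlib
import hmac

# Hypothetical server-side secret; a real deployment would load it from a secret store.
LOOKUP_SECRET = b"example-lookup-secret"


def lookup_hash(api_key: str, secret: bytes = LOOKUP_SECRET) -> str:
    """Return the hex HMAC-SHA256 digest used as the lookup index for an API key."""
    return hmac.new(secret, api_key.encode("utf-8"), hashlib.sha256).hexdigest()


# Toy stand-in for an indexed column: digest -> user record.
key_index = {lookup_hash("gw_dev_abc123"): {"user_id": 1, "role": "developer"}}


def find_user(api_key: str):
    """Find a user by the digest of the presented key; no stored key is decrypted."""
    return key_index.get(lookup_hash(api_key))
```

Because the digest is deterministic, one indexed lookup finds the row, so the encrypted keys never need to be decrypted one by one.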

Acceptance Criteria

| # | Criterion | Verification |
| --- | --- | --- |
| AC-1.1 | A valid Privy token returns 200 with api_key, credits, and subscription_status fields | POST /auth with valid token |
| AC-1.2 | A first-time user receives exactly $5.00 in credits and a trial expiring 3 days from creation | POST /auth with new identity, check credits and trial_end |
| AC-1.3 | An invalid or expired Privy token returns 401 or 403, never 200 | POST /auth with bad token |
| AC-1.4 | Auth endpoint rate-limits to 10 requests per 15 minutes per IP; the 11th returns 429 | POST /auth 11 times from same IP |
| AC-1.5 | Registration rate-limits to 3 requests per hour per IP; the 4th returns 429 | POST /auth/register 4 times |
| AC-1.6 | API keys stored in the database are Fernet-encrypted; raw DB values are ciphertext, never the plaintext key | Query api_keys_new table directly |
| AC-1.7 | API key lookup uses HMAC-SHA256 hash, not decryption of all keys | Verify code path or timing (O(log n) not O(n)) |
| AC-1.8 | Non-admin API keys receive 403 when accessing /admin/* endpoints | GET /admin/users with $USER_KEY |
| AC-1.9 | Admin API keys receive 200 when accessing /admin/* endpoints | GET /admin/users with $ADMIN_KEY |
| AC-1.10 | Auth health endpoint returns 200 with DB, Redis, cache, and timeout status even when dependencies are degraded | GET /auth/health |
| AC-1.11 | Temporary/disposable email domains are detected and flagged during registration | Register with user@tempmail.com |

Test Plan Refs: 2.1–2.6, 24.1.1–24.1.2, 25.15, 25.17

Automated Tests: ✅ Full coverage


2. Chat & Inference — OpenAI-Compatible

Description

POST /v1/chat/completions accepts OpenAI-format requests and returns OpenAI-format responses. Supports streaming (SSE), JSON mode, tool/function calling, logprobs. Any application built for the OpenAI API works by changing only the base URL.

Acceptance Criteria

| # | Criterion | Verification |
| --- | --- | --- |
| AC-2.1 | Non-streaming request returns 200 with choices[0].message.content, usage.prompt_tokens, usage.completion_tokens | POST with stream: false |
| AC-2.2 | Streaming request returns an SSE stream where each line starts with "data: " and contains valid JSON, and the stream ends with data: [DONE] | POST with stream: true |
| AC-2.3 | response_format: {"type": "json_object"} returns content that is valid parseable JSON | POST with JSON mode |
| AC-2.4 | Request with tools array returns tool_calls in the response when the model decides to call a tool | POST with tool definitions |
| AC-2.5 | Request with logprobs: true returns a logprobs field in the response | POST with logprobs enabled |
| AC-2.6 | After a successful completion, user's credit balance decreases by (prompt_tokens × prompt_price) + (completion_tokens × completion_price) | Compare balance before and after |
| AC-2.7 | User with 0 credits receives 402 without a provider API call being made (pre-flight check) | POST with 0-credit user, verify no upstream call |
| AC-2.8 | User with expired trial receives 402 | POST with expired trial key |
| AC-2.9 | Request with nonexistent model returns 400 or 404, not 500 | POST with model: "nonexistent/model" |
| AC-2.10 | Unauthenticated request with a whitelisted model returns 200 | POST without auth header |
| AC-2.11 | Unauthenticated request with a non-whitelisted model returns 401 or 403 | POST without auth header |
| AC-2.12 | Exceeding RPM rate limit returns 429 with Retry-After header | Exceed rate limit |
| AC-2.13 | OpenAI Python SDK (openai.OpenAI(base_url="$BASE/v1")) works with zero code changes beyond base URL and API key | Run OpenAI SDK test |
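A check for the streaming criterion (AC-2.2) could be sketched as below. The helper name `iter_sse_chunks` is hypothetical and the function validates already-received lines rather than opening a connection. It assumes the stream contains only `data:` lines and blank separators, as the criterion states.

```python
import json
from typing import Iterable, Iterator


def iter_sse_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the decoded JSON payload of each SSE line until data: [DONE]."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue  # blank lines separate SSE events
        if not line.startswith("data: "):
            raise ValueError(f"unexpected SSE line: {line!r}")
        payload = line[len("data: "):]
        if payload == "[DONE]":
            return  # terminator seen: the stream ended cleanly
        yield json.loads(payload)  # raises ValueError if the payload is not JSON
    raise ValueError("stream ended without data: [DONE]")
```

A test can feed the response body line by line and treat any `ValueError` as a failed criterion.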

Test Plan Refs: 3.1.1–3.1.12, 25.1, 25.18

Automated Tests: ✅ Full coverage


3. Chat & Inference — Anthropic-Compatible

Description

POST /v1/messages accepts Anthropic-format requests and returns Anthropic-format responses. Supports streaming in Anthropic SSE event format.

Acceptance Criteria

| # | Criterion | Verification |
| --- | --- | --- |
| AC-3.1 | Non-streaming request returns 200 with content[0].text, usage.input_tokens, usage.output_tokens in Anthropic format | POST /v1/messages |
| AC-3.2 | Streaming request returns SSE events in Anthropic format (message_start, content_block_delta, message_stop) | POST with stream: true |
| AC-3.3 | Credits are deducted using Anthropic token counts | Compare balance before and after |
| AC-3.4 | Anthropic Python SDK (anthropic.Anthropic(base_url="$BASE/v1")) works with zero code changes beyond base URL and API key | Run Anthropic SDK test |

Test Plan Refs: 3.2.1–3.2.2, 25.2

Automated Tests: ✅ Full coverage


4. Model Resolution & Aliasing

Description

120+ short aliases map to canonical model IDs. Provider detection follows: explicit overrides → format-based rules → mapping tables → org-prefix fallbacks. Model IDs are transformed to each provider's native format.
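As an illustration of alias resolution, the sketch below uses only the two alias pairs named in this document. `MODEL_ALIASES` is referenced by AC-4.5, but the functions `resolve_model` and `self_referencing_aliases` are hypothetical, and provider detection is not covered.

```python
# Two alias pairs taken from the acceptance criteria; the real table has 120+ entries.
MODEL_ALIASES = {
    "gpt-4o": "openai/gpt-4o",
    "r1": "deepseek/deepseek-r1",
}


def resolve_model(model: str) -> str:
    """Map a short alias to its canonical ID; canonical IDs pass through unchanged."""
    return MODEL_ALIASES.get(model, model)


def self_referencing_aliases(aliases: dict) -> list:
    """Return aliases that map to themselves (should be empty)."""
    return [alias for alias, target in aliases.items() if alias == target]
```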

Acceptance Criteria

| # | Criterion | Verification |
| --- | --- | --- |
| AC-4.1 | gpt-4o resolves to openai/gpt-4o | POST chat with model: "gpt-4o" |
| AC-4.2 | r1 resolves to deepseek/deepseek-r1 | POST chat with model: "r1" |
| AC-4.3 | Canonical IDs (e.g., openai/gpt-4o) work directly without alias resolution | POST chat with canonical ID |
| AC-4.4 | Provider detection correctly routes google/gemini-* models to Vertex when credentials are available | POST chat with Gemini model |
| AC-4.5 | No alias maps to itself (no self-referencing loops) | Inspect MODEL_ALIASES dict |

Test Plan Refs: 3.3.1–3.3.2, 25.19

Automated Tests: ✅ Full coverage


5. Provider Failover

Description

When a provider fails, requests automatically retry with the next provider in a 14-provider prioritized chain. Circuit breakers prevent repeated calls to failing providers.

Acceptance Criteria

| # | Criterion | Verification |
| --- | --- | --- |
| AC-5.1 | When primary provider returns 502/503/504, request succeeds via fallback provider transparently | Force primary failure, verify success |
| AC-5.2 | When provider returns 401/402/403/404, request fails over to next provider | Force auth error on primary |
| AC-5.3 | When provider returns 400 (user error), request does NOT fail over; the 400 is returned to the user | Send malformed request |
| AC-5.4 | When provider returns 429, request retries with backoff and does NOT fail over | Trigger rate limit on provider |
| AC-5.5 | OpenAI models fail over only along OpenAI → OpenRouter (not arbitrary providers) | Inspect failover chain for OpenAI model |
| AC-5.6 | Anthropic models fail over only along Anthropic → OpenRouter | Inspect failover chain for Anthropic model |
| AC-5.7 | Open-source models can fail over across all providers | Inspect failover chain for open-source model |
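The status-code rules in AC-5.1 through AC-5.4 amount to a small decision table, sketched below. `Action` and `classify_provider_status` are hypothetical names. Returning any unlisted status to the caller is an assumption of this sketch, since the criteria only specify the codes shown.

```python
from enum import Enum


class Action(Enum):
    FAILOVER = "failover"
    RETRY_WITH_BACKOFF = "retry_with_backoff"
    RETURN_TO_USER = "return_to_user"


# Codes that AC-5.1 and AC-5.2 say should move the request to the next provider.
FAILOVER_STATUSES = frozenset({401, 402, 403, 404, 502, 503, 504})


def classify_provider_status(status: int) -> Action:
    """Decide what to do with an upstream provider response status."""
    if status == 429:
        return Action.RETRY_WITH_BACKOFF  # AC-5.4: retry, do not fail over
    if status in FAILOVER_STATUSES:
        return Action.FAILOVER
    # AC-5.3 covers 400; other unlisted codes are returned as-is (assumption).
    return Action.RETURN_TO_USER
```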

Test Plan Refs: 3.3.3, 25.3, 25.4

Automated Tests: ✅ Full coverage


6. Circuit Breakers

Description

Per-provider circuit breakers with states: CLOSED (healthy) → OPEN (blocked after 5 consecutive failures) → HALF_OPEN (testing recovery after 5-minute cooldown).
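The state machine above can be sketched as a small class. The thresholds (5 consecutive failures, 300-second cooldown) come from this section; the class name, method names, and the injectable `clock` parameter are assumptions made so the transitions can be tested without waiting five minutes.

```python
import time
from typing import Callable


class CircuitBreaker:
    FAILURE_THRESHOLD = 5    # consecutive failures before opening
    COOLDOWN_SECONDS = 300   # time in OPEN before probing recovery

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._state = "CLOSED"
        self._failures = 0
        self._opened_at = 0.0

    @property
    def state(self) -> str:
        # OPEN lazily becomes HALF_OPEN once the cooldown has elapsed.
        if self._state == "OPEN" and self._clock() - self._opened_at >= self.COOLDOWN_SECONDS:
            self._state = "HALF_OPEN"
        return self._state

    def record_success(self) -> None:
        if self.state != "OPEN":  # OPEN providers should receive no traffic
            self._state = "CLOSED"
            self._failures = 0

    def record_failure(self) -> None:
        current = self.state
        if current == "HALF_OPEN":
            self._trip()  # the recovery probe failed
        elif current == "CLOSED":
            self._failures += 1
            if self._failures >= self.FAILURE_THRESHOLD:
                self._trip()

    def reset(self) -> None:
        self._state = "CLOSED"
        self._failures = 0

    def _trip(self) -> None:
        self._state = "OPEN"
        self._opened_at = self._clock()
        self._failures = 0
```

Passing a fake clock, for example `CircuitBreaker(clock=lambda: now[0])`, lets a test advance time by assignment.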

Acceptance Criteria

| # | Criterion | Verification |
| --- | --- | --- |
| AC-6.1 | New provider starts in CLOSED state | GET /circuit-breakers/{new_provider} |
| AC-6.2 | After 5 consecutive failures, state transitions to OPEN | Send 5 failing requests, check state |
| AC-6.3 | OPEN state prevents requests from being routed to that provider | Verify provider is skipped in failover chain |
| AC-6.4 | After 300 seconds (5 min), OPEN transitions to HALF_OPEN | Wait 5 min, check state |
| AC-6.5 | In HALF_OPEN, a successful request transitions to CLOSED | Send successful request, check state |
| AC-6.6 | In HALF_OPEN, a failed request transitions back to OPEN | Send failing request, check state |
| AC-6.7 | Manual reset via POST /circuit-breakers/{provider}/reset returns state to CLOSED | Reset and verify |
| AC-6.8 | POST /circuit-breakers/reset-all resets all breakers to CLOSED | Reset all and verify |

Test Plan Refs: 5.1–5.4, 25.5–25.6

Automated Tests: ✅ Full coverage


7. Intelligent Routing

Description

Two routing systems: General Router (quality/cost/latency/balanced via NotDiamond) and Code Router (benchmark-driven, SWE-bench/HumanEval scored, tiered by complexity).

Acceptance Criteria

| # | Criterion | Verification |
| --- | --- | --- |
| AC-7.1 | router:general:quality selects a high-quality model and returns 200 | POST chat with router model |
| AC-7.2 | router:general:cost selects a cheaper model than router:general:quality for the same prompt | Compare selected models |
| AC-7.3 | router:general:latency selects a low-latency model | POST and verify model selection |
| AC-7.4 | router:general:balanced considers quality, cost, and latency | POST and verify selection |
| AC-7.5 | router:code:auto classifies prompt complexity and selects an appropriate tier model | POST with code prompt |
| AC-7.6 | router:code:quality selects the highest-tier code model | POST and verify |
| AC-7.7 | router:code:price selects a cost-effective code model | POST and verify |
| AC-7.8 | router:code:agentic selects a model optimized for agentic coding tasks | POST and verify |
| AC-7.9 | Code router tiers endpoint returns models with SWE-bench/HumanEval benchmark scores | GET /code-router/tiers |
| AC-7.10 | Router test endpoints return selected model + reasoning/rationale | POST /code-router/test, POST /general-router/test |

Test Plan Refs: 3.4.1–3.4.8, 6.1–6.5, 7.1–7.5

Automated Tests: ✅ Full coverage


8. Rate Limiting (Three Layers)

Description

Layer 1: IP-level (security middleware, behavioral analysis, velocity detection). Layer 2: API key-level (Redis-backed, per-plan limits). Layer 3: Anonymous (stricter limits for unauthenticated). Graceful degradation to in-memory fallback when Redis is down.

Acceptance Criteria

| # | Criterion | Verification |
| --- | --- | --- |
| AC-8.1 | Unauthenticated requests exceeding 300 RPM from same IP receive 429 | Exceed IP limit |
| AC-8.2 | Authenticated requests exceeding plan RPM receive 429 | Exceed key limit |
| AC-8.3 | Anonymous rate limits are stricter than authenticated limits | Compare thresholds |
| AC-8.4 | Authenticated users are exempt from IP-level rate limiting | Verify no IP block on auth users |
| AC-8.5 | Rate limit response includes Retry-After, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset headers | Inspect 429 response headers |
| AC-8.6 | When Redis is unavailable, rate limiting continues via in-memory fallback; requests are never blocked due to infrastructure failure | Stop Redis, verify rate limiting works |
| AC-8.7 | Velocity mode activates when error rate exceeds threshold (25%) and reduces limits | Trigger high error rate |
| AC-8.8 | Velocity mode deactivates after 3 minutes of normal error rates | Wait for cooldown |
| AC-8.9 | Rate limit configuration is viewable at GET /user/rate-limits | Check endpoint |
| AC-8.10 | Per-key rate limits can be updated via PUT /user/rate-limits/{key_id} | Update and verify |
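For the in-memory fallback path, one plausible shape is a sliding-window counter that also yields the value for a Retry-After header. This is a sketch under assumptions: the document does not say which algorithm Gatewayz uses, and `SlidingWindowLimiter`, its parameters, and the injectable `clock` are hypothetical.

```python
import math
import time
from collections import defaultdict, deque
from typing import Callable, Tuple


class SlidingWindowLimiter:
    def __init__(self, limit: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self._clock = clock
        self._hits = defaultdict(deque)  # key -> timestamps of accepted requests

    def check(self, key: str) -> Tuple[bool, int]:
        """Return (allowed, retry_after_seconds) for one request from `key`."""
        now = self._clock()
        hits = self._hits[key]
        while hits and now - hits[0] >= self.window:
            hits.popleft()  # drop requests that have left the window
        if len(hits) < self.limit:
            hits.append(now)
            return True, 0
        # The oldest accepted request decides when capacity frees up again.
        retry_after = math.ceil(hits[0] + self.window - now)
        return False, max(retry_after, 1)
```

The key can be an IP address or an API key ID, so the same class could serve more than one of the three layers with different limits.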

Test Plan Refs: 8.4.1–8.4.4, 25.7–25.10

Automated Tests: ✅ Full coverage


9. Credit System

Description

Credits are the atomic unit of billing. Cost = (prompt_tokens × prompt_price) + (completion_tokens × completion_price). Subscription allowance is consumed first, then purchased credits. The system runs pre-flight balance checks, deducts idempotently, and auto-refunds on provider errors.
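The cost formula, the allowance-first ordering, and idempotent deduction can be shown together in a few lines. This is an illustrative in-memory sketch: `request_cost` and `Ledger` are hypothetical names, the per-token prices in the usage note are made up, and a real system would do this inside a single database transaction (AC-9.8).

```python
from decimal import Decimal


def request_cost(prompt_tokens: int, completion_tokens: int,
                 prompt_price: Decimal, completion_price: Decimal) -> Decimal:
    """Cost = (prompt_tokens x prompt_price) + (completion_tokens x completion_price)."""
    return prompt_tokens * prompt_price + completion_tokens * completion_price


class Ledger:
    def __init__(self, subscription_allowance: Decimal, purchased_credits: Decimal):
        self.subscription_allowance = subscription_allowance
        self.purchased_credits = purchased_credits
        self._seen_request_ids = set()

    def deduct(self, request_id: str, cost: Decimal) -> bool:
        """Deduct once per request ID; return False if this ID was already charged."""
        if request_id in self._seen_request_ids:
            return False
        if cost > self.subscription_allowance + self.purchased_credits:
            raise ValueError("insufficient credits")
        from_allowance = min(cost, self.subscription_allowance)
        self.subscription_allowance -= from_allowance    # allowance is consumed first
        self.purchased_credits -= cost - from_allowance  # remainder from purchased credits
        self._seen_request_ids.add(request_id)
        return True
```

For example, 1,000 prompt tokens at a hypothetical 0.000002 per token plus 500 completion tokens at 0.000006 per token cost 0.002 + 0.003 = 0.005 credits.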

Acceptance Criteria

| # | Criterion | Verification |
| --- | --- | --- |
| AC-9.1 | Pre-flight check: user with 0 credits receives 402 before any provider call is made | POST chat with 0-credit user |
| AC-9.2 | Idempotent deduction: same request ID sent twice results in credits deducted only once | POST twice with same X-Request-ID |
| AC-9.3 | Subscription allowance is consumed before purchased credits | User with both: make request, verify subscription decreases |
| AC-9.4 | Provider 5xx error results in automatic credit refund | Trigger 5xx, verify refund transaction |
| AC-9.5 | Provider 4xx error (user error) does NOT trigger refund | Trigger 4xx, verify no refund |
| AC-9.6 | High-value models (GPT-4, Claude, Gemini, o1/o3/o4) are blocked if pricing falls back to the default, which prevents under-billing | Verify explicit pricing exists for all premium models |
| AC-9.7 | Credit transactions are logged with unique request ID, user ID, model, token counts, and cost | Check credit_transactions table |
| AC-9.8 | Balance update and transaction record happen atomically (single DB transaction) | Verify no orphaned records |
| AC-9.9 | Admin can add/adjust/refund credits via /credits/* endpoints | POST /credits/add and verify |
| AC-9.10 | Credit transaction history is paginated and filterable | GET /credits/transactions?limit=10 |

Test Plan Refs: 9.1.1–9.1.7, 9.2.1–9.2.4, 25.13, 25.14

Automated Tests: ✅ Full coverage


10. Model Catalog

Description

10,000+ models from 30+ providers. Background sync, multi-layer caching, HuggingFace enrichment. Models require resolvable pricing, active provider, valid modality, and deduplication.

Acceptance Criteria

| # | Criterion | Verification |
| --- | --- | --- |
| AC-10.1 | GET /v1/models returns a list where every model has id, name, provider_slug, context_length, and pricing data | Inspect response |
| AC-10.2 | No model in the catalog has null or zero pricing | Scan all models in response |
| AC-10.3 | GET /v1/models?provider=fireworks returns only models from Fireworks | Filter and verify |
| AC-10.4 | GET /v1/models/unique returns no duplicate model IDs | Check for uniqueness |
| AC-10.5 | GET /v1/models/search?q=llama returns models matching "llama" | Verify results |
| AC-10.6 | Model detail endpoint returns HuggingFace enrichment (downloads, likes, parameters) when available | GET /api/models/detail?model_id=meta-llama/... |
| AC-10.7 | Catalog is served from cache on subsequent requests (sub-100ms response time on cache hit) | Time two consecutive requests |
| AC-10.8 | GET /v1/gateways returns all registered gateways with name, color, priority, site_url | Inspect response |
| AC-10.9 | Model health endpoint shows healthy, degraded, or down status per model | GET /v1/model-health |
| AC-10.10 | Model availability endpoint shows fallback providers for a given model | GET /availability/fallback/{model_id} |

Test Plan Refs: 4.1.1–4.1.12, 4.2–4.8, 25.11

Automated Tests: ✅ Full coverage


11. Plans & Trials

Description

Trial: 3 days, $5 cap, 1M tokens, 10K requests. Plans: Dev (pay-as-you-go), Team (subscription), Enterprise (custom). Expired trials can still access :free models. Purchased credits never expire.
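The trial-expiry rule and the :free exception can be expressed as a small predicate. This sketch covers only the time limit for a trial-only account; the $5 cap, token limit, and request limit are left out, and `trial_end` and `trial_allows` are hypothetical names.

```python
from datetime import datetime, timedelta

TRIAL_DAYS = 3  # from the description above


def trial_end(created_at: datetime) -> datetime:
    """A trial expires 3 days after account creation."""
    return created_at + timedelta(days=TRIAL_DAYS)


def trial_allows(model: str, created_at: datetime, now: datetime) -> bool:
    """Return True if a trial-only account may call `model` at time `now`."""
    if model.endswith(":free"):
        return True  # :free models stay available after the trial expires
    return now < trial_end(created_at)
```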

Acceptance Criteria

| # | Criterion | Verification |
| --- | --- | --- |
| AC-11.1 | New user gets $5 credits and trial expiring in 3 days | Register and check balance + trial_end |
| AC-11.2 | Trial user can make requests until credits/limits exhausted | Make requests during trial |
| AC-11.3 | Expired trial returns 402 for paid models | POST chat after trial expiry |
| AC-11.4 | Expired trial can still access :free suffix models | POST chat with :free model after expiry |
| AC-11.5 | GET /plans returns available plan tiers with pricing | Inspect response |
| AC-11.6 | GET /trial/status returns active/expired and days remaining | Check response |
| AC-11.7 | Unused subscription allowance does NOT roll over; it resets monthly | Verify at month boundary |
| AC-11.8 | Purchased credits never expire and survive plan changes | Change plan, verify credits persist |

Test Plan Refs: 8.5.1–8.5.4, 25.15, 25.16

Automated Tests: ✅ Full coverage


12. Payments (Stripe)

Description

Credit package purchases, subscription management, checkout sessions, payment intents, webhooks. Stripe webhook always returns 200 (even on processing errors).

Acceptance Criteria

| # | Criterion | Verification |
| --- | --- | --- |
| AC-12.1 | GET /api/stripe/credit-packages returns available packages with pricing (public, no auth) | Inspect response |
| AC-12.2 | POST /api/stripe/checkout-session returns a valid Stripe checkout URL | Create session |
| AC-12.3 | Successful payment webhook adds credits to user's balance | Simulate webhook |
| AC-12.4 | Webhook endpoint always returns 200, even if processing fails internally | Send malformed webhook |
| AC-12.5 | Payment history is paginated and includes amount, date, status | GET /api/stripe/payments |
| AC-12.6 | Subscription checkout creates a Stripe subscription and assigns the plan | POST /api/stripe/subscription-checkout |

Test Plan Refs: 11.1–11.8

Automated Tests: ✅ Full coverage


13. Coupons

Description

Redeemable coupon codes that add credits. Supports global and user-specific coupons, expiration, max redemptions, and one-per-user restrictions.

Acceptance Criteria

| # | Criterion | Verification |
| --- | --- | --- |
| AC-13.1 | Valid coupon code redeems successfully and adds the correct credit amount | POST /coupons/redeem |
| AC-13.2 | Expired coupon returns 400 | Redeem expired code |
| AC-13.3 | Already-redeemed coupon (by same user) returns 400 | Redeem twice |
| AC-13.4 | User-specific coupon redeemed by wrong user returns 400 or 403 | Redeem with different user's key |
| AC-13.5 | GET /coupons/available returns both global and user-targeted coupons | Inspect response |
| AC-13.6 | Redemption history shows all past redemptions for the user | GET /coupons/history |

Test Plan Refs: 10.1–10.6

Automated Tests: ✅ Full coverage


14. Referrals

Description

Users generate referral codes. Referred users sign up with the code. Both parties receive credit rewards upon conversion.

Acceptance Criteria

| # | Criterion | Verification |
| --- | --- | --- |
| AC-14.1 | User can generate a unique referral code | POST /referral/generate |
| AC-14.2 | Referral code can be validated | POST /referral/validate |
| AC-14.3 | Self-referral is prevented | Attempt self-referral, verify rejection |
| AC-14.4 | Referral stats show total referred, conversions, and rewards | GET /referral/stats |
| AC-14.5 | Successful referral grants credits to both referrer and referred user | Complete referral flow |

Test Plan Refs: 8.6.1–8.6.4

Automated Tests: ✅ Full coverage


15. Chat History & Sessions

Description

Persistent chat sessions with message storage, batch operations, full-text search, metadata updates, deletion, and usage stats.

Acceptance Criteria

| # | Criterion | Verification |
| --- | --- | --- |
| AC-15.1 | Sessions can be listed, created, updated, and deleted | CRUD operations |
| AC-15.2 | Messages can be saved individually and in batch | POST single and batch |
| AC-15.3 | Full-text search returns matching sessions across all user's history | POST /v1/chat/search |
| AC-15.4 | Duplicate messages are deduplicated (no double-saves) | Save same message twice, verify single entry |
| AC-15.5 | Chat stats return accurate usage data | GET /v1/chat/stats |
| AC-15.6 | Share links provide public read-only access to conversations | Create share, access without auth |
| AC-15.7 | Feedback can be submitted, retrieved, updated, and deleted per session | CRUD on feedback |

Test Plan Refs: 3.5.1–3.5.8, 3.6.1–3.6.6, 3.7.1–3.7.4

Automated Tests: ✅ Full coverage


16. API Key Management

Description

Users create, list, update, and delete API keys. Keys are in gw_{env}_* format. Rate limited to 10 creations per hour. Keys encrypted at rest.

Acceptance Criteria

| # | Criterion | Verification |
| --- | --- | --- |
| AC-16.1 | Created key is in gw_{env}_* format (e.g., gw_dev_abc123...) | POST /user/api-keys |
| AC-16.2 | Key creation rate-limited to 10 per hour; 11th returns 429 | Create 11 keys |
| AC-16.3 | Keys can be listed, showing all active keys | GET /user/api-keys |
| AC-16.4 | Keys can be updated (name, restrictions) | PUT /user/api-keys/{key_id} |
| AC-16.5 | Keys can be deleted | DELETE /user/api-keys/{key_id} |
| AC-16.6 | Deleted key no longer authenticates | Use deleted key, verify 401 |
| AC-16.7 | Audit logs record key creation, usage, and deletion | GET /user/api-keys/audit-logs |

Test Plan Refs: 8.3.1–8.3.7

Automated Tests: ✅ Full coverage


17. Health & Monitoring

Description

Tiered health monitoring (Critical/Popular/Standard/On-Demand). System, provider, model, and gateway health endpoints. Health endpoints always return 200 (degradation in body, not status code).

Acceptance Criteria

| # | Criterion | Verification |
| --- | --- | --- |
| AC-17.1 | GET /health always returns 200, even when dependencies are degraded | Call when DB is down |
| AC-17.2 | Health response includes version, status, and timestamp | Inspect response |
| AC-17.3 | GET /health/system returns memory, CPU, and connection pool stats | Inspect response |
| AC-17.4 | Provider health scores are 0–100 per provider | GET /health/providers |
| AC-17.5 | Model health shows healthy, degraded, or down per model | GET /health/models |
| AC-17.6 | Gateway health dashboard returns both HTML and JSON formats | GET /health/gateways/dashboard and /health/gateways/dashboard/data |
| AC-17.7 | Uptime metrics are tracked and returned | GET /health/uptime |
| AC-17.8 | Health insights provide actionable recommendations | GET /health/insights |

Test Plan Refs: 12.1.1–12.4.4, 25.20

Automated Tests: ✅ Full coverage


18. Metrics & Observability

Description

Prometheus metrics, OpenTelemetry tracing, Sentry error tracking, Arize AI tracing, Pyroscope profiling.

Acceptance Criteria

| # | Criterion | Verification |
| --- | --- | --- |
| AC-18.1 | GET /metrics returns valid Prometheus text format | Parse response |
| AC-18.2 | Parsed metrics include p50, p95, p99 latency percentiles | GET /api/metrics/parsed |
| AC-18.3 | Real-time stats update within 60 seconds of new requests | GET /api/monitoring/stats/realtime |
| AC-18.4 | Error rates are tracked per provider and per model | GET /api/monitoring/error-rates |
| AC-18.5 | Anomaly detection flags unusual patterns | GET /api/monitoring/anomalies |
| AC-18.6 | OpenTelemetry traces are initialized and exportable | GET /api/instrumentation/health |

Test Plan Refs: 13.1–13.12

Automated Tests: ✅ Full coverage


19. Caching System

Description

Multi-layer caching: semantic cache, exact-match response cache, external cache (Butter.dev), auth cache, catalog cache (L1/L2), DB query cache, health cache, local memory fallback. Every cache degrades gracefully.

Acceptance Criteria

| # | Criterion | Verification |
| --- | --- | --- |
| AC-19.1 | Catalog endpoint responds in sub-100ms on cache hit | Time GET /v1/models on second request |
| AC-19.2 | Auth cache reduces lookup latency from 50-150ms to 1-5ms on subsequent requests | Compare first vs. second auth timing |
| AC-19.3 | When Redis is down, local memory cache activates; no requests are blocked | Stop Redis, verify normal operation |
| AC-19.4 | Cache invalidation clears all layers | POST /admin/cache/clear, verify fresh data |
| AC-19.5 | Cache TTLs are respected (auth: 5-10min, catalog L1: 5min, catalog L2: 15-30min) | Verify expiration behavior |
| AC-19.6 | Stampede protection prevents multiple simultaneous cache rebuilds | Concurrent requests to cold cache |
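Stampede protection (AC-19.6) is commonly done with a per-key lock and a second cache check after the lock is acquired. The sketch below shows that pattern with threads. It is one possible approach, not necessarily the one Gatewayz uses, and `SingleFlightCache` and its parameters are hypothetical.

```python
import threading
import time
from typing import Any, Callable


class SingleFlightCache:
    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._values = {}  # key -> (expires_at, value)
        self._locks = {}   # key -> lock guarding the rebuild of that key
        self._guard = threading.Lock()

    def _fresh(self, key: str):
        entry = self._values.get(key)
        if entry is not None and entry[0] > self._clock():
            return entry
        return None

    def get(self, key: str, loader: Callable[[], Any]) -> Any:
        entry = self._fresh(key)
        if entry is not None:
            return entry[1]  # fast path: cache hit, no locking
        with self._guard:
            lock = self._locks.setdefault(key, threading.Lock())
        with lock:
            entry = self._fresh(key)  # another thread may have rebuilt it meanwhile
            if entry is not None:
                return entry[1]
            value = loader()  # only one thread per key reaches this line at a time
            self._values[key] = (self._clock() + self._ttl, value)
            return value
```

With eight threads requesting a cold key at once, the loader runs once and the other seven callers reuse its result.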

Test Plan Refs: 25.11, 25.12

Automated Tests: ✅ Full coverage


20. Image Generation

Description

POST /v1/images/generations generates images via provider routing. Credits deducted per generation.

Acceptance Criteria

| # | Criterion | Verification |
| --- | --- | --- |
| AC-20.1 | Image generation returns 200 with image data or URL | POST /v1/images/generations |
| AC-20.2 | Credits are deducted based on image generation pricing | Compare balance before/after |
| AC-20.3 | User with 0 credits receives 402 | POST with 0-credit user |

Test Plan Refs: 18.1–18.3

Automated Tests: ✅ Full coverage


21. Audio Transcription

Description

POST /v1/audio/transcriptions accepts audio files (upload or base64) and returns transcription text.

Acceptance Criteria

| # | Criterion | Verification |
| --- | --- | --- |
| AC-21.1 | File upload transcription returns 200 with text | POST with audio file |
| AC-21.2 | Base64 audio transcription returns 200 with text | POST /v1/audio/transcriptions/base64 |
| AC-21.3 | Unsupported audio format returns appropriate error | POST with invalid format |

Test Plan Refs: 19.1–19.2

Automated Tests: ✅ Full coverage


22. Server-Side Tools

Description

Built-in tools (web search, text-to-speech) exposed via /v1/tools. SSRF protection on tool execution.

Acceptance Criteria

| # | Criterion | Verification |
| --- | --- | --- |
| AC-22.1 | GET /v1/tools returns available tools (web_search, text_to_speech) | Inspect response |
| AC-22.2 | Tool definitions are in OpenAI function-calling format | GET /v1/tools/definitions |
| AC-22.3 | Nonexistent tool returns 404 | GET /v1/tools/fake_tool |
| AC-22.4 | Web search execution returns search results | POST /v1/tools/execute |
| AC-22.5 | SSRF protection prevents requests to internal/private IP ranges | Attempt internal URL |
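A basic form of the SSRF check in AC-22.5 can be written with the standard `ipaddress` module. The function names below are hypothetical, and the sketch deliberately stops at address classification: a production guard would also need to pin the resolved address for the actual request, since a hostname can resolve differently between the check and the fetch.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_public_address(ip: str) -> bool:
    """Return True only for addresses outside private, loopback, and similar ranges."""
    addr = ipaddress.ip_address(ip)
    if isinstance(addr, ipaddress.IPv6Address) and addr.ipv4_mapped is not None:
        addr = addr.ipv4_mapped  # judge ::ffff:a.b.c.d by its embedded IPv4 address
    return not (addr.is_private or addr.is_loopback or addr.is_link_local
                or addr.is_reserved or addr.is_multicast or addr.is_unspecified)


def is_url_allowed(url: str) -> bool:
    """Allow only http(s) URLs whose host resolves exclusively to public addresses."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        return is_public_address(parts.hostname)  # host is an IP literal
    except ValueError:
        pass  # host is a name, so resolve it below
    try:
        infos = socket.getaddrinfo(parts.hostname, None)
    except socket.gaierror:
        return False
    return all(is_public_address(info[4][0]) for info in infos)
```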

Test Plan Refs: 20.1–20.6

Automated Tests: ✅ Full coverage


23. IP Allowlist

Description

Admin-managed IP allowlists restrict API key usage to specific IP addresses or ranges.

Acceptance Criteria

| # | Criterion | Verification |
| --- | --- | --- |
| AC-23.1 | Admin can create, list, update, and delete IP allowlist entries | CRUD operations |
| AC-23.2 | IP check endpoint correctly identifies allowed vs. blocked IPs | POST /api/admin/ip-whitelist/check |
| AC-23.3 | API key with IP allowlist rejects requests from non-allowed IPs | Use key from blocked IP |

Test Plan Refs: 21.1–21.5

Automated Tests: ✅ Full coverage


24. Partner Trials

Description

Partner-specific trial configurations with custom credit amounts, durations, and daily limits.

Acceptance Criteria

| # | Criterion | Verification |
| --- | --- | --- |
| AC-24.1 | Partner config is publicly accessible | GET /partner-trials/config/{code} |
| AC-24.2 | Partner code validation always returns 200 (valid or invalid indicated in body) | GET /partner-trials/check/{code} |
| AC-24.3 | Starting a partner trial applies partner-specific credits and limits | POST /partner-trials/start |
| AC-24.4 | Partner trial daily limit is enforced | Check enforcement after exceeding |

Test Plan Refs: 22.1–22.5

Automated Tests: ✅ Full coverage


25. Notifications

Description

User notification preferences, usage reports, and test notifications via email (Resend).

Acceptance Criteria

| # | Criterion | Verification |
| --- | --- | --- |
| AC-25.1 | User can retrieve notification preferences | GET /user/notifications/preferences |
| AC-25.2 | Usage report can be triggered on demand | POST /user/notifications/send-usage-report |
| AC-25.3 | Test notification sends successfully | POST /user/notifications/test |

Test Plan Refs: 23.1–23.3

Automated Tests: ⚠️ Partial (delivery verification missing)


26. Admin Operations

Description

80+ admin endpoints for user management, credit operations, system monitoring, cache management, model sync, RBAC, trial analytics, downtime tracking, and coupon management.

Acceptance Criteria

| # | Criterion | Verification |
| --- | --- | --- |
| AC-26.1 | Non-admin users receive 403 on all admin endpoints | Use $USER_KEY on admin endpoint |
| AC-26.2 | Admin can list, search, and view user details | GET /admin/users |
| AC-26.3 | Admin credit grants respect per-transaction cap and 24h daily limit | Exceed limits |
| AC-26.4 | Admin can assign plans to users | POST /admin/assign-plan |
| AC-26.5 | System monitor returns user counts, credit totals, and API usage | GET /admin/monitor |
| AC-26.6 | Cache status, refresh, and clear operations work | GET/POST cache endpoints |
| AC-26.7 | Model sync can be triggered incrementally and fully | POST sync endpoints |
| AC-26.8 | Role updates are logged in the audit trail | POST /admin/roles/update, check audit log |
| AC-26.9 | Downtime incidents can be listed, viewed, and resolved | CRUD on downtime endpoints |
| AC-26.10 | Coupon analytics show redemption rates and remaining uses | GET /admin/coupons/{id}/analytics |

Test Plan Refs: 24.1–24.10

Automated Tests: ✅ Full coverage


27. Security

Description

Fernet encryption for API keys, HMAC-SHA256 hashing, injection prevention (SQL, XSS, command, path traversal, LDAP, header, JSON), audit logging.

Acceptance Criteria

| # | Criterion | Verification |
| --- | --- | --- |
| AC-27.1 | API keys are Fernet-encrypted in the database; raw values are ciphertext | Query DB directly |
| AC-27.2 | API key lookup uses HMAC hash, not brute-force decryption | Verify code path |
| AC-27.3 | SQL injection attempts in user inputs are sanitized/rejected | Send '; DROP TABLE users; -- |
| AC-27.4 | XSS payloads in user inputs are sanitized/rejected | Send <script>alert(1)</script> |
| AC-27.5 | Command injection attempts are blocked | Send ; rm -rf / in input fields |
| AC-27.6 | Path traversal attempts are blocked | Send ../../etc/passwd |
| AC-27.7 | Admin security violations are logged in the audit trail | Attempt unauthorized admin access, check log |
| AC-27.8 | Error messages never expose stack traces, internal paths, or sensitive data | Trigger errors, inspect responses |

Test Plan Refs: 25.17

Automated Tests: ✅ Full coverage


28. Error Monitoring

Description

Autonomous error monitoring with dashboard, recent/critical/fixable error classification, and pattern detection.

Acceptance Criteria

| # | Criterion | Verification |
| --- | --- | --- |
| AC-28.1 | Monitor status returns current operational state | GET /error-monitor/autonomous/status |
| AC-28.2 | Dashboard provides overview of error landscape | GET /error-monitor/dashboard |
| AC-28.3 | Recent errors are retrievable and sorted by recency | GET /error-monitor/errors/recent |
| AC-28.4 | Critical errors are flagged separately | GET /error-monitor/errors/critical |
| AC-28.5 | Error patterns detect recurring issues | GET /error-monitor/errors/patterns |

Test Plan Refs: 17.1–17.7

Automated Tests: ✅ Full coverage


29. Guardrails (Target Architecture)

Description

Input guardrails: PII detection, prompt injection defense, topic restrictions, content moderation. Output guardrails: content filtering, structured output validation, hallucination flags. This is aspirational — described in the Conceptual Model as future capability.

Acceptance Criteria

| # | Criterion | Verification |
| --- | --- | --- |
| AC-29.1 | PII detection scans prompts for phone numbers, SSNs, emails, credit card numbers and optionally redacts or blocks | Send prompt with PII |
| AC-29.2 | Prompt injection patterns that attempt to override system prompts are detected and blocked | Send known injection pattern |
| AC-29.3 | Per-API-key topic restrictions limit model responses to configured domains | Configure restriction, test out-of-domain prompt |
| AC-29.4 | Content moderation blocks harmful or policy-violating inputs before reaching providers | Send harmful content |
| AC-29.5 | Output content filtering scans responses for policy violations before returning to customer | Trigger policy-violating response |
| AC-29.6 | Structured output validation confirms JSON schema conformance when requested | Request JSON schema output |
| AC-29.7 | Provider-side safety metadata (refusals, safety filter triggers) is surfaced in standardized format | Trigger safety filter on provider |

Test Plan Refs: None (not yet implemented)

Automated Tests: ❌ No coverage (feature not yet implemented)


Summary Matrix

| # | Feature | Criteria Count | Automated | Status |
| --- | --- | --- | --- | --- |
| 1 | Authentication & Authorization | 11 | ✅ | Implemented |
| 2 | Chat (OpenAI-Compatible) | 13 | ✅ | Implemented |
| 3 | Chat (Anthropic-Compatible) | 4 | ✅ | Implemented |
| 4 | Model Resolution & Aliasing | 5 | ✅ | Implemented |
| 5 | Provider Failover | 7 | ✅ | Implemented |
| 6 | Circuit Breakers | 8 | ✅ | Implemented |
| 7 | Intelligent Routing | 10 | ✅ | Implemented |
| 8 | Rate Limiting (3 Layers) | 10 | ✅ | Implemented |
| 9 | Credit System | 10 | ✅ | Implemented |
| 10 | Model Catalog | 10 | ✅ | Implemented |
| 11 | Plans & Trials | 8 | ✅ | Implemented |
| 12 | Payments (Stripe) | 6 | ✅ | Implemented |
| 13 | Coupons | 6 | ✅ | Implemented |
| 14 | Referrals | 5 | ✅ | Implemented |
| 15 | Chat History & Sessions | 7 | ✅ | Implemented |
| 16 | API Key Management | 7 | ✅ | Implemented |
| 17 | Health & Monitoring | 8 | ✅ | Implemented |
| 18 | Metrics & Observability | 6 | ✅ | Implemented |
| 19 | Caching System | 6 | ✅ | Implemented |
| 20 | Image Generation | 3 | ✅ | Implemented |
| 21 | Audio Transcription | 3 | ✅ | Implemented |
| 22 | Server-Side Tools | 5 | ✅ | Implemented |
| 23 | IP Allowlist | 3 | ✅ | Implemented |
| 24 | Partner Trials | 4 | ✅ | Implemented |
| 25 | Notifications | 3 | ⚠️ | Implemented (partial test coverage) |
| 26 | Admin Operations | 10 | ✅ | Implemented |
| 27 | Security | 8 | ✅ | Implemented |
| 28 | Error Monitoring | 5 | ✅ | Implemented |
| 29 | Guardrails | 7 | ❌ | Not implemented |
| | TOTAL | 198 | | |