Acceptance Criteria
Formal acceptance criteria for every Gatewayz feature area, derived from the Conceptual Model, Features, and Testing Plan.
Each feature has: a description of what "done" looks like, measurable acceptance criteria, and the corresponding test plan references.
Last updated: 2026-03-08
Each feature section follows this format:
- Description: What the feature does (from the Conceptual Model)
- Acceptance Criteria: Numbered, testable statements. A feature is accepted when ALL criteria pass.
- Test Plan Refs: Links to specific test cases in the Testing Plan
- Automated Tests: Whether automated tests exist (from Test Coverage Audit)
Users authenticate via Privy (email, phone, OAuth) and receive an API key. API keys are encrypted at rest with Fernet (AES-128). Keys are looked up via HMAC-SHA256 hash. Role-based access control (admin, developer, free) governs endpoint access.
| # | Criterion | Verification |
|---|---|---|
| AC-1.1 | A valid Privy token returns 200 with api_key, credits, and subscription_status fields | POST /auth with valid token |
| AC-1.2 | A first-time user receives exactly $5.00 in credits and a trial expiring 3 days from creation | POST /auth with new identity, check credits and trial_end |
| AC-1.3 | An invalid or expired Privy token returns 401 or 403, never 200 | POST /auth with bad token |
| AC-1.4 | Auth endpoint rate-limits to 10 requests per 15 minutes per IP; the 11th returns 429 | POST /auth 11 times from same IP |
| AC-1.5 | Registration rate-limits to 3 requests per hour per IP; the 4th returns 429 | POST /auth/register 4 times |
| AC-1.6 | API keys stored in the database are Fernet-encrypted — raw DB values are not plaintext and not the key itself | Query api_keys_new table directly |
| AC-1.7 | API key lookup uses HMAC-SHA256 hash, not decryption of all keys | Verify code path or timing (O(log n) not O(n)) |
| AC-1.8 | Non-admin API keys receive 403 when accessing /admin/* endpoints | GET /admin/users with $USER_KEY |
| AC-1.9 | Admin API keys receive 200 when accessing /admin/* endpoints | GET /admin/users with $ADMIN_KEY |
| AC-1.10 | Auth health endpoint returns 200 with DB, Redis, cache, and timeout status even when dependencies are degraded | GET /auth/health |
| AC-1.11 | Temporary/disposable email domains are detected and flagged during registration | Register with user@tempmail.com |
Test Plan Refs: 2.1–2.6, 24.1.1–24.1.2, 25.15, 25.17
Automated Tests: ✅ Full coverage
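The hash-based lookup behind AC-1.7 can be illustrated with a minimal sketch using only the standard library. The secret value, the `key_index` dictionary, and the function names are assumptions made for illustration, not the Gatewayz implementation; the point is only that a single HMAC-SHA256 computation locates the record, so no stored key has to be decrypted during lookup.

```python
import hashlib
import hmac

# Hypothetical server-side secret; a real deployment would load it from configuration.
LOOKUP_SECRET = b"example-lookup-secret"


def lookup_hash(api_key: str, secret: bytes = LOOKUP_SECRET) -> str:
    """Return the HMAC-SHA256 hex digest used as the lookup index for an API key."""
    return hmac.new(secret, api_key.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical in-memory stand-in for an indexed hash column in the key table.
key_index = {lookup_hash("gw_dev_abc123"): {"user_id": 1, "role": "developer"}}


def find_key(api_key: str):
    """Look up a key record by its hash; returns None when the key is unknown."""
    return key_index.get(lookup_hash(api_key))
```

With an index on the hash column, lookup cost does not grow with the number of keys that would otherwise need decrypting, which is what the timing check in AC-1.7 is meant to observe.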
POST /v1/chat/completions accepts OpenAI-format requests and returns OpenAI-format responses. Supports streaming (SSE), JSON mode, tool/function calling, logprobs. Any application built for the OpenAI API works by changing only the base URL.
| # | Criterion | Verification |
|---|---|---|
| AC-2.1 | Non-streaming request returns 200 with choices[0].message.content, usage.prompt_tokens, usage.completion_tokens | POST with stream: false |
| AC-2.2 | Streaming request returns SSE stream where each line starts with data: , contains valid JSON, and stream ends with data: [DONE] | POST with stream: true |
| AC-2.3 | response_format: {"type": "json_object"} returns content that is valid parseable JSON | POST with JSON mode |
| AC-2.4 | Request with tools array returns tool_calls in the response when the model decides to call a tool | POST with tool definitions |
| AC-2.5 | Request with logprobs: true returns a logprobs field in the response | POST with logprobs enabled |
| AC-2.6 | After a successful completion, user's credit balance decreases by (prompt_tokens × prompt_price) + (completion_tokens × completion_price) | Compare balance before and after |
| AC-2.7 | User with 0 credits receives 402 without a provider API call being made (pre-flight check) | POST with 0-credit user, verify no upstream call |
| AC-2.8 | User with expired trial receives 402 | POST with expired trial key |
| AC-2.9 | Request with nonexistent model returns 400 or 404, not 500 | POST with model: "nonexistent/model" |
| AC-2.10 | Unauthenticated request with a whitelisted model returns 200 | POST without auth header |
| AC-2.11 | Unauthenticated request with a non-whitelisted model returns 401 or 403 | POST without auth header |
| AC-2.12 | Exceeding RPM rate limit returns 429 with Retry-After header | Exceed rate limit |
| AC-2.13 | OpenAI Python SDK (openai.OpenAI(base_url="$BASE/v1")) works with zero code changes beyond base URL and API key | Run OpenAI SDK test |
Test Plan Refs: 3.1.1–3.1.12, 25.1, 25.18
Automated Tests: ✅ Full coverage
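The stream shape required by AC-2.2 can be checked mechanically. The following is a minimal sketch, assuming the response body has already been split into text lines; `validate_sse_stream` is a hypothetical helper name, not part of the existing test suite.

```python
import json


def validate_sse_stream(lines):
    """Check the shape AC-2.2 describes and return the parsed JSON chunks."""
    # SSE events are separated by blank lines, so those are ignored here.
    data_lines = [ln for ln in lines if ln.strip()]
    if not data_lines or data_lines[-1].strip() != "data: [DONE]":
        raise ValueError("stream must end with 'data: [DONE]'")
    chunks = []
    for ln in data_lines[:-1]:
        if not ln.startswith("data: "):
            raise ValueError(f"unexpected line: {ln!r}")
        # json.loads raises a ValueError subclass when the payload is not valid JSON.
        chunks.append(json.loads(ln[len("data: "):]))
    return chunks
```

A test could feed the lines of a real streaming response into this helper and fail on any raised `ValueError`.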
POST /v1/messages accepts Anthropic-format requests and returns Anthropic-format responses. Supports streaming in Anthropic SSE event format.
| # | Criterion | Verification |
|---|---|---|
| AC-3.1 | Non-streaming request returns 200 with content[0].text, usage.input_tokens, usage.output_tokens in Anthropic format | POST /v1/messages |
| AC-3.2 | Streaming request returns SSE events in Anthropic format (message_start, content_block_delta, message_stop) | POST with stream: true |
| AC-3.3 | Credits are deducted using Anthropic token counts | Compare balance before and after |
| AC-3.4 | Anthropic Python SDK (anthropic.Anthropic(base_url="$BASE/v1")) works with zero code changes beyond base URL and API key | Run Anthropic SDK test |
Test Plan Refs: 3.2.1–3.2.2, 25.2
Automated Tests: ✅ Full coverage
120+ short aliases map to canonical model IDs. Provider detection follows: explicit overrides → format-based rules → mapping tables → org-prefix fallbacks. Model IDs are transformed to each provider's native format.
| # | Criterion | Verification |
|---|---|---|
| AC-4.1 | gpt-4o resolves to openai/gpt-4o | POST chat with model: "gpt-4o" |
| AC-4.2 | r1 resolves to deepseek/deepseek-r1 | POST chat with model: "r1" |
| AC-4.3 | Canonical IDs (e.g., openai/gpt-4o) work directly without alias resolution | POST chat with canonical ID |
| AC-4.4 | Provider detection correctly routes google/gemini-* models to Vertex when credentials are available | POST chat with Gemini model |
| AC-4.5 | No alias maps to itself (no self-referencing loops) | Inspect MODEL_ALIASES dict |
Test Plan Refs: 3.3.1–3.3.2, 25.19
Automated Tests: ✅ Full coverage
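Alias resolution and the self-reference check in AC-4.5 can be sketched as follows. The dictionary holds only the two aliases named in AC-4.1 and AC-4.2; the real `MODEL_ALIASES` table is much larger, and both function names here are hypothetical.

```python
# Illustrative subset only; the two entries mirror AC-4.1 and AC-4.2.
MODEL_ALIASES = {
    "gpt-4o": "openai/gpt-4o",
    "r1": "deepseek/deepseek-r1",
}


def resolve_model(model_id: str) -> str:
    """Return the canonical ID for an alias; canonical IDs pass through unchanged."""
    return MODEL_ALIASES.get(model_id, model_id)


def self_referencing_aliases(aliases: dict) -> list:
    """Return aliases that map to themselves (AC-4.5 expects an empty list)."""
    return [alias for alias, target in aliases.items() if alias == target]
```

Running `self_referencing_aliases` against the full alias table would turn the manual inspection in AC-4.5 into a single assertion.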
When a provider fails, requests automatically retry with the next provider in a 14-provider prioritized chain. Circuit breakers prevent repeated calls to failing providers.
| # | Criterion | Verification |
|---|---|---|
| AC-5.1 | When primary provider returns 502/503/504, request succeeds via fallback provider transparently | Force primary failure, verify success |
| AC-5.2 | When provider returns 401/402/403/404, request fails over to next provider | Force auth error on primary |
| AC-5.3 | When provider returns 400 (user error), request does NOT fail over; 400 is returned to the user | Send malformed request |
| AC-5.4 | When provider returns 429, request retries with backoff and does NOT fail over | Trigger rate limit on provider |
| AC-5.5 | OpenAI models only fail over along OpenAI → OpenRouter (not arbitrary providers) | Inspect failover chain for OpenAI model |
| AC-5.6 | Anthropic models only fail over along Anthropic → OpenRouter | Inspect failover chain for Anthropic model |
| AC-5.7 | Open-source models can fail over across all providers | Inspect failover chain for open-source model |
Test Plan Refs: 3.3.3, 25.3, 25.4
Automated Tests: ✅ Full coverage
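The status-code rules in AC-5.1 through AC-5.4 amount to a small decision table. The sketch below encodes only the codes those criteria name; the `Action` enum and `classify_provider_status` are hypothetical names, and any status the criteria do not mention is rejected rather than guessed.

```python
from enum import Enum


class Action(Enum):
    FAILOVER = "failover"                      # try the next provider in the chain
    RETRY_WITH_BACKOFF = "retry_with_backoff"  # stay on the same provider
    RETURN_TO_USER = "return_to_user"          # surface the error unchanged


def classify_provider_status(status: int) -> Action:
    """Map a provider HTTP status to the action described in AC-5.1 to AC-5.4."""
    if status in (502, 503, 504) or status in (401, 402, 403, 404):
        return Action.FAILOVER
    if status == 429:
        return Action.RETRY_WITH_BACKOFF
    if status == 400:
        return Action.RETURN_TO_USER
    raise ValueError(f"status {status} is not covered by AC-5.1 to AC-5.4")
```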
Per-provider circuit breakers with states: CLOSED (healthy) → OPEN (blocked after 5 consecutive failures) → HALF_OPEN (testing recovery after 5-minute cooldown).
| # | Criterion | Verification |
|---|---|---|
| AC-6.1 | New provider starts in CLOSED state | GET /circuit-breakers/{new_provider} |
| AC-6.2 | After 5 consecutive failures, state transitions to OPEN | Send 5 failing requests, check state |
| AC-6.3 | OPEN state prevents requests from being routed to that provider | Verify provider is skipped in failover chain |
| AC-6.4 | After 300 seconds (5 min), OPEN transitions to HALF_OPEN | Wait 5 min, check state |
| AC-6.5 | In HALF_OPEN, a successful request transitions to CLOSED | Send successful request, check state |
| AC-6.6 | In HALF_OPEN, a failed request transitions back to OPEN | Send failing request, check state |
| AC-6.7 | Manual reset via POST /circuit-breakers/{provider}/reset returns state to CLOSED | Reset and verify |
| AC-6.8 | POST /circuit-breakers/reset-all resets all breakers to CLOSED | Reset all and verify |
Test Plan Refs: 5.1–5.4, 25.5–25.6
Automated Tests: ✅ Full coverage
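The state transitions in AC-6.1 through AC-6.7 can be modeled as a small state machine. This is a minimal sketch under stated assumptions: the class and method names are hypothetical, the thresholds are the 5 failures and 300 seconds given above, and the clock is injectable so the cooldown can be tested without waiting 5 minutes.

```python
import time


class CircuitBreaker:
    FAILURE_THRESHOLD = 5   # consecutive failures before opening
    COOLDOWN_SECONDS = 300  # time spent OPEN before probing recovery

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._state = "CLOSED"
        self._failures = 0
        self._opened_at = None

    @property
    def state(self):
        # OPEN becomes HALF_OPEN lazily, once the cooldown has elapsed.
        if self._state == "OPEN" and self._clock() - self._opened_at >= self.COOLDOWN_SECONDS:
            self._state = "HALF_OPEN"
        return self._state

    def record_success(self):
        # A success in CLOSED or HALF_OPEN clears the failure streak and closes the breaker.
        if self.state != "OPEN":
            self._state = "CLOSED"
            self._failures = 0

    def record_failure(self):
        current = self.state
        if current == "HALF_OPEN":
            self._trip()
        elif current == "CLOSED":
            self._failures += 1
            if self._failures >= self.FAILURE_THRESHOLD:
                self._trip()

    def _trip(self):
        self._state = "OPEN"
        self._opened_at = self._clock()
        self._failures = 0

    def reset(self):
        # Manual reset, as in AC-6.7.
        self._state = "CLOSED"
        self._failures = 0
        self._opened_at = None
```

Passing a fake clock (for example `clock=lambda: now[0]`) lets a unit test cover AC-6.4 by advancing `now[0]` to 300 instead of sleeping.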
Two routing systems: General Router (quality/cost/latency/balanced via NotDiamond) and Code Router (benchmark-driven, SWE-bench/HumanEval scored, tiered by complexity).
| # | Criterion | Verification |
|---|---|---|
| AC-7.1 | router:general:quality selects a high-quality model and returns 200 | POST chat with router model |
| AC-7.2 | router:general:cost selects a cheaper model than router:general:quality for the same prompt | Compare selected models |
| AC-7.3 | router:general:latency selects a low-latency model | POST and verify model selection |
| AC-7.4 | router:general:balanced considers quality, cost, and latency | POST and verify selection |
| AC-7.5 | router:code:auto classifies prompt complexity and selects an appropriate tier model | POST with code prompt |
| AC-7.6 | router:code:quality selects the highest-tier code model | POST and verify |
| AC-7.7 | router:code:price selects a cost-effective code model | POST and verify |
| AC-7.8 | router:code:agentic selects a model optimized for agentic coding tasks | POST and verify |
| AC-7.9 | Code router tiers endpoint returns models with SWE-bench/HumanEval benchmark scores | GET /code-router/tiers |
| AC-7.10 | Router test endpoints return selected model + reasoning/rationale | POST /code-router/test, POST /general-router/test |
Test Plan Refs: 3.4.1–3.4.8, 6.1–6.5, 7.1–7.5
Automated Tests: ✅ Full coverage
Layer 1: IP-level (security middleware, behavioral analysis, velocity detection). Layer 2: API key-level (Redis-backed, per-plan limits). Layer 3: Anonymous (stricter limits for unauthenticated). Graceful degradation to in-memory fallback when Redis is down.
| # | Criterion | Verification |
|---|---|---|
| AC-8.1 | Unauthenticated requests exceeding 300 RPM from same IP receive 429 | Exceed IP limit |
| AC-8.2 | Authenticated requests exceeding plan RPM receive 429 | Exceed key limit |
| AC-8.3 | Anonymous rate limits are stricter than authenticated limits | Compare thresholds |
| AC-8.4 | Authenticated users are exempt from IP-level rate limiting | Verify no IP block on auth users |
| AC-8.5 | Rate limit response includes Retry-After, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset headers | Inspect 429 response headers |
| AC-8.6 | When Redis is unavailable, rate limiting continues via in-memory fallback — requests are never blocked due to infrastructure failure | Stop Redis, verify rate limiting works |
| AC-8.7 | Velocity mode activates when error rate exceeds threshold (25%) and reduces limits | Trigger high error rate |
| AC-8.8 | Velocity mode deactivates after 3 minutes of normal error rates | Wait for cooldown |
| AC-8.9 | Rate limit configuration is viewable at GET /user/rate-limits | Check endpoint |
| AC-8.10 | Per-key rate limits can be updated via PUT /user/rate-limits/{key_id} | Update and verify |
Test Plan Refs: 8.4.1–8.4.4, 25.7–25.10
Automated Tests: ✅ Full coverage
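The in-memory fallback in AC-8.6 and the headers in AC-8.5 can be illustrated with a sliding-window limiter. This is a sketch, not the Gatewayz middleware: the class name is hypothetical, and expressing `X-RateLimit-Reset` as seconds until the window frees up is an assumption (the source does not say whether it is a delta or a timestamp).

```python
import math
import time
from collections import defaultdict, deque


class InMemoryRateLimiter:
    def __init__(self, limit, window_seconds=60, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self._clock = clock
        self._hits = defaultdict(deque)  # key -> timestamps of accepted requests

    def check(self, key):
        """Return (status_code, headers) for one request from `key`."""
        now = self._clock()
        hits = self._hits[key]
        # Drop timestamps that have left the sliding window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            retry_after = math.ceil(hits[0] + self.window - now)
            return 429, {
                "Retry-After": str(retry_after),
                "X-RateLimit-Limit": str(self.limit),
                "X-RateLimit-Remaining": "0",
                "X-RateLimit-Reset": str(retry_after),  # assumed: seconds until reset
            }
        hits.append(now)
        return 200, {
            "X-RateLimit-Limit": str(self.limit),
            "X-RateLimit-Remaining": str(self.limit - len(hits)),
        }
```

Because the state lives in process memory, limits are enforced per instance rather than globally, which is the usual trade-off of falling back from Redis.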
Atomic unit of billing. Cost = (prompt_tokens × prompt_price) + (completion_tokens × completion_price). Subscription allowance consumed first, then purchased credits. Pre-flight checks, idempotent deductions, auto-refunds on provider errors.
| # | Criterion | Verification |
|---|---|---|
| AC-9.1 | Pre-flight check: user with 0 credits receives 402 before any provider call is made | POST chat with 0-credit user |
| AC-9.2 | Idempotent deduction: same request ID sent twice results in credits deducted only once | POST twice with same X-Request-ID |
| AC-9.3 | Subscription allowance is consumed before purchased credits | User with both: make request, verify subscription decreases |
| AC-9.4 | Provider 5xx error results in automatic credit refund | Trigger 5xx, verify refund transaction |
| AC-9.5 | Provider 4xx error (user error) does NOT trigger refund | Trigger 4xx, verify no refund |
| AC-9.6 | High-value models (GPT-4, Claude, Gemini, o1/o3/o4) are blocked if their pricing falls back to the default rate, which prevents under-billing | Verify explicit pricing exists for all premium models |
| AC-9.7 | Credit transactions are logged with unique request ID, user ID, model, token counts, and cost | Check credit_transactions table |
| AC-9.8 | Balance update and transaction record happen atomically (single DB transaction) | Verify no orphaned records |
| AC-9.9 | Admin can add/adjust/refund credits via /credits/* endpoints | POST /credits/add and verify |
| AC-9.10 | Credit transaction history is paginated and filterable | GET /credits/transactions?limit=10 |
Test Plan Refs: 9.1.1–9.1.7, 9.2.1–9.2.4, 25.13, 25.14
Automated Tests: ✅ Full coverage
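The cost formula, the allowance-first rule (AC-9.3), and idempotent deduction (AC-9.2) can be combined in one small sketch. The `Account` class, its field names, and the per-token prices used in the usage note are hypothetical; `Decimal` is used so that currency arithmetic does not pick up floating-point error.

```python
from decimal import Decimal


def request_cost(prompt_tokens, completion_tokens, prompt_price, completion_price):
    """Cost = (prompt_tokens x prompt_price) + (completion_tokens x completion_price)."""
    return prompt_tokens * prompt_price + completion_tokens * completion_price


class Account:
    def __init__(self, allowance, purchased):
        self.allowance = Decimal(allowance)  # subscription allowance
        self.purchased = Decimal(purchased)  # purchased credits
        self._seen = set()                   # request IDs already charged

    def deduct(self, request_id, cost):
        """Return True if credits were deducted, False if request_id was already processed."""
        if request_id in self._seen:
            return False
        if cost > self.allowance + self.purchased:
            raise ValueError("insufficient credits")
        # Consume the subscription allowance first, then purchased credits.
        from_allowance = min(self.allowance, cost)
        self.allowance -= from_allowance
        self.purchased -= cost - from_allowance
        self._seen.add(request_id)
        return True
```

For example, with assumed prices of 0.000002 per prompt token and 0.000006 per completion token, `request_cost(1000, 500, Decimal("0.000002"), Decimal("0.000006"))` equals `Decimal("0.005")`. A production version would perform the balance update and the transaction record in a single database transaction, as AC-9.8 requires.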
10,000+ models from 30+ providers. Background sync, multi-layer caching, HuggingFace enrichment. Models require resolvable pricing, active provider, valid modality, and deduplication.
| # | Criterion | Verification |
|---|---|---|
| AC-10.1 | GET /v1/models returns a list where every model has id, name, provider_slug, context_length, and pricing data | Inspect response |
| AC-10.2 | No model in the catalog has null or zero pricing | Scan all models in response |
| AC-10.3 | GET /v1/models?provider=fireworks returns only models from Fireworks | Filter and verify |
| AC-10.4 | GET /v1/models/unique returns no duplicate model IDs | Check for uniqueness |
| AC-10.5 | GET /v1/models/search?q=llama returns models matching "llama" | Verify results |
| AC-10.6 | Model detail endpoint returns HuggingFace enrichment (downloads, likes, parameters) when available | GET /api/models/detail?model_id=meta-llama/... |
| AC-10.7 | Catalog is served from cache on subsequent requests (sub-100ms response time on cache hit) | Time two consecutive requests |
| AC-10.8 | GET /v1/gateways returns all registered gateways with name, color, priority, site_url | Inspect response |
| AC-10.9 | Model health endpoint shows healthy, degraded, or down status per model | GET /v1/model-health |
| AC-10.10 | Model availability endpoint shows fallback providers for a given model | GET /availability/fallback/{model_id} |
Test Plan Refs: 4.1.1–4.1.12, 4.2–4.8, 25.11
Automated Tests: ✅ Full coverage
Trial: 3 days, $5 cap, 1M tokens, 10K requests. Plans: Dev (pay-as-you-go), Team (subscription), Enterprise (custom). Expired trials can still access :free models. Purchased credits never expire.
| # | Criterion | Verification |
|---|---|---|
| AC-11.1 | New user gets $5 credits and trial expiring in 3 days | Register and check balance + trial_end |
| AC-11.2 | Trial user can make requests until credits/limits exhausted | Make requests during trial |
| AC-11.3 | Expired trial returns 402 for paid models | POST chat after trial expiry |
| AC-11.4 | Expired trial can still access :free suffix models | POST chat with :free model after expiry |
| AC-11.5 | GET /plans returns available plan tiers with pricing | Inspect response |
| AC-11.6 | GET /trial/status returns active/expired and days remaining | Check response |
| AC-11.7 | Unused subscription allowance does NOT roll over — resets monthly | Verify at month boundary |
| AC-11.8 | Purchased credits never expire and survive plan changes | Change plan, verify credits persist |
Test Plan Refs: 8.5.1–8.5.4, 25.15, 25.16
Automated Tests: ✅ Full coverage
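The trial expiry rules in AC-11.1, AC-11.3, and AC-11.4 can be sketched as below. The function names and the `example/model:free` ID are hypothetical, and the sketch deliberately ignores the credit, token, and request caps so that only the expiry and `:free` suffix logic is shown.

```python
from datetime import datetime, timedelta, timezone

TRIAL_DURATION = timedelta(days=3)


def trial_end(created_at):
    """Return the moment a trial that started at `created_at` expires."""
    return created_at + TRIAL_DURATION


def trial_user_status(model_id, created_at, now):
    """Return 402 for an expired trial requesting a paid model, else 200."""
    expired = now >= trial_end(created_at)
    if expired and not model_id.endswith(":free"):
        return 402
    return 200


# Usage: a trial created on 2026-03-01 (UTC) is checked four days later.
created = datetime(2026, 3, 1, tzinfo=timezone.utc)
later = created + timedelta(days=4)
paid_status = trial_user_status("openai/gpt-4o", created, later)       # paid model
free_status = trial_user_status("example/model:free", created, later)  # hypothetical :free model
```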
Credit package purchases, subscription management, checkout sessions, payment intents, webhooks. Stripe webhook always returns 200 (even on processing errors).
| # | Criterion | Verification |
|---|---|---|
| AC-12.1 | GET /api/stripe/credit-packages returns available packages with pricing (public, no auth) | Inspect response |
| AC-12.2 | POST /api/stripe/checkout-session returns a valid Stripe checkout URL | Create session |
| AC-12.3 | Successful payment webhook adds credits to user's balance | Simulate webhook |
| AC-12.4 | Webhook endpoint always returns 200, even if processing fails internally | Send malformed webhook |
| AC-12.5 | Payment history is paginated and includes amount, date, status | GET /api/stripe/payments |
| AC-12.6 | Subscription checkout creates a Stripe subscription and assigns the plan | POST /api/stripe/subscription-checkout |
Test Plan Refs: 11.1–11.8
Automated Tests: ✅ Full coverage
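The always-200 behavior in AC-12.4 is a handler pattern rather than anything Stripe-specific, and can be sketched with the standard library alone. `handle_webhook` and the `process_event` callback are hypothetical names; signature verification, which a real Stripe integration needs, is omitted here.

```python
import json
import logging

logger = logging.getLogger("stripe_webhook")


def handle_webhook(raw_body: bytes, process_event):
    """Return (status_code, body); the status is 200 even when processing fails."""
    try:
        event = json.loads(raw_body)
        process_event(event)
        return 200, {"received": True, "processed": True}
    except Exception:
        # The failure is logged for follow-up instead of being surfaced to the sender.
        logger.exception("webhook processing failed")
        return 200, {"received": True, "processed": False}
```

Because failures are hidden from the sender, the log entry (or an equivalent alert) is the only signal that an event needs reprocessing.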
Redeemable coupon codes that add credits. Supports global and user-specific coupons, expiration, max redemptions, and one-per-user restrictions.
| # | Criterion | Verification |
|---|---|---|
| AC-13.1 | Valid coupon code redeems successfully and adds the correct credit amount | POST /coupons/redeem |
| AC-13.2 | Expired coupon returns 400 | Redeem expired code |
| AC-13.3 | Already-redeemed coupon (by same user) returns 400 | Redeem twice |
| AC-13.4 | User-specific coupon redeemed by wrong user returns 400 or 403 | Redeem with different user's key |
| AC-13.5 | GET /coupons/available returns both global and user-targeted coupons | Inspect response |
| AC-13.6 | Redemption history shows all past redemptions for the user | GET /coupons/history |
Test Plan Refs: 10.1–10.6
Automated Tests: ✅ Full coverage
Users generate referral codes. Referred users sign up with the code. Both parties receive credit rewards upon conversion.
| # | Criterion | Verification |
|---|---|---|
| AC-14.1 | User can generate a unique referral code | POST /referral/generate |
| AC-14.2 | Referral code can be validated | POST /referral/validate |
| AC-14.3 | Self-referral is prevented | Attempt self-referral, verify rejection |
| AC-14.4 | Referral stats show total referred, conversions, and rewards | GET /referral/stats |
| AC-14.5 | Successful referral grants credits to both referrer and referred user | Complete referral flow |
Test Plan Refs: 8.6.1–8.6.4
Automated Tests: ✅ Full coverage
Persistent chat sessions with message storage, batch operations, full-text search, metadata updates, deletion, and usage stats.
| # | Criterion | Verification |
|---|---|---|
| AC-15.1 | Sessions can be listed, created, updated, and deleted | CRUD operations |
| AC-15.2 | Messages can be saved individually and in batch | POST single and batch |
| AC-15.3 | Full-text search returns matching sessions across all user's history | POST /v1/chat/search |
| AC-15.4 | Duplicate messages are deduplicated (no double-saves) | Save same message twice, verify single entry |
| AC-15.5 | Chat stats return accurate usage data | GET /v1/chat/stats |
| AC-15.6 | Share links provide public read-only access to conversations | Create share, access without auth |
| AC-15.7 | Feedback can be submitted, retrieved, updated, and deleted per session | CRUD on feedback |
Test Plan Refs: 3.5.1–3.5.8, 3.6.1–3.6.6, 3.7.1–3.7.4
Automated Tests: ✅ Full coverage
Users create, list, update, and delete API keys. Keys are in gw_{env}_* format. Rate limited to 10 creations per hour. Keys encrypted at rest.
| # | Criterion | Verification |
|---|---|---|
| AC-16.1 | Created key is in gw_{env}_* format (e.g., gw_dev_abc123...) | POST /user/api-keys |
| AC-16.2 | Key creation rate-limited to 10 per hour; 11th returns 429 | Create 11 keys |
| AC-16.3 | Keys can be listed, showing all active keys | GET /user/api-keys |
| AC-16.4 | Keys can be updated (name, restrictions) | PUT /user/api-keys/{key_id} |
| AC-16.5 | Keys can be deleted | DELETE /user/api-keys/{key_id} |
| AC-16.6 | Deleted key no longer authenticates | Use deleted key, verify 401 |
| AC-16.7 | Audit logs record key creation, usage, and deletion | GET /user/api-keys/audit-logs |
Test Plan Refs: 8.3.1–8.3.7
Automated Tests: ✅ Full coverage
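The key format in AC-16.1 can be checked with a regular expression. The pattern below is an assumption: the source states only the `gw_{env}_*` shape, so the allowed characters for the environment label and the token are guesses made for illustration.

```python
import re

# Assumed shape: "gw_", an alphanumeric environment label, "_", then an opaque token.
KEY_PATTERN = re.compile(r"^gw_(?P<env>[A-Za-z0-9]+)_(?P<token>\S+)$")


def key_environment(api_key: str):
    """Return the environment segment of a key, or None if the format does not match."""
    match = KEY_PATTERN.match(api_key)
    return match.group("env") if match else None
```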
Tiered health monitoring (Critical/Popular/Standard/On-Demand). System, provider, model, and gateway health endpoints. Health endpoints always return 200 (degradation in body, not status code).
| # | Criterion | Verification |
|---|---|---|
| AC-17.1 | GET /health always returns 200, even when dependencies are degraded | Call when DB is down |
| AC-17.2 | Health response includes version, status, and timestamp | Inspect response |
| AC-17.3 | GET /health/system returns memory, CPU, and connection pool stats | Inspect response |
| AC-17.4 | Provider health scores are 0–100 per provider | GET /health/providers |
| AC-17.5 | Model health shows healthy, degraded, or down per model | GET /health/models |
| AC-17.6 | Gateway health dashboard returns both HTML and JSON formats | GET /health/gateways/dashboard and /health/gateways/dashboard/data |
| AC-17.7 | Uptime metrics are tracked and returned | GET /health/uptime |
| AC-17.8 | Health insights provide actionable recommendations | GET /health/insights |
Test Plan Refs: 12.1.1–12.4.4, 25.20
Automated Tests: ✅ Full coverage
Prometheus metrics, OpenTelemetry tracing, Sentry error tracking, Arize AI tracing, Pyroscope profiling.
| # | Criterion | Verification |
|---|---|---|
| AC-18.1 | GET /metrics returns valid Prometheus text format | Parse response |
| AC-18.2 | Parsed metrics include p50, p95, p99 latency percentiles | GET /api/metrics/parsed |
| AC-18.3 | Real-time stats update within 60 seconds of new requests | GET /api/monitoring/stats/realtime |
| AC-18.4 | Error rates are tracked per provider and per model | GET /api/monitoring/error-rates |
| AC-18.5 | Anomaly detection flags unusual patterns | GET /api/monitoring/anomalies |
| AC-18.6 | OpenTelemetry traces are initialized and exportable | GET /api/instrumentation/health |
Test Plan Refs: 13.1–13.12
Automated Tests: ✅ Full coverage
Multi-layer caching: semantic cache, exact-match response cache, external cache (Butter.dev), auth cache, catalog cache (L1/L2), DB query cache, health cache, local memory fallback. Every cache degrades gracefully.
| # | Criterion | Verification |
|---|---|---|
| AC-19.1 | Catalog endpoint responds in sub-100ms on cache hit | Time GET /v1/models on second request |
| AC-19.2 | Auth cache reduces lookup latency from 50-150ms to 1-5ms on subsequent requests | Compare first vs. second auth timing |
| AC-19.3 | When Redis is down, local memory cache activates — no requests are blocked | Stop Redis, verify normal operation |
| AC-19.4 | Cache invalidation clears all layers | POST /admin/cache/clear, verify fresh data |
| AC-19.5 | Cache TTLs are respected (auth: 5-10min, catalog L1: 5min, catalog L2: 15-30min) | Verify expiration behavior |
| AC-19.6 | Stampede protection prevents multiple simultaneous cache rebuilds | Concurrent requests to cold cache |
Test Plan Refs: 25.11, 25.12
Automated Tests: ✅ Full coverage
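Stampede protection (AC-19.6) and TTL expiry (AC-19.5) can be sketched with a lock-guarded cache. This is a single-value, single-process illustration with hypothetical names, not the multi-layer cache described above: concurrent callers that hit a cold or expired entry wait on the lock, and only the first one runs the loader.

```python
import threading
import time


class TTLCache:
    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._value = None
        self._expires_at = None  # None means the cache is cold

    def get(self, loader):
        """Return the cached value, rebuilding it at most once per expiry."""
        with self._lock:
            now = self._clock()
            if self._expires_at is None or now >= self._expires_at:
                # Only the caller holding the lock rebuilds; the rest reuse the result.
                self._value = loader()
                self._expires_at = now + self._ttl
            return self._value
```

Holding the lock during the rebuild serializes readers, which is acceptable for a sketch; a production cache would more likely use per-key locks or serve stale data while one worker refreshes.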
POST /v1/images/generations generates images via provider routing. Credits deducted per generation.
| # | Criterion | Verification |
|---|---|---|
| AC-20.1 | Image generation returns 200 with image data or URL | POST /v1/images/generations |
| AC-20.2 | Credits are deducted based on image generation pricing | Compare balance before/after |
| AC-20.3 | User with 0 credits receives 402 | POST with 0-credit user |
Test Plan Refs: 18.1–18.3
Automated Tests: ✅ Full coverage
POST /v1/audio/transcriptions accepts audio files (upload or base64) and returns transcription text.
| # | Criterion | Verification |
|---|---|---|
| AC-21.1 | File upload transcription returns 200 with text | POST with audio file |
| AC-21.2 | Base64 audio transcription returns 200 with text | POST /v1/audio/transcriptions/base64 |
| AC-21.3 | Unsupported audio format returns appropriate error | POST with invalid format |
Test Plan Refs: 19.1–19.2
Automated Tests: ✅ Full coverage
Built-in tools (web search, text-to-speech) exposed via /v1/tools. SSRF protection on tool execution.
| # | Criterion | Verification |
|---|---|---|
| AC-22.1 | GET /v1/tools returns available tools (web_search, text_to_speech) | Inspect response |
| AC-22.2 | Tool definitions are in OpenAI function-calling format | GET /v1/tools/definitions |
| AC-22.3 | Nonexistent tool returns 404 | GET /v1/tools/fake_tool |
| AC-22.4 | Web search execution returns search results | POST /v1/tools/execute |
| AC-22.5 | SSRF protection prevents requests to internal/private IP ranges | Attempt internal URL |
Test Plan Refs: 20.1–20.6
Automated Tests: ✅ Full coverage
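The SSRF check in AC-22.5 can be sketched with the standard `ipaddress` module. The function names and the injectable `resolve` parameter are assumptions made so the check can be exercised without network access; the set of blocked address classes is one reasonable reading of "internal/private IP ranges", not a confirmed list.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_blocked_address(host_ip: str) -> bool:
    """Return True for addresses a server-side tool should not be allowed to reach."""
    ip = ipaddress.ip_address(host_ip)
    return (ip.is_private or ip.is_loopback or ip.is_link_local
            or ip.is_reserved or ip.is_multicast or ip.is_unspecified)


def is_url_allowed(url, resolve=socket.getaddrinfo):
    """Allow only http(s) URLs whose host resolves exclusively to public addresses."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        infos = resolve(parsed.hostname, None)
    except socket.gaierror:
        return False
    # Each getaddrinfo entry is (family, type, proto, canonname, sockaddr).
    return bool(infos) and all(not is_blocked_address(info[4][0]) for info in infos)
```

A check like this is only effective if the subsequent request connects to the address that was validated; otherwise a second DNS lookup could return a different, internal address.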
Admin-managed IP allowlists restrict API key usage to specific IP addresses or ranges.
| # | Criterion | Verification |
|---|---|---|
| AC-23.1 | Admin can create, list, update, and delete IP allowlist entries | CRUD operations |
| AC-23.2 | IP check endpoint correctly identifies allowed vs. blocked IPs | POST /api/admin/ip-whitelist/check |
| AC-23.3 | API key with IP allowlist rejects requests from non-allowed IPs | Use key from blocked IP |
Test Plan Refs: 21.1–21.5
Automated Tests: ✅ Full coverage
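Matching a client IP against an allowlist of addresses and ranges (AC-23.2, AC-23.3) is a short exercise with the `ipaddress` module. `ip_is_allowed` is a hypothetical helper, and representing entries as CIDR strings is an assumption about how the allowlist is stored.

```python
import ipaddress


def ip_is_allowed(client_ip: str, allowlist) -> bool:
    """Return True if client_ip falls inside any allowlist entry (address or CIDR range)."""
    ip = ipaddress.ip_address(client_ip)
    # strict=False accepts entries such as "203.0.113.7/24" that have host bits set.
    return any(ip in ipaddress.ip_network(entry, strict=False) for entry in allowlist)
```

A single address such as `198.51.100.1` is treated as a /32 network, so exact addresses and ranges go through the same code path.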
Partner-specific trial configurations with custom credit amounts, durations, and daily limits.
| # | Criterion | Verification |
|---|---|---|
| AC-24.1 | Partner config is publicly accessible | GET /partner-trials/config/{code} |
| AC-24.2 | Partner code validation always returns 200 (valid or invalid indicated in body) | GET /partner-trials/check/{code} |
| AC-24.3 | Starting a partner trial applies partner-specific credits and limits | POST /partner-trials/start |
| AC-24.4 | Partner trial daily limit is enforced | Check enforcement after exceeding |
Test Plan Refs: 22.1–22.5
Automated Tests: ✅ Full coverage
User notification preferences, usage reports, and test notifications via email (Resend).
| # | Criterion | Verification |
|---|---|---|
| AC-25.1 | User can retrieve notification preferences | GET /user/notifications/preferences |
| AC-25.2 | Usage report can be triggered on demand | POST /user/notifications/send-usage-report |
| AC-25.3 | Test notification sends successfully | POST /user/notifications/test |
Test Plan Refs: 23.1–23.3
Automated Tests: ⚠️ Partial coverage
80+ admin endpoints for user management, credit operations, system monitoring, cache management, model sync, RBAC, trial analytics, downtime tracking, and coupon management.
| # | Criterion | Verification |
|---|---|---|
| AC-26.1 | Non-admin users receive 403 on all admin endpoints | Use $USER_KEY on admin endpoint |
| AC-26.2 | Admin can list, search, and view user details | GET /admin/users |
| AC-26.3 | Admin credit grants respect per-transaction cap and 24h daily limit | Exceed limits |
| AC-26.4 | Admin can assign plans to users | POST /admin/assign-plan |
| AC-26.5 | System monitor returns user counts, credit totals, and API usage | GET /admin/monitor |
| AC-26.6 | Cache status, refresh, and clear operations work | GET/POST cache endpoints |
| AC-26.7 | Model sync can be triggered incrementally and fully | POST sync endpoints |
| AC-26.8 | Role updates are logged in the audit trail | POST /admin/roles/update, check audit log |
| AC-26.9 | Downtime incidents can be listed, viewed, and resolved | CRUD on downtime endpoints |
| AC-26.10 | Coupon analytics show redemption rates and remaining uses | GET /admin/coupons/{id}/analytics |
Test Plan Refs: 24.1–24.10
Automated Tests: ✅ Full coverage
Fernet encryption for API keys, HMAC-SHA256 hashing, injection prevention (SQL, XSS, command, path traversal, LDAP, header, JSON), audit logging.
| # | Criterion | Verification |
|---|---|---|
| AC-27.1 | API keys are Fernet-encrypted in the database — raw values are ciphertext | Query DB directly |
| AC-27.2 | API key lookup uses HMAC hash, not brute-force decryption | Verify code path |
| AC-27.3 | SQL injection attempts in user inputs are sanitized/rejected | Send '; DROP TABLE users; -- |
| AC-27.4 | XSS payloads in user inputs are sanitized/rejected | Send <script>alert(1)</script> |
| AC-27.5 | Command injection attempts are blocked | Send ; rm -rf / in input fields |
| AC-27.6 | Path traversal attempts are blocked | Send ../../etc/passwd |
| AC-27.7 | Admin security violations are logged in the audit trail | Attempt unauthorized admin access, check log |
| AC-27.8 | Error messages never expose stack traces, internal paths, or sensitive data | Trigger errors, inspect responses |
Test Plan Refs: 25.17
Automated Tests: ✅ Full coverage
Autonomous error monitoring with dashboard, recent/critical/fixable error classification, and pattern detection.
| # | Criterion | Verification |
|---|---|---|
| AC-28.1 | Monitor status returns current operational state | GET /error-monitor/autonomous/status |
| AC-28.2 | Dashboard provides overview of error landscape | GET /error-monitor/dashboard |
| AC-28.3 | Recent errors are retrievable and sorted by recency | GET /error-monitor/errors/recent |
| AC-28.4 | Critical errors are flagged separately | GET /error-monitor/errors/critical |
| AC-28.5 | Error patterns detect recurring issues | GET /error-monitor/errors/patterns |
Test Plan Refs: 17.1–17.7
Automated Tests: ✅ Full coverage
Input guardrails: PII detection, prompt injection defense, topic restrictions, content moderation. Output guardrails: content filtering, structured output validation, hallucination flags. This is aspirational — described in the Conceptual Model as future capability.
| # | Criterion | Verification |
|---|---|---|
| AC-29.1 | PII detection scans prompts for phone numbers, SSNs, emails, credit card numbers and optionally redacts or blocks | Send prompt with PII |
| AC-29.2 | Prompt injection patterns that attempt to override system prompts are detected and blocked | Send known injection pattern |
| AC-29.3 | Per-API-key topic restrictions limit model responses to configured domains | Configure restriction, test out-of-domain prompt |
| AC-29.4 | Content moderation blocks harmful or policy-violating inputs before reaching providers | Send harmful content |
| AC-29.5 | Output content filtering scans responses for policy violations before returning to customer | Trigger policy-violating response |
| AC-29.6 | Structured output validation confirms JSON schema conformance when requested | Request JSON schema output |
| AC-29.7 | Provider-side safety metadata (refusals, safety filter triggers) is surfaced in standardized format | Trigger safety filter on provider |
Test Plan Refs: None (not yet implemented)
Automated Tests: ❌ No coverage (feature not yet implemented)
| # | Feature | Criteria Count | Automated | Status |
|---|---|---|---|---|
| 1 | Authentication & Authorization | 11 | ✅ | Implemented |
| 2 | Chat — OpenAI-Compatible | 13 | ✅ | Implemented |
| 3 | Chat — Anthropic-Compatible | 4 | ✅ | Implemented |
| 4 | Model Resolution & Aliasing | 5 | ✅ | Implemented |
| 5 | Provider Failover | 7 | ✅ | Implemented |
| 6 | Circuit Breakers | 8 | ✅ | Implemented |
| 7 | Intelligent Routing | 10 | ✅ | Implemented |
| 8 | Rate Limiting (3 Layers) | 10 | ✅ | Implemented |
| 9 | Credit System | 10 | ✅ | Implemented |
| 10 | Model Catalog | 10 | ✅ | Implemented |
| 11 | Plans & Trials | 8 | ✅ | Implemented |
| 12 | Payments (Stripe) | 6 | ✅ | Implemented |
| 13 | Coupons | 6 | ✅ | Implemented |
| 14 | Referrals | 5 | ✅ | Implemented |
| 15 | Chat History & Sessions | 7 | ✅ | Implemented |
| 16 | API Key Management | 7 | ✅ | Implemented |
| 17 | Health & Monitoring | 8 | ✅ | Implemented |
| 18 | Metrics & Observability | 6 | ✅ | Implemented |
| 19 | Caching System | 6 | ✅ | Implemented |
| 20 | Image Generation | 3 | ✅ | Implemented |
| 21 | Audio Transcription | 3 | ✅ | Implemented |
| 22 | Server-Side Tools | 5 | ✅ | Implemented |
| 23 | IP Allowlist | 3 | ✅ | Implemented |
| 24 | Partner Trials | 4 | ✅ | Implemented |
| 25 | Notifications | 3 | ⚠️ | Implemented (partial test coverage) |
| 26 | Admin Operations | 10 | ✅ | Implemented |
| 27 | Security | 8 | ✅ | Implemented |
| 28 | Error Monitoring | 5 | ✅ | Implemented |
| 29 | Guardrails | 7 | ❌ | Not implemented |
| | TOTAL | 198 | | |