Acceptance Criteria

Formal acceptance criteria for every Gatewayz feature area, derived from the Conceptual Model, Features, and Testing Plan.

Each feature section contains a description of what "done" looks like, measurable acceptance criteria, and references to the corresponding test plan cases.

Last updated: 2026-03-08

How to Read This Document

Each feature section follows this format:

Description: What the feature does (from the Conceptual Model)
Acceptance Criteria: Numbered, testable statements. A feature is accepted when ALL criteria pass.
Test Plan Refs: Links to specific test cases in the Testing Plan
Automated Tests: Whether automated tests exist (from Test Coverage Audit)

1. Authentication & Authorization

Description

Users authenticate via Privy (email, phone, or OAuth) and receive an API key. API keys are encrypted at rest with Fernet (AES-128 in CBC mode, authenticated with HMAC-SHA256) and are looked up by an HMAC-SHA256 hash of the key rather than by decrypting stored keys. Role-based access control (admin, developer, free) governs endpoint access.
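A minimal sketch of the hash-indexed lookup that AC-1.6 and AC-1.7 describe, using only the Python standard library. `LOOKUP_SECRET`, `lookup_hash`, and `KeyStore` are hypothetical names, and the dictionary stands in for the `api_keys_new` table. Encryption is passed in as a callable (with the `cryptography` package this would be something like `Fernet(key).encrypt`) so the lookup logic can be shown on its own:

```python
import hashlib
import hmac
from typing import Callable, Dict, Optional

# Hypothetical server-side secret used only to derive lookup hashes.
LOOKUP_SECRET = b"example-lookup-secret"


def lookup_hash(api_key: str) -> str:
    """Deterministic HMAC-SHA256 of the raw key, safe to store in an indexed column."""
    return hmac.new(LOOKUP_SECRET, api_key.encode(), hashlib.sha256).hexdigest()


class KeyStore:
    """In-memory stand-in for an API key table indexed by lookup hash."""

    def __init__(self, encrypt: Callable[[bytes], bytes]):
        self._encrypt = encrypt  # e.g. a Fernet instance's .encrypt method
        self._rows: Dict[str, dict] = {}  # lookup hash -> row

    def add(self, api_key: str, user_id: str) -> None:
        self._rows[lookup_hash(api_key)] = {
            "user_id": user_id,
            "encrypted_key": self._encrypt(api_key.encode()),  # only ciphertext is stored
        }

    def find_user(self, api_key: str) -> Optional[str]:
        # A single keyed lookup on the hash; no stored key is decrypted to find a match.
        row = self._rows.get(lookup_hash(api_key))
        return row["user_id"] if row else None
```

Because the HMAC is deterministic for a given secret, the hash column can carry a unique index, which is what keeps lookup at O(log n) with a B-tree index instead of a scan that decrypts every row.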

Acceptance Criteria

#	Criterion	Verification
AC-1.1	A valid Privy token returns 200 with `api_key`, `credits`, and `subscription_status` fields	POST `/auth` with valid token
AC-1.2	A first-time user receives exactly $5.00 in credits and a trial expiring 3 days from creation	POST `/auth` with new identity, check `credits` and `trial_end`
AC-1.3	An invalid or expired Privy token returns 401 or 403, never 200	POST `/auth` with bad token
AC-1.4	Auth endpoint rate-limits to 10 requests per 15 minutes per IP; the 11th returns 429	POST `/auth` 11 times from same IP
AC-1.5	Registration rate-limits to 3 requests per hour per IP; the 4th returns 429	POST `/auth/register` 4 times
AC-1.6	API keys stored in the database are Fernet-encrypted — raw DB values are not plaintext and not the key itself	Query `api_keys_new` table directly
AC-1.7	API key lookup uses HMAC-SHA256 hash, not decryption of all keys	Verify code path or timing (O(log n) not O(n))
AC-1.8	Non-admin API keys receive 403 when accessing `/admin/*` endpoints	GET `/admin/users` with `$USER_KEY`
AC-1.9	Admin API keys receive 200 when accessing `/admin/*` endpoints	GET `/admin/users` with `$ADMIN_KEY`
AC-1.10	Auth health endpoint returns 200 with DB, Redis, cache, and timeout status even when dependencies are degraded	GET `/auth/health`
AC-1.11	Temporary/disposable email domains are detected and flagged during registration	Register with `user@tempmail.com`

Test Plan Refs: 2.1–2.6, 24.1.1–24.1.2, 25.15, 25.17

Automated Tests: ✅ Full coverage

2. Chat & Inference — OpenAI-Compatible

Description

POST /v1/chat/completions accepts OpenAI-format requests and returns OpenAI-format responses. It supports streaming (SSE), JSON mode, tool/function calling, and logprobs. Any application built for the OpenAI API works after changing only the base URL and API key.

Acceptance Criteria

#	Criterion	Verification
AC-2.1	Non-streaming request returns 200 with `choices[0].message.content`, `usage.prompt_tokens`, `usage.completion_tokens`	POST with `stream: false`
AC-2.2	Streaming request returns an SSE stream in which each non-empty line starts with `data:`, every line before the last contains valid JSON, and the stream ends with `data: [DONE]`	POST with `stream: true`
AC-2.3	`response_format: {"type": "json_object"}` returns content that is valid parseable JSON	POST with JSON mode
AC-2.4	Request with `tools` array returns `tool_calls` in the response when the model decides to call a tool	POST with tool definitions
AC-2.5	Request with `logprobs: true` returns a `logprobs` field in the response	POST with logprobs enabled
AC-2.6	After a successful completion, user's credit balance decreases by `(prompt_tokens × prompt_price) + (completion_tokens × completion_price)`	Compare balance before and after
AC-2.7	User with 0 credits receives 402 without a provider API call being made (pre-flight check)	POST with 0-credit user, verify no upstream call
AC-2.8	User with expired trial receives 402	POST with expired trial key
AC-2.9	Request with nonexistent model returns 400 or 404, not 500	POST with `model: "nonexistent/model"`
AC-2.10	Unauthenticated request with a whitelisted model returns 200	POST without auth header
AC-2.11	Unauthenticated request with a non-whitelisted model returns 401 or 403	POST without auth header
AC-2.12	Exceeding RPM rate limit returns 429 with `Retry-After` header	Exceed rate limit
AC-2.13	OpenAI Python SDK (`openai.OpenAI(base_url="$BASE/v1")`) works with zero code changes beyond base URL and API key	Run OpenAI SDK test
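The streaming criterion (AC-2.2) can be checked mechanically once the response body has been read into a string. A minimal sketch of a test helper; `check_openai_sse` is a hypothetical name, and the check is stricter than the SSE specification, which also permits comment lines beginning with `:`:

```python
import json
from typing import List


def check_openai_sse(body: str) -> List[dict]:
    """Validate an OpenAI-style SSE body against AC-2.2 and return the parsed chunks.

    Raises ValueError (json.JSONDecodeError is a subclass) on any violation.
    """
    lines = [line for line in body.splitlines() if line.strip()]  # drop blank event separators
    if not lines or lines[-1].strip() != "data: [DONE]":
        raise ValueError("stream must end with 'data: [DONE]'")
    chunks = []
    for line in lines[:-1]:
        if not line.startswith("data:"):
            raise ValueError(f"unexpected SSE line: {line!r}")
        chunks.append(json.loads(line[len("data:"):]))  # json.loads skips the leading space
    return chunks
```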

Test Plan Refs: 3.1.1–3.1.12, 25.1, 25.18

Automated Tests: ✅ Full coverage

3. Chat & Inference — Anthropic-Compatible

Description

POST /v1/messages accepts Anthropic-format requests and returns Anthropic-format responses. Supports streaming in Anthropic SSE event format.
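AC-3.2 can be checked by reading the `event:` lines of the stream. A minimal sketch with hypothetical helper names; Anthropic streams also interleave other events (for example `content_block_start`, `message_delta`, and `ping`), which this check tolerates because it only asserts on the three events the criterion names:

```python
from typing import List


def sse_event_names(body: str) -> List[str]:
    """Return the `event:` names of an SSE body in the order they appear."""
    return [
        line[len("event:"):].strip()
        for line in body.splitlines()
        if line.startswith("event:")
    ]


def check_anthropic_stream(body: str) -> None:
    """Raise ValueError unless the stream starts with message_start, contains
    at least one content_block_delta, and ends with message_stop (AC-3.2)."""
    names = sse_event_names(body)
    if not names or names[0] != "message_start":
        raise ValueError("stream must begin with message_start")
    if names[-1] != "message_stop":
        raise ValueError("stream must end with message_stop")
    if "content_block_delta" not in names:
        raise ValueError("stream contains no content_block_delta events")
```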

Acceptance Criteria

#	Criterion	Verification
AC-3.1	Non-streaming request returns 200 with `content[0].text`, `usage.input_tokens`, `usage.output_tokens` in Anthropic format	POST `/v1/messages`
AC-3.2	Streaming request returns SSE events in Anthropic format (`message_start`, `content_block_delta`, `message_stop`)	POST with `stream: true`
AC-3.3	Credits are deducted using Anthropic token counts	Compare balance before and after
AC-3.4	Anthropic Python SDK (`anthropic.Anthropic(base_url="$BASE")`, since the SDK itself appends `/v1/messages`) works with zero code changes beyond base URL and API key	Run Anthropic SDK test

Test Plan Refs: 3.2.1–3.2.2, 25.2

Automated Tests: ✅ Full coverage

4. Model Resolution & Aliasing

Description

120+ short aliases map to canonical model IDs. Provider detection follows: explicit overrides → format-based rules → mapping tables → org-prefix fallbacks. Model IDs are transformed to each provider's native format.
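The resolution order can be sketched as a chain of lookups where the first match wins. Only the two aliases from AC-4.1 and AC-4.2 come from this document; the override, format-rule, mapping, and org-prefix tables are invented placeholders, and the Vertex credential check that AC-4.4 mentions is not modeled:

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical excerpts: the real alias table has 120+ entries and the
# provider tables below are invented for illustration.
MODEL_ALIASES: Dict[str, str] = {
    "gpt-4o": "openai/gpt-4o",
    "r1": "deepseek/deepseek-r1",
}
EXPLICIT_OVERRIDES: Dict[str, str] = {"openai/gpt-4o": "openai"}
FORMAT_RULES: List[Tuple[str, str]] = [("router:", "router")]  # (prefix, provider)
MAPPING_TABLE: Dict[str, str] = {"deepseek/deepseek-r1": "fireworks"}
ORG_PREFIX_FALLBACKS: Dict[str, str] = {"google": "vertex", "anthropic": "anthropic"}


def resolve_model(model: str) -> str:
    """Map a short alias to its canonical ID; canonical IDs pass through unchanged."""
    return MODEL_ALIASES.get(model, model)


def detect_provider(model_id: str) -> Optional[str]:
    """Apply the detection steps in the documented order; first match wins."""
    if model_id in EXPLICIT_OVERRIDES:  # 1. explicit overrides
        return EXPLICIT_OVERRIDES[model_id]
    for prefix, provider in FORMAT_RULES:  # 2. format-based rules
        if model_id.startswith(prefix):
            return provider
    if model_id in MAPPING_TABLE:  # 3. mapping tables
        return MAPPING_TABLE[model_id]
    org = model_id.split("/", 1)[0]  # 4. org-prefix fallback
    return ORG_PREFIX_FALLBACKS.get(org)


def self_referencing_aliases(aliases: Dict[str, str]) -> List[str]:
    """Aliases that map to themselves (AC-4.5 requires this list to be empty)."""
    return [alias for alias, target in aliases.items() if alias == target]
```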

Acceptance Criteria

#	Criterion	Verification
AC-4.1	`gpt-4o` resolves to `openai/gpt-4o`	POST chat with `model: "gpt-4o"`
AC-4.2	`r1` resolves to `deepseek/deepseek-r1`	POST chat with `model: "r1"`
AC-4.3	Canonical IDs (e.g., `openai/gpt-4o`) work directly without alias resolution	POST chat with canonical ID
AC-4.4	Provider detection correctly routes `google/gemini-*` models to Vertex when credentials are available	POST chat with Gemini model
AC-4.5	No alias maps to itself (no self-referencing loops)	Inspect `MODEL_ALIASES` dict

Test Plan Refs: 3.3.1–3.3.2, 25.19

Automated Tests: ✅ Full coverage

5. Provider Failover

Description

When a provider fails, requests automatically retry with the next provider in a 14-provider prioritized chain. Circuit breakers prevent repeated calls to failing providers.
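The status-code policy in AC-5.1 to AC-5.4 and the chain restrictions in AC-5.5 to AC-5.7 can be summarized as a few small functions. A minimal sketch: the names, the backoff parameters, and the three-provider `FULL_CHAIN` excerpt are assumptions, and codes the criteria do not mention (such as 500) fall through to `RETURN_TO_USER` only as a placeholder:

```python
from enum import Enum
from typing import List

# Hypothetical excerpt standing in for the 14-provider prioritized chain.
FULL_CHAIN: List[str] = ["fireworks", "together", "openrouter"]


class Action(Enum):
    FAILOVER = "failover"          # try the next provider in the chain
    RETRY_WITH_BACKOFF = "retry"   # retry the same provider after a delay
    RETURN_TO_USER = "return"      # surface the error unchanged


def on_provider_error(status: int) -> Action:
    """Classify an upstream HTTP status following AC-5.1 to AC-5.4."""
    if status in (401, 402, 403, 404, 502, 503, 504):
        return Action.FAILOVER
    if status == 429:
        return Action.RETRY_WITH_BACKOFF
    # 400 is a user error; unlisted codes are treated the same way here (placeholder).
    return Action.RETURN_TO_USER


def backoff_delay(attempt: int, base_s: float = 0.5, cap_s: float = 8.0) -> float:
    """Capped exponential backoff for 429 retries (base and cap are assumptions)."""
    return min(cap_s, base_s * 2 ** attempt)


def failover_chain(model_id: str) -> List[str]:
    """Restrict OpenAI and Anthropic models to their own provider plus OpenRouter."""
    org = model_id.split("/", 1)[0]
    if org in ("openai", "anthropic"):
        return [org, "openrouter"]
    return list(FULL_CHAIN)  # open-source models may use every provider
```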

Acceptance Criteria

#	Criterion	Verification
AC-5.1	When primary provider returns 502/503/504, request succeeds via fallback provider transparently	Force primary failure, verify success
AC-5.2	When provider returns 401/402/403/404, request fails over to next provider	Force auth error on primary
AC-5.3	When provider returns 400 (user error), request does NOT fail over and the 400 is returned to the user	Send malformed request
AC-5.4	When provider returns 429, request retries the same provider with backoff and does NOT fail over	Trigger rate limit on provider
AC-5.5	OpenAI models only fail over along OpenAI → OpenRouter (not to arbitrary providers)	Inspect failover chain for OpenAI model
AC-5.6	Anthropic models only fail over along Anthropic → OpenRouter	Inspect failover chain for Anthropic model
AC-5.7	Open-source models can fail over across all providers	Inspect failover chain for open-source model

Test Plan Refs: 3.3.3, 25.3, 25.4

Automated Tests: ✅ Full coverage

6. Circuit Breakers

Description

Per-provider circuit breakers with states: CLOSED (healthy) → OPEN (blocked after 5 consecutive failures) → HALF_OPEN (testing recovery after 5-minute cooldown).
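A minimal in-process sketch of the state machine above, with the clock injected so the 5-minute cooldown in AC-6.4 can be tested without waiting. `CircuitBreaker` and its method names are hypothetical; the real breakers may keep state in shared storage and may limit how many probe requests HALF_OPEN admits, neither of which is modeled here:

```python
import time
from typing import Callable


class CircuitBreaker:
    """CLOSED -> OPEN after N consecutive failures; OPEN -> HALF_OPEN after a
    cooldown; HALF_OPEN -> CLOSED on success or back to OPEN on failure."""

    def __init__(self, failure_threshold: int = 5, cooldown_s: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self._clock = clock
        self._state = "CLOSED"
        self._failures = 0
        self._opened_at = 0.0

    @property
    def state(self) -> str:
        # The OPEN -> HALF_OPEN transition is evaluated lazily when the state is read.
        if self._state == "OPEN" and self._clock() - self._opened_at >= self.cooldown_s:
            self._state = "HALF_OPEN"
        return self._state

    def allow_request(self) -> bool:
        return self.state != "OPEN"  # OPEN providers are skipped in the failover chain

    def record_success(self) -> None:
        self._state = "CLOSED"
        self._failures = 0  # only consecutive failures count

    def record_failure(self) -> None:
        if self.state == "HALF_OPEN":
            self._trip()  # a failed probe reopens immediately
            return
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._trip()

    def reset(self) -> None:
        self.record_success()  # manual reset returns to CLOSED

    def _trip(self) -> None:
        self._state = "OPEN"
        self._opened_at = self._clock()
        self._failures = 0
```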

Acceptance Criteria

#	Criterion	Verification
AC-6.1	New provider starts in CLOSED state	GET `/circuit-breakers/{new_provider}`
AC-6.2	After 5 consecutive failures, state transitions to OPEN	Send 5 failing requests, check state
AC-6.3	OPEN state prevents requests from being routed to that provider	Verify provider is skipped in failover chain
AC-6.4	After 300 seconds (5 min), OPEN transitions to HALF_OPEN	Wait 5 min, check state
AC-6.5	In HALF_OPEN, a successful request transitions to CLOSED	Send successful request, check state
AC-6.6	In HALF_OPEN, a failed request transitions back to OPEN	Send failing request, check state
AC-6.7	Manual reset via `POST /circuit-breakers/{provider}/reset` returns state to CLOSED	Reset and verify
AC-6.8	`POST /circuit-breakers/reset-all` resets all breakers to CLOSED	Reset all and verify

Test Plan Refs: 5.1–5.4, 25.5–25.6

Automated Tests: ✅ Full coverage

7. Intelligent Routing

Description

Two routing systems: General Router (quality/cost/latency/balanced via NotDiamond) and Code Router (benchmark-driven, SWE-bench/HumanEval scored, tiered by complexity).

Acceptance Criteria

#	Criterion	Verification
AC-7.1	`router:general:quality` selects a high-quality model and returns 200	POST chat with router model
AC-7.2	`router:general:cost` selects a cheaper model than `router:general:quality` for the same prompt	Compare selected models
AC-7.3	`router:general:latency` selects a low-latency model	POST and verify model selection
AC-7.4	`router:general:balanced` considers quality, cost, and latency	POST and verify selection
AC-7.5	`router:code:auto` classifies prompt complexity and selects an appropriate tier model	POST with code prompt
AC-7.6	`router:code:quality` selects the highest-tier code model	POST and verify
AC-7.7	`router:code:price` selects a cost-effective code model	POST and verify
AC-7.8	`router:code:agentic` selects a model optimized for agentic coding tasks	POST and verify
AC-7.9	Code router tiers endpoint returns models with SWE-bench/HumanEval benchmark scores	GET `/code-router/tiers`
AC-7.10	Router test endpoints return selected model + reasoning/rationale	POST `/code-router/test`, POST `/general-router/test`

Test Plan Refs: 3.4.1–3.4.8, 6.1–6.5, 7.1–7.5

Automated Tests: ✅ Full coverage

8. Rate Limiting (Three Layers)

Description

Layer 1: IP-level (security middleware, behavioral analysis, velocity detection). Layer 2: API-key-level (Redis-backed, per-plan limits). Layer 3: anonymous (stricter limits for unauthenticated requests). When Redis is down, rate limiting degrades gracefully to an in-memory fallback.
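A minimal sketch of the in-memory fallback behind AC-8.6, shaped so it also produces the headers listed in AC-8.5. It uses a fixed-window counter; whether the real limiter uses fixed or sliding windows is not stated here, and the class name, `check_with_fallback`, and the epoch-seconds value for `X-RateLimit-Reset` are assumptions. With `limit=10` and `window_s=900` it models the AC-1.4 auth limit:

```python
import math
import time
from typing import Callable, Dict, Tuple

Headers = Dict[str, str]


class InMemoryRateLimiter:
    """Fixed-window counter used when Redis is unreachable (single process only)."""

    def __init__(self, limit: int, window_s: float,
                 clock: Callable[[], float] = time.time):
        self.limit = limit
        self.window_s = window_s
        self._clock = clock
        self._windows: Dict[str, Tuple[float, int]] = {}  # key -> (window start, count)

    def check(self, key: str) -> Tuple[bool, Headers]:
        now = self._clock()
        start, count = self._windows.get(key, (now, 0))
        if now - start >= self.window_s:
            start, count = now, 0  # previous window expired
        reset_at = start + self.window_s
        headers = {"X-RateLimit-Limit": str(self.limit),
                   "X-RateLimit-Reset": str(int(reset_at))}  # assumed: epoch seconds
        if count >= self.limit:
            headers["X-RateLimit-Remaining"] = "0"
            headers["Retry-After"] = str(max(1, math.ceil(reset_at - now)))
            return False, headers  # caller responds 429
        self._windows[key] = (start, count + 1)
        headers["X-RateLimit-Remaining"] = str(self.limit - count - 1)
        return True, headers


def check_with_fallback(redis_check: Callable[[str], Tuple[bool, Headers]],
                        fallback: InMemoryRateLimiter, key: str) -> Tuple[bool, Headers]:
    """Use the Redis-backed check when it works; on any error use the local limiter."""
    try:
        return redis_check(key)
    except Exception:
        return fallback.check(key)
```

Because each gateway process keeps its own counters, the effective limit across several instances is looser while Redis is down; that is the price of never blocking requests on an infrastructure failure.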

Acceptance Criteria

#	Criterion	Verification
AC-8.1	Unauthenticated requests exceeding 300 RPM from same IP receive 429	Exceed IP limit
AC-8.2	Authenticated requests exceeding plan RPM receive 429	Exceed key limit
AC-8.3	Anonymous rate limits are stricter than authenticated limits	Compare thresholds
AC-8.4	Authenticated users are exempt from IP-level rate limiting	Verify no IP block on auth users
AC-8.5	Rate limit response includes `Retry-After`, `X-RateLimit-Limit`, `X-RateLimit-Remaining`, `X-RateLimit-Reset` headers	Inspect 429 response headers
AC-8.6	When Redis is unavailable, rate limiting continues via in-memory fallback — requests are never blocked due to infrastructure failure	Stop Redis, verify rate limiting works
AC-8.7	Velocity mode activates when error rate exceeds threshold (25%) and reduces limits	Trigger high error rate
AC-8.8	Velocity mode deactivates after 3 minutes of normal error rates	Wait for cooldown
AC-8.9	Rate limit configuration is viewable at `GET /user/rate-limits`	Check endpoint
AC-8.10	Per-key rate limits can be updated via `PUT /user/rate-limits/{key_id}`	Update and verify

Test Plan Refs: 8.4.1–8.4.4, 25.7–25.10

Automated Tests: ✅ Full coverage

9. Credit System

Description

Credits are the atomic unit of billing. Cost = (prompt_tokens × prompt_price) + (completion_tokens × completion_price). The subscription allowance is consumed first, then purchased credits. The system performs pre-flight balance checks, idempotent deductions, and automatic refunds on provider-side (5xx) errors.
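The billing rules above can be sketched with `Decimal` arithmetic to avoid float rounding. A minimal in-memory model, not the gateway's implementation: the names are hypothetical, prices are assumed to be per token, and neither refunds nor the single DB transaction required by AC-9.8 are modeled:

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import Set


@dataclass
class Account:
    subscription_allowance: Decimal
    purchased_credits: Decimal
    charged_request_ids: Set[str] = field(default_factory=set)


def request_cost(prompt_tokens: int, completion_tokens: int,
                 prompt_price: Decimal, completion_price: Decimal) -> Decimal:
    """(prompt_tokens x prompt_price) + (completion_tokens x completion_price)."""
    return prompt_tokens * prompt_price + completion_tokens * completion_price


def has_credit(account: Account) -> bool:
    """Pre-flight check, run before any provider call; False maps to HTTP 402."""
    return account.subscription_allowance + account.purchased_credits > 0


def deduct(account: Account, request_id: str, cost: Decimal) -> bool:
    """Charge `cost` once per request ID, drawing on the subscription allowance
    before purchased credits. Returns False for a repeated request ID."""
    if request_id in account.charged_request_ids:
        return False  # idempotent: the same request is never charged twice
    from_allowance = min(cost, account.subscription_allowance)
    account.subscription_allowance -= from_allowance
    account.purchased_credits -= cost - from_allowance
    account.charged_request_ids.add(request_id)
    return True
```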

Acceptance Criteria

#	Criterion	Verification
AC-9.1	Pre-flight check: user with 0 credits receives 402 before any provider call is made	POST chat with 0-credit user
AC-9.2	Idempotent deduction: same request ID sent twice results in credits deducted only once	POST twice with same `X-Request-ID`
AC-9.3	Subscription allowance is consumed before purchased credits	User with both: make request, verify subscription decreases
AC-9.4	Provider 5xx error results in automatic credit refund	Trigger 5xx, verify refund transaction
AC-9.5	Provider 4xx error (user error) does NOT trigger refund	Trigger 4xx, verify no refund
AC-9.6	High-value models (GPT-4, Claude, Gemini, o1/o3/o4) are blocked if their pricing falls back to the default, which prevents under-billing	Verify explicit pricing exists for all premium models
AC-9.7	Credit transactions are logged with unique request ID, user ID, model, token counts, and cost	Check `credit_transactions` table
AC-9.8	Balance update and transaction record happen atomically (single DB transaction)	Verify no orphaned records
AC-9.9	Admin can add/adjust/refund credits via `/credits/*` endpoints	POST `/credits/add` and verify
AC-9.10	Credit transaction history is paginated and filterable	GET `/credits/transactions?limit=10`

Test Plan Refs: 9.1.1–9.1.7, 9.2.1–9.2.4, 25.13, 25.14

Automated Tests: ✅ Full coverage

10. Model Catalog

Description

10,000+ models from 30+ providers, with background sync, multi-layer caching, and HuggingFace enrichment. To be listed, a model must have resolvable pricing, an active provider, and a valid modality, and must not duplicate an existing entry.
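The four inclusion rules can be sketched as a single filter pass. The field names `pricing` and `modality`, the modality set, and first-entry-wins deduplication are assumptions; only `id` and `provider_slug` appear in AC-10.1. "Resolvable" is read here as "present", which is weaker than the nonzero requirement in AC-10.2:

```python
from typing import Dict, Iterable, List, Set

VALID_MODALITIES: Set[str] = {"text", "image", "audio"}  # hypothetical set


def eligible_models(models: Iterable[Dict], active_providers: Set[str]) -> List[Dict]:
    """Keep models that meet the catalog rules; the first entry for an ID wins."""
    seen: Set[str] = set()
    kept: List[Dict] = []
    for model in models:
        pricing = model.get("pricing") or {}
        if pricing.get("prompt") is None or pricing.get("completion") is None:
            continue  # pricing cannot be resolved
        if model.get("provider_slug") not in active_providers:
            continue  # provider inactive or unknown
        if model.get("modality") not in VALID_MODALITIES:
            continue
        if model["id"] in seen:
            continue  # duplicate of an earlier entry
        seen.add(model["id"])
        kept.append(model)
    return kept
```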

Acceptance Criteria

#	Criterion	Verification
AC-10.1	`GET /v1/models` returns a list where every model has `id`, `name`, `provider_slug`, `context_length`, and pricing data	Inspect response
AC-10.2	No model in the catalog has null or zero pricing	Scan all models in response
AC-10.3	`GET /v1/models?provider=fireworks` returns only models from Fireworks	Filter and verify
AC-10.4	`GET /v1/models/unique` returns no duplicate model IDs	Check for uniqueness
AC-10.5	`GET /v1/models/search?q=llama` returns models matching "llama"	Verify results
AC-10.6	Model detail endpoint returns HuggingFace enrichment (downloads, likes, parameters) when available	GET `/api/models/detail?model_id=meta-llama/...`
AC-10.7	Catalog is served from cache on subsequent requests (sub-100ms response time on cache hit)	Time two consecutive requests
AC-10.8	`GET /v1/gateways` returns all registered gateways with `name`, `color`, `priority`, `site_url`	Inspect response
AC-10.9	Model health endpoint shows `healthy`, `degraded`, or `down` status per model	GET `/v1/model-health`
AC-10.10	Model availability endpoint shows fallback providers for a given model	GET `/availability/fallback/{model_id}`

Test Plan Refs: 4.1.1–4.1.12, 4.2–4.8, 25.11

Automated Tests: ✅ Full coverage

11. Plans & Trials

Description

Trial: 3 days, $5 cap, 1M tokens, 10K requests. Plans: Dev (pay-as-you-go), Team (subscription), Enterprise (custom). Expired trials can still access `:free` models. Purchased credits never expire.
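The trial rules can be sketched as one access check. The dataclass and function names are hypothetical, and letting `:free` models through even when trial limits are exhausted before expiry is an assumption; the description only states the post-expiry case:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Trial limits from the description above.
TRIAL_DAYS = 3
TRIAL_CREDIT_CAP = 5.00
TRIAL_TOKEN_LIMIT = 1_000_000
TRIAL_REQUEST_LIMIT = 10_000


@dataclass
class TrialUsage:
    started_at: datetime
    credits_used: float
    tokens_used: int
    requests_made: int


def trial_allows(usage: TrialUsage, model_id: str, now: datetime) -> bool:
    """False means the caller should answer 402 (AC-11.3)."""
    if model_id.endswith(":free"):
        return True  # :free models remain available after the trial (AC-11.4)
    expired = now >= usage.started_at + timedelta(days=TRIAL_DAYS)
    exhausted = (usage.credits_used >= TRIAL_CREDIT_CAP
                 or usage.tokens_used >= TRIAL_TOKEN_LIMIT
                 or usage.requests_made >= TRIAL_REQUEST_LIMIT)
    return not (expired or exhausted)
```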

Acceptance Criteria

#	Criterion	Verification
AC-11.1	New user gets $5 credits and trial expiring in 3 days	Register and check balance + trial_end
AC-11.2	Trial user can make requests until credits/limits exhausted	Make requests during trial
AC-11.3	Expired trial returns 402 for paid models	POST chat after trial expiry
AC-11.4	Expired trial can still access `:free` suffix models	POST chat with `:free` model after expiry
AC-11.5	`GET /plans` returns available plan tiers with pricing	Inspect response
AC-11.6	`GET /trial/status` returns `active`/`expired` and days remaining	Check response
AC-11.7	Unused subscription allowance does NOT roll over — resets monthly	Verify at month boundary
AC-11.8	Purchased credits never expire and survive plan changes	Change plan, verify credits persist

Test Plan Refs: 8.5.1–8.5.4, 25.15, 25.16

Automated Tests: ✅ Full coverage

12. Payments (Stripe)

Description

Credit package purchases, subscription management, checkout sessions, payment intents, and webhooks. The Stripe webhook endpoint always returns 200, even when processing fails.
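The always-200 rule (AC-12.4) amounts to catching every processing error and logging it rather than returning it. A minimal sketch with hypothetical names; Stripe signature verification (which the official `stripe` library provides through `stripe.Webhook.construct_event`) is omitted:

```python
import json
import logging
from typing import Callable, Dict, Tuple

logger = logging.getLogger("stripe_webhook")


def handle_stripe_webhook(payload: bytes,
                          process_event: Callable[[Dict], None]) -> Tuple[int, Dict]:
    """Return (HTTP status, JSON body); the status is 200 on every path."""
    try:
        event = json.loads(payload)
        process_event(event)  # e.g. add purchased credits to the user's balance
        return 200, {"received": True, "processed": True}
    except Exception:
        # Record the failure for follow-up instead of signaling it to Stripe.
        logger.exception("Stripe webhook processing failed")
        return 200, {"received": True, "processed": False}
```

One likely motivation: Stripe retries deliveries that receive a non-2xx response, so answering 200 stops repeated deliveries of events that would fail the same way again. The trade-off is that events which fail for transient reasons are not redelivered and have to be reconciled from the logs.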

Acceptance Criteria

#	Criterion	Verification
AC-12.1	`GET /api/stripe/credit-packages` returns available packages with pricing (public, no auth)	Inspect response
AC-12.2	`POST /api/stripe/checkout-session` returns a valid Stripe checkout URL	Create session
AC-12.3	Successful payment webhook adds credits to user's balance	Simulate webhook
AC-12.4	Webhook endpoint always returns 200, even if processing fails internally	Send malformed webhook
AC-12.5	Payment history is paginated and includes amount, date, status	GET `/api/stripe/payments`
AC-12.6	Subscription checkout creates a Stripe subscription and assigns the plan	POST `/api/stripe/subscription-checkout`

Test Plan Refs: 11.1–11.8

Automated Tests: ✅ Full coverage

13. Coupons

Description

Redeemable coupon codes that add credits. Supports global and user-specific coupons, expiration, max redemptions, and one-per-user restrictions.
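The redemption rules can be sketched as an ordered series of checks. The `Coupon` fields and status choices are assumptions: AC-13.4 allows 400 or 403 and this sketch picks 403, and the status for a fully redeemed coupon is not specified in the criteria, so 400 is a guess:

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class Coupon:
    code: str
    credits: float
    expires_at: datetime
    max_redemptions: int
    target_user_id: Optional[str] = None  # None marks a global coupon
    redeemed_by: List[str] = field(default_factory=list)


def redeem(coupon: Coupon, user_id: str, now: datetime) -> Tuple[int, str]:
    """Apply the redemption rules in order and return (HTTP status, message)."""
    if now >= coupon.expires_at:
        return 400, "coupon expired"  # AC-13.2
    if coupon.target_user_id is not None and coupon.target_user_id != user_id:
        return 403, "coupon not issued to this user"  # AC-13.4
    if user_id in coupon.redeemed_by:
        return 400, "coupon already redeemed"  # AC-13.3, one per user
    if len(coupon.redeemed_by) >= coupon.max_redemptions:
        return 400, "coupon fully redeemed"
    coupon.redeemed_by.append(user_id)
    return 200, f"added {coupon.credits:.2f} credits"  # AC-13.1
```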

Acceptance Criteria

#	Criterion	Verification
AC-13.1	Valid coupon code redeems successfully and adds the correct credit amount	POST `/coupons/redeem`
AC-13.2	Expired coupon returns 400	Redeem expired code
AC-13.3	Already-redeemed coupon (by same user) returns 400	Redeem twice
AC-13.4	User-specific coupon redeemed by wrong user returns 400 or 403	Redeem with different user's key
AC-13.5	`GET /coupons/available` returns both global and user-targeted coupons	Inspect response
AC-13.6	Redemption history shows all past redemptions for the user	GET `/coupons/history`

Test Plan Refs: 10.1–10.6

Automated Tests: ✅ Full coverage

14. Referrals

Description

Users generate referral codes. Referred users sign up with the code. Both parties receive credit rewards upon conversion.

Acceptance Criteria

#	Criterion	Verification
AC-14.1	User can generate a unique referral code	POST `/referral/generate`
AC-14.2	Referral code can be validated	POST `/referral/validate`
AC-14.3	Self-referral is prevented	Attempt self-referral, verify rejection
AC-14.4	Referral stats show total referred, conversions, and rewards	GET `/referral/stats`
AC-14.5	Successful referral grants credits to both referrer and referred user	Complete referral flow

Test Plan Refs: 8.6.1–8.6.4

Automated Tests: ✅ Full coverage

15. Chat History & Sessions

Description

Persistent chat sessions with message storage, batch operations, full-text search, metadata updates, deletion, and usage stats.

Acceptance Criteria

#	Criterion	Verification
AC-15.1	Sessions can be listed, created, updated, and deleted	CRUD operations
AC-15.2	Messages can be saved individually and in batch	POST single and batch
AC-15.3	Full-text search returns matching sessions across all user's history	POST `/v1/chat/search`
AC-15.4	Duplicate messages are deduplicated (no double-saves)	Save same message twice, verify single entry
AC-15.5	Chat stats return accurate usage data	GET `/v1/chat/stats`
AC-15.6	Share links provide public read-only access to conversations	Create share, access without auth
AC-15.7	Feedback can be submitted, retrieved, updated, and deleted per session	CRUD on feedback

Test Plan Refs: 3.5.1–3.5.8, 3.6.1–3.6.6, 3.7.1–3.7.4

Automated Tests: ✅ Full coverage

16. API Key Management

Description

Users create, list, update, and delete API keys. Keys use the `gw_{env}_*` format, creation is rate-limited to 10 keys per hour, and keys are encrypted at rest.
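Key generation and format checking for AC-16.1 can be sketched with the standard library's `secrets` module. The suffix length (32 random bytes, 43 URL-safe characters) and the validation pattern are assumptions; the document only fixes the `gw_{env}_` prefix:

```python
import re
import secrets
from typing import Optional

# Assumed shape: lowercase env name, then at least 20 URL-safe characters.
KEY_PATTERN = re.compile(r"gw_(?P<env>[a-z]+)_(?P<secret>[A-Za-z0-9_-]{20,})")


def generate_api_key(env: str) -> str:
    """Build a key in gw_{env}_* form with a 256-bit random suffix."""
    if not re.fullmatch(r"[a-z]+", env):
        raise ValueError("env must be lowercase ASCII letters, e.g. 'dev'")
    return f"gw_{env}_{secrets.token_urlsafe(32)}"


def key_env(api_key: str) -> Optional[str]:
    """Return the env segment of a well-formed key, or None."""
    match = KEY_PATTERN.fullmatch(api_key)
    return match.group("env") if match else None
```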

Acceptance Criteria

#	Criterion	Verification
AC-16.1	Created key is in `gw_{env}_*` format (e.g., `gw_dev_abc123...`)	POST `/user/api-keys`
AC-16.2	Key creation rate-limited to 10 per hour; 11th returns 429	Create 11 keys
AC-16.3	Keys can be listed, showing all active keys	GET `/user/api-keys`
AC-16.4	Keys can be updated (name, restrictions)	PUT `/user/api-keys/{key_id}`
AC-16.5	Keys can be deleted	DELETE `/user/api-keys/{key_id}`
AC-16.6	Deleted key no longer authenticates	Use deleted key, verify 401
AC-16.7	Audit logs record key creation, usage, and deletion	GET `/user/api-keys/audit-logs`

Test Plan Refs: 8.3.1–8.3.7

Automated Tests: ✅ Full coverage

17. Health & Monitoring

Description

Tiered health monitoring (Critical/Popular/Standard/On-Demand). System, provider, model, and gateway health endpoints. Health endpoints always return 200 (degradation in body, not status code).

Acceptance Criteria

#	Criterion	Verification
AC-17.1	`GET /health` always returns 200, even when dependencies are degraded	Call when DB is down
AC-17.2	Health response includes `version`, `status`, and `timestamp`	Inspect response
AC-17.3	`GET /health/system` returns memory, CPU, and connection pool stats	Inspect response
AC-17.4	Provider health scores are 0–100 per provider	GET `/health/providers`
AC-17.5	Model health shows `healthy`, `degraded`, or `down` per model	GET `/health/models`
AC-17.6	Gateway health dashboard returns both HTML and JSON formats	GET `/health/gateways/dashboard` and `/health/gateways/dashboard/data`
AC-17.7	Uptime metrics are tracked and returned	GET `/health/uptime`
AC-17.8	Health insights provide actionable recommendations	GET `/health/insights`

Test Plan Refs: 12.1.1–12.4.4, 25.20

Automated Tests: ✅ Full coverage

18. Metrics & Observability

Description

Prometheus metrics, OpenTelemetry tracing, Sentry error tracking, Arize AI tracing, Pyroscope profiling.

Acceptance Criteria

#	Criterion	Verification
AC-18.1	`GET /metrics` returns valid Prometheus text format	Parse response
AC-18.2	Parsed metrics include p50, p95, p99 latency percentiles	GET `/api/metrics/parsed`
AC-18.3	Real-time stats update within 60 seconds of new requests	GET `/api/monitoring/stats/realtime`
AC-18.4	Error rates are tracked per provider and per model	GET `/api/monitoring/error-rates`
AC-18.5	Anomaly detection flags unusual patterns	GET `/api/monitoring/anomalies`
AC-18.6	OpenTelemetry traces are initialized and exportable	GET `/api/instrumentation/health`

Test Plan Refs: 13.1–13.12

Automated Tests: ✅ Full coverage

19. Caching System

Description

Multi-layer caching: semantic cache, exact-match response cache, external cache (Butter.dev), auth cache, catalog cache (L1/L2), DB query cache, health cache, local memory fallback. Every cache degrades gracefully.
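Stampede protection (AC-19.6) is commonly implemented as a per-key lock with a re-check after acquiring it, so only one caller rebuilds an expired entry while the others wait and then read the result. A minimal single-process sketch with hypothetical names; across several gateway instances the lock would have to be distributed (for example through Redis), which is not shown:

```python
import threading
import time
from typing import Any, Callable, Dict, Tuple


class SingleFlightCache:
    """TTL cache where concurrent misses on one key trigger a single rebuild."""

    def __init__(self, ttl_s: float, clock: Callable[[], float] = time.monotonic):
        self.ttl_s = ttl_s
        self._clock = clock
        self._data: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)
        self._locks: Dict[str, threading.Lock] = {}
        self._guard = threading.Lock()  # protects the _locks dict itself

    def _lock_for(self, key: str) -> threading.Lock:
        with self._guard:
            return self._locks.setdefault(key, threading.Lock())

    def get(self, key: str, rebuild: Callable[[], Any]) -> Any:
        entry = self._data.get(key)
        if entry and entry[0] > self._clock():
            return entry[1]  # fresh hit, no locking
        with self._lock_for(key):
            # Re-check: another thread may have rebuilt the entry while we waited.
            entry = self._data.get(key)
            if entry and entry[0] > self._clock():
                return entry[1]
            value = rebuild()
            self._data[key] = (self._clock() + self.ttl_s, value)
            return value

    def clear(self) -> None:
        self._data.clear()
```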

Acceptance Criteria

#	Criterion	Verification
AC-19.1	Catalog endpoint responds in sub-100ms on cache hit	Time `GET /v1/models` on second request
AC-19.2	Auth cache reduces lookup latency from 50-150ms to 1-5ms on subsequent requests	Compare first vs. second auth timing
AC-19.3	When Redis is down, local memory cache activates — no requests are blocked	Stop Redis, verify normal operation
AC-19.4	Cache invalidation clears all layers	POST `/admin/cache/clear`, verify fresh data
AC-19.5	Cache TTLs are respected (auth: 5-10min, catalog L1: 5min, catalog L2: 15-30min)	Verify expiration behavior
AC-19.6	Stampede protection prevents multiple simultaneous cache rebuilds	Concurrent requests to cold cache

Test Plan Refs: 25.11, 25.12

Automated Tests: ✅ Full coverage

20. Image Generation

Description

POST /v1/images/generations generates images via provider routing. Credits deducted per generation.

Acceptance Criteria

#	Criterion	Verification
AC-20.1	Image generation returns 200 with image data or URL	POST `/v1/images/generations`
AC-20.2	Credits are deducted based on image generation pricing	Compare balance before/after
AC-20.3	User with 0 credits receives 402	POST with 0-credit user

Test Plan Refs: 18.1–18.3

Automated Tests: ✅ Full coverage

21. Audio Transcription

Description

POST /v1/audio/transcriptions accepts audio files (upload or base64) and returns transcription text.

Acceptance Criteria

#	Criterion	Verification
AC-21.1	File upload transcription returns 200 with text	POST with audio file
AC-21.2	Base64 audio transcription returns 200 with text	POST `/v1/audio/transcriptions/base64`
AC-21.3	Unsupported audio format returns appropriate error	POST with invalid format

Test Plan Refs: 19.1–19.2

Automated Tests: ✅ Full coverage

22. Server-Side Tools

Description

Built-in tools (web search, text-to-speech) are exposed via /v1/tools. Tool execution is protected against SSRF.
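A minimal sketch of an SSRF check for AC-22.5 using the standard `ipaddress` module. The function names are hypothetical. It resolves the hostname and rejects the URL if any resolved address is non-public; a production version would also connect to the address it validated, since resolving again at connect time leaves a DNS-rebinding gap:

```python
import ipaddress
import socket
from typing import List
from urllib.parse import urlparse


def is_forbidden_address(ip: str) -> bool:
    """True for loopback, private, link-local, and other non-public addresses."""
    addr = ipaddress.ip_address(ip)
    return (addr.is_private or addr.is_loopback or addr.is_link_local
            or addr.is_reserved or addr.is_multicast or addr.is_unspecified)


def check_outbound_url(url: str) -> None:
    """Raise ValueError if `url` is not http(s) or points at an internal address."""
    parsed = urlparse(url)
    host = parsed.hostname
    if parsed.scheme not in ("http", "https") or not host:
        raise ValueError("only absolute http(s) URLs are allowed")
    try:
        ipaddress.ip_address(host)
        candidates: List[str] = [host]  # IP literal: no DNS lookup needed
    except ValueError:
        # Check every address the name resolves to, not just the first one.
        candidates = [info[4][0] for info in socket.getaddrinfo(host, None)]
    for ip in candidates:
        if is_forbidden_address(ip):
            raise ValueError(f"blocked internal address: {ip}")
```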

Acceptance Criteria

#	Criterion	Verification
AC-22.1	`GET /v1/tools` returns available tools (web_search, text_to_speech)	Inspect response
AC-22.2	Tool definitions are in OpenAI function-calling format	GET `/v1/tools/definitions`
AC-22.3	Nonexistent tool returns 404	GET `/v1/tools/fake_tool`
AC-22.4	Web search execution returns search results	POST `/v1/tools/execute`
AC-22.5	SSRF protection prevents requests to internal/private IP ranges	Attempt internal URL

Test Plan Refs: 20.1–20.6

Automated Tests: ✅ Full coverage

23. IP Allowlist

Description

Admin-managed IP allowlists restrict API key usage to specific IP addresses or ranges.
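The allowlist check behind AC-23.2 and AC-23.3 reduces to a membership test over address ranges. A minimal sketch; `ip_allowed` is a hypothetical name, and treating an empty allowlist as "no restriction" is an assumption:

```python
import ipaddress
from typing import List


def ip_allowed(client_ip: str, allowlist: List[str]) -> bool:
    """True if client_ip matches any entry; entries may be single IPs or CIDR ranges."""
    if not allowlist:
        return True  # assumption: a key without entries is unrestricted
    addr = ipaddress.ip_address(client_ip)
    # strict=False tolerates entries like "10.0.0.5/24" that have host bits set;
    # IPv4 addresses never match IPv6 networks (and vice versa).
    return any(addr in ipaddress.ip_network(entry, strict=False) for entry in allowlist)
```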

Acceptance Criteria

#	Criterion	Verification
AC-23.1	Admin can create, list, update, and delete IP allowlist entries	CRUD operations
AC-23.2	IP check endpoint correctly identifies allowed vs. blocked IPs	POST `/api/admin/ip-whitelist/check`
AC-23.3	API key with IP allowlist rejects requests from non-allowed IPs	Use key from blocked IP

Test Plan Refs: 21.1–21.5

Automated Tests: ✅ Full coverage

24. Partner Trials

Description

Partner-specific trial configurations with custom credit amounts, durations, and daily limits.

Acceptance Criteria

#	Criterion	Verification
AC-24.1	Partner config is publicly accessible	GET `/partner-trials/config/{code}`
AC-24.2	Partner code validation always returns 200 (valid or invalid indicated in body)	GET `/partner-trials/check/{code}`
AC-24.3	Starting a partner trial applies partner-specific credits and limits	POST `/partner-trials/start`
AC-24.4	Partner trial daily limit is enforced	Check enforcement after exceeding

Test Plan Refs: 22.1–22.5

Automated Tests: ✅ Full coverage

25. Notifications

Description

User notification preferences, usage reports, and test notifications via email (Resend).

Acceptance Criteria

#	Criterion	Verification
AC-25.1	User can retrieve notification preferences	GET `/user/notifications/preferences`
AC-25.2	Usage report can be triggered on demand	POST `/user/notifications/send-usage-report`
AC-25.3	Test notification sends successfully	POST `/user/notifications/test`

Test Plan Refs: 23.1–23.3

Automated Tests: ⚠️ Partial (delivery verification missing)

26. Admin Operations

Description

80+ admin endpoints for user management, credit operations, system monitoring, cache management, model sync, RBAC, trial analytics, downtime tracking, and coupon management.

Acceptance Criteria

#	Criterion	Verification
AC-26.1	Non-admin users receive 403 on all admin endpoints	Use `$USER_KEY` on admin endpoint
AC-26.2	Admin can list, search, and view user details	GET `/admin/users`
AC-26.3	Admin credit grants respect the per-transaction cap and the 24-hour limit	Exceed limits
AC-26.4	Admin can assign plans to users	POST `/admin/assign-plan`
AC-26.5	System monitor returns user counts, credit totals, and API usage	GET `/admin/monitor`
AC-26.6	Cache status, refresh, and clear operations work	GET/POST cache endpoints
AC-26.7	Model sync can be triggered incrementally and fully	POST sync endpoints
AC-26.8	Role updates are logged in the audit trail	POST `/admin/roles/update`, check audit log
AC-26.9	Downtime incidents can be listed, viewed, and resolved	CRUD on downtime endpoints
AC-26.10	Coupon analytics show redemption rates and remaining uses	GET `/admin/coupons/{id}/analytics`

Test Plan Refs: 24.1–24.10

Automated Tests: ✅ Full coverage

27. Security

Description

Fernet encryption for API keys, HMAC-SHA256 hashing, injection prevention (SQL, XSS, command, path traversal, LDAP, header, JSON), audit logging.

Acceptance Criteria

#	Criterion	Verification
AC-27.1	API keys are Fernet-encrypted in the database — raw values are ciphertext	Query DB directly
AC-27.2	API key lookup uses HMAC hash, not brute-force decryption	Verify code path
AC-27.3	SQL injection attempts in user inputs are sanitized/rejected	Send `'; DROP TABLE users; --`
AC-27.4	XSS payloads in user inputs are sanitized/rejected	Send `<script>alert(1)</script>`
AC-27.5	Command injection attempts are blocked	Send `; rm -rf /` in input fields
AC-27.6	Path traversal attempts are blocked	Send `../../etc/passwd`
AC-27.7	Admin security violations are logged in the audit trail	Attempt unauthorized admin access, check log
AC-27.8	Error messages never expose stack traces, internal paths, or sensitive data	Trigger errors, inspect responses

Test Plan Refs: 25.17

Automated Tests: ✅ Full coverage

28. Error Monitoring

Description

Autonomous error monitoring with dashboard, recent/critical/fixable error classification, and pattern detection.

Acceptance Criteria

#	Criterion	Verification
AC-28.1	Monitor status returns current operational state	GET `/error-monitor/autonomous/status`
AC-28.2	Dashboard provides overview of error landscape	GET `/error-monitor/dashboard`
AC-28.3	Recent errors are retrievable and sorted by recency	GET `/error-monitor/errors/recent`
AC-28.4	Critical errors are flagged separately	GET `/error-monitor/errors/critical`
AC-28.5	Error patterns detect recurring issues	GET `/error-monitor/errors/patterns`

Test Plan Refs: 17.1–17.7

Automated Tests: ✅ Full coverage

29. Guardrails (Target Architecture)

Description

Input guardrails: PII detection, prompt injection defense, topic restrictions, content moderation. Output guardrails: content filtering, structured output validation, hallucination flags. This is aspirational — described in the Conceptual Model as future capability.
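Because guardrails are not yet implemented, the following only illustrates what the PII scan in AC-29.1 could look like; it is not a design. The regular expressions are deliberately simple and US-centric, and will miss or over-match real-world data. They are applied in a fixed order so that long digit runs are labeled as card numbers before the phone pattern sees them:

```python
import re
from typing import List, Tuple

# Illustrative patterns only, applied in this order.
PII_PATTERNS = [
    ("credit_card", re.compile(r"\b(?:\d[ -]?){13,19}\b")),
    ("ssn", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("phone", re.compile(r"\b\d{3}[ .-]?\d{3}[ .-]?\d{4}\b")),
    ("email", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
]


def scan_and_redact(prompt: str) -> Tuple[str, List[str]]:
    """Return the redacted prompt and the PII categories that were found.
    A 'block' policy would reject the request whenever the list is non-empty."""
    found: List[str] = []
    for label, pattern in PII_PATTERNS:
        prompt, count = pattern.subn(f"[REDACTED_{label.upper()}]", prompt)
        if count:
            found.append(label)
    return prompt, found
```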

Acceptance Criteria

#	Criterion	Verification
AC-29.1	PII detection scans prompts for phone numbers, SSNs, emails, and credit card numbers, and optionally redacts them or blocks the request	Send prompt with PII
AC-29.2	Prompt injection patterns that attempt to override system prompts are detected and blocked	Send known injection pattern
AC-29.3	Per-API-key topic restrictions limit model responses to configured domains	Configure restriction, test out-of-domain prompt
AC-29.4	Content moderation blocks harmful or policy-violating inputs before reaching providers	Send harmful content
AC-29.5	Output content filtering scans responses for policy violations before returning to customer	Trigger policy-violating response
AC-29.6	Structured output validation confirms JSON schema conformance when requested	Request JSON schema output
AC-29.7	Provider-side safety metadata (refusals, safety filter triggers) is surfaced in standardized format	Trigger safety filter on provider

Test Plan Refs: None (not yet implemented)

Automated Tests: ❌ No coverage (feature not yet implemented)

Summary Matrix

#	Feature	Criteria Count	Automated	Status
1	Authentication & Authorization	11	✅	Implemented
2	Chat — OpenAI-Compatible	13	✅	Implemented
3	Chat — Anthropic-Compatible	4	✅	Implemented
4	Model Resolution & Aliasing	5	✅	Implemented
5	Provider Failover	7	✅	Implemented
6	Circuit Breakers	8	✅	Implemented
7	Intelligent Routing	10	✅	Implemented
8	Rate Limiting (3 Layers)	10	✅	Implemented
9	Credit System	10	✅	Implemented
10	Model Catalog	10	✅	Implemented
11	Plans & Trials	8	✅	Implemented
12	Payments (Stripe)	6	✅	Implemented
13	Coupons	6	✅	Implemented
14	Referrals	5	✅	Implemented
15	Chat History & Sessions	7	✅	Implemented
16	API Key Management	7	✅	Implemented
17	Health & Monitoring	8	✅	Implemented
18	Metrics & Observability	6	✅	Implemented
19	Caching System	6	✅	Implemented
20	Image Generation	3	✅	Implemented
21	Audio Transcription	3	✅	Implemented
22	Server-Side Tools	5	✅	Implemented
23	IP Allowlist	3	✅	Implemented
24	Partner Trials	4	✅	Implemented
25	Notifications	3	⚠️	Implemented (partial test coverage)
26	Admin Operations	10	✅	Implemented
27	Security	8	✅	Implemented
28	Error Monitoring	5	✅	Implemented
29	Guardrails	7	❌	Not implemented
	TOTAL	198
