API Mappings

API Mappings - Gatewayz Backend

Auto-generated deep-dive documentation for all API endpoints in gatewayz-backend. Total endpoints documented: 450. Generated: 2026-03-04.

Admin (46 endpoints)
Analytics (5 endpoints)
Authentication (5 endpoints)
Chat & Messaging (20 endpoints)
Circuit Breakers (4 endpoints)
Code Router (5 endpoints)
Coupons (3 endpoints)
Credits (6 endpoints)
Diagnostics (2 endpoints)
Error Monitoring (12 endpoints)
General Router (4 endpoints)
Health & Monitoring (30 endpoints)
Metrics & Observability (6 endpoints)
Models & Catalog (23 endpoints)
Other (19 endpoints)
Status (2 endpoints)
Users (8 endpoints)

Admin

46 endpoints

Issue: #1600

Deep-Dive API Documentation: POST /admin/add_credits

Section 1: High-Level Overview

The POST /admin/add_credits endpoint allows authenticated admin users to add credits to any user's account, identified by their API key. It enforces two safety limits: a per-transaction cap (ADMIN_MAX_CREDIT_GRANT env var) and a 24-hour rolling daily limit (ADMIN_DAILY_GRANT_LIMIT env var). On success, it increments users.purchased_credits, logs the transaction in credit_transactions, and invalidates the target user's in-process cache entries.

Section 2: Low-Level Deep-Dive

2.1 Requirements and Pipeline

Authentication: Admin role required. Uses require_admin dependency which chains: get_current_user -> get_api_key -> validate_api_key_security -> get_user -> validate_trial_expiration -> check user.role == "admin" OR user.is_admin == True.

Admin Auth Chain:

get_api_key(): Bearer token extraction, validate_api_key_security, audit log
get_current_user(): get_user with 5-min cache, validate_trial_expiration
require_admin(): checks user.get("is_admin", False) OR user.get("role") == "admin"; if not: audit_logger.log_security_violation("UNAUTHORIZED_ADMIN_ACCESS"); raises 403

Middleware Pipeline: SecurityMiddleware (50% Sentry sampling for /api/admin paths) -> RequestTimeoutMiddleware (55s) -> ConcurrencyMiddleware -> RequestIDMiddleware -> TraceContextMiddleware -> AutoSentryMiddleware -> CORSMiddleware -> ObservabilityMiddleware -> SelectiveGZipMiddleware -> StagingSecurityMiddleware -> DeprecationMiddleware

Request Schema (AddCreditsRequest, src/schemas/payments.py):

api_key: str (required) - Target user's API key
credits: float (required) - Amount to add (must be positive, validated by add_credits_to_user)
reason: str (required, min 10 chars enforced by Pydantic schema) - Reason for grant

Response Schema:

status: "success"
message: str - "Added {credits} credits to user {username}"
new_balance: float - User's balance after the credit addition
user_id: int
reason: str

Safety Controls:

Per-transaction cap: req.credits > Config.ADMIN_MAX_CREDIT_GRANT -> 400
Daily rolling limit: asyncio.to_thread(get_admin_daily_grant_total, admin_id) checks credit_transactions sum for past 24 hours; if daily_total + req.credits > Config.ADMIN_DAILY_GRANT_LIMIT -> 400
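The two safety controls can be expressed as a pair of small checks. This is an illustrative sketch only. `daily_grant_total()`, `check_grant_limits()` and `GrantLimitError` are hypothetical names. The limits are passed as plain arguments instead of being read from `Config`. The transaction rows are in-memory dicts standing in for the `credit_transactions` table that `get_admin_daily_grant_total()` actually queries.

```python
from datetime import datetime, timedelta, timezone


class GrantLimitError(ValueError):
    """Raised when a grant would exceed a safety limit (the route returns HTTP 400)."""


def daily_grant_total(transactions, admin_id, now):
    """Sum this admin's admin_credit grants in the rolling 24-hour window ending at `now`."""
    window_start = now - timedelta(hours=24)
    return sum(
        tx["amount"]
        for tx in transactions
        if tx["created_by"] == f"admin:{admin_id}"
        and tx["transaction_type"] == "admin_credit"
        and tx["created_at"] >= window_start
    )


def check_grant_limits(credits, daily_total, max_single_grant, daily_limit):
    """Apply the per-transaction cap first, then the rolling daily limit."""
    if credits > max_single_grant:
        raise GrantLimitError(
            f"{credits} exceeds the per-transaction cap of {max_single_grant}"
        )
    if daily_total + credits > daily_limit:
        raise GrantLimitError(
            f"{daily_total} already granted in 24h; adding {credits} exceeds {daily_limit}"
        )


# Example: admin 7 granted 300 credits an hour ago and 500 thirty hours ago.
now = datetime(2026, 3, 4, 12, 0, tzinfo=timezone.utc)
transactions = [
    {"amount": 300, "created_by": "admin:7", "transaction_type": "admin_credit",
     "created_at": now - timedelta(hours=1)},
    {"amount": 500, "created_by": "admin:7", "transaction_type": "admin_credit",
     "created_at": now - timedelta(hours=30)},  # outside the 24h window
    {"amount": 100, "created_by": "admin:8", "transaction_type": "admin_credit",
     "created_at": now - timedelta(hours=2)},  # granted by a different admin
]
total = daily_grant_total(transactions, admin_id=7, now=now)
check_grant_limits(200, total, max_single_grant=1000, daily_limit=1000)  # passes
```

The cheap in-memory cap check runs before the database query for the 24-hour total, which matches the order in the flow diagram. An oversized single grant is therefore rejected without touching credit_transactions.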

Error Codes:

400: credits > ADMIN_MAX_CREDIT_GRANT; OR daily_total + credits > ADMIN_DAILY_GRANT_LIMIT; OR ValueError from add_credits_to_user (credits <= 0, user not found)
401: Invalid/missing auth
403: Not admin role
404: Target user not found (get_user returns None)
500: Unexpected database error

2.2 Flow Diagram

Request -> require_admin dep (full auth chain) -> extract admin_id, max_single_grant, daily_limit -> check req.credits > max_single_grant -> 400 if exceeded -> asyncio.to_thread(get_admin_daily_grant_total, admin_id) -> check daily_total + req.credits > daily_limit -> 400 if exceeded -> asyncio.to_thread(get_user, req.api_key) -> 404 if not found -> description = req.reason -> asyncio.to_thread(add_credits_to_user, user_id, credits, "admin_credit", description, metadata={reason, admin_user_id, admin_username}, created_by="admin:{admin_id}") -> asyncio.to_thread(get_user, req.api_key) again to get updated balance -> log action -> return response

2.3 Complete Dependency Map

Component	Location	Details
admin_add_credits() handler	src/routes/admin.py:105	Route handler
require_admin dependency	src/security/deps.py:220	Checks role="admin" or is_admin=True; logs violation if not admin
Config.ADMIN_MAX_CREDIT_GRANT	src/config/config.py	Max credits per single grant (env: ADMIN_MAX_CREDIT_GRANT)
Config.ADMIN_DAILY_GRANT_LIMIT	src/config/config.py	Max credits per admin per 24 hours (env: ADMIN_DAILY_GRANT_LIMIT)
get_admin_daily_grant_total()	src/db/credit_transactions.py	SELECT SUM(amount) FROM credit_transactions WHERE created_by LIKE 'admin:{admin_id}' AND created_at >= now()-24h AND transaction_type='admin_credit'
get_user()	src/db/users.py:407	With 5-min in-memory cache; looks up by api_key
add_credits_to_user()	src/db/users.py:505	Fetches current balances, updates purchased_credits, logs transaction
Supabase SELECT users	src/db/users.py:537	SELECT subscription_allowance, purchased_credits FROM users WHERE id=user_id
Supabase UPDATE users	src/db/users.py:582	UPDATE users SET purchased_credits=purchased_after, updated_at=now WHERE id=user_id
log_credit_transaction()	src/db/credit_transactions.py:68	INSERT INTO credit_transactions {user_id, amount, transaction_type="admin_credit", description, balance_before, balance_after, metadata, created_by="admin:{admin_id}"}
invalidate_user_cache_by_id()	src/db/users.py:48	Scans _user_cache dict and removes entries matching user_id
AddCreditsRequest schema	src/schemas/payments.py	api_key:str, credits:float, reason:str (min 10 chars)

Supabase Operations:

SELECT SUM from credit_transactions (daily grant total check)
SELECT from api_keys_new + users (get_user lookup for target user)
SELECT subscription_allowance, purchased_credits from users (get current balance)
UPDATE users SET purchased_credits=new_value, updated_at=now (credit addition)
INSERT into credit_transactions (transaction log with metadata including reason, admin_user_id, admin_username)
SELECT from api_keys_new + users (get_user again for updated balance in response)

2.4 Side Effects

DB READ: get_admin_daily_grant_total reads credit_transactions
DB READ: get_user reads api_keys_new and users tables (x2 — before and after credit addition)
DB WRITE: UPDATE users.purchased_credits and users.updated_at
DB WRITE: INSERT into credit_transactions with full audit trail (amount, balances, reason, admin identity)
Cache INVALIDATION: invalidate_user_cache_by_id() removes all _user_cache entries for the target user (in-process only, not Redis)
Logging: logger.info with admin username, credits added, target username, reason
No email notifications (only registration triggers welcome email)
Audit log: audit_logger.log_api_key_usage() during auth chain; audit_logger.log_security_violation() if non-admin attempts
ObservabilityMiddleware: Records http_requests_total{method="POST", endpoint="/admin/add_credits"} post-response
Sentry sampling: 50% (admin endpoint)

Issue: #1601

Deep-Dive API Documentation: GET /admin/balance

Section 1: High-Level Overview

The GET /admin/balance endpoint returns the credit balances and account timestamps for every user in the system. It is an admin-only endpoint consumed by internal tooling and the admin dashboard to get a snapshot of all user balances for financial reporting and debugging. Because it fetches every user record without pagination, it is intended for relatively small datasets and internal use only.

Section 2: Low-Level Deep-Dive

2.1 Requirements and Pipeline

Aspect	Detail
HTTP Method	GET
Path	`/admin/balance`
Authentication	`require_admin` dependency: valid Bearer API key for a user with `role=admin` or `is_admin=True`
Rate Limiting	SecurityMiddleware IP controls; 50% Sentry sampling for admin endpoints
Request Schema	No body or query parameters
Response Schema	`{ status, total_users, users: [{ api_key, credits, created_at, updated_at }] }`
Error Codes	401 (not authenticated), 403 (not admin), 500 (internal)
Tags	`admin`

Auth chain: require_admin → get_current_user → get_api_key → validate_api_key_security → get_user → validate_trial_expiration → role check

Request lifecycle:

require_admin validates caller's Bearer token and confirms role=admin.
get_all_users() called via asyncio.to_thread — fetches all users from Supabase users table.
Each user's api_key, credits, created_at, updated_at extracted into response list.
Returns { status, total_users, users }.
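The request lifecycle above comes down to one offloaded blocking call plus a projection of each row onto four fields. In this sketch, `fetch_users` stands in for `get_all_users()` and `project_balances()` is a hypothetical helper. The literal `"success"` status value is an assumption, because the response schema lists `status` without giving its value.

```python
import asyncio


def project_balances(users):
    """Keep only the four fields the endpoint exposes for each user."""
    return [
        {
            "api_key": u.get("api_key"),
            "credits": u.get("credits"),
            "created_at": u.get("created_at"),
            "updated_at": u.get("updated_at"),
        }
        for u in users
    ]


async def get_all_balances(fetch_users):
    # fetch_users is a blocking callable (stand-in for get_all_users());
    # asyncio.to_thread keeps it off the event loop.
    users = await asyncio.to_thread(fetch_users)
    rows = project_balances(users)
    return {"status": "success", "total_users": len(rows), "users": rows}


# Example with an in-memory stand-in for the users table.
sample = [{"api_key": "key_1", "credits": 12.5, "created_at": "2026-01-01T00:00:00Z",
           "updated_at": "2026-03-01T00:00:00Z", "email": "user@example.com"}]
response = asyncio.run(get_all_balances(lambda: sample))
```

Projecting explicit fields keeps any extra columns returned by the query (such as `email` in the example) out of the response.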

2.2 Mermaid Diagram

sequenceDiagram
    participant Admin
    participant SEC as SecurityMiddleware
    participant AUTH as require_admin
    participant ROUTE as admin_get_all_balances()
    participant THREAD as asyncio.to_thread
    participant DB as db/users.py get_all_users()
    participant SB as Supabase (users table)

    Admin->>SEC: GET /admin/balance
    SEC->>AUTH: pass (IP OK)
    AUTH->>AUTH: validate API key + role=admin check
    AUTH->>ROUTE: admin_user dict
    ROUTE->>THREAD: get_all_users()
    THREAD->>DB: get_all_users()
    DB->>SB: SELECT api_key, credits, created_at, updated_at FROM users
    SB-->>DB: all user rows
    DB-->>THREAD: list of user dicts
    THREAD-->>ROUTE: users list
    ROUTE-->>Admin: 200 { status, total_users, users: [{ api_key, credits, ... }] }

2.3 Complete Dependency Map

Layer	Name	Purpose
Route file	`src/routes/admin.py`	Route definition and handler
DB module	`src/db/users.py` — `get_all_users()`	Fetches all user records from Supabase
Security dep	`src/security/deps.py` — `require_admin`	Admin role enforcement
Security	`src/security/security.py` — `audit_logger`	Audit logging on key usage and violations
Config	`src/config/config.py`	`IS_DEVELOPMENT`, Sentry, etc.
Middleware	`src/middleware/security_middleware.py`	IP-level rate limiting
Middleware	`src/middleware/observability_middleware.py`	Prometheus metrics
Stdlib	`asyncio.to_thread`	Offloads synchronous DB call to threadpool
Database	Supabase `users` table	Source of all user balance data

2.4 Side Effects

Side Effect	Detail
No DB writes	Read-only query
Audit log	`audit_logger.log_api_key_usage()` records admin endpoint access
No cache	No cache reads or writes; always fetches fresh from Supabase
No notifications	No emails or events emitted
Scale concern	Returns all users in a single response — may be slow or memory-intensive at large scale

Issue: #1602

Deep-Dive API Documentation: GET /admin/monitor

Section 1: High-Level Overview

The GET /admin/monitor endpoint returns a comprehensive system monitoring snapshot for admins. The snapshot includes user counts, credit totals, API usage metrics for today and the past 30 days, and per-user activity breakdowns. Inside a worker thread, the endpoint runs a series of sequential Supabase queries against the users, activity_log, usage_records and api_keys_new tables. It merges results from the modern (activity_log) and legacy (usage_records) data sources with deduplication, then returns the aggregated monitoring payload with a timestamp.

Section 2: Low-Level Deep-Dive

2.1 Requirements and Pipeline

Authentication: Admin role required. Uses require_admin dependency: get_current_user -> get_api_key -> validate_api_key_security -> get_user (5-min cache) -> validate_trial_expiration -> check role == "admin".

Middleware Pipeline: SecurityMiddleware (50% Sentry sampling) -> RequestTimeoutMiddleware (55s) -> ConcurrencyMiddleware -> RequestIDMiddleware -> TraceContextMiddleware -> AutoSentryMiddleware -> CORSMiddleware -> ObservabilityMiddleware -> SelectiveGZipMiddleware -> StagingSecurityMiddleware -> DeprecationMiddleware

Request Schema: No body, no query parameters.

Response Schema:

status: "success" | (with "warning" if data has errors)
timestamp: ISO datetime string
data: object from get_admin_monitor_data() containing:
- total_users: int (exact server-side COUNT(*))
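- total_credits: float (sum of credits across the rows returned by Query 2, which is capped at 10,000 users)
- api_calls_today: int (activity_log rows from the last 24h plus deduplicated usage_records rows; each source is capped at 10,000 rows)
- api_calls_month: int (activity_log rows from the last 30 days plus deduplicated usage_records rows; each source is capped at 50,000 rows)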
- tokens_today: int
- tokens_month: int
- revenue_today: float
- revenue_month: float
- users_today: int (new users in last 24h)
- recent_usage: list of recent activity records
- user_metrics: dict keyed by api_key with per-user stats
If data contains "error" key: response includes "warning" field

Error Codes:

401: Invalid/missing auth
403: Not admin role
500: get_admin_monitor_data() returns None (or another falsy value), or an exception is raised

2.2 Flow Diagram

Request -> require_admin dep -> asyncio.to_thread(get_admin_monitor_data) -> multiple Supabase queries in sequence -> aggregate and merge activity_log + usage_records -> return data dict -> handler checks "error" in data -> if error: return with warning field -> else: return status="success" with data -> HTTPException 500 if monitor_data is None/falsy or on exception

2.3 Complete Dependency Map

Component	Location	Details
admin_monitor() handler	src/routes/admin.py:205	Route handler
require_admin dependency	src/security/deps.py:220	Admin role check
asyncio.to_thread()	stdlib	Runs blocking get_admin_monitor_data() in thread pool
get_admin_monitor_data()	src/db/users.py:1481	Orchestrates all DB queries
get_supabase_client()	src/config/supabase_config.py	Supabase client
Query 1	users table	SELECT id WITH count="exact" -> server-side COUNT(*) for total_users
Query 2	users table	SELECT id, credits, api_key LIMIT 10000 -> for credit totals and user mapping
Query 3	activity_log table	SELECT id WITH count="exact" -> total_activity_count
Query 4	activity_log table	SELECT * WHERE timestamp >= now-24h ORDER BY timestamp DESC LIMIT 10000 -> today's logs
Query 5	activity_log table	SELECT * WHERE timestamp >= now-30days ORDER BY timestamp DESC LIMIT 50000 -> month's logs
Query 6	usage_records table	SELECT * WHERE timestamp >= now-24h LIMIT 10000 -> legacy today
Query 7	usage_records table	SELECT * WHERE timestamp >= now-30days LIMIT 50000 -> legacy month
Query 8	api_keys_new table	SELECT user_id, api_key, is_primary LIMIT 10000 -> user_id<->api_key mapping
make_composite_key()	src/db/users.py:1669	Creates dedup key: "{user_id}
sanitize_for_logging()	src/utils/security_validators.py	Used throughout for safe logging
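The activity_log/usage_records merge can be sketched as a seen-set deduplication. In this sketch activity_log rows take precedence, and legacy rows are appended only when their key is new. The table above does not fully list the fields that make_composite_key() combines. The `(user_id, timestamp, model)` tuple below is therefore an assumption, and so is the idea that legacy rows carry only an api_key resolved through the api_keys_new mapping (Query 8). `composite_key()` and `merge_activity()` are hypothetical names.

```python
def composite_key(record):
    # Hypothetical dedup key; the real make_composite_key() format may differ.
    return (record.get("user_id"), record.get("timestamp"), record.get("model"))


def merge_activity(activity_rows, legacy_rows, key_to_user):
    """Return all activity_log rows plus usage_records rows not already present."""
    seen = {composite_key(r) for r in activity_rows}
    merged = list(activity_rows)
    for row in legacy_rows:
        # Assumption: legacy rows may carry only an api_key, so resolve it to a
        # user_id via the api_keys_new mapping before comparing keys.
        if row.get("user_id") is None and row.get("api_key") in key_to_user:
            row = {**row, "user_id": key_to_user[row["api_key"]]}
        key = composite_key(row)
        if key not in seen:
            seen.add(key)
            merged.append(row)
    return merged


activity = [{"user_id": 1, "timestamp": "2026-03-04T10:00:00Z", "model": "m1"}]
legacy = [
    {"api_key": "key_a", "timestamp": "2026-03-04T10:00:00Z", "model": "m1"},  # duplicate
    {"api_key": "key_b", "timestamp": "2026-03-04T11:00:00Z", "model": "m2"},  # new
]
merged = merge_activity(activity, legacy, {"key_a": 1, "key_b": 2})
```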

All Supabase Queries in get_admin_monitor_data():

users.select("id", count="exact").execute() — server COUNT(*)
users.select("id, credits, api_key").limit(10000).execute() — credit data
activity_log.select("id", count="exact").execute() — total count
activity_log.select("*").gte("timestamp", day_ago_iso).order("timestamp", desc=True).limit(10000).execute() — today
activity_log.select("*").gte("timestamp", month_ago_iso).order("timestamp", desc=True).limit(50000).execute() — month
usage_records.select("*").gte("timestamp", day_ago_iso).limit(10000).execute() — legacy today
usage_records.select("*").gte("timestamp", month_ago_iso).limit(50000).execute() — legacy month
api_keys_new.select("user_id, api_key, is_primary").limit(10000).execute() — key mapping

Error Handling: Each query is wrapped in its own try/except; on failure, empty lists/zeros are used and processing continues.
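The per-query error handling described above can be captured in a small wrapper. `run_query()` is a hypothetical helper, not the function used in `src/db/users.py`. It runs a zero-argument callable and substitutes a default value if any exception is raised.

```python
import logging

logger = logging.getLogger("admin_monitor_sketch")


def run_query(label, query_fn, default):
    """Run one query; on any exception, log it and return `default` so later queries still run."""
    try:
        return query_fn()
    except Exception:
        logger.exception("monitor query %r failed; using fallback value", label)
        return default


# Each query gets its own fallback, matching its expected result shape.
total_users = run_query("users_count", lambda: 42, 0)
today_logs = run_query("activity_today", lambda: 1 / 0, [])  # raises, so falls back to []
```

The trade-off of this design is that, in the aggregated totals, a failed query looks exactly like an empty table. Only the log line records that a fallback was used.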

2.4 Side Effects

DB READ x8: Multiple SELECT queries across users, activity_log, usage_records, api_keys_new tables
Cache READ: require_admin chain reads _user_cache for admin user
Audit log WRITE: audit_logger.log_api_key_usage() during auth chain
No DB writes, no Redis operations, no cache invalidations, no notifications
ObservabilityMiddleware: Records http_requests_total{method="GET", endpoint="/admin/monitor"} post-response
Sentry: 50% sampling rate for admin endpoints
Performance note: This endpoint issues up to 8 sequential Supabase queries and can take 500ms-2s depending on data volume

Issue: #1603

Deep-Dive API Documentation: GET /admin/cache-status

Section 1: High-Level Overview

The GET /admin/cache-status endpoint returns metadata about the in-process provider model cache, including whether it has data, how old the cache is in seconds, its configured TTL, whether it is currently valid, and how many providers are cached. This is a diagnostic endpoint for monitoring the health of the provider catalog cache.

Section 2: Low-Level Deep-Dive

2.1 Requirements and Pipeline

Authentication: Admin role required. Uses require_admin dependency: get_current_user -> get_api_key -> validate_api_key_security -> get_user (5-min cache) -> validate_trial_expiration -> check role == "admin".

Middleware Pipeline: SecurityMiddleware (50% Sentry sampling) -> RequestTimeoutMiddleware (55s) -> ConcurrencyMiddleware -> RequestIDMiddleware -> TraceContextMiddleware -> AutoSentryMiddleware -> CORSMiddleware -> ObservabilityMiddleware -> SelectiveGZipMiddleware -> StagingSecurityMiddleware -> DeprecationMiddleware

Request Schema: No body, no query parameters.

Response Schema:

status: "success"
cache_info: object containing:
- has_data: bool - whether provider_cache["data"] is not None
- cache_age_seconds: float|None - seconds since cache was populated (None if no timestamp)
- ttl_seconds: int - configured TTL (default 1800 seconds / 30 minutes)
- is_valid: bool - cache_age_seconds is not None AND cache_age_seconds < ttl_seconds
- total_cached_providers: int - len(provider_cache["data"]) or 0
timestamp: ISO datetime string

Error Codes:

401: Invalid/missing auth
403: Not admin role
500: Exception in get_provider_cache_metadata() or cache age calculation

2.2 Flow Diagram

Request -> require_admin dep -> get_provider_cache_metadata() (reads in-process _provider_cache dict) -> if timestamp exists: compute cache_age = (now - timestamp).total_seconds() else: cache_age = None -> build response with has_data, cache_age_seconds, ttl_seconds, is_valid, total_cached_providers -> return -> Exception -> log error -> raise HTTPException 500

2.3 Complete Dependency Map

Component	Location	Details
admin_cache_status() handler	src/routes/admin.py:290	Route handler
require_admin dependency	src/security/deps.py:220	Admin role check
get_provider_cache_metadata()	src/services/model_catalog_cache.py	Returns the _provider_cache dict metadata
_provider_cache	src/main.py:103 OR src/services/model_catalog_cache.py	In-process dict: {"data": None, "timestamp": None, "ttl": 1800}
datetime.now(UTC)	stdlib	Used for cache age calculation: (now - provider_cache["timestamp"]).total_seconds()

Cache Structure (provider_cache):

data: list of provider dicts or None (populated by get_cached_providers())
timestamp: datetime object or None (set when data was last fetched)
ttl: int (seconds, default 1800 = 30 minutes)

is_valid calculation: cache_age is not None AND cache_age < provider_cache.get("ttl", 1800)
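The age calculation and the `is_valid` rule can be checked in isolation. The function below is a hypothetical stand-in for the handler logic. It takes the cache dict directly rather than calling `get_provider_cache_metadata()`. It also uses `timezone.utc`, which is equivalent to `datetime.UTC` on Python 3.11+.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_TTL_SECONDS = 1800


def provider_cache_status(cache, now=None):
    """Build the cache_info object from a {"data", "timestamp", "ttl"} dict."""
    if now is None:
        now = datetime.now(timezone.utc)
    ts = cache.get("timestamp")
    age = (now - ts).total_seconds() if ts is not None else None
    ttl = cache.get("ttl", DEFAULT_TTL_SECONDS)
    data = cache.get("data")
    return {
        "has_data": data is not None,
        "cache_age_seconds": age,
        "ttl_seconds": ttl,
        "is_valid": age is not None and age < ttl,
        "total_cached_providers": len(data) if data else 0,
    }


now = datetime(2026, 3, 4, 12, 0, tzinfo=timezone.utc)
warm = {"data": [{"slug": "a"}, {"slug": "b"}], "timestamp": now - timedelta(minutes=10), "ttl": 1800}
cold = {"data": None, "timestamp": None, "ttl": 1800}
```

The stored timestamp must be timezone-aware. Subtracting a naive `datetime` from an aware one raises `TypeError`, and this endpoint would surface that error as a 500.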

Note on cache source: The _provider_cache is an in-process module-level dict in src/services/model_catalog_cache.py (or the legacy dict in src/main.py). It is populated when get_cached_providers() is called and a cache miss occurs. It is NOT stored in Redis — it is purely in-memory, meaning each process instance has its own cache.

2.4 Side Effects

In-process memory READ: Reads _provider_cache dict — no I/O, extremely fast (< 1ms)
Cache READ: require_admin chain reads _user_cache for admin user
Audit log WRITE: audit_logger.log_api_key_usage() during auth chain
No DB reads or writes, no Redis operations, no cache invalidations, no notifications
ObservabilityMiddleware: Records http_requests_total{method="GET", endpoint="/admin/cache-status"} post-response
Sentry: 50% sampling rate for admin endpoints

Issue: #1604

Deep-Dive API Documentation: GET /admin/huggingface-cache-status

Section 1: High-Level Overview

The GET /admin/huggingface-cache-status endpoint returns metadata about the in-process HuggingFace model cache, including age, validity, total count of cached models, and the list of all cached model IDs. It is a diagnostic endpoint for monitoring the HuggingFace catalog cache state without triggering a cache refresh.

Section 2: Low-Level Deep-Dive

2.1 Requirements and Pipeline

Authentication: Admin role required. Uses require_admin dependency (same chain as other admin endpoints).

Middleware Pipeline: SecurityMiddleware (50% Sentry sampling) -> RequestTimeoutMiddleware (55s) -> ConcurrencyMiddleware -> RequestIDMiddleware -> TraceContextMiddleware -> AutoSentryMiddleware -> CORSMiddleware -> ObservabilityMiddleware -> SelectiveGZipMiddleware -> StagingSecurityMiddleware -> DeprecationMiddleware

Request Schema: No body, no query parameters.

Response Schema:

huggingface_cache: object containing:
- age_seconds: float|None - seconds since cache was populated
- is_valid: bool - age_seconds is not None AND age_seconds < cache TTL (default 1800s)
- total_cached_models: int - len of hf_data list
- cached_model_ids: list[str] - list of model["id"] values for all cached model dicts that have an "id" key
timestamp: ISO datetime string

Error Codes:

401: Invalid/missing auth
403: Not admin role
500: Exception in get_gateway_cache_metadata() or processing

2.2 Flow Diagram

Request -> require_admin dep -> get_gateway_cache_metadata("huggingface") (reads in-process gateway cache for "huggingface" key) -> if timestamp: compute cache_age = (now - timestamp).total_seconds() else: cache_age = None -> hf_data = hf_cache.get("data") or [] -> cached_ids = [m["id"] for m in hf_data if isinstance(m, dict) and m.get("id")] -> build and return response -> Exception -> log error -> raise HTTPException 500

2.3 Complete Dependency Map

Component	Location	Details
admin_huggingface_cache_status() handler	src/routes/admin.py:317	Route handler
require_admin dependency	src/security/deps.py:220	Admin role check
get_gateway_cache_metadata("huggingface")	src/services/model_catalog_cache.py	Returns metadata dict for the "huggingface" gateway cache entry
Gateway cache structure	src/services/model_catalog_cache.py	In-process dict keyed by gateway name: {"data": list or None, "timestamp": datetime or None, "ttl": int}
datetime.now(UTC)	stdlib	Used to compute cache_age
hf_data filtering	src/routes/admin.py:327	List comprehension: [model.get("id") for model in hf_data if isinstance(model, dict) and model.get("id")]

get_gateway_cache_metadata("huggingface"): Returns dict with keys:

data: list of HuggingFace model dicts (each with "id" key at minimum) or None
timestamp: datetime when data was last fetched, or None
ttl: int seconds (default 1800)

In-process cache: The gateway cache is module-level in src/services/model_catalog_cache.py. Each process instance has an independent cache — no Redis involved for this status check. The cache is populated when get_cached_models() fetches HuggingFace data.

cached_model_ids extraction: Only models that are dicts AND have a truthy "id" field are included. Malformed entries (None, non-dict, or missing "id") are silently skipped.
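`total_cached_models` counts every entry in `hf_data`, while `cached_model_ids` skips malformed entries. The two values can therefore disagree when the cache holds bad entries. The sketch below applies the same filtering rule described above, under a hypothetical function name, and makes the difference visible.

```python
def hf_cache_summary(hf_cache):
    """Count all cached entries but list IDs only for well-formed model dicts."""
    hf_data = hf_cache.get("data") or []
    ids = [m.get("id") for m in hf_data if isinstance(m, dict) and m.get("id")]
    return {"total_cached_models": len(hf_data), "cached_model_ids": ids}


summary = hf_cache_summary({
    "data": [{"id": "openai/gpt-oss-120b"}, None, "not-a-dict", {"name": "no id"}, {"id": ""}],
    "timestamp": None,
    "ttl": 1800,
})
```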

2.4 Side Effects

In-process memory READ: Reads gateway cache dict — no I/O, extremely fast
Cache READ: require_admin chain reads _user_cache for admin user
Audit log WRITE: audit_logger.log_api_key_usage() during auth chain
No DB reads or writes, no Redis operations, no cache invalidations, no notifications
ObservabilityMiddleware: Records http_requests_total{method="GET", endpoint="/admin/huggingface-cache-status"} post-response
Sentry: 50% sampling rate for admin endpoints
Note: cached_model_ids can be very large (1000+ items) for a warm HuggingFace cache; response payload may be substantial

Issue: #1605

Deep-Dive API Documentation: GET /admin/debug-models

Section 1: High-Level Overview

The GET /admin/debug-models endpoint is a diagnostic tool for administrators to inspect the state of the model and provider caches. It retrieves the first 3 models and 3 providers from the in-process caches, tests provider-slug matching for the first 2 models, and returns cache metadata (timestamps, ages) for both the OpenRouter gateway cache (used as the main models cache proxy) and the provider cache. This is used to debug model catalog and provider resolution issues.

Section 2: Low-Level Deep-Dive

2.1 Requirements and Pipeline

Authentication: Admin role required. Uses require_admin dependency.

Middleware Pipeline: SecurityMiddleware (50% Sentry sampling) -> RequestTimeoutMiddleware (55s) -> ConcurrencyMiddleware -> RequestIDMiddleware -> TraceContextMiddleware -> AutoSentryMiddleware -> CORSMiddleware -> ObservabilityMiddleware -> SelectiveGZipMiddleware -> StagingSecurityMiddleware -> DeprecationMiddleware

Request Schema: No body, no query parameters.

Response Schema:

models_cache: object containing:
- total_models: int - total count of all cached models
- sample_models: list - first 3 model dicts from cache
- cache_timestamp: datetime|None
- cache_age_seconds: float|None
providers_cache: object containing:
- total_providers: int
- sample_providers: list - first 3 provider dicts from cache
- cache_timestamp: datetime|None
- cache_age_seconds: float|None
provider_matching_test: list of objects (up to 2) containing:
- model_id: str
- provider_slug: str|None (part before "/" in model_id)
- found_provider: bool
- provider_site_url: str|None
- provider_data: dict|None - full matching provider dict
timestamp: ISO datetime string

Error Codes:

401: Invalid/missing auth
403: Not admin role
500: Exception (raises HTTPException with detail message)

2.2 Flow Diagram

Request -> require_admin dep -> asyncio.to_thread(get_cached_models) -> asyncio.to_thread(get_cached_providers) -> sample_models = models[:3], sample_providers = providers[:3] -> provider matching test: for each of first 2 models, extract provider_slug = model_id.split("/")[0], linear scan providers list for slug match -> get_gateway_cache_metadata("openrouter") -> get_provider_cache_metadata() -> compute cache ages -> build and return response -> Exception -> log error -> raise HTTPException 500

2.3 Complete Dependency Map

Component	Location	Details
admin_debug_models() handler	src/routes/admin.py:418	Route handler
require_admin dependency	src/security/deps.py:220	Admin role check
asyncio.to_thread()	stdlib	Runs blocking get_cached_models() and get_cached_providers() in thread pool
get_cached_models()	src/services/models.py	Returns list of all models from in-process cache or fetches from DB/API if stale
get_cached_providers()	src/services/providers.py	Returns list of all providers from in-process cache or fetches from DB if stale
get_gateway_cache_metadata("openrouter")	src/services/model_catalog_cache.py	Returns metadata for the "openrouter" gateway cache entry (used as proxy for models cache)
get_provider_cache_metadata()	src/services/model_catalog_cache.py	Returns provider cache metadata dict
Provider slug matching	src/routes/admin.py:432-453	Linear O(n) scan: for provider in providers: if provider.get("slug") == provider_slug: break

Provider Matching Logic:

provider_slug = model_id.split("/")[0] if "/" in model_id else None
Linear scan of all providers looking for provider.get("slug") == provider_slug
First match wins (break)
O(n) complexity where n = total provider count
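The matching logic above can be sketched as a standalone function. `match_provider()` is hypothetical, and `site_url` (used to fill `provider_site_url`) is an assumed field name on provider dicts. This sketch also skips the scan when the model ID has no slug. If a scan ran unconditionally, `provider.get("slug") == None` would match the first provider that lacks a slug.

```python
def match_provider(model_id, providers):
    """Resolve the provider for a model ID of the form "slug/model-name"."""
    slug = model_id.split("/")[0] if "/" in model_id else None
    found = None
    if slug is not None:
        for provider in providers:  # linear O(n) scan; first match wins
            if provider.get("slug") == slug:
                found = provider
                break
    return {
        "model_id": model_id,
        "provider_slug": slug,
        "found_provider": found is not None,
        "provider_site_url": found.get("site_url") if found else None,  # assumed key
        "provider_data": found,
    }


providers = [
    {"slug": "openai", "site_url": "https://openai.com"},
    {"slug": "openai", "site_url": "https://duplicate.example"},
    {"name": "provider without slug"},
]
```

A linear scan is adequate here because each request tests only two models. A caller matching many models would build a `{slug: provider}` dict once instead. Building that dict with `dict.setdefault` preserves the scan's first-match-wins behavior for duplicated slugs.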

Cache metadata sources:

Models: get_gateway_cache_metadata("openrouter") — treats OpenRouter cache as proxy for main model list
Providers: get_provider_cache_metadata() — dedicated provider cache

2.4 Side Effects

In-process cache READ: get_cached_models() and get_cached_providers() may trigger DB/API fetches if caches are stale
Potential DB READ: If model or provider cache is stale, underlying fetch functions may query the database
Cache READ: require_admin chain reads _user_cache for admin user
Audit log WRITE: audit_logger.log_api_key_usage() during auth chain
No direct DB writes, no Redis operations (unless cache fetch triggers them), no notifications
ObservabilityMiddleware: Records http_requests_total{method="GET", endpoint="/admin/debug-models"} post-response
Sentry: 50% sampling rate for admin endpoints
Note: sample_models and provider_data in provider_matching_test include full raw model/provider dicts which may contain sensitive configuration data

Issue: #1606

Deep-Dive API Documentation: GET /admin/test-huggingface/{hugging_face_id}

Section 1: High-Level Overview

The GET /admin/test-huggingface/{hugging_face_id} endpoint is a diagnostic tool that fetches raw model data from the HuggingFace API for a specific model ID, caches the result in Redis with a 1-hour TTL, and returns the raw API response alongside extracted author data. It is used to debug HuggingFace API connectivity and validate the data structure returned for specific models.

Section 2: Low-Level Deep-Dive

2.1 Requirements and Pipeline

Authentication: Admin role required. Uses require_admin dependency.

Middleware Pipeline: SecurityMiddleware (50% Sentry sampling) -> RequestTimeoutMiddleware (55s) -> ConcurrencyMiddleware -> RequestIDMiddleware -> TraceContextMiddleware -> AutoSentryMiddleware -> CORSMiddleware -> ObservabilityMiddleware -> SelectiveGZipMiddleware -> StagingSecurityMiddleware -> DeprecationMiddleware

Path Parameters:

hugging_face_id: str (default: "openai/gpt-oss-120b") - The HuggingFace model ID in "author/model-name" format

Response Schema (on success):

hugging_face_id: str
raw_response: dict - Complete raw JSON from HuggingFace API
author_data_extracted: object containing:
- has_author_data: bool - whether hf_data["author_data"] exists and is truthy
- author_data: dict|None - raw author_data from HuggingFace response
- author: str|None - hf_data.get("author")
- extracted_author_data: object containing:
  - name: str|None - from author_data["name"]
  - fullname: str|None - from author_data["fullname"]
  - avatar_url: str|None - from author_data["avatarUrl"]
  - follower_count: int - from author_data["followerCount"] (default 0)
timestamp: ISO datetime string
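The author fields can be extracted as sketched below. The function name `extract_author_info()` is hypothetical, and the sketch is deliberately defensive. It uses `.get()` throughout, whereas the handler is described as using direct dict access. It also always returns the nested `extracted_author_data` object, falling back to `None` (and `0` for the follower count) when `author_data` is absent.

```python
def extract_author_info(hf_data):
    """Build the author_data_extracted object from a raw HuggingFace response dict."""
    author_data = hf_data.get("author_data")
    ad = author_data or {}  # tolerate a missing or null author_data
    return {
        "has_author_data": bool(author_data),
        "author_data": author_data,
        "author": hf_data.get("author"),
        "extracted_author_data": {
            "name": ad.get("name"),
            "fullname": ad.get("fullname"),
            "avatar_url": ad.get("avatarUrl"),
            "follower_count": ad.get("followerCount", 0),
        },
    }


full = extract_author_info({
    "author": "openai",
    "author_data": {"name": "openai", "fullname": "OpenAI",
                    "avatarUrl": "https://example.com/a.png", "followerCount": 12},
})
bare = extract_author_info({"author": "someone"})
```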

Error Codes:

401: Invalid/missing auth
403: Not admin role
404: fetch_huggingface_model() returned None. This covers a 404 from the HuggingFace API and, because the fetch also logs and swallows other HTTP error statuses, other HTTP error responses as well
500: Network error, timeout, or unexpected exception

2.2 Flow Diagram

Request -> require_admin dep -> fetch_huggingface_model(hugging_face_id) -> httpx.get("https://huggingface.co/api/models/{id}", timeout=10.0) -> response.raise_for_status() -> parse JSON -> try: get_redis_manager().set_json("huggingface:model:{id}", model_data, ttl=3600) (warning logged if fails) -> return model_data -> if 404: log warning, return None -> if other HTTP error: log error, return None -> if fetch returns None: raise HTTPException 404 -> build response with raw_response and extracted author_data -> return -> HTTPException 404 re-raised -> Exception -> log error -> raise HTTPException 500

2.3 Complete Dependency Map

Component	Location	Details
admin_test_huggingface() handler	src/routes/admin.py:364	Route handler
require_admin dependency	src/security/deps.py:220	Admin role check
fetch_huggingface_model()	src/services/models.py:2557	Synchronous blocking HTTP fetch from HuggingFace API
httpx.get()	httpx library	Synchronous GET to https://huggingface.co/api/models/{hugging_face_id} with 10.0s timeout
response.raise_for_status()	httpx	Raises HTTPStatusError for 4xx/5xx responses
get_redis_manager()	src/config/redis_config.py	Returns Redis manager instance
redis_manager.set_json()	src/config/redis_config.py	Redis SET with JSON serialization; key="huggingface:model:{hugging_face_id}", TTL=3600s (1 hour)
Author data extraction	src/routes/admin.py:379-407	Direct dict access on raw HuggingFace response for author, author_data, author_data.name/fullname/avatarUrl/followerCount

External API Call:

URL: https://huggingface.co/api/models/{hugging_face_id}
Method: GET (synchronous via httpx.get)
Timeout: 10.0 seconds
No authentication (public HuggingFace API)
Response: JSON dict with HuggingFace model metadata

Redis Operation:

Key pattern: huggingface:model:{hugging_face_id} (e.g., "huggingface:model:openai/gpt-oss-120b")
Operation: SET with JSON serialization
TTL: 3600 seconds (1 hour)
Failure handling: try/except with warning log — cache miss does not fail the request

Note: fetch_huggingface_model() is synchronous. The async handler calls it directly instead of wrapping it in asyncio.to_thread(), so the event loop is blocked for the duration of the HuggingFace request (up to the 10.0s timeout). Other requests handled by the same worker stall for that time.
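The difference between calling the blocking fetch directly and offloading it with `asyncio.to_thread()` can be shown without any network access. In this sketch, `fetch_model_blocking()` is a hypothetical stand-in for `fetch_huggingface_model()` that just sleeps. A heartbeat coroutine records when the event loop first gets a chance to run other work.

```python
import asyncio
import time


def fetch_model_blocking(model_id):
    """Hypothetical stand-in for fetch_huggingface_model(): blocks like a slow HTTP call."""
    time.sleep(0.2)
    return {"id": model_id}


async def first_tick_delay(offload):
    """Return how long the event loop waited before it could run other work."""
    start = time.monotonic()
    ticks = []

    async def fetch():
        if offload:
            # Runs the blocking call in the default thread pool.
            return await asyncio.to_thread(fetch_model_blocking, "openai/gpt-oss-120b")
        return fetch_model_blocking("openai/gpt-oss-120b")  # blocks the event loop

    async def heartbeat():
        for _ in range(5):
            ticks.append(time.monotonic() - start)
            await asyncio.sleep(0.01)

    # gather() schedules fetch() first, so any blocking delays the heartbeat.
    await asyncio.gather(fetch(), heartbeat())
    return ticks[0]


blocked = asyncio.run(first_tick_delay(offload=False))
offloaded = asyncio.run(first_tick_delay(offload=True))
```

With `offload=True`, the first heartbeat tick is recorded almost immediately. With `offload=False`, it waits until the blocking call returns (about 0.2s here), and other requests wait the same way under a handler that calls the fetch directly.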

2.4 Side Effects

External HTTP GET: Calls https://huggingface.co/api/models/{hugging_face_id} — synchronous, blocks event loop
Redis WRITE: SET huggingface:model:{hugging_face_id} with 3600s TTL (best-effort, failure is non-fatal)
Cache READ: require_admin chain reads _user_cache for admin user
Audit log WRITE: audit_logger.log_api_key_usage() during auth chain
No DB reads or writes, no in-process cache changes, no notifications
ObservabilityMiddleware: Records http_requests_total{method="GET", endpoint="/admin/test-huggingface/{hugging_face_id}"} post-response
Sentry: 50% sampling rate for admin endpoints
raw_response: The full HuggingFace API response is returned verbatim — may be large for models with many metadata fields

Issue: #1607

Deep-Dive API Documentation: GET /admin/trial/analytics

Section 1: High-Level Overview

The GET /admin/trial/analytics endpoint returns aggregated trial analytics, including conversion rates, usage statistics, and trial status breakdowns. It first checks Redis for a cached result (TTL 300 seconds / 5 minutes); on a cache miss it fetches all api_keys_new records page by page, computes the analytics in Python, caches the result in Redis, and returns it. This endpoint is designed for monitoring trial user behavior and conversion metrics.

Section 2: Low-Level Deep-Dive

2.1 Requirements and Pipeline

Authentication: Admin role required. Uses require_admin dependency.

Middleware Pipeline: SecurityMiddleware (50% Sentry sampling) -> RequestTimeoutMiddleware (55s) -> ConcurrencyMiddleware -> RequestIDMiddleware -> TraceContextMiddleware -> AutoSentryMiddleware -> CORSMiddleware -> ObservabilityMiddleware -> SelectiveGZipMiddleware -> StagingSecurityMiddleware -> DeprecationMiddleware

Request Schema: No body, no query parameters.

Response Schema:

success: bool - always True on success
analytics: object containing:
- total_trials: int - count of api_keys_new rows where is_trial=True
- active_trials: int - trial keys where trial_end_date > now
- expired_trials: int - trial keys where trial_end_date <= now or missing
- converted_trials: int - trial keys where trial_converted=True
- conversion_rate: float (rounded 2 decimal places) - converted_trials/total_trials*100
- usage_statistics: object with total_tokens_used, total_requests_used, total_credits_used, total_credits_allocated, credits_utilization_rate
- average_usage_per_trial: object with tokens, requests, credits (all rounded 2 decimal places)
- trial_status_breakdown: object with active, expired, converted, pending_conversion counts
- OR on error: {"error": str}

Error Codes:

401: Invalid/missing auth
403: Not admin role
500: Exception from get_trial_analytics() (raises HTTPException)

2.2 Flow Diagram

Request -> require_admin dep -> get_trial_analytics() called -> try Redis GET "trial:analytics:summary" -> if cached: json.loads() and return immediately -> else: get_supabase_client() -> paginated loop: SELECT is_trial, trial_converted, trial_start_date, trial_end_date, trial_used_tokens, trial_used_requests, trial_used_credits, trial_credits, subscription_status FROM api_keys_new RANGE 0-999, then 1000-1999, etc. until < page_size rows -> filter trial_keys = [k for k in all if k.get("is_trial")] -> compute analytics in Python -> try Redis SET "trial:analytics:summary" json_data TTL=300 (warning if fails) -> return analytics_data -> handler returns {"success": True, "analytics": analytics}

2.3 Complete Dependency Map

Component	Location	Details
get_trial_analytics_admin() handler	src/routes/admin.py:520	Route handler
require_admin dependency	src/security/deps.py:220	Admin role check
get_trial_analytics()	src/db/trials.py:196	Core analytics function
CACHE_KEY	src/db/trials.py:198	"trial:analytics:summary"
CACHE_TTL	src/db/trials.py:199	300 seconds (5 minutes)
get_redis_config()	src/config/redis_config.py	Returns Redis configuration/client wrapper
redis_config.get_cache(CACHE_KEY)	Redis	GET "trial:analytics:summary" -> returns bytes or None
json.loads(cached_data)	stdlib	Deserializes cached analytics
get_supabase_client()	src/config/supabase_config.py	Supabase client
Paginated SELECT loop	api_keys_new table	SELECT is_trial, trial_converted, trial_start_date, trial_end_date, trial_used_tokens, trial_used_requests, trial_used_credits, trial_credits, subscription_status RANGE(offset, offset+999) until < 1000 rows
trial_keys filter	src/db/trials.py:247	[k for k in all_trial_stats if k.get("is_trial", False)]
date parsing	src/db/trials.py:256-277	datetime.fromisoformat() with Z->+00:00 replacement; naive datetimes assumed UTC
tag_wrapper	src/services/pyroscope_config.py	Pyroscope profiling tags for cache operations
redis_config.set_cache()	Redis	SET "trial:analytics:summary" json_str EX 300

Redis Operations:

GET "trial:analytics:summary" — check for cached result
SET "trial:analytics:summary" EX 300 — cache computed result for 5 minutes (best-effort)
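The read-through caching described above can be sketched as follows. InMemoryCache is a hypothetical stand-in for the Redis wrapper (get_redis_config() with get_cache/set_cache in the real code). Treating a cache read error as a miss is an assumption; only the best-effort write is documented.

```python
import json
import logging
import time
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)

CACHE_KEY = "trial:analytics:summary"
CACHE_TTL = 300  # seconds, matching the documented 5-minute TTL


class InMemoryCache:
    """Hypothetical stand-in for the Redis wrapper; entries expire after their TTL."""

    def __init__(self) -> None:
        self._store: dict = {}  # key -> (expiry timestamp, JSON string)

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None or entry[0] <= time.monotonic():
            return None  # missing or expired
        return entry[1]

    def set(self, key: str, value: str, ttl: int) -> None:
        self._store[key] = (time.monotonic() + ttl, value)


def get_cached_analytics(cache: Any, compute: Callable[[], dict]) -> dict:
    # 1. Try the cache first; a read error is treated like a miss (assumption).
    try:
        cached = cache.get(CACHE_KEY)
        if cached is not None:
            return json.loads(cached)
    except Exception as exc:
        logger.warning("cache read failed: %s", exc)

    # 2. Miss: run the expensive computation (the paginated DB scan).
    result = compute()

    # 3. Best-effort write: a failure is logged and the fresh result is still returned.
    try:
        cache.set(CACHE_KEY, json.dumps(result), ttl=CACHE_TTL)
    except Exception as exc:
        logger.warning("cache write failed: %s", exc)
    return result
```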

Supabase Query:

Table: api_keys_new
Operation: SELECT (paginated)
Columns: is_trial, trial_converted, trial_start_date, trial_end_date, trial_used_tokens, trial_used_requests, trial_used_credits, trial_credits, subscription_status
No filters (fetches ALL rows across all pages)
Page size: 1000 rows per page
Continues until page returns < 1000 rows
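A minimal sketch of the pagination loop, assuming fetch_page(start, end) is a hypothetical callable that returns the rows for an inclusive range, the way PostgREST's range(start, end) does.

```python
from typing import Any, Callable

PAGE_SIZE = 1000


def fetch_all_rows(fetch_page: Callable[[int, int], list]) -> list:
    """Collect every row by requesting inclusive ranges [offset, offset + PAGE_SIZE - 1]."""
    rows: list = []
    offset = 0
    while True:
        page = fetch_page(offset, offset + PAGE_SIZE - 1)
        rows.extend(page)
        if len(page) < PAGE_SIZE:  # a short (or empty) page means the end was reached
            break
        offset += PAGE_SIZE
    return rows
```

With supabase-py, fetch_page could wrap something like client.table("api_keys_new").select(columns).range(start, end).execute().data. When the row count is an exact multiple of 1000, the loop makes one extra request that returns an empty page before stopping.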

Python Aggregation (in-memory after fetch):

Filter: is_trial == True
Count active vs expired by comparing trial_end_date to now (UTC)
Count conversions: trial_converted == True
Sum tokens, requests, credits from trial_keys
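The in-memory aggregation can be sketched like this. It covers only a subset of the documented fields, treats a missing trial_end_date as expired, and returns a conversion rate of 0 when there are no trial keys; that zero-division guard is an assumption. parse_ts mirrors the documented date handling (Z replaced by +00:00, naive values assumed UTC); the replacement matters on Python versions before 3.11, whose datetime.fromisoformat() does not accept a trailing Z.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def parse_ts(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO-8601 string; 'Z' becomes +00:00 and naive values are treated as UTC."""
    if not value:
        return None
    dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def summarize_trials(rows: list, now: datetime) -> dict:
    trial_keys = [r for r in rows if r.get("is_trial", False)]
    total = len(trial_keys)
    active = 0
    for r in trial_keys:
        end = parse_ts(r.get("trial_end_date"))
        if end is not None and end > now:
            active += 1
    converted = sum(1 for r in trial_keys if r.get("trial_converted"))
    return {
        "total_trials": total,
        "active_trials": active,
        "expired_trials": total - active,  # past end date, or no end date at all
        "converted_trials": converted,
        "conversion_rate": round(converted / total * 100, 2) if total else 0.0,
        "total_tokens_used": sum(r.get("trial_used_tokens") or 0 for r in trial_keys),
    }
```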

2.4 Side Effects

Redis READ: GET "trial:analytics:summary" on every request
Redis WRITE: SET "trial:analytics:summary" EX 300 on cache miss (best-effort)
DB READ: Paginated SELECT from api_keys_new (all rows, no filter) on cache miss
Cache READ: require_admin chain reads _user_cache for admin user
Audit log WRITE: audit_logger.log_api_key_usage() during auth chain
Pyroscope tagging: tag_wrapper adds profiling context for cache read and write operations
No DB writes, no notifications, no in-process cache changes
ObservabilityMiddleware: Records http_requests_total{method="GET", endpoint="/admin/trial/analytics"} post-response
Sentry: 50% sampling rate for admin endpoints
Performance: On cache miss, fetches ALL api_keys_new rows in pages — can be expensive at scale

Issue: #1608

Deep-Dive API Documentation: GET /admin/users/growth

Section 1: High-Level Overview

The GET /admin/users/growth endpoint returns daily cumulative user registration counts over a specified time period, designed to power user growth charts in admin dashboards. It queries the users table for created_at timestamps in the date range, groups registrations by day, adds a pre-period baseline count, and computes a growth rate percentage. If the created_at query fails it falls back to the registration_date column, and on query failure it returns empty data arrays rather than an error.

Section 2: Low-Level Deep-Dive

2.1 Requirements and Pipeline

Authentication: Admin role required. Uses require_admin dependency.

Middleware Pipeline: SecurityMiddleware (50% Sentry sampling) -> RequestTimeoutMiddleware (55s) -> ConcurrencyMiddleware -> RequestIDMiddleware -> TraceContextMiddleware -> AutoSentryMiddleware -> CORSMiddleware -> ObservabilityMiddleware -> SelectiveGZipMiddleware -> StagingSecurityMiddleware -> DeprecationMiddleware

Query Parameters:

days: int, ge=1, le=365, default=30 - Number of days to analyze

Response Schema:

status: "success"
days: int - days parameter used
start_date: str (ISO date YYYY-MM-DD)
end_date: str (ISO date YYYY-MM-DD)
data: list of objects {date: str YYYY-MM-DD, value: int (cumulative), new_users: int (daily)}
total: int - cumulative total at end of period (includes pre-period users)
growth_rate: float (rounded 2 decimal places) - percentage growth from first to last day
timestamp: ISO datetime string

If both the primary and the fallback query fail, the endpoint returns the same schema with data=[], total=0, growth_rate=0.

Error Codes:

401: Invalid/missing auth
403: Not admin role
500: Unexpected exception (raises HTTPException with "Failed to get user growth data")

2.2 Flow Diagram

Request -> require_admin dep -> compute end_date = today, start_date = today - (days-1) days -> try: query users.created_at in range -> if fails: fallback to registration_date query -> if fallback fails: return empty data -> initialize daily_data dict {date_str: 0} for each day in range -> count registrations per day from query results -> try: query COUNT(*) for users created before start_date (baseline) -> cumulative_total = baseline_count -> iterate sorted days: cumulative_total += new_users_today, append {date, value, new_users} -> compute growth_rate = (end-start)/start*100 if len>=2 -> return response -> Exception -> log with traceback -> raise HTTPException 500

2.3 Complete Dependency Map

Component	Location	Details
get_user_growth() handler	src/routes/admin.py:531	Route handler
require_admin dependency	src/security/deps.py:220	Admin role check
get_supabase_client()	src/config/supabase_config.py	Supabase client (imported inline)
Primary query	users table	SELECT created_at FROM users WHERE created_at >= start_date AND created_at <= end_date ORDER BY created_at ASC
Fallback query	users table	SELECT registration_date FROM users WHERE registration_date >= start_date AND registration_date <= end_date ORDER BY registration_date ASC
Baseline count query	users table	SELECT id count="exact" FROM users WHERE created_at < start_date
Date parsing	src/routes/admin.py:626-643	datetime.fromisoformat() with Z->+00:00 replacement; invalid dates logged and skipped
Growth rate calculation	src/routes/admin.py:676-681	(end_value - start_value) / start_value * 100 if start_value > 0 else 0

Supabase Queries:

Primary: users.select("created_at").gte("created_at", start_date.isoformat()).lte("created_at", end_date.isoformat()).order("created_at", desc=False).execute()
Fallback: users.select("registration_date").gte("registration_date", start_date.isoformat()).lte("registration_date", end_date.isoformat()).order("registration_date", desc=False).execute() (maps registration_date to created_at)
Baseline: users.select("id", count="exact").lt("created_at", start_date.isoformat()).execute() — uses count_result.count for server-side COUNT(*)

Daily aggregation algorithm:

Initialize dict: {each_date_in_range: 0}
For each user registration: increment daily_data[date_key]
cumulative_total starts at baseline count (users before start_date)
Iterate sorted dates: cumulative_total += daily_count, append to result

Growth rate: Compares cumulative_data[0]["value"] (day 1 total) to cumulative_data[-1]["value"] (final day total). Returns 0 if start_value is 0 or fewer than 2 data points.
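The daily aggregation and growth-rate steps can be sketched as below, assuming the registration timestamps have already been parsed into date objects. build_growth_series is a hypothetical helper; unlike the documented code, it also skips dates that fall outside the window.

```python
from datetime import date, timedelta
from typing import Any


def build_growth_series(signup_dates: list, start: date, end: date, baseline: int) -> dict:
    # One bucket per calendar day in [start, end], all starting at zero.
    daily: dict = {}
    day = start
    while day <= end:
        daily[day.isoformat()] = 0
        day += timedelta(days=1)

    for d in signup_dates:
        key = d.isoformat()
        if key in daily:  # ignore anything outside the window
            daily[key] += 1

    # The running total starts from users who registered before the window.
    cumulative = baseline
    data = []
    for key in sorted(daily):  # ISO date strings sort chronologically
        cumulative += daily[key]
        data.append({"date": key, "value": cumulative, "new_users": daily[key]})

    growth_rate = 0.0
    if len(data) >= 2 and data[0]["value"] > 0:
        first, last = data[0]["value"], data[-1]["value"]
        growth_rate = round((last - first) / first * 100, 2)
    return {"data": data, "total": cumulative, "growth_rate": growth_rate}
```

Note that day 1's value already includes day 1's registrations, so the growth rate measures change from the end of the first day, not from the baseline.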

2.4 Side Effects

DB READ x2-3: Primary query, optionally fallback query, baseline COUNT query (all on users table)
Cache READ: require_admin chain reads _user_cache for admin user
Audit log WRITE: audit_logger.log_api_key_usage() during auth chain
No DB writes, no Redis operations, no cache invalidations, no notifications
Error logging: traceback.format_exc() on unexpected exceptions
ObservabilityMiddleware: Records http_requests_total{method="GET", endpoint="/admin/users/growth"} post-response
Sentry: 50% sampling rate for admin endpoints
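Note: the baseline COUNT query may be slow on large user tables without an index on created_at. The primary and fallback queries set no explicit limit, so a high-growth period can produce a large result; conversely, if the project keeps Supabase's default 1000-row response cap, registrations beyond that cap would be silently missing from the daily counts.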

Issue: #1609

Deep-Dive API Documentation: GET /admin/users/count

Section 1: High-Level Overview

The GET /admin/users/count endpoint is an ultra-fast endpoint that returns only the total count of all users in the database using a server-side COUNT(*) query (via Supabase count="exact"). It is designed for dashboard counters that need only a number, not user data, with typical response times of 5-20ms.

Section 2: Low-Level Deep-Dive

2.1 Requirements and Pipeline

Authentication: Admin role required. Uses require_admin dependency.

Middleware Pipeline: SecurityMiddleware (50% Sentry sampling) -> RequestTimeoutMiddleware (55s) -> ConcurrencyMiddleware -> RequestIDMiddleware -> TraceContextMiddleware -> AutoSentryMiddleware -> CORSMiddleware -> ObservabilityMiddleware -> SelectiveGZipMiddleware -> StagingSecurityMiddleware -> DeprecationMiddleware

Request Schema: No body, no query parameters.

Response Schema:

count: int - Total number of users (0 if query fails)
timestamp: ISO datetime string

Error Codes:

401: Invalid/missing auth
403: Not admin role
500: Exception from Supabase query (raises HTTPException with "Failed to get users count")

2.2 Flow Diagram

Request -> require_admin dep -> get_supabase_client() -> users.select("id", count="exact").execute() -> total_count = count_result.count if count_result.count is not None else 0 -> return {"count": total_count, "timestamp": now} -> Exception -> log error -> raise HTTPException 500

2.3 Complete Dependency Map

Component	Location	Details
get_users_count() handler	src/routes/admin.py:702	Route handler
require_admin dependency	src/security/deps.py:220	Admin role check
get_supabase_client()	src/config/supabase_config.py	Imported inline in handler
Supabase query	users table	SELECT id FROM users with count="exact" -> server-side COUNT(*), returns count attribute
datetime.now(UTC).isoformat()	stdlib	Timestamp for response

Supabase Query Details:

Table: users
Operation: SELECT with count="exact"
Columns: id (minimal column to minimize data transfer, only count is used)
No filters — counts ALL users
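count="exact" makes PostgREST run a server-side COUNT(*) in PostgreSQL and report the total in the Content-Range response header
Returns: count_result.count attribute (integer or None)
Fallback: 0 if count_result.count is None

Why count="exact": This is the PostgREST way to get an accurate PostgreSQL COUNT(*). Without it, counting the returned rows client-side would be capped at the response limit (1000 rows by default on Supabase). Because the call does not pass head=True, the response body still carries the selected id values (up to that limit) alongside the count; selecting only id keeps that payload small, so the request stays lightweight.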
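For illustration, PostgREST places the exact total after the slash in the Content-Range header (for example 0-999/12345), and the client library exposes it as count_result.count. The sketch below shows that parsing; total_from_content_range is a hypothetical helper, not supabase-py's own code.

```python
from typing import Optional


def total_from_content_range(header: str) -> Optional[int]:
    """Extract the total row count from a PostgREST Content-Range header.

    Header shapes: "0-999/12345" (range/total), "0-24/*" (no count requested),
    "*/0" (empty result).
    """
    _, _, total = header.rpartition("/")
    if total == "*" or not total.isdigit():
        return None  # no exact count was requested, or the header is malformed
    return int(total)
```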

2.4 Side Effects

DB READ: Single count="exact" query on users table; lightweight, since only the id column is returned alongside the server-side count
Cache READ: require_admin chain reads _user_cache for admin user
Audit log WRITE: audit_logger.log_api_key_usage() during auth chain
No DB writes, no Redis operations, no cache invalidations, no notifications
ObservabilityMiddleware: Records http_requests_total{method="GET", endpoint="/admin/users/count"} post-response
Sentry: 50% sampling rate for admin endpoints

Issue: #1610

Deep-Dive API Documentation: GET /admin/users/stats

Section 1: High-Level Overview

The GET /admin/users/stats endpoint returns aggregated user statistics without returning user data. It runs up to 5 separate Supabase queries (count, roles, active/inactive status, credits, subscription breakdown) with optional filters for email, API key, and is_active status. It is designed for dashboard stats cards that need counts and aggregates, not user records — approximately 10-50ms vs 500ms+ for the full /admin/users list.

Section 2: Low-Level Deep-Dive

2.1 Requirements and Pipeline

Authentication: Admin role required. Uses require_admin dependency.

Middleware Pipeline: SecurityMiddleware (50% Sentry sampling) -> RequestTimeoutMiddleware (55s) -> ConcurrencyMiddleware -> RequestIDMiddleware -> TraceContextMiddleware -> AutoSentryMiddleware -> CORSMiddleware -> ObservabilityMiddleware -> SelectiveGZipMiddleware -> StagingSecurityMiddleware -> DeprecationMiddleware

Query Parameters:

email: str|None - Case-insensitive partial match (ilike %email%)
api_key: str|None - Case-insensitive partial match on api_keys_new.api_key (ilike %api_key%) — triggers JOIN to api_keys_new
is_active: bool|None - Filter by users.is_active column

Response Schema:

status: "success"
total_users: int - COUNT of matching users
filters_applied: {email, api_key, is_active} showing applied filter values
statistics: object containing:
- active_users: int - users where is_active is True (boolean True, not truthy)
- inactive_users: int - total_users - active_users
- admin_users: int - users with role="admin"
- developer_users: int - users with role="developer"
- regular_users: int - users with role="user" or role=None
- total_credits: float (rounded 2 decimal places) - sum of users.credits across the matching users
- average_credits: float (rounded 2 decimal places)
- subscription_breakdown: dict keyed by subscription_status value -> count
timestamp: ISO datetime string

Error Codes:

401: Invalid/missing auth
403: Not admin role
500: Exception from any query (raises HTTPException with "Failed to get users statistics")

2.2 Flow Diagram

Request -> require_admin dep -> compute email_pattern = "%{email}%" if email else None -> Query 1: count_query SELECT id [JOIN api_keys_new if api_key] count="exact" with filters -> total_users = count_result.count -> Query 2: role_query SELECT role [JOIN api_keys_new if api_key] LIMIT 100000 with filters -> count admin/developer/regular roles -> Query 3: status_query SELECT is_active [JOIN api_keys_new if api_key] LIMIT 100000 with filters -> count active (is_active is True) -> Query 4: credits_query SELECT credits [JOIN api_keys_new if api_key] LIMIT 100000 with filters -> sum credits -> Query 5: subscription_query SELECT subscription_status [JOIN api_keys_new if api_key] LIMIT 100000 with filters -> group by subscription_status -> build and return response

2.3 Complete Dependency Map

Component	Location	Details
get_users_stats() handler	src/routes/admin.py:736	Route handler
require_admin dependency	src/security/deps.py:220	Admin role check
get_supabase_client()	src/config/supabase_config.py	Imported inline

Supabase Query 1 - Count:

Without api_key: users.select("id", count="exact") with .ilike("email", pattern) and/or .eq("is_active", is_active)
With api_key: users.select("id, api_keys_new!inner(api_key)", count="exact") with .ilike("api_keys_new.api_key", "%api_key%")
Returns: count_result.count

Supabase Query 2 - Roles:

users.select("role" [+ join]).limit(100000) with same filters
Python aggregation: sum(1 for u in role_data if u.get("role") == "admin"), "developer", or "user"/None

Supabase Query 3 - Status:

users.select("is_active" [+ join]).limit(100000) with same filters
active_users = sum(1 for u in status_data if u.get("is_active") is True) — strict True check, not truthy

Supabase Query 4 - Credits:

users.select("credits" [+ join]).limit(100000) with same filters
total_credits = sum(float(u.get("credits", 0)) for u in credits_data)
avg_credits = round(total_credits / total_users, 2)

Supabase Query 5 - Subscriptions:

users.select("subscription_status" [+ join]).limit(100000) with same filters
subscription_stats = {status: count} dict
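For illustration, the Python side of Queries 2-5 folded into one function over a single list of rows (the endpoint itself issues a separate query per aggregate). aggregate_user_stats is a hypothetical name, and the zero guard on average_credits is an assumption; the documented expression divides by total_users directly.

```python
from collections import Counter
from typing import Any


def aggregate_user_stats(rows: list, total_users: int) -> dict:
    roles = [row.get("role") for row in rows]
    # Identity check, as documented: only bool True counts, so 1 or "true" do not.
    active = sum(1 for row in rows if row.get("is_active") is True)
    # Mirrors the documented expression; a NULL credits value (None) would raise TypeError.
    total_credits = sum(float(row.get("credits", 0)) for row in rows)
    subscriptions = Counter(row.get("subscription_status") for row in rows)
    return {
        "active_users": active,
        "inactive_users": total_users - active,
        "admin_users": roles.count("admin"),
        "developer_users": roles.count("developer"),
        "regular_users": sum(1 for role in roles if role in ("user", None)),
        "total_credits": round(total_credits, 2),
        # Zero guard is an assumption, not documented behavior.
        "average_credits": round(total_credits / total_users, 2) if total_users else 0.0,
        "subscription_breakdown": dict(subscriptions),
    }
```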

JOIN pattern (when api_key provided): api_keys_new!inner(api_key) — INNER JOIN on api_keys_new table, filter: .ilike("api_keys_new.api_key", f"%{api_key}%")

Email filter: Uses PostgreSQL ILIKE with %{email}% pattern — case-insensitive partial match anywhere in email string

Important note on active_users: Uses u.get("is_active") is True (identity check), not u.get("is_active") (truthiness). This means users with is_active=1 (integer) would NOT be counted as active. Only Python bool True matches.

2.4 Side Effects

DB READ x5: Five separate sequential Supabase queries on users table (with optional api_keys_new JOIN)
Cache READ: require_admin chain reads _user_cache for admin user
Audit log WRITE: audit_logger.log_api_key_usage() during auth chain
No DB writes, no Redis operations, no cache invalidations, no notifications
ObservabilityMiddleware: Records http_requests_total{method="GET", endpoint="/admin/users/stats"} post-response
Sentry: 50% sampling rate for admin endpoints
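Performance: the roles, status, credits, and subscription queries each request LIMIT 100000, so up to 100k rows per query may be loaded into memory on large databases. If the project keeps Supabase's default 1000-row response cap, PostgREST returns at most that many rows regardless of the requested limit, and these aggregates would be computed from a truncated set while total_users (from the exact count) is not.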

Issue: #1611

Deep-Dive API Documentation: GET /admin/users

Section 1: High-Level Overview

The GET /admin/users endpoint returns a paginated, filterable list of user records for admin consumption. For email-only searches it uses a PostgreSQL RPC function (search_users_by_email) for performance on Cloudflare-hosted instances; for complex filters it falls back to a standard query with JOIN on api_keys_new. It returns user identity and status fields without statistics (see /admin/users/stats for aggregates) and supports pagination up to 10,000 records per page.

Section 2: Low-Level Deep-Dive

2.1 Requirements and Pipeline

Authentication: Admin role required. Uses require_admin dependency.

Middleware Pipeline: SecurityMiddleware (50% Sentry sampling) -> RequestTimeoutMiddleware (55s) -> ConcurrencyMiddleware -> RequestIDMiddleware -> TraceContextMiddleware -> AutoSentryMiddleware -> CORSMiddleware -> ObservabilityMiddleware -> SelectiveGZipMiddleware -> StagingSecurityMiddleware -> DeprecationMiddleware

Query Parameters:

email: str|None - Case-insensitive partial match (ilike %email%)
api_key: str|None - Case-insensitive partial match on api_keys_new.api_key
is_active: bool|None - Filter by active status
limit: int, ge=1, le=10000, default=100 - Records per page
offset: int, ge=0, default=0 - Records to skip

Fast Path (email-only, no api_key, no is_active filter):

Calls RPC function: search_users_by_email(search_term, result_limit, result_offset)
Returns: total_count from first row, user records without total_count field
On RPC failure: raises HTTPException 500 with message about missing RPC function (does NOT fall back to standard query for email-only to avoid Cloudflare crashes)

Standard Path (any other combination):

Count query with optional JOIN
Data query with optional JOIN, specific column selection
Pagination via .range(offset, offset+limit-1) and .order("created_at", desc=True)

Response Schema:

status: "success"
total_users: int - total matching the filters
has_more: bool - (offset + limit) < total_users
pagination: {limit, offset, current_page (offset//limit)+1, total_pages}
filters_applied: {email, api_key, is_active}
users: list of user dicts (cleaned of api_keys_new join data)
timestamp: ISO datetime string

User dict columns (standard path): id, username, email, credits, is_active, role, registration_date, auth_method, subscription_status, trial_expires_at, created_at, updated_at (when the api_key filter adds the api_keys_new JOIN, the nested api_keys_new field is stripped from each user dict)

Error Codes:

401: Invalid/missing auth
403: Not admin role
500: RPC failure for email-only; or data query failure; or unexpected exception

2.2 Flow Diagram

Request -> require_admin dep -> check filter combination -> if email AND NOT api_key AND is_active is None: RPC path -> client.rpc("search_users_by_email", {search_term, result_limit, result_offset}) -> if RPC fails: raise HTTPException 500 -> extract total_count from first row -> clean total_count from user dicts -> return -> else: Standard path -> count query (with optional api_keys_new JOIN and filters) -> data query (with optional JOIN, column selection, filters) -> sort by created_at desc, range pagination -> clean api_keys_new from user dicts -> build response

2.3 Complete Dependency Map

Component	Location	Details
get_all_users_info() handler	src/routes/admin.py:942	Route handler
require_admin dependency	src/security/deps.py:220	Admin role check
get_supabase_client()	src/config/supabase_config.py	Imported inline

Fast Path - RPC Query:

Function: search_users_by_email
Parameters: {search_term: email, result_limit: limit, result_offset: offset}
Returns: rows with all user columns + total_count field on each row
total_users = users_data[0]["total_count"] if users_data else 0
Users cleaned by: {k: v for k, v in user.items() if k != "total_count"}
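The fast-path post-processing, sketched under the assumption that search_users_by_email repeats the same total_count on every row (for example via a window COUNT(*) OVER ()); the RPC's SQL is not shown in this document, and unpack_rpc_page is a hypothetical name.

```python
from typing import Any


def unpack_rpc_page(rows: list) -> tuple:
    # Every row is assumed to carry the same total_count, so read it from the first row.
    total = rows[0]["total_count"] if rows else 0
    # Strip the bookkeeping column so only user fields reach the response.
    users = [{k: v for k, v in row.items() if k != "total_count"} for row in rows]
    return total, users
```

One consequence of reading the total from the first row: if offset is past the last match, the RPC returns no rows and total_users becomes 0, even though matching users exist.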

Standard Path - Count Query:

Without api_key: users.select("id", count="exact")
With api_key: users.select("id, api_keys_new!inner(api_key)", count="exact")
Filters: .ilike("email", f"%{email}%"), .ilike("api_keys_new.api_key", f"%{api_key}%"), .eq("is_active", is_active)
On count failure: logs error, falls back to total_users = 0

Standard Path - Data Query:

Without api_key: users.select("id, username, email, credits, is_active, role, registration_date, auth_method, subscription_status, trial_expires_at, created_at, updated_at")
With api_key: above + ", api_keys_new!inner(api_key)"
Same filters applied
Order: .order("created_at", desc=True)
Pagination: .range(offset, offset + limit - 1)
Users cleaned: {k: v for k, v in user.items() if k != "api_keys_new"}

has_more calculation: (offset + limit) < total_users
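The pagination fields can be computed as below. total_pages is not defined in the source, so ceiling division is an assumption; limit is at least 1 because of the ge=1 constraint, so the division is safe.

```python
from typing import Any


def pagination_meta(total_users: int, limit: int, offset: int) -> dict:
    return {
        "has_more": (offset + limit) < total_users,
        "pagination": {
            "limit": limit,
            "offset": offset,
            "current_page": offset // limit + 1,
            # Assumption: integer ceiling division for the page count.
            "total_pages": (total_users + limit - 1) // limit,
        },
    }
```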

2.4 Side Effects

DB READ (fast path): 1 RPC call (search_users_by_email function) with pagination
DB READ (standard path): 2 sequential queries (count + data) with optional JOIN
Cache READ: require_admin chain reads _user_cache for admin user
Audit log WRITE: audit_logger.log_api_key_usage() during auth chain
No DB writes, no Redis operations, no cache invalidations, no notifications
Error logging: traceback.format_exc() on unexpected exceptions
ObservabilityMiddleware: Records http_requests_total{method="GET", endpoint="/admin/users"} post-response
Sentry: 50% sampling rate for admin endpoints
Note: api_keys_new JOIN is an INNER JOIN meaning users without any api_keys_new entries are excluded from api_key filter results; limit up to 10000 can return large payloads

Issue: #1612

Deep-Dive API Documentation: GET /admin/users/{user_id}

Section 1: High-Level Overview

The GET /admin/users/{user_id} endpoint retrieves comprehensive information about a specific user by numeric ID: all user record fields, all associated API keys from api_keys_new, the 10 most recent usage_records entries, and the 10 most recent activity_log entries. It runs 4 sequential Supabase queries and returns a unified response; failures in the usage_records and activity_log queries are silently swallowed and returned as empty arrays.

Section 2: Low-Level Deep-Dive

2.1 Requirements and Pipeline

Authentication: Admin role required. Uses require_admin dependency.

Path Parameters:

user_id: int - The numeric user ID (primary key in users table)

Middleware Pipeline: SecurityMiddleware (50% Sentry sampling) -> RequestTimeoutMiddleware (55s) -> ConcurrencyMiddleware -> RequestIDMiddleware -> TraceContextMiddleware -> AutoSentryMiddleware -> CORSMiddleware -> ObservabilityMiddleware -> SelectiveGZipMiddleware -> StagingSecurityMiddleware -> DeprecationMiddleware

Request Schema: No body, no query parameters.

Response Schema:

status: "success"
user: dict - Full users table row (all columns including sensitive data like api_key, email, credits)
api_keys: list - All rows from api_keys_new WHERE user_id=? (all columns including raw api_key strings)
recent_usage: list - Up to 10 rows from usage_records WHERE user_id=? ORDER BY created_at DESC LIMIT 10; [] on query failure
recent_activity: list - Up to 10 rows from activity_log WHERE user_id=? ORDER BY created_at DESC LIMIT 10; [] on query failure
timestamp: ISO datetime string

Error Codes:

401: Invalid/missing auth
403: Not admin role
404: users table query returns no rows for given user_id
500: Exception in users or api_keys query (raises HTTPException "Failed to get user information")

2.2 Flow Diagram

Request -> require_admin dep -> get_supabase_client() -> Query 1: SELECT * FROM users WHERE id=user_id -> if no data: raise 404 -> user = data[0] -> Query 2: SELECT * FROM api_keys_new WHERE user_id=user_id -> api_keys = data or [] -> try: Query 3: SELECT * FROM usage_records WHERE user_id=user_id ORDER BY created_at DESC LIMIT 10 -> except: recent_usage = [] -> try: Query 4: SELECT * FROM activity_log WHERE user_id=user_id ORDER BY created_at DESC LIMIT 10 -> except: recent_activity = [] -> return {status, user, api_keys, recent_usage, recent_activity, timestamp} -> HTTPException re-raised -> except Exception: log error -> raise HTTPException 500

2.3 Complete Dependency Map

Component	Location	Details
get_user_info_by_id() handler	src/routes/admin.py:1509	Route handler
require_admin dependency	src/security/deps.py:220	Admin role check
get_supabase_client()	src/config/supabase_config.py	Imported inline

Supabase Query 1 - User:

users.select("*").eq("id", user_id).execute()
Returns all users columns
404 if no rows returned

Supabase Query 2 - API Keys:

api_keys_new.select("*").eq("user_id", user_id).execute()
Returns all api_keys_new columns for all keys owned by this user
Returns [] if no keys found (api_keys_result.data is None or empty)
Note: Returns raw api_key strings in plaintext

Supabase Query 3 - Usage Records (legacy):

usage_records.select("*").eq("user_id", user_id).order("created_at", desc=True).limit(10).execute()
Returns up to 10 most recent usage records
Silent failure: broad except Exception: recent_usage = [], with no logging

Supabase Query 4 - Activity Log:

activity_log.select("*").eq("user_id", user_id).order("created_at", desc=True).limit(10).execute()
Returns up to 10 most recent activity entries
Silent failure: broad except Exception: recent_activity = [], with no logging

Note on silent failures: Both the usage_records and activity_log queries use except Exception: result = [] without logging, so in the response a query failure is indistinguishable from a user with no records.
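One way to keep the empty-list fallback while making failures visible is a small wrapper that logs before returning the default. best_effort is a hypothetical helper, not part of the codebase.

```python
import logging
from typing import Callable, TypeVar

logger = logging.getLogger(__name__)
T = TypeVar("T")


def best_effort(fetch: Callable[[], T], default: T, label: str) -> T:
    """Run an optional query; on failure, log it and return the default instead of raising."""
    try:
        return fetch()
    except Exception:
        # logger.exception records the traceback, so the failure is no longer invisible.
        logger.exception("optional query %s failed; returning default", label)
        return default
```

In the handler this could look like best_effort(lambda: client.table("usage_records").select("*").eq("user_id", user_id).order("created_at", desc=True).limit(10).execute().data, [], "usage_records"); the response shape stays the same.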

2.4 Side Effects

DB READ x4: users (1 row), api_keys_new (all user keys), usage_records (LIMIT 10), activity_log (LIMIT 10) — sequential queries
Cache READ: require_admin chain reads _user_cache for admin user
Audit log WRITE: audit_logger.log_api_key_usage() during auth chain
No DB writes, no Redis operations, no cache invalidations, no notifications
ObservabilityMiddleware: Records http_requests_total{method="GET", endpoint="/admin/users/{user_id}"} post-response
Sentry: 50% sampling rate for admin endpoints
Security note: Response includes all API key strings in plaintext and all user data including sensitive fields — treat as highly sensitive

Issue: #1613

Deep-Dive API Documentation: GET /admin/users/by-api-key

Section 1: High-Level Overview

The GET /admin/users/by-api-key endpoint performs an exact-match lookup to find which user owns a specific API key. It uses a PostgreSQL RPC function (search_user_by_api_key) for fast indexed lookup and returns a slim user object with key identity and status fields. This is designed for support workflows where an admin needs to find a user from a known API key. Requires the full exact API key — partial matching is not supported.

Section 2: Low-Level Deep-Dive

2.1 Requirements and Pipeline

Authentication: Admin role required. Uses require_admin dependency.

Middleware Pipeline: SecurityMiddleware (50% Sentry sampling) -> RequestTimeoutMiddleware (55s) -> ConcurrencyMiddleware -> RequestIDMiddleware -> TraceContextMiddleware -> AutoSentryMiddleware -> CORSMiddleware -> ObservabilityMiddleware -> SelectiveGZipMiddleware -> StagingSecurityMiddleware -> DeprecationMiddleware

Query Parameters:

api_key: str (required, ...) - The complete API key to look up (exact match, no partial matching)

Response Schema:

status: "success"
user: object containing:
- id: int|None - user_data.get("user_id") — note: RPC returns "user_id" which is mapped to "id"
- username: str|None
- email: str|None
- credits: float - default 0
- is_active: bool - default True
- role: str - default "user"
- subscription_status: str - default "trial"
- created_at: str|None
timestamp: ISO datetime string

Error Codes:

400: Pydantic validation (api_key is required, cannot be empty)
401: Invalid/missing auth
403: Not admin role
404: RPC returns empty result (no user with this exact API key)
500: Exception from RPC call or data processing

2.2 Flow Diagram

Request -> require_admin dep -> get_supabase_client() -> log lookup (first 20 chars of api_key) -> client.rpc("search_user_by_api_key", {"search_api_key": api_key}).execute() -> if no data or empty: raise 404 "No user found with API key: {api_key[:20]}..." -> user_data = result.data[0] -> build user dict mapping RPC field names to response field names -> return {status, user, timestamp} -> HTTPException re-raised -> Exception -> log error -> raise HTTPException 500

2.3 Complete Dependency Map

Component	Location	Details
get_user_by_api_key() handler	src/routes/admin.py:1294	Route handler
require_admin dependency	src/security/deps.py:220	Admin role check
get_supabase_client()	src/config/supabase_config.py	Imported inline
search_user_by_api_key RPC	Supabase PostgreSQL function	Exact match lookup: presumably does SELECT u.id as user_id, u.username, u.email, u.credits, u.is_active, u.role, u.subscription_status, u.created_at FROM api_keys_new ak JOIN users u ON u.id = ak.user_id WHERE ak.api_key = search_api_key LIMIT 1

RPC Call Details:

Function name: search_user_by_api_key
Parameters: {"search_api_key": api_key} — full exact string match
Expected response: list with 0 or 1 rows; each row contains: user_id, username, email, credits, is_active, role, subscription_status, created_at
Empty list or None data -> 404

Field Mapping (RPC result -> response):

user_data.get("user_id") -> user["id"]
user_data.get("username") -> user["username"]
user_data.get("email") -> user["email"]
user_data.get("credits", 0) -> user["credits"]
user_data.get("is_active", True) -> user["is_active"]
user_data.get("role", "user") -> user["role"]
user_data.get("subscription_status", "trial") -> user["subscription_status"]
user_data.get("created_at") -> user["created_at"]

404 message: f"No user found with API key: {api_key[:20]}..." — truncates to first 20 chars to avoid logging full key

Logging: logger.info(f"Looking up user by API key: {api_key[:20]}...") — first 20 chars logged before lookup
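A minimal sketch of the response shaping described above, assuming the RPC result is already available as a list of row dicts. The helper names map_rpc_row_to_user and build_lookup_response are hypothetical, and LookupError stands in for the handler's HTTPException 404. One subtlety worth knowing: dict.get() applies its default only when the key is absent, so a row whose credits column is NULL yields None, not 0.

```python
from __future__ import annotations

from datetime import datetime, timezone
from typing import Any


def map_rpc_row_to_user(row: dict[str, Any]) -> dict[str, Any]:
    """Translate one search_user_by_api_key row into the response's user object."""
    return {
        "id": row.get("user_id"),  # RPC column "user_id" is exposed as "id"
        "username": row.get("username"),
        "email": row.get("email"),
        "credits": row.get("credits", 0),  # default applies only if the key is missing
        "is_active": row.get("is_active", True),
        "role": row.get("role", "user"),
        "subscription_status": row.get("subscription_status", "trial"),
        "created_at": row.get("created_at"),
    }


def build_lookup_response(rpc_data: list[dict[str, Any]] | None, api_key: str) -> dict[str, Any]:
    """Build the 200 payload, or raise for the 404 case (hypothetical helper)."""
    if not rpc_data:
        # Only the first 20 characters of the key are echoed back, never the full key
        raise LookupError(f"No user found with API key: {api_key[:20]}...")
    return {
        "status": "success",
        "user": map_rpc_row_to_user(rpc_data[0]),  # at most one row is expected
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
```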

Performance: Documented as ~10-20ms (indexed lookup via RPC). The RPC function uses an index on api_keys_new.api_key for O(log n) lookup.

2.4 Side Effects

DB READ: 1 RPC call (search_user_by_api_key PostgreSQL function) — uses indexed exact match on api_keys_new.api_key
Cache READ: require_admin chain reads _user_cache for admin user
Audit log WRITE: audit_logger.log_api_key_usage() during auth chain
No DB writes, no Redis operations, no in-process cache changes, no notifications
Logging: First 20 characters of provided api_key are logged at INFO level
ObservabilityMiddleware: Records http_requests_total{method="GET", endpoint="/admin/users/by-api-key"} post-response
Sentry: 50% sampling rate for admin endpoints
Route ordering note: This route (/admin/users/by-api-key) must be registered BEFORE /admin/users/{user_id}. Starlette matches routes in registration order, and unless the path uses an int convertor ({user_id:int}), a plain {user_id} segment matches any string, including "by-api-key". If the parameterized route were registered first, this request would be dispatched to it and fail integer validation (a validation error response) instead of reaching this handler.

Issue: #1614

Deep-Dive API Documentation: GET /admin/api-keys/{api_key_id}

Section 1: High-Level Overview

This admin-only endpoint retrieves complete details for a specific API key identified by its numeric database ID. It performs a single joined Supabase query against api_keys_new and users, returning full key metadata (including the plaintext key string) and the owning user profile. The endpoint is intended for admin tooling, support workflows, and audit inspection of specific API keys.

Section 2: Low-Level Deep-Dive

2.1 Requirements & Pipeline

Authentication chain (Depends(require_admin) in src/security/deps.py):

require_admin (deps.py:220) depends on get_current_user
get_current_user depends on get_api_key(credentials) (deps.py:74)
get_api_key: extracts Bearer token; in development (Config.IS_DEVELOPMENT) returns dev key bypassing validation; otherwise calls validate_api_key_security(api_key, client_ip, referer) from src/security/security.py, then audit_logger.log_api_key_usage(user_id, key_id, endpoint, ip, user_agent)
validate_api_key_security: checks key active status, expiration date, request limits, IP allowlist membership, domain referrer restrictions
get_current_user: calls validate_trial_expiration(user) from src/utils/trial_utils.py — raises HTTP 402 if trial expired
require_admin: checks user.get("is_admin", False) or user.get("role") == "admin" — raises HTTP 403, calls audit_logger.log_security_violation(UNAUTHORIZED_ADMIN_ACCESS, user_id) if not admin
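The admin predicate in the last step is an OR of two fields. A minimal sketch, assuming the user record is a plain dict as returned by get_current_user (the function name is_admin_user is hypothetical):

```python
from typing import Any


def is_admin_user(user: dict[str, Any]) -> bool:
    """Mirror of require_admin's check: either flag grants admin access."""
    return bool(user.get("is_admin", False)) or user.get("role") == "admin"
```

A user with role "user" but is_admin=True passes; a user with neither fails, which in the real dependency produces the 403 and the UNAUTHORIZED_ADMIN_ACCESS audit entry.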

Path parameter: api_key_id: int — numeric primary key of the api_keys_new row

Request body: None

Response (200 OK):

{
  "status": "success",
  "api_key": {
    "id": int,
    "api_key": str,           // Full plaintext API key string
    "key_name": str | null,
    "environment_tag": str,   // e.g. "live", "test", "staging"
    "is_active": bool,
    "is_primary": bool,
    "scope_permissions": dict,
    "max_requests": int | null,
    "requests_used": int,
    "ip_allowlist": list,
    "domain_referrers": list,
    "created_at": str,
    "updated_at": str,
    "last_used_at": str | null,
    "expiration_date": str | null,
    "user": {
      "id": int,
      "email": str,
      "username": str,
      "credits": float,
      "is_active": bool,
      "role": str,
      "subscription_status": str,
      "created_at": str
    }
  },
  "timestamp": str   // UTC ISO 8601
}

Error codes:

Code	Condition
401	Missing/invalid Authorization header; key inactive or expired
402	Caller's own trial period has expired
403	Caller is not an admin (role != "admin" and is_admin != True)
404	No row in `api_keys_new` with the given `api_key_id`
500	Supabase query exception or unexpected error

Middleware effects:

SecurityMiddleware (src/middleware/security_middleware.py): IP-based rate limiting, behavioral analysis, velocity mode checks. Authenticated admin users are exempt from IP-level limits.
OpenTelemetry tracing middleware: span created for request lifecycle
Sentry middleware: uncaught exceptions automatically captured
GZip middleware: response compressed when client supports it

2.2 Mermaid Diagram

flowchart TD
    A([GET /admin/api-keys/id]) --> B[SecurityMiddleware\nIP rate limit check]
    B --> C{Credentials present?}
    C -->|No| D[HTTP 401]
    C -->|Yes| E[validate_api_key_security\nactive/expiry/IP/domain]
    E -->|Invalid| F[HTTP 401 or 403]
    E -->|Valid| G[validate_trial_expiration]
    G -->|Expired| H[HTTP 402]
    G -->|OK| I{is_admin or role == admin?}
    I -->|No| J[HTTP 403\n+ audit log UNAUTHORIZED]
    I -->|Yes| K[get_supabase_client]
    K --> L[SELECT api_keys_new.*\n+ users!inner JOIN\nWHERE id=api_key_id]
    L -->|No rows| M[HTTP 404]
    L -->|Exception| N[HTTP 500\nlogger.error]
    L -->|Row found| O[Pop nested users dict\nBuild response_data]
    O --> P["Return 200 JSON\n{status, api_key, timestamp}"]

2.3 Complete Dependency Map

Dependency	File	Operation	Details
`require_admin`	`src/security/deps.py:220`	Auth	Calls get_current_user chain; checks admin role
`get_api_key`	`src/security/deps.py:74`	Auth	Extracts Bearer token; dev bypass if IS_DEVELOPMENT
`validate_api_key_security`	`src/security/security.py`	Auth	Checks: key exists, is_active=True, not expired, requests_used < max_requests, client IP in ip_allowlist (if set), referer domain in domain_referrers (if set)
`validate_trial_expiration`	`src/utils/trial_utils.py`	Auth	Raises HTTP 402 if user.trial_expires_at < now and user is trial user
`audit_logger.log_api_key_usage`	`src/security/security.py`	Side effect	Writes to audit log: user_id, key_id, endpoint path, IP, user-agent
`audit_logger.log_security_violation`	`src/security/security.py`	Side effect	Writes UNAUTHORIZED_ADMIN_ACCESS violation entry
`get_supabase_client`	`src/config/supabase_config.py`	DB connection	Returns singleton PostgREST client
`api_keys_new` table	Supabase	SELECT	`client.table("api_keys_new").select("*, users!inner(id, email, username, credits, is_active, role, subscription_status, created_at)").eq("id", api_key_id).execute()`
`users` table	Supabase	JOIN	Inner join via foreign key in above query; columns: id, email, username, credits, is_active, role, subscription_status, created_at
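The "Pop nested users dict" step can be sketched as follows, assuming the query result is a list of row dicts in which PostgREST has nested the embedded users columns under the key "users". The helper name shape_api_key_response is hypothetical, and LookupError stands in for the handler's 404.

```python
from __future__ import annotations

from datetime import datetime, timezone
from typing import Any


def shape_api_key_response(rows: list[dict[str, Any]]) -> dict[str, Any]:
    """Turn the joined api_keys_new row into the documented response body."""
    if not rows:
        raise LookupError("API key not found")  # the handler returns HTTP 404 here
    record = dict(rows[0])  # shallow copy so the query result is not mutated
    # PostgREST nests the embedded table under its name ("users"); expose it as "user"
    record["user"] = record.pop("users", None)
    return {
        "status": "success",
        "api_key": record,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
```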

2.4 Side Effects

No database writes — this is a pure read endpoint.
Audit log: audit_logger.log_api_key_usage fires on every authenticated request, recording user_id, key_id, endpoint, client IP, and user-agent.
Security violation audit log (conditional): audit_logger.log_security_violation(UNAUTHORIZED_ADMIN_ACCESS, ...) fires when a non-admin user attempts this endpoint.
No Redis operations on this endpoint path.
No Prometheus metrics emitted directly by this handler (middleware-level request count and latency histograms still apply).
Sensitive data disclosure: The full plaintext API key string is included in the response. This endpoint requires admin authentication and must only be served over HTTPS in production.

Issue: #1615

Deep-Dive API Documentation: GET /admin/credit-transactions

Section 1: High-Level Overview

This admin-only endpoint retrieves all credit transactions across all users with comprehensive filtering, sorting, and pagination capabilities. It delegates to get_all_transactions() in src/db/credit_transactions.py, which queries the credit_transactions table directly with optional filters. Unlike the per-user endpoint, this admin view can span all accounts and optionally include a per-user transaction summary.

Section 2: Low-Level Deep-Dive

2.1 Requirements & Pipeline

Authentication chain: Same Depends(require_admin) chain as documented for Issue #1614.

Query Parameters:

Parameter	Type	Default	Validation	Description
`limit`	int	50	1–1000	Max transactions to return
`offset`	int	0	>= 0	Skip N transactions (pagination)
`user_id`	int	null	optional	Filter to specific user
`transaction_type`	str	null	optional	One of: trial, purchase, api_usage, admin_credit, admin_debit, refund, bonus, transfer
`from_date`	str	null	optional	YYYY-MM-DD or ISO format start date
`to_date`	str	null	optional	YYYY-MM-DD or ISO format end date
`min_amount`	float	null	optional	Minimum absolute amount filter
`max_amount`	float	null	optional	Maximum absolute amount filter
`direction`	str	null	"credit" or "charge"	credit = positive amounts, charge = negative amounts
`payment_id`	int	null	optional	Filter by payment record ID
`sort_by`	str	"created_at"	"created_at", "amount", "transaction_type"	Sort field
`sort_order`	str	"desc"	"asc" or "desc"	Sort direction
`include_summary`	bool	false	optional	Include per-user summary (only when user_id provided)

Handler-level validation (raises HTTP 400):

direction not in ("credit", "charge")
sort_by not in ("created_at", "amount", "transaction_type")
sort_order not in ("asc", "desc")
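A sketch of these checks plus the direction-to-amount mapping used by the query (credit means amount > 0, charge means amount < 0). It assumes the direction check is skipped when direction is omitted; the function names are hypothetical, and ValueError stands in for the handler's HTTP 400.

```python
from __future__ import annotations

ALLOWED_DIRECTIONS = ("credit", "charge")
ALLOWED_SORT_BY = ("created_at", "amount", "transaction_type")
ALLOWED_SORT_ORDER = ("asc", "desc")


def validate_transaction_query(direction: str | None, sort_by: str, sort_order: str) -> None:
    """Raise ValueError (HTTP 400 in the handler) for unsupported values."""
    if direction is not None and direction not in ALLOWED_DIRECTIONS:
        raise ValueError(f"direction must be one of {ALLOWED_DIRECTIONS}")
    if sort_by not in ALLOWED_SORT_BY:
        raise ValueError(f"sort_by must be one of {ALLOWED_SORT_BY}")
    if sort_order not in ALLOWED_SORT_ORDER:
        raise ValueError(f"sort_order must be one of {ALLOWED_SORT_ORDER}")


def direction_filter(direction: str | None) -> tuple[str, int] | None:
    """Map direction to a comparison on amount: credit -> gt 0, charge -> lt 0."""
    if direction == "credit":
        return ("gt", 0)
    if direction == "charge":
        return ("lt", 0)
    return None
```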

Response (200 OK):

{
  "transactions": [
    {
      "id": int,
      "user_id": int,
      "amount": float,
      "transaction_type": str,
      "description": str,
      "balance_before": float,
      "balance_after": float,
      "created_at": str,
      "payment_id": int | null,
      "metadata": dict,
      "created_by": str | null
    }
  ],
  "pagination": {
    "total": int,      // count in current page (not total in DB)
    "limit": int,
    "offset": int,
    "has_more": bool   // true if len(results) == limit
  },
  "filters_applied": { ... all filter params ... },
  "summary": { ... }  // only if include_summary=true AND user_id provided
}

Error codes:

Code	Condition
400	Invalid direction, sort_by, or sort_order value
401	Invalid/missing credentials
402	Caller trial expired
403	Not admin
500	DB query failure

2.2 Mermaid Diagram

flowchart TD
    A([GET /admin/credit-transactions]) --> B[require_admin auth chain]
    B -->|fail| C[401/402/403]
    B -->|OK| D{Validate direction\nsort_by\nsort_order}
    D -->|invalid| E[HTTP 400]
    D -->|valid| F[get_all_transactions\nsrc/db/credit_transactions.py]
    F --> G[get_supabase_client]
    G --> H[SELECT * FROM credit_transactions]
    H --> I{user_id filter?}
    I -->|yes| J[.eq user_id]
    I -->|no| K[all users]
    J --> L{transaction_type?}
    K --> L
    L --> M{date range?}
    M --> N{direction filter?}
    N --> O{payment_id?}
    O --> P[Apply sort order]
    P --> Q{min/max amount?}
    Q -->|yes| R[Fetch all, filter client-side\nthen paginate in Python]
    Q -->|no| S[DB-side range pagination]
    R --> T[Format transactions]
    S --> T
    T --> U{include_summary\nAND user_id?}
    U -->|yes| V[get_transaction_summary\nuser_id, dates]
    U -->|no| W[Build response]
    V --> W
    W --> X[Return 200]

2.3 Complete Dependency Map

Dependency	File	Operation	Details
`require_admin`	`src/security/deps.py:220`	Auth	Full chain as documented
`get_all_transactions`	`src/db/credit_transactions.py:290`	DB read	Queries `credit_transactions` table with all filters applied
`get_supabase_client`	`src/config/supabase_config.py`	DB conn	PostgREST client
`credit_transactions` table	Supabase	SELECT *	Filters: eq(user_id), eq(transaction_type), gte/lte(created_at), gt/lt(amount for direction), eq(payment_id). Sort: order(sort_by, desc=bool). Pagination: range(offset, offset+limit-1) if no min/max amount; else client-side slicing
`get_transaction_summary`	`src/db/credit_transactions.py:492`	DB read	Called only when include_summary=True and user_id provided. SELECT * FROM credit_transactions WHERE user_id=X and date filters. Computes: total_transactions, total_credits_added, total_credits_used, net_change, by_type breakdown, daily_breakdown, largest_credit, largest_charge, average_transaction, transaction_count_by_direction
`TransactionType` class	`src/db/credit_transactions.py:21`	Constants	Defines: TRIAL, PURCHASE, ADMIN_CREDIT, ADMIN_DEBIT, API_USAGE, REFUND, BONUS, TRANSFER, SUBSCRIPTION_RENEWAL/CANCELLATION/UPGRADE/DOWNGRADE

Key implementation detail — min/max amount filtering: When min_amount or max_amount is provided, get_all_transactions cannot use DB-side pagination efficiently. It fetches ALL matching rows first (no LIMIT), then filters by abs(float(amount)) in Python, then applies slice [offset:offset+limit]. For large datasets this can be expensive.
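The min/max path can be sketched as below, assuming rows arrive already sorted from the database and that both bounds are inclusive (the inclusivity is an assumption; the helper name filter_and_page is hypothetical). It also reproduces the documented pagination semantics: total is the size of the returned page and has_more is simply len(page) == limit.

```python
from __future__ import annotations

from typing import Any


def filter_and_page(
    rows: list[dict[str, Any]],
    min_amount: float | None,
    max_amount: float | None,
    limit: int,
    offset: int,
) -> dict[str, Any]:
    """Python-side amount filtering followed by slicing, as used when min/max is set."""
    kept = []
    for row in rows:
        magnitude = abs(float(row["amount"]))  # charges are negative, so compare magnitudes
        if min_amount is not None and magnitude < min_amount:
            continue
        if max_amount is not None and magnitude > max_amount:
            continue
        kept.append(row)
    page = kept[offset:offset + limit]
    return {
        "transactions": page,
        # "total" is the page size, not the number of matching rows in the DB
        "pagination": {"total": len(page), "limit": limit, "offset": offset, "has_more": len(page) == limit},
    }
```

Because has_more only compares the page size to limit, it reports true when exactly limit rows remain, and the next page then comes back empty.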

2.4 Side Effects

No database writes.
Audit log: audit_logger.log_api_key_usage fires on every authenticated call.
Performance warning: When min_amount/max_amount filters are used, all matching rows are loaded into Python memory before pagination — can be expensive on large datasets.
Summary handling: When include_summary=True but no user_id is provided, the summary is omitted from the response (no error is returned) and a logger.warning is emitted; this prevents an expensive full-table aggregation.
No Redis operations.
No direct Prometheus metrics.

Issue: #1616

Deep-Dive API Documentation: GET /admin/monitoring/chat-requests

Section 1: High-Level Overview

This admin endpoint retrieves paginated chat completion request records with flexible multi-field filtering. It queries the chat_completion_requests table with inner joins to models and providers, executing two queries per request (data fetch + count). It provides a full view of every recorded inference call for analytics and monitoring purposes.

Section 2: Low-Level Deep-Dive

2.1 Requirements & Pipeline

Authentication chain: Depends(require_admin) — same full chain as documented in Issue #1614.

Query Parameters:

Parameter	Type	Default	Description
`model_id`	int	null	Filter by model ID (exact match on chat_completion_requests.model_id)
`provider_id`	int	null	Filter by provider ID (via models.provider_id join)
`model_name`	str	null	Filter by model name (case-insensitive contains match)
`start_date`	str	null	ISO format start date filter (gte on created_at)
`end_date`	str	null	ISO format end date filter (lte on created_at)
`limit`	int	100	1–100000 max records
`offset`	int	0	Pagination offset

Response (200 OK):

{
  "success": true,
  "data": [
    {
      // All columns from chat_completion_requests
      // Plus nested: models.{id, model_id, model_name, provider_model_id, provider_id,
      //               providers.{id, name, slug}}
    }
  ],
  "metadata": {
    "total_count": int,
    "limit": int,
    "offset": int,
    "returned_count": int,
    "filters": {
      "model_id": int|null,
      "provider_id": int|null,
      "model_name": str|null,
      "start_date": str|null,
      "end_date": str|null
    },
    "timestamp": str
  }
}

Error codes:

Code	Condition
401	Invalid/missing credentials
402	Trial expired
403	Not admin
500	Supabase query failure

Middleware: Security (IP; admins exempt), OpenTelemetry, Sentry, GZip.

2.2 Mermaid Diagram

flowchart TD
    A([GET /admin/monitoring/chat-requests]) --> B[require_admin]
    B -->|fail| C[401/402/403]
    B -->|OK| D[get_supabase_client]
    D --> E[Build data query\nchat_completion_requests SELECT *\n+ models!inner + providers!inner]
    E --> F{model_id?}
    F -->|yes| G[.eq model_id]
    F -->|no| H
    G --> H{provider_id?}
    H -->|yes| I[.eq models.provider_id]
    H -->|no| J
    I --> J{model_name?}
    J -->|yes| K[.ilike models.model_name]
    J -->|no| L
    K --> L{start_date?}
    L -->|yes| M[.gte created_at]
    L -->|no| N
    M --> N{end_date?}
    N -->|yes| O[.lte created_at]
    N -->|no| P
    O --> P[.order created_at desc\n.range offset to offset+limit-1]
    P --> Q[Execute data query]
    Q --> R[Build count query\nsame filters + count=exact head=True]
    R --> S[Execute count query]
    S --> T[total_count = count_result.count\nor len data]
    T --> U[Return 200]

2.3 Complete Dependency Map

Dependency	File	Operation	Details
`require_admin`	`src/security/deps.py:220`	Auth	Full chain
`get_supabase_client`	`src/config/supabase_config.py`	DB	PostgREST client
`chat_completion_requests` table	Supabase	SELECT	`SELECT *, models!inner(id, model_id, model_name, provider_model_id, provider_id, providers!inner(id, name, slug))` with optional eq(model_id), eq(models.provider_id), ilike(models.model_name, %X%), gte(created_at), lte(created_at); ordered by created_at DESC; paginated via range(offset, offset+limit-1)
`chat_completion_requests` count	Supabase	COUNT	Same query structure with `count="exact", head=True`
`models` table	Supabase	JOIN	Inner join: id, model_id, model_name, provider_model_id, provider_id
`providers` table	Supabase	JOIN	Inner join via models: id, name, slug
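The Supabase client turns the builder chain above into a PostgREST HTTP request. To make the filters concrete, here is a sketch that emits roughly equivalent PostgREST query parameters. It assumes standard PostgREST syntax and is not the client's actual serialization; the function name chat_requests_params is hypothetical. Note that range(offset, offset+limit-1) is inclusive on both ends, so it corresponds to offset=offset, limit=limit. The count query reuses the same filters, is sent as a HEAD request with the header Prefer: count=exact, and PostgREST reports the total in the Content-Range response header.

```python
from __future__ import annotations

SELECT = (
    "*,models!inner(id,model_id,model_name,provider_model_id,provider_id,"
    "providers!inner(id,name,slug))"
)


def chat_requests_params(
    model_id: int | None = None,
    provider_id: int | None = None,
    model_name: str | None = None,
    start_date: str | None = None,
    end_date: str | None = None,
    limit: int = 100,
    offset: int = 0,
) -> list[tuple[str, str]]:
    """Approximate PostgREST query parameters for the data query (illustrative only)."""
    params = [("select", SELECT)]
    if model_id is not None:
        params.append(("model_id", f"eq.{model_id}"))
    if provider_id is not None:
        # Filter on an embedded column; with !inner, non-matching parent rows are dropped
        params.append(("models.provider_id", f"eq.{provider_id}"))
    if model_name:
        # PostgREST accepts * as an alias for % in like/ilike patterns
        params.append(("models.model_name", f"ilike.*{model_name}*"))
    if start_date:
        params.append(("created_at", f"gte.{start_date}"))
    if end_date:
        params.append(("created_at", f"lte.{end_date}"))  # repeated keys are ANDed
    params += [("order", "created_at.desc"), ("offset", str(offset)), ("limit", str(limit))]
    return params
```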

2.4 Side Effects

No database writes.
Two Supabase queries per request: one data fetch, one count query.
Performance risk: limit can be set as high as 100,000. Very large result sets can cause memory pressure and may exceed the 55s RequestTimeoutMiddleware budget.
No Redis operations.
No direct Prometheus metrics.
Audit log: auth chain fires audit_logger.log_api_key_usage on every call.

Issue: #1617

Deep-Dive API Documentation: GET /admin/monitoring/chat-requests/summary

Section 1: High-Level Overview

This admin endpoint returns aggregate summary statistics for chat completion requests, optionally filtered by model, provider, model name, and date range. Results are cached in Redis with a 60-second TTL using an MD5 hash of the filter parameters as the cache key. A cache miss triggers get_chat_completion_summary_by_filters() from src/db/chat_completion_requests.py. This endpoint is designed specifically for analytics dashboards and avoids fetching raw request records.

Section 2: Low-Level Deep-Dive

2.1 Requirements & Pipeline

Authentication chain: Depends(require_admin) — same chain as documented in Issue #1614.

Query Parameters:

Parameter	Type	Default	Description
`model_id`	int	null	Filter by model ID
`provider_id`	int	null	Filter by provider ID
`model_name`	str	null	Partial match on model name
`start_date`	str	null	ISO format (YYYY-MM-DDTHH:MM:SS)
`end_date`	str	null	ISO format

Response (200 OK):

{
  "summary": {
    "total_requests": int,
    "total_input_tokens": int,
    "total_output_tokens": int,
    "total_tokens": int,
    "avg_input_tokens": float,
    "avg_output_tokens": float,
    "avg_processing_time_ms": float,
    "completed_requests": int,
    "failed_requests": int,
    "success_rate": float,
    "first_request_at": str,
    "last_request_at": str,
    "total_cost_usd": float
  },
  "filters": { model_id, provider_id, model_name, start_date, end_date },
  "timestamp": str,
  "cached": bool
}

Error codes: 401, 402, 403, 500.

2.2 Mermaid Diagram

flowchart TD
    A([GET /admin/monitoring/chat-requests/summary]) --> B[require_admin]
    B -->|fail| C[401/402/403]
    B -->|OK| D[Build filter_str\nmodel_id:provider_id:model_name:start_date:end_date]
    D --> E[MD5 hash filter_str\ncache_key = chat_summary:filters:HASH]
    E --> F[get_redis_client]
    F --> G{Redis available?}
    G -->|yes| H[redis.get cache_key]
    H -->|hit| I[Parse JSON\nset cached=True\nreturn 200]
    H -->|miss or error| J[Cache MISS log]
    G -->|no| J
    J --> K[get_chat_completion_summary_by_filters\nsrc/db/chat_completion_requests.py]
    K --> L[DB aggregation query]
    L --> M[Build response dict\ncached=False]
    M --> N{Redis available?}
    N -->|yes| O[redis.setex cache_key\nTTL=60s\nJSON serialized]
    N -->|no or error| P
    O --> P[Return 200]

2.3 Complete Dependency Map

Dependency	File	Operation	Details
`require_admin`	`src/security/deps.py:220`	Auth	Full chain
`hashlib.md5`	stdlib	Computation	`filter_str = f"{model_id}:{provider_id}:{model_name}:{start_date}:{end_date}"` then `.hexdigest()`
`get_redis_client`	`src/config/redis_config.py`	Redis conn	Returns Redis client or None if unavailable
Redis GET	Redis	Read	Key: `chat_summary:filters:{md5_hash}`; returns JSON bytes or None
`get_chat_completion_summary_by_filters`	`src/db/chat_completion_requests.py`	DB aggregate	Executes aggregation query on `chat_completion_requests` with optional filters for model_id, provider_id (via models join), model_name (ilike), start_date (gte), end_date (lte)
Redis SETEX	Redis	Write	Key: `chat_summary:filters:{md5_hash}`; TTL: 60 seconds; value: JSON-serialized response (using `json.dumps(response, default=str)`)

Redis key pattern: chat_summary:filters:{md5_hex_of_filter_string}
Redis TTL: 60 seconds
Redis data structure: String (JSON-serialized response object)
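A minimal sketch of the key derivation and the read-through flow, assuming a client object with redis-py style get(key) and setex(key, seconds, value) methods (or None when Redis is unavailable). The function names are hypothetical, and the bare except clauses stand in for the handler's logger.warning calls.

```python
from __future__ import annotations

import hashlib
import json
from typing import Any, Callable


def summary_cache_key(model_id, provider_id, model_name, start_date, end_date) -> str:
    """Key format documented above: chat_summary:filters:{md5 of colon-joined filters}."""
    filter_str = f"{model_id}:{provider_id}:{model_name}:{start_date}:{end_date}"
    return "chat_summary:filters:" + hashlib.md5(filter_str.encode()).hexdigest()


def cached_summary(redis_client: Any | None, key: str, compute: Callable[[], dict]) -> dict:
    """Read-through cache with a 60 s TTL; Redis errors are treated as cache misses."""
    if redis_client is not None:
        try:
            raw = redis_client.get(key)
            if raw is not None:
                hit = json.loads(raw)
                hit["cached"] = True  # stored copy says False; flag the hit on the way out
                return hit
        except Exception:
            pass  # the handler logs a warning here and falls through to the DB
    response = compute()
    response["cached"] = False
    if redis_client is not None:
        try:
            redis_client.setex(key, 60, json.dumps(response, default=str))
        except Exception:
            pass  # write failures are non-fatal as well
    return response
```

Because unset filters are formatted as the literal text "None", the key for "no model_name filter" is the same as the key for model_name="None"; that collision is harmless in practice but worth knowing when debugging cache hits.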

2.4 Side Effects

No database writes.
Redis write (on cache miss): Stores full response JSON with 60-second TTL.
Redis read (on every request): Attempts to retrieve cached response.
Redis failures are non-fatal: Both read and write errors are caught with logger.warning, execution continues without caching.
Cache invalidation: No explicit invalidation — entries expire naturally after 60 seconds. Admin-triggered refreshes of providers/models do NOT invalidate these summary caches.
Audit log: audit_logger.log_api_key_usage fires on every authenticated call.
No direct Prometheus metrics.
Performance: cache hit ~5–10ms; cache miss served by the DB RPC ~30–50ms. The non-RPC DB fallback is slower (no figure is documented).

Issue: #1618

Deep-Dive API Documentation: GET /admin/monitoring/chat-requests/plot-data

Section 1: High-Level Overview

This admin endpoint returns data optimized for frontend chart rendering. It executes two Supabase queries: one fetching the last 10 full request records for display (with model and provider metadata), and one fetching ALL matching requests but only 4 lightweight fields (input_tokens, output_tokens, processing_time_ms, created_at). The lightweight fields are compressed into parallel arrays for efficient network transfer and direct use in charting libraries.

Section 2: Low-Level Deep-Dive

2.1 Requirements & Pipeline

Authentication chain: Depends(require_admin) — full chain as documented in Issue #1614.

Query Parameters:

Parameter	Type	Default	Description
`model_id`	int	null	Filter by model ID (exact match on model_id)
`provider_id`	int	null	Filter by provider ID (filtered in Python inside the handler, on recent_requests only; NOT applied to the plot query)
`start_date`	str	null	ISO format (gte on created_at)
`end_date`	str	null	ISO format (lte on created_at)

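Important limitation: provider_id is applied via Python-side filtering ONLY to recent_requests, and only after the database has already applied the 10-row limit, so recent_requests can contain fewer than 10 records (or none) even when more matching records exist. It is NOT applied to the plot_data query (all_requests fetch), so the plot arrays include records from all providers even when provider_id is specified.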

Response (200 OK):

{
  "success": true,
  "recent_requests": [
    // Last 10 records with full detail:
    // id, request_id, model_id, input_tokens, output_tokens,
    // processing_time_ms, status, error_message, created_at,
    // total_tokens (computed),
    // models.{id, model_id, model_name, provider_model_id,
    //   providers.{id, name, slug}}
  ],
  "plot_data": {
    "tokens": [int, ...],      // total_tokens per request (input+output)
    "latency": [float, ...],   // processing_time_ms per request
    "timestamps": [str, ...]   // created_at per request
  },
  "metadata": {
    "recent_count": int,
    "total_count": int,
    "timestamp": str,
    "compression": "arrays",
    "format_version": "1.0"
  }
}

Error codes: 401, 402, 403, 500.

2.2 Mermaid Diagram

flowchart TD
    A([GET /admin/monitoring/chat-requests/plot-data]) --> B[require_admin]
    B -->|fail| C[401/402/403]
    B -->|OK| D[get_supabase_client]

    D --> E[Query 1: recent_requests\nchat_completion_requests SELECT\nfull fields + models + providers JOIN\nwith model_id/start_date/end_date filters\n.order created_at desc .limit 10]
    E --> F[Execute recent query]
    F --> G{provider_id filter?}
    G -->|yes| H[Filter recent_requests in Python\nby providers.id == provider_id]
    G -->|no| I[recent_requests\nup to 10 records]
    H --> I
    I --> J[Add total_tokens to each record\ninput_tokens + output_tokens]

    D --> K[Query 2: plot query\nchat_completion_requests SELECT\ninput_tokens output_tokens\nprocessing_time_ms created_at ONLY\nwith model_id/start_date/end_date filters\nNO provider filter\n.order created_at asc\nNO limit]
    K --> L[Execute plot query\nFetches ALL matching records]
    L --> M[Build parallel arrays\nfor each record:\ntokens_array.append input+output\nlatency_array.append processing_time_ms\ntimestamps_array.append created_at]

    J --> N[Build response]
    M --> N
    N --> O[Return 200]

2.3 Complete Dependency Map

Dependency	File	Operation	Details
`require_admin`	`src/security/deps.py:220`	Auth	Full chain
`get_supabase_client`	`src/config/supabase_config.py`	DB	PostgREST client
Query 1 — recent_requests	`chat_completion_requests`	SELECT	`SELECT id, request_id, model_id, input_tokens, output_tokens, processing_time_ms, status, error_message, created_at, models!inner(id, model_id, model_name, provider_model_id, providers!inner(id, name, slug))`. Filters: eq(model_id) if set, gte(created_at) if start_date, lte(created_at) if end_date. Order: created_at DESC. Limit: 10
Query 2 — plot data	`chat_completion_requests`	SELECT	`SELECT input_tokens, output_tokens, processing_time_ms, created_at`. Same filters EXCEPT provider_id NOT applied. No LIMIT — fetches entire matching dataset. Order: created_at ASC
Python post-processing	Handler	Filtering	provider_id applied to recent_requests only: `[r for r in recent_requests if r.get("models", {}).get("providers", {}).get("id") == provider_id]`
Python post-processing	Handler	Computation	`total_tokens = input_tokens + output_tokens` added to each recent_request
Python array building	Handler	Computation	Three parallel arrays built from all_requests in a single pass
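The two Python post-processing steps can be sketched as follows, assuming rows are plain dicts shaped like the query results above. The function names are hypothetical, and treating a missing or NULL token count as 0 is an assumption of this sketch rather than documented handler behavior.

```python
from __future__ import annotations

from typing import Any


def filter_recent_by_provider(recent: list[dict[str, Any]], provider_id: int | None) -> list[dict[str, Any]]:
    """Python-side provider filter applied only to the (already limited) recent rows."""
    if provider_id is None:
        return recent
    return [r for r in recent if r.get("models", {}).get("providers", {}).get("id") == provider_id]


def build_plot_arrays(rows: list[dict[str, Any]]) -> dict[str, list]:
    """Single pass producing the three parallel arrays of plot_data."""
    tokens, latency, timestamps = [], [], []
    for r in rows:
        # Missing/NULL token counts treated as 0 (assumption of this sketch)
        tokens.append((r.get("input_tokens") or 0) + (r.get("output_tokens") or 0))
        latency.append(r.get("processing_time_ms"))
        timestamps.append(r.get("created_at"))
    return {"tokens": tokens, "latency": latency, "timestamps": timestamps}
```

Index i in each array describes the same request, which is what lets a charting library consume the arrays directly as x/y series.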

2.4 Side Effects

No database writes.
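Memory risk: The plot query fetches ALL matching records with NO LIMIT, so on large deployments it can pull a very large number of rows into Python memory. Conversely, if the Supabase project's API "Max Rows" setting (PostgREST db-max-rows; Supabase's default is 1000) is in effect, the response is silently capped and, because the query is ordered ascending, the plot arrays would contain only the oldest matching rows.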
provider_id filter discrepancy: plot_data arrays include records from all providers; only recent_requests is provider-filtered. Frontend must be aware of this inconsistency.
No Redis operations.
No direct Prometheus metrics.
Audit log: audit_logger.log_api_key_usage fires on every authenticated call.

Issue: #1619

Deep-Dive API Documentation: GET /admin/monitoring/chat-requests/by-api-key

Section 1: High-Level Overview

This admin endpoint retrieves paginated chat completion requests for a specific API key identified by its full key string (exact match required). It first resolves the API key to its numeric ID via get_api_key_by_key(), then calls get_chat_completion_requests_by_api_key() from src/db/chat_completion_requests.py to fetch the paginated results. The include_summary parameter is deprecated; use the separate /admin/monitoring/chat-requests/by-api-key/summary endpoint for statistics.

Section 2: Low-Level Deep-Dive

2.1 Requirements & Pipeline

Authentication chain: Depends(require_admin) — full chain documented in Issue #1614.

Query Parameters:

Parameter	Type	Default	Description
`api_key`	str	required	Full API key string (exact match, e.g. "gw_live_abc123...")
`limit`	int	100	1–1000 max records
`offset`	int	0	Pagination offset
`include_summary`	bool	false	DEPRECATED — include summary stats in response

Response (200 OK):

{
  "requests": [
    // Chat completion request records from get_chat_completion_requests_by_api_key()
  ],
  "total_count": int,
  "api_key_info": {
    "id": int,
    "key_name": str | null,
    "user_id": int,
    "environment_tag": str,
    "is_active": bool,
    "created_at": str
  },
  "limit": int,
  "offset": int,
  "pagination": {
    "limit": int,
    "offset": int,
    "has_more": bool,
    "current_page": int,
    "total_pages": int,
    "next_offset": int | null,
    "prev_offset": int | null
  },
  "timestamp": str,
  "summary": { ... }  // only if include_summary=true (deprecated)
}

Error codes:

Code	Condition
401	Invalid/missing credentials
402	Trial expired
403	Not admin
404	API key string not found in api_keys_new
500	DB error or missing ID field

2.2 Mermaid Diagram

flowchart TD
    A([GET /admin/monitoring/chat-requests/by-api-key]) --> B[require_admin]
    B -->|fail| C[401/402/403]
    B -->|OK| D[get_api_key_by_key api_key\nsrc/db/api_keys.py]
    D -->|not found| E[HTTP 404]
    D -->|found| F[Extract api_key_id]
    F -->|missing id| G[HTTP 500]
    F -->|has id| H[get_chat_completion_requests_by_api_key\napi_key_id limit offset\nsrc/db/chat_completion_requests.py]
    H --> I[Extract requests, total_count, summary]
    I --> J[Compute pagination metadata\nhas_more current_page total_pages]
    J --> K{include_summary?}
    K -->|yes + deprecated warning| L[Add summary to response\nlogger.warning]
    K -->|no| M[Build response without summary]
    L --> N[Return 200]
    M --> N

2.3 Complete Dependency Map

Dependency	File	Operation	Details
`require_admin`	`src/security/deps.py:220`	Auth	Full chain
`get_api_key_by_key`	`src/db/api_keys.py`	DB read	Looks up `api_keys_new` by exact key string match; returns full key record or None
`get_chat_completion_requests_by_api_key`	`src/db/chat_completion_requests.py`	DB read	Queries `chat_completion_requests` WHERE api_key_id=X with pagination (limit, offset); returns dict with keys: requests, total_count, summary
`api_keys_new` table	Supabase	SELECT	Via `get_api_key_by_key`: exact match on api_key column
`chat_completion_requests` table	Supabase	SELECT	Filtered by api_key_id; paginated

Deprecation note: When include_summary=True, a logger.warning is emitted: "include_summary parameter is deprecated for api_key_id=X. Use /admin/monitoring/chat-requests/by-api-key/summary endpoint instead..."

Pagination computation:

has_more = (offset + limit) < total_count
current_page = (offset // limit) + 1
total_pages = (total_count + limit - 1) // limit if total_count > 0 else 0
next_offset = offset + limit if has_more else None
prev_offset = max(0, offset - limit) if offset > 0 else None
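The formulas above, collected into one runnable function (the name pagination_metadata is hypothetical; it assumes limit >= 1, which the 1–1000 query validation guarantees):

```python
from __future__ import annotations


def pagination_metadata(total_count: int, limit: int, offset: int) -> dict:
    """Pagination fields for the by-api-key response (limit >= 1 is enforced upstream)."""
    has_more = (offset + limit) < total_count
    return {
        "limit": limit,
        "offset": offset,
        "has_more": has_more,
        "current_page": (offset // limit) + 1,
        "total_pages": (total_count + limit - 1) // limit if total_count > 0 else 0,  # ceiling division
        "next_offset": offset + limit if has_more else None,
        "prev_offset": max(0, offset - limit) if offset > 0 else None,
    }
```

For example, with total_count=250 and limit=100, offset=100 gives current_page 2 of 3 with next_offset 200 and prev_offset 0. An offset that is not a multiple of limit (such as 50) still reports current_page 1 because of the floor division.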

2.4 Side Effects

No database writes.
Deprecation warning logged: When include_summary=True, a warning is emitted to server logs.
No Redis operations.
No direct Prometheus metrics.
Audit log: audit_logger.log_api_key_usage fires on every authenticated call.

Issue: #1620

Deep-Dive API Documentation: GET /admin/monitoring/chat-requests/providers

Section 1: High-Level Overview

This endpoint returns a list of all AI model providers that have at least one associated chat completion request recorded in the system. For each provider, it reports the count of distinct models used and total request volume. It is used by admin dashboards to populate provider selection dropdowns and build provider-level analytics views. The endpoint attempts to use an optimized PostgreSQL RPC function (get_provider_request_stats) and falls back to a manual join-and-aggregate approach if the RPC is unavailable.

Section 2: Low-Level Deep-Dive

2.1 Requirements & Pipeline

Authentication & Authorization:

Requires a valid Gatewayz API key with role = 'admin'.
Auth chain: get_api_key → get_current_user → require_admin.

Request Schema: No query parameters.

Response Schema:

{
  "success": true,
  "data": [
    {
      "provider_id": 1,
      "name": "OpenAI",
      "slug": "openai",
      "models_with_requests": 5,
      "total_requests": 12500
    }
  ],
  "metadata": {
    "total_providers": 8,
    "timestamp": "2026-01-01T00:00:00Z"
  }
}

Results are sorted by total_requests descending.

Error Codes:

Code	Condition
401	Invalid or missing API key
403	Not an admin
500	Database query failure

2.2 Mermaid Diagram

sequenceDiagram
    participant C as Client
    participant R as Route Handler<br/>get_providers_with_requests_admin()
    participant Auth as require_admin
    participant SB as Supabase

    C->>R: GET /admin/monitoring/chat-requests/providers
    R->>Auth: Depends(require_admin)
    Auth-->>R: admin_user

    R->>SB: RPC: get_provider_request_stats()
    alt RPC available and returns data
        SB-->>R: Aggregated provider stats
        R-->>C: 200 { success, data (from RPC), metadata }
    else RPC unavailable (fallback path)
        R->>SB: SELECT model_id, models!inner(<br/>providers!inner(id, name, slug))<br/>FROM chat_completion_requests
        SB-->>R: Raw join results

        R->>R: Group by provider_id,<br/>accumulate unique model_ids

        loop For each provider
            R->>SB: COUNT from chat_completion_requests<br/>WHERE model_id IN [provider_model_ids]
            SB-->>R: total_requests count
        end

        R->>R: Sort by total_requests DESC
        R-->>C: 200 { success, data, metadata }
    end
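The fallback path in the diagram (group joined rows by provider, collect unique model IDs, then run one COUNT per provider) can be sketched in memory as follows. The row shape assumes supabase-py returns the nested models!inner(providers!inner(...)) embed as nested dicts, and count_requests is a hypothetical stand-in for the per-provider COUNT query; neither is taken from src/routes/admin.py.

```python
from typing import Callable, Iterable


def aggregate_providers(
    rows: Iterable[dict],
    count_requests: Callable[[list[int]], int],
) -> list[dict]:
    """Group joined rows by provider, then count requests per provider.

    rows: assumed shape {"model_id": int, "models": {"providers": {"id", "name", "slug"}}}
    count_requests: hypothetical stand-in for
        COUNT ... WHERE model_id IN [provider_model_ids]
    """
    by_provider: dict[int, dict] = {}
    for row in rows:
        provider = row["models"]["providers"]
        entry = by_provider.setdefault(
            provider["id"],
            {"provider_id": provider["id"], "name": provider["name"],
             "slug": provider["slug"], "model_ids": set()},
        )
        entry["model_ids"].add(row["model_id"])  # a set keeps model IDs unique

    results = []
    for entry in by_provider.values():
        model_ids = sorted(entry.pop("model_ids"))
        results.append({
            **entry,
            "models_with_requests": len(model_ids),
            "total_requests": count_requests(model_ids),  # one roundtrip per provider
        })
    results.sort(key=lambda p: p["total_requests"], reverse=True)
    return results
```

Because the join selects from chat_completion_requests, it yields one row per request, so the grouping pass scales with request volume and the COUNT loop adds one roundtrip per provider. That is the cost the get_provider_request_stats RPC avoids.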

2.3 Complete Dependency Map

Category	Name	Location	Purpose
Route file	`admin.py`	`src/routes/admin.py`	Handler
Auth	`require_admin`	`src/security/deps.py`	Admin enforcement
DB client	`get_supabase_client`	`src/config/supabase_config.py`	Supabase client
DB RPC	`get_provider_request_stats`	Supabase (PostgreSQL function)	Optimized aggregate (primary path)
DB table	`chat_completion_requests`	Supabase	Request records (fallback path)
DB table	`models`	Supabase	Model→Provider mapping (fallback)
DB table	`providers`	Supabase	Provider names/slugs (fallback)
Framework	`FastAPI`, `Depends`	`fastapi`	HTTP layer
Logging	`logging`	stdlib	Debug logging for RPC fallback

2.4 Side Effects

Read-only. No writes.
No caching. Results are always fetched live.
Audit log: Written on successful auth.
Fallback path: If get_provider_request_stats RPC is unavailable, the endpoint executes multiple COUNT queries (one per provider), which can be slow with many providers. RPC failure is logged at DEBUG level only.
No notifications or external calls.

Issue: #1621

API Documentation: GET /admin/monitoring/chat-requests/counts

High-Level Overview

This is a lightweight endpoint that returns request counts grouped by model, sorted by count descending. It is designed as a simpler alternative to the /models endpoint when the caller only needs usage volume per model (not full token statistics). The response is smaller, but because counting happens in application memory rather than in the database, it is not necessarily faster on large datasets (see Side Effects). It is used by admin dashboards to build "most used models" leaderboards and quick-glance usage metrics.

2.1 Requirements & Pipeline

Authentication & Authorization:

Requires a valid Gatewayz API key with role = 'admin'.
Auth chain: get_api_key → get_current_user → require_admin.

Request Schema: No query parameters.

Response Schema:

{
  "success": true,
  "data": [
    {
      "model_id": 4,
      "model_name": "GPT-4o",
      "model_identifier": "openai/gpt-4o",
      "provider_name": "OpenAI",
      "provider_slug": "openai",
      "request_count": 5250
    }
  ],
  "metadata": {
    "total_models": 12,
    "total_requests": 48000,
    "timestamp": "2026-01-01T00:00:00Z"
  }
}

Error Codes:

Code	Condition
401	Invalid or missing API key
403	Not an admin
500	Database query failure

2.2 Mermaid Diagram

sequenceDiagram
    participant C as Client
    participant R as Route Handler<br/>get_request_counts_by_model_admin()
    participant Auth as require_admin
    participant SB as Supabase

    C->>R: GET /admin/monitoring/chat-requests/counts
    R->>Auth: Depends(require_admin)
    Auth-->>R: admin_user

    R->>SB: SELECT model_id,<br/>models!inner(id, model_name, provider_model_id,<br/>providers!inner(name, slug))<br/>FROM chat_completion_requests
    SB-->>R: All rows (model_id + join data)

    R->>R: Group by model_id in memory,<br/>count occurrences,<br/>accumulate model metadata

    R->>R: Sort by request_count DESC
    R-->>C: 200 { success, data, metadata }
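The in-memory grouping step of this endpoint amounts to a counter keyed by model_id. A minimal sketch follows. The row shape is an assumption about how the models/providers embed is returned, and model_identifier is omitted because the diagram does not show which column it is built from.

```python
from collections import Counter
from typing import Iterable


def count_requests_by_model(rows: Iterable[dict]) -> dict:
    """In-memory group-by over joined chat_completion_requests rows.

    rows: assumed shape {"model_id": int, "models": {"model_name": str,
          "providers": {"name": str, "slug": str}}}
    """
    counts = Counter()
    meta: dict[int, dict] = {}
    for row in rows:
        model_id = row["model_id"]
        counts[model_id] += 1
        if model_id not in meta:  # keep the first metadata seen per model
            model = row["models"]
            meta[model_id] = {
                "model_id": model_id,
                "model_name": model["model_name"],
                "provider_name": model["providers"]["name"],
                "provider_slug": model["providers"]["slug"],
            }
    # most_common() with no argument returns every model, highest count first
    data = [{**meta[m], "request_count": n} for m, n in counts.most_common()]
    return {
        "data": data,
        "metadata": {"total_models": len(data), "total_requests": sum(counts.values())},
    }
```

Counter.most_common() already orders by count descending, which matches the documented sort, but the whole result set still has to be held in memory before counting starts.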

2.3 Complete Dependency Map

Category	Name	Location	Purpose
Route file	`admin.py`	`src/routes/admin.py`	Handler
Auth	`require_admin`	`src/security/deps.py`	Admin enforcement
DB client	`get_supabase_client`	`src/config/supabase_config.py`	Supabase client
DB table	`chat_completion_requests`	Supabase	All request records
DB table	`models`	Supabase	Model metadata (joined)
DB table	`providers`	Supabase	Provider name/slug (joined)
Framework	`FastAPI`, `Depends`	`fastapi`	HTTP layer
Logging	`logging`	stdlib	Error logging

2.4 Side Effects

Read-only. No writes.
No caching. Results fetched live on every call.
In-memory aggregation: The endpoint fetches ALL rows from chat_completion_requests joined with models and providers, then groups them in Python memory. For high-volume systems this could fetch very large result sets. For systems with millions of requests, prefer the RPC-based /models endpoint which does aggregation in the database.
Audit log: Written on auth.
No notifications or external calls.

Issue: #1622

API Documentation: GET /admin/monitoring/chat-requests/models

High-Level Overview

This endpoint returns all unique AI models that have at least one recorded chat completion request, along with their request statistics (token totals, averages, processing latency). Results can be filtered by provider ID. It is used by admin dashboards to build model-level analytics views and to enumerate which models have been actively used. The endpoint attempts to use optimized PostgreSQL RPC functions for both the model list and per-model stats, falling back to standard queries if the RPCs are unavailable.

2.1 Requirements & Pipeline

Authentication & Authorization:

Requires a valid Gatewayz API key with role = 'admin'.
Auth chain: get_api_key → get_current_user → require_admin.

Query Parameters:

Parameter	Type	Default	Description
`provider_id`	int	None	Filter results to models from a specific provider

Response Schema:

{
  "success": true,
  "data": [
    {
      "model_id": 4,
      "model_identifier": "openai/gpt-4o",
      "model_name": "GPT-4o",
      "provider_model_id": "gpt-4o",
      "provider": { "id": 1, "name": "OpenAI", "slug": "openai" },
      "stats": {
        "total_requests": 5250,
        "total_input_tokens": 2625000,
        "total_output_tokens": 1575000,
        "total_tokens": 4200000,
        "avg_processing_time_ms": 1150.5
      }
    }
  ],
  "metadata": {
    "total_models": 12,
    "timestamp": "2026-01-01T00:00:00Z",
    "method": "rpc"
  }
}

Results are sorted by total_requests descending.

Error Codes:

Code	Condition
401	Invalid or missing API key
403	Not an admin
500	Database query failure

2.2 Mermaid Diagram

sequenceDiagram
    participant C as Client
    participant R as Route Handler<br/>get_models_with_requests_admin()
    participant Auth as require_admin
    participant SB as Supabase

    C->>R: GET /admin/monitoring/chat-requests/models?[provider_id]
    R->>Auth: Depends(require_admin)
    Auth-->>R: admin_user

    R->>SB: RPC: get_models_with_requests[_by_provider](provider_id?)
    alt RPC returns data
        SB-->>R: Aggregated model stats
        R-->>C: 200 { success, data, metadata(method=rpc) }
    else RPC unavailable (fallback path)
        R->>SB: SELECT models + providers<br/>[WHERE provider_id = ?]
        SB-->>R: Model rows with provider info

        loop For each model
            R->>SB: RPC: get_model_request_stats(model_id)
            alt RPC works
                SB-->>R: { total_requests, tokens, avg_latency }
            else RPC fails
                R->>SB: COUNT WHERE model_id = ?
                SB-->>R: count only (no token stats)
            end
            R->>R: Skip model if total_requests == 0
        end

        R->>R: Sort by total_requests DESC
        R-->>C: 200 { success, data, metadata }
    end
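The per-model fallback loop (stats RPC first, COUNT only if that fails, skip models with zero requests, then sort) can be sketched with the two database calls injected as callables. stats_rpc and count_fallback are hypothetical stand-ins for get_model_request_stats(model_id) and the COUNT query. Catching every exception is a loose reading of "RPC fails" in the diagram and is an assumption about the handler's error handling.

```python
from typing import Callable


def collect_model_stats(
    models: list[dict],
    stats_rpc: Callable[[int], dict],
    count_fallback: Callable[[int], int],
) -> list[dict]:
    """Per-model fallback: try the stats RPC, else COUNT only; drop unused models."""
    results = []
    for model in models:
        try:
            stats = stats_rpc(model["model_id"])
        except Exception:
            # RPC unavailable: only the request count can be recovered
            stats = {"total_requests": count_fallback(model["model_id"])}
        if stats.get("total_requests", 0) == 0:
            continue  # skip models with no recorded requests
        results.append({**model, "stats": stats})
    results.sort(key=lambda m: m["stats"]["total_requests"], reverse=True)
    return results
```

Models that take the COUNT path may end up with stats that lack token and latency figures, so clients should treat those fields as optional when the response comes from the fallback.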

2.3 Complete Dependency Map

Category	Name	Location	Purpose
Route file	`admin.py`	`src/routes/admin.py`	Handler
Auth	`require_admin`	`src/security/deps.py`	Admin enforcement
DB client	`get_supabase_client`	`src/config/supabase_config.py`	Supabase client
DB RPC	`get_models_with_requests`	Supabase	Optimized aggregate (no filter)
DB RPC	`get_models_with_requests_by_provider`	Supabase	Optimized aggregate (provider filter)
DB RPC	`get_model_request_stats`	Supabase	Per-model stats (fallback inner loop)
DB table	`models`	Supabase	Model catalog
DB table	`providers`	Supabase	Provider names/slugs
DB table	`chat_completion_requests`	Supabase	COUNT fallback
Framework	`FastAPI`, `Query`, `Depends`	`fastapi`	HTTP layer
Logging	`logging`	stdlib	Debug/error logging

2.4 Side Effects

Read-only. No writes.
No caching. Always fetches live data.
Audit log: Written on auth.
Fallback behavior: If the primary RPCs fail, the endpoint enters a per-model loop that issues one get_model_request_stats RPC per model, plus a COUNT query for each model whose stats RPC also fails. Database roundtrips therefore grow linearly with the number of models (up to two per model). RPC failures are logged at DEBUG level.
No notifications or external calls.

Issue: #1623

Deep-Dive API Documentation: GET /admin/model-usage-analytics

Section 1: High-Level Overview

This admin endpoint reads from the model_usage_analytics database view (a view that combines chat completion requests, models, providers, and pricing data) and returns paginated, searchable, sortable model usage statistics. All aggregation is performed inside the database by the view, so the handler itself only filters, sorts, and slices rows. It uses page-based pagination (page and limit, with the offset derived internally) rather than the offset-based parameters used by other admin monitoring endpoints, and it supports case-insensitive partial model name search and sorting by multiple fields.

Section 2: Low-Level Deep-Dive

2.1 Requirements & Pipeline

Authentication chain: Depends(require_admin) — full chain documented in Issue #1614. Note: admin_user is passed as a dependency but not used in the handler body beyond gate enforcement.

Query Parameters:

Parameter	Type	Default	Validation	Description
`page`	int	1	>= 1	Page number (1-based)
`limit`	int	50	1–500	Items per page
`model_name`	str	null	optional	Case-insensitive partial match on model_name
`sort_by`	str	"total_cost_usd"	whitelist	Sort field (invalid values silently default to "total_cost_usd")
`sort_order`	str	"desc"	"asc"/"desc"	Sort direction (invalid values silently default to "desc")

Valid sort_by values: model_name, provider_name, successful_requests, total_cost_usd, avg_cost_per_request_usd, total_input_tokens, total_output_tokens, total_tokens, avg_processing_time_ms, first_request_at, last_request_at

Response (200 OK):

{
  "success": true,
  "data": [
    // Rows from model_usage_analytics view
    // Columns depend on view definition, typically:
    // model_name, provider_name, successful_requests, total_cost_usd,
    // avg_cost_per_request_usd, total_input_tokens, total_output_tokens,
    // total_tokens, avg_processing_time_ms, first_request_at, last_request_at,
    // pricing fields, model metadata
  ],
  "pagination": {
    "page": int,
    "limit": int,
    "total_items": int,
    "total_pages": int,
    "has_next": bool,
    "has_prev": bool,
    "offset": int
  },
  "filters": {
    "model_name": str | null,
    "sort_by": str,
    "sort_order": str
  },
  "metadata": {
    "timestamp": str,
    "items_in_page": int
  }
}

Error codes: 401, 402, 403, 500.

2.2 Mermaid Diagram

flowchart TD
    A([GET /admin/model-usage-analytics]) --> B[require_admin]
    B -->|fail| C[401/402/403]
    B -->|OK| D["Compute offset = (page - 1) * limit"]
    D --> E[get_supabase_client]
    E --> F[SELECT * FROM model_usage_analytics\ncount=exact]
    F --> G{model_name filter?}
    G -->|yes| H[.ilike model_name %value%]
    G -->|no| I
    H --> I{sort_by valid?}
    I -->|invalid| J[Default to total_cost_usd]
    I -->|valid| K
    J --> K{sort_order valid?}
    K -->|invalid| L[Default to desc]
    K -->|valid| M
    L --> M[.order sort_by desc=bool]
    M --> N[.range offset to offset+limit-1]
    N --> O[Execute query]
    O --> P[total_count = result.count]
    P --> Q[Compute total_pages, has_next, has_prev]
    Q --> R[Return 200]

2.3 Complete Dependency Map

Dependency	File	Operation	Details
`require_admin`	`src/security/deps.py:220`	Auth	Full chain
`get_supabase_client`	`src/config/supabase_config.py`	DB	PostgREST client
`model_usage_analytics` view	Supabase	SELECT	`client.table("model_usage_analytics").select("*", count="exact")` with optional `.ilike("model_name", f"%{model_name}%")`, `.order(sort_by, desc=bool)`, `.range(offset, offset+limit-1)`. Count is obtained from `result.count` (part of PostgREST count=exact response).

Security note on sort_by: The handler validates sort_by against a whitelist of allowed field names before passing it to .order(). Invalid values silently fall back to "total_cost_usd" rather than raising an error. This keeps arbitrary caller input out of the ORDER BY column position (guarding against injection via the sort field) and limits sorting to known view columns.
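The whitelist-with-silent-default behavior can be sketched as a pure function. The field list is copied from the valid sort_by values documented for this endpoint; the exact, case-sensitive comparison of sort_order is an assumption.

```python
VALID_SORT_FIELDS = frozenset({
    "model_name", "provider_name", "successful_requests", "total_cost_usd",
    "avg_cost_per_request_usd", "total_input_tokens", "total_output_tokens",
    "total_tokens", "avg_processing_time_ms", "first_request_at", "last_request_at",
})


def normalize_sort(sort_by: str, sort_order: str) -> tuple[str, bool]:
    """Return (column, descending), silently falling back to the documented defaults."""
    column = sort_by if sort_by in VALID_SORT_FIELDS else "total_cost_usd"
    order = sort_order if sort_order in ("asc", "desc") else "desc"
    return column, order == "desc"
```

The returned pair maps onto the .order(sort_by, desc=...) call shown in the dependency map; for example, normalize_sort("unknown", "up") yields ("total_cost_usd", True) without any error reaching the caller.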

Pagination formula:

offset = (page - 1) * limit
total_pages = (total_count + limit - 1) // limit if total_count > 0 else 0
has_next = page < total_pages
has_prev = page > 1
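The page-based formulas can be checked with a small helper that also returns the inclusive bounds passed to .range(offset, offset + limit - 1). The helper name and return shape are illustrative; only the arithmetic comes from the formulas above.

```python
def page_pagination(page: int, limit: int, total_count: int) -> tuple[dict, tuple[int, int]]:
    """Return (pagination block, inclusive .range() bounds) for page-based paging."""
    offset = (page - 1) * limit
    # Integer ceiling division; zero pages when nothing matched
    total_pages = (total_count + limit - 1) // limit if total_count > 0 else 0
    pagination = {
        "page": page,
        "limit": limit,
        "total_items": total_count,
        "total_pages": total_pages,
        "has_next": page < total_pages,
        "has_prev": page > 1,
        "offset": offset,
    }
    # PostgREST ranges are inclusive on both ends, hence the -1
    return pagination, (offset, offset + limit - 1)
```

For page=3, limit=50 and 120 matching rows, the offset is 100, the range is 100 to 149, total_pages is 3, has_next is false and has_prev is true.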

2.4 Side Effects

No database writes.
View-backed: Queries run against the model_usage_analytics view. Freshness depends on the view type: if it is a regular (non-materialized) view, as appears likely, results are computed live on every query; a materialized view would only be as fresh as its last refresh.
No Redis operations.
No direct Prometheus metrics.
Audit log: audit_logger.log_api_key_usage on every authenticated call.
Silent field validation: Invalid sort_by and sort_order values are silently corrected to defaults without returning an error — callers should not rely on validation errors for these params.

Issue: #1624

Deep-Dive API Documentation: POST /admin/limit

Section 1: High-Level Overview

This admin endpoint sets or updates rate limit configuration for a specific user's API key. It writes to the rate_limit_configs table via set_user_rate_limits(), then immediately reads back the saved configuration via get_user_rate_limits() to verify the write and return the current state. If the API key does not exist in api_keys_new, set_user_rate_limits() raises ValueError and the endpoint returns 400; if the read-back finds no configuration, it returns 404.

Section 2: Low-Level Deep-Dive

2.1 Requirements & Pipeline

Authentication chain: Depends(require_admin) — full chain documented in Issue #1614.

Request Schema (SetRateLimitRequest from src/schemas/admin.py):

{
  "api_key": str,            // The API key string to configure limits for
  "rate_limits": {
    "requests_per_minute": int,  // default 60
    "requests_per_hour": int,    // default 1000
    "requests_per_day": int,     // default 10000
    "tokens_per_minute": int,    // default 10000
    "tokens_per_hour": int,      // default 100000
    "tokens_per_day": int        // default 1000000
  }
}

Pydantic model chain:

SetRateLimitRequest.api_key: str
SetRateLimitRequest.rate_limits: RateLimitConfig
RateLimitConfig fields (all int with defaults as above)

Response (200 OK):

{
  "status": "success",
  "message": "Rate limits updated for user {api_key[:10]}...",
  "rate_limits": {
    "requests_per_minute": int,   // derived: max_requests // 60
    "requests_per_hour": int,     // stored as max_requests
    "requests_per_day": int,      // derived: max_requests * 24
    "tokens_per_minute": int,     // derived: max_tokens // 60
    "tokens_per_hour": int,       // stored as max_tokens
    "tokens_per_day": int         // derived: max_tokens * 24
  }
}

Error codes:

Code	Condition
400	`set_user_rate_limits` raises ValueError (key not found)
401	Invalid/missing credentials
402	Trial expired
403	Not admin
404	`get_user_rate_limits` returns None after write
500	Unexpected exception

2.2 Mermaid Diagram

flowchart TD
    A([POST /admin/limit]) --> B[require_admin]
    B -->|fail| C[401/402/403]
    B -->|OK| D[set_user_rate_limits\nreq.api_key, req.rate_limits.model_dump]
    D --> E[get_supabase_client]
    E --> F[SELECT id FROM api_keys_new\nWHERE api_key=req.api_key]
    F -->|not found| G[raise ValueError\n-> HTTP 400]
    F -->|found| H[api_key_id extracted]
    H --> I[Prepare rate_limit_config:\nmax_requests=requests_per_hour\nmax_tokens=tokens_per_hour\nburst_limit concurrency_limit\nwindow_size=3600]
    I --> J[SELECT id FROM rate_limit_configs\nWHERE api_key_id=X]
    J -->|exists| K[UPDATE rate_limit_configs\nWHERE api_key_id=X]
    J -->|not exists| L[INSERT rate_limit_configs]
    K --> M[get_user_rate_limits req.api_key]
    L --> M
    M --> N[SELECT api_keys_new WHERE api_key\nSELECT rate_limit_configs WHERE api_key_id]
    N -->|no config| O[Return None -> HTTP 404]
    N -->|config| P[Derive minute/hour/day values\nfrom max_requests, max_tokens]
    P --> Q[Return 200 with rate_limits]
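The write path in the diagram maps the request onto stored columns and then upserts with a check-then-write pattern. The sketch below uses an in-memory dict as a hypothetical stand-in for the rate_limit_configs table; column names and the defaults (burst_limit=100, concurrency_limit=50, window_size=3600) follow the dependency map.

```python
def build_rate_limit_row(api_key_id: int, rate_limits: dict,
                         burst_limit: int = 100, concurrency_limit: int = 50) -> dict:
    """Map the request's rate_limits dict onto the persisted columns."""
    return {
        "api_key_id": api_key_id,
        "max_requests": rate_limits["requests_per_hour"],  # only hourly values are stored
        "max_tokens": rate_limits["tokens_per_hour"],
        "burst_limit": burst_limit,
        "concurrency_limit": concurrency_limit,
        "window_size": 3600,  # one-hour window
    }


def upsert_rate_limit(table: dict[int, dict], row: dict) -> str:
    """Check-then-write upsert against an in-memory stand-in for rate_limit_configs."""
    if row["api_key_id"] in table:            # SELECT id WHERE api_key_id = X
        table[row["api_key_id"]].update(row)  # UPDATE existing row
        return "update"
    table[row["api_key_id"]] = dict(row)      # INSERT new row
    return "insert"
```

A SELECT followed by a separate INSERT is not atomic: two concurrent calls for the same key could both take the insert branch unless the table enforces uniqueness on api_key_id. A single database-side upsert on a unique api_key_id column would avoid that race.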

2.3 Complete Dependency Map

Dependency	File	Operation	Details
`require_admin`	`src/security/deps.py:220`	Auth	Full chain
`set_user_rate_limits`	`src/db/rate_limits.py:55`	DB write	Async function wrapping sync ops in `asyncio.to_thread`. Looks up api_keys_new, then upserts rate_limit_configs. Stores: max_requests=requests_per_hour, max_tokens=tokens_per_hour, burst_limit (from req or default 100), concurrency_limit (default 50), window_size=3600
`api_keys_new` table	Supabase	SELECT	`SELECT id WHERE api_key=api_key` — to resolve API key string to numeric ID
`rate_limit_configs` table	Supabase	SELECT	Check for existing config: `SELECT id WHERE api_key_id=X`
`rate_limit_configs` table	Supabase	INSERT or UPDATE	Upsert pattern: UPDATE if existing, INSERT if new
`get_user_rate_limits`	`src/db/rate_limits.py:12`	DB read	Synchronous. Reads api_keys_new -> rate_limit_configs to return current limits. Returns None if no config found. Derives: requests_per_minute = max_requests // 60, requests_per_day = max_requests * 24
`SetRateLimitRequest`	`src/schemas/admin.py:49`	Schema	Pydantic model: api_key: str, rate_limits: RateLimitConfig
`RateLimitConfig`	`src/schemas/admin.py:40`	Schema	Pydantic: requests_per_minute=60, requests_per_hour=1000, requests_per_day=10000, tokens_per_minute=10000, tokens_per_hour=100000, tokens_per_day=1000000

Storage note: Only requests_per_hour (stored as max_requests) and tokens_per_hour (stored as max_tokens) are persisted. The per-minute and per-day values visible in the response are derived by dividing or multiplying — they are not stored independently.
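Because only the hourly values are persisted, the response is rebuilt from them. This sketch reproduces the documented derivations; the function name is illustrative.

```python
def derive_rate_limits(max_requests: int, max_tokens: int) -> dict:
    """Rebuild the six response fields from the two persisted hourly values."""
    return {
        "requests_per_minute": max_requests // 60,  # floor division drops the remainder
        "requests_per_hour": max_requests,
        "requests_per_day": max_requests * 24,
        "tokens_per_minute": max_tokens // 60,
        "tokens_per_hour": max_tokens,
        "tokens_per_day": max_tokens * 24,
    }
```

Submitting the schema defaults (requests_per_minute=60, requests_per_hour=1000, requests_per_day=10000, tokens_per_hour=100000) therefore returns requests_per_minute=16, requests_per_day=24000, tokens_per_minute=1666 and tokens_per_day=2400000. The per-minute and per-day values sent by the caller are not echoed back.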

2.4 Side Effects

Database write: Upserts a row in rate_limit_configs (INSERT or UPDATE based on existence check).
Rate limiting cache NOT explicitly cleared: The get_rate_limit_manager() LRU cache (in src/services/rate_limiting.py) is NOT cleared by this endpoint, and functools.lru_cache entries do not expire on their own. New limits may therefore not take effect until the cache is cleared via POST /admin/clear-rate-limit-cache or the process restarts.
No Redis operations.
No direct Prometheus metrics.
Audit log: audit_logger.log_api_key_usage on every authenticated call.
rate_limits.model_dump(): Pydantic v2 call — converts RateLimitConfig to dict for the DB layer.

Issue: #1625

Deep-Dive API Documentation: POST /admin/refresh-providers

Section 1: High-Level Overview

This admin endpoint forces a provider catalog cache refresh by invalidating the in-memory provider cache and then immediately fetching fresh data via get_cached_providers(). The invalidation is handled by invalidate_provider_catalog("providers") from the model_catalog_cache module, which uses a debouncing mechanism to prevent cache thrashing. The fresh data fetch is run in a thread pool via asyncio.to_thread.

Section 2: Low-Level Deep-Dive

2.1 Requirements & Pipeline

Authentication chain: Depends(require_admin) — full chain documented in Issue #1614.

Request: No body or query parameters.

Response (200 OK):

{
  "status": "success",
  "message": "Provider cache refreshed successfully",
  "total_providers": int,
  "timestamp": str
}

Error codes:

Code	Condition
401	Invalid/missing credentials
402	Trial expired
403	Not admin
500	Cache invalidation or provider fetch failure

2.2 Mermaid Diagram

flowchart TD
    A([POST /admin/refresh-providers]) --> B[require_admin]
    B -->|fail| C[401/402/403]
    B -->|OK| D[invalidate_provider_catalog providers]
    D --> E[InvalidationDebouncer.invalidate\nkey=providers]
    E --> F{Pending timer exists?}
    F -->|yes| G[Cancel existing timer]
    G --> H[Schedule new timer\ndelay=1.0s]
    F -->|no| H
    H --> I[asyncio.to_thread\nget_cached_providers]
    I --> J[get_supabase_client]
    J --> K[SELECT * FROM providers\norder by name]
    K --> L[Store in provider cache\nwith TTL metadata]
    L --> M[Return providers list]
    M --> N[total_providers = len providers]
    N --> O[Return 200]
    D -->|exception| P[HTTP 500\nlogger.error]

2.3 Complete Dependency Map

Dependency	File	Operation	Details
`require_admin`	`src/security/deps.py:220`	Auth	Full chain
`invalidate_provider_catalog`	`src/services/model_catalog_cache.py`	Cache invalidation	Calls `InvalidationDebouncer.invalidate("providers")`. Debounce delay: 1.0 second. Cancels any pending timer for "providers" key, schedules new 1s timer that clears the in-memory provider cache dict.
`get_cached_providers`	`src/services/providers.py`	DB + cache	Fetches providers from Supabase `providers` table. Returns list of provider records. Cache TTL: `PROVIDER_MODELS_CACHE_TTL` = 1800 seconds (30 min).
`asyncio.to_thread`	stdlib	Threading	Wraps synchronous `get_cached_providers` call to avoid blocking the event loop
`InvalidationDebouncer`	`src/services/model_catalog_cache.py`	Debouncing	Thread-safe timer-based debouncer. Uses `threading.Timer`. Prevents cache thrashing from rapid invalidation calls.
`providers` table	Supabase	SELECT	Queried by `get_cached_providers` to fetch all provider records

Cache details:

Cache type: In-memory Python dict (not Redis)
Cache key: "providers"
TTL: 1800 seconds (30 minutes)
Invalidation: Debounced 1-second delay via InvalidationDebouncer
Prometheus metric: catalog_cache_operations_total counter (if initialized) with labels: operation, cache_layer, result
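A per-key debouncer of the kind described (cancel any pending threading.Timer for the key, then schedule a new one) can be sketched as below. The class name, the clear callback parameter, and the generation counter are assumptions for illustration; the real InvalidationDebouncer's interface is not documented here. Only the 1.0-second default comes from this page.

```python
import threading
from typing import Callable


class InvalidationDebouncerSketch:
    """Hypothetical per-key trailing-edge debouncer built on threading.Timer."""

    def __init__(self, delay: float = 1.0) -> None:
        self.delay = delay
        self._timers: dict[str, threading.Timer] = {}
        self._generation: dict[str, int] = {}
        self._lock = threading.Lock()

    def invalidate(self, key: str, clear: Callable[[str], None]) -> None:
        with self._lock:
            pending = self._timers.pop(key, None)
            if pending is not None:
                pending.cancel()  # only effective while the timer is still waiting
            gen = self._generation.get(key, 0) + 1
            self._generation[key] = gen
            timer = threading.Timer(self.delay, self._fire, args=(key, gen, clear))
            timer.daemon = True
            self._timers[key] = timer
            timer.start()

    def _fire(self, key: str, gen: int, clear: Callable[[str], None]) -> None:
        with self._lock:
            if self._generation.get(key) != gen:
                return  # superseded by a later invalidate() call
            self._timers.pop(key, None)
        clear(key)  # runs on the timer thread after the quiet period
```

The generation counter matters because Timer.cancel() only stops a timer that is still waiting; a callback that had already started would otherwise clear the cache a second time and drop the bookkeeping for the newer timer. Note also that the clear runs on a background thread, so code that reads the cache immediately after invalidate() still sees the old entry.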

2.4 Side Effects

No direct database writes.
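In-memory cache invalidation: The provider cache is cleared by the debounce timer roughly 1 second after the call, whereas get_cached_providers() runs immediately within this request. Per the flow above, the immediate fetch may therefore still be served from the existing cache entry, with the actual clear happening shortly afterwards.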
Debounce timer: A 1-second background timer is set for the "providers" cache key. Rapid successive calls will reset the timer.
Prometheus metric (conditional): catalog_cache_operations_total counter incremented if available.
No Redis operations (provider cache is in-memory, not Redis-backed).
Audit log: audit_logger.log_api_key_usage on every authenticated call.

Issue: #1626

Deep-Dive API Documentation: POST /admin/refresh-huggingface-cache

Section 1: High-Level Overview

This admin endpoint clears the in-memory HuggingFace model cache to force a refresh on the next catalog request. It calls invalidate_gateway_catalog("huggingface") from the model_catalog_cache module, which uses the same InvalidationDebouncer mechanism as the provider refresh endpoint. Unlike /refresh-providers, this endpoint does NOT immediately fetch fresh data; it only schedules the invalidation, which takes effect when the 1-second debounce timer fires. The next request for HuggingFace models after that point triggers the actual fetch.

Section 2: Low-Level Deep-Dive

2.1 Requirements & Pipeline

Authentication chain: Depends(require_admin) — full chain documented in Issue #1614.

Request: No body or query parameters.

Response (200 OK):

{
  "message": "Hugging Face cache cleared successfully",
  "timestamp": str
}

Error codes:

Code	Condition
401	Invalid/missing credentials
402	Trial expired
403	Not admin
500	Cache invalidation failure

2.2 Mermaid Diagram

flowchart TD
    A([POST /admin/refresh-huggingface-cache]) --> B[require_admin]
    B -->|fail| C[401/402/403]
    B -->|OK| D[invalidate_gateway_catalog huggingface]
    D --> E[InvalidationDebouncer.invalidate\nkey=huggingface]
    E --> F{Pending timer for huggingface?}
    F -->|yes| G[Cancel existing timer]
    G --> H[Schedule new 1s timer\nclears huggingface cache entry]
    F -->|no| H
    H --> I{Exception?}
    I -->|yes| J[HTTP 500\nlogger.error]
    I -->|no| K[Return 200 with message and timestamp]

2.3 Complete Dependency Map

Dependency	File	Operation	Details
`require_admin`	`src/security/deps.py:220`	Auth	Full chain
`invalidate_gateway_catalog`	`src/services/model_catalog_cache.py`	Cache invalidation	Calls `InvalidationDebouncer.invalidate("huggingface")`. Uses 1-second debounce delay. Clears the "huggingface" entry from the in-memory gateway catalog cache dict.
`InvalidationDebouncer`	`src/services/model_catalog_cache.py`	Debouncing	Thread-safe `threading.Timer`-based debouncer. Cancels and reschedules on rapid calls.

Key differences from /admin/refresh-providers:

This endpoint only INVALIDATES — it does NOT fetch fresh data immediately
The next organic request for HuggingFace catalog will trigger the lazy-load fetch
Response body uses "message" key instead of "status" key (inconsistency in admin API)

Cache details:

Cache type: In-memory Python dict (not Redis)
Cache key: "huggingface" in gateway catalog cache
TTL: CATALOG_RESPONSE_CACHE_TTL = 300 seconds (5 min) or PROVIDER_MODELS_CACHE_TTL = 1800s depending on cache tier
Invalidation: Debounced 1-second delay

2.4 Side Effects

No database writes.
In-memory cache invalidation: "huggingface" entry removed from gateway catalog cache after 1-second debounce.
Lazy refresh: Next HuggingFace catalog request after invalidation will incur full fetch latency (500ms–2s) as cache is cold.
Debounce timer: 1-second background timer for "huggingface" key.
No Redis operations.
No direct Prometheus metrics (middleware metrics apply).
Audit log: audit_logger.log_api_key_usage on every authenticated call.

Issue: #1627

Deep-Dive API Documentation: POST /admin/clear-rate-limit-cache

Section 1: High-Level Overview

This admin endpoint clears the in-memory rate limit configuration cache held by the RateLimitManager service, forcing the next rate-limited request to reload limits from the database. It directly accesses the RateLimitManager singleton via get_rate_limit_manager() (which is LRU-cached), clears its key_configs dict, and then calls cache_clear() on the LRU cache itself to fully reset the manager reference. This ensures both the per-key config cache and the manager singleton are reset.

Section 2: Low-Level Deep-Dive

2.1 Requirements & Pipeline

Authentication chain: Depends(require_admin) — full chain documented in Issue #1614.

Request: No body or query parameters.

Response (200 OK):

{
  "status": "success",
  "message": "Rate limit cache cleared successfully. New requests will reload configuration.",
  "timestamp": str
}

Error codes:

Code	Condition
401	Invalid/missing credentials
402	Trial expired
403	Not admin
500	Exception during cache clearing

2.2 Mermaid Diagram

flowchart TD
    A([POST /admin/clear-rate-limit-cache]) --> B[require_admin]
    B -->|fail| C[401/402/403]
    B -->|OK| D[Import get_rate_limit_manager\nfrom src.services.rate_limiting]
    D --> E[manager = get_rate_limit_manager]
    E --> F{manager is not None?}
    F -->|yes| G[manager.key_configs.clear\nclear all cached per-key configs]
    G --> H[logger.info Cleared rate limit manager key_configs cache]
    F -->|no| H
    H --> I[get_rate_limit_manager.cache_clear\nclear LRU cache reference]
    I --> J{Exception?}
    J -->|yes| K[HTTP 500\nlogger.error\nf-string with str e]
    J -->|no| L[Return 200]

2.3 Complete Dependency Map

Dependency	File	Operation	Details
`require_admin`	`src/security/deps.py:220`	Auth	Full chain
`get_rate_limit_manager`	`src/services/rate_limiting.py`	LRU-cached function	Decorated with `@lru_cache`. Returns the singleton `RateLimitManager` instance.
`manager.key_configs`	`src/services/rate_limiting.py`	In-memory dict	Dict mapping API key strings to their `RateLimitConfig` dataclass instances. Cleared by `.clear()`.
`get_rate_limit_manager.cache_clear()`	`src/services/rate_limiting.py`	LRU cache clear	Python's built-in `functools.lru_cache` cache_clear method. Removes the cached return value so next call to `get_rate_limit_manager()` creates a fresh instance.

Rate limiting manager structure (from src/services/rate_limiting.py):

RateLimitManager: Manages per-key sliding window rate limits using Redis
key_configs: dict[str, RateLimitConfig]: In-memory per-key config cache loaded from DB
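RateLimitConfig dataclass (distinct from the Pydantic RateLimitConfig schema in src/schemas/admin.py; note the different requests_per_minute default): requests_per_minute=250, requests_per_hour=1000, requests_per_day=10000, tokens_per_minute=10000, tokens_per_hour=100000, tokens_per_day=1000000, burst_limit=100, concurrency_limit=50, window_size_seconds=60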

Two-level cache clearing:

manager.key_configs.clear() — removes all cached per-key rate limit configs (loaded from DB by get_rate_limit_config() in src/db/rate_limits.py)
get_rate_limit_manager.cache_clear() — removes the LRU-cached manager reference, causing next get_rate_limit_manager() call to instantiate a fresh RateLimitManager
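The two-level clear can be reproduced with functools.lru_cache. In this sketch RateLimitManagerSketch is a hypothetical stand-in that holds only key_configs; get_rate_limit_manager reuses the documented name for readability, and the bare @lru_cache decorator (Python 3.8+) is an assumption about how the real function is decorated.

```python
from functools import lru_cache


class RateLimitManagerSketch:
    """Hypothetical stand-in: only the in-memory per-key config cache."""

    def __init__(self) -> None:
        self.key_configs: dict[str, dict] = {}


@lru_cache
def get_rate_limit_manager() -> RateLimitManagerSketch:
    # The first call builds the instance; later calls return the cached one
    return RateLimitManagerSketch()


def clear_rate_limit_cache() -> None:
    manager = get_rate_limit_manager()
    if manager is not None:
        # Level 1: anyone still holding this instance now sees an empty cache
        manager.key_configs.clear()
    # Level 2: the next get_rate_limit_manager() call builds a fresh manager
    get_rate_limit_manager.cache_clear()
```

Clearing key_configs first matters because any module that already holds a reference to the old manager keeps using that instance; cache_clear() alone would leave its stale configs in place.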

2.4 Side Effects

No database writes.
In-memory cache cleared: RateLimitManager.key_configs dict emptied.
LRU cache reset: get_rate_limit_manager LRU cache cleared — next rate-limited request will instantiate a new RateLimitManager and reload configs from DB.
Performance impact: First few requests after clearing will incur DB lookup latency for rate limit config (~5–20ms per key).
No Redis operations (Redis stores rate limit counters, not configs — those remain intact).
No direct Prometheus metrics.
Audit log: audit_logger.log_api_key_usage on every authenticated call.
Rate limiting not disrupted: Existing Redis sliding window counters are unaffected. Only the in-memory config cache is cleared. Active connections continue to be tracked.

Issue: #1628

Deep-Dive API Documentation: DELETE /admin/users/by-domain/{domain}

Section 1: High-Level Overview

This admin endpoint deletes all user accounts whose email address ends with @{domain} (case-insensitive). It has a critical safety mechanism: dry_run=true (the default) performs a preview-only operation that returns the users that would be deleted without deleting them. Six major email providers (gmail.com, yahoo.com, outlook.com, hotmail.com, icloud.com, protonmail.com) are hardcoded as protected and cannot be targeted. When dry_run=false, it deletes users one by one in a loop, continuing past individual failures.

Section 2: Low-Level Deep-Dive

2.1 Requirements & Pipeline

Authentication chain: Depends(require_admin) — full chain documented in Issue #1614.

Path parameter: domain: str — email domain (e.g., "spam-domain.org")

Query parameter:

Parameter	Type	Default	Description
`dry_run`	bool	true	If true: preview only, no deletions

Handler-level validation (raises HTTP 400):

Domain is normalized: .lower().strip()
Domain in protected set: {gmail.com, yahoo.com, outlook.com, hotmail.com, icloud.com, protonmail.com} → HTTP 400

Supabase query for finding users: SELECT id, email, username, created_at, credits FROM users WHERE email ILIKE '%@{domain}'

Response (200 OK — dry_run=true):

{
  "status": "success",
  "message": "DRY RUN: Would delete N users from domain: {domain}",
  "dry_run": true,
  "count": int,
  "users": [{ "id": int, "email": str, "username": str, "created_at": str, "credits": float }],
  "timestamp": str
}

Response (200 OK — dry_run=false):

{
  "status": "success",
  "message": "Deleted N users from domain: {domain}",
  "dry_run": false,
  "count": int,            // successful deletions
  "failed": [{ "id": int, "error": str }],
  "users": [{ ... }],     // all matching users (including failed deletions)
  "timestamp": str
}

Response (200 OK — no users found):

{
  "status": "success",
  "message": "No users found with email domain: {domain}",
  "dry_run": bool,
  "count": 0,
  "users": [],
  "timestamp": str
}

Error codes:

Code	Condition
400	Domain is in the protected domains set
401	Invalid/missing credentials
402	Trial expired
403	Not admin
500	Supabase query failure

2.2 Mermaid Diagram

flowchart TD
    A([DELETE /admin/users/by-domain/domain]) --> B[require_admin]
    B -->|fail| C[401/402/403]
    B -->|OK| D[Normalize: domain.lower.strip]
    D --> E{domain in protected_domains?}
    E -->|yes| F[HTTP 400 Cannot delete protected domain]
    E -->|no| G[SELECT id email username created_at credits\nFROM users WHERE email ILIKE %@domain]
    G --> H{users_to_delete empty?}
    H -->|empty| I[Return 200 count=0 empty list]
    H -->|found| J[Build user_summary list]
    J --> K{dry_run == true?}
    K -->|yes| L[logger.info DRY RUN log\nReturn 200 with user list\nno deletions]
    K -->|no| M[For each user in users_to_delete]
    M --> N[DELETE FROM users WHERE id=user.id]
    N -->|success| O[deleted_count += 1\nlogger.info user deleted]
    N -->|exception| P[logger.error\nfailed_deletions.append id+error]
    O --> Q{more users?}
    P --> Q
    Q -->|yes| M
    Q -->|done| R[Return 200 with count+failed+users]

2.3 Complete Dependency Map

Dependency	File	Operation	Details
`require_admin`	`src/security/deps.py:220`	Auth	Full chain
`get_supabase_client`	`src/config/supabase_config.py`	DB	PostgREST client
`users` table SELECT	Supabase	SELECT	`client.table("users").select("id, email, username, created_at, credits").ilike("email", f"%@{domain}")` — case-insensitive suffix match
`users` table DELETE	Supabase	DELETE	Per-user: `client.table("users").delete().eq("id", user["id"]).execute()` — individual DELETE per user in a loop

Protected domains (hardcoded set): gmail.com, yahoo.com, outlook.com, hotmail.com, icloud.com, protonmail.com

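Cascade behavior: When a user row is deleted, child rows in tables with foreign keys to users (api_keys_new, credit_transactions, chat_history, etc.) are deleted, nullified, or protected according to each constraint's ON DELETE rule in PostgreSQL. If a constraint restricts deletion (RESTRICT or NO ACTION), that user's DELETE fails and the user is recorded in the failed list. The handler itself does not handle cascades explicitly.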

2.4 Side Effects

When dry_run=false:

Database deletes: One DELETE per matching user in users table. Cascade rules apply to related tables.
Audit trail via logger.info: Each successful deletion is logged with user_id, email, domain, and admin_id.
No atomic transaction: Deletions are done in a loop. Partial failures leave some users deleted and others intact. The failed list captures which user IDs could not be deleted.
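The non-atomic loop can be illustrated with a sketch in which the database call is injected as `delete_fn`, a hypothetical parameter standing in for `client.table("users").delete().eq("id", ...).execute()`. The function name and the trimmed return shape (no `status`, `message`, or `timestamp`) are simplifying assumptions, not the handler's code.

```python
from typing import Any, Callable


def delete_users_by_domain(
    users: list,
    delete_fn: Callable[[Any], None],
    dry_run: bool = True,
) -> dict:
    """Sketch of the dry-run preview and the per-user delete loop."""
    if dry_run:
        # Preview only: delete_fn is never called.
        return {"dry_run": True, "count": len(users), "users": users}

    deleted_count = 0
    failed = []
    for user in users:
        try:
            delete_fn(user["id"])  # one DELETE per user, no shared transaction
            deleted_count += 1
        except Exception as exc:  # record the failure and keep going
            failed.append({"id": user["id"], "error": str(exc)})
    # "users" still lists every matching user, including failed deletions.
    return {"dry_run": False, "count": deleted_count, "failed": failed, "users": users}
```

With `dry_run=False`, a failure on one user is captured in `failed` while the remaining users are still deleted, which is exactly the partial-failure state described above.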

Always:

Audit log: audit_logger.log_api_key_usage on every authenticated call.
No Redis operations (no cache invalidation of deleted users from any cache).
No direct Prometheus metrics.
Potential data loss: This operation is irreversible when dry_run=false. The dry_run=true default is a critical safety feature.

Issue: #1634

Deep-Dive API Documentation: GET /admin/monitoring/api-key-tracking-quality

Handler: `get_api_key_tracking_quality()` — `src/routes/api_key_monitoring.py`

Overview

Admin-only endpoint that analyzes the quality of API key tracking in chat_completion_requests. It queries how many requests have a non-null api_key_id vs. null, and produces recommendations based on thresholds.

Authentication

Type: HTTP Bearer (Authorization: Bearer <token>)

Dependency: get_admin_key() in src/security/deps.py

Reads ADMIN_API_KEY from environment variable
Uses secrets.compare_digest() for constant-time comparison (timing-attack-safe)
Input validation: key must be non-empty and at least 10 characters (ensure_api_key_like)
On failure: logs INVALID_ADMIN_KEY_ATTEMPT to audit logger
Returns HTTP 401 if missing, invalid, or key not configured
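The checks listed above can be sketched as a small predicate. This is an approximation under stated assumptions: `check_admin_key` is a hypothetical name, it returns `False` where `get_admin_key()` raises HTTP 401, and it does no audit logging.

```python
import os
import secrets
from typing import Optional


def check_admin_key(presented: Optional[str], env_var: str = "ADMIN_API_KEY") -> bool:
    """Return True only if `presented` matches the configured admin key.

    The real dependency raises HTTP 401 (and writes an audit log entry)
    wherever this sketch returns False.
    """
    configured = os.environ.get(env_var)
    if not configured:
        return False  # ADMIN_API_KEY not set: every request is rejected
    if not presented or len(presented) < 10:
        return False  # fails the non-empty / minimum-length check
    # Compare as bytes in constant time, so response timing does not reveal
    # how many leading characters matched.
    return secrets.compare_digest(presented.encode(), configured.encode())
```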

Query Parameters

Parameter	Type	Default	Validation	Description
`hours`	`int`	`24`	`ge=1, le=168`	Time window in hours (1–168)

Supabase Queries (All on table: `chat_completion_requests`)

Query 1 — Total requests in window:

SELECT *, COUNT(*) FROM chat_completion_requests
WHERE created_at >= <start_time> AND created_at <= <end_time>

Query 2 — Requests with api_key_id (non-null):

SELECT *, COUNT(*) FROM chat_completion_requests
WHERE created_at >= <start_time> AND created_at <= <end_time>
AND api_key_id IS NOT NULL

Query 3 — Requests without api_key_id (null):

SELECT *, COUNT(*) FROM chat_completion_requests
WHERE created_at >= <start_time> AND created_at <= <end_time>
AND api_key_id IS NULL

Query 4 — Null api_key_id but valid user_id (authenticated but untracked):

SELECT *, COUNT(*) FROM chat_completion_requests
WHERE created_at >= <start_time> AND created_at <= <end_time>
AND api_key_id IS NULL AND user_id IS NOT NULL

Query 5 — Both null (anonymous traffic):

SELECT *, COUNT(*) FROM chat_completion_requests
WHERE created_at >= <start_time> AND created_at <= <end_time>
AND api_key_id IS NULL AND user_id IS NULL

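All queries use count="exact", so PostgreSQL computes an exact COUNT that the handler reads from result.count. The SELECT *, COUNT(*) lines above are shorthand for a select("*") call with that count option, not literal SQL; unless the call requests headers only, PostgREST also returns the matching rows in the response body (up to its row limit), even though only the count is used.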

Business Logic

Tracking rate calculation:

tracking_rate = round((requests_with_key / total_requests) * 100, 2) if total_requests > 0 else 0

Alert status thresholds:

"ok" — tracking_rate >= 90%
"warning" — 70% <= tracking_rate < 90%
"critical" — tracking_rate < 70%

Recommendation triggers:

null_key_valid_user > 0 → API key lookup failure warning
both_null > total_requests * 0.2 → High anonymous traffic warning (>20% threshold)
tracking_rate < 90 → General tracking below threshold warning
All good → "No action needed"
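A sketch of how the triggers combine. Only the "No action needed" string appears in this document's example response; the warning texts below are placeholders, and `recommendations` is a hypothetical name.

```python
def recommendations(total: int, with_key: int, null_key_valid_user: int, both_null: int) -> list:
    """Collect warnings using the documented trigger conditions (message texts are placeholders)."""
    rate = round((with_key / total) * 100, 2) if total > 0 else 0
    recs = []
    if null_key_valid_user > 0:
        # Authenticated traffic without an api_key_id points at a key-lookup failure.
        recs.append(f"{null_key_valid_user} authenticated requests have no api_key_id.")
    if both_null > total * 0.2:
        recs.append("More than 20% of requests have neither api_key_id nor user_id.")
    if rate < 90:
        recs.append(f"Tracking rate {rate}% is below the 90% threshold.")
    if not recs:
        recs.append("API key tracking quality is good. No action needed.")
    return recs
```

Because the first trigger fires on any positive count, a fully clean result requires null_key_with_valid_user to be exactly 0, even when the overall tracking rate is well above 90%.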

Response Schema

{
  "total_requests": 1500,
  "requests_with_api_key": 1425,
  "requests_without_api_key": 75,
  "tracking_rate_percent": 95.0,
  "breakdown": {
    "null_key_with_valid_user": 20,
    "both_null_likely_anonymous": 55,
    "null_key_with_valid_user_percent": 1.33,
    "both_null_percent": 3.67
  },
  "time_window": {
    "hours": 24,
    "start_time": "2026-03-03T12:00:00+00:00",
    "end_time": "2026-03-04T12:00:00+00:00"
  },
  "alert_status": "ok",
  "recommendations": ["API key tracking quality is good. No action needed."]
}

Error Handling

Scenario	Behavior
Any unhandled exception	`logger.error()` with `exc_info=True`, returns dict with `"error"` key and `"alert_status": "error"`
Admin key missing/invalid	`HTTP 401` raised before handler executes
No data in time window	Returns zeros, `tracking_rate_percent: 0`, `alert_status: "ok"`

Error response shape (does NOT raise HTTPException — returns 200 with error info):

{
  "error": "...",
  "total_requests": 0,
  "requests_with_api_key": 0,
  "requests_without_api_key": 0,
  "tracking_rate_percent": 0,
  "alert_status": "error",
  "recommendations": ["Failed to retrieve tracking quality metrics. Check logs."]
}

No Redis / No Prometheus Metrics

This endpoint does not use Redis caching or emit Prometheus metrics. It performs direct Supabase queries on every call.

Router Registration

router = APIRouter(prefix="/admin/monitoring", tags=["Admin", "Monitoring"])
# Full path: GET /admin/monitoring/api-key-tracking-quality

Issue: #1635

Deep-Dive API Documentation: GET /admin/monitoring/api-key-tracking-trend

Handler: `get_api_key_tracking_trend()` — `src/routes/api_key_monitoring.py`

Overview

Admin-only endpoint that provides a daily time-series breakdown of API key tracking quality over a configurable number of days. Iterates day-by-day using a loop, performing 2 Supabase queries per day.

Authentication

Type: HTTP Bearer (Authorization: Bearer <token>)

Dependency: get_admin_key() in src/security/deps.py

Reads ADMIN_API_KEY from environment variable
Uses secrets.compare_digest() for constant-time comparison
Returns HTTP 401 if missing, invalid, or environment variable not set
Logs INVALID_ADMIN_KEY_ATTEMPT security violation to audit logger on failure

Query Parameters

Parameter	Type	Default	Validation	Description
`days`	`int`	`7`	`ge=1, le=30`	Number of days to analyze (1–30)

Supabase Queries

Per-day loop — for each of days iterations:

Query A — Total requests for day N:

SELECT *, COUNT(*) FROM chat_completion_requests
WHERE created_at >= <day_start> AND created_at < <day_end>

Query B — Requests with api_key_id for day N:

SELECT *, COUNT(*) FROM chat_completion_requests
WHERE created_at >= <day_start> AND created_at < <day_end>
AND api_key_id IS NOT NULL

Date ranges use gte/lt (NOT lte) so days are non-overlapping
Both use count="exact" mode
Total Supabase queries = 2 × days (up to 60 queries for days=30)
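A sketch of the day windows, assuming they are anchored at start_time = now - days (consistent with "day_offset 0 = start_time" for this endpoint). `day_windows` is a hypothetical helper; it only builds the half-open intervals that the two per-day queries would use.

```python
from datetime import datetime, timedelta, timezone


def day_windows(days: int, now: datetime) -> list:
    """Half-open [day_start, day_end) windows, oldest first, ending at `now`."""
    start_time = now - timedelta(days=days)
    windows = []
    for day_offset in range(days):
        day_start = start_time + timedelta(days=day_offset)
        day_end = day_start + timedelta(days=1)
        # Each window would be queried with .gte(day_start) and .lt(day_end):
        # two count queries per window, so 2 * days round trips in total.
        windows.append((day_start, day_end))
    return windows


now = datetime(2026, 3, 4, 12, 0, tzinfo=timezone.utc)
for start, end in day_windows(3, now):
    print(start.isoformat(), "->", end.isoformat())
```

Because each window's end equals the next window's start and the upper bound is exclusive, a row created exactly on a boundary is counted in one day only.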

Business Logic

Per-day tracking rate:

tracking_rate = round((with_key / total) * 100, 2) if total > 0 else 0

Summary calculation (post-loop aggregation):

total_all = sum(d["total_requests"] for d in trend_data)
with_key_all = sum(d["requests_with_api_key"] for d in trend_data)
avg_tracking_rate = round((with_key_all / total_all) * 100, 2) if total_all > 0 else 0
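The summary formula pools the raw counts before dividing, so the average is volume-weighted rather than a mean of the daily percentages. A sketch using the first two example days (hypothetical `summarize` helper):

```python
def summarize(trend_data: list) -> dict:
    """Pool the per-day counts, then compute one overall rate (volume-weighted)."""
    total_all = sum(d["total_requests"] for d in trend_data)
    with_key_all = sum(d["requests_with_api_key"] for d in trend_data)
    avg = round((with_key_all / total_all) * 100, 2) if total_all > 0 else 0
    return {
        "total_requests": total_all,
        "requests_with_api_key": with_key_all,
        "average_tracking_rate_percent": avg,
    }


days = [
    {"total_requests": 500, "requests_with_api_key": 490, "tracking_rate_percent": 98.0},
    {"total_requests": 620, "requests_with_api_key": 600, "tracking_rate_percent": 96.77},
]
print(summarize(days)["average_tracking_rate_percent"])  # 97.32
```

A plain mean of the two daily percentages would give roughly 97.4; the pooled figure gives busier days proportionally more weight.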

Response Schema

{
  "trend_data": [
    {
      "date": "2026-02-26",
      "total_requests": 500,
      "requests_with_api_key": 490,
      "tracking_rate_percent": 98.0
    },
    {
      "date": "2026-02-27",
      "total_requests": 620,
      "requests_with_api_key": 600,
      "tracking_rate_percent": 96.77
    }
  ],
  "summary": {
    "period_days": 7,
    "total_requests": 3500,
    "requests_with_api_key": 3400,
    "average_tracking_rate_percent": 97.14,
    "start_date": "2026-02-25",
    "end_date": "2026-03-04"
  }
}

trend_data array is chronologically ordered, oldest day first (day_offset 0 = start_time, day_offset N-1 = most recent).

Error Handling

Scenario	Behavior
Any unhandled exception	`logger.error()`, returns 200 with error dict
Empty database	Returns trend_data with zeros per day, summary zeros
Admin key invalid	`HTTP 401` before handler executes

Error response shape (200 status, not an exception):

{
  "error": "...",
  "trend_data": [],
  "summary": {
    "period_days": 7,
    "total_requests": 0,
    "requests_with_api_key": 0,
    "average_tracking_rate_percent": 0
  }
}

Performance Note

For days=30, this endpoint fires 60 synchronous Supabase HTTP calls sequentially. There is no batching, parallelism, or caching. Large time windows on tables with high row counts may be slow.

No Redis / No Prometheus Metrics

This endpoint does not use Redis caching or emit Prometheus metrics. All computation is in-process after direct DB queries.

Router Registration

router = APIRouter(prefix="/admin/monitoring", tags=["Admin", "Monitoring"])
# Full path: GET /admin/monitoring/api-key-tracking-trend

Issue: #1719

API Endpoint Documentation: GET /admin/coupons

Overview

Handler: list_coupons_endpoint() in src/routes/coupons.py (line 205)
Tags: ["admin", "coupons"]
Authentication: Required - require_admin (admin role)

Pydantic Schemas

Query Parameters

Param	Type	Default	Description
`scope`	`str | None`	`None`	Filter by coupon_scope ("user_specific" or "global")
`coupon_type`	`str | None`	`None`	Filter by coupon type
`is_active`	`bool | None`	`None`	Filter by active status
`limit`	`int`	`100`	Max results
`offset`	`int`	`0`	Pagination offset

Response: `ListCouponsResponse`

Field	Type	Description
`coupons`	`list[CouponResponse]`	List of coupon records
`total`	`int`	Count of returned coupons (not total in DB)
`offset`	`int`	Current offset
`limit`	`int`	Applied limit

`CouponResponse`

Field	Type	Default
`id`	`int`	-
`code`	`str`	-
`value_usd`	`float`	-
`coupon_scope`	`str`	-
`coupon_type`	`str`	-
`max_uses`	`int`	-
`times_used`	`int`	-
`valid_from`	`datetime`	-
`valid_until`	`datetime`	-
`is_active`	`bool`	-
`created_at`	`datetime`	-
`assigned_to_user_id`	`int | None`	`None`
`created_by`	`int | None`	`None`
`created_by_type`	`str`	-
`description`	`str | None`	`None`

Dependency Trace (3+ levels deep)

list_coupons_endpoint(scope, coupon_type, is_active, limit, offset, user)
├── Depends(require_admin)                       # src/security/deps.py:220
│   ├── Depends(get_current_user)
│   │   └── (auth chain: get_api_key → validate_api_key_security → get_user → validate_trial)
│   └── Check is_admin or role=="admin"
│       └── If not admin → 403 + audit log
├── list_coupons(scope, coupon_type, is_active,  # src/db/coupons.py:135
│     limit, offset)
│   ├── get_supabase_client()
│   └── client.table("coupons").select("*")
│       + conditional .eq("coupon_scope", scope)
│       + conditional .eq("coupon_type", coupon_type)
│       + conditional .eq("is_active", is_active)
│       + .order("created_at", desc=True)
│       + .range(offset, offset + limit - 1)
│       + .execute()
└── Return ListCouponsResponse with [CouponResponse(**c)]

Supabase Queries

Operation	Table	Columns	Filters	Order	Pagination
SELECT	`coupons`	`*`	Optional: `coupon_scope`, `coupon_type`, `is_active`	`created_at DESC`	`.range(offset, offset+limit-1)`
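A sketch of how the optional filters and pagination map onto the query. `list_query_plan` is a hypothetical helper that describes the query instead of executing it; the real code chains `.eq()`, `.order()` and `.range()` on the Supabase builder.

```python
from typing import Optional


def list_query_plan(
    scope: Optional[str] = None,
    coupon_type: Optional[str] = None,
    is_active: Optional[bool] = None,
    limit: int = 100,
    offset: int = 0,
) -> dict:
    """Describe the filters, ordering and inclusive row range for one page of coupons."""
    filters = {}
    for column, value in (
        ("coupon_scope", scope),
        ("coupon_type", coupon_type),
        ("is_active", is_active),
    ):
        # `is not None` (not truthiness), so is_active=False still becomes a filter.
        if value is not None:
            filters[column] = value
    return {
        "filters": filters,  # each entry becomes one .eq(column, value)
        "order": ("created_at", "desc"),
        # PostgREST ranges are inclusive on both ends, hence the "- 1".
        "range": (offset, offset + limit - 1),
    }
```

The `is not None` test is a design choice in this sketch: a plain truthiness check would silently drop an `is_active=False` filter.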

Redis Operations

None directly.

Prometheus Metrics

None.

Middleware Effects

Standard middleware pipeline
Bearer token authentication + admin role verification
Audit log on unauthorized admin access attempt

Error Handling

Error Path	Status Code	Detail
Auth failures	401/402/403/404/429	Various auth errors
Non-admin user	403	"Administrator privileges required" + audit log
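Supabase query error in `list_coupons`	200	Error is swallowed: `list_coupons` returns `[]`, so the response contains an empty `coupons` list
Any other exception	500	"Internal server error"

Because `list_coupons` returns [] on a Supabase error, a database failure is indistinguishable from "no matching coupons" in the response.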

Mermaid Diagram

flowchart TD
    A[GET /admin/coupons] --> B[require_admin dependency]
    B -->|Not admin| B1[403 Admin required + audit log]
    B -->|Auth fail| B2[401/402/404/429]
    B -->|Admin| C{try block}
    C --> D[list_coupons with filters]
    D --> E[Build Supabase query]
    E --> F{scope filter?}
    F -->|Yes| F1[.eq coupon_scope]
    F -->|No| G{coupon_type filter?}
    F1 --> G
    G -->|Yes| G1[.eq coupon_type]
    G -->|No| H{is_active filter?}
    G1 --> H
    H -->|Yes| H1[.eq is_active]
    H -->|No| I[.order + .range pagination]
    H1 --> I
    I --> J[Execute query]
    J --> K[Map to CouponResponse list]
    K --> L[Return ListCouponsResponse]
    C -->|HTTPException| M[Re-raise]
    C -->|Other Exception| N[500 Internal server error]

Issue: #1720

API Endpoint Documentation: GET /admin/coupons/stats/overview

Overview

Handler: get_coupon_stats_endpoint() in src/routes/coupons.py (line 342)
Tags: ["admin", "coupons"]
Authentication: Required - require_admin (admin role)

Pydantic Schemas

Response: `CouponStatsResponse`

Field	Type	Description
`total_coupons`	`int`	Total number of coupons in system
`active_coupons`	`int`	Currently active coupons
`user_specific_coupons`	`int`	User-specific scoped coupons
`global_coupons`	`int`	Globally scoped coupons
`total_redemptions`	`int`	Total number of redemptions
`unique_redeemers`	`int`	Unique users who redeemed
`total_value_distributed`	`float`	Total USD distributed
`average_redemption_value`	`float`	Average redemption value

Dependency Trace (3+ levels deep)

get_coupon_stats_endpoint(user)
├── Depends(require_admin)                       # (admin auth chain)
├── get_all_coupons_stats()                      # src/db/coupons.py:557
│   ├── get_supabase_client()
│   ├── client.table("coupons").select("*").execute()
│   │   └── Fetches ALL coupons (no pagination!)
│   ├── client.table("coupon_redemptions").select("*").execute()
│   │   └── Fetches ALL redemptions (no pagination!)
│   └── Aggregations:
│       ├── Filter active coupons (is_active=True)
│       ├── Filter by coupon_scope ("user_specific" vs "global")
│       ├── Sum value_applied across all redemptions
│       ├── Count unique user_ids in redemptions
│       └── Calculate average_redemption_value
└── Return CouponStatsResponse(**stats)

Supabase Queries

Operation	Table	Columns	Filters	Notes
SELECT	`coupons`	`*`	None	Fetches all rows
SELECT	`coupon_redemptions`	`*`	None	Fetches all rows

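Performance Warning: Both queries fetch all rows without pagination, which could be slow with large datasets. Supabase also caps the rows returned per request (the project's max-rows setting, 1,000 by default), so once either table exceeds that cap the aggregates would be computed over a truncated result set.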
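The aggregation step in `get_all_coupons_stats()` can be sketched as a pure function over the two result sets. The name `coupon_stats`, the defensive `.get()` lookups and the 0.0 average for zero redemptions are assumptions of this sketch.

```python
def coupon_stats(coupons: list, redemptions: list) -> dict:
    """Aggregate the documented stats from full coupons and coupon_redemptions result sets."""
    total_value = sum(float(r.get("value_applied") or 0) for r in redemptions)
    redemption_count = len(redemptions)
    return {
        "total_coupons": len(coupons),
        "active_coupons": sum(1 for c in coupons if c.get("is_active")),
        "user_specific_coupons": sum(1 for c in coupons if c.get("coupon_scope") == "user_specific"),
        "global_coupons": sum(1 for c in coupons if c.get("coupon_scope") == "global"),
        "total_redemptions": redemption_count,
        "unique_redeemers": len({r.get("user_id") for r in redemptions}),
        "total_value_distributed": total_value,
        # Assumption: 0.0 when there are no redemptions, to avoid dividing by zero.
        "average_redemption_value": total_value / redemption_count if redemption_count else 0.0,
    }
```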

Redis Operations

None.

Prometheus Metrics

None.

Middleware Effects

Standard middleware pipeline
Admin authentication required

Error Handling

Error Path	Status Code	Detail
Auth failures	401/402/403/404/429	Various auth errors
Non-admin	403	"Administrator privileges required"
Supabase error in `get_all_coupons_stats`	500	Function returns `{}`; `CouponStatsResponse(**{})` then fails Pydantic validation, which the generic handler turns into "Internal server error"
Any other exception	500	"Internal server error"

Mermaid Diagram

flowchart TD
    A[GET /admin/coupons/stats/overview] --> B[require_admin dependency]
    B -->|Not admin| B1[403 Admin required]
    B -->|Auth fail| B2[401/402/404/429]
    B -->|Admin| C{try block}
    C --> D[get_all_coupons_stats]
    D --> E[SELECT * FROM coupons]
    E --> F[SELECT * FROM coupon_redemptions]
    F --> G[Filter active coupons]
    G --> H[Filter by scope: user_specific vs global]
    H --> I[Sum total_value_distributed]
    I --> J[Count unique redeemers]
    J --> K[Calculate average_redemption_value]
    K --> L[Return CouponStatsResponse]
    C -->|HTTPException| M[Re-raise]
    C -->|Other Exception| N[500 Internal server error]

Issue: #1721

API Endpoint Documentation: GET /admin/coupons/{coupon_id}

Overview

Handler: get_coupon_endpoint() in src/routes/coupons.py (line 244)
Tags: ["admin", "coupons"]
Authentication: Required - require_admin (admin role)

Pydantic Schemas

Path Parameters

Param	Type	Description
`coupon_id`	`int`	Coupon ID to retrieve

Response: `CouponResponse`

Field	Type	Default
`id`	`int`	-
`code`	`str`	-
`value_usd`	`float`	-
`coupon_scope`	`str`	-
`coupon_type`	`str`	-
`max_uses`	`int`	-
`times_used`	`int`	-
`valid_from`	`datetime`	-
`valid_until`	`datetime`	-
`is_active`	`bool`	-
`created_at`	`datetime`	-
`assigned_to_user_id`	`int | None`	`None`
`created_by`	`int | None`	`None`
`created_by_type`	`str`	-
`description`	`str | None`	`None`

Dependency Trace (3+ levels deep)

get_coupon_endpoint(coupon_id, user)
├── Depends(require_admin)                    # (admin auth chain)
├── get_coupon_by_id(coupon_id)               # src/db/coupons.py:118
│   ├── get_supabase_client()
│   └── client.table("coupons")
│       .select("*")
│       .eq("id", coupon_id)
│       .execute()
└── If None → 404; else Return CouponResponse(**coupon)

Supabase Queries

Operation	Table	Columns	Filters
SELECT	`coupons`	`*`	`.eq("id", coupon_id)`

Redis Operations

None.

Prometheus Metrics

None.

Middleware Effects

Standard middleware pipeline
Admin authentication required

Error Handling

Error Path	Status Code	Detail
Auth failures	401/402/403/404/429	Various auth errors
Non-admin	403	"Administrator privileges required"
Coupon not found	404	"Coupon not found"
Supabase error in `get_coupon_by_id`	404	Error is logged and `None` is returned, so a database failure surfaces as "Coupon not found"
Any other exception	500	"Internal server error"

Mermaid Diagram

flowchart TD
    A[GET /admin/coupons/coupon_id] --> B[require_admin dependency]
    B -->|Not admin| B1[403 Admin required]
    B -->|Admin| C{try block}
    C --> D[get_coupon_by_id]
    D --> E[SELECT * FROM coupons WHERE id = coupon_id]
    E --> F{Coupon found?}
    F -->|No| G[404 Coupon not found]
    F -->|Yes| H[Return CouponResponse]
    C -->|HTTPException| I[Re-raise]
    C -->|Other Exception| J[500 Internal server error]

Issue: #1722

API Endpoint Documentation: GET /admin/coupons/{coupon_id}/analytics

Overview

Handler: get_coupon_analytics_endpoint() in src/routes/coupons.py (line 314)
Tags: ["admin", "coupons"]
Authentication: Required - require_admin (admin role)

Pydantic Schemas

Path Parameters

Param	Type	Description
`coupon_id`	`int`	Coupon ID to get analytics for

Response: `CouponAnalyticsResponse`

Field	Type	Description
`coupon`	`CouponResponse`	Full coupon details
`total_redemptions`	`int`	Total redemptions for this coupon
`unique_users`	`int`	Unique users who redeemed
`total_value_distributed`	`float`	Total USD distributed
`redemption_rate`	`float`	% of max_uses consumed
`remaining_uses`	`int`	max_uses - times_used
`is_expired`	`bool`	Whether valid_until has passed

Dependency Trace (3+ levels deep)

get_coupon_analytics_endpoint(coupon_id, user)
├── Depends(require_admin)                       # (admin auth chain)
├── get_coupon_analytics(coupon_id)               # src/db/coupons.py:509
│   ├── get_coupon_by_id(coupon_id)               # src/db/coupons.py:118
│   │   ├── get_supabase_client()
│   │   └── SELECT * FROM coupons WHERE id = coupon_id
│   ├── If coupon not found → return {}
│   ├── get_supabase_client()
│   ├── client.table("coupon_redemptions")
│   │   .select("*")
│   │   .eq("coupon_id", coupon_id)
│   │   .execute()
│   └── Compute:
│       ├── total_value_distributed = sum(value_applied)
│       ├── unique_users = len(set(user_id))
│       ├── redemption_rate = (count / max_uses * 100)
│       ├── remaining_uses = max_uses - times_used
│       ├── is_expired = valid_until < now(UTC)
│       └── recent_redemptions = last 10 (not exposed in response)
└── Return CouponAnalyticsResponse

Supabase Queries

Operation	Table	Columns	Filters
SELECT	`coupons`	`*`	`.eq("id", coupon_id)`
SELECT	`coupon_redemptions`	`*`	`.eq("coupon_id", coupon_id)`

Redis Operations

None.

Prometheus Metrics

None.

Error Handling

Error Path	Status Code	Detail
Auth failures	401/402/403/404/429	Various auth errors
Non-admin	403	"Administrator privileges required"
Coupon not found (`get_coupon_analytics` returns `{}`)	404	"Coupon not found"
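Supabase error while loading the coupon	404	`get_coupon_by_id` logs the error and returns `None`, so `get_coupon_analytics` returns `{}`
Any other Supabase or unexpected error	500	"Internal server error"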

Mermaid Diagram

flowchart TD
    A[GET /admin/coupons/coupon_id/analytics] --> B[require_admin dependency]
    B -->|Not admin| B1[403]
    B -->|Admin| C{try block}
    C --> D[get_coupon_analytics]
    D --> E[get_coupon_by_id]
    E --> F{Coupon found?}
    F -->|No| G[Return empty dict]
    F -->|Yes| H[SELECT * FROM coupon_redemptions WHERE coupon_id]
    H --> I[Sum value_applied]
    I --> J[Count unique user_ids]
    J --> K[Calculate redemption_rate]
    K --> L[Check is_expired]
    L --> M[Return analytics dict]
    G --> N{analytics empty?}
    M --> N
    N -->|Empty| O[404 Coupon not found]
    N -->|Has data| P[Return CouponAnalyticsResponse]
    C -->|HTTPException| Q[Re-raise]
    C -->|Other Exception| R[500 Internal server error]

Issue: #1724

API Endpoint Documentation: POST /admin/coupons

Overview

Handler: create_coupon_endpoint() in src/routes/coupons.py (line 163)
Tags: ["admin", "coupons"]
Authentication: Required - require_admin (admin role)

Pydantic Schemas

Request: `CreateCouponRequest`

Field	Type	Default	Validation
`code`	`str`	required	`min_length=3`, `max_length=50`, must be alphanumeric (hyphens/underscores OK), uppercased
`value_usd`	`float`	required	`gt=0`, `le=1000`
`coupon_scope`	`CouponScope`	required	`"user_specific"` or `"global"`
`max_uses`	`int`	required	`gt=0`; if user_specific, must be 1
`valid_until`	`datetime`	required	Expiration date
`coupon_type`	`CouponType`	`"promotional"`	`"promotional"`, `"referral"`, `"compensation"`, `"partnership"`
`assigned_to_user_id`	`int | None`	`None`	Required for user_specific, forbidden for global
`description`	`str | None`	`None`	`max_length=500`
`valid_from`	`datetime | None`	`None`	Defaults to now

Validators:

code_must_be_alphanumeric: Strips non-alphanumeric (except - and _), uppercases
validate_user_assignment: Cross-validates scope vs assigned_to_user_id
validate_max_uses: user_specific requires max_uses=1
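The three validators can be approximated in plain Python. `normalize_code` and `validate_scope` are hypothetical names, the character class assumes ASCII letters and digits, and the real model is a Pydantic class with field validators rather than free functions.

```python
import re
from typing import Optional


def normalize_code(code: str) -> str:
    # Drop everything except ASCII letters, digits, '-' and '_', then uppercase.
    return re.sub(r"[^A-Za-z0-9_-]", "", code).upper()


def validate_scope(coupon_scope: str, assigned_to_user_id: Optional[int], max_uses: int) -> None:
    """Cross-check scope against assignment and max_uses, raising ValueError on conflict."""
    if coupon_scope == "user_specific":
        if assigned_to_user_id is None:
            raise ValueError("user_specific coupons require assigned_to_user_id")
        if max_uses != 1:
            raise ValueError("user_specific coupons must have max_uses=1")
    elif coupon_scope == "global" and assigned_to_user_id is not None:
        raise ValueError("global coupons cannot have assigned_to_user_id")


print(normalize_code("summer sale!-2026"))  # SUMMERSALE-2026
```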

Response: `CouponResponse` (same as #1721)

Dependency Trace (3+ levels deep)

create_coupon_endpoint(coupon_request, user)
├── Depends(require_admin)                         # (admin auth chain)
├── create_coupon(...)                              # src/db/coupons.py:20
│   ├── get_supabase_client()
│   ├── Validate scope + assignment:
│   │   ├── user_specific without assigned_to → ValueError
│   │   ├── global with assigned_to → ValueError
│   │   └── user_specific with max_uses != 1 → ValueError
│   ├── Prepare coupon_data dict:
│   │   ├── code → uppercased
│   │   ├── value_usd, coupon_scope, max_uses, coupon_type
│   │   ├── created_by_type, valid_until, valid_from (default: now)
│   │   ├── conditional: created_by, assigned_to_user_id, description
│   │   └── Note: code stored UPPERCASED
│   └── client.table("coupons").insert(coupon_data).execute()
├── If result is None → 500 "Failed to create coupon"
└── Return CouponResponse(**coupon)

Supabase Queries

Operation	Table	Columns Inserted	Notes
INSERT	`coupons`	`code, value_usd, coupon_scope, max_uses, coupon_type, created_by_type, valid_until, valid_from, [created_by, assigned_to_user_id, description]`	Code uppercased; unique constraint on code likely enforced at DB level

Redis Operations

None.

Prometheus Metrics

None.

Error Handling

Error Path	Status Code	Detail
Auth/admin failures	401/402/403/404/429	Various
Pydantic validation (code format, scope rules)	422	Automatic
Scope/assignment ValueError in `create_coupon`	400	Error message
Insert returns None	500	"Failed to create coupon"
Duplicate code (DB constraint)	500 via exception	"Internal server error"
Any other exception	500	"Internal server error"

Mermaid Diagram

flowchart TD
    A[POST /admin/coupons] --> B[Pydantic validation]
    B -->|Invalid code/scope/max_uses| B1[422]
    B -->|Valid| C[require_admin]
    C -->|Not admin| C1[403]
    C -->|Admin| D{try block}
    D --> E[create_coupon]
    E --> F[Validate scope + assignment rules]
    F -->|Invalid| F1[ValueError → 400]
    F -->|Valid| G[Prepare coupon_data]
    G --> H[INSERT INTO coupons]
    H --> I{Insert result?}
    I -->|None| J[500 Failed to create]
    I -->|Data| K[Return CouponResponse]
    D -->|HTTPException| L[Re-raise]
    D -->|ValueError| M[400 error detail]
    D -->|Other| N[500 Internal server error]

Issue: #1725

API Endpoint Documentation: PATCH /admin/coupons/{coupon_id}

Overview

Handler: update_coupon_endpoint() in src/routes/coupons.py (line 262)
Tags: ["admin", "coupons"]
Authentication: Required - require_admin (admin role)

Pydantic Schemas

Path Parameters

Param	Type	Description
`coupon_id`	`int`	Coupon ID to update

Request: `UpdateCouponRequest`

Field	Type	Default	Validation
`valid_until`	`datetime | None`	`None`	Optional new expiration
`max_uses`	`int | None`	`None`	`gt=0`
`is_active`	`bool | None`	`None`	Toggle active status
`description`	`str | None`	`None`	`max_length=500`

All fields are optional; only set fields are included via exclude_unset=True.

Response: `CouponResponse` (same as #1721)

Dependency Trace (3+ levels deep)

update_coupon_endpoint(coupon_id, update_request, user)
├── Depends(require_admin)                     # (admin auth chain)
├── update_request.dict(exclude_unset=True)    # Only fields explicitly set
│   └── If empty → 400 "No fields to update"
├── update_coupon(coupon_id, updates)           # src/db/coupons.py:192
│   ├── get_supabase_client()
│   ├── Filter updates to allowed_fields only:
│   │   └── ["valid_until", "max_uses", "is_active", "description"]
│   │   └── Any other fields silently dropped
│   ├── If no valid fields after filtering → ValueError
│   └── client.table("coupons")
│       .update(filtered_updates)
│       .eq("id", coupon_id)
│       .execute()
└── If None → 404; else Return CouponResponse

Supabase Queries

Operation	Table	Columns Updated	Filters
UPDATE	`coupons`	Only allowed: `valid_until`, `max_uses`, `is_active`, `description`	`.eq("id", coupon_id)`

Security Note: The DB layer enforces an allowlist of updatable fields. Even if extra fields are sent in the request, they are silently dropped by update_coupon().
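The allowlist step in `update_coupon()` amounts to a dictionary filter. A sketch with a hypothetical `filter_updates` name:

```python
ALLOWED_FIELDS = frozenset({"valid_until", "max_uses", "is_active", "description"})


def filter_updates(updates: dict) -> dict:
    """Keep only allowlisted columns; anything else is dropped without error."""
    filtered = {key: value for key, value in updates.items() if key in ALLOWED_FIELDS}
    if not filtered:
        raise ValueError("No valid fields to update")
    return filtered
```

Note that `exclude_unset=True` keeps fields the client set explicitly to null, so a body such as {"description": null} passes the filter and would set the column to NULL.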

Redis Operations

None.

Prometheus Metrics

None.

Error Handling

Error Path	Status Code	Detail
Auth/admin failures	401/402/403/404/429	Various
No fields set in request	400	"No fields to update"
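No valid fields after allowlist filter	500	`update_coupon` raises `ValueError`, which falls through to the generic exception handler; not normally reachable here, since `UpdateCouponRequest` only defines allowed fields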
Coupon not found (update returns None)	404	"Coupon not found or update failed"
Any other exception	500	"Internal server error"

Mermaid Diagram

flowchart TD
    A[PATCH /admin/coupons/coupon_id] --> B[require_admin]
    B -->|Not admin| B1[403]
    B -->|Admin| C{try block}
    C --> D[update_request.dict exclude_unset]
    D --> E{Any updates?}
    E -->|No| F[400 No fields to update]
    E -->|Yes| G[update_coupon]
    G --> H[Filter to allowed fields]
    H --> I{Valid fields remain?}
    I -->|No| J[ValueError raised]
    I -->|Yes| K[UPDATE coupons SET ... WHERE id = coupon_id]
    K --> L{Update result?}
    L -->|None| M[404 Not found or update failed]
    L -->|Data| N[Return CouponResponse]
    C -->|HTTPException| O[Re-raise]
    C -->|Other| P[500 Internal server error]

Issue: #1726

API Endpoint Documentation: DELETE /admin/coupons/{coupon_id}

Overview

Handler: deactivate_coupon_endpoint() in src/routes/coupons.py (line 291)
Tags: ["admin", "coupons"]
Authentication: Required - require_admin (admin role)
Note: This is a soft delete: it sets is_active = False on the coupon instead of removing the row.

Pydantic Schemas

Path Parameters

Param	Type	Description
`coupon_id`	`int`	Coupon ID to deactivate

Response

{"success": True, "message": "Coupon deactivated successfully"}

Dependency Trace (3+ levels deep)

deactivate_coupon_endpoint(coupon_id, user)
├── Depends(require_admin)                    # (admin auth chain)
├── deactivate_coupon(coupon_id)              # src/db/coupons.py:228
│   ├── get_supabase_client()
│   └── client.table("coupons")
│       .update({"is_active": False})
│       .eq("id", coupon_id)
│       .execute()
│       └── Returns True if data returned, False otherwise
└── If False → 404; else Return success dict

Supabase Queries

Operation	Table	Columns Updated	Filters
UPDATE	`coupons`	`is_active = False`	`.eq("id", coupon_id)`

Redis Operations

None.

Prometheus Metrics

None.

Error Handling

Error Path	Status Code	Detail
Auth/admin failures	401/402/403/404/429	Various
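Coupon not found	404	"Coupon not found or already inactive". Despite the wording, an already-inactive coupon still matches `.eq("id", coupon_id)`, so the update returns its row and the endpoint responds 200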
Supabase error in `deactivate_coupon`	Returns `False` → 404	Logs error
Any other exception	500	"Internal server error"

Mermaid Diagram

flowchart TD
    A[DELETE /admin/coupons/coupon_id] --> B[require_admin]
    B -->|Not admin| B1[403]
    B -->|Admin| C{try block}
    C --> D[deactivate_coupon]
    D --> E[UPDATE coupons SET is_active=False WHERE id]
    E --> F{Update returned data?}
    F -->|No| G[404 Not found or already inactive]
    F -->|Yes| H[Return success: true]
    C -->|HTTPException| I[Re-raise]
    C -->|Other| J[500 Internal server error]

Issue: #1737

API Endpoint Documentation: GET /admin/downtime/incidents

Handler: `list_downtime_incidents()` in `src/routes/downtime_logs.py`

1. Overview

Lists downtime incidents with optional filtering by status, severity, and environment. Admin-only endpoint requiring authentication through the full auth chain (API key -> user lookup -> admin role check).

Router: APIRouter() (no prefix)
Tags: ["admin", "monitoring"]
Auth: require_admin (Bearer token -> API key validation -> user lookup -> admin role check)
HTTP Method: GET
Return Type: dict[str, Any]

2. Request

Query Parameters

Parameter	Type	Default	Validation	Description
`limit`	`int`	`50`	`ge=1, le=500`	Max incidents to return
`status`	`str | None`	`None`	regex: `^(ongoing|resolved|investigating)$`	Filter by status
`severity`	`str | None`	`None`	regex: `^(low|medium|high|critical)$`	Filter by severity
`environment`	`str | None`	`None`	none	Filter by environment
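The constraints in the table above can be approximated outside FastAPI. `validate_incident_filters` is a hypothetical helper that raises `ValueError` where FastAPI's `Query()` validation would return HTTP 422.

```python
import re
from typing import Optional

STATUS_PATTERN = r"^(ongoing|resolved|investigating)$"
SEVERITY_PATTERN = r"^(low|medium|high|critical)$"


def validate_incident_filters(
    limit: int = 50, status: Optional[str] = None, severity: Optional[str] = None
) -> None:
    """Raise ValueError for inputs the documented Query() constraints would reject."""
    if not 1 <= limit <= 500:
        raise ValueError("limit must be between 1 and 500")
    # fullmatch avoids re.match's quirk of letting "$" match before a trailing newline.
    if status is not None and re.fullmatch(STATUS_PATTERN, status) is None:
        raise ValueError(f"invalid status: {status!r}")
    if severity is not None and re.fullmatch(SEVERITY_PATTERN, severity) is None:
        raise ValueError(f"invalid severity: {severity!r}")
```

The patterns are case-sensitive, so "HIGH" or "Resolved" would be rejected rather than normalized.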

Headers

Authorization: Bearer <api_key> (required)

3. Response

Success (200)

{
  "status": "success",
  "total_incidents": 5,
  "ongoing": 1,
  "resolved": 4,
  "incidents": [
    {
      "id": "uuid",
      "started_at": "2026-03-01T00:00:00+00:00",
      "detected_at": "2026-03-01T00:01:00+00:00",
      "health_endpoint": "/health",
      "error_message": "Connection refused",
      "http_status_code": 503,
      "status": "resolved",
      "severity": "high",
      "environment": "production",
      "ended_at": "2026-03-01T00:15:00+00:00",
      "logs_captured": [...],
      "log_count": 150,
      "resolved_by": "admin:user@example.com",
      "notes": "Resolution notes"
    }
  ]
}

Error Responses

Status	Condition
401	Missing/invalid API key
402	Trial expired
403	User is not admin
404	User not found
500	Internal server error

4. Dependency Trace (3+ levels deep)

Level 1: Route Handler

list_downtime_incidents() in src/routes/downtime_logs.py (line 33-79)

Level 2: Dependencies

require_admin from src/security/deps.py (FastAPI Depends)
get_recent_incidents() from src/db/downtime_incidents.py

Level 3: require_admin chain (deps.py)

require_admin() -> get_current_user() -> get_api_key() -> HTTPBearer()
get_current_user() calls get_user() from src/services/user_lookup_cache.py
get_current_user() calls validate_trial_expiration() from src/utils/trial_utils.py
require_admin() checks user.get("is_admin", False) or user.get("role") == "admin"
On failure: logs via audit_logger.log_security_violation() and raises 403
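The admin check is an OR of two independent signals, so a user with `role == "admin"` passes even when `is_admin` is false, and vice versa. A minimal sketch of that predicate follows; `Forbidden` and `require_admin_sketch` are hypothetical stand-ins for the real FastAPI dependency and its `HTTPException(403)`.

```python
from typing import Any


class Forbidden(Exception):
    """Hypothetical stand-in for the HTTPException(403) raised by require_admin()."""


def is_admin(user: dict[str, Any]) -> bool:
    # Either signal is sufficient: the is_admin flag OR role == "admin".
    return bool(user.get("is_admin", False)) or user.get("role") == "admin"


def require_admin_sketch(user: dict[str, Any]) -> dict[str, Any]:
    if not is_admin(user):
        # The real dependency first logs UNAUTHORIZED_ADMIN_ACCESS via audit_logger.
        raise Forbidden("admin access required")
    return user
```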

Level 4: get_recent_incidents() (db/downtime_incidents.py)

Calls execute_with_retry(_get_recent, max_retries=2, retry_delay=0.2)
_get_recent(client) builds Supabase query:
- Table: downtime_incidents
- Operation: SELECT *
- Optional filters: .eq("status", status), .eq("severity", severity), .eq("environment", environment)
- Order: .order("started_at", desc=True)
- Limit: .limit(limit)
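The query above maps onto the supabase-py builder chain (`table().select().eq().order().limit().execute()`). The sketch below is an illustration, not the project's `_get_recent`; it assumes each optional filter is applied only when a truthy value is supplied and that `.execute()` returns an object with a `.data` list.

```python
from typing import Any, Optional


def fetch_recent_incidents(client: Any, limit: int = 50,
                           status: Optional[str] = None,
                           severity: Optional[str] = None,
                           environment: Optional[str] = None) -> list[dict]:
    """Sketch of the downtime_incidents SELECT using the supabase-py query builder."""
    query = client.table("downtime_incidents").select("*")
    # Each optional filter narrows the result with an equality match.
    if status:
        query = query.eq("status", status)
    if severity:
        query = query.eq("severity", severity)
    if environment:
        query = query.eq("environment", environment)
    # Newest incidents first, capped at `limit` rows.
    result = query.order("started_at", desc=True).limit(limit).execute()
    return result.data or []
```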

Level 5: execute_with_retry() (config/supabase_config.py)

Retries up to max_retries (2) with retry_delay (0.2s) between attempts
Passes Supabase client to the operation callable
Handles connection errors with retry logic
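A minimal sketch of the retry pattern, under assumptions the source does not confirm: `max_retries` counts retries after the first attempt (so up to three calls in total), the delay is fixed rather than exponential, and only connection/timeout errors are retried. Unlike the real `execute_with_retry`, which obtains the Supabase client itself, this hypothetical `retry_operation` takes the client as an argument.

```python
import time
from typing import Any, Callable, TypeVar

T = TypeVar("T")


def retry_operation(operation: Callable[[Any], T], client: Any,
                    max_retries: int = 2, retry_delay: float = 0.2,
                    retry_on: tuple = (ConnectionError, TimeoutError)) -> T:
    """Call operation(client); on a transient error, sleep and retry up to max_retries times."""
    attempt = 0
    while True:
        try:
            return operation(client)
        except retry_on:
            if attempt >= max_retries:
                raise  # retries exhausted: re-raise the last error
            attempt += 1
            time.sleep(retry_delay)
```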

5. Supabase Queries

Table	Operation	Columns	Filters	Order	Limit
`downtime_incidents`	SELECT	`*`	Optional: `status`, `severity`, `environment` (all `.eq()`)	`started_at DESC`	`limit` param (default 50, max 500)

Retry config: max_retries=2, retry_delay=0.2s

6. Redis Operations

None directly. The get_user() call in the auth chain uses user_lookup_cache which may involve Redis caching.

7. Prometheus Metrics

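None directly emitted by this endpoint. Standard per-request metrics may still be recorded by the middleware pipeline (e.g. ObservabilityMiddleware); authentication itself runs as a dependency, not as middleware.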

8. Pydantic Schemas

None. Uses FastAPI Query() parameter validation with regex patterns. Return type is dict[str, Any].

9. Middleware Effects

Standard middleware pipeline (sentry, observability, timeout, security, gzip, trace)
Subject to ConcurrencyMiddleware
Authentication via require_admin dependency injection chain:
1. HTTPBearer() extracts Bearer token
2. get_api_key() validates API key (format, active status, expiration, IP allowlist, domain restrictions)
3. get_current_user() looks up user and validates trial expiration
4. require_admin() checks admin role

10. Error Handling

Exception	Status	Handler
`HTTPException` (from auth chain)	401/402/403/404	Re-raised at line 75-76
Generic `Exception`	500	Caught at line 77-79, logged with `exc_info=True`, raises `HTTPException(500, "Internal server error")`

Auth chain error paths:

Missing credentials -> 401
Invalid/inactive/expired API key -> 401
Rate limited key -> 429
IP/domain restriction -> 403
User not found -> 404
Trial expired -> 402
Not admin -> 403

11. Mermaid Diagram

flowchart TD
    A[GET /admin/downtime/incidents] --> B[require_admin dependency]
    B --> C[get_current_user]
    C --> D[get_api_key - validate Bearer token]
    D --> E{API key valid?}
    E -->|No| F[401/403/429 HTTPException]
    E -->|Yes| G[get_user from cache]
    G --> H{User found?}
    H -->|No| I[404 HTTPException]
    H -->|Yes| J[validate_trial_expiration]
    J --> K{Trial expired?}
    K -->|Yes| L[402 HTTPException]
    K -->|No| M{is_admin or role==admin?}
    M -->|No| N[403 HTTPException]
    M -->|Yes| O[Execute handler]
    O --> P[get_recent_incidents from Supabase]
    P --> Q[SELECT * FROM downtime_incidents with filters]
    Q --> R[execute_with_retry max_retries=2]
    R --> S[Calculate summary: total, ongoing, resolved counts]
    S --> T[Return success response]
    O -->|Exception| U[Log error, raise 500]

12. Complete Dependency Map

list_downtime_incidents()
├── src/security/deps.py::require_admin (Depends)
│   ├── get_current_user()
│   │   ├── get_api_key() -> validate_api_key_security()
│   │   │   └── src/security/security.py
│   │   ├── get_user() -> src/services/user_lookup_cache.py
│   │   └── validate_trial_expiration() -> src/utils/trial_utils.py
│   └── audit_logger.log_security_violation()
├── src/db/downtime_incidents.py::get_recent_incidents()
│   └── src/config/supabase_config.py::execute_with_retry()
│       └── Supabase client -> downtime_incidents table
└── logging (stdlib)

Issue: #1738

API Endpoint Documentation: GET /admin/downtime/incidents/ongoing

Handler: `list_ongoing_incidents()` in `src/routes/downtime_logs.py`

1. Overview

Lists all currently ongoing downtime incidents. Admin-only endpoint. A specialized, no-parameter version of the incidents list filtered to status=ongoing.

Router: APIRouter() (no prefix)
Tags: ["admin", "monitoring"]
Auth: require_admin (Bearer token -> API key -> user -> admin check)
HTTP Method: GET
Return Type: dict[str, Any]

2. Request

Headers

Authorization: Bearer <api_key> (required)

No query parameters.

3. Response

Success (200)

{
  "status": "success",
  "count": 2,
  "incidents": [
    {
      "id": "uuid",
      "started_at": "2026-03-01T00:00:00+00:00",
      "detected_at": "...",
      "status": "ongoing",
      "severity": "high",
      "environment": "production",
      ...
    }
  ]
}

Error Responses

Status	Condition
401	Missing/invalid API key
402	Trial expired
403	User is not admin
404	User not found
500	Internal server error

4. Dependency Trace (3+ levels deep)

Level 1: Route Handler

list_ongoing_incidents() in src/routes/downtime_logs.py (line 82-106)

Level 2: Dependencies

require_admin from src/security/deps.py (same auth chain as #1737)
get_ongoing_incidents() from src/db/downtime_incidents.py

Level 3: get_ongoing_incidents() (db/downtime_incidents.py line 209-234)

Calls execute_with_retry(_get_ongoing, max_retries=2, retry_delay=0.2)
_get_ongoing(client) builds Supabase query:
- Table: downtime_incidents
- Operation: SELECT *
- Filter: .eq("status", "ongoing")
- Order: .order("started_at", desc=True)
Returns result.data or empty list []
On exception: logs error, calls _maybe_log_missing_table_hint(), returns []

5. Supabase Queries

Table	Operation	Columns	Filters	Order
`downtime_incidents`	SELECT	`*`	`status = 'ongoing'`	`started_at DESC`

Retry config: max_retries=2, retry_delay=0.2s

6. Redis Operations

None directly. Auth chain may use user lookup cache.

7. Prometheus Metrics

None directly emitted.

8. Pydantic Schemas

None. Return type is dict[str, Any].

9. Middleware Effects

Same as #1737: standard pipeline + ConcurrencyMiddleware + require_admin auth chain.

10. Error Handling

Exception	Status	Handler
`HTTPException` (auth)	401/402/403/404	Re-raised at line 101-102
Generic `Exception`	500	Logged with `exc_info=True`, raises `HTTPException(500)` at line 103-106

Note: If the downtime_incidents table is missing, get_ongoing_incidents() catches the error internally and returns [] (empty list) rather than raising. The handler would then return {"status": "success", "count": 0, "incidents": []}.
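The fail-soft behavior described in this note can be sketched as follows. `run_query` is a hypothetical zero-argument callable standing in for the Supabase read; the point is that any failure collapses to `[]`, so the caller cannot tell "no ongoing incidents" apart from "the query failed".

```python
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)


def get_ongoing_fail_soft(run_query: Callable[[], list[dict[str, Any]]]) -> list[dict[str, Any]]:
    """Return query rows, or [] on any error, as get_ongoing_incidents() is described to do."""
    try:
        return run_query() or []
    except Exception as exc:  # deliberately broad: the read never raises to the handler
        logger.error("Failed to fetch ongoing incidents: %s", exc)
        return []


def ongoing_response(rows: list[dict[str, Any]]) -> dict[str, Any]:
    return {"status": "success", "count": len(rows), "incidents": rows}
```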

11. Mermaid Diagram

flowchart TD
    A[GET /admin/downtime/incidents/ongoing] --> B[require_admin auth chain]
    B --> C{Auth successful?}
    C -->|No| D[401/402/403/404 HTTPException]
    C -->|Yes| E[get_ongoing_incidents from Supabase]
    E --> F[SELECT * FROM downtime_incidents WHERE status=ongoing ORDER BY started_at DESC]
    F --> G[execute_with_retry max_retries=2]
    G --> H[Return count + incidents list]
    E -->|Exception| I[Log error, raise 500]

12. Complete Dependency Map

list_ongoing_incidents()
├── src/security/deps.py::require_admin (Depends)
│   └── (full auth chain: get_api_key -> get_current_user -> admin check)
├── src/db/downtime_incidents.py::get_ongoing_incidents()
│   └── src/config/supabase_config.py::execute_with_retry()
│       └── Supabase client -> downtime_incidents table
└── logging (stdlib)

Issue: #1739

API Endpoint Documentation: GET /admin/downtime/statistics

Handler: `get_downtime_statistics()` in `src/routes/downtime_logs.py`

1. Overview

Returns aggregated downtime statistics for a configurable time period, including total incidents, downtime duration, and breakdowns by severity and status. Admin-only endpoint.

Router: APIRouter() (no prefix)
Tags: ["admin", "monitoring"]
Auth: require_admin
HTTP Method: GET
Return Type: dict[str, Any]

2. Request

Query Parameters

Parameter	Type	Default	Validation	Description
`days`	`int`	`30`	`ge=1, le=365`	Number of days to analyze

Headers

Authorization: Bearer <api_key> (required)

3. Response

Success (200)

{
  "status": "success",
  "period_days": 30,
  "statistics": {
    "total_incidents": 12,
    "total_downtime_seconds": 3600,
    "average_duration_seconds": 300,
    "by_severity": {
      "high": 5,
      "critical": 2,
      "medium": 5
    },
    "by_status": {
      "resolved": 10,
      "ongoing": 2
    }
  }
}

Empty Period

{
  "status": "success",
  "period_days": 30,
  "statistics": {
    "total_incidents": 0,
    "total_downtime_seconds": 0,
    "average_duration_seconds": 0,
    "by_severity": {},
    "by_status": {}
  }
}

4. Dependency Trace (3+ levels deep)

Level 1: Route Handler

get_downtime_statistics() in src/routes/downtime_logs.py (line 360-388)

Level 2: Dependencies

require_admin from src/security/deps.py
get_incident_statistics(days) from src/db/downtime_incidents.py

Level 3: get_incident_statistics() (db/downtime_incidents.py line 335-390)

Calculates cutoff_dt as now() - (days * 86400) seconds
Calls get_incidents_by_date_range(cutoff_dt, now())
Aggregates: total count, total downtime from duration_seconds field, severity counts, status counts
Returns stats dict
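The time window can be sketched as below, assuming `now()` is a timezone-aware UTC datetime and that the bounds are passed to `.gte()`/`.lte()` as ISO-8601 strings (as the Level 4 query shows). `days * 86400` seconds is the same span as `timedelta(days=days)`.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def stats_window(days: int, now: Optional[datetime] = None) -> tuple[str, str]:
    """Return (start, end) ISO-8601 bounds for get_incidents_by_date_range()."""
    end = now or datetime.now(timezone.utc)
    start = end - timedelta(seconds=days * 86400)
    return start.isoformat(), end.isoformat()
```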

Level 4: get_incidents_by_date_range() (db/downtime_incidents.py line 279-309)

Calls execute_with_retry(_get_by_range, max_retries=2, retry_delay=0.2)
Supabase query:
- Table: downtime_incidents
- Operation: SELECT *
- Filters: .gte("started_at", start_date.isoformat()), .lte("started_at", end_date.isoformat())
- Order: .order("started_at", desc=True)

5. Supabase Queries

Table	Operation	Columns	Filters	Order
`downtime_incidents`	SELECT	`*`	`started_at >= cutoff_date AND started_at <= now()`	`started_at DESC`

Retry config: max_retries=2, retry_delay=0.2s

6. Redis Operations

None directly.

7. Prometheus Metrics

None directly emitted.

8. Pydantic Schemas

None. Return type is dict[str, Any].

9. Middleware Effects

Same as other admin endpoints: standard pipeline + ConcurrencyMiddleware + require_admin auth chain.

10. Error Handling

Exception	Status	Handler
`HTTPException` (auth)	401/402/403/404	Re-raised
Generic `Exception`	500	Logged, raises `HTTPException(500)`

Note: get_incident_statistics() has its own internal error handling and returns a zeroed-out stats dict on failure rather than raising. As a result, a Supabase failure surfaces as a 200 response with all-zero statistics, indistinguishable from an empty period.

11. Statistics Calculation Logic

total_downtime = sum(inc.get("duration_seconds", 0) for inc in incidents if inc.get("duration_seconds"))
average_duration = total_downtime // len(incidents) if incidents else 0
# severity_counts: count per severity value
# status_counts: count per status value
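A self-contained version of the aggregation above, with the per-severity and per-status counts filled in using `collections.Counter` (an illustrative choice; the source only says "count per value"). Note that the average divides by all incidents, including rows without `duration_seconds` (presumably still-ongoing incidents), which pulls the average down.

```python
from collections import Counter
from typing import Any


def compute_incident_stats(incidents: list[dict[str, Any]]) -> dict[str, Any]:
    """Aggregate incident rows into the `statistics` object of the response."""
    # Rows with a missing or falsy duration_seconds contribute 0.
    total_downtime = sum(inc.get("duration_seconds") or 0 for inc in incidents)
    return {
        "total_incidents": len(incidents),
        "total_downtime_seconds": total_downtime,
        # Integer division over ALL incidents, including ones without a duration.
        "average_duration_seconds": total_downtime // len(incidents) if incidents else 0,
        "by_severity": dict(Counter(inc.get("severity") for inc in incidents)),
        "by_status": dict(Counter(inc.get("status") for inc in incidents)),
    }
```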

12. Mermaid Diagram

flowchart TD
    A[GET /admin/downtime/statistics?days=30] --> B[require_admin auth chain]
    B --> C{Auth OK?}
    C -->|No| D[401/402/403/404]
    C -->|Yes| E[get_incident_statistics days=30]
    E --> F[Calculate cutoff_dt = now - 30 days]
    F --> G[get_incidents_by_date_range cutoff_dt to now]
    G --> H[SELECT * FROM downtime_incidents WHERE started_at BETWEEN dates]
    H --> I{Incidents found?}
    I -->|No| J[Return zeroed stats]
    I -->|Yes| K[Sum total_downtime from duration_seconds]
    K --> L[Count by severity and status]
    L --> M[Calculate average_duration]
    M --> N[Return statistics]

13. Complete Dependency Map

get_downtime_statistics()
├── src/security/deps.py::require_admin (Depends)
├── src/db/downtime_incidents.py::get_incident_statistics()
│   └── get_incidents_by_date_range()
│       └── execute_with_retry() -> Supabase downtime_incidents table
└── logging (stdlib)

Issue: #1740

API Endpoint Documentation: GET /admin/downtime/incidents/{incident_id}

Handler: `get_downtime_incident()` in `src/routes/downtime_logs.py`

1. Overview

Retrieves full details of a specific downtime incident by UUID, including captured logs and metadata. Admin-only endpoint.

Router: APIRouter() (no prefix)
Tags: ["admin", "monitoring"]
Auth: require_admin
HTTP Method: GET
Return Type: dict[str, Any]

2. Request

Path Parameters

Parameter	Type	Description
`incident_id`	`str`	UUID of the incident

Headers

Authorization: Bearer <api_key> (required)

3. Response

Success (200)

{
  "status": "success",
  "incident": {
    "id": "uuid",
    "started_at": "2026-03-01T00:00:00+00:00",
    "detected_at": "2026-03-01T00:01:00+00:00",
    "ended_at": "2026-03-01T00:15:00+00:00",
    "health_endpoint": "/health",
    "error_message": "Connection refused",
    "http_status_code": 503,
    "response_body": "...",
    "status": "resolved",
    "severity": "high",
    "environment": "production",
    "logs_captured": [...],
    "log_count": 150,
    "logs_file_path": null,
    "resolved_by": "admin:user@example.com",
    "notes": "Resolution notes",
    "server_info": {},
    "metrics_snapshot": {}
  }
}

Error Responses

Status	Condition
401	Missing/invalid API key
402	Trial expired
403	Not admin
404	Incident not found, or user not found during auth
500	Internal server error

4. Dependency Trace (3+ levels deep)

Level 1: Route Handler

get_downtime_incident() in src/routes/downtime_logs.py (line 109-139)

Level 2: Dependencies

require_admin from src/security/deps.py
get_incident(incident_id) from src/db/downtime_incidents.py

Level 3: get_incident() (db/downtime_incidents.py line 180-206)

Calls execute_with_retry(_get_incident, max_retries=2, retry_delay=0.2)
_get_incident(client):
- Table: downtime_incidents
- Operation: SELECT *
- Filter: .eq("id", str(incident_id))
Returns first row or None
On exception: logs error, calls _maybe_log_missing_table_hint(), returns None

5. Supabase Queries

Table	Operation	Columns	Filters
`downtime_incidents`	SELECT	`*`	`id = incident_id`

Retry config: max_retries=2, retry_delay=0.2s

6. Redis Operations

None directly.

7. Prometheus Metrics

None directly emitted.

8. Pydantic Schemas

None.

9. Middleware Effects

Standard pipeline + ConcurrencyMiddleware + require_admin auth chain.

10. Error Handling

Exception	Status	Handler
Auth chain failures	401/402/403/404	Re-raised
`get_incident()` returns `None`	404	`HTTPException(404, "Incident not found")` at line 128
`HTTPException` (any)	varies	Re-raised at line 135-136
Generic `Exception`	500	Logged, raises `HTTPException(500)` at line 137-139

Note: If get_incident() fails due to missing table/Supabase error, it returns None internally (does not raise), which the handler interprets as 404.
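The consequence of this note can be shown with a small sketch: because the lookup returns `None` both for "no such row" and for a swallowed query error, the handler's `None` check maps both to 404. `NotFound` and the `lookup` callable are hypothetical stand-ins for `HTTPException(404)` and `get_incident()`.

```python
from typing import Any, Callable, Optional


class NotFound(Exception):
    """Hypothetical stand-in for HTTPException(404, "Incident not found")."""


def first_row_or_none(rows: list[dict[str, Any]]) -> Optional[dict[str, Any]]:
    # get_incident() returns the first matching row, or None when there is none.
    return rows[0] if rows else None


def load_incident_or_404(lookup: Callable[[str], Optional[dict[str, Any]]],
                         incident_id: str) -> dict[str, Any]:
    incident = lookup(incident_id)
    # None covers both "no such row" and "query failed and was swallowed".
    if incident is None:
        raise NotFound("Incident not found")
    return {"status": "success", "incident": incident}
```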

11. Mermaid Diagram

flowchart TD
    A["GET /admin/downtime/incidents/{incident_id}"] --> B[require_admin auth chain]
    B --> C{Auth OK?}
    C -->|No| D[401/402/403/404]
    C -->|Yes| E[get_incident from Supabase]
    E --> F["SELECT * FROM downtime_incidents WHERE id = incident_id"]
    F --> G{Incident found?}
    G -->|No| H[404 Incident not found]
    G -->|Yes| I[Return success with incident data]
    E -->|Exception| J[Log error, raise 500]

12. Complete Dependency Map

get_downtime_incident()
├── src/security/deps.py::require_admin (Depends)
├── src/db/downtime_incidents.py::get_incident()
│   └── execute_with_retry() -> Supabase downtime_incidents table
└── logging (stdlib)

Issue: #1741

API Endpoint Documentation: GET /admin/downtime/incidents/{incident_id}/logs

Handler: `get_incident_logs()` in `src/routes/downtime_logs.py`

1. Overview

Retrieves and filters captured logs for a specific downtime incident. Supports filtering by log level, logger name, and full-text search. Admin-only endpoint.

Router: APIRouter() (no prefix)
Tags: ["admin", "monitoring"]
Auth: require_admin
HTTP Method: GET
Return Type: dict[str, Any]

2. Request

Path Parameters

Parameter	Type	Description
`incident_id`	`str`	UUID of the incident

Query Parameters

Parameter	Type	Default	Validation	Description
`level`	`str | None`	`None`	regex: `^(ERROR|WARNING|INFO|DEBUG)$`	Filter by log level
`logger_name`	`str | None`	`None`	none	Filter by logger name (e.g. `src.routes.chat`)
`search`	`str | None`	`None`	none	Case-insensitive search in log messages

Headers

Authorization: Bearer <api_key> (required)

3. Response

Success - Logs Found (200)

{
  "status": "success",
  "total_logs": 25,
  "total_captured": 150,
  "filters": {
    "level": "ERROR",
    "logger": null,
    "search": null
  },
  "logs": [
    {
      "timestamp": "2026-03-01T00:05:00+00:00",
      "level": "ERROR",
      "logger": "src.routes.chat",
      "message": "Provider timeout after 30s",
      "labels": {...}
    }
  ]
}

Success - No Logs (200)

{
  "status": "success",
  "message": "No logs captured for this incident",
  "total_logs": 0,
  "logs": []
}

4. Dependency Trace (3+ levels deep)

Level 1: Route Handler

get_incident_logs() in src/routes/downtime_logs.py (line 142-206)

Level 2: Dependencies

require_admin from src/security/deps.py
get_incident() from src/db/downtime_incidents.py
get_filtered_logs() from src/services/downtime_log_capture.py

Level 3: get_incident() -> Supabase query (same as #1740)

Level 3: get_filtered_logs() (downtime_log_capture.py line 425-456)

Pure in-memory filtering function (no I/O):

If level provided: filter where log.get("level") == level
If logger_name provided: filter where log.get("logger") == logger_name
If search_term provided: filter where search_term.lower() in log.get("message", "").lower()
Returns filtered list
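The filtering described above, combined with the two success shapes shown in section 3, can be sketched as follows. The function names are hypothetical, and `total_captured = len(logs_captured)` is an assumption (the source might use the stored `log_count` instead).

```python
from typing import Any, Optional


def get_filtered_logs_sketch(logs: list[dict[str, Any]], level: Optional[str] = None,
                             logger_name: Optional[str] = None,
                             search: Optional[str] = None) -> list[dict[str, Any]]:
    """Apply the optional filters in sequence (AND semantics)."""
    filtered = logs
    if level:
        filtered = [log for log in filtered if log.get("level") == level]
    if logger_name:
        filtered = [log for log in filtered if log.get("logger") == logger_name]
    if search:
        needle = search.lower()  # case-insensitive substring match
        filtered = [log for log in filtered if needle in log.get("message", "").lower()]
    return filtered


def build_logs_response(incident: dict[str, Any], level: Optional[str] = None,
                        logger_name: Optional[str] = None,
                        search: Optional[str] = None) -> dict[str, Any]:
    logs = incident.get("logs_captured") or []
    if not logs:
        return {"status": "success", "message": "No logs captured for this incident",
                "total_logs": 0, "logs": []}
    filtered = get_filtered_logs_sketch(logs, level, logger_name, search)
    return {
        "status": "success",
        "total_logs": len(filtered),
        "total_captured": len(logs),  # assumption: length of the stored list
        "filters": {"level": level, "logger": logger_name, "search": search},
        "logs": filtered,
    }
```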

5. Supabase Queries

Table	Operation	Columns	Filters
`downtime_incidents`	SELECT	`*`	`id = incident_id`

6. Redis Operations

None.

7. Prometheus Metrics

None.

8. Pydantic Schemas

None.

9. Middleware Effects

Standard pipeline + ConcurrencyMiddleware + require_admin auth chain.

10. Error Handling

Exception	Status	Handler
Auth chain failures	401/402/403/404	Re-raised
Incident not found (`get_incident()` returns None)	404	`HTTPException(404, "Incident not found")`
No `logs_captured` in incident	200	Returns `{"total_logs": 0, "logs": []}` (not an error)
`HTTPException` (any)	varies	Re-raised
Generic `Exception`	500	Logged, raises `HTTPException(500)`

11. Filtering Logic

# Applied sequentially - all filters are AND conditions
filtered = logs
if level:
    filtered = [log for log in filtered if log.get("level") == level]
if logger_name:
    filtered = [log for log in filtered if log.get("logger") == logger_name]
if search:
    filtered = [log for log in filtered if search.lower() in log.get("message", "").lower()]

12. Mermaid Diagram

flowchart TD
    A["GET /admin/downtime/incidents/{id}/logs"] --> B[require_admin auth]
    B --> C{Auth OK?}
    C -->|No| D[401/402/403/404]
    C -->|Yes| E[get_incident from Supabase]
    E --> F{Incident found?}
    F -->|No| G[404 Incident not found]
    F -->|Yes| H[Get logs_captured from incident]
    H --> I{Logs exist?}
    I -->|No| J[Return total_logs=0, empty logs array]
    I -->|Yes| K[get_filtered_logs with level, logger_name, search]
    K --> L[Apply level filter if provided]
    L --> M[Apply logger_name filter if provided]
    M --> N[Apply search filter if provided - case insensitive]
    N --> O[Return filtered logs with counts and filter metadata]

13. Complete Dependency Map

get_incident_logs()
├── src/security/deps.py::require_admin (Depends)
├── src/db/downtime_incidents.py::get_incident()
│   └── execute_with_retry() -> Supabase downtime_incidents table
├── src/services/downtime_log_capture.py::get_filtered_logs()
│   └── (pure in-memory filtering, no external deps)
└── logging (stdlib)

Issue: #1742

API Endpoint Documentation: GET /admin/downtime/incidents/{incident_id}/analysis

Handler: `analyze_incident_logs()` in `src/routes/downtime_logs.py`

1. Overview

Analyzes captured logs for a downtime incident, providing error statistics and patterns including error counts, warning counts, error type distribution, and top error messages. Admin-only endpoint.

Router: APIRouter() (no prefix)
Tags: ["admin", "monitoring"]
Auth: require_admin
HTTP Method: GET
Return Type: dict[str, Any]

2. Request

Path Parameters

Parameter	Type	Description
`incident_id`	`str`	UUID of the incident

Headers

Authorization: Bearer <api_key> (required)

3. Response

Success - Analysis (200)

{
  "status": "success",
  "incident_id": "uuid",
  "analysis": {
    "total_logs": 150,
    "error_count": 25,
    "warning_count": 40,
    "error_types": {
      "ConnectionError": 10,
      "TimeoutError": 8,
      "Unknown": 7
    },
    "top_errors": [
      ["Provider timeout after 30s", 8],
      ["Connection refused to database", 6],
      ["Redis connection lost", 3]
    ]
  }
}

Success - No Logs (200)

{
  "status": "success",
  "message": "No logs to analyze",
  "analysis": null
}

4. Dependency Trace (3+ levels deep)

Level 1: Route Handler

analyze_incident_logs() in src/routes/downtime_logs.py (line 209-255)

Level 2: Dependencies

require_admin from src/security/deps.py
get_incident() from src/db/downtime_incidents.py
analyze_logs_for_errors() from src/services/downtime_log_capture.py

Level 3: analyze_logs_for_errors() (downtime_log_capture.py line 459-492)

Pure in-memory analysis function:

Filters errors = logs where level == "ERROR"
Filters warnings = logs where level == "WARNING"
Counts error types from error_type field (default "Unknown")
Counts error messages (truncated to 200 chars)
Sorts top 10 errors by count descending
Returns analysis dict
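A self-contained sketch of the analysis steps listed above. It is an illustration, not the project's `analyze_logs_for_errors()`: error types and messages are counted over ERROR-level logs only (consistent with the example response, where the `error_types` counts sum to `error_count`), and a missing or empty `error_type` is bucketed as `"Unknown"`. Because messages are truncated to 200 characters before counting, messages that share their first 200 characters are counted together.

```python
from collections import Counter
from typing import Any


def analyze_logs(logs: list[dict[str, Any]], top_n: int = 10) -> dict[str, Any]:
    errors = [log for log in logs if log.get("level") == "ERROR"]
    warnings = [log for log in logs if log.get("level") == "WARNING"]
    # Missing (or empty) error_type values fall into the "Unknown" bucket.
    error_types = Counter(log.get("error_type") or "Unknown" for log in errors)
    # Truncate to 200 chars before counting, then rank by frequency.
    messages = Counter((log.get("message") or "")[:200] for log in errors)
    top_errors = sorted(messages.items(), key=lambda item: item[1], reverse=True)[:top_n]
    return {
        "total_logs": len(logs),
        "error_count": len(errors),
        "warning_count": len(warnings),
        "error_types": dict(error_types),
        "top_errors": [list(pair) for pair in top_errors],  # [message, count] pairs
    }
```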

5. Supabase Queries

Table	Operation	Columns	Filters
`downtime_incidents`	SELECT	`*`	`id = incident_id`

6. Redis Operations

None.

7. Prometheus Metrics

None.

8. Pydantic Schemas

None.

9. Middleware Effects

Standard pipeline + ConcurrencyMiddleware + require_admin auth chain.

10. Error Handling

Exception	Status	Handler
Auth chain failures	401/402/403/404	Re-raised
Incident not found	404	`HTTPException(404, "Incident not found")`
No logs in incident	200	Returns `{"analysis": null, "message": "No logs to analyze"}`
`HTTPException` (any)	varies	Re-raised
Generic `Exception`	500	Logged, raises `HTTPException(500)`

11. Analysis Algorithm

errors = [log for log in logs if log.get("level") == "ERROR"]
warnings = [log for log in logs if log.get("level") == "WARNING"]

# Count by error_type field
error_types = {}  # {"ConnectionError": 10, "TimeoutError": 8}

# Count by message (truncated to 200 chars)
error_messages = {}  # {"msg": count}
top_errors = sorted(error_messages.items(), key=lambda item: item[1], reverse=True)[:10]

12. Mermaid Diagram

flowchart TD
    A["GET /admin/downtime/incidents/{id}/analysis"] --> B[require_admin auth]
    B --> C{Auth OK?}
    C -->|No| D[401/402/403/404]
    C -->|Yes| E[get_incident from Supabase]
    E --> F{Incident found?}
    F -->|No| G[404 Incident not found]
    F -->|Yes| H[Get logs_captured from incident]
    H --> I{Logs exist?}
    I -->|No| J["Return analysis=null, message='No logs to analyze'"]
    I -->|Yes| K[analyze_logs_for_errors]
    K --> L[Filter ERROR level logs]
    K --> M[Filter WARNING level logs]
    K --> N[Count error_types]
    K --> O[Count and rank top 10 error messages]
    L --> P[Return analysis dict]
    M --> P
    N --> P
    O --> P

13. Complete Dependency Map

analyze_incident_logs()
├── src/security/deps.py::require_admin (Depends)
├── src/db/downtime_incidents.py::get_incident()
│   └── execute_with_retry() -> Supabase downtime_incidents table
├── src/services/downtime_log_capture.py::analyze_logs_for_errors()
│   └── (pure in-memory analysis, no external deps)
└── logging (stdlib)

Issue: #1743

API Endpoint Documentation: POST /admin/downtime/incidents/{incident_id}/capture-logs

Handler: `trigger_log_capture()` in `src/routes/downtime_logs.py`

1. Overview

Manually triggers log capture from Grafana Loki for an ongoing downtime incident. Queries Loki for logs from 5 minutes before the incident started up to the current time and stores them in the database. Admin-only endpoint.

Router: APIRouter() (no prefix)
Tags: ["admin", "monitoring"]
Auth: require_admin
HTTP Method: POST
Return Type: dict[str, Any]

2. Request

Path Parameters

Parameter	Type	Description
`incident_id`	`str`	UUID of the incident

Headers

Authorization: Bearer <api_key> (required)

No request body.

3. Response

Success (200)

{
  "status": "success",
  "message": "Log capture triggered",
  "result": {
    "success": true,
    "log_count": 250,
    "truncated": false,
    "storage": "database"
  }
}

Failure Results (still 200)

{
  "status": "success",
  "message": "Log capture triggered",
  "result": {
    "success": false,
    "log_count": 0,
    "error": "No logs found in Loki"
  }
}

4. Dependency Trace (3+ levels deep)

Level 1: Route Handler

trigger_log_capture() in src/routes/downtime_logs.py (line 258-304)

Level 2: Dependencies

require_admin from src/security/deps.py
get_incident() from src/db/downtime_incidents.py
capture_logs_for_ongoing_incident() from src/services/downtime_log_capture.py

Level 3: capture_logs_for_ongoing_incident() (downtime_log_capture.py line 377-396)

Delegates to capture_downtime_logs(incident_id, downtime_start, downtime_end=None, save_to_file=False)

Level 4: capture_downtime_logs() (downtime_log_capture.py line 235-332)

Calculates time range: start = downtime_start - 5 minutes, end = now() (ongoing)
Calls query_loki_logs(start, end) to fetch logs from Grafana Loki
If save_to_file=False (default for manual capture):
- Truncates to MAX_LOGS_TO_CAPTURE (10,000)
- Calls update_incident(incident_id, logs_captured=logs) to save to database

Level 5: query_loki_logs() (downtime_log_capture.py line 135-232)

Checks Config.LOKI_ENABLED - returns [] if disabled
Checks Config.LOKI_QUERY_URL - returns [] if not set
Makes HTTP GET to {LOKI_QUERY_URL}/loki/api/v1/query_range with:
- query: {app="gatewayz-api"}
- start: nanosecond timestamp
- end: nanosecond timestamp
- limit: 10,000
- direction: forward (chronological)
Auth: Basic auth with GRAFANA_LOKI_USERNAME / GRAFANA_LOKI_API_KEY if configured
Uses httpx.Client (sync) with timeout=30.0
Parses Loki stream response, extracts timestamps and log lines (JSON or plain text)
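Loki's `query_range` returns log results as `data.result`, a list of streams, each with a `stream` label map and `values` as `[<nanosecond timestamp string>, <log line>]` pairs. A minimal parsing sketch follows; the output field names (`timestamp_ns`, `labels`) are assumptions, and the real function's stored format (e.g. ISO `timestamp`) may differ. JSON lines keep their fields; plain-text lines are wrapped as `{"message": line}`.

```python
import json
from typing import Any


def parse_loki_streams(payload: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten a Loki query_range 'streams' response into time-ordered log dicts."""
    entries = []
    for stream in payload.get("data", {}).get("result", []):
        labels = stream.get("stream", {})
        for ts_ns, line in stream.get("values", []):
            try:
                parsed = json.loads(line)
                # Structured lines keep their fields; other JSON values are wrapped.
                record = parsed if isinstance(parsed, dict) else {"message": line}
            except json.JSONDecodeError:
                record = {"message": line}  # plain-text line
            entries.append({**record, "timestamp_ns": int(ts_ns), "labels": labels})
    # Streams come back separately; merge them into chronological order.
    entries.sort(key=lambda entry: entry["timestamp_ns"])
    return entries
```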

Level 5: update_incident() (db/downtime_incidents.py line 107-177)

Builds update dict with logs_captured and log_count
Supabase: UPDATE downtime_incidents SET logs_captured=..., log_count=... WHERE id=incident_id
Retry config: max_retries=2, retry_delay=0.2s

5. Supabase Queries

Table	Operation	Columns	Filters	Notes
`downtime_incidents`	SELECT	`*`	`id = incident_id`	Get incident details
`downtime_incidents`	UPDATE	`logs_captured`, `log_count`	`id = incident_id`	Store captured logs

6. Redis Operations

None.

7. Prometheus Metrics

None directly emitted.

8. External API Calls

Service	Method	URL	Auth	Timeout
Grafana Loki	GET	`{LOKI_QUERY_URL}/loki/api/v1/query_range`	Basic (GRAFANA_LOKI_USERNAME/GRAFANA_LOKI_API_KEY)	30s

Loki Query Parameters

Param	Value
`query`	`{app="gatewayz-api"}`
`start`	`(incident_started_at - 5min)` in nanoseconds
`end`	`now()` in nanoseconds
`limit`	10,000
`direction`	`forward`
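The parameter table can be sketched as a builder function. Loki accepts `start`/`end` as Unix nanoseconds; the sketch computes them with integer arithmetic to avoid float rounding from `datetime.timestamp()`. `LOKI_QUERY_LIMIT` is a hypothetical constant name, and datetimes must be timezone-aware.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Optional

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
PRE_DOWNTIME_MINUTES = 5
LOKI_QUERY_LIMIT = 10_000  # hypothetical name for the documented limit


def to_unix_ns(dt: datetime) -> int:
    """Exact nanoseconds since the Unix epoch for an aware datetime."""
    delta = dt - EPOCH
    return (delta.days * 86_400 + delta.seconds) * 1_000_000_000 + delta.microseconds * 1_000


def loki_query_params(started_at: datetime, now: Optional[datetime] = None) -> dict[str, Any]:
    end = now or datetime.now(timezone.utc)
    start = started_at - timedelta(minutes=PRE_DOWNTIME_MINUTES)
    return {
        "query": '{app="gatewayz-api"}',
        "start": to_unix_ns(start),
        "end": to_unix_ns(end),
        "limit": LOKI_QUERY_LIMIT,
        "direction": "forward",  # oldest entries first
    }
```

With httpx, the params dict would be passed as `params=` to `client.get(f"{LOKI_QUERY_URL}/loki/api/v1/query_range", ...)`, with `auth=(username, api_key)` for Basic auth when credentials are configured.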

9. Config Dependencies

Config	Env Var	Description
`Config.LOKI_ENABLED`	`LOKI_ENABLED`	Must be truthy for log capture to work
`Config.LOKI_QUERY_URL`	`LOKI_QUERY_URL`	Loki query endpoint base URL
`Config.GRAFANA_LOKI_USERNAME`	`GRAFANA_LOKI_USERNAME`	Basic auth username (optional)
`Config.GRAFANA_LOKI_API_KEY`	`GRAFANA_LOKI_API_KEY`	Basic auth password (optional)

10. Error Handling

Exception	Status	Handler
Auth chain failures	401/402/403/404	Re-raised
Incident not found	404	`HTTPException(404, "Incident not found")`
Incident not ongoing	400	`HTTPException(400, "Can only capture logs for ongoing incidents")`
`HTTPException` (any)	varies	Re-raised
Generic `Exception`	500	Logged, raises `HTTPException(500)`
Loki query fails	200	Returns `{"result": {"success": false, "error": "..."}}` (handled internally)

11. Constants

Constant	Value	Description
`PRE_DOWNTIME_MINUTES`	5	Minutes before incident to capture
`POST_DOWNTIME_MINUTES`	5	Minutes after incident to capture (not applied by this endpoint: for ongoing incidents the window ends at now())
`MAX_LOGS_TO_CAPTURE`	10,000	Max log entries to store

12. Mermaid Diagram

flowchart TD
    A["POST /admin/downtime/incidents/{id}/capture-logs"] --> B[require_admin auth]
    B --> C{Auth OK?}
    C -->|No| D[401/402/403/404]
    C -->|Yes| E[get_incident from Supabase]
    E --> F{Incident found?}
    F -->|No| G[404 Incident not found]
    F -->|Yes| H{Status == ongoing?}
    H -->|No| I[400 Can only capture logs for ongoing incidents]
    H -->|Yes| J[Parse started_at from incident]
    J --> K[capture_logs_for_ongoing_incident]
    K --> L[Calculate time range: started_at - 5min to now]
    L --> M{LOKI_ENABLED?}
    M -->|No| N["Return success=false, error='Loki not enabled'"]
    M -->|Yes| O[HTTP GET Loki /loki/api/v1/query_range]
    O --> P{Logs found?}
    P -->|No| Q["Return success=false, 'No logs found'"]
    P -->|Yes| R[Truncate to 10,000 max]
    R --> S[UPDATE downtime_incidents SET logs_captured, log_count]
    S --> T[Return success with log_count]

13. Complete Dependency Map

trigger_log_capture()
├── src/security/deps.py::require_admin (Depends)
├── src/db/downtime_incidents.py::get_incident()
│   └── execute_with_retry() -> Supabase downtime_incidents table (SELECT)
├── src/services/downtime_log_capture.py::capture_logs_for_ongoing_incident()
│   └── capture_downtime_logs()
│       ├── query_loki_logs() -> HTTP GET Grafana Loki API
│       │   ├── Config.LOKI_ENABLED
│       │   ├── Config.LOKI_QUERY_URL
│       │   ├── Config.GRAFANA_LOKI_USERNAME
│       │   ├── Config.GRAFANA_LOKI_API_KEY
│       │   └── httpx.Client (sync, timeout=30s)
│       └── update_incident() -> Supabase downtime_incidents table (UPDATE)
│           └── execute_with_retry()
├── datetime (stdlib)
└── logging (stdlib)

Issue: #1744

API Endpoint Documentation: POST /admin/downtime/incidents/{incident_id}/resolve

Handler: `resolve_downtime_incident()` in `src/routes/downtime_logs.py`

1. Overview

Manually resolves a downtime incident, setting its status to "resolved", recording the resolution timestamp, and storing the resolving admin's identity and optional notes. Admin-only endpoint.

Router: APIRouter() (no prefix)
Tags: ["admin", "monitoring"]
Auth: require_admin
HTTP Method: POST
Return Type: dict[str, Any]

2. Request

Path Parameters

Parameter	Type	Description
`incident_id`	`str`	UUID of the incident

Query Parameters

Parameter	Type	Default	Description
`notes`	`str | None`	`None`	Optional resolution notes

Headers

Authorization: Bearer <api_key> (required)

3. Response

Success (200)

{
  "status": "success",
  "message": "Incident resolved",
  "incident": {
    "id": "uuid",
    "status": "resolved",
    "ended_at": "2026-03-04T12:00:00+00:00",
    "resolved_by": "admin:user@example.com",
    "notes": "Fixed database connection pool"
  }
}

4. Dependency Trace (3+ levels deep)

Level 1: Route Handler

resolve_downtime_incident() in src/routes/downtime_logs.py (line 307-357)

Level 2: Dependencies

require_admin from src/security/deps.py
get_incident() from src/db/downtime_incidents.py
resolve_incident() from src/db/downtime_incidents.py

Level 3: resolve_incident() (db/downtime_incidents.py line 312-332)

Delegates to update_incident() with:
- ended_at=datetime.now(UTC)
- status="resolved"
- resolved_by=resolved_by
- notes=notes

Level 4: update_incident() (db/downtime_incidents.py line 107-177)

Builds update dict from provided fields
Calls execute_with_retry(_update_incident, max_retries=2, retry_delay=0.2)
Supabase: UPDATE downtime_incidents SET ended_at, status, resolved_by, notes WHERE id=incident_id
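A sketch of "builds update dict from provided fields", assuming (not confirmed by the source) that `None` arguments are omitted from the payload, that `ended_at` is serialized as ISO-8601, and that `log_count` is derived from `len(logs_captured)`. With supabase-py the payload would then be sent as `client.table("downtime_incidents").update(payload).eq("id", incident_id).execute()`.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def build_incident_update(status: Optional[str] = None,
                          ended_at: Optional[datetime] = None,
                          resolved_by: Optional[str] = None,
                          notes: Optional[str] = None,
                          logs_captured: Optional[list] = None) -> dict[str, Any]:
    """Collect only the provided fields into an UPDATE payload."""
    update: dict[str, Any] = {}
    if status is not None:
        update["status"] = status
    if ended_at is not None:
        update["ended_at"] = ended_at.isoformat()
    if resolved_by is not None:
        update["resolved_by"] = resolved_by
    if notes is not None:
        update["notes"] = notes
    if logs_captured is not None:
        # log_count mirrors the length of the stored list.
        update["logs_captured"] = logs_captured
        update["log_count"] = len(logs_captured)
    return update


example = build_incident_update(status="resolved", ended_at=datetime.now(timezone.utc))
```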

resolved_by Construction (handler line 339)

resolved_by = f"admin:{admin_user.get('email', admin_user.get('id'))}"

Uses admin's email, falling back to user ID.
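One subtlety of this expression: `dict.get(key, default)` uses the default only when the key is absent. If the user record contains `"email": None`, the result is `"admin:None"` rather than the ID. The sketch reproduces the documented expression and, for comparison, a hypothetical `or`-based variant that also falls back on a present-but-empty email.

```python
from typing import Any


def build_resolved_by(admin_user: dict[str, Any]) -> str:
    # Documented handler expression: the fallback applies only when "email" is ABSENT.
    return f"admin:{admin_user.get('email', admin_user.get('id'))}"


def build_resolved_by_or(admin_user: dict[str, Any]) -> str:
    # Hypothetical variant: also falls back when email is present but None/empty.
    return f"admin:{admin_user.get('email') or admin_user.get('id')}"
```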

5. Supabase Queries

Table	Operation	Columns	Filters
`downtime_incidents`	SELECT	`*`	`id = incident_id` (get incident)
`downtime_incidents`	UPDATE	`ended_at`, `status`, `resolved_by`, `notes`	`id = incident_id`

6. Redis Operations

None.

7. Prometheus Metrics

None.

8. Pydantic Schemas

None.

9. Middleware Effects

Standard pipeline + ConcurrencyMiddleware + require_admin auth chain.

10. Error Handling

Exception	Status	Handler
Auth chain failures	401/402/403/404	Re-raised
Incident not found	404	`HTTPException(404, "Incident not found")`
Incident already resolved	400	`HTTPException(400, "Incident is already resolved")`
`HTTPException` (any)	varies	Re-raised
Generic `Exception`	500	Logged, raises `HTTPException(500)`

11. Mermaid Diagram

flowchart TD
    A["POST /admin/downtime/incidents/{id}/resolve"] --> B[require_admin auth]
    B --> C{Auth OK?}
    C -->|No| D[401/402/403/404]
    C -->|Yes| E[get_incident from Supabase]
    E --> F{Incident found?}
    F -->|No| G[404 Incident not found]
    F -->|Yes| H{Status == resolved?}
    H -->|Yes| I[400 Incident is already resolved]
    H -->|No| J["Build resolved_by = admin:{email or id}"]
    J --> K[resolve_incident -> update_incident]
    K --> L["UPDATE downtime_incidents SET ended_at=now, status=resolved, resolved_by, notes WHERE id=..."]
    L --> M[Return success with updated incident]

12. Complete Dependency Map

resolve_downtime_incident()
├── src/security/deps.py::require_admin (Depends)
├── src/db/downtime_incidents.py::get_incident()
│   └── execute_with_retry() -> Supabase SELECT
├── src/db/downtime_incidents.py::resolve_incident()
│   └── update_incident()
│       └── execute_with_retry() -> Supabase UPDATE downtime_incidents
├── datetime (stdlib)
└── logging (stdlib)

Analytics

5 endpoints

Issue: #1631

Deep-Dive API Documentation: POST /v1/analytics/events

Section 1: High-Level Overview

This endpoint accepts a single analytics event from the frontend and forwards it to both Statsig and PostHog analytics platforms. It is designed to avoid ad-blocker interference with client-side analytics by routing events through the backend. Authentication is optional — authenticated users have their user ID resolved from the token, while unauthenticated requests use a caller-provided user_id or fall back to "anonymous".

Section 2: Low-Level Deep-Dive

2.1 Requirements & Pipeline

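Authentication: Depends(get_current_user) with the parameter annotated current_user: dict | None, i.e. intended as optional auth. The route treats a None value as an unauthenticated caller (non-fatal). Caveat: the `| None` annotation alone does not make a FastAPI dependency optional, so unauthenticated requests succeed only if get_current_user returns None (rather than raising) when credentials are missing or invalid.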

Request Schema (AnalyticsEvent):

{
  "event_name": str,                    // Required. Event name (e.g., "chat_message_sent")
  "user_id": str | null,               // Optional. Used if not authenticated
  "value": str | null,                 // Optional. Event value
  "metadata": dict[str, Any] | null    // Optional. Event metadata
}

User ID resolution logic:

If current_user is authenticated: user_id = str(current_user.get("user_id", "anonymous"))
Else if event.user_id provided: user_id = event.user_id
Else: user_id = "anonymous"
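A sketch of the resolution order above as a plain function. `resolve_event_user_id` is a hypothetical name; it assumes an authenticated caller arrives as a non-None dict and treats an empty `event.user_id` the same as a missing one.

```python
from __future__ import annotations

from typing import Any


def resolve_event_user_id(current_user: dict[str, Any] | None, event_user_id: str | None) -> str:
    """Documented precedence: authenticated user, then body user_id, then 'anonymous'."""
    if current_user is not None:
        # An authenticated dict without "user_id" resolves to "anonymous",
        # even if the request body supplied a user_id.
        return str(current_user.get("user_id", "anonymous"))
    if event_user_id:
        return event_user_id
    return "anonymous"
```

One consequence of this order: an authenticated caller cannot attribute an event to another user by setting `user_id` in the body, because the body value is consulted only when `current_user` is None.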

Response (200 OK):

{
  "success": true,
  "message": "Event '{event_name}' logged successfully"
}

Error codes:

Code	Condition
500	statsig_service.log_event or posthog_service.capture raises exception

2.2 Mermaid Diagram

flowchart TD
    A([POST /v1/analytics/events]) --> B[get_current_user optional auth]
    B -->|no/invalid creds| C[current_user = None]
    B -->|valid creds| D[current_user = user dict]
    C --> E{current_user set?}
    D --> E
    E -->|yes| F[user_id = str current_user.user_id or anonymous]
    E -->|no| G{event.user_id set?}
    G -->|yes| H[user_id = event.user_id]
    G -->|no| I[user_id = anonymous]
    F --> J[statsig_service.log_event\nuser_id, event_name, value, metadata]
    H --> J
    I --> J
    J --> K[posthog_service.capture\ndistinct_id=user_id\nevent=event_name\nproperties=metadata]
    K -->|exception| L[logger.error\nHTTP 500]
    K -->|OK| M[Return 200 success]

2.3 Complete Dependency Map

Dependency	File	Operation	Details
`get_current_user`	`src/security/deps.py:192`	Optional auth	Returns the user dict or raises; `get_current_user` raises if the user is not found. The analytics route annotates the parameter as `dict \| None`, but that annotation does not by itself make the dependency optional.
`statsig_service`	`src/services/statsig_service.py`	Event logging	Singleton `StatsigService` instance. `log_event(user_id, event_name, value, metadata)` — creates `StatsigUser` with `user_id`, calls `Statsig.log_event()`. Requires `STATSIG_SERVER_SECRET_KEY` env var. Falls back to logging-only if SDK unavailable.
`posthog_service`	`src/services/posthog_service.py`	Event capture	Singleton `PostHogService` instance. `capture(distinct_id, event, properties)` — calls PostHog Python SDK `client.capture()`. Requires `POSTHOG_API_KEY` env var. Uses async mode (`sync_mode=False`). Falls back gracefully if not initialized.
`StatsigService.log_event`	`src/services/statsig_service.py`	External API	Batches events via statsig-python-core SDK. Flush interval: 10s. Max queue size: 50.
`PostHogService.capture`	`src/services/posthog_service.py`	External API	Async PostHog capture. SDK: `posthog` Python package. Host: `POSTHOG_HOST` (default: `https://us.i.posthog.com`).

External API calls:

Statsig: Event batched locally, flushed to https://api.statsig.com every 10 seconds or when queue reaches 50 events
PostHog: Event sent asynchronously to POSTHOG_HOST (default: https://us.i.posthog.com)

Environment variables required:

STATSIG_SERVER_SECRET_KEY: Required for Statsig. Missing = service logs warning, operates in logging-only fallback.
POSTHOG_API_KEY: Required for PostHog. Missing = service disabled with warning log.
POSTHOG_HOST: Optional, defaults to https://us.i.posthog.com

2.4 Side Effects

No database writes.
No Redis operations.
External API calls (async/batched):
- Statsig: event queued locally, flushed in background
- PostHog: event sent asynchronously
No direct Prometheus metrics.
No audit log (analytics endpoint does not call audit_logger.log_api_key_usage).
Graceful degradation: If either analytics service is not configured, it logs a warning and falls back to a logging-only or disabled mode, so the endpoint still returns 200. An HTTP 500 occurs only if statsig_service.log_event or posthog_service.capture propagates an exception; both wrap their SDK calls in try/except internally, so whether a failure reaches the route depends on whether those handlers re-raise.

Issue: #1632

API Documentation: POST /v1/analytics/batch

High-Level Overview

This endpoint accepts a batch of analytics events in a single request and forwards each one to both Statsig and PostHog sequentially. It is the preferred method when the frontend needs to log multiple events at once (e.g., on page unload, after a session, or when catching up on buffered events). Authentication is optional; the authenticated user's ID is used as the default for all events in the batch, while individual events can override their user_id field.

2.1 Requirements & Pipeline

Authentication & Authorization:

Optional authentication. Uses get_current_user dependency.
Unauthenticated requests are accepted; user_id defaults to "anonymous" unless overridden per event.
If authenticated, the user's ID serves as the default for any event that does not specify its own user_id.

Request Schema:

[
  {
    "event_name": "chat_message_sent",
    "user_id": null,
    "value": null,
    "metadata": { "model": "openai/gpt-4o" }
  },
  {
    "event_name": "model_selected",
    "user_id": "override-user-456",
    "value": "openai/gpt-4o",
    "metadata": {}
  }
]

Schema: list[AnalyticsEvent] (each item is AnalyticsEvent from src/routes/analytics.py).

User ID Resolution (per event):

Uses event.user_id if set, otherwise falls back to the authenticated user's ID or "anonymous".

Response Schema:

{
  "success": true,
  "message": "2 events logged successfully"
}

Error Codes:

Code	Condition
500	Any Statsig or PostHog service call failure

2.2 Mermaid Diagram

sequenceDiagram
    participant C as Client (Frontend)
    participant R as Route Handler<br/>log_batch_events()
    participant Auth as get_current_user (optional)
    participant Statsig as Statsig Service
    participant PostHog as PostHog Service

    C->>R: POST /v1/analytics/batch [ {event_name, ...}, ... ]
    R->>Auth: Depends(get_current_user)
    alt Authenticated
        Auth-->>R: current_user
        R->>R: default user_id = str(current_user["user_id"])
    else Not authenticated
        Auth-->>R: None
        R->>R: default user_id = "anonymous"
    end

    loop For each event in events list
        R->>R: event_user_id = event.user_id or default user_id

        R->>Statsig: statsig_service.log_event(<br/>user_id=event_user_id,<br/>event_name, value, metadata)
        Statsig-->>R: OK

        R->>PostHog: posthog_service.capture(<br/>distinct_id=event_user_id,<br/>event=event_name, properties=metadata)
        PostHog-->>R: OK
    end

    R-->>C: 200 { success: true, message: "N events logged successfully" }

2.3 Complete Dependency Map

Category	Name	Location	Purpose
Route file	`analytics.py`	`src/routes/analytics.py`	Handler
Auth	`get_current_user`	`src/security/deps.py`	Optional user identification
Schema	`AnalyticsEvent`	`src/routes/analytics.py`	Per-event model
Service	`statsig_service`	`src/services/statsig_service.py`	Statsig event logging
Service	`posthog_service`	`src/services/posthog_service.py`	PostHog event capture
External	Statsig	SaaS	Analytics / feature flags
External	PostHog	SaaS	Product analytics
Framework	`FastAPI`, `APIRouter`, `Depends`, `HTTPException`	`fastapi`	HTTP layer
Logging	`logging`	stdlib	Error logging

2.4 Side Effects

External writes to Statsig: One log_event() call per event in the batch.
External writes to PostHog: One capture() call per event in the batch.
Processing is sequential (not concurrent): Events are iterated in order; a slow network call to Statsig or PostHog for one event will delay processing subsequent events. There is no parallelism or timeout per event.
Fail-fast error handling: If any event's Statsig or PostHog call raises an exception, the entire batch fails with HTTP 500. Events processed before the failure are already logged; events after are not.
No database writes to any Supabase table.
No caching reads or writes.
No notifications.
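The sequential, fail-fast behavior described above can be sketched as follows. The `AnalyticsEvent` dataclass mirrors the documented fields; `log_batch`, `log_event` and `capture` are hypothetical stand-ins for the route, `statsig_service.log_event` and `posthog_service.capture`. Unlike the single-event endpoint, a per-event `user_id` takes precedence over the authenticated user's ID here.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class AnalyticsEvent:
    event_name: str
    user_id: str | None = None
    value: str | None = None
    metadata: dict[str, Any] | None = None


def log_batch(
    events: list[AnalyticsEvent],
    current_user: dict[str, Any] | None,
    log_event: Callable[..., None],
    capture: Callable[..., None],
) -> dict[str, Any]:
    default_user_id = str(current_user["user_id"]) if current_user else "anonymous"
    for event in events:  # sequential: one slow call delays every later event
        event_user_id = event.user_id or default_user_id
        log_event(user_id=event_user_id, event_name=event.event_name,
                  value=event.value, metadata=event.metadata)
        capture(distinct_id=event_user_id, event=event.event_name,
                properties=event.metadata)
        # No try/except here: the first exception aborts the loop (the route maps it to 500)
    return {"success": True, "message": f"{len(events)} events logged successfully"}
```

Because nothing is rolled back, a client that retries a failed batch re-sends the events that were already delivered before the failure; any deduplication has to happen downstream.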

Issue: #1633

API Documentation: POST /v1/analytics/session/start

High-Level Overview

This endpoint logs a session start event to both Statsig and PostHog for DAU/WAU/MAU (Daily/Weekly/Monthly Active User) tracking and product growth metrics computation. It should be called when a user opens the application, logs in, or returns after an idle period. The session_start event is specifically named to align with Statsig's built-in Product Growth metric computation pipeline. Authentication is optional — anonymous sessions are tracked with user_id = "anonymous".

2.1 Requirements & Pipeline

Authentication & Authorization:

Optional authentication. Uses get_current_user dependency.
Unauthenticated requests are accepted and logged as anonymous sessions.
Authenticated user's ID is used if available.

Request Schema:

{
  "platform": "web",
  "metadata": {
    "version": "2.0.4",
    "referrer": "https://google.com",
    "utm_source": "email"
  }
}

Schema: SessionStartEvent (defined in src/routes/analytics.py).

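Platform values: web, ios, android, desktop. The field is declared with Pydantic Field(default="web"), which supplies the default but does not by itself reject other strings; that requires a Literal or Enum field type or a validator.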

Response Schema:

{
  "success": true,
  "message": "Session start logged successfully"
}

Error Codes:

Code	Condition
500	Statsig or PostHog service call failure

2.2 Mermaid Diagram

sequenceDiagram
    participant C as Client (Frontend / Mobile)
    participant R as Route Handler<br/>log_session_start()
    participant Auth as get_current_user (optional)
    participant Statsig as Statsig Service
    participant PostHog as PostHog Service

    C->>R: POST /v1/analytics/session/start<br/>{ platform: "web", metadata: {...} }
    R->>Auth: Depends(get_current_user)
    alt Authenticated
        Auth-->>R: current_user dict
        R->>R: user_id = str(current_user["user_id"])
    else Not authenticated
        Auth-->>R: None
        R->>R: user_id = "anonymous"
    end

    R->>Statsig: statsig_service.log_session_start(<br/>user_id=user_id,<br/>platform=session.platform,<br/>metadata=session.metadata)
    Statsig-->>R: OK (logs "session_start" event<br/>for DAU/WAU/MAU computation)

    R->>PostHog: posthog_service.capture(<br/>distinct_id=user_id,<br/>event="session_start",<br/>properties={"platform": "web", ...metadata})
    PostHog-->>R: OK

    R->>R: logger.debug("Session start logged for user X on web")
    R-->>C: 200 { success: true, message: "Session start logged successfully" }

2.3 Complete Dependency Map

Category	Name	Location	Purpose
Route file	`analytics.py`	`src/routes/analytics.py`	Handler
Auth	`get_current_user`	`src/security/deps.py`	Optional user identification
Schema	`SessionStartEvent`	`src/routes/analytics.py`	Request body
Service	`statsig_service`	`src/services/statsig_service.py`	Statsig session start logging
Service method	`statsig_service.log_session_start()`	`src/services/statsig_service.py`	Specialized session event
Service	`posthog_service`	`src/services/posthog_service.py`	PostHog session capture
External	Statsig	SaaS	DAU/WAU/MAU + Product Growth metrics
External	PostHog	SaaS	Session tracking, retention analysis
Framework	`FastAPI`, `APIRouter`, `Depends`, `HTTPException`	`fastapi`	HTTP layer
Logging	`logging`	stdlib	Debug logging

2.4 Side Effects

External write to Statsig: Calls statsig_service.log_session_start() which logs a named session_start event. Statsig uses this specific event name to compute Product Growth metrics including DAU, WAU, MAU, stickiness, and retention rates. This is not a generic log_event() call — it uses a dedicated method to ensure the event is structured correctly for Statsig's metric pipeline.
External write to PostHog: Calls posthog_service.capture() with event="session_start" and a platform property plus any additional metadata. PostHog uses this for funnel analysis, session recording correlation, and retention cohorts.
No database writes to any Supabase table.
No caching reads or writes.
No notifications.
Debug log: A logger.debug line is emitted for each session start (not info/warning level), so it does not appear in production log aggregation unless debug logging is enabled.

Issue: #1661

Deep-Dive API Documentation: GET /v1/analytics/cache

Handler: get_cache_analytics() in src/routes/butter_analytics.py line 26

Overview

Returns Butter.dev LLM response cache performance analytics for the authenticated user over a configurable time window (1-90 days). Queries the chat_completion_requests Supabase table with a join to models and providers, then aggregates cache hit/miss statistics, cost savings, and per-model breakdown in Python.

Authentication

Dependency: get_api_key (src/security/deps.py). Bearer token validated.

Then calls get_user(api_key) from src/db/users.py to retrieve the full user record. Returns HTTP 401 if no user found for the key.

Request

GET /v1/analytics/cache?days=30
Authorization: Bearer <api_key>

Query parameter: days (int, optional, default=30, min=1, max=90) - analysis window in days

FastAPI Query validation: ge=1, le=90. Values outside range return HTTP 422 Unprocessable Entity.

Handler Execution Flow (3 Levels Deep)

Level 1: get_cache_analytics() (src/routes/butter_analytics.py:26-163)

Call get_user(api_key) to get user record including id and preferences
Compute since_date = datetime.now(UTC) - timedelta(days=days)
Call get_supabase_client() to get Supabase client
Execute Supabase query on chat_completion_requests table
Aggregate statistics in Python (cache hits, misses, savings)
Sort and filter top_cached_models
Return response dict

Level 2: get_user() from src/db/users.py (imported at top of file)

Note: src/routes/butter_analytics.py imports from src.db.users. This is the database-backed user lookup, not the cached version. Returns full user dict or None.

Level 2: get_supabase_client() from src/config/supabase_config.py

Returns the configured Supabase Python client singleton.

Level 2: Supabase Query (src/routes/butter_analytics.py:59-68)

result = (
    client.table("chat_completion_requests")
    .select("model_id, cost_usd, metadata, created_at, models(model_name, providers(name, slug))")
    .eq("user_id", user_id)
    .eq("status", "completed")
    .gte("created_at", since_date.isoformat())
    .execute()
)

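Table: chat_completion_requests
Operation: SELECT with JOIN
Columns selected: model_id, cost_usd, metadata, created_at
Joined tables: models (model_name), providers (name, slug)
Filters:

user_id = user_id (integer equality)
status = 'completed'
created_at >= since_date (ISO-8601 timestamp)

No explicit LIMIT is applied, so the query asks for every matching row. Supabase's API still caps each response at the project's max-rows setting (1,000 rows by default), so unless that cap has been raised, users with more requests in the window get statistics computed from a truncated result set.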

Level 3: Aggregation Logic (src/routes/butter_analytics.py:73-134)

For each request record:

Check metadata.butter_cache_hit (boolean)
If cache hit: increment cache_hits counter, add metadata.actual_cost_usd to total_savings
Else: increment cache_misses counter
Track per-model stats in model_stats dict

Top cached models filtering:

Only includes models with total_requests >= 5
Sorted by cache_hit_rate_percent descending
Truncated to top 10

Derived metrics:

cache_hit_rate = (cache_hits / total_requests * 100) if total_requests > 0 else 0
estimated_monthly_savings = (total_savings * 30 / days) if days > 0 else 0
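A self-contained sketch of the Level 3 aggregation, run over rows shaped like the Supabase result (`metadata`, plus the nested `models` → `providers` join). It assumes per-model stats are keyed by `model_name` and infers the rounding (2 decimals for percentages, 6 for USD totals) from the example response; `aggregate_cache_stats` is a hypothetical name.

```python
from __future__ import annotations

from typing import Any


def aggregate_cache_stats(rows: list[dict[str, Any]], days: int) -> dict[str, Any]:
    cache_hits = cache_misses = 0
    total_savings = 0.0
    model_stats: dict[str, dict[str, Any]] = {}

    for row in rows:
        meta = row.get("metadata") or {}
        model = row.get("models") or {}
        name = model.get("model_name") or "unknown"
        stats = model_stats.setdefault(name, {
            "model_name": name,
            "provider": (model.get("providers") or {}).get("name"),
            "total_requests": 0, "cache_hits": 0, "savings_usd": 0.0,
        })
        stats["total_requests"] += 1
        if meta.get("butter_cache_hit"):
            saved = float(meta.get("actual_cost_usd") or 0.0)
            cache_hits += 1
            total_savings += saved
            stats["cache_hits"] += 1
            stats["savings_usd"] += saved
        else:
            cache_misses += 1

    # Only models with at least 5 requests, highest hit rate first, top 10
    top = []
    for s in model_stats.values():
        if s["total_requests"] >= 5:
            rate = s["cache_hits"] / s["total_requests"] * 100
            top.append({**s, "cache_hit_rate_percent": round(rate, 2),
                        "savings_usd": round(s["savings_usd"], 6)})
    top.sort(key=lambda s: s["cache_hit_rate_percent"], reverse=True)

    total = cache_hits + cache_misses
    return {
        "total_requests": total,
        "cache_hits": cache_hits,
        "cache_misses": cache_misses,
        "cache_hit_rate_percent": round(cache_hits / total * 100, 2) if total > 0 else 0,
        "total_savings_usd": round(total_savings, 6),
        # Linear projection of the window's savings onto 30 days
        "estimated_monthly_savings_usd": round(total_savings * 30 / days, 2) if days > 0 else 0,
        "top_cached_models": top[:10],
    }
```

With the default `days=30` the projection factor is 1, which is why the example response shows `estimated_monthly_savings_usd` equal to `total_savings_usd` rounded to cents.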

Supabase Tables Accessed

chat_completion_requests:

model_id (column)
cost_usd (column)
metadata (JSONB column) - contains: butter_cache_hit (bool), actual_cost_usd (float)
created_at (timestamp)
user_id (FK to users)
status (enum, filtered on 'completed')

models (joined via model_id FK):

model_name

providers (joined via models.provider_id FK):

name
slug

Response

{
  "period_days": 30,
  "start_date": "2026-02-02T12:00:00.000000+00:00",
  "end_date": "2026-03-04T12:00:00.000000+00:00",
  "total_requests": 1250,
  "cache_hits": 437,
  "cache_misses": 813,
  "cache_hit_rate_percent": 34.96,
  "total_savings_usd": 12.847293,
  "estimated_monthly_savings_usd": 12.85,
  "top_cached_models": [
    {
      "model_name": "gpt-4",
      "provider": "OpenAI",
      "total_requests": 320,
      "cache_hits": 198,
      "cache_hit_rate_percent": 61.88,
      "savings_usd": 8.943241
    }
  ],
  "cache_enabled": true,
  "system_enabled": true
}

cache_enabled: from user.preferences.enable_butter_cache (defaults to true if not set)
system_enabled: from the Config.BUTTER_DEV_ENABLED environment variable

Error Handling

Inner try/except blocks: None (single outer handler)
Outer try/except:

HTTPException: re-raised (preserves 401 from get_user check)
All other Exception: logs via sanitize_for_logging, raises HTTP 500 "Failed to retrieve cache analytics"

HTTP error codes:

401: Invalid API key
422: Invalid days parameter (FastAPI validation)
500: Failed to retrieve cache analytics

Storage

Redis: Not used
Supabase: SELECT on chat_completion_requests with JOIN to models and providers
In-memory: None

Config Reference

Config.BUTTER_DEV_ENABLED: boolean env var controlling whether Butter.dev caching is active system-wide
user.preferences.enable_butter_cache: per-user opt-in/opt-out (defaults to True)

Issue: #1662

Deep-Dive API Documentation: GET /v1/analytics/cache/summary

Handler: get_cache_summary() in src/routes/butter_analytics.py line 166

Overview

Returns a quick summary of Butter.dev cache performance for the authenticated user. Tries a Supabase RPC function first (get_user_cache_savings), falls back to manual query if RPC is unavailable. Returns minimal response if cache is disabled for the user or system-wide.

Authentication

Same as get_cache_analytics: get_api_key (src/security/deps.py) + get_user(api_key) lookup. Returns HTTP 401 if invalid key.

Request

GET /v1/analytics/cache/summary
Authorization: Bearer <api_key>

No query parameters.

Handler Execution Flow (3 Levels Deep)

Level 1: get_cache_summary() (src/routes/butter_analytics.py:166-266)

Call get_user(api_key) -> get user record
Extract cache_enabled = user.preferences.enable_butter_cache (default True)
If not cache_enabled OR not Config.BUTTER_DEV_ENABLED: return minimal response immediately
Compute since_date = datetime.now(UTC) - timedelta(days=30) (hardcoded 30 days)
Try Supabase RPC call first
On RPC failure: fall back to manual query
Return aggregated response

Level 2a: Supabase RPC (src/routes/butter_analytics.py:205-221)

result = client.rpc(
    "get_user_cache_savings",
    {"p_user_id": user_id, "p_days": 30}
).execute()

RPC function: get_user_cache_savings
Parameters: p_user_id (integer), p_days (integer, hardcoded 30)
Expected return columns: total_requests, cache_hits, cache_hit_rate_percent, total_savings_usd, estimated_monthly_savings_usd

If RPC returns data and len(data) > 0: return response using RPC results. If RPC raises Exception: log at DEBUG level and fall through to manual query (NOT an error).

Level 2b: Manual Supabase Query (fallback, src/routes/butter_analytics.py:226-244)

result = (
    client.table("chat_completion_requests")
    .select("metadata")
    .eq("user_id", user_id)
    .eq("status", "completed")
    .gte("created_at", since_date.isoformat())
    .execute()
)

Table: chat_completion_requests
Operation: SELECT
Columns: metadata only (minimal data transfer vs. the full analytics endpoint)
Filters: user_id equality, status = 'completed', created_at >= 30 days ago

Level 3: Manual Aggregation (src/routes/butter_analytics.py:235-255)

Simpler than get_cache_analytics:

Iterates metadata field only
Counts metadata.butter_cache_hit truthy values
Sums metadata.actual_cost_usd for hits
No per-model breakdown
Estimated monthly = total_savings (already 30 days, no scaling needed)
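The RPC-first flow with its manual fallback, sketched with two injected callables so it runs without Supabase: `call_rpc` stands in for `client.rpc("get_user_cache_savings", ...).execute().data` and `fetch_metadata_rows` for the metadata-only query. It assumes an empty RPC result also falls through to the manual path and that the fallback rounds the hit rate to 2 decimals; the function name is hypothetical.

```python
from __future__ import annotations

import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)


def cache_summary(
    call_rpc: Callable[[], list[dict[str, Any]]],
    fetch_metadata_rows: Callable[[], list[dict[str, Any]]],
) -> dict[str, Any]:
    try:
        data = call_rpc()
        if data:
            row = data[0]
            return {
                "cache_enabled": True,
                "system_enabled": True,
                "total_requests": row["total_requests"],
                "cache_hits": row["cache_hits"],
                "cache_hit_rate_percent": row["cache_hit_rate_percent"],
                "total_savings_usd": row["total_savings_usd"],
                "estimated_monthly_savings_usd": row["estimated_monthly_savings_usd"],
            }
    except Exception as exc:  # a missing or failing RPC is not treated as an error
        logger.debug("RPC unavailable, using manual query: %s", exc)

    # Manual fallback over metadata-only rows for the fixed 30-day window
    total = hits = 0
    savings = 0.0
    for row in fetch_metadata_rows():
        meta = row.get("metadata") or {}
        total += 1
        if meta.get("butter_cache_hit"):
            hits += 1
            savings += float(meta.get("actual_cost_usd") or 0.0)
    return {
        "cache_enabled": True,
        "system_enabled": True,
        "total_requests": total,
        "cache_hits": hits,
        "cache_hit_rate_percent": round(hits / total * 100, 2) if total else 0.0,
        "total_savings_usd": round(savings, 6),
        # The window is already 30 days, so no projection is applied
        "estimated_monthly_savings_usd": round(savings, 2),
    }
```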

Early Return: Cache Disabled

If user.preferences.enable_butter_cache == False OR Config.BUTTER_DEV_ENABLED == False:

{
  "cache_enabled": false,
  "system_enabled": true,
  "message": "Cache is disabled. Enable it in settings to start saving on API costs.",
  "total_savings_usd": 0.0,
  "cache_hit_rate_percent": 0.0
}

Message differs based on which flag triggered: user preference vs system disable.

Response (cache enabled, RPC path)

{
  "cache_enabled": true,
  "system_enabled": true,
  "total_requests": 1250,
  "cache_hits": 437,
  "cache_hit_rate_percent": 34.96,
  "total_savings_usd": 12.847,
  "estimated_monthly_savings_usd": 12.85
}

Response (cache enabled, manual fallback path)

Same structure but total_savings_usd rounded to 6 decimal places, estimated_monthly_savings_usd = total_savings rounded to 2 decimal places (already 30-day window, no projection applied).

Error Handling

HTTPException: re-raised (401 from get_user check)
RPC Exception: caught silently at DEBUG level, triggers fallback query
All other Exception: logs via sanitize_for_logging, raises HTTP 500 "Failed to retrieve cache summary"

HTTP error codes: 401, 500

Supabase Operations

RPC: get_user_cache_savings(p_user_id int, p_days int), a PostgreSQL function
Table read: chat_completion_requests (select metadata only, filtered)

Storage

Redis: Not used
In-memory: None

Config Reference

Config.BUTTER_DEV_ENABLED: system-wide enable/disable (env var)
user.preferences: JSONB column on the users table, key enable_butter_cache (bool, defaults to True when not set)

Authentication

5 endpoints

Issue: #1645

Deep-Dive API Documentation: POST /auth

Handler: `privy_auth()` — `src/routes/auth.py`

Overview

Primary authentication endpoint using Privy as the identity provider. Handles both new user registration and existing user login in a single call. Extracts identity from Privy linked accounts (email, Google OAuth, GitHub, phone/SMS), performs email quality verification, creates users on first login, and returns an API key.

Authentication

No auth required. This endpoint is unauthenticated (it IS the auth endpoint).

Rate Limiting

Type: AuthRateLimitType.LOGIN
Limit: 10 attempts per 15 minutes per IP (sliding window)
Key: Client IP (extracted via get_client_ip())
Algorithm: In-memory deque, asyncio Lock
On exceed: HTTP 429 with {"error": "Rate limit exceeded", "retry_after": N} + Retry-After header
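A minimal sketch of an in-memory sliding-window limiter in the shape described above: a deque of request timestamps per client IP, guarded by an `asyncio.Lock`. The class name, the injectable clock and the `retry_after` rounding are assumptions rather than the project's implementation.

```python
from __future__ import annotations

import asyncio
import math
import time
from collections import defaultdict, deque
from typing import Callable


class SlidingWindowLimiter:
    """Per-key sliding window: at most `limit` hits in any `window`-second span."""

    def __init__(self, limit: int, window: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self._hits: dict[str, deque[float]] = defaultdict(deque)
        self._lock = asyncio.Lock()

    async def check(self, key: str) -> tuple[bool, int]:
        """Return (allowed, retry_after_seconds)."""
        async with self._lock:
            now = self.clock()
            hits = self._hits[key]
            # Drop timestamps that have slid out of the window
            while hits and now - hits[0] >= self.window:
                hits.popleft()
            if len(hits) >= self.limit:
                # The oldest hit leaves the window first; report seconds until then
                retry_after = math.ceil(self.window - (now - hits[0]))
                return False, max(retry_after, 1)
            hits.append(now)
            return True, 0
```

Under these assumptions the two auth limits would be `SlidingWindowLimiter(limit=10, window=15 * 60)` for LOGIN and `SlidingWindowLimiter(limit=3, window=3600)` for REGISTER. Because the state lives in process memory, each worker process keeps its own window, so the effective per-IP limit grows with the number of workers.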

Request Body — `PrivyAuthRequest` (`src/schemas/auth.py`)

from pydantic import AliasChoices, BaseModel, Field

# Nested models come first so the annotations that reference them resolve.
class PrivyLinkedAccount(BaseModel):
    type: str          # Normalized: "email", "phone", "google_oauth", "github", etc.
    subject: str | None = None
    email: str | None = None
    address: str | None = None
    name: str | None = None
    phone_number: str | None = Field(            # accepts "phone_number" or "phoneNumber"
        default=None,
        validation_alias=AliasChoices("phone_number", "phoneNumber"),
    )
    verified_at: int | None = None

class PrivyUserData(BaseModel):
    id: str                                      # Privy user ID (required, non-empty)
    created_at: int
    linked_accounts: list[PrivyLinkedAccount] = []
    mfa_methods: list[str] = []
    has_accepted_terms: bool = False
    is_guest: bool = False

class PrivyAuthRequest(BaseModel):
    user: PrivyUserData                          # Required
    token: str | None = None                     # Privy access token (not currently validated)
    email: str | None = None                     # Optional top-level email override
    privy_access_token: str | None = None
    refresh_token: str | None = None
    session_update_action: str | None = None
    is_new_user: bool | None = None
    referral_code: str | None = None             # User referral OR partner code (e.g., "REDBEARD")
    environment_tag: str | None = "live"         # Validated: "live" | "test" | "development"
    auto_create_api_key: bool | None = True
    verified_at: int | None = None

Auth Info Extraction Priority

1. request.email (top-level field from frontend)
2. Linked account type "email" → email field
3. Linked account type "google_oauth" → address/email field + display_name
4. Linked account type "phone" → phone_number
5. Linked account type "github" → name as display_name

Auth method priority (the last assignment wins):

Default: AuthMethod.EMAIL
GitHub sets to AuthMethod.GITHUB if no email found
Phone sets to AuthMethod.PHONE if no email found
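One way to read the extraction rules above, as a sketch over plain dicts shaped like `PrivyLinkedAccount`. It assumes the first account of each type is used, that Google's display name comes from its `name` field, and that the auth-method override is evaluated after the email has been resolved. Plain strings stand in for the `AuthMethod` enum, and the function names are hypothetical.

```python
from __future__ import annotations

from typing import Any


def _first(accounts: list[dict[str, Any]], kind: str) -> dict[str, Any] | None:
    return next((a for a in accounts if a.get("type") == kind), None)


def extract_auth_info(top_email: str | None, accounts: list[dict[str, Any]]) -> dict[str, Any]:
    email_acct = _first(accounts, "email")
    google = _first(accounts, "google_oauth")
    phone_acct = _first(accounts, "phone")
    github = _first(accounts, "github")

    # Email: top-level field, then the "email" account, then Google OAuth
    email = top_email
    if not email and email_acct:
        email = email_acct.get("email")
    if not email and google:
        email = google.get("address") or google.get("email")

    display_name = (google or {}).get("name") or (github or {}).get("name")
    phone_number = (phone_acct or {}).get("phone_number")

    # Default EMAIL; with no email, GitHub/phone accounts override it
    # in list order, so the last matching account wins.
    auth_method = "email"
    if not email:
        for acct in accounts:
            if acct.get("type") == "github":
                auth_method = "github"
            elif acct.get("type") == "phone":
                auth_method = "phone"

    return {"email": email, "display_name": display_name,
            "phone_number": phone_number, "auth_method": auth_method}
```

For example, with a GitHub account followed by a phone account and no email, the method resolves to "phone" because the phone account is processed last.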

User Lookup Flow

Cache check (in-memory): get_cached_user_by_privy_id(request.user.id) — Redis-backed, invalidated on updates

DB fallback (with timeout): users_module.get_user_by_privy_id(request.user.id) — Supabase query:

SELECT * FROM users WHERE privy_user_id = <privy_id> LIMIT 1

Timeout: USER_LOOKUP_TIMEOUT seconds (configured constant).

Secondary fallback: If privy_id lookup fails, tries username:

SELECT * FROM users WHERE username = <base_username> LIMIT 1

If found by username, updates users.privy_user_id and invalidates cache.

Existing User Path (`_handle_existing_user`)

Fetches active API keys from api_keys_new:

SELECT api_key, is_primary, created_at FROM api_keys_new
WHERE user_id = <id> AND is_active = true
ORDER BY is_primary DESC, created_at ASC

Returns primary key if present, else oldest active key
Detects and rejects temporary API key patterns (pattern check via _is_temporary_api_key())
Auto-creates new primary key if none exists and auto_create_api_key=True
Computes tiered credits (subscription allowance + purchased, in cents for frontend)
Raises HTTP 503 if user exists but has no API key available
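The key choice is the first row of `ORDER BY is_primary DESC, created_at ASC`. The same ordering expressed in Python, assuming `created_at` values are ISO-8601 strings in one consistent format and timezone so that they sort chronologically as text. The `_is_temporary_api_key()` rejection is omitted, and `pick_api_key` is a hypothetical name.

```python
from __future__ import annotations

from typing import Any


def pick_api_key(active_keys: list[dict[str, Any]]) -> str | None:
    """Primary key first, otherwise the oldest active key; None if there are no keys."""
    if not active_keys:
        return None
    # False sorts before True, so `not is_primary` puts primary keys first
    ordered = sorted(active_keys, key=lambda k: (not k.get("is_primary", False), k["created_at"]))
    return ordered[0]["api_key"]
```

A `None` result corresponds to the branch that either auto-creates a primary key (`auto_create_api_key=True`) or ends in HTTP 503.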

Background tasks:

_send_welcome_email_background — sends if email valid and not @privy.user/@privy.placeholder
_log_auth_activity_background — inserts to activity table

New User Path

Email verification via _get_subscription_status_for_email():
- Checks local blocklist → is_blocked_email_domain() → HTTP 400 if blocked
- Checks local temp email list → marks as "bot"
- Calls Emailable API → verify_email(email) → blocks registration if should_block is set, marks the user as "bot" if is_bot is set
- On API failure: falls back gracefully, allows registration
Generates unique username: _generate_unique_username() — up to 5 collision retries, then appends random 4-byte hex
Creates user: users_module.create_enhanced_user() — starts with $5 credits, 3-day trial
Fallback manual insert if create_enhanced_user fails
Partner/referral code processing (background):
- Partner codes (e.g., "REDBEARD"): _apply_partner_trial_background → PartnerTrialService.start_partner_trial()
- User codes: _process_referral_code_background → updates users.referred_by_code

Response Schema — `PrivyAuthResponse`

class PrivyAuthResponse(BaseModel):
    success: bool
    message: str                          # "Login successful" or "Account created successfully"
    user_id: int | None
    api_key: str | None                   # Raw API key (gw_live_... prefix)
    auth_method: AuthMethod | None
    privy_user_id: str | None
    is_new_user: bool | None
    display_name: str | None
    email: str | None
    phone_number: str | None
    credits: float | None                 # Total credits in dollars
    timestamp: datetime | None
    subscription_status: str | None       # "trial", "active", "bot", "inactive"
    tier: str | None                      # "basic", "pro", "max"
    tier_display_name: str | None         # "Basic", "Pro", "MAX"
    trial_expires_at: str | None          # ISO string
    subscription_end_date: int | None     # Unix timestamp
    subscription_allowance: int | None    # Monthly allowance in cents
    purchased_credits: int | None         # One-time credits in cents
    total_credits: int | None             # Sum in cents
    allowance_reset_date: str | None

Error Handling

Scenario	HTTP Status	Detail
Rate limit exceeded	429	`{error, message, retry_after}`
Blocked email domain	400	"This email address is not allowed..."
User exists but no API key	503	"Your account exists but no API key is available..."
New user created but no API key	500	"Account created but API key generation failed..."
Supabase URL misconfigured	503	"Service configuration error: Database URL is misconfigured..."
General failure	500	"Authentication failed: ..."

Issue: #1646

Deep-Dive API Documentation: POST /auth/register

Handler: `register_user()` — `src/routes/auth.py`

Overview

Direct user registration endpoint (non-Privy). Creates a new user account with username + email, generates an API key, sends a welcome email, and processes optional referral codes. Intended for direct registration flows not using Privy auth.

Authentication

No auth required. This is a registration endpoint.

Rate Limiting

Type: AuthRateLimitType.REGISTER
Limit: 3 attempts per hour per IP
Window: 3600 seconds (1 hour)
Key: Client IP
Algorithm: In-memory sliding window, asyncio Lock
On exceed: HTTP 429 with {"error": "Rate limit exceeded", "retry_after": N} + Retry-After header

Request Body — `UserRegistrationRequest` (`src/schemas/users.py`)

class UserRegistrationRequest(BaseModel):
    username: str                           # Required
    email: EmailStr                         # Required, Pydantic EmailStr validation
    auth_method: AuthMethod = AuthMethod.EMAIL
    environment_tag: str = "live"
    key_name: str = "Primary Key"
    referral_code: str | None = None       # Optional user referral code

AuthMethod enum (from src/schemas/common.py):

from enum import Enum

class AuthMethod(str, Enum):
    EMAIL = "email"
    GOOGLE = "google"
    GITHUB = "github"
    PHONE = "phone"
    # ... other OAuth methods

Execution Flow

Step 1 — Rate limit check:

rate_limit_result = await check_auth_rate_limit(client_ip, AuthRateLimitType.REGISTER)

Step 2 — Email quality verification:

subscription_status, should_block = await _get_subscription_status_for_email(request.email)
# should_block=True → HTTP 400
# subscription_status="bot" → marks as bot, still allows registration

Process:

Check local blocklist (is_blocked_email_domain)
Check local temp email list (is_temporary_email_domain)
Call Emailable API for comprehensive verification

Step 3 — Uniqueness checks (with query timeout):

-- Email check
SELECT id FROM users WHERE email = <email>

-- Username check
SELECT id FROM users WHERE username = <username>

Both use safe_query_with_timeout() with AUTH_QUERY_TIMEOUT. Returns HTTP 503 on timeout, HTTP 400 on conflict.
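A sketch of one uniqueness check with a timeout, mapping a timeout to HTTP 503 and an existing row to HTTP 400. It assumes the query is awaitable (a synchronous Supabase call would need wrapping, e.g. with `asyncio.to_thread`). `check_unique`, the error messages and the `HTTPError` stand-in for FastAPI's `HTTPException` are hypothetical.

```python
from __future__ import annotations

import asyncio
from typing import Any, Awaitable, Callable


class HTTPError(Exception):
    """Stand-in for fastapi.HTTPException so the sketch runs without FastAPI."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


async def check_unique(
    field: str,
    value: str,
    query: Callable[[str, str], Awaitable[list[dict[str, Any]]]],
    timeout: float,
) -> None:
    try:
        rows = await asyncio.wait_for(query(field, value), timeout=timeout)
    except asyncio.TimeoutError:
        raise HTTPError(503, f"{field} check timed out") from None
    if rows:
        raise HTTPError(400, f"{field} already exists")
```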

Step 4 — User creation:

user_data = users_module.create_enhanced_user(
    username=request.username,
    email=request.email,
    auth_method=auth_method_str,
    privy_user_id=None,           # No Privy for direct registration
    credits=5,                    # $5 trial credits
    subscription_status=subscription_status,
)

Fallback manual insert if create_enhanced_user fails:

fallback_payload = {
    "username": request.username,
    "email": request.email,
    "credits": 5,
    "privy_user_id": None,
    "auth_method": ...,
    "subscription_status": "bot" if is_temp_email else "trial",
    "trial_expires_at": (datetime.now(UTC) + timedelta(days=3)).isoformat(),
    "tier": "basic",
}
client.table("users").insert(fallback_payload).execute()

Then creates API key via create_api_key(user_id, key_name, environment_tag, is_primary=True).

Step 5 — Referral code processing (background task):

if request.referral_code:
    background_tasks.add_task(
        _process_referral_code_background,
        referral_code=request.referral_code,
        user_id=user_data["user_id"],
        username=request.username,
        is_new_user=True,
    )

Calls track_referral_signup() → updates users.referred_by_code → sends referral notification email.

Step 6 — Welcome email (synchronous, not background):

success = notif_module.enhanced_notification_service.send_welcome_email(...)
if success:
    mark_welcome_email_sent(user_data["user_id"])  # UPDATE users SET welcome_email_sent=true

Response Schema — `UserRegistrationResponse` (`src/schemas/users.py`)

class UserRegistrationResponse(BaseModel):
    user_id: int
    username: str
    email: str
    api_key: str                      # Raw primary API key
    credits: int                      # Starting credits ($5)
    environment_tag: str
    scope_permissions: dict[str, list[str]]
    auth_method: AuthMethod
    subscription_status: SubscriptionStatus   # Always "trial" on success
    message: str                      # "Account created successfully"
    timestamp: datetime

Initial User State

Field	Value
`credits`	5 (dollars)
`subscription_status`	`"trial"` (or `"bot"` for temp emails)
`tier`	`"basic"`
`trial_expires_at`	`now + 3 days`
`is_primary` key	True
`welcome_email_sent`	True (if email sent successfully)

Error Handling

Scenario	HTTP Status	Detail
Rate limit exceeded	429	`{error, message, retry_after}`
Blocked email	400	"This email address is not allowed..."
Email already exists	400	"User with this email already exists"
Username already taken	400	"Username already taken"
DB timeout on uniqueness check	503	"Service temporarily unavailable"
User creation failure	500	"Failed to create user account"
General failure	500	"Registration failed: ..."

Difference from POST /auth (Privy)

Aspect	POST /auth	POST /auth/register
Identity provider	Privy (OAuth/social)	Direct (email+username)
Rate limit	10/15min	3/hour
Privy user ID	Stored	None
Email from	Linked accounts	Request body
Welcome email	Background task	Synchronous
Partner codes	Supported	Not supported

Issue: #1647

Deep-Dive API Documentation: POST /auth/password-reset

Handler: `request_password_reset()` — `src/routes/auth.py`

Overview

Initiates a password reset flow. Looks up a user by email address and sends a reset email via the notification service. Uses a deliberately vague response message to prevent email enumeration attacks.

Authentication

No auth required. Public endpoint.

Rate Limiting

Type: AuthRateLimitType.PASSWORD_RESET
Limit: 3 attempts per hour per IP
Window: 3600 seconds (1 hour)
Key: Client IP (from get_client_ip(raw_request))
Algorithm: In-memory sliding window, asyncio Lock
On exceed: HTTP 429 with {"error": "Rate limit exceeded", "retry_after": N} + Retry-After header

Request Parameters

Note: The email is taken as a query parameter (not a JSON body), as the handler signature is:

async def request_password_reset(email: str, raw_request: Request):

This means the request is: POST /auth/password-reset?email=user@example.com
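Because the email travels in the query string, clients must percent-encode it. An unescaped "+" (common in tagged addresses) is decoded as a space by standard query-string parsers. A sketch using urllib.parse.urlencode, with a hypothetical base URL:

```python
from urllib.parse import urlencode

BASE_URL = "https://api.example.com"  # hypothetical host


def password_reset_url(email: str) -> str:
    """Build the POST /auth/password-reset URL with a safely encoded email."""
    # urlencode() escapes "@" as %40 and "+" as %2B; an unescaped "+"
    # would be decoded as a space on the server side.
    return f"{BASE_URL}/auth/password-reset?{urlencode({'email': email})}"
```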

Execution Flow

Step 1 — Rate limit check:

rate_limit_result = await check_auth_rate_limit(client_ip, AuthRateLimitType.PASSWORD_RESET)

Step 2 — User lookup:

SELECT id, username, email FROM users
WHERE email = <email>

Uses direct Supabase client (no timeout wrapper on this query). If user not found, returns generic 200 response (does NOT reveal whether email exists).

Step 3 — Send reset email:

reset_token = notif_module.enhanced_notification_service.send_password_reset_email(
    user_id=user["id"],
    username=user["username"],
    email=user["email"]
)

The notification service generates a reset token and sends via Resend email API. The token is stored in the password_reset_tokens table.

Response Schema

On success (user found, email sent):

{"message": "Password reset email sent successfully"}

On user not found (email enumeration prevention):

{"message": "If an account with that email exists, a password reset link has been sent."}

Both return HTTP 200.

Error Handling

Scenario	HTTP Status	Detail
Rate limit exceeded	429	`{error, message, retry_after}`
Email service failure	500	"Failed to send password reset email"
Unhandled exception	500	"Internal server error"
User not found	200	Generic message (intentional — no 404 to prevent enumeration)

Security Design

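Uniform status code: both "user found" and "user not found" return HTTP 200 instead of a 404. The message bodies are not identical, however ("Password reset email sent successfully" vs. "If an account with that email exists, a password reset link has been sent."). Only the found path sends an email, so the response text, and possibly the latency, can still reveal whether an email is registered.
Rate limiting: 3 attempts/hour/IP limits email bombing from any single IP.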
Token storage: Reset token stored in password_reset_tokens table with expiry (used by reset_password endpoint).

Database Tables Used

Table	Operation	Purpose
`users`	SELECT	Look up user by email
`password_reset_tokens`	INSERT (via notification service)	Store reset token with expiry

Known Limitations

The email query has no timeout wrapper (unlike the registration uniqueness checks)
The endpoint accepts email as a query parameter, not JSON body (atypical for a POST)

Issue: #1648

Deep-Dive API Documentation: POST /auth/reset-password

Handler: `reset_password()` — `src/routes/auth.py`

Overview

Completes the password reset flow. Validates a one-time reset token from the password_reset_tokens table, checks expiry, and marks the token as used. Note: The current implementation does not actually update the password hash — it only marks the token consumed (placeholder implementation).

Authentication

No auth required. Public endpoint (token itself is the credential).

Rate Limiting

Type: AuthRateLimitType.PASSWORD_RESET
Limit: 3 attempts per hour per IP (shared with /auth/password-reset)
Window: 3600 seconds
Key: Client IP
On exceed: HTTP 429 with {error, message, retry_after} + Retry-After header

Security rationale: Prevents token enumeration attacks by rate-limiting guesses.

Request Parameters

The token is passed as a query parameter (not a JSON body):

async def reset_password(token: str, raw_request: Request):

Request: POST /auth/reset-password?token=<reset_token_value>

Execution Flow

Step 1 — Rate limit check:

rate_limit_result = await check_auth_rate_limit(client_ip, AuthRateLimitType.PASSWORD_RESET)

Step 2 — Token validation:

SELECT * FROM password_reset_tokens
WHERE token = <token>
  AND used = false

Returns HTTP 400 if no matching unused token found.

Step 3 — Expiry check:

expires_at = datetime.fromisoformat(token_data["expires_at"].replace("Z", "+00:00"))
if datetime.now(UTC).replace(tzinfo=expires_at.tzinfo) > expires_at:
    raise HTTPException(status_code=400, detail="Reset token has expired")

Step 4 — Mark token as used:

UPDATE password_reset_tokens
SET used = true
WHERE id = <token_id>

`password_reset_tokens` Table Schema

Column	Type	Description
`id`	int	Primary key
`token`	str	Opaque reset token value
`user_id`	int	Associated user ID
`used`	bool	Whether token has been consumed
`expires_at`	timestamp	Expiry datetime (ISO string with timezone)

Response Schema

{"message": "Password reset successfully"}

HTTP 200 on success.

Error Handling

Scenario	HTTP Status	Detail
Rate limit exceeded	429	`{error, message, retry_after}`
Invalid/used token	400	"Invalid or expired reset token"
Token expired	400	"Reset token has expired"
Unhandled exception	500	"Internal server error"

Implementation Notes

Important caveat: The current implementation comment reads:

# Update password (in a real app, you'd hash this)
# For now, we'll just mark the token as used

This means the endpoint:

Validates and consumes the token
Does NOT actually update a password field in the users table
Is effectively a placeholder that confirms the token is valid

A complete implementation would need:

New password in request body
Password hashing (bcrypt/argon2)
UPDATE users SET password_hash = <hash> WHERE id = <token_data.user_id>
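A minimal sketch of the missing hashing step. It uses the standard library's hashlib.scrypt so it runs without extra packages; bcrypt or argon2 (named above) would fit the same hash/verify shape. The salt$hash hex encoding and the cost parameters are illustrative assumptions, not values from the codebase.

```python
import hashlib
import hmac
import os

# Illustrative scrypt cost parameters (about 16 MiB of memory per hash).
N, R, P = 2**14, 8, 1
MAXMEM = 64 * 1024 * 1024


def hash_password(password: str) -> str:
    """Return "salt_hex$digest_hex" suitable for a password_hash column."""
    salt = os.urandom(16)
    digest = hashlib.scrypt(password.encode(), salt=salt, n=N, r=R, p=P,
                            maxmem=MAXMEM, dklen=32)
    return f"{salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    salt_hex, digest_hex = stored.split("$", 1)
    digest = hashlib.scrypt(password.encode(), salt=bytes.fromhex(salt_hex),
                            n=N, r=R, p=P, maxmem=MAXMEM, dklen=32)
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(digest, bytes.fromhex(digest_hex))
```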

Security Design

One-time use: token used flag prevents replay attacks
Expiry: tokens have a time-limited validity window
Rate limiting: prevents brute-force token guessing
Opaque token: a DB-backed value stored in password_reset_tokens, not a JWT

Issue: #1688

Deep-Dive API Documentation: GET /v1/huggingface/author/{author}/models

Endpoint Overview

Method: GET
Path: /v1/huggingface/author/{author}/models
Handler: list_author_models_endpoint() in src/routes/catalog.py
Service: list_models_by_author() in src/services/huggingface_hub_service.py
Auth: None required (public endpoint)
SDK: huggingface_hub Python SDK (HfApi.list_models(author=...))
Purpose: Returns all public models published by a specific HuggingFace author or organization

Path Parameters

Parameter	Type	Description
`author`	str	HuggingFace username or organization name (e.g. `meta-llama`, `google`, `mistralai`)

Query Parameters

Parameter	Type	Default	Validation	Description
`limit`	int	20	ge=1, le=100	Max models to return

Execution Flow (3+ Levels Deep)

Level 1: Route Handler — `list_author_models_endpoint()`

@router.get("/huggingface/author/{author}/models")
async def list_author_models_endpoint(
    author: str,
    limit: int = Query(20, ge=1, le=100),
):
    models = await asyncio.to_thread(list_models_by_author, author=author, limit=limit)
    return {"author": author, "models": models, "count": len(models)}

Level 2: Author Model Lister — `list_models_by_author()`

Located in src/services/huggingface_hub_service.py:

def list_models_by_author(author: str, limit: int = 20) -> list[dict]:
    api = get_hf_api_client()
    models_iter = api.list_models(
        author=author,
        limit=limit,
        sort="downloads",
        direction=-1,     # Descending
        cardData=True,
    )
    results = []
    for model in models_iter:
        normalized = normalize_model_info(model)
        if normalized:   # Skip private/gated
            results.append(normalized)
    return results

HuggingFace API call: GET https://huggingface.co/api/models?author={author}&limit={limit}&sort=downloads&direction=-1

Filters by author field (exact match on namespace prefix)
Always sorted by downloads descending
The API returns only models whose model.id starts with f"{author}/"; list_models_by_author() applies no author check of its own

Level 3: API Client — `get_hf_api_client()`

def get_hf_api_client() -> HfApi:
    return HfApi(token=Config.HUG_API_KEY)

Authenticated via HUG_API_KEY for higher rate limits.

Level 3: Model Normalizer — `normalize_model_info()`

def normalize_model_info(model: ModelInfo) -> dict | None:
    if getattr(model, "private", False):
        return None
    if getattr(model, "gated", False):
        return None

    return {
        "id": model.id,
        "name": model.id.split("/")[-1].replace("-", " ").replace("_", " ").title(),
        "description": getattr(model, "description", "") or "",
        "pipeline_tag": getattr(model, "pipeline_tag", None),
        "downloads": getattr(model, "downloads", 0),
        "likes": getattr(model, "likes", 0),
        "created_at": str(getattr(model, "created_at", "")),
        "author": model.id.split("/")[0] if "/" in model.id else None,
        "tags": getattr(model, "tags", []),
        "library_name": getattr(model, "library_name", None),
    }

Private and gated models are silently excluded. The author field in the response is extracted from the model ID namespace, which will always equal the path {author} parameter for results returned by the author= filter.
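str.title() uppercases the first letter after any non-letter and lowercases the rest of each word. Digit-letter runs such as 70b therefore become 70B, while all-caps suffixes such as HF come back as Hf. The name rule can be checked in isolation. display_name is a hypothetical helper that copies the expression from normalize_model_info():

```python
def display_name(model_id: str) -> str:
    """Same transformation as the "name" field in normalize_model_info()."""
    return model_id.split("/")[-1].replace("-", " ").replace("_", " ").title()
```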

No Redis Operations

This endpoint does not use Redis.

No Supabase Operations

This endpoint does not query Supabase.

Response Schema

{
  "author": "meta-llama",
  "count": 18,
  "models": [
    {
      "id": "meta-llama/Llama-2-70b-chat-hf",
      "name": "Llama 2 70B Chat Hf",
      "description": "",
      "pipeline_tag": "text-generation",
      "downloads": 4823912,
      "likes": 3201,
      "created_at": "2023-07-18 12:00:00+00:00",
      "author": "meta-llama",
      "tags": ["transformers", "llama", "text-generation"],
      "library_name": "transformers"
    },
    {
      "id": "meta-llama/Meta-Llama-3-8B-Instruct",
      "name": "Meta Llama 3 8B Instruct",
      "description": "",
      "pipeline_tag": "text-generation",
      "downloads": 12394821,
      "likes": 8934,
      "created_at": "2024-04-18 00:00:00+00:00",
      "author": "meta-llama",
      "tags": ["transformers", "safetensors", "llama"],
      "library_name": "transformers"
    }
  ]
}

Error Handling

Scenario	HTTP Status	Behavior
Author not found / no public models	200	Returns `{"models": [], "count": 0}`
All models are private/gated	200	Returns `{"models": [], "count": 0}` (filtered in Python)
HuggingFace API rate limit	500	`HfHubHTTPError` propagated from thread
HuggingFace API timeout	500	Exception propagated from `asyncio.to_thread()`
Invalid `author` (special chars)	200	HF API returns empty list

No explicit 404 is raised for unknown authors — HuggingFace API returns an empty list.

Comparison with Other HuggingFace Endpoints

Feature	`/discovery`	`/search`	`/author/{author}/models`
Filter	Task type	Text query	Author/org name
Sort	Downloads	Downloads	Downloads
Auth needed	No	No	No
Result scope	All public HF	Search matches	Author's public models
404 for empty	No	No	No (returns empty list)

Performance Characteristics

Latency: 200ms–2s (single HuggingFace author filter API call)
No caching: Every request makes a live API call
Server-side filtering: author= filter applied at HuggingFace servers
Client-side filtering: Private/gated exclusion in Python after response
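Limit: Enforced at the HuggingFace API level (server-side pagination), before the private/gated filter runs. A response can therefore contain fewer than limit models even when the author has more public ones.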
Thread: Blocking SDK call in asyncio.to_thread() thread pool

Chat & Messaging

20 endpoints

Issue: #1629

Deep-Dive API Documentation: POST /api/chat/ai-sdk and POST /api/chat/ai-sdk-completions

Section 1: High-Level Overview

These two routes (/api/chat/ai-sdk and /api/chat/ai-sdk-completions) are registered to the same handler function ai_sdk_chat_completion() in src/routes/ai_sdk.py. They provide a Vercel AI SDK-compatible chat completion interface. The handler validates the user, checks trial access, adapts the request format via AISDKChatAdapter, and routes it through the unified ChatInferenceHandler. For streaming requests a StreamingResponse is returned with SSE headers; for non-streaming a standard JSON response is returned. Credit deduction, usage recording, and request metadata saving are handled as background tasks.

Section 2: Low-Level Deep-Dive

2.1 Requirements & Pipeline

Authentication: Depends(get_api_key) — NOT require_admin. Regular user API key authentication.

Auth chain:

get_api_key: extracts Bearer token, calls validate_api_key_security(api_key, client_ip, referer)
get_user(api_key): looks up user — HTTP 401 if not found
validate_trial_access(api_key): checks trial validity — HTTP 403 if not valid

Request Schema (AISDKChatRequest):

{
  "model": str,                   // Required. Format: "provider/model-name"
  "messages": [
    { "role": str, "content": str }  // Required list
  ],
  "max_tokens": int | null,       // Optional
  "temperature": float | null,    // Optional, 0.0-2.0
  "top_p": float | null,          // Optional
  "frequency_penalty": float | null,  // Optional
  "presence_penalty": float | null,   // Optional
  "stream": bool | null           // Optional, default false
}

Response Schema (AISDKChatResponse for non-streaming):

{
  "choices": [
    {
      "message": { "role": str, "content": str },
      "finish_reason": str | null
    }
  ],
  "usage": {
    "prompt_tokens": int,
    "completion_tokens": int,
    "total_tokens": int
  }
}

Streaming response: StreamingResponse with media_type="text/event-stream". Headers:

X-Accel-Buffering: no
Cache-Control: no-cache, no-transform
Connection: keep-alive

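SSE format: each event is a data: line followed by a blank line (\n\n). Content chunks look like:

data: {"choices": [{"delta": {"role": "assistant", "content": "..."}}]}

The final chunk carries the finish reason and is followed by the terminator:

data: {"choices": [{"finish_reason": "stop"}]}

data: [DONE]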

Error codes:

Code	Condition
401	API key invalid or user not found
403	Trial access denied (`validate_trial_access` failed)
500	General processing error
503	AI SDK or OpenRouter not configured (ValueError)

2.2 Mermaid Diagram

flowchart TD
    A([POST /api/chat/ai-sdk]) --> B[get_api_key auth]
    B -->|invalid| C[HTTP 401]
    B -->|OK| D[get_user api_key]
    D -->|not found| E[HTTP 401]
    D -->|found| F[validate_trial_access api_key]
    F -->|denied| G[HTTP 403]
    F -->|OK| H[Generate request_id UUID\nrecord start_time]
    H --> I[AISDKChatAdapter.to_internal_request\nConvert AI SDK format to internal]
    I --> J[ChatInferenceHandler\napi_key, background_tasks]
    J --> K{request.stream?}
    K -->|true| L[handler.process_stream internal_request\nadapter.from_internal_stream]
    L --> M[StreamingResponse SSE\nX-Accel-Buffering: no]
    K -->|false| N[await handler.process internal_request]
    N --> O[adapter.from_internal_response]
    O --> P[Return AISDKChatResponse]

    subgraph Error Handling
    Q[HTTPException] --> R[background_tasks: save_chat_completion_request\nstatus=failed]
    S[ValueError] --> T[logger.error\nSentry capture\nHTTP 503]
    U[Exception] --> V[logger.error\nSentry capture\nHTTP 500]
    end

2.3 Complete Dependency Map

Dependency	File	Operation	Details
`get_api_key`	`src/security/deps.py:74`	Auth	Bearer token extraction + validate_api_key_security
`get_user`	`src/db/users.py`	DB read	Look up user by API key
`validate_trial_access`	`src/services/trial_validation.py`	Validation	Checks trial validity; returns dict with is_valid, is_trial, is_expired
`AISDKChatAdapter`	`src/adapters/chat.py`	Format conversion	Converts AI SDK format -> internal format; converts internal response -> AI SDK format
`ChatInferenceHandler`	`src/handlers/chat_handler.py`	Inference routing	Unified handler for all chat inference. Handles provider routing, streaming, non-streaming.
`handler.process(internal_request)`	`src/handlers/chat_handler.py`	Inference	Awaitable non-streaming inference call
`handler.process_stream(internal_request)`	`src/handlers/chat_handler.py`	Streaming	Returns async generator of internal stream chunks
`adapter.from_internal_stream`	`src/adapters/chat.py`	Format conversion	Converts internal stream -> AI SDK SSE format async generator
`save_chat_completion_request`	`src/db/chat_completion_requests.py`	DB write (background)	Saves request metadata to `chat_completion_requests` table. Called via `background_tasks.add_task()` on both success and failure paths.
`deduct_credits`	`src/db/users.py`	DB write	Credit deduction (in the legacy code path after line 361; currently unreachable in normal execution flow because of the early return at line 360)
`record_usage`	`src/db/users.py`	DB write	Records usage stats (same legacy code path issue)
`calculate_cost`	`src/services/pricing.py`	Computation	Calculates USD cost from model name + token counts
`track_trial_usage`	`src/services/trial_validation.py`	DB write	Tracks trial token usage
`sentry_sdk.capture_exception`	sentry_sdk	Error capture	Fires on ValueError (503) and unexpected Exception (500)
`_check_trial_override`	`src/routes/ai_sdk.py:158`	Logic	Defense-in-depth: overrides is_trial if user has active Stripe subscription

Important code note (lines 360-441): The non-streaming path contains dead legacy code after the `return processed` statement at line 360. The credit deduction, trial tracking, and save_chat_completion_request background-task calls in lines 363–441 are unreachable in normal execution flow, because `return processed` exits the function first. The streaming path and the error paths do save metadata correctly via background_tasks.add_task().

2.4 Side Effects

On success (non-streaming):

Background task: save_chat_completion_request writes to chat_completion_requests table (request_id, model_name, input_tokens, output_tokens, processing_time_ms, status="completed", user_id, provider_name, api_key_id)
Legacy dead code (unreachable): deduct_credits, record_usage, track_trial_usage

On failure (HTTPException, ValueError, Exception):

Background task: save_chat_completion_request writes to chat_completion_requests table with status="failed" and error_message
sentry_sdk.capture_exception on ValueError and Exception paths

Always:

audit_logger.log_api_key_usage on every authenticated call
Request correlation ID (uuid4()) generated and used for distributed tracing

Streaming side effects:

Credit deduction and usage recording occur AFTER streaming completes, inside the async generator function
Token estimation fallback (1 token ≈ 4 chars) applied when provider doesn't return usage data
capture_payment_error from src/utils/sentry_context.py called if post-stream credit deduction fails
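A sketch of the fallback estimate. It assumes the estimate is applied to raw text and rounds up. The section states only the 1 token ≈ 4 chars ratio, so the rounding rule and the helper name estimate_tokens are hypothetical.

```python
def estimate_tokens(text: str) -> int:
    """Rough token count using the ~4 characters per token heuristic."""
    # Assumed rounding: round up, and count empty text as 0 tokens.
    return (len(text) + 3) // 4 if text else 0
```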

Issue: #1630

Deep-Dive API Documentation: POST /api/chat/ai-sdk and POST /api/chat/ai-sdk-completions

Issue #1630 documents the same two routes and the same handler, ai_sdk_chat_completion() in src/routes/ai_sdk.py, as Issue #1629 above; its content is identical to that section.

Issue: #1689

Deep-Dive API Documentation: POST /v1/chat/completions

Endpoint Overview

Method: POST
Path: /v1/chat/completions
Handler: chat_completions() in src/routes/chat.py
Auth: Optional — supports both anonymous (no key) and authenticated (API key) users
Purpose: Primary chat inference endpoint. Routes requests to 30+ AI providers with failover, credit billing, streaming, rate limiting, web search injection, and full observability

Request Schema: `ProxyRequest`

Defined in src/schemas/proxy.py:

Field	Type	Default	Validation	Description
`model`	str	Required	—	Model ID (e.g. `openrouter/meta-llama/llama-3.1-70b`)
`messages`	list[Message]	Required	min_length=1	Conversation messages
`max_tokens`	int	4096	—	Max tokens to generate
`temperature`	float	1.0	ge=0, le=2	Sampling temperature
`top_p`	float	1.0	ge=0, le=1	Nucleus sampling probability
`n`	int	1	ge=1	Number of completions
`stop`	str or list	None	max 4 if list	Stop sequences
`frequency_penalty`	float	0.0	ge=-2, le=2	Frequency penalty
`presence_penalty`	float	0.0	ge=-2, le=2	Presence penalty
`stream`	bool	False	—	Enable SSE streaming
`stream_options`	dict	None	—	Streaming options
`tools`	list	None	—	Function/tool definitions
`tool_choice`	any	None	—	Tool selection strategy
`parallel_tool_calls`	bool	True	—	Allow parallel tool calls
`response_format`	dict	None	—	Output format (e.g. JSON mode)
`logprobs`	bool	None	—	Return log probabilities
`top_logprobs`	int	None	ge=0, le=20	Number of top logprobs
`logit_bias`	dict	None	—	Token bias map
`seed`	int	None	—	Random seed
`user`	str	None	—	User identifier
`service_tier`	str	None	—	Service tier hint
`provider`	str	None	—	Force specific provider
`auto_web_search`	str	`"auto"`	—	Web search mode: auto/always/never
`web_search_threshold`	float	0.5	ge=0, le=1	Confidence threshold for auto search

Message Schema (src/schemas/proxy.py):

role: str — validated against {system, user, assistant, tool, function, developer}
content: str|list|None
name: str (optional)
tool_calls: list (optional)
tool_call_id: str (optional)

Config: extra="allow" — unknown fields passed through to providers.

Execution Flow (5+ Levels Deep)

Level 1: Route Handler — `chat_completions()`

@router.post("/v1/chat/completions")
async def chat_completions(
    request: ProxyRequest,
    http_request: Request,
    background_tasks: BackgroundTasks,
    api_key: str | None = Depends(get_optional_api_key),
):
    ...  # body: see the anonymous and authenticated paths below

Anonymous path (no API key):

validate_anonymous_request(request, http_request) — IP rate limit + model whitelist check
Detect provider + transform model ID
Route to anonymous provider handler

Authenticated path (API key present):

Parallel auth: asyncio.gather(get_user_task, get_api_key_id_task, get_trial_task)
Trial validation: validate_trial_request() if on trial plan
Plan check: check_user_plan() — verify subscription active
Rate limiting: check_rate_limit() — Redis-based per-key/per-user limits
Credit check: check_sufficient_credits() — Supabase balance lookup
Auto web search injection (if enabled)
Router detection: auto/general/code router prefix
Provider detection + model ID transformation
Failover chain construction
Health-based provider selection
Streaming or non-streaming dispatch
Background tasks: credit deduction, activity log, chat history, health capture

Level 2: Anonymous Validation — `validate_anonymous_request()`

Located in src/services/anonymous_rate_limiter.py:

Redis operation: INCR anon_rate:{ip_hash}:{minute_bucket} with TTL 60s
Model whitelist: Checks model ID against ANONYMOUS_ALLOWED_MODELS list
Raises HTTP 429 if rate limit exceeded, HTTP 403 if model not in whitelist
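The INCR-plus-TTL pattern is a fixed-window counter. The sketch below replaces Redis with an in-memory dict so it runs standalone. The truncated SHA-256 used for `ip_hash`, the `int(now // 60)` minute bucket, and the limit value are assumptions, not taken from anonymous_rate_limiter.py.

```python
import hashlib
import time
from typing import Callable


class FixedWindowLimiter:
    """In-memory stand-in for Redis INCR on anon_rate:{ip_hash}:{minute_bucket}."""

    def __init__(self, limit_per_minute: int, clock: Callable[[], float] = time.time):
        self.limit = limit_per_minute
        self.clock = clock
        self._counts: dict[str, int] = {}

    def _key(self, ip: str) -> str:
        ip_hash = hashlib.sha256(ip.encode()).hexdigest()[:16]  # assumed hashing scheme
        minute_bucket = int(self.clock() // 60)  # assumed bucket: whole minutes since epoch
        return f"anon_rate:{ip_hash}:{minute_bucket}"

    def hit(self, ip: str) -> bool:
        """Record one request; return False once this window's count exceeds the limit."""
        key = self._key(ip)
        self._counts[key] = self._counts.get(key, 0) + 1  # equivalent of INCR
        return self._counts[key] <= self.limit
```

Because the minute bucket is part of the key, a new minute starts a fresh counter on its own; in Redis the 60s TTL mainly garbage-collects stale keys.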

Level 2: Parallel Auth — `asyncio.gather()`

Three concurrent tasks:

user_task = get_user_by_api_key(api_key)              # Supabase: users table
api_key_id_task = get_api_key_id(api_key)             # Supabase: api_keys table
trial_task = get_user_trial_status(user_id)           # Supabase: trials table
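asyncio.gather() starts all three awaitables concurrently and returns their results in argument order, so the wall-clock cost is roughly that of the slowest lookup rather than the sum. The stubs below are hypothetical stand-ins that sleep instead of querying Supabase. The trial stub takes the API key (an assumption) so that all three can start at the same time.

```python
import asyncio
import time


# Hypothetical stubs standing in for the Supabase lookups; each sleeps 0.1s.
async def fake_get_user(api_key: str) -> dict:
    await asyncio.sleep(0.1)
    return {"id": "user-1", "api_key": api_key}


async def fake_get_api_key_id(api_key: str) -> int:
    await asyncio.sleep(0.1)
    return 42


async def fake_get_trial_status(api_key: str) -> dict:
    await asyncio.sleep(0.1)
    return {"is_trial": False}


async def parallel_auth(api_key: str) -> tuple[dict, int, dict]:
    # gather() schedules all three coroutines at once and preserves argument order.
    user, api_key_id, trial = await asyncio.gather(
        fake_get_user(api_key),
        fake_get_api_key_id(api_key),
        fake_get_trial_status(api_key),
    )
    return user, api_key_id, trial


def run_and_time(api_key: str) -> tuple[tuple[dict, int, dict], float]:
    start = time.perf_counter()
    results = asyncio.run(parallel_auth(api_key))
    return results, time.perf_counter() - start
```

`run_and_time("gw_test")` should report about 0.1s rather than 0.3s.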

Level 2: Rate Limiting — `check_rate_limit()`

Located in src/services/rate_limiting.py:

# Redis pipeline
pipe = redis.pipeline()
pipe.incr(f"rate_limit:{api_key_id}:{minute_bucket}")
pipe.expire(f"rate_limit:{api_key_id}:{minute_bucket}", 60)
pipe.get(f"rate_limit_config:{user_id}")
results = await pipe.execute()

Key: rate_limit:{api_key_id}:{minute_bucket}
Config key: rate_limit_config:{user_id} (custom limits)
Default limits from plan tier
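A sketch of how the three pipeline results might be interpreted. The JSON config format with an `rpm` field and the `PLAN_DEFAULT_RPM` values are hypothetical; the document only states that custom limits live under rate_limit_config:{user_id} and that defaults come from the plan tier.

```python
import json

# Hypothetical per-plan defaults (requests per minute).
PLAN_DEFAULT_RPM = {"free": 20, "pro": 300}


def evaluate_rate_limit(results: list, plan: str) -> tuple[bool, int]:
    """Interpret [INCR result, EXPIRE result, GET config] from the Redis pipeline.

    Returns (allowed, effective_limit).
    """
    current_count, _expire_ok, raw_config = results
    limit = PLAN_DEFAULT_RPM.get(plan, PLAN_DEFAULT_RPM["free"])
    if raw_config:  # a custom config overrides the plan-tier default
        limit = int(json.loads(raw_config).get("rpm", limit))
    return current_count <= limit, limit
```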

Level 2: Router Logic

model_id = request.model
if model_id.startswith("router:"):
    router_type = model_id.split(":")[1]  # "auto", "general", "code"
    # Select actual model via NotDiamond/benchmark routing
    actual_model = await select_router_model(router_type, request.messages)
    request.model = actual_model

Level 3: Provider Detection — `detect_provider_from_model_id()`

Located in src/services/model_transformations.py:

def detect_provider_from_model_id(model_id: str) -> str:
    prefix = model_id.split("/")[0]
    return PROVIDER_PREFIX_MAP.get(prefix, "openrouter")  # Default: openrouter

Maps prefixes like featherless/, chutes/, deepinfra/ to provider slugs.
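A standalone version of the two string checks (router prefix, provider prefix), using a hypothetical three-entry `PROVIDER_PREFIX_MAP`; the real map covers far more providers. `parse_router` is a hypothetical helper wrapping the Level 2 `split(":")` logic.

```python
from typing import Optional

# Hypothetical subset of the prefix map in src/services/model_transformations.py.
PROVIDER_PREFIX_MAP = {
    "featherless": "featherless",
    "chutes": "chutes",
    "deepinfra": "deepinfra",
}


def parse_router(model_id: str) -> Optional[str]:
    """Return the router type for 'router:<type>' model IDs, else None."""
    if not model_id.startswith("router:"):
        return None
    return model_id.split(":", 1)[1]


def detect_provider_from_model_id(model_id: str) -> str:
    prefix = model_id.split("/", 1)[0]
    # Unknown prefixes, including bare IDs with no slash, fall back to openrouter.
    return PROVIDER_PREFIX_MAP.get(prefix, "openrouter")
```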

Level 3: Failover Chain — `build_provider_failover_chain()`

Located in src/services/provider_failover.py:

def build_provider_failover_chain(primary_provider: str, model_id: str) -> list[str]:
    chain = [primary_provider]
    for fallback in PROVIDER_FAILOVER_MAP.get(primary_provider, []):
        if not is_circuit_open(fallback):  # Check circuit breaker
            chain.append(fallback)
    return chain

Redis: GET circuit_breaker:{provider} per fallback provider

Level 3: Streaming Handler — `stream_generator()`

async def stream_generator():
    first_chunk = True  # must be bound before the loop reads it
    async for chunk in provider_stream:
        # Track time-to-first-chunk
        if first_chunk:
            ttfc = time.time() - start_time
            track_time_to_first_chunk(provider, model, ttfc)  # Prometheus
            first_chunk = False

        # Normalize SSE chunk across providers
        normalized = StreamNormalizer.normalize(chunk, provider)
        yield f"data: {json.dumps(normalized)}\n\n"

    yield "data: [DONE]\n\n"

    # Non-blocking background post-processing
    asyncio.create_task(_process_stream_completion_background(
        user_id, api_key_id, model, tokens, cost, session_id
    ))
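A runnable sketch of the streaming shape: the first chunk records TTFC, each chunk is serialized as an SSE `data:` line, and the stream ends with `[DONE]`. The fake provider stream and the `on_ttfc` callback (standing in for track_time_to_first_chunk) are assumptions, and StreamNormalizer is omitted.

```python
import asyncio
import json
import time
from typing import AsyncIterator, Callable


async def fake_provider_stream() -> AsyncIterator[dict]:
    # Hypothetical provider chunks in OpenAI delta format.
    for text in ["Hel", "lo"]:
        await asyncio.sleep(0)
        yield {"choices": [{"delta": {"content": text}}]}


async def stream_generator(
    provider_stream: AsyncIterator[dict],
    on_ttfc: Callable[[float], None],
) -> AsyncIterator[str]:
    start_time = time.monotonic()
    first_chunk = True
    async for chunk in provider_stream:
        if first_chunk:
            on_ttfc(time.monotonic() - start_time)  # record time-to-first-chunk once
            first_chunk = False
        yield f"data: {json.dumps(chunk)}\n\n"
    yield "data: [DONE]\n\n"


async def collect() -> tuple[list[str], list[float]]:
    ttfcs: list[float] = []
    lines = [line async for line in stream_generator(fake_provider_stream(), ttfcs.append)]
    return lines, ttfcs
```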

Level 4: Background Post-Processing — `_process_stream_completion_background()`

async def _process_stream_completion_background(...):
    # 1. Deduct credits
    await deduct_credits(user_id, cost)
    # Supabase: INSERT INTO credit_transactions + UPDATE users balance

    # 2. Log activity
    await log_activity(user_id, api_key_id, model, tokens, cost)
    # Supabase: INSERT INTO activity (user_id, api_key_id, model, ...)

    # 3. Save chat history
    if session_id:
        await save_chat_message(session_id, "assistant", response_content, model, tokens)
        # Supabase: INSERT INTO chat_messages + UPDATE chat_sessions

    # 4. Capture model health
    capture_model_health(model, provider, success=True, latency=latency)
    # Redis: LPUSH model_health:{model} + EXPIRE

Prometheus Metrics

Metric	Type	Labels	When Recorded
`model_inference_requests`	Counter	provider, model, status (success/error)	Every request completion
`model_inference_duration`	Histogram	provider, model	Request duration (buckets: 0.1–60s)
`tokens_used`	Counter	provider, model, token_type (prompt/completion)	After completion
`credits_used`	Counter	provider, model	After credit deduction
`api_cost_usd_total`	Counter	provider, model	After cost calculation
`api_cost_per_request`	Histogram	provider, model	Per-request cost distribution

TTFC (time-to-first-chunk) tracked via track_time_to_first_chunk() (custom Prometheus metric).

Redis Operations

Operation	Key Pattern	Purpose
INCR + EXPIRE	`anon_rate:{ip_hash}:{minute}`	Anonymous rate limiting
INCR + EXPIRE	`rate_limit:{api_key_id}:{minute}`	Authenticated rate limiting
GET	`rate_limit_config:{user_id}`	Custom rate limit config
GET	`circuit_breaker:{provider}`	Provider circuit breaker state
GET	`model_catalog:{provider}`	Model metadata lookup
LPUSH + EXPIRE	`model_health:{model}`	Health data recording

Supabase Operations

Table	Operation	Trigger
`users`	SELECT by API key	Auth
`api_keys`	SELECT by key hash	Auth
`trials`	SELECT by user_id	Trial validation
`plans`	SELECT by user_id	Plan check
`rate_limits`	SELECT by user_id	Custom rate limits
`credit_transactions`	INSERT	Credit deduction
`users`	UPDATE `credit_balance`	Credit deduction
`activity`	INSERT	Activity logging
`chat_sessions`	INSERT/UPDATE	Chat history
`chat_messages`	INSERT	Chat history
`model_health`	INSERT	Health capture

Error Handling

Scenario	HTTP Status	Response
No API key + model not in whitelist	403	`{"detail": "Model not available for anonymous use"}`
Anonymous rate limit exceeded	429	`{"detail": "Rate limit exceeded"}`
Insufficient credits	402	`{"detail": "Insufficient credits"}`
Rate limit exceeded	429	`{"detail": "Rate limit exceeded. Limit: X RPM"}`
All providers in failover fail	502	`{"detail": "All providers failed"}`
Provider timeout	504	`{"detail": "Request timed out"}`
Trial expired	403	`{"detail": "Trial expired"}`
Invalid messages	422	FastAPI/Pydantic validation

Provider Routing Registry (PROVIDER_ROUTING)

Maps provider slugs to async handler functions:

PROVIDER_ROUTING = {
    "openrouter": {"request": openrouter_request, "stream": openrouter_stream, ...},
    "featherless": {"request": featherless_request, "stream": featherless_stream, ...},
    "chutes": {...}, "deepinfra": {...}, "fireworks": {...},
    "together": {...}, "groq": {...}, "cerebras": {...},
    # ... 25+ total providers
}
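A sketch of how a dispatcher might walk the failover chain using a PROVIDER_ROUTING-style registry. The handlers, the `ProviderError` and `AllProvidersFailed` exceptions, and `dispatch` itself are hypothetical; only the "all providers failed" outcome corresponds to the documented 502 row.

```python
import asyncio  # used to run dispatch(), e.g. asyncio.run(dispatch([...], "m", []))
from typing import Awaitable, Callable


class ProviderError(Exception):
    """Hypothetical error raised by a provider handler."""


class AllProvidersFailed(Exception):
    """Would map to the documented 502 {"detail": "All providers failed"}."""


async def flaky_request(model: str, messages: list) -> dict:
    raise ProviderError("upstream 503")


async def ok_request(model: str, messages: list) -> dict:
    return {"model": model, "choices": [{"message": {"role": "assistant", "content": "hi"}}]}


# Hypothetical registry in the same shape as PROVIDER_ROUTING.
ROUTING: dict[str, dict[str, Callable[..., Awaitable[dict]]]] = {
    "featherless": {"request": flaky_request},
    "openrouter": {"request": ok_request},
}


async def dispatch(chain: list[str], model: str, messages: list) -> tuple[str, dict]:
    errors: list[str] = []
    for provider in chain:
        handler = ROUTING.get(provider, {}).get("request")
        if handler is None:
            errors.append(f"{provider}: no handler")
            continue
        try:
            return provider, await handler(model, messages)
        except ProviderError as exc:
            errors.append(f"{provider}: {exc}")  # fall through to the next provider
    raise AllProvidersFailed("; ".join(errors))
```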

Issue: #1691

Deep-Dive API Documentation: GET /v1/chat/sessions

Endpoint Overview

Method: GET
Path: /v1/chat/sessions
Handler: get_sessions() in src/routes/chat_history.py
Auth: Required — API key via Depends(get_api_key)
Purpose: Returns a paginated list of the authenticated user's chat sessions, ordered by most recently updated. Each session includes basic metadata (id, title, model, timestamps) without message content.

Query Parameters

Parameter	Type	Default	Validation	Description
`limit`	int	20	ge=1, le=100	Max sessions to return
`offset`	int	0	ge=0	Pagination offset

Execution Flow (3+ Levels Deep)

Level 1: Route Handler — `get_sessions()`

@router.get("/sessions")
async def get_sessions(
    limit: int = Query(20, ge=1, le=100),
    offset: int = Query(0, ge=0),
    api_key: str = Depends(get_api_key),
):
    user = await get_user(api_key)
    if not user:
        raise HTTPException(status_code=401, detail="User not found")

    sessions = await get_user_chat_sessions(
        user_id=user["id"],
        limit=limit,
        offset=offset,
    )
    return ChatSessionsListResponse(
        success=True,
        data=sessions,
        count=len(sessions),
        message=f"Found {len(sessions)} sessions",
    )

Level 2: User Lookup — `get_user(api_key)`

Located in src/services/user_lookup_cache.py:

async def get_user(api_key: str) -> dict | None:
    # Check in-process LRU cache first
    cached = _user_cache.get(api_key)
    if cached:
        return cached

    # Fetch from DB
    user = await get_user_by_api_key(api_key)
    if user:
        _user_cache[api_key] = user  # Cache for TTL
    return user

Cache: cachetools.TTLCache, max 512 entries, TTL 300s (5 minutes). Reduces Supabase queries by ~95% for repeat requests from the same API key.
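The stdlib-only class below imitates the documented cachetools.TTLCache(maxsize=512, ttl=300) closely enough to show the hit/miss/expiry flow of get_user() in isolation. `SimpleTTLCache` is a hypothetical name, and eviction here is plain LRU; cachetools additionally purges expired entries before evicting.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Optional


class SimpleTTLCache:
    """Stdlib stand-in for cachetools.TTLCache(maxsize=512, ttl=300)."""

    def __init__(self, maxsize: int = 512, ttl: float = 300.0,
                 timer: Callable[[], float] = time.monotonic):
        self.maxsize, self.ttl, self.timer = maxsize, ttl, timer
        self._data: "OrderedDict[str, tuple[Any, float]]" = OrderedDict()

    def get(self, key: str) -> Optional[Any]:
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self.timer() >= expires_at:  # expired entries behave like a miss
            del self._data[key]
            return None
        self._data.move_to_end(key)  # mark as recently used
        return value

    def set(self, key: str, value: Any) -> None:
        self._data[key] = (value, self.timer() + self.ttl)
        self._data.move_to_end(key)
        while len(self._data) > self.maxsize:
            self._data.popitem(last=False)  # evict the least recently used entry
```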

Supabase Query (on cache miss):

key_result = (
    supabase.table("api_keys")
    .select("user_id, is_active")
    .eq("key_hash", hmac_sha256(api_key))
    .eq("is_active", True)
    .single()
    .execute()
)
user_id = key_result.data["user_id"]
# Then:
user_result = (
    supabase.table("users")
    .select("*")
    .eq("id", user_id)
    .single()
    .execute()
)

Level 2: Session Retrieval — `get_user_chat_sessions()`

Located in src/db/chat_history.py:

@with_retry(max_attempts=3, initial_delay=0.1, max_delay=2.0)
async def get_user_chat_sessions(
    user_id: str,
    limit: int = 20,
    offset: int = 0,
) -> list[dict]:
    result = (
        supabase.table("chat_sessions")
        .select("*")
        .eq("user_id", user_id)
        .eq("is_active", True)
        .order("updated_at", desc=True)
        .range(offset, offset + limit - 1)
        .execute()
    )
    return result.data or []

Supabase Query:

Table: chat_sessions
Operation: SELECT *
Filters: user_id = {user_id} AND is_active = True
Order: updated_at DESC
Pagination: .range(offset, offset + limit - 1) (Supabase server-side)
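PostgREST ranges are zero-based and inclusive at both ends, which is why the end index is `offset + limit - 1`. A hypothetical helper makes the arithmetic explicit:

```python
def page_to_range(limit: int, offset: int) -> tuple[int, int]:
    """Convert limit/offset into the inclusive (start, end) pair passed to .range()."""
    if limit < 1 or offset < 0:
        raise ValueError("limit must be >= 1 and offset >= 0")
    return offset, offset + limit - 1
```

With the defaults (limit=20, offset=0) the call becomes .range(0, 19); the second page (offset=20) is .range(20, 39).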

@with_retry decorator:

Max attempts: 3
Initial delay: 0.1s (exponential backoff)
Max delay: 2.0s
Retries on: RemoteProtocolError, ConnectError, ReadTimeout

Level 3: Retry Wrapper — `_execute_with_connection_retry()`

import asyncio

from httpx import ConnectError, ReadTimeout, RemoteProtocolError  # assumed source of these errors


async def _execute_with_connection_retry(func, *args, max_retries=3, **kwargs):
    for attempt in range(max_retries):
        try:
            return await func(*args, **kwargs)
        except (RemoteProtocolError, ConnectError, ReadTimeout):
            if attempt == max_retries - 1:
                raise  # final attempt: propagate the error
            delay = 0.1 * (2 ** attempt)  # 0.1s, then 0.2s for max_retries=3
            await asyncio.sleep(delay)
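The backoff schedule can be computed directly. With three attempts the wrapper sleeps only twice, because the third failure re-raises instead of sleeping. The `max_delay` cap is borrowed from the @with_retry parameters; how the decorator applies it is an assumption, and `backoff_schedule` is a hypothetical helper.

```python
def backoff_schedule(max_attempts: int = 3, initial_delay: float = 0.1,
                     max_delay: float = 2.0) -> list[float]:
    """Delays slept between attempts; the last failed attempt raises, so no sleep follows it."""
    return [min(initial_delay * (2 ** attempt), max_delay) for attempt in range(max_attempts - 1)]
```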

No Redis Operations

This endpoint does not use Redis. User lookup uses in-process TTLCache only.

Supabase Operations

Table	Operation	Columns	Filters	Notes
`api_keys`	SELECT	`user_id, is_active`	`key_hash = ?` AND `is_active = True`	On user cache miss
`users`	SELECT	`*`	`id = ?`	On user cache miss
`chat_sessions`	SELECT	`*`	`user_id = ?` AND `is_active = True` ORDER BY `updated_at` DESC	Always

Response Schema: `ChatSessionsListResponse`

Defined in src/schemas/chat.py:

{
  "success": true,
  "count": 3,
  "message": "Found 3 sessions",
  "data": [
    {
      "id": "550e8400-e29b-41d4-a716-446655440000",
      "user_id": "user-uuid-here",
      "title": "Python debugging help",
      "model": "openrouter/meta-llama/llama-3.1-70b-instruct",
      "created_at": "2026-03-04T08:00:00Z",
      "updated_at": "2026-03-04T09:30:00Z",
      "is_active": true,
      "messages": []
    }
  ]
}

ChatSession Schema fields (src/schemas/chat.py):

id: UUID string
user_id: UUID string
title: str or None
model: str or None
created_at: datetime string
updated_at: datetime string
is_active: bool (default True)
messages: list[ChatMessage] (empty for list endpoint — populated only in detail endpoint)

Error Handling

Scenario	HTTP Status	Response
Missing API key	401	`{"detail": "API key required"}`
Invalid API key	401	`{"detail": "Invalid API key"}`
User not found	401	`{"detail": "User not found"}`
No sessions exist	200	`{"success": true, "count": 0, "data": []}`
DB connection error (after retries)	500	Exception propagated
`limit` > 100	422	FastAPI validation error

Performance Characteristics

User cache hit: ~1ms (in-process TTLCache lookup)
User cache miss: ~50–200ms (2 Supabase queries for key lookup + user)
Session query: ~20–100ms (indexed by user_id + is_active)
Total warm path: ~25–110ms
Retry overhead: Up to 0.3s of backoff sleep on transient errors (3 attempts means 2 sleeps: 0.1s + 0.2s), plus the time spent on the failed attempts themselves
Pagination: Server-side via Supabase .range() — no Python-side slicing

chat_sessions Table Schema

Key columns:

id UUID PRIMARY KEY
user_id UUID REFERENCES users(id)
title TEXT
model TEXT
is_active BOOLEAN DEFAULT true
created_at TIMESTAMPTZ DEFAULT now()
updated_at TIMESTAMPTZ DEFAULT now()

Index: (user_id, is_active, updated_at DESC) for efficient pagination.

Issue: #1692

Deep-Dive API Documentation: GET /v1/chat/sessions/{session_id}

Endpoint Overview

Method: GET
Path: /v1/chat/sessions/{session_id}
Handler: get_session() in src/routes/chat_history.py
Auth: Required — API key via Depends(get_api_key)
Purpose: Returns a single chat session with its full message history. The session must belong to the authenticated user.

Path Parameters

Parameter	Type	Description
`session_id`	str	UUID of the chat session

Execution Flow (3+ Levels Deep)

Level 1: Route Handler — `get_session()`

@router.get("/sessions/{session_id}")
async def get_session(
    session_id: str,
    api_key: str = Depends(get_api_key),
):
    user = await get_user(api_key)
    if not user:
        raise HTTPException(status_code=401, detail="User not found")

    session = await get_chat_session(
        session_id=session_id,
        user_id=user["id"],
    )
    if not session:
        raise HTTPException(status_code=404, detail="Session not found")

    return ChatSessionResponse(
        success=True,
        data=session,
        message="Session retrieved successfully",
    )

Resolves user from API key (with in-process cache)
Calls get_chat_session(session_id, user_id) — both filters applied for ownership check
Returns 404 if session not found or doesn't belong to user

Level 2: User Lookup — `get_user(api_key)`

Located in src/services/user_lookup_cache.py:

In-process TTLCache: max 512 entries, TTL 300s
Cache miss Supabase queries: api_keys SELECT + users SELECT

Level 2: Session + Messages Fetch — `get_chat_session()`

Located in src/db/chat_history.py:

@with_retry(max_attempts=3, initial_delay=0.1, max_delay=2.0)
async def get_chat_session(session_id: str, user_id: str) -> dict | None:
    # Query 1: Fetch session
    session_result = (
        supabase.table("chat_sessions")
        .select("*")
        .eq("id", session_id)
        .eq("user_id", user_id)   # Ownership enforcement
        .eq("is_active", True)
        .single()
        .execute()
    )

    if not session_result.data:
        return None

    session = session_result.data

    # Query 2: Fetch messages for this session
    messages_result = (
        supabase.table("chat_messages")
        .select("*")
        .eq("session_id", session_id)
        .order("created_at", desc=False)  # Chronological order
        .execute()
    )

    session["messages"] = messages_result.data or []
    return session

Two sequential Supabase queries per request:

Fetch session metadata (with ownership check)
Fetch all messages for that session in chronological order

Level 3: Ownership Enforcement

The user_id filter in the session query (eq("user_id", user_id)) serves as the authorization check. A session belonging to a different user will return None → HTTP 404, preventing information leakage.
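An in-memory sketch of why a foreign or soft-deleted session yields 404: all three predicates are applied in one lookup, so the caller cannot distinguish "not yours" from "does not exist". The rows and helper names are hypothetical.

```python
from typing import Optional

# Hypothetical rows standing in for the chat_sessions table.
SESSIONS = [
    {"id": "s1", "user_id": "alice", "is_active": True},
    {"id": "s2", "user_id": "bob", "is_active": True},
    {"id": "s3", "user_id": "alice", "is_active": False},
]


def find_session(session_id: str, user_id: str) -> Optional[dict]:
    # Same predicates as the .eq() chain: id, owner, and not soft-deleted.
    for row in SESSIONS:
        if row["id"] == session_id and row["user_id"] == user_id and row["is_active"]:
            return row
    return None


def status_for(session_id: str, user_id: str) -> int:
    return 200 if find_session(session_id, user_id) else 404
```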

Level 3: Retry Wrapper

@with_retry(max_attempts=3, initial_delay=0.1, max_delay=2.0):

Retries on RemoteProtocolError, ConnectError, ReadTimeout
Exponential backoff: 0.1s after the first failure, 0.2s after the second
After max attempts, exception propagates (→ HTTP 500)

Supabase Operations

Table	Operation	Columns	Filters	Notes
`api_keys`	SELECT	`user_id, is_active`	`key_hash = ?` AND `is_active = True`	User cache miss only
`users`	SELECT	`*`	`id = ?`	User cache miss only
`chat_sessions`	SELECT	`*`	`id = ?` AND `user_id = ?` AND `is_active = True`	Always
`chat_messages`	SELECT	`*`	`session_id = ?` ORDER BY `created_at` ASC	Always (if session found)

No Redis Operations

This endpoint does not use Redis. All data from Supabase, user lookup from in-process TTLCache.

Response Schema: `ChatSessionResponse`

Defined in src/schemas/chat.py:

{
  "success": true,
  "message": "Session retrieved successfully",
  "data": {
    "id": "550e8400-e29b-41d4-a716-446655440000",
    "user_id": "user-uuid-here",
    "title": "Python debugging help",
    "model": "openrouter/meta-llama/llama-3.1-70b-instruct",
    "created_at": "2026-03-04T08:00:00Z",
    "updated_at": "2026-03-04T09:30:00Z",
    "is_active": true,
    "messages": [
      {
        "id": "msg-uuid-1",
        "session_id": "550e8400-e29b-41d4-a716-446655440000",
        "role": "user",
        "content": "Why is my Python code throwing a KeyError?",
        "model": null,
        "tokens": 0,
        "created_at": "2026-03-04T08:00:05Z"
      },
      {
        "id": "msg-uuid-2",
        "session_id": "550e8400-e29b-41d4-a716-446655440000",
        "role": "assistant",
        "content": "A KeyError occurs when you try to access a dictionary key that doesn't exist...",
        "model": "openrouter/meta-llama/llama-3.1-70b-instruct",
        "tokens": 147,
        "created_at": "2026-03-04T08:00:08Z"
      }
    ]
  }
}

ChatMessage Schema fields:

id: UUID string
session_id: UUID string
role: str (user, assistant, system, tool)
content: str
model: str or None (which model generated the message)
tokens: int (default 0)
created_at: datetime string

Error Handling

Scenario	HTTP Status	Response
Missing API key	401	`{"detail": "API key required"}`
Invalid API key	401	`{"detail": "Invalid API key"}`
User not found	401	`{"detail": "User not found"}`
Session not found	404	`{"detail": "Session not found"}`
Session belongs to different user	404	`{"detail": "Session not found"}` (same as not found — no info leakage)
Deleted session (`is_active=False`)	404	`{"detail": "Session not found"}`
DB error after retries	500	Exception propagated

Performance Characteristics

User cache hit: ~1ms (TTLCache)
User cache miss: ~50–200ms (2 Supabase queries)
Session query: ~20–80ms (indexed by id + user_id)
Messages query: ~20–200ms (depends on message count; indexed by session_id + created_at)
Total warm path: ~45–285ms
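Message volume: No pagination on messages; the query requests every message in the session (a server-side row cap on the Supabase API, 1,000 rows by default, would truncate very long sessions)
Large sessions: Sessions with 1000+ messages may produce large responses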

chat_messages Table Schema

Key columns:

id UUID PRIMARY KEY
session_id UUID REFERENCES chat_sessions(id)
role TEXT NOT NULL
content TEXT NOT NULL
model TEXT
tokens INTEGER DEFAULT 0
created_at TIMESTAMPTZ DEFAULT now()

Index: (session_id, created_at ASC) for efficient chronological message retrieval.

Issue: #1693

Deep-Dive API Documentation: GET /v1/chat/stats

Endpoint Overview

Method: GET
Path: /v1/chat/stats
Handler: get_stats() in src/routes/chat_history.py
Auth: Required — API key via Depends(get_api_key)
Purpose: Returns aggregate statistics about the authenticated user's chat history: total session count, total message count, and total tokens used across all sessions.

Query Parameters

None.

Execution Flow (3+ Levels Deep)

Level 1: Route Handler — `get_stats()`

@router.get("/stats")
async def get_stats(
    api_key: str = Depends(get_api_key),
):
    user = await get_user(api_key)
    if not user:
        raise HTTPException(status_code=401, detail="User not found")

    stats = await get_chat_session_stats(user_id=user["id"])
    return ChatSessionStatsResponse(
        success=True,
        stats=stats,
        message="Chat statistics retrieved successfully",
    )

Level 2: User Lookup — `get_user(api_key)`

Located in src/services/user_lookup_cache.py:

In-process TTLCache: max 512 entries, TTL 300s
Cache miss: 2 Supabase queries (api_keys + users tables)

Level 2: Stats Aggregation — `get_chat_session_stats()`

Located in src/db/chat_history.py:

@with_retry(max_attempts=3, initial_delay=0.1, max_delay=2.0)
async def get_chat_session_stats(user_id: str) -> dict:
    # Query 1: Count active sessions
    sessions_result = (
        supabase.table("chat_sessions")
        .select("id", count="exact")
        .eq("user_id", user_id)
        .eq("is_active", True)
        .execute()
    )
    session_count = sessions_result.count or 0

    # Query 2: Count total messages across all user's sessions
    messages_result = (
        supabase.table("chat_messages")
        .select("id, chat_sessions!inner(user_id)", count="exact")
        .eq("chat_sessions.user_id", user_id)  # filter on the inner-joined session
        .execute()
    )
    message_count = messages_result.count or 0

    # Query 3: Sum total tokens across all user's messages
    tokens_result = (
        supabase.table("chat_messages")
        .select("tokens, chat_sessions!inner(user_id)")
        .eq("chat_sessions.user_id", user_id)  # JOIN on chat_sessions
        .execute()
    )
    total_tokens = sum(
        row.get("tokens", 0) or 0
        for row in (tokens_result.data or [])
    )

    return {
        "total_sessions": session_count,
        "total_messages": message_count,
        "total_tokens": total_tokens,
    }

Three sequential Supabase queries:

COUNT of active sessions for user
COUNT of messages via JOIN with chat_sessions (filtered by user_id)
SUM of tokens via JOIN with chat_sessions (fetches all rows, sums in Python)

Level 3: Retry Wrapper

@with_retry(max_attempts=3, initial_delay=0.1, max_delay=2.0):

Retries all 3 queries as a unit on transient connection errors
Exponential backoff: 0.1s, then 0.2s (3 attempts, so 2 sleeps)

Level 3: Token Sum Implementation

The token sum is computed in the Python application layer (not with a SQL SUM):

total_tokens = sum(
    row.get("tokens", 0) or 0
    for row in (tokens_result.data or [])
)

This fetches ALL message rows and sums locally. For users with many messages, this can return large amounts of data.

Supabase Operations

Table	Operation	Columns	Filters	Count Mode
`api_keys`	SELECT	`user_id, is_active`	`key_hash = ?` AND `is_active = True`	User cache miss
`users`	SELECT	`*`	`id = ?`	User cache miss
`chat_sessions`	SELECT (count)	`id`	`user_id = ?` AND `is_active = True`	`count="exact"` (Supabase COUNT)
`chat_messages`	SELECT (count)	`id`	JOIN `chat_sessions.user_id = ?`	`count="exact"` (Supabase COUNT)
`chat_messages`	SELECT	`tokens, chat_sessions!inner(user_id)`	JOIN `chat_sessions.user_id = ?`	Fetch all for Python sum

No Redis Operations

This endpoint does not use Redis. Stats are computed fresh from Supabase on each request.

Response Schema: `ChatSessionStatsResponse`

Defined in src/schemas/chat.py:

{
  "success": true,
  "message": "Chat statistics retrieved successfully",
  "stats": {
    "total_sessions": 47,
    "total_messages": 1293,
    "total_tokens": 842750
  }
}

Stats fields:

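total_sessions: Count of active (is_active=True) chat sessions
total_messages: Total message count across all of the user's sessions; the message query does not filter on is_active, so messages in inactive sessions are included
total_tokens: Sum of the tokens column across all of the user's chat_messages, with the same is_active caveat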

Error Handling

Scenario	HTTP Status	Response
Missing API key	401	`{"detail": "API key required"}`
Invalid API key	401	`{"detail": "Invalid API key"}`
User not found	401	`{"detail": "User not found"}`
No sessions/messages	200	`{"stats": {"total_sessions": 0, "total_messages": 0, "total_tokens": 0}}`
DB error after retries	500	Exception propagated
Message `tokens` field is None	200	The `or 0` guard counts it as 0

Performance Characteristics

User cache hit: ~1ms (TTLCache)
User cache miss: ~50–200ms (2 Supabase queries)
Session count query: ~20–50ms (COUNT with user_id index)
Message count query: ~30–100ms (COUNT with JOIN)
Token sum query: ~50ms–5s+ (fetches ALL message rows with tokens; grows linearly with message count)
Total warm path: ~100–400ms for typical users
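Scalability concern: The token sum query pulls every message row into Python memory, so users with 10,000+ messages may see slow responses. If the project keeps Supabase's default API row cap (1,000 rows per request), the unpaginated fetch would also truncate silently and undercount total_tokens. A SQL SUM(tokens) aggregation avoids both problems.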

Recommended Optimization (not yet implemented)

Replace the Python-side token sum with a Supabase RPC call:

SELECT SUM(m.tokens)
FROM chat_messages m
JOIN chat_sessions s ON m.session_id = s.id
WHERE s.user_id = $1 AND s.is_active = TRUE;

This would reduce data transfer from O(n messages) to a single integer.
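To sanity-check the proposed aggregate without a Supabase project, the same SQL can be run against an in-memory SQLite database. The schema is reduced to the relevant columns and the sample rows are made up. COALESCE is an addition here so that a user with no messages gets 0 instead of NULL.

```python
import sqlite3


def total_tokens_for(conn: sqlite3.Connection, user_id: str) -> int:
    row = conn.execute(
        """
        SELECT COALESCE(SUM(m.tokens), 0)
        FROM chat_messages m
        JOIN chat_sessions s ON m.session_id = s.id
        WHERE s.user_id = ? AND s.is_active = 1
        """,
        (user_id,),
    ).fetchone()
    return row[0]


def build_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE chat_sessions (id TEXT PRIMARY KEY, user_id TEXT, is_active INTEGER);
        CREATE TABLE chat_messages (id TEXT PRIMARY KEY, session_id TEXT, tokens INTEGER);
        INSERT INTO chat_sessions VALUES ('s1', 'u1', 1), ('s2', 'u1', 0), ('s3', 'u2', 1);
        INSERT INTO chat_messages VALUES
            ('m1', 's1', 100), ('m2', 's1', NULL), ('m3', 's2', 50), ('m4', 's3', 7);
        """
    )
    return conn
```

SUM ignores the NULL in m2, and the is_active filter excludes m3 (50 tokens in the inactive session s2), so u1 totals 100. The current Python-side query has no is_active filter and would count m3, so adopting this SQL as written would also change the reported number.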

Schema Context

chat_sessions relevant columns:

id, user_id, is_active

chat_messages relevant columns:

id, session_id, tokens (INTEGER, stores token count for each message)

Session-to-messages relationship: chat_messages.session_id → chat_sessions.id
User-to-sessions relationship: chat_sessions.user_id → users.id

Issue: #1694

API Endpoint Documentation: GET /v1/chat/feedback

Handler: `get_my_feedback()` in `src/routes/chat_history.py`

1. Overview

Returns the authenticated user's feedback history with optional filtering by feedback type, session ID, and model name. Supports pagination via limit and offset query parameters.

Route: GET /v1/chat/feedback
Router prefix: /v1/chat
Tags: chat-history
Response model: MessageFeedbackListResponse
Auth: Required (Bearer token via get_api_key)

2. Query Parameters

Parameter	Type	Default	Validation	Description
`feedback_type`	`str | None`	`None`	None (any string accepted)	Filter by feedback type (`thumbs_up`, `thumbs_down`, `regenerate`)
`session_id`	`int | None`	`None`	None	Filter by chat session ID
`model`	`str | None`	`None`	None	Filter by model name
`limit`	`int`	`50`	`ge=1, le=100`	Max records to return
`offset`	`int`	`0`	`ge=0`	Pagination offset

3. Pydantic Schemas

`MessageFeedbackListResponse`

Field	Type	Default	Description
`success`	`bool`	required	Operation success flag
`data`	`list[MessageFeedback]`	required	List of feedback records
`count`	`int`	required	Number of records returned
`message`	`str | None`	`None`	Human-readable message

`MessageFeedback`

Field	Type	Default	Validation	Description
`id`	`int | None`	`None`	-	Feedback record ID
`session_id`	`int | None`	`None`	-	Associated session
`message_id`	`int | None`	`None`	-	Associated message
`user_id`	`int`	required	-	User who submitted
`feedback_type`	`Literal["thumbs_up","thumbs_down","regenerate"]`	required	Literal check	Feedback type
`rating`	`int | None`	`None`	`ge=1, le=5`	Star rating
`comment`	`str | None`	`None`	-	Text comment
`model`	`str | None`	`None`	-	Model name
`metadata`	`dict[str, Any] | None`	`None`	-	Additional context
`created_at`	`datetime | None`	`None`	-	Creation timestamp
`updated_at`	`datetime | None`	`None`	-	Last update timestamp

4. Dependency Chain

get_my_feedback()
├── get_api_key() [src/security/deps.py]
│   ├── HTTPBearer (extracts Bearer token)
│   ├── validate_api_key_security() [src/security/security.py]
│   │   └── Checks: active, expired, request limits, IP allowlist, domain
│   ├── get_user() [src/services/user_lookup_cache.py] → audit logging
│   └── audit_logger.log_api_key_usage()
├── get_user(api_key) [src/services/user_lookup_cache.py]
│   └── db_get_user() [src/db/users.py] (60s TTL in-memory cache)
└── get_user_feedback() [src/db/feedback.py]
    └── Supabase query on message_feedback table

5. Supabase Queries

Query: Get user feedback

Table: message_feedback
Operation: SELECT *
Filters:
- .eq("user_id", user_id) (always)
- .eq("feedback_type", feedback_type) (if provided)
- .eq("session_id", session_id) (if provided)
- .eq("model", model) (if provided)
Order: .order("created_at", desc=True)
Pagination: .range(offset, offset + limit - 1)
Retry: _execute_with_connection_retry (3 retries, exponential backoff 0.1s initial)
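The optional-filter pattern can be sketched over plain dicts: user_id always applies, the other filters only when not None, then newest-first ordering and the same window as .range(offset, offset + limit - 1). `filter_feedback` is a hypothetical name, not the function in src/db/feedback.py, and created_at is assumed to be an ISO-8601 string so that string order matches time order.

```python
from typing import Optional


def filter_feedback(rows: list[dict], user_id: int, feedback_type: Optional[str] = None,
                    session_id: Optional[int] = None, model: Optional[str] = None,
                    limit: int = 50, offset: int = 0) -> list[dict]:
    # user_id is always applied; the others only when provided, like the .eq() chain.
    filters = {"user_id": user_id, "feedback_type": feedback_type,
               "session_id": session_id, "model": model}
    active = {k: v for k, v in filters.items() if v is not None}
    matched = [r for r in rows if all(r.get(k) == v for k, v in active.items())]
    matched.sort(key=lambda r: r["created_at"], reverse=True)  # ORDER BY created_at DESC
    return matched[offset:offset + limit]  # same rows as .range(offset, offset + limit - 1)
```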

6. Redis Operations

None directly. The user lookup uses in-memory caching (60s TTL) in src/db/users.py, not Redis.

7. Prometheus Metrics

None directly emitted by this endpoint. Middleware-level metrics (request latency, status codes) apply via the global middleware pipeline.

8. Middleware Effects

Security Middleware (src/middleware/security_middleware.py): IP rate limiting, behavioral analysis, velocity mode
Sentry Middleware (src/middleware/sentry_middleware.py): Error tracking
Observability Middleware (src/middleware/observability_middleware.py): Request/response logging
Timeout Middleware (src/middleware/timeout_middleware.py): Request timeout
GZip Middleware (src/middleware/gzip_middleware.py): Response compression
Trace Middleware (src/middleware/trace_middleware.py): OpenTelemetry tracing

9. Error Handling

Error	Status	Condition
`HTTPException(401)`	401	Missing/invalid API key (from `get_api_key`)
`HTTPException(401)`	401	`get_user()` returns `None`
`HTTPException(500)`	500	Any unhandled exception

Error flow: All HTTPExceptions re-raised. Generic exceptions caught and wrapped in 500.

10. Mermaid Diagram

flowchart TD
    A[GET /v1/chat/feedback] --> B{Auth: get_api_key}
    B -->|Invalid/Missing| C[401 Unauthorized]
    B -->|Valid| D[get_user api_key]
    D -->|None| E[401 Invalid API key]
    D -->|User found| F[get_user_feedback]
    F --> G{Apply filters}
    G --> H[feedback_type filter?]
    G --> I[session_id filter?]
    G --> J[model filter?]
    H --> K[Query message_feedback table]
    I --> K
    J --> K
    K -->|Success| L[Return MessageFeedbackListResponse]
    K -->|DB Error| M[500 Internal Server Error]
    L --> N[200 OK with feedback list]

Issue: #1695

API Endpoint Documentation: GET /v1/chat/feedback/stats

Handler: `get_my_feedback_stats()` in `src/routes/chat_history.py`

1. Overview

Returns aggregated feedback statistics for the authenticated user over a configurable time period. Includes counts by type, average rating, thumbs up/down rates, and per-model breakdown.

Route: GET /v1/chat/feedback/stats
Router prefix: /v1/chat
Tags: chat-history
Response model: FeedbackStatsResponse
Auth: Required (Bearer token via get_api_key)

2. Query Parameters

Parameter	Type	Default	Validation	Description
`model`	`str | None`	`None`	None	Filter stats by model name
`days`	`int`	`30`	`ge=1, le=365`	Number of days to aggregate

3. Pydantic Schemas

`FeedbackStatsResponse`

Field	Type	Default	Description
`success`	`bool`	required	Operation success flag
`stats`	`dict[str, Any]`	required	Aggregated statistics dict
`message`	`str | None`	`None`	Human-readable message

Stats dict structure (computed in `get_feedback_stats()`):

Key	Type	Description
`total_feedback`	`int`	Total feedback count in period
`thumbs_up`	`int`	Count of thumbs_up feedback
`thumbs_down`	`int`	Count of thumbs_down feedback
`regenerate`	`int`	Count of regenerate feedback
`thumbs_up_rate`	`float`	Percentage (0-100), rounded to 2 decimals
`thumbs_down_rate`	`float`	Percentage (0-100), rounded to 2 decimals
`average_rating`	`float | None`	Average of 1-5 ratings, rounded to 2 decimals
`by_model`	`dict[str, dict]`	Per-model breakdown with thumbs_up/down/regenerate/total counts
`period_days`	`int`	The days parameter used

4. Dependency Chain

get_my_feedback_stats()
├── get_api_key() [src/security/deps.py]
│   ├── HTTPBearer → validate_api_key_security() → audit logging
│   └── Returns validated API key string
├── get_user(api_key) [src/services/user_lookup_cache.py]
│   └── db_get_user() [src/db/users.py] (60s TTL in-memory cache)
└── get_feedback_stats() [src/db/feedback.py]
    └── Supabase query on message_feedback table
    └── Python-side aggregation (counts, rates, averages, by_model grouping)

5. Supabase Queries

Query: Get all feedback for aggregation

Table: message_feedback
Operation: SELECT feedback_type, rating, model, created_at
Filters:
- .gte("created_at", from_date.isoformat()), where from_date = now - timedelta(days=days), truncated to midnight
- .eq("user_id", user_id) (optional in get_feedback_stats(); the route always passes it)
- .eq("model", model) (if provided)
No pagination: fetches all matching records for aggregation
Retry: _execute_with_connection_retry (3 retries, exponential backoff 0.1s)

Aggregation Logic (Python-side):

Counts: thumbs_up, thumbs_down, regenerate by feedback_type field
Rates: (count / total) * 100, rounded to 2 decimals
Average rating: sum(ratings) / len(ratings) for non-null ratings, rounded to 2 decimals
By-model: Groups by model field (uses "unknown" for null), counts per type
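A sketch of the aggregation over already-fetched rows, following the rules listed above. The function name, the 0.0 rate for an empty period, and the exact shape of each by_model bucket are assumptions; only the key names come from the stats table.

```python
from typing import Any, Optional


def compute_feedback_stats(rows: list[dict], days: int) -> dict[str, Any]:
    """Python-side aggregation in the shape of the documented stats dict (sketch)."""
    total = len(rows)
    counts = {"thumbs_up": 0, "thumbs_down": 0, "regenerate": 0}
    by_model: dict[str, dict[str, int]] = {}
    ratings: list[int] = []
    for row in rows:
        ftype = row.get("feedback_type")
        model = row.get("model") or "unknown"  # null models grouped under "unknown"
        bucket = by_model.setdefault(
            model, {"thumbs_up": 0, "thumbs_down": 0, "regenerate": 0, "total": 0})
        bucket["total"] += 1
        if ftype in counts:
            counts[ftype] += 1
            bucket[ftype] += 1
        if row.get("rating") is not None:
            ratings.append(row["rating"])

    def rate(n: int) -> float:
        return round(n / total * 100, 2) if total else 0.0  # avoid division by zero

    average_rating: Optional[float] = round(sum(ratings) / len(ratings), 2) if ratings else None
    return {
        "total_feedback": total,
        **counts,
        "thumbs_up_rate": rate(counts["thumbs_up"]),
        "thumbs_down_rate": rate(counts["thumbs_down"]),
        "average_rating": average_rating,
        "by_model": by_model,
        "period_days": days,
    }
```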

6. Redis Operations

None. User lookup uses in-memory cache only.

7. Prometheus Metrics

None directly. Standard middleware metrics apply.

8. Middleware Effects

Standard middleware pipeline (same set as GET /v1/chat/feedback): Security, Sentry, Observability, Timeout, GZip, and Trace middleware

9. Error Handling

Error	Status	Condition
`HTTPException(401)`	401	Invalid/missing API key
`HTTPException(401)`	401	`get_user()` returns `None`
`HTTPException(500)`	500	Any unhandled exception (DB errors, aggregation errors)

Error flow: HTTPExceptions re-raised directly. All other exceptions caught at route level → 500.

10. Mermaid Diagram

flowchart TD
    A[GET /v1/chat/feedback/stats] --> B{Auth: get_api_key}
    B -->|Invalid| C[401 Unauthorized]
    B -->|Valid| D[get_user api_key]
    D -->|None| E[401 Invalid API key]
    D -->|User found| F[get_feedback_stats user_id, model, days]
    F --> G[Query message_feedback table]
    G --> H[Fetch all records in date range]
    H --> I{Records found?}
    I -->|Yes| J[Count by feedback_type]
    J --> K[Calculate rates]
    K --> L[Calculate avg rating]
    L --> M[Group by model]
    M --> N[Build stats dict]
    I -->|No| O[Return zero stats]
    N --> P[Return FeedbackStatsResponse]
    O --> P
    G -->|DB Error| Q[500 Internal Server Error]

Issue: #1696

API Endpoint Documentation: GET /v1/chat/sessions/{session_id}/feedback

Handler: `get_session_feedback()` in `src/routes/chat_history.py`

1. Overview

Returns all feedback records for a specific chat session. Verifies session ownership before returning results.

Route: GET /v1/chat/sessions/{session_id}/feedback
Router prefix: /v1/chat
Tags: chat-history
Response model: MessageFeedbackListResponse
Auth: Required (Bearer token via get_api_key)

2. Path Parameters

Parameter	Type	Description
`session_id`	`int`	Chat session ID to get feedback for

3. Pydantic Schemas

`MessageFeedbackListResponse`

Field	Type	Default	Description
`success`	`bool`	required	Operation success flag
`data`	`list[MessageFeedback]`	required	List of feedback records
`count`	`int`	required	Number of records returned
`message`	`str \| None`	`None`	Human-readable message

(See issue #1694 for full MessageFeedback schema.)

4. Dependency Chain

get_session_feedback()
├── get_api_key() [src/security/deps.py]
├── get_user(api_key) [src/services/user_lookup_cache.py]
│   └── db_get_user() [src/db/users.py] (60s TTL cache)
├── get_chat_session(session_id, user_id) [src/db/chat_history.py]
│   └── 2 Supabase queries: session + messages
└── get_feedback_by_session(session_id, user_id) [src/db/feedback.py]
    └── Supabase query on message_feedback table

5. Supabase Queries

Query 1: Verify session ownership

Table: chat_sessions
Operation: SELECT *
Filters: .eq("id", session_id).eq("user_id", user_id).eq("is_active", True)
Retry: _execute_with_connection_retry (3 retries)

Query 2: Get session messages (part of get_chat_session)

Table: chat_messages
Operation: SELECT *
Filters: .eq("session_id", session_id)
Order: .order("created_at", desc=False)

Query 3: Get session feedback

Table: message_feedback
Operation: SELECT *
Filters: .eq("session_id", session_id).eq("user_id", user_id)
Order: .order("created_at", desc=True)
Retry: _execute_with_connection_retry (3 retries)
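The ownership-then-fetch logic can be modelled with in-memory lists standing in for the `chat_sessions` and `message_feedback` tables. This is a hypothetical sketch: `SessionNotFound` stands in for `HTTPException(404)`, and the `chat_messages` read (Query 2) is omitted because the sketch only models the ownership check and the feedback filter.

```python
from typing import Any


class SessionNotFound(Exception):
    """Stands in for HTTPException(404, "Chat session not found")."""


def get_session_feedback_sketch(
    session_id: int,
    user_id: int,
    sessions: list[dict[str, Any]],
    feedback: list[dict[str, Any]],
) -> dict[str, Any]:
    # Query 1: the session must exist, belong to the caller, and still be active.
    owned = any(
        s["id"] == session_id and s["user_id"] == user_id and s["is_active"]
        for s in sessions
    )
    if not owned:
        raise SessionNotFound(session_id)
    # Query 3: the caller's feedback for this session, newest first.
    rows = [f for f in feedback if f["session_id"] == session_id and f["user_id"] == user_id]
    rows.sort(key=lambda f: f["created_at"], reverse=True)
    return {"success": True, "data": rows, "count": len(rows)}
```

A missing session, another user's session, and a soft-deleted session all produce the same 404, so the endpoint does not reveal whether a given session ID exists.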

6. Redis Operations

None.

7. Prometheus Metrics

None directly.

8. Error Handling

Error	Status	Condition
`HTTPException(401)`	401	Invalid/missing API key
`HTTPException(401)`	401	`get_user()` returns `None`
`HTTPException(404)`	404	Session not found or doesn't belong to user
`HTTPException(500)`	500	Any unhandled exception

9. Mermaid Diagram

flowchart TD
    A[GET /v1/chat/sessions/session_id/feedback] --> B{Auth: get_api_key}
    B -->|Invalid| C[401 Unauthorized]
    B -->|Valid| D[get_user api_key]
    D -->|None| E[401 Invalid API key]
    D -->|User found| F[get_chat_session session_id, user_id]
    F -->|None| G[404 Chat session not found]
    F -->|Session found| H[get_feedback_by_session session_id, user_id]
    H --> I[Query message_feedback table]
    I -->|Success| J[Return MessageFeedbackListResponse]
    I -->|DB Error| K[500 Internal Server Error]

Issue: #1697

API Endpoint Documentation: POST /v1/chat/sessions

Handler: `create_session()` in `src/routes/chat_history.py`

1. Overview

Creates a new chat session for the authenticated user. Uses cached user lookup for performance and logs session creation activity in the background (non-blocking).

Route: POST /v1/chat/sessions
Router prefix: /v1/chat
Tags: chat-history
Response model: ChatSessionResponse
Auth: Required (Bearer token via get_api_key)

2. Request Body

`CreateChatSessionRequest`

Field	Type	Default	Validation	Description
`title`	`str \| None`	`None`	None	Session title. Auto-generated as `"Chat YYYY-MM-DD HH:MM"` if not provided
`model`	`str \| None`	`None`	None	Model name. Defaults to `"openai/gpt-3.5-turbo"` in DB layer if not provided

3. Response Schema

`ChatSessionResponse`

Field	Type	Default	Description
`success`	`bool`	required	Operation success flag
`data`	`ChatSession \| None`	`None`	Created session object
`message`	`str \| None`	`None`	Human-readable message

`ChatSession`

Field	Type	Default	Description
`id`	`int \| None`	`None`	Session ID
`user_id`	`int`	required	Owner user ID
`title`	`str`	required	Session title
`model`	`str`	required	Model name
`created_at`	`datetime \| None`	`None`	Creation timestamp
`updated_at`	`datetime \| None`	`None`	Last update timestamp
`is_active`	`bool \| None`	`True`	Active flag
`messages`	`list[ChatMessage] \| None`	`[]`	Messages list

4. Dependency Chain

create_session()
├── get_api_key() [src/security/deps.py]
├── get_user(api_key) [src/services/user_lookup_cache.py]
│   └── db_get_user() [src/db/users.py] (60s TTL cache)
├── create_chat_session(user_id, title, model) [src/db/chat_history.py]
│   ├── @with_retry(max_attempts=3, initial_delay=0.1, max_delay=2.0)
│   └── _execute_with_connection_retry() (3 retries, exponential backoff)
└── log_activity_background() [src/services/background_tasks.py]
    ├── Creates asyncio task if event loop running
    └── Falls back to synchronous db_log_activity()
        └── log_activity() [src/db/activity.py] → INSERT into activity table

5. Supabase Queries

Query: Insert chat session

Table: chat_sessions
Operation: INSERT
Data:
- user_id: int
- title: str (auto-generated if None: "Chat YYYY-MM-DD HH:MM")
- model: str (defaults to "openai/gpt-3.5-turbo" if None)
- created_at: ISO datetime (UTC)
- updated_at: ISO datetime (UTC)
- is_active: True
Retry: @with_retry decorator (3 attempts) + _execute_with_connection_retry (3 retries per attempt)
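The row sent to the INSERT, including the title and model defaults, can be sketched as below. `build_session_row` is a hypothetical helper; the sketch follows the "if None" wording above literally, so an empty-string title is kept as-is (whether the real code also treats `""` as missing is not stated).

```python
from datetime import datetime, timezone
from typing import Any, Optional

DEFAULT_MODEL = "openai/gpt-3.5-turbo"


def build_session_row(
    user_id: int,
    title: Optional[str],
    model: Optional[str],
    now: Optional[datetime] = None,
) -> dict[str, Any]:
    now = now or datetime.now(timezone.utc)
    ts = now.isoformat()  # created_at and updated_at start out identical
    return {
        "user_id": user_id,
        # Auto-generated title format: "Chat YYYY-MM-DD HH:MM" (UTC)
        "title": title if title is not None else now.strftime("Chat %Y-%m-%d %H:%M"),
        "model": model if model is not None else DEFAULT_MODEL,
        "created_at": ts,
        "updated_at": ts,
        "is_active": True,
    }
```

For example, `build_session_row(7, None, None, datetime(2026, 3, 4, 9, 5, tzinfo=timezone.utc))` yields the title `"Chat 2026-03-04 09:05"` and the model `"openai/gpt-3.5-turbo"`.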

Background: Log activity

Table: activity (via src/db/activity.py)
Operation: INSERT
Data: user_id, model, provider="Chat History", tokens=0, cost=0.0, finish_reason="session_created", app="Chat", metadata with action/session_id/title

6. Redis Operations

None directly.

7. Prometheus Metrics

None directly. Standard middleware metrics apply.

8. Error Handling

Error	Status	Condition
`HTTPException(401)`	401	Invalid/missing API key
`HTTPException(401)`	401	`get_user()` returns `None`
`HTTPException(500)`	500	DB insert fails or any unhandled exception

Background activity logging errors are caught and logged but do NOT fail the request.

9. Performance Notes

User lookup: Cached with 60s TTL (reduces DB queries by ~95%)
Activity logging: Non-blocking (background task)
Performance metrics: Logs user_lookup_ms and session_create_ms timing

10. Mermaid Diagram

flowchart TD
    A[POST /v1/chat/sessions] --> B{Auth: get_api_key}
    B -->|Invalid| C[401 Unauthorized]
    B -->|Valid| D[get_user api_key - cached]
    D -->|None| E[401 Invalid API key]
    D -->|User found| F[create_chat_session user_id, title, model]
    F --> G{Title provided?}
    G -->|No| H[Auto-generate: Chat YYYY-MM-DD HH:MM]
    G -->|Yes| I[Use provided title]
    H --> J[INSERT into chat_sessions]
    I --> J
    J -->|Success| K[Log activity in background]
    K --> L{Background log success?}
    L -->|Yes| M[Continue]
    L -->|No| N[Log error, continue anyway]
    M --> O[Return ChatSessionResponse 200]
    N --> O
    J -->|Failure after retries| P[500 Internal Server Error]

Issue: #1698

API Endpoint Documentation: POST /v1/chat/search

Handler: `search_sessions()` in `src/routes/chat_history.py`

1. Overview

Searches chat sessions by title and message content. Results from title matching and content matching are combined, deduplicated, sorted by updated_at (newest first), and truncated to `limit`.

Route: POST /v1/chat/search
Router prefix: /v1/chat
Tags: chat-history
Response model: ChatSessionsListResponse
Auth: Required (Bearer token via get_api_key)

2. Request Body

`SearchChatSessionsRequest`

Field	Type	Default	Validation	Description
`query`	`str`	required	None	Search query text
`limit`	`int \| None`	`20`	None	Maximum results to return

3. Response Schema

`ChatSessionsListResponse`

Field	Type	Default	Description
`success`	`bool`	required	Operation success flag
`data`	`list[ChatSession]`	required	Matching sessions
`count`	`int`	required	Number of results
`message`	`str \| None`	`None`	Human-readable message

4. Dependency Chain

search_sessions()
├── get_api_key() [src/security/deps.py]
├── get_user(api_key) [src/services/user_lookup_cache.py]
│   └── db_get_user() [src/db/users.py] (60s TTL cache)
└── search_chat_sessions(user_id, query, limit) [src/db/chat_history.py]
    ├── Search 1: Title matching (ILIKE)
    ├── Search 2: Message content matching (ILIKE)
    ├── Search 3: Fetch sessions by matching message session_ids
    └── Python-side: combine, deduplicate by session ID, sort, limit

5. Supabase Queries

Query 1: Search session titles

Table: chat_sessions
Operation: SELECT *
Filters: .eq("user_id", user_id).eq("is_active", True).ilike("title", f"%{query}%")
Retry: _execute_with_connection_retry (3 retries)

Query 2: Search message content

Table: chat_messages
Operation: SELECT session_id
Filters: .ilike("content", f"%{query}%")
Note: This query does NOT filter by user_id, so it matches messages in every user's sessions. Ownership is enforced afterwards by the user_id filter in Query 3.
Retry: _execute_with_connection_retry (3 retries)

Query 3: Fetch sessions from matching messages (conditional)

Table: chat_sessions
Operation: SELECT *
Filters: .eq("user_id", user_id).eq("is_active", True).in_("id", list(session_ids))
Only executed if: Query 2 returned session_ids
Retry: _execute_with_connection_retry (3 retries)

Post-processing (Python):

Combine title results + message session results
Deduplicate by session id (dict keying)
Sort by updated_at descending
Slice to limit
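The post-processing steps above amount to a merge keyed on session ID. This is a sketch under one assumption: all `updated_at` values share a single ISO-8601 format and timezone, so they sort correctly as strings. `combine_search_results` is a hypothetical name.

```python
from typing import Any, Optional


def combine_search_results(
    title_hits: list[dict[str, Any]],
    message_hits: list[dict[str, Any]],
    limit: Optional[int] = 20,
) -> list[dict[str, Any]]:
    # Dedupe by session id: a session matched by both title and content appears once.
    merged: dict[int, dict[str, Any]] = {}
    for session in [*title_hits, *message_hits]:
        merged[session["id"]] = session
    # Newest first (updated_at descending).
    ordered = sorted(merged.values(), key=lambda s: s["updated_at"], reverse=True)
    return ordered[:limit]  # limit=None returns everything
```

Since `limit` is applied only after all three queries, a broad query still fetches every matching session before the result is trimmed.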

6. Redis Operations

None.

7. Prometheus Metrics

None directly.

8. Error Handling

Error	Status	Condition
`HTTPException(401)`	401	Invalid/missing API key
`HTTPException(401)`	500	`get_user()` returns `None` (the 401 is caught by the generic handler and returned as 500; see note)
`HTTPException(500)`	500	Any unhandled exception

Note: There is no `except HTTPException: raise` guard. The `raise HTTPException(401)` for a missing user sits inside a try/except that catches all `Exception`s, so it falls through to the generic handler and the client receives a 500 instead of a 401.

9. Mermaid Diagram

flowchart TD
    A[POST /v1/chat/search] --> B{Auth: get_api_key}
    B -->|Invalid| C[401 Unauthorized]
    B -->|Valid| D[get_user api_key]
    D -->|None| E[500 - the 401 is caught by the generic exception handler]
    D -->|User found| F[search_chat_sessions]
    F --> G[Query 1: ILIKE title search]
    F --> H[Query 2: ILIKE message content search]
    H --> I{Message matches found?}
    I -->|Yes| J[Query 3: Get sessions by IDs + user filter]
    I -->|No| K[Empty message sessions]
    G --> L[Combine title + message results]
    J --> L
    K --> L
    L --> M[Deduplicate by session ID]
    M --> N[Sort by updated_at DESC]
    N --> O[Slice to limit]
    O --> P[Return ChatSessionsListResponse 200]
    G -->|DB Error| Q[500 Internal Server Error]

Issue: #1699

API Endpoint Documentation: POST /v1/chat/sessions/{session_id}/messages

Handler: `save_message()` in `src/routes/chat_history.py`

1. Overview

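Saves a single message to a chat session. Verifies session ownership and checks for a duplicate (same session, role, and content saved within the last 5 minutes; if one is found, the existing message is returned). Otherwise it inserts the message and updates the session's updated_at timestamp and model.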

Route: POST /v1/chat/sessions/{session_id}/messages
Router prefix: /v1/chat
Tags: chat-history
Auth: Required (Bearer token via get_api_key)

2. Path Parameters

Parameter	Type	Description
`session_id`	`int`	Target chat session ID

3. Request Body

`SaveChatMessageRequest`

Field	Type	Default	Validation	Description
`role`	`str`	required	None	Message role: `"user"` or `"assistant"`
`content`	`str`	required	None	Message text content
`model`	`str \| None`	`None`	None	Model that generated response
`tokens`	`int \| None`	`0`	None	Token count
`created_at`	`str \| None`	`None`	None	ISO datetime from frontend (not used in DB layer)

4. Response (untyped dict)

{
  "success": true,
  "data": { "id": 123, "session_id": 1, "role": "user", "content": "...", ... },
  "message": "Message saved successfully"
}

5. Dependency Chain

save_message()
├── get_api_key() [src/security/deps.py]
├── get_user(api_key) [src/services/user_lookup_cache.py]
├── get_chat_session(session_id, user_id) [src/db/chat_history.py]
│   ├── Query chat_sessions (ownership check)
│   └── Query chat_messages (session messages)
└── save_chat_message() [src/db/chat_history.py]
    ├── @with_retry(max_attempts=3)
    ├── Duplicate check (SELECT within last 5 min)
    ├── INSERT into chat_messages
    └── UPDATE chat_sessions (updated_at, model)

6. Supabase Queries

Query 1: Verify session ownership (via `get_chat_session`)

Table: chat_sessions → SELECT * WHERE id=session_id AND user_id=user_id AND is_active=True
Table: chat_messages → SELECT * WHERE session_id=session_id ORDER BY created_at ASC

Query 2: Duplicate check (conditional, skipped if `content` is empty)

Table: chat_messages
Operation: SELECT *
Filters: .eq("session_id", session_id).eq("role", role).eq("content", content).gte("created_at", five_minutes_ago)
Order: .order("created_at", desc=True).limit(1)
If duplicate found: Returns existing message immediately (no insert)
If check fails: Logs warning, proceeds with insert anyway
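The duplicate check can be sketched as a filter over existing messages. In this hypothetical sketch, `created_at` holds timezone-aware `datetime` objects rather than the ISO strings stored in Supabase, and `find_recent_duplicate` is an illustrative name.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Optional

DUPLICATE_WINDOW = timedelta(minutes=5)


def find_recent_duplicate(
    existing: list[dict[str, Any]],
    session_id: int,
    role: str,
    content: str,
    now: datetime,
) -> Optional[dict[str, Any]]:
    if not content:
        return None  # check is skipped for empty content
    cutoff = now - DUPLICATE_WINDOW
    matches = [
        m for m in existing
        if m["session_id"] == session_id
        and m["role"] == role
        and m["content"] == content
        and m["created_at"] >= cutoff
    ]
    # Equivalent of ORDER BY created_at DESC LIMIT 1.
    return max(matches, key=lambda m: m["created_at"], default=None)
```

If this returns a row, the caller returns it instead of inserting. This keeps a retried client request from storing the same message twice.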

Query 3: Insert message

Table: chat_messages
Operation: INSERT
Data: session_id, role, content, model, tokens, created_at (UTC ISO)
Retry: _execute_with_connection_retry + @with_retry

Query 4: Update session timestamp

Table: chat_sessions
Operation: UPDATE
Data: updated_at (always), model (if provided)
Filters: .eq("id", session_id) + .eq("user_id", user_id) (if provided)

7. Redis Operations

None.

8. Prometheus Metrics

None directly.

9. Error Handling

Error	Status	Condition
`HTTPException(401)`	401	Invalid/missing API key
`HTTPException(401)`	401	`get_user()` returns `None`
`HTTPException(404)`	404	Session not found or not owned by user
`HTTPException(500)`	500	DB insert/update failure

Duplicate check failures do NOT cause request failure — logged and skipped.

10. Mermaid Diagram

flowchart TD
    A[POST /v1/chat/sessions/session_id/messages] --> B{Auth: get_api_key}
    B -->|Invalid| C[401 Unauthorized]
    B -->|Valid| D[get_user api_key]
    D -->|None| E[401 Invalid API key]
    D -->|User found| F[get_chat_session ownership check]
    F -->|None| G[404 Session not found]
    F -->|Found| H[save_chat_message]
    H --> I{Duplicate check}
    I -->|Duplicate found| J[Return existing message]
    I -->|No duplicate| K[INSERT into chat_messages]
    I -->|Check failed| K
    K -->|Success| L[UPDATE chat_sessions timestamp]
    L --> M[Return 200 with message data]
    K -->|Failure after retries| N[500 Internal Server Error]

Issue: #1700

API Endpoint Documentation: POST /v1/chat/sessions/{session_id}/messages/batch

Handler: `save_messages_batch()` in `src/routes/chat_history.py`

1. Overview

Saves multiple messages to a chat session in a single request, reducing API overhead by 60-80% compared to individual calls by replacing N HTTP requests with one (the database work is still done per message). Messages are processed one at a time, with successes and failures collected separately, so partial success is possible.

Route: POST /v1/chat/sessions/{session_id}/messages/batch
Router prefix: /v1/chat
Tags: chat-history
Auth: Required (Bearer token via get_api_key)

2. Path Parameters

Parameter	Type	Description
`session_id`	`int`	Target chat session ID

3. Request Body

`BatchMessageRequest` (defined inline in `chat_history.py`)

Field	Type	Default	Validation	Description
`messages`	`list[SaveChatMessageRequest]`	required	None	Array of messages to save

`SaveChatMessageRequest` (each element)

Field	Type	Default	Validation	Description
`role`	`str`	required	None	`"user"` or `"assistant"`
`content`	`str`	required	None	Message content
`model`	`str \| None`	`None`	None	Model name
`tokens`	`int \| None`	`0`	None	Token count
`created_at`	`str \| None`	`None`	None	ISO datetime

4. Response (untyped dict)

{
  "success": true,  // true only if ALL messages saved
  "data": {
    "saved": [{"success": true, "message_id": 1, "data": {...}}, ...],
    "failed": [{"success": false, "error": "...", "content_preview": "first 50 chars"}],
    "total": 5,
    "success_count": 4,
    "failure_count": 1
  },
  "message": "Saved 4/5 messages successfully"
}
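The response shape above can be reproduced with a loop that catches per-message errors. This sketch takes a hypothetical `save_one` callable in place of `save_chat_message`. It assumes the same `"Saved X/Y messages successfully"` wording is used whether or not failures occur.

```python
from typing import Any, Callable


def save_batch_sketch(
    messages: list[dict[str, Any]],
    save_one: Callable[[dict[str, Any]], dict[str, Any]],
) -> dict[str, Any]:
    saved: list[dict[str, Any]] = []
    failed: list[dict[str, Any]] = []
    for msg in messages:  # sequential; one failure does not abort the batch
        try:
            row = save_one(msg)
            saved.append({"success": True, "message_id": row["id"], "data": row})
        except Exception as exc:
            failed.append({
                "success": False,
                "error": str(exc),
                "content_preview": msg["content"][:50],  # first 50 chars
            })
    total = len(messages)
    return {
        "success": not failed,  # true only if every message was saved
        "data": {
            "saved": saved,
            "failed": failed,
            "total": total,
            "success_count": len(saved),
            "failure_count": len(failed),
        },
        "message": f"Saved {len(saved)}/{total} messages successfully",
    }
```

Because the endpoint still returns HTTP 200 on partial failure, clients need to inspect `success` or `failure_count` rather than the status code.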

5. Dependency Chain

save_messages_batch()
├── get_api_key() [src/security/deps.py]
├── get_user(api_key) [src/services/user_lookup_cache.py]
├── get_chat_session(session_id, user_id) [src/db/chat_history.py]
│   └── Ownership verification
└── for each message in request.messages:
    └── save_chat_message() [src/db/chat_history.py]
        ├── Duplicate check (5-min window)
        ├── INSERT into chat_messages
        └── UPDATE chat_sessions timestamp/model

6. Supabase Queries

Per-message queries (repeated for each message):

Duplicate check: SELECT * FROM chat_messages WHERE session_id=X AND role=Y AND content=Z AND created_at >= 5_min_ago LIMIT 1
Insert message: INSERT INTO chat_messages (session_id, role, content, model, tokens, created_at)
Update session: UPDATE chat_sessions SET updated_at=NOW(), model=M WHERE id=session_id AND user_id=user_id

Each query is wrapped in _execute_with_connection_retry (3 retries), and each save_chat_message call is additionally wrapped in the @with_retry decorator (3 attempts).

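Total worst-case queries (excluding retries): 2 (get_chat_session: session + messages) + N * 3 (per message), where N is the number of messages. A message that hits the duplicate check costs only 1 query.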

7. Redis Operations

None.

8. Prometheus Metrics

None directly.

9. Error Handling

Error	Status	Condition
`HTTPException(401)`	401	Invalid/missing API key
`HTTPException(401)`	401	`get_user()` returns `None`
`HTTPException(404)`	404	Session not found / not owned
`HTTPException(500)`	500	Unhandled exception raised outside the per-message loop (e.g. during the ownership check)

Individual message failures do NOT abort the batch. Each failure is recorded in failed_messages (returned as data.failed), and the top-level success field is true only if failed_messages is empty.

10. Mermaid Diagram

flowchart TD
    A[POST /v1/chat/sessions/session_id/messages/batch] --> B{Auth}
    B -->|Invalid| C[401]
    B -->|Valid| D[get_user]
    D -->|None| E[401]
    D -->|Found| F[get_chat_session ownership check]
    F -->|Not found| G[404]
    F -->|Found| H[Loop through messages]
    H --> I{For each message}
    I --> J[save_chat_message]
    J -->|Success| K[Add to saved_messages]
    J -->|Error| L[Add to failed_messages]
    K --> M{More messages?}
    L --> M
    M -->|Yes| I
    M -->|No| N{Any failures?}
    N -->|No| O[Return success=true with results]
    N -->|Yes| P[Return success=false with partial results]

Issue: #1701

API Endpoint Documentation: POST /v1/chat/feedback

Handler: `submit_feedback()` in `src/routes/chat_history.py`

1. Overview

Submits feedback for a chat message (thumbs up/down, regenerate, star rating, comment). Validates session and message ownership when IDs are provided. Logs activity in the background.

Route: POST /v1/chat/feedback
Router prefix: /v1/chat
Tags: chat-history
Response model: MessageFeedbackResponse
Auth: Required (Bearer token via get_api_key)

2. Request Body

`SaveMessageFeedbackRequest`

Field	Type	Default	Validation	Description
`session_id`	`int \| None`	`None`	None	Optional associated session
`message_id`	`int \| None`	`None`	None	Optional associated message
`feedback_type`	`Literal["thumbs_up","thumbs_down","regenerate"]`	required	Literal enforcement by Pydantic	Type of feedback
`rating`	`int \| None`	`None`	`ge=1, le=5`	Optional 1-5 star rating
`comment`	`str \| None`	`None`	None	Optional text feedback
`model`	`str \| None`	`None`	None	Model that generated response
`metadata`	`dict[str, Any] \| None`	`None`	None	Additional context

3. Response Schema

`MessageFeedbackResponse`

Field	Type	Default	Description
`success`	`bool`	required	Operation success flag
`data`	`MessageFeedback \| None`	`None`	Created feedback record
`message`	`str \| None`	`None`	Human-readable message

4. Dependency Chain

submit_feedback()
├── get_api_key() [src/security/deps.py]
├── get_user(api_key) [src/services/user_lookup_cache.py]
├── [conditional] get_chat_session(session_id, user_id) [src/db/chat_history.py]
│   └── Only if session_id is not None
├── [conditional] validate_message_ownership() [src/db/chat_history.py]
│   └── Only if message_id is not None
│   └── Joins chat_messages with chat_sessions to verify user ownership
├── save_message_feedback() [src/db/feedback.py]
│   ├── @with_retry(max_attempts=3)
│   ├── Validates feedback_type against VALID_FEEDBACK_TYPES set
│   ├── Validates rating range (1-5)
│   └── INSERT into message_feedback
└── log_activity_background() [src/services/background_tasks.py]
    └── Async INSERT into activity table

5. Supabase Queries

Query 1: Verify session ownership (conditional)

Table: chat_sessions
Condition: Only if request.session_id is not None
Operation: SELECT * WHERE id=session_id AND user_id=user_id AND is_active=True

Query 2: Validate message ownership (conditional)

Table: chat_messages with chat_sessions!inner join
Condition: Only if request.message_id is not None
Operation: SELECT id, session_id, chat_sessions!inner(id, user_id) WHERE id=message_id AND chat_sessions.user_id=user_id
Additional filter: .eq("session_id", session_id) if session_id provided

Query 3: Insert feedback

Table: message_feedback
Operation: INSERT
Data: user_id, feedback_type, created_at, updated_at (always) + optional: session_id, message_id, rating, comment, model, metadata
Retry: @with_retry (3 attempts) + _execute_with_connection_retry (3 retries)
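The DB-layer validation and the "always vs optional" insert payload can be sketched together. `VALID_FEEDBACK_TYPES` is named in the dependency chain; its contents here are assumed to match the request `Literal`. `build_feedback_row` and the error messages are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Optional

VALID_FEEDBACK_TYPES = {"thumbs_up", "thumbs_down", "regenerate"}


def build_feedback_row(
    user_id: int,
    feedback_type: str,
    *,
    session_id: Optional[int] = None,
    message_id: Optional[int] = None,
    rating: Optional[int] = None,
    comment: Optional[str] = None,
    model: Optional[str] = None,
    metadata: Optional[dict[str, Any]] = None,
    now: Optional[datetime] = None,
) -> dict[str, Any]:
    # Second line of defence behind Pydantic; the route maps ValueError to 400.
    if feedback_type not in VALID_FEEDBACK_TYPES:
        raise ValueError(f"Invalid feedback_type: {feedback_type}")
    if rating is not None and not 1 <= rating <= 5:
        raise ValueError("rating must be between 1 and 5")
    ts = (now or datetime.now(timezone.utc)).isoformat()
    row: dict[str, Any] = {
        "user_id": user_id,
        "feedback_type": feedback_type,
        "created_at": ts,
        "updated_at": ts,
    }
    optional = {
        "session_id": session_id, "message_id": message_id, "rating": rating,
        "comment": comment, "model": model, "metadata": metadata,
    }
    row.update({k: v for k, v in optional.items() if v is not None})  # only set fields
    return row
```

Omitting unset optional columns, rather than sending explicit nulls, lets the table's own column defaults apply.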

Background: Log activity

Table: activity
Data: user_id, model, provider="Chat Feedback", action="submit_feedback", metadata with feedback details

6. Redis Operations

None.

7. Prometheus Metrics

None directly.

8. Error Handling

Error	Status	Condition
`HTTPException(401)`	401	Invalid/missing API key
`HTTPException(401)`	401	`get_user()` returns `None`
`HTTPException(404)`	404	Session not found (when session_id provided)
`HTTPException(404)`	404	Message not found (when message_id provided)
`HTTPException(400)`	400	`ValueError` from DB layer (invalid feedback_type or rating)
`HTTPException(500)`	500	Any unhandled exception

Pydantic validation: the feedback_type Literal and the rating ge/le constraints are enforced before the handler runs; violations return 422 Unprocessable Entity.
DB-layer validation: save_message_feedback re-checks feedback_type and rating at save time (a ValueError here becomes a 400).
Background activity logging errors are caught and do NOT fail the request.

9. Mermaid Diagram

flowchart TD
    A[POST /v1/chat/feedback] --> B{Auth: get_api_key}
    B -->|Invalid| C[401]
    B -->|Valid| D[get_user]
    D -->|None| E[401]
    D -->|Found| F{session_id provided?}
    F -->|Yes| G[get_chat_session ownership check]
    G -->|Not found| H[404 Chat session not found]
    G -->|Found| I{message_id provided?}
    F -->|No| I
    I -->|Yes| J[validate_message_ownership]
    J -->|Invalid| K[404 Message not found]
    J -->|Valid| L[save_message_feedback]
    I -->|No| L
    L --> M{DB validation}
    M -->|Invalid type/rating| N[400 ValueError]
    M -->|Success| O[INSERT into message_feedback]
    O --> P[Log activity background]
    P --> Q[Return MessageFeedbackResponse 200]
    O -->|DB Error| R[500 Internal Server Error]

Issue: #1702

API Endpoint Documentation: PUT /v1/chat/sessions/{session_id}

Handler: `update_session()` in `src/routes/chat_history.py`

1. Overview

Updates a chat session's title and/or model. After updating, fetches and returns the updated session with all its messages.

Route: PUT /v1/chat/sessions/{session_id}
Router prefix: /v1/chat
Tags: chat-history
Response model: ChatSessionResponse
Auth: Required (Bearer token via get_api_key)

2. Path Parameters & Request Body

Parameter	Type	Description
`session_id`	`int`	Session ID to update

`UpdateChatSessionRequest`

Field	Type	Default	Validation	Description
`title`	`str \| None`	`None`	None	New session title
`model`	`str \| None`	`None`	None	New model name

3. Dependency Chain

update_session()
├── get_api_key() [src/security/deps.py]
├── get_user(api_key) [src/services/user_lookup_cache.py]
├── update_chat_session(session_id, user_id, title, model) [src/db/chat_history.py]
│   ├── @with_retry(max_attempts=3)
│   └── UPDATE chat_sessions
└── get_chat_session(session_id, user_id) [src/db/chat_history.py]
    ├── SELECT from chat_sessions
    └── SELECT from chat_messages

4. Supabase Queries

Query 1: Update session

Table: chat_sessions
Operation: UPDATE
Data: updated_at (always), title (if truthy), model (if truthy)
Filters: .eq("id", session_id).eq("user_id", user_id)
Returns: False if no rows matched (session not found / not owned)
Retry: @with_retry + _execute_with_connection_retry
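The "if truthy" rule for the UPDATE payload can be sketched as follows (`build_session_update` is a hypothetical name):

```python
from datetime import datetime, timezone
from typing import Any, Optional


def build_session_update(
    title: Optional[str], model: Optional[str], now: datetime
) -> dict[str, Any]:
    update: dict[str, Any] = {"updated_at": now.isoformat()}  # always refreshed
    if title:  # truthy check: None and "" are both ignored
        update["title"] = title
    if model:
        update["model"] = model
    return update
```

One consequence of the truthy check is that a title cannot be cleared to an empty string through this endpoint. Sending `{"title": ""}` only bumps `updated_at`.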

Query 2: Fetch updated session

Table: chat_sessions → SELECT * WHERE id AND user_id AND is_active
Table: chat_messages → SELECT * WHERE session_id ORDER BY created_at ASC

5. Redis Operations

None.

6. Error Handling

Error	Status	Condition
`HTTPException(401)`	401	Invalid/missing API key
`HTTPException(401)`	401	`get_user()` returns `None`
`HTTPException(404)`	404	`update_chat_session` returns `False`
`HTTPException(500)`	500	Any unhandled exception

7. Mermaid Diagram

flowchart TD
    A[PUT /v1/chat/sessions/session_id] --> B{Auth}
    B -->|Invalid| C[401]
    B -->|Valid| D[get_user]
    D -->|None| E[401]
    D -->|Found| F[update_chat_session]
    F -->|False / not found| G[404 Chat session not found]
    F -->|True| H[get_chat_session - fetch updated]
    H --> I[Return ChatSessionResponse 200]
    F -->|DB Error| J[500]

Issue: #1703

API Endpoint Documentation: PUT /v1/chat/feedback/{feedback_id}

Handler: `update_my_feedback()` in `src/routes/chat_history.py`

1. Overview

Updates an existing feedback record. Only the record's owner can update it. All fields are optional; only the provided fields are updated, and updated_at is always refreshed.

Route: PUT /v1/chat/feedback/{feedback_id}
Router prefix: /v1/chat
Tags: chat-history
Response model: MessageFeedbackResponse
Auth: Required (Bearer token via get_api_key)

2. Path Parameters & Request Body

Parameter	Type	Description
`feedback_id`	`int`	Feedback record ID to update

`UpdateMessageFeedbackRequest`

Field	Type	Default	Validation	Description
`feedback_type`	`Literal["thumbs_up","thumbs_down","regenerate"] \| None`	`None`	Literal (Pydantic)	New feedback type
`rating`	`int \| None`	`None`	`ge=1, le=5`	New star rating
`comment`	`str \| None`	`None`	None	New comment text

3. Dependency Chain

update_my_feedback()
├── get_api_key() [src/security/deps.py]
├── get_user(api_key) [src/services/user_lookup_cache.py]
└── update_feedback(feedback_id, user_id, ...) [src/db/feedback.py]
    ├── @with_retry(max_attempts=3)
    ├── Validates feedback_type against VALID_FEEDBACK_TYPES
    ├── Validates rating range (1-5)
    └── UPDATE message_feedback WHERE id AND user_id

4. Supabase Queries

Query: Update feedback

Table: message_feedback
Operation: UPDATE
Data: updated_at (always) + optional: feedback_type, rating, comment
Filters: .eq("id", feedback_id).eq("user_id", user_id)
Returns: None if no rows matched (not found or not owned)
Retry: @with_retry + _execute_with_connection_retry

5. Redis Operations

None.

6. Error Handling

Error	Status	Condition
`HTTPException(401)`	401	Invalid/missing API key
`HTTPException(401)`	401	`get_user()` returns `None`
`HTTPException(404)`	404	`update_feedback` returns `None`
`HTTPException(400)`	400	`ValueError` (invalid feedback_type or rating in DB layer)
`HTTPException(500)`	500	Any unhandled exception

7. Mermaid Diagram

flowchart TD
    A[PUT /v1/chat/feedback/feedback_id] --> B{Auth}
    B -->|Invalid| C[401]
    B -->|Valid| D[get_user]
    D -->|None| E[401]
    D -->|Found| F[update_feedback]
    F --> G{Validation}
    G -->|Invalid type/rating| H[400 ValueError]
    G -->|Valid| I[UPDATE message_feedback]
    I -->|No rows matched| J[404 Feedback not found]
    I -->|Updated| K[Return MessageFeedbackResponse 200]
    I -->|DB Error| L[500]

Issue: #1704

API Endpoint Documentation: DELETE /v1/chat/sessions/{session_id}

Handler: `delete_session()` in `src/routes/chat_history.py`

1. Overview

Soft-deletes a chat session by setting is_active = False. Does NOT physically remove the record or associated messages.

Route: DELETE /v1/chat/sessions/{session_id}
Router prefix: /v1/chat
Tags: chat-history
Auth: Required (Bearer token via get_api_key)

2. Path Parameters

Parameter	Type	Description
`session_id`	`int`	Session ID to delete

3. Response (untyped dict)

{ "success": true, "message": "Chat session deleted successfully" }

4. Dependency Chain

delete_session()
├── get_api_key() [src/security/deps.py]
├── get_user(api_key) [src/services/user_lookup_cache.py]
└── delete_chat_session(session_id, user_id) [src/db/chat_history.py]
    ├── @with_retry(max_attempts=3)
    └── UPDATE chat_sessions SET is_active=False, updated_at=NOW()

5. Supabase Queries

Query: Soft delete session

Table: chat_sessions
Operation: UPDATE (NOT DELETE)
Data: {"is_active": False, "updated_at": datetime.now(UTC).isoformat()}
Filters: .eq("id", session_id).eq("user_id", user_id)
Returns: False if no rows matched
Retry: @with_retry (3 attempts) + _execute_with_connection_retry (3 retries)
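The soft-delete semantics can be illustrated with an in-memory store. `InMemorySessions` is hypothetical. Its filters mirror the ones documented here: delete matches on id and user_id only, while reads also require is_active. Following those filters, a repeated DELETE on an already-deleted session still matches the row and reports success.

```python
from datetime import datetime, timezone
from typing import Any, Optional


class InMemorySessions:
    """Hypothetical stand-in for the chat_sessions table."""

    def __init__(self, rows: list[dict[str, Any]]) -> None:
        self.rows = rows

    def soft_delete(self, session_id: int, user_id: int, now: datetime) -> bool:
        # UPDATE ... SET is_active=False WHERE id=? AND user_id=? (no is_active filter)
        matched = False
        for row in self.rows:
            if row["id"] == session_id and row["user_id"] == user_id:
                row["is_active"] = False
                row["updated_at"] = now.isoformat()
                matched = True
        return matched  # False -> 404

    def get(self, session_id: int, user_id: int) -> Optional[dict[str, Any]]:
        # Reads (get_chat_session) only see active sessions.
        for row in self.rows:
            if row["id"] == session_id and row["user_id"] == user_id and row["is_active"]:
                return row
        return None
```

After the delete, the row and its messages still exist in storage but are invisible to every read path that filters on `is_active`.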

6. Redis Operations

None.

7. Error Handling

Error	Status	Condition
`HTTPException(401)`	401	Invalid/missing API key
`HTTPException(401)`	401	`get_user()` returns `None`
`HTTPException(404)`	404	Session not found / not owned
`HTTPException(500)`	500	Any unhandled exception

8. Mermaid Diagram

flowchart TD
    A[DELETE /v1/chat/sessions/session_id] --> B{Auth}
    B -->|Invalid| C[401]
    B -->|Valid| D[get_user]
    D -->|None| E[401]
    D -->|Found| F[delete_chat_session - soft delete]
    F -->|False| G[404 Chat session not found]
    F -->|True| H[Return 200 success]
    F -->|DB Error| I[500]

Issue: #1705

API Endpoint Documentation: DELETE /v1/chat/feedback/{feedback_id}

Handler: `delete_my_feedback()` in `src/routes/chat_history.py`

1. Overview

Permanently deletes a feedback record. Only the record's owner can delete it. This is a hard delete (not soft delete).

Route: DELETE /v1/chat/feedback/{feedback_id}
Router prefix: /v1/chat
Tags: chat-history
Auth: Required (Bearer token via get_api_key)

2. Path Parameters

Parameter	Type	Description
`feedback_id`	`int`	Feedback record ID to delete

3. Response (untyped dict)

{ "success": true, "message": "Feedback deleted successfully" }

4. Dependency Chain

delete_my_feedback()
├── get_api_key() [src/security/deps.py]
├── get_user(api_key) [src/services/user_lookup_cache.py]
└── delete_feedback(feedback_id, user_id) [src/db/feedback.py]
    ├── @with_retry(max_attempts=3)
    └── DELETE FROM message_feedback WHERE id AND user_id

5. Supabase Queries

Query: Delete feedback (HARD DELETE)

Table: message_feedback
Operation: DELETE
Filters: .eq("id", feedback_id).eq("user_id", user_id)
Returns: False if no rows matched
Retry: @with_retry (3 attempts) + _execute_with_connection_retry (3 retries)

6. Redis Operations

None.

7. Error Handling

Error	Status	Condition
`HTTPException(401)`	401	Invalid/missing API key
`HTTPException(401)`	401	`get_user()` returns `None`
`HTTPException(404)`	404	Feedback not found / not owned
`HTTPException(500)`	500	Any unhandled exception

8. Mermaid Diagram

flowchart TD
    A[DELETE /v1/chat/feedback/feedback_id] --> B{Auth}
    B -->|Invalid| C[401]
    B -->|Valid| D[get_user]
    D -->|None| E[401]
    D -->|Found| F[delete_feedback - HARD DELETE]
    F -->|False| G[404 Feedback not found]
    F -->|True| H[Return 200 success]
    F -->|DB Error| I[500]

Issue: #1706

API Endpoint Documentation: GET /v1/chat/completions/metrics/tokens-per-second

Handler: `get_tokens_per_second()` in `src/routes/chat_metrics.py`

1. Overview

Returns tokens-per-second throughput metrics for a specific model and provider within a time range. Only the top 3 most-requested models, plus at least one model per provider, can be queried; any other model_id is rejected with 403. Output is in Prometheus text exposition format.

Route: GET /v1/chat/completions/metrics/tokens-per-second
Router prefix: /v1/chat/completions/metrics
Tags: chat-metrics
Auth: None (public endpoint)
Response: text/plain (Prometheus format)

2. Query Parameters

Parameter	Type	Default	Validation	Description
`time`	`str`	required	Must be one of: `hour`, `week`, `month`, `1year`, `2year`	Time range filter
`model_id`	`int`	required	None	Model ID integer
`provider_id`	`str`	required	None	Provider slug (e.g., "openrouter")
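The `time` check is an allow-list; any other value produces the 400 listed under Error Handling. The sketch below also derives the `start_time` passed to `.gte("created_at", start_time)`. The window lengths (30 days for `month`, 365 days per year) are assumptions for illustration, not values read from `chat_metrics.py`, and `start_time_for` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed window lengths; the real handler may define these differently.
TIME_WINDOWS = {
    "hour": timedelta(hours=1),
    "week": timedelta(weeks=1),
    "month": timedelta(days=30),
    "1year": timedelta(days=365),
    "2year": timedelta(days=730),
}


class InvalidTimeRange(ValueError):
    """Raised for values outside the allow-list (the route answers 400)."""


def start_time_for(time_range: str, now: Optional[datetime] = None) -> datetime:
    if time_range not in TIME_WINDOWS:
        allowed = ", ".join(TIME_WINDOWS)
        raise InvalidTimeRange(f"time must be one of: {allowed}")
    now = now or datetime.now(timezone.utc)
    return now - TIME_WINDOWS[time_range]
```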

3. Response Format (Prometheus text)

# HELP gatewayz_tokens_per_second Token throughput (tokens/second) by model and provider
# TYPE gatewayz_tokens_per_second gauge
# Generated: 2026-03-04T12:00:00+00:00
# Time range: week
# Filtered to: top 3 models + minimum 1 per provider

gatewayz_tokens_per_second{model="gpt-4o",provider="openai",requests="150",total_tokens="50000"} 125.5

4. Dependency Chain

get_tokens_per_second()
├── _get_top_models_async(limit=3)
│   └── get_top_models_by_requests() [src/db/chat_completion_requests.py]
│       ├── SELECT * FROM models
│       └── For each model: COUNT from chat_completion_requests + SUM tokens
├── _get_all_providers_async()
│   └── get_all_providers() [src/db/chat_completion_requests.py]
│       └── SELECT providers.slug FROM models JOIN providers
├── _get_models_with_min_one_per_provider_async()
│   └── get_models_with_min_one_per_provider() [src/db/chat_completion_requests.py]
│       └── For missing providers: query models + count requests
├── Model ID access check (403 if not in filtered list)
├── _calculate_tokens_per_second_async()
│   └── calculate_tokens_per_second() [src/db/chat_completion_requests.py]
│       ├── SELECT tokens + processing_time FROM chat_completion_requests
│       └── SELECT model_name, provider FROM models JOIN providers
└── _format_tokens_per_second_metric() → Prometheus text

5. Supabase Queries

Phase 1: Get top 3 models by request count

Table: models → SELECT * (all models)
For each model:
- chat_completion_requests → SELECT *, count=exact WHERE model_id=X AND status=completed
- chat_completion_requests → SELECT input_tokens, output_tokens WHERE model_id=X AND status=completed
Sort: Python-side by request count DESC, take top 3

Phase 2: Get all providers

Table: models with providers!inner join → SELECT providers.slug
Deduplicate: Python set

Phase 3: Ensure min 1 per provider (for missing providers)

Table: models → SELECT id, model_name, providers!inner(slug) WHERE providers.slug=X AND is_active=True
For each model in missing provider: chat_completion_requests → count query
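Phases 1 to 3 together define the set of queryable models: the three most-requested models, plus one model for every provider not already represented. A minimal sketch over pre-aggregated request counts; the input shape and `allowed_model_ids` are hypothetical, the sketch ignores the `is_active` filter, and picking the most-requested model for each missing provider is an assumption (the source only says request counts are queried).

```python
def allowed_model_ids(models: list[dict], top_n: int = 3) -> set[int]:
    """models: [{"id": int, "provider": str, "requests": int}, ...] (hypothetical shape)."""
    # Phase 1: top N by request count, sorted Python-side.
    ranked = sorted(models, key=lambda m: m["requests"], reverse=True)
    allowed = {m["id"] for m in ranked[:top_n]}
    covered = {m["provider"] for m in ranked[:top_n]}

    # Phases 2-3: every provider not covered by the top N gets its most-requested model.
    for provider in sorted({m["provider"] for m in models} - covered):
        best = next(m for m in ranked if m["provider"] == provider)
        allowed.add(best["id"])
    return allowed
```

The handler then rejects any `model_id` outside this set with 403.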

Phase 4: Calculate tokens per second

Table: chat_completion_requests → SELECT input_tokens, output_tokens, processing_time_ms, created_at WHERE model_id=X AND status=completed
- With time filter: .gte("created_at", start_time) based on time range
Table: models → SELECT model_name, providers!inner(slug) WHERE id=model_id
Calculation: total_tokens / (total_time_ms / 1000)

6. Redis Operations

None.

7. Prometheus Metrics

Emitted metric:

Name: gatewayz_tokens_per_second
Type: Gauge
Labels: model, provider, requests, total_tokens
Note: This metric is generated as text output, not registered in the Prometheus client registry

8. Error Handling

Error	Status	Condition
`HTTPException(400)`	400	Invalid `time` parameter
`HTTPException(403)`	403	`model_id` not in the filtered list (top 3 models + minimum 1 per provider)
`HTTPException(500)`	500	Any unhandled exception

Graceful degradation: If top models or providers queries fail, they return empty lists. If calculation returns no data, returns empty Prometheus metrics (not an error).

9. Mermaid Diagram

flowchart TD
    A[GET /tokens-per-second] --> B{Validate time param}
    B -->|Invalid| C[400 Bad Request]
    B -->|Valid| D[Get top 3 models]
    D --> E[Get all providers]
    E --> F[Ensure min 1 per provider]
    F --> G{model_id in filtered list?}
    G -->|No| H[403 Forbidden]
    G -->|Yes| I[calculate_tokens_per_second]
    I --> J{Data found?}
    J -->|No| K[Return empty Prometheus metrics]
    J -->|Yes| L[Format as Prometheus text]
    L --> M[Return 200 text/plain]
    D -->|Error| N[Empty list, continue]
    E -->|Error| N

Issue: #1707

API Endpoint Documentation: GET /v1/chat/completions/metrics/tokens-per-second/all

Handler: `get_all_tokens_per_second()` in `src/routes/chat_metrics.py`

1. Overview

Returns all-time tokens-per-second metrics (no time filtering) for a specific model and provider. Unlike the time-filtered endpoint, this one does NOT enforce the top-3 model filter: any model_id can be queried. Output is in Prometheus text format for Grafana/Prometheus scraping.

Route: GET /v1/chat/completions/metrics/tokens-per-second/all
Router prefix: /v1/chat/completions/metrics
Tags: chat-metrics
Auth: None (public endpoint)
Response: text/plain (Prometheus format)

2. Query Parameters

Parameter	Type	Default	Validation	Description
`provider_id`	`str`	required	None	Provider slug
`model_id`	`int`	required	None	Model ID integer

3. Response Format

Same Prometheus text format as the time-filtered endpoint (see issue #1706), but with time_range: all.

4. Dependency Chain

get_all_tokens_per_second()
└── _calculate_tokens_per_second_async(model_id, provider_id, time_range=None)
    └── calculate_tokens_per_second() [src/db/chat_completion_requests.py]
        ├── SELECT from chat_completion_requests (no time filter)
        └── SELECT from models JOIN providers (model name lookup)

5. Supabase Queries

Query 1: Get request data

Table: chat_completion_requests
Operation: SELECT input_tokens, output_tokens, processing_time_ms, created_at
Filters: .eq("model_id", model_id).eq("status", "completed")
No time filter (all-time query)

Query 2: Get model info

Table: models with providers!inner join
Operation: SELECT model_name, providers!inner(slug)
Filters: .eq("id", model_id)

Calculation:

total_tokens = sum(input_tokens + output_tokens)
total_time_seconds = sum(processing_time_ms) / 1000
tokens_per_second = total_tokens / total_time_seconds
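The formula divides summed tokens by summed processing time, so the result is a time-weighted throughput rather than the mean of per-request rates. A minimal sketch; `tokens_per_second` as a standalone function and returning `None` when there is no usable timing are assumptions about how the empty-metrics path is reached.

```python
from typing import Optional


def tokens_per_second(rows: list[dict]) -> Optional[float]:
    """rows: completed chat_completion_requests with token counts and processing_time_ms."""
    total_tokens = sum(
        (r.get("input_tokens") or 0) + (r.get("output_tokens") or 0) for r in rows
    )
    total_time_seconds = sum(r.get("processing_time_ms") or 0 for r in rows) / 1000
    if total_time_seconds <= 0:
        # No data or no timing info: the caller would emit the empty-metrics output.
        return None
    return total_tokens / total_time_seconds
```

For example, two 500-token requests taking 2 s and 3 s give 1000 / 5 = 200 tokens/s, whereas averaging the per-request rates (250 and about 166.7) would give about 208.3.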

6. Redis Operations

None.

7. Prometheus Metrics

Emitted metric (text output):

Name: gatewayz_tokens_per_second
Type: Gauge
Labels: model, provider, requests, total_tokens

8. Error Handling

Error	Status	Condition
`HTTPException(500)`	500	Any unhandled exception

Graceful degradation: If no data is found or the result contains an error, the endpoint returns empty Prometheus metrics (200 with the comment # No data available).

9. Mermaid Diagram

flowchart TD
    A[GET /tokens-per-second/all] --> B[calculate_tokens_per_second - all time]
    B --> C[Query chat_completion_requests]
    C --> D[Query models for name/provider]
    D --> E{Result valid?}
    E -->|No data or error| F[Return empty Prometheus metrics 200]
    E -->|Data found| G[Calculate tokens/sec]
    G --> H[Format as Prometheus text]
    H --> I[Return 200 text/plain]
    B -->|Exception| J[500 Internal Server Error]

Circuit Breakers

4 endpoints

Issue: #1708

API Endpoint Documentation: GET /circuit-breakers

Handler: `get_all_circuit_breaker_states()` in `src/routes/circuit_breaker_status.py`

1. Overview

Returns the current state of all registered circuit breakers with summary counts by state. Provides real-time monitoring data for provider health dashboards.

Route: GET /circuit-breakers
Router prefix: /circuit-breakers
Tags: circuit-breakers, monitoring
Response model: dict[str, Any]
Auth: None (public monitoring endpoint)

2. Response Format

{
  "circuit_breakers": {
    "openrouter": {
      "provider": "openrouter",
      "state": "closed",
      "failure_count": 0,
      "success_count": 15,
      "failure_rate": 0.0,
      "recent_requests": 15,
      "opened_at": null,
      "seconds_until_retry": 0
    }
  },
  "total_count": 5,
  "open_count": 1,
  "half_open_count": 0,
  "closed_count": 4
}
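The summary fields are plain tallies over each breaker's `state` value. A minimal sketch, assuming the per-provider dicts have the shape shown above; `summarize` is a hypothetical name.

```python
from collections import Counter


def summarize(circuit_breakers: dict[str, dict]) -> dict:
    # Counter returns 0 for states that never occur, e.g. no half_open breakers.
    counts = Counter(cb["state"] for cb in circuit_breakers.values())
    return {
        "circuit_breakers": circuit_breakers,
        "total_count": len(circuit_breakers),
        "open_count": counts["open"],
        "half_open_count": counts["half_open"],
        "closed_count": counts["closed"],
    }
```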

3. Dependency Chain

get_all_circuit_breaker_states()
└── get_all_circuit_breakers() [src/services/circuit_breaker.py]
    └── For each registered provider in _circuit_breakers dict:
        └── breaker.get_state()
            ├── _load_state_from_redis() — loads state, counts, opened_at
            └── _calculate_failure_rate() — rolling window calculation

4. Supabase Queries

None. Circuit breakers are in-memory + Redis only.

5. Redis Operations

For each registered circuit breaker, `get_state()` calls `_load_state_from_redis()`:

Operation	Key Pattern	Type	Description
`GET`	`circuit_breaker:{provider}:state`	string	Current state: "closed", "open", "half_open"
`GET`	`circuit_breaker:{provider}:failure_count`	string (int)	Consecutive failure count
`GET`	`circuit_breaker:{provider}:success_count`	string (int)	Consecutive success count
`GET`	`circuit_breaker:{provider}:opened_at`	string (float)	Unix timestamp when opened
`GET`	`circuit_breaker:{provider}:consecutive_opens`	string (int)	Consecutive open transitions

TTL: All keys have 3600s (1 hour) TTL set during writes. Fallback: If Redis unavailable, uses in-memory state.
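Each GET returns a string (or bytes) or `None`, so loading state is mostly parsing with defaults. A minimal sketch of that parse-and-fallback step; `load_state` and its `defaults` argument are hypothetical, and the client can be anything with a `get(key)` method (a redis-py client, or a plain dict in tests).

```python
from typing import Any, Optional

# Key suffix -> parser, following the key table above.
FIELDS = {
    "state": str,
    "failure_count": int,
    "success_count": int,
    "opened_at": float,
    "consecutive_opens": int,
}


def load_state(client: Optional[Any], provider: str, defaults: dict) -> dict:
    """Read circuit_breaker:{provider}:* keys, falling back to in-memory defaults."""
    if client is None:
        return dict(defaults)
    state = dict(defaults)
    try:
        for name, parse in FIELDS.items():
            raw = client.get(f"circuit_breaker:{provider}:{name}")
            if raw is None:
                continue  # missing or expired (1h TTL): keep the in-memory value
            if isinstance(raw, bytes):
                raw = raw.decode()
            state[name] = parse(raw)
    except Exception:
        # Redis unreachable or unparsable data: use the in-memory state unchanged.
        return dict(defaults)
    return state
```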

6. Prometheus Metrics

The endpoint does not emit metrics itself. CircuitBreaker emits the metrics below on state changes, and get_state() reads the same underlying state:

Metric	Type	Labels	Description
`circuit_breaker_state_transitions_total`	Counter	`provider`, `from_state`, `to_state`	State transition count
`circuit_breaker_failures_total`	Counter	`provider`, `state`	Failure count
`circuit_breaker_successes_total`	Counter	`provider`, `state`	Success count
`circuit_breaker_rejected_requests_total`	Counter	`provider`	Rejected request count
`circuit_breaker_current_state`	Gauge	`provider`, `state`	1=active, 0=inactive

7. Error Handling

Error	Status	Condition
`HTTPException(500)`	500	Any exception from `get_all_circuit_breakers()`

8. Mermaid Diagram

flowchart TD
    A[GET /circuit-breakers] --> B[get_all_circuit_breakers]
    B --> C[For each registered provider]
    C --> D[breaker.get_state]
    D --> E[Load from Redis]
    E -->|Redis available| F[Parse state from Redis keys]
    E -->|Redis unavailable| G[Use in-memory state]
    F --> H[Calculate failure rate - rolling window]
    G --> H
    H --> I[Build state dict]
    I --> J{More providers?}
    J -->|Yes| C
    J -->|No| K[Count open/half_open/closed]
    K --> L[Return 200 with all states + summary]
    B -->|Exception| M[500 Internal Server Error]

Issue: #1709

API Endpoint Documentation: GET /circuit-breakers/{provider}

Handler: `get_circuit_breaker_state()` in `src/routes/circuit_breaker_status.py`

1. Overview

Returns the current state of a specific provider's circuit breaker. If the provider has no registered circuit breaker, one is created with default configuration.

Route: GET /circuit-breakers/{provider}
Router prefix: /circuit-breakers
Tags: circuit-breakers, monitoring
Response model: dict[str, Any]
Auth: None (public monitoring endpoint)

2. Path Parameters

Parameter	Type	Description
`provider`	`str`	Provider name (e.g., "openrouter", "groq")

3. Response Format

{
  "provider": "openrouter",
  "state": "closed",
  "failure_count": 0,
  "success_count": 15,
  "failure_rate": 0.0,
  "recent_requests": 15,
  "opened_at": null,
  "seconds_until_retry": 0
}

4. Dependency Chain

get_circuit_breaker_state()
├── get_circuit_breaker(provider) [src/services/circuit_breaker.py]
│   └── Gets or creates CircuitBreaker for provider (thread-safe via _registry_lock)
└── breaker.get_state()
    ├── _load_state_from_redis()
    └── _calculate_failure_rate()

5. Redis Operations

Same as GET /circuit-breakers (see issue #1708):

5 x GET operations on circuit_breaker:{provider}:* keys
Fallback: In-memory state if Redis unavailable

6. Prometheus Metrics

None directly emitted. Same underlying metrics as documented in issue #1708.

7. Error Handling

Error	Status	Condition
`HTTPException(500)`	500	Any exception

Note: There is no 404 case. If the provider has no existing circuit breaker, get_circuit_breaker() creates a new one with default config (CLOSED state, zero counts).

8. Mermaid Diagram

flowchart TD
    A[GET /circuit-breakers/provider] --> B[get_circuit_breaker provider]
    B --> C{Exists in registry?}
    C -->|No| D[Create new CircuitBreaker with defaults]
    C -->|Yes| E[Return existing breaker]
    D --> F[breaker.get_state]
    E --> F
    F --> G[Load state from Redis]
    G -->|Available| H[Parse Redis state]
    G -->|Unavailable| I[Use in-memory defaults]
    H --> J[Calculate failure rate]
    I --> J
    J --> K[Return 200 with state dict]
    F -->|Exception| L[500 Internal Server Error]

Issue: #1710

API Endpoint Documentation: POST /circuit-breakers/{provider}/reset

Handler: `reset_provider_circuit_breaker()` in `src/routes/circuit_breaker_status.py`

1. Overview

Manually resets a specific provider's circuit breaker to CLOSED state. Used when a provider has recovered and traffic should be immediately resumed.

Route: POST /circuit-breakers/{provider}/reset
Router prefix: /circuit-breakers
Tags: circuit-breakers, monitoring
Response model: dict[str, Any]
Auth: None (public endpoint - consider adding auth for production)

2. Path Parameters

Parameter	Type	Description
`provider`	`str`	Provider name to reset

3. Response Format

{
  "success": true,
  "message": "Circuit breaker for 'openrouter' has been reset",
  "state": {
    "provider": "openrouter",
    "state": "closed",
    "failure_count": 0,
    "success_count": 0,
    ...
  }
}

4. Dependency Chain

reset_provider_circuit_breaker()
├── reset_circuit_breaker(provider) [src/services/circuit_breaker.py]
│   ├── Thread-safe via _registry_lock
│   ├── Returns False if provider not in registry
│   └── breaker.reset()
│       ├── _transition_to(CLOSED, "manual reset")
│       │   ├── Reset failure_count, success_count, consecutive_opens
│       │   ├── _save_state_to_redis() — pipeline SETEX x5
│       │   └── Update Prometheus metrics (state transition + current state)
│       └── Clear _recent_requests list
├── get_circuit_breaker(provider) — get updated state
└── breaker.get_state() — return new state

5. Redis Operations

Write: Save reset state via pipeline

Operation	Key Pattern	Value	TTL
`SETEX`	`circuit_breaker:{provider}:state`	`"closed"`	3600s
`SETEX`	`circuit_breaker:{provider}:failure_count`	`"0"`	3600s
`SETEX`	`circuit_breaker:{provider}:success_count`	`"0"`	3600s
`SETEX`	`circuit_breaker:{provider}:opened_at`	`"0.0"`	3600s
`SETEX`	`circuit_breaker:{provider}:consecutive_opens`	`"0"`	3600s

Read: Load state for response (5x GET on same keys)
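The five writes can go out in one round trip. A minimal sketch using redis-py's `pipeline()` and `setex(name, time, value)`; `save_reset_state` is a hypothetical helper, and it accepts any client exposing `pipeline()` so it can be exercised without a Redis server. With a real client, `redis.Redis().pipeline()` defaults to `transaction=True`, so the five SETEX commands are wrapped in MULTI/EXEC.

```python
from typing import Any

TTL_SECONDS = 3600  # 1 hour, as in the table above

RESET_VALUES = {
    "state": "closed",
    "failure_count": "0",
    "success_count": "0",
    "opened_at": "0.0",
    "consecutive_opens": "0",
}


def save_reset_state(client: Any, provider: str) -> None:
    """Queue five SETEX commands and send them together."""
    pipe = client.pipeline()
    for name, value in RESET_VALUES.items():
        pipe.setex(f"circuit_breaker:{provider}:{name}", TTL_SECONDS, value)
    pipe.execute()
```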

6. Prometheus Metrics

Emitted during _transition_to():

Metric	Operation	Labels
`circuit_breaker_state_transitions_total`	`.inc()`	`provider={provider}, from_state={old}, to_state=closed`
`circuit_breaker_current_state`	`.set(1)`	`provider={provider}, state=closed`
`circuit_breaker_current_state`	`.set(0)`	`provider={provider}, state={old_state}`

7. Error Handling

Error	Status	Condition
`HTTPException(404)`	404	Provider not found in circuit breaker registry
`HTTPException(500)`	500	Any unhandled exception

8. Mermaid Diagram

flowchart TD
    A[POST /circuit-breakers/provider/reset] --> B[reset_circuit_breaker provider]
    B --> C{Provider in registry?}
    C -->|No| D[404 Not Found]
    C -->|Yes| E[breaker.reset]
    E --> F[Transition to CLOSED]
    F --> G[Reset all counters]
    G --> H[Save to Redis pipeline]
    H --> I[Update Prometheus metrics]
    I --> J[get_circuit_breaker - fetch updated]
    J --> K[breaker.get_state]
    K --> L[Return 200 with success + new state]
    E -->|Exception| M[500 Internal Server Error]

Issue: #1711

API Endpoint Documentation: POST /circuit-breakers/reset-all

Handler: `reset_all_provider_circuit_breakers()` in `src/routes/circuit_breaker_status.py`

1. Overview

Bulk operation that resets ALL registered circuit breakers to CLOSED state. Use with caution — only when confident all providers have recovered.

Route: POST /circuit-breakers/reset-all
Router prefix: /circuit-breakers
Tags: circuit-breakers, monitoring
Response model: dict[str, Any]
Auth: None (public endpoint - consider adding auth for production)

2. Response Format

{
  "success": true,
  "message": "All circuit breakers have been reset",
  "reset_count": 5,
  "states": {
    "openrouter": { "provider": "openrouter", "state": "closed", ... },
    "groq": { "provider": "groq", "state": "closed", ... }
  }
}

3. Dependency Chain

reset_all_provider_circuit_breakers()
├── reset_all_circuit_breakers() [src/services/circuit_breaker.py]
│   ├── Thread-safe via _registry_lock
│   └── For each breaker in registry:
│       └── breaker.reset()
│           ├── _transition_to(CLOSED, "manual reset")
│           │   ├── _save_state_to_redis() — pipeline SETEX x5
│           │   └── Update Prometheus metrics
│           └── Clear _recent_requests
└── get_all_circuit_breakers() — fetch all states for response
    └── For each breaker: get_state() → load from Redis + calculate rates

4. Redis Operations

Write: For EACH provider (N providers × 5 keys):

Operation	Key Pattern	Value	TTL
`SETEX`	`circuit_breaker:{provider}:state`	`"closed"`	3600s
`SETEX`	`circuit_breaker:{provider}:failure_count`	`"0"`	3600s
`SETEX`	`circuit_breaker:{provider}:success_count`	`"0"`	3600s
`SETEX`	`circuit_breaker:{provider}:opened_at`	`"0.0"`	3600s
`SETEX`	`circuit_breaker:{provider}:consecutive_opens`	`"0"`	3600s

Read: For EACH provider (N × 5 GET for state response):

Same key patterns as above.

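Total Redis operations: at most N * 10 (5 writes + 5 reads per provider), with writes sent via pipelines. Since _save_state_to_redis() runs inside _transition_to(), breakers that are already CLOSED may skip the writes (see the note under Prometheus Metrics).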

5. Prometheus Metrics

For EACH provider that transitions (emitted per breaker):

Metric	Labels
`circuit_breaker_state_transitions_total`	`provider, from_state, to_state=closed`
`circuit_breaker_current_state` (new)	`provider, state=closed` → set to 1
`circuit_breaker_current_state` (old)	`provider, state={old}` → set to 0

Note: If a breaker is already CLOSED, _transition_to returns early (no-op).
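The early return makes `reset-all` safe to call repeatedly: already-CLOSED breakers record no transition. A minimal sketch of that control flow, following the dependency chain and note as written; `Breaker`, `reset_all`, and the `transitions` list (standing in for the Prometheus transition counter) are hypothetical.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Breaker:
    provider: str
    state: str = "closed"
    failure_count: int = 0
    recent_requests: list = field(default_factory=list)
    transitions: list = field(default_factory=list)  # stands in for the transition counter

    def _transition_to(self, new_state: str) -> None:
        if self.state == new_state:
            return  # already in the target state: early return, nothing recorded
        self.transitions.append((self.state, new_state))
        self.state = new_state
        self.failure_count = 0

    def reset(self) -> None:
        self._transition_to("closed")
        self.recent_requests.clear()


def reset_all(registry: dict[str, Breaker], lock: threading.Lock) -> None:
    # Hold the registry lock for the whole sweep, as the chain above describes.
    with lock:
        for breaker in registry.values():
            breaker.reset()
```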

6. Error Handling

Error	Status	Condition
`HTTPException(500)`	500	Any exception

No 404 case — if no breakers registered, returns reset_count: 0 with empty states.

Logging: Uses logger.warning for the reset (audit trail).

7. Mermaid Diagram

flowchart TD
    A[POST /circuit-breakers/reset-all] --> B[reset_all_circuit_breakers]
    B --> C[For each breaker in registry]
    C --> D[breaker.reset]
    D --> E[Transition to CLOSED]
    E --> F[Save to Redis]
    F --> G[Update Prometheus]
    G --> H{More breakers?}
    H -->|Yes| C
    H -->|No| I[get_all_circuit_breakers]
    I --> J[Load all states]
    J --> K[Return 200 with reset_count + all states]
    B -->|Exception| L[500 Internal Server Error]

Code Router

5 endpoints

Issue: #1712

API Endpoint Documentation: GET /code-router/settings/options

Handler: `get_code_router_settings_options()` in `src/routes/code_router.py`

1. Overview

Returns available settings options for the code router, including configurable fields and their descriptions, plus all available routing modes. Used by client applications to build dynamic settings UIs.

Route: GET /code-router/settings/options
Router prefix: /code-router
Tags: code-router
Auth: None (intentionally public; exposes non-sensitive configuration)

2. Response Format

{
  "success": true,
  "options": {
    "use_code_router": {
      "type": "boolean",
      "default": true,
      "label": "Use Code Router",
      "description": "Enable intelligent model selection based on task complexity"
    },
    "optimization_mode": {
      "type": "select",
      "default": "balanced",
      "label": "Optimization Mode",
      "description": "How to balance cost and quality",
      "options": [
        {"value": "balanced", "label": "Balanced", "description": "..."},
        {"value": "price", "label": "Price Optimized", "description": "..."},
        {"value": "quality", "label": "Quality Optimized", "description": "..."},
        {"value": "agentic", "label": "Agentic Mode", "description": "..."}
      ],
      "depends_on": {"use_code_router": true}
    },
    "manual_model": {
      "type": "model_select",
      "default": "anthropic/claude-sonnet-4",
      "label": "Manual Model",
      "depends_on": {"use_code_router": false}
    },
    "show_routing_info": { "type": "boolean", "default": true, ... },
    "show_savings": { "type": "boolean", "default": true, ... }
  },
  "modes": [
    {"value": "balanced", "label": "Balanced", "description": "Auto-select best price/performance balance"},
    {"value": "price", "label": "Price", "description": "Optimize for lowest cost while maintaining quality"},
    {"value": "quality", "label": "Quality", "description": "Optimize for highest quality, use better models"},
    {"value": "agentic", "label": "Agentic", "description": "Always use premium models for complex tasks"}
  ]
}

3. Dependency Chain

get_code_router_settings_options()
├── get_settings_options() [src/services/code_router_client.py]
│   └── Returns hardcoded dict describing all configurable options
└── CodeRouterMode enum [src/services/code_router_client.py]
    ├── Iterated to build modes list
    └── _get_mode_description(mode) [src/routes/code_router.py]
        └── Returns description string from hardcoded dict

4. Key Types

`CodeRouterMode` (Enum)

Member	Value - Description
`BALANCED`	`"balanced"` - Auto-select best price/performance
`PRICE`	`"price"` - Optimize for lowest cost
`QUALITY`	`"quality"` - Optimize for highest quality
`AGENTIC`	`"agentic"` - Always use premium models
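The `modes` list in the response comes from iterating the enum and attaching a description to each member. A minimal sketch; the members and description strings are copied from the table and sample response above, while `build_modes` and deriving the label with `str.title()` are assumptions that happen to reproduce the sample output.

```python
from enum import Enum


class CodeRouterMode(Enum):
    BALANCED = "balanced"
    PRICE = "price"
    QUALITY = "quality"
    AGENTIC = "agentic"


_MODE_DESCRIPTIONS = {
    CodeRouterMode.BALANCED: "Auto-select best price/performance balance",
    CodeRouterMode.PRICE: "Optimize for lowest cost while maintaining quality",
    CodeRouterMode.QUALITY: "Optimize for highest quality, use better models",
    CodeRouterMode.AGENTIC: "Always use premium models for complex tasks",
}


def build_modes() -> list[dict]:
    # Enum iteration follows definition order, so "balanced" comes first.
    return [
        {"value": mode.value, "label": mode.value.title(), "description": _MODE_DESCRIPTIONS[mode]}
        for mode in CodeRouterMode
    ]
```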

5. Supabase Queries

None. Entirely in-memory/static configuration.

6. Redis Operations

None.

7. Prometheus Metrics

None.

8. Error Handling

No explicit error handling. This endpoint is a simple dict return with no external dependencies. If an exception occurs (unlikely), FastAPI's default 500 handler catches it.

9. Mermaid Diagram

flowchart TD
    A[GET /code-router/settings/options] --> B[get_settings_options - static config]
    B --> C[Build options dict]
    A --> D[Iterate CodeRouterMode enum]
    D --> E[Build modes list with descriptions]
    C --> F[Return 200 JSON]
    E --> F

Issue: #1713

API Endpoint Documentation: GET /code-router/tiers

Handler: `get_code_router_tiers()` in `src/routes/code_router.py`

1. Overview

Returns model tier configuration for the code router, including models per tier, the fallback model, and baseline models for savings calculations. Data is loaded from the code_quality_priors.json file.

Route: GET /code-router/tiers
Router prefix: /code-router
Tags: code-router
Auth: None (intentionally public; exposes non-sensitive configuration)

2. Response Format

{
  "success": true,
  "tiers": {
    "1": {
      "models": [
        {
          "id": "anthropic/claude-opus-4",
          "name": "Claude Opus 4",
          "provider": "anthropic",
          "swe_bench": 72.5,
          "human_eval": 96.4,
          "price_input": 15.0,
          "price_output": 75.0,
          "strengths": ["code_generation", "debugging", "architecture"]
        }
      ]
    },
    "2": { ... },
    "3": { ... },
    "4": { ... }
  },
  "fallback_model": {
    "id": "zai/glm-4.7",
    "provider": "zai"
  },
  "baselines": {
    "gpt-4o": { "price_input": ..., "price_output": ... },
    "claude-3.5-sonnet": { ... }
  }
}

3. Dependency Chain

get_code_router_tiers()
├── get_model_tiers() [src/services/code_router.py]
│   └── _load_quality_priors() — lazy-loads from code_quality_priors.json
│       ├── File: src/services/code_quality_priors.json
│       └── Cached in module-level _quality_priors variable
├── get_fallback_model() [src/services/code_router.py]
│   └── _load_quality_priors() (cached, returns same dict)
└── get_baselines() [src/services/code_router.py]
    └── _load_quality_priors() (cached, returns same dict)

4. File Loading Details

`_load_quality_priors()`:

File: src/services/code_quality_priors.json
Caching: Module-level _quality_priors variable — loaded once, never reloaded
Error handling: If file not found or JSON parse fails:
- Logs error
- Captures to Sentry (if available)
- Falls back to minimal config: {"model_tiers": {}, "fallback_model": {"id": "zai/glm-4.7", "provider": "zai"}, "baselines": {}}
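The load-once-with-fallback behavior can be sketched with the standard library alone. In this sketch the Sentry capture is left as a comment, the `path` parameter is added for testability, and `load_quality_priors` is a hypothetical name. Caching the fallback as well (so a missing file is not retried until restart) follows "loaded once, never reloaded" but is an assumption about the real code.

```python
import json
import logging
from pathlib import Path
from typing import Any, Optional

logger = logging.getLogger(__name__)

_FALLBACK: dict[str, Any] = {
    "model_tiers": {},
    "fallback_model": {"id": "zai/glm-4.7", "provider": "zai"},
    "baselines": {},
}
_quality_priors: Optional[dict[str, Any]] = None


def load_quality_priors(path: Path) -> dict[str, Any]:
    global _quality_priors
    if _quality_priors is None:
        try:
            _quality_priors = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError) as exc:
            logger.error("Failed to load %s: %s", path, exc)
            # The real code also captures the exception to Sentry here, if available.
            _quality_priors = dict(_FALLBACK)
    return _quality_priors
```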

5. Supabase Queries

None. All data comes from static JSON file.

6. Redis Operations

None.

7. Prometheus Metrics

None.

8. Error Handling

No explicit error handling in the route. The underlying _load_quality_priors() has its own error handling with fallback values. If the function somehow throws, FastAPI's default 500 handler catches it.

9. Mermaid Diagram

flowchart TD
    A[GET /code-router/tiers] --> B{Quality priors loaded?}
    B -->|Yes - cached| C[Return cached data]
    B -->|No - first call| D[Load code_quality_priors.json]
    D -->|Success| E[Cache in module variable]
    D -->|File error| F[Log error + Sentry capture]
    F --> G[Use minimal fallback config]
    E --> H[Extract model_tiers]
    G --> H
    H --> I[Extract fallback_model]
    I --> J[Extract baselines]
    J --> K[Return 200 JSON with tiers + fallback + baselines]

Issue: #1714

API Endpoint Documentation: GET /code-router/stats

Overview

Handler: get_code_router_stats() in src/routes/code_router.py (line 233)
Router prefix: /code-router
Tags: ["code-router"]
Authentication: None required (public endpoint)

Pydantic Schema

Response

Returns dict[str, Any] (no Pydantic model). Shape:

# Success case:
{
    "success": True,
    "stats": {
        "tiers_loaded": int,           # Number of model tiers
        "models_available": int,        # Total models across all tiers
        "fallback_model": str | None,   # Fallback model ID
        "baselines": list[str],         # Baseline model keys
        "metrics_enabled": bool,        # Whether Prometheus module found
    }
}

# Error case (graceful degradation, still 200):
{
    "success": False,
    "error": str,
    "stats": {}
}

Dependency Trace (3+ levels deep)

get_code_router_stats()
├── get_router()                              # src/services/code_router.py:405
│   └── CodeRouter.__init__()                 # src/services/code_router.py:93
│       ├── get_classifier()                  # src/services/code_classifier.py
│       ├── get_model_tiers()                 # src/services/code_router.py:66
│       │   └── _load_quality_priors()        # src/services/code_router.py:32
│       │       └── Reads code_quality_priors.json from disk (cached globally)
│       ├── get_fallback_model()              # src/services/code_router.py:71
│       │   └── _load_quality_priors()        # (cached)
│       ├── get_baselines()                   # src/services/code_router.py:76
│       │   └── _load_quality_priors()        # (cached)
│       └── _build_model_lookup()             # src/services/code_router.py:102
│           └── Builds _model_lookup and _tier_models dicts
├── router_instance.model_tiers               # Access cached tiers dict
├── router_instance.fallback_model            # Access cached fallback
├── router_instance.baselines                 # Access cached baselines
└── importlib.util.find_spec("src.services.prometheus_metrics")
    └── Checks if Prometheus metrics module is importable
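The stats payload is derived from the three cached structures plus a module probe, all inside one try block. A minimal sketch, assuming `model_tiers` has the `{"1": {"models": [...]}, ...}` shape shown for GET /code-router/tiers; `build_stats` and `module_available` are hypothetical names. One subtlety: `importlib.util.find_spec()` imports the parent package of a dotted name and raises `ModuleNotFoundError` if that parent is missing, so the sketch treats that case as "not available" rather than letting it fail the whole response.

```python
import importlib.util
from typing import Any, Optional


def module_available(name: str) -> bool:
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # find_spec imports parent packages; a missing parent raises.
        return False


def build_stats(model_tiers: dict, fallback_model: Optional[dict], baselines: dict,
                metrics_module: str = "src.services.prometheus_metrics") -> dict[str, Any]:
    try:
        stats = {
            "tiers_loaded": len(model_tiers),
            "models_available": sum(len(t.get("models", [])) for t in model_tiers.values()),
            "fallback_model": fallback_model.get("id") if fallback_model else None,
            "baselines": list(baselines.keys()),
            "metrics_enabled": module_available(metrics_module),
        }
        return {"success": True, "stats": stats}
    except Exception as exc:
        # Graceful degradation: still a 200 response, flagged as failed.
        return {"success": False, "error": str(exc), "stats": {}}
```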

Supabase Queries

None. This endpoint reads only from in-memory/file-cached data.

Redis Operations

None. No Redis interaction.

Prometheus Metrics

None emitted directly by this endpoint. It only checks if the src.services.prometheus_metrics module is importable via importlib.util.find_spec().

Middleware Effects

Standard FastAPI middleware pipeline applies (Sentry, observability, timeout, security, gzip, trace)
No authentication middleware enforced (no Depends() for auth)

Error Handling

Error Path	Status Code	Detail
Any exception in handler	200	Returns `{"success": False, "error": str(e), "stats": {}}` - graceful degradation

The endpoint intentionally does not raise HTTPException on failure. It returns a 200 with success: False for graceful degradation since stats are non-critical.

Mermaid Diagram

flowchart TD
    A[GET /code-router/stats] --> B{try block}
    B --> C[get_router - singleton]
    C --> D{_router is None?}
    D -->|Yes| E[CodeRouter.__init__]
    E --> F[Load quality priors from JSON]
    F --> G[Build model lookup]
    D -->|No| H[Return cached instance]
    G --> H
    H --> I[Build stats dict]
    I --> J[Count tiers_loaded]
    I --> K[Sum models_available]
    I --> L[Get fallback_model.id]
    I --> M[List baselines keys]
    I --> N[Check prometheus_metrics importable]
    N --> O[Return success: true + stats]
    B -->|Exception| P[Log error]
    P --> Q[Return success: false + error string + empty stats]

Issue: #1715

API Endpoint Documentation: POST /code-router/test

Overview

Handler: test_code_routing() in src/routes/code_router.py (line 143)
Router prefix: /code-router
Tags: ["code-router"]
Authentication: None required (public endpoint)

Pydantic Schemas

Request: `RouteTestRequest`

Field	Type	Default	Validation
`prompt`	`str`	required	Must be non-empty
`mode`	`RoutingMode` (Literal["auto","price","quality","agentic"])	`"auto"`	`field_validator` lowercases and validates against `VALID_ROUTING_MODES`
`context`	`dict[str, Any] | None`	`None`	Optional context dict

Response: `RouteTestResponse`

Field	Type	Description
`model_id`	`str`	Selected model identifier
`provider`	`str`	Provider slug
`tier`	`int`	Selected tier number (1-4)
`task_category`	`str`	Classified task category
`complexity`	`str`	Classified complexity level
`confidence`	`float`	Classification confidence score
`mode`	`str`	Routing mode used
`routing_latency_ms`	`float`	Time taken for routing decision in ms
`savings_estimate`	`dict[str, Any]`	Savings vs baselines
`model_info`	`dict[str, Any]`	Selected model metadata

Dependency Trace (3+ levels deep)

test_code_routing(request)
├── route_code_prompt()                           # src/services/code_router.py:413
│   └── get_router().route()                      # src/services/code_router.py:113
│       ├── classifier.classify(prompt, context)  # src/services/code_classifier.py
│       │   └── Pattern matching + keyword analysis
│       │       └── Returns {category, complexity, default_tier, min_tier, confidence}
│       ├── _calculate_target_tier()              # src/services/code_router.py:198
│       │   └── Mode-based tier selection with quality gates
│       │       ├── agentic → always tier 1
│       │       ├── quality → max(1, default_tier - 1) clamped by min_tier
│       │       ├── price → default_tier clamped by min_tier
│       │       └── auto → default_tier clamped by min_tier
│       ├── _select_model_from_tier()             # src/services/code_router.py:237
│       │   └── Score models by strengths, price/quality benchmarks
│       │       └── Returns highest-scored model or fallback_model
│       ├── _calculate_savings_estimate()         # src/services/code_router.py:292
│       │   └── Compare selected model cost vs baselines (1K input + 500 output tokens)
│       └── _track_routing_metrics()              # src/services/code_router.py:360
│           ├── code_router_requests_total.labels(...).inc()
│           ├── code_router_latency_seconds.observe(...)
│           └── code_router_savings_dollars.labels(...).inc()
└── Return RouteTestResponse
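The savings figure compares the cost of one reference request (1K input + 500 output tokens) on the selected model against each baseline. A minimal sketch, assuming `price_input`/`price_output` are USD per million tokens and that savings is baseline cost minus selected cost; the source states neither, and `request_cost`/`savings_estimate` are hypothetical names.

```python
REF_INPUT_TOKENS = 1_000
REF_OUTPUT_TOKENS = 500


def request_cost(price_input: float, price_output: float) -> float:
    # Assumes prices are USD per 1M tokens.
    return (REF_INPUT_TOKENS * price_input + REF_OUTPUT_TOKENS * price_output) / 1_000_000


def savings_estimate(selected: dict, baselines: dict[str, dict]) -> dict[str, float]:
    selected_cost = request_cost(selected["price_input"], selected["price_output"])
    return {
        name: round(request_cost(b["price_input"], b["price_output"]) - selected_cost, 6)
        for name, b in baselines.items()
    }
```

With the sample tier-1 prices (15.0 / 75.0), one reference request costs (1000 × 15 + 500 × 75) / 1,000,000 = $0.0525 under this assumption. A negative savings value means the selected model costs more than that baseline.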

Supabase Queries

None. All data comes from in-memory/file-cached configuration.

Redis Operations

None. No Redis interaction.

Prometheus Metrics

Metric Name	Type	Labels
`code_router_requests_total`	Counter	`task_category`, `complexity`, `mode`, `selected_model`, `selected_tier`
`code_router_latency_seconds`	Histogram	(none) — buckets: 0.5ms to 100ms
`code_router_savings_dollars_total`	Counter	`baseline`, `task_category`

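Metrics are emitted via _track_routing_metrics() inside CodeRouter.route(). If the prometheus_metrics module is not importable, metrics are silently skipped. The savings counter object is referenced as code_router_savings_dollars in code; prometheus_client appends the _total suffix to counter names on exposition, so it is scraped as code_router_savings_dollars_total.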

Middleware Effects

Standard FastAPI middleware pipeline (Sentry, observability, timeout, security, gzip, trace)
No authentication middleware (no Depends() for auth)
Request body validated by Pydantic before handler runs

Error Handling

Error Path	Status Code	Detail
Invalid `mode` value	422	Pydantic validation error
Missing `prompt` field	422	Pydantic validation error
Any exception in `route_code_prompt()`	500	`"Routing test failed: {error}"`

Mermaid Diagram

flowchart TD
    A[POST /code-router/test] --> B[Pydantic validation]
    B -->|Invalid| B1[422 Validation Error]
    B -->|Valid| C{try block}
    C --> D[route_code_prompt]
    D --> E[get_router - singleton]
    E --> F[classifier.classify prompt]
    F --> G[Determine category + complexity + tiers]
    G --> H[_calculate_target_tier based on mode]
    H --> I{Mode?}
    I -->|agentic| I1[Tier 1 always]
    I -->|quality| I2[Bump up tier, respect min_tier]
    I -->|price| I3[Default tier, respect min_tier]
    I -->|auto| I4[Default tier, respect min_tier]
    I1 --> J[_select_model_from_tier]
    I2 --> J
    I3 --> J
    I4 --> J
    J --> K[Score models by strengths + mode preference]
    K --> L{Models in tier?}
    L -->|Yes| M[Select highest scored]
    L -->|No| N[Use fallback_model]
    M --> O[_calculate_savings_estimate]
    N --> O
    O --> P[_track_routing_metrics - Prometheus]
    P --> Q[Return RouteTestResponse]
    C -->|Exception| R[Log error]
    R --> S[HTTPException 500]

Issue: #1716

API Endpoint Documentation: POST /code-router/settings/validate

Overview

Handler: validate_code_router_settings() in src/routes/code_router.py (line 175)
Router prefix: /code-router
Tags: ["code-router"]
Authentication: None required (public endpoint)

Pydantic Schemas

Request: `SettingsValidationRequest`

Field	Type	Default	Validation
`use_code_router`	`bool`	`True`	Standard bool
`optimization_mode`	`str`	`"balanced"`	Plain `str`; checked in the handler (not by Pydantic) against `CodeRouterMode` values
`manual_model`	`str \| None`	`None`	Optional; required when `use_code_router=False`

Response: `SettingsValidationResponse`

Field	Type	Default	Description
`valid`	`bool`	-	Whether settings are valid
`model_string`	`str`	-	Resulting model string (e.g., `"router:code:price"`)
`errors`	`list[str]`	`[]`	Validation errors
`warnings`	`list[str]`	`[]`	Validation warnings

Dependency Trace (3+ levels deep)

validate_code_router_settings(request)
├── CodeRouterMode enum validation           # src/services/code_router_client.py:27
│   └── Check optimization_mode against ["balanced", "price", "quality", "agentic"]
├── If use_code_router is False:
│   ├── Check manual_model is provided
│   └── _is_valid_model_id(manual_model)     # src/routes/code_router.py:289
│       └── Check "/" in model_id OR model_id in known_aliases list
│           └── Known aliases: gpt-4, gpt-4o, gpt-4o-mini, gpt-3.5-turbo, 
│               claude-3-opus, claude-3-sonnet, claude-3-haiku, gemini-pro, gemini-flash
├── CodeRouterSettings(...)                   # src/services/code_router_client.py:196
│   └── Pydantic model with mode, manual_model, router toggle
└── settings.get_model_string()              # src/services/code_router_client.py:232
    ├── If not use_code_router → return manual_model
    ├── If BALANCED → "router:code"
    └── Else → f"router:code:{mode.value}"
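The validation flow above can be approximated without Pydantic. This is a sketch, not the handler itself. `CodeRouterMode` is modeled as a plain `str` Enum with the four documented values, and the alias list is copied from the trace. Following the Mermaid diagram, the manual-model checks run only when the mode is valid. The empty `model_string` returned on failure is a placeholder, because the failure value is not documented here. The handler's try/except around `CodeRouterSettings` construction is omitted.

```python
# Hypothetical sketch of validate_code_router_settings(); not the real Pydantic models.
from enum import Enum
from typing import Optional


class CodeRouterMode(str, Enum):
    BALANCED = "balanced"
    PRICE = "price"
    QUALITY = "quality"
    AGENTIC = "agentic"


KNOWN_ALIASES = {
    "gpt-4", "gpt-4o", "gpt-4o-mini", "gpt-3.5-turbo",
    "claude-3-opus", "claude-3-sonnet", "claude-3-haiku",
    "gemini-pro", "gemini-flash",
}


def is_valid_model_id(model_id: str) -> bool:
    # Provider-prefixed IDs ("org/model") or one of the known bare aliases.
    return "/" in model_id or model_id in KNOWN_ALIASES


def get_model_string(use_code_router: bool, mode: CodeRouterMode, manual_model: Optional[str]) -> Optional[str]:
    if not use_code_router:
        return manual_model
    if mode is CodeRouterMode.BALANCED:
        return "router:code"
    return f"router:code:{mode.value}"


def validate_settings(
    use_code_router: bool = True,
    optimization_mode: str = "balanced",
    manual_model: Optional[str] = None,
) -> dict:
    errors: list = []
    warnings: list = []
    valid_modes = [m.value for m in CodeRouterMode]
    if optimization_mode not in valid_modes:
        errors.append(f"Invalid optimization_mode '{optimization_mode}'; expected one of {valid_modes}")
    elif not use_code_router:
        if not manual_model:
            errors.append("manual_model is required when use_code_router is False")
        elif not is_valid_model_id(manual_model):
            # Unknown model IDs only warn; the settings remain valid.
            warnings.append(f"Model '{manual_model}' may not be available")
    if errors:
        # Placeholder model_string: the value returned on failure is not documented.
        return {"valid": False, "model_string": "", "errors": errors, "warnings": warnings}
    model = get_model_string(use_code_router, CodeRouterMode(optimization_mode), manual_model)
    return {"valid": True, "model_string": model, "errors": [], "warnings": warnings}
```

For example, `validate_settings(optimization_mode="price")` yields `model_string == "router:code:price"`, while `validate_settings(use_code_router=False, manual_model="my-model")` is valid but carries one warning.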

Supabase Queries

None. Pure validation logic, no database interaction.

Redis Operations

None. No Redis interaction.

Prometheus Metrics

None. No metrics emitted.

Middleware Effects

Standard FastAPI middleware pipeline (Sentry, observability, timeout, security, gzip, trace)
No authentication middleware
Request body validated by Pydantic before handler runs

Error Handling

Error Path	Status Code	Detail
Pydantic validation failure on request	422	Automatic validation error
Invalid `optimization_mode`	200	Returns `valid: false` with error in `errors` list
`use_code_router=False` without `manual_model`	200	Returns `valid: false` with error
Invalid `manual_model` format	200	Returns `valid: true` with warning in `warnings` list
Exception in `CodeRouterSettings` construction	200	Returns `valid: false` with error string

Note: The handler never raises HTTPException. Apart from Pydantic's automatic 422 for a malformed request body, every validation failure returns 200 with valid: false.

Mermaid Diagram

flowchart TD
    A[POST /code-router/settings/validate] --> B[Pydantic request validation]
    B -->|Invalid| B1[422 Validation Error]
    B -->|Valid| C[Initialize errors + warnings lists]
    C --> D{optimization_mode in valid modes?}
    D -->|No| E[Add error: invalid mode]
    D -->|Yes| F{use_code_router?}
    F -->|False| G{manual_model provided?}
    G -->|No| H[Add error: manual_model required]
    G -->|Yes| I{_is_valid_model_id?}
    I -->|No| J[Add warning: model may not be available]
    I -->|Yes| K[Continue]
    F -->|True| K
    J --> K
    H --> K
    E --> K
    K --> L{errors list empty?}
    L -->|No| M[Return valid:false + errors + warnings]
    L -->|Yes| N{try: build CodeRouterSettings}
    N -->|Success| O[settings.get_model_string]
    O --> P[Return valid:true + model_string + warnings]
    N -->|Exception| Q[Return valid:false + error string]

Coupons

3 endpoints

Issue: #1717

API Endpoint Documentation: GET /coupons/available

Overview

Handler: get_available_coupons() in src/routes/coupons.py (line 88)
Tags: ["coupons"]
Authentication: Required - get_current_user (Bearer token)

Pydantic Schemas

Response: `list[AvailableCouponResponse]`

Field	Type	Description
`coupon_id`	`int`	Coupon ID
`code`	`str`	Coupon code
`value_usd`	`float`	Dollar value
`coupon_scope`	`str`	"user_specific" or "global"
`coupon_type`	`str`	"promotional", "referral", "compensation", "partnership"
`description`	`str \| None`	Internal description
`valid_until`	`datetime`	Expiration date
`remaining_uses`	`int`	Uses remaining

Dependency Trace (3+ levels deep)

get_available_coupons(user)
├── Depends(get_current_user)                    # src/security/deps.py:192
│   ├── Depends(get_api_key)                     # src/security/deps.py:74
│   │   ├── HTTPBearer credential extraction
│   │   ├── validate_api_key_security()          # src/security/security.py
│   │   │   └── Key lookup, status, expiry, IP, domain checks
│   │   ├── get_user(api_key)                    # src/services/user_lookup_cache.py
│   │   └── audit_logger.log_api_key_usage()     # src/security/security.py
│   ├── get_user(api_key)                        # src/services/user_lookup_cache.py
│   └── validate_trial_expiration(user)          # src/utils/trial_utils.py
│       └── Raises HTTPException(402) if trial expired
├── get_available_coupons_for_user(user_id)      # src/db/coupons.py:450
│   ├── get_supabase_client()                    # src/config/supabase_config.py
│   └── client.rpc("get_available_coupons",      # Supabase RPC call
│       {"p_user_id": user_id})
└── Return [AvailableCouponResponse(**c) for c in coupons]

Supabase Queries

Operation	Table/RPC	Details
RPC call	`get_available_coupons`	Params: `{"p_user_id": user_id}`

get_available_coupons is a PostgreSQL function that returns both the user-specific coupons assigned to this user and the global coupons this user has not yet redeemed.

Redis Operations

None directly. However, get_user() via user_lookup_cache may use Redis for user caching.

Prometheus Metrics

None emitted directly by this endpoint. Authentication middleware may emit metrics.

Middleware Effects

Standard middleware pipeline (Sentry, observability, timeout, security, gzip, trace)
Bearer token authentication via get_current_user dependency chain
API key validated for: active status, expiration, request limits, IP allowlist, domain restrictions
Trial expiration checked (raises 402 if expired)
Audit log entry created for API key usage

Error Handling

Error Path	Status Code	Detail
No Authorization header	401	"Authorization header is required"
Invalid/inactive API key	401	Various key validation messages
Expired API key	401	"API key expired"
Rate limit exceeded	429	"limit reached"
IP not allowed	403	"IP address not allowed"
User not found	404	"User not found"
Trial expired	402	Trial expiration message
RPC call fails	500	"Internal server error"
Any other exception	500	"Internal server error"

Mermaid Diagram

flowchart TD
    A[GET /coupons/available] --> B[get_current_user dependency]
    B --> C[get_api_key - validate Bearer token]
    C -->|No token| C1[401 Authorization required]
    C -->|Invalid| C2[401/403/429 Key validation error]
    C -->|Valid| D[get_user - lookup user]
    D -->|Not found| D1[404 User not found]
    D -->|Found| E[validate_trial_expiration]
    E -->|Expired| E1[402 Trial expired]
    E -->|Valid| F{try block}
    F --> G[get_available_coupons_for_user]
    G --> H[Supabase RPC: get_available_coupons]
    H --> I[Return coupon list from DB function]
    I --> J[Map to AvailableCouponResponse list]
    J --> K[Return 200 with coupon list]
    F -->|HTTPException| L[Re-raise]
    F -->|Other Exception| M[Log error]
    M --> N[500 Internal server error]

Issue: #1718

API Endpoint Documentation: GET /coupons/history

Overview

Handler: get_redemption_history() in src/routes/coupons.py (line 112)
Tags: ["coupons"]
Authentication: Required - get_current_user (Bearer token)

Pydantic Schemas

Query Parameters

Param	Type	Default	Validation
`limit`	`int`	`50`	Standard int

Response: `RedemptionHistoryResponse`

Field	Type	Description
`redemptions`	`list[RedemptionHistoryItem]`	List of redemption records
`total_redemptions`	`int`	Count of redemptions
`total_value_redeemed`	`float`	Sum of all values redeemed

`RedemptionHistoryItem`

Field	Type	Description
`id`	`int`	Redemption record ID
`coupon_code`	`str`	Coupon code
`coupon_scope`	`str`	"user_specific" or "global"
`coupon_type`	`str`	Type category
`value_applied`	`float`	Value applied
`redeemed_at`	`datetime`	Redemption timestamp
`user_balance_before`	`float`	Balance before redemption
`user_balance_after`	`float`	Balance after redemption

Dependency Trace (3+ levels deep)

get_redemption_history(limit, user)
├── Depends(get_current_user)                    # (same auth chain as #1717)
├── get_user_redemption_history(user_id, limit)  # src/db/coupons.py:474
│   ├── get_supabase_client()                    # src/config/supabase_config.py
│   └── client.table("coupon_redemptions")
│       .select("*, coupons(code, coupon_type, coupon_scope)")
│       .eq("user_id", user_id)
│       .order("redeemed_at", desc=True)
│       .limit(limit)
│       .execute()
└── Transform data:
    ├── Extract nested "coupons" join data
    ├── Build RedemptionHistoryItem for each record
    └── Sum total_value from value_applied fields

Supabase Queries

Operation	Table	Columns	Filters	Order
SELECT	`coupon_redemptions`	`*, coupons(code, coupon_type, coupon_scope)`	`.eq("user_id", user_id)`	`.order("redeemed_at", desc=True)`

This uses a PostgREST foreign key join to fetch coupon details inline with redemption records. Limited by the limit parameter (default 50).
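The transform step from the trace can be sketched as a pure function over the rows returned by the joined select. This is a hypothetical re-creation. It relies on PostgREST placing the embedded many-to-one row under the key `coupons` (or `null` if there is no match). Mapping a missing join to `None` fields is an assumption.

```python
# Hypothetical sketch of the get_redemption_history() transform step.
from typing import Any


def build_redemption_history(rows: list[dict[str, Any]]) -> dict[str, Any]:
    """Flatten joined coupon data and total the redeemed value."""
    items = []
    total_value = 0.0
    for row in rows:
        coupon = row.get("coupons") or {}  # embedded row from coupons(code, coupon_type, coupon_scope)
        items.append(
            {
                "id": row["id"],
                "coupon_code": coupon.get("code"),  # None if the join returned nothing (assumption)
                "coupon_scope": coupon.get("coupon_scope"),
                "coupon_type": coupon.get("coupon_type"),
                "value_applied": float(row["value_applied"]),
                "redeemed_at": row["redeemed_at"],
                "user_balance_before": float(row["user_balance_before"]),
                "user_balance_after": float(row["user_balance_after"]),
            }
        )
        total_value += float(row["value_applied"])
    return {
        "redemptions": items,
        "total_redemptions": len(items),
        "total_value_redeemed": total_value,
    }
```

Because the query is already capped by `limit`, `total_redemptions` and `total_value_redeemed` describe the returned page, not the user's full history.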

Redis Operations

None directly. User lookup cache may use Redis.

Prometheus Metrics

None.

Middleware Effects

Standard middleware pipeline
Bearer token authentication via get_current_user dependency chain
Trial expiration validation

Error Handling

Error Path	Status Code	Detail
Auth failures	401/402/403/404/429	Various auth errors (same as #1717)
Supabase query error	200	Empty history (see note below)
Any other exception	500	"Internal server error"

On Supabase error, get_user_redemption_history returns [] (empty list), which would result in an empty response rather than a 500.

Mermaid Diagram

flowchart TD
    A[GET /coupons/history?limit=50] --> B[get_current_user dependency]
    B -->|Auth fail| B1[401/402/403/404/429]
    B -->|Success| C{try block}
    C --> D[get_user_redemption_history]
    D --> E[Supabase SELECT coupon_redemptions + JOIN coupons]
    E --> F[Return redemptions list]
    F --> G{For each redemption}
    G --> H[Extract nested coupons data]
    H --> I[Build RedemptionHistoryItem]
    I --> J[Accumulate total_value]
    J --> K[Return RedemptionHistoryResponse]
    C -->|HTTPException| L[Re-raise]
    C -->|Other Exception| M[500 Internal server error]

Issue: #1723

API Endpoint Documentation: POST /coupons/redeem

Overview

Handler: redeem_coupon_endpoint() in src/routes/coupons.py (line 47)
Tags: ["coupons"]
Authentication: Required - get_current_user (Bearer token)

Pydantic Schemas

Request: `RedeemCouponRequest`

Field	Type	Default	Validation
`code`	`str`	required	`min_length=3`, `max_length=50`

Response: `RedemptionResponse`

Field	Type	Default	Description
`success`	`bool`	-	Whether redemption succeeded
`message`	`str`	-	Success/error message
`coupon_code`	`str \| None`	`None`	Redeemed code
`coupon_value`	`float \| None`	`None`	Dollar value applied
`previous_balance`	`float \| None`	`None`	Balance before
`new_balance`	`float \| None`	`None`	Balance after
`error_code`	`str \| None`	`None`	Error code if failed

Dependency Trace (3+ levels deep)

redeem_coupon_endpoint(request, redemption_request, user)
├── Depends(get_current_user)                   # (auth chain)
├── Extract client_host + user_agent from request
├── redeem_coupon(code, user_id, ip, ua)        # src/db/coupons.py:316
│   ├── get_supabase_client()
│   ├── Step 1: validate_coupon(code, user_id)  # src/db/coupons.py:251
│   │   ├── get_supabase_client()
│   │   └── client.rpc("is_coupon_redeemable",
│   │       {"p_coupon_code": code, "p_user_id": user_id})
│   │       ├── PostgreSQL function validates:
│   │       │   - Code exists, is_active, not expired
│   │       │   - User hasn't already redeemed
│   │       │   - max_uses not exceeded
│   │       │   - scope rules (user_specific assignment)
│   │       └── Returns: {is_valid, error_code, error_message, coupon_id, coupon_value}
│   ├── If not valid → return failure dict
│   ├── Step 2: Get user balance
│   │   └── SELECT credits FROM users WHERE id = user_id
│   ├── Step 3: Update user balance
│   │   └── UPDATE users SET credits = new_balance WHERE id = user_id
│   ├── Step 4: Increment coupon usage (two methods)
│   │   ├── SELECT times_used + manual UPDATE coupons (non-atomic)
│   │   └── client.rpc("increment", {"row_id": coupon_id, "x": 1})
│   ├── Step 5: Record redemption
│   │   └── INSERT INTO coupon_redemptions {coupon_id, user_id, value_applied,
│   │       user_balance_before, user_balance_after, ip_address, user_agent}
│   └── Return success dict
├── If result not success → JSONResponse(400)
└── If success → Return RedemptionResponse

Supabase Queries

Step	Operation	Table	Details
1	RPC	`is_coupon_redeemable`	Params: `p_coupon_code`, `p_user_id`
2	SELECT	`users`	Columns: `credits`, Filter: `.eq("id", user_id)`
3	UPDATE	`users`	Set: `credits=new_balance`, Filter: `.eq("id", user_id)`
4a	SELECT + UPDATE	`coupons`	Read `times_used`, then update (non-atomic)
4b	RPC	`increment`	Params: `row_id`, `x` (atomic increment)
5	INSERT	`coupon_redemptions`	All redemption fields

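Note: Step 4 mixes two increment strategies. The SELECT-then-UPDATE path is a non-atomic read-modify-write, so concurrent redemptions can overwrite each other's update. Because the atomic `increment` RPC also runs, a single redemption could increment times_used twice.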
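A toy in-memory model makes both problems in Step 4 concrete. A dict stands in for the `coupons` row. This is not the Supabase client code, and it assumes the `increment` RPC performs a single atomic `times_used = times_used + x` update. Starting from 5, two redemptions should leave times_used at 7.

```python
# Toy in-memory model of Step 4 (hypothetical; a dict stands in for the coupons row).


def redeem_as_documented(coupon: dict) -> None:
    """One redemption doing both increments, as the trace describes."""
    observed = coupon["times_used"]      # 4a: SELECT times_used FROM coupons
    coupon["times_used"] = observed + 1  # 4a: UPDATE coupons SET times_used = observed + 1
    coupon["times_used"] += 1            # 4b: rpc("increment", {"row_id": ..., "x": 1})


def two_concurrent_redemptions_as_documented(coupon: dict) -> None:
    """Interleave two requests to show the lost update on the read-modify-write path."""
    seen_by_a = coupon["times_used"]
    seen_by_b = coupon["times_used"]      # B reads before A writes
    coupon["times_used"] = seen_by_a + 1
    coupon["times_used"] = seen_by_b + 1  # B overwrites A's write (lost update)
    coupon["times_used"] += 1             # A's atomic increment
    coupon["times_used"] += 1             # B's atomic increment


def redeem_atomic_only(coupon: dict) -> None:
    """Keeping only the atomic increment gives exactly one increment per redemption."""
    coupon["times_used"] += 1
```

Sequential redemptions using the documented flow reach 9 (double counting), and the interleaved pair reaches 8 (double counting minus one lost update). Only the atomic-only variant lands on 7.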

Redis Operations

None directly.

Prometheus Metrics

None.

Error Handling

Error Path	Status Code	Detail
Auth failures	401/402/403/404/429	Various
Invalid code format	422	Pydantic validation (min/max length)
Coupon not redeemable (validation fails)	400	JSONResponse with result dict
User not found (during balance lookup)	400	Failure dict returned by `redeem_coupon`
Balance update fails	500	"Internal server error" (exception path)
Redemption record insert fails	200	Redemption still succeeds; the error is logged and only the audit record is lost
Any other exception	500	"Internal server error"

Mermaid Diagram

flowchart TD
    A[POST /coupons/redeem] --> B[Pydantic validation]
    B -->|Invalid code length| B1[422]
    B -->|Valid| C[get_current_user auth]
    C -->|Auth fail| C1[401/402/403/404/429]
    C -->|Success| D{try block}
    D --> E[Extract client_host + user_agent]
    E --> F[redeem_coupon]
    F --> G[validate_coupon via RPC]
    G --> H{is_valid?}
    H -->|No| I[Return 400 with error]
    H -->|Yes| J[SELECT user credits]
    J --> K{User found?}
    K -->|No| L[Return failure dict → 400]
    K -->|Yes| M[UPDATE users SET credits]
    M --> N{Update succeeded?}
    N -->|No| O[Raise exception → 500]
    N -->|Yes| P[Increment coupon times_used]
    P --> Q[INSERT coupon_redemptions]
    Q --> R{Insert succeeded?}
    R -->|No| S[Log error but continue]
    R -->|Yes| T[Return 200 RedemptionResponse]
    S --> T
    D -->|HTTPException| U[Re-raise]
    D -->|Other Exception| V[500 Internal server error]

Credits

6 endpoints

Issue: #1727

API Endpoint Documentation: GET /credits/summary

Overview

Handler: get_credits_summary_endpoint() in src/routes/credits.py (line 625)
Tags: ["credits", "admin"]
Authentication: Required - require_admin (admin role)

Pydantic Schemas

Query Parameters

Param	Type	Default	Description
`user_id`	`int \| None`	`None`	Filter by specific user
`from_date`	`str \| None`	`None`	Start date (YYYY-MM-DD)
`to_date`	`str \| None`	`None`	End date (YYYY-MM-DD)

Response: `dict[str, Any]`

User-specific response (when user_id is provided):

{
    "status": "success",
    "user_id": int,
    "user_info": {"id": int, "username": str, "credits": float},
    "current_balance": float,
    "summary": {  # from get_transaction_summary()
        "total_transactions": int,
        "total_credits_added": float,
        "total_credits_used": float,
        "net_change": float,
        "by_type": {type: {"count": int, "total_amount": float, "average_amount": float}},
        "daily_breakdown": [{"date": str, "credits_added": float, "credits_used": float, "count": int}],
        "largest_credit": {...} | None,
        "largest_charge": {...} | None,
        "average_transaction": float,
        "transaction_count_by_direction": {"credits": int, "charges": int}
    },
    "filters": {"from_date": str, "to_date": str},
    "timestamp": str
}

System-wide response (no user_id):

{
    "status": "success",
    "system_summary": {
        "total_users": int,
        "total_credits_in_system": float,
        "average_credits_per_user": float,
        "total_transactions": int,
        "total_credits_added": float,
        "total_credits_used": float,
        "net_change": float,
        "by_type": {type: {"count": int, "total_amount": float}}
    },
    "filters": {...},
    "timestamp": str
}

Dependency Trace (3+ levels deep)

get_credits_summary_endpoint(user_id, from_date, to_date, admin_user)
├── Depends(require_admin)                          # (admin auth chain)
├── get_supabase_client()
├── If user_id provided:
│   ├── get_transaction_summary(user_id, from_date, to_date)  # src/db/credit_transactions.py:492
│   │   ├── get_supabase_client()
│   │   ├── SELECT * FROM credit_transactions WHERE user_id = ?
│   │   │   + optional .gte("created_at", from_date)
│   │   │   + optional .lte("created_at", to_date)
│   │   └── Client-side aggregation:
│   │       ├── total_credits_added (positive amounts)
│   │       ├── total_credits_used (negative amounts)
│   │       ├── by_type breakdown
│   │       ├── daily_breakdown
│   │       ├── largest_credit / largest_charge
│   │       └── average_transaction
│   └── SELECT id, username, credits FROM users WHERE id = user_id
├── If no user_id (system-wide):
│   ├── SELECT id, credits FROM users (ALL users)
│   ├── SELECT transaction_type, amount FROM credit_transactions
│   │   + optional date filters
│   └── Client-side aggregation
└── Return dict response
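The client-side aggregation in `get_transaction_summary()` can be sketched as follows. This is a hypothetical re-creation that fills the documented summary keys. Three behaviors are assumptions rather than confirmed facts: `total_credits_used` is reported as a positive magnitude, `daily_breakdown` is keyed by the `YYYY-MM-DD` prefix of an ISO-8601 `created_at`, and `average_transaction` is the mean signed amount.

```python
# Hypothetical sketch of the client-side aggregation in get_transaction_summary().
from collections import defaultdict
from typing import Any


def summarize_transactions(transactions: list[dict[str, Any]]) -> dict[str, Any]:
    """Aggregate credit_transactions rows (amount > 0 = credit, amount < 0 = charge)."""
    credits = [t for t in transactions if t["amount"] > 0]
    charges = [t for t in transactions if t["amount"] < 0]
    total_added = sum(t["amount"] for t in credits)
    total_used = -sum(t["amount"] for t in charges)  # positive magnitude (assumption)

    by_type: dict[str, dict[str, Any]] = defaultdict(lambda: {"count": 0, "total_amount": 0.0})
    daily: dict[str, dict[str, Any]] = defaultdict(
        lambda: {"credits_added": 0.0, "credits_used": 0.0, "count": 0}
    )
    for t in transactions:
        bucket = by_type[t["transaction_type"]]
        bucket["count"] += 1
        bucket["total_amount"] += t["amount"]
        day = daily[t["created_at"][:10]]  # "YYYY-MM-DD" prefix of the ISO timestamp
        day["count"] += 1
        if t["amount"] > 0:
            day["credits_added"] += t["amount"]
        else:
            day["credits_used"] += -t["amount"]
    for bucket in by_type.values():
        bucket["average_amount"] = bucket["total_amount"] / bucket["count"]

    count = len(transactions)
    return {
        "total_transactions": count,
        "total_credits_added": total_added,
        "total_credits_used": total_used,
        "net_change": total_added - total_used,
        "by_type": dict(by_type),
        "daily_breakdown": [{"date": d, **v} for d, v in sorted(daily.items())],
        "largest_credit": max(credits, key=lambda t: t["amount"], default=None),
        "largest_charge": min(charges, key=lambda t: t["amount"], default=None),  # most negative
        "average_transaction": sum(t["amount"] for t in transactions) / count if count else 0.0,
        "transaction_count_by_direction": {"credits": len(credits), "charges": len(charges)},
    }
```

All of this runs in the API process after every matching row has been fetched, which is the reason behind the performance warning for the system-wide path.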

Supabase Queries

User-specific path

Operation	Table	Columns	Filters
SELECT	`credit_transactions`	`*`	`.eq("user_id", user_id)` + optional date range
SELECT	`users`	`id, username, credits`	`.eq("id", user_id)`

System-wide path

Operation	Table	Columns	Filters
SELECT	`users`	`id, credits`	None (all users)
SELECT	`credit_transactions`	`transaction_type, amount`	Optional date range

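Performance Warning: The system-wide path fetches ALL users and ALL matching transactions and aggregates them client-side. If the project keeps Supabase's default API row limit (1,000 rows per request), these unpaginated selects may also be truncated silently, skewing the totals.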

Redis Operations

None.

Prometheus Metrics

None.

Error Handling

Error Path	Status Code	Detail
Auth/admin failures	401/402/403/404/429	Various
Any exception	500	"Failed to get credits summary"

Mermaid Diagram

flowchart TD
    A[GET /credits/summary] --> B[require_admin]
    B -->|Not admin| B1[403]
    B -->|Admin| C{user_id provided?}
    C -->|Yes| D[get_transaction_summary for user]
    D --> E[SELECT credit_transactions WHERE user_id]
    E --> F[Client-side aggregation]
    F --> G[SELECT user info from users]
    G --> H[Return user-specific summary]
    C -->|No| I[SELECT all users with credits]
    I --> J[SELECT all credit_transactions]
    J --> K[Client-side aggregation]
    K --> L[Calculate totals, averages, by_type]
    L --> M[Return system-wide summary]
    C -->|Exception| N[500 Failed to get credits summary]

Issue: #1728

API Endpoint Documentation: GET /credits/transactions

Overview

Handler: get_credits_transactions_endpoint() in src/routes/credits.py (line 741)
Tags: ["credits", "admin"]
Authentication: Required - require_admin (admin role)

Pydantic Schemas

Query Parameters

Param	Type	Default	Validation	Description
`limit`	`int`	`50`	`ge=1, le=1000`	Max transactions to return
`offset`	`int`	`0`	`ge=0`	Pagination offset
`user_id`	`int \| None`	`None`	-	Filter by user
`transaction_type`	`str \| None`	`None`	-	Filter by type (trial, purchase, api_usage, admin_credit, admin_debit, refund, bonus, transfer)
`from_date`	`str \| None`	`None`	-	Start date (YYYY-MM-DD or ISO)
`to_date`	`str \| None`	`None`	-	End date (YYYY-MM-DD or ISO)
`min_amount`	`float \| None`	`None`	-	Min absolute amount
`max_amount`	`float \| None`	`None`	-	Max absolute amount
`direction`	`str \| None`	`None`	Must be "credit" or "charge"	Filter positive/negative
`sort_by`	`str`	`"created_at"`	Must be "created_at", "amount", or "transaction_type"	Sort field
`sort_order`	`str`	`"desc"`	Must be "asc" or "desc"	Sort order

Response: `dict[str, Any]`

{
    "status": "success",
    "transactions": [{
        "id": int, "user_id": int, "amount": float,
        "transaction_type": str, "description": str,
        "balance_before": float, "balance_after": float,
        "created_at": str, "payment_id": int | None,
        "metadata": dict, "created_by": str | None
    }],
    "pagination": {
        "total": int,      # Count of returned items (not DB total)
        "limit": int, "offset": int, "has_more": bool
    },
    "filters_applied": {all filter values},
    "timestamp": str
}

Dependency Trace (3+ levels deep)

get_credits_transactions_endpoint(...)
├── Depends(require_admin)
├── Validate direction, sort_by, sort_order
├── get_all_transactions(limit+1, ...)        # src/db/credit_transactions.py:290
│   ├── get_supabase_client()
│   ├── Build query: SELECT * FROM credit_transactions
│   │   + optional .eq("user_id", user_id)
│   │   + optional .eq("transaction_type", type)
│   │   + optional .gte("created_at", from_date)
│   │   + optional .lte("created_at", to_date)
│   │   + optional .gt("amount", 0) for "credit" direction
│   │   + optional .lt("amount", 0) for "charge" direction
│   │   + .order(sort_by, desc=desc_order)
│   ├── If min_amount/max_amount:
│   │   └── Fetch ALL, filter client-side by abs(amount), then paginate
│   └── Else:
│       └── .range(offset, offset+limit-1) — DB-side pagination
├── has_more = len(results) > limit
├── Trim to limit
├── Format transactions list
└── Return response dict

Supabase Queries

Operation	Table	Columns	Filters	Pagination
SELECT	`credit_transactions`	`*`	user_id, transaction_type, date range, direction	DB-side `.range()` OR client-side (if amount filters)

Performance Note: When min_amount or max_amount are used, the query fetches ALL matching rows and filters client-side, then applies pagination. This can be slow for large datasets.

has_more detection: The handler asks get_all_transactions for limit + 1 rows. If more than limit rows come back, has_more is True and the extra row is trimmed before the response is built.
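Both pagination paths and the limit + 1 sentinel can be shown on an in-memory list. In this sketch the list slice stands in for `.range(offset, offset + fetch - 1)`, and the rows are assumed to be already filtered and sorted by the other query parameters. The function name is hypothetical.

```python
# Hypothetical sketch of the two pagination paths and the limit + 1 has_more sentinel.
from typing import Optional


def paginate_transactions(
    rows: list,
    offset: int,
    limit: int,
    min_amount: Optional[float] = None,
    max_amount: Optional[float] = None,
) -> tuple:
    """Return (page, has_more) for rows already filtered and sorted by the query."""
    if min_amount is not None or max_amount is not None:
        # Amount filters use abs(amount), so every matching row is filtered client-side first.
        rows = [
            r for r in rows
            if (min_amount is None or abs(r["amount"]) >= min_amount)
            and (max_amount is None or abs(r["amount"]) <= max_amount)
        ]
    fetch = limit + 1                     # one extra row signals that another page exists
    window = rows[offset : offset + fetch]  # stands in for .range(offset, offset + fetch - 1)
    has_more = len(window) > limit
    return window[:limit], has_more
```

With five rows, `limit=2` and `offset=0` returns two rows and `has_more=True`. With `offset=3`, it returns the last two rows and `has_more=False`.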

Redis Operations

None.

Prometheus Metrics

None.

Error Handling

Error Path	Status Code	Detail
Auth/admin failures	401/402/403/404/429	Various
Invalid direction	400	"direction must be 'credit' or 'charge'"
Invalid sort_by	400	"sort_by must be 'created_at', 'amount', or 'transaction_type'"
Invalid sort_order	400	"sort_order must be 'asc' or 'desc'"
Any exception	500	"Failed to get credit transactions"

Mermaid Diagram

flowchart TD
    A[GET /credits/transactions] --> B[require_admin]
    B -->|Not admin| B1[403]
    B -->|Admin| C[Validate direction, sort_by, sort_order]
    C -->|Invalid| C1[400 validation error]
    C -->|Valid| D[get_all_transactions with limit+1]
    D --> E[Build Supabase query with filters]
    E --> F{min/max amount filters?}
    F -->|Yes| G[Fetch ALL rows]
    G --> H[Filter client-side by abs amount]
    H --> I[Apply offset + limit pagination]
    F -->|No| J[DB-side .range pagination]
    I --> K[Determine has_more]
    J --> K
    K --> L[Trim to limit]
    L --> M[Format transaction dicts]
    M --> N[Return response with pagination]
    D -->|Exception| O[500 Failed to get transactions]

Issue: #1729

API Endpoint Documentation: POST /credits/add

Overview

Handler: add_credits_endpoint() in src/routes/credits.py (line 189)
Tags: ["credits", "admin"]
Authentication: Required - require_admin (admin role)

Pydantic Schemas

Request: `CreditAddRequest`

Field	Type	Default	Validation
`user_id`	`int`	required	Target user ID
`amount`	`float`	required	`gt=0` (must be positive)
`reason`	`str`	required	`min_length=10`
`description`	`str`	`"Admin credit addition"`	-
`metadata`	`dict[str, Any] \| None`	`None`	Optional

Response: `CreditResponse`

Field	Type	Default
`status`	`str`	-
`message`	`str`	-
`user_id`	`int`	-
`previous_balance`	`float`	-
`new_balance`	`float`	-
`amount_changed`	`float`	-
`transaction_id`	`int \| None`	`None`
`timestamp`	`str`	-

Dependency Trace (3+ levels deep)

add_credits_endpoint(request, admin_user)
├── Depends(require_admin)                         # (admin auth chain)
├── _validate_admin_credit_grant(amount, admin)    # src/routes/credits.py:134
│   ├── Check amount <= Config.ADMIN_MAX_CREDIT_GRANT (default $1000)
│   │   └── If exceeded → 400
│   ├── get_admin_daily_grant_total(admin_id)      # src/db/credit_transactions.py:672
│   │   ├── get_supabase_client()
│   │   ├── SELECT amount FROM credit_transactions
│   │   │   WHERE transaction_type = 'admin_credit'
│   │   │   AND created_by = 'admin:{id}'
│   │   │   AND created_at >= 24h_ago
│   │   │   AND amount > 0
│   │   └── Sum amounts; on error → return inf (fail closed)
│   └── Check daily_total + amount <= ADMIN_DAILY_GRANT_LIMIT (default $5000)
│       └── If exceeded → 400
├── get_supabase_client()
├── SELECT id, credits FROM users WHERE id = user_id
│   └── If not found → 404
├── Calculate balance_after = balance_before + amount
├── UPDATE users SET credits = balance_after, updated_at = now()
│     WHERE id = user_id
│   └── If no data returned → 500
├── log_credit_transaction(...)                    # src/db/credit_transactions.py:68
│   ├── execute_with_retry(do_insert, max_retries=2)
│   │   ├── get_supabase_client()
│   │   └── INSERT INTO credit_transactions
│   │       {user_id, amount, transaction_type='admin_credit',
│   │        description, balance_before, balance_after,
│   │        metadata={...reason, admin_user_id, admin_username},
│   │        created_by='admin:{id}', created_at}
│   ├── On connection error → refresh_supabase_client() and retry
│   └── On final failure → capture_database_error (Sentry)
└── Return CreditResponse
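The retry behavior in `log_credit_transaction` can be sketched as a small generic helper. Several details are assumptions. `max_retries` is read as the number of extra attempts after the first, and `ConnectionError` stands in for whichever transport errors the real helper treats as retryable. The `on_connection_error` callback plays the role of `refresh_supabase_client()`. The name `execute_with_retry` is reused for illustration only, and this is not its real signature.

```python
# Hypothetical sketch of execute_with_retry(); signature and retryable errors are assumptions.
import logging
from typing import Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def execute_with_retry(
    operation: Callable[[], T],
    max_retries: int = 2,
    on_connection_error: Callable[[], None] = lambda: None,
) -> T:
    """Run `operation`, retrying up to `max_retries` extra times on ConnectionError."""
    attempt = 0
    while True:
        try:
            return operation()
        except ConnectionError:
            if attempt >= max_retries:
                raise  # final failure: the caller reports it (e.g. to Sentry)
            attempt += 1
            logger.warning("Connection error; refreshing client (retry %d/%d)", attempt, max_retries)
            on_connection_error()  # e.g. rebuild the HTTP connection pool
```

Presumably the caller catches the final exception, reports it via `capture_database_error`, and returns no transaction ID. That would explain why a failed audit insert still yields a success response with `transaction_id: None`.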

Supabase Queries

Step	Operation	Table	Columns	Filters
Safety check	SELECT	`credit_transactions`	`amount`	`transaction_type='admin_credit'`, `created_by='admin:{id}'`, `created_at >= 24h_ago`, `amount > 0`
Get user	SELECT	`users`	`id, credits`	`.eq("id", user_id)`
Update balance	UPDATE	`users`	`credits, updated_at`	`.eq("id", user_id)`
Audit trail	INSERT	`credit_transactions`	All fields	With retry logic

Redis Operations

None. (On connection errors, execute_with_retry may call refresh_supabase_client(), which resets the Supabase HTTP connection pool; this does not involve Redis.)

Prometheus Metrics

None emitted directly. The capture_database_error Sentry call is the observability hook.

Admin Safety Controls

Control	Config Var	Default	Behavior
Per-transaction cap	`ADMIN_MAX_CREDIT_GRANT`	`$1000`	400 if exceeded
24-hour rolling limit	`ADMIN_DAILY_GRANT_LIMIT`	`$5000`	400 if cumulative exceeds
Audit trail	-	Always	Transaction logged with admin ID, reason
Fail-closed	-	On query error	`get_admin_daily_grant_total` returns `inf`
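The two limits and the fail-closed rule can be expressed as one check over in-memory rows. In this sketch a list of dicts stands in for the `credit_transactions` query, and passing `None` simulates a failed query. The function names are hypothetical. The function returns an error message, which the route would turn into a 400, instead of raising. The dollar defaults are the documented ones.

```python
# Hypothetical sketch of _validate_admin_credit_grant() and get_admin_daily_grant_total().
import math
from datetime import datetime, timedelta
from typing import Iterable, Optional

ADMIN_MAX_CREDIT_GRANT = 1000.0   # documented default
ADMIN_DAILY_GRANT_LIMIT = 5000.0  # documented default


def admin_daily_grant_total(rows: Optional[Iterable[dict]], admin_id: int, now: datetime) -> float:
    """Sum this admin's positive admin_credit grants in the rolling 24-hour window."""
    since = now - timedelta(hours=24)
    try:
        return sum(
            r["amount"]
            for r in rows  # rows=None raises TypeError here, simulating a failed query
            if r["transaction_type"] == "admin_credit"
            and r["created_by"] == f"admin:{admin_id}"
            and r["created_at"] >= since
            and r["amount"] > 0
        )
    except Exception:
        return math.inf  # fail closed: an unknown total blocks every grant


def check_admin_credit_grant(
    amount: float,
    rows: Optional[Iterable[dict]],
    admin_id: int,
    now: datetime,
    max_grant: float = ADMIN_MAX_CREDIT_GRANT,
    daily_limit: float = ADMIN_DAILY_GRANT_LIMIT,
) -> Optional[str]:
    """Return an error message (the route would raise 400) or None if the grant is allowed."""
    if amount > max_grant:
        return f"Amount ${amount:.2f} exceeds the per-transaction cap of ${max_grant:.2f}"
    daily_total = admin_daily_grant_total(rows, admin_id, now)
    if daily_total + amount > daily_limit:
        remaining = max(0.0, daily_limit - daily_total)
        return f"Grant would exceed the 24-hour limit of ${daily_limit:.2f} (remaining: ${remaining:.2f})"
    return None
```

Returning `inf` on error means `daily_total + amount` always exceeds the limit. A database outage therefore blocks grants instead of silently lifting the cap.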

Error Handling

Error Path	Status Code	Detail
Auth/admin failures	401/402/403/404/429	Various
Pydantic validation (amount<=0, reason<10 chars)	422	Automatic
Amount exceeds per-transaction cap	400	Detailed message with limit info
Would exceed daily grant limit	400	Detailed message with remaining budget
User not found	404	"User {id} not found"
Balance update fails	500	"Failed to update user credits"
Transaction log fails	200	Credits are still applied; the response carries `transaction_id: None`
Any other exception	500	"Failed to add credits"

Mermaid Diagram

flowchart TD
    A[POST /credits/add] --> B[Pydantic validation]
    B -->|Invalid| B1[422]
    B -->|Valid| C[require_admin]
    C -->|Not admin| C1[403]
    C -->|Admin| D[_validate_admin_credit_grant]
    D --> E{Amount <= max single grant?}
    E -->|No| E1[400 Exceeds per-transaction cap]
    E -->|Yes| F[Query 24h admin grant total]
    F --> G{daily_total + amount <= daily limit?}
    G -->|No| G1[400 Exceeds daily limit]
    G -->|Yes| H[SELECT user by ID]
    H -->|Not found| H1[404]
    H -->|Found| I[Calculate new balance]
    I --> J[UPDATE users credits]
    J -->|Fails| J1[500]
    J -->|Success| K[log_credit_transaction with retry]
    K --> L[Return CreditResponse]

Issue: #1730

API Endpoint Documentation: POST /credits/adjust

Overview

Handler: adjust_credits_endpoint() in src/routes/credits.py (line 279)
Tags: ["credits", "admin"]
Authentication: Required - require_admin (admin role)

Pydantic Schemas

Request: `CreditAdjustRequest`

Field	Type	Default	Validation
`user_id`	`int`	required	Target user ID
`amount`	`float`	required	Can be positive (add) or negative (remove)
`description`	`str`	`"Admin credit adjustment"`	-
`reason`	`str`	required	`min_length=10`
`metadata`	`dict[str, Any] \| None`	`None`	Optional

Response: `CreditResponse` (same as #1729)

Dependency Trace (3+ levels deep)

adjust_credits_endpoint(request, admin_user)
├── Depends(require_admin)
├── If amount > 0:
│   └── _validate_admin_credit_grant(amount, admin)  # Same safety checks as /add
│       ├── Per-transaction cap check
│       └── 24h rolling limit check
├── get_supabase_client()
├── SELECT id, credits FROM users WHERE id = user_id
│   └── If not found → 404
├── Calculate balance_after = balance_before + amount
│   └── If balance_after < 0 → 400 "negative balance"
├── UPDATE users SET credits = balance_after, updated_at = now()
├── Determine transaction_type:
│   ├── amount > 0 → TransactionType.ADMIN_CREDIT
│   └── amount <= 0 → TransactionType.ADMIN_DEBIT
├── log_credit_transaction(...)
│   └── INSERT INTO credit_transactions (with retry)
└── Return CreditResponse

Supabase Queries

Step	Operation	Table	Columns	Filters
Safety (positive only)	SELECT	`credit_transactions`	`amount`	admin daily grants
Get user	SELECT	`users`	`id, credits`	`.eq("id", user_id)`
Update balance	UPDATE	`users`	`credits, updated_at`	`.eq("id", user_id)`
Audit trail	INSERT	`credit_transactions`	All fields	With retry

Redis Operations

None.

Prometheus Metrics

None.

Key Differences from POST /credits/add

Amount can be negative -- allows credit removal
Negative balance protection -- raises 400 if adjustment would go below 0
Transaction type varies -- admin_credit for positive, admin_debit for negative
Safety controls only for positive -- _validate_admin_credit_grant skipped for debits
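The balance guard and transaction-type choice that set /adjust apart can be sketched as one pure function. This is hypothetical: the grant-limit check for positive amounts and the database writes are omitted. The enum values `admin_credit` and `admin_debit` come from the transaction types listed for GET /credits/transactions. A zero amount falls into ADMIN_DEBIT, as the trace's `amount <= 0` branch implies.

```python
# Hypothetical sketch of the /credits/adjust balance guard and transaction-type choice.
from enum import Enum


class TransactionType(str, Enum):
    ADMIN_CREDIT = "admin_credit"
    ADMIN_DEBIT = "admin_debit"


class NegativeBalanceError(ValueError):
    """Raised where the route would return 400 for a negative resulting balance."""


def plan_adjustment(balance_before: float, amount: float) -> tuple:
    """Return (balance_after, transaction_type) or raise if the balance would go negative."""
    balance_after = balance_before + amount
    if balance_after < 0:
        raise NegativeBalanceError(
            f"Adjustment of {amount} would make the balance negative (current balance: {balance_before})"
        )
    # amount > 0 is a credit; zero or negative amounts are logged as debits.
    tx_type = TransactionType.ADMIN_CREDIT if amount > 0 else TransactionType.ADMIN_DEBIT
    return balance_after, tx_type
```

For example, `plan_adjustment(10.0, -10.0)` returns `(0.0, TransactionType.ADMIN_DEBIT)`, while `-10.01` raises.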

Error Handling

Error Path	Status Code	Detail
Auth/admin failures	401/402/403/404/429	Various
Pydantic validation (reason < 10 chars)	422	Automatic
Positive amount exceeds cap/daily limit	400	Detailed limit message
User not found	404	"User {id} not found"
Would result in negative balance	400	Shows current balance + adjustment
Balance update fails	500	"Failed to update user credits"
Any other exception	500	"Failed to adjust credits"

Mermaid Diagram

flowchart TD
    A[POST /credits/adjust] --> B[Pydantic validation]
    B -->|Invalid| B1[422]
    B -->|Valid| C[require_admin]
    C -->|Not admin| C1[403]
    C -->|Admin| D{amount > 0?}
    D -->|Yes| E[_validate_admin_credit_grant]
    E -->|Exceeds cap| E1[400]
    E -->|OK| F[SELECT user]
    D -->|No/Zero| F
    F -->|Not found| F1[404]
    F -->|Found| G[Calculate new balance]
    G --> H{balance_after < 0?}
    H -->|Yes| H1[400 Negative balance]
    H -->|No| I[UPDATE users credits]
    I -->|Fails| I1[500]
    I -->|Success| J{amount > 0?}
    J -->|Yes| K[type = ADMIN_CREDIT]
    J -->|No| L[type = ADMIN_DEBIT]
    K --> M[log_credit_transaction]
    L --> M
    M --> N[Return CreditResponse]

Issue: #1731

API Endpoint Documentation: POST /credits/bulk-add

Overview

Handler: bulk_add_credits_endpoint() in src/routes/credits.py (line 385)
Tags: ["credits", "admin"]
Authentication: Required - require_admin (admin role)

Pydantic Schemas

Request: `BulkCreditAddRequest`

Field	Type	Default	Validation
`user_ids`	`list[int]`	required	`min_length=1, max_length=100`
`amount`	`float`	required	`gt=0`
`reason`	`str`	required	`min_length=10`
`description`	`str`	`"Bulk credit addition"`	Optional
`metadata`	`dict[str, Any] \| None`	`None`	Optional

Response: `BulkCreditResponse`

Field	Type	Description
`status`	`str`	"success", "partial", or "failed"
`message`	`str`	Summary message
`total_users`	`int`	Count of unique users processed
`successful`	`int`	Users successfully credited
`failed`	`int`	Users that failed
`amount_per_user`	`float`	Amount per user
`total_credits_added`	`float`	amount * successful
`results`	`list[dict]`	Per-user result details
`timestamp`	`str`	ISO timestamp

Dependency Trace (3+ levels deep)

bulk_add_credits_endpoint(request, admin_user)
├── Depends(require_admin)
├── Deduplicate user_ids → unique_user_ids
├── _validate_admin_credit_grant(amount, admin,
│     is_bulk=True, bulk_user_count=len(unique))     # Safety controls
│   ├── Per-transaction cap: amount <= ADMIN_MAX_CREDIT_GRANT
│   └── Daily limit: daily_total + (amount * user_count) <= ADMIN_DAILY_GRANT_LIMIT
├── get_supabase_client()
├── Batch fetch: SELECT id, credits, username FROM users
│     WHERE id IN (unique_user_ids)                   # Single query for all users
├── For each unique user_id:
│   ├── Look up user from batch results
│   ├── If not found → record failure, continue
│   ├── Calculate balance_after
│   ├── UPDATE users SET credits, updated_at WHERE id
│   │   └── If fails → record failure, continue
│   ├── log_credit_transaction(type=ADMIN_CREDIT)
│   │   └── metadata includes bulk_operation=True
│   └── Record success with details
├── Determine status:
│   ├── failed=0 → "success"
│   ├── successful>0 && failed>0 → "partial"
│   └── successful=0 → "failed"
└── Return BulkCreditResponse
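The deduplication and overall-status rules from the trace can be sketched without the database. This is a minimal sketch under stated assumptions: `dedupe_user_ids`, `bulk_status`, and `summarize_bulk` are hypothetical helpers, order-preserving deduplication is an assumption (the trace only says user_ids are deduplicated), and `failed_ids` stands in for users that were missing from the batch SELECT or whose UPDATE failed.

```python
def dedupe_user_ids(user_ids: list[int]) -> list[int]:
    # dict.fromkeys keeps the first occurrence of each ID, in order
    return list(dict.fromkeys(user_ids))


def bulk_status(successful: int, failed: int) -> str:
    # Checks mirror the trace: no failures -> success, mixed -> partial
    if failed == 0:
        return "success"
    if successful > 0:
        return "partial"
    return "failed"


def summarize_bulk(user_ids: list[int], amount: float, failed_ids: set[int]) -> dict:
    unique_ids = dedupe_user_ids(user_ids)
    failed = sum(1 for uid in unique_ids if uid in failed_ids)
    successful = len(unique_ids) - failed
    return {
        "status": bulk_status(successful, failed),
        "total_users": len(unique_ids),
        "successful": successful,
        "failed": failed,
        "amount_per_user": amount,
        "total_credits_added": amount * successful,
    }
```

Submitting user_ids [1, 2, 2, 3] with user 3 missing counts three unique users and two successes, so the status is "partial" and total_credits_added is twice the per-user amount.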

Supabase Queries

Step	Operation	Table	Columns	Filters
Safety	SELECT	`credit_transactions`	`amount`	Admin daily grant query
Batch fetch	SELECT	`users`	`id, credits, username`	`.in_("id", unique_user_ids)`
Per-user update	UPDATE	`users`	`credits, updated_at`	`.eq("id", user_id)` — N queries
Per-user audit	INSERT	`credit_transactions`	All fields	N queries with retry

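Performance Note: Users are fetched with one batched SELECT, but balance updates and audit inserts run once per user: 2N queries for N unique users (up to 200 at the 100-user maximum), in addition to the safety-check SELECT and the batch SELECT.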

Redis Operations

None.

Prometheus Metrics

None.

Admin Safety Controls

Same as /credits/add but with bulk awareness:

Per-transaction cap applies to the per-user amount, not to the total across all users
Daily limit check uses amount * unique_user_count (counted after deduplication) as the total grant amount
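The bulk-aware arithmetic of the two limits can be sketched as follows. The function name `check_bulk_grant` is hypothetical, the caps are passed in explicitly instead of being read from ADMIN_MAX_CREDIT_GRANT and ADMIN_DAILY_GRANT_LIMIT (their defaults are not documented here), and `daily_total` is assumed to be the admin's grant sum over the last 24 hours from the credit_transactions SELECT. A ValueError stands in for the HTTP 400.

```python
def check_bulk_grant(
    amount: float,
    user_count: int,
    daily_total: float,
    max_grant: float,
    daily_limit: float,
) -> float:
    """Return the total grant if both limits allow it, else raise ValueError."""
    # The per-transaction cap is compared with the per-user amount only
    if amount > max_grant:
        raise ValueError(f"Amount {amount} exceeds per-transaction cap {max_grant}")
    # The daily limit is compared with the whole bulk grant
    total_grant = amount * user_count
    if daily_total + total_grant > daily_limit:
        remaining = max(daily_limit - daily_total, 0)
        raise ValueError(
            f"Grant of {total_grant} ({amount} x {user_count} users) exceeds "
            f"daily limit; remaining budget is {remaining}"
        )
    return total_grant
```

With illustrative caps of 100 per transaction and 1000 per day, granting 50 credits to 10 unique users passes (500 total) on a fresh day but fails once 600 has already been granted, even though each per-user grant is under the per-transaction cap.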

Error Handling

Error Path	Status Code	Detail
Auth/admin failures	401/402/403/404/429	Various
Pydantic: empty user_ids, >100, amount<=0, reason<10	422	Automatic
Amount exceeds per-transaction cap	400	Detailed message
Total grant exceeds daily limit	400	Detailed message with remaining budget
Individual user not found	Continues	Recorded in results as "failed"
Individual update fails	Continues	Recorded in results as "failed"
Individual exception	Continues	Logged, recorded as "failed"
Any outer exception	500	"Failed to perform bulk credit addition"

Mermaid Diagram

flowchart TD
    A[POST /credits/bulk-add] --> B[Pydantic validation]
    B -->|Invalid| B1[422]
    B -->|Valid| C[require_admin]
    C -->|Not admin| C1[403]
    C -->|Admin| D[Deduplicate user_ids]
    D --> E[_validate_admin_credit_grant with bulk=True]
    E -->|Exceeds limits| E1[400]
    E -->|OK| F[Batch SELECT users by IDs]
    F --> G[For each unique user]
    G --> H{User found in batch?}
    H -->|No| I[Record failure, continue]
    H -->|Yes| J[Calculate new balance]
    J --> K[UPDATE user credits]
    K -->|Fails| L[Record failure, continue]
    K -->|Success| M[log_credit_transaction]
    M --> N[Record success]
    I --> O{More users?}
    L --> O
    N --> O
    O -->|Yes| G
    O -->|No| P[Determine overall status]
    P --> Q[Return BulkCreditResponse]

Issue: #1732

API Endpoint Documentation: POST /credits/refund

Overview

Handler: refund_credits_endpoint() in src/routes/credits.py (line 536)
Tags: ["credits", "admin"]
Authentication: Required - require_admin (admin role)

Pydantic Schemas

Request: `CreditRefundRequest`

Field	Type	Default	Validation
`user_id`	`int`	required	Target user ID
`amount`	`float`	required	`gt=0` (must be positive)
`original_transaction_id`	`int \| None`	`None`	Optional reference
`reason`	`str`	`"Refund"`	Reason for refund
`metadata`	`dict[str, Any] \| None`	`None`	Optional

Response: `CreditResponse` (same as #1729)

Dependency Trace (3+ levels deep)

refund_credits_endpoint(request, admin_user)
├── Depends(require_admin)
├── get_supabase_client()
├── SELECT id, credits FROM users WHERE id = user_id
│   └── If not found → 404
├── Calculate balance_after = balance_before + amount
├── UPDATE users SET credits = balance_after, updated_at = now()
│     WHERE id = user_id
│   └── If fails → 500
├── log_credit_transaction(                       # src/db/credit_transactions.py:68
│     user_id, amount, type=REFUND,
│     description="Refund: {reason}",
│     metadata={reason, original_transaction_id,
│               admin_user_id, admin_username},
│     created_by="admin:{id}")
│   └── execute_with_retry → INSERT INTO credit_transactions
└── Return CreditResponse
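The audit-log call in the trace can be sketched as a payload builder plus a wrapper reflecting the "Transaction log fails → Continues" row of the error table below. This is a minimal sketch: `build_refund_log`, `log_refund_best_effort`, and the payload key names are hypothetical (the real log_credit_transaction signature is only partially shown above), and whether the request's own metadata is merged into the logged metadata is not documented, so it is left out.

```python
from typing import Any, Callable, Optional


def build_refund_log(
    user_id: int,
    amount: float,
    reason: str,
    original_transaction_id: Optional[int],
    admin_user_id: int,
    admin_username: str,
) -> dict[str, Any]:
    """Assemble the fields the trace shows being passed to log_credit_transaction."""
    return {
        "user_id": user_id,
        "amount": amount,
        "transaction_type": "refund",  # hypothetical string for TransactionType.REFUND
        "description": f"Refund: {reason}",
        "metadata": {
            "reason": reason,
            "original_transaction_id": original_transaction_id,
            "admin_user_id": admin_user_id,
            "admin_username": admin_username,
        },
        "created_by": f"admin:{admin_user_id}",
    }


def log_refund_best_effort(
    log_fn: Callable[..., Optional[int]], payload: dict[str, Any]
) -> Optional[int]:
    # A logging failure must not undo the balance update, so swallow it
    # and report transaction_id=None, as the error table describes
    try:
        return log_fn(**payload)
    except Exception:
        return None
```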

Supabase Queries

Step	Operation	Table	Columns	Filters
Get user	SELECT	`users`	`id, credits`	`.eq("id", user_id)`
Update balance	UPDATE	`users`	`credits, updated_at`	`.eq("id", user_id)`
Audit trail	INSERT	`credit_transactions`	All fields	With retry logic

Redis Operations

None.

Prometheus Metrics

None.

Key Differences from POST /credits/add

No admin safety controls -- _validate_admin_credit_grant is NOT called for refunds
Transaction type is REFUND -- not ADMIN_CREDIT
Description prefixed with "Refund: "
Tracks original_transaction_id in metadata for linking to original charge
reason field has a default ("Refund") -- optional here, unlike /add, where it is required

Error Handling

Error Path	Status Code	Detail
Auth/admin failures	401/402/403/404/429	Various
Pydantic validation (amount<=0)	422	Automatic
User not found	404	"User {id} not found"
Balance update fails	500	"Failed to update user credits"
Transaction log fails	Continues	Returns `transaction_id: None`
Any other exception	500	"Failed to refund credits"

Mermaid Diagram

flowchart TD
    A[POST /credits/refund] --> B[Pydantic validation]
    B -->|Invalid| B1[422]
    B -->|Valid| C[require_admin]
    C -->|Not admin| C1[403]
    C -->|Admin| D{try block}
    D --> E[SELECT user by ID]
    E -->|Not found| E1[404]
    E -->|Found| F[Calculate balance_after = before + amount]
    F --> G[UPDATE users credits]
    G -->|Fails| G1[500]
    G -->|Success| H[log_credit_transaction type=REFUND]
    H --> I[Return CreditResponse]
    D -->|HTTPException| J[Re-raise]
    D -->|Other| K[500 Failed to refund credits]

Note: Unlike /credits/add, refunds bypass the admin grant safety controls (ADMIN_MAX_CREDIT_GRANT and ADMIN_DAILY_GRANT_LIMIT). This is by design -- refunds reverse existing charges.

Diagnostics

2 endpoints

Issue: #1734

API Endpoint Documentation: GET /api/diagnostics/concurrency

Handler: `get_concurrency_stats()` in `src/routes/diagnostics.py`

1. Overview

Returns real-time concurrency gate statistics including active requests, queued requests, utilization percentages, and overall health status. Designed for diagnosing 503 Service Unavailable errors caused by server capacity exhaustion.

Router: APIRouter(prefix="/api/diagnostics", tags=["diagnostics"])
Auth: None (public endpoint)
HTTP Method: GET
Return Type: dict[str, Any]

2. Request

No query parameters, path parameters, or request body.

3. Response

Success (200)

{
  "active_requests": 5,
  "queued_requests": 2,
  "concurrency_limit": 20,
  "queue_size_limit": 50,
  "queue_timeout_seconds": 10.0,
  "utilization_percent": 25.0,
  "queue_utilization_percent": 4.0,
  "status": "healthy",
  "available_slots": 15,
  "available_queue_slots": 48
}

Error (200 with error payload - no HTTPException raised)

{
  "error": "error message string",
  "status": "unknown",
  "concurrency_limit": 20,
  "queue_size_limit": 50
}

4. Dependency Trace (3+ levels deep)

Level 1: Route Handler

get_concurrency_stats() in src/routes/diagnostics.py (line 19-90)

Level 2: Imported Dependencies

concurrency_active / concurrency_queued from src/middleware/concurrency_middleware.py (lazy import at line 44-47)
Config.CONCURRENCY_LIMIT, Config.CONCURRENCY_QUEUE_SIZE, Config.CONCURRENCY_QUEUE_TIMEOUT from src/config/config.py

Level 3: Prometheus Metrics (concurrency_middleware.py)

concurrency_active = Gauge("concurrency_active_requests", "Number of requests currently being processed") (line 27-30)
concurrency_queued = Gauge("concurrency_queued_requests", "Number of requests waiting in the admission queue") (line 31-34)
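Values are read via ._value._value, a private attribute of prometheus_client's Gauge rather than part of its public API, so this access may break across prometheus_client versions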

Level 4: Config Values (config.py)

CONCURRENCY_LIMIT = int(os.environ.get("CONCURRENCY_LIMIT", "20")) (line 437)
CONCURRENCY_QUEUE_SIZE = int(os.environ.get("CONCURRENCY_QUEUE_SIZE", "50")) (line 438)
CONCURRENCY_QUEUE_TIMEOUT = float(os.environ.get("CONCURRENCY_QUEUE_TIMEOUT", "10.0")) (line 439)

5. Supabase Queries

None.

6. Redis Operations

None.

7. Prometheus Metrics Read

Metric Name	Type	Labels	Description
`concurrency_active_requests`	Gauge	none	Current requests being processed
`concurrency_queued_requests`	Gauge	none	Current requests waiting in queue

Note: This endpoint reads these metrics. They are written by ConcurrencyMiddleware in src/middleware/concurrency_middleware.py.

Additionally, the middleware defines:

Metric Name	Type	Labels	Description
`concurrency_rejected_total`	Counter	`reason` (queue_full, queue_timeout)	Total rejected requests

8. Pydantic Schemas

None. Return type is dict[str, Any].

9. Middleware Effects

Standard middleware pipeline applies (sentry, observability, timeout, security, gzip, trace)
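ConcurrencyMiddleware applies unless the path is in CONCURRENCY_EXEMPT_PATHS (/health, /metrics, /ready). /api/diagnostics/concurrency is NOT exempt, so it is itself subject to concurrency gating: when the server is saturated, requests to this endpoint can be queued or rejected like any other request, which is exactly when these stats are most needed.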

10. Error Handling

Error	Status	Condition
Generic Exception caught	200 (degraded)	Any exception in try block returns `{"error": str(e), "status": "unknown", ...}`

No HTTPException is ever raised. All errors are caught and returned in the response body.

11. Status Determination Logic

Condition	Status
`utilization >= 90%` OR `queue_utilization >= 80%`	`"critical"`
`utilization >= 70%` OR `queue_utilization >= 60%`	`"warning"`
Otherwise	`"healthy"`
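The utilization arithmetic and the threshold table can be sketched as a pure function over already-read gauge values. This is a minimal sketch, not the handler: `build_concurrency_stats` and `concurrency_status` are hypothetical names, the inputs are assumed to be the gauge readings converted to int, percentages are left unrounded because the handler's rounding is not documented, and clamping the available-slot counts at zero is an assumption.

```python
def concurrency_status(utilization: float, queue_utilization: float) -> str:
    # Thresholds from the status table; critical is checked first
    if utilization >= 90 or queue_utilization >= 80:
        return "critical"
    if utilization >= 70 or queue_utilization >= 60:
        return "warning"
    return "healthy"


def build_concurrency_stats(
    active: int,
    queued: int,
    limit: int = 20,           # CONCURRENCY_LIMIT default
    queue_size: int = 50,      # CONCURRENCY_QUEUE_SIZE default
    queue_timeout: float = 10.0,  # CONCURRENCY_QUEUE_TIMEOUT default
) -> dict:
    utilization = active * 100 / limit
    queue_utilization = queued * 100 / queue_size
    return {
        "active_requests": active,
        "queued_requests": queued,
        "concurrency_limit": limit,
        "queue_size_limit": queue_size,
        "queue_timeout_seconds": queue_timeout,
        "utilization_percent": utilization,
        "queue_utilization_percent": queue_utilization,
        "status": concurrency_status(utilization, queue_utilization),
        "available_slots": max(limit - active, 0),
        "available_queue_slots": max(queue_size - queued, 0),
    }
```

With the documented defaults, `build_concurrency_stats(5, 2)` reproduces the success example above (25.0% and 4.0% utilization, status "healthy", 15 and 48 free slots). Separately, if the private ._value._value access becomes a concern, prometheus_client's public `REGISTRY.get_sample_value("concurrency_active_requests")` returns the same gauge value, assuming the gauges are registered in the default registry.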

12. Mermaid Diagram

flowchart TD
    A[GET /api/diagnostics/concurrency] --> B{Try block}
    B --> C[Import concurrency_active, concurrency_queued from middleware]
    C --> D[Read Prometheus Gauge internal values]
    D --> E[Calculate utilization_percent = active/CONCURRENCY_LIMIT * 100]
    E --> F[Calculate queue_utilization_percent = queued/CONCURRENCY_QUEUE_SIZE * 100]
    F --> G{utilization >= 90 OR queue_util >= 80?}
    G -->|Yes| H[status = critical]
    G -->|No| I{utilization >= 70 OR queue_util >= 60?}
    I -->|Yes| J[status = warning]
    I -->|No| K[status = healthy]
    H --> L[Return full stats dict with status]
    J --> L
    K --> L
    B -->|Exception| M[Log error]
    M --> N[Return error dict with status=unknown]

13. Complete Dependency Map

get_concurrency_stats()
├── src/middleware/concurrency_middleware.py
│   ├── concurrency_active (Prometheus Gauge)
│   └── concurrency_queued (Prometheus Gauge)
├── src/config/config.py (Config)
│   ├── CONCURRENCY_LIMIT (env: CONCURRENCY_LIMIT, default: 20)
│   ├── CONCURRENCY_QUEUE_SIZE (env: CONCURRENCY_QUEUE_SIZE, default: 50)
│   └── CONCURRENCY_QUEUE_TIMEOUT (env: CONCURRENCY_QUEUE_TIMEOUT, default: 10.0)
└── logging (stdlib)

Issue: #1735

API Endpoint Documentation: GET /api/diagnostics/provider-timing

Handler: `get_provider_timing_summary()` in `src/routes/diagnostics.py`

1. Overview

Returns a summary of provider response times from Prometheus metrics, identifying slow providers (>30s response times) that may be contributing to concurrency slot blocking and 503 errors.

Router: APIRouter(prefix="/api/diagnostics", tags=["diagnostics"])
Auth: None (public endpoint)
HTTP Method: GET
Return Type: dict[str, Any]

2. Request

No query parameters, path parameters, or request body.

3. Response

Success (200)

{
  "metrics_available": true,
  "slow_request_counts": {
    "openrouter/gpt-4": {
      "slow": 5,
      "very_slow": 2
    }
  },
  "note": "Use Prometheus/Grafana for detailed timing histograms. Query: provider_response_duration_seconds",
  "thresholds": {
    "slow": "30-45 seconds",
    "very_slow": ">45 seconds"
  }
}

Metrics Unavailable (200)

{
  "metrics_available": false,
  "error": "error message",
  "note": "Provider timing metrics are exposed via Prometheus at /metrics endpoint"
}

4. Dependency Trace (3+ levels deep)

Level 1: Route Handler

get_provider_timing_summary() in src/routes/diagnostics.py (line 93-141)

Level 2: Imported Dependencies

provider_slow_requests_total from src/services/prometheus_metrics.py (lazy import at line 107-109)

Level 3: Prometheus Metric Definition (prometheus_metrics.py)

provider_slow_requests_total = Counter("provider_slow_requests_total", "Total slow provider requests (>30s) by severity level", ["provider", "model", "severity"]) (line 777-781)
Labels: provider, model, severity (values: slow for 30-45s, very_slow for >45s)

Level 4: Metric Collection

Uses provider_slow_requests_total.collect()[0].samples to iterate all recorded samples
Each sample has .labels dict and .value float
Groups by {provider}/{model} key with severity breakdown

5. Supabase Queries

None.

6. Redis Operations

None.

7. Prometheus Metrics Read

Metric Name	Type	Labels	Description
`provider_slow_requests_total`	Counter	`provider`, `model`, `severity`	Total slow provider requests (>30s)

Severity label values:

slow: 30-45 seconds response time
very_slow: >45 seconds response time

This metric is written by provider client modules elsewhere in the codebase and read by this endpoint.

8. Pydantic Schemas

None. Return type is dict[str, Any].

9. Middleware Effects

Standard middleware pipeline applies (sentry, observability, timeout, security, gzip, trace)
Subject to ConcurrencyMiddleware (not in exempt paths)

10. Error Handling

Error	Status	Condition
Generic Exception caught	200 (degraded)	Any exception returns `{"metrics_available": false, "error": str(e), ...}`

No HTTPException is raised. Errors logged at WARNING level via logger.warning().

11. Data Processing Logic

Import provider_slow_requests_total Counter from prometheus_metrics
Call .collect()[0].samples to get all recorded samples
For each sample with count > 0:
- Extract provider, model, severity labels
- Group by "{provider}/{model}" key
- Store severity counts as {severity: int(count)}
Return grouped counts with threshold documentation
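The grouping step can be sketched as below. To run without prometheus_client it uses a small `Sample` stand-in carrying the name/labels/value fields of prometheus_client's Sample; `group_slow_counts` and the "unknown" label fallbacks are hypothetical. One detail the sketch guards against: by default, a prometheus_client Counter's collect() also yields a `<name>_created` sample for each label set whose value is the series' creation time as a Unix timestamp, so a `count > 0` check alone would not exclude it. Whether the handler filters on sample name is not stated above.

```python
from typing import NamedTuple


class Sample(NamedTuple):
    # Minimal stand-in for prometheus_client's Sample (name, labels, value, ...)
    name: str
    labels: dict[str, str]
    value: float


def group_slow_counts(samples: list[Sample]) -> dict[str, dict[str, int]]:
    slow_counts: dict[str, dict[str, int]] = {}
    for sample in samples:
        # Skip "<name>_created" samples (Unix timestamps); keep "_total" counts
        if not sample.name.endswith("_total"):
            continue
        if sample.value <= 0:
            continue
        provider = sample.labels.get("provider", "unknown")
        model = sample.labels.get("model", "unknown")
        severity = sample.labels.get("severity", "unknown")
        slow_counts.setdefault(f"{provider}/{model}", {})[severity] = int(sample.value)
    return slow_counts
```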

12. Mermaid Diagram

flowchart TD
    A[GET /api/diagnostics/provider-timing] --> B{Try block}
    B --> C[Import provider_slow_requests_total from prometheus_metrics]
    C --> D[Collect all metric samples]
    D --> E[Initialize slow_counts dict]
    E --> F{For each sample}
    F --> G[Extract provider, model, severity labels]
    G --> H{count > 0?}
    H -->|Yes| I[Group by provider/model key]
    I --> J[Store severity count]
    J --> F
    H -->|No| F
    F -->|Done| K[Return metrics_available=true with slow_request_counts]
    B -->|Exception| L[Log warning]
    L --> M[Return metrics_available=false with error]

13. Complete Dependency Map

get_provider_timing_summary()
├── src/services/prometheus_metrics.py
│   └── provider_slow_requests_total (Counter with labels: provider, model, severity)
│       └── prometheus_client.Counter
└── logging (stdlib)

Error Monitoring

12 endpoints

Issue: #1746

API Endpoint Documentation: GET /error-monitor/autonomous/status

Handler: `autonomous_monitor_status()` in `src/routes/error_monitor.py`

1. Overview

Returns the current status of the autonomous error monitoring background service, including whether it is enabled, running, scan interval, last scan time, and error counts.

Router: APIRouter(prefix="/error-monitor", tags=["error-monitor"])
Auth: None (public endpoint)
HTTP Method: GET
Return Type: dict (untyped)

2. Request

No parameters.

3. Response

Success (200)

{
  "status": "ok",
  "monitor": {
    "enabled": true,
    "running": true,
    "auto_fix_enabled": true,
    "scan_interval": 300,
    "last_scan": "2026-03-04T11:55:00+00:00",
    "errors_since_last_fix": 3,
    "total_patterns": 15
  }
}

4. Dependency Trace (3+ levels deep)

Level 1: Route Handler

autonomous_monitor_status() in src/routes/error_monitor.py (line 71-83)

Level 2: Dependencies

get_autonomous_monitor() from src/services/autonomous_monitor.py (sync singleton)
monitor.get_status() (async method)

Level 3: get_autonomous_monitor() (autonomous_monitor.py line 250-255)

Returns or creates AutonomousMonitor singleton

Level 3: AutonomousMonitor.get_status() (autonomous_monitor.py line 233-243)

Returns dict built from instance attributes:

{
    "enabled": self.enabled,
    "running": self.is_running,
    "auto_fix_enabled": self.auto_fix_enabled,
    "scan_interval": self.scan_interval,
    "last_scan": self.last_scan.isoformat() if self.last_scan else None,
    "errors_since_last_fix": self.errors_since_last_fix,
    "total_patterns": len(self.error_monitor.error_patterns) if self.error_monitor else 0,
}

Level 4: ErrorMonitor.error_patterns

In-memory dict of {pattern_key: ErrorPattern} objects
Only populated if the monitor has been scanning

5. Supabase Queries

None.

6. Redis Operations

None.

7. Prometheus Metrics

None.

8. Pydantic Schemas

None.

9. Middleware Effects

Standard pipeline + ConcurrencyMiddleware. No auth required.

10. Error Handling

Exception	Status	Handler
Generic `Exception`	500	Caught at line 81-83, raises `HTTPException(500, detail=str(e))`

11. Mermaid Diagram

flowchart TD
    A[GET /error-monitor/autonomous/status] --> B[get_autonomous_monitor singleton]
    B --> C[await monitor.get_status]
    C --> D[Read instance attributes: enabled, running, auto_fix, interval, last_scan, errors_count]
    D --> E[Read error_monitor.error_patterns count if initialized]
    E --> F["Return {status: ok, monitor: status_dict}"]
    B -->|Exception| G[500 HTTPException]

12. Complete Dependency Map

autonomous_monitor_status()
├── src/services/autonomous_monitor.py::get_autonomous_monitor() [sync singleton]
│   └── AutonomousMonitor.get_status()
│       ├── Instance attributes (enabled, is_running, auto_fix_enabled, scan_interval, last_scan, errors_since_last_fix)
│       └── self.error_monitor.error_patterns (len) if error_monitor initialized
└── logging (stdlib)

Issue: #1747

API Endpoint Documentation: GET /error-monitor/errors/recent

Handler: `get_recent_errors()` in `src/routes/error_monitor.py`

1. Overview

Fetches recent errors from Grafana Loki, analyzes them into structured error patterns with classification, severity, fixability assessment, and grouping of similar errors. Returns analyzed and deduplicated error patterns.

Router: APIRouter(prefix="/error-monitor", tags=["error-monitor"])
Auth: None (public endpoint)
HTTP Method: GET
Return Type: dict (untyped)

2. Request

Query Parameters

Parameter	Type	Default	Validation	Description
`hours`	`int`	`1`	`ge=1, le=24`	Lookback period in hours
`limit`	`int`	`100`	`ge=1, le=1000`	Max raw errors to fetch from Loki

3. Response

Success (200)

{
  "count": 5,
  "hours": 1,
  "errors": [
    {
      "error_type": "ConnectionError",
      "message": "Provider timeout after 30s",
      "category": "provider_error",
      "severity": "high",
      "file": "src/services/openrouter_client.py",
      "line": 150,
      "function": "send_request",
      "stack_trace": "...",
      "timestamp": "2026-03-04T11:50:00+00:00",
      "count": 3,
      "last_seen": "2026-03-04T11:58:00+00:00",
      "examples": ["msg1", "msg2"],
      "fixable": true,
      "suggested_fix": "Add retry logic with exponential backoff for provider calls"
    }
  ]
}

4. Dependency Trace (3+ levels deep)

Level 1: Route Handler

get_recent_errors() in src/routes/error_monitor.py (line 86-104)

Level 2: Dependencies

get_error_monitor() -> ErrorMonitor singleton
monitor.fetch_recent_errors(hours, limit) -> fetches from Loki
monitor.analyze_errors(raw_errors) -> classifies and groups

Level 3: ErrorMonitor.fetch_recent_errors() (error_monitor.py line 111-166)

Checks self.loki_enabled and self.loki_query_url and self.session
Makes HTTP GET to {loki_base_url}/loki/api/v1/query_range
Query: {level="ERROR"}
Params: query, limit, direction=backward
Uses httpx.AsyncClient (timeout=10.0)
Parses Loki response streams, extracts log entries (JSON or plain text)
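The Loki request and response parsing can be sketched with the standard library. The URL path, the `{level="ERROR"}` selector, `limit`, and `direction=backward` come from the description above; sending the `hours` lookback as `start`/`end` nanosecond timestamps is an assumption (Loki's query_range accepts them, but the parameter list above does not mention them). The parsing follows Loki's streams response format (`data.result[]` entries with `stream` labels and `values` pairs of [nanosecond timestamp, line]). The helper names and the `loki_*` keys added to each entry are hypothetical; the real code sends the request with httpx.AsyncClient(timeout=10.0).

```python
import json
import time
from typing import Any, Optional
from urllib.parse import urlencode

NS_PER_HOUR = 3600 * 1_000_000_000


def build_query_range_url(
    base_url: str, limit: int, hours: int, now_ns: Optional[int] = None
) -> str:
    """Build a Loki query_range URL for ERROR-level logs."""
    end_ns = now_ns if now_ns is not None else time.time_ns()
    params = {
        "query": '{level="ERROR"}',
        "limit": limit,
        "direction": "backward",  # newest entries first
        # Assumption: the lookback window is sent as start/end in Unix nanoseconds
        "start": end_ns - hours * NS_PER_HOUR,
        "end": end_ns,
    }
    return f"{base_url.rstrip('/')}/loki/api/v1/query_range?{urlencode(params)}"


def parse_streams(body: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten a Loki streams response into one dict per log line."""
    entries: list[dict[str, Any]] = []
    for stream in body.get("data", {}).get("result", []):
        labels = stream.get("stream", {})
        for ts_ns, line in stream.get("values", []):
            try:
                parsed = json.loads(line)
            except json.JSONDecodeError:
                parsed = None
            # JSON-object lines are kept as-is; anything else is treated as plain text
            entry = parsed if isinstance(parsed, dict) else {"message": line}
            entry.setdefault("loki_timestamp_ns", ts_ns)
            entry.setdefault("loki_labels", labels)
            entries.append(entry)
    return entries
```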

Level 3: ErrorMonitor.analyze_errors() (error_monitor.py line 323-339)

For each raw error:

extract_error_details() -> creates ErrorPattern with parsed file/line/function from stack trace
classify_error() -> determines ErrorCategory and ErrorSeverity based on message content
determine_fixability() -> sets fixable flag and suggested_fix based on category
group_similar_errors() -> groups by {category}:{message[:50]} key, merges counts
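The grouping step can be sketched as below. `Pattern` is a trimmed, hypothetical stand-in for ErrorPattern with only the fields grouping touches; the `{category}:{message[:50]}` key comes from the description above, while keeping the latest last_seen and capping examples at three (`MAX_EXAMPLES`) are assumptions about how duplicates are merged.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

MAX_EXAMPLES = 3  # hypothetical cap on stored example messages


@dataclass
class Pattern:
    category: str
    message: str
    timestamp: datetime
    count: int = 1
    last_seen: Optional[datetime] = None
    examples: list[str] = field(default_factory=list)


def group_similar(patterns: list[Pattern]) -> dict[str, Pattern]:
    grouped: dict[str, Pattern] = {}
    for p in patterns:
        key = f"{p.category}:{p.message[:50]}"
        seen_at = p.last_seen or p.timestamp
        existing = grouped.get(key)
        if existing is None:
            # First occurrence: copy so the caller's objects are not mutated
            grouped[key] = Pattern(
                category=p.category,
                message=p.message,
                timestamp=p.timestamp,
                count=p.count,
                last_seen=seen_at,
                examples=[p.message],
            )
            continue
        # Later occurrences: merge counts, track the newest sighting, keep a few examples
        existing.count += p.count
        existing.last_seen = max(existing.last_seen or seen_at, seen_at)
        if len(existing.examples) < MAX_EXAMPLES:
            existing.examples.append(p.message)
    return grouped
```

Two messages that differ only after their first 50 characters land in the same group, which is what collapses repeated errors carrying different IDs or timestamps at the end of the message.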

Level 4: classify_error() logic (error_monitor.py line 168-230)

Classification rules (in order):

Pattern Match	Category	Severity
Provider names + timeout/503/504	PROVIDER_ERROR	HIGH
Provider names + 401/403	AUTH_ERROR	HIGH
Provider names (other)	PROVIDER_ERROR	MEDIUM
supabase/postgresql/database/connection pool	DATABASE_ERROR	CRITICAL
rate limit / 429	RATE_LIMIT_ERROR	MEDIUM
unauthorized/invalid api key/401	AUTH_ERROR	HIGH
timeout/deadlineexceeded	TIMEOUT_ERROR	MEDIUM
validation/invalid	VALIDATION_ERROR	LOW
redis/cache	CACHE_ERROR	MEDIUM
stripe/resend/email/payment	EXTERNAL_SERVICE_ERROR	HIGH
(default)	INTERNAL_ERROR	MEDIUM
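The ordered rules can be sketched as a first-match keyword classifier. It is a minimal sketch, not the handler's code: `PROVIDER_NAMES` is a hypothetical placeholder (the actual provider list is not given above), matching is assumed to be case-insensitive substring search on the message, and the enum string values are assumed from the lowercase category and severity strings in the response examples.

```python
from enum import Enum


class ErrorCategory(str, Enum):
    PROVIDER_ERROR = "provider_error"
    AUTH_ERROR = "auth_error"
    DATABASE_ERROR = "database_error"
    RATE_LIMIT_ERROR = "rate_limit_error"
    TIMEOUT_ERROR = "timeout_error"
    VALIDATION_ERROR = "validation_error"
    CACHE_ERROR = "cache_error"
    EXTERNAL_SERVICE_ERROR = "external_service_error"
    INTERNAL_ERROR = "internal_error"


class ErrorSeverity(str, Enum):
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


PROVIDER_NAMES = ("openrouter", "openai", "anthropic")  # hypothetical placeholder list


def _mentions(text: str, *needles: str) -> bool:
    return any(needle in text for needle in needles)


def classify(message: str) -> tuple[ErrorCategory, ErrorSeverity]:
    text = message.lower()
    C, S = ErrorCategory, ErrorSeverity
    # Rules are checked top to bottom; the first match wins
    if _mentions(text, *PROVIDER_NAMES):
        if _mentions(text, "timeout", "503", "504"):
            return C.PROVIDER_ERROR, S.HIGH
        if _mentions(text, "401", "403"):
            return C.AUTH_ERROR, S.HIGH
        return C.PROVIDER_ERROR, S.MEDIUM
    if _mentions(text, "supabase", "postgresql", "database", "connection pool"):
        return C.DATABASE_ERROR, S.CRITICAL
    if _mentions(text, "rate limit", "429"):
        return C.RATE_LIMIT_ERROR, S.MEDIUM
    if _mentions(text, "unauthorized", "invalid api key", "401"):
        return C.AUTH_ERROR, S.HIGH
    if _mentions(text, "timeout", "deadlineexceeded"):
        return C.TIMEOUT_ERROR, S.MEDIUM
    if _mentions(text, "validation", "invalid"):
        return C.VALIDATION_ERROR, S.LOW
    if _mentions(text, "redis", "cache"):
        return C.CACHE_ERROR, S.MEDIUM
    if _mentions(text, "stripe", "resend", "email", "payment"):
        return C.EXTERNAL_SERVICE_ERROR, S.HIGH
    return C.INTERNAL_ERROR, S.MEDIUM
```

Because the first match wins, "openrouter returned 401" is an AUTH_ERROR via the provider rule, and "invalid api key" without a provider name reaches the generic auth rule before the broader "invalid" validation rule.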

Level 4: determine_fixability() logic (error_monitor.py line 290-321)

Category	Fixable	Suggested Fix
RATE_LIMIT_ERROR	Yes	Implement exponential backoff and request queuing
TIMEOUT_ERROR (provider)	Yes	Add retry logic with exponential backoff for provider calls
TIMEOUT_ERROR (other)	Yes	Increase timeout threshold or add connection pooling
CACHE_ERROR	Yes	Implement cache fallback to database queries
DATABASE_ERROR (pool)	Yes	Increase connection pool size or add fallback
DATABASE_ERROR (other)	Yes	Add database connection retry logic
AUTH_ERROR (invalid key)	Yes	Rotate API keys and update configuration
AUTH_ERROR (other)	Yes	Implement token refresh logic
All others	No	None
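The fixability table can be sketched as a lookup keyed on category. `determine_fix` is a hypothetical name, categories are assumed to be the lowercase strings seen in the response examples, and the sub-case tests are assumptions: how the code decides that a timeout is provider-related is not documented, so it is passed in as a flag, while the pool and invalid-key sub-cases are assumed to be detected from the message text.

```python
from typing import Optional


def determine_fix(
    category: str, message: str, provider_related: bool = False
) -> tuple[bool, Optional[str]]:
    """Return (fixable, suggested_fix) following the fixability table."""
    text = message.lower()
    if category == "rate_limit_error":
        return True, "Implement exponential backoff and request queuing"
    if category == "timeout_error":
        if provider_related:
            return True, "Add retry logic with exponential backoff for provider calls"
        return True, "Increase timeout threshold or add connection pooling"
    if category == "cache_error":
        return True, "Implement cache fallback to database queries"
    if category == "database_error":
        if "pool" in text:  # assumed marker for the "(pool)" sub-case
            return True, "Increase connection pool size or add fallback"
        return True, "Add database connection retry logic"
    if category == "auth_error":
        if "invalid api key" in text:  # assumed marker for the "(invalid key)" sub-case
            return True, "Rotate API keys and update configuration"
        return True, "Implement token refresh logic"
    # Provider, validation, external service, internal and unknown errors
    return False, None
```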

5. Supabase Queries

None.

6. Redis Operations

None.

7. Prometheus Metrics

None directly emitted.

8. External API Calls

Service	Method	URL	Auth	Timeout
Grafana Loki	GET	`{loki_base_url}/loki/api/v1/query_range`	None (configured in ErrorMonitor)	10s

9. Pydantic Schemas

None. Error entries are serialized from the ErrorPattern dataclass (error_monitor.py line 50-82), a plain dataclass rather than a Pydantic model:

Field	Type	Default	Description
`error_type`	`str`	required	Exception type name
`message`	`str`	required	Error message
`category`	`ErrorCategory`	required	Classified category enum
`severity`	`ErrorSeverity`	required	Classified severity enum
`file`	`str \| None`	required	Source file path
`line`	`int \| None`	required	Line number
`function`	`str \| None`	required	Function name
`stack_trace`	`str \| None`	required	Full stack trace
`timestamp`	`datetime`	required	When error occurred
`count`	`int`	`1`	Occurrence count
`last_seen`	`datetime \| None`	`None` (set to timestamp)	Last occurrence
`examples`	`list[str]`	`[]`	Example messages
`fixable`	`bool`	`False`	Whether auto-fixable
`suggested_fix`	`str \| None`	`None`	Fix suggestion

10. Error Handling

Exception	Status	Handler
Generic `Exception`	500	Caught at line 102-104, raises `HTTPException(500, detail=str(e))`
Loki fetch failures	200	`fetch_recent_errors()` returns `[]` on any error, resulting in `{"count": 0, "errors": []}`

11. Mermaid Diagram

flowchart TD
    A["GET /error-monitor/errors/recent?hours=1&limit=100"] --> B[get_error_monitor singleton]
    B --> C[fetch_recent_errors from Loki]
    C --> D{Loki enabled?}
    D -->|No| E["Return empty list"]
    D -->|Yes| F["HTTP GET Loki query_range {level='ERROR'}"]
    F --> G[Parse Loki response streams]
    G --> H[analyze_errors]
    H --> I[For each raw error: extract_error_details]
    I --> J[classify_error - determine category + severity]
    J --> K[determine_fixability - set fixable + suggested_fix]
    K --> L[group_similar_errors by category:message prefix]
    L --> M["Return {count, hours, errors: patterns.to_dict()}"]
    A -->|Exception| N[500 HTTPException]

12. Complete Dependency Map

get_recent_errors()
├── src/services/error_monitor.py::get_error_monitor() [async singleton]
│   └── ErrorMonitor
│       ├── fetch_recent_errors(hours, limit) -> Loki HTTP GET
│       │   ├── Config.LOKI_ENABLED
│       │   ├── Config.LOKI_QUERY_URL
│       │   └── httpx.AsyncClient (timeout=10s)
│       └── analyze_errors(raw_errors)
│           ├── extract_error_details() -> ErrorPattern
│           ├── classify_error() -> (ErrorCategory, ErrorSeverity)
│           ├── determine_fixability() -> (bool, str|None)
│           └── group_similar_errors() -> deduplicated dict
└── logging (stdlib)

Issue: #1748

API Endpoint Documentation: GET /error-monitor/errors/critical

Handler: `get_critical_errors()` in `src/routes/error_monitor.py`

1. Overview

Fetches recent errors from Loki and filters to only critical and high-severity errors, sorted by occurrence count descending. Uses the same Loki fetch + analysis pipeline as /errors/recent but with a severity filter.

Router: APIRouter(prefix="/error-monitor", tags=["error-monitor"])
Auth: None (public endpoint)
HTTP Method: GET
Return Type: dict (untyped)

2. Request

Query Parameters

Parameter	Type	Default	Validation	Description
`hours`	`int`	`1`	`ge=1, le=24`	Lookback period in hours

3. Response

Success (200)

{
  "count": 3,
  "hours": 1,
  "critical_errors": [
    {
      "error_type": "ConnectionError",
      "message": "Supabase connection pool exhausted",
      "category": "database_error",
      "severity": "critical",
      "count": 15,
      "fixable": true,
      "suggested_fix": "Increase connection pool size or add connection pooling fallback",
      ...
    }
  ]
}

4. Dependency Trace (3+ levels deep)

Level 1: Route Handler

get_critical_errors() in src/routes/error_monitor.py (line 107-123)

Level 2: Dependencies

get_error_monitor() -> ErrorMonitor singleton
monitor.get_critical_errors(hours=hours)

Level 3: ErrorMonitor.get_critical_errors() (error_monitor.py line 341-351)

Calls self.fetch_recent_errors(hours=hours) -> Loki query
Calls self.analyze_errors(raw_errors) -> classify + group
Filters: severity in [ErrorSeverity.CRITICAL, ErrorSeverity.HIGH]
Sorts by pattern.count descending

Level 4+: Same as #1747

fetch_recent_errors() -> Loki HTTP GET
analyze_errors() -> extract, classify, determine fixability, group

5. Supabase Queries

None.

6. Redis Operations

None.

7. Prometheus Metrics

None.

8. External API Calls

Same as #1747: HTTP GET to Grafana Loki /loki/api/v1/query_range with {level="ERROR"}.

9. Severity Filter

Only returns patterns where:

severity in [ErrorSeverity.CRITICAL, ErrorSeverity.HIGH]

Categories that map to CRITICAL/HIGH:

CRITICAL: Database errors (supabase, postgresql, database, connection pool)
HIGH: Provider errors mentioning timeout/503/504; auth errors (provider 401/403, unauthorized, invalid api key, 401); external service errors (stripe, resend, email, payment)

10. Error Handling

Exception	Status	Handler
Generic `Exception`	500	Raises `HTTPException(500, detail=str(e))`

11. Mermaid Diagram

flowchart TD
    A["GET /error-monitor/errors/critical?hours=1"] --> B[get_error_monitor singleton]
    B --> C[get_critical_errors hours=1]
    C --> D[fetch_recent_errors from Loki]
    D --> E[analyze_errors - classify + group]
    E --> F["Filter: severity in [CRITICAL, HIGH]"]
    F --> G[Sort by count descending]
    G --> H["Return {count, hours, critical_errors}"]
    A -->|Exception| I[500 HTTPException]

12. Complete Dependency Map

get_critical_errors() [route]
├── src/services/error_monitor.py::get_error_monitor()
│   └── ErrorMonitor.get_critical_errors()
│       ├── fetch_recent_errors() -> Loki HTTP GET
│       ├── analyze_errors() -> classify + group
│       └── filter severity in [CRITICAL, HIGH] + sort by count
└── logging (stdlib)

Issue: #1749

API Endpoint Documentation: GET /error-monitor/errors/fixable

Handler: `get_fixable_errors()` in `src/routes/error_monitor.py`

1. Overview

Fetches recent errors from Loki and filters to only errors that can be automatically fixed. Returns fixable errors sorted by severity then count descending. Each error includes a suggested_fix field.

Router: APIRouter(prefix="/error-monitor", tags=["error-monitor"])
Auth: None (public endpoint)
HTTP Method: GET
Return Type: dict (untyped)

2. Request

Query Parameters

Parameter	Type	Default	Validation	Description
`hours`	`int`	`1`	`ge=1, le=24`	Lookback period in hours

3. Response

Success (200)

{
  "count": 4,
  "hours": 1,
  "fixable_errors": [
    {
      "error_type": "TimeoutError",
      "message": "Provider request timeout",
      "category": "timeout_error",
      "severity": "medium",
      "fixable": true,
      "suggested_fix": "Increase timeout threshold or add connection pooling",
      "count": 8,
      ...
    }
  ]
}

4. Dependency Trace (3+ levels deep)

Level 1: Route Handler

get_fixable_errors() in src/routes/error_monitor.py (line 126-142)

Level 2: Dependencies

get_error_monitor() -> ErrorMonitor singleton
monitor.get_fixable_errors(hours=hours)

Level 3: ErrorMonitor.get_fixable_errors() (error_monitor.py line 353-359)

Calls self.fetch_recent_errors(hours=hours) -> Loki query
Calls self.analyze_errors(raw_errors) -> classify + group + determine fixability
Filters: pattern.fixable == True
Sorts by (severity.value, count) descending
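The description above sorts by `(severity.value, count)` in descending order. If ErrorSeverity's values are the lowercase strings shown in the response examples ("critical", "high", "medium"), that comparison is alphabetical, and descending alphabetical order is medium > low > high > critical, so the most severe fixable errors would come last. Whether the enum actually uses string values is not confirmed above. The sketch below makes the intended order explicit; `SEVERITY_RANK`, `FixablePattern`, and `sort_fixable` are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical explicit ranking; higher sorts first
SEVERITY_RANK = {"critical": 3, "high": 2, "medium": 1, "low": 0}


@dataclass
class FixablePattern:
    message: str
    severity: str  # assumed to be the enum's string value, e.g. "high"
    count: int
    fixable: bool = True


def sort_fixable(patterns: list[FixablePattern]) -> list[FixablePattern]:
    fixable = [p for p in patterns if p.fixable]
    # Rank severities explicitly instead of comparing their string values
    return sorted(
        fixable,
        key=lambda p: (SEVERITY_RANK.get(p.severity, -1), p.count),
        reverse=True,
    )
```

With medium/8, critical/2, high/5 and high/9 patterns, `sort_fixable` returns critical/2, high/9, high/5, medium/8, whereas a descending sort on the raw strings puts medium/8 first and critical/2 last.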

Level 4: Fixability Rules (determine_fixability)

Category	Fixable	Suggested Fix
RATE_LIMIT_ERROR	Yes	Implement exponential backoff and request queuing
TIMEOUT_ERROR (provider)	Yes	Add retry logic with exponential backoff
TIMEOUT_ERROR (other)	Yes	Increase timeout or add connection pooling
CACHE_ERROR	Yes	Implement cache fallback to database queries
DATABASE_ERROR (pool)	Yes	Increase pool size or add fallback
DATABASE_ERROR (other)	Yes	Add database connection retry logic
AUTH_ERROR (invalid key)	Yes	Rotate API keys and update configuration
AUTH_ERROR (other)	Yes	Implement token refresh logic
PROVIDER_ERROR	No	-
VALIDATION_ERROR	No	-
EXTERNAL_SERVICE_ERROR	No	-
INTERNAL_ERROR	No	-
UNKNOWN	No	-
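Read as data, the table is a lookup keyed on category plus an optional sub-case. The sketch below transcribes it; the subtype argument and its values ("provider", "pool", "invalid_key") are hypothetical, because how determine_fixability() detects those sub-cases is not documented here.

```python
from typing import Optional

# (category, subtype) -> suggested fix, transcribed from the table above.
# A subtype of None stands for the "(other)" row, or the only row for a category.
FIX_RULES: dict[tuple[str, Optional[str]], str] = {
    ("RATE_LIMIT_ERROR", None): "Implement exponential backoff and request queuing",
    ("TIMEOUT_ERROR", "provider"): "Add retry logic with exponential backoff",
    ("TIMEOUT_ERROR", None): "Increase timeout or add connection pooling",
    ("CACHE_ERROR", None): "Implement cache fallback to database queries",
    ("DATABASE_ERROR", "pool"): "Increase pool size or add fallback",
    ("DATABASE_ERROR", None): "Add database connection retry logic",
    ("AUTH_ERROR", "invalid_key"): "Rotate API keys and update configuration",
    ("AUTH_ERROR", None): "Implement token refresh logic",
}


def determine_fixability_sketch(
    category: str, subtype: Optional[str] = None
) -> tuple[bool, Optional[str]]:
    """Return (fixable, suggested_fix); categories absent from FIX_RULES are not fixable."""
    fix = FIX_RULES.get((category, subtype))
    if fix is None:
        # Unrecognized sub-case: fall back to the category's general row, if any
        fix = FIX_RULES.get((category, None))
    return (fix is not None, fix)
```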

5. Supabase Queries

None.

6. Redis Operations

None.

7. Prometheus Metrics

None.

8. External API Calls

Same as #1747: HTTP GET to Grafana Loki.

9. Error Handling

Exception	Status	Handler
Generic `Exception`	500	Raises `HTTPException(500, detail=str(e))`

10. Mermaid Diagram

flowchart TD
    A["GET /error-monitor/errors/fixable?hours=1"] --> B[get_error_monitor singleton]
    B --> C[get_fixable_errors hours=1]
    C --> D[fetch_recent_errors from Loki]
    D --> E[analyze_errors - classify + group + fixability]
    E --> F["Filter: pattern.fixable == True"]
    F --> G["Sort by (severity, count) descending"]
    G --> H["Return {count, hours, fixable_errors}"]
    A -->|Exception| I[500 HTTPException]

11. Complete Dependency Map

get_fixable_errors() [route]
├── src/services/error_monitor.py::get_error_monitor()
│   └── ErrorMonitor.get_fixable_errors()
│       ├── fetch_recent_errors() -> Loki HTTP GET
│       ├── analyze_errors()
│       │   ├── classify_error()
│       │   ├── determine_fixability()
│       │   └── group_similar_errors()
│       └── filter fixable + sort by severity/count
└── logging (stdlib)

Issue: #1750

API Endpoint Documentation: GET /error-monitor/errors/patterns

Handler: `get_error_patterns()` in `src/routes/error_monitor.py`

1. Overview

Returns all error patterns currently tracked in the ErrorMonitor's in-memory store. Unlike /errors/recent, /errors/critical, and /errors/fixable, which query Loki on every request, this endpoint returns only patterns previously stored via store_error_pattern() (by manual scans, the continuous monitoring loop, or the autonomous monitor).

Router: APIRouter(prefix="/error-monitor", tags=["error-monitor"])
Auth: None (public endpoint)
HTTP Method: GET
Return Type: dict (untyped)

2. Request

No parameters.

3. Response

Success (200)

{
  "total_patterns": 15,
  "patterns": [
    {
      "error_type": "ConnectionError",
      "message": "Provider timeout after 30s",
      "category": "provider_error",
      "severity": "high",
      "file": "src/services/openrouter_client.py",
      "line": 150,
      "function": "send_request",
      "stack_trace": "...",
      "timestamp": "2026-03-04T11:50:00+00:00",
      "count": 25,
      "last_seen": "2026-03-04T11:58:00+00:00",
      "examples": ["msg1", "msg2", "msg3"],
      "fixable": true,
      "suggested_fix": "Add retry logic..."
    }
  ]
}

4. Dependency Trace (3+ levels deep)

Level 1: Route Handler

get_error_patterns() in src/routes/error_monitor.py (lines 145-158)

Level 2: Dependencies

get_error_monitor() -> ErrorMonitor singleton
monitor.error_patterns (in-memory dict {str: ErrorPattern})

Level 3: error_patterns storage

dict[str, ErrorPattern] keyed by {category.value}:{message[:50]}
Populated by the store_error_pattern() method (lines 361-371)
Patterns are accumulated - counts increment, last_seen is updated, examples are appended
Not persisted to database - lost on restart
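A minimal sketch of the accumulation behavior, using a simplified pattern record. The key format follows the documented {category.value}:{message[:50]}; summing the incoming pattern's count (rather than adding one) and appending examples without a cap are assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class StoredPattern:
    # Simplified stand-in for ErrorPattern
    category: str  # already the enum's .value, e.g. "provider_error"
    message: str
    count: int = 1
    last_seen: Optional[datetime] = None
    examples: list[str] = field(default_factory=list)


def pattern_key(p: StoredPattern) -> str:
    # Documented key format: {category.value}:{message[:50]}
    return f"{p.category}:{p.message[:50]}"


def store_error_pattern(store: dict[str, StoredPattern], incoming: StoredPattern) -> None:
    """Insert a new pattern, or fold it into the existing entry with the same key."""
    key = pattern_key(incoming)
    now = datetime.now(timezone.utc)
    existing = store.get(key)
    if existing is None:
        incoming.last_seen = now
        store[key] = incoming
        return
    existing.count += incoming.count  # assumption: counts are summed
    existing.last_seen = now
    existing.examples.extend(incoming.examples)  # assumption: no cap on examples
```

One consequence of the key format: two messages in the same category that share their first 50 characters collapse into a single pattern.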

Level 4: ErrorPattern.to_dict()

Converts dataclass to dict with:

timestamp -> ISO format string
last_seen -> ISO format string or None
category -> .value (string enum)
severity -> .value (string enum)
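The four conversions can be illustrated with a cut-down dataclass. The field set, the PatternRecord name, and the single-member enums are assumptions; only the conversion rules themselves come from the list above.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Any, Optional


class Category(Enum):
    PROVIDER_ERROR = "provider_error"


class Severity(Enum):
    HIGH = "high"


@dataclass
class PatternRecord:
    message: str
    category: Category
    severity: Severity
    timestamp: datetime
    last_seen: Optional[datetime] = None

    def to_dict(self) -> dict[str, Any]:
        data = asdict(self)
        # datetimes become ISO 8601 strings; a missing last_seen stays None
        data["timestamp"] = self.timestamp.isoformat()
        data["last_seen"] = self.last_seen.isoformat() if self.last_seen else None
        # enums are replaced by their string values so the dict is JSON-serializable
        data["category"] = self.category.value
        data["severity"] = self.severity.value
        return data


record = PatternRecord(
    message="Provider timeout after 30s",
    category=Category.PROVIDER_ERROR,
    severity=Severity.HIGH,
    timestamp=datetime(2026, 3, 4, 11, 50, tzinfo=timezone.utc),
)
```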

5. Supabase Queries

None.

6. Redis Operations

None.

7. Prometheus Metrics

None.

8. External API Calls

None. This endpoint reads only from in-memory state.

9. Important Note on Data Persistence

The error_patterns dictionary is in-memory only. It is populated by:

POST /error-monitor/monitor/scan -> monitor.store_error_pattern()
POST /error-monitor/monitor/start -> continuous monitoring loop
AutonomousMonitor._scan_for_errors() -> background scanning

On application restart, all tracked patterns are lost.

10. Error Handling

Exception	Status	Handler
Generic `Exception`	500	Raises `HTTPException(500, detail=str(e))`

11. Mermaid Diagram

flowchart TD
    A[GET /error-monitor/errors/patterns] --> B[get_error_monitor singleton]
    B --> C[Read monitor.error_patterns in-memory dict]
    C --> D[Convert values to list]
    D --> E["Call .to_dict() on each ErrorPattern"]
    E --> F["Return {total_patterns, patterns}"]
    A -->|Exception| G[500 HTTPException]

12. Complete Dependency Map

get_error_patterns() [route]
├── src/services/error_monitor.py::get_error_monitor()
│   └── ErrorMonitor.error_patterns (in-memory dict)
│       └── ErrorPattern.to_dict() for serialization
└── logging (stdlib)

Issue: #1751

API Endpoint Documentation: GET /error-monitor/fixes/generated

Handler: `get_generated_fixes()` in `src/routes/error_monitor.py`

1. Overview

Returns every bug fix that the BugFixGenerator has generated and stored in its in-memory generated_fixes dictionary. Each fix includes the analysis, proposed code changes, affected files, and PR status.

Router: APIRouter(prefix="/error-monitor", tags=["error-monitor"])
Auth: None (public endpoint)
HTTP Method: GET
Return Type: dict (untyped)

2. Request

No parameters.

3. Response

Success (200)

{
  "total_fixes": 3,
  "fixes": [
    {
      "id": "uuid",
      "error_pattern_id": "provider_error:Provider timeout after 30s",
      "error_message": "Provider timeout after 30s",
      "error_category": "provider_error",
      "analysis": "Root cause analysis from Claude...",
      "proposed_fix": "Description of the fix",
      "code_changes": {
        "src/services/openrouter_client.py": "... code ..."
      },
      "files_affected": ["src/services/openrouter_client.py"],
      "severity": "high",
      "generated_at": "2026-03-04T11:00:00+00:00",
      "pr_url": "https://github.com/repo/pull/123",
      "status": "testing"
    }
  ]
}

4. Dependency Trace (3+ levels deep)

Level 1: Route Handler

get_generated_fixes() in src/routes/error_monitor.py (lines 256-269)

Level 2: Dependencies

get_bug_fix_generator() from src/services/bug_fix_generator.py (async singleton)
generator.generated_fixes (in-memory dict {str: BugFix})

Level 3: get_bug_fix_generator() (bug_fix_generator.py lines 655-661)

Creates singleton BugFixGenerator() on first call
Raises RuntimeError if ANTHROPIC_API_KEY is not configured
Calls initialize() -> creates httpx client, validates API key

Level 3: BugFix.to_dict() (bug_fix_generator.py lines 60-75)

Converts dataclass to dict:

generated_at -> ISO format string
All other fields: direct mapping

5. Supabase Queries

None.

6. Redis Operations

None.

7. Prometheus Metrics

None.

8. Data Persistence Note

generated_fixes is in-memory only. Fixes are accumulated when:

POST /error-monitor/fixes/generate-for-error -> generator.generate_fix()
POST /error-monitor/fixes/generate-batch -> generator.process_multiple_errors()
AutonomousMonitor._generate_fixes_for_critical() -> background auto-fix

On restart, all generated fixes are lost.

9. BugFix Dataclass (bug_fix_generator.py lines 44-75)

Field	Type	Default	Description
`id`	`str`	required	UUID string
`error_pattern_id`	`str`	required	`{category}:{message[:50]}`
`error_message`	`str`	required	Original error message
`error_category`	`str`	required	Error category value
`analysis`	`str`	required	Claude-generated analysis
`proposed_fix`	`str`	required	Fix description
`code_changes`	`dict[str, str]`	required	`{file_path: code}`
`files_affected`	`list[str]`	required	List of file paths
`severity`	`str`	required	Severity level
`generated_at`	`datetime`	required	Generation timestamp
`pr_url`	`str | None`	`None`	GitHub PR URL
`status`	`str`	`"pending"`	One of: pending, testing, merged, failed
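Transcribed into Python, the table corresponds roughly to the dataclass below. Field names, types, and defaults come from the table; the to_dict() body is an assumption based on the rule that only generated_at needs converting (to an ISO string), and the example values are illustrative.

```python
import uuid
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass
class BugFix:
    id: str
    error_pattern_id: str  # "{category}:{message[:50]}"
    error_message: str
    error_category: str
    analysis: str
    proposed_fix: str
    code_changes: dict[str, str]  # {file_path: code}
    files_affected: list[str]
    severity: str
    generated_at: datetime
    pr_url: Optional[str] = None
    status: str = "pending"  # pending | testing | merged | failed

    def to_dict(self) -> dict[str, Any]:
        data = asdict(self)
        data["generated_at"] = self.generated_at.isoformat()
        return data


fix = BugFix(
    id=str(uuid.uuid4()),
    error_pattern_id="provider_error:Provider timeout after 30s",
    error_message="Provider timeout after 30s",
    error_category="provider_error",
    analysis="...",
    proposed_fix="Add retry logic with exponential backoff",
    code_changes={"src/services/openrouter_client.py": "..."},
    files_affected=["src/services/openrouter_client.py"],
    severity="high",
    generated_at=datetime(2026, 3, 4, 11, 0, tzinfo=timezone.utc),
)
```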

10. Error Handling

Exception	Status	Handler
`RuntimeError` (ANTHROPIC_API_KEY missing)	500	`HTTPException(500, detail=str(e))`
Generic `Exception`	500	`HTTPException(500, detail=str(e))`

Note: Unlike the /health endpoint, which gracefully handles a missing ANTHROPIC_API_KEY, this endpoint returns 500 if the key is not configured.

11. Mermaid Diagram

flowchart TD
    A[GET /error-monitor/fixes/generated] --> B{get_bug_fix_generator}
    B -->|RuntimeError: no API key| C[500 HTTPException]
    B -->|Success| D[Read generator.generated_fixes dict]
    D --> E[Convert BugFix values to list]
    E --> F["Call .to_dict() on each BugFix"]
    F --> G["Return {total_fixes, fixes}"]
    A -->|Exception| C

12. Complete Dependency Map

get_generated_fixes() [route]
├── src/services/bug_fix_generator.py::get_bug_fix_generator() [async singleton]
│   └── BugFixGenerator
│       ├── Config.ANTHROPIC_API_KEY (required)
│       ├── .generated_fixes (in-memory dict)
│       └── BugFix.to_dict() for serialization
└── logging (stdlib)

Issue: #1752

API Endpoint Documentation: GET /error-monitor/fixes/{fix_id}

Handler: `get_fix_details()` in `src/routes/error_monitor.py`

1. Overview

Retrieves the full details of a specific generated bug fix by its UUID. Looks up the fix in the BugFixGenerator's in-memory dictionary.

Router: APIRouter(prefix="/error-monitor", tags=["error-monitor"])
Auth: None (public endpoint)
HTTP Method: GET
Return Type: dict (untyped)

2. Request

Path Parameters

Parameter	Type	Description
`fix_id`	`str`	UUID of the generated fix

3. Response

Success (200)

{
  "fix": {
    "id": "abc12345-...",
    "error_pattern_id": "provider_error:Provider timeout",
    "error_message": "Provider timeout after 30s",
    "error_category": "provider_error",
    "analysis": "Root cause: The OpenRouter provider...",
    "proposed_fix": "Add retry logic with exponential backoff",
    "code_changes": {
      "src/services/openrouter_client.py": "import asyncio\n..."
    },
    "files_affected": ["src/services/openrouter_client.py"],
    "severity": "high",
    "generated_at": "2026-03-04T11:00:00+00:00",
    "pr_url": "https://github.com/repo/pull/123",
    "status": "testing"
  }
}

Error Responses

Status	Condition
404	Fix not found in generated_fixes dict
500	ANTHROPIC_API_KEY not configured or other error

4. Dependency Trace (3+ levels deep)

Level 1: Route Handler

get_fix_details() in src/routes/error_monitor.py (lines 272-289)

Level 2: Dependencies

get_bug_fix_generator() -> BugFixGenerator singleton
generator.generated_fixes[fix_id] -> in-memory dict lookup

Level 3: get_bug_fix_generator()

Creates singleton; raises RuntimeError if ANTHROPIC_API_KEY missing
Returns BugFixGenerator instance

Level 3: generated_fixes dict

Keyed by UUID string (generated by uuid4())
Values are BugFix dataclass instances

5. Supabase Queries

None.

6. Redis Operations

None.

7. Prometheus Metrics

None.

8. Pydantic Schemas

See BugFix dataclass in issue #1751 documentation.

9. Error Handling

Exception	Status	Handler
`fix_id not in generator.generated_fixes`	404	`HTTPException(404, "Fix not found")` at line 279
`HTTPException` (404)	404	Re-raised at lines 285-286
`RuntimeError` (no API key)	500	`HTTPException(500, detail=str(e))`
Generic `Exception`	500	`HTTPException(500, detail=str(e))`
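The error-handling table follows a common FastAPI handler shape: raise the 404 inside the try, re-raise HTTPException untouched, and wrap anything else in a 500. The sketch reproduces that shape without FastAPI so it runs standalone; HTTPException here is a minimal stand-in for fastapi.HTTPException, and passing generated_fixes in directly replaces the generator lookup.

```python
from typing import Any


class HTTPException(Exception):
    """Minimal stand-in for fastapi.HTTPException."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def get_fix_details_sketch(generated_fixes: dict[str, Any], fix_id: str) -> dict[str, Any]:
    try:
        if fix_id not in generated_fixes:
            raise HTTPException(404, "Fix not found")
        return {"fix": generated_fixes[fix_id].to_dict()}
    except HTTPException:
        # Without this clause the generic handler below would turn the 404 into a 500
        raise
    except Exception as e:
        raise HTTPException(500, str(e))
```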

10. Mermaid Diagram

flowchart TD
    A["GET /error-monitor/fixes/{fix_id}"] --> B{get_bug_fix_generator}
    B -->|RuntimeError| C[500 HTTPException]
    B -->|Success| D{fix_id in generated_fixes?}
    D -->|No| E[404 Fix not found]
    D -->|Yes| F[Get BugFix from dict]
    F --> G[Call fix.to_dict]
    G --> H["Return {fix: fix_dict}"]

11. Complete Dependency Map

get_fix_details() [route]
├── src/services/bug_fix_generator.py::get_bug_fix_generator() [async singleton]
│   └── BugFixGenerator
│       ├── Config.ANTHROPIC_API_KEY (required)
│       └── .generated_fixes[fix_id] -> BugFix.to_dict()
└── logging (stdlib)

Issue: #1753

API Endpoint Documentation: GET /error-monitor/dashboard

Handler: `error_dashboard()` in `src/routes/error_monitor.py`

1. Overview

Returns a comprehensive error monitoring dashboard with summary statistics, recent critical errors, recent fixable errors, and recently generated fixes. Aggregates data from both the ErrorMonitor (Loki-based) and BugFixGenerator (in-memory fixes) services.

Router: APIRouter(prefix="/error-monitor", tags=["error-monitor"])
Auth: None (public endpoint)
HTTP Method: GET
Return Type: dict (untyped)

2. Request

No parameters.

3. Response

Success (200)

{
  "timestamp": "2026-03-04T12:00:00+00:00",
  "summary": {
    "total_patterns": 15,
    "critical_errors": 3,
    "fixable_errors": 8,
    "generated_fixes": 5,
    "patterns_by_category": {
      "provider_error": 25,
      "database_error": 10,
      "timeout_error": 5,
      "cache_error": 3
    }
  },
  "recent_critical": [
    {
      "error_type": "...",
      "message": "...",
      "category": "database_error",
      "severity": "critical",
      ...
    }
  ],
  "recent_fixable": [
    {
      "error_type": "...",
      "fixable": true,
      "suggested_fix": "...",
      ...
    }
  ],
  "recent_fixes": [
    {
      "id": "...",
      "error_category": "...",
      "proposed_fix": "...",
      "pr_url": "...",
      "status": "testing",
      ...
    }
  ]
}

4. Dependency Trace (3+ levels deep)

Level 1: Route Handler

error_dashboard() in src/routes/error_monitor.py (lines 292-331)

Level 2: Dependencies

get_error_monitor() -> ErrorMonitor singleton
get_bug_fix_generator() -> BugFixGenerator singleton
monitor.get_critical_errors(hours=1) -> Loki fetch + analysis + filter
monitor.get_fixable_errors(hours=1) -> Loki fetch + analysis + filter
monitor.error_patterns -> in-memory tracked patterns
generator.generated_fixes -> in-memory generated fixes

Level 3: get_critical_errors() (see #1748)

fetch_recent_errors(hours=1) -> Loki HTTP GET
analyze_errors() -> classify + group
Filter severity in [CRITICAL, HIGH]
Sort by count descending

Level 3: get_fixable_errors() (see #1749)

fetch_recent_errors(hours=1) -> Loki HTTP GET
analyze_errors() -> classify + group + fixability
Filter fixable == True
Sort by (severity, count) descending

Level 3: patterns_by_category calculation (lines 304-307)

category_counts = {}
for pattern in monitor.error_patterns.values():
    cat = pattern.category.value
    category_counts[cat] = category_counts.get(cat, 0) + pattern.count

Counts total error occurrences per category from stored (in-memory) patterns.

Level 3: recent_fixes sorting (lines 320-327)

sorted(generator.generated_fixes.values(), key=lambda x: x.generated_at, reverse=True)[:10]

Top 10 most recently generated fixes.
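Both in-memory aggregations can be exercised on their own. In this sketch SimpleNamespace objects stand in for ErrorPattern and BugFix, category is a plain string instead of an enum (the real loop reads pattern.category.value), and build_summary_sketch and its recent_fix_ids field are hypothetical simplifications of the dashboard payload.

```python
from datetime import datetime, timedelta, timezone
from types import SimpleNamespace


def build_summary_sketch(error_patterns: dict, generated_fixes: dict, limit: int = 10) -> dict:
    # Total occurrences per category, summed over the stored (in-memory) patterns
    category_counts: dict[str, int] = {}
    for pattern in error_patterns.values():
        category_counts[pattern.category] = category_counts.get(pattern.category, 0) + pattern.count

    # Newest fixes first, truncated to `limit`
    recent = sorted(generated_fixes.values(), key=lambda f: f.generated_at, reverse=True)[:limit]

    return {
        "total_patterns": len(error_patterns),
        "generated_fixes": len(generated_fixes),
        "patterns_by_category": category_counts,
        "recent_fix_ids": [f.id for f in recent],
    }


base = datetime(2026, 3, 4, tzinfo=timezone.utc)
patterns = {
    "a": SimpleNamespace(category="provider_error", count=20),
    "b": SimpleNamespace(category="provider_error", count=5),
    "c": SimpleNamespace(category="cache_error", count=3),
}
fixes = {str(i): SimpleNamespace(id=str(i), generated_at=base + timedelta(minutes=i)) for i in range(12)}
summary = build_summary_sketch(patterns, fixes)
```

Note that patterns_by_category sums occurrence counts (25 for provider_error here), whereas total_patterns counts distinct patterns.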

Note: This endpoint makes two separate Loki queries (one for critical, one for fixable), each fetching and analyzing errors independently.

5. Supabase Queries

None.

6. Redis Operations

None.

7. Prometheus Metrics

None directly emitted.

8. External API Calls

Service	Method	URL	Count per request
Grafana Loki	GET	`/loki/api/v1/query_range`	2 (critical + fixable)

Both queries use {level="ERROR"} with hours=1.

9. Response Limits

Field	Max Items
`recent_critical`	10 (sliced `[:10]`)
`recent_fixable`	10 (sliced `[:10]`)
`recent_fixes`	10 (sorted by `generated_at` desc, sliced `[:10]`)

10. Error Handling

Exception	Status	Handler
Generic `Exception`	500	Raises `HTTPException(500, detail=str(e))`

Important: If get_bug_fix_generator() raises RuntimeError (missing ANTHROPIC_API_KEY), the entire dashboard request fails with 500. This differs from /health, which handles a missing key gracefully.

11. Mermaid Diagram

flowchart TD
    A[GET /error-monitor/dashboard] --> B[get_error_monitor singleton]
    B --> C[get_bug_fix_generator singleton]
    C -->|RuntimeError| D[500 HTTPException]
    C -->|Success| E[get_critical_errors hours=1]
    E --> F[Loki fetch + analyze + filter critical/high]
    F --> G[get_fixable_errors hours=1]
    G --> H[Loki fetch + analyze + filter fixable]
    H --> I[Build category_counts from in-memory patterns]
    I --> J[Sort generated_fixes by generated_at desc]
    J --> K[Build summary with counts]
    K --> L["Return dashboard: summary + recent_critical[:10] + recent_fixable[:10] + recent_fixes[:10]"]
    A -->|Exception| D

12. Complete Dependency Map

error_dashboard() [route]
├── src/services/error_monitor.py::get_error_monitor() [async singleton]
│   └── ErrorMonitor
│       ├── get_critical_errors(hours=1)
│       │   ├── fetch_recent_errors() -> Loki HTTP GET #1
│       │   ├── analyze_errors() -> classify + group
│       │   └── filter severity in [CRITICAL, HIGH]
│       ├── get_fixable_errors(hours=1)
│       │   ├── fetch_recent_errors() -> Loki HTTP GET #2
│       │   ├── analyze_errors() -> classify + group + fixability
│       │   └── filter fixable == True
│       └── error_patterns (in-memory dict -> category counts)
├── src/services/bug_fix_generator.py::get_bug_fix_generator() [async singleton]
│   └── BugFixGenerator
│       ├── Config.ANTHROPIC_API_KEY (required - will 500 if missing)
│       └── .generated_fixes (in-memory dict -> sorted by date, top 10)
├── datetime (stdlib)
└── logging (stdlib)

Issue: #1754

API Endpoint Documentation: POST /error-monitor/fixes/generate-for-error

Handler: `generate_fix_for_error()` in `src/routes/error_monitor.py` (lines 161-204)

1. Overview

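Generates an automated bug fix for a single error pattern, found by matching the error_id query parameter against each stored pattern's error_type. With create_pr=True, fix generation and GitHub pull request creation run in a background task instead, and the endpoint returns immediately.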

Route prefix: /error-monitor (APIRouter with tags=["error-monitor"])

2. Authentication & Middleware

Authentication: NONE - This endpoint has no authentication dependency (get_api_key, get_admin_key, etc.)
Rate Limiting: Subject only to global middleware (IP-based rate limiting via security_middleware.py)
Middleware Pipeline: Request → Sentry middleware → Observability middleware → Timeout middleware → Security middleware → GZip middleware → Trace middleware → Handler

3. Request Parameters

Parameter	Type	Source	Required	Default	Validation
`error_id`	`str`	Query	Yes	N/A	FastAPI required query param
`create_pr`	`bool`	Query	No	`False`	Boolean
`background_tasks`	`BackgroundTasks`	DI	No	`BackgroundTasks()`	FastAPI injected

4. Response Schema

Success (synchronous, create_pr=False):

{
  "status": "success",
  "fix": {
    "id": "uuid",
    "error_pattern_id": "category:message_prefix",
    "error_message": "string",
    "error_category": "string",
    "analysis": "string",
    "proposed_fix": "string",
    "code_changes": {"file_path": "code"},
    "files_affected": ["file1.py"],
    "severity": "critical|high|medium|low|info",
    "generated_at": "ISO8601",
    "pr_url": null,
    "status": "pending"
  }
}

Success (background, create_pr=True):

{
  "status": "processing",
  "message": "Fix generation started in background"
}

5. Dependency Map (3+ levels deep)

generate_fix_for_error()
├── get_error_monitor() → ErrorMonitor singleton
│   ├── ErrorMonitor.__init__()
│   │   ├── Config.LOKI_ENABLED
│   │   └── Config.LOKI_QUERY_URL
│   └── ErrorMonitor.initialize()
│       └── httpx.AsyncClient(timeout=10.0)
├── get_bug_fix_generator() → BugFixGenerator singleton
│   ├── BugFixGenerator.__init__()
│   │   ├── Config.ANTHROPIC_API_KEY (required, raises RuntimeError if missing)
│   │   ├── Config.GITHUB_TOKEN (optional)
│   │   └── Config.ANTHROPIC_MODEL (default: "claude-3-5-sonnet-20241022")
│   └── BugFixGenerator.initialize()
│       ├── httpx.AsyncClient(timeout=30.0)
│       └── _validate_api_key() → POST https://api.anthropic.com/v1/messages
├── monitor.error_patterns.values() → dict iteration
├── pattern.to_dict() → dict serialization
├── generator.process_error() [if create_pr=True, background task]
│   ├── generate_fix() → see below
│   ├── create_branch_and_commit() → git subprocess calls
│   └── create_pull_request() → POST https://api.github.com/repos/{repo}/pulls
└── generator.generate_fix() [if create_pr=False, synchronous]
    ├── analyze_error()
    │   └── _make_claude_request() [with @retry: 3 attempts, exponential backoff]
    │       └── POST https://api.anthropic.com/v1/messages
    ├── _make_claude_request() [second call for fix generation]
    │   └── POST https://api.anthropic.com/v1/messages (max_tokens=2048)
    └── BugFix dataclass creation → stored in generator.generated_fixes dict

6. Supabase Queries

None - This endpoint does not interact with Supabase/PostgreSQL.

7. Redis Operations

None - This endpoint does not interact with Redis directly.

8. External API Calls

Service	Operation	URL	Details
Anthropic Claude API	POST	`https://api.anthropic.com/v1/messages`	Error analysis (max_tokens=1024)
Anthropic Claude API	POST	`https://api.anthropic.com/v1/messages`	Fix generation (max_tokens=2048)
GitHub API	POST	`https://api.github.com/repos/{repo}/pulls`	PR creation (if create_pr=True)

Retry logic: @retry decorator on _make_claude_request():

Retries on: httpx.TimeoutException, httpx.ConnectError
Wait: exponential backoff (min=2s, max=10s)
Max attempts: 3
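The retry policy can be approximated in plain asyncio as below. This is not the project's actual @retry decorator: TimeoutException and ConnectError are stand-ins for httpx.TimeoutException and httpx.ConnectError, and the doubling-from-2s delay curve is an assumption that only respects the documented 2s floor, 10s cap, and 3-attempt limit. The sleep parameter is injectable so the policy can be checked without real waits.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class TimeoutException(Exception):
    """Stand-in for httpx.TimeoutException."""


class ConnectError(Exception):
    """Stand-in for httpx.ConnectError."""


RETRYABLE = (TimeoutException, ConnectError)


async def call_with_retry(
    func: Callable[[], Awaitable[T]],
    max_attempts: int = 3,
    min_wait: float = 2.0,
    max_wait: float = 10.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Retry func on transient network errors with capped exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return await func()
        except RETRYABLE:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # 2s, 4s, 8s, ... clamped to max_wait
            await sleep(min(max_wait, min_wait * 2 ** (attempt - 1)))
    raise AssertionError("unreachable")
```

Any other exception type (for example a 4xx response surfaced as an error) propagates on the first attempt without retrying.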

9. Prometheus Metrics

None directly - This endpoint does not record Prometheus metrics itself.

10. Error Handling Paths

Error	Status Code	Condition
`HTTPException(404)`	404	Error pattern not found in `monitor.error_patterns`
`HTTPException(500)`	500	`generate_fix()` returns `None` (fix generation failed)
`HTTPException(500)`	500	Any unhandled exception (generic catch-all)
`RuntimeError`	500	`ANTHROPIC_API_KEY` not configured (from `get_bug_fix_generator()`)

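Error re-raise pattern: the except HTTPException: raise clause lets the 404 (and the explicit 500 for a failed fix) propagate unchanged; without it, the generic except Exception handler would convert the 404 into a 500.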

11. Mermaid Diagram

flowchart TD
    A[POST /error-monitor/fixes/generate-for-error] --> B[get_error_monitor singleton]
    B --> C[get_bug_fix_generator singleton]
    C --> D{ANTHROPIC_API_KEY configured?}
    D -->|No| E[RuntimeError 500]
    D -->|Yes| F[Search error_patterns by error_id]
    F --> G{Pattern found?}
    G -->|No| H[HTTPException 404]
    G -->|Yes| I{create_pr == True?}
    I -->|Yes| J[BackgroundTasks.add_task: process_error]
    J --> K[Return status: processing]
    I -->|No| L[generator.generate_fix synchronous]
    L --> M[analyze_error via Claude API]
    M --> N{Analysis successful?}
    N -->|No| O[Return None]
    O --> P[HTTPException 500: Failed to generate fix]
    N -->|Yes| Q[Generate fix via Claude API]
    Q --> R{Fix JSON parsed?}
    R -->|No| S[Return None → HTTPException 500]
    R -->|Yes| T[Create BugFix dataclass]
    T --> U[Store in generated_fixes dict]
    U --> V[Return status: success with fix]

12. Important Notes

Error patterns are stored in-memory only (ErrorMonitor.error_patterns dict) - they do not persist across restarts
The error_id lookup matches against pattern.to_dict().get("error_type"), which maps to ErrorPattern.error_type
Background task execution (when create_pr=True) includes git operations via subprocess.run
The BugFixGenerator validates the Anthropic API key format (must start with sk-ant-)
Prompt sanitization limits: MAX_PROMPT_LENGTH = 50000, MAX_ERROR_MESSAGE_LENGTH = 10000
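The error_id lookup and the key-format check from the notes can be sketched as plain functions. Both names are hypothetical. If several stored patterns share an error_type (for example, two different ConnectionError messages), this sketch returns the first one in iteration order; whether the real handler resolves ties the same way is an assumption.

```python
from typing import Any, Iterable, Optional


def find_pattern_by_error_type(patterns: Iterable[Any], error_id: str) -> Optional[Any]:
    """Return the first stored pattern whose serialized error_type equals error_id."""
    for pattern in patterns:
        if pattern.to_dict().get("error_type") == error_id:
            return pattern
    return None  # the route turns a miss into HTTPException(404)


def looks_like_anthropic_key(key: str) -> bool:
    # Prefix check only, as described in the notes; it says nothing about whether the key works
    return key.startswith("sk-ant-")
```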

Generated by AI documentation tool

Issue: #1755

API Endpoint Documentation: POST /error-monitor/fixes/generate-batch

Handler: `generate_fixes_batch()` in `src/routes/error_monitor.py` (lines 207-253)

1. Overview

Generates automated bug fixes for multiple error patterns simultaneously. Supports both synchronous batch processing and background processing with optional GitHub PR creation.

Route prefix: /error-monitor (APIRouter with tags=["error-monitor"])

2. Authentication & Middleware

Authentication: NONE - No authentication dependency
Rate Limiting: Global middleware only (IP-based via security_middleware.py)
Middleware Pipeline: Sentry → Observability → Timeout → Security → GZip → Trace → Handler

3. Request Parameters

Parameter	Type	Source	Required	Default	Validation
`error_ids`	`list[str]`	Query	Yes	N/A	FastAPI required query param (list)
`create_prs`	`bool`	Query	No	`False`	Boolean
`background_tasks`	`BackgroundTasks`	DI	No	`BackgroundTasks()`	FastAPI injected

4. Response Schema

Success (synchronous, create_prs=False):

{
  "status": "success",
  "fixes": [
    {
      "id": "uuid",
      "error_pattern_id": "category:message_prefix",
      "error_message": "string",
      "error_category": "string",
      "analysis": "string",
      "proposed_fix": "string",
      "code_changes": {"file_path": "code"},
      "files_affected": ["file1.py"],
      "severity": "critical|high|medium|low|info",
      "generated_at": "ISO8601",
      "pr_url": "string|null",
      "status": "pending|testing|merged|failed"
    }
  ],
  "count": 3
}

Success (background, create_prs=True):

{
  "status": "processing",
  "message": "Processing 3 errors in background",
  "count": 3
}

5. Dependency Map (3+ levels deep)

generate_fixes_batch()
├── get_error_monitor() → ErrorMonitor singleton
│   ├── ErrorMonitor.__init__() → Config.LOKI_ENABLED, Config.LOKI_QUERY_URL
│   └── ErrorMonitor.initialize() → httpx.AsyncClient(timeout=10.0)
├── get_bug_fix_generator() → BugFixGenerator singleton
│   ├── BugFixGenerator.__init__()
│   │   ├── Config.ANTHROPIC_API_KEY (required)
│   │   ├── Config.GITHUB_TOKEN (optional)
│   │   └── Config.ANTHROPIC_MODEL
│   └── BugFixGenerator.initialize() → httpx.AsyncClient + _validate_api_key()
├── monitor.error_patterns.values() → dict iteration
│   └── pattern.to_dict().get("error_type") → filter by error_ids list
├── generator.process_multiple_errors() [background or sync]
│   ├── asyncio.gather(*tasks, return_exceptions=True) → parallel processing
│   └── process_error() [per error, see #1754]
│       ├── generate_fix()
│       │   ├── analyze_error() → Claude API POST
│       │   └── _make_claude_request() → Claude API POST (fix generation)
│       ├── create_branch_and_commit() → git subprocess
│       └── create_pull_request() → GitHub API POST

6. Supabase Queries

None - No database interaction.

7. Redis Operations

None - No Redis interaction.

8. External API Calls

Service	Operation	URL	Details
Anthropic Claude API	POST	`https://api.anthropic.com/v1/messages`	Analysis per error (max_tokens=1024)
Anthropic Claude API	POST	`https://api.anthropic.com/v1/messages`	Fix generation per error (max_tokens=2048)
GitHub API	POST	`https://api.github.com/repos/{repo}/pulls`	PR creation per fix (if create_prs=True)

Note: All errors are processed in parallel via asyncio.gather(). For N errors, this makes up to 2N Claude API calls + N GitHub API calls.
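The fan-out and failure filtering can be sketched with asyncio.gather. process_many_sketch and the process_one callback are hypothetical stand-ins for process_multiple_errors() and process_error(); treating a None result as a failed fix is an assumption based on generate_fix() returning None when generation fails.

```python
import asyncio
import logging
from typing import Any, Awaitable, Callable, Optional

logger = logging.getLogger("batch_sketch")


async def process_many_sketch(
    error_ids: list[str],
    process_one: Callable[[str], Awaitable[Optional[Any]]],
) -> list[Any]:
    """Run process_one for every id concurrently and keep only successful results."""
    results = await asyncio.gather(
        *(process_one(eid) for eid in error_ids), return_exceptions=True
    )
    fixes = []
    for eid, result in zip(error_ids, results):
        if isinstance(result, BaseException):
            # return_exceptions=True hands failures back as values instead of raising
            logger.warning("fix generation failed for %s: %r", eid, result)
            continue
        if result is not None:  # None: no fix could be generated
            fixes.append(result)
    return fixes
```

Since gather preserves input order, zip pairs each result with the error_id that produced it, which keeps the log messages accurate.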

9. Prometheus Metrics

None directly.

10. Error Handling Paths

Error	Status Code	Condition
`HTTPException(404)`	404	No matching error patterns found for any of the provided error_ids
`HTTPException(500)`	500	Any unhandled exception
`RuntimeError`	500	`ANTHROPIC_API_KEY` not configured (from `get_bug_fix_generator()`)

Note: Individual error processing failures in process_multiple_errors() are caught by asyncio.gather(return_exceptions=True) and logged but do not cause the batch to fail. The response only includes successful fixes.

11. Mermaid Diagram

flowchart TD
    A[POST /error-monitor/fixes/generate-batch] --> B[get_error_monitor singleton]
    B --> C[get_bug_fix_generator singleton]
    C --> D{ANTHROPIC_API_KEY configured?}
    D -->|No| E[RuntimeError 500]
    D -->|Yes| F[Filter error_patterns by error_ids list]
    F --> G{Any patterns matched?}
    G -->|No| H[HTTPException 404]
    G -->|Yes| I{create_prs == True?}
    I -->|Yes| J[BackgroundTasks.add_task: process_multiple_errors]
    J --> K[Return status: processing, count: N]
    I -->|No| L[Synchronous: process_multiple_errors]
    L --> M[asyncio.gather - parallel processing]
    M --> N[For each error: analyze + generate fix via Claude]
    N --> O[Filter successful BugFix results]
    O --> P{Any exceptions in results?}
    P -->|Yes| Q[Log errors, continue with successes]
    P -->|No| R[Return all fixes]
    Q --> R
    R --> S[Return status: success, fixes list, count]

12. Important Notes

Error patterns are matched by checking whether pattern.to_dict().get("error_type") is in error_ids, a linear list-membership test
Parallel processing via asyncio.gather means all Claude API calls fire concurrently - could hit rate limits on the Anthropic API
Failed individual fix generations (exceptions returned by gather) are logged and excluded from the response; the response does not report which error_ids failed
error_ids is a list[str] query parameter, so the URL takes the form ?error_ids=id1&error_ids=id2&error_ids=id3. FastAPI reads a list-typed parameter from the query string only when it is declared with Query(); a bare list[str] parameter is treated as a JSON request body instead
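To send a list-valued query parameter, a client repeats the key once per element. The sketch builds such a URL with the standard library; the localhost:8000 base URL is hypothetical, and it assumes error_ids is read from the query string as the parameter table states.

```python
from urllib.parse import urlencode

# Hypothetical base URL for a locally running backend
BASE_URL = "http://localhost:8000/error-monitor/fixes/generate-batch"

query = urlencode(
    {"error_ids": ["TimeoutError", "ConnectionError"], "create_prs": "false"},
    doseq=True,  # expand the list into one error_ids=... pair per element
)
url = f"{BASE_URL}?{query}"
```

The resulting URL is sent with POST; FastAPI parses the string "false" into the boolean False for create_prs.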

Issue: #1756

API Endpoint Documentation: POST /error-monitor/monitor/start

Handler: `start_continuous_monitoring()` in `src/routes/error_monitor.py` (lines 334-351)

1. Overview

Starts a continuous background error monitoring loop that periodically scans Loki logs for errors, classifies them, and stores error patterns. The monitoring runs indefinitely until the application shuts down.

Route prefix: /error-monitor (APIRouter with tags=["error-monitor"])

2. Authentication & Middleware

Authentication: NONE - No authentication dependency
Rate Limiting: Global middleware only (IP-based via security_middleware.py)
Middleware Pipeline: Sentry → Observability → Timeout → Security → GZip → Trace → Handler

3. Request Parameters

Parameter	Type	Source	Required	Default	Validation
`interval`	`int`	Query	No	`300`	`ge=60, le=3600` (1 min to 1 hour)
`background_tasks`	`BackgroundTasks`	DI	No	`BackgroundTasks()`	FastAPI injected

4. Response Schema

{
  "status": "started",
  "interval_seconds": 300,
  "message": "Continuous monitoring started in background"
}

5. Dependency Map (3+ levels deep)

start_continuous_monitoring()
├── get_error_monitor() → ErrorMonitor singleton
│   ├── ErrorMonitor.__init__()
│   │   ├── Config.LOKI_ENABLED
│   │   └── Config.LOKI_QUERY_URL
│   └── ErrorMonitor.initialize() → httpx.AsyncClient(timeout=10.0)
└── BackgroundTasks.add_task(monitor.monitor_continuously, interval=interval)
    └── monitor_continuously(interval) [infinite loop]
        ├── ErrorMonitor.initialize() → creates new httpx session
        └── while True loop:
            ├── get_critical_errors(hours=1)
            │   ├── fetch_recent_errors(hours=1)
            │   │   └── HTTP GET to Loki: {base_url}/loki/api/v1/query_range
            │   │       └── LogQL query: '{level="ERROR"}'
            │   └── analyze_errors(raw_errors)
            │       ├── extract_error_details() → ErrorPattern dataclass
            │       ├── classify_error() → (ErrorCategory, ErrorSeverity)
            │       ├── determine_fixability() → (bool, str|None)
            │       └── group_similar_errors() → deduplicated dict
            ├── store_error_pattern() → updates in-memory error_patterns dict
            ├── get_fixable_errors(hours=1)
            │   ├── fetch_recent_errors(hours=1) → Loki HTTP GET
            │   └── analyze_errors(raw_errors)
            └── asyncio.sleep(interval)

6. Supabase Queries

None - No database interaction.

7. Redis Operations

None - No Redis interaction.

8. External API Calls

Service	Operation	URL	Details
Loki	GET	`{LOKI_QUERY_URL}/loki/api/v1/query_range`	Periodic error log queries

Loki Query Parameters:

query: {level="ERROR"}
limit: 100
direction: backward

Frequency: Every interval seconds (default 300s / 5 minutes)
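A sketch of the request parameters for one polling cycle. query, limit, and direction come from the list above; deriving start and end from the hours lookback as nanosecond Unix timestamps (a format Loki's query_range endpoint accepts) is an assumption about how the real fetch_recent_errors() bounds the time range, and build_loki_params is a hypothetical name.

```python
import time
from typing import Optional


def build_loki_params(hours: int, now_ns: Optional[int] = None) -> dict[str, str]:
    """Query parameters for GET {LOKI_QUERY_URL}/loki/api/v1/query_range."""
    end_ns = now_ns if now_ns is not None else time.time_ns()
    start_ns = end_ns - hours * 3600 * 1_000_000_000  # lookback window in nanoseconds
    return {
        "query": '{level="ERROR"}',
        "limit": "100",
        "direction": "backward",  # newest entries first
        "start": str(start_ns),
        "end": str(end_ns),
    }
```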

9. Prometheus Metrics

None directly.

10. Error Handling Paths

Error	Status Code	Condition
`HTTPException(500)`	500	Any exception during setup (before background task starts)

Background loop error handling: Within monitor_continuously():

Individual cycle errors are caught, logged with exc_info=True, and the loop continues
KeyboardInterrupt stops the loop gracefully
The finally block calls self.close() to clean up the httpx session
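The loop structure and its error handling can be sketched as below. scan_once, close, max_cycles, and sleep are hypothetical parameters added so the loop can be driven in isolation; the real method takes only interval. In this sketch, task cancellation arrives as asyncio.CancelledError, a BaseException that except Exception does not catch, so it propagates and still runs the finally cleanup.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Optional

logger = logging.getLogger("monitor_sketch")


async def monitor_continuously_sketch(
    scan_once: Callable[[], Awaitable[None]],
    close: Callable[[], Awaitable[None]],
    interval: float = 300,
    max_cycles: Optional[int] = None,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> None:
    """Run scan_once every `interval` seconds; a failed cycle is logged, not fatal."""
    cycles = 0
    try:
        while max_cycles is None or cycles < max_cycles:
            cycles += 1
            try:
                # stands in for: critical scan, store patterns, fixable scan
                await scan_once()
            except Exception:
                logger.error("monitoring cycle failed", exc_info=True)
            await sleep(interval)
    finally:
        await close()  # mirrors the documented cleanup of the httpx session
```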

Note: There is no duplicate-start prevention. Calling this endpoint multiple times creates multiple concurrent monitoring loops.

11. Mermaid Diagram

flowchart TD
    A[POST /error-monitor/monitor/start] --> B[get_error_monitor singleton]
    B --> C[Add background task: monitor_continuously]
    C --> D[Return status: started]
    
    subgraph Background Loop
        E[monitor_continuously starts] --> F[initialize httpx session]
        F --> G[Scan for critical errors from Loki]
        G --> H{Loki enabled?}
        H -->|No| I[Return empty list]
        H -->|Yes| J[HTTP GET Loki query_range]
        J --> K[Parse JSON log entries]
        K --> L[classify_error per entry]
        L --> M[group_similar_errors]
        M --> N[Store patterns in memory]
        N --> O[Scan for fixable errors]
        O --> P[asyncio.sleep interval]
        P --> G
    end

12. Important Notes

The monitoring loop runs indefinitely as a background task - it only stops when the application shuts down or a KeyboardInterrupt is received
No duplicate prevention: Multiple calls create multiple parallel monitoring loops, each making Loki queries at the configured interval
Error patterns are stored in-memory (ErrorMonitor.error_patterns dict) - lost on restart
The initialize() call inside monitor_continuously() creates a new httpx session separate from the singleton's session
Loki connectivity is required (Config.LOKI_ENABLED and Config.LOKI_QUERY_URL must be set) for the monitoring to actually find errors
Error classification supports 10 categories: PROVIDER_ERROR, DATABASE_ERROR, RATE_LIMIT_ERROR, AUTH_ERROR, TIMEOUT_ERROR, VALIDATION_ERROR, CACHE_ERROR, EXTERNAL_SERVICE_ERROR, INTERNAL_ERROR, UNKNOWN

Generated by AI documentation tool

Issue: #1757

API Endpoint Documentation: POST /error-monitor/monitor/scan

Handler: `scan_for_errors()` in `src/routes/error_monitor.py` (lines 354-395)

1. Overview

Triggers a one-time manual scan of Loki logs for errors. Analyzes and stores error patterns, and optionally kicks off automated fix generation for fixable errors in the background.

Route prefix: /error-monitor (APIRouter with tags=["error-monitor"])

2. Authentication & Middleware

Authentication: NONE - No authentication dependency
Rate Limiting: Global middleware only (IP-based via security_middleware.py)
Middleware Pipeline: Sentry → Observability → Timeout → Security → GZip → Trace → Handler

3. Request Parameters

Parameter	Type	Source	Required	Default	Validation
`hours`	`int`	Query	No	`1`	`ge=1, le=24`
`auto_fix`	`bool`	Query	No	`False`	Boolean
`background_tasks`	`BackgroundTasks`	DI	No	`BackgroundTasks()`	FastAPI injected
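A client can build the scan request from the parameter table above. This sketch assumes a base URL, and build_scan_url is a hypothetical helper. The client-side range check mirrors the server's ge=1, le=24 constraint. By default, FastAPI answers an out-of-range query value with a 422 validation error before the handler runs.

```python
from urllib.parse import urlencode


def build_scan_url(base_url: str, hours: int = 1, auto_fix: bool = False) -> str:
    """Hypothetical helper: URL for POST /error-monitor/monitor/scan."""
    # Same bounds as the server-side Query(ge=1, le=24).
    if not 1 <= hours <= 24:
        raise ValueError("hours must be between 1 and 24")
    query = urlencode({"hours": hours, "auto_fix": str(auto_fix).lower()})
    return f"{base_url.rstrip('/')}/error-monitor/monitor/scan?{query}"
```

The request has no body and no auth header, since the endpoint is unauthenticated. For example: `httpx.post(build_scan_url(base, hours=6))`.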

4. Response Schema

{
  "status": "scanned",
  "errors_found": 5,
  "hours": 1,
  "critical_errors": 2,
  "auto_fixes_started": 1
}

The auto_fixes_started field is only present when auto_fix=True and fixable errors exist.
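Because auto_fixes_started is conditional, client code should not assume the key exists. In this sketch, ScanResult and parse_scan_response are hypothetical client-side names. Treating a missing auto_fixes_started as 0 is a client choice, not something the API returns.

```python
import json
from dataclasses import dataclass


@dataclass
class ScanResult:
    errors_found: int
    critical_errors: int
    hours: int
    auto_fixes_started: int  # 0 when the field is absent


def parse_scan_response(body: str) -> ScanResult:
    data = json.loads(body)
    if data.get("status") != "scanned":
        raise ValueError(f"unexpected status: {data.get('status')!r}")
    return ScanResult(
        errors_found=data["errors_found"],
        critical_errors=data["critical_errors"],
        hours=data["hours"],
        # Present only when auto_fix=True and fixable errors existed.
        auto_fixes_started=data.get("auto_fixes_started", 0),
    )
```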

5. Dependency Map (3+ levels deep)

scan_for_errors()
├── get_error_monitor() → ErrorMonitor singleton
│   └── [see #1754 for init chain]
├── get_bug_fix_generator() → BugFixGenerator singleton
│   └── [see #1754 for init chain]
├── monitor.fetch_recent_errors(hours=hours)
│   └── HTTP GET Loki: {base_url}/loki/api/v1/query_range
│       ├── LogQL query: '{level="ERROR"}'
│       ├── Params: limit=100, direction=backward
│       └── Response: parsed JSON log entries
├── monitor.analyze_errors(raw_errors)
│   ├── extract_error_details() per error
│   │   ├── classify_error() → (ErrorCategory, ErrorSeverity)
│   │   └── regex extraction: file, line, function from stack traces
│   ├── determine_fixability() per pattern
│   │   └── Category-based rules (rate_limit→True, timeout→True, etc.)
│   └── group_similar_errors() → deduplicated by "category:message[:50]"
├── monitor.store_error_pattern() per pattern
│   └── Updates in-memory error_patterns dict (merges counts, examples)
└── [if auto_fix and fixable patterns exist]
    └── BackgroundTasks.add_task(generator.process_multiple_errors, fixable, create_prs=True)
        └── asyncio.gather → parallel process_error() calls
            ├── generate_fix() → 2x Claude API calls
            ├── create_branch_and_commit() → git subprocess
            └── create_pull_request() → GitHub API POST
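The grouping and fixability steps in the map above can be sketched as follows. The sketch assumes each raw error has already been classified into a (category, message) pair. The category names are the ten listed for ErrorMonitor in the notes on /monitor/start. Several parts are assumptions: the auto() enum values, the use of category.name in the key, and the Pattern/group_similar names. The fixable set holds only the two rules named in the map (rate_limit and timeout); the real rule set covers more categories.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Dict, Iterable, List, Tuple


class ErrorCategory(Enum):
    # The ten category names listed for ErrorMonitor; auto() values are placeholders.
    PROVIDER_ERROR = auto()
    DATABASE_ERROR = auto()
    RATE_LIMIT_ERROR = auto()
    AUTH_ERROR = auto()
    TIMEOUT_ERROR = auto()
    VALIDATION_ERROR = auto()
    CACHE_ERROR = auto()
    EXTERNAL_SERVICE_ERROR = auto()
    INTERNAL_ERROR = auto()
    UNKNOWN = auto()


# Only the two rules named in the dependency map; the real rule set is larger.
FIXABLE_CATEGORIES = {ErrorCategory.RATE_LIMIT_ERROR, ErrorCategory.TIMEOUT_ERROR}


@dataclass
class Pattern:
    category: ErrorCategory
    message: str
    count: int = 0
    examples: List[str] = field(default_factory=list)

    @property
    def fixable(self) -> bool:
        return self.category in FIXABLE_CATEGORIES


def group_similar(errors: Iterable[Tuple[ErrorCategory, str]]) -> Dict[str, Pattern]:
    """Merge errors whose category and first 50 message characters match."""
    groups: Dict[str, Pattern] = {}
    for category, message in errors:
        key = f"{category.name}:{message[:50]}"  # dedup key "category:message[:50]"
        if key not in groups:
            groups[key] = Pattern(category=category, message=message)
        groups[key].count += 1
        groups[key].examples.append(message)
    return groups
```

Two timeouts whose messages differ only after character 50 collapse into a single pattern with count=2. This keeps the number of fix attempts per scan bounded by distinct patterns, not raw log lines.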

6. Supabase Queries

None - No database interaction.

7. Redis Operations

None - No Redis interaction.

8. External API Calls

During scan (synchronous):

Service	Operation	URL	Details
Loki	GET	`{LOKI_QUERY_URL}/loki/api/v1/query_range`	Fetch error logs

During auto-fix (background, if auto_fix=True):

Service	Operation	URL	Details
Anthropic Claude API	POST	`https://api.anthropic.com/v1/messages`	Error analysis per fixable error
Anthropic Claude API	POST	`https://api.anthropic.com/v1/messages`	Fix generation per fixable error
GitHub API	POST	`https://api.github.com/repos/{repo}/pulls`	PR creation per fix
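The background work fans out with asyncio.gather, one pipeline per fixable pattern. Assuming every step succeeds, k fixable patterns therefore produce about 3k concurrent outbound requests: 2k to Anthropic and k to GitHub. The git step runs locally and is not counted. The sketch stubs the three network steps with counters instead of real HTTP calls. fix_one and fan_out_fixes are hypothetical names, not the BugFixGenerator implementation.

```python
import asyncio
from collections import Counter
from typing import List, Tuple


async def fix_one(error_id: str, calls: Counter) -> str:
    """Stubbed per-error pipeline; each step records the external call it stands for."""
    calls["anthropic:analysis"] += 1  # 1st Claude call: analyse the error
    calls["anthropic:fix"] += 1       # 2nd Claude call: generate the fix
    # create_branch_and_commit() would run git locally here (no HTTP call).
    calls["github:create_pr"] += 1    # POST https://api.github.com/repos/{repo}/pulls
    await asyncio.sleep(0)            # stand-in for network latency
    return f"pr-for-{error_id}"


async def fan_out_fixes(error_ids: List[str]) -> Tuple[List[str], Counter]:
    calls: Counter = Counter()
    # One concurrent pipeline per fixable error; gather keeps input order.
    results = await asyncio.gather(*(fix_one(e, calls) for e in error_ids))
    return list(results), calls
```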

9. Prometheus Metrics

None directly.

10. Error Handling Paths

Error	Status Code	Condition
`HTTPException(500)`	500	Any exception during scan or setup
`RuntimeError`	500	`ANTHROPIC_API_KEY` not configured (from `get_bug_fix_generator()`)

Note: get_bug_fix_generator() is called even when auto_fix=False, so the endpoint fails with 500 whenever ANTHROPIC_API_KEY is not set, regardless of the auto_fix setting.
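Stubs can reproduce the consequence of this call order. scan_eager mirrors the documented ordering. scan_lazy is a hypothetical alternative that only requests the generator when fixes will actually be queued. Neither function is the handler's code: get_generator stands in for get_bug_fix_generator(), and the Loki and analysis steps are omitted.

```python
from typing import List, Optional, Tuple


def get_generator(api_key: Optional[str]) -> str:
    """Stand-in for get_bug_fix_generator(): fails fast without an API key."""
    if not api_key:
        raise RuntimeError("ANTHROPIC_API_KEY not configured")
    return "generator"


def scan_eager(api_key: Optional[str], auto_fix: bool, fixable: List[str]) -> Tuple[int, int]:
    """Documented ordering: the generator is obtained before anything else."""
    try:
        get_generator(api_key)  # evaluated even when auto_fix=False
        queued = len(fixable) if auto_fix else 0
        return 200, queued
    except Exception:
        return 500, 0  # any exception becomes HTTPException(500)


def scan_lazy(api_key: Optional[str], auto_fix: bool, fixable: List[str]) -> Tuple[int, int]:
    """Hypothetical reordering: touch the generator only when fixes are queued."""
    try:
        queued = 0
        if auto_fix and fixable:
            get_generator(api_key)
            queued = len(fixable)
        return 200, queued
    except Exception:
        return 500, 0
```

The trade-off of the lazy ordering is that a missing key surfaces only when auto_fix=True and fixable errors exist. Plain scans keep working, but the misconfiguration is noticed later.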

11. Mermaid Diagram

flowchart TD
    A[POST /error-monitor/monitor/scan] --> B[get_error_monitor singleton]
    B --> C[get_bug_fix_generator singleton]
    C --> D{ANTHROPIC_API_KEY set?}
    D -->|No| E[RuntimeError 500]
    D -->|Yes| F[fetch_recent_errors from Loki]
    F --> G{Loki enabled?}

