API Mappings
Auto-generated deep-dive documentation for all API endpoints in gatewayz-backend. Total endpoints documented: 450 Generated: 2026-03-04
- Admin (46 endpoints)
- Analytics (5 endpoints)
- Authentication (5 endpoints)
- Chat & Messaging (20 endpoints)
- Circuit Breakers (4 endpoints)
- Code Router (5 endpoints)
- Coupons (3 endpoints)
- Credits (6 endpoints)
- Diagnostics (2 endpoints)
- Error Monitoring (12 endpoints)
- General Router (4 endpoints)
- Health & Monitoring (30 endpoints)
- Metrics & Observability (6 endpoints)
- Models & Catalog (23 endpoints)
- Other (19 endpoints)
- Status (2 endpoints)
- Users (8 endpoints)
46 endpoints
Issue: #1600
The POST /admin/add_credits endpoint allows authenticated admin users to add credits to any user's account, identified by their API key. It enforces two safety limits: a per-transaction cap (ADMIN_MAX_CREDIT_GRANT env var) and a 24-hour rolling daily limit (ADMIN_DAILY_GRANT_LIMIT env var). On success it writes to the users table (purchased_credits), logs the transaction in credit_transactions, and invalidates the user's in-memory cache entry.
Authentication: Admin role required. Uses require_admin dependency which chains: get_current_user -> get_api_key -> validate_api_key_security -> get_user -> validate_trial_expiration -> check user.role == "admin" OR user.is_admin == True.
Admin Auth Chain:
- get_api_key(): Bearer token extraction, validate_api_key_security, audit log
- get_current_user(): get_user with 5-min cache, validate_trial_expiration
- require_admin(): checks user.get("is_admin", False) OR user.get("role") == "admin"; if not: audit_logger.log_security_violation("UNAUTHORIZED_ADMIN_ACCESS"); raises 403
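The final role check in the chain can be sketched as a plain function. This is a minimal illustration that assumes the user record is a dict; `AdminRequired` and `check_admin` are hypothetical names, and the real dependency raises an HTTP 403 and writes an audit log entry instead.

```python
class AdminRequired(Exception):
    """Hypothetical stand-in for the 403 raised by require_admin."""


def check_admin(user: dict) -> dict:
    # Mirrors the documented condition: is_admin flag OR role == "admin".
    if user.get("is_admin", False) or user.get("role") == "admin":
        return user
    # The real dependency logs UNAUTHORIZED_ADMIN_ACCESS before raising 403.
    raise AdminRequired("Administrator privileges required")
```

Either field alone is enough to pass the check, so a user with `role="user"` but `is_admin=True` is still treated as an admin.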
Middleware Pipeline: SecurityMiddleware (50% Sentry sampling for /api/admin paths) -> RequestTimeoutMiddleware (55s) -> ConcurrencyMiddleware -> RequestIDMiddleware -> TraceContextMiddleware -> AutoSentryMiddleware -> CORSMiddleware -> ObservabilityMiddleware -> SelectiveGZipMiddleware -> StagingSecurityMiddleware -> DeprecationMiddleware
Request Schema (AddCreditsRequest, src/schemas/payments.py):
- api_key: str (required) - Target user's API key
- credits: float (required) - Amount to add (must be positive, validated by add_credits_to_user)
- reason: str (required, min 10 chars enforced by Pydantic schema) - Reason for grant
Response Schema:
- status: "success"
- message: str - "Added {credits} credits to user {username}"
- new_balance: float - User's balance after the credit addition
- user_id: int
- reason: str
Safety Controls:
- Per-transaction cap: req.credits > Config.ADMIN_MAX_CREDIT_GRANT -> 400
- Daily rolling limit: asyncio.to_thread(get_admin_daily_grant_total, admin_id) checks credit_transactions sum for past 24 hours; if daily_total + req.credits > Config.ADMIN_DAILY_GRANT_LIMIT -> 400
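The two safety controls above reduce to a pair of comparisons. The sketch below is an assumption-laden illustration: `check_grant_limits` is a hypothetical helper, and it raises `ValueError` where the real handler returns HTTP 400.

```python
def check_grant_limits(
    credits: float,
    daily_total: float,
    max_single_grant: float,
    daily_limit: float,
) -> None:
    """Raise ValueError if either documented limit would be exceeded."""
    # Per-transaction cap: req.credits > Config.ADMIN_MAX_CREDIT_GRANT
    if credits > max_single_grant:
        raise ValueError(
            f"Grant of {credits} exceeds per-transaction cap of {max_single_grant}"
        )
    # Rolling 24-hour limit: daily_total + req.credits > Config.ADMIN_DAILY_GRANT_LIMIT
    if daily_total + credits > daily_limit:
        raise ValueError(
            f"Grant of {credits} would exceed daily limit of {daily_limit} "
            f"(already granted {daily_total})"
        )
```

Both comparisons are strict, so a grant exactly equal to the cap, or one that brings the daily total exactly to the limit, is allowed.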
Error Codes:
- 400: credits > ADMIN_MAX_CREDIT_GRANT; OR daily_total + credits > ADMIN_DAILY_GRANT_LIMIT; OR ValueError from add_credits_to_user (credits <= 0, user not found)
- 401: Invalid/missing auth
- 403: Not admin role
- 404: Target user not found (get_user returns None)
- 500: Unexpected database error
Request -> require_admin dep (full auth chain) -> extract admin_id, max_single_grant, daily_limit -> check req.credits > max_single_grant -> 400 if exceeded -> asyncio.to_thread(get_admin_daily_grant_total, admin_id) -> check daily_total + req.credits > daily_limit -> 400 if exceeded -> asyncio.to_thread(get_user, req.api_key) -> 404 if not found -> description = req.reason -> asyncio.to_thread(add_credits_to_user, user_id, credits, "admin_credit", description, metadata={reason, admin_user_id, admin_username}, created_by="admin:{admin_id}") -> asyncio.to_thread(get_user, req.api_key) again to get updated balance -> log action -> return response
| Component | Location | Details |
|---|---|---|
| admin_add_credits() handler | src/routes/admin.py:105 | Route handler |
| require_admin dependency | src/security/deps.py:220 | Checks role="admin" or is_admin=True; logs violation if not admin |
| Config.ADMIN_MAX_CREDIT_GRANT | src/config/config.py | Max credits per single grant (env: ADMIN_MAX_CREDIT_GRANT) |
| Config.ADMIN_DAILY_GRANT_LIMIT | src/config/config.py | Max credits per admin per 24 hours (env: ADMIN_DAILY_GRANT_LIMIT) |
| get_admin_daily_grant_total() | src/db/credit_transactions.py | SELECT SUM(amount) FROM credit_transactions WHERE created_by LIKE 'admin:{admin_id}' AND created_at >= now()-24h AND transaction_type='admin_credit' |
| get_user() | src/db/users.py:407 | With 5-min in-memory cache; looks up by api_key |
| add_credits_to_user() | src/db/users.py:505 | Fetches current balances, updates purchased_credits, logs transaction |
| Supabase SELECT users | src/db/users.py:537 | SELECT subscription_allowance, purchased_credits FROM users WHERE id=user_id |
| Supabase UPDATE users | src/db/users.py:582 | UPDATE users SET purchased_credits=purchased_after, updated_at=now WHERE id=user_id |
| log_credit_transaction() | src/db/credit_transactions.py:68 | INSERT INTO credit_transactions {user_id, amount, transaction_type="admin_credit", description, balance_before, balance_after, metadata, created_by="admin:{admin_id}"} |
| invalidate_user_cache_by_id() | src/db/users.py:48 | Scans _user_cache dict and removes entries matching user_id |
| AddCreditsRequest schema | src/schemas/payments.py | api_key:str, credits:float, reason:str (min 10 chars) |
Supabase Operations:
- SELECT SUM from credit_transactions (daily grant total check)
- SELECT from api_keys_new + users (get_user lookup for target user)
- SELECT subscription_allowance, purchased_credits from users (get current balance)
- UPDATE users SET purchased_credits=new_value, updated_at=now (credit addition)
- INSERT into credit_transactions (transaction log with metadata including reason, admin_user_id, admin_username)
- SELECT from api_keys_new + users (get_user again for updated balance in response)
- DB READ: get_admin_daily_grant_total reads credit_transactions
- DB READ: get_user reads api_keys_new and users tables (x2 — before and after credit addition)
- DB WRITE: UPDATE users.purchased_credits and users.updated_at
- DB WRITE: INSERT into credit_transactions with full audit trail (amount, balances, reason, admin identity)
- Cache INVALIDATION: invalidate_user_cache_by_id() removes all _user_cache entries for the target user (in-process only, not Redis)
- Logging: logger.info with admin username, credits added, target username, reason
- No email notifications (only registration triggers welcome email)
- Audit log: audit_logger.log_api_key_usage() during auth chain; audit_logger.log_security_violation() if non-admin attempts
- ObservabilityMiddleware: Records http_requests_total{method="POST", endpoint="/admin/add_credits"} post-response
- Sentry sampling: 50% (admin endpoint)
Issue: #1601
The GET /admin/balance endpoint returns the credit balances and account timestamps for every user in the system. It is an admin-only endpoint consumed by internal tooling and the admin dashboard to get a snapshot of all user balances for financial reporting and debugging. Because it fetches every user record without pagination, it is intended for relatively small datasets and internal use only.
| Aspect | Detail |
|---|---|
| HTTP Method | GET |
| Path | /admin/balance |
| Authentication | require_admin dependency: valid Bearer API key with role=admin |
| Rate Limiting | SecurityMiddleware IP controls; 50% Sentry sampling for admin endpoints |
| Request Schema | No body or query parameters |
| Response Schema | { status, total_users, users: [{ api_key, credits, created_at, updated_at }] } |
| Error Codes | 401 (not authenticated), 403 (not admin), 500 (internal) |
| Tags | admin |
Auth chain: require_admin → get_current_user → get_api_key → validate_api_key_security → get_user → validate_trial_expiration → role check
Request lifecycle:
- require_admin validates the caller's Bearer token and confirms role=admin.
- get_all_users() is called via asyncio.to_thread and fetches all users from the Supabase users table.
- Each user's api_key, credits, created_at, and updated_at are extracted into the response list.
- Returns { status, total_users, users }.
sequenceDiagram
participant Admin
participant SEC as SecurityMiddleware
participant AUTH as require_admin
participant ROUTE as admin_get_all_balances()
participant THREAD as asyncio.to_thread
participant DB as db/users.py get_all_users()
participant SB as Supabase (users table)
Admin->>SEC: GET /admin/balance
SEC->>AUTH: pass (IP OK)
AUTH->>AUTH: validate API key + role=admin check
AUTH->>ROUTE: admin_user dict
ROUTE->>THREAD: get_all_users()
THREAD->>DB: get_all_users()
DB->>SB: SELECT api_key, credits, created_at, updated_at FROM users
SB-->>DB: all user rows
DB-->>THREAD: list of user dicts
THREAD-->>ROUTE: users list
ROUTE-->>Admin: 200 { status, total_users, users: [{ api_key, credits, ... }] }
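The response assembly at the end of this lifecycle might look like the sketch below. `build_balance_response` is a hypothetical name, and the `"success"` status value is an assumption borrowed from the other admin endpoints; the source only lists the field names.

```python
def build_balance_response(users: list[dict]) -> dict:
    # Keep only the four documented fields from each user row.
    rows = [
        {
            "api_key": user.get("api_key"),
            "credits": user.get("credits"),
            "created_at": user.get("created_at"),
            "updated_at": user.get("updated_at"),
        }
        for user in users
    ]
    # "success" is assumed here; the documented schema names the field only.
    return {"status": "success", "total_users": len(rows), "users": rows}
```

Because every row is materialized in memory before the response is returned, the payload grows linearly with the number of users.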
| Layer | Name | Purpose |
|---|---|---|
| Route file | src/routes/admin.py | Route definition and handler |
| DB module | src/db/users.py: get_all_users() | Fetches all user records from Supabase |
| Security dep | src/security/deps.py: require_admin | Admin role enforcement |
| Security | src/security/security.py: audit_logger | Audit logging on key usage and violations |
| Config | src/config/config.py | IS_DEVELOPMENT, Sentry, etc. |
| Middleware | src/middleware/security_middleware.py | IP-level rate limiting |
| Middleware | src/middleware/observability_middleware.py | Prometheus metrics |
| Stdlib | asyncio.to_thread | Offloads synchronous DB call to threadpool |
| Database | Supabase users table | Source of all user balance data |
| Side Effect | Detail |
|---|---|
| No DB writes | Read-only query |
| Audit log | audit_logger.log_api_key_usage() records admin endpoint access |
| No cache | No cache reads or writes; always fetches fresh from Supabase |
| No notifications | No emails or events emitted |
| Scale concern | Returns all users in a single response — may be slow or memory-intensive at large scale |
Issue: #1602
The GET /admin/monitor endpoint returns a comprehensive system monitoring snapshot for admins, including user counts, credit totals, API usage metrics for today and the past 30 days, and per-user activity breakdowns. It executes multiple Supabase queries in sequence against the users, activity_log, usage_records, and api_keys_new tables, merges results from both modern (activity_log) and legacy (usage_records) data sources with deduplication, and returns an aggregated monitoring payload with a timestamp.
Authentication: Admin role required. Uses require_admin dependency: get_current_user -> get_api_key -> validate_api_key_security -> get_user (5-min cache) -> validate_trial_expiration -> check role == "admin".
Middleware Pipeline: SecurityMiddleware (50% Sentry sampling) -> RequestTimeoutMiddleware (55s) -> ConcurrencyMiddleware -> RequestIDMiddleware -> TraceContextMiddleware -> AutoSentryMiddleware -> CORSMiddleware -> ObservabilityMiddleware -> SelectiveGZipMiddleware -> StagingSecurityMiddleware -> DeprecationMiddleware
Request Schema: No body, no query parameters.
Response Schema:
- status: "success" (a "warning" field is added when the data contains errors)
- timestamp: ISO datetime string
- data: object from get_admin_monitor_data() containing:
- total_users: int (exact server-side COUNT(*))
- total_credits: float (sum of all user credits)
- api_calls_today: int (count from activity_log last 24h + deduped usage_records)
- api_calls_month: int (count from activity_log last 30 days + deduped usage_records)
- tokens_today: int
- tokens_month: int
- revenue_today: float
- revenue_month: float
- users_today: int (new users in last 24h)
- recent_usage: list of recent activity records
- user_metrics: dict keyed by api_key with per-user stats
- If data contains "error" key: response includes "warning" field
Error Codes:
- 401: Invalid/missing auth
- 403: Not admin role
- 500: get_admin_monitor_data returns None or falsy; or exception
Request -> require_admin dep -> asyncio.to_thread(get_admin_monitor_data) -> multiple Supabase queries in sequence -> aggregate and merge activity_log + usage_records -> return data dict -> handler checks "error" in data -> if error: return with warning field -> else: return status="success" with data -> HTTPException 500 if monitor_data is None/falsy or on exception
| Component | Location | Details |
|---|---|---|
| admin_monitor() handler | src/routes/admin.py:205 | Route handler |
| require_admin dependency | src/security/deps.py:220 | Admin role check |
| asyncio.to_thread() | stdlib | Runs blocking get_admin_monitor_data() in thread pool |
| get_admin_monitor_data() | src/db/users.py:1481 | Orchestrates all DB queries |
| get_supabase_client() | src/config/supabase_config.py | Supabase client |
| Query 1 | users table | SELECT id WITH count="exact" -> server-side COUNT(*) for total_users |
| Query 2 | users table | SELECT id, credits, api_key LIMIT 10000 -> for credit totals and user mapping |
| Query 3 | activity_log table | SELECT id WITH count="exact" -> total_activity_count |
| Query 4 | activity_log table | SELECT * WHERE timestamp >= now-24h ORDER BY timestamp DESC LIMIT 10000 -> today's logs |
| Query 5 | activity_log table | SELECT * WHERE timestamp >= now-30days ORDER BY timestamp DESC LIMIT 50000 -> month's logs |
| Query 6 | usage_records table | SELECT * WHERE timestamp >= now-24h LIMIT 10000 -> legacy today |
| Query 7 | usage_records table | SELECT * WHERE timestamp >= now-30days LIMIT 50000 -> legacy month |
| Query 8 | api_keys_new table | SELECT user_id, api_key, is_primary LIMIT 10000 -> user_id<->api_key mapping |
| make_composite_key() | src/db/users.py:1669 | Creates dedup key: "{user_id} |
| sanitize_for_logging() | src/utils/security_validators.py | Used throughout for safe logging |
All Supabase Queries in get_admin_monitor_data():
- users.select("id", count="exact").execute(): server COUNT(*)
- users.select("id, credits, api_key").limit(10000).execute(): credit data
- activity_log.select("id", count="exact").execute(): total count
- activity_log.select("*").gte("timestamp", day_ago_iso).order("timestamp", desc=True).limit(10000).execute(): today
- activity_log.select("*").gte("timestamp", month_ago_iso).order("timestamp", desc=True).limit(50000).execute(): month
- usage_records.select("*").gte("timestamp", day_ago_iso).limit(10000).execute(): legacy today
- usage_records.select("*").gte("timestamp", month_ago_iso).limit(50000).execute(): legacy month
- api_keys_new.select("user_id, api_key, is_primary").limit(10000).execute(): key mapping
Error Handling: Each query is wrapped in its own try/except; on failure, empty lists/zeros are used and processing continues.
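The per-query isolation described above can be illustrated with a small wrapper. This is a sketch only: `run_query` is a hypothetical helper, and the real function may inline its try/except blocks rather than share one.

```python
import logging
from typing import Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def run_query(name: str, query: Callable[[], T], default: T) -> T:
    """Run one query; on any failure, log it and fall back to the default."""
    try:
        return query()
    except Exception as exc:
        # A failed query must not abort the remaining ones.
        logger.warning("Monitor query %s failed: %s", name, exc)
        return default
```

With this shape, a failing activity_log query would yield an empty list while the users and api_keys_new queries still contribute to the payload.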
- DB READ x8: Multiple SELECT queries across users, activity_log, usage_records, api_keys_new tables
- Cache READ: require_admin chain reads _user_cache for admin user
- Audit log WRITE: audit_logger.log_api_key_usage() during auth chain
- No DB writes, no Redis operations, no cache invalidations, no notifications
- ObservabilityMiddleware: Records http_requests_total{method="GET", endpoint="/admin/monitor"} post-response
- Sentry: 50% sampling rate for admin endpoints
- Performance note: This endpoint issues up to 8 sequential Supabase queries and can take 500ms-2s depending on data volume
Issue: #1603
The GET /admin/cache-status endpoint returns metadata about the in-process provider model cache, including whether it has data, how old the cache is in seconds, its configured TTL, whether it is currently valid, and how many providers are cached. This is a diagnostic endpoint for monitoring the health of the provider catalog cache.
Authentication: Admin role required. Uses require_admin dependency: get_current_user -> get_api_key -> validate_api_key_security -> get_user (5-min cache) -> validate_trial_expiration -> check role == "admin".
Middleware Pipeline: SecurityMiddleware (50% Sentry sampling) -> RequestTimeoutMiddleware (55s) -> ConcurrencyMiddleware -> RequestIDMiddleware -> TraceContextMiddleware -> AutoSentryMiddleware -> CORSMiddleware -> ObservabilityMiddleware -> SelectiveGZipMiddleware -> StagingSecurityMiddleware -> DeprecationMiddleware
Request Schema: No body, no query parameters.
Response Schema:
- status: "success"
- cache_info: object containing:
- has_data: bool - whether provider_cache["data"] is not None
- cache_age_seconds: float|None - seconds since cache was populated (None if no timestamp)
- ttl_seconds: int - configured TTL (default 1800 seconds / 30 minutes)
- is_valid: bool - cache_age_seconds is not None AND cache_age_seconds < ttl_seconds
- total_cached_providers: int - len(provider_cache["data"]) or 0
- timestamp: ISO datetime string
Error Codes:
- 401: Invalid/missing auth
- 403: Not admin role
- 500: Exception in get_provider_cache_metadata() or cache age calculation
Request -> require_admin dep -> get_provider_cache_metadata() (reads in-process _provider_cache dict) -> if timestamp exists: compute cache_age = (now - timestamp).total_seconds() else: cache_age = None -> build response with has_data, cache_age_seconds, ttl_seconds, is_valid, total_cached_providers -> return -> Exception -> log error -> raise HTTPException 500
| Component | Location | Details |
|---|---|---|
| admin_cache_status() handler | src/routes/admin.py:290 | Route handler |
| require_admin dependency | src/security/deps.py:220 | Admin role check |
| get_provider_cache_metadata() | src/services/model_catalog_cache.py | Returns the _provider_cache dict metadata |
| _provider_cache | src/main.py:103 OR src/services/model_catalog_cache.py | In-process dict: {"data": None, "timestamp": None, "ttl": 1800} |
| datetime.now(UTC) | stdlib | Used for cache age calculation: (now - provider_cache["timestamp"]).total_seconds() |
Cache Structure (provider_cache):
- data: list of provider dicts or None (populated by get_cached_providers())
- timestamp: datetime object or None (set when data was last fetched)
- ttl: int (seconds, default 1800 = 30 minutes)
is_valid calculation: cache_age is not None AND cache_age < provider_cache.get("ttl", 1800)
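Put together, the age and validity calculation could be sketched as follows. `cache_status` is a hypothetical name, and the optional `now` argument exists only to make the example deterministic.

```python
from datetime import datetime, timezone
from typing import Optional


def cache_status(provider_cache: dict, now: Optional[datetime] = None) -> dict:
    now = now or datetime.now(timezone.utc)
    data = provider_cache.get("data")
    timestamp = provider_cache.get("timestamp")
    ttl = provider_cache.get("ttl", 1800)
    # Age is undefined until the cache has been populated at least once.
    age = (now - timestamp).total_seconds() if timestamp else None
    return {
        "has_data": data is not None,
        "cache_age_seconds": age,
        "ttl_seconds": ttl,
        "is_valid": age is not None and age < ttl,
        "total_cached_providers": len(data) if data else 0,
    }
```

Note that an entry whose age equals the TTL exactly is already considered invalid, because the comparison is strict.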
Note on cache source: The _provider_cache is an in-process module-level dict in src/services/model_catalog_cache.py (or the legacy dict in src/main.py). It is populated when get_cached_providers() is called and a cache miss occurs. It is NOT stored in Redis — it is purely in-memory, meaning each process instance has its own cache.
- In-process memory READ: Reads _provider_cache dict — no I/O, extremely fast (< 1ms)
- Cache READ: require_admin chain reads _user_cache for admin user
- Audit log WRITE: audit_logger.log_api_key_usage() during auth chain
- No DB reads or writes, no Redis operations, no cache invalidations, no notifications
- ObservabilityMiddleware: Records http_requests_total{method="GET", endpoint="/admin/cache-status"} post-response
- Sentry: 50% sampling rate for admin endpoints
Issue: #1604
The GET /admin/huggingface-cache-status endpoint returns metadata about the in-process HuggingFace model cache, including age, validity, total count of cached models, and the list of all cached model IDs. It is a diagnostic endpoint for monitoring the HuggingFace catalog cache state without triggering a cache refresh.
Authentication: Admin role required. Uses require_admin dependency (same chain as other admin endpoints).
Middleware Pipeline: SecurityMiddleware (50% Sentry sampling) -> RequestTimeoutMiddleware (55s) -> ConcurrencyMiddleware -> RequestIDMiddleware -> TraceContextMiddleware -> AutoSentryMiddleware -> CORSMiddleware -> ObservabilityMiddleware -> SelectiveGZipMiddleware -> StagingSecurityMiddleware -> DeprecationMiddleware
Request Schema: No body, no query parameters.
Response Schema:
- huggingface_cache: object containing:
- age_seconds: float|None - seconds since cache was populated
- is_valid: bool - age_seconds is not None AND age_seconds < cache TTL (default 1800s)
- total_cached_models: int - len of hf_data list
- cached_model_ids: list[str] - list of model["id"] values for all cached model dicts that have an "id" key
- timestamp: ISO datetime string
Error Codes:
- 401: Invalid/missing auth
- 403: Not admin role
- 500: Exception in get_gateway_cache_metadata() or processing
Request -> require_admin dep -> get_gateway_cache_metadata("huggingface") (reads in-process gateway cache for "huggingface" key) -> if timestamp: compute cache_age = (now - timestamp).total_seconds() else: cache_age = None -> hf_data = hf_cache.get("data") or [] -> cached_ids = [m["id"] for m in hf_data if isinstance(m, dict) and m.get("id")] -> build and return response -> Exception -> log error -> raise HTTPException 500
| Component | Location | Details |
|---|---|---|
| admin_huggingface_cache_status() handler | src/routes/admin.py:317 | Route handler |
| require_admin dependency | src/security/deps.py:220 | Admin role check |
| get_gateway_cache_metadata("huggingface") | src/services/model_catalog_cache.py | Returns metadata dict for the "huggingface" gateway cache entry |
| Gateway cache structure | src/services/model_catalog_cache.py | In-process dict keyed by gateway name: {"data": list or None, "timestamp": datetime or None, "ttl": int} |
| datetime.now(UTC) | stdlib | Used to compute cache_age |
| hf_data filtering | src/routes/admin.py:327 | List comprehension: [model.get("id") for model in hf_data if isinstance(model, dict) and model.get("id")] |
get_gateway_cache_metadata("huggingface"): Returns dict with keys:
- data: list of HuggingFace model dicts (each with "id" key at minimum) or None
- timestamp: datetime when data was last fetched, or None
- ttl: int seconds (default 1800)
In-process cache: The gateway cache is module-level in src/services/model_catalog_cache.py. Each process instance has an independent cache — no Redis involved for this status check. The cache is populated when get_cached_models() fetches HuggingFace data.
cached_model_ids extraction: Only models that are dicts AND have a truthy "id" field are included. Malformed entries (None, non-dict, or missing "id") are silently skipped.
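A small example of that filtering, using the comprehension from the request flow. `extract_model_ids` is a hypothetical wrapper name added for illustration.

```python
def extract_model_ids(hf_cache: dict) -> list[str]:
    # A missing or None "data" entry is treated as an empty list.
    hf_data = hf_cache.get("data") or []
    # Skip anything that is not a dict or lacks a truthy "id".
    return [m["id"] for m in hf_data if isinstance(m, dict) and m.get("id")]
```

An empty-string `id` is dropped as well, since the filter tests truthiness rather than key presence.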
- In-process memory READ: Reads gateway cache dict — no I/O, extremely fast
- Cache READ: require_admin chain reads _user_cache for admin user
- Audit log WRITE: audit_logger.log_api_key_usage() during auth chain
- No DB reads or writes, no Redis operations, no cache invalidations, no notifications
- ObservabilityMiddleware: Records http_requests_total{method="GET", endpoint="/admin/huggingface-cache-status"} post-response
- Sentry: 50% sampling rate for admin endpoints
- Note: cached_model_ids can be very large (1000+ items) for a warm HuggingFace cache; response payload may be substantial
Issue: #1605
The GET /admin/debug-models endpoint is a diagnostic tool for administrators to inspect the state of the model and provider caches. It retrieves the first 3 models and 3 providers from the in-process caches, tests provider-slug matching for the first 2 models, and returns cache metadata (timestamps, ages) for both the OpenRouter gateway cache (used as the main models cache proxy) and the provider cache. This is used to debug model catalog and provider resolution issues.
Authentication: Admin role required. Uses require_admin dependency.
Middleware Pipeline: SecurityMiddleware (50% Sentry sampling) -> RequestTimeoutMiddleware (55s) -> ConcurrencyMiddleware -> RequestIDMiddleware -> TraceContextMiddleware -> AutoSentryMiddleware -> CORSMiddleware -> ObservabilityMiddleware -> SelectiveGZipMiddleware -> StagingSecurityMiddleware -> DeprecationMiddleware
Request Schema: No body, no query parameters.
Response Schema:
- models_cache: object containing:
- total_models: int - total count of all cached models
- sample_models: list - first 3 model dicts from cache
- cache_timestamp: datetime|None
- cache_age_seconds: float|None
- providers_cache: object containing:
- total_providers: int
- sample_providers: list - first 3 provider dicts from cache
- cache_timestamp: datetime|None
- cache_age_seconds: float|None
- provider_matching_test: list of objects (up to 2) containing:
- model_id: str
- provider_slug: str|None (part before "/" in model_id)
- found_provider: bool
- provider_site_url: str|None
- provider_data: dict|None - full matching provider dict
- timestamp: ISO datetime string
Error Codes:
- 401: Invalid/missing auth
- 403: Not admin role
- 500: Exception (raises HTTPException with detail message)
Request -> require_admin dep -> asyncio.to_thread(get_cached_models) -> asyncio.to_thread(get_cached_providers) -> sample_models = models[:3], sample_providers = providers[:3] -> provider matching test: for each of first 2 models, extract provider_slug = model_id.split("/")[0], linear scan providers list for slug match -> get_gateway_cache_metadata("openrouter") -> get_provider_cache_metadata() -> compute cache ages -> build and return response -> Exception -> log error -> raise HTTPException 500
| Component | Location | Details |
|---|---|---|
| admin_debug_models() handler | src/routes/admin.py:418 | Route handler |
| require_admin dependency | src/security/deps.py:220 | Admin role check |
| asyncio.to_thread() | stdlib | Runs blocking get_cached_models() and get_cached_providers() in thread pool |
| get_cached_models() | src/services/models.py | Returns list of all models from in-process cache or fetches from DB/API if stale |
| get_cached_providers() | src/services/providers.py | Returns list of all providers from in-process cache or fetches from DB if stale |
| get_gateway_cache_metadata("openrouter") | src/services/model_catalog_cache.py | Returns metadata for the "openrouter" gateway cache entry (used as proxy for models cache) |
| get_provider_cache_metadata() | src/services/model_catalog_cache.py | Returns provider cache metadata dict |
| Provider slug matching | src/routes/admin.py:432-453 | Linear O(n) scan: for provider in providers: if provider.get("slug") == provider_slug: break |
Provider Matching Logic:
- provider_slug = model_id.split("/")[0] if "/" in model_id else None
- Linear scan of all providers looking for provider.get("slug") == provider_slug
- First match wins (break)
- O(n) complexity where n = total provider count
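The matching logic above can be sketched as a standalone function. `match_provider` is a hypothetical name, the `site_url` key is an assumption about the provider dict, and skipping the scan when the model ID has no slash is a choice made for this sketch.

```python
def match_provider(model_id: str, providers: list[dict]) -> dict:
    # The slug is the part before the first "/", if there is one.
    provider_slug = model_id.split("/")[0] if "/" in model_id else None
    found = None
    if provider_slug is not None:
        # Linear scan; the first provider with a matching slug wins.
        for provider in providers:
            if provider.get("slug") == provider_slug:
                found = provider
                break
    return {
        "model_id": model_id,
        "provider_slug": provider_slug,
        "found_provider": found is not None,
        # "site_url" is an assumed key name for the provider's URL.
        "provider_site_url": found.get("site_url") if found else None,
        "provider_data": found,
    }
```

Since the endpoint only tests the first two models, the linear scan is cheap; a dict keyed by slug would be the natural alternative if this ran over the whole catalog.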
Cache metadata sources:
- Models: get_gateway_cache_metadata("openrouter") — treats OpenRouter cache as proxy for main model list
- Providers: get_provider_cache_metadata() — dedicated provider cache
- In-process cache READ: get_cached_models() and get_cached_providers() may trigger DB/API fetches if caches are stale
- Potential DB READ: If model or provider cache is stale, underlying fetch functions may query the database
- Cache READ: require_admin chain reads _user_cache for admin user
- Audit log WRITE: audit_logger.log_api_key_usage() during auth chain
- No direct DB writes, no Redis operations (unless cache fetch triggers them), no notifications
- ObservabilityMiddleware: Records http_requests_total{method="GET", endpoint="/admin/debug-models"} post-response
- Sentry: 50% sampling rate for admin endpoints
- Note: sample_models and provider_data in provider_matching_test include full raw model/provider dicts which may contain sensitive configuration data
Issue: #1606
The GET /admin/test-huggingface/{hugging_face_id} endpoint is a diagnostic tool that fetches raw model data from the HuggingFace API for a specific model ID, caches the result in Redis with a 1-hour TTL, and returns the raw API response alongside extracted author data. It is used to debug HuggingFace API connectivity and validate the data structure returned for specific models.
Authentication: Admin role required. Uses require_admin dependency.
Middleware Pipeline: SecurityMiddleware (50% Sentry sampling) -> RequestTimeoutMiddleware (55s) -> ConcurrencyMiddleware -> RequestIDMiddleware -> TraceContextMiddleware -> AutoSentryMiddleware -> CORSMiddleware -> ObservabilityMiddleware -> SelectiveGZipMiddleware -> StagingSecurityMiddleware -> DeprecationMiddleware
Path Parameters:
- hugging_face_id: str (default: "openai/gpt-oss-120b") - The HuggingFace model ID in "author/model-name" format
Response Schema (on success):
- hugging_face_id: str
- raw_response: dict - Complete raw JSON from HuggingFace API
- author_data_extracted: object containing:
- has_author_data: bool - whether hf_data["author_data"] exists and is truthy
- author_data: dict|None - raw author_data from HuggingFace response
- author: str|None - hf_data.get("author")
- extracted_author_data: object containing:
- name: str|None - from author_data["name"]
- fullname: str|None - from author_data["fullname"]
- avatar_url: str|None - from author_data["avatarUrl"]
- follower_count: int - from author_data["followerCount"] (default 0)
- timestamp: ISO datetime string
Error Codes:
- 401: Invalid/missing auth
- 403: Not admin role
- 404: HuggingFace model not found (fetch_huggingface_model returns None) OR HuggingFace API returns 404
- 500: Network error, timeout, or unexpected exception
Request -> require_admin dep -> fetch_huggingface_model(hugging_face_id) -> httpx.get("https://huggingface.co/api/models/{id}", timeout=10.0) -> response.raise_for_status() -> parse JSON -> try: get_redis_manager().set_json("huggingface:model:{id}", model_data, ttl=3600) (warning logged if fails) -> return model_data -> if 404: log warning, return None -> if other HTTP error: log error, return None -> if fetch returns None: raise HTTPException 404 -> build response with raw_response and extracted author_data -> return -> HTTPException 404 re-raised -> Exception -> log error -> raise HTTPException 500
| Component | Location | Details |
|---|---|---|
| admin_test_huggingface() handler | src/routes/admin.py:364 | Route handler |
| require_admin dependency | src/security/deps.py:220 | Admin role check |
| fetch_huggingface_model() | src/services/models.py:2557 | Synchronous blocking HTTP fetch from HuggingFace API |
| httpx.get() | httpx library | Synchronous GET to https://huggingface.co/api/models/{hugging_face_id} with 10.0s timeout |
| response.raise_for_status() | httpx | Raises HTTPStatusError for 4xx/5xx responses |
| get_redis_manager() | src/config/redis_config.py | Returns Redis manager instance |
| redis_manager.set_json() | src/config/redis_config.py | Redis SET with JSON serialization; key="huggingface:model:{hugging_face_id}", TTL=3600s (1 hour) |
| Author data extraction | src/routes/admin.py:379-407 | Direct dict access on raw HuggingFace response for author, author_data, author_data.name/fullname/avatarUrl/followerCount |
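The author extraction step might be sketched as below. `extract_author_data` is a hypothetical name, the nesting of `extracted_author_data` is inferred from the flattened response schema, and `.get()` is used here for safety where the handler is described as using direct dict access.

```python
def extract_author_data(hf_data: dict) -> dict:
    raw_author_data = hf_data.get("author_data")
    # Fall back to an empty dict so the field lookups below never fail.
    author_data = raw_author_data or {}
    return {
        "has_author_data": bool(raw_author_data),
        "author_data": raw_author_data,
        "author": hf_data.get("author"),
        "extracted_author_data": {
            "name": author_data.get("name"),
            "fullname": author_data.get("fullname"),
            "avatar_url": author_data.get("avatarUrl"),
            "follower_count": author_data.get("followerCount", 0),
        },
    }
```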
External API Call:
- URL: https://huggingface.co/api/models/{hugging_face_id}
- Method: GET (synchronous via httpx.get)
- Timeout: 10.0 seconds
- No authentication (public HuggingFace API)
- Response: JSON dict with HuggingFace model metadata
Redis Operation:
- Key pattern: huggingface:model:{hugging_face_id} (e.g., "huggingface:model:openai/gpt-oss-120b")
- Operation: SET with JSON serialization
- TTL: 3600 seconds (1 hour)
- Failure handling: try/except with warning log; a failed cache write does not fail the request
Note: fetch_huggingface_model runs synchronously (blocking). When called from the async handler it blocks the event loop. The handler does NOT wrap it in asyncio.to_thread(), which is a potential performance issue for slow HuggingFace responses.
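One way to avoid blocking the event loop is the same asyncio.to_thread pattern the other admin handlers use. The sketch below is illustrative only: `fetch_blocking` is a hypothetical stand-in for fetch_huggingface_model() that sleeps instead of making an HTTP request, and `handler` is not the real route function.

```python
import asyncio
import time


def fetch_blocking(model_id: str) -> dict:
    # Stand-in for the synchronous fetch; sleeps to mimic a slow HTTP call.
    time.sleep(0.05)
    return {"id": model_id}


async def handler(model_id: str) -> dict:
    # Run the blocking call in a worker thread so other requests keep progressing.
    return await asyncio.to_thread(fetch_blocking, model_id)
```

Calling `asyncio.run(handler("openai/gpt-oss-120b"))` returns the fetched dict while the loop stays free to serve other coroutines during the wait.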
- External HTTP GET: Calls https://huggingface.co/api/models/{hugging_face_id} — synchronous, blocks event loop
- Redis WRITE: SET huggingface:model:{hugging_face_id} with 3600s TTL (best-effort, failure is non-fatal)
- Cache READ: require_admin chain reads _user_cache for admin user
- Audit log WRITE: audit_logger.log_api_key_usage() during auth chain
- No DB reads or writes, no in-process cache changes, no notifications
- ObservabilityMiddleware: Records http_requests_total{method="GET", endpoint="/admin/test-huggingface/{hugging_face_id}"} post-response
- Sentry: 50% sampling rate for admin endpoints
- raw_response: The full HuggingFace API response is returned verbatim — may be large for models with many metadata fields
Issue: #1607
The GET /admin/trial/analytics endpoint returns aggregated trial analytics including conversion rates, usage statistics, and trial status breakdowns. It first checks Redis for a cached result (TTL 300 seconds / 5 minutes); on a cache miss it fetches all api_keys_new records page by page to collect trial data, computes the analytics in Python, caches the result in Redis, and returns it. This endpoint is designed for monitoring trial user behavior and conversion metrics.
Authentication: Admin role required. Uses require_admin dependency.
Middleware Pipeline: SecurityMiddleware (50% Sentry sampling) -> RequestTimeoutMiddleware (55s) -> ConcurrencyMiddleware -> RequestIDMiddleware -> TraceContextMiddleware -> AutoSentryMiddleware -> CORSMiddleware -> ObservabilityMiddleware -> SelectiveGZipMiddleware -> StagingSecurityMiddleware -> DeprecationMiddleware
Request Schema: No body, no query parameters.
Response Schema:
- success: bool - always True on success
- analytics: object containing:
- total_trials: int - count of api_keys_new rows where is_trial=True
- active_trials: int - trial keys where trial_end_date > now
- expired_trials: int - trial keys where trial_end_date <= now or missing
- converted_trials: int - trial keys where trial_converted=True
- conversion_rate: float (rounded 2 decimal places) - converted_trials/total_trials*100
- usage_statistics: object with total_tokens_used, total_requests_used, total_credits_used, total_credits_allocated, credits_utilization_rate
- average_usage_per_trial: object with tokens, requests, credits (all rounded 2 decimal places)
- trial_status_breakdown: object with active, expired, converted, pending_conversion counts
- OR on error: {"error": str}
Error Codes:
- 401: Invalid/missing auth
- 403: Not admin role
- 500: Exception from get_trial_analytics() (raises HTTPException)
Request -> require_admin dep -> get_trial_analytics() called -> try Redis GET "trial:analytics:summary" -> if cached: json.loads() and return immediately -> else: get_supabase_client() -> paginated loop: SELECT is_trial, trial_converted, trial_start_date, trial_end_date, trial_used_tokens, trial_used_requests, trial_used_credits, trial_credits, subscription_status FROM api_keys_new RANGE 0-999, then 1000-1999, etc. until < page_size rows -> filter trial_keys = [k for k in all if k.get("is_trial")] -> compute analytics in Python -> try Redis SET "trial:analytics:summary" json_data TTL=300 (warning if fails) -> return analytics_data -> handler returns {"success": True, "analytics": analytics}
| Component | Location | Details |
|---|---|---|
| get_trial_analytics_admin() handler | src/routes/admin.py:520 | Route handler |
| require_admin dependency | src/security/deps.py:220 | Admin role check |
| get_trial_analytics() | src/db/trials.py:196 | Core analytics function |
| CACHE_KEY | src/db/trials.py:198 | "trial:analytics:summary" |
| CACHE_TTL | src/db/trials.py:199 | 300 seconds (5 minutes) |
| get_redis_config() | src/config/redis_config.py | Returns Redis configuration/client wrapper |
| redis_config.get_cache(CACHE_KEY) | Redis | GET "trial:analytics:summary" -> returns bytes or None |
| json.loads(cached_data) | stdlib | Deserializes cached analytics |
| get_supabase_client() | src/config/supabase_config.py | Supabase client |
| Paginated SELECT loop | api_keys_new table | SELECT is_trial, trial_converted, trial_start_date, trial_end_date, trial_used_tokens, trial_used_requests, trial_used_credits, trial_credits, subscription_status RANGE(offset, offset+999) until < 1000 rows |
| trial_keys filter | src/db/trials.py:247 | [k for k in all_trial_stats if k.get("is_trial", False)] |
| date parsing | src/db/trials.py:256-277 | datetime.fromisoformat() with Z->+00:00 replacement; naive datetimes assumed UTC |
| tag_wrapper | src/services/pyroscope_config.py | Pyroscope profiling tags for cache operations |
| redis_config.set_cache() | Redis | SET "trial:analytics:summary" json_str EX 300 |
Redis Operations:
- GET "trial:analytics:summary" — check for cached result
- SET "trial:analytics:summary" EX 300 — cache computed result for 5 minutes (best-effort)
Supabase Query:
- Table: api_keys_new
- Operation: SELECT (paginated)
- Columns: is_trial, trial_converted, trial_start_date, trial_end_date, trial_used_tokens, trial_used_requests, trial_used_credits, trial_credits, subscription_status
- No filters (fetches ALL rows across all pages)
- Page size: 1000 rows per page
- Continues until page returns < 1000 rows
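The paginated fetch described above can be sketched as a loop that requests inclusive ranges until a short page comes back. This is an illustration only: fetch_all_rows, make_fake_table and the in-memory table are hypothetical names standing in for the Supabase range query, and the real loop in src/db/trials.py may differ.

```python
from typing import Callable

Row = dict
PageFetcher = Callable[[int, int], list[Row]]


def fetch_all_rows(fetch_page: PageFetcher, page_size: int = 1000) -> list[Row]:
    """Collect every row by requesting inclusive ranges until a page is short."""
    rows: list[Row] = []
    offset = 0
    while True:
        # Ranges are inclusive on both ends: 0-999, 1000-1999, ...
        page = fetch_page(offset, offset + page_size - 1)
        rows.extend(page)
        if len(page) < page_size:
            break
        offset += page_size
    return rows


def make_fake_table(n: int) -> tuple[PageFetcher, list[tuple[int, int]]]:
    """Build an in-memory stand-in for the table plus a log of requested ranges."""
    table = [{"id": i, "is_trial": i % 2 == 0} for i in range(n)]
    calls: list[tuple[int, int]] = []

    def fetch_page(start: int, end: int) -> list[Row]:
        calls.append((start, end))
        return table[start : end + 1]

    return fetch_page, calls
```

Note that when the row count is an exact multiple of the page size, this stopping rule costs one extra request that returns an empty page.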
Python Aggregation (in-memory after fetch):
- Filter: is_trial == True
- Count active vs expired by comparing trial_end_date to now (UTC)
- Count conversions: trial_converted == True
- Sum tokens, requests, credits from trial_keys
- Redis READ: GET "trial:analytics:summary" on every request
- Redis WRITE: SET "trial:analytics:summary" EX 300 on cache miss (best-effort)
- DB READ: Paginated SELECT from api_keys_new (all rows, no filter) on cache miss
- Cache READ: require_admin chain reads _user_cache for admin user
- Audit log WRITE: audit_logger.log_api_key_usage() during auth chain
- Pyroscope tagging: tag_wrapper adds profiling context for cache read and write operations
- No DB writes, no notifications, no in-process cache changes
- ObservabilityMiddleware: Records http_requests_total{method="GET", endpoint="/admin/trial/analytics"} post-response
- Sentry: 50% sampling rate for admin endpoints
- Performance: On cache miss, fetches ALL api_keys_new rows in pages — can be expensive at scale
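The in-memory aggregation for this endpoint can be sketched as follows. The field names come from the response schema above; parse_timestamp and summarize_trials are hypothetical helper names, the zero-division guard is an assumption, and only the count fields are shown, so treat this as an outline rather than the code in src/db/trials.py.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_timestamp(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO timestamp; accept a trailing Z and treat naive values as UTC."""
    if not value:
        return None
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def summarize_trials(rows: list[dict], now: datetime) -> dict:
    """Compute the count fields of the analytics object from raw key rows."""
    trial_keys = [k for k in rows if k.get("is_trial", False)]
    total = len(trial_keys)
    active = 0
    for key in trial_keys:
        end = parse_timestamp(key.get("trial_end_date"))
        if end is not None and end > now:
            active += 1
    converted = sum(1 for k in trial_keys if k.get("trial_converted"))
    # Assumption: report 0.0 instead of dividing by zero when there are no trials.
    rate = round(converted / total * 100, 2) if total else 0.0
    return {
        "total_trials": total,
        "active_trials": active,
        "expired_trials": total - active,  # ended or missing end date
        "converted_trials": converted,
        "conversion_rate": rate,
    }
```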
Issue: #1608
The GET /admin/users/growth endpoint returns daily cumulative user registration counts over a specified time period, designed to power user growth charts in admin dashboards. It queries the users table for created_at timestamps in the date range, groups registrations by day, adds a pre-period baseline count, and computes a growth rate percentage. It falls back to the registration_date column if created_at fails, and returns empty data arrays rather than errors on query failure.
Authentication: Admin role required. Uses require_admin dependency.
Middleware Pipeline: SecurityMiddleware (50% Sentry sampling) -> RequestTimeoutMiddleware (55s) -> ConcurrencyMiddleware -> RequestIDMiddleware -> TraceContextMiddleware -> AutoSentryMiddleware -> CORSMiddleware -> ObservabilityMiddleware -> SelectiveGZipMiddleware -> StagingSecurityMiddleware -> DeprecationMiddleware
Query Parameters:
- days: int, ge=1, le=365, default=30 - Number of days to analyze
Response Schema:
- status: "success"
- days: int - days parameter used
- start_date: str (ISO date YYYY-MM-DD)
- end_date: str (ISO date YYYY-MM-DD)
- data: list of objects {date: str YYYY-MM-DD, value: int (cumulative), new_users: int (daily)}
- total: int - cumulative total at end of period (includes pre-period users)
- growth_rate: float (rounded 2 decimal places) - percentage growth from first to last day
- timestamp: ISO datetime string
If both the primary and fallback queries fail, the endpoint returns the same schema with data=[], total=0, growth_rate=0.
Error Codes:
- 401: Invalid/missing auth
- 403: Not admin role
- 500: Unexpected exception (raises HTTPException with "Failed to get user growth data")
Request -> require_admin dep -> compute end_date = today, start_date = today - (days-1) days -> try: query users.created_at in range -> if fails: fallback to registration_date query -> if fallback fails: return empty data -> initialize daily_data dict {date_str: 0} for each day in range -> count registrations per day from query results -> try: query COUNT(*) for users created before start_date (baseline) -> cumulative_total = baseline_count -> iterate sorted days: cumulative_total += new_users_today, append {date, value, new_users} -> compute growth_rate = (end - start) / start * 100 if len >= 2 -> return response -> Exception -> log with traceback -> raise HTTPException 500
| Component | Location | Details |
|---|---|---|
| get_user_growth() handler | src/routes/admin.py:531 | Route handler |
| require_admin dependency | src/security/deps.py:220 | Admin role check |
| get_supabase_client() | src/config/supabase_config.py | Supabase client (imported inline) |
| Primary query | users table | SELECT created_at FROM users WHERE created_at >= start_date AND created_at <= end_date ORDER BY created_at ASC |
| Fallback query | users table | SELECT registration_date FROM users WHERE registration_date >= start_date AND registration_date <= end_date ORDER BY registration_date ASC |
| Baseline count query | users table | SELECT id count="exact" FROM users WHERE created_at < start_date |
| Date parsing | src/routes/admin.py:626-643 | datetime.fromisoformat() with Z->+00:00 replacement; invalid dates logged and skipped |
| Growth rate calculation | src/routes/admin.py:676-681 | (end_value - start_value) / start_value * 100 if start_value > 0 else 0 |
Supabase Queries:
- Primary: users.select("created_at").gte("created_at", start_date.isoformat()).lte("created_at", end_date.isoformat()).order("created_at", desc=False).execute()
- Fallback: users.select("registration_date").gte("registration_date", start_date.isoformat()).lte("registration_date", end_date.isoformat()).order("registration_date", desc=False).execute() (maps registration_date to created_at)
- Baseline: users.select("id", count="exact").lt("created_at", start_date.isoformat()).execute(); uses count_result.count for server-side COUNT(*)
Daily aggregation algorithm:
- Initialize dict: {each_date_in_range: 0}
- For each user registration: increment daily_data[date_key]
- cumulative_total starts at baseline count (users before start_date)
- Iterate sorted dates: cumulative_total += daily_count, append to result
Growth rate: Compares cumulative_data[0]["value"] (day 1 total) to cumulative_data[-1]["value"] (final day total). Returns 0 if start_value is 0 or fewer than 2 data points.
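The daily aggregation and growth-rate rules above can be sketched in a few lines. build_growth_series and growth_rate are hypothetical names, and registration dates are passed in as already-parsed date objects; the real handler works on Supabase rows and may differ in detail.

```python
from datetime import date, timedelta


def build_growth_series(
    start_date: date, end_date: date, registrations: list[date], baseline: int
) -> list[dict]:
    """Return one {date, value, new_users} entry per day, cumulative from baseline."""
    # Initialize every day in the range to zero so gaps still appear in the chart.
    daily: dict[str, int] = {}
    day = start_date
    while day <= end_date:
        daily[day.isoformat()] = 0
        day += timedelta(days=1)

    for registered in registrations:
        key = registered.isoformat()
        if key in daily:
            daily[key] += 1

    cumulative = baseline  # users created before start_date
    series = []
    for key in sorted(daily):
        cumulative += daily[key]
        series.append({"date": key, "value": cumulative, "new_users": daily[key]})
    return series


def growth_rate(series: list[dict]) -> float:
    """Percentage change from the first day's total to the last day's total."""
    if len(series) < 2:
        return 0.0
    start_value = series[0]["value"]
    end_value = series[-1]["value"]
    if start_value <= 0:
        return 0.0
    return round((end_value - start_value) / start_value * 100, 2)
```

Because the first data point already includes day-one registrations, the rate measures growth from the end of day one, not from the pre-period baseline.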
- DB READ x2-3: Primary query, optionally fallback query, baseline COUNT query (all on users table)
- Cache READ: require_admin chain reads _user_cache for admin user
- Audit log WRITE: audit_logger.log_api_key_usage() during auth chain
- No DB writes, no Redis operations, no cache invalidations, no notifications
- Error logging: traceback.format_exc() on unexpected exceptions
- ObservabilityMiddleware: Records http_requests_total{method="GET", endpoint="/admin/users/growth"} post-response
- Sentry: 50% sampling rate for admin endpoints
- Note: baseline COUNT query may be slow on large user tables without an index on created_at; primary and fallback queries return all matching rows (no limit) — can be large for high-growth periods
Issue: #1609
The GET /admin/users/count endpoint returns only the total number of users in the database, using a server-side COUNT(*) query (via Supabase count="exact"). It is designed for dashboard counters that need a number rather than user data, with typical response times of 5-20ms.
Authentication: Admin role required. Uses require_admin dependency.
Middleware Pipeline: SecurityMiddleware (50% Sentry sampling) -> RequestTimeoutMiddleware (55s) -> ConcurrencyMiddleware -> RequestIDMiddleware -> TraceContextMiddleware -> AutoSentryMiddleware -> CORSMiddleware -> ObservabilityMiddleware -> SelectiveGZipMiddleware -> StagingSecurityMiddleware -> DeprecationMiddleware
Request Schema: No body, no query parameters.
Response Schema:
- count: int - Total number of users (0 if the query returns no count)
- timestamp: ISO datetime string
Error Codes:
- 401: Invalid/missing auth
- 403: Not admin role
- 500: Exception from Supabase query (raises HTTPException with "Failed to get users count")
Request -> require_admin dep -> get_supabase_client() -> users.select("id", count="exact").execute() -> total_count = count_result.count if count_result.count is not None else 0 -> return {"count": total_count, "timestamp": now} -> Exception -> log error -> raise HTTPException 500
| Component | Location | Details |
|---|---|---|
| get_users_count() handler | src/routes/admin.py:702 | Route handler |
| require_admin dependency | src/security/deps.py:220 | Admin role check |
| get_supabase_client() | src/config/supabase_config.py | Imported inline in handler |
| Supabase query | users table | SELECT id FROM users with count="exact" -> server-side COUNT(*), returns count attribute |
| datetime.now(UTC).isoformat() | stdlib | Timestamp for response |
Supabase Query Details:
- Table: users
- Operation: SELECT with count="exact"
- Columns: id (minimal column to minimize data transfer, only count is used)
- No filters — counts ALL users
- count="exact" triggers a server-side COUNT(*) in PostgreSQL, not row fetching
- Returns: count_result.count attribute (integer or None)
- Fallback: 0 if count_result.count is None
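Why count="exact": This is the PostgREST way to get an accurate PostgreSQL COUNT(*). Without it, the handler would have to count returned rows, and Supabase returns at most 1000 rows per request by default, so the result would be capped. With count="exact", the handler reads only count_result.count and ignores row data, which keeps the endpoint lightweight.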
- DB READ: Single COUNT(*) query on users table — extremely lightweight, no row data fetched
- Cache READ: require_admin chain reads _user_cache for admin user
- Audit log WRITE: audit_logger.log_api_key_usage() during auth chain
- No DB writes, no Redis operations, no cache invalidations, no notifications
- ObservabilityMiddleware: Records http_requests_total{method="GET", endpoint="/admin/users/count"} post-response
- Sentry: 50% sampling rate for admin endpoints
Issue: #1610
The GET /admin/users/stats endpoint returns aggregated user statistics without returning user data. It runs up to 5 separate Supabase queries (count, roles, active/inactive status, credits, subscription breakdown) with optional filters for email, API key, and is_active status. It is designed for dashboard stats cards that need counts and aggregates, not user records — approximately 10-50ms vs 500ms+ for the full /admin/users list.
Authentication: Admin role required. Uses require_admin dependency.
Middleware Pipeline: SecurityMiddleware (50% Sentry sampling) -> RequestTimeoutMiddleware (55s) -> ConcurrencyMiddleware -> RequestIDMiddleware -> TraceContextMiddleware -> AutoSentryMiddleware -> CORSMiddleware -> ObservabilityMiddleware -> SelectiveGZipMiddleware -> StagingSecurityMiddleware -> DeprecationMiddleware
Query Parameters:
- email: str|None - Case-insensitive partial match (ilike %email%)
- api_key: str|None - Case-insensitive partial match on api_keys_new.api_key (ilike %api_key%) — triggers JOIN to api_keys_new
- is_active: bool|None - Filter by users.is_active column
Response Schema:
- status: "success"
- total_users: int - COUNT of matching users
- filters_applied: {email, api_key, is_active} showing applied filter values
- statistics: object containing:
- active_users: int - users where is_active is True (boolean True, not truthy)
- inactive_users: int - total_users - active_users
- admin_users: int - users with role="admin"
- developer_users: int - users with role="developer"
- regular_users: int - users with role="user" or role=None
- total_credits: float (rounded 2 decimal places) - sum of all users.credits
- average_credits: float (rounded 2 decimal places)
- subscription_breakdown: dict keyed by subscription_status value -> count
- timestamp: ISO datetime string
Error Codes:
- 401: Invalid/missing auth
- 403: Not admin role
- 500: Exception from any query (raises HTTPException with "Failed to get users statistics")
Request -> require_admin dep -> compute email_pattern = "%{email}%" if email else None -> Query 1: count_query SELECT id [JOIN api_keys_new if api_key] count="exact" with filters -> total_users = count_result.count -> Query 2: role_query SELECT role [JOIN api_keys_new if api_key] LIMIT 100000 with filters -> count admin/developer/regular roles -> Query 3: status_query SELECT is_active [JOIN api_keys_new if api_key] LIMIT 100000 with filters -> count active (is_active is True) -> Query 4: credits_query SELECT credits [JOIN api_keys_new if api_key] LIMIT 100000 with filters -> sum credits -> Query 5: subscription_query SELECT subscription_status [JOIN api_keys_new if api_key] LIMIT 100000 with filters -> group by subscription_status -> build and return response
| Component | Location | Details |
|---|---|---|
| get_users_stats() handler | src/routes/admin.py:736 | Route handler |
| require_admin dependency | src/security/deps.py:220 | Admin role check |
| get_supabase_client() | src/config/supabase_config.py | Imported inline |
Supabase Query 1 - Count:
- Without api_key: users.select("id", count="exact") with .ilike("email", pattern) and/or .eq("is_active", is_active)
- With api_key: users.select("id, api_keys_new!inner(api_key)", count="exact") with .ilike("api_keys_new.api_key", "%api_key%")
- Returns: count_result.count
Supabase Query 2 - Roles:
- users.select("role" [+ join]).limit(100000) with same filters
- Python aggregation: sum(1 for u in role_data if u.get("role") == "admin"), "developer", or "user"/None
Supabase Query 3 - Status:
- users.select("is_active" [+ join]).limit(100000) with same filters
- active_users = sum(1 for u in status_data if u.get("is_active") is True); strict True check, not truthy
Supabase Query 4 - Credits:
- users.select("credits" [+ join]).limit(100000) with same filters
- total_credits = sum(float(u.get("credits", 0)) for u in credits_data)
- avg_credits = round(total_credits / total_users, 2)
Supabase Query 5 - Subscriptions:
- users.select("subscription_status" [+ join]).limit(100000) with same filters
- subscription_stats = {status: count} dict
JOIN pattern (when api_key provided): api_keys_new!inner(api_key) — INNER JOIN on api_keys_new table, filter: .ilike("api_keys_new.api_key", f"%{api_key}%")
Email filter: Uses PostgreSQL ILIKE with %{email}% pattern — case-insensitive partial match anywhere in email string
Important note on active_users: Uses u.get("is_active") is True (identity check), not u.get("is_active") (truthiness). This means users with is_active=1 (integer) would NOT be counted as active. Only Python bool True matches.
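The difference between the identity check and a truthiness check is easy to see on a small sample. The rows below are made-up data and both function names are hypothetical; only the `is True` expression is taken from the description above.

```python
def count_active_strict(rows: list[dict]) -> int:
    """Count rows whose is_active is exactly the bool True (identity check)."""
    return sum(1 for u in rows if u.get("is_active") is True)


def count_active_truthy(rows: list[dict]) -> int:
    """Count rows whose is_active is merely truthy (1, non-empty strings, ...)."""
    return sum(1 for u in rows if u.get("is_active"))


sample_rows = [
    {"is_active": True},    # counted by both
    {"is_active": 1},       # truthy, but not the bool True
    {"is_active": "true"},  # truthy string, not the bool True
    {"is_active": False},
    {"is_active": None},
    {},                     # missing column
]

strict_count = count_active_strict(sample_rows)
truthy_count = count_active_truthy(sample_rows)
```

With this sample the strict count is 1 and the truthy count is 3, so any non-boolean encoding of is_active would silently shift users into inactive_users.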
- DB READ x5: Five separate sequential Supabase queries on users table (with optional api_keys_new JOIN)
- Cache READ: require_admin chain reads _user_cache for admin user
- Audit log WRITE: audit_logger.log_api_key_usage() during auth chain
- No DB writes, no Redis operations, no cache invalidations, no notifications
- ObservabilityMiddleware: Records http_requests_total{method="GET", endpoint="/admin/users/stats"} post-response
- Sentry: 50% sampling rate for admin endpoints
- Performance: LIMIT 100000 on roles/status/credits/subscription queries — fetches up to 100k rows per query which may be significant memory usage on large databases
Issue: #1611
The GET /admin/users endpoint returns a paginated, filterable list of user records for admin consumption. For email-only searches it uses a PostgreSQL RPC function (search_users_by_email) for performance on Cloudflare-hosted instances; for complex filters it falls back to a standard query with JOIN on api_keys_new. It returns user identity and status fields without statistics (see /admin/users/stats for aggregates) and supports pagination up to 10,000 records per page.
Authentication: Admin role required. Uses require_admin dependency.
Middleware Pipeline: SecurityMiddleware (50% Sentry sampling) -> RequestTimeoutMiddleware (55s) -> ConcurrencyMiddleware -> RequestIDMiddleware -> TraceContextMiddleware -> AutoSentryMiddleware -> CORSMiddleware -> ObservabilityMiddleware -> SelectiveGZipMiddleware -> StagingSecurityMiddleware -> DeprecationMiddleware
Query Parameters:
- email: str|None - Case-insensitive partial match (ilike %email%)
- api_key: str|None - Case-insensitive partial match on api_keys_new.api_key
- is_active: bool|None - Filter by active status
- limit: int, ge=1, le=10000, default=100 - Records per page
- offset: int, ge=0, default=0 - Records to skip
Fast Path (email-only, no api_key, no is_active filter):
- Calls RPC function: search_users_by_email(search_term, result_limit, result_offset)
- Returns: total_count from first row, user records without total_count field
- On RPC failure: raises HTTPException 500 with message about missing RPC function (does NOT fall back to standard query for email-only to avoid Cloudflare crashes)
Standard Path (any other combination):
- Count query with optional JOIN
- Data query with optional JOIN, specific column selection
- Pagination via .range(offset, offset+limit-1) and .order("created_at", desc=True)
Response Schema:
- status: "success"
- total_users: int - total matching the filters
- has_more: bool - (offset + limit) < total_users
- pagination: {limit, offset, current_page (offset//limit)+1, total_pages}
- filters_applied: {email, api_key, is_active}
- users: list of user dicts (cleaned of api_keys_new join data)
- timestamp: ISO datetime string
User dict columns (standard path): id, username, email, credits, is_active, role, registration_date, auth_method, subscription_status, trial_expires_at, created_at, updated_at (+ api_key field stripped from JOIN)
Error Codes:
- 401: Invalid/missing auth
- 403: Not admin role
- 500: RPC failure for email-only; or data query failure; or unexpected exception
Request -> require_admin dep -> check filter combination -> if email AND NOT api_key AND is_active is None: RPC path -> client.rpc("search_users_by_email", {search_term, result_limit, result_offset}) -> if RPC fails: raise HTTPException 500 -> extract total_count from first row -> clean total_count from user dicts -> return -> else: Standard path -> count query (with optional api_keys_new JOIN and filters) -> data query (with optional JOIN, column selection, filters) -> sort by created_at desc, range pagination -> clean api_keys_new from user dicts -> build response
| Component | Location | Details |
|---|---|---|
| get_all_users_info() handler | src/routes/admin.py:942 | Route handler |
| require_admin dependency | src/security/deps.py:220 | Admin role check |
| get_supabase_client() | src/config/supabase_config.py | Imported inline |
Fast Path - RPC Query:
- Function: search_users_by_email
- Parameters: {search_term: email, result_limit: limit, result_offset: offset}
- Returns: rows with all user columns + total_count field on each row
- total_users = users_data[0]["total_count"] if users_data else 0
- Users cleaned by: {k: v for k, v in user.items() if k != "total_count"}
Standard Path - Count Query:
- Without api_key: users.select("id", count="exact")
- With api_key: users.select("id, api_keys_new!inner(api_key)", count="exact")
- Filters: .ilike("email", f"%{email}%"), .ilike("api_keys_new.api_key", f"%{api_key}%"), .eq("is_active", is_active)
- On count failure: logs error, falls back to total_users = 0
Standard Path - Data Query:
- Without api_key: users.select("id, username, email, credits, is_active, role, registration_date, auth_method, subscription_status, trial_expires_at, created_at, updated_at")
- With api_key: above + ", api_keys_new!inner(api_key)"
- Same filters applied
- Order: .order("created_at", desc=True)
- Pagination: .range(offset, offset + limit - 1)
- Users cleaned: {k: v for k, v in user.items() if k != "api_keys_new"}
has_more calculation: (offset + limit) < total_users
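The pagination fields in the response can be derived from total_users, limit and offset as sketched below. has_more and current_page follow the formulas given in this section; the total_pages formula (ceiling of total_users / limit) is an assumption, since the source does not spell it out, and pagination_fields is a hypothetical helper name.

```python
import math


def pagination_fields(total_users: int, limit: int, offset: int) -> dict:
    """Build the has_more flag and pagination object for a page of results."""
    return {
        "has_more": (offset + limit) < total_users,
        "pagination": {
            "limit": limit,
            "offset": offset,
            "current_page": (offset // limit) + 1,
            # Assumption: total_pages is the ceiling of total_users / limit.
            "total_pages": math.ceil(total_users / limit),
        },
    }
```

For example, with 250 matching users and limit=100, offset=100 is page 2 of 3 with has_more true, and offset=200 is page 3 of 3 with has_more false.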
- DB READ (fast path): 1 RPC call (search_users_by_email function) with pagination
- DB READ (standard path): 2 sequential queries (count + data) with optional JOIN
- Cache READ: require_admin chain reads _user_cache for admin user
- Audit log WRITE: audit_logger.log_api_key_usage() during auth chain
- No DB writes, no Redis operations, no cache invalidations, no notifications
- Error logging: traceback.format_exc() on unexpected exceptions
- ObservabilityMiddleware: Records http_requests_total{method="GET", endpoint="/admin/users"} post-response
- Sentry: 50% sampling rate for admin endpoints
- Note: api_keys_new JOIN is an INNER JOIN meaning users without any api_keys_new entries are excluded from api_key filter results; limit up to 10000 can return large payloads
Issue: #1612
The GET /admin/users/{user_id} endpoint retrieves comprehensive information about a specific user by their numeric ID, including all user record fields, all associated API keys from api_keys_new, the 10 most recent usage_records entries, and the 10 most recent activity_log entries. It runs 4 sequential Supabase queries and returns a unified response; usage_records and activity_log failures are silently swallowed returning empty arrays.
Authentication: Admin role required. Uses require_admin dependency.
Path Parameters:
- user_id: int - The numeric user ID (primary key in users table)
Middleware Pipeline: SecurityMiddleware (50% Sentry sampling) -> RequestTimeoutMiddleware (55s) -> ConcurrencyMiddleware -> RequestIDMiddleware -> TraceContextMiddleware -> AutoSentryMiddleware -> CORSMiddleware -> ObservabilityMiddleware -> SelectiveGZipMiddleware -> StagingSecurityMiddleware -> DeprecationMiddleware
Request Schema: No body, no query parameters.
Response Schema:
- status: "success"
- user: dict - Full users table row (all columns including sensitive data like api_key, email, credits)
- api_keys: list - All rows from api_keys_new WHERE user_id=? (all columns including raw api_key strings)
- recent_usage: list - Up to 10 rows from usage_records WHERE user_id=? ORDER BY created_at DESC LIMIT 10; [] on query failure
- recent_activity: list - Up to 10 rows from activity_log WHERE user_id=? ORDER BY created_at DESC LIMIT 10; [] on query failure
- timestamp: ISO datetime string
Error Codes:
- 401: Invalid/missing auth
- 403: Not admin role
- 404: users table query returns no rows for given user_id
- 500: Exception in users or api_keys query (raises HTTPException "Failed to get user information")
Request -> require_admin dep -> get_supabase_client() -> Query 1: SELECT * FROM users WHERE id=user_id -> if no data: raise 404 -> user = data[0] -> Query 2: SELECT * FROM api_keys_new WHERE user_id=user_id -> api_keys = data or [] -> try: Query 3: SELECT * FROM usage_records WHERE user_id=user_id ORDER BY created_at DESC LIMIT 10 -> except: recent_usage = [] -> try: Query 4: SELECT * FROM activity_log WHERE user_id=user_id ORDER BY created_at DESC LIMIT 10 -> except: recent_activity = [] -> return {status, user, api_keys, recent_usage, recent_activity, timestamp} -> HTTPException re-raised -> except Exception: log error -> raise HTTPException 500
| Component | Location | Details |
|---|---|---|
| get_user_info_by_id() handler | src/routes/admin.py:1509 | Route handler |
| require_admin dependency | src/security/deps.py:220 | Admin role check |
| get_supabase_client() | src/config/supabase_config.py | Imported inline |
Supabase Query 1 - User:
- users.select("*").eq("id", user_id).execute()
- Returns all users columns
- 404 if no rows returned
Supabase Query 2 - API Keys:
- api_keys_new.select("*").eq("user_id", user_id).execute()
- Returns all api_keys_new columns for all keys owned by this user
- Returns [] if no keys found (api_keys_result.data is None or empty)
- Note: Returns raw api_key strings in plaintext
Supabase Query 3 - Usage Records (legacy):
- usage_records.select("*").eq("user_id", user_id).order("created_at", desc=True).limit(10).execute()
- Returns up to 10 most recent usage records
- Silent failure: broad except Exception: recent_usage = [] with no logging
Supabase Query 4 - Activity Log:
- activity_log.select("*").eq("user_id", user_id).order("created_at", desc=True).limit(10).execute()
- Returns up to 10 most recent activity entries
- Silent failure: broad except Exception: recent_activity = [] with no logging
Note on silent failures: Both the usage_records and activity_log queries catch Exception broadly and set the result to [] without logging, so in the response a failed query is indistinguishable from a user with no records.
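One way to keep the empty-list fallback while making failures visible is to log inside the except block. The sketch below is a suggestion under assumptions, not the current handler code: fetch_or_empty and the "admin" logger name are hypothetical, and query stands for any zero-argument callable that runs a Supabase query and returns its rows.

```python
import logging
from typing import Callable

logger = logging.getLogger("admin")  # hypothetical logger name


def fetch_or_empty(label: str, query: Callable[[], list]) -> list:
    """Run an optional query; on failure log a warning and fall back to []."""
    try:
        return query()
    except Exception:
        # exc_info=True attaches the traceback so the failure can be diagnosed.
        logger.warning("Query for %s failed; returning empty list", label, exc_info=True)
        return []
```

The response shape stays the same, but operators can now tell a failing usage_records or activity_log query apart from a user with no history.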
- DB READ x4: users (1 row), api_keys_new (all user keys), usage_records (LIMIT 10), activity_log (LIMIT 10) — sequential queries
- Cache READ: require_admin chain reads _user_cache for admin user
- Audit log WRITE: audit_logger.log_api_key_usage() during auth chain
- No DB writes, no Redis operations, no cache invalidations, no notifications
- ObservabilityMiddleware: Records http_requests_total{method="GET", endpoint="/admin/users/{user_id}"} post-response
- Sentry: 50% sampling rate for admin endpoints
- Security note: Response includes all API key strings in plaintext and all user data including sensitive fields — treat as highly sensitive
Issue: #1613
The GET /admin/users/by-api-key endpoint performs an exact-match lookup to find which user owns a specific API key. It uses a PostgreSQL RPC function (search_user_by_api_key) for fast indexed lookup and returns a slim user object with key identity and status fields. This is designed for support workflows where an admin needs to find a user from a known API key. Requires the full exact API key — partial matching is not supported.
Authentication: Admin role required. Uses require_admin dependency.
Middleware Pipeline: SecurityMiddleware (50% Sentry sampling) -> RequestTimeoutMiddleware (55s) -> ConcurrencyMiddleware -> RequestIDMiddleware -> TraceContextMiddleware -> AutoSentryMiddleware -> CORSMiddleware -> ObservabilityMiddleware -> SelectiveGZipMiddleware -> StagingSecurityMiddleware -> DeprecationMiddleware
Query Parameters:
- api_key: str (required, ...) - The complete API key to look up (exact match, no partial matching)
Response Schema:
- status: "success"
- user: object containing:
- id: int|None - user_data.get("user_id") — note: RPC returns "user_id" which is mapped to "id"
- username: str|None
- email: str|None
- credits: float - default 0
- is_active: bool - default True
- role: str - default "user"
- subscription_status: str - default "trial"
- created_at: str|None
- timestamp: ISO datetime string
Error Codes:
- 400: Pydantic validation (api_key is required, cannot be empty)
- 401: Invalid/missing auth
- 403: Not admin role
- 404: RPC returns empty result (no user with this exact API key)
- 500: Exception from RPC call or data processing
Request -> require_admin dep -> get_supabase_client() -> log lookup (first 20 chars of api_key) -> client.rpc("search_user_by_api_key", {"search_api_key": api_key}).execute() -> if no data or empty: raise 404 "No user found with API key: {api_key[:20]}..." -> user_data = result.data[0] -> build user dict mapping RPC field names to response field names -> return {status, user, timestamp} -> HTTPException re-raised -> Exception -> log error -> raise HTTPException 500
| Component | Location | Details |
|---|---|---|
| get_user_by_api_key() handler | src/routes/admin.py:1294 | Route handler |
| require_admin dependency | src/security/deps.py:220 | Admin role check |
| get_supabase_client() | src/config/supabase_config.py | Imported inline |
| search_user_by_api_key RPC | Supabase PostgreSQL function | Exact match lookup: presumably does SELECT u.id as user_id, u.username, u.email, u.credits, u.is_active, u.role, u.subscription_status, u.created_at FROM api_keys_new ak JOIN users u ON u.id = ak.user_id WHERE ak.api_key = search_api_key LIMIT 1 |
RPC Call Details:
- Function name: search_user_by_api_key
- Parameters: {"search_api_key": api_key} — full exact string match
- Expected response: list with 0 or 1 rows; each row contains: user_id, username, email, credits, is_active, role, subscription_status, created_at
- Empty list or None data -> 404
Field Mapping (RPC result -> response):
- user_data.get("user_id") -> user["id"]
- user_data.get("username") -> user["username"]
- user_data.get("email") -> user["email"]
- user_data.get("credits", 0) -> user["credits"]
- user_data.get("is_active", True) -> user["is_active"]
- user_data.get("role", "user") -> user["role"]
- user_data.get("subscription_status", "trial") -> user["subscription_status"]
- user_data.get("created_at") -> user["created_at"]
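The field mapping above amounts to a small dictionary construction. The sketch below mirrors the listed .get() calls and defaults; build_user_summary is a hypothetical function name, and the sample row in the usage note is made up.

```python
def build_user_summary(user_data: dict) -> dict:
    """Map one RPC result row to the slim user object returned by the endpoint."""
    return {
        "id": user_data.get("user_id"),  # RPC field user_id is exposed as id
        "username": user_data.get("username"),
        "email": user_data.get("email"),
        "credits": user_data.get("credits", 0),
        "is_active": user_data.get("is_active", True),
        "role": user_data.get("role", "user"),
        "subscription_status": user_data.get("subscription_status", "trial"),
        "created_at": user_data.get("created_at"),
    }
```

Calling build_user_summary({"user_id": 7, "email": "a@example.com"}) yields id 7 with the defaults credits 0, is_active True, role "user" and subscription_status "trial". Note that dict.get only applies a default when the key is absent, so a row that contains an explicit null keeps None.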
404 message: f"No user found with API key: {api_key[:20]}..." — truncates to first 20 chars to avoid logging full key
Logging: logger.info(f"Looking up user by API key: {api_key[:20]}...") — first 20 chars logged before lookup
Performance: Documented as ~10-20ms (indexed lookup via RPC). The RPC function uses an index on api_keys_new.api_key for O(log n) lookup.
- DB READ: 1 RPC call (search_user_by_api_key PostgreSQL function) — uses indexed exact match on api_keys_new.api_key
- Cache READ: require_admin chain reads _user_cache for admin user
- Audit log WRITE: audit_logger.log_api_key_usage() during auth chain
- No DB writes, no Redis operations, no in-process cache changes, no notifications
- Logging: First 20 characters of provided api_key are logged at INFO level
- ObservabilityMiddleware: Records http_requests_total{method="GET", endpoint="/admin/users/by-api-key"} post-response
- Sentry: 50% sampling rate for admin endpoints
- Route ordering note: This route (/admin/users/by-api-key) must be registered BEFORE /admin/users/{user_id} in FastAPI's router. Routes are matched in registration order, so if the parameterized route came first it would capture "by-api-key" as user_id and reject the request with a validation error (it is not a valid integer) instead of reaching this handler.
Issue: #1614
This admin-only endpoint retrieves complete details for a specific API key identified by its numeric database ID. It performs a single joined Supabase query against api_keys_new and users, returning full key metadata (including the plaintext key string) and the owning user profile. The endpoint is intended for admin tooling, support workflows, and audit inspection of specific API keys.
Authentication chain (Depends(require_admin) in src/security/deps.py):
- require_admin calls get_current_user(user) (deps.py:220)
- get_current_user calls get_api_key(credentials) (deps.py:74)
- get_api_key: extracts Bearer token; in development (Config.IS_DEVELOPMENT) returns dev key bypassing validation; otherwise calls validate_api_key_security(api_key, client_ip, referer) from src/security/security.py, then audit_logger.log_api_key_usage(user_id, key_id, endpoint, ip, user_agent)
- validate_api_key_security: checks key active status, expiration date, request limits, IP allowlist membership, domain referrer restrictions
- get_current_user: calls validate_trial_expiration(user) from src/utils/trial_utils.py; raises HTTP 402 if trial expired
- require_admin: checks user.get("is_admin", False) or user.get("role") == "admin"; if the caller is not an admin, calls audit_logger.log_security_violation(UNAUTHORIZED_ADMIN_ACCESS, user_id) and raises HTTP 403
Path parameter: api_key_id: int — numeric primary key of the api_keys_new row
Request body: None
Response (200 OK):
{
"status": "success",
"api_key": {
"id": int,
"api_key": str, // Full plaintext API key string
"key_name": str | null,
"environment_tag": str, // e.g. "live", "test", "staging"
"is_active": bool,
"is_primary": bool,
"scope_permissions": dict,
"max_requests": int | null,
"requests_used": int,
"ip_allowlist": list,
"domain_referrers": list,
"created_at": str,
"updated_at": str,
"last_used_at": str | null,
"expiration_date": str | null,
"user": {
"id": int,
"email": str,
"username": str,
"credits": float,
"is_active": bool,
"role": str,
"subscription_status": str,
"created_at": str
}
},
"timestamp": str // UTC ISO 8601
}
Error codes:
| Code | Condition |
|---|---|
| 401 | Missing/invalid Authorization header; key inactive or expired |
| 402 | Caller's own trial period has expired |
| 403 | Caller is not an admin (role != "admin" and is_admin != True) |
| 404 | No row in api_keys_new with the given api_key_id |
| 500 | Supabase query exception or unexpected error |
Middleware effects:
- SecurityMiddleware (src/middleware/security_middleware.py): IP-based rate limiting, behavioral analysis, velocity mode checks. Authenticated admin users are exempt from IP-level limits.
- OpenTelemetry tracing middleware: span created for request lifecycle
- Sentry middleware: uncaught exceptions automatically captured
- GZip middleware: response compressed when client supports it
flowchart TD
A([GET /admin/api-keys/id]) --> B[SecurityMiddleware\nIP rate limit check]
B --> C{Credentials present?}
C -->|No| D[HTTP 401]
C -->|Yes| E[validate_api_key_security\nactive/expiry/IP/domain]
E -->|Invalid| F[HTTP 401 or 403]
E -->|Valid| G[validate_trial_expiration]
G -->|Expired| H[HTTP 402]
G -->|OK| I{user.role == admin?}
I -->|No| J[HTTP 403\n+ audit log UNAUTHORIZED]
I -->|Yes| K[get_supabase_client]
K --> L[SELECT api_keys_new.*\n+ users!inner JOIN\nWHERE id=api_key_id]
L -->|No rows| M[HTTP 404]
L -->|Exception| N[HTTP 500\nlogger.error]
L -->|Row found| O[Pop nested users dict\nBuild response_data]
O --> P[Return 200 JSON\n{status, api_key, timestamp}]
| Dependency | File | Operation | Details |
|---|---|---|---|
| require_admin | src/security/deps.py:220 | Auth | Calls get_current_user chain; checks admin role |
| get_api_key | src/security/deps.py:74 | Auth | Extracts Bearer token; dev bypass if IS_DEVELOPMENT |
| validate_api_key_security | src/security/security.py | Auth | Checks: key exists, is_active=True, not expired, requests_used < max_requests, client IP in ip_allowlist (if set), referer domain in domain_referrers (if set) |
| validate_trial_expiration | src/utils/trial_utils.py | Auth | Raises HTTP 402 if user.trial_expires_at < now and user is trial user |
| audit_logger.log_api_key_usage | src/security/security.py | Side effect | Writes to audit log: user_id, key_id, endpoint path, IP, user-agent |
| audit_logger.log_security_violation | src/security/security.py | Side effect | Writes UNAUTHORIZED_ADMIN_ACCESS violation entry |
| get_supabase_client | src/config/supabase_config.py | DB connection | Returns singleton PostgREST client |
| api_keys_new table | Supabase | SELECT | client.table("api_keys_new").select("*, users!inner(id, email, username, credits, is_active, role, subscription_status, created_at)").eq("id", api_key_id).execute() |
| users table | Supabase | JOIN | Inner join via foreign key in above query; columns: id, email, username, credits, is_active, role, subscription_status, created_at |
- No database writes — this is a pure read endpoint.
- Audit log: audit_logger.log_api_key_usage fires on every authenticated request, recording user_id, key_id, endpoint, client IP, and user-agent.
- Security violation audit log (conditional): audit_logger.log_security_violation(UNAUTHORIZED_ADMIN_ACCESS, ...) fires when a non-admin user attempts this endpoint.
- No Redis operations on this endpoint path.
- No Prometheus metrics emitted directly by this handler (middleware-level request count and latency histograms still apply).
- Sensitive data disclosure: The full plaintext API key string is included in the response. This endpoint requires admin authentication and must only be served over HTTPS in production.
Issue: #1615
This admin-only endpoint retrieves all credit transactions across all users with comprehensive filtering, sorting, and pagination capabilities. It delegates to get_all_transactions() in src/db/credit_transactions.py, which queries the credit_transactions table directly with optional filters. Unlike the per-user endpoint, this admin view can span all accounts and optionally include a per-user transaction summary.
Authentication chain: Same Depends(require_admin) chain as documented for Issue #1614.
Query Parameters:
| Parameter | Type | Default | Validation | Description |
|---|---|---|---|---|
| limit | int | 50 | 1–1000 | Max transactions to return |
| offset | int | 0 | >= 0 | Skip N transactions (pagination) |
| user_id | int | null | optional | Filter to specific user |
| transaction_type | str | null | optional | One of: trial, purchase, api_usage, admin_credit, admin_debit, refund, bonus, transfer |
| from_date | str | null | optional | YYYY-MM-DD or ISO format start date |
| to_date | str | null | optional | YYYY-MM-DD or ISO format end date |
| min_amount | float | null | optional | Minimum absolute amount filter |
| max_amount | float | null | optional | Maximum absolute amount filter |
| direction | str | null | "credit" or "charge" | credit = positive amounts, charge = negative amounts |
| payment_id | int | null | optional | Filter by payment record ID |
| sort_by | str | "created_at" | "created_at", "amount", "transaction_type" | Sort field |
| sort_order | str | "desc" | "asc" or "desc" | Sort direction |
| include_summary | bool | false | optional | Include per-user summary (only when user_id provided) |
Handler-level validation (raises HTTP 400):
- direction not in ("credit", "charge")
- sort_by not in ("created_at", "amount", "transaction_type")
- sort_order not in ("asc", "desc")
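A minimal sketch of these three checks, assuming direction is only validated when it is supplied (its documented default is null). The function name `validate_listing_params` and the error wording are hypothetical; the real handler raises HTTP 400 rather than returning messages.

```python
ALLOWED_DIRECTIONS = ("credit", "charge")
ALLOWED_SORT_FIELDS = ("created_at", "amount", "transaction_type")
ALLOWED_SORT_ORDERS = ("asc", "desc")


def validate_listing_params(direction=None, sort_by="created_at", sort_order="desc"):
    """Return a list of validation errors; an empty list means the input is valid.

    Illustrative stand-in for the handler-level checks: a non-empty result
    corresponds to the documented HTTP 400 response.
    """
    errors = []
    # Assumption: a missing direction means "no direction filter" and is accepted.
    if direction is not None and direction not in ALLOWED_DIRECTIONS:
        errors.append(f"direction must be one of {ALLOWED_DIRECTIONS}")
    if sort_by not in ALLOWED_SORT_FIELDS:
        errors.append(f"sort_by must be one of {ALLOWED_SORT_FIELDS}")
    if sort_order not in ALLOWED_SORT_ORDERS:
        errors.append(f"sort_order must be one of {ALLOWED_SORT_ORDERS}")
    return errors
```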
Response (200 OK):
{
"transactions": [
{
"id": int,
"user_id": int,
"amount": float,
"transaction_type": str,
"description": str,
"balance_before": float,
"balance_after": float,
"created_at": str,
"payment_id": int | null,
"metadata": dict,
"created_by": str | null
}
],
"pagination": {
"total": int, // count in current page (not total in DB)
"limit": int,
"offset": int,
"has_more": bool // true if len(results) == limit
},
"filters_applied": { ... all filter params ... },
"summary": { ... } // only if include_summary=true AND user_id provided
}
Error codes:
| Code | Condition |
|---|---|
| 400 | Invalid direction, sort_by, or sort_order value |
| 401 | Invalid/missing credentials |
| 402 | Caller trial expired |
| 403 | Not admin |
| 500 | DB query failure |
flowchart TD
A([GET /admin/credit-transactions]) --> B[require_admin auth chain]
B -->|fail| C[401/402/403]
B -->|OK| D{Validate direction\nsort_by\nsort_order}
D -->|invalid| E[HTTP 400]
D -->|valid| F[get_all_transactions\nsrc/db/credit_transactions.py]
F --> G[get_supabase_client]
G --> H[SELECT * FROM credit_transactions]
H --> I{user_id filter?}
I -->|yes| J[.eq user_id]
I -->|no| K[all users]
J --> L{transaction_type?}
K --> L
L --> M{date range?}
M --> N{direction filter?}
N --> O{payment_id?}
O --> P[Apply sort order]
P --> Q{min/max amount?}
Q -->|yes| R[Fetch all, filter client-side\nthen paginate in Python]
Q -->|no| S[DB-side range pagination]
R --> T[Format transactions]
S --> T
T --> U{include_summary\nAND user_id?}
U -->|yes| V[get_transaction_summary\nuser_id, dates]
U -->|no| W[Build response]
V --> W
W --> X[Return 200]
| Dependency | File | Operation | Details |
|---|---|---|---|
| require_admin | src/security/deps.py:220 | Auth | Full chain as documented |
| get_all_transactions | src/db/credit_transactions.py:290 | DB read | Queries credit_transactions table with all filters applied |
| get_supabase_client | src/config/supabase_config.py | DB conn | PostgREST client |
| credit_transactions table | Supabase | SELECT * | Filters: eq(user_id), eq(transaction_type), gte/lte(created_at), gt/lt(amount for direction), eq(payment_id). Sort: order(sort_by, desc=bool). Pagination: range(offset, offset+limit-1) if no min/max amount; else client-side slicing |
| get_transaction_summary | src/db/credit_transactions.py:492 | DB read | Called only when include_summary=True and user_id provided. SELECT * FROM credit_transactions WHERE user_id=X and date filters. Computes: total_transactions, total_credits_added, total_credits_used, net_change, by_type breakdown, daily_breakdown, largest_credit, largest_charge, average_transaction, transaction_count_by_direction |
| TransactionType class | src/db/credit_transactions.py:21 | Constants | Defines: TRIAL, PURCHASE, ADMIN_CREDIT, ADMIN_DEBIT, API_USAGE, REFUND, BONUS, TRANSFER, SUBSCRIPTION_RENEWAL/CANCELLATION/UPGRADE/DOWNGRADE |
Key implementation detail — min/max amount filtering:
When min_amount or max_amount is provided, get_all_transactions cannot use DB-side pagination efficiently. It fetches ALL matching rows first (no LIMIT), then filters by abs(float(amount)) in Python, then applies slice [offset:offset+limit]. For large datasets this can be expensive.
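The client-side path can be sketched as follows. This is an illustration of the described behavior (filter on abs(float(amount)), then slice), not the code in src/db/credit_transactions.py; the function name `filter_and_paginate` and the inclusive bounds are assumptions.

```python
def filter_and_paginate(rows, min_amount=None, max_amount=None, limit=50, offset=0):
    """Filter rows by absolute amount in Python, then apply offset/limit.

    `rows` stands for the full, already sorted result set fetched without
    a LIMIT. Bounds are treated as inclusive, which is an assumption.
    """
    def in_range(row):
        value = abs(float(row["amount"]))  # charges are negative, so compare magnitudes
        if min_amount is not None and value < min_amount:
            return False
        if max_amount is not None and value > max_amount:
            return False
        return True

    matching = [row for row in rows if in_range(row)]
    # Pagination happens only after every row has been loaded and filtered.
    return matching[offset:offset + limit]
```

Because the slice is taken after filtering, memory use grows with the number of rows that match the other filters, not with limit.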
- No database writes.
- Audit log: audit_logger.log_api_key_usage fires on every authenticated call.
- Performance warning: When min_amount/max_amount filters are used, all matching rows are loaded into Python memory before pagination, which can be expensive on large datasets.
- Summary performance: When include_summary=True with no user_id, the summary is skipped and a warning is logged (logger.warning) to prevent expensive full-table aggregation. The response itself gives no indication that the summary was omitted.
- No Redis operations.
- No direct Prometheus metrics.
Issue: #1616
This admin endpoint retrieves paginated chat completion request records with flexible multi-field filtering. It queries the chat_completion_requests table with inner joins to models and providers, executing two queries per request (data fetch + count). It provides a full view of every recorded inference call for analytics and monitoring purposes.
Authentication chain: Depends(require_admin) — same full chain as documented in Issue #1614.
Query Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| model_id | int | null | Filter by model ID (exact match on chat_completion_requests.model_id) |
| provider_id | int | null | Filter by provider ID (via models.provider_id join) |
| model_name | str | null | Filter by model name (case-insensitive contains match) |
| start_date | str | null | ISO format start date filter (gte on created_at) |
| end_date | str | null | ISO format end date filter (lte on created_at) |
| limit | int | 100 | 1–100000 max records |
| offset | int | 0 | Pagination offset |
Response (200 OK):
{
"success": true,
"data": [
{
// All columns from chat_completion_requests
// Plus nested: models.{id, model_id, model_name, provider_model_id, provider_id,
// providers.{id, name, slug}}
}
],
"metadata": {
"total_count": int,
"limit": int,
"offset": int,
"returned_count": int,
"filters": {
"model_id": int|null,
"provider_id": int|null,
"model_name": str|null,
"start_date": str|null,
"end_date": str|null
},
"timestamp": str
}
}
Error codes:
| Code | Condition |
|---|---|
| 401 | Invalid/missing credentials |
| 402 | Trial expired |
| 403 | Not admin |
| 500 | Supabase query failure |
Middleware: Security (IP; admins exempt), OpenTelemetry, Sentry, GZip.
flowchart TD
A([GET /admin/monitoring/chat-requests]) --> B[require_admin]
B -->|fail| C[401/402/403]
B -->|OK| D[get_supabase_client]
D --> E[Build data query\nchat_completion_requests SELECT *\n+ models!inner + providers!inner]
E --> F{model_id?}
F -->|yes| G[.eq model_id]
F -->|no| H
G --> H{provider_id?}
H -->|yes| I[.eq models.provider_id]
H -->|no| J
I --> J{model_name?}
J -->|yes| K[.ilike models.model_name]
J -->|no| L
K --> L{start_date?}
L -->|yes| M[.gte created_at]
L -->|no| N
M --> N{end_date?}
N -->|yes| O[.lte created_at]
N -->|no| P
O --> P[.order created_at desc\n.range offset to offset+limit-1]
P --> Q[Execute data query]
Q --> R[Build count query\nsame filters + count=exact head=True]
R --> S[Execute count query]
S --> T[total_count = count_result.count\nor len data]
T --> U[Return 200]
| Dependency | File | Operation | Details |
|---|---|---|---|
| require_admin | src/security/deps.py:220 | Auth | Full chain |
| get_supabase_client | src/config/supabase_config.py | DB | PostgREST client |
| chat_completion_requests table | Supabase | SELECT | SELECT *, models!inner(id, model_id, model_name, provider_model_id, provider_id, providers!inner(id, name, slug)) with optional eq(model_id), eq(models.provider_id), ilike(models.model_name, %X%), gte(created_at), lte(created_at); ordered by created_at DESC; paginated via range(offset, offset+limit-1) |
| chat_completion_requests count | Supabase | COUNT | Same query structure with count="exact", head=True |
| models table | Supabase | JOIN | Inner join: id, model_id, model_name, provider_model_id, provider_id |
| providers table | Supabase | JOIN | Inner join via models: id, name, slug |
- No database writes.
- Two Supabase queries per request: one data fetch, one count query.
- Performance risk: limit can be set up to 100,000; very large result sets can cause memory and timeout issues. No Redis caching on this endpoint.
- No Redis operations.
- No direct Prometheus metrics.
- Audit log: auth chain fires audit_logger.log_api_key_usage on every call.
Issue: #1617
This admin endpoint returns aggregate summary statistics for chat completion requests, optionally filtered by model, provider, model name, and date range. Results are cached in Redis with a 60-second TTL using an MD5 hash of the filter parameters as the cache key. A cache miss triggers get_chat_completion_summary_by_filters() from src/db/chat_completion_requests.py. This endpoint is designed specifically for analytics dashboards and avoids fetching raw request records.
Authentication chain: Depends(require_admin) — same chain as documented in Issue #1614.
Query Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| model_id | int | null | Filter by model ID |
| provider_id | int | null | Filter by provider ID |
| model_name | str | null | Partial match on model name |
| start_date | str | null | ISO format (YYYY-MM-DDTHH:MM:SS) |
| end_date | str | null | ISO format |
Response (200 OK):
{
"summary": {
"total_requests": int,
"total_input_tokens": int,
"total_output_tokens": int,
"total_tokens": int,
"avg_input_tokens": float,
"avg_output_tokens": float,
"avg_processing_time_ms": float,
"completed_requests": int,
"failed_requests": int,
"success_rate": float,
"first_request_at": str,
"last_request_at": str,
"total_cost_usd": float
},
"filters": { model_id, provider_id, model_name, start_date, end_date },
"timestamp": str,
"cached": bool
}
Error codes: 401, 402, 403, 500.
flowchart TD
A([GET /admin/monitoring/chat-requests/summary]) --> B[require_admin]
B -->|fail| C[401/402/403]
B -->|OK| D[Build filter_str\nmodel_id:provider_id:model_name:start_date:end_date]
D --> E[MD5 hash filter_str\ncache_key = chat_summary:filters:HASH]
E --> F[get_redis_client]
F --> G{Redis available?}
G -->|yes| H[redis.get cache_key]
H -->|hit| I[Parse JSON\nset cached=True\nreturn 200]
H -->|miss or error| J[Cache MISS log]
G -->|no| J
J --> K[get_chat_completion_summary_by_filters\nsrc/db/chat_completion_requests.py]
K --> L[DB aggregation query]
L --> M[Build response dict\ncached=False]
M --> N{Redis available?}
N -->|yes| O[redis.setex cache_key\nTTL=60s\nJSON serialized]
N -->|no or error| P
O --> P[Return 200]
| Dependency | File | Operation | Details |
|---|---|---|---|
| require_admin | src/security/deps.py:220 | Auth | Full chain |
| hashlib.md5 | stdlib | Computation | filter_str = f"{model_id}:{provider_id}:{model_name}:{start_date}:{end_date}" then .hexdigest() |
| get_redis_client | src/config/redis_config.py | Redis conn | Returns Redis client or None if unavailable |
| Redis GET | Redis | Read | Key: chat_summary:filters:{md5_hash}; returns JSON bytes or None |
| get_chat_completion_summary_by_filters | src/db/chat_completion_requests.py | DB aggregate | Executes aggregation query on chat_completion_requests with optional filters for model_id, provider_id (via models join), model_name (ilike), start_date (gte), end_date (lte) |
| Redis SETEX | Redis | Write | Key: chat_summary:filters:{md5_hash}; TTL: 60 seconds; value: JSON-serialized response (using json.dumps(response, default=str)) |
Redis key pattern: chat_summary:filters:{md5_hex_of_filter_string}
Redis TTL: 60 seconds
Redis data structure: String (JSON serialized response object)
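The key derivation can be sketched with the standard library alone. The filter string format and the key prefix are taken from the table above; the function name `summary_cache_key` and the UTF-8 encoding step are assumptions.

```python
import hashlib


def summary_cache_key(model_id=None, provider_id=None, model_name=None,
                      start_date=None, end_date=None):
    """Build the Redis key for one combination of summary filters."""
    # Unset filters are formatted as the string "None", so "no filter"
    # still yields a stable, distinct key.
    filter_str = f"{model_id}:{provider_id}:{model_name}:{start_date}:{end_date}"
    digest = hashlib.md5(filter_str.encode("utf-8")).hexdigest()
    return f"chat_summary:filters:{digest}"
```

MD5 is used here only to shorten the filter string into a fixed-length key, not for any security purpose.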
- No database writes.
- Redis write (on cache miss): Stores full response JSON with 60-second TTL.
- Redis read (on every request): Attempts to retrieve cached response.
- Redis failures are non-fatal: Both read and write errors are caught with logger.warning, execution continues without caching.
- Cache invalidation: No explicit invalidation; entries expire naturally after 60 seconds. Admin-triggered refreshes of providers/models do NOT invalidate these summary caches.
- Audit log: audit_logger.log_api_key_usage fires on every authenticated call.
- No direct Prometheus metrics.
- Performance: Cache hit ~5–10ms. Cache miss with DB RPC ~30–50ms. DB fallback without RPC slower.
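The read-through behavior with non-fatal Redis failures can be sketched as below. It is a simplified illustration, not the handler's code: `get_summary_cached` and `compute_summary` are hypothetical names, and `redis_client` is assumed to expose `get` and `setex` as redis-py clients do (or to be None when Redis is unavailable).

```python
import json
import logging

logger = logging.getLogger(__name__)
CACHE_TTL_SECONDS = 60  # TTL documented for the summary cache


def get_summary_cached(redis_client, cache_key, compute_summary):
    """Return a cached response if possible, else compute and cache it."""
    if redis_client is not None:
        try:
            cached = redis_client.get(cache_key)
            if cached:
                response = json.loads(cached)
                response["cached"] = True
                return response
        except Exception as exc:  # read errors must not fail the request
            logger.warning("Redis read failed, falling back to DB: %s", exc)

    # Cache miss (or Redis unavailable): run the DB aggregation.
    response = dict(compute_summary())
    response["cached"] = False

    if redis_client is not None:
        try:
            payload = json.dumps(response, default=str)
            redis_client.setex(cache_key, CACHE_TTL_SECONDS, payload)
        except Exception as exc:  # write errors are also non-fatal
            logger.warning("Redis write failed, response not cached: %s", exc)
    return response
```

The stored payload carries cached=False; the flag is flipped to True only when the value is served from Redis.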
Issue: #1618
This admin endpoint returns data optimized for frontend chart rendering. It executes two Supabase queries: one fetching the last 10 full request records for display (with model and provider metadata), and one fetching ALL matching requests but only 4 lightweight fields (input_tokens, output_tokens, processing_time_ms, created_at). The lightweight fields are compressed into parallel arrays for efficient network transfer and direct use in charting libraries.
Authentication chain: Depends(require_admin) — full chain as documented in Issue #1614.
Query Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| model_id | int | null | Filter by model ID (exact match on model_id) |
| provider_id | int | null | Filter by provider ID (post-processed client-side on recent_requests; NOT applied to plot query) |
| start_date | str | null | ISO format (gte on created_at) |
| end_date | str | null | ISO format (lte on created_at) |
Important limitation: provider_id is applied via Python-side filtering ONLY to recent_requests. It is NOT applied to the plot_data query (all_requests fetch). This means the plot arrays include records from all providers even when provider_id is specified.
Response (200 OK):
{
"success": true,
"recent_requests": [
// Last 10 records with full detail:
// id, request_id, model_id, input_tokens, output_tokens,
// processing_time_ms, status, error_message, created_at,
// total_tokens (computed),
// models.{id, model_id, model_name, provider_model_id,
// providers.{id, name, slug}}
],
"plot_data": {
"tokens": [int, ...], // total_tokens per request (input+output)
"latency": [float, ...], // processing_time_ms per request
"timestamps": [str, ...] // created_at per request
},
"metadata": {
"recent_count": int,
"total_count": int,
"timestamp": str,
"compression": "arrays",
"format_version": "1.0"
}
}
Error codes: 401, 402, 403, 500.
flowchart TD
A([GET /admin/monitoring/chat-requests/plot-data]) --> B[require_admin]
B -->|fail| C[401/402/403]
B -->|OK| D[get_supabase_client]
D --> E[Query 1: recent_requests\nchat_completion_requests SELECT\nfull fields + models + providers JOIN\nwith model_id/start_date/end_date filters\n.order created_at desc .limit 10]
E --> F[Execute recent query]
F --> G{provider_id filter?}
G -->|yes| H[Filter recent_requests in Python\nby providers.id == provider_id]
G -->|no| I[Use all 10 records]
H --> I
I --> J[Add total_tokens to each record\ninput_tokens + output_tokens]
D --> K[Query 2: plot query\nchat_completion_requests SELECT\ninput_tokens output_tokens\nprocessing_time_ms created_at ONLY\nwith model_id/start_date/end_date filters\nNO provider filter\n.order created_at asc\nNO limit]
K --> L[Execute plot query\nFetches ALL matching records]
L --> M[Build parallel arrays\nfor each record:\ntokens_array.append input+output\nlatency_array.append processing_time_ms\ntimestamps_array.append created_at]
J --> N[Build response]
M --> N
N --> O[Return 200]
| Dependency | File | Operation | Details |
|---|---|---|---|
| require_admin | src/security/deps.py:220 | Auth | Full chain |
| get_supabase_client | src/config/supabase_config.py | DB | PostgREST client |
| Query 1: recent_requests | chat_completion_requests | SELECT | SELECT id, request_id, model_id, input_tokens, output_tokens, processing_time_ms, status, error_message, created_at, models!inner(id, model_id, model_name, provider_model_id, providers!inner(id, name, slug)). Filters: eq(model_id) if set, gte(created_at) if start_date, lte(created_at) if end_date. Order: created_at DESC. Limit: 10 |
| Query 2: plot data | chat_completion_requests | SELECT | SELECT input_tokens, output_tokens, processing_time_ms, created_at. Same filters EXCEPT provider_id NOT applied. No LIMIT; fetches entire matching dataset. Order: created_at ASC |
| Python post-processing | Handler | Filtering | provider_id applied to recent_requests only: [r for r in recent_requests if r.get("models", {}).get("providers", {}).get("id") == provider_id] |
| Python post-processing | Handler | Computation | total_tokens = input_tokens + output_tokens added to each recent_request |
| Python array building | Handler | Computation | Three parallel arrays built from all_requests in a single pass |
- No database writes.
- Memory risk: Plot query fetches ALL matching records with NO LIMIT. On large deployments this can return millions of rows into Python memory. No Redis caching.
- provider_id filter discrepancy: plot_data arrays include records from all providers; only recent_requests is provider-filtered. Frontend must be aware of this inconsistency.
- No Redis operations.
- No direct Prometheus metrics.
- Audit log: audit_logger.log_api_key_usage fires on every authenticated call.
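The single-pass construction of the plot_data arrays can be sketched as follows. The array names match the response schema; the function name `build_plot_data` and the treatment of null token counts as 0 are assumptions.

```python
def build_plot_data(all_requests):
    """Turn lightweight request rows into three index-aligned arrays."""
    tokens, latency, timestamps = [], [], []
    for row in all_requests:
        # Assumption: a missing or null token count contributes 0.
        total = (row.get("input_tokens") or 0) + (row.get("output_tokens") or 0)
        tokens.append(total)
        latency.append(row.get("processing_time_ms"))
        timestamps.append(row.get("created_at"))
    return {"tokens": tokens, "latency": latency, "timestamps": timestamps}
```

Position i in each array describes the same request, so a charting library can plot tokens or latency against timestamps without reshaping the payload.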
Issue: #1619
This admin endpoint retrieves paginated chat completion requests for a specific API key identified by its full key string (exact match required). It first resolves the API key to its numeric ID via get_api_key_by_key(), then calls get_chat_completion_requests_by_api_key() from src/db/chat_completion_requests.py to fetch the paginated results. The include_summary parameter is deprecated; a separate /summary endpoint is preferred for statistics.
Authentication chain: Depends(require_admin) — full chain documented in Issue #1614.
Query Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | str | required | Full API key string (exact match, e.g. "gw_live_abc123...") |
| limit | int | 100 | 1–1000 max records |
| offset | int | 0 | Pagination offset |
| include_summary | bool | false | DEPRECATED: include summary stats in response |
Response (200 OK):
{
"requests": [
// Chat completion request records from get_chat_completion_requests_by_api_key()
],
"total_count": int,
"api_key_info": {
"id": int,
"key_name": str | null,
"user_id": int,
"environment_tag": str,
"is_active": bool,
"created_at": str
},
"limit": int,
"offset": int,
"pagination": {
"limit": int,
"offset": int,
"has_more": bool,
"current_page": int,
"total_pages": int,
"next_offset": int | null,
"prev_offset": int | null
},
"timestamp": str,
"summary": { ... } // only if include_summary=true (deprecated)
}
Error codes:
| Code | Condition |
|---|---|
| 401 | Invalid/missing credentials |
| 402 | Trial expired |
| 403 | Not admin |
| 404 | API key string not found in api_keys_new |
| 500 | DB error or missing ID field |
flowchart TD
A([GET /admin/monitoring/chat-requests/by-api-key]) --> B[require_admin]
B -->|fail| C[401/402/403]
B -->|OK| D[get_api_key_by_key api_key\nsrc/db/api_keys.py]
D -->|not found| E[HTTP 404]
D -->|found| F[Extract api_key_id]
F -->|missing id| G[HTTP 500]
F -->|has id| H[get_chat_completion_requests_by_api_key\napi_key_id limit offset\nsrc/db/chat_completion_requests.py]
H --> I[Extract requests, total_count, summary]
I --> J[Compute pagination metadata\nhas_more current_page total_pages]
J --> K{include_summary?}
K -->|yes + deprecated warning| L[Add summary to response\nlogger.warning]
K -->|no| M[Build response without summary]
L --> N[Return 200]
M --> N
| Dependency | File | Operation | Details |
|---|---|---|---|
| require_admin | src/security/deps.py:220 | Auth | Full chain |
| get_api_key_by_key | src/db/api_keys.py | DB read | Looks up api_keys_new by exact key string match; returns full key record or None |
| get_chat_completion_requests_by_api_key | src/db/chat_completion_requests.py | DB read | Queries chat_completion_requests WHERE api_key_id=X with pagination (limit, offset); returns dict with keys: requests, total_count, summary |
| api_keys_new table | Supabase | SELECT | Via get_api_key_by_key: exact match on api_key column |
| chat_completion_requests table | Supabase | SELECT | Filtered by api_key_id; paginated |
Deprecation note: When include_summary=True, a logger.warning is emitted: "include_summary parameter is deprecated for api_key_id=X. Use /admin/monitoring/chat-requests/by-api-key/summary endpoint instead..."
Pagination computation:
has_more = (offset + limit) < total_count
current_page = (offset // limit) + 1
total_pages = (total_count + limit - 1) // limit if total_count > 0 else 0
next_offset = offset + limit if has_more else None
prev_offset = max(0, offset - limit) if offset > 0 else None
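The pagination formulas can be collected into one function to make the edge cases easier to check. The expressions are the documented ones; the wrapper name `pagination_metadata` is hypothetical.

```python
def pagination_metadata(total_count, limit, offset):
    """Compute the pagination block from total_count, limit and offset.

    limit is assumed to be >= 1, as enforced by the query validation.
    """
    has_more = (offset + limit) < total_count
    return {
        "limit": limit,
        "offset": offset,
        "has_more": has_more,
        "current_page": (offset // limit) + 1,
        # Ceiling division; an empty result set reports zero pages.
        "total_pages": (total_count + limit - 1) // limit if total_count > 0 else 0,
        "next_offset": offset + limit if has_more else None,
        "prev_offset": max(0, offset - limit) if offset > 0 else None,
    }
```

For example, total_count=250 with limit=100 and offset=100 gives current_page 2 of 3, next_offset 200 and prev_offset 0. With total_count=0 the function reports current_page 1 but total_pages 0, which consumers may need to special-case.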
- No database writes.
- Deprecation warning logged: When include_summary=True, a warning is emitted to server logs.
- No Redis operations.
- No direct Prometheus metrics.
- Audit log: audit_logger.log_api_key_usage fires on every authenticated call.
Issue: #1620
This endpoint returns a list of all AI model providers that have at least one associated chat completion request recorded in the system. For each provider, it reports the count of distinct models used and total request volume. It is used by admin dashboards to populate provider selection dropdowns and build provider-level analytics views. The endpoint attempts to use an optimized PostgreSQL RPC function (get_provider_request_stats) and falls back to a manual join-and-aggregate approach if the RPC is unavailable.
Authentication & Authorization:
- Requires a valid Gatewayz API key with role = 'admin'.
- Auth chain: get_api_key → get_current_user → require_admin.
Request Schema: No query parameters.
Response Schema:
{
"success": true,
"data": [
{
"provider_id": 1,
"name": "OpenAI",
"slug": "openai",
"models_with_requests": 5,
"total_requests": 12500
}
],
"metadata": {
"total_providers": 8,
"timestamp": "2026-01-01T00:00:00Z"
}
}
Results are sorted by total_requests descending.
Error Codes:
| Code | Condition |
|---|---|
| 401 | Invalid or missing API key |
| 403 | Not an admin |
| 500 | Database query failure |
sequenceDiagram
participant C as Client
participant R as Route Handler<br/>get_providers_with_requests_admin()
participant Auth as require_admin
participant SB as Supabase
C->>R: GET /admin/monitoring/chat-requests/providers
R->>Auth: Depends(require_admin)
Auth-->>R: admin_user
R->>SB: RPC: get_provider_request_stats()
alt RPC available and returns data
SB-->>R: Aggregated provider stats
R-->>C: 200 { success, data (from RPC), metadata }
end
note over R,SB: Fallback path (RPC not available)
R->>SB: SELECT model_id, models!inner(<br/>providers!inner(id, name, slug))<br/>FROM chat_completion_requests
SB-->>R: Raw join results
R->>R: Group by provider_id,<br/>accumulate unique model_ids
loop For each provider
R->>SB: COUNT from chat_completion_requests<br/>WHERE model_id IN [provider_model_ids]
SB-->>R: total_requests count
end
R->>R: Sort by total_requests DESC
R-->>C: 200 { success, data, metadata }
| Category | Name | Location | Purpose |
|---|---|---|---|
| Route file | admin.py | src/routes/admin.py | Handler |
| Auth | require_admin | src/security/deps.py | Admin enforcement |
| DB client | get_supabase_client | src/config/supabase_config.py | Supabase client |
| DB RPC | get_provider_request_stats | Supabase (PostgreSQL function) | Optimized aggregate (primary path) |
| DB table | chat_completion_requests | Supabase | Request records (fallback path) |
| DB table | models | Supabase | Model→Provider mapping (fallback) |
| DB table | providers | Supabase | Provider names/slugs (fallback) |
| Framework | FastAPI, Depends | fastapi | HTTP layer |
| Logging | logging | stdlib | Debug logging for RPC fallback |
- Read-only. No writes.
- No caching. Results are always fetched live.
- Audit log: Written on successful auth.
- Fallback path: If get_provider_request_stats RPC is unavailable, the endpoint executes multiple COUNT queries (one per provider), which can be slow with many providers. RPC failure is logged at DEBUG level only.
- No notifications or external calls.
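The fallback aggregation can be sketched in plain Python. The row shape (model_id plus a nested models.providers object) is assumed from the SELECT shown in the sequence diagram, and `count_requests` is a hypothetical callable standing in for the per-provider COUNT query; neither name comes from the codebase.

```python
def build_provider_stats(rows, count_requests):
    """Group joined rows by provider, then count requests per provider.

    rows: dicts shaped like {"model_id": ..., "models": {"providers": {...}}}
          (assumed shape of the fallback join result).
    count_requests: callable taking a list of model ids and returning the
          request count; one call per provider, mirroring the COUNT loop.
    """
    providers = {}
    for row in rows:
        provider = row["models"]["providers"]
        entry = providers.setdefault(provider["id"], {
            "provider_id": provider["id"],
            "name": provider["name"],
            "slug": provider["slug"],
            "model_ids": set(),
        })
        entry["model_ids"].add(row["model_id"])  # set keeps model ids unique

    stats = []
    for entry in providers.values():
        model_ids = sorted(entry["model_ids"])
        stats.append({
            "provider_id": entry["provider_id"],
            "name": entry["name"],
            "slug": entry["slug"],
            "models_with_requests": len(model_ids),
            "total_requests": count_requests(model_ids),
        })
    stats.sort(key=lambda item: item["total_requests"], reverse=True)
    return stats
```

The number of `count_requests` calls equals the number of providers, which is why this path degrades as the provider list grows.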
Issue: #1621
This is a lightweight endpoint that returns request counts grouped by model, sorted by count descending. It is designed as a simpler, faster alternative to the /models endpoint when the caller only needs usage volume per model (not full token statistics). It is used by admin dashboards to build "most used models" leaderboards and quick-glance usage metrics.
Authentication & Authorization:
- Requires a valid Gatewayz API key with role = 'admin'.
- Auth chain: get_api_key → get_current_user → require_admin.
Request Schema: No query parameters.
Response Schema:
{
"success": true,
"data": [
{
"model_id": 4,
"model_name": "GPT-4o",
"model_identifier": "openai/gpt-4o",
"provider_name": "OpenAI",
"provider_slug": "openai",
"request_count": 5250
}
],
"metadata": {
"total_models": 12,
"total_requests": 48000,
"timestamp": "2026-01-01T00:00:00Z"
}
}
Error Codes:
| Code | Condition |
|---|---|
| 401 | Invalid or missing API key |
| 403 | Not an admin |
| 500 | Database query failure |
sequenceDiagram
participant C as Client
participant R as Route Handler<br/>get_request_counts_by_model_admin()
participant Auth as require_admin
participant SB as Supabase
C->>R: GET /admin/monitoring/chat-requests/counts
R->>Auth: Depends(require_admin)
Auth-->>R: admin_user
R->>SB: SELECT model_id,<br/>models!inner(id, model_name, provider_model_id,<br/>providers!inner(name, slug))<br/>FROM chat_completion_requests
SB-->>R: All rows (model_id + join data)
R->>R: Group by model_id in memory,<br/>count occurrences,<br/>accumulate model metadata
R->>R: Sort by request_count DESC
R-->>C: 200 { success, data, metadata }
| Category | Name | Location | Purpose |
|---|---|---|---|
| Route file | admin.py | src/routes/admin.py | Handler |
| Auth | require_admin | src/security/deps.py | Admin enforcement |
| DB client | get_supabase_client | src/config/supabase_config.py | Supabase client |
| DB table | chat_completion_requests | Supabase | All request records |
| DB table | models | Supabase | Model metadata (joined) |
| DB table | providers | Supabase | Provider name/slug (joined) |
| Framework | FastAPI, Depends | fastapi | HTTP layer |
| Logging | logging | stdlib | Error logging |
- Read-only. No writes.
- No caching. Results fetched live on every call.
- In-memory aggregation: The endpoint fetches ALL rows from chat_completion_requests joined with models and providers, then groups them in Python memory. On high-volume systems this can produce very large result sets. For systems with millions of requests, prefer the RPC-based /models endpoint, which aggregates in the database.
- Audit log: Written on auth.
- No notifications or external calls.
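The in-memory aggregation described above can be sketched as follows. This is a minimal illustration rather than the handler's actual code: the helper name count_requests_by_model and the nested row shape (a models object containing a providers object, as embedded resources are commonly returned) are assumptions.

```python
from collections import Counter


def count_requests_by_model(rows):
    """Group joined request rows by model_id and count them (hypothetical helper)."""
    counts = Counter()
    metadata = {}
    for row in rows:
        model_id = row["model_id"]
        model = row.get("models") or {}
        provider = model.get("providers") or {}
        counts[model_id] += 1
        # Keep the model metadata from the first row seen for each model.
        metadata.setdefault(model_id, {
            "model_id": model_id,
            "model_name": model.get("model_name"),
            "provider_name": provider.get("name"),
            "provider_slug": provider.get("slug"),
        })
    data = [{**metadata[mid], "request_count": n} for mid, n in counts.items()]
    data.sort(key=lambda item: item["request_count"], reverse=True)
    return data


# Assumed row shape, for illustration only.
sample_rows = [
    {"model_id": 7, "models": {"model_name": "Model B",
                               "providers": {"name": "Provider Y", "slug": "provider-y"}}},
    {"model_id": 4, "models": {"model_name": "GPT-4o",
                               "providers": {"name": "OpenAI", "slug": "openai"}}},
    {"model_id": 4, "models": {"model_name": "GPT-4o",
                               "providers": {"name": "OpenAI", "slug": "openai"}}},
]
leaderboard = count_requests_by_model(sample_rows)
```

Memory use in this approach grows with the number of request rows, not the number of models, which is the cost the side-effect note warns about.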
Issue: #1622
This endpoint returns all unique AI models that have at least one recorded chat completion request, along with their request statistics (token totals, averages, processing latency). Results can be filtered by provider ID. It is used by admin dashboards to build model-level analytics views and to enumerate which models have been actively used. The endpoint attempts to use optimized PostgreSQL RPC functions for both the model list and per-model stats, falling back to standard queries if the RPCs are unavailable.
Authentication & Authorization:
- Requires a valid Gatewayz API key with role = 'admin'.
- Auth chain: get_api_key → get_current_user → require_admin.
Query Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| provider_id | int | None | Filter results to models from a specific provider |
Response Schema:
{
"success": true,
"data": [
{
"model_id": 4,
"model_identifier": "openai/gpt-4o",
"model_name": "GPT-4o",
"provider_model_id": "gpt-4o",
"provider": { "id": 1, "name": "OpenAI", "slug": "openai" },
"stats": {
"total_requests": 5250,
"total_input_tokens": 2625000,
"total_output_tokens": 1575000,
"total_tokens": 4200000,
"avg_processing_time_ms": 1150.5
}
}
],
"metadata": {
"total_models": 12,
"timestamp": "2026-01-01T00:00:00Z",
"method": "rpc"
}
}
Results are sorted by total_requests descending.
Error Codes:
| Code | Condition |
|---|---|
| 401 | Invalid or missing API key |
| 403 | Not an admin |
| 500 | Database query failure |
sequenceDiagram
participant C as Client
participant R as Route Handler<br/>get_models_with_requests_admin()
participant Auth as require_admin
participant SB as Supabase
C->>R: GET /admin/monitoring/chat-requests/models?[provider_id]
R->>Auth: Depends(require_admin)
Auth-->>R: admin_user
R->>SB: RPC: get_models_with_requests[_by_provider](provider_id?)
alt RPC returns data
SB-->>R: Aggregated model stats
R-->>C: 200 { success, data, metadata(method=rpc) }
end
note over R,SB: Fallback path
R->>SB: SELECT models + providers<br/>[WHERE provider_id = ?]
SB-->>R: Model rows with provider info
loop For each model
R->>SB: RPC: get_model_request_stats(model_id)
alt RPC works
SB-->>R: { total_requests, tokens, avg_latency }
else RPC fails
R->>SB: COUNT WHERE model_id = ?
SB-->>R: count only (no token stats)
end
R->>R: Skip model if total_requests == 0
end
R->>R: Sort by total_requests DESC
R-->>C: 200 { success, data, metadata }
| Category | Name | Location | Purpose |
|---|---|---|---|
| Route file | admin.py | src/routes/admin.py | Handler |
| Auth | require_admin | src/security/deps.py | Admin enforcement |
| DB client | get_supabase_client | src/config/supabase_config.py | Supabase client |
| DB RPC | get_models_with_requests | Supabase | Optimized aggregate (no filter) |
| DB RPC | get_models_with_requests_by_provider | Supabase | Optimized aggregate (provider filter) |
| DB RPC | get_model_request_stats | Supabase | Per-model stats (fallback inner loop) |
| DB table | models | Supabase | Model catalog |
| DB table | providers | Supabase | Provider names/slugs |
| DB table | chat_completion_requests | Supabase | COUNT fallback |
| Framework | FastAPI, Query, Depends | fastapi | HTTP layer |
| Logging | logging | stdlib | Debug/error logging |
- Read-only. No writes.
- No caching. Always fetches live data.
- Audit log: Written on auth.
- Fallback behavior: If the primary RPCs fail, the endpoint enters a per-model loop issuing individual COUNT and stats queries. With many models this can result in dozens of database roundtrips. RPC failures are logged at DEBUG level.
- No notifications or external calls.
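The per-model fallback loop can be sketched roughly as below. The two callables, rpc_stats and count_requests, are hypothetical stand-ins for the get_model_request_stats RPC and the plain COUNT query; the real handler talks to Supabase directly and may differ in detail.

```python
import logging

logger = logging.getLogger(__name__)


def fetch_model_stats(model_id, rpc_stats, count_requests):
    """Try the stats RPC first; fall back to a count-only result if it raises."""
    try:
        return rpc_stats(model_id), "rpc"
    except Exception as exc:
        # The document notes that RPC failures are logged at DEBUG level.
        logger.debug("stats RPC failed for model %s: %s", model_id, exc)
        return {"total_requests": count_requests(model_id)}, "fallback"


def collect_models(model_ids, rpc_stats, count_requests):
    """One round trip (or two, on fallback) per model, then sort by volume."""
    results = []
    for model_id in model_ids:
        stats, method = fetch_model_stats(model_id, rpc_stats, count_requests)
        if stats.get("total_requests", 0) == 0:
            continue  # skip models with no recorded requests
        results.append({"model_id": model_id, "stats": stats, "method": method})
    results.sort(key=lambda m: m["stats"]["total_requests"], reverse=True)
    return results


# Fake data sources for illustration: the RPC fails for model 2.
def fake_rpc(model_id):
    if model_id == 2:
        raise RuntimeError("rpc unavailable")
    return {"total_requests": {1: 10, 3: 0}[model_id]}


def fake_count(model_id):
    return 25


models = collect_models([1, 2, 3], fake_rpc, fake_count)
```

Because every model costs at least one round trip in this path, the loop is where the "dozens of database roundtrips" mentioned above come from.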
Issue: #1623
This admin endpoint reads from the model_usage_analytics database view (a pre-aggregated view that combines chat completion requests, models, providers, and pricing data) and returns paginated, searchable, sortable model usage statistics. All aggregation is done at the database view level, making individual queries fast. It supports page-based pagination (not offset-based like other endpoints), case-insensitive partial model name search, and sorting by multiple fields.
Authentication chain: Depends(require_admin) — full chain documented in Issue #1614. Note: admin_user is passed as a dependency but not used in the handler body beyond gate enforcement.
Query Parameters:
| Parameter | Type | Default | Validation | Description |
|---|---|---|---|---|
| page | int | 1 | >= 1 | Page number (1-based) |
| limit | int | 50 | 1–500 | Items per page |
| model_name | str | null | optional | Case-insensitive partial match on model_name |
| sort_by | str | "total_cost_usd" | whitelist | Sort field (invalid values silently default to "total_cost_usd") |
| sort_order | str | "desc" | "asc"/"desc" | Sort direction (invalid values silently default to "desc") |
Valid sort_by values: model_name, provider_name, successful_requests, total_cost_usd, avg_cost_per_request_usd, total_input_tokens, total_output_tokens, total_tokens, avg_processing_time_ms, first_request_at, last_request_at
Response (200 OK):
{
"success": true,
"data": [
// Rows from model_usage_analytics view
// Columns depend on view definition, typically:
// model_name, provider_name, successful_requests, total_cost_usd,
// avg_cost_per_request_usd, total_input_tokens, total_output_tokens,
// total_tokens, avg_processing_time_ms, first_request_at, last_request_at,
// pricing fields, model metadata
],
"pagination": {
"page": int,
"limit": int,
"total_items": int,
"total_pages": int,
"has_next": bool,
"has_prev": bool,
"offset": int
},
"filters": {
"model_name": str | null,
"sort_by": str,
"sort_order": str
},
"metadata": {
"timestamp": str,
"items_in_page": int
}
}
Error codes: 401, 402, 403, 500.
flowchart TD
A([GET /admin/model-usage-analytics]) --> B[require_admin]
B -->|fail| C[401/402/403]
B -->|OK| D[Compute offset = page-1 * limit]
D --> E[get_supabase_client]
E --> F[SELECT * FROM model_usage_analytics\ncount=exact]
F --> G{model_name filter?}
G -->|yes| H[.ilike model_name %value%]
G -->|no| I
H --> I{sort_by valid?}
I -->|invalid| J[Default to total_cost_usd]
I -->|valid| K
J --> K{sort_order valid?}
K -->|invalid| L[Default to desc]
K -->|valid| M
L --> M[.order sort_by desc=bool]
M --> N[.range offset to offset+limit-1]
N --> O[Execute query]
O --> P[total_count = result.count]
P --> Q[Compute total_pages, has_next, has_prev]
Q --> R[Return 200]
| Dependency | File | Operation | Details |
|---|---|---|---|
| require_admin | src/security/deps.py:220 | Auth | Full chain |
| get_supabase_client | src/config/supabase_config.py | DB | PostgREST client |
| model_usage_analytics view | Supabase | SELECT | client.table("model_usage_analytics").select("*", count="exact") with optional .ilike("model_name", f"%{model_name}%"), .order(sort_by, desc=bool), .range(offset, offset+limit-1). Count is obtained from result.count (part of PostgREST count=exact response). |
Security note on sort_by: The handler validates sort_by against a whitelist of allowed field names before passing to .order(). Invalid values silently fall back to "total_cost_usd" rather than raising an error. This prevents SQL injection via the sort field.
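A minimal sketch of the whitelist check described in the security note, assuming a helper named normalize_sort (hypothetical). The allowed field names are taken from the list of valid sort_by values; whether the real handler also normalizes letter case is not stated, so this sketch does not.

```python
ALLOWED_SORT_FIELDS = {
    "model_name", "provider_name", "successful_requests", "total_cost_usd",
    "avg_cost_per_request_usd", "total_input_tokens", "total_output_tokens",
    "total_tokens", "avg_processing_time_ms", "first_request_at",
    "last_request_at",
}


def normalize_sort(sort_by, sort_order):
    """Return (field, descending) with silent fallback to the defaults."""
    if sort_by not in ALLOWED_SORT_FIELDS:
        sort_by = "total_cost_usd"
    if sort_order not in ("asc", "desc"):
        sort_order = "desc"
    # The second value maps onto the desc=bool argument of .order().
    return sort_by, sort_order == "desc"


valid = normalize_sort("total_tokens", "asc")
corrected = normalize_sort("id; DROP TABLE users", "sideways")
```

Only values from the fixed set ever reach the query builder, so arbitrary caller input cannot end up in the ORDER BY clause.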
Pagination formula:
offset = (page - 1) * limit
total_pages = (total_count + limit - 1) // limit if total_count > 0 else 0
has_next = page < total_pages
has_prev = page > 1
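As a worked example of the pagination formula, the sketch below wraps it in a hypothetical paginate helper and also computes the inclusive bounds passed to .range(). The helper name and the extra "range" key are illustrative assumptions; the other keys mirror the pagination object in the response.

```python
def paginate(page, limit, total_count):
    """Compute pagination metadata from a 1-based page number."""
    offset = (page - 1) * limit
    total_pages = (total_count + limit - 1) // limit if total_count > 0 else 0
    return {
        "page": page,
        "limit": limit,
        "total_items": total_count,
        "total_pages": total_pages,
        "has_next": page < total_pages,
        "has_prev": page > 1,
        "offset": offset,
        # Inclusive bounds, as used by .range(offset, offset + limit - 1).
        "range": (offset, offset + limit - 1),
    }


last_page = paginate(page=3, limit=50, total_count=120)
empty = paginate(page=1, limit=50, total_count=0)
```

With 120 items and a limit of 50, page 3 is the last page: it starts at offset 100 and has no next page. An empty result set yields zero pages.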
- No database writes.
- View-backed: Queries run against a pre-aggregated model_usage_analytics view. View refresh/staleness depends on the database view type (likely not materialized, in which case queries are live).
- No Redis operations.
- No direct Prometheus metrics.
- Audit log: audit_logger.log_api_key_usage on every authenticated call.
- Silent field validation: Invalid sort_by and sort_order values are silently corrected to defaults without returning an error, so callers should not rely on validation errors for these params.
Issue: #1624
This admin endpoint sets or updates rate limit configuration for a specific user's API key. It writes to the rate_limit_configs table via set_user_rate_limits(), then immediately reads back the saved configuration via get_user_rate_limits() to verify the write and return the current state. If the API key is not found in the database, set_user_rate_limits() raises ValueError and the endpoint returns 400; a 404 is returned only when the read-back finds no saved configuration.
Authentication chain: Depends(require_admin) — full chain documented in Issue #1614.
Request Schema (SetRateLimitRequest from src/schemas/admin.py):
{
"api_key": str, // The API key string to configure limits for
"rate_limits": {
"requests_per_minute": int, // default 60
"requests_per_hour": int, // default 1000
"requests_per_day": int, // default 10000
"tokens_per_minute": int, // default 10000
"tokens_per_hour": int, // default 100000
"tokens_per_day": int // default 1000000
}
}
Pydantic model chain:
- SetRateLimitRequest.api_key: str
- SetRateLimitRequest.rate_limits: RateLimitConfig
- RateLimitConfig fields (all int with defaults as above)
Response (200 OK):
{
"status": "success",
"message": "Rate limits updated for user {api_key[:10]}...",
"rate_limits": {
"requests_per_minute": int, // derived: max_requests // 60
"requests_per_hour": int, // stored as max_requests
"requests_per_day": int, // derived: max_requests * 24
"tokens_per_minute": int, // derived: max_tokens // 60
"tokens_per_hour": int, // stored as max_tokens
"tokens_per_day": int // derived: max_tokens * 24
}
}
Error codes:
| Code | Condition |
|---|---|
| 400 | set_user_rate_limits raises ValueError (key not found) |
| 401 | Invalid/missing credentials |
| 402 | Trial expired |
| 403 | Not admin |
| 404 | get_user_rate_limits returns None after write |
| 500 | Unexpected exception |
flowchart TD
A([POST /admin/limit]) --> B[require_admin]
B -->|fail| C[401/402/403]
B -->|OK| D[set_user_rate_limits\nreq.api_key, req.rate_limits.model_dump]
D --> E[get_supabase_client]
E --> F[SELECT id FROM api_keys_new\nWHERE api_key=req.api_key]
F -->|not found| G[raise ValueError\n-> HTTP 400]
F -->|found| H[api_key_id extracted]
H --> I[Prepare rate_limit_config:\nmax_requests=requests_per_hour\nmax_tokens=tokens_per_hour\nburst_limit concurrency_limit\nwindow_size=3600]
I --> J[SELECT id FROM rate_limit_configs\nWHERE api_key_id=X]
J -->|exists| K[UPDATE rate_limit_configs\nWHERE api_key_id=X]
J -->|not exists| L[INSERT rate_limit_configs]
K --> M[get_user_rate_limits req.api_key]
L --> M
M --> N[SELECT api_keys_new WHERE api_key\nSELECT rate_limit_configs WHERE api_key_id]
N -->|no config| O[Return None -> HTTP 404]
N -->|config| P[Derive minute/hour/day values\nfrom max_requests, max_tokens]
P --> Q[Return 200 with rate_limits]
| Dependency | File | Operation | Details |
|---|---|---|---|
| require_admin | src/security/deps.py:220 | Auth | Full chain |
| set_user_rate_limits | src/db/rate_limits.py:55 | DB write | Async function wrapping sync ops in asyncio.to_thread. Looks up api_keys_new, then upserts rate_limit_configs. Stores: max_requests=requests_per_hour, max_tokens=tokens_per_hour, burst_limit (from req or default 100), concurrency_limit (default 50), window_size=3600 |
| api_keys_new table | Supabase | SELECT | SELECT id WHERE api_key=api_key; resolves the API key string to a numeric ID |
| rate_limit_configs table | Supabase | SELECT | Check for existing config: SELECT id WHERE api_key_id=X |
| rate_limit_configs table | Supabase | INSERT or UPDATE | Upsert pattern: UPDATE if existing, INSERT if new |
| get_user_rate_limits | src/db/rate_limits.py:12 | DB read | Synchronous. Reads api_keys_new -> rate_limit_configs to return current limits. Returns None if no config found. Derives: requests_per_minute = max_requests // 60, requests_per_day = max_requests * 24 |
| SetRateLimitRequest | src/schemas/admin.py:49 | Schema | Pydantic model: api_key: str, rate_limits: RateLimitConfig |
| RateLimitConfig | src/schemas/admin.py:40 | Schema | Pydantic: requests_per_minute=60, requests_per_hour=1000, requests_per_day=10000, tokens_per_minute=10000, tokens_per_hour=100000, tokens_per_day=1000000 |
Storage note: Only requests_per_hour (stored as max_requests) and tokens_per_hour (stored as max_tokens) are persisted. The per-minute and per-day values visible in the response are derived by dividing or multiplying — they are not stored independently.
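The effect of the storage note can be illustrated with a small round trip. The helper names to_stored_config and derive_limits are hypothetical; the arithmetic follows the derivations listed in the response schema, and the input uses the RateLimitConfig defaults.

```python
def to_stored_config(rate_limits):
    """Keep only the hourly values, as the write path is described to do."""
    return {
        "max_requests": rate_limits["requests_per_hour"],
        "max_tokens": rate_limits["tokens_per_hour"],
        "window_size": 3600,
    }


def derive_limits(stored):
    """Rebuild the six response fields from the two stored values."""
    max_requests = stored["max_requests"]
    max_tokens = stored["max_tokens"]
    return {
        "requests_per_minute": max_requests // 60,
        "requests_per_hour": max_requests,
        "requests_per_day": max_requests * 24,
        "tokens_per_minute": max_tokens // 60,
        "tokens_per_hour": max_tokens,
        "tokens_per_day": max_tokens * 24,
    }


submitted = {
    "requests_per_minute": 60, "requests_per_hour": 1000, "requests_per_day": 10000,
    "tokens_per_minute": 10000, "tokens_per_hour": 100000, "tokens_per_day": 1000000,
}
round_trip = derive_limits(to_stored_config(submitted))
```

With the default values, the response reports requests_per_minute as 16 rather than the submitted 60, and requests_per_day as 24000 rather than 10000, so callers should treat only the hourly fields as authoritative.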
- Database write: Upserts a row in rate_limit_configs (INSERT or UPDATE based on existence check).
- Rate limiting cache NOT explicitly cleared: The get_rate_limit_manager() LRU cache (in src/services/rate_limiting.py) is NOT cleared by this endpoint. New limits take effect only when the cached manager expires or is cleared via /admin/clear-rate-limit-cache.
- No Redis operations.
- No direct Prometheus metrics.
- Audit log: audit_logger.log_api_key_usage on every authenticated call.
- rate_limits.model_dump(): Pydantic v2 call that converts RateLimitConfig to a dict for the DB layer.
Issue: #1625
This admin endpoint forces a provider catalog cache refresh by invalidating the in-memory provider cache and then immediately fetching fresh data via get_cached_providers(). The invalidation is handled by invalidate_provider_catalog("providers") from the model_catalog_cache module, which uses a debouncing mechanism to prevent cache thrashing. The fresh data fetch is run in a thread pool via asyncio.to_thread.
Authentication chain: Depends(require_admin) — full chain documented in Issue #1614.
Request: No body or query parameters.
Response (200 OK):
{
"status": "success",
"message": "Provider cache refreshed successfully",
"total_providers": int,
"timestamp": str
}
Error codes:
| Code | Condition |
|---|---|
| 401 | Invalid/missing credentials |
| 402 | Trial expired |
| 403 | Not admin |
| 500 | Cache invalidation or provider fetch failure |
flowchart TD
A([POST /admin/refresh-providers]) --> B[require_admin]
B -->|fail| C[401/402/403]
B -->|OK| D[invalidate_provider_catalog providers]
D --> E[InvalidationDebouncer.invalidate\nkey=providers]
E --> F{Pending timer exists?}
F -->|yes| G[Cancel existing timer]
G --> H[Schedule new timer\ndelay=1.0s]
F -->|no| H
H --> I[asyncio.to_thread\nget_cached_providers]
I --> J[get_supabase_client]
J --> K[SELECT * FROM providers\norder by name]
K --> L[Store in provider cache\nwith TTL metadata]
L --> M[Return providers list]
M --> N[total_providers = len providers]
N --> O[Return 200]
D -->|exception| P[HTTP 500\nlogger.error]
| Dependency | File | Operation | Details |
|---|---|---|---|
| require_admin | src/security/deps.py:220 | Auth | Full chain |
| invalidate_provider_catalog | src/services/model_catalog_cache.py | Cache invalidation | Calls InvalidationDebouncer.invalidate("providers"). Debounce delay: 1.0 second. Cancels any pending timer for "providers" key, schedules new 1s timer that clears the in-memory provider cache dict. |
| get_cached_providers | src/services/providers.py | DB + cache | Fetches providers from Supabase providers table. Returns list of provider records. Cache TTL: PROVIDER_MODELS_CACHE_TTL = 1800 seconds (30 min). |
| asyncio.to_thread | stdlib | Threading | Wraps synchronous get_cached_providers call to avoid blocking the event loop |
| InvalidationDebouncer | src/services/model_catalog_cache.py | Debouncing | Thread-safe timer-based debouncer. Uses threading.Timer. Prevents cache thrashing from rapid invalidation calls. |
| providers table | Supabase | SELECT | Queried by get_cached_providers to fetch all provider records |
Cache details:
- Cache type: In-memory Python dict (not Redis)
- Cache key: "providers"
- TTL: 1800 seconds (30 minutes)
- Invalidation: Debounced 1-second delay via InvalidationDebouncer
- Prometheus metric: catalog_cache_operations_total counter (if initialized) with labels: operation, cache_layer, result
- No direct database writes.
- In-memory cache invalidation: Provider cache cleared and repopulated synchronously within this request.
- Debounce timer: A 1-second background timer is set for the "providers" cache key. Rapid successive calls will reset the timer.
- Prometheus metric (conditional): catalog_cache_operations_total counter incremented if available.
- No Redis operations (provider cache is in-memory, not Redis-backed).
- Audit log: audit_logger.log_api_key_usage on every authenticated call.
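The debounce behaviour described for InvalidationDebouncer (cancel any pending timer for the key, then schedule a new one) can be sketched with threading.Timer. The Debouncer class below is a hypothetical stand-in written for illustration; it is not the code in src/services/model_catalog_cache.py, and the short delay is only there to keep the demo fast.

```python
import threading
import time


class Debouncer:
    """Timer-based debouncer: only the last call within the delay window fires."""

    def __init__(self, delay=1.0):
        self._delay = delay
        self._timers = {}
        self._lock = threading.Lock()

    def invalidate(self, key, callback):
        with self._lock:
            pending = self._timers.get(key)
            if pending is not None:
                pending.cancel()  # a newer call supersedes the pending one
            timer = threading.Timer(self._delay, self._fire, args=(key, callback))
            timer.daemon = True
            self._timers[key] = timer
            timer.start()

    def _fire(self, key, callback):
        with self._lock:
            # Inside a Timer, current_thread() is the Timer object itself.
            if self._timers.get(key) is threading.current_thread():
                del self._timers[key]
        callback()


cache = {"providers": ["stale entry"]}
fired = []


def clear_providers():
    cache.pop("providers", None)
    fired.append("providers")


debouncer = Debouncer(delay=0.05)
for _ in range(3):  # three rapid calls collapse into one invalidation
    debouncer.invalidate("providers", clear_providers)
time.sleep(0.3)  # wait for the single surviving timer to fire
```

One consequence of this design is that the clear happens after the delay, not at call time, so a read inside the debounce window can still see the old entry.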
Issue: #1626
This admin endpoint clears the in-memory HuggingFace model cache to force a refresh on the next catalog request. It calls invalidate_gateway_catalog("huggingface") from the model_catalog_cache module, which uses the same InvalidationDebouncer mechanism as the provider refresh endpoint. Unlike /refresh-providers, this endpoint does NOT immediately fetch fresh data — it only invalidates the cache. The next incoming request for HuggingFace models will trigger the actual fetch.
Authentication chain: Depends(require_admin) — full chain documented in Issue #1614.
Request: No body or query parameters.
Response (200 OK):
{
"message": "Hugging Face cache cleared successfully",
"timestamp": str
}
Error codes:
| Code | Condition |
|---|---|
| 401 | Invalid/missing credentials |
| 402 | Trial expired |
| 403 | Not admin |
| 500 | Cache invalidation failure |
flowchart TD
A([POST /admin/refresh-huggingface-cache]) --> B[require_admin]
B -->|fail| C[401/402/403]
B -->|OK| D[invalidate_gateway_catalog huggingface]
D --> E[InvalidationDebouncer.invalidate\nkey=huggingface]
E --> F{Pending timer for huggingface?}
F -->|yes| G[Cancel existing timer]
G --> H[Schedule new 1s timer\nclears huggingface cache entry]
F -->|no| H
H --> I{Exception?}
I -->|yes| J[HTTP 500\nlogger.error]
I -->|no| K[Return 200\n{message, timestamp}]
| Dependency | File | Operation | Details |
|---|---|---|---|
| require_admin | src/security/deps.py:220 | Auth | Full chain |
| invalidate_gateway_catalog | src/services/model_catalog_cache.py | Cache invalidation | Calls InvalidationDebouncer.invalidate("huggingface"). Uses 1-second debounce delay. Clears the "huggingface" entry from the in-memory gateway catalog cache dict. |
| InvalidationDebouncer | src/services/model_catalog_cache.py | Debouncing | Thread-safe threading.Timer-based debouncer. Cancels and reschedules on rapid calls. |
Key differences from /admin/refresh-providers:
- This endpoint only INVALIDATES — it does NOT fetch fresh data immediately
- The next organic request for HuggingFace catalog will trigger the lazy-load fetch
- Response body uses "message" key instead of "status" key (inconsistency in admin API)
Cache details:
- Cache type: In-memory Python dict (not Redis)
- Cache key: "huggingface" in gateway catalog cache
- TTL: CATALOG_RESPONSE_CACHE_TTL = 300 seconds (5 min) or PROVIDER_MODELS_CACHE_TTL = 1800s depending on cache tier
- Invalidation: Debounced 1-second delay
- No database writes.
- In-memory cache invalidation: "huggingface" entry removed from gateway catalog cache after 1-second debounce.
- Lazy refresh: Next HuggingFace catalog request after invalidation will incur full fetch latency (500ms–2s) as cache is cold.
- Debounce timer: 1-second background timer for "huggingface" key.
- No Redis operations.
- No direct Prometheus metrics (middleware metrics apply).
- Audit log: audit_logger.log_api_key_usage on every authenticated call.
Issue: #1627
This admin endpoint clears the in-memory rate limit configuration cache held by the RateLimitManager service, forcing the next rate-limited request to reload limits from the database. It directly accesses the RateLimitManager singleton via get_rate_limit_manager() (which is LRU-cached), clears its key_configs dict, and then calls cache_clear() on the LRU cache itself to fully reset the manager reference. This ensures both the per-key config cache and the manager singleton are reset.
Authentication chain: Depends(require_admin) — full chain documented in Issue #1614.
Request: No body or query parameters.
Response (200 OK):
{
"status": "success",
"message": "Rate limit cache cleared successfully. New requests will reload configuration.",
"timestamp": str
}
Error codes:
| Code | Condition |
|---|---|
| 401 | Invalid/missing credentials |
| 402 | Trial expired |
| 403 | Not admin |
| 500 | Exception during cache clearing |
flowchart TD
A([POST /admin/clear-rate-limit-cache]) --> B[require_admin]
B -->|fail| C[401/402/403]
B -->|OK| D[Import get_rate_limit_manager\nfrom src.services.rate_limiting]
D --> E[manager = get_rate_limit_manager]
E --> F{manager is not None?}
F -->|yes| G[manager.key_configs.clear\nclear all cached per-key configs]
G --> H[logger.info Cleared rate limit manager key_configs cache]
F -->|no| H
H --> I[get_rate_limit_manager.cache_clear\nclear LRU cache reference]
I --> J{Exception?}
J -->|yes| K[HTTP 500\nlogger.error\nf-string with str e]
J -->|no| L[Return 200]
| Dependency | File | Operation | Details |
|---|---|---|---|
| require_admin | src/security/deps.py:220 | Auth | Full chain |
| get_rate_limit_manager | src/services/rate_limiting.py | LRU-cached function | Decorated with @lru_cache. Returns the singleton RateLimitManager instance. |
| manager.key_configs | src/services/rate_limiting.py | In-memory dict | Dict mapping API key strings to their RateLimitConfig dataclass instances. Cleared by .clear(). |
| get_rate_limit_manager.cache_clear() | src/services/rate_limiting.py | LRU cache clear | Python's built-in functools.lru_cache cache_clear method. Removes the cached return value so next call to get_rate_limit_manager() creates a fresh instance. |
Rate limiting manager structure (from src/services/rate_limiting.py):
- RateLimitManager: Manages per-key sliding window rate limits using Redis
- key_configs: dict[str, RateLimitConfig]: In-memory per-key config cache loaded from DB
- RateLimitConfig dataclass: requests_per_minute=250, requests_per_hour=1000, requests_per_day=10000, tokens_per_minute=10000, tokens_per_hour=100000, tokens_per_day=1000000, burst_limit=100, concurrency_limit=50, window_size_seconds=60
Two-level cache clearing:
- manager.key_configs.clear(): removes all cached per-key rate limit configs (loaded from DB by get_rate_limit_config() in src/db/rate_limits.py)
- get_rate_limit_manager.cache_clear(): removes the LRU-cached manager reference, causing the next get_rate_limit_manager() call to instantiate a fresh RateLimitManager
- No database writes.
- In-memory cache cleared: RateLimitManager.key_configs dict emptied.
- LRU cache reset: get_rate_limit_manager LRU cache cleared; the next rate-limited request will instantiate a new RateLimitManager and reload configs from DB.
- Performance impact: First few requests after clearing will incur DB lookup latency for rate limit config (~5–20ms per key).
- No Redis operations (Redis stores rate limit counters, not configs; the counters remain intact).
- No direct Prometheus metrics.
- Audit log: audit_logger.log_api_key_usage on every authenticated call.
- Rate limiting not disrupted: Existing Redis sliding window counters are unaffected. Only the in-memory config cache is cleared. Active connections continue to be tracked.
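The two-level clearing can be reproduced in isolation with functools.lru_cache. In the sketch below, RateLimitManager is a stripped-down stand-in with only a key_configs dict, and clear_rate_limit_cache is a hypothetical name for the handler's core logic.

```python
from functools import lru_cache


class RateLimitManager:
    """Stand-in for the real manager: only the per-key config cache is modelled."""

    def __init__(self):
        self.key_configs = {}


@lru_cache
def get_rate_limit_manager():
    # lru_cache on a zero-argument function makes it behave like a singleton.
    return RateLimitManager()


def clear_rate_limit_cache():
    manager = get_rate_limit_manager()
    if manager is not None:
        manager.key_configs.clear()       # level 1: drop cached per-key configs
    get_rate_limit_manager.cache_clear()  # level 2: drop the cached manager itself


before = get_rate_limit_manager()
before.key_configs["example-key"] = {"requests_per_hour": 1000}
same = get_rate_limit_manager()

clear_rate_limit_cache()
after = get_rate_limit_manager()
```

Clearing the dict first matters for any code that still holds a reference to the old manager: that object stays alive, but it no longer serves cached configs.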
Issue: #1628
This admin endpoint deletes all user accounts whose email address matches a given domain suffix. It has a critical safety mechanism: dry_run=true (the default) performs a preview-only operation returning which users would be deleted without actually deleting them. Six major email providers (gmail.com, yahoo.com, outlook.com, hotmail.com, icloud.com, protonmail.com) are permanently protected from deletion. When dry_run=false, it deletes users one-by-one in a loop, continuing past individual failures.
Authentication chain: Depends(require_admin) — full chain documented in Issue #1614.
Path parameter: domain: str — email domain (e.g., "spam-domain.org")
Query parameter:
| Parameter | Type | Default | Description |
|---|---|---|---|
| dry_run | bool | true | If true: preview only, no deletions |
Handler-level validation (raises HTTP 400):
- Domain is normalized: .lower().strip()
- Domain in protected set: {gmail.com, yahoo.com, outlook.com, hotmail.com, icloud.com, protonmail.com} → HTTP 400
Supabase query for finding users:
SELECT id, email, username, created_at, credits FROM users WHERE email ILIKE '%@{domain}'
Response (200 OK — dry_run=true):
{
"status": "success",
"message": "DRY RUN: Would delete N users from domain: {domain}",
"dry_run": true,
"count": int,
"users": [{ "id": int, "email": str, "username": str, "created_at": str, "credits": float }],
"timestamp": str
}
Response (200 OK — dry_run=false):
{
"status": "success",
"message": "Deleted N users from domain: {domain}",
"dry_run": false,
"count": int, // successful deletions
"failed": [{ "id": int, "error": str }],
"users": [{ ... }], // all matching users (including failed deletions)
"timestamp": str
}
Response (200 OK — no users found):
{
"status": "success",
"message": "No users found with email domain: {domain}",
"dry_run": bool,
"count": 0,
"users": [],
"timestamp": str
}
Error codes:
| Code | Condition |
|---|---|
| 400 | Domain is in the protected domains set |
| 401 | Invalid/missing credentials |
| 402 | Trial expired |
| 403 | Not admin |
| 500 | Supabase query failure |
flowchart TD
A([DELETE /admin/users/by-domain/domain]) --> B[require_admin]
B -->|fail| C[401/402/403]
B -->|OK| D[Normalize: domain.lower.strip]
D --> E{domain in protected_domains?}
E -->|yes| F[HTTP 400 Cannot delete protected domain]
E -->|no| G[SELECT id email username created_at credits\nFROM users WHERE email ILIKE %@domain]
G --> H{users_to_delete empty?}
H -->|empty| I[Return 200 count=0 empty list]
H -->|found| J[Build user_summary list]
J --> K{dry_run == true?}
K -->|yes| L[logger.info DRY RUN log\nReturn 200 with user list\nno deletions]
K -->|no| M[For each user in users_to_delete]
M --> N[DELETE FROM users WHERE id=user.id]
N -->|success| O[deleted_count += 1\nlogger.info user deleted]
N -->|exception| P[logger.error\nfailed_deletions.append id+error]
O --> Q{more users?}
P --> Q
Q -->|yes| M
Q -->|done| R[Return 200 with count+failed+users]
| Dependency | File | Operation | Details |
|---|---|---|---|
| require_admin | src/security/deps.py:220 | Auth | Full chain |
| get_supabase_client | src/config/supabase_config.py | DB | PostgREST client |
| users table SELECT | Supabase | SELECT | client.table("users").select("id, email, username, created_at, credits").ilike("email", f"%@{domain}"); case-insensitive suffix match |
| users table DELETE | Supabase | DELETE | Per-user: client.table("users").delete().eq("id", user["id"]).execute(); individual DELETE per user in a loop |
Protected domains (hardcoded set): gmail.com, yahoo.com, outlook.com, hotmail.com, icloud.com, protonmail.com
Cascade behavior: When a user row is deleted, any foreign-key-constrained child rows (api_keys_new, credit_transactions, chat_history, etc.) are deleted or nullified depending on Supabase/PostgreSQL cascade rules. The handler itself does not explicitly handle cascade.
When dry_run=false:
- Database deletes: One DELETE per matching user in the users table. Cascade rules apply to related tables.
- Audit trail via logger.info: Each successful deletion is logged with user_id, email, domain, and admin_id.
- No atomic transaction: Deletions are done in a loop. Partial failures leave some users deleted and others intact. The failed list captures which user IDs could not be deleted.
Always:
- Audit log: audit_logger.log_api_key_usage on every authenticated call.
- No Redis operations (no cache invalidation of deleted users from any cache).
- No direct Prometheus metrics.
- Potential data loss: This operation is irreversible when dry_run=false. The dry_run=true default is a critical safety feature.
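The dry-run gate and the continue-past-failures loop can be sketched as below. The function delete_users and the delete_one callable are hypothetical; delete_one stands in for the per-user Supabase DELETE, and the returned keys follow the response shapes documented above.

```python
def delete_users(users, delete_one, dry_run=True):
    """Preview by default; otherwise delete one by one, recording failures."""
    if dry_run:
        return {"dry_run": True, "count": len(users), "users": users}

    deleted_count = 0
    failed = []
    for user in users:
        try:
            delete_one(user["id"])
            deleted_count += 1
        except Exception as exc:
            # No transaction: earlier deletions stay deleted, the loop continues.
            failed.append({"id": user["id"], "error": str(exc)})
    return {"dry_run": False, "count": deleted_count, "failed": failed, "users": users}


# Fake deletion backend for illustration: user 2 cannot be deleted.
removed_ids = []


def fake_delete(user_id):
    if user_id == 2:
        raise RuntimeError("foreign key violation")
    removed_ids.append(user_id)


matching = [{"id": 1}, {"id": 2}, {"id": 3}]
preview = delete_users(matching, fake_delete)
outcome = delete_users(matching, fake_delete, dry_run=False)
```

In the non-dry-run result, count reflects successful deletions only, while users still lists every match, including the ones that failed.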
Issue: #1634
Admin-only endpoint that analyzes the quality of API key tracking in chat_completion_requests. It queries how many requests have a non-null api_key_id vs. null, and produces recommendations based on thresholds.
Type: HTTP Bearer (Authorization: Bearer <token>)
Dependency: get_admin_key() in src/security/deps.py
- Reads ADMIN_API_KEY from environment variable
- Uses secrets.compare_digest() for constant-time comparison (timing-attack-safe)
- Input validation: key must be non-empty and at least 10 characters (ensure_api_key_like)
- On failure: logs INVALID_ADMIN_KEY_ATTEMPT to audit logger
- Returns HTTP 401 if missing, invalid, or key not configured
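The checks listed above can be sketched as a single predicate. The function is_valid_admin_key is a hypothetical simplification of get_admin_key(): it takes the configured key as an argument instead of reading ADMIN_API_KEY from the environment, and it returns a boolean where the real dependency raises HTTP 401 and writes to the audit log.

```python
import secrets


def is_valid_admin_key(presented, configured):
    """Return True only if the presented key matches the configured one."""
    if not configured:
        return False  # key not configured: always reject
    if not presented or len(presented) < 10:
        return False  # mirrors the non-empty, at-least-10-characters rule
    # compare_digest takes time independent of where the inputs differ.
    return secrets.compare_digest(
        presented.encode("utf-8"), configured.encode("utf-8")
    )


configured_key = "example-admin-key-123"
accepted = is_valid_admin_key("example-admin-key-123", configured_key)
wrong_key = is_valid_admin_key("example-admin-key-124", configured_key)
too_short = is_valid_admin_key("short", configured_key)
unconfigured = is_valid_admin_key("example-admin-key-123", None)
```

Encoding both values to bytes before the comparison avoids the TypeError that compare_digest raises for non-ASCII str arguments.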
| Parameter | Type | Default | Validation | Description |
|---|---|---|---|---|
| hours | int | 24 | ge=1, le=168 | Time window in hours (1–168) |
Query 1 — Total requests in window:
SELECT *, COUNT(*) FROM chat_completion_requests
WHERE created_at >= <start_time> AND created_at <= <end_time>
Query 2 — Requests with api_key_id (non-null):
SELECT *, COUNT(*) FROM chat_completion_requests
WHERE created_at >= <start_time> AND created_at <= <end_time>
AND api_key_id IS NOT NULL
Query 3 — Requests without api_key_id (null):
SELECT *, COUNT(*) FROM chat_completion_requests
WHERE created_at >= <start_time> AND created_at <= <end_time>
AND api_key_id IS NULL
Query 4 — Null api_key_id but valid user_id (authenticated but untracked):
SELECT *, COUNT(*) FROM chat_completion_requests
WHERE created_at >= <start_time> AND created_at <= <end_time>
AND api_key_id IS NULL AND user_id IS NOT NULL
Query 5 — Both null (anonymous traffic):
SELECT *, COUNT(*) FROM chat_completion_requests
WHERE created_at >= <start_time> AND created_at <= <end_time>
AND api_key_id IS NULL AND user_id IS NULL
All queries use count="exact" for PostgreSQL COUNT aggregation. Count is accessed via result.count.
Tracking rate calculation:
tracking_rate = round((requests_with_key / total_requests) * 100, 2) if total_requests > 0 else 0
Alert status thresholds:
- "ok": tracking_rate >= 90%
- "warning": 70% <= tracking_rate < 90%
- "critical": tracking_rate < 70%
Recommendation triggers:
- null_key_valid_user > 0 → API key lookup failure warning
- both_null > total_requests * 0.2 → High anonymous traffic warning (>20% threshold)
- tracking_rate < 90 → General tracking below threshold warning
- All good → "No action needed"
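A literal reading of the thresholds and triggers above might look like the sketch below. The function assess_tracking and the warning strings are hypothetical (only the final "good" message appears in this document), and the empty-window branch follows the documented behaviour that a window with no data reports "ok"; the real handler may order or word things differently.

```python
def assess_tracking(total_requests, requests_with_key, null_key_valid_user, both_null):
    """Compute tracking rate, alert status and recommendations from the counts."""
    if total_requests > 0:
        tracking_rate = round((requests_with_key / total_requests) * 100, 2)
    else:
        tracking_rate = 0

    if total_requests == 0 or tracking_rate >= 90:
        alert_status = "ok"  # an empty window is documented as "ok"
    elif tracking_rate >= 70:
        alert_status = "warning"
    else:
        alert_status = "critical"

    recommendations = []  # message texts are illustrative
    if null_key_valid_user > 0:
        recommendations.append("Some authenticated requests have no api_key_id.")
    if both_null > total_requests * 0.2:
        recommendations.append("More than 20% of requests are anonymous.")
    if total_requests > 0 and tracking_rate < 90:
        recommendations.append("Tracking rate is below the 90% threshold.")
    if not recommendations:
        recommendations.append("API key tracking quality is good. No action needed.")
    return tracking_rate, alert_status, recommendations


degraded = assess_tracking(1000, 800, 50, 150)
healthy = assess_tracking(100, 95, 0, 5)
```

In the degraded case, 800 of 1000 requests carry a key (80%), which lands in the "warning" band and triggers two of the three warnings; the anonymous share (15%) stays under the 20% threshold.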
{
"total_requests": 1500,
"requests_with_api_key": 1425,
"requests_without_api_key": 75,
"tracking_rate_percent": 95.0,
"breakdown": {
"null_key_with_valid_user": 20,
"both_null_likely_anonymous": 55,
"null_key_with_valid_user_percent": 1.33,
"both_null_percent": 3.67
},
"time_window": {
"hours": 24,
"start_time": "2026-03-03T12:00:00+00:00",
"end_time": "2026-03-04T12:00:00+00:00"
},
"alert_status": "ok",
"recommendations": ["API key tracking quality is good. No action needed."]
}

| Scenario | Behavior |
|---|---|
| Any unhandled exception | logger.error() with exc_info=True, returns dict with "error" key and "alert_status": "error" |
| Admin key missing/invalid | HTTP 401 raised before handler executes |
| No data in time window | Returns zeros, tracking_rate_percent: 0, alert_status: "ok" |
Error response shape (does NOT raise HTTPException — returns 200 with error info):
{
"error": "...",
"total_requests": 0,
"requests_with_api_key": 0,
"requests_without_api_key": 0,
"tracking_rate_percent": 0,
"alert_status": "error",
"recommendations": ["Failed to retrieve tracking quality metrics. Check logs."]
}
This endpoint does not use Redis caching or emit Prometheus metrics. It performs direct Supabase queries on every call.
router = APIRouter(prefix="/admin/monitoring", tags=["Admin", "Monitoring"])
# Full path: GET /admin/monitoring/api-key-tracking-quality
Issue: #1635
Admin-only endpoint that provides a daily time-series breakdown of API key tracking quality over a configurable number of days. Iterates day-by-day using a loop, performing 2 Supabase queries per day.
Type: HTTP Bearer (Authorization: Bearer <token>)
Dependency: get_admin_key() in src/security/deps.py
- Reads ADMIN_API_KEY from environment variable
- Uses secrets.compare_digest() for constant-time comparison
- Returns HTTP 401 if missing, invalid, or environment variable not set
- Logs INVALID_ADMIN_KEY_ATTEMPT security violation to audit logger on failure
| Parameter | Type | Default | Validation | Description |
|---|---|---|---|---|
| days | int | 7 | ge=1, le=30 | Number of days to analyze (1–30) |
Per-day loop — for each of days iterations:
Query A — Total requests for day N:
SELECT *, COUNT(*) FROM chat_completion_requests
WHERE created_at >= <day_start> AND created_at < <day_end>
Query B — Requests with api_key_id for day N:
SELECT *, COUNT(*) FROM chat_completion_requests
WHERE created_at >= <day_start> AND created_at < <day_end>
AND api_key_id IS NOT NULL
- Date ranges use gte/lt (NOT lte) so days are non-overlapping
- Both use count="exact" mode
- Total Supabase queries = 2 × days (up to 60 queries for days=30)
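The half-open day windows described above can be sketched as follows. This is an illustration only: the helper names `day_windows` and `in_window`, and the choice to derive the start by subtracting `days` from the end time, are assumptions and not taken from the handler.

```python
from datetime import datetime, timedelta, timezone


def day_windows(end_time, days):
    """Return `days` consecutive (day_start, day_end) pairs, oldest first."""
    start_time = end_time - timedelta(days=days)
    windows = []
    for day_offset in range(days):
        day_start = start_time + timedelta(days=day_offset)
        day_end = day_start + timedelta(days=1)
        windows.append((day_start, day_end))
    return windows


def in_window(ts, day_start, day_end):
    """Mirror the gte/lt filter: the start is inclusive, the end is exclusive."""
    return day_start <= ts < day_end


end = datetime(2026, 3, 4, tzinfo=timezone.utc)
windows = day_windows(end, 7)
```

Because each window excludes its own end, a request stamped exactly at midnight is counted in one day only. With gte/lte it would be counted in both neighbouring days.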
Per-day tracking rate:
tracking_rate = round((with_key / total) * 100, 2) if total > 0 else 0
Summary calculation (post-loop aggregation):
total_all = sum(d["total_requests"] for d in trend_data)
with_key_all = sum(d["requests_with_api_key"] for d in trend_data)
avg_tracking_rate = round((with_key_all / total_all) * 100, 2) if total_all > 0 else 0
{
"trend_data": [
{
"date": "2026-02-26",
"total_requests": 500,
"requests_with_api_key": 490,
"tracking_rate_percent": 98.0
},
{
"date": "2026-02-27",
"total_requests": 620,
"requests_with_api_key": 600,
"tracking_rate_percent": 96.77
}
],
"summary": {
"period_days": 7,
"total_requests": 3500,
"requests_with_api_key": 3400,
"average_tracking_rate_percent": 97.14,
"start_date": "2026-02-25",
"end_date": "2026-03-04"
}
}
trend_data array is chronologically ordered, oldest day first (day_offset 0 = start_time, day_offset N-1 = most recent).
| Scenario | Behavior |
|---|---|
| Any unhandled exception | logger.error(), returns 200 with error dict |
| Empty database | Returns trend_data with zeros per day, summary zeros |
| Admin key invalid | HTTP 401 before handler executes |
Error response shape (200 status, not an exception):
{
"error": "...",
"trend_data": [],
"summary": {
"period_days": 7,
"total_requests": 0,
"requests_with_api_key": 0,
"average_tracking_rate_percent": 0
}
}
For days=30, this endpoint fires 60 synchronous Supabase HTTP calls sequentially. There is no batching, parallelism, or caching. Large time windows on tables with high row counts may be slow.
This endpoint does not use Redis caching or emit Prometheus metrics. All computation is in-process after direct DB queries.
router = APIRouter(prefix="/admin/monitoring", tags=["Admin", "Monitoring"])
# Full path: GET /admin/monitoring/api-key-tracking-trend
Issue: #1719
Handler: list_coupons_endpoint() in src/routes/coupons.py (line 205)
Tags: ["admin", "coupons"]
Authentication: Required - require_admin (admin role)
| Param | Type | Default | Description |
|---|---|---|---|
| scope | str \| None | None | Filter by coupon_scope ("user_specific" or "global") |
| coupon_type | str \| None | None | Filter by coupon type |
| is_active | bool \| None | None | Filter by active status |
| limit | int | 100 | Max results |
| offset | int | 0 | Pagination offset |
| Field | Type | Description |
|---|---|---|
| coupons | list[CouponResponse] | List of coupon records |
| total | int | Count of returned coupons (not total in DB) |
| offset | int | Current offset |
| limit | int | Applied limit |
| Field | Type | Default |
|---|---|---|
| id | int | - |
| code | str | - |
| value_usd | float | - |
| coupon_scope | str | - |
| coupon_type | str | - |
| max_uses | int | - |
| times_used | int | - |
| valid_from | datetime | - |
| valid_until | datetime | - |
| is_active | bool | - |
| created_at | datetime | - |
| assigned_to_user_id | int \| None | None |
| created_by | int \| None | None |
| created_by_type | str | - |
| description | str \| None | None |
list_coupons_endpoint(scope, coupon_type, is_active, limit, offset, user)
├── Depends(require_admin) # src/security/deps.py:220
│ └── Depends(get_current_user)
│ └── (auth chain: get_api_key → validate_api_key_security → get_user → validate_trial)
│ └── Check is_admin or role=="admin"
│ └── If not admin → 403 + audit log
├── list_coupons(scope, coupon_type, is_active, # src/db/coupons.py:135
│ limit, offset)
│ ├── get_supabase_client()
│ └── client.table("coupons").select("*")
│ + conditional .eq("coupon_scope", scope)
│ + conditional .eq("coupon_type", coupon_type)
│ + conditional .eq("is_active", is_active)
│ + .order("created_at", desc=True)
│ + .range(offset, offset + limit - 1)
│ + .execute()
└── Return ListCouponsResponse with [CouponResponse(**c)]
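The `.range(offset, offset + limit - 1)` call in the tree above takes inclusive bounds, which is why 1 is subtracted. A minimal sketch of that arithmetic on an in-memory list follows. The helper names `range_bounds` and `paginate` are hypothetical and stand in for the Supabase query.

```python
def range_bounds(offset, limit):
    """Translate offset/limit into the inclusive (first, last) pair used by .range()."""
    return offset, offset + limit - 1


def paginate(rows, offset, limit):
    """Apply the same inclusive bounds to a plain list."""
    first, last = range_bounds(offset, limit)
    page = rows[first:last + 1]
    # As in the documented response, "total" is the size of this page,
    # not the number of rows in the table.
    return {"coupons": page, "total": len(page), "offset": offset, "limit": limit}


all_rows = list(range(250))
last_page = paginate(all_rows, offset=200, limit=100)
```

A client that needs the full row count cannot read it from `total`. It has to keep paging until a page comes back with fewer than `limit` entries.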
| Operation | Table | Columns | Filters | Order | Pagination |
|---|---|---|---|---|---|
| SELECT | coupons | * | Optional: coupon_scope, coupon_type, is_active | created_at DESC | .range(offset, offset+limit-1) |
None directly.
None.
- Standard middleware pipeline
- Bearer token authentication + admin role verification
- Audit log on unauthorized admin access attempt
| Error Path | Status Code | Detail |
|---|---|---|
| Auth failures | 401/402/403/404/429 | Various auth errors |
| Non-admin user | 403 | "Administrator privileges required" + audit log |
| Supabase query error | 500 | "Internal server error" |
On Supabase error in list_coupons, returns [] (empty list), so the endpoint would return an empty list rather than 500.
flowchart TD
A[GET /admin/coupons] --> B[require_admin dependency]
B -->|Not admin| B1[403 Admin required + audit log]
B -->|Auth fail| B2[401/402/404/429]
B -->|Admin| C{try block}
C --> D[list_coupons with filters]
D --> E[Build Supabase query]
E --> F{scope filter?}
F -->|Yes| F1[.eq coupon_scope]
F -->|No| G{coupon_type filter?}
F1 --> G
G -->|Yes| G1[.eq coupon_type]
G -->|No| H{is_active filter?}
G1 --> H
H -->|Yes| H1[.eq is_active]
H -->|No| I[.order + .range pagination]
H1 --> I
I --> J[Execute query]
J --> K[Map to CouponResponse list]
K --> L[Return ListCouponsResponse]
C -->|HTTPException| M[Re-raise]
C -->|Other Exception| N[500 Internal server error]
Issue: #1720
Handler: get_coupon_stats_endpoint() in src/routes/coupons.py (line 342)
Tags: ["admin", "coupons"]
Authentication: Required - require_admin (admin role)
| Field | Type | Description |
|---|---|---|
| total_coupons | int | Total number of coupons in system |
| active_coupons | int | Currently active coupons |
| user_specific_coupons | int | User-specific scoped coupons |
| global_coupons | int | Globally scoped coupons |
| total_redemptions | int | Total number of redemptions |
| unique_redeemers | int | Unique users who redeemed |
| total_value_distributed | float | Total USD distributed |
| average_redemption_value | float | Average redemption value |
get_coupon_stats_endpoint(user)
├── Depends(require_admin) # (admin auth chain)
├── get_all_coupons_stats() # src/db/coupons.py:557
│ ├── get_supabase_client()
│ ├── client.table("coupons").select("*").execute()
│ │ └── Fetches ALL coupons (no pagination!)
│ ├── client.table("coupon_redemptions").select("*").execute()
│ │ └── Fetches ALL redemptions (no pagination!)
│ └── Aggregations:
│ ├── Filter active coupons (is_active=True)
│ ├── Filter by coupon_scope ("user_specific" vs "global")
│ ├── Sum value_applied across all redemptions
│ ├── Count unique user_ids in redemptions
│ └── Calculate average_redemption_value
└── Return CouponStatsResponse(**stats)
| Operation | Table | Columns | Filters | Notes |
|---|---|---|---|---|
| SELECT | coupons | * | None | Fetches all rows |
| SELECT | coupon_redemptions | * | None | Fetches all rows |
Performance Warning: Both queries fetch all rows without pagination. This could be slow with large datasets.
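Once both tables are loaded, the aggregation is plain in-memory work. The sketch below shows one way the documented fields could be derived from two lists of row dicts. The function name `summarize_coupons` and the handling of missing values are assumptions, not the contents of get_all_coupons_stats().

```python
def summarize_coupons(coupons, redemptions):
    """Aggregate coupon and redemption rows into the overview statistics."""
    total_value = sum(r.get("value_applied") or 0 for r in redemptions)
    redeemers = {r["user_id"] for r in redemptions if r.get("user_id") is not None}
    return {
        "total_coupons": len(coupons),
        "active_coupons": sum(1 for c in coupons if c.get("is_active")),
        "user_specific_coupons": sum(
            1 for c in coupons if c.get("coupon_scope") == "user_specific"),
        "global_coupons": sum(
            1 for c in coupons if c.get("coupon_scope") == "global"),
        "total_redemptions": len(redemptions),
        "unique_redeemers": len(redeemers),
        "total_value_distributed": total_value,
        # Guard against division by zero when nothing has been redeemed.
        "average_redemption_value": total_value / len(redemptions) if redemptions else 0.0,
    }


sample_coupons = [
    {"is_active": True, "coupon_scope": "global"},
    {"is_active": False, "coupon_scope": "user_specific"},
    {"is_active": True, "coupon_scope": "user_specific"},
]
sample_redemptions = [
    {"user_id": 1, "value_applied": 10.0},
    {"user_id": 1, "value_applied": 5.0},
    {"user_id": 2, "value_applied": 15.0},
]
stats = summarize_coupons(sample_coupons, sample_redemptions)
```

Since every field is a count or a sum, the same numbers could in principle be computed by the database, which would avoid transferring all rows.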
None.
None.
- Standard middleware pipeline
- Admin authentication required
| Error Path | Status Code | Detail |
|---|---|---|
| Auth failures | 401/402/403/404/429 | Various auth errors |
| Non-admin | 403 | "Administrator privileges required" |
| Supabase error in get_all_coupons_stats | Returns {} | Empty dict, then Pydantic validation fails → 500 |
| Any other exception | 500 | "Internal server error" |
flowchart TD
A[GET /admin/coupons/stats/overview] --> B[require_admin dependency]
B -->|Not admin| B1[403 Admin required]
B -->|Auth fail| B2[401/402/404/429]
B -->|Admin| C{try block}
C --> D[get_all_coupons_stats]
D --> E[SELECT * FROM coupons]
E --> F[SELECT * FROM coupon_redemptions]
F --> G[Filter active coupons]
G --> H[Filter by scope: user_specific vs global]
H --> I[Sum total_value_distributed]
I --> J[Count unique redeemers]
J --> K[Calculate average_redemption_value]
K --> L[Return CouponStatsResponse]
C -->|HTTPException| M[Re-raise]
C -->|Other Exception| N[500 Internal server error]
Issue: #1721
Handler: get_coupon_endpoint() in src/routes/coupons.py (line 244)
Tags: ["admin", "coupons"]
Authentication: Required - require_admin (admin role)
| Param | Type | Description |
|---|---|---|
| coupon_id | int | Coupon ID to retrieve |
| Field | Type | Default |
|---|---|---|
| id | int | - |
| code | str | - |
| value_usd | float | - |
| coupon_scope | str | - |
| coupon_type | str | - |
| max_uses | int | - |
| times_used | int | - |
| valid_from | datetime | - |
| valid_until | datetime | - |
| is_active | bool | - |
| created_at | datetime | - |
| assigned_to_user_id | int \| None | None |
| created_by | int \| None | None |
| created_by_type | str | - |
| description | str \| None | None |
get_coupon_endpoint(coupon_id, user)
├── Depends(require_admin) # (admin auth chain)
├── get_coupon_by_id(coupon_id) # src/db/coupons.py:118
│ ├── get_supabase_client()
│ └── client.table("coupons")
│ .select("*")
│ .eq("id", coupon_id)
│ .execute()
└── If None → 404; else Return CouponResponse(**coupon)
| Operation | Table | Columns | Filters |
|---|---|---|---|
| SELECT | coupons | * | .eq("id", coupon_id) |
None.
None.
- Standard middleware pipeline
- Admin authentication required
| Error Path | Status Code | Detail |
|---|---|---|
| Auth failures | 401/402/403/404/429 | Various auth errors |
| Non-admin | 403 | "Administrator privileges required" |
| Coupon not found | 404 | "Coupon not found" |
| Supabase error in get_coupon_by_id | Returns None → 404 | Logs error, returns None |
| Any other exception | 500 | "Internal server error" |
flowchart TD
A[GET /admin/coupons/coupon_id] --> B[require_admin dependency]
B -->|Not admin| B1[403 Admin required]
B -->|Admin| C{try block}
C --> D[get_coupon_by_id]
D --> E[SELECT * FROM coupons WHERE id = coupon_id]
E --> F{Coupon found?}
F -->|No| G[404 Coupon not found]
F -->|Yes| H[Return CouponResponse]
C -->|HTTPException| I[Re-raise]
C -->|Other Exception| J[500 Internal server error]
Issue: #1722
Handler: get_coupon_analytics_endpoint() in src/routes/coupons.py (line 314)
Tags: ["admin", "coupons"]
Authentication: Required - require_admin (admin role)
| Param | Type | Description |
|---|---|---|
| coupon_id | int | Coupon ID to get analytics for |
| Field | Type | Description |
|---|---|---|
| coupon | CouponResponse | Full coupon details |
| total_redemptions | int | Total redemptions for this coupon |
| unique_users | int | Unique users who redeemed |
| total_value_distributed | float | Total USD distributed |
| redemption_rate | float | % of max_uses consumed |
| remaining_uses | int | max_uses - times_used |
| is_expired | bool | Whether valid_until has passed |
get_coupon_analytics_endpoint(coupon_id, user)
├── Depends(require_admin) # (admin auth chain)
├── get_coupon_analytics(coupon_id) # src/db/coupons.py:509
│ ├── get_coupon_by_id(coupon_id) # src/db/coupons.py:118
│ │ ├── get_supabase_client()
│ │ └── SELECT * FROM coupons WHERE id = coupon_id
│ ├── If coupon not found → return {}
│ ├── get_supabase_client()
│ ├── client.table("coupon_redemptions")
│ │ .select("*")
│ │ .eq("coupon_id", coupon_id)
│ │ .execute()
│ └── Compute:
│ ├── total_value_distributed = sum(value_applied)
│ ├── unique_users = len(set(user_id))
│ ├── redemption_rate = (count / max_uses * 100)
│ ├── remaining_uses = max_uses - times_used
│ ├── is_expired = valid_until < now(UTC)
│ └── recent_redemptions = last 10 (not exposed in response)
└── Return CouponAnalyticsResponse
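The "Compute" step in the tree above reduces to a few expressions over the coupon row and its redemption rows. The following is a sketch under stated assumptions: the function name `coupon_analytics` is hypothetical, `valid_until` is taken to be an already-parsed timezone-aware datetime, and `now` is injectable so the result is reproducible.

```python
from datetime import datetime, timezone


def coupon_analytics(coupon, redemptions, now=None):
    """Derive per-coupon analytics from one coupon row and its redemptions."""
    now = now or datetime.now(timezone.utc)
    max_uses = coupon["max_uses"]
    count = len(redemptions)
    return {
        "total_redemptions": count,
        "unique_users": len({r["user_id"] for r in redemptions}),
        "total_value_distributed": sum(r["value_applied"] for r in redemptions),
        # Percentage of max_uses consumed, based on the redemption row count.
        "redemption_rate": (count / max_uses * 100) if max_uses else 0.0,
        # Based on the coupon's own counter, which may differ from the row count.
        "remaining_uses": max_uses - coupon["times_used"],
        "is_expired": coupon["valid_until"] < now,
    }


coupon = {
    "max_uses": 10,
    "times_used": 5,
    "valid_until": datetime(2026, 1, 1, tzinfo=timezone.utc),
}
rows = [{"user_id": uid, "value_applied": 2.0} for uid in (1, 1, 2, 3, 3)]
result = coupon_analytics(coupon, rows, now=datetime(2026, 3, 4, tzinfo=timezone.utc))
```

redemption_rate and remaining_uses read from different sources (redemption rows versus the times_used column), so they can disagree if the counter and the redemption table drift apart.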
| Operation | Table | Columns | Filters |
|---|---|---|---|
| SELECT | coupons | * | .eq("id", coupon_id) |
| SELECT | coupon_redemptions | * | .eq("coupon_id", coupon_id) |
None.
None.
| Error Path | Status Code | Detail |
|---|---|---|
| Auth failures | 401/402/403/404/429 | Various auth errors |
| Non-admin | 403 | "Administrator privileges required" |
| Coupon not found (get_coupon_analytics returns {}) | 404 | "Coupon not found" |
| Supabase error | 500 | "Internal server error" |
flowchart TD
A[GET /admin/coupons/coupon_id/analytics] --> B[require_admin dependency]
B -->|Not admin| B1[403]
B -->|Admin| C{try block}
C --> D[get_coupon_analytics]
D --> E[get_coupon_by_id]
E --> F{Coupon found?}
F -->|No| G[Return empty dict]
F -->|Yes| H[SELECT * FROM coupon_redemptions WHERE coupon_id]
H --> I[Sum value_applied]
I --> J[Count unique user_ids]
J --> K[Calculate redemption_rate]
K --> L[Check is_expired]
L --> M[Return analytics dict]
G --> N{analytics empty?}
M --> N
N -->|Empty| O[404 Coupon not found]
N -->|Has data| P[Return CouponAnalyticsResponse]
C -->|HTTPException| Q[Re-raise]
C -->|Other Exception| R[500 Internal server error]
Issue: #1724
Handler: create_coupon_endpoint() in src/routes/coupons.py (line 163)
Tags: ["admin", "coupons"]
Authentication: Required - require_admin (admin role)
| Field | Type | Default | Validation |
|---|---|---|---|
| code | str | required | min_length=3, max_length=50, must be alphanumeric (hyphens/underscores OK), uppercased |
| value_usd | float | required | gt=0, le=1000 |
| coupon_scope | CouponScope | required | "user_specific" or "global" |
| max_uses | int | required | gt=0; if user_specific, must be 1 |
| valid_until | datetime | required | Expiration date |
| coupon_type | CouponType | "promotional" | "promotional", "referral", "compensation", "partnership" |
| assigned_to_user_id | int \| None | None | Required for user_specific, forbidden for global |
| description | str \| None | None | max_length=500 |
| valid_from | datetime \| None | None | Defaults to now |
Validators:
- code_must_be_alphanumeric: Strips non-alphanumeric characters (except - and _), uppercases
- validate_user_assignment: Cross-validates scope vs assigned_to_user_id
- validate_max_uses: user_specific requires max_uses=1
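The three validator rules can be expressed as plain Python checks. This sketch deliberately uses no Pydantic and is not the request model itself: the function names `normalize_code` and `check_scope_rules`, the error messages, and the decision to check the length after stripping are all assumptions.

```python
import re


def normalize_code(code):
    """Strip characters other than letters, digits, '-' and '_', then uppercase."""
    cleaned = re.sub(r"[^A-Za-z0-9_-]", "", code).upper()
    # Assumption: the 3..50 length rule is applied to the cleaned value.
    if not 3 <= len(cleaned) <= 50:
        raise ValueError("code must be 3 to 50 characters")
    return cleaned


def check_scope_rules(coupon_scope, max_uses, assigned_to_user_id):
    """Cross-check scope against user assignment and max_uses."""
    if coupon_scope == "user_specific":
        if assigned_to_user_id is None:
            raise ValueError("user_specific coupons require assigned_to_user_id")
        if max_uses != 1:
            raise ValueError("user_specific coupons require max_uses=1")
    elif coupon_scope == "global":
        if assigned_to_user_id is not None:
            raise ValueError("global coupons must not set assigned_to_user_id")
    else:
        raise ValueError("coupon_scope must be 'user_specific' or 'global'")


normalized = normalize_code("spring-sale_25!")
check_scope_rules("global", 100, None)
check_scope_rules("user_specific", 1, 42)
```

The same scope rules are enforced a second time inside create_coupon(), so a request that bypasses the model still fails with a ValueError.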
create_coupon_endpoint(coupon_request, user)
├── Depends(require_admin) # (admin auth chain)
├── create_coupon(...) # src/db/coupons.py:20
│ ├── get_supabase_client()
│ ├── Validate scope + assignment:
│ │ ├── user_specific without assigned_to → ValueError
│ │ ├── global with assigned_to → ValueError
│ │ └── user_specific with max_uses != 1 → ValueError
│ ├── Prepare coupon_data dict:
│ │ ├── code → uppercased
│ │ ├── value_usd, coupon_scope, max_uses, coupon_type
│ │ ├── created_by_type, valid_until, valid_from (default: now)
│ │ ├── conditional: created_by, assigned_to_user_id, description
│ │ └── Note: code stored UPPERCASED
│ └── client.table("coupons").insert(coupon_data).execute()
├── If result is None → 500 "Failed to create coupon"
└── Return CouponResponse(**coupon)
| Operation | Table | Columns Inserted | Notes |
|---|---|---|---|
| INSERT | coupons | code, value_usd, coupon_scope, max_uses, coupon_type, created_by_type, valid_until, valid_from, [created_by, assigned_to_user_id, description] | Code uppercased; unique constraint on code likely enforced at DB level |
None.
None.
| Error Path | Status Code | Detail |
|---|---|---|
| Auth/admin failures | 401/402/403/404/429 | Various |
| Pydantic validation (code format, scope rules) | 422 | Automatic |
| Scope/assignment ValueError in create_coupon | 400 | Error message |
| Insert returns None | 500 | "Failed to create coupon" |
| Duplicate code (DB constraint) | 500 via exception | "Internal server error" |
| Any other exception | 500 | "Internal server error" |
flowchart TD
A[POST /admin/coupons] --> B[Pydantic validation]
B -->|Invalid code/scope/max_uses| B1[422]
B -->|Valid| C[require_admin]
C -->|Not admin| C1[403]
C -->|Admin| D{try block}
D --> E[create_coupon]
E --> F[Validate scope + assignment rules]
F -->|Invalid| F1[ValueError → 400]
F -->|Valid| G[Prepare coupon_data]
G --> H[INSERT INTO coupons]
H --> I{Insert result?}
I -->|None| J[500 Failed to create]
I -->|Data| K[Return CouponResponse]
D -->|HTTPException| L[Re-raise]
D -->|ValueError| M[400 error detail]
D -->|Other| N[500 Internal server error]
Issue: #1725
Handler: update_coupon_endpoint() in src/routes/coupons.py (line 262)
Tags: ["admin", "coupons"]
Authentication: Required - require_admin (admin role)
| Param | Type | Description |
|---|---|---|
| coupon_id | int | Coupon ID to update |
| Field | Type | Default | Validation |
|---|---|---|---|
| valid_until | datetime \| None | None | Optional new expiration |
| max_uses | int \| None | None | gt=0 |
| is_active | bool \| None | None | Toggle active status |
| description | str \| None | None | max_length=500 |
All fields are optional; only set fields are included via exclude_unset=True.
update_coupon_endpoint(coupon_id, update_request, user)
├── Depends(require_admin) # (admin auth chain)
├── update_request.dict(exclude_unset=True) # Only fields explicitly set
│ └── If empty → 400 "No fields to update"
├── update_coupon(coupon_id, updates) # src/db/coupons.py:192
│ ├── get_supabase_client()
│ ├── Filter updates to allowed_fields only:
│ │ └── ["valid_until", "max_uses", "is_active", "description"]
│ │ └── Any other fields silently dropped
│ ├── If no valid fields after filtering → ValueError
│ └── client.table("coupons")
│ .update(filtered_updates)
│ .eq("id", coupon_id)
│ .execute()
└── If None → 404; else Return CouponResponse
| Operation | Table | Columns Updated | Filters |
|---|---|---|---|
| UPDATE | coupons | Only allowed: valid_until, max_uses, is_active, description | .eq("id", coupon_id) |
Security Note: The DB layer enforces an allowlist of updatable fields. Even if extra fields are sent in the request, they are silently dropped by update_coupon().
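The allowlist behaviour can be illustrated in a few lines. This is a minimal sketch: the names `ALLOWED_FIELDS` and `filter_updates` and the error message are hypothetical, and only the four field names come from the documentation above.

```python
ALLOWED_FIELDS = ("valid_until", "max_uses", "is_active", "description")


def filter_updates(updates):
    """Keep only updatable fields; raise if nothing is left to write."""
    filtered = {key: value for key, value in updates.items() if key in ALLOWED_FIELDS}
    if not filtered:
        raise ValueError("No valid fields to update")
    return filtered


# Fields such as "code" or "value_usd" never reach the UPDATE statement.
safe = filter_updates({"is_active": False, "code": "HACKED", "value_usd": 9999})
```

Filtering at the DB layer means the guarantee holds for every caller of the update function, not only for requests that pass through the route's request model.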
None.
None.
| Error Path | Status Code | Detail |
|---|---|---|
| Auth/admin failures | 401/402/403/404/429 | Various |
| No fields set in request | 400 | "No fields to update" |
| No valid fields after allowlist filter | raises ValueError | Caught by exception handler |
| Coupon not found (update returns None) | 404 | "Coupon not found or update failed" |
| Any other exception | 500 | "Internal server error" |
flowchart TD
A[PATCH /admin/coupons/coupon_id] --> B[require_admin]
B -->|Not admin| B1[403]
B -->|Admin| C{try block}
C --> D[update_request.dict exclude_unset]
D --> E{Any updates?}
E -->|No| F[400 No fields to update]
E -->|Yes| G[update_coupon]
G --> H[Filter to allowed fields]
H --> I{Valid fields remain?}
I -->|No| J[ValueError raised]
I -->|Yes| K[UPDATE coupons SET ... WHERE id = coupon_id]
K --> L{Update result?}
L -->|None| M[404 Not found or update failed]
L -->|Data| N[Return CouponResponse]
C -->|HTTPException| O[Re-raise]
C -->|Other| P[500 Internal server error]
Issue: #1726
Handler: deactivate_coupon_endpoint() in src/routes/coupons.py (line 291)
Tags: ["admin", "coupons"]
Authentication: Required - require_admin (admin role)
Note: This is a soft delete -- it deactivates the coupon rather than removing it.
| Param | Type | Description |
|---|---|---|
| coupon_id | int | Coupon ID to deactivate |
{"success": True, "message": "Coupon deactivated successfully"}
deactivate_coupon_endpoint(coupon_id, user)
├── Depends(require_admin) # (admin auth chain)
├── deactivate_coupon(coupon_id) # src/db/coupons.py:228
│ ├── get_supabase_client()
│ └── client.table("coupons")
│ .update({"is_active": False})
│ .eq("id", coupon_id)
│ .execute()
│ └── Returns True if data returned, False otherwise
└── If False → 404; else Return success dict
| Operation | Table | Columns Updated | Filters |
|---|---|---|---|
| UPDATE | coupons | is_active = False | .eq("id", coupon_id) |
None.
None.
| Error Path | Status Code | Detail |
|---|---|---|
| Auth/admin failures | 401/402/403/404/429 | Various |
| Coupon not found or already inactive | 404 | "Coupon not found or already inactive" |
| Supabase error in deactivate_coupon | Returns False → 404 | Logs error |
| Any other exception | 500 | "Internal server error" |
flowchart TD
A[DELETE /admin/coupons/coupon_id] --> B[require_admin]
B -->|Not admin| B1[403]
B -->|Admin| C{try block}
C --> D[deactivate_coupon]
D --> E[UPDATE coupons SET is_active=False WHERE id]
E --> F{Update returned data?}
F -->|No| G[404 Not found or already inactive]
F -->|Yes| H[Return success: true]
C -->|HTTPException| I[Re-raise]
C -->|Other| J[500 Internal server error]
Issue: #1737
Lists downtime incidents with optional filtering by status, severity, and environment. Admin-only endpoint requiring authentication through the full auth chain (API key -> user lookup -> admin role check).
Router: APIRouter() (no prefix)
Tags: ["admin", "monitoring"]
Auth: require_admin (Bearer token -> API key validation -> user lookup -> admin role check)
HTTP Method: GET
Return Type: dict[str, Any]
| Parameter | Type | Default | Validation | Description |
|---|---|---|---|---|
| limit | int | 50 | ge=1, le=500 | Max incidents to return |
| status | str \| None | None | regex: ^(ongoing\|resolved\|investigating)$ | Filter by status |
| severity | str \| None | None | regex: ^(low\|medium\|high\|critical)$ | Filter by severity |
| environment | str \| None | None | none | Filter by environment |
- Authorization: Bearer <api_key> (required)
{
"status": "success",
"total_incidents": 5,
"ongoing": 1,
"resolved": 4,
"incidents": [
{
"id": "uuid",
"started_at": "2026-03-01T00:00:00+00:00",
"detected_at": "2026-03-01T00:01:00+00:00",
"health_endpoint": "/health",
"error_message": "Connection refused",
"http_status_code": 503,
"status": "resolved",
"severity": "high",
"environment": "production",
"ended_at": "2026-03-01T00:15:00+00:00",
"logs_captured": [...],
"log_count": 150,
"resolved_by": "admin:user@example.com",
"notes": "Resolution notes"
}
]
}

| Status | Condition |
|---|---|
| 401 | Missing/invalid API key |
| 402 | Trial expired |
| 403 | User is not admin |
| 404 | User not found |
| 500 | Internal server error |
- list_downtime_incidents() in src/routes/downtime_logs.py (line 33-79)
- require_admin from src/security/deps.py (FastAPI Depends)
- get_recent_incidents() from src/db/downtime_incidents.py
- require_admin() -> get_current_user() -> get_api_key() -> HTTPBearer()
- get_current_user() calls get_user() from src/services/user_lookup_cache.py
- get_current_user() calls validate_trial_expiration() from src/utils/trial_utils.py
- require_admin() checks user.get("is_admin", False) or user.get("role") == "admin"
- On failure: logs via audit_logger.log_security_violation() and raises 403
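The final admin check in this chain is a single boolean expression over the user dict. The sketch below isolates it for illustration. `is_admin_user` and `ensure_admin` are hypothetical names, and `PermissionError` stands in for the HTTP 403 that the real dependency raises.

```python
def is_admin_user(user):
    """True if either the is_admin flag or the admin role is set."""
    return bool(user.get("is_admin", False) or user.get("role") == "admin")


def ensure_admin(user):
    """Return the user unchanged, or fail the way a 403 would."""
    if not is_admin_user(user):
        # The real dependency also writes an audit log entry at this point.
        raise PermissionError("Administrator privileges required")
    return user


flag_admin = is_admin_user({"is_admin": True, "role": "user"})
role_admin = is_admin_user({"role": "admin"})
plain_user = is_admin_user({"role": "user"})
```

Either condition alone is sufficient, so revoking admin access means clearing both the flag and the role.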
- Calls execute_with_retry(_get_recent, max_retries=2, retry_delay=0.2)
- _get_recent(client) builds Supabase query:
  - Table: downtime_incidents
  - Operation: SELECT *
  - Optional filters: .eq("status", status), .eq("severity", severity), .eq("environment", environment)
  - Order: .order("started_at", desc=True)
  - Limit: .limit(limit)
- Retries up to max_retries (2) with retry_delay (0.2s) between attempts
- Passes Supabase client to the operation callable
- Handles connection errors with retry logic
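The retry behaviour described above follows a common pattern: call the operation with a client, and on a connection error wait briefly and try again. The sketch below shows that pattern under stated assumptions and is not the body of execute_with_retry(). The name `retry_call`, the reading of max_retries as "retries after the first attempt", the use of `ConnectionError`, and the injectable `sleep` are all illustrative choices.

```python
import time


def retry_call(operation, client, max_retries=2, retry_delay=0.2, sleep=time.sleep):
    """Call operation(client), retrying on ConnectionError with a fixed delay."""
    last_error = None
    # One initial attempt plus up to max_retries further attempts.
    for attempt in range(max_retries + 1):
        try:
            return operation(client)
        except ConnectionError as exc:
            last_error = exc
            if attempt < max_retries:
                sleep(retry_delay)
    raise last_error


attempts = []


def flaky_query(client):
    """Fail twice, then succeed, to exercise the retry path."""
    attempts.append(client)
    if len(attempts) < 3:
        raise ConnectionError("temporary failure")
    return ["incident-row"]


rows = retry_call(flaky_query, "fake-client", sleep=lambda seconds: None)
```

With a fixed 0.2s delay and two retries, the added latency in the worst case is small compared with the 55s request timeout in the middleware pipeline.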
| Table | Operation | Columns | Filters | Order | Limit |
|---|---|---|---|---|---|
| downtime_incidents | SELECT | * | Optional: status, severity, environment (all .eq()) | started_at DESC | limit param (default 50, max 500) |
Retry config: max_retries=2, retry_delay=0.2s
None directly. The get_user() call in the auth chain uses user_lookup_cache which may involve Redis caching.
None directly emitted by this endpoint. The auth middleware pipeline may increment standard request metrics.
None. Uses FastAPI Query() parameter validation with regex patterns. Return type is dict[str, Any].
- Standard middleware pipeline (sentry, observability, timeout, security, gzip, trace)
- Subject to ConcurrencyMiddleware
- Authentication via require_admin dependency injection chain:
  - HTTPBearer() extracts Bearer token
  - get_api_key() validates API key (format, active status, expiration, IP allowlist, domain restrictions)
  - get_current_user() looks up user and validates trial expiration
  - require_admin() checks admin role
| Exception | Status | Handler |
|---|---|---|
| HTTPException (from auth chain) | 401/402/403/404 | Re-raised at line 75-76 |
| Generic Exception | 500 | Caught at line 77-79, logged with exc_info=True, raises HTTPException(500, "Internal server error") |
Auth chain error paths:
- Missing credentials -> 401
- Invalid/inactive/expired API key -> 401
- Rate limited key -> 429
- IP/domain restriction -> 403
- User not found -> 404
- Trial expired -> 402
- Not admin -> 403
flowchart TD
A[GET /admin/downtime/incidents] --> B[require_admin dependency]
B --> C[get_current_user]
C --> D[get_api_key - validate Bearer token]
D --> E{API key valid?}
E -->|No| F[401/403/429 HTTPException]
E -->|Yes| G[get_user from cache]
G --> H{User found?}
H -->|No| I[404 HTTPException]
H -->|Yes| J[validate_trial_expiration]
J --> K{Trial expired?}
K -->|Yes| L[402 HTTPException]
K -->|No| M{is_admin or role==admin?}
M -->|No| N[403 HTTPException]
M -->|Yes| O[Execute handler]
O --> P[get_recent_incidents from Supabase]
P --> Q[SELECT * FROM downtime_incidents with filters]
Q --> R[execute_with_retry max_retries=2]
R --> S[Calculate summary: total, ongoing, resolved counts]
S --> T[Return success response]
O -->|Exception| U[Log error, raise 500]
list_downtime_incidents()
├── src/security/deps.py::require_admin (Depends)
│ ├── get_current_user()
│ │ ├── get_api_key() -> validate_api_key_security()
│ │ │ └── src/security/security.py
│ │ ├── get_user() -> src/services/user_lookup_cache.py
│ │ └── validate_trial_expiration() -> src/utils/trial_utils.py
│ └── audit_logger.log_security_violation()
├── src/db/downtime_incidents.py::get_recent_incidents()
│ └── src/config/supabase_config.py::execute_with_retry()
│ └── Supabase client -> downtime_incidents table
└── logging (stdlib)
Issue: #1738
Lists all currently ongoing downtime incidents. Admin-only endpoint. A specialized, no-parameter version of the incidents list filtered to status=ongoing.
Router: APIRouter() (no prefix)
Tags: ["admin", "monitoring"]
Auth: require_admin (Bearer token -> API key -> user -> admin check)
HTTP Method: GET
Return Type: dict[str, Any]
- Authorization: Bearer <api_key> (required)
No query parameters.
{
"status": "success",
"count": 2,
"incidents": [
{
"id": "uuid",
"started_at": "2026-03-01T00:00:00+00:00",
"detected_at": "...",
"status": "ongoing",
"severity": "high",
"environment": "production",
...
}
]
}

| Status | Condition |
|---|---|
| 401 | Missing/invalid API key |
| 402 | Trial expired |
| 403 | User is not admin |
| 404 | User not found |
| 500 | Internal server error |
- list_ongoing_incidents() in src/routes/downtime_logs.py (line 82-106)
- require_admin from src/security/deps.py (same auth chain as #1737)
- get_ongoing_incidents() from src/db/downtime_incidents.py
- Calls execute_with_retry(_get_ongoing, max_retries=2, retry_delay=0.2)
- _get_ongoing(client) builds Supabase query:
  - Table: downtime_incidents
  - Operation: SELECT *
  - Filter: .eq("status", "ongoing")
  - Order: .order("started_at", desc=True)
- Returns result.data or empty list []
- On exception: logs error, calls _maybe_log_missing_table_hint(), returns []
| Table | Operation | Columns | Filters | Order |
|---|---|---|---|---|
| downtime_incidents | SELECT | * | status = 'ongoing' | started_at DESC |
Retry config: max_retries=2, retry_delay=0.2s
None directly. Auth chain may use user lookup cache.
None directly emitted.
None. Return type is dict[str, Any].
Same as #1737: standard pipeline + ConcurrencyMiddleware + require_admin auth chain.
| Exception | Status | Handler |
|---|---|---|
| HTTPException (auth) | 401/402/403/404 | Re-raised at line 101-102 |
| Generic Exception | 500 | Logged with exc_info=True, raises HTTPException(500) at line 103-106 |
Note: If the downtime_incidents table is missing, get_ongoing_incidents() catches the error internally and returns [] (empty list) rather than raising. The handler would then return {"status": "success", "count": 0, "incidents": []}.
flowchart TD
A[GET /admin/downtime/incidents/ongoing] --> B[require_admin auth chain]
B --> C{Auth successful?}
C -->|No| D[401/402/403/404 HTTPException]
C -->|Yes| E[get_ongoing_incidents from Supabase]
E --> F[SELECT * FROM downtime_incidents WHERE status=ongoing ORDER BY started_at DESC]
F --> G[execute_with_retry max_retries=2]
G --> H[Return count + incidents list]
E -->|Exception| I[Log error, raise 500]
list_ongoing_incidents()
├── src/security/deps.py::require_admin (Depends)
│ └── (full auth chain: get_api_key -> get_current_user -> admin check)
├── src/db/downtime_incidents.py::get_ongoing_incidents()
│ └── src/config/supabase_config.py::execute_with_retry()
│ └── Supabase client -> downtime_incidents table
└── logging (stdlib)
Issue: #1739
Returns aggregated downtime statistics for a configurable time period, including total incidents, downtime duration, and breakdowns by severity and status. Admin-only endpoint.
Router: APIRouter() (no prefix)
Tags: ["admin", "monitoring"]
Auth: require_admin
HTTP Method: GET
Return Type: dict[str, Any]
| Parameter | Type | Default | Validation | Description |
|---|---|---|---|---|
| days | int | 30 | ge=1, le=365 | Number of days to analyze |
- Authorization: Bearer <api_key> (required)
{
"status": "success",
"period_days": 30,
"statistics": {
"total_incidents": 12,
"total_downtime_seconds": 3600,
"average_duration_seconds": 300,
"by_severity": {
"high": 5,
"critical": 2,
"medium": 5
},
"by_status": {
"resolved": 10,
"ongoing": 2
}
}
}

{
"status": "success",
"period_days": 30,
"statistics": {
"total_incidents": 0,
"total_downtime_seconds": 0,
"average_duration_seconds": 0,
"by_severity": {},
"by_status": {}
}
}
- get_downtime_statistics() in src/routes/downtime_logs.py (line 360-388)
- require_admin from src/security/deps.py
- get_incident_statistics(days) from src/db/downtime_incidents.py
- Calculates cutoff_dt as now() - (days * 86400) seconds
- Calls get_incidents_by_date_range(cutoff_dt, now())
- Aggregates: total count, total downtime from duration_seconds field, severity counts, status counts
- Returns stats dict
- Calls execute_with_retry(_get_by_range, max_retries=2, retry_delay=0.2)
- Supabase query:
  - Table: downtime_incidents
  - Operation: SELECT *
  - Filters: .gte("started_at", start_date.isoformat()), .lte("started_at", end_date.isoformat())
  - Order: .order("started_at", desc=True)
| Table | Operation | Columns | Filters | Order |
|---|---|---|---|---|
| downtime_incidents | SELECT | * | started_at >= cutoff_date AND started_at <= now() | started_at DESC |
Retry config: max_retries=2, retry_delay=0.2s
None directly.
None directly emitted.
None. Return type is dict[str, Any].
Same as other admin endpoints: standard pipeline + ConcurrencyMiddleware + require_admin auth chain.
| Exception | Status | Handler |
|---|---|---|
| HTTPException (auth) | 401/402/403/404 | Re-raised |
| Generic Exception | 500 | Logged, raises HTTPException(500) |
Note: get_incident_statistics() has its own internal error handling and returns a zeroed-out stats dict on failure rather than raising. So a Supabase failure would result in a 200 response with all-zero statistics.
total_downtime = sum(inc.get("duration_seconds", 0) for inc in incidents if inc.get("duration_seconds"))
average_duration = total_downtime // len(incidents) if incidents else 0
# severity_counts: count per severity value
# status_counts: count per status value
flowchart TD
A[GET /admin/downtime/statistics?days=30] --> B[require_admin auth chain]
B --> C{Auth OK?}
C -->|No| D[401/402/403/404]
C -->|Yes| E[get_incident_statistics days=30]
E --> F[Calculate cutoff_dt = now - 30 days]
F --> G[get_incidents_by_date_range cutoff_dt to now]
G --> H[SELECT * FROM downtime_incidents WHERE started_at BETWEEN dates]
H --> I{Incidents found?}
I -->|No| J[Return zeroed stats]
I -->|Yes| K[Sum total_downtime from duration_seconds]
K --> L[Count by severity and status]
L --> M[Calculate average_duration]
M --> N[Return statistics]
get_downtime_statistics()
├── src/security/deps.py::require_admin (Depends)
├── src/db/downtime_incidents.py::get_incident_statistics()
│ └── get_incidents_by_date_range()
│ └── execute_with_retry() -> Supabase downtime_incidents table
└── logging (stdlib)
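The aggregation described for get_incident_statistics() can be sketched as follows. This is a minimal illustration, not the project's code: the function name `summarize_incidents` and the `"unknown"` fallback for missing severity or status are assumptions, and only the field names (`duration_seconds`, `severity`, `status`) come from the documentation above.

```python
from collections import Counter
from typing import Any


def summarize_incidents(incidents: list[dict[str, Any]]) -> dict[str, Any]:
    """Aggregate incident rows into the statistics shape shown in the response example."""
    # Rows with a missing or None duration (e.g. ongoing incidents) add nothing to the total.
    total_downtime = sum(
        inc["duration_seconds"] for inc in incidents if inc.get("duration_seconds")
    )
    # Integer average over ALL incidents, including those without a duration.
    average_duration = total_downtime // len(incidents) if incidents else 0
    # "unknown" is a hypothetical fallback label, not taken from the source.
    by_severity = Counter(inc.get("severity", "unknown") for inc in incidents)
    by_status = Counter(inc.get("status", "unknown") for inc in incidents)
    return {
        "total_incidents": len(incidents),
        "total_downtime_seconds": total_downtime,
        "average_duration_seconds": average_duration,
        "by_severity": dict(by_severity),
        "by_status": dict(by_status),
    }
```

Because the average divides by the full incident count, ongoing incidents without a duration pull the reported average down. That may or may not be the intended behaviour.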
Issue: #1740
Retrieves full details of a specific downtime incident by UUID, including captured logs and metadata. Admin-only endpoint.
Router: APIRouter() (no prefix)
Tags: ["admin", "monitoring"]
Auth: require_admin
HTTP Method: GET
Return Type: dict[str, Any]
| Parameter | Type | Description |
|---|---|---|
| incident_id | str | UUID of the incident |

- Authorization: Bearer <api_key> (required)
{
"status": "success",
"incident": {
"id": "uuid",
"started_at": "2026-03-01T00:00:00+00:00",
"detected_at": "2026-03-01T00:01:00+00:00",
"ended_at": "2026-03-01T00:15:00+00:00",
"health_endpoint": "/health",
"error_message": "Connection refused",
"http_status_code": 503,
"response_body": "...",
"status": "resolved",
"severity": "high",
"environment": "production",
"logs_captured": [...],
"log_count": 150,
"logs_file_path": null,
"resolved_by": "admin:user@example.com",
"notes": "Resolution notes",
"server_info": {},
"metrics_snapshot": {}
}
}
| Status | Condition |
|---|---|
| 401 | Missing/invalid API key |
| 403 | Not admin |
| 404 | Incident not found |
| 500 | Internal server error |
- get_downtime_incident() in src/routes/downtime_logs.py (line 109-139)
- require_admin from src/security/deps.py
- get_incident(incident_id) from src/db/downtime_incidents.py
- Calls execute_with_retry(_get_incident, max_retries=2, retry_delay=0.2)
- _get_incident(client):
  - Table: downtime_incidents
  - Operation: SELECT *
  - Filter: .eq("id", str(incident_id))
- Returns first row or None
- On exception: logs error, calls _maybe_log_missing_table_hint(), returns None
| Table | Operation | Columns | Filters |
|---|---|---|---|
| downtime_incidents | SELECT | * | id = incident_id |
Retry config: max_retries=2, retry_delay=0.2s
None directly.
None directly emitted.
None.
Standard pipeline + ConcurrencyMiddleware + require_admin auth chain.
| Exception | Status | Handler |
|---|---|---|
| Auth chain failures | 401/402/403/404 | Re-raised |
| get_incident() returns None | 404 | HTTPException(404, "Incident not found") at line 128 |
| HTTPException (any) | varies | Re-raised at line 135-136 |
| Generic Exception | 500 | Logged, raises HTTPException(500) at line 137-139 |
Note: If get_incident() fails due to missing table/Supabase error, it returns None internally (does not raise), which the handler interprets as 404.
flowchart TD
A["GET /admin/downtime/incidents/{incident_id}"] --> B[require_admin auth chain]
B --> C{Auth OK?}
C -->|No| D[401/402/403/404]
C -->|Yes| E[get_incident from Supabase]
E --> F["SELECT * FROM downtime_incidents WHERE id = incident_id"]
F --> G{Incident found?}
G -->|No| H[404 Incident not found]
G -->|Yes| I[Return success with incident data]
E -->|Exception| J[Log error, raise 500]
get_downtime_incident()
├── src/security/deps.py::require_admin (Depends)
├── src/db/downtime_incidents.py::get_incident()
│ └── execute_with_retry() -> Supabase downtime_incidents table
└── logging (stdlib)
Issue: #1741
Retrieves and filters captured logs for a specific downtime incident. Supports filtering by log level, logger name, and full-text search. Admin-only endpoint.
Router: APIRouter() (no prefix)
Tags: ["admin", "monitoring"]
Auth: require_admin
HTTP Method: GET
Return Type: dict[str, Any]
| Parameter | Type | Description |
|---|---|---|
| incident_id | str | UUID of the incident |

| Parameter | Type | Default | Validation | Description |
|---|---|---|---|---|
| level | str \| None | None | regex: ^(ERROR\|WARNING\|INFO\|DEBUG)$ | Filter by log level |
| logger_name | str \| None | None | none | Filter by logger name (e.g. src.routes.chat) |
| search | str \| None | None | none | Case-insensitive search in log messages |

- Authorization: Bearer <api_key> (required)
{
"status": "success",
"total_logs": 25,
"total_captured": 150,
"filters": {
"level": "ERROR",
"logger": null,
"search": null
},
"logs": [
{
"timestamp": "2026-03-01T00:05:00+00:00",
"level": "ERROR",
"logger": "src.routes.chat",
"message": "Provider timeout after 30s",
"labels": {...}
}
]
}
{
"status": "success",
"message": "No logs captured for this incident",
"total_logs": 0,
"logs": []
}
- get_incident_logs() in src/routes/downtime_logs.py (line 142-206)
- require_admin from src/security/deps.py
- get_incident() from src/db/downtime_incidents.py
- get_filtered_logs() from src/services/downtime_log_capture.py

Pure in-memory filtering function (no I/O):
- If level provided: filter where log.get("level") == level
- If logger_name provided: filter where log.get("logger") == logger_name
- If search_term provided: filter where search_term.lower() in log.get("message", "").lower()
- Returns filtered list
| Table | Operation | Columns | Filters |
|---|---|---|---|
| downtime_incidents | SELECT | * | id = incident_id |
None.
None.
None.
Standard pipeline + ConcurrencyMiddleware + require_admin auth chain.
| Exception | Status | Handler |
|---|---|---|
| Auth chain failures | 401/402/403/404 | Re-raised |
| Incident not found (get_incident() returns None) | 404 | HTTPException(404, "Incident not found") |
| No logs_captured in incident | 200 | Returns {"total_logs": 0, "logs": []} (not an error) |
| HTTPException (any) | varies | Re-raised |
| Generic Exception | 500 | Logged, raises HTTPException(500) |
# Applied sequentially - all filters are AND conditions
filtered = logs
if level: filtered = [l for l in filtered if l.get("level") == level]
if logger_name: filtered = [l for l in filtered if l.get("logger") == logger_name]
if search: filtered = [l for l in filtered if search.lower() in l.get("message","").lower()]
flowchart TD
A["GET /admin/downtime/incidents/{id}/logs"] --> B[require_admin auth]
B --> C{Auth OK?}
C -->|No| D[401/402/403/404]
C -->|Yes| E[get_incident from Supabase]
E --> F{Incident found?}
F -->|No| G[404 Incident not found]
F -->|Yes| H[Get logs_captured from incident]
H --> I{Logs exist?}
I -->|No| J[Return total_logs=0, empty logs array]
I -->|Yes| K[get_filtered_logs with level, logger_name, search]
K --> L[Apply level filter if provided]
L --> M[Apply logger_name filter if provided]
M --> N[Apply search filter if provided - case insensitive]
N --> O[Return filtered logs with counts and filter metadata]
get_incident_logs()
├── src/security/deps.py::require_admin (Depends)
├── src/db/downtime_incidents.py::get_incident()
│ └── execute_with_retry() -> Supabase downtime_incidents table
├── src/services/downtime_log_capture.py::get_filtered_logs()
│ └── (pure in-memory filtering, no external deps)
└── logging (stdlib)
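The sequential AND filtering described for get_filtered_logs() can be sketched as a standalone function. This is an illustrative sketch under assumptions: the name `filter_logs` is hypothetical, and only the three filter rules and the log field names (`level`, `logger`, `message`) are taken from the description above.

```python
from typing import Any, Optional


def filter_logs(
    logs: list[dict[str, Any]],
    level: Optional[str] = None,
    logger_name: Optional[str] = None,
    search_term: Optional[str] = None,
) -> list[dict[str, Any]]:
    """Apply each provided filter in turn; a log must pass all of them (AND)."""
    filtered = logs
    if level:
        # Exact match on the level string, e.g. "ERROR".
        filtered = [log for log in filtered if log.get("level") == level]
    if logger_name:
        # Exact match on the logger name, not a prefix match.
        filtered = [log for log in filtered if log.get("logger") == logger_name]
    if search_term:
        # Case-insensitive substring match on the message text.
        needle = search_term.lower()
        filtered = [log for log in filtered if needle in log.get("message", "").lower()]
    return filtered
```

With no filters supplied the input list is returned unchanged, which matches a response where total_logs equals total_captured.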
Issue: #1742
Analyzes captured logs for a downtime incident, providing error statistics and patterns including error counts, warning counts, error type distribution, and top error messages. Admin-only endpoint.
Router: APIRouter() (no prefix)
Tags: ["admin", "monitoring"]
Auth: require_admin
HTTP Method: GET
Return Type: dict[str, Any]
| Parameter | Type | Description |
|---|---|---|
| incident_id | str | UUID of the incident |

- Authorization: Bearer <api_key> (required)
{
"status": "success",
"incident_id": "uuid",
"analysis": {
"total_logs": 150,
"error_count": 25,
"warning_count": 40,
"error_types": {
"ConnectionError": 10,
"TimeoutError": 8,
"Unknown": 7
},
"top_errors": [
["Provider timeout after 30s", 8],
["Connection refused to database", 6],
["Redis connection lost", 3]
]
}
}
{
"status": "success",
"message": "No logs to analyze",
"analysis": null
}
- analyze_incident_logs() in src/routes/downtime_logs.py (line 209-255)
- require_admin from src/security/deps.py
- get_incident() from src/db/downtime_incidents.py
- analyze_logs_for_errors() from src/services/downtime_log_capture.py

Pure in-memory analysis function:
- Filters errors = logs where level == "ERROR"
- Filters warnings = logs where level == "WARNING"
- Counts error types from error_type field (default "Unknown")
- Counts error messages (truncated to 200 chars)
- Sorts top 10 errors by count descending
- Returns analysis dict
| Table | Operation | Columns | Filters |
|---|---|---|---|
| downtime_incidents | SELECT | * | id = incident_id |
None.
None.
None.
Standard pipeline + ConcurrencyMiddleware + require_admin auth chain.
| Exception | Status | Handler |
|---|---|---|
| Auth chain failures | 401/402/403/404 | Re-raised |
| Incident not found | 404 | HTTPException(404, "Incident not found") |
| No logs in incident | 200 | Returns {"analysis": null, "message": "No logs to analyze"} |
| HTTPException (any) | varies | Re-raised |
| Generic Exception | 500 | Logged, raises HTTPException(500) |
errors = [log for log in logs if log.get("level") == "ERROR"]
warnings = [log for log in logs if log.get("level") == "WARNING"]
# Count by error_type field
error_types = {} # {"ConnectionError": 10, "TimeoutError": 8}
# Count by message (truncated to 200 chars)
error_messages = {} # {"msg": count}
top_errors = sorted(error_messages.items(), key=lambda item: item[1], reverse=True)[:10]
flowchart TD
A["GET /admin/downtime/incidents/{id}/analysis"] --> B[require_admin auth]
B --> C{Auth OK?}
C -->|No| D[401/402/403/404]
C -->|Yes| E[get_incident from Supabase]
E --> F{Incident found?}
F -->|No| G[404 Incident not found]
F -->|Yes| H[Get logs_captured from incident]
H --> I{Logs exist?}
I -->|No| J["Return analysis=null, message='No logs to analyze'"]
I -->|Yes| K[analyze_logs_for_errors]
K --> L[Filter ERROR level logs]
K --> M[Filter WARNING level logs]
K --> N[Count error_types]
K --> O[Count and rank top 10 error messages]
L --> P[Return analysis dict]
M --> P
N --> P
O --> P
analyze_incident_logs()
├── src/security/deps.py::require_admin (Depends)
├── src/db/downtime_incidents.py::get_incident()
│ └── execute_with_retry() -> Supabase downtime_incidents table
├── src/services/downtime_log_capture.py::analyze_logs_for_errors()
│ └── (pure in-memory analysis, no external deps)
└── logging (stdlib)
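The analysis steps listed for analyze_logs_for_errors() can be sketched with `collections.Counter`. This is a hedged illustration: the function name `analyze_logs` is hypothetical, and counting `error_type` only over ERROR-level logs is an assumption, since the source does not say which subset is counted.

```python
from collections import Counter
from typing import Any


def analyze_logs(logs: list[dict[str, Any]]) -> dict[str, Any]:
    """Summarise ERROR/WARNING counts, error types, and the most frequent error messages."""
    errors = [log for log in logs if log.get("level") == "ERROR"]
    warnings = [log for log in logs if log.get("level") == "WARNING"]
    # Assumption: error types are tallied over ERROR-level logs only.
    error_types = Counter(log.get("error_type", "Unknown") for log in errors)
    # Messages are truncated to 200 characters before counting, as documented.
    error_messages = Counter(log.get("message", "")[:200] for log in errors)
    return {
        "total_logs": len(logs),
        "error_count": len(errors),
        "warning_count": len(warnings),
        "error_types": dict(error_types),
        # most_common(10) yields (message, count) pairs, highest count first.
        "top_errors": error_messages.most_common(10),
    }
```

The `(message, count)` tuples serialise to two-element JSON arrays, which is the shape shown in the `top_errors` response example.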
Issue: #1743
Manually triggers log capture from Grafana Loki for an ongoing downtime incident. Queries Loki for logs from 5 minutes before the incident started to current time, and stores them in the database. Admin-only endpoint.
Router: APIRouter() (no prefix)
Tags: ["admin", "monitoring"]
Auth: require_admin
HTTP Method: POST
Return Type: dict[str, Any]
| Parameter | Type | Description |
|---|---|---|
| incident_id | str | UUID of the incident |

- Authorization: Bearer <api_key> (required)
No request body.
{
"status": "success",
"message": "Log capture triggered",
"result": {
"success": true,
"log_count": 250,
"truncated": false,
"storage": "database"
}
}
{
"status": "success",
"message": "Log capture triggered",
"result": {
"success": false,
"log_count": 0,
"error": "No logs found in Loki"
}
}
- trigger_log_capture() in src/routes/downtime_logs.py (line 258-304)
- require_admin from src/security/deps.py
- get_incident() from src/db/downtime_incidents.py
- capture_logs_for_ongoing_incident() from src/services/downtime_log_capture.py
- Delegates to capture_downtime_logs(incident_id, downtime_start, downtime_end=None, save_to_file=False)
- Calculates time range: start = downtime_start - 5 minutes, end = now() (ongoing)
- Calls query_loki_logs(start, end) to fetch logs from Grafana Loki
- If save_to_file=False (default for manual capture):
  - Truncates to MAX_LOGS_TO_CAPTURE (10,000)
  - Calls update_incident(incident_id, logs_captured=logs) to save to database
- Checks Config.LOKI_ENABLED - returns [] if disabled
- Checks Config.LOKI_QUERY_URL - returns [] if not set
- Makes HTTP GET to {LOKI_QUERY_URL}/loki/api/v1/query_range with:
  - query: {app="gatewayz-api"}
  - start: nanosecond timestamp
  - end: nanosecond timestamp
  - limit: 10,000
  - direction: forward (chronological)
- Auth: Basic auth with GRAFANA_LOKI_USERNAME / GRAFANA_LOKI_API_KEY if configured
- Uses httpx.Client (sync) with timeout=30.0
- Parses Loki stream response, extracts timestamps and log lines (JSON or plain text)
- Builds update dict with logs_captured and log_count
- Supabase: UPDATE downtime_incidents SET logs_captured=..., log_count=... WHERE id=incident_id
- Retry config: max_retries=2, retry_delay=0.2s
| Table | Operation | Columns | Filters | Notes |
|---|---|---|---|---|
| downtime_incidents | SELECT | * | id = incident_id | Get incident details |
| downtime_incidents | UPDATE | logs_captured, log_count | id = incident_id | Store captured logs |
None.
None directly emitted.
| Service | Method | URL | Auth | Timeout |
|---|---|---|---|---|
| Grafana Loki | GET | {LOKI_QUERY_URL}/loki/api/v1/query_range | Basic (GRAFANA_LOKI_USERNAME/GRAFANA_LOKI_API_KEY) | 30s |

| Param | Value |
|---|---|
| query | {app="gatewayz-api"} |
| start | (incident_started_at - 5min) in nanoseconds |
| end | now() in nanoseconds |
| limit | 10,000 |
| direction | forward |

| Config | Env Var | Description |
|---|---|---|
| Config.LOKI_ENABLED | LOKI_ENABLED | Must be truthy for log capture to work |
| Config.LOKI_QUERY_URL | LOKI_QUERY_URL | Loki query endpoint base URL |
| Config.GRAFANA_LOKI_USERNAME | GRAFANA_LOKI_USERNAME | Basic auth username (optional) |
| Config.GRAFANA_LOKI_API_KEY | GRAFANA_LOKI_API_KEY | Basic auth password (optional) |

| Exception | Status | Handler |
|---|---|---|
| Auth chain failures | 401/402/403/404 | Re-raised |
| Incident not found | 404 | HTTPException(404, "Incident not found") |
| Incident not ongoing | 400 | HTTPException(400, "Can only capture logs for ongoing incidents") |
| HTTPException (any) | varies | Re-raised |
| Generic Exception | 500 | Logged, raises HTTPException(500) |
| Loki query fails | 200 | Returns {"result": {"success": false, "error": "..."}} (handled internally) |

| Constant | Value | Description |
|---|---|---|
| PRE_DOWNTIME_MINUTES | 5 | Minutes before incident to capture |
| POST_DOWNTIME_MINUTES | 5 | Minutes after incident to capture |
| MAX_LOGS_TO_CAPTURE | 10,000 | Max log entries to store |
flowchart TD
A["POST /admin/downtime/incidents/{id}/capture-logs"] --> B[require_admin auth]
B --> C{Auth OK?}
C -->|No| D[401/402/403/404]
C -->|Yes| E[get_incident from Supabase]
E --> F{Incident found?}
F -->|No| G[404 Incident not found]
F -->|Yes| H{Status == ongoing?}
H -->|No| I[400 Can only capture logs for ongoing incidents]
H -->|Yes| J[Parse started_at from incident]
J --> K[capture_logs_for_ongoing_incident]
K --> L[Calculate time range: started_at - 5min to now]
L --> M{LOKI_ENABLED?}
M -->|No| N["Return success=false, error='Loki not enabled'"]
M -->|Yes| O[HTTP GET Loki /loki/api/v1/query_range]
O --> P{Logs found?}
P -->|No| Q["Return success=false, 'No logs found'"]
P -->|Yes| R[Truncate to 10,000 max]
R --> S[UPDATE downtime_incidents SET logs_captured, log_count]
S --> T[Return success with log_count]
trigger_log_capture()
├── src/security/deps.py::require_admin (Depends)
├── src/db/downtime_incidents.py::get_incident()
│ └── execute_with_retry() -> Supabase downtime_incidents table (SELECT)
├── src/services/downtime_log_capture.py::capture_logs_for_ongoing_incident()
│ └── capture_downtime_logs()
│ ├── query_loki_logs() -> HTTP GET Grafana Loki API
│ │ ├── Config.LOKI_ENABLED
│ │ ├── Config.LOKI_QUERY_URL
│ │ ├── Config.GRAFANA_LOKI_USERNAME
│ │ ├── Config.GRAFANA_LOKI_API_KEY
│ │ └── httpx.Client (sync, timeout=30s)
│ └── update_incident() -> Supabase downtime_incidents table (UPDATE)
│ └── execute_with_retry()
├── datetime (stdlib)
└── logging (stdlib)
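The query window and parameters sent to Loki's query_range endpoint can be sketched without making a network call. This is a minimal sketch under assumptions: the helper names `to_unix_nanos` and `build_loki_params` are hypothetical, and the conversion uses integer arithmetic to avoid float rounding in the nanosecond timestamps. The constants and parameter values are those documented above.

```python
from datetime import datetime, timedelta, timezone

PRE_DOWNTIME_MINUTES = 5
MAX_LOGS_TO_CAPTURE = 10_000
_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_unix_nanos(dt: datetime) -> int:
    """Convert a timezone-aware datetime to integer nanoseconds since the Unix epoch."""
    delta = dt - _EPOCH
    whole_seconds = delta.days * 86_400 + delta.seconds
    return whole_seconds * 1_000_000_000 + delta.microseconds * 1_000


def build_loki_params(downtime_start: datetime, now: datetime) -> dict[str, str]:
    """Build query_range parameters for an ongoing incident (end of window = now)."""
    start = downtime_start - timedelta(minutes=PRE_DOWNTIME_MINUTES)
    return {
        "query": '{app="gatewayz-api"}',
        "start": str(to_unix_nanos(start)),
        "end": str(to_unix_nanos(now)),
        "limit": str(MAX_LOGS_TO_CAPTURE),
        "direction": "forward",  # chronological order
    }
```

Such a dict could be passed as the `params` argument of an HTTP GET. Authentication and response parsing are left out of this sketch.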
Issue: #1744
Manually resolves a downtime incident, setting its status to "resolved", recording the resolution timestamp, and storing the resolving admin's identity and optional notes. Admin-only endpoint.
Router: APIRouter() (no prefix)
Tags: ["admin", "monitoring"]
Auth: require_admin
HTTP Method: POST
Return Type: dict[str, Any]
| Parameter | Type | Description |
|---|---|---|
| incident_id | str | UUID of the incident |

| Parameter | Type | Default | Description |
|---|---|---|---|
| notes | str \| None | None | Optional resolution notes |

- Authorization: Bearer <api_key> (required)
{
"status": "success",
"message": "Incident resolved",
"incident": {
"id": "uuid",
"status": "resolved",
"ended_at": "2026-03-04T12:00:00+00:00",
"resolved_by": "admin:user@example.com",
"notes": "Fixed database connection pool"
}
}
- resolve_downtime_incident() in src/routes/downtime_logs.py (line 307-357)
- require_admin from src/security/deps.py
- get_incident() from src/db/downtime_incidents.py
- resolve_incident() from src/db/downtime_incidents.py
- Delegates to update_incident() with:
  - ended_at=datetime.now(UTC)
  - status="resolved"
  - resolved_by=resolved_by
  - notes=notes
- Builds update dict from provided fields
- Calls execute_with_retry(_update_incident, max_retries=2, retry_delay=0.2)
- Supabase: UPDATE downtime_incidents SET ended_at, status, resolved_by, notes WHERE id=incident_id

resolved_by = f"admin:{admin_user.get('email', admin_user.get('id'))}"
Uses admin's email, falling back to user ID.
| Table | Operation | Columns Updated | Filters |
|---|---|---|---|
| downtime_incidents | SELECT | * | id = incident_id (get incident) |
| downtime_incidents | UPDATE | ended_at, status, resolved_by, notes | id = incident_id |
None.
None.
None.
Standard pipeline + ConcurrencyMiddleware + require_admin auth chain.
| Exception | Status | Handler |
|---|---|---|
| Auth chain failures | 401/402/403/404 | Re-raised |
| Incident not found | 404 | HTTPException(404, "Incident not found") |
| Incident already resolved | 400 | HTTPException(400, "Incident is already resolved") |
| HTTPException (any) | varies | Re-raised |
| Generic Exception | 500 | Logged, raises HTTPException(500) |
flowchart TD
A["POST /admin/downtime/incidents/{id}/resolve"] --> B[require_admin auth]
B --> C{Auth OK?}
C -->|No| D[401/402/403/404]
C -->|Yes| E[get_incident from Supabase]
E --> F{Incident found?}
F -->|No| G[404 Incident not found]
F -->|Yes| H{Status == resolved?}
H -->|Yes| I[400 Incident is already resolved]
H -->|No| J["Build resolved_by = admin:{email or id}"]
J --> K[resolve_incident -> update_incident]
K --> L["UPDATE downtime_incidents SET ended_at=now, status=resolved, resolved_by, notes WHERE id=..."]
L --> M[Return success with updated incident]
resolve_downtime_incident()
├── src/security/deps.py::require_admin (Depends)
├── src/db/downtime_incidents.py::get_incident()
│ └── execute_with_retry() -> Supabase SELECT
├── src/db/downtime_incidents.py::resolve_incident()
│ └── update_incident()
│ └── execute_with_retry() -> Supabase UPDATE downtime_incidents
├── datetime (stdlib)
└── logging (stdlib)
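How the resolution fields fit together can be sketched as a pure function. This is an illustration only: the name `build_resolution_update` is hypothetical, and omitting `notes` when it is None is an assumption about what "builds update dict from provided fields" means. The `resolved_by` expression is the one quoted above.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def build_resolution_update(
    admin_user: dict[str, Any], notes: Optional[str], now: datetime
) -> dict[str, Any]:
    """Assemble the column values written when an incident is resolved."""
    # dict.get only falls back to 'id' when the 'email' key is absent,
    # not when it is present with a None value.
    resolved_by = f"admin:{admin_user.get('email', admin_user.get('id'))}"
    update: dict[str, Any] = {
        "ended_at": now.isoformat(),
        "status": "resolved",
        "resolved_by": resolved_by,
    }
    if notes is not None:  # assumption: unset notes are left out of the update
        update["notes"] = notes
    return update
```

One consequence of the quoted expression: a user record carrying `"email": None` produces `resolved_by = "admin:None"` instead of falling back to the ID. Writing `admin_user.get("email") or admin_user.get("id")` would cover that case if it can occur.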
5 endpoints
Issue: #1631
This endpoint accepts a single analytics event from the frontend and forwards it to both Statsig and PostHog analytics platforms. It is designed to avoid ad-blocker interference with client-side analytics by routing events through the backend. Authentication is optional — authenticated users have their user ID resolved from the token, while unauthenticated requests use a caller-provided user_id or fall back to "anonymous".
Authentication: Depends(get_current_user) with current_user: dict | None — optional auth. The dependency resolves to None if no valid credentials are provided (non-fatal). Note: despite using get_current_user, this is effectively optional because the parameter type allows None.
Request Schema (AnalyticsEvent):
{
"event_name": str, // Required. Event name (e.g., "chat_message_sent")
"user_id": str | null, // Optional. Used if not authenticated
"value": str | null, // Optional. Event value
"metadata": dict[str, Any] | null // Optional. Event metadata
}
User ID resolution logic:
- If current_user is authenticated: user_id = str(current_user.get("user_id", "anonymous"))
- Else if event.user_id provided: user_id = event.user_id
- Else: user_id = "anonymous"
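The three-step resolution order above can be sketched as a small helper. The function name `resolve_user_id` is hypothetical, and treating any non-empty `current_user` dict as "authenticated" is an assumption.

```python
from typing import Any, Optional


def resolve_user_id(current_user: Optional[dict[str, Any]], event_user_id: Optional[str]) -> str:
    """Pick the user ID for an analytics event: token first, then payload, then anonymous."""
    if current_user:
        # The authenticated identity always wins over a caller-supplied user_id.
        return str(current_user.get("user_id", "anonymous"))
    if event_user_id:
        return event_user_id
    return "anonymous"
```

In this ordering a caller-supplied `user_id` is only honoured for unauthenticated requests, so an authenticated client cannot attribute a single event to another user.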
Response (200 OK):
{
"success": true,
"message": "Event '{event_name}' logged successfully"
}
Error codes:
| Code | Condition |
|---|---|
| 500 | statsig_service.log_event or posthog_service.capture raises exception |
flowchart TD
A([POST /v1/analytics/events]) --> B[get_current_user optional auth]
B -->|no/invalid creds| C[current_user = None]
B -->|valid creds| D[current_user = user dict]
C --> E{current_user set?}
D --> E
E -->|yes| F[user_id = str current_user.user_id or anonymous]
E -->|no| G{event.user_id set?}
G -->|yes| H[user_id = event.user_id]
G -->|no| I[user_id = anonymous]
F --> J[statsig_service.log_event\nuser_id, event_name, value, metadata]
H --> J
I --> J
J --> K[posthog_service.capture\ndistinct_id=user_id\nevent=event_name\nproperties=metadata]
K -->|exception| L[logger.error\nHTTP 500]
K -->|OK| M[Return 200 success]
| Dependency | File | Operation | Details |
|---|---|---|---|
| get_current_user | src/security/deps.py:192 | Optional auth | Returns user dict or raises (but type hint allows None in analytics route context). Actually get_current_user raises if user not found; the analytics route accepts `dict |
| statsig_service | src/services/statsig_service.py | Event logging | Singleton StatsigService instance. log_event(user_id, event_name, value, metadata) creates StatsigUser with user_id, calls Statsig.log_event(). Requires STATSIG_SERVER_SECRET_KEY env var. Falls back to logging-only if SDK unavailable. |
| posthog_service | src/services/posthog_service.py | Event capture | Singleton PostHogService instance. capture(distinct_id, event, properties) calls PostHog Python SDK client.capture(). Requires POSTHOG_API_KEY env var. Uses async mode (sync_mode=False). Falls back gracefully if not initialized. |
| StatsigService.log_event | src/services/statsig_service.py | External API | Batches events via statsig-python-core SDK. Flush interval: 10s. Max queue size: 50. |
| PostHogService.capture | src/services/posthog_service.py | External API | Async PostHog capture. SDK: posthog Python package. Host: POSTHOG_HOST (default: https://us.i.posthog.com). |
External API calls:
- Statsig: Event batched locally, flushed to https://api.statsig.com every 10 seconds or when queue reaches 50 events
- PostHog: Event sent asynchronously to POSTHOG_HOST (default: https://us.i.posthog.com)

Environment variables required:
- STATSIG_SERVER_SECRET_KEY: Required for Statsig. Missing = service logs warning, operates in logging-only fallback.
- POSTHOG_API_KEY: Required for PostHog. Missing = service disabled with warning log.
- POSTHOG_HOST: Optional, defaults to https://us.i.posthog.com

- No database writes.
- No Redis operations.
- External API calls (async/batched):
  - Statsig: event queued locally, flushed in background
  - PostHog: event sent asynchronously
- No direct Prometheus metrics.
- No audit log (analytics endpoint does not call audit_logger.log_api_key_usage).
- Graceful degradation: Both analytics services fail silently (logging warnings) if not configured. The endpoint will still return 200 in those cases since exceptions would only be raised from statsig_service.log_event or posthog_service.capture, both of which have try/except internally that may or may not re-raise.
Issue: #1632
This endpoint accepts a batch of analytics events in a single request and forwards each one to both Statsig and PostHog sequentially. It is the preferred method when the frontend needs to log multiple events at once (e.g., on page unload, after a session, or when catching up on buffered events). Authentication is optional; the authenticated user's ID is used as the default for all events in the batch, while individual events can override their user_id field.
Authentication & Authorization:
- Optional authentication. Uses get_current_user dependency.
- Unauthenticated requests are accepted; user_id defaults to "anonymous" unless overridden per event.
- If authenticated, the user's ID serves as the default for any event that does not specify its own user_id.
Request Schema:
[
{
"event_name": "chat_message_sent",
"user_id": null,
"value": null,
"metadata": { "model": "openai/gpt-4o" }
},
{
"event_name": "model_selected",
"user_id": "override-user-456",
"value": "openai/gpt-4o",
"metadata": {}
}
]
Schema: list[AnalyticsEvent] (each item is AnalyticsEvent from src/routes/analytics.py).
User ID Resolution (per event):
- Uses event.user_id if set, otherwise falls back to the authenticated user's ID or "anonymous".
Response Schema:
{
"success": true,
"message": "3 events logged successfully"
}
Error Codes:
| Code | Condition |
|---|---|
| 500 | Any Statsig or PostHog service call failure |
sequenceDiagram
participant C as Client (Frontend)
participant R as Route Handler<br/>log_batch_events()
participant Auth as get_current_user (optional)
participant Statsig as Statsig Service
participant PostHog as PostHog Service
C->>R: POST /v1/analytics/batch [ {event_name, ...}, ... ]
R->>Auth: Depends(get_current_user)
alt Authenticated
Auth-->>R: current_user
R->>R: default user_id = str(current_user["user_id"])
else Not authenticated
Auth-->>R: None
R->>R: default user_id = "anonymous"
end
loop For each event in events list
R->>R: event_user_id = event.user_id or default user_id
R->>Statsig: statsig_service.log_event(<br/>user_id=event_user_id,<br/>event_name, value, metadata)
Statsig-->>R: OK
R->>PostHog: posthog_service.capture(<br/>distinct_id=event_user_id,<br/>event=event_name, properties=metadata)
PostHog-->>R: OK
end
R-->>C: 200 { success: true, message: "N events logged successfully" }
| Category | Name | Location | Purpose |
|---|---|---|---|
| Route file | analytics.py | src/routes/analytics.py | Handler |
| Auth | get_current_user | src/security/deps.py | Optional user identification |
| Schema | AnalyticsEvent | src/routes/analytics.py | Per-event model |
| Service | statsig_service | src/services/statsig_service.py | Statsig event logging |
| Service | posthog_service | src/services/posthog_service.py | PostHog event capture |
| External | Statsig | SaaS | Analytics / feature flags |
| External | PostHog | SaaS | Product analytics |
| Framework | FastAPI, APIRouter, Depends, HTTPException | fastapi | HTTP layer |
| Logging | logging | stdlib | Error logging |

- External writes to Statsig: One log_event() call per event in the batch.
- External writes to PostHog: One capture() call per event in the batch.
- Processing is sequential (not concurrent): Events are iterated in order; a slow network call to Statsig or PostHog for one event will delay processing subsequent events. There is no parallelism or timeout per event.
- Fail-fast error handling: If any event's Statsig or PostHog call raises an exception, the entire batch fails with HTTP 500. Events processed before the failure are already logged; events after are not.
- No database writes to any Supabase table.
- No caching reads or writes.
- No notifications.
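The per-event user ID fallback and the fail-fast behaviour described above can be sketched with plain callables standing in for the two analytics services. Everything named here (`log_batch`, the `sinks` list, the dict-shaped events) is a hypothetical stand-in for illustration, not the route's actual code.

```python
from typing import Any, Callable

Sink = Callable[[str, dict[str, Any]], None]


def log_batch(events: list[dict[str, Any]], default_user_id: str, sinks: list[Sink]) -> int:
    """Send each event to every sink in order; the first exception aborts the batch."""
    processed = 0
    for event in events:
        # A per-event user_id overrides the batch default (authenticated ID or "anonymous").
        user_id = event.get("user_id") or default_user_id
        for sink in sinks:
            # No try/except here: an error propagates and later events are never sent.
            sink(user_id, event)
        processed += 1
    return processed
```

Because nothing is rolled back, a client that retries a failed batch will resend the events that were already delivered before the failure. Whether that duplication matters depends on how the downstream analytics are used.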
Issue: #1633
This endpoint logs a session start event to both Statsig and PostHog for DAU/WAU/MAU (Daily/Weekly/Monthly Active User) tracking and product growth metrics computation. It should be called when a user opens the application, logs in, or returns after an idle period. The session_start event is specifically named to align with Statsig's built-in Product Growth metric computation pipeline. Authentication is optional — anonymous sessions are tracked with user_id = "anonymous".
Authentication & Authorization:
- Optional authentication. Uses get_current_user dependency.
- Unauthenticated requests are accepted and logged as anonymous sessions.
- Authenticated user's ID is used if available.
Request Schema:
{
"platform": "web",
"metadata": {
"version": "2.0.4",
"referrer": "https://google.com",
"utm_source": "email"
}
}
Schema: SessionStartEvent (defined in src/routes/analytics.py).
Platform values: web, ios, android, desktop. The field is declared with Pydantic Field(default="web"), which supplies the default but does not on its own restrict input to these values.
Response Schema:
{
"success": true,
"message": "Session start logged successfully"
}
Error Codes:
| Code | Condition |
|---|---|
| 500 | Statsig or PostHog service call failure |
sequenceDiagram
participant C as Client (Frontend / Mobile)
participant R as Route Handler<br/>log_session_start()
participant Auth as get_current_user (optional)
participant Statsig as Statsig Service
participant PostHog as PostHog Service
C->>R: POST /v1/analytics/session/start<br/>{ platform: "web", metadata: {...} }
R->>Auth: Depends(get_current_user)
alt Authenticated
Auth-->>R: current_user dict
R->>R: user_id = str(current_user["user_id"])
else Not authenticated
Auth-->>R: None
R->>R: user_id = "anonymous"
end
R->>Statsig: statsig_service.log_session_start(<br/>user_id=user_id,<br/>platform=session.platform,<br/>metadata=session.metadata)
Statsig-->>R: OK (logs "session_start" event<br/>for DAU/WAU/MAU computation)
R->>PostHog: posthog_service.capture(<br/>distinct_id=user_id,<br/>event="session_start",<br/>properties={"platform": "web", ...metadata})
PostHog-->>R: OK
R->>R: logger.debug("Session start logged for user X on web")
R-->>C: 200 { success: true, message: "Session start logged successfully" }
| Category | Name | Location | Purpose |
|---|---|---|---|
| Route file | analytics.py | src/routes/analytics.py | Handler |
| Auth | get_current_user | src/security/deps.py | Optional user identification |
| Schema | SessionStartEvent | src/routes/analytics.py | Request body |
| Service | statsig_service | src/services/statsig_service.py | Statsig session start logging |
| Service method | statsig_service.log_session_start() | src/services/statsig_service.py | Specialized session event |
| Service | posthog_service | src/services/posthog_service.py | PostHog session capture |
| External | Statsig | SaaS | DAU/WAU/MAU + Product Growth metrics |
| External | PostHog | SaaS | Session tracking, retention analysis |
| Framework | FastAPI, APIRouter, Depends, HTTPException | fastapi | HTTP layer |
| Logging | logging | stdlib | Debug logging |

- External write to Statsig: Calls statsig_service.log_session_start() which logs a named session_start event. Statsig uses this specific event name to compute Product Growth metrics including DAU, WAU, MAU, stickiness, and retention rates. This is not a generic log_event() call; it uses a dedicated method to ensure the event is structured correctly for Statsig's metric pipeline.
- External write to PostHog: Calls posthog_service.capture() with event="session_start" and a platform property plus any additional metadata. PostHog uses this for funnel analysis, session recording correlation, and retention cohorts.
- No database writes to any Supabase table.
- No caching reads or writes.
- No notifications.
- Debug log: A logger.debug line is emitted for each session start (not info/warning level), so it does not appear in production log aggregation unless debug logging is enabled.
Issue: #1661
Handler: get_cache_analytics() in src/routes/butter_analytics.py line 26
Returns Butter.dev LLM response cache performance analytics for the authenticated user over a configurable time window (1-90 days). Queries the chat_completion_requests Supabase table with a join to models and providers, then aggregates cache hit/miss statistics, cost savings, and per-model breakdown in Python.
Dependency: get_api_key (src/security/deps.py). Bearer token validated.
Then calls get_user(api_key) from src/db/users.py to retrieve the full user record. Returns HTTP 401 if no user found for the key.
GET /v1/analytics/cache?days=30
Authorization: Bearer api_key
Query parameter: days (int, optional, default=30, min=1, max=90) - analysis window in days
FastAPI Query validation: ge=1, le=90. Values outside range return HTTP 422 Unprocessable Entity.
Level 1 get_cache_analytics() src/routes/butter_analytics.py:26-163:
- Call get_user(api_key) to get user record including id and preferences
- Compute since_date = datetime.now(UTC) - timedelta(days=days)
- Call get_supabase_client() to get Supabase client
- Execute Supabase query on chat_completion_requests table
- Aggregate statistics in Python (cache hits, misses, savings)
- Sort and filter top_cached_models
- Return response dict
Level 2 get_user() from src/db/users.py (imported at top of file):
Note: src/routes/butter_analytics.py imports from src.db.users. This is the database-backed user lookup, not the cached version. Returns full user dict or None.
Level 2 get_supabase_client() from src/config/supabase_config.py:
Returns the configured Supabase Python client singleton.
Level 2 Supabase Query (src/routes/butter_analytics.py:59-68):
result = (
    client.table("chat_completion_requests")
    .select("model_id, cost_usd, metadata, created_at, models(model_name, providers(name, slug))")
    .eq("user_id", user_id)
    .eq("status", "completed")
    .gte("created_at", since_date.isoformat())
    .execute()
)
Table: chat_completion_requests
Operation: SELECT with JOIN
Columns selected: model_id, cost_usd, metadata, created_at
Joined tables: models (model_name), providers (name, slug)
Filters:
- user_id = user_id (integer equality)
- status = 'completed'
- created_at >= since_date (ISO-8601 timestamp)

No LIMIT applied - fetches all matching rows.
Level 3 Aggregation Logic (src/routes/butter_analytics.py:73-134):
For each request record:
- Check metadata.butter_cache_hit (boolean)
- If cache hit: increment cache_hits counter, add metadata.actual_cost_usd to total_savings
- Else: increment cache_misses counter
- Track per-model stats in model_stats dict
Top cached models filtering:
- Only includes models with total_requests >= 5
- Sorted by cache_hit_rate_percent descending
- Truncated to top 10
Derived metrics:
- cache_hit_rate = (cache_hits / total_requests * 100) if total_requests > 0 else 0
- estimated_monthly_savings = (total_savings * 30 / days) if days > 0 else 0
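The aggregation and derived metrics above can be sketched in plain Python. This is a minimal illustration, not the handler's code: the function name `aggregate_cache_stats` is hypothetical, rows are modeled as plain dicts, and grouping per-model stats by `model_id` is an assumption.

```python
from collections import defaultdict


def aggregate_cache_stats(rows, days):
    """Aggregate cache hit/miss statistics from request rows (plain dicts)."""
    cache_hits = 0
    cache_misses = 0
    total_savings = 0.0
    model_stats = defaultdict(
        lambda: {"total_requests": 0, "cache_hits": 0, "savings_usd": 0.0}
    )

    for row in rows:
        metadata = row.get("metadata") or {}
        # Assumed grouping key; the real handler may group differently.
        stats = model_stats[row.get("model_id")]
        stats["total_requests"] += 1
        if metadata.get("butter_cache_hit"):
            saved = float(metadata.get("actual_cost_usd") or 0.0)
            cache_hits += 1
            total_savings += saved
            stats["cache_hits"] += 1
            stats["savings_usd"] += saved
        else:
            cache_misses += 1

    total_requests = cache_hits + cache_misses
    hit_rate = (cache_hits / total_requests * 100) if total_requests > 0 else 0
    monthly = (total_savings * 30 / days) if days > 0 else 0

    # Keep models with at least 5 requests, best hit rate first, top 10 only.
    top = [
        {
            "model_id": model_id,
            **stats,
            "cache_hit_rate_percent": round(
                stats["cache_hits"] / stats["total_requests"] * 100, 2
            ),
        }
        for model_id, stats in model_stats.items()
        if stats["total_requests"] >= 5
    ]
    top.sort(key=lambda m: m["cache_hit_rate_percent"], reverse=True)

    return {
        "total_requests": total_requests,
        "cache_hits": cache_hits,
        "cache_misses": cache_misses,
        "cache_hit_rate_percent": round(hit_rate, 2),
        "total_savings_usd": round(total_savings, 6),
        "estimated_monthly_savings_usd": round(monthly, 2),
        "top_cached_models": top[:10],
    }
```

Because the query applies no LIMIT, this loop runs over every completed request in the window, so its cost grows linearly with the user's request volume.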
chat_completion_requests:
- model_id (column)
- cost_usd (column)
- metadata (JSONB column) - contains: butter_cache_hit (bool), actual_cost_usd (float)
- created_at (timestamp)
- user_id (FK to users)
- status (enum, filtered on 'completed')
models (joined via model_id FK):
- model_name
providers (joined via models.provider_id FK):
- name
- slug
{
"period_days": 30,
"start_date": "2026-02-02T12:00:00.000000+00:00",
"end_date": "2026-03-04T12:00:00.000000+00:00",
"total_requests": 1250,
"cache_hits": 437,
"cache_misses": 813,
"cache_hit_rate_percent": 34.96,
"total_savings_usd": 12.847293,
"estimated_monthly_savings_usd": 12.85,
"top_cached_models": [
{
"model_name": "gpt-4",
"provider": "OpenAI",
"total_requests": 320,
"cache_hits": 198,
"cache_hit_rate_percent": 61.88,
"savings_usd": 8.943241
}
],
"cache_enabled": true,
"system_enabled": true
}
cache_enabled: from user.preferences.enable_butter_cache (default true if not set)
system_enabled: from Config.BUTTER_DEV_ENABLED environment variable
Inner try/except blocks: None (single outer handler)
Outer try/except:
- HTTPException: re-raised (preserves 401 from get_user check)
- All other Exception: logs via sanitize_for_logging, raises HTTP 500 "Failed to retrieve cache analytics"
HTTP error codes:
- 401: Invalid API key
- 422: Invalid days parameter (FastAPI validation)
- 500: Failed to retrieve cache analytics
Redis: Not used
Supabase: SELECT on chat_completion_requests with JOIN to models and providers
In-memory: None
Config.BUTTER_DEV_ENABLED: boolean env var controlling whether Butter.dev caching is active system-wide
user.preferences.enable_butter_cache: per-user opt-in/opt-out (defaults to True)
Issue: #1662
Handler: get_cache_summary() in src/routes/butter_analytics.py line 166
Returns a quick summary of Butter.dev cache performance for the authenticated user. Tries a Supabase RPC function first (get_user_cache_savings), falls back to manual query if RPC is unavailable. Returns minimal response if cache is disabled for the user or system-wide.
Same as get_cache_analytics: get_api_key (src/security/deps.py) + get_user(api_key) lookup. Returns HTTP 401 if invalid key.
GET /v1/analytics/cache/summary
Authorization: Bearer api_key
No query parameters.
Level 1 get_cache_summary() src/routes/butter_analytics.py:166-266:
- Call get_user(api_key) -> get user record
- Extract cache_enabled = user.preferences.enable_butter_cache (default True)
- If not cache_enabled OR not Config.BUTTER_DEV_ENABLED: return minimal response immediately
- Compute since_date = datetime.now(UTC) - timedelta(days=30) (hardcoded 30 days)
- Try Supabase RPC call first
- On RPC failure: fall back to manual query
- Return aggregated response
Level 2a Supabase RPC (src/routes/butter_analytics.py:205-221):
result = client.rpc(
    "get_user_cache_savings",
    {"p_user_id": user_id, "p_days": 30}
).execute()
RPC function: get_user_cache_savings
Parameters: p_user_id (integer), p_days (integer, hardcoded 30)
Expected return columns: total_requests, cache_hits, cache_hit_rate_percent, total_savings_usd, estimated_monthly_savings_usd
If RPC returns data and len(data) > 0: return response using RPC results. If RPC raises Exception: log at DEBUG level and fall through to manual query (NOT an error).
Level 2b Manual Supabase Query (fallback, src/routes/butter_analytics.py:226-244):
result = (
    client.table("chat_completion_requests")
    .select("metadata")
    .eq("user_id", user_id)
    .eq("status", "completed")
    .gte("created_at", since_date.isoformat())
    .execute()
)
Table: chat_completion_requests
Operation: SELECT
Columns: metadata only (minimal data transfer vs full analytics endpoint)
Filters: user_id equality, status='completed', created_at >= 30 days ago
Level 3 Manual Aggregation (src/routes/butter_analytics.py:235-255):
Simpler than get_cache_analytics:
- Iterates metadata field only
- Counts metadata.butter_cache_hit truthy values
- Sums metadata.actual_cost_usd for hits
- No per-model breakdown
- Estimated monthly = total_savings (already 30 days, no scaling needed)
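The RPC-first flow with a manual fallback might look like the sketch below. The two arguments are hypothetical zero-argument callables standing in for the Supabase RPC call and the fallback table query, the `source` key is added purely for illustration, and treating an empty RPC result as a reason to fall back is an assumption.

```python
import logging

logger = logging.getLogger(__name__)


def summarize_with_fallback(rpc_call, manual_query):
    """Try the RPC first; fall back to manual aggregation if it fails or is empty."""
    try:
        data = rpc_call()
        if data and len(data) > 0:
            return {"source": "rpc", **data[0]}
    except Exception as exc:
        # An unavailable RPC is expected, so it is logged at DEBUG, not ERROR.
        logger.debug("RPC unavailable, using manual query: %s", exc)

    rows = manual_query()
    hits = 0
    savings = 0.0
    for row in rows:
        metadata = row.get("metadata") or {}
        if metadata.get("butter_cache_hit"):
            hits += 1
            savings += float(metadata.get("actual_cost_usd") or 0.0)

    total = len(rows)
    return {
        "source": "manual",
        "total_requests": total,
        "cache_hits": hits,
        "cache_hit_rate_percent": round(hits / total * 100, 2) if total else 0.0,
        "total_savings_usd": round(savings, 6),
        # The window is already 30 days, so no projection is applied.
        "estimated_monthly_savings_usd": round(savings, 2),
    }
```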
If user.preferences.enable_butter_cache == False OR Config.BUTTER_DEV_ENABLED == False:
{
"cache_enabled": false,
"system_enabled": true,
"message": "Cache is disabled. Enable it in settings to start saving on API costs.",
"total_savings_usd": 0.0,
"cache_hit_rate_percent": 0.0
}
Message differs based on which flag triggered: user preference vs system disable.
{
"cache_enabled": true,
"system_enabled": true,
"total_requests": 1250,
"cache_hits": 437,
"cache_hit_rate_percent": 34.96,
"total_savings_usd": 12.847,
"estimated_monthly_savings_usd": 12.85
}
Same structure but total_savings_usd rounded to 6 decimal places, estimated_monthly_savings_usd = total_savings rounded to 2 decimal places (already 30-day window, no projection applied).
- HTTPException: re-raised (401 from get_user check)
- RPC Exception: caught silently at DEBUG level, triggers fallback query
- All other Exception: logs via sanitize_for_logging, raises HTTP 500 "Failed to retrieve cache summary"
HTTP error codes: 401, 500
RPC: get_user_cache_savings(p_user_id int, p_days int) - PostgreSQL function
Table read: chat_completion_requests (select metadata only, filtered)
Redis: Not used
In-memory: None
Config.BUTTER_DEV_ENABLED: system-wide enable/disable (env var)
user.preferences: JSONB column on users table, key enable_butter_cache (bool, defaults to True when not set)
5 endpoints
Issue: #1645
Primary authentication endpoint using Privy as the identity provider. Handles both new user registration and existing user login in a single call. Extracts identity from Privy linked accounts (email, Google OAuth, GitHub, phone/SMS), performs email quality verification, creates users on first login, and returns an API key.
No auth required. This endpoint is unauthenticated (it IS the auth endpoint).
- Type: `AuthRateLimitType.LOGIN`
- Limit: 10 attempts per 15 minutes per IP (sliding window)
- Key: Client IP (extracted via `get_client_ip()`)
- Algorithm: In-memory deque, asyncio Lock
- On exceed: HTTP 429 with `{"error": "Rate limit exceeded", "retry_after": N}` + `Retry-After` header
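A sliding-window limiter built on a per-key deque and an `asyncio.Lock` can be sketched as follows. The class name `SlidingWindowLimiter`, its `check` method, and the injectable `clock` are all hypothetical; this shows the general technique, not the project's `check_auth_rate_limit` implementation.

```python
import asyncio
import math
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """In-memory sliding-window rate limiter keyed by client identifier."""

    def __init__(self, max_attempts, window_seconds, clock=time.monotonic):
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self._clock = clock
        self._attempts = defaultdict(deque)
        self._lock = asyncio.Lock()

    async def check(self, key):
        """Return (allowed, retry_after_seconds) for this key."""
        async with self._lock:
            now = self._clock()
            attempts = self._attempts[key]
            # Drop timestamps that have slid out of the window.
            while attempts and now - attempts[0] >= self.window_seconds:
                attempts.popleft()
            if len(attempts) >= self.max_attempts:
                # The oldest attempt decides when the next slot frees up.
                wait = self.window_seconds - (now - attempts[0])
                return False, max(1, math.ceil(wait))
            attempts.append(now)
            return True, 0
```

For the login limit described above, this would be constructed as `SlidingWindowLimiter(10, 15 * 60)` and checked with the client IP as the key. State held this way is per-process, so limits are not shared across workers.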
class PrivyAuthRequest(BaseModel):
    user: PrivyUserData  # Required
    token: str | None = None  # Privy access token (not currently validated)
    email: str | None = None  # Optional top-level email override
    privy_access_token: str | None = None
    refresh_token: str | None = None
    session_update_action: str | None = None
    is_new_user: bool | None = None
    referral_code: str | None = None  # User referral OR partner code (e.g., "REDBEARD")
    environment_tag: str | None = "live"  # Validated: "live" | "test" | "development"
    auto_create_api_key: bool | None = True

class PrivyUserData(BaseModel):
    id: str  # Privy user ID (required, non-empty)
    created_at: int
    linked_accounts: list[PrivyLinkedAccount] = []
    mfa_methods: list[str] = []
    has_accepted_terms: bool = False
    is_guest: bool = False

class PrivyLinkedAccount(BaseModel):
    type: str  # Normalized: "email", "phone", "google_oauth", "github", etc.
    subject: str | None = None
    email: str | None = None
    address: str | None = None
    name: str | None = None
    phone_number: str | None  # AliasChoices: "phone_number" or "phoneNumber"
    verified_at: int | None = None

1. request.email (top-level field from frontend)
2. Linked account type "email" → email field
3. Linked account type "google_oauth" → address/email field + display_name
4. Linked account type "phone" → phone_number
5. Linked account type "github" → name as display_name

Auth method priority (set last wins):
- Default: `AuthMethod.EMAIL`
- GitHub sets to `AuthMethod.GITHUB` if no email found
- Phone sets to `AuthMethod.PHONE` if no email found
Cache check (in-memory): get_cached_user_by_privy_id(request.user.id) — Redis-backed, invalidated on updates
DB fallback (with timeout): users_module.get_user_by_privy_id(request.user.id) runs this Supabase query:
SELECT * FROM users WHERE privy_user_id = <privy_id> LIMIT 1
Timeout: USER_LOOKUP_TIMEOUT seconds (configured constant).
Secondary fallback: If the privy_id lookup fails, tries the username:
SELECT * FROM users WHERE username = <base_username> LIMIT 1
If found by username, updates users.privy_user_id and invalidates the cache.
- Fetches active API keys from `api_keys_new`: `SELECT api_key, is_primary, created_at FROM api_keys_new WHERE user_id = <id> AND is_active = true ORDER BY is_primary DESC, created_at ASC`
- Returns primary key if present, else oldest active key
- Detects and rejects temporary API key patterns (pattern check via `_is_temporary_api_key()`)
- Auto-creates new primary key if none exists and `auto_create_api_key=True`
- Computes tiered credits (subscription allowance + purchased, in cents for frontend)
- Raises HTTP 503 if user exists but has no API key available
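The "primary key if present, else oldest active key" rule is equivalent to taking the first row of the `ORDER BY is_primary DESC, created_at ASC` result. A small sketch, assuming rows arrive as dicts already filtered to active keys and that `created_at` is an ISO-8601 string (the helper name `select_api_key` is hypothetical):

```python
def select_api_key(keys):
    """Pick the primary key if present, else the oldest key; None if empty."""
    if not keys:
        return None
    # Primary keys sort first (False < True), then oldest created_at first.
    ordered = sorted(
        keys,
        key=lambda k: (not k.get("is_primary", False), k["created_at"]),
    )
    return ordered[0]["api_key"]
```

A `None` result corresponds to the case where the endpoint either auto-creates a key or returns HTTP 503.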
Background tasks:
- `_send_welcome_email_background`: sends if email valid and not `@privy.user`/`@privy.placeholder`
- `_log_auth_activity_background`: inserts to `activity` table
- Email verification via `_get_subscription_status_for_email()`:
  - Checks local blocklist → `is_blocked_email_domain()` → HTTP 400 if blocked
  - Checks local temp email list → marks as "bot"
  - Calls Emailable API → `verify_email(email)` → blocks `should_block`, marks `is_bot` as "bot"
  - On API failure: falls back gracefully, allows registration
- Generates unique username: `_generate_unique_username()`, up to 5 collision retries, then appends random 4-byte hex
- Creates user: `users_module.create_enhanced_user()`, starts with $5 credits, 3-day trial
- Fallback manual insert if `create_enhanced_user` fails
- Partner/referral code processing (background):
  - Partner codes (e.g., `"REDBEARD"`): `_apply_partner_trial_background` → `PartnerTrialService.start_partner_trial()`
  - User codes: `_process_referral_code_background` → updates `users.referred_by_code`
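The username collision handling (a bounded number of retries, then a random 4-byte hex suffix) can be illustrated as below. This is only a sketch: `is_taken` is a hypothetical callback standing in for the database lookup, and the numeric-suffix retry scheme is an assumption, since the source does not say how retry candidates are formed.

```python
import secrets


def generate_unique_username(base, is_taken, max_retries=5):
    """Return an unused username derived from base."""
    candidate = base
    for attempt in range(max_retries):
        if not is_taken(candidate):
            return candidate
        # Assumed retry scheme: numeric suffix; the real helper may differ.
        candidate = f"{base}{attempt + 1}"
    # All retries collided: append 4 random bytes as 8 hex characters.
    return f"{base}_{secrets.token_hex(4)}"
```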
class PrivyAuthResponse(BaseModel):
    success: bool
    message: str  # "Login successful" or "Account created successfully"
    user_id: int | None
    api_key: str | None  # Raw API key (gw_live_... prefix)
    auth_method: AuthMethod | None
    privy_user_id: str | None
    is_new_user: bool | None
    display_name: str | None
    email: str | None
    phone_number: str | None
    credits: float | None  # Total credits in dollars
    timestamp: datetime | None
    subscription_status: str | None  # "trial", "active", "bot", "inactive"
    tier: str | None  # "basic", "pro", "max"
    tier_display_name: str | None  # "Basic", "Pro", "MAX"
    trial_expires_at: str | None  # ISO string
    subscription_end_date: int | None  # Unix timestamp
    subscription_allowance: int | None  # Monthly allowance in cents
    purchased_credits: int | None  # One-time credits in cents
    total_credits: int | None  # Sum in cents
    allowance_reset_date: str | None

| Scenario | HTTP Status | Detail |
|---|---|---|
| Rate limit exceeded | 429 | {error, message, retry_after} |
| Blocked email domain | 400 | "This email address is not allowed..." |
| User exists but no API key | 503 | "Your account exists but no API key is available..." |
| New user created but no API key | 500 | "Account created but API key generation failed..." |
| Supabase URL misconfigured | 503 | "Service configuration error: Database URL is misconfigured..." |
| General failure | 500 | "Authentication failed: ..." |
Issue: #1646
Direct user registration endpoint (non-Privy). Creates a new user account with username + email, generates an API key, sends a welcome email, and processes optional referral codes. Intended for direct registration flows not using Privy auth.
No auth required. This is a registration endpoint.
- Type: `AuthRateLimitType.REGISTER`
- Limit: 3 attempts per hour per IP
- Window: 3600 seconds (1 hour)
- Key: Client IP
- Algorithm: In-memory sliding window, asyncio Lock
- On exceed: HTTP 429 with `{"error": "Rate limit exceeded", "retry_after": N}` + `Retry-After` header
class UserRegistrationRequest(BaseModel):
    username: str  # Required
    email: EmailStr  # Required, Pydantic EmailStr validation
    auth_method: AuthMethod = AuthMethod.EMAIL
    environment_tag: str = "live"
    key_name: str = "Primary Key"
    referral_code: str | None = None  # Optional user referral code

AuthMethod enum (from src/schemas/common.py):
class AuthMethod(str, Enum):
    EMAIL = "email"
    GOOGLE = "google"
    GITHUB = "github"
    PHONE = "phone"
    # ... other OAuth methods

Step 1: Rate limit check
rate_limit_result = await check_auth_rate_limit(client_ip, AuthRateLimitType.REGISTER)
Step 2: Email quality verification
subscription_status, should_block = await _get_subscription_status_for_email(request.email)
# should_block=True → HTTP 400
# subscription_status="bot" → marks as bot, still allows registration
Process:
- Check local blocklist (`is_blocked_email_domain`)
- Check local temp email list (`is_temporary_email_domain`)
- Call Emailable API for comprehensive verification
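The three-stage email check can be sketched as a pure function that returns the same `(subscription_status, should_block)` pair. Everything here is illustrative: `classify_email`, the domain sets, and the `verify` callback (standing in for the Emailable call) are hypothetical, and returning early on a temp-domain match without calling the API is an assumption.

```python
def classify_email(email, blocked_domains, temp_domains, verify=None):
    """Return (subscription_status, should_block) for a signup email."""
    domain = email.rsplit("@", 1)[-1].lower()

    # Stage 1: local blocklist rejects the registration outright.
    if domain in blocked_domains:
        return "trial", True

    # Stage 2: local temp-email list marks the account as a bot.
    if domain in temp_domains:
        return "bot", False

    # Stage 3: external verification, tolerant of API failures.
    if verify is not None:
        try:
            result = verify(email)
        except Exception:
            # Verification failure falls back to allowing registration.
            return "trial", False
        if result.get("should_block"):
            return "trial", True
        if result.get("is_bot"):
            return "bot", False

    return "trial", False
```

A caller would map `should_block=True` to HTTP 400 and pass the status on to user creation.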
Step 3: Uniqueness checks (with query timeout)
-- Email check
SELECT id FROM users WHERE email = <email>
-- Username check
SELECT id FROM users WHERE username = <username>
Both use safe_query_with_timeout() with AUTH_QUERY_TIMEOUT. Returns HTTP 503 on timeout, HTTP 400 on conflict.
Step 4: User creation
user_data = users_module.create_enhanced_user(
    username=request.username,
    email=request.email,
    auth_method=auth_method_str,
    privy_user_id=None,  # No Privy for direct registration
    credits=5,  # $5 trial credits
    subscription_status=subscription_status,
)
Fallback manual insert if create_enhanced_user fails:
fallback_payload = {
    "username": request.username,
    "email": request.email,
    "credits": 5,
    "privy_user_id": None,
    "auth_method": ...,
    "subscription_status": "bot" if is_temp_email else "trial",
    "trial_expires_at": (datetime.now(UTC) + timedelta(days=3)).isoformat(),
    "tier": "basic",
}
client.table("users").insert(fallback_payload).execute()
Then creates API key via create_api_key(user_id, key_name, environment_tag, is_primary=True).
Step 5: Referral code processing (background task)
if request.referral_code:
    background_tasks.add_task(
        _process_referral_code_background,
        referral_code=request.referral_code,
        user_id=user_data["user_id"],
        username=request.username,
        is_new_user=True,
    )
Calls track_referral_signup() → updates users.referred_by_code → sends referral notification email.
Step 6: Welcome email (synchronous, not background)
success = notif_module.enhanced_notification_service.send_welcome_email(...)
if success:
    mark_welcome_email_sent(user_data["user_id"])  # UPDATE users SET welcome_email_sent=true

class UserRegistrationResponse(BaseModel):
    user_id: int
    username: str
    email: str
    api_key: str  # Raw primary API key
    credits: int  # Starting credits ($5)
    environment_tag: str
    scope_permissions: dict[str, list[str]]
    auth_method: AuthMethod
    subscription_status: SubscriptionStatus  # "trial", or "bot" for temporary emails
    message: str  # "Account created successfully"
    timestamp: datetime

| Field | Value |
|---|---|
| `credits` | 5 (dollars) |
| `subscription_status` | "trial" (or "bot" for temp emails) |
| `tier` | "basic" |
| `trial_expires_at` | now + 3 days |
| `is_primary` key | True |
| `welcome_email_sent` | True (if email sent successfully) |
| Scenario | HTTP Status | Detail |
|---|---|---|
| Rate limit exceeded | 429 | {error, message, retry_after} |
| Blocked email | 400 | "This email address is not allowed..." |
| Email already exists | 400 | "User with this email already exists" |
| Username already taken | 400 | "Username already taken" |
| DB timeout on uniqueness check | 503 | "Service temporarily unavailable" |
| User creation failure | 500 | "Failed to create user account" |
| General failure | 500 | "Registration failed: ..." |
| Aspect | POST /auth | POST /auth/register |
|---|---|---|
| Identity provider | Privy (OAuth/social) | Direct (email+username) |
| Rate limit | 10/15min | 3/hour |
| Privy user ID | Stored | None |
| Email from | Linked accounts | Request body |
| Welcome email | Background task | Synchronous |
| Partner codes | Supported | Not supported |
Issue: #1647
Initiates a password reset flow. Looks up a user by email address and sends a reset email via the notification service. Uses a deliberately vague response message to prevent email enumeration attacks.
No auth required. Public endpoint.
- Type: `AuthRateLimitType.PASSWORD_RESET`
- Limit: 3 attempts per hour per IP
- Window: 3600 seconds (1 hour)
- Key: Client IP (from `get_client_ip(raw_request)`)
- Algorithm: In-memory sliding window, asyncio Lock
- On exceed: HTTP 429 with `{"error": "Rate limit exceeded", "retry_after": N}` + `Retry-After` header
Note: The email is taken as a query parameter (not a JSON body), as the handler signature is:
async def request_password_reset(email: str, raw_request: Request):
This means the request is: POST /auth/password-reset?email=user@example.com
Step 1: Rate limit check
rate_limit_result = await check_auth_rate_limit(client_ip, AuthRateLimitType.PASSWORD_RESET)
Step 2: User lookup
SELECT id, username, email FROM users
WHERE email = <email>
Uses direct Supabase client (no timeout wrapper on this query). If user not found, returns generic 200 response (does NOT reveal whether email exists).
Step 3: Send reset email
reset_token = notif_module.enhanced_notification_service.send_password_reset_email(
    user_id=user["id"],
    username=user["username"],
    email=user["email"]
)
The notification service generates a reset token and sends via Resend email API. The token is stored in the password_reset_tokens table.
On success (user found, email sent):
{"message": "Password reset email sent successfully"}
On user not found (email enumeration prevention):
{"message": "If an account with that email exists, a password reset link has been sent."}
Both return HTTP 200.
| Scenario | HTTP Status | Detail |
|---|---|---|
| Rate limit exceeded | 429 | {error, message, retry_after} |
| Email service failure | 500 | "Failed to send password reset email" |
| Unhandled exception | 500 | "Internal server error" |
| User not found | 200 | Generic message (intentional — no 404 to prevent enumeration) |
- Uniform status code: Both "user found" and "user not found" return HTTP 200 rather than a 404. The two message bodies listed above are worded differently, however, so a client that compares response text can still tell the cases apart.
- Rate limiting: 3 attempts/hour/IP prevents email bombing.
- Token storage: Reset token stored in `password_reset_tokens` table with expiry (used by `reset_password` endpoint).
| Table | Operation | Purpose |
|---|---|---|
| `users` | SELECT | Look up user by email |
| `password_reset_tokens` | INSERT (via notification service) | Store reset token with expiry |
- The email query has no timeout wrapper (unlike the registration uniqueness checks)
- The endpoint accepts email as a query parameter, not JSON body (atypical for a POST)
Issue: #1648
Completes the password reset flow. Validates a one-time reset token from the password_reset_tokens table, checks expiry, and marks the token as used. Note: The current implementation does not actually update the password hash — it only marks the token consumed (placeholder implementation).
No auth required. Public endpoint (token itself is the credential).
- Type: `AuthRateLimitType.PASSWORD_RESET`
- Limit: 3 attempts per hour per IP (shared with `/auth/password-reset`)
- Window: 3600 seconds
- Key: Client IP
- On exceed: HTTP 429 with `{error, message, retry_after}` + `Retry-After` header
Security rationale: Prevents token enumeration attacks by rate-limiting guesses.
The token is passed as a query parameter (not a JSON body):
async def reset_password(token: str, raw_request: Request):
Request: POST /auth/reset-password?token=<reset_token_value>
Step 1: Rate limit check
rate_limit_result = await check_auth_rate_limit(client_ip, AuthRateLimitType.PASSWORD_RESET)
Step 2: Token validation
SELECT * FROM password_reset_tokens
WHERE token = <token>
AND used = false
Returns HTTP 400 if no matching unused token found.
Step 3: Expiry check
expires_at = datetime.fromisoformat(token_data["expires_at"].replace("Z", "+00:00"))
if datetime.now(UTC).replace(tzinfo=expires_at.tzinfo) > expires_at:
    raise HTTPException(status_code=400, detail="Reset token has expired")
Step 4: Mark token as used
UPDATE password_reset_tokens
SET used = true
WHERE id = <token_id>

| Column | Type | Description |
|---|---|---|
| `id` | int | Primary key |
| `token` | str | Opaque reset token value |
| `user_id` | int | Associated user ID |
| `used` | bool | Whether token has been consumed |
| `expires_at` | timestamp | Expiry datetime (ISO string with timezone) |

{"message": "Password reset successfully"}
HTTP 200 on success.
| Scenario | HTTP Status | Detail |
|---|---|---|
| Rate limit exceeded | 429 | {error, message, retry_after} |
| Invalid/used token | 400 | "Invalid or expired reset token" |
| Token expired | 400 | "Reset token has expired" |
| Unhandled exception | 500 | "Internal server error" |
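The token validation and expiry steps can be sketched as one pure function that returns the error detail to send with HTTP 400, or `None` when the token is usable. This is a variant rather than the handler's code: `validate_reset_token` is a hypothetical name, it compares timezone-aware datetimes directly instead of relabeling `tzinfo`, and treating naive timestamps as UTC is an assumption.

```python
from datetime import datetime, timezone


def validate_reset_token(token_row, now=None):
    """Return None if the token is usable, else the error detail string."""
    if token_row is None or token_row.get("used"):
        return "Invalid or expired reset token"

    # Accept a trailing "Z" the same way the documented handler does.
    expires_at = datetime.fromisoformat(
        token_row["expires_at"].replace("Z", "+00:00")
    )
    if expires_at.tzinfo is None:
        # Assumption: naive timestamps are stored in UTC.
        expires_at = expires_at.replace(tzinfo=timezone.utc)

    now = now or datetime.now(timezone.utc)
    if now > expires_at:
        return "Reset token has expired"
    return None
```

Comparing aware datetimes keeps the result correct even when `expires_at` carries a non-UTC offset.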
Important caveat: The current implementation comment reads:
# Update password (in a real app, you'd hash this)
# For now, we'll just mark the token as used
This means the endpoint:
- Validates and consumes the token
- Does NOT actually update a password field in the `users` table
- Is effectively a placeholder that confirms the token is valid

A complete implementation would need:
- New password in request body
- Password hashing (bcrypt/argon2)
- `UPDATE users SET password_hash = <hash> WHERE id = <token_data.user_id>`
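As a rough illustration of the missing hashing step, the sketch below uses the standard library's `hashlib.scrypt` in place of the bcrypt/argon2 options named above, so that it runs without third-party packages. The helper names, the `salt$digest` storage format, and the scrypt parameters are assumptions chosen for the example, not project decisions.

```python
import hashlib
import hmac
import secrets

# Illustrative scrypt cost parameters (assumed, not taken from the project).
_SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1}


def hash_password(password, salt=None):
    """Return 'salt_hex$digest_hex' for storage in a password_hash column."""
    salt = salt or secrets.token_bytes(16)
    digest = hashlib.scrypt(password.encode("utf-8"), salt=salt, **_SCRYPT_PARAMS)
    return f"{salt.hex()}${digest.hex()}"


def verify_password(password, stored):
    """Check a candidate password against a stored 'salt$digest' value."""
    salt_hex, digest_hex = stored.split("$", 1)
    candidate = hashlib.scrypt(
        password.encode("utf-8"), salt=bytes.fromhex(salt_hex), **_SCRYPT_PARAMS
    )
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(candidate, bytes.fromhex(digest_hex))
```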
- One-time use: token `used` flag prevents replay attacks
- Expiry: tokens have a time-limited validity window
- Rate limiting: prevents brute-force token guessing
- Token from `password_reset_tokens` (not JWT; opaque DB-backed token)
Issue: #1688
- Method: GET
- Path: `/v1/huggingface/author/{author}/models`
- Handler: `list_author_models_endpoint()` in `src/routes/catalog.py`
- Service: `list_models_by_author()` in `src/services/huggingface_hub_service.py`
- Auth: None required (public endpoint)
- SDK: `huggingface_hub` Python SDK (`HfApi.list_models(author=...)`)
- Purpose: Returns all public models published by a specific HuggingFace author or organization
| Parameter | Type | Description |
|---|---|---|
| `author` | str | HuggingFace username or organization name (e.g. meta-llama, google, mistralai) |

| Parameter | Type | Default | Validation | Description |
|---|---|---|---|---|
| `limit` | int | 20 | ge=1, le=100 | Max models to return |
@router.get("/huggingface/author/{author}/models")
async def list_author_models_endpoint(
    author: str,
    limit: int = Query(20, ge=1, le=100),
):
    models = await asyncio.to_thread(list_models_by_author, author=author, limit=limit)
    return {"author": author, "models": models, "count": len(models)}

Located in src/services/huggingface_hub_service.py:
def list_models_by_author(author: str, limit: int = 20) -> list[dict]:
    api = get_hf_api_client()
    models_iter = api.list_models(
        author=author,
        limit=limit,
        sort="downloads",
        direction=-1,  # Descending
        cardData=True,
    )
    results = []
    for model in models_iter:
        normalized = normalize_model_info(model)
        if normalized:  # Skip private/gated
            results.append(normalized)
    return results

HuggingFace API call: GET https://huggingface.co/api/models?author={author}&limit={limit}&sort=downloads&direction=-1
- Filters by `author` field (exact match on namespace prefix)
- Always sorted by downloads descending
- Returns only models where `model.id.startswith(f"{author}/")`
def get_hf_api_client() -> HfApi:
    return HfApi(token=Config.HUG_API_KEY)

Authenticated via HUG_API_KEY for higher rate limits.
def normalize_model_info(model: ModelInfo) -> dict | None:
    if getattr(model, "private", False):
        return None
    if getattr(model, "gated", False):
        return None
    return {
        "id": model.id,
        "name": model.id.split("/")[-1].replace("-", " ").replace("_", " ").title(),
        "description": getattr(model, "description", "") or "",
        "pipeline_tag": getattr(model, "pipeline_tag", None),
        "downloads": getattr(model, "downloads", 0),
        "likes": getattr(model, "likes", 0),
        "created_at": str(getattr(model, "created_at", "")),
        "author": model.id.split("/")[0] if "/" in model.id else None,
        "tags": getattr(model, "tags", []),
        "library_name": getattr(model, "library_name", None),
    }

Private and gated models are silently excluded. The author field in the response is extracted from the model ID namespace, which will always equal the path {author} parameter for results returned by the author= filter.
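The `name` and `author` derivations are plain string operations and can be tried in isolation. The helper names below are hypothetical; the expressions are the ones used in `normalize_model_info`. Note that `str.title()` lowercases every letter after the first in each word, so an all-caps or all-lowercase suffix such as `hf` comes out as `Hf`.

```python
def display_name(model_id):
    """Derive the human-readable name from the last path segment of a model ID."""
    return model_id.split("/")[-1].replace("-", " ").replace("_", " ").title()


def author_of(model_id):
    """Derive the author namespace, or None for IDs without a namespace."""
    return model_id.split("/")[0] if "/" in model_id else None


print(display_name("meta-llama/Meta-Llama-3-8B-Instruct"))  # Meta Llama 3 8B Instruct
print(display_name("meta-llama/Llama-2-70b-chat-hf"))       # Llama 2 70B Chat Hf
print(author_of("meta-llama/Llama-2-70b-chat-hf"))          # meta-llama
```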
This endpoint does not use Redis.
This endpoint does not query Supabase.
{
"author": "meta-llama",
"count": 18,
"models": [
{
"id": "meta-llama/Llama-2-70b-chat-hf",
"name": "Llama 2 70B Chat Hf",
"description": "",
"pipeline_tag": "text-generation",
"downloads": 4823912,
"likes": 3201,
"created_at": "2023-07-18 12:00:00+00:00",
"author": "meta-llama",
"tags": ["transformers", "llama", "text-generation"],
"library_name": "transformers"
},
{
"id": "meta-llama/Meta-Llama-3-8B-Instruct",
"name": "Meta Llama 3 8B Instruct",
"description": "",
"pipeline_tag": "text-generation",
"downloads": 12394821,
"likes": 8934,
"created_at": "2024-04-18 00:00:00+00:00",
"author": "meta-llama",
"tags": ["transformers", "safetensors", "llama"],
"library_name": "transformers"
}
]
}

| Scenario | HTTP Status | Behavior |
|---|---|---|
| Author not found / no public models | 200 | Returns `{"models": [], "count": 0}` |
| All models are private/gated | 200 | Returns `{"models": [], "count": 0}` (filtered in Python) |
| HuggingFace API rate limit | 500 | `HfHubHTTPError` propagated from thread |
| HuggingFace API timeout | 500 | Exception propagated from `asyncio.to_thread()` |
| Invalid author (special chars) | 200 | HF API returns empty list |
No explicit 404 is raised for unknown authors — HuggingFace API returns an empty list.
| Feature | `/discovery` | `/search` | `/author/{author}/models` |
|---|---|---|---|
| Filter | Task type | Text query | Author/org name |
| Sort | Downloads | Downloads | Downloads |
| Auth needed | No | No | No |
| Result scope | All public HF | Search matches | Author's public models |
| 404 for empty | No | No | No (returns empty list) |
- Latency: 200ms–2s (single HuggingFace author filter API call)
- No caching: Every request makes a live API call
- Server-side filtering: `author=` filter applied at HuggingFace servers
- Client-side filtering: Private/gated exclusion in Python after response
- Limit: Enforced at HuggingFace API level (server-side pagination)
- Thread: Blocking SDK call in `asyncio.to_thread()` thread pool
20 endpoints
Issue: #1629
These two routes (/api/chat/ai-sdk and /api/chat/ai-sdk-completions) are registered to the same handler function ai_sdk_chat_completion() in src/routes/ai_sdk.py. They provide a Vercel AI SDK-compatible chat completion interface. The handler validates the user, checks trial access, adapts the request format via AISDKChatAdapter, and routes it through the unified ChatInferenceHandler. For streaming requests a StreamingResponse is returned with SSE headers; for non-streaming a standard JSON response is returned. Credit deduction, usage recording, and request metadata saving are handled as background tasks.
Authentication: Depends(get_api_key) — NOT require_admin. Regular user API key authentication.
Auth chain:
- `get_api_key`: extracts Bearer token, calls `validate_api_key_security(api_key, client_ip, referer)`
- `get_user(api_key)`: looks up user; HTTP 401 if not found
- `validate_trial_access(api_key)`: checks trial validity; HTTP 403 if not valid
Request Schema (AISDKChatRequest):
{
"model": str, // Required. Format: "provider/model-name"
"messages": [
{ "role": str, "content": str } // Required list
],
"max_tokens": int | null, // Optional
"temperature": float | null, // Optional, 0.0-2.0
"top_p": float | null, // Optional
"frequency_penalty": float | null, // Optional
"presence_penalty": float | null, // Optional
"stream": bool | null // Optional, default false
}
Response Schema (AISDKChatResponse for non-streaming):
{
"choices": [
{
"message": { "role": str, "content": str },
"finish_reason": str | null
}
],
"usage": {
"prompt_tokens": int,
"completion_tokens": int,
"total_tokens": int
}
}
Streaming response: StreamingResponse with media_type="text/event-stream". Headers:
- X-Accel-Buffering: no
- Cache-Control: no-cache, no-transform
- Connection: keep-alive
SSE format: data: {"choices": [{"delta": {"role": "assistant", "content": "..."}}]}\n\n
Final chunks: data: {"choices": [{"finish_reason": "stop"}]}\n\n then data: [DONE]\n\n
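The SSE framing described above (content deltas, then a finish chunk, then `[DONE]`) can be sketched with a small generator. The name `sse_events` is hypothetical, and this is not the `AISDKChatAdapter` implementation; in particular, repeating `role` on every delta simply mirrors the format line above.

```python
import json


def sse_events(text_chunks):
    """Yield SSE frames for content deltas, a finish chunk, then [DONE]."""
    for chunk in text_chunks:
        payload = {"choices": [{"delta": {"role": "assistant", "content": chunk}}]}
        # Each SSE frame is a "data:" line terminated by a blank line.
        yield f"data: {json.dumps(payload)}\n\n"
    yield 'data: {"choices": [{"finish_reason": "stop"}]}\n\n'
    yield "data: [DONE]\n\n"
```

A generator of this shape is what a streaming response would iterate over when served with `media_type="text/event-stream"`.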
Error codes:
| Code | Condition |
|---|---|
| 401 | API key invalid or user not found |
| 403 | Trial access denied (validate_trial_access failed) |
| 500 | General processing error |
| 503 | AI SDK or OpenRouter not configured (ValueError) |
flowchart TD
A([POST /api/chat/ai-sdk]) --> B[get_api_key auth]
B -->|invalid| C[HTTP 401]
B -->|OK| D[get_user api_key]
D -->|not found| E[HTTP 401]
D -->|found| F[validate_trial_access api_key]
F -->|denied| G[HTTP 403]
F -->|OK| H[Generate request_id UUID\nrecord start_time]
H --> I[AISDKChatAdapter.to_internal_request\nConvert AI SDK format to internal]
I --> J[ChatInferenceHandler\napi_key, background_tasks]
J --> K{request.stream?}
K -->|true| L[handler.process_stream internal_request\nAdapter.from_internal_stream]
L --> M[StreamingResponse SSE\nX-Accel-Buffering: no]
K -->|false| N[await handler.process internal_request]
N --> O[adapter.from_internal_response]
O --> P[Return AISDKChatResponse]
subgraph Error Handling
Q[HTTPException] --> R[background_tasks: save_chat_completion_request\nstatus=failed]
S[ValueError] --> T[logger.error\nSentry capture\nHTTP 503]
U[Exception] --> V[logger.error\nSentry capture\nHTTP 500]
end
| Dependency | File | Operation | Details |
|---|---|---|---|
get_api_key |
src/security/deps.py:74 |
Auth | Bearer token extraction + validate_api_key_security |
get_user |
src/db/users.py |
DB read | Look up user by API key |
validate_trial_access |
src/services/trial_validation.py |
Validation | Checks trial validity; returns dict with is_valid, is_trial, is_expired |
AISDKChatAdapter |
src/adapters/chat.py |
Format conversion | Converts AI SDK format -> internal format; converts internal response -> AI SDK format |
ChatInferenceHandler |
src/handlers/chat_handler.py |
Inference routing | Unified handler for all chat inference. Handles provider routing, streaming, non-streaming. |
handler.process(internal_request) |
src/handlers/chat_handler.py |
Inference | Awaitable non-streaming inference call |
handler.process_stream(internal_request) |
src/handlers/chat_handler.py |
Streaming | Returns async generator of internal stream chunks |
adapter.from_internal_stream |
src/adapters/chat.py |
Format conversion | Converts internal stream -> AI SDK SSE format async generator |
save_chat_completion_request |
src/db/chat_completion_requests.py |
DB write (background) | Saves request metadata to chat_completion_requests table. Called via background_tasks.add_task() on both success and failure paths. |
deduct_credits |
src/db/users.py |
DB write | Credit deduction (executed in legacy code path after line 361 — currently unreachable in normal execution flow due to early return at line 360) |
record_usage |
src/db/users.py |
DB write | Records usage stats (same legacy code path issue) |
calculate_cost |
src/services/pricing.py |
Computation | Calculates USD cost from model name + token counts |
track_trial_usage |
src/services/trial_validation.py |
DB write | Tracks trial token usage |
sentry_sdk.capture_exception |
sentry_sdk | Error capture | Fires on ValueError (503) and unexpected Exception (500) |
_check_trial_override |
src/routes/ai_sdk.py:158 |
Logic | Defense-in-depth: overrides is_trial if user has active Stripe subscription |
Important code note (lines 360-441): There is dead/legacy code after return processed at line 360. The credit deduction, trial tracking, and save_chat_completion_request background task calls in the non-streaming path (lines 363–441) are unreachable in normal execution flow because the return processed statement at line 360 exits the function. The streaming path and error paths do save metadata correctly via background_tasks.add_task().
On success (non-streaming):
- Background task:
save_chat_completion_requestwrites tochat_completion_requeststable (request_id, model_name, input_tokens, output_tokens, processing_time_ms, status="completed", user_id, provider_name, api_key_id) - Legacy dead code (unreachable): deduct_credits, record_usage, track_trial_usage
On failure (HTTPException, ValueError, Exception):
- Background task:
save_chat_completion_requestwrites tochat_completion_requeststable with status="failed" and error_message -
sentry_sdk.capture_exceptionon ValueError and Exception paths
Always:
-
audit_logger.log_api_key_usageon every authenticated call - Request correlation ID (
uuid4()) generated and used for distributed tracing
Streaming side effects:
- Credit deduction and usage recording occur AFTER streaming completes, inside the async generator function
- Token estimation fallback (1 token ≈ 4 chars) applied when provider doesn't return usage data
-
capture_payment_errorfromsrc/utils/sentry_context.pycalled if post-stream credit deduction fails
Issue: #1630
These two routes (/api/chat/ai-sdk and /api/chat/ai-sdk-completions) are registered to the same handler function ai_sdk_chat_completion() in src/routes/ai_sdk.py. They provide a Vercel AI SDK-compatible chat completion interface. The handler validates the user, checks trial access, adapts the request format via AISDKChatAdapter, and routes it through the unified ChatInferenceHandler. For streaming requests a StreamingResponse is returned with SSE headers; for non-streaming a standard JSON response is returned. Credit deduction, usage recording, and request metadata saving are handled as background tasks.
Authentication: Depends(get_api_key) — NOT require_admin. Regular user API key authentication.
Auth chain:
- `get_api_key`: extracts the Bearer token, calls `validate_api_key_security(api_key, client_ip, referer)`
- `get_user(api_key)`: looks up the user; HTTP 401 if not found
- `validate_trial_access(api_key)`: checks trial validity; HTTP 403 if not valid
Request Schema (AISDKChatRequest):
{
"model": str, // Required. Format: "provider/model-name"
"messages": [
{ "role": str, "content": str } // Required list
],
"max_tokens": int | null, // Optional
"temperature": float | null, // Optional, 0.0-2.0
"top_p": float | null, // Optional
"frequency_penalty": float | null, // Optional
"presence_penalty": float | null, // Optional
"stream": bool | null // Optional, default false
}
Response Schema (AISDKChatResponse for non-streaming):
{
"choices": [
{
"message": { "role": str, "content": str },
"finish_reason": str | null
}
],
"usage": {
"prompt_tokens": int,
"completion_tokens": int,
"total_tokens": int
}
}
Streaming response: StreamingResponse with media_type="text/event-stream". Headers:
- X-Accel-Buffering: no
- Cache-Control: no-cache, no-transform
- Connection: keep-alive
SSE format: data: {"choices": [{"delta": {"role": "assistant", "content": "..."}}]}\n\n
Final chunks: data: {"choices": [{"finish_reason": "stop"}]}\n\n then data: [DONE]\n\n
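A client consuming the SSE format above could be sketched as follows. This is a minimal illustration, not code from the repository: the helper names `iter_sse_events` and `collect_text` are hypothetical, and the input is assumed to be an iterable of already-decoded text lines.

```python
import json
from typing import Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded JSON payloads from `data:` lines until the [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel described above
        yield json.loads(payload)


def collect_text(lines: Iterable[str]) -> str:
    """Concatenate every delta.content fragment into the full assistant reply."""
    parts = []
    for event in iter_sse_events(lines):
        for choice in event.get("choices", []):
            # The final chunk carries finish_reason and no delta.
            parts.append(choice.get("delta", {}).get("content") or "")
    return "".join(parts)
```

The final `finish_reason` chunk has no `delta`, so the sketch treats a missing delta as an empty string rather than an error.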
Error codes:
| Code | Condition |
|---|---|
| 401 | API key invalid or user not found |
| 403 | Trial access denied (validate_trial_access failed) |
| 500 | General processing error |
| 503 | AI SDK or OpenRouter not configured (ValueError) |
flowchart TD
A([POST /api/chat/ai-sdk]) --> B[get_api_key auth]
B -->|invalid| C[HTTP 401]
B -->|OK| D[get_user api_key]
D -->|not found| E[HTTP 401]
D -->|found| F[validate_trial_access api_key]
F -->|denied| G[HTTP 403]
F -->|OK| H[Generate request_id UUID\nrecord start_time]
H --> I[AISDKChatAdapter.to_internal_request\nConvert AI SDK format to internal]
I --> J[ChatInferenceHandler\napi_key, background_tasks]
J --> K{request.stream?}
K -->|true| L[handler.process_stream internal_request\nAdapter.from_internal_stream]
L --> M[StreamingResponse SSE\nX-Accel-Buffering: no]
K -->|false| N[await handler.process internal_request]
N --> O[adapter.from_internal_response]
O --> P[Return AISDKChatResponse]
subgraph Error Handling
Q[HTTPException] --> R[background_tasks: save_chat_completion_request\nstatus=failed]
S[ValueError] --> T[logger.error\nSentry capture\nHTTP 503]
U[Exception] --> V[logger.error\nSentry capture\nHTTP 500]
end
| Dependency | File | Operation | Details |
|---|---|---|---|
| `get_api_key` | `src/security/deps.py:74` | Auth | Bearer token extraction + validate_api_key_security |
| `get_user` | `src/db/users.py` | DB read | Look up user by API key |
| `validate_trial_access` | `src/services/trial_validation.py` | Validation | Checks trial validity; returns dict with is_valid, is_trial, is_expired |
| `AISDKChatAdapter` | `src/adapters/chat.py` | Format conversion | Converts AI SDK format -> internal format; converts internal response -> AI SDK format |
| `ChatInferenceHandler` | `src/handlers/chat_handler.py` | Inference routing | Unified handler for all chat inference. Handles provider routing, streaming, non-streaming. |
| `handler.process(internal_request)` | `src/handlers/chat_handler.py` | Inference | Awaitable non-streaming inference call |
| `handler.process_stream(internal_request)` | `src/handlers/chat_handler.py` | Streaming | Returns async generator of internal stream chunks |
| `adapter.from_internal_stream` | `src/adapters/chat.py` | Format conversion | Converts internal stream -> AI SDK SSE format async generator |
| `save_chat_completion_request` | `src/db/chat_completion_requests.py` | DB write (background) | Saves request metadata to chat_completion_requests table. Called via background_tasks.add_task() on both success and failure paths. |
| `deduct_credits` | `src/db/users.py` | DB write | Credit deduction (located in the legacy code path after line 361; currently unreachable in normal execution flow due to the early return at line 360) |
| `record_usage` | `src/db/users.py` | DB write | Records usage stats (same legacy code path issue) |
| `calculate_cost` | `src/services/pricing.py` | Computation | Calculates USD cost from model name + token counts |
| `track_trial_usage` | `src/services/trial_validation.py` | DB write | Tracks trial token usage |
| `sentry_sdk.capture_exception` | sentry_sdk | Error capture | Fires on ValueError (503) and unexpected Exception (500) |
| `_check_trial_override` | `src/routes/ai_sdk.py:158` | Logic | Defense-in-depth: overrides is_trial if user has active Stripe subscription |
Important code note (lines 360-441): There is dead/legacy code after return processed at line 360. The credit deduction, trial tracking, and save_chat_completion_request background task calls in the non-streaming path (lines 363–441) are unreachable in normal execution flow because the return processed statement at line 360 exits the function. The streaming path and error paths do save metadata correctly via background_tasks.add_task().
On success (non-streaming):
- Background task: `save_chat_completion_request` writes to the `chat_completion_requests` table (request_id, model_name, input_tokens, output_tokens, processing_time_ms, status="completed", user_id, provider_name, api_key_id)
- Legacy dead code (unreachable): deduct_credits, record_usage, track_trial_usage
On failure (HTTPException, ValueError, Exception):
- Background task: `save_chat_completion_request` writes to the `chat_completion_requests` table with status="failed" and error_message
- `sentry_sdk.capture_exception` on ValueError and Exception paths
Always:
- `audit_logger.log_api_key_usage` on every authenticated call
- Request correlation ID (`uuid4()`) generated and used for distributed tracing
Streaming side effects:
- Credit deduction and usage recording occur AFTER streaming completes, inside the async generator function
- Token estimation fallback (1 token ≈ 4 chars) applied when provider doesn't return usage data
- `capture_payment_error` from `src/utils/sentry_context.py` called if post-stream credit deduction fails
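The token estimation fallback mentioned above (1 token ≈ 4 characters) can be illustrated with a small helper. This is only a sketch: the function name `estimate_tokens` is hypothetical, and rounding up with a minimum of one token for non-empty text is an assumption, since the source does not say how the handler rounds.

```python
import math

CHARS_PER_TOKEN = 4  # heuristic from the text above: 1 token is roughly 4 characters


def estimate_tokens(text: str) -> int:
    """Rough token count for use when the provider returns no usage data."""
    if not text:
        return 0
    # Assumption: round up so short non-empty strings still count as one token.
    return max(1, math.ceil(len(text) / CHARS_PER_TOKEN))
```

An estimate like this is only suitable as a billing fallback; provider-reported usage should take precedence whenever it is present.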
Issue: #1689
- Method: POST
- Path: `/v1/chat/completions`
- Handler: `chat_completions()` in `src/routes/chat.py`
- Auth: Optional; supports both anonymous (no key) and authenticated (API key) users
- Purpose: Primary chat inference endpoint. Routes requests to 30+ AI providers with failover, credit billing, streaming, rate limiting, web search injection, and full observability
Defined in src/schemas/proxy.py:
| Field | Type | Default | Validation | Description |
|---|---|---|---|---|
| `model` | str | Required | - | Model ID (e.g. openrouter/meta-llama/llama-3.1-70b) |
| `messages` | list[Message] | Required | min_length=1 | Conversation messages |
| `max_tokens` | int | 4096 | - | Max tokens to generate |
| `temperature` | float | 1.0 | ge=0, le=2 | Sampling temperature |
| `top_p` | float | 1.0 | ge=0, le=1 | Nucleus sampling probability |
| `n` | int | 1 | ge=1 | Number of completions |
| `stop` | str or list | None | max 4 if list | Stop sequences |
| `frequency_penalty` | float | 0.0 | ge=-2, le=2 | Frequency penalty |
| `presence_penalty` | float | 0.0 | ge=-2, le=2 | Presence penalty |
| `stream` | bool | False | - | Enable SSE streaming |
| `stream_options` | dict | None | - | Streaming options |
| `tools` | list | None | - | Function/tool definitions |
| `tool_choice` | any | None | - | Tool selection strategy |
| `parallel_tool_calls` | bool | True | - | Allow parallel tool calls |
| `response_format` | dict | None | - | Output format (e.g. JSON mode) |
| `logprobs` | bool | None | - | Return log probabilities |
| `top_logprobs` | int | None | ge=0, le=20 | Number of top logprobs |
| `logit_bias` | dict | None | - | Token bias map |
| `seed` | int | None | - | Random seed |
| `user` | str | None | - | User identifier |
| `service_tier` | str | None | - | Service tier hint |
| `provider` | str | None | - | Force specific provider |
| `auto_web_search` | str | "auto" | - | Web search mode: auto/always/never |
| `web_search_threshold` | float | 0.5 | ge=0, le=1 | Confidence threshold for auto search |
Message Schema (src/schemas/proxy.py):
- `role`: str, validated against `{system, user, assistant, tool, function, developer}`
- `content`: str|list|None
- `name`: str (optional)
- `tool_calls`: list (optional)
- `tool_call_id`: str (optional)
Config: extra="allow" — unknown fields passed through to providers.
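To make a few of the constraints listed above concrete, the sketch below re-implements them with plain Python checks. It is not the Pydantic schema from `src/schemas/proxy.py`; `validate_chat_payload` is a hypothetical helper and covers only `model`, `messages`, `role`, `temperature`, and `stop`.

```python
ALLOWED_ROLES = {"system", "user", "assistant", "tool", "function", "developer"}


def validate_chat_payload(payload: dict) -> list[str]:
    """Return human-readable problems; an empty list means the payload passed."""
    errors: list[str] = []
    model = payload.get("model")
    if not isinstance(model, str) or not model:
        errors.append("model: required non-empty string")
    messages = payload.get("messages")
    if not isinstance(messages, list) or len(messages) < 1:
        errors.append("messages: at least one message is required")
    else:
        for i, msg in enumerate(messages):
            role = msg.get("role") if isinstance(msg, dict) else None
            if not isinstance(role, str) or role not in ALLOWED_ROLES:
                errors.append(f"messages[{i}].role: not an allowed role")
    temperature = payload.get("temperature", 1.0)  # documented default
    if not isinstance(temperature, (int, float)) or not 0 <= temperature <= 2:
        errors.append("temperature: must be between 0 and 2")
    stop = payload.get("stop")
    if isinstance(stop, list) and len(stop) > 4:
        errors.append("stop: at most 4 sequences")
    return errors
```

Unknown keys are deliberately ignored here, mirroring the `extra="allow"` behaviour: they are neither validated nor rejected.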
@router.post("/v1/chat/completions")
async def chat_completions(
    request: ProxyRequest,
    http_request: Request,
    background_tasks: BackgroundTasks,
    api_key: str | None = Depends(get_optional_api_key),
):
Anonymous path (no API key):
- `validate_anonymous_request(request, http_request)`: IP rate limit + model whitelist check
- Detect provider + transform model ID
- Route to anonymous provider handler
Authenticated path (API key present):
- Parallel auth: `asyncio.gather(get_user_task, get_api_key_id_task, get_trial_task)`
- Trial validation: `validate_trial_request()` if on trial plan
- Plan check: `check_user_plan()` verifies the subscription is active
- Rate limiting: `check_rate_limit()`, Redis-based per-key/per-user limits
- Credit check: `check_sufficient_credits()`, Supabase balance lookup
- Auto web search injection (if enabled)
- Router detection: auto/general/code router prefix
- Provider detection + model ID transformation
- Failover chain construction
- Health-based provider selection
- Streaming or non-streaming dispatch
- Background tasks: credit deduction, activity log, chat history, health capture
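The parallel auth step in the authenticated path can be pictured with the following sketch. The three coroutines are hypothetical stand-ins for the real lookups (each sleep simulates a Supabase round trip), and the stand-ins are assumed to need only the API key.

```python
import asyncio


async def fetch_user(api_key: str) -> dict:
    await asyncio.sleep(0.01)  # stands in for a database round trip
    return {"id": "user-1", "api_key": api_key}


async def fetch_api_key_id(api_key: str) -> int:
    await asyncio.sleep(0.01)
    return 42


async def fetch_trial_status(api_key: str) -> dict:
    await asyncio.sleep(0.01)
    return {"is_trial": False}


async def load_auth_context(api_key: str) -> tuple[dict, int, dict]:
    # gather() runs the three lookups concurrently and returns results
    # in the same order as the awaitables were passed in.
    user, api_key_id, trial = await asyncio.gather(
        fetch_user(api_key),
        fetch_api_key_id(api_key),
        fetch_trial_status(api_key),
    )
    return user, api_key_id, trial
```

Because the lookups overlap, the total wait is close to the slowest single lookup rather than the sum of all three.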
Located in src/services/anonymous_rate_limiter.py:
- Redis operation: `INCR anon_rate:{ip_hash}:{minute_bucket}` with TTL 60s
- Model whitelist: checks the model ID against the `ANONYMOUS_ALLOWED_MODELS` list
- Raises HTTP 429 if rate limit exceeded, HTTP 403 if model not in whitelist
Three concurrent tasks:
user_task = get_user_by_api_key(api_key) # Supabase: users table
api_key_id_task = get_api_key_id(api_key) # Supabase: api_keys table
trial_task = get_user_trial_status(user_id) # Supabase: trials table
Located in src/services/rate_limiting.py:
# Redis pipeline
pipe = redis.pipeline()
pipe.incr(f"rate_limit:{api_key_id}:{minute_bucket}")
pipe.expire(f"rate_limit:{api_key_id}:{minute_bucket}", 60)
pipe.get(f"rate_limit_config:{user_id}")
results = await pipe.execute()
- Key: `rate_limit:{api_key_id}:{minute_bucket}`
- Config key: `rate_limit_config:{user_id}` (custom limits)
- Default limits from plan tier
model_id = request.model
if model_id.startswith("router:"):
    router_type = model_id.split(":")[1]  # "auto", "general", "code"
    # Select actual model via NotDiamond/benchmark routing
    actual_model = await select_router_model(router_type, request.messages)
    request.model = actual_model
Located in src/services/model_transformations.py:
def detect_provider_from_model_id(model_id: str) -> str:
    prefix = model_id.split("/")[0]
    return PROVIDER_PREFIX_MAP.get(prefix, "openrouter")  # Default: openrouter
Maps prefixes like featherless/, chutes/, deepinfra/ to provider slugs.
Located in src/services/provider_failover.py:
def build_provider_failover_chain(primary_provider: str, model_id: str) -> list[str]:
    chain = [primary_provider]
    for fallback in PROVIDER_FAILOVER_MAP.get(primary_provider, []):
        if not is_circuit_open(fallback):  # Check circuit breaker
            chain.append(fallback)
    return chain
- Redis: `GET circuit_breaker:{provider}` per fallback provider
async def stream_generator():
    first_chunk = True  # set before the loop so the first chunk can be timed
    async for chunk in provider_stream:
        # Track time-to-first-chunk
        if first_chunk:
            ttfc = time.time() - start_time
            track_time_to_first_chunk(provider, model, ttfc)  # Prometheus
            first_chunk = False
        # Normalize SSE chunk across providers
        normalized = StreamNormalizer.normalize(chunk, provider)
        yield f"data: {json.dumps(normalized)}\n\n"
    yield "data: [DONE]\n\n"
    # Non-blocking background post-processing
    asyncio.create_task(_process_stream_completion_background(
        user_id, api_key_id, model, tokens, cost, session_id
    ))

async def _process_stream_completion_background(...):
    # 1. Deduct credits
    await deduct_credits(user_id, cost)
    # Supabase: INSERT INTO credit_transactions + UPDATE users balance
    # 2. Log activity
    await log_activity(user_id, api_key_id, model, tokens, cost)
    # Supabase: INSERT INTO activity (user_id, api_key_id, model, ...)
    # 3. Save chat history
    if session_id:
        await save_chat_message(session_id, "assistant", response_content, model, tokens)
        # Supabase: INSERT INTO chat_messages + UPDATE chat_sessions
    # 4. Capture model health
    capture_model_health(model, provider, success=True, latency=latency)
    # Redis: LPUSH model_health:{model} + EXPIRE
| Metric | Type | Labels | When Recorded |
|---|---|---|---|
| `model_inference_requests` | Counter | provider, model, status (success/error) | Every request completion |
| `model_inference_duration` | Histogram | provider, model | Request duration (buckets: 0.1–60s) |
| `tokens_used` | Counter | provider, model, token_type (prompt/completion) | After completion |
| `credits_used` | Counter | provider, model | After credit deduction |
| `api_cost_usd_total` | Counter | provider, model | After cost calculation |
| `api_cost_per_request` | Histogram | provider, model | Per-request cost distribution |
TTFC (time-to-first-chunk) tracked via track_time_to_first_chunk() (custom Prometheus metric).
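Time-to-first-chunk measurement can be separated from the stream itself by wrapping the provider's async generator. The sketch below is illustrative only: `with_ttfc`, `fake_provider_stream`, and the `on_first_chunk` callback are hypothetical names, and the callback stands in for whatever records the Prometheus metric.

```python
import asyncio
import time
from typing import AsyncIterator, Callable


async def with_ttfc(
    stream: AsyncIterator[str],
    on_first_chunk: Callable[[float], None],
) -> AsyncIterator[str]:
    """Pass chunks through unchanged, reporting the delay before the first one."""
    start = time.perf_counter()
    first = True
    async for chunk in stream:
        if first:
            on_first_chunk(time.perf_counter() - start)
            first = False
        yield chunk


async def fake_provider_stream() -> AsyncIterator[str]:
    for piece in ("Hel", "lo"):
        await asyncio.sleep(0.01)  # simulated provider latency
        yield piece


async def demo() -> tuple[list[str], list[float]]:
    recorded: list[float] = []
    chunks = [c async for c in with_ttfc(fake_provider_stream(), recorded.append)]
    return chunks, recorded
```

The callback fires exactly once per stream, so it maps naturally onto a single histogram observation per request.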
| Operation | Key Pattern | Purpose |
|---|---|---|
| INCR + EXPIRE | `anon_rate:{ip_hash}:{minute}` | Anonymous rate limiting |
| INCR + EXPIRE | `rate_limit:{api_key_id}:{minute}` | Authenticated rate limiting |
| GET | `rate_limit_config:{user_id}` | Custom rate limit config |
| GET | `circuit_breaker:{provider}` | Provider circuit breaker state |
| GET | `model_catalog:{provider}` | Model metadata lookup |
| LPUSH + EXPIRE | `model_health:{model}` | Health data recording |
| Table | Operation | Trigger |
|---|---|---|
| `users` | SELECT by API key | Auth |
| `api_keys` | SELECT by key hash | Auth |
| `trials` | SELECT by user_id | Trial validation |
| `plans` | SELECT by user_id | Plan check |
| `rate_limits` | SELECT by user_id | Custom rate limits |
| `credit_transactions` | INSERT | Credit deduction |
| `users` | UPDATE credit_balance | Credit deduction |
| `activity` | INSERT | Activity logging |
| `chat_sessions` | INSERT/UPDATE | Chat history |
| `chat_messages` | INSERT | Chat history |
| `model_health` | INSERT | Health capture |
| Scenario | HTTP Status | Response |
|---|---|---|
| No API key + model not in whitelist | 403 | {"detail": "Model not available for anonymous use"} |
| Anonymous rate limit exceeded | 429 | {"detail": "Rate limit exceeded"} |
| Insufficient credits | 402 | {"detail": "Insufficient credits"} |
| Rate limit exceeded | 429 | {"detail": "Rate limit exceeded. Limit: X RPM"} |
| All providers in failover fail | 502 | {"detail": "All providers failed"} |
| Provider timeout | 504 | {"detail": "Request timed out"} |
| Trial expired | 403 | {"detail": "Trial expired"} |
| Invalid messages | 422 | FastAPI/Pydantic validation |
Maps provider slugs to async handler functions:
PROVIDER_ROUTING = {
    "openrouter": {"request": openrouter_request, "stream": openrouter_stream, ...},
    "featherless": {"request": featherless_request, "stream": featherless_stream, ...},
    "chutes": {...}, "deepinfra": {...}, "fireworks": {...},
    "together": {...}, "groq": {...}, "cerebras": {...},
    # ... 25+ total providers
}
Issue: #1691
- Method: GET
- Path: `/v1/chat/sessions`
- Handler: `get_sessions()` in `src/routes/chat_history.py`
- Auth: Required; API key via `Depends(get_api_key)`
- Purpose: Returns a paginated list of the authenticated user's chat sessions, ordered by most recently updated. Each session includes basic metadata (id, title, model, timestamps) without message content.
| Parameter | Type | Default | Validation | Description |
|---|---|---|---|---|
| `limit` | int | 20 | ge=1, le=100 | Max sessions to return |
| `offset` | int | 0 | ge=0 | Pagination offset |
@router.get("/sessions")
async def get_sessions(
    limit: int = Query(20, ge=1, le=100),
    offset: int = Query(0, ge=0),
    api_key: str = Depends(get_api_key),
):
    user = await get_user(api_key)
    if not user:
        raise HTTPException(status_code=401, detail="User not found")
    sessions = await get_user_chat_sessions(
        user_id=user["id"],
        limit=limit,
        offset=offset,
    )
    return ChatSessionsListResponse(
        success=True,
        data=sessions,
        count=len(sessions),
        message=f"Found {len(sessions)} sessions",
    )
Located in src/services/user_lookup_cache.py:
async def get_user(api_key: str) -> dict | None:
    # Check the in-process TTL cache first
    cached = _user_cache.get(api_key)
    if cached:
        return cached
    # Fetch from DB
    user = await get_user_by_api_key(api_key)
    if user:
        _user_cache[api_key] = user  # Cached until the TTL expires
    return user
Cache: cachetools.TTLCache, max 512 entries, TTL 300s (5 minutes)
Reduces Supabase queries by ~95% for repeat requests from same API key.
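The caching behaviour described above (bounded size, entries expiring after a fixed TTL) can be illustrated with a standard-library stand-in. This is not `cachetools.TTLCache` and not the project's code; `SimpleTTLCache` and its injectable `clock` parameter are hypothetical, added so expiry can be demonstrated without waiting.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Optional


class SimpleTTLCache:
    """Tiny TTL cache: at most `maxsize` entries, each valid for `ttl` seconds."""

    def __init__(self, maxsize: int = 512, ttl: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self.maxsize = maxsize
        self.ttl = ttl
        self._clock = clock
        self._data: "OrderedDict[str, tuple[float, Any]]" = OrderedDict()

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: behave like a cache miss
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        if key in self._data:
            del self._data[key]
        elif len(self._data) >= self.maxsize:
            self._data.popitem(last=False)  # evict the oldest inserted entry
        self._data[key] = (self._clock() + self.ttl, value)
```

One consequence of any TTL cache in this position is that a change to a user row may take up to the TTL to become visible to requests served from the cache.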
Supabase Query (on cache miss):
(
    supabase.table("api_keys")
    .select("user_id, is_active")
    .eq("key_hash", hmac_sha256(api_key))
    .eq("is_active", True)
    .single()
    .execute()
)
# Then:
(
    supabase.table("users")
    .select("*")
    .eq("id", user_id)
    .single()
    .execute()
)
Located in src/db/chat_history.py:
@with_retry(max_attempts=3, initial_delay=0.1, max_delay=2.0)
async def get_user_chat_sessions(
    user_id: str,
    limit: int = 20,
    offset: int = 0,
) -> list[dict]:
    result = (
        supabase.table("chat_sessions")
        .select("*")
        .eq("user_id", user_id)
        .eq("is_active", True)
        .order("updated_at", desc=True)
        .range(offset, offset + limit - 1)
        .execute()
    )
    return result.data or []
Supabase Query:
- Table: `chat_sessions`
- Operation: `SELECT *`
- Filters: `user_id = {user_id}` AND `is_active = True`
- Order: `updated_at DESC`
- Pagination: `.range(offset, offset + limit - 1)` (Supabase server-side)
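The `- 1` in the pagination expression above exists because both ends of the range are inclusive. A short worked example, using a hypothetical helper `page_bounds` that also repeats the documented `limit`/`offset` validation:

```python
def page_bounds(limit: int, offset: int) -> tuple[int, int]:
    """Translate limit/offset into an inclusive (start, end) row range."""
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    if offset < 0:
        raise ValueError("offset must be >= 0")
    # Both ends are inclusive, so a page of `limit` rows ends at offset + limit - 1.
    return offset, offset + limit - 1
```

With `limit=20`, the first page covers rows 0 through 19 and the third page (`offset=40`) covers rows 40 through 59; a short final page simply returns fewer rows.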
@with_retry decorator:
- Max attempts: 3
- Initial delay: 0.1s (exponential backoff)
- Max delay: 2.0s
- Retries on: `RemoteProtocolError`, `ConnectError`, `ReadTimeout`
async def _execute_with_connection_retry(func, *args, max_retries=3, **kwargs):
    for attempt in range(max_retries):
        try:
            return await func(*args, **kwargs)
        except (RemoteProtocolError, ConnectError, ReadTimeout) as e:
            if attempt == max_retries - 1:
                raise
            delay = 0.1 * (2 ** attempt)  # 0.1s after the first failure, 0.2s after the second
            await asyncio.sleep(delay)
This endpoint does not use Redis. User lookup uses in-process TTLCache only.
| Table | Operation | Columns | Filters | Notes |
|---|---|---|---|---|
| `api_keys` | SELECT | user_id, is_active | key_hash = ? AND is_active = True | On user cache miss |
| `users` | SELECT | * | id = ? | On user cache miss |
| `chat_sessions` | SELECT | * | user_id = ? AND is_active = True ORDER BY updated_at DESC | Always |
Defined in src/schemas/chat.py:
{
"success": true,
"count": 3,
"message": "Found 3 sessions",
"data": [
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"user_id": "user-uuid-here",
"title": "Python debugging help",
"model": "openrouter/meta-llama/llama-3.1-70b-instruct",
"created_at": "2026-03-04T08:00:00Z",
"updated_at": "2026-03-04T09:30:00Z",
"is_active": true,
"messages": []
}
]
}
ChatSession Schema fields (src/schemas/chat.py):
- `id`: UUID string
- `user_id`: UUID string
- `title`: str or None
- `model`: str or None
- `created_at`: datetime string
- `updated_at`: datetime string
- `is_active`: bool (default True)
- `messages`: list[ChatMessage] (empty for the list endpoint; populated only in the detail endpoint)
| Scenario | HTTP Status | Response |
|---|---|---|
| Missing API key | 401 | {"detail": "API key required"} |
| Invalid API key | 401 | {"detail": "Invalid API key"} |
| User not found | 401 | {"detail": "User not found"} |
| No sessions exist | 200 | {"success": true, "count": 0, "data": []} |
| DB connection error (after retries) | 500 | Exception propagated |
| limit > 100 | 422 | FastAPI validation error |
- User cache hit: ~1ms (in-process TTLCache lookup)
- User cache miss: ~50–200ms (2 Supabase queries for key lookup + user)
- Session query: ~20–100ms (indexed by `user_id` + `is_active`)
- Total warm path: ~25–110ms
- Retry overhead: about 0.3s of added sleep on transient errors (with 3 attempts the retry loop sleeps 0.1s and then 0.2s; no sleep follows the final attempt), plus the time spent in the failed attempts themselves
- Pagination: Server-side via Supabase `.range()`; no Python-side slicing
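The retry overhead can be checked with a little arithmetic. Going by the retry loop shown earlier for this endpoint, a sleep only happens after a failed attempt that is not the last one, so three attempts involve two sleeps. The helper below is a hypothetical sketch; capping each delay at `max_delay` is an assumption about how that parameter is applied.

```python
def backoff_delays(max_attempts: int = 3, initial_delay: float = 0.1,
                   max_delay: float = 2.0) -> list[float]:
    """Sleeps between attempts: one fewer than the number of attempts."""
    return [
        min(initial_delay * (2 ** i), max_delay)  # assumed cap at max_delay
        for i in range(max_attempts - 1)
    ]
```

For the documented settings this gives delays of 0.1s and 0.2s, roughly 0.3s of sleep in the worst case; a 0.4s delay would only appear with a fourth attempt.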
Key columns:
- `id` UUID PRIMARY KEY
- `user_id` UUID REFERENCES users(id)
- `title` TEXT
- `model` TEXT
- `is_active` BOOLEAN DEFAULT true
- `created_at` TIMESTAMPTZ DEFAULT now()
- `updated_at` TIMESTAMPTZ DEFAULT now()
Index: (user_id, is_active, updated_at DESC) for efficient pagination.
Issue: #1692
- Method: GET
- Path: `/v1/chat/sessions/{session_id}`
- Handler: `get_session()` in `src/routes/chat_history.py`
- Auth: Required; API key via `Depends(get_api_key)`
- Purpose: Returns a single chat session with its full message history. The session must belong to the authenticated user.
| Parameter | Type | Description |
|---|---|---|
| `session_id` | str | UUID of the chat session |
@router.get("/sessions/{session_id}")
async def get_session(
    session_id: str,
    api_key: str = Depends(get_api_key),
):
    user = await get_user(api_key)
    if not user:
        raise HTTPException(status_code=401, detail="User not found")
    session = await get_chat_session(
        session_id=session_id,
        user_id=user["id"],
    )
    if not session:
        raise HTTPException(status_code=404, detail="Session not found")
    return ChatSessionResponse(
        success=True,
        data=session,
        message="Session retrieved successfully",
    )
- Resolves user from API key (with in-process cache)
- Calls `get_chat_session(session_id, user_id)`; both filters are applied for the ownership check
- Returns 404 if session not found or doesn't belong to user
Located in src/services/user_lookup_cache.py:
- In-process TTLCache: max 512 entries, TTL 300s
- Cache miss Supabase queries: `api_keys` SELECT + `users` SELECT
Located in src/db/chat_history.py:
@with_retry(max_attempts=3, initial_delay=0.1, max_delay=2.0)
async def get_chat_session(session_id: str, user_id: str) -> dict | None:
    # Query 1: Fetch session
    session_result = (
        supabase.table("chat_sessions")
        .select("*")
        .eq("id", session_id)
        .eq("user_id", user_id)  # Ownership enforcement
        .eq("is_active", True)
        .single()
        .execute()
    )
    if not session_result.data:
        return None
    session = session_result.data
    # Query 2: Fetch messages for this session
    messages_result = (
        supabase.table("chat_messages")
        .select("*")
        .eq("session_id", session_id)
        .order("created_at", desc=False)  # Chronological order
        .execute()
    )
    session["messages"] = messages_result.data or []
    return session
Two sequential Supabase queries per request:
- Fetch session metadata (with ownership check)
- Fetch all messages for that session in chronological order
The user_id filter in the session query (eq("user_id", user_id)) serves as the authorization check. A session belonging to a different user will return None → HTTP 404, preventing information leakage.
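The effect of filtering by both session ID and user ID can be shown with an in-memory stand-in for the `chat_sessions` table. The data and the helper names `find_session` and `status_for` are hypothetical; the point is only that "not yours", "deleted", and "does not exist" are indistinguishable to the caller.

```python
from typing import Optional

# Hypothetical rows standing in for the chat_sessions table.
SESSIONS = [
    {"id": "s1", "user_id": "alice", "is_active": True},
    {"id": "s2", "user_id": "bob", "is_active": True},
    {"id": "s3", "user_id": "alice", "is_active": False},
]


def find_session(session_id: str, user_id: str) -> Optional[dict]:
    """Return the row only if all three filters match, as in the query above."""
    for row in SESSIONS:
        if row["id"] == session_id and row["user_id"] == user_id and row["is_active"]:
            return row
    return None


def status_for(session_id: str, user_id: str) -> int:
    """HTTP status the handler would produce for this lookup."""
    return 200 if find_session(session_id, user_id) else 404
```

Returning 404 rather than 403 for another user's session means a caller cannot probe which session IDs exist.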
@with_retry(max_attempts=3, initial_delay=0.1, max_delay=2.0):
- Retries on `RemoteProtocolError`, `ConnectError`, `ReadTimeout`
- Exponential backoff: 0.1s, then 0.2s between the three attempts (no sleep follows the final attempt)
- After max attempts, exception propagates (→ HTTP 500)
| Table | Operation | Columns | Filters | Notes |
|---|---|---|---|---|
| `api_keys` | SELECT | user_id, is_active | key_hash = ? AND is_active = True | User cache miss only |
| `users` | SELECT | * | id = ? | User cache miss only |
| `chat_sessions` | SELECT | * | id = ? AND user_id = ? AND is_active = True | Always |
| `chat_messages` | SELECT | * | session_id = ? ORDER BY created_at ASC | Always (if session found) |
This endpoint does not use Redis. All data from Supabase, user lookup from in-process TTLCache.
Defined in src/schemas/chat.py:
{
"success": true,
"message": "Session retrieved successfully",
"data": {
"id": "550e8400-e29b-41d4-a716-446655440000",
"user_id": "user-uuid-here",
"title": "Python debugging help",
"model": "openrouter/meta-llama/llama-3.1-70b-instruct",
"created_at": "2026-03-04T08:00:00Z",
"updated_at": "2026-03-04T09:30:00Z",
"is_active": true,
"messages": [
{
"id": "msg-uuid-1",
"session_id": "550e8400-e29b-41d4-a716-446655440000",
"role": "user",
"content": "Why is my Python code throwing a KeyError?",
"model": null,
"tokens": 0,
"created_at": "2026-03-04T08:00:05Z"
},
{
"id": "msg-uuid-2",
"session_id": "550e8400-e29b-41d4-a716-446655440000",
"role": "assistant",
"content": "A KeyError occurs when you try to access a dictionary key that doesn't exist...",
"model": "openrouter/meta-llama/llama-3.1-70b-instruct",
"tokens": 147,
"created_at": "2026-03-04T08:00:08Z"
}
]
}
}
ChatMessage Schema fields:
- `id`: UUID string
- `session_id`: UUID string
- `role`: str (`user`, `assistant`, `system`, `tool`)
- `content`: str
- `model`: str or None (which model generated the message)
- `tokens`: int (default 0)
- `created_at`: datetime string
| Scenario | HTTP Status | Response |
|---|---|---|
| Missing API key | 401 | {"detail": "API key required"} |
| Invalid API key | 401 | {"detail": "Invalid API key"} |
| User not found | 401 | {"detail": "User not found"} |
| Session not found | 404 | {"detail": "Session not found"} |
| Session belongs to different user | 404 | {"detail": "Session not found"} (same as not found, so no info leakage) |
| Deleted session (is_active=False) | 404 | {"detail": "Session not found"} |
| DB error after retries | 500 | Exception propagated |
- User cache hit: ~1ms (TTLCache)
- User cache miss: ~50–200ms (2 Supabase queries)
- Session query: ~20–80ms (indexed by `id` + `user_id`)
- Messages query: ~20–200ms (depends on message count; indexed by `session_id` + `created_at`)
- Total warm path: ~45–285ms
- Message volume: No pagination on messages — all messages returned for the session
- Large sessions: Sessions with 1000+ messages may have significant response size
Key columns:
- `id` UUID PRIMARY KEY
- `session_id` UUID REFERENCES chat_sessions(id)
- `role` TEXT NOT NULL
- `content` TEXT NOT NULL
- `model` TEXT
- `tokens` INTEGER DEFAULT 0
- `created_at` TIMESTAMPTZ DEFAULT now()
Index: (session_id, created_at ASC) for efficient chronological message retrieval.
Issue: #1693
- Method: GET
- Path: `/v1/chat/stats`
- Handler: `get_stats()` in `src/routes/chat_history.py`
- Auth: Required; API key via `Depends(get_api_key)`
- Purpose: Returns aggregate statistics about the authenticated user's chat history: total session count, total message count, and total tokens used across all sessions
None.
@router.get("/stats")
async def get_stats(
    api_key: str = Depends(get_api_key),
):
    user = await get_user(api_key)
    if not user:
        raise HTTPException(status_code=401, detail="User not found")
    stats = await get_chat_session_stats(user_id=user["id"])
    return ChatSessionStatsResponse(
        success=True,
        stats=stats,
        message="Chat statistics retrieved successfully",
    )
Located in src/services/user_lookup_cache.py:
- In-process TTLCache: max 512 entries, TTL 300s
- Cache miss: 2 Supabase queries (api_keys + users tables)
Located in src/db/chat_history.py:
@with_retry(max_attempts=3, initial_delay=0.1, max_delay=2.0)
async def get_chat_session_stats(user_id: str) -> dict:
    # Query 1: Count active sessions
    sessions_result = (
        supabase.table("chat_sessions")
        .select("id", count="exact")
        .eq("user_id", user_id)
        .eq("is_active", True)
        .execute()
    )
    session_count = sessions_result.count or 0
    # Query 2: Count total messages across all user's sessions
    messages_result = (
        supabase.table("chat_messages")
        .select("id", count="exact")
        .eq("chat_sessions.user_id", user_id)  # JOIN filter
        .execute()
    )
    message_count = messages_result.count or 0
    # Query 3: Sum total tokens across all user's messages
    tokens_result = (
        supabase.table("chat_messages")
        .select("tokens, chat_sessions!inner(user_id)")
        .eq("chat_sessions.user_id", user_id)  # JOIN on chat_sessions
        .execute()
    )
    total_tokens = sum(
        row.get("tokens", 0) or 0
        for row in (tokens_result.data or [])
    )
    return {
        "total_sessions": session_count,
        "total_messages": message_count,
        "total_tokens": total_tokens,
    }
Three sequential Supabase queries:
- COUNT of active sessions for user
- COUNT of messages via JOIN with chat_sessions (filtered by user_id)
- SUM of tokens via JOIN with chat_sessions (fetches all rows, sums in Python)
@with_retry(max_attempts=3, initial_delay=0.1, max_delay=2.0):
- Retries all 3 queries as a unit on transient connection errors
- Exponential backoff: 0.1s, 0.2s, 0.4s
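The retry behaviour described above can be sketched as follows. This is a minimal illustration, not the project's actual `with_retry` implementation: the names `backoff_delays` and `with_retry_sketch`, the doubling factor, and the choice of `ConnectionError` as the retryable exception are assumptions made for the example.

```python
import asyncio
import functools


def backoff_delays(initial_delay: float, max_delay: float, count: int) -> list[float]:
    """Return `count` delays that double each time and are capped at max_delay."""
    return [min(initial_delay * (2 ** i), max_delay) for i in range(count)]


def with_retry_sketch(max_attempts=3, initial_delay=0.1, max_delay=2.0,
                      retry_on=(ConnectionError,)):
    """Hypothetical decorator: retry an async function on transient errors."""
    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            delays = backoff_delays(initial_delay, max_delay, max_attempts)
            for attempt in range(max_attempts):
                try:
                    # The whole function body is retried as one unit.
                    return await func(*args, **kwargs)
                except retry_on:
                    if attempt == max_attempts - 1:
                        raise  # out of attempts: propagate the error
                    await asyncio.sleep(delays[attempt])
        return wrapper
    return decorator
```

Under this sketch, three attempts sleep at most twice (0.1s, then 0.2s); a third delay of 0.4s would only be reached if a fourth attempt were allowed.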
The token sum is computed in Python application layer (not SQL SUM):
total_tokens = sum(
    row.get("tokens", 0) or 0
    for row in (tokens_result.data or [])
)

This fetches ALL message rows and sums locally. For users with many messages, this can return large amounts of data.
| Table | Operation | Columns | Filters | Count Mode |
|---|---|---|---|---|
| `api_keys` | SELECT | `user_id, is_active` | `key_hash = ? AND is_active = True` | User cache miss |
| `users` | SELECT | `*` | `id = ?` | User cache miss |
| `chat_sessions` | SELECT (count) | `id` | `user_id = ? AND is_active = True` | count="exact" (Supabase COUNT) |
| `chat_messages` | SELECT (count) | `id` | `JOIN chat_sessions.user_id = ?` | count="exact" (Supabase COUNT) |
| `chat_messages` | SELECT | `tokens, chat_sessions(user_id)` | `JOIN chat_sessions.user_id = ?` | Fetch all for Python sum |
This endpoint does not use Redis. Stats are computed fresh from Supabase on each request.
Defined in src/schemas/chat.py:
{
  "success": true,
  "message": "Chat statistics retrieved successfully",
  "stats": {
    "total_sessions": 47,
    "total_messages": 1293,
    "total_tokens": 842750
  }
}

Stats fields:
- `total_sessions`: Count of active (is_active=True) chat sessions
- `total_messages`: Total message count across all sessions
- `total_tokens`: Sum of `tokens` column across all chat_messages for this user
| Scenario | HTTP Status | Response |
|---|---|---|
| Missing API key | 401 | {"detail": "API key required"} |
| Invalid API key | 401 | {"detail": "Invalid API key"} |
| User not found | 401 | {"detail": "User not found"} |
| No sessions/messages | 200 | {"stats": {"total_sessions": 0, "total_messages": 0, "total_tokens": 0}} |
| DB error after retries | 500 | Exception propagated |
| Message tokens field is None | 200 | `None or 0` guard handles gracefully |
- User cache hit: ~1ms (TTLCache)
- User cache miss: ~50–200ms (2 Supabase queries)
- Session count query: ~20–50ms (COUNT with user_id index)
- Message count query: ~30–100ms (COUNT with JOIN)
- Token sum query: ~50ms–5s+ (fetches ALL message rows with tokens; grows linearly with message count)
- Total warm path: ~100–400ms for typical users
- Scalability concern: The token sum query fetches all message rows into Python memory. Users with 10,000+ messages may experience slow responses. A SQL `SUM(tokens)` aggregation would be more efficient.
Replace the Python-side token sum with a Supabase RPC call:
SELECT SUM(m.tokens)
FROM chat_messages m
JOIN chat_sessions s ON m.session_id = s.id
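WHERE s.user_id = $1 AND s.is_active = TRUE;

This would reduce data transfer from O(n messages) to a single integer. Note that this query also restricts the sum to active sessions, a filter the current Python-side sum does not apply.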
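The effect of pushing the sum into SQL can be tried locally. The sketch below uses an in-memory SQLite database as a stand-in for the Supabase Postgres tables; the function names `total_tokens_for_user` and `make_demo_db` and the sample rows are hypothetical. `COALESCE` is an addition here so that a user with no messages gets 0 rather than NULL.

```python
import sqlite3


def total_tokens_for_user(conn: sqlite3.Connection, user_id: int) -> int:
    """Sum tokens inside the database instead of fetching every row."""
    row = conn.execute(
        """
        SELECT COALESCE(SUM(m.tokens), 0)
        FROM chat_messages m
        JOIN chat_sessions s ON m.session_id = s.id
        WHERE s.user_id = ? AND s.is_active = 1
        """,
        (user_id,),
    ).fetchone()
    return row[0]


def make_demo_db() -> sqlite3.Connection:
    """Build a tiny demo schema with only the columns the query needs."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE chat_sessions (id INTEGER PRIMARY KEY, user_id INTEGER, is_active INTEGER);
        CREATE TABLE chat_messages (id INTEGER PRIMARY KEY, session_id INTEGER, tokens INTEGER);
        INSERT INTO chat_sessions VALUES (1, 7, 1), (2, 7, 0), (3, 8, 1);
        INSERT INTO chat_messages VALUES (1, 1, 100), (2, 1, NULL), (3, 2, 50), (4, 3, 25);
        """
    )
    return conn
```

Only one integer crosses the wire per call, and NULL token values are skipped by `SUM`, which matches the `or 0` guard in the Python version.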
chat_sessions relevant columns:
- `id`, `user_id`, `is_active`

chat_messages relevant columns:
- `id`, `session_id`, `tokens` (INTEGER, stores token count for each message)
Session-to-messages relationship: chat_messages.session_id → chat_sessions.id
User-to-sessions relationship: chat_sessions.user_id → users.id
Issue: #1694
Returns the authenticated user's feedback history with optional filtering by feedback type, session ID, and model name. Supports pagination via limit and offset query parameters.
Route: GET /v1/chat/feedback
Router prefix: /v1/chat
Tags: chat-history
Response model: MessageFeedbackListResponse
Auth: Required (Bearer token via get_api_key)
| Parameter | Type | Default | Validation | Description |
|---|---|---|---|---|
| `feedback_type` | `str \| None` | `None` | None (any string accepted) | Filter by feedback type (thumbs_up, thumbs_down, regenerate) |
| `session_id` | `int \| None` | `None` | None | Filter by chat session ID |
| `model` | `str \| None` | `None` | None | Filter by model name |
| `limit` | `int` | `50` | `ge=1, le=100` | Max records to return |
| `offset` | `int` | `0` | `ge=0` | Pagination offset |

| Field | Type | Default | Description |
|---|---|---|---|
| `success` | `bool` | required | Operation success flag |
| `data` | `list[MessageFeedback]` | required | List of feedback records |
| `count` | `int` | required | Number of records returned |
| `message` | `str \| None` | `None` | Human-readable message |

| Field | Type | Default | Validation | Description |
|---|---|---|---|---|
| `id` | `int \| None` | `None` | - | Feedback record ID |
| `session_id` | `int \| None` | `None` | - | Associated session |
| `message_id` | `int \| None` | `None` | - | Associated message |
| `user_id` | `int` | required | - | User who submitted |
| `feedback_type` | `Literal["thumbs_up","thumbs_down","regenerate"]` | required | Literal check | Feedback type |
| `rating` | `int \| None` | `None` | `ge=1, le=5` | Star rating |
| `comment` | `str \| None` | `None` | - | Text comment |
| `model` | `str \| None` | `None` | - | Model name |
| `metadata` | `dict[str, Any] \| None` | `None` | - | Additional context |
| `created_at` | `datetime \| None` | `None` | - | Creation timestamp |
| `updated_at` | `datetime \| None` | `None` | - | Last update timestamp |
get_my_feedback()
├── get_api_key() [src/security/deps.py]
│ ├── HTTPBearer (extracts Bearer token)
│ ├── validate_api_key_security() [src/security/security.py]
│ │ └── Checks: active, expired, request limits, IP allowlist, domain
│ ├── get_user() [src/services/user_lookup_cache.py] → audit logging
│ └── audit_logger.log_api_key_usage()
├── get_user(api_key) [src/services/user_lookup_cache.py]
│ └── db_get_user() [src/db/users.py] (60s TTL in-memory cache)
└── get_user_feedback() [src/db/feedback.py]
└── Supabase query on message_feedback table
- Table: `message_feedback`
- Operation: `SELECT *`
- Filters:
  - `.eq("user_id", user_id)` (always)
  - `.eq("feedback_type", feedback_type)` (if provided)
  - `.eq("session_id", session_id)` (if provided)
  - `.eq("model", model)` (if provided)
- Order: `.order("created_at", desc=True)`
- Pagination: `.range(offset, offset + limit - 1)`
- Retry: `_execute_with_connection_retry` (3 retries, exponential backoff 0.1s initial)
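The filter and pagination rules above can be expressed as two small helpers. This is a sketch only: `feedback_filters` and `page_range` are hypothetical names, and treating "if provided" as "is not None" is an assumption about the DB layer.

```python
from typing import Any, Optional


def page_range(offset: int, limit: int) -> tuple[int, int]:
    """Translate offset/limit into the inclusive (start, end) pair used by .range()."""
    if offset < 0 or limit < 1:
        raise ValueError("offset must be >= 0 and limit must be >= 1")
    return offset, offset + limit - 1


def feedback_filters(
    user_id: int,
    feedback_type: Optional[str] = None,
    session_id: Optional[int] = None,
    model: Optional[str] = None,
) -> dict[str, Any]:
    """Collect equality filters: user_id always, the others only when given."""
    filters: dict[str, Any] = {"user_id": user_id}
    optional = {"feedback_type": feedback_type, "session_id": session_id, "model": model}
    filters.update({key: value for key, value in optional.items() if value is not None})
    return filters
```

With the default `limit=50` and `offset=0`, `page_range` yields `(0, 49)`, so the end index is inclusive and the page holds exactly `limit` rows.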
None directly. The user lookup uses in-memory caching (60s TTL) in src/db/users.py, not Redis.
None directly emitted by this endpoint. Middleware-level metrics (request latency, status codes) apply via the global middleware pipeline.
- Security Middleware (`src/middleware/security_middleware.py`): IP rate limiting, behavioral analysis, velocity mode
- Sentry Middleware (`src/middleware/sentry_middleware.py`): Error tracking
- Observability Middleware (`src/middleware/observability_middleware.py`): Request/response logging
- Timeout Middleware (`src/middleware/timeout_middleware.py`): Request timeout
- GZip Middleware (`src/middleware/gzip_middleware.py`): Response compression
- Trace Middleware (`src/middleware/trace_middleware.py`): OpenTelemetry tracing
| Error | Status | Condition |
|---|---|---|
| `HTTPException(401)` | 401 | Missing/invalid API key (from get_api_key) |
| `HTTPException(401)` | 401 | `get_user()` returns None |
| `HTTPException(500)` | 500 | Any unhandled exception |
Error flow: All HTTPExceptions re-raised. Generic exceptions caught and wrapped in 500.
flowchart TD
A[GET /v1/chat/feedback] --> B{Auth: get_api_key}
B -->|Invalid/Missing| C[401 Unauthorized]
B -->|Valid| D[get_user api_key]
D -->|None| E[401 Invalid API key]
D -->|User found| F[get_user_feedback]
F --> G{Apply filters}
G --> H[feedback_type filter?]
G --> I[session_id filter?]
G --> J[model filter?]
H --> K[Query message_feedback table]
I --> K
J --> K
K -->|Success| L[Return MessageFeedbackListResponse]
K -->|DB Error| M[500 Internal Server Error]
L --> N[200 OK with feedback list]
Issue: #1695
Returns aggregated feedback statistics for the authenticated user over a configurable time period. Includes counts by type, average rating, thumbs up/down rates, and per-model breakdown.
Route: GET /v1/chat/feedback/stats
Router prefix: /v1/chat
Tags: chat-history
Response model: FeedbackStatsResponse
Auth: Required (Bearer token via get_api_key)
| Parameter | Type | Default | Validation | Description |
|---|---|---|---|---|
| `model` | `str \| None` | `None` | None | Filter stats by model name |
| `days` | `int` | `30` | `ge=1, le=365` | Number of days to aggregate |

| Field | Type | Default | Description |
|---|---|---|---|
| `success` | `bool` | required | Operation success flag |
| `stats` | `dict[str, Any]` | required | Aggregated statistics dict |
| `message` | `str \| None` | `None` | Human-readable message |

| Key | Type | Description |
|---|---|---|
| `total_feedback` | `int` | Total feedback count in period |
| `thumbs_up` | `int` | Count of thumbs_up feedback |
| `thumbs_down` | `int` | Count of thumbs_down feedback |
| `regenerate` | `int` | Count of regenerate feedback |
| `thumbs_up_rate` | `float` | Percentage (0-100), rounded to 2 decimals |
| `thumbs_down_rate` | `float` | Percentage (0-100), rounded to 2 decimals |
| `average_rating` | `float \| None` | Average of 1-5 ratings, rounded to 2 decimals |
| `by_model` | `dict[str, dict]` | Per-model breakdown with thumbs_up/down/regenerate/total counts |
| `period_days` | `int` | The days parameter used |
get_my_feedback_stats()
├── get_api_key() [src/security/deps.py]
│ ├── HTTPBearer → validate_api_key_security() → audit logging
│ └── Returns validated API key string
├── get_user(api_key) [src/services/user_lookup_cache.py]
│ └── db_get_user() [src/db/users.py] (60s TTL in-memory cache)
└── get_feedback_stats() [src/db/feedback.py]
└── Supabase query on message_feedback table
└── Python-side aggregation (counts, rates, averages, by_model grouping)
- Table: `message_feedback`
- Operation: `SELECT feedback_type, rating, model, created_at`
- Filters:
  - `.gte("created_at", from_date.isoformat())`, where `from_date` = `now - timedelta(days=days)` truncated to midnight
  - `.eq("user_id", user_id)` (if provided, always provided from route)
  - `.eq("model", model)` (if provided)
- No pagination - fetches all matching records for aggregation
- Retry: `_execute_with_connection_retry` (3 retries, exponential backoff 0.1s)

- Counts: `thumbs_up`, `thumbs_down`, `regenerate` by `feedback_type` field
- Rates: `(count / total) * 100`, rounded to 2 decimals
- Average rating: `sum(ratings) / len(ratings)` for non-null ratings, rounded to 2 decimals
- By-model: Groups by `model` field (uses "unknown" for null), counts per type
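The aggregation rules listed above can be sketched in plain Python. This is an illustration under stated assumptions, not the code in `src/db/feedback.py`: the function name `aggregate_feedback` is hypothetical, and returning 0.0 rates and a `None` average for an empty result set is an assumed convention.

```python
from collections import defaultdict
from typing import Any

FEEDBACK_TYPES = ("thumbs_up", "thumbs_down", "regenerate")


def aggregate_feedback(rows: list[dict[str, Any]], days: int) -> dict[str, Any]:
    """Build the stats dict from raw message_feedback rows."""
    counts = {ftype: 0 for ftype in FEEDBACK_TYPES}
    by_model: dict[str, dict[str, int]] = defaultdict(
        lambda: {**{ftype: 0 for ftype in FEEDBACK_TYPES}, "total": 0}
    )
    ratings = []
    for row in rows:
        ftype = row.get("feedback_type")
        model = row.get("model") or "unknown"  # null model is grouped as "unknown"
        by_model[model]["total"] += 1
        if ftype in counts:
            counts[ftype] += 1
            by_model[model][ftype] += 1
        if row.get("rating") is not None:
            ratings.append(row["rating"])

    total = len(rows)

    def rate(count: int) -> float:
        # Percentage in the 0-100 range; 0.0 when there is no feedback at all.
        return round(count / total * 100, 2) if total else 0.0

    return {
        "total_feedback": total,
        **counts,
        "thumbs_up_rate": rate(counts["thumbs_up"]),
        "thumbs_down_rate": rate(counts["thumbs_down"]),
        "average_rating": round(sum(ratings) / len(ratings), 2) if ratings else None,
        "by_model": dict(by_model),
        "period_days": days,
    }
```

Because every row in the period is loaded before counting, the cost of this step grows with the number of feedback records, the same trade-off noted for the token sum in the chat stats endpoint.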
None. User lookup uses in-memory cache only.
None directly. Standard middleware metrics apply.
Standard middleware pipeline: Security → Sentry → Observability → Timeout → GZip → Trace
| Error | Status | Condition |
|---|---|---|
| `HTTPException(401)` | 401 | Invalid/missing API key |
| `HTTPException(401)` | 401 | `get_user()` returns None |
| `HTTPException(500)` | 500 | Any unhandled exception (DB errors, aggregation errors) |
Error flow: HTTPExceptions re-raised directly. All other exceptions caught at route level → 500.
flowchart TD
A[GET /v1/chat/feedback/stats] --> B{Auth: get_api_key}
B -->|Invalid| C[401 Unauthorized]
B -->|Valid| D[get_user api_key]
D -->|None| E[401 Invalid API key]
D -->|User found| F[get_feedback_stats user_id, model, days]
F --> G[Query message_feedback table]
G --> H[Fetch all records in date range]
H --> I{Records found?}
I -->|Yes| J[Count by feedback_type]
J --> K[Calculate rates]
K --> L[Calculate avg rating]
L --> M[Group by model]
M --> N[Build stats dict]
I -->|No| O[Return zero stats]
N --> P[Return FeedbackStatsResponse]
O --> P
G -->|DB Error| Q[500 Internal Server Error]
Issue: #1696
Returns all feedback records for a specific chat session. Verifies session ownership before returning results.
Route: GET /v1/chat/sessions/{session_id}/feedback
Router prefix: /v1/chat
Tags: chat-history
Response model: MessageFeedbackListResponse
Auth: Required (Bearer token via get_api_key)
| Parameter | Type | Description |
|---|---|---|
| `session_id` | `int` | Chat session ID to get feedback for |

| Field | Type | Default | Description |
|---|---|---|---|
| `success` | `bool` | required | Operation success flag |
| `data` | `list[MessageFeedback]` | required | List of feedback records |
| `count` | `int` | required | Number of records returned |
| `message` | `str \| None` | `None` | Human-readable message |
(See issue #1694 for full MessageFeedback schema.)
get_session_feedback()
├── get_api_key() [src/security/deps.py]
├── get_user(api_key) [src/services/user_lookup_cache.py]
│ └── db_get_user() [src/db/users.py] (60s TTL cache)
├── get_chat_session(session_id, user_id) [src/db/chat_history.py]
│ └── 2 Supabase queries: session + messages
└── get_feedback_by_session(session_id, user_id) [src/db/feedback.py]
└── Supabase query on message_feedback table
- Table: `chat_sessions`
- Operation: `SELECT *`
- Filters: `.eq("id", session_id).eq("user_id", user_id).eq("is_active", True)`
- Retry: `_execute_with_connection_retry` (3 retries)

- Table: `chat_messages`
- Operation: `SELECT *`
- Filters: `.eq("session_id", session_id)`
- Order: `.order("created_at", desc=False)`

- Table: `message_feedback`
- Operation: `SELECT *`
- Filters: `.eq("session_id", session_id).eq("user_id", user_id)`
- Order: `.order("created_at", desc=True)`
- Retry: `_execute_with_connection_retry` (3 retries)
None.
None directly.
| Error | Status | Condition |
|---|---|---|
| `HTTPException(401)` | 401 | Invalid/missing API key |
| `HTTPException(401)` | 401 | `get_user()` returns None |
| `HTTPException(404)` | 404 | Session not found or doesn't belong to user |
| `HTTPException(500)` | 500 | Any unhandled exception |
flowchart TD
A[GET /v1/chat/sessions/session_id/feedback] --> B{Auth: get_api_key}
B -->|Invalid| C[401 Unauthorized]
B -->|Valid| D[get_user api_key]
D -->|None| E[401 Invalid API key]
D -->|User found| F[get_chat_session session_id, user_id]
F -->|None| G[404 Chat session not found]
F -->|Session found| H[get_feedback_by_session session_id, user_id]
H --> I[Query message_feedback table]
I -->|Success| J[Return MessageFeedbackListResponse]
I -->|DB Error| K[500 Internal Server Error]
Issue: #1697
Creates a new chat session for the authenticated user. Uses cached user lookup for performance and logs session creation activity in the background (non-blocking).
Route: POST /v1/chat/sessions
Router prefix: /v1/chat
Tags: chat-history
Response model: ChatSessionResponse
Auth: Required (Bearer token via get_api_key)
| Field | Type | Default | Validation | Description |
|---|---|---|---|---|
| `title` | `str \| None` | `None` | None | Session title. Auto-generated as "Chat YYYY-MM-DD HH:MM" if not provided |
| `model` | `str \| None` | `None` | None | Model name. Defaults to "openai/gpt-3.5-turbo" in DB layer if not provided |

| Field | Type | Default | Description |
|---|---|---|---|
| `success` | `bool` | required | Operation success flag |
| `data` | `ChatSession \| None` | `None` | Created session object |
| `message` | `str \| None` | `None` | Human-readable message |

| Field | Type | Default | Description |
|---|---|---|---|
| `id` | `int \| None` | `None` | Session ID |
| `user_id` | `int` | required | Owner user ID |
| `title` | `str` | required | Session title |
| `model` | `str` | required | Model name |
| `created_at` | `datetime \| None` | `None` | Creation timestamp |
| `updated_at` | `datetime \| None` | `None` | Last update timestamp |
| `is_active` | `bool \| None` | `True` | Active flag |
| `messages` | `list[ChatMessage] \| None` | `[]` | Messages list |
create_session()
├── get_api_key() [src/security/deps.py]
├── get_user(api_key) [src/services/user_lookup_cache.py]
│ └── db_get_user() [src/db/users.py] (60s TTL cache)
├── create_chat_session(user_id, title, model) [src/db/chat_history.py]
│ ├── @with_retry(max_attempts=3, initial_delay=0.1, max_delay=2.0)
│ └── _execute_with_connection_retry() (3 retries, exponential backoff)
└── log_activity_background() [src/services/background_tasks.py]
├── Creates asyncio task if event loop running
└── Falls back to synchronous db_log_activity()
└── log_activity() [src/db/activity.py] → INSERT into activity table
- Table: `chat_sessions`
- Operation: `INSERT`
- Data:
  - `user_id`: int
  - `title`: str (auto-generated if None: `"Chat YYYY-MM-DD HH:MM"`)
  - `model`: str (defaults to `"openai/gpt-3.5-turbo"` if None)
  - `created_at`: ISO datetime (UTC)
  - `updated_at`: ISO datetime (UTC)
  - `is_active`: `True`
- Retry: `@with_retry` decorator (3 attempts) + `_execute_with_connection_retry` (3 retries per attempt)

- Table: `activity` (via `src/db/activity.py`)
- Operation: `INSERT`
- Data: user_id, model, provider="Chat History", tokens=0, cost=0.0, finish_reason="session_created", app="Chat", metadata with action/session_id/title
None directly.
None directly. Standard middleware metrics apply.
| Error | Status | Condition |
|---|---|---|
| `HTTPException(401)` | 401 | Invalid/missing API key |
| `HTTPException(401)` | 401 | `get_user()` returns None |
| `HTTPException(500)` | 500 | DB insert fails or any unhandled exception |
Background activity logging errors are caught and logged but do NOT fail the request.
- User lookup: Cached with 60s TTL (reduces DB queries by ~95%)
- Activity logging: Non-blocking (background task)
- Performance metrics: Logs `user_lookup_ms` and `session_create_ms` timing
flowchart TD
A[POST /v1/chat/sessions] --> B{Auth: get_api_key}
B -->|Invalid| C[401 Unauthorized]
B -->|Valid| D[get_user api_key - cached]
D -->|None| E[401 Invalid API key]
D -->|User found| F[create_chat_session user_id, title, model]
F --> G{Title provided?}
G -->|No| H[Auto-generate: Chat YYYY-MM-DD HH:MM]
G -->|Yes| I[Use provided title]
H --> J[INSERT into chat_sessions]
I --> J
J -->|Success| K[Log activity in background]
K --> L{Background log success?}
L -->|Yes| M[Continue]
L -->|No| N[Log error, continue anyway]
M --> O[Return ChatSessionResponse 200]
N --> O
J -->|Failure after retries| P[500 Internal Server Error]
Issue: #1698
Searches chat sessions by title and message content. Combines results from both title matching and content matching, deduplicates, sorts by updated_at, and returns up to limit results.
Route: POST /v1/chat/search
Router prefix: /v1/chat
Tags: chat-history
Response model: ChatSessionsListResponse
Auth: Required (Bearer token via get_api_key)
| Field | Type | Default | Validation | Description |
|---|---|---|---|---|
| `query` | `str` | required | None | Search query text |
| `limit` | `int \| None` | `20` | None | Maximum results to return |

| Field | Type | Default | Description |
|---|---|---|---|
| `success` | `bool` | required | Operation success flag |
| `data` | `list[ChatSession]` | required | Matching sessions |
| `count` | `int` | required | Number of results |
| `message` | `str \| None` | `None` | Human-readable message |
search_sessions()
├── get_api_key() [src/security/deps.py]
├── get_user(api_key) [src/services/user_lookup_cache.py]
│ └── db_get_user() [src/db/users.py] (60s TTL cache)
└── search_chat_sessions(user_id, query, limit) [src/db/chat_history.py]
├── Search 1: Title matching (ILIKE)
├── Search 2: Message content matching (ILIKE)
├── Search 3: Fetch sessions by matching message session_ids
└── Python-side: combine, deduplicate by session ID, sort, limit
- Table: `chat_sessions`
- Operation: `SELECT *`
- Filters: `.eq("user_id", user_id).eq("is_active", True).ilike("title", f"%{query}%")`
- Retry: `_execute_with_connection_retry` (3 retries)

- Table: `chat_messages`
- Operation: `SELECT session_id`
- Filters: `.ilike("content", f"%{query}%")`
- Note: This query does NOT filter by user_id at message level; session ownership is enforced in the next query
- Retry: `_execute_with_connection_retry` (3 retries)

- Table: `chat_sessions`
- Operation: `SELECT *`
- Filters: `.eq("user_id", user_id).eq("is_active", True).in_("id", list(session_ids))`
- Only executed if: Query 2 returned session_ids
- Retry: `_execute_with_connection_retry` (3 retries)

- Combine title results + message session results
- Deduplicate by session `id` (dict keying)
- Sort by `updated_at` descending
- Slice to `limit`
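The Python-side merge step can be sketched as below. The function name `merge_search_results` is hypothetical, and the sketch assumes `updated_at` is an ISO 8601 string in a uniform format, so that string comparison matches chronological order.

```python
from typing import Any


def merge_search_results(
    title_hits: list[dict[str, Any]],
    message_hits: list[dict[str, Any]],
    limit: int = 20,
) -> list[dict[str, Any]]:
    """Combine both result sets, dedupe by session id, newest first, then slice."""
    by_id: dict[Any, dict[str, Any]] = {}
    for session in [*title_hits, *message_hits]:
        # Keying by id collapses sessions that matched on both title and content.
        by_id[session["id"]] = session
    ordered = sorted(
        by_id.values(),
        key=lambda session: session.get("updated_at") or "",
        reverse=True,
    )
    return ordered[:limit]
```

Because the limit is applied only after the merge, both underlying queries may return more rows than the caller finally receives.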
None.
None directly.
| Error | Status | Condition |
|---|---|---|
| `HTTPException(401)` | 401 | Invalid/missing API key |
| `HTTPException(401)` | 401 | `get_user()` returns None |
| `HTTPException(500)` | 500 | Any unhandled exception |
Note: This handler has no HTTPException re-raise guard. The raise HTTPException(401) for a missing user sits inside the try/except that catches all Exceptions, so it falls through to the generic handler and reaches the client as a 500 rather than a 401.
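The difference a re-raise guard makes can be shown with a small self-contained sketch. The `HTTPException` class below is a stand-in defined only so the example runs without FastAPI, and the two handler functions are hypothetical simplifications of the route logic.

```python
class HTTPException(Exception):
    """Stand-in for fastapi.HTTPException (assumption: same status_code/detail shape)."""

    def __init__(self, status_code: int, detail: str = ""):
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def handler_without_guard(user):
    try:
        if user is None:
            raise HTTPException(401, "Invalid API key")
        return 200
    except Exception as exc:
        # HTTPException is an Exception too, so the 401 is swallowed here.
        raise HTTPException(500, "Internal server error") from exc


def handler_with_guard(user):
    try:
        if user is None:
            raise HTTPException(401, "Invalid API key")
        return 200
    except HTTPException:
        raise  # let deliberate HTTP errors pass through unchanged
    except Exception as exc:
        raise HTTPException(500, "Internal server error") from exc


def status_of(handler, user) -> int:
    """Return the status code a client would observe."""
    try:
        return handler(user)
    except HTTPException as exc:
        return exc.status_code
```

Adding the `except HTTPException: raise` clause ahead of the generic handler is the pattern the other chat endpoints are described as using.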
flowchart TD
A[POST /v1/chat/search] --> B{Auth: get_api_key}
B -->|Invalid| C[401 Unauthorized]
B -->|Valid| D[get_user api_key]
D -->|None| E[401 - wraps as 500 since no HTTPException guard]
D -->|User found| F[search_chat_sessions]
F --> G[Query 1: ILIKE title search]
F --> H[Query 2: ILIKE message content search]
H --> I{Message matches found?}
I -->|Yes| J[Query 3: Get sessions by IDs + user filter]
I -->|No| K[Empty message sessions]
G --> L[Combine title + message results]
J --> L
K --> L
L --> M[Deduplicate by session ID]
M --> N[Sort by updated_at DESC]
N --> O[Slice to limit]
O --> P[Return ChatSessionsListResponse 200]
G -->|DB Error| Q[500 Internal Server Error]
Issue: #1699
Saves a single message to a chat session. Verifies session ownership, checks for duplicates (within 5 minutes), inserts the message, and updates the session's updated_at timestamp and model.
Route: POST /v1/chat/sessions/{session_id}/messages
Router prefix: /v1/chat
Tags: chat-history
Auth: Required (Bearer token via get_api_key)
| Parameter | Type | Description |
|---|---|---|
| `session_id` | `int` | Target chat session ID |

| Field | Type | Default | Validation | Description |
|---|---|---|---|---|
| `role` | `str` | required | None | Message role: `"user"` or `"assistant"` |
| `content` | `str` | required | None | Message text content |
| `model` | `str \| None` | `None` | None | Model that generated response |
| `tokens` | `int \| None` | `0` | None | Token count |
| `created_at` | `str \| None` | `None` | None | ISO datetime from frontend (not used in DB layer) |
{
  "success": true,
  "data": { "id": 123, "session_id": 1, "role": "user", "content": "...", ... },
  "message": "Message saved successfully"
}

save_message()
├── get_api_key() [src/security/deps.py]
├── get_user(api_key) [src/services/user_lookup_cache.py]
├── get_chat_session(session_id, user_id) [src/db/chat_history.py]
│ ├── Query chat_sessions (ownership check)
│ └── Query chat_messages (session messages)
└── save_chat_message() [src/db/chat_history.py]
├── @with_retry(max_attempts=3)
├── Duplicate check (SELECT within last 5 min)
├── INSERT into chat_messages
└── UPDATE chat_sessions (updated_at, model)
- Table: `chat_sessions` → `SELECT * WHERE id=session_id AND user_id=user_id AND is_active=True`
- Table: `chat_messages` → `SELECT * WHERE session_id=session_id ORDER BY created_at ASC`

- Table: `chat_messages`
- Operation: `SELECT *`
- Filters: `.eq("session_id", session_id).eq("role", role).eq("content", content).gte("created_at", five_minutes_ago)`
- Order: `.order("created_at", desc=True).limit(1)`
- If duplicate found: Returns existing message immediately (no insert)
- If check fails: Logs warning, proceeds with insert anyway
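The duplicate check amounts to "same session, role and content within the last five minutes, newest first". A minimal in-memory sketch of that rule follows; `find_recent_duplicate` is a hypothetical name, and it assumes the caller has already narrowed `messages` to one session and that `created_at` values are timezone-aware datetimes.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Optional

DUPLICATE_WINDOW = timedelta(minutes=5)


def find_recent_duplicate(
    messages: list[dict[str, Any]],
    role: str,
    content: str,
    now: Optional[datetime] = None,
) -> Optional[dict[str, Any]]:
    """Return the newest message with identical role and content inside the window."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - DUPLICATE_WINDOW
    matches = [
        message
        for message in messages
        if message["role"] == role
        and message["content"] == content
        and message["created_at"] >= cutoff
    ]
    # Mirrors ORDER BY created_at DESC LIMIT 1; None means "no duplicate, insert".
    return max(matches, key=lambda message: message["created_at"], default=None)
```

A non-`None` result corresponds to the "return existing message" path, so a client retry within the window does not create a second row.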
- Table: `chat_messages`
- Operation: `INSERT`
- Data: `session_id, role, content, model, tokens, created_at` (UTC ISO)
- Retry: `_execute_with_connection_retry` + `@with_retry`

- Table: `chat_sessions`
- Operation: `UPDATE`
- Data: `updated_at` (always), `model` (if provided)
- Filters: `.eq("id", session_id)` + `.eq("user_id", user_id)` (if provided)
None.
None directly.
| Error | Status | Condition |
|---|---|---|
| `HTTPException(401)` | 401 | Invalid/missing API key |
| `HTTPException(401)` | 401 | `get_user()` returns None |
| `HTTPException(404)` | 404 | Session not found or not owned by user |
| `HTTPException(500)` | 500 | DB insert/update failure |
Duplicate check failures do NOT cause request failure — logged and skipped.
flowchart TD
A[POST /v1/chat/sessions/session_id/messages] --> B{Auth: get_api_key}
B -->|Invalid| C[401 Unauthorized]
B -->|Valid| D[get_user api_key]
D -->|None| E[401 Invalid API key]
D -->|User found| F[get_chat_session ownership check]
F -->|None| G[404 Session not found]
F -->|Found| H[save_chat_message]
H --> I{Duplicate check}
I -->|Duplicate found| J[Return existing message]
I -->|No duplicate| K[INSERT into chat_messages]
I -->|Check failed| K
K -->|Success| L[UPDATE chat_sessions timestamp]
L --> M[Return 200 with message data]
K -->|Failure after retries| N[500 Internal Server Error]
Issue: #1700
Saves multiple messages to a chat session in a single request. Reduces API overhead by 60-80% compared to individual calls. Processes each message individually, collecting successes and failures separately. Partial success is possible.
Route: POST /v1/chat/sessions/{session_id}/messages/batch
Router prefix: /v1/chat
Tags: chat-history
Auth: Required (Bearer token via get_api_key)
| Parameter | Type | Description |
|---|---|---|
| `session_id` | `int` | Target chat session ID |

| Field | Type | Default | Validation | Description |
|---|---|---|---|---|
| `messages` | `list[SaveChatMessageRequest]` | required | None | Array of messages to save |

| Field | Type | Default | Validation | Description |
|---|---|---|---|---|
| `role` | `str` | required | None | `"user"` or `"assistant"` |
| `content` | `str` | required | None | Message content |
| `model` | `str \| None` | `None` | None | Model name |
| `tokens` | `int \| None` | `0` | None | Token count |
| `created_at` | `str \| None` | `None` | None | ISO datetime |
{
  "success": true, // true only if ALL messages saved
  "data": {
    "saved": [{"success": true, "message_id": 1, "data": {...}}, ...],
    "failed": [{"success": false, "error": "...", "content_preview": "first 50 chars"}],
    "total": 5,
    "success_count": 4,
    "failure_count": 1
  },
  "message": "Saved 4/5 messages successfully"
}

save_messages_batch()
├── get_api_key() [src/security/deps.py]
├── get_user(api_key) [src/services/user_lookup_cache.py]
├── get_chat_session(session_id, user_id) [src/db/chat_history.py]
│ └── Ownership verification
└── for each message in request.messages:
└── save_chat_message() [src/db/chat_history.py]
├── Duplicate check (5-min window)
├── INSERT into chat_messages
└── UPDATE chat_sessions timestamp/model
- Duplicate check: `SELECT * FROM chat_messages WHERE session_id=X AND role=Y AND content=Z AND created_at >= 5_min_ago LIMIT 1`
- Insert message: `INSERT INTO chat_messages (session_id, role, content, model, tokens, created_at)`
- Update session: `UPDATE chat_sessions SET updated_at=NOW(), model=M WHERE id=session_id AND user_id=user_id`
Each query has _execute_with_connection_retry (3 retries) + @with_retry decorator (3 attempts).
Total worst-case queries: 1 (session check) + N * 3 (per message) where N = number of messages.
None.
None directly.
| Error | Status | Condition |
|---|---|---|
| `HTTPException(401)` | 401 | Invalid/missing API key |
| `HTTPException(401)` | 401 | `get_user()` returns None |
| `HTTPException(404)` | 404 | Session not found / not owned |
| `HTTPException(500)` | 500 | Outer exception (before loop) |
Individual message failures do NOT abort the batch. Failed messages are collected in failed_messages array. Response success field is True only if failed_messages is empty.
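The partial-success loop can be sketched as follows. The function name `save_batch` and the injected `save_one` callable are hypothetical; the response shape follows the example payload documented for this endpoint.

```python
from typing import Any, Callable


def save_batch(
    messages: list[dict[str, Any]],
    save_one: Callable[[dict[str, Any]], dict[str, Any]],
) -> dict[str, Any]:
    """Save each message independently, collecting successes and failures."""
    saved: list[dict[str, Any]] = []
    failed: list[dict[str, Any]] = []
    for message in messages:
        try:
            record = save_one(message)
            saved.append({"success": True, "message_id": record["id"], "data": record})
        except Exception as exc:
            # One bad message must not abort the rest of the batch.
            failed.append({
                "success": False,
                "error": str(exc),
                "content_preview": (message.get("content") or "")[:50],
            })
    total = len(messages)
    return {
        "success": not failed,  # True only when every message was saved
        "data": {
            "saved": saved,
            "failed": failed,
            "total": total,
            "success_count": len(saved),
            "failure_count": len(failed),
        },
        "message": f"Saved {len(saved)}/{total} messages successfully",
    }
```

Callers therefore need to inspect `failure_count` rather than rely on the HTTP status, since a partially failed batch still returns 200.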
flowchart TD
A[POST /v1/chat/sessions/session_id/messages/batch] --> B{Auth}
B -->|Invalid| C[401]
B -->|Valid| D[get_user]
D -->|None| E[401]
D -->|Found| F[get_chat_session ownership check]
F -->|Not found| G[404]
F -->|Found| H[Loop through messages]
H --> I{For each message}
I --> J[save_chat_message]
J -->|Success| K[Add to saved_messages]
J -->|Error| L[Add to failed_messages]
K --> M{More messages?}
L --> M
M -->|Yes| I
M -->|No| N{Any failures?}
N -->|No| O[Return success=true with results]
N -->|Yes| P[Return success=false with partial results]
Issue: #1701
Submits feedback for a chat message (thumbs up/down, regenerate, star rating, comment). Validates session and message ownership when IDs are provided. Logs activity in the background.
Route: POST /v1/chat/feedback
Router prefix: /v1/chat
Tags: chat-history
Response model: MessageFeedbackResponse
Auth: Required (Bearer token via get_api_key)
| Field | Type | Default | Validation | Description |
|---|---|---|---|---|
| `session_id` | `int \| None` | `None` | None | Optional associated session |
| `message_id` | `int \| None` | `None` | None | Optional associated message |
| `feedback_type` | `Literal["thumbs_up","thumbs_down","regenerate"]` | required | Literal enforcement by Pydantic | Type of feedback |
| `rating` | `int \| None` | `None` | `ge=1, le=5` | Optional 1-5 star rating |
| `comment` | `str \| None` | `None` | None | Optional text feedback |
| `model` | `str \| None` | `None` | None | Model that generated response |
| `metadata` | `dict[str, Any] \| None` | `None` | None | Additional context |

| Field | Type | Default | Description |
|---|---|---|---|
| `success` | `bool` | required | Operation success flag |
| `data` | `MessageFeedback \| None` | `None` | Created feedback record |
| `message` | `str \| None` | `None` | Human-readable message |
submit_feedback()
├── get_api_key() [src/security/deps.py]
├── get_user(api_key) [src/services/user_lookup_cache.py]
├── [conditional] get_chat_session(session_id, user_id) [src/db/chat_history.py]
│ └── Only if session_id is not None
├── [conditional] validate_message_ownership() [src/db/chat_history.py]
│ └── Only if message_id is not None
│ └── Joins chat_messages with chat_sessions to verify user ownership
├── save_message_feedback() [src/db/feedback.py]
│ ├── @with_retry(max_attempts=3)
│ ├── Validates feedback_type against VALID_FEEDBACK_TYPES set
│ ├── Validates rating range (1-5)
│ └── INSERT into message_feedback
└── log_activity_background() [src/services/background_tasks.py]
└── Async INSERT into activity table
- Table: `chat_sessions`
- Condition: Only if `request.session_id is not None`
- Operation: `SELECT * WHERE id=session_id AND user_id=user_id AND is_active=True`

- Table: `chat_messages` with `chat_sessions!inner` join
- Condition: Only if `request.message_id is not None`
- Operation: `SELECT id, session_id, chat_sessions!inner(id, user_id) WHERE id=message_id AND chat_sessions.user_id=user_id`
- Additional filter: `.eq("session_id", session_id)` if session_id provided

- Table: `message_feedback`
- Operation: `INSERT`
- Data: `user_id, feedback_type, created_at, updated_at` (always) + optional: `session_id, message_id, rating, comment, model, metadata`
- Retry: `@with_retry` (3 attempts) + `_execute_with_connection_retry` (3 retries)

- Table: `activity`
- Data: user_id, model, provider="Chat Feedback", action="submit_feedback", metadata with feedback details
None.
None directly.
| Error | Status | Condition |
|---|---|---|
| `HTTPException(401)` | 401 | Invalid/missing API key |
| `HTTPException(401)` | 401 | `get_user()` returns None |
| `HTTPException(404)` | 404 | Session not found (when session_id provided) |
| `HTTPException(404)` | 404 | Message not found (when message_id provided) |
| `HTTPException(400)` | 400 | `ValueError` from DB layer (invalid feedback_type or rating) |
| `HTTPException(500)` | 500 | Any unhandled exception |
Pydantic validation: feedback_type Literal and rating ge/le constraints are enforced before handler is reached (422 Unprocessable Entity).
DB-layer validation: Double-checks feedback_type and rating at save time.
Background activity logging errors are caught and do NOT fail the request.
flowchart TD
A[POST /v1/chat/feedback] --> B{Auth: get_api_key}
B -->|Invalid| C[401]
B -->|Valid| D[get_user]
D -->|None| E[401]
D -->|Found| F{session_id provided?}
F -->|Yes| G[get_chat_session ownership check]
G -->|Not found| H[404 Chat session not found]
G -->|Found| I{message_id provided?}
F -->|No| I
I -->|Yes| J[validate_message_ownership]
J -->|Invalid| K[404 Message not found]
J -->|Valid| L[save_message_feedback]
I -->|No| L
L --> M{DB validation}
M -->|Invalid type/rating| N[400 ValueError]
M -->|Success| O[INSERT into message_feedback]
O --> P[Log activity background]
P --> Q[Return MessageFeedbackResponse 200]
O -->|DB Error| R[500 Internal Server Error]
Issue: #1702
Updates a chat session's title and/or model. After updating, fetches and returns the updated session with all its messages.
Route: PUT /v1/chat/sessions/{session_id}
Router prefix: /v1/chat
Tags: chat-history
Response model: ChatSessionResponse
Auth: Required (Bearer token via get_api_key)
| Parameter | Type | Description |
|---|---|---|
| `session_id` | `int` | Session ID to update |

| Field | Type | Default | Validation | Description |
|---|---|---|---|---|
| `title` | `str \| None` | `None` | None | New session title |
| `model` | `str \| None` | `None` | None | New model name |
update_session()
├── get_api_key() [src/security/deps.py]
├── get_user(api_key) [src/services/user_lookup_cache.py]
├── update_chat_session(session_id, user_id, title, model) [src/db/chat_history.py]
│ ├── @with_retry(max_attempts=3)
│ └── UPDATE chat_sessions
└── get_chat_session(session_id, user_id) [src/db/chat_history.py]
├── SELECT from chat_sessions
└── SELECT from chat_messages
update_chat_session():
- Table: `chat_sessions`
- Operation: `UPDATE`
- Data: `updated_at` (always), `title` (if truthy), `model` (if truthy)
- Filters: `.eq("id", session_id).eq("user_id", user_id)`
- Returns: `False` if no rows matched (session not found / not owned)
- Retry: `@with_retry` + `_execute_with_connection_retry`

get_chat_session():
- Table: `chat_sessions` → `SELECT * WHERE id AND user_id AND is_active`
- Table: `chat_messages` → `SELECT * WHERE session_id ORDER BY created_at ASC`
None.
| Error | Status | Condition |
|---|---|---|
| `HTTPException(401)` | 401 | Invalid/missing API key |
| `HTTPException(401)` | 401 | `get_user()` returns `None` |
| `HTTPException(404)` | 404 | `update_chat_session` returns `False` |
| `HTTPException(500)` | 500 | Any unhandled exception |
flowchart TD
A[PUT /v1/chat/sessions/session_id] --> B{Auth}
B -->|Invalid| C[401]
B -->|Valid| D[get_user]
D -->|None| E[401]
D -->|Found| F[update_chat_session]
F -->|False / not found| G[404 Chat session not found]
F -->|True| H[get_chat_session - fetch updated]
H --> I[Return ChatSessionResponse 200]
F -->|DB Error| J[500]
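The "if truthy" rule for `title` and `model` has a practical consequence that is easy to miss, so a minimal sketch may help. The helper name `build_session_update` is hypothetical and not taken from the codebase; it only illustrates how a payload with `updated_at` always set and the other fields set conditionally could be assembled.

```python
from datetime import datetime, timezone
from typing import Optional


def build_session_update(title: Optional[str] = None,
                         model: Optional[str] = None) -> dict:
    """Assemble the column/value payload for a chat_sessions UPDATE.

    Hypothetical helper: mirrors the documented rule that updated_at is
    always written while title and model are written only when truthy.
    """
    payload = {"updated_at": datetime.now(timezone.utc).isoformat()}
    if title:  # truthiness check: None and "" are both skipped
        payload["title"] = title
    if model:
        payload["model"] = model
    return payload


print(sorted(build_session_update(title="Weekly sync")))
```

Under this reading, sending an empty string does not clear a title; the field is simply left unchanged.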
Issue: #1703
Updates an existing feedback record. Only the record's owner can update it. All fields are optional — only provided fields are updated.
Route: PUT /v1/chat/feedback/{feedback_id}
Router prefix: /v1/chat
Tags: chat-history
Response model: MessageFeedbackResponse
Auth: Required (Bearer token via get_api_key)
| Parameter | Type | Description |
|---|---|---|
| `feedback_id` | `int` | Feedback record ID to update |
| Field | Type | Default | Validation | Description |
|---|---|---|---|---|
| `feedback_type` | `Literal["thumbs_up","thumbs_down","regenerate"] \| None` | `None` | Literal (Pydantic) | New feedback type |
| `rating` | `int \| None` | `None` | `ge=1, le=5` | New star rating |
| `comment` | `str \| None` | `None` | None | New comment text |
update_my_feedback()
├── get_api_key() [src/security/deps.py]
├── get_user(api_key) [src/services/user_lookup_cache.py]
└── update_feedback(feedback_id, user_id, ...) [src/db/feedback.py]
├── @with_retry(max_attempts=3)
├── Validates feedback_type against VALID_FEEDBACK_TYPES
├── Validates rating range (1-5)
└── UPDATE message_feedback WHERE id AND user_id
- Table: `message_feedback`
- Operation: `UPDATE`
- Data: `updated_at` (always) + optional: `feedback_type`, `rating`, `comment`
- Filters: `.eq("id", feedback_id).eq("user_id", user_id)`
- Returns: `None` if no rows matched (not found or not owned)
- Retry: `@with_retry` + `_execute_with_connection_retry`
None.
| Error | Status | Condition |
|---|---|---|
| `HTTPException(401)` | 401 | Invalid/missing API key |
| `HTTPException(401)` | 401 | `get_user()` returns `None` |
| `HTTPException(404)` | 404 | `update_feedback` returns `None` |
| `HTTPException(400)` | 400 | `ValueError` (invalid feedback_type or rating in DB layer) |
| `HTTPException(500)` | 500 | Any unhandled exception |
flowchart TD
A[PUT /v1/chat/feedback/feedback_id] --> B{Auth}
B -->|Invalid| C[401]
B -->|Valid| D[get_user]
D -->|None| E[401]
D -->|Found| F[update_feedback]
F --> G{Validation}
G -->|Invalid type/rating| H[400 ValueError]
G -->|Valid| I[UPDATE message_feedback]
I -->|No rows matched| J[404 Feedback not found]
I -->|Updated| K[Return MessageFeedbackResponse 200]
I -->|DB Error| L[500]
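The DB-layer checks that produce the 400 response can be sketched as follows. This is an illustration only: the function name `validate_feedback_update` is hypothetical, and treating "provided" as "not `None`" is an assumption about how the real `update_feedback` decides which fields to write.

```python
VALID_FEEDBACK_TYPES = {"thumbs_up", "thumbs_down", "regenerate"}


def validate_feedback_update(feedback_type=None, rating=None, comment=None) -> dict:
    """Return only the fields that were provided, after validating them.

    Raises ValueError on a bad feedback_type or an out-of-range rating,
    which the route would translate into an HTTP 400.
    """
    changes = {}
    if feedback_type is not None:
        if feedback_type not in VALID_FEEDBACK_TYPES:
            raise ValueError(f"invalid feedback_type: {feedback_type!r}")
        changes["feedback_type"] = feedback_type
    if rating is not None:
        if not 1 <= rating <= 5:
            raise ValueError(f"rating must be between 1 and 5, got {rating}")
        changes["rating"] = rating
    if comment is not None:
        changes["comment"] = comment
    return changes


print(validate_feedback_update(rating=4))
```

Because Pydantic already enforces the same constraints on the request body, this second check mainly guards callers that reach the DB layer without going through the route.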
Issue: #1704
Soft-deletes a chat session by setting is_active = False. Does NOT physically remove the record or associated messages.
Route: DELETE /v1/chat/sessions/{session_id}
Router prefix: /v1/chat
Tags: chat-history
Auth: Required (Bearer token via get_api_key)
| Parameter | Type | Description |
|---|---|---|
| `session_id` | `int` | Session ID to delete |
{ "success": true, "message": "Chat session deleted successfully" }
delete_session()
├── get_api_key() [src/security/deps.py]
├── get_user(api_key) [src/services/user_lookup_cache.py]
└── delete_chat_session(session_id, user_id) [src/db/chat_history.py]
├── @with_retry(max_attempts=3)
└── UPDATE chat_sessions SET is_active=False, updated_at=NOW()
- Table: `chat_sessions`
- Operation: `UPDATE` (NOT `DELETE`)
- Data: `{"is_active": False, "updated_at": datetime.now(UTC).isoformat()}`
- Filters: `.eq("id", session_id).eq("user_id", user_id)`
- Returns: `False` if no rows matched
- Retry: `@with_retry` (3 attempts) + `_execute_with_connection_retry` (3 retries)
None.
| Error | Status | Condition |
|---|---|---|
| `HTTPException(401)` | 401 | Invalid/missing API key |
| `HTTPException(401)` | 401 | `get_user()` returns `None` |
| `HTTPException(404)` | 404 | Session not found / not owned |
| `HTTPException(500)` | 500 | Any unhandled exception |
flowchart TD
A[DELETE /v1/chat/sessions/session_id] --> B{Auth}
B -->|Invalid| C[401]
B -->|Valid| D[get_user]
D -->|None| E[401]
D -->|Found| F[delete_chat_session - soft delete]
F -->|False| G[404 Chat session not found]
F -->|True| H[Return 200 success]
F -->|DB Error| I[500]
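As an illustration of the soft-delete behaviour, the sketch below uses an in-memory list in place of the `chat_sessions` table. The function `soft_delete_session` and the sample rows are hypothetical; the point is that the row is flagged rather than removed, and that the ownership filter decides between 200 and 404.

```python
from datetime import datetime, timezone

# Stand-in for the chat_sessions table (illustrative data only).
sessions = [
    {"id": 1, "user_id": 10, "is_active": True},
    {"id": 2, "user_id": 20, "is_active": True},
]


def soft_delete_session(rows: list, session_id: int, user_id: int) -> bool:
    """Flag a session inactive if it exists and belongs to user_id.

    Returns False when no row matches, which the route maps to a 404.
    """
    matched = False
    for row in rows:
        if row["id"] == session_id and row["user_id"] == user_id:
            row["is_active"] = False
            row["updated_at"] = datetime.now(timezone.utc).isoformat()
            matched = True
    return matched


print(soft_delete_session(sessions, 1, 10), len(sessions))
```

The row count is unchanged after the call, so associated messages remain reachable by anything that does not filter on `is_active`.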
Issue: #1705
Permanently deletes a feedback record. Only the record's owner can delete it. This is a hard delete (not soft delete).
Route: DELETE /v1/chat/feedback/{feedback_id}
Router prefix: /v1/chat
Tags: chat-history
Auth: Required (Bearer token via get_api_key)
| Parameter | Type | Description |
|---|---|---|
| `feedback_id` | `int` | Feedback record ID to delete |
{ "success": true, "message": "Feedback deleted successfully" }
delete_my_feedback()
├── get_api_key() [src/security/deps.py]
├── get_user(api_key) [src/services/user_lookup_cache.py]
└── delete_feedback(feedback_id, user_id) [src/db/feedback.py]
├── @with_retry(max_attempts=3)
└── DELETE FROM message_feedback WHERE id AND user_id
- Table: `message_feedback`
- Operation: `DELETE`
- Filters: `.eq("id", feedback_id).eq("user_id", user_id)`
- Returns: `False` if no rows matched
- Retry: `@with_retry` (3 attempts) + `_execute_with_connection_retry` (3 retries)
None.
| Error | Status | Condition |
|---|---|---|
| `HTTPException(401)` | 401 | Invalid/missing API key |
| `HTTPException(401)` | 401 | `get_user()` returns `None` |
| `HTTPException(404)` | 404 | Feedback not found / not owned |
| `HTTPException(500)` | 500 | Any unhandled exception |
flowchart TD
A[DELETE /v1/chat/feedback/feedback_id] --> B{Auth}
B -->|Invalid| C[401]
B -->|Valid| D[get_user]
D -->|None| E[401]
D -->|Found| F[delete_feedback - HARD DELETE]
F -->|False| G[404 Feedback not found]
F -->|True| H[Return 200 success]
F -->|DB Error| I[500]
Issue: #1706
Returns tokens-per-second throughput metrics for a specific model and provider within a time range. Access is limited to an allow-list: the 3 most popular models (by request count) plus at least 1 model per provider; a model_id outside that list returns 403. Output is in Prometheus text exposition format.
Route: GET /v1/chat/completions/metrics/tokens-per-second
Router prefix: /v1/chat/completions/metrics
Tags: chat-metrics
Auth: None (public endpoint)
Response: text/plain (Prometheus format)
| Parameter | Type | Default | Validation | Description |
|---|---|---|---|---|
| `time` | `str` | required | Must be one of: `hour, week, month, 1year, 2year` | Time range filter |
| `model_id` | `int` | required | None | Model ID integer |
| `provider_id` | `str` | required | None | Provider slug (e.g., "openrouter") |
# HELP gatewayz_tokens_per_second Token throughput (tokens/second) by model and provider
# TYPE gatewayz_tokens_per_second gauge
# Generated: 2026-03-04T12:00:00+00:00
# Time range: week
# Filtered to: top 3 models + minimum 1 per provider
gatewayz_tokens_per_second{model="gpt-4o",provider="openai",requests="150",total_tokens="50000"} 125.5
get_tokens_per_second()
├── _get_top_models_async(limit=3)
│ └── get_top_models_by_requests() [src/db/chat_completion_requests.py]
│ ├── SELECT * FROM models
│ └── For each model: COUNT from chat_completion_requests + SUM tokens
├── _get_all_providers_async()
│ └── get_all_providers() [src/db/chat_completion_requests.py]
│ └── SELECT providers.slug FROM models JOIN providers
├── _get_models_with_min_one_per_provider_async()
│ └── get_models_with_min_one_per_provider() [src/db/chat_completion_requests.py]
│ └── For missing providers: query models + count requests
├── Model ID access check (403 if not in filtered list)
├── _calculate_tokens_per_second_async()
│ └── calculate_tokens_per_second() [src/db/chat_completion_requests.py]
│ ├── SELECT tokens + processing_time FROM chat_completion_requests
│ └── SELECT model_name, provider FROM models JOIN providers
└── _format_tokens_per_second_metric() → Prometheus text
get_top_models_by_requests():
- Table: `models` → `SELECT *` (all models)
- For each model:
  - `chat_completion_requests` → `SELECT *, count=exact WHERE model_id=X AND status=completed`
  - `chat_completion_requests` → `SELECT input_tokens, output_tokens WHERE model_id=X AND status=completed`
- Sort: Python-side by request count DESC, take top 3

get_all_providers():
- Table: `models` with `providers!inner` join → `SELECT providers.slug`
- Deduplicate: Python set

get_models_with_min_one_per_provider():
- Table: `models` → `SELECT id, model_name, providers!inner(slug) WHERE providers.slug=X AND is_active=True`
- For each model in missing provider: `chat_completion_requests` → count query

calculate_tokens_per_second():
- Table: `chat_completion_requests` → `SELECT input_tokens, output_tokens, processing_time_ms, created_at WHERE model_id=X AND status=completed`
  - With time filter: `.gte("created_at", start_time)` based on time range
- Table: `models` → `SELECT model_name, providers!inner(slug) WHERE id=model_id`
- Calculation: `total_tokens / (total_time_ms / 1000)`
None.
- Name: `gatewayz_tokens_per_second`
- Type: Gauge
- Labels: `model`, `provider`, `requests`, `total_tokens`
- Note: This metric is generated as text output, not registered in the Prometheus client registry
| Error | Status | Condition |
|---|---|---|
| `HTTPException(400)` | 400 | Invalid time parameter |
| `HTTPException(403)` | 403 | Model not in top 3 or minimum provider coverage |
| `HTTPException(500)` | 500 | Any unhandled exception |
Graceful degradation: If top models or providers queries fail, they return empty lists. If calculation returns no data, returns empty Prometheus metrics (not an error).
flowchart TD
A[GET /tokens-per-second] --> B{Validate time param}
B -->|Invalid| C[400 Bad Request]
B -->|Valid| D[Get top 3 models]
D --> E[Get all providers]
E --> F[Ensure min 1 per provider]
F --> G{model_id in filtered list?}
G -->|No| H[403 Forbidden]
G -->|Yes| I[calculate_tokens_per_second]
I --> J{Data found?}
J -->|No| K[Return empty Prometheus metrics]
J -->|Yes| L[Format as Prometheus text]
L --> M[Return 200 text/plain]
D -->|Error| N[Empty list, continue]
E -->|Error| N
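The allow-list behind the 403 check can be sketched in a few lines. This is a simplified in-memory version, not the real query code: the function name `allowed_model_ids`, the input shape, and the choice of the most-requested model for each uncovered provider are all assumptions made for illustration.

```python
def allowed_model_ids(models: list, top_n: int = 3) -> set:
    """Return IDs of the top_n models by request count, plus one model
    for every provider that the top_n selection does not already cover.

    `models` is a list of dicts with keys: id, provider, requests.
    """
    ranked = sorted(models, key=lambda m: m["requests"], reverse=True)
    allowed = ranked[:top_n]
    covered = {m["provider"] for m in allowed}
    for m in ranked:
        # Assumption: the most-requested model represents a missing provider.
        if m["provider"] not in covered:
            allowed.append(m)
            covered.add(m["provider"])
    return {m["id"] for m in allowed}


catalog = [
    {"id": 1, "provider": "openai", "requests": 100},
    {"id": 2, "provider": "openai", "requests": 90},
    {"id": 3, "provider": "openai", "requests": 80},
    {"id": 4, "provider": "groq", "requests": 5},
    {"id": 5, "provider": "groq", "requests": 1},
]
print(allowed_model_ids(catalog))
```

A request whose `model_id` is not in the returned set would be the 403 case. Note that if the top-models and providers queries both degrade to empty lists, an allow-list built this way is empty as well.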
Issue: #1707
Returns tokens-per-second metrics for all time (no time filtering) for a specific model and provider. Unlike the time-filtered endpoint, this one does NOT enforce the top-3 model filter — any model_id can be queried. Output in Prometheus text format for Grafana/Prometheus scraping.
Route: GET /v1/chat/completions/metrics/tokens-per-second/all
Router prefix: /v1/chat/completions/metrics
Tags: chat-metrics
Auth: None (public endpoint)
Response: text/plain (Prometheus format)
| Parameter | Type | Default | Validation | Description |
|---|---|---|---|---|
| `provider_id` | `str` | required | None | Provider slug |
| `model_id` | `int` | required | None | Model ID integer |
Same Prometheus text format as the time-filtered endpoint (see issue #1706), but with time_range: all.
get_all_tokens_per_second()
└── _calculate_tokens_per_second_async(model_id, provider_id, time_range=None)
└── calculate_tokens_per_second() [src/db/chat_completion_requests.py]
├── SELECT from chat_completion_requests (no time filter)
└── SELECT from models JOIN providers (model name lookup)
- Table: `chat_completion_requests`
- Operation: `SELECT input_tokens, output_tokens, processing_time_ms, created_at`
- Filters: `.eq("model_id", model_id).eq("status", "completed")`
- No time filter (all-time query)

- Table: `models` with `providers!inner` join
- Operation: `SELECT model_name, providers!inner(slug)`
- Filters: `.eq("id", model_id)`

- `total_tokens = sum(input_tokens + output_tokens)`
- `total_time_seconds = sum(processing_time_ms) / 1000`
- `tokens_per_second = total_tokens / total_time_seconds`
None.
- Name: `gatewayz_tokens_per_second`
- Type: Gauge
- Labels: `model`, `provider`, `requests`, `total_tokens`
| Error | Status | Condition |
|---|---|---|
| `HTTPException(500)` | 500 | Any unhandled exception |
Graceful degradation: If no data found or result has error, returns empty Prometheus metrics (200 with comment # No data available).
flowchart TD
A[GET /tokens-per-second/all] --> B[calculate_tokens_per_second - all time]
B --> C[Query chat_completion_requests]
C --> D[Query models for name/provider]
D --> E{Result valid?}
E -->|No data or error| F[Return empty Prometheus metrics 200]
E -->|Data found| G[Calculate tokens/sec]
G --> H[Format as Prometheus text]
H --> I[Return 200 text/plain]
B -->|Exception| J[500 Internal Server Error]
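The throughput arithmetic and the text output can be sketched together. The function names below are hypothetical, the row shape is assumed from the selected columns, and the zero-duration guard is an assumption about how "no data" is detected.

```python
def tokens_per_second(rows: list):
    """Aggregate throughput: total tokens divided by total processing seconds.

    Returns None when there is nothing to divide by (no rows or zero time).
    """
    total_tokens = sum(r["input_tokens"] + r["output_tokens"] for r in rows)
    total_time_s = sum(r["processing_time_ms"] for r in rows) / 1000
    if total_time_s <= 0:
        return None
    return total_tokens / total_time_s


def format_metric(model: str, provider: str, rows: list) -> str:
    """Render one gauge sample in Prometheus text exposition format."""
    tps = tokens_per_second(rows)
    if tps is None:
        return "# No data available\n"
    total = sum(r["input_tokens"] + r["output_tokens"] for r in rows)
    return (
        "# HELP gatewayz_tokens_per_second Token throughput (tokens/second) by model and provider\n"
        "# TYPE gatewayz_tokens_per_second gauge\n"
        f'gatewayz_tokens_per_second{{model="{model}",provider="{provider}",'
        f'requests="{len(rows)}",total_tokens="{total}"}} {tps}\n'
    )


sample = [
    {"input_tokens": 100, "output_tokens": 400, "processing_time_ms": 2000},
    {"input_tokens": 50, "output_tokens": 450, "processing_time_ms": 2000},
]
print(format_metric("gpt-4o", "openai", sample))
```

Dividing the totals weights long requests more heavily than averaging per-request rates would, which matches the formula given above.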
4 endpoints
Issue: #1708
Returns the current state of all registered circuit breakers with summary counts by state. Provides real-time monitoring data for provider health dashboards.
Route: GET /circuit-breakers
Router prefix: /circuit-breakers
Tags: circuit-breakers, monitoring
Response model: dict[str, Any]
Auth: None (public monitoring endpoint)
{
"circuit_breakers": {
"openrouter": {
"provider": "openrouter",
"state": "closed",
"failure_count": 0,
"success_count": 15,
"failure_rate": 0.0,
"recent_requests": 15,
"opened_at": null,
"seconds_until_retry": 0
}
},
"total_count": 5,
"open_count": 1,
"half_open_count": 0,
"closed_count": 4
}
get_all_circuit_breaker_states()
└── get_all_circuit_breakers() [src/services/circuit_breaker.py]
└── For each registered provider in _circuit_breakers dict:
└── breaker.get_state()
├── _load_state_from_redis() — loads state, counts, opened_at
└── _calculate_failure_rate() — rolling window calculation
None. Circuit breakers are in-memory + Redis only.
| Operation | Key Pattern | Type | Description |
|---|---|---|---|
| `GET` | `circuit_breaker:{provider}:state` | string | Current state: "closed", "open", "half_open" |
| `GET` | `circuit_breaker:{provider}:failure_count` | string (int) | Consecutive failure count |
| `GET` | `circuit_breaker:{provider}:success_count` | string (int) | Consecutive success count |
| `GET` | `circuit_breaker:{provider}:opened_at` | string (float) | Unix timestamp when opened |
| `GET` | `circuit_breaker:{provider}:consecutive_opens` | string (int) | Consecutive open transitions |
TTL: All keys have 3600s (1 hour) TTL set during writes. Fallback: If Redis unavailable, uses in-memory state.
Not directly emitted by the endpoint, but the underlying CircuitBreaker.get_state() reads from the same state that emits these metrics on state changes:
| Metric | Type | Labels | Description |
|---|---|---|---|
| `circuit_breaker_state_transitions_total` | Counter | `provider`, `from_state`, `to_state` | State transition count |
| `circuit_breaker_failures_total` | Counter | `provider`, `state` | Failure count |
| `circuit_breaker_successes_total` | Counter | `provider`, `state` | Success count |
| `circuit_breaker_rejected_requests_total` | Counter | `provider` | Rejected request count |
| `circuit_breaker_current_state` | Gauge | `provider`, `state` | 1=active, 0=inactive |
| Error | Status | Condition |
|---|---|---|
| `HTTPException(500)` | 500 | Any exception from `get_all_circuit_breakers()` |
flowchart TD
A[GET /circuit-breakers] --> B[get_all_circuit_breakers]
B --> C[For each registered provider]
C --> D[breaker.get_state]
D --> E[Load from Redis]
E -->|Redis available| F[Parse state from Redis keys]
E -->|Redis unavailable| G[Use in-memory state]
F --> H[Calculate failure rate - rolling window]
G --> H
H --> I[Build state dict]
I --> J{More providers?}
J -->|Yes| C
J -->|No| K[Count open/half_open/closed]
K --> L[Return 200 with all states + summary]
B -->|Exception| M[500 Internal Server Error]
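The summary counts in the response can be derived from the per-provider state dicts in a single pass. The helper name `summarize_breakers` is hypothetical; the field names follow the example response for this endpoint.

```python
from collections import Counter


def summarize_breakers(states: dict) -> dict:
    """Wrap per-provider state dicts with counts grouped by state."""
    counts = Counter(s["state"] for s in states.values())
    return {
        "circuit_breakers": states,
        "total_count": len(states),
        "open_count": counts["open"],  # a missing key counts as 0
        "half_open_count": counts["half_open"],
        "closed_count": counts["closed"],
    }


example = {
    "openrouter": {"provider": "openrouter", "state": "closed"},
    "groq": {"provider": "groq", "state": "open"},
}
print(summarize_breakers(example)["open_count"])
```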
Issue: #1709
Returns the current state of a specific provider's circuit breaker. If the provider has no registered circuit breaker, one is created with default configuration.
Route: GET /circuit-breakers/{provider}
Router prefix: /circuit-breakers
Tags: circuit-breakers, monitoring
Response model: dict[str, Any]
Auth: None (public monitoring endpoint)
| Parameter | Type | Description |
|---|---|---|
| `provider` | `str` | Provider name (e.g., "openrouter", "groq") |
{
"provider": "openrouter",
"state": "closed",
"failure_count": 0,
"success_count": 15,
"failure_rate": 0.0,
"recent_requests": 15,
"opened_at": null,
"seconds_until_retry": 0
}
get_circuit_breaker_state()
├── get_circuit_breaker(provider) [src/services/circuit_breaker.py]
│ └── Gets or creates CircuitBreaker for provider (thread-safe via _registry_lock)
└── breaker.get_state()
├── _load_state_from_redis()
└── _calculate_failure_rate()
Same as GET /circuit-breakers (see issue #1708):
- 5 x `GET` operations on `circuit_breaker:{provider}:*` keys
- Fallback: In-memory state if Redis unavailable
None directly emitted. Same underlying metrics as documented in issue #1708.
| Error | Status | Condition |
|---|---|---|
| `HTTPException(500)` | 500 | Any exception |
Note: There is no 404 case. If the provider has no existing circuit breaker, get_circuit_breaker() creates a new one with default config (CLOSED state, zero counts).
flowchart TD
A[GET /circuit-breakers/provider] --> B[get_circuit_breaker provider]
B --> C{Exists in registry?}
C -->|No| D[Create new CircuitBreaker with defaults]
C -->|Yes| E[Return existing breaker]
D --> F[breaker.get_state]
E --> F
F --> G[Load state from Redis]
G -->|Available| H[Parse Redis state]
G -->|Unavailable| I[Use in-memory defaults]
H --> J[Calculate failure rate]
I --> J
J --> K[Return 200 with state dict]
F -->|Exception| L[500 Internal Server Error]
Issue: #1710
Manually resets a specific provider's circuit breaker to CLOSED state. Used when a provider has recovered and traffic should be immediately resumed.
Route: POST /circuit-breakers/{provider}/reset
Router prefix: /circuit-breakers
Tags: circuit-breakers, monitoring
Response model: dict[str, Any]
Auth: None (public endpoint - consider adding auth for production)
| Parameter | Type | Description |
|---|---|---|
| `provider` | `str` | Provider name to reset |
{
"success": true,
"message": "Circuit breaker for 'openrouter' has been reset",
"state": {
"provider": "openrouter",
"state": "closed",
"failure_count": 0,
"success_count": 0,
...
}
}
reset_provider_circuit_breaker()
├── reset_circuit_breaker(provider) [src/services/circuit_breaker.py]
│ ├── Thread-safe via _registry_lock
│ ├── Returns False if provider not in registry
│ └── breaker.reset()
│ ├── _transition_to(CLOSED, "manual reset")
│ │ ├── Reset failure_count, success_count, consecutive_opens
│ │ ├── _save_state_to_redis() — pipeline SETEX x5
│ │ └── Update Prometheus metrics (state transition + current state)
│ └── Clear _recent_requests list
├── get_circuit_breaker(provider) — get updated state
└── breaker.get_state() — return new state
| Operation | Key Pattern | Value | TTL |
|---|---|---|---|
| `SETEX` | `circuit_breaker:{provider}:state` | `"closed"` | 3600s |
| `SETEX` | `circuit_breaker:{provider}:failure_count` | `"0"` | 3600s |
| `SETEX` | `circuit_breaker:{provider}:success_count` | `"0"` | 3600s |
| `SETEX` | `circuit_breaker:{provider}:opened_at` | `"0.0"` | 3600s |
| `SETEX` | `circuit_breaker:{provider}:consecutive_opens` | `"0"` | 3600s |
Emitted during _transition_to():
| Metric | Operation | Labels |
|---|---|---|
| `circuit_breaker_state_transitions_total` | `.inc()` | `provider={provider}, from_state={old}, to_state=closed` |
| `circuit_breaker_current_state` | `.set(1)` | `provider={provider}, state=closed` |
| `circuit_breaker_current_state` | `.set(0)` | `provider={provider}, state={old_state}` |
| Error | Status | Condition |
|---|---|---|
| `HTTPException(404)` | 404 | Provider not found in circuit breaker registry |
| `HTTPException(500)` | 500 | Any unhandled exception |
flowchart TD
A[POST /circuit-breakers/provider/reset] --> B[reset_circuit_breaker provider]
B --> C{Provider in registry?}
C -->|No| D[404 Not Found]
C -->|Yes| E[breaker.reset]
E --> F[Transition to CLOSED]
F --> G[Reset all counters]
G --> H[Save to Redis pipeline]
H --> I[Update Prometheus metrics]
I --> J[get_circuit_breaker - fetch updated]
J --> K[breaker.get_state]
K --> L[Return 200 with success + new state]
E -->|Exception| M[500 Internal Server Error]
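To make the five writes performed on reset concrete, the sketch below builds them as plain tuples instead of talking to Redis. The function name `reset_commands` is hypothetical; the key patterns, values and TTL are the ones listed in the Redis table for this endpoint.

```python
STATE_TTL_SECONDS = 3600  # 1 hour, as documented for all circuit breaker keys


def reset_commands(provider: str) -> list:
    """Return the SETEX commands that put a breaker back into CLOSED state."""
    prefix = f"circuit_breaker:{provider}"
    reset_values = {
        "state": "closed",
        "failure_count": "0",
        "success_count": "0",
        "opened_at": "0.0",
        "consecutive_opens": "0",
    }
    return [
        ("SETEX", f"{prefix}:{field}", STATE_TTL_SECONDS, value)
        for field, value in reset_values.items()
    ]


for command in reset_commands("openrouter"):
    print(command)
```

With a Redis client, each tuple would correspond to one `setex(key, ttl, value)` call queued on a pipeline so that all five writes go out in a single round trip; that mapping is an assumption about the implementation, based on the "pipeline SETEX x5" note in the call tree.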
Issue: #1711
Bulk operation that resets ALL registered circuit breakers to CLOSED state. Use with caution — only when confident all providers have recovered.
Route: POST /circuit-breakers/reset-all
Router prefix: /circuit-breakers
Tags: circuit-breakers, monitoring
Response model: dict[str, Any]
Auth: None (public endpoint - consider adding auth for production)
{
"success": true,
"message": "All circuit breakers have been reset",
"reset_count": 5,
"states": {
"openrouter": { "provider": "openrouter", "state": "closed", ... },
"groq": { "provider": "groq", "state": "closed", ... }
}
}
reset_all_provider_circuit_breakers()
├── reset_all_circuit_breakers() [src/services/circuit_breaker.py]
│ ├── Thread-safe via _registry_lock
│ └── For each breaker in registry:
│ └── breaker.reset()
│ ├── _transition_to(CLOSED, "manual reset")
│ │ ├── _save_state_to_redis() — pipeline SETEX x5
│ │ └── Update Prometheus metrics
│ └── Clear _recent_requests
└── get_all_circuit_breakers() — fetch all states for response
└── For each breaker: get_state() → load from Redis + calculate rates
| Operation | Key Pattern | Value | TTL |
|---|---|---|---|
| `SETEX` | `circuit_breaker:{provider}:state` | `"closed"` | 3600s |
| `SETEX` | `circuit_breaker:{provider}:failure_count` | `"0"` | 3600s |
| `SETEX` | `circuit_breaker:{provider}:success_count` | `"0"` | 3600s |
| `SETEX` | `circuit_breaker:{provider}:opened_at` | `"0.0"` | 3600s |
| `SETEX` | `circuit_breaker:{provider}:consecutive_opens` | `"0"` | 3600s |
Same key patterns as above.
Total Redis operations: N * 10 (5 writes + 5 reads per provider) using pipelines for writes.
For EACH provider that transitions (emitted per breaker):
| Metric | Labels |
|---|---|
| `circuit_breaker_state_transitions_total` | `provider`, `from_state`, `to_state=closed` |
| `circuit_breaker_current_state` (new) | `provider`, `state=closed` → set to 1 |
| `circuit_breaker_current_state` (old) | `provider`, `state={old}` → set to 0 |
Note: If a breaker is already CLOSED, _transition_to returns early (no-op).
| Error | Status | Condition |
|---|---|---|
| `HTTPException(500)` | 500 | Any exception |
No 404 case — if no breakers registered, returns reset_count: 0 with empty states.
Logging: Uses logger.warning for the reset (audit trail).
flowchart TD
A[POST /circuit-breakers/reset-all] --> B[reset_all_circuit_breakers]
B --> C[For each breaker in registry]
C --> D[breaker.reset]
D --> E[Transition to CLOSED]
E --> F[Save to Redis]
F --> G[Update Prometheus]
G --> H{More breakers?}
H -->|Yes| C
H -->|No| I[get_all_circuit_breakers]
I --> J[Load all states]
J --> K[Return 200 with reset_count + all states]
B -->|Exception| L[500 Internal Server Error]
5 endpoints
Issue: #1712
Returns available settings options for the code router, including configurable fields and their descriptions, plus all available routing modes. Used by client applications to build dynamic settings UIs.
Route: GET /code-router/settings/options
Router prefix: /code-router
Tags: code-router
Auth: None (intentionally public — exposes non-sensitive configuration)
{
"success": true,
"options": {
"use_code_router": {
"type": "boolean",
"default": true,
"label": "Use Code Router",
"description": "Enable intelligent model selection based on task complexity"
},
"optimization_mode": {
"type": "select",
"default": "balanced",
"label": "Optimization Mode",
"description": "How to balance cost and quality",
"options": [
{"value": "balanced", "label": "Balanced", "description": "..."},
{"value": "price", "label": "Price Optimized", "description": "..."},
{"value": "quality", "label": "Quality Optimized", "description": "..."},
{"value": "agentic", "label": "Agentic Mode", "description": "..."}
],
"depends_on": {"use_code_router": true}
},
"manual_model": {
"type": "model_select",
"default": "anthropic/claude-sonnet-4",
"label": "Manual Model",
"depends_on": {"use_code_router": false}
},
"show_routing_info": { "type": "boolean", "default": true, ... },
"show_savings": { "type": "boolean", "default": true, ... }
},
"modes": [
{"value": "balanced", "label": "Balanced", "description": "Auto-select best price/performance balance"},
{"value": "price", "label": "Price", "description": "Optimize for lowest cost while maintaining quality"},
{"value": "quality", "label": "Quality", "description": "Optimize for highest quality, use better models"},
{"value": "agentic", "label": "Agentic", "description": "Always use premium models for complex tasks"}
]
}
get_code_router_settings_options()
├── get_settings_options() [src/services/code_router_client.py]
│ └── Returns hardcoded dict describing all configurable options
└── CodeRouterMode enum [src/services/code_router_client.py]
└── Iterated to build modes list
└── _get_mode_description(mode) [src/routes/code_router.py]
└── Returns description string from hardcoded dict
| Value | Description |
|---|---|
| `BALANCED` | `"balanced"` - Auto-select best price/performance |
| `PRICE` | `"price"` - Optimize for lowest cost |
| `QUALITY` | `"quality"` - Optimize for highest quality |
| `AGENTIC` | `"agentic"` - Always use premium models |
None. Entirely in-memory/static configuration.
None.
None.
No explicit error handling. This endpoint is a simple dict return with no external dependencies. If an exception occurs (unlikely), FastAPI's default 500 handler catches it.
flowchart TD
A[GET /code-router/settings/options] --> B[get_settings_options - static config]
B --> C[Build options dict]
A --> D[Iterate CodeRouterMode enum]
D --> E[Build modes list with descriptions]
C --> F[Return 200 JSON]
E --> F
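Building the `modes` list by iterating the enum can be sketched as below. The enum members and description strings come from the example response; the enum base classes, the `_MODE_DESCRIPTIONS` name, and deriving the label by capitalising the value are assumptions made for illustration.

```python
from enum import Enum


class CodeRouterMode(str, Enum):
    BALANCED = "balanced"
    PRICE = "price"
    QUALITY = "quality"
    AGENTIC = "agentic"


# Hypothetical stand-in for the hardcoded dict behind _get_mode_description().
_MODE_DESCRIPTIONS = {
    CodeRouterMode.BALANCED: "Auto-select best price/performance balance",
    CodeRouterMode.PRICE: "Optimize for lowest cost while maintaining quality",
    CodeRouterMode.QUALITY: "Optimize for highest quality, use better models",
    CodeRouterMode.AGENTIC: "Always use premium models for complex tasks",
}


def build_modes() -> list:
    """One entry per enum member, in definition order."""
    return [
        {
            "value": mode.value,
            "label": mode.value.capitalize(),
            "description": _MODE_DESCRIPTIONS.get(mode, ""),
        }
        for mode in CodeRouterMode
    ]


print(build_modes()[0])
```

Iterating the enum means a newly added mode shows up in the response automatically, although it would carry an empty description until the lookup dict is updated.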
Issue: #1713
Returns model tier configuration for the code router, including models per tier, the fallback model, and baseline models for savings calculations. Data is loaded from the code_quality_priors.json file.
Route: GET /code-router/tiers
Router prefix: /code-router
Tags: code-router
Auth: None (intentionally public — exposes non-sensitive configuration)
{
"success": true,
"tiers": {
"1": {
"models": [
{
"id": "anthropic/claude-opus-4",
"name": "Claude Opus 4",
"provider": "anthropic",
"swe_bench": 72.5,
"human_eval": 96.4,
"price_input": 15.0,
"price_output": 75.0,
"strengths": ["code_generation", "debugging", "architecture"]
}
]
},
"2": { ... },
"3": { ... },
"4": { ... }
},
"fallback_model": {
"id": "zai/glm-4.7",
"provider": "zai"
},
"baselines": {
"gpt-4o": { "price_input": ..., "price_output": ... },
"claude-3.5-sonnet": { ... }
}
}
get_code_router_tiers()
├── get_model_tiers() [src/services/code_router.py]
│ └── _load_quality_priors() — lazy-loads from code_quality_priors.json
│ └── File: src/services/code_quality_priors.json
│ └── Cached in module-level _quality_priors variable
├── get_fallback_model() [src/services/code_router.py]
│ └── _load_quality_priors() (cached, returns same dict)
└── get_baselines() [src/services/code_router.py]
└── _load_quality_priors() (cached, returns same dict)
- File: `src/services/code_quality_priors.json`
- Caching: Module-level `_quality_priors` variable, loaded once and never reloaded
- Error handling: If file not found or JSON parse fails:
  - Logs error
  - Captures to Sentry (if available)
  - Falls back to minimal config: `{"model_tiers": {}, "fallback_model": {"id": "zai/glm-4.7", "provider": "zai"}, "baselines": {}}`
None. All data comes from static JSON file.
None.
None.
No explicit error handling in the route. The underlying _load_quality_priors() has its own error handling with fallback values. If the function somehow throws, FastAPI's default 500 handler catches it.
flowchart TD
A[GET /code-router/tiers] --> B{Quality priors loaded?}
B -->|Yes - cached| C[Return cached data]
B -->|No - first call| D[Load code_quality_priors.json]
D -->|Success| E[Cache in module variable]
D -->|File error| F[Log error + Sentry capture]
F --> G[Use minimal fallback config]
E --> H[Extract model_tiers]
G --> H
H --> I[Extract fallback_model]
I --> J[Extract baselines]
J --> K[Return 200 JSON with tiers + fallback + baselines]
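The lazy-load-with-fallback pattern can be sketched as follows. The function name `load_quality_priors` and its `path` parameter are hypothetical, Sentry capture is omitted, and caching the fallback config (rather than retrying the file on the next call) is an assumption that follows the "loaded once, never reloaded" description.

```python
import json
import logging
from pathlib import Path

logger = logging.getLogger(__name__)

_FALLBACK_PRIORS = {
    "model_tiers": {},
    "fallback_model": {"id": "zai/glm-4.7", "provider": "zai"},
    "baselines": {},
}

_quality_priors = None  # module-level cache, filled on first call


def load_quality_priors(path: str) -> dict:
    """Load the priors file once; later calls return the cached dict."""
    global _quality_priors
    if _quality_priors is None:
        try:
            _quality_priors = json.loads(Path(path).read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError) as exc:
            logger.error("Failed to load quality priors from %s: %s", path, exc)
            _quality_priors = dict(_FALLBACK_PRIORS)
    return _quality_priors


print(load_quality_priors("does/not/exist.json")["fallback_model"])
```

Under this sketch a corrected file is only picked up after a process restart, because the fallback is cached just like a successful load.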
Issue: #1714
Handler: get_code_router_stats() in src/routes/code_router.py (line 233)
Router prefix: /code-router
Tags: ["code-router"]
Authentication: None required (public endpoint)
Returns dict[str, Any] (no Pydantic model). Shape:
# Success case:
{
"success": True,
"stats": {
"tiers_loaded": int, # Number of model tiers
"models_available": int, # Total models across all tiers
"fallback_model": str | None, # Fallback model ID
"baselines": list[str], # Baseline model keys
"metrics_enabled": bool, # Whether Prometheus module found
}
}
# Error case (graceful degradation, still 200):
{
"success": False,
"error": str,
"stats": {}
}
get_code_router_stats()
├── get_router() # src/services/code_router.py:405
│ └── CodeRouter.__init__() # src/services/code_router.py:93
│ ├── get_classifier() # src/services/code_classifier.py
│ ├── get_model_tiers() # src/services/code_router.py:66
│ │ └── _load_quality_priors() # src/services/code_router.py:32
│ │ └── Reads code_quality_priors.json from disk (cached globally)
│ ├── get_fallback_model() # src/services/code_router.py:71
│ │ └── _load_quality_priors() # (cached)
│ ├── get_baselines() # src/services/code_router.py:76
│ │ └── _load_quality_priors() # (cached)
│ └── _build_model_lookup() # src/services/code_router.py:102
│ └── Builds _model_lookup and _tier_models dicts
├── router_instance.model_tiers # Access cached tiers dict
├── router_instance.fallback_model # Access cached fallback
├── router_instance.baselines # Access cached baselines
└── importlib.util.find_spec("src.services.prometheus_metrics")
└── Checks if Prometheus metrics module is importable
None. This endpoint reads only from in-memory/file-cached data.
None. No Redis interaction.
None emitted directly by this endpoint. It only checks if the src.services.prometheus_metrics module is importable via importlib.util.find_spec().
- Standard FastAPI middleware pipeline applies (Sentry, observability, timeout, security, gzip, trace)
- No authentication middleware enforced (no `Depends()` for auth)
| Error Path | Status Code | Detail |
|---|---|---|
| Any exception in handler | 200 | Returns {"success": False, "error": str(e), "stats": {}} - graceful degradation |
The endpoint intentionally does not raise HTTPException on failure. It returns a 200 with success: False for graceful degradation since stats are non-critical.
flowchart TD
A[GET /code-router/stats] --> B{try block}
B --> C[get_router - singleton]
C --> D{_router is None?}
D -->|Yes| E[CodeRouter.__init__]
E --> F[Load quality priors from JSON]
F --> G[Build model lookup]
D -->|No| H[Return cached instance]
G --> H
H --> I[Build stats dict]
I --> J[Count tiers_loaded]
I --> K[Sum models_available]
I --> L[Get fallback_model.id]
I --> M[List baselines keys]
I --> N[Check prometheus_metrics importable]
N --> O[Return success: true + stats]
B -->|Exception| P[Log error]
P --> Q[Return success: false + error string + empty stats]
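Both the importability check and the "always 200" error shape can be sketched with the standard library. The helper names `module_available` and `build_stats` are hypothetical, and the assumed tier layout (`{"models": [...]}` per tier) is taken from the tiers endpoint example.

```python
import importlib.util


def module_available(name: str) -> bool:
    """True if the module can be located without importing it fully."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # For dotted names, find_spec imports the parent package and
        # raises ModuleNotFoundError if that parent is missing.
        return False


def build_stats(model_tiers, fallback_model, baselines) -> dict:
    """Return the stats payload; on any error return success=False instead of raising."""
    try:
        return {
            "success": True,
            "stats": {
                "tiers_loaded": len(model_tiers),
                "models_available": sum(
                    len(tier.get("models", [])) for tier in model_tiers.values()
                ),
                "fallback_model": (fallback_model or {}).get("id"),
                "baselines": list(baselines),
                "metrics_enabled": module_available("src.services.prometheus_metrics"),
            },
        }
    except Exception as exc:  # graceful degradation: stats are non-critical
        return {"success": False, "error": str(exc), "stats": {}}


print(build_stats({"1": {"models": [{}, {}]}}, {"id": "zai/glm-4.7"}, {"gpt-4o": {}}))
```

Clients therefore need to inspect the `success` field rather than the HTTP status to detect a failure.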
Issue: #1715
Handler: test_code_routing() in src/routes/code_router.py (line 143)
Router prefix: /code-router
Tags: ["code-router"]
Authentication: None required (public endpoint)
| Field | Type | Default | Validation |
|---|---|---|---|
| `prompt` | `str` | required | Must be non-empty |
| `mode` | `RoutingMode` (`Literal["auto","price","quality","agentic"]`) | `"auto"` | `field_validator` lowercases and validates against `VALID_ROUTING_MODES` |
| `context` | `dict[str, Any] \| None` | `None` | Optional context dict |
| Field | Type | Description |
|---|---|---|
| `model_id` | `str` | Selected model identifier |
| `provider` | `str` | Provider slug |
| `tier` | `int` | Selected tier number (1-4) |
| `task_category` | `str` | Classified task category |
| `complexity` | `str` | Classified complexity level |
| `confidence` | `float` | Classification confidence score |
| `mode` | `str` | Routing mode used |
| `routing_latency_ms` | `float` | Time taken for routing decision in ms |
| `savings_estimate` | `dict[str, Any]` | Savings vs baselines |
| `model_info` | `dict[str, Any]` | Selected model metadata |
test_code_routing(request)
├── route_code_prompt() # src/services/code_router.py:413
│ └── get_router().route() # src/services/code_router.py:113
│ ├── classifier.classify(prompt, context) # src/services/code_classifier.py
│ │ └── Pattern matching + keyword analysis
│ │ └── Returns {category, complexity, default_tier, min_tier, confidence}
│ ├── _calculate_target_tier() # src/services/code_router.py:198
│ │ └── Mode-based tier selection with quality gates
│ │ ├── agentic → always tier 1
│ │ ├── quality → max(1, default_tier - 1) clamped by min_tier
│ │ ├── price → default_tier clamped by min_tier
│ │ └── auto → default_tier clamped by min_tier
│ ├── _select_model_from_tier() # src/services/code_router.py:237
│ │ └── Score models by strengths, price/quality benchmarks
│ │ └── Returns highest-scored model or fallback_model
│ ├── _calculate_savings_estimate() # src/services/code_router.py:292
│ │ └── Compare selected model cost vs baselines (1K input + 500 output tokens)
│ └── _track_routing_metrics() # src/services/code_router.py:360
│ ├── code_router_requests_total.labels(...).inc()
│ ├── code_router_latency_seconds.observe(...)
│ └── code_router_savings_dollars.labels(...).inc()
└── Return RouteTestResponse
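The mode-based tier selection in the call tree above can be sketched as below. This is an illustration under stated assumptions, not the code in `src/services/code_router.py`: the function name is hypothetical, and "clamped by min_tier" is assumed to mean that the chosen tier number may not exceed `min_tier` (tier 1 being the strongest tier).

```python
def calculate_target_tier(mode: str, default_tier: int, min_tier: int) -> int:
    """Pick a tier number (1 = strongest) for a routing mode.

    ASSUMPTION: min_tier is the weakest tier still acceptable for the task,
    so the result is capped with min(candidate, min_tier).
    """
    if mode == "agentic":
        return 1  # agentic requests always use the top tier
    if mode == "quality":
        candidate = max(1, default_tier - 1)  # bump one tier up, never past 1
    elif mode in ("price", "auto"):
        candidate = default_tier
    else:
        raise ValueError(f"unknown routing mode: {mode}")
    return min(candidate, min_tier)  # quality gate
```

For example, a task classified with `default_tier=3` and `min_tier=2` would be routed to tier 2 in `price` mode under this reading of the quality gate.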
None. All data comes from in-memory/file-cached configuration.
None. No Redis interaction.
| Metric Name | Type | Labels |
|---|---|---|
| `code_router_requests_total` | Counter | `task_category`, `complexity`, `mode`, `selected_model`, `selected_tier` |
| `code_router_latency_seconds` | Histogram | (none); buckets: 0.5ms to 100ms |
| `code_router_savings_dollars_total` | Counter | `baseline`, `task_category` |
Metrics are emitted via _track_routing_metrics() inside CodeRouter.route(). If prometheus_metrics module is not importable, metrics are silently skipped.
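One common way to get this "skip silently when the module is missing" behaviour is an optional import. The sketch below assumes that pattern; `load_optional`, the two-argument `track_routing_metrics`, and the module name are hypothetical and do not reflect the real `_track_routing_metrics()` signature.

```python
import importlib
from typing import Any, Optional


def load_optional(module_name: str) -> Optional[Any]:
    """Import a module by name, returning None when it is not importable."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


def track_routing_metrics(metrics: Optional[Any], latency_seconds: float) -> bool:
    """Record a latency observation if metrics are available.

    Returns True when something was recorded, False when skipped.
    """
    if metrics is None:
        return False  # no metrics module: skip silently, never fail the request
    metrics.code_router_latency_seconds.observe(latency_seconds)
    return True


# "hypothetical_metrics_module" is a placeholder name that should not exist.
metrics = load_optional("hypothetical_metrics_module")
recorded = track_routing_metrics(metrics, 0.002)
```

The routing decision is unaffected either way; only the observation is dropped.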
- Standard FastAPI middleware pipeline (Sentry, observability, timeout, security, gzip, trace)
- No authentication middleware (no `Depends()` for auth)
- Request body validated by Pydantic before handler runs
| Error Path | Status Code | Detail |
|---|---|---|
| Invalid `mode` value | 422 | Pydantic validation error |
| Missing `prompt` field | 422 | Pydantic validation error |
| Any exception in `route_code_prompt()` | 500 | "Routing test failed: {error}" |
flowchart TD
A[POST /code-router/test] --> B[Pydantic validation]
B -->|Invalid| B1[422 Validation Error]
B -->|Valid| C{try block}
C --> D[route_code_prompt]
D --> E[get_router - singleton]
E --> F[classifier.classify prompt]
F --> G[Determine category + complexity + tiers]
G --> H[_calculate_target_tier based on mode]
H --> I{Mode?}
I -->|agentic| I1[Tier 1 always]
I -->|quality| I2[Bump up tier, respect min_tier]
I -->|price| I3[Default tier, respect min_tier]
I -->|auto| I4[Default tier, respect min_tier]
I1 --> J[_select_model_from_tier]
I2 --> J
I3 --> J
I4 --> J
J --> K[Score models by strengths + mode preference]
K --> L{Models in tier?}
L -->|Yes| M[Select highest scored]
L -->|No| N[Use fallback_model]
M --> O[_calculate_savings_estimate]
N --> O
O --> P[_track_routing_metrics - Prometheus]
P --> Q[Return RouteTestResponse]
C -->|Exception| R[Log error]
R --> S[HTTPException 500]
Issue: #1716
Handler: validate_code_router_settings() in src/routes/code_router.py (line 175)
Router prefix: /code-router
Tags: ["code-router"]
Authentication: None required (public endpoint)
| Field | Type | Default | Validation |
|---|---|---|---|
| `use_code_router` | `bool` | `True` | Standard bool |
| `optimization_mode` | `str` | `"balanced"` | Validated against `CodeRouterMode` enum values |
| `manual_model` | `str \| None` | `None` | Optional; required when `use_code_router=False` |
| Field | Type | Default | Description |
|---|---|---|---|
| `valid` | `bool` | - | Whether settings are valid |
| `model_string` | `str` | - | Resulting model string (e.g., "router:code:price") |
| `errors` | `list[str]` | `[]` | Validation errors |
| `warnings` | `list[str]` | `[]` | Validation warnings |
validate_code_router_settings(request)
├── CodeRouterMode enum validation # src/services/code_router_client.py:27
│ └── Check optimization_mode against ["balanced", "price", "quality", "agentic"]
├── If use_code_router is False:
│ ├── Check manual_model is provided
│ └── _is_valid_model_id(manual_model) # src/routes/code_router.py:289
│ └── Check "/" in model_id OR model_id in known_aliases list
│ └── Known aliases: gpt-4, gpt-4o, gpt-4o-mini, gpt-3.5-turbo,
│ claude-3-opus, claude-3-sonnet, claude-3-haiku, gemini-pro, gemini-flash
├── CodeRouterSettings(...) # src/services/code_router_client.py:196
│ └── Pydantic model with mode, manual_model, router toggle
└── settings.get_model_string() # src/services/code_router_client.py:232
├── If not use_code_router → return manual_model
├── If BALANCED → "router:code"
└── Else → f"router:code:{mode.value}"
None. Pure validation logic, no database interaction.
None. No Redis interaction.
None. No metrics emitted.
- Standard FastAPI middleware pipeline (Sentry, observability, timeout, security, gzip, trace)
- No authentication middleware
- Request body validated by Pydantic before handler runs
| Error Path | Status Code | Detail |
|---|---|---|
| Pydantic validation failure on request | 422 | Automatic validation error |
| Invalid `optimization_mode` | 200 | Returns valid: false with error in errors list |
| `use_code_router=False` without `manual_model` | 200 | Returns valid: false with error |
| Invalid `manual_model` format | 200 | Returns valid: true with warning in warnings list |
| Exception in `CodeRouterSettings` construction | 200 | Returns valid: false with error string |
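Note: The handler itself never raises HTTPException. Failures it detects return 200 with valid: false, and an unrecognized manual_model format only adds a warning and leaves valid: true. Malformed request bodies are rejected earlier by Pydantic with 422.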
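The validation rules for this endpoint can be sketched in plain Python as below. It mirrors the documented checks (mode membership, `manual_model` presence, the "/"-or-alias test, and the model string format) but is only an approximation: `validate_settings` and `is_valid_model_id` are illustrative names, the error and warning texts are invented, and returning an empty `model_string` on failure is an assumption.

```python
from enum import Enum
from typing import Optional


class CodeRouterMode(str, Enum):
    BALANCED = "balanced"
    PRICE = "price"
    QUALITY = "quality"
    AGENTIC = "agentic"


KNOWN_ALIASES = {
    "gpt-4", "gpt-4o", "gpt-4o-mini", "gpt-3.5-turbo",
    "claude-3-opus", "claude-3-sonnet", "claude-3-haiku",
    "gemini-pro", "gemini-flash",
}


def is_valid_model_id(model_id: str) -> bool:
    """A model id is accepted if it looks like provider/model or is a known alias."""
    return "/" in model_id or model_id in KNOWN_ALIASES


def validate_settings(use_code_router: bool = True,
                      optimization_mode: str = "balanced",
                      manual_model: Optional[str] = None) -> dict:
    errors: list[str] = []
    warnings: list[str] = []
    if optimization_mode not in [m.value for m in CodeRouterMode]:
        errors.append(f"invalid optimization_mode: {optimization_mode}")
    elif not use_code_router:
        if not manual_model:
            errors.append("manual_model is required when use_code_router is False")
        elif not is_valid_model_id(manual_model):
            warnings.append(f"model '{manual_model}' may not be available")
    if errors:
        # ASSUMPTION: model_string is empty when validation fails.
        return {"valid": False, "model_string": "", "errors": errors, "warnings": warnings}
    if not use_code_router:
        model_string = manual_model
    elif optimization_mode == CodeRouterMode.BALANCED.value:
        model_string = "router:code"
    else:
        model_string = f"router:code:{optimization_mode}"
    return {"valid": True, "model_string": model_string, "errors": errors, "warnings": warnings}
```

Note how an unknown `manual_model` still yields `valid: True`: the format check is advisory and only populates `warnings`.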
flowchart TD
A[POST /code-router/settings/validate] --> B[Pydantic request validation]
B -->|Invalid| B1[422 Validation Error]
B -->|Valid| C[Initialize errors + warnings lists]
C --> D{optimization_mode in valid modes?}
D -->|No| E[Add error: invalid mode]
D -->|Yes| F{use_code_router?}
F -->|False| G{manual_model provided?}
G -->|No| H[Add error: manual_model required]
G -->|Yes| I{_is_valid_model_id?}
I -->|No| J[Add warning: model may not be available]
I -->|Yes| K[Continue]
F -->|True| K
J --> K
H --> K
E --> K
K --> L{errors list empty?}
L -->|No| M[Return valid:false + errors + warnings]
L -->|Yes| N{try: build CodeRouterSettings}
N -->|Success| O[settings.get_model_string]
O --> P[Return valid:true + model_string + warnings]
N -->|Exception| Q[Return valid:false + error string]
3 endpoints
Issue: #1717
Handler: get_available_coupons() in src/routes/coupons.py (line 88)
Tags: ["coupons"]
Authentication: Required - get_current_user (Bearer token)
| Field | Type | Description |
|---|---|---|
| `coupon_id` | `int` | Coupon ID |
| `code` | `str` | Coupon code |
| `value_usd` | `float` | Dollar value |
| `coupon_scope` | `str` | "user_specific" or "global" |
| `coupon_type` | `str` | "promotional", "referral", "compensation", "partnership" |
| `description` | `str \| None` | Internal description |
| `valid_until` | `datetime` | Expiration date |
| `remaining_uses` | `int` | Uses remaining |
get_available_coupons(user)
├── Depends(get_current_user) # src/security/deps.py:192
│ ├── Depends(get_api_key) # src/security/deps.py:74
│ │ ├── HTTPBearer credential extraction
│ │ ├── validate_api_key_security() # src/security/security.py
│ │ │ └── Key lookup, status, expiry, IP, domain checks
│ │ ├── get_user(api_key) # src/services/user_lookup_cache.py
│ │ └── audit_logger.log_api_key_usage() # src/security/security.py
│ ├── get_user(api_key) # src/services/user_lookup_cache.py
│ └── validate_trial_expiration(user) # src/utils/trial_utils.py
│ └── Raises HTTPException(402) if trial expired
├── get_available_coupons_for_user(user_id) # src/db/coupons.py:450
│ ├── get_supabase_client() # src/config/supabase_config.py
│ └── client.rpc("get_available_coupons", # Supabase RPC call
│ {"p_user_id": user_id})
└── Return [AvailableCouponResponse(**c) for c in coupons]
| Operation | Table/RPC | Details |
|---|---|---|
| RPC call | `get_available_coupons` | Params: `{"p_user_id": user_id}` |
get_available_coupons is a PostgreSQL function that returns both the user-specific coupons assigned to this user and the global coupons this user has not yet redeemed.
None directly. However, get_user() via user_lookup_cache may use Redis for user caching.
None emitted directly by this endpoint. Authentication middleware may emit metrics.
- Standard middleware pipeline (Sentry, observability, timeout, security, gzip, trace)
- Bearer token authentication via `get_current_user` dependency chain
- API key validated for: active status, expiration, request limits, IP allowlist, domain restrictions
- Trial expiration checked (raises 402 if expired)
- Audit log entry created for API key usage
| Error Path | Status Code | Detail |
|---|---|---|
| No Authorization header | 401 | "Authorization header is required" |
| Invalid/inactive API key | 401 | Various key validation messages |
| Expired API key | 401 | "API key expired" |
| Rate limit exceeded | 429 | "limit reached" |
| IP not allowed | 403 | "IP address not allowed" |
| User not found | 404 | "User not found" |
| Trial expired | 402 | Trial expiration message |
| RPC call fails | 500 | "Internal server error" |
| Any other exception | 500 | "Internal server error" |
flowchart TD
A[GET /coupons/available] --> B[get_current_user dependency]
B --> C[get_api_key - validate Bearer token]
C -->|No token| C1[401 Authorization required]
C -->|Invalid| C2[401/403/429 Key validation error]
C -->|Valid| D[get_user - lookup user]
D -->|Not found| D1[404 User not found]
D -->|Found| E[validate_trial_expiration]
E -->|Expired| E1[402 Trial expired]
E -->|Valid| F{try block}
F --> G[get_available_coupons_for_user]
G --> H[Supabase RPC: get_available_coupons]
H --> I[Return coupon list from DB function]
I --> J[Map to AvailableCouponResponse list]
J --> K[Return 200 with coupon list]
F -->|HTTPException| L[Re-raise]
F -->|Other Exception| M[Log error]
M --> N[500 Internal server error]
Issue: #1718
Handler: get_redemption_history() in src/routes/coupons.py (line 112)
Tags: ["coupons"]
Authentication: Required - get_current_user (Bearer token)
| Param | Type | Default | Validation |
|---|---|---|---|
| `limit` | `int` | `50` | Standard int |
| Field | Type | Description |
|---|---|---|
| `redemptions` | `list[RedemptionHistoryItem]` | List of redemption records |
| `total_redemptions` | `int` | Count of redemptions |
| `total_value_redeemed` | `float` | Sum of all values redeemed |
| Field | Type | Description |
|---|---|---|
| `id` | `int` | Redemption record ID |
| `coupon_code` | `str` | Coupon code |
| `coupon_scope` | `str` | "user_specific" or "global" |
| `coupon_type` | `str` | Type category |
| `value_applied` | `float` | Value applied |
| `redeemed_at` | `datetime` | Redemption timestamp |
| `user_balance_before` | `float` | Balance before redemption |
| `user_balance_after` | `float` | Balance after redemption |
get_redemption_history(limit, user)
├── Depends(get_current_user) # (same auth chain as #1717)
├── get_user_redemption_history(user_id, limit) # src/db/coupons.py:474
│ ├── get_supabase_client() # src/config/supabase_config.py
│ └── client.table("coupon_redemptions")
│ .select("*, coupons(code, coupon_type, coupon_scope)")
│ .eq("user_id", user_id)
│ .order("redeemed_at", desc=True)
│ .limit(limit)
│ .execute()
└── Transform data:
├── Extract nested "coupons" join data
├── Build RedemptionHistoryItem for each record
└── Sum total_value from value_applied fields
| Operation | Table | Columns | Filters | Order |
|---|---|---|---|---|
| SELECT | `coupon_redemptions` | `*, coupons(code, coupon_type, coupon_scope)` | `.eq("user_id", user_id)` | `.order("redeemed_at", desc=True)` |
This uses a PostgREST foreign key join to fetch coupon details inline with redemption records. Limited by the limit parameter (default 50).
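The transform step (flattening the nested `coupons` join and summing `value_applied`) might look roughly like this. The row shape follows the select string above; `summarize_redemptions` and `SAMPLE_ROWS` are hypothetical, and the sample values are made up for illustration.

```python
from typing import Any


def summarize_redemptions(rows: list[dict[str, Any]]) -> dict[str, Any]:
    """Flatten joined rows into history items and total the redeemed value."""
    items = []
    total_value = 0.0
    for row in rows:
        coupon = row.get("coupons") or {}  # nested object produced by the join
        value = float(row.get("value_applied") or 0)
        items.append({
            "id": row["id"],
            "coupon_code": coupon.get("code"),
            "coupon_scope": coupon.get("coupon_scope"),
            "coupon_type": coupon.get("coupon_type"),
            "value_applied": value,
            "redeemed_at": row.get("redeemed_at"),
            "user_balance_before": row.get("user_balance_before"),
            "user_balance_after": row.get("user_balance_after"),
        })
        total_value += value
    return {
        "redemptions": items,
        "total_redemptions": len(items),
        "total_value_redeemed": total_value,
    }


# Made-up rows in the shape the joined query would return.
SAMPLE_ROWS = [
    {"id": 1, "value_applied": 10.0, "redeemed_at": "2026-01-02T00:00:00Z",
     "user_balance_before": 0.0, "user_balance_after": 10.0,
     "coupons": {"code": "WELCOME10", "coupon_type": "promotional", "coupon_scope": "global"}},
    {"id": 2, "value_applied": 5.0, "redeemed_at": "2026-01-01T00:00:00Z",
     "user_balance_before": 10.0, "user_balance_after": 15.0,
     "coupons": {"code": "SORRY5", "coupon_type": "compensation", "coupon_scope": "user_specific"}},
]
```

Because the totals are computed from the returned rows, they cover at most `limit` records, not the user's full history.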
None directly. User lookup cache may use Redis.
None.
- Standard middleware pipeline
- Bearer token authentication via `get_current_user` dependency chain
- Trial expiration validation
| Error Path | Status Code | Detail |
|---|---|---|
| Auth failures | 401/402/403/404/429 | Various auth errors (same as #1717) |
| Supabase query error | 500 | "Internal server error" |
| Any other exception | 500 | "Internal server error" |
Note: On a Supabase error, get_user_redemption_history returns [] (an empty list) instead of raising, so in practice the endpoint responds with an empty history rather than the 500 listed above.
flowchart TD
A[GET /coupons/history?limit=50] --> B[get_current_user dependency]
B -->|Auth fail| B1[401/402/403/404/429]
B -->|Success| C{try block}
C --> D[get_user_redemption_history]
D --> E[Supabase SELECT coupon_redemptions + JOIN coupons]
E --> F[Return redemptions list]
F --> G{For each redemption}
G --> H[Extract nested coupons data]
H --> I[Build RedemptionHistoryItem]
I --> J[Accumulate total_value]
J --> K[Return RedemptionHistoryResponse]
C -->|HTTPException| L[Re-raise]
C -->|Other Exception| M[500 Internal server error]
Issue: #1723
Handler: redeem_coupon_endpoint() in src/routes/coupons.py (line 47)
Tags: ["coupons"]
Authentication: Required - get_current_user (Bearer token)
| Field | Type | Default | Validation |
|---|---|---|---|
| `code` | `str` | required | `min_length=3, max_length=50` |
| Field | Type | Default | Description |
|---|---|---|---|
| `success` | `bool` | - | Whether redemption succeeded |
| `message` | `str` | - | Success/error message |
| `coupon_code` | `str \| None` | `None` | Redeemed code |
| `coupon_value` | `float \| None` | `None` | Dollar value applied |
| `previous_balance` | `float \| None` | `None` | Balance before |
| `new_balance` | `float \| None` | `None` | Balance after |
| `error_code` | `str \| None` | `None` | Error code if failed |
redeem_coupon_endpoint(request, redemption_request, user)
├── Depends(get_current_user) # (auth chain)
├── Extract client_host + user_agent from request
├── redeem_coupon(code, user_id, ip, ua) # src/db/coupons.py:316
│ ├── get_supabase_client()
│ ├── Step 1: validate_coupon(code, user_id) # src/db/coupons.py:251
│ │ ├── get_supabase_client()
│ │ └── client.rpc("is_coupon_redeemable",
│ │ {"p_coupon_code": code, "p_user_id": user_id})
│ │ └── PostgreSQL function validates:
│ │ - Code exists, is_active, not expired
│ │ - User hasn't already redeemed
│ │ - max_uses not exceeded
│ │ - scope rules (user_specific assignment)
│ │ └── Returns: {is_valid, error_code, error_message, coupon_id, coupon_value}
│ ├── If not valid → return failure dict
│ ├── Step 2: Get user balance
│ │ └── SELECT credits FROM users WHERE id = user_id
│ ├── Step 3: Update user balance
│ │ └── UPDATE users SET credits = new_balance WHERE id = user_id
│ ├── Step 4: Increment coupon usage (two methods)
│ │ ├── SELECT times_used + manual UPDATE coupons (non-atomic)
│ │ └── client.rpc("increment", {"row_id": coupon_id, "x": 1})
│ ├── Step 5: Record redemption
│ │ └── INSERT INTO coupon_redemptions {coupon_id, user_id, value_applied,
│ │ user_balance_before, user_balance_after, ip_address, user_agent}
│ └── Return success dict
├── If result not success → JSONResponse(400)
└── If success → Return RedemptionResponse
| Step | Operation | Table | Details |
|---|---|---|---|
| 1 | RPC | `is_coupon_redeemable` | Params: `p_coupon_code`, `p_user_id` |
| 2 | SELECT | `users` | Columns: `credits`, Filter: `.eq("id", user_id)` |
| 3 | UPDATE | `users` | Set: `credits=new_balance`, Filter: `.eq("id", user_id)` |
| 4a | SELECT + UPDATE | `coupons` | Read `times_used`, then update (non-atomic) |
| 4b | RPC | `increment` | Params: `row_id`, `x` (atomic increment) |
| 5 | INSERT | `coupon_redemptions` | All redemption fields |
Note: Step 4 is error-prone. It performs both a non-atomic read-then-write (4a), which can lose updates under concurrent redemptions, and an atomic RPC increment (4b). If both succeed, times_used is incremented twice for a single redemption.
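The double-increment risk can be reproduced with a small in-memory stand-in for the coupons table. Everything here is hypothetical (`FakeCouponsTable` and both helper functions); it only models the two update styles from the table above and assumes both run for the same redemption.

```python
class FakeCouponsTable:
    """In-memory stand-in for the coupons table (illustrative only)."""

    def __init__(self) -> None:
        self.times_used: dict[int, int] = {}

    def read(self, coupon_id: int) -> int:
        return self.times_used.get(coupon_id, 0)

    def write(self, coupon_id: int, value: int) -> None:
        self.times_used[coupon_id] = value

    def increment(self, coupon_id: int, x: int = 1) -> None:
        # Models the RPC: the database adds x in a single statement.
        self.times_used[coupon_id] = self.read(coupon_id) + x


def record_use_both_methods(table: FakeCouponsTable, coupon_id: int) -> None:
    """Step 4 as documented: read-then-write (4a) followed by the RPC (4b)."""
    current = table.read(coupon_id)
    table.write(coupon_id, current + 1)  # 4a: non-atomic
    table.increment(coupon_id, 1)        # 4b: atomic, but counts the use again


def record_use_atomic_only(table: FakeCouponsTable, coupon_id: int) -> None:
    """One possible fix: keep only the atomic increment."""
    table.increment(coupon_id, 1)
```

With both methods applied, one redemption leaves `times_used` at 2, so a coupon with a `max_uses` limit would be exhausted twice as fast as intended.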
None directly.
None.
| Error Path | Status Code | Detail |
|---|---|---|
| Auth failures | 401/402/403/404/429 | Various |
| Invalid code format | 422 | Pydantic validation (min/max length) |
| Coupon not redeemable (validation fails) | 400 | JSONResponse with result dict |
| User not found (during balance lookup) | 400 | Returns failure dict |
| Balance update fails | 500 | Internal error (raised via exception path) |
| Redemption record insert fails | 200 | Still returns success; audit failure only |
| Any other exception | 500 | "Internal server error" |
flowchart TD
A[POST /coupons/redeem] --> B[Pydantic validation]
B -->|Invalid code length| B1[422]
B -->|Valid| C[get_current_user auth]
C -->|Auth fail| C1[401/402/403/404/429]
C -->|Success| D{try block}
D --> E[Extract client_host + user_agent]
E --> F[redeem_coupon]
F --> G[validate_coupon via RPC]
G --> H{is_valid?}
H -->|No| I[Return 400 with error]
H -->|Yes| J[SELECT user credits]
J --> K{User found?}
K -->|No| L[Return failure dict → 400]
K -->|Yes| M[UPDATE users SET credits]
M --> N{Update succeeded?}
N -->|No| O[Raise exception → 500]
N -->|Yes| P[Increment coupon times_used]
P --> Q[INSERT coupon_redemptions]
Q --> R{Insert succeeded?}
R -->|No| S[Log error but continue]
R -->|Yes| T[Return 200 RedemptionResponse]
S --> T
D -->|HTTPException| U[Re-raise]
D -->|Other Exception| V[500 Internal server error]
6 endpoints
Issue: #1727
Handler: get_credits_summary_endpoint() in src/routes/credits.py (line 625)
Tags: ["credits", "admin"]
Authentication: Required - require_admin (admin role)
| Param | Type | Default | Description |
|---|---|---|---|
| `user_id` | `int \| None` | `None` | Filter by specific user |
| `from_date` | `str \| None` | `None` | Start date (YYYY-MM-DD) |
| `to_date` | `str \| None` | `None` | End date (YYYY-MM-DD) |
User-specific response (when user_id is provided):
{
"status": "success",
"user_id": int,
"user_info": {"id": int, "username": str, "credits": float},
"current_balance": float,
"summary": { # from get_transaction_summary()
"total_transactions": int,
"total_credits_added": float,
"total_credits_used": float,
"net_change": float,
"by_type": {type: {"count": int, "total_amount": float, "average_amount": float}},
"daily_breakdown": [{"date": str, "credits_added": float, "credits_used": float, "count": int}],
"largest_credit": {...} | None,
"largest_charge": {...} | None,
"average_transaction": float,
"transaction_count_by_direction": {"credits": int, "charges": int}
},
"filters": {"from_date": str, "to_date": str},
"timestamp": str
}
System-wide response (no user_id):
{
"status": "success",
"system_summary": {
"total_users": int,
"total_credits_in_system": float,
"average_credits_per_user": float,
"total_transactions": int,
"total_credits_added": float,
"total_credits_used": float,
"net_change": float,
"by_type": {type: {"count": int, "total_amount": float}}
},
"filters": {...},
"timestamp": str
}
get_credits_summary_endpoint(user_id, from_date, to_date, admin_user)
├── Depends(require_admin) # (admin auth chain)
├── get_supabase_client()
├── If user_id provided:
│ ├── get_transaction_summary(user_id, from_date, to_date) # src/db/credit_transactions.py:492
│ │ ├── get_supabase_client()
│ │ ├── SELECT * FROM credit_transactions WHERE user_id = ?
│ │ │ + optional .gte("created_at", from_date)
│ │ │ + optional .lte("created_at", to_date)
│ │ └── Client-side aggregation:
│ │ ├── total_credits_added (positive amounts)
│ │ ├── total_credits_used (negative amounts)
│ │ ├── by_type breakdown
│ │ ├── daily_breakdown
│ │ ├── largest_credit / largest_charge
│ │ └── average_transaction
│ └── SELECT id, username, credits FROM users WHERE id = user_id
├── If no user_id (system-wide):
│ ├── SELECT id, credits FROM users (ALL users)
│ ├── SELECT transaction_type, amount FROM credit_transactions
│ │ + optional date filters
│ └── Client-side aggregation
└── Return dict response
| Operation | Table | Columns | Filters |
|---|---|---|---|
| SELECT | `credit_transactions` | `*` | `.eq("user_id", user_id)` + optional date range |
| SELECT | `users` | `id, username, credits` | `.eq("id", user_id)` |
| Operation | Table | Columns | Filters |
|---|---|---|---|
| SELECT | `users` | `id, credits` | None (all users) |
| SELECT | `credit_transactions` | `transaction_type, amount` | Optional date range |
Performance Warning: System-wide path fetches ALL users and ALL transactions for aggregation client-side.
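The client-side aggregation can be pictured as a single pass over the fetched rows. The sketch below is an assumption-laden illustration: `summarize_transactions` is a hypothetical name, and reporting `total_credits_used` as a positive magnitude is a guess about the real output.

```python
from collections import defaultdict
from typing import Any


def summarize_transactions(transactions: list[dict[str, Any]]) -> dict[str, Any]:
    """Aggregate fetched credit_transactions rows in application memory."""
    added = sum(t["amount"] for t in transactions if t["amount"] > 0)
    # ASSUMPTION: usage is reported as a positive number.
    used = sum(-t["amount"] for t in transactions if t["amount"] < 0)
    by_type: dict[str, dict[str, float]] = defaultdict(
        lambda: {"count": 0, "total_amount": 0.0}
    )
    for t in transactions:
        bucket = by_type[t["transaction_type"]]
        bucket["count"] += 1
        bucket["total_amount"] += t["amount"]
    return {
        "total_transactions": len(transactions),
        "total_credits_added": added,
        "total_credits_used": used,
        "net_change": added - used,
        "by_type": dict(by_type),
    }


# Made-up rows with only the two columns the system-wide path selects.
SAMPLE_TRANSACTIONS = [
    {"transaction_type": "purchase", "amount": 10.0},
    {"transaction_type": "api_usage", "amount": -2.5},
    {"transaction_type": "api_usage", "amount": -1.5},
]
```

Memory and latency grow linearly with the number of rows, which is why this path becomes expensive without date filters; pushing the sums into a SQL aggregate or RPC would be the usual alternative.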
None.
None.
| Error Path | Status Code | Detail |
|---|---|---|
| Auth/admin failures | 401/402/403/404/429 | Various |
| Any exception | 500 | "Failed to get credits summary" |
flowchart TD
A[GET /credits/summary] --> B[require_admin]
B -->|Not admin| B1[403]
B -->|Admin| C{user_id provided?}
C -->|Yes| D[get_transaction_summary for user]
D --> E[SELECT credit_transactions WHERE user_id]
E --> F[Client-side aggregation]
F --> G[SELECT user info from users]
G --> H[Return user-specific summary]
C -->|No| I[SELECT all users with credits]
I --> J[SELECT all credit_transactions]
J --> K[Client-side aggregation]
K --> L[Calculate totals, averages, by_type]
L --> M[Return system-wide summary]
C -->|Exception| N[500 Failed to get credits summary]
Issue: #1728
Handler: get_credits_transactions_endpoint() in src/routes/credits.py (line 741)
Tags: ["credits", "admin"]
Authentication: Required - require_admin (admin role)
| Param | Type | Default | Validation | Description |
|---|---|---|---|---|
| `limit` | `int` | `50` | `ge=1, le=1000` | Max transactions to return |
| `offset` | `int` | `0` | `ge=0` | Pagination offset |
| `user_id` | `int \| None` | `None` | - | Filter by user |
| `transaction_type` | `str \| None` | `None` | - | Filter by type (trial, purchase, api_usage, admin_credit, admin_debit, refund, bonus, transfer) |
| `from_date` | `str \| None` | `None` | - | Start date (YYYY-MM-DD or ISO) |
| `to_date` | `str \| None` | `None` | - | End date (YYYY-MM-DD or ISO) |
| `min_amount` | `float \| None` | `None` | - | Min absolute amount |
| `max_amount` | `float \| None` | `None` | - | Max absolute amount |
| `direction` | `str \| None` | `None` | Must be "credit" or "charge" | Filter positive/negative |
| `sort_by` | `str` | `"created_at"` | Must be "created_at", "amount", or "transaction_type" | Sort field |
| `sort_order` | `str` | `"desc"` | Must be "asc" or "desc" | Sort order |
{
"status": "success",
"transactions": [{
"id": int, "user_id": int, "amount": float,
"transaction_type": str, "description": str,
"balance_before": float, "balance_after": float,
"created_at": str, "payment_id": int | None,
"metadata": dict, "created_by": str | None
}],
"pagination": {
"total": int, # Count of returned items (not DB total)
"limit": int, "offset": int, "has_more": bool
},
"filters_applied": {all filter values},
"timestamp": str
}
get_credits_transactions_endpoint(...)
├── Depends(require_admin)
├── Validate direction, sort_by, sort_order
├── get_all_transactions(limit+1, ...) # src/db/credit_transactions.py:290
│ ├── get_supabase_client()
│ ├── Build query: SELECT * FROM credit_transactions
│ │ + optional .eq("user_id", user_id)
│ │ + optional .eq("transaction_type", type)
│ │ + optional .gte("created_at", from_date)
│ │ + optional .lte("created_at", to_date)
│ │ + optional .gt("amount", 0) for "credit" direction
│ │ + optional .lt("amount", 0) for "charge" direction
│ │ + .order(sort_by, desc=desc_order)
│ ├── If min_amount/max_amount:
│ │ └── Fetch ALL, filter client-side by abs(amount), then paginate
│ └── Else:
│ └── .range(offset, offset+limit-1) — DB-side pagination
├── has_more = len(results) > limit
├── Trim to limit
├── Format transactions list
└── Return response dict
| Operation | Table | Columns | Filters | Pagination |
|---|---|---|---|---|
| SELECT | `credit_transactions` | `*` | user_id, transaction_type, date range, direction | DB-side `.range()` OR client-side (if amount filters) |
Performance Note: When min_amount or max_amount are used, the query fetches ALL matching rows and filters client-side, then applies pagination. This can be slow for large datasets.
has_more detection: Fetches limit + 1 rows; if more than limit returned, has_more = True.
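The limit + 1 technique can be shown on a plain list. This is a generic sketch of the pattern, with a hypothetical `paginate` helper standing in for the database query.

```python
from typing import Any


def paginate(rows: list[Any], limit: int, offset: int = 0) -> tuple[list[Any], bool]:
    """Return one page plus a has_more flag using the fetch-one-extra trick."""
    window = rows[offset: offset + limit + 1]  # ask for one row beyond the page
    has_more = len(window) > limit             # the extra row proves more exist
    return window[:limit], has_more            # trim it before returning
```

This avoids a separate COUNT query, which is also why `pagination.total` in the response reflects the number of returned items rather than the database total.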
None.
None.
| Error Path | Status Code | Detail |
|---|---|---|
| Auth/admin failures | 401/402/403/404/429 | Various |
| Invalid direction | 400 | "direction must be 'credit' or 'charge'" |
| Invalid sort_by | 400 | "sort_by must be 'created_at', 'amount', or 'transaction_type'" |
| Invalid sort_order | 400 | "sort_order must be 'asc' or 'desc'" |
| Any exception | 500 | "Failed to get credit transactions" |
flowchart TD
A[GET /credits/transactions] --> B[require_admin]
B -->|Not admin| B1[403]
B -->|Admin| C[Validate direction, sort_by, sort_order]
C -->|Invalid| C1[400 validation error]
C -->|Valid| D[get_all_transactions with limit+1]
D --> E[Build Supabase query with filters]
E --> F{min/max amount filters?}
F -->|Yes| G[Fetch ALL rows]
G --> H[Filter client-side by abs amount]
H --> I[Apply offset + limit pagination]
F -->|No| J[DB-side .range pagination]
I --> K[Determine has_more]
J --> K
K --> L[Trim to limit]
L --> M[Format transaction dicts]
M --> N[Return response with pagination]
D -->|Exception| O[500 Failed to get transactions]
Issue: #1729
Handler: add_credits_endpoint() in src/routes/credits.py (line 189)
Tags: ["credits", "admin"]
Authentication: Required - require_admin (admin role)
| Field | Type | Default | Validation |
|---|---|---|---|
| `user_id` | `int` | required | Target user ID |
| `amount` | `float` | required | `gt=0` (must be positive) |
| `reason` | `str` | required | min_length=10 |
| `description` | `str` | `"Admin credit addition"` | |
| `metadata` | `dict[str, Any] \| None` | `None` | Optional |
| Field | Type | Default |
|---|---|---|
| `status` | `str` | - |
| `message` | `str` | - |
| `user_id` | `int` | - |
| `previous_balance` | `float` | - |
| `new_balance` | `float` | - |
| `amount_changed` | `float` | - |
| `transaction_id` | `int \| None` | `None` |
| `timestamp` | `str` | - |
add_credits_endpoint(request, admin_user)
├── Depends(require_admin) # (admin auth chain)
├── _validate_admin_credit_grant(amount, admin) # src/routes/credits.py:134
│ ├── Check amount <= Config.ADMIN_MAX_CREDIT_GRANT (default $1000)
│ │ └── If exceeded → 400
│ ├── get_admin_daily_grant_total(admin_id) # src/db/credit_transactions.py:672
│ │ ├── get_supabase_client()
│ │ └── SELECT amount FROM credit_transactions
│ │ WHERE transaction_type = 'admin_credit'
│ │ AND created_by = 'admin:{id}'
│ │ AND created_at >= 24h_ago
│ │ AND amount > 0
│ │ └── Sum amounts; on error → return inf (fail closed)
│ └── Check daily_total + amount <= ADMIN_DAILY_GRANT_LIMIT (default $5000)
│ └── If exceeded → 400
├── get_supabase_client()
├── SELECT id, credits FROM users WHERE id = user_id
│ └── If not found → 404
├── Calculate balance_after = balance_before + amount
├── UPDATE users SET credits = balance_after, updated_at = now()
│ WHERE id = user_id
│ └── If no data returned → 500
├── log_credit_transaction(...) # src/db/credit_transactions.py:68
│ ├── execute_with_retry(do_insert, max_retries=2)
│ │ ├── get_supabase_client()
│ │ └── INSERT INTO credit_transactions
│ │ {user_id, amount, transaction_type='admin_credit',
│ │ description, balance_before, balance_after,
│ │ metadata={...reason, admin_user_id, admin_username},
│ │ created_by='admin:{id}', created_at}
│ └── On connection error → refresh_supabase_client() and retry
│ └── On final failure → capture_database_error (Sentry)
└── Return CreditResponse
| Step | Operation | Table | Columns | Filters |
|---|---|---|---|---|
| Safety check | SELECT | `credit_transactions` | `amount` | `transaction_type='admin_credit', created_by='admin:{id}', created_at >= 24h_ago, amount > 0` |
| Get user | SELECT | `users` | `id, credits` | `.eq("id", user_id)` |
| Update balance | UPDATE | `users` | `credits, updated_at` | `.eq("id", user_id)` |
| Audit trail | INSERT | `credit_transactions` | All fields | With retry logic |
None directly. execute_with_retry may trigger refresh_supabase_client() which resets the HTTP connection pool.
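A retry wrapper of the kind described here could be sketched as follows. It is not the project's `execute_with_retry`: the name `retry_with_refresh`, the use of `ConnectionError` as the retryable error, and the reading of `max_retries=2` as "two retries after the first attempt" are all assumptions.

```python
from typing import Any, Callable, Optional


def retry_with_refresh(operation: Callable[[], Any],
                       refresh: Callable[[], None],
                       max_retries: int = 2) -> Any:
    """Run operation; on a connection error, refresh the client and retry."""
    last_error: Optional[Exception] = None
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except ConnectionError as exc:
            last_error = exc
            if attempt < max_retries:
                refresh()  # e.g. rebuild the HTTP connection pool
    assert last_error is not None
    raise last_error  # final failure: the caller reports it (e.g. to Sentry)


class FlakyInsert:
    """Hypothetical insert that fails a fixed number of times, then succeeds."""

    def __init__(self, failures: int) -> None:
        self.remaining = failures
        self.calls = 0

    def __call__(self) -> dict:
        self.calls += 1
        if self.remaining > 0:
            self.remaining -= 1
            raise ConnectionError("connection reset")
        return {"id": 1}
```

Since the balance has already been updated when the audit insert runs, a final failure here leaves a credited user without a matching `credit_transactions` row, which is why the endpoint returns `transaction_id: None` in that case.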
None emitted directly. The capture_database_error Sentry call is the observability hook.
| Control | Config Var | Default | Behavior |
|---|---|---|---|
| Per-transaction cap | `ADMIN_MAX_CREDIT_GRANT` | $1000 | 400 if exceeded |
| 24-hour rolling limit | `ADMIN_DAILY_GRANT_LIMIT` | $5000 | 400 if cumulative exceeds |
| Audit trail | - | Always | Transaction logged with admin ID, reason |
| Fail-closed | - | On query error | `get_admin_daily_grant_total` returns `inf` |
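The two limits and the fail-closed behaviour combine as sketched below. Function names are hypothetical, a boolean result stands in for the HTTP 400, and the defaults simply echo the table above.

```python
import math
from typing import Callable, Iterable


def daily_total_or_inf(fetch_amounts: Callable[[], Iterable[float]]) -> float:
    """Sum the admin's grants from the last 24h; fail closed on any error."""
    try:
        return float(sum(fetch_amounts()))
    except Exception:
        return math.inf  # unknown total is treated as "limit already reached"


def grant_allowed(amount: float, daily_total: float,
                  max_grant: float = 1000.0, daily_limit: float = 5000.0) -> bool:
    """True if the grant passes both safety limits (False models the 400)."""
    if amount > max_grant:
        return False  # per-transaction cap
    if daily_total + amount > daily_limit:
        return False  # 24-hour rolling limit
    return True


def failing_query() -> Iterable[float]:
    """Hypothetical database outage."""
    raise RuntimeError("database unavailable")
```

Returning infinity on a query error means an outage blocks all grants instead of letting them bypass the daily limit.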
| Error Path | Status Code | Detail |
|---|---|---|
| Auth/admin failures | 401/402/403/404/429 | Various |
| Pydantic validation (amount<=0, reason<10 chars) | 422 | Automatic |
| Amount exceeds per-transaction cap | 400 | Detailed message with limit info |
| Would exceed daily grant limit | 400 | Detailed message with remaining budget |
| User not found | 404 | "User {id} not found" |
| Balance update fails | 500 | "Failed to update user credits" |
| Transaction log fails | Continues | Returns `transaction_id: None` |
| Any other exception | 500 | "Failed to add credits" |
flowchart TD
A[POST /credits/add] --> B[Pydantic validation]
B -->|Invalid| B1[422]
B -->|Valid| C[require_admin]
C -->|Not admin| C1[403]
C -->|Admin| D[_validate_admin_credit_grant]
D --> E{Amount <= max single grant?}
E -->|No| E1[400 Exceeds per-transaction cap]
E -->|Yes| F[Query 24h admin grant total]
F --> G{daily_total + amount <= daily limit?}
G -->|No| G1[400 Exceeds daily limit]
G -->|Yes| H[SELECT user by ID]
H -->|Not found| H1[404]
H -->|Found| I[Calculate new balance]
I --> J[UPDATE users credits]
J -->|Fails| J1[500]
J -->|Success| K[log_credit_transaction with retry]
K --> L[Return CreditResponse]
Issue: #1730
Handler: adjust_credits_endpoint() in src/routes/credits.py (line 279)
Tags: ["credits", "admin"]
Authentication: Required - require_admin (admin role)
| Field | Type | Default | Validation |
|---|---|---|---|
| `user_id` | `int` | required | Target user ID |
| `amount` | `float` | required | Can be positive (add) or negative (remove) |
| `description` | `str` | `"Admin credit adjustment"` | |
| `reason` | `str` | required | min_length=10 |
| `metadata` | `dict[str, Any] \| None` | `None` | Optional |
adjust_credits_endpoint(request, admin_user)
├── Depends(require_admin)
├── If amount > 0:
│ └── _validate_admin_credit_grant(amount, admin) # Same safety checks as /add
│ ├── Per-transaction cap check
│ └── 24h rolling limit check
├── get_supabase_client()
├── SELECT id, credits FROM users WHERE id = user_id
│ └── If not found → 404
├── Calculate balance_after = balance_before + amount
│ └── If balance_after < 0 → 400 "negative balance"
├── UPDATE users SET credits = balance_after, updated_at = now()
├── Determine transaction_type:
│ ├── amount > 0 → TransactionType.ADMIN_CREDIT
│ └── amount <= 0 → TransactionType.ADMIN_DEBIT
├── log_credit_transaction(...)
│ └── INSERT INTO credit_transactions (with retry)
└── Return CreditResponse
| Step | Operation | Table | Columns | Filters |
|---|---|---|---|---|
| Safety (positive only) | SELECT | `credit_transactions` | `amount` | admin daily grants |
| Get user | SELECT | `users` | `id, credits` | `.eq("id", user_id)` |
| Update balance | UPDATE | `users` | `credits, updated_at` | `.eq("id", user_id)` |
| Audit trail | INSERT | `credit_transactions` | All fields | With retry |
None.
None.
- Amount can be negative -- allows credit removal
- Negative balance protection -- raises 400 if adjustment would go below 0
- Transaction type varies -- `admin_credit` for positive, `admin_debit` for negative
- Safety controls only for positive -- `_validate_admin_credit_grant` skipped for debits
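The adjustment rules listed above reduce to a few lines of arithmetic. The sketch below uses a hypothetical `plan_adjustment` helper and a `ValueError` in place of the HTTP 400; per the call tree, an amount of zero is classified as a debit.

```python
def plan_adjustment(balance_before: float, amount: float) -> dict:
    """Compute the outcome of an admin adjustment without touching the database."""
    balance_after = balance_before + amount
    if balance_after < 0:
        # Stand-in for the 400 "negative balance" response.
        raise ValueError(
            f"adjustment of {amount} would make balance {balance_before} negative"
        )
    return {
        "balance_after": balance_after,
        # amount > 0 is a credit; amount <= 0 is a debit.
        "transaction_type": "admin_credit" if amount > 0 else "admin_debit",
        # Grant limits apply only to positive amounts.
        "needs_grant_validation": amount > 0,
    }
```

Debits skip the grant limits entirely, so the negative-balance check is the only guard on credit removal.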
| Error Path | Status Code | Detail |
|---|---|---|
| Auth/admin failures | 401/402/403/404/429 | Various |
| Pydantic validation (reason < 10 chars) | 422 | Automatic |
| Positive amount exceeds cap/daily limit | 400 | Detailed limit message |
| User not found | 404 | "User {id} not found" |
| Would result in negative balance | 400 | Shows current balance + adjustment |
| Balance update fails | 500 | "Failed to update user credits" |
| Any other exception | 500 | "Failed to adjust credits" |
flowchart TD
A[POST /credits/adjust] --> B[Pydantic validation]
B -->|Invalid| B1[422]
B -->|Valid| C[require_admin]
C -->|Not admin| C1[403]
C -->|Admin| D{amount > 0?}
D -->|Yes| E[_validate_admin_credit_grant]
E -->|Exceeds cap| E1[400]
E -->|OK| F[SELECT user]
D -->|No/Zero| F
F -->|Not found| F1[404]
F -->|Found| G[Calculate new balance]
G --> H{balance_after < 0?}
H -->|Yes| H1[400 Negative balance]
H -->|No| I[UPDATE users credits]
I -->|Fails| I1[500]
I -->|Success| J{amount > 0?}
J -->|Yes| K[type = ADMIN_CREDIT]
J -->|No| L[type = ADMIN_DEBIT]
K --> M[log_credit_transaction]
L --> M
M --> N[Return CreditResponse]
Issue: #1731
Handler: bulk_add_credits_endpoint() in src/routes/credits.py (line 385)
Tags: ["credits", "admin"]
Authentication: Required - require_admin (admin role)
| Field | Type | Default | Validation |
|---|---|---|---|
| `user_ids` | `list[int]` | required | min_length=1, max_length=100 |
| `amount` | `float` | required | gt=0 |
| `reason` | `str` | required | min_length=10 |
| `description` | `str` | `"Bulk credit addition"` | |
| `metadata` | `dict[str, Any] \| None` | `None` | Optional |
| Field | Type | Description |
|---|---|---|
| `status` | `str` | `"success"`, `"partial"`, or `"failed"` |
| `message` | `str` | Summary message |
| `total_users` | `int` | Count of unique users processed |
| `successful` | `int` | Users successfully credited |
| `failed` | `int` | Users that failed |
| `amount_per_user` | `float` | Amount per user |
| `total_credits_added` | `float` | `amount * successful` |
| `results` | `list[dict]` | Per-user result details |
| `timestamp` | `str` | ISO timestamp |
bulk_add_credits_endpoint(request, admin_user)
├── Depends(require_admin)
├── Deduplicate user_ids → unique_user_ids
├── _validate_admin_credit_grant(amount, admin,
│ is_bulk=True, bulk_user_count=len(unique)) # Safety controls
│ ├── Per-transaction cap: amount <= ADMIN_MAX_CREDIT_GRANT
│ └── Daily limit: daily_total + (amount * user_count) <= ADMIN_DAILY_GRANT_LIMIT
├── get_supabase_client()
├── Batch fetch: SELECT id, credits, username FROM users
│ WHERE id IN (unique_user_ids) # Single query for all users
├── For each unique user_id:
│ ├── Look up user from batch results
│ ├── If not found → record failure, continue
│ ├── Calculate balance_after
│ ├── UPDATE users SET credits, updated_at WHERE id
│ │ └── If fails → record failure, continue
│ ├── log_credit_transaction(type=ADMIN_CREDIT)
│ │ └── metadata includes bulk_operation=True
│ └── Record success with details
├── Determine status:
│ ├── failed=0 → "success"
│ ├── successful>0 && failed>0 → "partial"
│ └── successful=0 → "failed"
└── Return BulkCreditResponse
| Step | Operation | Table | Columns | Filters |
|---|---|---|---|---|
| Safety | SELECT | `credit_transactions` | `amount` | Admin daily grant query |
| Batch fetch | SELECT | `users` | `id`, `credits`, `username` | `.in_("id", unique_user_ids)` |
| Per-user update | UPDATE | `users` | `credits`, `updated_at` | `.eq("id", user_id)` (N queries) |
| Per-user audit | INSERT | `credit_transactions` | All fields | N queries with retry |
Performance Note: The batch SELECT is efficient (single query), but updates and transaction logs are per-user (N+N queries for N users, max 100).
None.
None.
Same as /credits/add but with bulk awareness:
- Per-transaction cap applies to the `amount` (not total)
- Daily limit check uses `amount * unique_user_count` as the total grant amount
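A minimal sketch of the two bulk checks, assuming the cap and daily limit are passed in as plain numbers (in the service they come from `ADMIN_MAX_CREDIT_GRANT` and `ADMIN_DAILY_GRANT_LIMIT`). The function name `check_bulk_grant` is hypothetical, and `ValueError` stands in for the HTTP 400 response.

```python
def check_bulk_grant(
    amount: float,
    unique_user_count: int,
    daily_total: float,
    max_grant: float,
    daily_limit: float,
) -> float:
    """Return the total grant if both limits hold, otherwise raise."""
    # The per-transaction cap is checked against the per-user amount only.
    if amount > max_grant:
        raise ValueError(f"Amount {amount} exceeds per-transaction cap {max_grant}")

    # The daily limit is checked against the whole batch.
    total_grant = amount * unique_user_count
    if daily_total + total_grant > daily_limit:
        remaining = max(daily_limit - daily_total, 0.0)
        raise ValueError(
            f"Bulk grant of {total_grant} exceeds daily limit; remaining budget {remaining}"
        )
    return total_grant
```

Granting 10 credits to 3 users with 50 already granted today passes against a daily limit of 100; the same grant to 6 users does not.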
| Error Path | Status Code | Detail |
|---|---|---|
| Auth/admin failures | 401/402/403/404/429 | Various |
| Pydantic: empty user_ids, >100, amount<=0, reason<10 | 422 | Automatic |
| Amount exceeds per-transaction cap | 400 | Detailed message |
| Total grant exceeds daily limit | 400 | Detailed message with remaining budget |
| Individual user not found | Continues | Recorded in results as "failed" |
| Individual update fails | Continues | Recorded in results as "failed" |
| Individual exception | Continues | Logged, recorded as "failed" |
| Any outer exception | 500 | "Failed to perform bulk credit addition" |
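The "Continues" rows above mean that per-user failures never abort the request; they only shape the final status. The sketch below shows deduplication and status selection under two assumptions: that deduplication preserves first-seen order, and that the helper names (`dedupe_user_ids`, `overall_status`) are illustrative rather than taken from the handler.

```python
def dedupe_user_ids(user_ids: list[int]) -> list[int]:
    """Drop repeated IDs while keeping first-seen order (ordering is an assumption)."""
    return list(dict.fromkeys(user_ids))


def overall_status(successful: int, failed: int) -> str:
    """Map per-user counters to the documented status strings."""
    if failed == 0:
        return "success"
    if successful > 0:
        return "partial"
    return "failed"
```

For example, `dedupe_user_ids([7, 3, 7, 9, 3])` leaves three users to process, and two successes with one failure produce `"partial"`.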
flowchart TD
A[POST /credits/bulk-add] --> B[Pydantic validation]
B -->|Invalid| B1[422]
B -->|Valid| C[require_admin]
C -->|Not admin| C1[403]
C -->|Admin| D[Deduplicate user_ids]
D --> E[_validate_admin_credit_grant with bulk=True]
E -->|Exceeds limits| E1[400]
E -->|OK| F[Batch SELECT users by IDs]
F --> G[For each unique user]
G --> H{User found in batch?}
H -->|No| I[Record failure, continue]
H -->|Yes| J[Calculate new balance]
J --> K[UPDATE user credits]
K -->|Fails| L[Record failure, continue]
K -->|Success| M[log_credit_transaction]
M --> N[Record success]
I --> O{More users?}
L --> O
N --> O
O -->|Yes| G
O -->|No| P[Determine overall status]
P --> Q[Return BulkCreditResponse]
Issue: #1732
Handler: refund_credits_endpoint() in src/routes/credits.py (line 536)
Tags: ["credits", "admin"]
Authentication: Required - require_admin (admin role)
| Field | Type | Default | Validation |
|---|---|---|---|
| `user_id` | `int` | required | Target user ID |
| `amount` | `float` | required | gt=0 (must be positive) |
| `original_transaction_id` | `int \| None` | `None` | Optional reference |
| `reason` | `str` | `"Refund"` | Reason for refund |
| `metadata` | `dict[str, Any] \| None` | `None` | Optional |
refund_credits_endpoint(request, admin_user)
├── Depends(require_admin)
├── get_supabase_client()
├── SELECT id, credits FROM users WHERE id = user_id
│ └── If not found → 404
├── Calculate balance_after = balance_before + amount
├── UPDATE users SET credits = balance_after, updated_at = now()
│ WHERE id = user_id
│ └── If fails → 500
├── log_credit_transaction( # src/db/credit_transactions.py:68
│ user_id, amount, type=REFUND,
│ description="Refund: {reason}",
│ metadata={reason, original_transaction_id,
│ admin_user_id, admin_username},
│ created_by="admin:{id}")
│ └── execute_with_retry → INSERT INTO credit_transactions
└── Return CreditResponse
| Step | Operation | Table | Columns | Filters |
|---|---|---|---|---|
| Get user | SELECT | `users` | `id`, `credits` | `.eq("id", user_id)` |
| Update balance | UPDATE | `users` | `credits`, `updated_at` | `.eq("id", user_id)` |
| Audit trail | INSERT | `credit_transactions` | All fields | With retry logic |
None.
None.
- No admin safety controls -- `_validate_admin_credit_grant` is NOT called for refunds
- Transaction type is `REFUND` -- not `ADMIN_CREDIT`
- Description prefixed with `"Refund: "`
- Tracks `original_transaction_id` in metadata for linking to original charge
- `reason` field has a default ("Refund") -- not strictly required like /add
| Error Path | Status Code | Detail |
|---|---|---|
| Auth/admin failures | 401/402/403/404/429 | Various |
| Pydantic validation (amount<=0) | 422 | Automatic |
| User not found | 404 | "User {id} not found" |
| Balance update fails | 500 | "Failed to update user credits" |
| Transaction log fails | Continues | Returns `transaction_id: None` |
| Any other exception | 500 | "Failed to refund credits" |
flowchart TD
A[POST /credits/refund] --> B[Pydantic validation]
B -->|Invalid| B1[422]
B -->|Valid| C[require_admin]
C -->|Not admin| C1[403]
C -->|Admin| D{try block}
D --> E[SELECT user by ID]
E -->|Not found| E1[404]
E -->|Found| F[Calculate balance_after = before + amount]
F --> G[UPDATE users credits]
G -->|Fails| G1[500]
G -->|Success| H[log_credit_transaction type=REFUND]
H --> I[Return CreditResponse]
D -->|HTTPException| J[Re-raise]
D -->|Other| K[500 Failed to refund credits]
Note: Unlike /credits/add, refunds bypass the admin grant safety controls (ADMIN_MAX_CREDIT_GRANT and ADMIN_DAILY_GRANT_LIMIT). This is by design -- refunds reverse existing charges.
2 endpoints
Issue: #1734
Returns real-time concurrency gate statistics including active requests, queued requests, utilization percentages, and overall health status. Designed for diagnosing 503 Service Unavailable errors caused by server capacity exhaustion.
Router: APIRouter(prefix="/api/diagnostics", tags=["diagnostics"])
Auth: None (public endpoint)
HTTP Method: GET
Return Type: dict[str, Any]
No query parameters, path parameters, or request body.
{
"active_requests": 5,
"queued_requests": 2,
"concurrency_limit": 20,
"queue_size_limit": 50,
"queue_timeout_seconds": 10.0,
"utilization_percent": 25.0,
"queue_utilization_percent": 4.0,
"status": "healthy",
"available_slots": 15,
"available_queue_slots": 48
}
{
"error": "error message string",
"status": "unknown",
"concurrency_limit": 20,
"queue_size_limit": 50
}
- `get_concurrency_stats()` in `src/routes/diagnostics.py` (line 19-90)
- `concurrency_active` / `concurrency_queued` from `src/middleware/concurrency_middleware.py` (lazy import at line 44-47)
- `Config.CONCURRENCY_LIMIT`, `Config.CONCURRENCY_QUEUE_SIZE`, `Config.CONCURRENCY_QUEUE_TIMEOUT` from `src/config/config.py`
- `concurrency_active` = `Gauge("concurrency_active_requests", "Number of requests currently being processed")` (line 27-30)
- `concurrency_queued` = `Gauge("concurrency_queued_requests", "Number of requests waiting in the admission queue")` (line 31-34)
- Values read via `._value._value` (internal prometheus_client Gauge value access)
- `CONCURRENCY_LIMIT` = `int(os.environ.get("CONCURRENCY_LIMIT", "20"))` (line 437)
- `CONCURRENCY_QUEUE_SIZE` = `int(os.environ.get("CONCURRENCY_QUEUE_SIZE", "50"))` (line 438)
- `CONCURRENCY_QUEUE_TIMEOUT` = `float(os.environ.get("CONCURRENCY_QUEUE_TIMEOUT", "10.0"))` (line 439)
None.
None.
| Metric Name | Type | Labels | Description |
|---|---|---|---|
| `concurrency_active_requests` | Gauge | none | Current requests being processed |
| `concurrency_queued_requests` | Gauge | none | Current requests waiting in queue |
Note: This endpoint reads these metrics. They are written by ConcurrencyMiddleware in src/middleware/concurrency_middleware.py.
Additionally, the middleware defines:
| Metric Name | Type | Labels | Description |
|---|---|---|---|
| `concurrency_rejected_total` | Counter | `reason` (`queue_full`, `queue_timeout`) | Total rejected requests |
None. Return type is dict[str, Any].
- Standard middleware pipeline applies (sentry, observability, timeout, security, gzip, trace)
- `ConcurrencyMiddleware` applies unless path is in `CONCURRENCY_EXEMPT_PATHS` (`/health`, `/metrics`, `/ready`). `/api/diagnostics/concurrency` is NOT exempt, so it is subject to concurrency gating itself.
| Error | Status | Condition |
|---|---|---|
| Generic Exception caught | 200 (degraded) | Any exception in try block returns `{"error": str(e), "status": "unknown", ...}` |
No HTTPException is ever raised. All errors are caught and returned in the response body.
| Condition | Status |
|---|---|
| `utilization >= 90%` OR `queue_utilization >= 80%` | `"critical"` |
| `utilization >= 70%` OR `queue_utilization >= 60%` | `"warning"` |
| Otherwise | `"healthy"` |
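The thresholds above translate directly into a small function. This is a sketch under stated assumptions: `concurrency_snapshot` is a hypothetical name, the gauge values are passed in as plain integers rather than read from Prometheus, and the zero-limit guard is an addition for safety, not documented behavior.

```python
def concurrency_snapshot(
    active: int, queued: int, limit: int = 20, queue_size: int = 50
) -> dict:
    """Compute utilization percentages and the derived status string."""
    # Guard against a zero limit (assumption; the source does not describe this case).
    utilization = 100.0 * active / limit if limit > 0 else 0.0
    queue_utilization = 100.0 * queued / queue_size if queue_size > 0 else 0.0

    if utilization >= 90 or queue_utilization >= 80:
        status = "critical"
    elif utilization >= 70 or queue_utilization >= 60:
        status = "warning"
    else:
        status = "healthy"

    return {
        "active_requests": active,
        "queued_requests": queued,
        "utilization_percent": utilization,
        "queue_utilization_percent": queue_utilization,
        "status": status,
        "available_slots": limit - active,
        "available_queue_slots": queue_size - queued,
    }
```

With the default limits, 5 active and 2 queued requests give 25% and 4% utilization and a `"healthy"` status, matching the sample response; 18 active requests reach the 90% threshold and report `"critical"`.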
flowchart TD
A[GET /api/diagnostics/concurrency] --> B{Try block}
B --> C[Import concurrency_active, concurrency_queued from middleware]
C --> D[Read Prometheus Gauge internal values]
D --> E[Calculate utilization_percent = active/CONCURRENCY_LIMIT * 100]
E --> F[Calculate queue_utilization_percent = queued/CONCURRENCY_QUEUE_SIZE * 100]
F --> G{utilization >= 90 OR queue_util >= 80?}
G -->|Yes| H[status = critical]
G -->|No| I{utilization >= 70 OR queue_util >= 60?}
I -->|Yes| J[status = warning]
I -->|No| K[status = healthy]
H --> L[Return full stats dict with status]
J --> L
K --> L
B -->|Exception| M[Log error]
M --> N[Return error dict with status=unknown]
get_concurrency_stats()
├── src/middleware/concurrency_middleware.py
│ ├── concurrency_active (Prometheus Gauge)
│ └── concurrency_queued (Prometheus Gauge)
├── src/config/config.py (Config)
│ ├── CONCURRENCY_LIMIT (env: CONCURRENCY_LIMIT, default: 20)
│ ├── CONCURRENCY_QUEUE_SIZE (env: CONCURRENCY_QUEUE_SIZE, default: 50)
│ └── CONCURRENCY_QUEUE_TIMEOUT (env: CONCURRENCY_QUEUE_TIMEOUT, default: 10.0)
└── logging (stdlib)
Issue: #1735
Returns a summary of provider response times from Prometheus metrics, identifying slow providers (>30s response times) that may be contributing to concurrency slot blocking and 503 errors.
Router: APIRouter(prefix="/api/diagnostics", tags=["diagnostics"])
Auth: None (public endpoint)
HTTP Method: GET
Return Type: dict[str, Any]
No query parameters, path parameters, or request body.
{
"metrics_available": true,
"slow_request_counts": {
"openrouter/gpt-4": {
"slow": 5,
"very_slow": 2
}
},
"note": "Use Prometheus/Grafana for detailed timing histograms. Query: provider_response_duration_seconds",
"thresholds": {
"slow": "30-45 seconds",
"very_slow": ">45 seconds"
}
}
{
"metrics_available": false,
"error": "error message",
"note": "Provider timing metrics are exposed via Prometheus at /metrics endpoint"
}
- `get_provider_timing_summary()` in `src/routes/diagnostics.py` (line 93-141)
- `provider_slow_requests_total` from `src/services/prometheus_metrics.py` (lazy import at line 107-109)
- `provider_slow_requests_total` = `Counter("provider_slow_requests_total", "Total slow provider requests (>30s) by severity level", ["provider", "model", "severity"])` (line 777-781)
- Labels: `provider`, `model`, `severity` (values: `slow` for 30-45s, `very_slow` for >45s)
- Uses `provider_slow_requests_total.collect()[0].samples` to iterate all recorded samples
- Each sample has `.labels` dict and `.value` float
- Groups by `{provider}/{model}` key with severity breakdown
None.
None.
| Metric Name | Type | Labels | Description |
|---|---|---|---|
| `provider_slow_requests_total` | Counter | `provider`, `model`, `severity` | Total slow provider requests (>30s) |
Severity label values:
- `slow`: 30-45 seconds response time
- `very_slow`: >45 seconds response time
This metric is written by provider client modules elsewhere in the codebase and read by this endpoint.
None. Return type is dict[str, Any].
- Standard middleware pipeline applies (sentry, observability, timeout, security, gzip, trace)
- Subject to `ConcurrencyMiddleware` (not in exempt paths)
| Error | Status | Condition |
|---|---|---|
| Generic Exception caught | 200 (degraded) | Any exception returns `{"metrics_available": false, "error": str(e), ...}` |
No HTTPException is raised. Errors logged at WARNING level via logger.warning().
- Import `provider_slow_requests_total` Counter from prometheus_metrics
- Call `.collect()[0].samples` to get all recorded samples
- For each sample with `count > 0`:
  - Extract `provider`, `model`, `severity` labels
  - Group by `"{provider}/{model}"` key
  - Store severity counts as `{severity: int(count)}`
- Return grouped counts with threshold documentation
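The grouping steps above can be sketched without Prometheus installed. Here `Sample` is a hypothetical stand-in for the objects in `.collect()[0].samples` (only the `.labels` and `.value` attributes described above are modeled), and the `"unknown"` fallbacks for missing labels are an assumption.

```python
from typing import NamedTuple


class Sample(NamedTuple):
    """Stand-in for a Prometheus sample: a label dict plus a float value."""
    labels: dict
    value: float


def summarize_slow_requests(samples: list) -> dict:
    """Group positive sample values by "provider/model", keyed by severity."""
    slow_counts: dict = {}
    for sample in samples:
        if sample.value <= 0:
            continue  # skip label combinations that never fired
        provider = sample.labels.get("provider", "unknown")
        model = sample.labels.get("model", "unknown")
        severity = sample.labels.get("severity", "unknown")
        key = f"{provider}/{model}"
        slow_counts.setdefault(key, {})[severity] = int(sample.value)
    return slow_counts
```

Feeding it two samples for `openrouter` / `gpt-4` with severities `slow` (5.0) and `very_slow` (2.0) reproduces the `slow_request_counts` object in the sample response.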
flowchart TD
A[GET /api/diagnostics/provider-timing] --> B{Try block}
B --> C[Import provider_slow_requests_total from prometheus_metrics]
C --> D[Collect all metric samples]
D --> E[Initialize slow_counts dict]
E --> F{For each sample}
F --> G[Extract provider, model, severity labels]
G --> H{count > 0?}
H -->|Yes| I[Group by provider/model key]
I --> J[Store severity count]
J --> F
H -->|No| F
F -->|Done| K[Return metrics_available=true with slow_request_counts]
B -->|Exception| L[Log warning]
L --> M[Return metrics_available=false with error]
get_provider_timing_summary()
├── src/services/prometheus_metrics.py
│ └── provider_slow_requests_total (Counter with labels: provider, model, severity)
│ └── prometheus_client.Counter
└── logging (stdlib)
12 endpoints
Issue: #1746
Returns the current status of the autonomous error monitoring background service, including whether it is enabled, running, scan interval, last scan time, and error counts.
Router: APIRouter(prefix="/error-monitor", tags=["error-monitor"])
Auth: None (public endpoint)
HTTP Method: GET
Return Type: dict (untyped)
No parameters.
{
"status": "ok",
"monitor": {
"enabled": true,
"running": true,
"auto_fix_enabled": true,
"scan_interval": 300,
"last_scan": "2026-03-04T11:55:00+00:00",
"errors_since_last_fix": 3,
"total_patterns": 15
}
}
- `autonomous_monitor_status()` in `src/routes/error_monitor.py` (line 71-83)
- `get_autonomous_monitor()` from `src/services/autonomous_monitor.py` (sync singleton)
- `monitor.get_status()` (async method)
- Returns or creates `AutonomousMonitor` singleton
Returns dict built from instance attributes:
{
"enabled": self.enabled,
"running": self.is_running,
"auto_fix_enabled": self.auto_fix_enabled,
"scan_interval": self.scan_interval,
"last_scan": self.last_scan.isoformat() if self.last_scan else None,
"errors_since_last_fix": self.errors_since_last_fix,
"total_patterns": len(self.error_monitor.error_patterns) if self.error_monitor else 0,
}
- In-memory dict of `{pattern_key: ErrorPattern}` objects
- Only populated if the monitor has been scanning
None.
None.
None.
None.
Standard pipeline + ConcurrencyMiddleware. No auth required.
| Exception | Status | Handler |
|---|---|---|
| Generic `Exception` | 500 | Caught at line 81-83, raises `HTTPException(500, detail=str(e))` |
flowchart TD
A[GET /error-monitor/autonomous/status] --> B[get_autonomous_monitor singleton]
B --> C[await monitor.get_status]
C --> D[Read instance attributes: enabled, running, auto_fix, interval, last_scan, errors_count]
D --> E[Read error_monitor.error_patterns count if initialized]
E --> F["Return {status: ok, monitor: status_dict}"]
B -->|Exception| G[500 HTTPException]
autonomous_monitor_status()
├── src/services/autonomous_monitor.py::get_autonomous_monitor() [sync singleton]
│ └── AutonomousMonitor.get_status()
│ ├── Instance attributes (enabled, is_running, auto_fix_enabled, scan_interval, last_scan, errors_since_last_fix)
│ └── self.error_monitor.error_patterns (len) if error_monitor initialized
└── logging (stdlib)
Issue: #1747
Fetches recent errors from Grafana Loki, analyzes them into structured error patterns with classification, severity, fixability assessment, and grouping of similar errors. Returns analyzed and deduplicated error patterns.
Router: APIRouter(prefix="/error-monitor", tags=["error-monitor"])
Auth: None (public endpoint)
HTTP Method: GET
Return Type: dict (untyped)
| Parameter | Type | Default | Validation | Description |
|---|---|---|---|---|
| `hours` | `int` | `1` | `ge=1, le=24` | Lookback period in hours |
| `limit` | `int` | `100` | `ge=1, le=1000` | Max raw errors to fetch from Loki |
{
"count": 5,
"hours": 1,
"errors": [
{
"error_type": "ConnectionError",
"message": "Provider timeout after 30s",
"category": "provider_error",
"severity": "high",
"file": "src/services/openrouter_client.py",
"line": 150,
"function": "send_request",
"stack_trace": "...",
"timestamp": "2026-03-04T11:50:00+00:00",
"count": 3,
"last_seen": "2026-03-04T11:58:00+00:00",
"examples": ["msg1", "msg2"],
"fixable": true,
"suggested_fix": "Add retry logic with exponential backoff for provider calls"
}
]
}
- `get_recent_errors()` in `src/routes/error_monitor.py` (line 86-104)
- `get_error_monitor()` -> `ErrorMonitor` singleton
- `monitor.fetch_recent_errors(hours, limit)` -> fetches from Loki
- `monitor.analyze_errors(raw_errors)` -> classifies and groups
- Checks `self.loki_enabled` and `self.loki_query_url` and `self.session`
- Makes HTTP GET to `{loki_base_url}/loki/api/v1/query_range`
- Query: `{level="ERROR"}`
- Params: `query`, `limit`, `direction=backward`
- Uses `httpx.AsyncClient(timeout=10.0)`
- Parses Loki response streams, extracts log entries (JSON or plain text)

For each raw error:
- `extract_error_details()` -> creates `ErrorPattern` with parsed file/line/function from stack trace
- `classify_error()` -> determines `ErrorCategory` and `ErrorSeverity` based on message content
- `determine_fixability()` -> sets `fixable` flag and `suggested_fix` based on category
- `group_similar_errors()` -> groups by `{category}:{message[:50]}` key, merges counts
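The grouping key in the last step is what turns many near-identical log lines into one pattern. A minimal sketch follows, assuming a pared-down `Pattern` dataclass (hypothetical; the real `ErrorPattern` has more fields) and that merging simply sums counts.

```python
from dataclasses import dataclass, replace


@dataclass
class Pattern:
    """Hypothetical, reduced stand-in for ErrorPattern."""
    category: str
    message: str
    count: int = 1


def group_similar_errors(patterns: list) -> dict:
    """Merge patterns sharing a category and the first 50 message characters."""
    grouped: dict = {}
    for pattern in patterns:
        key = f"{pattern.category}:{pattern.message[:50]}"
        existing = grouped.get(key)
        if existing is None:
            # Copy so the caller's objects are not mutated by later merges.
            grouped[key] = replace(pattern)
        else:
            existing.count += pattern.count
    return grouped
```

Two messages that differ only after their 50th character land under the same key, so their counts are combined; the same text under a different category stays separate.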
Classification rules (in order):
| Pattern Match | Category | Severity |
|---|---|---|
| Provider names + timeout/503/504 | PROVIDER_ERROR | HIGH |
| Provider names + 401/403 | AUTH_ERROR | HIGH |
| Provider names (other) | PROVIDER_ERROR | MEDIUM |
| supabase/postgresql/database/connection pool | DATABASE_ERROR | CRITICAL |
| rate limit / 429 | RATE_LIMIT_ERROR | MEDIUM |
| unauthorized/invalid api key/401 | AUTH_ERROR | HIGH |
| timeout/deadlineexceeded | TIMEOUT_ERROR | MEDIUM |
| validation/invalid | VALIDATION_ERROR | LOW |
| redis/cache | CACHE_ERROR | MEDIUM |
| stripe/resend/email/payment | EXTERNAL_SERVICE_ERROR | HIGH |
| (default) | INTERNAL_ERROR | MEDIUM |
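Because the rules are evaluated in order, a message such as "Invalid API key" is classified as an auth error before the broader `invalid` validation rule can match. The sketch below illustrates that ordering only. Case-insensitive substring matching is an assumption, `PROVIDER_NAMES` is a hypothetical placeholder list, and plain strings stand in for the `ErrorCategory` / `ErrorSeverity` enums.

```python
# Hypothetical provider list; the real set of names is not given in this document.
PROVIDER_NAMES = ("openrouter", "openai", "anthropic")


def classify_error(message: str) -> tuple:
    """Return (category, severity) using the first matching rule."""
    text = message.lower()

    def has(*needles: str) -> bool:
        return any(needle in text for needle in needles)

    if has(*PROVIDER_NAMES):
        if has("timeout", "503", "504"):
            return ("PROVIDER_ERROR", "HIGH")
        if has("401", "403"):
            return ("AUTH_ERROR", "HIGH")
        return ("PROVIDER_ERROR", "MEDIUM")
    if has("supabase", "postgresql", "database", "connection pool"):
        return ("DATABASE_ERROR", "CRITICAL")
    if has("rate limit", "429"):
        return ("RATE_LIMIT_ERROR", "MEDIUM")
    if has("unauthorized", "invalid api key", "401"):
        return ("AUTH_ERROR", "HIGH")
    if has("timeout", "deadlineexceeded"):
        return ("TIMEOUT_ERROR", "MEDIUM")
    if has("validation", "invalid"):
        return ("VALIDATION_ERROR", "LOW")
    if has("redis", "cache"):
        return ("CACHE_ERROR", "MEDIUM")
    if has("stripe", "resend", "email", "payment"):
        return ("EXTERNAL_SERVICE_ERROR", "HIGH")
    return ("INTERNAL_ERROR", "MEDIUM")
```

A provider timeout therefore never reaches the generic `TIMEOUT_ERROR` rule, since the provider branch returns first.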
| Category | Fixable | Suggested Fix |
|---|---|---|
| RATE_LIMIT_ERROR | Yes | Implement exponential backoff and request queuing |
| TIMEOUT_ERROR (provider) | Yes | Add retry logic with exponential backoff for provider calls |
| TIMEOUT_ERROR (other) | Yes | Increase timeout threshold or add connection pooling |
| CACHE_ERROR | Yes | Implement cache fallback to database queries |
| DATABASE_ERROR (pool) | Yes | Increase connection pool size or add fallback |
| DATABASE_ERROR (other) | Yes | Add database connection retry logic |
| AUTH_ERROR (invalid key) | Yes | Rotate API keys and update configuration |
| AUTH_ERROR (other) | Yes | Implement token refresh logic |
| All others | No | None |
None.
None.
None directly emitted.
| Service | Method | URL | Auth | Timeout |
|---|---|---|---|---|
| Grafana Loki | GET | `{LOKI_QUERY_URL_base}/loki/api/v1/query_range` | None (configured in ErrorMonitor) | 10s |
ErrorPattern dataclass (error_monitor.py line 50-82):
| Field | Type | Default | Description |
|---|---|---|---|
| `error_type` | `str` | required | Exception type name |
| `message` | `str` | required | Error message |
| `category` | `ErrorCategory` | required | Classified category enum |
| `severity` | `ErrorSeverity` | required | Classified severity enum |
| `file` | `str \| None` | required | Source file path |
| `line` | `int \| None` | required | Line number |
| `function` | `str \| None` | required | Function name |
| `stack_trace` | `str \| None` | required | Full stack trace |
| `timestamp` | `datetime` | required | When error occurred |
| `count` | `int` | `1` | Occurrence count |
| `last_seen` | `datetime \| None` | `None` (set to timestamp) | Last occurrence |
| `examples` | `list[str]` | `[]` | Example messages |
| `fixable` | `bool` | `False` | Whether auto-fixable |
| `suggested_fix` | `str \| None` | `None` | Fix suggestion |
| Exception | Status | Handler |
|---|---|---|
| Generic `Exception` | 500 | Caught at line 102-104, raises `HTTPException(500, detail=str(e))` |
| Loki fetch failures | 200 | `fetch_recent_errors()` returns `[]` on any error, resulting in `{"count": 0, "errors": []}` |
flowchart TD
A["GET /error-monitor/errors/recent?hours=1&limit=100"] --> B[get_error_monitor singleton]
B --> C[fetch_recent_errors from Loki]
C --> D{Loki enabled?}
D -->|No| E["Return empty list"]
D -->|Yes| F["HTTP GET Loki query_range {level='ERROR'}"]
F --> G[Parse Loki response streams]
G --> H[analyze_errors]
H --> I[For each raw error: extract_error_details]
I --> J[classify_error - determine category + severity]
J --> K[determine_fixability - set fixable + suggested_fix]
K --> L[group_similar_errors by category:message prefix]
L --> M["Return {count, hours, errors: patterns.to_dict()}"]
A -->|Exception| N[500 HTTPException]
get_recent_errors()
├── src/services/error_monitor.py::get_error_monitor() [async singleton]
│ └── ErrorMonitor
│ ├── fetch_recent_errors(hours, limit) -> Loki HTTP GET
│ │ ├── Config.LOKI_ENABLED
│ │ ├── Config.LOKI_QUERY_URL
│ │ └── httpx.AsyncClient (timeout=10s)
│ └── analyze_errors(raw_errors)
│ ├── extract_error_details() -> ErrorPattern
│ ├── classify_error() -> (ErrorCategory, ErrorSeverity)
│ ├── determine_fixability() -> (bool, str|None)
│ └── group_similar_errors() -> deduplicated dict
└── logging (stdlib)
Issue: #1748
Fetches recent errors from Loki and filters to only critical and high-severity errors, sorted by occurrence count descending. Uses the same Loki fetch + analysis pipeline as /errors/recent but with a severity filter.
Router: APIRouter(prefix="/error-monitor", tags=["error-monitor"])
Auth: None (public endpoint)
HTTP Method: GET
Return Type: dict (untyped)
| Parameter | Type | Default | Validation | Description |
|---|---|---|---|---|
| `hours` | `int` | `1` | `ge=1, le=24` | Lookback period in hours |
{
"count": 3,
"hours": 1,
"critical_errors": [
{
"error_type": "ConnectionError",
"message": "Supabase connection pool exhausted",
"category": "database_error",
"severity": "critical",
"count": 15,
"fixable": true,
"suggested_fix": "Increase connection pool size or add connection pooling fallback",
...
}
]
}
- `get_critical_errors()` in `src/routes/error_monitor.py` (line 107-123)
- `get_error_monitor()` -> `ErrorMonitor` singleton
- `monitor.get_critical_errors(hours=hours)`
- Calls `self.fetch_recent_errors(hours=hours)` -> Loki query
- Calls `self.analyze_errors(raw_errors)` -> classify + group
- Filters: `severity in [ErrorSeverity.CRITICAL, ErrorSeverity.HIGH]`
- Sorts by `pattern.count` descending
- `fetch_recent_errors()` -> Loki HTTP GET
- `analyze_errors()` -> extract, classify, determine fixability, group
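The filter-and-sort step can be sketched as follows. The enum members and their lowercase string values are assumptions inferred from the sample responses, and `Pattern` is a reduced, hypothetical stand-in for `ErrorPattern`.

```python
from dataclasses import dataclass
from enum import Enum


class ErrorSeverity(Enum):
    # Values assumed from the "severity" strings in the sample responses.
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


@dataclass
class Pattern:
    message: str
    severity: ErrorSeverity
    count: int


def select_critical(patterns: list) -> list:
    """Keep CRITICAL and HIGH patterns, most frequent first."""
    wanted = (ErrorSeverity.CRITICAL, ErrorSeverity.HIGH)
    selected = [p for p in patterns if p.severity in wanted]
    return sorted(selected, key=lambda p: p.count, reverse=True)
```

Note that the ordering is by count alone: a HIGH pattern seen 15 times sorts ahead of a CRITICAL pattern seen twice.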
None.
None.
None.
Same as #1747: HTTP GET to Grafana Loki /loki/api/v1/query_range with {level="ERROR"}.
Only returns patterns where `severity in [ErrorSeverity.CRITICAL, ErrorSeverity.HIGH]`.
Categories that map to CRITICAL/HIGH:
- CRITICAL: Database errors (supabase, postgresql, connection pool)
- HIGH: Provider errors with timeout/503/504, auth errors (401/403/unauthorized), external service errors (stripe, resend)
| Exception | Status | Handler |
|---|---|---|
| Generic `Exception` | 500 | Raises `HTTPException(500, detail=str(e))` |
flowchart TD
A["GET /error-monitor/errors/critical?hours=1"] --> B[get_error_monitor singleton]
B --> C[get_critical_errors hours=1]
C --> D[fetch_recent_errors from Loki]
D --> E[analyze_errors - classify + group]
E --> F["Filter: severity in [CRITICAL, HIGH]"]
F --> G[Sort by count descending]
G --> H["Return {count, hours, critical_errors}"]
A -->|Exception| I[500 HTTPException]
get_critical_errors() [route]
├── src/services/error_monitor.py::get_error_monitor()
│ └── ErrorMonitor.get_critical_errors()
│ ├── fetch_recent_errors() -> Loki HTTP GET
│ ├── analyze_errors() -> classify + group
│ └── filter severity in [CRITICAL, HIGH] + sort by count
└── logging (stdlib)
Issue: #1749
Fetches recent errors from Loki and filters to only errors that can be automatically fixed. Returns fixable errors sorted by severity then count descending. Each error includes a suggested_fix field.
Router: APIRouter(prefix="/error-monitor", tags=["error-monitor"])
Auth: None (public endpoint)
HTTP Method: GET
Return Type: dict (untyped)
| Parameter | Type | Default | Validation | Description |
|---|---|---|---|---|
| `hours` | `int` | `1` | `ge=1, le=24` | Lookback period in hours |
{
"count": 4,
"hours": 1,
"fixable_errors": [
{
"error_type": "TimeoutError",
"message": "Provider request timeout",
"category": "timeout_error",
"severity": "medium",
"fixable": true,
"suggested_fix": "Increase timeout threshold or add connection pooling",
"count": 8,
...
}
]
}
- `get_fixable_errors()` in `src/routes/error_monitor.py` (line 126-142)
- `get_error_monitor()` -> `ErrorMonitor` singleton
- `monitor.get_fixable_errors(hours=hours)`
- Calls `self.fetch_recent_errors(hours=hours)` -> Loki query
- Calls `self.analyze_errors(raw_errors)` -> classify + group + determine fixability
- Filters: `pattern.fixable == True`
- Sorts by `(severity.value, count)` descending
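A sketch of the filter and tuple sort, assuming the enum values are the lowercase strings that appear in the responses. Under that assumption the descending sort compares severity names as text, so `"medium"` sorts ahead of `"critical"`. The second function, with its hypothetical `SEVERITY_RANK` table, shows one way to order by urgency instead; it is not taken from the codebase.

```python
from dataclasses import dataclass
from enum import Enum


class ErrorSeverity(Enum):
    # Values assumed from the "severity" strings in the sample responses.
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


@dataclass
class Pattern:
    message: str
    severity: ErrorSeverity
    count: int
    fixable: bool


def fixable_as_documented(patterns: list) -> list:
    """Filter on fixable, then sort by (severity.value, count) descending."""
    fixable = [p for p in patterns if p.fixable]
    return sorted(fixable, key=lambda p: (p.severity.value, p.count), reverse=True)


# Hypothetical rank table for an urgency-based ordering.
SEVERITY_RANK = {
    ErrorSeverity.CRITICAL: 3,
    ErrorSeverity.HIGH: 2,
    ErrorSeverity.MEDIUM: 1,
    ErrorSeverity.LOW: 0,
}


def fixable_by_urgency(patterns: list) -> list:
    """Same filter, but rank severities numerically before comparing counts."""
    fixable = [p for p in patterns if p.fixable]
    return sorted(
        fixable, key=lambda p: (SEVERITY_RANK[p.severity], p.count), reverse=True
    )
```

Clients that need the most urgent fixable errors first may want to re-sort the response with a rank table like this rather than rely on the returned order.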
| Category | Fixable | Suggested Fix |
|---|---|---|
| RATE_LIMIT_ERROR | Yes | Implement exponential backoff and request queuing |
| TIMEOUT_ERROR (provider) | Yes | Add retry logic with exponential backoff |
| TIMEOUT_ERROR (other) | Yes | Increase timeout or add connection pooling |
| CACHE_ERROR | Yes | Implement cache fallback to database queries |
| DATABASE_ERROR (pool) | Yes | Increase pool size or add fallback |
| DATABASE_ERROR (other) | Yes | Add database connection retry logic |
| AUTH_ERROR (invalid key) | Yes | Rotate API keys and update configuration |
| AUTH_ERROR (other) | Yes | Implement token refresh logic |
| PROVIDER_ERROR | No | - |
| VALIDATION_ERROR | No | - |
| EXTERNAL_SERVICE_ERROR | No | - |
| INTERNAL_ERROR | No | - |
| UNKNOWN | No | - |
None.
None.
None.
Same as #1747: HTTP GET to Grafana Loki.
| Exception | Status | Handler |
|---|---|---|
| Generic `Exception` | 500 | Raises `HTTPException(500, detail=str(e))` |
flowchart TD
A["GET /error-monitor/errors/fixable?hours=1"] --> B[get_error_monitor singleton]
B --> C[get_fixable_errors hours=1]
C --> D[fetch_recent_errors from Loki]
D --> E[analyze_errors - classify + group + fixability]
E --> F["Filter: pattern.fixable == True"]
F --> G["Sort by (severity, count) descending"]
G --> H["Return {count, hours, fixable_errors}"]
A -->|Exception| I[500 HTTPException]
get_fixable_errors() [route]
├── src/services/error_monitor.py::get_error_monitor()
│ └── ErrorMonitor.get_fixable_errors()
│ ├── fetch_recent_errors() -> Loki HTTP GET
│ ├── analyze_errors()
│ │ ├── classify_error()
│ │ ├── determine_fixability()
│ │ └── group_similar_errors()
│ └── filter fixable + sort by severity/count
└── logging (stdlib)
Issue: #1750
Returns all error patterns currently tracked in the ErrorMonitor's in-memory store. Unlike /errors/recent, /errors/critical, and /errors/fixable which query Loki on each request, this endpoint returns patterns that have been previously stored via store_error_pattern() (from scans, continuous monitoring, or the autonomous monitor).
Router: APIRouter(prefix="/error-monitor", tags=["error-monitor"])
Auth: None (public endpoint)
HTTP Method: GET
Return Type: dict (untyped)
No parameters.
{
"total_patterns": 15,
"patterns": [
{
"error_type": "ConnectionError",
"message": "Provider timeout after 30s",
"category": "provider_error",
"severity": "high",
"file": "src/services/openrouter_client.py",
"line": 150,
"function": "send_request",
"stack_trace": "...",
"timestamp": "2026-03-04T11:50:00+00:00",
"count": 25,
"last_seen": "2026-03-04T11:58:00+00:00",
"examples": ["msg1", "msg2", "msg3"],
"fixable": true,
"suggested_fix": "Add retry logic..."
}
]
}
- `get_error_patterns()` in `src/routes/error_monitor.py` (line 145-158)
- `get_error_monitor()` -> `ErrorMonitor` singleton
- `monitor.error_patterns` (in-memory dict `{str: ErrorPattern}`)
- `dict[str, ErrorPattern]` keyed by `{category.value}:{message[:50]}`
- Populated by `store_error_pattern()` method (line 361-371)
- Patterns are accumulated: counts increment, `last_seen` is updated, examples are appended
- Not persisted to database; lost on restart

Converts dataclass to dict with:
- `timestamp` -> ISO format string
- `last_seen` -> ISO format string or None
- `category` -> `.value` (string enum)
- `severity` -> `.value` (string enum)
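The four conversions above are what make a pattern JSON-serializable. A minimal sketch, assuming a reduced dataclass (`ErrorPatternSketch` is a hypothetical name, with only a handful of the real fields) and single-member enums that exist purely for illustration:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class ErrorCategory(Enum):
    PROVIDER_ERROR = "provider_error"  # one member is enough for the sketch


class ErrorSeverity(Enum):
    HIGH = "high"


@dataclass
class ErrorPatternSketch:
    """Reduced stand-in for ErrorPattern, showing only the converted fields."""
    message: str
    category: ErrorCategory
    severity: ErrorSeverity
    timestamp: datetime
    count: int = 1
    last_seen: Optional[datetime] = None

    def to_dict(self) -> dict:
        return {
            "message": self.message,
            "category": self.category.value,      # enum -> string
            "severity": self.severity.value,      # enum -> string
            "timestamp": self.timestamp.isoformat(),
            "last_seen": self.last_seen.isoformat() if self.last_seen else None,
            "count": self.count,
        }
```

A timezone-aware UTC timestamp serializes with the `+00:00` offset seen in the sample responses.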
None.
None.
None.
None. This endpoint reads only from in-memory state.
The error_patterns dictionary is in-memory only. It is populated by:
- `POST /error-monitor/monitor/scan` -> `monitor.store_error_pattern()`
- `POST /error-monitor/monitor/start` -> continuous monitoring loop
- `AutonomousMonitor._scan_for_errors()` -> background scanning
On application restart, all tracked patterns are lost.
| Exception | Status | Handler |
|---|---|---|
| Generic `Exception` | 500 | Raises `HTTPException(500, detail=str(e))` |
flowchart TD
A[GET /error-monitor/errors/patterns] --> B[get_error_monitor singleton]
B --> C[Read monitor.error_patterns in-memory dict]
C --> D[Convert values to list]
D --> E["Call .to_dict() on each ErrorPattern"]
E --> F["Return {total_patterns, patterns}"]
A -->|Exception| G[500 HTTPException]
get_error_patterns() [route]
├── src/services/error_monitor.py::get_error_monitor()
│ └── ErrorMonitor.error_patterns (in-memory dict)
│ └── ErrorPattern.to_dict() for serialization
└── logging (stdlib)
Issue: #1751
Returns all bug fixes that have been generated by the BugFixGenerator, stored in its in-memory dictionary. Each fix includes the analysis, proposed code changes, files affected, and PR status.
Router: APIRouter(prefix="/error-monitor", tags=["error-monitor"])
Auth: None (public endpoint)
HTTP Method: GET
Return Type: dict (untyped)
No parameters.
{
"total_fixes": 3,
"fixes": [
{
"id": "uuid",
"error_pattern_id": "provider_error:Provider timeout after 30s",
"error_message": "Provider timeout after 30s",
"error_category": "provider_error",
"analysis": "Root cause analysis from Claude...",
"proposed_fix": "Description of the fix",
"code_changes": {
"src/services/openrouter_client.py": "... code ..."
},
"files_affected": ["src/services/openrouter_client.py"],
"severity": "high",
"generated_at": "2026-03-04T11:00:00+00:00",
"pr_url": "https://github.com/repo/pull/123",
"status": "testing"
}
]
}
- `get_generated_fixes()` in `src/routes/error_monitor.py` (line 256-269)
- `get_bug_fix_generator()` from `src/services/bug_fix_generator.py` (async singleton)
- `generator.generated_fixes` (in-memory dict `{str: BugFix}`)
- Creates singleton `BugFixGenerator()` on first call
- Raises RuntimeError if `ANTHROPIC_API_KEY` is not configured
- Calls `initialize()` -> creates httpx client, validates API key

Converts dataclass to dict:
- `generated_at` -> ISO format string
- All other fields: direct mapping
None.
None.
None.
generated_fixes is in-memory only. Fixes are accumulated when:
- `POST /error-monitor/fixes/generate-for-error` -> `generator.generate_fix()`
- `POST /error-monitor/fixes/generate-batch` -> `generator.process_multiple_errors()`
- `AutonomousMonitor._generate_fixes_for_critical()` -> background auto-fix
On restart, all generated fixes are lost.
| Field | Type | Default | Description |
|---|---|---|---|
| id | str | required | UUID string |
| error_pattern_id | str | required | {category}:{message[:50]} |
| error_message | str | required | Original error message |
| error_category | str | required | Error category value |
| analysis | str | required | Claude-generated analysis |
| proposed_fix | str | required | Fix description |
| code_changes | dict[str, str] | required | {file_path: code} |
| files_affected | list[str] | required | List of file paths |
| severity | str | required | Severity level |
| generated_at | datetime | required | Generation timestamp |
| pr_url | str \| None | None | GitHub PR URL |
| status | str | "pending" | One of: pending, testing, merged, failed |
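The field table above can be read as a dataclass definition. The following is a minimal sketch, assuming the field names and defaults from the table; the real BugFix class in src/services/bug_fix_generator.py may differ, and the body of to_dict() is an assumption based on the note that only generated_at is converted.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class BugFix:
    """Sketch of the BugFix record; field names follow the documented table."""

    id: str
    error_pattern_id: str
    error_message: str
    error_category: str
    analysis: str
    proposed_fix: str
    code_changes: dict[str, str]
    files_affected: list[str]
    severity: str
    generated_at: datetime
    pr_url: Optional[str] = None
    status: str = "pending"

    def to_dict(self) -> dict:
        # Assumed behaviour: every field maps directly except generated_at,
        # which is serialized as an ISO 8601 string.
        data = asdict(self)
        data["generated_at"] = self.generated_at.isoformat()
        return data


# Illustrative values only; the id is a placeholder, not a real fix.
example = BugFix(
    id="example-id",
    error_pattern_id="provider_error:Provider timeout after 30s",
    error_message="Provider timeout after 30s",
    error_category="provider_error",
    analysis="(analysis text)",
    proposed_fix="(fix description)",
    code_changes={"src/services/openrouter_client.py": "... code ..."},
    files_affected=["src/services/openrouter_client.py"],
    severity="high",
    generated_at=datetime(2026, 3, 4, 11, 0, tzinfo=timezone.utc),
)
serialized = example.to_dict()
```

With a timezone-aware timestamp, isoformat() yields the same "+00:00" suffix seen in the sample response.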
| Exception | Status | Handler |
|---|---|---|
| RuntimeError (ANTHROPIC_API_KEY missing) | 500 | HTTPException(500, detail=str(e)) |
| Generic Exception | 500 | HTTPException(500, detail=str(e)) |
Note: Unlike the /health endpoint which gracefully handles a missing ANTHROPIC_API_KEY, this endpoint will return 500 if the key is not configured.
flowchart TD
A[GET /error-monitor/fixes/generated] --> B{get_bug_fix_generator}
B -->|RuntimeError: no API key| C[500 HTTPException]
B -->|Success| D[Read generator.generated_fixes dict]
D --> E[Convert BugFix values to list]
E --> F["Call .to_dict() on each BugFix"]
F --> G["Return {total_fixes, fixes}"]
A -->|Exception| C
get_generated_fixes() [route]
├── src/services/bug_fix_generator.py::get_bug_fix_generator() [async singleton]
│ └── BugFixGenerator
│ ├── Config.ANTHROPIC_API_KEY (required)
│ ├── .generated_fixes (in-memory dict)
│ └── BugFix.to_dict() for serialization
└── logging (stdlib)
Issue: #1752
Retrieves the full details of a specific generated bug fix by its UUID. Looks up the fix in the BugFixGenerator's in-memory dictionary.
Router: APIRouter(prefix="/error-monitor", tags=["error-monitor"])
Auth: None (public endpoint)
HTTP Method: GET
Return Type: dict (untyped)
| Parameter | Type | Description |
|---|---|---|
| fix_id | str | UUID of the generated fix |
{
"fix": {
"id": "abc12345-...",
"error_pattern_id": "provider_error:Provider timeout",
"error_message": "Provider timeout after 30s",
"error_category": "provider_error",
"analysis": "Root cause: The OpenRouter provider...",
"proposed_fix": "Add retry logic with exponential backoff",
"code_changes": {
"src/services/openrouter_client.py": "import asyncio\n..."
},
"files_affected": ["src/services/openrouter_client.py"],
"severity": "high",
"generated_at": "2026-03-04T11:00:00+00:00",
"pr_url": "https://github.com/repo/pull/123",
"status": "testing"
}
}
| Status | Condition |
|---|---|
| 404 | Fix not found in generated_fixes dict |
| 500 | ANTHROPIC_API_KEY not configured or other error |
- get_fix_details() in src/routes/error_monitor.py (lines 272-289)
- get_bug_fix_generator() -> BugFixGenerator singleton
- generator.generated_fixes[fix_id] -> in-memory dict lookup
- Creates singleton; raises RuntimeError if ANTHROPIC_API_KEY missing
- Returns BugFixGenerator instance
- Keyed by UUID string (generated by uuid4())
- Values are BugFix dataclass instances
None.
None.
None.
See BugFix dataclass in issue #1751 documentation.
| Exception | Status | Handler |
|---|---|---|
| fix_id not in generator.generated_fixes | 404 | HTTPException(404, "Fix not found") at line 279 |
| HTTPException (404) | 404 | Re-raised at line 285-286 |
| RuntimeError (no API key) | 500 | HTTPException(500, detail=str(e)) |
| Generic Exception | 500 | HTTPException(500, detail=str(e)) |
flowchart TD
A["GET /error-monitor/fixes/{fix_id}"] --> B{get_bug_fix_generator}
B -->|RuntimeError| C[500 HTTPException]
B -->|Success| D{fix_id in generated_fixes?}
D -->|No| E[404 Fix not found]
D -->|Yes| F[Get BugFix from dict]
F --> G[Call fix.to_dict]
G --> H["Return {fix: fix_dict}"]
get_fix_details() [route]
├── src/services/bug_fix_generator.py::get_bug_fix_generator() [async singleton]
│ └── BugFixGenerator
│ ├── Config.ANTHROPIC_API_KEY (required)
│ └── .generated_fixes[fix_id] -> BugFix.to_dict()
└── logging (stdlib)
Issue: #1753
Returns a comprehensive error monitoring dashboard with summary statistics, recent critical errors, recent fixable errors, and recently generated fixes. Aggregates data from both the ErrorMonitor (Loki-based) and BugFixGenerator (in-memory fixes) services.
Router: APIRouter(prefix="/error-monitor", tags=["error-monitor"])
Auth: None (public endpoint)
HTTP Method: GET
Return Type: dict (untyped)
No parameters.
{
"timestamp": "2026-03-04T12:00:00+00:00",
"summary": {
"total_patterns": 15,
"critical_errors": 3,
"fixable_errors": 8,
"generated_fixes": 5,
"patterns_by_category": {
"provider_error": 25,
"database_error": 10,
"timeout_error": 5,
"cache_error": 3
}
},
"recent_critical": [
{
"error_type": "...",
"message": "...",
"category": "database_error",
"severity": "critical",
...
}
],
"recent_fixable": [
{
"error_type": "...",
"fixable": true,
"suggested_fix": "...",
...
}
],
"recent_fixes": [
{
"id": "...",
"error_category": "...",
"proposed_fix": "...",
"pr_url": "...",
"status": "testing",
...
}
]
}
- error_dashboard() in src/routes/error_monitor.py (lines 292-331)
- get_error_monitor() -> ErrorMonitor singleton
- get_bug_fix_generator() -> BugFixGenerator singleton
- monitor.get_critical_errors(hours=1) -> Loki fetch + analysis + filter
- monitor.get_fixable_errors(hours=1) -> Loki fetch + analysis + filter
- monitor.error_patterns -> in-memory tracked patterns
- generator.generated_fixes -> in-memory generated fixes
get_critical_errors(hours=1):
- fetch_recent_errors(hours=1) -> Loki HTTP GET
- analyze_errors() -> classify + group
- Filter severity in [CRITICAL, HIGH]
- Sort by count descending
get_fixable_errors(hours=1):
- fetch_recent_errors(hours=1) -> Loki HTTP GET
- analyze_errors() -> classify + group + fixability
- Filter fixable == True
- Sort by (severity, count) descending
category_counts = {}
for pattern in monitor.error_patterns.values():
    cat = pattern.category.value
    category_counts[cat] = category_counts.get(cat, 0) + pattern.count
Counts total error occurrences per category from stored (in-memory) patterns.
sorted(generator.generated_fixes.values(), key=lambda x: x.generated_at, reverse=True)[:10]
Top 10 most recently generated fixes.
Note: This endpoint makes two separate Loki queries (one for critical, one for fixable), each fetching and analyzing errors independently.
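To make the two in-memory aggregations concrete, here is a small runnable sketch that applies the same logic to stand-in data. The SimpleNamespace objects and the trimmed ErrorCategory enum are hypothetical substitutes for the real ErrorPattern and BugFix classes, and the helper names count_by_category and most_recent do not appear in the source.

```python
from datetime import datetime, timezone
from enum import Enum
from types import SimpleNamespace


class ErrorCategory(Enum):
    # Stand-in enum carrying two of the documented category values.
    PROVIDER_ERROR = "provider_error"
    DATABASE_ERROR = "database_error"


def count_by_category(patterns: dict) -> dict:
    """Sum occurrence counts (not pattern counts) per category."""
    counts = {}
    for pattern in patterns.values():
        cat = pattern.category.value
        counts[cat] = counts.get(cat, 0) + pattern.count
    return counts


def most_recent(fixes: dict, limit: int = 10) -> list:
    """Return the newest fixes first, capped at `limit` items."""
    return sorted(fixes.values(), key=lambda f: f.generated_at, reverse=True)[:limit]


patterns = {
    "a": SimpleNamespace(category=ErrorCategory.PROVIDER_ERROR, count=20),
    "b": SimpleNamespace(category=ErrorCategory.PROVIDER_ERROR, count=5),
    "c": SimpleNamespace(category=ErrorCategory.DATABASE_ERROR, count=10),
}
fixes = {
    f"fix-{day}": SimpleNamespace(
        id=f"fix-{day}",
        generated_at=datetime(2026, 3, day, tzinfo=timezone.utc),
    )
    for day in range(1, 13)
}

category_counts = count_by_category(patterns)
recent_fixes = most_recent(fixes)
```

Because the loop adds pattern.count rather than 1, the values in patterns_by_category are occurrence totals. That is why they can sum to more than total_patterns in the sample response.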
None.
None.
None directly emitted.
| Service | Method | URL | Count per request |
|---|---|---|---|
| Grafana Loki | GET | /loki/api/v1/query_range | 2 (critical + fixable) |
Both queries use {level="ERROR"} with hours=1.
| Field | Max Items |
|---|---|
| recent_critical | 10 (sliced [:10]) |
| recent_fixable | 10 (sliced [:10]) |
| recent_fixes | 10 (sorted by generated_at desc, sliced [:10]) |
| Exception | Status | Handler |
|---|---|---|
| Generic Exception | 500 | Raises HTTPException(500, detail=str(e)) |
Important: If get_bug_fix_generator() raises RuntimeError (missing ANTHROPIC_API_KEY), the entire dashboard request fails with 500. This is different from /health which gracefully handles this case.
flowchart TD
A[GET /error-monitor/dashboard] --> B[get_error_monitor singleton]
B --> C[get_bug_fix_generator singleton]
C -->|RuntimeError| D[500 HTTPException]
C -->|Success| E[get_critical_errors hours=1]
E --> F[Loki fetch + analyze + filter critical/high]
F --> G[get_fixable_errors hours=1]
G --> H[Loki fetch + analyze + filter fixable]
H --> I[Build category_counts from in-memory patterns]
I --> J[Sort generated_fixes by generated_at desc]
J --> K[Build summary with counts]
K --> L["Return dashboard: summary + recent_critical[:10] + recent_fixable[:10] + recent_fixes[:10]"]
A -->|Exception| D
error_dashboard() [route]
├── src/services/error_monitor.py::get_error_monitor() [async singleton]
│ └── ErrorMonitor
│ ├── get_critical_errors(hours=1)
│ │ ├── fetch_recent_errors() -> Loki HTTP GET #1
│ │ ├── analyze_errors() -> classify + group
│ │ └── filter severity in [CRITICAL, HIGH]
│ ├── get_fixable_errors(hours=1)
│ │ ├── fetch_recent_errors() -> Loki HTTP GET #2
│ │ ├── analyze_errors() -> classify + group + fixability
│ │ └── filter fixable == True
│ └── error_patterns (in-memory dict -> category counts)
├── src/services/bug_fix_generator.py::get_bug_fix_generator() [async singleton]
│ └── BugFixGenerator
│ ├── Config.ANTHROPIC_API_KEY (required - will 500 if missing)
│ └── .generated_fixes (in-memory dict -> sorted by date, top 10)
├── datetime (stdlib)
└── logging (stdlib)
Issue: #1754
Generates an automated bug fix for a specific error pattern identified by its error_type ID. Can optionally create a GitHub Pull Request with the fix in the background.
Route prefix: /error-monitor (APIRouter with tags=["error-monitor"])
- Authentication: NONE - This endpoint has no authentication dependency (get_api_key, get_admin_key, etc.)
- Rate Limiting: Subject only to global middleware (IP-based rate limiting via security_middleware.py)
- Middleware Pipeline: Request → Sentry middleware → Observability middleware → Timeout middleware → Security middleware → GZip middleware → Trace middleware → Handler
| Parameter | Type | Source | Required | Default | Validation |
|---|---|---|---|---|---|
| error_id | str | Query | Yes | N/A | FastAPI required query param |
| create_pr | bool | Query | No | False | Boolean |
| background_tasks | BackgroundTasks | DI | No | BackgroundTasks() | FastAPI injected |
Success (synchronous, create_pr=False):
{
"status": "success",
"fix": {
"id": "uuid",
"error_pattern_id": "category:message_prefix",
"error_message": "string",
"error_category": "string",
"analysis": "string",
"proposed_fix": "string",
"code_changes": {"file_path": "code"},
"files_affected": ["file1.py"],
"severity": "critical|high|medium|low|info",
"generated_at": "ISO8601",
"pr_url": null,
"status": "pending"
}
}
Success (background, create_pr=True):
{
"status": "processing",
"message": "Fix generation started in background"
}
generate_fix_for_error()
├── get_error_monitor() → ErrorMonitor singleton
│ ├── ErrorMonitor.__init__()
│ │ ├── Config.LOKI_ENABLED
│ │ └── Config.LOKI_QUERY_URL
│ └── ErrorMonitor.initialize()
│ └── httpx.AsyncClient(timeout=10.0)
├── get_bug_fix_generator() → BugFixGenerator singleton
│ ├── BugFixGenerator.__init__()
│ │ ├── Config.ANTHROPIC_API_KEY (required, raises RuntimeError if missing)
│ │ ├── Config.GITHUB_TOKEN (optional)
│ │ └── Config.ANTHROPIC_MODEL (default: "claude-3-5-sonnet-20241022")
│ └── BugFixGenerator.initialize()
│ ├── httpx.AsyncClient(timeout=30.0)
│ └── _validate_api_key() → POST https://api.anthropic.com/v1/messages
├── monitor.error_patterns.values() → dict iteration
├── pattern.to_dict() → dict serialization
├── generator.process_error() [if create_pr=True, background task]
│ ├── generate_fix() → see below
│ ├── create_branch_and_commit() → git subprocess calls
│ └── create_pull_request() → POST https://api.github.com/repos/{repo}/pulls
└── generator.generate_fix() [if create_pr=False, synchronous]
├── analyze_error()
│ └── _make_claude_request() [with @retry: 3 attempts, exponential backoff]
│ └── POST https://api.anthropic.com/v1/messages
├── _make_claude_request() [second call for fix generation]
│ └── POST https://api.anthropic.com/v1/messages (max_tokens=2048)
└── BugFix dataclass creation → stored in generator.generated_fixes dict
None - This endpoint does not interact with Supabase/PostgreSQL.
None - This endpoint does not interact with Redis directly.
| Service | Operation | URL | Details |
|---|---|---|---|
| Anthropic Claude API | POST | https://api.anthropic.com/v1/messages | Error analysis (max_tokens=1024) |
| Anthropic Claude API | POST | https://api.anthropic.com/v1/messages | Fix generation (max_tokens=2048) |
| GitHub API | POST | https://api.github.com/repos/{repo}/pulls | PR creation (if create_pr=True) |
Retry logic: @retry decorator on _make_claude_request():
- Retries on: httpx.TimeoutException, httpx.ConnectError
- Wait: exponential backoff (min=2s, max=10s)
- Max attempts: 3
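The retry behaviour listed above can be illustrated with a hand-rolled decorator. This is a sketch under stated assumptions, not the project's implementation: the source does not say which library provides @retry, the helper retry_async is hypothetical, the exact backoff schedule is assumed, and the built-in TimeoutError and ConnectionError stand in for httpx.TimeoutException and httpx.ConnectError so the example runs without third-party packages.

```python
import asyncio
import functools


def retry_async(exceptions, max_attempts=3, min_wait=2.0, max_wait=10.0, sleep=asyncio.sleep):
    """Retry an async callable, waiting 2**attempt seconds clamped to [min_wait, max_wait]."""

    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return await func(*args, **kwargs)
                except exceptions:
                    if attempt == max_attempts:
                        raise  # attempts exhausted: surface the last error
                    wait = min(max_wait, max(min_wait, 2.0 ** attempt))
                    await sleep(wait)

        return wrapper

    return decorator


waits = []


async def record_wait(seconds):
    # Test double for asyncio.sleep so the demo finishes instantly.
    waits.append(seconds)


calls = {"count": 0}


@retry_async((TimeoutError, ConnectionError), sleep=record_wait)
async def flaky_request():
    # Fails twice with a retryable error, then succeeds on the third attempt.
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("simulated provider timeout")
    return "ok"


result = asyncio.run(flaky_request())
```

Only the listed exception types are retried. Any other error propagates on the first attempt, which matches the documented behaviour of retrying on timeouts and connection failures only.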
None directly - This endpoint does not record Prometheus metrics itself.
| Error | Status Code | Condition |
|---|---|---|
| HTTPException(404) | 404 | Error pattern not found in monitor.error_patterns |
| HTTPException(500) | 500 | generate_fix() returns None (fix generation failed) |
| HTTPException(500) | 500 | Any unhandled exception (generic catch-all) |
| RuntimeError | 500 | ANTHROPIC_API_KEY not configured (from get_bug_fix_generator()) |
Error re-raise pattern: except HTTPException: raise ensures 404 errors propagate correctly.
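The effect of the re-raise clause is easiest to see in isolation. The sketch below uses a stand-in HTTPException class so it runs without FastAPI; find_pattern, status_for, and the 404 detail string are hypothetical names chosen for illustration.

```python
class HTTPException(Exception):
    """Stand-in for fastapi.HTTPException so the sketch runs without FastAPI."""

    def __init__(self, status_code: int, detail: str = ""):
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def find_pattern(error_patterns: dict, error_id: str) -> dict:
    try:
        for pattern in error_patterns.values():
            if pattern.get("error_type") == error_id:
                return pattern
        raise HTTPException(404, "Error pattern not found")
    except HTTPException:
        raise  # keep the intended status code instead of wrapping it
    except Exception as e:
        # Generic catch-all: anything unexpected becomes a 500.
        raise HTTPException(500, detail=str(e))


def status_for(error_patterns: dict, error_id: str) -> int:
    """Return the status code a caller would observe."""
    try:
        find_pattern(error_patterns, error_id)
        return 200
    except HTTPException as exc:
        return exc.status_code


known = {"k": {"error_type": "TimeoutError"}}
found_status = status_for(known, "TimeoutError")
missing_status = status_for(known, "missing")
broken_status = status_for({"k": None}, "anything")  # None has no .get()
```

Without the first except clause, the 404 raised inside the try block would be caught by the generic handler and reported as a 500.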
flowchart TD
A[POST /error-monitor/fixes/generate-for-error] --> B[get_error_monitor singleton]
B --> C[get_bug_fix_generator singleton]
C --> D{ANTHROPIC_API_KEY configured?}
D -->|No| E[RuntimeError 500]
D -->|Yes| F[Search error_patterns by error_id]
F --> G{Pattern found?}
G -->|No| H[HTTPException 404]
G -->|Yes| I{create_pr == True?}
I -->|Yes| J[BackgroundTasks.add_task: process_error]
J --> K[Return status: processing]
I -->|No| L[generator.generate_fix synchronous]
L --> M[analyze_error via Claude API]
M --> N{Analysis successful?}
N -->|No| O[Return None]
O --> P[HTTPException 500: Failed to generate fix]
N -->|Yes| Q[Generate fix via Claude API]
Q --> R{Fix JSON parsed?}
R -->|No| S[Return None → HTTPException 500]
R -->|Yes| T[Create BugFix dataclass]
T --> U[Store in generated_fixes dict]
U --> V[Return status: success with fix]
- Error patterns are stored in-memory only (ErrorMonitor.error_patterns dict) - they do not persist across restarts
- The error_id lookup matches against pattern.to_dict().get("error_type"), which maps to ErrorPattern.error_type
- Background task execution (when create_pr=True) includes git operations via subprocess.run
- The BugFixGenerator validates the Anthropic API key format (must start with sk-ant-)
- Prompt sanitization limits: MAX_PROMPT_LENGTH = 50000, MAX_ERROR_MESSAGE_LENGTH = 10000
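The last two notes describe simple guards that can be sketched in a few lines. The constants and the sk-ant- prefix come from the notes above; the function names are hypothetical, and truncation is an assumption, since the source states the limits but not whether oversized input is truncated or rejected.

```python
MAX_PROMPT_LENGTH = 50000
MAX_ERROR_MESSAGE_LENGTH = 10000


def has_anthropic_key_format(api_key: str) -> bool:
    """Format check only; it does not prove the key is valid."""
    return api_key.startswith("sk-ant-")


def clamp(text: str, limit: int) -> str:
    """Assumed sanitization strategy: cut the text at the limit."""
    return text[:limit]


long_message = "x" * 20000
clamped_message = clamp(long_message, MAX_ERROR_MESSAGE_LENGTH)
short_prompt = clamp("short prompt", MAX_PROMPT_LENGTH)
```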
Generated by AI documentation tool
Issue: #1755
Generates automated bug fixes for multiple error patterns simultaneously. Supports both synchronous batch processing and background processing with optional GitHub PR creation.
Route prefix: /error-monitor (APIRouter with tags=["error-monitor"])
- Authentication: NONE - No authentication dependency
- Rate Limiting: Global middleware only (IP-based via security_middleware.py)
- Middleware Pipeline: Sentry → Observability → Timeout → Security → GZip → Trace → Handler
| Parameter | Type | Source | Required | Default | Validation |
|---|---|---|---|---|---|
| error_ids | list[str] | Query | Yes | N/A | FastAPI required query param (list) |
| create_prs | bool | Query | No | False | Boolean |
| background_tasks | BackgroundTasks | DI | No | BackgroundTasks() | FastAPI injected |
Success (synchronous, create_prs=False):
{
"status": "success",
"fixes": [
{
"id": "uuid",
"error_pattern_id": "category:message_prefix",
"error_message": "string",
"error_category": "string",
"analysis": "string",
"proposed_fix": "string",
"code_changes": {"file_path": "code"},
"files_affected": ["file1.py"],
"severity": "critical|high|medium|low|info",
"generated_at": "ISO8601",
"pr_url": "string|null",
"status": "pending|testing|merged|failed"
}
],
"count": 3
}
Success (background, create_prs=True):
{
"status": "processing",
"message": "Processing 3 errors in background",
"count": 3
}
generate_fixes_batch()
├── get_error_monitor() → ErrorMonitor singleton
│ ├── ErrorMonitor.__init__() → Config.LOKI_ENABLED, Config.LOKI_QUERY_URL
│ └── ErrorMonitor.initialize() → httpx.AsyncClient(timeout=10.0)
├── get_bug_fix_generator() → BugFixGenerator singleton
│ ├── BugFixGenerator.__init__()
│ │ ├── Config.ANTHROPIC_API_KEY (required)
│ │ ├── Config.GITHUB_TOKEN (optional)
│ │ └── Config.ANTHROPIC_MODEL
│ └── BugFixGenerator.initialize() → httpx.AsyncClient + _validate_api_key()
├── monitor.error_patterns.values() → dict iteration
│ └── pattern.to_dict().get("error_type") → filter by error_ids list
├── generator.process_multiple_errors() [background or sync]
│ ├── asyncio.gather(*tasks, return_exceptions=True) → parallel processing
│ └── process_error() [per error, see #1754]
│ ├── generate_fix()
│ │ ├── analyze_error() → Claude API POST
│ │ └── _make_claude_request() → Claude API POST (fix generation)
│ ├── create_branch_and_commit() → git subprocess
│ └── create_pull_request() → GitHub API POST
None - No database interaction.
None - No Redis interaction.
| Service | Operation | URL | Details |
|---|---|---|---|
| Anthropic Claude API | POST | https://api.anthropic.com/v1/messages | Analysis per error (max_tokens=1024) |
| Anthropic Claude API | POST | https://api.anthropic.com/v1/messages | Fix generation per error (max_tokens=2048) |
| GitHub API | POST | https://api.github.com/repos/{repo}/pulls | PR creation per fix (if create_prs=True) |
Note: All errors are processed in parallel via asyncio.gather(). For N errors, this makes up to 2N Claude API calls + N GitHub API calls.
None directly.
| Error | Status Code | Condition |
|---|---|---|
| HTTPException(404) | 404 | No matching error patterns found for any of the provided error_ids |
| HTTPException(500) | 500 | Any unhandled exception |
| RuntimeError | 500 | ANTHROPIC_API_KEY not configured (from get_bug_fix_generator()) |
Note: Individual error processing failures in process_multiple_errors() are caught by asyncio.gather(return_exceptions=True) and logged but do not cause the batch to fail. The response only includes successful fixes.
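The partial-failure behaviour described in the note can be reproduced with a short asyncio example. This is a sketch, not the project's code: process_error here is a hypothetical stand-in for the per-error work, and print stands in for the logger.

```python
import asyncio


async def process_error(error_id: str) -> dict:
    # Hypothetical stand-in for the per-error work (Claude calls, PR creation).
    if error_id == "bad":
        raise ValueError("simulated fix generation failure")
    return {"error_pattern_id": error_id, "status": "pending"}


async def process_multiple(error_ids: list[str]) -> list[dict]:
    # return_exceptions=True turns raised exceptions into result values,
    # so one failing task does not abort the others.
    results = await asyncio.gather(
        *(process_error(e) for e in error_ids), return_exceptions=True
    )
    fixes = []
    for error_id, result in zip(error_ids, results):
        if isinstance(result, BaseException):
            print(f"fix generation failed for {error_id}: {result}")
            continue
        fixes.append(result)
    return fixes


fixes = asyncio.run(process_multiple(["a", "bad", "c"]))
```

asyncio.gather returns results in the order the awaitables were passed, so pairing them with error_ids via zip is safe. A caller comparing the response count with the number of submitted IDs can detect that some fixes were dropped.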
flowchart TD
A[POST /error-monitor/fixes/generate-batch] --> B[get_error_monitor singleton]
B --> C[get_bug_fix_generator singleton]
C --> D{ANTHROPIC_API_KEY configured?}
D -->|No| E[RuntimeError 500]
D -->|Yes| F[Filter error_patterns by error_ids list]
F --> G{Any patterns matched?}
G -->|No| H[HTTPException 404]
G -->|Yes| I{create_prs == True?}
I -->|Yes| J[BackgroundTasks.add_task: process_multiple_errors]
J --> K[Return status: processing, count: N]
I -->|No| L[Synchronous: process_multiple_errors]
L --> M[asyncio.gather - parallel processing]
M --> N[For each error: analyze + generate fix via Claude]
N --> O[Filter successful BugFix results]
O --> P{Any exceptions in results?}
P -->|Yes| Q[Log errors, continue with successes]
P -->|No| R[Return all fixes]
Q --> R
R --> S[Return status: success, fixes list, count]
- Error patterns are matched by checking pattern.to_dict().get("error_type") in error_ids, which performs a list membership test
- Parallel processing via asyncio.gather means all Claude API calls fire concurrently, which could hit rate limits on the Anthropic API
- Failed individual fix generations (returned as Exception objects from gather) are logged and excluded from the response; the caller is not told which IDs failed
- error_ids is a list[str] query parameter - in FastAPI, this means the URL would be: ?error_ids=id1&error_ids=id2&error_ids=id3
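For clients, the repeated-key form can be produced with the standard library. A minimal sketch, assuming the parameter names documented for this endpoint; the ID values are placeholders:

```python
from urllib.parse import parse_qs, urlencode

# doseq=True expands a list value into one key=value pair per element.
params = {"error_ids": ["id1", "id2", "id3"], "create_prs": "false"}
query = urlencode(params, doseq=True)

# Round-trip check: parsing the query string recovers the original list.
parsed = parse_qs(query)
```

Without doseq=True, urlencode would serialize the list's string representation as a single value, which would not parse as three IDs.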
Issue: #1756
Starts a continuous background error monitoring loop that periodically scans Loki logs for errors, classifies them, and stores error patterns. The monitoring runs indefinitely until the application shuts down.
Route prefix: /error-monitor (APIRouter with tags=["error-monitor"])
- Authentication: NONE - No authentication dependency
- Rate Limiting: Global middleware only (IP-based via security_middleware.py)
- Middleware Pipeline: Sentry → Observability → Timeout → Security → GZip → Trace → Handler
| Parameter | Type | Source | Required | Default | Validation |
|---|---|---|---|---|---|
| interval | int | Query | No | 300 | ge=60, le=3600 (1 min to 1 hour) |
| background_tasks | BackgroundTasks | DI | No | BackgroundTasks() | FastAPI injected |
{
"status": "started",
"interval_seconds": 300,
"message": "Continuous monitoring started in background"
}
start_continuous_monitoring()
├── get_error_monitor() → ErrorMonitor singleton
│ ├── ErrorMonitor.__init__()
│ │ ├── Config.LOKI_ENABLED
│ │ └── Config.LOKI_QUERY_URL
│ └── ErrorMonitor.initialize() → httpx.AsyncClient(timeout=10.0)
└── BackgroundTasks.add_task(monitor.monitor_continuously, interval=interval)
└── monitor_continuously(interval) [infinite loop]
├── ErrorMonitor.initialize() → creates new httpx session
└── while True loop:
├── get_critical_errors(hours=1)
│ ├── fetch_recent_errors(hours=1)
│ │ └── HTTP GET to Loki: {base_url}/loki/api/v1/query_range
│ │ └── LogQL query: '{level="ERROR"}'
│ └── analyze_errors(raw_errors)
│ ├── extract_error_details() → ErrorPattern dataclass
│ ├── classify_error() → (ErrorCategory, ErrorSeverity)
│ ├── determine_fixability() → (bool, str|None)
│ └── group_similar_errors() → deduplicated dict
├── store_error_pattern() → updates in-memory error_patterns dict
├── get_fixable_errors(hours=1)
│ ├── fetch_recent_errors(hours=1) → Loki HTTP GET
│ └── analyze_errors(raw_errors)
└── asyncio.sleep(interval)
None - No database interaction.
None - No Redis interaction.
| Service | Operation | URL | Details |
|---|---|---|---|
| Loki | GET | {LOKI_QUERY_URL}/loki/api/v1/query_range | Periodic error log queries |
Loki Query Parameters:
- query: {level="ERROR"}
- limit: 100
- direction: backward
Frequency: Every interval seconds (default 300s / 5 minutes)
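The query parameters above translate into a request URL along these lines. This is a sketch under assumptions: the source lists only query, limit, and direction; how the service encodes the hours window is not documented, so the start and end values (nanosecond Unix timestamps, one of the forms Loki accepts) are an assumption, and build_query_range_url and the base URL are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlencode, urlsplit


def build_query_range_url(base_url: str, hours: int, now: datetime) -> str:
    start = now - timedelta(hours=hours)
    params = {
        "query": '{level="ERROR"}',
        "limit": 100,
        "direction": "backward",
        # Assumption: the time window is sent as nanosecond Unix timestamps.
        "start": int(start.timestamp()) * 1_000_000_000,
        "end": int(now.timestamp()) * 1_000_000_000,
    }
    return f"{base_url}/loki/api/v1/query_range?{urlencode(params)}"


now = datetime(2026, 3, 4, 12, 0, tzinfo=timezone.utc)
url = build_query_range_url("http://loki.example:3100", hours=1, now=now)
parsed = parse_qs(urlsplit(url).query)
```

With limit=100 and direction=backward, each query returns at most the 100 newest matching log lines, so a busy hour may be only partially sampled.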
None directly.
| Error | Status Code | Condition |
|---|---|---|
| HTTPException(500) | 500 | Any exception during setup (before background task starts) |
Background loop error handling: Within monitor_continuously():
- Individual cycle errors are caught, logged with exc_info=True, and the loop continues
- KeyboardInterrupt stops the loop gracefully
- The finally block calls self.close() to clean up the httpx session
Note: There is no duplicate-start prevention. Calling this endpoint multiple times creates multiple concurrent monitoring loops.
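The loop's error handling can be sketched as follows. This is an illustration under assumptions rather than the actual monitor_continuously() body: the loop is bounded by a hypothetical max_cycles argument so the example terminates, sleep is injectable for the same reason, and print stands in for both the logger and self.close().

```python
import asyncio


async def monitor_continuously(scan, interval: float, max_cycles: int, sleep=asyncio.sleep) -> int:
    """Run scan() repeatedly; a failing cycle is logged and the loop continues."""
    completed = 0
    try:
        for _ in range(max_cycles):  # the documented loop is `while True`
            try:
                await scan()
                completed += 1
            except Exception as exc:
                # One bad cycle must not end the monitoring loop.
                print(f"monitoring cycle failed: {exc}")
            await sleep(interval)
    finally:
        print("closing HTTP session")  # stand-in for self.close()
    return completed


calls = {"count": 0}


async def flaky_scan():
    # Simulates a Loki outage on the second cycle only.
    calls["count"] += 1
    if calls["count"] == 2:
        raise RuntimeError("simulated Loki outage")


async def no_sleep(_seconds):
    return None


completed = asyncio.run(
    monitor_continuously(flaky_scan, interval=300, max_cycles=3, sleep=no_sleep)
)
```

Each call to the endpoint schedules another independent loop of this shape, which is why repeated calls multiply the Loki query rate.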
flowchart TD
A[POST /error-monitor/monitor/start] --> B[get_error_monitor singleton]
B --> C[Add background task: monitor_continuously]
C --> D[Return status: started]
subgraph Background Loop
E[monitor_continuously starts] --> F[initialize httpx session]
F --> G[Scan for critical errors from Loki]
G --> H{Loki enabled?}
H -->|No| I[Return empty list]
H -->|Yes| J[HTTP GET Loki query_range]
J --> K[Parse JSON log entries]
K --> L[classify_error per entry]
L --> M[group_similar_errors]
M --> N[Store patterns in memory]
N --> O[Scan for fixable errors]
O --> P[asyncio.sleep interval]
P --> G
end
- The monitoring loop runs indefinitely as a background task - it only stops when the application shuts down or a KeyboardInterrupt is received
- No duplicate prevention: Multiple calls create multiple parallel monitoring loops, each making Loki queries at the configured interval
- Error patterns are stored in-memory (ErrorMonitor.error_patterns dict) - lost on restart
- The initialize() call inside monitor_continuously() creates a new httpx session separate from the singleton's session
- Loki connectivity is required (Config.LOKI_ENABLED and Config.LOKI_QUERY_URL must be set) for the monitoring to actually find errors
- Error classification supports 10 categories: PROVIDER_ERROR, DATABASE_ERROR, RATE_LIMIT_ERROR, AUTH_ERROR, TIMEOUT_ERROR, VALIDATION_ERROR, CACHE_ERROR, EXTERNAL_SERVICE_ERROR, INTERNAL_ERROR, UNKNOWN
Issue: #1757
Triggers a one-time manual scan of Loki logs for errors. Analyzes and stores error patterns, and optionally kicks off automated fix generation for fixable errors in the background.
Route prefix: /error-monitor (APIRouter with tags=["error-monitor"])
- Authentication: NONE - No authentication dependency
- Rate Limiting: Global middleware only (IP-based via security_middleware.py)
- Middleware Pipeline: Sentry → Observability → Timeout → Security → GZip → Trace → Handler
| Parameter | Type | Source | Required | Default | Validation |
|---|---|---|---|---|---|
| hours | int | Query | No | 1 | ge=1, le=24 |
| auto_fix | bool | Query | No | False | Boolean |
| background_tasks | BackgroundTasks | DI | No | BackgroundTasks() | FastAPI injected |
{
"status": "scanned",
"errors_found": 5,
"hours": 1,
"critical_errors": 2,
"auto_fixes_started": 1
}
The auto_fixes_started field is only present when auto_fix=True and fixable errors exist.
scan_for_errors()
├── get_error_monitor() → ErrorMonitor singleton
│ └── [see #1754 for init chain]
├── get_bug_fix_generator() → BugFixGenerator singleton
│ └── [see #1754 for init chain]
├── monitor.fetch_recent_errors(hours=hours)
│ └── HTTP GET Loki: {base_url}/loki/api/v1/query_range
│ ├── LogQL query: '{level="ERROR"}'
│ ├── Params: limit=100, direction=backward
│ └── Response: parsed JSON log entries
├── monitor.analyze_errors(raw_errors)
│ ├── extract_error_details() per error
│ │ ├── classify_error() → (ErrorCategory, ErrorSeverity)
│ │ └── regex extraction: file, line, function from stack traces
│ ├── determine_fixability() per pattern
│ │ └── Category-based rules (rate_limit→True, timeout→True, etc.)
│ └── group_similar_errors() → deduplicated by "category:message[:50]"
├── monitor.store_error_pattern() per pattern
│ └── Updates in-memory error_patterns dict (merges counts, examples)
└── [if auto_fix and fixable patterns exist]
└── BackgroundTasks.add_task(generator.process_multiple_errors, fixable, create_prs=True)
└── asyncio.gather → parallel process_error() calls
├── generate_fix() → 2x Claude API calls
├── create_branch_and_commit() → git subprocess
└── create_pull_request() → GitHub API POST
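The deduplication step in the chain above groups errors by a "category:message[:50]" key. The following is a minimal sketch of that idea, assuming plain dicts with category and message fields; the real group_similar_errors() operates on ErrorPattern objects and also merges examples, which is omitted here.

```python
def group_similar_errors(errors: list[dict]) -> dict[str, dict]:
    """Group errors whose category and first 50 message characters match."""
    grouped: dict[str, dict] = {}
    for err in errors:
        key = f"{err['category']}:{err['message'][:50]}"
        if key in grouped:
            grouped[key]["count"] += 1
        else:
            grouped[key] = {**err, "count": 1}
    return grouped


# Two messages that differ only after the 50th character.
prefix = "Provider timeout after 30s while calling the upstream API"
errors = [
    {"category": "provider_error", "message": prefix + " for request A"},
    {"category": "provider_error", "message": prefix + " for request B"},
    {"category": "database_error", "message": "Connection refused"},
]
grouped = group_similar_errors(errors)
provider_key = "provider_error:" + prefix[:50]
```

Since only the first 50 characters are compared, messages that differ later (request IDs, timestamps at the end) collapse into one pattern, while messages with a variable prefix would not.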
None - No database interaction.
None - No Redis interaction.
During scan (synchronous):
| Service | Operation | URL | Details |
|---|---|---|---|
| Loki | GET | {LOKI_QUERY_URL}/loki/api/v1/query_range | Fetch error logs |
During auto-fix (background, if auto_fix=True):
| Service | Operation | URL | Details |
|---|---|---|---|
| Anthropic Claude API | POST | https://api.anthropic.com/v1/messages | Error analysis per fixable error |
| Anthropic Claude API | POST | https://api.anthropic.com/v1/messages | Fix generation per fixable error |
| GitHub API | POST | https://api.github.com/repos/{repo}/pulls | PR creation per fix |
None directly.
| Error | Status Code | Condition |
|---|---|---|
| HTTPException(500) | 500 | Any exception during scan or setup |
| RuntimeError | 500 | ANTHROPIC_API_KEY not configured (from get_bug_fix_generator()) |
Note: The get_bug_fix_generator() is called even if auto_fix=False, meaning the endpoint will fail with 500 if ANTHROPIC_API_KEY is not set, regardless of auto_fix setting.
flowchart TD
A[POST /error-monitor/monitor/scan] --> B[get_error_monitor singleton]
B --> C[get_bug_fix_generator singleton]
C --> D{ANTHROPIC_API_KEY set?}
D -->|No| E[RuntimeError 500]
D -->|Yes| F[fetch_recent_errors from Loki]
F --> G{Loki enabled?}