Usage Quota Guide

Usage, Quota & Spend Tracking

TL;DR: OmniRoute tracks every request's token usage, computes cost, enforces per-API-key quota, and surfaces analytics in the dashboard. This guide explains how it all works.

Sources:

open-sse/services/usage.ts (~70KB) — main usage tracking
src/lib/usageAnalytics.ts (~10KB) — aggregation for dashboard
src/lib/db/quotaSnapshots.ts — historical quota data
src/lib/db/usage*.ts — multiple usage-related DB modules

Overview

Every request that flows through OmniRoute generates a usage record that captures:

Identity: which API key, provider, model, combo
Tokens: prompt tokens, completion tokens, cached tokens, total
Cost: USD amount (computed from pricing data)
Timing: latency, start/end timestamps
Status: success, error, rate-limited, etc.

These records are aggregated into analytics, persisted as quota snapshots, and used to enforce per-key budget limits.

Request ──▶ chatCore ──▶ usage.record() ──▶ SQLite
                                  │
                          ┌───────┼───────┐
                          ▼       ▼       ▼
                    analytics  quota   billing
                    (dashboard) (enforce) (export)

What Gets Recorded

The usage.ts service captures a usage event for every request:

Field	Type	Source
`id`	string	UUID generated on record
`apiKeyId`	string	The API key that initiated the request
`provider`	string	Provider ID (openai, anthropic, etc.)
`model`	string	Model ID (gpt-5, claude-opus-4-6, etc.)
`comboId`	string?	Combo ID if routed through a combo
`promptTokens`	number	From upstream response
`completionTokens`	number	From upstream response
`cachedTokens`	number	Cache hit tokens (Anthropic prompt caching, etc.)
`totalTokens`	number	prompt + completion
`costUsd`	number	Computed from pricing data
`latencyMs`	number	End-to-end request duration
`status`	enum	`success`, `error`, `rate_limited`, `timeout`, `cancelled`
`errorClass`	string?	Error class if status != success
`timestamp`	string	ISO 8601 UTC
`metadata`	object	Custom plugin-injected data
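
As a reading aid, the field table above can be written as a TypeScript type. This is a sketch derived from the table only; the names `UsageRecord`, `UsageStatus`, and `isConsistent` are hypothetical and are not taken from usage.ts.

```typescript
// Hypothetical types mirroring the field table; not copied from usage.ts.
type UsageStatus = "success" | "error" | "rate_limited" | "timeout" | "cancelled";

interface UsageRecord {
  id: string;
  apiKeyId: string;
  provider: string;
  model: string;
  comboId?: string;
  promptTokens: number;
  completionTokens: number;
  cachedTokens: number;
  totalTokens: number;
  costUsd: number;
  latencyMs: number;
  status: UsageStatus;
  errorClass?: string;
  timestamp: string; // ISO 8601 UTC
  metadata: Record<string, unknown>;
}

// Checks two invariants stated in the table: totalTokens = prompt + completion,
// and errorClass is only present when status is not "success".
function isConsistent(r: UsageRecord): boolean {
  const totalOk = r.totalTokens === r.promptTokens + r.completionTokens;
  const errorOk = r.status !== "success" || r.errorClass === undefined;
  return totalOk && errorOk;
}

const exampleRecord: UsageRecord = {
  id: "uuid",
  apiKeyId: "key-123",
  provider: "anthropic",
  model: "claude-sonnet-4-5",
  promptTokens: 10000,
  completionTokens: 500,
  cachedTokens: 8000,
  totalTokens: 10500,
  costUsd: 0.0159,
  latencyMs: 1234,
  status: "success",
  timestamp: "2026-06-08T12:00:00Z",
  metadata: {},
};
```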

Where Tokens Come From

Tokens are extracted from the upstream provider's response in the response handler:

// From open-sse/handlers/chatCore.ts
const response = await providerExecutor.execute(provider, request);
const usage = response.usage || {
  prompt_tokens: 0,
  completion_tokens: 0,
  cached_tokens: 0,
};

For providers that don't return usage (some web-cookie providers), OmniRoute estimates tokens using a ~4 chars per token heuristic (see open-sse/services/autoCombo/pipelineRouter.ts).
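
The fallback heuristic might look roughly like the sketch below. The function names and the choice to round up are assumptions made for illustration; the real logic lives in the file referenced above and may differ.

```typescript
// Assumed constant: the guide only states "~4 chars per token".
const CHARS_PER_TOKEN = 4;

// Estimates a token count from text length, rounding up (rounding is an assumption).
function estimateTokens(text: string): number {
  if (text.length === 0) return 0;
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

interface UpstreamUsage {
  prompt_tokens: number;
  completion_tokens: number;
  cached_tokens: number;
}

// Prefers usage reported by the provider and only estimates when it is missing.
function resolveUsage(
  reported: UpstreamUsage | undefined,
  promptText: string,
  completionText: string
): UpstreamUsage {
  if (reported) return reported;
  return {
    prompt_tokens: estimateTokens(promptText),
    completion_tokens: estimateTokens(completionText),
    cached_tokens: 0, // cache hits cannot be inferred from text alone
  };
}
```

Estimated counts are approximate, so costs derived from them are approximate as well.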

Cached Tokens

OmniRoute tracks cached_tokens separately from prompt_tokens because:

Anthropic prompt caching charges a reduced rate for cached tokens (10% of normal)
Some providers return cache_read_input_tokens that should be priced differently
Analytics can show the cache hit rate = cached_tokens / prompt_tokens
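
To make the arithmetic concrete, here is a small sketch of the hit-rate ratio and the input-cost saving it implies. Both helper names are hypothetical, and the 0.1 default reflects the 10% cached rate quoted above for Anthropic; other providers may use a different ratio.

```typescript
// Cache hit rate as defined above: cached_tokens / prompt_tokens.
function cacheHitRate(promptTokens: number, cachedTokens: number): number {
  if (promptTokens <= 0) return 0; // avoid dividing by zero on empty prompts
  return cachedTokens / promptTokens;
}

// Fraction of input cost saved when cached tokens are billed at
// cachedPriceRatio times the normal input price.
function inputSavings(hitRate: number, cachedPriceRatio: number = 0.1): number {
  return hitRate * (1 - cachedPriceRatio);
}
```

For example, a 10,000-token prompt with 8,000 cached tokens has a hit rate of 0.8, which at a 10% cached rate cuts the input cost by about 72%.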

Cost Calculation

Costs are computed from pricing data synced from LiteLLM (src/lib/pricingSync.ts):

Model	Input $/1M	Output $/1M	Cached $/1M
gpt-5	$2.50	$10.00	—
claude-opus-4-6	$15.00	$75.00	$1.50
claude-sonnet-4-5	$3.00	$15.00	$0.30
gemini-2.5-pro	$1.25	$10.00	—

The cost formula (src/lib/usage/costCalculator.ts):

cost = (prompt_tokens - cached_tokens) * input_price
     + cached_tokens * cached_price
     + completion_tokens * output_price

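Why subtract cached from prompt? The cached portion is priced separately; charging the input price on the whole prompt would over-count it. Prices in the formula are per token, so each $/1M figure from the table is divided by 1,000,000 before use.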
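
A minimal sketch of the formula, using two rows from the pricing table. The `ModelPricing` shape and `computeCostUsd` are hypothetical names, and billing cached tokens at the input rate when a model has no cached price is an assumption; this guide does not say what costCalculator.ts does in that case.

```typescript
// Prices are in USD per 1M tokens, as in the pricing table.
interface ModelPricing {
  inputPerM: number;
  outputPerM: number;
  cachedPerM?: number; // absent for models with no cached price
}

const GPT_5: ModelPricing = { inputPerM: 2.5, outputPerM: 10 };
const CLAUDE_SONNET_4_5: ModelPricing = { inputPerM: 3, outputPerM: 15, cachedPerM: 0.3 };

function computeCostUsd(
  pricing: ModelPricing,
  promptTokens: number,
  completionTokens: number,
  cachedTokens: number = 0
): number {
  // Assumption: without a cached price, cached tokens are billed at the input rate.
  const cachedRate = pricing.cachedPerM ?? pricing.inputPerM;
  const perMillion =
    (promptTokens - cachedTokens) * pricing.inputPerM +
    cachedTokens * cachedRate +
    completionTokens * pricing.outputPerM;
  // Convert from $/1M-token units to USD once, at the end.
  return perMillion / 1_000_000;
}
```

With these prices, 1,234 prompt tokens and 567 completion tokens on gpt-5 come to (1234 × 2.50 + 567 × 10.00) / 1,000,000 ≈ $0.008755.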

Pricing Sync

Pricing data is auto-synced from LiteLLM via the /api/pricing/sync endpoint (triggered by the built-in cron task, not a user-facing env var):

# Manual trigger
curl -X POST http://localhost:20128/api/pricing/sync

For models with no pricing data, OmniRoute falls back to estimating cost using internal average rates (sourced from LiteLLM's pricing data).

Date Range Aggregation

The usageAnalytics.ts module computes dashboard widgets from raw usage data. It supports 7 time ranges:

Range	Window	Use case
`1d`	Last 24 hours	Hourly cost spike detection
`7d`	Last 7 days	Weekly review
`30d`	Last 30 days	Monthly billing
`90d`	Last 90 days	Quarterly analysis
`ytd`	Since Jan 1 of current year	Annual budget tracking
`all`	All time	Lifetime stats
`custom`	User-defined start/end	Audits, ad-hoc queries
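
The window column can be read as a mapping from range name to a start timestamp. The sketch below is one way to express it; `rangeStart` is a hypothetical helper, and using UTC boundaries is an assumption (it matches the UTC timestamps on usage records, but usageAnalytics.ts may resolve ranges differently).

```typescript
type TimeRange = "1d" | "7d" | "30d" | "90d" | "ytd" | "all" | "custom";

const DAY_MS = 24 * 60 * 60 * 1000;

// Returns the inclusive start of the window, or null when there is no lower bound.
function rangeStart(range: TimeRange, now: Date, customStart?: string): Date | null {
  switch (range) {
    case "1d":
      return new Date(now.getTime() - DAY_MS);
    case "7d":
      return new Date(now.getTime() - 7 * DAY_MS);
    case "30d":
      return new Date(now.getTime() - 30 * DAY_MS);
    case "90d":
      return new Date(now.getTime() - 90 * DAY_MS);
    case "ytd":
      // Jan 1 of the current year, at midnight UTC (assumption).
      return new Date(Date.UTC(now.getUTCFullYear(), 0, 1));
    case "all":
      return null;
    case "custom":
      if (!customStart) throw new Error("custom range requires a start date");
      return new Date(customStart);
  }
}
```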

Dashboard Widgets Computed

For any date range, the analytics layer computes:

Widget	Description
Summary cards	Total requests, total cost, total tokens, success rate
Daily trend chart	Cost + tokens per day, stacked by model
Activity heatmap	Hour-of-day × day-of-week grid, color = request count
Model breakdown	Pie chart of cost by model
Provider breakdown	Bar chart of requests by provider
Top API keys	Table of top 10 keys by cost
Error analysis	Error rate over time, top error classes
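
Two of these widgets are simple enough to sketch directly: the summary cards and the activity heatmap. The `UsageRow` shape and both function names are hypothetical, and bucketing by UTC hour and weekday is an assumption.

```typescript
// Hypothetical row shape: a subset of the usage record fields.
interface UsageRow {
  costUsd: number;
  totalTokens: number;
  status: string;
  timestamp: string; // ISO 8601 UTC
}

interface SummaryCards {
  totalRequests: number;
  totalCost: number;
  totalTokens: number;
  successRate: number;
}

function summarize(rows: UsageRow[]): SummaryCards {
  let totalCost = 0;
  let totalTokens = 0;
  let successes = 0;
  for (const r of rows) {
    totalCost += r.costUsd;
    totalTokens += r.totalTokens;
    if (r.status === "success") successes += 1;
  }
  return {
    totalRequests: rows.length,
    totalCost,
    totalTokens,
    // An empty range reports 0 rather than NaN.
    successRate: rows.length === 0 ? 0 : successes / rows.length,
  };
}

// Builds a 7 x 24 grid of request counts: grid[dayOfWeek][hour], Sunday = 0, in UTC.
function activityHeatmap(rows: UsageRow[]): number[][] {
  const grid: number[][] = [];
  for (let day = 0; day < 7; day++) {
    const hours: number[] = [];
    for (let h = 0; h < 24; h++) hours.push(0);
    grid.push(hours);
  }
  for (const r of rows) {
    const t = new Date(r.timestamp);
    grid[t.getUTCDay()][t.getUTCHours()] += 1;
  }
  return grid;
}
```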

Programmatic Access

import { computeAnalytics } from "@/lib/usageAnalytics";

const analytics = await computeAnalytics(
  history,              // usage history records
  "7d",                 // time range: "1d" | "7d" | "30d" | "90d" | "ytd" | "all" | "custom"
  connectionMap,        // provider connection map (connectionId → account name)
  {
    startDate: "2025-01-01",  // optional: for "custom" range
    endDate: "2025-06-01",   // optional: for "custom" range
  }
);

console.log(analytics.summary.totalCost);   // 12.34 (cents)
console.log(analytics.byModel[0]);           // { model, cost, requests, promptTokens, completionTokens }

Quota Enforcement

Per-API-key quota is enforced at two thresholds:

Soft limit (quotaWarnAt): the dashboard shows a warning once usage exceeds this threshold
Hard limit (quotaLimit): requests are rejected with HTTP 429 once this limit is exceeded

Configuration

// Per API key; amounts are in cents
await updateApiKey(keyId, {
  quotaWarnAt: 5_00,    // $5.00: show warning
  quotaLimit: 10_00,    // $10.00: hard stop
  quotaWindow: "month", // "day" | "week" | "month" | "all"
});

Enforcement Flow

Request ──▶ quotaCheck()
              │
              ├── Within limit?  ──▶ allow
              │
              └── Over limit?  ──▶ 429 Too Many Requests
                                   with Retry-After header
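
The flow above can be sketched as a pure decision function. The signature, the `QuotaDecision` type, and treating usage equal to the limit as over the limit are all assumptions made for illustration; only the 429 status, the Retry-After header, and the two threshold names come from this guide.

```typescript
// Amounts are in cents, matching the quota configuration fields.
interface QuotaConfig {
  quotaWarnAt?: number;
  quotaLimit?: number;
}

type QuotaDecision =
  | { allowed: true; warn: boolean }
  | { allowed: false; status: 429; retryAfterSeconds: number };

function quotaCheck(
  usedCents: number,
  cfg: QuotaConfig,
  now: Date,
  resetAt: Date | null
): QuotaDecision {
  // Assumption: reaching the limit exactly already blocks further requests.
  if (cfg.quotaLimit !== undefined && usedCents >= cfg.quotaLimit) {
    // Retry-After is the time until the window resets; 0 when it never resets.
    const retryAfterSeconds = resetAt
      ? Math.max(0, Math.ceil((resetAt.getTime() - now.getTime()) / 1000))
      : 0;
    return { allowed: false, status: 429, retryAfterSeconds };
  }
  const warn = cfg.quotaWarnAt !== undefined && usedCents >= cfg.quotaWarnAt;
  return { allowed: true, warn };
}
```

A key with no quotaLimit set is always allowed, which is why the tips later in this guide recommend setting one.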

Quota Snapshots

The quotaSnapshots table stores historical quota state for trend analysis:

Field	Description
`apiKeyId`	The key being tracked
`window`	Quota window: "day", "week", "month", or "all"
`used`	Cost used in this window (cents)
`limit`	The limit (cents)
`resetAt`	When the window resets
`createdAt`	When the snapshot was taken

A snapshot is taken on every request with non-zero cost. Snapshots are used to:

Render the quota progress bar in the dashboard
Show 30-day quota trend charts
Trigger alerts when usage approaches the limit
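
The resetAt field depends on the quota window. A sketch of how the next reset could be derived is shown below; `nextReset` is a hypothetical helper, and both the UTC boundaries and the Monday week start are assumptions that this guide does not confirm.

```typescript
type QuotaWindow = "day" | "week" | "month" | "all";

// Returns when the current window ends, or null for the non-resetting "all" window.
function nextReset(quotaWindow: QuotaWindow, now: Date): Date | null {
  const y = now.getUTCFullYear();
  const m = now.getUTCMonth();
  const d = now.getUTCDate();
  switch (quotaWindow) {
    case "day":
      return new Date(Date.UTC(y, m, d + 1));
    case "week": {
      // Assumption: weeks start on Monday at 00:00 UTC.
      const daysUntilMonday = (8 - now.getUTCDay()) % 7 || 7;
      return new Date(Date.UTC(y, m, d + daysUntilMonday));
    }
    case "month":
      // Date.UTC rolls month 12 over into January of the next year.
      return new Date(Date.UTC(y, m + 1, 1));
    case "all":
      return null;
  }
}
```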

REST API

List Usage Records

GET /api/usage?range=7d&limit=100
GET /api/usage?apiKeyId=key-123&range=30d
GET /api/usage?provider=openai&range=1d

Response:

{
  "records": [
    {
      "id": "uuid",
      "apiKeyId": "key-123",
      "provider": "openai",
      "model": "gpt-5",
      "promptTokens": 1234,
      "completionTokens": 567,
      "totalTokens": 1801,
      "costUsd": 0.008755,
      "latencyMs": 1234,
      "status": "success",
      "timestamp": "2026-06-08T12:00:00Z"
    }
  ],
  "total": 1234,
  "nextCursor": "..."
}

Get Analytics Summary

GET /api/usage/analytics?range=7d&groupBy=model

Response:

{
  "summary": {
    "totalCost": 12.34,
    "totalRequests": 5678,
    "totalTokens": 12345678,
    "successRate": 0.987,
    "avgLatencyMs": 1234
  },
  "models": [
    { "model": "gpt-5", "cost": 8.50, "requests": 1234, "tokens": 4567890 },
    { "model": "claude-opus-4-6", "cost": 3.84, "requests": 234, "tokens": 234567 }
  ],
  "daily": [
    { "date": "2026-06-01", "cost": 1.50, "requests": 800 },
    { "date": "2026-06-02", "cost": 2.00, "requests": 1000 }
  ]
}

Query Usage Analytics

There are no dedicated REST export endpoints; usage data is read through the dashboard, the MCP tools, or these query endpoints:

/api/usage/analytics — aggregated usage metrics (group by model, provider, key)
/api/usage/quota — current quota status per API key
/api/usage/history — request history logs

MCP Tools

Two MCP tools expose usage data to agents (see open-sse/mcp-server/tools/):

Tool	Description
`omniroute_cost_report`	Generates a per-key cost report for a given period
`omniroute_check_quota`	Returns current quota status for an API key

Example agent invocation:

{
  "tool": "omniroute_cost_report",
  "args": { "period": "week" }
}

Retention and Cleanup

Each usage record takes roughly 1-10KB, so storage grows linearly with request volume and can become significant at scale.

Retention Settings

Usage history retention is configured via the Database Settings in the UI or via /api/settings/database.

By default, usage history is retained for 90 days.

Cleanup

Old records are cleaned up by src/lib/db/cleanup.ts:

Triggered by the background cron process
Deletes records from usage_history older than the configured usageHistory retention setting
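
The retention rule amounts to comparing each record's timestamp against a cutoff. The sketch below shows that comparison on in-memory rows; the helper names are hypothetical, and the actual cleanup in cleanup.ts runs against the database rather than an array.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Records strictly older than this instant fall outside the retention period.
function retentionCutoff(now: Date, retentionDays: number = 90): Date {
  return new Date(now.getTime() - retentionDays * DAY_MS);
}

// Splits rows into those to keep and those a cleanup pass would delete.
function splitByRetention<T extends { timestamp: string }>(
  rows: T[],
  cutoff: Date
): { keep: T[]; expired: T[] } {
  const keep: T[] = [];
  const expired: T[] = [];
  for (const row of rows) {
    if (new Date(row.timestamp).getTime() < cutoff.getTime()) expired.push(row);
    else keep.push(row);
  }
  return { keep, expired };
}
```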

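Storage Estimation

The figures below assume about 1KB per record, the low end of the 1-10KB range; multiply by up to 10 for larger records.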

Request rate	30-day storage	90-day storage
100 req/day	~3MB	~9MB
1,000 req/day	~30MB	~90MB
10,000 req/day	~300MB	~900MB
100,000 req/day	~3GB	~9GB
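
The table rows follow from a single multiplication, sketched here so other traffic levels can be estimated. `estimateStorageBytes` is a hypothetical helper; the 1,000-byte default is inferred from the table (100 req/day × 30 days ≈ 3MB) and actual record sizes will vary.

```typescript
// Raw usage storage = requests per day x retention days x bytes per record.
function estimateStorageBytes(
  requestsPerDay: number,
  retentionDays: number,
  bytesPerRecord: number = 1_000
): number {
  return requestsPerDay * retentionDays * bytesPerRecord;
}

// Converts bytes to decimal megabytes for display.
function toMegabytes(bytes: number): number {
  return bytes / 1_000_000;
}
```

For instance, 1,000 requests per day kept for 30 days is about 30MB at 1KB per record, and about 300MB if records average 10KB.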

For very high traffic, consider:

Reducing the retention period via Database Settings
Using aggregated_metrics instead of raw records (only for analytics)

Cost Optimization Tips

1. Use the Right Model

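# Quick answer: use cheap + fast
# (the /v1/chat/completions path is an assumption; use your deployment's chat endpoint)
curl http://localhost:20128/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"auto/fast","messages":[...]}'

# Complex task: use quality
curl http://localhost:20128/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"auto/smart","messages":[...]}'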

2. Enable Caching

Anthropic prompt caching bills cached tokens at 10% of the normal input rate, saving up to 90% on repeated context:

// Caching is automatic: just resend the same large system prompt.
// `openai` is an OpenAI SDK client pointed at your OmniRoute base URL.
const response = await openai.chat.completions.create({
  model: "claude-sonnet-4-5",
  messages: [
    { role: "system", content: longSystemPrompt }, // cached on repeat requests
    { role: "user", content: "..." },
  ],
});

3. Use Compression

RTK + Caveman compression saves 15-95% on tool-heavy sessions:

const config = {
  compression: {
    engine: "rtk",
    intensity: "aggressive"
  }
};

4. Set Per-Key Quotas

Always set quotaLimit to prevent runaway costs:

await updateApiKey(keyId, { quotaLimit: 10_00, quotaWindow: "month" });  // $10.00 per month (amounts in cents)

5. Audit Top Consumers

Use the dashboard or /api/usage/analytics to group by API key and sort by cost:

GET /api/usage/analytics?groupBy=apiKey
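
If you have raw usage records instead, the same audit can be done client-side. The sketch below groups records by API key and returns the biggest spenders; the function and type names are hypothetical, and the default of 10 mirrors the "Top API keys" dashboard widget.

```typescript
interface KeyUsageRow {
  apiKeyId: string;
  costUsd: number;
}

interface KeyTotal {
  apiKeyId: string;
  cost: number;
  requests: number;
}

// Sums cost and request count per key, then returns the top `limit` keys by cost.
function topKeysByCost(rows: KeyUsageRow[], limit: number = 10): KeyTotal[] {
  // A prototype-less object avoids collisions with built-in property names.
  const totals: Record<string, KeyTotal> = Object.create(null);
  for (const r of rows) {
    const t = totals[r.apiKeyId] ?? { apiKeyId: r.apiKeyId, cost: 0, requests: 0 };
    t.cost += r.costUsd;
    t.requests += 1;
    totals[r.apiKeyId] = t;
  }
  return Object.keys(totals)
    .map((k) => totals[k])
    .sort((a, b) => b.cost - a.cost)
    .slice(0, limit);
}
```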

Troubleshooting

"Cost is higher than expected"

Check /api/usage/analytics?groupBy=model — find the expensive model
Check /api/usage/analytics?groupBy=apiKey — find the heavy consumer
Verify pricing data is up to date: POST /api/pricing/sync

"Records missing"

Check DB retention settings under Dashboard → Database → Cleanup — old records are deleted by the periodic cleanup task (src/lib/db/cleanup.ts)
Check for errors in src/lib/db/usage*.ts — DB write failures are logged but not surfaced
Verify the request actually reached chatCore — check combo routing

"Quota not enforcing"

Check the key's quotaLimit setting
Verify quotaWindow is set correctly
Look for quotaSnapshots records; one should be created for every request with non-zero cost
