Quota Share
Doc reference: `docs/routing/QUOTA_SHARE.md`. Part of Group B (plans 16 + 22).
The Quota Sharing Engine distributes a provider's time-based quota (e.g. Codex 5-hour window, Kimi 1500 req/h) fairly across multiple API keys that share the same connection.
Problem it solves: OmniRoute proxies many API keys against the same upstream provider account. Without sharing logic, a burst from key A can exhaust the provider quota for the hour, leaving keys B and C blocked until the window resets. The engine prevents this by:
- Tracking each key's rolling consumption per dimension (%, requests, tokens, $).
- Applying a work-conserving fair-share algorithm: a key may borrow from idle shares while the global pool is not saturated.
- Enforcing the result in the hot path (`chatCore.ts`) before the request reaches the upstream executor.
Implemented in src/lib/quota/fairShare.ts.
| Condition | Mode | Behaviour |
|---|---|---|
| `globalUsedPercent < saturationThreshold` | Generous | Key may borrow up to global limit minus consumed-total |
| `globalUsedPercent >= saturationThreshold` | Strict | Enforce individual fair share strictly |
Default saturationThreshold = 0.5 (env QUOTA_SATURATION_THRESHOLD).
For each active dimension in the pool, the engine computes:
fairShareAllowed = poolLimit × (allocationWeight / 100)
consumed = current rolling value for this key (from QuotaStore.peek)
remaining = fairShareAllowed - consumed
Then:
- `policy = hard`: if `consumed > fairShareAllowed` and mode is strict → block.
- `policy = soft`: if `consumed > fairShareAllowed` and mode is strict → penalize (deprioritize in combo; never hard-block).
- `policy = burst`: allow while global headroom exists regardless of fair share.
capValue + capUnit on an allocation is a hard ceiling independent of mode or
policy. Any dimension where consumed >= capValue always blocks the request.
A request is blocked if any dimension in the pool would block it. Dimensions are independent — a 5h% exhaustion does not affect the weekly% dimension.
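The mode, policy, and cap rules above can be sketched as a small pure function. This is an illustration only, not the code in `src/lib/quota/fairShare.ts`: the type names, `selectMode`, `decideDimension`, and `decideAll` are hypothetical, and the burst branch assumes that "global headroom" means the pool total is still below the pool limit.

```typescript
type Policy = "hard" | "soft" | "burst";
type Mode = "generous" | "strict";

interface DimensionState {
  poolLimit: number;        // global ceiling for this dimension
  allocationWeight: number; // this key's share of the pool, 0..100
  consumed: number;         // rolling value for this key
  consumedTotal: number;    // rolling value summed over all keys in the pool
  capValue?: number;        // optional hard ceiling for this key
}

interface Decision {
  allow: boolean;
  deprioritize: boolean;
}

// Generous below the threshold, strict at or above it.
function selectMode(globalUsedPercent: number, saturationThreshold = 0.5): Mode {
  return globalUsedPercent < saturationThreshold ? "generous" : "strict";
}

function decideDimension(d: DimensionState, policy: Policy, mode: Mode): Decision {
  // The cap is a hard ceiling independent of mode or policy.
  if (d.capValue !== undefined && d.consumed >= d.capValue) {
    return { allow: false, deprioritize: false };
  }
  const fairShareAllowed = d.poolLimit * (d.allocationWeight / 100);
  const overFairShare = d.consumed > fairShareAllowed;

  if (policy === "burst") {
    // Assumption: headroom exists while the pool total is below its limit.
    return { allow: d.consumedTotal < d.poolLimit, deprioritize: false };
  }
  if (mode === "strict" && overFairShare) {
    return policy === "hard"
      ? { allow: false, deprioritize: false } // hard: block
      : { allow: true, deprioritize: true };  // soft: penalize, never block
  }
  return { allow: true, deprioritize: false };
}

// A request is blocked if any dimension blocks it.
function decideAll(decisions: Decision[]): Decision {
  return {
    allow: decisions.every((x) => x.allow),
    deprioritize: decisions.some((x) => x.deprioritize),
  };
}
```

For a key with weight 25 on a pool limit of 100, the fair share is 25; a key that has consumed 30 is blocked under `hard` in strict mode, deprioritized under `soft`, and allowed in generous mode.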
In generous mode, a key whose allocation is under-consumed can use surplus from other keys' unallocated shares. The formula is:
maxAllowed = globalLimit - consumedByOtherKeys
where consumedByOtherKeys = consumedTotal - consumedByThisKey. The global ceiling
(pool limit for that dimension) is always the hard limit.
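As a worked example of the borrowing formula, here is a minimal sketch. The function name is hypothetical, and clamping the result at zero is an assumption added so the value stays meaningful if counters drift above the limit.

```typescript
// Maximum this key may consume in generous mode for one dimension.
function maxAllowedInGenerousMode(
  globalLimit: number,
  consumedTotal: number,
  consumedByThisKey: number,
): number {
  const consumedByOtherKeys = consumedTotal - consumedByThisKey;
  // Assumption: never report a negative allowance.
  return Math.max(0, globalLimit - consumedByOtherKeys);
}

// Pool limit 100, 40 consumed in total, 10 of which by this key:
// the other keys consumed 30, so this key may go up to 70.
const example = maxAllowedInGenerousMode(100, 40, 10);
```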
Implemented in src/lib/quota/sqliteQuotaStore.ts and redisQuotaStore.ts.
Two buckets per (apiKeyId, dimensionKey):
- `curr`: current bucket (`floor(nowMs / windowMs)`)
- `prev`: previous bucket (`curr - 1`)
Effective rolling value:
effectiveBucketIndex = floor(nowMs / windowMs)
bucketStartMs = effectiveBucketIndex × windowMs
elapsed = nowMs - bucketStartMs
weight = 1 - elapsed / windowMs
effective = prev × weight + curr
Precision: ~99% accurate. The error is at most 1% of the window size at the boundary between buckets (inherent to the 2-bucket approximation).
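The effective rolling value can be written as a short function that mirrors the formula line by line. The function name and parameter order are illustrative, not taken from the store drivers.

```typescript
// Two-bucket sliding window: the previous bucket is weighted by the
// fraction of the current bucket that has not elapsed yet.
function effectiveRollingValue(
  prev: number,     // total recorded in the previous bucket
  curr: number,     // total recorded so far in the current bucket
  nowMs: number,
  windowMs: number,
): number {
  const effectiveBucketIndex = Math.floor(nowMs / windowMs);
  const bucketStartMs = effectiveBucketIndex * windowMs;
  const elapsed = nowMs - bucketStartMs;
  const weight = 1 - elapsed / windowMs;
  return prev * weight + curr;
}

// One-hour window, 15 minutes into the current bucket:
// weight = 0.75, so effective = 100 * 0.75 + 20 = 95.
const HOUR_MS = 3_600_000;
const effective = effectiveRollingValue(100, 20, 10 * HOUR_MS + 900_000, HOUR_MS);
```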
SQLite driver: in-memory mutex per (apiKeyId | dimensionKey) key prevents the
read-modify-write race. Pattern mirrors src/sse/services/auth.ts anti-thundering-herd.
Redis driver: Lua EVAL script for atomic increment — runs as a single Redis command.
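One common way to build a per-key in-memory mutex is a promise chain stored in a `Map`. The sketch below shows that general technique; it is not the SQLite driver's code, and `withKeyLock` and the key format are hypothetical.

```typescript
// Tail of the pending-work chain for each lock key.
const locks = new Map<string, Promise<void>>();

// Runs fn while holding the lock for `key`; callers on the same key are
// served in arrival order, callers on different keys do not wait.
async function withKeyLock<T>(key: string, fn: () => Promise<T>): Promise<T> {
  const previous = locks.get(key) ?? Promise.resolve();
  let release!: () => void;
  const current = new Promise<void>((resolve) => {
    release = resolve;
  });
  const tail = previous.then(() => current);
  locks.set(key, tail);

  await previous; // wait for earlier holders of this key
  try {
    return await fn();
  } finally {
    release();
    // Drop the entry once nobody queued behind this call.
    if (locks.get(key) === tail) locks.delete(key);
  }
}
```

A read-modify-write on a counter would then run as `withKeyLock(apiKeyId + "|" + dimensionKey, async () => { ... })`, so two concurrent increments for the same key cannot interleave.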
SQLite store:
- Table: `quota_consumption` (see migrations `073_quota_pools.sql` / `074_quota_consumption.sql`).
- Best for single-instance deployments.
- All persistence is in the existing OmniRoute SQLite DB (`DATA_DIR/storage.sqlite`).
Redis store:
- Requires the `ioredis` npm package.
- Counters stored in Redis; metadata (pools/allocations) still in SQLite.
- Best for multi-replica deployments where counters must be shared.
Via settings UI (/dashboard/settings → Quota Store), or via env vars:
QUOTA_STORE_DRIVER=redis
QUOTA_STORE_REDIS_URL=redis://localhost:6379
The DB setting takes precedence over env vars. If driver=redis but the URL is absent or
ioredis is not installed, the factory falls back to SQLite and logs a warning.
Driver selection order:
1. DB setting `quotaStore.driver`
2. Env `QUOTA_STORE_DRIVER`
3. Default: `sqlite`
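The selection order and the fallback rule can be sketched as a pure function. The names `DriverInputs` and `resolveDriver` are hypothetical, and treating an unrecognized driver value as `sqlite` is an assumption; the real factory may behave differently.

```typescript
type Driver = "sqlite" | "redis";

interface DriverInputs {
  dbDriver?: string;         // value of the quotaStore.driver DB setting, if set
  envDriver?: string;        // value of QUOTA_STORE_DRIVER, if set
  redisUrl?: string;         // resolved Redis URL, if any
  ioredisInstalled: boolean; // whether the ioredis package can be loaded
}

function resolveDriver(
  input: DriverInputs,
  warn: (message: string) => void = console.warn,
): Driver {
  // DB setting first, then env, then the default.
  const requested = input.dbDriver || input.envDriver || "sqlite";
  if (requested !== "redis") return "sqlite"; // assumption: unknown values use sqlite
  if (!input.redisUrl || !input.ioredisInstalled) {
    warn("quota store: redis requested but unavailable, falling back to sqlite");
    return "sqlite";
  }
  return "redis";
}
```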
A pool can have multiple dimensions. Each dimension is independent:
QuotaDimension {
unit: "percent" | "requests" | "tokens" | "usd",
window: "5h" | "hourly" | "daily" | "weekly" | "monthly",
limit: number, // global pool ceiling for this dimension
}
Example: Codex plan (5h% + weekly%):
[
{ "unit": "percent", "window": "5h", "limit": 100 },
{ "unit": "percent", "window": "weekly","limit": 100 }
]
A request must satisfy all dimensions to be allowed.
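When dimensions arrive as JSON (for example from a manual override), a small type guard can reject malformed entries before they reach the engine. This is a sketch under assumptions: `isQuotaDimension` and `parseDimensions` are hypothetical helpers, and accepting `limit = 0` is a choice made here because the catalog uses 0 for an unknown limit.

```typescript
const UNITS = ["percent", "requests", "tokens", "usd"] as const;
const WINDOWS = ["5h", "hourly", "daily", "weekly", "monthly"] as const;

interface QuotaDimension {
  unit: (typeof UNITS)[number];
  window: (typeof WINDOWS)[number];
  limit: number; // global pool ceiling for this dimension
}

function isQuotaDimension(value: unknown): value is QuotaDimension {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    UNITS.some((u) => u === v.unit) &&
    WINDOWS.some((w) => w === v.window) &&
    typeof v.limit === "number" &&
    Number.isFinite(v.limit) &&
    v.limit >= 0
  );
}

// Parses a JSON array of dimensions, throwing on any invalid entry.
function parseDimensions(json: string): QuotaDimension[] {
  const parsed: unknown = JSON.parse(json);
  if (!Array.isArray(parsed) || !parsed.every(isQuotaDimension)) {
    throw new Error("invalid quota dimensions");
  }
  return parsed;
}

const codexPlan = parseDimensions(
  '[{"unit":"percent","window":"5h","limit":100},{"unit":"percent","window":"weekly","limit":100}]',
);
```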
Implemented in src/lib/quota/planResolver.ts.
Precedence (highest to lowest):
1. Manual DB override: `provider_plans` table, per `connectionId`.
2. Known catalog: `src/lib/quota/planRegistry.ts` (data-only).
3. Empty plan: no dimensions, manual configuration required.
| Provider | Dimensions |
|---|---|
| `codex` | percent/5h/100, percent/weekly/100 |
| `glm` | tokens/5h (limit=0, unknown), tokens/weekly |
| `minimax` | tokens/5h, tokens/weekly |
| `bailian` | percent/5h/100, percent/weekly/100, percent/monthly/100 |
| `kimi` | requests/hourly/1500 |
| `alibaba` | requests/monthly/90000 |
| `openai`, `anthropic` | No default; manual configuration required |
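The three-step precedence can be illustrated with a lookup that checks the override first, then the catalog, then falls back to an empty plan. The names below are hypothetical and the catalog holds only two entries copied from the table above; this is not the content of `planResolver.ts` or `planRegistry.ts`.

```typescript
type Dimension = { unit: string; window: string; limit: number };
type PlanSource = "override" | "catalog" | "empty";

interface ResolvedPlan {
  source: PlanSource;
  dimensions: Dimension[];
}

// Two catalog entries from the table; the others are omitted for brevity.
const CATALOG = new Map<string, Dimension[]>([
  ["kimi", [{ unit: "requests", window: "hourly", limit: 1500 }]],
  ["alibaba", [{ unit: "requests", window: "monthly", limit: 90000 }]],
]);

function resolvePlan(
  provider: string,
  connectionId: string,
  overrides: Map<string, Dimension[]>, // manual overrides keyed by connectionId
): ResolvedPlan {
  const override = overrides.get(connectionId);
  if (override) return { source: "override", dimensions: override };

  const known = CATALOG.get(provider);
  if (known) return { source: "catalog", dimensions: known };

  return { source: "empty", dimensions: [] };
}
```

Deleting an override in this model is just removing the map entry, after which the same call resolves to the catalog or to the empty plan.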
Runs before the upstream executor, after auth and policy checks:
resolveComboTargets / handleSingleModel
→ enforceQuotaShare(apiKeyId, connectionId, provider, estimatedCost)
→ getQuotaStore().peek() per dimension
→ fairShare.decideFairShare()
→ if block → return 429 (buildErrorBody, Hard Rule #12)
→ if allow + deprioritize → set quotaSoftPenalty=true on candidate
→ executor.execute()
Fail-open: if enforceQuotaShare throws, the request is allowed through
with a pino.warn log. This prevents a quota-engine bug from blocking all
traffic.
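The fail-open behaviour amounts to wrapping the enforcement call in a try/catch that converts any error into an "allow" decision. A minimal sketch, assuming a hypothetical `EnforceResult` shape and using an injected `warn` callback in place of the pino logger:

```typescript
interface EnforceResult {
  allow: boolean;
  deprioritize: boolean;
}

// Runs the quota check; any thrown error is logged and treated as "allow".
async function enforceFailOpen(
  enforce: () => Promise<EnforceResult>,
  warn: (message: string, error: unknown) => void = console.warn,
): Promise<EnforceResult> {
  try {
    return await enforce();
  } catch (error) {
    warn("quota enforcement failed; allowing request", error);
    return { allow: true, deprioritize: false };
  }
}
```

The trade-off is deliberate: a bug in the quota engine can let a key exceed its share temporarily, but it cannot take down all traffic.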
After a successful response:
executor returns success
→ spendRecorder.recordConsumption(apiKeyId, connectionId, provider, actualCost)
→ getQuotaStore().consume() per dimension
→ fail-open: errors logged as pino.warn, never propagated to client
Drift note: if consume fails post-response, the rolling counter under-counts.
The saturation signal from the provider (e.g. anthropic-ratelimit-unified-5h-utilization)
corrects the global estimate on the next request.
When decision.deprioritize === true:
if (candidate.quotaSoftPenalty) {
score *= QUOTA_SOFT_DEPRIORITIZE_FACTOR; // default 0.7
}
The penalty is applied after all other scoring factors. It lowers the probability that auto-combo selects a saturated key, without hard-blocking it.
Components (all in src/app/(dashboard)/dashboard/costs/quota-share/):
| Component | Purpose |
|---|---|
| `QuotaConceptCard` | Introductory card explaining quota sharing to new users |
| `CreatePoolModal` | Create a new quota pool (connection + name + initial allocations) |
| `PoolCard` | Per-pool summary: name, connection, allocation count |
| `DimensionBar` | Per-dimension stacked bar: each key's share + global usage |
| `AllocationTable` | Table with consumed, fair share, deficit/surplus, borrowing flag |
| `BurnRateChart` | EMA burn-rate line chart (lazy Recharts via `dynamic()`) |
| `EditAllocationsModal` | Edit allocation weights, caps, and policies for a pool |
The page hooks:
- `usePools`: fetches `GET /api/quota/pools` every 30s.
- `usePoolUsage`: fetches `GET /api/quota/pools/[id]/usage` on demand.
- `useLocalStoragePoolMigration`: runs once on mount to migrate legacy LS data.
- `ProviderPlanConfigClient.tsx`: dropdown to select a provider, view resolved plan (auto from catalog or manual override), and edit dimensions.
- Changes write to `PUT /api/quota/plans/[connectionId]`.
- Deletion reverts to catalog or empty plan.
| Variable | Default | Description |
|---|---|---|
| `QUOTA_STORE_DRIVER` | `sqlite` | Driver to use: `sqlite` or `redis` |
| `QUOTA_STORE_REDIS_URL` | (empty) | Redis URL, e.g. `redis://localhost:6379` |
| `QUOTA_SATURATION_THRESHOLD` | `0.5` | 0..1; `>=` threshold activates strict mode |
| `QUOTA_SOFT_DEPRIORITIZE_FACTOR` | `0.7` | 0..1; multiplier for soft-policy combo score |
| `QUOTA_CONSUMPTION_RETENTION_DAYS` | `14` | Days before GC removes old `quota_consumption` buckets |
DB settings (quotaStore.*) override env vars.
Check that ioredis is installed (npm ls ioredis) and QUOTA_STORE_REDIS_URL
is reachable. On connection failure the factory falls back to SQLite (logged at
warn).
If peek throws, enforceQuotaShare treats the result as "allow" (fail-open).
Check pino logs for quota:enforce and quota:factory entries to identify
the root cause.
If actual provider usage differs from the counters, this is expected: the
2-bucket sliding window has ~1% error at window boundaries, and consume is
fire-and-forget post-response. The saturation signal (saturationSignals.ts)
reads the real provider utilization with a 30s TTL and adjusts globalUsedPercent
accordingly.
computeBurnRate requires at least 2 historical samples. New pools without prior
consume calls will show tokensPerSecond: 0 and timeToExhaustionMs: null.
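The two-sample requirement follows from how a rate is derived: it needs at least one interval between samples. The sketch below shows one way an EMA burn rate could be computed. It is not the project's `computeBurnRate`; the `Sample` shape, the `remaining` parameter, and the smoothing factor `alpha = 0.3` are all assumptions.

```typescript
interface Sample {
  atMs: number;     // timestamp of the sample
  consumed: number; // cumulative consumption at that time
}

interface BurnRate {
  tokensPerSecond: number;
  timeToExhaustionMs: number | null;
}

function computeBurnRateSketch(samples: Sample[], remaining: number, alpha = 0.3): BurnRate {
  // Fewer than two samples means there is no interval to measure.
  if (samples.length < 2) return { tokensPerSecond: 0, timeToExhaustionMs: null };

  let ema: number | null = null;
  for (let i = 1; i < samples.length; i++) {
    const dtSec = (samples[i].atMs - samples[i - 1].atMs) / 1000;
    if (dtSec <= 0) continue; // skip duplicate or out-of-order timestamps
    const rate = Math.max(0, samples[i].consumed - samples[i - 1].consumed) / dtSec;
    // Exponential moving average: recent intervals weigh more.
    ema = ema === null ? rate : alpha * rate + (1 - alpha) * ema;
  }

  if (ema === null || ema <= 0) return { tokensPerSecond: 0, timeToExhaustionMs: null };
  return { tokensPerSecond: ema, timeToExhaustionMs: (remaining / ema) * 1000 };
}
```

With a steady 10 tokens per second and 100 tokens remaining, the sketch reports roughly 10 seconds to exhaustion.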
When /dashboard/costs/quota-share first loads, the hook useLocalStoragePoolMigration
checks:
1. `localStorage.getItem("omniroute:quota-share:pools")` is non-empty.
2. `GET /api/quota/pools` returns `[]` (DB is empty).
If both are true, it posts each legacy pool to POST /api/quota/pools in batch,
then removes the localStorage key. The migration is idempotent: condition 2 prevents
re-migration.
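The migration check can be sketched with the storage and API injected as small interfaces, which keeps the logic testable outside a browser. `KeyValueStorage`, `PoolsApi`, and `migrateLegacyPools` are hypothetical names, and the sketch assumes the legacy value is a JSON array of pools.

```typescript
interface KeyValueStorage {
  getItem(key: string): string | null;
  removeItem(key: string): void;
}

interface PoolsApi {
  list(): Promise<unknown[]>;          // stands in for GET /api/quota/pools
  create(pool: unknown): Promise<void>; // stands in for POST /api/quota/pools
}

const LEGACY_KEY = "omniroute:quota-share:pools";

// Returns the number of pools migrated (0 when nothing was done).
async function migrateLegacyPools(storage: KeyValueStorage, api: PoolsApi): Promise<number> {
  const raw = storage.getItem(LEGACY_KEY);
  if (!raw) return 0; // condition 1: legacy data must exist

  const existing = await api.list();
  if (existing.length > 0) return 0; // condition 2: DB must be empty

  const legacy: unknown = JSON.parse(raw);
  if (!Array.isArray(legacy) || legacy.length === 0) return 0; // assumption: JSON array

  for (const pool of legacy) {
    await api.create(pool);
  }
  storage.removeItem(LEGACY_KEY);
  return legacy.length;
}
```

Running it a second time is a no-op: the key is gone, and even if it were restored, the DB would no longer be empty.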
Tables added by migrations 073–075:
- `quota_pools` + `quota_allocations`: pool definitions and per-key allocations.
- `quota_consumption`: rolling 2-bucket counters per `(apiKeyId, dimensionKey)`.
- `provider_plans`: manual provider plan overrides (dimensions JSON per `connectionId`).
All tables added via idempotent CREATE TABLE IF NOT EXISTS migrations.