# Unified admission: one decision across rate, concurrency, cost

`unifiedAdmission(...)` composes the three orthogonal admission axes a real API request must clear, **rate** (req/min), **concurrency** (in-flight ceiling), and **cost** (tokens per window), into a single `Decision` via a pure, four-law algebra (`combineDecisions`). It's the shape LLM gateways need: one decision, one observable binding axis, one shared retry hint.

```ts
import {
  adaptiveConcurrency,
  gcra,
  rateLimit,
  tokenBucket,
  unifiedAdmission,
} from "throttlekit";

const admit = unifiedAdmission({
  rate: rateLimit({ strategy: gcra({ limit: 60, periodMs: 60_000 }) }),
  concurrency: adaptiveConcurrency({ minLimit: 4, maxLimit: 32 }),
  cost: rateLimit({ strategy: tokenBucket({ capacity: 100_000, refillPerSec: 1_667 }) }),
});

// In an express handler:
const { decision, release } = await admit.admit({
  key: req.user.id,
  cost: req.body.maxTokens ?? 1000,
});
if (!decision.allowed) {
  res.setHeader("Retry-After", Math.ceil(decision.retryAfterMs / 1000));
  return res.status(429).json({ error: "rate_limited", retryAfterMs: decision.retryAfterMs });
}
res.on("finish", () => release({ dropped: false }));
// client hung up; a no-op if "finish" already released (release is idempotent)
res.on("close", () => release({ dropped: true }));
await callLLM(req.body);
```

## The algebra (`combineDecisions`)

Aggregation across axes:

| Field | Rule | Why |
|---|---|---|
| `allowed` | `a.allowed && b.allowed` | AND: both must allow |
| `limit` | `min(a.limit, b.limit)` | binding ceiling: what the client should see |
| `remaining` | `min(a.remaining, b.remaining)` | binding remainder |
| `resetAt` | `max(a.resetAt, b.resetAt)` | latest-resolution wait |
| `retryAfterMs` | `max(a.retryAfterMs, b.retryAfterMs)` | dominant wait: never under-state |

Four algebraic laws hold (verified by property tests with `fast-check` at `numRuns ≥ 500`): **identity**, **associativity**, **commutativity**, **idempotency**.
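The rules in the table fit in a few lines. The following is a minimal sketch, not the library's implementation: the `Decision` interface is simplified, `combineSketch`, `combineAll`, and `NEUTRAL` are hypothetical names, and the neutral element's field values are an assumption, chosen so that it is the identity for AND, `min`, and `max`.

```ts
// Simplified Decision shape (an assumption for illustration, not the library's exact type).
interface Decision {
  allowed: boolean;
  limit: number;
  remaining: number;
  resetAt: number; // epoch ms
  retryAfterMs: number;
}

// Assumed neutral element: identity for AND (true), for min (+Infinity), and for max (0).
const NEUTRAL: Decision = {
  allowed: true,
  limit: Infinity,
  remaining: Infinity,
  resetAt: 0,
  retryAfterMs: 0,
};

// Hypothetical stand-in for `combineDecisions`, applying the table row by row.
function combineSketch(a: Decision, b: Decision): Decision {
  return {
    allowed: a.allowed && b.allowed,
    limit: Math.min(a.limit, b.limit),
    remaining: Math.min(a.remaining, b.remaining),
    resetAt: Math.max(a.resetAt, b.resetAt),
    retryAfterMs: Math.max(a.retryAfterMs, b.retryAfterMs),
  };
}

// N-ary composition is a flat reduce seeded with the neutral element.
const combineAll = (decisions: Decision[]): Decision =>
  decisions.reduce((acc, d) => combineSketch(acc, d), NEUTRAL);

const rateAxis: Decision = { allowed: true, limit: 60, remaining: 12, resetAt: 61_000, retryAfterMs: 0 };
const costAxis: Decision = { allowed: false, limit: 100_000, remaining: 0, resetAt: 90_000, retryAfterMs: 4_000 };

// The cost axis denies, so the joint decision denies and carries the longer wait.
const joint = combineAll([rateAxis, costAxis]);
console.log(joint);
```

Every field reduces with AND, `min`, or `max`, so swapping the arguments or combining a decision with itself leaves the output unchanged.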
Together, these four laws mean that axis evaluation order doesn't change the result, N inputs reduce flat, retried sub-checks are safe, and unused axes plug in cleanly via the `ALLOW_FULL` neutral element. `combineDecisions` and `ALLOW_FULL` are publicly exported off the root, which is useful for tests and N-ary composition.

## Two backend modes

| Mode | When to use | Wire cost |
|---|---|---|
| **Sequential** (default) | Any backend mix (in-process + Redis + Postgres) | rate-axis RTT + cost-axis RTT (often pipelined to ~1 RTT) |
| **Lua-fused** (opt-in) | All rate/cost on the same Redis client; you want atomic joint enforcement | 1 RTT regardless of axes |

The **lua-fused** path ships GCRA + tokenBucket fusion in 0.9.0, the LLM-gateway combo:

```ts
import Redis from "ioredis";
import { adaptiveConcurrency, unifiedAdmission } from "throttlekit";
import { fromIoredis } from "throttlekit/redis";

const admit = unifiedAdmission({
  concurrency: adaptiveConcurrency({ minLimit: 4, maxLimit: 32 }),
  backend: "lua-fused",
  fused: {
    client: fromIoredis(new Redis(process.env.REDIS_URL!)),
    rate: { strategy: "gcra", limit: 60, periodMs: 60_000, prefix: "rl:rate" },
    cost: { strategy: "tokenBucket", capacity: 100_000, refillPerSec: 1_667, prefix: "rl:cost" },
  },
});
```

Sequential ≡ Lua-fused: the byte-identical Decision-stream property is checked across 100 fast-check timelines per (rate-binding, cost-binding, both-binding) configuration in `test/admission/fused-conformance.test.ts` (TK-1006).

## Observability: the binding axis

When an admission denies, **which axis was binding?** That's the #1 missing OTel signal for LLM gateways today.
Two helpers from the `throttlekit/observability` subpath:

```ts
import { trace } from "@opentelemetry/api";
import { bindingAxisOf, recordUnifiedAdmissionOnSpan } from "throttlekit/observability";

const { decision, release } = await admit.admit({ key, cost });
const span = trace.getActiveSpan();
if (span) recordUnifiedAdmissionOnSpan(span, decision, admit.lastDecisions());

// Or query directly:
if (!decision.allowed) {
  log.info({ axis: bindingAxisOf(admit.lastDecisions()), retryAfterMs: decision.retryAfterMs });
}
```

The attribute key is `throttlekit.binding_axis`, and its value is one of `"rate"`, `"concurrency"`, or `"cost"`. It's set only on denied admissions (omitted when allowed). When multiple axes deny (possible in lua-fused mode), the convention is **concurrency → rate → cost** priority, which matches sequential's evaluation order, so the value is deterministic regardless of backend.

`UnifiedAdmitter.lastDecisions()` returns a frozen per-axis snapshot (`{ rate, concurrency, cost }`); unconfigured axes are `undefined`, and short-circuited axes are also `undefined` (so you can identify the first denying axis from absence alone).

Behind a ThrottleKit server the binding axis is also readable **remotely and from any language** via the read-only **Monitor door** (`GetSnapshot` / `Watch`), and exported to Prometheus `/metrics` as `throttlekit_denied_by_axis_total`: the same signal, off the request path. See the [Monitoring guide](Monitoring-and-the-Lens) and [Operations](Operations).

## Lifecycle: `admit()` vs `admitSync()`

- `admit()` is **async** and returns a `Promise`; it works for any backend mix (Redis, Postgres, in-process).
- `admitSync()` is the sync sibling. It is only valid when every configured axis has `checkSync` (in-process MemoryStore for rate/cost; concurrency is always sync), and it throws otherwise (same convention as `Limiter.check` / `Limiter.checkSync`).
Both return `{ decision, release }`. The release is the lifecycle hook for the concurrency slot, separate from the Decision because concurrency has *lease semantics* (acquire-release) that don't fit `Limiter`'s stateless `.check() → Decision` shape (the locked decision is **D-U4** in `research/bigger-bets/unified/DESIGN.md` §14).

Idempotency: a second `release()` call is a no-op. A denied admit's `release` is also a no-op (no slot was held; any transient acquire upstream of the binding axis was released as part of the short-circuit).

## Joint-LP policy: bid-price admission (opt-in, 0.11.1)

Marginal-AND admits when each axis independently has room. When the **cost** axis binds and request types differ in value-per-cost-unit, that greedily burns budget on whatever arrives first, including cheap-to-pass, low-value, cost-heavy requests that starve the high-value requests arriving later. The fix from revenue management is a **bid-price filter**: admit iff the request's value clears the shadow price of the budget it consumes,

```
admit ⟺ value ≥ p_R + p_C · cost
```

where `(p_R, p_C)` are the dual variables of the workload's *fluid LP*. The literature (Talluri–van Ryzin 1998; Devanur–Hayes 2009; Buchbinder–Jain–Naor 2007) shows static bid prices are asymptotically fluid-optimal under (approximate) stationarity. `research/bigger-bets/unified/THEORY.md` (TK-1007) calibrated the gap on an LLM-gateway workload:

| ρ (autocorrelation) | regret(marginal-AND) | regret(joint-LP) | ε = marginal − joint |
|---|---|---|---|
| −1.0 (alternation) | 40.00% | 0.00% | **+40.00%** |
| 0.0 (independent) | 40.50% | 1.01% | **+39.49%** |
| +1.0 (one type forever) | 32.50% | 65.00% | **−32.50%** (the foil) |
| **mean** | 38.90% | 13.57% | **+25.33%** |

Mean ε = 25.33% ≫ the 5% ship gate (DR-19) → **shipped as opt-in `policy: "joint-lp"` in 0.11.1**.
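The bid-price test itself is a single comparison. Here is a minimal sketch of it layered on top of a marginal-AND result. `clearsBidPrice`, `admitWithBidPrice`, and the example prices are hypothetical, chosen only to illustrate the inequality above; they are not taken from the library or from THEORY.md.

```ts
// (p_R, p_C): shadow prices of the rate and cost budgets. Field names are illustrative.
interface BidPrices {
  rate: number;
  cost: number;
}

// admit ⟺ value ≥ p_R + p_C · cost
function clearsBidPrice(value: number, cost: number, prices: BidPrices): boolean {
  return value >= prices.rate + prices.cost * cost;
}

// Layered on marginal-AND: the filter can turn an admit into a deny, never the reverse.
function admitWithBidPrice(
  axesAllow: boolean,
  value: number,
  cost: number,
  prices: BidPrices,
): boolean {
  return axesAllow && clearsBidPrice(value, cost, prices);
}

// Made-up prices: a request slot is free, each cost unit is priced at 0.006 value units.
const examplePrices: BidPrices = { rate: 0, cost: 0.006 };

// Small request: threshold ≈ 0.006 · 100 = 0.6, so value 1 clears it.
const smallAdmitted = admitWithBidPrice(true, 1, 100, examplePrices);
// Large request: threshold ≈ 0.006 · 10_000 = 60, so value 50 does not.
const largeAdmitted = admitWithBidPrice(true, 50, 10_000, examplePrices);

console.log({ smallAdmitted, largeAdmitted });
```

With these illustrative prices the large request is rejected even though every axis had room, which is the situation the bid-price filter is designed to detect.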
### API

```ts
// Supply a workload model; the library solves the fluid LP once at construction:
const admit = unifiedAdmission({
  cost: rateLimit({ strategy: tokenBudget({ budget: 50_000, windowMs: 60_000 }) }),
  policy: "joint-lp",
  jointLp: {
    workload: {
      types: [
        { cost: 100, value: 1, weight: 0.5 }, // small completion
        { cost: 10_000, value: 50, weight: 0.5 }, // large completion
      ],
      rateBudget: 1_000,
      costBudget: 50_000,
    },
  },
});

// …or supply precomputed bid prices directly (e.g. solved offline):
// jointLp: { duals: { rate: 0, cost: 0.01 } }

const { decision, release, policyDenied } = admit.admitSync({ cost: 10_000, value: 50 });
// policyDenied === true ⇒ the bid-price filter bound (every axis had room).
```

`solveFluidLp(...)` is also exported standalone (returns `{ duals, admitFractions, objective }`). Per-call `value` defaults to `1`. The policy is **strictly more selective** than marginal-AND: it only ever *removes* admits, so it cannot breach any limit, and it runs identically over the sequential and `lua-fused` backends. Default `"marginal"` is byte-for-byte unchanged. Requires a `cost` axis.

### Honest caveat: do not enable blindly

The ρ = +1 row is **negative**: under a highly autocorrelated, near-*absorbing* workload (long runs of one type), the static fluid-LP duals can *under-perform* marginal-AND. This is the textbook fluid-LP failure under non-stationarity (Talluri–van Ryzin 1998). Real aggregator traffic sits in moderate ρ where joint-LP wins by +39–40%, but if your arrivals are strongly autocorrelated, **re-measure ε on your own trace** and keep the default.

### Online dual refinement: `jointLp.adaptive` (opt-in, 0.11.3)

If you can't pin the prior confidently, let the policy **learn** the bid prices online (Devanur–Hayes sample-then-price).
Requires the `workload` form:

```ts
const admit = unifiedAdmission({
  cost,
  policy: "joint-lp",
  jointLp: {
    workload, // the construction PRIOR (+ per-arrival budgets)
    adaptive: { sampleWindow: 500 }, // observe 500 requests, then re-price
  },
});
```

It prices the first `sampleWindow` requests with the prior while observing the live `(cost, value)` mixture, then re-solves the fluid LP and **adopts the learned duals only if they beat the prior on the observed sample**, else keeps the prior, and then freezes. So:

- a **misspecified** prior is *rescued* (a prior whose duals reject everything is escaped: ~100% → ~20–30% regret in the gate);
- a **correct** prior is *kept* (noise can't dislodge it; the naïve "always re-price" variant instead hurts a correct prior, **9.9–21.1%** vs static's **0.7–1.2%**, so that design was rejected by the gate).

**Honest scope:** the guarantee is non-inferiority *on the observed sample*, not over the full horizon. Under autocorrelated arrivals the window can be unrepresentative and an adopted dual can be slightly worse on the full stream (the ρ=+1 foil's cousin; bounded, +~0.8pp measured). Prefer a larger `sampleWindow` on bursty traffic; the prior is always the floor. With a `concurrency` axis the window counts the concurrency-passed population.

### Concurrency shadow price: the 3-axis filter (opt-in, 0.11.3)

The 2-axis filter prices two *flow* budgets (rate, cost). A third axis, **concurrency**, is a *stock* (a held slot), which looks like it doesn't fit the same fluid relaxation. It does, via **Little's law**: an occupancy cap `L` over a window `T` is a concurrency-seconds budget `K = L·T`, and each admit consumes its **hold time** `h`.
The bid test gains a term:

```
admit ⟺ value ≥ p_R + p_C · cost + p_K · hold
```

This rejects a **hold-time hog**, a request that is cheap and valuable *per token* but holds a worker slot for a long time, which the 2-axis filter is structurally blind to (two requests identical on cost+value but 10× apart in hold time look the same to it). The gate (`three-axis-gate.ts`) measures regret **53% → 2% (ε≈51pp)** when concurrency binds and the hog is indistinguishable on (rate, cost).

```ts
const admit = unifiedAdmission({
  concurrency, // the real occupancy limiter
  cost,
  policy: "joint-lp",
  jointLp: {
    workload: {
      types: [
        { cost: 100, value: 10, weight: 1800, hold: 15 }, // short: frees its slot fast
        { cost: 100, value: 10, weight: 200, hold: 200 }, // long: a concurrency hog
      ],
      rateBudget: 2000,
      costBudget: 1e9,
      concBudget: 20_000, // K = L·T (e.g. L=10 slots × T=2000)
    },
  },
});

// pass the request's expected service time per call:
admit.admitSync({ cost: 100, value: 10, hold: 200 }); // policyDenied: the hog is priced out
```

Honest scope: it earns its keep only when concurrency BINDS and the hog is **strictly dominated** (a bid-price threshold can't ration a *marginal* hog, the same limit as the ρ=+1 foil); when concurrency is ample, `p_K = 0` and it's a no-op. A missing / non-finite / negative per-request `hold` is **fail-open**: it contributes no concurrency term, so it never causes a wrongful reject, and a negative `hold` cannot earn a discount against the price. Not combinable with `jointLp.adaptive` yet.
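The 3-axis test and its fail-open rule for `hold` can be sketched as follows. This is an illustration under assumptions, not the library's code: `clearsThreeAxis`, `holdTerm`, `concBudgetOf`, and the prices are hypothetical, with `p_K` set to a made-up value so that the hog from the example above is priced out.

```ts
// (p_R, p_C, p_K): shadow prices for rate, cost, and concurrency-time. Names are illustrative.
interface ThreeAxisPrices {
  rate: number;
  cost: number;
  conc: number;
}

interface PricedRequest {
  value: number;
  cost: number;
  hold?: number; // expected service time, in the same unit as T in K = L·T
}

// Little's law budget: L slots held over a window T give K = L·T units of slot-time.
const concBudgetOf = (slots: number, window: number): number => slots * window;

// Fail-open: a missing, non-finite, or negative hold contributes no concurrency term.
function holdTerm(pK: number, hold?: number): number {
  if (hold === undefined || !Number.isFinite(hold) || hold < 0) return 0;
  return pK * hold;
}

// admit ⟺ value ≥ p_R + p_C · cost + p_K · hold
function clearsThreeAxis(req: PricedRequest, p: ThreeAxisPrices): boolean {
  return req.value >= p.rate + p.cost * req.cost + holdTerm(p.conc, req.hold);
}

// Made-up prices: only concurrency is scarce.
const prices3: ThreeAxisPrices = { rate: 0, cost: 0, conc: 0.1 };

const shortOk = clearsThreeAxis({ value: 10, cost: 100, hold: 15 }, prices3); // threshold ≈ 1.5
const hogOk = clearsThreeAxis({ value: 10, cost: 100, hold: 200 }, prices3); // threshold ≈ 20
const noHoldOk = clearsThreeAxis({ value: 10, cost: 100 }, prices3); // no hold term
const negHoldOk = clearsThreeAxis({ value: 10, cost: 100, hold: -50 }, prices3); // no hold term

console.log({ K: concBudgetOf(10, 2000), shortOk, hogOk, noHoldOk, negHoldOk });
```

The two typed requests are identical on cost and value, so only the `hold` term separates them. Clamping an invalid `hold` to a zero term keeps the threshold at or above the 2-axis price.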
## Deferred / future work

| Item | Where | Status |
|---|---|---|
| `policy: "joint-lp"` runtime | bid-price filter on `unifiedAdmission` | **✅ shipped 0.11.1** (ε = 25.33%) |
| Online primal-dual (Devanur–Hayes sample-then-price) | `jointLp.adaptive`: guarded warm-up on `unifiedAdmission` | **✅ shipped 0.11.3** (guarded self-validating; D-JLP-13/14) |
| 3-axis joint LP (rate + cost + concurrency shadow price) | `value ≥ p_R + p_C·cost + p_K·hold` via Little's law (`concBudget` + per-request `hold`) | **✅ shipped 0.11.3** (D-JLP-15/16) |
| `Decision.bindingAxis` field | breaking change to Decision shape; use the OTel attr + `lastDecisions()` / `policyDenied` instead | 1.0 candidate |

See `research/bigger-bets/joint-lp-admission/DESIGN.md` (D-JLP-1..16) for the policy design, `research/bigger-bets/unified/DESIGN.md` for the composition algebra, and `research/bigger-bets/PLAN.md` for the roadmap.

## Recipes

### Fastest-fail order matters (when correlated with cost)

Sequential evaluates **concurrency → rate → cost** (in-process first, fastest fail). Commutativity of `combineDecisions` (one of the four laws) means the *result* doesn't depend on order, only the short-circuit cost does. For LLM gateways the concurrency axis usually binds first under load, so this saves the Redis round-trip in the common deny path.

### Federated unified admission

Federation (`federate({ coordinator, ... })`) is shipped as of 0.8.3. `unifiedAdmission` composes with it for free: pass a federated `Limiter` as the rate or cost axis. No new surface is needed; the unified layer doesn't know or care about the federation layer. Tested in `test/admission/unified.test.ts`.

### Configuring multiple tenants

Concurrency is keyless (one global guard per process). Rate and cost are per-key: `admit({ key: "tenant:abc", cost: 1500 })` keys the rate and cost limits independently per tenant. For weighted fair sharing across tenants, layer `weightedFairShare(...)` upstream of `unifiedAdmission`.
### Error path on release

```ts
const { decision, release } = await admit.admit({ key, cost });
if (!decision.allowed) return reject(decision);
try {
  await doWork();
  release({ dropped: false });
} catch (err) {
  release({ dropped: true }); // signal overload → AIMD contracts the ceiling
  throw err;
}
```

The `dropped: true` flag propagates to the underlying gradient2 / AIMD update, so the limit contracts on overload signals.

## See also

- [Distributed & provable](Distributed-and-Provable): `twoTier(leased)`, `windowCoupled`, the per-window overshoot bound.
- [Federation](Federation): the cross-cluster federation primitive that composes with `unifiedAdmission`.
- [Operations](Operations): Prometheus / Grafana / OTel guidance.
- [GALE & TALE](Research): the bounded-overshoot guarantees `unifiedAdmission` plugs into.
- `examples/unified.ts` in the repo: runnable LLM-gateway-style demo.
- `research/bigger-bets/unified/DESIGN.md`: the design lock + decision records.
- `research/bigger-bets/unified/THEORY.md`: joint-vs-marginal empirical regret analysis.