# Unified admission — one decision across rate, concurrency, cost

`unifiedAdmission(...)` composes the three orthogonal admission axes a
real API request must clear — **rate** (req/min), **concurrency**
(in-flight ceiling), and **cost** (tokens per window) — into a single
`Decision` via a pure, four-law algebra (`combineDecisions`). It's the
shape LLM gateways need: one decision, one observable binding axis,
one shared retry hint.

```ts
import {
  adaptiveConcurrency,
  gcra,
  rateLimit,
  tokenBucket,
  unifiedAdmission,
} from "throttlekit";

const admit = unifiedAdmission({
  rate:        rateLimit({ strategy: gcra({ limit: 60, periodMs: 60_000 }) }),
  concurrency: adaptiveConcurrency({ minLimit: 4, maxLimit: 32 }),
  cost:        rateLimit({ strategy: tokenBucket({ capacity: 100_000, refillPerSec: 1_667 }) }),
});

// In an express handler:
const { decision, release } = await admit.admit({
  key:  req.user.id,
  cost: req.body.maxTokens ?? 1000,
});

if (!decision.allowed) {
  res.setHeader("Retry-After", Math.ceil(decision.retryAfterMs / 1000));
  return res.status(429).json({ error: "rate_limited", retryAfterMs: decision.retryAfterMs });
}

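// On a normal response "finish" fires before "close", so the later
// release({ dropped: true }) is a no-op (release is idempotent).
res.on("finish", () => release({ dropped: false }));
res.on("close",  () => release({ dropped: true }));  // client hung up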
await callLLM(req.body);
```

## The algebra (`combineDecisions`)

Aggregation across axes:

| Field | Rule | Why |
|---|---|---|
| `allowed` | `a.allowed && b.allowed` | AND — both must allow |
| `limit` | `min(a.limit, b.limit)` | binding ceiling — what the client should see |
| `remaining` | `min(a.remaining, b.remaining)` | binding remainder |
| `resetAt` | `max(a.resetAt, b.resetAt)` | latest-resolution wait |
| `retryAfterMs` | `max(a.retryAfterMs, b.retryAfterMs)` | dominant wait — never under-state |

Four algebraic laws hold (property-tested with `fast-check` at `numRuns ≥ 500`):
**identity**, **associativity**, **commutativity**, **idempotency**.
Together they mean axis evaluation order doesn't change the result,
N inputs reduce flat, retried sub-checks are safe, and unused axes
plug in cleanly via the `ALLOW_FULL` neutral element.

`combineDecisions` and `ALLOW_FULL` are publicly exported off the root
— useful for tests and N-ary composition.
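
To make the table concrete, here is a minimal sketch of the aggregation rules. It is a hypothetical re-implementation for illustration, not the library's source: the names `combine`, `combineAll`, and `NEUTRAL` are assumptions, and the real `ALLOW_FULL` may use different field values.

```typescript
// Illustrative sketch only: mirrors the rule table above, not throttlekit's code.
interface Decision {
  allowed: boolean;
  limit: number;
  remaining: number;
  resetAt: number;      // epoch milliseconds
  retryAfterMs: number;
}

// Assumed neutral element: never constrains a min, never raises a max.
// (resetAt: 0 is neutral only because timestamps are non-negative.)
const NEUTRAL: Decision = {
  allowed: true,
  limit: Infinity,
  remaining: Infinity,
  resetAt: 0,
  retryAfterMs: 0,
};

function combine(a: Decision, b: Decision): Decision {
  return {
    allowed: a.allowed && b.allowed,
    limit: Math.min(a.limit, b.limit),
    remaining: Math.min(a.remaining, b.remaining),
    resetAt: Math.max(a.resetAt, b.resetAt),
    retryAfterMs: Math.max(a.retryAfterMs, b.retryAfterMs),
  };
}

// N-ary composition is a flat reduce seeded with the neutral element.
function combineAll(decisions: Decision[]): Decision {
  return decisions.reduce(combine, NEUTRAL);
}

const rateAxis: Decision = { allowed: true, limit: 60, remaining: 12, resetAt: 1_000, retryAfterMs: 0 };
const costAxis: Decision = { allowed: false, limit: 100_000, remaining: 0, resetAt: 4_000, retryAfterMs: 3_000 };

const joint = combineAll([rateAxis, costAxis]);
console.log(joint);
```

Because every field is an AND, a min, or a max, the four laws follow field by field, which is why the order of `rateAxis` and `costAxis` in the array does not matter.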

## Two backend modes

| Mode | When to use | Wire cost |
|---|---|---|
| **Sequential** (default) | Any backend mix (in-process + Redis + Postgres) | rate-axis RTT + cost-axis RTT (often pipelined to ~1 RTT) |
| **Lua-fused** (opt-in) | All rate/cost on the same Redis client; you want atomic joint enforcement | 1 RTT regardless of axes |

The **lua-fused** path ships GCRA + tokenBucket fusion in 0.9.0 — the
LLM-gateway combo:

```ts
import Redis from "ioredis";
import { adaptiveConcurrency, unifiedAdmission } from "throttlekit";
import { fromIoredis } from "throttlekit/redis";

const admit = unifiedAdmission({
  concurrency: adaptiveConcurrency({ minLimit: 4, maxLimit: 32 }),
  backend: "lua-fused",
  fused: {
    client: fromIoredis(new Redis(process.env.REDIS_URL!)),
    rate: { strategy: "gcra",        limit: 60,     periodMs: 60_000,    prefix: "rl:rate" },
    cost: { strategy: "tokenBucket", capacity: 100_000, refillPerSec: 1_667, prefix: "rl:cost" },
  },
});
```

Sequential ≡ Lua-fused: the byte-identical Decision-stream property is
checked across 100 fast-check timelines per (rate-binding,
cost-binding, both-binding) configuration in
`test/admission/fused-conformance.test.ts` (TK-1006).

## Observability — the binding axis

When an admission denies, **which axis was binding?** That's the #1
missing OTel signal for LLM gateways today. Two helpers from the
`throttlekit/observability` subpath:

```ts
import { trace } from "@opentelemetry/api";
import { bindingAxisOf, recordUnifiedAdmissionOnSpan } from "throttlekit/observability";

const { decision, release } = await admit.admit({ key, cost });
const span = trace.getActiveSpan();
if (span) recordUnifiedAdmissionOnSpan(span, decision, admit.lastDecisions());

// Or query directly:
if (!decision.allowed) {
  log.info({ axis: bindingAxisOf(admit.lastDecisions()), retryAfterMs: decision.retryAfterMs });
}
```

The attribute key is `throttlekit.binding_axis ∈ {"rate", "concurrency",
"cost"}`. It's set only on denied admissions (omitted when allowed).
When multiple axes deny (possible in lua-fused mode), the convention
is **concurrency → rate → cost** priority — matches sequential's
evaluation order so the value is deterministic regardless of backend.

`UnifiedAdmitter.lastDecisions()` returns a frozen per-axis snapshot
(`{ rate, concurrency, cost }`). Unconfigured axes are `undefined`, and
short-circuited axes are also `undefined`, so on a sequential deny the
denying axis is the last one populated in evaluation order.
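
The priority rule above can be sketched as a small lookup over the per-axis snapshot. This is an illustration under assumed shapes, not the library's `bindingAxisOf`: the `AxisName`, `AxisDecision`, and `AxisSnapshot` types and the `pickBindingAxis` name are hypothetical.

```typescript
// Hypothetical shapes for illustration; the library's own types may differ.
type AxisName = "concurrency" | "rate" | "cost";
interface AxisDecision { allowed: boolean; retryAfterMs: number }
type AxisSnapshot = Partial<Record<AxisName, AxisDecision | undefined>>;

// Priority from the text: concurrency → rate → cost.
const PRIORITY: readonly AxisName[] = ["concurrency", "rate", "cost"];

// Returns the first denying axis in priority order, or undefined if none denied.
function pickBindingAxis(snapshot: AxisSnapshot): AxisName | undefined {
  for (const axis of PRIORITY) {
    const d = snapshot[axis];
    if (d !== undefined && !d.allowed) return axis;
  }
  return undefined;
}

// Sequential-style snapshot: cost was short-circuited, so it is absent.
const sequentialDeny: AxisSnapshot = {
  concurrency: { allowed: true, retryAfterMs: 0 },
  rate: { allowed: false, retryAfterMs: 1_200 },
};

// Fused-style snapshot: rate and cost both deny; rate wins on priority.
const fusedDeny: AxisSnapshot = {
  rate: { allowed: false, retryAfterMs: 1_200 },
  cost: { allowed: false, retryAfterMs: 9_000 },
};

console.log(pickBindingAxis(sequentialDeny), pickBindingAxis(fusedDeny));
```

Both snapshots resolve to the same axis, which is the point of fixing the priority to match the sequential evaluation order.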

Behind a ThrottleKit server the binding axis is also readable **remotely
and from any language** via the read-only **Monitor door** (`GetSnapshot`
/ `Watch`), and exported to Prometheus `/metrics` as
`throttlekit_denied_by_axis_total` — the same signal, off the request
path. See the [Monitoring guide](Monitoring-and-the-Lens) and
[Operations](Operations).

## Lifecycle — `admit()` vs `admitSync()`

- `admit()` is **async** and returns `Promise<UnifiedAdmission>` —
  works for any backend mix (Redis, Postgres, in-process).
- `admitSync()` is the sync sibling — only valid when every configured
  axis has `checkSync` (in-process MemoryStore for rate/cost;
  concurrency is always sync). Throws otherwise (same convention as
  `Limiter.check` / `Limiter.checkSync`).

Both return `{ decision, release }` — the release is the lifecycle
hook for the concurrency slot, separate from the Decision because
concurrency has *lease semantics* (acquire-release) that don't fit
`Limiter`'s stateless `.check() → Decision` shape (the locked
decision is **D-U4** in `research/bigger-bets/unified/DESIGN.md` §14).

Idempotency: a second `release()` call is a no-op. A denied admit's
`release` is a no-op (no slot was held — any transient acquire upstream
of the binding axis was released as part of the short-circuit).
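
The idempotent-release contract can be pictured with a once-guard around the slot-freeing callback. This is a minimal sketch of the pattern, not the library's implementation; `onceRelease` and `ReleaseInfo` are hypothetical names.

```typescript
// Hypothetical shape of the argument passed to release().
type ReleaseInfo = { dropped: boolean };

// Wraps a slot-freeing callback so that only the first call has any effect.
function onceRelease(free: (info: ReleaseInfo) => void): (info: ReleaseInfo) => void {
  let released = false;
  return (info: ReleaseInfo): void => {
    if (released) return; // second and later calls are no-ops
    released = true;
    free(info);
  };
}

// Simulated concurrency slot: one request in flight.
let inFlight = 1;
const outcomes: ReleaseInfo[] = [];
const releaseSlot = onceRelease((info) => {
  inFlight -= 1;
  outcomes.push(info);
});

releaseSlot({ dropped: false }); // frees the slot
releaseSlot({ dropped: true });  // ignored: slot already freed

console.log(inFlight, outcomes);
```

A guard like this is what makes it safe to register a release on more than one completion event for the same request: whichever fires first decides the `dropped` signal.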

## Joint-LP policy — bid-price admission (opt-in, 0.11.1)

Marginal-AND admits when each axis independently has room. When the
**cost** axis binds and request types differ in value-per-cost-unit,
that greedily burns budget on whatever arrives first — including
cheap-to-pass, low-value, cost-heavy requests that starve the
high-value requests arriving later. The fix from revenue management is a
**bid-price filter**: admit iff the request's value clears the shadow
price of the budget it consumes,

```
admit  ⟺  value ≥ p_R + p_C · cost
```

where `(p_R, p_C)` are the dual variables of the workload's *fluid LP*.
The literature (Talluri–van Ryzin 1998; Devanur–Hayes 2009;
Buchbinder–Jain–Naor 2007) shows static bid prices are asymptotically
fluid-optimal under (approximate) stationarity. `research/bigger-bets/unified/THEORY.md`
(TK-1007) calibrated the gap on an LLM-gateway workload:

| ρ (autocorrelation) | regret(marginal-AND) | regret(joint-LP) | ε |
|---|---|---|---|
| −1.0 (alternation) | 40.00% | 0.00% | **+40.00%** |
| 0.0 (independent) | 40.50% | 1.01% | **+39.49%** |
| +1.0 (one type forever) | 32.50% | 65.00% | **−32.50%** (the foil) |
| **mean** | 38.90% | 13.57% | **+25.33%** |

Mean ε = 25.33% ≫ the 5% ship gate (DR-19) → **shipped as opt-in
`policy: "joint-lp"` in 0.11.1**.
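
The bid-price rule `value ≥ p_R + p_C · cost` is a one-line comparison once the duals are known. The sketch below assumes hand-picked example duals (not solved from any workload) and hypothetical names `BidDuals`, `PricedRequest`, and `clearsBidPrice`.

```typescript
// Hypothetical names; the duals here are example numbers, not LP output.
interface BidDuals { rate: number; cost: number }       // (p_R, p_C)
interface PricedRequest { cost: number; value: number }

// Admit iff the request's value clears the shadow price of what it consumes.
function clearsBidPrice(req: PricedRequest, duals: BidDuals): boolean {
  const threshold = duals.rate + duals.cost * req.cost;
  return req.value >= threshold;
}

const exampleDuals: BidDuals = { rate: 0, cost: 0.01 };

// Threshold 0 + 0.01 * 100 = 1, value 5 clears it.
const smallOk = clearsBidPrice({ cost: 100, value: 5 }, exampleDuals);

// Threshold 0 + 0.01 * 10_000 = 100, value 50 does not clear it.
const largeOk = clearsBidPrice({ cost: 10_000, value: 50 }, exampleDuals);

console.log({ smallOk, largeOk });
```

The filter only compares value against price; whether each axis actually has room is still decided by the limiters themselves.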

### API

```ts
// Supply a workload model — the library solves the fluid LP once at construction:
const admit = unifiedAdmission({
  cost: rateLimit({ strategy: tokenBudget({ budget: 50_000, windowMs: 60_000 }) }),
  policy: "joint-lp",
  jointLp: {
    workload: {
      types: [
        { cost: 100,    value: 1,  weight: 0.5 },  // small completion
        { cost: 10_000, value: 50, weight: 0.5 },  // large completion
      ],
      rateBudget: 1_000,
      costBudget: 50_000,
    },
  },
});

// …or supply precomputed bid prices directly (e.g. solved offline):
//   jointLp: { duals: { rate: 0, cost: 0.01 } }

const { decision, release, policyDenied } = admit.admitSync({ cost: 10_000, value: 50 });
// policyDenied === true ⇒ the bid-price filter bound (every axis had room).
```

`solveFluidLp(...)` is also exported standalone (returns `{ duals, admitFractions, objective }`).
Per-call `value` defaults to `1`. The policy is **strictly more selective** than
marginal-AND — it only ever *removes* admits, so it cannot breach any limit — and
runs identically over the sequential and `lua-fused` backends. Default `"marginal"`
is byte-for-byte unchanged. Requires a `cost` axis.

### Honest caveat — do not enable blindly

The ρ = +1 row is **negative**: under a highly autocorrelated,
near-*absorbing* workload (long runs of one type), the static fluid-LP
duals can *under-perform* marginal-AND. This is the textbook fluid-LP failure
under non-stationarity (Talluri–van Ryzin 1998). Real aggregator traffic
sits in moderate ρ where joint-LP wins by +39–40%, but if your arrivals
are strongly autocorrelated, **re-measure ε on your own trace** and keep
the default `"marginal"` policy unless joint-LP still comes out ahead.

### Online dual refinement — `jointLp.adaptive` (opt-in, 0.11.3)

If you can't pin the prior confidently, let the policy **learn** the bid prices online
(Devanur–Hayes sample-then-price). Requires the `workload` form:

```ts
const admit = unifiedAdmission({
  cost,
  policy: "joint-lp",
  jointLp: {
    workload,                        // the construction PRIOR (+ per-arrival budgets)
    adaptive: { sampleWindow: 500 }, // observe 500 requests, then re-price
  },
});
```

It prices the first `sampleWindow` requests with the prior while observing the live
`(cost, value)` mixture, then re-solves the fluid LP and **adopts the learned duals only
if they beat the prior on the observed sample**, else keeps the prior — then freezes. So:

- a **misspecified** prior is *rescued* (a prior whose duals reject everything is escaped —
  ~100% → ~20–30% regret in the gate);
- a **correct** prior is *kept* (noise can't dislodge it; the naïve "always re-price" variant
  instead hurts a correct prior, **9.9–21.1%** vs static's **0.7–1.2%** — that design was
  rejected by the gate).
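
The guard in the list above ("adopt only if the learned duals beat the prior on the sample") can be sketched as a replay comparison. This is an assumption-heavy illustration: the text does not say how "beat" is scored, so the sketch scores each candidate by total admitted value when the sample is replayed under a cost budget. `replayValue`, `chooseDuals`, and all numbers are hypothetical.

```typescript
interface CandidateDuals { rate: number; cost: number }
interface Observation { cost: number; value: number }

// Assumed scoring: replay the sample in order under a cost budget and
// sum the value of every request that clears the bid price and fits.
function replayValue(sample: Observation[], duals: CandidateDuals, costBudget: number): number {
  let remaining = costBudget;
  let total = 0;
  for (const o of sample) {
    const clears = o.value >= duals.rate + duals.cost * o.cost;
    if (clears && o.cost <= remaining) {
      remaining -= o.cost;
      total += o.value;
    }
  }
  return total;
}

// Adopt the learned duals only on a strict improvement; ties keep the prior.
function chooseDuals(
  prior: CandidateDuals,
  learned: CandidateDuals,
  sample: Observation[],
  costBudget: number,
): CandidateDuals {
  const better = replayValue(sample, learned, costBudget) > replayValue(sample, prior, costBudget);
  return better ? learned : prior;
}

// Alternating small and large requests, four of each.
const observed: Observation[] = [];
for (let i = 0; i < 4; i++) {
  observed.push({ cost: 100, value: 1 }, { cost: 10_000, value: 50 });
}

const rejectAll: CandidateDuals = { rate: 0, cost: 1 };     // misspecified prior
const learnedOk: CandidateDuals = { rate: 0, cost: 0.004 }; // admits both types
const rescued = chooseDuals(rejectAll, learnedOk, observed, 20_000);

console.log(rescued);
```

Keeping the prior on ties is what lets a correct prior survive a noisy re-solve in this sketch: a learned dual that merely matches it is never adopted.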

**Honest scope:** the guarantee is non-inferiority *on the observed sample*, not over the
full horizon — under autocorrelated arrivals the window can be unrepresentative and an
adopted dual can be slightly worse on the full stream (the ρ=+1 foil's cousin; bounded,
+~0.8pp measured). Prefer a larger `sampleWindow` on bursty traffic; the prior is always the
floor. With a `concurrency` axis the window counts the concurrency-passed population.

### Concurrency shadow price — the 3-axis filter (opt-in, 0.11.3)

The 2-axis filter prices two *flow* budgets (rate, cost). A third axis — **concurrency** —
is a *stock* (a held slot), which looks like it doesn't fit the same fluid relaxation. It
does, via **Little's law**: an occupancy cap `L` over a window `T` is a concurrency-seconds
budget `K = L·T`, and each admit consumes its **hold time** `h`. The bid test gains a term:

```
admit iff  value ≥ p_R + p_C·cost + p_K·hold
```

This rejects a **hold-time hog** — a request that is cheap and valuable *per token* but holds
a worker slot for a long time — that the 2-axis filter is structurally blind to (two requests
identical on cost+value but 10× apart in hold time look the same to it). The gate
(`three-axis-gate.ts`) measures regret **53% → 2% (ε≈51pp)** when concurrency binds and the hog
is indistinguishable on (rate, cost).

```ts
const admit = unifiedAdmission({
  concurrency,                                 // the real occupancy limiter
  cost,
  policy: "joint-lp",
  jointLp: {
    workload: {
      types: [
        { cost: 100, value: 10, weight: 1800, hold: 15 },   // short — frees its slot fast
        { cost: 100, value: 10, weight: 200, hold: 200 },    // long  — a concurrency hog
      ],
      rateBudget: 2000, costBudget: 1e9,
      concBudget: 20_000,                       // K = L·T  (e.g. L=10 slots × T=2000)
    },
  },
});
// pass the request's expected service time per call:
admit.admitSync({ cost: 100, value: 10, hold: 200 }); // policyDenied — the hog is priced out
```

Honest scope: it earns its keep only when concurrency BINDS and the hog is **strictly dominated**
(a bid-price threshold can't ration a *marginal* hog, the same limit as the ρ=+1 foil); when
concurrency is ample, `p_K = 0` and it's a no-op. A missing / non-finite / negative per-request
`hold` is **fail-open**: the concurrency term is dropped, so it never causes a wrongful reject, and a
negative hold can't turn the term into a discount. Not combinable with `jointLp.adaptive` yet.
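
A minimal sketch of the 3-axis test and the Little's-law budget, including the fail-open handling of `hold`. The duals below are hand-picked for illustration (the library solves them from the workload), and `ThreeAxisDuals`, `HeldRequest`, `concurrencyBudget`, and `clearsThreeAxisBid` are hypothetical names.

```typescript
interface ThreeAxisDuals { rate: number; cost: number; conc: number } // (p_R, p_C, p_K)
interface HeldRequest { cost: number; value: number; hold?: number }

// Little's law budget: L slots over a window of length T gives K = L * T.
function concurrencyBudget(slots: number, windowLength: number): number {
  return slots * windowLength;
}

function clearsThreeAxisBid(req: HeldRequest, duals: ThreeAxisDuals): boolean {
  const h = req.hold;
  // Fail-open: a missing, non-finite, or negative hold adds no concurrency term.
  const usable = h !== undefined && Number.isFinite(h) && h >= 0;
  const holdTerm = usable ? duals.conc * h : 0;
  return req.value >= duals.rate + duals.cost * req.cost + holdTerm;
}

// Example duals for a concurrency-bound workload (assumed, not solved).
const concBound: ThreeAxisDuals = { rate: 0, cost: 0, conc: 0.1 };

const budgetK = concurrencyBudget(10, 2_000);                                         // 20_000
const shortOk = clearsThreeAxisBid({ cost: 100, value: 10, hold: 15 }, concBound);    // 10 >= 1.5
const hogOk = clearsThreeAxisBid({ cost: 100, value: 10, hold: 200 }, concBound);     // 10 < 20
const noHoldOk = clearsThreeAxisBid({ cost: 100, value: 10 }, concBound);             // no hold term

console.log({ budgetK, shortOk, hogOk, noHoldOk });
```

The two requests are identical on cost and value, so only the `p_K · hold` term separates them.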

## Deferred / future work

| Item | Where | Status |
|---|---|---|
| `policy: "joint-lp"` runtime | bid-price filter on `unifiedAdmission` | **✅ shipped 0.11.1** (ε = 25.33%) |
| Online primal-dual (Devanur–Hayes sample-then-price) | `jointLp.adaptive` — guarded warm-up on `unifiedAdmission` | **✅ shipped 0.11.3** (guarded self-validating; D-JLP-13/14) |
| 3-axis joint LP (rate + cost + concurrency shadow price) | `value ≥ p_R + p_C·cost + p_K·hold` via Little's law (`concBudget` + per-request `hold`) | **✅ shipped 0.11.3** (D-JLP-15/16) |
| `Decision.bindingAxis` field | breaking change to Decision shape — use the OTel attr + `lastDecisions()` / `policyDenied` instead | 1.0 candidate |

See `research/bigger-bets/joint-lp-admission/DESIGN.md` (D-JLP-1..16) for
the policy design, `research/bigger-bets/unified/DESIGN.md` for the
composition algebra, and `research/bigger-bets/PLAN.md` for the roadmap.

## Recipes

### Fastest-fail order matters (when correlated with cost)

Sequential evaluates **concurrency → rate → cost** (in-process first,
fastest fail). Commutativity of `combineDecisions` (one of the four laws
above) means the *result* doesn't depend on order; only the short-circuit
cost does. For LLM gateways the concurrency axis usually binds first
under load, so this saves the Redis round-trip in the common deny path.
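
The saving from short-circuiting can be seen in a small sketch of ordered evaluation. It is illustrative only: checks are modeled as synchronous thunks for brevity (a Redis-backed axis would be async), and `evaluateInOrder`, `AxisCheck`, and the counter are hypothetical.

```typescript
type OrderedAxis = "concurrency" | "rate" | "cost";
interface AxisResult { allowed: boolean; retryAfterMs: number }
type AxisCheck = () => AxisResult;

const EVAL_ORDER: readonly OrderedAxis[] = ["concurrency", "rate", "cost"];

// Runs configured checks in order and stops at the first deny.
// Axes after the deny are never called and stay absent from the result.
function evaluateInOrder(
  checks: Partial<Record<OrderedAxis, AxisCheck>>,
): Partial<Record<OrderedAxis, AxisResult>> {
  const evaluated: Partial<Record<OrderedAxis, AxisResult>> = {};
  for (const axis of EVAL_ORDER) {
    const check = checks[axis];
    if (!check) continue; // unconfigured axis
    const d = check();
    evaluated[axis] = d;
    if (!d.allowed) break; // short-circuit
  }
  return evaluated;
}

// Count how often the (simulated) remote axes are consulted.
let remoteCalls = 0;
const remoteAllow: AxisCheck = () => {
  remoteCalls += 1;
  return { allowed: true, retryAfterMs: 0 };
};

const underLoad = evaluateInOrder({
  concurrency: () => ({ allowed: false, retryAfterMs: 50 }), // in-process, denies
  rate: remoteAllow,
  cost: remoteAllow,
});

console.log(underLoad, remoteCalls);
```

With concurrency denying first, neither simulated remote check runs, which is the round-trip the recipe is describing.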

### Federated unified admission

Federation (`federate({ coordinator, ... })`) is shipped as of 0.8.3.
`unifiedAdmission` composes with it for free — pass a federated
`Limiter` as the rate or cost axis. No new surface is needed; the
unified layer doesn't know or care about the federation layer. Tested
in `test/admission/unified.test.ts`.

### Configuring multiple tenants

Concurrency is keyless (one global guard per process). Rate and cost
are per-key — `admit({ key: "tenant:abc", cost: 1500 })` keys the rate
and cost limits independently per tenant. For weighted fair sharing
across tenants, layer `weightedFairShare(...)` upstream of
`unifiedAdmission`.

### Error path on release

```ts
const { decision, release } = await admit.admit({ key, cost });
if (!decision.allowed) return reject(decision);
try {
  await doWork();
  release({ dropped: false });
} catch (err) {
  release({ dropped: true });  // signal overload → AIMD contracts the ceiling
  throw err;
}
```

The `dropped: true` flag propagates to the underlying gradient2 / AIMD
update — the limit contracts on overload signals.
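
For intuition, here is a generic AIMD update: additive increase on a clean release, multiplicative decrease on a dropped one, clamped to the configured range. This is plain textbook AIMD, not the library's gradient2 logic, and `aimdUpdate`, `increase`, and `backoff` are hypothetical parameters chosen for the example.

```typescript
interface AimdOptions {
  minLimit: number;
  maxLimit: number;
  increase: number; // slots added per clean release (assumed)
  backoff: number;  // multiplier applied on a drop, between 0 and 1 (assumed)
}

// Returns the next concurrency limit after one release.
function aimdUpdate(limit: number, dropped: boolean, opts: AimdOptions): number {
  const next = dropped
    ? Math.floor(limit * opts.backoff) // multiplicative decrease
    : limit + opts.increase;           // additive increase
  return Math.min(opts.maxLimit, Math.max(opts.minLimit, next));
}

const aimdOpts: AimdOptions = { minLimit: 4, maxLimit: 32, increase: 1, backoff: 0.5 };

const afterSuccess = aimdUpdate(16, false, aimdOpts); // 17
const afterDrop = aimdUpdate(16, true, aimdOpts);     // 8
const atFloor = aimdUpdate(6, true, aimdOpts);        // 3, clamped up to 4
const atCeiling = aimdUpdate(32, false, aimdOpts);    // 33, clamped down to 32

console.log({ afterSuccess, afterDrop, atFloor, atCeiling });
```

The asymmetry is deliberate: the limit backs off quickly on an overload signal and recovers slowly.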

## See also

- [Distributed & provable](Distributed-and-Provable) — `twoTier(leased)`,
  `windowCoupled`, the per-window overshoot bound.
- [Federation](Federation) — the cross-cluster federation primitive
  that composes with `unifiedAdmission`.
- [Operations](Operations) — Prometheus / Grafana / OTel guidance.
- [GALE & TALE](Research) — the bounded-overshoot guarantees
  `unifiedAdmission` plugs into.
- `examples/unified.ts` in the repo — runnable LLM-gateway-style demo.
- `research/bigger-bets/unified/DESIGN.md` — the design lock + decision records.
- `research/bigger-bets/unified/THEORY.md` — joint-vs-marginal empirical
  regret analysis.