-
Notifications
You must be signed in to change notification settings - Fork 0
Pillar 4 Weighted Fair Escrow
weightedFairEscrow(...) splits a shared per-window budget across
tenants in weighted-max-min fair proportion, with idle tenants' surplus
reclaimed by backlogged ones. It's the production graduation of GALE's
Pillar 4 (research/gale/PILLAR4-fairness.md) — work-conserving, weight-
honoring, with a hard Δ = 0 per-window cap inherited from Pillar 1.
import { weightedFairEscrow } from "throttlekit";
const TIERS = { enterprise: 4, pro: 2, free: 1 };
const escrow = weightedFairEscrow({
limit: 30_000, // L tokens / window
windowMs: 60_000,
weightOf: (tenant) => TIERS[tenant.split(":")[0]] ?? 1,
l1: { maxKeys: 1024 }, // cap tenant set
});
const d = await escrow.check("enterprise:alpha", 8000);
if (!d.allowed) return reject({ retryAfterMs: d.retryAfterMs });
// d.limit is THIS tenant's current ceiling; d.remaining the headroom.ThrottleKit ships three fairness primitives that compose orthogonally:
| Primitive | Best for | Distributed? | Work-conserving? | Weight-honoring? |
|---|---|---|---|---|
fairShare |
Single-process equal-share | no | no (FCFS surplus) | no (equal weights) |
weightedFairShare |
Single-process weight-skewed | no | no (FCFS surplus) | yes (approximate) |
weightedFairEscrow |
Multi-process, work-conserving, weight-honoring | yes | yes (water-filled) | yes (exact) |
The difference matters under multi-tenant overload. With
weightedFairShare, a low-weight flood gets unused budget on a
first-come basis — the high-weight tenant's idle share is consumed by
whoever happens to ask first. WFE redistributes that surplus by weight:
under skew, an idle high-priority tenant's quota flows to backlogged
tenants in proportion to their weights, not to the loudest one.
Each tenant has a dynamic guaranteed share gᵢ = ⌊wᵢ·L/W⌋
(recomputed from the current active set on every check; W = total
weight of active tenants). Each .check() is hierarchical:
-
Within guarantee (
used + cost ≤ gᵢ): always allowed, subject to the global capΣ used + cost ≤ L. This is the T2 sharing-incentive promise — every backlogged tenant is admissible up togᵢbefore any other tenant can borrow beyond theirgⱼ. -
Borrow phase (
used + cost > gᵢ): the asker tries to grow into surplus other tenants haven't claimed. The pessimistic surplus isborrowAvailable = max(0, (L − Σ used) − Σⱼ≠ᵢ max(0, gⱼ − usedⱼ))— the unallocated budget minus what would still need to be served to other backlogged tenants' guarantees. The per-call borrow is capped at the call's own
cost, matching Shreedhar-Varghese DRR semantics (q = cost). Ifcost ≤ borrowAvailable, the request is admitted; otherwise denied.
| Theorem | Statement |
|---|---|
| T1 safety |
Σ used ≤ L always (global cap; never over-admits even under adversarial scheduling). |
| T2 sharing-incentive | Every active tenant is admissible up to gᵢ before any other tenant can borrow past gⱼ. |
| T3 work-conservation | At end-of-window with continuously-backlogged tenants, Σ used = L (full utilisation). |
| T4 bounded unfairness | For continuously-backlogged tenants, ` |
All four hold under fast-check at numRuns: 200 (test/twotier/ weighted-fair-escrow-properties.test.ts). The pure batch algebra
(weightedMaxMin) is independently machine-checked at 20 000 random
trials (test/gale/fair-escrow.test.ts).
- Not strategy-proof. Per FairRide (Pu et al., NSDI'16): no shared-cache primitive can be simultaneously sharing-incentive, work-conserving, AND strategy-proof. WFE takes the first two and concedes the third. A tenant can over-declare demand to claim surplus; window-coupling bounds the gain to one window.
-
Flat tenant set in the base limiter. A single
weightedFairEscrowhas one weight per tenant. The recursive WFE composition — an internal node weighted by the sum of its children — is shipped as Federated Weighted Fair Escrow for the region→tenant hierarchy (the GPS-decomposition special case). Arbitrary nested weights (teams-within-orgs) reuse the same two-level machinery and are the documented next step (DR-P4-8).
When the budget needs to be shared across processes (a fleet of
gateways, a horizontally-scaled service), pass any distributed
Store and a per-process quantum:
import Redis from "ioredis";
import { fromIoredis } from "throttlekit/redis";
const escrow = weightedFairEscrow({
limit: 200_000, windowMs: 60_000,
weightOf: (t) => TIERS[t.split(":")[0]] ?? 1,
l2: fromIoredis(new Redis(process.env.REDIS_URL!)),
quantum: 500, // per-process lease size
l2Key: "tk:wfe:gateway-prod", // same key on every process
});
// Async-only when l2 is configured (the lease step is async):
const d = await escrow.check("enterprise:alpha", 1500);Internally each process leases quantum credits at a time from a
shared fixedWindow({ limit: L, windowMs }) counter in the store — the
same atomic Lua that ships with distributedTokenBudget, so the wire
surface adds zero new scripts (DR-P4-5). The per-tenant fairness
arithmetic stays in-process; the shared store's atomicity is what
bounds the global Σ used ≤ L.
Tuning quantum. Smaller quantum = tighter cross-process fairness,
more RTTs (O(L/quantum) Redis calls per window). Larger quantum =
fewer RTTs, looser bound (the cross-process T4 picks up a quantum-
scaled DRR slack: Σₚ Q⁽ᵖ⁾ · (1/wᵢ + 1/wⱼ)). For an LLM gateway with
L = 200 000 TPM and 4 process replicas, quantum = 500 gives ~400
RTTs per minute per process and a sub-millisecond cross-process T4 in
the common case.
A single weightedFairEscrow splits ONE budget L across tenants in one
process. Federated WFE (federatedWeightedFairEscrow +
regionFairPool, shipped — TK-1404) composes that primitive
across regions so that a tenant's global total — summed over
every region it is active in — is the weighted-max-min allocation a
single, flat, global WFE over L would produce. The regions are
plumbing, not a fairness boundary: a tenant is neither helped nor hurt by
which region it lives in.
Hierarchical max-min fairness is, in general, not flat max-min fairness. Running WFE per region over a budget shared by a plain first-come-first-served counter gives in-region weighted fairness but cross-region FCFS pooling — a heavily-weighted region that arrives late is starved by a lightly-weighted one (HLS isolation, Saeed et al. 2021). Flat global fairness emerges only under the Parekh–Gallager GPS decomposition: an internal node weighted by the sum of its children's weights reproduces the flat allocation. Federated WFE realises that as two composed WFEs:
-
Cross-region WFE —
regionFairPool, a weighted-fair reservation layer whose "tenants" are regions. Regionr's weight is its dynamic active aggregate tenant weightW_r = Σ w_{t,r}; the pool reserves regionrat least⌊W_r/ΣW · L⌋(a busy region cannot steal it) and lets it borrow idle regions' surplus. A plain counter cannot reserve — exactly why cross-region weights need this layer. -
In-region WFE —
federatedWeightedFairEscrow, the same per-tenantgᵢ = ⌊wᵢ·L_r/W_r⌋arithmetic from above, splitting each region's pool-granted budgetL_ramong its tenants by weight.
import { regionFairPool, federatedWeightedFairEscrow } from "throttlekit/twotier";
// ONE shared cross-region pool per global budget L.
const pool = regionFairPool({ limit: 30_000, windowMs: 60_000 });
const TIERS = { enterprise: 4, pro: 2, free: 1 };
const weightOf = (t: string) => TIERS[t.split(":")[0]] ?? 1;
// One limiter per region; they ALL share the same pool.
const usEast = federatedWeightedFairEscrow({ region: "us-east", pool, weightOf });
const euWest = federatedWeightedFairEscrow({ region: "eu-west", pool, weightOf });
const d = usEast.checkSync("enterprise:alpha", 1_500); // sync — the in-process pool is synchronous
// `.check(...)` is the Promise form; `.reset(tenant?)`, `.stats()` mirror the base limiter.Each region's weightOf returns the tenant's weight as seen in that
region (see the split rule below). regionFairPool owns limit,
windowMs, and the clock; every region drawing from one global budget
must share one pool instance.
A tenant active in one region returns its full global weight w_t. A
tenant active in several regions must have its weight split
demand-proportionally:
w_{t,r} = w_t · d_{t,r} / d_t (so Σ_r w_{t,r} = w_t)
Returning the full w_t in every region double-counts the tenant —
it claims a full-weight floor in each region and is over-served ≈k× (k =
the number of regions it spans). The split keeps its global weight at
exactly w_t, so its global share matches the flat ideal.
| Theorem | Statement |
|---|---|
| T-FED-1 safety | The pool grants Σ_r L_r ≤ L and in-region WFE serves Σ used ≤ L_r, so Σ admitted ≤ L globally, independent of region count (Δ = 0, inherited from Pillar 1). |
| T-FED-2 fluid exactness | Under the three conditions (region weight = ΣW_r, in-region WFE, demand-proportional split), each tenant's global total equals the flat global weighted-max-min ideal exactly in the fluid limit. |
| T-FED-3 bounded discrete error | With discrete granting, |a_t − a*_t| ≤ span(t)·(2·q_R+1) — a two-level DRR residual (q_R = region-level lease quantum). Verified ≈0.02% relative. |
Proof + the machine-checked gate (0 deviation on structured + random
worlds, never over-admits):
research/bigger-bets/federation/federated-wfe-gate.ts;
theorem write-up in
research/gale/PILLAR4-fairness.md
§"Federated composition".
- Exact in the all-backlogged limit (the common overload case).
- Mixed saturation (a region with a demand-bottlenecked co-tenant) inherits in-region WFE's T3 reserve gap: the shipped streaming code reclaims only from truly-absent regions (which never call the pool), not from a present-but-paused one — so it strands a saturated participant's reserve and deviates from the clairvoyant fluid oracle by that fraction. This is exactly what a flat streaming WFE does; no streaming limiter matches the clairvoyant oracle without a demand oracle.
-
Topology.
regionFairPoolis the in-process pool — correct and complete for a single arbiter process all regions consult (e.g. a central rate-limit service), and the substrate the tests verify against the flat oracle. To distribute the pool across separate region processes,RedisRegionFairPoolkeeps the same per-region accounting in a shared store (the weighted-max-min ceiling math ported to atomic Lua over a Redis hash, the weighted analog ofRegionalEscrow's Lua) — verified grant-for-grant against the in-process oracle on live Redis — so federated fair-escrow stays correct when each region runs as its own process (DR-FWFE-1). Behind a server this is also reachable over the existingCheckRPC (federatedFairEscrow:) with no client change — see Scaling & the Fleet.
Runnable demo:
examples/federated-weighted-fair-escrow.ts
(npx tsx examples/federated-weighted-fair-escrow.ts).
| Failure | Behavior |
|---|---|
L2 Store unreachable (Redis down, etc.) |
check() rejects with StoreUnavailableError. No silent fallback — the leased pool is the safety boundary. |
| Process restart mid-window | L1 state lost; tenants restart at gᵢ; pool re-leases from L2. Δ ≤ quantum · N_tenants in the cross-restart transition. |
| Untrusted tenant flood (key blowup) | L1 tenants map grows unbounded. Set l1.maxKeys to a value that comfortably exceeds the working set. |
| Tenant weight changes mid-window | New weight takes effect from the next check; the current window honors the weight at first check. |
| Tenant stops mid-window | Its guaranteed reserve stays pinned until the window rolls — pessimistic-correct (we cannot distinguish "stopped" from "about to ask again"). End-of-window T3 still holds. |
| Tenant over-declares demand (T5) | Window-coupling bounds the gain to one window; inflated credits expire at window roll. Not a bug — a stated impossibility (FairRide). |
See docs/FAILURE-MODES.md for the complete outage matrix.
WFE returns a Decision so it composes with any other axis via the
0.9.0 algebra:
import { combineDecisions, rateLimit, gcra } from "throttlekit";
const rate = rateLimit({ strategy: gcra({ limit: 500, periodMs: 60_000 }) });
async function check(tenant: string, key: string, tokens: number) {
const [r, w] = await Promise.all([rate.check(key), escrow.check(tenant, tokens)]);
return combineDecisions(r, w);
}At 0.9.1, unifiedAdmission's admit options accept { cost?, tenant? }
additively — pass through to a WFE-backed cost axis when present. The
existing rate / concurrency axes ignore tenant. See docs/DESIGN-NOTES.md
or research/bigger-bets/pillar4-wfe/DESIGN.md §7.1.
- 0.9.1 ships single-pool WFE (this page).
-
Shipped — federated WFE:
federatedWeightedFairEscrow+regionFairPool, the two-level region→tenant composition with the(region, tenant)cross-fairness theorem (DR-P4-7, now shipped). See Federated Weighted Fair Escrow above. The in-process pool ships now; the store-backed multi-process poolRedisRegionFairPool(DR-FWFE-1) is also shipped for separate-region-process topologies. - next — arbitrary hierarchical weights (teams-within-orgs) on the same two-level machinery (DR-P4-8).
-
0.10.x? — hot-path-fused leased WFE inside
twoTier(DR-P4-9), conditional on a benchmark demonstrating ≥ 2× speedup over the Option L2-B path. The currentl2: Storeshape is fast enough for the LLM-gateway target.
- Shreedhar & Varghese, Efficient Fair Queuing using Deficit Round Robin, SIGCOMM '95 — the DRR quantum and its fairness bound (T4).
- Demers, Keshav & Shenker, Analysis and Simulation of a Fair Queueing Algorithm (WFQ), SIGCOMM '89; Parekh & Gallager (GPS), IEEE/ACM ToN 1993 — the weighted max-min ideal that WFE approximates.
- Stoica, Shenker & Zhang, Core-Stateless Fair Queueing, SIGCOMM '98 — fairness with O(1) shared state. WFE is its escrow-leasing analog (one shared integer, no per-tenant queues).
- Pu et al., FairRide, NSDI '16 — the SIP impossibility theorem (T5, conceded vertex).
- Bertsekas & Gallager, Data Networks §6.5.2 — canonical max-min fairness definition.
ThrottleKit · MIT · 1.0 — API frozen under SemVer (Stability)
- Getting Started
- Choosing a strategy
- Frameworks & the edge
- Distributed & provable
- Federation
- Scaling & the Fleet
- Unified admission
- Pillar 4 — Weighted Fair Escrow
- Middleware integration
- Distributed adaptive concurrency
- Advanced limiting
- Overload, fairness & DDoS
- Operations
- Monitoring — ThrottleKit Lens
- Policy Plans
- Replay
- Performance
- Migrating
- Polyglot & Python
- GALE & TALE