Pillar 4 Weighted Fair Escrow

Pillar 4 — Weighted Fair Escrow (WFE)

weightedFairEscrow(...) splits a shared per-window budget across tenants in weighted-max-min fair proportion, with idle tenants' surplus reclaimed by backlogged ones. It's the production graduation of GALE's Pillar 4 (research/gale/PILLAR4-fairness.md) — work-conserving, weight- honoring, with a hard Δ = 0 per-window cap inherited from Pillar 1.

import { weightedFairEscrow } from "throttlekit";

const TIERS = { enterprise: 4, pro: 2, free: 1 };

const escrow = weightedFairEscrow({
  limit:    30_000,                                       // L tokens / window
  windowMs: 60_000,
  weightOf: (tenant) => TIERS[tenant.split(":")[0]] ?? 1,
  l1:       { maxKeys: 1024 },                            // cap tenant set
});

const d = await escrow.check("enterprise:alpha", 8000);
if (!d.allowed) return reject({ retryAfterMs: d.retryAfterMs });
// d.limit is THIS tenant's current ceiling; d.remaining the headroom.

When to use WFE vs the other fairness primitives

ThrottleKit ships three fairness primitives that compose orthogonally:

Primitive	Best for	Distributed?	Work-conserving?	Weight-honoring?
`fairShare`	Single-process equal-share	no	no (FCFS surplus)	no (equal weights)
`weightedFairShare`	Single-process weight-skewed	no	no (FCFS surplus)	yes (approximate)
`weightedFairEscrow`	Multi-process, work-conserving, weight-honoring	yes	yes (water-filled)	yes (exact)

The difference matters under multi-tenant overload. With weightedFairShare, a low-weight flood gets unused budget on a first-come basis — the high-weight tenant's idle share is consumed by whoever happens to ask first. WFE redistributes that surplus by weight: under skew, an idle high-priority tenant's quota flows to backlogged tenants in proportion to their weights, not to the loudest one.

The algorithm (the streaming realisation)

Each tenant has a dynamic guaranteed share gᵢ = ⌊wᵢ·L/W⌋ (recomputed from the current active set on every check; W = total weight of active tenants). Each .check() is hierarchical:

Within guarantee (used + cost ≤ gᵢ): always allowed, subject to the global cap Σ used + cost ≤ L. This is the T2 sharing-incentive promise — every backlogged tenant is admissible up to gᵢ before any other tenant can borrow beyond their gⱼ.
Borrow phase (used + cost > gᵢ): the asker tries to grow into surplus other tenants haven't claimed. The pessimistic surplus is
```
borrowAvailable = max(0, (L − Σ used) − Σⱼ≠ᵢ max(0, gⱼ − usedⱼ))
```
— the unallocated budget minus what would still need to be served to other backlogged tenants' guarantees. The per-call borrow is capped at the call's own cost, matching Shreedhar-Varghese DRR semantics (q = cost). If cost ≤ borrowAvailable, the request is admitted; otherwise denied.

What WFE guarantees

Theorem	Statement
T1 safety	`Σ used ≤ L` always (global cap; never over-admits even under adversarial scheduling).
T2 sharing-incentive	Every active tenant is admissible up to `gᵢ` before any other tenant can borrow past `gⱼ`.
T3 work-conservation	At end-of-window with continuously-backlogged tenants, `Σ used = L` (full utilisation).
T4 bounded unfairness	For continuously-backlogged tenants, `

All four hold under fast-check at numRuns: 200 (test/twotier/ weighted-fair-escrow-properties.test.ts). The pure batch algebra (weightedMaxMin) is independently machine-checked at 20 000 random trials (test/gale/fair-escrow.test.ts).

What WFE does NOT guarantee

Not strategy-proof. Per FairRide (Pu et al., NSDI'16): no shared-cache primitive can be simultaneously sharing-incentive, work-conserving, AND strategy-proof. WFE takes the first two and concedes the third. A tenant can over-declare demand to claim surplus; window-coupling bounds the gain to one window.
Flat tenant set in the base limiter. A single weightedFairEscrow has one weight per tenant. The recursive WFE composition — an internal node weighted by the sum of its children — is shipped as Federated Weighted Fair Escrow for the region→tenant hierarchy (the GPS-decomposition special case). Arbitrary nested weights (teams-within-orgs) reuse the same two-level machinery and are the documented next step (DR-P4-8).

Multi-process backing (the L2 path)

When the budget needs to be shared across processes (a fleet of gateways, a horizontally-scaled service), pass any distributed Store and a per-process quantum:

import Redis from "ioredis";
import { fromIoredis } from "throttlekit/redis";

const escrow = weightedFairEscrow({
  limit:    200_000, windowMs: 60_000,
  weightOf: (t) => TIERS[t.split(":")[0]] ?? 1,
  l2:       fromIoredis(new Redis(process.env.REDIS_URL!)),
  quantum:  500,                       // per-process lease size
  l2Key:    "tk:wfe:gateway-prod",     // same key on every process
});

// Async-only when l2 is configured (the lease step is async):
const d = await escrow.check("enterprise:alpha", 1500);

Internally each process leases quantum credits at a time from a shared fixedWindow({ limit: L, windowMs }) counter in the store — the same atomic Lua that ships with distributedTokenBudget, so the wire surface adds zero new scripts (DR-P4-5). The per-tenant fairness arithmetic stays in-process; the shared store's atomicity is what bounds the global Σ used ≤ L.

Tuning quantum. Smaller quantum = tighter cross-process fairness, more RTTs (O(L/quantum) Redis calls per window). Larger quantum = fewer RTTs, looser bound (the cross-process T4 picks up a quantum- scaled DRR slack: Σₚ Q⁽ᵖ⁾ · (1/wᵢ + 1/wⱼ)). For an LLM gateway with L = 200 000 TPM and 4 process replicas, quantum = 500 gives ~400 RTTs per minute per process and a sub-millisecond cross-process T4 in the common case.

Federated Weighted Fair Escrow (cross-region)

A single weightedFairEscrow splits ONE budget L across tenants in one process. Federated WFE (federatedWeightedFairEscrow + regionFairPool, shipped — TK-1404) composes that primitive across regions so that a tenant's global total — summed over every region it is active in — is the weighted-max-min allocation a single, flat, global WFE over L would produce. The regions are plumbing, not a fairness boundary: a tenant is neither helped nor hurt by which region it lives in.

Why it isn't just "run WFE per region"

Hierarchical max-min fairness is, in general, not flat max-min fairness. Running WFE per region over a budget shared by a plain first-come-first-served counter gives in-region weighted fairness but cross-region FCFS pooling — a heavily-weighted region that arrives late is starved by a lightly-weighted one (HLS isolation, Saeed et al. 2021). Flat global fairness emerges only under the Parekh–Gallager GPS decomposition: an internal node weighted by the sum of its children's weights reproduces the flat allocation. Federated WFE realises that as two composed WFEs:

Cross-region WFE — regionFairPool, a weighted-fair reservation layer whose "tenants" are regions. Region r's weight is its dynamic active aggregate tenant weight W_r = Σ w_{t,r}; the pool reserves region r at least ⌊W_r/ΣW · L⌋ (a busy region cannot steal it) and lets it borrow idle regions' surplus. A plain counter cannot reserve — exactly why cross-region weights need this layer.
In-region WFE — federatedWeightedFairEscrow, the same per-tenant gᵢ = ⌊wᵢ·L_r/W_r⌋ arithmetic from above, splitting each region's pool-granted budget L_r among its tenants by weight.

API

import { regionFairPool, federatedWeightedFairEscrow } from "throttlekit/twotier";

// ONE shared cross-region pool per global budget L.
const pool = regionFairPool({ limit: 30_000, windowMs: 60_000 });

const TIERS = { enterprise: 4, pro: 2, free: 1 };
const weightOf = (t: string) => TIERS[t.split(":")[0]] ?? 1;

// One limiter per region; they ALL share the same pool.
const usEast = federatedWeightedFairEscrow({ region: "us-east", pool, weightOf });
const euWest = federatedWeightedFairEscrow({ region: "eu-west", pool, weightOf });

const d = usEast.checkSync("enterprise:alpha", 1_500); // sync — the in-process pool is synchronous
// `.check(...)` is the Promise form; `.reset(tenant?)`, `.stats()` mirror the base limiter.

Each region's weightOf returns the tenant's weight as seen in that region (see the split rule below). regionFairPool owns limit, windowMs, and the clock; every region drawing from one global budget must share one pool instance.

The demand-proportional weight split (region-spanning tenants)

A tenant active in one region returns its full global weight w_t. A tenant active in several regions must have its weight split demand-proportionally:

w_{t,r} = w_t · d_{t,r} / d_t        (so Σ_r w_{t,r} = w_t)

Returning the full w_t in every region double-counts the tenant — it claims a full-weight floor in each region and is over-served ≈k× (k = the number of regions it spans). The split keeps its global weight at exactly w_t, so its global share matches the flat ideal.

What it guarantees

Theorem	Statement
T-FED-1 safety	The pool grants `Σ_r L_r ≤ L` and in-region WFE serves `Σ used ≤ L_r`, so `Σ admitted ≤ L` globally, independent of region count (`Δ = 0`, inherited from Pillar 1).
T-FED-2 fluid exactness	Under the three conditions (region weight = `ΣW_r`, in-region WFE, demand-proportional split), each tenant's global total equals the flat global weighted-max-min ideal exactly in the fluid limit.
T-FED-3 bounded discrete error	With discrete granting, `\|a_t − a*_t\| ≤ span(t)·(2·q_R+1)` — a two-level DRR residual (`q_R` = region-level lease quantum). Verified ≈0.02% relative.

Proof + the machine-checked gate (0 deviation on structured + random worlds, never over-admits): research/bigger-bets/federation/federated-wfe-gate.ts; theorem write-up in research/gale/PILLAR4-fairness.md §"Federated composition".

Honest scope

Exact in the all-backlogged limit (the common overload case).
Mixed saturation (a region with a demand-bottlenecked co-tenant) inherits in-region WFE's T3 reserve gap: the shipped streaming code reclaims only from truly-absent regions (which never call the pool), not from a present-but-paused one — so it strands a saturated participant's reserve and deviates from the clairvoyant fluid oracle by that fraction. This is exactly what a flat streaming WFE does; no streaming limiter matches the clairvoyant oracle without a demand oracle.
Topology. regionFairPool is the in-process pool — correct and complete for a single arbiter process all regions consult (e.g. a central rate-limit service), and the substrate the tests verify against the flat oracle. To distribute the pool across separate region processes, RedisRegionFairPool keeps the same per-region accounting in a shared store (the weighted-max-min ceiling math ported to atomic Lua over a Redis hash, the weighted analog of RegionalEscrow's Lua) — verified grant-for-grant against the in-process oracle on live Redis — so federated fair-escrow stays correct when each region runs as its own process (DR-FWFE-1). Behind a server this is also reachable over the existing Check RPC (federatedFairEscrow:) with no client change — see Scaling & the Fleet.

Runnable demo: examples/federated-weighted-fair-escrow.ts (npx tsx examples/federated-weighted-fair-escrow.ts).

Failure modes

Failure	Behavior
L2 `Store` unreachable (Redis down, etc.)	`check()` rejects with `StoreUnavailableError`. No silent fallback — the leased pool is the safety boundary.
Process restart mid-window	L1 state lost; tenants restart at `gᵢ`; pool re-leases from L2. `Δ ≤ quantum · N_tenants` in the cross-restart transition.
Untrusted tenant flood (key blowup)	L1 `tenants` map grows unbounded. Set `l1.maxKeys` to a value that comfortably exceeds the working set.
Tenant weight changes mid-window	New weight takes effect from the next check; the current window honors the weight at first check.
Tenant stops mid-window	Its guaranteed reserve stays pinned until the window rolls — pessimistic-correct (we cannot distinguish "stopped" from "about to ask again"). End-of-window T3 still holds.
Tenant over-declares demand (T5)	Window-coupling bounds the gain to one window; inflated credits expire at window roll. Not a bug — a stated impossibility (FairRide).

See docs/FAILURE-MODES.md for the complete outage matrix.

Composition

With `combineDecisions` (manual axis composition)

WFE returns a Decision so it composes with any other axis via the 0.9.0 algebra:

import { combineDecisions, rateLimit, gcra } from "throttlekit";

const rate = rateLimit({ strategy: gcra({ limit: 500, periodMs: 60_000 }) });

async function check(tenant: string, key: string, tokens: number) {
  const [r, w] = await Promise.all([rate.check(key), escrow.check(tenant, tokens)]);
  return combineDecisions(r, w);
}

With `unifiedAdmission`

At 0.9.1, unifiedAdmission's admit options accept { cost?, tenant? } additively — pass through to a WFE-backed cost axis when present. The existing rate / concurrency axes ignore tenant. See docs/DESIGN-NOTES.md or research/bigger-bets/pillar4-wfe/DESIGN.md §7.1.

Roadmap notes

0.9.1 ships single-pool WFE (this page).
Shipped — federated WFE: federatedWeightedFairEscrow + regionFairPool, the two-level region→tenant composition with the (region, tenant) cross-fairness theorem (DR-P4-7, now shipped). See Federated Weighted Fair Escrow above. The in-process pool ships now; the store-backed multi-process pool RedisRegionFairPool (DR-FWFE-1) is also shipped for separate-region-process topologies.
next — arbitrary hierarchical weights (teams-within-orgs) on the same two-level machinery (DR-P4-8).
0.10.x? — hot-path-fused leased WFE inside twoTier (DR-P4-9), conditional on a benchmark demonstrating ≥ 2× speedup over the Option L2-B path. The current l2: Store shape is fast enough for the LLM-gateway target.

Anchors

Shreedhar & Varghese, Efficient Fair Queuing using Deficit Round Robin, SIGCOMM '95 — the DRR quantum and its fairness bound (T4).
Demers, Keshav & Shenker, Analysis and Simulation of a Fair Queueing Algorithm (WFQ), SIGCOMM '89; Parekh & Gallager (GPS), IEEE/ACM ToN 1993 — the weighted max-min ideal that WFE approximates.
Stoica, Shenker & Zhang, Core-Stateless Fair Queueing, SIGCOMM '98 — fairness with O(1) shared state. WFE is its escrow-leasing analog (one shared integer, no per-tenant queues).
Pu et al., FairRide, NSDI '16 — the SIP impossibility theorem (T5, conceded vertex).
Bertsekas & Gallager, Data Networks §6.5.2 — canonical max-min fairness definition.

ThrottleKit · MIT · 1.0 — API frozen under SemVer (Stability)

ThrottleKit Wiki

Pillar 4 Weighted Fair Escrow

Pillar 4 — Weighted Fair Escrow (WFE)

When to use WFE vs the other fairness primitives

The algorithm (the streaming realisation)

What WFE guarantees

What WFE does NOT guarantee

Multi-process backing (the L2 path)

Federated Weighted Fair Escrow (cross-region)

Why it isn't just "run WFE per region"

API

The demand-proportional weight split (region-spanning tenants)

What it guarantees

Honest scope

Failure modes

Composition

With combineDecisions (manual axis composition)

With unifiedAdmission

Roadmap notes

Anchors

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Clone this wiki locally

With `combineDecisions` (manual axis composition)

With `unifiedAdmission`