Skip to content

Pillar 4 Weighted Fair Escrow

Ameya Borkar edited this page Jun 10, 2026 · 3 revisions

Pillar 4 — Weighted Fair Escrow (WFE)

weightedFairEscrow(...) splits a shared per-window budget across tenants in weighted-max-min fair proportion, with idle tenants' surplus reclaimed by backlogged ones. It's the production graduation of GALE's Pillar 4 (research/gale/PILLAR4-fairness.md) — work-conserving, weight- honoring, with a hard Δ = 0 per-window cap inherited from Pillar 1.

import { weightedFairEscrow } from "throttlekit";

const TIERS = { enterprise: 4, pro: 2, free: 1 };

const escrow = weightedFairEscrow({
  limit:    30_000,                                       // L tokens / window
  windowMs: 60_000,
  weightOf: (tenant) => TIERS[tenant.split(":")[0]] ?? 1,
  l1:       { maxKeys: 1024 },                            // cap tenant set
});

const d = await escrow.check("enterprise:alpha", 8000);
if (!d.allowed) return reject({ retryAfterMs: d.retryAfterMs });
// d.limit is THIS tenant's current ceiling; d.remaining the headroom.

When to use WFE vs the other fairness primitives

ThrottleKit ships three fairness primitives that compose orthogonally:

Primitive Best for Distributed? Work-conserving? Weight-honoring?
fairShare Single-process equal-share no no (FCFS surplus) no (equal weights)
weightedFairShare Single-process weight-skewed no no (FCFS surplus) yes (approximate)
weightedFairEscrow Multi-process, work-conserving, weight-honoring yes yes (water-filled) yes (exact)

The difference matters under multi-tenant overload. With weightedFairShare, a low-weight flood gets unused budget on a first-come basis — the high-weight tenant's idle share is consumed by whoever happens to ask first. WFE redistributes that surplus by weight: under skew, an idle high-priority tenant's quota flows to backlogged tenants in proportion to their weights, not to the loudest one.

The algorithm (the streaming realisation)

Each tenant has a dynamic guaranteed share gᵢ = ⌊wᵢ·L/W⌋ (recomputed from the current active set on every check; W = total weight of active tenants). Each .check() is hierarchical:

  1. Within guarantee (used + cost ≤ gᵢ): always allowed, subject to the global cap Σ used + cost ≤ L. This is the T2 sharing-incentive promise — every backlogged tenant is admissible up to gᵢ before any other tenant can borrow beyond their gⱼ.

  2. Borrow phase (used + cost > gᵢ): the asker tries to grow into surplus other tenants haven't claimed. The pessimistic surplus is

    borrowAvailable = max(0, (L − Σ used) − Σⱼ≠ᵢ max(0, gⱼ − usedⱼ))
    

    — the unallocated budget minus what would still need to be served to other backlogged tenants' guarantees. The per-call borrow is capped at the call's own cost, matching Shreedhar-Varghese DRR semantics (q = cost). If cost ≤ borrowAvailable, the request is admitted; otherwise denied.

What WFE guarantees

Theorem Statement
T1 safety Σ used ≤ L always (global cap; never over-admits even under adversarial scheduling).
T2 sharing-incentive Every active tenant is admissible up to gᵢ before any other tenant can borrow past gⱼ.
T3 work-conservation At end-of-window with continuously-backlogged tenants, Σ used = L (full utilisation).
T4 bounded unfairness For continuously-backlogged tenants, `

All four hold under fast-check at numRuns: 200 (test/twotier/ weighted-fair-escrow-properties.test.ts). The pure batch algebra (weightedMaxMin) is independently machine-checked at 20 000 random trials (test/gale/fair-escrow.test.ts).

What WFE does NOT guarantee

  • Not strategy-proof. Per FairRide (Pu et al., NSDI'16): no shared-cache primitive can be simultaneously sharing-incentive, work-conserving, AND strategy-proof. WFE takes the first two and concedes the third. A tenant can over-declare demand to claim surplus; window-coupling bounds the gain to one window.
  • Flat tenant set in the base limiter. A single weightedFairEscrow has one weight per tenant. The recursive WFE composition — an internal node weighted by the sum of its children — is shipped as Federated Weighted Fair Escrow for the region→tenant hierarchy (the GPS-decomposition special case). Arbitrary nested weights (teams-within-orgs) reuse the same two-level machinery and are the documented next step (DR-P4-8).

Multi-process backing (the L2 path)

When the budget needs to be shared across processes (a fleet of gateways, a horizontally-scaled service), pass any distributed Store and a per-process quantum:

import Redis from "ioredis";
import { fromIoredis } from "throttlekit/redis";

const escrow = weightedFairEscrow({
  limit:    200_000, windowMs: 60_000,
  weightOf: (t) => TIERS[t.split(":")[0]] ?? 1,
  l2:       fromIoredis(new Redis(process.env.REDIS_URL!)),
  quantum:  500,                       // per-process lease size
  l2Key:    "tk:wfe:gateway-prod",     // same key on every process
});

// Async-only when l2 is configured (the lease step is async):
const d = await escrow.check("enterprise:alpha", 1500);

Internally each process leases quantum credits at a time from a shared fixedWindow({ limit: L, windowMs }) counter in the store — the same atomic Lua that ships with distributedTokenBudget, so the wire surface adds zero new scripts (DR-P4-5). The per-tenant fairness arithmetic stays in-process; the shared store's atomicity is what bounds the global Σ used ≤ L.

Tuning quantum. Smaller quantum = tighter cross-process fairness, more RTTs (O(L/quantum) Redis calls per window). Larger quantum = fewer RTTs, looser bound (the cross-process T4 picks up a quantum- scaled DRR slack: Σₚ Q⁽ᵖ⁾ · (1/wᵢ + 1/wⱼ)). For an LLM gateway with L = 200 000 TPM and 4 process replicas, quantum = 500 gives ~400 RTTs per minute per process and a sub-millisecond cross-process T4 in the common case.

Federated Weighted Fair Escrow (cross-region)

A single weightedFairEscrow splits ONE budget L across tenants in one process. Federated WFE (federatedWeightedFairEscrow + regionFairPool, shipped — TK-1404) composes that primitive across regions so that a tenant's global total — summed over every region it is active in — is the weighted-max-min allocation a single, flat, global WFE over L would produce. The regions are plumbing, not a fairness boundary: a tenant is neither helped nor hurt by which region it lives in.

Why it isn't just "run WFE per region"

Hierarchical max-min fairness is, in general, not flat max-min fairness. Running WFE per region over a budget shared by a plain first-come-first-served counter gives in-region weighted fairness but cross-region FCFS pooling — a heavily-weighted region that arrives late is starved by a lightly-weighted one (HLS isolation, Saeed et al. 2021). Flat global fairness emerges only under the Parekh–Gallager GPS decomposition: an internal node weighted by the sum of its children's weights reproduces the flat allocation. Federated WFE realises that as two composed WFEs:

  1. Cross-region WFEregionFairPool, a weighted-fair reservation layer whose "tenants" are regions. Region r's weight is its dynamic active aggregate tenant weight W_r = Σ w_{t,r}; the pool reserves region r at least ⌊W_r/ΣW · L⌋ (a busy region cannot steal it) and lets it borrow idle regions' surplus. A plain counter cannot reserve — exactly why cross-region weights need this layer.
  2. In-region WFEfederatedWeightedFairEscrow, the same per-tenant gᵢ = ⌊wᵢ·L_r/W_r⌋ arithmetic from above, splitting each region's pool-granted budget L_r among its tenants by weight.

API

import { regionFairPool, federatedWeightedFairEscrow } from "throttlekit/twotier";

// ONE shared cross-region pool per global budget L.
const pool = regionFairPool({ limit: 30_000, windowMs: 60_000 });

const TIERS = { enterprise: 4, pro: 2, free: 1 };
const weightOf = (t: string) => TIERS[t.split(":")[0]] ?? 1;

// One limiter per region; they ALL share the same pool.
const usEast = federatedWeightedFairEscrow({ region: "us-east", pool, weightOf });
const euWest = federatedWeightedFairEscrow({ region: "eu-west", pool, weightOf });

const d = usEast.checkSync("enterprise:alpha", 1_500); // sync — the in-process pool is synchronous
// `.check(...)` is the Promise form; `.reset(tenant?)`, `.stats()` mirror the base limiter.

Each region's weightOf returns the tenant's weight as seen in that region (see the split rule below). regionFairPool owns limit, windowMs, and the clock; every region drawing from one global budget must share one pool instance.

The demand-proportional weight split (region-spanning tenants)

A tenant active in one region returns its full global weight w_t. A tenant active in several regions must have its weight split demand-proportionally:

w_{t,r} = w_t · d_{t,r} / d_t        (so Σ_r w_{t,r} = w_t)

Returning the full w_t in every region double-counts the tenant — it claims a full-weight floor in each region and is over-served ≈k× (k = the number of regions it spans). The split keeps its global weight at exactly w_t, so its global share matches the flat ideal.

What it guarantees

Theorem Statement
T-FED-1 safety The pool grants Σ_r L_r ≤ L and in-region WFE serves Σ used ≤ L_r, so Σ admitted ≤ L globally, independent of region count (Δ = 0, inherited from Pillar 1).
T-FED-2 fluid exactness Under the three conditions (region weight = ΣW_r, in-region WFE, demand-proportional split), each tenant's global total equals the flat global weighted-max-min ideal exactly in the fluid limit.
T-FED-3 bounded discrete error With discrete granting, |a_t − a*_t| ≤ span(t)·(2·q_R+1) — a two-level DRR residual (q_R = region-level lease quantum). Verified ≈0.02% relative.

Proof + the machine-checked gate (0 deviation on structured + random worlds, never over-admits): research/bigger-bets/federation/federated-wfe-gate.ts; theorem write-up in research/gale/PILLAR4-fairness.md §"Federated composition".

Honest scope

  • Exact in the all-backlogged limit (the common overload case).
  • Mixed saturation (a region with a demand-bottlenecked co-tenant) inherits in-region WFE's T3 reserve gap: the shipped streaming code reclaims only from truly-absent regions (which never call the pool), not from a present-but-paused one — so it strands a saturated participant's reserve and deviates from the clairvoyant fluid oracle by that fraction. This is exactly what a flat streaming WFE does; no streaming limiter matches the clairvoyant oracle without a demand oracle.
  • Topology. regionFairPool is the in-process pool — correct and complete for a single arbiter process all regions consult (e.g. a central rate-limit service), and the substrate the tests verify against the flat oracle. To distribute the pool across separate region processes, RedisRegionFairPool keeps the same per-region accounting in a shared store (the weighted-max-min ceiling math ported to atomic Lua over a Redis hash, the weighted analog of RegionalEscrow's Lua) — verified grant-for-grant against the in-process oracle on live Redis — so federated fair-escrow stays correct when each region runs as its own process (DR-FWFE-1). Behind a server this is also reachable over the existing Check RPC (federatedFairEscrow:) with no client change — see Scaling & the Fleet.

Runnable demo: examples/federated-weighted-fair-escrow.ts (npx tsx examples/federated-weighted-fair-escrow.ts).

Failure modes

Failure Behavior
L2 Store unreachable (Redis down, etc.) check() rejects with StoreUnavailableError. No silent fallback — the leased pool is the safety boundary.
Process restart mid-window L1 state lost; tenants restart at gᵢ; pool re-leases from L2. Δ ≤ quantum · N_tenants in the cross-restart transition.
Untrusted tenant flood (key blowup) L1 tenants map grows unbounded. Set l1.maxKeys to a value that comfortably exceeds the working set.
Tenant weight changes mid-window New weight takes effect from the next check; the current window honors the weight at first check.
Tenant stops mid-window Its guaranteed reserve stays pinned until the window rolls — pessimistic-correct (we cannot distinguish "stopped" from "about to ask again"). End-of-window T3 still holds.
Tenant over-declares demand (T5) Window-coupling bounds the gain to one window; inflated credits expire at window roll. Not a bug — a stated impossibility (FairRide).

See docs/FAILURE-MODES.md for the complete outage matrix.

Composition

With combineDecisions (manual axis composition)

WFE returns a Decision so it composes with any other axis via the 0.9.0 algebra:

import { combineDecisions, rateLimit, gcra } from "throttlekit";

const rate = rateLimit({ strategy: gcra({ limit: 500, periodMs: 60_000 }) });

async function check(tenant: string, key: string, tokens: number) {
  const [r, w] = await Promise.all([rate.check(key), escrow.check(tenant, tokens)]);
  return combineDecisions(r, w);
}

With unifiedAdmission

At 0.9.1, unifiedAdmission's admit options accept { cost?, tenant? } additively — pass through to a WFE-backed cost axis when present. The existing rate / concurrency axes ignore tenant. See docs/DESIGN-NOTES.md or research/bigger-bets/pillar4-wfe/DESIGN.md §7.1.

Roadmap notes

  • 0.9.1 ships single-pool WFE (this page).
  • Shippedfederated WFE: federatedWeightedFairEscrow + regionFairPool, the two-level region→tenant composition with the (region, tenant) cross-fairness theorem (DR-P4-7, now shipped). See Federated Weighted Fair Escrow above. The in-process pool ships now; the store-backed multi-process pool RedisRegionFairPool (DR-FWFE-1) is also shipped for separate-region-process topologies.
  • next — arbitrary hierarchical weights (teams-within-orgs) on the same two-level machinery (DR-P4-8).
  • 0.10.x? — hot-path-fused leased WFE inside twoTier (DR-P4-9), conditional on a benchmark demonstrating ≥ 2× speedup over the Option L2-B path. The current l2: Store shape is fast enough for the LLM-gateway target.

Anchors

  • Shreedhar & Varghese, Efficient Fair Queuing using Deficit Round Robin, SIGCOMM '95 — the DRR quantum and its fairness bound (T4).
  • Demers, Keshav & Shenker, Analysis and Simulation of a Fair Queueing Algorithm (WFQ), SIGCOMM '89; Parekh & Gallager (GPS), IEEE/ACM ToN 1993 — the weighted max-min ideal that WFE approximates.
  • Stoica, Shenker & Zhang, Core-Stateless Fair Queueing, SIGCOMM '98 — fairness with O(1) shared state. WFE is its escrow-leasing analog (one shared integer, no per-tenant queues).
  • Pu et al., FairRide, NSDI '16 — the SIP impossibility theorem (T5, conceded vertex).
  • Bertsekas & Gallager, Data Networks §6.5.2 — canonical max-min fairness definition.

Clone this wiki locally