# Pillar 4 — Weighted Fair Escrow (WFE)

`weightedFairEscrow(...)` splits a shared per-window budget across
tenants in weighted-max-min fair proportion, with idle tenants' surplus
reclaimed by backlogged ones. It's the production graduation of GALE's
Pillar 4 (`research/gale/PILLAR4-fairness.md`) — work-conserving, weight-
honoring, with a hard `Δ = 0` per-window cap inherited from Pillar 1.

```ts
import { weightedFairEscrow } from "throttlekit";

const TIERS = { enterprise: 4, pro: 2, free: 1 };

const escrow = weightedFairEscrow({
  limit:    30_000,                                       // L tokens / window
  windowMs: 60_000,
  weightOf: (tenant) => TIERS[tenant.split(":")[0]] ?? 1,
  l1:       { maxKeys: 1024 },                            // cap tenant set
});

const d = await escrow.check("enterprise:alpha", 8000);
if (!d.allowed) return reject({ retryAfterMs: d.retryAfterMs });
// d.limit is THIS tenant's current ceiling; d.remaining the headroom.
```

## When to use WFE vs the other fairness primitives

ThrottleKit ships three fairness primitives that compose orthogonally:

| Primitive | Best for | Distributed? | Work-conserving? | Weight-honoring? |
|---|---|---|---|---|
| `fairShare` | Single-process equal-share | no | no (FCFS surplus) | no (equal weights) |
| `weightedFairShare` | Single-process weight-skewed | no | no (FCFS surplus) | yes (approximate) |
| **`weightedFairEscrow`** | **Multi-process, work-conserving, weight-honoring** | **yes** | **yes (water-filled)** | **yes (exact)** |

The difference matters under **multi-tenant overload**. With
`weightedFairShare`, a low-weight flood gets unused budget on a
first-come basis — the high-weight tenant's idle share is consumed by
whoever happens to ask first. WFE redistributes that surplus by weight:
under skew, an idle high-priority tenant's quota flows to backlogged
tenants in proportion to their weights, not to the loudest one.

## The algorithm (the streaming realisation)

Each tenant has a **dynamic guaranteed share** `gᵢ = ⌊wᵢ·L/W⌋`
(recomputed from the current active set on every check; `W` = total
weight of active tenants). Each `.check()` is hierarchical:

1. **Within guarantee** (`used + cost ≤ gᵢ`): always allowed, subject to
   the global cap `Σ used + cost ≤ L`. This is the **T2 sharing-incentive**
   promise — every backlogged tenant is admissible up to `gᵢ` before any
   other tenant can borrow beyond their `gⱼ`.
2. **Borrow phase** (`used + cost > gᵢ`): the asker tries to grow into
   surplus other tenants haven't claimed. The pessimistic surplus is

   ```text
   borrowAvailable = max(0, (L − Σ used) − Σⱼ≠ᵢ max(0, gⱼ − usedⱼ))
   ```

   — the unallocated budget minus what would still need to be served to
   *other* backlogged tenants' guarantees. The per-call borrow is capped
   at the call's own `cost`, matching Shreedhar-Varghese DRR semantics
   (`q = cost`). If `cost ≤ borrowAvailable`, the request is admitted;
   otherwise denied.

## What WFE guarantees

| Theorem | Statement |
|---|---|
| **T1 safety** | `Σ used ≤ L` always (global cap; never over-admits even under adversarial scheduling). |
| **T2 sharing-incentive** | Every active tenant is admissible up to `gᵢ` before any other tenant can borrow past `gⱼ`. |
| **T3 work-conservation** | At end-of-window with continuously-backlogged tenants, `Σ used = L` (full utilisation). |
| **T4 bounded unfairness** | For continuously-backlogged tenants, `|aᵢ/wᵢ − aⱼ/wⱼ| ≤ q·(1/wᵢ + 1/wⱼ)` with `q = cost`. |

All four hold under `fast-check` at `numRuns: 200` (`test/twotier/
weighted-fair-escrow-properties.test.ts`). The pure batch algebra
(`weightedMaxMin`) is independently machine-checked at 20 000 random
trials (`test/gale/fair-escrow.test.ts`).

## What WFE does NOT guarantee

- **Not strategy-proof.** Per [FairRide](https://www.usenix.org/system/files/conference/nsdi16/nsdi16-paper-pu.pdf) (Pu et al., NSDI'16): no shared-cache
  primitive can be simultaneously sharing-incentive, work-conserving,
  AND strategy-proof. WFE takes the first two and concedes the third.
  A tenant *can* over-declare demand to claim surplus; window-coupling
  bounds the gain to one window.
- **Flat tenant set** in the base limiter. A single `weightedFairEscrow`
  has one weight per tenant. The *recursive* WFE composition — an
  internal node weighted by the sum of its children — is **shipped** as
  [Federated Weighted Fair Escrow](#federated-weighted-fair-escrow-cross-region)
  for the region→tenant hierarchy (the GPS-decomposition special case).
  Arbitrary nested weights (teams-within-orgs) reuse the same two-level
  machinery and are the documented next step (DR-P4-8).

## Multi-process backing (the L2 path)

When the budget needs to be shared across processes (a fleet of
gateways, a horizontally-scaled service), pass any distributed
`Store` and a per-process `quantum`:

```ts
import Redis from "ioredis";
import { fromIoredis } from "throttlekit/redis";

const escrow = weightedFairEscrow({
  limit:    200_000, windowMs: 60_000,
  weightOf: (t) => TIERS[t.split(":")[0]] ?? 1,
  l2:       fromIoredis(new Redis(process.env.REDIS_URL!)),
  quantum:  500,                       // per-process lease size
  l2Key:    "tk:wfe:gateway-prod",     // same key on every process
});

// Async-only when l2 is configured (the lease step is async):
const d = await escrow.check("enterprise:alpha", 1500);
```

Internally each process leases `quantum` credits at a time from a
shared `fixedWindow({ limit: L, windowMs })` counter in the store — the
same atomic Lua that ships with `distributedTokenBudget`, so the wire
surface adds zero new scripts (DR-P4-5). The per-tenant fairness
arithmetic stays in-process; the shared store's atomicity is what
bounds the global `Σ used ≤ L`.

**Tuning `quantum`.** Smaller quantum = tighter cross-process fairness,
more RTTs (`O(L/quantum)` Redis calls per window). Larger quantum =
fewer RTTs, looser bound (the cross-process T4 picks up a `quantum`-
scaled DRR slack: `Σₚ Q⁽ᵖ⁾ · (1/wᵢ + 1/wⱼ)`). For an LLM gateway with
`L = 200 000` TPM and 4 process replicas, `quantum = 500` gives ~400
RTTs per minute per process and a sub-millisecond cross-process T4 in
the common case.

## Federated Weighted Fair Escrow (cross-region)

A single `weightedFairEscrow` splits ONE budget `L` across tenants in one
process. **Federated WFE** (`federatedWeightedFairEscrow` +
`regionFairPool`, shipped — TK-1404) composes that primitive
across **regions** so that a tenant's **global** total — summed over
every region it is active in — is the weighted-max-min allocation a
single, flat, global WFE over `L` would produce. The regions are
plumbing, not a fairness boundary: a tenant is neither helped nor hurt by
*which* region it lives in.

### Why it isn't just "run WFE per region"

Hierarchical max-min fairness is, in general, **not** flat max-min
fairness. Running WFE per region over a budget shared by a plain
first-come-first-served counter gives *in-region* weighted fairness but
*cross-region* FCFS pooling — a heavily-weighted region that arrives late
is starved by a lightly-weighted one (HLS isolation, Saeed et al. 2021).
Flat global fairness emerges only under the **Parekh–Gallager GPS
decomposition**: an internal node weighted by the **sum of its children's
weights** reproduces the flat allocation. Federated WFE realises that as
**two composed WFEs**:

1. **Cross-region WFE** — `regionFairPool`, a weighted-fair *reservation*
   layer whose "tenants" are regions. Region `r`'s weight is its
   **dynamic active aggregate** tenant weight `W_r = Σ w_{t,r}`; the pool
   reserves region `r` at least `⌊W_r/ΣW · L⌋` (a busy region cannot
   steal it) and lets it borrow idle regions' surplus. A plain counter
   cannot reserve — exactly why cross-region weights need this layer.
2. **In-region WFE** — `federatedWeightedFairEscrow`, the same per-tenant
   `gᵢ = ⌊wᵢ·L_r/W_r⌋` arithmetic from above, splitting each region's
   pool-granted budget `L_r` among its tenants by weight.

### API

```ts
import { regionFairPool, federatedWeightedFairEscrow } from "throttlekit/twotier";

// ONE shared cross-region pool per global budget L.
const pool = regionFairPool({ limit: 30_000, windowMs: 60_000 });

const TIERS = { enterprise: 4, pro: 2, free: 1 };
const weightOf = (t: string) => TIERS[t.split(":")[0]] ?? 1;

// One limiter per region; they ALL share the same pool.
const usEast = federatedWeightedFairEscrow({ region: "us-east", pool, weightOf });
const euWest = federatedWeightedFairEscrow({ region: "eu-west", pool, weightOf });

const d = usEast.checkSync("enterprise:alpha", 1_500); // sync — the in-process pool is synchronous
// `.check(...)` is the Promise form; `.reset(tenant?)`, `.stats()` mirror the base limiter.
```

Each region's `weightOf` returns the tenant's weight **as seen in that
region** (see the split rule below). `regionFairPool` owns `limit`,
`windowMs`, and the `clock`; every region drawing from one global budget
**must share one pool instance**.

### The demand-proportional weight split (region-spanning tenants)

A tenant active in **one** region returns its full global weight `w_t`. A
tenant active in **several** regions must have its weight **split**
demand-proportionally:

```text
w_{t,r} = w_t · d_{t,r} / d_t        (so Σ_r w_{t,r} = w_t)
```

Returning the full `w_t` in *every* region **double-counts** the tenant —
it claims a full-weight floor in each region and is over-served ≈k× (k =
the number of regions it spans). The split keeps its **global** weight at
exactly `w_t`, so its global share matches the flat ideal.

### What it guarantees

| Theorem | Statement |
|---|---|
| **T-FED-1 safety** | The pool grants `Σ_r L_r ≤ L` and in-region WFE serves `Σ used ≤ L_r`, so **`Σ admitted ≤ L` globally**, independent of region count (`Δ = 0`, inherited from Pillar 1). |
| **T-FED-2 fluid exactness** | Under the three conditions (region weight = `ΣW_r`, in-region WFE, demand-proportional split), each tenant's **global** total equals the flat global weighted-max-min ideal **exactly in the fluid limit**. |
| **T-FED-3 bounded discrete error** | With discrete granting, `\|a_t − a*_t\| ≤ span(t)·(2·q_R+1)` — a two-level DRR residual (`q_R` = region-level lease quantum). Verified ≈0.02% relative. |

Proof + the machine-checked gate (0 deviation on structured + random
worlds, never over-admits):
[`research/bigger-bets/federation/federated-wfe-gate.ts`](https://github.com/AmeyaBorkar/throttlekit/blob/main/research/bigger-bets/federation/federated-wfe-gate.ts);
theorem write-up in
[`research/gale/PILLAR4-fairness.md`](https://github.com/AmeyaBorkar/throttlekit/blob/main/research/gale/PILLAR4-fairness.md)
§"Federated composition".

### Honest scope

- **Exact in the all-backlogged limit** (the common overload case).
- **Mixed saturation** (a region with a demand-bottlenecked co-tenant)
  inherits in-region WFE's **T3 reserve gap**: the shipped *streaming*
  code reclaims only from **truly-absent** regions (which never call the
  pool), not from a present-but-paused one — so it strands a saturated
  participant's reserve and deviates from the *clairvoyant* fluid oracle
  by that fraction. This is exactly what a flat *streaming* WFE does; no
  streaming limiter matches the clairvoyant oracle without a demand
  oracle.
- **Topology.** `regionFairPool` is the **in-process** pool — correct and
  complete for a **single arbiter process** all regions consult (e.g. a
  central rate-limit service), and the substrate the tests verify against
  the flat oracle. To distribute the pool across **separate region
  processes**, **`RedisRegionFairPool`** keeps the same per-region
  accounting in a shared store (the weighted-max-min ceiling math ported
  to atomic Lua over a Redis hash, the weighted analog of
  `RegionalEscrow`'s Lua) — verified grant-for-grant against the
  in-process oracle on live Redis — so federated fair-escrow stays correct
  when each region runs as its own process (DR-FWFE-1). Behind a server
  this is also reachable over the existing `Check` RPC
  (`federatedFairEscrow:`) with no client change — see
  **[Scaling & the Fleet](Scaling-and-the-Fleet)**.

Runnable demo:
[`examples/federated-weighted-fair-escrow.ts`](https://github.com/AmeyaBorkar/throttlekit/blob/main/examples/federated-weighted-fair-escrow.ts)
(`npx tsx examples/federated-weighted-fair-escrow.ts`).

## Failure modes

| Failure | Behavior |
|---|---|
| L2 `Store` unreachable (Redis down, etc.) | `check()` rejects with `StoreUnavailableError`. No silent fallback — the leased pool is the safety boundary. |
| Process restart mid-window | L1 state lost; tenants restart at `gᵢ`; pool re-leases from L2. `Δ ≤ quantum · N_tenants` in the cross-restart transition. |
| Untrusted tenant flood (key blowup) | L1 `tenants` map grows unbounded. Set `l1.maxKeys` to a value that comfortably exceeds the working set. |
| Tenant weight changes mid-window | New weight takes effect from the next check; the current window honors the weight at first check. |
| Tenant stops mid-window | Its guaranteed reserve stays pinned until the window rolls — pessimistic-correct (we cannot distinguish "stopped" from "about to ask again"). End-of-window T3 still holds. |
| Tenant over-declares demand (T5) | Window-coupling bounds the gain to one window; inflated credits expire at window roll. **Not** a bug — a stated impossibility (FairRide). |

See `docs/FAILURE-MODES.md` for the complete outage matrix.

## Composition

### With `combineDecisions` (manual axis composition)

WFE returns a `Decision` so it composes with any other axis via the
0.9.0 algebra:

```ts
import { combineDecisions, rateLimit, gcra } from "throttlekit";

const rate = rateLimit({ strategy: gcra({ limit: 500, periodMs: 60_000 }) });

async function check(tenant: string, key: string, tokens: number) {
  const [r, w] = await Promise.all([rate.check(key), escrow.check(tenant, tokens)]);
  return combineDecisions(r, w);
}
```

### With `unifiedAdmission`

At 0.9.1, `unifiedAdmission`'s admit options accept `{ cost?, tenant? }`
additively — pass through to a WFE-backed cost axis when present. The
existing rate / concurrency axes ignore `tenant`. See `docs/DESIGN-NOTES.md`
or `research/bigger-bets/pillar4-wfe/DESIGN.md` §7.1.

## Roadmap notes

- **0.9.1** ships single-pool WFE (this page).
- **Shipped** — **federated WFE**: `federatedWeightedFairEscrow` +
  `regionFairPool`, the two-level region→tenant composition with the
  `(region, tenant)` cross-fairness theorem (DR-P4-7, now *shipped*). See
  [Federated Weighted Fair Escrow](#federated-weighted-fair-escrow-cross-region)
  above. The in-process pool ships now; the store-backed multi-process
  pool **`RedisRegionFairPool`** (DR-FWFE-1) is **also shipped** for
  separate-region-process topologies.
- **next** — arbitrary hierarchical weights (teams-within-orgs) on the
  same two-level machinery (DR-P4-8).
- **0.10.x?** — hot-path-fused leased WFE inside `twoTier` (DR-P4-9),
  conditional on a benchmark demonstrating ≥ 2× speedup over the
  Option L2-B path. The current `l2: Store` shape is fast enough for
  the LLM-gateway target.

## Anchors

- Shreedhar & Varghese, **Efficient Fair Queuing using Deficit Round
  Robin**, *SIGCOMM '95* — the DRR quantum and its fairness bound (T4).
- Demers, Keshav & Shenker, **Analysis and Simulation of a Fair
  Queueing Algorithm** (WFQ), *SIGCOMM '89*; Parekh & Gallager (GPS),
  *IEEE/ACM ToN* 1993 — the weighted max-min ideal that WFE
  approximates.
- Stoica, Shenker & Zhang, **Core-Stateless Fair Queueing**,
  *SIGCOMM '98* — fairness with O(1) shared state. WFE is its
  escrow-leasing analog (one shared integer, no per-tenant queues).
- Pu et al., **FairRide**, *NSDI '16* — the SIP impossibility theorem
  (T5, conceded vertex).
- Bertsekas & Gallager, **Data Networks** §6.5.2 — canonical max-min
  fairness definition.