
Research

Ameya Borkar edited this page Jun 5, 2026 · 5 revisions

GALE & TALE — the proven guarantees

ThrottleKit's distributed guarantees aren't folklore: they come from two bodies of engineering developed alongside the library, GALE and TALE. Both are proven and measured, with the proofs and tests gated under research/ and test/; the pieces that ship in the public API are marked below. Reproduce with npx vitest run test/gale and npx vitest run test/cost.

GALE — Globally-Accounted Learned Escrow

The first distributed rate limiter with a hard, tight overshoot bound independent of fleet size. Four pillars and a capstone, each machine-checked or measured:

| Pillar | Result | Status |
|---|---|---|
| 1: safety | Window-coupled overshoot bound = L, independent of N | Shipped as lease.windowCoupled; checked with TLA⁺, an exhaustive BFS twin, and a discrete-event sim (Δ=0 for N up to 512 under latency and partitions) |
| 2: efficiency | Online-EOQ lease sizing, O(√T) regret | Shipped as leaseSizer; measured (average regret per round 18.6 → 0.40) |
| 3: predictions | Learning-augmented sizing; consistency + robustness, safety unconditional | Shipped as predictiveLeaseSizer; measured |
| 4: fairness | Weighted Fair Escrow (work-conserving multi-tenant fairness) | Shipped as weightedMaxMin / weightedFairShare; 4 theorems machine-checked on 20k instances |
| Capstone | The rate-limiting trilemma Δ + N·U ≥ (N−1)L (tight), plus a partial-coordination interpolation for 0 < C < N | Proven + machine-checked (N ∈ {2,3,4}) |
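Pillar 4's work-conserving weighted fairness can be pictured with the textbook weighted water-filling (max-min) allocation: raise a common level so each tenant receives weight × level, cap any tenant at its demand, and hand the capacity it frees to everyone still unsatisfied. The TypeScript below is a minimal sketch of that classic algorithm, not ThrottleKit's weightedMaxMin / weightedFairShare; the `Tenant` shape and the `weightedWaterFill` name are hypothetical, and it assumes every weight is positive.

```ts
// Hypothetical tenant record for illustration; not ThrottleKit's types.
interface Tenant {
  id: string;
  weight: number; // relative share, assumed > 0
  demand: number; // credits the tenant wants this window, >= 0
}

// Weighted max-min fair allocation by water-filling (textbook algorithm).
function weightedWaterFill(capacity: number, tenants: Tenant[]): Map<string, number> {
  const alloc = new Map<string, number>();
  for (const t of tenants) alloc.set(t.id, 0);

  let remaining = capacity;
  // Tenants not yet fully served; none of them holds any credits yet.
  let active = tenants.filter((t) => t.demand > 0);

  while (active.length > 0) {
    const totalWeight = active.reduce((sum, t) => sum + t.weight, 0);
    // Credits per unit of weight if the remainder were split by weight right now.
    const level = remaining / totalWeight;
    const capped = active.filter((t) => t.demand <= t.weight * level);
    if (capped.length === 0) {
      // Every remaining tenant is bottlenecked: split the remainder by weight.
      for (const t of active) alloc.set(t.id, t.weight * level);
      return alloc;
    }
    // Serve capped tenants in full; what they don't need flows to the rest
    // on the next pass (work-conserving).
    for (const t of capped) {
      alloc.set(t.id, t.demand);
      remaining -= t.demand;
    }
    active = active.filter((t) => t.demand > t.weight * level);
  }
  // All demand met; any leftover capacity stays unallocated.
  return alloc;
}

const shares = weightedWaterFill(10, [
  { id: "a", weight: 1, demand: 2 },
  { id: "b", weight: 1, demand: 10 },
  { id: "c", weight: 2, demand: 10 },
]);
// a → 2 (capped at its demand), b → 8/3, c → 16/3
```

Tenant a's demand fits under its weighted share, so it is capped at 2, and the 8 credits it does not need are split 1:2 between b and c instead of sitting idle.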

The crux insight: stranded capacity is overshoot debt. Both are held-but-unused credits that survive the L2 window boundary, so minimizing them tightens overshoot and raises utilization at once. The only real tension is between holding few credits and paying coordination cost, which the trilemma makes precise. Write-ups are under research/gale/.
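The hold-few-credits vs coordination-cost tension has the same shape as the classic economic order quantity (EOQ) trade-off, which Pillar 2's online-EOQ sizing is named after. If each lease round-trip costs K, each held credit costs h per second, and demand runs at D credits per second, the average cost rate K·D/Q + h·Q/2 is minimized at Q* = √(2DK/h). Below is a hypothetical sketch that plugs an exponentially weighted estimate of D into that formula; the class name, the smoothing factor `alpha`, and the cost parameters are assumptions, and it is not leaseSizer's algorithm (in particular it carries no O(√T) regret guarantee).

```ts
// Hypothetical online-EOQ lease sizer; not ThrottleKit's leaseSizer.
class EoqLeaseSizer {
  private demandRate: number | undefined = undefined; // EWMA of credits consumed per second
  private readonly roundTripCost: number; // K: cost of one coordination round-trip
  private readonly holdingCost: number; // h: cost of holding one credit for one second
  private readonly alpha: number; // EWMA smoothing factor in (0, 1]

  constructor(roundTripCost: number, holdingCost: number, alpha = 0.2) {
    this.roundTripCost = roundTripCost;
    this.holdingCost = holdingCost;
    this.alpha = alpha;
  }

  // Record demand observed over an interval and update the rate estimate.
  observe(creditsUsed: number, seconds: number): void {
    const rate = creditsUsed / seconds;
    this.demandRate =
      this.demandRate === undefined
        ? rate
        : this.alpha * rate + (1 - this.alpha) * this.demandRate;
  }

  // Classic EOQ: minimize K*D/Q (round-trips) + h*Q/2 (average credits held).
  nextLeaseSize(): number {
    const d = this.demandRate ?? 0;
    const q = Math.sqrt((2 * d * this.roundTripCost) / this.holdingCost);
    // Lease whole credits, and always at least one.
    return Math.max(1, Math.round(q));
  }
}
```

The square root is what keeps stranded capacity in check: a 4× jump in demand only doubles the lease size.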

TALE — Temporally-Accounted Learned Escrow

The cost-axis sibling of GALE: token-budget rate limiting for LLMs, where a request's cost — its output tokens — is unknown at admission and revealed only as it streams. A reserve-then-reconcile escrow in three layers:

| Layer | Result | Status |
|---|---|---|
| 1: streaming meter | Overshoot ≤ g−1 (0 at g=1), independent of max_tokens | Shipped as tokenBudget; measured (vs. reserve-max utilization collapse 0.77→0) |
| 2: learned reservation | Online newsvendor critical-fractile quantile, O(√T) regret | Shipped as learnedReservation; measured (average pinball regret 8.49 → 2.77) |
| 3: predictions-with-safety | Rank predictor + Hedge; safety unconditional | Shipped as predictiveReservation; measured (overshoot 0 under any predictor) |
| Distributed | Multi-gateway TPM = GALE leased budget (token unit) | Byte-identical to GALE simulateWindowCoupled for every gateway count C ∈ {1..32} |
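Layer 2's newsvendor framing has a standard online form. If under-reserving costs c_u per token and over-reserving costs c_o per token, the best fixed reservation is the τ = c_u/(c_u + c_o) quantile of output length (the critical fractile), and that quantile can be tracked by projected subgradient descent on the pinball loss with steps shrinking like 1/√t, the usual route to O(√T) regret for a convex, Lipschitz loss on a bounded interval. The sketch below assumes this update; `QuantileReservation`, `eta0`, and the clamp to `maxTokens` are hypothetical choices, not learnedReservation's API.

```ts
// Hypothetical online newsvendor reservation; not ThrottleKit's learnedReservation.
class QuantileReservation {
  readonly tau: number; // critical fractile c_u / (c_u + c_o)
  private q: number; // current reservation estimate, in tokens
  private t = 0; // completed requests observed so far
  private readonly eta0: number; // base step size, in tokens
  private readonly maxTokens: number; // projection bound keeps the domain bounded

  constructor(underCost: number, overCost: number, maxTokens: number, initial = 0, eta0 = 64) {
    this.tau = underCost / (underCost + overCost);
    this.maxTokens = maxTokens;
    this.q = Math.min(Math.max(initial, 0), maxTokens);
    this.eta0 = eta0;
  }

  // Tokens to reserve at admission for the next request.
  reserve(): number {
    return Math.ceil(this.q);
  }

  // Pinball (quantile) loss of reserving q when the request actually used y.
  static pinballLoss(tau: number, q: number, y: number): number {
    return y >= q ? tau * (y - q) : (1 - tau) * (q - y);
  }

  // After the stream ends, feed back the actual output length and take one
  // projected subgradient step on the pinball loss with step eta0 / sqrt(t).
  observe(actualTokens: number): void {
    this.t += 1;
    const eta = this.eta0 / Math.sqrt(this.t);
    // Subgradient w.r.t. q: -tau if under-reserved, (1 - tau) otherwise.
    const grad = actualTokens > this.q ? -this.tau : 1 - this.tau;
    this.q = Math.min(Math.max(this.q - eta * grad, 0), this.maxTokens);
  }
}
```

With τ = 0.9 the reservation drifts toward the 90th percentile of observed output lengths: each under-reservation pushes it up by 0.9·η, each over-reservation pulls it down by 0.1·η, and the two balance only where 90% of requests fit.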

The unification: GALE escrows across placement (which node spends a shared budget); TALE escrows across cost (how much a single request spends). Both use the same reserve → meter-actuals → reconcile mechanism: TALE's learned layers are GALE's learned sizers retargeted onto the cost axis, and the multi-gateway form reduces to GALE leasing, token for token. Write-up under research/cost-uncertainty/.
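The reserve → meter-actuals → reconcile loop, together with Layer 1's g−1 overshoot bound, can be illustrated with a small single-process budget. The assumptions here are illustrative and not tokenBudget's implementation: token counts are integers; a stream first spends its admission reservation, then draws further chunks of g tokens from the shared budget only while at least one token remains; on completion the unspent remainder is refunded. Because a draw happens only when at most limit − 1 tokens are committed, commitments never exceed limit + g − 1, and max_tokens never appears. The class and method names are hypothetical.

```ts
// Hypothetical reserve -> meter-actuals -> reconcile budget; not ThrottleKit's tokenBudget.
interface StreamLease {
  granted: number; // tokens charged to the budget for this stream so far
  used: number; // tokens the stream has actually emitted
}

class MeteredTokenBudget {
  private charged = 0; // reserved + metered tokens counted against the limit
  private readonly limit: number;
  private readonly g: number; // metering granularity: tokens per drawn chunk (g >= 1)

  constructor(limit: number, granularity: number) {
    this.limit = limit;
    this.g = granularity;
  }

  get committed(): number {
    return this.charged;
  }

  // Reserve: admit only if the up-front estimate fits; max_tokens is never reserved.
  admit(reservation: number): StreamLease | null {
    if (this.charged + reservation > this.limit) return null;
    this.charged += reservation;
    return { granted: reservation, used: 0 };
  }

  // Meter actuals: call once per streamed token; false means stop the stream.
  meterToken(s: StreamLease): boolean {
    if (s.used < s.granted) {
      s.used += 1;
      return true;
    }
    // Out of granted tokens: draw another chunk only while budget remains.
    if (this.charged >= this.limit) return false;
    // With integer counts, charged <= limit - 1 here, so this lands at most g - 1 past the limit.
    this.charged += this.g;
    s.granted += this.g;
    s.used += 1;
    return true;
  }

  // Reconcile: refund whatever was charged but never emitted.
  finish(s: StreamLease): number {
    const refund = s.granted - s.used;
    this.charged -= refund;
    s.granted = s.used;
    return refund;
  }
}
```

At g = 1 the draw condition charged < limit means a draw can never push past the limit, which matches the zero-overshoot case in the table.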

Relationship to the shipped library

You do not need to read any of this to use ThrottleKit. The research exists so that the library's distributed claims are provable, and most of it now ships as first-class API: lease.windowCoupled (GALE Pillar 1), the adaptive lease sizers leaseSizer / predictiveLeaseSizer (Pillars 2–3), weightedFairShare / weightedMaxMin (Pillar 4), and, on the cost axis, tokenBudget / learnedReservation / predictiveReservation (TALE Layers 1–3). What remains validation-only is the theory behind them (the trilemma lower bounds and the at-scale discrete-event simulator), which justifies the design rather than being code to call. See also docs/FORMAL-MODEL.md.
