Research
ThrottleKit's distributed guarantees aren't folklore — they come from two bodies of engineering developed alongside the library, GALE and TALE. Both are proven and measured: gated under research/ and test/, with the pieces that ship into the public API marked below. Reproduce with npx vitest run test/gale and npx vitest run test/cost.
GALE: the first distributed rate limiter with a hard, tight overshoot bound independent of fleet size. It rests on four pillars and a capstone, each machine-checked or measured:
| Pillar | Result | Status |
|---|---|---|
| 1 — safety | Window-coupled overshoot = L, independent of N | Shipped as lease.windowCoupled; TLA⁺ + exhaustive BFS twin + discrete-event sim (Δ=0 for N→512 under latency/partitions) |
| 2 — efficiency | Online-EOQ lease sizing, O(√T) regret | Shipped as leaseSizer; measured (avg regret/round 18.6 → 0.40) |
| 3 — predictions | Learning-augmented sizing; consistency + robustness, safety unconditional | Shipped as predictiveLeaseSizer; measured |
| 4 — fairness | Weighted Fair Escrow (work-conserving multi-tenant fairness) | Shipped as weightedMaxMin / weightedFairShare; 4 theorems machine-checked on 20k instances |
| Capstone | The rate-limiting trilemma Δ + N·U ≥ (N−1)L, tight, + a 0<C<N partial-coordination interpolation | Proven + machine-checked (N ∈ {2,3,4}) |
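
To make the Pillar 4 row concrete, here is a minimal sketch of textbook weighted max-min (water-filling) allocation, which is work-conserving in the sense the table describes. It is an illustration only: the `Tenant` shape and the `weightedMaxMinSketch` name are hypothetical, and nothing here is claimed to match the signature or internals of ThrottleKit's weightedMaxMin / weightedFairShare.

```typescript
// Hypothetical types and names, for illustration only.
interface Tenant {
  id: string;
  weight: number; // relative share, assumed > 0
  demand: number; // credits requested this round, assumed >= 0
}

// Textbook weighted max-min: repeatedly split what is left in proportion
// to weight, fully satisfy any tenant whose demand fits inside its share,
// and redistribute the slack among the tenants that are still hungry.
function weightedMaxMinSketch(capacity: number, tenants: Tenant[]): Map<string, number> {
  const alloc = new Map<string, number>();
  for (const t of tenants) alloc.set(t.id, 0);

  let active = tenants.filter((t) => t.demand > 0 && t.weight > 0);
  let remaining = capacity;

  while (active.length > 0 && remaining > 0) {
    const totalWeight = active.reduce((sum, t) => sum + t.weight, 0);
    const perWeight = remaining / totalWeight;
    const satisfied = active.filter((t) => t.demand <= t.weight * perWeight);

    if (satisfied.length === 0) {
      // Everyone left wants more than their share: hand out shares and stop.
      for (const t of active) alloc.set(t.id, t.weight * perWeight);
      remaining = 0;
      break;
    }
    for (const t of satisfied) {
      alloc.set(t.id, t.demand);
      remaining -= t.demand;
    }
    active = active.filter((t) => t.demand > t.weight * perWeight);
  }
  return alloc;
}

const shares = weightedMaxMinSketch(100, [
  { id: "a", weight: 1, demand: 10 },
  { id: "b", weight: 1, demand: 100 },
  { id: "c", weight: 2, demand: 100 },
]);
console.log(shares); // a gets 10, b gets 30, c gets 60
```

Tenant a needs less than its share, so its slack is split 1:2 between b and c. No capacity is left idle while any tenant still has unmet demand.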
The crux insight: stranded capacity is overshoot debt — both are held-but-unused credits surviving the L2 window boundary, so minimizing them tightens overshoot and raises utilization at once. The only real tension is hold-few-credits vs coordination cost, which the trilemma makes precise. Write-ups under research/gale/.
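
As a worked reading of the trilemma inequality Δ + N·U ≥ (N−1)L, the sketch below simply rearranges it to get the smallest Δ it permits for given N, L and U. The helper name is hypothetical, U is treated only as the symbol appearing in the inequality (its precise definition is in the write-ups), and clamping at zero assumes Δ is a non-negative overshoot.

```typescript
// Pure arithmetic on  Δ + N·U ≥ (N−1)·L.  Hypothetical helper, not library API.
// Assumption: Δ is a non-negative overshoot, hence the clamp at zero.
function minDeltaFromTrilemma(n: number, l: number, u: number): number {
  if (!Number.isInteger(n) || n < 1) {
    throw new RangeError("n must be a positive integer");
  }
  return Math.max(0, (n - 1) * l - n * u);
}

console.log(minDeltaFromTrilemma(4, 100, 0));  // 300
console.log(minDeltaFromTrilemma(4, 100, 50)); // 100
console.log(minDeltaFromTrilemma(4, 100, 75)); // 0
```

With N = 4 and L = 100 the right-hand side is 300, so the inequality permits Δ = 0 only once N·U reaches 300, that is, U ≥ 75.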
TALE: the cost-axis sibling of GALE. It applies token-budget rate limiting to LLMs, where a request's cost (its output tokens) is unknown at admission and revealed only as it streams. It is a reserve-then-reconcile escrow in three layers:
| Layer | Result | Status |
|---|---|---|
| 1 — streaming meter | Overshoot ≤ g−1 (0 at g=1), independent of max_tokens | Shipped as tokenBudget; measured (vs reserve-max util collapse 0.77→0) |
| 2 — learned reservation | Online newsvendor critical-fractile quantile, O(√T) regret | Shipped as learnedReservation; measured (avg pinball regret 8.49 → 2.77) |
| 3 — predictions-with-safety | Rank predictor + Hedge; safety unconditional | Shipped as predictiveReservation; measured (overshoot 0 under any predictor) |
| Distributed | Multi-gateway TPM = GALE leased budget (token unit) | Byte-identical to GALE simulateWindowCoupled, ∀ gateways C ∈ {1..32} |
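
The Layer 1 bound can be illustrated with a small single-stream sketch of reserve, meter actuals, reconcile. It assumes that g is the size of each incremental reservation and that a new chunk is admitted whenever any budget remains; both are assumptions made for illustration, and the class and function names are hypothetical rather than the tokenBudget API.

```typescript
// Hypothetical sketch; names and the reading of `g` are assumptions.
class BudgetSketch {
  limit: number;
  spent = 0;
  constructor(limit: number) {
    this.limit = limit;
  }
  remaining(): number {
    return this.limit - this.spent;
  }
  overshoot(): number {
    return Math.max(0, this.spent - this.limit);
  }
}

function streamWithEscrow(budget: BudgetSketch, tokens: string[], g: number): string[] {
  if (!Number.isInteger(g) || g < 1) throw new RangeError("g must be a positive integer");
  const emitted: string[] = [];
  let held = 0; // reserved but not yet consumed

  for (const token of tokens) {
    if (held === 0) {
      if (budget.remaining() <= 0) break; // budget exhausted: cut the stream
      budget.spent += g; // reserve the next chunk of g tokens
      held = g;
    }
    held -= 1; // meter the actual token against the reservation
    emitted.push(token);
  }
  budget.spent -= held; // reconcile: refund whatever was reserved but unused
  return emitted;
}

const budget = new BudgetSketch(10);
const out = streamWithEscrow(budget, new Array<string>(100).fill("tok"), 4);
console.log(out.length, budget.overshoot()); // 12 2
```

A chunk is reserved only while at least one credit remains, so the last chunk can exceed the limit by at most g−1 tokens however long the stream would have run. With g = 1 the overshoot is zero. A reserve-max scheme would instead hold max_tokens for the whole stream, which is the utilization collapse the table contrasts against.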
The unification: GALE escrows across placement (which node spends a shared budget); TALE escrows across cost (how much a single request spends). They are the same reserve → meter-actuals → reconcile mechanism — TALE's learned layers are literally GALE's retargeted onto the cost axis, and the multi-gateway form reduces to GALE leasing token-for-token. Write-up under research/cost-uncertainty/.
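
For the learned layers, the standard way to track a newsvendor critical-fractile quantile online is a subgradient step on the pinball loss. The sketch below shows that generic technique with a fixed step size for simplicity (regret guarantees of the O(√T) kind normally need a decaying step). The class, the cost parameters and the step size are all hypothetical and are not taken from learnedReservation.

```typescript
// Newsvendor critical fractile: under-reserving costs `underCost` per token,
// over-reserving costs `overCost` per token. Hypothetical helper.
function criticalFractile(underCost: number, overCost: number): number {
  return underCost / (underCost + overCost);
}

// Generic online quantile tracker (pinball-loss subgradient descent).
// Illustration only; not the library's estimator.
class OnlineQuantileSketch {
  estimate: number;
  fractile: number;
  stepSize: number;

  constructor(initial: number, fractile: number, stepSize: number) {
    this.estimate = initial;
    this.fractile = fractile;
    this.stepSize = stepSize;
  }

  // Move up by step*fractile when the actual cost exceeded the reservation,
  // down by step*(1 - fractile) when the reservation was enough.
  observe(actual: number): void {
    const covered = actual <= this.estimate ? 1 : 0;
    const next = this.estimate + this.stepSize * (this.fractile - covered);
    this.estimate = Math.max(0, next);
  }
}

const tracker = new OnlineQuantileSketch(100, criticalFractile(3, 1), 8);
tracker.observe(200); // under-reserved: estimate rises to 106
tracker.observe(50);  // over-reserved: estimate falls to 104
console.log(tracker.estimate); // 104
```

Because the upward step is weighted by the fractile and the downward step by its complement, the estimate settles where roughly that fraction of requests fit inside the reservation.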
You do not need to read any of this to use ThrottleKit. The research exists so that the library's distributed claims are provable — and most of it now ships: lease.windowCoupled (GALE Pillar 1), the adaptive lease sizers leaseSizer / predictiveLeaseSizer (Pillars 2–3), weightedFairShare / weightedMaxMin (Pillar 4), and on the cost axis tokenBudget / learnedReservation / predictiveReservation (TALE Layers 1–3) are all first-class API. What remains validation-only is the theory behind them — the trilemma lower bounds and the at-scale discrete-event simulator — which justify the design rather than being code to call. See also docs/FORMAL-MODEL.md.
ThrottleKit · MIT · 1.0 — API frozen under SemVer (Stability)