Ameya Borkar edited this page Jun 10, 2026 · 20 revisions

ThrottleKit Wiki

Beyond rate limiting — govern rate, concurrency & cost, provably. Two engines do the hard part: GALE (provable distributed leasing — a fleet-size-independent overshoot bound, machine-checked in TLA⁺) and TALE (LLM token-budget escrow — meter what your model spends as it streams), on one small core, from a 169 ns in-process check to a global cluster. (throttlekit.in)

ThrottleKit rests on three ideas: algorithms are pure functions of time, storage is one atomic primitive, and adapters are thin glue. That separation lets the same configuration run as an allocation-free in-process check or atomically across a cluster — and makes the distributed behaviour something you can verify rather than hope for.
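
The three ideas above can be illustrated with a minimal TypeScript sketch. Every name in it (tokenBucket, AtomicStore, MemoryStore, makeLimiter) is hypothetical and chosen for illustration; this is not ThrottleKit's API, only one plausible shape for "pure algorithm, one atomic primitive, thin adapter".

```typescript
// Illustrative only: all names here are hypothetical, not ThrottleKit's API.

// 1. Algorithm: a pure function of (previous state, now, config).
//    It never reads a clock and performs no I/O.
interface BucketState { tokens: number; updatedAt: number }
interface BucketConfig { capacity: number; refillPerMs: number }
interface Decision { allowed: boolean; remaining: number }

function tokenBucket(
  prev: BucketState | undefined,
  now: number,
  cfg: BucketConfig,
): { state: BucketState; decision: Decision } {
  const base = prev ?? { tokens: cfg.capacity, updatedAt: now };
  const elapsed = Math.max(0, now - base.updatedAt);
  const tokens = Math.min(cfg.capacity, base.tokens + elapsed * cfg.refillPerMs);
  const allowed = tokens >= 1;
  const left = allowed ? tokens - 1 : tokens;
  return {
    state: { tokens: left, updatedAt: now },
    decision: { allowed, remaining: Math.floor(left) },
  };
}

// 2. Storage: a single atomic read-modify-write primitive.
interface AtomicStore<S> {
  update<R>(key: string, fn: (prev: S | undefined) => { state: S; result: R }): R;
}

// In-memory version: synchronous, so each update is trivially atomic.
class MemoryStore<S> implements AtomicStore<S> {
  private readonly data = new Map<string, S>();
  update<R>(key: string, fn: (prev: S | undefined) => { state: S; result: R }): R {
    const { state, result } = fn(this.data.get(key));
    this.data.set(key, state);
    return result;
  }
}

// 3. Adapter: thin glue that wires the algorithm to a store.
function makeLimiter(store: AtomicStore<BucketState>, cfg: BucketConfig) {
  return (key: string, now: number): Decision =>
    store.update(key, (prev) => {
      const { state, decision } = tokenBucket(prev, now, cfg);
      return { state, result: decision };
    });
}

// Time is passed in explicitly, so the run is deterministic.
const check = makeLimiter(new MemoryStore<BucketState>(), { capacity: 2, refillPerMs: 0.5 });
const decisions = [check("user:1", 0), check("user:1", 0), check("user:1", 0), check("user:1", 2)];
```

Because the algorithm takes `now` as an argument, the same transition function could in principle be executed inside any store that offers an atomic update, which is the property the paragraph above relies on.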

New here? Start with Getting Started, then Distributed & provable for the part most libraries hand-wave.

What makes it different

  • A formally-verified overshoot bound — independent of fleet size. The two-tier leasing path is model-checked in TLA⁺/TLC: worst-case global admissions collapse to exactly Limit under windowCoupled, no matter how many nodes. See Distributed & provable.
  • One algorithm, every backend, proven identical. The same strategy code runs in-memory, on Redis (atomic Lua), Postgres, Cloudflare (Durable Objects / D1), DynamoDB, and Deno KV — a dual-path conformance suite proves the decisions bit-identical.
  • A real synchronous API. checkSync is allocation-free at 169 ns/op — uncommon among JS limiters.
  • Breadth on one core. Seven rate-limit algorithms plus first-class billing quota (calendar-month/-week/-day, fixed, rolling; leap-correct); seven exact backends plus a best-effort Workers KV; a dozen-plus framework and transport bindings (including NestJS with the new @RateLimit decorator, AWS Lambda, gRPC, tRPC, SvelteKit, Remix, Elysia, and a transport-agnostic createEnforcer); non-consuming peek / forecast introspection; multi-dimensional single-round-trip checks; fixed-memory DDoS sketches; adaptive concurrency; weighted fair-share admission; an LLM cost-control stack (tokenBudget / distributedTokenBudget + learnedReservation); a .throttlekit.yaml rate-limit-as-code config; and a throttlekit CLI (benchmark / doctor / replay).
  • Proven, and shipping. The guarantees that underpin the distributed paths — GALE and TALE — each land as a real feature, model-checked or measured before it ships.
  • Not Node-only. A gRPC service door (throttlekit-server) + a throttlekit-py client reach the same limiter from any language — rate, cost, concurrency, and unified admission — with decisions proven bit-for-bit against language-neutral golden vectors. See Polyglot & Python.
  • A dashboard that answers "which axis throttled me?" ThrottleKit Lens is a built-in, zero-dependency terminal dashboard (throttlekit-server --tui, eight tabbed views) that shows live binding-axis attribution: which of rate / concurrency / cost bound each denial. It is the one view no other rate-limiter dashboard can render. Read the same state remotely, from any language, over the Monitor door (gRPC + Prometheus /metrics); for headless use the signal also exports to Grafana.
  • Plan a limit change before you ship it. Policy Plans is a terraform plan for limits — replay your recorded traffic against a candidate config and read the exact per-policy allow↔deny diff before you deploy, gate-able in CI. The same testkit powers live What-If Replay in the Lens.
  • Scale from one process to a global fleet. Configure a policy with federated: / fleetBudget: / distributedConcurrency: / federatedFairEscrow: and every client gets fleet-coordinated decisions over the existing RPCs with no client change; a high-throughput client can lease a slice of the global budget through the additive Fleet door. See Scaling & the Fleet.
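
To make the idea of a fleet-size-independent bound concrete, here is a generic budget-leasing sketch in TypeScript. It is not the GALE protocol and does not model windowCoupled, lease expiry, or node failure; the names GlobalBudget, LeasingNode, and totalAdmitted are hypothetical. It only shows why debiting a lease from the global budget before a node may spend it keeps total admissions at or below the limit, whatever the node count.

```typescript
// Generic leasing illustration. Hypothetical names; not the GALE protocol.
class GlobalBudget {
  private remaining: number;
  constructor(limit: number) {
    this.remaining = limit;
  }
  // A lease is subtracted from the budget up front, before any node spends it.
  lease(want: number): number {
    const granted = Math.min(want, this.remaining);
    this.remaining -= granted;
    return granted;
  }
}

class LeasingNode {
  private local = 0;
  private readonly budget: GlobalBudget;
  private readonly chunk: number;
  constructor(budget: GlobalBudget, chunk: number) {
    this.budget = budget;
    this.chunk = chunk;
  }
  admit(): boolean {
    // One round trip to the shared budget per chunk, not per request.
    if (this.local === 0) this.local = this.budget.lease(this.chunk);
    if (this.local === 0) return false;
    this.local -= 1;
    return true;
  }
}

// Drive every node round-robin and count admissions across the whole fleet.
function totalAdmitted(nodeCount: number, limit: number, attemptsPerNode: number): number {
  const budget = new GlobalBudget(limit);
  const nodes = Array.from({ length: nodeCount }, () => new LeasingNode(budget, 10));
  let admitted = 0;
  for (let i = 0; i < attemptsPerNode; i++) {
    for (const node of nodes) {
      if (node.admit()) admitted++;
    }
  }
  return admitted;
}

// Same limit, three fleet sizes.
const totals = [1, 7, 50].map((n) => totalAdmitted(n, 100, 1000));
```

In this toy model the trade-off runs the other way: tokens leased to an idle node are stranded, so a real design needs some way to expire or return leases. How ThrottleKit handles that is covered in Distributed & provable, not here.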

How it compares

The incumbents are good at what they do — this is what ThrottleKit adds on top. Every row is a shipped, tested feature.

express-rate-limit rate-limiter-flexible @upstash/ratelimit ThrottleKit
Provable, fleet-size-independent overshoot bound (TLA⁺-checked)
Synchronous, allocation-free check ✓ 169 ns
One algorithm, proven bit-identical across backends ✓ (6 stores)
Two-tier leasing — amortized round trips, bounded overshoot
LLM token-budget escrow (post-hoc cost axis) ✓ (TALE)
Unified rate × concurrency × cost in one decision
Weighted-fair · overload shedding · fixed-memory DDoS sketch
Polyglot from one verified core (Python today)
Live binding-axis monitoring dashboard (which axis is throttling) ✓ (Lens)
Plan a limit change before deploy — replay traffic → allow↔deny diff ✓ (Policy Plans)
Framework / transport adapters 1 a few 13
Zero runtime dependencies

ThrottleKit is about distributed correctness and breadth. The benchmarks (including where an incumbent wins) are reproducible: see Performance · BENCH.md. Coming from another library? See Migrating.
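
The "unified rate × concurrency × cost in one decision" and "binding-axis" rows above can be pictured with a small TypeScript sketch. The types and the admit function are hypothetical, not ThrottleKit's Decision shape; the sketch assumes all-or-nothing admission and reports the first axis (in a fixed order) that has no room.

```typescript
// Illustrative only: hypothetical types, not ThrottleKit's Decision object.
type Axis = "rate" | "concurrency" | "cost";
interface AxisState { used: number; limit: number }
type Usage = Record<Axis, AxisState>;
interface UnifiedDecision { allowed: boolean; boundBy: Axis | null }

const AXES: Axis[] = ["rate", "concurrency", "cost"];

// All-or-nothing: consume on every axis only if every axis has room.
function admit(usage: Usage, demand: Record<Axis, number>): UnifiedDecision {
  for (const axis of AXES) {
    if (usage[axis].used + demand[axis] > usage[axis].limit) {
      // Denied before anything was consumed; report the first binding axis.
      return { allowed: false, boundBy: axis };
    }
  }
  for (const axis of AXES) usage[axis].used += demand[axis];
  return { allowed: true, boundBy: null };
}

const usage: Usage = {
  rate: { used: 0, limit: 100 },       // requests per window
  concurrency: { used: 0, limit: 10 }, // in-flight requests (release omitted here)
  cost: { used: 0, limit: 1000 },      // e.g. LLM tokens per window
};
const demand: Record<Axis, number> = { rate: 1, concurrency: 1, cost: 400 };

// Two requests fit; the third would push cost to 1200 and is denied.
const results = [admit(usage, demand), admit(usage, demand), admit(usage, demand)];
```

Checking every axis before consuming on any of them is what keeps a denial from leaking partial consumption, so the rate counter still reads 2 after the third call.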

Guides

Page | What's in it
Getting Started | Install, your first limiter, the Decision object, checkSync, batch checks, deterministic time
Choosing a strategy | The seven algorithms and when to use each
Frameworks & the edge | Express, fetch/edge, Hono, Next, Fastify, Koa, NestJS, SvelteKit, Remix, Elysia, AWS Lambda, tRPC, gRPC, and createEnforcer for custom transports
Distributed & provable | Redis, Postgres, Cloudflare, DynamoDB, Deno KV, two-tier leasing, multi-region, and the formally-verified bound
Federation | One global limit across regional clusters; proven Δ = 0 independent of region count K (0.8.3)
Scaling & the Fleet | One global limit across a fleet: Tier-1 over the existing RPCs (zero client change) and Tier-2 via the Fleet.Reserve lease, plus the Monitor door
Unified admission | One Decision across rate + concurrency + cost (LLM-gateway shape); algebra-proven, sequential or Lua-fused (0.9.0)
Pillar 4 — Weighted Fair Escrow | Weighted-fair, work-conserving budget split across tenants; multi-process L2-backed (0.9.1)
Advanced limiting | Multi-dimensional limits, adaptive concurrency, leaky-bucket shaping
Overload, fairness & DDoS | Adaptive load-shedding, fair-share & weighted fairness, fixed-memory sketches
Operations | Standards headers, trusted-proxy IP keys, PII-safe HMAC keys, OpenTelemetry, failure modes
Monitoring — ThrottleKit Lens | The built-in terminal dashboard (throttlekit-server --tui, eight tabs): live binding-axis attribution + the full ops board + the remote Monitor door, no browser or backend
Policy Plans | A terraform plan for limits: replay recorded traffic vs a candidate config → the allow↔deny diff, before you deploy
Replay | Deterministic What-If Replay: record a limiter's decisions, replay a candidate, read the flip ledger
Performance | Benchmarks, the honest head-to-head, and where it loses
Migrating | Drop-in paths from express-rate-limit and rate-limiter-flexible, plus recipes
Polyglot & Python | Reach the same limiter from any language: the throttlekit-server gRPC service + the throttlekit-py client; every axis, bit-for-bit
GALE & TALE — the guarantees | How the provable distributed-leasing and LLM token-budget-escrow paths are proven
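
The Policy Plans and Replay entries describe replaying recorded traffic against a candidate config and reading the allow↔deny diff. A minimal TypeScript sketch of that idea follows. It assumes a simple fixed-window counter as the policy, and the names replay, planDiff, and Flip are hypothetical; the actual testkit and CLI are documented on those pages.

```typescript
// Illustrative only: hypothetical names, and a fixed-window counter stands in
// for whatever algorithm the real policy uses.
interface RecordedRequest { key: string; at: number } // `at` is a recorded timestamp in ms
interface WindowPolicy { limit: number; windowMs: number }
type Verdict = "allow" | "deny";
interface Flip { index: number; key: string; from: Verdict; to: Verdict }

// Deterministic replay: time comes from the recording, never from a clock.
function replay(trace: RecordedRequest[], policy: WindowPolicy): Verdict[] {
  const counts = new Map<string, number>();
  return trace.map(({ key, at }) => {
    const bucket = `${key}:${Math.floor(at / policy.windowMs)}`;
    const seen = counts.get(bucket) ?? 0;
    if (seen >= policy.limit) return "deny";
    counts.set(bucket, seen + 1);
    return "allow";
  });
}

// Replay the same trace under both configs and keep only the decisions that changed.
function planDiff(trace: RecordedRequest[], current: WindowPolicy, candidate: WindowPolicy): Flip[] {
  const before = replay(trace, current);
  const after = replay(trace, candidate);
  const flips: Flip[] = [];
  trace.forEach((req, i) => {
    const from = before[i] ?? "deny";
    const to = after[i] ?? "deny";
    if (from !== to) flips.push({ index: i, key: req.key, from, to });
  });
  return flips;
}

const trace: RecordedRequest[] = [
  { key: "a", at: 0 }, { key: "a", at: 10 }, { key: "a", at: 20 },
  { key: "a", at: 1000 }, { key: "a", at: 1010 },
];
// Tightening the limit from 3 to 2 per second flips only the third request.
const flips = planDiff(trace, { limit: 3, windowMs: 1000 }, { limit: 2, windowMs: 1000 });
```

A CI gate in this style could fail the build whenever the diff contains any allow-to-deny flip, which matches the "gate-able in CI" use described above.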

