Home

ThrottleKit Wiki

Beyond rate limiting — govern rate, concurrency & cost, provably. Two engines do the hard part: GALE (provable distributed leasing — a fleet-size-independent overshoot bound, machine-checked in TLA⁺) and TALE (LLM token-budget escrow — meter what your model spends as it streams), on one small core, from a 169 ns in-process check to a global cluster. (throttlekit.in)

ThrottleKit rests on three ideas: algorithms are pure functions of time, storage is one atomic primitive, and adapters are thin glue. That separation lets the same configuration run as an allocation-free in-process check or atomically across a cluster — and makes the distributed behaviour something you can verify rather than hope for.
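As a rough illustration of that separation, the sketch below writes a token bucket as a pure function of its inputs, puts storage behind a single atomic update call, and leaves the adapter as a few lines of glue. Every name in it (`tokenBucketStep`, `AtomicStore`, `MemoryStore`, `check`) is hypothetical and chosen for this example. It is not ThrottleKit's actual API, and the token bucket is used only because it is short.

```typescript
// Hypothetical names throughout: this is not ThrottleKit's actual API.

// Algorithm state is plain data; nothing here reads a clock.
interface BucketState {
  tokens: number;      // tokens currently available
  updatedAtMs: number; // time of the last refill calculation
}

interface BucketConfig {
  capacity: number;    // maximum burst size
  refillPerMs: number; // tokens added per millisecond
}

interface StepResult {
  allowed: boolean;
  next: BucketState;
}

// Pure function of (config, state, now): the same inputs always give the same decision.
function tokenBucketStep(
  cfg: BucketConfig,
  prev: BucketState | undefined,
  nowMs: number,
  cost = 1,
): StepResult {
  const state = prev ?? { tokens: cfg.capacity, updatedAtMs: nowMs };
  const elapsed = Math.max(0, nowMs - state.updatedAtMs);
  const refilled = Math.min(cfg.capacity, state.tokens + elapsed * cfg.refillPerMs);
  const allowed = refilled >= cost;
  return {
    allowed,
    next: { tokens: allowed ? refilled - cost : refilled, updatedAtMs: nowMs },
  };
}

// Storage as one atomic primitive: read, apply a pure transition, write back.
interface AtomicStore<S> {
  update<R>(key: string, fn: (prev: S | undefined) => { next: S; result: R }): R;
}

// In-memory backend: single-threaded JS makes this read-modify-write atomic.
class MemoryStore<S> implements AtomicStore<S> {
  private readonly data = new Map<string, S>();

  update<R>(key: string, fn: (prev: S | undefined) => { next: S; result: R }): R {
    const { next, result } = fn(this.data.get(key));
    this.data.set(key, next);
    return result;
  }
}

// Adapter-level glue: the only place that combines key, store, and time.
function check(
  store: AtomicStore<BucketState>,
  cfg: BucketConfig,
  key: string,
  nowMs: number,
): boolean {
  return store.update(key, (prev) => {
    const { allowed, next } = tokenBucketStep(cfg, prev, nowMs);
    return { next, result: allowed };
  });
}
```

Because the step function never reads a clock, calling `check(store, cfg, "user:1", t)` with hand-picked values of `t` gives deterministic tests, and a different backend would only need to supply its own `update`.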

New here? Start with Getting Started, then Distributed & provable for the part most libraries hand-wave.

What makes it different

A formally verified overshoot bound, independent of fleet size. The two-tier leasing path is model-checked in TLA⁺/TLC: under windowCoupled, worst-case global admissions collapse to exactly Limit, no matter how many nodes there are. See Distributed & provable.
One algorithm, every backend, proven identical. The same strategy code runs in-memory, on Redis (atomic Lua), Postgres, Cloudflare (Durable Objects / D1), DynamoDB, and Deno KV — a dual-path conformance suite proves the decisions bit-identical.
A real synchronous API. checkSync is allocation-free at 169 ns/op — uncommon among JS limiters.
Breadth on one core. Seven rate-limit algorithms plus first-class billing quota (calendar-month, calendar-week, calendar-day, fixed, and rolling; leap-correct); seven exact backends plus a best-effort Workers KV; more than a dozen framework and transport bindings (including NestJS with the new @RateLimit decorator, AWS Lambda, gRPC, tRPC, SvelteKit, Remix, Elysia, and a transport-agnostic createEnforcer); non-consuming peek / forecast introspection; multi-dimensional single-round-trip checks; fixed-memory DDoS sketches; adaptive concurrency; weighted fair-share admission; an LLM cost-control stack (tokenBudget / distributedTokenBudget + learnedReservation); a .throttlekit.yaml rate-limit-as-code config; and a throttlekit CLI (benchmark / doctor / replay).
Proven, and shipping. The guarantees that underpin the distributed paths — GALE and TALE — each land as a real feature, model-checked or measured before it ships.
Not Node-only. A gRPC service door (throttlekit-server) + a throttlekit-py client reach the same limiter from any language — rate, cost, concurrency, and unified admission — with decisions proven bit-for-bit against language-neutral golden vectors. See Polyglot & Python.
A dashboard that answers "which axis throttled me?" ThrottleKit Lens is a built-in, zero-dependency terminal dashboard (throttlekit-server --tui, eight tabbed views) that shows live binding-axis attribution: which of rate / concurrency / cost bound each denial. It is the one view no other rate-limiter dashboard can render. Read the same state remotely, from any language, over the Monitor door (gRPC + Prometheus /metrics); for headless use, the signal also exports to Grafana.
Plan a limit change before you ship it. Policy Plans is a terraform plan for limits — replay your recorded traffic against a candidate config and read the exact per-policy allow↔deny diff before you deploy, gate-able in CI. The same testkit powers live What-If Replay in the Lens.
Scale from one process to a global fleet. Set any of federated:, fleetBudget:, distributedConcurrency:, or federatedFairEscrow: on a policy and every client gets fleet-coordinated decisions over the existing RPCs with no client change; a high-throughput client can also lease a slice of the global budget through the additive Fleet door. See Scaling & the Fleet.
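The fleet-size independence claimed in the first point can be pictured with a much-simplified leasing model. In the sketch below, each node admits requests from a locally held lease and only contacts the shared budget when that lease runs out. Since every permit is debited from the shared budget before a node can spend it, total admissions cannot exceed the limit however many nodes take part. This is an assumption-laden toy, not the GALE protocol or the windowCoupled mode: `GlobalBudget`, `LeasingNode`, and `simulate` are hypothetical names, and the model ignores window boundaries, lease expiry, and node failure.

```typescript
// Hypothetical sketch; not the GALE protocol or ThrottleKit's API.

// Tier 2: the shared budget. A real deployment would make `lease` one atomic
// operation on a shared store; here it is a plain in-process object.
class GlobalBudget {
  private remaining: number;
  roundTrips = 0; // how many times any node had to contact the shared budget

  constructor(limit: number) {
    this.remaining = limit;
  }

  // Grant up to `want` permits, debiting them before the caller can spend them.
  lease(want: number): number {
    this.roundTrips += 1;
    const granted = Math.min(want, this.remaining);
    this.remaining -= granted;
    return granted;
  }
}

// Tier 1: a node admits from its local lease with no round trip, and asks the
// shared budget for another chunk only when the lease is empty.
class LeasingNode {
  private local = 0;
  private readonly budget: GlobalBudget;
  private readonly chunk: number;

  constructor(budget: GlobalBudget, chunk: number) {
    this.budget = budget;
    this.chunk = chunk;
  }

  tryAdmit(): boolean {
    if (this.local === 0) {
      this.local = this.budget.lease(this.chunk);
    }
    if (this.local === 0) return false; // shared budget exhausted
    this.local -= 1;
    return true;
  }
}

// Send `requests` requests round-robin across `nodeCount` nodes.
function simulate(limit: number, nodeCount: number, chunk: number, requests: number) {
  const budget = new GlobalBudget(limit);
  const nodes = Array.from({ length: nodeCount }, () => new LeasingNode(budget, chunk));
  let admitted = 0;
  for (let i = 0; i < requests; i++) {
    const node = nodes[i % nodeCount];
    if (node !== undefined && node.tryAdmit()) admitted += 1;
  }
  return { admitted, roundTrips: budget.roundTrips };
}
```

Running `simulate` with the same limit and different node counts shows the admitted total staying at or below the limit, while the round-trip count grows with the number of leases rather than the number of requests. A lease that a quiet node never spends is simply lost in this toy, which is one of the things a real protocol has to handle.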

How it compares

The incumbents are good at what they do — this is what ThrottleKit adds on top. Every row is a shipped, tested feature.

Feature	`express-rate-limit`	`rate-limiter-flexible`	`@upstash/ratelimit`	ThrottleKit
Provable, fleet-size-independent overshoot bound (TLA⁺-checked)	–	–	–	✓
Synchronous, allocation-free check	–	–	–	✓ 169 ns
One algorithm, proven bit-identical across backends	–	–	–	✓ (6 stores)
Two-tier leasing — amortized round trips, bounded overshoot	–	–	–	✓
LLM token-budget escrow (post-hoc cost axis)	–	–	–	✓ (TALE)
Unified rate × concurrency × cost in one decision	–	–	–	✓
Weighted-fair · overload shedding · fixed-memory DDoS sketch	–	–	–	✓
Polyglot from one verified core (Python today)	–	–	–	✓
Live binding-axis monitoring dashboard (which axis is throttling)	–	–	–	✓ (Lens)
Plan a limit change before deploy — replay traffic → allow↔deny diff	–	–	–	✓ (Policy Plans)
Framework / transport adapters	1	a few	–	13
Zero runtime dependencies	–	–	–	✓

This comparison is about distributed correctness and breadth. The benchmarks, including the cases where an incumbent wins, are reproducible: see Performance · BENCH.md. Coming from another library? See Migrating.
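For the token-budget escrow row in the table, the general reserve, meter, settle pattern can be sketched as follows. An estimate is held against the budget before the model call starts, actual tokens are counted as chunks stream in, and the difference is reconciled at the end. This is only a guess at the shape of the idea: `TokenEscrow` and `Reservation` are hypothetical names, not TALE or the tokenBudget API, and the sketch is single-process with no persistence.

```typescript
// Hypothetical sketch; not TALE or ThrottleKit's tokenBudget API.

// One in-flight model call: holds an estimate, counts real spend, settles once.
class Reservation {
  private spent = 0;
  private settled = false;
  private readonly held: number;
  private readonly release: (delta: number) => void;

  constructor(held: number, release: (delta: number) => void) {
    this.held = held;
    this.release = release;
  }

  // Count tokens as each streamed chunk arrives.
  meter(tokens: number): void {
    this.spent += tokens;
  }

  // Reconcile the hold against actual spend. Safe to call more than once:
  // only the first call adjusts the budget.
  settle(): number {
    if (!this.settled) {
      this.settled = true;
      // Positive delta refunds unused tokens; negative delta charges an overrun.
      this.release(this.held - this.spent);
    }
    return this.spent;
  }
}

class TokenEscrow {
  private available: number;

  constructor(budget: number) {
    this.available = budget;
  }

  // Hold an up-front estimate; returns undefined when the budget cannot cover it.
  reserve(estimate: number): Reservation | undefined {
    if (estimate > this.available) return undefined;
    this.available -= estimate;
    return new Reservation(estimate, (delta) => {
      this.available += delta;
    });
  }

  get remaining(): number {
    return this.available;
  }
}
```

A caller would `reserve` before starting the stream, call `meter` per chunk, and `settle` in a `finally` block so that an aborted stream still returns its unused hold.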

Guides

Page	What's in it
Getting Started	Install, your first limiter, the `Decision` object, `checkSync`, batch checks, deterministic time
Choosing a strategy	The seven algorithms and when to use each
Frameworks & the edge	Express, `fetch`/edge, Hono, Next, Fastify, Koa, NestJS, SvelteKit, Remix, Elysia, AWS Lambda, tRPC, gRPC, and `createEnforcer` for custom transports
Distributed & provable	Redis, Postgres, Cloudflare, DynamoDB, Deno KV, two-tier leasing, multi-region, and the formally-verified bound
Federation	One global limit across regional clusters; proven `Δ = 0` independent of region count K (0.8.3)
Scaling & the Fleet	One global limit across a fleet — Tier-1 over the existing RPCs (zero client change) and Tier-2 via the `Fleet.Reserve` lease, plus the Monitor door
Unified admission	One Decision across rate + concurrency + cost (LLM-gateway shape); algebra-proven, sequential or Lua-fused (0.9.0)
Pillar 4 — Weighted Fair Escrow	Weighted-fair, work-conserving budget split across tenants; multi-process L2-backed (0.9.1)
Advanced limiting	Multi-dimensional limits, adaptive concurrency, leaky-bucket shaping
Overload, fairness & DDoS	Adaptive load-shedding, fair-share & weighted fairness, fixed-memory sketches
Operations	Standards headers, trusted-proxy IP keys, PII-safe HMAC keys, OpenTelemetry, failure modes
Monitoring — ThrottleKit Lens	The built-in terminal dashboard (`throttlekit-server --tui`, eight tabs): live binding-axis attribution + the full ops board + the remote Monitor door, no browser or backend
Policy Plans	A `terraform plan` for limits — replay recorded traffic vs a candidate config → the allow↔deny diff, before you deploy
Replay	Deterministic What-If Replay — record a limiter's decisions, replay a candidate, read the flip ledger
Performance	Benchmarks, the honest head-to-head, and where it loses
Migrating	Drop-in paths from `express-rate-limit` and `rate-limiter-flexible`, plus recipes
Polyglot & Python	Reach the same limiter from any language — the `throttlekit-server` gRPC service + the `throttlekit-py` client; every axis, bit-for-bit
GALE & TALE — the guarantees	How the provable distributed-leasing and LLM token-budget-escrow paths are proven
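The Policy Plans and Replay rows both describe replaying recorded traffic against a candidate config and reading the allow↔deny diff. A minimal sketch of that idea, assuming a fixed-window counter and a recording that is just a list of keys and timestamps, might look like this. `RecordedRequest`, `WindowPolicy`, `replay`, and `planDiff` are hypothetical names, not the actual testkit or its recording format.

```typescript
// Hypothetical sketch; not the Policy Plans testkit or its file format.

interface RecordedRequest {
  key: string;  // who made the request
  atMs: number; // arrival time, taken from the recording rather than a clock
}

interface WindowPolicy {
  limit: number;    // max requests per window
  windowMs: number; // fixed window length
}

// Replay a recording through a fixed-window counter: one decision per request.
function replay(policy: WindowPolicy, traffic: readonly RecordedRequest[]): boolean[] {
  const counts = new Map<string, number>();
  return traffic.map((req) => {
    const windowId = Math.floor(req.atMs / policy.windowMs);
    const slot = `${req.key}:${windowId}`;
    const used = counts.get(slot) ?? 0;
    if (used >= policy.limit) return false;
    counts.set(slot, used + 1);
    return true;
  });
}

interface PlanDiff {
  allowToDeny: number[]; // request indices allowed today, denied by the candidate
  denyToAllow: number[]; // request indices denied today, allowed by the candidate
}

// Run the same recording through both policies and report only the flips.
function planDiff(
  current: WindowPolicy,
  candidate: WindowPolicy,
  traffic: readonly RecordedRequest[],
): PlanDiff {
  const before = replay(current, traffic);
  const after = replay(candidate, traffic);
  const diff: PlanDiff = { allowToDeny: [], denyToAllow: [] };
  before.forEach((wasAllowed, i) => {
    const isAllowed = after[i] === true;
    if (wasAllowed && !isAllowed) diff.allowToDeny.push(i);
    if (!wasAllowed && isAllowed) diff.denyToAllow.push(i);
  });
  return diff;
}
```

The replay is deterministic because time comes from the recording, so the same recording and the same pair of policies always produce the same diff. A CI gate could then fail the build when `allowToDeny` is longer than some agreed threshold.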

In the repository

README — the short version of this page.
THROTTLEKIT.md — full design and architecture.
SCOREBOARD.md — benchmarks, correctness guarantees, feature matrix.
docs/FORMAL-MODEL.md — the formally-verified leasing bound.
research/ — the design docs, proofs, and evals behind the guarantees.
examples/ — a runnable file for every feature.
CHANGELOG.md — release history.

ThrottleKit · MIT · 1.0 — API frozen under SemVer (Stability)
