Home
Beyond rate limiting — govern rate, concurrency & cost, provably. Two engines do the hard part: GALE (provable distributed leasing — a fleet-size-independent overshoot bound, machine-checked in TLA⁺) and TALE (LLM token-budget escrow — meter what your model spends as it streams), on one small core, from a 169 ns in-process check to a global cluster. (throttlekit.in)
ThrottleKit rests on three ideas: algorithms are pure functions of time, storage is one atomic primitive, and adapters are thin glue. That separation lets the same configuration run as an allocation-free in-process check or atomically across a cluster — and makes the distributed behaviour something you can verify rather than hope for.
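To make the three ideas concrete, here is a minimal TypeScript sketch under stated assumptions. Every name in it (`tokenBucket`, `AtomicStore`, `MemoryStore`, `check`) is hypothetical and chosen for illustration; none of it is ThrottleKit's actual API. It shows a token bucket written as a pure function of time, a store that exposes a single atomic update, and an adapter that only wires the two together.

```typescript
// Hypothetical names throughout; this is not ThrottleKit's actual API.

interface BucketState {
  tokens: number;    // tokens currently available
  updatedAt: number; // timestamp (ms) of the last update
}

interface BucketConfig {
  capacity: number;    // maximum burst size
  refillPerMs: number; // tokens added per millisecond
}

interface Decision {
  allowed: boolean;
  remaining: number;
}

// Pure function of time: the same (state, config, now, cost) always
// produces the same next state and decision. No clock, no I/O.
function tokenBucket(
  state: BucketState | undefined,
  config: BucketConfig,
  now: number,
  cost: number = 1,
): { next: BucketState; decision: Decision } {
  const prev = state ?? { tokens: config.capacity, updatedAt: now };
  const elapsed = Math.max(0, now - prev.updatedAt);
  const refilled = Math.min(config.capacity, prev.tokens + elapsed * config.refillPerMs);
  const allowed = refilled >= cost;
  const tokens = allowed ? refilled - cost : refilled;
  return { next: { tokens, updatedAt: now }, decision: { allowed, remaining: tokens } };
}

// The one storage primitive: atomically read, transform, and write a key.
interface AtomicStore<S> {
  update<R>(key: string, fn: (state: S | undefined) => { next: S; result: R }): R;
}

// In-memory store: a synchronous update is atomic in single-threaded JS.
class MemoryStore<S> implements AtomicStore<S> {
  private readonly data = new Map<string, S>();

  update<R>(key: string, fn: (state: S | undefined) => { next: S; result: R }): R {
    const { next, result } = fn(this.data.get(key));
    this.data.set(key, next);
    return result;
  }
}

// Thin glue: the adapter supplies a key and a timestamp, nothing more.
function check(
  store: AtomicStore<BucketState>,
  key: string,
  config: BucketConfig,
  now: number,
): Decision {
  return store.update(key, (state) => {
    const { next, decision } = tokenBucket(state, config, now);
    return { next, result: decision };
  });
}

const store = new MemoryStore<BucketState>();
const config: BucketConfig = { capacity: 2, refillPerMs: 0.5 };
const first = check(store, "user:1", config, 0);  // allowed, 1 remaining
const second = check(store, "user:1", config, 0); // allowed, 0 remaining
const third = check(store, "user:1", config, 0);  // denied
const later = check(store, "user:1", config, 2);  // one token refilled, allowed
```

Because the algorithm never reads a clock itself, the timestamps above are plain arguments, which is what makes deterministic tests and replays possible in a design of this shape.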
New here? Start with Getting Started, then Distributed & provable for the part most libraries hand-wave.
- A formally-verified overshoot bound, independent of fleet size. The two-tier leasing path is model-checked in TLA⁺/TLC: worst-case global admissions collapse to exactly `Limit` under `windowCoupled`, no matter how many nodes. See Distributed & provable.
- One algorithm, every backend, proven identical. The same strategy code runs in-memory, on Redis (atomic Lua), Postgres, Cloudflare (Durable Objects / D1), DynamoDB, and Deno KV; a dual-path conformance suite proves the decisions bit-identical.
- A real synchronous API. `checkSync` is allocation-free at 169 ns/op, which is uncommon among JS limiters.
- Breadth on one core. Seven rate-limit algorithms plus first-class billing `quota` (calendar-month/-week/-day, fixed, rolling; leap-correct), seven exact backends + a best-effort Workers KV, a dozen+ framework & transport bindings (incl. NestJS with the new `@RateLimit` decorator, AWS Lambda, gRPC, tRPC, SvelteKit, Remix, Elysia + a transport-agnostic `createEnforcer`), non-consuming `peek`/`forecast` introspection, multi-dimensional single-round-trip checks, fixed-memory DDoS sketches, adaptive concurrency, weighted fair-share admission, an LLM cost-control stack (`tokenBudget`/`distributedTokenBudget` + `learnedReservation`), a `.throttlekit.yaml` rate-limit-as-code config, and a `throttlekit` CLI (benchmark/doctor/replay).
- Proven, and shipping. The guarantees that underpin the distributed paths, GALE and TALE, each land as a real feature, model-checked or measured before it ships.
- Not Node-only. A gRPC service door (`throttlekit-server`) + a `throttlekit-py` client reach the same limiter from any language (rate, cost, concurrency, and unified admission), with decisions proven bit-for-bit against language-neutral golden vectors. See Polyglot & Python.
- A dashboard that answers "which axis throttled me?" ThrottleKit Lens, a built-in, zero-dependency terminal dashboard (`throttlekit-server --tui`, eight tabbed views), shows live binding-axis attribution: which of rate / concurrency / cost bound each denial, the one view no other rate-limiter dashboard can render. Read the same state remotely, from any language, over the Monitor door (gRPC + Prometheus `/metrics`); for headless use the signal also exports to Grafana.
- Plan a limit change before you ship it. Policy Plans is a `terraform plan` for limits: replay your recorded traffic against a candidate config and read the exact per-policy allow↔deny diff before you deploy, gate-able in CI. The same testkit powers live What-If Replay in the Lens.
- Scale from one process to a global fleet. Configure a policy `federated:` / `fleetBudget:` / `distributedConcurrency:` / `federatedFairEscrow:` and every client gets fleet-coordinated decisions over the existing RPCs with no client change; a high-throughput client can lease a slice of the global budget through the additive Fleet door. See Scaling & the Fleet.
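The leasing claim in the list above can be illustrated with a deliberately simplified TypeScript sketch. It is not the GALE protocol and does not model `windowCoupled`; `GlobalBudget`, `LeasingNode`, and `simulate` are hypothetical names. It only shows why carving leases out of one fixed budget keeps total admissions at or below the limit regardless of how many nodes take part.

```typescript
// Simplified illustration only; not the GALE protocol or ThrottleKit's API.

// Shared tier: the budget for one window. In a real deployment `lease`
// would have to be a single atomic operation on the backing store.
class GlobalBudget {
  private granted = 0;
  private readonly limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  // Grant up to `want` permits without ever exceeding the limit in total.
  lease(want: number): number {
    const grant = Math.min(want, this.limit - this.granted);
    this.granted += grant;
    return grant;
  }
}

// Local tier: a node admits from its lease and contacts the shared tier
// only when the lease is empty, which amortizes round trips.
class LeasingNode {
  admitted = 0;
  private local = 0;
  private readonly budget: GlobalBudget;
  private readonly chunk: number;

  constructor(budget: GlobalBudget, chunk: number) {
    this.budget = budget;
    this.chunk = chunk;
  }

  tryAdmit(): boolean {
    if (this.local === 0) this.local = this.budget.lease(this.chunk);
    if (this.local === 0) return false; // shared budget exhausted
    this.local -= 1;
    this.admitted += 1;
    return true;
  }
}

// Round-robin traffic across `nodeCount` nodes; returns total admissions.
function simulate(nodeCount: number, limit: number, chunk: number, attemptsPerNode: number): number {
  const budget = new GlobalBudget(limit);
  const nodes = Array.from({ length: nodeCount }, () => new LeasingNode(budget, chunk));
  for (let i = 0; i < attemptsPerNode; i++) {
    for (const node of nodes) node.tryAdmit();
  }
  return nodes.reduce((sum, node) => sum + node.admitted, 0);
}

const threeNodes = simulate(3, 100, 10, 100); // 100 admitted in total
const fiftyNodes = simulate(50, 100, 10, 100); // still 100 admitted in total
```

The trade-off in a scheme like this is that permits sitting in an idle node's lease are stranded until the lease is returned or the window ends, so the chunk size balances round trips against under-admission.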
The incumbents are good at what they do — this is what ThrottleKit adds on top. Every row is a shipped, tested feature.
| | express-rate-limit | rate-limiter-flexible | @upstash/ratelimit | ThrottleKit |
|---|---|---|---|---|
| Provable, fleet-size-independent overshoot bound (TLA⁺-checked) | – | – | – | ✓ |
| Synchronous, allocation-free check | – | – | – | ✓ 169 ns |
| One algorithm, proven bit-identical across backends | – | – | – | ✓ (6 stores) |
| Two-tier leasing — amortized round trips, bounded overshoot | – | – | – | ✓ |
| LLM token-budget escrow (post-hoc cost axis) | – | – | – | ✓ (TALE) |
| Unified rate × concurrency × cost in one decision | – | – | – | ✓ |
| Weighted-fair · overload shedding · fixed-memory DDoS sketch | – | – | – | ✓ |
| Polyglot from one verified core (Python today) | – | – | – | ✓ |
| Live binding-axis monitoring dashboard (which axis is throttling) | – | – | – | ✓ (Lens) |
| Plan a limit change before deploy — replay traffic → allow↔deny diff | – | – | – | ✓ (Policy Plans) |
| Framework / transport adapters | 1 | a few | – | 13 |
| Zero runtime dependencies | – | – | – | ✓ |
The case for ThrottleKit is distributed correctness plus breadth, and the benchmarks (including where an incumbent wins) are reproducible: see Performance · BENCH.md. Coming from another library? See Migrating.
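The "LLM token-budget escrow" row describes a cost axis that is only known after the model has finished streaming. One common way to handle a cost like that is reserve-then-settle accounting, sketched below in TypeScript. This is an assumption about the general shape, not the TALE implementation; `TokenEscrow`, `reserve`, `settle`, and `available` are hypothetical names.

```typescript
// Hypothetical reserve-then-settle sketch; not the TALE implementation.
class TokenEscrow {
  private spent = 0;    // tokens settled so far
  private reserved = 0; // tokens held for in-flight requests
  private readonly budget: number;

  constructor(budget: number) {
    this.budget = budget;
  }

  // Before the model call: hold an estimate, or refuse if it cannot fit.
  reserve(estimate: number): boolean {
    if (this.spent + this.reserved + estimate > this.budget) return false;
    this.reserved += estimate;
    return true;
  }

  // After the stream ends: release the hold and record the actual usage.
  settle(estimate: number, actual: number): void {
    this.reserved -= estimate;
    this.spent += actual;
  }

  // Budget not yet spent or held.
  available(): number {
    return this.budget - this.spent - this.reserved;
  }
}

const escrow = new TokenEscrow(1000);
const firstHold = escrow.reserve(400);  // true: 400 held, 600 available
const tooLarge = escrow.reserve(700);   // false: would exceed the budget
escrow.settle(400, 250);                // the stream used only 250 tokens
const afterSettle = escrow.available(); // 750
const retry = escrow.reserve(700);      // true: now fits
```

In this sketch a request that spends more than its estimate can push `spent` past the budget, so the quality of the estimate determines how far a budget can be overshot.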
| Page | What's in it |
|---|---|
| Getting Started | Install, your first limiter, the `Decision` object, `checkSync`, batch checks, deterministic time |
| Choosing a strategy | The seven algorithms and when to use each |
| Frameworks & the edge | Express, fetch/edge, Hono, Next, Fastify, Koa, NestJS, SvelteKit, Remix, Elysia, AWS Lambda, tRPC, gRPC, and `createEnforcer` for custom transports |
| Distributed & provable | Redis, Postgres, Cloudflare, DynamoDB, Deno KV, two-tier leasing, multi-region, and the formally-verified bound |
| Federation | One global limit across regional clusters; proven Δ = 0 independent of region count K (0.8.3) |
| Scaling & the Fleet | One global limit across a fleet: Tier-1 over the existing RPCs (zero client change) and Tier-2 via the `Fleet.Reserve` lease, plus the Monitor door |
| Unified admission | One Decision across rate + concurrency + cost (LLM-gateway shape); algebra-proven, sequential or Lua-fused (0.9.0) |
| Pillar 4 — Weighted Fair Escrow | Weighted-fair, work-conserving budget split across tenants; multi-process L2-backed (0.9.1) |
| Advanced limiting | Multi-dimensional limits, adaptive concurrency, leaky-bucket shaping |
| Overload, fairness & DDoS | Adaptive load-shedding, fair-share & weighted fairness, fixed-memory sketches |
| Operations | Standards headers, trusted-proxy IP keys, PII-safe HMAC keys, OpenTelemetry, failure modes |
| Monitoring — ThrottleKit Lens | The built-in terminal dashboard (throttlekit-server --tui, eight tabs): live binding-axis attribution + the full ops board + the remote Monitor door, no browser or backend |
| Policy Plans | A `terraform plan` for limits: replay recorded traffic vs a candidate config → the allow↔deny diff, before you deploy |
| Replay | Deterministic What-If Replay — record a limiter's decisions, replay a candidate, read the flip ledger |
| Performance | Benchmarks, the honest head-to-head, and where it loses |
| Migrating | Drop-in paths from `express-rate-limit` and `rate-limiter-flexible`, plus recipes |
| Polyglot & Python | Reach the same limiter from any language: the `throttlekit-server` gRPC service + the `throttlekit-py` client; every axis, bit-for-bit |
| GALE & TALE — the guarantees | How the provable distributed-leasing and LLM token-budget-escrow paths are proven |
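The Policy Plans and Replay rows above describe replaying recorded traffic against a candidate config and reading the allow↔deny diff. A minimal TypeScript sketch of that idea follows, assuming a simple fixed-window limiter. `RequestEvent`, `Policy`, `Flip`, `decide`, and `diffPolicies` are hypothetical names for illustration, not the ThrottleKit testkit API.

```typescript
// Hypothetical what-if replay sketch; not the ThrottleKit testkit API.

interface RequestEvent {
  key: string; // who made the request
  at: number;  // recorded timestamp in milliseconds
}

interface Policy {
  limit: number;    // requests allowed per window
  windowMs: number; // window length in milliseconds
}

interface Flip {
  index: number; // position of the request in the recording
  key: string;
  from: "allow" | "deny";
  to: "allow" | "deny";
}

// Deterministic fixed-window limiter: the decisions depend only on the
// recorded events and the policy, so a replay is exactly repeatable.
function decide(events: RequestEvent[], policy: Policy): boolean[] {
  const counts = new Map<string, number>();
  return events.map((event) => {
    const bucket = `${event.key}:${Math.floor(event.at / policy.windowMs)}`;
    const used = counts.get(bucket) ?? 0;
    if (used >= policy.limit) return false;
    counts.set(bucket, used + 1);
    return true;
  });
}

// Replay the same recording under both policies and list every decision
// that changes between them.
function diffPolicies(events: RequestEvent[], current: Policy, candidate: Policy): Flip[] {
  const before = decide(events, current);
  const after = decide(events, candidate);
  const flips: Flip[] = [];
  events.forEach((event, index) => {
    const was = before[index] === true;
    const now = after[index] === true;
    if (was !== now) {
      flips.push({ index, key: event.key, from: was ? "allow" : "deny", to: now ? "allow" : "deny" });
    }
  });
  return flips;
}

const recording: RequestEvent[] = [
  { key: "tenant-a", at: 0 },
  { key: "tenant-a", at: 10 },
  { key: "tenant-a", at: 20 },
  { key: "tenant-a", at: 30 },
];
const flips = diffPolicies(
  recording,
  { limit: 3, windowMs: 1000 }, // current policy
  { limit: 2, windowMs: 1000 }, // candidate policy
);
// flips holds one entry: request 2 changes from "allow" to "deny"
```

A CI gate built on a diff like this could fail the build when the number of allow-to-deny flips exceeds a threshold the team has agreed on.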
- README — the short version of this page.
- THROTTLEKIT.md — full design and architecture.
- SCOREBOARD.md — benchmarks, correctness guarantees, feature matrix.
- docs/FORMAL-MODEL.md — the formally-verified leasing bound.
- research/ — the design docs, proofs, and evals behind the guarantees.
- examples/ — a runnable file for every feature.
- CHANGELOG.md — release history.
ThrottleKit · MIT · 1.0 — API frozen under SemVer (Stability)