ameyaborkar edited this page Jun 10, 2026 · 8 revisions

throttlekit-py

Beyond rate limiting, from Python. Govern rate, concurrency & cost, provably. This is ThrottleKit's Python client: it returns decisions computed by the single Node core and its two engines, GALE (provable distributed leasing) and TALE (LLM token-budget escrow), bit-identical to the Node oracle, through either of two pluggable backends.

Installed as throttlekit-py, imported as throttlekit (PyPI's throttlekit is an unrelated project).

🌐 throttlekit.in · 📦 PyPI · 🧪 runnable examples (one script per axis)

The one invariant

The whole design rests on it: exactly one thing computes a Decision — the Node core, directly or as Lua-in-Redis. Neither backend re-implements an algorithm, so there is no second rate limiter to keep in sync and no float-determinism risk. The client transports decisions; it never derives them.

Why reach for it

The decisions you get back carry the core's guarantees — not a re-implemented approximation:

  • a machine-checked (TLA⁺), fleet-size-independent overshoot bound — window-coupled leasing admits ≤ the limit at any fleet size; most rate limiters state no bound at all;
  • GALE (provable distributed leasing) and TALE (LLM token-budget escrow), shipped as features and reachable from Python — leased two-tier check, the cost axis via debit, unified rate × concurrency × cost via admit;
  • bit-identical results, replayed against the core's golden vectors through real Redis;
  • fleet scale with no client change — point the same ServiceBackend at a distributed-configured server (federated: / fleetBudget: / distributedConcurrency:) for fleet-coordinated decisions, or lease a chunk of the global budget with the new FleetBackend (0.5.0).
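The overshoot bound in the first bullet can be pictured with a toy model of window-coupled leasing: a coordinator hands out chunks of a per-window budget, each node admits only from tokens granted for the current window, and unspent tokens die with the window. This is a simplified, single-process illustration written for this page (Coordinator, Node and worst_window are hypothetical names), not GALE's algorithm or its TLA⁺ specification. It only shows why admissions backed by window-scoped grants cannot exceed the limit, however many nodes share it.

```python
import random
from collections import defaultdict
from typing import Optional


class Coordinator:
    """Owns the global per-window budget and grants chunks of it (toy model)."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.granted = defaultdict(int)  # window index -> tokens granted in that window

    def lease(self, window: int, chunk: int) -> int:
        # Never grant past the limit: every admission must be backed by a granted token.
        n = max(0, min(chunk, self.limit - self.granted[window]))
        self.granted[window] += n
        return n


class Node:
    """One fleet member; admits locally from a lease valid only in its own window."""

    def __init__(self, coord: Coordinator, chunk: int) -> None:
        self.coord, self.chunk = coord, chunk
        self.window: Optional[int] = None
        self.tokens = 0

    def admit(self, window: int) -> bool:
        if window != self.window:
            # Window coupling: unspent tokens from an older window are discarded.
            self.window, self.tokens = window, 0
        if self.tokens == 0:
            self.tokens = self.coord.lease(window, self.chunk)
        if self.tokens > 0:
            self.tokens -= 1
            return True
        return False


def worst_window(nodes: int, limit: int = 100, chunk: int = 8,
                 windows: int = 20, requests: int = 500, seed: int = 0) -> int:
    """Largest number of admissions seen in any single window across the fleet."""
    rng = random.Random(seed)
    coord = Coordinator(limit)
    fleet = [Node(coord, chunk) for _ in range(nodes)]
    worst = 0
    for w in range(windows):
        admitted = sum(fleet[rng.randrange(nodes)].admit(w) for _ in range(requests))
        worst = max(worst, admitted)
    return worst


for n in (1, 10, 1000):
    print(f"{n:>5} nodes: at most {worst_window(n)} admitted per window (limit 100)")
```

The run also shows the trade-off in this toy: as the fleet grows, tokens stranded in idle nodes' leases lower utilization, but they never push admissions past the limit.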

A Python service gets the same proven core a Node fleet does, not a second limiter to keep in sync. The guarantees, and how they work, are what make ThrottleKit worth reaching for from any language.
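The cost axis in the list above (TALE, LLM token-budget escrow, reached through debit) is easiest to picture as hold-and-settle: reserve the worst-case token cost before an LLM call, then refund whatever the call did not use. The sketch below is a single-process toy under that assumption. TokenEscrow, reserve and settle are hypothetical names, not ThrottleKit's API, and nothing here describes how TALE actually escrows budget across a fleet.

```python
import threading


class TokenEscrow:
    """Toy escrow for an LLM token budget: hold the worst case, settle the actual."""

    def __init__(self, budget: int) -> None:
        self._available = budget
        self._lock = threading.Lock()

    @property
    def available(self) -> int:
        with self._lock:
            return self._available

    def reserve(self, max_tokens: int) -> bool:
        """Hold max_tokens before the call; refuse if the hold would overdraw."""
        with self._lock:
            if max_tokens > self._available:
                return False
            self._available -= max_tokens
            return True

    def settle(self, reserved: int, used: int) -> None:
        """Release the unused part of a hold once the real usage is known."""
        if not 0 <= used <= reserved:
            raise ValueError("usage must lie within the reserved hold")
        with self._lock:
            self._available += reserved - used


escrow = TokenEscrow(budget=1000)
print(escrow.reserve(600))   # True: hold for a call capped at 600 tokens
print(escrow.reserve(600))   # False: a concurrent call cannot overspend the remaining 400
escrow.settle(reserved=600, used=250)
print(escrow.available)      # 750: 1000 minus the 250 actually spent
```

Holding the worst case up front is what keeps concurrent callers from jointly overspending; the refund on settle keeps the budget from being wasted on estimates.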

Two backends

Backend        | Path                                             | Decision computed in              | Use it when
ServiceBackend | gRPC → throttlekit-server                        | the service (= the core)          | you want the full surface (rate · cost · concurrency · unified) and never to touch the raw wire
RedisBackend   | vendored Lua → the same Redis a Node fleet uses  | Lua-in-Redis (the core's script)  | you already run Redis and want one hop, no extra service (check only)
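Because the client only transports decisions, caller code can depend on the shape of a check call rather than on a concrete backend. The sketch below assumes both backends expose check(policy, key) returning an object with allowed and retry_after_ms, as the ServiceBackend example on this page does; CheckBackend, FakeBackend and guard are hypothetical names, and Decision is a local stand-in for the library's type so the block runs on its own.

```python
from dataclasses import dataclass
from typing import Iterable, List, Protocol


@dataclass(frozen=True)
class Decision:
    """Local stand-in for throttlekit's Decision (only the fields used here)."""
    allowed: bool
    retry_after_ms: int = 0


class CheckBackend(Protocol):
    """Hypothetical structural type: anything with check(policy, key) -> Decision."""
    def check(self, policy: str, key: str) -> Decision: ...


class FakeBackend:
    """Test double that replays scripted decisions instead of computing any."""
    def __init__(self, decisions: Iterable[Decision]) -> None:
        self._decisions: List[Decision] = list(decisions)

    def check(self, policy: str, key: str) -> Decision:
        return self._decisions.pop(0)


def guard(rl: CheckBackend, key: str) -> str:
    # Caller code sees only check(); a real backend fits if its check has this shape (assumed).
    d = rl.check("api", key)
    return "ok" if d.allowed else f"429 retry in {d.retry_after_ms} ms"


fake = FakeBackend([Decision(True), Decision(False, retry_after_ms=250)])
print(guard(fake, "client-123"))  # ok
print(guard(fake, "client-123"))  # 429 retry in 250 ms
```

A scripted fake is a faithful unit-test double here precisely because the real client never derives a decision itself. Code that also needs debit or admit has to stay on ServiceBackend, since RedisBackend is check only.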

Install

pip install throttlekit-py            # the gRPC ServiceBackend
pip install "throttlekit-py[redis]"   # + a redis client for the direct RedisBackend

30 seconds

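from throttlekit import ServiceBackend

api_key = "client-123"  # the caller being limited, e.g. taken from a request header

with ServiceBackend("localhost:50051") as rl:  # throttlekit-server's gRPC address
    d = rl.check("api", api_key)                # "api" is the policy name, api_key the limited key
    if not d.allowed:
        ...  # respond 429; the caller may retry after d.retry_after_ms milliseconds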

A denial is a normal Decision (allowed is False), never an exception; gRPC faults map to PolicyNotFoundError / OperationNotSupportedError / ServiceUnavailableError.
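The split between denials (ordinary values) and faults (exceptions) suggests a handler shape like the one below. It is a sketch: Decision and the three exception classes are local stand-ins named after the ones on this page so the block runs on its own (in real code they would come from throttlekit, at whatever path the package exports them), and to_http, the fail_open switch and the status code chosen for each fault are illustrative assumptions, not library behavior.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Decision:
    """Local stand-in for throttlekit's Decision (only the fields used here)."""
    allowed: bool
    retry_after_ms: int = 0


class PolicyNotFoundError(Exception): ...
class OperationNotSupportedError(Exception): ...
class ServiceUnavailableError(Exception): ...


def to_http(check: Callable[[], Decision], fail_open: bool = False) -> Tuple[int, Dict[str, str]]:
    """Map one limiter call onto an HTTP status and headers."""
    try:
        d = check()
    except ServiceUnavailableError:
        # Limiter unreachable: an explicit availability policy, not a denial.
        return (200, {}) if fail_open else (503, {})
    except (PolicyNotFoundError, OperationNotSupportedError):
        # Misconfiguration: surface it loudly rather than guessing.
        return (500, {})
    if d.allowed:
        return (200, {})
    # A denial is a normal return value; Retry-After is expressed in whole seconds.
    seconds = max(1, math.ceil(d.retry_after_ms / 1000))
    return (429, {"Retry-After": str(seconds)})


print(to_http(lambda: Decision(False, retry_after_ms=1500)))  # (429, {'Retry-After': '2'})
```

Whether an unreachable limiter should fail open or closed is a per-service decision; the sketch makes it an explicit argument instead of burying it in an except clause.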

Guides

Page                       | What's in it
Getting Started            | Install, the two backends, the Decision object, errors
The axes                   | Every axis from Python: check (rate), debit (cost), leased two-tier, admit (concurrency / unified)
Fleet & Monitor clients    | Tier-2 fleet leasing (FleetBackend / LeasedLimiter) + reading the server's live state (MonitorBackend), new in 0.5.0
Conformance & development  | How it stays in lock-step with the core, and how to develop / contribute

Status

Experimental (alpha). The contract (throttlekit.proto, the golden vectors, the extracted Lua) is vendored and checksum-pinned from the throttlekit core's frozen public API; this client tracks it. The .proto evolves additively only: 0.5.0 adds the read-only Monitor and Fleet services alongside the unchanged decision RPCs. The raw Lua wire, however, ships with frozen: false, so the RedisBackend is explicitly experimental and may change with the core's scripts.
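Checksum pinning, as described above, amounts to refusing vendored contract files whose bytes no longer match a recorded digest. A minimal sketch, assuming a hypothetical pin manifest that maps relative paths to SHA-256 hex digests; the file name and helpers are illustrative, and this is not the project's actual conformance tooling.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Hex SHA-256 of a file's exact bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def drifted(root: Path, pins: Dict[str, str]) -> List[str]:
    """Relative paths under root that are missing or no longer match their pinned digest."""
    bad = []
    for rel, expected in sorted(pins.items()):
        path = root / rel
        if not path.is_file() or sha256_of(path) != expected:
            bad.append(rel)
    return bad


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    proto = root / "throttlekit.proto"          # illustrative vendored file
    proto.write_text('syntax = "proto3";\n')
    pins = {"throttlekit.proto": sha256_of(proto)}
    print(drifted(root, pins))                  # []: still matches its pin
    proto.write_text('syntax = "proto3";\n// local edit\n')
    print(drifted(root, pins))                  # ['throttlekit.proto']: refuse to build
```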

See also