Polyglot and Python
ThrottleKit is a Node library, but you don't have to be on Node to use it. A layered-hybrid design lets a Python (or any-language) service reach the same limiter — the same decisions, bit-for-bit — without re-implementing a single algorithm.
The load-bearing rule: exactly one thing computes a Decision — the Node core, directly or as Lua-in-Redis. Every other surface is a thin pipe, conformance-checked against one set of language-neutral golden vectors. There is no second rate limiter to keep in sync and no float-determinism risk.
| Door | Package | Decision computed in | Reach |
|---|---|---|---|
| Service | throttlekit-server (a gRPC service) | the service (= the core) | the full surface: rate, cost, concurrency, unified |
| Direct | the client runs the core's vendored Lua against the same Redis your Node fleet uses | Lua-in-Redis (the core's own script) | check only: one hop, no extra service |
Both prove themselves against the golden vectors; neither re-derives a decision.
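The conformance idea can be sketched in a few lines of Python. The vector layout, the field names (`remaining`, `retry_after_ms`), the made-up values, and the `ReplayPipe` / `LossyPipe` stand-ins are all assumptions for illustration, not the actual golden-vector format. The point is only that a surface is checked by replaying recorded inputs and comparing every reply field against the recorded decision, never by recomputing it.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Decision:
    allowed: bool
    remaining: int
    retry_after_ms: int


# Hypothetical golden vectors: recorded inputs plus the decision the core produced.
VECTORS = [
    {"policy": "api", "key": "k1", "now_ms": 0,
     "expect": {"allowed": True, "remaining": 1, "retry_after_ms": 0}},
    {"policy": "api", "key": "k1", "now_ms": 10,
     "expect": {"allowed": True, "remaining": 0, "retry_after_ms": 0}},
    {"policy": "api", "key": "k1", "now_ms": 20,
     "expect": {"allowed": False, "remaining": 0, "retry_after_ms": 580}},
]


class ReplayPipe:
    """Stand-in for a thin pipe: hands back the core's recorded reply unchanged."""

    def __init__(self, recorded):
        self._replies = {(v["policy"], v["key"], v["now_ms"]): v["expect"]
                         for v in recorded}

    def check(self, policy, key, now_ms):
        return Decision(**self._replies[(policy, key, now_ms)])


class LossyPipe(ReplayPipe):
    """A buggy pipe that rounds retry_after_ms down to whole seconds."""

    def check(self, policy, key, now_ms):
        d = super().check(policy, key, now_ms)
        return Decision(d.allowed, d.remaining, (d.retry_after_ms // 1000) * 1000)


def conformance_failures(backend, vectors):
    """Replay every vector and collect (index, field, expected, got) mismatches."""
    failures = []
    for i, v in enumerate(vectors):
        got = asdict(backend.check(v["policy"], v["key"], v["now_ms"]))
        for field, want in v["expect"].items():
            if got.get(field) != want:
                failures.append((i, field, want, got.get(field)))
    return failures
```

A faithful pipe yields an empty failure list; the lossy one is caught on the third vector because a single reply field differs, which is the kind of drift a field-by-field comparison is meant to surface.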
A small standalone service (depends on the published throttlekit + @grpc/grpc-js). It loads a .throttlekit.yaml, runs the core, and answers the throttlekit.proto contract (throttlekit.v1.RateLimiter). A denial is a normal decision (allowed: false), never an RPC error — operational faults map to gRPC status codes (NOT_FOUND / UNIMPLEMENTED / UNAVAILABLE).
    npx throttlekit-server --config .throttlekit.yaml --port 50051
    # pick a shared store for a coordinated fleet (omit for a single-instance in-process memory store):
    #   --redis redis://…                              Redis
    #   --postgres-url postgres://user:pass@host/db    Postgres (no Redis needed)
    #   --store dynamodb --dynamodb-table tk           DynamoDB (+ --dynamodb-create-table to provision)
    #   --tls-cert/--tls-key/--tls-ca                  for mTLS

The server is store-agnostic behind a pluggable resolver: --store memory|redis|postgres|dynamodb (inferred from the URL flags when omitted). The client sends the same requests regardless of backend; the core computes every decision server-side, so they stay bit-identical. (Deno KV and Cloudflare D1 / Durable Objects / Workers KV are edge-runtime stores, reachable only inside those runtimes and not through the Node service door.)
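Because a denial is a normal decision and only operational faults surface as gRPC status codes, a caller can handle the two paths separately. The sketch below assumes a fault reaches the caller as an exception carrying the status name; `OperationalFault` is a hypothetical stand-in (a real client would more likely see the gRPC library's own error type), and the fail-open / fail-closed switch is one possible caller-side policy, not something ThrottleKit prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    allowed: bool


class OperationalFault(Exception):
    """Hypothetical stand-in for an RPC error carrying a gRPC status name."""

    def __init__(self, status):
        super().__init__(status)
        self.status = status


def http_status(check, policy, key, fail_open=True):
    """Map one limiter call to an HTTP status code.

    Denial: a normal decision (allowed=False), answered with 429.
    UNAVAILABLE: treated here as transient, so the caller's
    fail-open / fail-closed choice applies (200 or 503).
    NOT_FOUND / UNIMPLEMENTED: treated here as a config or version
    mismatch and reported as 500 rather than retried.
    """
    try:
        decision = check(policy, key)
    except OperationalFault as fault:
        if fault.status == "UNAVAILABLE":
            return 200 if fail_open else 503
        return 500
    return 200 if decision.allowed else 429
```

Here `check` is any callable with the shape `check(policy, key)`; in practice it would wrap the client call. Keeping the denial path out of the `except` branch means a rate-limited request is never mistaken for an outage.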
    # .throttlekit.yaml: one file, every axis
    version: 1
    limiters:
      api:                 # rate (the base axis)
        { strategy: gcra, limit: 1000, period: 1m, burst: 100 }
      leased:              # two-tier leased (cut the per-request L2 round trip)
        strategy: gcra
        limit: 1000
        period: 1m
        twoTier: { mode: leased, batch: 100, windowCoupled: true }
      completions:         # cost axis: a windowed token budget (LLM gateway)
        tokenBudget: { budget: 100000, windowMs: 60000 }
      checkout:            # concurrency: at most N requests in flight
        concurrency: { maxLimit: 64 }
      unified:             # rate AND concurrency, whichever binds first
        strategy: gcra
        limit: 1000
        period: 1m
        concurrency: { maxLimit: 64 }

The proto is additive-evolvable and is the stable polyglot contract; the raw Lua wire is behavior-locked but deliberately not frozen (it can change with the core's scripts). See research/polyglot/DESIGN.md.
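As a reading aid for the example config, the sketch below classifies each limiter entry by the keys it carries and pairs it with the Python client call that reaches it. The dict mirrors the YAML as a parser would load it; `axis_of` and `CLIENT_CALL` are hypothetical helpers written for this page, and the key-based rule is inferred from the example, not taken from the server's actual resolver.

```python
# The example .throttlekit.yaml, as a YAML parser would load it.
CONFIG = {
    "version": 1,
    "limiters": {
        "api": {"strategy": "gcra", "limit": 1000, "period": "1m", "burst": 100},
        "leased": {"strategy": "gcra", "limit": 1000, "period": "1m",
                   "twoTier": {"mode": "leased", "batch": 100, "windowCoupled": True}},
        "completions": {"tokenBudget": {"budget": 100000, "windowMs": 60000}},
        "checkout": {"concurrency": {"maxLimit": 64}},
        "unified": {"strategy": "gcra", "limit": 1000, "period": "1m",
                    "concurrency": {"maxLimit": 64}},
    },
}

# Which client method reaches which axis (taken from the axis table on this page).
CLIENT_CALL = {
    "rate": "check",
    "two-tier leased": "check",
    "cost": "debit",
    "concurrency": "admit",
    "unified": "admit",
}


def axis_of(limiter):
    """Infer the axis of one limiter entry from the keys it carries (assumed rule)."""
    has_rate = "strategy" in limiter
    has_concurrency = "concurrency" in limiter
    if "tokenBudget" in limiter:
        return "cost"
    if has_rate and has_concurrency:
        return "unified"
    if has_concurrency:
        return "concurrency"
    if has_rate:
        return "two-tier leased" if "twoTier" in limiter else "rate"
    raise ValueError("limiter entry carries no recognised axis key")


def client_calls(config):
    """Return {policy name: client method} for every limiter in the config."""
    return {name: CLIENT_CALL[axis_of(entry)]
            for name, entry in config["limiters"].items()}
```

Calling `client_calls(CONFIG)` pairs `api` and `leased` with `check`, `completions` with `debit`, and `checkout` and `unified` with `admit`.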
Installed as throttlekit-py, imported as throttlekit (PyPI's throttlekit is an unrelated project):
    pip install throttlekit-py            # the gRPC ServiceBackend
    pip install "throttlekit-py[redis]"   # + a redis client for the direct RedisBackend

Every axis is reachable. A denial is always a normal Decision/Admission, never an exception.
    from throttlekit import ServiceBackend

    # Inside a request handler: api_key, tenant, user_id, n and do_work come from your application.
    with ServiceBackend("localhost:50051") as rl:
        # Rate: the base axis (also check_many / peek / forecast)
        if not rl.check("api", api_key).allowed:
            return 429

        # Cost: debit the actual tokens a stream produces (the LLM-gateway problem)
        rl.debit("completions", tenant, tokens=n)

        # Concurrency / unified: hold an in-flight slot for the duration of the work
        with rl.admit("checkout", user_id) as adm:
            if not adm.allowed:
                return 429   # adm.binding_axis names the axis that bound it
            do_work()        # released on exit (dropped=True if it raises)

| Axis | Python | Notes |
|---|---|---|
| Rate | check / check_many / peek / forecast | the base limiter; peek/forecast are service-door only |
| Two-tier leased | check (transparent) | the policy is configured as twoTier server-side; the client just calls check |
| Cost / token budget | debit(policy, key, tokens) | windowed budget; per-token debiting overshoots by 0 |
| Concurrency / unified | admit(policy, key) → Admission | holds a crash-safe lease; heartbeat=True for long holds |
Crash safety for admit: a granted admission holds a server lease; if the client crashes without releasing, the server reclaims the slot once the lease TTL lapses without a heartbeat — the same node↔coordinator contract the core uses for distributed concurrency, one layer out.
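The lease contract can be modelled in a few lines. This is a toy, in-memory illustration of the behaviour described above (a slot is held by a lease, a heartbeat extends it, and a lapsed lease is reclaimed), not the server's implementation; `LeaseTable`, its method names, and the explicit `now_ms` clock are assumptions made so the example is deterministic.

```python
class LeaseTable:
    """Toy model of crash-safe admission: slots are held by leases that expire."""

    def __init__(self, max_limit, ttl_ms):
        self.max_limit = max_limit
        self.ttl_ms = ttl_ms
        self._expiry = {}   # lease_id -> expiry time in ms
        self._next_id = 0

    def _reap(self, now_ms):
        # Reclaim every slot whose lease lapsed without a heartbeat.
        self._expiry = {lease_id: expiry
                        for lease_id, expiry in self._expiry.items()
                        if expiry > now_ms}

    def admit(self, now_ms):
        """Grant a lease id, or return None when every slot is held (a denial)."""
        self._reap(now_ms)
        if len(self._expiry) >= self.max_limit:
            return None
        self._next_id += 1
        self._expiry[self._next_id] = now_ms + self.ttl_ms
        return self._next_id

    def heartbeat(self, lease_id, now_ms):
        """Extend a live lease; return False if it was already reclaimed."""
        self._reap(now_ms)
        if lease_id not in self._expiry:
            return False
        self._expiry[lease_id] = now_ms + self.ttl_ms
        return True

    def release(self, lease_id):
        """Free the slot immediately (the normal, non-crash exit path)."""
        self._expiry.pop(lease_id, None)


table = LeaseTable(max_limit=1, ttl_ms=1000)
first = table.admit(now_ms=0)              # granted; lease runs until t=1000
blocked = table.admit(now_ms=500)          # None: the only slot is held
alive = table.heartbeat(first, now_ms=900) # True; lease now runs until t=1900
# The client "crashes" here: no release, no further heartbeats.
reclaimed = table.admit(now_ms=2000)       # granted again: the old lease lapsed
```

The last call succeeds without any explicit release, which is the crash-safety property: a slot cannot be leaked for longer than one TTL past the last heartbeat.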
The direct Redis door runs the core's vendored Lua against the same Redis a Node fleet shares, and replays the full rate-limit golden vectors through real Redis to reproduce the Node oracle bit-for-bit:
    import redis
    from throttlekit import RedisBackend, Gcra

    api = RedisBackend(redis.Redis.from_url("redis://localhost:6379"),
                       Gcra(limit=100, period_ms=60_000, burst=20), prefix="prod")
    d = api.check(api_key)  # decided server-side, in Lua; same key scheme as the Node core

throttlekit-py vendors the contract from the core with checksums (scripts/sync_contract.py): the throttlekit.proto, the golden vectors, and the runtime Lua (with the core's manifest.json). A drift-gate test fails if any vendored byte diverges from its pinned checksum, and the real proof is behavioral: the cross-language conformance suite replays every rate-limit vector through Python → vendored Lua → real Redis and asserts each reply field equals the Node oracle.
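A drift gate of this kind reduces to hashing each vendored file and comparing against a pinned digest. The sketch below is a minimal illustration under assumed names: the manifest layout (`{"sha256": {filename: digest}}`), the choice of SHA-256, and the file `gcra.lua` are hypothetical and do not describe the real manifest.json or scripts/sync_contract.py.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path):
    """Hex SHA-256 digest of a file's raw bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def drifted_files(vendor_dir, manifest_name="manifest.json"):
    """Return the vendored files whose bytes no longer match their pinned digest."""
    vendor_dir = Path(vendor_dir)
    pinned = json.loads((vendor_dir / manifest_name).read_text())["sha256"]
    return sorted(name for name, digest in pinned.items()
                  if sha256_of(vendor_dir / name) != digest)


def demo():
    """Pin two files, then hand-edit one and show that the gate reports it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "throttlekit.proto").write_text('syntax = "proto3";\n')
        (root / "gcra.lua").write_text("return 1\n")   # hypothetical script name
        pins = {p.name: sha256_of(p) for p in root.iterdir()}
        (root / "manifest.json").write_text(json.dumps({"sha256": pins}))

        clean = drifted_files(root)                     # nothing has changed yet
        (root / "gcra.lua").write_text("return 2\n")   # simulate a local edit
        return clean, drifted_files(root)
```

In a test suite the gate would simply assert that `drifted_files(...)` is empty; `demo()` shows both the clean state and the state after a single-byte change.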
The Tier-1 distributed features reach Python with no client change — they ride the server's existing decision RPCs, so a stock ServiceBackend gets them just by pointing at a fleet-configured policy: federated: / federatedFairEscrow: over check, fleetBudget: over debit, and distributedConcurrency: over admit. The coordination lives server-side; the client just makes the same call it always did. On top of that the server adds two new additive services with first-class Python clients (0.5.0): the Fleet lease door (Fleet.Reserve — a client-held window-coupled lease, the Tier-2 path) and the read-only Monitor door (GetSnapshot / Watch, the programmable observability surface — see Operations). Full picture in Scaling & the Fleet.
The polyglot surface is experimental (alpha). The .proto is the comfortable, additive-only contract (Monitor + Fleet are new additive services; the decision RPCs are unchanged); the raw Lua wire ships frozen: false, so the direct RedisBackend may change with the core's scripts. The single-instance stateful axes (cost/concurrency) still work standalone; the fleet-coordinated variants above are reachable by the same client API.
- throttlekit-server: the gRPC service (README + deploy/failure-mode tables).
- throttlekit-py: the Python client (wiki).
- Unified admission · Distributed adaptive concurrency: the axes, in the Node core.
- GALE & TALE: where the cost/concurrency axes come from.
ThrottleKit · MIT · 1.0 — API frozen under SemVer (Stability)