Polyglot and Python

Ameya Borkar edited this page Jun 10, 2026 · 4 revisions

Polyglot — one core, every language

ThrottleKit is a Node library, but you don't have to be on Node to use it. A layered-hybrid design lets a Python (or any-language) service reach the same limiter — the same decisions, bit-for-bit — without re-implementing a single algorithm.

The load-bearing rule: exactly one thing computes a Decision — the Node core, directly or as Lua-in-Redis. Every other surface is a thin pipe, conformance-checked against one set of language-neutral golden vectors. There is no second rate limiter to keep in sync and no float-determinism risk.

Two doors

| Door | Package | Decision computed in | Reach |
| --- | --- | --- | --- |
| Service | throttlekit-server (a gRPC service) | the service (= the core) | the full surface: rate, cost, concurrency, unified |
| Direct | the client runs the core's vendored Lua against the same Redis your Node fleet uses | Lua-in-Redis (the core's own script) | check only; one hop, no extra service |

Both prove themselves against the golden vectors; neither re-derives a decision.
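
As a rough illustration of what "proving against golden vectors" can mean, the sketch below replays a list of vectors through a surface and reports every vector whose reply differs from the recorded one. The vector shape (name, request, expect), the replay helper, and the stand-in counting surface are all hypothetical; ThrottleKit's real vector schema is not shown on this page.

```python
# Hypothetical vector shape: a request plus the reply fields an oracle recorded.
VECTORS = [
    {"name": "first-hit", "request": {"key": "k"}, "expect": {"allowed": True, "remaining": 1}},
    {"name": "second-hit", "request": {"key": "k"}, "expect": {"allowed": True, "remaining": 0}},
    {"name": "third-hit-denied", "request": {"key": "k"}, "expect": {"allowed": False, "remaining": 0}},
]


def replay(vectors, surface):
    """Send each vector's request to `surface`, in order; return the names that mismatch."""
    failures = []
    for vector in vectors:
        reply = surface(vector["request"])
        if reply != vector["expect"]:  # every reply field must equal the oracle's
            failures.append(vector["name"])
    return failures


def make_counting_surface(limit):
    """Stand-in surface (a plain per-key counter), used only to exercise the harness."""
    used = {}

    def surface(request):
        count = used.get(request["key"], 0)
        if count < limit:
            used[request["key"]] = count + 1
            return {"allowed": True, "remaining": limit - count - 1}
        return {"allowed": False, "remaining": 0}

    return surface
```

A surface that matches the oracle yields an empty failure list; any field-level difference names the offending vector.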

The service door — throttlekit-server

A small standalone service (depends on the published throttlekit + @grpc/grpc-js). It loads a .throttlekit.yaml, runs the core, and answers the throttlekit.proto contract (throttlekit.v1.RateLimiter). A denial is a normal decision (allowed: false), never an RPC error — operational faults map to gRPC status codes (NOT_FOUND / UNIMPLEMENTED / UNAVAILABLE).
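
The split between a denial (a normal reply) and an operational fault (an RPC error) matters to callers, because only the fault belongs in an exception handler. The sketch below shows one way a caller might separate the two. Decision, RpcFault, Status, and to_http_status are hypothetical stand-ins written with the standard library only, and the choice of 503 versus 500 is an illustrative policy, not something the service prescribes.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    """The gRPC status names mentioned above, as a stand-in enum."""
    NOT_FOUND = "NOT_FOUND"
    UNIMPLEMENTED = "UNIMPLEMENTED"
    UNAVAILABLE = "UNAVAILABLE"


@dataclass(frozen=True)
class Decision:
    allowed: bool


class RpcFault(Exception):
    """Stand-in for an RPC error that carries a status code."""

    def __init__(self, status):
        super().__init__(status.value)
        self.status = status


def to_http_status(call):
    """Run a limiter call and map its outcome to an HTTP status code."""
    try:
        decision = call()
    except RpcFault as fault:
        # Operational fault: the limiter could not answer at all.
        return 503 if fault.status is Status.UNAVAILABLE else 500
    # A denial arrives here as ordinary data, not as an exception.
    return 200 if decision.allowed else 429
```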

npx throttlekit-server --config .throttlekit.yaml --port 50051
#   pick a shared store for a coordinated fleet (omit for a single-instance in-process memory store):
#     --redis redis://…                             Redis
#     --postgres-url postgres://user:pass@host/db   Postgres (no Redis needed)
#     --store dynamodb --dynamodb-table tk          DynamoDB (+ --dynamodb-create-table to provision)
#   --tls-cert/--tls-key/--tls-ca for mTLS

The server is store-agnostic behind a pluggable resolver — --store memory|redis|postgres|dynamodb (inferred from the URL flags when omitted). The client sends the same requests regardless of backend; the core computes every decision server-side, so they stay bit-identical. (Deno KV and Cloudflare D1 / Durable Objects / Workers KV are edge-runtime stores — reachable only inside those runtimes, not through the Node service door.)
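
The inference rule can be pictured roughly as follows. This is a guess at the behavior described above, not the server's resolver: the infer_store function and the flag dictionary keys are hypothetical, and rejecting a command line that passes both URL flags without --store is an assumption (the page does not say which would win).

```python
VALID_STORES = {"memory", "redis", "postgres", "dynamodb"}


def infer_store(flags):
    """Pick a store name from parsed CLI flags (hypothetical keys: store, redis, postgres_url)."""
    explicit = flags.get("store")
    if explicit is not None:
        if explicit not in VALID_STORES:
            raise ValueError(f"unknown store: {explicit}")
        return explicit  # an explicit --store always wins
    has_redis = bool(flags.get("redis"))
    has_postgres = bool(flags.get("postgres_url"))
    if has_redis and has_postgres:
        # Assumption: ambiguity is treated as an error rather than silently choosing one.
        raise ValueError("both --redis and --postgres-url given; pass --store to disambiguate")
    if has_redis:
        return "redis"
    if has_postgres:
        return "postgres"
    return "memory"  # no shared store: single-instance, in-process
```

DynamoDB has no URL flag in the usage text above, which is why this sketch only reaches it through an explicit store value.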

# .throttlekit.yaml — one file, every axis
version: 1
limiters:
  api:                      # rate (the base axis)
    { strategy: gcra, limit: 1000, period: 1m, burst: 100 }
  leased:                   # two-tier leased (cut the per-request L2 round trip)
    strategy: gcra
    limit: 1000
    period: 1m
    twoTier: { mode: leased, batch: 100, windowCoupled: true }
  completions:              # cost axis — a windowed token budget (LLM gateway)
    tokenBudget: { budget: 100000, windowMs: 60000 }
  checkout:                 # concurrency — at most N requests in flight
    concurrency: { maxLimit: 64 }
  unified:                  # rate AND concurrency, whichever binds first
    strategy: gcra
    limit: 1000
    period: 1m
    concurrency: { maxLimit: 64 }
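
For readers new to GCRA, the api policy above (limit 1000 per 1m, burst 100) can be read through a sketch like the one below. It is an illustration only, not the core's implementation: the class name GcraSketch is hypothetical, and mapping burst to a tolerance of (burst - 1) emission intervals is an assumption about how that option is interpreted. Integer milliseconds are used so the arithmetic involves no floats.

```python
class GcraSketch:
    """Textbook GCRA over integer milliseconds (illustrative, not ThrottleKit's code)."""

    def __init__(self, limit, period_ms, burst):
        if period_ms % limit != 0:
            raise ValueError("this sketch needs period_ms divisible by limit")
        self.interval_ms = period_ms // limit                # one request "costs" this much time
        self.tolerance_ms = self.interval_ms * (burst - 1)   # assumed meaning of burst
        self.tat = {}                                        # key -> theoretical arrival time

    def check(self, key, now_ms):
        """Return True and advance the key's schedule if the request conforms."""
        tat = max(self.tat.get(key, now_ms), now_ms)
        if tat - now_ms > self.tolerance_ms:
            return False                                     # too far ahead of schedule: deny
        self.tat[key] = tat + self.interval_ms
        return True


gcra = GcraSketch(limit=1000, period_ms=60_000, burst=100)
burst_results = [gcra.check("tenant-a", now_ms=0) for _ in range(101)]
```

Under these assumptions the emission interval is 60 ms, so a key can spend 100 requests at once and then earns one more every 60 ms.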

The proto is additive-evolvable and is the stable polyglot contract; the raw Lua wire is behavior-locked but deliberately not frozen (it can change with the core's scripts). See research/polyglot/DESIGN.md.

The Python client — throttlekit-py

Installed as throttlekit-py, imported as throttlekit (PyPI's throttlekit is an unrelated project):

pip install throttlekit-py            # the gRPC ServiceBackend
pip install "throttlekit-py[redis]"   # + a redis client for the direct RedisBackend

Every axis is reachable. A denial is always a normal Decision/Admission, never an exception.

from throttlekit import ServiceBackend

def handle_request(api_key, tenant, user_id, n, do_work):
    """Illustrative handler: returns an HTTP status; do_work is your own callable."""
    with ServiceBackend("localhost:50051") as rl:
        # Rate: the base axis (also check_many / peek / forecast)
        if not rl.check("api", api_key).allowed:
            return 429

        # Cost: debit the actual tokens a stream produces (the LLM-gateway problem)
        rl.debit("completions", tenant, tokens=n)

        # Concurrency / unified: hold an in-flight slot for the duration of the work
        with rl.admit("checkout", user_id) as adm:
            if not adm.allowed:
                return 429             # adm.binding_axis names the axis that bound it
            do_work()                  # released on exit (dropped=True if it raises)
    return 200

| Axis | Python | Notes |
| --- | --- | --- |
| Rate | check / check_many / peek / forecast | the base limiter; peek/forecast are service-door only |
| Two-tier leased | check (transparent) | the policy is configured as twoTier server-side; the client just calls check |
| Cost / token budget | debit(policy, key, tokens) | windowed budget; per-token debiting overshoots by 0 |
| Concurrency / unified | admit(policy, key) → Admission | holds a crash-safe lease; heartbeat=True for long holds |
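
One plausible reading of the "overshoots by 0" note is sketched below: if a debit is recorded whenever any budget remains, spending can exceed the budget by at most tokens - 1, which is zero when debiting one token at a time. Everything here is an assumption made for illustration, including the WindowedBudget class, the fixed (rather than sliding) window, and the record-then-deny rule; the core's actual accounting is not described on this page.

```python
class WindowedBudget:
    """Fixed-window token budget (illustrative stand-in, not ThrottleKit's algorithm)."""

    def __init__(self, budget, window_ms):
        self.budget = budget
        self.window_ms = window_ms
        self.usage = {}  # key -> (window index, tokens used in that window)

    def used(self, key, now_ms):
        """Tokens recorded for `key` in the window containing now_ms."""
        window = now_ms // self.window_ms
        seen_window, used = self.usage.get(key, (window, 0))
        return used if seen_window == window else 0

    def debit(self, key, tokens, now_ms):
        """Record `tokens` if any budget remains; return whether the debit was accepted."""
        used = self.used(key, now_ms)
        if used >= self.budget:
            return False  # budget already spent in this window
        self.usage[key] = (now_ms // self.window_ms, used + tokens)
        return True
```

With a budget of 10, fifteen one-token debits accept exactly ten, while two eight-token debits are both accepted and leave the window at 16.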

Crash safety for admit: a granted admission holds a server lease; if the client crashes without releasing, the server reclaims the slot once the lease TTL lapses without a heartbeat — the same node↔coordinator contract the core uses for distributed concurrency, one layer out.
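
The reclaim rule can be modeled in a few lines. The sketch below is a toy, in-memory version of a TTL lease table; the LeaseTable class, its method names, and the explicit now_ms arguments are hypothetical and only illustrate the idea that a slot whose lease lapses without a heartbeat becomes available again.

```python
class LeaseTable:
    """Toy model of TTL-based slot reclamation (not the server's implementation)."""

    def __init__(self, max_limit, ttl_ms):
        self.max_limit = max_limit
        self.ttl_ms = ttl_ms
        self.leases = {}  # lease id -> expiry time in ms
        self._next_id = 0

    def _reap(self, now_ms):
        # Drop every lease whose TTL has lapsed; this is the crash-safety step.
        self.leases = {lid: exp for lid, exp in self.leases.items() if exp > now_ms}

    def admit(self, now_ms):
        """Return a lease id if a slot is free, else None."""
        self._reap(now_ms)
        if len(self.leases) >= self.max_limit:
            return None
        self._next_id += 1
        self.leases[self._next_id] = now_ms + self.ttl_ms
        return self._next_id

    def heartbeat(self, lease_id, now_ms):
        """Extend a live lease; return False if it has already been reclaimed."""
        self._reap(now_ms)
        if lease_id not in self.leases:
            return False
        self.leases[lease_id] = now_ms + self.ttl_ms
        return True

    def release(self, lease_id):
        """Normal exit path: free the slot immediately."""
        self.leases.pop(lease_id, None)
```

A client that keeps sending heartbeats holds its slot indefinitely; one that crashes simply stops, and the slot frees itself one TTL after the last heartbeat.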

The direct Redis door runs the core's vendored Lua against the same Redis a Node fleet shares, and replays the full rate-limit golden vectors through real Redis to reproduce the Node oracle bit-for-bit:

import redis
from throttlekit import RedisBackend, Gcra

api = RedisBackend(redis.Redis.from_url("redis://localhost:6379"),
                   Gcra(limit=100, period_ms=60_000, burst=20), prefix="prod")
d = api.check(api_key)   # decided server-side, in Lua; same key scheme as the Node core

How it stays in lock-step

throttlekit-py vendors the contract from the core with checksums (scripts/sync_contract.py): the throttlekit.proto, the golden vectors, and the runtime Lua (with the core's manifest.json). A drift-gate test fails if any vendored byte diverges from its pinned checksum, and the real proof is behavioral — the cross-language conformance suite replays every rate-limit vector through Python → vendored Lua → real Redis and asserts each reply field equals the Node oracle.
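
A drift gate of this kind needs very little machinery. The sketch below is not scripts/sync_contract.py; it assumes a hypothetical manifest shaped as a mapping from relative path to SHA-256 hex digest, and the find_drift helper is named here for illustration only.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Hex SHA-256 of a file's exact bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def find_drift(root, manifest):
    """Return the sorted relative paths that are missing or no longer match their pinned digest."""
    root = Path(root)
    drifted = []
    for relative_path, pinned_digest in manifest.items():
        candidate = root / relative_path
        if not candidate.is_file() or sha256_of(candidate) != pinned_digest:
            drifted.append(relative_path)
    return sorted(drifted)
```

A test would assert that find_drift returns an empty list, so a single changed byte in any vendored file fails the build before the behavioral suite even runs.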

Fleet-coordinated features — over the existing RPCs

The Tier-1 distributed features reach Python with no client change. They ride the server's existing decision RPCs, so a stock ServiceBackend gets them just by pointing at a fleet-configured policy: federated: / federatedFairEscrow: over check, fleetBudget: over debit, and distributedConcurrency: over admit. The coordination lives server-side; the client makes the same call it always did.

On top of that, the server adds two new additive services with first-class Python clients (0.5.0): the Fleet lease door (Fleet.Reserve, a client-held window-coupled lease, the Tier-2 path) and the read-only Monitor door (GetSnapshot / Watch, the programmable observability surface; see Operations). The full picture is in Scaling & the Fleet.

Status

The polyglot surface is experimental (alpha). The .proto is the stable, additive-only contract (Monitor + Fleet are new additive services; the decision RPCs are unchanged). The raw Lua wire ships frozen: false, so the direct RedisBackend may change with the core's scripts. The single-instance stateful axes (cost/concurrency) still work standalone; the fleet-coordinated variants above are reachable through the same client API.
