Scaling and the Fleet
ThrottleKit scales from a single in-process check to a globally-coordinated fleet on one design: one oracle, four doors, two coordination tiers. The same configuration that limits one process limits a thousand — and the distributed behaviour is something you can verify rather than hope for.
The one invariant: one oracle. Exactly one thing ever computes a Decision or sizes a grant: the Node core, directly or as Lua-in-Redis. Every other surface (Python, edge, a leased client) is a thin pipe. Scaling never adds a second rate limiter to keep in sync, so a fleet can't silently drift.
| Tier | What it is | Wire change | Reach from any client |
|---|---|---|---|
| Tier 1: shared-store coordination | Server instances coordinate through a shared store via the core's coordinators. Configure a policy distributed; every client gets globally-coordinated decisions over the existing RPCs. | None | Check / Debit / Admit, unchanged |
| Tier 2: client-held lease | A high-throughput client leases a chunk of the global budget and enforces locally, round-tripping only to refresh. | Additive Fleet service | New Fleet.Reserve door + a local-spend helper |
Tier 1 is the default and the answer to "make it fleet-wide without touching my client." Tier 2 is the scale ceiling for a client that can't afford a round trip per request.
Configure a policy as one of four distributed blocks and point every server instance at one shared store (--redis / --postgres / …). The instances coordinate through the store; the client calls the same RPC it always did and gets a fleet-coordinated decision. This is how the distributed features reach Python and every other language with no client change.
| Feature | Config block | Served over | What it holds across the fleet |
|---|---|---|---|
| Cross-region federation | federated: | Check | One global per-window budget across regions (the core's federate()); Δ = 0 independent of region count |
| Fleet token budget | fleetBudget: | Debit | One LLM/cost budget across every instance (the cost axis, fleet-wide) |
| Distributed concurrency | distributedConcurrency: | Admit | One in-flight ceiling across every instance, not N × per-instance |
| Cross-region fair escrow | federatedFairEscrow: | Check | One weighted-fair budget L split across tenants, with fleet total ≤ L across regions |

    version: 1
    limiters:
      global-api:                # ONE global rate limit across regions, served by plain Check
        federated: { batch: 16 }
        strategy: fixedWindow    # must be window-coupled (fixedWindow / slidingWindow / fixed-cadence quota)
        limit: 10000
        period: 1m
      completions:               # ONE token budget across the fleet, served by plain Debit
        fleetBudget: { budget: 1000000, windowMs: 60000 }
      checkout:                  # ONE in-flight ceiling across the fleet, served by plain Admit
        distributedConcurrency: { minLimit: 4, maxLimit: 200, aggregate: median }

Each is covered in depth on its own page: Federation, Overload, fairness & DDoS (fleet token budget), Distributed adaptive concurrency, and Pillar 4 — Weighted Fair Escrow (whose RedisRegionFairPool makes fair-escrow correct across separate region processes). Run the server with throttlekit-server; a memory/dynamodb store that can't coordinate fails fast at load, and unsupported ops on a distributed policy raise UNIMPLEMENTED rather than return a meaningless answer.
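To make the Tier 1 idea concrete, here is a minimal Python sketch of several server instances sharing one fixed-window counter. It is not ThrottleKit's code: `SharedWindowStore` and `ServerInstance` are hypothetical names, the in-memory store stands in for Redis or Postgres, and the caller passes `now_ms` only to keep the example deterministic.

```python
import threading


class SharedWindowStore:
    """Hypothetical in-memory stand-in for a shared store such as Redis."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._counts: dict[tuple[str, int], int] = {}

    def try_consume(self, key: str, window: int, limit: int, cost: int = 1) -> bool:
        # Atomic check-and-increment: admit only if the window total stays <= limit.
        with self._lock:
            used = self._counts.get((key, window), 0)
            if used + cost > limit:
                return False
            self._counts[(key, window)] = used + cost
            return True


class ServerInstance:
    """One server process. It holds no counter of its own (assumed design)."""

    def __init__(self, store: SharedWindowStore, limit: int, period_ms: int) -> None:
        self.store = store
        self.limit = limit
        self.period_ms = period_ms

    def check(self, key: str, now_ms: int) -> bool:
        # Every instance maps the timestamp to the same window index,
        # so all of them debit the same counter in the shared store.
        window = now_ms // self.period_ms
        return self.store.try_consume(key, window, self.limit)


store = SharedWindowStore()
instances = [ServerInstance(store, limit=5, period_ms=60_000) for _ in range(3)]

# Twelve requests spread round-robin over three instances, all in one window.
admitted = sum(instances[i % 3].check("acme", now_ms=1_000) for i in range(12))
print(admitted)  # 5
```

The fleet admits 5 requests in total, not 5 per instance, because the only counter lives in the store. The client-facing call (`check`) looks the same whether one instance or many are running.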
A per-request Check/Debit round trip is the bottleneck for a very high-throughput client. The Fleet door hands such a client a chunk of a federated: policy's global per-window budget to spend locally, so it round-trips only to refresh — not once per request:
Reserve { policy: "global-api", caller: { domain: "acme" }, wants: 200 }
→ Lease { capacity: 200, expiry_ms, refresh_interval_ms, safe_capacity, retry_after_ms, limit }
The server is the one oracle. It computes the grant size via the policy's federation coordinator — a partial grant (capacity < wants) is legitimate, and the grant is window-coupled, discarded at expiry_ms. The client only spends it, with the core LeaseSpender (throttlekit/twotier) — a verbatim port of the leased-L1 spend, proven byte-for-byte against the shipped twoTier(leased, windowCoupled) path and pinned by a golden lease vector suite every polyglot port replays. The client never invents a denial; when capacity is 0 it surfaces the server's verdict. Local spend is ≈ 10 ns/op, so the lease effectively removes the network from the hot path.
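The client-side half of this contract can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the shipped LeaseSpender: `Lease`, `LocalSpender`, and the three string outcomes are hypothetical, and only the `capacity`, `expiry_ms`, and `retry_after_ms` field names come from the Lease message above.

```python
from dataclasses import dataclass


@dataclass
class Lease:
    capacity: int            # tokens the server granted (may be less than requested)
    expiry_ms: int           # window boundary; leftover credit is discarded here
    retry_after_ms: int = 0  # server's hint, meaningful when capacity is 0


class LocalSpender:
    """Hypothetical local spender: it only spends what the server granted."""

    def __init__(self, lease: Lease) -> None:
        self.lease = lease
        self.remaining = lease.capacity

    def try_spend(self, now_ms: int, cost: int = 1) -> str:
        if now_ms >= self.lease.expiry_ms:
            self.remaining = 0   # window-coupled: never carry credit past expiry
            return "refresh"     # ask the server for a new lease
        if self.lease.capacity == 0:
            return "deny"        # the server's verdict, passed through unchanged
        if self.remaining >= cost:
            self.remaining -= cost
            return "allow"       # no network round trip on this path
        return "refresh"         # lease exhausted: the server decides what comes next


spender = LocalSpender(Lease(capacity=2, expiry_ms=60_000))
results = [spender.try_spend(now_ms=t) for t in (10, 20, 30)]
print(results)  # ['allow', 'allow', 'refresh']
```

Note that an exhausted lease yields "refresh" rather than "deny": in this sketch a denial only ever originates from a zero-capacity grant, which mirrors the rule that the client never invents one.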
The door is served automatically whenever a federated: policy is configured, on the same gRPC port, and is loopback-only by default (handing out budget is a poisoning vector) — set --fleet-secret (or THROTTLEKIT_FLEET_SECRET), paired with TLS, to use it from a remote peer. v1 leases the rate axis (Reserve returns UNIMPLEMENTED for concurrency, NOT_FOUND for a non-leasable policy). From Python, FleetBackend / LeasedLimiter wrap this — see Fleet & Monitor clients.
A leased budget is only safe if the client discards leftover credits at exactly the global window boundary; a node whose clock is skewed could discard early (a fast clock, wasting budget) or late (a slow clock, over-admitting). ThrottleKit closes this with an optional coordinator method, leaseWindowed(key, tokens) → { granted, expiresAt }, which returns the authoritative store-clock boundary atomically with the grant: the Redis TIME-derived (or Postgres clock_timestamp()-derived) window end, never a node-clock value. The Tier-2 client treats expiry_ms as authoritative and never extends it. The method is additive and optional: callers feature-detect it and fall back to a node-clock window, and the existing lease() is unchanged, so a coordinator that predates it still works, just without the skew-proof boundary.
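The feature-detection pattern described above might look like the following Python sketch. Everything here is hypothetical: the coordinator classes, the snake_case `lease_windowed` spelling, and the assumption that the legacy `lease()` returns a granted token count are illustrative choices, not ThrottleKit's actual interfaces.

```python
class LegacyCoordinator:
    """Hypothetical coordinator that predates the windowed method."""

    def lease(self, key: str, tokens: int) -> int:
        return tokens  # assumed to return the granted token count


class WindowedCoordinator(LegacyCoordinator):
    """Hypothetical coordinator that also reports the store-clock boundary."""

    def __init__(self, store_now_ms, window_ms: int) -> None:
        self.store_now_ms = store_now_ms  # callable standing in for the store clock
        self.window_ms = window_ms

    def lease_windowed(self, key: str, tokens: int) -> dict:
        now = self.store_now_ms()
        # End of the current window, derived from the store clock only.
        expires_at = (now // self.window_ms + 1) * self.window_ms
        return {"granted": tokens, "expiresAt": expires_at}


def acquire(coordinator, key: str, tokens: int, window_ms: int, node_now_ms: int):
    """Prefer the store-clock boundary; fall back to the node clock."""
    windowed = getattr(coordinator, "lease_windowed", None)
    if callable(windowed):
        grant = windowed(key, tokens)
        return grant["granted"], grant["expiresAt"], "store-clock"
    granted = coordinator.lease(key, tokens)
    expires_at = (node_now_ms // window_ms + 1) * window_ms
    return granted, expires_at, "node-clock"


# Store clock reads 59_900 ms; this node's clock runs 300 ms fast.
new = acquire(WindowedCoordinator(lambda: 59_900, 60_000), "acme", 200, 60_000, node_now_ms=60_200)
old = acquire(LegacyCoordinator(), "acme", 200, 60_000, node_now_ms=60_200)
print(new)  # (200, 60000, 'store-clock')
print(old)  # (200, 120000, 'node-clock')
```

With the skewed node clock, the fallback path places the boundary a full window later than the store does. That disagreement is the failure mode the store-clock boundary is meant to remove.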
The fleet's live operational state is readable from any language over the read-only Monitor door (throttlekit.v1.Monitor: GetSnapshot + Watch), with a Prometheus /metrics endpoint and standard gRPC health. It's the same state ThrottleKit Lens renders in the terminal — see that page for the full board and the auth posture.
- Tier-1 fleetBudget key-semantics: the wire key selects which budget (a per-policy key→store-key mapping). Two clients coordinate iff they resolve the same store key, which same-config instances do automatically.
- Distributed CheckMany fans out to N coordinator round-trips (not the single consistent instant a local batch gives); distributed batch size is capped.
- Peek/Forecast are UNIMPLEMENTED under federation/leasing (those limiters are async, window-only).
- Federated fair-escrow is correct across N region instances only with the store-backed RedisRegionFairPool (--redis); a single-instance fairEscrow: is the right tool for one process.
- Tier-2 lease decisions are made client-side, so the server's capture/Replay sees the lease grants, not each local spend; Tier-1 decisions remain fully observable.
- Distributed & provable — the stores, two-tier leasing, and the formally-verified overshoot bound the lease path inherits.
- Polyglot & Python — reaching all of the above from another language through the service door.
- Monitoring — ThrottleKit Lens — the dashboard and the Monitor door.
ThrottleKit · MIT · 1.0 — API frozen under SemVer (Stability)