Scaling and the Fleet

ThrottleKit scales from a single in-process check to a globally coordinated fleet on one design: one oracle, four doors, two coordination tiers. The same configuration that limits one process limits a thousand, and the distributed behaviour is something you can verify rather than hope for.

The one invariant — one oracle. Exactly one thing ever computes a Decision or sizes a grant: the Node core, directly or as Lua-in-Redis. Every other surface (Python, edge, a leased client) is a thin pipe. Scaling never adds a second rate limiter to keep in sync — so a fleet can't silently drift.

Two coordination tiers

| Tier | What it is | Wire change | Reach from any client |
| --- | --- | --- | --- |
| Tier 1: shared-store coordination | Server instances coordinate through a shared store via the core's coordinators. Configure a policy as distributed; every client gets globally coordinated decisions over the existing RPCs. | None | `Check` / `Debit` / `Admit`, unchanged |
| Tier 2: client-held lease | A high-throughput client leases a chunk of the global budget and enforces locally, round-tripping only to refresh. | Additive `Fleet` service | New `Fleet.Reserve` door plus a local-spend helper |

Tier 1 is the default and the answer to "make it fleet-wide without touching my client." Tier 2 is the scale ceiling for a client that can't afford a round trip per request.

Tier 1 — distributed over the existing RPCs (zero client change)

Configure a policy as one of four distributed blocks and point every server instance at one shared store (--redis / --postgres / …). The instances coordinate through the store; the client calls the same RPC it always did and gets a fleet-coordinated decision. This is how the distributed features reach Python and every other language with no client change.

| Feature | Config block | Served over | What it holds across the fleet |
| --- | --- | --- | --- |
| Cross-region federation | `federated:` | `Check` | One global per-window budget across regions (the core's `federate()`); `Δ = 0` independent of region count |
| Fleet token budget | `fleetBudget:` | `Debit` | One LLM/cost budget across every instance (the cost axis, fleet-wide) |
| Distributed concurrency | `distributedConcurrency:` | `Admit` | One in-flight ceiling across every instance, not `N ×` per-instance |
| Cross-region fair escrow | `federatedFairEscrow:` | `Check` | One weighted-fair budget `L` split across tenants, with fleet total `≤ L` across regions |

version: 1
limiters:
  global-api:                 # ONE global rate limit across regions — served by plain Check
    federated: { batch: 16 }
    strategy: fixedWindow      # must be window-coupled (fixedWindow / slidingWindow / fixed-cadence quota)
    limit: 10000
    period: 1m
  completions:                # ONE token budget across the fleet — served by plain Debit
    fleetBudget: { budget: 1000000, windowMs: 60000 }
  checkout:                   # ONE in-flight ceiling across the fleet — served by plain Admit
    distributedConcurrency: { minLimit: 4, maxLimit: 200, aggregate: median }

Each is covered in depth on its own page — Federation, Overload, fairness & DDoS (fleet token budget), Distributed adaptive concurrency, and Pillar 4 — Weighted Fair Escrow (whose RedisRegionFairPool makes fair-escrow correct across separate region processes). Run the server with throttlekit-server; a memory/dynamodb store that can't coordinate fails fast at load, and unsupported ops on a distributed policy raise UNIMPLEMENTED rather than return a meaningless answer.
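
To make the Tier 1 idea concrete, here is a minimal sketch of several server instances sharing one fixed-window counter through a common store. It is an illustration under stated assumptions, not ThrottleKit's implementation: `SharedStore`, `ServerInstance`, and `incr_if_below` are hypothetical names, and the in-memory dictionary stands in for the atomic step a real store would provide (for example a Lua script in Redis).

```python
from dataclasses import dataclass, field


@dataclass
class SharedStore:
    """Hypothetical stand-in for the shared store; not ThrottleKit's API."""
    counters: dict = field(default_factory=dict)

    def incr_if_below(self, key: str, limit: int) -> bool:
        # A real store must do this read-modify-write atomically;
        # this sketch is single-threaded, so a plain dict is enough.
        used = self.counters.get(key, 0)
        if used >= limit:
            return False
        self.counters[key] = used + 1
        return True


@dataclass
class ServerInstance:
    """One server process: it keeps no counter of its own, only a store handle."""
    store: SharedStore
    limit: int
    period_ms: int

    def check(self, key: str, now_ms: int) -> bool:
        window = now_ms // self.period_ms  # fixed-window index
        return self.store.incr_if_below(f"{key}:{window}", self.limit)


store = SharedStore()
fleet = [ServerInstance(store, limit=10, period_ms=60_000) for _ in range(3)]

# 30 requests in one window, spread round-robin over three instances.
admitted = sum(fleet[i % 3].check("acme", now_ms=1_000) for i in range(30))

# The first request of the next window is admitted again.
next_window_ok = fleet[0].check("acme", now_ms=61_000)

print(admitted, next_window_ok)
```

Because every instance consults the same counter, the fleet admits 10 requests in the window rather than 10 per instance, which is the behaviour the table above describes for a shared budget.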

Tier 2 — the client-held lease (`Fleet.Reserve`)

A per-request Check/Debit round trip is the bottleneck for a very high-throughput client. The Fleet door hands such a client a chunk of a federated: policy's global per-window budget to spend locally, so it round-trips only to refresh — not once per request:

Reserve { policy: "global-api", caller: { domain: "acme" }, wants: 200 }
  → Lease { capacity: 200, expiry_ms, refresh_interval_ms, safe_capacity, retry_after_ms, limit }

The server is the one oracle. It computes the grant size via the policy's federation coordinator — a partial grant (capacity < wants) is legitimate, and the grant is window-coupled, discarded at expiry_ms. The client only spends it, with the core LeaseSpender (throttlekit/twotier) — a verbatim port of the leased-L1 spend, proven byte-for-byte against the shipped twoTier(leased, windowCoupled) path and pinned by a golden lease vector suite every polyglot port replays. The client never invents a denial; when capacity is 0 it surfaces the server's verdict. Local spend is ≈ 10 ns/op, so the lease effectively removes the network from the hot path.
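
The local-spend side can be pictured with a short sketch. It is not the real `LeaseSpender`: `LocalSpender` and its methods are hypothetical, the `Lease` dataclass copies only a few fields from the reply shown above, and a `False` result here means "go back to the server", since the client does not issue denials of its own.

```python
from dataclasses import dataclass


@dataclass
class Lease:
    """Subset of the Lease reply; field names follow the message above."""
    capacity: int
    expiry_ms: int
    retry_after_ms: int = 0


class LocalSpender:
    """Hypothetical local-spend helper, NOT ThrottleKit's LeaseSpender."""

    def __init__(self, lease: Lease):
        self.lease = lease
        self.remaining = lease.capacity

    def try_spend(self, now_ms: int, cost: int = 1) -> bool:
        """True if spent locally; False means a refresh from the server is needed."""
        if now_ms >= self.lease.expiry_ms:
            # Window-coupled: leftover credits are discarded at expiry.
            self.remaining = 0
            return False
        if cost > self.remaining:
            return False
        self.remaining -= cost
        return True


# A partial grant: the client may have asked for more than 3.
spender = LocalSpender(Lease(capacity=3, expiry_ms=60_000))
results = [spender.try_spend(now_ms=1_000) for _ in range(4)]

# A lease with credits left is still unusable once its window has ended.
expired = LocalSpender(Lease(capacity=3, expiry_ms=60_000)).try_spend(now_ms=60_000)

print(results, expired)
```

The hot path is a comparison and a subtraction, which is why a lease takes the network out of the per-request cost. The refresh call and the handling of a zero-capacity lease are deliberately left out of the sketch.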

The door is served automatically whenever a federated: policy is configured, on the same gRPC port, and is loopback-only by default (handing out budget is a poisoning vector) — set --fleet-secret (or THROTTLEKIT_FLEET_SECRET), paired with TLS, to use it from a remote peer. v1 leases the rate axis (Reserve returns UNIMPLEMENTED for concurrency, NOT_FOUND for a non-leasable policy). From Python, FleetBackend / LeasedLimiter wrap this — see Fleet & Monitor clients.

Clock skew, defended — `leaseWindowed`

A leased budget is only safe if the client discards leftover credits at exactly the global window boundary. A node whose clock runs fast discards early (wasting budget); one whose clock runs slow discards late (over-admitting). ThrottleKit closes this with an optional coordinator method, leaseWindowed(key, tokens) → { granted, expiresAt }, which returns the authoritative store-clock boundary atomically with the grant: the Redis TIME-derived (or Postgres clock_timestamp()-derived) window end, never a node-clock value. The Tier-2 client treats expiry_ms as authoritative and never extends it. The method is additive and optional: callers feature-detect it and fall back to a node-clock window, and the existing lease() is unchanged, so a coordinator that predates it still works, just without the skew-proof boundary.
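
The feature-detection fallback can be sketched as follows. Both coordinator classes and the `acquire` helper are hypothetical; only the method names `lease` and `leaseWindowed` and the `{ granted, expiresAt }` reply shape come from the text, and the return type of `lease` is an assumption.

```python
class LegacyCoordinator:
    """Hypothetical coordinator that predates leaseWindowed."""

    def lease(self, key: str, tokens: int) -> int:
        return tokens  # grants everything, for illustration only


class WindowedCoordinator(LegacyCoordinator):
    """Hypothetical coordinator that reports a store-clock window end."""

    def __init__(self, store_now_ms: int, period_ms: int):
        self.store_now_ms = store_now_ms
        self.period_ms = period_ms

    def leaseWindowed(self, key: str, tokens: int) -> dict:
        # Window end derived from the store clock, returned with the grant.
        window_end = (self.store_now_ms // self.period_ms + 1) * self.period_ms
        return {"granted": tokens, "expiresAt": window_end}


def acquire(coordinator, key, tokens, node_now_ms, period_ms):
    """Prefer the store-clock boundary; fall back to a node-clock window."""
    windowed = getattr(coordinator, "leaseWindowed", None)
    if callable(windowed):
        reply = windowed(key, tokens)
        return reply["granted"], reply["expiresAt"], "store-clock"
    granted = coordinator.lease(key, tokens)
    node_window_end = (node_now_ms // period_ms + 1) * period_ms
    return granted, node_window_end, "node-clock"


# Store clock reads 59 s; this node's clock is 5 s ahead and reads 64 s.
with_method = acquire(WindowedCoordinator(59_000, 60_000), "acme", 200, 64_000, 60_000)
without_method = acquire(LegacyCoordinator(), "acme", 200, 64_000, 60_000)

print(with_method)
print(without_method)
```

In this scenario the skewed node already places itself in the next window, so the node-clock fallback computes a boundary of 120 000 ms while the store reports 60 000 ms. The gap between the two is the error the store-clock boundary is meant to remove.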

The Monitor door — read the fleet remotely

The fleet's live operational state is readable from any language over the read-only Monitor door (throttlekit.v1.Monitor: GetSnapshot + Watch), with a Prometheus /metrics endpoint and standard gRPC health. It's the same state ThrottleKit Lens renders in the terminal — see that page for the full board and the auth posture.

Honest boundaries (the non-claims)

- Tier-1 fleetBudget key-semantics: the wire key selects which budget (a per-policy key→store-key mapping). Two clients coordinate if and only if they resolve the same store key, which same-config instances do automatically.
- Distributed CheckMany fans out to N coordinator round-trips (not the single consistent instant a local batch gives); distributed batch size is capped.
- Peek / Forecast are UNIMPLEMENTED under federation/leasing (those limiters are async, window-only).
- Federated fair-escrow is correct across N region instances only with the store-backed RedisRegionFairPool (--redis); a single-instance fairEscrow: is the right tool for one process.
- Tier-2 lease decisions are made client-side, so the server's capture/Replay sees the lease grants, not each local spend. Tier-1 decisions remain fully observable.
