Federation
federate(...) ships a cross-cluster rate limiter that pools a single
global budget across K regions, with a formally-verified, K-INDEPENDENT
overshoot bound:
admitted_per_global_window ≤ Limit (for ANY number of regions K)
The TLA⁺ spec is spec/GaleFederatedLeasing.tla; the BFS twin in
test/gale/federated/leasing-variants.test.ts re-runs it in CI without
Java. The end-to-end eval against a real Lua-backed coordinator is in
research/bigger-bets/federation/eval/RESULTS.md.
When to use this vs `twoTier(leased)`. If your fleet shares ONE regional Redis (one cluster, many processes), `twoTier(leased)` with `windowCoupled: true` already gives you the K-independent bound. Use `federate(...)` only when your processes span MULTIPLE regions and you need a single global budget across them.

    import { fixedWindow } from "throttlekit";
    import { federate, RedisCoordinator } from "throttlekit/federation";
    import { fromIoredis } from "throttlekit/redis";
    import Redis from "ioredis";

    // One global coordinator Redis (the federation's "L3").
    const coordinator = new RedisCoordinator({
      client: fromIoredis(new Redis(process.env.GLOBAL_COORDINATOR_URL!)),
      windowMs: 60_000,
      budgetPerWindow: 1000,
    });

    // One federated Limiter per region. They all share `coordinator`.
    const limiter = federate({
      strategy: fixedWindow({ limit: 1000, windowMs: 60_000 }),
      coordinator,
      region: "us-east", // your region identifier
      batch: 16, // escrow size; 16 is a sensible default
    });

    const decision = await limiter.check("user:42");

Run the full example end-to-end:

    docker compose -f research/bigger-bets/federation/eval/docker-compose.yml up -d
    npx tsx examples/federation.ts
    docker compose -f research/bigger-bets/federation/eval/docker-compose.yml down

- Each region holds a local escrow lease: `batch` units of budget drawn from the global coordinator in a single cross-region RPC.
- Most requests serve from local escrow, with no coordinator hit. The amortized coordinator cost is `1/batch` per request.
- At the global window boundary, escrow EXPIRES (`Roll` in the formal model). Un-served escrow forfeits, then reconciles back to the coordinator (idempotent on `windowStart`) for the next window. This is the window-coupling rule; it collapses the per-window overshoot to zero.
- On coordinator outage, regions fail closed: the existing escrow keeps serving until exhausted, then denies. The bound holds at `Δ = 0` even through partition.
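
The rules above can be sketched as a toy, in-memory model. This is an illustration only, not ThrottleKit's implementation: `LeaseSource`, `ToyCoordinator`, `ToyRegion` and their method shapes are hypothetical names invented for this sketch, and a real coordinator call would be an asynchronous cross-region RPC rather than a synchronous method.

```typescript
// Toy model of escrow leasing. All names here are hypothetical.

interface LeaseSource {
  // Grants up to `want` units of the budget for the window starting at
  // `windowStart`. May throw when the coordinator is unreachable.
  lease(windowStart: number, want: number): number;
}

class ToyCoordinator implements LeaseSource {
  private windowStart = -1;
  private remaining = 0;
  private readonly limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  lease(windowStart: number, want: number): number {
    if (windowStart > this.windowStart) {
      // New global window: the budget resets and nothing carries over.
      this.windowStart = windowStart;
      this.remaining = this.limit;
    }
    if (windowStart < this.windowStart) return 0; // stale window gets nothing
    const granted = Math.min(want, this.remaining);
    this.remaining -= granted;
    return granted;
  }
}

class ToyRegion {
  private escrow = 0;
  private escrowWindow = -1;
  private readonly coordinator: LeaseSource;
  private readonly windowMs: number;
  private readonly batch: number;

  constructor(coordinator: LeaseSource, windowMs: number, batch: number) {
    this.coordinator = coordinator;
    this.windowMs = windowMs;
    this.batch = batch;
  }

  check(nowMs: number): boolean {
    const windowStart = Math.floor(nowMs / this.windowMs) * this.windowMs;
    if (windowStart !== this.escrowWindow) {
      // Window coupling: escrow from an older window expires at the boundary.
      this.escrow = 0;
      this.escrowWindow = windowStart;
    }
    if (this.escrow === 0) {
      try {
        // One coordinator round trip refills up to `batch` units.
        this.escrow = this.coordinator.lease(windowStart, this.batch);
      } catch {
        return false; // fail closed: no escrow and no coordinator means deny
      }
    }
    if (this.escrow === 0) return false; // global budget exhausted
    this.escrow -= 1;
    return true;
  }
}
```

Driving three `ToyRegion` instances against one `ToyCoordinator` with a limit of 100 admits at most 100 requests per window, whether the load is spread evenly or lands entirely on one region. That is the K-independent bound in miniature.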
| Scheme | Δ (overshoot) | Pooling under skew | Failure mode |
|---|---|---|---|
| Per-region independent limiters | Unbounded | None | Each region serves its own budget |
| Static partition (L/K each) | 0 | None; hot region binds at L/K | Each region serves its own L/K |
| `federate(...)` | 0 | Full; hot region can draw the whole budget | Fail closed; bound preserved through partition |
| CRDT / gossip merge | Bounded by staleness | Full | Staleness-Δ tradeoff (research follow-up) |
At max skew (s = 1, all load on one region) the static partition admits
only L/K; federate(...) admits up to the full L. The eval rows in
RESULTS.md:
| skew | static U_capacity | federated U_capacity |
|---|---|---|
| 0.00 | 1.000 | 0.973 |
| 0.25 | 0.833 | 0.977 |
| 0.50 | 0.667 | 0.990 |
| 0.75 | 0.500 | 0.957 |
| 1.00 | 0.333 | 1.000 |
The −0.027 at uniform load is the bounded batch overhead: at most
(K−1)·(batch−1) units of un-served escrow per window. Lower `batch` to shrink
that gap at the cost of more coordinator round trips.
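
The arithmetic behind these numbers can be checked with a short sketch. Two things here are assumptions rather than facts from `RESULTS.md`: the linear skew model in `staticUtilization` (chosen because it reproduces the static column for K = 3) and the values `batch = 16` and `limit = 1000`, which are taken from the quick-start snippet and may differ from the eval configuration. The function names are hypothetical.

```typescript
// Assumed skew model: offered load equals the limit L, and the hot region
// receives a share of 1/K + s * (1 - 1/K). A static L/K partition then
// admits L/K in the hot region and everything offered elsewhere.
function staticUtilization(k: number, skew: number): number {
  return 1 - skew * (1 - 1 / k);
}

// Worst-case un-served escrow per window, from the bound (K-1)*(batch-1).
function maxStrandedEscrow(k: number, batch: number): number {
  return (k - 1) * (batch - 1);
}

// Lower bound on federated utilization implied by the stranded-escrow bound.
function federatedUtilizationFloor(k: number, batch: number, limit: number): number {
  return 1 - maxStrandedEscrow(k, batch) / limit;
}

for (const s of [0, 0.25, 0.5, 0.75, 1]) {
  console.log(s.toFixed(2), staticUtilization(3, s).toFixed(3));
}
console.log(maxStrandedEscrow(3, 16)); // 30
console.log(federatedUtilizationFloor(3, 16, 1000).toFixed(3)); // 0.970
```

Under those assumptions the floor is 0.970, so the measured 0.973 at uniform load sits inside the bound. Halving `batch` to 8 would move the floor to 0.986 while doubling the amortized coordinator cost from 1/16 to 1/8 per request.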
The bound holds end-to-end through every coordinator outage shape; the
detailed table is in `docs/FAILURE-MODES.md`. Summary:
| Outage shape | Behavior | Δ |
|---|---|---|
| Region partitioned from coordinator | Region serves existing escrow until empty, then denies (fail-closed default) | 0 |
| Coordinator crash + recovery within a window | Region serves existing escrow; on recovery, leases resume against the preserved budget | 0 |
| Coordinator unavailable across a window boundary | All regions deny during outage; on recovery, fresh window's budget acquired normally | 0 |
For soft-traffic operators who prefer availability, `onCoordinatorOutage: "regional-only"` (TK-906+) falls back to per-region limits during an outage.
This is a documented opt-in: Δ degrades to the regional limit, not the
federation limit.
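
The difference between the two outage policies can be sketched as a small decision function. Everything in it is hypothetical: only the `"regional-only"` value appears on this page, so the `"fail-closed"` string, the `OutageState` shape, and the choice to drain escrow before falling back are assumptions made for illustration, not the library's behavior.

```typescript
// Hypothetical decision logic for a region that cannot reach the coordinator.
type OutagePolicy = "fail-closed" | "regional-only";

interface OutageState {
  escrow: number; // units leased before the outage began
  fallbackAdmitted: number; // admissions made under the regional fallback
}

function admitDuringOutage(
  policy: OutagePolicy,
  state: OutageState,
  regionalLimit: number,
): boolean {
  if (state.escrow > 0) {
    state.escrow -= 1; // already-leased budget is always safe to serve
    return true;
  }
  if (policy === "fail-closed") return false; // bound preserved, Δ stays 0
  if (state.fallbackAdmitted < regionalLimit) {
    state.fallbackAdmitted += 1; // admitted outside the global budget
    return true;
  }
  return false;
}
```

In this sketch the fail-closed branch never admits beyond what the coordinator already granted, while the regional-only branch can admit up to `regionalLimit` extra requests per region, which is where the overshoot comes from.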
| Coordinator | When to use | Status |
|---|---|---|
| `TestCoordinator` (in-memory) | Tests + examples; deterministic | Shipped 0.8.3 |
| `RedisCoordinator` (single global Redis) | Production default; documented SPOF | Shipped 0.8.3 |
| `PostgresCoordinator` | When you already run Postgres (no separate Redis) | 0.8.x follow-up |
| Raft-via-etcd | HA-without-SPOF (the SPOF mitigation) | 1.0.x follow-up |
| CRDT / gossip | Multi-leader with bounded staleness | Research follow-up |
The GlobalCoordinator interface is small (3 methods); rolling a custom
backend is a couple of dozen lines of Lua / SQL / equivalent.
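
To give a feel for the size of a custom backend, here is a minimal in-memory sketch. It does NOT reproduce the real `GlobalCoordinator` interface, which is defined in DESIGN.md: this page only names `lease()` and describes a reconcile step that is idempotent on `windowStart`, so the `SketchCoordinator` interface, the `admitted` method, and all parameter lists below are guesses for illustration. The methods are synchronous to keep the example short; a real backend would most likely return promises.

```typescript
// Hypothetical three-method coordinator; not the library's interface.
interface SketchCoordinator {
  // Grant up to `want` units from the budget of the given window.
  lease(region: string, windowStart: number, want: number): number;
  // Report un-served escrow for a closed window; repeat calls are no-ops.
  reconcile(region: string, windowStart: number, unserved: number): void;
  // Units granted minus units reported as un-served for the window.
  admitted(windowStart: number): number;
}

class InMemorySketchCoordinator implements SketchCoordinator {
  private readonly limit: number;
  private readonly granted = new Map<number, number>();
  private readonly returned = new Map<number, Map<string, number>>();

  constructor(limit: number) {
    this.limit = limit;
  }

  lease(_region: string, windowStart: number, want: number): number {
    const used = this.granted.get(windowStart) ?? 0;
    // Never grant past the limit, so granted-per-window cannot exceed it.
    const grant = Math.max(0, Math.min(want, this.limit - used));
    this.granted.set(windowStart, used + grant);
    return grant;
  }

  reconcile(region: string, windowStart: number, unserved: number): void {
    const perRegion = this.returned.get(windowStart) ?? new Map<string, number>();
    // Idempotent on (region, windowStart): the first report wins.
    if (!perRegion.has(region)) perRegion.set(region, unserved);
    this.returned.set(windowStart, perRegion);
  }

  admitted(windowStart: number): number {
    let unserved = 0;
    const perRegion = this.returned.get(windowStart);
    if (perRegion) perRegion.forEach((n) => { unserved += n; });
    return (this.granted.get(windowStart) ?? 0) - unserved;
  }
}
```

A Redis or SQL version of the same idea would move the check-and-increment in `lease` into one atomic script or transaction, since the safety bound depends on that step never granting past the limit.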
A single global Redis IS a single point of failure for the federation's
safety bound. When the Redis is unreachable, every region's lease()
throws → fail-closed (default) → no new admissions across the entire
federation until the Redis returns. The mitigations:
- Sentinel / Cluster under your Redis client: the Lua scripts work unchanged.
- `PostgresCoordinator` (0.8.x follow-up): replaces the SPOF with Postgres failover semantics (synchronous replication, automatic primary promotion).
- Raft-via-etcd (1.0.x): true HA-without-SPOF.
For 0.8.3 the SPOF is documented; users in regulated environments should
opt for Sentinel or wait for PostgresCoordinator.
- `DESIGN.md`: full design, the lift argument from `windowCoupled`, the locked `GlobalCoordinator` interface, failure semantics in depth, the worked example.
- `spec/GaleFederatedLeasing.tla`: the TLA⁺ spec proving `admitted ≤ Limit` independent of `K`.
- `test/gale/federated/leasing-variants.test.ts`: the CI-runnable BFS twin matching TLC's distinct-state counts.
- `research/bigger-bets/federation/eval/RESULTS.md`: the end-to-end 3-region eval with measured Δ and U numbers.
federate(...) returns a regular Limiter. It composes naturally with
the rest of the library:
- Wrap with `withAnalytics(...)` for heavy-hitter detection.
- Tap with `tapDecisions(...)` for OTel / Prometheus integration.
- Use with adapters (`createEnforcer(...)`, the Express middleware, etc.) via the standard `Limiter` interface.
- Stack `twoTier(leased)` on TOP for an in-process L1 cache; the recursive twoTier composition is the canonical multi-process per-region setup.
ThrottleKit · MIT · 1.0 — API frozen under SemVer (Stability)