Federation

Federation — one global limit across regions, with `Δ = 0`

federate(...) ships a cross-cluster rate limiter that pools a single global budget across K regions, with a formally verified, K-INDEPENDENT overshoot bound:

admitted_per_global_window  ≤  Limit          (for ANY number of regions K)

The TLA⁺ spec is spec/GaleFederatedLeasing.tla; the BFS twin in test/gale/federated/leasing-variants.test.ts re-runs it in CI without Java. The end-to-end eval against a real Lua-backed coordinator is in research/bigger-bets/federation/eval/RESULTS.md.

When to use this vs twoTier(leased). If your fleet shares ONE regional Redis (one cluster, many processes), twoTier(leased) with windowCoupled: true already gives you the K-independent bound. Use federate(...) only when your processes span MULTIPLE regions and you need a single global budget across them.

From any client, no client change. Behind a ThrottleKit server this is a Tier-1 feature reachable over the server's existing Check RPC (federated: policy) — a stock client (including Python) gets it just by pointing at a federated policy. See Scaling & the Fleet.

Quick start

import { fixedWindow } from "throttlekit";
import { federate, RedisCoordinator } from "throttlekit/federation";
import { fromIoredis } from "throttlekit/redis";
import Redis from "ioredis";

// One global coordinator Redis (the federation's "L3").
const coordinator = new RedisCoordinator({
  client: fromIoredis(new Redis(process.env.GLOBAL_COORDINATOR_URL!)),
  windowMs: 60_000,
  budgetPerWindow: 1000,
});

// One federated Limiter per region. They all share `coordinator`.
const limiter = federate({
  strategy: fixedWindow({ limit: 1000, windowMs: 60_000 }),
  coordinator,
  region: "us-east",            // your region identifier
  batch: 16,                    // escrow size; 16 is a sensible default
});

const decision = await limiter.check("user:42");

Run the full example end-to-end:

docker compose -f research/bigger-bets/federation/eval/docker-compose.yml up -d
npx tsx examples/federation.ts
docker compose -f research/bigger-bets/federation/eval/docker-compose.yml down

What it does

Each region holds a local escrow lease — batch units of budget drawn from the global coordinator in a single cross-region RPC.
Most requests serve from local escrow — no coordinator hit. The amortized coordinator cost is 1/batch per request.
At the global window boundary, escrow EXPIRES (Roll in the formal model). Un-served escrow is forfeited, then reconciled back to the coordinator (idempotent on windowStart) for the next window. This is the window-coupling rule: it collapses the per-window overshoot to zero.
On coordinator outage, regions fail closed — the existing escrow keeps serving until exhausted, then denies. The bound holds at Δ = 0 even through partition.
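
The four behaviours above can be sketched as a toy in-memory model. This is not ThrottleKit's implementation: `makeToyCoordinator`, `ToyRegion`, and `simulate` are hypothetical names invented for illustration, the coordinator is a plain closure rather than a Redis or Postgres backend, and reconciliation is omitted. The sketch only shows why forfeiting old-window escrow keeps total admissions at or below the limit for any number of regions.

```typescript
// Toy model only: names and shapes here are illustrative, not ThrottleKit's API.
type LeaseFn = (windowStart: number, want: number) => number; // throws when unreachable

function makeToyCoordinator(limit: number) {
  let currentWindow = -1;
  let remaining = 0;
  let down = false;
  const lease: LeaseFn = (windowStart, want) => {
    if (down) throw new Error("coordinator unreachable");
    if (windowStart > currentWindow) {
      currentWindow = windowStart; // global window rolled: fresh budget
      remaining = limit;
    }
    if (windowStart < currentWindow) return 0; // stale window: grant nothing
    const granted = Math.min(want, remaining);
    remaining -= granted;
    return granted;
  };
  return { lease, setDown: (d: boolean) => { down = d; } };
}

class ToyRegion {
  private escrow = 0;
  private escrowWindow = -1;
  private readonly lease: LeaseFn;
  private readonly windowMs: number;
  private readonly batch: number;

  constructor(lease: LeaseFn, windowMs: number, batch: number) {
    this.lease = lease;
    this.windowMs = windowMs;
    this.batch = batch;
  }

  check(nowMs: number): boolean {
    const windowStart = Math.floor(nowMs / this.windowMs) * this.windowMs;
    if (windowStart !== this.escrowWindow) {
      this.escrow = 0; // window coupling: old-window escrow is forfeited
      this.escrowWindow = windowStart;
    }
    if (this.escrow === 0) {
      try {
        this.escrow = this.lease(windowStart, this.batch); // one round trip per batch
      } catch {
        return false; // fail closed: no escrow and no coordinator means deny
      }
    }
    if (this.escrow === 0) return false; // global budget exhausted
    this.escrow -= 1;
    return true;
  }
}

// K regions hammer one window; total admissions never exceed the limit.
function simulate(k: number, limit: number, batch: number, perRegion: number): number {
  const coordinator = makeToyCoordinator(limit);
  const regions = Array.from({ length: k }, () => new ToyRegion(coordinator.lease, 60_000, batch));
  let admitted = 0;
  for (let i = 0; i < perRegion; i++) {
    for (const region of regions) if (region.check(1_000)) admitted++;
  }
  return admitted;
}

console.log(simulate(3, 1000, 16, 1000), simulate(10, 1000, 16, 1000));
```

In this model both calls admit exactly 1000 requests even though total demand is 3000 and 10000 respectively, because every unit a region serves was first debited from the single shared budget.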

The contribution — pooling under skew

| Scheme | Δ (overshoot) | Pooling under skew | Failure mode |
| --- | --- | --- | --- |
| Per-region independent limiters | Unbounded | None | Each region serves its own budget |
| Static partition (`L/K` each) | 0 | None; hot region binds at L/K | Each region serves its own L/K |
| `federate(...)` | 0 | Full; hot region can draw the whole budget | Fail closed; bound preserved through partition |
| CRDT / gossip merge | Bounded by staleness | Full | Staleness-Δ tradeoff (research follow-up) |

At max skew (s = 1, all load on one region) the static partition admits only L/K; federate(...) admits up to the full L. The eval rows in RESULTS.md:

| skew | static `U_capacity` | federated `U_capacity` |
| --- | --- | --- |
| 0.00 | 1.000 | 0.973 |
| 0.25 | 0.833 | 0.977 |
| 0.50 | 0.667 | 0.990 |
| 0.75 | 0.500 | 0.957 |
| 1.00 | 0.333 | 1.000 |

The −0.027 at uniform load is the bounded batch overhead — at most (K−1)·(batch−1) un-served escrow per window. Tighten batch to shrink that gap at the cost of more coordinator round trips.
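
As a quick arithmetic check, the sketch below evaluates the stated overhead bound and reproduces the static column of the table. The inputs are assumptions: K = 3 is taken from the 3-region eval, while batch = 16 and a limit of 1000 are borrowed from the quick start and may not match the eval's actual settings. The `staticCapacity` formula is only a fit that happens to match the table rows (linear in skew, falling to 1/K at s = 1); it is not taken from RESULTS.md.

```typescript
// Worst-case un-served escrow per window, as stated in the text: (K - 1) * (batch - 1).
function maxUnservedEscrow(k: number, batch: number): number {
  return (k - 1) * (batch - 1);
}

// Assumed fit for the static-partition column: 1 at uniform load, 1/K at full skew.
function staticCapacity(skew: number, k: number): number {
  return 1 - skew * (1 - 1 / k);
}

const K = 3;        // from the 3-region eval
const batch = 16;   // assumption: quick-start default
const limit = 1000; // assumption: quick-start budget

const overhead = maxUnservedEscrow(K, batch);
console.log(`max un-served escrow: ${overhead} units (${overhead / limit} of the limit)`);

for (const s of [0, 0.25, 0.5, 0.75, 1]) {
  console.log(s.toFixed(2), staticCapacity(s, K).toFixed(3));
}
```

Under these assumptions the bound is 30 units, or 0.030 of the limit, which is consistent with the measured gap of 0.027 at uniform load.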

Failure modes

The bound holds end-to-end through every coordinator outage shape; detailed table in docs/FAILURE-MODES.md. Summary:

| Outage shape | Behavior | Δ |
| --- | --- | --- |
| Region partitioned from coordinator | Region serves existing escrow until empty, then denies (fail-closed default) | 0 |
| Coordinator crash + recovery within a window | Region serves existing escrow; on recovery, leases resume against the preserved budget | 0 |
| Coordinator unavailable across a window boundary | All regions deny during outage; on recovery, fresh window's budget acquired normally | 0 |
| Coordinator outage with `onCoordinatorOutage: "regional-only"` (0.8.5; requires `regionalEscrow`) | Engine continues serving from the L2 balance until depletion; re-probes via `coordinator.isHealthy()` every `coordinatorHealthCheckMs` (default 5 s); resumes on recovery | ≤ L2 balance |
| Multi-process within a region sharing a `regionalEscrow` (0.8.5) | M engines share one L2; in-flight per-region escrow bounded by `perKeyBudget`, not `M × batch` | 0 |
| Regional Redis (L2) outage, coordinator reachable (0.8.5) | Engine falls through to direct L3 leasing (matches 0.8.4 behavior); multi-process atomicity lost for the outage duration | 0 |

For soft-traffic operators who prefer availability, onCoordinatorOutage: "regional-only" (shipped 0.8.5; requires a regionalEscrow) keeps serving from the L2 balance during outage — Δ degrades to ≤ the L2 balance at outage onset, not 0; documented opt-in. See the multi-process regional escrow section below.

Coordinator backends

| Coordinator | When to use | Status |
| --- | --- | --- |
| `TestCoordinator` (in-memory) | Tests + examples; deterministic | Shipped 0.8.3 |
| `RedisCoordinator` (single global Redis) | Production default; lowest latency; documented SPOF | Shipped 0.8.3 |
| `PostgresCoordinator` (single Postgres primary) | When you already run Postgres; faster failover via Patroni | Shipped 0.8.4 |
| Raft-via-etcd | HA-without-SPOF (the SPOF mitigation) | Future |
| CRDT / gossip | Multi-leader with bounded staleness | Research follow-up |

Both RedisCoordinator and PostgresCoordinator implement the identical GlobalCoordinator interface — same window-coupling guarantee (Δ = 0), same K-independent bound. They are drop-in interchangeable; the choice is operational, not semantic.

| Axis | `RedisCoordinator` | `PostgresCoordinator` |
| --- | --- | --- |
| Latency per lease | ~0.5–1 ms (Lua EVALSHA) | ~1–3 ms (transactional SQL) |
| Throughput cap | 100K+ leases/sec | 5K–20K leases/sec |
| HA story | Sentinel / Cluster | Synchronous replication + Patroni / pg_auto_failover |
| Durability | Configurable (RDB / AOF) | WAL + sync replication (byte-durable) |

`PostgresCoordinator` quick start

import { Pool } from "pg";
import { fixedWindow } from "throttlekit";
import { federate, PostgresCoordinator } from "throttlekit/federation";

const pool = new Pool({ connectionString: process.env.GLOBAL_COORDINATOR_PG_URL });

const coordinator = new PostgresCoordinator({
  pool,
  windowMs: 60_000,
  budgetPerWindow: 1000,
});

const limiter = federate({
  strategy: fixedWindow({ limit: 1000, windowMs: 60_000 }),
  coordinator,
  region: "us-east",
  batch: 16,
});

// Same Limiter surface as RedisCoordinator-backed federation.
const decision = await limiter.check("global:checkout");

// On shutdown:
coordinator.close(); // stops the background GC timer
await pool.end();

The Postgres-backed coordinator creates its own table (tk_fed_state) on first use — no migration tool required. Schema details in research/postgres-coordinator/DESIGN.md.

The GlobalCoordinator interface is small (3 methods); rolling a custom backend is a couple of dozen lines of Lua / SQL / equivalent.

Multi-process regional escrow (0.8.5)

When you run multiple processes in the same region (e.g., M=4 Node workers behind a load balancer), the default federation engine has each process hold its own in-process escrow — so in-flight per-region escrow is M × batch. For most workloads this is fine (the federation bound Δ = 0 still holds at the global coordinator), but two scenarios benefit from a tighter regional sub-bound:

Tight regional sub-bounds (rare; some billing/quota stories want per-region admissions ≤ perKeyBudget not M × batch).
The regional-only outage mode — without an L2 to serve from during a coordinator outage, the mode collapses to fail-closed.

The fix is a regional escrow — an L2 layer between in-process L1 and the cross-region L3 coordinator, typically backed by a regional Redis cluster shared by all M processes:

import { createClient } from "redis";
import { fixedWindow } from "throttlekit";
import {
  federate,
  RedisCoordinator,
  RedisRegionalEscrow,
} from "throttlekit/federation";
import { fromNodeRedis } from "throttlekit/redis";

const l2 = createClient({ url: process.env.REGIONAL_REDIS_URL });
const l3 = createClient({ url: process.env.GLOBAL_COORDINATOR_REDIS_URL });
await l2.connect();
await l3.connect();

const coordinator = new RedisCoordinator({
  client: fromNodeRedis(l3),
  windowMs: 60_000,
  budgetPerWindow: 1000,
});

const regionalEscrow = new RedisRegionalEscrow({
  client: fromNodeRedis(l2),
  windowMs: 60_000,
  region: "us-east",
});

const limiter = federate({
  strategy: fixedWindow({ limit: 1000, windowMs: 60_000 }),
  coordinator,
  regionalEscrow,
  region: "us-east",
  batch: 16,
  onCoordinatorOutage: "regional-only", // optional; default is "fail-closed"
});

The RegionalEscrow interface mirrors GlobalCoordinator one layer down (lease / refill / release / optional isHealthy); the default RedisRegionalEscrow uses three atomic Lua scripts, same pattern as RedisCoordinator. TestRegionalEscrow is the deterministic in-memory mirror for tests + examples.

regional-only outage mode: when set, on a coordinator outage the engine continues serving from the L2 balance until depletion. It re-probes coordinator.isHealthy() every coordinatorHealthCheckMs (default 5 s, clock-driven not timer-driven); on recovery, the normal lease/reconcile cycle resumes. The federation bound (Δ = 0) degrades to the regional sub-bound (≤ L2 balance at outage onset) during the outage and is re-enforced from the recovery point onward.
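
To make "clock-driven not timer-driven" concrete, here is a minimal sketch of one way such a re-probe could work. `ClockDrivenProbe` and its methods are hypothetical and not part of ThrottleKit; the only detail taken from the text is the 5 s default interval. There is no `setInterval`: the health check fires lazily on the request path, and only when enough wall-clock time has passed since the last attempt.

```typescript
// Hypothetical helper, not ThrottleKit's implementation: a clock-driven health gate.
class ClockDrivenProbe {
  private healthy = true;
  private lastProbeMs = 0;
  private readonly probe: () => Promise<boolean>;
  private readonly intervalMs: number;
  private readonly now: () => number;

  constructor(
    probe: () => Promise<boolean>,
    intervalMs: number = 5_000,
    now: () => number = () => Date.now(),
  ) {
    this.probe = probe;
    this.intervalMs = intervalMs;
    this.now = now;
  }

  // Call when a lease attempt fails: marks the outage and starts the probe clock.
  markDown(): void {
    this.healthy = false;
    this.lastProbeMs = this.now();
  }

  // Call on the request path. No background timer: the probe only runs when a
  // request arrives and at least intervalMs has elapsed since the last attempt.
  async coordinatorUsable(): Promise<boolean> {
    if (this.healthy) return true;
    const t = this.now();
    if (t - this.lastProbeMs < this.intervalMs) return false; // too soon to re-probe
    this.lastProbeMs = t;
    try {
      this.healthy = await this.probe();
    } catch {
      this.healthy = false; // a throwing probe counts as still down
    }
    return this.healthy;
  }
}
```

One consequence of this design is that an idle process performs no probes at all during an outage, and injecting `now` makes the behaviour easy to test with a fake clock.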

Backward compatible: federate({ ... }) without regionalEscrow uses the legacy 0.8.4 in-process-only path verbatim. See examples/federation-regional-escrow.ts for a runnable demo, and research/regional-escrow/DESIGN.md for the full design.

SPOF caveat

A single global Redis IS a single point of failure for the federation's safety bound. When the Redis is unreachable, every region's lease() throws → fail-closed (default) → no new admissions across the entire federation until the Redis returns. The mitigations:

Sentinel / Cluster under your Redis client: the Lua scripts work unchanged.
PostgresCoordinator (shipped 0.8.4): replaces the Redis SPOF with Postgres failover semantics (synchronous replication, automatic primary promotion).
Raft-via-etcd (1.0.x): true HA-without-SPOF.

As of 0.8.3 the SPOF is documented; users in regulated environments should run Sentinel or, from 0.8.4, switch to PostgresCoordinator.

Design + proof — read more

DESIGN.md — full design, the lift argument from windowCoupled, the locked GlobalCoordinator interface, failure semantics in depth, the worked example.
spec/GaleFederatedLeasing.tla — the TLA⁺ spec proving admitted ≤ Limit independent of K.
test/gale/federated/leasing-variants.test.ts — the CI-runnable BFS twin matching TLC's distinct-state counts.
research/bigger-bets/federation/eval/RESULTS.md — the end-to-end 3-region eval with measured Δ and U numbers.

Composing with the rest of ThrottleKit

federate(...) returns a regular Limiter. It composes naturally with the rest of the library:

Wrap with withAnalytics(...) for heavy-hitter detection.
Tap with tapDecisions(...) for OTel / Prometheus integration.
Use with adapters (createEnforcer(...), the Express middleware, etc.) via the standard Limiter interface.
Stack twoTier(leased) on TOP for an in-process L1 cache; the recursive twoTier composition is the canonical multi-process per-region setup.

ThrottleKit · MIT · 1.0 — API frozen under SemVer (Stability)
