
Federation

Ameya Borkar edited this page May 28, 2026 · 5 revisions

Federation — one global limit across regions, with Δ = 0

federate(...) ships a cross-cluster rate limiter that pools a single global budget across K regions, with a formally-verified, K-INDEPENDENT overshoot bound:

admitted_per_global_window  ≤  Limit          (for ANY number of regions K)

The TLA⁺ spec is spec/GaleFederatedLeasing.tla; the BFS twin in test/gale/federated/leasing-variants.test.ts re-runs it in CI without Java. The end-to-end eval against a real Lua-backed coordinator is in research/bigger-bets/federation/eval/RESULTS.md.

When to use this vs twoTier(leased). If your fleet shares ONE regional Redis (one cluster, many processes), twoTier(leased) with windowCoupled: true already gives you the K-independent bound. Use federate(...) only when your processes span MULTIPLE regions and you need a single global budget across them.

Quick start

import { fixedWindow } from "throttlekit";
import { federate, RedisCoordinator } from "throttlekit/federation";
import { fromIoredis } from "throttlekit/redis";
import Redis from "ioredis";

// One global coordinator Redis (the federation's "L3").
const coordinator = new RedisCoordinator({
  client: fromIoredis(new Redis(process.env.GLOBAL_COORDINATOR_URL!)),
  windowMs: 60_000,
  budgetPerWindow: 1000,
});

// One federated Limiter per region. They all share `coordinator`.
const limiter = federate({
  strategy: fixedWindow({ limit: 1000, windowMs: 60_000 }),
  coordinator,
  region: "us-east",            // your region identifier
  batch: 16,                    // escrow size; 16 is a sensible default
});

const decision = await limiter.check("user:42");

Run the full example end-to-end:

docker compose -f research/bigger-bets/federation/eval/docker-compose.yml up -d
npx tsx examples/federation.ts
docker compose -f research/bigger-bets/federation/eval/docker-compose.yml down

What it does

  1. Each region holds a local escrow lease: batch units of budget drawn from the global coordinator in a single cross-region RPC.
  2. Most requests serve from local escrow — no coordinator hit. The amortized coordinator cost is 1/batch per request.
  3. At the global window boundary, escrow EXPIRES (Roll in the formal model). Un-served escrow is forfeited, then reconciled back to the coordinator (idempotent on windowStart) for the next window. This is the window-coupling rule: it collapses the per-window overshoot to zero.
  4. On coordinator outage, regions fail closed — the existing escrow keeps serving until exhausted, then denies. The bound holds at Δ = 0 even through partition.
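
The four steps above can be sketched as a small in-memory model. This is an illustrative sketch, not ThrottleKit's implementation: SketchCoordinator and SketchRegion are hypothetical names, the lease signature is an assumption, and reconciliation of forfeited escrow is left out because this toy coordinator simply resets its budget at each window.

```typescript
// Hypothetical stand-in for the global coordinator (NOT the ThrottleKit API).
class SketchCoordinator {
  available = true; // set to false to simulate an outage
  readonly limit: number;
  readonly windowMs: number;
  private windowStart = -1;
  private remaining = 0;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Grant up to `want` units from the current global window's budget.
  lease(now: number, want: number): { windowStart: number; granted: number } {
    if (!this.available) throw new Error("coordinator unreachable");
    const start = Math.floor(now / this.windowMs) * this.windowMs;
    if (start !== this.windowStart) {
      // New global window: the budget resets.
      this.windowStart = start;
      this.remaining = this.limit;
    }
    const granted = Math.min(want, this.remaining);
    this.remaining -= granted;
    return { windowStart: start, granted };
  }
}

// Hypothetical per-region limiter that serves from a local escrow.
class SketchRegion {
  private escrow = 0;
  private escrowWindow = -1;
  private coordinator: SketchCoordinator;
  private batch: number;

  constructor(coordinator: SketchCoordinator, batch: number) {
    this.coordinator = coordinator;
    this.batch = batch;
  }

  check(now: number): boolean {
    const windowMs = this.coordinator.windowMs;
    const start = Math.floor(now / windowMs) * windowMs;
    if (start !== this.escrowWindow) {
      // Window coupling: escrow from an earlier window is forfeited.
      this.escrow = 0;
      this.escrowWindow = start;
    }
    if (this.escrow === 0) {
      try {
        // One coordinator round trip refills up to `batch` units.
        const lease = this.coordinator.lease(now, this.batch);
        // Ignore a grant that belongs to a different window.
        if (lease.windowStart === start) this.escrow = lease.granted;
      } catch {
        return false; // fail closed: no escrow and no coordinator
      }
    }
    if (this.escrow === 0) return false; // global budget exhausted
    this.escrow -= 1;
    return true;
  }
}

// Three regions hammer one shared budget of 1000 inside a single window.
const coordinator = new SketchCoordinator(1000, 60_000);
const regions = [0, 1, 2].map(() => new SketchRegion(coordinator, 16));
let admitted = 0;
for (let i = 0; i < 3000; i++) {
  if (regions[i % 3].check(1_000)) admitted++;
}
console.log(admitted <= coordinator.limit); // the global bound holds
```

In this model every admitted request consumes a unit that the coordinator granted exactly once, which is why the total can never exceed the limit regardless of how many regions draw from it.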

The contribution — pooling under skew

| Scheme | Δ (overshoot) | Pooling under skew | Failure mode |
| --- | --- | --- | --- |
| Per-region independent limiters | Unbounded | None | Each region serves its own budget |
| Static partition (L/K each) | 0 | None; hot region binds at L/K | Each region serves its own L/K |
| federate(...) | 0 | Full; hot region can draw the whole budget | Fail closed; bound preserved through partition |
| CRDT / gossip merge | Bounded by staleness | Full | Staleness-Δ tradeoff (research follow-up) |

At max skew (s = 1, all load on one region) the static partition admits only L/K; federate(...) admits up to the full L. The eval rows in RESULTS.md:

| skew | static U_capacity | federated U_capacity |
| --- | --- | --- |
| 0.00 | 1.000 | 0.973 |
| 0.25 | 0.833 | 0.977 |
| 0.50 | 0.667 | 0.990 |
| 0.75 | 0.500 | 0.957 |
| 1.00 | 0.333 | 1.000 |
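
The static column can be reproduced with a simple load model. The sketch below rests on two assumptions that are not stated on this page: K = 3 regions (inferred from the 0.333 entry at skew 1.00, which equals 1/K), and a skew parameter that moves offered load linearly from an even split (s = 0) to a single hot region (s = 1) while total offered load stays at L. The function name staticUtilization is hypothetical. The federated column is measured in the eval and is not modeled here.

```typescript
// Utilization of a static L/K partition under the assumed skew model.
// Requires regions >= 2.
function staticUtilization(skew: number, regions: number, limit: number): number {
  const fairShare = limit / regions;
  // The hot region's offered load grows from L/K (s = 0) to L (s = 1).
  const hotLoad = fairShare + skew * (limit - fairShare);
  // The remaining load is split evenly across the other regions.
  const coldLoad = (limit - hotLoad) / (regions - 1);
  // Each region can admit at most its fixed L/K slice.
  const admitted =
    Math.min(hotLoad, fairShare) + (regions - 1) * Math.min(coldLoad, fairShare);
  return admitted / limit;
}

const regionCount = 3; // assumption inferred from the table
const globalLimit = 1000;
for (const skew of [0, 0.25, 0.5, 0.75, 1]) {
  const u = staticUtilization(skew, regionCount, globalLimit);
  console.log(`skew=${skew.toFixed(2)} static U_capacity=${u.toFixed(3)}`);
}
```

Under this model the static partition loses s·(1 − 1/K) of the budget, because the hot region is capped at L/K while the cold regions leave their slices idle.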

The −0.027 at uniform load is the bounded batch overhead — at most (K−1)·(batch−1) un-served escrow per window. Tighten batch to shrink that gap at the cost of more coordinator round trips.
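
The trade-off can be put in numbers with the two formulas quoted on this page: the (K−1)·(batch−1) bound on un-served escrow and the 1/batch amortized coordinator cost. The sketch below assumes K = 3 (inferred from the eval table, not stated) and the quick-start limit of 1000; the helper names are hypothetical.

```typescript
// Worst-case un-served escrow per window, per the bound quoted above.
function maxUnservedEscrow(regions: number, batch: number): number {
  return (regions - 1) * (batch - 1);
}

// Amortized coordinator round trips per request (1/batch).
function coordinatorCallsPerRequest(batch: number): number {
  return 1 / batch;
}

const LIMIT = 1000; // budget per window, as in the quick start
const K = 3; // assumption inferred from the eval table

for (const batch of [16, 4]) {
  const maxGap = maxUnservedEscrow(K, batch) / LIMIT;
  const rpc = coordinatorCallsPerRequest(batch);
  console.log(`batch=${batch} maxGap=${maxGap.toFixed(3)} rpcPerRequest=${rpc}`);
}
```

With these assumptions, batch = 16 allows a gap of up to 0.030, which is consistent with the observed 0.027, at 0.0625 coordinator calls per request. Dropping to batch = 4 caps the gap at 0.006 but quadruples coordinator traffic to 0.25 calls per request.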

Failure modes

The bound holds end-to-end through every coordinator outage shape; detailed table in docs/FAILURE-MODES.md. Summary:

| Outage shape | Behavior | Δ |
| --- | --- | --- |
| Region partitioned from coordinator | Region serves existing escrow until empty, then denies (fail-closed default) | 0 |
| Coordinator crash + recovery within a window | Region serves existing escrow; on recovery, leases resume against the preserved budget | 0 |
| Coordinator unavailable across a window boundary | All regions deny during outage; on recovery, fresh window's budget acquired normally | 0 |

For operators of soft traffic who prefer availability, onCoordinatorOutage: "regional-only" (TK-906+) falls back to per-region limits during an outage. This is a documented opt-in: the bound degrades to the regional limit rather than the federation limit.

Coordinator backends

| Coordinator | When to use | Status |
| --- | --- | --- |
| TestCoordinator (in-memory) | Tests + examples; deterministic | Shipped 0.8.3 |
| RedisCoordinator (single global Redis) | Production default; documented SPOF | Shipped 0.8.3 |
| PostgresCoordinator | When you already run Postgres (no separate Redis) | 0.8.x follow-up |
| Raft-via-etcd | HA-without-SPOF (the SPOF mitigation) | 1.0.x follow-up |
| CRDT / gossip | Multi-leader with bounded staleness | Research follow-up |

The GlobalCoordinator interface is small (3 methods); rolling a custom backend is a couple of dozen lines of Lua / SQL / equivalent.

SPOF caveat

A single global Redis IS a single point of failure for the federation's safety bound. When the Redis is unreachable, every region's lease() throws → fail-closed (default) → no new admissions across the entire federation until the Redis returns. The mitigations:

  • Sentinel / Cluster under your Redis client — the Lua scripts work unchanged.
  • PostgresCoordinator (0.8.x follow-up) — replaces the SPOF with Postgres failover semantics (synchronous replication, automatic primary promotion).
  • Raft-via-etcd (1.0.x) — true HA-without-SPOF.

For 0.8.3 the SPOF is documented; users in regulated environments should opt for Sentinel or wait for PostgresCoordinator.

Design + proof — read more

Composing with the rest of ThrottleKit

federate(...) returns a regular Limiter. It composes naturally with the rest of the library:

  • Wrap with withAnalytics(...) for heavy-hitter detection.
  • Tap with tapDecisions(...) for OTel / Prometheus integration.
  • Use with adapters (createEnforcer(...), the Express middleware, etc.) via the standard Limiter interface.
  • Stack twoTier(leased) on TOP for an in-process L1 cache; the recursive twoTier composition is the canonical multi-process per-region setup.
