-
Notifications
You must be signed in to change notification settings - Fork 0
Distributed Adaptive Concurrency
Added in 0.10.0 (2026-05-29).
adaptiveConcurrency() infers a concurrency ceiling per process from
locally observed RTT. The moment N processes front a shared backend (one
inference cluster, one database pool, one upstream API), each of the N
independent limiters infers a ceiling for the whole backend and the fleet
collectively admits up to Σ Lᵢ — N× the backend's true capacity. The limiter
that was supposed to prevent overload now causes it under fan-out.
distributedAdaptiveConcurrency() closes that gap. It is a drop-in
ConcurrencyGuard whose admissions are governed by one cooperatively-inferred
global ceiling L_global (inferred from the fleet's aggregate RTT signal, not
configured by hand). The hard, coordinator-enforced guarantee is GlobalCap:
Σ_n granted_share[n] ≤ L_global (the coordinator never over-commits)
so in steady state Σ_n inflight[n] ≤ L_global. The coordinator also reserves each
peer's max(share, inflight) (0.10.0, D-DAC-18), so a joiner is never granted
capacity an incumbent is still occupying — eliminating the rebalance in-flight
overshoot in the synchronous / low-latency case (the joiner ramps only as fast as
incumbents drain). It is not a hard instantaneous bound across the async fleet:
grant-reply and heartbeat-reporting lag leave a bounded (~1.5–2×), self-draining
residual — in-flight requests are non-revocable — that converges back under budget,
never by new over-authorization (see Membership changes).
A single backend has true capacity C (time-varying, unknown). N gateway
nodes each run their own adaptiveConcurrency. Each node observes RTT that
reflects the total load on the shared backend (the backend is slow for
everyone at once), so each independently infers Lᵢ ≈ C and admits up to
Lᵢ concurrent requests. The backend then sees Σ Lᵢ ≈ N·C — an overshoot
factor of N. As the backend buckles, every node's RTT climbs together, every
node backs off together, and the fleet oscillates in lockstep (synchronized
AIMD collapse).
The single most important conceptual guard-rail: all nodes estimate the same quantity (the shared backend's capacity), so reconstructing a fleet ceiling by summing per-node estimates is exactly the bug above. The coordinator never sums — it folds the per-node views into one robust central estimate.
Distributed adaptive concurrency is a composition of two orthogonal mechanisms running at different tempos.
Mechanism 1 — Capacity estimation ("how big is the backend right now?"). Each node reports its locally-inferred
L_local(straight from a privateadaptiveConcurrencyfed by that node's RTTs). The coordinator folds all live reports intoL_global = aggregate({L_local})usingmedian(default) ormin— NEVERsum(summing rebuilds the N·C overshoot). Median is robust to a single mis-calibrated or cold-starting node; min is the conservative extreme where the most-stressed node's view caps the fleet.
Mechanism 2 — Capacity allocation ("which node may admit a request?").
L_globalis divided into per-node shares by budget-capped equal split: a node's grant is its equal-split target⌊L/N⌋capped at the budget not already committed to the other live nodes —share = min(target, L_global − Σ other shares). A node admits while its local in-flight count is below its share. The cap makesΣ shares ≤ L_globala hard invariant under any heartbeat interleaving (GlobalCap): a joining node computesmin(target, L_global − L_global) = 0and stays at 0 until the incumbents re-heartbeat down — no separate join bookkeeping needed.
fast loop (synchronous, in-process, per request)
acquire()/release() ──gate on──▶ min(share, L_local)
▲ │ RTT feeds
the ConcurrencyGuard contract lives here ▼
slow loop (async, background, every heartbeatMs)
report L_local + inflight ──▶ coordinator ──▶ aggregate ──▶ equal-split ──▶ new share
The fast loop keeps acquire() synchronous — a concurrency gate must never
block on the network; shedding is the right move when you are out of slots. The
slow loop imposes one coordination round-trip per heartbeat, not per
request: the share is delegated authority the node spends locally.
The effective ceiling is min(share, L_local). The min only ever lowers
the ceiling below share, so it can never violate inflight ≤ share (safe),
and it gives fast local reaction: if this node's RTT spikes, L_local
drops within one request and the node sheds immediately — without waiting for
the next heartbeat.
import {
distributedAdaptiveConcurrency,
TestConcurrencyCoordinator,
} from "throttlekit";
// One coordinator owns L_global for a shared backend.
const coordinator = new TestConcurrencyCoordinator({ aggregate: "median" });
const guard = distributedAdaptiveConcurrency({
coordinator,
nodeId: process.env.HOSTNAME!, // REQUIRED — unique per process
key: "inference-cluster", // nodes fronting the same backend MUST match
local: { minLimit: 4, maxLimit: 128 },
heartbeatMs: 1000, // the heartbeat_T
leaseTtlMs: 2000, // default 2 * heartbeatMs
onCoordinatorOutage: "fail-closed",
});
// acquire() is synchronous — the ConcurrencyGuard contract.
const lease = guard.acquire();
if (lease.ok) {
try { /* call the shared backend */ }
finally { lease.release(); } // event-release; idempotent
}
await guard.close(); // stop the timer + leave() the fleetdistributedAdaptiveConcurrency(options) returns a
DistributedConcurrencyGuard — a ConcurrencyGuard plus distributed
lifecycle:
| Member | Shape | Notes |
|---|---|---|
acquire() |
synchronous | Gates on min(share, L_local); inherited from the base guard |
release() |
synchronous | Event-release; idempotent (double-release is a no-op) |
heartbeat() |
Promise<void> |
Force a heartbeat now (report L_local, refresh share). Never throws — outage routes to the outage policy. Useful to gate startup: await guard.heartbeat() once after construction |
close() |
Promise<void> |
Stop the timer and leave() the fleet. Idempotent |
stats() |
snapshot | Extends the base stats with share, lGlobal, nodes
|
The coordinator owns the shared L_global and parcels it into per-node shares
— the event-release sibling of federation's GlobalCoordinator. Each node's
heartbeat(report) does report + (re)lease in one round-trip; the coordinator
upserts the node's { lLocal, inflight, expiresAt }, evicts every node whose
lease has expired, recomputes L_global = aggregate(...), equal-splits it
across the live nodes, and returns this node's { share, lGlobal, nodes }.
leave() reclaims a node's share immediately on voluntary departure.
Aggregation policy lives on the coordinator, not the guard.
aggregate ∈ { "median", "min" }is a fleet-wide decision — every node must use one consistent rule — so it is a coordinator-construction option. Guards only report rawlLocal. A misconfigured mixed-aggregation fleet is impossible by construction.
| Coordinator | When to use | Status |
|---|---|---|
TestConcurrencyCoordinator (in-memory) |
Tests + examples; deterministic, no timers, no I/O (expiry compared against an injected clock) | Shipped 0.10.0 |
RedisConcurrencyCoordinator (shared Redis) |
Production default; one Lua script does heartbeat-aggregate-split atomically | Shipped 0.10.0 |
PostgresConcurrencyCoordinator |
When you already run Postgres | Deferred (follows the Redis one, as federation's Postgres coordinator followed its Redis one) |
import { RedisConcurrencyCoordinator } from "throttlekit";
import { fromIoredis } from "throttlekit/redis";
import Redis from "ioredis";
const coordinator = new RedisConcurrencyCoordinator({
client: fromIoredis(new Redis(process.env.COORDINATOR_URL!)),
aggregate: "median",
});Both coordinators implement the identical ConcurrencyCoordinator interface and
are drop-in interchangeable; an identical report sequence through either yields
identical { share, lGlobal, nodes } (the dual-path conformance test).
Each grant is the equal-split target ⌊L/N⌋ (+1 to the lowest-ranked
L mod N nodes by lexicographic nodeId) capped at L_global − Σ other live shares. The cap is what makes Σ shares ≤ L_global hold under staggered
heartbeats — a stateless ⌊L/N⌋ split would over-commit the instant a node
joins (the joiner computes its small share while an incumbent still holds its
larger pre-join one). It is deterministic (no RNG, O(N log N)) and needs no
separate clawback path — the cap is the clawback. The fleet converges to the
exact equal split (Σ shares = L_global) within ≈ N heartbeats.
The cost: an idle node still holds ≈ L/N. Under skew the busy nodes are
capped below what the idle nodes are leaving on the table. This is a
utilization limitation, not a safety one — the cap never overshoots.
Demand-proportional allocation (the carried inflight field in the report is
reserved for exactly this) is a planned refinement; today, over-provision the
fleet or accept the skew penalty.
A node joining a live fleet is the staggered hazard the budget cap exists
for: because each node fetches its share independently at a staggered time from a
different live-set snapshot, a stateless ⌊L/N⌋ split would grant a fresh
joiner a positive share before the incumbents re-split down — so for up to one
heartbeat Σ granted shares > L_global (worst case a 1.5× overshoot on a 1→2
scale-up) with L_global never having decreased.
The budget cap closes this with no extra bookkeeping: a joiner's grant is
min(target, L_global − Σ incumbent shares), and since the incumbents still hold
all of L_global, the joiner gets min(target, 0) = 0. It stays at 0 until the
incumbents re-heartbeat down (each capped to its own fair share), then earns its
≈ L/N on a later heartbeat. Σ shares ≤ L_global holds at every instant
across growth — and, unlike a one-shot "share 0 for one heartbeat" rule, the cap
stays correct even if the joiner re-heartbeats repeatedly before an incumbent
shrinks (each grant is re-capped at the current remainder).
The cap reserves not just a peer's share but its in-flight too —
L_global − Σ_other max(share, inflight) (D-DAC-18). So even a joiner that could
fit by committed shares is held at 0 until the incumbents' in-flight physically
drains — it never ramps into still-occupied capacity. In the synchronous /
low-latency case this makes Σ inflight ≤ L_global hold throughout the rebalance
(model-checked as InflightCap in the spec + BFS twin). Across the async fleet a
bounded, self-draining residual remains — a guard admits against its cached grant
while a reduction is still in flight, and the cap reserves a peer's last-reported
in-flight; a hard instantaneous bound would need acknowledged handoff (not done).
The honest contract is unchanged: bounded and converges, never a runaway.
A node leaving (or L_global dropping) is the dual — shrink-drain: an
over-allocated node admits nothing new and drains as in-flight requests
complete (you cannot retroactively un-admit a running request). Σ inflight is
non-increasing under this debt and converges back to ≤ L_global. This is
identical to how single-node adaptiveConcurrency behaves when its estimate
drops, and is the same reason Σ inflight ≤ L_global is a steady-state property,
not a hard invariant — a liveness/convergence property, not a safety violation.
| Mode | Behavior on heartbeat() throw |
When to choose |
|---|---|---|
"fail-closed" (default) |
share → 0; the node sheds everything (503) until the coordinator returns. Never overshoots; trades availability |
Safety > availability (matches federation's default) |
"local-only" |
share → L_local; the node falls back to pure in-process adaptive concurrency. The fleet may collectively overshoot the backend (the fan-out regime) but stays up |
Availability > strict bound; the honest degraded mode |
leaseTtlMs defaults to 2 · heartbeatMs, so a single slow heartbeat does
not drop a node from the fleet — only two consecutive misses do. A node
that crashes is reclaimed within leaseTtlMs, and survivors' shares grow on
their next heartbeat.
Cold start: the first heartbeat is scheduled on the next tick (not after a
full heartbeatMs) to minimize the startup stall. Before the first grant lands,
share = 0 under fail-closed (admits nothing for ~one RTT) or
share = L_local under local-only. A founding (solo) node's first grant is
its full L_global — it is the only live node, so the budget cap binds at
nothing; a node joining an already-saturated fleet instead gets share = 0 from
the cap until the incumbents re-heartbeat down. Callers who must gate startup can
await guard.heartbeat() once after construction.
DistributedConcurrencyGuard extends ConcurrencyGuard, and acquire() stays
synchronous, so the value returned drops straight into the 0.9.2
middleware adapters — no code change at the call site. This is the forward-
compat hook the middleware integration page promised:
import express from "express";
import {
expressAdaptiveConcurrency,
distributedAdaptiveConcurrency,
RedisConcurrencyCoordinator,
} from "throttlekit";
const guard = distributedAdaptiveConcurrency({
coordinator: new RedisConcurrencyCoordinator({ client, aggregate: "median" }),
nodeId: process.env.HOSTNAME!,
key: "inference-cluster",
});
const app = express();
// The same adapter that wrapped a local guard now wraps the distributed one.
app.use(expressAdaptiveConcurrency({ guard }));Every sibling adapter (fastifyAdaptiveConcurrency, koaAdaptiveConcurrency,
honoAdaptiveConcurrency, withAdaptiveConcurrency, …) picks it up the same
way — the adapter owns the exactly-once release(), the distributed guard owns
L_global.
The bound holds end-to-end through every coordinator outage shape; detailed
table in
docs/FAILURE-MODES.md.
Summary:
| Condition |
fail-closed (default) |
local-only |
Recovery |
|---|---|---|---|
| Coordinator unreachable |
share → 0, node sheds all (503) |
share → L_local, per-node self-limit; fleet may overshoot backend |
Next successful heartbeat restores share |
| Node crashes holding share | Lease TTL (2·heartbeatMs) expires → coordinator reclaims; survivors' shares grow next heartbeat |
same | Automatic within leaseTtlMs
|
L_global shrinks (backend degraded) |
Over-allocated nodes admit nothing new; in-flight drains; Σ inflight → L_global
|
same | Convergence within max request duration |
Node JOINS (L_global constant) |
Budget cap gives the joiner min(target, L_global − Σ incumbents) = 0 until incumbents re-split down; Σ shares ≤ L_global holds throughout (no over-commitment) |
same | Joiner earns its ≈ L/N share over the next ≈ N heartbeats |
| Stale share (between heartbeats) | Node may admit up to a now-too-large share for ≤ heartbeatMs; bounded by the min(share, L_local) fast-shrink |
same | Smaller heartbeatMs trades coordinator load for reaction speed |
nodeId collision (operator error) |
Two processes overwrite one record → undercount → under-admission (safe direction) | same | Enforce unique nodeId
|
| Clock skew between node and coordinator | Large skew → premature evict (safe) or late evict (bounded by skew) | same | Keep nodes NTP-synced |
The hard safety theorem (for every staggered interleaving of per-node
Reallocate/Join/Leave/Acquire/Release at constant L_global,
GlobalCap: Σ_active share ≤ L_global) relabels federation's window-
coupled leasing proof — windowMs → heartbeat_T, the heartbeat boundary IS
the federation Roll. The TLA⁺ spec models the staggered, budget-capped
protocol (a Reallocate action that caps each grant at the remaining budget, a
Join action that adds a node at share = 0) so the membership-growth bound is
witnessed, not assumed. Σ inflight ≤ L_global is deliberately not an
invariant — in-flight is non-revocable and drains after a rebalance.
-
DESIGN.md
— full design, the DR-18 reduction, the two-mechanism split, the locked
ConcurrencyCoordinatorinterface, the budget cap (D-DAC-17), and 17 decision records. -
spec/GaleHeartbeatLeasing.tla— the TLA⁺ spec provingGlobalCap(Σ_active share ≤ L), with theGlobalCap/GlobalCapTightinvariants, model-checked by the BFS twin. -
docs/FAILURE-MODES.md— full outage shape table.
- Middleware integration — the 0.9.2 adapters that pick up the distributed guard unchanged.
-
Federation — the rate-axis sibling (window-coupled leasing,
Δ = 0); this is its event-release dual on the concurrency axis. - Distributed & provable — the broader provable- bounds story.
ThrottleKit · MIT · 1.0 — API frozen under SemVer (Stability)
- Getting Started
- Choosing a strategy
- Frameworks & the edge
- Distributed & provable
- Federation
- Scaling & the Fleet
- Unified admission
- Pillar 4 — Weighted Fair Escrow
- Middleware integration
- Distributed adaptive concurrency
- Advanced limiting
- Overload, fairness & DDoS
- Operations
- Monitoring — ThrottleKit Lens
- Policy Plans
- Replay
- Performance
- Migrating
- Polyglot & Python
- GALE & TALE