
Advanced Limiting

Ameya Borkar edited this page Jun 10, 2026 · 2 revisions


This page covers multi-dimensional limits and the two backpressure/shaping primitives: adaptive concurrency and leaky-bucket scheduling.

Limiting on several axes

Limit per IP, per user, and per route at once. all({...}) allows a request only if every dimension allows it, and consumes nothing unless all of them do (no partial consume). any({...}) allows a request if at least one dimension permits it. Pass the result to multiRateLimit (not rateLimit):

import { all, gcra, fixedWindow, multiRateLimit } from "throttlekit";

interface Ctx { ip: string; userId: string; route: string }

const limiter = multiRateLimit<Ctx>({
  store, // on Redis, all dimensions fuse into one atomic Lua round trip
  strategy: all<Ctx>({
    ip:    { key: (c) => c.ip,     strategy: gcra({ limit: 100, periodMs: 60_000 }) },
    user:  { key: (c) => c.userId, strategy: gcra({ limit: 1000, periodMs: 60_000 }) },
    route: { key: (c) => c.route,  strategy: fixedWindow({ limit: 50, windowMs: 1_000 }) },
  }),
});

const d = await limiter.check({ ip, userId, route: "/search" });

The returned Decision reflects the binding constraint (the denying dimension, or the smallest remaining when allowed). Dimensions support per-dimension cost. On Redis, multi-dimensional checks support gcra, tokenBucket, and fixedWindow. See examples/multi-dimensional.ts.
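
To make the "no partial consume" rule and the binding constraint concrete, here is a minimal in-memory sketch of an all-style check. It is not throttlekit's implementation: the names AllOrNothingLimiter, SketchDimension, and SketchDecision are hypothetical, and it assumes simple fixed-window counters with a cost of 1 per request. It first reads every dimension, and only writes once all of them have allowed the request.

```typescript
// Sketch only: in-memory fixed-window counters, not throttlekit's internals.
interface SketchDimension<C> {
  key: (ctx: C) => string;
  limit: number;
  windowMs: number;
}

interface SketchDecision {
  allowed: boolean;
  dimension: string; // the binding dimension (denier, or smallest remaining)
  remaining: number;
}

class AllOrNothingLimiter<C> {
  private readonly dims: Record<string, SketchDimension<C>>;
  private readonly counts = new Map<string, { windowStart: number; used: number }>();

  constructor(dims: Record<string, SketchDimension<C>>) {
    this.dims = dims;
  }

  check(ctx: C, now: number = Date.now()): SketchDecision {
    // Phase 1: inspect every dimension without consuming anything.
    const pending: { name: string; id: string; windowStart: number; used: number; limit: number }[] = [];
    for (const [name, dim] of Object.entries(this.dims)) {
      const id = `${name}:${dim.key(ctx)}`;
      const windowStart = Math.floor(now / dim.windowMs) * dim.windowMs;
      const prev = this.counts.get(id);
      const used = prev !== undefined && prev.windowStart === windowStart ? prev.used : 0;
      if (used >= dim.limit) {
        // One denial rejects the request; no counter has been written yet.
        return { allowed: false, dimension: name, remaining: 0 };
      }
      pending.push({ name, id, windowStart, used, limit: dim.limit });
    }

    // Phase 2: every dimension allowed, so consume one unit from each.
    let binding: SketchDecision = { allowed: true, dimension: "", remaining: Infinity };
    for (const p of pending) {
      this.counts.set(p.id, { windowStart: p.windowStart, used: p.used + 1 });
      const remaining = p.limit - (p.used + 1);
      if (remaining < binding.remaining) {
        binding = { allowed: true, dimension: p.name, remaining };
      }
    }
    return binding;
  }
}
```

In a single process the two phases run without interleaving. Against a shared store the read and the write must happen atomically, which is the role the fused Lua round trip plays on Redis.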

Backpressure and shaping

Adaptive concurrency

This is not a rate but a dynamically inferred ceiling on in-flight requests, derived from the latency gradient (RTT_noload / RTT_actual) and adjusted with a congestion-control sawtooth.

import { adaptiveConcurrency } from "throttlekit";

const guard = adaptiveConcurrency({ minLimit: 4, maxLimit: 512, algorithm: "gradient2" });

const lease = guard.acquire();
if (!lease.ok) return; // over the inferred ceiling — shed load (e.g. 503)
try {
  await handle(request);
} finally {
  lease.release(); // latency measured automatically; pass { dropped: true } on a failure/timeout
}

Introspect with guard.limit, guard.inflight, guard.stats(). Algorithms: "gradient2" (default) or "aimd". See examples/adaptive-concurrency.ts. For a single ceiling shared across a fleet (so N nodes don't each infer the whole backend's capacity), use distributedAdaptiveConcurrency.
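
As a rough illustration of how a latency gradient can drive the ceiling, the sketch below computes one limit update from RTT_noload / RTT_actual. It is not throttlekit's "gradient2" implementation: the function name nextLimit, the gradient floor of 0.5, and the square-root headroom term are assumptions borrowed from commonly described gradient limiters, and real algorithms typically add smoothing on top.

```typescript
// Sketch only: a single gradient-based limit update under stated assumptions.
interface SketchBounds {
  minLimit: number;
  maxLimit: number;
}

function nextLimit(
  limit: number,
  rttNoLoadMs: number,
  rttActualMs: number,
  bounds: SketchBounds,
): number {
  // Without a usable latency sample, keep the current limit.
  if (rttActualMs <= 0 || rttNoLoadMs <= 0) return limit;

  // Gradient is 1 when latency matches the no-load baseline and shrinks as
  // latency grows. The 0.5 floor (an assumption) caps how hard one update cuts.
  const gradient = Math.max(0.5, Math.min(1, rttNoLoadMs / rttActualMs));

  // Headroom lets the limit probe upward while latency is healthy, which
  // produces the sawtooth. sqrt(limit) is an assumed choice.
  const headroom = Math.sqrt(limit);

  const raw = limit * gradient + headroom;
  return Math.max(bounds.minLimit, Math.min(bounds.maxLimit, Math.round(raw)));
}
```

With a limit of 100 and bounds of 4 to 512, a sample at the baseline latency grows the limit to 110, while a sample at four times the baseline hits the 0.5 floor and cuts it to 60.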

Leaky-bucket scheduling

leakyBucket builds a Shaper that delays rather than rejects, smoothing bursty input to a steady output rate. This is handy for pacing outbound calls so they stay within a third-party budget.

import { leakyBucket, QueueFullError } from "throttlekit";

const shaper = leakyBucket({ ratePerSec: 5, maxQueueMs: 2_000 });
try {
  await shaper.schedule("upstream-api"); // resolves after the paced delay
  await callUpstream();
} catch (err) {
  if (err instanceof QueueFullError) console.warn("queue full, retry in", err.retryAfterMs, "ms");
}

reserve(key, cost?) returns a Reservation ({ accepted, delayMs }) without sleeping; schedule waits or throws QueueFullError. reserveSync is available on a sync store. See examples/leaky-bucket.ts.
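
The reservation idea can be sketched in a few lines: each key tracks when its next slot frees up, and a request is accepted only if the wait for that slot fits within the queue budget. This is a hypothetical illustration, not throttlekit's implementation; the class PacedQueue and the type SketchReservation are made-up names, and the meaning of delayMs on a rejected reservation is an assumption.

```typescript
// Sketch only: leaky-bucket pacing as "next free slot" bookkeeping per key.
interface SketchReservation {
  accepted: boolean;
  delayMs: number; // how long the caller should wait before proceeding
}

class PacedQueue {
  private readonly ratePerSec: number;
  private readonly maxQueueMs: number;
  private readonly nextFreeAt = new Map<string, number>();

  constructor(ratePerSec: number, maxQueueMs: number) {
    this.ratePerSec = ratePerSec;
    this.maxQueueMs = maxQueueMs;
  }

  reserve(key: string, cost: number = 1, now: number = Date.now()): SketchReservation {
    const intervalMs = 1000 / this.ratePerSec;
    // The slot starts now if the key is idle, otherwise when earlier work drains.
    const start = Math.max(now, this.nextFreeAt.get(key) ?? now);
    const delayMs = start - now;
    if (delayMs > this.maxQueueMs) {
      // Queue budget exceeded: reject without booking a slot.
      return { accepted: false, delayMs };
    }
    this.nextFreeAt.set(key, start + cost * intervalMs);
    return { accepted: true, delayMs };
  }
}
```

A schedule-style helper would await a timer for delayMs on an accepted reservation and throw on a rejected one, which mirrors the split between reserve and schedule described above.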
