
Strategies

Ameya Borkar edited this page May 26, 2026 · 2 revisions

Choosing a strategy

Pick one and pass it to rateLimit({ strategy }):

Goal | Strategy
Good general default: tiny state, smooth pacing, controlled bursts | gcra({ limit, periodMs, burst? })
Client-friendly "tokens remaining", controlled bursts | tokenBucket({ capacity, refillPerSec })
Cheapest coarse cap (allows up to 2× across a boundary, by design) | fixedWindow({ limit, windowMs })
Near-exact rolling window at any limit, bounded memory | slidingWindow({ limit, windowMs, buckets? })
Exact "N in the last X" at low/moderate limits | slidingWindowLog({ limit, windowMs })
Shape/queue outbound calls to a fixed rate (delays, doesn't reject) | leakyBucket({ ratePerSec, maxQueueMs })
Protect a service from overload when the right rate is unknown | adaptiveConcurrency({ ... })
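
The "2× across a boundary" behaviour of a fixed window is easiest to see with numbers. The following is a minimal sketch of a fixed-window counter, not this library's implementation; FixedWindowState, fixedWindowAllow, and countAdmitted are hypothetical names used only for illustration.

```typescript
// Hypothetical per-key state for a fixed-window counter.
interface FixedWindowState {
  windowStart: number; // start of the window the counter belongs to (ms)
  count: number;       // requests admitted in that window
}

// Admit a request if the current window still has room.
function fixedWindowAllow(
  state: FixedWindowState,
  nowMs: number,
  limit: number,
  windowMs: number,
): boolean {
  // Windows are aligned to multiples of windowMs.
  const windowStart = Math.floor(nowMs / windowMs) * windowMs;
  if (windowStart !== state.windowStart) {
    // A new window has begun: the counter resets to zero.
    state.windowStart = windowStart;
    state.count = 0;
  }
  if (state.count >= limit) return false;
  state.count += 1;
  return true;
}

// Replay a list of request timestamps and count how many are admitted.
function countAdmitted(timestamps: number[], limit: number, windowMs: number): number {
  const state: FixedWindowState = { windowStart: -1, count: 0 };
  return timestamps.filter((t) => fixedWindowAllow(state, t, limit, windowMs)).length;
}
```

With limit 5 and windowMs 1000, five requests at t = 999 ms and five more at t = 1000 ms are all admitted: ten requests within about one millisecond, which is twice the nominal limit. That is the trade-off the table refers to.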

Notes

  • gcra stores a single number (the theoretical arrival time); burst defaults to limit. It is the recommended default: smooth pacing, controlled bursts, one timestamp of state per key.
  • slidingWindow: buckets defaults to 10 (error ≈ 1/buckets of the window); buckets: 1 is the classic estimator. slidingWindowLog is exact but O(limit) memory per key.
  • leakyBucket and adaptiveConcurrency build a Shaper / concurrency guard, not a Limiter — see Advanced limiting → Backpressure & shaping.
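
For readers unfamiliar with the "classic estimator", the sketch below shows the usual two-counter approximation of a rolling window. It assumes that buckets: 1 refers to this well-known weighted-previous-window technique; the function name slidingWindowEstimate and its parameters are hypothetical and are not taken from this library.

```typescript
// Estimate how many requests fall inside the trailing window, given only the
// count of the previous fixed window and the count of the current one.
function slidingWindowEstimate(
  prevCount: number,          // requests counted in the previous window
  currCount: number,          // requests counted so far in the current window
  elapsedInCurrentMs: number, // time since the current window started
  windowMs: number,
): number {
  // Fraction of the previous window that still overlaps the trailing window.
  const elapsed = Math.min(Math.max(elapsedInCurrentMs, 0), windowMs);
  const prevWeight = 1 - elapsed / windowMs;
  // Assumes the previous window's requests were spread evenly over it.
  return prevCount * prevWeight + currCount;
}
```

For example, with 80 requests in the previous window, 30 so far in the current one, and 250 ms elapsed of a 1000 ms window, the estimate is 80 × 0.75 + 30 = 90. The even-spread assumption is where the approximation error comes from, and using more, smaller buckets reduces it.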

Why GCRA by default

GCRA (the Generic Cell Rate Algorithm) paces requests smoothly rather than resetting on a window boundary, so it has no "2× burst across the boundary" pathology that a fixed window has by design. It needs only a single timestamp of state per key, which keeps memory bounded and makes the Redis/Postgres encodings trivial (one number to store). For most APIs, gcra is the right call; reach for the others when you specifically need their semantics (e.g. a client-facing "tokens remaining" display → tokenBucket, or exact trailing-window counting → slidingWindowLog).
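
To make the "single timestamp of state" concrete, here is a minimal sketch of the GCRA decision as a pure function. It is an illustration under stated assumptions rather than this library's code: GcraConfig, GcraDecision, and gcraCheck are hypothetical names, and burst is assumed to mean the number of requests that may be admitted back-to-back.

```typescript
interface GcraConfig {
  limit: number;    // requests allowed per period
  periodMs: number; // length of the period in milliseconds
  burst?: number;   // assumed: max back-to-back requests; defaults to limit
}

interface GcraDecision {
  allowed: boolean;
  newTat: number;       // value to persist for the key (unchanged on rejection)
  retryAfterMs: number; // 0 when allowed
}

// tat is the stored theoretical arrival time, or undefined for a new key.
function gcraCheck(tat: number | undefined, nowMs: number, cfg: GcraConfig): GcraDecision {
  const burst = cfg.burst ?? cfg.limit;
  // Ideal spacing between requests at the sustained rate.
  const emissionIntervalMs = cfg.periodMs / cfg.limit;
  // How far ahead of the ideal schedule a client may run.
  const toleranceMs = emissionIntervalMs * (burst - 1);
  // An idle or new key is treated as if its schedule starts now.
  const currentTat = Math.max(tat ?? nowMs, nowMs);
  const earliestAllowedMs = currentTat - toleranceMs;

  if (nowMs < earliestAllowedMs) {
    // Too far ahead of schedule: reject and leave the stored state untouched.
    return { allowed: false, newTat: tat ?? nowMs, retryAfterMs: earliestAllowedMs - nowMs };
  }
  // Admit and push the schedule forward by one interval.
  return { allowed: true, newTat: currentTat + emissionIntervalMs, retryAfterMs: 0 };
}
```

With limit 10, periodMs 1000, and burst 3, three requests arriving at the same instant are admitted, the fourth is rejected with a retry hint of 100 ms, and a request 100 ms later is admitted again. The only value persisted between calls is newTat, which is why a single number per key is enough for a Redis or Postgres backend.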

Composing strategies

For tiered burst + sustained limits (e.g. 10/sec and 1000/hour), compose two GCRA limiters and allow only if both pass — or use multi-dimensional limits to evaluate several axes in a single round trip.
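
The "allow only if both pass" rule has one subtlety: a request rejected by one limiter should not consume budget from the other. The sketch below shows one way to handle that with a check-then-commit split. It is an in-memory, single-process illustration; TwoPhaseLimiter, makeGcraLimiter, and allowAll are hypothetical names, not this library's API, and burst is assumed equal to limit.

```typescript
// Hypothetical two-phase interface: test without side effects, then commit.
interface TwoPhaseLimiter {
  wouldAllow(nowMs: number): boolean;
  commit(nowMs: number): void;
}

// Compact single-key GCRA limiter (burst assumed equal to limit).
function makeGcraLimiter(limit: number, periodMs: number): TwoPhaseLimiter {
  const intervalMs = periodMs / limit;
  const toleranceMs = intervalMs * (limit - 1);
  let tat = 0; // theoretical arrival time, the only state kept
  return {
    wouldAllow: (nowMs) => Math.max(tat, nowMs) - toleranceMs <= nowMs,
    commit: (nowMs) => {
      tat = Math.max(tat, nowMs) + intervalMs;
    },
  };
}

// Admit only if every limiter agrees; consume from all of them or from none.
function allowAll(limiters: TwoPhaseLimiter[], nowMs: number): boolean {
  if (!limiters.every((l) => l.wouldAllow(nowMs))) return false;
  limiters.forEach((l) => l.commit(nowMs));
  return true;
}
```

The example from the text would then be allowAll([makeGcraLimiter(10, 1_000), makeGcraLimiter(1000, 3_600_000)], Date.now()). Against a shared store, the check and commit steps would need to run atomically, which is one reason to prefer evaluating all axes in a single round trip.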
