
Middleware Integration

Ameya Borkar edited this page Jun 10, 2026 · 2 revisions

Middleware integration for unifiedAdmission + adaptiveConcurrency

Added in 0.9.2 (2026-05-29).

Both unifiedAdmission (0.9.0) and adaptiveConcurrency (pre-0.8) expose a release() lifecycle callback that must be invoked exactly once when the request lifecycle ends. Miss any one of the hooks (finish, close, or the error path) and concurrency slots leak silently until the adaptive limit collapses to zero and your server stops admitting anything.

Before 0.9.2 you had to wire release() to your framework's request lifecycle by hand. As of 0.9.2 the library does it for you.

What's in the box

22 new exports across 11 frameworks: one unifiedAdmission adapter and one adaptiveConcurrency adapter per framework. You pass a prebuilt UnifiedAdmitter (from unifiedAdmission(...)) or ConcurrencyGuard (from adaptiveConcurrency(...)) to the adapter, and the adapter owns the release.

Framework | unifiedAdmission | adaptiveConcurrency
express | expressUnifiedAdmission | expressAdaptiveConcurrency
fastify | fastifyUnifiedAdmission | fastifyAdaptiveConcurrency
koa | koaUnifiedAdmission | koaAdaptiveConcurrency
nest | nestUnifiedAdmissionMiddleware | nestAdaptiveConcurrencyMiddleware
hono | honoUnifiedAdmission | honoAdaptiveConcurrency
fetch | withUnifiedAdmission | withAdaptiveConcurrency
next | nextUnifiedAdmission | nextAdaptiveConcurrency
remix | remixUnifiedAdmission | remixAdaptiveConcurrency
sveltekit | sveltekitUnifiedAdmission | sveltekitAdaptiveConcurrency
elysia | elysiaUnifiedAdmission | elysiaAdaptiveConcurrency
trpc | trpcUnifiedAdmission | trpcAdaptiveConcurrency

The express pattern (canonical)

import express from "express";
import {
  expressUnifiedAdmission,
  unifiedAdmission,
  adaptiveConcurrency,
  rateLimit,
  gcra,
  tokenBucket,
} from "throttlekit";

const admitter = unifiedAdmission({
  rate: rateLimit({ strategy: gcra({ limit: 60, periodMs: 60_000 }) }),
  concurrency: adaptiveConcurrency({ minLimit: 4, maxLimit: 128 }),
  cost: rateLimit({ strategy: tokenBucket({ capacity: 100_000, refillPerSec: 1667 }) }),
});

const app = express();
app.use(expressUnifiedAdmission({ admitter, dropOn5xx: false }));
app.post("/completions", (req, res) => res.json({ ok: true }));

The middleware wires res.on("finish") + res.on("close") with the first-fire-wins pattern:

  • close before finish ⇒ release({dropped: true}): client hangup, handler threw without an error middleware, or a server-side timeout.
  • finish first ⇒ release({dropped: false}) (normal completion), or dropped: true when dropOn5xx: true and the status is 5xx.

The second event is a no-op (idempotent release).
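The first-fire-wins wiring described above can be sketched roughly as follows. This is an illustration only, not the library's source: the Release type, the wireRelease helper, and the FakeResponse stand-in are hypothetical names introduced here, and the sketch assumes release takes an object with a dropped flag as shown in the bullets.

```typescript
// Hypothetical release signature, modelled on the release({dropped}) calls above.
type Release = (outcome: { dropped: boolean }) => void;

// Structural stand-in for the parts of an HTTP response this sketch touches.
interface ResponseLike {
  statusCode: number;
  once(event: "finish" | "close", listener: () => void): unknown;
}

// Wires both events; whichever fires first decides `dropped`, the other is ignored.
function wireRelease(res: ResponseLike, release: Release, dropOn5xx = false): void {
  let released = false;
  const settle = (dropped: boolean): void => {
    if (released) return; // second event: no-op
    released = true;
    release({ dropped });
  };
  res.once("finish", () => settle(dropOn5xx && res.statusCode >= 500));
  res.once("close", () => settle(true));
}

// Tiny fake response so the sketch runs without a server (illustration only).
class FakeResponse implements ResponseLike {
  statusCode = 200;
  private listeners = new Map<string, Array<() => void>>();

  once(event: "finish" | "close", listener: () => void): void {
    const list = this.listeners.get(event) ?? [];
    list.push(listener);
    this.listeners.set(event, list);
  }

  emit(event: "finish" | "close"): void {
    const list = this.listeners.get(event) ?? [];
    this.listeners.delete(event); // "once" semantics
    for (const fn of list) fn();
  }
}

const outcomes: boolean[] = [];
const res = new FakeResponse();
wireRelease(res, ({ dropped }) => outcomes.push(dropped));
res.emit("finish"); // normal completion: release({dropped: false})
res.emit("close"); // ignored, release already ran
```

The guard flag is what makes the release idempotent: Node emits close after finish on a normal response, so without it every successful request would release twice.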

The web-platform pattern (fetch, next, remix, sveltekit)

These frameworks return a Response — the lifecycle is the body stream. The adapter wraps the body so release fires when the stream drains, errors, or is cancelled.

import { withUnifiedAdmission, unifiedAdmission, adaptiveConcurrency } from "throttlekit";

const admitter = unifiedAdmission({
  concurrency: adaptiveConcurrency({ minLimit: 4, maxLimit: 128 }),
});

export default {
  fetch: withUnifiedAdmission(
    async (request) => new Response("ok"),
    { admitter },
  ),
};
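To make the body-wrapping idea concrete, here is a minimal sketch of how a Response body could be wrapped so that release fires on drain, error, or cancel. It uses only standard Web Streams APIs; releaseOnBodyEnd and the Release type are hypothetical names for illustration, the dropOn5xx rule is left out for brevity, and the real adapter may be implemented differently.

```typescript
// Hypothetical release signature, modelled on the release({dropped}) calls in this page.
type Release = (outcome: { dropped: boolean }) => void;

function releaseOnBodyEnd(response: Response, release: Release): Response {
  let released = false;
  const settle = (dropped: boolean): void => {
    if (released) return; // idempotent: only the first outcome counts
    released = true;
    release({ dropped });
  };

  // No body (e.g. 204): the lifecycle ends as soon as the response is returned.
  if (response.body === null) {
    settle(false);
    return response;
  }

  const reader = response.body.getReader();
  const wrapped = new ReadableStream<Uint8Array>({
    async pull(controller) {
      try {
        const result = await reader.read();
        if (result.done) {
          settle(false); // body drained normally
          controller.close();
        } else {
          controller.enqueue(result.value);
        }
      } catch (err) {
        settle(true); // upstream stream errored mid-flight
        controller.error(err);
      }
    },
    cancel(reason) {
      settle(true); // consumer cancelled the body
      return reader.cancel(reason);
    },
  });

  return new Response(wrapped, {
    status: response.status,
    statusText: response.statusText,
    headers: response.headers,
  });
}
```

A handler wrapper would call releaseOnBodyEnd on whatever Response the user's handler returns, so the slot stays held for the whole time the body is streaming rather than only until the handler resolves.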

The wrap pattern (hono, trpc, elysia)

These adapters wrap the user's next() or handler body in try/finally. release fires with dropped: true when the wrapped call throws; a normal return yields dropped: false, subject to the dropOn5xx rule.

// Hono (assumes an existing Hono `app` and an `admitter` built as in the earlier examples)
import { honoUnifiedAdmission } from "throttlekit/hono";
app.use("*", honoUnifiedAdmission({ admitter }));

// tRPC
import { trpcUnifiedAdmission } from "throttlekit/trpc";
const ratelimited = t.procedure.use(
  t.middleware(trpcUnifiedAdmission<MyCtx>({
    admitter,
    key: ({ ctx }) => ctx.user.id,
  })),
);

// Elysia (wrap function)
import { elysiaUnifiedAdmission } from "throttlekit/elysia";
const admit = elysiaUnifiedAdmission({ admitter });
app.get("/", (ctx) => admit(ctx, async () => "ok"));
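The try/finally shape these adapters rely on can be sketched in a few lines. runWithRelease and the Release type are hypothetical names used only for this illustration, and the dropOn5xx rule for normal returns is omitted to keep the core pattern visible.

```typescript
// Hypothetical release signature, modelled on the release({dropped}) calls in this page.
type Release = (outcome: { dropped: boolean }) => void;

// Runs `next` and releases exactly once, whether it resolves or throws.
async function runWithRelease<T>(release: Release, next: () => Promise<T>): Promise<T> {
  let thrown = true; // assume failure until next() resolves
  try {
    const result = await next();
    thrown = false;
    return result;
  } finally {
    release({ dropped: thrown }); // single exit point for both paths
  }
}
```

Because the release sits in finally, the error still propagates to the framework's own error handling; the wrapper only observes the outcome.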

dropped decision matrix

dropped is a property of the response state, not the handler outcome:

Event | dropped | Why
Response finished normally (any status) | false (default) | The runtime delivered a response; lifecycle completed.
Response finished, status >= 500, dropOn5xx: true | true | User opted in to treating 5xx as overload.
Handler threw, error middleware wrote response | false | finish fired; the response was delivered.
Handler threw, no error middleware wrote anything | true | close fires when the socket times out; no finish.
Client hung up mid-stream | true | close fires without finish.
Server-side timeout fires | true | Triggers close.
Consumer cancelled response body (fetch-style) | true | Stream cancel callback.
Stream errored mid-flight | true | Stream pump's catch.
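Read as code, the matrix above collapses to a small pure function. The LifecycleEvent type and decideDropped are hypothetical names introduced for this sketch; they are not part of the documented API and only restate the table.

```typescript
// Hypothetical event model covering the rows of the matrix above.
type LifecycleEvent =
  | { kind: "finish"; status: number } // response fully written
  | { kind: "close" } // socket closed without finish: hangup, timeout, unhandled throw
  | { kind: "stream-cancel" } // consumer cancelled the response body
  | { kind: "stream-error" }; // body stream errored mid-flight

function decideDropped(event: LifecycleEvent, dropOn5xx = false): boolean {
  switch (event.kind) {
    case "finish":
      // Delivered responses count as completed unless the user opted in to 5xx-as-overload.
      return dropOn5xx && event.status >= 500;
    case "close":
    case "stream-cancel":
    case "stream-error":
      return true;
  }
}

const delivered = decideDropped({ kind: "finish", status: 503 }); // false
const overload = decideDropped({ kind: "finish", status: 503 }, true); // true
const hangup = decideDropped({ kind: "close" }); // true
```

Note that the handler's own outcome never appears as an input: a handler that throws but has its error rendered by middleware arrives here as a finish event.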

Forward-compat

The adapters accept any ConcurrencyGuard, not just the in-process implementation. distributedAdaptiveConcurrency (0.10.0) drops in behind the same middleware — its DistributedConcurrencyGuard keeps acquire() synchronous, so passing it as the guard gives every route a fleet-shared ceiling with no call-site change:

import { expressAdaptiveConcurrency, distributedAdaptiveConcurrency, RedisConcurrencyCoordinator } from "throttlekit";

const guard = distributedAdaptiveConcurrency({
  coordinator: new RedisConcurrencyCoordinator({ client, aggregate: "median" }),
  nodeId: process.env.HOSTNAME!,
  key: "inference-cluster",
});
app.use(expressAdaptiveConcurrency({ guard })); // same adapter, now a fleet-wide bound

See Distributed adaptive concurrency. Behind a ThrottleKit server the same fleet-shared ceiling is reachable over the existing Admit RPC with no client change — see Scaling & the Fleet.
