Monitoring and the Lens

Ameya Borkar edited this page Jun 4, 2026 · 5 revisions

ThrottleKit Lens is a built-in, zero-dependency, read-only monitoring dashboard. It gives every limiter the full operational board — and one view no other rate-limiter dashboard can render: live binding-axis attribution.

throttlekit-lens is @experimental: it lives outside ThrottleKit's 1.x SemVer freeze, so its surface and snapshot shapes may change in a minor release. It builds on the core's @experimental admissionTap / withAdmissionAnalytics primitives and needs throttlekit >= 1.1.0.

Already emitting OpenTelemetry? Keep doing that — the Lens doesn't replace your metrics backend (see Operations for the OTel layer). The Lens is the no-backend, see-it-now view, and it carries a signal Prometheus/Grafana structurally can't: which axis bound each denial.

The hero — which axis is throttling you, right now

Most rate-limit dashboards can tell you that requests were denied. Because ThrottleKit composes rate × concurrency × cost in a single unifiedAdmission decision, the Lens can tell you which constraint actually bound each denial (rate, concurrency, cost, or the joint-LP policy lane), rendered as a live Sankey from policy → binding lane → top-denied keys. Click a denial and a drawer shows the exact per-axis Decision (remaining / limit / retryAfterMs / resetAt) that produced it: the literal "why was this throttled, with numbers."

This is structural, not a UI trick. bindingAxis is minted end-to-end inside unified admission but is exposed only as an OpenTelemetry span attribute — never a metric label — so no Prometheus/Grafana board can break denials down by axis. The Lens is a purpose-built live view that needs no observability backend and carries the numeric per-axis decision the span omits. (Honest closest analog with your existing tools: faceting the throttlekit.binding_axis span in an APM trace view.)
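To make the policy → binding lane → top-denied keys flow concrete, here is a minimal sketch of how denial records could be folded into Sankey links. The `Denial` and `Link` shapes, the lane names, and `sankeyLinks` are hypothetical, chosen for illustration; they are not the Lens's actual snapshot format.

```typescript
// Hypothetical shapes for illustration only; not the Lens's real snapshot types.
type Lane = "rate" | "concurrency" | "cost" | "policy";
type Denial = { policy: string; bindingAxis: Lane; key: string };
type Link = { source: string; target: string; value: number };

// Fold denial records into weighted links: policy -> lane, then lane -> key.
function sankeyLinks(denials: Denial[]): Link[] {
  const counts = new Map<string, Link>();
  const bump = (source: string, target: string): void => {
    const id = `${source}|${target}`;
    const link = counts.get(id) ?? { source, target, value: 0 };
    link.value += 1;
    counts.set(id, link);
  };
  for (const d of denials) {
    bump(`policy:${d.policy}`, `lane:${d.bindingAxis}`);
    bump(`lane:${d.bindingAxis}`, `key:${d.key}`);
  }
  return Array.from(counts.values());
}

const links = sankeyLinks([
  { policy: "checkout", bindingAxis: "cost", key: "tenant-a" },
  { policy: "checkout", bindingAxis: "cost", key: "tenant-b" },
  { policy: "checkout", bindingAxis: "rate", key: "tenant-a" },
]);
```

Each link's `value` is a plain count of denials, so the first hop answers "which lane is binding for this policy" and the second answers "which keys feel it".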

Universal — the full board for every limiter

The axis lane is the premium layer for unified-admission users; the dashboard itself works for everyone. A plain rateLimit(), quota, twoTier, token-budget, or concurrency guard gets the whole board plus "why throttled" attribution by policy/limiter + hot key — sourced from the same tapDecisions stream and withAnalytics top-K that already ship in the core. A single-axis limiter simply has one axis, so there is nothing to decompose; the Lens says so rather than implying otherwise.

Three ways to run it

Register what you already use with a hub; it returns tapped wrappers to use in their place. The taps are synchronous, exception-swallowing, and O(1) — the dashboard can never perturb your control path or change a decision.
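The "synchronous, exception-swallowing" property can be illustrated with a small self-contained sketch. Everything here (`Decision`, `Limiter`, `tapLimiter`, `toyLimiter`) is a hypothetical stand-in, not ThrottleKit's real types or the hub's actual wrapper; it only shows the shape of a tap that observes a decision without being able to change it.

```typescript
// Hypothetical shapes for illustration; not ThrottleKit's real types.
type Decision = { allowed: boolean; remaining: number };
type Limiter = { check(key: string): Decision };
type Observer = (name: string, key: string, decision: Decision) => void;

// Wrap a limiter so an observer sees every decision but can never alter it.
function tapLimiter(name: string, inner: Limiter, observe: Observer): Limiter {
  return {
    check(key: string): Decision {
      const decision = inner.check(key); // the control path runs first, untouched
      try {
        observe(name, key, decision); // synchronous call into the observer
      } catch {
        // Swallow observer failures: monitoring must not break admission.
      }
      return decision; // always the inner limiter's own decision
    },
  };
}

// A toy limiter that allows the first `limit` calls per key.
function toyLimiter(limit: number): Limiter {
  const used = new Map<string, number>();
  return {
    check(key: string): Decision {
      const n = (used.get(key) ?? 0) + 1;
      used.set(key, n);
      return { allowed: n <= limit, remaining: Math.max(0, limit - n) };
    },
  };
}

const seen: boolean[] = [];
const tapped = tapLimiter("api", toyLimiter(2), (_name, _key, d) => {
  seen.push(d.allowed);
  throw new Error("observer bug"); // deliberately faulty observer
});
const results = ["a", "a", "a"].map((k) => tapped.check(k).allowed);
```

Even with an observer that throws on every call, the wrapped limiter returns exactly what the inner limiter decided.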

1. Mount it in your own app (no extra port)

import { createLensHub, lensHandler } from "throttlekit-lens";
import { rateLimit, gcra, unifiedAdmission } from "throttlekit";

const hub = createLensHub();
const api = hub.trackLimiter("api", rateLimit({ strategy: gcra({ limit: 100, periodMs: 60_000 }) }));
// `rate`, `concurrency`, and `cost` are your own axis configurations, defined elsewhere (see Unified admission).
const checkout = hub.trackAdmitter("checkout", unifiedAdmission({ rate, concurrency, cost }));

// Express: mount the read-only handler on your existing `app` at a private path, behind a token:
const handler = lensHandler(hub, { basePath: "/__throttlekit", token: process.env.LENS_TOKEN });
app.use("/__throttlekit", (req, res) => handler(req, res));

// ...then use `api` / `checkout` exactly like the originals.

2. Run a standalone sidecar

import { createLensHub, serveLens } from "throttlekit-lens";

const hub = createLensHub();
// ...track your limiters / admitters / guards...
const lens = await serveLens(hub, { port: 9090 }); // loopback by default
console.log(`Lens at ${lens.url}`);

3. Get it for free on the service door

throttlekit-server serves the Lens by default, bound to loopback. No code is needed, and it works for Python/Go/any-language clients too (it watches the server's decisions, whatever drives them):

throttlekit-server --config .throttlekit.yaml
#  → gRPC on :50051  +  Lens dashboard on http://127.0.0.1:9090

--lens off disables it; --lens-host / --lens-port move or expose it (a non-loopback host warns and wants a --lens-token); --lens-aggregator <url> pushes this node's snapshot to a fleet aggregator.

What's on the board

  • Admission attribution (hero) — the policy → binding-lane → top-denied-keys Sankey + a stacked-area deny-rate-by-axis timeline + the click-to-snapshot drawer.
  • Throughput & outcome — requests/s split allow/deny.
  • Deny rate — overall, by policy/limiter for everyone, and the per-axis split for unified admitters.
  • Top keys/tenants — Space-Saving top-K heavy hitters (requested + denied), each row carrying a policy (and binding-axis) chip.
  • Latency — admit-/check-path latency over a small ring.
  • Concurrency health: limit vs inflight, no-load RTT, share / lGlobal / nodes, the fenced flag, and a live self-fence event feed.
  • Guarantee panel (below).
  • Store / fleet health — backend, reachability, fail-mode (and, in server mode, lease-table size + reclaim count).

The Guarantee panel

ThrottleKit's headline is a machine-checked, fleet-size-independent overshoot bound. The Guarantee panel makes it operational: per key, live admitted-this-window vs the design-time-computed ceiling Limit + N·(B−1) (collapsing to exactly Limit under windowCoupled), rendered as headroom to a proven line — plus live Σinflight ≤ L PASS/FAIL chips, each linking to the TLA⁺ spec.

It is headroom-to-a-known-line, not a "proof is holding" needle: the bound and invariants are design-time proofs (a static badge + chips that link to the spec), and approaching the line is real model-predicted overshoot or a misconfiguration — never a claim that the Lens is continuously model-checking.
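As a worked example of the headroom arithmetic, the sketch below evaluates the ceiling Limit + N·(B−1) and the Σinflight ≤ L check. It treats N, B, and L as opaque integers because this page does not define them; the function and field names are hypothetical, not the Lens's API.

```typescript
// Hypothetical names; N, B, and L are treated as opaque integers from the formulas above.
type BoundInputs = { limit: number; n: number; b: number; windowCoupled: boolean };

// Ceiling = Limit + N * (B - 1), collapsing to exactly Limit under windowCoupled.
function overshootCeiling({ limit, n, b, windowCoupled }: BoundInputs): number {
  return windowCoupled ? limit : limit + n * (b - 1);
}

// Headroom is the distance from admitted-this-window to the ceiling (negative = over the line).
function headroom(admittedThisWindow: number, inputs: BoundInputs): number {
  return overshootCeiling(inputs) - admittedThisWindow;
}

// The inflight chip: the sum of inflight counts must not exceed L.
function inflightChip(inflight: number[], l: number): "PASS" | "FAIL" {
  const total = inflight.reduce((sum, x) => sum + x, 0);
  return total <= l ? "PASS" : "FAIL";
}

const inputs: BoundInputs = { limit: 100, n: 4, b: 8, windowCoupled: false };
const ceiling = overshootCeiling(inputs); // 100 + 4 * (8 - 1) = 128
const remaining = headroom(110, inputs); // 128 - 110 = 18
```

With Limit = 100, N = 4, and B = 8, the line sits at 128, so a key that has admitted 110 this window has 18 of headroom. Under windowCoupled the line is 100 and the same key would read as 10 over.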

Fleet view

Each instance serves its own per-process Lens. For a fleet-global view, point every node at one aggregator — it merges additive counters (allow/deny, per-lane denials) across nodes and re-tops the heavy hitters, into a single mode:"fleet" snapshot the same UI renders:

import { createLensAggregator, serveLensAggregator, pushSnapshots } from "throttlekit-lens";

// Aggregator host:
const agg = createLensAggregator();
await serveLensAggregator(agg, { port: 9091, token: process.env.LENS_TOKEN });

// Each node (or `throttlekit-server --lens-aggregator http://aggregator:9091`):
pushSnapshots(hub, { url: "http://aggregator:9091", token: process.env.LENS_TOKEN });

Best-effort and eventually-consistent: the fleet view reflects the last snapshot each live node pushed (stale nodes are evicted). The top-K merge is an honest additive merge of the per-node Space-Saving lists — it never drops a true fleet heavy hitter.
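The merge described above can be sketched as summing the additive counters and re-ranking the per-node top lists. The `NodeSnapshot` and `FleetSnapshot` shapes and `mergeFleet` are assumptions made for illustration; the real snapshot format and any error accounting the aggregator performs are not shown on this page.

```typescript
// Hypothetical snapshot shapes; the real Lens snapshot format may differ.
type KeyCount = { key: string; count: number };
type NodeSnapshot = {
  allowed: number;
  denied: number;
  deniedByLane: Record<string, number>;
  topDenied: KeyCount[];
};
type FleetSnapshot = NodeSnapshot & { mode: "fleet" };

// Sum additive counters across nodes, then re-top the merged heavy hitters.
function mergeFleet(nodes: NodeSnapshot[], k: number): FleetSnapshot {
  const fleet: FleetSnapshot = { mode: "fleet", allowed: 0, denied: 0, deniedByLane: {}, topDenied: [] };
  const byKey = new Map<string, number>();
  for (const node of nodes) {
    fleet.allowed += node.allowed;
    fleet.denied += node.denied;
    for (const lane of Object.keys(node.deniedByLane)) {
      fleet.deniedByLane[lane] = (fleet.deniedByLane[lane] ?? 0) + (node.deniedByLane[lane] ?? 0);
    }
    for (const { key, count } of node.topDenied) {
      byKey.set(key, (byKey.get(key) ?? 0) + count);
    }
  }
  fleet.topDenied = Array.from(byKey, ([key, count]) => ({ key, count }))
    .sort((a, b) => b.count - a.count)
    .slice(0, k);
  return fleet;
}

const merged = mergeFleet(
  [
    { allowed: 90, denied: 10, deniedByLane: { rate: 6, cost: 4 }, topDenied: [{ key: "a", count: 7 }, { key: "b", count: 3 }] },
    { allowed: 50, denied: 6, deniedByLane: { rate: 2, concurrency: 4 }, topDenied: [{ key: "b", count: 5 }, { key: "c", count: 1 }] },
  ],
  2,
);
```

Because every counter is additive, per-lane denials still sum to the total denied count after the merge, provided each node's snapshot satisfied that on its own.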

Security & defaults

  • On by default only on throttlekit-server, and only on loopback. The embedded handler is opt-in (you choose where to mount it — it's your process); the sidecar binds to 127.0.0.1 by default.
  • Exposing beyond loopback wants auth. A non-loopback host without TLS or a token logs a loud warning (mirroring the insecure-gRPC warning). Pass { token } (constant-time-compared Authorization: Bearer …) and/or { tls } (HTTPS, or mTLS with a caPath).
  • Strictly read-only. Only GET is served; there are no mutation endpoints. The board exposes keys/tenants, so on a shared host loopback isn't private — hence the off switch (--lens off) and the expose-requires-auth rule.
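For readers wondering what a constant-time Bearer comparison looks like, here is a minimal sketch using Node's `crypto.timingSafeEqual`. `bearerMatches` is a hypothetical helper, not the Lens's internal check; hashing both sides first is one common way to get equal-length buffers, chosen here as an assumption.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Hypothetical helper: compare an Authorization header against the expected token.
function bearerMatches(authorizationHeader: string | undefined, expectedToken: string): boolean {
  const prefix = "Bearer ";
  if (!authorizationHeader || !authorizationHeader.startsWith(prefix)) return false;
  const presented = authorizationHeader.slice(prefix.length);
  // Hash both sides so the buffers have equal length;
  // timingSafeEqual throws if the two inputs differ in length.
  const a = createHash("sha256").update(presented).digest();
  const b = createHash("sha256").update(expectedToken).digest();
  return timingSafeEqual(a, b);
}
```

A plain `===` on strings can return as soon as the first differing character is found, which leaks timing information; comparing fixed-length digests with `timingSafeEqual` avoids that.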

Honest scope (the non-claims)

  • The binding-axis lane needs unifiedAdmission; a single-axis rateLimit() has nothing to decompose (the board still works — it just shows policy/key attribution).
  • Numbers are eventually-consistent and per-window; top-K is Space-Saving (over-estimates, never misses a true heavy hitter).
  • The Guarantee panel is live overshoot by accumulation vs a computed line, rendered as headroom — not a native admitted_this_window field, and not a live model-checker.
  • This is not bit-exact replay; the Lens streams live decisions. Reproducing a past incident exactly is a future direction (deterministic replay against a candidate policy).
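The "over-estimates, never misses a true heavy hitter" behaviour of Space-Saving can be seen in a small sketch of the textbook algorithm. This is not ThrottleKit's implementation: `createSpaceSaving` is a hypothetical name, and the linear scan for the minimum is a simplification (production versions typically use a structure that finds the minimum in constant time).

```typescript
// Textbook Space-Saving sketch; names are hypothetical, not ThrottleKit's API.
type Counter = { key: string; count: number; error: number };

function createSpaceSaving(capacity: number) {
  // Assumes capacity >= 1.
  const counters = new Map<string, Counter>();
  return {
    offer(key: string): void {
      const hit = counters.get(key);
      if (hit) {
        hit.count += 1; // already monitored: exact increment
        return;
      }
      if (counters.size < capacity) {
        counters.set(key, { key, count: 1, error: 0 });
        return;
      }
      // Table full: evict the minimum and let the newcomer inherit its count.
      const all = Array.from(counters.values());
      let min = all[0];
      for (const c of all) {
        if (c.count < min.count) min = c;
      }
      counters.delete(min.key);
      // `error` records how much of `count` may be over-estimate.
      counters.set(key, { key, count: min.count + 1, error: min.count });
    },
    top(k: number): Counter[] {
      return Array.from(counters.values())
        .sort((a, b) => b.count - a.count)
        .slice(0, k);
    },
  };
}

const sketch = createSpaceSaving(2);
for (const key of ["a", "a", "a", "a", "b", "c", "c"]) sketch.offer(key);
const heavy = sketch.top(2);
```

In this stream, `c` truly occurs twice but is reported with count 3 and error 1: the estimate is an upper bound, and `count - error` is a lower bound on the true frequency.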

Built on

Two @experimental core primitives, both zero-dep and read-only:

  • admissionTap(admitter, onAdmission) — the multi-axis sibling of tapDecisions; fires once per completed unified admission with the combined decision, the binding axis, and the per-axis snapshot.
  • withAdmissionAnalytics(admitter, opts) — the lane-segmented fork of withAnalytics: allow/deny counters and Space-Saving top-K, partitioned by binding lane (with Σ deniedByLane === denied).

A plain rateLimit() feeds the universal board through the existing tapDecisions + withAnalytics; a unifiedAdmission additionally lights up the axis lane through these two.


See also: Operations (OTel, the analytics tap, headers, failure modes) · Unified admission (the rate × concurrency × cost decision the axis lane attributes) · Distributed & provable (the bound the Guarantee panel tracks) · the package README.
