Monitoring and the Lens

Monitoring — ThrottleKit Lens

ThrottleKit Lens is the project's built-in monitoring dashboard: a zero-dependency, read-only board that runs in your terminal (throttlekit-server --config .throttlekit.yaml --tui), with no browser and no metrics backend. It gives every policy the full operational board across eight tabbed views, plus one view no other rate-limiter dashboard renders: live binding-axis attribution. Need it remotely, or from another language? The same state is exposed as a read-only gRPC service, the Monitor door (below).

throttlekit-server --config .throttlekit.yaml --tui
#  → gRPC on :50051  +  a live dashboard in your terminal

ThrottleKit · memory · fail-open · 4 policies                 12:04:01 · 60s window
ALLOW 1.2k   DENY 84   7% deny   ▁▂▃▅▇▅▃▂
 Overview │ Latency │ Fairness │ Capacity │ Guarantee │ Cost Room │ Replay │ Plan

─ BINDING AXIS  /  TOP DENIED KEYS ──────────────────────────────────────────────
unified-api  10 denied            user-42     ███████ 312
 rate        ████████████ 61%     tenant-aci  ████ 180
 concurrency █████ 24%            ip-10.0.0.7 ██ 96
 cost        ██ 10%  policy █ 5%

─ CONCURRENCY ───────────────────────────────────────────────────────────────────
checkout    6/8     rtt 12ms      unified-api 4/4  FENCED

─ DENIALS (live) ─────────────────────────────────────────────────────────────────
12:04:01 unified-api   user-42   [concurrency]  rem 0  retry 240ms
12:04:00 api           ip-10.0.0.7 [rate]        rem 0  retry 1.2s

q quit · 1-8/Tab views · ↑↓ scroll · p pause                       0.0.0.0:50051

The dashboard builds on the core's @experimental admissionTap / withAdmissionAnalytics primitives (needs throttlekit >= 1.1.0) and is itself @experimental — it lives outside the 1.x SemVer freeze. (It replaces the former browser-based throttlekit-lens package, which is deprecated.)

Already emitting OpenTelemetry? Keep doing that — the dashboard doesn't replace your metrics backend (see Operations for the OTel layer). It is the no-backend, see-it-now view for an interactive terminal, and it carries a signal Prometheus/Grafana structurally can't: which axis bound each denial, per key, with exact numbers.

The hero — which axis is throttling you, right now

Most rate-limit dashboards tell you that requests were denied. Because ThrottleKit composes rate × concurrency × cost in a single unifiedAdmission decision, this one tells you which constraint actually bound each denial — rate, concurrency, cost, or the joint-LP policy lane — as a live per-lane breakdown, beside the live denial feed where each row carries the lane tag and the exact per-axis numbers (remaining / retryAfter) that produced it: the literal "why was this throttled, with numbers."

This is structural, not a UI trick. bindingAxis is minted end-to-end inside unified admission. As of 1.2.0 you can also export it to Grafana as an aggregate counter — throttlekit.denies_by_axis{lane}, via instrumentAdmitter (the deliberate escape hatch for headless/production; see Operations and docs/METRICS.md). The terminal dashboard goes further than a counter can: a live, per-key view with the exact per-axis Decision a metric can't carry (per-key axis numbers would be unbounded label cardinality).
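To make "which axis bound this denial" concrete, here is a minimal sketch of how one denial-feed row could be derived from a per-axis snapshot. Everything in it is an assumption for illustration: the type names, the bindingAxisSketch and feedRow helpers, and in particular the tie-break rule (the denying axis with the longest retryAfter binds). This page does not say how unified admission picks the binding axis when several axes deny at once, so treat the rule as a placeholder rather than ThrottleKit's behavior.

```typescript
// Hypothetical shapes; the real Decision type and lane names may differ.
type Axis = "rate" | "concurrency" | "cost" | "policy";
type AxisDecision = { allowed: boolean; remaining: number; retryAfterMs: number };
type AxisSnapshot = Partial<Record<Axis, AxisDecision>>;

// Assumed rule: among the axes that denied, the one with the longest
// retryAfter is reported as binding. Returns null when every axis allowed.
function bindingAxisSketch(snapshot: AxisSnapshot): Axis | null {
  let binding: Axis | null = null;
  let longest = -1;
  for (const axis of Object.keys(snapshot) as Axis[]) {
    const d = snapshot[axis];
    if (d && !d.allowed && d.retryAfterMs > longest) {
      binding = axis;
      longest = d.retryAfterMs;
    }
  }
  return binding;
}

// Format one feed row: policy, key, binding lane, and that lane's exact numbers.
function feedRow(policy: string, key: string, snapshot: AxisSnapshot): string {
  const axis = bindingAxisSketch(snapshot);
  if (axis === null) return `${policy} ${key} [allowed]`;
  const d = snapshot[axis]!;
  return `${policy} ${key} [${axis}] rem ${d.remaining} retry ${d.retryAfterMs}ms`;
}

const sampleRow = feedRow("unified-api", "user-42", {
  rate: { allowed: true, remaining: 17, retryAfterMs: 0 },
  concurrency: { allowed: false, remaining: 0, retryAfterMs: 240 },
  cost: { allowed: true, remaining: 900, retryAfterMs: 0 },
});
// sampleRow is "unified-api user-42 [concurrency] rem 0 retry 240ms"
```

The row carries the key and the exact per-axis numbers, which is the part an aggregate counter labelled only by lane cannot hold without one label value per key.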

Universal — the full board for every policy

The axis lane is the premium layer for unified-admission users; the dashboard itself works for everyone. A plain rateLimit(), quota, twoTier, or concurrency policy gets the whole board plus "why throttled" attribution by policy/limiter + hot key — sourced from the same tapDecisions stream and withAnalytics top-K that already ship in the core. A single-axis limiter simply has one axis, so there is nothing to decompose; the dashboard says so rather than implying otherwise.

Running it

throttlekit-server --config .throttlekit.yaml --tui

It watches the server's decisions, so it works for Python / Go / any-language clients too: whatever drives the server shows up. The taps are synchronous, exception-swallowing, and O(1), so the dashboard cannot change a decision and adds only constant work per request to the control path; the gRPC decisions are byte-for-byte unchanged.
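The "synchronous, exception-swallowing" property can be illustrated with a small wrapper. This is a sketch under assumptions, not the core's tapDecisions or admissionTap: the Limiter, Decision, and Observer shapes and the tapSketch name are hypothetical and exist only to show the pattern of observing a decision without being able to fail or alter it.

```typescript
// Hypothetical shapes for illustration; not ThrottleKit's real types.
type Decision = { allowed: boolean; remaining: number; retryAfterMs: number };
type Limiter = { check(key: string): Decision };
type Observer = (key: string, decision: Decision) => void;

// Wrap a limiter so an observer sees every decision but cannot change it.
function tapSketch(limiter: Limiter, observer: Observer): Limiter {
  return {
    check(key: string): Decision {
      const decision = limiter.check(key); // the control path runs first
      try {
        observer(key, Object.freeze({ ...decision })); // observer gets a frozen copy
      } catch {
        // Swallowed on purpose: a broken observer must not fail an admission.
      }
      return decision; // the limiter's own result is returned untouched
    },
  };
}

// A toy limiter that allows the first two calls per key.
const callCounts = new Map<string, number>();
const toyLimiter: Limiter = {
  check(key: string): Decision {
    const n = (callCounts.get(key) ?? 0) + 1;
    callCounts.set(key, n);
    return { allowed: n <= 2, remaining: Math.max(0, 2 - n), retryAfterMs: n <= 2 ? 0 : 1000 };
  },
};

const observed: string[] = [];
const tapped = tapSketch(toyLimiter, (key, d) => {
  observed.push(`${key}:${d.allowed}`);
  throw new Error("observer bug"); // deliberately misbehaves on every call
});

const tapResults = [tapped.check("user-42"), tapped.check("user-42"), tapped.check("user-42")];
// The observer threw three times, yet the decisions are allow, allow, deny.
```

The observer works on a frozen copy and its exceptions are caught, so the only cost it can add to the caller is the time the callback itself takes.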

Keys: 1–8 (or Tab / Shift-Tab) switch views · q (or Ctrl-C) quit · ↑ / ↓ scroll the denial feed · PgUp / PgDn page · g / G oldest / newest · p pause (freeze the feed).

A TUI inherently owns the terminal, so — unlike a loopback web page — it can't be on-by-default: it is opt-in (--tui) and needs an interactive TTY. Run without a TTY (a pipe, a container with no console, systemd) and the server prints a warning and serves normally without the dashboard. For headless / production monitoring, use OpenTelemetry → Grafana (the reference grafana/throttlekit-dashboard.json + the denies_by_axis counter).

The views

A persistent strip (store backend · fail-mode · policy count · window · clock, plus the window allow / deny totals and a deny sparkline) sits above eight tabbed views — press 1–8 or Tab:

Overview — the binding-axis hero (for the first unified admitter, denials by lane rate / concurrency / cost / policy as live bars), top denied keys (Space-Saving top-K, an upper bound — never misses a true heavy hitter), concurrency health, and the scrollable live denial feed (each row: policy, key, binding lane, exact remaining / retryAfter).
Latency — per-policy admit-path latency: avg / p50 / p99 / max + sample count over a rolling ring. A policy with no samples this window says so.
Fairness — for a weighted-fair-escrow policy, each tenant's guaranteed share vs used vs borrowed against the shared budget (green = within its guarantee, yellow = borrowed idle surplus).
Capacity — a non-consuming forecast for each policy's hottest key: spendable now, when capacity next returns (+1 in), and when it's fully replenished (full in). Sync-store limiters only; an async store / admitter / idle policy reads n/a honestly.
Guarantee — concurrency headroom to a known line (observed, never a "proof holding" needle): each guard's inflight vs its enforced ceiling, how many guards are draining over their slice, self-fence status, and the self-fence feed.
Cost Room — for a fairEscrow (or federatedFairEscrow) policy, the cost axis: per-tenant burn-down of the shared budget, spend rate, and an ETA to exhaustion — where the LLM tokens are going.
Replay — deterministic What-If Replay: the server shadows the live arrival stream through a cold copy of a leaf-rate limiter, and on r replays a configured candidate to read the allow↔deny flip ledger (an honest empty / truncated state when there's nothing to show). See Replay.
Plan — a whole-config "terraform plan" for limits: with --plan-candidate <config>, press P to diff the candidate against the running config over the recorded shadow traffic — the per-policy allow↔deny ledger, before you ship.
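The three numbers in the Capacity view (spendable now, +1 in, full in) are simple arithmetic once a limiter's state is known. The sketch below assumes a token-bucket limiter with a continuous refill rate; that model, the BucketState shape, and the forecastSketch name are assumptions for illustration, since this page does not describe the limiter internals. The point it shows is the "non-consuming" part: the forecast is a pure function of a state snapshot.

```typescript
// Hypothetical token-bucket state; real limiter internals may differ.
type BucketState = { tokens: number; capacity: number; refillPerSec: number };
type Forecast = { spendableNow: number; nextInMs: number; fullInMs: number };

// Pure function of a snapshot: it reads the state and never consumes a token.
function forecastSketch(s: BucketState): Forecast {
  const spendableNow = Math.floor(s.tokens);
  const missing = s.capacity - s.tokens;
  if (missing <= 0) return { spendableNow, nextInMs: 0, fullInMs: 0 };
  // Fraction of a token still needed before one more whole token is spendable.
  const toNextWhole = spendableNow + 1 - s.tokens;
  return {
    spendableNow,
    nextInMs: Math.ceil((toNextWhole * 1000) / s.refillPerSec),
    fullInMs: Math.ceil((missing * 1000) / s.refillPerSec),
  };
}

const bucketState: BucketState = { tokens: 2.5, capacity: 10, refillPerSec: 5 };
const capacityForecast = forecastSketch(bucketState);
// spendableNow 2, nextInMs 100 (0.5 token at 5/s), fullInMs 1500 (7.5 tokens at 5/s)
```

Because nothing is written back to the state, running the forecast on every repaint cannot affect admission.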

Lighting up Fairness & Guarantee on the server: add a fairEscrow policy block (a weighted-fair-escrow limiter served by check, the request key being the tenant) to feed Fairness; any policy with a concurrency block feeds Guarantee and the concurrency-health readout. Both also work when you mount the hub in your own app.

Read it remotely — the Monitor door

ThrottleKit Lens is the terminal view; the Monitor door is the same operational state as a read-only gRPC service (throttlekit.v1.Monitor), so any language can read it remotely — no terminal, no scraping. It runs on the same port as the rate limiter and is on by default (--monitor off to disable):

GetSnapshot returns a typed, point-in-time snapshot — per-policy allow / deny / limit / latency + top keys + concurrency-guard health + the recent denial feed — plus a raw_json field carrying the full dashboard snapshot for depth.
Watch opens a live, filtered denial stream (optionally one policy), each event the "why, with numbers" of a rejection — rate-capped and backpressured server-side, so a slow reader drops events rather than growing server memory.
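The Watch guarantee, that a slow reader loses events instead of growing server memory, is the behavior of a bounded per-subscriber buffer. The sketch below is one way to get it and is not the server's implementation: the DenialEvent shape, the BoundedFeedSketch class, the capacity of 3, and the drop-oldest policy are all assumptions chosen for illustration.

```typescript
// Hypothetical event shape, capacity, and drop policy; for illustration only.
type DenialEvent = { policy: string; key: string; lane: string };

class BoundedFeedSketch {
  dropped = 0;
  private buf: DenialEvent[] = [];
  private readonly capacity: number;
  private readonly policyFilter: string | undefined;

  constructor(capacity: number, policyFilter?: string) {
    this.capacity = capacity;
    this.policyFilter = policyFilter;
  }

  // Producer side: never blocks and never lets the buffer exceed `capacity`.
  push(ev: DenialEvent): void {
    if (this.policyFilter !== undefined && ev.policy !== this.policyFilter) return;
    if (this.buf.length >= this.capacity) {
      this.buf.shift(); // assumed policy: discard the oldest unread event
      this.dropped++;
    }
    this.buf.push(ev);
  }

  // Consumer side: take whatever is buffered and start fresh.
  drain(): DenialEvent[] {
    const out = this.buf;
    this.buf = [];
    return out;
  }
}

// A reader that subscribed to one policy and then stalled for five denials.
const slowReader = new BoundedFeedSketch(3, "unified-api");
for (let i = 1; i <= 5; i++) {
  slowReader.push({ policy: "unified-api", key: `user-${i}`, lane: "rate" });
}
slowReader.push({ policy: "checkout", key: "user-9", lane: "concurrency" }); // filtered out
const deliveredKeys = slowReader.drain().map((e) => e.key);
// deliveredKeys holds user-3, user-4, user-5; slowReader.dropped is 2
```

Memory per subscriber is capped at the buffer capacity regardless of how far the reader falls behind; the dropped counter is what lets a client notice the gap.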

The snapshot carries traffic keys (PII), so the door is loopback-only by default; set --monitor-secret (and pair it with TLS) to read it from another host.

For metrics tooling, --metrics-port <n> serves Prometheus /metrics with aggregate, PII-free series (throttlekit_allowed_total, throttlekit_denied_total, the per-axis throttlekit_denied_by_axis_total, p50/p99 admit latency, guard health) plus a /healthz probe. The standard grpc.health.v1.Health service is always on.

From Python, the new MonitorBackend / AsyncMonitorBackend read this door directly; see Polyglot & Python.

Honest scope (the non-claims)

The binding-axis lane needs unifiedAdmission; a single-axis rateLimit() has nothing to decompose (the board still works — it just shows policy/key attribution).
Numbers are eventually-consistent and per-window; top-K is Space-Saving (over-estimates, never misses a true heavy hitter).
It is a single-process view of the server you point it at — there is no built-in fleet merge in the terminal (use OTel/Grafana, which aggregates across a fleet).
The Guarantee view renders observed per-node status (inflight vs the guard's enforced ceiling), not a live proof — the fleet Σ inflight ≤ L_global bound is machine-checked in TLA⁺, not re-verified at runtime, and the per-key two-tier overshoot is a fleet property.
It is not bit-exact replay; it streams live decisions. Reproducing a past incident exactly is a future direction.
It is read-only and opt-in (interactive TTY); headless monitoring belongs on OTel.
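The "over-estimates, never misses a true heavy hitter" caveat follows from how the Space-Saving algorithm works. The sketch below is a textbook version of that algorithm, not the core's withAnalytics code; the class and method names are hypothetical, and the linear scan for the minimum is chosen for brevity over speed. It assumes k is at least 1.

```typescript
// Textbook Space-Saving with at most k counters (assumes k >= 1).
class SpaceSavingSketch {
  private readonly k: number;
  private readonly counters = new Map<string, number>();

  constructor(k: number) {
    this.k = k;
  }

  record(key: string): void {
    const current = this.counters.get(key);
    if (current !== undefined) {
      this.counters.set(key, current + 1); // already tracked: exact increment
      return;
    }
    if (this.counters.size < this.k) {
      this.counters.set(key, 1); // free slot: start an exact count
      return;
    }
    // Table full: evict the smallest counter; the newcomer inherits min + 1.
    // That inherited count is why reported numbers are upper bounds.
    let minKey = "";
    let minCount = Infinity;
    this.counters.forEach((count, candidate) => {
      if (count < minCount) {
        minCount = count;
        minKey = candidate;
      }
    });
    this.counters.delete(minKey);
    this.counters.set(key, minCount + 1);
  }

  // Tracked keys, largest estimated count first.
  topK(): Array<[string, number]> {
    return Array.from(this.counters.entries()).sort((a, b) => b[1] - a[1]);
  }
}

const hotKeys = new SpaceSavingSketch(2);
for (const key of ["user-42", "user-42", "user-42", "tenant-aci", "ip-10.0.0.7", "user-42"]) {
  hotKeys.record(key);
}
const ranking = hotKeys.topK();
// ranking: user-42 with 4 (exact), ip-10.0.0.7 with 2 (true count is 1)
```

In this run ip-10.0.0.7 is reported at 2 although it appeared once, which is the over-estimate; user-42, the real heavy hitter, is never evicted because its counter is never the minimum.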

Built on

Two @experimental core primitives, both zero-dep and read-only:

admissionTap(admitter, onAdmission) — the multi-axis sibling of tapDecisions; fires once per completed unified admission with the combined decision, the binding axis, and the per-axis snapshot.
withAdmissionAnalytics(admitter, opts) — the lane-segmented fork of withAnalytics: allow/deny counters and Space-Saving top-K, partitioned by binding lane (with Σ deniedByLane === denied).

A plain rateLimit() feeds the universal board through the existing tapDecisions + withAnalytics; a unifiedAdmission additionally lights up the axis lane through these two.
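The invariant quoted for withAdmissionAnalytics (Σ deniedByLane === denied) holds by construction if every denial increments the total and exactly one lane in the same step. The sketch below shows that bookkeeping; it is not the core's implementation, and the LaneCountersSketch class, the Lane names as a TypeScript union, and the outcome shape are assumptions for illustration.

```typescript
// Hypothetical lane names and outcome shape; for illustration only.
type Lane = "rate" | "concurrency" | "cost" | "policy";
type Outcome = { allowed: true } | { allowed: false; lane: Lane };

class LaneCountersSketch {
  allowed = 0;
  denied = 0;
  readonly deniedByLane: Record<Lane, number> = { rate: 0, concurrency: 0, cost: 0, policy: 0 };

  // A denial bumps the total and exactly one lane, so the sum cannot drift.
  record(outcome: Outcome): void {
    if (outcome.allowed) {
      this.allowed++;
      return;
    }
    this.denied++;
    this.deniedByLane[outcome.lane]++;
  }

  laneSum(): number {
    const l = this.deniedByLane;
    return l.rate + l.concurrency + l.cost + l.policy;
  }
}

const laneCounters = new LaneCountersSketch();
const sampleOutcomes: Outcome[] = [
  { allowed: true },
  { allowed: false, lane: "rate" },
  { allowed: false, lane: "rate" },
  { allowed: false, lane: "concurrency" },
  { allowed: true },
  { allowed: false, lane: "cost" },
];
for (const o of sampleOutcomes) laneCounters.record(o);
// allowed 2, denied 4, deniedByLane rate 2 / concurrency 1 / cost 1 / policy 0
```

Requiring a lane on every denied outcome at the type level is what rules out a denial that is counted in the total but attributed to no lane.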

See also: Operations (OTel, the analytics tap, headers, failure modes) · Unified admission (the rate × concurrency × cost decision the axis lane attributes) · Distributed & provable (the overshoot bound the concurrency readout tracks).

ThrottleKit · MIT · 1.0 — API frozen under SemVer (Stability)
