
Monitoring and the Lens

Ameya Borkar edited this page Jun 10, 2026 · 5 revisions

Monitoring — ThrottleKit Lens

ThrottleKit Lens is the project's built-in monitoring dashboard — a zero-dependency, read-only board that runs right in your terminal: throttlekit-server --config x.yaml --tui, no browser and no metrics backend. It gives every policy the full operational board across eight tabbed views, and one view no other rate-limiter dashboard renders: live binding-axis attribution. Need it remotely, or from another language? The same state is a read-only gRPC service — the Monitor door (below).

throttlekit-server --config .throttlekit.yaml --tui
#  → gRPC on :50051  +  a live dashboard in your terminal
ThrottleKit · memory · fail-open · 4 policies                 12:04:01 · 60s window
ALLOW 1.2k   DENY 84   7% deny   ▁▂▃▅▇▅▃▂
 Overview │ Latency │ Fairness │ Capacity │ Guarantee │ Cost Room │ Replay │ Plan

─ BINDING AXIS  /  TOP DENIED KEYS ──────────────────────────────────────────────
unified-api  10 denied            user-42     ███████ 312
 rate        ████████████ 61%     tenant-aci  ████ 180
 concurrency █████ 24%            ip-10.0.0.7 ██ 96
 cost        ██ 10%  policy █ 5%

─ CONCURRENCY ───────────────────────────────────────────────────────────────────
checkout    6/8     rtt 12ms      unified-api 4/4  FENCED

─ DENIALS (live) ─────────────────────────────────────────────────────────────────
12:04:01 unified-api   user-42   [concurrency]  rem 0  retry 240ms
12:04:00 api           ip-10.0.0.7 [rate]        rem 0  retry 1.2s

q quit · 1-8/Tab views · ↑↓ scroll · p pause                       0.0.0.0:50051

The dashboard builds on the core's @experimental admissionTap / withAdmissionAnalytics primitives (needs throttlekit >= 1.1.0) and is itself @experimental — it lives outside the 1.x SemVer freeze. (It replaces the former browser-based throttlekit-lens package, which is deprecated.)

Already emitting OpenTelemetry? Keep doing that — the dashboard doesn't replace your metrics backend (see Operations for the OTel layer). It is the no-backend, see-it-now view for an interactive terminal, and it carries a signal Prometheus/Grafana structurally can't: which axis bound each denial, per key, with exact numbers.

The hero — which axis is throttling you, right now

Most rate-limit dashboards tell you that requests were denied. Because ThrottleKit composes rate × concurrency × cost in a single unifiedAdmission decision, this one tells you which constraint actually bound each denial (rate, concurrency, cost, or the joint-LP policy lane) as a live per-lane breakdown. Beside it, the live denial feed tags each row with the lane and the exact per-axis numbers (remaining / retryAfter) that produced it: the literal "why was this throttled, with numbers."

This is structural, not a UI trick. bindingAxis is minted end-to-end inside unified admission. As of 1.2.0 you can also export it to Grafana as an aggregate counter — throttlekit.denies_by_axis{lane}, via instrumentAdmitter (the deliberate escape hatch for headless/production; see Operations and docs/METRICS.md). The terminal dashboard goes further than a counter can: a live, per-key view with the exact per-axis Decision a metric can't carry (per-key axis numbers would be unbounded label cardinality).
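To make the idea of a binding axis concrete, here is a minimal sketch of attributing a denial to one lane. The type names, the attribute function, and the tie-breaking rule (the denying axis with the longest retryAfter wins) are assumptions for illustration, not ThrottleKit's actual types or algorithm.

```typescript
// Hypothetical shapes for illustration; not ThrottleKit's real Decision types.
type Lane = "rate" | "concurrency" | "cost" | "policy";

interface AxisResult {
  lane: Lane;
  allowed: boolean;
  remaining: number;    // units left on this axis
  retryAfterMs: number; // 0 when the axis allowed the request
}

interface Attribution {
  allowed: boolean;
  bindingAxis: Lane | null; // null when every axis allowed the request
  perAxis: AxisResult[];    // the exact numbers behind the decision
}

// Assumed rule: among the axes that denied, the one with the longest
// retryAfter is the binding one; ties keep the earlier axis.
function attribute(perAxis: AxisResult[]): Attribution {
  let binding: AxisResult | null = null;
  for (const axis of perAxis) {
    if (axis.allowed) continue;
    if (binding === null || axis.retryAfterMs > binding.retryAfterMs) {
      binding = axis;
    }
  }
  return {
    allowed: binding === null,
    bindingAxis: binding === null ? null : binding.lane,
    perAxis,
  };
}
```

With a rate axis that allows, a concurrency axis that denies with retryAfter 240 ms, and a cost axis that denies with retryAfter 100 ms, this sketch reports concurrency as the binding lane and keeps all three per-axis results alongside it.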

Universal — the full board for every policy

The axis lane is the premium layer for unified-admission users; the dashboard itself works for everyone. A plain rateLimit(), quota, twoTier, or concurrency policy gets the whole board plus "why throttled" attribution by policy/limiter + hot key — sourced from the same tapDecisions stream and withAnalytics top-K that already ship in the core. A single-axis limiter simply has one axis, so there is nothing to decompose; the dashboard says so rather than implying otherwise.

Running it

throttlekit-server --config .throttlekit.yaml --tui

It watches the server's decisions, so it works for Python / Go / any-language clients too — whatever drives the server shows up. The taps are synchronous, exception-swallowing, and O(1), so the dashboard can never perturb your control path or change a decision; the gRPC decisions are byte-for-byte unchanged.
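The "can never perturb your control path" property follows from how such a tap is wired. The sketch below shows the general pattern under assumed names (Decision, Limiter, and tapSketch are hypothetical and do not mirror the real tapDecisions signature): the limiter runs first, the observer runs second, and anything the observer throws is swallowed.

```typescript
// Hypothetical minimal shapes; the real ThrottleKit types may differ.
type Decision = { allowed: boolean; remaining: number };
type Limiter = (key: string) => Decision;
type Observer = (key: string, decision: Decision) => void;

// Wrap a limiter so an observer sees every decision without influencing it.
function tapSketch(limiter: Limiter, onDecision: Observer): Limiter {
  return (key: string): Decision => {
    // The control path runs first and its result is already fixed.
    const decision = limiter(key);
    try {
      // The observer is synchronous; keeping it O(1) is the caller's job.
      onDecision(key, decision);
    } catch {
      // Swallowed on purpose: a dashboard bug must not fail a request.
    }
    return decision;
  };
}
```

Even if the observer throws on every call, the wrapped limiter returns exactly what the inner limiter decided.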

Keys: 1-8 (or Tab / Shift-Tab) switch views · q (or Ctrl-C) quit · ↑ / ↓ scroll the denial feed · PgUp / PgDn page · g / G oldest / newest · p pause (freeze the feed).

A TUI inherently owns the terminal, so — unlike a loopback web page — it can't be on-by-default: it is opt-in (--tui) and needs an interactive TTY. Run without a TTY (a pipe, a container with no console, systemd) and the server prints a warning and serves normally without the dashboard. For headless / production monitoring, use OpenTelemetry → Grafana (the reference grafana/throttlekit-dashboard.json + the denies_by_axis counter).
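The opt-in rule above amounts to a small decision table. The following sketch is illustrative only: planStartup, StartupPlan, and the warning text are hypothetical, and the server's real startup code is not shown in this page. In a Node.js process the second argument would typically come from process.stdout.isTTY.

```typescript
// Hypothetical names; this only mirrors the behavior described in the text.
type DashboardMode = "tui" | "serve-only";

interface StartupPlan {
  mode: DashboardMode;
  warning: string | null; // non-null when --tui was requested but unusable
}

function planStartup(tuiRequested: boolean, isInteractiveTty: boolean): StartupPlan {
  // Without --tui the dashboard is never started: it is opt-in.
  if (!tuiRequested) {
    return { mode: "serve-only", warning: null };
  }
  // --tui without a TTY (pipe, console-less container, systemd):
  // warn and keep serving, rather than failing startup.
  if (!isInteractiveTty) {
    return {
      mode: "serve-only",
      warning: "--tui requested but no interactive TTY; serving without the dashboard",
    };
  }
  return { mode: "tui", warning: null };
}
```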

The views

A persistent strip (store backend · fail-mode · policy count · window · clock, plus the window allow / deny totals and a deny sparkline) sits above eight tabbed views (press 1-8 or Tab):

  • Overview — the binding-axis hero (for the first unified admitter, denials by lane rate / concurrency / cost / policy as live bars), top denied keys (Space-Saving top-K, an upper bound — never misses a true heavy hitter), concurrency health, and the scrollable live denial feed (each row: policy, key, binding lane, exact remaining / retryAfter).
  • Latency — per-policy admit-path latency: avg / p50 / p99 / max + sample count over a rolling ring. A policy with no samples this window says so.
  • Fairness — for a weighted-fair-escrow policy, each tenant's guaranteed share vs used vs borrowed against the shared budget (green = within its guarantee, yellow = borrowed idle surplus).
  • Capacity — a non-consuming forecast for each policy's hottest key: spendable now, when capacity next returns (+1 in), and when it's fully replenished (full in). Sync-store limiters only; an async store / admitter / idle policy reads n/a honestly.
  • Guarantee — concurrency headroom to a known line (observed, never a "proof holding" needle): each guard's inflight vs its enforced ceiling, how many guards are draining over their slice, self-fence status, and the self-fence feed.
  • Cost Room — for a fairEscrow (or federatedFairEscrow) policy, the cost axis: per-tenant burn-down of the shared budget, spend rate, and an ETA to exhaustion — where the LLM tokens are going.
  • Replay — deterministic What-If Replay: the server shadows the live arrival stream through a cold copy of a leaf-rate limiter, and on r replays a configured candidate to read the allow↔deny flip ledger (an honest empty / truncated state when there's nothing to show). See Replay.
  • Plan — a whole-config "terraform plan" for limits: with --plan-candidate <config>, press P to diff the candidate against the running config over the recorded shadow traffic — the per-policy allow↔deny ledger, before you ship.
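The "upper bound, never misses a true heavy hitter" property of the top denied keys comes from the Space-Saving algorithm itself. Here is a minimal sketch of that algorithm; the class and method names are hypothetical and this is not ThrottleKit's implementation. For brevity it evicts with a linear scan, where a production version would use a structure with O(1) updates.

```typescript
interface Counter {
  count: number; // estimated frequency (never below the true count)
  error: number; // how much of `count` may be over-estimation
}

class SpaceSaving {
  private counters = new Map<string, Counter>();
  private capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  record(key: string): void {
    const hit = this.counters.get(key);
    if (hit !== undefined) {
      hit.count += 1;
      return;
    }
    if (this.counters.size < this.capacity) {
      this.counters.set(key, { count: 1, error: 0 });
      return;
    }
    // Table full: replace the smallest counter and inherit its count,
    // which is what makes every estimate an upper bound.
    let minKey = "";
    let minCount = Infinity;
    this.counters.forEach((c, k) => {
      if (c.count < minCount) {
        minCount = c.count;
        minKey = k;
      }
    });
    this.counters.delete(minKey);
    this.counters.set(key, { count: minCount + 1, error: minCount });
  }

  top(n: number): Array<{ key: string; count: number; error: number }> {
    const rows: Array<{ key: string; count: number; error: number }> = [];
    this.counters.forEach((c, k) => rows.push({ key: k, count: c.count, error: c.error }));
    return rows.sort((a, b) => b.count - a.count).slice(0, n);
  }
}
```

With a capacity of 2 and the stream user-42, user-42, user-42, tenant-aci, ip-10.0.0.7, the table ends with user-42 at count 3 and ip-10.0.0.7 at count 2 with error 1: the newcomer inherits the evicted counter's count, so its estimate (2) over-states its true count (1) but never under-states it.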

Lighting up Fairness & Guarantee on the server: add a fairEscrow policy block (a weighted-fair-escrow limiter served by check, the request key being the tenant) to feed Fairness; any policy with a concurrency block feeds Guarantee and the concurrency-health readout. Both also work when you mount the hub in your own app.
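The Capacity view's three readings (spendable now, +1 in, full in) can be computed without consuming anything, because they are a pure function of limiter state. The sketch below assumes a token-bucket model with an integer capacity and a positive refill rate; the page does not say which algorithm each limiter uses, and BucketState, Forecast, and forecast are hypothetical names.

```typescript
// Assumed token-bucket state; not ThrottleKit's internal representation.
interface BucketState {
  capacity: number;     // maximum tokens (assumed to be an integer)
  refillPerSec: number; // tokens added per second (assumed > 0)
  tokens: number;       // current, possibly fractional, token count
}

interface Forecast {
  spendableNow: number; // whole tokens available right now
  nextInMs: number;     // time until one more whole token exists ("+1 in")
  fullInMs: number;     // time until the bucket is full ("full in")
}

// Read-only: derives the forecast from state and never mutates it.
function forecast(state: BucketState): Forecast {
  const tokens = Math.min(state.tokens, state.capacity);
  const spendableNow = Math.floor(tokens);
  const deficit = state.capacity - tokens;
  if (deficit <= 0) {
    return { spendableNow, nextInMs: 0, fullInMs: 0 };
  }
  const toNextWhole = spendableNow + 1 - tokens;
  return {
    spendableNow,
    nextInMs: (toNextWhole / state.refillPerSec) * 1000,
    fullInMs: (deficit / state.refillPerSec) * 1000,
  };
}
```

For a bucket of capacity 10 refilling at 2 tokens per second and currently holding 7.5 tokens, this gives 7 spendable now, +1 in 250 ms, and full in 1250 ms.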

Read it remotely — the Monitor door

ThrottleKit Lens is the terminal view; the Monitor door is the same operational state as a read-only gRPC service (throttlekit.v1.Monitor), so any language can read it remotely — no terminal, no scraping. It runs on the same port as the rate limiter and is on by default (--monitor off to disable):

  • GetSnapshot returns a typed, point-in-time snapshot — per-policy allow / deny / limit / latency + top keys + concurrency-guard health + the recent denial feed — plus a raw_json field carrying the full dashboard snapshot for depth.
  • Watch opens a live, filtered denial stream (optionally one policy), each event the "why, with numbers" of a rejection — rate-capped and backpressured server-side, so a slow reader drops events rather than growing server memory.
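The Watch guarantee, that a slow reader drops events rather than growing server memory, is the behavior of a bounded buffer. A minimal sketch under stated assumptions: BoundedFeed is a hypothetical name, and dropping the oldest event is an assumed policy, since this page does not say which events the server discards.

```typescript
// Fixed-capacity event buffer: memory use is bounded by `capacity`
// no matter how slowly the reader drains it.
class BoundedFeed<T> {
  private events: T[] = [];
  private droppedCount = 0;
  private capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  // Producer side: never blocks and never grows past capacity.
  push(event: T): void {
    if (this.events.length >= this.capacity) {
      this.events.shift(); // assumed policy: discard the oldest event
      this.droppedCount += 1;
    }
    this.events.push(event);
  }

  // Reader side: take up to `max` events, oldest first.
  drain(max: number): T[] {
    return this.events.splice(0, max);
  }

  // How many events the slow reader has missed so far.
  dropped(): number {
    return this.droppedCount;
  }
}
```

Pushing five events into a feed of capacity 3 before the reader drains leaves the last three events readable and reports two dropped.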

The snapshot carries traffic keys (PII), so the door is loopback-only by default; set --monitor-secret (and pair it with TLS) to read it from another host. For metrics tooling, --metrics-port <n> serves Prometheus /metrics — aggregate, PII-free series (throttlekit_allowed_total, throttlekit_denied_total, the per-axis throttlekit_denied_by_axis_total, p50/p99 admit latency, guard health) — plus a /healthz probe; and the standard grpc.health.v1.Health service is always on. From Python, the new MonitorBackend / AsyncMonitorBackend read this door directly — see Polyglot & Python.

Honest scope (the non-claims)

  • The binding-axis lane needs unifiedAdmission; a single-axis rateLimit() has nothing to decompose (the board still works — it just shows policy/key attribution).
  • Numbers are eventually-consistent and per-window; top-K is Space-Saving (over-estimates, never misses a true heavy hitter).
  • It is a single-process view of the server you point it at — there is no built-in fleet merge in the terminal (use OTel/Grafana, which aggregates across a fleet).
  • The Guarantee view renders observed per-node status (inflight vs the guard's enforced ceiling), not a live proof — the fleet Σ inflight ≤ L_global bound is machine-checked in TLA⁺, not re-verified at runtime, and the per-key two-tier overshoot is a fleet property.
  • It is not bit-exact replay; it streams live decisions. Reproducing a past incident exactly is a future direction.
  • It is read-only and opt-in (interactive TTY); headless monitoring belongs on OTel.

Built on

Two @experimental core primitives, both zero-dep and read-only:

  • admissionTap(admitter, onAdmission) — the multi-axis sibling of tapDecisions; fires once per completed unified admission with the combined decision, the binding axis, and the per-axis snapshot.
  • withAdmissionAnalytics(admitter, opts) — the lane-segmented fork of withAnalytics: allow/deny counters and Space-Saving top-K, partitioned by binding lane (with Σ deniedByLane === denied).

A plain rateLimit() feeds the universal board through the existing tapDecisions + withAnalytics; a unifiedAdmission additionally lights up the axis lane through these two.
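The Σ deniedByLane === denied invariant holds by construction if every denial carries exactly one binding lane. The sketch below illustrates that bookkeeping with hypothetical types (Admission and LaneCounters are not the real withAdmissionAnalytics API).

```typescript
type Lane = "rate" | "concurrency" | "cost" | "policy";

// A denial always names its binding lane; an allow never does.
type Admission =
  | { allowed: true }
  | { allowed: false; bindingAxis: Lane };

class LaneCounters {
  allowed = 0;
  denied = 0;
  deniedByLane: Record<Lane, number> = { rate: 0, concurrency: 0, cost: 0, policy: 0 };

  record(admission: Admission): void {
    if (admission.allowed === true) {
      this.allowed += 1;
      return;
    }
    // Both counters move together, so the per-lane sum cannot drift
    // from the denied total.
    this.denied += 1;
    this.deniedByLane[admission.bindingAxis] += 1;
  }

  laneTotal(): number {
    const l = this.deniedByLane;
    return l.rate + l.concurrency + l.cost + l.policy;
  }
}
```

After any sequence of record calls, laneTotal() equals denied, which is the partition property the analytics fork states.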


See also: Operations (OTel, the analytics tap, headers, failure modes) · Unified admission (the rate × concurrency × cost decision the axis lane attributes) · Distributed & provable (the overshoot bound the concurrency readout tracks).
