Monitoring and the Lens
ThrottleKit Lens is the project's built-in monitoring dashboard — a zero-dependency, read-only board that runs right in your terminal: throttlekit-server --config x.yaml --tui, no browser and no metrics backend. It gives every policy the full operational board across eight tabbed views, and one view no other rate-limiter dashboard renders: live binding-axis attribution. Need it remotely, or from another language? The same state is a read-only gRPC service — the Monitor door (below).
throttlekit-server --config .throttlekit.yaml --tui
# → gRPC on :50051 + a live dashboard in your terminal

ThrottleKit · memory · fail-open · 4 policies 12:04:01 · 60s window
ALLOW 1.2k DENY 84 7% deny ▁▂▃▅▇▅▃▂
Overview │ Latency │ Fairness │ Capacity │ Guarantee │ Cost Room │ Replay │ Plan
─ BINDING AXIS / TOP DENIED KEYS ──────────────────────────────────────────────
unified-api 10 denied user-42 ███████ 312
rate ████████████ 61% tenant-aci ████ 180
concurrency █████ 24% ip-10.0.0.7 ██ 96
cost ██ 10% policy █ 5%
─ CONCURRENCY ───────────────────────────────────────────────────────────────────
checkout 6/8 rtt 12ms unified-api 4/4 FENCED
─ DENIALS (live) ─────────────────────────────────────────────────────────────────
12:04:01 unified-api user-42 [concurrency] rem 0 retry 240ms
12:04:00 api ip-10.0.0.7 [rate] rem 0 retry 1.2s
q quit · 1-8/Tab views · ↑↓ scroll · p pause 0.0.0.0:50051
The dashboard builds on the core's @experimental admissionTap / withAdmissionAnalytics primitives (needs throttlekit >= 1.1.0) and is itself @experimental: it lives outside the 1.x SemVer freeze. (It replaces the former browser-based throttlekit-lens package, which is deprecated.)
Already emitting OpenTelemetry? Keep doing that — the dashboard doesn't replace your metrics backend (see Operations for the OTel layer). It is the no-backend, see-it-now view for an interactive terminal, and it carries a signal Prometheus/Grafana structurally can't: which axis bound each denial, per key, with exact numbers.
Most rate-limit dashboards tell you that requests were denied. Because ThrottleKit composes rate × concurrency × cost in a single unifiedAdmission decision, this one tells you which constraint actually bound each denial — rate, concurrency, cost, or the joint-LP policy lane — as a live per-lane breakdown, beside the live denial feed where each row carries the lane tag and the exact per-axis numbers (remaining / retryAfter) that produced it: the literal "why was this throttled, with numbers."
This is structural, not a UI trick. bindingAxis is minted end-to-end inside unified admission. As of 1.2.0 you can also export it to Grafana as an aggregate counter — throttlekit.denies_by_axis{lane}, via instrumentAdmitter (the deliberate escape hatch for headless/production; see Operations and docs/METRICS.md). The terminal dashboard goes further than a counter can: a live, per-key view with the exact per-axis Decision a metric can't carry (per-key axis numbers would be unbounded label cardinality).
The axis lane is the premium layer for unified-admission users; the dashboard itself works for everyone. A plain rateLimit(), quota, twoTier, or concurrency policy gets the whole board plus "why throttled" attribution by policy/limiter + hot key — sourced from the same tapDecisions stream and withAnalytics top-K that already ship in the core. A single-axis limiter simply has one axis, so there is nothing to decompose; the dashboard says so rather than implying otherwise.
throttlekit-server --config .throttlekit.yaml --tui

It watches the server's decisions, so it works for Python / Go / any-language clients too: whatever drives the server shows up. The taps are synchronous, exception-swallowing, and O(1), so the dashboard can never perturb your control path or change a decision; the gRPC decisions are byte-for-byte unchanged.
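The tap contract described above (synchronous, exception-swallowing, decision unchanged) can be sketched in a few lines. This is an illustration only, not ThrottleKit's implementation: the `Decision` shape, the `tap` wrapper, and the toy limiter are all hypothetical names invented for the example.

```typescript
// Hypothetical shapes for illustration; these are not ThrottleKit's real types.
interface Decision {
  allowed: boolean;
  remaining: number;
  retryAfterMs: number;
}

type Decide = (key: string) => Decision;
type DecisionObserver = (key: string, decision: Decision) => void;

// Wrap a decide function so an observer sees every decision.
// The observer runs synchronously and anything it throws is swallowed,
// so the caller always receives the original decision object, untouched.
function tap(decide: Decide, observer: DecisionObserver): Decide {
  return (key: string): Decision => {
    const decision = decide(key);
    try {
      observer(key, decision);
    } catch {
      // Deliberately ignored: monitoring must never break the control path.
    }
    return decision;
  };
}

// Toy limiter that denies every third call.
let calls = 0;
const toyDecide: Decide = () => {
  calls += 1;
  const allowed = calls % 3 !== 0;
  return { allowed, remaining: allowed ? 1 : 0, retryAfterMs: allowed ? 0 : 240 };
};

// A buggy observer that throws on every denial.
let observed = 0;
const tapped = tap(toyDecide, (_key, decision) => {
  observed += 1;
  if (!decision.allowed) throw new Error("observer bug");
});

const first = tapped("user-42");
const second = tapped("user-42");
const third = tapped("user-42"); // denied; the observer throws, the caller never sees it
```

The wrapper itself adds constant work per call; the overall O(1) property still depends on the observer doing constant-time work.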
Keys: 1–8 (or Tab / Shift-Tab) switch views · q (or Ctrl-C) quit · ↑ / ↓ scroll the denial feed · PgUp / PgDn page · g / G oldest / newest · p pause (freeze the feed).
A TUI inherently owns the terminal, so — unlike a loopback web page — it can't be on-by-default: it is opt-in (--tui) and needs an interactive TTY. Run without a TTY (a pipe, a container with no console, systemd) and the server prints a warning and serves normally without the dashboard. For headless / production monitoring, use OpenTelemetry → Grafana (the reference grafana/throttlekit-dashboard.json + the denies_by_axis counter).
A persistent strip (store backend · fail-mode · policy count · window · clock, plus the window allow / deny totals and a deny sparkline) sits above eight tabbed views — press 1–8 or Tab:
- Overview: the binding-axis hero (for the first unified admitter, denials by lane rate / concurrency / cost / policy as live bars), top denied keys (Space-Saving top-K, an upper bound that never misses a true heavy hitter), concurrency health, and the scrollable live denial feed (each row: policy, key, binding lane, exact remaining / retryAfter).
- Latency: per-policy admit-path latency, shown as avg / p50 / p99 / max + sample count over a rolling ring. A policy with no samples this window says so.
- Fairness: for a weighted-fair-escrow policy, each tenant's guaranteed share vs used vs borrowed against the shared budget (green = within its guarantee, yellow = borrowed idle surplus).
- Capacity: a non-consuming forecast for each policy's hottest key: spendable now, when capacity next returns (+1 in), and when it's fully replenished (full in). Sync-store limiters only; an async store / admitter / idle policy reads n/a honestly.
- Guarantee: concurrency headroom to a known line (observed, never a "proof holding" needle): each guard's inflight vs its enforced ceiling, how many guards are draining over their slice, self-fence status, and the self-fence feed.
- Cost Room: for a fairEscrow (or federatedFairEscrow) policy, the cost axis: per-tenant burn-down of the shared budget, spend rate, and an ETA to exhaustion, i.e. where the LLM tokens are going.
- Replay: deterministic What-If Replay. The server shadows the live arrival stream through a cold copy of a leaf-rate limiter, and on r replays a configured candidate to read the allow↔deny flip ledger (an honest empty / truncated state when there's nothing to show). See Replay.
- Plan: a whole-config "terraform plan" for limits. With --plan-candidate <config>, press P to diff the candidate against the running config over the recorded shadow traffic: the per-policy allow↔deny ledger, before you ship.
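To make the Latency view's "avg / p50 / p99 / max over a rolling ring" concrete, here is a minimal sketch of a fixed-size sample ring with nearest-rank percentiles. The `LatencyRing` class and its capacity are assumptions made for the example; the dashboard's actual ring size and percentile method are not stated here.

```typescript
interface LatencySummary {
  count: number;
  avg: number;
  p50: number;
  p99: number;
  max: number;
}

// Fixed-capacity ring of latency samples (hypothetical helper).
// Once full, each new sample overwrites the oldest one.
class LatencyRing {
  private readonly samples: number[] = [];
  private readonly capacity: number;
  private next = 0;

  constructor(capacity: number) {
    if (capacity < 1) throw new RangeError("capacity must be at least 1");
    this.capacity = capacity;
  }

  record(ms: number): void {
    if (this.samples.length < this.capacity) this.samples.push(ms);
    else this.samples[this.next] = ms;
    this.next = (this.next + 1) % this.capacity;
  }

  // Returns null when there are no samples, so a view can say so explicitly.
  summary(): LatencySummary | null {
    const n = this.samples.length;
    if (n === 0) return null;
    const sorted = this.samples.slice().sort((a, b) => a - b);
    // Nearest-rank percentile: the value at rank ceil(p/100 * n), 1-based.
    const at = (p: number): number =>
      sorted[Math.max(1, Math.ceil((p / 100) * n)) - 1] as number;
    let sum = 0;
    for (let i = 0; i < n; i++) sum += sorted[i] as number;
    return { count: n, avg: sum / n, p50: at(50), p99: at(99), max: sorted[n - 1] as number };
  }
}

// Five samples into a ring of four: the first sample (10) is overwritten.
const ring = new LatencyRing(4);
for (const ms of [10, 20, 30, 40, 50]) ring.record(ms);
const stats = ring.summary();
```

Sorting on every read keeps the write path trivial, which suits a board that records far more often than it renders.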
Lighting up Fairness & Guarantee on the server: add a fairEscrow policy block (a weighted-fair-escrow limiter served by check, the request key being the tenant) to feed Fairness; any policy with a concurrency block feeds Guarantee and the concurrency-health readout. Both also work when you mount the hub in your own app.
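As a rough picture of what the Fairness view compares, the sketch below splits a shared budget by weight and classifies each tenant as within its guarantee or borrowing. It assumes guaranteed share = budget × weight / total weight, which is one common reading of weighted fairness rather than a statement of how fairEscrow computes it; `fairnessRows` and its types are hypothetical.

```typescript
interface TenantUsage {
  tenant: string;
  weight: number;
  used: number;
}

interface FairnessRow {
  tenant: string;
  guaranteed: number;
  used: number;
  borrowed: number;
  status: "within-guarantee" | "borrowing";
}

// Assumption: each tenant's guarantee is its weight-proportional slice of the budget.
function fairnessRows(budget: number, tenants: TenantUsage[]): FairnessRow[] {
  const totalWeight = tenants.reduce((sum, t) => sum + t.weight, 0);
  return tenants.map((t): FairnessRow => {
    const guaranteed = totalWeight > 0 ? (budget * t.weight) / totalWeight : 0;
    // Anything used beyond the guarantee counts as borrowed idle surplus.
    const borrowed = Math.max(0, t.used - guaranteed);
    return {
      tenant: t.tenant,
      guaranteed,
      used: t.used,
      borrowed,
      status: borrowed > 0 ? "borrowing" : "within-guarantee",
    };
  });
}

// Budget of 100 split 2:1:1, so the guarantees are 50, 25, and 25.
const rows = fairnessRows(100, [
  { tenant: "tenant-a", weight: 2, used: 30 },
  { tenant: "tenant-b", weight: 1, used: 50 },
  { tenant: "tenant-c", weight: 1, used: 10 },
]);
```

Here tenant-b uses 50 against a guarantee of 25, so it would render as borrowing (yellow), while the other two stay within their guarantee (green).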
ThrottleKit Lens is the terminal view; the Monitor door is the same operational state as a read-only gRPC service (throttlekit.v1.Monitor), so any language can read it remotely — no terminal, no scraping. It runs on the same port as the rate limiter and is on by default (--monitor off to disable):
- GetSnapshot returns a typed, point-in-time snapshot (per-policy allow / deny / limit / latency + top keys + concurrency-guard health + the recent denial feed) plus a raw_json field carrying the full dashboard snapshot for depth.
- Watch opens a live, filtered denial stream (optionally one policy), each event the "why, with numbers" of a rejection. It is rate-capped and backpressured server-side, so a slow reader drops events rather than growing server memory.
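The "slow reader drops events rather than growing server memory" behaviour of Watch is the classic bounded-buffer trade-off. The sketch below shows one way to get it, assuming a drop-oldest policy; which events the server actually drops, and its buffer size, are not specified here, and `DropOldestBuffer` is a hypothetical name.

```typescript
// Bounded buffer between a fast producer and a possibly slow reader.
// Assumption for this sketch: when full, the oldest event is evicted.
class DropOldestBuffer<T> {
  private readonly items: T[] = [];
  private readonly capacity: number;
  private droppedCount = 0;

  constructor(capacity: number) {
    if (capacity < 1) throw new RangeError("capacity must be at least 1");
    this.capacity = capacity;
  }

  // Never grows past capacity, so memory stays bounded however slow the reader is.
  push(item: T): void {
    if (this.items.length === this.capacity) {
      this.items.shift(); // O(capacity); acceptable for a small sketch buffer
      this.droppedCount += 1;
    }
    this.items.push(item);
  }

  // Hands the reader everything currently buffered and empties the buffer.
  drain(): T[] {
    return this.items.splice(0, this.items.length);
  }

  dropped(): number {
    return this.droppedCount;
  }

  size(): number {
    return this.items.length;
  }
}

// The producer emits five denials before the reader wakes up; only three fit.
const feed = new DropOldestBuffer<string>(3);
for (const denial of ["e1", "e2", "e3", "e4", "e5"]) feed.push(denial);
const delivered = feed.drain();
```

Exposing the drop count lets a reader tell that it missed events, which matters when the stream is used for diagnosis rather than accounting.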
The snapshot carries traffic keys (PII), so the door is loopback-only by default; set --monitor-secret (and pair it with TLS) to read it from another host. For metrics tooling, --metrics-port <n> serves Prometheus /metrics — aggregate, PII-free series (throttlekit_allowed_total, throttlekit_denied_total, the per-axis throttlekit_denied_by_axis_total, p50/p99 admit latency, guard health) — plus a /healthz probe; and the standard grpc.health.v1.Health service is always on. From Python, the new MonitorBackend / AsyncMonitorBackend read this door directly — see Polyglot & Python.
- The binding-axis lane needs unifiedAdmission; a single-axis rateLimit() has nothing to decompose (the board still works; it just shows policy/key attribution).
- Numbers are eventually-consistent and per-window; top-K is Space-Saving (over-estimates, never misses a true heavy hitter).
- It is a single-process view of the server you point it at — there is no built-in fleet merge in the terminal (use OTel/Grafana, which aggregates across a fleet).
- The Guarantee view renders observed per-node status (inflight vs the guard's enforced ceiling), not a live proof: the fleet Σ inflight ≤ L_global bound is machine-checked in TLA⁺, not re-verified at runtime, and the per-key two-tier overshoot is a fleet property.
- It is not bit-exact replay; it streams live decisions. Reproducing a past incident exactly is a future direction.
- It is read-only and opt-in (interactive TTY); headless monitoring belongs on OTel.
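The "over-estimates, never misses a true heavy hitter" property comes from the Space-Saving algorithm itself. The sketch below is a textbook version with a linear scan over at most k counters, written for illustration; it is not the core's implementation, and `SpaceSaving` / `TopKEntry` are names chosen for the example.

```typescript
interface TopKEntry {
  key: string;
  count: number; // upper bound on the key's true count
  error: number; // maximum over-count inherited from evicted keys
}

// Textbook Space-Saving with at most k monitored keys (illustrative only).
class SpaceSaving {
  private readonly entries: TopKEntry[] = [];
  private readonly k: number;

  constructor(k: number) {
    if (k < 1) throw new RangeError("k must be at least 1");
    this.k = k;
  }

  offer(key: string): void {
    let minIndex = 0;
    for (let i = 0; i < this.entries.length; i++) {
      const entry = this.entries[i] as TopKEntry;
      if (entry.key === key) {
        entry.count += 1; // already monitored: just count it
        return;
      }
      if (entry.count < (this.entries[minIndex] as TopKEntry).count) minIndex = i;
    }
    if (this.entries.length < this.k) {
      this.entries.push({ key, count: 1, error: 0 });
      return;
    }
    // Full: replace the smallest counter and inherit its count as possible error.
    const evicted = this.entries[minIndex] as TopKEntry;
    this.entries[minIndex] = { key, count: evicted.count + 1, error: evicted.count };
  }

  top(): TopKEntry[] {
    return this.entries.slice().sort((a, b) => b.count - a.count);
  }
}

// Two counters, three distinct keys: the light key is evicted, the heavy one survives.
const sketch = new SpaceSaving(2);
for (let i = 0; i < 5; i++) sketch.offer("user-42");
sketch.offer("ip-10.0.0.7");
sketch.offer("tenant-aci");
sketch.offer("user-42");
const hottest = sketch.top();
```

In this run tenant-aci was seen once but is reported with count 2 and error 1, which is the over-estimate the limitation above refers to; user-42 keeps its exact count of 6.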
Two @experimental core primitives, both zero-dep and read-only:
- admissionTap(admitter, onAdmission): the multi-axis sibling of tapDecisions; fires once per completed unified admission with the combined decision, the binding axis, and the per-axis snapshot.
- withAdmissionAnalytics(admitter, opts): the lane-segmented fork of withAnalytics, with allow/deny counters and Space-Saving top-K partitioned by binding lane (with Σ deniedByLane === denied).
A plain rateLimit() feeds the universal board through the existing tapDecisions + withAnalytics; a unifiedAdmission additionally lights up the axis lane through these two.
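The invariant Σ deniedByLane === denied follows from attributing every denial to exactly one lane. A minimal sketch of such lane-partitioned counters is below; the `AdmissionEvent` shape and the `record` helper are hypothetical and do not reflect the real callback payload of admissionTap.

```typescript
type Lane = "rate" | "concurrency" | "cost" | "policy";

// Hypothetical event shape: a denial always names the single lane that bound it.
type AdmissionEvent = { allowed: true } | { allowed: false; bindingAxis: Lane };

interface LaneCounters {
  allowed: number;
  denied: number;
  deniedByLane: Record<Lane, number>;
}

function newCounters(): LaneCounters {
  return {
    allowed: 0,
    denied: 0,
    deniedByLane: { rate: 0, concurrency: 0, cost: 0, policy: 0 },
  };
}

// Each denial increments the total and exactly one lane, so the lanes always sum to the total.
function record(counters: LaneCounters, event: AdmissionEvent): void {
  if (event.allowed) {
    counters.allowed += 1;
  } else {
    counters.denied += 1;
    counters.deniedByLane[event.bindingAxis] += 1;
  }
}

function sumByLane(counters: LaneCounters): number {
  const lanes: Lane[] = ["rate", "concurrency", "cost", "policy"];
  return lanes.reduce((sum, lane) => sum + counters.deniedByLane[lane], 0);
}

const counters = newCounters();
const events: AdmissionEvent[] = [
  { allowed: true },
  { allowed: false, bindingAxis: "rate" },
  { allowed: true },
  { allowed: false, bindingAxis: "concurrency" },
  { allowed: false, bindingAxis: "rate" },
  { allowed: true },
  { allowed: false, bindingAxis: "cost" },
];
for (const event of events) record(counters, event);
```

Modelling the event as a discriminated union means a denial without a lane cannot be constructed, so the invariant holds by type rather than by convention.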
See also: Operations (OTel, the analytics tap, headers, failure modes) · Unified admission (the rate × concurrency × cost decision the axis lane attributes) · Distributed & provable (the overshoot bound the concurrency readout tracks).
ThrottleKit · MIT · 1.0 — API frozen under SemVer (Stability)