
Dashboard

AstorisTheBrave edited this page Jun 21, 2026 · 4 revisions


A React/Vite SPA, bundled into the wheel and served by Argus at dashboard_path (default /) on the bot's loop. It is a single pane: Overview, Interactions, Gateway, Grafana, and Analytics. Disable with Argus(bot, dashboard=False).

Routes

build_app(registry, metrics_path, dashboard=..., middlewares=...) mounts:

| Route | Purpose |
| --- | --- |
| GET {dashboard_path} | SPA index.html |
| GET {dashboard_path}assets/* | hashed JS/CSS/fonts |
| GET /api/config | {namespace, metrics_path, grafana_url, analytics_enabled, version, auth_required} |
| GET /api/stream | SSE: a metric snapshot now, then one every dashboard_interval seconds |
| GET /api/analytics/interaction-volume | per-guild daily counts (analytics) |
| GET /api/analytics/command-stats | per-guild count + avg ms per command |
| GET /api/analytics/avg-duration | per-guild overall avg ms |
| GET /metrics, /healthz | unchanged |

Snapshot protocol

/api/stream is text/event-stream. Each event is data: <json>\n\n where the JSON is build_snapshot(registry):

{"metrics": {"discord_guilds": {"type": "gauge",
  "samples": [{"name": "discord_guilds", "labels": {"cluster": "default"}, "value": 3.0}]}}}

It walks CollectorRegistry.collect(), so gauges are still read live at scrape time and every exposed metric is covered automatically. Note prometheus_client strips _total from a counter's family name; the samples keep it (e.g. family discord_interactions, sample discord_interactions_total).
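As an illustration of the framing described above, here is a minimal sketch of decoding a captured /api/stream body outside the SPA. The helper names iter_sse_json and sample_values are hypothetical and not part of Argus; the payload is hand-written in the snapshot shape shown earlier, with an assumed counter value.

```python
import json
from typing import Iterator


def iter_sse_json(stream_text: str) -> Iterator[dict]:
    """Yield the decoded JSON payload of each `data:` event in an SSE body."""
    # Events are separated by a blank line; normalise CRLF first.
    for block in stream_text.replace("\r\n", "\n").split("\n\n"):
        data_lines = [
            line[len("data:"):].removeprefix(" ")
            for line in block.split("\n")
            if line.startswith("data:")
        ]
        if data_lines:
            # Multiple data lines in one event are joined with "\n".
            yield json.loads("\n".join(data_lines))


def sample_values(snapshot: dict, sample_name: str) -> list[float]:
    """Collect values for a sample name, searching every metric family."""
    return [
        sample["value"]
        for family in snapshot["metrics"].values()
        for sample in family["samples"]
        if sample["name"] == sample_name
    ]


# Hand-written example event (assumed values), not captured from a real bot.
body = (
    'data: {"metrics": {"discord_interactions": {"type": "counter", "samples": '
    '[{"name": "discord_interactions_total", "labels": {"cluster": "default"}, '
    '"value": 12.0}]}}}\n\n'
)
snapshots = list(iter_sse_json(body))
print(sample_values(snapshots[0], "discord_interactions_total"))  # [12.0]
```

Looking samples up by sample name rather than family name sidesteps the _total difference: the family key is discord_interactions, but the sample is still found under discord_interactions_total.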

The SPA prefers SSE and falls back to polling /metrics (parsed client-side into the same shape). It keeps a rolling buffer of recent snapshots to draw rates and sparklines (uPlot), and computes histogram quantiles client-side.
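The SPA does this work in the browser; the sketch below only illustrates the two calculations in Python. It assumes Prometheus-style cumulative buckets and linear interpolation within a bucket, which may differ from what the SPA actually implements. The function names and sample numbers are hypothetical.

```python
import math
from collections import deque


def per_second_rate(points: "deque[tuple[float, float]]") -> float:
    """Rate between the oldest and newest (timestamp, counter value) points."""
    if len(points) < 2:
        return 0.0
    (t0, v0), (t1, v1) = points[0], points[-1]
    if t1 <= t0 or v1 < v0:
        # Non-increasing clock or a counter reset: report no rate.
        return 0.0
    return (v1 - v0) / (t1 - t0)


def histogram_quantile(q: float, buckets: "dict[float, float]") -> float:
    """Estimate the q-quantile from cumulative {upper_bound: count} buckets."""
    bounds = sorted(buckets)
    if not bounds or not math.isinf(bounds[-1]):
        raise ValueError("buckets must include the +Inf bucket")
    total = buckets[bounds[-1]]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound in bounds:
        count = buckets[bound]
        if count >= rank:
            if math.isinf(bound) or count == prev_count:
                # Overflow bucket (or empty bucket): fall back to the lower bound.
                return prev_bound
            # Linear interpolation inside the bucket that contains the rank.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Rolling buffer of the last three (timestamp, counter value) snapshots.
buffer = deque(maxlen=3)
for point in [(0.0, 10.0), (5.0, 20.0), (10.0, 40.0), (15.0, 70.0)]:
    buffer.append(point)
print(per_second_rate(buffer))  # 5.0

latency = {0.1: 50.0, 0.5: 90.0, 1.0: 100.0, math.inf: 100.0}
print(histogram_quantile(0.95, latency))  # approximately 0.75
```

Bounding the buffer with maxlen keeps memory constant and makes the rate window follow the most recent snapshots.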

Auth

If dashboard_auth_token is set, an aiohttp middleware requires Authorization: Bearer <token> or ?token= (EventSource cannot set headers), compared with hmac.compare_digest. /healthz and the metrics path stay open so a Prometheus scraper does not need the token. The SPA reads the token from ?token= or localStorage and shows a prompt on 401.
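The check described above can be sketched as a pure function. This is not Argus's middleware: extract_token, is_authorized, and the open_paths default are hypothetical, and plain dicts stand in for request headers and query parameters (real header lookups are case-insensitive).

```python
import hmac
from typing import Mapping, Optional


def extract_token(headers: Mapping[str, str], query: Mapping[str, str]) -> Optional[str]:
    """Return the bearer token from the header, else the ?token= parameter."""
    scheme, _, value = headers.get("Authorization", "").partition(" ")
    if scheme.lower() == "bearer" and value.strip():
        return value.strip()
    return query.get("token")


def is_authorized(
    path: str,
    headers: Mapping[str, str],
    query: Mapping[str, str],
    expected: Optional[str],
    open_paths: tuple = ("/healthz", "/metrics"),  # assumed default metrics path
) -> bool:
    """Decide whether a request may proceed."""
    if not expected or path in open_paths:
        # No token configured, or a path that stays open for scrapers.
        return True
    supplied = extract_token(headers, query)
    if supplied is None:
        return False
    # Constant-time comparison; bytes avoid the ASCII-only limit on str inputs.
    return hmac.compare_digest(supplied.encode(), expected.encode())


print(is_authorized("/api/stream", {}, {"token": "s3cret"}, "s3cret"))  # True
print(is_authorized("/api/config", {"Authorization": "Bearer nope"}, {}, "s3cret"))  # False
```

The query-parameter path exists because EventSource cannot attach an Authorization header, so the stream endpoint has to accept ?token=.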

Analytics section

Shown only when /api/config.analytics_enabled is true (i.e. enable_per_guild + clickhouse_dsn). Enter a guild id to load command stats (count + avg ms), overall average duration, and interaction volume. The API fails closed (403) without a token. See History and ClickHouse.

Grafana section

When grafana_url is set, this section links to and embeds the four provisioned dashboards (argus-overview, argus-interactions, argus-gateway, argus-health); otherwise it shows an empty state with setup guidance.

Alerting and recording rules

The self-host stack ships Prometheus rules in prometheus/rules/argus.rules.yml (loaded via rule_files in prometheus/prometheus.yml and mounted by docker-compose.yml). They are conservative starting points: tune the thresholds and the alerts' for: windows to your bot.

  • Recording rules: per-cluster app/prefix command error ratio (multi-window: 5m/30m/1h/6h) and p95 app command latency, reused by the Health dashboard.
  • Alerts: ArgusDown, ArgusSubsystemDown, ArgusInstrumentationErrors, ArgusHistoryEventsDropped, DiscordShardsDisconnected, DiscordHighCommandErrorRatio, DiscordRateLimited.
  • SLO burn-rate alerts (99% app-command success, 1% budget): a fast burn (ArgusAppCommandErrorBudgetBurnFast, pages) and a slow burn (ArgusAppCommandErrorBudgetBurnSlow, tickets), following the Google SRE multi-window multi-burn-rate pattern.
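To make the burn-rate arithmetic concrete, here is a small sketch of the multi-window condition. The factors 14.4 (1h with 5m) and 6 (6h with 30m) are the example values from the Google SRE Workbook; the thresholds and window pairings in argus.rules.yml may differ, so treat them as assumptions. The function names and error ratios are hypothetical.

```python
def burn_rate(error_ratio: float, slo: float = 0.99) -> float:
    """How many times faster than allowed the error budget is being spent."""
    budget = 1.0 - slo  # 1% budget for a 99% success SLO
    return error_ratio / budget


def is_burning(long_ratio: float, short_ratio: float, factor: float, slo: float = 0.99) -> bool:
    """Fire only when both the long and the short window exceed the factor."""
    return burn_rate(long_ratio, slo) > factor and burn_rate(short_ratio, slo) > factor


# Assumed factors from the SRE Workbook example, not read from the Argus rules.
FAST_FACTOR = 14.4  # paired with the 1h and 5m windows
SLOW_FACTOR = 6.0   # paired with the 6h and 30m windows

# 20% errors over 1h and 30% over 5m: a fast burn that would page.
print(is_burning(0.20, 0.30, FAST_FACTOR))  # True
# 20% over 1h but only 5% over 5m: the incident is already recovering.
print(is_burning(0.20, 0.05, FAST_FACTOR))  # False
# 8% over 6h and 7% over 30m: a slow burn that would open a ticket.
print(is_burning(0.08, 0.07, SLOW_FACTOR))  # True
```

Requiring the short window as well means an alert stops firing soon after the error ratio drops, instead of lingering until the long window drains.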

The Argus - Health Grafana dashboard (argus-health) visualises argus_up, argus_subsystem_up, instrumentation-error and history-drop rates, the command error ratio, and p95 latency, plus a Golden signals (RED) row: traffic (command throughput), errors (error ratio), and saturation (dropped events).

The rules are unit-tested with promtool test rules in CI, so a broken alert or recording rule fails the build.

Security

With no token the dashboard exposes the same operational data as /metrics to anyone who can reach the port. For anything public, set dashboard_auth_token and either bind to localhost or put the server behind a reverse proxy.

Building the SPA (contributors)

The SPA lives in frontend/ (React 19, Vite, uPlot, lucide-react, Nimble tokens). npm run dev proxies /api and /metrics to localhost:9191. npm run build outputs to frontend/dist; the hatch build hook copies it into src/argus/dashboard/static/ at wheel build (end users need no Node). Only two subset woff2 fonts ship; .ttf is excluded from the wheel.

Fleet mode

The same SPA bundle also powers the Fleet control plane: when /api/config reports fleet: true, the app renders the Global -> Fleet -> Cluster drill-down (with per-cluster trend sparklines) instead of the per-process dashboard. The per-process view above is unchanged.
