Architecture and Invariants
AstorisTheBrave edited this page Jun 20, 2026 · 1 revision
discord.py bot
│ events, state
▼
core hooks -> instrumentation -> MetricRegistry (neutral model)
│ reads from
├───────────────┬───────────────────┐
▼               ▼                   ▼
adapters/       adapters/           history/ (optional)
prometheus      otlp                EventSink -> ClickHouse
│
▼
exposition (aiohttp): /metrics, /healthz, dashboard, /api/*
- core (src/argus/core/): collector.py (the neutral MetricRegistry, MetricDef, MetricKind, the MetricBackend protocol), metrics.py (the catalogue + scrape-time gauge callbacks), hooks.py (listener registration + teardown), instrumentation.py (the hook bodies + the fail-open wrapper).
- adapters (src/argus/adapters/): base.py (Adapter ABC), prometheus.py (custom collector + held Counter/Histogram/Info on one CollectorRegistry), otlp.py (OpenTelemetry push).
- exposition (src/argus/exposition/server.py): build_app(registry, ...) factory + start_server (AppRunner/TCPSite, no web.run_app).
- dashboard (src/argus/dashboard/): route registration, auth middleware, the JSON snapshot, the built SPA under static/.
- history (src/argus/history/): EventSink ABC, NullSink, BatchingSink, ClickHouseSink, and AnalyticsQuery.
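To make the split between the neutral registry and its adapters concrete, here is a minimal sketch of how a backend protocol can keep core free of adapter imports. Only the names MetricRegistry, MetricDef, MetricKind, and MetricBackend come from this page; the method names (on_define, on_increment, attach), the RecordingBackend class, and the argus_commands_total metric are assumptions made for illustration and may not match the actual code.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Protocol


class MetricKind(Enum):
    COUNTER = "counter"
    GAUGE = "gauge"
    HISTOGRAM = "histogram"


@dataclass(frozen=True)
class MetricDef:
    name: str
    kind: MetricKind
    help: str
    labels: tuple = ()


class MetricBackend(Protocol):
    """What core expects from an adapter. Method names are hypothetical."""

    def on_define(self, definition: MetricDef) -> None: ...

    def on_increment(self, name: str, labels: dict, amount: float) -> None: ...


class MetricRegistry:
    """Neutral model: knows metric definitions, forwards updates to backends."""

    def __init__(self) -> None:
        self._defs: dict = {}
        self._backends: list = []

    def define(self, definition: MetricDef) -> None:
        self._defs[definition.name] = definition
        for backend in self._backends:
            backend.on_define(definition)

    def attach(self, backend: MetricBackend) -> None:
        # Replay existing definitions so attach order does not matter.
        for definition in self._defs.values():
            backend.on_define(definition)
        self._backends.append(backend)

    def increment(self, name: str, labels: dict | None = None, amount: float = 1.0) -> None:
        if name not in self._defs:
            raise KeyError(f"undefined metric: {name}")
        for backend in self._backends:
            backend.on_increment(name, labels or {}, amount)


class RecordingBackend:
    """Toy adapter that satisfies MetricBackend structurally, without inheriting."""

    def __init__(self) -> None:
        self.defined: list = []
        self.totals: dict = {}

    def on_define(self, definition: MetricDef) -> None:
        self.defined.append(definition.name)

    def on_increment(self, name: str, labels: dict, amount: float) -> None:
        key = (name, tuple(sorted(labels.items())))
        self.totals[key] = self.totals.get(key, 0.0) + amount
```

Because the protocol is structural, an adapter module imports from core but core never needs to import the adapter, which is the dependency direction the layering relies on.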
The invariants below are enforced by structure and tests, not convention.
- Collection is decoupled from exposition. core never imports an adapter. Adapters depend on core via the MetricBackend protocol. A test parses the core package AST and fails if it imports prometheus_client, opentelemetry, clickhouse, or argus.adapters.
- No unbounded-cardinality label. guild_id/user_id/channel_id are never Prometheus labels. Allowed labels are bounded: shard, cluster, command, event, status, type, error_type, logger, level, hook. A test asserts the forbidden set appears in no metric definition or exposition.
- Hooks are O(1) and non-blocking. Hook bodies are synchronous, do counter increments and label lookups only, and never await I/O. An overhead benchmark fires 20k events and asserts sub-0.5 ms/event.
- Gauges are read at scrape time. Live state (latency, counts, shard state) is read by a custom collector each scrape via the neutral gauge callback. No background poller caches values.
- The SDK fails open. Every hook runs inside _safe; an exception is counted (argus_instrumentation_errors_total), logged, and swallowed. It is never re-raised into the bot loop.
- One config object. Everything reads a single resolved ArgusConfig. No module reads env vars or defaults independently.
- Operational and analytical paths are separate. enable_per_guild only affects the history sink; it can never add a guild_id label to Prometheus. The analytics API fails closed without a token.
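The fail-open rule can be sketched as a small decorator. This is an illustration, not the actual _safe implementation: the decorator signature, the logger name, and the in-memory Counter standing in for argus_instrumentation_errors_total are all assumptions.

```python
import functools
import logging
from collections import Counter

log = logging.getLogger("argus.instrumentation")  # hypothetical logger name

# Stand-in for the argus_instrumentation_errors_total counter, keyed by hook name.
instrumentation_errors = Counter()


def _safe(hook_name):
    """Wrap a synchronous hook so that any exception is counted, logged, and swallowed."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception:
                # Count first, then log; never re-raise into the caller.
                instrumentation_errors[hook_name] += 1
                log.exception("instrumentation hook %r failed", hook_name)
                return None

        return wrapper

    return decorator


@_safe("on_message")
def on_message_hook(message):
    # Simulated bug inside a hook body.
    raise RuntimeError("boom")
```

Calling on_message_hook(...) here returns None and bumps the error count for the on_message hook, while the caller never sees the RuntimeError. Catching Exception rather than BaseException leaves task cancellation and interpreter shutdown untouched.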
Argus(bot) constructs an ArgusCog: it builds the registry, defines the
catalogue, attaches the Prometheus adapter (and OTLP if otlp_endpoint is set),
builds the sink, registers listeners synchronously, and sets the info metric.
It then chains bot.setup_hook to await bot.add_cog(cog). The cog's
cog_load starts the aiohttp server; cog_unload removes listeners, cleans up
the runner, closes the sink and any analytics client.
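A minimal sketch of chaining setup_hook follows. To stay runnable without discord.py it uses a FakeBot stand-in; chain_setup_hook and FakeBot are hypothetical names, and running the original hook before add_cog is an assumption, since the page does not state the order.

```python
import asyncio


def chain_setup_hook(bot, cog):
    """Replace bot.setup_hook with a coroutine that runs the original, then adds the cog."""
    original = bot.setup_hook  # bound method captured before it is overwritten

    async def chained():
        await original()
        await bot.add_cog(cog)

    bot.setup_hook = chained


class FakeBot:
    """Stand-in exposing the two coroutines the chain relies on."""

    def __init__(self):
        self.calls = []

    async def setup_hook(self):
        self.calls.append("user setup_hook")

    async def add_cog(self, cog):
        self.calls.append(f"add_cog:{cog}")


if __name__ == "__main__":
    bot = FakeBot()
    chain_setup_hook(bot, "ArgusCog")
    asyncio.run(bot.setup_hook())
    print(bot.calls)
```

Capturing the original bound method first means a setup_hook the bot author already defined keeps running, which mirrors how the CommandTree.on_error chain is described.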
App command errors have no bot event, so CommandTree.on_error is chained (the
original is preserved and still runs). Rate-limit/log signal comes from a
logging.Handler on the discord logger hierarchy, never by monkeypatching the
HTTP client.
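The logging-based signal can be sketched with a plain logging.Handler attached to the discord logger, which also receives records from child loggers such as discord.http through propagation. The class name LogSignalHandler, the WARNING threshold, and detecting rate limits by the substring "rate limited" are assumptions for illustration, not the actual implementation.

```python
import logging
from collections import Counter


class LogSignalHandler(logging.Handler):
    """Counts warning-and-above records by (logger, level) and flags rate-limit messages."""

    def __init__(self):
        super().__init__(level=logging.WARNING)
        self.records_by_logger_level = Counter()
        self.rate_limit_hits = 0

    def emit(self, record):
        try:
            # Logger name and level are bounded values, so they are safe as labels.
            self.records_by_logger_level[(record.name, record.levelname)] += 1
            # Assumption: rate-limit warnings contain this phrase.
            if "rate limited" in record.getMessage().lower():
                self.rate_limit_hits += 1
        except Exception:
            # Standard logging convention: never let a handler raise into the caller.
            self.handleError(record)


handler = LogSignalHandler()
logging.getLogger("discord").addHandler(handler)
```

Teardown is the mirror image: calling removeHandler(handler) on the same logger restores the previous state, with no change to the HTTP client itself.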