FAQ
Short answers to the things people hit first. See Configuration and Dashboard for the full detail.
No. Argus(bot) is the whole integration: metrics at /metrics, the dashboard
at /, on port 9191. Everything else is opt-in.
Set one environment variable on the host/process that runs your bot:
ARGUS_DASHBOARD_AUTH_TOKEN=your-secret
Argus reads it automatically (no kwarg needed) and uses it for both serving and
gating. There is nothing else to host: the dashboard is served by Argus inside
your bot process, not as a separate app. The token gates / and every /api/*
route; /metrics and /healthz stay open so a Prometheus scraper does not need
it.
You can also pass it in code: Argus(bot, dashboard_auth_token="..."). The kwarg
wins over the env var if both are set.
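The precedence rule above can be illustrated with a small sketch. This is not Argus's actual code; `resolve_dashboard_token` is a hypothetical helper, and treating an empty variable as "no token" is an assumption.

```python
import os
from typing import Mapping, Optional


def resolve_dashboard_token(
    kwarg: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the effective token: the explicit kwarg first, then the env var."""
    if kwarg is not None:
        # The kwarg wins when both are set.
        return kwarg
    # Assumption: an unset or empty variable means "no token configured".
    return env.get("ARGUS_DASHBOARD_AUTH_TOKEN") or None
```

Passing the environment as a parameter keeps the rule easy to check without touching the real process environment.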
Open the dashboard once with the token in the URL and it is remembered in that browser (localStorage):
http://your-host:9191/?token=your-secret
After that, plain http://your-host:9191/ works. Programmatic clients send
Authorization: Bearer your-secret.
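For a programmatic client, a minimal sketch using only the Python standard library might look like the following. The helper name `build_dashboard_request` is hypothetical; the host, port, and token are the placeholders used above.

```python
import urllib.request


def build_dashboard_request(base_url: str, path: str, token: str) -> urllib.request.Request:
    """Build a GET request for a token-gated route with a Bearer header."""
    # Join base and path with exactly one slash between them.
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


# Placeholder values; replace them with your own host and secret.
request = build_dashboard_request("http://your-host:9191", "/", "your-secret")
```

Passing `request` to `urllib.request.urlopen` sends it. A 401 response means the token did not match.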
You do not. Argus serves the SPA from the same aiohttp server as /metrics, on
your bot's event loop. If you want it on a public URL, put a reverse proxy in
front of the bot's port and set the token. If you want it on a different path or
port, use dashboard_path / port.
Argus(bot, dashboard=False) (or ARGUS_DASHBOARD=false). /metrics still
serves.
By design. guild_id/user_id/channel_id are unbounded and would explode
Prometheus series cardinality, so they are never labels (invariant 2).
Per-guild figures live in the analytical path: set enable_per_guild=true and
clickhouse_dsn, then use the dashboard's Analytics section. See History and
ClickHouse.
It reflects the cache, which needs the members intent enabled on your bot to be meaningful.
Run one Argus per process with a distinct cluster_id; the cluster label
separates them and counter rates aggregate across the fleet. See Clustering.
The /metrics endpoint cannot be moved out of the bot process: gauges read live
bot state at scrape time. So collection is always in-process. What you centralise
is the view (Grafana, which aggregates all processes via Prometheus) and the
storage (one shared ClickHouse for analytics). The built-in per-process dashboard
is for a single process; use Grafana for a fleet view (set grafana_url).
No. If each cluster is on its own host/container/pod (the normal case), keep the
same port 9191 everywhere - ports do not collide across hosts. Only co-located
processes need distinct ports; in that case, assign them via ARGUS_PORT from your
orchestrator and use Prometheus service discovery (Kubernetes PodMonitor,
file_sd, DNS) instead of a hand-written 100-target scrape config. Full guidance:
Clustering.
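For the co-located case, one way to avoid a hand-written target list is to generate a Prometheus file_sd file. The sketch below assumes a hypothetical scheme in which N processes on one host take consecutive ports starting at 9191; the function names and the host name are illustrative, not part of Argus.

```python
import json
from typing import Dict, List


def file_sd_targets(host: str, ports: List[int]) -> List[Dict[str, List[str]]]:
    """Return one file_sd target group covering every process on a host."""
    # No "cluster" target label is attached: Argus exports its own cluster
    # label, and a clashing target label makes Prometheus rename one of them.
    return [{"targets": [f"{host}:{port}" for port in ports]}]


def render_file_sd(host: str, base_port: int, count: int) -> str:
    """Render the JSON document that a file_sd_configs entry can read."""
    # Assumption: consecutive ports, one per co-located process.
    ports = [base_port + offset for offset in range(count)]
    return json.dumps(file_sd_targets(host, ports), indent=2)
```

Writing the output of `render_file_sd("bot-host", 9191, 4)` to a file watched by `file_sd_configs` lets Prometheus pick up the targets without a config reload.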
No measurable amount. Hooks are O(1), non-blocking, and fail-open (an instrumentation error is counted and swallowed, never raised into your bot). The metrics server runs on the bot's existing loop.
pip install "argus-dpy[otlp]" and set otlp_endpoint. Argus pushes via
OpenTelemetry in addition to the Prometheus endpoint. See OTLP.
docker compose up -d in the repo brings up a provisioned Prometheus + Grafana
with three dashboards. Point the dashboard's Grafana section at them with
grafana_url.
ghcr.io/astoristhebrave/argus:<version> ships the released SDK + the example
bot. Pin a version tag so a mid-development change can never reach your
deployment; :latest tracks the newest release. See Releasing.
The dashboard is blank / "waiting for the first sample". The bot has not
logged in yet, or it just started. The first snapshot appears within a few
seconds of the bot connecting. If it never appears, check the browser console and
that /metrics returns data.
Port already in use / OSError: address already in use. Another process
holds the port. Argus is fail-open here too: it logs the failure, sets
argus_subsystem_up{subsystem="server"} 0, and your bot keeps running normally -
it just serves no metrics until you fix the bind. Change ARGUS_PORT (or
port=), or stop the other process. In a clustered single-host deploy give each
process a distinct ARGUS_PORT. Alert on argus_subsystem_up == 0 to catch this.
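If you want to catch a port clash before the bot starts, a pre-flight check along these lines is one option. It is a sketch, not something Argus provides: `port_is_free` is a hypothetical helper, the default bind address is an assumption, and the check is inherently racy because another process can take the port between the probe and the real bind.

```python
import socket


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can currently bind host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically "address already in use": another process holds it.
            return False
    return True
```

Because the result can go stale immediately, treat it as an early warning and keep the argus_subsystem_up alert as the real safety net.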
Prometheus shows exported_cluster instead of cluster. You set a cluster
target label in your scrape config that clashes with Argus's own label;
Prometheus renames the conflicting one. Remove the target label. See
Clustering.
argus_instrumentation_errors_total is non-zero. Instrumentation is
fail-open: an error in a hook was counted and swallowed (your bot was never
affected). A rising counter is a signal to investigate (raise the argus logger
to DEBUG), not an outage.
discord_cached_users is 0 or wrong. Enable the members intent on the
bot (and in the Discord developer portal); the cache is empty without it.
401 on the dashboard. A token is set; open with ?token=... once, or send
Authorization: Bearer <token>. /metrics and /healthz stay open.
Behind a reverse proxy, client IPs look wrong. aiohttp does not trust
X-Forwarded-For by default. Terminate TLS and set real-IP headers at the proxy;
do not expose the bot/control-plane port directly.
It refuses to start: "refusing to bind ... without a token". Secure by
default: a non-loopback bind needs a token. Set ARGUS_FLEET_TOKEN (or
ARGUS_FLEET_TOKEN_FILE, or split ARGUS_FLEET_INGEST_TOKEN +
ARGUS_FLEET_VIEWER_TOKEN), bind to 127.0.0.1, or set ARGUS_FLEET_INSECURE=1
for local testing only.
A bot does not appear in the fleet. Check the bot's ARGUS_FLEET_URL is
reachable from the bot host and the token matches (the bot uses the ingest
token when split tokens are in use). Registration is fail-open and silent by
design; raise the argus/argus.fleet logger to DEBUG to see why. From the bot
host you can confirm reachability with curl http://fleet-host:9190/healthz
(open, no token); doctor is an operator tool and reads the view, so it needs the
viewer token: python -m argus.fleet doctor --url <fleet> --token <viewer>.
A cluster shows "down" but it is alive. Its heartbeat is not reaching the
control plane, or ARGUS_FLEET_HEARTBEAT_INTERVAL / ARGUS_FLEET_TTL_FACTOR are
too tight for its network. A cluster is up while now - last_seen <= interval * ttl_factor.
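The liveness rule above translates directly into code. The function name and the sample numbers below are illustrative only; they are not Argus's defaults.

```python
def cluster_is_up(now: float, last_seen: float, interval: float, ttl_factor: float) -> bool:
    """Apply the rule: up while now - last_seen <= interval * ttl_factor."""
    return now - last_seen <= interval * ttl_factor


# Illustrative values: a 15 s heartbeat interval and a TTL factor of 3
# give a 45 s window before a cluster is reported as down.
on_time = cluster_is_up(now=100.0, last_seen=55.0, interval=15.0, ttl_factor=3.0)
late = cluster_is_up(now=101.0, last_seen=55.0, interval=15.0, ttl_factor=3.0)
```

With these numbers, a heartbeat that is exactly 45 seconds old still counts as up, and one that is 46 seconds old does not. Raising either setting widens the window for a slow network.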
Per-fleet numbers reset after a restart. The state file was not persisted.
Set ARGUS_FLEET_STATE to durable storage (the container image mounts /data;
the wizard sets it). Without persistence, numbers restart from 1.
A second control plane will not start ("another argus-fleet process holds...").
By design: two processes must not share one state file. Run one control plane per
state file, or give the second its own ARGUS_FLEET_STATE.
The view shows clusters up but all metrics are 0. Likely a namespace
mismatch: the members' ARGUS_NAMESPACE must equal the control plane's. Confirm
with python -m argus.fleet doctor --url <fleet> --token <viewer> --namespace <expected>. With the push source, per-second rates are also 0 until a second
heartbeat arrives; for exact rates use the Prometheus source.
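The reason a push-based rate is 0 after a single heartbeat is that a rate needs two counter samples. The sketch below shows the general idea only; it is not the control plane's actual computation, and the names and the reset handling are assumptions.

```python
from typing import Optional, Tuple

# A sample is (timestamp_seconds, counter_value).
Sample = Tuple[float, float]


def per_second_rate(previous: Optional[Sample], current: Sample) -> float:
    """Return the counter's per-second rate between two samples."""
    if previous is None:
        # Only one heartbeat so far: there is nothing to difference against.
        return 0.0
    elapsed = current[0] - previous[0]
    delta = current[1] - previous[1]
    if elapsed <= 0 or delta < 0:
        # Assumption: report 0 on a non-advancing clock or a counter reset.
        return 0.0
    return delta / elapsed
```

The result is an average over the heartbeat interval, which is why the Prometheus source is the better choice when exact rates matter.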
429 Too Many Requests from register/heartbeat. Rate limiting kicked in
(per-IP for register, per-identity for heartbeat). Raise
ARGUS_FLEET_REGISTER_BURST / ARGUS_FLEET_HEARTBEAT_BURST, or slow the caller.
413 Request Entity Too Large on heartbeat. The snapshot exceeded
ARGUS_FLEET_MAX_BODY_BYTES (256 KiB default). Raise it if you have a legitimate
reason; otherwise something is sending an oversized body.
403 fleet cluster cap reached. You hit ARGUS_FLEET_MAX_CLUSTERS. Raise the
cap (and check you are not leaking new identities; reuse a stable CLUSTER_ID /
fleet_id).
argus_fleet_identity_conflicts_total is climbing. Two processes are
registering under the same identity from different hosts (a duplicate
CLUSTER_ID/fleet_id), so the number/health flaps. Give each process a unique
identity (e.g. a StatefulSet pod name).
The fleet dashboard 401s but the bots register fine (or vice versa). You are
using split tokens: the viewer token gates the UI and /api/*, and the ingest
token gates register/heartbeat. Use the right one for each.
python -m argus.fleet ignores my .env. Autoload needs the extra:
pip install 'argus-dpy[fleet]' (or point ARGUS_FLEET_ENV_FILE at it). Without
it, load the .env via the generated compose env_file: or a systemd
EnvironmentFile=.
Do I need CORS for the fleet dashboard? No, unless you serve the SPA from a
different origin than the API (e.g. the UI on a CDN). The bundled same-origin UI
needs nothing. For a detached UI set ARGUS_FLEET_CORS_ORIGINS to the explicit
origin(s). Hosting the bot and the dashboard on different providers does not
require CORS.
No data at my backend. Check the collector first (a debug exporter proves
the bot reached it). If the collector sees data but the backend does not, the
problem is the collector's exporter/credentials, not Argus.
Connection refused / handshake errors. The endpoint must be the gRPC receiver
(:4317), reachable from the bot; use https:// only if the collector terminates
TLS. See the OTLP tutorial.
ImportError on start. Install the extra: pip install "argus-dpy[otlp]".
Analytics section is empty or returns 403. The analytics API fails closed
without dashboard_auth_token; set it and open with ?token=.
No rows appear. You need both enable_per_guild=true and a valid
clickhouse_dsn; with either missing, the sink is a no-op. Check connectivity to
ClickHouse's HTTP port (8123). See the analytics tutorial.
The events table is growing without bound. Add a ClickHouse TTL to
argus_events (e.g. drop rows older than 90 days).