Security
How Argus is secured, what is exposed, and how to deploy it safely. For reporting a vulnerability, see SECURITY.md (private advisory via the GitHub Security tab, or email).
| Surface | Default | Gate |
|---|---|---|
| Bot `/metrics` | open | scrape-only aggregate data; keep open for Prometheus, restrict at the network if needed |
| Bot dashboard `/` + `/api/*` | open | set `dashboard_auth_token` (or `ARGUS_DASHBOARD_AUTH_TOKEN`) |
| Bot `/healthz` | open | liveness only |
| Fleet control plane (all routes) | refuses to start on a public bind without a token | `ARGUS_FLEET_TOKEN` (or split ingest/viewer), `/healthz` + `/readyz` open |
guild_id, user_id, and channel_id are never Prometheus labels - enforced
by a test, not convention. Per-entity questions go only to the optional analytical
path (ClickHouse), which is separate storage and fails closed without a token. The
fleet control plane reads aggregate metrics only and cannot expose per-entity data.
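
The text says the label rule is enforced by a test but does not show one. The following is a minimal sketch of what such a check could look like, assuming the metrics are available in the Prometheus text exposition format. The function name, the `FORBIDDEN_LABELS` constant, and the sample metric names are hypothetical and are not taken from the Argus test suite.

```python
import re

# Hypothetical denylist mirroring the rule stated above.
FORBIDDEN_LABELS = {"guild_id", "user_id", "channel_id"}

# Matches a label name followed by =" inside a sample's {...} block.
_LABEL_NAME = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)\s*=\s*"')


def forbidden_labels_in(exposition: str) -> set:
    """Return forbidden label names used by any sample in a text exposition."""
    found = set()
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        start, end = line.find("{"), line.rfind("}")
        if start == -1 or end < start:
            continue  # sample without labels
        names = _LABEL_NAME.findall(line[start + 1:end])
        found.update(name for name in names if name in FORBIDDEN_LABELS)
    return found
```

A test would render the registry to text and assert that `forbidden_labels_in(...)` returns an empty set. The regex is deliberately conservative: a label value that happens to contain `guild_id="` would also be flagged, which errs on the safe side.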
- Set `dashboard_auth_token` for anything not strictly localhost. It gates `/` and every `/api/*` route with a constant-time comparison; `/metrics`/`/healthz` stay open so a scraper does not need the token. If the dashboard binds off-loopback with no token, Argus logs a startup warning pointing at the fix - it cannot refuse to start (that would also take down `/metrics`, which Prometheus needs), so the warning is your signal to add a token or bind to `127.0.0.1`.
- The browser remembers a `?token=` link in localStorage. Query-string tokens can land in proxy/access logs; prefer the `Authorization: Bearer` header for programmatic clients, and treat the link as a credential.
- Every response carries the same security headers as the fleet plane (`X-Frame-Options: DENY`, `X-Content-Type-Options: nosniff`, a CSP, and a `Referrer-Policy`), the version banner is stripped, and the request body is capped. The live SSE stream caps concurrent connections so it cannot be turned into a resource-exhaustion vector.
- Disable the UI entirely with `dashboard=False` and rely on Grafana if you only want scraping.
- On a shared/public host you can also gate `/metrics` itself with `metrics_auth_token` (or `ARGUS_METRICS_AUTH_TOKEN`): the scraper then sends `Authorization: Bearer <token>` (or `?token=`). It is a separate token from the dashboard's, since the scraper is a different audience, and `/healthz` stays open. Prefer push (OTLP/Fleet) when you can avoid exposing a port at all.
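
To make "constant-time comparison" concrete, here is a minimal sketch of a bearer-token check built on the standard library's `hmac.compare_digest`. The helper names are hypothetical and this is not Argus's actual implementation; it only illustrates the technique the bullets describe.

```python
import hmac


def extract_bearer(authorization_header):
    """Return the token from an 'Authorization: Bearer <token>' value, or None."""
    if not authorization_header:
        return None
    scheme, _, token = authorization_header.partition(" ")
    token = token.strip()
    if scheme.lower() != "bearer" or not token:
        return None
    return token


def token_matches(presented, expected):
    """Compare tokens without leaking, via timing, where they first differ."""
    if not presented or not expected:
        return False  # a missing token never matches, even an unset one
    # compare_digest runs in time independent of the contents being compared.
    return hmac.compare_digest(presented.encode("utf-8"), expected.encode("utf-8"))
```

A plain `==` can return as soon as the first differing byte is found, which is why a dedicated comparison function is the usual choice for credentials.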
Secure-by-default and hardened (see Fleet for the full table):
- Refuse-insecure bind: a non-loopback bind with no token will not start. Set a token, bind loopback, or set `ARGUS_FLEET_INSECURE=1` (local testing only).
- Split tokens (optional): a low-privilege `ARGUS_FLEET_INGEST_TOKEN` lives on every bot; an `ARGUS_FLEET_VIEWER_TOKEN` gates the UI/read APIs - so a leaked bot token cannot read the dashboard. Each falls back to the shared token. `*_FILE` variants read from mounted secrets.
- Token rotation (zero downtime): any token var accepts a comma-separated list; every value is accepted. Rotate by adding the new token, rolling it out, then dropping the old one - no window where requests are rejected.
- Per-identity lease secret (optional): with `ARGUS_FLEET_REQUIRE_LEASE=1` the control plane mints a high-entropy secret at register and a member must present it on every heartbeat/re-register; a mismatch is `409`. So even a leaked ingest token can't hijack an existing cluster's slot. The secret is stored only as an HMAC-SHA256 digest, optionally keyed by `ARGUS_FLEET_SECRET_PEPPER` kept in a separate boundary from the state file; verification is constant-time. The member persists its lease `0600` and resends it automatically.
- Abuse resistance: request body cap (413); per-IP register, per-identity heartbeat, and per-client read (`ARGUS_FLEET_READ_BURST`, GET view/api/metrics) rate limits (429); and an `ARGUS_FLEET_MAX_CLUSTERS` cap. `/healthz`/`/readyz` stay exempt from the read limit.
- Audit log (optional): `ARGUS_FLEET_AUDIT_LOG=1` emits one INFO line per ingest event (register/heartbeat) with the identity, client address, and outcome (including `denied:lease`). Inputs are sanitized for log injection; the secret is never logged.
- Scanner/fingerprint reduction: the version banner is stripped and security headers (`X-Frame-Options: DENY`, CSP, `nosniff`, `Referrer-Policy`) are sent on every response; unknown paths and bad tokens get flat, detail-free responses.
- Single writer: an advisory lock on the state file refuses a second instance; the on-disk state is schema-versioned (unknown versions refuse to load).
- Duplicate-identity detection: a `CLUSTER_ID`/`fleet_id` reused from two hosts increments `argus_fleet_identity_conflicts_total` and logs a warning.
- Owner-only state: the control-plane state file and the member's identity file are written `0600`.
- Fail-open member: the bot's heartbeat is bounded and never raises into the bot loop, so a fleet outage cannot affect your bot.
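
Two of the mechanisms above, the comma-separated rotation list and the lease secret stored as an HMAC-SHA256 digest, can be sketched with the standard library. This is an illustration under stated assumptions, not the control plane's code: all function names are hypothetical, and `DEFAULT_KEY` (the key used when no pepper is configured) is an assumption, since the source does not say what happens in that case.

```python
import hashlib
import hmac
import secrets

# Assumption: some fixed key is used when no pepper is set.
DEFAULT_KEY = b"argus-lease-sketch"


def parse_token_list(raw):
    """Split a comma-separated token variable into non-empty tokens."""
    return [t.strip() for t in raw.split(",") if t.strip()]


def any_token_matches(presented, accepted):
    """Check every accepted token in constant time, without short-circuiting."""
    ok = False
    for candidate in accepted:
        ok |= hmac.compare_digest(presented.encode(), candidate.encode())
    return ok


def mint_lease_secret():
    """Mint a high-entropy secret (32 random bytes, URL-safe encoded)."""
    return secrets.token_urlsafe(32)


def lease_digest(secret, pepper=b""):
    """Return the hex HMAC-SHA256 digest that would be stored instead of the secret."""
    return hmac.new(pepper or DEFAULT_KEY, secret.encode(), hashlib.sha256).hexdigest()


def verify_lease(presented, stored_digest, pepper=b""):
    """Constant-time check of a presented secret against the stored digest."""
    return hmac.compare_digest(lease_digest(presented, pepper), stored_digest)
```

During a rotation the variable would hold both values (for example `old-token,new-token`), so `any_token_matches` accepts either until the old one is dropped. Because only the digest is stored, reading the state file does not reveal a usable lease secret, and with a pepper held elsewhere the digest cannot be recomputed from the state file alone.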
The identity is an address, not a secret, but two processes must never share one
(it makes the per-fleet number and health flap). The robust way: derive it from a
stable per-instance value - set cluster_id (or fleet_id) from the
orchestrator (a StatefulSet pod name, the hostname, ...). The identity resolves to
fleet_id -> cluster_id -> a random per-instance UUID, so this is automatic.
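
The fallback order described above can be sketched as a small function. This is a hypothetical helper for illustration, not Argus's resolver; it only shows the `fleet_id` -> `cluster_id` -> random UUID precedence and how a per-instance value such as the hostname could be supplied.

```python
import socket
import uuid


def resolve_identity(fleet_id=None, cluster_id=None,
                     generate=lambda: str(uuid.uuid4())):
    """Return the first configured identity, else a random per-instance UUID."""
    for candidate in (fleet_id, cluster_id):
        if candidate and candidate.strip():
            return candidate.strip()
    return generate()


# In a StatefulSet the pod hostname defaults to the pod name,
# so it is stable and unique per replica.
identity = resolve_identity(cluster_id=socket.gethostname())
```

Passing the generator as a parameter keeps the function easy to test and makes the random branch explicit: it runs only when neither value is configured.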
Don'ts:
- Don't bake the `argus-fleet-id` file into a container image - every copy would share it. Generate it on a per-instance writable volume, or set `cluster_id`/`fleet_id` instead.
- Don't mount one `fleet_state_dir` into multiple replicas.
- Don't set the same `cluster_id`/`fleet_id` on two processes.
You do not need CORS for the bundled, same-origin fleet UI - the browser only
ever talks to the control plane, which serves both the SPA and the API. Hosting the
bot and the dashboard on different machines/providers does not change this. CORS is
only needed if you serve the SPA from a different origin than the API; then set
ARGUS_FLEET_CORS_ORIGINS to an explicit allowlist (never a wildcard on a
token-gated surface).
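
As an illustration of what "explicit allowlist, never a wildcard" means in practice, here is a minimal sketch of origin matching. It assumes the allowlist is written as a comma-separated string, which the source does not confirm for `ARGUS_FLEET_CORS_ORIGINS`; the function names are hypothetical.

```python
def parse_origins(raw):
    """Parse an allowlist string, rejecting a wildcard outright."""
    origins = [o.strip().rstrip("/") for o in raw.split(",") if o.strip()]
    if "*" in origins:
        raise ValueError("wildcard origins are not allowed on a token-gated surface")
    return frozenset(origins)


def cors_headers(request_origin, allowed):
    """Return CORS response headers only for an exactly matching origin."""
    if request_origin and request_origin in allowed:
        # Echo the specific origin and tell caches the response varies by it.
        return {"Access-Control-Allow-Origin": request_origin, "Vary": "Origin"}
    return {}  # no CORS headers: the browser blocks the cross-origin read
```

Exact string matching avoids the common pitfalls of prefix or substring checks, where an origin such as `https://fleet.example.com.evil.test` could slip through.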
- Terminate TLS at a reverse proxy (or a tunnel) for any public deployment; tokens are bearer credentials. Behind a proxy, set `ARGUS_FLEET_TRUSTED_PROXY=1` so rate limits key on the real client IP, and add `Strict-Transport-Security` at the proxy.
- Run as non-root. The published images already do (uid 10001) and ship a `HEALTHCHECK`; the k8s example uses a read-only root filesystem and drops all capabilities.
- Restrict exposure. IP-allowlist or VPN-gate the control plane where possible; a non-default port and `/metrics` behind the token trim opportunistic scanning.
- Pin versions of the package and images so a mid-development change cannot reach a deployment.
- Secrets: use `*_TOKEN_FILE` / Kubernetes Secrets rather than inline env vars where you can; never commit tokens.
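
The `*_TOKEN_FILE` pattern can be sketched as follows. This is a hypothetical helper, not Argus's loader, and the precedence shown (file variant first, inline variable second) is an assumption.

```python
import os


def read_secret(name, environ=os.environ):
    """Read a secret from NAME_FILE (a mounted file) or, failing that, NAME."""
    path = environ.get(f"{name}_FILE")
    if path:
        with open(path, encoding="utf-8") as fh:
            # Mounted secrets often end with a newline; strip it.
            return fh.read().strip() or None
    value = environ.get(name, "").strip()
    return value or None
```

With a Kubernetes Secret mounted as a file, the token stays out of the pod spec's environment block and out of process listings that show environment variables.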
- Releases publish to PyPI via OIDC Trusted Publishing with attestations (no stored long-lived credentials).
- A CycloneDX SBOM of the wheel is attached to each GitHub release; the GHCR images (`argus`, `argus-fleet`) ship their own SBOM and max provenance attestation.
- CI runs CodeQL and a pip-audit dependency audit on every change, and Dependabot watches dependencies.
- The repo's threat model documents assets, trust boundaries, and the threats Argus defends against.