Clustering

Argus supports single-process and clustered (multi-process) deployments. Each process's cluster_id becomes the cluster label that keeps their metrics apart. It is the one setting that must differ per process, plus the port when processes share a host.

Single process

One AutoShardedBot, one Argus, one endpoint exposing all shards. cluster_id is optional (defaults to "default").

Argus(bot)

Clustered

Run one Argus per process, each with a distinct cluster_id and port:

Argus(bot, cluster_id="0", port=9191)   # process 0, shards 0..n
Argus(bot, cluster_id="1", port=9192)   # process 1, shards n+1..m
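Each process needs its own disjoint slice of the shard ids. The sketch below shows one way to split shard_count ids into contiguous ranges, one per cluster. shard_ranges is a hypothetical helper, not part of Argus. You would pass each range to the bot in that process, for example as the shard_ids/shard_count a discord.py AutoShardedBot accepts.

```python
from __future__ import annotations


def shard_ranges(shard_count: int, cluster_count: int) -> list[range]:
    """Split shard ids 0..shard_count-1 into contiguous, disjoint ranges, one per cluster.

    Hypothetical helper for illustration; Argus itself does not assign shards.
    """
    if cluster_count <= 0 or shard_count < cluster_count:
        raise ValueError("need 1 <= cluster_count <= shard_count")
    base, extra = divmod(shard_count, cluster_count)
    ranges = []
    start = 0
    for cluster in range(cluster_count):
        # The first `extra` clusters take one additional shard so every id is covered.
        size = base + (1 if cluster < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


if __name__ == "__main__":
    # cluster_id is the list position; the range is that process's shard_ids.
    for cluster_id, shards in enumerate(shard_ranges(10, 3)):
        print(cluster_id, list(shards))
```

Because the ranges are disjoint, shard ids stay globally unique across the fleet.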

Every metric carries the process's cluster label: the state gauges, every counter, and the duration histogram. Per-cluster breakdowns therefore work directly:

sum by (cluster) (rate(discord_interactions_total[5m]))
sum by (cluster) (discord_guilds)

To aggregate counter rates across the whole fleet, drop the by (cluster) clause, e.g. sum(rate(discord_interactions_total[5m])).
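To make the two query shapes concrete, here is a toy Python model of what they compute over a few already-rated series. It is an illustration only. The sample values and the command label are made up, and the real aggregation happens inside Prometheus.

```python
from collections import defaultdict

# Made-up per-series values, as rate(discord_interactions_total[5m]) might return them.
# Each entry is (labels, value); the "command" label is hypothetical.
samples = [
    ({"cluster": "0", "command": "ping"}, 1.5),
    ({"cluster": "0", "command": "help"}, 0.5),
    ({"cluster": "1", "command": "ping"}, 2.0),
]


def sum_by(samples, label):
    """Mimic PromQL `sum by (label) (...)`: group series on one label and add their values."""
    totals = defaultdict(float)
    for labels, value in samples:
        totals[labels.get(label, "")] += value
    return dict(totals)


def sum_all(samples):
    """Mimic PromQL `sum(...)` with no by clause: one fleet-wide value."""
    return sum(value for _, value in samples)


if __name__ == "__main__":
    print(sum_by(samples, "cluster"))  # per-cluster breakdown
    print(sum_all(samples))            # whole fleet
```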

Prometheus scrape config

List every process as a target. Do not also attach a cluster target label: with the default honor_labels: false, Prometheus keeps the target's value and renames Argus's own cluster label to exported_cluster to avoid the clash:

scrape_configs:
  - job_name: argus
    static_configs:
      - targets:
          - "host.docker.internal:9191"
          - "host.docker.internal:9192"
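The renaming rule is easier to see in a small model. The function below is a simplified sketch of how Prometheus merges target labels into a scraped series depending on honor_labels. attach_target_labels is a hypothetical name, and edge cases such as an already-present exported_ label are ignored.

```python
from __future__ import annotations


def attach_target_labels(
    scraped: dict[str, str], target: dict[str, str], honor_labels: bool = False
) -> dict[str, str]:
    """Simplified model of Prometheus merging target labels into one scraped series."""
    result = dict(scraped)
    for name, value in target.items():
        if name not in scraped:
            # No clash: the target label is simply added.
            result[name] = value
        elif not honor_labels:
            # Clash with honor_labels=false: the target value wins and the
            # exporter's value is kept under exported_<name>.
            result["exported_" + name] = scraped[name]
            result[name] = value
        # Clash with honor_labels=true: the scraped value is kept unchanged.
    return result


if __name__ == "__main__":
    series = {"__name__": "discord_guilds", "cluster": "0"}
    # A scrape config that also sets cluster="a" on the target causes the rename.
    print(attach_target_labels(series, {"instance": "host:9191", "cluster": "a"}))
```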

examples/clustered_bot.py shows the per-process pattern driven by env vars (CLUSTER_ID, ARGUS_PORT, SHARD_IDS, SHARD_COUNT).
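The following is a minimal sketch of reading those variables, not a copy of examples/clustered_bot.py. It assumes SHARD_IDS is a comma-separated list such as "4,5,6" and that ARGUS_PORT falls back to 9191. ProcessConfig and load_config are hypothetical names.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ProcessConfig:
    cluster_id: str
    port: int
    shard_ids: tuple[int, ...]
    shard_count: int


def load_config(env: Mapping[str, str] = os.environ) -> ProcessConfig:
    """Read per-process settings; SHARD_IDS is assumed to be comma-separated."""
    shard_ids = tuple(int(s) for s in env["SHARD_IDS"].split(",") if s.strip())
    shard_count = int(env["SHARD_COUNT"])
    # Catch misconfiguration early: every id must fall inside the global shard count.
    if any(not 0 <= s < shard_count for s in shard_ids):
        raise ValueError("every shard id must be in 0..SHARD_COUNT-1")
    return ProcessConfig(
        cluster_id=env["CLUSTER_ID"],
        port=int(env.get("ARGUS_PORT", "9191")),  # assumed default
        shard_ids=shard_ids,
        shard_count=shard_count,
    )

# Hypothetical wiring inside the bot process:
# cfg = load_config()
# bot = commands.AutoShardedBot(..., shard_ids=list(cfg.shard_ids), shard_count=cfg.shard_count)
# Argus(bot, cluster_id=cfg.cluster_id, port=cfg.port)
```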

Per-shard metrics

discord_shard_latency_seconds{shard} and discord_shard_up{shard} are per-shard. Shard ids are globally unique across a clustered deployment (each process owns a disjoint range), so they need no cluster qualifier to disambiguate.

Hosting at scale (e.g. 100 clusters) and ports

The metrics endpoint must live inside each bot process: gauges read live bot state (bot.latencies, bot.guilds, ...) at scrape time, so /metrics cannot be moved to a separate process. You run one Argus per process; what you centralise is the view and the storage, not the collection.
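The reason collection cannot move out of process can be shown with a toy gauge whose value is a callback evaluated when /metrics is rendered. CallbackGauge and FakeBot are hypothetical stand-ins, not Argus internals. prometheus_client offers a similar mechanism through Gauge.set_function, but this page does not say whether Argus uses it.

```python
from __future__ import annotations

from typing import Callable


class CallbackGauge:
    """A gauge whose value is computed at render (scrape) time, not stored ahead of time."""

    def __init__(self, name: str, labels: dict[str, str], read: Callable[[], float]):
        self.name = name
        self.labels = labels
        self.read = read

    def render(self) -> str:
        # Emit one line in Prometheus text format, e.g. discord_guilds{cluster="0"} 3.0
        label_text = ",".join(f'{k}="{v}"' for k, v in sorted(self.labels.items()))
        return f"{self.name}{{{label_text}}} {self.read()}"


class FakeBot:
    """Stand-in for the bot object: only the live state the gauge reads."""

    def __init__(self) -> None:
        self.guilds: list[str] = []

# Usage: the callback closes over the in-memory bot, which only exists in this process.
# bot = FakeBot()
# gauge = CallbackGauge("discord_guilds", {"cluster": "0"}, lambda: float(len(bot.guilds)))
# bot.guilds.append("g1"); gauge.render()  # reflects the new guild on the next scrape
```

A separate process has no bot object to close over, which is why the endpoint stays inside each bot.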

Ports. Do not hand-allocate a contiguous range like 9191..9290.

Separate hosts/containers/pods (the normal case at this scale): keep the same port (9191) on every process. They do not collide because each has its own network namespace/IP. Prometheus scrapes each at host:9191.
Co-located on one host (not recommended at 100): use distinct ports, assigned by the orchestrator via ARGUS_PORT per process, and discover targets with Prometheus service discovery, not a hand-written 100-target config.

Scrape config. Use service discovery rather than a static list at this size.

Kubernetes: one pod per cluster, containerPort: 9191, and a PodMonitor or ServiceMonitor (Prometheus Operator) selecting them. The cluster label is already on the metrics.
VMs/bare metal: Prometheus file_sd or DNS SD listing the hosts.
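For the file_sd route, something in your deployment tooling has to write the target file. The sketch below assumes a simple list of (host, port) endpoints and writes the JSON list-of-target-groups format that file_sd reads. It covers both the same-port and the co-located case. It deliberately attaches no cluster label, since Argus already exports one. The hostnames, paths, and function names are hypothetical.

```python
from __future__ import annotations

import json
import os
from typing import Iterable


def file_sd_document(endpoints: Iterable[tuple[str, int]]) -> list[dict]:
    """Build one file_sd target group listing host:port for every bot process."""
    # No "labels" key: a cluster target label would clash with Argus's own label.
    return [{"targets": [f"{host}:{port}" for host, port in endpoints]}]


def write_file_sd(path: str, endpoints: Iterable[tuple[str, int]]) -> None:
    """Write the file atomically so Prometheus never reads a half-written file."""
    tmp_path = path + ".tmp"  # does not match a "*.json" files glob
    with open(tmp_path, "w", encoding="utf-8") as fh:
        json.dump(file_sd_document(endpoints), fh, indent=2)
    os.replace(tmp_path, path)

# Matching Prometheus side (fragment):
# scrape_configs:
#   - job_name: argus
#     file_sd_configs:
#       - files: ["/etc/prometheus/argus/*.json"]
#
# Separate hosts, same port:  [("bot-0.internal", 9191), ("bot-1.internal", 9191)]
# Co-located, orchestrator-assigned ARGUS_PORT values:  [("host-a", 31001), ("host-a", 31002)]
```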

A single pane across all clusters. The built-in dashboard is per-process: it shows only the process it is attached to. For a fleet view across 100 clusters, use Grafana on top of Prometheus: Prometheus scrapes every process, and Grafana aggregates the series with PromQL (sum by (cluster) (...)). The repo ships provisioned Grafana dashboards; set grafana_url so each process's dashboard links to them. A common production setup is dashboard=False on the bots (relying on Grafana for the fleet) while /metrics stays scraped.

Already separate. ClickHouse analytics is external and shared: all processes write to one ClickHouse and queries run anywhere (see History and ClickHouse).

Roadmap idea (not shipped): a mode where the built-in SPA reads from a Prometheus base URL (PromQL) instead of a single /metrics, making it a standalone, separately-hostable fleet dashboard. Open an issue if you want it.
