
[Phase 4] Leader election and coordination (Raft) #47

@cbaugus

Description


Implement Raft-based leader election to coordinate distributed load tests. No etcd dependency — Raft runs embedded in each node via the openraft crate.

Cluster mode is opt-in

Cluster mode is disabled by default. Without CLUSTER_ENABLED=true the binary runs exactly as it does today — single node, config from env vars / YAML, no Raft, no gRPC.

CLUSTER_ENABLED=false   # default — standalone mode, no cluster behaviour
CLUSTER_ENABLED=true    # opt-in to Raft cluster formation
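The gate can be a single strict check at startup — a minimal sketch (only the `CLUSTER_ENABLED` variable name comes from this issue; the helper names are hypothetical):

```rust
use std::env;

/// Pure check so the gate is testable without touching process env.
/// Anything other than an explicit "true" keeps today's standalone mode.
fn cluster_enabled_from(value: Option<&str>) -> bool {
    value.map(|v| v.eq_ignore_ascii_case("true")).unwrap_or(false)
}

/// Reads CLUSTER_ENABLED; absent or malformed means standalone.
fn cluster_enabled() -> bool {
    cluster_enabled_from(env::var("CLUSTER_ENABLED").ok().as_deref())
}
```

Defaulting to `false` on any unexpected value (not just on absence) keeps a typo from accidentally forming a cluster.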

Node Discovery

Local / Dev — HashiCorp Consul DNS

Nodes discover each other via Consul DNS — no hardcoded IP list needed locally.

Each node registers itself as a Consul service (loadtest-cluster) on startup and exposes an HTTP health endpoint that Consul polls. The DNS name loadtest-cluster.service.consul then resolves to all nodes with a passing health check.

Health check states

The health endpoint (GET /health/cluster) returns the node's current Raft state. Consul tracks this and updates the node's service tags accordingly:

| Tag | Meaning | DNS visible |
| --- | --- | --- |
| forming | Node started, waiting to reach quorum | yes — forming.loadtest-cluster.service.consul |
| follower | In cluster, running as a Raft follower | yes — follower.loadtest-cluster.service.consul |
| leader | Elected Raft leader / coordinator | yes — leader.loadtest-cluster.service.consul |

The untagged loadtest-cluster.service.consul resolves to all healthy nodes regardless of state, so a new node can query it to find peers to join.
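For illustration, these names can be queried directly against the local agent's DNS interface (Consul's default DNS port is 8600; addresses here are examples):

```shell
# All healthy nodes, any Raft state — what a joining node would query
dig @127.0.0.1 -p 8600 loadtest-cluster.service.consul +short

# Only the node currently tagged as leader
dig @127.0.0.1 -p 8600 leader.loadtest-cluster.service.consul +short
```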

Health check response (JSON):

{
  "state": "leader",
  "node_id": "node-dev-1",
  "leader_id": "node-dev-1",
  "term": 3,
  "peers": 3,
  "cluster_ready": true
}
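A sketch of the payload type behind this response — field names match the JSON above, but the struct and the hand-rolled serializer are illustrative (a real handler would likely derive serde's `Serialize`):

```rust
/// Mirrors the /health/cluster response body shown above.
struct ClusterHealth {
    state: &'static str,
    node_id: String,
    leader_id: String,
    term: u64,
    peers: usize,
    cluster_ready: bool,
}

impl ClusterHealth {
    /// Manual JSON rendering, dependency-free for the sketch.
    fn to_json(&self) -> String {
        format!(
            "{{\"state\":\"{}\",\"node_id\":\"{}\",\"leader_id\":\"{}\",\"term\":{},\"peers\":{},\"cluster_ready\":{}}}",
            self.state, self.node_id, self.leader_id, self.term, self.peers, self.cluster_ready
        )
    }
}
```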

Consul service registration on each node startup:

{
  "name": "loadtest-cluster",
  "port": 7000,
  "tags": ["forming"],
  "checks": [{
    "http": "http://localhost:8080/health/cluster",
    "interval": "5s",
    "timeout": "2s"
  }]
}

Tags are updated dynamically as Raft state changes: forming → follower → leader.
A node that loses leader status updates its tag back to follower automatically.
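Consul updates a service's tags when the service is re-registered with a new `tags` array (PUT /v1/agent/service/register). A sketch of building that re-registration body on a Raft state transition — the function name and JSON builder are assumptions; a real implementation would use serde plus an HTTP client:

```rust
/// Builds the Consul re-registration payload for the current Raft state tag.
/// Re-submitting the registration with a changed `tags` array updates the
/// tag Consul advertises in DNS.
fn registration_body(service: &str, port: u16, tag: &str, health_url: &str) -> String {
    format!(
        concat!(
            "{{\"name\":\"{}\",\"port\":{},\"tags\":[\"{}\"],",
            "\"checks\":[{{\"http\":\"{}\",\"interval\":\"5s\",\"timeout\":\"2s\"}}]}}"
        ),
        service, port, tag, health_url
    )
}
```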

Env vars:

DISCOVERY_MODE=consul
CONSUL_ADDR=http://127.0.0.1:8500
CONSUL_SERVICE_NAME=loadtest-cluster   # default

GCP / Production — Static peer list auto-join

CLUSTER_NODES=10.1.0.5:7000,10.2.0.5:7000,10.3.0.5:7000

Each node reads the peer list, attempts gRPC handshake, and joins the Raft cluster. First node to achieve quorum becomes the initial leader.
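Parsing the peer list is the straightforward part; a minimal sketch of the env format above (the helper name is hypothetical, and real join logic would retry unreachable peers rather than silently skip them):

```rust
use std::net::SocketAddr;

/// Parses CLUSTER_NODES ("ip1:7000,ip2:7000") into socket addresses,
/// dropping malformed entries.
fn parse_peers(raw: &str) -> Vec<SocketAddr> {
    raw.split(',')
        .filter_map(|s| s.trim().parse::<SocketAddr>().ok())
        .collect()
}
```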

Key behaviours

Implementation notes

  • openraft crate for Raft state machine
  • gRPC (tonic) as the Raft transport layer — reuses the infrastructure from issue #46, [Phase 4] gRPC worker communication protocol
  • Health endpoint served on CLUSTER_HEALTH_ADDR (default 0.0.0.0:8080, path /health/cluster)
  • Consul tags updated via Consul agent API on every Raft state transition
  • CLUSTER_NODE_ID env var (or auto-derived from hostname) for stable node identity
  • CLUSTER_BIND_ADDR for the Raft/gRPC listen address

Full env var reference

CLUSTER_ENABLED=false                      # opt-in (default: false)
CLUSTER_BIND_ADDR=0.0.0.0:7000            # Raft + gRPC listen address
CLUSTER_HEALTH_ADDR=0.0.0.0:8080          # HTTP health check endpoint
CLUSTER_NODE_ID=node-us-central1          # stable node identity (or auto from hostname)
DISCOVERY_MODE=static                      # static | consul
CLUSTER_NODES=ip1:7000,ip2:7000           # peer list (static discovery)
CONSUL_ADDR=http://127.0.0.1:8500         # Consul address (consul discovery)
CONSUL_SERVICE_NAME=loadtest-cluster      # Consul service name (consul discovery)
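These variables could be gathered into one config struct at startup, applying the defaults listed above. A sketch (struct and function names are assumptions; only the variable names and defaults come from this issue):

```rust
use std::env;

/// Cluster settings assembled from the env var reference above.
struct ClusterConfig {
    enabled: bool,
    bind_addr: String,
    health_addr: String,
    node_id: Option<String>, // None => derive from hostname
    discovery_mode: String,  // "static" | "consul"
    nodes: Vec<String>,
    consul_addr: String,
    consul_service_name: String,
}

/// Generic over the lookup so defaults are testable without process env.
fn config_from<F: Fn(&str) -> Option<String>>(get: F) -> ClusterConfig {
    ClusterConfig {
        enabled: get("CLUSTER_ENABLED").map_or(false, |v| v.eq_ignore_ascii_case("true")),
        bind_addr: get("CLUSTER_BIND_ADDR").unwrap_or_else(|| "0.0.0.0:7000".into()),
        health_addr: get("CLUSTER_HEALTH_ADDR").unwrap_or_else(|| "0.0.0.0:8080".into()),
        node_id: get("CLUSTER_NODE_ID"),
        discovery_mode: get("DISCOVERY_MODE").unwrap_or_else(|| "static".into()),
        nodes: get("CLUSTER_NODES")
            .map(|v| v.split(',').map(|s| s.trim().to_string()).collect())
            .unwrap_or_default(),
        consul_addr: get("CONSUL_ADDR").unwrap_or_else(|| "http://127.0.0.1:8500".into()),
        consul_service_name: get("CONSUL_SERVICE_NAME")
            .unwrap_or_else(|| "loadtest-cluster".into()),
    }
}

/// Production entry point: read from the real environment.
fn config_from_env() -> ClusterConfig {
    config_from(|k| env::var(k).ok())
}
```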
