
v0.5.6 — multi-node supervisor/worker state fix


@anousss007 anousss007 released this 16 Jun 13:19

Multi-node fix from a distributed-deployment attack pass, plus an adversarial audit of the RUM symbolicator.

Fixed

  • Multi-node fleets under-reported workers, and supervisors clobbered each other. Supervisor and worker heartbeat rows were keyed by supervisor name only (vigilance_supervisors.name was even the primary key). When the same supervisor ran on multiple servers, which is normal horizontal scaling, each node's heartbeat overwrote the other nodes' row and each node's worker-set write deleted the other nodes' worker rows. The dashboard therefore showed a single flapping node and a worker count well below the real fleet. State is now keyed by (name, host): every node keeps its own rows, the dashboard lists each node with its hostname and shows true fleet totals, and prune/forget act per node. Node identity is configurable via supervision.host / VIGILANCE_SUPERVISOR_HOST. It defaults to the machine hostname, so set it explicitly in containers.
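The difference between the two keying schemes can be modelled in a few lines. This is a minimal in-memory sketch in Python, not the package's code: the function names, the dict-based "tables", and the example hosts and PIDs are all hypothetical. Only the VIGILANCE_SUPERVISOR_HOST variable and the hostname default come from the notes above.

```python
import os
import socket


def resolve_host(env=os.environ):
    """Node identity: explicit override first, else the machine hostname."""
    return env.get("VIGILANCE_SUPERVISOR_HOST") or socket.gethostname()


def heartbeat_by_name(table, name, host, worker_pids):
    """Old behaviour: one row per supervisor name, so nodes overwrite each other."""
    table[name] = {"host": host, "workers": set(worker_pids)}


def heartbeat_by_name_and_host(table, name, host, worker_pids):
    """New behaviour: one row per (name, host), so each node keeps its own state."""
    table[(name, host)] = {"workers": set(worker_pids)}


def fleet_worker_count(table):
    """Total workers across every row the dashboard can see."""
    return sum(len(row["workers"]) for row in table.values())


# Two nodes (hypothetical hosts) run the same supervisor name.
old, new = {}, {}
for host, pids in [("web-1", [101, 102]), ("web-2", [201, 202, 203])]:
    heartbeat_by_name(old, "default", host, pids)
    heartbeat_by_name_and_host(new, "default", host, pids)

print(fleet_worker_count(old))  # 3: only the last node to write survives
print(fleet_worker_count(new))  # 5: every node's workers are counted
```

With name-only keys the total depends on which node wrote last, which matches the flapping described above. With composite keys the total is the sum over nodes.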

Schema note: vigilance_supervisors gains an id primary key and unique(name, host); vigilance_workers is now unique(supervisor, host, pid). Existing installs should run php artisan migrate:fresh (supervisor/worker rows are ephemeral heartbeats, so nothing of value is lost).
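To illustrate what the new constraints permit and reject, here is a small sketch using Python's built-in sqlite3 module. It is an approximation of the schema described in the note, not the actual migration: the last_heartbeat_at column, the upsert_heartbeat helper, and the sample values are assumptions, and the upsert syntax needs SQLite 3.24 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vigilance_supervisors (
    id   INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL,
    host TEXT NOT NULL,
    last_heartbeat_at TEXT,            -- hypothetical column for the demo
    UNIQUE (name, host)
);
CREATE TABLE vigilance_workers (
    supervisor TEXT NOT NULL,
    host       TEXT NOT NULL,
    pid        INTEGER NOT NULL,
    UNIQUE (supervisor, host, pid)
);
""")


def upsert_heartbeat(name, host, at):
    """Insert or refresh the row for one (name, host) pair only."""
    conn.execute(
        "INSERT INTO vigilance_supervisors (name, host, last_heartbeat_at) "
        "VALUES (?, ?, ?) "
        "ON CONFLICT (name, host) DO UPDATE "
        "SET last_heartbeat_at = excluded.last_heartbeat_at",
        (name, host, at),
    )


upsert_heartbeat("default", "web-1", "t0")
upsert_heartbeat("default", "web-2", "t1")
upsert_heartbeat("default", "web-1", "t2")  # refreshes web-1, leaves web-2 alone

rows = conn.execute(
    "SELECT name, host, last_heartbeat_at FROM vigilance_supervisors ORDER BY host"
).fetchall()

# The same pid on two hosts is legal; a repeat on the same host is rejected.
insert_worker = "INSERT INTO vigilance_workers VALUES (?, ?, ?)"
conn.execute(insert_worker, ("default", "web-1", 4242))
conn.execute(insert_worker, ("default", "web-2", 4242))
try:
    conn.execute(insert_worker, ("default", "web-1", 4242))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(rows)
print(duplicate_rejected)
```

Including host in the worker constraint matters because PIDs are only unique within one machine, so two nodes can legitimately report the same pid.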

Validated (no code change)

  • RUM symbolicator attacked with 200 KB pathological stacks (no ReDoS; 0.1 ms), malformed source maps (bad JSON, bad VLQ) and high token counts. It stays fast and degrades to unsymbolicated output without crashing. Before symbolication, stacks are capped at 8 KB and each request at 5 errors, on top of the endpoint rate limit.
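The defensive pattern described here (cap the input first, then fall back to the raw stack on any parse failure) might look roughly like the following. This is a hedged Python sketch, not the symbolicator itself: every name is hypothetical, 8 KB is assumed to mean 8 × 1024 bytes, and the set of exceptions caught is an assumption about what a malformed source map would raise.

```python
MAX_STACK_BYTES = 8 * 1024    # assumes "8 KB" means 8192 bytes
MAX_ERRORS_PER_REQUEST = 5    # at most 5 errors are processed per request


def truncate_utf8(text, limit):
    """Cut text to at most `limit` UTF-8 bytes without splitting a character."""
    return text.encode("utf-8")[:limit].decode("utf-8", errors="ignore")


def prepare_errors(errors):
    """Apply both caps before any symbolication work is done."""
    capped = errors[:MAX_ERRORS_PER_REQUEST]
    return [
        {**err, "stack": truncate_utf8(err.get("stack", ""), MAX_STACK_BYTES)}
        for err in capped
    ]


def symbolicate_safely(stack, symbolicate):
    """Return the symbolicated stack, or the raw stack if the map is malformed."""
    try:
        return symbolicate(stack)
    except (ValueError, KeyError, IndexError):
        # json.JSONDecodeError is a ValueError, so bad JSON lands here too.
        return stack


def broken_symbolicator(stack):
    """Stand-in for a symbolicator that hits a malformed source map."""
    raise ValueError("bad VLQ segment")


# Eight oversized errors arrive in one request.
incoming = [{"message": f"e{i}", "stack": "x" * 200_000} for i in range(8)]
prepared = prepare_errors(incoming)

print(len(prepared))                            # 5
print(len(prepared[0]["stack"].encode()))       # 8192
print(symbolicate_safely("at a.js:1:1", broken_symbolicator))  # at a.js:1:1
```

Capping before symbolication bounds the work per request regardless of what the client sends, which is why the limits sit in front of the parser rather than inside it.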

Full notes: CHANGELOG.md.