v0.5.5 — concurrency fix + adversarial validation

@anousss007 anousss007 released this 16 Jun 12:34

A second, harder adversarial pass covered DDoS/flood amplification, a cardinality bomb, a failing-job storm, a job-dispatching endpoint under flood, multi-driver supervision, and the public RUM endpoint. It surfaced one real concurrency bug, which this release fixes.

Fixed

  • Failure-group occurrence counts were undercounted under concurrent failures. The per-group occurrences counter was updated with a read-modify-write, so simultaneous workers recording the same failure signature (a failing-job storm) overwrote each other's increments (about 10% loss measured across 2000 failures on 3 workers). The group is now fetched or created with a race-safe createOrFirst(), which relies on the unique signature index so that no duplicate groups can appear, and the counter is bumped with an atomic SQL increment, so the count is exact under any concurrency.
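
To illustrate the class of bug described above, here is a minimal sketch in Python with an in-memory SQLite database. It is not Vigilance's code: the table name `failure_groups`, the helper names, and the use of `INSERT OR IGNORE` as a stand-in for createOrFirst() are all assumptions made for the example. The interleaving is simulated deterministically in a single thread (every "worker" reads before any of them writes) rather than with real concurrent processes.

```python
import sqlite3


def setup() -> sqlite3.Connection:
    """Create a hypothetical failure_groups table with a unique signature index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE failure_groups ("
        " signature TEXT NOT NULL UNIQUE,"
        " occurrences INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def ensure_group(conn: sqlite3.Connection, signature: str) -> None:
    # Stand-in for a create-or-first call: the UNIQUE index turns a
    # duplicate insert into a no-op instead of a second row.
    conn.execute(
        "INSERT OR IGNORE INTO failure_groups (signature) VALUES (?)",
        (signature,),
    )


def count(conn: sqlite3.Connection, signature: str) -> int:
    row = conn.execute(
        "SELECT occurrences FROM failure_groups WHERE signature = ?",
        (signature,),
    ).fetchone()
    return row[0]


def record_racy(conn: sqlite3.Connection, signature: str, workers: int) -> None:
    """Read-modify-write: every worker reads the counter before any writes."""
    ensure_group(conn, signature)
    seen_values = [count(conn, signature) for _ in range(workers)]
    for seen in seen_values:
        # Each worker writes back "what I read + 1", clobbering the others.
        conn.execute(
            "UPDATE failure_groups SET occurrences = ? WHERE signature = ?",
            (seen + 1, signature),
        )


def record_atomic(conn: sqlite3.Connection, signature: str, workers: int) -> None:
    """Atomic increment: the database computes occurrences + 1 itself."""
    for _ in range(workers):
        ensure_group(conn, signature)
        conn.execute(
            "UPDATE failure_groups SET occurrences = occurrences + 1"
            " WHERE signature = ?",
            (signature,),
        )


racy_conn = setup()
record_racy(racy_conn, "TimeoutException@SendInvoice", 3)
racy_count = count(racy_conn, "TimeoutException@SendInvoice")

atomic_conn = setup()
record_atomic(atomic_conn, "TimeoutException@SendInvoice", 3)
atomic_count = count(atomic_conn, "TimeoutException@SendInvoice")
group_rows = atomic_conn.execute("SELECT COUNT(*) FROM failure_groups").fetchone()[0]

print(racy_count, atomic_count, group_rows)  # 1 3 1
```

In the racy variant three recorded failures leave the counter at 1, because each write is based on a stale read. In the atomic variant the increment never depends on a value the worker read earlier, and the unique index keeps the group to a single row even though every worker attempts to create it.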

Validated (no code change)

  • No DDoS amplification: throughput and error rate were identical with Vigilance on or off under sustained flood, and enabling it never introduced an error.
  • Bounded cardinality: thousands of distinct URLs collapse to their route pattern in APM (1 key), and random 404 floods write nothing because only matched routes are recorded. The aggregate tables cannot be inflated by attacker-chosen URLs.
  • Job-dispatch storm: an endpoint enqueuing 10 jobs per request under flood had every job captured exactly, with no failed requests; VIGILANCE_SAMPLE_RATE throttles the enqueue-write load.
  • Multiple queue drivers at once: a single vigilance:supervise drained database, Redis and beanstalkd queues concurrently, with correct per-driver attribution.
  • Public RUM endpoint: rate-limited (rum.throttle, default 120/min) and capped at 12 metrics and 5 errors per request, with length-bounded fields; safe to expose.
  • Extreme concurrency: graceful degradation and immediate recovery, no Vigilance-induced errors, and roughly one storage connection per worker.
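
The per-request caps on the public RUM endpoint mentioned above can be pictured with a short Python sketch. Only the limits of 12 metrics and 5 errors come from these notes. The payload shape (`metrics`, `errors`, `name`, `value`, `message`), the 255-character field bound, and the choice to truncate rather than reject oversized payloads are assumptions for illustration and may not match what Vigilance actually does.

```python
MAX_METRICS = 12      # from the release notes
MAX_ERRORS = 5        # from the release notes
MAX_FIELD_LEN = 255   # assumption: the notes only say fields are length-bounded


def clamp_text(value: object, limit: int = MAX_FIELD_LEN) -> str:
    """Coerce a field to text and cut it to at most `limit` characters."""
    return str(value)[:limit]


def as_dict_list(value: object) -> list:
    """Keep only dict items of a list; anything else becomes an empty list."""
    if not isinstance(value, list):
        return []
    return [item for item in value if isinstance(item, dict)]


def clamp_rum_payload(payload: dict) -> dict:
    """Return a bounded copy of a hypothetical RUM payload."""
    metrics = as_dict_list(payload.get("metrics"))[:MAX_METRICS]
    errors = as_dict_list(payload.get("errors"))[:MAX_ERRORS]
    return {
        "metrics": [
            {
                "name": clamp_text(m.get("name", "")),
                # Non-numeric values are replaced by 0.0 instead of being stored.
                "value": m["value"] if isinstance(m.get("value"), (int, float)) else 0.0,
            }
            for m in metrics
        ],
        "errors": [{"message": clamp_text(e.get("message", ""))} for e in errors],
    }


oversized = {
    "metrics": [{"name": f"metric_{i}", "value": i} for i in range(20)],
    "errors": [{"message": "x" * 1000} for _ in range(9)],
}
bounded = clamp_rum_payload(oversized)
print(len(bounded["metrics"]), len(bounded["errors"]), len(bounded["errors"][0]["message"]))
# 12 5 255
```

Bounding the payload before it reaches storage means the cost of one request has a fixed ceiling, which combines with the rate limit to cap the write load a single client can generate.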

Full notes: CHANGELOG.md.