Releases: anousss007/laravel-vigilance
v0.5.6 — multi-node supervisor/worker state fix
Multi-node fix from a distributed-deployment attack pass, plus an adversarial audit of the RUM symbolicator.
Fixed
- Multi-node fleets under-reported workers; supervisors clobbered each other. Supervisor/worker heartbeat rows were keyed by supervisor name only (`vigilance_supervisors.name` was even the primary key). Running the same supervisor on multiple servers (normal horizontal scaling) meant each node's heartbeat overwrote the others' row and each node's worker-set write deleted the other nodes' worker rows, so the dashboard showed one flapping node and a worker count well below the real fleet. State is now keyed by `(name, host)`: every node keeps its own rows, the dashboard shows each node (with hostname) and true fleet totals, and `prune` / `forget` act per node. Node identity is configurable via `supervision.host` / `VIGILANCE_SUPERVISOR_HOST` (default: machine hostname; set it explicitly for containers).
Schema note: vigilance_supervisors gains an id PK + unique(name, host); vigilance_workers is now unique(supervisor, host, pid). Existing installs run php artisan migrate:fresh (supervisor/worker rows are ephemeral heartbeats — nothing of value is lost).
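The effect of keying heartbeats by `(name, host)` can be sketched with an in-memory SQLite table. This is a minimal illustration, not the package's migration or code: the `supervisors` table, its columns, and the `heartbeat` helper are hypothetical simplifications.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified table: surrogate id plus a composite unique key.
conn.execute("""
    CREATE TABLE supervisors (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        host TEXT NOT NULL,
        last_heartbeat INTEGER NOT NULL,
        UNIQUE (name, host)
    )
""")

def heartbeat(name: str, host: str, now: int) -> None:
    """Upsert one node's heartbeat without touching other nodes' rows."""
    conn.execute(
        """
        INSERT INTO supervisors (name, host, last_heartbeat)
        VALUES (?, ?, ?)
        ON CONFLICT (name, host)
        DO UPDATE SET last_heartbeat = excluded.last_heartbeat
        """,
        (name, host, now),
    )

heartbeat("default", "web-1", 100)
heartbeat("default", "web-2", 101)
heartbeat("default", "web-1", 105)  # refreshes web-1 only

rows = conn.execute(
    "SELECT host, last_heartbeat FROM supervisors ORDER BY host"
).fetchall()
print(rows)  # [('web-1', 105), ('web-2', 101)]
```

With the name alone as the key, the second and third calls would have overwritten a single shared row, which is the flapping-node symptom described above.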
Validated (no code change)
- RUM symbolicator attacked with 200 KB pathological stacks (no ReDoS — 0.1 ms), malformed source maps (bad JSON / bad VLQ) and high token counts: stays fast, degrades to unsymbolicated without crashing. Stacks capped (8 KB) and ≤5 errors/request before symbolication, atop the endpoint rate limit.
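As an illustration of "degrades without crashing" on bad VLQ input, here is a minimal Base64-VLQ segment decoder of the kind Source Map v3 `mappings` use. It is a sketch, not the package's pure-PHP decoder: the function name and the choice to return `None` on malformed input are assumptions made for the example.

```python
from typing import List, Optional

B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
B64_INDEX = {ch: i for i, ch in enumerate(B64)}

def decode_vlq_segment(segment: str) -> Optional[List[int]]:
    """Decode one Base64-VLQ segment; return None on malformed input."""
    values: List[int] = []
    shift = 0
    acc = 0
    for ch in segment:
        digit = B64_INDEX.get(ch)
        if digit is None:
            return None  # not a Base64 character
        acc |= (digit & 0x1F) << shift  # low 5 bits carry data
        if digit & 0x20:  # continuation bit: more digits follow
            shift += 5
            if shift > 30:
                return None  # implausibly long number, treat as malformed
            continue
        magnitude = acc >> 1  # lowest bit of the number is the sign
        values.append(-magnitude if acc & 1 else magnitude)
        shift = 0
        acc = 0
    if shift != 0:
        return None  # segment ended in the middle of a number
    return values

print(decode_vlq_segment("AACA"))  # [0, 0, 1, 0]
print(decode_vlq_segment("gB"))    # [16]
print(decode_vlq_segment("g"))     # None (truncated)
print(decode_vlq_segment("A!A"))   # None (bad character)
```

A caller that receives `None` can leave the frame unsymbolicated instead of raising.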
Full notes: CHANGELOG.md.
v0.5.5 — concurrency fix + adversarial validation
A second, harder adversarial pass — DDoS/flood amplification, a cardinality bomb, a failing-job storm, a job-dispatching endpoint under flood, multi-driver supervision and the public RUM endpoint — which surfaced and fixed one real concurrency bug.
Fixed
- Failure-group occurrence counts undercounted under concurrent failures. The per-group `occurrences` counter was a read-modify-write, so simultaneous workers recording the same failure signature (a failing-job storm) clobbered each other's increments (~10% loss measured across 2000 failures on 3 workers). Now uses a race-safe `createOrFirst()` (on the unique `signature` index, so no duplicate groups) plus an atomic SQL increment, so the count is exact under any concurrency.
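The statement shape behind this fix can be sketched in plain SQL, here driven from Python with SQLite. The table and the `record_failure` helper are hypothetical, and `INSERT OR IGNORE` stands in for the create-or-first step. The example runs on a single connection, so it shows the pattern rather than reproducing the race.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE failure_groups (
        id INTEGER PRIMARY KEY,
        signature TEXT NOT NULL UNIQUE,
        occurrences INTEGER NOT NULL DEFAULT 0
    )
""")

def record_failure(signature: str) -> None:
    # Create-or-first: the unique index turns a duplicate insert into a no-op,
    # so two workers cannot create two groups for one signature.
    conn.execute(
        "INSERT OR IGNORE INTO failure_groups (signature) VALUES (?)",
        (signature,),
    )
    # Atomic increment: the database computes the new value itself, so no
    # worker writes back a stale count it read earlier.
    conn.execute(
        "UPDATE failure_groups SET occurrences = occurrences + 1 "
        "WHERE signature = ?",
        (signature,),
    )

for _ in range(2000):
    record_failure("SendInvoice:timeout")

count = conn.execute("SELECT occurrences FROM failure_groups").fetchone()[0]
groups = conn.execute("SELECT COUNT(*) FROM failure_groups").fetchone()[0]
print(count, groups)  # 2000 1
```

The buggy variant would `SELECT occurrences`, add one in application code, then `UPDATE ... SET occurrences = <value>`, which loses increments whenever two workers read the same starting value.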
Validated (no code change)
- No DDoS amplification: throughput/error-rate identical with Vigilance on or off under sustained flood; enabling it never introduced an error.
- Bounded cardinality: thousands of distinct URLs collapse to the route pattern in APM (1 key); random 404 floods write nothing (only matched routes are recorded). The aggregate tables can't be exploded.
- Job-dispatch storm: an endpoint enqueuing 10 jobs/request under flood captured every job exactly, no failed requests; `VIGILANCE_SAMPLE_RATE` throttles enqueue-write load.
- Multiple queue drivers at once: one `vigilance:supervise` drained database + Redis + beanstalkd concurrently, correct per-driver attribution.
- Public RUM endpoint: rate-limited (`rum.throttle`, default 120/min) and capped to ≤12 metrics + ≤5 errors per request with length-bounded fields; safe to expose.
- Extreme concurrency: graceful degradation + immediate recovery, no Vigilance-induced errors, ~one storage connection per worker.
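The bounded-cardinality item above comes down to aggregating by route pattern rather than by raw URL. A minimal sketch of that idea follows; the `ROUTES` table and `aggregate_key` helper are hypothetical stand-ins for a framework router, not Vigilance's code.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical route table: pattern name -> compiled matcher.
ROUTES = {
    "/users/{id}": re.compile(r"^/users/[^/]+$"),
    "/orders/{order}/items/{item}": re.compile(r"^/orders/[^/]+/items/[^/]+$"),
}

def aggregate_key(path: str) -> Optional[str]:
    """Return the route pattern for a path, or None if no route matches."""
    for pattern, matcher in ROUTES.items():
        if matcher.match(path):
            return pattern
    return None

counts: Counter = Counter()
paths = [f"/users/{i}" for i in range(5000)]
paths += [f"/missing-{i}" for i in range(5000)]  # simulated 404 flood
for path in paths:
    key = aggregate_key(path)
    if key is not None:  # unmatched paths write nothing
        counts[key] += 1

print(dict(counts))  # {'/users/{id}': 5000}
```

Ten thousand distinct URLs produce a single aggregate key, so the number of rows is bounded by the number of routes, not by what clients send.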
Full notes: CHANGELOG.md.
v0.5.4 — production hardening across runtimes, queues & databases
A relentless production-readiness pass on real Linux infrastructure — every common web server and app runtime, all four supervisable queue drivers, server-class databases, storage-outage chaos and high concurrency — which surfaced and fixed four real issues.
Fixed
- Long-running daemons no longer dangle as stuck "running" command runs. `octane:start`, `reverb:start`, `pulse:work` / `pulse:check` are excluded from command capture via an unconditional `Defaults::daemonCommands()` baseline (protects installs whose published config predates the list).
- Redis queue names normalized (`queues:default` → `default`) so per-queue grouping is consistent across drivers and matches the supervisor / queue-depth probe / config.
- Batched jobs link to their batch: `batch_id` is now recorded on batched runs.
- Orphaned workers are reaped on supervisor boot: when the master is hard-killed (SIGKILL/OOM, or restarted under a non-cgroup manager like supervisord), its `queue:work` children no longer pile up. Completes the cross-platform reap the `#vigilance` name marker always intended.
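The queue-name normalization in the list above can be pictured as stripping a driver-specific prefix before grouping. This is a sketch under the assumption that the Redis-side name carries a literal `queues:` prefix, as the `queues:default` → `default` example suggests; the helper name is hypothetical.

```python
from collections import Counter

QUEUE_PREFIX = "queues:"  # assumed Redis-side prefix, per the example above

def normalize_queue_name(name: str) -> str:
    """Strip the driver prefix so every driver reports the same queue name."""
    if name.startswith(QUEUE_PREFIX):
        return name[len(QUEUE_PREFIX):]
    return name

# Runs reported by different drivers for the same logical queues.
reported = ["queues:default", "default", "queues:emails", "emails", "default"]
per_queue = Counter(normalize_queue_name(name) for name in reported)
print(dict(per_queue))  # {'default': 3, 'emails': 2}
```

Without the normalization, the same data would split into four groups instead of two.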
Validated (no code change)
- Web servers: Nginx, Apache (mod_proxy_fcgi), Caddy — all + PHP-FPM.
- Octane on every server: FrankenPHP, Swoole 6.2, OpenSwoole 26.2, RoadRunner 2025.1 — 800 req @ concurrency 16, 0 failed, constant per-request span count (no cross-request leakage).
- All four supervisable queue drivers: database, Redis, beanstalkd (1.13 + pheanstalk v8), each drained by the auto-scaling supervisor.
- supervisord + OPcache with `config:cache` / `route:cache` / `event:cache`.
- Never breaks the app when its storage is down: storage taken down mid-traffic → 100% of requests still served, queue still drained, capture resumed cleanly on recovery.
- Job lifecycles: retries, timeouts (captured as failures), batches, chains.
- Concurrency: 1200 req @ 24 against MySQL — no lost writes, aggregate counts exact, no deadlocks.
- Dashboard at scale: every page 200 / sub-300 ms on 60k runs / 100k entries / 22k traces.
- Fresh install: `vigilance:install`, `migrate`, `vigilance:doctor` (green), dashboard, all clean.
- Full suite green on PostgreSQL 18.4 and MySQL 8.4 (CI uses PG 16 + MariaDB 11.4).
Full notes: see CHANGELOG.md.
v0.5.3 — cross-database hardening (MySQL install fix)
Cross-database hardening. The test suite previously ran only on SQLite in CI; this release adds PostgreSQL 16 and MariaDB 11.4 to CI (the suite is now connection-configurable via VIGILANCE_TEST_DB), which surfaced and fixed real bugs the other engines hit.
Fixed
- 🔴 MySQL / MariaDB install was completely broken. The `vigilance_aggregates` unique index auto-named to 65 chars, over MySQL's 64-char identifier limit (error 1059), so migrations failed outright. PostgreSQL silently truncates and SQLite has no limit, which is why SQLite-only CI never caught it. Index names are now explicit.
- PostgreSQL: float → bigint. `wait_ms` / `duration_ms` wrote Carbon-3 float milliseconds into `bigint` columns (PG rejects; MySQL/SQLite truncate). Now cast to int.
- Cross-driver LIKE filters. Silenced-jobs and name/message searches used `LIKE` with class names whose backslashes are escape chars on PG/MySQL (not SQLite), so they silently failed there. Fixed with a new `Like` helper that adds an explicit `ESCAPE` clause.
- Queue-depth probe. A missing `jobs` table threw, and on PostgreSQL a thrown query inside a transaction aborts it. `QueueDepth` now checks the table exists first.
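The LIKE fix in the list above rests on two steps: escape the pattern metacharacters in the search term, and state the escape character explicitly in the query. A minimal sketch using SQLite follows; the `like_contains` helper and the `runs` table are hypothetical and are not the package's `Like` helper. How the escape character is quoted in the SQL literal differs between engines, so treat the query string as SQLite-specific.

```python
import sqlite3

def like_contains(term: str, escape: str = "\\") -> str:
    """Build a LIKE pattern that matches `term` literally, anywhere."""
    escaped = (
        term.replace(escape, escape + escape)  # escape the escape char first
            .replace("%", escape + "%")
            .replace("_", escape + "_")
    )
    return f"%{escaped}%"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runs (job TEXT)")
conn.executemany(
    "INSERT INTO runs VALUES (?)",
    [
        ("App\\Jobs\\Send_Invoice",),
        ("App\\Jobs\\SendXInvoice",),  # would match an unescaped "_"
        ("AppJobsOther",),
    ],
)

pattern = like_contains("Jobs\\Send_Invoice")
matches = conn.execute(
    "SELECT job FROM runs WHERE job LIKE ? ESCAPE '\\'", (pattern,)
).fetchall()
print(matches)
```

Only the first row is returned: the backslashes and the underscore in the class name are matched literally instead of being read as escape or wildcard characters.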
Verified
- Full suite green on SQLite, PostgreSQL 16, MariaDB 11.4 (234 tests each).
- Worker supervisor drained real queues on the database and redis drivers; failed jobs routed to `failed_jobs`; all runs captured.
- Octane state-reset hook flushes the in-flight trace on `RequestReceived` (no cross-request leakage).
v0.5.2 — Boost guidelines: once-per-incident alerting
Docs-only patch. The Laravel Boost AI guidelines (core.blade.php) now document the once-per-incident alerting behaviour from v0.5.1 — so coding agents know a sustained condition alerts once and the rest lives on the dashboard. Matches the Boost skill, README, observability guide and config. No code change.
v0.5.1 — notify once per incident (no alert spam)
Fixed
- Alert spam on sustained conditions. A persistent problem (e.g. a breaching SLO) used to re-email you every throttle window — bad DX. With incident tracking on (the default), Vigilance now notifies once when the incident opens, and again only if its severity escalates or it resolves and later recurs. So a SLO that stays breached pages you once, not every 15 minutes.
- Set `alerts.renotify_minutes` (`VIGILANCE_ALERT_RENOTIFY_MINUTES`) > 0 to get a reminder every N minutes while an incident stays open (default `0` = notify once). With incidents disabled, behaviour is unchanged (one notification per `throttle_minutes`).
This applies to every rule (SLO burn, queue backlog, error rate, anomalies, deploy regressions, …).
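The once-per-incident rules can be summarised as a small decision function. This is a sketch of the behaviour as described in these notes, not the package's implementation: the `Incident` shape, the severity ordering, and the `should_notify` name are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

SEVERITY = {"info": 0, "warning": 1, "critical": 2}  # assumed ordering

@dataclass
class Incident:
    severity: str
    last_notified_at: int  # minutes on some monotonic clock

def should_notify(
    open_incident: Optional[Incident],
    severity: str,
    now: int,
    renotify_minutes: int = 0,
) -> bool:
    """Decide whether a firing rule should send a notification."""
    if open_incident is None:
        return True  # incident opens, or recurs after having resolved
    if SEVERITY[severity] > SEVERITY[open_incident.severity]:
        return True  # severity escalated
    if renotify_minutes > 0:
        # Optional reminder while the incident stays open.
        return now - open_incident.last_notified_at >= renotify_minutes
    return False  # sustained condition: stay quiet

incident = Incident(severity="warning", last_notified_at=0)
print(should_notify(None, "warning", now=0))       # True
print(should_notify(incident, "warning", now=15))  # False
print(should_notify(incident, "critical", now=15)) # True
print(should_notify(incident, "warning", now=30, renotify_minutes=30))  # True
```

A resolved incident is modelled here as `open_incident=None`, which is why a recurrence notifies again.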
v0.5.0 — release health, smarter alerting, RUM source maps
A proactive-monitoring release — Vigilance now gates deploys, finds the problems no fixed threshold would catch, and makes browser errors readable. Full guide: docs/observability.md.
Added
- Release health & deploy-regression guard (`/vigilance/releases`). Each deploy marker gets a before/after verdict (error rate · latency · throughput); a `regressed` deploy fires a critical `deploy_regression` alert (point a webhook at it to auto-roll-back). Issues are tagged with `first_release` / `regressed_release`. Set the release via `VIGILANCE_RELEASE` (or `app.version`); record with `php artisan vigilance:deploy --release=…`.
- New-issue & regression alerting: `new_issue` fires the first time an error signature appears; `issue_regression` fires when a resolved issue comes back (with a "regressed" badge). Evaluated at snapshot time, never on the request thread.
- Dynamic-baseline anomaly detection (`anomaly`): z-scores each watched metric (request latency, 5xx rate, exceptions by default) against its rolling baseline; guarded against false positives.
- RUM source-map symbolication: a pure-PHP Source Map v3 decoder + `php artisan vigilance:sourcemaps <build-dir> --release=…`. Minified browser stacks are symbolicated at ingest, so the Issues inbox shows original source locations. Toggle `rum.symbolicate`.
- Global `ignore_paths`: one config list (wildcards like `/admin/*` or `#regex#`) excludes a path from APM, tracing, RUM and web-request error capture at once.
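For the dynamic-baseline item above, a z-score check against a rolling window can be sketched as follows. The threshold, the minimum sample count, and the flat-baseline guard are illustrative assumptions; the notes do not say which guards Vigilance actually applies.

```python
from statistics import mean, pstdev
from typing import Sequence

def is_anomalous(
    baseline: Sequence[float],
    current: float,
    threshold: float = 3.0,  # assumed z-score cut-off
    min_samples: int = 10,   # assumed guard against thin baselines
) -> bool:
    """Flag `current` if it sits far outside the rolling baseline."""
    if len(baseline) < min_samples:
        return False  # not enough history to judge
    mu = mean(baseline)
    sigma = pstdev(baseline)
    if sigma == 0:
        return False  # flat baseline: z-score undefined, stay quiet
    return abs(current - mu) / sigma >= threshold

latency_ms = [100, 102, 98, 101, 99, 100, 103, 97, 100, 100]
print(is_anomalous(latency_ms, 101))      # False
print(is_anomalous(latency_ms, 140))      # True
print(is_anomalous(latency_ms[:5], 140))  # False (too few samples)
```

The two early returns are examples of false-positive guards: a short or perfectly flat history never alerts.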
Notes
- 230 tests (+21) · axe 0 violations (desktop + mobile) on all new/changed pages · CI on PHP 8.2–8.4 · Laravel 12/13 · Livewire 3/4.
- Additive migrations (`regressed_at`, `first_release` / `regressed_release` on failure groups; `vigilance_sourcemaps` table): run `php artisan migrate` after upgrading.
v0.4.1 — Boost integration for the observability suite
A maintenance release that completes the v0.4.0 rollout.
Changed
- Laravel Boost integration updated for the full observability suite: the AI guidelines (`resources/boost/guidelines/core.blade.php`) and the `vigilance-development` skill now cover Issues error tracking, per-route performance, RUM / Web Vitals, SLOs, custom business metrics, the trace-correlated log explorer, and the expanded alerting channels (Discord / Teams / webhooks) with incident tracking, including `Vigilance::increment()` / `gauge()` and `@vigilanceRum` snippets. Coding agents (Claude Code, Cursor, Copilot, …) now generate correct code against the new features.
Added
- `RELEASING.md`: a pre-release checklist covering code, version strings, changelog, user-facing docs, the Boost integration, accessibility, and the tag/release steps, so every release moves all surfaces forward together.
v0.4.0 — observability suite
A front-to-back observability release — seven new dashboard areas, each built to the same production-first posture as the rest of Vigilance (captured cheaply, flushed after the response, sampled and bounded). Full guide: docs/observability.md.
Added
- Unified Issues error tracker (`/vigilance/issues`): every exception across web, queue, command, `Vigilance::report()` and browser errors, fingerprinted into a grouped inbox with stacktrace, context, occurrence sparkline and an assign / prioritise / ack / mute / resolve workflow.
- Per-route performance (`/vigilance/routes`): throughput, error rate, Apdex and exact p50/p95/p99 latency per route.
- Real User Monitoring (`/vigilance/vitals`): Core Web Vitals (LCP/INP/CLS/FCP/TTFB) + JS errors from real visitors via the `@vigilanceRum` beacon. Off by default (`VIGILANCE_RUM`).
- SLOs & error budgets (`/vigilance/slos`): availability / latency objectives vs. an error budget, with a short-window burn-rate alert.
- Alerting depth & incidents (`/vigilance/incidents`): Discord, Microsoft Teams and generic webhooks on top of mail / Slack; fired alerts persisted as incidents (auto-resolved) with occurrence counts and MTTR.
- Custom business metrics (`/vigilance/custom-metrics`): `Vigilance::increment()` / `gauge()` → auto-discovered counter & gauge cards with sparklines.
- Trace-correlated log explorer (`/vigilance/logs`): searchable application logs correlated to the trace that emitted them. Off by default (`VIGILANCE_LOGS`).
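For readers unfamiliar with the Apdex figure on the per-route page, the standard formula is (satisfied + tolerating / 2) / total, where a request is satisfied at or under a threshold T and tolerating up to 4T. The sketch below uses a 500 ms threshold purely as an example; the threshold Vigilance uses is not stated in these notes.

```python
from typing import Optional, Sequence

def apdex(durations_ms: Sequence[float], threshold_ms: float = 500.0) -> Optional[float]:
    """Standard Apdex score in [0, 1]; None when there are no samples."""
    if not durations_ms:
        return None
    satisfied = sum(1 for d in durations_ms if d <= threshold_ms)
    tolerating = sum(
        1 for d in durations_ms if threshold_ms < d <= 4 * threshold_ms
    )
    return (satisfied + tolerating / 2) / len(durations_ms)

# 2 satisfied, 2 tolerating, 1 frustrated -> (2 + 1) / 5
score = apdex([100, 200, 600, 1900, 2500])
print(score)  # 0.6
```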
Changed
- Tracing now also records `redis`, `mail` and `notification` spans.
- New `docs/observability.md`; README + docs site updated. Every new page verified with axe-core: zero violations, desktop and mobile. CI on `actions/checkout@v6`.
Schema note: new columns/tables were folded into the base migration. If you ran a pre-0.4 dev build, run `php artisan migrate:fresh` to pick them up.
v0.3.0
Minor release.
Added
- Laravel Boost integration. Vigilance now ships first-class Laravel Boost support: AI guidelines (`resources/boost/guidelines/core.blade.php`) and a `vigilance-development` agent skill (`resources/boost/skills/vigilance-development/SKILL.md`). In any project running Boost, `boost:install` / `boost:update` automatically load them, so coding agents (Claude Code, Cursor, Copilot, …) know Vigilance's conventions out of the box and generate correct code against the package: dashboard authorization (`viewVigilance`), the `Dispatchable` / `ShouldNotBeMonitored` / `ShouldNotBeDispatchedManually` markers, the driver-agnostic worker supervisor, `.env` alert routing, APM and tracing. Verified against the installed Boost's package-discovery.
Full changelog: https://github.com/anousss007/laravel-vigilance/blob/main/CHANGELOG.md