Spec 024: Scheduled checks + uptime calc + activity events #74
Merged
Automation half of phase 5. Builds on spec 023's actions: a dispatcher job picks due websites every minute, a per-website job runs the probe asynchronously, RecordWebsiteCheckAction emits incident / recovery activity events on category transitions, and GetWebsitePerformanceSummaryQuery returns count-based uptime % over 24h / 7d / 30d windows.
Automation half of phase 5. Spec 023 shipped CRUD + manual probes;
this spec adds the scheduler-driven background probing, count-based
uptime % over rolling windows, and incident/recovery activity events
on healthy↔failed status transitions.
- DispatchDueWebsiteChecksJob: every-minute dispatcher registered with
the scheduler via Schedule::job(...)->everyMinute()->withoutOverlapping().
Soft-cap 500 websites per tick. Loads + filters in PHP (the
last_checked_at + check_interval_seconds predicate is cross-DB
awkward to express in raw SQL). Ordered by last_checked_at ASC with
nulls first so the oldest-stale rows always land in the cap window
— never-checked first, then oldest stale; an orderBy('id') would
silently strand the high-id tail when total > cap.
- RunWebsiteCheckJob: per-website async wrapper around spec-023's
RunWebsiteProbeAction + RecordWebsiteCheckAction. tries=1 (a failed
probe IS a recorded outcome; retries would double-record).
- GetWebsitePerformanceSummaryQuery: count-based uptime % over 24h /
7d / 30d windows + last_incident_at. Slow counts as up. Null on
empty windows (Vue renders as —%).
- RecordWebsiteCheckAction now takes CreateActivityEventAction in its
constructor. Captures previous status before the update, classifies
current + previous into healthy / failed / pending, emits an
activity event ONLY on category swings:
Healthy → Failed (or Pending → Failed) → website.down / danger
Failed → Healthy → website.up / success
Pending → Healthy → silent (uneventful first probe)
Steady-state (Healthy→Healthy etc.) → silent
- ActivityEventCreated::broadcastOn() extended: monitoring-source
rows have repository_id = null, which previously made the
broadcaster return [] and silently drop the realtime fan-out.
Broadcaster now resolves channel via metadata.website_id → website
→ project → owner_user_id when source === 'monitoring', so the
right rail receives website incidents / recoveries live.
- RecentActivityForUserQuery extended to surface monitoring events
in the user's feed alongside repo events. Pulls user's website ids
upfront and OR's the existing repo-scoped predicate with
`source = 'monitoring' AND metadata->website_id IN (user's ids)`.
- Show page integration: WebsiteController::show injects the summary
query, threads `summary` into the Inertia payload. Show.vue renders
a 4-tile uptime stats strip (24h / 7d / 30d / last incident) below
the existing meta dl.
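The category-transition rule from the RecordWebsiteCheckAction bullet can be sketched in plain PHP as below. This is a minimal illustration, not the action's actual code: the `Category` enum and the `transitionEvent()` helper are hypothetical names, and transitions into Pending are assumed to be silent because the spec does not mention them.

```php
<?php

declare(strict_types=1);

// Hypothetical enum: the PR names the three categories but not how they are typed.
enum Category: string
{
    case Healthy = 'healthy';
    case Failed = 'failed';
    case Pending = 'pending';
}

/**
 * Returns the activity event to emit for a category transition, or null when silent.
 *
 * @return array{type: string, severity: string}|null
 */
function transitionEvent(Category $previous, Category $current): ?array
{
    // Entering Failed from Healthy or Pending is an incident.
    if ($current === Category::Failed && $previous !== Category::Failed) {
        return ['type' => 'website.down', 'severity' => 'danger'];
    }

    // Only a recovery from Failed is announced; Pending -> Healthy stays silent.
    if ($current === Category::Healthy && $previous === Category::Failed) {
        return ['type' => 'website.up', 'severity' => 'success'];
    }

    // Steady state and uneventful first probes emit nothing.
    return null;
}
```

Keeping the decision in one pure function makes the four documented cases easy to cover with unit tests before any event is actually created.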
Tests: 27 new passing tests across 4 new test files + 2 extended.
Full suite 339 passed (was 312); 0 regressions.
Self-review pass via superpowers:code-reviewer flagged the broadcast
no-op as material — the spec originally claimed "free realtime" for
monitoring events but the broadcaster's repo-only channel resolution
silently dropped them. Fixed by extending broadcastOn() with the
website-scoped path. Also addressed the recommended dispatcher
starvation fix (orderBy last_checked_at instead of id).
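To illustrate why the ordering matters for the dispatcher, here is a framework-free sketch of the selection step. It assumes rows are plain arrays with Unix-second timestamps and that the soft cap is applied to the ordered rows before the due filter runs in PHP; `selectDueWebsiteIds()` is a hypothetical helper, not the job's real method.

```php
<?php

declare(strict_types=1);

/**
 * Illustrative only: rows are plain arrays and timestamps are Unix seconds.
 *
 * @param list<array{id: int, last_checked_at: ?int, check_interval_seconds: int}> $websites
 * @return list<int> ids of the websites to probe on this tick
 */
function selectDueWebsiteIds(array $websites, int $now, int $cap = 500): array
{
    // Never-checked rows first, then oldest last_checked_at; id only breaks ties.
    usort($websites, static function (array $a, array $b): int {
        $aNull = $a['last_checked_at'] === null;
        $bNull = $b['last_checked_at'] === null;
        if ($aNull !== $bNull) {
            return $aNull ? -1 : 1;
        }

        return [$a['last_checked_at'], $a['id']] <=> [$b['last_checked_at'], $b['id']];
    });

    // Apply the soft cap to the ordered rows (assumed to happen before the due filter).
    $window = array_slice($websites, 0, $cap);

    // A website is due if it was never checked or its interval has elapsed.
    $due = array_filter(
        $window,
        static fn (array $w): bool => $w['last_checked_at'] === null
            || $w['last_checked_at'] + $w['check_interval_seconds'] <= $now,
    );

    return array_column($due, 'id');
}
```

With a cap smaller than the row count, an id-ordered selection would keep returning the same low-id rows, whereas this ordering rotates through the table because a freshly checked row moves to the back of the queue.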
Closes #73
Spec: specs/phase-5-monitoring/024-scheduled-checks-and-uptime.md
Automation half of phase 5. Spec 023 shipped CRUD + manual probes; this spec automates them via Laravel's scheduler, calculates count-based uptime % over rolling windows, and emits incident / recovery activity events on healthy↔failed status transitions (which now broadcast in realtime to the right rail).
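As a rough sketch of what "count-based uptime %" means here, the calculation divides the number of up checks by the total number of checks in the window. The status strings (`healthy`, `slow`, `failed`), the two-decimal rounding, and the `uptimePercent()` helper are assumptions for illustration; the real query may differ.

```php
<?php

declare(strict_types=1);

/**
 * @param list<array{status: string, checked_at: int}> $checks checked_at in Unix seconds
 * @return ?float percentage in 0..100, or null when the window holds no checks
 */
function uptimePercent(array $checks, int $now, int $windowSeconds): ?float
{
    // Keep only checks that fall inside the rolling window ending at $now.
    $inWindow = array_filter(
        $checks,
        static fn (array $c): bool => $c['checked_at'] > $now - $windowSeconds
            && $c['checked_at'] <= $now,
    );

    if ($inWindow === []) {
        return null; // empty window: the UI shows a placeholder instead of 0%
    }

    // "slow" counts as up; any other status counts as down.
    $up = count(array_filter(
        $inWindow,
        static fn (array $c): bool => in_array($c['status'], ['healthy', 'slow'], true),
    ));

    return round($up / count($inWindow) * 100, 2);
}
```

Calling it with window sizes of 86400, 604800, and 2592000 seconds would cover the 24h, 7d, and 30d tiles. Because the figure is count-based, each check carries equal weight regardless of how long an outage actually lasted.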
Summary
- `DispatchDueWebsiteChecksJob`: every-minute scheduler dispatcher bound via `Schedule::job(...)->everyMinute()->withoutOverlapping()`. Soft-cap 500. Loads + filters in PHP (the `last_checked_at + check_interval_seconds` predicate is cross-DB awkward to express in raw SQL). Ordered `last_checked_at ASC` with nulls first → never-checked rows first, then oldest stale; prevents starvation on the high-id tail when total > cap.
- `RunWebsiteCheckJob`: per-website async wrapper around spec-023's actions. `tries=1` (a failed probe IS a recorded outcome).
- `GetWebsitePerformanceSummaryQuery`: count-based uptime % over 24h / 7d / 30d windows + `last_incident_at`. Slow counts as up. Null on empty windows (Vue renders `—%`).
- `RecordWebsiteCheckAction` extended with category-transition detection. Healthy ↔ Failed swings emit `website.down` / `website.up` activity events. Pending → Healthy first probe + steady-state runs are silent.
- `ActivityEventCreated::broadcastOn()` extended to resolve the recipient channel for monitoring-source rows via `metadata.website_id → website → project → owner_user_id` (those rows have `repository_id = null`, which previously made the broadcaster silently drop realtime fan-out).
- `RecentActivityForUserQuery` extended to surface monitoring events alongside repo events.
Test plan
- `vendor/bin/pint --test` passes.
- `php artisan test`: 27 net new passing tests across 4 new test files + 2 extended (probe action [actually unchanged], summary query, dispatcher, run job, record action transitions, activity query monitoring scope, controller show summary). Full suite 339 passed (was 312). 50 failures are env-CSRF baseline; CI passes them.
- `npm run build` clean.
- With `https://example.com/up` on a 60s interval, confirm `php artisan schedule:work` triggers `RunWebsiteCheckJob` and the Show page's uptime % climbs as checks land. Stop the upstream → observe a `website.down` event flow into the right rail in realtime.
Self-review notes
Self-review pass via `superpowers:code-reviewer` flagged one material item: the spec originally claimed "free realtime" for monitoring events, but the broadcaster's repo-only channel resolution silently dropped them.
Fixed by extending `broadcastOn()` with a website-scoped path: when `source === 'monitoring'`, resolve channel via `metadata.website_id → Website → project → owner_user_id`. Also updated the spec body + action docblock to be honest about the broadcast plumbing.
Plus the recommended dispatcher starvation fix: `orderBy('last_checked_at')` + `orderByRaw('last_checked_at IS NULL DESC')` so when website count exceeds the soft cap the dispatcher prefers the never-checked / oldest-stale rows instead of stranding the high-id tail.
Deferred (skip / future):
- `Pending → Healthy` first-probe silence: locked decision per spec; could surface as a "monitoring online" event in a future polish.
Phase 5 status
Waiting for "merge it" before squash-merging. Then spec 025 (last of phase 5) wires the Overview KPI card + perf charts on top of this data.