Spec 024 — Scheduled checks + uptime calc + activity events#74

Merged: Copxer merged 2 commits into main from spec/024-scheduled-checks-and-uptime on May 1, 2026

@Copxer Copxer commented May 1, 2026

Closes #73

Spec: specs/phase-5-monitoring/024-scheduled-checks-and-uptime.md

Automation half of phase 5. Spec 023 shipped CRUD + manual probes; this spec automates them via Laravel's scheduler, calculates count-based uptime % over rolling windows, and emits incident / recovery activity events on healthy↔failed status transitions (which now broadcast in realtime to the right rail).

Summary

  • DispatchDueWebsiteChecksJob — every-minute dispatcher registered via Schedule::job(...)->everyMinute()->withoutOverlapping(). Soft cap of 500 websites per tick. Loads and filters in PHP, because the last_checked_at + check_interval_seconds predicate is awkward to express portably across databases in raw SQL. Ordered by last_checked_at ASC with nulls first, so never-checked rows come first, then the oldest stale ones; this prevents starvation of the high-id tail when the total exceeds the cap.
  • RunWebsiteCheckJob — per-website async wrapper around spec-023's actions. tries=1 (a failed probe IS a recorded outcome).
  • GetWebsitePerformanceSummaryQuery — count-based uptime % over 24h / 7d / 30d windows + last_incident_at. Slow counts as up. Null on empty windows (Vue renders —%).
  • RecordWebsiteCheckAction extended with category-transition detection. Healthy ↔ Failed swings emit website.down / website.up activity events. Pending → Healthy first probe + steady-state runs are silent.
  • ActivityEventCreated::broadcastOn() extended to resolve the recipient channel for monitoring-source rows via metadata.website_id → website → project → owner_user_id (those rows have repository_id = null, which previously made the broadcaster silently drop realtime fan-out).
  • RecentActivityForUserQuery extended to surface monitoring events alongside repo events.
  • Show page renders a 4-tile uptime strip (24h / 7d / 30d / last incident) under the existing meta dl.
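
The count-based uptime rule from GetWebsitePerformanceSummaryQuery can be illustrated with a small, framework-neutral Python sketch. This is not the PR's PHP code: the status labels (`up`, `slow`, `down`) and the `(checked_at, status)` tuple shape are assumptions made for illustration. Only the rules themselves (count-based, slow counts as up, null on empty windows) come from the description above.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# Assumed status labels; the PR only states that "slow" counts as up.
UP_STATUSES = {"up", "slow"}


def uptime_percent(
    checks: List[Tuple[datetime, str]],
    now: datetime,
    window: timedelta,
) -> Optional[float]:
    """Return the share of up checks inside the rolling window, or None if empty."""
    since = now - window
    in_window = [status for checked_at, status in checks if checked_at >= since]
    if not in_window:
        # Empty window: no percentage can be computed (the UI shows a placeholder).
        return None
    up = sum(1 for status in in_window if status in UP_STATUSES)
    return 100.0 * up / len(in_window)
```

Because the figure is count-based rather than time-weighted, a monitor with a short interval contributes more samples per window than one with a long interval.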

Test plan

  • vendor/bin/pint --test passes.
  • php artisan test — 27 net new passing tests across 4 new test files + 2 extended (summary query, dispatcher, run job, record action transitions, activity query monitoring scope, controller show summary; the probe action tests are unchanged). Full suite: 339 passed (was 312). The 50 failures are the existing env-CSRF baseline; the same tests pass in CI.
  • npm run build clean.
  • Manual smoke (post-merge): create a monitor for https://example.com/up with a 60s interval, confirm php artisan schedule:work triggers RunWebsiteCheckJob and the Show page's uptime % climbs as checks land. Stop the upstream → observe a website.down event flow into the right rail in realtime.

Self-review notes

Self-review pass via superpowers:code-reviewer flagged one material item:

Monitoring activity events do NOT broadcast in realtime. ActivityEventCreated::broadcastOn() returns [] when repository === null. Monitoring events always have repository_id = null, so every website.up/website.down is silently dropped at broadcast time.

Fixed by extending broadcastOn() with a website-scoped path: when source === 'monitoring', resolve channel via metadata.website_id → Website → project → owner_user_id. Also updated the spec body + action docblock to be honest about the broadcast plumbing.
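
The resolution chain described above can be sketched in framework-neutral Python. This is an illustration, not the PR's PHP: the dictionary lookups stand in for Eloquent relations, and the channel name `users.{id}.activity` and the `repo_owner_ids` mapping are hypothetical. Only the chain metadata.website_id → website → project → owner_user_id and the empty result for unresolvable events come from the text.

```python
from typing import Dict, List, Optional


def broadcast_channels(
    event: dict,
    websites: Dict[int, dict],
    projects: Dict[int, dict],
    repo_owner_ids: Dict[int, int],
) -> List[str]:
    """Resolve the recipient channel for an activity event, or [] if none."""
    owner: Optional[int] = None
    if event.get("repository_id") is not None:
        # Repo-scoped path (assumed shape of the pre-existing behaviour).
        owner = repo_owner_ids.get(event["repository_id"])
    elif event.get("source") == "monitoring":
        # Website-scoped path: metadata.website_id -> website -> project -> owner.
        website_id = (event.get("metadata") or {}).get("website_id")
        website = websites.get(website_id)
        project = projects.get(website["project_id"]) if website else None
        owner = project["owner_user_id"] if project else None
    # Hypothetical channel name; an unresolved owner means no fan-out.
    return [f"users.{owner}.activity"] if owner is not None else []
```

Without the monitoring branch, every event with a null repository_id falls through to the empty list, which is the silent drop the review flagged.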

Plus the recommended dispatcher starvation fix: orderBy('last_checked_at') + orderByRaw('last_checked_at IS NULL DESC') so when website count exceeds the soft cap the dispatcher prefers the never-checked / oldest-stale rows instead of stranding the high-id tail.
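
A minimal Python sketch of the selection logic shows why the ordering matters once the website count exceeds the cap. It is not the PR's PHP. It assumes the cap is applied to the ordered rows before the due filter runs in application code, and the dictionary keys mirror the column names mentioned in this PR.

```python
from datetime import datetime, timedelta
from typing import List

SOFT_CAP = 500  # per-tick cap named in the PR


def is_due(website: dict, now: datetime) -> bool:
    """A website is due if never checked, or if its interval has elapsed."""
    last = website["last_checked_at"]
    if last is None:
        return True
    return last + timedelta(seconds=website["check_interval_seconds"]) <= now


def select_due_websites(websites: List[dict], now: datetime, cap: int = SOFT_CAP) -> List[dict]:
    # Nulls first, then oldest last_checked_at: the stalest rows fill the cap window.
    ordered = sorted(
        websites,
        key=lambda w: (w["last_checked_at"] is not None, w["last_checked_at"] or datetime.min),
    )
    window = ordered[:cap]  # assumed: cap applied at load time, before filtering
    return [w for w in window if is_due(w, now)]
```

With an id-based ordering, the same cap would always select the same low-id rows, so websites beyond the cap would never be probed.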

Deferred (skip / future):

  • JSON predicate type-coercion across drivers — works on SQLite (tests) and MySQL (prod) per Laravel docs; defensive cast not added.
  • Pending → Healthy first-probe silence — locked decision per spec; could surface as a "monitoring online" event in a future polish.
  • Aggregate caching — phase-1 row counts make the 6 count queries per Show page sub-millisecond; revisit when slow-query logs flag it.

Phase 5 status

| #   | Spec | Status |
| --- | --- | --- |
| 023 | Website monitor MVP | 🟢 |
| 024 | Scheduled checks + uptime + activity events | 🟢 (this PR) |
| 025 | Overview integration + Reverb live updates + perf charts | ⬜ next |

Waiting for "merge it" before squash-merging. Then spec 025 (last of phase 5) wires the Overview KPI card + perf charts on top of this data.

Copxer added 2 commits April 30, 2026 20:31
…vents

Automation half of phase 5. Builds on spec 023's actions: dispatcher
job picks due websites every minute, per-website job runs the probe
async, RecordWebsiteCheckAction emits incident / recovery activity
events on category transitions, GetWebsitePerformanceSummaryQuery
returns count-based uptime % over 24h / 7d / 30d windows.

Automation half of phase 5. Spec 023 shipped CRUD + manual probes;
this spec adds the scheduler-driven background probing, count-based
uptime % over rolling windows, and incident/recovery activity events
on healthy↔failed status transitions.

- DispatchDueWebsiteChecksJob: every-minute dispatcher bound via
  Schedule::job(...)->everyMinute()->withoutOverlapping().
  Soft-cap 500 websites per tick. Loads + filters in PHP (the
  last_checked_at + check_interval_seconds predicate is awkward to
  express portably across databases in raw SQL). Ordered by
  last_checked_at ASC with nulls first so the oldest-stale rows always
  land in the cap window: never-checked first, then oldest stale; an
  orderBy('id') would silently strand the high-id tail when
  total > cap.
- RunWebsiteCheckJob: per-website async wrapper around spec-023's
  RunWebsiteProbeAction + RecordWebsiteCheckAction. tries=1 (a failed
  probe IS a recorded outcome; retries would double-record).
- GetWebsitePerformanceSummaryQuery: count-based uptime % over 24h /
  7d / 30d windows + last_incident_at. Slow counts as up. Null on
  empty windows (Vue renders as —%).
- RecordWebsiteCheckAction now takes CreateActivityEventAction in its
  constructor. Captures previous status before the update, classifies
  current + previous into healthy / failed / pending, emits an
  activity event ONLY on category swings:
    Healthy → Failed (or Pending → Failed) → website.down / danger
    Failed → Healthy                       → website.up / success
    Pending → Healthy                      → silent (uneventful first probe)
    Steady-state (Healthy→Healthy etc.)    → silent
- ActivityEventCreated::broadcastOn() extended: monitoring-source
  rows have repository_id = null, which previously made the
  broadcaster return [] and silently drop the realtime fan-out.
  Broadcaster now resolves channel via metadata.website_id → website
  → project → owner_user_id when source === 'monitoring', so the
  right rail receives website incidents / recoveries live.
- RecentActivityForUserQuery extended to surface monitoring events
  in the user's feed alongside repo events. Pulls user's website ids
  upfront and OR's the existing repo-scoped predicate with
  `source = 'monitoring' AND metadata->website_id IN (user's ids)`.
- Show page integration: WebsiteController::show injects the summary
  query, threads `summary` into the Inertia payload. Show.vue renders
  a 4-tile uptime stats strip (24h / 7d / 30d / last incident) below
  the existing meta dl.
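
The category-transition table in the RecordWebsiteCheckAction item above can be restated as a small Python sketch. It is not the PR's PHP: the function name and the lowercase category strings are illustrative, and the function takes already-classified categories rather than raw check statuses. The event names and severities are the ones listed in the table.

```python
from typing import Optional, Tuple


def transition_event(previous: str, current: str) -> Optional[Tuple[str, str]]:
    """Map a (previous, current) category pair to (event_name, severity) or None."""
    if current == "failed" and previous in ("healthy", "pending"):
        return ("website.down", "danger")
    if current == "healthy" and previous == "failed":
        return ("website.up", "success")
    # Pending -> Healthy first probe and steady-state runs stay silent.
    return None
```

Comparing categories rather than raw statuses means a swing between two statuses in the same category does not emit an event.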

Tests: 27 new passing tests across 4 new test files + 2 extended.
Full suite 339 passed (was 312); 0 regressions.

Self-review pass via superpowers:code-reviewer flagged the broadcast
no-op as material — the spec originally claimed "free realtime" for
monitoring events but the broadcaster's repo-only channel resolution
silently dropped them. Fixed by extending broadcastOn() with the
website-scoped path. Also addressed the recommended dispatcher
starvation fix (orderBy last_checked_at instead of id).
@Copxer Copxer merged commit 4ea9878 into main May 1, 2026
1 check passed
@Copxer Copxer deleted the spec/024-scheduled-checks-and-uptime branch May 1, 2026 05:30