Phase 1039: Background monitoring with email notifications #170

Open
HanSur94 wants to merge 14 commits into main from claude/fervent-pasteur-d7f590

@HanSur94
Owner

Summary

Phase 1039: Background monitoring with email notifications

Makes unattended, email-alerting threshold monitoring a first-class workflow. Wires NotificationService into LiveEventPipeline as a constructor NV-pair, fixes a latent bug where notification snapshots received empty sensor data, and adds a headless runner so monitoring can run under launchd/systemd/cron via matlab -batch.

Verification: ✓ passed (7/7 must-haves). The sensorData regression test runs green on both Octave and MATLAB through the real runCycle path; the runner lifecycle test passes 5/5 on MATLAB.

Changes

Core wiring (libs/EventDetection/LiveEventPipeline.m)

  • New 'NotificationService' constructor NV-pair (default [], replacing the old auto-created dry-run instance). Public property stays assignable post-construction for back-compat.
  • Bug fix: runCycle previously called notify(ev, struct()), so event snapshots had no data. New private helper sensorDataForEvent_(ev) resolves real per-event data from MonitorTargets(ev.SensorName).Parent.getXY(), sliced to the event window with the matching rule's ContextHours padding, producing the .X/.Y/.thresholdValue/.thresholdDirection contract that generateEventSnapshot expects.
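
The window slicing described above can be sketched as follows. This is a minimal illustration, not the code in LiveEventPipeline.m: the sample data, the threshold value, and the 'upper' direction label are hypothetical, and treating the pad as a fraction of the event duration is an assumption. Only the 2.0 h / 0.1 fallbacks and the open-event clamp to X(end) come from the commit notes.

```matlab
% Hypothetical stand-in data: datenum-style timestamps (days), one sample per minute.
X = 739000 + (0:600) / 1440;
Y = sin((0:600) / 50);

evStart  = X(300);   % event start
evEnd    = NaN;      % open event: still in violation
ctxHours = 2.0;      % fallback context window named in the commit notes
padFrac  = 0.1;      % fallback padding fraction named in the commit notes

% Open events have no end yet, so clamp to the last available sample.
if isnan(evEnd)
    evEnd = X(end);
end

% Assumption: the pad is a fraction of the event duration.
pad   = padFrac * (evEnd - evStart);
winLo = evStart - ctxHours / 24 - pad;
winHi = evEnd + pad;

% Keep only the samples inside the padded window.
mask = X >= winLo & X <= winHi;
sd = struct('X', X(mask), 'Y', Y(mask), ...
            'thresholdValue', 0.5, ...        % hypothetical threshold
            'thresholdDirection', 'upper');   % hypothetical direction label
fprintf('%d of %d samples fall inside the padded window\n', nnz(mask), numel(X));
```

Because the open event is clamped before the window is computed, no NaN reaches the mask and the resulting struct always carries the four fields the snapshot contract names.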

Headless runner (libs/EventDetection/runBackgroundMonitoring.m, new)

  • runBackgroundMonitoring(setupFcn, 'PollSec', N, 'MaxRuntimeSec', M) — takes a setup function handle returning a configured pipeline, starts it, prints grep-friendly [BG] HH:MM:SS events=N emails=M uptime=Ts heartbeats, and blocks until the runtime cap, an error, or Ctrl-C.
  • onCleanup-guaranteed pipeline.stop() on every exit path; namespaced error IDs (EventDetection:invalidSetupFcn, invalidOption, setupFcnFailed, setupFcnBadReturn); returns the pipeline handle on exit.
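
A rough sketch of the heartbeat loop's shape, assuming a plain tic/toc polling loop. The real runner drives a pipeline and reads its counters; here nEvents and nEmails are placeholder variables and the short PollSec and MaxRuntimeSec values are chosen only so the sketch finishes quickly.

```matlab
pollSec       = 0.25;  % hypothetical short poll interval (the runner requires >= 1)
maxRuntimeSec = 1;     % 0 would mean "run until interrupted"
nEvents = 0;           % placeholder for the pipeline's event counter
nEmails = 0;           % placeholder for the notification counter

t0 = tic;
heartbeats = {};
while true
    uptime = toc(t0);
    % Stop once the runtime cap is reached; a cap of 0 never triggers.
    if maxRuntimeSec > 0 && uptime >= maxRuntimeSec
        break;
    end
    % One grep-friendly line per poll, in the format quoted in the PR text.
    heartbeats{end + 1} = sprintf('[BG] %s  events=%d  emails=%d  uptime=%ds', ...
        datestr(now, 'HH:MM:SS'), nEvents, nEmails, round(uptime));
    fprintf('%s\n', heartbeats{end});
    pause(pollSec);
end
```

Keeping the line format fixed is what makes the heartbeat usable with grep or awk from a supervisor log.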

Demo + operator docs (examples/05-events/)

  • example_background_email_monitor_setup.m — top-level function returning a 2-sensor pipeline; env-var-driven SMTP (FASTSENSE_SMTP_SERVER), defaults to dry-run when unset.
  • example_background_email_monitor.m — thin wrapper invoking the runner.
  • README_background_email.md — launchd/systemd/cron snippets, SMTP config, dry-run↔real-email toggle, security notes.
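
The env-var toggle can be pictured like this. It is a sketch under assumptions: the isDryRun helper is hypothetical, and the example setup file may check the variable differently. Only the variable name FASTSENSE_SMTP_SERVER and the 'DryRun' constructor option are taken from this PR.

```matlab
% getenv returns an empty char array when the variable is not set.
smtpServer = getenv('FASTSENSE_SMTP_SERVER');

% Hypothetical helper: dry-run whenever no SMTP server is configured.
isDryRun = @(server) isempty(strtrim(server));
dryRun = isDryRun(smtpServer);

if dryRun
    fprintf('FASTSENSE_SMTP_SERVER unset: dry-run, no mail is sent\n');
else
    fprintf('Using SMTP relay %s\n', smtpServer);
end

% The flag would then feed the constructor option named in the PR, e.g.
%   svc = NotificationService('DryRun', dryRun);
```

Defaulting to dry-run means a misconfigured supervisor job logs notifications instead of failing or sending mail through an unintended relay.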

Latent library bug fixes (found while building the demo)

  • NotificationRule.fillTemplate + generateEventSnapshot: harden against open events (NaN EndTime) — datestr(NaN) / xlim([NaN NaN]) previously crashed the alert/snapshot path. Defensive guards only; closed-event path unchanged (test_notification_rule.m 5/5).
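
A minimal sketch of the kind of guard described here, assuming a simple isnan check before formatting. The date format and the seconds-based duration string are hypothetical; the actual helper (formatTimeOrOpen_ in the commit notes) may differ.

```matlab
endTime  = NaN;   % open event: no end time yet
duration = NaN;   % duration is unknown for the same reason

% Never pass NaN to datestr; substitute a placeholder instead.
if isnan(endTime)
    endStr = '(open)';
else
    endStr = datestr(endTime, 'yyyy-mm-dd HH:MM:SS');  % hypothetical format
end

if isnan(duration)
    durStr = '(ongoing)';
else
    durStr = sprintf('%.1f s', duration);              % hypothetical unit
end

fprintf('End: %s, Duration: %s\n', endStr, durStr);
```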

Tests (tests/)

  • test_live_event_pipeline_notif_sensor_data.m — proves sensorData is populated through the real runCycle (regression guard for the struct() bug). 2/2.
  • test_run_background_monitoring.m — runner lifecycle + error-ID paths. 5/5 on MATLAB (3/3 error-ID on Octave).
  • CaptureNotificationService.m — test double capturing notify(event, sensorData) args.

Verification

  • Automated: sensorData regression test green on Octave + MATLAB; runner test 5/5 on MATLAB; existing test_live_event_pipeline_tag (3/3) and test_notification_service (7/7) regression-clean.
  • Human: live timer-driven loop is MATLAB-only (Octave lacks timer).
  • Human: real email + PNG attachment delivery needs an SMTP relay to smoke-test end-to-end.

Notes

  • Cluster mode: no new gating — Phase 1032's single-source guarantee already de-duplicates events, so exactly one notify() fires per violation.
  • Known cosmetic issue (follow-up): the runner's [BG] ...exit: status=... line reads status before onCleanup stops the pipeline, so it logs running; the pipeline always stops correctly (final state stopped). Tracked separately.
  • Planning artifacts under .planning/phases/1039-* are included per the repo's .planning-tracking convention (chore: track GSD planning docs (un-ignore .planning + docs/superpowers) #168). STATE.md intentionally left at main's canonical version.

🤖 Generated with Claude Code

HanSur94 and others added 10 commits May 29, 2026 20:30
- Add defaults.NotificationService = [] (D-01): explicit NV-pair, default empty
- Replace auto-created NotificationService('DryRun', true) with opts.NotificationService
- Public NotificationService property stays assignable post-construction (back-compat)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Resolves per-event sensor data from MonitorTargets(ev.SensorName).Parent.getXY()
- Produces .X/.Y/.thresholdValue/.thresholdDirection (generateEventSnapshot contract)
- Slices window [evStart - ctxHours/24 - pad, evEnd + pad]; ctxHours/padFrac from best-matching NotificationRule (fallback 2.0h / 0.1)
- Open events (EndTime=NaN) use X(end) as evEnd
- Defensive: missing key / no Parent / no getXY -> empty X/Y + warning, no throw

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Replace notify(ev, struct()) with sd = sensorDataForEvent_(ev); notify(ev, sd)
- Fixes empty snapshots: generateEventSnapshot reads sensorData.X/.Y which were empty
- try/catch wrapper preserved; ~isempty(NotificationService) guard unchanged
- Verified end-to-end: notify receives 211-pt populated sensorData covering the event window

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- New libs/EventDetection/runBackgroundMonitoring.m: pipeline = runBackgroundMonitoring(setupFcn, varargin)
- NV-pairs 'PollSec' (default 60, >=1) and 'MaxRuntimeSec' (default 0=infinite, >=0) via parseOpts
- Validates setupFcn type + return shape; error IDs invalidSetupFcn / invalidOption / setupFcnFailed / setupFcnBadReturn
- pipeline.start() then [BG] heartbeat loop ([BG] HH:MM:SS  events=N  emails=M  uptime=Ts), onCleanup(safeStop_) stops on every exit path
- Catch block is a single fprintf fall-through (no dead-code conditional, no catenate:dimensionMismatch)
- Octave-portability fixes in validation + safeStop_ (ismethod cell-array/non-object, isvalid not in Octave) so CI Octave path works

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
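
The input validation listed in this commit can be illustrated with a short sketch. The two error IDs come from the commit notes; the message texts and the exact checks are assumptions, not the runner's actual code.

```matlab
setupFcn = 'not_a_handle';  % deliberately wrong type
pollSec  = 0.5;             % deliberately below the documented minimum of 1
ids = {};

try
    if ~isa(setupFcn, 'function_handle')
        error('EventDetection:invalidSetupFcn', ...
              'setupFcn must be a function handle, got %s.', class(setupFcn));
    end
catch err
    ids{end + 1} = err.identifier;   % callers and tests can branch on the ID
end

try
    if ~(isnumeric(pollSec) && isscalar(pollSec) && pollSec >= 1)
        error('EventDetection:invalidOption', ...
              '''PollSec'' must be a numeric scalar >= 1.');
    end
catch err
    ids{end + 1} = err.identifier;
end

fprintf('Raised: %s\n', ids{:});
```

Namespaced identifiers let a test assert on err.identifier rather than on message text, which is why these checks can run on Octave before any timer is created.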
Open events (still in violation) carry EndTime=NaN, which made two
notification-path functions throw and abort the notify/snapshot for
every open-event alert:

- NotificationRule.fillTemplate: datestr(NaN) threw ("Date number out
  of range" on MATLAB / "monthlength(nan)" on Octave). Now {endTime}
  renders as (open) and NaN {duration} as (ongoing) via the new private
  static formatTimeOrOpen_; closed-event formatting is unchanged.
- generateEventSnapshot: NaN EndTime propagated to evDur/padAmount/xMin/
  xMax, so xlim() threw ("Limits must be a 2-element vector of
  increasing numeric values"). Now clamps open-event end to the last
  sample (mirrors LiveEventPipeline.sensorDataForEvent_) and guards the
  xlim call against degenerate/non-increasing windows.

Surfaced by the Phase 1039 background-email demo: a fast-firing pipeline
produces open events at notify time. Required for the demo to run
warning-free and for live/background email alerts on open events.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
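
The xlim guard from this commit might look roughly like the sketch below. The sample data and the 10% padding are hypothetical; the point is only that limits are checked for finiteness and ordering before they would be handed to xlim.

```matlab
X = 0:0.5:10;       % hypothetical sample timestamps
evStart = 4;
evEnd   = NaN;      % open event

% Clamp the open-event end to the last sample.
if isnan(evEnd)
    evEnd = X(end);
end

evDur     = evEnd - evStart;
padAmount = 0.1 * evDur;     % assumption: pad proportional to duration
xMin = evStart - padAmount;
xMax = evEnd + padAmount;

% Only finite, strictly increasing limits are safe to pass to xlim.
limitsOk = all(isfinite([xMin, xMax])) && xMax > xMin;
if limitsOk
    fprintf('limits [%.2f %.2f] are safe for xlim\n', xMin, xMax);
    % In plotting code this is where xlim([xMin xMax]) would be called.
else
    fprintf('degenerate window, keeping automatic axis limits\n');
end
```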
…pper

- example_background_email_monitor_setup.m: TOP-LEVEL function file
  returning a configured LiveEventPipeline (2 sensors + 1 catch-all
  NotificationRule). DryRun toggles off only when FASTSENSE_SMTP_SERVER
  is set. Wired via the 'NotificationService' NV-pair (Plan 01). A
  standalone function file so @example_background_email_monitor_setup
  resolves from matlab -batch supervisor invocations.
- example_background_email_monitor.m: thin wrapper SCRIPT (no embedded
  setup-function def) that bootstraps install.m and calls
  runBackgroundMonitoring(@example_background_email_monitor_setup,
  'PollSec', 2, 'MaxRuntimeSec', 8) for a bounded demo run.

Each MonitorTag is bound to a shared EventStore so the pipeline can
harvest per-tick event deltas (proven Tag-path pattern from
tests/test_live_event_pipeline_tag.m); without it zero events are
harvested and the notify path never fires.

Verified on MATLAB R2025b: demo runs end-to-end in ~8.7s, harvests 36
events, fires 36 dry-run notifications, generates snapshot PNGs, and
exits cleanly (Status=stopped). which/@-handle resolution confirmed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
examples/05-events/README_background_email.md documents unattended
LiveEventPipeline operation:
- matlab -batch invocation + bounded demo quick-start
- launchd (.plist), systemd (.service), and cron supervisor snippets,
  each invoking runBackgroundMonitoring(@example_background_email_monitor_setup, ...)
  as a function handle (valid now that the setup is a top-level file)
- SMTP config via env vars (FASTSENSE_SMTP_SERVER / FROM_ADDR / RECIPIENT)
  + optional setpref('Internet', ...) auth, with a never-commit-secrets warning
- dry-run <-> real-email toggle table, heartbeat grep/awk recipe,
  multi-Companion notification note, and a troubleshooting section

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- tests/CaptureNotificationService.m: NotificationService subclass that
  stashes notify() (event, sensorData) args instead of emailing
- tests/test_live_event_pipeline_notif_sensor_data.m: 2 sub-tests proving
  runCycle passes populated sensorData (.X/.Y/.thresholdValue/
  .thresholdDirection), guarding Plan 01's struct() bug fix
- exercises the real runCycle notify path end-to-end (not sensorDataForEvent_
  in isolation); reuses MakePhase1009Fixtures.makeEventStoreTmp()

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- tests/test_run_background_monitoring.m: 5 sub-tests for Plan 02's headless runner
  - lifecycle: MaxRuntimeSec=2 returns in [2,5)s with pipeline.Status=='stopped'
  - error IDs: invalidSetupFcn, setupFcnBadReturn, invalidOption
- all 5 sub-tests verified under MATLAB R2025b (the lifecycle sub-tests need the timer-backed start());
  the 3 error-ID sub-tests also pass under Octave (timer not reached)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds the Phase 1039 progress row + detail section to ROADMAP.md and the
phase's planning artifacts (CONTEXT, PLANs, SUMMARYs, VERIFICATION) now
that main tracks .planning/ (#168). STATE.md left to main's canonical
version (ephemeral session tracking).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@codecov

codecov Bot commented May 29, 2026

Codecov Report

❌ Patch coverage is 73.13433% with 36 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| libs/EventDetection/runBackgroundMonitoring.m | 68.75% | 20 Missing ⚠️ |
| libs/EventDetection/LiveEventPipeline.m | 80.95% | 8 Missing ⚠️ |
| libs/EventDetection/generateEventSnapshot.m | 27.27% | 8 Missing ⚠️ |

HanSur94 and others added 3 commits May 29, 2026 20:49
… stop, fig leak, demo store)

Code review of PR #170 surfaced 4 issues; all fixed:

- [M] NotificationRule.fillTemplate: {peak}/{mean}/{rms}/{std} expanded to
  empty strings for open events (sprintf('%.4g', []) -> ''), rendering
  'Peak: , Mean: ' in alerts. Added formatStatOrOpen_ guard -> '(ongoing)'.
  Completes this phase's open-event handling (time/duration were already
  guarded; stats were missed).
- [M] runBackgroundMonitoring.safeStop_: only stopped on Status=='running',
  so the error-exit path (timerError sets 'error' without stopping the
  timer) leaked the timer handle. Now stops on {'running','error'}.
- [L] generateEventSnapshot.renderSnapshot: figure closed via onCleanup so a
  throwing print() can't leak invisible figures in the long-running runner.
- [L] example setup: reuse the single EventStore handle (pass EventFile='',
  assign pipeline.EventStore) instead of opening a second store on the same
  path — correct template for real deployments.

Verified: test_notification_rule, test_notification_service,
test_live_event_pipeline_notif_sensor_data (2/2), test_run_background_monitoring
(5/5) all pass; open-event template renders '(ongoing)'/'(open)' not blanks;
MISS_HIT lint+style clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
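
The empty-stat problem in the first item is easy to reproduce. The sketch below shows the failure mode and one possible guard; the guard is an illustration of the idea, not the body of formatStatOrOpen_.

```matlab
peak = [];   % open event: the statistic has not been computed yet

% Formatting an empty value yields an empty string, hence 'Peak: , Mean: '.
raw = sprintf('%.4g', peak);

% Guard: substitute a placeholder when the stat is missing or NaN.
if isempty(peak) || any(isnan(peak))
    peakStr = '(ongoing)';
else
    peakStr = sprintf('%.4g', peak);
end

fprintf('Unguarded: "Peak: %s"\n', raw);
fprintf('Guarded:   "Peak: %s"\n', peakStr);
```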
The [BG] exit line read pipeline.Status before onCleanup stopped the
pipeline, so it always logged status=running even though the pipeline
stopped correctly right after. Call safeStop_ explicitly before the exit
log (idempotent; onCleanup remains a safety net) so the line reports the
true final status. Verified: exit line now reads status=stopped;
test_run_background_monitoring 5/5.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Codecov flagged ~0% patch coverage on runBackgroundMonitoring.m and the
LiveEventPipeline sensorData path. Root cause: run_all_tests' MATLAB path
runs TestSuite.fromFolder(tests/suite) only — the Phase 1039 regression
tests are function-based (tests/test_*.m) and execute on the Octave path,
so the runner's MATLAB-only (timer) behavior had zero MATLAB-measured
coverage. This class-based suite mirrors those tests so the coverage job
exercises the new code: runner lifecycle + 3 error-ID branches, the
runCycle sensorData fix, and the NotificationRule open-event guards.

9/9 pass under matlab.unittest (R2025b).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@HanSur94
Owner Author

Addressed CI feedback

Codecov patch coverage (was 8.9%) — fixed in 1114b1ee.
Root cause wasn't untested code — it's that run_all_tests.m's MATLAB path runs TestSuite.fromFolder(tests/suite) only, so the Phase 1039 regression tests (function-based tests/test_*.m) execute on the Octave path and contributed zero MATLAB-measured coverage. Since the runner's live loop is MATLAB-only (Octave lacks timer), its MATLAB behavior had no coverage at all.

Added tests/suite/TestBackgroundEmailMonitoring.m — a class-based matlab.unittest suite the coverage job runs — exercising the previously-uncovered lines:

  • runBackgroundMonitoring lifecycle (timer-driven MaxRuntimeSec exit) + all 3 input-validation error-ID branches,
  • LiveEventPipeline.sensorDataForEvent_ + the runCycle notify path,
  • NotificationRule open-event guards ({peak} renders as (ongoing)).

9/9 pass under matlab.unittest (R2025b); MISS_HIT lint+style clean.

"MATLAB Example Smoke Tests" failure — pre-existing on main, not from this PR.
The failing example is example_dock (25/26 pass). It references none of this PR's files, and this PR touches none of its dependencies. The same examples.yml job is already failing on main at the exact base commit this branch rebased onto (644ab141). Out of scope for Phase 1039 — flagging for a separate fix rather than bundling an unrelated docking fix here.

🤖 Generated with Claude Code

…mer)

test_run_background_monitoring ran its two lifecycle sub-tests
unconditionally; on Octave pipeline.start() -> timer errors ('timer'
undefined), so the Octave Tests CI job failed on this new test (on top of
main's pre-existing test_mex_parity failure). Gate the timer tests behind
exist('OCTAVE_VERSION','builtin') so Octave runs only the 3 input-validation
error-ID tests (which throw before any timer is created); MATLAB still runs
all 5. Verified locally: Octave 3/3 + SKIP, MATLAB 5/5.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
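
The Octave gate can be sketched as below. The exist('OCTAVE_VERSION', 'builtin') check is the one named in the commit message; the sub-test labels are hypothetical placeholders for the real test functions.

```matlab
% OCTAVE_VERSION is a builtin only on Octave, so exist() returns nonzero there.
isOctave = exist('OCTAVE_VERSION', 'builtin') ~= 0;

% The input-validation sub-tests throw before any timer is created.
selected = {'invalidSetupFcn', 'setupFcnBadReturn', 'invalidOption'};

if isOctave
    fprintf('SKIP lifecycle sub-tests: timer is unavailable on Octave\n');
else
    % Hypothetical labels for the two timer-backed lifecycle sub-tests.
    selected = [selected, {'lifecycle_1', 'lifecycle_2'}];
end

fprintf('%d sub-tests selected\n', numel(selected));
```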