
refactor(alertd): retire the YAML alert engine and standalone CLI #441

Open
passcod wants to merge 2 commits into phase2-migrate-alerts from phase3-retire-alert-engine

passcod (Member) commented May 30, 2026

🤖 Phase 3 of consolidating monitoring (TODO #10, plan in docs/plans/healthchecks-into-alertd.md). Stacked on #440.

With the YAML alerts now covered by healthchecks (#440) and canopy owning alerting, this retires the YAML alert engine and the standalone CLI, leaving bestool-alertd as a thin daemon that runs background tasks (the doctor sweep) on a schedule, posts to canopy, and serves task/status/health/metrics HTTP endpoints.
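
The "run a background task on a schedule, stop cleanly on shutdown" loop that remains at the core of the daemon can be illustrated with a minimal std-only sketch. The function name run_scheduled and the channel-based shutdown signal are assumptions for illustration, not bestool-alertd's actual code; in the daemon, the task closure would be where the doctor sweep runs and its result is posted to canopy.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Hypothetical scheduler loop: runs `task` once immediately, then once per
/// `interval`, until a message arrives on `shutdown` or every sender is dropped.
/// Returns how many times the task ran.
fn run_scheduled<F: FnMut()>(mut task: F, interval: Duration, shutdown: &Receiver<()>) -> usize {
    let mut runs = 0;
    loop {
        task();
        runs += 1;
        // Wait one interval, but wake immediately if shutdown is requested.
        match shutdown.recv_timeout(interval) {
            // Explicit shutdown, or the shutdown handle went away: stop.
            Ok(()) | Err(RecvTimeoutError::Disconnected) => return runs,
            // Interval elapsed with no shutdown: run the task again.
            Err(RecvTimeoutError::Timeout) => {}
        }
    }
}
```

Waiting on the shutdown channel with a timeout, rather than sleeping, means a shutdown request (for example from the Windows service control handler) is noticed without waiting out the rest of the interval.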

Removed from bestool-alertd: the alert engine (alert, loader, glob_resolver, events, targets, templates, state_file, scheduler), the daemon-control commands, and the standalone binary (main.rs, the [[bin]] target, the cli feature), plus its generated usage docs and the ALERTS.md/TARGETS.md format docs. daemon.rs drops file watching, glob re-resolution, state persistence, the DB-down alert event (DB health is now covered by the db_connect/db_version checks), and all reload paths. It keeps the canopy client and cert renewal, background-task scheduling, the HTTP server, shutdown handling, and the watchdog. DaemonConfig loses its alert-only fields; http_server keeps /, /status, /health, /metrics, and /tasks/*. InternalContext moved to a slim context.rs. Several now-unused deps (serde_yaml, tera, mailgun-rs, notify, walkdir, glob, blake3, clap, …) are dropped.
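
The trimmed HTTP surface is small enough to state as a route table. The sketch below is a hypothetical classifier (the Route enum and classify function are illustrative names, not http_server's real types) covering exactly the paths the PR says are kept; anything else, including the removed alert endpoints, falls through to NotFound.

```rust
/// Hypothetical view of the routes the slimmed-down daemon still serves.
#[derive(Debug, PartialEq, Eq)]
enum Route<'a> {
    Index,
    Status,
    Health,
    Metrics,
    /// Anything under /tasks/, e.g. "doctor/latest".
    Task(&'a str),
    NotFound,
}

/// Maps a request path to one of the retained routes.
fn classify(path: &str) -> Route<'_> {
    match path {
        "/" => Route::Index,
        "/status" => Route::Status,
        "/health" => Route::Health,
        "/metrics" => Route::Metrics,
        // /tasks/* is a prefix match; the remainder names the task and action.
        _ => match path.strip_prefix("/tasks/") {
            Some(rest) if !rest.is_empty() => Route::Task(rest),
            _ => Route::NotFound,
        },
    }
}
```

Keeping /tasks/* as a prefix match leaves per-task dispatch (such as doctor/latest versus doctor/recompute) to the task layer rather than the router.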

bestool tamanu alertd keeps run and the Windows service subcommands, and drops the alert-management subcommands and the YAML-dir/email/server-kind config. The daemon still runs as a Windows service to perform the sweep on Windows hosts, so windows_service.rs is retained. The legacy bestool tamanu alerts command and its fixtures are removed, along with the now-dead release-alertd.yml workflow that built the deleted binary.

bestool tamanu doctor is unchanged for users — it still fetches /tasks/doctor/latest and /tasks/doctor/recompute from the daemon.
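
Because the doctor command's contract with the daemon is just these two paths, it can be pinned down in a tiny helper. This is a hypothetical sketch (task_endpoints is an illustrative name, and the base URL is whatever address the daemon listens on); it only builds the URLs and makes no assumption about HTTP methods or response formats.

```rust
/// Hypothetical helper: builds the "latest" and "recompute" endpoint URLs
/// for a task, given the daemon's base URL (trailing slash optional).
fn task_endpoints(base: &str, task: &str) -> (String, String) {
    let base = base.trim_end_matches('/');
    (
        format!("{base}/tasks/{task}/latest"),
        format!("{base}/tasks/{task}/recompute"),
    )
}
```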

Verified: workspace build + clippy + fmt clean, alertd/bestool tests pass against the local central DB, and the Windows GNU cross-build is clean (the service code compiles against the trimmed DaemonConfig).

A follow-up phase will review the triggering thresholds across all checks (the migrated checks currently land at FAIL).

passcod and others added 2 commits May 30, 2026 18:23
The YAML alert command, its definition/template/target parsing, the
PostgreSQL-to-JSON helper it relied on, and its trycmd fixtures (plus the
now-orphaned Postgres test fixture) are removed. The alerts engine is being
retired in favour of the doctor healthcheck sweep.

Co-authored-by: Claude <noreply@anthropic.com>
Removes the YAML alert subsystem from bestool-alertd, leaving a thin daemon
that runs background tasks (the doctor sweep) on a schedule, posts to canopy,
and serves task/status/health/metrics over HTTP. Canopy now owns alerting;
the daemon no longer loads YAML alerts or sends email/Slack.

Removed: the standalone binary (main.rs, the [[bin]], the cli feature),
alert.rs, loader.rs, glob_resolver.rs, events.rs, targets, templates.rs,
state_file.rs, scheduler.rs, commands, the alert HTTP endpoints, and the
now-unused deps. InternalContext moves to a slim context.rs; DaemonConfig
drops alert_globs, email, server_kind, and dry_run. The Windows service is
kept and now runs the daemon via run_with_shutdown.

bestool tamanu alertd loses the status/reload/loaded-alerts/pause/validate
passthroughs and the alert-dir/email/server-kind plumbing; it always registers
the doctor sweep. The obsolete release-alertd workflow is removed.

Co-authored-by: Claude <noreply@anthropic.com>