feat(cloud): detect dead local daemon in cloud status and document launchd unit #337
Force-pushed: caf81ae to ee6786b
Rebased onto current … Two friendly asks whenever you have a moment: …
Happy to iterate on review feedback.
Force-pushed: 1b04869 to b89d25a
Hi @Gentleman-Programming, friendly ping per the 7-day cadence noted in CONTRIBUTING.md: this PR has been open since 2026-05-05. Whenever you have a moment, the two pending things on your side are approving the gated workflow runs (first-time-contributor policy) and applying the …
Force-pushed: fa651e0 to c63a73c
…unchd unit

`engram cloud status` now probes the local engram serve daemon at 127.0.0.1:7437 (respects ENGRAM_PORT) with a 1s timeout and prints a `Local daemon:` line so users can detect a silently dead autosync after brew upgrade engram, log out, or any binary replacement. Exit code is unchanged (informational) and the probe is only run when cloud is configured.

DOCS.md "Running as a Service" gains a launchd (macOS) subsection with a KeepAlive plist template that survives brew upgrade by relaunching engram serve automatically.

The Homebrew section in docs/INSTALLATION.md links to the new template so macOS users hit the supervisor guidance right after install.

Closes Gentleman-Programming#279
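For readers skimming the thread, here is a rough sketch of what the commit message describes: a `GET /health` against the local port with a 1s timeout, turned into a `Local daemon:` line that is suppressed when cloud is not configured. The names `daemonProbe`, `daemonLine`, and `probeTimeout` are hypothetical and are not taken from the PR's code; the sketch also collapses the result to two states for brevity.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
	"time"
)

// probeTimeout matches the 1s timeout mentioned in the commit message.
const probeTimeout = time.Second

// daemonProbe reports whether GET /health on the local port answers with 2xx.
// It is a package-level variable (hypothetical name) so tests can swap in a stub.
var daemonProbe = func(port string) bool {
	client := &http.Client{Timeout: probeTimeout}
	resp, err := client.Get("http://127.0.0.1:" + port + "/health")
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode >= 200 && resp.StatusCode < 300
}

// daemonLine builds the status line, or returns "" when cloud is not
// configured so that nothing is printed in that branch.
func daemonLine(cloudConfigured bool, port string) string {
	if !cloudConfigured {
		return ""
	}
	state := "not running"
	if daemonProbe(port) {
		state = "running"
	}
	return fmt.Sprintf("Local daemon: %s on port %s", state, port)
}

func main() {
	// ENGRAM_PORT overrides the default port 7437.
	port := os.Getenv("ENGRAM_PORT")
	if port == "" {
		port = "7437"
	}
	// Informational only: the process exit code does not depend on the result.
	if line := daemonLine(true, port); line != "" {
		fmt.Println(line)
	}
}
```

Keeping the probe behind a function variable is one way to keep existing tests deterministic, since a stub can stand in for the network call.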
Force-pushed: c63a73c to 642fa00
Alan-TheGentleman
left a comment
Reviewed against #279 in a fresh worktree. The daemon probe is read-only with a 1s timeout, the dead vs unreachable split is sound, and all four probe outcomes plus the suppression-when-not-configured path are tested. One non-blocking note: the probe resolves the port from ENGRAM_PORT only, so a daemon started via a positional engram serve 8080 would read as not running. Approving.
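To make the non-blocking note concrete, here is a small sketch of the mismatch. It assumes a positional port takes precedence when the daemon starts, which is an assumption and not something confirmed in this thread; `probePort` and `servePort` are hypothetical names.

```go
package main

import "fmt"

// defaultPort is the default named in the PR description.
const defaultPort = "7437"

// probePort mirrors what the review describes: the probe only looks at
// ENGRAM_PORT and falls back to the default.
func probePort(envPort string) string {
	if envPort != "" {
		return envPort
	}
	return defaultPort
}

// servePort is a hypothetical model of how the daemon picks its port:
// a positional argument, when present, is assumed to win over the env var.
func servePort(args []string, envPort string) string {
	if len(args) > 0 && args[0] != "" {
		return args[0]
	}
	return probePort(envPort)
}

func main() {
	// Daemon started as `engram serve 8080` with ENGRAM_PORT unset.
	fmt.Println("daemon listens on:", servePort([]string{"8080"}, ""))
	fmt.Println("probe targets:", probePort(""))
}
```

Under that assumption the daemon listens on 8080 while the probe checks 7437, so the status line would report the daemon as not running.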
Merged commit e840ea2 into Gentleman-Programming:main
🔗 Linked Issue
Closes #279
🏷️ PR Type
- `type:bug`: Bug fix
- `type:feature`: New feature
- `type:docs`: Documentation only
- `type:refactor`: Code refactoring (no behavior change)
- `type:chore`: Maintenance, dependencies, tooling
- `type:breaking-change`: Breaking change

📝 Summary
- `engram cloud status` now probes the local `engram serve` daemon at `127.0.0.1:7437` (respects `ENGRAM_PORT`) with a 1s timeout and prints a `Local daemon: running | not running | unreachable` line so users can detect a silently dead autosync after `brew upgrade engram` or any other binary replacement.
- Adds a launchd (macOS) subsection to `DOCS.md "Running as a Service"` so macOS users can supervise `engram serve` the same way Linux users do with systemd. With `KeepAlive=true`, autosync now survives `brew upgrade` automatically.

📂 Changes
- `cmd/engram/cloud_daemon_probe.go`: `cloudDaemonProbe` variable function (1s timeout `GET /health`), port resolution (`ENGRAM_PORT` → 7437), and `printCloudStatusDaemonProbe` writer with recovery hint when the daemon is down.
- `cmd/engram/cloud_daemon_probe_test.go`
- `cmd/engram/cloud.go`: `cmdCloudStatus` calls `printCloudStatusDaemonProbe` in each cloud-configured branch (token, token+insecure, no-token) before the existing sync diagnostic. No behavior change in the "not configured" branch.
- `cmd/engram/main_extra_test.go`: `stubRuntimeHooks` now stubs `cloudDaemonProbe` so existing tests stay deterministic; new `TestCmdCloudStatusEmitsLocalDaemonLine` verifies the line is printed when configured (and suppressed when not).
- `DOCS.md`: `Using systemd` → `Using systemd (Linux)`. Adds `Using launchd (macOS)` with full plist template (KeepAlive=true so brew upgrade does not break autosync), load/unload steps, and verification via `engram cloud status`. Updates the `engram cloud status` reference bullet to describe the new `Local daemon:` line.
- `docs/INSTALLATION.md`: … `brew upgrade`.

🧪 Test Plan
- `go test ./...` (passes with the standard CI environment; my local env had `ENGRAM_CLOUD_SERVER` set, which leaks into pre-existing tests; repro by `unset ENGRAM_CLOUD_SERVER ENGRAM_CLOUD_TOKEN ENGRAM_CLOUD_INSECURE_NO_AUTH ENGRAM_CLOUD_AUTOSYNC ENGRAM_PORT` before running, same isolation as CI).
- `go test -tags e2e ./internal/server/...`.
- `cloud config --server ...`:
  - `Local daemon: not running on port 7777` + recovery hint mentioning `engram serve` and the launchd template
  - `Local daemon: running on port 7777`
  - `ENGRAM_PORT=9000` while daemon stays on 7777 → probe targets 9000 and reports `not running on port 9000` (env override honored)

🤖 Automated Checks
These run automatically and all must pass before merge:
- `Closes #N` / `Fixes #N` / `Resolves #N`
- `status:approved` label
- `type:*` label
- `go test ./...` passes
- `go test -tags e2e ./internal/server/...` passes

✅ Contributor Checklist
- `Closes #279`
- `type:*` label to this PR (`type:feature`)
- `go test ./...`
- `go test -tags e2e ./internal/server/...`
- `Co-Authored-By` trailers in commits

💬 Notes for Reviewers
- … `homebrew-tap`, so it is intentionally out of scope for this PR. The two in-repo mitigations (status probe + launchd template) close the gap on this side.
- The probe distinguishes `not_running` (TCP dial error to 127.0.0.1) from `unreachable` (timeout / non-2xx / unexpected error) so the recovery hint only fires when restarting `engram serve` is the right action.
- The timeout is a `var daemonProbeTimeout` so tests can shorten it; default in production stays at 1s.
- The plist template uses `<HOME>` placeholders because launchd does not expand `$HOME`/`~` inside plist values; the docs explicitly call this out.
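
As an illustration of the `not_running` / `unreachable` split and the overridable timeout noted above, here is a minimal sketch. Only `daemonProbeTimeout` and the state names come from this PR; the `classify` and `probe` helpers, and the exact error checks, are assumptions about one way to do it rather than the code under review.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/http"
	"time"
)

// daemonProbeTimeout is a var rather than a const so tests can shorten it.
var daemonProbeTimeout = time.Second

// classify maps a probe outcome to a state (hypothetical helper).
// Timeouts are checked first because a dial can also time out.
func classify(statusCode int, err error) string {
	if err == nil {
		if statusCode >= 200 && statusCode < 300 {
			return "running"
		}
		return "unreachable" // daemon answered, but not with 2xx
	}
	var netErr net.Error
	if errors.As(err, &netErr) && netErr.Timeout() {
		return "unreachable"
	}
	var opErr *net.OpError
	if errors.As(err, &opErr) && opErr.Op == "dial" {
		return "not_running" // e.g. connection refused on 127.0.0.1
	}
	return "unreachable" // any other unexpected error
}

// probe issues GET /health against the local daemon and classifies the result.
func probe(port string) string {
	client := &http.Client{Timeout: daemonProbeTimeout}
	resp, err := client.Get("http://127.0.0.1:" + port + "/health")
	if err != nil {
		return classify(0, err)
	}
	defer resp.Body.Close()
	return classify(resp.StatusCode, nil)
}

func main() {
	fmt.Println(probe("7437"))
}
```

With this split, a restart hint would be shown only for `not_running`, where restarting `engram serve` is the right action.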