fix(init): IPC socket probe for daemon liveness + polling spawn #20
Merged
Three small fixes wired through one new `init::util` module:

1. Replace `pgrep -x muxad` with a unix-socket connect probe in both the pre-flight detection and the `--start-daemon` action's already-running short-circuit. The pgrep approach misled the wizard in the v0.4.0 incident: a stale muxad pid lingered with its socket gone, pgrep said "running", we skipped the spawn, and the next `muxa status` still failed with `daemon not reachable`. Socket-connect captures the only thing that actually matters ("is the daemon answering"), and a true cold-start errors in microseconds.

2. Replace the static 300 ms post-spawn sleep with bounded polling (3 s timeout, 20 ms interval). systemd's `enable --now` and launchd's `bootstrap` return as soon as the spawn is *initiated*, not when the child has bound its socket; 300 ms worked on hot hardware and silently raced on cold-cached / VM / CI runners, surfacing a misleading "muxad not responding" warning right after a successful spawn. The poll loop hits the first iteration on the hot path (<20 ms) and gives slow boots a generous grace window.

3. `locate_muxad` for the macOS launchd plist now also checks `/opt/homebrew/bin/muxad` and `/usr/local/bin/muxad` after the cargo bin fallback, so a brew-installed muxad lands at the correct path on first install.

Internals:

- New `crates/muxa-cli/src/init/util.rs` deduplicates the `uid_string()` helper that was forked across `detect.rs` and `files/launchd.rs`, and hosts the new `default_muxad_socket()`, `muxad_responsive()`, `wait_for_muxad()` primitives.
- Six new unit tests cover the polling helper end-to-end against a real `UnixListener`. Both the immediate-return path and the timeout path are asserted (with elapsed bounds to catch a regression where polling becomes a busy-loop or an over-long sleep).
- detect.rs / launchd.rs / apply.rs each lost ~10 LOC from the duplicate `uid_string` and pgrep wrappers; net ~+150 LOC because of util.rs and tests.

Workspace tests 365 → 375, all green. Clippy clean with `-D warnings`, rustfmt clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
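The probe and the bounded poll described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code from `util.rs`: the function names match the ones the commit mentions, but the signatures, the explicit `timeout` / `interval` parameters, and the loop shape are assumptions.

```rust
use std::os::unix::net::UnixStream;
use std::path::Path;
use std::thread;
use std::time::{Duration, Instant};

/// Returns true if something accepts a connection on the unix socket at `path`.
/// A missing socket file, or a stale file with no listener, fails immediately.
/// Assumed signature; the real helper may resolve the path itself.
fn muxad_responsive(path: &Path) -> bool {
    UnixStream::connect(path).is_ok()
}

/// Probes the socket every `interval` until it answers or `timeout` elapses.
/// Returns true as soon as one probe succeeds, so the hot path costs a single
/// connect; returns false only once the deadline has passed.
fn wait_for_muxad(path: &Path, timeout: Duration, interval: Duration) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        if muxad_responsive(path) {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        thread::sleep(interval);
    }
}
```

With the values from the commit message, a caller would use something like `wait_for_muxad(socket, Duration::from_secs(3), Duration::from_millis(20))` right after asking systemd or launchd to start the daemon. Sleeping between probes rather than spinning is what keeps the timeout path from becoming a busy-loop.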
Summary
Three small fixes wired through one new `init::util` module. Closes the v0.4.2 self-review's three highest-priority findings, the same failure mode the v0.4.0 user originally hit.

1. Replace `pgrep -x muxad` with a unix-socket connect probe. Used in both pre-flight detection ("muxad already running") and `--start-daemon`'s already-up short-circuit. pgrep returns true for zombie processes whose listen socket has died, which is exactly the failure mode the v0.4.0 user hit. Socket-connect captures the only thing that matters: "is the daemon answering". A true cold-start errors out in microseconds anyway.
2. Replace the static 300 ms post-spawn sleep with bounded polling (3 s timeout, 20 ms interval). systemd's `enable --now` and launchd's `bootstrap` return as soon as the spawn is initiated, not when the socket is bound; 300 ms worked on hot hardware and silently raced on slow VMs / CI / cold caches. Polling adapts: the hot path returns in <20 ms, and slow boots get the grace window.
3. `locate_muxad` (macOS launchd plist) checks `/opt/homebrew/bin/muxad` and `/usr/local/bin/muxad` after the cargo fallback, so a brew-installed muxad lands at the correct path on first install. The cargo path stays first.

Internals

- `crates/muxa-cli/src/init/util.rs` hosts `uid_string()` (deduplicated from `detect.rs` + `files/launchd.rs`), `default_muxad_socket()`, `muxad_responsive()`, `wait_for_muxad()`.
- New unit tests run against a real `UnixListener`: connect-when-bound, refuse-when-absent, immediate-return path, timeout-elapsed path with elapsed bounds (catches busy-loop or over-long-sleep regressions).
- `detect.rs` / `apply.rs` / `files/launchd.rs` each lose ~10 LOC of duplicated wrappers; net delta ≈ +150 LOC because of util.rs + tests.

Test plan

- `cargo test --workspace`: 375 / 375 (was 365)
- `cargo clippy --workspace --all-targets -- -D warnings`: clean
- `cargo fmt --all -- --check`: clean

Release
Ship as v0.4.3 (patch — bugfix only).
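For reviewers, the lookup order from item 3 of the summary can be sketched as below. This is illustrative only: `muxad_candidates` and `first_existing` are hypothetical names, `~/.cargo/bin/muxad` is an assumption about what the "cargo fallback" resolves to, and the real `locate_muxad` may consult other sources before reaching this list.

```rust
use std::path::{Path, PathBuf};

/// Candidate locations in the order the summary describes:
/// the cargo path stays first, then the two Homebrew prefixes.
fn muxad_candidates(home: &Path) -> Vec<PathBuf> {
    vec![
        // Assumption: the cargo fallback is ~/.cargo/bin/muxad.
        home.join(".cargo/bin/muxad"),
        // Homebrew prefix on Apple Silicon.
        PathBuf::from("/opt/homebrew/bin/muxad"),
        // Homebrew prefix on Intel Macs.
        PathBuf::from("/usr/local/bin/muxad"),
    ]
}

/// Returns the first candidate for which `exists` reports true.
/// The predicate is injected so the ordering can be tested without
/// touching the filesystem; production code would pass `|p| p.is_file()`.
fn first_existing<F>(candidates: &[PathBuf], exists: F) -> Option<PathBuf>
where
    F: Fn(&Path) -> bool,
{
    candidates.iter().find(|p| exists(p.as_path())).cloned()
}
```

Keeping the cargo path first means an existing cargo install keeps resolving to the same binary, while a brew-only machine falls through to the Homebrew location instead of producing a plist that points at a missing file.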
🤖 Generated with Claude Code