fix(init): IPC socket probe for daemon liveness + polling spawn #20

Merged
jiunbae merged 1 commit into main from fix/init-ipc-liveness on Apr 30, 2026

@jiunbae commented Apr 30, 2026

Summary

Three small fixes wired through one new init::util module. They close the three highest-priority findings from the v0.4.2 self-review, which cover the same failure mode the v0.4.0 user originally hit.

  1. Replace pgrep -x muxad with a unix-socket connect probe, used in both pre-flight detection ("muxad already running") and --start-daemon's already-up short-circuit. pgrep still matches a stale or zombie muxad process whose listen socket has died, which is exactly the failure mode the v0.4.0 user hit. A socket connect captures the only thing that matters: "is the daemon answering". On a true cold start the connect fails within microseconds, so the probe adds no noticeable delay.
  2. Replace the static 300 ms post-spawn sleep with bounded polling (3 s timeout, 20 ms interval). systemd's enable --now and launchd's bootstrap return as soon as the spawn is initiated, not when the socket is bound; 300 ms worked on hot hardware and silently raced on slow VMs / CI / cold caches. Polling adapts: hot path returns in <20 ms, slow boots get the grace window.
  3. locate_muxad (macOS launchd plist) checks /opt/homebrew/bin/muxad and /usr/local/bin/muxad after the cargo fallback, so a brew-installed muxad lands at the correct path on first install. Cargo path stays first.
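
The probe in item 1 can be sketched as below. This is a minimal illustration, not the code from util.rs: the PR names a `muxad_responsive()` helper but does not show its signature, so the `&Path` parameter here is an assumption.

```rust
use std::os::unix::net::UnixStream;
use std::path::Path;

/// Hypothetical signature; the real helper in util.rs may differ.
///
/// Returns true only if something is accepting connections on `socket`.
/// A missing path, a stale socket file with no listener behind it, or a
/// lingering process that never bound the socket all make `connect` fail,
/// so every one of those cases is reported as "not running".
pub fn muxad_responsive(socket: &Path) -> bool {
    UnixStream::connect(socket).is_ok()
}
```

Unlike a process-name match, the result depends only on whether a listener exists: a socket file left behind by a dead daemon fails the connect with "connection refused" instead of being reported as running.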

Internals

  • New crates/muxa-cli/src/init/util.rs hosts uid_string() (deduplicated from detect.rs + files/launchd.rs), default_muxad_socket(), muxad_responsive(), wait_for_muxad().
  • Six new unit tests against a real UnixListener: connect-when-bound, refuse-when-absent, immediate-return path, timeout-elapsed path with elapsed bounds (catches busy-loop or over-long-sleep regressions).
  • detect.rs / apply.rs / files/launchd.rs each lose ~10 LOC of duplicated wrappers; net delta ≈ +150 LOC because of util.rs + tests.
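
A bounded poll of the kind `wait_for_muxad()` is described as doing might look like the following sketch. The parameter list is hypothetical (the PR only states a 3 s timeout and a 20 ms interval), and the connect-based check is inlined so the block stands alone.

```rust
use std::os::unix::net::UnixStream;
use std::path::Path;
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical signature; the PR's defaults would be
/// `timeout = 3 s` and `interval = 20 ms`.
///
/// Probes immediately, then once per `interval`, and gives up once
/// `timeout` has elapsed. Returns true as soon as a connect succeeds.
pub fn wait_for_muxad(socket: &Path, timeout: Duration, interval: Duration) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        // First iteration runs before any sleep, so an already-bound
        // socket returns without waiting.
        if UnixStream::connect(socket).is_ok() {
            return true;
        }
        let now = Instant::now();
        if now >= deadline {
            return false;
        }
        // Sleep one interval, but never past the deadline.
        thread::sleep(interval.min(deadline - now));
    }
}
```

Asserting both a lower and an upper bound on the elapsed time of the timeout path, as the tests above are described as doing, catches the two regressions named there: a loop that returns early or spins without sleeping, and a sleep that overshoots the timeout.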

Test plan

  • cargo test --workspace — 375 / 375 (was 365)
  • cargo clippy --workspace --all-targets -- -D warnings — clean
  • cargo fmt --all -- --check — clean
  • Local dry-run: pre-flight ✔ shows "muxad already running" via socket probe (was via pgrep)
  • macOS smoke: stale muxad pid + dead socket → wizard correctly re-spawns instead of skipping
  • Slow / cold runner: spawn + poll path completes within 3 s, no false "muxad not responding" warning

Release

Ship as v0.4.3 (patch — bugfix only).

🤖 Generated with Claude Code

Three small fixes wired through one new `init::util` module:

1. Replace `pgrep -x muxad` with a unix-socket connect probe in
   both the pre-flight detection and the `--start-daemon` action's
   already-running short-circuit. The pgrep approach misled the
   wizard in the v0.4.0 incident — a stale muxad pid lingered with
   its socket gone, pgrep said "running", we skipped the spawn,
   and the next `muxa status` still failed with
   `daemon not reachable`. Socket-connect captures the only thing
   that actually matters — "is the daemon answering" — and a true
   cold-start errors in microseconds.

2. Replace the static 300 ms post-spawn sleep with bounded polling
   (3 s timeout, 20 ms interval). systemd's `enable --now` and
   launchd's `bootstrap` return as soon as the spawn is *initiated*,
   not when the child has bound its socket; 300 ms worked on hot
   hardware and silently raced on cold-cached / VM / CI runners,
   surfacing a misleading "muxad not responding" warning right
   after a successful spawn. The poll loop hits the first iteration
   on the hot path (<20 ms) and gives slow boots a generous grace
   window.

3. `locate_muxad` for the macOS launchd plist now also checks
   `/opt/homebrew/bin/muxad` and `/usr/local/bin/muxad` after the
   cargo bin fallback, so a brew-installed muxad lands at the
   correct path on first install.
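
The search order in item 3 can be sketched as follows. Only the relative order (cargo bin first, then the two Homebrew prefixes) comes from this PR; the `~/.cargo/bin` location is cargo's default install directory and is an assumption here, as are the function names. The real `locate_muxad` may consult other locations before reaching these.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: candidate paths in priority order.
/// `home` is the user's home directory; `.cargo/bin` assumes the
/// default cargo install root.
pub fn muxad_candidates(home: &Path) -> Vec<PathBuf> {
    vec![
        home.join(".cargo/bin/muxad"),            // cargo path stays first
        PathBuf::from("/opt/homebrew/bin/muxad"), // Homebrew on Apple Silicon
        PathBuf::from("/usr/local/bin/muxad"),    // Homebrew on Intel macOS
    ]
}

/// Returns the first candidate that exists as a regular file, if any.
pub fn first_existing(candidates: &[PathBuf]) -> Option<PathBuf> {
    candidates.iter().find(|p| p.is_file()).cloned()
}
```

Keeping the candidates in an ordered list makes the priority explicit: a cargo-installed binary still wins when both exist, and a brew-only install resolves to its real path instead of a cargo path that is not there.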

Internals:

- New `crates/muxa-cli/src/init/util.rs` deduplicates the
  `uid_string()` helper that was forked across `detect.rs` and
  `files/launchd.rs`, and hosts the new `default_muxad_socket()`,
  `muxad_responsive()`, `wait_for_muxad()` primitives.
- Six new unit tests cover the polling helper end-to-end against a
  real `UnixListener` — both the immediate-return path and the
  timeout path are asserted (with elapsed bounds to catch a
  regression where polling becomes a busy-loop or an over-long
  sleep).
- detect.rs / launchd.rs / apply.rs each lost ~10 LOC from the
  duplicate `uid_string` and pgrep wrappers; net ~+150 LOC because
  of util.rs and tests.

Workspace tests 365 → 375, all green. Clippy clean with `-D warnings`,
rustfmt clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jiunbae jiunbae merged commit cecc1a5 into main Apr 30, 2026
6 checks passed
@jiunbae jiunbae deleted the fix/init-ipc-liveness branch April 30, 2026 09:07