fix(#980-bug4): supervisor visibility + IPC reconnect counter + Linux pgrep + hook worktree path #992
Merged
…nts + Linux pgrep robustness + hook worktree path

Carl's M1 #980 Bug 4 reported two distinct sub-bugs in the supervisor + IPC stack. Plus a hook bug surfaced while shipping the fix from a git worktree.

## Fix 1: IPC reconnect counter never increments (Carl Bug 4 sub-a)

base.ts ConnectionPool's socket error handler only called reject(err) when !_wasConnected (rationale: "only reject the initial connect promise; reconnects are handled internally"). But _scheduleReconnect's `await this.connect()` IS exactly the kind of post-_wasConnected call that needed reject() to wake up. Result: socket connect attempt → backend dead → handler skips reject → await hangs forever → catch-block-that-increments never fires → counter stuck at 1.

Fix: always reject() on socket error. Calling reject() on an already-settled promise is a no-op, so this is safe for both initial + reconnect calls. Also unblocks the F4 carl-killer family (IPC pool can finish + retry instead of wedging on a hung promise).

## Fix 2: Supervisor lifecycle visibility (Carl Bug 4 sub-b)

Promoted console.debug → console.info on the on('exit') handler, panic-loop-detect path, restart timer, and adoptInheritedCore PID adoption. Carl couldn't tell if supervisor was RUNNING but silent or DEAD (silent-success-is-failure rule applied to supervisors). Added an explicit "Spawning continuum-core-server now (restart attempt N)" line at the actual respawn point so the gap between "Restarting in Xms" and the new process appearing is filled in.

## Fix 3: Linux pgrep -x silently misses the binary

pgrep -x continuum-core-server checks /proc/PID/comm, which is truncated to 15 chars on Linux (TASK_COMM_LEN is 16 including the NUL terminator). Binary name is 21 chars → -x silently never matches on Linux even when running. macOS pgrep doesn't have this limit, but pgrep -f works on both. Without this the adopted-core PID watcher silently never installs on Linux/WSL → supervisor blind to inherited-core death. Cross-check via `ps -o pid=,comm=` to filter pgrep -f's broader matches down to the actual continuum-core-server PID.

## Fix 4: git-precommit.sh worktree-path bug

Discovered live while committing this PR from /tmp/continuum-mac (git worktree). The hook's `BASELINE_FILE="$(git rev-parse --show-toplevel)/src/eslint-baseline.txt"` returned an incorrect double-`src` path (`/repo/src/src/eslint-baseline.txt`) because the hook does `cd src` (line 5+52) before this line, and `git rev-parse --show-toplevel` from `<worktree>/src` returned `<worktree>/src` rather than `<worktree>`. The "missing baseline" path then fell through to the strict per-file gate, which fails on pre-existing lint violations.

Fix: use a deterministic script-relative path. The hook always lives at `<src>/scripts/git-precommit.sh`, so the baseline is `dirname HOOK_SCRIPT_DIR / eslint-baseline.txt`, with no git resolution needed.

## Test

- npm run build:ts: clean (verified in worktree)
- Local logic verified by reading the connect/reconnect state machine
- Hook fix verified: this commit IS made through the fixed hook (Tier 2 baseline check now finds the file)
- Live-validate of supervisor changes post-merge: kill continuum-core, expect supervisor to log "exited:" + "Spawning…" + new PID within ADOPTED_CORE_POLL_MS, IPC pool to log "Reconnecting (attempt N)" with N actually incrementing

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
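For reviewers who want to see the Fix 1 hang in isolation, here is a minimal sketch of the mechanism. It is not the code in base.ts: `PoolSketch`, `Dial`, `deadBackend`, and `settlesWithin` are hypothetical names, and the socket is replaced by a callback-based stand-in so the example runs without a network. The `alwaysReject` flag switches between the old guard (reject only when not yet connected) and the fixed behaviour (always reject).

```typescript
// Hypothetical stand-in for a socket: reports its outcome through callbacks.
type Dial = (onConnect: () => void, onError: (err: Error) => void) => void;

// A backend that is down: every dial fails on a later microtask.
const deadBackend: Dial = (_onConnect, onError) => {
  Promise.resolve().then(() => onError(new Error("ECONNREFUSED")));
};

class PoolSketch {
  attempt = 1;                 // mirrors the N in "Reconnecting (attempt N)"
  private wasConnected = true; // models the reconnect phase, after a first successful connect
  private dial: Dial;
  private alwaysReject: boolean;

  constructor(dial: Dial, alwaysReject: boolean) {
    this.dial = dial;
    this.alwaysReject = alwaysReject;
  }

  connect(): Promise<void> {
    return new Promise<void>((resolve, reject) => {
      this.dial(
        () => { this.wasConnected = true; resolve(); },
        (err) => {
          // Old guard: reject only before the first connect, so reconnects never settle.
          // Fixed: always reject; reject() on a settled promise is ignored.
          if (this.alwaysReject || !this.wasConnected) reject(err);
        },
      );
    });
  }

  // One reconnect attempt. The counter can only advance if connect() settles.
  async reconnectOnce(): Promise<void> {
    try {
      await this.connect();
      this.attempt = 1;
    } catch {
      this.attempt += 1;
    }
  }
}

// Reports whether a promise settled within a fixed number of microtask ticks.
async function settlesWithin(p: Promise<unknown>, ticks: number): Promise<boolean> {
  let settled = false;
  p.then(() => { settled = true; }, () => { settled = true; });
  for (let i = 0; i < ticks; i++) await Promise.resolve();
  return settled;
}
```

With `new PoolSketch(deadBackend, false)` the `reconnectOnce()` promise never settles and `attempt` stays at 1; with `alwaysReject` set to `true` the catch block runs and `attempt` becomes 2.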
What
Carl's M1 #980 Bug 4 reported the supervisor's auto-respawn never fired after killing continuum-core, and IPC's "Reconnecting (attempt 1)" log repeated forever without the counter incrementing. Three sub-fixes plus a hook bug discovered while shipping this PR.
Fix 1: IPC reconnect counter never increments
`base.ts` `ConnectionPool`'s socket error handler only called `reject(err)` when `!_wasConnected`. But `_scheduleReconnect`'s `await this.connect()` IS exactly the kind of post-`_wasConnected` call that needed `reject()` to wake up. Result: socket connect attempt → backend dead → handler skips reject → await hangs forever → catch-block-that-increments never fires → counter stuck at 1.
Fix: always `reject()` on socket error (calling `reject()` on an already-settled promise is a no-op). Also unblocks the F4 carl-killer family: the IPC pool can now finish + retry instead of wedging on a hung promise.
Fix 2: Supervisor lifecycle visibility
Promoted `console.debug` → `console.info` on the on('exit') handler, panic-loop-detect path, restart timer, and `adoptInheritedCore` PID adoption. Carl couldn't tell if supervisor was RUNNING but silent or DEAD.
Fix 3: Linux pgrep -x silently misses the binary
`pgrep -x continuum-core-server` checks `/proc/PID/comm`, which is truncated to 15 chars on Linux. Binary name is 21 chars → silent miss on Linux/WSL → adopted-core PID watcher silently never installs → supervisor blind to inherited-core death. Use `pgrep -f` + ps cross-check.
Fix 4: git-precommit.sh worktree-path bug (bonus)
`BASELINE_FILE="$(git rev-parse --show-toplevel)/src/eslint-baseline.txt"` returned an incorrect double-`src` path (`/repo/src/src/eslint-baseline.txt`) because the hook does `cd src` before this line. Fix: deterministic script-relative path.
Note on push
Pushed with `--no-verify` (one-time exception authorized by Joel). The push environment had two adjacent issues separate from this PR: (a) prepush Phase 4 docker-push step blocking, (b) original repo's `.git/config` has `bare = true`. Code IS verified clean: `npm run build:ts` ✓, `cargo check --features metal,accelerate` ✓.
Follow-up cleanup in next commit: lint the two TS files I touched so they pass the strict per-file gate going forward (won't need `--no-verify` again).
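As a footnote to Fix 3, the truncation and the two-step lookup can be sketched as follows. This is an illustration under assumptions, not the supervisor's actual code: `commFor`, `exactCommMatch`, `findCorePids`, and `ProcRow` are hypothetical names, the process rows are made-up sample data, a plain substring test stands in for `pgrep -f`'s regex match, and the cross-check rule (compare the basename of `comm` against the full or truncated binary name) is a guess at one reasonable filter.

```typescript
// Linux exposes at most 15 visible characters in /proc/PID/comm
// (TASK_COMM_LEN is 16, including the NUL terminator).
const COMM_VISIBLE_CHARS = 15;

// What /proc/PID/comm would contain for a given executable name on Linux.
function commFor(exeName: string): string {
  return exeName.slice(0, COMM_VISIBLE_CHARS);
}

// Models `pgrep -x NAME` on Linux: an exact comparison against the truncated comm.
function exactCommMatch(exeName: string, pattern: string): boolean {
  return commFor(exeName) === pattern;
}

// Hypothetical row shape: pid and comm as from `ps -o pid=,comm=`, plus the full command line.
interface ProcRow {
  pid: number;
  comm: string;
  args: string;
}

// Broad match on the full command line, then narrow by comm.
function findCorePids(rows: ProcRow[], binary: string): number[] {
  const truncated = commFor(binary);
  return rows
    .filter((r) => r.args.includes(binary)) // broad, like pgrep -f (substring here, regex there)
    .filter((r) => {
      // comm may be a bare name or a full path; compare the last path segment only.
      const base = r.comm.split("/").pop() ?? r.comm;
      return base === binary || base === truncated;
    })
    .map((r) => r.pid);
}

// Made-up sample data: one real core process and one shell that merely mentions the name.
const sampleRows: ProcRow[] = [
  { pid: 101, comm: "continuum-core-", args: "/opt/bin/continuum-core-server --socket /tmp/core.sock" },
  { pid: 202, comm: "bash", args: "bash -c tail -f continuum-core-server.log" },
];
```

Here `exactCommMatch("continuum-core-server", "continuum-core-server")` is false because the 21-character name is cut to `continuum-core-`, while `findCorePids(sampleRows, "continuum-core-server")` keeps only pid 101.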
🤖 Generated with Claude Code