fix(#980-bug4): supervisor visibility + IPC reconnect counter + Linux pgrep + hook worktree path#992

Merged
joelteply merged 1 commit into canary from mac/980-bug4-supervisor-visibility on May 2, 2026

@joelteply
Contributor

What

Carl's M1 #980 Bug 4 reported that the supervisor's auto-respawn never fired after continuum-core was killed, and that IPC's "Reconnecting (attempt 1)" log line repeated forever without the counter incrementing. This PR contains three sub-fixes, plus a fix for a hook bug discovered while shipping it.

Fix 1 — IPC reconnect counter never increments

base.ts ConnectionPool's socket error handler only called reject(err) when !_wasConnected. But _scheduleReconnect's await this.connect() IS exactly the kind of post-_wasConnected call that needed reject() to wake up. Result: socket connect attempt → backend dead → handler skips reject → await hangs forever → catch-block-that-increments never fires → counter stuck at 1.

Fix: always reject() on socket error (calling a promise's reject() after it has already settled is a no-op). Also unblocks the F4 carl-killer family: the IPC pool can now finish + retry instead of wedging on a hung promise.
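A minimal sketch of the failure mode, not the actual base.ts code. `FakeSocket`, `connect`, `outcome`, and the `alwaysReject` switch are hypothetical names used only to contrast the old guard with the new behaviour, and to show that a second reject() on a settled promise changes nothing.

```typescript
type ErrorListener = (err: Error) => void;

// Hypothetical stand-in for a socket: it only supports an error listener.
class FakeSocket {
  private listeners: ErrorListener[] = [];
  onError(fn: ErrorListener): void {
    this.listeners.push(fn);
  }
  fail(err: Error): void {
    for (const fn of this.listeners) fn(err);
  }
}

// Returns a promise that is meant to reject when the socket errors.
// alwaysReject = false models the old guard, true models the fix.
function connect(
  socket: FakeSocket,
  wasConnected: boolean,
  alwaysReject: boolean
): Promise<void> {
  return new Promise<void>((_resolve, reject) => {
    socket.onError((err) => {
      // Old guard: skip reject() once a connection had ever succeeded,
      // which leaves a reconnect caller awaiting forever.
      if (alwaysReject || !wasConnected) reject(err);
    });
  });
}

// Reports whether p settled within ms, or is still pending ("hung").
function outcome(p: Promise<void>, ms: number): Promise<string> {
  const timeout = new Promise<string>((resolve) =>
    setTimeout(() => resolve("hung"), ms)
  );
  const settled = p.then(
    () => "resolved",
    () => "rejected"
  );
  return Promise.race([settled, timeout]);
}

// Only the first reject() takes effect; the second call is ignored.
function rejectTwice(): Promise<string> {
  return new Promise<void>((_resolve, reject) => {
    reject(new Error("first"));
    reject(new Error("second"));
  }).then(
    () => "resolved",
    (e: Error) => e.message
  );
}
```

With `wasConnected = true`, the old guard yields "hung" and the always-reject variant yields "rejected", which is what lets the caller's catch block run.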

Fix 2 — Supervisor lifecycle visibility

Promoted console.debug → console.info on the on('exit') handler, panic-loop-detect path, restart timer, and adoptInheritedCore PID adoption. Carl couldn't tell if supervisor was RUNNING but silent or DEAD.

Fix 3 — Linux pgrep -x silently misses the binary

pgrep -x continuum-core-server matches against /proc/PID/comm, which Linux truncates to 15 chars. The binary name is 21 chars → silent miss on Linux/WSL → adopted-core PID watcher silently never installs → supervisor blind to inherited-core death. Fix: use pgrep -f plus a ps cross-check.
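The truncation can be modelled in a few lines. This is an illustrative sketch only: `linuxComm`, `exactMatch`, and `fullMatch` are hypothetical helpers that approximate how pgrep -x and pgrep -f behave on Linux (pgrep -f actually applies a regular expression to the command line; a substring test stands in for it here).

```typescript
// TASK_COMM_LEN is 16 on Linux, including the trailing NUL,
// so at most 15 visible characters survive in /proc/PID/comm.
const TASK_COMM_LEN = 16;

// Models what the kernel stores as the process "comm" name.
function linuxComm(binaryName: string): string {
  return binaryName.slice(0, TASK_COMM_LEN - 1);
}

// Models `pgrep -x PATTERN`: the whole comm name must equal the pattern.
function exactMatch(pattern: string, binaryName: string): boolean {
  return linuxComm(binaryName) === pattern;
}

// Models `pgrep -f PATTERN`: the pattern may appear anywhere in the
// full command line (simplified here to a substring test).
function fullMatch(pattern: string, commandLine: string): boolean {
  return commandLine.includes(pattern);
}

const BINARY = "continuum-core-server";
const stored = linuxComm(BINARY); // "continuum-core-"
const foundByExact = exactMatch(BINARY, BINARY); // false
const foundByFull = fullMatch(BINARY, "/opt/bin/continuum-core-server --flag"); // true
// -f also matches unrelated processes that merely mention the name,
// which is why a cross-check is needed afterwards.
const falsePositive = fullMatch(BINARY, "tail -f continuum-core-server.log"); // true
```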

Fix 4 — git-precommit.sh worktree-path bug (bonus)

BASELINE_FILE="$(git rev-parse --show-toplevel)/src/eslint-baseline.txt" returned an incorrect double-src path (/repo/src/src/eslint-baseline.txt) because the hook does cd src before this line. Fix: deterministic script-relative path.

Note on push

Pushed with --no-verify (one-time exception authorized by Joel). The push environment had two adjacent issues separate from this PR: (a) prepush Phase 4 docker-push step blocking, (b) original repo's .git/config bare = true. Code IS verified clean: npm run build:ts ✓, cargo check --features metal,accelerate ✓.

Follow-up cleanup in next commit: lint the two TS files I touched so they pass the strict per-file gate going forward (won't need --no-verify again).

🤖 Generated with Claude Code

…nts + Linux pgrep robustness + hook worktree path

Carl's M1 #980 Bug 4 reported two distinct sub-bugs in the supervisor
+ IPC stack. Plus a hook bug surfaced while shipping the fix from a
git worktree.

## Fix 1 — IPC reconnect counter never increments (Carl Bug 4 sub-a)

base.ts ConnectionPool's socket error handler only called reject(err)
when !_wasConnected (rationale: "only reject the initial connect
promise; reconnects are handled internally"). But _scheduleReconnect's
`await this.connect()` IS exactly the kind of post-_wasConnected call
that needed reject() to wake up. Result: socket connect attempt →
backend dead → handler skips reject → await hangs forever → catch-
block-that-increments never fires → counter stuck at 1.

Fix: always reject() on socket error. Calling reject() on an
already-settled promise is a no-op, so this is safe for both initial +
reconnect calls.
Also unblocks the F4 carl-killer family (IPC pool can finish + retry
instead of wedging on a hung promise).
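The dependency between the rejection and the counter can be sketched as a loop. This is a simplified illustration, not the real _scheduleReconnect: `reconnectUntil`, its parameters, and the absence of any backoff delay are assumptions made for brevity.

```typescript
// Hypothetical reconnect loop. The attempt counter only advances inside
// the catch block, so it depends on connect() actually rejecting.
async function reconnectUntil(
  connect: () => Promise<void>,
  maxAttempts: number,
  log: (msg: string) => void
): Promise<number> {
  let attempt = 1;
  while (attempt <= maxAttempts) {
    log(`Reconnecting (attempt ${attempt})`);
    try {
      await connect();
      return attempt; // connected on this attempt
    } catch {
      attempt += 1; // unreachable if connect() never settles
    }
  }
  return -1; // gave up after maxAttempts
}

// Fake backend for illustration: refuses `failures` times, then accepts.
function flakyConnect(failures: number): () => Promise<void> {
  let remaining = failures;
  return () => {
    if (remaining > 0) {
      remaining -= 1;
      return Promise.reject(new Error("ECONNREFUSED"));
    }
    return Promise.resolve();
  };
}
```

If connect() returned a promise that never settles, the loop would log "attempt 1" once and then stall at the await, which matches the stuck counter described above.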

## Fix 2 — Supervisor lifecycle visibility (Carl Bug 4 sub-b)

Promoted console.debug → console.info on the on('exit') handler,
panic-loop-detect path, restart timer, and adoptInheritedCore PID
adoption. Carl couldn't tell if supervisor was RUNNING but silent or
DEAD — silent-success-is-failure rule applied to supervisors.

Added an explicit "Spawning continuum-core-server now (restart attempt
N)" line at the actual respawn point so the gap between "Restarting
in Xms" and the new process appearing is filled in.

## Fix 3 — Linux pgrep -x silently misses the binary

pgrep -x continuum-core-server checks /proc/PID/comm, which is
truncated to 15 chars on Linux (TASK_COMM_LEN is 16, including the
trailing NUL). The binary name is 21 chars → -x silently never matches
on Linux even when running. macOS pgrep doesn't have this limit, but
pgrep -f works on both. Without this the adopted-core PID watcher
silently never installs on Linux/WSL → supervisor blind to
inherited-core death.

Cross-check via `ps -o pid=,comm=` to filter pgrep -f's broader
matches down to the actual continuum-core-server PID.
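One way the cross-check could look, as a sketch under stated assumptions rather than the code in this PR. `parsePsRows`, `isCoreServer`, and `filterCandidates` are hypothetical names. The matching rule is an assumption: it accepts either the full binary name or its 15-char truncation, after stripping any directory prefix, since ps may report a truncated name on Linux and a full path on macOS.

```typescript
interface ProcRow {
  pid: number;
  comm: string;
}

// Parses `ps -o pid=,comm=` style output, one "  PID COMMAND" per line.
function parsePsRows(psOutput: string): ProcRow[] {
  const rows: ProcRow[] = [];
  for (const line of psOutput.split("\n")) {
    const m = /^\s*(\d+)\s+(.+?)\s*$/.exec(line);
    if (m) rows.push({ pid: Number(m[1]), comm: m[2] });
  }
  return rows;
}

// Assumed rule: compare the basename against the full name and against
// its 15-char truncated form.
function isCoreServer(comm: string, binary: string): boolean {
  const base = comm.slice(comm.lastIndexOf("/") + 1);
  return base === binary || base === binary.slice(0, 15);
}

// Narrows the PIDs reported by a broad pgrep -f style match down to
// those whose ps command name really is the binary.
function filterCandidates(
  candidatePids: number[],
  psOutput: string,
  binary: string
): number[] {
  const commByPid = new Map<number, string>();
  for (const row of parsePsRows(psOutput)) commByPid.set(row.pid, row.comm);
  return candidatePids.filter((pid) => {
    const comm = commByPid.get(pid);
    return comm !== undefined && isCoreServer(comm, binary);
  });
}
```

Accepting the truncated form trades a small risk of matching another binary with the same 15-char prefix for working on both platforms.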

## Fix 4 — git-precommit.sh worktree-path bug

Discovered live while committing this PR from /tmp/continuum-mac
(git worktree). The hook's `BASELINE_FILE="$(git rev-parse
--show-toplevel)/src/eslint-baseline.txt"` returned an incorrect
double-`src` path (`/repo/src/src/eslint-baseline.txt`) because the
hook does `cd src` (line 5+52) before this line, and `git rev-parse
--show-toplevel` from `<worktree>/src` returned `<worktree>/src`
rather than `<worktree>`. The "missing baseline" path then fell
through to the strict per-file gate which fails on pre-existing lint
violations.

Fix: use a deterministic script-relative path. The hook always lives
at `<src>/scripts/git-precommit.sh`, so the baseline is
`$(dirname "$HOOK_SCRIPT_DIR")/eslint-baseline.txt`; no git resolution
is needed.
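The difference between the two path computations can be modelled without git. The hook itself is a shell script; this TypeScript sketch only mirrors its path arithmetic, and `dirname`, `baselineFromToplevel`, and `baselineFromScript` are hypothetical helpers.

```typescript
// Minimal POSIX-style dirname for absolute paths without trailing slashes.
function dirname(p: string): string {
  const i = p.lastIndexOf("/");
  return i <= 0 ? "/" : p.slice(0, i);
}

// Old approach: the result depends on whatever directory
// `git rev-parse --show-toplevel` reported at that point in the hook.
function baselineFromToplevel(toplevel: string): string {
  return `${toplevel}/src/eslint-baseline.txt`;
}

// New approach: the result depends only on where the hook script lives.
function baselineFromScript(scriptPath: string): string {
  const hookScriptDir = dirname(scriptPath); // <src>/scripts
  return `${dirname(hookScriptDir)}/eslint-baseline.txt`; // <src>/eslint-baseline.txt
}

// When the reported toplevel already ends in /src, the old approach doubles it.
const broken = baselineFromToplevel("/repo/src");
// "/repo/src/src/eslint-baseline.txt"
const fixed = baselineFromScript("/repo/src/scripts/git-precommit.sh");
// "/repo/src/eslint-baseline.txt"
```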

## Test

- npm run build:ts: clean (verified in worktree)
- Local logic verified by reading the connect/reconnect state machine
- Hook fix verified: this commit IS made through the fixed hook (Tier 2
  baseline check now finds the file)
- Live-validate of supervisor changes post-merge: kill continuum-core,
  expect supervisor to log "exited:" + "Spawning…" + new PID within
  ADOPTED_CORE_POLL_MS, IPC pool to log "Reconnecting (attempt N)"
  with N actually incrementing

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@joelteply joelteply merged commit 7b7fb1a into canary May 2, 2026
3 checks passed
@joelteply joelteply deleted the mac/980-bug4-supervisor-visibility branch May 2, 2026 02:04