Skip to content

Fix shellper processes dying when Tower restarts (#324)#340

Merged
waleedkadous merged 5 commits into
mainfrom
builder/bugfix-324-shellper-processes-do-not-surv
Feb 16, 2026
Merged

Fix shellper processes dying when Tower restarts (#324)#340
waleedkadous merged 5 commits into
mainfrom
builder/bugfix-324-shellper-processes-do-not-surv

Conversation

@waleedkadous
Copy link
Copy Markdown
Contributor

Summary

  • Redirect shellper stderr to a log file instead of a pipe, eliminating the parent→child pipe dependency that caused shellper crashes on Tower exit
  • Add defensive error handlers on process.stdout/process.stderr in shellper-main to swallow EPIPE/EBADF errors
  • Add regression test verifying shellper survives parent exit with both fix approaches

Root Cause

Shellper was spawned with stdio: ['ignore', 'pipe', 'pipe']. The stderr pipe connected shellper fd 2 to Tower. When Tower exited via process.exit(0), the pipe broke. On macOS, process.stderr connected to a pipe uses async writes — the resulting EPIPE errors surfaced as unhandled 'error' events on the stream, crashing the shellper process.

Changes

session-manager.ts:

  • Changed shellper spawn to use a file FD for stderr (socketPath.replace('.sock', '.log')) instead of 'pipe'
  • Added cleanup of .log companion files in unlinkSocketIfExists and cleanupStaleSockets
  • Added explicit exit log message (previously bundled inside logStderrTail)

shellper-main.ts:

  • Added stream.on('error', () => {}) handlers on process.stdout and process.stderr at startup as a defensive measure

shellper-survive-parent-exit.test.ts (new):

  • Test 1: Spawns shellper with file-based stderr, verifies it stays alive and writes to log file
  • Test 2: Spawns shellper with old pipe-based stderr, destroys the pipe, verifies shellper survives thanks to error handlers

session-manager.test.ts:

  • Updated 3 stderr tail logging tests to reflect new behavior (stderr goes to file, not pipe)

Test plan

  • All 211 terminal tests pass (0 failures)
  • Regression test: shellper with file stderr survives parent exit
  • Regression test: shellper with pipe stderr survives broken pipe (error handler)
  • TypeScript compiles cleanly
  • Manual verification: restart Tower and confirm shellper processes survive

Closes #324

🤖 Generated with Claude Code

Root cause: shellper stderr was piped to Tower. When Tower exited,
the pipe broke and async EPIPE errors on process.stderr (which has
no error handler by default) crashed the shellper.

Two-part fix:
1. session-manager: redirect shellper stderr to a log file (not a
   pipe), eliminating the parent→child pipe dependency
2. shellper-main: add defensive error handlers on process.stdout
   and process.stderr to swallow EPIPE/EBADF

Also adds regression test that verifies shellper survives with both
the file-based stderr approach and the error handler approach.
Force a post-break stderr write by connecting/disconnecting from the
shellper socket after destroying the stderr pipe. This ensures the
error handler is actually exercised.

Addresses Codex review feedback on PR #340.
@waleedkadous waleedkadous merged commit 464d890 into main Feb 16, 2026
7 checks passed
@waleedkadous waleedkadous deleted the builder/bugfix-324-shellper-processes-do-not-surv branch February 16, 2026 16:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Shellper processes do not survive Tower restart despite detached:true

1 participant