
Drain ClaudeSession stderr into the logger (progress on #921) #922

Merged
FidoCanCode merged 1 commit into main from fido/drain-claude-stderr
Apr 24, 2026

Conversation

@FidoCanCode
Owner

Progress on #921.

`ClaudeSession._spawn` sets `stderr=subprocess.PIPE` but nothing ever reads it. Once the pipe buffer fills, the claude subprocess blocks on its next stderr write and eventually dies silently. Symptom: on boot, the first `set_status` prompt finds the session dead with no diagnostic, and both workers crash-loop on "Claude session died during prompt".

This PR starts a daemon reader thread from `_spawn` that reads stderr line by line and forwards each line through the repo's logger at INFO with a `ClaudeSession[pid=N] stderr:` prefix. The thread exits on stderr EOF (subprocess gone) or on `OSError`/`ValueError` (pipe closed), so no manual cleanup is needed.
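A minimal sketch of the drain pattern described above, not the code from this PR. The logger name, the `spawn_with_drain` wrapper, and the thread name are assumptions made for illustration; only the prefix format, the INFO level, the daemon thread, and the exit conditions come from the description.

```python
import logging
import subprocess
import threading

# Hypothetical logger name; the real module uses the repo's own logger.
log = logging.getLogger("fido.claude_session")


def _drain_stderr(proc: subprocess.Popen) -> None:
    """Forward each stderr line to the logger until EOF or the pipe closes."""
    prefix = f"ClaudeSession[pid={proc.pid}] stderr: "
    try:
        # Iterating the pipe yields one line at a time and stops at EOF,
        # which happens once the subprocess has exited.
        for line in proc.stderr:
            log.info("%s%s", prefix, line.rstrip("\n"))
    except (OSError, ValueError):
        # The pipe was closed underneath the reader; nothing left to drain.
        pass


def spawn_with_drain(argv: list[str]) -> tuple[subprocess.Popen, threading.Thread]:
    """Start a subprocess and a daemon thread that keeps its stderr pipe empty."""
    proc = subprocess.Popen(
        argv,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    reader = threading.Thread(
        target=_drain_stderr,
        args=(proc,),
        daemon=True,  # never blocks interpreter shutdown
        name=f"claude-stderr-{proc.pid}",  # hypothetical thread name
    )
    reader.start()
    return proc, reader
```

Because the reader consumes stderr continuously, the child can no longer stall on a full pipe buffer, and anything it prints ends up in the log with the subprocess pid attached.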

This is primarily an instrumentation fix: claude's own complaint will now land in `~/log/fido.log` instead of being swallowed. It also removes the pipe-buffer-fills-and-claude-blocks failure mode as a side effect, which may be the root cause of #921 on its own.

The shared-state-contention theory (host claude-code + container claude writing to the same `~/.claude.json`) was ruled out by a manual test: the user killed the host claude-code session, started fido from a plain shell, and got the identical crash.

`./fido ci` green.

@FidoCanCode FidoCanCode requested a review from rhencke April 24, 2026 18:10
@FidoCanCode FidoCanCode merged commit f7d43b1 into main Apr 24, 2026
1 check passed
@FidoCanCode FidoCanCode deleted the fido/drain-claude-stderr branch April 24, 2026 18:11
FidoCanCode added a commit that referenced this pull request Apr 24, 2026
Fixes #921.

Three separate modules (`gh_status.py`, `status.py`, `worker.py`) still
had their own pre-rename copies of the filesystem walk to `sub/`. Only
`config.py` was updated when the package moved from `kennel/` to
`src/fido/` (#920). The other three kept pointing at the non-existent
`/workspace/src/sub/`, which is why ClaudeSession launched claude with
`--system-prompt-file /workspace/src/sub/persona.md` and claude exited
silently with:

```
ClaudeSession[pid=54] stderr: Error: System prompt file not found: /workspace/src/sub/persona.md
```

That error was hidden behind an unread stderr pipe until #922 landed the
stderr drain and made it visible. This PR fixes the actual crash by
moving all four callers onto a single `fido.config.default_sub_dir()`
helper — inline `parents[N] / "sub"` is now a review-time smell.
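As a rough illustration of why a single helper is safer than per-module `parents[N] / "sub"` walks, here is one way such a helper could be written. This is a sketch under assumptions, not the body of `fido.config.default_sub_dir()`: `find_sub_dir` is a hypothetical name, and it searches upward for `sub/` instead of using a fixed `parents[N]` index, which the real helper may well do.

```python
from pathlib import Path


def find_sub_dir(start: Path) -> Path:
    """Return the nearest ancestor's sub/ directory, or raise if none exists."""
    # Hypothetical helper: walk up from `start` so a package move does not
    # silently change which directory is picked.
    for parent in start.resolve().parents:
        candidate = parent / "sub"
        if candidate.is_dir():
            return candidate
    # Fail loudly here instead of handing a non-existent path to claude.
    raise FileNotFoundError(f"no sub/ directory found above {start}")


def default_sub_dir() -> Path:
    # Assumed body; the real helper may use a fixed parents[N] index instead.
    return find_sub_dir(Path(__file__))
```

With every caller going through one function, a future package move needs one change, and a wrong path raises at lookup time instead of surfacing later as a missing `--system-prompt-file`.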

`./fido ci` green.

Co-authored-by: Fido Can Code <190991155+FidoCanCode@users.noreply.github.com>
