Observed
After fixes #918 and #920 landed, both workers come up through preflight but crash on the first iteration. The crash is identical for both repos: the first set_status call (to update the idle 'All done — no issues to fetch' status emoji) launches claude, claude returns an empty result in ~0.36s, recovery respawns a new session, the next prompt also fails, and the worker thread raises RuntimeError("Claude session died during prompt") which kills the thread.
17:50:52 INFO [home] picker[cache]: no eligible issue for FidoCanCode in FidoCanCode/home
17:50:52 INFO [home] set_status: nudging session for status + emoji
17:50:52 INFO [home] session.prompt: queuing behind none holder (tid=..., model=claude-opus-4-6)
17:50:52 INFO [home] session.prompt: lock acquired (tid=..., waited=0.00s)
17:50:52 INFO [home] ClaudeSession: recovering session (model=claude-opus-4-6)
Exception ignored while finalizing file <_io.TextIOWrapper name=15 encoding='utf-8'>:
Traceback (most recent call last):
File "/workspace/src/fido/claude.py", line 686, in _respawn
self._proc = self._spawn()
BrokenPipeError: [Errno 32] Broken pipe
17:50:52 INFO [home] ClaudeSession: respawn complete, new pid 137
17:50:52 WARNING [home] ClaudeClient: recovered session after prompt failure: [Errno 32] Broken pipe
17:50:52 INFO [home] session.prompt: turn complete (tid=..., total=0.36s, result_len=0, cancelled=False)
17:50:53 ERROR [home] WorkerThread worker-home: unexpected error
...
File "/workspace/src/fido/worker.py", line 1167, in set_status
raw = self._provider_agent.run_turn(...)
File "/workspace/src/fido/claude.py", line 1503, in run_turn
result = self._prompt_with_recovery(session, ...)
File "/workspace/src/fido/session_agent.py", line 276, in _prompt_with_recovery
raise RuntimeError(self._dead_prompt_error_message())
RuntimeError: Claude session died during prompt
Signals
- BrokenPipeError during _respawn finalizer: the prior session's stdout/stdin pipes are closed when the replacement is spun up.
- The first (pre-recovery) prompt also had this issue: set_status doesn't see a healthy session on cold boot.
- result_len=0, cancelled=False means claude exited cleanly with empty output.
- claude-code quota looks healthy: claude-code 7% (five hour, resets 2026-04-24 21:50 UTC).
- preflight: all required tools found: git, gh, claude, copilot. The binaries are there.
Possible causes
- claude subprocess warmup: the first invocation is bailing out before the stream-json handshake completes (empty result_len, sub-second duration). Could be a container-side config/auth gap (no ~/.claude.json visible inside the container, even though the host mount is present).
- session_agent._prompt_with_recovery over-eager: raising on the first empty result instead of retrying the turn. The recovery path respawns and runs the same prompt, also gets an empty result, and declares the session dead. Arguably that is the correct behavior: if claude legitimately can't answer, the worker shouldn't loop. The empty result on startup is the real bug; recovery is just reporting it.
- Model selection: model=claude-opus-4-6 is used for the status voice. If that model is having intermittent capacity issues, recovery can't help because respawn picks the same model. (Not a kennel→fido bug but would interact with it.)
- Stale ~/.claude.json inside the container: the bind-mount might be showing a shape the in-container claude doesn't like after recent CLI updates.
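To make the recovery trade-off above concrete, here is a minimal sketch of a bounded respawn-and-retry loop. It is not fido's actual _prompt_with_recovery: the function name, the SessionDeadError class, and the send_prompt/respawn callables are all hypothetical stand-ins. It only illustrates why a persistent empty result should end in a raise rather than an endless loop.

```python
from typing import Callable


class SessionDeadError(RuntimeError):
    """Hypothetical error type: every attempt came back empty."""


def prompt_with_bounded_recovery(
    send_prompt: Callable[[str], str],
    respawn: Callable[[], None],
    prompt: str,
    max_respawns: int = 1,
) -> str:
    """Send a prompt; on an empty result, respawn and retry a bounded number of times.

    send_prompt and respawn are assumed callables standing in for the
    real session object; they are not fido APIs.
    """
    respawns = 0
    while True:
        result = send_prompt(prompt)
        if result:
            return result
        if respawns >= max_respawns:
            # Give up instead of looping: a persistent empty result points
            # at a problem outside the coordination code.
            raise SessionDeadError(
                f"empty result after {respawns + 1} attempt(s); giving up"
            )
        respawn()
        respawns += 1
```

With max_respawns=1 this mirrors the sequence in the log: one failed prompt, one respawn, one more failed prompt, then the error. Raising max_respawns would only delay the same failure if the empty result is caused by the container setup.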
Repro
Both workers crash within ~10 seconds of listening. Watchdog respawns them, they crash again.
Proposed next step
I do not have a clean fix for this yet — the empty-result behavior of claude suggests the problem is outside fido's coordination code. Next steps I'd try:
- Reproduce with claude -p 'hello' directly inside the fido container (docker exec fido claude -p 'hi') to see whether it's a claude-in-container auth/setup problem rather than a fido session-agent problem.
- If that works, strace the first stream-json turn to see whether the handshake actually goes through.
- If the claude CLI itself is broken inside the container, this is upstream of fido; raise separately.
Filing to capture the state; need a manual bisect session to actually fix.
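For the first step, a small probe script could capture the same signals the log shows (duration and output length) plus the exit code and stderr, which the log does not surface. This is a rough sketch: probe and ProbeResult are names made up here, and the docker command in the trailing comment is the one from the list above.

```python
import subprocess
import time
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ProbeResult:
    returncode: int
    seconds: float
    stdout_len: int
    stderr_tail: str


def probe(cmd: Sequence[str], timeout: float = 60.0) -> ProbeResult:
    """Run cmd once and record exit code, wall time, output size, and stderr tail."""
    start = time.monotonic()
    done = subprocess.run(list(cmd), capture_output=True, text=True, timeout=timeout)
    elapsed = time.monotonic() - start
    return ProbeResult(
        returncode=done.returncode,
        seconds=elapsed,
        stdout_len=len(done.stdout.strip()),
        stderr_tail=done.stderr[-500:],  # last 500 chars are usually enough
    )


# Example invocation on the host (requires docker and the running container):
#   print(probe(["docker", "exec", "fido", "claude", "-p", "hi"]))
```

If the probe shows stdout_len of 0 with a sub-second duration, the stderr_tail and returncode should help tell an auth/config failure apart from a clean but empty exit.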
Side-note
Before this crash, I hit a separate git-identity invariant failure because the home workspace had no local user.name / user.email in .git/config and was relying on host global config (not visible inside the container). Worked around by git config user.name/user.email in the workspace. Possibly worth another issue to teach fido to self-heal that or fail with a more actionable error.
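A sketch of what the "more actionable error" could look like, assuming a check that runs before the worker starts. The function names and the injectable lookup parameter are hypothetical and not part of fido; only git config --local --get is a real git invocation.

```python
import subprocess
from typing import Callable, List

REQUIRED_KEYS = ("user.name", "user.email")


def _git_local_value(workspace: str, key: str) -> str:
    """Return the repo-local value of key, or '' when it is unset."""
    done = subprocess.run(
        ["git", "-C", workspace, "config", "--local", "--get", key],
        capture_output=True,
        text=True,
    )
    # git exits non-zero when the key is not set in the local config.
    return done.stdout.strip() if done.returncode == 0 else ""


def missing_identity_keys(
    workspace: str, lookup: Callable[[str, str], str] = _git_local_value
) -> List[str]:
    """List the identity keys that have no local value in the workspace."""
    return [key for key in REQUIRED_KEYS if not lookup(workspace, key)]


def require_git_identity(
    workspace: str, lookup: Callable[[str, str], str] = _git_local_value
) -> None:
    """Raise with the exact commands to run when the local identity is incomplete."""
    missing = missing_identity_keys(workspace, lookup)
    if missing:
        hints = "; ".join(f"git -C {workspace} config {key} <value>" for key in missing)
        raise RuntimeError(
            f"workspace {workspace} has no local {', '.join(missing)}; run: {hints}"
        )
```

Checking only the local scope is deliberate here: the host's global config is what went missing inside the container, so a global fallback would hide the problem on the host and still fail in the container.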