Workers crash on first iteration: claude session dies during set_status prompt

## Observed

After fixes #918 and #920 landed, both workers come up through preflight but crash on the first iteration. The crash is identical for both repos: the first `set_status` call (to update the idle 'All done — no issues to fetch' status emoji) launches claude, claude returns an empty result in ~0.36s, recovery respawns a new session, the next prompt also fails, and the worker thread raises `RuntimeError("Claude session died during prompt")` which kills the thread.

```
17:50:52 INFO  [home] picker[cache]: no eligible issue for FidoCanCode in FidoCanCode/home
17:50:52 INFO  [home] set_status: nudging session for status + emoji
17:50:52 INFO  [home] session.prompt: queuing behind none holder (tid=..., model=claude-opus-4-6)
17:50:52 INFO  [home] session.prompt: lock acquired (tid=..., waited=0.00s)
17:50:52 INFO  [home] ClaudeSession: recovering session (model=claude-opus-4-6)
Exception ignored while finalizing file <_io.TextIOWrapper name=15 encoding='utf-8'>:
Traceback (most recent call last):
  File "/workspace/src/fido/claude.py", line 686, in _respawn
    self._proc = self._spawn()
BrokenPipeError: [Errno 32] Broken pipe
17:50:52 INFO  [home] ClaudeSession: respawn complete, new pid 137
17:50:52 WARNING [home] ClaudeClient: recovered session after prompt failure: [Errno 32] Broken pipe
17:50:52 INFO  [home] session.prompt: turn complete (tid=..., total=0.36s, result_len=0, cancelled=False)
17:50:53 ERROR [home] WorkerThread worker-home: unexpected error
  ...
  File "/workspace/src/fido/worker.py", line 1167, in set_status
    raw = self._provider_agent.run_turn(...)
  File "/workspace/src/fido/claude.py", line 1503, in run_turn
    result = self._prompt_with_recovery(session, ...)
  File "/workspace/src/fido/session_agent.py", line 276, in _prompt_with_recovery
    raise RuntimeError(self._dead_prompt_error_message())
RuntimeError: Claude session died during prompt
```
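As a reading aid, the sequence in the log can be modeled with a small sketch. This is not fido's actual `_prompt_with_recovery`: the names `Session`, `spawn`, and `max_respawns` are hypothetical, and treating an empty result the same as a dead session is an assumption inferred from the log, not confirmed from the source.

```python
from typing import Callable, Protocol


class Session(Protocol):
    """Hypothetical stand-in for a claude session object."""

    def prompt(self, text: str) -> str: ...


def prompt_with_recovery(
    spawn: Callable[[], Session], text: str, max_respawns: int = 1
) -> str:
    """Try a prompt, respawn on failure, and raise once respawns run out."""
    session = spawn()
    for attempt in range(max_respawns + 1):
        try:
            result = session.prompt(text)
        except BrokenPipeError:
            # Assumption: a broken pipe is treated like an empty result.
            result = ""
        if result:
            return result
        if attempt < max_respawns:
            # Recovery: replace the session and retry the same prompt.
            session = spawn()
    raise RuntimeError("Claude session died during prompt")
```

With a session that always returns an empty string, the sketch respawns once and then raises, which matches the shape of the traceback above.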

## Signals

- `BrokenPipeError` during `_respawn` finalizer — the prior session's stdout/stdin pipes are closed when the replacement is spun up.
- The first (pre-recovery) prompt also failed with `[Errno 32] Broken pipe`, so `set_status` never sees a healthy session on cold boot.
- `result_len=0, cancelled=False` means the turn completed without cancellation and with empty output, which suggests claude exited without producing a result.
- claude-code quota looks healthy: `claude-code 7% (five hour, resets 2026-04-24 21:50 UTC)`.
- `preflight: all required tools found: git, gh, claude, copilot` — binaries are there.
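The first and third signals can co-occur whenever the child process has already exited by the time the parent talks to it. The sketch below shows that on a POSIX system, using a Python child that exits immediately as a stand-in for the claude subprocess. The function name and the `ping` payload are placeholders, and this is not a claim about how fido spawns claude.

```python
import subprocess
import sys


def probe_dead_child() -> dict:
    """Read from and write to a child process that has already exited."""
    # Stand-in for the claude subprocess: a child that exits immediately.
    proc = subprocess.Popen(
        [sys.executable, "-c", "pass"],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    proc.wait()  # the child is gone, so its ends of both pipes are closed

    output = proc.stdout.read()  # immediate EOF, so this is an empty string
    proc.stdout.close()

    error = None
    try:
        proc.stdin.write("ping\n")  # placeholder payload, buffered in Python
        proc.stdin.flush()  # the write reaches the OS here and fails
    except BrokenPipeError as exc:
        error = exc.errno
    finally:
        try:
            proc.stdin.close()  # close() flushes again and may re-raise
        except BrokenPipeError:
            pass

    return {
        "returncode": proc.returncode,
        "result_len": len(output),
        "errno": error,
    }
```

If buffered data were left unflushed and the wrapper were simply dropped, Python would report the failure as "Exception ignored while finalizing file", which is consistent with the message in the log, though that does not prove it is the same code path.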

## Possible causes

1. **`claude` subprocess warmup**: the first invocation bails out before the stream-json handshake completes (`result_len=0`, sub-second duration). This could be a container-side config/auth gap, for example no `~/.claude.json` visible inside the container even though the host mount is present.
2. **`session_agent._prompt_with_recovery` over-eager**: it raises on the first empty result instead of retrying the turn. The recovery path respawns, runs the same prompt, gets another empty result, and declares the session dead. This is arguably the correct behavior: if claude legitimately can't answer, the worker shouldn't loop. The empty result on startup is the real bug; recovery is just reporting it.
3. **Model selection**: `model=claude-opus-4-6` is used for the status voice. If that model is having intermittent capacity issues, recovery can't help because the respawn picks the same model. (Not a kennel→fido bug, but it would interact with it.)
4. **Stale `~/.claude.json` inside container**: the bind mount might be exposing a shape that the in-container claude doesn't accept after recent CLI updates.
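Causes 1 and 4 both hinge on what the container actually sees at `~/.claude.json`. A minimal check, meant to be run inside the container, might look like the sketch below. The function name is hypothetical, and it only tests presence, readability, and JSON validity; what the claude CLI requires of the file's contents is not known here, and the top-level-object check is an assumption.

```python
import json
from pathlib import Path


def check_claude_config(path: Path) -> str:
    """Report whether a config file is present, readable, and valid JSON."""
    if not path.exists():
        return "missing"
    try:
        text = path.read_text(encoding="utf-8")
    except OSError as exc:
        return f"unreadable: {exc.strerror}"
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return f"invalid JSON at line {exc.lineno}"
    # Assumption: the CLI expects a JSON object at the top level.
    if not isinstance(data, dict):
        return "unexpected shape: top level is not an object"
    return f"ok: {len(data)} top-level keys"


if __name__ == "__main__":
    print(check_claude_config(Path.home() / ".claude.json"))
```

A result of `missing` or `unreadable` would point at the mount (cause 1), while `invalid JSON` or an unexpected shape would point at the file contents (cause 4).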

## Repro

```
./fido down
./fido up
```

Both workers crash within ~10 seconds of listening. Watchdog respawns them, they crash again.

## Proposed next step

I do not have a clean fix for this yet — the empty-result behavior of `claude` suggests the problem is outside fido's coordination code. Next steps I'd try:

- Run `claude -p 'hi'` directly inside the `fido` container (`docker exec fido claude -p 'hi'`) to see whether this is a claude-in-container auth/setup problem rather than a fido session-agent problem.
- If that works, strace the first `stream-json` turn to see whether the handshake actually goes through.
- If the claude CLI itself is broken inside the container, this is upstream of fido — raise separately.
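The first step above can be scripted so the result is easy to paste back into this issue. The sketch assumes the container is named `fido` and that `docker` is on the host's PATH; `run_probe` and its return keys are hypothetical names, and the 60-second timeout is arbitrary.

```python
import subprocess
from typing import Sequence

# The command from the first bullet above.
DEFAULT_CMD = ("docker", "exec", "fido", "claude", "-p", "hi")


def run_probe(cmd: Sequence[str] = DEFAULT_CMD, timeout: float = 60.0) -> dict:
    """Run a one-shot command and summarize exit code, output size, and stderr."""
    try:
        done = subprocess.run(
            list(cmd),
            stdin=subprocess.DEVNULL,  # never block waiting for input
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except FileNotFoundError:
        return {"status": "not found", "detail": cmd[0]}
    except subprocess.TimeoutExpired:
        return {"status": "timeout", "detail": f"no exit within {timeout}s"}
    # Treat a zero exit code with non-empty stdout as a healthy answer.
    healthy = done.returncode == 0 and bool(done.stdout.strip())
    return {
        "status": "ok" if healthy else "failed",
        "returncode": done.returncode,
        "stdout_len": len(done.stdout),
        "stderr": done.stderr.strip(),
    }
```

A `failed` status with `stdout_len` of 0 would reproduce the empty result outside fido's session code, which would support raising it upstream.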

Filing to capture the state; need a manual bisect session to actually fix.

## Side-note

Before this crash, I hit a separate git-identity invariant failure because the `home` workspace had no local `user.name` / `user.email` in `.git/config` and was relying on host global config (not visible inside the container). Worked around by `git config user.name/user.email` in the workspace. Possibly worth another issue to teach fido to self-heal that or fail with a more actionable error.
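One shape the "more actionable error" could take is sketched below. It is not fido's existing invariant check; both function names are hypothetical. It relies only on `git config --local --get`, which exits non-zero when the key is not set in the repository's own config.

```python
import subprocess
from pathlib import Path


def missing_local_identity(workspace: Path) -> list[str]:
    """Return the identity keys that are not set in the workspace's .git/config."""
    missing = []
    for key in ("user.name", "user.email"):
        result = subprocess.run(
            ["git", "-C", str(workspace), "config", "--local", "--get", key],
            capture_output=True,
            text=True,
        )
        # A non-zero exit means the key is unset locally (or git itself failed).
        if result.returncode != 0 or not result.stdout.strip():
            missing.append(key)
    return missing


def require_local_identity(workspace: Path) -> None:
    """Raise with the exact commands needed to fix a missing identity."""
    missing = missing_local_identity(workspace)
    if missing:
        hints = " && ".join(
            f"git -C {workspace} config {key} <value>" for key in missing
        )
        raise RuntimeError(
            f"{workspace}: no local {', '.join(missing)} in .git/config; "
            f"set with: {hints}"
        )
```

Checking with `--local` instead of the merged config is deliberate: it gives the same answer on the host and inside the container, regardless of which global config is visible.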
