Skip to content

Sweep: empty-provider-output fail-closed raises that still kill their callers (follow-up to #935) #938

@FidoCanCode

Description

@FidoCanCode

Observed

After merging #937 (which fixed the _notify_thread_change call site), the same crash pattern surfaced from a different call site:

21:49:34 CRITICAL [-] uncaught exception in thread reorder-home
  File "/workspace/src/fido/tasks.py", line 578, in reorder_tasks
    _on_done()
  File "/workspace/src/fido/events.py", line 1534, in on_done
    rewrite_fn(...)
  File "/workspace/src/fido/events.py", line 1481, in _rewrite_pr_description
    _write_pr_description(...)
  File "/workspace/src/fido/worker.py", line 866, in _write_pr_description
    raise ValueError(
      "_write_pr_description: provider returned no <body> content for PR #934 (raw='')"
    )

Same root cause: the voice turn was preempted by an incoming webhook (result_len=0, cancelled=True), caller couldn't distinguish that from a real empty-Opus-reply, and raise ValueError killed the daemon thread.

What to audit

Every raise ValueError / RuntimeError inside an agent call path whose caller runs on a thread without surrounding try/except. Known instances so far:

Probable cousins (worth checking):

  • _rewrite_pr_description (events.py ~1481)
  • PR-body composers that call agent.run_turn
  • Reply generators (generate_ACT_reply, generate_DO_reply, etc.) if any raise on empty
  • The generate_persona_status / generate_persona_emoji pair in gh_status.py (already raises ValueError in the code we read earlier — same shape, probably same bug; falls back to _FALLBACK_MESSAGES via the caller's except Exception, but that's hidden behind a broad catch)

Fix shape

Same as #937:

  1. Pass retry_on_preempt=True to run_turn at each site.
  2. On genuine empty-after-retries, fall back to a plain-text default (or skip + log, depending on call-site semantics — PR-description rewrite can legitimately no-op if the rewrite would have been a no-op anyway).
  3. Update the corresponding raise ValueError sites to log + fallback, not raise, since the callers run on daemon threads without a top-level handler.

Alternative design consideration: a module-level safe_voice_turn helper that wraps run_turn with retry_on_preempt=True + empty-result → fallback + log. Every call site that currently raises on empty gets switched to use this helper. One place to fix, one place to audit.

Related

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions