Symptom
A reviewer comment receives no fido reply at all: no triage line, no "error processing action" log, no comment posted. The webhook arrived and was triaged through needs_more_context (result: NO), but the next step, the opus triage classifier, never logs a result, and the comment sits unanswered.
Concrete repro on FidoCanCode/home PR #519:
00:51:18Z rhencke comment 3083330278: "Why protocol over ABC?"
00:51:23 webhook pull_request_review_comment delivered
00:51:23 replying to 1 review comments
00:51:23 comment 3083330278 locked by another process — skipping ← review path skips
00:51:24 fetched 1 comment(s) in thread for context ← comment path proceeds
00:51:24 session.prompt: preempt requested (model=haiku-4-5) ← needs_more_context
00:51:26 session: worker ceding lock, preempter acquired after 0.000s
00:51:35 claude result: NO ← needs_more_context = NO
00:51:35 session.prompt: preempt requested (model=opus-4-6) ← triage classifier
(silence — no "triage: ..." log, no reply, no error)
00:52:14 home worker resumes its own task work
By 00:52:14 the worker is back doing tool calls for the migration task it was on, so home's session was clearly released. But the webhook thread between 00:51:35 and 00:52:14 either:
ran the opus call and got an empty/unparseable result that silently went nowhere, or
got starved by the worker re-acquiring the lock, never completed its session.prompt, and the thread died/leaked silently, or
raised an exception that was swallowed somewhere outside _process_action's known catch block (no "error processing action" line was logged).
5+ minutes elapsed, no reply ever posted.
Likely candidates (need instrumentation to confirm)
Triage classifier returned empty / unparseable category: _triage parses the Opus output for ACT/ASK/ANSWER/DO/DEFER. If parsing fails, the function may silently return ("", []) or similar, and the rest of reply_to_comment would then fall through without posting. No log, no error.
session.prompt yield-starvation regression: even with _preempt_pending from #517 ("Strip tools from webhook reply-gen + fair preempt yield + more color"), if the worker re-acquires the lock fast enough between the webhook's preempt request and the webhook's actual lock acquire, the webhook can wait a long time. Combined with the 50ms poll loop, races are plausible.
Silent exception in a sub-call: _summarize_as_action_item or fetch_comment_thread raised something that bubbled out of reply_to_comment and ended up in the bg-thread excepthook. That hook logs CRITICAL, though, and there is no CRITICAL line in this window either.
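To make the third candidate easier to rule in or out, here is a minimal sketch of a thread excepthook that logs CRITICAL, next to a handler that swallows its own exception. The logger name and both function names are hypothetical, and kennel's real hook may be wired differently. It illustrates that an exception caught and dropped inside the thread never reaches the hook at all, which would match a window with no CRITICAL line.

```python
import logging
import threading

log = logging.getLogger("kennel.sketch")  # hypothetical logger name


def install_thread_excepthook(logger):
    """Send uncaught exceptions from any thread to a CRITICAL log line."""

    def hook(args):
        # args carries exc_type, exc_value, exc_traceback and thread.
        name = args.thread.name if args.thread is not None else "<unknown>"
        logger.critical(
            "uncaught exception in thread %s",
            name,
            exc_info=(args.exc_type, args.exc_value, args.exc_traceback),
        )

    threading.excepthook = hook


def swallowing_handler():
    """An exception caught and dropped like this never reaches the hook."""
    try:
        raise RuntimeError("lost")
    except Exception:
        pass  # silent: no log line, no CRITICAL
```

If the webhook thread dies with an uncaught exception, the hook fires once per thread. A broad `except` anywhere on the path produces nothing, so the absence of a CRITICAL line does not by itself exclude this candidate.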
Fix direction
Step 1: instrument. Add log lines bracketing every _print_prompt call inside reply_to_comment ("triage classifier: requesting" / "triage classifier: returned " or empty), and a tail reply_to_comment: returning category=… body_len=… so we can pinpoint where the path drops the ball.
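The bracketing described in Step 1 could look roughly like the sketch below. `logged_call` and the logger name are hypothetical helpers, not existing kennel functions, and the sketch treats any falsy return value as "empty". It logs before the call, after the call, and on any exception, so a gap between "requesting" and "returned" points at the call itself.

```python
import logging
import time

log = logging.getLogger("kennel.sketch")  # hypothetical logger name


def logged_call(step, fn, *args, **kwargs):
    """Run fn and log a line before it, after it, and if it raises."""
    log.info("%s: requesting", step)
    start = time.monotonic()
    try:
        result = fn(*args, **kwargs)
    except BaseException:
        # Log with traceback, then re-raise so behavior is unchanged.
        log.exception("%s: raised after %.2fs", step, time.monotonic() - start)
        raise
    # Assumption: a falsy result ("" or None or an empty list) counts as empty.
    shown = repr(result) if result else "<empty>"
    log.info("%s: returned %s after %.2fs", step, shown, time.monotonic() - start)
    return result
```

Usage would be along the lines of `category = logged_call("triage classifier", classify, prompt)`, where `classify` stands in for whatever callable actually issues the prompt.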
Step 2: depending on what the instrumentation shows, either harden _triage's category-parse to fail loud, or re-audit session.prompt yield semantics on contention.
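If the instrumentation points at the category parse, a fail-loud variant might look like the following sketch. It assumes, without confirmation from the source, that the classifier output starts with the category word. `parse_category` and `TriageParseError` are hypothetical names, and the real _triage may parse differently.

```python
CATEGORIES = ("ACT", "ASK", "ANSWER", "DO", "DEFER")


class TriageParseError(ValueError):
    """Raised when classifier output contains no recognizable category."""


def parse_category(raw):
    """Return the leading category word, or raise instead of returning ""."""
    words = (raw or "").split()
    # Assumption: the category is the first word, possibly followed by ":".
    token = words[0].strip(":.,").upper() if words else ""
    if token not in CATEGORIES:
        raise TriageParseError(
            f"no triage category in classifier output: {raw!r}"
        )
    return token
```

Raising here turns an empty or malformed classifier result into an exception the caller has to handle or log, so the handler can no longer fall through quietly.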
This is in the same family as #499 (silent stream-leak) and #523 (silent picker idle while there's work) — kennel needs to be louder when a webhook handler exits without producing a visible side effect.
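One way to make such exits louder is a guard around the handler body that logs an error whenever the handler finishes without recording a visible side effect. The sketch below is illustrative only: `Outcome`, `must_produce_effect`, and the logger name are hypothetical, and the handler has to call `record` itself wherever it posts a reply or logs a decision.

```python
import logging
from contextlib import contextmanager

log = logging.getLogger("kennel.sketch")  # hypothetical logger name


class Outcome:
    """Collects the visible side effects a handler reports."""

    def __init__(self):
        self.effects = []

    def record(self, effect):
        self.effects.append(effect)


@contextmanager
def must_produce_effect(handler_name, logger=log):
    """Log an ERROR if the wrapped block ends with no recorded effect."""
    outcome = Outcome()
    try:
        yield outcome
    finally:
        # Runs on normal exit and on exceptions; exceptions still propagate.
        if not outcome.effects:
            logger.error(
                "%s: exited without producing a visible side effect",
                handler_name,
            )
```

A handler would wrap its body in `with must_produce_effect("reply_to_comment") as outcome:` and call `outcome.record("reply posted")` at the point where the comment is actually posted.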