Skip to content

Fix control_response request_id check — read from nested response (closes #975)#977

Merged
FidoCanCode merged 1 commit into
mainfrom
fix-control-response-shape
Apr 25, 2026
Merged

Fix control_response request_id check — read from nested response (closes #975)#977
FidoCanCode merged 1 commit into
mainfrom
fix-control-response-shape

Conversation

@FidoCanCode
Copy link
Copy Markdown
Owner

Fixes #975.

Real bug

Probed the actual claude-code subprocess (claude --version → 2.1.120). Its control_response shape is:

{"type": "control_response",
 "response": {"subtype": "success", "request_id": "..."}}

request_id is nested under response. Fido's _send_control_set_model checked obj.get("request_id") at the top level. Always None. Predicate never matched. Every switch_model call hung until the subprocess was killed.

Proof

$ grep -c 'switch_model: now on model=' ~/log/fido.log
0
$ grep -c 'switch_model:.*(control_request)' ~/log/fido.log
9

Zero successful returns. Nine attempts. Every one hung. Concrete trace:

  • 15:40:23 [home] switch_model: opus → haiku
  • (no [home] log lines for 5 minutes)
  • 15:45:23 [home] worker started (fresh thread post-fido-restart)

The session was wedged the entire window. Same shape across every switch_model log line in the file.

Why fido seemed to work anyway

Many switch_model calls are no-ops (target == current model) and return at line 1102 before sending the control_request — so sessions that didn't need an actual model change kept working. The 15:40:30 webhook test that succeeded landed on a freshly-booted confusio session that was already on opus from spawn — no switch needed. Bug only fires on real model changes.

Fix

claude.py:1023-1028:

if obj.get("type") == "control_response":
    response = obj.get("response") or {}
    if response.get("request_id") == request_id:
        return

Test

Existing test fixtures had the wrong shape too — they put request_id at the top level, matching fido's broken expectation. Both helpers (TestClaudeSessionSwitchModel._make_response_line and TestClaudeSessionSendControlSetModel._make_response_line) updated to emit the real nested shape.

New regression test test_ignores_top_level_request_id_in_control_response explicitly emits a malformed top-level placement first, then the correct nested one — asserts the malformed one is skipped and the wait completes only on the nested response. This catches any future regression that re-introduces top-level lookup.

All 264 claude tests pass.

Related

…oses #975)

Real claude-code (verified against 2.1.120) emits control_response with
request_id nested under response, not at the top level:

  {"type": "control_response",
   "response": {"subtype": "success", "request_id": "..."}}

Fido's _send_control_set_model checked obj.get("request_id") at the top
level. That always returned None, the predicate never matched, and every
switch_model call hung until the subprocess was killed.

Verified by direct probing of the claude subprocess and by grepping
fido.log: switch_model: now on model= (the post-success log line) appears
zero times across the entire log; switch_model: ... (control_request)
appears 9 times. Every one of those 9 calls hung. The 5-minute home gap
at 15:40:23-15:45:23 in fido.log shows the hang concretely.

Plus updates the existing test fixtures, which had the wrong shape too —
tests passed against the broken code because both used top-level
request_id. Adds a regression test that explicitly emits a malformed
top-level shape and asserts it is NOT matched, then a correct nested
shape and asserts it is.
@FidoCanCode FidoCanCode requested a review from rhencke April 25, 2026 16:11
@FidoCanCode FidoCanCode merged commit 95a5552 into main Apr 25, 2026
1 check passed
@FidoCanCode FidoCanCode deleted the fix-control-response-shape branch April 25, 2026 16:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug: switch_model hangs when called with _in_turn=True after preempt-cancelled drain

2 participants