ClaudeSession: cancelled turn leaves type=result in stdout buffer, next caller reads stale result #499

@FidoCanCode

Symptom

When a webhook preempts the worker's in-flight session turn, the webhook's reply can include the worker's in-progress output instead of (or concatenated with) its own.

Observed on 2026-04-14: PR #498 comment — rhencke posted "Test comment", fido's reply was:

Yep — already done and pushed. This task is complete. Nothing more to do here! 🐕

---

And for the PR comment reply:

*wags tail* Woof! Just testing things out too? I'm here and ready to fetch some type errors. 🐾

First paragraph is the worker's task-1 turn output ("task already complete"). Second paragraph is the webhook's actual reply-gen output. Both got posted as the reply.

Root cause

ClaudeSession.iter_events (kennel/claude.py:1139) polls _cancel.is_set() every 50 ms and, once it is set, breaks out of the read loop without consuming the rest of the stream:

while True:
    if self._cancel.is_set():
        log.debug("ClaudeSession: cancelled — exiting turn early")
        self._last_turn_cancelled = True
        break
    ...

The worker's turn may have already emitted assistant text (and possibly the type=result itself) to stdout before the poll sees _cancel. Those bytes stay in the stdout buffer. When the webhook thread then acquires the lock and calls consume_until_result, iter_events resumes reading from where the pipe is — picking up the worker's buffered type=result and returning its result field as the webhook's own turn output.
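The leak can be reproduced in miniature without a subprocess. The sketch below is illustrative only: FakeSession is a hypothetical stand-in, not the real ClaudeSession, and it models the stdout pipe as a deque of JSON lines. Only the method names iter_events and consume_until_result come from this issue; everything else is assumed.

```python
import json
from collections import deque


class FakeSession:
    """Hypothetical stand-in for ClaudeSession; models only the shared stdout pipe."""

    def __init__(self, lines):
        self._pipe = deque(lines)  # lines the subprocess has already written
        self.cancelled = False

    def iter_events(self):
        while self._pipe:
            if self.cancelled:
                # Mirrors the early break: the rest of the pipe stays unread.
                self.cancelled = False
                return
            yield json.loads(self._pipe.popleft())

    def consume_until_result(self):
        for event in self.iter_events():
            if event.get("type") == "result":
                return event["result"]
        return None


# The worker's turn has already written its events when the cancel lands.
session = FakeSession([
    '{"type": "assistant", "text": "working"}',
    '{"type": "result", "result": "worker: task already complete"}',
])
session.cancelled = True
worker_out = session.consume_until_result()  # cancelled before reading anything

# The webhook turn then appends its own events to the same pipe.
session._pipe.extend([
    '{"type": "assistant", "text": "replying"}',
    '{"type": "result", "result": "webhook: woof"}',
])
webhook_out = session.consume_until_result()
print(worker_out, "|", webhook_out)
```

In this toy model the webhook caller returns the worker's result because the first type=result it meets is the one left behind by the cancelled turn.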

PR #494 removed the control_request + drain step from ClaudeSession.prompt (it hung on fresh subprocesses). That drain was what cleaned stdout between turns. Without it, leaked events now flow into the next caller.

Fix direction

On cancel, drain to the next turn boundary before releasing the lock. Inside iter_events, after seeing _cancel.is_set(), keep reading and discarding events until type=result / type=error / EOF — so the pipe is clean for the next caller. The _last_turn_cancelled flag is already set, so the worker still sees "cancelled, retry"; we're just making sure the stream is at a clean boundary by the time the lock releases.

Pseudocode:

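if self._cancel.is_set():
    self._last_turn_cancelled = True
    # Drain to the turn boundary so the next caller starts clean.
    # _DRAIN_DEADLINE is a hypothetical constant (seconds); needs `import time`.
    deadline = time.monotonic() + _DRAIN_DEADLINE
    while time.monotonic() < deadline:
        ready, _, _ = self._selector([self._proc.stdout], [], [], _DRAIN_POLL)
        if not ready:
            break  # nothing pending within the poll window
        line = self._proc.stdout.readline()
        if not line:
            break  # EOF
        try:
            obj = json.loads(line) if line.strip() else None
        except ValueError:
            continue  # skip malformed lines while draining
        if isinstance(obj, dict) and obj.get("type") in ("result", "error"):
            break  # turn boundary reached
    break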

The drain needs a short deadline so we don't spin forever if claude never emits a boundary. In that case, fall back to sending a real control_request, once we have one that handles fresh subprocesses safely.
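A standalone sketch of the drain, runnable against an os.pipe instead of the real subprocess, may help when writing a regression test. It is not the kennel implementation: drain_to_boundary, BOUNDARY_TYPES, and the poll_s and deadline_s parameters are hypothetical names, and it assumes a POSIX platform where select.select works on pipes. The stream is opened unbuffered on purpose: with a buffered reader, readline can pull several lines into Python's buffer, after which select reports the pipe as idle even though unread events remain.

```python
import json
import os
import select
import time

BOUNDARY_TYPES = ("result", "error")  # event types that end a turn


def drain_to_boundary(stream, poll_s=0.05, deadline_s=1.0):
    """Discard lines until a turn boundary, EOF, an idle poll, or the deadline.

    Returns one of "boundary", "eof", "idle", "deadline".
    """
    deadline = time.monotonic() + deadline_s
    while time.monotonic() < deadline:
        ready, _, _ = select.select([stream], [], [], poll_s)
        if not ready:
            return "idle"  # nothing arrived within poll_s
        line = stream.readline()
        if not line:
            return "eof"
        try:
            obj = json.loads(line)
        except ValueError:
            continue  # ignore blank or malformed lines while draining
        if isinstance(obj, dict) and obj.get("type") in BOUNDARY_TYPES:
            return "boundary"
    return "deadline"


# Demo: a cancelled turn's leftovers followed by the next turn's result.
read_fd, write_fd = os.pipe()
os.write(
    write_fd,
    b'{"type": "assistant", "text": "partial"}\n'
    b'{"type": "result", "result": "stale"}\n'
    b'{"type": "result", "result": "fresh"}\n',
)
stream = os.fdopen(read_fd, "rb", buffering=0)  # unbuffered, so select stays accurate

outcome = drain_to_boundary(stream)
next_event = json.loads(stream.readline())
print(outcome, next_event["result"])

os.close(write_fd)
stream.close()
```

Returning a reason rather than a bare boolean lets the caller decide what to do with "idle" and "deadline". Neither proves the turn has ended, so those are the cases where the control_request fallback would apply.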
