Summary
In claude.py, _kill() clears self._proc = None (line 782) before cancelling self._stderr_task (lines 797-798). The _drain_stderr task (line 276) runs a while-loop that checks self._proc on each iteration. Between the proc reference being cleared and the task being cancelled, _drain_stderr can observe self._proc as None, a state that was never intended to be visible to it.
The ordering in _kill()
async def _kill(self) -> None:
    if self._proc:
        # ... SIGKILL + wait ...
        self._proc = None                # line 782: proc reference cleared
        self._pgid = None                # line 783
        self._session_id = None          # line 784
        self._session_started_at = None  # line 785
        # ... pgid cleanup ...
    if self._stderr_task:                # line 797: task cancelled AFTER proc cleared
        self._stderr_task.cancel()
        self._stderr_task = None
The drain loop in _drain_stderr:
while self._proc and self._proc.stderr:  # line 276
    try:
        line = await self._proc.stderr.readline()  # line 278
Race window
When _kill() sends SIGKILL and awaits self._proc.wait() at line 779, it yields to the event loop. Both _kill and _drain_stderr are waiting on the same process to die. When the process exits, the event loop can resume either coroutine first.
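If _kill resumes first:
- _kill clears self._proc = None (synchronous, no yield)
- _kill proceeds to cancel _stderr_task (still synchronous)
- _drain_stderr resumes with CancelledError raised at its pending readline() await and exits; if it reached the while self._proc check instead, the check would be False and the loop would also exit cleanly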
If _drain_stderr resumes first:
- readline returns b"" (EOF), drain checks if not line: break, exits
- _kill resumes, clears proc, cancels the already-completed task
Both orderings happen to work correctly today, but only because:
- The while condition re-checks self._proc on every iteration (so a None proc exits the loop)
- No yield point exists between the while-condition check and the self._proc.stderr.readline() expression evaluation (so proc can't become None between the guard and the access)
- CancelledError is a BaseException, not caught by the except Exception on line 284 (so task cancellation propagates cleanly)
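The third point can be checked in isolation. The following is a minimal sketch, not the code from claude.py: drain and cancel_demo are hypothetical names, and asyncio.sleep stands in for stderr.readline(). It assumes Python 3.8 or later, where asyncio.CancelledError derives from BaseException.

```python
import asyncio


async def drain(events: list) -> None:
    """Hypothetical stand-in for _drain_stderr: loops on an awaitable read."""
    while True:
        try:
            await asyncio.sleep(3600)  # stands in for stderr.readline()
        except Exception:
            # Not reached on cancellation: CancelledError is a BaseException,
            # so it passes straight through this handler.
            events.append("swallowed")


async def cancel_demo():
    events = []
    task = asyncio.create_task(drain(events))
    await asyncio.sleep(0)  # let the task reach its first await
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        events.append("cancelled")
    return events, task.cancelled()


if __name__ == "__main__":
    print(asyncio.run(cancel_demo()))
```

If the handler were widened to except BaseException, the same task would keep looping instead of ending as cancelled.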
Why this matters
The correctness depends on three implementation details that are not documented and could easily be broken by routine maintenance:
- If someone adds a yield point (logging, metrics) between the while check and the readline, self._proc could become None between the guard and the self._proc.stderr access, causing an AttributeError.
- If someone changes except Exception to except BaseException (a common "catch everything" reflex), CancelledError gets swallowed and the drain loop continues checking a None proc.
- If the drain loop is refactored to cache proc = self._proc locally (a common optimization), the while-condition guard no longer protects against the reference going stale mid-iteration.
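The first hazard can be reproduced with stand-in objects. This is a sketch under stated assumptions, not the real code: FakeStream, FakeProc, Owner, and hazard_demo are hypothetical, and the added yield point is modelled with asyncio.sleep(0). It also assumes the cleared reference becomes visible to the drain task before any cancellation is delivered (for example, if the cleanup between the two steps were to await), which the excerpt above does not confirm.

```python
import asyncio


class FakeStream:
    """Hypothetical stand-in for the process's stderr StreamReader."""

    async def readline(self) -> bytes:
        await asyncio.sleep(0)
        return b""


class FakeProc:
    """Hypothetical stand-in for the subprocess object."""

    def __init__(self) -> None:
        self.stderr = FakeStream()


class Owner:
    def __init__(self) -> None:
        self._proc = FakeProc()

    async def drain_with_extra_yield(self) -> str:
        while self._proc and self._proc.stderr:
            await asyncio.sleep(0)  # the "harmless" added yield point
            try:
                # The guard above is stale by now if _proc was cleared meanwhile.
                line = await self._proc.stderr.readline()
            except AttributeError:
                return "AttributeError"
            if not line:
                break
        return "clean exit"


async def hazard_demo() -> str:
    owner = Owner()
    task = asyncio.create_task(owner.drain_with_extra_yield())
    await asyncio.sleep(0)  # drain passes the guard, then parks at the added yield
    owner._proc = None      # what _kill() does at line 782, with no cancel yet
    return await task


if __name__ == "__main__":
    print(asyncio.run(hazard_demo()))
```

Without the extra asyncio.sleep(0) inside the loop, the guard and the attribute access run back to back and the stale reference is never dereferenced.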
The fundamental issue is that _kill() invalidates state that another running task depends on, then cancels that task as an afterthought. The invariant should be: cancel the consumer before destroying what it consumes.
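One way to enforce that invariant is to cancel and await the drain task before touching the process state. The sketch below is illustrative only: Runner, FakeProc, the order list, and kill_demo are hypothetical, the fake process replaces the real SIGKILL + wait sequence, and the pgid/session cleanup from the real _kill() is omitted.

```python
import asyncio


class FakeProc:
    """Hypothetical stand-in for the subprocess; only what the sketch needs."""

    def __init__(self) -> None:
        self._exited = asyncio.Event()

    def kill(self) -> None:
        self._exited.set()

    async def wait(self) -> int:
        await self._exited.wait()
        return -9


class Runner:
    def __init__(self) -> None:
        self._proc = FakeProc()
        self._stderr_task = None
        self.order = []  # records what happened, for the demo only

    async def _drain_stderr(self) -> None:
        try:
            await asyncio.Event().wait()  # stands in for the readline() loop
        finally:
            state = "proc set" if self._proc else "proc None"
            self.order.append("drain stopped, " + state)

    async def _kill(self) -> None:
        # 1. Stop the consumer first and wait until it has finished unwinding.
        task, self._stderr_task = self._stderr_task, None
        if task:
            task.cancel()
            # return_exceptions=True collects the task's CancelledError
            # as a result instead of raising it here.
            await asyncio.gather(task, return_exceptions=True)
        # 2. Only then tear down the state the consumer was reading from.
        if self._proc:
            self._proc.kill()
            await self._proc.wait()
            self._proc = None
            self.order.append("proc cleared")


async def kill_demo() -> list:
    runner = Runner()
    runner._stderr_task = asyncio.create_task(runner._drain_stderr())
    await asyncio.sleep(0)  # let the drain task start
    await runner._kill()
    return runner.order


if __name__ == "__main__":
    print(asyncio.run(kill_demo()))
```

With this ordering the drain task never runs while self._proc is None, so its correctness no longer depends on where the loop happens to yield.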
Severity
MEDIUM - not an active bug under current code, but a latent defect. The "accidental correctness" makes this a maintenance trap that will surface as a confusing AttributeError or silent malfunction the next time someone touches either _kill() or _drain_stderr().