What happened
The LLM stream stall detector (processor.ts lines 81–84) fires after 3 minutes of no token output (the default OPENCODE_STALL_TIMEOUT_MS), regardless of whether the session is an autonomous subagent or an interactive PM session. When the PM session calls the question tool and waits for a human response, no tokens flow, so the stall detector incorrectly kills the session with an error before the human has a chance to reply.
Expected behaviour
The stall timeout should only apply to autonomous subagent sessions spawned via the task tool. Interactive PM sessions (root sessions with no parentID) should never be killed by the stall detector, as they may legitimately be waiting on human input for an indefinite period.
Steps to reproduce
- Start an interactive PM session
- Have the PM call the question tool (e.g. asking the user to make a decision)
- Wait more than 3 minutes before responding
- Session errors with: LLM stream stalled: no tokens received for X minutes
Environment
- Branch: dev
- File: packages/opencode/src/session/processor.ts
- Config: OPENCODE_STALL_TIMEOUT_MS (default 180000ms)
Root Cause
processor.ts:81 checks Date.now() - lastTokenTime > stallTimeout on every stream iteration with no awareness of:
- Whether the session is interactive (PM, no parentID) or autonomous (subagent, parentID set)
- Whether a tool call is currently pending/waiting for external input
Subagent sessions are created via task.ts:264 with parentID: ctx.sessionID. Root PM sessions have parentID: undefined. This distinction is already tracked in Session.Info but is not consulted by the stall detector.
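The root/subagent distinction described above can be sketched as a small predicate. This is only an illustration: SessionInfoSketch is a hypothetical, trimmed-down stand-in for Session.Info (whose real shape is not shown in this issue), and isSubagentSession and the session IDs are invented names.

```typescript
// Hypothetical stand-in for Session.Info; the real type has more fields.
interface SessionInfoSketch {
  id: string
  parentID?: string // set for task-spawned subagents, undefined for root PM sessions
}

// A session counts as a subagent when it has a parent session.
function isSubagentSession(session: SessionInfoSketch): boolean {
  return session.parentID !== undefined
}

const rootSession: SessionInfoSketch = { id: "ses_root" }
const childSession: SessionInfoSketch = { id: "ses_child", parentID: rootSession.id }

console.log(isSubagentSession(rootSession)) // false
console.log(isSubagentSession(childSession)) // true
```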
Proposed Fix (Option A)
In processor.ts at the stall check (~line 81), gate the stall error on session.parentID being defined:
// Only apply stall detection to task-spawned subagent sessions
if (session.parentID && Date.now() - lastTokenTime > stallTimeout) {
  throw new Error(`LLM stream stalled: no tokens received for ${Math.round(stallTimeout / 60000)} minutes`)
}
Root sessions (no parentID) are permanently exempt. Subagent sessions (with parentID) retain the 3-minute timeout as intended.
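One way to make the gated check easy to test is to pull it into a pure function that takes the current time as an argument. The sketch below assumes this refactor. StallCheckInput and assertNotStalled are hypothetical names, not existing code in processor.ts, and the check uses parentID !== undefined rather than the truthiness test shown above.

```typescript
// Hypothetical input shape; field names mirror the identifiers used in this issue.
interface StallCheckInput {
  parentID?: string // undefined for root (interactive PM) sessions
  lastTokenTime: number // ms timestamp of the last received token
  now: number // current ms timestamp, injected so tests need not wait
  stallTimeout: number // e.g. 180000, the documented default
}

// Throws only for subagent sessions that have been silent longer than stallTimeout.
function assertNotStalled(input: StallCheckInput): void {
  const { parentID, lastTokenTime, now, stallTimeout } = input
  if (parentID !== undefined && now - lastTokenTime > stallTimeout) {
    throw new Error(
      `LLM stream stalled: no tokens received for ${Math.round(stallTimeout / 60000)} minutes`,
    )
  }
}

const TEN_MINUTES_MS = 10 * 60 * 1000

// Root session: silent for 10 minutes, no error is raised.
assertNotStalled({ lastTokenTime: 0, now: TEN_MINUTES_MS, stallTimeout: 180000 })

// Subagent session: the same silence triggers the stall error.
try {
  assertNotStalled({
    parentID: "ses_parent",
    lastTokenTime: 0,
    now: TEN_MINUTES_MS,
    stallTimeout: 180000,
  })
} catch (err) {
  console.log((err as Error).message) // LLM stream stalled: no tokens received for 3 minutes
}
```

Passing now in explicitly lets both the root-session and subagent cases be exercised with fixed timestamps instead of real three-minute waits.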
Acceptance Criteria
- An interactive PM session waiting on the question tool for >3 minutes does NOT receive a stall error
- processor.ts stall check is gated on session.parentID !== undefined
- test/session/processor-stall.test.ts has a test verifying root sessions (no parentID) are exempt from stall detection
- test/session/processor-stall.test.ts has a test verifying subagent sessions (with parentID) still trigger stall detection
Definition of Done
- bun run typecheck passes