Skip to content

Bug: LLM stall timeout fires on interactive PM sessions waiting for human input #22023

@randomm

Description

@randomm

What happened

The LLM stream stall detector (processor.ts lines 81–84) fires after 3 minutes of no token output, regardless of whether the session is an autonomous subagent or an interactive PM session. When the PM session calls the question tool and waits for a human response, no tokens flow — so the stall detector incorrectly kills the session with an error before the human has a chance to reply.

Expected behaviour

The stall timeout should only apply to autonomous subagent sessions spawned via the task tool. Interactive PM sessions (root sessions with no parentID) should never be killed by the stall detector, as they may legitimately be waiting on human input for an indefinite period.

Steps to reproduce

  1. Start an interactive PM session
  2. Have the PM call the question tool (e.g. asking the user to make a decision)
  3. Wait more than 3 minutes before responding
  4. Session errors with: LLM stream stalled: no tokens received for X minutes

Environment

  • Branch: dev
  • File: packages/opencode/src/session/processor.ts
  • Config: OPENCODE_STALL_TIMEOUT_MS (default 180000ms)

Root Cause

processor.ts:81 checks Date.now() - lastTokenTime > stallTimeout on every stream iteration with no awareness of:

  • Whether the session is interactive (PM, no parentID) or autonomous (subagent, parentID set)
  • Whether a tool call is currently pending/waiting for external input

Subagent sessions are created via task.ts:264 with parentID: ctx.sessionID. Root PM sessions have parentID: undefined. This distinction is already tracked in Session.Info but is not consulted by the stall detector.

Proposed Fix (Option A)

In processor.ts at the stall check (~line 81), gate the stall error on session.parentID being defined:

// Only apply stall detection to task-spawned subagent sessions
if (session.parentID && Date.now() - lastTokenTime > stallTimeout) {
  throw new Error(`LLM stream stalled: no tokens received for ${Math.round(stallTimeout / 60000)} minutes`)
}

Root sessions (no parentID) are permanently exempt. Subagent sessions (with parentID) retain the 3-minute timeout as intended.

Acceptance Criteria

  • Interactive PM session waiting on question tool for >3 minutes does NOT receive a stall error
  • Autonomous subagent session with no token output for >3 minutes DOES receive a stall error as before
  • processor.ts stall check is gated on session.parentID !== undefined
  • test/session/processor-stall.test.ts has a test verifying root sessions (no parentID) are exempt from stall detection
  • test/session/processor-stall.test.ts has a test verifying subagent sessions (with parentID) still trigger stall detection

Definition of Done

  • Tests written (TDD preferred) and passing
  • bun run typecheck passes
  • Linting passes
  • No regression in stall detection for subagent sessions

Metadata

Metadata

Assignees

Labels

coreAnything pertaining to core functionality of the application (opencode server stuff)

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions