Agent sandbox run should fail or timeout when nested agent remains processing with pending tools #534

@chubes4

Problem

The WP Codebox wp-codebox.agent-sandbox-run recipe step can exit with code 0 even when the nested Data Machine agents/chat result is still processing, has pending tools, and has reached max turns without producing a final answer or any file changes.

This lets parent Homeboy logic see runtime artifacts and attempt late outcome classification, even though the agent did not complete the task.

Evidence

Parent tracker: Extra-Chill/homeboy#3378

Overlay run:

  • Run id: homeboy-3378-release-channel-fanout-overlay-20260603015451
  • Transcript A: /var/folders/lr/c_cmmt7s0592m4njz99v5yb40000gn/T/opencode/homeboy-3378-release-channel-fanout-overlay-20260603015451-a-artifacts/runtime-mpxez1y8-ubvf6z/files/transcript.json
  • Agent result A: /var/folders/lr/c_cmmt7s0592m4njz99v5yb40000gn/T/opencode/homeboy-3378-release-channel-fanout-overlay-20260603015451-a-artifacts/runtime-mpxez1y8-ubvf6z/files/agent-result.json

Nested agents/chat metadata:

status: processing
current_turn: 20
has_pending_tools: true
completed: null/false

WP Codebox agent result:

{
  "schema": "wp-codebox/agent-result/v1",
  "status": "completed",
  "actionable": false,
  "summary": "Agent sandbox completed without actionable file changes.",
  "changedFiles": { "count": 0 },
  "patch": { "bytes": 0 },
  "noOpReason": "no_file_changes"
}

The recipe command itself exited with code 0, and the parent Homeboy run later preserved empty patch artifacts.

Expected behavior

A sandbox agent run should not be classified as completed when the nested conversation is still processing or has pending tools at max turns.
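The expected behavior can be read as a single predicate over the nested result. The following is a minimal sketch, not WP Codebox's actual code: the function name, the `max_turns` parameter, and the payload shape (`completed` at the top level, the other fields under `metadata`) are assumptions made for illustration. Only the field names and sample values come from the evidence above.

```python
from typing import Any, Mapping, Optional


def nested_run_is_incomplete(
    nested: Mapping[str, Any], max_turns: Optional[int] = None
) -> bool:
    """Return True when the nested agents/chat result must not count as completed."""
    # Assumed shape: {"completed": ..., "metadata": {...}}; adjust to the real payload.
    metadata = nested.get("metadata") or {}
    still_processing = metadata.get("status") == "processing"
    pending_tools = metadata.get("has_pending_tools") is True
    turn = metadata.get("current_turn")
    hit_max_turns = (
        max_turns is not None and isinstance(turn, int) and turn >= max_turns
    )
    finished = nested.get("completed") is True
    # Hitting max turns only counts against the run when it never finished.
    return still_processing or pending_tools or (hit_max_turns and not finished)


# Field values copied from the evidence; max_turns=20 is an assumed limit.
evidence = {
    "completed": None,
    "metadata": {"status": "processing", "current_turn": 20, "has_pending_tools": True},
}
print(nested_run_is_incomplete(evidence, max_turns=20))  # True
```

Keeping the check as a pure function of the nested payload would make it straightforward to cover with a fake result, without starting a sandbox.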

Acceptance criteria

  • Inspect nested agent_runtime.result.completed, metadata.status, metadata.has_pending_tools, metadata.current_turn, and max-turn diagnostics.
  • If the nested state is processing with pending tools, or max turns has been reached, classify the WP Codebox agent result as timeout, incomplete, or failed rather than completed.
  • Emit a structured diagnostic such as agent_runtime.incomplete_pending_tools.
  • Produce a non-zero recipe outcome when the nested agent cannot complete and no explicit no-op final answer exists.
  • Add smoke coverage using a fake nested agents/chat result with status: processing, has_pending_tools: true, and no file changes.
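The criteria above could be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the WP Codebox implementation: `classify_sandbox_run`, the `max_turns` parameter, the `diagnostics` field, the exit code value 1, and the choice between `timeout` and `incomplete` are all hypothetical. The schema string, status names, `noOpReason` value, and diagnostic name are taken from this issue. Handling of an explicit no-op final answer is left out.

```python
from typing import Any, Dict, Mapping, Optional, Tuple

# Diagnostic name suggested in the acceptance criteria.
PENDING_TOOLS_DIAGNOSTIC = "agent_runtime.incomplete_pending_tools"


def classify_sandbox_run(
    nested: Mapping[str, Any],
    changed_file_count: int,
    max_turns: Optional[int] = None,
) -> Tuple[Dict[str, Any], int]:
    """Return (agent_result, exit_code) for a nested agents/chat result."""
    # Assumed payload shape: {"completed": ..., "metadata": {...}}.
    metadata = nested.get("metadata") or {}
    processing = metadata.get("status") == "processing"
    pending_tools = metadata.get("has_pending_tools") is True
    turn = metadata.get("current_turn")
    hit_max_turns = (
        max_turns is not None and isinstance(turn, int) and turn >= max_turns
    )
    finished = nested.get("completed") is True

    result: Dict[str, Any] = {
        "schema": "wp-codebox/agent-result/v1",
        "changedFiles": {"count": changed_file_count},
    }

    if processing or pending_tools or (hit_max_turns and not finished):
        # Never report "completed" for a run that is still in flight.
        result["status"] = "timeout" if hit_max_turns else "incomplete"
        result["actionable"] = False
        # "diagnostics" is a hypothetical field for the structured diagnostic.
        result["diagnostics"] = [PENDING_TOOLS_DIAGNOSTIC] if pending_tools else []
        return result, 1  # non-zero recipe outcome

    result["status"] = "completed"
    result["actionable"] = changed_file_count > 0
    if changed_file_count == 0:
        result["noOpReason"] = "no_file_changes"
    return result, 0


# Smoke-style check with a fake nested result, as the last criterion describes.
fake_nested = {
    "completed": None,
    "metadata": {"status": "processing", "has_pending_tools": True, "current_turn": 20},
}
agent_result, exit_code = classify_sandbox_run(fake_nested, 0, max_turns=20)
assert agent_result["status"] != "completed"
assert PENDING_TOOLS_DIAGNOSTIC in agent_result["diagnostics"]
assert exit_code != 0
```

Returning the exit code alongside the result keeps the recipe outcome and the agent-result status from drifting apart, which is the mismatch reported here.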

Related

AI assistance

  • AI assistance: Yes
  • Tool(s): OpenCode (openai/gpt-5.5)
  • Used for: Reproducing the nested agent pending-tools outcome and drafting this tracker. Chris remains responsible for review and prioritization.
