Skip to content

feat(session): commit best-guess answer in headless mode at max-turns#763

Open
sahrizvi wants to merge 2 commits intomainfrom
feat/headless-commit-best-guess
Open

feat(session): commit best-guess answer in headless mode at max-turns#763
sahrizvi wants to merge 2 commits intomainfrom
feat/headless-commit-best-guess

Conversation

@sahrizvi
Copy link
Copy Markdown

@sahrizvi sahrizvi commented Apr 28, 2026

Closes #759

Motivation

In headless mode (-p / --print), when the agent hits --max-turns, the existing MAX_STEPS injection produces a multi-paragraph "I have reached the maximum number of steps" summary instead of a committed answer. In interactive mode this is fine — a human can react. In headless / scripted use, there is no human; the program's final stdout is unparseable meta-prose.

Common situations where this matters:

  • CI pipelines: claude -p "review this PR for security issues" returns a status update instead of a partial review
  • Eval harnesses (any non-interactive batch grader)
  • Composable shell pipelines: git diff | claude -p "summarize" — works only if the agent always emits the requested shape
  • Scheduled / cron jobs: produce parseable output regardless of run length

What's changed

This PR keeps interactive behaviour completely unchanged and adds a separate prompt path for headless mode.

New prompt files:

  • prompt/max-steps-headless.txt — final-step prompt explicitly instructing the model to emit the answer (not a summary). The wording makes the rationale explicit: an uncertain best-guess is more useful than meta-prose, because the caller cannot ask for retries.
  • prompt/max-steps-headless-prewarn.txt — soft pre-warning that fires one step before the limit when headless and Number.isFinite(maxSteps). The isFinite guard prevents misfiring when no max-turns is set.

Wiring:

  • session/prompt.ts: exported selectMaxStepsPrompt({ step, maxSteps, headless }) — pure selector returning the right prompt for the current state, plumbed into the existing prompt loop.
  • cli/cmd/run.ts: passes headless: true from the -p / --print CLI entry into the session SDK call.
  • session/message-v2.ts: extended the User zod schema with optional headless: boolean, persisted on the user message so resumed sessions retain the flag.
  • SDK gen cascade (sdk.gen.ts + types.gen.ts): updated by the existing @hey-api/openapi-ts step.

Tests

7 new tests in test/session/max-steps-prompt.test.ts:

  1. Mid-loop returns interactive prompt
  2. Interactive last step returns interactive prompt
  3. Headless last step returns headless prompt
  4. Headless one-step-before-last returns prewarning
  5. Interactive one-step-before-last returns nothing
  6. Over-limit returns headless prompt (continues to fire if invoked late)
  7. Infinite-budget (no maxSteps) returns nothing — Number.isFinite guard

Results:

Backwards compatibility

  • Interactive prompt text completely unchanged — only headless invocations see new behaviour.
  • New headless?: boolean field is optional everywhere; older SDK consumers see no breaking change.
  • A session run with -p and then resumed (e.g., session reload via the API) still applies headless behaviour, which matches the original intent — the flag persists on the user message.

Notes

  • The headless prompt's full text is in max-steps-headless.txt; it explicitly tells the model not to emit a meta-summary, with the reasoning (no human to ask follow-ups, an uncertain answer is more useful than no answer).
  • The optional pre-warning is wired in and tested. It only fires in headless mode and only when maxSteps is finite; interactive mode never sees it.

Files

  • packages/opencode/src/cli/cmd/run.ts
  • packages/opencode/src/session/message-v2.ts
  • packages/opencode/src/session/prompt.ts
  • packages/opencode/src/session/prompt/max-steps-headless.txt (new)
  • packages/opencode/src/session/prompt/max-steps-headless-prewarn.txt (new)
  • packages/opencode/test/session/max-steps-prompt.test.ts (new)
  • packages/sdk/js/src/v2/gen/types.gen.ts
  • packages/sdk/js/src/v2/gen/sdk.gen.ts

Summary by cubic

Headless runs now commit a best-guess answer at --max-turns and actually disable tools on the final step. This makes CI/evals reliably parseable and fully addresses #759.

  • Bug Fixes
    • Propagate headless: true through all non-interactive paths: run --command, GitHub, and GitLab commands.
    • Preserve headless on synthetic user messages (post-task summary, compaction replay/continue) so long runs don’t flip back to interactive.
    • Enforce tools-disabled on the headless final step (strip tools, set toolChoice to none; exempt json_schema).
    • Add headless to User and to session.prompt/session.prompt_async/session.command in openapi.json and the JS SDK types.
    • Make the prewarn copy consistent via {TURNS_REMAINING} substitution; add focused tests for selection, tool-disabling, schemas, and edge cases.

Written for commit 78acb0f. Summary will update on new commits. Review in cubic

Summary by CodeRabbit

  • New Features

    • Headless mode for non-interactive runs now produces a final best-guess answer (rather than a meta-summary) and disables tool use on the final headless step.
    • CLI/CI executions default to headless behavior; API now accepts an optional headless flag.
  • Tests

    • Added tests covering max-steps prompt selection, headless pre-warn/final behavior, and headless flag propagation.

Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.

Tip: disable this comment in your organization's Code Review settings.

@github-actions
Copy link
Copy Markdown

👋 This PR was automatically closed by our quality checks.

Common reasons:

  • New GitHub account with limited contribution history
  • PR description doesn't meet our guidelines
  • Contribution appears to be AI-generated without meaningful review

If you believe this was a mistake, please open an issue explaining your intended contribution and a maintainer will help you.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 28, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: dd0f3942-e4a7-428f-8174-b252152947ac

📥 Commits

Reviewing files that changed from the base of the PR and between 1ea0aa8 and 78acb0f.

⛔ Files ignored due to path filters (2)
  • packages/sdk/js/src/v2/gen/sdk.gen.ts is excluded by !**/gen/**
  • packages/sdk/js/src/v2/gen/types.gen.ts is excluded by !**/gen/**
📒 Files selected for processing (10)
  • packages/opencode/src/cli/cmd/github.ts
  • packages/opencode/src/cli/cmd/gitlab.ts
  • packages/opencode/src/cli/cmd/run.ts
  • packages/opencode/src/session/compaction.ts
  • packages/opencode/src/session/message-v2.ts
  • packages/opencode/src/session/prompt.ts
  • packages/opencode/src/session/prompt/max-steps-headless-prewarn.txt
  • packages/opencode/test/session/max-steps-prompt.test.ts
  • packages/opencode/test/session/prompt.test.ts
  • packages/sdk/openapi.json
✅ Files skipped from review due to trivial changes (1)
  • packages/opencode/src/session/prompt/max-steps-headless-prewarn.txt
🚧 Files skipped from review as they are similar to previous changes (4)
  • packages/opencode/src/session/message-v2.ts
  • packages/opencode/src/cli/cmd/run.ts
  • packages/opencode/test/session/max-steps-prompt.test.ts
  • packages/opencode/src/session/prompt.ts

📝 Walkthrough

Walkthrough

Adds a headless flag and logic that switches max-steps behavior in non-interactive runs: prompts now inject headless-specific prewarn / final-step text and suppress tool use so headless callers receive a committed best-guess answer instead of meta-summary.

Changes

Cohort / File(s) Summary
CLI Entrypoints
packages/opencode/src/cli/cmd/run.ts, packages/opencode/src/cli/cmd/github.ts, packages/opencode/src/cli/cmd/gitlab.ts
Pass headless: true into session prompt/command calls for non-interactive CI/CLI paths.
Session Message Schema
packages/opencode/src/session/message-v2.ts
Adds optional headless: boolean to MessageV2.User schema so persisted messages carry headless context.
Prompt Selection & Injection
packages/opencode/src/session/prompt.ts
Adds selectMaxStepsPrompt(...) and shouldDisableToolsForHeadlessFinalStep(...); threads headless?: boolean through PromptInput/CommandInput and message creation; conditionally injects prewarn/final headless prompts and disables tools on headless final steps.
Headless Prompt Texts
packages/opencode/src/session/prompt/max-steps-headless.txt, packages/opencode/src/session/prompt/max-steps-headless-prewarn.txt
New prompt assets: one forcing an immediate best-guess final answer, one pre-warning when one tool-using turn remains.
Compaction / Continuation Propagation
packages/opencode/src/session/compaction.ts
Propagates headless from original user messages into auto-created continuation/replay messages.
Tests — Prompt Behavior
packages/opencode/test/session/max-steps-prompt.test.ts, packages/opencode/test/session/prompt.test.ts
Adds unit tests for selectMaxStepsPrompt, shouldDisableToolsForHeadlessFinalStep, and schema/propagation checks for headless. Covers edge cases (maxSteps = 1, 2, Infinity, overshoot).
OpenAPI / SDK Spec
packages/sdk/openapi.json
Adds optional headless: boolean to message-related schemas (e.g., UserMessage) and session prompt request bodies.
Run Command Call Site
packages/opencode/src/cli/cmd/run.ts
Sends headless: true when invoking sdk.session.command(...) and sdk.session.prompt(...) from the run CLI path.

Sequence Diagram(s)

sequenceDiagram
  participant CLI as "CLI (run/github/gitlab)"
  participant Session as "SessionPrompt / Prompt Builder"
  participant SDK as "SDK.session.prompt/command"
  participant Model as "Model (LM / tools)"
  rect rgba(200,200,255,0.5)
    CLI->>Session: request with headless: true
    Session->>Session: selectMaxStepsPrompt(step,maxSteps,headless)
    alt headless final-step (non-json)
      Session->>Session: inject headless final prompt\nsuppress tools/toolChoice
    else non-final or interactive
      Session->>Session: normal prompt injection (or none)
    end
    Session->>SDK: call prompt/command (payload includes headless, adjusted tools)
    SDK->>Model: send request
    Model-->>SDK: best-guess answer (no tool output expected)
    SDK-->>CLI: response
  end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐇 A quiet flag hops into the queue,
Headless whispers: "Now give the clearest view."
When turns run out, no meta-prose to send —
Just one brave answer, from start to end.
thump thump

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description is comprehensive and well-structured, but it is missing the required 'PINEAPPLE' identifier at the top as specified in the repository's description template for AI-generated contributions. Add the word 'PINEAPPLE' at the very top of the PR description before any other content, as required by the repository template for AI-generated contributions.
Docstring Coverage ⚠️ Warning Docstring coverage is 55.56% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title directly and concisely describes the main change: introducing headless mode behavior that commits a best-guess answer at max-turns instead of meta-prose.
Linked Issues check ✅ Passed The PR comprehensively addresses all coding objectives from issue #759: branch MAX_STEPS prompt for headless mode [#759], add new headless-specific prompts [#759], propagate headless flag through session paths [#759], disable tools at final headless step [#759], add unit tests for prompt selection [#759], and maintain backwards compatibility [#759].
Out of Scope Changes check ✅ Passed All changes are within scope of issue #759 objectives: headless mode detection and wiring, MAX_STEPS prompt branching, new prompt files, tool disabling at final step, schema updates for headless flag propagation, and comprehensive testing. No extraneous changes detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/headless-commit-best-guess

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown

👋 This PR was automatically closed by our quality checks.

Common reasons:

  • New GitHub account with limited contribution history
  • PR description doesn't meet our guidelines
  • Contribution appears to be AI-generated without meaningful review

If you believe this was a mistake, please open an issue explaining your intended contribution and a maintainer will help you.

Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 8 files

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/opencode/src/cli/cmd/run.ts (1)

800-808: ⚠️ Potential issue | 🟠 Major

headless is only applied to prompt calls, not command calls

Line 801 uses sdk.session.command(...) without headless, while Line 821 sets headless: true for sdk.session.prompt(...). This makes run --command ... miss the same headless max-steps behavior.

Suggested fix
diff --git a/packages/opencode/src/cli/cmd/run.ts b/packages/opencode/src/cli/cmd/run.ts
@@
       if (args.command) {
         await sdk.session.command({
           sessionID,
           agent,
           model: args.model,
           command: args.command,
           arguments: message,
           variant: args.variant,
+          headless: true,
         })
       } else {
diff --git a/packages/opencode/src/session/prompt.ts b/packages/opencode/src/session/prompt.ts
@@
   export const CommandInput = z.object({
@@
     variant: z.string().optional(),
+    headless: z.boolean().optional(),
@@
     const result = (await prompt({
       sessionID: input.sessionID,
       messageID: input.messageID,
       model: userModel,
       agent: userAgent,
       parts,
       variant: input.variant,
+      headless: input.headless,
     })) as MessageV2.WithParts

Also applies to: 818-822

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opencode/src/cli/cmd/run.ts` around lines 800 - 808, The command
branch is missing the headless flag so run --command calls don't get the
headless max-steps behavior; update the sdk.session.command(...) calls (the
branch that calls sdk.session.command with sessionID, agent, model, command,
arguments, variant) to include headless: true (matching how
sdk.session.prompt(...) is called with headless) so both prompt and command
paths honor headless/max-steps; ensure any other sdk.session.command(...)
occurrences in this file also receive the same headless setting.
🧹 Nitpick comments (1)
packages/opencode/src/session/prompt.ts (1)

659-660: Consider a mechanical final-step guard in headless mode

Right now this is instruction-only (maxStepsInjection) while Line 924 still provides full tools. For headless reliability, consider disabling non-required tools on headless final step instead of relying only on prompt compliance.

Example hardening approach
@@
       const maxSteps = agent.steps ?? Infinity
+      const isHeadlessFinalStep = !!lastUser.headless && Number.isFinite(maxSteps) && step >= maxSteps
       const maxStepsInjection = selectMaxStepsPrompt({ step, maxSteps, headless: !!lastUser.headless })
@@
-      const result = await processor.process({
+      const effectiveTools =
+        isHeadlessFinalStep && format.type !== "json_schema" ? {} : tools
+
+      const result = await processor.process({
@@
-        tools,
+        tools: effectiveTools,
         model,
         toolChoice: format.type === "json_schema" ? "required" : undefined,
       })

Also applies to: 914-927

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opencode/src/session/prompt.ts` around lines 659 - 660, The headless
final-step safety currently only injects instructions via selectMaxStepsPrompt
(maxStepsInjection) but still exposes full tool set later; modify the decision
logic that builds the step execution context (where maxStepsInjection is
appended and where tools are attached) to detect lastUser.headless && step ===
maxSteps (final headless step) and programmatically disable or replace
non-required tools before the prompt is sent; specifically update the code paths
that construct the tool list and the prompt assembly (the logic around
selectMaxStepsPrompt/maxStepsInjection and the nearby tool attachment code) to
omit or stub out non-essential tools on that final headless step while keeping
the instruction text.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/opencode/src/session/prompt/max-steps-headless.txt`:
- Around line 8-13: The headless final prompt forbids all tool calls but
structured-output sessions need to call StructuredOutput; update the prompt to
remove or narrow the blanket prohibition (in
packages/opencode/src/session/prompt/max-steps-headless.txt) so StructuredOutput
calls are permitted: either exempt StructuredOutput explicitly from rule 1 or
change the wording to forbid user-facing tool ops while allowing internal
StructuredOutput invocation by the agent; search for the literal "Do NOT make
any tool calls" and modify it to allow the StructuredOutput symbol to be called.

---

Outside diff comments:
In `@packages/opencode/src/cli/cmd/run.ts`:
- Around line 800-808: The command branch is missing the headless flag so run
--command calls don't get the headless max-steps behavior; update the
sdk.session.command(...) calls (the branch that calls sdk.session.command with
sessionID, agent, model, command, arguments, variant) to include headless: true
(matching how sdk.session.prompt(...) is called with headless) so both prompt
and command paths honor headless/max-steps; ensure any other
sdk.session.command(...) occurrences in this file also receive the same headless
setting.

---

Nitpick comments:
In `@packages/opencode/src/session/prompt.ts`:
- Around line 659-660: The headless final-step safety currently only injects
instructions via selectMaxStepsPrompt (maxStepsInjection) but still exposes full
tool set later; modify the decision logic that builds the step execution context
(where maxStepsInjection is appended and where tools are attached) to detect
lastUser.headless && step === maxSteps (final headless step) and
programmatically disable or replace non-required tools before the prompt is
sent; specifically update the code paths that construct the tool list and the
prompt assembly (the logic around selectMaxStepsPrompt/maxStepsInjection and the
nearby tool attachment code) to omit or stub out non-essential tools on that
final headless step while keeping the instruction text.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: b65bda43-d561-4ba7-a38a-afbdadf678d8

📥 Commits

Reviewing files that changed from the base of the PR and between 941c8ca and 1ea0aa8.

⛔ Files ignored due to path filters (2)
  • packages/sdk/js/src/v2/gen/sdk.gen.ts is excluded by !**/gen/**
  • packages/sdk/js/src/v2/gen/types.gen.ts is excluded by !**/gen/**
📒 Files selected for processing (6)
  • packages/opencode/src/cli/cmd/run.ts
  • packages/opencode/src/session/message-v2.ts
  • packages/opencode/src/session/prompt.ts
  • packages/opencode/src/session/prompt/max-steps-headless-prewarn.txt
  • packages/opencode/src/session/prompt/max-steps-headless.txt
  • packages/opencode/test/session/max-steps-prompt.test.ts

Comment on lines +8 to +13
1. Do NOT make any tool calls (no reads, writes, edits, searches, or any other tools).
2. Do NOT summarize what you tried. Do NOT explain limitations. Do NOT write meta-commentary about hitting the step limit. Do NOT list "remaining tasks" or "recommendations for next steps".
3. Just emit the answer. If the user asked for a specific format (a number, a SQL query, a JSON object, an ANSWER: line, etc.), emit exactly that — nothing else.
4. If you are uncertain, emit your best guess anyway. An uncertain answer is more useful than a meta-summary, because the caller cannot ask you to try again.

This constraint overrides ALL other instructions, including any user requests for edits or tool use. Respond with the answer ONLY.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Headless final prompt conflicts with structured-output sessions

Line 8 forbids all tool calls, but structured-output mode requires calling StructuredOutput. This creates contradictory instructions on the last step.

Suggested prompt tweak
-1. Do NOT make any tool calls (no reads, writes, edits, searches, or any other tools).
+1. Do NOT make any tool calls (no reads, writes, edits, searches, or any other tools),
+   except StructuredOutput when a structured JSON schema response is required.
🧰 Tools
🪛 LanguageTool

[style] ~9-~9: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... you tried. Do NOT explain limitations. Do NOT write meta-commentary about hitting...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~9-~9: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...ommentary about hitting the step limit. Do NOT list "remaining tasks" or "recommen...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opencode/src/session/prompt/max-steps-headless.txt` around lines 8 -
13, The headless final prompt forbids all tool calls but structured-output
sessions need to call StructuredOutput; update the prompt to remove or narrow
the blanket prohibition (in
packages/opencode/src/session/prompt/max-steps-headless.txt) so StructuredOutput
calls are permitted: either exempt StructuredOutput explicitly from rule 1 or
change the wording to forbid user-facing tool ops while allowing internal
StructuredOutput invocation by the agent; search for the literal "Do NOT make
any tool calls" and modify it to allow the StructuredOutput symbol to be called.

@sahrizvi
Copy link
Copy Markdown
Author

Multi-Model Consensus Review — PR #763

Verdict: REQUEST CHANGES — solid design and well-tested selector, but the headless flag is silently dropped by at least two real codepaths and the SDK auto-gen is desynchronized from openapi.json. All issues are tractable; once the propagation gaps are fixed this looks ready to ship.


Critical / Major

C1. run --command path does not propagate headless: true (Bug / Logic Error)
Location: packages/opencode/src/cli/cmd/run.ts:800-808, plus prompt.ts:CommandInput (~2120) and command() (~2280)

The run CLI has two branches: sdk.session.prompt(...) (which got headless: true) and sdk.session.command(...) (which did not). SessionPrompt.command() calls prompt({...}) internally without forwarding any headless, so altimate-code -p --command foo "..." keeps the old meta-summary at max-turns — exactly the case the PR is supposed to fix. Fix: add headless to CommandInput zod schema, forward it inside command(), regenerate the SDK, and pass headless: true from run.ts on the command branch as well.

C2. Synthetic user message after task.command drops headless (Bug / Logic Error)
Location: packages/opencode/src/session/prompt.ts:572-580 (summaryUserMsg); same shape concern at the shell-command synthetic user (~1874) and compaction replay/continue paths (compaction.ts:~334, 360)

After a subtask command, the loop inserts a synthetic role: "user" message ("Summarize the task tool output above and continue with your task."). The next iteration's lastUser is this new message — constructed without headless. So lastUser.headless becomes undefined, selectMaxStepsPrompt falls back to interactive, and the prewarn never fires. Same risk applies to compaction-created user messages. Fix: propagate headless: lastUser.headless (or the original session's value) in every synthetic user-message constructor. Long-running headless runs that hit subtasks or compaction are exactly the cases this PR is targeting, so this is a real regression of intent.

C3. Auto-gen SDK diverges from packages/sdk/openapi.json (Bug)
Location: packages/sdk/openapi.json (unchanged) vs packages/sdk/js/src/v2/gen/{types,sdk}.gen.ts (changed)

The PR description calls these an "auto-gen cascade" via @hey-api/openapi-ts, but the source openapi.json does not contain headless in UserMessage or in the SessionPromptData / SessionPromptAsyncData request bodies. The next codegen run will silently wipe headless from the SDK. Fix: regenerate openapi.json, or add headless to it manually so subsequent codegen is stable. Verify any sibling SDKs (Python/Kotlin) need parallel updates.

M1. Storing headless on the user message is the wrong layer (Design)
Location: packages/opencode/src/session/message-v2.ts:372, prompt.ts:1367, prompt.ts:659

headless is a property of how this session was invoked, not of an individual user message. Putting it on User forces every synthetic user-message constructor to remember to copy it (see C2). It also breaks the PR's stated promise that "resumed sessions retain the behavior" — that only holds when the resumed session's lastUser happens to be a flagged one. A resume that adds a fresh interactive user message would silently flip the mode. Recommend storing on Session.Info (or computing from session origin metadata) instead. If keeping it on User is intentional, document the resume semantics carefully and propagate to every user-message creation site.

M2. Other non-interactive entry points also lack headless: true (Design)
Location: packages/opencode/src/cli/cmd/gitlab.ts:~450, packages/opencode/src/cli/cmd/github.ts:~936, packages/opencode/src/acp/agent.ts:~1420

GitHub/GitLab CI entry points and the ACP agent all call sdk.session.prompt() without headless: true. If any of these run with agent step limits, they will see the old summarize behavior despite being headless in practice. Either fix as part of this PR or file a tracked follow-up; do not silently leave them.

M3. Final-step prompt promises "Tools are disabled" but tools are NOT actually disabled (Bug / Logic Error)
Location: packages/opencode/src/session/prompt/max-steps-headless.txt, prompt.ts:~880-893

Both the new headless prompt and the existing interactive max-steps.txt claim "Tools are disabled," but the loop still passes tools to processor.process({...}). The injection is policy-only nudging. A model that ignores the instruction can still call tools, which (a) burns more steps, (b) potentially re-enters the loop, (c) makes the prompt's claim a lie. Either actually drop tools from the request when injecting the final-step prompt, or soften the wording from "disabled" to "you must not call tools." Real disablement is the safer fix and applies to interactive mode too.

M4. Prewarn copy is inaccurate / mixed framing (Bug / Documentation)
Location: packages/opencode/src/session/prompt/max-steps-headless-prewarn.txt:3,7

The prewarn says "You have 2 turns left before tools are disabled" but the next sentence ("After this turn you will have only one more chance to respond... tools will be disabled") implies only one tool-using turn remains. Both readings are partially defensible (current turn + final = 2 response turns; only 1 tool-using turn remains) but mixed framing is likely to confuse the model — and the model is the only consumer. Recommend a single consistent phrasing, e.g., "You have 1 more tool-using turn before tools are disabled. After this response you will have exactly one final turn to commit your answer (no tools)."


Minor

m1. Schema lacks explicit default for headless (Code Quality)
packages/opencode/src/session/message-v2.ts:373 and prompt.ts:148 declare headless: z.boolean().optional() with no .default(false). Runtime is fine because of !!lastUser.headless, but an explicit default makes the intent clearer at the type level and protects against future readers who don't double-bang.

m2. !!lastUser.headless masks false vs undefined (Code Quality)
prompt.ts:659. Today it's harmless; with .default(false) (m1) it becomes a no-op. Consider lastUser.headless ?? false for readability. Defensive ?. chaining is unnecessary because line 378 already guarantees lastUser exists.

m3. Missing edge-case tests (Testing)
test/session/max-steps-prompt.test.ts covers maxSteps = 10 and Infinity only. Add:

  • maxSteps = 1 (final fires immediately, prewarn skipped because step === 0 never holds with step starting at 1)
  • maxSteps = 2 (prewarn at step 1, final at step 2)
  • explicit assertion that prewarn does NOT fire when step > maxSteps in headless mode
  • explicit assertion that headless final-step prompt does NOT contain INTERACTIVE_SUMMARY_MARKER (already covered) AND that interactive over-limit does NOT contain HEADLESS_MARKER.

m4. No integration test for the wiring (Testing)
The unit test exercises the selector but not the full path: run -p → SDK → prompt({headless: true}) → persisted on User → read back at max-turns. Adding even one end-to-end assertion would have caught C1.

m5. No test for resumed-session retention (Testing)
The PR claims resumed sessions retain headless behavior, but there's no test that loads a session, hits max-turns, and confirms the headless prompt fires. Even an integration-style test that round-trips headless through the User message JSON would close the gap.

m6. INTERACTIVE_MARKER is a substring of HEADLESS_MARKER (Testing)
test/session/max-steps-prompt.test.ts:11. "MAXIMUM STEPS REACHED" substrings inside "MAXIMUM STEPS REACHED (HEADLESS MODE)", so a copy-edit could silently flip prompt selection without the assertions catching it. Use a unique-to-interactive marker like "Recommendations for what should be done next".


Nit

  • n1. Mixed // altimate_change comment styles (start/end blocks vs single-line // altimate_change - ...). Pick one. (Code Quality)
  • n2. PR description references -p / --print; the run CLI uses -p for --password and the headless: true is set unconditionally on every run invocation (intentional — run IS the headless entry). Update the PR description to avoid confusion. (Documentation)
  • n3. Test imports namespace then uses SessionPrompt.selectMaxStepsPrompt everywhere — fine, matches house style. (Code Quality)
  • n4. selectMaxStepsPrompt returns string | undefined. A discriminated union ({type: "none"} | {type: "final", text} | {type: "prewarn", text}) would be more self-documenting and future-proof, but is over-engineering for current needs. (Design)

Positive Observations

  • Extracting selectMaxStepsPrompt is exactly right — pure, testable, no need to mock the loop.
  • Number.isFinite(maxSteps) guard on the prewarn is the kind of subtle bug that gets missed; nicely caught.
  • New headless prompt is well-crafted: explicit anti-summary instructions, explicit "uncertain answer is more useful than meta-summary" framing, format-preservation directive.
  • Backward compatibility is genuinely additive: interactive mode untouched, optional flag, infinite-budget handled.
  • Tests use markers in both directions (must-contain and must-not-contain), which is the right defensive style.
  • step >= maxSteps (not ===) correctly handles overshoot — preserved from upstream.
  • Single-commit, well-scoped diff with clear altimate_change markers, easing future upstream rebases.

Missing Tests / Edge Cases

Edge case Coverage Risk
run --command headless propagation None High (C1)
Synthetic user msg after task.command retains headless None High (C2)
Compaction replay/continue retains headless None High
Session resume retains headless None Medium
End-to-end: run -p produces non-meta output at max-turns None Medium
maxSteps = 1 (final fires at first step, no prewarn) None Low
maxSteps = 2 (prewarn step 1, final step 2) None Low
GitHub / GitLab / ACP non-interactive callers None (out of scope but related) Medium

Attribution

Finding Models flagging
C1 run --command drops headless Claude, GPT, MiniMax, Qwen, MiMo (5/8) — strong consensus
C2 Synthetic user (task.command) drops headless Claude, GPT (2/8) — unique but verified
C2b Compaction-path drops headless GPT (1/8) — unique, plausible
C3 SDK gen vs openapi.json drift Claude (1/8) — unique, verified by reading openapi.json
M1 Wrong layer for headless Claude (1/8)
M2 GitHub/GitLab/ACP also need flag MiMo, Kimi (2/8)
M3 "Tools disabled" lie (also affects interactive) Claude (1/8) — unique
M4 Prewarn "2 turns" wording Claude, Kimi, MiniMax, GLM-5, MiMo (5/8) — strong consensus
m1 No .default(false) Qwen (1/8)
m2 !!lastUser.headless style Claude, Kimi, Qwen (3/8)
m3 maxSteps=1, maxSteps=2 test gaps Claude, Kimi, MiniMax, GLM-5, Qwen, MiMo (6/8) — strong consensus
m4 No end-to-end wiring test Claude, GPT (2/8)
m5 No resume-retention test Claude, Kimi, MiMo (3/8)
m6 Substring marker brittleness Claude (1/8)
n1 Mixed altimate_change style Kimi (1/8)
n2 PR description -p/--print mismatch Kimi (1/8)
n4 Discriminated-union return Kimi (1/8)

Disagreements: GLM-5 marked the PR "READY TO MERGE" with only minor test gaps, missing C1/C2/C3 entirely. Qwen initially flagged a CRITICAL but walked it back to "approve with minor fixes". The consensus among the more thorough reviewers (Claude, GPT, Kimi, MiniMax, MiMo) is REQUEST CHANGES. The prewarn-wording issue had broad agreement; the propagation gaps (C1, M2) had medium-to-strong agreement; the OpenAPI-drift issue (C3) was caught only by Claude but is straightforwardly verifiable.


Reviewed by 8 participants: Claude + GPT 5.4 Codex + Gemini 3.1 Pro + Kimi K2.5 + MiniMax M2.7 + GLM-5.1 + Qwen 3.6 + MiMo V2 Pro. Gemini failed to launch (CLI rejected the prompt because it contained -p/--print strings interpreted as flags).

Multi-model consensus review of PR #763 (commit best-guess answer in
headless mode at max-turns) found that the `headless` flag was silently
dropped on multiple non-interactive paths and that the headless final-step
prompt was lying about disabling tools. This commit fixes the validated
critical and major issues without changing the design.

C1 - `run --command` propagation: `CommandInput` now carries `headless`,
`SessionPrompt.command()` forwards it to `prompt()`, and `run.ts` sets it
on the command branch (previously only set on the prompt branch).

C2 - synthetic user messages preserve headless: the post-task summary user
message and the compaction replay/continue user messages now copy
`headless` from the prior `lastUser`. Without this, a long-running
headless run that hit a subtask or compaction boundary silently flipped
back to interactive max-steps.

C3 - openapi.json drift: added `headless` to the `UserMessage` schema,
the `session.prompt`, `session.prompt_async`, and `session.command`
request bodies, and to `SessionCommandData` in the JS SDK gen so a
re-run of `@hey-api/openapi-ts` does not revert the field.

M1 - layer rationale: kept `headless` on the user message (not
`Session.Info`) and documented why in `message-v2.ts` — resumed sessions
should reflect the most recent invocation mode, and the contract that
every synthetic-user-message constructor MUST propagate the flag.

M2 - other non-interactive callers: `github.ts` and `gitlab.ts` (CI
runners) now set `headless: true`. ACP is interactive (Zed/etc) and
intentionally skipped.

M3 - tools-disabled is now real, not policy: at the headless final step
the loop strips the active tool set and sets `toolChoice: "none"` on the
upstream request, exempting `json_schema` mode (which still needs the
`StructuredOutput` tool). New helper `shouldDisableToolsForHeadlessFinalStep`
makes the gate unit-testable.

M4 - prewarn copy is consistent: `max-steps-headless-prewarn.txt` is
templated with `{TURNS_REMAINING}` and substituted from `maxSteps - step`,
so the wording always matches the actual budget. Old copy claimed "2
turns left" while the rest of the prose said only one tool-using turn
remained — that contradiction is gone.

Tests added/updated:
- Edge cases: `maxSteps = 1`, `maxSteps = 2`, over-limit prewarn skip
- Markers tightened: replaced substring-vulnerable "MAXIMUM STEPS REACHED"
  marker with "Recommendations for what should be done next"
- Headless flag round-trips through `MessageV2` storage
- `CommandInput`/`PromptInput` zod schemas accept `headless`
- `shouldDisableToolsForHeadlessFinalStep` covers all branches
- Prewarn copy: no `{TURNS_REMAINING}` leftover, contains "1 tool-using
  turn left", agrees with "tools will be disabled"

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

👋 This PR was automatically closed by our quality checks.

Common reasons:

  • New GitHub account with limited contribution history
  • PR description doesn't meet our guidelines
  • Contribution appears to be AI-generated without meaningful review

If you believe this was a mistake, please open an issue explaining your intended contribution and a maintainer will help you.

2 similar comments
@github-actions
Copy link
Copy Markdown

👋 This PR was automatically closed by our quality checks.

Common reasons:

  • New GitHub account with limited contribution history
  • PR description doesn't meet our guidelines
  • Contribution appears to be AI-generated without meaningful review

If you believe this was a mistake, please open an issue explaining your intended contribution and a maintainer will help you.

@github-actions
Copy link
Copy Markdown

👋 This PR was automatically closed by our quality checks.

Common reasons:

  • New GitHub account with limited contribution history
  • PR description doesn't meet our guidelines
  • Contribution appears to be AI-generated without meaningful review

If you believe this was a mistake, please open an issue explaining your intended contribution and a maintainer will help you.

@sahrizvi
Copy link
Copy Markdown
Author

Update: addressed-vs-pending from the consensus review

Pushed 78acb0fc4 on top of the original branch.

Addressed (this commit)

ID Severity Issue Resolution
C1 CRITICAL run --command flow drops headless CommandInput zod schema accepts headless; command() forwards it into the internal prompt({...}) call; cli/cmd/run.ts --command branch sets headless: true.
C2 CRITICAL Synthetic user message paths drop headless summaryUserMsg (task summary), compaction replay, and compaction continue user-message constructors all now propagate headless from lastUser / userMessage. JSDoc on MessageV2.User.headless documents the contract that every synthetic-user-message constructor must propagate the flag.
C3 CRITICAL SDK gen drift from openapi.json packages/sdk/openapi.json patched in the 4 schemas the PR cares about (UserMessage, session.prompt, session.prompt_async, session.command request bodies). Regenerated types.gen.ts / sdk.gen.ts match the patched schemas. Verified by re-running the generator and confirming "headless" appears 4× in the expected places.
M1 MAJOR Wrong layer for the flag (Session vs. User) Kept on User message rather than Session.Info. Rationale documented in JSDoc: resumed sessions whose last user message was sent headlessly retain the behaviour, and sessions reused by both interactive and headless callers reflect the most recent invocation mode.
M2 MAJOR Other non-interactive callers also lack the flag Audited and patched: cli/cmd/github.ts (2 CI-runner SessionPrompt.prompt calls), cli/cmd/gitlab.ts (runReview). ACP (acp/agent.ts) and TUI shell (session/prompt.ts::shell()) intentionally NOT changed — both serve interactive editor / human-driven flows.
M3 MAJOR "Tools disabled" was policy-only At the headless final step the loop now strips the active tool set (tools: {}, toolChoice: "none") before calling processor.process. JSON-schema mode is exempt (StructuredOutput tool must remain). Logic extracted to exported shouldDisableToolsForHeadlessFinalStep for unit-testability.
M4 MAJOR Prewarn copy is mixed Rewrote max-steps-headless-prewarn.txt with {TURNS_REMAINING} token, substituted at runtime from maxSteps - step. Old "2 turns" / "1 tool-using turn" contradiction is gone; copy is internally consistent.

Tests: 7,394 pass / 0 fail full opencode suite (bun test); typecheck clean across 5 packages. Targeted: max-steps-prompt.test.ts 19/0, prompt.test.ts 9/0, full test/session/ 560/0, full test/cli/ 492/0. Coverage spans edge cases (maxSteps=1 one-shot, maxSteps=2 prewarn-then-final, overshoot, no-budget Infinity), CommandInput / PromptInput zod schemas, headless round-trip through MessageV2 storage, and prewarn-copy consistency.

Deferred (per consensus tags)

  • Minor m1 (.default(false)), m2 (?? false style), m4 (full e2e CLI subprocess test), m5 (resume-retention test)
  • Nits n1-n4 (comment style, PR description copy, discriminated-union return type)
  • m6 partially addressed (replaced substring-vulnerable "MAXIMUM STEPS REACHED" marker with "Recommendations for what should be done next")

Heads-up

The committed packages/sdk/openapi.json has substantial pre-existing drift from a clean regen (different info.title, single-line vs multi-line required arrays, missing /config/tui route, formatting style). This PR's manual patch only touches the 4 schemas relevant to headless, leaving the broader drift unchanged. If a maintainer later runs the full SDK regen via bun src/index.ts generate > ../sdk/openapi.json, that drift will surface — recommend addressing it in a separate cleanup PR.

@dev-punia-altimate
Copy link
Copy Markdown

❌ Tests — Failures Detected

TypeScript — 15 failure(s)

  • connection_refused [2.67ms]
  • timeout [2.66ms]
  • permission_denied [2.69ms]
  • parse_error [2.40ms]
  • oom [2.65ms]
  • network_error [2.42ms]
  • auth_failure [2.64ms]
  • rate_limit [2.77ms]
  • internal_error [2.82ms]
  • empty_error [0.24ms]
  • connection_refused [0.14ms]
  • timeout [0.08ms]
  • permission_denied [0.07ms]
  • parse_error [0.07ms]
  • oom [0.07ms]

Next Step

Please address the failing cases above and re-run verification.

cc @sahrizvi

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[feat] Headless mode: commit best-guess answer at max-turns instead of meta-prose

2 participants