🤖 feat: stream advisor reasoning in tool UI #3430

Merged
ThomasK33 merged 1 commit into main from advisor-streaming-n9c5
May 30, 2026

@ThomasK33
Member

Summary

Surface nested advisor model reasoning deltas in the advisor tool UI as transient, tool-scoped Thinking output while keeping advisor Advice separate and final tool results clean.

Background

GPT-5 Pro advisor calls can emit reasoning chunks before final advice, but the previous advisor streaming path only forwarded text/advice chunks. This made Debug Logs show advisor activity while the main advisor tool card appeared idle during long advisor runs.

Implementation

  • Added a UI-only advisor-reasoning-output stream event and shared schema/type plumbing.
  • Forward nested advisor reasoning-delta chunks from the backend advisor tool without appending them to final advice.
  • Store live advisor reasoning transiently by toolCallId, clear it on advisor tool completion, and render it under a separate Thinking section.
  • Forward advisor reasoning progress through ACP as in-progress tool updates.
  • Added backend, store, and UI coverage for the new event path.

Validation

  • bun test src/node/services/tools/advisor.test.ts
  • bun test src/browser/features/Tools/AdvisorToolCall.test.tsx
  • bun test src/browser/stores/WorkspaceStore.test.ts
  • make static-check
  • Dogfooded in an isolated dev-server sandbox with a local OpenAI-compatible mock provider emitting reasoning_content through the real backend advisor stream path.
    • Key evidence captured under dogfood-output/screenshots/ and dogfood-output/videos/ in the implementation workspace.

Risks

Low-to-medium risk, scoped to live advisor tool progress rendering and stream-event plumbing. The final advisor tool result remains unchanged, and reasoning is transient UI state rather than persisted tool output.


📋 Implementation Plan

Plan: surface advisor GPT-5 Pro thinking chunks in the Mux UI

Investigation summary

Verified root cause

The advisor live-streaming path exists, but it only forwards advisor answer text chunks. It drops nested advisor reasoning-delta / thinking chunks before they ever reach the main UI.

Evidence from read-only exploration:

  • src/node/services/tools/advisor.ts
    • createAdvisorTool() runs a nested streamText() call for the advisor model.
    • getAdvisorTextDelta() only accepts chunk types "text-delta" and "text".
    • The advisor onChunk callback emits advisor-output only when getAdvisorTextDelta() returns text.
    • Result: GPT-5 Pro reasoning-delta chunks are ignored for UI purposes.
  • src/common/orpc/schemas/stream.ts
    • AdvisorOutputEventSchema contains only { text }; it has no reasoning/thinking channel.
    • Normal assistant reasoning-delta events are message-scoped, not advisor tool-call-scoped.
  • src/browser/stores/WorkspaceStore.ts
    • The browser tracks only transient liveAdvisorOutput: { text, timestamp } and liveAdvisorPhase keyed by toolCallId.
    • It appends only advisor-output.text, then clears that transient advisor state on tool-call-end.
  • src/browser/features/Tools/AdvisorToolCall.tsx
    • The running advisor UI subscribes to useAdvisorToolLiveOutput() and renders live content only as an Advice section when text exists.
    • There is no place to render advisor-scoped reasoning/thinking chunks.
  • src/node/services/devToolsMiddleware.ts
    • Debug/devtools capture records provider chunks including reasoning-start, reasoning-delta, and reasoning-end, which explains why debug logs show activity while the main advisor UI remains visually idle.

So the inconsistency is not that the frontend misses already-emitted advisor chunks. The backend advisor tool never emits reasoning chunks into the chat UI event stream, and the frontend schema/state cannot represent them anyway.

Recommended approach

Implement a separate tool-scoped advisor reasoning live stream and render it separately from final/live advice.

Net product LoC estimate: ~140-230 LoC.

Why this is the best default:

  • Preserves the meaning of existing advisor-output as final-answer/advice text.
  • Avoids misleadingly mixing thinking text into the Advice markdown block.
  • Mirrors the existing main-assistant reasoning-delta support while respecting that advisor streams are nested tool-call streams keyed by toolCallId, not assistant messageId.
  • Keeps reasoning transient by default, matching existing advisor live output behavior.

Implementation plan

Phase 1: Add advisor reasoning event plumbing

  1. Add a new UI-only stream event schema in src/common/orpc/schemas/stream.ts, for example:
    • type: "advisor-reasoning-output"
    • workspaceId
    • toolCallId
    • text or delta
    • timestamp
  2. Include the new event in the stream event union and any event helper/type guard lists.
  3. Add backend forwarding for the event wherever advisor-output / advisor-phase are currently forwarded, including src/node/services/agentSession.ts.
  4. If ACP clients should receive this progress too, add a droppable in-progress translation in src/node/acp/streamTranslator.ts; otherwise explicitly leave ACP unchanged and document it in the code review notes.

Quality gate: typecheck should catch every union exhaustiveness miss before moving on.
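
As a rough illustration of Phase 1, the event and its place in the union might look like the sketch below. It uses plain TypeScript types rather than the repository's actual schema helpers; the field names follow the list above, while `AdvisorStreamEvent`, `isAdvisorReasoningOutputEvent`, `describeAdvisorEvent`, and the reduced `AdvisorOutputEvent` shape are hypothetical names introduced only for this example.

```typescript
// Hypothetical shape of the proposed UI-only event (field names taken from the plan).
interface AdvisorReasoningOutputEvent {
  type: "advisor-reasoning-output";
  workspaceId: string;
  toolCallId: string;
  text: string;
  timestamp: number;
}

// Assumed, simplified shape of the existing advice event; the real schema may differ.
interface AdvisorOutputEvent {
  type: "advisor-output";
  workspaceId: string;
  toolCallId: string;
  text: string;
  timestamp: number;
}

// Hypothetical union standing in for the real stream event union.
type AdvisorStreamEvent = AdvisorOutputEvent | AdvisorReasoningOutputEvent;

// Runtime guard for untyped payloads arriving over the wire.
function isAdvisorReasoningOutputEvent(value: unknown): value is AdvisorReasoningOutputEvent {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    v.type === "advisor-reasoning-output" &&
    typeof v.workspaceId === "string" &&
    typeof v.toolCallId === "string" &&
    typeof v.text === "string" &&
    typeof v.timestamp === "number"
  );
}

// Exhaustive switch: a new union member without a matching case fails typecheck,
// which is the kind of miss the quality gate is meant to catch.
function describeAdvisorEvent(event: AdvisorStreamEvent): string {
  switch (event.type) {
    case "advisor-output":
      return `advice:${event.text}`;
    case "advisor-reasoning-output":
      return `thinking:${event.text}`;
    default: {
      const unreachable: never = event;
      return unreachable;
    }
  }
}
```

The `never` assignment in the default branch is one common way to get the exhaustiveness errors the quality gate relies on.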

Phase 2: Emit advisor reasoning chunks from the nested advisor stream

  1. In src/node/services/tools/advisor.ts, add a companion extractor for reasoning chunks, analogous to the main stream handling in src/node/services/streamManager.ts:
    • accept chunk.type === "reasoning-delta"
    • extract text ?? delta ?? textDelta ?? ""
    • assert/guard that extracted data is a string before emitting
  2. Add emitAdvisorReasoningOutput() next to emitAdvisorOutput().
  3. In the advisor onChunk, emit:
    • existing advisor-output for text / text-delta
    • new advisor-reasoning-output for reasoning-delta
  4. Do not append reasoning deltas to streamedAdviceChunks; final tool results should remain final advice only.

Quality gate: update/add src/node/services/tools/advisor.test.ts so a simulated reasoning-delta chunk produces an advisor reasoning event and does not pollute the final advice result.
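
The extraction and routing described in Phase 2 could be sketched as follows. This is not the code in advisor.ts: `AdvisorChunkLike` and `routeAdvisorChunks` are hypothetical, the field fallback for text chunks is an assumption, and the extra `"reasoning"` chunk type reflects the follow-up comment later in this thread rather than the original plan.

```typescript
// Hypothetical minimal chunk shape: only the fields this sketch reads.
interface AdvisorChunkLike {
  type: string;
  text?: unknown;
  delta?: unknown;
  textDelta?: unknown;
}

// "reasoning-delta" comes from the plan; "reasoning" was added after review feedback.
const REASONING_CHUNK_TYPES = new Set(["reasoning-delta", "reasoning"]);
// The plan states the existing extractor accepts only these two types.
const TEXT_CHUNK_TYPES = new Set(["text-delta", "text"]);

// Try the payload fields in the order given in the plan and guard the type.
function extractString(chunk: AdvisorChunkLike): string | undefined {
  const raw = chunk.text ?? chunk.delta ?? chunk.textDelta ?? "";
  return typeof raw === "string" && raw.length > 0 ? raw : undefined;
}

function getAdvisorReasoningDelta(chunk: AdvisorChunkLike): string | undefined {
  return REASONING_CHUNK_TYPES.has(chunk.type) ? extractString(chunk) : undefined;
}

// Assumption: the text extractor reads the same fields as the reasoning one.
function getAdvisorTextDelta(chunk: AdvisorChunkLike): string | undefined {
  return TEXT_CHUNK_TYPES.has(chunk.type) ? extractString(chunk) : undefined;
}

// Hypothetical stand-in for the onChunk handler: reasoning is emitted to the UI
// but never appended to the advice that becomes the final tool result.
function routeAdvisorChunks(chunks: AdvisorChunkLike[]): {
  advice: string;
  reasoningEvents: string[];
} {
  const streamedAdviceChunks: string[] = [];
  const reasoningEvents: string[] = [];
  for (const chunk of chunks) {
    const reasoning = getAdvisorReasoningDelta(chunk);
    if (reasoning !== undefined) {
      reasoningEvents.push(reasoning);
      continue;
    }
    const text = getAdvisorTextDelta(chunk);
    if (text !== undefined) streamedAdviceChunks.push(text);
  }
  return { advice: streamedAdviceChunks.join(""), reasoningEvents };
}
```

Keeping the two extractors separate means an unknown chunk type falls through both and is silently ignored, which matches how the plan treats non-text, non-reasoning chunks.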

Phase 3: Store advisor reasoning as transient UI state

  1. In src/browser/stores/WorkspaceStore.ts, add transient advisor reasoning state keyed by toolCallId, parallel to liveAdvisorOutput.
  2. Process advisor-reasoning-output by appending text to that transient buffer and bumping streaming state.
  3. Clear advisor reasoning state on advisor tool-call-end alongside liveAdvisorOutput and liveAdvisorPhase.
  4. Add a selector/hook such as getAdvisorToolLiveReasoning() / useAdvisorToolLiveReasoning().

Quality gate: add store coverage showing reasoning chunks accumulate while running and are cleared after the advisor tool call completes.
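
A minimal sketch of the transient state in Phase 3, assuming a simple in-memory map. `AdvisorReasoningBuffer` and its method names are hypothetical; the real WorkspaceStore integrates with its own subscription and streaming-state mechanisms, which are omitted here.

```typescript
// Mirrors the { text, timestamp } shape the plan describes for liveAdvisorOutput.
interface LiveAdvisorReasoning {
  text: string;
  timestamp: number;
}

// Hypothetical transient buffer keyed by toolCallId; nothing here is persisted.
class AdvisorReasoningBuffer {
  private readonly byToolCallId = new Map<string, LiveAdvisorReasoning>();

  // Handle an advisor-reasoning-output event by appending its text.
  append(toolCallId: string, text: string, timestamp: number): void {
    const existing = this.byToolCallId.get(toolCallId);
    this.byToolCallId.set(toolCallId, {
      text: (existing?.text ?? "") + text,
      timestamp,
    });
  }

  // Selector used by the UI while the tool call is running.
  get(toolCallId: string): LiveAdvisorReasoning | undefined {
    return this.byToolCallId.get(toolCallId);
  }

  // Called on tool-call-end so reasoning does not outlive the advisor call.
  clear(toolCallId: string): void {
    this.byToolCallId.delete(toolCallId);
  }
}

// Usage: accumulate while running, then clear on completion.
const reasoningBuffer = new AdvisorReasoningBuffer();
reasoningBuffer.append("call-1", "Considering ", 1);
reasoningBuffer.append("call-1", "options", 2);
const liveReasoningSnapshot = reasoningBuffer.get("call-1");
reasoningBuffer.clear("call-1");
```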

Phase 4: Render live advisor thinking in the tool UI

  1. In src/browser/features/Tools/AdvisorToolCall.tsx, subscribe to the new live advisor reasoning hook.
  2. While the advisor is executing and no final result exists, render accumulated reasoning in a distinct section such as Thinking or Reasoning.
  3. Keep existing live/final Advice rendering unchanged for advisor text chunks and final result text.
  4. Prefer the simplest UI:
    • show phase/elapsed status in the header as today
    • show Thinking above Advice when reasoning exists
    • do not persist completed reasoning after the final result unless explicitly requested later

Quality gate: update src/browser/features/Tools/AdvisorToolCall.test.tsx to verify live reasoning renders under a separate label and live advice still renders as advice.
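
The rendering rules in Phase 4 can be expressed as a small pure function that decides which sections to show, independent of the React component. The state shape and `getAdvisorSections` are hypothetical names for this sketch and do not reflect the actual props of AdvisorToolCall.tsx.

```typescript
// Hypothetical view state for one advisor tool call.
interface AdvisorToolViewState {
  status: "executing" | "completed";
  liveReasoning?: string;
  liveAdvice?: string;
  finalAdvice?: string;
}

interface AdvisorSection {
  label: "Thinking" | "Advice";
  body: string;
}

// Decide which sections to render, with Thinking above Advice.
function getAdvisorSections(state: AdvisorToolViewState): AdvisorSection[] {
  // Once a final result exists, show only the final advice; reasoning is not kept.
  if (state.finalAdvice !== undefined) {
    return [{ label: "Advice", body: state.finalAdvice }];
  }
  if (state.status !== "executing") return [];

  const sections: AdvisorSection[] = [];
  if (state.liveReasoning) {
    sections.push({ label: "Thinking", body: state.liveReasoning });
  }
  if (state.liveAdvice) {
    sections.push({ label: "Advice", body: state.liveAdvice });
  }
  return sections;
}
```

Separating this decision from the component would also make the quality gate easier to test, since the label and ordering rules can be asserted without rendering.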

Phase 5: Validate with targeted and broad checks

Run, at minimum:

  1. Targeted backend tests for advisor stream event emission.
  2. Targeted frontend/store/tool-call tests for advisor reasoning rendering.
  3. make typecheck.
  4. make lint or make static-check if time permits / before PR submission.

Do not claim success until these pass, or report exact blockers.

Dogfooding plan

Skill guidance applied:

  • Use make dev-server-sandbox so the dogfood run has an isolated temporary MUX_ROOT, free BACKEND_PORT / VITE_PORT, and no conflict with the user's live Mux instance.
  • Use the direct agent-browser binary, never npx agent-browser.
  • Before browser automation, load the installed CLI's current workflow with agent-browser skills get core.
  • Use agent-browser snapshot -i before interacting, re-snapshot after page changes because refs become stale, and prefer semantic waits (wait --load networkidle, wait --text, wait --url) over fixed sleeps except for human-paced video evidence.
  • Capture reproducible evidence as the dogfood skill requires: annotated screenshots plus a short video for the interactive advisor-streaming behavior.

Recommended dogfood flow:

  1. Start an isolated dev-server sandbox from the implementation worktree:
    • Prefer make dev-server-sandbox DEV_SERVER_SANDBOX_ARGS="--clean-projects" so provider config is available for GPT-5 Pro but test projects do not leak from the seed config.
    • Use SEED_MUX_ROOT=~/.mux-dev if the implementer needs to seed from a known dev config.
    • Use KEEP_SANDBOX=1 only when preserving the sandbox for debugging is useful.
  2. Note the sandbox's printed app URL / Vite URL and open it with a named agent-browser session:
    • agent-browser --session advisor-streaming open <sandbox-url>
    • agent-browser --session advisor-streaming wait --load networkidle
  3. Orient and record baseline state:
    • agent-browser --session advisor-streaming snapshot -i
    • agent-browser --session advisor-streaming screenshot --annotate dogfood-output/screenshots/initial.png
    • agent-browser --session advisor-streaming console
    • agent-browser --session advisor-streaming errors
  4. Configure/select an advisor model that emits reasoning chunks, e.g. GPT-5 Pro with high reasoning, in a disposable test workspace.
  5. Start video before reproducing the issue/fix:
    • agent-browser --session advisor-streaming record start dogfood-output/videos/advisor-reasoning-live.webm
  6. Trigger an advisor call from Plan Mode with a prompt likely to produce sustained reasoning.
  7. While the advisor call is still in flight, open Debug Logs side-by-side with the advisor tool call and verify:
    • Debug Logs show provider Thinking / reasoning-delta chunks.
    • The main advisor tool call simultaneously shows live Thinking / Reasoning, not only a blank pending card.
    • Live advice text, if/when emitted, appears separately from reasoning.
  8. Capture evidence at human-reviewable pacing:
    • screenshot while the advisor is still running and reasoning is visible in the main UI
    • screenshot with Debug Logs and advisor tool UI visible together
    • screenshot after final advice arrives showing final advice remains clean
    • stop the video with agent-browser --session advisor-streaming record stop
  9. Re-check agent-browser --session advisor-streaming console and agent-browser --session advisor-streaming errors; fix any new console/runtime errors before claiming success.
  10. Close the session with agent-browser --session advisor-streaming close after evidence is captured.

Dogfood acceptance evidence:

  • dogfood-output/screenshots/initial.png
  • annotated in-flight screenshot showing live advisor reasoning in the main tool UI
  • screenshot showing Debug Logs reasoning chunks and main UI reasoning visible at the same time
  • post-completion screenshot showing final advice is not polluted by reasoning text
  • dogfood-output/videos/advisor-reasoning-live.webm

Acceptance criteria

  • During an advisor GPT-5 Pro run that emits reasoning-delta chunks, the main Mux advisor tool UI shows live thinking/reasoning activity before final advice arrives.
  • Debug Logs and the main UI no longer disagree about whether advisor streaming activity is happening.
  • Advisor reasoning is visually separated from advisor advice.
  • Final advisor tool result remains the final advice text only; transient reasoning does not pollute persisted tool results.
  • Existing live advisor-output behavior continues to work for text chunks.
  • Targeted backend and frontend tests cover the new reasoning event path.
  • Typecheck and relevant tests pass.

Alternatives considered

Alternative A: Reuse advisor-output for reasoning deltas

Net product LoC estimate: ~25-60 LoC.

Pros:

  • Smallest code change.
  • Would make something visible quickly.

Cons:

  • Mixes reasoning into an Advice section.
  • Risks final/live advice confusion.
  • Changes the semantic meaning of advisor-output and may surprise ACP/UI consumers.

Recommendation: avoid unless the goal is a temporary local debugging patch only.

Alternative B: Emit only reasoning heartbeat/progress, not text

Net product LoC estimate: ~60-110 LoC.

Pros:

  • Safer if raw provider reasoning text should not be exposed.
  • Fixes the “black box for 10-15 minutes” perception with lower privacy risk.

Cons:

  • Does not satisfy the expectation that visible thinking chunks should appear in the UI.
  • Debug logs would still contain richer content than the main UI.

Recommendation: use this only if product/privacy review decides advisor reasoning text must not be shown.
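
For comparison, Alternative B might reduce each reasoning chunk to a content-free progress signal along these lines. The event name `advisor-reasoning-heartbeat` and the helper `createHeartbeatTracker` are invented for this sketch and are not part of the PR.

```typescript
// Hypothetical heartbeat event: signals activity without exposing reasoning text.
interface AdvisorReasoningHeartbeat {
  type: "advisor-reasoning-heartbeat";
  toolCallId: string;
  chunkCount: number;
  timestamp: number;
}

// Returns a function to call once per reasoning chunk; the chunk text is never passed in.
function createHeartbeatTracker(
  toolCallId: string,
): (timestamp: number) => AdvisorReasoningHeartbeat {
  let chunkCount = 0;
  return (timestamp: number) => {
    chunkCount += 1;
    return { type: "advisor-reasoning-heartbeat", toolCallId, chunkCount, timestamp };
  };
}
```

Because the tracker never receives the chunk text, the UI could show only that the advisor is still thinking, which is the privacy trade-off this alternative describes.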

Alternative C: Persist advisor reasoning in final tool results

Net product LoC estimate: ~180-300 LoC.

Pros:

  • Completed advisor calls would retain the reasoning transcript.

Cons:

  • Larger history/schema change.
  • Potential privacy and storage concerns.
  • Not necessary to solve the in-flight black-box problem.

Recommendation: defer unless explicitly requested.

Risks and mitigations

  • Provider reasoning policy / chain-of-thought risk: only forward textual reasoning chunks that the provider API exposes through the same public stream shape already visible in debug logs; label them clearly and keep them transient. If this is not acceptable, choose Alternative B.
  • High-frequency UI updates: use the same buffering/batched update pattern as advisor-output; if ACP forwarding is added, make the event droppable under backpressure like other live progress events.
  • Ordering between reasoning and advice: separate buffers are simpler but do not preserve exact interleaving. That is acceptable for the minimal fix because the UI sections are semantically distinct.
  • Regression risk in existing advisor output: add tests ensuring current live advice text still appears and final advice is unchanged.
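
One way to picture the batching mitigation for high-frequency updates is a coalescer that merges pending deltas per toolCallId and hands the UI one merged update per flush. This is an assumption about what such a pattern could look like, not a description of the existing advisor-output buffering; `ReasoningDeltaCoalescer` is a hypothetical name.

```typescript
interface ReasoningDelta {
  toolCallId: string;
  text: string;
  timestamp: number;
}

// Hypothetical coalescer: many small deltas in, at most one merged delta per tool call out.
class ReasoningDeltaCoalescer {
  private readonly pending = new Map<string, ReasoningDelta>();

  // Merge the incoming delta into whatever is already pending for this tool call.
  push(delta: ReasoningDelta): void {
    const existing = this.pending.get(delta.toolCallId);
    this.pending.set(delta.toolCallId, {
      toolCallId: delta.toolCallId,
      text: (existing?.text ?? "") + delta.text,
      timestamp: delta.timestamp,
    });
  }

  // Return the merged deltas and reset; a caller would invoke this on a timer or frame.
  flush(): ReasoningDelta[] {
    const merged = Array.from(this.pending.values());
    this.pending.clear();
    return merged;
  }
}
```

Since merged text preserves the order of deltas within a tool call, flushing less often changes update frequency but not the accumulated content.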

Generated with mux • Model: openai:gpt-5.5 • Thinking: xhigh • Cost: $71.20

@ThomasK33
Member Author

@codex review

Please review the advisor reasoning streaming changes.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6687354fb1

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/node/services/tools/advisor.ts Outdated
Surface advisor reasoning deltas from nested advisor model calls as transient tool-scoped UI output, separate from advisor advice.

---

_Generated with `mux` • Model: `openai:gpt-5.5` • Thinking: `xhigh` • Cost: `$52.75`_

@ThomasK33 ThomasK33 force-pushed the advisor-streaming-n9c5 branch from 6687354 to af21f81 Compare May 30, 2026 07:39
@ThomasK33
Member Author

@codex review

Addressed the advisor reasoning chunk feedback by accepting both AI SDK reasoning chunks and reasoning-delta chunks in the nested advisor stream extractor, with coverage for both shapes.

@ThomasK33
Member Author

@codex review

Please take another look at the latest commit. The prior reasoning chunk feedback is addressed and CI is green.

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Swish!


@ThomasK33 ThomasK33 added this pull request to the merge queue May 30, 2026
Merged via the queue into main with commit 6c29e2b May 30, 2026
24 checks passed
@ThomasK33 ThomasK33 deleted the advisor-streaming-n9c5 branch May 30, 2026 09:50