🤖 feat: stream advisor reasoning in tool UI#3430
Conversation
|
@codex review Please review the advisor reasoning streaming changes. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6687354fb1
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Surface advisor reasoning deltas from nested advisor model calls as transient tool-scoped UI output, separate from advisor advice. --- _Generated with `mux` • Model: `openai:gpt-5.5` • Thinking: `xhigh` • Cost: `$52.75`_ <!-- mux-attribution: model=openai:gpt-5.5 thinking=xhigh costs=52.75 -->
6687354 to
af21f81
Compare
|
@codex review Addressed the advisor reasoning chunk feedback by accepting both AI SDK |
|
@codex review Please take another look at the latest commit. The prior reasoning chunk feedback is addressed and CI is green. |
|
Codex Review: Didn't find any major issues. Swish! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
Summary
Surface nested advisor model reasoning deltas in the advisor tool UI as transient, tool-scoped Thinking output while keeping advisor Advice separate and final tool results clean.
Background
GPT-5 Pro advisor calls can emit reasoning chunks before final advice, but the previous advisor streaming path only forwarded text/advice chunks. This made Debug Logs show advisor activity while the main advisor tool card appeared idle during long advisor runs.
Implementation
advisor-reasoning-outputstream event and shared schema/type plumbing.reasoning-deltachunks from the backend advisor tool without appending them to final advice.toolCallId, clear it on advisor tool completion, and render it under a separate Thinking section.Validation
bun test src/node/services/tools/advisor.test.tsbun test src/browser/features/Tools/AdvisorToolCall.test.tsxbun test src/browser/stores/WorkspaceStore.test.tsmake static-checkreasoning_contentthrough the real backend advisor stream path.dogfood-output/screenshots/anddogfood-output/videos/in the implementation workspace.Risks
Low-to-medium risk, scoped to live advisor tool progress rendering and stream-event plumbing. The final advisor tool result remains unchanged, and reasoning is transient UI state rather than persisted tool output.
📋 Implementation Plan
Plan: surface advisor GPT-5 Pro thinking chunks in the Mux UI
Investigation summary
Verified root cause
The advisor live-streaming path exists, but it only forwards advisor answer text chunks. It drops nested advisor
reasoning-delta/ thinking chunks before they ever reach the main UI.Evidence from read-only exploration:
src/node/services/tools/advisor.tscreateAdvisorTool()runs a nestedstreamText()call for the advisor model.getAdvisorTextDelta()only accepts chunk types"text-delta"and"text".onChunkcallback emitsadvisor-outputonly whengetAdvisorTextDelta()returns text.reasoning-deltachunks are ignored for UI purposes.src/common/orpc/schemas/stream.tsAdvisorOutputEventSchemacontains only{ text }; it has no reasoning/thinking channel.reasoning-deltaevents are message-scoped, not advisor tool-call-scoped.src/browser/stores/WorkspaceStore.tsliveAdvisorOutput: { text, timestamp }andliveAdvisorPhasekeyed bytoolCallId.advisor-output.text, then clears that transient advisor state ontool-call-end.src/browser/features/Tools/AdvisorToolCall.tsxuseAdvisorToolLiveOutput()and renders live content only as anAdvicesection when text exists.src/node/services/devToolsMiddleware.tsreasoning-start,reasoning-delta, andreasoning-end, which explains why debug logs show activity while the main advisor UI remains visually idle.So the inconsistency is not that the frontend misses already-emitted advisor chunks. The backend advisor tool never emits reasoning chunks into the chat UI event stream, and the frontend schema/state cannot represent them anyway.
Recommended approach
Implement a separate tool-scoped advisor reasoning live stream and render it separately from final/live advice.
Net product LoC estimate: ~140-230 LoC.
Why this is the best default:
advisor-outputas final-answer/advice text.Advicemarkdown block.reasoning-deltasupport while respecting that advisor streams are nested tool-call streams keyed bytoolCallId, not assistantmessageId.Implementation plan
Phase 1: Add advisor reasoning event plumbing
src/common/orpc/schemas/stream.ts, for example:type: "advisor-reasoning-output"workspaceIdtoolCallIdtextordeltatimestampadvisor-output/advisor-phaseare currently forwarded, includingsrc/node/services/agentSession.ts.src/node/acp/streamTranslator.ts; otherwise explicitly leave ACP unchanged and document it in the code review notes.Quality gate: typecheck should catch every union exhaustiveness miss before moving on.
Phase 2: Emit advisor reasoning chunks from the nested advisor stream
src/node/services/tools/advisor.ts, add a companion extractor for reasoning chunks, analogous to the main stream handling insrc/node/services/streamManager.ts:chunk.type === "reasoning-delta"text ?? delta ?? textDelta ?? ""emitAdvisorReasoningOutput()next toemitAdvisorOutput().onChunk, emit:advisor-outputfortext/text-deltaadvisor-reasoning-outputforreasoning-deltastreamedAdviceChunks; final tool results should remain final advice only.Quality gate: update/add
src/node/services/tools/advisor.test.tsso a simulatedreasoning-deltachunk produces an advisor reasoning event and does not pollute the finaladviceresult.Phase 3: Store advisor reasoning as transient UI state
src/browser/stores/WorkspaceStore.ts, add transient advisor reasoning state keyed bytoolCallId, parallel toliveAdvisorOutput.advisor-reasoning-outputby appending text to that transient buffer and bumping streaming state.tool-call-endalongsideliveAdvisorOutputandliveAdvisorPhase.getAdvisorToolLiveReasoning()/useAdvisorToolLiveReasoning().Quality gate: add store coverage showing reasoning chunks accumulate while running and are cleared after the advisor tool call completes.
Phase 4: Render live advisor thinking in the tool UI
src/browser/features/Tools/AdvisorToolCall.tsx, subscribe to the new live advisor reasoning hook.ThinkingorReasoning.Advicerendering unchanged for advisor text chunks and final result text.ThinkingaboveAdvicewhen reasoning existsQuality gate: update
src/browser/features/Tools/AdvisorToolCall.test.tsxto verify live reasoning renders under a separate label and live advice still renders as advice.Phase 5: Validate with targeted and broad checks
Run, at minimum:
make typecheck.make lintormake static-checkif time permits / before PR submission.Do not claim success until these pass, or report exact blockers.
Dogfooding plan
Skill guidance applied:
make dev-server-sandboxso the dogfood run has an isolated temporaryMUX_ROOT, freeBACKEND_PORT/VITE_PORT, and no conflict with the user's live Mux instance.agent-browserbinary, nevernpx agent-browser.agent-browser skills get core.agent-browser snapshot -ibefore interacting, re-snapshot after page changes because refs become stale, and prefer semantic waits (wait --load networkidle,wait --text,wait --url) over fixed sleeps except for human-paced video evidence.Recommended dogfood flow:
make dev-server-sandbox DEV_SERVER_SANDBOX_ARGS="--clean-projects"so provider config is available for GPT-5 Pro but test projects do not leak from the seed config.SEED_MUX_ROOT=~/.mux-devif the implementer needs to seed from a known dev config.KEEP_SANDBOX=1only when preserving the sandbox for debugging is useful.agent-browser --session advisor-streaming open <sandbox-url>agent-browser --session advisor-streaming wait --load networkidleagent-browser --session advisor-streaming snapshot -iagent-browser --session advisor-streaming screenshot --annotate dogfood-output/screenshots/initial.pngagent-browser --session advisor-streaming consoleagent-browser --session advisor-streaming errorsagent-browser --session advisor-streaming record start dogfood-output/videos/advisor-reasoning-live.webmThinking/reasoning-deltachunks.Thinking/Reasoning, not only a blank pending card.agent-browser --session advisor-streaming record stopagent-browser --session advisor-streaming consoleandagent-browser --session advisor-streaming errors; fix any new console/runtime errors before claiming success.agent-browser --session advisor-streaming closeafter evidence is captured.Dogfood acceptance evidence:
dogfood-output/screenshots/initial.pngdogfood-output/videos/advisor-reasoning-live.webmAcceptance criteria
reasoning-deltachunks, the main Mux advisor tool UI shows live thinking/reasoning activity before final advice arrives.advisor-outputbehavior continues to work for text chunks.Alternatives considered
Alternative A: Reuse
advisor-outputfor reasoning deltasNet product LoC estimate: ~25-60 LoC.
Pros:
Cons:
Advicesection.advisor-outputand may surprise ACP/UI consumers.Recommendation: avoid unless the goal is a temporary local debugging patch only.
Alternative B: Emit only reasoning heartbeat/progress, not text
Net product LoC estimate: ~60-110 LoC.
Pros:
Cons:
Recommendation: use this only if product/privacy review decides advisor reasoning text must not be shown.
Alternative C: Persist advisor reasoning in final tool results
Net product LoC estimate: ~180-300 LoC.
Pros:
Cons:
Recommendation: defer unless explicitly requested.
Risks and mitigations
advisor-output; if ACP forwarding is added, make the event droppable under backpressure like other live progress events.Generated with
mux• Model:openai:gpt-5.5• Thinking:xhigh• Cost:$71.20