🤖 feat: stream advisor output live #3310
@codex review |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 136276da0c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
@codex review Please take another look. |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9a859ff16e
|
@codex review Please take another look. |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1d3bc040ca
|
@codex review Please take another look. |
|
Codex Review: Didn't find any major issues. Swish!
Summary
Streams nested advisor model response text into the advisor tool card while the tool is still running, without changing the final advisor tool result contract returned to the parent model.
Background
Advisor calls previously showed phase changes only and withheld the actual advice text until tool-call-end. This made long advisor consultations feel opaque even though the nested model was already generating useful text.
Implementation
Adds a UI-only advisor-output chat event, emits it from the advisor tool's streamText() chunks, forwards it through the workspace chat stream, stores it as bounded transient frontend state, and renders it in the expanded advisor tool card until the final persisted tool result arrives.
Validation
- env MUX_ESLINT_CONCURRENCY=2 make static-check
- bun test src/node/services/tools/advisor.test.ts
- bun test src/browser/stores/WorkspaceStore.test.ts
- bun test src/browser/features/Tools/AdvisorToolCall.test.tsx
- make typecheck
- make fmt-check
- git diff --check
- waiting for response.
Risks
Risk is concentrated in chat stream/transient UI state. The new event is UI-only, bounded, cleared on final advisor tool result and stale-message cleanup, and leaves model-visible tool output unchanged.
Pains
Default lint concurrency was OOM-killed in this workspace, so local lint/static-check was run with MUX_ESLINT_CONCURRENCY=2.
📋 Implementation Plan
Stream Advisor Responses to the Client
Goal
Stream the nested advisor model's response text to the client UI while the advisor tool is still running, without changing what the parent model receives from the tool.
Today the advisor tool is non-streaming from the user's perspective: the backend emits live phase changes (preparing_context, waiting_for_response, finalizing_result), then the UI receives the full advice only when the normal tool-call-end result arrives. The desired UX is for the advisor tool card to show advice text incrementally as the nested advisor model produces it.
Current implementation evidence
Verified implementation points from the repo:
- src/node/services/tools/advisor.ts
  - createAdvisorTool() currently calls generateText() for the nested advisor request.
  - advisor-phase UI events during execution.
  - { type: "advice", advice, advisorModel, reasoningLevel, remainingUses }.
  - config.reportModelUsage after the nested call completes.
- src/node/services/aiService.ts
  - advisorModelString.
  - advisorRuntime is injected into getToolsForModel() only when eligible.
  - onAdvisorChunk captures the parent model's same-step text/reasoning before the advisor tool call; it does not stream advisor output.
- src/common/orpc/schemas/stream.ts
  - AdvisorPhaseEventSchema already defines a UI-only advisor event carried over workspace.onChat.
  - WorkspaceChatMessageSchema is the discriminated union that must include any new chat-stream event.
- src/node/services/agentSession.ts
  - advisor-phase is already forwarded.
- src/browser/stores/WorkspaceStore.ts
  - liveAdvisorPhase.
  - advisor-phase, and clears advisor phase on tool-call-end for advisor.
- src/browser/features/Tools/AdvisorToolCall.tsx
Advisor review status
An advisor review was requested after drafting this plan. Two advisor tool calls were attempted with compact review prompts, but both failed before returning advice with a sanitized provider error:
Advisor request failed: Invalid JSON response with binary-like payload content. No advisor-originated recommendations were available to incorporate.
Because the review tool itself failed, the plan below is conservative and self-reviewed against the verified code paths. If the advisor tool becomes healthy before implementation starts, retry the review using the same summary and incorporate any concrete feedback before coding.
Recommended approach
Approach A — UI-only advisor-output event + nested streamText() (recommended)
Product LoC estimate: net +300 to +450 product LoC.
Implement a new UI-only chat event, tentatively named advisor-output, that carries advisor text deltas from the backend to the renderer while the tool is running. Replace the nested advisor generateText() call with streamText() and use onChunk (or an equivalent fullStream loop) to emit output chunks as they arrive. Accumulate those same chunks in the backend and preserve the existing final tool result shape for the parent model.
Why this is the best fit:
- advisor-phase, bash-output, task-created) for UI-only tool progress.
- stream-delta events.
- execute() resolves.
Alternatives considered
Approach B — Reuse tool-call-delta for advisor output
Product LoC estimate: net +180 to +280 product LoC, but not recommended.
tool-call-delta currently means streaming tool input args from the parent model, not tool output from a running tool. Reusing it would blur semantics and complicate existing stream processing/tests.
Approach C — Persist partial advisor output as mutable tool result state
Product LoC estimate: net +500 to +800 product LoC, not needed for the first version.
This would make reconnect/replay show in-progress advisor output exactly, but requires deeper changes to dynamic-tool persistence/partial writes and would risk coupling UI-only progress with model-visible tool output. Consider later only if live-only reconnect behavior is insufficient.
Scope
In scope:
- AdvisorToolCall UI.
- tool-call-end result is available.
Out of scope for this iteration:
Detailed implementation plan
Phase 0 — Pre-flight decisions
Before coding, confirm these defaults unless the implementer discovers a strong reason to change them:
- advisor-output.
- text rather than delta, matching bash-output's tool-output wording.
- tool-call-end remains the durable source of truth.
- messageId in the first version; workspaceId + toolCallId is enough for transient UI state and matches existing advisor-phase scoping.
Phase 1 — Add a UI-only advisor output event contract
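As a rough illustration of the contract this phase describes, the sketch below hand-writes the event type and an isAdvisorOutputEvent() guard. It is dependency-free and does not reproduce the repo's schema code: the field list (type, workspaceId, toolCallId, text, timestamp) follows the event shape quoted elsewhere in this plan, and rejecting empty text is an assumption.

```typescript
// Hypothetical event shape; field names follow the plan's description,
// not the repo's actual schema definition.
interface AdvisorOutputEvent {
  type: "advisor-output";
  workspaceId: string;
  toolCallId: string;
  text: string;
  timestamp: number;
}

// Dependency-free type guard standing in for a schema-derived check.
// Assumption: empty ids and empty text chunks are treated as invalid.
function isAdvisorOutputEvent(value: unknown): value is AdvisorOutputEvent {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    v.type === "advisor-output" &&
    typeof v.workspaceId === "string" &&
    v.workspaceId.length > 0 &&
    typeof v.toolCallId === "string" &&
    v.toolCallId.length > 0 &&
    typeof v.text === "string" &&
    v.text.length > 0 &&
    typeof v.timestamp === "number"
  );
}
```

A guard like this only narrows the type for the renderer; it does not turn the event into persisted or model-visible history.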
Files likely touched:
- src/common/orpc/schemas/stream.ts
- src/common/orpc/schemas.ts
- src/common/types/stream.ts
- src/common/orpc/types.ts
Steps:
- Add a new schema near AdvisorPhaseEventSchema, for example:
- Add AdvisorOutputEventSchema to WorkspaceChatMessageSchema.
- Re-export/import the new schema through the aggregate schema module if needed (src/common/orpc/schemas.ts currently gathers stream schemas for downstream imports).
- Export the inferred AdvisorOutputEvent type from src/common/types/stream.ts.
- Add isAdvisorOutputEvent() to src/common/orpc/types.ts.
- Keep the event UI-only: it should not be transformed into a MuxMessage and should not become model-visible history.
Defensive checks:
- text chunks.
- workspaceId and toolCallId required so the renderer can scope chunks to a single tool card.
- messageId unless a later implementation needs replay dedupe; advisor-phase and bash-output already scope by workspace/toolCallId.
Quality gate:
- make typecheck after Phase 2 when event producers exist.
Phase 2 — Stream inside the advisor backend tool
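The chunk handling this phase describes could look roughly like the sketch below. It deliberately avoids importing the AI SDK: createAdvisorOutputCollector, AdvisorTextChunk, and CollectorOptions are hypothetical names, and the chunk shape (a text-delta type with the delta in text, delta, or textDelta) is an assumption drawn from the wording of the steps below. The callback would be wired to whatever chunk hook the nested stream exposes.

```typescript
// Hypothetical event shape, matching the fields listed in this plan.
interface AdvisorOutputEvent {
  type: "advisor-output";
  workspaceId: string;
  toolCallId: string;
  text: string;
  timestamp: number;
}

// Assumed chunk shape: the delta may arrive under one of three field names.
interface AdvisorTextChunk {
  type: string;
  text?: string;
  delta?: string;
  textDelta?: string;
}

interface CollectorOptions {
  workspaceId?: string;
  toolCallId?: string;
  emitChatEvent?: (event: AdvisorOutputEvent) => void;
}

function createAdvisorOutputCollector(options: CollectorOptions) {
  let adviceText = "";
  return {
    onChunk(chunk: AdvisorTextChunk): void {
      // Only text deltas contribute to the advice; other chunk types are ignored.
      if (chunk.type !== "text-delta") return;
      const delta = chunk.text ?? chunk.delta ?? chunk.textDelta ?? "";
      if (delta.length === 0) return;
      adviceText += delta;
      const { workspaceId, toolCallId, emitChatEvent } = options;
      // Live output is best-effort: without an emitter or scoping ids,
      // keep accumulating but emit nothing.
      if (!emitChatEvent || !workspaceId || !toolCallId) return;
      emitChatEvent({
        type: "advisor-output",
        workspaceId,
        toolCallId,
        text: delta,
        timestamp: Date.now(),
      });
    },
    // Accumulated text is for comparison only; the plan treats the SDK's
    // final text as the source of truth for the returned tool result.
    accumulatedText(): string {
      return adviceText;
    },
  };
}
```

Keeping emission separate from the returned result is what lets the final tool result shape stay unchanged for the parent model.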
Files likely touched:
- src/node/services/tools/advisor.ts
- src/node/services/tools/advisor.test.ts
Steps:
- generateText() call with streamText() from the AI SDK.
  - model
  - system: ADVISOR_SYSTEM_PROMPT
  - messages
  - tools: {}
  - providerOptions
  - maxOutputTokens
- onChunk to capture text-delta chunks from the advisor stream.
- text, delta, or textDelta because current stream code already handles provider/SDK variation this way.
- adviceText accumulator,
- { type: "advisor-output", workspaceId, toolCallId, text: delta, timestamp: Date.now() } via config.emitChatEvent if available.
- const streamResult = streamText(...); const finalAdvice = await streamResult.text; while using onChunk for live updates.
- finalAdvice is non-empty or equal to the accumulated text when practical. Do not fail a successful call solely because chunk accumulation differs; use the final SDK text as the source of truth for the returned tool result.
- reportModelUsage path.
- streamResult.usage or streamResult.totalUsage based on what best matches current generateText() semantics for a one-step, tool-less advisor call.
- AIService.reportModelUsage.
- Advisor request was aborted.
- Advisor request failed: ...
- preparing_context before transcript/handoff work,
- waiting_for_response immediately before starting streamText()/awaiting response,
- finalizing_result after the nested stream finishes and before returning final result.
Defensive checks:
- advisorRuntime, non-empty model string, valid reasoning level, max uses, output token cap, transcript snapshot, and toolCallId.
- usesThisTurn++ before awaits.
Quality gate:
- src/node/services/tools/advisor.test.ts to mock streamText() instead of generateText().
- advisor-output events are emitted for text chunks before final result,
- emitChatEvent, workspaceId, or toolCallId is unavailable,
Phase 3 — Forward advisor output events through workspace chat
Files likely touched:
- src/node/services/agentSession.ts
Steps:
- Add a forwarding branch mirroring advisor-phase:
- If markActiveStreamHadAnyOutput() is too broad for UI-only advisor chunks, omit it; however, bash-output already marks output, and advisor text is user-visible output from a running tool, so marking is reasonable.
- Confirm the event traverses:
  - config.emitChatEvent,
  - AIService.emit(event.type, event),
  - AgentSession forwarding,
  - workspace.onChat ORPC subscription.
Quality gate:
Phase 4 — Store live advisor output in renderer transient state
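A minimal sketch of the transient-state updates this phase calls for, written as pure functions over a Map keyed by toolCallId. AdvisorLiveOutputState is the type name used in the steps below, but its fields, the helper names, the character cap, and the keep-the-tail truncation policy are all assumptions for illustration.

```typescript
// Hypothetical state shape for one running advisor tool call.
interface AdvisorLiveOutputState {
  text: string;
  truncated: boolean;
}

// Hypothetical bound; the plan only says the state is bounded.
const MAX_LIVE_ADVISOR_OUTPUT_CHARS = 64 * 1024;

function appendAdvisorOutput(
  live: ReadonlyMap<string, AdvisorLiveOutputState>,
  toolCallId: string,
  text: string,
  maxChars: number = MAX_LIVE_ADVISOR_OUTPUT_CHARS,
): ReadonlyMap<string, AdvisorLiveOutputState> {
  // Ignore empty chunks and return the same reference so subscribers see no change.
  if (text.length === 0) return live;
  const previous = live.get(toolCallId);
  const combined = (previous?.text ?? "") + text;
  const overflow = combined.length > maxChars;
  const next = new Map(live);
  next.set(toolCallId, {
    // Assumed policy: keep the most recent characters when over the cap.
    text: overflow ? combined.slice(combined.length - maxChars) : combined,
    truncated: (previous?.truncated ?? false) || overflow,
  });
  return next;
}

function clearAdvisorOutput(
  live: ReadonlyMap<string, AdvisorLiveOutputState>,
  toolCallId: string,
): ReadonlyMap<string, AdvisorLiveOutputState> {
  // Called when the final advisor result arrives; no-op if nothing was stored.
  if (!live.has(toolCallId)) return live;
  const next = new Map(live);
  next.delete(toolCallId);
  return next;
}
```

Returning the existing Map when nothing changes is one way to keep snapshot references stable for store subscribers; whether the real store needs this depends on how its selectors are written.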
Files likely touched:
- src/browser/stores/WorkspaceStore.ts
- src/browser/stores/WorkspaceStore.test.ts
Steps:
- Add a transient output state type, for example:
- Add liveAdvisorOutput: Map<string, AdvisorLiveOutputState> to WorkspaceChatTransientState.
- Initialize it in the transient state factory.
- Add getAdvisorToolLiveOutput(workspaceId, toolCallId) and useAdvisorToolLiveOutput(...).
  - useSyncExternalStore loops.
- Treat advisor-output as a buffered event before caught-up, like bash-output, advisor-phase, and task-created.
- In processStreamEvent():
  - isAdvisorOutputEvent(data),
  - data.text to the previous live output for that toolCallId,
- Clear liveAdvisorOutput:
  - tool-call-end for advisor, alongside liveAdvisorPhase,
  - liveAdvisorPhase is already cleared.
Defensive checks:
- text defensively in the store even if backend filters it.
Quality gate:
- WorkspaceStore.test.ts with tests for:
  - tool-call-end,
Phase 5 — Render live advisor text in the tool UI
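The display rule this phase describes can be sketched as a pure function, separate from any React code. selectAdvisorDisplay and AdvisorDisplay are hypothetical names, and the input shape is an assumption; only the "executing" status value, advisorResult?.type === "advice", and liveAdviceText come from the steps below.

```typescript
// Hypothetical result of deciding what the advisor tool card should show.
type AdvisorDisplay =
  | { kind: "final"; text: string }
  | { kind: "live"; text: string }
  | { kind: "none" };

interface AdvisorDisplayInput {
  status: string;
  advisorResult?: { type: string; advice?: string };
  liveAdviceText?: string;
}

function selectAdvisorDisplay(input: AdvisorDisplayInput): AdvisorDisplay {
  const { status, advisorResult, liveAdviceText } = input;
  // The final persisted result always supersedes live output.
  if (advisorResult?.type === "advice" && typeof advisorResult.advice === "string") {
    return { kind: "final", text: advisorResult.advice };
  }
  // Live text is shown only while executing, with no final result, and when non-empty.
  if (status === "executing" && !advisorResult && liveAdviceText) {
    return { kind: "live", text: liveAdviceText };
  }
  return { kind: "none" };
}
```

A component could branch on kind to choose between the existing final advice rendering and a streaming section with a loading indicator.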
Files likely touched:
- src/browser/features/Tools/AdvisorToolCall.tsx
- src/browser/features/Tools/AdvisorToolCall.test.tsx
Steps:
- useAdvisorToolLiveOutput(workspaceId, toolCallId).
- liveAdviceText only while the tool is executing and final advisorResult is not present.
- Advice section when liveAdviceText is non-empty,
- MarkdownRenderer with preserveLineBreaks, matching final advice,
- LoadingDots indicator next to the section label or below the text.
- advisorResult?.type === "advice", render advisorResult.advice as today,
- ToolDetails text fallback so copy/selection/preview behavior works while streaming, if that prop is intended for that purpose.
Performance guardrail:
Quality gate:
- AdvisorToolCall.test.tsx to verify:
  - status="executing" and final result is absent,
Phase 6 — Validation and dogfooding
Automated validation
Run targeted tests first, then broader validation:
- bun test src/node/services/tools/advisor.test.ts
- bun test src/browser/features/Tools/AdvisorToolCall.test.tsx
- bun test src/browser/stores/WorkspaceStore.test.ts
- make typecheck
- make lint
- make test or make static-check.
Fix failures before claiming success.
Manual dogfooding setup
- make dev, following current project instructions.
- advisor-tool experiment in Settings → Experiments.
- advisor, for example:
- agent-browser/Electron automation to capture:
  - advisor tool card in a running phase,
  - tool-call-end showing the final advice result.
Real dogfooding continuation request (Plan Mode note)
The user requested a real dogfooding session after implementation, specifically:
- dogfood, agent-browser, and dev-server-sandbox skills;
- agent-browser skills get core before browser automation;
Plan Mode constraints currently prevent executing this request directly: only the plan file may be edited, and bash must remain read-only. A real dogfood session requires non-read-only operations such as starting a sandbox that creates a temporary MUX_ROOT, copying provider/project config, changing sandbox settings/provider choices, creating dogfood output directories/artifacts, and recording video files. The next step therefore requires switching back to Exec mode.
Dogfood-only approach — real sandbox session
Product LoC estimate: net +0 product LoC, unless dogfooding uncovers a bug that must be fixed.
Execution plan once in Exec mode:
- make dev-server-sandbox.
- MUX_ROOT, backend port, and Vite URL without exposing secrets.
- agent-browser session.
- agent-browser skills get core guidance: open, snapshot -i, interact by refs, re-snapshot after changes.
- tool-call-end;
Dogfood quality gates:
Dogfooding acceptance checks
- stream-delta content.
Acceptance criteria
- tool-call-end.
Risks and mitigations
- streamText() result usage/provider metadata differs from generateText()
- advisor-output to buffered event handling before caught-up.
- stream-delta. Add store/UI tests.
- execute() return shape and treat streaming as UI-only.
- tool-call-end; final result supersedes live output.
Suggested implementation order
- advisor-output).
This order keeps the backend producer and event contract testable before touching UI rendering, then validates the renderer pipeline with transient-state tests before dogfooding the full UX.
Generated with mux • Model: openai:gpt-5.5 • Thinking: xhigh • Cost: $139.03