🤖 fix: show live workflow task rows #3456
Merged
Show workflow child task rows while they are running and make completed rows reveal their reports inline.
---
_Generated with `mux` • Model: `openai:gpt-5.5` • Thinking: `xhigh` • Cost: `$178.77`_
<!-- mux-attribution: model=openai:gpt-5.5 thinking=xhigh costs=178.77 -->
Restore separate navigation and report controls for completed workflow task rows, and terminalize child task attempts that would otherwise remain stuck in started state after resume or execution failures.
---
_Generated with `mux` • Model: `openai:gpt-5.5` • Thinking: `xhigh` • Cost: `$233.25`_
<!-- mux-attribution: model=openai:gpt-5.5 thinking=xhigh costs=233.25 -->
@codex review
Codex Review: Didn't find any major issues. Another round soon, please!
Summary
Shows workflow-spawned child tasks in the workflow run card as soon as they are created, keeps task attempts coalesced by exact step/task identity, and preserves direct child-workspace navigation even after a task has an inline report.
Background
Workflow runs previously surfaced sub-agent tasks only after their final report, leaving active parallel workflow work invisible. The first implementation added live rows; this follow-up also addresses review findings around completed-row navigation, stale resumed attempts, child failures after creation, replay backfills, and the duplicated mutex helper.
Implementation
- `task / started` events when child tasks are created or resumed.
- `stepId + taskId`, with reports matched by the same identity so retries stay distinct.
- `AsyncMutex` implementation for serialized workflow task event writes.

Validation
- `bun test src/browser/features/Tools/WorkflowRunToolCall.test.tsx src/node/services/workflows/WorkflowRunner.test.ts`: 43 passed.
- `make typecheck`: passed.
- `make static-check`: passed.
- `dev-server-sandbox` workflow card using `agent-browser`; captured screenshots for live task rows, report toggle expansion, child workspace navigation, plus a WebM recording.

Risks
Moderate workflow-card risk because this changes how workflow task lifecycle state is projected into the UI. Mitigations include exact task-attempt keys, backend idempotency checks, targeted UI/backend tests for retries and stale attempts, and sandbox dogfooding of the navigation/report paths.
📋 Implementation Plan
Plan: Show workflow-spawned sub-agents while running and navigate to them
Context and evidence
- `src/node/services/workflows/WorkflowRunner.ts` calls `taskAdapter.runAgent(...)` from `runOrResumeAgentStep(...)`.
- `onTaskCreated` lifecycle callback records a started step with `recordStepStarted(...)` as soon as `TaskService.create(...)` returns a child `taskId`.
- `recordAgentResult(...)` appends the visible workflow `task` event only after the agent reports, which explains why the UI only shows task rows at completion time.
- `src/node/services/workflows/WorkflowRunStore.ts#getRun(...)` returns `events` and `steps`. `WorkflowRunStore.readSteps(...)` returns the latest step record per `(stepId, inputHash)`, which is useful for current report/details enrichment. Workflow `task` events should be treated as the authoritative per-task-attempt timeline.
- `src/browser/features/Tools/WorkflowRunToolCall.tsx` polls `api.workflows.getRun(...)` every 2s while the run status is `pending`, `running`, or `backgrounded`.
- `interestingEvents = run.events.filter(...)`, so `run.steps` are available but not used as first-class live rows.
- `TaskService.create(...)` already emits workspace metadata for queued/running child tasks, so child workspaces can be navigated to by `taskId` as workspace id.
- `run.steps` alone is not a safe source for live sub-agent rows because `applyPatch` steps also record `taskId`. The recommended plan therefore uses explicit `task / started` workflow events for agent launches, and uses `steps` only for report/details enrichment.

Goal
When a workflow launches sub-agent tasks, the workflow run card should immediately show those child tasks in the list, allow the user to navigate into a child task while it is running, and update the same row when the child finishes.
Recommended approach: backend `task: started` events plus UI task-row coalescing
Net product LoC estimate: +130 to +220 LoC.
This is the recommended implementation because it uses the workflow event stream as the authoritative source of user-visible task rows, avoids misclassifying patch steps as sub-agents, and still keeps the change scoped.
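The backend half of this approach can be sketched as follows. This is a minimal illustration, assuming an in-memory event list and a promise-chain mutex; `StartedEventWriter`, `recordTaskStarted`, and the `TaskEvent` shape are hypothetical names, and the real `AsyncMutex`, `sequence.next()`, and `appendEvent(...)` in `WorkflowRunner.ts` may look different.

```typescript
// Hypothetical event shape for illustration; the real schema is WorkflowRunEventSchema.
type TaskEvent = {
  sequence: number;
  type: "task";
  stepId: string;
  taskId: string;
  status: string;
};

// Minimal promise-chain mutex: each caller runs only after the previous one settles.
class AsyncMutex {
  private tail: Promise<void> = Promise.resolve();

  runExclusive<T>(fn: () => Promise<T>): Promise<T> {
    const result = this.tail.then(fn);
    // Keep the chain usable even if fn rejects.
    this.tail = result.then(
      () => undefined,
      () => undefined,
    );
    return result;
  }
}

class StartedEventWriter {
  private readonly mutex = new AsyncMutex();
  private nextSequence = 1;
  readonly events: TaskEvent[] = [];

  // Stand-in for an asynchronous, persistent append.
  private async appendEvent(event: TaskEvent): Promise<void> {
    await Promise.resolve();
    this.events.push(event);
  }

  // Resolves to true if a started event was appended, false if one already existed.
  recordTaskStarted(stepId: string, taskId: string): Promise<boolean> {
    if (stepId.length === 0 || taskId.length === 0) {
      return Promise.reject(new Error("stepId and taskId must be non-empty"));
    }
    return this.mutex.runExclusive(async () => {
      // Idempotency check on the exact (stepId, taskId) pair, e.g. for the resume path.
      const alreadyStarted = this.events.some(
        (e) => e.stepId === stepId && e.taskId === taskId && e.status === "started",
      );
      if (alreadyStarted) return false;
      // Sequence allocation and append happen inside the same critical section.
      const sequence = this.nextSequence++;
      await this.appendEvent({ sequence, type: "task", stepId, taskId, status: "started" });
      return true;
    });
  }
}
```

Holding the lock across both the sequence allocation and the append is what keeps concurrent `onTaskCreated` callbacks from interleaving; the existence check inside the same section makes a repeated call for the same attempt a no-op.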
Backend implementation steps
- Serialize task-start workflow event writes in `WorkflowRunner.ts`.
  - `AsyncMutex` or small helper that protects the pair `sequence.next()` + `appendEvent(...)` for start events.
  - `parallelAgents(...)` can invoke multiple `onTaskCreated` callbacks concurrently.
  - `runId`, `stepId`, and `taskId` are non-empty before appending.
- Append `task` events with `status: "started"` when agent tasks are created.
  - `runOrResumeAgentStep(...)`, extend the existing `onTaskCreated` lifecycle callback:
    - `recordStepStarted(...)` as today,
    - `{ type: "task", stepId, taskId, status: "started" }` using the serialized event helper.
  - `WorkflowRunEventSchema` already permits string task statuses, so this should not require a schema migration.
  - `runApplyPatchStep(...)`; patch progress continues to use `type: "patch"` events.
- Handle resume/crash-recovery without duplicate start events.
  - `runOrResumeAgentStep(...)` resumes an existing started step via `waitForAgentTask(...)`, ensure there is at most one `task / started` event for the exact `(stepId, taskId)` pair.
  - `(stepId, taskId)` identity, not `(stepId, inputHash)`, because validation retries can reuse the logical step/input while launching a new child task id.

UI implementation steps
- Introduce a workflow display-row model in `WorkflowRunToolCall.tsx`.
  - `event` rows for phase/log/validation/patch/error events.
  - `task` rows coalesced from workflow `task` events.
  - `task:${stepId}:${taskId}`.
  - `run.steps` may be consulted only to enrich a task row with report markdown/details after completion; it must not independently create sub-agent rows.
  - `taskId` and preferably `stepId`; do not fall back to step-id-only matching. If no exact step/result exists for that task attempt, show no report for that row.
- Coalesce started/completed/failed task events without hiding retries.
  - `(stepId, taskId)` pair, render one task row.
  - `stepId` but a new `taskId` should render as a distinct row/attempt, preserving any previous failed event.
- Make task rows navigable while preserving report access.
  - `useWorkspaceStoreRaw().navigateToWorkspace(taskId)` or an equivalent existing workspace-navigation path so the child task id routes to the child workspace.
  - `role="button"` row with no nested native button.
- Preserve existing behavior for non-task events.
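The coalescing and report-enrichment rules in the UI steps above could look roughly like this. It is a sketch under assumptions: `buildTaskRows`, `WorkflowTaskEvent`, `WorkflowStep`, and `TaskRow` are hypothetical names and shapes, not the types used in `WorkflowRunToolCall.tsx`; only the `task:${stepId}:${taskId}` key and the exact-match rule come from the plan.

```typescript
// Hypothetical shapes for illustration only.
type WorkflowTaskEvent = { type: "task"; stepId: string; taskId: string; status: string };
type WorkflowStep = { stepId: string; taskId?: string; reportMarkdown?: string };
type TaskRow = {
  key: string;
  stepId: string;
  taskId: string;
  status: string;
  reportMarkdown?: string;
};

function buildTaskRows(events: WorkflowTaskEvent[], steps: WorkflowStep[]): TaskRow[] {
  // Map preserves insertion order, so rows appear in first-seen order.
  const rows = new Map<string, TaskRow>();

  for (const event of events) {
    const key = `task:${event.stepId}:${event.taskId}`;
    const existing = rows.get(key);
    if (existing) {
      // A later event for the same attempt updates the row instead of adding one.
      existing.status = event.status;
    } else {
      rows.set(key, { key, stepId: event.stepId, taskId: event.taskId, status: event.status });
    }
  }

  // Steps only enrich rows that task events already created; they never add rows.
  rows.forEach((row) => {
    const step = steps.find((s) => s.stepId === row.stepId && s.taskId === row.taskId);
    if (step !== undefined && step.reportMarkdown !== undefined) {
      row.reportMarkdown = step.reportMarkdown;
    }
  });

  return Array.from(rows.values());
}
```

Because rows are created only from `task` events, a patch step that happens to carry a `taskId` never becomes a sub-agent row, and a retry with a new `taskId` cannot inherit the report of another attempt on the same `stepId`.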
Rejected alternative: UI-only rows from `run.steps`
Net product LoC estimate if chosen: +90 to +150 LoC.
This was the initial low-LoC idea, but it is not recommended because `WorkflowStepRecord` lacks a step kind and `applyPatch` steps also record `taskId`. A steps-only renderer would either risk briefly showing patch steps as sub-agents or require brittle inference from patch events/results.

Testing plan
Unit/UI tests
Update `src/browser/features/Tools/WorkflowRunToolCall.test.tsx`:
- Live started event row appears before completion.
  - `task` event `{ stepId: "step-id", taskId: "task_live", status: "started" }` and no completed result.
  - `step-id / task_live / started`.
- Started row updates instead of duplicating.
  - `api.workflows.getRun(...)` to return the same `(stepId, taskId)` first as `started`, then as `completed` with a matching step report.
  - `task_live` and its status changes to `completed`.
- Retries with new task ids remain distinct.
  - `stepId` with `task_old / failed` and `task_new / started`.
- Patch steps/events are not navigable sub-agent rows.
  - `patch` event and a patch step with `taskId`.
- Report enrichment does not cross retry attempts.
  - `task_old / failed` and `task_new / completed` with the same `stepId`, where only `task_new` has a matching step report.
  - `task_old` does not show `task_new`'s report, while `task_new` can show it.
- Clicking task row navigates to child workspace.
  - `WorkspaceStore` in a test wrapper.
  - `afterEach` so navigation state does not leak across tests.
  - `task_live`.
- Completed report affordance does not navigate.
  - `reportMarkdown`, click the report toggle/control and assert the report opens while navigation callback is not called.
- Regression coverage for existing events.
Backend tests
Update `src/node/services/workflows/WorkflowRunner.test.ts`:
- Emits a started task event when an agent task is created.
  - `lifecycle.onTaskCreated("task_live")` before resolving.
  - `task / started` before the later terminal task event.
- Serializes parallel started events.
  - `parallelAgents(...)` with two child specs and a fake adapter that calls both `onTaskCreated` callbacks concurrently.
- Resume path does not duplicate started events.
  - `waitForAgentTask(...)`, and assert no duplicate started event is appended.
- Validation retry keeps attempts separate.
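Several of the backend assertions above (strict sequence ordering, no duplicate started event, started before terminal) can be expressed as one reusable check over a recorded event log. The helper below is a hypothetical sketch, not part of the test suite: `findTaskEventViolations` and `LoggedTaskEvent` are invented names, and it assumes every non-`started` status follows a `started` event for the same attempt.

```typescript
// Hypothetical shape of a persisted task event, for illustration only.
type LoggedTaskEvent = { sequence: number; stepId: string; taskId: string; status: string };

// Returns human-readable violations; an empty array means the log looks consistent.
function findTaskEventViolations(events: LoggedTaskEvent[]): string[] {
  const violations: string[] = [];
  const startedKeys = new Set<string>();
  let previousSequence = Number.NEGATIVE_INFINITY;

  for (const event of events) {
    // Sequence numbers must be strictly increasing in log order.
    if (event.sequence <= previousSequence) {
      violations.push(`sequence ${event.sequence} is not strictly increasing`);
    }
    previousSequence = event.sequence;

    // JSON encoding avoids key collisions if an id contains a separator character.
    const key = JSON.stringify([event.stepId, event.taskId]);
    if (event.status === "started") {
      if (startedKeys.has(key)) {
        violations.push(`duplicate started event for ${key}`);
      }
      startedKeys.add(key);
    } else if (!startedKeys.has(key)) {
      // Assumption: any other status is expected only after a started event.
      violations.push(`${event.status} event for ${key} has no earlier started event`);
    }
  }
  return violations;
}
```

A test could run the workflow with a fake adapter, read back the events, and assert that the returned array is empty, which keeps the individual test cases short.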
Validation commands
Run after implementation:
Before final handoff/PR readiness:
Dogfooding plan
This follows the requested dogfood, `agent-browser`, and `dev-server-sandbox` workflows. Use direct `agent-browser` commands, never `npx agent-browser`.

Setup
Start an isolated app instance:
`make dev-server-sandbox DEV_SERVER_SANDBOX_ARGS="--clean-projects"`
Capture the emitted frontend URL and sandbox `MUX_ROOT` from the command output.
Before browser automation, load the installed CLI's current browser guide:
Prepare evidence directories:
Open the sandboxed frontend:
Scenario
In the sandbox app, create or use a test workspace for this repo.
Add a scratch workflow that launches at least two sub-agents with `parallelAgents(...)`; use prompts that require enough investigation to observe the in-progress state.
Run the workflow in background.
Start a recording before interacting with the workflow card:
Expand the workflow run card and capture the live state:
Click a started task row and verify navigation into the child workspace:
Navigate back to the workflow parent, wait for child completion, and capture the updated row:
Stop recording and capture diagnostics:
Dogfood acceptance gate
The reviewer evidence should include:
- `started-task-rows.png`: workflow task rows visible while child agents are still running.
- `navigated-to-child-task.png`: clicking a task row navigates into that sub-agent workspace.
- `completed-task-rows.png`: the same tasks show completed/failed state after finishing.
- `live-task-navigation.webm`: video of the live row appearing, click navigation, and row update.

Acceptance criteria
- `task / started` workflow event as soon as the child task id is created.
- `parallelAgents(...)` can launch multiple children concurrently without duplicate or out-of-order workflow event sequence numbers.
- `started`, `completed`, `failed`, or any future task status string persisted by the backend).
- `(stepId, taskId)` update one logical row.
- `taskId` appear as distinct attempts, even when they share the same workflow step id.

Risks and mitigations
- `parallelAgents(...)` can call `onTaskCreated` concurrently. Mitigate by serializing `sequence.next()` + `appendEvent(...)` with `AsyncMutex`, and test strict sequence ordering.
- `(stepId, taskId)` event existence checks before appending a defensive started event.
- `(stepId, taskId)`, not `(stepId, inputHash)`.
- `task` events emitted by agent launch paths; never create task rows directly from `run.steps`.

Quality gates
- `make typecheck` before dogfooding.
- `agent-browser` against a `dev-server-sandbox` instance.
- `make static-check` and fix failures before claiming completion.

_Generated with `mux` • Model: `openai:gpt-5.5` • Thinking: `xhigh` • Cost: `$248.13`_