🤖 fix: show live workflow task rows #3456

Merged: ThomasK33 merged 2 commits into main from task-list-v7y7 on Jun 4, 2026

ThomasK33 (Member) commented Jun 4, 2026

Summary

Shows workflow-spawned child tasks in the workflow run card as soon as they are created, keeps task attempts coalesced by exact step/task identity, and preserves direct child-workspace navigation even after a task has an inline report.

Background

Workflow runs previously surfaced sub-agent tasks only after their final report, leaving active parallel workflow work invisible. The first implementation added live rows; this follow-up also addresses review findings around completed-row navigation, stale resumed attempts, child failures after creation, replay backfills, and the duplicated mutex helper.

Implementation

  • Emits idempotent workflow task / started events when child tasks are created or resumed.
  • Coalesces workflow task rows by exact stepId + taskId, with reports matched by the same identity so retries stay distinct.
  • Splits task-row navigation from report expansion: row activation opens the child workspace; a sibling Report toggle expands markdown inline.
  • Backfills missing completed task events during replay and terminalizes stale/failed child task attempts.
  • Reuses the shared AsyncMutex implementation for serialized workflow task event writes.

Validation

  • bun test src/browser/features/Tools/WorkflowRunToolCall.test.tsx src/node/services/workflows/WorkflowRunner.test.ts — 43 passed.
  • make typecheck — passed.
  • make static-check — passed.
  • Dogfooded with a seeded dev-server-sandbox workflow card using agent-browser; captured screenshots for live task rows, report toggle expansion, child workspace navigation, plus a WebM recording.

Risks

Moderate workflow-card risk because this changes how workflow task lifecycle state is projected into the UI. Mitigations include exact task-attempt keys, backend idempotency checks, targeted UI/backend tests for retries and stale attempts, and sandbox dogfooding of the navigation/report paths.


📋 Implementation Plan

Plan: Show workflow-spawned sub-agents while running and navigate to them

Advisor review: approved after revisions. Key advisor-requested fixes incorporated: task events are authoritative, start events are serialized, report enrichment exact-matches task attempts, retries stay distinct, and patch steps are not misclassified as sub-agent rows.

Context and evidence

  • The user-observed gap is in the workflow run card's task/event list: workflow-spawned agents are only shown after completion, even though the workflow has already launched them.
  • Current backend lifecycle already persists launched workflow child tasks before completion:
    • src/node/services/workflows/WorkflowRunner.ts calls taskAdapter.runAgent(...) from runOrResumeAgentStep(...).
    • The onTaskCreated lifecycle callback records a started step with recordStepStarted(...) as soon as TaskService.create(...) returns a child taskId.
    • recordAgentResult(...) appends the visible workflow task event only after the agent reports, which explains why the UI only shows task rows at completion time.
  • Current persistence already exposes both data streams needed by the UI:
    • src/node/services/workflows/WorkflowRunStore.ts#getRun(...) returns events and steps.
    • WorkflowRunStore.readSteps(...) returns the latest step record per (stepId, inputHash), which is useful for current report/details enrichment. Workflow task events should be treated as the authoritative per-task-attempt timeline.
  • Current UI already refreshes active workflow runs:
    • src/browser/features/Tools/WorkflowRunToolCall.tsx polls api.workflows.getRun(...) every 2s while the run status is pending, running, or backgrounded.
    • The same component currently renders interestingEvents = run.events.filter(...), so run.steps are available but not used as first-class live rows.
  • TaskService.create(...) already emits workspace metadata for queued/running child tasks, so child workspaces can be navigated to by taskId as workspace id.
  • Advisor review identified one important correction: run.steps alone is not a safe source for live sub-agent rows because applyPatch steps also record taskId. The recommended plan therefore uses explicit task / started workflow events for agent launches, and uses steps only for report/details enrichment.

Goal

When a workflow launches sub-agent tasks, the workflow run card should immediately show those child tasks in the list, allow the user to navigate into a child task while it is running, and update the same row when the child finishes.

Recommended approach — backend task: started events plus UI task-row coalescing

Net product LoC estimate: +130 to +220 LoC.

This is the recommended implementation because it uses the workflow event stream as the authoritative source of user-visible task rows, avoids misclassifying patch steps as sub-agents, and still keeps the change scoped.

Backend implementation steps

  1. Serialize task-start workflow event writes in WorkflowRunner.ts.

    • Add a runner-local AsyncMutex or small helper that protects the pair sequence.next() + appendEvent(...) for start events.
    • This is required because parallelAgents(...) can invoke multiple onTaskCreated callbacks concurrently.
    • Assert defensively that runId, stepId, and taskId are non-empty before appending.
  2. Append task events with status: "started" when agent tasks are created.

    • In runOrResumeAgentStep(...), extend the existing onTaskCreated lifecycle callback:
      1. keep recording recordStepStarted(...) as today,
      2. then append a workflow event { type: "task", stepId, taskId, status: "started" } using the serialized event helper.
    • WorkflowRunEventSchema already permits string task statuses, so this should not require a schema migration.
    • Do not append task-start events from runApplyPatchStep(...); patch progress continues to use type: "patch" events.
  3. Handle resume/crash-recovery without duplicate start events.

    • When runOrResumeAgentStep(...) resumes an existing started step via waitForAgentTask(...), ensure there is at most one task / started event for the exact (stepId, taskId) pair.
    • For legacy started steps that predate this change and have no task event yet, append one defensively so crash-resumed/backgrounded runs can still show the live child row.
    • Use exact (stepId, taskId) identity, not (stepId, inputHash), because validation retries can reuse the logical step/input while launching a new child task id.
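
The three backend steps above can be sketched together as follows. This is a minimal illustration, not the actual WorkflowRunner code: `ChainMutex` stands in for the shared AsyncMutex, and `InMemoryRunLog`, `TaskEvent`, and `appendTaskStartedOnce` are hypothetical names invented for the example. It assumes the store append is asynchronous, which is why sequence allocation and the append are held under one lock.

```typescript
// Hypothetical event shape for illustration; the real workflow event schema may differ.
interface TaskEvent {
  sequence: number;
  type: "task";
  stepId: string;
  taskId: string;
  status: string;
}

// Minimal promise-chain mutex standing in for the shared AsyncMutex.
class ChainMutex {
  private tail: Promise<void> = Promise.resolve();

  runExclusive<T>(fn: () => Promise<T>): Promise<T> {
    const result = this.tail.then(fn);
    // Keep the chain usable even if fn rejects.
    this.tail = result.then(
      () => undefined,
      () => undefined,
    );
    return result;
  }
}

// Hypothetical in-memory stand-in for the run store.
class InMemoryRunLog {
  readonly events: TaskEvent[] = [];
  private nextSequence = 1;
  private readonly mutex = new ChainMutex();

  // Appends a task / started event at most once per exact (stepId, taskId) pair.
  // Resolves to true when an event was appended, false when one already existed.
  appendTaskStartedOnce(stepId: string, taskId: string): Promise<boolean> {
    if (stepId.length === 0 || taskId.length === 0) {
      return Promise.reject(new Error("stepId and taskId must be non-empty"));
    }
    return this.mutex.runExclusive(async () => {
      const exists = this.events.some(
        (e) => e.stepId === stepId && e.taskId === taskId && e.status === "started",
      );
      if (exists) return false;
      const sequence = this.nextSequence++;
      // Yield once to mimic an asynchronous store write between allocation and append.
      await Promise.resolve();
      this.events.push({ sequence, type: "task", stepId, taskId, status: "started" });
      return true;
    });
  }
}
```

Because the existence check runs inside the same critical section as the append, a resumed step and a concurrent `onTaskCreated` callback for the same pair cannot both write a start event.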

UI implementation steps

  1. Introduce a workflow display-row model in WorkflowRunToolCall.tsx.

    • Add a small discriminated union such as:
      • event rows for phase/log/validation/patch/error events.
      • task rows coalesced from workflow task events.
    • Task rows are keyed by exact task:${stepId}:${taskId}.
    • run.steps may be consulted only to enrich a task row with report markdown/details after completion; it must not independently create sub-agent rows.
    • Report enrichment must match the task row by exact taskId and preferably stepId; do not fall back to step-id-only matching. If no exact step/result exists for that task attempt, show no report for that row.
  2. Coalesce started/completed/failed task events without hiding retries.

    • For each exact (stepId, taskId) pair, render one task row.
    • Keep the earliest task event position for row ordering and the latest task event status for row text.
    • A retry with the same stepId but a new taskId should render as a distinct row/attempt, preserving any previous failed event.
    • Existing completed-only workflow runs still render because a lone completed task event forms the row.
  3. Make task rows navigable while preserving report access.

    • Use useWorkspaceStoreRaw().navigateToWorkspace(taskId) or an equivalent existing workspace-navigation path so the child task id routes to the child workspace.
    • Make the main task-row content clickable and keyboard accessible to satisfy the user expectation that clicking the row navigates to the sub-agent.
    • Avoid invalid nested interactive controls: split the row into sibling clickable regions, or use a role="button" row with no nested native button.
    • If a completed row has report markdown, provide a separate sibling report toggle/control that stops propagation and does not trigger navigation.
  4. Preserve existing behavior for non-task events.

    • Phase highlighting, log JSON expansion, validation/error tones, patch details, and final report rendering should continue to behave as they do today.
    • Patch rows remain patch rows and are not navigable sub-agent rows unless a separate patch-specific UX is added later.
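
The coalescing rules in steps 1 and 2 can be sketched as a pure function. The types below are simplified assumptions rather than the real workflow schemas, and `buildDisplayRows` is a hypothetical name; the real component may structure this differently.

```typescript
// Hypothetical, simplified shapes; the real event and step records carry more fields.
type TaskEvent = { type: "task"; stepId: string; taskId: string; status: string };
type OtherEvent = { type: "phase" | "log" | "validation" | "patch" | "error"; message: string };
type WorkflowEvent = TaskEvent | OtherEvent;

interface StepRecord {
  stepId: string;
  taskId?: string;
  reportMarkdown?: string;
}

interface EventRow {
  kind: "event";
  key: string;
  event: OtherEvent;
}

interface TaskRow {
  kind: "task";
  key: string;
  stepId: string;
  taskId: string;
  status: string;
  reportMarkdown?: string;
}

type DisplayRow = EventRow | TaskRow;

function buildDisplayRows(events: WorkflowEvent[], steps: StepRecord[]): DisplayRow[] {
  const rows: DisplayRow[] = [];
  const taskRows = new Map<string, TaskRow>();

  events.forEach((event, index) => {
    if (event.type !== "task") {
      rows.push({ kind: "event", key: `event:${index}`, event });
      return;
    }
    const key = `task:${event.stepId}:${event.taskId}`;
    const existing = taskRows.get(key);
    if (existing) {
      // Latest status wins; the row keeps the position of its earliest event.
      existing.status = event.status;
      return;
    }
    const row: TaskRow = {
      kind: "task",
      key,
      stepId: event.stepId,
      taskId: event.taskId,
      status: event.status,
    };
    taskRows.set(key, row);
    rows.push(row);
  });

  // Steps only enrich existing task rows, matched by exact stepId and taskId.
  // They never create rows, so a patch step with a taskId stays invisible here.
  taskRows.forEach((row) => {
    const match = steps.find((s) => s.stepId === row.stepId && s.taskId === row.taskId);
    if (match !== undefined && match.reportMarkdown !== undefined) {
      row.reportMarkdown = match.reportMarkdown;
    }
  });

  return rows;
}
```

A retry with a new taskId produces a new key and therefore a new row, while a started and a completed event for the same pair collapse into one row.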

Rejected alternative — UI-only rows from run.steps

Net product LoC estimate if chosen: +90 to +150 LoC.

This was the initial low-LoC idea, but it is not recommended because WorkflowStepRecord lacks a step kind and applyPatch steps also record taskId. A steps-only renderer would either risk briefly showing patch steps as sub-agents or require brittle inference from patch events/results.

Testing plan

Unit/UI tests

Update src/browser/features/Tools/WorkflowRunToolCall.test.tsx:

  1. Live started event row appears before completion.

    • Render a running workflow run with a task event { stepId: "step-id", taskId: "task_live", status: "started" } and no completed result.
    • Assert the task row text is visible, e.g. step-id / task_live / started.
  2. Started row updates instead of duplicating.

    • Mock api.workflows.getRun(...) to return the same (stepId, taskId) first as started, then as completed with a matching step report.
    • Assert there is exactly one visible row for task_live and its status changes to completed.
  3. Retries with new task ids remain distinct.

    • Render task events for the same stepId with task_old / failed and task_new / started.
    • Assert both attempts are visible so the new task does not overwrite the failed attempt.
  4. Patch steps/events are not navigable sub-agent rows.

    • Render a workflow run containing a patch event and a patch step with taskId.
    • Assert the patch row remains a patch row and no sub-agent navigation affordance is created from that step alone.
  5. Report enrichment does not cross retry attempts.

    • Render task_old / failed and task_new / completed with the same stepId, where only task_new has a matching step report.
    • Assert task_old does not show task_new's report, while task_new can show it.
  6. Clicking task row navigates to child workspace.

    • Install a test navigation callback through the raw WorkspaceStore in a test wrapper.
    • Restore/reset the singleton callback in afterEach so navigation state does not leak across tests.
    • Click the started task row and assert the callback receives task_live.
    • Also cover Enter/Space activation for the chosen accessible row control.
  7. Completed report affordance does not navigate.

    • For a completed task with reportMarkdown, click the report toggle/control and assert the report opens while navigation callback is not called.
    • Include a DOM-shape assertion or interaction test that catches invalid nested button behavior if native buttons are used.
  8. Regression coverage for existing events.

    • Keep/adjust existing assertions for phase/log/validation/patch/final report rendering.
    • Avoid tautological tests that only assert static copy; focus on status precedence, deduplication, event-kind separation, and navigation behavior.
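
Tests 6 and 7 exercise two handlers whose contract can be sketched without React or a DOM. The event interfaces below are minimal structural stand-ins, and `handleTaskRowKeyDown` and `handleReportToggleClick` are hypothetical names; the real component may wire this differently.

```typescript
// Minimal structural event types so the sketch does not depend on React or the DOM.
interface KeyLikeEvent {
  key: string;
  preventDefault(): void;
}

interface ClickLikeEvent {
  stopPropagation(): void;
}

// Row activation: Enter or Space navigates to the child workspace.
function handleTaskRowKeyDown(event: KeyLikeEvent, navigate: () => void): void {
  if (event.key === "Enter" || event.key === " ") {
    event.preventDefault(); // keep Space from scrolling the page
    navigate();
  }
}

// Report toggle: expands the report without triggering row navigation.
function handleReportToggleClick(event: ClickLikeEvent, toggleReport: () => void): void {
  event.stopPropagation(); // do not let the click reach the navigable row
  toggleReport();
}
```

Keeping both handlers as plain functions makes the navigation and report paths testable independently of the rendered markup.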

Backend tests

Update src/node/services/workflows/WorkflowRunner.test.ts:

  1. Emits a started task event when an agent task is created.

    • Use a fake task adapter that calls lifecycle.onTaskCreated("task_live") before resolving.
    • Assert the stored run includes task / started before the later terminal task event.
  2. Serializes parallel started events.

    • Use parallelAgents(...) with two child specs and a fake adapter that calls both onTaskCreated callbacks concurrently.
    • Assert workflow event sequences remain strictly increasing and both started rows are present.
  3. Resume path does not duplicate started events.

    • Pre-seed a started step/task and matching started event, resume via waitForAgentTask(...), and assert no duplicate started event is appended.
    • Optional legacy case: pre-seed a started step/task with no event, resume, and assert one started event is added.
  4. Validation retry keeps attempts separate.

    • Force one invalid structured output attempt and one retry with a new task id.
    • Assert the run preserves distinct task events for the failed and retried task ids.
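
Backend tests 1 and 2 rely on a fake adapter that reports the child task id before it resolves, plus an ordering check on the recorded sequences. A rough sketch of those two helpers follows; `TaskLifecycle`, `fakeRunAgent`, and `isStrictlyIncreasing` are hypothetical names, and the real task adapter interface may take different arguments.

```typescript
// Hypothetical lifecycle shape; the real adapter's callback signature may differ.
interface TaskLifecycle {
  onTaskCreated(taskId: string): Promise<void>;
}

// Fake agent run: reports the created task id first, then resolves with a report.
async function fakeRunAgent(taskId: string, lifecycle: TaskLifecycle): Promise<string> {
  await lifecycle.onTaskCreated(taskId);
  return `report for ${taskId}`;
}

// True when every value is greater than the one before it.
function isStrictlyIncreasing(values: number[]): boolean {
  let previous = -Infinity;
  for (const value of values) {
    if (value <= previous) return false;
    previous = value;
  }
  return true;
}
```

In a test, two `fakeRunAgent` calls can be started with `Promise.all` so that both `onTaskCreated` callbacks are in flight at once, after which the stored event sequences are passed to `isStrictlyIncreasing`.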

Validation commands

Run after implementation:

bun test src/browser/features/Tools/WorkflowRunToolCall.test.tsx
bun test src/node/services/workflows/WorkflowRunner.test.ts
make typecheck

Before final handoff/PR readiness:

make static-check

Dogfooding plan

This follows the requested dogfood, agent-browser, and dev-server-sandbox workflows. Use direct agent-browser commands, never npx agent-browser.

Setup

  1. Start an isolated app instance:

    make dev-server-sandbox DEV_SERVER_SANDBOX_ARGS="--clean-projects"

    Capture the emitted frontend URL and sandbox MUX_ROOT from the command output.

  2. Before browser automation, load the installed CLI's current browser guide:

    agent-browser skills get core
  3. Prepare evidence directories:

    mkdir -p dogfood-output/workflow-live-tasks/screenshots dogfood-output/workflow-live-tasks/videos
  4. Open the sandboxed frontend:

    agent-browser --session workflow-live-tasks open "$FRONTEND_URL"
    agent-browser --session workflow-live-tasks wait --load networkidle
    agent-browser --session workflow-live-tasks screenshot --annotate dogfood-output/workflow-live-tasks/screenshots/initial.png
    agent-browser --session workflow-live-tasks snapshot -i

Scenario

  1. In the sandbox app, create or use a test workspace for this repo.

  2. Add a scratch workflow that launches at least two sub-agents with parallelAgents(...); use prompts that require enough investigation to observe the in-progress state.

  3. Run the workflow in background.

  4. Start a recording before interacting with the workflow card:

    agent-browser --session workflow-live-tasks record start dogfood-output/workflow-live-tasks/videos/live-task-navigation.webm
  5. Expand the workflow run card and capture the live state:

    agent-browser --session workflow-live-tasks screenshot --annotate dogfood-output/workflow-live-tasks/screenshots/started-task-rows.png
    agent-browser --session workflow-live-tasks snapshot -i
  6. Click a started task row and verify navigation into the child workspace:

    # Use the element ref from snapshot output.
    agent-browser --session workflow-live-tasks click @eN
    agent-browser --session workflow-live-tasks wait --load networkidle
    agent-browser --session workflow-live-tasks screenshot --annotate dogfood-output/workflow-live-tasks/screenshots/navigated-to-child-task.png
  7. Navigate back to the workflow parent, wait for child completion, and capture the updated row:

    agent-browser --session workflow-live-tasks screenshot --annotate dogfood-output/workflow-live-tasks/screenshots/completed-task-rows.png
  8. Stop recording and capture diagnostics:

    agent-browser --session workflow-live-tasks record stop
    agent-browser --session workflow-live-tasks errors
    agent-browser --session workflow-live-tasks console

Dogfood acceptance gate

The reviewer evidence should include:

  • started-task-rows.png: workflow task rows visible while child agents are still running.
  • navigated-to-child-task.png: clicking a task row navigates into that sub-agent workspace.
  • completed-task-rows.png: the same tasks show completed/failed state after finishing.
  • live-task-navigation.webm: video of the live row appearing, click navigation, and row update.
  • Console/errors output with no new relevant browser errors.

Acceptance criteria

  • A workflow-spawned child task produces a task / started workflow event as soon as the child task id is created.
  • parallelAgents(...) can launch multiple children concurrently without duplicate or out-of-order workflow event sequence numbers.
  • The workflow card shows the child task row before the child reports.
  • The row includes step id, child task id, and current status (started, completed, failed, or any future task status string persisted by the backend).
  • Clicking the task row navigates to the child workspace while it is still running.
  • Completed task rows still provide access to the child report without hijacking row navigation.
  • A single task attempt appears once: started/completed/failed statuses for the same (stepId, taskId) update one logical row.
  • Retries with a new taskId appear as distinct attempts, even when they share the same workflow step id.
  • Patch events/steps are not rendered as navigable sub-agent rows.
  • Existing workflow phase/log/validation/patch/error/result display remains unchanged.

Risks and mitigations

  • Concurrent started-event writes: parallelAgents(...) can call onTaskCreated concurrently. Mitigate by serializing sequence.next() + appendEvent(...) with AsyncMutex, and test strict sequence ordering.
  • Duplicate started events on resume: Crash recovery can encounter already-started steps. Mitigate with exact (stepId, taskId) event existence checks before appending a defensive started event.
  • Retry row collapse: Retries can reuse the same step id/input hash. Mitigate by keying task rows by (stepId, taskId), not (stepId, inputHash).
  • Patch misclassification: Patch steps also carry task ids. Mitigate by creating visible live sub-agent rows only from task events emitted by agent launch paths; never create task rows directly from run.steps.
  • Navigation vs report expansion conflict: Make navigation and report expansion separate sibling controls or otherwise avoid nested interactive elements; add interaction tests for both paths.
  • Stale child task ids after cleanup: Active/running workflow children should still be present. For old completed rows whose child workspace was cleaned up, navigation may no-op; the report remains visible from persisted step result.

Quality gates

  1. Backend gate: Add started-event emission and WorkflowRunner tests for serial ordering, resume de-dupe, and retry identity before changing UI navigation.
  2. UI gate: Add display-row coalescing and workflow-card tests for started rows, no duplicates, retries, patch separation, navigation, and report toggling.
  3. Targeted validation gate: Run the workflow runner and workflow tool-call test files after backend/UI changes.
  4. Type gate: Run make typecheck before dogfooding.
  5. Dogfood gate: Capture screenshots and video through agent-browser against a dev-server-sandbox instance.
  6. Final gate: Run make static-check and fix failures before claiming completion.

Generated with mux • Model: openai:gpt-5.5 • Thinking: xhigh • Cost: $248.13

ThomasK33 added 2 commits June 4, 2026 18:11
Show workflow child task rows while they are running and make completed rows reveal their reports inline.

---

_Generated with `mux` • Model: `openai:gpt-5.5` • Thinking: `xhigh` • Cost: `$178.77`_

<!-- mux-attribution: model=openai:gpt-5.5 thinking=xhigh costs=178.77 -->
Restore separate navigation and report controls for completed workflow task rows, and terminalize child task attempts that would otherwise remain stuck in started state after resume or execution failures.

---

_Generated with `mux` • Model: `openai:gpt-5.5` • Thinking: `xhigh` • Cost: `$233.25`_

<!-- mux-attribution: model=openai:gpt-5.5 thinking=xhigh costs=233.25 -->
@ThomasK33
Member Author

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Another round soon, please!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

ThomasK33 added this pull request to the merge queue on Jun 4, 2026
Merged via the queue into main with commit 52c6aec on Jun 4, 2026
24 checks passed
ThomasK33 deleted the task-list-v7y7 branch on June 4, 2026 20:05
mux-bot (bot) mentioned this pull request on Jun 4, 2026