Proj/task cancellation#680
Conversation
- ChatSession.cancel(taskId?) returns boolean so callers can tell whether a controller was found and aborted vs no-op. - CipherAgent.cancelTask(taskId) fans cancel across live sessions and returns true when any session held the controller. Idempotent. - Legacy no-arg cancel semantics unchanged (REPL Esc path).
- New handleAgentCancelEvent helper delegates to CipherAgent.cancelTask and emits task:cancelled upstream only when an active controller was held. Best-effort: any error from agent or transport is logged and swallowed so the cancel pipeline never crashes the agent. - agent-process registers the listener alongside task:execute. - Unit tests cover matched emit, unmatched no-op, agent-throw and transport-throw paths, idempotency, and per-event logging.
- handleExecutorTerminalError helper branches the executor catch: SessionCancelledError suppresses task:error (T1.1 emits task:cancelled); anything else takes the existing task:error path. Curate Phase 4 is naturally skipped because the executor throws before postWork is assigned; dream's finally still releases the lock via rollback on the abort path. - task-router.handleTaskCancelled now calls agentPool.notifyTaskCompleted so the project FIFO drains after a cancel, symmetric with the completed/error paths. - New IAgentPool.cancelQueuedTask delegates to ProjectTaskQueue.cancel. task-router.handleTaskCancel checks the queue first; queued tasks are removed and the daemon emits task:cancelled directly, never forwarding to the agent that holds no controller for them. - Extract cancelTaskLocally to share broadcast and hook plumbing between the queued-cancel and no-agent-connected branches. - Tests: 5 for handleExecutorTerminalError, 4 for task-router cancel paths, 3 for agent-pool cancelQueuedTask. Update 7 existing mock IAgentPool to satisfy the new interface method.
- New test/integration/cancel-pipeline.test.ts covers the 5 T1.4 scenarios: cancel running, cancel running with queued, cancel queued, idempotent double-cancel, follow-up after cancel. - Harness uses a real SocketIOTransportServer, a real TaskRouter, and a stub IAgentPool backed by a real ProjectTaskQueue so the queue/drain logic is exercised end-to-end. A TransportClient acts as the mock agent and honours per-task behaviors (wait-for-cancel or auto-complete) so the test driver controls when each task ends. - History persistence is asserted via an ITaskLifecycleHook recorder rather than a real FileCurateLogStore — the recorder is the same hook interface CurateLogHandler implements in production to write status='cancelled' to history. The file-write itself stays covered by file-curate-log-store unit tests. - Suite stays in-memory; waitFor polls with a bounded 1500ms deadline. Whole file finishes in ~210ms with no hanging promises.
- New src/oclif/lib/cancel-task.ts exposes runCancelTask({client,
command, format, log, taskId}) — the only place in the CLI where the
task:cancel transport event name and request shape appear. T2.2-T2.5
per-command --cancel flags will call it.
- Text format prints "Cancelled <id>" on success, "Failed to cancel
<id>: <reason>" on daemon-reported failure, with a generic fallback
when the daemon omits the reason.
- JSON format reuses writeJsonResponse so the payload shape stays
consistent with other commands' --format json output. The caller's
command name is stamped on the envelope so curate/query/dream stay
greppable.
- Returns boolean so the per-command caller can decide on exit code.
- 7 unit tests cover both formats, success and failure paths, transport
payload shape, and the missing-error-reason fallback.
- New --cancel <id> flag short-circuits the create flow: no provider validation, no task:create, no context required. Delegates to the shared runCancelTask helper from T2.1 and exits 1 on daemon-reported failure so scripts can rely on the exit code. - Mutually exclusive with --files, --folder, --detach (oclif's `exclusive`). --timeout is left allowed and silently ignored on the cancel branch; rejecting a harmless combo would surprise users. - Existing curate behavior is untouched. The cancel branch is added alongside the normal path with zero overlap on the create code. - 10 new tests in test/commands/curate.test.ts cover both formats, success and failure paths, mutex enforcement against each of the three exclusive flags, and the timeout-allowed case.
- Add --cancel <id> flag that short-circuits the query flow through the shared runCancelTask helper from T2.1. - Relax the positional `query` argument to optional so --cancel can be used without a placeholder query string. A manual guard in run() preserves the existing missing-query error when neither input is given. - Reject "<query> --cancel <id>" as a mutually exclusive combination with a clear error message in both text and JSON formats, exiting non-zero without touching the transport. - Help text on both surfaces reflects the dual mode (run a query, or cancel by id). - 7 new tests cover both formats, success and failure paths, the positional+cancel conflict, and missing-both rejection.
- Add --cancel <id> flag that short-circuits the dream flow through the shared runCancelTask helper from T2.1. Branches BEFORE the --undo path so cancel never accidentally triggers a revert. - Mutually exclusive with --force, --undo, --detach (oclif's `exclusive`). --timeout is left allowed and silently ignored on the cancel branch. - Help text makes clear that cancel is a hard stop and not a revert; pair with `brv dream --undo` for rollback of completed/partial dreams. - 9 new tests cover both formats, success and failure paths, the mutex against each of the three exclusive flags, and the timeout-allowed case.
- waitForTaskCompletion installs a SIGINT handler for the duration of the wait. First Ctrl-C emits task:cancel for the current task and writes a "Cancelling task..." hint to stderr (kept off stdout so --format json output stays clean). Second Ctrl-C hard-exits with code 130 (conventional SIGINT exit) without re-emitting cancel. - The wait loop now subscribes to task:cancelled as a terminal event. Previously it only handled task:completed and task:error, which meant a remote --cancel or a Ctrl-C cancel left the foreground process hanging until the existing timeout fired. The new branch fires the optional onCancelled callback and resolves the promise. - The SIGINT handler is added to the same unsubscribers chain the existing cleanup() drains, so it is removed on every terminal path (completed, cancelled, error, disconnect, timeout). - 8 new unit tests in test/unit/oclif/lib/task-client-sigint.test.ts cover both terminal recognition and the SIGINT install/uninstall contract.
- Add src/tui/features/tasks/api/cancel-task.ts mirroring the provider/api/cancel-oauth pattern: pulls the active apiClient from the TUI transport store, emits task:cancel with the taskId, returns the daemon's TaskCancelResponse, throws when the response reports success: false. - No new transport client construction; reuses the established TUI pattern. Boundary preserved — no imports from server/, agent/, or oclif/. - 5 unit tests cover the event payload, the success resolve, the daemon-reported error path, the missing-error fallback, and the not-connected guard. Test file establishes the new test/unit/tui/features/tasks/api/ folder for T4.2 to extend.
- New useCancelRunningTaskKeybind hook installs a scoped useInput
that fires only while there is a non-terminal curate/query task in
the tasks store. On Ctrl+Q the hook calls cancelTask({taskId}) for
the most recently created running task.
- Architectural note: curate-flow and query-flow components unmount
within ~100ms of task:create ack, so they cannot own a binding that
must be active for the entire task lifetime. The keybind is scoped
by running-task presence in the store, hosted via an invisible
CancelKeybindInitializer mounted next to TaskSubscriptionInitializer.
REPL UX is unchanged (prompt stays responsive while task runs).
- A pure selectCancelTargetTaskId selector picks the target taskId —
most recently created non-terminal task, or undefined when nothing
is cancellable. Splitting the pure logic from the React hook keeps
it unit-testable without Ink.
- Ctrl-C remains the TUI's exit shortcut; only Ctrl+Q is intercepted.
Known limitation: some terminal emulators map Ctrl+Q to XON flow-
control; Ink raw mode should neutralize that on common emulators
but is not guaranteed everywhere.
- 6 unit tests cover the selector: empty map, all-terminal, single
running, multiple running (newest wins), created-status, and
terminal-newer-than-running.
…ery typing Post-review cleanup of T2.5 (ENG-2783): - Extract runCancelBranchWithRetry into src/oclif/lib/cancel-task.ts so curate / query / dream stop cloning the same 22-line withDaemonRetry wrapper. Per-command runCancelBranch becomes a 10-line delegation: pass command name, daemon-client options, format, log, transport-error handler, and taskId; receive a boolean for exit-code decision. - Wrap the SIGINT handler's client.request call in try/catch so a synchronously-throwing transport (already-disconnected socket, etc.) cannot escape the signal handler and crash the process with an uncaught exception. The wait loop still surfaces the disconnect via timeout or state-change paths. - Drop the redundant 'as QueryFlags' cast in query.ts entirely along with the now-unused QueryFlags type alias. Use oclif's inferred flag type directly and narrow 'format' inline via a ternary, matching the pattern in curate/index.ts. - 3 new unit tests for runCancelBranchWithRetry: success path, daemon- reported failure (no transport throw), and transport-throw routed to onTransportError. Existing per-command --cancel tests pass unchanged because the per-command behaviour is the same.
When the foreground curate / query / dream wait loop sees a
task:cancelled broadcast (driven by a remote 'brv <cmd> --cancel <id>'
from another terminal, or the TUI Ctrl+Q keybind), the command now:
- prints '✗ <Command> cancelled (Task: <id>)' in text mode, or writes
a {status: 'cancelled', event: 'cancelled'} JSON envelope in json
mode, so consumers can distinguish cancel from success;
- exits with code 130 (the conventional SIGINT exit) so shell scripts
and CI can branch on cancellation.
Wiring detail worth noting: the wasCancelled flag is hoisted to run()
and bubbled up by submitTask via its return shape. The this.exit(130)
call lives AFTER the withDaemonRetry try/catch — throwing it inside
the callback would be swallowed by the catch (which routes any throw
through reportError) and the exit code would be lost.
6 new unit tests cover the remote-cancel path in text and json mode
across all three commands. Existing tests still pass unchanged.
When withDaemonRetry retries a task:cancel after a dropped response, the
daemon's handleTaskCancel would look up the taskId in this.tasks, miss
(the prior cancel had already moved it to completedTasks), and return
{error: 'Task not found', success: false}. The CLI then reported a
misleading 'Failed to cancel' even though the cancel actually worked.
Make the lookup idempotent: if the taskId is absent from this.tasks but
present in completedTasks with status: 'cancelled', return success.
Truly unknown taskIds and tasks that reached a different terminal state
(completed / error) still return the structured error, so the existing
contract for those cases is preserved.
This covers Nit-3 from the cancel-pipeline post-review.
1 new unit test in test/unit/infra/process/task-router.test.ts asserts
the retry-after-cancel path returns {success: true}. Existing tests
including 'should return error for unknown taskId' continue to pass.
…cess flag
Two small post-review nits from the cancel-pipeline review:
Nit-4 — empty catch {} in the SIGINT handler hid programmer-error
throws (bad payload, transport regression) silently. Add a single
stderr write inside the catch so a genuine bug leaves a breadcrumb.
Comment expanded to spell out why we still swallow rather than
propagate (the wait loop's timeout / state-change paths already
surface a true transport drop).
Nit-5 — the cancelled-path JSON envelope previously set success: true
while the process exits 130. Two signals pointing opposite ways
confused consumers parsing both. Flip JSON success to false so it
tracks the exit code; cancellation semantics still live in
data.status: 'cancelled' for code that needs to distinguish cancel
from error. Updated the 3 'remote cancel during foreground wait' JSON
tests across curate/query/dream to expect success: false. No
behaviour change in text mode or in exit codes.
Touches src/oclif/lib/task-client.ts and the 3 CLI command files only;
test/commands/{curate,query,dream}.test.ts get the matching assertion
flip.
The previous Nit-3 fix only looked up the cancelled status in the in-memory `completedTasks` map. That map is cleared by a setTimeout after `TASK_CLEANUP_GRACE_PERIOD_MS` (5s), so a cancel retry arriving after the grace window would still report "Task not found" even though the task was successfully cancelled — semantically wrong, since the task is in fact cancelled in the persistent history. Extend handleTaskCancel with a durable second fallback: when the task is gone from both this.tasks AND completedTasks, look it up by id in the persistent task history store (resolved via the client's project path). If the persisted status is 'cancelled', return success. Any other status (completed / error) or a missing entry still returns the structured "Task not found". The handler becomes async because store.getById is async; the existing sync test invocations get `await` added. No production caller depended on the sync return — `transport.onRequest` supports both shapes. 2 new tests in task-router.test.ts: - 'returns success on a retry after completedTasks has aged out — durable via persistent history store' - 'still returns Task not found when the history store has a non-cancelled status (e.g. completed)' The original in-memory fast path is preserved so the common case (retry within 5s) avoids a disk read.
- task-router: cancelTaskLocally now stamps status='cancelled' and routes through moveToCompleted so the in-memory fast-path idempotency catches sequential retries on the queued/no-agent cancel paths. Previously the retry raced the fire-and-forget durable hook write and could see a misleading 'Task not found'. Adds regression test. - transport: document TaskCancelResponse.success semantics across the four reply paths (queued, forwarded, idempotent retry, not-found), plus an inline note above the forward-to-agent branch in task-router. - oclif: inline the runCancelBranch wrapper in curate/dream/query — the three identical 9-line private methods are gone; the single use site in each command now calls runCancelBranchWithRetry directly. - task-client: introduce WaitForTaskClient = Pick<ITransportClient, ...> so the SIGINT test stub satisfies the contract structurally without an 'as unknown as ITransportClient' cast; installSigintCancel narrowed to Pick<..., 'request'> for the same reason. - tui: expand JSDoc on use-cancel-running-task-keybind with a RESERVED CHORD WARNING so contributors know Ctrl+Q is globally intercepted whenever any non-terminal task exists in the tasks store.
Conflicts: src/oclif/commands/curate/index.ts src/oclif/commands/dream.ts src/oclif/commands/query.ts src/oclif/lib/task-client.ts
|
Claude finished @bao-byterover's task in 6m 49s —— View job Claude PR Review – #680 (Task Cancellation)
SummaryOverall score: 4.5 / 5 This is a thoughtful, well-tested implementation of an end-to-end cancel pipeline that threads cleanly through the CLI ↔ daemon ↔ agent process boundaries. The design splits responsibilities correctly: oclif owns SIGINT and Strengths
Issues / suggestions (inline)
None of these block landing. Issues 1 and 2 are the only ones with real production-impact potential; the rest are quality/consistency polish. |
| try { | ||
| deps.client.request(TaskEvents.CANCEL, {taskId: deps.taskId}) | ||
| } catch (error) { | ||
| // Transport unavailable — the wait loop will time out or see a | ||
| // disconnect event and surface that path instead. Log the swallowed | ||
| // throw to stderr so a genuine bug (bad payload, transport regression) | ||
| // leaves a breadcrumb instead of failing silently. | ||
| const message = error instanceof Error ? error.message : String(error) | ||
| process.stderr.write(`[brv] cancel request failed locally: ${message}\n`) | ||
| } |
There was a problem hiding this comment.
issue (bug): The try/catch only catches synchronous throws, but client.request(...) is declared to return a Promise (see WaitForTaskClient = Pick<ITransportClient, 'on' | 'onStateChange' | 'request'>). If the request rejects asynchronously (network blip, daemon disconnected mid-send), the promise becomes unhandled — which can crash Node 22+ via the default unhandledRejection behavior, defeating the whole point of "Ctrl-C should never make things worse than they were."
Suggest attaching a .catch so async rejections take the same logging path:
| try { | |
| deps.client.request(TaskEvents.CANCEL, {taskId: deps.taskId}) | |
| } catch (error) { | |
| // Transport unavailable — the wait loop will time out or see a | |
| // disconnect event and surface that path instead. Log the swallowed | |
| // throw to stderr so a genuine bug (bad payload, transport regression) | |
| // leaves a breadcrumb instead of failing silently. | |
| const message = error instanceof Error ? error.message : String(error) | |
| process.stderr.write(`[brv] cancel request failed locally: ${message}\n`) | |
| } | |
| try { | |
| void Promise.resolve(deps.client.request(TaskEvents.CANCEL, {taskId: deps.taskId})).catch((error: unknown) => { | |
| const message = error instanceof Error ? error.message : String(error) | |
| process.stderr.write(`[brv] cancel request failed locally: ${message}\n`) | |
| }) | |
| } catch (error) { | |
| // Transport unavailable — the wait loop will time out or see a | |
| // disconnect event and surface that path instead. Log the swallowed | |
| // throw to stderr so a genuine bug (bad payload, transport regression) | |
| // leaves a breadcrumb instead of failing silently. | |
| const message = error instanceof Error ? error.message : String(error) | |
| process.stderr.write(`[brv] cancel request failed locally: ${message}\n`) | |
| } |
| // If Agent connected for this task's project, forward cancel request. | ||
| // NOTE: `success: true` here means "queued for forward", NOT "agent | ||
| // confirmed cancellation". The agent may have already finished the task | ||
| // (race window between daemon dispatch and agent terminal event), in | ||
| // which case agent-cancel-listener returns false and emits no | ||
| // task:cancelled. Callers must rely on the terminal event broadcast, | ||
| // not this success flag, to know the final task state. See JSDoc on | ||
| // TaskCancelResponse for full semantics. |
There was a problem hiding this comment.
suggestion (quality): The forward-to-agent path right below this comment does not stamp any "cancel-in-flight" marker on the live TaskInfo before returning. A rapid double-cancel (or a retried cancel request) will re-enter this branch and forward task:cancel to the agent a second time before the first cancel's terminal task:cancelled has been emitted.
handleAgentCancelEvent correctly returns silently on the second call, so this isn't a correctness bug — but it does mean an extra socket round-trip and an extra "task:cancel received" log line per duplicate.
Consider stamping this.tasks.set(taskId, {...task, status: 'cancelling'}) and short-circuiting subsequent cancels of an already-cancelling task with {success: true}. The fast-path in the no-task branch above (prior?.task.status === 'cancelled') already established a precedent for status-based idempotency.
Not blocking — scenario 4 in cancel-pipeline.test.ts already proves the user-observable outcome is one terminal event. Flagging in case you want to tighten the daemon↔agent chatter.
| this.tasks.set(taskId, {...task, completedAt: Date.now(), status: 'cancelled'}) | ||
| this.moveToCompleted(taskId) | ||
| this.notifyHooksCancelled(taskId, task).catch(() => {}) |
There was a problem hiding this comment.
nit (consistency): Line 623 stamps status: 'cancelled' on the in-memory entry, but the task argument forwarded to notifyHooksCancelled on line 625 is still the un-mutated reference captured by the caller — hooks see a TaskInfo without status: 'cancelled' and without completedAt. The TaskHistoryHook.onTaskCancelled builds its persisted entry from a separate code path so this doesn't break persistence, but a future hook that reads task.status directly would observe an inconsistent snapshot.
Trivial fix: forward the just-set value so the hook sees what it expects.
| this.tasks.set(taskId, {...task, completedAt: Date.now(), status: 'cancelled'}) | |
| this.moveToCompleted(taskId) | |
| this.notifyHooksCancelled(taskId, task).catch(() => {}) | |
| const cancelledTask = {...task, completedAt: Date.now(), status: 'cancelled' as const} | |
| this.tasks.set(taskId, cancelledTask) | |
| this.moveToCompleted(taskId) | |
| this.notifyHooksCancelled(taskId, cancelledTask).catch(() => {}) |
| * RESERVED CHORD WARNING (read before binding Ctrl+Q anywhere else in the TUI): | ||
| * - This keybind is mounted globally via `app-providers.tsx → CancelKeybindInitializer`, | ||
| * so as soon as ANY non-terminal task exists in the tasks store, Ctrl+Q is | ||
| * intercepted everywhere — text inputs, REPL prompt, search box, modals, etc. | ||
| * - DO NOT bind Ctrl+Q to any other action while a curate/query/dream task can | ||
| * be in flight; the global cancel will eat the chord first. | ||
| * - The binding targets `selectCancelTargetTaskId`'s "most recently created | ||
| * non-terminal task." With multiple in-flight tasks, Ctrl+Q always cancels | ||
| * the most recent one, not the one currently focused in the UI. | ||
| * - There is no confirmation step today — the cancel network round-trip starts | ||
| * the instant the chord lands. If a real-world conflict surfaces (user | ||
| * accidentally cancelling), the recommended follow-up is to mirror the SIGINT | ||
| * double-tap pattern in `installSigintCancel` (first Ctrl+Q arms, second | ||
| * within Ns fires). | ||
| */ |
There was a problem hiding this comment.
praise: This RESERVED CHORD WARNING block is exactly the kind of why-this-exists doc that prevents accidental regressions. The fact that it explicitly calls out the global interception, the "no confirmation" risk, and the proposed double-tap mitigation path makes future debates much easier.
| /** | ||
| * Daemon's reply to a `task:cancel` request. | ||
| * | ||
| * Semantics of `success`: | ||
| * - `true` from a queued/local-cancel path → terminal `task:cancelled` has | ||
| * already been broadcast by the daemon; the task is fully cancelled. | ||
| * - `true` from the forward-to-agent path → the cancel was forwarded to a | ||
| * live agent, NOT confirmed. The agent may have already finished the task | ||
| * (race), in which case no `task:cancelled` follows; the wait loop must | ||
| * still rely on the actual terminal event broadcast (`task:cancelled` / | ||
| * `task:completed` / `task:error`) to determine the final state. | ||
| * - `true` from an idempotent retry → the task was previously observed as | ||
| * `status: 'cancelled'` (in-memory cache or persistent history store). | ||
| * - `false` with `error: 'Task not found'` → daemon has no record of the | ||
| * task in any tier (active / completed-cache / persistent history). | ||
| */ | ||
| export interface TaskCancelResponse { |
There was a problem hiding this comment.
praise: The semantics matrix on TaskCancelResponse.success is genuinely excellent — distinguishing "queued-for-forward" from "agent-confirmed" from "idempotent-retry" upfront removes the kind of "wait, what does true actually mean?" ambiguity that bites cancel APIs everywhere. The accompanying Callers must rely on the terminal event broadcast directive is the right escape hatch and is consistently honored by both runCancelTask (oclif) and useCancelRunningTaskKeybind (TUI).
| export const cancelTask = async (payload: TaskCancelRequest): Promise<TaskCancelResponse> => { | ||
| const {apiClient} = useTransportStore.getState() | ||
| if (!apiClient) throw new Error('Not connected') | ||
|
|
||
| const response = await apiClient.request<TaskCancelResponse, TaskCancelRequest>(TaskEvents.CANCEL, payload) | ||
| if (!response.success) throw new Error(response.error ?? 'Cancel failed') | ||
| return response | ||
| } |
There was a problem hiding this comment.
nit (consistency): This helper throws on !response.success, but the only caller (useCancelRunningTaskKeybind → .catch(() => {})) intentionally silences errors per its comment. That's a legitimate design choice, but it means:
- The
error: response.error ?? 'Cancel failed'message is built and immediately discarded — dead code at the integration point. - A "Task not found" idempotent retry (which the daemon doc-comments explicitly bless as expected) is treated as an error by this helper.
If you want to keep the throw semantics for future callers, that's fine — but consider noting in the JSDoc that the daemon's success: false for "Task not found" is a documented idempotent-retry outcome, not a failure, so future consumers don't wire it as user-visible error UI. Alternatively, return the full TaskCancelResponse unchanged and let each caller decide.
| const wait = waitForTaskCompletion( | ||
| { | ||
| client: stub.client, | ||
| command: 'curate', | ||
| format: 'text', | ||
| onCancelled: (result) => cancelled.push(result), | ||
| onCompleted() {}, | ||
| onError() {}, | ||
| taskId: 'task-1', | ||
| timeoutMs: 5000, | ||
| }, | ||
| () => {}, |
There was a problem hiding this comment.
nit: Multiple test cases pass timeoutMs: 5000 to waitForTaskCompletion, but WaitForTaskOptions (in src/oclif/lib/task-client.ts) does not declare a timeoutMs field — only the deprecated --timeout flag still exists in the command layer, and the heartbeat-driven stale check is the actual liveness signal now.
These extra keys are silently dropped (test tsconfig isn't strict about excess props on inline literals), so the tests pass — but they read as if there's a per-test override. Either drop the key throughout this file or thread it through to a real option to keep the test surface honest. Same pattern appears in scenarios at lines 95, 121, 147, 178, 206, 232, 259, 288.
| if (this.getTaskHistoryStore !== undefined) { | ||
| const projectPath = this.resolveClientProjectPath?.(clientId) | ||
| if (projectPath !== undefined) { | ||
| try { | ||
| const store = this.getTaskHistoryStore(projectPath) | ||
| const entry = await store.getById(taskId) | ||
| if (entry?.status === 'cancelled') { | ||
| return {success: true} | ||
| } | ||
| } catch (error) { | ||
| transportLog( | ||
| `handleTaskCancel: history-store lookup failed for ${taskId}: ${error instanceof Error ? error.message : String(error)}`, | ||
| ) | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
issue (resilience): This durable-idempotency path is unbounded — store.getById(taskId) runs without a timeout, and the daemon will hold the cancel request open until disk I/O completes. On a healthy filesystem this is microseconds, but clearCompleted running on the same project can compact the index concurrently, and task:list pass-2 already races against getById in handleTaskList (line 1466). A stuck store call here will hang every subsequent task:cancel from the same client because the wire layer serializes ACK responses per request id.
Two cheap improvements:
- Race the
getByIdagainst a boundedsetTimeout(1–2s is generous) so a misbehaving store falls through toTask not foundrather than blocking the response. - The catch block already logs and falls through to the not-found return — verify by reading: ✅ it does. So the bug is purely "no timeout," not "no recovery."
Not blocking — the integration test suite never exercises a slow store — but cheap insurance for production.
The waitForTaskCompletion options interface lost timeoutMs when the per-call setTimeout was replaced with a heartbeat-driven stale check, but this test file kept the field, breaking the suite with TS2353.
|
Claude finished @bao-byterover's task in 3m 27s —— View job Claude PR Review – #680 (Task Cancellation) — Re-review
Summary (re-review)Updated overall score: 4.5 / 5 — unchanged. The branch is ship-ready modulo two real issues (#1, #2 below) that are cheap to fix and have known production-impact paths. What was addressed since the last review (5/6 acknowledged, 1/6 actually closed)
The follow-up commits did land valuable adjacent improvements that weren't on the original list — calling those out so they don't go unnoticed:
New finding from this pass
What's still solid
RecommendationLand it. Issues #1, #2, and the new #7 are the only items I'd want addressed in this branch (each is a 1–3 line change); #3, #4, #5 are polish that can ship as a follow-up. Two of three blockers are async-rejection patterns and share the same one-line fix shape — |
| try { | ||
| const cancelled = await agent.cancelTask(taskId) | ||
| if (!cancelled) return | ||
| transport.request(TransportTaskEventNames.CANCELLED, {taskId}) |
There was a problem hiding this comment.
issue (bug): Same async-rejection class as Issue #1 from the previous review — but with a sharper consequence here. transport.request(...) is declared to return a Promise, but it's not awaited and isn't .catch()-ed. The surrounding try/catch only handles the synchronous throw path and the await agent.cancelTask(taskId) rejection; an async rejection from transport.request(...) escapes both.
That matters more on this path than on the SIGINT one because agent-process.ts:942 installs a process-level unhandledRejection handler that calls process.exit(1). So a transient transport hiccup while emitting the post-cancel task:cancelled echo would kill the whole agent child process — exactly the surface the cancel pipeline was built to keep cheap and isolated.
Cheap fix — make the fire-and-forget intent explicit and contain the failure:
| transport.request(TransportTaskEventNames.CANCELLED, {taskId}) | |
| const cancelled = await agent.cancelTask(taskId) | |
| if (!cancelled) return | |
| Promise.resolve(transport.request(TransportTaskEventNames.CANCELLED, {taskId})).catch((emitError: unknown) => { | |
| const emitMessage = emitError instanceof Error ? emitError.message : String(emitError) | |
| log(`task:cancelled emit failed taskId=${taskId} err=${emitMessage}`) | |
| }) | |
| } catch (error) { | |
| const message = error instanceof Error ? error.message : String(error) | |
| log(`task:cancel handler error taskId=${taskId} err=${message}`) | |
| } |
Note: same fix is still needed at src/oclif/lib/task-client.ts:205 (Issue #1 from the prior review, still open on this commit).
feat: [ENG-2784] add WebUI cancel button to task list rows and detail…
|
Claude finished @ncnthien's task in 3m 40s —— View job Claude PR Review – #680 (Task Cancellation) — Re-review of latest commits
Summary (re-review)Overall score: 4.5 / 5 — unchanged. Branch remains ship-ready. The new WebUI cancel surface (commits Status of items from the prior two reviews
Findings from this pass (the latest two commits)
What I'd ship as-is vs. what I'd want fixed before landingBlocking-shaped (still 1–3 lines each):
Polish that can ship as a follow-up:
What remains rock-solid
RecommendationLand it after addressing Issues #1, #2, #7 (each is a one- to three-line change with a known fix pattern). The rest are quality polish. |
| if (!apiClient) throw new Error('Not connected') | ||
|
|
||
| const response = await apiClient.request<TaskCancelResponse, TaskCancelRequest>(TaskEvents.CANCEL, payload) | ||
| if (!response.success) throw new Error(response.error ?? 'Cancel failed') |
There was a problem hiding this comment.
issue (consistency / UX): This helper is the WebUI twin of src/tui/features/tasks/api/cancel-task.ts:23 and inherits the same documented-idempotent footgun: TaskCancelResponse.success: false with error: 'Task not found' is the expected outcome for late retries (see the TaskCancelResponse JSDoc in src/shared/transport/events/task-events.ts:56-71, which now explicitly enumerates the four cases). Throwing here turns that idempotent outcome into a red-toast user error at the only callsite (task-list-view.tsx:264 toast.error(err.message)).
Concrete scenario: a user clicks Cancel, the daemon forwards to the agent, the agent terminates, the row transitions to cancelled, then later the user pulls up history and clicks Cancel again on a stale row (or the table state lags by one tick) — the user sees a "Task not found" error toast for what is contractually a success.
Two reasonable fixes:
- Treat
success: false && error === 'Task not found'as idempotent success and return the response unchanged. - Return
TaskCancelResponseunchanged and let the caller decide. ThecancelMutation.mutate({onError, onSuccess})shape intask-list-view.tsxalready encodes that branching cleanly.
If you want to keep the throw semantics for future callers, at minimum add a JSDoc note matching the contract on TaskCancelResponse so future consumers don't wire success: false straight into user-visible error UI.
| cancelMutation.mutate( | ||
| {taskId}, | ||
| { | ||
| onError(err) { | ||
| toast.error(err.message) | ||
| clear() | ||
| }, | ||
| onSuccess: clear, | ||
| }, | ||
| ) |
There was a problem hiding this comment.
suggestion (UX): Clearing cancellingIds on onSuccess causes a visible button flicker: Spinner → "Cancel task" button → Delete button.
The race:
- User clicks Cancel —
cancellingIds.add(taskId)runs, row showsLoaderCircle+title="Cancelling…". - Daemon ACKs
{success: true}immediately (the forward-to-agent path attask-router.ts:720returns before the agent confirms — see the JSDoc onTaskCancelResponseand the comment attask-router.ts:710-716). onSuccess: clearruns.cancellingIds.has(taskId)is nowfalse, but the in-store status is still'started'(terminal event hasn't arrived yet), sorowActionKind(task.status)still returns'cancel'→ the row briefly renders the Cancel button again.- Eventually the agent emits
task:cancelled, the store transitions to terminal, and the row switches to the Delete button.
Two cheap fixes:
- Don't clear on
onSuccess. Instead, drive cancel-state clearance from auseEffectwatchingtasks— whenisTerminalStatus(task.status)becomes true for a tracked id, drop it from the Set. That keeps the spinner visible from click → terminal event, which is the user's mental model. - Or stamp the row visually as "cancelling" until it transitions: extend the
rowActionKinddecision to considercancellingIdsas well so the row always shows the spinner variant while a cancel is in flight, regardless of the daemon ACK.
Not a correctness bug, but the flicker is the kind of thing that erodes user trust in the Cancel button on low-latency local daemons.
| <TableCell className="text-center" onClick={(event) => event.stopPropagation()}> | ||
| {terminal && <RowAction onClick={() => onDelete(task.taskId)} />} | ||
| {actionKind === 'delete' ? ( | ||
| <DeleteRowAction onClick={() => onDelete(task.taskId)} /> | ||
| ) : ( | ||
| <CancelRowAction cancelling={cancelling} onClick={() => onCancel(task.taskId)} /> | ||
| )} | ||
| </TableCell> |
There was a problem hiding this comment.
praise: Mutually-exclusive row-action column based on rowActionKind(task.status) is a clean design — Cancel for active tasks, Delete for terminal — and the extracted utility with dedicated tests (row-action-kind.test.ts) keeps the decision easy to evolve as new statuses are added.
The companion Spinner / Cancelling… title on CancelRowAction (line 265) is also a nice touch — it communicates intent without needing a separate toast on click.
| {isActive && ( | ||
| <Button | ||
| aria-label="Cancel task" | ||
| className="ml-1 h-6 gap-1 border-red-500/40 px-2 text-red-400 hover:border-red-500/60 hover:bg-red-500/10 hover:text-red-300" | ||
| disabled={cancelling} | ||
| onClick={() => onCancel(task.taskId)} | ||
| size="xs" | ||
| variant="outline" | ||
| > | ||
| {cancelling ? <LoaderCircle className="size-3 animate-spin" /> : <CircleStop className="size-3" />} | ||
| {cancelling ? 'Cancelling…' : 'Cancel'} | ||
| </Button> | ||
| )} |
There was a problem hiding this comment.
nit (consistency): The table row (task-list-table.tsx:212-218) shows a Delete action for terminal tasks via rowActionKind, but the detail header here only renders the Cancel button when isActive and shows nothing for terminal tasks — so a user who opened the detail sheet for a completed / error / cancelled task has to close the sheet and find the row again to delete it. The two surfaces should probably mirror each other.
Cheap follow-up: when isTerminal, render a Delete button in the same slot (or via the same rowActionKind decision) so the detail header and the row action stay symmetric. Not blocking — calling out so the asymmetry doesn't ossify.
No description provided.