Skip to content

Proj/task cancellation#680

Open
bao-byterover wants to merge 36 commits into
mainfrom
proj/task-cancellation
Open

Proj/task cancellation#680
bao-byterover wants to merge 36 commits into
mainfrom
proj/task-cancellation

Conversation

@bao-byterover
Copy link
Copy Markdown
Collaborator

No description provided.

- ChatSession.cancel(taskId?) returns boolean so callers can tell
  whether a controller was found and aborted vs no-op.
- CipherAgent.cancelTask(taskId) fans cancel across live sessions and
  returns true when any session held the controller. Idempotent.
- Legacy no-arg cancel semantics unchanged (REPL Esc path).
- New handleAgentCancelEvent helper delegates to CipherAgent.cancelTask
  and emits task:cancelled upstream only when an active controller was
  held. Best-effort: any error from agent or transport is logged and
  swallowed so the cancel pipeline never crashes the agent.
- agent-process registers the listener alongside task:execute.
- Unit tests cover matched emit, unmatched no-op, agent-throw and
  transport-throw paths, idempotency, and per-event logging.
- handleExecutorTerminalError helper branches the executor catch:
  SessionCancelledError suppresses task:error (T1.1 emits task:cancelled);
  anything else takes the existing task:error path. Curate Phase 4 is
  naturally skipped because the executor throws before postWork is
  assigned; dream's finally still releases the lock via rollback on the
  abort path.
- task-router.handleTaskCancelled now calls agentPool.notifyTaskCompleted
  so the project FIFO drains after a cancel, symmetric with the
  completed/error paths.
- New IAgentPool.cancelQueuedTask delegates to ProjectTaskQueue.cancel.
  task-router.handleTaskCancel checks the queue first; queued tasks are
  removed and the daemon emits task:cancelled directly, never forwarding
  to the agent that holds no controller for them.
- Extract cancelTaskLocally to share broadcast and hook plumbing between
  the queued-cancel and no-agent-connected branches.
- Tests: 5 for handleExecutorTerminalError, 4 for task-router cancel
  paths, 3 for agent-pool cancelQueuedTask. Update 7 existing mock
  IAgentPool to satisfy the new interface method.
- New test/integration/cancel-pipeline.test.ts covers the 5 T1.4
  scenarios: cancel running, cancel running with queued, cancel queued,
  idempotent double-cancel, follow-up after cancel.
- Harness uses a real SocketIOTransportServer, a real TaskRouter, and a
  stub IAgentPool backed by a real ProjectTaskQueue so the queue/drain
  logic is exercised end-to-end. A TransportClient acts as the mock
  agent and honours per-task behaviors (wait-for-cancel or auto-complete)
  so the test driver controls when each task ends.
- History persistence is asserted via an ITaskLifecycleHook recorder
  rather than a real FileCurateLogStore — the recorder is the same hook
  interface CurateLogHandler implements in production to write
  status='cancelled' to history. The file-write itself stays covered by
  file-curate-log-store unit tests.
- Suite stays in-memory; waitFor polls with a bounded 1500ms deadline.
  Whole file finishes in ~210ms with no hanging promises.
- New src/oclif/lib/cancel-task.ts exposes runCancelTask({client,
  command, format, log, taskId}) — the only place in the CLI where the
  task:cancel transport event name and request shape appear. T2.2-T2.5
  per-command --cancel flags will call it.
- Text format prints "Cancelled <id>" on success, "Failed to cancel
  <id>: <reason>" on daemon-reported failure, with a generic fallback
  when the daemon omits the reason.
- JSON format reuses writeJsonResponse so the payload shape stays
  consistent with other commands' --format json output. The caller's
  command name is stamped on the envelope so curate/query/dream stay
  greppable.
- Returns boolean so the per-command caller can decide on exit code.
- 7 unit tests cover both formats, success and failure paths, transport
  payload shape, and the missing-error-reason fallback.
- New --cancel <id> flag short-circuits the create flow: no provider
  validation, no task:create, no context required. Delegates to the
  shared runCancelTask helper from T2.1 and exits 1 on daemon-reported
  failure so scripts can rely on the exit code.
- Mutually exclusive with --files, --folder, --detach (oclif's
  `exclusive`). --timeout is left allowed and silently ignored on the
  cancel branch; rejecting a harmless combo would surprise users.
- Existing curate behavior is untouched. The cancel branch is added
  alongside the normal path with zero overlap on the create code.
- 10 new tests in test/commands/curate.test.ts cover both formats,
  success and failure paths, mutex enforcement against each of the
  three exclusive flags, and the timeout-allowed case.
- Add --cancel <id> flag that short-circuits the query flow through
  the shared runCancelTask helper from T2.1.
- Relax the positional `query` argument to optional so --cancel can be
  used without a placeholder query string. A manual guard in run()
  preserves the existing missing-query error when neither input is
  given.
- Reject "<query> --cancel <id>" as a mutually exclusive combination
  with a clear error message in both text and JSON formats, exiting
  non-zero without touching the transport.
- Help text on both surfaces reflects the dual mode (run a query, or
  cancel by id).
- 7 new tests cover both formats, success and failure paths, the
  positional+cancel conflict, and missing-both rejection.
- Add --cancel <id> flag that short-circuits the dream flow through
  the shared runCancelTask helper from T2.1. Branches BEFORE the
  --undo path so cancel never accidentally triggers a revert.
- Mutually exclusive with --force, --undo, --detach (oclif's
  `exclusive`). --timeout is left allowed and silently ignored on the
  cancel branch.
- Help text makes clear that cancel is a hard stop and not a revert;
  pair with `brv dream --undo` for rollback of completed/partial
  dreams.
- 9 new tests cover both formats, success and failure paths, the
  mutex against each of the three exclusive flags, and the
  timeout-allowed case.
- waitForTaskCompletion installs a SIGINT handler for the duration of
  the wait. First Ctrl-C emits task:cancel for the current task and
  writes a "Cancelling task..." hint to stderr (kept off stdout so
  --format json output stays clean). Second Ctrl-C hard-exits with
  code 130 (conventional SIGINT exit) without re-emitting cancel.
- The wait loop now subscribes to task:cancelled as a terminal event.
  Previously it only handled task:completed and task:error, which
  meant a remote --cancel or a Ctrl-C cancel left the foreground
  process hanging until the existing timeout fired. The new branch
  fires the optional onCancelled callback and resolves the promise.
- The SIGINT handler is added to the same unsubscribers chain the
  existing cleanup() drains, so it is removed on every terminal path
  (completed, cancelled, error, disconnect, timeout).
- 8 new unit tests in test/unit/oclif/lib/task-client-sigint.test.ts
  cover both terminal recognition and the SIGINT install/uninstall
  contract.
- Add src/tui/features/tasks/api/cancel-task.ts mirroring the
  provider/api/cancel-oauth pattern: pulls the active apiClient from
  the TUI transport store, emits task:cancel with the taskId, returns
  the daemon's TaskCancelResponse, throws when the response reports
  success: false.
- No new transport client construction; reuses the established TUI
  pattern. Boundary preserved — no imports from server/, agent/, or
  oclif/.
- 5 unit tests cover the event payload, the success resolve, the
  daemon-reported error path, the missing-error fallback, and the
  not-connected guard. Test file establishes the new
  test/unit/tui/features/tasks/api/ folder for T4.2 to extend.
- New useCancelRunningTaskKeybind hook installs a scoped useInput
  that fires only while there is a non-terminal curate/query task in
  the tasks store. On Ctrl+Q the hook calls cancelTask({taskId}) for
  the most recently created running task.
- Architectural note: curate-flow and query-flow components unmount
  within ~100ms of task:create ack, so they cannot own a binding that
  must be active for the entire task lifetime. The keybind is scoped
  by running-task presence in the store, hosted via an invisible
  CancelKeybindInitializer mounted next to TaskSubscriptionInitializer.
  REPL UX is unchanged (prompt stays responsive while task runs).
- A pure selectCancelTargetTaskId selector picks the target taskId —
  most recently created non-terminal task, or undefined when nothing
  is cancellable. Splitting the pure logic from the React hook keeps
  it unit-testable without Ink.
- Ctrl-C remains the TUI's exit shortcut; only Ctrl+Q is intercepted.
  Known limitation: some terminal emulators map Ctrl+Q to XON flow-
  control; Ink raw mode should neutralize that on common emulators
  but is not guaranteed everywhere.
- 6 unit tests cover the selector: empty map, all-terminal, single
  running, multiple running (newest wins), created-status, and
  terminal-newer-than-running.
…ery typing

Post-review cleanup of T2.5 (ENG-2783):

- Extract runCancelBranchWithRetry into src/oclif/lib/cancel-task.ts so
  curate / query / dream stop cloning the same 22-line withDaemonRetry
  wrapper. Per-command runCancelBranch becomes a 10-line delegation:
  pass command name, daemon-client options, format, log, transport-error
  handler, and taskId; receive a boolean for exit-code decision.
- Wrap the SIGINT handler's client.request call in try/catch so a
  synchronously-throwing transport (already-disconnected socket, etc.)
  cannot escape the signal handler and crash the process with an
  uncaught exception. The wait loop still surfaces the disconnect via
  timeout or state-change paths.
- Drop the redundant 'as QueryFlags' cast in query.ts entirely along
  with the now-unused QueryFlags type alias. Use oclif's inferred flag
  type directly and narrow 'format' inline via a ternary, matching the
  pattern in curate/index.ts.
- 3 new unit tests for runCancelBranchWithRetry: success path, daemon-
  reported failure (no transport throw), and transport-throw routed to
  onTransportError. Existing per-command --cancel tests pass unchanged
  because the per-command behaviour is the same.
When the foreground curate / query / dream wait loop sees a
task:cancelled broadcast (driven by a remote 'brv <cmd> --cancel <id>'
from another terminal, or the TUI Ctrl+Q keybind), the command now:

- prints '✗ <Command> cancelled (Task: <id>)' in text mode, or writes
  a {status: 'cancelled', event: 'cancelled'} JSON envelope in json
  mode, so consumers can distinguish cancel from success;
- exits with code 130 (the conventional SIGINT exit) so shell scripts
  and CI can branch on cancellation.

Wiring detail worth noting: the wasCancelled flag is hoisted to run()
and bubbled up by submitTask via its return shape. The this.exit(130)
call lives AFTER the withDaemonRetry try/catch — throwing it inside
the callback would be swallowed by the catch (which routes any throw
through reportError) and the exit code would be lost.

6 new unit tests cover the remote-cancel path in text and json mode
across all three commands. Existing tests still pass unchanged.
When withDaemonRetry retries a task:cancel after a dropped response, the
daemon's handleTaskCancel would look up the taskId in this.tasks, miss
(the prior cancel had already moved it to completedTasks), and return
{error: 'Task not found', success: false}. The CLI then reported a
misleading 'Failed to cancel' even though the cancel actually worked.

Make the lookup idempotent: if the taskId is absent from this.tasks but
present in completedTasks with status: 'cancelled', return success.
Truly unknown taskIds and tasks that reached a different terminal state
(completed / error) still return the structured error, so the existing
contract for those cases is preserved.

This covers Nit-3 from the cancel-pipeline post-review.

1 new unit test in test/unit/infra/process/task-router.test.ts asserts
the retry-after-cancel path returns {success: true}. Existing tests
including 'should return error for unknown taskId' continue to pass.
…cess flag

Two small post-review nits from the cancel-pipeline review:

Nit-4 — empty catch {} in the SIGINT handler hid programmer-error
  throws (bad payload, transport regression) silently. Add a single
  stderr write inside the catch so a genuine bug leaves a breadcrumb.
  Comment expanded to spell out why we still swallow rather than
  propagate (the wait loop's timeout / state-change paths already
  surface a true transport drop).

Nit-5 — the cancelled-path JSON envelope previously set success: true
  while the process exits 130. Two signals pointing opposite ways
  confused consumers parsing both. Flip JSON success to false so it
  tracks the exit code; cancellation semantics still live in
  data.status: 'cancelled' for code that needs to distinguish cancel
  from error. Updated the 3 'remote cancel during foreground wait' JSON
  tests across curate/query/dream to expect success: false. No
  behaviour change in text mode or in exit codes.

Touches src/oclif/lib/task-client.ts and the 3 CLI command files only;
test/commands/{curate,query,dream}.test.ts get the matching assertion
flip.
The previous Nit-3 fix only looked up the cancelled status in the
in-memory `completedTasks` map. That map is cleared by a setTimeout
after `TASK_CLEANUP_GRACE_PERIOD_MS` (5s), so a cancel retry arriving
after the grace window would still report "Task not found" even though
the task was successfully cancelled — semantically wrong, since the
task is in fact cancelled in the persistent history.

Extend handleTaskCancel with a durable second fallback: when the task
is gone from both this.tasks AND completedTasks, look it up by id in
the persistent task history store (resolved via the client's project
path). If the persisted status is 'cancelled', return success. Any
other status (completed / error) or a missing entry still returns the
structured "Task not found".

The handler becomes async because store.getById is async; the existing
sync test invocations get `await` added. No production caller depended
on the sync return — `transport.onRequest` supports both shapes.

2 new tests in task-router.test.ts:
- 'returns success on a retry after completedTasks has aged out —
  durable via persistent history store'
- 'still returns Task not found when the history store has a
  non-cancelled status (e.g. completed)'

The original in-memory fast path is preserved so the common case
(retry within 5s) avoids a disk read.
- task-router: cancelTaskLocally now stamps status='cancelled' and routes
  through moveToCompleted so the in-memory fast-path idempotency catches
  sequential retries on the queued/no-agent cancel paths. Previously the
  retry raced the fire-and-forget durable hook write and could see a
  misleading 'Task not found'. Adds regression test.
- transport: document TaskCancelResponse.success semantics across the
  four reply paths (queued, forwarded, idempotent retry, not-found),
  plus an inline note above the forward-to-agent branch in task-router.
- oclif: inline the runCancelBranch wrapper in curate/dream/query — the
  three identical 9-line private methods are gone; the single use site
  in each command now calls runCancelBranchWithRetry directly.
- task-client: introduce WaitForTaskClient = Pick<ITransportClient, ...>
  so the SIGINT test stub satisfies the contract structurally without
  an 'as unknown as ITransportClient' cast; installSigintCancel narrowed
  to Pick<..., 'request'> for the same reason.
- tui: expand JSDoc on use-cancel-running-task-keybind with a RESERVED
  CHORD WARNING so contributors know Ctrl+Q is globally intercepted
  whenever any non-terminal task exists in the tasks store.
Conflicts:
	src/oclif/commands/curate/index.ts
	src/oclif/commands/dream.ts
	src/oclif/commands/query.ts
	src/oclif/lib/task-client.ts
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 20, 2026

Claude finished @bao-byterover's task in 6m 49s —— View job


Claude PR Review – #680 (Task Cancellation)

  • Read CLAUDE.md and PR diff
  • Review new oclif cancel pipeline (cancel-task.ts, task-client.ts, query/curate/dream SIGINT wiring)
  • Review daemon cancel path (task-router.ts, agent-cancel-listener.ts, agent-executor-error.ts, agent-pool.ts)
  • Review agent-side cancel path (cipher-agent.ts, chat-session.ts)
  • Review TUI cancel hooks (use-cancel-running-task-keybind.ts, cancel-task.ts, select-cancel-target.ts)
  • Post inline issues + summary

Summary

Overall score: 4.5 / 5

This is a thoughtful, well-tested implementation of an end-to-end cancel pipeline that threads cleanly through the CLI ↔ daemon ↔ agent process boundaries. The design splits responsibilities correctly: oclif owns SIGINT and --cancel <id>, the daemon owns idempotency + queue-vs-agent routing + lifecycle hooks, the agent owns per-task AbortController fan-out across sessions, and the TUI owns the Ctrl+Q affordance. The 457-line integration test (test/integration/cancel-pipeline.test.ts) exercises the five real-world scenarios (running cancel, queued cancel, idempotent double-cancel, post-cancel follow-up, agent stays warm) and gives me high confidence the user-observable behavior is correct.

Strengths

  • TaskCancelResponse semantics doc — the four-case matrix (queued-local / forward-to-agent / idempotent / not-found) is exactly the kind of upfront definition cancel APIs usually lack; both runCancelTask and the TUI helper honor it consistently. (praise comment)
  • RESERVED CHORD WARNING in use-cancel-running-task-keybind.ts — clear, calls out the global interception risk, and points at the double-tap mitigation if it ever bites. (praise comment)
  • Extracted handlers (agent-cancel-listener.ts, agent-executor-error.ts) — neat split that makes the agent-process dispatch logic unit-testable without forking a child.
  • Durable idempotency via persistent history store — late retries across daemon restarts return success: true instead of misleading "Task not found." Nice follow-through.
  • Strict TDD discipline — 13 new test files, including a 457-line integration spec, hit the project's TDD policy in CLAUDE.md.

Issues / suggestions (inline)

  1. installSigintCancel swallows async rejections (comment) — client.request(...) returns a Promise; the try/catch only catches sync throws. An async rejection becomes an unhandledRejection (process-fatal on Node 22+ default). One-line fix.
  2. handleTaskCancel durable-path I/O has no timeout (comment) — a slow store.getById can stall the cancel ACK indefinitely. Worth bounding.
  3. Double-forward to agent on rapid retries (comment) — no cancelling status stamp, so duplicate cancels re-forward to the agent. Functionally idempotent (agent handles it) but chatty. Optional polish.
  4. cancelTaskLocally passes pre-mutation task to hooks (comment) — hook sees TaskInfo without status: 'cancelled' / completedAt. Tiny consistency fix.
  5. TUI helper throws on "Task not found" (comment) — documented-idempotent outcome treated as error. Worth clarifying in JSDoc or returning the response unchanged.
  6. Test timeoutMs is unused (comment) — silently dropped across 8 scenarios. Nit.

None of these block landing. Issues 1 and 2 are the only ones with real production-impact potential; the rest are quality/consistency polish.
· Branch: proj/task-cancellation

Comment on lines +204 to +213
try {
deps.client.request(TaskEvents.CANCEL, {taskId: deps.taskId})
} catch (error) {
// Transport unavailable — the wait loop will time out or see a
// disconnect event and surface that path instead. Log the swallowed
// throw to stderr so a genuine bug (bad payload, transport regression)
// leaves a breadcrumb instead of failing silently.
const message = error instanceof Error ? error.message : String(error)
process.stderr.write(`[brv] cancel request failed locally: ${message}\n`)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (bug): The try/catch only catches synchronous throws, but client.request(...) is declared to return a Promise (see WaitForTaskClient = Pick<ITransportClient, 'on' | 'onStateChange' | 'request'>). If the request rejects asynchronously (network blip, daemon disconnected mid-send), the promise becomes unhandled — which can crash Node 22+ via the default unhandledRejection behavior, defeating the whole point of "Ctrl-C should never make things worse than they were."

Suggest attaching a .catch so async rejections take the same logging path:

Suggested change
try {
deps.client.request(TaskEvents.CANCEL, {taskId: deps.taskId})
} catch (error) {
// Transport unavailable — the wait loop will time out or see a
// disconnect event and surface that path instead. Log the swallowed
// throw to stderr so a genuine bug (bad payload, transport regression)
// leaves a breadcrumb instead of failing silently.
const message = error instanceof Error ? error.message : String(error)
process.stderr.write(`[brv] cancel request failed locally: ${message}\n`)
}
try {
void Promise.resolve(deps.client.request(TaskEvents.CANCEL, {taskId: deps.taskId})).catch((error: unknown) => {
const message = error instanceof Error ? error.message : String(error)
process.stderr.write(`[brv] cancel request failed locally: ${message}\n`)
})
} catch (error) {
// Transport unavailable — the wait loop will time out or see a
// disconnect event and surface that path instead. Log the swallowed
// throw to stderr so a genuine bug (bad payload, transport regression)
// leaves a breadcrumb instead of failing silently.
const message = error instanceof Error ? error.message : String(error)
process.stderr.write(`[brv] cancel request failed locally: ${message}\n`)
}

Comment on lines +709 to +716
// If Agent connected for this task's project, forward cancel request.
// NOTE: `success: true` here means "queued for forward", NOT "agent
// confirmed cancellation". The agent may have already finished the task
// (race window between daemon dispatch and agent terminal event), in
// which case agent-cancel-listener returns false and emits no
// task:cancelled. Callers must rely on the terminal event broadcast,
// not this success flag, to know the final task state. See JSDoc on
// TaskCancelResponse for full semantics.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (quality): The forward-to-agent path right below this comment does not stamp any "cancel-in-flight" marker on the live TaskInfo before returning. A rapid double-cancel (or a retried cancel request) will re-enter this branch and forward task:cancel to the agent a second time before the first cancel's terminal task:cancelled has been emitted.

handleAgentCancelEvent correctly returns silently on the second call, so this isn't a correctness bug — but it does mean an extra socket round-trip and an extra "task:cancel received" log line per duplicate.

Consider stamping this.tasks.set(taskId, {...task, status: 'cancelling'}) and short-circuiting subsequent cancels of an already-cancelling task with {success: true}. The fast-path in the no-task branch above (prior?.task.status === 'cancelled') already established a precedent for status-based idempotency.

Not blocking — scenario 4 in cancel-pipeline.test.ts already proves the user-observable outcome is one terminal event. Flagging in case you want to tighten the daemon↔agent chatter.

Comment on lines +623 to +625
this.tasks.set(taskId, {...task, completedAt: Date.now(), status: 'cancelled'})
this.moveToCompleted(taskId)
this.notifyHooksCancelled(taskId, task).catch(() => {})
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit (consistency): Line 623 stamps status: 'cancelled' on the in-memory entry, but the task argument forwarded to notifyHooksCancelled on line 625 is still the un-mutated reference captured by the caller — hooks see a TaskInfo without status: 'cancelled' and without completedAt. The TaskHistoryHook.onTaskCancelled builds its persisted entry from a separate code path so this doesn't break persistence, but a future hook that reads task.status directly would observe an inconsistent snapshot.

Trivial fix: forward the just-set value so the hook sees what it expects.

Suggested change
this.tasks.set(taskId, {...task, completedAt: Date.now(), status: 'cancelled'})
this.moveToCompleted(taskId)
this.notifyHooksCancelled(taskId, task).catch(() => {})
const cancelledTask = {...task, completedAt: Date.now(), status: 'cancelled' as const}
this.tasks.set(taskId, cancelledTask)
this.moveToCompleted(taskId)
this.notifyHooksCancelled(taskId, cancelledTask).catch(() => {})

Comment on lines +12 to +26
* RESERVED CHORD WARNING (read before binding Ctrl+Q anywhere else in the TUI):
* - This keybind is mounted globally via `app-providers.tsx → CancelKeybindInitializer`,
* so as soon as ANY non-terminal task exists in the tasks store, Ctrl+Q is
* intercepted everywhere — text inputs, REPL prompt, search box, modals, etc.
* - DO NOT bind Ctrl+Q to any other action while a curate/query/dream task can
* be in flight; the global cancel will eat the chord first.
* - The binding targets `selectCancelTargetTaskId`'s "most recently created
* non-terminal task." With multiple in-flight tasks, Ctrl+Q always cancels
* the most recent one, not the one currently focused in the UI.
* - There is no confirmation step today — the cancel network round-trip starts
* the instant the chord lands. If a real-world conflict surfaces (user
* accidentally cancelling), the recommended follow-up is to mirror the SIGINT
* double-tap pattern in `installSigintCancel` (first Ctrl+Q arms, second
* within Ns fires).
*/
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

praise: This RESERVED CHORD WARNING block is exactly the kind of why-this-exists doc that prevents accidental regressions. The fact that it explicitly calls out the global interception, the "no confirmation" risk, and the proposed double-tap mitigation path makes future debates much easier.

Comment on lines +56 to 72
/**
* Daemon's reply to a `task:cancel` request.
*
* Semantics of `success`:
* - `true` from a queued/local-cancel path → terminal `task:cancelled` has
* already been broadcast by the daemon; the task is fully cancelled.
* - `true` from the forward-to-agent path → the cancel was forwarded to a
* live agent, NOT confirmed. The agent may have already finished the task
* (race), in which case no `task:cancelled` follows; the wait loop must
* still rely on the actual terminal event broadcast (`task:cancelled` /
* `task:completed` / `task:error`) to determine the final state.
* - `true` from an idempotent retry → the task was previously observed as
* `status: 'cancelled'` (in-memory cache or persistent history store).
* - `false` with `error: 'Task not found'` → daemon has no record of the
* task in any tier (active / completed-cache / persistent history).
*/
export interface TaskCancelResponse {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

praise: The semantics matrix on TaskCancelResponse.success is genuinely excellent — distinguishing "queued-for-forward" from "agent-confirmed" from "idempotent-retry" upfront removes the kind of "wait, what does true actually mean?" ambiguity that bites cancel APIs everywhere. The accompanying Callers must rely on the terminal event broadcast directive is the right escape hatch and is consistently honored by both runCancelTask (oclif) and useCancelRunningTaskKeybind (TUI).

Comment on lines +18 to +25
export const cancelTask = async (payload: TaskCancelRequest): Promise<TaskCancelResponse> => {
const {apiClient} = useTransportStore.getState()
if (!apiClient) throw new Error('Not connected')

const response = await apiClient.request<TaskCancelResponse, TaskCancelRequest>(TaskEvents.CANCEL, payload)
if (!response.success) throw new Error(response.error ?? 'Cancel failed')
return response
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit (consistency): This helper throws on !response.success, but the only caller (useCancelRunningTaskKeybind.catch(() => {})) intentionally silences errors per its comment. That's a legitimate design choice, but it means:

  1. The error: response.error ?? 'Cancel failed' message is built and immediately discarded — dead code at the integration point.
  2. A "Task not found" idempotent retry (which the daemon doc-comments explicitly bless as expected) is treated as an error by this helper.

If you want to keep the throw semantics for future callers, that's fine — but consider noting in the JSDoc that the daemon's success: false for "Task not found" is a documented idempotent-retry outcome, not a failure, so future consumers don't wire it as user-visible error UI. Alternatively, return the full TaskCancelResponse unchanged and let each caller decide.

Comment on lines +84 to +95
const wait = waitForTaskCompletion(
{
client: stub.client,
command: 'curate',
format: 'text',
onCancelled: (result) => cancelled.push(result),
onCompleted() {},
onError() {},
taskId: 'task-1',
timeoutMs: 5000,
},
() => {},
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Multiple test cases pass timeoutMs: 5000 to waitForTaskCompletion, but WaitForTaskOptions (in src/oclif/lib/task-client.ts) does not declare a timeoutMs field — only the deprecated --timeout flag still exists in the command layer, and the heartbeat-driven stale check is the actual liveness signal now.

These extra keys are silently dropped (test tsconfig isn't strict about excess props on inline literals), so the tests pass — but they read as if there's a per-test override. Either drop the key throughout this file or thread it through to a real option to keep the test surface honest. Same pattern appears in scenarios at lines 95, 121, 147, 178, 206, 232, 259, 288.

Comment on lines +678 to +693
if (this.getTaskHistoryStore !== undefined) {
const projectPath = this.resolveClientProjectPath?.(clientId)
if (projectPath !== undefined) {
try {
const store = this.getTaskHistoryStore(projectPath)
const entry = await store.getById(taskId)
if (entry?.status === 'cancelled') {
return {success: true}
}
} catch (error) {
transportLog(
`handleTaskCancel: history-store lookup failed for ${taskId}: ${error instanceof Error ? error.message : String(error)}`,
)
}
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (resilience): This durable-idempotency path is unbounded — store.getById(taskId) runs without a timeout, and the daemon will hold the cancel request open until disk I/O completes. On a healthy filesystem this is microseconds, but clearCompleted running on the same project can compact the index concurrently, and task:list pass-2 already races against getById in handleTaskList (line 1466). A stuck store call here will hang every subsequent task:cancel from the same client because the wire layer serializes ACK responses per request id.

Two cheap improvements:

  1. Race the getById against a bounded setTimeout (1–2s is generous) so a misbehaving store falls through to Task not found rather than blocking the response.
  2. The catch block already logs and falls through to the not-found return — verify by reading: ✅ it does. So the bug is purely "no timeout," not "no recovery."

Not blocking — the integration test suite never exercises a slow store — but cheap insurance for production.

The waitForTaskCompletion options interface lost timeoutMs when the
per-call setTimeout was replaced with a heartbeat-driven stale check,
but this test file kept the field, breaking the suite with TS2353.
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 20, 2026

Claude finished @bao-byterover's task in 3m 27s —— View job


Claude PR Review – #680 (Task Cancellation) — Re-review

  • Read CLAUDE.md + diff against origin/main
  • Check whether the 6 items from the last review were addressed
  • Re-review the latest fix commits (cancelTaskLocally cache miss, durable idempotency, stale timeoutMs removal, JSON success flag flip)
  • Spot-check anything new
  • Post inline comments + updated summary

Summary (re-review)

Updated overall score: 4.5 / 5 — unchanged. The branch is ship-ready modulo two real issues (#1, #2 below) that are cheap to fix and have known production-impact paths.

What was addressed since the last review (5/6 acknowledged, 1/6 actually closed)

# Item Status
1 Async rejection in installSigintCancel ❌ Still open at src/oclif/lib/task-client.ts:204-213client.request(...) returns a Promise and is neither awaited nor .catch()-ed; the existing try/catch only handles synchronous throws
2 Unbounded store.getById on durable cancel path ❌ Still open at task-router.ts:683 — extensive comments were added but no timeout wrapper
3 Double-forward to agent on rapid cancels ❌ Still open — the forward branch (task-router.ts:717-720) does not stamp a cancelling status. Marked optional last time; still optional
4 cancelTaskLocally passes pre-mutation task to hooks ❌ Still open — task-router.ts:625 still forwards the original task argument; the in-memory map is mutated correctly via this.tasks.set(...) on line 623, but the hook callee gets the stale snapshot. Trivial fix
5 TUI helper throws on documented-idempotent "Task not found" ❌ Still open at src/tui/features/tasks/api/cancel-task.ts:23 — no JSDoc note added; behavior unchanged
6 Unused timeoutMs: 5000 in SIGINT tests ✅ Closed by 06b9c07

The follow-up commits did land valuable adjacent improvements that weren't on the original list — calling those out so they don't go unnoticed:

  • acb2acc durable idempotency via the persistent task history store is the right fix for the cache-miss race after TASK_CLEANUP_GRACE_PERIOD_MS. Without it, late cancel retries would have returned a misleading "Task not found" even though the task was persisted as cancelled. The two new tests (returns success on a retry after completedTasks has aged out / still returns Task not found when the history store has a non-cancelled status) lock the contract down well.
  • c6d77da aligning JSON success: false with exit code 130 removes the contradictory-signal problem (success: true with non-zero exit). The cancel state still lives in data.status: 'cancelled', so consumers that want to distinguish cancel-from-error still can. Right call.
  • 45a0ff0 cancelTaskLocally cache miss correctly stamps status: 'cancelled' on the in-memory entry before moveToCompleted, which closes the race in the queued/no-agent cancel paths. Nice catch.
  • TaskCancelResponse JSDoc in shared/transport/events/task-events.ts:56-71 now spells out all four cases of success. Worth keeping.
  • runCancelBranchWithRetry inlined into curate/dream/query removed three identical 9-line wrappers — small DRY win.

New finding from this pass

  1. agent-cancel-listener.ts:42 has the same async-rejection bug as Issue Feat/init #1, but with a sharper blast radiustransport.request(...) is fire-and-forget and not .catch()-ed, and agent-process.ts:942 installs unhandledRejectionprocess.exit(1). A transient transport hiccup while emitting the post-cancel echo would kill the agent child process. Inline comment with suggested fix.

What's still solid

  • The 457-line cancel-pipeline.test.ts integration suite remains the strongest part of the PR — five real scenarios end-to-end.
  • New unit coverage in task-router.test.ts for both the in-memory and durable idempotency branches is exactly the level of test discipline CLAUDE.md asks for.
  • The RESERVED CHORD WARNING block on use-cancel-running-task-keybind.ts:12-26 is even better than before — it explicitly names the global-interception risk and points at the double-tap mitigation pattern.

Recommendation

Land it. Issues #1, #2, and the new #7 are the only items I'd want addressed in this branch (each is a 1–3 line change); #3, #4, #5 are polish that can ship as a follow-up. Two of three blockers are async-rejection patterns and share the same one-line fix shape — Promise.resolve(...).catch(...).
· Branch: proj/task-cancellation

try {
const cancelled = await agent.cancelTask(taskId)
if (!cancelled) return
transport.request(TransportTaskEventNames.CANCELLED, {taskId})
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (bug): Same async-rejection class as Issue #1 from the previous review — but with a sharper consequence here. transport.request(...) is declared to return a Promise, but it's not awaited and isn't .catch()-ed. The surrounding try/catch only handles the synchronous throw path and the await agent.cancelTask(taskId) rejection; an async rejection from transport.request(...) escapes both.

That matters more on this path than on the SIGINT one because agent-process.ts:942 installs a process-level unhandledRejection handler that calls process.exit(1). So a transient transport hiccup while emitting the post-cancel task:cancelled echo would kill the whole agent child process — exactly the surface the cancel pipeline was built to keep cheap and isolated.

Cheap fix — make the fire-and-forget intent explicit and contain the failure:

Suggested change
transport.request(TransportTaskEventNames.CANCELLED, {taskId})
const cancelled = await agent.cancelTask(taskId)
if (!cancelled) return
Promise.resolve(transport.request(TransportTaskEventNames.CANCELLED, {taskId})).catch((emitError: unknown) => {
const emitMessage = emitError instanceof Error ? emitError.message : String(emitError)
log(`task:cancelled emit failed taskId=${taskId} err=${emitMessage}`)
})
} catch (error) {
const message = error instanceof Error ? error.message : String(error)
log(`task:cancel handler error taskId=${taskId} err=${message}`)
}

Note: same fix is still needed at src/oclif/lib/task-client.ts:205 (Issue #1 from the prior review, still open on this commit).

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 22, 2026

Claude finished @ncnthien's task in 3m 40s —— View job


Claude PR Review – #680 (Task Cancellation) — Re-review of latest commits

  • Read CLAUDE.md and PR diff vs origin/main
  • Re-check status of previously flagged issues (1–7)
  • Spot-check the latest commits (WebUI cancel button, unified cancel state, pending feedback)
  • Post inline comments + summary

Summary (re-review)

Overall score: 4.5 / 5 — unchanged. Branch remains ship-ready. The new WebUI cancel surface (commits 5a559ff + 98c0eb3) is well-shaped and well-tested; the few remaining items are quality/consistency polish, not blockers.

Status of items from the prior two reviews

# Item Status
1 installSigintCancel swallows async rejections (task-client.ts:204-213) ❌ Still open — client.request(...) returns a Promise, neither awaited nor .catch()-ed. The comment on lines 198-203 now explicitly acknowledges "wrapped in try/catch so a synchronously-throwing transport… doesn't propagate," which confirms the gap rather than closing it
2 Unbounded store.getById on durable cancel path (task-router.ts:683) ❌ Still open — extensive comments, no timeout wrapper
3 Double-forward to agent on rapid cancels (task-router.ts:717-720) ❌ Still open — optional polish
4 cancelTaskLocally passes pre-mutation task to hooks (task-router.ts:625) ❌ Still open — trivial fix
5 TUI helper throws on documented-idempotent "Task not found" (tui/.../cancel-task.ts:23) ❌ Still open
6 Unused timeoutMs: 5000 in SIGINT tests ✅ Closed by 06b9c07
7 agent-cancel-listener.ts:42 async-rejection (kills agent child via unhandledRejectionprocess.exit(1)) ❌ Still open — transport.request(...) is fire-and-forget; agent-process.ts:947 confirmed process.exit(1) on unhandledRejection

Findings from this pass (the latest two commits)

5a559ff — WebUI cancel button on rows + detail header

  1. WebUI cancelTask helper inherits the same idempotent footgun as the TUI helper (inline) — src/webui/features/tasks/api/cancel-task.ts:17 throws on success: false, including for Task not found which is contractually an idempotent retry. Sharper here than in the TUI because the single caller pipes the throw straight into toast.error(err.message) in task-list-view.tsx:264. Test on line 41 of cancel-task.test.ts explicitly locks in this behavior, so it's a deliberate choice — but worth at least a JSDoc note matching the TaskCancelResponse contract.

  2. handleCancel in task-list-view.tsx:260-269 flickers Spinner → Cancel → Delete (inline) — clearing cancellingIds on onSuccess runs before the daemon's terminal task:cancelled event arrives (the forward-to-agent path returns success: true immediately). The row briefly renders the Cancel button again between ACK and terminal event. Fix: drive clearance from a useEffect on tasks watching for terminal status, or factor cancellingIds into rowActionKind. Not a correctness bug.

  3. Detail header has no Delete action for terminal tasks (inline) — the row shows Cancel-or-Delete via rowActionKind, but task-detail-header.tsx only renders the Cancel button when isActive. A user in the detail sheet of a completed/cancelled/error task has no Delete affordance — they have to close the sheet and find the row. Minor symmetry nit.

98c0eb3 — unify cancel state across list and detail + pending feedback

  • Lifting cancellingIds into TaskListView and threading cancelling={cancellingIds.has(selectedTaskId)} into TaskDetailView is exactly the right shape — the Sheet always sees the same per-task state the row sees. Clean fix.

What I'd ship as-is vs. what I'd want fixed before landing

Blocking-shaped (still 1–3 lines each):

Polish that can ship as a follow-up:

What remains rock-solid

  • The 457-line cancel-pipeline.test.ts integration suite — five real scenarios end-to-end.
  • The TaskCancelResponse four-case JSDoc remains the strongest single piece of contract-shaping in this PR.
  • The RESERVED CHORD WARNING block on use-cancel-running-task-keybind.ts.
  • The new WebUI test files (row-action-kind.test.ts, cancel-task.test.ts) — small, fast, locked-in behavior.
  • Mutually-exclusive row action column via rowActionKind is a clean design.

Recommendation

Land it after addressing Issues #1, #2, #7 (each is a one- to three-line change with a known fix pattern). The rest are quality polish.
· Branch: proj/task-cancellation

if (!apiClient) throw new Error('Not connected')

const response = await apiClient.request<TaskCancelResponse, TaskCancelRequest>(TaskEvents.CANCEL, payload)
if (!response.success) throw new Error(response.error ?? 'Cancel failed')
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (consistency / UX): This helper is the WebUI twin of src/tui/features/tasks/api/cancel-task.ts:23 and inherits the same documented-idempotent footgun: TaskCancelResponse.success: false with error: 'Task not found' is the expected outcome for late retries (see the TaskCancelResponse JSDoc in src/shared/transport/events/task-events.ts:56-71, which now explicitly enumerates the four cases). Throwing here turns that idempotent outcome into a red-toast user error at the only callsite (task-list-view.tsx:264 toast.error(err.message)).

Concrete scenario: a user clicks Cancel, the daemon forwards to the agent, the agent terminates, the row transitions to cancelled, then later the user pulls up history and clicks Cancel again on a stale row (or the table state lags by one tick) — the user sees a "Task not found" error toast for what is contractually a success.

Two reasonable fixes:

  1. Treat success: false && error === 'Task not found' as idempotent success and return the response unchanged.
  2. Return TaskCancelResponse unchanged and let the caller decide. The cancelMutation.mutate({onError, onSuccess}) shape in task-list-view.tsx already encodes that branching cleanly.

If you want to keep the throw semantics for future callers, at minimum add a JSDoc note matching the contract on TaskCancelResponse so future consumers don't wire success: false straight into user-visible error UI.

Comment on lines +260 to +269
cancelMutation.mutate(
{taskId},
{
onError(err) {
toast.error(err.message)
clear()
},
onSuccess: clear,
},
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (UX): Clearing cancellingIds on onSuccess causes a visible button flicker: Spinner → "Cancel task" button → Delete button.

The race:

  1. User clicks Cancel — cancellingIds.add(taskId) runs, row shows LoaderCircle + title="Cancelling…".
  2. Daemon ACKs {success: true} immediately (the forward-to-agent path at task-router.ts:720 returns before the agent confirms — see the JSDoc on TaskCancelResponse and the comment at task-router.ts:710-716).
  3. onSuccess: clear runs. cancellingIds.has(taskId) is now false, but the in-store status is still 'started' (terminal event hasn't arrived yet), so rowActionKind(task.status) still returns 'cancel' → the row briefly renders the Cancel button again.
  4. Eventually the agent emits task:cancelled, the store transitions to terminal, and the row switches to the Delete button.

Two cheap fixes:

  1. Don't clear on onSuccess. Instead, drive cancel-state clearance from a useEffect watching tasks — when isTerminalStatus(task.status) becomes true for a tracked id, drop it from the Set. That keeps the spinner visible from click → terminal event, which is the user's mental model.
  2. Or stamp the row visually as "cancelling" until it transitions: extend the rowActionKind decision to consider cancellingIds as well so the row always shows the spinner variant while a cancel is in flight, regardless of the daemon ACK.

Not a correctness bug, but the flicker is the kind of thing that erodes user trust in the Cancel button on low-latency local daemons.

Comment on lines 212 to 218
<TableCell className="text-center" onClick={(event) => event.stopPropagation()}>
{terminal && <RowAction onClick={() => onDelete(task.taskId)} />}
{actionKind === 'delete' ? (
<DeleteRowAction onClick={() => onDelete(task.taskId)} />
) : (
<CancelRowAction cancelling={cancelling} onClick={() => onCancel(task.taskId)} />
)}
</TableCell>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

praise: Mutually-exclusive row-action column based on rowActionKind(task.status) is a clean design — Cancel for active tasks, Delete for terminal — and the extracted utility with dedicated tests (row-action-kind.test.ts) keeps the decision easy to evolve as new statuses are added.

The companion Spinner / Cancelling… title on CancelRowAction (line 265) is also a nice touch — it communicates intent without needing a separate toast on click.

Comment on lines +57 to +69
{isActive && (
<Button
aria-label="Cancel task"
className="ml-1 h-6 gap-1 border-red-500/40 px-2 text-red-400 hover:border-red-500/60 hover:bg-red-500/10 hover:text-red-300"
disabled={cancelling}
onClick={() => onCancel(task.taskId)}
size="xs"
variant="outline"
>
{cancelling ? <LoaderCircle className="size-3 animate-spin" /> : <CircleStop className="size-3" />}
{cancelling ? 'Cancelling…' : 'Cancel'}
</Button>
)}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit (consistency): The table row (task-list-table.tsx:212-218) shows a Delete action for terminal tasks via rowActionKind, but the detail header here only renders the Cancel button when isActive and shows nothing for terminal tasks — so a user who opened the detail sheet for a completed / error / cancelled task has to close the sheet and find the row again to delete it. The two surfaces should probably mirror each other.

Cheap follow-up: when isTerminal, render a Delete button in the same slot (or via the same rowActionKind decision) so the detail header and the row action stay symmetric. Not blocking — calling out so the asymmetry doesn't ossify.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants