Fix/8787 cancel pipeline keeps running#8832

Merged
klesh merged 2 commits into apache:main from danielemoraschi:fix/8787-cancel-pipeline-keeps-running
Apr 18, 2026

Conversation

@danielemoraschi
Contributor

Summary

Clicking the cancel button in the UI (which fires DELETE /api/pipelines/:id) always returns "Operation successfully completed" (HTTP 200), but the pipeline keeps running. It was observed to continue for 30+ minutes after the cancel request before stopping on its own.

Three independent bugs were causing this:

  1. CancelPipeline silently discarded errors from CancelTask (pipeline.go). The cancel-func was never invoked for running tasks, and pending tasks were not batch-updated in the DB. Fixed by classifying tasks into running vs pending, then delegating to cancelRunningTasks (triggers context cancellation) and cancelPendingTasksInDB (batch DB update with status guard).

  2. gitextractor ignored context cancellation in storeRepoSnapshot (repo_gogit.go). gogit.Blame() has no context parameter, so once a blame started it ran to completion. Added select { case <-ctx.Done() } checks at the top of the outer commit loop and inner patch.Stats() loop so cancellation is respected between iterations.

  3. Cancelled tasks were recorded as TASK_FAILED (run_task.go, pipeline_runner.go). The deferred status-update block didn't distinguish cancellation from failure. Now detects cancellation via errors.Is(err, context.Canceled) || ctx.Err() == context.Canceled, sets status to TASK_CANCELLED, and skips writing failed_sub_task. ComputePipelineStatus also now returns TASK_CANCELLED when the pipeline was cancelled, taking priority over other statuses.

Additional correctness improvements:

  • cancelPendingTasksInDB includes TASK_RESUME in the status filter so tasks awaiting restart recovery are also cancelled
  • cancelRunningTasks uses err.GetType() == errors.NotFound instead of .As(errors.NotFound) to avoid walking the entire error chain
  • SkipOnFail guard in run_task.go reuses the isCancelled variable instead of recomputing
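As a rough sketch of the running-versus-pending split and the status filter, assuming a simplified `Task` type: the status names are the ones mentioned in this PR, but the `classifyTasks` helper, the `Task` struct, and the string values are assumptions for illustration, not the code in pipeline.go.

```go
package main

import "fmt"

// Status names mirror those mentioned in this PR; the string values and
// the Task type are simplified assumptions for illustration only.
const (
	TaskCreated   = "TASK_CREATED"
	TaskResume    = "TASK_RESUME"
	TaskRunning   = "TASK_RUNNING"
	TaskCompleted = "TASK_COMPLETED"
)

type Task struct {
	ID     uint64
	Status string
}

// classifyTasks (hypothetical) splits tasks into those that need context
// cancellation (running) and those that can be cancelled with a guarded
// batch DB update (pending). Tasks in any other status are left alone.
func classifyTasks(tasks []Task) (running, pending []uint64) {
	for _, t := range tasks {
		switch t.Status {
		case TaskRunning:
			running = append(running, t.ID)
		case TaskCreated, TaskResume:
			// TASK_RESUME is included so tasks awaiting restart
			// recovery are cancelled as well.
			pending = append(pending, t.ID)
		}
	}
	return running, pending
}

func main() {
	tasks := []Task{
		{ID: 1, Status: TaskCompleted},
		{ID: 2, Status: TaskRunning},
		{ID: 3, Status: TaskCreated},
		{ID: 4, Status: TaskResume},
	}
	running, pending := classifyTasks(tasks)
	fmt.Println("running:", running, "pending:", pending)
}
```

Restricting the pending set by status is what keeps already completed tasks from being overwritten by the batch update.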

Does this close any open issues?

Closes #8787

Screenshots

N/A

Other Information

Tests:

  • Added cancellation cases to TestComputePipelineStatus that verify TASK_CANCELLED is returned when isCancelled=true, regardless of individual task statuses
  • Added TestCancelPipeline with 3 subtests: cancels all pending tasks, leaves completed tasks unchanged, returns error for non-existent pipeline
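The kind of table-driven check the TestComputePipelineStatus cases describe might look like the sketch below. `computeStatus` is a hypothetical, heavily simplified stand-in for ComputePipelineStatus (the real function has a different signature and more outcomes); only the "cancellation takes priority" ordering is meant to be illustrated.

```go
package main

import "fmt"

// computeStatus is a hypothetical, simplified stand-in for the real
// ComputePipelineStatus. It assumes only three outcomes.
func computeStatus(taskStatuses []string, isCancelled bool) string {
	if isCancelled {
		// User cancellation takes priority over every task status.
		return "TASK_CANCELLED"
	}
	for _, s := range taskStatuses {
		if s == "TASK_FAILED" {
			return "TASK_FAILED"
		}
	}
	return "TASK_COMPLETED"
}

func main() {
	cases := []struct {
		name        string
		statuses    []string
		isCancelled bool
		want        string
	}{
		{"cancelled, all completed", []string{"TASK_COMPLETED", "TASK_COMPLETED"}, true, "TASK_CANCELLED"},
		{"cancelled, one failed", []string{"TASK_COMPLETED", "TASK_FAILED"}, true, "TASK_CANCELLED"},
		{"not cancelled, one failed", []string{"TASK_COMPLETED", "TASK_FAILED"}, false, "TASK_FAILED"},
		{"not cancelled, all completed", []string{"TASK_COMPLETED"}, false, "TASK_COMPLETED"},
	}
	for _, c := range cases {
		got := computeStatus(c.statuses, c.isCancelled)
		fmt.Printf("%-30s got=%s ok=%t\n", c.name, got, got == c.want)
	}
}
```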

…he#8787)

Three independent bugs caused pipeline cancellation to silently fail:

1. CancelPipeline discarded errors from CancelTask and never cancelled
   TASK_CREATED tasks in future stages. Now running tasks are cancelled
   via context, non-running tasks are batch-updated to TASK_CANCELLED in
   the DB, and errors are logged and returned.

2. gitextractor's storeRepoSnapshot (go-git path) had no ctx.Done()
   checks in its commit/blame loops, making it unresponsive to
   cancellation for 30+ minutes on large repos. Added cancellation
   checkpoints following the pattern already used elsewhere in the file.

3. Cancelled tasks were marked TASK_FAILED instead of TASK_CANCELLED,
   and ComputePipelineStatus never returned TASK_CANCELLED. Now RunTask
   checks for context cancellation and writes TASK_CANCELLED, and
   ComputePipelineStatus returns TASK_CANCELLED when the pipeline was
   cancelled by the user.

Test gaps: RunTask deferred status logic, CancelPipeline flow, and
storeRepoSnapshot have no existing unit tests. These are pre-existing
gaps not introduced by this change. The only existing test,
TestComputePipelineStatus, has been extended to cover isCancelled=true.

Closes apache#8787
Related: apache#5585, apache#4188

Extract cancelRunningTasks and cancelPendingTasksInDB helpers from
CancelPipeline for better separation of concerns. Also fixes:
- .As(NotFound) replaced with .GetType() to prevent swallowing wrapped errors
- Added TASK_RESUME to cancelPendingTasksInDB status filter
- Error message now includes pipeline ID for traceability
- Added TestCancelPipeline e2e tests with 3 subtests
@dosubot (bot) added the size:M, pr-type/bug-fix, priority/high, and severity/p0 labels on Apr 13, 2026
Contributor

@klesh klesh left a comment

LGTM
Thanks for your contribution.

@klesh klesh merged commit 9ae723f into apache:main Apr 18, 2026
14 checks passed

Labels

  • pr-type/bug-fix: This PR fixes a bug
  • priority/high: This issue is very important
  • severity/p0: This bug blocks key user journey and function
  • size:M: This PR changes 30-99 lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

[Bug][Pipeline][Gitextractor] Cancel pipeline returns 200 OK but pipeline keeps running

2 participants