
Workflow cancellation from runs list leaves downstream needs job permanently queued when using matrix jobs + if: always() #4411

@1minds3t

Description


Title:
GitHub Actions “Cancel workflow” from runs list leaves dependent jobs stuck queued/pending until canceled again from inside run page

Describe the bug
Canceling a workflow from the main Actions runs list page does not fully cancel all queued/dependent jobs in workflows that combine large matrices with needs: dependencies.

The workflow appears canceled, and many running jobs terminate correctly, but downstream jobs (especially aggregation/finalizer jobs using needs: + if: always()) remain stuck indefinitely in a queued/pending state.

The only way to fully terminate the workflow is:

  1. Open the workflow run itself
  2. Click the red “Cancel workflow” button again from inside the run page

After doing that, the stuck queued job immediately cancels.

This behavior is reproducible and leaves “zombie” queued jobs alive even after the workflow is supposedly canceled.

To Reproduce

  1. Create a workflow with:

    • Large matrix builds
    • needs: dependencies
    • A final collector job using if: always()
  2. Start the workflow.
  3. Cancel it from the main Actions runs page (https://github.com/<repo>/actions).
  4. Observe:

    • Running jobs get canceled
    • Some dependent/final jobs remain queued forever
  5. Open the workflow run page directly.
  6. Click “Cancel workflow” again from inside the run page.
  7. Observe that the queued job cancels immediately.
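
For anyone trying to reproduce this, a minimal workflow with the same shape might look like the sketch below. This is not the workflow from the affected run: the workflow name, job names, matrix values, and the sleep step are all hypothetical placeholders, chosen only to give a matrix large enough that some jobs are still queued when the cancel is issued.

```yaml
# Hypothetical minimal reproduction; every name and value here is illustrative.
name: cancel-repro
on: workflow_dispatch

jobs:
  build:
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
        variant: [1, 2, 3, 4, 5, 6, 7, 8]
    steps:
      # Long enough to leave time to cancel from the runs list page.
      - run: sleep 600
        shell: bash

  collect:
    # Collector job gated by always(), depending on the whole matrix.
    if: always()
    needs: build
    runs-on: ubuntu-latest
    steps:
      - run: echo "collecting artifacts"
```

Trigger it manually, then cancel from the runs list while part of the matrix is still running.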

Expected behavior
Canceling a workflow from the Actions runs list page should fully cancel:

  • running jobs
  • queued jobs
  • dependency jobs
  • downstream needs: jobs
  • jobs gated by if: always()

No jobs should remain pending/queued after the workflow is canceled.

Actual behavior
The workflow shows as canceled, but downstream jobs remain indefinitely queued with messages like:

Waiting for a runner to pick up this job...

or

All GitHub-hosted runners with label [ubuntu-latest] are busy

even though the workflow itself has already been canceled.

The queued job only terminates after opening the workflow page and canceling again from there.

Runner Version and Platform

  • GitHub-hosted runners

  • ubuntu-latest

  • Also observed with:

    • macOS runners
    • Windows runners
  • Using matrix jobs + QEMU + dependent collector jobs

What's not working?
The cancellation propagation appears inconsistent between:

  • workflow-list-page cancel action
    vs
  • in-run-page cancel action

The list-page cancel only partially cancels the workflow graph.

Example affected workflow run
[Workflow Run Example](https://github.com/1minds3t/uv-ffi/actions/runs/25613187934)

Example stuck job state after cancellation:

Collect all extended wheels
Started 6s ago

Evaluating collect-wheels.if
Evaluating: always()
Result: true

All GitHub-hosted runners with label [ubuntu-latest] are busy.

Waiting for a runner to pick up this job...

Meanwhile the workflow simultaneously reports:

The run was canceled by @1minds3t.

Additional Notes
The stuck job in this case was a collector/finalizer job:

collect-wheels:
  if: always()
  needs:
    - build-linux-glibc-2_17
    - build-linux-musl
    - build-linux-riscv64
    - build-macos-universal2
    - build-windows-i686
    - build-cp37-linux

This may indicate cancellation state is not properly propagated through the dependency graph when canceled from the workflow list UI.
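
One thing that may be relevant here: the GitHub Actions documentation states that always() evaluates to true even when a run is canceled, and suggests !cancelled() for jobs that should run after failures but not after cancellation. As a possible workaround, the collector could be gated like the sketch below. I have not verified that this avoids the stuck queued job; it is an assumption based on the documented behavior of the two functions.

```yaml
collect-wheels:
  # !cancelled() is true whether the needed jobs succeed or fail,
  # but false once the run has been canceled (untested as a fix for this bug).
  # The ${{ }} wrapper is required because "!" is reserved in YAML.
  if: ${{ !cancelled() }}
  needs:
    - build-linux-glibc-2_17
    - build-linux-musl
    - build-linux-riscv64
    - build-macos-universal2
    - build-windows-i686
    - build-cp37-linux
```

Even if this works around the problem, the two cancel buttons behaving differently would still be a bug.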

The workflow used large matrices and mixed runner platforms:

  • Linux
  • macOS
  • Windows
  • QEMU cross-arch jobs
  • continue-on-error
  • fail-fast: false
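
To make that combination concrete, one of the cross-arch build jobs has roughly the shape sketched below. The job name, matrix values, and build step are hypothetical, and docker/setup-qemu-action is used here only as an assumption about how QEMU emulation gets registered; the actual workflow may do this differently.

```yaml
# Hypothetical job showing the option mix, not copied from the real workflow.
build-linux-cross:
  runs-on: ubuntu-latest
  # A failing emulated build does not fail the run as a whole.
  continue-on-error: true
  strategy:
    # Keep sibling matrix jobs running when one of them fails.
    fail-fast: false
    matrix:
      arch: [arm64, riscv64, ppc64le, s390x]
  steps:
    - uses: actions/checkout@v4
    # Assumed QEMU setup; registers binfmt handlers for the target arch.
    - uses: docker/setup-qemu-action@v3
      with:
        platforms: ${{ matrix.arch }}
    - run: echo "build wheel for ${{ matrix.arch }}"
```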

This is potentially related to a race condition between queued dependency resolution and workflow cancellation propagation.

Labels: bug (Something isn't working)
