feat: Cancel pending deployments on successful production deployment #1124

Merged
BenjaminMichaelis merged 4 commits into main from benjaminmichaelis/check-issue-149
May 17, 2026

Conversation

@BenjaminMichaelis
Member

Description

Implements automatic cancellation of pending GitHub Actions workflow runs when a production deployment succeeds. This resolves the issue where multiple concurrent deployments could pile up and waste CI/CD resources.

Why This Matters

When developers push code frequently, GitHub Actions queues up multiple workflow runs. Previously, if older code deployed to production while newer code was still building, both would complete—wasting resources and potentially causing confusion about which version is deployed.

With this change, once a production deployment succeeds (confirmed by the smoke test), any pending workflow runs are automatically cancelled, ensuring only necessary CI/CD work runs.

How It Works

  1. Developer pushes commits → GitHub Actions triggers workflow
  2. Build, test, and dev deployment stages complete
  3. Production deployment begins
  4. Smoke test validates the deployment is healthy
  5. [NEW] Cancel pending workflow runs step:
    • Lists all queued runs of the same workflow
    • Skips the current run (prevents self-cancellation)
    • Cancels each queued run with proper error handling
    • Reports cancellation count in logs
  6. Git tag marks successful production deployment
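
The new step in item 5 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code merged in this PR: `cancelQueuedRuns` and `makeFakeGithub` are hypothetical names, and the `github` argument stands in for the Octokit client that actions/github-script provides. Only `listWorkflowRuns` and `cancelWorkflowRun` are real REST methods. Pagination is left out here for brevity.

```javascript
// Hypothetical sketch of the "Cancel pending workflow runs" step.
// `github` mimics the Octokit client shape exposed by actions/github-script.
async function cancelQueuedRuns({ github, owner, repo, workflowId, currentRunId, log = console.log }) {
  let cancelled = 0;
  try {
    // List queued runs of the same workflow (first page only in this sketch).
    const { data } = await github.rest.actions.listWorkflowRuns({
      owner, repo, workflow_id: workflowId, status: 'queued', per_page: 100,
    });
    for (const run of data.workflow_runs) {
      if (run.id === currentRunId) {
        log(`Skipping current run ${run.id}`);
        continue;
      }
      try {
        await github.rest.actions.cancelWorkflowRun({ owner, repo, run_id: run.id });
        cancelled++;
        log(`Cancelled run ${run.id} (${run.head_sha} on ${run.head_branch})`);
      } catch (err) {
        // A single failed cancellation should not stop the loop.
        log(`Could not cancel run ${run.id}: ${err.message}`);
      }
    }
  } catch (err) {
    // Outer guard: a failure to list runs must not fail the deployment.
    log(`Could not list queued runs: ${err.message}`);
  }
  log(`Cancelled ${cancelled} queued run(s)`);
  return cancelled;
}

// Hypothetical in-memory stand-in for the API, for trying the function locally.
// Cancelling run 3 throws to simulate a transient API failure.
function makeFakeGithub(runs) {
  return {
    rest: { actions: {
      listWorkflowRuns: async () => ({ data: { total_count: runs.length, workflow_runs: runs } }),
      cancelWorkflowRun: async ({ run_id }) => {
        if (run_id === 3) throw new Error('simulated transient failure');
        return { status: 202 };
      },
    } },
  };
}
```

In a real workflow the arguments would presumably come from `context.repo.owner`, `context.repo.repo`, and `context.runId`. The function returns the cancellation count instead of throwing, so a later tagging step is not affected by API errors.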

Key Features

  • ✅ Only cancels after successful deployment validation (post-smoke test)
  • ✅ Safe pagination for handling 100+ queued runs
  • ✅ Multiple levels of error handling (transient API failures don't block deployment)
  • ✅ Comprehensive logging with commit info for debugging
  • ✅ Uses GitHub context variables (resilient to config changes)
  • ✅ Excludes current run from cancellation

Code Review History

This implementation was reviewed by independent agents (GPT 5.5 and Opus 4.6) and addresses all critical findings:

  • ✅ Replaced hardcoded workflow filename with context.workflow (resilient to renames)
  • ✅ Added outer try-catch around API calls (deployment succeeds even if cancellation fails)
  • ✅ Fixed pagination logic for correctness
  • ✅ Added comprehensive logging for observability
  • ✅ Validated YAML syntax

Testing Recommendations

  • Push two commits rapidly; verify the first run gets cancelled when the second deploys
  • Monitor logs for proper commit info in cancellation messages
  • Verify deployment completion is not blocked by API failures

Closes #149

Implement automatic cancellation of pending workflow runs when a production
deployment succeeds. This prevents multiple redundant deployments and saves
CI/CD resources.

Changes:
- Add 'actions: write' permission to workflow and deploy-production job
- New 'Cancel pending deployments' step in deploy-production job
- Runs after smoke test to ensure only successful deployments trigger cancellation
- Implements pagination for large numbers of queued runs
- Includes error handling with try-catch for API resilience
- Detailed logging showing commit SHA and branch for each cancelled run

Addresses issue #149

Address blocking issues identified by GPT 5.5 and Opus 4.6 code reviews:

Critical Fixes:
- Replace hardcoded 'Build-Test-And-Deploy.yml' with context.workflow
  Eliminates maintenance trap and silent failure mode on file renames

- Add outer try-catch around listWorkflowRuns API call
  Prevents transient API errors from blocking production deployment tagging

- Fix pagination check: runs.length < 100 instead of hasMore flag
  Clearer logic, prevents extra empty page fetch

- Add debug logging for skipped current run
  Improves auditability and debugging

Improvements:
- Better comments explaining purpose and context
- Improved error messages with action suggestions
- More detailed logging with created_at timestamp
- Outer error handling preserves deployment success despite API issues

Note: Remaining consideration from review - filtering by commit age could
be added in future to only cancel older commits (not newer ones).
This is a design choice pending business requirements clarification.

Copilot AI review requested due to automatic review settings May 17, 2026 14:31
Contributor

Copilot AI left a comment

Pull request overview

Adds post-deployment cleanup to the production deployment workflow so that, after a successful smoke test, older queued workflow runs are cancelled to reduce wasted CI/CD work and avoid deployment confusion.

Changes:

  • Grants actions: write permission to enable cancelling workflow runs via the GitHub Actions API.
  • Adds a GitHub Script step after the production smoke test to enumerate and cancel queued runs of the same workflow.
  • Logs cancellation attempts and continues deployment even if cancellation fails.
Comments suppressed due to low confidence (2)

.github/workflows/Build-Test-And-Deploy.yml:251

  • workflow_id for listWorkflowRuns must be a workflow numeric id or the workflow file name/path (e.g. Build-Test-And-Deploy.yml), but context.workflow is the workflow name (here: "Build, Test, and Deploy EssentialCSharp.Web"). This will return 404/no results and prevent cancellations. Consider resolving the workflow id via listRepoWorkflows (matching by context.workflow) or parsing the workflow file from GITHUB_WORKFLOW_REF/context.workflowRef.
                  owner: context.repo.owner,
                  repo: context.repo.repo,
                  workflow_id: context.workflow,
                  status: 'queued',
                  per_page: 100,
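
One way to act on the suggestion above is to parse the file name out of the workflow ref. The sketch below is illustrative only: `workflowFileFromRef` is a hypothetical helper, and it assumes the ref has the documented `owner/repo/.github/workflows/file.yml@ref` form of `GITHUB_WORKFLOW_REF` and that the file name itself contains no `@`.

```javascript
// Hypothetical helper: derive the workflow file name from a workflow ref such as
// "octo-org/octo-repo/.github/workflows/ci.yml@refs/heads/main".
// Returns null when the input does not look like a workflow file path.
function workflowFileFromRef(workflowRef) {
  if (typeof workflowRef !== 'string' || workflowRef.length === 0) return null;
  // Drop the "@refs/..." suffix, if present.
  const at = workflowRef.indexOf('@');
  const path = at === -1 ? workflowRef : workflowRef.slice(0, at);
  // Keep only the last path segment (the file name).
  const file = path.slice(path.lastIndexOf('/') + 1);
  return /\.ya?ml$/.test(file) ? file : null;
}
```

The result could then be passed as `workflow_id`, for example `workflowFileFromRef(process.env.GITHUB_WORKFLOW_REF)`. A display name such as the one quoted in the comment yields `null`, which makes the mismatch visible instead of producing a 404.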

.github/workflows/Build-Test-And-Deploy.yml:283

  • Pagination here is vulnerable to skipping runs because the script cancels runs while paginating by page. Since cancelling changes each run’s status from queued, the queued-run list shrinks and later items can shift to earlier pages; incrementing page can then miss remaining queued runs (especially when there are >100). Prefer collecting all queued run ids across pages first (read-only), or repeatedly fetching page: 1 until no queued runs remain.
                // Last page has fewer results than requested (pagination)
                if (runs.length < 100) {
                  break;
                }
                page++;
              }
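
The first alternative mentioned in this comment, collecting ids read-only and cancelling afterwards, might look like the sketch below. All function names are hypothetical, and `makeShrinkingFake` is an in-memory stand-in whose queued list shrinks on each cancellation, which is the behavior the comment is concerned about.

```javascript
// Hypothetical sketch: read every page of queued runs first, then cancel.
async function collectQueuedRunIds({ github, owner, repo, workflowId, perPage = 100 }) {
  const ids = [];
  for (let page = 1; ; page++) {
    const { data } = await github.rest.actions.listWorkflowRuns({
      owner, repo, workflow_id: workflowId, status: 'queued', per_page: perPage, page,
    });
    ids.push(...data.workflow_runs.map((run) => run.id));
    if (data.workflow_runs.length < perPage) break; // a short page is the last page
  }
  return ids;
}

async function cancelCollected({ github, owner, repo, workflowId, currentRunId, perPage }) {
  // Nothing is cancelled until the full list is known, so pages cannot shift.
  const ids = await collectQueuedRunIds({ github, owner, repo, workflowId, perPage });
  let cancelled = 0;
  for (const run_id of ids) {
    if (run_id === currentRunId) continue; // never cancel the running deployment
    try {
      await github.rest.actions.cancelWorkflowRun({ owner, repo, run_id });
      cancelled++;
    } catch (err) {
      console.log(`Could not cancel run ${run_id}: ${err.message}`);
    }
  }
  return cancelled;
}

// Hypothetical fake API: cancelling a run removes it from the queued list.
function makeShrinkingFake(count) {
  const queued = Array.from({ length: count }, (_, i) => ({ id: i + 1 }));
  return {
    queued,
    rest: { actions: {
      listWorkflowRuns: async ({ per_page, page }) => {
        const start = (page - 1) * per_page;
        return { data: { total_count: queued.length, workflow_runs: queued.slice(start, start + per_page) } };
      },
      cancelWorkflowRun: async ({ run_id }) => {
        const i = queued.findIndex((r) => r.id === run_id);
        if (i !== -1) queued.splice(i, 1);
        return { status: 202 };
      },
    } },
  };
}
```

The trade-off is that the id list is a snapshot: runs queued after collection are not seen, and runs that have already started by the time they are cancelled may be handled differently by the API.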

Address two critical issues identified in PR review:

1. Fix API Response Handling (Comment 3254800774)
   GitHub Actions API returns response object with data.workflow_runs array,
   not an array directly. Updated all affected locations (lines 253, 255, 259, 279)
   to correctly destructure: const runs = data.workflow_runs;
   This fixes pagination and iteration logic that would have failed silently.

2. Remove Over-Scoped Workflow Permissions (Comment 3254800781)
   Removed actions: write from workflow-level permissions to follow least-privilege.
   Only deploy-production job needs this permission; it remains at job level.
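
To illustrate item 1, here is a small sketch of the response shape. The object below is an abbreviated, made-up example of what Octokit returns for `listWorkflowRuns`; the run ids are invented for illustration.

```javascript
// Abbreviated, invented example of a listWorkflowRuns response.
const response = {
  status: 200,
  data: {
    total_count: 2,
    workflow_runs: [
      { id: 101, status: 'queued' },
      { id: 102, status: 'queued' },
    ],
  },
};

// Treating the payload itself as the array fails quietly: `length` is undefined,
// so a last-page check such as `undefined < 100` is always false.
const wrongLength = response.data.length;

// Destructuring the array out of the payload gives the intended values.
const { data } = response;
const runs = data.workflow_runs;
const isLastPage = runs.length < 100;
```

No exception is thrown in the incorrect case, which matches the "failed silently" description in the commit message.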
@BenjaminMichaelis BenjaminMichaelis merged commit 2ca5e21 into main May 17, 2026
8 checks passed
@BenjaminMichaelis BenjaminMichaelis deleted the benjaminmichaelis/check-issue-149 branch May 17, 2026 15:35
Development

Successfully merging this pull request may close these issues.

On successful deployment to prod, cancel pending deployments

2 participants