Skip to content

feat(ci): surface superseded UAT runs (#1274)#1583

Merged
mchmarny merged 1 commit into
NVIDIA:mainfrom
njhensley:ci/uat-superseded-notice
Jul 1, 2026
Merged

feat(ci): surface superseded UAT runs (#1274)#1583
mchmarny merged 1 commit into
NVIDIA:mainfrom
njhensley:ci/uat-superseded-notice

Conversation

@njhensley

Copy link
Copy Markdown
Member

Summary

Adds uat-superseded-notice.yaml, a reactive workflow_run observer that makes a superseded UAT run visible (job-summary entry + ::warning) instead of letting it vanish silently.

Motivation / Context

Fourth and final PR in the DC1 day/night scheduler series (#1274). The reservation lease (uat-run.yaml's uat-<reservation> concurrency group, cancel-in-progress: false) holds at most one in-progress + one pending run; a third contender cancels the older pending run before it starts any job. That run is superseded, not failed — a dropped request that would otherwise leave no trace. This observer surfaces it. Builds only on PR-2's top-level uat-run.yaml (#1569, merged); independent of PR-3 (#1579).

Fixes: N/A
Related: #1274, #1264

Type of Change

  • New feature (non-breaking change that adds functionality)
  • Build/CI/tooling

Component(s) Affected

  • Docs/examples (docs/, examples/)
  • Other: CI workflows (.github/workflows/uat-superseded-notice.yaml)

Implementation Notes

  • Trigger: on: workflow_run: workflows: ["UAT Run"], types: [completed]. Only top-level workflows emit workflow_run, and only from the default branch — so uat-run.yaml (top-level since PR-2) is the right subject, and this observer only runs once merged to main.
  • Classifier: a run cancelled while pending never starts a job. The job reads the triggering run's jobs and treats it as superseded when total_count == 0 or every job has a null started_at; a genuine mid-run cancel (≥1 started job) is ignored. If the jobs API can't be read, it errs toward surfacing rather than silence.
  • Output: a job-summary entry plus a ::warning naming the run and reservation, with a re-dispatch hint.
  • Security: never checks out the triggering run's code; reads only trusted workflow_run.* metadata, passed via env: (never inlined), so a fork-influenced run title can't inject. Needs only permissions: actions: read.
  • Layering: this is the reactive/primary layer. The nightly controller (uat-nightly-batch.yaml, PR-3) reconciles the same signal synchronously for the cells it dispatches; a DC6 guard (DC6 — Reliability + gap-fills #1279) will exercise this observer. The two are complementary.
  • Doc note: the uat.md queuing section now points at this observer instead of deferring to "DC6", and the shipped item is removed from the roadmap. This overlaps the roadmap region PR-3 also edits — whichever merges second takes a one-line rebase.

Testing

# New CI workflow + doc line — no Go source touched.
actionlint .github/workflows/uat-superseded-notice.yaml   # CI-pinned v1.7.11, checksum-verified
yamllint  .github/workflows/uat-superseded-notice.yaml
bash -n   <observer embedded script>
  • actionlint + yamllint clean; bash -n clean; longest line 161 (< 200).
  • Classifier validated against 5 synthetic runs/<id>/jobs payloads: zero-jobs → superseded; all-null-started_at → superseded; one started → not; mixed → not; empty API response → surfaces (fail-toward-visible).
  • Cannot be exercised on this PR: workflow_run observers fire only from the default branch, so it goes live on merge to main; CI here only lints it.

Risk Assessment

  • Low — Isolated, additive single workflow + one doc line; no change to any existing pipeline; easy to revert.

Rollout notes: Inert until merged to main (default-branch-only trigger). No secrets, no checkout, least-privilege token.

Checklist

  • Tests pass locally (make test with -race) — N/A, no Go source changed
  • Linter passes (actionlint + yamllint; no Go → make lint Go gate N/A)
  • I did not skip/disable tests to make CI green
  • I added/updated tests for new functionality — N/A (observer is default-branch-only; a DC6 regression guard, DC6 — Reliability + gap-fills #1279, will exercise it)
  • I updated docs if user-facing behavior changed (docs/contributor/uat.md)
  • Changes follow existing patterns in the codebase (mirrors the repo's workflow_run observers)
  • Commits are cryptographically signed (git commit -S)

@njhensley njhensley requested review from a team as code owners July 1, 2026 20:08
@njhensley njhensley added the theme/ci-dx CI pipelines, developer experience, and build tooling label Jul 1, 2026
@coderabbitai

coderabbitai Bot commented Jul 1, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: fbf5c936-3846-4d77-ace1-a6c07543e51f

📥 Commits

Reviewing files that changed from the base of the PR and between f84f2e5 and 282a065.

📒 Files selected for processing (2)
  • .github/workflows/uat-superseded-notice.yaml
  • docs/contributor/uat.md

📝 Walkthrough

Walkthrough

Changes

This PR adds a new GitHub Actions workflow, “UAT: Superseded Run Notice,” that runs after a cancelled “UAT Run” completion, checks the triggering run’s jobs through the GitHub API, and distinguishes a pending supersede from a genuine mid-run cancellation. For pending supersedes, it writes a step summary and emits a warning using percent-encoded metadata. The contributor UAT documentation is updated to describe this behavior and remove a roadmap item.

Estimated code review effort: 3 (Moderate) | ~20 minutes

Possibly related PRs

  • NVIDIA/aicr#1569: Introduces the “UAT Run” workflow that this new workflow_run observer listens to.

Suggested reviewers: mchmarny

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: surfacing superseded UAT runs in CI.
Description check ✅ Passed The description is directly related to the workflow and docs changes in this pull request.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/uat-superseded-notice.yaml:
- Around line 70-74: Do not classify a workflow run as superseded when the Jobs
API request fails in the superseded-notice workflow. In the
`jobs_json`/`superseded` logic, stop falling back to `'{}'` on `gh api` errors
and instead detect the failure explicitly before running the `jq` check, so
`total_count == 0` only reflects a real empty jobs response. Use the `gh api`
call and the `superseded` assignment in the workflow to locate the fix.
- Around line 62-64: The superseded-run notice is missing the reservation
context, so the redispatch target remains ambiguous. Update the workflow notice
variables in uat-superseded-notice.yaml to include the reservation directly, or
propagate it from the upstream run name if you prefer. Use the existing
RUN_TITLE, RUN_URL, and HEAD_BRANCH handling as the place to add the reservation
so the notice uniquely identifies the intended run.
- Around line 81-94: The `echo "::warning::..."` command in the UAT superseded
notice workflow is vulnerable to command annotation injection if `RUN_TITLE` or
`RUN_URL` contains `%`, CR, or LF. Update the warning construction in
`uat-superseded-notice.yaml` to escape those values before emitting the
`::warning::` command, while leaving the GitHub Step Summary block unchanged;
use the existing `RUN_TITLE` and `RUN_URL` symbols as the inputs to sanitize.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 15519289-719d-497d-a060-ed1625b66a89

📥 Commits

Reviewing files that changed from the base of the PR and between f07962a and f84f2e5.

📒 Files selected for processing (2)
  • .github/workflows/uat-superseded-notice.yaml
  • docs/contributor/uat.md

Comment thread .github/workflows/uat-superseded-notice.yaml
Comment thread .github/workflows/uat-superseded-notice.yaml Outdated
Comment thread .github/workflows/uat-superseded-notice.yaml Outdated
@github-actions

github-actions Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

Add uat-superseded-notice.yaml, a reactive observer that makes a
superseded UAT run visible instead of silent.

The reservation lease (uat-run.yaml's `uat-<reservation>` concurrency
group, cancel-in-progress: false) holds at most one in-progress plus one
pending run; a third contender cancels the older pending run before it
starts any job. That run is superseded, not failed, and would otherwise
vanish without a trace.

The observer triggers on `workflow_run: completed` for "UAT Run" (only
top-level workflows emit workflow_run, and only from the default branch),
classifies a cancelled run that never started a job as a supersede
(total_count == 0, or every job has a null started_at) versus a genuine
mid-run cancel, and emits a job-summary entry plus a ::warning. It reads
only trusted workflow_run.* metadata via env (no injection) and needs
only actions: read.

This is the reactive/primary layer; the nightly controller reconciles the
same signal synchronously for the cells it dispatches, and a DC6 guard
(NVIDIA#1279) will exercise this observer.

Related NVIDIA#1274, NVIDIA#1264.

Signed-off-by: Nathan Hensley <nhensley@nvidia.com>
@njhensley njhensley force-pushed the ci/uat-superseded-notice branch from f84f2e5 to 282a065 Compare July 1, 2026 20:18

@mchmarny mchmarny left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean, well-scoped workflow_run observer that surfaces superseded UAT runs. Security posture is exactly right for this trigger type: no checkout of the triggering run, only trusted workflow_run.* metadata read via env: (never inlined), the ::warning percent-encodes %/CR/LF so a crafted run title can't inject, least-privilege actions: read, repo-gated (matching the repo's existing github.repository == 'nvidia/aicr' convention), timeout + per-run concurrency. The pending-vs-mid-run classifier is sound (total_count == 0 or all started_at null), and it fails toward not crying wolf on a jobs-API fetch error. Docs updated and the DC6 roadmap item retired. actionlint/analyze green; inert until merged to main as expected. LGTM.

@mchmarny mchmarny merged commit 1eb9140 into NVIDIA:main Jul 1, 2026
32 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/ci area/docs size/M theme/ci-dx CI pipelines, developer experience, and build tooling

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants