fix(ci): retry Auto Queue while mergeStateStatus is UNKNOWN, gate on approval and resolved threads #4678

Merged
Yicong-Huang merged 6 commits into apache:main from Yicong-Huang:chore/ci/auto-queue-loose-filter on May 2, 2026

@Yicong-Huang commented May 2, 2026

What changes were proposed in this PR?

Iterates on .github/workflows/auto-queue.yml so it actually picks up the candidate PR after a push to main:

  • Backoff retries on UNKNOWN: query open PRs and sort the eligible ones into BEHIND and UNKNOWN. If any are UNKNOWN, sleep and retry for up to ~120s (delays 0/10/20/30/30/30s). GitHub recomputes mergeStateStatus asynchronously after a base-branch push, so the first scan often sees UNKNOWN for everything. Retrying stops as soon as a BEHIND PR is updated, or as soon as nothing is UNKNOWN and nothing is BEHIND, so the 2-min budget isn't wasted.
  • Hourly cron (schedule: 0 * * * *): catches PRs that go BEHIND between merges (force-push to base, late auto-merge enable, etc.).
  • Tighter eligibility: also require reviewDecision === 'APPROVED' and zero unresolved review threads. Avoids burning CI on PRs that wouldn't merge even with a green build.
  • Verbose logging: per-PR verdict line (skip: <reason> or eligible: mergeable=… state=…), per-attempt grouped output, every updateBranch call with HTTP status, retry/backoff announcements, final summary with elapsed seconds. Each attempt is wrapped in core.startGroup so the Actions UI collapses them.
  • Workflow display name: AutoQueue → Auto Queue.
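The retry loop can be sketched roughly as follows. This is a minimal illustration, not the workflow's actual script: `runWithBackoff`, `scan`, `update` and `sleep` are hypothetical names, and the callbacks are injected so the control flow can be exercised without calling the GitHub API. Only the delay schedule and the stop conditions come from the description; exiting whenever nothing is UNKNOWN follows the later commit note that retries continue "only while at least one eligible PR is UNKNOWN".

```javascript
// Backoff schedule from the PR description: 0/10/20/30/30/30 s (~120 s total).
const DELAYS_S = [0, 10, 20, 30, 30, 30];

// Hypothetical driver. Assumed callbacks:
//   scan()         -> eligible PRs as { number, mergeStateStatus }, oldest first
//   update(number) -> true if updateBranch was accepted
//   sleep(ms)      -> resolves after ms milliseconds
async function runWithBackoff({ scan, update, sleep, delays = DELAYS_S }) {
  for (let attempt = 0; attempt < delays.length; attempt++) {
    if (delays[attempt] > 0) await sleep(delays[attempt] * 1000);

    const prs = await scan();
    const behind = prs.filter((pr) => pr.mergeStateStatus === "BEHIND");
    const unknown = prs.filter((pr) => pr.mergeStateStatus === "UNKNOWN");

    // Update the oldest BEHIND PR that accepts it, then stop retrying.
    for (const pr of behind) {
      if (await update(pr.number)) {
        return { updated: pr.number, attempts: attempt + 1 };
      }
    }
    // Nothing is still UNKNOWN, so another scan is unlikely to find new
    // work: exit instead of spending the rest of the budget.
    if (unknown.length === 0) return { updated: null, attempts: attempt + 1 };
  }
  return { updated: null, attempts: delays.length };
}
```

Inside actions/github-script, `sleep` could simply be `(ms) => new Promise((resolve) => setTimeout(resolve, ms))`.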

Eligibility, in plain terms

A PR is acted on only if all of the following hold:

  • Auto-merge is enabled.
  • Not a draft.
  • Not conflicting.
  • reviewDecision === 'APPROVED'.
  • No unresolved review threads.
  • mergeStateStatus === 'BEHIND' (UNKNOWN triggers retry; CLEAN/BLOCKED/etc. are no-ops).
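One way to express these gates as a single classifier, assuming each PR object carries GitHub's GraphQL `PullRequest` fields (`autoMergeRequest`, `isDraft`, `mergeable`, `reviewDecision`, `reviewThreads`, `mergeStateStatus`) and that all review threads have already been fetched. The `classify` function and its verdict strings are hypothetical.

```javascript
// Apply the gates in order; the first failing gate becomes the skip reason.
function classify(pr) {
  if (!pr.autoMergeRequest) return { verdict: "skip", reason: "auto-merge not enabled" };
  if (pr.isDraft) return { verdict: "skip", reason: "draft" };
  if (pr.mergeable === "CONFLICTING") return { verdict: "skip", reason: "conflicting" };
  if (pr.reviewDecision !== "APPROVED") {
    return { verdict: "skip", reason: `reviewDecision=${pr.reviewDecision}` };
  }
  const unresolved = pr.reviewThreads.nodes.filter((t) => !t.isResolved).length;
  if (unresolved > 0) return { verdict: "skip", reason: `${unresolved} unresolved thread(s)` };

  // Eligible PRs are then bucketed by mergeStateStatus.
  if (pr.mergeStateStatus === "BEHIND") return { verdict: "update" };
  if (pr.mergeStateStatus === "UNKNOWN") return { verdict: "retry" };
  return { verdict: "noop", reason: `state=${pr.mergeStateStatus}` }; // CLEAN, BLOCKED, ...
}
```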

Any related issues, documentation, discussions?

Follow-up to #4672. Same AUTO_MERGE_TOKEN PAT contract.

Originally observed in run 25248773692: the workflow fired ~3s after a push to main, when GitHub had not yet recomputed mergeStateStatus. PR #4652 was a real candidate but came back as UNKNOWN, so the strict filter dropped it and the run logged "No auto-merge PRs need updating". The retry loop here is the targeted fix.

How was this PR tested?

The workflow runs only on push to main, schedule, and workflow_dispatch. Behavior will be observed on the next merge and on the next scheduled run after this lands.

Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.7 (Claude Code)

The AutoQueue workflow fires moments after a push to main, when GitHub
has not yet recomputed mergeStateStatus for open PRs (often UNKNOWN for
several seconds). Filtering on === 'BEHIND' silently dropped genuine
candidates whose state had not yet settled — observed in run 25248773692
where an auto-merge PR was BEHIND but the run logged "No auto-merge
PRs need updating".

Drop the BEHIND filter; iterate eligible auto-merge PRs from oldest and
let updateBranch's response decide whether work was done. On failure,
warn and continue to the next.
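This commit's interim strategy (later replaced by the reinstated BEHIND filter plus UNKNOWN retries) might look roughly like this inside actions/github-script, where `github` is the authenticated Octokit client and `core` is `@actions/core`. `updateFirstAccepted` is a hypothetical name, and stopping at the first accepted update is an assumption about what "let updateBranch's response decide" means.

```javascript
// Walk eligible auto-merge PRs oldest-first (no BEHIND filter) and let
// updateBranch decide; `prs` is assumed pre-filtered and sorted by creation.
async function updateFirstAccepted({ github, core, owner, repo, prs }) {
  for (const pr of prs) {
    try {
      const res = await github.rest.pulls.updateBranch({ owner, repo, pull_number: pr.number });
      core.info(`#${pr.number}: updateBranch -> HTTP ${res.status}`);
      return pr.number; // work was done; stop here
    } catch (err) {
      // Any API error: warn and move on to the next candidate.
      core.warning(`#${pr.number}: updateBranch failed (${err.status ?? "?"}): ${err.message}`);
    }
  }
  return null;
}
```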
github-actions bot added the ci label (changes related to CI) May 2, 2026
@codecov-commenter commented May 2, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 46.15%. Comparing base (34a00cc) to head (f940b88).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #4678      +/-   ##
============================================
- Coverage     46.15%   46.15%   -0.01%     
  Complexity     1993     1993              
============================================
  Files          1013     1013              
  Lines         38165    38165              
  Branches       3712     3712              
============================================
- Hits          17616    17615       -1     
  Misses        19774    19774              
- Partials        775      776       +1     
| Flag | Coverage Δ |
|---|---|
| agent-service | 28.73% <ø> (ø) |
| frontend | 35.28% <ø> (ø) |
| python | 85.05% <ø> (ø) |
| scala | 38.15% <ø> (-0.01%) ⬇️ |

Flags with carried forward coverage won't be shown.

Yicong Huang and others added 2 commits May 2, 2026 02:43
- Retry the scan-and-update cycle up to 6 times with backoffs summing
  to ~120s. mergeable / mergeStateStatus are computed asynchronously
  and can be UNKNOWN right after a base-branch push, so a single early
  scan misses real candidates.
- Add 'schedule: 0 * * * *' so PRs that become BEHIND between merges
  (force-push to base, late auto-merge enable) still get advanced.
- Log every scanned PR with its verdict (skip reason or eligibility +
  current state), every updateBranch attempt with HTTP status, and
  per-attempt summary so the run history is enough to debug from.
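A sketch of the per-PR verdict lines, collapsible per-attempt groups, and the final elapsed-time summary. `core.startGroup`, `core.endGroup` and `core.info` are real `@actions/core` functions; `verdictLine`, `logAttempt` and `summaryLine` are hypothetical helpers, and the exact wording only approximates the format described in the PR.

```javascript
// One log line per scanned PR: either the skip reason or its current state.
function verdictLine(pr, skipReason) {
  if (skipReason) return `#${pr.number} skip: ${skipReason}`;
  return `#${pr.number} eligible: mergeable=${pr.mergeable} state=${pr.mergeStateStatus}`;
}

// Wrap one scan attempt in a collapsible group in the Actions log.
function logAttempt(core, attempt, prs, skipReasonOf) {
  core.startGroup(`Attempt ${attempt}: ${prs.length} open PR(s)`);
  try {
    for (const pr of prs) core.info(verdictLine(pr, skipReasonOf(pr)));
  } finally {
    core.endGroup(); // close the group even if logging throws
  }
}

// Final summary with elapsed seconds (rounded).
function summaryLine(startMs, nowMs, updated) {
  const secs = Math.round((nowMs - startMs) / 1000);
  return updated ? `Updated #${updated} after ${secs}s` : `No PR updated after ${secs}s`;
}
```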
@Yicong-Huang Yicong-Huang enabled auto-merge (squash) May 2, 2026 09:45
@Yicong-Huang Yicong-Huang requested a review from aglinxinyuan May 2, 2026 09:46
…lity

Only update PRs that would actually merge once CI passes: add
reviewDecision === 'APPROVED' and zero unresolved review threads to the
eligibility check. Skips with a specific log line so it's clear why a
PR was passed over.

Avoids burning CI on PRs blocked on review.
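The approval and thread gates need `reviewDecision` and each thread's `isResolved`, both exposed by GitHub's GraphQL API. A possible single-round-trip query and a thread counter follow; the page sizes (50 PRs, 100 threads) are assumptions, and a PR with more than 100 review threads would need pagination that this sketch omits.

```javascript
// Fetch everything the eligibility gates need, oldest PRs first.
const ELIGIBILITY_QUERY = `
  query($owner: String!, $repo: String!) {
    repository(owner: $owner, name: $repo) {
      pullRequests(states: OPEN, first: 50, orderBy: {field: CREATED_AT, direction: ASC}) {
        nodes {
          number
          isDraft
          mergeable
          mergeStateStatus
          reviewDecision
          autoMergeRequest { enabledAt }
          reviewThreads(first: 100) { nodes { isResolved } }
        }
      }
    }
  }`;

// Number of review threads still open on one PR node from the query above.
function unresolvedThreads(pr) {
  return pr.reviewThreads.nodes.filter((t) => !t.isResolved).length;
}

// In actions/github-script (not executed here):
//   const { repository } = await github.graphql(ELIGIBILITY_QUERY, context.repo);
```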
Yicong Huang added 2 commits May 2, 2026 02:48
Reinstate the BEHIND filter: only act on PRs whose head is genuinely out
of date with main. Don't waste a write on a PR that's already CLEAN or
otherwise blocked.

Backoff retries are now narrowly scoped: continue only while at least
one eligible PR is UNKNOWN (i.e., GitHub is still recomputing state
after a base-branch push). If all eligible PRs have settled and none
are BEHIND, exit immediately rather than burning the full ~2min budget.
@Yicong-Huang Yicong-Huang changed the title fix(ci): drop strict BEHIND filter in AutoQueue, iterate candidates fix(ci): retry Auto Queue while mergeStateStatus is UNKNOWN, gate on approval and resolved threads May 2, 2026
@Yicong-Huang Yicong-Huang merged commit 254faf8 into apache:main May 2, 2026
18 checks passed
Yicong-Huang added a commit that referenced this pull request May 3, 2026
…ight (#4845)

### What changes were proposed in this PR?

Two changes to `.github/workflows/auto-queue.yml`.

**1. More triggers, shorter cron.** Add the `pull_request
{auto_merge_enabled, ready_for_review}`, `pull_request_review
{submitted}`, and `workflow_run {Required Checks completed}` triggers,
and shorten the cron interval from hourly to every 5 min. Each event
covers a state change that previously waited up to an hour for cron.
Most notably, `workflow_run completed` lets us bump the next PR the
moment the head PR's CI finishes (pass or fail).

**2. In-flight guard.** GraphQL now pulls `statusCheckRollup.state`. If
any eligible PR has `mergeStateStatus != BEHIND` and CI is `PENDING /
EXPECTED / SUCCESS`, the run exits without bumping anyone — the queue
head is already moving, and preempting CI on another PR would just force
a re-bump after the head merges. `BEHIND + PENDING` does not count (CI
runs on pre-update code), and `BLOCKED + FAILURE/ERROR` does not count
(won't auto-merge, release the guard).
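The in-flight guard reduces to one predicate over the eligible PRs. A minimal sketch, assuming each PR object has been flattened to carry `mergeStateStatus` plus a hypothetical `ciState` field holding the head commit's `statusCheckRollup.state` (null when nothing has reported); `inFlight` is a hypothetical name.

```javascript
// CI states under which the queue head is still "moving".
const MOVING_CI = new Set(["PENDING", "EXPECTED", "SUCCESS"]);

// True if some eligible PR is already progressing, so nobody should be bumped.
// BEHIND PRs never count: their CI is running on pre-update code.
// FAILURE/ERROR never count: those PRs won't auto-merge.
function inFlight(eligiblePrs) {
  return eligiblePrs.some(
    (pr) => pr.mergeStateStatus !== "BEHIND" && MOVING_CI.has(pr.ciState),
  );
}
```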

### Any related issues, documentation, discussions?

Tracking issue: #4553. Builds on #4672 (initial workflow) and #4678
(UNKNOWN retry + eligibility gates).

### How was this PR tested?

Workflow YAML parses clean. Decision logic is exercised by every trigger
after merge — no separate test harness for this workflow today.

### Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.7, 1M context)

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Yicong-Huang added a commit that referenced this pull request May 3, 2026
### What changes were proposed in this PR?

Add an `emergency` label fast-path to Auto Queue. A PR with this label
is bumped before any non-emergency PR regardless of CREATED_AT, and its
presence in BEHIND bypasses the in-flight guard so a non-emergency PR's
running CI doesn't delay the bump. Within each priority class
CREATED_AT-ASC ordering is preserved.

Eligibility gates (auto-merge / not draft / not conflicting / APPROVED /
threads resolved) still apply — this only reorders the bump, it does not
bypass review. Label name is set by the `EMERGENCY_LABEL` constant
(one-line change if `priority/P0` or similar is preferred later).
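The reordering amounts to a stable partition on the label. A sketch assuming labels have been flattened to an array of names per PR (GraphQL returns them as `labels.nodes[].name`) and that the input is already sorted by CREATED_AT ascending; `prioritize` and `bypassesGuard` are hypothetical names, while `EMERGENCY_LABEL` mirrors the constant named in the commit message.

```javascript
// Label name; change it here if e.g. priority/P0 is preferred.
const EMERGENCY_LABEL = "emergency";

// Assumed shape: { number, labels: string[], mergeStateStatus }.
const isEmergency = (pr) => pr.labels.includes(EMERGENCY_LABEL);

// Stable partition: emergency PRs first; Array.prototype.filter preserves
// order, so each class keeps its CREATED_AT-ASC ordering.
function prioritize(prs) {
  return [...prs.filter(isEmergency), ...prs.filter((pr) => !isEmergency(pr))];
}

// An emergency PR that is BEHIND skips the in-flight guard.
function bypassesGuard(prs) {
  return prs.some((pr) => isEmergency(pr) && pr.mergeStateStatus === "BEHIND");
}
```

With the commit's unit-test input `[#100 docs, #101 emergency, #102 plain, #103 emergency+fix]`, `prioritize` yields PRs in the order 101, 103, 100, 102.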

### Any related issues, documentation, discussions?

Builds on #4672, #4678, #4845.

### How was this PR tested?

`yaml.safe_load` parses; `node --check` parses the wrapped script body.
Unit test on the partition logic: `[#100 docs, #101 emergency, #102
plain, #103 emergency+fix]` → `[101, 103, 100, 102]`.

### Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.7, 1M context)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>