fix(ci): expand Auto Queue triggers and skip when queue head is in flight #4845

Merged
Yicong-Huang merged 3 commits into apache:main from Yicong-Huang:fix/auto-queue-events-and-inflight-guard on May 3, 2026

@Yicong-Huang Yicong-Huang commented May 3, 2026

What changes were proposed in this PR?

Three changes to .github/workflows/auto-queue.yml.

1. More triggers, shorter cron. Add pull_request {auto_merge_enabled, ready_for_review}, pull_request_review {submitted}, and workflow_run {Required Checks completed}, and shorten the cron interval from hourly to every 5 minutes. Each event covers a state change that previously waited up to an hour for cron. Most notably, workflow_run completed lets us bump the next PR the moment the head PR's CI finishes (pass or fail).

2. In-flight guard. The GraphQL query now pulls statusCheckRollup.state. If any eligible PR has mergeStateStatus != BEHIND and a CI rollup of PENDING, EXPECTED, or SUCCESS, the run exits without bumping anyone: the queue head is already moving, and preempting CI on another PR would just force a re-bump after the head merges. BEHIND + PENDING does not count (that CI runs on pre-update code), and BLOCKED + FAILURE/ERROR does not count (the PR won't auto-merge, so it releases the guard).

3. Tolerate transient GraphQL 5xx. A real run died on attempt 2/6 from an HttpError: terminated (HTTP 500) even though four backoff attempts remained. Wrap the GraphQL call in try/catch and treat failures as "retry next attempt".
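
As a rough illustration of item 3, the retry behavior could look something like the sketch below. This is not the code in auto-queue.yml: `withRetry`, `defaultSleep`, the doubling delay, and the `sleep` option are hypothetical names and choices made for the example. Only the six-attempt budget and the "any failure means retry on the next attempt" rule come from the description above.

```javascript
// Hypothetical helper (not from the workflow): run an async call up to
// `attempts` times, treating any thrown error (for example an HTTP 500 from
// the GraphQL endpoint) as "retry next attempt" instead of aborting the run.
async function withRetry(fn, { attempts = 6, baseDelayMs = 1000, sleep = defaultSleep } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < attempts) {
        // Assumed backoff schedule: double the delay after each failure.
        await sleep(baseDelayMs * 2 ** (attempt - 1));
      }
    }
  }
  // Every attempt failed: surface the last error to the caller.
  throw lastError;
}

function defaultSleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}
```

With a wrapper of this shape, a 500 on attempt 2 costs one backoff delay and the remaining four attempts still run.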

Any related issues, documentation, discussions?

Tracking issue: #4553. Builds on #4672 (initial workflow) and #4678 (UNKNOWN retry + eligibility gates).

How was this PR tested?

The workflow YAML parses cleanly. The decision logic is exercised by every trigger after merge; there is no separate test harness for this workflow today.

Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.7, 1M context)

…ight

Triggers
  Add pull_request {auto_merge_enabled, ready_for_review},
  pull_request_review {submitted}, and workflow_run for the
  Required Checks workflow. Drop the cron interval from hourly to
  5 minutes (its only role now is a bounded safety net for missed
  events). The script short-circuits non-approval reviews up front
  so the most chatty event is a cheap no-op.
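
A minimal sketch of that up-front short-circuit, under assumptions: `shouldSkipEvent` is a hypothetical name, and its two arguments stand in for the `context.eventName` and `context.payload` values that actions/github-script provides. The comparison is case-insensitive so the sketch does not depend on how the review state is cased in the payload.

```javascript
// Hypothetical predicate (not from the workflow): decide whether a trigger
// can be ignored before doing any API work.
function shouldSkipEvent(eventName, payload) {
  // Only review events are filtered here; every other trigger proceeds.
  if (eventName !== "pull_request_review") return false;
  const state = String(payload?.review?.state ?? "").toLowerCase();
  // Comments and change requests cannot make a PR newly eligible.
  return state !== "approved";
}
```

A script using this would return early when the predicate is true, which keeps comment-only reviews from spending API calls.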

In-flight guard
  Pull commits[last:1].commit.statusCheckRollup.state in the GraphQL
  query and treat an eligible PR as in-flight when its
  mergeStateStatus is not BEHIND and its CI rollup is PENDING,
  EXPECTED, or SUCCESS. When such a PR exists the run exits without
  bumping anyone else — it is the queue head and bumping a
  different PR would just preempt CI on a PR we would re-bump
  after the head merges.

  PRs that are BEHIND with PENDING checks do NOT count as in-flight:
  that CI is on pre-update code and would have to re-run anyway.
  Likewise, FAILURE / ERROR rollups release the guard so the queue
  advances past stuck-on-CI PRs instead of waiting on cron.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
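
The in-flight rule from the commit message above can be sketched as a small predicate. This is an illustration only, not the workflow's script: the function names and the `fakePr` helper are hypothetical, and the object shape assumes a GraphQL PullRequest node queried with `mergeStateStatus` and `commits(last: 1) { nodes { commit { statusCheckRollup { state } } } }`.

```javascript
// CI rollup states that mean "this PR may still merge on its own".
const MOVING_CI_STATES = new Set(["PENDING", "EXPECTED", "SUCCESS"]);

// Read the rollup state of the PR's last commit; null when no rollup exists.
function rollupState(pr) {
  return pr.commits?.nodes?.[0]?.commit?.statusCheckRollup?.state ?? null;
}

// A PR is in flight when it is up to date with the base branch (not BEHIND)
// and its CI has not failed. BEHIND + PENDING is excluded because that CI
// ran on pre-update code; FAILURE / ERROR release the guard.
function isInFlight(pr) {
  return pr.mergeStateStatus !== "BEHIND" && MOVING_CI_STATES.has(rollupState(pr));
}

// The run should exit without bumping anyone if any eligible PR is in flight.
function queueHeadInFlight(eligiblePrs) {
  return eligiblePrs.some(isInFlight);
}

// Hypothetical helper that builds a PR node for experimenting with the rule.
function fakePr(mergeStateStatus, ciState) {
  return {
    mergeStateStatus,
    commits: { nodes: [{ commit: { statusCheckRollup: ciState ? { state: ciState } : null } }] },
  };
}
```

For example, `isInFlight(fakePr("BLOCKED", "PENDING"))` is true, while `isInFlight(fakePr("BEHIND", "PENDING"))` and `isInFlight(fakePr("BLOCKED", "FAILURE"))` are both false.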
github-actions bot added the fix and ci (changes related to CI) labels on May 3, 2026
codecov-commenter commented May 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 43.02%. Comparing base (1c3ec67) to head (0be6f1c).

Additional details and impacted files
@@            Coverage Diff            @@
##               main    #4845   +/-   ##
=========================================
  Coverage     43.02%   43.02%           
- Complexity     2029     2030    +1     
=========================================
  Files           957      957           
  Lines         34077    34077           
  Branches       3753     3753           
=========================================
  Hits          14663    14663           
  Misses        18637    18637           
  Partials        777      777           
Flag Coverage Δ
access-control-service 28.12% <ø> (ø)
agent-service 33.72% <ø> (ø)
amber 40.96% <ø> (ø)
computing-unit-managing-service 0.00% <ø> (ø)
config-service 0.00% <ø> (ø)
file-service 33.24% <ø> (ø)
frontend 35.28% <ø> (ø)
python 84.84% <ø> (ø)
workflow-compiling-service 47.72% <ø> (ø)

Yicong-Huang requested a review from aglinxinyuan May 3, 2026 05:36
Yicong-Huang and others added 2 commits May 2, 2026 22:43
A 500 from api.github.com/graphql on attempt 2/6 killed the whole run
even though the backoff loop had four attempts left. Wrap the GraphQL
call in try/catch and treat any failure as "retry next attempt".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Yicong-Huang enabled auto-merge (squash) May 3, 2026 05:44
Yicong-Huang added and then removed the release/v1.1.0-incubating label (back porting to release/v1.1.0-incubating) May 3, 2026
Yicong-Huang merged commit b8c1abb into apache:main May 3, 2026
33 of 76 checks passed
Yicong-Huang added a commit that referenced this pull request May 3, 2026
### What changes were proposed in this PR?

Add an `emergency` label fast-path to Auto Queue. A PR with this label
is bumped before any non-emergency PR regardless of CREATED_AT. When
such a PR is BEHIND, it bypasses the in-flight guard, so a non-emergency
PR's running CI doesn't delay the bump. Within each priority class,
CREATED_AT-ASC ordering is preserved.

Eligibility gates (auto-merge / not draft / not conflicting / APPROVED /
threads resolved) still apply — this only reorders the bump, it does not
bypass review. Label name is set by the `EMERGENCY_LABEL` constant
(one-line change if `priority/P0` or similar is preferred later).

### Any related issues, documentation, discussions?

Builds on #4672, #4678, #4845.

### How was this PR tested?

The workflow parses with `yaml.safe_load`, and the wrapped script body passes `node --check`.
Unit test on the partition logic: `[#100 docs, #101 emergency, #102
plain, #103 emergency+fix]` → `[101, 103, 100, 102]`.
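
A sketch of what such a partition could look like, reproducing the example above. It is an illustration under assumptions, not the workflow's script: `orderForBump` and the PR object shape (`number`, `createdAt`, `labels`) are hypothetical, and the timestamps are invented so that creation order matches PR number order.

```javascript
const EMERGENCY_LABEL = "emergency";

// Order PRs for bumping: emergency PRs first, then everything else,
// keeping CREATED_AT ascending order inside each class.
function orderForBump(prs) {
  const byCreated = [...prs].sort(
    (a, b) => Date.parse(a.createdAt) - Date.parse(b.createdAt)
  );
  const emergency = byCreated.filter((pr) => pr.labels.includes(EMERGENCY_LABEL));
  const rest = byCreated.filter((pr) => !pr.labels.includes(EMERGENCY_LABEL));
  return [...emergency, ...rest];
}

// Hypothetical input mirroring the unit test described above.
const examplePrs = [
  { number: 100, createdAt: "2026-05-01T00:00:00Z", labels: ["docs"] },
  { number: 101, createdAt: "2026-05-01T01:00:00Z", labels: ["emergency"] },
  { number: 102, createdAt: "2026-05-01T02:00:00Z", labels: [] },
  { number: 103, createdAt: "2026-05-01T03:00:00Z", labels: ["emergency", "fix"] },
];

// Logs the PR numbers in bump order: 101, 103, 100, 102.
console.log(orderForBump(examplePrs).map((pr) => pr.number));
```

Filtering an already sorted list keeps the relative order within each class, so no second sort is needed.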

### Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.7, 1M context)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>