ci: Add merge queue retry if CI_TIMEOUT #1111
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Walkthrough
Adds a new GitHub Actions workflow
Sequence Diagram(s)
sequenceDiagram
    autonumber
    participant Dequeue as "Event: pull_request (dequeued)"
    participant WF as "Workflow: Merge Queue Auto-Retry"
    participant App as "GitHub App (create-github-app-token)"
    participant REST as "GitHub REST API (comments)"
    participant GraphQL as "GitHub GraphQL API"
    Dequeue->>WF: workflow triggered (pull_request dequeued)
    WF->>WF: extract PR_NUMBER, PR_NODE_ID, reason
    alt reason == "CI_TIMEOUT"
        WF->>App: request app token (vars.BOT_ID, secrets.BOT_KEY)
        App-->>WF: installation token
        WF->>REST: list PR comments -> count "Auto-retry attempt" => RETRY_COUNT
        WF->>WF: compare RETRY_COUNT < MAX_RETRIES (3)
        alt should_retry == true
            WF->>REST: post comment "🔄 Auto-retry attempt N..."
            WF->>GraphQL: enqueuePullRequest(pullRequestId: PR_NODE_ID) (with token)
            GraphQL-->>WF: data.enqueuePullRequest (success) / error
        else should_retry == false
            WF->>REST: post comment "⚠️ Maximum auto-retry attempts reached..."
        end
    else reason != "CI_TIMEOUT"
        WF->>WF: no retry (no-op)
    end
    opt on workflow failure
        WF->>REST: post comment "❌ Auto-retry failed due to an error..."
    end
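The decision branch in the diagram can be sketched as a small shell function. This is only a minimal sketch, assuming the retry count has already been obtained elsewhere (for example by counting marker comments); `decide_retry` is a hypothetical helper name and is not part of the workflow in this PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the decision branch of the diagram.
# Arguments:
#   $1 = dequeue reason reported by the event
#   $2 = number of "Auto-retry attempt" comments counted so far
#   $3 = maximum retries (defaults to 3, the value shown in the diagram)
decide_retry() {
  local reason="$1"
  local retry_count="$2"
  local max_retries="${3:-3}"

  if [ "$reason" != "CI_TIMEOUT" ]; then
    # Any other dequeue reason is left alone.
    echo "skip"
  elif [ "$retry_count" -lt "$max_retries" ]; then
    # Report the attempt number that the next retry comment would carry.
    echo "retry $((retry_count + 1))"
  else
    echo "max_reached"
  fi
}
```

Keeping the decision in one function makes the three outcomes (skip, retry, give up) easy to check without touching the GitHub API.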
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Actionable comments posted: 4
🧹 Nitpick comments (3)
.github/workflows/merge-queue-retry.yml (3)
22-25: Add concurrency to prevent duplicate retries racing.
 jobs:
   requeue-pr:
     runs-on: ubuntu-latest
+    concurrency:
+      group: ${{ github.workflow }}-${{ github.event.pull_request.node_id }}
+      cancel-in-progress: true
68-79: Include the dequeue reason in the retry comment for traceability.
-          gh api "repos/${{ github.repository }}/issues/${PR_NUMBER}/comments" \
-            -f body="🔄 Auto-retry attempt ${RETRY_COUNT}: PR was removed from merge queue, automatically requeuing..."
+          gh api "repos/${{ github.repository }}/issues/${PR_NUMBER}/comments" \
+            -f body="🔄 Auto-retry attempt ${RETRY_COUNT}: PR was removed from merge queue (reason: ${{ github.event.reason }}). Automatically requeuing…"
100-109: Also include the reason in the max-retries comment.
-          gh api "repos/${{ github.repository }}/issues/${PR_NUMBER}/comments" \
-            -f body="⚠️ Maximum auto-retry attempts reached. PR was removed from merge queue multiple times. Please investigate the issue and manually requeue if needed."
+          gh api "repos/${{ github.repository }}/issues/${PR_NUMBER}/comments" \
+            -f body="⚠️ Maximum auto-retry attempts reached (last reason: ${{ github.event.reason }}). Please investigate and requeue manually if needed."
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
.github/workflows/merge-queue-retry.yml(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Lint check
🔇 Additional comments (3)
.github/workflows/merge-queue-retry.yml (3)
1-14: License header LGTM.
26-32: Verify GitHub App scopes match needs.
The App must be installed on the repo with at least: Issues: write (comments), Pull requests: write, and permission to enqueue via GraphQL (Merge Queue capability). Please confirm the App has these. (docs.github.com)
Do you want me to add a README snippet listing the exact App permissions to grant?
110-119: Failure notification step LGTM.
terrykong left a comment:
Amazing, @chtruong814. This has plagued us for a while. I'll approve, but feel free to assign another reviewer if you'd like another set of eyes.
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Actionable comments posted: 0
♻️ Duplicate comments (3)
.github/workflows/merge-queue-retry.yml (3)
17-21: Use pull_request_target and set minimal permissions (safer for forks and needed for secrets).
Switch to pull_request_target so the workflow can read secrets and write comments, and add a minimal permissions block. Do not check out or execute PR code under pull_request_target.
-on:
-  pull_request:
-    types:
-      - dequeued
+on:
+  pull_request_target:
+    types:
+      - dequeued
+
+permissions:
+  contents: read
+  pull-requests: write
+  issues: write
For GitHub Actions, does the pull_request event have a 'dequeued' action with a 'reason' field, and is pull_request_target appropriate for accessing secrets on forked PRs?
33-63: Harden reason check, enable strict bash, reduce noisy logs, and quote GITHUB_OUTPUT.
Current step is brittle on reason string, dumps all comments, and doesn’t fail fast.
- name: Check dequeue reason and retry count id: check_retry - if: github.event.reason == 'CI_TIMEOUT' + if: contains(fromJSON('["CI_TIMEOUT","CHECKS_TIMEOUT","timed_out","TIMEOUT","timeout"]'), github.event.reason) env: GH_TOKEN: ${{ steps.generate_token.outputs.token }} run: | - PR_NUMBER=${{ github.event.pull_request.number }} - - # Debug: Show all comments first - echo "=== All PR Comments ===" - gh api "repos/${{ github.repository }}/issues/${PR_NUMBER}/comments" \ - --jq '.[] | {id: .id, created_at: .created_at, body: .body[:100]}' + set -euo pipefail + PR_NUMBER=${{ github.event.pull_request.number }} + echo "Dequeued reason: '${{ github.event.reason }}'" echo "=== Filtering for retry comments ===" # Get the current number of retry attempts from PR comments RETRY_COUNT=$(gh api "repos/${{ github.repository }}/issues/${PR_NUMBER}/comments" \ --jq '[.[] | select(.body | contains("Auto-retry attempt")) | .body] | length') echo "Current retry count: $RETRY_COUNT" MAX_RETRIES=3 if [ "$RETRY_COUNT" -lt "$MAX_RETRIES" ]; then - echo "should_retry=true" >> $GITHUB_OUTPUT - echo "retry_count=$((RETRY_COUNT + 1))" >> $GITHUB_OUTPUT + echo "should_retry=true" >> "$GITHUB_OUTPUT" + echo "retry_count=$((RETRY_COUNT + 1))" >> "$GITHUB_OUTPUT" echo "✅ Will retry (attempt $((RETRY_COUNT + 1))/$MAX_RETRIES)" else - echo "should_retry=false" >> $GITHUB_OUTPUT + echo "should_retry=false" >> "$GITHUB_OUTPUT" echo "❌ Max retries ($MAX_RETRIES) reached for PR #${PR_NUMBER}" fi
76-96: Use enqueuePullRequest with expectedHeadOid, detect failures, and exit non-zero.
Current curl call lacks expectedHeadOid, ignores GraphQL errors, and never fails the step.
- name: Requeue Pull Request if: steps.check_retry.outputs.should_retry == 'true' env: GH_TOKEN: ${{ steps.generate_token.outputs.token }} run: | - PR_NUMBER=${{ github.event.pull_request.number }} - PR_NODE_ID="${{ github.event.pull_request.node_id }}" - - echo "Requeuing PR #${PR_NUMBER}..." - - # First, try using GraphQL API to enqueue the PR directly - GRAPHQL_RESPONSE=$(curl -s -X POST \ - -H "Authorization: Bearer ${{ steps.generate_token.outputs.token }}" \ - -H "Content-Type: application/json" \ - -d "{\"query\": \"mutation { enqueuePullRequest(input: {pullRequestId: \\\"${PR_NODE_ID}\\\"}) { clientMutationId } }\"}" \ - https://api.github.com/graphql) - - if echo "$GRAPHQL_RESPONSE" | jq -e '.data.enqueuePullRequest' > /dev/null; then - echo "PR #${PR_NUMBER} has been successfully requeued" - fi + set -euo pipefail + PR_NUMBER=${{ github.event.pull_request.number }} + PR_NODE_ID="${{ github.event.pull_request.node_id }}" + HEAD_SHA="${{ github.event.pull_request.head.sha }}" + echo "Requeuing PR #${PR_NUMBER}..." + RESP=$(gh api graphql -f query=' + mutation($id:ID!, $oid:GitObjectID) { + enqueuePullRequest(input:{pullRequestId:$id, expectedHeadOid:$oid}) { + mergeQueueEntry { id } + } + }' -f id="$PR_NODE_ID" -f oid="$HEAD_SHA") + if echo "$RESP" | jq -e '.data.enqueuePullRequest.mergeQueueEntry.id' >/dev/null; then + echo "✅ PR #${PR_NUMBER} successfully requeued" + else + echo "GraphQL response: $RESP" + gh api "repos/${{ github.repository }}/issues/${PR_NUMBER}/comments" \ + -f body="❌ Auto-retry attempted but requeue GraphQL call failed. Please requeue manually." + exit 1 + fi
🧹 Nitpick comments (4)
.github/workflows/merge-queue-retry.yml (4)
23-25: Add job-level concurrency to avoid duplicate retries when multiple dequeues fire.
Prevents races posting multiple comments and enqueuing twice.
   requeue-pr:
     runs-on: ubuntu-latest
+    concurrency:
+      group: requeue-pr-${{ github.event.pull_request.number }}
+      cancel-in-progress: false
65-76: Minor: ensure shell interpolation occurs as intended and consider consistent wording.
The interpolation is fine, but consider a consistent, grep-able prefix.
-          gh api "repos/${{ github.repository }}/issues/${PR_NUMBER}/comments" \
-            -f body="🔄 Auto-retry attempt ${RETRY_COUNT}: PR was removed from merge queue, automatically requeuing..."
+          gh api "repos/${{ github.repository }}/issues/${PR_NUMBER}/comments" \
+            -f body="🔄 Auto-retry attempt ${RETRY_COUNT}: PR dequeued (reason: ${{ github.event.reason }}). Automatically requeuing..."
97-116: Tighten notifications; include reason for context.
Make messages more actionable for on-call.
-          gh api "repos/${{ github.repository }}/issues/${PR_NUMBER}/comments" \
-            -f body="⚠️ Maximum auto-retry attempts reached. PR was removed from merge queue multiple times. Please investigate the issue and manually requeue if needed."
+          gh api "repos/${{ github.repository }}/issues/${PR_NUMBER}/comments" \
+            -f body="⚠️ Maximum auto-retry attempts reached (reason: ${{ github.event.reason }}). Please investigate and manually requeue if needed."

-          gh api "repos/${{ github.repository }}/issues/${PR_NUMBER}/comments" \
-            -f body="❌ Auto-retry failed due to an error in the workflow. Please manually requeue the PR."
+          gh api "repos/${{ github.repository }}/issues/${PR_NUMBER}/comments" \
+            -f body="❌ Auto-retry workflow error. Requeue did not complete. Please manually requeue the PR."
28-31: Security hardening: pin actions to commit SHAs.
Prevents supply-chain surprises from mutable tags.
-        uses: actions/create-github-app-token@v1
+        uses: actions/create-github-app-token@a38b9d0c6e530c4b1f1a7f7e23a2d0bb76b0a52e # v1
ko3n1g left a comment:
Do we know why we're running into this issue in the first place? In the settings, we've configured a timeout of 6 hrs, and if I'm reading this correctly, an average merge item takes 3.5 hrs to complete? So with 2 items in the queue, it's expected that the 2nd item will time out?
I'm new to merge queues, so I'm pretty sure there's a mistake in my assumption.
So the two questions I have are:
- Why 360 min and not something larger?
- Is it true that a merge item takes 3.5 hrs to complete? That seems really heavy for GitHub CI. If this is the case, I would recommend offloading some testing to main and dealing with failures via reverts.
@ko3n1g Regarding the 360 limit: 360 minutes is the upper limit, and the UI prevents you from setting anything longer. So, if many items are in the queue and CI takes a while, the queue will remove PRs that have exceeded that overall time. It's a fair callout on the overall GitHub CI time and a good suggestion for us to keep in mind. It's a known issue, but not something we'll be addressing in this PR. It's hard to say what's optimal at the moment given the current CI infra limitations.
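The timing concern raised above can be illustrated with a rough calculation. This is only a sketch, assuming queue entries finish strictly one after another and each takes the same time, which may not match how the merge queue actually schedules CI. `finish_minutes` and `exceeds_limit` are hypothetical helpers; the 210-minute figure is the 3.5 hr estimate and 360 is the limit quoted in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for a back-of-the-envelope queue timing estimate.
# Assumption: entries run serially and each takes the same number of minutes.

# finish_minutes POSITION PER_ITEM_MINUTES
# Prints the minutes elapsed when the entry at POSITION (1-based) finishes.
finish_minutes() {
  echo $(( $1 * $2 ))
}

# exceeds_limit POSITION PER_ITEM_MINUTES LIMIT_MINUTES
# Succeeds (exit 0) when that entry would finish after the limit.
exceeds_limit() {
  [ "$(finish_minutes "$1" "$2")" -gt "$3" ]
}

# Example: the second entry at 210 minutes per item against a 360-minute limit.
if exceeds_limit 2 210 360; then
  echo "entry 2 would finish at $(finish_minutes 2 210) min, past the 360 min limit"
fi
```

Under these assumptions the second entry finishes at 420 minutes, which is why a retry on timeout helps even when CI itself is healthy.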
ko3n1g left a comment:
Thanks for educating me on this. This upper limit is very unfortunate. It would probably be good to have this workflow in the FW-templates toolbox eventually.
🔄 Auto-retry attempt 2: PR was removed from merge queue, automatically requeuing...
🔄 Auto-retry attempt 3: PR was removed from merge queue, automatically requeuing...
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
ℹ️ File Consistency Check
Check based on commit: 795591a (PR #1111 from
This is a test comment
This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.
ℹ️ File Consistency Check
Check based on commit: 7a66f30 (PR #1111 from
This is a test comment
This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.
Actionable comments posted: 1
♻️ Duplicate comments (3)
.github/workflows/merge-queue-retry.yml (3)
34-65: Uncomment and harden the retry gate; match timeout reasons robustly; fail fast.
Re-enable this with strict bash, tolerant reason matching, and clear outputs. This prevents infinite retries and noisy logs.
Apply:
-      # - name: Check dequeue reason and retry count
-      #   id: check_retry
-      #   if: github.event.reason == 'CI_TIMEOUT'
-      #   env:
-      #     GH_TOKEN: ${{ steps.generate_token.outputs.token }}
-      #   run: |
-      #     PR_NUMBER=${{ github.event.pull_request.number }}
-      #     # Debug: Show all comments first
-      #     echo "=== All PR Comments ==="
-      #     gh api "repos/${{ github.repository }}/issues/${PR_NUMBER}/comments" \
-      #       --jq '.[] | {id: .id, created_at: .created_at, body: .body[:100]}'
-      #     echo "=== Filtering for retry comments ==="
-      #     # Get the current number of retry attempts from PR comments
-      #     RETRY_COUNT=$(gh api "repos/${{ github.repository }}/issues/${PR_NUMBER}/comments" \
-      #       --jq '[.[] | select(.body | contains("Auto-retry attempt")) | .body] | length')
-      #     echo "Current retry count: $RETRY_COUNT"
-      #     MAX_RETRIES=3
-      #     if [ "$RETRY_COUNT" -lt "$MAX_RETRIES" ]; then
-      #       echo "should_retry=true" >> $GITHUB_OUTPUT
-      #       echo "retry_count=$((RETRY_COUNT + 1))" >> $GITHUB_OUTPUT
-      #       echo "✅ Will retry (attempt $((RETRY_COUNT + 1))/$MAX_RETRIES)"
-      #     else
-      #       echo "should_retry=false" >> $GITHUB_OUTPUT
-      #       echo "❌ Max retries ($MAX_RETRIES) reached for PR #${PR_NUMBER}"
-      #     fi
+      - name: Check dequeue reason and retry count
+        id: check_retry
+        if: contains(fromJSON('["CI_TIMEOUT","CHECKS_TIMEOUT","TIMEOUT","timed_out","timeout"]'), github.event.reason)
+        env:
+          GH_TOKEN: ${{ steps.generate_token.outputs.token }}
+        run: |
+          set -euo pipefail
+          PR_NUMBER='${{ github.event.pull_request.number }}'
+          MAX_RETRIES="${MAX_RETRIES:-3}"
+          echo "Dequeued reason: '${{ github.event.reason }}'"
+          RETRY_COUNT=$(
+            gh api "repos/${{ github.repository }}/issues/${PR_NUMBER}/comments" \
+              --jq '[ .[] | select(.body | contains("Auto-retry attempt")) ] | length'
+          )
+          echo "Current retry count: $RETRY_COUNT"
+          if [ "$RETRY_COUNT" -lt "$MAX_RETRIES" ]; then
+            {
+              echo "should_retry=true"
+              echo "retry_count=$((RETRY_COUNT + 1))"
+            } >> "$GITHUB_OUTPUT"
+            echo "✅ Will retry (attempt $((RETRY_COUNT + 1))/$MAX_RETRIES)"
+          else
+            echo "should_retry=false" >> "$GITHUB_OUTPUT"
+            echo "❌ Max retries ($MAX_RETRIES) reached for PR #${PR_NUMBER}"
+          fi
17-22: Trigger mismatch: switch to pull_request_target: dequeued (and add workflow_dispatch).
Current push trigger can’t access github.event.pull_request.* and won’t fire on merge-queue dequeues. Use pull_request_target with types: [dequeued] to receive event.reason and PR context; add workflow_dispatch for manual tests.
Apply:
 name: "Merge Queue Auto-Retry"
-on:
-  push:
-    # pull_request:
-    #   types:
-    #     - dequeued
+on:
+  pull_request_target:
+    types: [dequeued]
+  workflow_dispatch:
+
+# Minimal base token perms (App token is used for writes).
+permissions:
+  contents: read
+  pull-requests: write
+  issues: write
77-97: Hard-coded PR ids; missing expectedHeadOid; no error handling; step won’t fail on GraphQL errors.
This will always requeue PR 1111, ignores the dequeued PR, and may silently “succeed.” Pull values from the event, include expectedHeadOid, enable bash safety, and fail on error. Gate on retry decision.
Apply:
- - name: Requeue Pull Request - # if: steps.check_retry.outputs.should_retry == 'true' + - name: Requeue Pull Request + if: steps.check_retry.outputs.should_retry == 'true' env: GH_TOKEN: ${{ steps.generate_token.outputs.token }} run: | - PR_NUMBER="1111" - PR_NODE_ID="PR_kwDOOJjv8s6nvYsV" - - echo "Requeuing PR #${PR_NUMBER}..." - - # First, try using GraphQL API to enqueue the PR directly - GRAPHQL_RESPONSE=$(curl -s -X POST \ - -H "Authorization: Bearer ${{ steps.generate_token.outputs.token }}" \ - -H "Content-Type: application/json" \ - -d "{\"query\": \"mutation { enqueuePullRequest(input: {pullRequestId: \\\"${PR_NODE_ID}\\\"}) { clientMutationId } }\"}" \ - https://api.github.com/graphql) - - echo "GRAPHQL_RESPONSE: $GRAPHQL_RESPONSE" - if echo "$GRAPHQL_RESPONSE" | jq -e '.data.enqueuePullRequest' > /dev/null; then - echo "PR #${PR_NUMBER} has been successfully requeued" - fi + set -euo pipefail + PR_NUMBER="${{ github.event.pull_request.number }}" + PR_NODE_ID="${{ github.event.pull_request.node_id }}" + HEAD_SHA="${{ github.event.pull_request.head.sha }}" + echo "Requeuing PR #${PR_NUMBER}..." + RESP=$(gh api graphql -f query=' + mutation($id:ID!, $oid:GitObjectID){ + enqueuePullRequest(input:{pullRequestId:$id, expectedHeadOid:$oid}) { + mergeQueueEntry { id } + } + }' -f id="$PR_NODE_ID" -f oid="$HEAD_SHA") + if echo "$RESP" | jq -e '.data.enqueuePullRequest.mergeQueueEntry.id' >/dev/null; then + echo "✅ PR #${PR_NUMBER} successfully requeued" + else + echo "GraphQL response: $RESP" + exit 1 + fi
🧹 Nitpick comments (5)
.github/workflows/merge-queue-retry.yml (5)
66-76: Post a retry marker comment to track attempts.
Re-enable this so the count logic has a durable marker.
Apply:
- # - name: Add retry comment - # if: steps.check_retry.outputs.should_retry == 'true' - # env: - # GH_TOKEN: ${{ steps.generate_token.outputs.token }} - # run: | - # PR_NUMBER=${{ github.event.pull_request.number }} - # RETRY_COUNT=${{ steps.check_retry.outputs.retry_count }} - # gh api "repos/${{ github.repository }}/issues/${PR_NUMBER}/comments" \ - # -f body="🔄 Auto-retry attempt ${RETRY_COUNT}: PR was removed from merge queue, automatically requeuing..." + - name: Add retry comment + if: steps.check_retry.outputs.should_retry == 'true' + env: + GH_TOKEN: ${{ steps.generate_token.outputs.token }} + run: | + PR_NUMBER='${{ github.event.pull_request.number }}' + RETRY_COUNT='${{ steps.check_retry.outputs.retry_count }}' + gh api "repos/${{ github.repository }}/issues/${PR_NUMBER}/comments" \ + -f body="🔄 Auto-retry attempt ${RETRY_COUNT}: PR was removed from merge queue due to '${{ github.event.reason }}'. Requeuing…"
99-108: Surface max-retries reached to the PR.
Notify the author when auto-retries stop.
Apply:
- # - name: Max retries reached comment - # if: steps.check_retry.outputs.should_retry == 'false' - # env: - # GH_TOKEN: ${{ steps.generate_token.outputs.token }} - # run: | - # PR_NUMBER=${{ github.event.pull_request.number }} - # gh api "repos/${{ github.repository }}/issues/${PR_NUMBER}/comments" \ - # -f body="⚠️ Maximum auto-retry attempts reached. PR was removed from merge queue multiple times. Please investigate the issue and manually requeue if needed." + - name: Max retries reached comment + if: steps.check_retry.outputs.should_retry == 'false' + env: + GH_TOKEN: ${{ steps.generate_token.outputs.token }} + run: | + PR_NUMBER='${{ github.event.pull_request.number }}' + gh api "repos/${{ github.repository }}/issues/${PR_NUMBER}/comments" \ + -f body="⚠️ Maximum auto-retry attempts reached. Please investigate the flake/timeouts and requeue manually if needed."
23-26: Add a concurrency group to avoid duplicate requeues.
Prevents two dequeues from racing and enqueuing twice.
Apply:
 jobs:
   requeue-pr:
     runs-on: ubuntu-latest
+    concurrency:
+      group: merge-queue-retry-${{ github.event.pull_request.number || github.run_id }}
+      cancel-in-progress: true
27-33: Validate App token creation inputs.
If vars.BOT_ID/secrets.BOT_KEY are missing, fail early with a clearer message.
Apply:
       - name: Generate GitHub App token
         id: generate_token
         uses: actions/create-github-app-token@v1
         with:
           app-id: ${{ vars.BOT_ID }}
           private-key: ${{ secrets.BOT_KEY }}
+      - name: Verify App token
+        run: |
+          test -n "${{ steps.generate_token.outputs.token }}" || { echo "Missing App token"; exit 1; }
7-7: Use HTTPS for the license URL.
Minor polish.
Apply:
-# http://www.apache.org/licenses/LICENSE-2.0
+# https://www.apache.org/licenses/LICENSE-2.0
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
ℹ️ File Consistency Check
Check based on commit: c3ec29a (PR #1111 from
This is a test comment
This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
ℹ️ File Consistency Check
Check based on commit: c6a6fb6 (PR #1111 from
This is a test comment
This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.
Actionable comments posted: 0
♻️ Duplicate comments (5)
.github/workflows/merge-queue-retry.yml (5)
33-36: Reason check is brittle; tolerate variants of timeout.
-        if: github.event.reason == 'CI_TIMEOUT'
+        if: contains(fromJSON('["CI_TIMEOUT","CHECKS_TIMEOUT","timed_out","TIMEOUT","timeout"]'), github.event.reason)
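The same tolerant matching could also be expressed inside the step's shell script rather than in the `if:` expression. The following is a minimal sketch; `is_timeout_reason` is a hypothetical helper, and the list of variants is simply the one proposed in this review comment. Which of these strings GitHub actually emits is not confirmed in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit 0) when the dequeue reason looks like a timeout.
# The accepted values are copied from the reviewer's suggestion, not from
# an authoritative list of dequeue reasons.
is_timeout_reason() {
  case "$1" in
    CI_TIMEOUT|CHECKS_TIMEOUT|TIMEOUT|timed_out|timeout)
      return 0
      ;;
    *)
      return 1
      ;;
  esac
}

# Usage example with a sample reason string.
if is_timeout_reason "CI_TIMEOUT"; then
  echo "timeout-style dequeue: eligible for auto-retry"
fi
```

A `case` statement keeps the accepted values in one place, so the list can be trimmed once the actual reason strings are known.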
38-45: Harden shell, cut noisy logs, and quote outputs.
-        run: |
-          PR_NUMBER=${{ github.event.pull_request.number }}
-
-          # Debug: Show all comments first
-          echo "=== All PR Comments ==="
-          gh api "repos/${{ github.repository }}/issues/${PR_NUMBER}/comments" \
-            --jq '.[] | {id: .id, created_at: .created_at, body: .body[:100]}'
+        run: |
+          set -euo pipefail
+          PR_NUMBER='${{ github.event.pull_request.number }}'
+          echo "Dequeued reason: '${{ github.event.reason }}'"
@@
-          RETRY_COUNT=$(gh api "repos/${{ github.repository }}/issues/${PR_NUMBER}/comments" \
+          RETRY_COUNT=$(gh api "repos/${{ github.repository }}/issues/${PR_NUMBER}/comments" \
             --jq '[.[] | select(.body | contains("Auto-retry attempt")) | .body] | length')
@@
-            echo "should_retry=true" >> $GITHUB_OUTPUT
-            echo "retry_count=$((RETRY_COUNT + 1))" >> $GITHUB_OUTPUT
+            echo "should_retry=true" >> "$GITHUB_OUTPUT"
+            echo "retry_count=$((RETRY_COUNT + 1))" >> "$GITHUB_OUTPUT"
             echo "✅ Will retry (attempt $((RETRY_COUNT + 1))/$MAX_RETRIES)"
           else
-            echo "should_retry=false" >> $GITHUB_OUTPUT
+            echo "should_retry=false" >> "$GITHUB_OUTPUT"
             echo "❌ Max retries ($MAX_RETRIES) reached for PR #${PR_NUMBER}"
           fi
Also applies to: 48-63
111-119: Failure notifier: shell safety and token fallback.
-        env:
-          GH_TOKEN: ${{ steps.generate_token.outputs.token }}
-        run: |
-          PR_NUMBER=${{ github.event.pull_request.number }}
+        env:
+          GH_TOKEN: ${{ steps.generate_token.outputs.token || github.token }}
+        run: |
+          set -euo pipefail
+          PR_NUMBER='${{ github.event.pull_request.number }}'
17-21: Blocker: use pull_request_target and declare minimal permissions (fork safety + secrets).
pull_request won’t expose secrets (BOT_KEY) on forks; this workflow will fail there. Switch to pull_request_target and add minimal perms.
 on:
-  pull_request:
+  pull_request_target:
     types:
       - dequeued
+
+# Minimal base token perms; App token is used for writes.
+permissions:
+  contents: read
+  pull-requests: write
+  issues: write
76-99: Use gh graphql with expectedHeadOid; fail and notify on GraphQL errors.- run: | - PR_NUMBER=${{ github.event.pull_request.number }} - PR_NODE_ID="${{ github.event.pull_request.node_id }}" - - echo "Requeuing PR #${PR_NUMBER}..." - - # First, try using GraphQL API to enqueue the PR directly - GRAPHQL_RESPONSE=$(curl -s -X POST \ - -H "Authorization: Bearer ${{ steps.generate_token.outputs.token }}" \ - -H "Content-Type: application/json" \ - -d "{\"query\": \"mutation { enqueuePullRequest(input: {pullRequestId: \\\"${PR_NODE_ID}\\\"}) { clientMutationId } }\"}" \ - https://api.github.com/graphql) - - if echo "$GRAPHQL_RESPONSE" | jq -e '.data.enqueuePullRequest' > /dev/null; then - echo "PR #${PR_NUMBER} has been successfully requeued" - else - echo "❌ Failed to enqueue PR #${PR_NUMBER}. GraphQL response for debugging:" - echo "$GRAPHQL_RESPONSE" - exit 1 - fi + run: | + set -euo pipefail + PR_NUMBER='${{ github.event.pull_request.number }}' + PR_NODE_ID='${{ github.event.pull_request.node_id }}' + HEAD_SHA='${{ github.event.pull_request.head.sha }}' + + echo "Requeuing PR #${PR_NUMBER}..." + RESP=$(gh api graphql -f query=' + mutation($id:ID!, $oid:GitObjectID) { + enqueuePullRequest(input:{pullRequestId:$id, expectedHeadOid:$oid}) { + mergeQueueEntry { id } + } + }' -f id="$PR_NODE_ID" -f oid="$HEAD_SHA") + + if echo "$RESP" | jq -e '.data.enqueuePullRequest.mergeQueueEntry.id' >/dev/null; then + echo "✅ PR #${PR_NUMBER} successfully requeued" + else + echo "GraphQL response: $RESP" + gh api "repos/${{ github.repository }}/issues/${PR_NUMBER}/comments" \ + -f body="❌ Auto-retry attempted but requeue GraphQL call failed. Please requeue manually." + exit 1 + fi
🧹 Nitpick comments (4)
.github/workflows/merge-queue-retry.yml (4)
22-25: Optional: avoid duplicate runs per PR with concurrency.
 jobs:
   requeue-pr:
     runs-on: ubuntu-latest
+    concurrency:
+      group: auto-retry-${{ github.event.pull_request.number }}
+      cancel-in-progress: true
49-51: Count only bot-authored retry comments to avoid false positives.
-          RETRY_COUNT=$(gh api "repos/${{ github.repository }}/issues/${PR_NUMBER}/comments" \
-            --jq '[.[] | select(.body | contains("Auto-retry attempt")) | .body] | length')
+          RETRY_COUNT=$(gh api "repos/${{ github.repository }}/issues/${PR_NUMBER}/comments" \
+            --jq '[.[] | select((.body | contains("Auto-retry attempt")) and (.user.type=="Bot" or (.user.login|test("bot$")))) | .body] | length')
65-75: Quote interpolations and harden shell in comment step.
-        run: |
-          PR_NUMBER=${{ github.event.pull_request.number }}
-          RETRY_COUNT=${{ steps.check_retry.outputs.retry_count }}
+        run: |
+          set -euo pipefail
+          PR_NUMBER='${{ github.event.pull_request.number }}'
+          RETRY_COUNT='${{ steps.check_retry.outputs.retry_count }}'
101-110: Shell safety for “max retries reached” step.
-        run: |
-          PR_NUMBER=${{ github.event.pull_request.number }}
+        run: |
+          set -euo pipefail
+          PR_NUMBER='${{ github.event.pull_request.number }}'
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
What does this PR do?
Add merge queue retry if CI_TIMEOUT