ci(pre-commit): auth tflint via GITHUB_TOKEN + retry transient flakes (closes #564) #697

Merged
cristim merged 2 commits into feat/multicloud-web-frontend from fix/564-tflint-token on May 25, 2026

@cristim cristim commented May 25, 2026

Summary

Fixes the recurring Run pre-commit CI failure (issue #564) caused by terraform_tflint's tflint --init hitting the GitHub Releases API anonymously. Two-part fix:

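  1. Expose secrets.GITHUB_TOKEN to the Run pre-commit step. tflint --init reads GITHUB_TOKEN natively; with it set, requests are authenticated and the rate limit rises from 60/hr per shared runner IP to 1,000/hr per repository (GitHub's documented REST API limit for the Actions GITHUB_TOKEN; 5,000/hr applies to user tokens). This should eliminate the rate-limit failure class in practice.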

  2. Wrap with nick-fields/retry@v3.0.2 (SHA-pinned at ce71cc2ab81d554ebbe88c79ab5975992d79ba08 per project policy feedback_sha_pin_current_major.md): 3 attempts with a 90s wait and a 15-minute per-attempt timeout (a follow-up commit on this PR lowered this to 10 minutes and raised the job timeout to 35 minutes). Defense-in-depth for the residual transient flakes (GitHub Releases availability blips, plugin download timeouts) that the token cannot fix.

Most recent observation of this failure: PR #696's pre-commit job. The actual code in that PR was clean; only this CI-infra hook tripped.

Diff

.github/workflows/pre-commit.yml (+21/-1):

Test plan

Cross-references

Summary by CodeRabbit

  • Chores
    • Improved reliability of pre-commit checks in CI by adding retry attempts and increasing step timeouts.
    • Extended overall workflow timeout to allow longer runs and accommodate retries.
    • Set authentication for hook downloads to ensure terraform/tflint-related hooks can fetch resources while preserving an existing skip for terraform validation.

…closes #564)

The `Run pre-commit` step in `.github/workflows/pre-commit.yml`
intermittently fails because `terraform_tflint`'s `tflint --init`
downloads ruleset plugins from the GitHub Releases API anonymously,
hitting the 60-requests-per-hour per-IP limit that the runner shares
with every other workflow running on the same NAT egress. Most
recently observed on PR #696
(https://github.com/LeanerCloud/CUDly/actions/runs/26411029351).

Two-part fix:

1. Expose secrets.GITHUB_TOKEN to the step. tflint reads GITHUB_TOKEN
   natively; with it set the limit rises to 1,000/hr per repository
   (GitHub's documented limit for the Actions GITHUB_TOKEN), and it is
   no longer shared with other workflows on the same egress IP.
   This alone fixes the rate-limit class.

2. Wrap the step with nick-fields/retry@v3.0.2 (SHA-pinned per
   project policy) at 3 attempts with a 90s wait. The token fix
   eliminates the AWS-plugin rate-limit case; the retry handles the
   residual flakes (GitHub Releases availability blips, plugin
   download timeouts, etc.) so a transient failure no longer requires
   a manual rerun.

Acceptance criteria from #564:
- GITHUB_TOKEN exposed to the `Run pre-commit` step
- Three consecutive `pre-commit` runs in the same hour all complete
  without a tflint rate-limit failure (the retry-wrapper also catches
  the residual flakes the token cannot)
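
One way to check the second criterion by hand is to print the quota the token actually sees before the hooks run. The step below is a hypothetical diagnostic and is not part of this PR; the step name is invented, and it assumes `curl` is available on the runner. `GET /rate_limit` is GitHub's REST endpoint for reporting the caller's current quota.

```yaml
# Hypothetical diagnostic step (not in the PR's diff): place it before
# "Run pre-commit" to see the limit and remaining requests for the token.
- name: Show GitHub API rate limit
  env:
    GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
  run: |
    # Authenticated call, so the response reflects the token's quota,
    # not the anonymous per-IP quota of the runner.
    curl -sS -H "Authorization: Bearer ${GITHUB_TOKEN}" \
      https://api.github.com/rate_limit
```

If the reported limit is still 60, the token is not reaching the step environment.
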
@cristim cristim added the labels triaged (Item has been triaged), priority/p2 (Backlog-worthy), severity/medium (Moderate harm), urgency/now (Drop other things), impact/internal (Team-internal only), effort/xs (Trivial / one-liner), and type/chore (Maintenance / non-user-visible) on May 25, 2026

coderabbitai Bot commented May 25, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 42be5662-a42a-4c0d-9f24-d8424a77f0e3

📥 Commits

Reviewing files that changed from the base of the PR and between 607afea and dc639b6.

📒 Files selected for processing (1)
  • .github/workflows/pre-commit.yml

📝 Walkthrough

The workflow job-level timeout was raised to 35 minutes. The Run pre-commit step now runs via nick-fields/retry@v3, exposes GITHUB_TOKEN in the step environment, preserves SKIP: terraform_validate, and configures retry params: timeout_minutes: 10, max_attempts: 3, retry_wait_seconds: 90.

Changes

Pre-commit Retry and Token Configuration

| Layer / File(s) | Summary |
| --- | --- |
| Job timeout and pre-commit retry wrapper (.github/workflows/pre-commit.yml) | Job-level timeout-minutes increased to 35. Run pre-commit is wrapped with nick-fields/retry@v3, sets GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} in the step env, keeps SKIP: terraform_validate, and configures timeout_minutes: 10, max_attempts: 3, retry_wait_seconds: 90 while invoking pre-commit run --all-files via the wrapper. |
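
The configuration summarized above can be sketched roughly as follows. This is a reconstruction from the walkthrough and the values quoted in this PR, not the actual diff: the workflow name, trigger, job name, runner, and checkout step are assumptions, and the steps that install pre-commit and its tools are omitted.

```yaml
# Sketch only. Names marked "assumed" do not come from the PR.
name: pre-commit            # assumed
on: [pull_request]          # assumed

jobs:
  pre-commit:               # assumed job name
    runs-on: ubuntu-latest  # assumed runner
    # Job cap must cover 3 attempts * 10 min + 2 * 90 s waits = 33 min.
    timeout-minutes: 35
    steps:
      - uses: actions/checkout@v4   # assumed; project policy would pin a SHA
      # (tool installation steps omitted)
      - name: Run pre-commit
        # nick-fields/retry v3.0.2, pinned by commit SHA
        uses: nick-fields/retry@ce71cc2ab81d554ebbe88c79ab5975992d79ba08
        env:
          SKIP: terraform_validate
          # Lets tflint --init authenticate its GitHub API requests
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        with:
          timeout_minutes: 10       # applies per attempt, not in total
          max_attempts: 3
          retry_wait_seconds: 90
          command: pre-commit run --all-files
```

Because the retry action runs the command itself, the original `run:` line moves into the `command` input, while `env` stays on the step.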

Sequence Diagram

sequenceDiagram
  participant Runner
  participant RetryAction as nick-fields/retry
  participant PreCommit as pre-commit
  participant GitHubAPI as GitHub_API
  Runner->>RetryAction: start step (env: GITHUB_TOKEN, SKIP: terraform_validate)
  RetryAction->>PreCommit: run "pre-commit run --all-files" (command)
  PreCommit->>GitHubAPI: fetch plugin releases (uses GITHUB_TOKEN for auth)
  RetryAction->>RetryAction: on failure, wait 90s and retry (max 3 attempts, timeout 10m per attempt)

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested labels

urgency/this-sprint, effort/s

Poem

🐰 I hopped into CI, brave and spry,
I gave the token a wink and a try,
When hooks would stall,
I retried them all,
Now pre-commit hops past rate-limit sky.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main changes: adding GITHUB_TOKEN authentication and retry logic to fix tflint rate-limit failures. |
| Linked Issues check | ✅ Passed | The PR exposes GITHUB_TOKEN to the Run pre-commit step as required by issue #564, enabling authenticated tflint requests to GitHub API. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to fixing the tflint rate-limit issue; no out-of-scope modifications detected beyond the stated objectives. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

cristim commented May 25, 2026

@coderabbitai review

coderabbitai Bot commented May 25, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/pre-commit.yml:
- Around line 202-204: The retry configuration is inconsistent: you set retry
plugin params timeout_minutes: 15, max_attempts: 3, retry_wait_seconds: 90 while
the overall job has timeout-minutes: 15, so retries may never run; either reduce
timeout_minutes in the nick-fields/retry block to a value that allows up to 3
attempts within the job cap (e.g., ~4 minutes per attempt accounting for
retry_wait_seconds) or increase the job-level timeout-minutes to cover
max_attempts * timeout_minutes + retry waits; update the keys timeout_minutes,
max_attempts, and/or the job's timeout-minutes accordingly to ensure the retry
policy can actually execute all attempts.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 490f2372-a613-4999-9f33-df2e4c9085f9

📥 Commits

Reviewing files that changed from the base of the PR and between 423e87c and 607afea.

📒 Files selected for processing (1)
  • .github/workflows/pre-commit.yml

Comment thread .github/workflows/pre-commit.yml Outdated

coderabbitai Bot commented May 25, 2026

Note

Autofix is a beta feature. Expect some limitations and changes as we gather feedback and continue to improve it.

The agent generated fixes only for .github/workflows/ files, which cannot be committed due to permission restrictions. Please apply these changes manually.


⚠️ 1 file(s) could not be committed — the agent does not have permission to push to .github/workflows/. Please apply these changes manually:

.github/workflows/pre-commit.yml — 1 change:

Lines 199–205
           SKIP: terraform_validate
           GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
         with:
-          timeout_minutes: 15
+          timeout_minutes: 4
           max_attempts: 3
           retry_wait_seconds: 90
           command: pre-commit run --all-files

CR pointed out the retry policy was ineffective: `nick-fields/retry`
applies `timeout_minutes` PER ATTEMPT (not in total), so with
`max_attempts: 3` and `timeout_minutes: 15` the worst case is
~48 min, but the job-level `timeout-minutes: 15` would kill the job
before attempt 2 could start whenever attempt 1 ran to its timeout.

Fix: tune both timeouts so the retry budget fits inside the job
budget with safety margin.

- per-attempt timeout: 15 -> 10 minutes (normal pre-commit runs
  take ~3-5 min, 10 is comfortable margin without inflating
  hang-detection latency)
- job timeout: 15 -> 35 minutes
  worst case = 3 attempts * 10 min + 2 * 90 s retry waits = 33 min,
  fits in 35 min with margin

Avoids CR's suggested 50-minute job cap because that would also
delay killing a genuinely hung step. 35 min covers the 33-minute
worst case with a small margin, so all 3 attempts can still complete.
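
The budget arithmetic in this commit message can be written as a single bound. The symbols are notation introduced here for illustration, not taken from the PR: n is `max_attempts`, t is the per-attempt `timeout_minutes`, and w is `retry_wait_seconds` expressed in minutes (90 s = 1.5 min).

```latex
\begin{align*}
T_{\text{worst}} &= n\,t + (n-1)\,w \\
\text{before: } & 3 \cdot 15 + 2 \cdot 1.5 = 48\ \text{min} > 15\ \text{min (job cap)} \\
\text{after: }  & 3 \cdot 10 + 2 \cdot 1.5 = 33\ \text{min} \le 35\ \text{min (job cap)}
\end{align*}
```

The wait term uses n - 1 because no wait follows the final attempt.
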

cristim commented May 25, 2026

@coderabbitai review

coderabbitai Bot commented May 25, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@cristim cristim merged commit c04df9d into feat/multicloud-web-frontend May 25, 2026
4 checks passed
@cristim cristim deleted the fix/564-tflint-token branch May 25, 2026 18:12