CI: back off retry attempts with exponential delay to ride out 504s #258
Recent CI runs have been failing on the GitHub releases CDN with 504 Gateway Timeout while downloading SafeDITool.artifactbundle.zip. The previous retry helper used 3 attempts with a flat 10s delay (~30s total), which is well inside a typical CDN brownout window.

Two changes:

1. Add a `cache-swiftpm` composite action that wraps `actions/cache` over the SwiftPM artifact and cache directories on macOS and Linux, keyed on the `Package.swift` (and `CLI/Package.swift`) hashes. Wire it into every job that resolves package dependencies. Once warm, the artifact bundle download is skipped entirely.
2. Switch the retry helper to exponential backoff with a higher default attempt count: 6 attempts with delays of 15s, 30s, 60s, 120s, 240s (capped), giving ~7.75 minutes of total retry window for cold-cache cases that fall through to the network.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
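The backoff schedule described in this commit can be sketched in plain shell. This is a minimal illustration, not the repository's retry action: the function names `backoff_delay` and `retry` and the `MAX_ATTEMPTS` and `SLEEP_CMD` variables are hypothetical, chosen only to show how a base of 15s doubles up to a 240s cap.

```shell
# Hypothetical helper: print the delay (seconds) to wait after failed
# attempt number $1, doubling from a base and clamping at a cap.
backoff_delay() {
  attempt=$1 base=${2:-15} cap=${3:-240}
  delay=$base
  i=1
  while [ "$i" -lt "$attempt" ]; do
    delay=$((delay * 2))
    if [ "$delay" -ge "$cap" ]; then delay=$cap; break; fi
    i=$((i + 1))
  done
  echo "$delay"
}

# Hypothetical wrapper: run the given command up to MAX_ATTEMPTS times
# (default 6), sleeping with exponential backoff between attempts.
retry() {
  max=${MAX_ATTEMPTS:-6}
  n=1
  while :; do
    "$@" && return 0
    if [ "$n" -ge "$max" ]; then
      echo "Attempt $n failed, giving up." >&2
      return 1
    fi
    d=$(backoff_delay "$n")
    echo "Attempt $n failed, retrying in ${d}s..." >&2
    # SLEEP_CMD is an assumption that lets the sleep be stubbed out.
    ${SLEEP_CMD:-sleep} "$d"
    n=$((n + 1))
  done
}
```

With 6 attempts there are 5 waits (15 + 30 + 60 + 120 + 240 = 465s), which matches the ~7.75 minute window quoted above. A call would look like `retry some-download-command`, where the command is whatever step hits the network.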
Codecov Report
✅ All modified and coverable lines are covered by tests.

Additional details and impacted files
@@            Coverage Diff            @@
##              main      #258   +/-   ##
=========================================
  Coverage   100.00%   100.00%
=========================================
  Files           41        41
  Lines         6369      6369
=========================================
  Hits          6369      6369
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0f297d9bdd
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
~/Library/Caches/org.swift.swiftpm
~/Library/org.swift.swiftpm
~/.cache/org.swift.swiftpm
key: ${{ runner.os }}-spm-${{ hashFiles('Package.swift', 'CLI/Package.swift') }}${{ inputs.key-suffix && format('-{0}', inputs.key-suffix) || '' }}
Include job scope in SwiftPM cache key
The cache key only varies by OS and manifest hash, so all macOS jobs that call this action without key-suffix race to create the same entry. On a cold key, a job like spm-package-integration can save first with a dependency set that does not include the prebuilt SafeDITool artifact, and later jobs then get an exact cache hit and cannot update that key. In that state, xcodebuild/spm jobs still have to download the artifact from GitHub Releases, so the intended 504 mitigation can remain ineffective until the hash changes.
…ndle-cache-and-retry
The cache action only varied the key by OS and manifest hashes, so every job that resolved Package.swift shared a single key. The first job to save its cache (e.g., spm-package-integration with no SafeDITool prebuilt artifact) could poison the entry for later jobs (xcodebuild/spm), which then got an exact-key hit and could not update it. Those jobs fell back to the artifact bundle download even on a warm cache, defeating the 504 mitigation.

Include `github.job` in the cache key (and restore-keys) so each job owns its own entry. The optional `key-suffix` input is kept as a secondary scoping escape hatch.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
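To make the key layout concrete, here is a small shell sketch that mirrors how the job-scoped key is assembled. The `cache_key` function and all argument values are hypothetical and only illustrate the string shape; the real key is built by the GitHub Actions expression in the action file.

```shell
# Hypothetical mirror of the key expression; argument values are illustrative.
cache_key() {
  os=$1 job=$2 hash=$3 suffix=${4:-}
  # ${suffix:+-$suffix} expands to "-<suffix>" only when suffix is non-empty,
  # mirroring `inputs.key-suffix && format('-{0}', inputs.key-suffix) || ''`.
  echo "${os}-spm-${job}-${hash}${suffix:+-$suffix}"
}

cache_key macOS spm abc123             # prints macOS-spm-spm-abc123
cache_key macOS xcodebuild abc123 ios  # prints macOS-spm-xcodebuild-abc123-ios
```

Because the job name is part of the string, two jobs with the same OS and manifest hash no longer collide on one entry.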
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 32b1fecd16
# would let the first job to save poison the shared cache for every
# other job — leaving them with an exact-key hit they cannot update.
# Scope by `github.job` so each job owns its own cache entry.
key: ${{ runner.os }}-spm-${{ github.job }}-${{ hashFiles('Package.swift', 'CLI/Package.swift') }}${{ inputs.key-suffix && format('-{0}', inputs.key-suffix) || '' }}
Include all resolved manifests in cache key
The cache key only hashes Package.swift and CLI/Package.swift, but CI also resolves dependencies from other manifests (for example swift package resolve --package-path "Examples/Example Package Integration" in .github/workflows/ci.yml). When one of those un-hashed manifests changes, this key does not rotate, so the workflow gets an exact cache hit and cannot save the newly resolved artifacts/dependencies back to cache; those jobs then keep starting from stale cache state and continue hitting the network, which undermines the 504-mitigation goal of this change.
The cache key only hashed the root Package.swift and CLI/Package.swift, but jobs also resolve dependencies from the Examples/* manifests (both Package.swift files and the .xcodeproj project.pbxproj files). When one of those un-hashed manifests changed, the key did not rotate, so the workflow received an exact cache hit it could not update. Affected jobs kept restoring stale cache state and continued hitting the network, undermining the 504-mitigation goal of this action.

Switch to globs that cover every checked-in Package.swift and Xcode project file so any dependency-affecting manifest change rotates the key. New Examples added later are picked up automatically.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
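The effect of hashing every manifest can be approximated locally with standard tools. The sketch below is an assumption-laden stand-in, not the `hashFiles` algorithm: the `manifest_hash` name is hypothetical, and it uses the POSIX `cksum` CRC for portability rather than SHA-256. It only shows that a change to any matched file changes the combined digest, while unrelated files do not.

```shell
# Hypothetical approximation of "hash every manifest under a directory".
# NOT the hashFiles implementation; cksum (CRC) is used for portability.
manifest_hash() {
  root=${1:-.}
  # Match every Package.swift and every *.xcodeproj/project.pbxproj file,
  # sort the paths for a stable order, checksum each file's contents,
  # then checksum the list of checksums into one combined value.
  find "$root" \( -name Package.swift -o -path '*.xcodeproj/project.pbxproj' \) -type f \
    | LC_ALL=C sort \
    | while IFS= read -r f; do cksum < "$f"; done \
    | cksum \
    | cut -d ' ' -f 1
}
```

Running `manifest_hash .` before and after editing a manifest under an Examples directory should yield different values, which is the property that makes the cache key rotate.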
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2a4f10bf9e
# integrations) so changing any of them rotates the key — otherwise
# an unhashed manifest change leaves the workflow on a stale exact hit
# that it cannot update.
key: ${{ runner.os }}-spm-${{ github.job }}-${{ hashFiles('**/Package.swift', '**/*.xcodeproj/project.pbxproj') }}${{ inputs.key-suffix && format('-{0}', inputs.key-suffix) || '' }}
Include Package.resolved files in cache key
Hashing only Package.swift and project.pbxproj means dependency lockfile-only updates (for example Package.resolved changes) reuse the same primary cache key, so actions/cache restores an exact hit and does not create an updated cache for the new resolved graph. In that scenario, each run can keep re-downloading newly pinned dependencies/artifacts, which undercuts the reliability/performance goal of this change; include **/Package.resolved (and any other lockfiles used by these jobs) in the key hash so cache entries rotate when resolved dependencies change.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary
- …SafeDITool.artifactbundle.zip. The old retry helper (3 attempts × 10s = ~30s window) is well inside the typical brownout duration. See run 24608060028 / job 71957528753 for the failure mode.
- …`retry` composite action to use exponential backoff with more attempts: 6 attempts with delays of 15s → 30s → 60s → 120s → 240s (capped). Total retry window is ~7.75 minutes, enough to ride out most CDN brownouts.

Test plan
- …`xcodebuild` matrix, `spm-package-integration`, `spm-project-integration`, `spm-multi-project-integration`, `spm`, `linux`, `lint-swift`) still pass with the reworked retry action.
- …`Attempt 1 failed, retrying in 15s...` → `Attempt 2 failed, retrying in 30s...`).

🤖 Generated with Claude Code