CI: back off retry attempts with exponential delay to ride out 504s #258

Merged
dfed merged 6 commits into main from claude/ci-artifact-bundle-cache-and-retry
Apr 19, 2026

dfed (Owner) commented Apr 18, 2026

Summary

  • Recent CI runs have been failing with 504s from the GitHub Releases CDN while downloading SafeDITool.artifactbundle.zip. The old retry helper (3 attempts with a flat 10s delay, a window of roughly 30s) gives up well before a typical brownout ends. See run 24608060028 / job 71957528753 for the failure mode.
  • Reworks the retry composite action to use exponential backoff with more attempts: 6 attempts with delays of 15s → 30s → 60s → 120s → 240s (capped at 240s). The total retry window is ~7.75 minutes (465s of delay), enough to ride out most CDN brownouts.
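
The delay schedule in the second bullet can be checked with a few lines of arithmetic. This is an illustrative Python sketch, not the composite action's implementation (which this page does not show); the function name `backoff_delays` and its parameters are assumptions made for the example.

```python
def backoff_delays(attempts: int = 6, base: int = 15, cap: int = 240) -> list[int]:
    """Delays (in seconds) slept between consecutive attempts.

    There is one delay fewer than there are attempts, because nothing is
    slept after the final attempt.
    """
    return [min(base * 2**i, cap) for i in range(attempts - 1)]


delays = backoff_delays()
total_seconds = sum(delays)
print(delays)                                   # [15, 30, 60, 120, 240]
print(f"{total_seconds}s = {total_seconds / 60} minutes")  # 465s = 7.75 minutes
```

With these defaults the cap is only reached on the last delay; any additional attempts would each wait a further 240s.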

Test plan

  • Verify all existing CI jobs (xcodebuild matrix, spm-package-integration, spm-project-integration, spm-multi-project-integration, spm, linux, lint-swift) still pass with the reworked retry action.
  • Confirm that a transient failure now logs the doubling delay (e.g. "Attempt 1 failed, retrying in 15s..." followed by "Attempt 2 failed, retrying in 30s...").
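
To make the expected log lines concrete, here is a minimal retry loop with exponential backoff, written in Python for illustration. It is a sketch under assumptions: the real helper is a composite action whose code is not shown here, and `retry`, `flaky_download`, and the injectable `sleep`/`log` parameters are hypothetical names. Only the message format is taken from the test plan above.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(
    operation: Callable[[], T],
    attempts: int = 6,
    base: float = 15,
    cap: float = 240,
    sleep: Callable[[float], None] = time.sleep,
    log: Callable[[str], None] = print,
) -> T:
    """Call operation until it succeeds, doubling the delay after each failure."""
    delay = base
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last failure
            log(f"Attempt {attempt} failed, retrying in {delay:g}s...")
            sleep(delay)
            delay = min(delay * 2, cap)
    raise ValueError("attempts must be at least 1")


# Hypothetical download that fails twice with a 504 and then succeeds.
calls = {"count": 0}


def flaky_download() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("504 Gateway Timeout")
    return "ok"


messages: list[str] = []
slept: list[float] = []
# Record the delays instead of actually sleeping so the demo runs instantly.
result = retry(flaky_download, sleep=slept.append, log=messages.append)
print(result, slept)
print("\n".join(messages))
```

Injecting `sleep` and `log` keeps the loop testable without waiting out the real delays.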

🤖 Generated with Claude Code

Recent CI runs have been failing on the GitHub releases CDN with
504 Gateway Timeout while downloading SafeDITool.artifactbundle.zip.
The previous retry helper used 3 attempts with a flat 10s delay
(~30s total), which is well inside a typical CDN brownout window.

Two changes:

1. Add a `cache-swiftpm` composite action that wraps `actions/cache`
   over the SwiftPM artifact and cache directories on macOS and
   Linux, keyed on `Package.swift` (and `CLI/Package.swift`) hashes.
   Wire it into every job that resolves package dependencies.
   Once warm, the artifact bundle download is skipped entirely.

2. Switch the retry helper to exponential backoff with a higher
   default attempt count: 6 attempts with delays of 15s, 30s, 60s,
   120s, 240s (capped) — ~7.75 minutes of total retry window for
   cold-cache cases that fall through to the network.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

codecov Bot commented Apr 18, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (d3d6454) to head (00aa3c9).
⚠️ Report is 4 commits behind head on main.

@@            Coverage Diff            @@
##              main      #258   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           41        41           
  Lines         6369      6369           
=========================================
  Hits          6369      6369           

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0f297d9bdd

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

~/Library/Caches/org.swift.swiftpm
~/Library/org.swift.swiftpm
~/.cache/org.swift.swiftpm
key: ${{ runner.os }}-spm-${{ hashFiles('Package.swift', 'CLI/Package.swift') }}${{ inputs.key-suffix && format('-{0}', inputs.key-suffix) || '' }}

P1: Include job scope in SwiftPM cache key

The cache key only varies by OS and manifest hash, so all macOS jobs that call this action without key-suffix race to create the same entry. On a cold key, a job like spm-package-integration can save first with a dependency set that does not include the prebuilt SafeDITool artifact, and later jobs then get an exact cache hit and cannot update that key. In that state, xcodebuild/spm jobs still have to download the artifact from GitHub Releases, so the intended 504 mitigation can remain ineffective until the hash changes.

dfed and others added 2 commits April 18, 2026 12:54
The cache action only varied the key by OS and manifest hashes, so every
job that resolved Package.swift shared a single key. The first job to
save its cache (e.g., spm-package-integration with no SafeDITool prebuilt
artifact) could poison the entry for later jobs (xcodebuild/spm), which
then got an exact-key hit and could not update it — leaving them to fall
back to the artifact bundle download even on warm cache, defeating the
504 mitigation.

Include `github.job` in the cache key (and restore-keys) so each job owns
its own entry. The optional `key-suffix` input is kept as a secondary
escape hatch for further scoping.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
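
The poisoning scenario described in this commit can be reproduced with a toy model. The Python sketch below is a simplification, not the behavior of `actions/cache` itself: jobs run one after another rather than racing, the class `ExactKeyCache`, the helper names, the `abc123` hash, and the `source-deps` dependency are all hypothetical. Only the job names and the SafeDITool artifact bundle come from the discussion above.

```python
class ExactKeyCache:
    """Toy cache whose entries are immutable once saved under a key."""

    def __init__(self) -> None:
        self.entries: dict[str, set[str]] = {}

    def run_job(self, key: str, needs: set[str]) -> set[str]:
        """Run one job and return the items it had to download."""
        restored = self.entries.get(key)
        if restored is None:
            # Cache miss: download everything, then save under the key.
            self.entries[key] = set(needs)
            return set(needs)
        # Exact hit: anything missing is downloaded, but the entry cannot
        # be updated, so the next run downloads it again.
        return needs - restored


# What each job resolves; "source-deps" is a placeholder dependency.
JOBS = {
    "spm-package-integration": {"source-deps"},
    "xcodebuild": {"source-deps", "SafeDITool.artifactbundle"},
}


def key_for(job: str, scoped: bool) -> str:
    manifest_hash = "abc123"  # placeholder for the hashFiles(...) result
    return f"macOS-spm-{job}-{manifest_hash}" if scoped else f"macOS-spm-{manifest_hash}"


def downloads_on_second_run(scoped: bool) -> dict[str, set[str]]:
    cache = ExactKeyCache()
    for job, needs in JOBS.items():  # first workflow run, cold cache
        cache.run_job(key_for(job, scoped), needs)
    # Second workflow run: report what each job still has to fetch.
    return {job: cache.run_job(key_for(job, scoped), needs) for job, needs in JOBS.items()}


print(downloads_on_second_run(scoped=False))
print(downloads_on_second_run(scoped=True))
```

With the shared key, `xcodebuild` keeps fetching the artifact bundle on every run; with the job-scoped key, neither job downloads anything on the second run.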

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 32b1fecd16

# would let the first job to save poison the shared cache for every
# other job — leaving them with an exact-key hit they cannot update.
# Scope by `github.job` so each job owns its own cache entry.
key: ${{ runner.os }}-spm-${{ github.job }}-${{ hashFiles('Package.swift', 'CLI/Package.swift') }}${{ inputs.key-suffix && format('-{0}', inputs.key-suffix) || '' }}

P2: Include all resolved manifests in cache key

The cache key only hashes Package.swift and CLI/Package.swift, but CI also resolves dependencies from other manifests (for example swift package resolve --package-path "Examples/Example Package Integration" in .github/workflows/ci.yml). When one of those un-hashed manifests changes, this key does not rotate, so the workflow gets an exact cache hit and cannot save the newly resolved artifacts/dependencies back to cache; those jobs then keep starting from stale cache state and continue hitting the network, which undermines the 504-mitigation goal of this change.

The cache key only hashed the root Package.swift and CLI/Package.swift,
but jobs also resolve dependencies from the Examples/* manifests (both
Package.swift files and the .xcodeproj project.pbxproj files). When one
of those un-hashed manifests changed, the key did not rotate, so the
workflow received an exact cache hit it could not update. Affected jobs
kept restoring stale cache state and continued hitting the network,
undermining the 504-mitigation goal of this action.

Switch to globs that cover every checked-in Package.swift and Xcode
project file so any dependency-affecting manifest change rotates the
key. New Examples added later are picked up automatically.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
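
The effect of widening the patterns can be demonstrated locally. The Python sketch below only approximates the idea behind `hashFiles` (hash every matched file, then hash the set); it is not GitHub's implementation, and `manifest_hash`, the placeholder file contents, and the temporary directory layout are assumptions. The two pattern lists mirror the keys quoted in this thread.

```python
import hashlib
import tempfile
from pathlib import Path

NARROW = ["Package.swift", "CLI/Package.swift"]
WIDE = ["**/Package.swift", "**/*.xcodeproj/project.pbxproj"]


def manifest_hash(root: Path, patterns: list[str]) -> str:
    """Hash the path and contents of every file under root matching a pattern."""
    matched = sorted({p for pattern in patterns for p in root.glob(pattern) if p.is_file()})
    digest = hashlib.sha256()
    for path in matched:
        digest.update(path.relative_to(root).as_posix().encode())
        digest.update(hashlib.sha256(path.read_bytes()).digest())
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    example = root / "Examples" / "Example Package Integration" / "Package.swift"
    for manifest in (root / "Package.swift", root / "CLI" / "Package.swift", example):
        manifest.parent.mkdir(parents=True, exist_ok=True)
        manifest.write_text("// placeholder manifest\n")

    before = (manifest_hash(root, NARROW), manifest_hash(root, WIDE))
    # Change only the example manifest, as a dependency bump there would.
    example.write_text("// placeholder manifest\n// dependency bumped\n")
    after = (manifest_hash(root, NARROW), manifest_hash(root, WIDE))

narrow_rotated = before[0] != after[0]
wide_rotated = before[1] != after[1]
print(narrow_rotated, wide_rotated)  # False True
```

The narrow pattern list never sees the example manifest, so its hash stays put, while the glob-based list rotates as soon as any checked-in manifest changes.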
dfed (Owner, Author) commented Apr 18, 2026

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2a4f10bf9e

# integrations) so changing any of them rotates the key — otherwise
# an unhashed manifest change leaves the workflow on a stale exact hit
# that it cannot update.
key: ${{ runner.os }}-spm-${{ github.job }}-${{ hashFiles('**/Package.swift', '**/*.xcodeproj/project.pbxproj') }}${{ inputs.key-suffix && format('-{0}', inputs.key-suffix) || '' }}

P2: Include Package.resolved files in cache key

Hashing only Package.swift and project.pbxproj means dependency lockfile-only updates (for example Package.resolved changes) reuse the same primary cache key, so actions/cache restores an exact hit and does not create an updated cache for the new resolved graph. In that scenario, each run can keep re-downloading newly pinned dependencies/artifacts, which undercuts the reliability/performance goal of this change; include **/Package.resolved (and any other lockfiles used by these jobs) in the key hash so cache entries rotate when resolved dependencies change.
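
The difference between an exact hit and a restore-keys fallback is what makes a lockfile-only change matter. The following Python sketch models that distinction in a deliberately simplified way (an exact primary-key hit skips the save; a miss falls back to the newest prefix match and saves a new entry). The functions `restore` and `run_job`, the key strings, and the `graph-v1`/`graph-v2` values are hypothetical and stand in for the real cache contents.

```python
from typing import Optional


def restore(cache: dict[str, str], key: str, prefixes: list[str]) -> tuple[Optional[str], bool]:
    """Return (restored value, exact_hit) for a primary key and fallback prefixes."""
    if key in cache:
        return cache[key], True
    for prefix in prefixes:
        # Newest matching entry wins; dicts preserve insertion order.
        for existing in reversed(list(cache)):
            if existing.startswith(prefix):
                return cache[existing], False
    return None, False


def run_job(cache: dict[str, str], key: str, prefix: str, resolved_graph: str) -> tuple[Optional[str], bool]:
    restored, exact_hit = restore(cache, key, [prefix])
    if not exact_hit:
        cache[key] = resolved_graph  # saved only on a primary-key miss
    return restored, exact_hit


PREFIX = "Linux-spm-linux-"

# Key ignores Package.resolved: a lockfile-only change reuses the same key.
stale: dict[str, str] = {}
run_job(stale, PREFIX + "m1", PREFIX, "graph-v1")
stale_result = run_job(stale, PREFIX + "m1", PREFIX, "graph-v2")

# Key includes a lockfile hash (r1 -> r2): the key rotates on that change.
fresh: dict[str, str] = {}
run_job(fresh, PREFIX + "m1-r1", PREFIX, "graph-v1")
fresh_result = run_job(fresh, PREFIX + "m1-r2", PREFIX, "graph-v2")

print(stale_result, stale)
print(fresh_result, fresh)
```

In the first case the cache is stuck on the old graph indefinitely; in the second the job warms up from the old entry and then saves the new one.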

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
dfed changed the title from "CI: cache SwiftPM artifacts and back off retries to ride out 504s" to "CI: back off retry attempts with exponential delay to ride out 504s" Apr 19, 2026
dfed merged commit c86f3b3 into main Apr 19, 2026
17 checks passed
dfed deleted the claude/ci-artifact-bundle-cache-and-retry branch April 19, 2026 04:32