feat[artifact]: add multipart s3 download #3875
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ba614d5381
```go
if experiments.IsEnabled(ctx, experiments.S3MultipartDownloads) {
	return d.startMultipart(ctx)
}
```
Preserve retry policy on multipart S3 downloads
When s3-multipart-downloads is enabled, Start returns startMultipart directly, so the configured Retries value is never applied to the overall download attempt. The legacy path used NewDownload(...).Start(ctx), which wraps the transfer in a 5-attempt retrier; this change means transient S3/network failures now fail sooner on the experimental path because only the SDK’s internal per-request retries run. Please wrap the multipart path with equivalent top-level retry behavior (or map conf.Retries into the SDK retryer) to keep reliability consistent.
The multipart download was intentionally not wrapped in the retrier because the AWS SDK already handles per-part retries. Retrying the outer call could result in 5 (configured retries) × 3 (AWS SDK default retries per part) = 15 total retries.
As a compromise I will wire conf.Retries into the S3 client's per-part max retries.
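To make the trade-off concrete, here is a toy model of the two retry strategies being discussed. All names here (`part`, `downloadWithPerPartRetries`, the failure counts) are hypothetical illustrations, not the agent's real API; the point is that per-part retry budgets absorb transient failures on their own, while an outer retrier would multiply the total attempt count.

```go
package main

import (
	"errors"
	"fmt"
)

// part models a single ranged download that fails transiently a fixed
// number of times before succeeding.
type part struct {
	failures int // transient failures remaining before success
}

func downloadPart(p *part) error {
	if p.failures > 0 {
		p.failures--
		return errors.New("transient error")
	}
	return nil
}

// downloadWithPerPartRetries mimics SDK-style behavior: each part gets its
// own retry budget (sdkAttempts), and we count total attempts made. No
// outer retry loop is needed for transient per-part failures.
func downloadWithPerPartRetries(parts []*part, sdkAttempts int) (attempts int, err error) {
	for _, p := range parts {
		ok := false
		for a := 0; a < sdkAttempts; a++ {
			attempts++
			if downloadPart(p) == nil {
				ok = true
				break
			}
		}
		if !ok {
			return attempts, errors.New("part failed after retries")
		}
	}
	return attempts, nil
}

func main() {
	// Two parts that each fail once, then succeed: per-part retries
	// absorb the transient errors without any outer retrier.
	parts := []*part{{failures: 1}, {failures: 1}}
	attempts, err := downloadWithPerPartRetries(parts, 3)
	fmt.Println(attempts, err)
}
```

Wrapping this whole function in a 5-attempt outer retrier would, in the worst case, re-run every part's budget on each outer attempt, which is the 5 × 3 = 15 compounding described above.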
When can we say that we have enough confidence that this can graduate from experiment? Can we use more extensive test coverage to avoid the experiment?
Our current stance is avoiding experiment if possible (though I notice that we just added the zip-plugins experiment 😅 )
Yea that's fair. I'd at least like the reporting customer to try it, but I might defer to more seasoned Agents Stewards.
one of the tricky things about experiments is that it's often really hard to get customers to actually use them, even when it would fix their issues. saying "try this feature that we're not sure works" is often kind of a hard sell, for obvious reasons.
on the other hand, one of the wonderful things about the agent is that if we break stuff, customers can very easily revert their version and let us know that something broke. so in reality, every change to the agent is opt-in (or opt-out).
obviously this doesn't give us permission to ship stuff that's broken, but it does allow us to Boldly Go a little bit. so long as we've done our due diligence (and based on the tests here, we have), it's okay to ship it and fix it if it breaks.
my general rules of thumb on experiments are:
- if at some point we want to say "this feature was a bad idea, i think we should just get rid of it", then we should use an experiment. zip-plugins and pty-raw are good examples of this — we're not sure whether or not the feature is going to make it to primetime, so we give ourselves permission to remove them if we want.
- if a feature is unfinished at the time of an agent release, it might be worth making an experiment, but it's often going to be easier to have things behind a flag and say "experimental" somewhere in the flag description
there was a time where we were using experiments as escape hatches for backwards compatibility fixes (eg, normalised-upload-paths), but i think we're moving away from that as time goes on, and just using regular CLI flags instead. the user interface is more or less the same (the difference between --experiment normalised-upload-paths and --normalised-upload-paths is small), but having flags is generally more discoverable
I agree with @moskyb. I'd vote let's ship without experiment flag 🙏🏿
Force-pushed from 1ed8d44 to ea59895
💡 Codex Review
Reviewed commit: b4189e7d09
```go
if d.conf.DisableMultipart {
	return d.startSingle(ctx)
}
return d.startMultipart(ctx)
```
Gate multipart S3 downloads behind an opt-in switch
This branch makes multipart the default path for every s3:// artifact download because DisableMultipart is false by default and there is no experiment/feature-flag check before calling startMultipart. That turns a behavior-change rollout into an immediate global switch for custom S3 users, which can unexpectedly alter download behavior in production instead of keeping it opt-in. Please gate this path behind an explicit opt-in (or keep single-stream as default) so existing agents do not silently change transport strategy.
Thanks, but this was a deliberate decision as per chats with real humans. #3875 (comment)
Force-pushed from b4189e7 to ce4bec8
moskyb left a comment:
a couple little nits, but it's looking really good!
```go
		EnvVar: "BUILDKITE_AGENT_INCLUDE_RETRIED_JOBS",
		Usage:  "Include artifacts from retried jobs in the search (default: false)",
	},
	NoS3MultipartDownloadFlag,
```
if this is the only place the flag is being used, just define it inline rather than having it out in globals i reckon
```go
// Disable parallel multipart downloads for s3:// artifacts and fall back
// to a single-stream presigned GET. Has no effect on other backends.
NoS3MultipartDownload bool
```
the general pattern we have for --no-whatever flags is to have the flag be negated (eg --no-s3-multipart-download), but then have any references to that behaviour happen in the positive (eg if cfg.S3MultipartDownload { ... }) as it makes the logic a lot easier to read. if cfg.S3MultipartDownload is a lot easier to read than if !cfg.NoS3MultipartDownload etc.
see this block in agent start — all of the plumbing in the agent uses positive bools, which are negated from the flag values, which are negative bools.
this one is a bit of a persnicket — happy to chat more about it on monday.
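The pattern being suggested can be sketched in a few lines. This is a hypothetical illustration (the `Flags` and `AgentConfig` types are made up for this example, not the agent's real structures): the CLI flag stays negative, but it is negated once at the boundary so all downstream logic reads positively.

```go
package main

import "fmt"

// Flags mirrors the CLI layer, where the flag is negative
// (--no-s3-multipart-download). Hypothetical type for illustration.
type Flags struct {
	NoS3MultipartDownload bool
}

// AgentConfig is the internal config, where the bool is positive so
// call sites read naturally. Hypothetical type for illustration.
type AgentConfig struct {
	S3MultipartDownload bool
}

// newAgentConfig negates the flag exactly once, at the boundary.
func newAgentConfig(f Flags) AgentConfig {
	return AgentConfig{S3MultipartDownload: !f.NoS3MultipartDownload}
}

func main() {
	cfg := newAgentConfig(Flags{NoS3MultipartDownload: false})
	if cfg.S3MultipartDownload { // reads much better than !cfg.NoS3MultipartDownload
		fmt.Println("using multipart download")
	}
}
```

The payoff is at the call sites: every check becomes `if cfg.S3MultipartDownload { ... }` instead of a double negative.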
Force-pushed from 5d788ac to d6f3d83
- multipart download by default
- opt out with --no-s3-multipart-download
- uses AWS SDK's native per-part retries, configured with N from conf.Retries
- per-part download size & concurrency match AWS CLI defaults

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Force-pushed from d6f3d83 to fd228a4
Description
Improve artifact download speed for custom S3 buckets by supporting multipart downloads. Previously this was done via a single stream.
Context
Resolves A-1172
Changes
Uses the feature/s3/manager to fan out a single s3 GET request into parallel ranged requests.
On by default; opt out via --no-s3-multipart-download.
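The fan-out described above boils down to splitting one object into byte ranges and fetching them in parallel. Here is a minimal sketch of the range-partitioning step; `partRanges` and `byteRange` are hypothetical names for illustration, and the 8 MiB part size is an assumption about the AWS CLI's default chunk size, not a value taken from this PR's code.

```go
package main

import "fmt"

// byteRange is an inclusive byte span, as used in an HTTP Range header
// (e.g. "bytes=0-8388607"). Hypothetical type for illustration.
type byteRange struct {
	start, end int64
}

// partRanges splits an object of size total into ranged-GET chunks, the
// way a multipart download manager fans a single s3:// GET into parallel
// range requests. The caller would fetch each range concurrently.
func partRanges(total, partSize int64) []byteRange {
	var parts []byteRange
	for start := int64(0); start < total; start += partSize {
		end := start + partSize - 1
		if end >= total {
			end = total - 1 // final part may be short
		}
		parts = append(parts, byteRange{start, end})
	}
	return parts
}

func main() {
	const miB = int64(1 << 20)
	// A 20 MiB object with 8 MiB parts yields three ranges, the last
	// one shorter than the rest.
	for _, r := range partRanges(20*miB, 8*miB) {
		fmt.Printf("bytes=%d-%d\n", r.start, r.end)
	}
}
```

In the real change this partitioning and the concurrent fetching are handled by the SDK's feature/s3/manager Downloader rather than hand-rolled code like this.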
Testing
- Tests have run locally (go test ./...). Buildkite employees may check this if the pipeline has run automatically.
- Code is formatted (go tool gofumpt -extra -w .)

Disclosures / Credits
Researched, planned, and implemented with Claude.
Co-Authored-By: Claude Opus 4.7 (1M context) noreply@anthropic.com