feat: implement on-demand batch presigning for multipart uploads#4004
Closed
xuang7 wants to merge 14 commits into apache:main
Contributor
Pull Request Overview
This PR introduces on-demand batch presigning for multipart uploads to prevent failures from expired pre-signed URLs during long-running uploads. Previously, all part URLs were pre-signed upfront, so the URLs for later parts could expire (after 15-30 minutes) before those parts were uploaded. The new implementation presigns URLs in batches as needed.
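To make the failure mode concrete, here is a rough back-of-the-envelope sketch. It assumes parts are uploaded strictly one after another at a constant rate, which is a simplification and not something this PR states. `firstExpiredPart` is a hypothetical helper written for illustration; it is not part of the codebase.

```typescript
// Hypothetical helper (illustration only, not part of this PR).
// Assumes all URLs are signed at time 0 and parts upload sequentially,
// each taking `secondsPerPart` seconds.
// Returns the 1-based number of the first part whose upload would start
// at or after the URL expiry, or null if every part starts in time.
function firstExpiredPart(
  totalParts: number,
  secondsPerPart: number,
  expiryMinutes: number
): number | null {
  if (secondsPerPart <= 0) {
    throw new RangeError("secondsPerPart must be positive");
  }
  const expirySeconds = expiryMinutes * 60;
  // Part k starts at (k - 1) * secondsPerPart, so it is late when
  // k - 1 >= expirySeconds / secondsPerPart.
  const firstLate = Math.ceil(expirySeconds / secondsPerPart) + 1;
  return firstLate <= totalParts ? firstLate : null;
}

// 1000 parts at 5 s each with a 30-minute expiry:
// part 361 starts at 1800 s, exactly when the URLs expire.
console.log(firstExpiredPart(1000, 5, 30)); // 361
```

Under these assumptions, every part from 361 onward would be sent with an expired URL, which is the midway failure described above.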
Key Changes:
- Backend adds `presignUploadParts` method using `S3Presigner` to sign specific part batches on-demand
- API endpoint now supports `type=init` (first batch) and new `type=sign` operation (subsequent batches)
- Frontend switches to RxJS `expand` operator for recursive, stateless batch fetching with configurable batch size (default: 100 parts)
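The batching idea in the list above can be sketched roughly as follows. This is not the PR's implementation: it uses plain `async`/`await` instead of the RxJS `expand` operator, and `toBatches`, `uploadInBatches`, `signBatch`, and `uploadPart` are hypothetical names introduced only for illustration.

```typescript
// Split 1-based part numbers 1..totalParts into batches of at most batchSize.
function toBatches(totalParts: number, batchSize: number): number[][] {
  if (batchSize < 1 || Math.floor(batchSize) !== batchSize) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches: number[][] = [];
  for (let start = 1; start <= totalParts; start += batchSize) {
    const end = Math.min(start + batchSize - 1, totalParts);
    const batch: number[] = [];
    for (let part = start; part <= end; part++) {
      batch.push(part);
    }
    batches.push(batch);
  }
  return batches;
}

// Hypothetical callbacks: signBatch stands in for the backend presign call,
// uploadPart for the upload of a single part to its pre-signed URL.
type SignBatch = (partNumbers: number[]) => Promise<Record<number, string>>;
type UploadPart = (partNumber: number, url: string) => Promise<void>;

// Sign each batch only when it is about to be uploaded, so no URL is
// older than the time needed to upload one batch.
async function uploadInBatches(
  totalParts: number,
  batchSize: number,
  signBatch: SignBatch,
  uploadPart: UploadPart
): Promise<void> {
  for (const batch of toBatches(totalParts, batchSize)) {
    const urls = await signBatch(batch);
    for (const partNumber of batch) {
      const url = urls[partNumber];
      if (url === undefined) {
        throw new Error("no URL returned for part " + partNumber);
      }
      await uploadPart(partNumber, url);
    }
  }
}
```

With a batch size of 100, a 250-part upload would request URLs three times (100, 100, and 50 parts), each time just before that batch is sent.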
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| frontend/src/app/dashboard/service/user/dataset/dataset.service.ts | Implements RxJS expand-based recursive batch fetching; adds signPendingParts method and urlBatchSize configuration |
| file-service/src/main/scala/org/apache/texera/service/util/S3StorageClient.scala | Adds S3Presigner client and presignUploadParts method with URI extraction helper |
| file-service/src/main/scala/org/apache/texera/service/resource/DatasetResource.scala | Adds "sign" operation handler; converts init response to Map format |
…e.ts Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Xinyuan Lin <xinyual3@uci.edu>
…torageClient.scala Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Xinyuan Lin <xinyual3@uci.edu>
Contributor
@aicam please review it before @aglinxinyuan does his review.
Contributor
@xuang7 Since we decided to change the presigned URL expiration configuration, we can close this PR for now. If needed, we can reopen it later.
What changes were proposed in this PR?
This PR introduces on-demand (batch) presigning for multipart uploads to reduce failures from expired pre-signed URLs. Previously, all part URLs were pre-signed at the start using an experimental LakeFS API. For long uploads, URLs for later parts could expire (after 15 min locally or 30 min on the server), causing the upload to fail midway. The revised implementation uses the LakeFS function for initial setup, then presigns URL batches on-demand directly using `S3Presigner`.
Changes (Backend)
- `S3StorageClient`: adds `presignUploadParts`. This method uses `s3Presigner` to sign a specific list of provided `partNumbers`.
- `DatasetResource`: the new presign operation receives a `pendingParts` list and `physicalAddress` from the client. It calls the new `S3StorageClient.presignUploadParts` to sign the requested batch and returns the new URLs.
Changes (Frontend)
- Uses `concatMap` for sequential batch processing:
  - `type=init` (no pre-signed URLs)
  - `type=presign` for each batch just before uploading
- Adds a `urlBatchSize` variable (default: 50) to control how many URLs are requested in each init and sign call.
Changes (Config)
- Adds the `s3MultipartPresignExpiryMinutes` configuration variable to control presigned URL expiration time (default: 30 minutes)
- Adds the `s3PresignEndpoint` configuration variable for generating presigned URLs
Presigned URL Comparison
Any related issues, documentation, discussions?
Fixes #3837
Resolves URL expiration for pending parts. Fully handling interruptions during part uploads requires resumable uploads.
How was this PR tested?
Tested with existing automated test cases and local manual tests on k8s.
Was this PR authored or co-authored using generative AI tooling?
Generated-by: Claude Sonnet 4.5