
Support multipart downloads when downloading large ranges via TransferManager.download() #248

Open
forrestfwilliams opened this issue Nov 2, 2022 · 6 comments

Comments

@forrestfwilliams

This issue references issue #1215 and its duplicate #3466 from the boto3 repository. It has also been discussed in this Stack Overflow post.

Issue

s3transfer supports ranged download requests and multipart downloads; however, it is not possible to perform a multipart download over a specific range. This results in slow download times when, for example, attempting to download a 1 GB range of data from a 4 GB file in S3.
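
For context, here is a minimal sketch of how a ranged download works today: a single GetObject request streamed over one connection (bucket, key, and offsets are placeholders):

```python
import boto3

s3 = boto3.client("s3")

# A single ranged GetObject: the whole ~1 GB range comes back over one
# connection, so none of s3transfer's multipart/concurrency machinery is used.
# Bucket, key, and offsets are placeholders.
start, end = 1_000_000_000, 2_000_000_000 - 1  # inclusive byte range
response = s3.get_object(
    Bucket="my-bucket",
    Key="large-archive.zip",
    Range=f"bytes={start}-{end}",
)

with open("subset.bin", "wb") as f:
    for chunk in response["Body"].iter_chunks(chunk_size=8 * 1024 * 1024):
        f.write(chunk)
```

As far as I can tell, Range is also not an accepted ExtraArgs key for TransferManager.download(), so there is currently no way to route a ranged download through the multipart code path.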

Use Case

I work at the Alaska Satellite Facility, where we distribute large amounts of remote sensing data to users across the globe via AWS. Many of these datasets come in legacy formats, such as zip files, that are not cloud-friendly. Due to the highly structured nature of these datasets, we can identify byte ranges that contain subsets of data that our users would be interested in downloading directly. However, since these subsets are still large (~1 GB within a larger 4 GB zip file), and multipart downloads are not supported for range requests, we cannot offer extraction of these datasets with low latency. I know of many other groups that have encountered this issue while trying to distribute large remote sensing datasets.

Proposed Solution

It would be great if a range argument were added to TransferConfig that could then be used by a TransferManager.download() call, which would download data ranges larger than the multipart_threshold via a multipart download.
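
To illustrate, usage might look something like this (the range option is hypothetical and does not exist today; this is only a sketch of what the proposal could look like):

```python
import boto3
from s3transfer.manager import TransferConfig, TransferManager

# Hypothetical usage -- the `range` option below does not exist today.
config = TransferConfig(
    multipart_threshold=64 * 1024 * 1024,
    max_request_concurrency=10,
    range="bytes=1000000000-1999999999",  # hypothetical new option
)

client = boto3.client("s3")
with TransferManager(client, config) as manager:
    # With the proposal, a 1 GB range above the multipart_threshold would be
    # fetched as concurrent parts instead of a single streamed GetObject.
    future = manager.download("my-bucket", "large-archive.zip", "subset.bin")
    future.result()
```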

I am willing to participate in developing this solution.
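
In the meantime, the behavior can be approximated by hand. A rough sketch of the workaround we could use today, splitting the range into fixed-size parts and issuing ranged GetObject calls concurrently (names and sizes are placeholders):

```python
import concurrent.futures

import boto3

PART_SIZE = 64 * 1024 * 1024  # 64 MiB parts


def download_range(bucket, key, start, end, dest, max_workers=10):
    """Download the inclusive byte range [start, end] of an object as concurrent parts."""
    s3 = boto3.client("s3")

    # Pre-size the output file so each part can be written at its own offset.
    with open(dest, "wb") as f:
        f.truncate(end - start + 1)

    def fetch_part(part_start):
        part_end = min(part_start + PART_SIZE - 1, end)
        body = s3.get_object(
            Bucket=bucket,
            Key=key,
            Range=f"bytes={part_start}-{part_end}",
        )["Body"].read()
        with open(dest, "r+b") as f:
            f.seek(part_start - start)
            f.write(body)

    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(fetch_part, range(start, end + 1, PART_SIZE)))
```

This works, but it re-implements the scheduling, retry, and throughput tuning that s3transfer already provides for whole-object downloads, which is why native support would be preferable.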

@forrestfwilliams

@tim-finnigan is there any update on this work? Excited to see #260!

@tim-finnigan

I don't have any updates at the moment but will check in with the team.

@forrestfwilliams

@tim-finnigan just checking in again. Did you hear back from the team?

@tim-finnigan

Hi @forrestfwilliams thanks for your patience and apologies for the delay in getting back to you. This issue was reviewed in the last couple of weeks and it was determined that it will need some further investigation at a cross-SDK level. I think there are some planned improvements related to S3 transfers that may or may not overlap with this issue. I wish I had more details to share at this point but unfortunately that is the extent of what I know at this time. I'll still plan to update this issue when there is more information to share.

@forrestfwilliams

Hey @tim-finnigan, any updates on the "planned improvements related to S3 transfers" that overlap with this issue? Thanks!

@tim-finnigan

Hi @forrestfwilliams thanks for following up - this feature request is still in process but moving forward. It is part of a broader effort to improve S3 transfers across SDKs and a thorough review process is required before the logic would be updated.
