Multipart upload for large data. #2734

Merged
merged 1 commit into from Jan 8, 2024

Contributor

@khustup2 khustup2 commented Jan 5, 2024

🚀 🚀 Pull Request

Use upload_fileobj to upload large data to S3.

Impact

  • Bug fix (non-breaking change which fixes expected existing functionality)
  • Enhancement/New feature (adds functionality without impacting existing logic)
  • Breaking change (fix or feature that would cause existing functionality to change)

Description

Things to be aware of

Things to worry about

Additional Context

codecov bot commented Jan 5, 2024

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (133e0a4) 83.34% compared to head (350e671) 83.92%.

Files                         Patch %   Lines
deeplake/core/storage/s3.py   66.66%    2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2734      +/-   ##
==========================================
+ Coverage   83.34%   83.92%   +0.58%     
==========================================
  Files         233      233              
  Lines       26447    26452       +5     
==========================================
+ Hits        22041    22200     +159     
+ Misses       4406     4252     -154     
Flag        Coverage Δ
unittests   83.92% <66.66%> (+0.58%) ⬆️


sonarcloud bot commented Jan 5, 2024

    stream = BytesIO(content)
    self.client.upload_fileobj(stream, self.bucket, path)
else:
    self.client.put_object(
Contributor

Is put_object enough faster than upload_fileobj to justify keeping the else clause? Or can we always use upload_fileobj?

Contributor Author

put_object maps directly to the PutObject API call and is considered the default way to upload an object. I'm not sure about the performance difference, but I see upload_fileobj as an edge-case path, not the default upload method.
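
For readers following the thread, the trade-off discussed here can be sketched as a size-based dispatch. This is a minimal illustration, not the code merged in this PR: the function name upload_bytes, the PUT_OBJECT_LIMIT constant and the limit parameter are hypothetical, and the 5 GiB threshold is an assumption based on the single-request PutObject size limit that the branch name s3-upload-5gb hints at. The client argument is expected to behave like a boto3 S3 client.

```python
from io import BytesIO

# Assumption: a single PutObject request is capped at 5 GiB, so anything larger
# has to go through a multipart upload. The PR's actual threshold is not shown.
PUT_OBJECT_LIMIT = 5 * 1024**3


def upload_bytes(client, bucket, path, content, limit=PUT_OBJECT_LIMIT):
    """Upload `content` to `bucket`/`path` and return the name of the call used.

    `client` is assumed to expose put_object and upload_fileobj the way a
    boto3 S3 client does. `limit` is a hypothetical knob added for testing.
    """
    if len(content) > limit:
        # Edge case: upload_fileobj takes a file-like object and splits
        # large payloads into a multipart upload on the caller's behalf.
        client.upload_fileobj(BytesIO(content), bucket, path)
        return "upload_fileobj"
    # Default path: one PutObject API call.
    client.put_object(Bucket=bucket, Key=path, Body=content)
    return "put_object"
```

Keeping put_object as the default means small writes stay a single API call, while only oversized payloads pay for wrapping the bytes in a stream and going through the managed transfer path.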

@khustup2 khustup2 merged commit d952ab6 into main Jan 8, 2024
11 of 12 checks passed
@khustup2 khustup2 deleted the s3-upload-5gb branch January 8, 2024 18:51