Multipart upload for large data. #2734

khustup2 · 2024-01-05T14:45:46Z

🚀 🚀 Pull Request

Use upload_fileobj to upload large data to S3.

Impact

Bug fix (non-breaking change which fixes expected existing functionality)
Enhancement/New feature (adds functionality without impacting existing logic)
Breaking change (fix or feature that would cause existing functionality to change)

Description

Things to be aware of

Things to worry about

Additional Context

codecov · 2024-01-05T15:31:09Z

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (133e0a4) 83.34% compared to head (350e671) 83.92%.

Files	Patch %	Lines
deeplake/core/storage/s3.py	66.66%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2734      +/-   ##
==========================================
+ Coverage   83.34%   83.92%   +0.58%     
==========================================
  Files         233      233              
  Lines       26447    26452       +5     
==========================================
+ Hits        22041    22200     +159     
+ Misses       4406     4252     -154

Flag	Coverage Δ
unittests	`83.92% <66.66%> (+0.58%)`	⬆️


sonarcloud · 2024-01-05T15:31:44Z

Quality Gate passed

Kudos, no new issues were introduced!

0 New issues
0 Security Hotspots
60.0% Coverage on New Code
0.0% Duplication on New Code


nvoxland-al · 2024-01-05T17:47:27Z

deeplake/core/storage/s3.py

+            stream = BytesIO(content)
+            self.client.upload_fileobj(stream, self.bucket, path)
+        else:
+            self.client.put_object(


Is put_object so much faster than upload_fileobj that it's worth keeping the else clause? Or can we always use upload_fileobj?

put_object maps directly to the PutObject API call and is considered the default way to upload an object. I'm not sure about the performance difference, but I see upload_fileobj as an edge-case path, not the default upload method.
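
For readers following the thread, here is a minimal sketch of the branch being discussed. It is not the code from this PR: the diff excerpt does not show the condition, so the helper name upload_bytes, the threshold parameter, and the PUT_OBJECT_LIMIT value are assumptions made for illustration (the value reflects S3's documented 5 GB cap on a single PutObject request). The client is expected to be a boto3 S3 client.

```python
from io import BytesIO

# Assumption: treat 5 GiB as the cutoff. S3 documents a 5 GB cap for a
# single PutObject request; the exact value used in the PR is not shown.
PUT_OBJECT_LIMIT = 5 * 1024**3


def upload_bytes(client, bucket, path, content, threshold=PUT_OBJECT_LIMIT):
    """Upload `content` (bytes) to s3://bucket/path.

    Hypothetical helper: payloads at or above `threshold` go through
    upload_fileobj, everything else through put_object.
    Returns the name of the client method that was used.
    """
    if len(content) >= threshold:
        # Managed transfer: wraps the bytes in a file-like object so
        # boto3 can split the payload into parts.
        client.upload_fileobj(BytesIO(content), bucket, path)
        return "upload_fileobj"
    # Default path: one PutObject request.
    client.put_object(Bucket=bucket, Key=path, Body=content)
    return "put_object"
```

With a real client this would be called as upload_bytes(boto3.client("s3"), "my-bucket", "some/key", data), where the bucket and key are placeholders. upload_fileobj is a managed transfer that switches to multipart upload once the payload exceeds the multipart threshold of its TransferConfig, which is why it can handle payloads a single PutObject request cannot.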

Multipart upload for large data.

350e671

khustup2 requested review from fayaz-al and activesoull January 5, 2024 16:35

nvoxland-al reviewed Jan 5, 2024

fayaz-al approved these changes Jan 6, 2024

khustup2 requested a review from nvoxland-al January 8, 2024 11:15

nvoxland-al approved these changes Jan 8, 2024

activesoull approved these changes Jan 8, 2024

khustup2 merged commit d952ab6 into main Jan 8, 2024
11 of 12 checks passed

khustup2 deleted the s3-upload-5gb branch January 8, 2024 18:51
