Implemented chunked upload for Azure Blobs storage driver #1400
Conversation
Codecov Report

@@            Coverage Diff             @@
##            trunk    #1400      +/-   ##
==========================================
+ Coverage   86.58%   86.66%   +0.08%
==========================================
  Files         364      364
  Lines       76193    76082     -111
  Branches     7439     7422      -17
==========================================
- Hits        65968    65933      -35
+ Misses       7400     7325      -75
+ Partials     2825     2824       -1

Continue to review the full report at Codecov.
@@ -584,7 +584,6 @@ def _save_object(self, response, obj, destination_path,
     def _upload_object(self, object_name, content_type, request_path,
                        request_method='PUT',
                        headers=None, file_path=None, stream=None,
-                       upload_func=None, upload_func_kwargs=None,
Yep, I assume those arguments became unused when libcloud switched to the requests library (that change also seems to have introduced a lot of unintentional regressions related to streaming uploads and downloads).
Sadly, it's not trivial to write good unit / integration tests for that. In the past we mostly had unit tests which mocked much of that functionality, so real-life issues and regressions introduced as part of the requests migration were not caught.
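As one hedged illustration of a test style that avoids mocking the transport away, a throwaway local HTTP server can assert on the bytes the client actually streams; everything below (names, sizes) is illustrative and not an existing libcloud test helper:

```python
import io
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

received = []

class _PutHandler(BaseHTTPRequestHandler):
    def do_PUT(self):
        # Record exactly what the client streamed to us.
        length = int(self.headers['Content-Length'])
        received.append(self.rfile.read(length))
        self.send_response(201)
        self.end_headers()

    def log_message(self, *args):
        pass  # keep test output quiet

def test_streaming_put_sends_all_bytes():
    server = HTTPServer(('127.0.0.1', 0), _PutHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        payload = io.BytesIO(b'x' * 5 * 1024)
        url = 'http://127.0.0.1:%d/obj' % server.server_port
        # requests streams file-like bodies rather than buffering them.
        requests.put(url, data=payload)
        assert received and len(received[0]) == 5 * 1024
    finally:
        server.shutdown()
```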
libcloud/storage/base.py
Outdated
        return {'response': response,
                'bytes_transferred': stream_length,
                'data_hash': stream_hash}

    def _determine_content_type(self, content_type, object_name,
                                file_path=None):
        if not content_type:
I would probably invert that logic and do `if content_type` and return early to avoid one level of nesting, but that's just a personal style preference.
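For illustration, the suggested early-return shape might look something like this; the body below is a hypothetical simplification (including the `mimetypes` fallback and the `DEFAULT_CONTENT_TYPE` constant), not the actual libcloud implementation:

```python
import mimetypes

DEFAULT_CONTENT_TYPE = 'application/octet-stream'  # hypothetical fallback

def _determine_content_type(content_type, object_name, file_path=None):
    # Early return handles the common case and removes one nesting level.
    if content_type:
        return content_type

    name = file_path or object_name
    return mimetypes.guess_type(name)[0] or DEFAULT_CONTENT_TYPE
```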
Cleaned up the function in 498ac88.
Description
This pull request fixes #1399 by implementing chunked upload in the Azure Blobs storage driver via the Put Block and Put Block List APIs.
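For context, here is a minimal sketch of that two-step flow, assuming the `requests` library and a pre-authorized SAS blob URL; the function name, chunk size, and URL handling below are illustrative, and the real driver signs each request itself rather than relying on a SAS token:

```python
import base64

import requests

def put_block_upload(blob_sas_url, stream, chunk_size=4 * 1024 * 1024):
    block_ids = []
    index = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # Block IDs must be base64-encoded and equally sized within a blob.
        block_id = base64.b64encode(b'%015d' % index).decode('ascii')
        # Put Block stages one chunk without committing it.
        requests.put(blob_sas_url,
                     params={'comp': 'block', 'blockid': block_id},
                     data=chunk).raise_for_status()
        block_ids.append(block_id)
        index += 1

    # Put Block List commits the staged blocks, in order, as one blob.
    body = ('<?xml version="1.0" encoding="utf-8"?><BlockList>'
            + ''.join('<Latest>%s</Latest>' % b for b in block_ids)
            + '</BlockList>')
    requests.put(blob_sas_url, params={'comp': 'blocklist'},
                 data=body.encode('utf-8')).raise_for_status()
```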
The implementation mostly leverages the `AzureBlobsStorageDriver._upload_in_chunks` function that was first introduced in 24f34c9 and stopped being used in 6e0040d, with the following main updates:

- Enable setting object metadata, content-type and content-md5 for chunked uploads.
- Drop support for PageBlob. This is a breaking API change. However, given the non-trivial differences between the PageBlob and BlockBlob APIs, I'd hypothesize that the improved simplicity of the code will aid bugfixes and maintenance in the long run. If keeping support for PageBlob is a hard requirement, I suspect a non-trivial amount of refactoring would have to be undertaken to cleanly support chunked upload for both BlockBlob and PageBlob object types. I'm open to doing the work as part of this pull request, but I'd love to first discuss the pros and cons of both approaches.
Additionally, the following companion changes were made:
- Remove the now-unused `upload_func` and `upload_func_kwargs` arguments in `StorageDriver._upload_object` (7e55057)
- Clean up `StorageDriver._determine_content_type` (0dc5cbb)
- Switch `libcloud.utils.files.read_in_chunks` from type checking to duck-typing to ensure that more iterators (e.g. opened file objects) use the fast `read(int)` code-path (0fded11); see the sketch after this list
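As a hedged sketch of the duck-typing idea behind that last item (simplified signature, not the exact `read_in_chunks` implementation in libcloud):

```python
def read_in_chunks(source, chunk_size=8096):
    # Duck-typing: any object exposing read() takes the fast path,
    # including open file objects, instead of only specific known types.
    read = getattr(source, 'read', None)
    if callable(read):
        while True:
            chunk = read(chunk_size)
            if not chunk:
                break
            yield chunk
    else:
        # Generic iterator: yield whatever sized pieces it produces.
        for chunk in iter(source):
            yield chunk
```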
Status

Checklist