
Implemented chunked upload for Azure Blobs storage driver #1400

Merged
c-w merged 11 commits into apache:trunk from CatalystCode:azure_blobs_chunked_upload on Jan 9, 2020

Conversation

c-w (Member) commented Jan 4, 2020


Description

This pull request fixes #1399 by implementing chunked upload in the Azure Blobs storage driver via the Put Block and Put Block List APIs.
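
For readers unfamiliar with the Azure REST API, here is a minimal sketch of what a Put Block / Put Block List upload looks like. It is written against a hypothetical pre-signed (SAS) blob URL using plain requests rather than the driver's connection machinery; the chunk size, block id scheme, and helper name are illustrative, not the driver's actual code:

```python
import base64

import requests

CHUNK_SIZE = 4 * 1024 * 1024  # illustrative; the driver uses its own constant


def upload_blob_in_blocks(blob_sas_url, stream, content_type=None):
    """Upload a binary file-like object as a BlockBlob via Put Block + Put Block List."""
    block_ids = []

    # Put Block: upload each chunk under an equal-length, base64-encoded block id.
    for index, chunk in enumerate(iter(lambda: stream.read(CHUNK_SIZE), b'')):
        block_id = base64.b64encode(b'block-%010d' % index).decode('ascii')
        response = requests.put(
            blob_sas_url,
            params={'comp': 'block', 'blockid': block_id},
            data=chunk)
        response.raise_for_status()
        block_ids.append(block_id)

    # Put Block List: commit the uploaded blocks in order. Blob-level
    # properties such as the content type are set on this final request.
    block_list = ''.join('<Latest>%s</Latest>' % b for b in block_ids)
    body = ('<?xml version="1.0" encoding="utf-8"?>'
            '<BlockList>%s</BlockList>' % block_list)
    headers = {}
    if content_type is not None:
        headers['x-ms-blob-content-type'] = content_type
    response = requests.put(
        blob_sas_url,
        params={'comp': 'blocklist'},
        data=body.encode('utf-8'),
        headers=headers)
    response.raise_for_status()
```

The key point is that blob-level properties (content type, metadata, MD5) are attached to the final Put Block List request rather than to the individual Put Block calls, which is what makes the first bullet below possible.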

The implementation mostly leverages the AzureBlobsStorageDriver._upload_in_chunks function that was first introduced in 24f34c9 and stopped being used in 6e0040d, with the following main updates:

  • Enable setting object metadata, content-type and content-md5 for chunked uploads.

  • Drop support for PageBlob. This is a breaking API change. However, given the non-trivial differences between the PageBlob and BlockBlob APIs, I'd argue that the simpler code will make bugfixes and maintenance easier in the long run. If keeping support for PageBlob is a hard requirement, I suspect a non-trivial amount of refactoring would be needed to cleanly support chunked upload for both the BlockBlob and PageBlob object types. I'm open to doing the work as part of this pull request, but I'd love to first discuss the pros and cons of both approaches.

Additionally, the following companion changes were made:

  • Remove unused upload_func and upload_func_kwargs arguments in StorageDriver._upload_object (7e55057)
  • Rationalize determination of content type to a new helper function StorageDriver._determine_content_type (0dc5cbb)
  • Switch libcloud.utils.files.read_in_chunks from type checking to duck-typing so that more iterators (e.g. opened file objects) use the fast read(int) code path (0fded11); see the sketch after this list
  • Make Azure Storage constants configurable via environment variables (638a189)
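
As mentioned in the read_in_chunks bullet above, the idea is to duck-type on a callable read attribute instead of checking concrete types. A simplified sketch of that shape (not the actual libcloud implementation, which handles more edge cases; the default chunk size here is illustrative):

```python
CHUNK_SIZE = 8096  # illustrative default


def read_in_chunks(iterator, chunk_size=CHUNK_SIZE):
    """Yield data from ``iterator`` in chunks of roughly ``chunk_size`` bytes.

    Duck-typing sketch: any object exposing a callable read() method, such as
    an opened binary file object or an HTTP response, takes the fast
    read(int) path; everything else is treated as a plain iterator of
    byte chunks.
    """
    if callable(getattr(iterator, 'read', None)):
        for chunk in iter(lambda: iterator.read(chunk_size), b''):
            yield chunk
        return

    for chunk in iterator:
        yield chunk
```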

Status

  • done, ready for review

Checklist

codecov-io commented Jan 4, 2020

Codecov Report

Merging #1400 into trunk will increase coverage by 0.08%.
The diff coverage is 93.18%.


@@            Coverage Diff             @@
##            trunk    #1400      +/-   ##
==========================================
+ Coverage   86.58%   86.66%   +0.08%     
==========================================
  Files         364      364              
  Lines       76193    76082     -111     
  Branches     7439     7422      -17     
==========================================
- Hits        65968    65933      -35     
+ Misses       7400     7325      -75     
+ Partials     2825     2824       -1
Impacted Files Coverage Δ
libcloud/test/storage/test_base.py 92.92% <ø> (-0.41%) ⬇️
libcloud/storage/drivers/s3.py 89.3% <100%> (+0.12%) ⬆️
libcloud/test/storage/test_azure_blobs.py 94.27% <100%> (+2.27%) ⬆️
libcloud/test/storage/test_atmos.py 95.11% <100%> (ø) ⬆️
libcloud/storage/drivers/atmos.py 78.03% <100%> (+0.13%) ⬆️
libcloud/utils/files.py 100% <100%> (ø) ⬆️
libcloud/storage/base.py 83.01% <100%> (ø) ⬆️
libcloud/storage/drivers/azure_blobs.py 85.13% <88.46%> (+15.33%) ⬆️

Continue to review the full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6dca82e...9174502.

@c-w c-w removed the status: needs tests label Jan 6, 2020
@c-w c-w requested review from Kami and tonybaloney Jan 6, 2020
@@ -584,7 +584,6 @@ def _save_object(self, response, obj, destination_path,
     def _upload_object(self, object_name, content_type, request_path,
                        request_method='PUT',
                        headers=None, file_path=None, stream=None,
-                       upload_func=None, upload_func_kwargs=None,

Kami (Member) commented Jan 8, 2020

Yep, I assume those arguments became unused when libcloud switched to the requests library (that change also seems to have introduced a lot of unintentional regressions related to streaming uploads and downloads).

Sadly it's not trivial to write good unit / integration tests for that. In the past we mostly had unit tests which mocked some of that functionality, so real-life issues and regressions introduced as part of the requests migration were not caught.

        return {'response': response,
                'bytes_transferred': stream_length,
                'data_hash': stream_hash}

    def _determine_content_type(self, content_type, object_name,
                                file_path=None):
        if not content_type:

Kami (Member) commented Jan 8, 2020

I would probably invert that logic and return early when content_type is set, to avoid one level of nesting, but that's just a personal style preference.
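
For illustration, the early-return shape being suggested might look roughly like this. It is only a sketch that mirrors the method signature from the diff above, not the exact code that landed in 498ac88; the mimetypes-based guessing and the fallback constant are assumptions about the surrounding driver code:

```python
import mimetypes

# Assumed fallback; the actual driver default may differ.
DEFAULT_CONTENT_TYPE = 'application/octet-stream'


def _determine_content_type(self, content_type, object_name, file_path=None):
    # Early return keeps the common case out of a nested if-block.
    if content_type:
        return content_type

    # Otherwise guess from the file path if given, falling back to the
    # object name and finally to a generic default.
    name = file_path or object_name
    guessed_type, _ = mimetypes.guess_type(name)
    return guessed_type or DEFAULT_CONTENT_TYPE
```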

c-w (Author, Member) commented Jan 9, 2020

Cleaned up the function in 498ac88.

Kami (Member) approved these changes Jan 8, 2020 and left a comment

I didn't test the changes, but LGTM 👍

c-w added 2 commits Jan 9, 2020
@c-w c-w merged commit a97ab73 into apache:trunk Jan 9, 2020
1 check passed: continuous-integration/travis-ci/pr (The Travis CI build passed)
@c-w c-w deleted the CatalystCode:azure_blobs_chunked_upload branch Jan 9, 2020