Skip to content
This repository has been archived by the owner on Jan 16, 2023. It is now read-only.

Support for Google Cloud Storage #137

Closed
wants to merge 14 commits into from
Closed

Support for Google Cloud Storage #137

wants to merge 14 commits into from

Conversation

jpnauta
Copy link
Contributor

@jpnauta jpnauta commented Jul 7, 2019

Currently this project does not work with Google Cloud Storage. This is because GoogleCloudStorage:

  1. has a different method of getting a remote file's md5 hash, and it
  2. does not have the location property

These changes resolve this two issues.

@@ -119,7 +119,8 @@ def delete_file(self, path, prefixed_path, source_storage):
path, prefixed_path, source_storage)
if not self.dry_run:
self.log("Deleting '%s'" % path)
self.storage.delete(prefixed_path)
if self.storage.exists(prefixed_path):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this introduces an extra network call and should not be introduced.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without this, if the file has already been deleted on GCS, an exception will be thrown. It's not a requirement though. Perhaps I can catch the exception instead, otherwise I'll remove this code entirely

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely prefer catching the exception :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this logic for now; I'll do a separate P/R if necessary later

@antonagestam
Copy link
Owner

Thanks for your contribution. Happy to see someone working on Google Cloud integration.

I think it might be beneficial to introduce a concept of a HashStrategy (suggestions for better names appreciated) interface. Then each storage backend can map to a strategy. So there would probably need to exist BotoHashStrategy, Boto3HashStrategy and GoogleCloudHashStrategy, each inheriting from an abstract base class. I don't like the idea of letting get_remote_etag grow indefinitely.

@jpnauta
Copy link
Contributor Author

jpnauta commented Jul 13, 2019

Agreed! I can implement that sometime this week.

@antonagestam
Copy link
Owner

@jpnauta Great! Make sure to think about the case in #139

@jpnauta
Copy link
Contributor Author

jpnauta commented Jul 14, 2019

I did some changes, let me know what you think. I'll look into fixing the CI failures.

isinstance(storage, S3Boto3Storage))


def reset_connection(storage):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this logic is only specific to Boto3, I moved it to S3Boto3StorageExtensions so this file could be removed

@@ -32,14 +32,7 @@ def __init__(self, *args, **kwargs):
self.tasks = []
self.etags = {}
self.collectfast_enabled = settings.enabled

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code will trigger the storage class' __init__() function, which in turn triggers check_preload_metadata(), which contains this same code without the is_boto(...) statement

"""
Create md5 hash from file contents.
"""
contents = storage.open(path).read()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code has been moved to FileSystemStorageExtensions


STORAGE_EXTENSIONS_MAP = {
# Maps the relevant `Storage` class to it corresponding `StorageExtensions` class
'django.core.files.storage.FileSystemStorage': 'collectfast.storage_extensions.file_system.FileSystemStorageExtensions',
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To implement #139, you would simply need to override this map value and the corresponding class with the desired get_etag() functionality

@jpnauta
Copy link
Contributor Author

jpnauta commented Jul 15, 2019

Alright, sorry for the many notifications, I feel like I'm happy with this P/R 😅 @antonagestam can you look at the Travis CI failure when you get the changce? I think it's a config issue.

@antonagestam
Copy link
Owner

I'm sorry I haven't found the time to review and work you further here.

I had a few more things that I felt needed to be in place to find the right abstraction and so I started working on an alternative implementation, see #149. Sorry for not working with you all the way here, but I'm having a hard time putting aside the time for this project.

Anyhow, my implementation does not yet have a Google Cloud strategy, if you're still interested it would be great if you'd be willing to take that from your PR and make it work on top of the Strategy abstraction, once I finish the work on updating the test suite and docs and merge with master.

def get_etag(self, path):
normalized_path = path.replace('\\', '/')
try:
md5_base64 = self.storage.bucket.get_blob(normalized_path)._properties['md5Hash']
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you verify that the get_blob() call only fetches metadata and not the actual blob here?

normalized_path = path.replace('\\', '/')
try:
md5_base64 = self.storage.bucket.get_blob(normalized_path)._properties['md5Hash']
return '"' + binascii.hexlify(base64.urlsafe_b64decode(md5_base64)).decode("utf-8") + '"'
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a specific reason you are using binascii.hexlify() and base64.urlsafe_b64decode()as opposed to just a call to e.g. base64.b64decode(md5_base64).decode("utf-8")? I think it would be worth it to explain this line with a comment :)

@antonagestam
Copy link
Owner

Update: I added you as I co-author here, please let me know if that's alright with you :)

@jpnauta
Copy link
Contributor Author

jpnauta commented Nov 1, 2019

Sorry @antonagestam I didn't get a notification for this! Everything looks great here, thanks for making this happen 😄

@antonagestam
Copy link
Owner

@jpnauta No worries! This wouldn't have happened without your help. The Travis jobs are now setup with a Google Cloud Storage account and tests an actual integration :)

Thanks again for your contributions! 🍰

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants