Support for Google Cloud Storage #137

jpnauta · 2019-07-07T05:56:23Z

Currently this project does not work with Google Cloud Storage. This is because GoogleCloudStorage:

has a different method of getting a remote file's md5 hash, and it
does not have the location property

These changes resolve these two issues.

antonagestam · 2019-07-10T08:25:14Z

collectfast/management/commands/collectstatic.py

@@ -119,7 +119,8 @@ def delete_file(self, path, prefixed_path, source_storage):
                path, prefixed_path, source_storage)
        if not self.dry_run:
            self.log("Deleting '%s'" % path)
-            self.storage.delete(prefixed_path)
+            if self.storage.exists(prefixed_path):


I believe this introduces an extra network call and should not be introduced.

Without this, if the file has already been deleted on GCS, an exception will be thrown. It's not a requirement though. Perhaps I can catch the exception instead; otherwise I'll remove this code entirely.

Definitely prefer catching the exception :)

I removed this logic for now; I'll do a separate P/R if necessary later
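For readers following this thread, here is a minimal sketch of the "catch the exception" approach preferred above. It is not the code from this PR. `NotFound` and `FakeRemoteStorage` are hypothetical stand-ins so the example runs without any cloud dependency; with a real backend you would pass whichever exception class that backend raises for a missing object.

```python
class NotFound(Exception):
    """Hypothetical stand-in for a backend's 'object does not exist' error."""


class FakeRemoteStorage:
    """Hypothetical in-memory storage, used only to exercise the sketch."""

    def __init__(self, names):
        self.names = set(names)
        self.calls = 0  # counts simulated network round trips

    def delete(self, name):
        self.calls += 1
        if name not in self.names:
            raise NotFound(name)
        self.names.remove(name)


def delete_if_present(storage, name, missing_errors=(NotFound,)):
    """Delete `name`, treating "already gone" as a non-error.

    Returns True if the file was deleted and False if it was already
    missing. Either way only one call is made to the backend, unlike an
    exists() check followed by delete().
    """
    try:
        storage.delete(name)
    except missing_errors:
        return False
    return True
```

The design point is the one raised in the review: `exists()` followed by `delete()` costs two round trips per file, while catching the error costs one.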

antonagestam · 2019-07-10T08:28:50Z

Thanks for your contribution. Happy to see someone working on Google Cloud integration.

I think it might be beneficial to introduce a concept of a HashStrategy (suggestions for better names appreciated) interface. Then each storage backend can map to a strategy. So there would probably need to exist BotoHashStrategy, Boto3HashStrategy and GoogleCloudHashStrategy, each inheriting from an abstract base class. I don't like the idea of letting get_remote_etag grow indefinitely.
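A rough sketch of what such an interface could look like, assuming one abstract method per backend. Only the name HashStrategy comes from the comment above; the method names, the constructor argument, and the toy `DictHashStrategy` subclass are illustrative assumptions, not the design that was eventually merged.

```python
import abc
import hashlib


class HashStrategy(abc.ABC):
    """Hypothetical base class: one subclass per storage backend."""

    def __init__(self, remote_storage):
        self.remote_storage = remote_storage

    @abc.abstractmethod
    def get_remote_file_hash(self, prefixed_path):
        """Return the remote file's md5 hex digest, or None if it is missing."""

    def get_local_file_hash(self, contents):
        # Shared default: md5 hex digest of the local file's bytes.
        return hashlib.md5(contents).hexdigest()

    def should_copy_file(self, prefixed_path, contents):
        # Copy whenever the remote hash is missing or differs from the local one.
        remote_hash = self.get_remote_file_hash(prefixed_path)
        return remote_hash != self.get_local_file_hash(contents)


class DictHashStrategy(HashStrategy):
    """Toy strategy whose "remote storage" is a dict of path -> bytes."""

    def get_remote_file_hash(self, prefixed_path):
        data = self.remote_storage.get(prefixed_path)
        return None if data is None else hashlib.md5(data).hexdigest()
```

With this shape, backend-specific logic lives in each subclass's `get_remote_file_hash`, so a single `get_remote_etag` function no longer has to grow a branch per backend.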

jpnauta · 2019-07-13T05:42:11Z

Agreed! I can implement that sometime this week.

antonagestam · 2019-07-13T11:23:52Z

@jpnauta Great! Make sure to think about the case in #139

jpnauta · 2019-07-14T20:54:35Z

I made some changes, let me know what you think. I'll look into fixing the CI failures.

jpnauta · 2019-07-14T20:58:49Z

collectfast/boto.py

-        isinstance(storage, S3Boto3Storage))
-
-
-def reset_connection(storage):


Since this logic is only specific to Boto3, I moved it to S3Boto3StorageExtensions so this file could be removed

jpnauta · 2019-07-14T21:25:12Z

collectfast/management/commands/collectstatic.py

@@ -32,14 +32,7 @@ def __init__(self, *args, **kwargs):
        self.tasks = []
        self.etags = {}
        self.collectfast_enabled = settings.enabled
-


This code will trigger the storage class' __init__() function, which in turn triggers check_preload_metadata(), which contains this same code without the is_boto(...) statement

jpnauta · 2019-07-15T00:47:05Z

collectfast/etag.py

    """
    Create md5 hash from file contents.
    """
-    contents = storage.open(path).read()


This code has been moved to FileSystemStorageExtensions
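For context, a minimal sketch of hashing a local file's contents into a quoted md5 string, in the same quoted form as the remote etag elsewhere in this PR. The function name and the chunked read are assumptions made for illustration; the removed line read the whole file at once with `storage.open(path).read()`.

```python
import hashlib
import io


def file_md5_etag(fileobj, chunk_size=64 * 1024):
    """Return '"<md5 hex digest>"' for a binary file-like object.

    Reads in chunks so large files are never held in memory all at once.
    """
    digest = hashlib.md5()
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        digest.update(chunk)
    return '"%s"' % digest.hexdigest()


# Usage: any binary file-like object works, e.g. an in-memory buffer.
example_etag = file_md5_etag(io.BytesIO(b"body { color: red; }"))
```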

jpnauta · 2019-07-15T00:48:59Z

collectfast/storage_extensions/__init__.py

+
+STORAGE_EXTENSIONS_MAP = {
+    # Maps the relevant `Storage` class to its corresponding `StorageExtensions` class
+    'django.core.files.storage.FileSystemStorage': 'collectfast.storage_extensions.file_system.FileSystemStorageExtensions',


To implement #139, you would simply need to override this map value and the corresponding class with the desired get_etag() functionality
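A small sketch of how a dotted-path map like this can be resolved and overridden. This is an assumption about the mechanism, not the PR's actual lookup code. To stay runnable without Django, two standard-library classes stand in for a storage class and its extensions class, and every function name here is hypothetical.

```python
import importlib

# Stand-in map: stdlib classes play the roles of a `Storage` class (key)
# and its `StorageExtensions` class (value).
EXAMPLE_EXTENSIONS_MAP = {
    "collections.OrderedDict": "collections.Counter",
}


def dotted_path(cls):
    """Return 'package.module.ClassName' for a class."""
    return "%s.%s" % (cls.__module__, cls.__qualname__)


def import_dotted(path):
    """Import and return the object named by a dotted path."""
    module_name, _, attr = path.rpartition(".")
    return getattr(importlib.import_module(module_name), attr)


def extensions_class_for(storage, extensions_map):
    """Look up the extensions class registered for this storage instance."""
    return import_dotted(extensions_map[dotted_path(type(storage))])
```

Overriding one entry is then a matter of building a new map, for example `dict(EXAMPLE_EXTENSIONS_MAP, **{"collections.OrderedDict": "collections.ChainMap"})`, and passing it to the lookup.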

jpnauta · 2019-07-15T00:51:45Z

Alright, sorry for the many notifications, I feel like I'm happy with this P/R 😅 @antonagestam can you look at the Travis CI failure when you get the chance? I think it's a config issue.

antonagestam · 2019-09-28T12:26:25Z

I'm sorry I haven't found the time to review and work with you further here.

I had a few more things that I felt needed to be in place to find the right abstraction and so I started working on an alternative implementation, see #149. Sorry for not working with you all the way here, but I'm having a hard time putting aside the time for this project.

Anyhow, my implementation does not yet have a Google Cloud strategy. If you're still interested, it would be great if you'd be willing to take that from your PR and make it work on top of the Strategy abstraction once I finish updating the test suite and docs and merge with master.

antonagestam · 2019-09-28T15:42:03Z

collectfast/storage_extensions/gcloud.py

+    def get_etag(self, path):
+        normalized_path = path.replace('\\', '/')
+        try:
+            md5_base64 = self.storage.bucket.get_blob(normalized_path)._properties['md5Hash']


Did you verify that the get_blob() call only fetches metadata and not the actual blob here?

antonagestam · 2019-09-28T15:43:59Z

collectfast/storage_extensions/gcloud.py

+        normalized_path = path.replace('\\', '/')
+        try:
+            md5_base64 = self.storage.bucket.get_blob(normalized_path)._properties['md5Hash']
+            return '"' + binascii.hexlify(base64.urlsafe_b64decode(md5_base64)).decode("utf-8") + '"'


Is there a specific reason you are using binascii.hexlify() and base64.urlsafe_b64decode() as opposed to just a call to e.g. base64.b64decode(md5_base64).decode("utf-8")? I think it would be worth it to explain this line with a comment :)
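A short sketch of why the hex step is needed, assuming (as the diff above suggests) that GCS reports md5Hash as the base64 encoding of the raw 16-byte MD5 digest. Decoding the base64 yields raw bytes, which are generally not valid UTF-8 text, so they have to be hex-encoded to get the familiar 32-character digest. The function name is hypothetical, and this sketch uses the standard `base64.b64decode` rather than the URL-safe variant from the diff, on the assumption that the value uses the standard alphabet.

```python
import base64
import binascii
import hashlib


def md5_base64_to_etag(md5_base64):
    """Convert a base64-encoded MD5 digest to a quoted hex etag string."""
    raw_digest = base64.b64decode(md5_base64)  # 16 raw bytes, not text
    hex_digest = binascii.hexlify(raw_digest).decode("ascii")
    return '"%s"' % hex_digest


# Usage: build the base64 form locally and convert it back to hex.
content = b"hello world"
encoded = base64.b64encode(hashlib.md5(content).digest()).decode("ascii")
etag = md5_base64_to_etag(encoded)
```

The result has the same quoted-hex shape as an md5 computed locally, so the two values can be compared directly.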

antonagestam · 2019-10-03T19:07:06Z

Update: I added you as a co-author here, please let me know if that's alright with you :)

jpnauta · 2019-11-01T01:50:27Z

Sorry @antonagestam I didn't get a notification for this! Everything looks great here, thanks for making this happen 😄

antonagestam · 2019-11-01T08:35:31Z

@jpnauta No worries! This wouldn't have happened without your help. The Travis jobs are now set up with a Google Cloud Storage account and test an actual integration :)

Thanks again for your contributions! 🍰

Jeremy Nauta and others added 6 commits July 6, 2019 22:04

Get etag method for gcloud

5111231

Use bucket_name to identify storage for gcloud storage

2ce8775

Decode md5 etag from GCS' response

7555c87

Update etag.py

55d6e23

Ignore if location field does not exist

10f39f3

Don't attempt to delete file if it does not exist

f2f0318

jpnauta mentioned this pull request Jul 7, 2019

Need feasibility with google cloud platform #122

Closed

antonagestam reviewed Jul 10, 2019

antonagestam mentioned this pull request Jul 10, 2019

Improve FileSystemStorage handling #139

Closed

Cleaner way to do storage-specific logic

28cf448

jpnauta commented Jul 14, 2019

jpnauta added 3 commits July 14, 2019 15:06

Add installation for google-cloud-storages on Appveyor

0b7d95e

Add installation for google-cloud-storage on Appveyor

1029b3e

Don't use lazy loading to get storage extensions

863e62a

jpnauta commented Jul 14, 2019

Create storage extension for local file systems using existing code; …

02ed5de

…various small improvements

jpnauta commented Jul 15, 2019

jpnauta added 3 commits July 23, 2019 17:33

Lazy load storage_extensions object

842974b

Catch any error while importing storage class or storage extension class

eebced5

Catch exception if file to delete is no longer on google cloud storage

8877914

antonagestam reviewed Sep 28, 2019

antonagestam closed this Oct 3, 2019
