Conversation
```diff
@@ -119,7 +119,8 @@ def delete_file(self, path, prefixed_path, source_storage):
             path, prefixed_path, source_storage)
         if not self.dry_run:
             self.log("Deleting '%s'" % path)
-            self.storage.delete(prefixed_path)
+            if self.storage.exists(prefixed_path):
+                self.storage.delete(prefixed_path)
```
I believe this introduces an extra network call and should not be introduced.
Without this, if the file has already been deleted on GCS, an exception will be thrown. It's not a requirement though. Perhaps I can catch the exception instead; otherwise I'll remove this code entirely.
Definitely prefer catching the exception :)
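For illustration, a minimal sketch of the exception-catching approach (assuming the GCS backend surfaces `google.cloud.exceptions.NotFound` for missing blobs; names here are illustrative, not the final PR code):

```python
from google.cloud.exceptions import NotFound  # assumed exception for missing blobs

def delete_file(storage, prefixed_path, log):
    """Delete prefixed_path, tolerating blobs that are already gone on GCS."""
    log("Deleting '%s'" % prefixed_path)
    try:
        storage.delete(prefixed_path)
    except NotFound:
        # The blob was already deleted remotely; treat the delete as a no-op
        # instead of paying for an extra exists() round trip beforehand.
        pass
```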
I removed this logic for now; I'll do a separate PR if necessary later.
Thanks for your contribution. Happy to see someone working on Google Cloud integration. I think it might be beneficial to introduce a concept of a `StorageExtensions` class that encapsulates the backend-specific logic.
Agreed! I can implement that sometime this week.
I made some changes; let me know what you think. I'll look into fixing the CI failures.
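Roughly, the `StorageExtensions` concept discussed above could look like this (a hedged sketch: only the class names appear in the thread, the method shapes are assumptions):

```python
class StorageExtensions:
    """Backend-specific helpers wrapped around a Django Storage instance."""

    def __init__(self, storage):
        self.storage = storage

    def get_etag(self, path):
        # Each backend defines how to obtain a cheap content hash for `path`
        raise NotImplementedError

    def reset_connection(self):
        # Backends with non-thread-safe connections override this; default no-op
        pass
```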
```python
    isinstance(storage, S3Boto3Storage))


def reset_connection(storage):
```
Since this logic is only specific to Boto3, I moved it to `S3Boto3StorageExtensions` so this file could be removed.
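As a hedged sketch of where that Boto3-specific logic could land (the `_connection` attribute is an assumption about django-storages internals at the time, not verified against this PR):

```python
class S3Boto3StorageExtensions:
    def __init__(self, storage):
        self.storage = storage

    def reset_connection(self):
        # Assumption: dropping the cached boto3 connection forces each worker
        # thread to lazily create its own, since boto3 sessions should not be
        # shared across threads.
        self.storage._connection = None
```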
```diff
@@ -32,14 +32,7 @@ def __init__(self, *args, **kwargs):
         self.tasks = []
         self.etags = {}
         self.collectfast_enabled = settings.enabled
```
This code will trigger the storage class's `__init__()` function, which in turn triggers `check_preload_metadata()`, which contains this same code without the `is_boto(...)` statement.
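For context, a minimal sketch of what such a preload check might look like (function and attribute names are assumptions based on the comment above, not the project's exact code):

```python
import warnings

def check_preload_metadata(storage):
    # Collectfast compares etags against preloaded metadata, so warn and
    # enable the flag if the storage was configured without it.
    if not getattr(storage, 'preload_metadata', False):
        storage.preload_metadata = True
        warnings.warn("Collectfast requires preload_metadata = True; enabling it.")
```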
…various small improvements
""" | ||
Create md5 hash from file contents. | ||
""" | ||
contents = storage.open(path).read() |
This code has been moved to `FileSystemStorageExtensions`.
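A hedged sketch of how the moved helper might look on the new class (assuming an S3-style quoted hex md5 etag; only the docstring and the `storage.open(path).read()` line come from the diff above):

```python
import hashlib

class FileSystemStorageExtensions:
    def __init__(self, storage):
        self.storage = storage

    def get_etag(self, path):
        """
        Create md5 hash from file contents.
        """
        contents = self.storage.open(path).read()
        # Quote the hex digest to match the format of S3 etags
        return '"%s"' % hashlib.md5(contents).hexdigest()
```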
```python
STORAGE_EXTENSIONS_MAP = {
    # Maps each relevant `Storage` class to its corresponding `StorageExtensions` class
    'django.core.files.storage.FileSystemStorage': 'collectfast.storage_extensions.file_system.FileSystemStorageExtensions',
```
To implement #139, you would simply need to override this map value and the corresponding class with the desired `get_etag()` functionality.
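For example, a project wanting the #139 behavior could point the map at its own subclass (hypothetical module, class, and file names; the exact override hook depends on the final API):

```python
import json

from collectfast.storage_extensions.file_system import FileSystemStorageExtensions

class ManifestEtagExtensions(FileSystemStorageExtensions):
    def get_etag(self, path):
        # Hypothetical: look up a precomputed etag instead of hashing contents
        with open('etag-manifest.json') as manifest:
            return json.load(manifest)[path]
```

The `FileSystemStorage` entry in `STORAGE_EXTENSIONS_MAP` would then be replaced with the dotted path to `ManifestEtagExtensions`.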
Alright, sorry for the many notifications; I feel like I'm happy with this PR 😅 @antonagestam, can you look at the Travis CI failure when you get the chance? I think it's a config issue.
I'm sorry I haven't found the time to review and work with you further here. I had a few more things that I felt needed to be in place to find the right abstraction, so I started working on an alternative implementation; see #149. Sorry for not working with you all the way here, but I'm having a hard time putting aside the time for this project. Anyhow, my implementation does not yet have a Google Cloud strategy; if you're still interested, it would be great if you'd be willing to take that from your PR and make it work on top of the new implementation.
```python
def get_etag(self, path):
    normalized_path = path.replace('\\', '/')
    try:
        md5_base64 = self.storage.bucket.get_blob(normalized_path)._properties['md5Hash']
```
Did you verify that the `get_blob()` call only fetches metadata and not the actual blob here?
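For what it's worth (my note, not from the thread, and worth verifying against the installed client version): in the google-cloud-storage library, `Bucket.get_blob()` performs a metadata-only GET; the payload is only transferred by an explicit download call:

```python
from google.cloud import storage

client = storage.Client()
bucket = client.bucket('my-bucket')        # hypothetical bucket name
blob = bucket.get_blob('static/app.css')   # one request: object metadata only
if blob is not None:
    print(blob.md5_hash)                   # available without downloading contents
    data = blob.download_as_bytes()        # a separate request fetches the payload
                                           # (download_as_string() in older clients)
```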
```python
    normalized_path = path.replace('\\', '/')
    try:
        md5_base64 = self.storage.bucket.get_blob(normalized_path)._properties['md5Hash']
        return '"' + binascii.hexlify(base64.urlsafe_b64decode(md5_base64)).decode("utf-8") + '"'
```
Is there a specific reason you are using `binascii.hexlify()` and `base64.urlsafe_b64decode()` as opposed to just a call to e.g. `base64.b64decode(md5_base64).decode("utf-8")`? I think it would be worth it to explain this line with a comment :)
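Background that may answer the question (my explanation, not from the thread): GCS's `md5Hash` property is the base64 encoding of the raw 16-byte MD5 digest, while S3-style etags are the quoted hex digest, so a plain `base64.b64decode(md5_base64).decode("utf-8")` would usually raise `UnicodeDecodeError` on raw digest bytes. A small self-contained illustration:

```python
import base64
import binascii
import hashlib

digest = hashlib.md5(b'hello world').digest()     # raw 16-byte MD5 digest
md5_base64 = base64.b64encode(digest).decode()    # what GCS reports as md5Hash

# base64 -> raw bytes -> hex string, matching S3's quoted-hex etag format:
etag = '"' + binascii.hexlify(base64.b64decode(md5_base64)).decode('utf-8') + '"'
assert etag == '"%s"' % hashlib.md5(b'hello world').hexdigest()
```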
Update: I added you as a co-author here, please let me know if that's alright with you :)
Sorry @antonagestam I didn't get a notification for this! Everything looks great here, thanks for making this happen 😄
@jpnauta No worries! This wouldn't have happened without your help. The Travis jobs are now set up with a Google Cloud Storage account and test an actual integration :) Thanks again for your contributions! 🍰
Currently this project does not work with Google Cloud Storage. This is because GoogleCloudStorage:

- computes etags differently than S3 (see the `get_etag()` discussion above)
- handles the `location` property differently

These changes resolve these two issues.