Implemented NoDuplicatesStorageProxy, for use in "compress" command #1105

Closed
wants to merge 1 commit into from

Conversation

@cuu508 (Contributor) commented Mar 5, 2022

This is my second attempt to fix #1099.

I added a NoDuplicatesStorageProxy, which delegates all method calls to DefaultStorage, except it filters out duplicate save() calls.

I updated the compress management command to use NoDuplicatesStorageProxy by monkey-patching compressor.storage.default_storage.

The PR includes a test case which exercises the changes.
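
To make the approach concrete, here is a minimal sketch of the proxy idea as described above (the real code lives in the PR diff; the names and details below are illustrative):

```python
# Illustrative sketch, not the PR's exact code: delegate everything to the
# wrapped storage, but skip save() calls for names already saved in this run.
class NoDuplicatesStorageProxy:
    def __init__(self, storage):
        self._storage = storage
        self._seen_names = set()

    def save(self, name, content, *args, **kwargs):
        if name in self._seen_names:
            return name  # duplicate save() call: skip the write
        self._seen_names.add(name)
        return self._storage.save(name, content, *args, **kwargs)

    def __getattr__(self, attr):
        # Delegate every other attribute/method to the real storage.
        return getattr(self._storage, attr)
```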

@diox (Member) commented Apr 2, 2022

Apologies for not getting back to you sooner on this; I've had a shitty month and not much motivation to look at code outside of work.

This PR works for offline compression, and we could go that route, but maybe we should try to address the broader issue first. I've commented on it in #1103 (comment): the heart of the problem is that compressor really cares about files being named in a predictable way, and to achieve that it circumvents Django's storage implementation, which doesn't normally allow predictable names, by deleting the file first if it exists.

There are multiple problematic real-world scenarios with this:

  • The one we're having in offline compression, where multiple threads are trying to write to the same file at the same time
  • "Online" compression where 2 compressor threads try to write to the same file at the same time
  • Tests running with multiple processes where 2 processes try to write to the same file at the same time
  • "Online" compression where compressor tries to write to an existing file - therefore temporarily deleting it - while the file is being served to the browser by a completely different process (this one is almost impossible to solve for as long as we're deleting the file)

Maybe we could expand the lock idea to compressor core (adding it to the base storage class) to circumvent some of these scenarios? It's a little scary to add locks to online compression, but maybe that would be OK with a short timeout, just in case?

@cuu508 (Contributor, Author) commented Apr 2, 2022

> Maybe we could expand the lock idea to compressor core (adding it to the base storage class) to circumvent some of these scenarios?

I see django.core.files.storage uses helper functions from django.core.files.locks. These functions lock file access at the OS level, so they would probably work even with multiple processes. That's perhaps one piece of the puzzle.
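
For illustration, a hedged sketch of how those helpers could guard a write (the lock-file path and surrounding code are made up, not compressor's):

```python
# Sketch: serialize writers across threads/processes with an OS-level
# exclusive file lock via django.core.files.locks.
from django.core.files import locks

with open("/tmp/compressor-output.lock", "wb") as lock_file:
    locks.lock(lock_file, locks.LOCK_EX)  # blocks until the lock is acquired
    try:
        pass  # write the compressed output file here
    finally:
        locks.unlock(lock_file)
```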

Another idea:

  • drop the deletion logic in CompressorFileStorage.get_available_name()
  • this means get_available_name() will sometimes return a filename with an underscore and a random string appended at the end to ensure uniqueness
  • override the save() method to add one additional last step: if the file ended up saved under such an alternative name, move it to the intended location

So, instead of deleting a file and then writing new data under the same filename, compressor would write the data to a temporary file and then move it into the correct place. The Python docs say os.replace() is an atomic operation, so this would avoid the brief window between the delete and the write where the file does not exist.
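
For concreteness, a rough sketch of that flow (class and method names follow the discussion above; the details are illustrative, not a final implementation):

```python
# Hypothetical sketch of "write under a unique name, then atomically move
# into place", assuming CompressorFileStorage extends FileSystemStorage.
import os

from django.core.files.storage import FileSystemStorage


class CompressorFileStorage(FileSystemStorage):
    # No deletion in get_available_name(): if the target exists, Django
    # picks an alternative name like "output_a1b2c3d.css" on its own.

    def save(self, name, content, max_length=None):
        saved_name = super().save(name, content, max_length=max_length)
        if saved_name != name:
            # os.replace() atomically renames over the existing file, so
            # readers never observe a moment where the file is missing.
            os.replace(self.path(saved_name), self.path(name))
        return name
```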

@diox (Member) commented Apr 2, 2022

I like that idea of using os.replace. It seems a lot better than deleting the file if it exists! Compared to your original approaches it might be a bit wasteful, as we're writing the same file multiple times, but that seems fine. And since the temporary and final files live under the same storage root, we don't have to worry about os.replace failing across different filesystems, so it should never fail. In our situation it carries a lot less risk than locks, IMHO.

@diox (Member) commented Apr 3, 2022

#1107 addressed this differently.

@diox closed this Apr 3, 2022
Linked issue: Intermittent failure in offline compression (#1099)