Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Variants are not re-generated on overwrite image #206

Closed
berthj193 opened this issue Apr 22, 2019 · 16 comments
Closed

Variants are not re-generated on overwrite image #206

berthj193 opened this issue Apr 22, 2019 · 16 comments
Assignees
Labels

Comments

@berthj193
Copy link

Thanks for putting efforts into this great package! I found this one most relevant to my requirement, among several django imaging libraries.

The only issue I'm having is variants don't get regenerated when you replace an existing image.

It works fine locally (django.core.files.storage.FileSystemStorage), but not on S3 (storages.backends.s3boto3.S3Boto3Storage). Only main image is replaced, and thumbnail variants remain untouched.

  • main_image.png (replaced with new one)
  • main_image.md.png (untouched, old)
  • main_image.sm.png(untouched, old)

My workaround was to add random hash to file name so that set of 3 new images are created on S3 bucket, each time you replace. It's not good enough since old images hold the space.

It would be great if someone can take a quick double check, and fix if easy.

@codingjoe
Copy link
Owner

Hi @MingTheGit thanks for your kind words and thanks for reaching out. I let me try to rephrase your situation, to make sure I am getting it right.

You are uploading a new file, with the same name as a previously uploaded file, right? You are not using the rendervariations --replace management command, are you?
Does that mean your AWS_S3_FILE_OVERWRITE setting is set to True? Otherwise the storage backend should reject the file name and create a new one with a random bit at the end. I would recommend not to allow overriding file names (change the setting to False) to avoid collisions.

That being said, if the setting is set to False this will break the package. You see, if you call storage.save(name, content), it will actually modify the file name if a file with the same name already exists. That behavior is not supported by this package, since we don't persist the variation names.
Therefore it is important that the original image names always remain unique or the old images (including variations) are deleted in advance using obj.image_field.delete().

I hope that helps you a bit, let me know if you have any further questions.

Best
-joe

@codingjoe codingjoe self-assigned this Apr 23, 2019
@berthj193
Copy link
Author

@codingjoe Thanks for your quick answer!

That being said, if the setting is set to False this will break the package.

Did you mean "That being said, if the setting is set to True this will break the package."?

I'm not using management command, posting snippet of my code.

profile_image = StdImageField(
    blank=True,
    default="",
    upload_to=profile_image_media_path,
    variations={"regular": (180, 180, True), "thumbnail": (76, 76, True)},
)

def profile_image_media_path(project, filename):
    ...
    _, extension = os.path.splitext(filename)
    return f"models/{model_id}/profile_image{extension}"

@codingjoe
Copy link
Owner

@MingTheGit check these lines here:

if storage.exists(variation_name):
if replace:
storage.delete(variation_name)
logger.info('File "%s" already exists and has been replaced.',
variation_name)
else:
logger.info('File "%s" already exists.', variation_name)
return variation_name

You see, we check if the file already exists. The default behavior is not to override existing files. In any case, your filename pattern would require you to actually delete the picture before overriding it. That being said, i don't know if this is a good filename pattern. It depends on your bucket policy is set to private, otherwise guessing filenames is pretty easy. Check out another creation of mine, to create more advanced filename patterns: https://github.com/codingjoe/django-dynamic-filenames

@berthj193
Copy link
Author

berthj193 commented Apr 30, 2019

@codingjoe

Thanks for your checking, I appreciate it. I have tried some debugging and it seems like this is going to be a new feature request.

So, replace flag is always False in my case, because I'm not using management command.
L58 accepts replace from either L55 or management command. As far as I know replace in L52 is always False.

def render_variations(self, replace=False):
"""Render all image variations and saves them to the storage."""
for _, variation in self.field.variations.items():
self.render_variation(self.name, variation, replace, self.storage)
@classmethod
def render_variation(cls, file_name, variation, replace=False,

Can you please consider adding replace | overwrite flag to StdImageField in the next roadmap?
Thank you in advance.

@berthj193
Copy link
Author

How it works in the following scenarios

  • django.core.files.storage.FileSystemStorage
  • storages.backends.s3boto3.S3Boto3Storage with AWS_S3_FILE_OVERWRITE = False

Django ImageField generates hashed file name in these cases.
(self.name in L55, file_name in L58)

replace flag is not checked at all, because storage.exists(variation_name) is always False in L62.

@codingjoe
Copy link
Owner

@MingTheGit I didn't plan on implementing that feature, but feel free to make a suggestion.
Best Joe

@caspervk
Copy link

Why is this issue closed? It can never be correct behavior to replace the original while preserving the old variations, resulting in completely different images depending on the variation. Either all variations (including the original) should be replaced or none of them should.
In this regard I agree with @MingTheGit that it should be possible to replace images. In fact, I'd argue that this should be the default for variations, like it currently is for the original.

@codingjoe codingjoe added bug and removed question labels Jul 16, 2019
@codingjoe codingjoe assigned caspervk and unassigned codingjoe Jul 16, 2019
@codingjoe
Copy link
Owner

@caspervk: If you believe this to be a bug, I can't wait for your PR to resolve it :)

@codingjoe codingjoe reopened this Jul 16, 2019
@caspervk
Copy link

caspervk commented Jul 16, 2019

Ideally, I would remove the following check from the render_variation() class method:

Index: stdimage/models.py
===================================================================
--- stdimage/models.py	(revision 29f5e6a00cbb5353917f2adbd8f93bcb6e9461d9)
+++ stdimage/models.py	(date 1563267502420)
@@ -58,15 +58,6 @@
                          storage=default_storage):
         """Render an image variation and saves it to the storage."""
         variation_name = cls.get_variation_name(file_name, variation['name'])
-        if storage.exists(variation_name):
-            if replace:
-                storage.delete(variation_name)
-                logger.info('File "%s" already exists and has been replaced.',
-                            variation_name)
-            else:
-                logger.info('File "%s" already exists.', variation_name)
-                return variation_name
-
         ImageFile.LOAD_TRUNCATED_IMAGES = True
         with storage.open(file_name) as f:
             with Image.open(f) as img:

As @MingTheGit said, it the job of S3Boto3Storage to replace the file if it is already present. This library should not concern itself with how the storage system (over)writes files, but simply assume that it will be done correctly under the chosen filename. Keep in mind that appending a hash to the filename instead of overwriting is not normal behavior for any storage system -- it is very much expected that files will be overwritten by default. Optionally -- since S3 Storage is so popular -- a warning could be emitted if AWS_S3_FILE_OVERWRITE = False since this setting, as you said, will break the library. Luckily, however, AWS_S3_FILE_OVERWRITE is True by default in django-storages, so it should have minimal impact on existing code.

caspervk pushed a commit to caspervk/django-stdimage that referenced this issue Jul 16, 2019
@codingjoe
Copy link
Owner

@MingTheGit as much as I agree with the problem, I would caution you a bit regarding the solution. Please keep in mind, that changing the behavior should not be done quietly. Currently, existing variations are not overridden. So if we are to change the behavior, we need to do it in a way, that informs users should the update.

@caspervk
Copy link

@codingjoe I absolutely agree that care must be taken not to introduce changes that cause data-loss for existing users. How about a new major release, as has been done in the past?

@codingjoe
Copy link
Owner

@caspervk a major release is not a problem, but preferable even then, the API changes if the behavior changes, so that users need to adapt their implementation. The problem is, that not everyone reads release notes :/
I see two possibilities. Either we add a setting/argument, that allows you to achieve the behavior you want. Or you we force users to make a deliberate choice on how it should behave.

@caspervk
Copy link

caspervk commented Aug 1, 2019

@codingjoe Given the fact that the original is overwritten while the variations are not, I just cannot see anyone using this bug as feature, as it would leave their system in an inconsistent state. In my opinion, an option is not the way to go, as it really should be the default to overwrite, like I've tried to argue for in this thread previously.

The point about API and behavioral changes is a good one, but I'm not sure how this bug can be fixed then. One solution could be to release a new minor version with a check that emits a loud warning if an image would be overwritten using the new logic, and then release a new major version that actually overwrites some time later.

@codingjoe
Copy link
Owner

You make a good point @caspervk. To be fair though, I am not aware of any storage backend – except S3 – that overrides files by default. This is strange behavior and I presume it only exists, because the none-overwrite implementation does need to make an extra call and increases IO.

I will propose a patch. Let's see if this would solve your problem.

codingjoe added a commit that referenced this issue Aug 6, 2019
We now overwrite files by default when a new files gets uploaded.
This fixes an issue, where someone might upload a file with the
same name twice the first initial variations get not replaced
with a version of the newer file.
It happens in the least IO consuming way. The rendervariations
management command still behaves as before where variations are
not replaced by default.
codingjoe added a commit that referenced this issue Aug 6, 2019
We now overwrite files by default when a new files gets uploaded.
This fixes an issue, where someone might upload a file with the
same name twice the first initial variations get not replaced
with a version of the newer file.
It happens in the least IO consuming way. The rendervariations
management command still behaves as before where variations are
not replaced by default.
@caspervk
Copy link

To be fair though, I am not aware of any storage backend – except S3 – that overrides files by default.

Appending a hash to the filename instead of overwriting is not normal behavior for any filesystem -- it is very much expected that files will be overwritten by default. At least in regular Python, doing with open("foo", "w") as f; f.write(..) will save/overwrite to foo regardless of whether the file already exists or not.

I am, however, not sure how storage backends work in Django - maybe it is normal behavior not to overwrite? Regardless, I think you should fix this issue in the best way you see fit - you are the expert on the inner workings of this library after all 😄

@codingjoe
Copy link
Owner

@caspervk I know it's default behavior in Python, but not in Django storage backends. I don't know why they chose this path. But I guess it's to prevent users from errors.
I merged it, it's released in 5.0.1

Thank you so much for your contribution. It is very much appreciated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants