Provide a unique value (version identifier) when replacing existing files/images #22606
Comments
Potential Solution:
This has the benefit of providing a version number whenever the image is replaced, and would let users build their own logic for handling caching. It would be as simple as checking the field and, if it is not null, appending a query param to bypass the previously cached copy. I am happy to take on solving this, with whatever solution the team thinks would be ideal. |
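A minimal sketch of the consumer-side logic described above, under stated assumptions: `replacedVersion` is a hypothetical field name standing in for whatever value ends up marking a replacement, `assetUrl` is an illustrative helper, and the `/assets/<id>` path and the `v` query param are assumed here for illustration only.

```typescript
// Hypothetical record shape: `replacedVersion` is a placeholder for the
// proposed "file was replaced" indicator, not an existing field.
interface FileRecord {
  id: string;
  replacedVersion: string | number | null;
}

// Build an asset URL and append a cache-busting query param only when the
// file has been replaced at least once (indicator is not null).
function assetUrl(baseUrl: string, file: FileRecord): string {
  const url = new URL(`/assets/${file.id}`, baseUrl);
  if (file.replacedVersion !== null) {
    url.searchParams.set('v', String(file.replacedVersion));
  }
  return url.toString();
}
```

Because the param only changes when the indicator changes, a CDN keyed on the full URL keeps serving the cached copy until the file is actually replaced.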
I'd like to suggest another potential solution: When replacing a file, we could update the If we take this path, we might want to use an arbitrary UUID for |
@paescuj I didn't even think about the file extension, that is a great point. I think we would also want to update the Would the primary key stay the same, just the I did see that this change did get added, #22848 which allows a custom |
@paescuj another option would be to check the data passed in from the new file, and replace the |
@paescuj I have a change within
|
Thanks for your prompt feedback! ❤️
Oh good point! Yeah, I think consequently we should also update that - as well as the
Right, I think the primary key always stays the same, as after a replacement, it's still the same file "resource/container/..." that is now just pointing to a different "asset". With those changes, do you think there's still a need for the If it turns out that such a field is necessary:
|
Ideally I wanted to have a I do like that the primary key is the same as the asset My goal with the additional field was to try and keep it simple without making a potential breaking change on how users expect file assets to work. (only because I am not as familiar with the depth of Directus and how a change like that can affect different areas). Having just a |
Yes, this field should only change when an existing file is being replaced. You should not be able to change it via API/APP.
Your comment here about the date field solved the issue I was having when using a timestamp field: the INSERT/UPDATE issue no longer happens when any field is changed. So I created a new field |
I do not have a Cloudinary account ATM, but I will create one and work on seeing if I can test this change against that issue. |
I'm probably missing something, but wouldn't it make sense to simply update |
@nickrum the issue right now, is that |
@that1matt Got it, thanks! This sounds like something that is usually solved by storing a checksum/hash of the file (I thought we already had something like this), which has the added benefit that the cache is only invalidated if the file is actually different. |
@nickrum I wasn't able to find anything that informed me when a file was replaced. Nothing that returned through the API. |
@that1matt Thanks again! Some final remarks:
Awesome input ❤️ I think I'd prefer this to a |
Yes, along with keeping the file
That makes sense to me as well. The goal of this change is to provide a definitive way of identifying files that have been replaced, so that cache can be invalidated without every modification triggering an invalidation. (This PR could fix areas where you are calling
I agree, having a hash/checksum would solve the problem, as long as this field is returned through the API. I am happy to work on changing the implementation. I also like this better than the |
@paescuj I believe we will still need a new field to be returned through the API when replacing a file, as there is currently not a way to determine/notify when a file has been replaced through the API? Just to make sure I am understanding what to implement, we would have a new |
Correct, while already creating a |
I am having some trouble getting a hash value, hopefully I am just missing something that someone can point out. Here is the snippet of code
While this does give me a hash, I am now getting an error.
Any insights would be much appreciated. It seems calling the stream before line 123 will throw the error, calling it after will not. |
I'm out one day for a wedding and we've changed the plot! 😄 Good shoutout to use a checksum instead @nickrum. Feels more appropriate than a version number indeed.
const string = await readableStreamToString(stream);
const checksum = generateChecksum(string);
@that1matt This feels dangerous to me. That readable stream could be gigabytes worth of file, which is then all read into a single string. This is an easy way to crash the server with an out of memory issue. I believe you can pipe a readable stream to the hasher, so hopefully we can change that to something like (pseudo code):
import { pipeline } from 'node:stream/promises';
import { createHash } from 'node:crypto';
const hash = await pipeline(stream, createHash('md5')); |
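For anyone following along, here is a runnable sketch of the streaming idea from the pseudo code above. As far as I know, the promise returned by `pipeline` from `node:stream/promises` resolves without a value, so the digest has to be read from the hasher itself. This version swaps `pipeline` for a plain async-iteration loop with `hash.update()`, which still processes one chunk at a time. `hashStream` is a hypothetical helper name and not part of Directus.

```typescript
import { createHash } from 'node:crypto';
import { Readable } from 'node:stream';

// Hypothetical helper: hash a readable stream chunk by chunk so the whole
// file is never held in memory as a single string or buffer.
async function hashStream(stream: Readable, algorithm = 'md5'): Promise<string> {
  const hasher = createHash(algorithm);
  for await (const chunk of stream) {
    // Each chunk is fed to the hasher and can then be garbage collected.
    hasher.update(chunk);
  }
  return hasher.digest('hex');
}
```

Note that this consumes the stream, so the same stream instance cannot afterwards be handed to a storage driver; that may be related to the error mentioned earlier, but that is only a guess.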
@rijkvanzanten thanks for the feedback. I appreciate all the insights into Directus that I miss.
I haven't played with streams much, so hopefully this is on the right track. |
Just a consideration here. Since we do support uploading multi-gig files (in chunks) in the next version, doing a hash does seem like a performance pitfall if the file has to be downloaded from the provider just to do local hashing in the Directus instance. |
@hanneskuettner Provided the chunked upload still goes through the Directus instance, couldn't we just calculate the hash on the fly and update it as new chunks become available? |
Problem is that we cannot continue hashing once the upload has paused, as we cannot access and store the hash state... See e.g. nodejs/node#25411 (comment) |
While we can't manually save and restore the hasher instance, couldn't we just keep it around in memory? I haven't checked it, but I would assume that the hasher instance isn't much bigger than the resulting hash. |
Not possible, since the upload can happen across multiple horizontally scaled instances and does not have to be pinned to one instance only |
It's never easy when multiple horizontally scaled instances come into play... |
Thanks @that1matt for your active participation in this matter! We've now decided within the team to continue with the date approach. Hashing currently raises too many questions, especially when it comes to the new "resumable uploads" feature. The reason for the separate exploratory pull request from my side (#22900) was that if we were to add hashing, it would most likely have to happen directly during the upload (as opposed to loading the file again from storage after upload). Nevertheless, I'd like to keep your pull request (#22751) open for a bit, as it contains other useful work that we can look at later. We'll open an additional pull request shortly, which will exclusively cover the date field change. |
The hashing approach felt like the theoretical "correct" way to do it, but while testing it out in those PRs it does raise too many performance concerns and stability questions. The main goal of this implementation is to have the flag to indicate whether the file was changed, to solve that we don't need full checksumming. Juice ain't worth the squeeze 🙂 |
@paescuj let me know if you want me to adjust the PR I have open, I can change it back to use a |
I have updated the linked PR to use a |
Describe the Improvement
Currently, when using the Replace File option in Directus, there is no indication that the file was replaced. This causes issues with external CDN providers, where images are not updated to reflect the new changes. Nothing signals the replacement, as the ID stays the same and the new file overwrites the original.
Changing the name of the file, or any field related to directus_files, does update the modified_on value.
Ideally, there would be an indicator that returns a version number, or another specific value stored on directus_files. This would indicate that the file has been replaced using Replace File, and could be read through the API endpoint to signal that the cache for that specific file needs to be invalidated.
Using the modified_on value doesn't limit the scope to replaced files, since it also changes when other fields on the file are modified.
Using a lower cache TTL defeats the purpose of caching the image.