fix: Consistent image ID hashes across machines #711
🦋 Changeset detected. Latest commit: e43d13f. The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package.
Hmm, that's really weird. I'm surprised the mtime would be different across machines. Are they different OSes? Or is the mtime set based on when you check the repo out from git? I'd love to better understand the issue before we switch away from it.
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## main #711 +/- ##
==========================================
- Coverage 95.49% 95.48% -0.01%
==========================================
Files 33 33
Lines 1288 1286 -2
Branches 226 226
==========================================
- Hits 1230 1228 -2
Misses 58 58
Flags with carried forward coverage won't be shown.
It's definitely strange 😄. The OSes are set to be the same as they're running from the same workflow, so I don't think it's that. Perhaps it's more related to timezones? 🤔
I'd be happy to dig into this question when I get a chance!
It seems like this assumption was correct, @benmccann! After checking out the repo on 2 separate instances (on the same machine), I'm ending up with two different mtime values.
File stats 1:
{
stats: Stats {
dev: 2080,
mode: 33188,
nlink: 1,
uid: 1000,
gid: 1000,
rdev: 0,
blksize: 4096,
ino: 768440,
size: 159643,
blocks: 312,
atimeMs: 1712710759640.3115,
mtimeMs: 1712710733650.3157,
ctimeMs: 1712710733650.3157,
birthtimeMs: 1712710733650.3157,
atime: 2024-04-10T00:59:19.640Z,
mtime: 2024-04-10T00:58:53.650Z,
ctime: 2024-04-10T00:58:53.650Z,
birthtime: 2024-04-10T00:58:53.650Z
}
}
File stats 2:
{
stats: Stats {
dev: 2080,
mode: 33188,
nlink: 1,
uid: 1000,
gid: 1000,
rdev: 0,
blksize: 4096,
ino: 1598355,
size: 159643,
blocks: 312,
atimeMs: 1712711152420.249,
mtimeMs: 1712711150750.2493,
ctimeMs: 1712711150750.2493,
birthtimeMs: 1712711150750.2493,
atime: 2024-04-10T01:05:52.420Z,
mtime: 2024-04-10T01:05:50.750Z,
ctime: 2024-04-10T01:05:50.750Z,
birthtime: 2024-04-10T01:05:50.750Z
}
}
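To see why the cache missed, here is a minimal TypeScript sketch (hypothetical, not the imagetools implementation) that hashes a URL, a config object and one stat field. Only the mtimeMs and size values come from the two dumps above; the `idFrom` helper, the SHA-1 choice, the URL and the config are assumptions for illustration.

```typescript
import { createHash } from "node:crypto";

// Minimal subset of fs.Stats; values copied from the two dumps above.
interface StatsLike {
  mtimeMs: number;
  size: number;
}

const checkout1: StatsLike = { mtimeMs: 1712710733650.3157, size: 159643 };
const checkout2: StatsLike = { mtimeMs: 1712711150750.2493, size: 159643 };

// Hypothetical ID helper: SHA-1 over the file URL, the serialized config and one stat field.
function idFrom(url: string, config: Record<string, unknown>, stat: number): string {
  return createHash("sha1")
    .update(url)
    .update(JSON.stringify(config))
    .update(String(stat))
    .digest("hex");
}

const url = "file:///repo/src/lib/img/example.png"; // hypothetical path
const config = { w: "400", format: "webp" }; // hypothetical image config

const mtimeIds = [checkout1, checkout2].map((s) => idFrom(url, config, s.mtimeMs));
const sizeIds = [checkout1, checkout2].map((s) => idFrom(url, config, s.size));

console.log(mtimeIds[0] === mtimeIds[1]); // false: mtime changed with the second checkout
console.log(sizeIds[0] === sizeIds[1]); // true: size is identical in both dumps
```

The same bytes on disk produce two different IDs as soon as mtime is part of the key, which is exactly the cache MISS seen on the CI runners.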
Sorry for the long delay. I'm afraid I'd forgotten about this PR. I'm nervous about using the file size because you could hit false positives with it: an image edited in place that keeps the same byte size would get the same ID and reuse the stale cached output. I'm not sure how expensive it would be to compute a hash for every file, but it seems safer. I had an idea that we could use mtime first and then fall back to the hash if it's different, but that probably wouldn't work because it would require us to store two hashes.
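Because the URL and config are already part of the hash, a size-based ID can only collide when the same file changes content without changing its byte length. This sketch shows that case and how hashing the bytes themselves avoids it; the `imageId` helper, SHA-1 and the five-byte buffers standing in for images are all assumptions.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: SHA-1 over the URL, the serialized config and a "fingerprint"
// (either the size as a string, or the raw file bytes).
function imageId(url: string, config: object, fingerprint: string | Buffer): string {
  return createHash("sha1")
    .update(url)
    .update(JSON.stringify(config))
    .update(fingerprint)
    .digest("hex");
}

const url = "file:///repo/static/hero.png"; // hypothetical path
const config = { w: "800" };

// Same path, same length, different content: the file was edited in place.
const before = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x01]);
const after = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x02]);

const sizeBefore = imageId(url, config, String(before.length));
const sizeAfter = imageId(url, config, String(after.length));
const contentBefore = imageId(url, config, before);
const contentAfter = imageId(url, config, after);

console.log(sizeBefore === sizeAfter); // true: a false-positive cache hit
console.log(contentBefore === contentAfter); // false: the edit is detected
```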
Fair enough!
I agree. Testing locally, I modified it to use […]. I also did some microbenchmarking for it (just in case), testing different image sizes to see their impact:
Here's the repo for it too: https://github.com/AdrianGonz97/imagetools-hashing-benchmarks
Hashing with the […]
So in practice, I think it's completely fine to hash with the image itself (even without the […]).
Side note: While writing the benchmark, I noticed that […]
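For a rough local feel for the cost of hashing whole images, here is a minimal timing sketch. It is not the code from the linked benchmark repo: the buffer sizes, the iteration count, SHA-1 and random bytes standing in for image data are all assumptions, and the numbers will vary by machine.

```typescript
import { createHash, randomBytes } from "node:crypto";
import { performance } from "node:perf_hooks";

// Arbitrary sizes standing in for small, medium and large images.
const sizesInBytes = [100 * 1024, 1024 * 1024, 10 * 1024 * 1024];
const iterations = 5;

interface Timing {
  bytes: number;
  avgMs: number;
}

// Hash the same random buffer several times and report the mean wall-clock time.
function timeHashing(bytes: number): Timing {
  const data = randomBytes(bytes);
  const start = performance.now();
  for (let i = 0; i < iterations; i++) {
    createHash("sha1").update(data).digest("hex");
  }
  return { bytes, avgMs: (performance.now() - start) / iterations };
}

const timings = sizesInBytes.map(timeHashing);
for (const t of timings) {
  console.log(`${(t.bytes / 1024).toFixed(0)} KiB: ${t.avgMs.toFixed(3)} ms per hash`);
}
```

A micro-timing loop like this ignores file I/O and JIT warm-up, so treat it as an order-of-magnitude check rather than a benchmark.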
Thanks for all the changes and the thorough job!
This PR fixes an issue that we've been experiencing in huntabyte/shadcn-svelte#978, where the hash for the generated image ID differs when it's computed on different machines. The goal of that PR was to cache the images so that they can be reused across our GitHub runners.
The source of the issue is the mtime stat from the image file, which seemingly differs from machine to machine. That makes the hash produce a different ID, resulting in a cache MISS. Instead, mtime has been replaced with the size stat, which (in conjunction with the image config and file URL) should provide sufficient uniqueness for the hash.
What kind of change does this PR introduce? (Bug fix, feature, docs update, ...)
Bug fix
What is the new behavior (if this is a feature change)?
Implements consistent image ID hashing across machines
Does this PR introduce a breaking change? (What changes might users need to make in their application due to this PR?)
No
Other information:
I wasn't sure of the best way to express this change in a test, so I've hardcoded the expected ID string. I'm sure there's a better way to do it, so please feel free to modify it!
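One alternative to a hardcoded expected ID is to assert the properties the fix should guarantee: touching a file's timestamps (as a fresh checkout does) must not change its ID, while editing its bytes must. This sketch assumes a content-based ID as discussed in this thread; `generateId`, SHA-1 and the temporary file are hypothetical and not the imagetools API. A size-based ID would pass the first check but fail the second, because the edit keeps the length the same.

```typescript
import { createHash } from "node:crypto";
import { mkdtempSync, readFileSync, rmSync, utimesSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";
import { pathToFileURL } from "node:url";

// Hypothetical content-based ID: file URL + serialized config + file bytes.
function generateId(file: string, config: object): string {
  return createHash("sha1")
    .update(pathToFileURL(file).href)
    .update(JSON.stringify(config))
    .update(readFileSync(file))
    .digest("hex");
}

const dir = mkdtempSync(join(tmpdir(), "image-id-"));
const file = join(dir, "pixel.png");
writeFileSync(file, Buffer.from([0x89, 0x50, 0x4e, 0x47])); // placeholder bytes, not a valid PNG
const config = { w: "100" };

const first = generateId(file, config);

// Simulate a fresh checkout: same bytes, new atime/mtime.
const later = new Date("2024-04-10T01:05:50Z");
utimesSync(file, later, later);
const afterTouch = generateId(file, config);

// Edit the content without changing its length.
writeFileSync(file, Buffer.from([0x89, 0x50, 0x4e, 0x48]));
const afterEdit = generateId(file, config);

rmSync(dir, { recursive: true, force: true });

console.log(first === afterTouch); // true
console.log(first === afterEdit); // false
```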