Non-ASCII filenames on Darwin lead to different hash #847
Comments
Can you give a way to reproduce? What derivation should I build?
I just checked: the Pelican sources don't seem to have this issue anymore. I'll try to create a small derivation to reproduce the issue.
Did you edit the OP with the minimal example? If so, note that edits don't send notifications.
@johbo any luck with the repro? There's already a known issue with the default case-insensitive HFS+ filesystem on Darwin: any fixed-output (FO) derivation that contains files whose names differ only in case will lose the "overlapping" files and then hash to something different.
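To check whether a source tree is affected by the case-insensitivity problem described above, one option is to list the paths that collapse to the same name once case is ignored. The following is a minimal sketch, not anything Nix does itself: `case_collisions` and `example_paths` are hypothetical names, and Python's `str.casefold` is only an approximation of the case folding HFS+ applies.

```python
from collections import defaultdict


def case_collisions(paths):
    """Group paths that differ only in case (hypothetical helper).

    Assumption: str.casefold approximates the comparison a
    case-insensitive filesystem performs; it is not HFS+'s exact rule.
    """
    groups = defaultdict(list)
    for path in paths:
        groups[path.casefold()].append(path)
    # Keep only the groups where more than one spelling maps to the same key.
    return {key: names for key, names in groups.items() if len(names) > 1}


# Illustrative paths, not taken from any real source tarball.
example_paths = ["src/Makefile", "src/makefile", "README"]
collisions = case_collisions(example_paths)
```

Any non-empty result means that unpacking the tree on a case-insensitive volume would keep only one file per group, so the resulting hash would differ from the one computed on a case-sensitive filesystem.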
I got back to it. Here is how I tried to reproduce it; maybe that helps to decide whether there is a problem inside of Nix at all or whether the issue sits somewhere else.
I've put the sources into this repository: https://github.com/johbo/reproduce-nix-unicode-darwin
The basic idea is to use
    tarball = pkgs.fetchzip {
      url = https://github.com/johbo/reproduce-nix-unicode-darwin/archive/9c7029ef3b9301c9faf55659ea281332f5f6a281.tar.gz;
      sha256 = "1h7z2wax8ywhp0zr08qm78573rcd6nq3y8scl5pbv3lhpilf44sr";
    };
The repository contains a file with a non-ASCII name. I built things in the following way, both on Darwin and on NixOS:
Last test was with these versions:
One thing I recall from screwing around on Darwin is that HFS+ always stores file names in a normalized form of Unicode (a variant of the decomposed form, NFD, as far as I know), so if you enter your diacritics as precomposed characters they get switched to base letters plus combining characters. We probably just need the hash function to be explicit about which form it wants.
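The effect of normalization on a hash can be sketched in a few lines. This is only an illustration, assuming the hash covers the raw bytes of the file name; `name_hash` is a hypothetical helper and not part of Nix, and standard NFD stands in for the HFS+ variant, which is not identical.

```python
import hashlib
import unicodedata

# The same visible name in precomposed (NFC) and decomposed (NFD) form.
name_nfc = unicodedata.normalize("NFC", "d\u00e9cembre")
name_nfd = unicodedata.normalize("NFD", "d\u00e9cembre")


def name_hash(name, form=None):
    """Hash the UTF-8 bytes of a file name (hypothetical helper).

    If `form` is given, the name is normalized to that form first.
    """
    if form is not None:
        name = unicodedata.normalize(form, name)
    return hashlib.sha256(name.encode("utf-8")).hexdigest()


# Hashing the raw bytes depends on which form the filesystem hands back.
raw_hashes_differ = name_hash(name_nfc) != name_hash(name_nfd)

# Normalizing to one explicit form first removes the difference.
normalized_hashes_match = name_hash(name_nfc, "NFC") == name_hash(name_nfd, "NFC")
```

The two strings render identically but differ in length (8 versus 9 code points), which is enough to change any hash computed over the name. Picking one form explicitly before hashing is one possible way to make the result independent of the filesystem.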
I marked this as stale due to inactivity.
I closed this issue due to inactivity.
I get different hashes on Darwin if non-ASCII filenames are included.
This is a way to reproduce the problem:
I see this result on Darwin:
And this result on NixOS:
My assumption is that this difference was also causing the issue where I got a different hash for Pelican on Darwin than on NixOS. I tracked the difference down to a file called décembre inside the source tarball of Pelican. I guess that the filename we get back needs special treatment on Darwin so that we get consistent hashing. I am willing to try things out if someone has a hint for me about where to start in the codebase.
Pointers: