Non-ASCII filenames on Darwin lead to different hash #847

Open
johbo opened this issue Mar 13, 2016 · 8 comments
Labels
bug macos Nix on macOS, aka OS X, aka darwin stale

Comments

@johbo

johbo commented Mar 13, 2016

I get different hashes on Darwin if non-ASCII filenames are included.

This is a way to reproduce the problem:

mkdir reproduce
touch reproduce/décembre
nix-hash reproduce

I see this result on Darwin:

$ nix-hash reproduce
ae4076b53a2de6a8c26c1139d603dde1

And this result on NixOS:

$ nix-hash reproduce
ebf3715949c8cc6c0b03b1320544d17b

I suspect this difference is also why I got a different hash for Pelican on Darwin than on NixOS. I tracked that difference down to a file called décembre inside the Pelican source tarball.

I guess that the filename we get back needs special treatment on Darwin so that hashing is consistent. I am willing to try things out if someone can give me a hint about where to start in the codebase.
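
For illustration, here is a minimal Python sketch of one plausible cause: the Unicode normalization difference that comes up later in this thread. It is not Nix code, and it is only an assumption that normalization explains the two hashes above. The sketch shows that the precomposed (NFC) and decomposed (NFD) spellings of décembre are different byte strings, so any hash that covers the raw filename bytes will differ. The helper name `name_digest` is hypothetical.

```python
import hashlib
import unicodedata

# The same visible name in two Unicode normalization forms.
nfc = unicodedata.normalize("NFC", "d\u00e9cembre")  # precomposed é (U+00E9)
nfd = unicodedata.normalize("NFD", "d\u00e9cembre")  # e + combining acute (U+0301)


def name_digest(filename: str) -> str:
    """MD5 over the UTF-8 bytes of a filename (hypothetical helper, not Nix's format)."""
    return hashlib.md5(filename.encode("utf-8")).hexdigest()


print(nfc.encode("utf-8").hex())  # 64c3a963656d627265   (9 bytes)
print(nfd.encode("utf-8").hex())  # 6465cc8163656d627265 (10 bytes)
print(name_digest(nfc) == name_digest(nfd))  # False
```

If the filesystem on one machine hands back the decomposed spelling and the other the precomposed one, a hash over the directory listing would differ even though the names look identical.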

Pointers:

@domenkozar
Member

Can you give a way to reproduce this? Which derivation should I build?

@johbo
Author

johbo commented Jun 6, 2016

Just checked: the Pelican sources don't seem to have this issue anymore. I'll try to create a small derivation to reproduce it.

@Ericson2314
Member

Did you edit the OP to add the minimal example? If so, note that edits don't send notifications.

@domenkozar domenkozar added macos Nix on macOS, aka OS X, aka darwin bug labels Jul 21, 2016
@copumpkin
Member

@johbo any luck with the repro? There's already a known issue with the default case-insensitive HFS+ filesystem on Darwin: any fixed-output (FO) derivation that contains files whose names differ only by case will lose the "overlapping" files and then hash to something different.
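
To make the case problem concrete, here is a rough Python sketch that lists names in a file listing that would overlap on a case-insensitive filesystem. The function name `case_collisions` is hypothetical, and `str.casefold()` is only an approximation of the folding rules HFS+ actually applies.

```python
from collections import defaultdict


def case_collisions(names):
    """Group names that are equal after case folding (approximation of a
    case-insensitive filesystem; hypothetical helper)."""
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    # Only groups with more than one spelling would overlap on disk.
    return [sorted(group) for group in groups.values() if len(group) > 1]


print(case_collisions(["README", "readme", "Makefile"]))
# [['README', 'readme']]
```

When such an archive is unpacked on a case-insensitive volume, only one file of each group survives, so the resulting tree (and its hash) differs from the one produced on a case-sensitive filesystem.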

@johbo
Author

johbo commented Nov 19, 2017

I got back to it. Here is how I tried to reproduce it; maybe that helps to decide whether there is a problem inside Nix at all or whether the issue sits somewhere else.

I've put sources into this repository: https://github.com/johbo/reproduce-nix-unicode-darwin

The basic idea is to use fetchzip to get the sources from a repository:

  tarball = pkgs.fetchzip {
    # Quoted, because bare URL literals are deprecated in Nix.
    url = "https://github.com/johbo/reproduce-nix-unicode-darwin/archive/9c7029ef3b9301c9faf55659ea281332f5f6a281.tar.gz";
    sha256 = "1h7z2wax8ywhp0zr08qm78573rcd6nq3y8scl5pbv3lhpilf44sr";
  };

The repository contains the file décembre which is expected to trigger the issue. That's also a filename from the Pelican repository.

I've built things in the following way both on Darwin and on NixOS:

nix-build -A tarball

Last test was with these versions:

  • nix-build (Nix) 1.11.4 on my NixOS VM
  • nix-build (Nix) 1.11.15 on darwin
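
Independent of Nix, one way to check whether the two machines see the same bytes for the filename is to create the file and read its name back from the directory listing. This is a small Python sketch under that assumption. The function names `roundtrip_name` and `describe_form` are hypothetical, and the result depends on the filesystem that holds the temporary directory.

```python
import os
import tempfile
import unicodedata


def roundtrip_name(name: str) -> str:
    """Create an empty file called `name` in a fresh temporary directory
    and return the name as the filesystem lists it."""
    with tempfile.TemporaryDirectory() as tmp:
        open(os.path.join(tmp, name), "w").close()
        (listed,) = os.listdir(tmp)
        return listed


def describe_form(name: str) -> str:
    """Report which normalization form a non-ASCII name is in."""
    if name == unicodedata.normalize("NFC", name):
        return "NFC"
    if name == unicodedata.normalize("NFD", name):
        return "NFD"
    return "mixed"


created = "d\u00e9cembre"  # precomposed spelling
listed = roundtrip_name(created)
print(describe_form(created), "->", describe_form(listed))
print(listed.encode("utf-8").hex())
```

Running this on both machines and comparing the printed bytes shows whether the filesystem changes the name before any hashing takes place.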

@copumpkin
Member

One thing I recall from screwing around on Darwin is that HFS+ always stores a normalized form of Unicode characters (a variant of the decomposed form, NFD, if I remember correctly), so if you enter your diacritics as precomposed characters they may get switched to the combining forms. Or something like that. We probably just need the hash function to be explicit about which form it wants.
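
A hedged sketch of what "being explicit" could look like, in Python rather than Nix's own code: normalize every name to one chosen form before it is fed to the hash. The function `hash_names` and its length-prefixed encoding are made up for illustration and are not the NAR format.

```python
import hashlib
import unicodedata


def hash_names(names, form="NFC"):
    """Hash a list of file names after normalizing each one to a single
    Unicode form (hypothetical scheme, not Nix's serialization)."""
    digest = hashlib.sha256()
    for name in sorted(unicodedata.normalize(form, n) for n in names):
        data = name.encode("utf-8")
        # Length prefix keeps adjacent names from running together.
        digest.update(len(data).to_bytes(8, "little"))
        digest.update(data)
    return digest.hexdigest()


precomposed = ["d\u00e9cembre"]
decomposed = ["de\u0301cembre"]
print(hash_names(precomposed) == hash_names(decomposed))  # True
```

The trade-off is that filesystems which treat names as opaque bytes can hold two files whose names differ only in normalization, and such files would become indistinguishable under this scheme.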

@stale

stale bot commented Feb 15, 2021

I marked this as stale due to inactivity. → More info

@stale stale bot added the stale label Feb 15, 2021
@stale

stale bot commented May 2, 2022

I closed this issue due to inactivity. → More info
