Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fetchzip: force UTF-8 compatible locale to unpack non-ASCII symbols #176253

Merged
merged 4 commits into from Jun 15, 2022

Conversation

trofi
Copy link
Contributor

@trofi trofi commented Jun 4, 2022

musl and darwin support UTF-8 locales without any extras. As a result
unzip can unpack UTF-8 filenames there as is. But on glibc without
locale archive presence files get mangled as:

deps/αβ -> deps/#U03b1#U03b2

This makes fetchzip fixed-output derivations unstable.

Tested this change to fail in coq.src which was generated in system
that mangles UTF-8 symbols:

$ nix build -f. coq.src --rebuild -L
source> trying https://github.com/coq/coq/archive/V8.15.2.zip
source>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
source>                                  Dload  Upload   Total   Spent    Left  Speed
source>   0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0
source> 100 8945k  100 8945k    0     0  1513k      0  0:00:05  0:00:05 --:--:-- 1989k
source> unpacking source archive /build/V8.15.2.zip
error: hash mismatch in fixed-output derivation '/nix/store/hrnyykm7wgw8vxisgq7hc2bg5gr0y6s8-source.drv':
         specified: sha256-h81nFqkuvZkMR7YLHy7laTq5yOhjMW+w6rYzncxvyD4=
            got:    sha256-DTspmwyD3Evl1CUmvUy2MonbLGUezvsHN3prmP9eK2I=

Note: it means that some of existing caches for fixed output
derivations become incorrect. It should not break already cached
tarballs on cache.nixos.org thus the impact should not be widespread.

Description of changes
Things done
  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandbox = true set in nix.conf? (See Nix manual)
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 22.11 Release Notes (or backporting 22.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
    • (Release notes changes) Ran nixos/doc/manual/md-to-db.sh to update generated release notes
  • Fits CONTRIBUTING.md.

@trofi trofi changed the title fetchzip: force UTF-8 compatibel locale to unpack non-ASCII symbols fetchzip: force UTF-8 compatible locale to unpack non-ASCII symbols Jun 4, 2022
@trofi trofi changed the base branch from master to staging June 4, 2022 15:03
# Otherwise unzip unpacks escaped file names as if '-U' options was in effect.
#
# Pick en_US.UTF-8 as most possible to be present on glibc, musl and darwin.
LANG=en_US.UTF-8 unzip -qq "$curSrc"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C.UTF-8 maybe?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIU C.UTF-8 is not supported on macos. musl would probably work (not sure either).

Copy link
Member

@SuperSandro2000 SuperSandro2000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now we only need to find the ~10 packages which disabled tests because of that

pkgs/build-support/fetchzip/default.nix Outdated Show resolved Hide resolved
Copy link
Member

@SuperSandro2000 SuperSandro2000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code wise this should be fine but I am not deep enough into fetchzip and reproducibility.

@trofi
Copy link
Contributor Author

trofi commented Jun 10, 2022

Before merging it Sandro also suggested finding up existing problematic tarballs. We can recalculate their hashes as part of this PR. I'll try to extract at least some of problematic ones.

Very crude way to find .zip sources and test to fetch them:

# extract:
$ nix-env -qaP -I nixpkgs=. | awk '{print $1}' | while read p; do a=${p#nixos.}; nix-instantiate --eval --expr "with import ./. {}; \"$a.src: \${lib.concatStringsSep \" \" ($a.src.urls or [])}\""; done |& fgrep .zip > all-zips

# test:
$ cat all-zips | tr '":' '  ' | awk '{print $1}' | while read a; do echo $a; nix build -f. $a; nix build -f. $a --rebuild; done

List of suspects that started failing:

  • coq_8_13.src:
  • coq_8_14.src:
  • coq_8_15.src:
  • coq_8_16.src:
  • eduli.src:

List of already failing builds (fetch fails upstream URL):

  • asciidoctorj.src:
  • coqPackages.zorns-lemma.src:
  • enblend-enfuse.src:
  • gtdialog.src:
  • ifm.src:
  • irpf.src:
  • jbake.src:
  • kwm.src:
  • xmlbeans.src:

List of already failing builds (hash mismatch, changed tarball?):

  • coqPackages.corn.src:
  • imagej.src:
  • ipmicfg.src:
  • libbass.src:
  • libbass_fx.src:
  • randoop.src:
  • sox.src:
  • virtual-ans.src:

@trofi
Copy link
Contributor Author

trofi commented Jun 11, 2022

@ofborg build coq_8_13.src coq_8_14.src coq_8_15.src eduli.src

@trofi
Copy link
Contributor Author

trofi commented Jun 11, 2022

@ofborg build coq_8_13.src coq_8_14.src coq_8_15.src coq_8_16.src eduli.src

@vcunat
Copy link
Member

vcunat commented Jun 11, 2022

glibcLocales is over 200 MB due to containing all locales. That's way too heavy for this use case, I believe. With .override { allLocales = false; } we'd still get the locale we need with just around 3 MB.

glibcLocalesUtf8 is a trimmed down version of glibcLocales that provides
UTF-8 locale. It is useful to use in derivations that need some UTF-8
lcoale, like unzip in NixOS#176253.
musl and darwin support UTF-8 locales without any extras. As a result
unzip can unpack UTF-8 filenames there as is. But on glibc without
locale archive presence files get mangled as:

    deps/αβ -> deps/#U03b1#U03b2

This makes `fetchzip` fixed-output derivations unstable.

Tested this change to fail in `coq.src` which was generated in system
that mangles UTF-8 symbols:

    $ nix build -f. coq.src --rebuild -L
    source> trying https://github.com/coq/coq/archive/V8.15.2.zip
    source>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
    source>                                  Dload  Upload   Total   Spent    Left  Speed
    source>   0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0
    source> 100 8945k  100 8945k    0     0  1513k      0  0:00:05  0:00:05 --:--:-- 1989k
    source> unpacking source archive /build/V8.15.2.zip
    error: hash mismatch in fixed-output derivation '/nix/store/hrnyykm7wgw8vxisgq7hc2bg5gr0y6s8-source.drv':
             specified: sha256-h81nFqkuvZkMR7YLHy7laTq5yOhjMW+w6rYzncxvyD4=
                got:    sha256-DTspmwyD3Evl1CUmvUy2MonbLGUezvsHN3prmP9eK2I=

Note: it means that some of existing caches for fixed output
derivations become incorrect. It should not break already cached
tarballs on cache.nixos.org thus the impact should not be widespread.
fetchzip changed unpacking of UTF-8 files on glibc systems:
    NixOS#176253
As a result unpacked contents changed it's filenames.
@trofi
Copy link
Contributor Author

trofi commented Jun 11, 2022

glibcLocales is over 200 MB due to containing all locales. That's way too heavy for this use case, I believe. With .override { allLocales = false; } we'd still get the locale we need with just around 3 MB.

Good point. Added shorter form of glibcLocales as:

+  glibcLocalesUtf8 =
+    if stdenv.hostPlatform.isLinux
+    then callPackage ../development/libraries/glibc/locales.nix { allLocales = false; }
+    else null;

and updated fetchzip to use it.

@SuperSandro2000
Copy link
Member

glibcLocales is over 200 MB due to containing all locales. That's way too heavy for this use case, I believe. With .override { allLocales = false; } we'd still get the locale we need with just around 3 MB.

We should reverse that default IMO. Most users are fine with C.UTF-8 plus their set locale and maybe en_US.UTF-8.

@vcunat
Copy link
Member

vcunat commented Jun 11, 2022

Well, that's like a discussion for another topic. For users it's mostly the NixOS supportedLocales option what's important. That one is fed into this parameter, though (a bit surprisingly to me) it defaults to all locales, too. Either way, I believe that's the main step to address.

EDIT: I think some people have argued for basics (perhaps C.UTF-8) getting compiled into the glibc package itself, though I'm not sure how hard that is technically.

…chzip update

fetchzip changed unpacking of UTF-8 files on glibc systems:
    NixOS#176253
As a result unpacked contents changed it's filenames.

Closes: NixOS#176225
@SuperSandro2000
Copy link
Member

Doing a PR right now to fix that. We can discuss this further there.

#177318

@vcunat vcunat linked an issue Jun 12, 2022 that may be closed by this pull request
@trofi trofi merged commit 206b2cf into NixOS:staging Jun 15, 2022
@trofi trofi deleted the fetchzip-forced-UTF-8 branch June 15, 2022 18:32
@Artturin
Copy link
Member

Artturin commented Jul 13, 2022

according to bisect this(ffb456a) broke the eval of pkgsCross.aarch64-android.stdenv

$ nix eval ".#pkgsCross.aarch64-android.stdenv.drvPath"
error: infinite recursion encountered

       at /nix/store/a2s5vi4zz6szchvry9ks215zrif1jc5m-source/pkgs/top-level/all-packages.nix:12562:8:

        12561|   gccStdenv =
        12562|     if stdenv.cc.isGNU
             |        ^
        12563|     then stdenv
(use '--show-trace' to show detailed location information)

found with

git bisect start 06e4e9c8676 06e4e9c8676^ --
git bisect run bash -c 'nix eval ".#pkgsCross.aarch64-android.stdenv.drvPath"'

@trofi
Copy link
Contributor Author

trofi commented Jul 13, 2022

pkgsCross.aarch64-android.stdenv

Good catch! I always wondered why we defined it non-null on musl (where it does not break, but builds unused glibc and locales). Proposed the fix as: #181384

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

coq.src can't be built on darwin (UTF-8 file names?)
7 participants