isl: make musl/bootstrap package bit-by-bit reproducible #106587

Closed
raboof opened this issue Dec 10, 2020 · 11 comments
Comments

@raboof
Member

raboof commented Dec 10, 2020

Describe the bug

The closure of the iso-minimal derivation contains a version of the 'isl' library that does not build deterministically.

To Reproduce

The 'normal' isl library is reproducible: nix-build '<nixpkgs>' -A isl --check succeeds.

However, an isl that is a transitive dependency of busybox-sandbox-shell is not reproducible:

nix-build $(nix-store --query --graph `nix-instantiate '<nixpkgs>' -A busybox-sandbox-shell` | grep -e "isl-0.20.* -> " | grep musl | cut -d "\"" -f 2 | uniq) --check

I'm not sure how to diagnose what's different with this isl derivation that makes it unreproducible. I also tried nix-build '<nixpkgs>' --arg crossSystem '(import <nixpkgs> {}).lib.systems.examples.musl64' -A isl --check, but that also succeeds for me.
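
The pipeline above can be wrapped in a small helper so that the package pattern and the filter keyword (musl, gcc-9, ...) become parameters. This is only a sketch: the name drv_edges_from is made up for illustration, and it assumes the graph edges look like "source" -> "target" with full store paths, as the original command implies. It uses sort -u rather than uniq, because uniq only collapses adjacent duplicates.

```shell
# Hypothetical helper: read `nix-store --query --graph` output on stdin and
# print the quoted path at the start of every edge line that matches the
# package pattern and also mentions the keyword somewhere on the line.
drv_edges_from() {
  pkg_pattern=$1   # e.g. 'isl-0.20'
  keyword=$2       # e.g. 'musl' or 'gcc-9'
  grep -e "${pkg_pattern}.* -> " \
    | grep -e "$keyword" \
    | cut -d '"' -f 2 \
    | sort -u
}

# Usage, with Nix available:
#   nix-store --query --graph "$(nix-instantiate '<nixpkgs>' -A busybox-sandbox-shell)" \
#     | drv_edges_from 'isl-0.20' musl
```

Each printed path can then be passed to nix-build with --check, as in the command above.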

@raboof
Member Author

raboof commented Dec 17, 2020

(nix-build -A pkgs.pkgsMusl.isl also reproduces fine and is different from the busybox-sandbox-shell dependency)

@zimbatm zimbatm added this to Inbox in R13y via automation Dec 22, 2020
@raboof
Member Author

raboof commented Dec 27, 2020

However, an isl that is a transitive dependency of busybox-sandbox-shell is not reproducible:

nix-build $(nix-store --query --graph `nix-instantiate '<nixpkgs>' -A busybox-sandbox-shell` | grep -e "isl-0.20.* -> " | grep musl | cut -d "\"" -f 2 | uniq) --check

Hmm, it's reproducible for me now... let's close this for now and revisit the issue when we see it again.

@raboof raboof closed this as completed Dec 27, 2020
R13y automation moved this from Inbox to Done Dec 27, 2020
@raboof
Member Author

raboof commented Jan 4, 2021

Now another isl that is a transitive dependency of busybox-sandbox-shell is not reproducible - not so sure anymore it has to do with musl directly:

nix-build $(nix-store --query --graph `nix-instantiate '<nixpkgs>' -A busybox-sandbox-shell` | grep -e "isl-0.20.* -> " | grep gcc-9 | cut -d "\"" -f 2 | uniq) --check

@raboof raboof reopened this Jan 4, 2021
R13y automation moved this from Done to In progress Jan 4, 2021
@raboof
Member Author

raboof commented Jan 4, 2021

It seems the actual assembly generated by gcc is different, for example for print.o:

https://pastebin.com/SPezTXki

Looking in nix-shell it seems print.o is generated with:

gcc -DHAVE_CONFIG_H   -I. -I. -I./include -Iinclude/    -O3 -fomit-frame-pointer -malign-double -fstrict-aliasing -ffast-math -MT print.o -MD -MP -MF $depbase.Tpo -c -o print.o print.c

Unfortunately on my machine that consistently produces the same output, which is different from the one we find on cache.nixos.org

The build seems to be using gcc 8.3.0, but that is not part of the build closure 🤔
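
To narrow down which files actually differ between two builds (for example print.o above), a byte-wise comparison of two directory trees is a reasonable first step before disassembling anything. The sketch below uses only find and cmp; the function name differing_files is hypothetical. As far as I understand the Nix manual, running --check together with --keep-failed leaves the differing output under a path with a .check suffix, which would give the second tree to compare against.

```shell
# Hypothetical helper: print the relative path of every regular file under
# directory $1 whose content differs from (or is missing in) directory $2.
differing_files() {
  ( cd "$1" && find . -type f | sort ) | while IFS= read -r f; do
    # cmp -s is silent and exits non-zero on any difference or missing file
    if ! cmp -s "$1/$f" "$2/$f"; then
      printf '%s\n' "${f#./}"
    fi
  done
}

# Usage (store paths are placeholders):
#   differing_files /nix/store/<hash>-isl-0.20 /nix/store/<hash>-isl-0.20.check
```

Files present only in the second tree are not reported; swapping the arguments covers that direction.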

@raboof
Member Author

raboof commented Mar 3, 2021

I see differences in the assembly like add $0x1,%eax vs inc %eax - seems like some optimization that does not deterministically trigger? CFLAGS is using -O3

Debian has seen some differences in 0.23 on i386 recently (https://tests.reproducible-builds.org/debian/rb-pkg/unstable/i386/diffoscope-results/isl.html) which look somewhat similar

The build logs (nix-build /nix/store/lj2z3sfxqcr3q3lp0q97lixlags3g1wn-isl-0.20.drv --check vs nix log /nix/store/lj2z3sfxqcr3q3lp0q97lixlags3g1wn-isl-0.20.drv) seem pretty much the same - I did see a change in the ordering of the CCLD calls, but I think automake should make sure that doesn't matter, right?
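
One way to check whether two build logs differ only in the order of their lines (such as reordered CCLD calls from a parallel build) is to sort both and compare the results. This is a sketch that assumes both logs have already been saved to files; the function name is made up for illustration.

```shell
# Hypothetical helper: exit 0 when both log files contain the same lines,
# regardless of order; exit 1 when the sorted contents differ.
same_lines_ignoring_order() {
  a_sorted=$(mktemp) || return 2
  b_sorted=$(mktemp) || { rm -f "$a_sorted"; return 2; }
  sort "$1" > "$a_sorted"
  sort "$2" > "$b_sorted"
  cmp -s "$a_sorted" "$b_sorted"
  status=$?
  rm -f "$a_sorted" "$b_sorted"
  return "$status"
}

# Usage (file names are placeholders):
#   if same_lines_ignoring_order cached.log local.log; then
#     echo "logs differ only in line order"
#   fi
```

If this reports a difference, a plain diff of the two sorted files shows which lines are unique to each log.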

@veehaitch
Member

I'm also running into this on x86_64:

# nix build -L --rebuild '/nix/store/lj2z3sfxqcr3q3lp0q97lixlags3g1wn-isl-0.20.drv'
error: derivation '/nix/store/lj2z3sfxqcr3q3lp0q97lixlags3g1wn-isl-0.20.drv' may not be deterministic: output '/nix/store/6gmh9jq83v6fh2l0rqg937w9fxjzac7r-isl-0.20' differs

@veehaitch
Member

veehaitch commented Apr 8, 2021

Now another isl that is a transitive dependency of busybox-sandbox-shell is not reproducible - not so sure anymore it has to do with musl directly:

nix-build $(nix-store --query --graph `nix-instantiate '<nixpkgs>' -A busybox-sandbox-shell` | grep -e "isl-0.20.* -> " | grep gcc-9 | cut -d "\"" -f 2 | uniq) --check

When looking at the dependencies of busybox-sandbox-shell from c514786

nix show-derivation -r 'github:nixos/nixpkgs?ref=c5147860e23ed75ce9d40298c66b416c00be116#busybox-sandbox-shell' \
  | jq 'keys' \
  | jq '.[]' \
  | grep 'isl-0.20.drv'

I get two isl dependencies:

  1. /nix/store/3yy0njks8l23lw4l4pkgdj41vgsnxh8f-isl-0.20.drv: A dependency of /nix/store/4f97plism3424fydpbi66i5nl68ri5ff-x86_64-unknown-linux-musl-stage-static-gcc-10.2.0.drv
  2. /nix/store/lj2z3sfxqcr3q3lp0q97lixlags3g1wn-isl-0.20.drv: A dependency of /nix/store/70cx4mag2pdhriw0plrszv1i2y72narp-gcc-10.2.0.drv

3yy0njks8l23lw4l4pkgdj41vgsnxh8f-isl-0.20.drv

Reproduces just fine.

lj2z3sfxqcr3q3lp0q97lixlags3g1wn-isl-0.20.drv

Doesn't produce the same output.

When looking at the differences between 3yy0njks8l23lw4l4pkgdj41vgsnxh8f and lj2z3sfxqcr3q3lp0q97lixlags3g1wn, both have in common that some CCLD reordering happens (as you, @raboof, already mentioned). The most apparent difference is, however, the check for -march=k8, which succeeds for lj2z3sfxqcr3q3lp0q97lixlags3g1wn but doesn't for 3yy0njks8l23lw4l4pkgdj41vgsnxh8f. From the Hydra logs alone, it's hard to tell if this flag ends up in calls to gcc but it could be an explanation for the major differences between the outputs.
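
To make that comparison repeatable, the configure checks that mention -march= or -mtune= can be pulled out of each log and diffed. The sketch below assumes the logs are saved as plain text files and that the checks appear on lines starting with "checking", as autoconf usually prints them; the exact wording in the real isl logs may differ. Both function names are hypothetical.

```shell
# Hypothetical helper: print the configure checks in a log file that
# mention -march= or -mtune=, deduplicated.
arch_checks() {
  grep -E -e '^checking .*-m(arch|tune)=' "$1" | sort -u
}

# Hypothetical helper: diff the arch-related checks of two logs.
# Exits 0 when they agree and 1 when they differ (diff's exit status).
diff_arch_checks() {
  a=$(mktemp) || return 2
  b=$(mktemp) || { rm -f "$a"; return 2; }
  arch_checks "$1" > "$a"
  arch_checks "$2" > "$b"
  diff "$a" "$b"
  status=$?
  rm -f "$a" "$b"
  return "$status"
}

# Usage (file names are placeholders):
#   diff_arch_checks lj2z-isl.log 3yy0-isl.log
```

This only shows what configure detected. Whether the flag reaches the compiler still has to be confirmed from the full compile commands.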

@veehaitch
Member

The CPU machine type could be the problem after all. I executed nix-build --check '/nix/store/lj2z3sfxqcr3q3lp0q97lixlags3g1wn-isl-0.20.drv' on a PC Engines APU2 (AMD GX-412TC SOC) which passes just fine. My previous comment was referencing builds executed on an Intel machine. Logs for the check run using the APU2:

@veehaitch
Member

The issue is that the configure script guesses the architecture of the host. I see two possibilities to approach this:

  1. Define an architecture mapping for stdenv.hostPlatform.system and pass it using --with-gcc-arch=. That's what the imagemagick derivation does:

    ++ [ "--with-gcc-arch=${arch}" ]

  2. Set the --with-gcc-arch= option to generic. This will, in turn, try to set the GCC options -march= and -mtune= to generic. generic is not a valid value for -march, but that is harmless: the -march= option then gets omitted altogether. libffi uses this approach:

    "--with-gcc-arch=generic" # no detection of -march= or -mtune=

I'd go for the second option as it doesn't require a mapping for all the architectures we support.

@veehaitch
Member

Since 1359822 is merged into master now, isl reproduces fine for me 🙂

nix build --rebuild -L 'github:nixos/nixpkgs/45e57e917d623aa71e9e72947cf9c2dae4f015dc#isl_0_20'

@raboof raboof closed this as completed Apr 26, 2021
R13y automation moved this from In progress to Done Apr 26, 2021
@raboof
Member Author

raboof commented Apr 26, 2021

The reproducibility problems weren't always, erm, reproducible, but this is a good sign! Let's close this until we have reason to suspect further problems.
