debug output substituted from cache does not match original output #7756

symphorien opened this issue Feb 5, 2023 · 7 comments

@symphorien
Member

Downloading qemu and qemu.debug from Hydra at the same time yields outputs with different build-ids, and the build logs differ as well. As a result, the debug symbols are not usable. (This also happens with fwupd.)

Steps To Reproduce

Check out nixpkgs at 0591d6b57bfeb55dfeec99a671843337bc2c3323

$  nix-build -A qemu -A qemu.debug
/nix/store/dfxy53v2142cbp5l5fzg0srxb4fzh7m5-qemu-7.2.0
/nix/store/w1hwhyrsmvgmaz7ggaqarbny3iyacqgl-qemu-7.2.0-debug
$  LANG=C readelf -a result/bin/qemu-system-aarch64 | grep Build   
    Build ID: c688a6af5e64019b775be06285a9823b0ee058b0
$  LANG=C readelf -a result-debug/lib/debug/qemu-system-aarch64 | grep Build
readelf: Error: Unable to find program interpreter name
    Build ID: eebcd5810e67a5754ae1e77a0fd12118c7f6ab45

These are different build-ids.
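The manual comparison above can be wrapped in a small helper. This is only a sketch: the function names `build_id_from_notes` and `same_build_id` are made up for illustration, and it assumes GNU `readelf` is on `PATH` and prints the note in the same `Build ID: <hex>` form shown in the transcript.

```shell
#!/bin/sh
# Print the GNU build-id found in `readelf -n` output supplied on stdin.
# (Hypothetical helper; it relies on the "Build ID: <hex>" line format.)
build_id_from_notes() {
  sed -n 's/^[[:space:]]*Build ID:[[:space:]]*\([0-9a-f][0-9a-f]*\).*$/\1/p' | head -n 1
}

# Succeed only if both ELF files carry the same, non-empty build-id.
same_build_id() {
  id_a=$(LANG=C readelf -n "$1" 2>/dev/null | build_id_from_notes)
  id_b=$(LANG=C readelf -n "$2" 2>/dev/null | build_id_from_notes)
  echo "$1: ${id_a:-none}"
  echo "$2: ${id_b:-none}"
  [ -n "$id_a" ] && [ "$id_a" = "$id_b" ]
}
```

With the paths from this report, it would be called as `same_build_id result/bin/qemu-system-aarch64 result-debug/lib/debug/qemu-system-aarch64 || echo mismatch`.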

I don't think this is a bug in the nixpkgs code generating separate debug info, for the following reason:

nix log ./result yields a log mentioning:

separating debug info from /nix/store/dfxy53v2142cbp5l5fzg0srxb4fzh7m5-qemu-7.2.0/bin/qemu-system-aarch64 (build ID eebcd5810e67a5754ae1e77a0fd12118c7f6ab45)

which is the build-id I get in the -debug output.
The Hydra log, on the other hand, is at https://hydra.nixos.org/build/208064778/nixlog/1 and mentions

separating debug info from /nix/store/dfxy53v2142cbp5l5fzg0srxb4fzh7m5-qemu-7.2.0/bin/qemu-aarch64 (build ID 6b6520854c647f9f706ac95f13dab5003eb98472)

which is the build-id of the bin output.

As if qemu was built twice and the substituted outputs belonged to distinct builds.
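To compare the two logs without reading them by eye, the "separating debug info" lines can be reduced to build-id and path pairs. The sketch below assumes the log line format is exactly the one quoted above; `debug_split_ids` is a hypothetical name, not part of Nix or nixpkgs.

```shell
#!/bin/sh
# Read a build log on stdin and print "<build-id> <path>" for every
# "separating debug info from <path> (build ID <hex>)" line it contains.
debug_split_ids() {
  sed -n 's|^[[:space:]]*separating debug info from \(.*\) (build ID \([0-9a-f]*\))[[:space:]]*$|\2 \1|p'
}
```

For the local log this could be run as `nix log ./result | debug_split_ids | sort > local.ids`; doing the same with a saved copy of the Hydra log and diffing the two files would show every binary whose build-id differs between the builds.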

Expected behavior

The build-id matches between the debug output and the original outputs.

nix-env --version output

$  nix-env --version
nix-env (Nix) 2.11.1

Additional context

Mixing and matching of drv files was already noticed in #7562, but this time build outputs are mixed and matched, and it actually breaks things.

The debuginfo mismatch is detected by https://github.com/symphorien/nixseparatedebuginfod (it emits one warning for each instance in your store at first startup, if you have the debug outputs downloaded).

Priorities

Add 👍 to issues you find important.

@roberth
Member

roberth commented Feb 5, 2023

> As if qemu was built twice and the substituted outputs belonged to distinct builds.

That may well be the case. For example:

  1. you build qemu
  2. hydra.nixos.org builds qemu
  3. you garbage collect. The main output is retained
  4. you debug qemu. The debug output is substituted
  5. you now have outputs from two different builds

Whenever you have two store paths that need to be in sync, this is only guaranteed to work when Nixpkgs builds those paths in a reproducible manner. The build id is an impurity that Nixpkgs is responsible for fixing.

@symphorien
Member Author

Your explanation does not match my case: if I manually delete the two offending store paths and redownload them simultaneously, I still get the mismatch. Hydra built the two store paths simultaneously as well, so something inside Hydra or Nix did mix and match them.

$  nix-build -A qemu -A qemu.debug
these 2 paths will be fetched (815.05 MiB download, 1548.45 MiB unpacked):
  /nix/store/dfxy53v2142cbp5l5fzg0srxb4fzh7m5-qemu-7.2.0
  /nix/store/w1hwhyrsmvgmaz7ggaqarbny3iyacqgl-qemu-7.2.0-debug
copying path '/nix/store/w1hwhyrsmvgmaz7ggaqarbny3iyacqgl-qemu-7.2.0-debug' from 'https://cache.nixos.org'...
copying path '/nix/store/dfxy53v2142cbp5l5fzg0srxb4fzh7m5-qemu-7.2.0' from 'https://cache.nixos.org'...
/nix/store/dfxy53v2142cbp5l5fzg0srxb4fzh7m5-qemu-7.2.0
/nix/store/w1hwhyrsmvgmaz7ggaqarbny3iyacqgl-qemu-7.2.0-debug
$  LANG=C readelf -a result/bin/qemu-system-aarch64 | grep Build
    Build ID: c688a6af5e64019b775be06285a9823b0ee058b0
$  LANG=C readelf -a result-debug/lib/debug/qemu-system-aarch64 | grep Build
readelf: Error: Unable to find program interpreter name
    Build ID: eebcd5810e67a5754ae1e77a0fd12118c7f6ab45
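The delete-and-redownload check can be scripted roughly as follows. This is a sketch under assumptions: `refetch_pair` and the `DRY_RUN` variable are hypothetical, it is meant to be run from the nixpkgs checkout, and it removes the `result` symlinks first because they act as GC roots that would otherwise keep `nix-store --delete` from removing the paths. Deletion can still fail if something else on the system refers to them.

```shell
#!/bin/sh
# Delete the given store paths, then fetch qemu and qemu.debug again
# in a single nix-build invocation.
# With DRY_RUN set, the commands are printed instead of executed.
refetch_pair() {
  run=${DRY_RUN:+echo}
  $run rm -f result result-debug &&
  $run nix-store --delete "$@" &&
  $run nix-build -A qemu -A qemu.debug
}
```

Running `DRY_RUN=1 refetch_pair <main-path> <debug-path>` first shows what would be executed before anything is deleted.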

@roberth
Member

roberth commented Feb 5, 2023

Seems like both a hydra and nixpkgs bug then.

@symphorien
Member Author

Besides, nix-build --check indicates that qemu builds reproducibly (build-ids are hashes of the content of some ELF sections; they are not random), so I don't think nixpkgs is at fault here.

@roberth
Member

roberth commented Feb 5, 2023

Thank you; I'm not much of an ELF expert.
That being the case, then whatever wrote those sections seems to be at fault. So this is about bit-for-bit reproducibility; the kind that Debian people refer to when they use the term. That's a responsibility that does fall on the expression authors, as I don't think there's that much extra that Nix could do for bit-for-bit reproducibility. (though if you know of an extra repro measure that isn't #7571, I think we're all quite interested!)

@symphorien
Member Author

symphorien commented Feb 5, 2023

I don't understand why you think this is only a reproducibility problem when:

  • nix-build --check -A qemu is ok
  • hydra built /nix/store/dfxy53v2142cbp5l5fzg0srxb4fzh7m5-qemu-7.2.0 and /nix/store/w1hwhyrsmvgmaz7ggaqarbny3iyacqgl-qemu-7.2.0-debug at the same time
  • I downloaded /nix/store/dfxy53v2142cbp5l5fzg0srxb4fzh7m5-qemu-7.2.0 and /nix/store/w1hwhyrsmvgmaz7ggaqarbny3iyacqgl-qemu-7.2.0-debug at the same time

There was no "download one, then rebuild on hydra, then download another" dance. I should have downloaded either the two first builds or the two last builds of these store paths, but not a mix of the two.

@roberth
Member

roberth commented Feb 5, 2023

Maybe the dance happened in hydra? Otherwise I'm out of ideas for mechanisms that allow this to happen.

  1. Hydra builds a nixos test driver on a remote builder.
  2. Hydra uploads the output closure of the driver to the cache, including the main binary.
  3. Hydra builds qemu on a remote builder, because the debug output is not substitutable.
  4. Hydra uploads the missing output and does not touch the existing output.

(1) and (3) are different builds.

This does hinge on the possibility that (3) can be scheduled after (1), but Hydra has demonstrated to me before that it doesn't always exploit dependency information. That is anecdotal of course.
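The hypothesized sequence can be illustrated with a toy model. Nothing here is Hydra code: the "cache" is a temporary directory, `upload` and `simulate` are hypothetical names, and the single assumption being modelled is that an upload never overwrites a store path that is already present in the cache.

```shell
#!/bin/sh
# Toy model of a binary cache: one file per output, never overwritten.
upload() {  # usage: upload CACHE_DIR OUTPUT_NAME BUILD_TAG
  [ -e "$1/$2" ] || printf '%s\n' "$3" > "$1/$2"
}

simulate() {
  cache=$(mktemp -d)
  # Build A (pulled in as a dependency): only the main output gets uploaded.
  upload "$cache" qemu build-A
  # Build B (run because the debug output is missing) produces both outputs.
  upload "$cache" qemu build-B        # skipped, the path already exists
  upload "$cache" qemu-debug build-B  # uploaded, the path was missing
  printf 'qemu=%s qemu-debug=%s\n' "$(cat "$cache/qemu")" "$(cat "$cache/qemu-debug")"
  rm -rf "$cache"
}
```

Calling `simulate` prints `qemu=build-A qemu-debug=build-B`: the cache ends up serving outputs from two different builds, even though each build produced both outputs together. If the two builds are not bit-for-bit identical, a client that substitutes both paths at once still sees mismatched build-ids.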

> nix-build --check -A qemu is ok

Not all impurities are random or time-based. They can take input from the CPU architecture, the kernel version and more.
I see gcc crash every now and then. Where there's a crash, there's often opportunity for memory corruption. This has the potential to make an output unreproducible.
Nondeterminism is a nastier beast than randomness.

> hydra built /nix/store/...-qemu-7.2.0 and /nix/store/...-qemu-7.2.0-debug at the same time

So does my example, twice.

> I downloaded [...] at the same time

Compatible with the hypothesis.


2 participants