Preview: deterministic build #2281

Open: wants to merge 40 commits into base: master

Contributor

alexanderkjeldaas commented Apr 15, 2014

This is a set of changes that makes the system_tarball_pc derivation deterministic.

Stdenv bootstrap:

  • The stdenvs created during bootstrap are numbered to make them easier to debug and understand.
  • Bug fix: in the bootstrap stdenvs, binutils came after the bootstrap binaries in PATH, so the bootstrap tools were used instead of the new binutils.
  • Deterministic archives are enabled early during bootstrapping of stdenvs, as soon as a new binutils is available.
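For context on what "deterministic archives" means for the ar format: in this mode binutils writes a zero timestamp, zero uid/gid and a fixed file mode (644) into every member header, so the archive depends only on member names, order and contents. The minimal Python sketch below writes a common-format ar archive that way. It is an illustration, not binutils code: `write_deterministic_ar` is a hypothetical name, and the sketch does not handle long member names (GNU string table) or symbol tables.

```python
from typing import Iterable, Tuple

AR_MAGIC = b"!<arch>\n"


def _member_header(name: str, size: int) -> bytes:
    """Build a 60-byte ar member header with fixed (deterministic) metadata."""
    field_name = (name + "/").encode("ascii")  # GNU style: '/' terminates the name
    if len(field_name) > 16:
        raise ValueError("long member names need a GNU string table (not handled)")
    header = (
        field_name.ljust(16)
        + b"0".ljust(12)    # mtime: always 0 instead of the file's timestamp
        + b"0".ljust(6)     # uid: always 0
        + b"0".ljust(6)     # gid: always 0
        + b"644".ljust(8)   # mode, in octal text: always 644
        + str(size).encode("ascii").ljust(10)
        + b"`\n"            # header terminator
    )
    assert len(header) == 60
    return header


def write_deterministic_ar(members: Iterable[Tuple[str, bytes]]) -> bytes:
    """Serialize (name, data) pairs, in the given order, into ar archive bytes."""
    out = bytearray(AR_MAGIC)
    for name, data in members:
        out += _member_header(name, len(data))
        out += data
        if len(data) % 2:  # each member starts on an even offset
            out += b"\n"
    return bytes(out)
```

Member order is left to the caller, as with `ar` itself; only the per-member metadata is pinned.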

Stdenv builder:

  • The generic builder depends on libfaketime, which needs a permanent download location. libfaketime is currently distributed through GitHub, which does not work during bootstrap when SSL is not available. It should be moved to tarballs.nixos.org.
  • Feature: setting useFakeTime fixes the time during builds. This can break some builds, but it is easier to manage than patching. Additional environment variables are available for configuring libfaketime.
  • Feature: a fake "date" utility is prepended to PATH.
  • Check: The build directory (/tmp/nix-build-foo-x.y.z) is not allowed to appear in artifacts ($out) to avoid non-deterministic output.
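The $out check above amounts to searching every file (and symlink target) in the output for the build directory's path. A minimal Python sketch, assuming a plain byte-substring match is sufficient; `find_build_dir_references` is a hypothetical name, not the generic builder's actual hook:

```python
import os
from typing import List


def find_build_dir_references(out_dir: str, build_dir: str) -> List[str]:
    """Return paths (relative to out_dir) of entries that mention build_dir."""
    needle = os.fsencode(build_dir)
    hits: List[str] = []
    for root, dirs, files in os.walk(out_dir):
        dirs.sort()  # walk in a stable order so the report itself is deterministic
        for name in sorted(files):
            path = os.path.join(root, name)
            if os.path.islink(path):
                # A symlink pointing into the build directory leaks it too.
                if needle in os.fsencode(os.readlink(path)):
                    hits.append(os.path.relpath(path, out_dir))
                continue
            with open(path, "rb") as f:
                if needle in f.read():
                    hits.append(os.path.relpath(path, out_dir))
    return hits
```

A builder would fail the build when the returned list is non-empty, e.g. for a binary whose RPATH still points at /tmp/nix-build-foo-x.y.z.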

Gcc:

  • Gcc by default now defines __DATE__ and __TIME__ to correspond to (time_t)0.
  • TODO: -frandom-seed is not set. This is relevant for C++ code.
  • PGO (profile-guided optimization) is turned off for gcc. Note: there is no consensus on doing this, as it makes compilation slower.
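For reference, the C standard fixes the shape of these macros: `__DATE__` is "Mmm dd yyyy" with a space-padded day, and `__TIME__` is "hh:mm:ss". The Python sketch below shows what they expand to when the compiler's notion of "now" is pinned to a given epoch second. Interpreting that instant as UTC is an assumption (the PR text does not say), and `pinned_date_time` is a hypothetical helper; GCC 7 and later can take the pinned instant from SOURCE_DATE_EPOCH.

```python
import time
from typing import Tuple

# Month abbreviations used by __DATE__ (fixed by the C standard, not the locale).
_MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def pinned_date_time(epoch: int) -> Tuple[str, str]:
    """Return the (__DATE__, __TIME__) strings for a fixed epoch second, in UTC."""
    t = time.gmtime(epoch)
    date = f"{_MONTHS[t.tm_mon - 1]} {t.tm_mday:2d} {t.tm_year}"  # day padded with a space
    clock = f"{t.tm_hour:02d}:{t.tm_min:02d}:{t.tm_sec:02d}"
    return date, clock


print(pinned_date_time(0))  # ('Jan  1 1970', '00:00:00')
```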

Various changes:

  • The xsltproc utility creates random identifiers. A Perl post-processing stage was added to the NixOS manual build to normalize them; this could be generalized.
  • All uses of 'gzip -9' have been replaced with 'gzip -9n'.
  • Added cpio options for a default uid/gid, and a default mtime for tar.
  • File lists are sorted before being added to cpio/tar.
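The three archive changes above (no gzip name or timestamp, fixed ownership and mtime, sorted file lists) can be illustrated together with Python's standard `tarfile` and `gzip` modules. This is a sketch under assumptions, not the PR's cpio/tar patches: `deterministic_tar_gz` is a hypothetical name, the default mtime of 0 is an arbitrary choice, permission bits are kept as found on disk, and Python's gzip header is analogous to, but not byte-identical with, the output of `gzip -9n`.

```python
import gzip
import io
import os
import tarfile


def deterministic_tar_gz(src_dir: str, mtime: int = 0) -> bytes:
    """Pack src_dir into .tar.gz bytes that ignore timestamps, owners and readdir order."""

    def normalize(info: tarfile.TarInfo) -> tarfile.TarInfo:
        info.mtime = mtime            # fixed mtime instead of the file's own
        info.uid = info.gid = 0       # default owner ids
        info.uname = info.gname = ""  # no user/group names
        return info

    tar_buf = io.BytesIO()
    with tarfile.open(fileobj=tar_buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        for root, dirs, files in os.walk(src_dir):
            dirs.sort()  # makes os.walk descend in sorted order
            for name in sorted(dirs + files):  # sorted file list per directory
                path = os.path.join(root, name)
                tar.add(path, arcname=os.path.relpath(path, src_dir),
                        recursive=False, filter=normalize)

    gz_buf = io.BytesIO()
    # Analogue of `gzip -9n`: level 9, no stored file name, zero header timestamp.
    with gzip.GzipFile(filename="", mode="wb", compresslevel=9,
                       fileobj=gz_buf, mtime=0) as gz:
        gz.write(tar_buf.getvalue())
    return gz_buf.getvalue()
```

Touching a file's timestamp and re-running should then produce byte-identical output.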
Contributor

alexanderkjeldaas commented Apr 15, 2014

Known issues:

  1. The inode changes done in Perl seem to be buggy (boot problems), but they might not be needed.
  2. Some patches are not required because of the gcc-wrapper and libfaketime features.
  3. I'm not sure the overridden 'date' binary works correctly.

Member

shlevy commented Apr 16, 2014

I think Nix uses the equivalent of (time_t)1 for its file a/m/ctime changes.
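Resetting timestamps to a fixed value such as (time_t)1 can be sketched as a recursive walk. This Python version is an illustration, not Nix's implementation: `reset_timestamps` is a hypothetical name, symlinks are skipped (plain `os.utime` would follow them to their targets), and only atime and mtime are set, since ctime cannot be set directly from user space.

```python
import os


def reset_timestamps(root: str, when: int = 1) -> None:
    """Set atime and mtime of root and everything under it to `when` (epoch seconds)."""
    paths = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        paths.extend(os.path.join(dirpath, n) for n in dirnames + filenames)
    for path in paths:
        # Skip symlinks: os.utime would otherwise change their targets.
        if not os.path.islink(path):
            os.utime(path, (when, when))
```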

Member

domenkozar commented Apr 19, 2014

Wonderful. @alexanderkjeldaas, does that mean most of the other PRs can be closed?

Member

thoughtpolice commented Jun 6, 2014

@vcunat @alexanderkjeldaas Can we get some of these things merged on the pending stdenv branch? I know that we haven't solved the GCC question re: PGO, so we'll have to leave that commit out, but a lot of these changes are not very intrusive, and merging them would reduce burdens later and get us much closer to a deterministic build.

Member

vcunat commented Jun 9, 2014

Ah, I completely forgot about this series of work. I have tested the currently staged stdenv quite a bit, so I would merge it in about its current state (after Hydra verifies gcc on Darwin). I'll look at this afterwards, as I wanted to do another iteration of stdenv work (there were some other changes I missed this time).

vcunat added the enhancement label Jun 9, 2014

Contributor

alexanderkjeldaas commented Jun 9, 2014

@vcunat I'll just leave it as-is then I guess.

vcunat added the stdenv label Jun 9, 2014

thoughtpolice commented Jun 29, 2014

@alexanderkjeldaas I'm going to begin merging some of this work into HEAD soon. I'm probably not going to merge everything in one go, so feel free to rebase this when you get a chance. I'll update with what I've pushed upstream.

Contributor

alexanderkjeldaas commented Jun 30, 2014

I've rebased

Contributor

alexanderkjeldaas commented Jul 1, 2014

I've added a minor fix for Python 2.7.7 that I forgot to cherry-pick from my internal branch.

Member

7c6f434c commented Aug 30, 2014

So, what is the status of cherry-picking? GitHub doesn't easily show this, unfortunately…

Obviously, this will never go directly into master (only into staging), and it will get cherry-picked in small pieces.

I actually support reproducibility, although some people seem to like PGO too much…

Member

vcunat commented Aug 30, 2014

PGO is probably the only questionable thing here, IIRC. I'm planning to really review and test this within the next 10 days.

alexanderkjeldaas added some commits Oct 1, 2013

Make glibc compilation more pure.
Remove datetime from nscd.
Make the linux bootstrap environments more deterministic.
This includes two changes:
1) Fix a bug where the bootstrap-tools is always used instead of binutils
2) Enable strip --enable-deterministic-archives as soon as a new binutils
   is available.
Improve python library determinism.
1) Make the core python libraries deterministic.
2) Make the python libraries created by glib deterministic.
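One well-known source of nondeterminism in Python libraries is the `.pyc` header: in the classic timestamp mode it records the source file's mtime. Since Python 3.7 (PEP 552, well after these commits), `py_compile` can write hash-based headers instead, and it defaults to them when SOURCE_DATE_EPOCH is set. The sketch below compares the two modes; it is an illustration, not what the commits above do, `pyc_header` is a hypothetical helper, and it compares only the 16-byte header because the marshalled code after it also embeds the source path, which differs between temporary directories.

```python
import os
import py_compile
import tempfile

TS = py_compile.PycInvalidationMode.TIMESTAMP
HASH = py_compile.PycInvalidationMode.CHECKED_HASH


def pyc_header(source: str, mtime: int, mode: py_compile.PycInvalidationMode) -> bytes:
    """Compile `source` from a file whose mtime is `mtime`; return the 16-byte pyc header."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "mod.py")
        with open(src, "w") as f:
            f.write(source)
        os.utime(src, (mtime, mtime))
        pyc = py_compile.compile(src, cfile=os.path.join(tmp, "mod.pyc"),
                                 doraise=True, invalidation_mode=mode)
        with open(pyc, "rb") as f:
            # magic (4) + flags (4) + either mtime and size (4 + 4) or a source hash (8)
            return f.read(16)
```

With TS the header changes whenever the source mtime changes; with HASH it depends only on the source bytes.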

Contributor

bjornfor commented Oct 10, 2015

What's the status of this PR? (I don't know if/how I can help, but I'm definitely interested in deterministic builds.)

Member

vcunat commented Oct 11, 2015

I cherry-picked some of the commits (long ago). Some packages remained non-deterministic when I tested them so it wasn't clear to me whether the extra complexity was worth it.

Contributor

Mathnerd314 commented Oct 11, 2015

I rebased the patches: https://github.com/NixOS/nixpkgs/compare/master...Mathnerd314:deterministic-patches?expand=1

Skipped:

  • These looked like they were random untested changes:
    • python 2.7.7 updates to deterministic builds.
    • Make python 2.7 deterministic.
    • Make syslinux deterministic.
    • Disable useFakeTime for smartmontools.
    • Set useFakeTime on a set of derivations.
    • Make smartmontools deterministic.
    • Add useFakeTime for python, groff, kernel.
    • perl-modules: Do not create perllocal.pod, for determinism.
    • Set the linux kernel timestamp properly.
  • Not patching elf binaries result in non-deterministic builds.
    • This seemed like it could just be a separate PR
  • Change some fixed timestamp to != (time_t)0
    • unclear why these are needed, I did miss a line though
  • Add a fake date utility together with setup.
    • I think just adding a date thing to buildInputs should work
  • Remove dates from kernel 3.10.35
    • Ancient kernel, I don't care about it
  • Add atomic-ops package.
    • already packaged
  • Use real address for __DATE__ and __TIME__.
    • this went to gcc_wrapper_old; I think current gcc wrapper patches these just fine
  • these should be fixed by just using a fake date utility:
    • Make openssl deterministic.
    • Improve determinism for libgcrypt, libgpg-error, and busybox.
    • Make perl 5.16 binary deterministic.
    • Fix date in version string.
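A fake `date` of the kind suggested above only needs to print a fixed instant instead of the wall clock, and to sit ahead of the real one on PATH. A minimal Python sketch: reading SOURCE_DATE_EPOCH (the reproducible-builds convention joachifm mentions later in this thread) and falling back to 1 are assumptions, and only the no-argument output of `date` in the C locale is imitated, so real `date` options are ignored.

```python
import os
import time
from typing import Mapping

_DAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")  # struct_time.tm_wday: Monday is 0
_MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def fake_date(env: Mapping[str, str] = os.environ) -> str:
    """Format a fixed instant the way `date` prints it with no arguments, in UTC."""
    epoch = int(env.get("SOURCE_DATE_EPOCH", "1"))  # fallback of 1 is an assumption
    t = time.gmtime(epoch)
    return (f"{_DAYS[t.tm_wday]} {_MONTHS[t.tm_mon - 1]} {t.tm_mday:2d} "
            f"{t.tm_hour:02d}:{t.tm_min:02d}:{t.tm_sec:02d} UTC {t.tm_year}")


if __name__ == "__main__":
    # Arguments such as +%Y are ignored by this sketch.
    print(fake_date())
```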

Member

domenkozar commented Nov 3, 2015

@Mathnerd314 let's open a PR and merge those?

Member

copumpkin commented Dec 7, 2015

@Mathnerd314 @domenkozar did you end up creating that PR?

vcunat removed their assignment Dec 7, 2015

Member

copumpkin commented Jan 16, 2016

> Gcc by default now defines __DATE__ and __TIME__ to be (time_t)0.

I think @edolstra just recently changed this behavior in some builds which might conflict with this.

Member

copumpkin commented Jan 16, 2016

81e530a is the commit I'm talking about.

Member

jagajaga commented Mar 4, 2016

Ping all.

Contributor

siddharthist commented Oct 6, 2016

@Mathnerd314 Are you still planning to open that PR?

Contributor

spacekitteh commented Nov 4, 2016

@grahamc tag as security

Contributor

joachifm commented Dec 19, 2016

Note that 3157dbe is probably no longer required due to tytso/e2fsprogs@a2143b5

Contributor

joachifm commented Dec 19, 2016

I've experimented with patching gzip -9 -> gzip -9n, but it doesn't seem to matter (to nix-build --check, anyway). Could this be because we now fix timestamps after unpacking sources? Or are there still reasons for doing it, even with SOURCE_DATE_EPOCH?

Contributor

joachifm commented Dec 20, 2016

Hm, I guess gzip -n still makes sense if the thing being compressed was created as part of the build.
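The reason `-n` still matters for files created during the build is the gzip header itself: per RFC 1952 it stores a 4-byte modification time at offset 4 (and, without `-n`, the original file name). A small Python check of the timestamp part, using the standard `gzip` module's `mtime` argument to stand in for the timestamp `gzip` would record; `gzip_bytes` and `header_mtime` are hypothetical helpers:

```python
import gzip


def gzip_bytes(payload: bytes, mtime: int) -> bytes:
    """Compress `payload` at level 9, recording `mtime` in the header (as gzip does without -n)."""
    return gzip.compress(payload, compresslevel=9, mtime=mtime)


def header_mtime(blob: bytes) -> int:
    """Return the MTIME field of a gzip member: 4 little-endian bytes at offset 4."""
    return int.from_bytes(blob[4:8], "little")
```

Two compressions of the same generated file made a day apart differ only in those four bytes; pinning the field (what `-n` does for the CLI) removes the difference.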

Contributor

joachifm commented Dec 20, 2016

I've been going through most of these. A brief summary so far:

I think the following are obsolete:

  • smartmontools passes nix-build --check, and the string that is patched out in fd1101a no longer occurs in the source.
  • glibc currently contains a patch that removes datetime from nscd.
  • the openldap patch is no longer relevant, I think: the build no longer skips ELF patching, and it passes --check.
  • groff and opensp pass --check on my end.
  • the improvements to the Linux stdenv are at least partially covered (strip is called with --enable-deterministic-archives).

The following need more work:

  • syslinux fails --check, but the patch in this PR is insufficient.
  • as reported previously, the manual is still non-deterministic.

I've not looked at libgpg-error, busybox, python, perl, or the gcc stuff, nor the general libfaketime support.

Contributor

joachifm commented Dec 20, 2016

Regarding the manual, https://wiki.debian.org/ReproducibleBuilds/ExperimentalToolchain#libxslt indicates that the issue of random ids is fixed/worked on upstream, so perhaps we want to just skip that for now.
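Until a fixed libxslt is available, the post-processing approach from the original PR can be sketched as a renaming pass: number each generated identifier by its first appearance and substitute it everywhere, including in links. This works when two builds emit the same ids in the same order with only the random values changing. The `idm` plus digits pattern is a hypothetical stand-in for the real generated ids, and this Python version is not the PR's Perl script, which this thread does not show.

```python
import re

# Hypothetical pattern for generated ids; adjust it to the real xsltproc output.
GENERATED_ID = re.compile(r"\bidm\d+\b")


def stabilize_ids(html: str) -> str:
    """Replace each distinct generated id with 'gen-N', numbered by first appearance."""
    mapping: dict = {}

    def rename(match: "re.Match[str]") -> str:
        old = match.group(0)
        if old not in mapping:
            mapping[old] = f"gen-{len(mapping) + 1}"
        return mapping[old]

    # re.sub visits matches left to right, so numbering follows document order.
    return GENERATED_ID.sub(rename, html)
```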

Contributor

cleverca22 commented Dec 20, 2016

Something that can help stress-test determinism: http://manpages.ubuntu.com/manpages/xenial/man1/disorderfs.1.html

This is a FUSE filesystem that randomizes the order of files in a directory, so you can't accidentally rely on the filesystem returning them in the same order, as it does most of the time.
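The same idea can be approximated in a unit test without FUSE: feed a build step shuffled directory listings and check that its output never changes. In this Python sketch `build_manifest` is a hypothetical build step that sorts its input, and the seeded shuffle only imitates what disorderfs does to directory listings.

```python
import hashlib
import random
from typing import Callable, Dict, List


def build_manifest(entries: List[str], contents: Dict[str, bytes]) -> str:
    """Hypothetical build step: one 'sha256  name' line per file, in sorted order."""
    lines = [f"{hashlib.sha256(contents[name]).hexdigest()}  {name}"
             for name in sorted(entries)]  # sorting removes the listing-order dependence
    return "\n".join(lines) + "\n"


def is_order_independent(step: Callable[[List[str]], str],
                         entries: List[str], trials: int = 20, seed: int = 0) -> bool:
    """Run `step` on shuffled copies of `entries`; True if every output matches."""
    rng = random.Random(seed)
    expected = step(list(entries))
    for _ in range(trials):
        shuffled = list(entries)
        rng.shuffle(shuffled)
        if step(shuffled) != expected:
            return False
    return True
```

A step that joins names in whatever order it receives them fails this check, while `build_manifest` passes it.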

Contributor

joachifm commented Dec 21, 2016

Some of the perl stuff was lost when perl16 was removed, but otherwise it seems like @vcunat (or whomever) picked up most of the specific package fixes. I think we're left with gcc, faketime in stdenv, stdenv numbering, and the fake date command thing, all of which seem like they could be profitably dealt with on their own.

Member

vcunat commented Dec 21, 2016

IIRC, I tried hard to pick whatever I could verify or make clearly advantageous. For some issues (like PGO) I didn't succeed in a reasonable amount of time, so I left those behind.

cbarrett commented Apr 12, 2018

FYI this PR is being linked to from https://reproducible-builds.org/who/. I volunteer to ping whoever's necessary to get an update posted (don't have the knowledge for more, unfortunately).

Contributor

Ekleog commented Sep 26, 2018

(triage) My reading of the comments in this thread is that most of the changes either have been merged or are no longer needed. The remaining changes that would require being split out to separate PRs would be:

Does that sound correct to those actually involved?
