$SOURCE_DATE_EPOCH is always 1 if src= points to a local directory #112595

Closed
kvtb opened this issue Feb 9, 2021 · 23 comments

Comments

@kvtb
Contributor

kvtb commented Feb 9, 2021

I don't think this is a bug to fix; it's more something to brainstorm.

When a derivation's src points to a tarball (e.g. one fetched with fetchurl), the source files keep their timestamps.
But with something like src = ./dev/directory, the directory is first copied into the Nix store, and in that step all timestamps are set to 1970-01-01T00:00:01Z (Unix time 1).

It might look like a minor inconvenience, but it has some bad consequences.
For example, $SOURCE_DATE_EPOCH is not set to the timestamp of the newest source file; it stays at "1" no matter how the source changes. That causes various caching issues (e.g. embedded web servers always sending Last-Modified: Thu, 01 Jan 1970 00:00:01 GMT) with src=./dev/directory, while the issues disappear with src=fetchurl{url="....../sources.tar.gz";}
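A rough sketch of the "newest source file" rule makes the collapse to "1" concrete. `newest_mtime` below is a hypothetical helper that only mimics the idea behind nixpkgs' set-source-date-epoch-to-latest.sh (it is not that hook's code), and it assumes GNU find, stat and touch.

```shell
# Hypothetical helper (not the nixpkgs hook): print the mtime, in whole
# seconds since the epoch, of the newest regular file under directory $1,
# or 0 if there are no regular files. Assumes GNU find and GNU stat.
newest_mtime() {
  newest=$(find "$1" -type f -exec stat -c %Y {} + | sort -n | tail -n 1)
  echo "${newest:-0}"
}

# Demo: two files with real timestamps (GNU touch accepts "@<seconds>").
demo=$(mktemp -d)
touch -d @1600000000 "$demo/old.c"
touch -d @1612900000 "$demo/new.c"
newest_mtime "$demo"    # prints 1612900000

# Simulate copying the directory into the Nix store: every file ends up
# with mtime 1, so the same rule can only ever report 1.
find "$demo" -type f -exec touch -d @1 {} +
newest_mtime "$demo"    # prints 1
rm -rf "$demo"
```

Since every file in a directory copied into the store carries mtime 1, any rule of the form "take the newest mtime" can only ever produce 1 there.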

It would be nice to find a way to skip copying the src directory to the Nix store.
Or, at least, to copy it there in a form that preserves timestamps (a tarball?).

@kvtb
Contributor Author

kvtb commented Feb 9, 2021

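  # Needs allow-unsafe-native-code-during-evaluation = true (builtins.exec is impure).
  tarballDir = dir: builtins.storePath (builtins.exec ["/bin/sh" "-c" ''
    set -e
    tmp=$(mktemp -d)
    # Pack the directory (timestamps included) into one tarball; tar's listing goes to stderr.
    >&2 tar -C ${lib.escapeShellArg (builtins.dirOf dir)} -cvf "$tmp/result.tar" ${lib.escapeShellArg (builtins.baseNameOf dir)}
    # Add the tarball to the store once and reuse the resulting path.
    path=$(nix add-to-store "$tmp/result.tar")
    rm -rf "$tmp"
    >&2 echo "tarballDir: $path"
    # builtins.exec parses stdout as a Nix expression.
    echo -n "{ memo = \"$path\"; }"
  '']).memo;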

and then

   src = tarballDir "/dev/directory";

does the trick, although it is not nice

@kvtb
Contributor Author

kvtb commented Feb 9, 2021

fetchgit erases timestamps too

@veprbl
Member

veprbl commented Feb 10, 2021

Looks similar to #25485

@kvtb
Contributor Author

kvtb commented Feb 10, 2021

> Looks similar to #25485

In some sense, yes (web server headers are used as an example there too).

But that issue is about missing timestamps in derivation outputs, while this one is about timestamps being lost along the
(src directory) -> (Nix store) -> ($NIX_BUILD_TOP) path,
where the intermediate Nix store step is not really necessary and, even if used, could preserve timestamps.

@veprbl
Member

veprbl commented Feb 10, 2021

@kvtb It is not the src that puts things into the nix store. It happens whenever you are trying to interpolate the path object (in this case ./dev/directory) into a string or derivation. So this behaviour has nothing to do with nixpkgs; it is a Nix thing. I'm not sure whether paths in the Nix store are allowed to have specific timestamps; they would be stripped anyway when transferred as NAR archives.

@kvtb
Contributor Author

kvtb commented Feb 10, 2021

> @kvtb It is not the src that puts things into the nix store. It happens whenever you are trying to interpolate the path object (in this case ./dev/directory) into a string or derivation.

I see, but "interpolation of path object" is not mandatory. The src does not have to be transferred via Nix Store or NAR archives.

For example:

  • my hacky tarballDir does not do that
  • fetchFromGitHub might refrain from unpacking
  • fetchgit might produce .tar too, ...

There is at least one alternative: to keep sources in tarballs, so they can transit via Nix Store or NAR without losing timestamps.

I cannot accept the argument that erasing timestamps is inevitable, because

  • many derivations use fetchurl and do not erase timestamps (so switching fetchurl<->fetchgit changes this behavior)
  • the whole logic around $SOURCE_DATE_EPOCH (which is set to the timestamp of the newest source file) relies on the timestamps of source files not being erased.

@kvtb
Contributor Author

kvtb commented Feb 10, 2021

To summarize, the question is: should unpacked sources be a Nix derivation?

The current answer is "sometimes" (depending on whether one uses fetchurl or fetchgit), and the consequences are:

  1. $SOURCE_DATE_EPOCH works only sometimes; other times it is set to "1".
  2. Sometimes there are source files dated before 1980, and there are tons of workarounds for that (which suddenly become necessary right after switching from src=fetchurl{...} to src=fetchgit{...} or src=./dev/dir).
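The year-1980 trouble typically comes from formats such as ZIP, whose MS-DOS date fields cannot represent anything before 1980 (Python's zipfile, for instance, refuses such timestamps). A hypothetical helper for spotting affected files in an unpacked source tree could look like this; it assumes GNU find, stat and touch, and uses 315532800, the Unix time of 1980-01-01T00:00:00Z, as the cutoff.

```shell
# Hypothetical helper: list regular files under $1 whose mtime is earlier
# than 1980-01-01T00:00:00Z (Unix time 315532800), i.e. files whose dates
# a ZIP archive's MS-DOS timestamps cannot represent.
# Assumes GNU find and GNU stat.
DOS_EPOCH=315532800

pre1980_files() {
  find "$1" -type f -exec stat -c '%Y %n' {} + |
    while read -r mtime name; do
      if [ "$mtime" -lt "$DOS_EPOCH" ]; then
        echo "$name"
      fi
    done
}

# Demo: one file as it looks after a store copy, one from a tarball.
tree=$(mktemp -d)
touch -d @1 "$tree/from-store.c"
touch -d @1612900000 "$tree/from-tarball.c"
pre1980_files "$tree"    # lists only .../from-store.c
rm -rf "$tree"
```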

@veprbl
Member

veprbl commented Feb 10, 2021

> I see, but "interpolation of path object" is not mandatory. The src does not have to be transferred via Nix Store or NAR archives.

The behaviour of Nix-language paths is a design choice to ensure that instantiation hashes all inputs and that builds are reproducible. There is no special handling for the src attribute; you can always pass an impure path as a string, src = "/path/to/foo", like you did to implement your tarballDir. That will not work in sandbox and in remote builds.

@kvtb
Contributor Author

kvtb commented Feb 10, 2021

> That will not work in sandbox and in remote builds.

Not a showstopper at all: the whole tarballDir might be implemented in C++ as builtins.tarballDir (next to builtins.readDir). And it does not need to work in the sandbox or in remote builds, simply because it runs when nix-instantiate runs, before nix-build with its sandboxes and remote builds.

But these are implementation details. Even src=./dev/dir is just a special case, not the key point of the problem, and it can be left out of consideration if it is that confusing. The problem can be shown without src=./dev/dir at all: the two cases (preserving and erasing timestamps) can be demonstrated with fetchurl vs. fetchgit.

Both kinds of build are reproducible, but they differ: in the first, $SOURCE_DATE_EPOCH works and there are no problems with pre-1980 dates; in the second, $SOURCE_DATE_EPOCH is always "1" and file timestamps are always in 1970, causing problems with Python, browser caching, etc.

Do we really need both?

I would keep only the first, postulating that file times are an important part of the sources and thus unpacked sources should not be stored in the Nix store. (I also think this would improve Nix store performance, as millions of small files are slow to chmod, chown, reset timestamps on, hash, deduplicate, delete, ...)

@veprbl
Member

veprbl commented Feb 10, 2021

The fetchurl creates a "flat" fixed-hash store path with a single file, and the dates are recovered from the data in that file by tar/unzip during the unpackPhase. The fetchgit creates a "recursive" fixed-hash store path (aka a directory) and the metadata for its files is stripped from the filesystem, so it is already gone by the time when the SOURCE_DATE_EPOCH logic kicks in.
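The flat versus recursive distinction can be reproduced outside Nix with plain tar. In this sketch (assuming GNU tar, stat and touch; the directory names and the `mtime_of` helper are illustrative), the archive file's own mtime is reset to 1, just as any store path's would be, yet the entry times recorded inside the archive come back on extraction. A copied tree whose mtimes are reset has nothing left to recover.

```shell
# Print the mtime (seconds since the epoch) of path $1. Assumes GNU stat.
mtime_of() { stat -c %Y "$1"; }

work=$(mktemp -d)
mkdir "$work/src"
echo 'int main(void) { return 0; }' > "$work/src/main.c"
touch -d @1612900000 "$work/src/main.c"

# "Flat" case: one archive file. Its own mtime is reset, but tar keeps the
# per-entry mtimes inside the archive data and restores them on extraction.
tar -C "$work" -cf "$work/src.tar" src
touch -d @1 "$work/src.tar"
mkdir "$work/unpacked"
tar -C "$work/unpacked" -xf "$work/src.tar"
mtime_of "$work/unpacked/src/main.c"    # prints 1612900000

# "Recursive" case: the tree itself is copied and every mtime is reset,
# so the original date is gone before any build phase can look at it.
cp -R "$work/src" "$work/copied"
find "$work/copied" -exec touch -d @1 {} +
mtime_of "$work/copied/main.c"          # prints 1

rm -rf "$work"
```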

@veprbl
Member

veprbl commented Feb 10, 2021

> the whole tarballDir might be implemented in C++ as builtins.tarballDir

I guess you could do that. But then you would need your server to serve files from inside that tarball?

@kvtb
Contributor Author

kvtb commented Feb 10, 2021

> The fetchurl creates a "flat" fixed-hash store path with a single file, and the dates are recovered from the data in that file by tar/unzip during the unpackPhase. The fetchgit creates a "recursive" fixed-hash store path (aka a directory) and the metadata for its files is stripped from the filesystem, so it is already gone by the time when the SOURCE_DATE_EPOCH logic kicks in.

Exactly!
Those are the two cases I am trying to describe.

Do we need both (metadata-preserving and metadata-erasing) ways to work with the sources?

@veprbl
Member

veprbl commented Feb 10, 2021

Hey, I did not say that, why do you edit my comments?

My bad, used wrong buttons

@kvtb
Contributor Author

kvtb commented Feb 10, 2021

> the whole tarballDir might be implemented in C++ as builtins.tarballDir
>
> I guess you could do that. But then you would need your server to serve files from inside that tarball?

No, why?
fetchurl'ed tarballs are unpacked on unpackPhase, fetchgit'ed are copied.

@veprbl
Member

veprbl commented Feb 10, 2021

> the whole tarballDir might be implemented in C++ as builtins.tarballDir
>
> I guess you could do that. But then you would need your server to serve files from inside that tarball?
>
> No, why?
> fetchurl'ed tarballs are unpacked on unpackPhase, fetchgit'ed are copied.

If you unpack them, move them to $out then Nix will just remove timestamps again.

@kvtb
Contributor Author

kvtb commented Feb 10, 2021

> No, why?
> fetchurl'ed tarballs are unpacked on unpackPhase, fetchgit'ed are copied.
>
> If unpack them, move them to $out then Nix will just remove timestamps again.

They are not moved to $out, but to $NIX_BUILD_TOP, to run patchPhase, configurePhase, etc.

Imagine a world like ours, but where fetchFromGitHub does not unpack tarballs and fetchgit makes a tarball after git clone (let's keep the src=./dev/dir case aside for a while, as it might require some C++ coding).
Then all fetch* functions are consistent.

@veprbl
Member

veprbl commented Feb 10, 2021

So you run your server inside the build?

@kvtb
Contributor Author

kvtb commented Feb 10, 2021

> So you run your server inside the build?

What server?

@veprbl
Member

veprbl commented Feb 10, 2021

The one that allows you to experience the caching issues. Or what is the actual motivation for keeping the timestamps?

@kvtb
Contributor Author

kvtb commented Feb 10, 2021

> The one that allows you to experience the caching issues. Or what is the actual motivation for keeping the timestamps?

The motivation is to make all fetch* functions consistent: either all of them preserve timestamps or all of them erase them.
I would agree with either outcome.
Choosing one reduces the chaos.

I just see preserving timestamps as superior, because it makes $SOURCE_DATE_EPOCH meaningful (the timestamp of the newest source file, https://github.com/NixOS/nixpkgs/blob/master/pkgs/build-support/setup-hooks/set-source-date-epoch-to-latest.sh, not "a timestamp or 1, depending on your fetcher") and eliminates the year-1980 issue.
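To make the web-server symptom concrete: a build step that bakes $SOURCE_DATE_EPOCH into a Last-Modified header would format it as an HTTP-date roughly as follows. `http_date` is a hypothetical helper and assumes GNU date.

```shell
# Hypothetical helper: format a Unix timestamp as an HTTP-date, the format
# used by the Last-Modified header. Assumes GNU date; LC_ALL=C keeps the
# day and month names in English regardless of the build's locale.
http_date() {
  LC_ALL=C date -u -d "@$1" '+%a, %d %b %Y %H:%M:%S GMT'
}

http_date 1             # prints Thu, 01 Jan 1970 00:00:01 GMT
http_date 1612900000    # prints Tue, 09 Feb 2021 19:46:40 GMT
```

Inside a build, something like `http_date "$SOURCE_DATE_EPOCH"` would then yield the 1970 date whenever the sources came from a recursive store path.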

@kvtb
Contributor Author

kvtb commented Feb 10, 2021

In one case, set-source-date-epoch-to-latest.sh could go away, since it would always produce "1"; in the other, all the year-1980 workarounds could go away.
A cleanup and a win-win, don't you agree?

@raboof
Member

raboof commented Sep 20, 2023

Not for local directories, but if you're interested in SOURCE_DATE_EPOCH you might be interested in #256270


3 participants