Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stdenv: make symlinks that refer to the same output relative #74253

Merged
merged 1 commit into from Jan 15, 2020

Conversation

@andir
Copy link
Member

andir commented Nov 26, 2019

Motivation for this change

While looking at the graph of all the outputs in my personal binary
cache it became obvious that we have a lot of self references within the
package set. That isn't an isssue by itself. However it increases the
size of the binary cache for every (reproducible) build of a package
that carries references to itself. You can no longer deduplicate the
outputs since they are all unique. One of the ways to get rid of (a few)
references is to rewrite all the symlinks that are currently used to be
relative symlinks. Two buildd of something that didn't really change but
carry each a self-reference can the be stored as the same NAR file again.

I quickly hacked together this change to see if that would yield and
success. My bash scripting skills are probably not great but so far it
seem to somewhat work.

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS linux)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change (only all of those in the tested set)
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Ensured that relevant documentation is up to date
  • Fits CONTRIBUTING.md.
Notify maintainers

cc @Ericson2314

@andir

This comment has been minimized.

Copy link
Member Author

andir commented Nov 26, 2019

Are there valid use cases where this is not desirable? I was thinking of providing an escape hatch if there are any...

@7c6f434c

This comment has been minimized.

Copy link
Member

7c6f434c commented Nov 26, 2019

Copy link
Contributor

B4dM4n left a comment

The concept sounds nice, but I found two issues in the bash script.

Are there valid use cases where this is not desirable? I was thinking of providing an escape hatch if there are any...

I can definitely see issues with scripts that expect an absolute path from readlink and will be broken by this change, so being able to disable this hook would be nice.

local relativeTarget
while IFS= read -r -d $'\0' f; do
symlinkTarget=$(readlink "$f")
if [ "${symlinkTarget:0:${#prefix}}" != "$prefix" ]; then

This comment has been minimized.

Copy link
@B4dM4n

B4dM4n Nov 26, 2019

Contributor

This check has the potential to produce false negatives when two outputs share the same prefix:

/nix/store
├── 1111-out
│  └── file -> /nix/store/1111-out2/1234
└── 1111-out2
   └── 1234

In this case /nix/store/1111-out/file would be rewritten to file -> 1234.

Suggested change
if [ "${symlinkTarget:0:${#prefix}}" != "$prefix" ]; then
if [[ "$symlinkTarget"/ != "$prefix"/* ]]; then

This comment has been minimized.

Copy link
@andir

andir Nov 26, 2019

Author Member

Thanks! While that technically should never happen within /nix/store that might be a good safeguard.

I'll add that to the next version.

relativeTarget="${symlinkTarget:${#prefix}}"
echo "rewriting symlink $f to be relative to $prefix"

rm "$f"
ln -rs "$prefix/$relativeTarget" "$f"
Comment on lines 16 to 20

This comment has been minimized.

Copy link
@B4dM4n

B4dM4n Nov 26, 2019

Contributor

The ln destination should be "$prefix/$f".
Never mind, f is already absolute. Updated code block below.

To simplify the process this block can be replaced with these ln flags:

        echo "rewriting symlink $f to be relative to $prefix"
        ln -snrf "$symlinkTarget" "$f"

as ln -r produces the minimal possible relative link.

@conferno

This comment has been minimized.

Copy link

conferno commented Nov 26, 2019

If we are going to find all symlinks in $out and resolve them, it would be nice to add a single code line which would fail the build with a meaningful error message if the target (file or dir at the location a symlink points to) does not exist.
There were a couple of bugs in nixpkgs history when dangling symlinks appeared after a blind version bump. It did not fail the build postponing the problem.

@conferno

This comment has been minimized.

Copy link

conferno commented Nov 26, 2019

Are there valid use cases where this is not desirable? I was thinking of providing an escape hatch if there are any...
I guess a weird enough upstream build system of a reverse depedency could copy something to its own installation directory, which was OK when added (back when it was a plain file), works with absolute symlinks but might break for relative ones with some flags? This sounds weird, but maybe something close is posible.

I have seen these bugs in the wild:

  1. relative readlink'ed path resolved using cwd as the base directory
  2. resolving chain of symlinks using the same base directory of the first symlink in the chain

Using absolute paths in symlinks brings some redundancy against those bugs

@7c6f434c

This comment has been minimized.

Copy link
Member

7c6f434c commented Nov 26, 2019

@andir

This comment has been minimized.

Copy link
Member Author

andir commented Nov 26, 2019

If we are going to find all symlinks and resolve them, it would be nice to fail the build with a meaningful error message if the target (file or dir at the location a symlink points to) does not exist.
… and also to provide and ways way to opt out of this behaviour. I do have some uses for symlinking to nonexistent files; for example when something wants to write to some path that is in store (but can be redirected to a proper location via a symlink)

Please note that I am only checking for references into the same output. So in general those shouldn't be the problem. You can still create arbitrary symlinks to directories that do not exist as long as they do not point into the output. Would that work for you?

I feel like the enforcement of valid symlinks is yet another topic compared to what I wanted to accomplish here.

@7c6f434c

This comment has been minimized.

Copy link
Member

7c6f434c commented Nov 26, 2019

@conferno

This comment has been minimized.

Copy link

conferno commented Nov 27, 2019

If we are going to find all symlinks and resolve them, it would be nice to fail the build with a meaningful error message if the target (file or dir at the location a symlink points to) does not exist.
… and also to provide and ways way to opt out of this behaviour. I do have some uses for symlinking to nonexistent files; for example when something wants to write to some path that is in store (but can be redirected to a proper location via a symlink)

Ok, then just print a warning visible in ofborg logs.

@conferno

This comment has been minimized.

Copy link

conferno commented Nov 27, 2019

I feel like the enforcement of valid symlinks is yet another topic compared to what I wanted to accomplish here.

Yes, it is another topic, but adding the warning is +1 line (and zero performance penalty) in your hook, while making a dedicated hook for the enforcement would be a controvercial issue mainly for its performace implications

@conferno

This comment has been minimized.

Copy link

conferno commented Nov 27, 2019

Ah, right, same-output symlinks stay broken or valid after installation, and the situation where existence of a broken symlink to a nonexistent location is preferrable to plain absence of a directory entry sounds like a small component part of an unspeakable horror.

Dangling symlinks within same output could be OK in derivations "built" by fetchzip or fetchgit (llvm sources has some among the tests), so yes, failing hard is not acceptable, but printing a warning would be nice

@andir andir force-pushed the andir:relative-links branch from 11a151e to c6c26e7 Nov 27, 2019
@andir

This comment has been minimized.

Copy link
Member Author

andir commented Nov 27, 2019

I added some of the review comments. I removed the hard failure on dangling symlinks and just left the warning in there. Does that make sense?

I think failing on invalid symlinks is really a more complicated topic that we shouldn't try to solve right now. Warning is probably okay but failing is yet another different topic from this approach. We can add it to this hook in another PR once there is consensus about the desired behavior.

@andir

This comment has been minimized.

Copy link
Member Author

andir commented Nov 27, 2019

@GrahamcOfBorg build bash

continue
fi

if [ ! -e "$symlinkTarget" ]; then

This comment has been minimized.

Copy link
@conferno

conferno Nov 28, 2019

I think this block (lines 20-22) should preceed the block above (lines 15-18) so sanity of external symlinks will be checked too.

This comment has been minimized.

Copy link
@andir

andir Nov 28, 2019

Author Member

I don't think that is a good idea. A brief look at nixpkgs showed that a bunch of packages point symlinks at /etc/$foo/ or to /run/opengl-drivers and those clearly do not exist in the build sandbox. Expecting every single of those packages to opt out of this is not really a great solution. This should be an incremental change and not a roadblock.

This comment has been minimized.

Copy link
@conferno

conferno Nov 28, 2019

it is only printing a warning.
symlinks to /etc/$foo/ also deserve to be warned about, it is a potential problem with Nix-on-{Debian,Fedora,Gentoo,...} where /etc/$foo/ might differ from that on NixOS

@FRidh FRidh added this to Needs review in Staging Nov 30, 2019
@FRidh FRidh added this to the 20.03 milestone Dec 10, 2019
@FRidh

This comment has been minimized.

Copy link
Member

FRidh commented Jan 6, 2020

Can this go in?

While looking at the graph of all the outputs in my personal binary
cache it became obvious that we have a lot of self references within the
package set. That isn't an isuse by itself. However it increases the
size of the binary cache for every (reproducible) build of a package
that carries references to itself. You can no longer deduplicate the
outputs since they are all unique. One of the ways to get rid of (a few)
references is to rewrite all the symlinks that are currently used to be
relative symlinks. Two build of something that didn't really change but
carries a self-reference can the be store as the same NAR file again.

I quickly hacked together this change to see if that would yield and
success. My bash scripting skills are probably not great but so far it
seem to somewhat work.
@andir andir force-pushed the andir:relative-links branch from c6c26e7 to 437b024 Jan 6, 2020
@andir

This comment has been minimized.

Copy link
Member Author

andir commented Jan 6, 2020

@FRidh Looks good to me :-) If @Ericson2314 is now happy I see no reason why we can't merge this.

@FRidh FRidh requested a review from Ericson2314 Jan 6, 2020
@FRidh FRidh merged commit cb007e6 into NixOS:staging Jan 15, 2020
16 checks passed
16 checks passed
stdenv on x86_64-darwin Timed out, unknown build status
Details
Evaluation Performance Report Evaluator Performance Report
Details
grahamcofborg-eval ^.^!
Details
grahamcofborg-eval-check-maintainers matching changed paths to changed attrs...
Details
grahamcofborg-eval-check-meta config.nix: checkMeta = true
Details
grahamcofborg-eval-darwin nix-instantiate --arg nixpkgs { outPath=./.; revCount=999999; shortRev="ofborg"; } ./pkgs/top-level/release.nix -A darwin-tested
Details
grahamcofborg-eval-nixos nix-instantiate --arg nixpkgs { outPath=./.; revCount=999999; shortRev="ofborg"; } ./nixos/release-combined.nix -A tested
Details
grahamcofborg-eval-nixos-manual nix-instantiate --arg nixpkgs { outPath=./.; revCount=999999; shortRev="ofborg"; } ./nixos/release.nix -A manual
Details
grahamcofborg-eval-nixos-options nix-instantiate --arg nixpkgs { outPath=./.; revCount=999999; shortRev="ofborg"; } ./nixos/release.nix -A options
Details
grahamcofborg-eval-nixpkgs-manual nix-instantiate --arg nixpkgs { outPath=./.; revCount=999999; shortRev="ofborg"; } ./pkgs/top-level/release.nix -A manual
Details
grahamcofborg-eval-nixpkgs-tarball nix-instantiate --arg nixpkgs { outPath=./.; revCount=999999; shortRev="ofborg"; } ./pkgs/top-level/release.nix -A tarball
Details
grahamcofborg-eval-nixpkgs-unstable-jobset nix-instantiate --arg nixpkgs { outPath=./.; revCount=999999; shortRev="ofborg"; } ./pkgs/top-level/release.nix -A unstable
Details
grahamcofborg-eval-package-list nix-env -qa --json --file .
Details
grahamcofborg-eval-package-list-no-aliases nix-env -qa --json --file . --arg config { allowAliases = false; }
Details
stdenv on aarch64-linux Success
Details
stdenv on x86_64-linux Success
Details
Staging automation moved this from Needs review to Done Jan 15, 2020
@andir

This comment has been minimized.

Copy link
Member Author

andir commented Jan 15, 2020

Thanks @FRidh !

@Ericson2314

This comment has been minimized.

Copy link
Member

Ericson2314 commented Jan 17, 2020

Woo! Sorry I didn't have the time to vet this further myself. Thanks @FRidh and @andir

@andir andir deleted the andir:relative-links branch Jan 29, 2020
@bkchr bkchr mentioned this pull request Feb 3, 2020
3 of 10 tasks complete
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Staging
  
Done
Linked issues

Successfully merging this pull request may close these issues.

None yet

6 participants
You can’t perform that action at this time.