Allow some derivations to hardlink to other files in the store #1272

Open
Ekleog opened this issue Mar 11, 2017 · 11 comments
Labels
feature Feature request or proposal performance

Comments

@Ekleog
Member

Ekleog commented Mar 11, 2017

Context

I am currently writing a NixOS module that makes it easy to generate VMs, and I need a way to pass the guest its own store and nothing else (without giving it full access to the host store, so that it cannot see secrets that might be in there).

I could have gone with mount --bind, as is done for derivation building, but making this a permanent choice with ~1k bind-mounts per VM seems really unsustainable.

So I chose to generate the VM's store in a derivation, and to give this derivation to the guest as though it were its store (this being the least bad of the ways I could think of doing it).

Issue

To do this, I would have liked to simply hardlink the required derivations, instead of copying everything and waiting for nix-store --optimize to come and replace the copies with hardlinks that I could have created from the beginning.

This would reduce disk usage, and far less time would be spent copying things that will be hardlinked later anyway.

However, derivation building seems to happen in an environment where its buildInputs are bind-mounted (mount --bind), which makes hardlinks impossible: link() refuses to cross mount points, even when both mounts are backed by the same underlying filesystem.
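For reference, the kernel reports this situation as EXDEV from link(2). Below is a minimal Python sketch of what a builder script could do today, assuming a silent fallback to a plain copy is acceptable. The name `link_or_copy` is purely illustrative and is not something Nix provides.

```python
import errno
import os
import shutil


def link_or_copy(src: str, dst: str) -> str:
    """Hardlink src to dst; fall back to a copy when the kernel refuses.

    Returns "hardlink" or "copy" so the caller can see which path was taken.
    """
    try:
        os.link(src, dst)
        return "hardlink"
    except OSError as exc:
        # EXDEV: src and dst are on different mounts (this includes two bind
        # mounts of the same filesystem). EPERM/EMLINK: linking is not
        # permitted, or the link count of src is exhausted.
        if exc.errno not in (errno.EXDEV, errno.EPERM, errno.EMLINK):
            raise
        shutil.copy2(src, dst)
        return "copy"
```

Inside a build where the inputs are bind-mounted this would be expected to take the "copy" branch, which is exactly the cost described above.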

Proposed solution

Add a derivation option that requests direct access to /nix/store, not through a mount --bind "sandbox" (I tried both with nix.useSandbox = true; and nix.useSandbox = false;, and it seems to happen anyway, so I guess that's not what's called sandbox in nix vernacular).

What do you think about this? Is it too narrow a use case to deserve such a change?

@stale

stale bot commented Feb 15, 2021

I marked this as stale due to inactivity. → More info

@stale stale bot added the stale label Feb 15, 2021
@Ekleog
Member Author

Ekleog commented Feb 17, 2021

Still important

@stale stale bot removed the stale label Feb 17, 2021
@thufschmitt
Member

Might be a wrong trail, but at a glance it looks like https://github.com/grahamc/netboot.nix is solving a pretty similar issue using recursive-nix. Might be worth a try

@stale

stale bot commented Aug 18, 2021

I marked this as stale due to inactivity. → More info

@VanCoding

I think it would be really good to have this feature.
Sometimes we want to copy files from one derivation to a dependent derivation.
At this point, we already know that the content will be completely identical, so we could just make a hardlink right there instead of copying first and then waiting for nix to figure this out by itself.

One use-case that I currently have in mind is building something like a "nix-native" version of PNPM. PNPM makes heavy use of hardlinks to be fast and disk-space efficient. In Node.js, you often need the same package, but wired up with different versions of its dependencies, and the only way to not copy the whole package is to hardlink all of its files. Symlinks won't work for this.

I have 2 additional ideas how we could achieve hardlinking support:

  1. Nix could provide a special hard-link command that could be called to create hardlinks
  2. We could write all hardlinks that we want to be created as a list to a file inside the derivation. After the derivation is built, nix looks at that file and then creates these hardlinks on its own. Or maybe there's some way to pass this list to nix directly, without writing it into a file.

Both of these approaches would have the benefit of nix knowing exactly which files of a derivation hardlink to other derivations. This could be useful information when copying around store paths (but maybe this is already handled very well).
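To make idea 2 a bit more concrete, here is a rough Python sketch of the post-build step. Everything in it is hypothetical: the manifest format (one tab-separated "source, destination" pair per line), the function name `apply_link_manifest`, and the assumption that this step runs outside the sandbox where source and output share a filesystem. Nix has no such mechanism today.

```python
import os


def apply_link_manifest(manifest: str, out_dir: str) -> list[tuple[str, str]]:
    """Create the hardlinks listed in a (hypothetical) manifest file.

    Each non-empty line is "<source path>\t<path relative to out_dir>".
    Returns the (source, destination) pairs that were linked.
    """
    out_root = os.path.realpath(out_dir)
    linked = []
    with open(manifest, encoding="utf-8") as fh:
        for line in fh:
            line = line.rstrip("\n")
            if not line:
                continue
            source, rel_dest = line.split("\t", 1)
            dest = os.path.realpath(os.path.join(out_root, rel_dest))
            # Refuse destinations that would escape the output directory.
            if os.path.commonpath([out_root, dest]) != out_root:
                raise ValueError(f"destination escapes output: {rel_dest}")
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            os.link(source, dest)
            linked.append((source, dest))
    return linked
```

The returned list is the "which files link where" information mentioned above, which the store could keep around when copying paths.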

What do you think?

@Dessix

Dessix commented Jun 19, 2023

Wait, Nix doesn't do this on its own already? Is there some other pattern we're supposed to be using for the moment, like symlink-trees to parent derivations? I assumed derivations that add or remove one file were effectively just UnionFS-like projections, to allow for lightweight dependent derivations.

@flokli
Contributor

flokli commented Dec 9, 2023

I don't think this is actually a beneficial feature to have. Whether something uses the same inode or not is an implementation detail of the filesystem, and --optimize creating hardlinks is an implementation detail as well.

Even when the inodes differ, your filesystem might already have decided to deduplicate the underlying data internally (--reflink style).

Inside the build, you shouldn't make any assumptions about being on the same filesystem as your other store paths, and during substitution, you don't have a way to signal that this points to the same data as somewhere else either.

I'd leave this up to the nix store layer, it could do some deduplication post-build, but I would not expose / use more builder sandbox internals.

@VanCoding

@flokli I agree that it may not be a good idea to allow creating actual links, because in the build we should not make any assumptions about how this all is going to be stored.

But it could still be beneficial to be able to tell nix "hey, I want to put a file here that's exactly the same as the file from this other derivation". Then the store layer could use this information to improve performance upfront, because it could save on hashing or unnecessary copying and comparing the contents.

@flokli
Contributor

flokli commented Dec 11, 2023

> Then the store layer could use this information to improve performance upfront, because it could save on hashing or unnecessary copying and comparing the contents.

I don't think it matters. The build exposes a filesystem that the build process can write to. Post-build, we must feed all contents in the right order into sha256 to calculate the narhash, so we need to traverse all contents anyway.
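A small Python sketch of why that traversal is unavoidable. Note that `tree_digest` is a simplified stand-in invented for illustration and is NOT the real NAR serialization; it only shares the relevant property that every entry is visited in a fixed order and every file's bytes go through SHA-256.

```python
import hashlib
import os


def tree_digest(root: str) -> str:
    """SHA-256 over a directory tree, visiting entries in sorted order.

    Simplified stand-in for a NAR hash: it is NOT the real NAR format.
    """
    h = hashlib.sha256()

    def feed(tag: bytes, payload: bytes) -> None:
        # Length-prefix every field so adjacent fields cannot be confused.
        h.update(tag + len(payload).to_bytes(8, "little") + payload)

    def visit(path: str) -> None:
        if os.path.islink(path):
            feed(b"L", os.readlink(path).encode())
        elif os.path.isdir(path):
            feed(b"D", b"")
            for name in sorted(os.listdir(path)):
                feed(b"N", name.encode())
                visit(os.path.join(path, name))
            feed(b"E", b"")
        else:
            # Hardlinked or copied, the file contents are read in full.
            with open(path, "rb") as fh:
                feed(b"F", fh.read())

    visit(root)
    return h.hexdigest()
```

A tree containing a hardlink and a tree containing a byte-identical copy produce the same digest here, and both cost a full read, so the hashing step gains nothing from knowing how the file got there.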

If you're copying files from another store path and want to make it easy for the filesystem to deduplicate, the best you can do is probably to copy with cp --reflink=auto. That performs a lightweight copy if source and destination are on the same filesystem and the filesystem supports it, and falls back to a regular copy otherwise.
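For builders that are not shell scripts, roughly the same behaviour can be approximated in Python. This is a sketch under assumptions: it uses the Linux FICLONE ioctl, the request number below is the value on common architectures such as x86-64 and aarch64 (it differs on some others), and `reflink_or_copy` is an illustrative name. It copies file contents only, not ownership or timestamps.

```python
import fcntl
import shutil

# Linux ioctl request number for FICLONE from <linux/fs.h>.
# Assumption: value as defined on x86-64/aarch64; some architectures differ.
FICLONE = 0x40049409


def reflink_or_copy(src: str, dst: str) -> str:
    """Clone src into dst if the filesystem allows it, else copy the bytes.

    Returns "reflink" or "copy" to show which path was taken.
    """
    try:
        with open(src, "rb") as s, open(dst, "wb") as d:
            # Ask the filesystem to share src's extents with dst.
            fcntl.ioctl(d.fileno(), FICLONE, s.fileno())
        return "reflink"
    except OSError:
        # Unsupported filesystem, different mounts, etc.: do a regular copy.
        shutil.copyfile(src, dst)
        return "copy"
```

On filesystems without reflink support (ext4, tmpfs) this always takes the "copy" branch, which matches what cp --reflink=auto would do there.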

@VanCoding

@flokli I see... but calculating the sha256 only requires reading the file, not writing it. But yeah, having to re-hash the files is not optimal. In theory, if the files being linked are all known upfront, before the build of the derivation even starts, it'd be sufficient to feed a list of their paths into the narhash, no?

I do see that for a lot of scenarios it would be better to solve this outside of nix, but for some scenarios, like a PNPM-like package manager that uses the nix store, it could be useful. At least if we don't want to tell everybody which filesystem or store layer to use.

@flokli
Contributor

flokli commented Dec 11, 2023

There's no primitive to copy things around that is not a build - other than maybe builtins.filterSource, though that's another usecase and doesn't allow moving things.

For everything that is a build, the opportunistic reflink copy seems the least annoying method, and requires no changes in Nix itself.


9 participants