reproducible squashfs images: hard links #114331

Closed
raboof opened this issue Feb 25, 2021 · 7 comments

@raboof
Member

raboof commented Feb 25, 2021

When building jfsutils locally, fsck.jfs and jfs_fsck are hard-linked to each other:

$ nix-store --delete /nix/store/8il80j7689ssgwnw55wz5zw2hrls9lv6-jfsutils-1.1.15
$ nix-build '<nixpkgs>' -A jfsutils --option substitute false
(...)
$ ls -il /nix/store/8il80j7689ssgwnw55wz5zw2hrls9lv6-jfsutils-1.1.15/bin
total 1612
18095189 -r-xr-xr-x 2 root root 439944 Jan  1  1970 fsck.jfs
18095202 -r-xr-xr-x 1 root root 103256 Jan  1  1970 jfs_debugfs
18095189 -r-xr-xr-x 2 root root 439944 Jan  1  1970 jfs_fsck
18095197 -r-xr-xr-x 1 root root 214536 Jan  1  1970 jfs_fscklog
18095199 -r-xr-xr-x 1 root root 263376 Jan  1  1970 jfs_logdump
18095200 -r-xr-xr-x 2 root root  66648 Jan  1  1970 jfs_mkfs
18095195 -r-xr-xr-x 1 root root  36328 Jan  1  1970 jfs_tune
18095200 -r-xr-xr-x 2 root root  66648 Jan  1  1970 mkfs.jfs

However, when fetching the same package from the binary cache, they are separate files:

$ nix-store --delete /nix/store/8il80j7689ssgwnw55wz5zw2hrls9lv6-jfsutils-1.1.15
$ nix-build '<nixpkgs>' -A jfsutils
these paths will be fetched (0.18 MiB download, 1.57 MiB unpacked):
  /nix/store/8il80j7689ssgwnw55wz5zw2hrls9lv6-jfsutils-1.1.15
copying path '/nix/store/8il80j7689ssgwnw55wz5zw2hrls9lv6-jfsutils-1.1.15' from 'https://cache.nixos.org'...
/nix/store/8il80j7689ssgwnw55wz5zw2hrls9lv6-jfsutils-1.1.15
$ ls -il /nix/store/8il80j7689ssgwnw55wz5zw2hrls9lv6-jfsutils-1.1.15/bin
total 1612
17229251 -r-xr-xr-x 1 root root 439944 Jan  1  1970 fsck.jfs
17229271 -r-xr-xr-x 1 root root 103256 Jan  1  1970 jfs_debugfs
17229273 -r-xr-xr-x 1 root root 439944 Jan  1  1970 jfs_fsck
17229274 -r-xr-xr-x 1 root root 214536 Jan  1  1970 jfs_fscklog
17229275 -r-xr-xr-x 1 root root 263376 Jan  1  1970 jfs_logdump
17229276 -r-xr-xr-x 1 root root  66648 Jan  1  1970 jfs_mkfs
17229277 -r-xr-xr-x 1 root root  36328 Jan  1  1970 jfs_tune
17229278 -r-xr-xr-x 1 root root  66648 Jan  1  1970 mkfs.jfs

This would not be a problem, except that a squashfs image containing this path differs depending on whether these files are hard-linked or separate. As a result, our installation ISOs (for example) are not bit-for-bit reproducible: the image depends on whether jfsutils was fetched from the cache or built locally.
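
One way to see whether a directory tree is affected is to look for regular files whose link count is above one. The following is a minimal sketch, not part of the original report: it builds a throwaway directory (the scratch path is hypothetical and the file names only mirror the jfsutils listing above) and lists the hard-linked names with `find -links +1`.

```shell
#!/bin/sh
# Sketch: list regular files that share an inode with at least one other name.
set -eu

demo=$(mktemp -d)                          # hypothetical scratch directory
printf 'payload\n' > "$demo/fsck.jfs"
ln "$demo/fsck.jfs" "$demo/jfs_fsck"       # hard link: same inode, link count 2
cp "$demo/fsck.jfs" "$demo/jfs_copy"       # separate file: own inode, link count 1

# -links +1 matches files whose link count is greater than one.
hardlinked=$(find "$demo" -type f -links +1 | sort)
printf '%s\n' "$hardlinked"

hardlinked_count=$(printf '%s\n' "$hardlinked" | wc -l | tr -d ' ')
rm -rf "$demo"
```

Pointing the same `find` invocation at a store path such as the jfsutils output would list `fsck.jfs`, `jfs_fsck`, `jfs_mkfs` and `mkfs.jfs` for the locally built variant, and nothing for the variant fetched from the cache.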

@nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/what-are-your-goals-for-21-05/11559/4

@raboof raboof self-assigned this Feb 25, 2021
@raboof
Member Author

raboof commented Feb 25, 2021

Confirmed with nix-store --dump and nix-store --restore that the NAR format indeed doesn't preserve hard links.
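
A round trip of that kind could look like the sketch below. It is an illustration under assumptions, not the exact commands used in this thread: the scratch directory and file names are made up, and the NAR part only runs when `nix-store` is on the PATH.

```shell
#!/bin/sh
# Sketch: serialise a directory containing a hard link to a NAR and restore it,
# then count hard-linked files before and after.
set -eu

work=$(mktemp -d)                          # hypothetical scratch directory
mkdir "$work/src"
printf 'payload\n' > "$work/src/a"
ln "$work/src/a" "$work/src/b"             # a and b share one inode

links_before=$(find "$work/src" -type f -links +1 | wc -l | tr -d ' ')

if command -v nix-store >/dev/null 2>&1; then
    nix-store --dump "$work/src" > "$work/src.nar"          # directory -> NAR
    nix-store --restore "$work/restored" < "$work/src.nar"  # NAR -> new directory
    links_after=$(find "$work/restored" -type f -links +1 | wc -l | tr -d ' ')
    echo "hard-linked names before: $links_before, after: $links_after"
else
    echo "nix-store not found; skipping the NAR round trip"
fi
rm -rf "$work"
```

If the NAR format carried hard-link information, the count after restoring would match the count before.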

I don't see a way to tell mksquashfs to ignore hard links (--no-xattrs doesn't help), so it might be reasonable to avoid having them in /nix/store entirely. I think we should make nix-build --check fail on them.

Whether it's better to replace a hard link with a symlink or with a duplicate file depends on the context, I think, so we probably shouldn't have an auto-hook but instead fix this on a per-package basis.

raboof added a commit to raboof/nixpkgs that referenced this issue Feb 25, 2021
The NAR archive format we use for cache.nixos.org does not preserve the
information that some files might be hard links to each other.

However, the squashfs format does preserve this information, which means
when creating a squashfs image containing nix store paths that contain
hard links, the squashfs image would be different depending on whether
that nix store path was fetched from cache.nixos.org or built locally.

For this reason I think we should avoid hard links in the nix store.
For more background see NixOS#114331
@raboof raboof mentioned this issue Feb 25, 2021
10 tasks
@primeos
Member

primeos commented Feb 25, 2021

> However, when fetching the same package from the binary cache, they are separate files:

This is also interesting IMO. Not sure where this difference comes from.

> confirmed with nix-store --dump and nix-store --restore that the NAR format indeed doesn't preserve hard links.

That explains it then and could be considered another problem (the Nix store FS would still have to support hard-links).

> Whether it's better to replace a hard link with a symlink or with a duplicate file depends on the context, I think, so we probably shouldn't have an auto-hook but instead fix this on a per-package basis.

I wonder if that's a complete solution because we have nix.autoOptimiseStore:

> If set to true, Nix automatically detects files in the store that have identical contents, and replaces them with hard links to a single copy. This saves disk space. If set to false (the default), you can still run nix-store --optimise to get rid of duplicate files.
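
To make the effect of that option concrete, here is a simplified sketch of content-based deduplication with hard links. It is not Nix's actual optimiser: the temporary "store", the `.links` directory keyed by a plain SHA-256 of the file contents, and the file names are all illustrative assumptions, and `sha256sum` from GNU coreutils is assumed to be available.

```shell
#!/bin/sh
# Simplified illustration: replace files with identical contents by hard links.
set -eu

store=$(mktemp -d)                 # hypothetical stand-in for a store directory
list=$(mktemp)                     # file list, collected before anything is relinked
mkdir "$store/pkg-a" "$store/pkg-b" "$store/.links"
printf 'same bytes\n'  > "$store/pkg-a/tool"
printf 'same bytes\n'  > "$store/pkg-b/tool"
printf 'other bytes\n' > "$store/pkg-b/other"

find "$store/pkg-a" "$store/pkg-b" -type f > "$list"
while IFS= read -r f; do
    hash=$(sha256sum "$f" | cut -d' ' -f1)
    if [ -e "$store/.links/$hash" ]; then
        ln -f "$store/.links/$hash" "$f"   # duplicate content: replace with a hard link
    else
        ln "$f" "$store/.links/$hash"      # first occurrence: remember it by content hash
    fi
done < "$list"

# Files with identical contents now share an inode; the odd one out does not.
ino_a=$(ls -i "$store/pkg-a/tool" | awk '{print $1}')
ino_b=$(ls -i "$store/pkg-b/tool" | awk '{print $1}')
ino_other=$(ls -i "$store/pkg-b/other" | awk '{print $1}')
echo "pkg-a/tool: $ino_a  pkg-b/tool: $ino_b  pkg-b/other: $ino_other"
rm -rf "$store" "$list"
```

The point for this issue is that two files that started out as independent copies end up hard-linked, so an image builder that records hard links sees a different input after optimisation.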

So I assume an optimised Nix store would replace copies with hard-links but the symlink approach should still work.
And unfortunately this also means that we cannot avoid hard-links in the Nix store (without "breaking" changes) and that nix.autoOptimiseStore=true will likely introduce additional r13y issues.

> I don't see a way to tell mksquashfs to ignore hard links (--no-xattrs doesn't help), so it might be reasonable to avoid having them in /nix/store entirely. I think we should make nix-build --check fail on them.

I wonder if we should contact upstream regarding this as this shouldn't only affect Nixpkgs (@raboof if you don't have time/motivation I could give it a try). From a quick look adding a flag to ignore hard links for r13y could be as easy as adding a conditional here: https://github.com/plougher/squashfs-tools/blob/57930cf4a1dc18a2cad221df40036cd801f85a09/squashfs-tools/mksquashfs.c#L2759 (though it could also break everything).

@raboof anyway, this isn't an objection, feel free to continue with your approach, I'm mainly writing this because it seems like a good idea to "fix" this in mksquashfs as well. And huge thanks for all of your amazing and valuable r13y contributions! :)

@plougher

This isn't really a Squashfs issue, as Mksquashfs does exactly what it is supposed to do. If the source has hardlinks then Mksquashfs preserves them.

But, I can see a reason why the source might use hardlinks (saving space), and where it might be desirable to instead store them as duplicates in the Squashfs image, as Squashfs doesn't need to hardlink them to save space.

So I have added a -no-hardlinks option here

plougher/squashfs-tools@c37bb4d
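
A minimal sketch of using the new option follows. The scratch directory and file names are hypothetical, and the script only builds an image when the installed `mksquashfs` advertises `-no-hardlinks` in its help text; `-noappend` and `-no-progress` are ordinary mksquashfs options added to keep the run quiet and self-contained.

```shell
#!/bin/sh
# Sketch: build a squashfs image that stores hard-linked files as duplicates.
set -eu

work=$(mktemp -d)                          # hypothetical scratch directory
mkdir "$work/tree"
printf 'payload\n' > "$work/tree/fsck.jfs"
ln "$work/tree/fsck.jfs" "$work/tree/jfs_fsck"   # hard link in the source tree

result=skipped
if command -v mksquashfs >/dev/null 2>&1 &&
   mksquashfs -help 2>&1 | grep -q -- '-no-hardlinks'; then
    # With -no-hardlinks the two names are written as separate, identical files.
    mksquashfs "$work/tree" "$work/image.squashfs" \
        -no-hardlinks -noappend -no-progress >/dev/null
    result=built
fi
echo "image: $result"
rm -rf "$work"
```

On a mksquashfs build without the option the script reports `skipped` instead of failing, which is a convenience of this sketch rather than anything the tool does itself.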

@raboof
Member Author

raboof commented Feb 26, 2021

> I wonder if that's a complete solution because we have nix.autoOptimiseStore

Great point, I didn't realize that.

> This isn't really a Squashfs issue, as Mksquashfs does exactly what it is supposed to do

I agree

> But, I can see a reason why the source might use hardlinks (saving space), and where it might be desirable to instead store them as duplicates in the Squashfs image, as Squashfs doesn't need to hardlink them to save space. So I have added a -no-hardlinks option here

Awesome, thanks! With that we can go back to pretending hardlinks are an 'invisible' optimization, which probably makes most sense in the end.

@primeos
Member

primeos commented Feb 26, 2021

> This isn't really a Squashfs issue, as Mksquashfs does exactly what it is supposed to do.

Sorry, I didn't mean to imply that, and I agree that there's nothing wrong with the mksquashfs implementation. I put "fix" in quotes because I didn't consider it a bug, but my brief text was not clear enough and ended up misleading, which is my fault. I just didn't know that you'd read it before I opened a proper upstream "issue".
Anyway, my aim was only to suggest this as a new feature, as it could help with reproducibility issues that originate outside of mksquashfs's control. In this case that is our inability to preserve hard links in our Nix ARchives (NARs); outside of Nixpkgs it could be, e.g., hard links that are lost while extracting a tar archive to a filesystem that doesn't support them. None of this would be mksquashfs's fault at all, but IMO for r13y the easiest solution/workaround is to "normalize" the files when generating the image.

> So I have added a -no-hardlinks option here

@plougher Wow, that's totally awesome, thank you very very much! :)

raboof added a commit to raboof/nixpkgs that referenced this issue Feb 27, 2021
the nix store may contain hardlinks: derivations may output them
directly, or users may be using store optimization which automatically
hardlinks identical files in the nix store.

The presence of these links is intended to be a 'transparent'
optimization. However, when creating a squashfs image, the image
will be different depending on whether hard links were present
on the filesystem, leading to reproducibility problems.

By passing '-no-hardlinks' to mksquashfs the files are stored
as duplicates in the squashfs image. Since squashfs has support
for duplicate files this does not lead to a larger image.

For more details see
NixOS#114331
zimbatm pushed a commit that referenced this issue Feb 28, 2021
the nix store may contain hardlinks: derivations may output them
directly, or users may be using store optimization which automatically
hardlinks identical files in the nix store.

The presence of these links is intended to be a 'transparent'
optimization. However, when creating a squashfs image, the image
will be different depending on whether hard links were present
on the filesystem, leading to reproducibility problems.

By passing '-no-hardlinks' to mksquashfs the files are stored
as duplicates in the squashfs image. Since squashfs has support
for duplicate files this does not lead to a larger image.

For more details see
#114331
@raboof
Member Author

raboof commented Mar 4, 2021

fixed in #114454

@raboof raboof closed this as completed Mar 4, 2021