Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deterministic initramfs #21273

Closed
wants to merge 3 commits into from
Closed

Conversation

joachifm
Copy link
Contributor

Extracted from #2281, some details in commit messages.

@mention-bot
Copy link

@joachifm, thanks for your PR! By analyzing the history of the files in this pull request, we identified @abbradar, @edolstra and @alexanderkjeldaas to be potential reviewers.

Copy link
Member

@abbradar abbradar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what the Perl change does, so can't review it.

@@ -39,7 +39,7 @@ mkdir -p $out
for PREP in $prepend; do
cat $PREP >> $out/initrd
done
(cd root && find * -print0 | cpio -o -H newc -R 0:0 --null | perl $cpioClean | $compressor >> $out/initrd)
(cd root && find * -print0 | sort -z | cpio -o -H newc -R 0.0 --null | perl $cpioClean | $compressor > $out/initrd)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is >> changed to > here? Also, why change : to . (though this shouldn't change semantics)? BTW, from cpio documentation:

              To  avoid  the lookup and ensure that arguments are treated as numeric values, prefix them with a plus
              sign, e.g.: -R +0:+0.

Can't hurt either.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I failed to notice those minute differences while squashing; I'll ensure that the squashed version is in line with the modern code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually the change from >> to > would break the prepend stuff. Nice catch!

@abbradar abbradar added 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS and removed 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS labels Dec 18, 2016
@abbradar
Copy link
Member

abbradar commented Dec 18, 2016

Seems good to me now! I also vaguely get what does the Perl code do now -- seems OK for my limited understanding.

Personally I'm glad we finally started to work on merging this big PR, at least parts -- thanks!

EDIT: ninja'd you -- your description matches mine thoughts.

@joachifm
Copy link
Contributor Author

@abbradar as far as I can tell, the change to clean-cpio is to make explicit that entries are assigned sequential inode numbers based on input order and not taking into account hardlinking at all. If B is a hardlink to A, they'd receive different inode numbers, whereas ordinarily they'd have the same inode (I guess).

@joachifm
Copy link
Contributor Author

joachifm commented Dec 18, 2016

Then again #2281 (comment) mentions boot problems due to inode reassignment. Hrm, I can't see any addendum to that comment (but the thread is large, so I might be missing something).
EDIT: perhaps they're referring to the changes to perl itself, makes a bit more sense :)

@joachifm
Copy link
Contributor Author

Nevermind, I think I simply misunderstood the linked comment.

@abbradar
Copy link
Member

abbradar commented Dec 18, 2016

This comment is bad news, either we need to ask @alexanderkjeldaas what boot problems did he have with that patch, or do extensive testing by ourselves. I'm willing to volunteer and cherry-pick it later when I have time, but maybe you need some complex initrd to bump into those problems.

From your description I'm not sure what would "having hardlinks have different inode numbers" mean in CPIO. If those were files, that would mean you have essentially made a copy (if I'm incorrect, please point that out!). Would it work that way here automatically, just by changing inode number in some map? I'm pointing to that because if there are indeed problems maybe it has something to do with them.

(to the new comment) Why do you think you have misunderstood it? It seems to mean boot problems with this patch to me.

@joachifm
Copy link
Contributor Author

joachifm commented Dec 19, 2016

Regarding inode numbers, https://github.com/libarchive/libarchive/wiki/ManPageCpio5 says

dev, ino
The device and inode numbers from the disk. These are used by programs that read cpio archives to determine when two entries refer to the same file. Programs that synthesize cpio archives should be careful to set these to distinct values for each entry.

Thus, setting different ino would indicate that two files are distinct even if they were originally hardlinked (i.e., the same file).

It also says

Hardlinked files are not given special treatment; the full file contents are included with each copy of the file.

My reading of that is that the only difference is in the archive header, not the actual payload.

(In the description of the new ascii header format it says hardlinked files are handled by setting the filesize to 0; I assume they're referring to the filesize header field and that the actual files themselves are still included as before).

(EDIT: ofc, seeing as we're creating files using the new ascii header format, it seems like messing with inos to avoid stuff being marked as hardlinks would be the wrong way to go about it, but then I guess that's not the primary purpose of the remapping).

@joachifm
Copy link
Contributor Author

joachifm commented Dec 19, 2016

Anyway, I think it's prudent to hold off on this until somebody gives a proper explanation.

@alexanderkjeldaas
Copy link
Contributor

The > vs >> is a bug.

I might remember wrong, but I think I first tried setting ino == 0 on all files in the cleanup, but then a more conservative fix is to just remap all inodes from whatever they are in the file-system to a consecutive range starting from 1.

So the question is just whether having a slightly invalid cpio file with ino == 0 works because nothing really uses this information, or whether a mapping like what this code does, is needed.

I don't remember exactly at which point I had boot issues, and whether they were exactly related to this issue.

@alexanderkjeldaas
Copy link
Contributor

@joachifm

as far as I can tell, the change to clean-cpio is to make explicit that entries are assigned sequential inode numbers based on input order and not taking into account hardlinking at all. If B is a hardlink to A, they'd receive different inode numbers, whereas ordinarily they'd have the same inode (I guess).

No, this line:
$ino_remap{$e->{inode}} = $ino++ unless exists $ino_remap{$e->{inode}};

only allocates a new inode number if none has been allocated for $e->{inode} previously. So if B and A are hardlinks, they should be hardlinks in the remapped version as well.

That's the idea, at least.

@abbradar
Copy link
Member

@alexanderkjeldaas Thank you for your answers! Between this and re-reading the code again after a fresh sleep this seems pretty safe to me.

@joachifm
Copy link
Contributor Author

@alexanderkjeldaas cool, but just so I can understand better: is it just about sequential numbering? I figured it had to do with hardlinking due to the mention of nlink. How does nlink figure into it?

@edolstra
Copy link
Member

Have you tried cpio --reproducible? It might make the Perl script unnecessary. Maybe--renumber-inodes is also necessary.

@joachifm
Copy link
Contributor Author

--reproducible alone appears insufficient; it works when combined with cpio-clean.

@joachifm
Copy link
Contributor Author

Anyway, I've cleaned up the commits, so if we're happy with this, it should be ready to go.

@alexanderkjeldaas
Copy link
Contributor

@edolstra I don't think I've seen --renumber-inodes before.

@joachifm where do you see this nlink thing? If the inode number is the same, then by definition it is a hardlink.

@joachifm
Copy link
Contributor Author

@alexanderkjeldaas the original commit message said "Do not set nlink when cleaning cpio archives", I assumed it referred to the nlink field.

@alexanderkjeldaas
Copy link
Contributor

alexanderkjeldaas commented Dec 19, 2016 via email

alexanderkjeldaas and others added 3 commits December 19, 2016 23:59
Taken from
NixOS@1462666

This allows
```sh
nix-build nixos --check --arg configuration '{ ... }: { }' -A config.system.build.initialRamdisk
```
to succeed.

Note that `cpio --reproducible` (which implies `--renumber-inodes`) in
combination with the previous version of `cpio-clean.pl` also makes the
build deterministic, but `--reproducible` by itself appears
insufficient.  Thus we might as well do the renumbering in perl.
joachifm: the original commits were
NixOS@4156801
NixOS@3d28eb3
I squashed them together and updated the commit message.

Note that it appears that it is not strictly necessary to sort the files
for `nix-build --check` to succeed, but it can't hurt either.
@joachifm
Copy link
Contributor Author

@alexanderkjeldaas thank you for clearing that up for me; I interpreted the message to mean that renumbering somehow indirectly affected nlink, hence my confusion :) I have amended the commit message accordingly.

@dezgeg
Copy link
Contributor

dezgeg commented Dec 20, 2016

I guess the remaining part besides cpio --reproducible is the mtimes. This works for me at least (and gets rid of cpio-clean.pl):

diff --git a/pkgs/build-support/kernel/make-initrd.sh b/pkgs/build-support/kernel/make-initrd.sh
index 89021ca..a9be7e5 100644
--- a/pkgs/build-support/kernel/make-initrd.sh
+++ b/pkgs/build-support/kernel/make-initrd.sh
@@ -39,7 +39,9 @@ mkdir -p $out
 for PREP in $prepend; do
   cat $PREP >> $out/initrd
 done
-(cd root && find * -print0 | cpio -o -H newc -R 0:0 --null | perl $cpioClean | $compressor >> $out/initrd)
+
+find root -print0 | xargs -0r touch --no-dereference --date="@1"
+(cd root && find * -print0 | sort -z | cpio -o -H newc -R +0:+0 --reproducible --null | $compressor >> $out/initrd)
 
 if [ -n "$makeUInitrd" ]; then
     mv $out/initrd $out/initrd.gz

Stolen from https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=845034.

@joachifm
Copy link
Contributor Author

@dezgeg cool, perhaps you'd care to commit something to that effect, then this PR could be closed in deference to it.

joachifm added a commit to joachifm/nixpkgs that referenced this pull request Dec 20, 2016
To achieve reproducible results, `cpio` archive members are added in
sorted order and inodes renumbered.

The `cpio-clean.pl` script is made obsolete by setting mtimes via
`touch` & using `cpio --reproducible`.  Suggested by @dezgeg in
NixOS#21273 (comment).

Note that using `--reproducible` means that initial ramdisk creation now
requires at least `cpio` version 2.12 (released in 2015).
@joachifm
Copy link
Contributor Author

@dezgeg I took the liberty of implementing your proposal in a different PR; I'll merge that one instead of this. Hope that's okay, if not feel free to commit your own version :)

@joachifm joachifm closed this Dec 20, 2016
joachifm added a commit that referenced this pull request Dec 20, 2016
To achieve reproducible results, `cpio` archive members are added in
sorted order and inodes renumbered.

The `cpio-clean.pl` script is made obsolete by setting mtimes via
`touch` & using `cpio --reproducible`.  Suggested by @dezgeg in
#21273 (comment).

Note that using `--reproducible` means that initial ramdisk creation now
requires at least `cpio` version 2.12 (released in 2015).
@joachifm joachifm deleted the deterministic-initramfs branch December 20, 2016 13:35
@joachifm joachifm restored the deterministic-initramfs branch December 20, 2016 13:36
@joachifm joachifm deleted the deterministic-initramfs branch March 13, 2017 23:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants