makeInitrdNG: malformed squashfs images when filesize exceeds 2GB #203593

Closed
jhvst opened this issue Nov 29, 2022 · 12 comments
Labels
0.kind: bug Something is broken 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS

Comments

@jhvst
Contributor

jhvst commented Nov 29, 2022

Describe the bug

If you create an initrd file over 2GB, for example with this configuration, and try to boot it, the bootup fails when mounting the squashfs image in init-1-stage with the error "squashfs error unable to read id index table". As suggested, e.g., by #26230, this is indeed caused by data corruption. However, the corruption does not happen in transport; it happens when creating the initrd. Furthermore, and this is the real bug, the data corruption only happens when the resulting initrd exceeds 2GB in size. The data corruption can be verified by comparing the sha256sum of the squashfs images: first on the computer that built the image (i.e., from the nix store path echoed in the final parts of the build process), and then on the computer that tries to boot the image, from the initrd root folder in the emergency shell. The files will be identical in size, but differ in their hashes. If one uses the emergency shell to fetch the squashfs image from the nix store and manually initialize init-2-stage, the OS will boot successfully.
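The comparison described above can also be scripted. The following is a minimal sketch, not part of the original report: the function names `sha256_of_file` and `images_match` are made up for illustration, and it assumes both copies of the squashfs image are reachable as ordinary files. It reads in chunks so that a multi-gigabyte image does not have to fit in memory.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def images_match(built_path: str, booted_path: str) -> bool:
    """Hypothetical helper: True if both image copies hash identically."""
    return sha256_of_file(built_path) == sha256_of_file(booted_path)
```

Equal file sizes with `images_match` returning `False` would correspond to the symptom reported here.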

Steps To Reproduce

Steps to reproduce the behavior:

  1. Build an initrd file which exceeds 2GB in size. For example, with my configuration run nix-build -A pix.ipxe nvidia.nix -I home-manager=https://github.com/nix-community/home-manager/archive/master.tar.gz -I nixpkgs=https://github.com/NixOS/nixpkgs/archive/refs/heads/nixos-unstable.zip. Alternatively, you can download a pre-built version of mine.
  2. Modify the kernelModules in the Nix config file to include the drivers necessary for you to fetch the original squashfs image. These would be keyboard, filesystem, and/or network drivers. Alternatively, you can download the kernel that I used.
  3. Ensure you are running the latest Linux kernel on your current system. We are going to use kexec, which only recently gained support for files over 2GB in size. linux_latest from nixpkgs will do.
  4. In the result folder from step 2, you will find a file called kexec-boot. If you modified the Nix configuration file to include kernelModules for your system, you can execute this script. If you decided to use my prebuilt images, run kexec --load phasedKernel --initrd=initrd -c "boot.shell_on_fail". Then, when you are ready to halt the system, run kexec -e.
  5. During the bootup, you should see the squashfs error described above. Press f for the emergency shell. You will find the malformed squashfs in the root folder. Check its sha256sum. It should differ from that of your self-built image, or of the file at http://boot.ponkila.com/squashfs.img. Bug reproduced. Done.
  6. You can continue with the bootup process by acquiring the original squashfs image used in the build process. I store mine here. With the original squashfs acquired, delete the one found in the initrd, and give the downloaded/mounted squashfs the same name as the original file. Then, run ./init or prepare the filesystem manually to eventually launch switch_root. You can refer to this document for more details. Finally, the OS will boot successfully.
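Since the two images have the same size but different hashes, it may help triage to know where they first diverge. This is a hedged sketch under the assumption that both the intact and the malformed image are available as files on one machine; `first_mismatch_offset` is a hypothetical helper, not something used in this report.

```python
from typing import Optional


def first_mismatch_offset(path_a: str, path_b: str,
                          chunk_size: int = 1 << 20) -> Optional[int]:
    """Return the byte offset of the first difference, or None if identical."""
    offset = 0
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            chunk_a = a.read(chunk_size)
            chunk_b = b.read(chunk_size)
            if chunk_a != chunk_b:
                # Locate the exact differing byte inside this chunk.
                for i, (x, y) in enumerate(zip(chunk_a, chunk_b)):
                    if x != y:
                        return offset + i
                # One file ended early: they differ where the shorter one stops.
                return offset + min(len(chunk_a), len(chunk_b))
            if not chunk_a:
                return None  # both files ended without any difference
            offset += len(chunk_a)
```

An offset where everything before it is intact and everything after it is garbage would point toward truncation rather than scattered corruption.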

Expected behavior

I expect an initrd over 2GB to work. Currently, it does not: I'm unable to boot. Kexec shows that this is not a BIOS-related issue.

Screenshots

N/A

Additional context

I wrote more details here. I have triaged this issue a lot, from ruling out BusyBox issues, to UEFI compatibility, to RAM running out, and to kernel configuration issues with tmpfs files over 2GB. I have looked at the current implementation of makeInitrdNG, which I believe is at least related to the bug, but I cannot see how this issue could arise from the current Rust code. This bug does not seem to be an issue with any of the userspace or kernel code; hence, it seems specific to the way the images are built on Nix.

This issue is probably not very high in priority -- in practice, the 2GB limit can be circumvented by modifying the init-1-stage script to download a rootfs over the Internet, much like what is done in reproducing this bug. However, I don't think this issue should exist unless something is going awry in the initrd build process; hence, it should probably be looked at.

Notify maintainers

@dasJ @ElvishJerricco @K900 @lheckemann

Metadata

N/A

@jhvst jhvst added the 0.kind: bug Something is broken label Nov 29, 2022
@K900
Contributor

K900 commented Nov 29, 2022

The Rust bits don't touch the contents of the initrd at all, they're just copied and then fed to cpio. I wonder if you're hitting some cpio bug.

@jhvst
Contributor Author

jhvst commented Nov 29, 2022

Sounds plausible, given that cpio is quite old to my understanding. A bit more context from my triaging: the Linux kernel code for kexec had a bug/feature regarding the 2GB size due to the use of the int datatype in the read_file code: https://lore.kernel.org/lkml/20220527025535.3953665-2-pasha.tatashin@soleen.com/T/
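To illustrate why 2GB is the magic number whenever a size is held in a signed 32-bit int: the largest representable value is 2^31 - 1 bytes, one byte short of 2 GiB. The sketch below only demonstrates the arithmetic using Python's `ctypes`; it is not the kernel code, and `as_c_int` is a name made up for this example.

```python
import ctypes

INT_MAX = 2**31 - 1  # largest value a signed 32-bit int can hold


def as_c_int(size_in_bytes: int) -> int:
    """Show what a byte count becomes when stored in a signed 32-bit int."""
    # ctypes does no overflow checking, so the value wraps like a C int would.
    return ctypes.c_int32(size_in_bytes).value


just_under_2gib = as_c_int(INT_MAX)   # still fits unchanged
exactly_2gib = as_c_int(2**31)        # wraps to a negative number
```

Any file of 2 GiB or more therefore cannot have its length expressed correctly in such a variable, regardless of how much RAM is available.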

@K900 do you think this issue should be kept open until the cause is found in the dependencies, or closed as unrelated to NixOS?

@K900
Contributor

K900 commented Nov 29, 2022

I think we should at least look into this and maybe fix our cpio build to make it work.

@veprbl veprbl added the 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS label Dec 5, 2022
@tupakkatapa

Found a lot of threads where people discuss this and the magic value of 2GB where problems start to occur. Then I found the following, which actually makes a lot of sense:

"Downloads are placed in the 32-bit address space (i.e., below 4GB). It is very plausible that a large chunk of this address space is allocated for PCI BARs, and so you may have only ~2GB of actual RAM within this address space."
https://ipxe-devel.ipxe.narkive.com/YA1ZoMfx/size-limit-in-ipxe

Unfortunately, I think it is game over regarding this; we are stuck with the squashfs method. Or what do you guys think?

@lheckemann
Member

Yeah, the only solution I see for that is a more complex setup where only a small initrd is fetched at PXE time and that initrd then obtains the Nix store squashfs (or otherwise getting the nix store -- depending on the use case nfs or similar might make sense too and allow for faster boots) from the netboot server some other way. That would require setting up networking and stuff in the initrd though.

@tupakkatapa

That is exactly what we are currently using as a workaround. It required some changes to the stage-1-init script: https://github.com/majbacka-labs/nixpkgs/commits/patch-init1sh. It should be stated that this does not have anything to do with fixing kexec, which is annoying since I would like to have both functionalities for the same output format.

If you are also convinced that this does not directly relate to nixpkgs, as far as I am concerned, this issue should be closed.

@jhvst
Contributor Author

jhvst commented Mar 23, 2024 via email

@ElvishJerricco
Contributor

@jhvst Isn't the argument that the initrd itself, which is loaded before the kernel starts, is too big because it contains the squashfs? i.e. The problem is not mounting the squashfs; it's loading the initrd in the first place.

@jhvst
Contributor Author

jhvst commented Mar 26, 2024

I do not think so, but I may be wrong: the shell_on_fail boot option drops you into an initrd environment, right? The bootup certainly works up to this stage. Moreover, the squashfs image can be fetched from the network and the boot then continued successfully while in the supposed initrd environment (and we do this quite often with @tupakkatapa, as we daily-drive our patchset which implements the workaround suggested by @lheckemann above), so it seems that the initrd is actually fine, and the problem is with the squashfs file which extends over the 2GB limit. It might of course be possible that the initrd is somehow pruned to an extent, but it seems unlikely to me, given the current information, that this would affect only one file and not the integrity of anything else.

FWIW, we wish to eventually upstream the changes for the workaround (see: #203750), but I guess an alternative solution would be to troubleshoot this issue. However, both options are low priority for us, while the required effort seems coincidentally high.

My current ETA is that I may take a look at this sometime in June/July, but cannot promise any resolution.

@ElvishJerricco
Contributor

@jhvst Look at how this initrd is created:

    system.build.netbootRamdisk = pkgs.makeInitrdNG {
      inherit (config.boot.initrd) compressor;
      prepend = [ "${config.system.build.initialRamdisk}/initrd" ];
      contents =
        [ { object = config.system.build.squashfsStore;
            symlink = "/nix-store.squashfs";
          }
        ];
    };

The ordinary initrd is used in prepend, meaning this initrd contains two compressed initrds; one is the normal one, and then after it is one containing only the squashfs.

This means that if the initrd is being truncated during load because it is too long, only the squashfs would be affected.

So I think my explanation is still very likely correct.
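The reasoning above can be sketched with a toy model. This is an illustration under loud assumptions, not the actual NixOS build: gzip stands in for whatever `compressor` is configured, plain byte strings stand in for the cpio archives, and `read_first_member` and `member_is_complete` are hypothetical helpers. It shows that when two compressed streams are concatenated and the result is cut short, the first stream still decodes in full while only the trailing one is damaged.

```python
import gzip
import zlib
from typing import Tuple


def read_first_member(blob: bytes) -> Tuple[bytes, bytes]:
    """Decompress the first gzip member; return (payload, remaining bytes)."""
    decoder = zlib.decompressobj(wbits=31)  # wbits=31 selects the gzip container
    payload = decoder.decompress(blob) + decoder.flush()
    return payload, decoder.unused_data


def member_is_complete(blob: bytes) -> bool:
    """True if blob holds a gzip member that decodes through its end marker."""
    decoder = zlib.decompressobj(wbits=31)
    decoder.decompress(blob)
    return decoder.eof


# Stand-ins: the ordinary initrd first, the squashfs-only archive after it.
base = gzip.compress(b"stage-1 files " * 1000)
store = gzip.compress(bytes(range(256)) * 1000)
combined = base + store

# Simulate a loader that silently drops everything past some size limit.
truncated = combined[: len(base) + len(store) // 2]
base_payload, rest = read_first_member(truncated)
```

Here `base_payload` comes back intact while `rest` is an incomplete stream, which matches a system that reaches the emergency shell but cannot mount the squashfs.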

@jhvst
Contributor Author

jhvst commented Mar 27, 2024

Now I see, makes sense then. This might be a good test case then. I will most likely start debugging this expression. Thanks!

@jhvst
Contributor Author

jhvst commented Sep 6, 2024

It seems that this issue has now resolved itself -- initrds over 2GB boot fine. Thanks to everyone who shared their comments. This has been tested both on kexec and via ipxe netboot.

@jhvst jhvst closed this as completed Sep 6, 2024
tupakkatapa added a commit to tupakkatapa/nix-config that referenced this issue Sep 6, 2024