
singularity-tools: refactor and add definition-based image builder #224636

Draft: wants to merge 10 commits into master

ShamrockLee (Contributor) commented Apr 4, 2023

Description of changes

Status: work in progress

The original buildImage implementation is commented out to make discussion easier.

Also update trivial-builders.nix:
  • writeMultipleReferencesToFile: init (#178717)
  • writeScriptBin, writeShellScriptBin: add meta.mainProgram automatically

Things done
  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandbox = true set in nix.conf? (See Nix manual)
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 23.05 Release Notes (or backporting 22.11 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

ShamrockLee (Contributor Author) commented Apr 4, 2023

Questions:

  • What does runScript do?

  • What does runAsRoot do? Is it analogous to the setup section in the Apptainer/Singularity definition file?

  • If we would like to make the buildImage interface consistent with that of dockerTools, which builder should we align with? Key differences between Apptainer and Docker, IIUC:

    • Apptainer/Singularity images do not have layers
    • Apptainer/Singularity images are usually run immutably, without daemon or process isolation.
  • Should we replace buildImage with the definition-based implementation? Should we keep the sandbox-based implementation?

  • I'm personally optimistic about Apptainer's progress in unprivileged image building; hopefully we can eventually build the image without additional virtualization. Given the current situation, is there a lighter chroot-like mounter that can provide /var/singularity/mnt/{container,final,overlay,session,source}? Should we maintain the manual image-composition script by @posch (Beat singularity-tools up to shape #177908 (comment))?

Cc: @jbedo @SomeoneSerge @dmadisetti

ShamrockLee (Contributor Author) left a comment

Additional questions about the original implementation.

jbedo (Contributor) commented Apr 4, 2023

Questions:

  • What does runScript do?

It's the contents of the runscript that Singularity executes when the container is invoked with singularity run.
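For illustration, a minimal sketch of setting runScript, assuming the pre-refactor singularity-tools.buildImage interface (name, contents, runScript) and that runScript holds the full text of the script, shebang included. The hello package is only a stand-in payload.

```nix
{ pkgs ? import <nixpkgs> { } }:

pkgs.singularity-tools.buildImage {
  name = "hello";
  # Packages whose closures end up in the image.
  contents = [ pkgs.hello ];
  # Assumed to become the image's runscript verbatim: `singularity run`
  # on the resulting image executes this instead of the default, which
  # starts /bin/sh.
  runScript = ''
    #!${pkgs.runtimeShell}
    exec ${pkgs.hello}/bin/hello "$@"
  '';
}
```

Using `exec` replaces the wrapper shell with the program, so signals and the exit status go straight to hello.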

  • What does runAsRoot do? Is it an analogy to the setup section in the Apptainer/Singularity definition file?

Not quite: it's executed inside the container via a chroot, but with the host store available. This allows arbitrary setup of things outside the Nix store before the container-specific store is instantiated. This is analogous to dockerTools.
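A minimal sketch of what that enables, assuming runAsRoot is a shell snippet run as root in the chroot described above. Because the host store is visible there, absolute store paths such as ${pkgs.coreutils} can be used to create files outside /nix/store. The /etc contents are a hypothetical example.

```nix
{ pkgs ? import <nixpkgs> { } }:

pkgs.singularity-tools.buildImage {
  name = "hello-with-etc";
  contents = [ pkgs.hello ];
  # Assumed to run as root inside a chroot of the image tree, before the
  # image is packed; the host /nix/store is still reachable at this point.
  runAsRoot = ''
    ${pkgs.coreutils}/bin/mkdir -p /etc
    # `echo` is a shell builtin, so it needs no store path.
    echo "root:x:0:0:root:/root:/bin/sh" > /etc/passwd
    echo "root:x:0:" > /etc/group
  '';
}
```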

  • If we would like to make the buildImage interface consistent with that of dockTools, which builder are we going to align with? Key difference between Apptainer and Docker, IIUC:

    • Apptainer/Singularity images does not have layers
    • Apptainer/Singularity images are usually run immutably, without daemon or process isolation.

I'm not sure how much isolation affects the process of building an image; it's more of a runtime concern.

  • Should we replace buildImage with the definition-based implementation? Should we keep the sandbox-based implementation?
  • I'm personally optimistic about Apptainer's progress in unprivileged image building. We could hopefully build the image without additional virtualization eventually. Considering the current situation, is there a lighter chroot-like mounter to provide /var/singularity/mnt/{container,final,overlay,session,source}? Should we maintain the manually-image-composition script by @posch (Beat singularity-tools up to shape #177908 (comment))?

One thing to consider is that many places have namespaces disabled so lighter alternatives are not possible. The squashfs approach of @posch is great and wasn't possible when I wrote the initial implementation, but runAsRoot isn't possible with this. This probably doesn't matter for a majority of use cases.

ShamrockLee (Contributor Author)

One thing to consider is that many places have namespaces disabled so lighter alternatives are not possible.

What if we have user namespaces? buildFHSUserEnvBubblewrap is widely used, and there's a TODO by atemu in all-packages.nix about eventually switching to buildFHSUserEnv = buildFHSUserEnvBubblewrap.

posch's script is great, but it's a bit hacky, as neither Apptainer nor SingularityCE has published a specification of the image format. (A hand-made sandbox build has a similar problem.)

SomeoneSerge (Contributor) commented Apr 5, 2023

What if we have user namespaces? buildFHSUserEnvBubblewrap are common

(I haven't processed all of the previous notifications, but) I just wanted to note that this isn't too often the case for HPC clusters (Aalto's "Triton" only offers setuid)...

ShamrockLee (Contributor Author) commented Apr 5, 2023

Just noticed that the layout of an Apptainer/Singularity "sandbox" is very similar to the extracted content of a SIF image.

If we could figure out a way to create a sandbox, maybe we could just pack it into a squashfs image.
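A minimal sketch of that idea: lay out a sandbox-like tree in a derivation, copy the runtime closure into the tree's own /nix/store, and pack it with mksquashfs from squashfsTools. This is an illustration, not @posch's script or this PR's code; the skeleton directories and the hello payload are hypothetical choices. The result is a bare SquashFS filesystem, not a SIF, so whether a given Apptainer/SingularityCE version accepts it directly, or it still needs wrapping (for example with apptainer build), is an assumption to check.

```nix
{ pkgs ? import <nixpkgs> { } }:

let
  # Hypothetical payload; substitute the real image contents.
  contents = [ pkgs.hello ];
  # closureInfo lists the full runtime closure in $out/store-paths.
  closure = pkgs.closureInfo { rootPaths = contents; };
in
pkgs.runCommand "hello.sqfs" { nativeBuildInputs = [ pkgs.squashfsTools ]; } ''
  root="$PWD/rootfs"

  # Sandbox-like skeleton of the root filesystem.
  mkdir -p "$root"/{bin,dev,etc,proc,sys,tmp,nix/store}

  # Copy every path of the closure into the image's own /nix/store.
  while read -r path; do
    cp -a "$path" "$root/nix/store/"
  done < ${closure}/store-paths

  # Expose the program on a conventional path; the absolute store-path
  # target resolves inside the image because the closure was copied in.
  ln -s ${pkgs.hello}/bin/hello "$root/bin/hello"

  # Pack the tree; -all-root makes root own every file in the image.
  mksquashfs "$root" "$out" -noappend -all-root
''
```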

On the other hand, I'm running into a strange runInLinuxVM bug (#224889) while testing sandbox generation.

```nix
"${coreutils}/bin/ln" -s "${coreutils}/bin/env" "''${SINGULARITY_ROOTFS}/usr/bin/env"
'';
environment = {
PATH = "${lib.makeBinPath contents}:\${PATH:-}";
```
Contributor

I suppose this is the alternative to the previous approach of generating a FHS-like tree (/bin and such). We can address this in a separate PR, but I wanted to explore (ab)using ext3 and populating both /nix/store and /bin using hardlinks. This way I could hide the container's /nix with --bind /tmp:/nix and still have working sh and nix (assuming I've built them statically linked).

Contributor Author

I suppose this is the alternative to the previous approach of generating a FHS-like tree (/bin and such).

Yes. The PATH-based approach seems cleaner to me, and it's also how nix shell works. As both approaches have their use cases (Nixy idiom vs. FHS PATH compatibility), we could let users decide which to use.
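To make the two options concrete, a small sketch contrasting them. The PATH string mirrors the environment line in the diff excerpt above; the FHS-like tree uses buildEnv with pathsToLink, which is a stand-in chosen for illustration, not what this PR or the earlier implementation does. The package list is hypothetical.

```nix
{ pkgs ? import <nixpkgs> { } }:

let
  inherit (pkgs) lib;
  contents = [ pkgs.hello pkgs.coreutils ];
in
{
  # PATH-based: nothing extra in the image tree; the runtime PATH is
  # prefixed with each package's bin/ directory, much like `nix shell`.
  # `\$` keeps ${PATH:-} literal so the shell expands it at runtime.
  pathEnv = "${lib.makeBinPath contents}:\${PATH:-}";

  # FHS-like: one merged tree whose bin/ holds symlinks into every
  # package, which could then be placed at the image's root.
  fhsTree = pkgs.buildEnv {
    name = "image-root";
    paths = contents;
    pathsToLink = [ "/bin" ];
  };
}
```

The PATH variant keeps the image tree minimal, while the merged tree serves tools that hard-code paths like /bin/foo.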

We can address this in a separate PR, but I wanted to explore (ab)using ext3 and populate both /nix/store and /bin using hardlinks. This way I could hide the container's /nix with --bind /tmp:/nix and still having working sh and nix (assuming I've built them statically linked)

HPC clusters often rely on distributed filesystems, and there's a good chance those don't support hard links. Older versions of Singularity simply extract the SquashFS image on the host. If the machine has some sort of TMPDIR backed by a local filesystem such as ext4 or tmpfs, you could have it extract there and (hopefully) keep the hard links. If you don't have access to a filesystem that supports hard links, extracting such an image might run into problems.

The good news is that both Apptainer and SingularityCE now try to mount the image with the squashfuse command from the environment before falling back to extraction, and SquashFS itself supports hard links. I added squashfuse to the defaultPath and wrapper PATH of both apptainer and singularity along with the version bump (#224683). Let's get it into the stable release of Nixpkgs.

Contributor

great chance that they don't support hard links

I guess I don't really care if they're extracted as copies on the host (in fact, only one would be extracted, and the other would get hidden), as long as they are represented as hardlinks inside the image that we create.

I added squashfuse to the defaultPath and wrapper PATH of both apptainer and singularity along with the version bump

Awesome! I should have a look tomorrow!

Let's make it land on the stable release of Nixpkgs

Sounds great! But just in case it takes longer, we can and should backport it later.

posch commented Apr 28, 2023

posch's script is great, but it's a bit hacky as the neither Apptainer nor SingularityCE have published the specification about the image format. (Hand-made sandbox build also has similar problem.)

Today I tried using "apptainer build" instead of "mksquashfs" to create the container from the sandbox-like tree. It worked with only a small change to apptainer (see apptainer/apptainer#1312). Maybe that's a little less hacky.

ShamrockLee (Contributor Author) commented Apr 29, 2023

It worked with only a small change to apptainer (see apptainer/apptainer#1312 ). Maybe that's a little less hacky.

Great! That is more likely to be merged upstream than mine (apptainer/apptainer#1284).

ShamrockLee (Contributor Author)

During the work on #224828, I found that apptainer run invokes a non-interactive Bash session, which makes debugging painful. This is because the default runscript calls /bin/sh, which links to the non-interactive runtimeShell in images built by singularity-tools.buildImage.

As most distributions I have heard of use an interactive shell as their /bin/sh, would it be better to link bashInteractive to /bin/sh? Do we care about the increased closure size?
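For comparison, a per-image alternative that leaves /bin/sh alone: override runScript so that only images which opt in carry bashInteractive in their closure. This is a sketch assuming the pre-refactor buildImage interface (name, contents, runScript); the image name and package list are hypothetical.

```nix
{ pkgs ? import <nixpkgs> { } }:

pkgs.singularity-tools.buildImage {
  name = "debug-shell";
  contents = [ pkgs.bashInteractive pkgs.coreutils ];
  # Replaces the default runscript (which execs /bin/sh, i.e. the
  # non-interactive runtimeShell) so `apptainer run` starts a Bash
  # built with readline support instead.
  runScript = ''
    #!${pkgs.runtimeShell}
    exec ${pkgs.bashInteractive}/bin/bash "$@"
  '';
}
```

The trade-off is that the larger closure is paid only by images built this way, at the cost of every such image having to opt in explicitly.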

wegank added the 2.status: stale and 2.status: merge conflict labels on Mar 19, 2024
The stale bot removed the 2.status: stale label on Mar 20, 2024
5 participants