vminit: replace initramfs with erofs rootfs #208

Draft
dmcgowan wants to merge 1 commit into containerd:main from dmcgowan:use-erofs-rootfs
@dmcgowan
Member

Replace the gzip-compressed CPIO initramfs with an EROFS block device image as the VM root filesystem, eliminating the tmpfs switch_root that was required to make pivot_root available to containers.

The kernel boots directly into the erofs image on /dev/vda via 'root=/dev/vda rootfstype=erofs ro init=/sbin/vminitd', removing the initramfs decompression, which contributed significantly to kernel boot time.
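As an illustration of how the boot command line above is put together, here is a minimal Go sketch. The `bootParams` type and its `cmdline` method are hypothetical names introduced for this example; they are not taken from instance.go, which may build the string differently.

```go
package main

import (
	"fmt"
	"strings"
)

// bootParams is a hypothetical holder for the values named in the PR
// description; the real shim may simply use a constant string.
type bootParams struct {
	RootDevice string // block device the kernel mounts as /
	FSType     string // filesystem type of the root image
	Init       string // path of the init binary inside the image
	ReadOnly   bool   // mount the root filesystem read-only
}

// cmdline joins the parameters into a kernel command line in the same
// order as the one quoted in the PR description.
func (p bootParams) cmdline() string {
	args := []string{"root=" + p.RootDevice, "rootfstype=" + p.FSType}
	if p.ReadOnly {
		args = append(args, "ro")
	}
	args = append(args, "init="+p.Init)
	return strings.Join(args, " ")
}

func main() {
	p := bootParams{RootDevice: "/dev/vda", FSType: "erofs", Init: "/sbin/vminitd", ReadOnly: true}
	fmt.Println(p.cmdline())
}
```

With the values from the description, the sketch reproduces the quoted command line.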

Build changes:

  • Dockerfile: replace cpio/gzip initrd-build stage with erofs-build stage using mkfs.erofs -zlz4; vminitd placed at /sbin/vminitd
  • docker-bake.hcl, Makefile: rename initrd target to rootfs

Host shim changes:

  • instance.go: search for nerdbox-rootfs.erofs; add it as the first virtio-blk device in NewInstance so it is always /dev/vda; pass the erofs boot cmdline to krun_set_kernel with no initrd
  • krun.go: change SetKernel's initramfs parameter to unsafe.Pointer so that an empty string maps to a C null pointer (purego converts an empty Go string to a non-null pointer, causing libkrun to fail)
  • mount.go: start diskAllocator at 'b' since /dev/vda is now the VM rootfs; container disks begin at /dev/vdb
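The disk-letter shift in the mount.go item can be pictured with a small Go sketch. This is an assumption-laden illustration: the type shares the name `diskAllocator` with the one mentioned above, but its fields, the `newDiskAllocator` and `allocate` helpers, and the error handling are hypothetical and may not match the real code.

```go
package main

import (
	"errors"
	"fmt"
)

// diskAllocator hands out virtio-blk device names in order.
// Hypothetical sketch; not the implementation in mount.go.
type diskAllocator struct {
	next byte // letter for the next device, e.g. 'b' for /dev/vdb
}

// newDiskAllocator starts at 'b' because /dev/vda is reserved for the
// VM root filesystem.
func newDiskAllocator() *diskAllocator {
	return &diskAllocator{next: 'b'}
}

// allocate returns the next free device name, or an error once the
// single-letter names are exhausted.
func (a *diskAllocator) allocate() (string, error) {
	if a.next > 'z' {
		return "", errors.New("no virtio-blk device letters left")
	}
	name := "/dev/vd" + string(rune(a.next))
	a.next++
	return name, nil
}

func main() {
	a := newDiskAllocator()
	for i := 0; i < 3; i++ {
		name, err := a.allocate()
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Println(name)
	}
}
```

Starting the counter one letter later keeps container disks from ever colliding with the rootfs device, provided the rootfs is always attached first.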

Guest init changes:

  • initd.go: replace the mount setup with systemMounts, which mounts proc, sysfs, cgroup2, run, tmp, and a tmpfs over /etc (so runtime writes such as resolv.conf succeed on the read-only erofs). /dev is omitted because CONFIG_DEVTMPFS_MOUNT=y mounts devtmpfs before init starts, and a redundant mount returns EBUSY on the block-device root.
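A rough Go sketch of what such a mount table could look like follows. The `mountSpec` type, the `hasTarget` helper, the source names, and the target paths (for example /sys/fs/cgroup) are assumptions made for illustration; only the set of filesystems and the omission of /dev come from the description above. The sketch only lists the mounts and does not perform them; a real init would pass each entry to something like `unix.Mount` from golang.org/x/sys/unix.

```go
package main

import "fmt"

// mountSpec describes one mount that init would perform.
// Hypothetical type for this sketch.
type mountSpec struct {
	Source string
	Target string
	FSType string
}

// systemMounts lists the filesystems named in the PR description.
// /dev is deliberately absent: with CONFIG_DEVTMPFS_MOUNT=y the kernel
// has already mounted devtmpfs there before init starts.
func systemMounts() []mountSpec {
	return []mountSpec{
		{Source: "proc", Target: "/proc", FSType: "proc"},
		{Source: "sysfs", Target: "/sys", FSType: "sysfs"},
		{Source: "cgroup2", Target: "/sys/fs/cgroup", FSType: "cgroup2"},
		{Source: "tmpfs", Target: "/run", FSType: "tmpfs"},
		{Source: "tmpfs", Target: "/tmp", FSType: "tmpfs"},
		// Writable overlay so files such as resolv.conf can be created
		// even though the erofs root is read-only.
		{Source: "tmpfs", Target: "/etc", FSType: "tmpfs"},
	}
}

// hasTarget reports whether any entry mounts onto the given path.
func hasTarget(mounts []mountSpec, target string) bool {
	for _, m := range mounts {
		if m.Target == target {
			return true
		}
	}
	return false
}

func main() {
	for _, m := range systemMounts() {
		fmt.Printf("%s on %s type %s\n", m.Source, m.Target, m.FSType)
	}
}
```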

Copilot AI review requested due to automatic review settings May 29, 2026 06:52
Copilot AI left a comment

Pull request overview

Replaces the gzip CPIO initramfs with an EROFS block-device rootfs mounted as /dev/vda. The kernel now boots directly into the EROFS image (root=/dev/vda rootfstype=erofs ro init=/sbin/vminitd), avoiding the previous initramfs decompression + switch_root. Build, host shim, and guest init code are updated accordingly.

Changes:

  • Build: replace initrd-build Dockerfile stage with erofs-build (mkfs.erofs -zlz4), and rename the bake target / Make rule / Taskfile entry from initrd to rootfs.
  • Host shim: add the EROFS rootfs as the first virtio-blk disk (/dev/vda), shift container disk allocation to start at vdb, move dynamic mount targets to /run/mnt, and fix the libkrun SetKernel binding so that an empty initrd argument maps to a C NULL pointer.
  • Guest init: drop the explicit devtmpfs mount (kernel auto-mounts it), mount a tmpfs over /etc so resolv.conf/hosts writes succeed on the read-only rootfs, fix a tmpsfs typo, and stop forcing NoPivot=true so runc honors the runtime option.
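The NULL-pointer fix in the host shim item can be sketched in Go as follows. The helper name `optionalCString` is hypothetical and this is not the code in krun.go; it only shows one way to turn an empty Go string into a nil pointer and a non-empty one into a pointer to a NUL-terminated buffer, under the assumption that the caller keeps the buffer alive for the duration of the foreign call.

```go
package main

import (
	"fmt"
	"runtime"
	"unsafe"
)

// optionalCString returns a nil pointer for the empty string, so the C
// side sees NULL, and otherwise a pointer to a NUL-terminated copy.
// The backing slice is returned as well so the caller can keep it
// alive while the pointer is in use.
func optionalCString(s string) (unsafe.Pointer, []byte) {
	if s == "" {
		return nil, nil
	}
	buf := make([]byte, len(s)+1) // final byte stays zero: the terminator
	copy(buf, s)
	return unsafe.Pointer(&buf[0]), buf
}

func main() {
	p, _ := optionalCString("")
	fmt.Println("empty string gives nil pointer:", p == nil)

	p, buf := optionalCString("initrd.img")
	fmt.Println("non-empty string gives non-nil pointer:", p != nil)
	// A real binding would call into the C library here, before the
	// buffer is allowed to be collected.
	runtime.KeepAlive(buf)
}
```

Making the parameter an explicit pointer moves the "no initrd" decision to the caller instead of relying on how the binding layer converts an empty string.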

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| Dockerfile | Replaces the cpio/gzip initrd stage with an mkfs.erofs -zlz4 rootfs stage, lays out /sbin/vminitd, /sbin/crun, mount-point directories, and var/run -> /run. |
| docker-bake.hcl | Renames the initrd bake target to rootfs targeting the new erofs stage. |
| Makefile | Renames _output/nerdbox-initrd target to _output/nerdbox-rootfs.erofs and wires it to build:rootfs. |
| Taskfile.yml | Renames build:initrd to build:rootfs and updates descriptions. |
| README.md | Updates documented artifact name to nerdbox-rootfs.erofs. |
| internal/vm/libkrun/instance.go | Looks up nerdbox-rootfs.erofs, adds it as the first virtio-blk disk, stores rootfsPath, and boots with the new kernel cmdline and no initrd. |
| internal/vm/libkrun/krun.go | Changes SetKernel's initramfs to unsafe.Pointer so "" becomes a NULL C pointer. |
| internal/shim/task/mount.go | Starts the shared diskAllocator at 'b' (since vda is the VM rootfs) and moves bind/block VM mount targets to /run/mnt/.... |
| internal/shim/task/mount_test.go | Updates expected disk IDs, device names, and mount paths to reflect vdb+ and /run/mnt. |
| internal/vminit/process/init.go | Uses p.NoPivotRoot instead of forcing NoPivot: true in runc.CreateOpts. |
| pkg/vminit/initd/initd.go | Restructures systemInit to return a single error, adds the DHCP renewer goroutine inside it, fixes the tmpfs source typo, replaces the explicit devtmpfs mount with a tmpfs-over-/etc mount, and logs system-init elapsed time. |

Note (outside diff): .github/workflows/ci.yml:250-264 still references nerdbox-initrd (including file _output/nerdbox-initrd), and .github/workflows/benchmarks.yml:109 still labels the step "initrd and shim". CI will fail because the build no longer produces _output/nerdbox-initrd. These files are not part of this diff, but should be updated alongside this PR.



Signed-off-by: Derek McGowan <derek@mcg.dev>