feat(cpio): improve initramfs image performance and efficiency via cpio reflinks #1531

Merged
merged 8 commits into from Nov 24, 2021


ddiss
Contributor

@ddiss ddiss commented Jun 8, 2021

Changes

This patchset attempts to speed up initramfs generation for some common (Btrfs / XFS) setups by having Dracut make heavier use of reflinks (AKA copy-on-write clones) during initramfs generation. A good portion of an uncompressed+unstripped initramfs image is duplicate data, which really shouldn't need to be shuffled around when on the same COW clone capable FS.

This is a rework of my #1148 feature submission. Instead of relying on the GNU cpio patchset for copy-on-write FS optimized reflink I/O, a new dracut-cpio tool is provided for cpio archive creation.
My motivating factors for dropping GNU cpio in favour of dracut-cpio are:

  • GNU cpio appears to receive very little attention upstream
  • relative simplicity of implementation
    • dracut-cpio only needs to provide support for cpio newc archive creation for extraction by the kernel
    • rust's standard library supports copy_file_range() natively
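
For readers unfamiliar with the format, the following is a minimal sketch of what "newc archive creation" involves for a single entry: a 110-byte ASCII header (the `070701` magic followed by thirteen 8-digit hex fields), the NUL-terminated path padded to a 4-byte boundary, then the file data padded likewise. The `NewcEntry` type and the `pad4` and `encode_newc` names are hypothetical and are not taken from the dracut-cpio sources. The sketch assumes the entry starts at a 4-byte-aligned archive offset and that sizes fit in 32 bits.

```rust
/// Metadata and content for one archive member. Field and type names are
/// illustrative only (assumptions, not the dracut-cpio implementation).
struct NewcEntry<'a> {
    ino: u32,
    mode: u32,
    uid: u32,
    gid: u32,
    nlink: u32,
    mtime: u32,
    name: &'a str,
    data: &'a [u8],
}

/// Number of zero bytes needed to round `len` up to a multiple of 4.
fn pad4(len: usize) -> usize {
    (4 - len % 4) % 4
}

/// Encode one entry: header, NUL-terminated name, padding, data, padding.
/// Assumes the entry begins at a 4-byte-aligned offset in the archive.
fn encode_newc(e: &NewcEntry) -> Vec<u8> {
    // namesize counts the terminating NUL byte.
    let namesize = e.name.len() + 1;
    let mut out = format!(
        "070701{:08X}{:08X}{:08X}{:08X}{:08X}{:08X}{:08X}{:08X}{:08X}{:08X}{:08X}{:08X}{:08X}",
        e.ino,
        e.mode,
        e.uid,
        e.gid,
        e.nlink,
        e.mtime,
        e.data.len(), // filesize
        0u32,         // devmajor
        0u32,         // devminor
        0u32,         // rdevmajor
        0u32,         // rdevminor
        namesize,
        0u32,         // check (unused by the plain newc variant)
    )
    .into_bytes();

    out.extend_from_slice(e.name.as_bytes());
    out.push(0);
    // Header + name is padded to a 4-byte boundary.
    let padded = out.len() + pad4(out.len());
    out.resize(padded, 0);

    out.extend_from_slice(e.data);
    // File data is padded to a 4-byte boundary as well.
    let padded = out.len() + pad4(out.len());
    out.resize(padded, 0);
    out
}
```

Because inode number, uid/gid and mtime are plain inputs here, passing fixed values yields byte-identical output across runs. A complete archive would additionally end with a `TRAILER!!!` entry, which this sketch omits.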

The new dracut-cpio functionality is disabled by default. It can be explicitly enabled by building with the --enable-dracut-cpio configure option and then calling dracut with --enhanced-cpio.

Performance

Preliminary benchmarks indicate a significant improvement in initramfs creation and kernel extraction times with reflinks, as shown via the Dracut runtime and QEMU boot time values respectively.
Storage utilization is also improved with initramfs reflinks, as shown by the significant reduction in exclusive extents. Shared extents represent space reclaimed through deduplication of initramfs source file and cpio data segments.

| Btrfs | Dracut xz | Dracut zstd | Dracut reflink |
|---|---|---|---|
| Dracut runtime | 16.809s (ref) | 12.935s (-23.05%) | 10.292s (-38.59%) |
| QEMU boot time: kernel to /init | 2361ms (ref) | 951.0ms (-59.71%) | 857.8ms (-63.66%) |
| initramfs fiemap data | | | |
| total bytes | 14647296 (ref) | 16191488 (+10.54%) | 34447360 (+135.2%) |
| shared (deduplicated) | 0 (ref) | 0 (+-0.00%) | 31309824 (note) |
| exclusive | 14647296 (ref) | 16191488 (+10.54%) | 3137536 (note) |

note: Btrfs fiemap results varied significantly across multiple runs, therefore shared and exclusive values above should only be seen as a guide. In contrast, XFS results were consistent across all runs:

| XFS | Dracut xz | Dracut zstd | Dracut reflink |
|---|---|---|---|
| Dracut runtime | 14.180s (ref) | 10.437s (-26.40%) | 9.7687s (-31.11%) |
| QEMU boot time: kernel to /init | 2373ms (ref) | 950.9ms (-59.92%) | 859.5ms (-63.78%) |
| initramfs fiemap data | | | |
| total bytes | 14675968 (ref) | 16158720 (+10.10%) | 34689024 (+136.4%) |
| shared (deduplicated) | 0 (ref) | 0 (+-0.00%) | 26525696 |
| exclusive | 14675968 (ref) | 16158720 (+10.10%) | 8163328 (-44.37%) |
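
As a rough cross-check of how the figures above relate, the percentages are relative changes against the Dracut xz reference column, and the shared and exclusive byte counts sum to the total. The sketch below uses the XFS values; the `pct_change` and `shared_fraction` names are made up for illustration.

```rust
/// Relative change of `value` against `reference`, in percent.
fn pct_change(reference: f64, value: f64) -> f64 {
    (value - reference) / reference * 100.0
}

/// Fraction of an image's bytes that live in shared (reflinked) extents.
fn shared_fraction(shared: u64, exclusive: u64) -> f64 {
    shared as f64 / (shared + exclusive) as f64
}

/// XFS "Dracut reflink" column: shared + exclusive equals total bytes.
const XFS_REFLINK_SHARED: u64 = 26_525_696;
const XFS_REFLINK_EXCLUSIVE: u64 = 8_163_328;
const XFS_REFLINK_TOTAL: u64 = 34_689_024;

/// XFS "Dracut xz" reference column: everything is exclusive.
const XFS_XZ_EXCLUSIVE: u64 = 14_675_968;
```

With these values, `pct_change(XFS_XZ_EXCLUSIVE as f64, XFS_REFLINK_EXCLUSIVE as f64)` is roughly -44.4, and `shared_fraction(XFS_REFLINK_SHARED, XFS_REFLINK_EXCLUSIVE)` is roughly 0.76. So the reflinked image is larger in total, but only about a quarter of it occupies space that is not already used by the source files.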

The Dracut xz case corresponds to SUSE Dracut version 055, patched with dracut-cpio reflink support, but configured to run with current SUSE defaults:
GNU cpio archiving, xz -0 --check=crc32 --memlimit-compress=50% compression and strip symbol discard.
The Dracut zstd case matches Dracut xz except for compression which is configured to use zstd -3 -T0.
The Dracut reflink case has dracut-cpio alignment and reflinks enabled via enhanced_cpio=yes. To ensure successful extent sharing, initramfs compression and symbol discard are disabled, alongside use of a reflink friendly Dracut staging area (tmpdir=/boot).

These benchmarks were performed on Tumbleweed 20210924 (5.14.6 kernel) virtual machines assigned 2 vCPUs and 8GiB RAM. The QEMU/KVM hypervisor host was running the same OS and stored the raw VM disk images in memory (tmpfs).
Importantly, bootloader initramfs image read times were not evaluated due to a lack of time and existing instrumentation.

Memory backed storage is significantly less vulnerable to the fragmentation effects of CoW reflinks compared to HDDs. Further benchmarking on SSDs and HDDs is necessary, at least before considering dracut-cpio as a complete replacement for GNU cpio archive creation. These benchmarks were run on a completely different machine and kernel from those used for #1148, so they shouldn't be compared with those numbers.

Special thanks to @Firstyear for helping me get my rust changes in shape for upstream submission.

Checklist

  • I have tested it locally
  • I have reviewed and updated any documentation if relevant
  • I am providing new code and test(s) for it

@github-actions github-actions bot added the test Issues related to testing label Jun 8, 2021
@johannbg
Collaborator

@ddiss note that @haraldh is on vacation until end of June so expect late review from him

@stale

stale bot commented Jul 12, 2021

This issue is being marked as stale because it has not had any recent activity. It will be closed if no further activity occurs. If this is still an issue in the latest release of Dracut and you would like to keep it open please comment on this issue within the next 7 days. Thank you for your contributions.

@stale stale bot added the stale communication is stuck label Jul 12, 2021
@johannbg johannbg removed the stale communication is stuck label Jul 17, 2021
@johannbg
Collaborator

@ddiss how's the progress going on @Firstyear and @tpgxyz pointers? @haraldh ping

@ddiss
Contributor Author

ddiss commented Jul 26, 2021

@ddiss how's the progress going on @Firstyear and @tpgxyz pointers? @haraldh ping

I'm back from leave so will return to this in the coming days. Thanks for the ping.

@the8472

the8472 commented Jul 29, 2021

rust's standard library supports copy_file_range() natively

Please keep in mind that in std it is considered an optimization and not part of our API guarantees. We had to disable copy_file_range before due to regressions. So if you want to rely on it, you may want to write a test to have a canary, e.g. by tracing syscalls for a particular test or (assuming a suitable filesystem) checking that extents really are shared.

Other options would be using a crate that makes stronger guarantees or documenting that it's best-effort.

@ddiss
Contributor Author

ddiss commented Jul 29, 2021

rust's standard library supports copy_file_range() natively

Please keep in mind that in std it is considered an optimization and not part of our API guarantees. We had to disable copy_file_range before due to regressions. So if you want to rely on it, you may want to write a test to have a canary, e.g. by tracing syscalls for a particular test or (assuming a suitable filesystem) checking that extents really are shared.

Other options would be using a crate that makes stronger guarantees or documenting that it's best-effort.

Ack, understood. Best-effort is fine, as that's what we already get from the kernel with regard to performing COW reflink or splice fallback. Thanks for the clarification.
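
To illustrate the best-effort behaviour discussed above, here is a minimal sketch of appending a source file's data to an archive file with `std::io::copy()`. The `append_file_data` name is hypothetical and this is not the dracut-cpio code. Whether the copy ends up as a `copy_file_range()` call, and whether that in turn results in a reflink, is left to the standard library, kernel and filesystem; the code is correct either way and only the performance differs.

```rust
use std::fs::File;
use std::io::{self, Seek, SeekFrom};
use std::path::Path;

/// Append the contents of `src` to the end of `archive`, returning the
/// number of bytes copied. (Hypothetical helper, for illustration only.)
///
/// For file-to-file copies on Linux, std may use copy_file_range() as an
/// optimization, but this is not an API guarantee; it can fall back to a
/// plain read/write loop. The archive is positioned with an explicit seek
/// rather than opened in append mode, since copy_file_range() rejects
/// O_APPEND destinations.
fn append_file_data(src: &Path, archive: &mut File) -> io::Result<u64> {
    let mut input = File::open(src)?;
    archive.seek(SeekFrom::End(0))?;
    io::copy(&mut input, archive)
}
```

A caller would write the entry header first and then call `append_file_data()` for the data segment. A canary test of the kind suggested above would additionally need to inspect syscalls or extent sharing, which is outside what this sketch covers.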

@ddiss
Contributor Author

ddiss commented Aug 19, 2021

Changes since last version:

  • perform configure time check for Cargo presence
  • fix lint-shell warnings

@ddiss ddiss changed the title improve initramfs image performance and efficiency via cpio reflinks feat(cpio): improve initramfs image performance and efficiency via cpio reflinks Aug 27, 2021
@ddiss
Contributor Author

ddiss commented Aug 27, 2021

Changes since last version:

  • more shellcheck fixes
  • move dracut-cpio sources under src/dracut-cpio

@TomasTomecek

/packit build

@ddiss
Contributor Author

ddiss commented Sep 2, 2021

Changes since last version:

  • rename --cpio-reflink parameter to --enhanced-cpio to reflect the fact that it won't necessarily result in reflinked data
  • support compression alongside dracut-cpio
  • move dinfo call after dracut-init.sh is loaded (thanks @mwilck)
  • fix cmdline vs config parameter precedence (thanks @mwilck)
  • minor debug output and variable rewording

@ddiss
Contributor Author

ddiss commented Sep 3, 2021

I've tacked on one extra commit which adds test coverage for the dracut.sh --enhanced-cpio code-path, as well as kernel cpio archive extraction.

The TEST-62-CPIO test should probably now be renamed TEST-62-SKIPCPIO... I'll squash that in with the next rebase.

Signed-off-by: David Disseldorp <ddiss@suse.de>
Individual test scripts may change working directory, so relative paths
should be avoided.

Signed-off-by: David Disseldorp <ddiss@suse.de>
Signed-off-by: David Disseldorp <ddiss@suse.de>
Crosvm's rust argument library is very small and simple, while still
providing helpful functionality. It will be consumed by dracut-cpio in a
subsequent commit.

The unmodified, BSD licensed argument.rs source is lifted as-is from
https://chromium.googlesource.com/chromiumos/platform/crosvm
(release-R92-13982.B b6ae6517aeef9ae1e3a39c55b52f9ac6de8edb31).
The one-line crosvm.rs wrapper is needed to ensure that crosvm::argument
imports continue to work.

Signed-off-by: David Disseldorp <ddiss@suse.de>
dracut-cpio is a minimal cpio archive creation utility written in Rust.
It provides support for a minimal set of features needed to create
performant and space-efficient initramfs archives:
- "newc" archive format only
- reproducible; inode numbers, uid/gid and mtime can be explicitly set
- data segment copy-on-write reflinks
  + using Rust io::copy()'s native copy_file_range() support[1]
  + optional archive data segment alignment for optimal reflink use[2]
- hardlink support
- comprehensive tests asserting GNU cpio binary output compatibility

1. Rust io::copy() copy_file_range()
   rust-lang/rust#75272

2. Data segment alignment
   We're bending the newc spec a bit to inject zeros after the file path
   to provide data segment alignment. These zeros are accounted for in
   the namesize, but some applications may only expect a single
   zero-terminator (and 4 byte alignment). GNU cpio and Linux initramfs
   handle this fine as long as PATH_MAX isn't exceeded.
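
A minimal sketch of the padding arithmetic described in note 2, assuming a 110-byte newc header and a namesize limit of PATH_MAX (4096). The `aligned_namesize` name, and the choice to return `None` when the padded name would exceed the limit, are assumptions made for illustration rather than the dracut-cpio implementation.

```rust
/// Size of a newc header: 6-byte magic plus thirteen 8-digit hex fields.
const NEWC_HDR_LEN: u64 = 110;
/// Upper bound on namesize assumed by this sketch.
const PATH_MAX: u64 = 4096;

/// Given the archive offset of an entry's header and the path length
/// (without the NUL terminator), return the namesize, including injected
/// zero padding, that makes the data segment start at a multiple of `align`.
///
/// Returns None when the padded namesize would exceed PATH_MAX; a caller
/// would then fall back to the regular 4-byte padding (an assumption of
/// this sketch).
fn aligned_namesize(hdr_off: u64, path_len: u64, align: u64) -> Option<u64> {
    // The alignment must be compatible with newc's own 4-byte padding.
    assert!(align >= 4 && align % 4 == 0);

    let namesize = path_len + 1; // includes the NUL terminator
    let data_off = hdr_off + NEWC_HDR_LEN + namesize;
    // Extra zeros appended after the path, all counted in namesize.
    let pad = (align - data_off % align) % align;
    let padded = namesize + pad;

    if padded > PATH_MAX {
        None
    } else {
        Some(padded)
    }
}
```

For example, an entry at archive offset 0 with the one-character path `a` and 4096-byte alignment gets a namesize of 3986, so the data segment starts at offset 110 + 3986 = 4096.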

Signed-off-by: David Disseldorp <ddiss@suse.de>
If configured with --enable-dracut-cpio, call cargo to build the
dracut-cpio release binary.

Signed-off-by: David Disseldorp <ddiss@suse.de>
The new dracut-cpio binary is capable of performing copy-on-write
optimized initramfs archive creation, but due to the rust dependency
isn't built / installed by default.
This change adds a new "--enhanced-cpio" parameter for dracut which
sees dracut-cpio called for archive creation instead of GNU cpio.

Signed-off-by: David Disseldorp <ddiss@suse.de>
dracut-cpio already carries a bunch of unit tests covering compression
and GNU cpio extraction. The purpose of these tests is to exercise the
dracut.sh --enhanced-cpio code-paths as well as kernel cpio archive
extraction.

Signed-off-by: David Disseldorp <ddiss@suse.de>
@ddiss
Contributor Author

ddiss commented Sep 17, 2021

Changes since last version:

  • rebase atop f9c7dea
  • rename TEST-62-CPIO to TEST-62-SKIPCPIO, so that it's distinct from dracut-cpio tests

@ddiss
Contributor Author

ddiss commented Sep 20, 2021

Ping - is there anything I can do to move this forwards?

@ddiss
Contributor Author

ddiss commented Sep 29, 2021

I've updated the cover letter to include more recent benchmark results, which include XFS and Dracut zstd numbers for comparison.

@ddiss
Contributor Author

ddiss commented Sep 29, 2021

One other note is that we encountered what appears to be a bug in GRUB's Btrfs driver when reading initramfs images with a large number of shared extents: https://bugzilla.opensuse.org/show_bug.cgi?id=1190982 . A Btrfs developer is investigating the issue.
The GRUB issue was not seen with reflinked initramfs images on XFS.

@ddiss
Contributor Author

ddiss commented Oct 7, 2021

@haraldh ping - I'd really appreciate some feedback on whether this can be merged. Just to make it clear, there's no change to default behaviour here, the new functionality is only enabled if built with configure --enable-dracut-cpio and run with dracut --enhanced-cpio.

@ddiss
Contributor Author

ddiss commented Oct 15, 2021

@haraldh ping - I'd really appreciate some feedback on whether this can be merged. Just to make it clear, there's no change to default behaviour here, the new functionality is only enabled if built with configure --enable-dracut-cpio and run with dracut --enhanced-cpio.

Ping again... sorry to be a pain.

@ddiss
Contributor Author

ddiss commented Oct 18, 2021

One other note is that we encountered what appears to be a bug in GRUB's Btrfs driver when reading initramfs images with a large number of shared extents: https://bugzilla.opensuse.org/show_bug.cgi?id=1190982 . A Btrfs developer is investigating the issue.

FWIW, @adam900710 posted a patch which addresses the above GRUB Btrfs bug:
https://lists.gnu.org/archive/html/grub-devel/2021-10/msg00138.html

@mwilck
Contributor

mwilck commented Oct 20, 2021

FTR, I've done quite a few test runs with David's reflinking cpio and have observed no issues, with disk space savings on par with current compression. Here is an example:

root@bremer:boot> btrfs fi du i-xz-0 i-reflink i-zst-3
     Total   Exclusive  Set shared  Filename
  17.14MiB    17.14MiB       0.00B  i-xz-0
  40.00MiB    10.61MiB    29.39MiB  i-reflink
  18.94MiB    18.94MiB       0.00B  i-zst-3

The performance is also very good; it's faster than "zstd -3 -T0" in almost all cases.

Collaborator

@johannbg johannbg left a comment

I don't see any reason at this point why this can't be merged.
@haraldh could you review this and add your feedback?
