[RFC] feature: use reflinks for extent sharing between initramfs source and archive data #1141

Closed
ddiss opened this issue Mar 3, 2021 · 4 comments
Labels
enhancement Issue adding new functionality

Comments

@ddiss
Contributor

ddiss commented Mar 3, 2021

Proposal

I think we could speed up initramfs generation for some common (Btrfs / XFS)
setups by having dracut make heavier use of reflinks / COW clones during
initramfs generation. I'd guess >95% of an uncompressed+unstripped initramfs
image is duplicate data, which really shouldn't need to be shuffled
around when on the same COW clone capable FS.

Dracut already uses cp --reflink=auto when shuffling most things
into the /var/tmp staging area, so it should "just" be a matter of
making the cpio archive generation process clone-range aware
and dropping compression altogether.
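
As a rough illustration of what "clone-range aware" archive writing could look like, here is a minimal Python sketch (not the GNU cpio patch itself). The helper name `copy_into_archive` is hypothetical. It tries `os.copy_file_range` (Linux, Python 3.8+) and falls back to plain read/write if the call is unavailable or fails. Whether the kernel actually shares extents depends on the filesystem and on alignment.

```python
import os


def copy_into_archive(src_path, archive_fd, chunk=1 << 20):
    """Append the data of src_path at archive_fd's current offset.

    Hypothetical helper: tries copy_file_range() first so a CoW filesystem
    may share extents, then falls back to read/write for whatever is left.
    Returns the number of bytes copied.
    """
    src_fd = os.open(src_path, os.O_RDONLY)
    try:
        total = os.fstat(src_fd).st_size
        remaining = total
        cfr = getattr(os, "copy_file_range", None)  # absent on non-Linux
        try:
            while cfr is not None and remaining > 0:
                # With no explicit offsets, both file positions advance.
                n = cfr(src_fd, archive_fd, remaining)
                if n == 0:
                    break
                remaining -= n
        except OSError:
            pass  # e.g. EXDEV, ENOSYS, EOPNOTSUPP: use the fallback below

        # Fallback path: ordinary read/write for any bytes not yet copied.
        while remaining > 0:
            buf = os.read(src_fd, min(chunk, remaining))
            if not buf:
                break
            view = memoryview(buf)
            while len(view) > 0:
                view = view[os.write(archive_fd, view):]
            remaining -= len(buf)
        return total - remaining
    finally:
        os.close(src_fd)
```

The archive descriptor must not be opened with `O_APPEND`, since `copy_file_range` rejects such descriptors.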

This should allow for:

  • improved space efficiency
    • initramfs contents wouldn't be duplicated on disk
  • improved performance
    • initramfs image needn't be stripped / compressed / decompressed
    • initramfs generation would mostly perform metadata I/O
    • there may be some drawbacks due to fragmentation, but that would hopefully be compensated by the removal of compression / decompression

The following conditions would need to hold for dracut to successfully use reflinks (otherwise it would fall back to read/write):

  • root, boot and dracut staging (/var/tmp) exist on the same Btrfs or XFS filesystem
  • paths don't have nocow flags set
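
One way to test these conditions without inspecting mount tables is to simply probe: try to clone a scratch file from one directory into the other and see whether the kernel accepts it. The sketch below is an illustration only, not Dracut's logic. `can_reflink` is a hypothetical name, and the `FICLONE` value is the ioctl number on x86 and ARM Linux (an assumption for other ABIs). It does not inspect nocow flags on individual paths.

```python
import fcntl
import tempfile

# ioctl request number for FICLONE, i.e. _IOW(0x94, 9, int).
# Assumption: this encoding holds on x86/ARM Linux; some ABIs differ.
FICLONE = 0x40049409


def can_reflink(src_dir, dst_dir):
    """Return True if a scratch file in src_dir can be cloned into dst_dir.

    Hypothetical probe: creates two temporary files, attempts a whole-file
    clone, and removes both files again regardless of the outcome.
    """
    with tempfile.NamedTemporaryFile(dir=src_dir) as src, \
            tempfile.NamedTemporaryFile(dir=dst_dir) as dst:
        src.write(b"x" * 4096)
        src.flush()
        try:
            # Ask the filesystem to make dst share all of src's extents.
            fcntl.ioctl(dst.fileno(), FICLONE, src.fileno())
        except OSError:
            return False  # not supported here: caller should read/write
        return True
```

A caller would run something like `can_reflink("/usr/lib", "/var/tmp")` once and pick the copy strategy based on the result.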

Work-in-progress implementation

Luis and I made some changes to GNU cpio to copy data between source and archive via the copy_file_range syscall. I've pushed this patchset to https://github.com/ddiss/cpio/tree/copy_file_range_2_13
Both XFS and Btrfs require proper alignment to ensure that copy_file_range actually results in extent sharing. To do this I worked on a Dracut padcpio binary which inserts dummy pad files into the initramfs cpio archive. The new binary, as well as Dracut logic to call cpio with the new parameters, can be found at https://github.com/ddiss/dracut/tree/cpio_cfr_align .
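
The padding arithmetic can be sketched as follows. This is not the actual padcpio code, just an illustration under stated assumptions: a 4096-byte filesystem block size, the standard newc layout (110-byte header, NUL-terminated name, 4-byte padding), and hypothetical function names. It computes how many data bytes a pad file needs so that the data of the following entry starts on a block boundary.

```python
NEWC_HDR_LEN = 110  # fixed size of a newc header


def align_up(n, alignment):
    """Round n up to the next multiple of alignment."""
    return (n + alignment - 1) // alignment * alignment


def newc_data_offset(entry_off, name):
    """Archive offset where an entry's file data begins."""
    # Header, then the name plus its NUL terminator, padded to 4 bytes.
    return align_up(entry_off + NEWC_HDR_LEN + len(name) + 1, 4)


def pad_data_len(entry_off, pad_name, next_name, block=4096):
    """Data length for a pad file placed at entry_off (a multiple of 4)
    so that the data of the entry named next_name is block-aligned."""
    pad_data_start = newc_data_offset(entry_off, pad_name)
    next_hdr = align_up(NEWC_HDR_LEN + len(next_name) + 1, 4)
    # Every term is a multiple of 4, so the result needs no extra padding.
    return -(pad_data_start + next_hdr) % block
```

For example, with an empty archive, a pad file named `pad0` and a next entry named `bin/sh`, `pad_data_len(0, "pad0", "bin/sh")` yields 3860: the pad data spans offsets 116 to 3976, the next header and name take 120 bytes, and the data of `bin/sh` starts at offset 4096.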

Needless to say, both repos are WIP, so they may result in data corruption or other disasters. At this stage I'm interested in some feedback on the approach. I've done some initial benchmarks atop Btrfs, with positive results in terms of both runtime and space efficiency. I'll try to post some actual numbers in the coming days.

@ddiss ddiss added the enhancement Issue adding new functionality label Mar 3, 2021
@haraldh
Collaborator

haraldh commented Mar 3, 2021

cpio already has:

  /* Fill up the output block.  */
  tape_clear_rest_of_block (out_file_des);
  tape_empty_output_buffer (out_file_des);

Couldn't this be used to implement a padding option in cpio itself?

@ddiss
Contributor Author

ddiss commented Mar 3, 2021

Couldn't this be used to implement a padding option in cpio itself?

The cpio newc format doesn't allow for arbitrary padding, so I had to inject it via the new pad files in the initramfs image. IMO this logic isn't suitable for GNU cpio. Another option would be to drop GNU cpio and provide a Dracut-specific cpio archive generator which provides padding (and performs copy_file_range, etc.).
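
To make the format constraint concrete, here is a minimal sketch of a newc entry writer. The function name and default field values are assumptions for illustration. The layout is a 6-byte magic followed by thirteen 8-digit hex fields, then the name and the data, each padded only to a 4-byte boundary. There is no field for requesting larger alignment, which is why padding has to be expressed as extra archive members.

```python
def newc_entry(name, data=b"", mode=0o100644, ino=0):
    """Serialize one newc archive member (hypothetical minimal writer).

    Ownership, timestamps and device numbers are zeroed for brevity.
    """
    name_b = name.encode() + b"\0"  # namesize includes the terminator
    fields = (
        ino, mode, 0, 0,      # ino, mode, uid, gid
        1, 0, len(data),      # nlink, mtime, filesize
        0, 0, 0, 0,           # devmajor, devminor, rdevmajor, rdevminor
        len(name_b), 0,       # namesize, check (unused for "070701")
    )
    out = b"070701" + b"".join(b"%08X" % f for f in fields)
    out += name_b
    out += b"\0" * (-len(out) % 4)  # pad header + name to 4 bytes
    out += data
    out += b"\0" * (-len(out) % 4)  # pad data to 4 bytes
    return out
```

Because every entry is a multiple of 4 bytes long, concatenating entries from offset 0 keeps the 4-byte alignment the format expects. A complete archive would also need the closing `TRAILER!!!` member, which is omitted here.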

@ddiss
Contributor Author

ddiss commented Mar 6, 2021

dracut_bench.sh.txt

Here are my latest numbers using the attached benchmark script...

---------------------------------+----------+----------+-----------
     Benchmark                   |  Before  |  After   |  Change
---------------------------------+----------+----------+-----------
Dracut create image runtime      |  8.452s  |  7.635s  |  -9.666%
---------------------------------+----------+----------+-----------
initramfs data (fiemap)          |          |          |
- total                          | 12894208 | 34009088 |  +163.7%
- shared (dedup)                 |     0    | 24068096 |
- exclusive                      | 12894208 |  9940992 |  -22.90%
---------------------------------+----------+----------+-----------
QEMU cold boot to Dracut init    |  3.208s  |  2.850s  |  -11.15%
---------------------------------+----------+----------+-----------

Dracut initramfs creation and boot times are reduced by roughly 10%,
due to the dropped binary strip and compression. The image's total size
grows, but most of it is shared with the source files via CoW reflinks,
so exclusive data consumption is also reduced. I've not yet measured metadata
overhead for the padding and shared extents, nor have I tested boot
performance on bare metal.

@ddiss
Contributor Author

ddiss commented Dec 6, 2021

This feature was implemented and merged via #1531. See the pull request for more up-to-date benchmark results.

@ddiss ddiss closed this as completed Dec 6, 2021