Fix --preallocate --sparse to actually produce sparse files #916
Merged
rsync.1 says combining --preallocate with --sparse yields sparse blocks wherever the filesystem can punch holes, but since 2019 (commit c2da380, "keep file-size 0 when possible") it has silently left the file fully allocated.

Two problems, both rooted in that commit switching --preallocate / --inplace to fallocate(FALLOC_FL_KEEP_SIZE):

* do_fallocate() then returned 0 instead of the reserved length, so the receiver's preallocated_len was 0 and write_sparse() always lseek'd over null runs instead of punching them (and the over-preallocation trim in receiver.c never fired either).
* more fundamentally, KEEP_SIZE leaves the file size at 0 while data is written incrementally, so the FALLOC_FL_PUNCH_HOLE call lands on blocks beyond EOF and is a silent no-op -- the reserved blocks are never freed.

Fix both: don't request KEEP_SIZE when --sparse is also active, so the file is preallocated at full size and the punch lands within it; and return the reserved length from do_fallocate() so preallocated_len drives the punch decision and the over-allocation trim. --preallocate without --sparse keeps the KEEP_SIZE (file-size-0) behaviour.

t_stub.c gains a sparse_files stub since do_fallocate now references it and the test helpers link syscall.o.

preallocate_test.py now asserts via st_blocks (where the filesystem can punch holes) that --preallocate --sparse ends up sparse, guarding the regression.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
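The two preallocation modes described above can be compared outside rsync with a small Python sketch that calls fallocate(2) through ctypes. This is not rsync's code: the helper names fallocate() and probe() are made up for the illustration, and the binding assumes 64-bit Linux with a 64-bit off_t. It preallocates an empty temporary file with or without FALLOC_FL_KEEP_SIZE, punches the whole range, and reports the file size and st_blocks around the punch.

```python
import ctypes
import ctypes.util
import errno
import os
import tempfile

# Flag values from <linux/falloc.h>.
FALLOC_FL_KEEP_SIZE = 0x01
FALLOC_FL_PUNCH_HOLE = 0x02


def _load_fallocate():
    """Return libc's fallocate wrapper, or None where it is unavailable."""
    try:
        libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
        func = libc.fallocate
    except (OSError, AttributeError, TypeError):
        return None
    # Assumption: off_t is 64 bits wide, as on 64-bit Linux.
    func.argtypes = [ctypes.c_int, ctypes.c_int,
                     ctypes.c_longlong, ctypes.c_longlong]
    func.restype = ctypes.c_int
    return func


_FALLOCATE = _load_fallocate()


def fallocate(fd, mode, offset, length):
    """Hypothetical thin wrapper: raise OSError if fallocate(2) fails."""
    if _FALLOCATE is None:
        raise OSError(errno.ENOSYS, "fallocate(2) not available")
    if _FALLOCATE(fd, mode, offset, length) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def probe(length=1 << 20, keep_size=True, directory=None):
    """Preallocate an empty temp file, then punch the whole range.

    Returns (size_after_prealloc, blocks_after_prealloc, blocks_after_punch),
    or None if the platform or filesystem rejects either call.
    """
    with tempfile.NamedTemporaryFile(dir=directory) as tmp:
        fd = tmp.fileno()
        try:
            fallocate(fd, FALLOC_FL_KEEP_SIZE if keep_size else 0, 0, length)
            st = os.fstat(fd)
            # PUNCH_HOLE must always be combined with KEEP_SIZE.
            fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                      0, length)
        except OSError:
            return None
        return st.st_size, st.st_blocks, os.fstat(fd).st_blocks


if __name__ == "__main__":
    for keep in (True, False):
        print("KEEP_SIZE" if keep else "full size", probe(keep_size=keep))
```

Going by the description above, the KEEP_SIZE run should report size 0 and a block count that does not drop after the punch, while the full-size run should report the full size and let the punch free blocks. What a given filesystem actually reports may differ, so no output is shown.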
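The st_blocks check mentioned for the test can be sketched roughly as follows. This is an illustration only, not the contents of preallocate_test.py; the names is_sparse() and path_is_sparse() are hypothetical, and the 512-byte unit is the one Linux uses for st_blocks.

```python
import os

BLOCK_UNIT = 512  # st_blocks is counted in 512-byte units on Linux


def is_sparse(size, blocks):
    """True when fewer bytes are allocated than the file's apparent size."""
    return blocks * BLOCK_UNIT < size


def path_is_sparse(path):
    """Apply the same check to a file on disk."""
    st = os.stat(path)
    return is_sparse(st.st_size, st.st_blocks)
```

A real test would presumably also skip filesystems that cannot punch holes, since the assertion is only meaningful where holes are supported.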