Skip to content

[Deepin-Kernel-SIG] [linux 6.6.y] [Upstream] iomap: fix non-atomic clone operation and don't update size when zeroing range post eof#1811

Open
opsiff wants to merge 7 commits into
deepin-community:linux-6.6.yfrom
opsiff:linux-6.6.y-2026-06-04-iomap
Open

[Deepin-Kernel-SIG] [linux 6.6.y] [Upstream] iomap: fix non-atomic clone operation and don't update size when zeroing range post eof#1811
opsiff wants to merge 7 commits into
deepin-community:linux-6.6.yfrom
opsiff:linux-6.6.y-2026-06-04-iomap

Conversation

@opsiff
Copy link
Copy Markdown
Member

@opsiff opsiff commented Jun 4, 2026

xfs part has been merged though lts commits, merge the all series.

  • ("iomap: add pos and dirty_len into trace_iomap_writepage_map") trace support
  • ("iomap: keep on increasing i_size in iomap_write_end()") fixes from v6.10-rc4

Changes since v4:

  • For zeroing range in xfs, move the delalloc check to before searching
    the COW fork when zeroing range. Only modify patch 04, please see it
    for details, not modify other patches.

Changes since v3:

  • Improve some git message comments and do some minor code cleanup, no
    logic changes.

Changes since v2:

  • Merge the patch for dropping of xfs_convert_blocks() and the patch
    for modifying xfs_bmapi_convert_delalloc().
  • Reword the commit message of the second patch.

Changes since v1:

  • Make xfs_bmapi_convert_delalloc() to allocate the target offset and
    drop the writeback helper xfs_convert_blocks().
  • Don't use xfs_iomap_write_direct() to convert delalloc blocks for
    zeroing posteof case, use xfs_bmapi_convert_delalloc() instead.
  • Fix two off-by-one issues when converting delalloc blocks.
  • Add a separate patch to drop the buffered write failure handle in
    zeroing and unsharing.
  • Add a comments do emphasize updating i_size should under folio lock.
  • Make iomap_write_end() to return a boolean, and do some cleanups in
    buffered write begin path.

This patch series fix a problem of exposing zeroed data on xfs since the
non-atomic clone operation. This problem was found while I was
developing ext4 buffered IO iomap conversion (ext4 is relying on this
fix [1]), the root cause of this problem and the discussion about the
solution please see [2]. After fix the problem, iomap_zero_range()
doesn't need to update i_size so that ext4 can use it to zero partial
block, e.g. truncate eof block [3].

[1] https://lore.kernel.org/linux-ext4/20240127015825.1608160-1-yi.zhang@huaweicloud.com/
[2] https://lore.kernel.org/linux-ext4/9b0040ef-3d9d-6246-4bdd-82b9a8f55fa2@huaweicloud.com/
[3] https://lore.kernel.org/linux-ext4/9c9f1831-a772-299b-072b-1c8116c3fb35@huaweicloud.com/

Thanks,
Yi.

Zhang Yi (9):
xfs: match lock mode in xfs_buffered_write_iomap_begin()
xfs: make the seq argument to xfs_bmapi_convert_delalloc() optional
xfs: make xfs_bmapi_convert_delalloc() to allocate the target offset
xfs: convert delayed extents to unwritten when zeroing post eof blocks
iomap: drop the write failure handles when unsharing and zeroing
iomap: don't increase i_size if it's not a write operation
iomap: use a new variable to handle the written bytes in
iomap_write_iter()
iomap: make iomap_write_end() return a boolean
iomap: do some small logical cleanup in buffered write

fs/iomap/buffered-io.c | 105 ++++++++++++++++++++++-----------------
fs/xfs/libxfs/xfs_bmap.c | 40 +++++++++++++--
fs/xfs/xfs_aops.c | 54 ++++++--------------
fs/xfs/xfs_iomap.c | 39 +++++++++++++--
4 files changed, 144 insertions(+), 94 deletions(-)

--
2.39.2

Summary by Sourcery

Fix iomap buffered write accounting so file size is only extended for successfully written data and extend tracing for writeback mapping.

Bug Fixes:

  • Ensure iomap only updates inode size when bytes are successfully written to the page cache, avoiding incorrect i_size growth on short or failed writes.
  • Propagate write failures correctly in buffered write, unshare, and zeroing paths so partial or rejected writes are handled as errors instead of silently succeeding.

Enhancements:

  • Refine iomap buffered write iteration accounting by tracking the total bytes successfully written per operation.
  • Extend the iomap_writepage_map tracepoint to record write position and dirty length for better writeback diagnostics.

@sourcery-ai
Copy link
Copy Markdown

sourcery-ai Bot commented Jun 4, 2026

Reviewer's Guide

Refactors iomap buffered write completion to be boolean-based, tightens error handling and i_size updates for non-write operations to avoid exposing zeroed data, and enhances writepage tracing with position and dirty length while integrating with XFS delayed allocation / post-EOF zeroing fixes elsewhere in the series.

File-Level Changes

Change Details Files
Refactor iomap buffered write completion semantics to return success/failure instead of byte counts and tighten error handling paths
  • Change __iomap_write_end and iomap_write_end to return a boolean indicating whether copied bytes were successfully committed to the page cache
  • Track written bytes explicitly in iomap_write_iter instead of overloading the return value from iomap_write_end
  • On write_begin and write_end failures, consistently call iomap_write_failed and correctly revert the iov_iter to avoid exposing partial writes
  • Adjust unshare and zeroing paths to treat iomap_write_end as a boolean and fail loudly if it rejects the operation
fs/iomap/buffered-io.c
Ensure i_size updates are driven by actual written bytes only and not by non-write operations such as zeroing/unsharing beyond EOF
  • Use the explicitly tracked written byte count when deciding whether to extend i_size in iomap_write_end
  • Pass the written length instead of requested length into __iomap_put_folio so pagecache accounting matches the actual data committed
  • Avoid updating i_size when zeroing/unsharing ranges post-EOF, relying on the caller to manage size where appropriate
fs/iomap/buffered-io.c
Extend iomap writeback tracing with additional context for debugging writepage mapping behavior
  • Replace the generic DEFINE_IOMAP_EVENT-based iomap_writepage_map tracepoint with a custom TRACE_EVENT that records inode, file position, dirty_len, iomap address/offset/length, type, and flags
  • Update iomap_writepage_map to pass pos and dirty_len into the tracepoint call
fs/iomap/buffered-io.c
fs/iomap/trace.h

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

mainline inclusion
from mainline-v6.9-rc1
commit 54943ab
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I9DN5Z
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=54943abce0927156ba9909f1a533b25410bfe2ca

--------------------------------

Since commit fd07e0aa23c4 ("iomap: map multiple blocks at a time"), we
could map multi-blocks once a time, and the dirty_len indicates the
expected map length, map_len won't large than it. The pos and dirty_len
means the dirty range that should be mapped to write, add them into
trace_iomap_writepage_map() could be more useful for debug.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://lore.kernel.org/r/20240220115759.3445025-1-yi.zhang@huaweicloud.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
(cherry picked from commit 54943ab)
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
mainline inclusion
from mainline-v6.10-rc1
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I9DN5Z
CVE: NA

--------------------------------

Unsharing and zeroing can only happen within EOF, so there is never a
need to perform posteof pagecache truncation if write begin fails, also
partial write could never theoretically happened from iomap_write_end(),
so remove both of them.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://lore.kernel.org/r/20240320110548.2200662-6-yi.zhang@huaweicloud.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
(cherry picked from commit 89c6c1d)
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
mainline inclusion
from mainline-v6.10-rc1
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I9DN5Z
CVE: NA

--------------------------------

Increase i_size in iomap_zero_range() and iomap_unshare_iter() is not
needed, the caller should handle it. Especially, when truncate partial
block, we should not increase i_size beyond the new EOF here. It doesn't
affect xfs and gfs2 now because they set the new file size after zero
out, it doesn't matter that a transient increase in i_size, but it will
affect ext4 because it set file size before truncate. So move the i_size
updating logic to iomap_write_iter().

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://lore.kernel.org/r/20240320110548.2200662-7-yi.zhang@huaweicloud.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
(cherry picked from commit 943bc08)
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
…iter()

mainline inclusion
from mainline-v6.10-rc1
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I9DN5Z
CVE: NA

--------------------------------

In iomap_write_iter(), the status variable used to receive the return
value from iomap_write_end() is confusing, replace it with a new written
variable to represent the written bytes in each cycle, no logic changes.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://lore.kernel.org/r/20240320110548.2200662-8-yi.zhang@huaweicloud.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
(cherry picked from commit 1a61d74)
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
mainline inclusion
from mainline-v6.10-rc1
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I9DN5Z
CVE: NA

--------------------------------

For now, we can make sure iomap_write_end() always return 0 or copied
bytes, so instead of return written bytes, convert to return a boolean
to indicate the copied bytes have been written to the pagecache.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://lore.kernel.org/r/20240320110548.2200662-9-yi.zhang@huaweicloud.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
(cherry picked from commit 815f4b6)
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
mainline inclusion
from mainline-v6.10-rc1
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I9DN5Z
CVE: NA

--------------------------------

Since iomap_write_end() can never return a partial write length, the
comparison between written, copied and bytes becomes useless, just
merge them with the unwritten branch.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://lore.kernel.org/r/20240320110548.2200662-10-yi.zhang@huaweicloud.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
(cherry picked from commit e1f453d)
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
mainline inclusion
from mainline-v6.10-rc4
category: bugfix

Commit '943bc0882ceb ("iomap: don't increase i_size if it's not a write
operation")' breaks xfs with realtime device on generic/561, the problem
is when unaligned truncate down a xfs realtime inode with rtextsize > 1
fs block, xfs only zero out the EOF block but doesn't zero out the tail
blocks that aligned to rtextsize, so if we don't increase i_size in
iomap_write_end(), it could expose stale data after we do an append
write beyond the aligned EOF block.

xfs should zero out the tail blocks when truncate down, but before we
finish that, let's fix the issue by just revert the changes in
iomap_write_end().

Fixes: 943bc08 ("iomap: don't increase i_size if it's not a write operation")
Reported-by: Chandan Babu R <chandanbabu@kernel.org>
Link: https://lore.kernel.org/linux-xfs/0b92a215-9d9b-3788-4504-a520778953c2@huaweicloud.com
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://lore.kernel.org/r/20240603112222.2109341-1-yi.zhang@huaweicloud.com
Tested-by: Chandan Babu R <chandanbabu@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
(cherry picked from commit 0841ea4)
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
Copy link
Copy Markdown

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've reviewed your changes and they look great!


Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@Avenger-285714
Copy link
Copy Markdown
Member

/approve

@deepin-ci-robot
Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Avenger-285714

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR series ports upstream fixes to the iomap buffered I/O and XFS iomap integration to prevent exposing zeroed/stale data after non-atomic clone operations, and to ensure inode size is only advanced for data that was actually committed to the page cache (not for post-EOF zeroing/unsharing). It also extends iomap writeback tracing to include write position and dirty length for improved diagnostics.

Changes:

  • Update iomap buffered write paths to only advance i_size based on bytes actually committed, and rework accounting in iomap_write_iter().
  • Extend trace_iomap_writepage_map to record pos and dirty_len.
  • Adjust XFS buffered iomap/zeroing paths (per PR description) to handle post-EOF zeroing and delalloc conversion more safely.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
fs/iomap/trace.h Replaces the old DEFINE_IOMAP_EVENT(iomap_writepage_map) with a custom TRACE_EVENT that records pos and dirty_len.
fs/iomap/buffered-io.c Reworks buffered write end/iter logic to gate i_size growth on committed bytes, adjusts failure handling, and updates writeback tracepoint callsites.
fs/xfs/libxfs/xfs_bmap.c XFS bmap/delalloc conversion adjustments for post-EOF zeroing workflows (per PR description).
fs/xfs/xfs_aops.c XFS address-space ops updates related to iomap buffered write/zeroing behavior (per PR description).
fs/xfs/xfs_iomap.c XFS iomap mapping updates to support the series’ zeroing/unsharing semantics (per PR description).
Comments suppressed due to low confidence (1)

fs/iomap/buffered-io.c:1442

  • iomap_write_begin() no longer performs the iomap_write_failed() cleanup on error. For iomap_zero_range(), the range can extend past EOF (including the post-EOF zeroing use case mentioned in the series), and a write_begin failure can leave newly instantiated pagecache beyond i_size. Call iomap_write_failed() before returning the error to preserve the prior cleanup semantics.
		bool ret;

		status = iomap_write_begin(iter, pos, bytes, &folio);
		if (status)
			return status;

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread fs/iomap/buffered-io.c
Comment on lines 943 to +946
if (old_size < pos)
pagecache_isize_extended(iter->inode, old_size, pos);
if (ret < len)
iomap_write_failed(iter->inode, pos + ret, len - ret);
return ret;

return written == copied;
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants