[Deepin-Kernel-SIG] [linux 6.6.y] [Upstream] iomap: fix non-atomic clone operation and don't update size when zeroing range post eof#1811
Conversation
Reviewer's GuideRefactors iomap buffered write completion to be boolean-based, tightens error handling and i_size updates for non-write operations to avoid exposing zeroed data, and enhances writepage tracing with position and dirty length while integrating with XFS delayed allocation / post-EOF zeroing fixes elsewhere in the series. File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
mainline inclusion from mainline-v6.9-rc1 commit 54943ab category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9DN5Z CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=54943abce0927156ba9909f1a533b25410bfe2ca -------------------------------- Since commit fd07e0aa23c4 ("iomap: map multiple blocks at a time"), we could map multi-blocks once a time, and the dirty_len indicates the expected map length, map_len won't large than it. The pos and dirty_len means the dirty range that should be mapped to write, add them into trace_iomap_writepage_map() could be more useful for debug. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/20240220115759.3445025-1-yi.zhang@huaweicloud.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> (cherry picked from commit 54943ab) Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
mainline inclusion from mainline-v6.10-rc1 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9DN5Z CVE: NA -------------------------------- Unsharing and zeroing can only happen within EOF, so there is never a need to perform posteof pagecache truncation if write begin fails, also partial write could never theoretically happened from iomap_write_end(), so remove both of them. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/20240320110548.2200662-6-yi.zhang@huaweicloud.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> (cherry picked from commit 89c6c1d) Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
mainline inclusion from mainline-v6.10-rc1 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9DN5Z CVE: NA -------------------------------- Increase i_size in iomap_zero_range() and iomap_unshare_iter() is not needed, the caller should handle it. Especially, when truncate partial block, we should not increase i_size beyond the new EOF here. It doesn't affect xfs and gfs2 now because they set the new file size after zero out, it doesn't matter that a transient increase in i_size, but it will affect ext4 because it set file size before truncate. So move the i_size updating logic to iomap_write_iter(). Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/20240320110548.2200662-7-yi.zhang@huaweicloud.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> (cherry picked from commit 943bc08) Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
…iter() mainline inclusion from mainline-v6.10-rc1 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9DN5Z CVE: NA -------------------------------- In iomap_write_iter(), the status variable used to receive the return value from iomap_write_end() is confusing, replace it with a new written variable to represent the written bytes in each cycle, no logic changes. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/20240320110548.2200662-8-yi.zhang@huaweicloud.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> (cherry picked from commit 1a61d74) Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
mainline inclusion from mainline-v6.10-rc1 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9DN5Z CVE: NA -------------------------------- For now, we can make sure iomap_write_end() always return 0 or copied bytes, so instead of return written bytes, convert to return a boolean to indicate the copied bytes have been written to the pagecache. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/20240320110548.2200662-9-yi.zhang@huaweicloud.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> (cherry picked from commit 815f4b6) Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
mainline inclusion from mainline-v6.10-rc1 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9DN5Z CVE: NA -------------------------------- Since iomap_write_end() can never return a partial write length, the comparison between written, copied and bytes becomes useless, just merge them with the unwritten branch. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/20240320110548.2200662-10-yi.zhang@huaweicloud.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> (cherry picked from commit e1f453d) Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
mainline inclusion
from mainline-v6.10-rc4
category: bugfix
Commit '943bc0882ceb ("iomap: don't increase i_size if it's not a write
operation")' breaks xfs with realtime device on generic/561, the problem
is when unaligned truncate down a xfs realtime inode with rtextsize > 1
fs block, xfs only zero out the EOF block but doesn't zero out the tail
blocks that aligned to rtextsize, so if we don't increase i_size in
iomap_write_end(), it could expose stale data after we do an append
write beyond the aligned EOF block.
xfs should zero out the tail blocks when truncate down, but before we
finish that, let's fix the issue by just revert the changes in
iomap_write_end().
Fixes: 943bc08 ("iomap: don't increase i_size if it's not a write operation")
Reported-by: Chandan Babu R <chandanbabu@kernel.org>
Link: https://lore.kernel.org/linux-xfs/0b92a215-9d9b-3788-4504-a520778953c2@huaweicloud.com
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://lore.kernel.org/r/20240603112222.2109341-1-yi.zhang@huaweicloud.com
Tested-by: Chandan Babu R <chandanbabu@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
(cherry picked from commit 0841ea4)
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
c1d51ae to
8806234
Compare
|
/approve |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Avenger-285714 The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Pull request overview
This PR series ports upstream fixes to the iomap buffered I/O and XFS iomap integration to prevent exposing zeroed/stale data after non-atomic clone operations, and to ensure inode size is only advanced for data that was actually committed to the page cache (not for post-EOF zeroing/unsharing). It also extends iomap writeback tracing to include write position and dirty length for improved diagnostics.
Changes:
- Update iomap buffered write paths to only advance
i_sizebased on bytes actually committed, and rework accounting iniomap_write_iter(). - Extend
trace_iomap_writepage_mapto recordposanddirty_len. - Adjust XFS buffered iomap/zeroing paths (per PR description) to handle post-EOF zeroing and delalloc conversion more safely.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| fs/iomap/trace.h | Replaces the old DEFINE_IOMAP_EVENT(iomap_writepage_map) with a custom TRACE_EVENT that records pos and dirty_len. |
| fs/iomap/buffered-io.c | Reworks buffered write end/iter logic to gate i_size growth on committed bytes, adjusts failure handling, and updates writeback tracepoint callsites. |
| fs/xfs/libxfs/xfs_bmap.c | XFS bmap/delalloc conversion adjustments for post-EOF zeroing workflows (per PR description). |
| fs/xfs/xfs_aops.c | XFS address-space ops updates related to iomap buffered write/zeroing behavior (per PR description). |
| fs/xfs/xfs_iomap.c | XFS iomap mapping updates to support the series’ zeroing/unsharing semantics (per PR description). |
Comments suppressed due to low confidence (1)
fs/iomap/buffered-io.c:1442
- iomap_write_begin() no longer performs the iomap_write_failed() cleanup on error. For iomap_zero_range(), the range can extend past EOF (including the post-EOF zeroing use case mentioned in the series), and a write_begin failure can leave newly instantiated pagecache beyond i_size. Call iomap_write_failed() before returning the error to preserve the prior cleanup semantics.
bool ret;
status = iomap_write_begin(iter, pos, bytes, &folio);
if (status)
return status;
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if (old_size < pos) | ||
| pagecache_isize_extended(iter->inode, old_size, pos); | ||
| if (ret < len) | ||
| iomap_write_failed(iter->inode, pos + ret, len - ret); | ||
| return ret; | ||
|
|
||
| return written == copied; |
xfs part has been merged though lts commits, merge the all series.
Changes since v4:
the COW fork when zeroing range. Only modify patch 04, please see it
for details, not modify other patches.
Changes since v3:
logic changes.
Changes since v2:
for modifying xfs_bmapi_convert_delalloc().
Changes since v1:
drop the writeback helper xfs_convert_blocks().
zeroing posteof case, use xfs_bmapi_convert_delalloc() instead.
zeroing and unsharing.
buffered write begin path.
This patch series fix a problem of exposing zeroed data on xfs since the
non-atomic clone operation. This problem was found while I was
developing ext4 buffered IO iomap conversion (ext4 is relying on this
fix [1]), the root cause of this problem and the discussion about the
solution please see [2]. After fix the problem, iomap_zero_range()
doesn't need to update i_size so that ext4 can use it to zero partial
block, e.g. truncate eof block [3].
[1] https://lore.kernel.org/linux-ext4/20240127015825.1608160-1-yi.zhang@huaweicloud.com/
[2] https://lore.kernel.org/linux-ext4/9b0040ef-3d9d-6246-4bdd-82b9a8f55fa2@huaweicloud.com/
[3] https://lore.kernel.org/linux-ext4/9c9f1831-a772-299b-072b-1c8116c3fb35@huaweicloud.com/
Thanks,
Yi.
Zhang Yi (9):
xfs: match lock mode in xfs_buffered_write_iomap_begin()
xfs: make the seq argument to xfs_bmapi_convert_delalloc() optional
xfs: make xfs_bmapi_convert_delalloc() to allocate the target offset
xfs: convert delayed extents to unwritten when zeroing post eof blocks
iomap: drop the write failure handles when unsharing and zeroing
iomap: don't increase i_size if it's not a write operation
iomap: use a new variable to handle the written bytes in
iomap_write_iter()
iomap: make iomap_write_end() return a boolean
iomap: do some small logical cleanup in buffered write
fs/iomap/buffered-io.c | 105 ++++++++++++++++++++++-----------------
fs/xfs/libxfs/xfs_bmap.c | 40 +++++++++++++--
fs/xfs/xfs_aops.c | 54 ++++++--------------
fs/xfs/xfs_iomap.c | 39 +++++++++++++--
4 files changed, 144 insertions(+), 94 deletions(-)
--
2.39.2
Summary by Sourcery
Fix iomap buffered write accounting so file size is only extended for successfully written data and extend tracing for writeback mapping.
Bug Fixes:
Enhancements: