osd/ECUtil: Fix erase_after_ro_offset length calculation and add tests #66817
Merged
rzarzynski (Contributor) reviewed on Jan 8, 2026 and left a comment:
The fix LGTM!
The comments are just about the primary unit test, which, per the Slack discussion, needs some changes.
Branch force-pushed from c6df2cd to 7e1c551.
rzarzynski approved these changes on Jan 8, 2026.
Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
Branch force-pushed from 7e1c551 to a60c86a.
rzarzynski approved these changes on Jan 8, 2026.
A member commented: jenkins test make check arm64
A member commented: jenkins test windows
The author (Contributor) commented: jenkins test make check
System test logs showed EC recovery failures with assertion errors when
recovering small objects (smaller than stripe width) in EC pools.
The recovery would fail with "shard_size >= tobj_size" assertions
because shards that should be empty incorrectly contained data.
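To illustrate why some shards of a small object should be empty, here is a minimal C++ sketch under a simplified, hypothetical layout: k data shards, a fixed chunk_size, chunks assigned round-robin, and parity shards ignored. toy_shard_sizes is an illustrative helper, not part of ECUtil or stripe_info_t.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified striping model (not Ceph's stripe_info_t):
// k data shards, fixed chunk_size, chunks laid out round-robin, so the
// stripe width is k * chunk_size. Parity shards are ignored.
// Returns how many bytes of an object of object_size land on each shard.
std::vector<uint64_t> toy_shard_sizes(uint64_t k, uint64_t chunk_size,
                                      uint64_t object_size) {
  assert(k > 0 && chunk_size > 0);
  std::vector<uint64_t> sizes(k, 0);
  for (uint64_t off = 0; off < object_size; off += chunk_size) {
    uint64_t shard = (off / chunk_size) % k;  // chunk index -> data shard
    uint64_t len = std::min(chunk_size, object_size - off);  // last chunk may be partial
    sizes[shard] += len;
  }
  return sizes;
}
```

With k = 4 and chunk_size = 4096 (stripe width 16384), toy_shard_sizes(4, 4096, 6000) returns {4096, 1904, 0, 0}: shards 2 and 3 hold no data and should stay empty.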
The primary change in this commit fixes a bug in
shard_extent_map_t::erase_after_ro_offset() where the length
calculation was incorrect:
sinfo->ro_range_to_shard_extent_set(ro_offset, ro_end - ro_start, ...)
The correct call is:
sinfo->ro_range_to_shard_extent_set(ro_offset, ro_end - ro_offset, ...)
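When ro_offset < ro_start, the length ro_end - ro_start is too short by
(ro_start - ro_offset), so the erased range ended at
ro_offset + (ro_end - ro_start) instead of ro_end. Data in that gap, which
should have been erased, remained on shards, leading to recovery failures.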
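A small sketch of the arithmetic, assuming the (offset, length) arguments to ro_range_to_shard_extent_set() describe the ro range [offset, offset + length). buggy_erase_end and fixed_erase_end are hypothetical helpers that return the exclusive end of that range for each length expression.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative only (not ECUtil code): exclusive end of the erased ro range
// [ro_offset, ro_offset + length) when length = ro_end - ro_start (the bug).
uint64_t buggy_erase_end(uint64_t ro_offset, uint64_t ro_start,
                         uint64_t ro_end) {
  assert(ro_start <= ro_end);
  return ro_offset + (ro_end - ro_start);  // falls short of ro_end if ro_offset < ro_start
}

// Same, with length = ro_end - ro_offset (the fix): always reaches ro_end.
uint64_t fixed_erase_end(uint64_t ro_offset, uint64_t ro_end) {
  assert(ro_offset <= ro_end);
  return ro_offset + (ro_end - ro_offset);
}
```

For example, with ro_offset = 2, ro_start = 8 and ro_end = 16, buggy_erase_end returns 10, so the ro range [10, 16) is never erased, whereas fixed_erase_end returns 16. When ro_offset == ro_start the two expressions agree, which is why the bug only shows up in the ro_offset < ro_start case.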
Additionally, this commit adds 13 comprehensive unit tests to TestECUtil
that thoroughly exercise erase_after_ro_offset across various edge cases,
including the critical scenario of objects smaller than stripe width where
some shards should remain empty. These tests successfully catch the bug
when it is re-introduced.
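The kind of regression these tests guard against can be sketched with a hypothetical toy model. ToyShardMap and toy_erase_after_ro_offset are illustrative stand-ins, not Ceph's shard_extent_map_t or the actual TestECUtil cases, and the scenario is contrived so that ro_offset < ro_start.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical toy stand-in for a shard extent map (NOT shard_extent_map_t):
// each data shard is a byte bitmap marking which shard offsets hold data.
// Round-robin layout: k shards, chunk_size bytes per chunk.
struct ToyShardMap {
  uint64_t k, chunk_size;
  std::vector<std::vector<bool>> shards;

  ToyShardMap(uint64_t k_, uint64_t cs, uint64_t max_shard_len)
      : k(k_), chunk_size(cs),
        shards(k_, std::vector<bool>(max_shard_len, false)) {}

  // Mark (present = true) or clear every byte of the ro range [off, off + len).
  // Bytes beyond the modeled shard length are ignored.
  void apply(uint64_t off, uint64_t len, bool present) {
    for (uint64_t ro = off; ro < off + len; ++ro) {
      uint64_t shard = (ro / chunk_size) % k;
      uint64_t shard_off =
          (ro / (k * chunk_size)) * chunk_size + ro % chunk_size;
      if (shard_off < shards[shard].size()) shards[shard][shard_off] = present;
    }
  }

  bool shard_empty(uint64_t shard) const {
    for (bool b : shards[shard]) {
      if (b) return false;
    }
    return true;
  }
};

// Models the erase step with either the buggy or the fixed length expression.
void toy_erase_after_ro_offset(ToyShardMap& m, uint64_t ro_offset,
                               uint64_t ro_start, uint64_t ro_end,
                               bool buggy) {
  if (ro_offset >= ro_end) return;  // nothing to erase
  uint64_t len = buggy ? ro_end - ro_start : ro_end - ro_offset;
  m.apply(ro_offset, len, false);
}
```

With k = 4 and chunk_size = 4, a map holding ro range [8, 16) has data only on shards 2 and 3. Erasing after ro_offset = 2 with the fixed length empties both shards; with the buggy length only ro bytes [2, 10) are cleared, so shard 3 keeps its data.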
Note: The unit tests in this commit were generated with assistance from
an LLM (Large Language Model) and subsequently validated and refined.
Fixes: https://tracker.ceph.com/issues/74329
Available Jenkins commands:
- jenkins test classic perf
- jenkins test crimson perf
- jenkins test signed
- jenkins test make check
- jenkins test make check arm64
- jenkins test submodules
- jenkins test dashboard
- jenkins test dashboard cephadm
- jenkins test api
- jenkins test docs
- jenkins test ceph-volume all
- jenkins test windows
- jenkins test rook e2e
Issue only one Jenkins command per comment; Jenkins does not understand comments with more than one command.