tentacle: osd: Do not remove objects with divergent logs if only partial writes. #66725
Merged
yuriw merged 1 commit into ceph:tentacle on Jan 29, 2026
Fixes https://tracker.ceph.com/issues/74221

Note: An AI was used to assist in generating unit tests for this commit. The production code was written by the author.

In the scenario we are fixing here, there is a divergent log, which needs to be rolled back. The non-primary does not participate in the transaction to the object, but a log entry describing the transaction exists on it. The primary has a different transaction and has correctly detected the divergence.

The primary correctly concludes that no recovery is needed for the object, since only partial writes exist on the non-primary.

The non-primary observes its divergent log and incorrectly concludes that recovery IS needed for the divergent write, and prepares by removing that object.

The consequence of this depends on the next operation:

1. A read will fail with -EIO.
2. An RMW involving a read from the removed object will detect the failure and reconstruct the necessary data.
3. An RMW that does not involve such a read, or an append, will recreate the object, but with zeros, so will cause data corruption.

It is unusual for such a log entry to exist on the non-primary, because normally those are omitted from the non-primary log. The scenario that causes this is when a partial write triggers a clone due to copy on write. We now have a clone operation which affects ALL shards, and so the log entry is sent to all shards.

This is unusual to see in the field. We must have all of the following:

1. A clone operation (these are infrequent).
2. A partial write.
3. A peering cycle must happen before this write is complete.

The combination of 1 and 3 makes this a very unusual operation in teuthology, and it will be even rarer in the field.

The fix ensures we skip divergent log entries for partial writes that the shard did not participate in.
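As a rough illustration of the rule described in this commit message, the decision a shard makes for a divergent log entry can be sketched as below. This is a minimal model under stated assumptions, not the code changed by this PR: `DivergentEntry`, `written_shards`, `Action` and `handle_divergent_entry` are hypothetical names invented for the example and do not correspond to Ceph's real types or functions.

```cpp
#include <cassert>
#include <set>

// Hypothetical, simplified model. These are NOT Ceph's real types.
struct DivergentEntry {
    // Shards whose data the (partial) write actually touched.
    std::set<int> written_shards;
};

enum class Action {
    Skip,      // leave the object alone; nothing on this shard to undo
    RollBack   // the write touched this shard, so it must be rolled back
};

// Assumed rule, taken from the description: a shard that holds a divergent
// log entry for a partial write it did not participate in should skip the
// entry instead of preparing for recovery by removing the object.
inline Action handle_divergent_entry(const DivergentEntry& entry, int my_shard) {
    if (entry.written_shards.count(my_shard) == 0) {
        return Action::Skip;
    }
    return Action::RollBack;
}
```

For example, with a partial write that touched only shards 0 and 2, shard 1 would return `Action::Skip` even though the log entry was also delivered to it, while shard 0 would return `Action::RollBack`.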
The following is a minimal script to recreate:

    set -e -x
    MDS=0 MON=1 OSD=4 MGR=1 ../src/vstart.sh --debug --new -x --localhost -o timeout=10000 -o session_timeout=10000 -o debug_osd=20
    ceph osd pool set noautoscale
    ceph balancer off
    ceph osd set nodeep-scrub
    ceph osd set noscrub
    ceph osd set noout
    ceph config set global bluestore_debug_inject_read_err true
    dd if=/dev/random of=file_8k bs=8k count=1
    dd if=/dev/random of=file_4k bs=4k count=1
    ceph osd erasure-code-profile set alex k=2 m=2
    ceph osd pool create mypool --pg_num=1 --pool_type=erasure alex
    ceph osd pool set mypool allow_ec_overwrites true
    ceph osd pool set mypool allow_ec_optimizations true
    ceph osd pool set mypool min_size 2
    rados put -p mypool test1 file_8k
    acting_set=$(ceph osd map mypool test1 --format=json | jq -r '.acting[]')
    acting_array=($acting_set)
    shard_0_osd=${acting_array[0]}
    shard_1_osd=${acting_array[1]}
    echo "Shard 0 OSD: $shard_0_osd"
    echo "Shard 1 OSD: $shard_1_osd"
    ceph daemon osd.$shard_0_osd injectecwriteerr mypool "*" 2 1 0 1
    rados -p mypool mksnap test1_snap
    rados put -p mypool test1 file_4k --offset 0 &
    ceph osd set noup
    ceph osd down $shard_1_osd
    wait
    ceph osd unset noup
    rados -p mypool mksnap test1_snap2
    rados put -p mypool test1 file_4k --offset 0

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
(cherry picked from commit 65dce5e)
bill-scales approved these changes on Dec 23, 2025
rzarzynski approved these changes on Jan 7, 2026
jenkins test make check
jenkins test api
backport tracker: https://tracker.ceph.com/issues/74269
backport of #66698
parent tracker: https://tracker.ceph.com/issues/74221
this backport was staged using ceph-backport.sh version 16.0.0.6848
find the latest version at https://github.com/ceph/ceph/blob/main/src/script/ceph-backport.sh