New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pacific: librbd: don't report HOLE_UPDATED when diffing against a hole #54949
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
If an object doesn't exist in both start and end versions but there is an intermediate snapshot which contains it (i.e. the object is written to and captured at some point but then discarded prior to or in the end version), diff-iterate reports "new hole" -- callback is invoked with exists=false. This occurs both on the slow list_snaps path and in fast-diff mode. Despite going all the way back to the introduction of diff-iterate in commit 0296c7c ("librbd: implement diff_iterate"), this behavior is wrong and contradicts diff-iterate API documentation added in commit a69532e ("librbd: document diff_iterate in header") in the same series: If the source snapshot name is NULL, we interpret that as the beginning of time and return all allocated regions of the image. It also triggered an assert added in commit c680531 ("librbd: change diff_iterate interface to be more C-friendly") in the same series. Unfortunately, commit f1f6407 ("test_librbd: add diff_iterate test including discard"), also part of the same series, added a test which expected the wrong behavior. Very confusing! A year later, a different manifestation of this bug was fixed in commit 9a1ab95 ("rbd: Fix rbd diff for non-existent objects"), but the fix only covered the case where calc_snap_set_diff() goes past the end snap ID while processing clones. The case where it runs out of clones to process before reaching the end snap ID remained mishandled. A year after that, commit 3ccc3bb ("librbd: diff_iterate needs to handle holes in parent images") dropped the assert mentioned above and this bug got enshrined in the newly introduced fast-diff mode. Finally, a few years later, deep-copy actually started relying on this bug in commit e5a21e9 ("librbd: deep-copy image copy state machine skips clean objects"). This necessitates bifurcation in DiffRequest because deep-copy wants the "has this object been touched" semantics, which is different from diff-iterate (and also potentially much more expensive to produce!). This commit brings a minimal update to TestMockObjectMapDiffRequest tests and DiffIterateTest.DiffIterateDiscard. Coverage is expanded in the following commits. Fixes: https://tracker.ceph.com/issues/53897 Signed-off-by: Ilya Dryomov <idryomov@gmail.com> (cherry picked from commit be507aa)
Similar to DiffIterateTest.DiffIterateDeterministic, systematically cover the most common cases involving full-object discards. With this in place, issue [1] can be reproduced by any of: (preparatory) before snap3 is taken (1) beginning of time -> HEAD (2) snap1 -> HEAD (5) beginning of time -> snap3 (6) snap1 -> snap3 Sub-object discards aren't covered here because of further issues [2][3]. [1] https://tracker.ceph.com/issues/53897 [2] https://tracker.ceph.com/issues/63770 [3] https://tracker.ceph.com/issues/63771 Signed-off-by: Ilya Dryomov <idryomov@gmail.com> (cherry picked from commit 6b45a2d)
OBJECT_PENDING is a transition state which normally isn't encountered in (snapshot) object maps. In case it's encountered, for example when a snapshot is taken after losing power at the time a discard was being handled, the object should be treated as dirty and produce a diff as a result. Assuming an object is marked OBJECT_PENDING, theoretically there are four cases with respect to object's state in the next snapshot: 1. OBJECT_NONEXISTENT 2. OBJECT_EXISTS 3. OBJECT_PENDING 4. OBJECT_EXISTS_CLEAN Prior to commit b81cd24 ("librbd/object_map: diff state machine should track object existence"), (3) was handled incorrectly (diff set to DIFF_STATE_NONE instead of DIFF_STATE_UPDATED). Post commit 399a45e ("librbd/object_map: rbd diff between two snapshots lists entire image content"), (4) is handled incorrectly (diff set to DIFF_STATE_DATA instead of DIFF_STATE_DATA_UPDATED). Signed-off-by: Ilya Dryomov <idryomov@gmail.com> (cherry picked from commit 70c1991)
Exercise both diff-iterate and deep-copy modes of operation. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> (cherry picked from commit f47f7f8)
Existing *Delta tests cover: - beginning of time -> HEAD, through intermediate snap - snap -> snap, directly - snap -> HEAD, directly But coverage is too weak: none of the weird OBJECT_PENDING cases and only a single diff-iterate vs deep-copy case is tested, for example. Coverage is missing completely for: - beginning of time -> HEAD, directly - beginning of time -> snap, directly - beginning of time -> snap, through intermediate snap - snap -> snap, through intermediate snap - snap -> HEAD, through intermediate snap This adds the following tests: - FromBeginningToHead - FromBeginningToHeadIntermediateSnap (expands FullDelta) - FromBeginningToSnap - FromBeginningToSnapIntermediateSnap - FromSnapToSnap (expands IntermediateDelta) - FromSnapToSnapIntermediateSnap - FromSnapToHead (expands EndDelta) - FromSnapToHeadIntermediateSnap Signed-off-by: Ilya Dryomov <idryomov@gmail.com> (cherry picked from commit 7aff35c)
Despite the test in test_diff_iterate() being correct, it started failing: > check_diff(self.image, 0, IMG_SIZE, 'snap1', [(0, 512, False)]) ... a = [], b = [(0, 512, False)] ... > assert a == b E AssertionError This is because check_diff() drops 'snap1' argument on the floor and passes None to image.diff_iterate() instead. This goes back to 2013, see commit e88fe3c ("rbd.py: add some missing functions"). Signed-off-by: Ilya Dryomov <idryomov@gmail.com> (cherry picked from commit f8ced6d)
jenkins test make check |
trociny
approved these changes
Dec 18, 2023
Rados approved |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
backport tracker: https://tracker.ceph.com/issues/63846
backport of #54547
parent tracker: https://tracker.ceph.com/issues/53897