Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pacific: librbd: don't report HOLE_UPDATED when diffing against a hole #54949

Merged
merged 7 commits into from Dec 28, 2023

Conversation

idryomov
Copy link
Contributor

If an object doesn't exist in both start and end versions but there is
an intermediate snapshot which contains it (i.e. the object is written
to and captured at some point but then discarded prior to or in the end
version), diff-iterate reports "new hole" -- callback is invoked with
exists=false.  This occurs both on the slow list_snaps path and in
fast-diff mode.

Despite going all the way back to the introduction of diff-iterate in
commit 0296c7c ("librbd: implement diff_iterate"), this behavior
is wrong and contradicts diff-iterate API documentation added in commit
a69532e ("librbd: document diff_iterate in header") in the same
series:

    If the source snapshot name is NULL, we interpret that as
    the beginning of time and return all allocated regions of the
    image.

It also triggered an assert added in commit c680531 ("librbd:
change diff_iterate interface to be more C-friendly") in the same
series.  Unfortunately, commit f1f6407 ("test_librbd: add
diff_iterate test including discard"), also part of the same series,
added a test which expected the wrong behavior.  Very confusing!

A year later, a different manifestation of this bug was fixed in commit
9a1ab95 ("rbd: Fix rbd diff for non-existent objects"), but the
fix only covered the case where calc_snap_set_diff() goes past the end
snap ID while processing clones.  The case where it runs out of clones
to process before reaching the end snap ID remained mishandled.

A year after that, commit 3ccc3bb ("librbd: diff_iterate needs to
handle holes in parent images") dropped the assert mentioned above and
this bug got enshrined in the newly introduced fast-diff mode.

Finally, a few years later, deep-copy actually started relying on this
bug in commit e5a21e9 ("librbd: deep-copy image copy state machine
skips clean objects").  This necessitates bifurcation in DiffRequest
because deep-copy wants the "has this object been touched" semantics,
which is different from diff-iterate (and also potentially much more
expensive to produce!).

This commit brings a minimal update to TestMockObjectMapDiffRequest
tests and DiffIterateTest.DiffIterateDiscard.  Coverage is expanded in
the following commits.

Fixes: https://tracker.ceph.com/issues/53897
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit be507aa)
This effectively reverts commit 3ccc3bb ("librbd: diff_iterate
needs to handle holes in parent images") which just dropped the assert
instead of addressing the root cause of reported crashes.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit bcb107a)
Similar to DiffIterateTest.DiffIterateDeterministic, systematically
cover the most common cases involving full-object discards.  With this
in place, issue [1] can be reproduced by any of:

    (preparatory) before snap3 is taken
    (1) beginning of time -> HEAD
    (2) snap1 -> HEAD
    (5) beginning of time -> snap3
    (6) snap1 -> snap3

Sub-object discards aren't covered here because of further issues
[2][3].

[1] https://tracker.ceph.com/issues/53897
[2] https://tracker.ceph.com/issues/63770
[3] https://tracker.ceph.com/issues/63771

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 6b45a2d)
OBJECT_PENDING is a transition state which normally isn't encountered
in (snapshot) object maps.  In case it's encountered, for example when
a snapshot is taken after losing power at the time a discard was being
handled, the object should be treated as dirty and produce a diff as
a result.

Assuming an object is marked OBJECT_PENDING, theoretically there are
four cases with respect to object's state in the next snapshot:

    1. OBJECT_NONEXISTENT
    2. OBJECT_EXISTS
    3. OBJECT_PENDING
    4. OBJECT_EXISTS_CLEAN

Prior to commit b81cd24 ("librbd/object_map: diff state machine
should track object existence"), (3) was handled incorrectly (diff set
to DIFF_STATE_NONE instead of DIFF_STATE_UPDATED).

Post commit 399a45e ("librbd/object_map: rbd diff between two
snapshots lists entire image content"), (4) is handled incorrectly
(diff set to DIFF_STATE_DATA instead of DIFF_STATE_DATA_UPDATED).

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 70c1991)
Exercise both diff-iterate and deep-copy modes of operation.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit f47f7f8)
Existing *Delta tests cover:

- beginning of time -> HEAD, through intermediate snap
- snap -> snap, directly
- snap -> HEAD, directly

But coverage is too weak: none of the weird OBJECT_PENDING cases and
only a single diff-iterate vs deep-copy case is tested, for example.

Coverage is missing completely for:

- beginning of time -> HEAD, directly
- beginning of time -> snap, directly
- beginning of time -> snap, through intermediate snap
- snap -> snap, through intermediate snap
- snap -> HEAD, through intermediate snap

This adds the following tests:

- FromBeginningToHead
- FromBeginningToHeadIntermediateSnap (expands FullDelta)
- FromBeginningToSnap
- FromBeginningToSnapIntermediateSnap
- FromSnapToSnap (expands IntermediateDelta)
- FromSnapToSnapIntermediateSnap
- FromSnapToHead (expands EndDelta)
- FromSnapToHeadIntermediateSnap

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 7aff35c)
Despite the test in test_diff_iterate() being correct, it started
failing:

    >       check_diff(self.image, 0, IMG_SIZE, 'snap1', [(0, 512, False)])
    ...
    a = [], b = [(0, 512, False)]
    ...
    >       assert a == b
    E       AssertionError

This is because check_diff() drops 'snap1' argument on the floor and
passes None to image.diff_iterate() instead.  This goes back to 2013,
see commit e88fe3c ("rbd.py: add some missing functions").

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit f8ced6d)
@idryomov
Copy link
Contributor Author

jenkins test make check

@NitzanMordhai
Copy link
Contributor

Rados approved
PACIFIC - RADOS - Ceph

@yuriw yuriw merged commit 695600c into ceph:pacific Dec 28, 2023
8 checks passed
@idryomov idryomov deleted the wip-53897-pacific branch December 28, 2023 15:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants