rbd-mirror: fix stuck-to-sync mirror group snaps on restart of daemon #66653
Conversation
Jenkins run successful: http://141.125.111.75:8080/job/rbd_group_snap_mirror_tests_suite_run/658/

jenkins test make check
  int r = 0;
  {
    std::unique_lock locker{m_lock};
    for (auto& local_snap : m_local_group_snaps) {
Instead of having another for loop to collect the group snap ids in the CREATING state, can we have an m_prune_creating_group_snap_ids and load it during the check in prune_creating_group_snapshots_if_any()? We would need to update the check in prune_creating_group_snapshots_if_any() so that it also covers user group snapshots and inserts the group snapshots in the CREATING state into m_prune_creating_group_snap_ids; then in handle_prune_creating_group_snapshots_if_any() we can just iterate through m_prune_creating_group_snap_ids.
prune_creating_group_snapshots_if_any() now breaks after the first found_creating match, so it won't be in a position to fill out m_prune_creating_group_snap_ids. I don't see much improvement from the above, so I would keep it as it is.
Okay. What I actually wanted to suggest was to remove the break and fill m_prune_creating_group_snap_ids, but it's fine to keep it as it is.
However, in the current for loop in prune_creating_group_snapshots_if_any(), why do we only check the state of mirror snapshots before breaking out of the loop? We could also check user group snapshots, i.e. instead of
const auto& ns = std::get<cls::rbd::GroupSnapshotNamespaceMirror>(
local_snap.snapshot_namespace);
if (ns.state == cls::rbd::MIRROR_SNAPSHOT_STATE_NON_PRIMARY ||
ns.state == cls::rbd::MIRROR_SNAPSHOT_STATE_NON_PRIMARY_DEMOTED) {
if (local_snap.state == cls::rbd::GROUP_SNAPSHOT_STATE_CREATING) {
found_creating = true;
break;
}
}
why not just
if (local_snap.state == cls::rbd::GROUP_SNAPSHOT_STATE_CREATING) {
found_creating = true;
break;
}
I think an example will make this clear. Take the scenario of a force promote being executed on the secondary cluster:
- Assume a CLI force promote command is running, which generates a PRIMARY snapshot that initially goes through CREATING -> CREATED.
- If a daemon is running on the secondary, it now races with the CREATING phase and tries to prune the snapshot before it even lands in CREATED, leaving the PRIMARY promote snapshot in a broken state.
This is why it is important to deal with NON_PRIMARY snapshots only, as they are the ones we are interested in pulling via the daemon's actions.
Got it. I agree with you.
In prune_creating_group_snapshots_if_any(), we iterate through the local group snapshots and check whether there is at least one non-primary group snapshot in the CREATING state. If so, we list the group images and call handle_prune_creating_group_snapshots_if_any().
However, in handle_prune_creating_group_snapshots_if_any(), we then prune all local group snapshots, both user and mirror, that are in the CREATING state. This doesn't make sense to me.
If the intent is to find the latest local non-primary group snapshot in the CREATING state, and then remove only those local group snapshots in the CREATING state that are earlier than this snapshot, that would make more sense to me.
The intent is that:
// If multiple user snapshots are syncing, there may be multiple
// snapshots in the CREATING state in the local list.
In such a case we will have to clean all of them.
As discussed, we want to iterate through group snapshots only once and determine which ones should be pruned. Identify the latest non-primary group snapshot in the CREATING state, then add that snapshot and any preceding user group snapshots that are also in the CREATING state to a collection for pruning.
The current code has a bug: if there is a single non-primary group snapshot in the CREATING state, it prunes all group snapshots that are in the CREATING state.
Is it possible for a primary or non-primary group snapshot in the CREATING state to be older than the latest non-primary group snapshot in the CREATING state?
> As discussed, we want to iterate through group snapshots only once and determine which ones should be pruned. Identify the latest non-primary group snapshot in the CREATING state, then add that snapshot and any preceding user group snapshots that are also in the CREATING state to a collection for pruning.

Yep!

> The current code has a bug: if there is a single non-primary group snapshot in the CREATING state, it prunes all group snapshots that are in the CREATING state.

If you are referring to a case where it might also prune the primary snapshot, we discussed yesterday that this is covered by the image replayers today with their watchers mechanism, and that would help the GR. Hence we are protected. That said, with the latest changes this is handled in the GR too, by filtering out primary snaps and only passing non-primary snaps.

> Is it possible for a primary or non-primary group snapshot in the CREATING state to be older than the latest non-primary group snapshot in the CREATING state?

I don't think this is possible: if a non-primary snapshot has to sync, it checks whether any CREATING snap is in progress and only proceeds once the previous one has finished syncing. The only exception is user snapshots, i.e. multiple user snapshots can sync at once; even then, until all the user snaps reach CREATED, the preceding mirror snap should not land in CREATED.
  retry_needed = true;
  locker->unlock();
  for (auto& [_, image_replayer] : *m_image_replayers) {
Here, instead of iterating over all the image replayers again and again for each image, is it okay to have a lookup table? Something like this:
std::unordered_map<std::string, ImageReplayer<I>*> replayer_by_image_id;
replayer_by_image_id.reserve(m_image_replayers->size());
for (auto& [_, image_replayer] : *m_image_replayers) {
if (image_replayer) {
const auto& id = image_replayer->get_local_image_id();
if (!id.empty()) {
replayer_by_image_id.emplace(id, image_replayer);
}
}
}
and we can directly get the image_replayer for each image by just
auto replayer_it = replayer_by_image_id.find(image.spec.image_id);
ImageReplayer<I>* image_replayer =
(replayer_it != replayer_by_image_id.end() ? replayer_it->second : nullptr);
Or am I missing something? Even in prune_all_image_snapshots we iterate over all the image_replayers again and again.
Yeah, this is a once-in-a-while operation, actually. But such a hash would be helpful in the other prune code, so I will try to keep one; it might not be that useful here, but it should help in other places too. Let's see, if needed I will append a cleanup commit using such a routine in other places.
  const auto* mirror_ns =
      std::get_if<cls::rbd::MirrorSnapshotNamespace>(
        &snap_info.snapshot_namespace);
  const auto* user_ns =
I think an if/else would be better here, instead of fetching and using both mirror_ns and user_ns.
I was trying to do the check in one go there; not a big deal. I will keep the old style.
  mirror_group_demote "${primary_cluster}" "${pool}/${group0}"
  mirror_group_resync "${secondary_cluster}" "${pool}/${group0}"
  wait_for_group_status_in_pool_dir "${secondary_cluster}" "${pool}"/"${group0}" 'up+unknown'
  mirror_group_resync "${secondary_cluster}" "${pool}/${group0}"
Is there any particular reason why this test case was changed?
Yes, this was caught as part of a code-reading exercise. It doesn't make sense to check for 'up+unknown' after the resync; it should be done right after the demote.
Actually, the criterion here is that the resync should not delete the group. I think the test has wait_for_group_status_in_pool_dir "${secondary_cluster}" "${pool}"/"${group0}" 'up+unknown' after mirror_group_resync to ensure that the group doesn't get deleted and still attains the 'up+unknown' state even after the resync.
Shouldn't we actually let the demote propagate first after issuing the demote command, and only then do the resync?
If so, we can simply add some delay after the resync command before checking group_id_after, maybe a sleep 20 (at most, as we know this will not resync anyway).
Actually, I think there is no need to wait for the demote snapshot to complete its propagation on the secondary before issuing the resync. The resync mainly needs to check whether the remote is primary or not before it takes any steps on the secondary. In this test case we expect nothing to happen after issuing the resync, and the demotion will propagate to the secondary without any issue.
I agree we do not have to wait for the sync to finish at the secondary. But I remember this conversation with John: we wanted to demote the group, and the intention was to test the resync on a demoted group, not on a group in transit during demotion.
Hence I would like this to at least be like:
[0] pkalever 😎 ceph✨ git diff
diff --git a/qa/workunits/rbd/rbd_mirror_group_simple.sh b/qa/workunits/rbd/rbd_mirror_group_simple.sh
index ab226999efe7..3d768ed52d0a 100755
--- a/qa/workunits/rbd/rbd_mirror_group_simple.sh
+++ b/qa/workunits/rbd/rbd_mirror_group_simple.sh
@@ -3487,9 +3487,16 @@ test_resync_marker()
# demote primary and request resync on secondary - check that group does not get deleted (due to resync request flag)
mirror_group_demote "${primary_cluster}" "${pool}/${group0}"
+ wait_for_group_status_in_pool_dir "${primary_cluster}" "${pool}"/"${group0}" 'up+unknown'
wait_for_group_status_in_pool_dir "${secondary_cluster}" "${pool}"/"${group0}" 'up+unknown'
mirror_group_resync "${secondary_cluster}" "${pool}/${group0}"
+ # wait to give some time for resync to commence
+ sleep 20
+ # making sure nothing changed eversince
+ wait_for_group_status_in_pool_dir "${primary_cluster}" "${pool}"/"${group0}" 'up+unknown'
+ wait_for_group_status_in_pool_dir "${secondary_cluster}" "${pool}"/"${group0}" 'up+unknown'
+
And then check that old and new group id match.
That said, we can surely keep this change out of this commit, as it is not the intention here. Let me know what you think.
> But I remember this conversation with John: we wanted to demote the group, and the intention was to test the resync on a demoted group, not on a group in transit during demotion.

Then it's fine; having a sleep like you have suggested above is good.
> But I remember this conversation with John: we wanted to demote the group, and the intention was to test the resync on a demoted group, not on a group in transit during demotion.

I don't think the state of the secondary group (i.e. the one on which resync is issued) ever came up -- the fix for the issue of the secondary group getting deleted involved adding a check for the state of the primary group. The state of the secondary group isn't really relevant for sync.
My recollection is that this test case was written with resync immediately following demote on purpose. The idea is to ensure that the rbd-mirror daemon on the secondary (1) gets a chance to scan the primary group and (2) based on that evaluates whether the secondary group needs to be removed or not (i.e. whether resync can actually be performed at that time) and decides not to remove it. The wait for the rbd-mirror daemon to do that was implemented organically -- by waiting for the demotion to be picked up. This change effectively discards that and replaces something organic with a dumb sleep, which is much less robust. Hypothetically, it can take the rbd-mirror daemon longer than 20 seconds to "commence" the resync.
Let's drop this and the other unrelated test change from the PR. It's not good practice to modify existing test cases in passing like that.
Sure, Ilya. This is not critical for this PR in any way, and it doesn't hurt to leave it behind.
  stop_mirror_while_group_snapshot_incomplete "${secondary_cluster}" "${pool}" "${group0}" "${group_snap_id}" "created"
  fi
  # make sure the snapshot remains in an incomplete state after the daemon is stopped
  test_group_snap_sync_incomplete "${secondary_cluster}" "${pool}/${group0}" "${group_snap_id}"
Here I think it would be nice to have a check on the state, i.e. state == creating or created depending on the case, as that is the main focus of this test case. Something like the below, with a new helper test_group_snap_state that checks the state of the group snap:
if [ "${scenario}" = 'snap_creating' ]; then
test_group_snap_state "${secondary_cluster}" "${pool}/${group0}" "${group_snap_id}" "creating"
elif [ "${scenario}" = 'snap_created' ]; then
test_group_snap_state "${secondary_cluster}" "${pool}/${group0}" "${group_snap_id}" "created"
test_group_snap_sync_incomplete "${secondary_cluster}" "${pool}/${group0}" "${group_snap_id}"
fi
Good idea. Yes, maybe it is worth scrutinizing it at that level.
Latest run results: http://141.125.111.75:8080/job/rbd_group_snap_mirror_tests_suite_run/666
Jenkins on top of latest changes: http://141.125.111.75:8080/job/rbd_group_snap_mirror_tests_suite_run/670/
  wait_for_group_synced "${primary_cluster}" "${pool}"/"${group0}" "${secondary_cluster}" "${pool}"/"${group0}"

  mirror_group_snapshot "${primary_cluster}" "${pool}/${group0}"
Sorry, I missed this earlier, but is there any particular reason to remove this line? I understand that removing it doesn't affect the test, but I think it is good to have it.
This one came out of reading the test case again. The demote itself will create a snapshot; we don't need to explicitly create a snapshot and then demote. I don't see any benefit here.
    uint64_t snap_id,
    std::unique_lock<ceph::mutex>& locker) {

  if (!image_replayer) {
I think this if condition is redundant; we already check whether image_replayer is null and skip it in get_replayers_by_image_id().
It is good practice for routines to be self-contained and to validate required input arguments before using them. I highly recommend keeping these checks in every routine to prevent dereferencing null pointers or accessing invalid memory as a result of future changes or new callers.
    uint64_t snap_id, std::unique_lock<ceph::mutex>& locker) {

  if (!image_replayer) {
    return;
I think this if condition is redundant; we already check whether image_replayer is null and skip it in get_replayers_by_image_id().
It is good practice for routines to be self-contained and to validate required input arguments before using them. I highly recommend keeping these checks in every routine to prevent dereferencing null pointers or accessing invalid memory as a result of future changes or new callers.
  std::string Replayer<I>::get_global_image_id(ImageReplayer<I>* image_replayer,
      std::unique_lock<ceph::mutex>& locker) {

  if (!image_replayer) {

    local_snap.id);
  if (r < 0) {
    derr << "failed to remove group snapshot "
         << local_snap.id << ": "
Latest jenkins results: http://141.125.111.75:8080/job/rbd_group_snap_mirror_tests_suite_run/672/console
  local last_snap_name='';
  for s in 0.1 0.1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 4 8 8 8 8 8 8 8 8 16 16 32 32; do
    run_cmd "rbd --cluster ${cluster} snap list ${pool}/${image} --all --format xml --pretty-format"
    last_snap_name=$(xmlstarlet sel -t -v "(//snapshots/snapshot/name)[last()]" < "$CMD_STDOUT") || { last_snap_name=''; }
Why is || { last_snap_name=''; } necessary?
I'm guessing the idea is to not leave any stale value behind on a failed xmlstarlet; it's a common usage:
[0] pkalever 😎 ceph✨ git grep -n xmlstarlet qa/workunits/rbd/rbd_mirror_helpers.sh | grep "||"
qa/workunits/rbd/rbd_mirror_helpers.sh:1019: test "${expected_snap_count}" = "$(xmlstarlet sel -t -v "count(//snapshots/snapshot/namespace[primary_snap_id='${snap_id}'])" < "$CMD_STDOUT")" || { fail; return 1; }
qa/workunits/rbd/rbd_mirror_helpers.sh:1030: test "${expected_complete}" = "$(xmlstarlet sel -t -v "//snapshots/snapshot/namespace[primary_snap_id='${snap_id}']/complete" < "$CMD_STDOUT")" || { fail; return 1; }
qa/workunits/rbd/rbd_mirror_helpers.sh:1163: result=$(xmlstarlet sel -t -v "$field" < "$CMD_STDOUT") || { fail "field not found: ${field}"; return; }
qa/workunits/rbd/rbd_mirror_helpers.sh:1179: result=$(xmlstarlet sel -t -v "$field" < "$CMD_STDOUT") || { fail "field not found: ${field}"; return; }
qa/workunits/rbd/rbd_mirror_helpers.sh:1196: result=$(xmlstarlet sel -t -v "$field" < "$CMD_STDOUT") || { fail "field not found: ${field}"; return; }
qa/workunits/rbd/rbd_mirror_helpers.sh:1241: result=$(xmlstarlet sel -t -v "$field" < "$CMD_STDOUT") || { fail "field not found: ${field}"; return; }
qa/workunits/rbd/rbd_mirror_helpers.sh:1871: last_snap_name=$(xmlstarlet sel -t -v "(//snapshots/snapshot/name)[last()]" < "$CMD_STDOUT") || { last_snap_name=''; }
qa/workunits/rbd/rbd_mirror_helpers.sh:1908: state=$(xmlstarlet sel -t -v "//group_snaps/group_snap[id='${group_snap_id}']/state" < "$CMD_STDOUT") || { state=''; }
qa/workunits/rbd/rbd_mirror_helpers.sh:1909: snaps_synced=$(xmlstarlet sel -t -v "//group_snaps/group_snap[id='${group_snap_id}']/namespace/complete" < "$CMD_STDOUT") || { snaps_synced='false'; }
qa/workunits/rbd/rbd_mirror_helpers.sh:2069: test "${test_image_count}" = "$(xmlstarlet sel -t -v "count(//image/mirroring[global_id='${global_id}'])" < "$CMD_STDOUT")" || { fail; return 1; }
qa/workunits/rbd/rbd_mirror_helpers.sh:2188: _id=$(xmlstarlet sel -t -v "//image/id" "$CMD_STDOUT") || { fail "no id!"; return; }
qa/workunits/rbd/rbd_mirror_helpers.sh:2198: _global_id=$(xmlstarlet sel -t -v "//image/mirroring/global_id" "$CMD_STDOUT") || { fail "not mirrored"; return; }
qa/workunits/rbd/rbd_mirror_helpers.sh:2217: _size=$(xmlstarlet sel -t -v "//image/size" "$CMD_STDOUT") || { fail "unable to determine size"; return; }
qa/workunits/rbd/rbd_mirror_helpers.sh:2633: test "${test_group_count}" = "$(xmlstarlet sel -t -v "count(//groups[name='${group}'])" < "$CMD_STDOUT")" || { fail; return 1; }
qa/workunits/rbd/rbd_mirror_helpers.sh:2716: test "${expected_snap_count}" = "$(xmlstarlet sel -t -v "count(//group_snaps/group_snap[id='${group_snap_id}'])" < "$CMD_STDOUT")" || { fail; return 1; }
qa/workunits/rbd/rbd_mirror_helpers.sh:2788: state=$(xmlstarlet sel -t -v "//group_snaps/group_snap[id='${group_snap_id}']/state" < "$CMD_STDOUT") || { state=''; }
qa/workunits/rbd/rbd_mirror_helpers.sh:2789: snaps_synced=$(xmlstarlet sel -t -v "//group_snaps/group_snap[id='${group_snap_id}']/namespace/complete" < "$CMD_STDOUT") || { snaps_synced='false'; }
qa/workunits/rbd/rbd_mirror_helpers.sh:2895: xmlstarlet sel -Q -t -v "//group/state[contains(text(), ${test_state})]" "${CMD_STDOUT}" || { fail; return 1; }
qa/workunits/rbd/rbd_mirror_helpers.sh:2920: group_replayers=$(xmlstarlet sel -t -v "//mirror_status/pool_replayers/pool_replayer_status/group_replayers/group_replayer/name" < "$CMD_STDOUT") || { group_replayers=''; }
qa/workunits/rbd/rbd_mirror_helpers.sh:2921: image_replayers=$(xmlstarlet sel -t -v "//mirror_status/pool_replayers/pool_replayer_status/group_replayers/group_replayer/image_replayers/image_replayer/name" < "$CMD_STDOUT") || { image_replayers=''; }
qa/workunits/rbd/rbd_mirror_helpers.sh:2922: group_replayers_count=$(xmlstarlet sel -t -v "count(//mirror_status/pool_replayers/pool_replayer_status/group_replayers/group_replayer/name)" < "$CMD_STDOUT") || { group_replayers_count='0'; }
qa/workunits/rbd/rbd_mirror_helpers.sh:2923: image_replayers_count=$(xmlstarlet sel -t -v "count(//mirror_status/pool_replayers/pool_replayer_status/group_replayers/group_replayer/image_replayers/image_replayer/name)" < "$CMD_STDOUT") || { image_replayers_count='0'; }
qa/workunits/rbd/rbd_mirror_helpers.sh:3068: test -n "${state_pattern}" && { test "${state_pattern}" = $(xmlstarlet sel -t -v "//group/state" < "${CMD_STDOUT}" ) || { fail; return 1; } }
qa/workunits/rbd/rbd_mirror_helpers.sh:3069: test -n "${description_pattern}" && { test "${description_pattern}" = "$(xmlstarlet sel -t -v "//group/description" "${CMD_STDOUT}" )" || { fail; return 1; } }
qa/workunits/rbd/rbd_mirror_helpers.sh:3072: xmlstarlet sel -Q -t -v "//group/daemon_service/daemon_id[contains(text(), ${MIRROR_USER_ID_PREFIX})]" "${CMD_STDOUT}" || { fail; return 1; }
  local count=0
  while [ "${count}" -lt 60 ]; do
    run_cmd "rbd --cluster ${cluster} group snap ls ${pool}/${group} --format xml --pretty-format"
    state=$(xmlstarlet sel -t -v "//group_snaps/group_snap[id='${group_snap_id}']/state" < "$CMD_STDOUT") || { state=''; }
Why is || { state=''; } needed?
See above. I don't see anything wrong in explicitly expressing the fallback on failure of a command; it makes things even clearer for a reader.
Validation of an image snapshot's association with a group snapshot requires checking either a valid group_spec or the presence of a group_snap_id in cls::rbd::MirrorSnapshotNamespace. This commit removes the redundant validation of checking for both and relies on the minimal condition of checking for a valid group_spec.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
…napshot

On daemon restart, the image replayer currently deletes and recreates image snapshots if object copying has not yet started, in order to avoid missing image state such as the object-map or metadata.

This logic is unnecessary for image snapshots that are part of mirror group snapshots. By the time a group snapshot reaches GROUP_SNAPSHOT_STATE_CREATED, all member image snapshots are already guaranteed to be in the CREATED state. Deleting such image snapshots provides no benefit and can cause group snapshots to become stuck (as in the current case) waiting for those image snapshots.

Skip image snapshot deletion when the snapshot is part of a group snapshot. A follow-up commit will address handling group snapshots that remain in GROUP_SNAPSHOT_STATE_CREATING across a daemon restart by deleting them and allowing the sync to recreate the group snapshot as a whole.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
  while [ "${count}" -lt 60 ]; do
    run_cmd "rbd --cluster ${cluster} group snap ls ${pool}/${group} --format xml --pretty-format"
    state=$(xmlstarlet sel -t -v "//group_snaps/group_snap[id='${group_snap_id}']/state" < "$CMD_STDOUT") || { state=''; }
    snaps_synced=$(xmlstarlet sel -t -v "//group_snaps/group_snap[id='${group_snap_id}']/namespace/complete" < "$CMD_STDOUT") || { snaps_synced='false'; }
See above. I don't see anything wrong in explicitly expressing the fallback on failure of a command; it makes things even clearer for a reader.
  fi
}

test_group_snap_completeness_state()
We refer to the CREATING/CREATED state as completeness, and we are used to that with the recent changes to the code. I just wanted the name to be more verbose and descriptive. I can keep it short.
Latest jenkins job ran successfully: http://141.125.111.75:8080/job/rbd_group_snap_mirror_tests_suite_run/679/
ajarr left a comment:
Still need the CI to pass before merging. The code changes look fine.
After a daemon restart, prune the entire group snapshot if it remains in
GROUP_SNAPSHOT_STATE_CREATING. This aligns group snapshot handling with the
image replayer logic and ensures that all member image snapshots are cleanly
deleted and recreated.
This is required because some image snapshots in the group may not have started
object copying prior to the restart, which can otherwise lead to missing image
state, object-map, or related metadata.
Also, add the necessary tests to validate interrupted synchronization during
group snapshot syncing. Specifically, cover the following scenarios:
Scenario 1: The snapshot on the secondary is in the creating phase when the
daemon is restarted.
Scenario 2: The snapshot on the secondary is in the created phase when the
daemon is restarted.
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Latest jenkins run succeeded:
Define the below routines, which make calls to the image replayers:

prune_image_snapshot()
get_replayers_by_image_id()
set_image_replayer_end_limits()

This commit starts using them. Also use get_replayers_by_image_id() in other places in the group replayer.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>