Skip to content

rbd-mirror: integration of the new GroupSnapshotNamespaceMirror::complete field#65164

Merged
idryomov merged 1 commit intoceph:wip-rbd-cgsm-basefrom
VinayBhaskar-V:wip-add-complete-field
Nov 22, 2025
Merged

rbd-mirror: integration of the new GroupSnapshotNamespaceMirror::complete field#65164
idryomov merged 1 commit intoceph:wip-rbd-cgsm-basefrom
VinayBhaskar-V:wip-add-complete-field

Conversation

@VinayBhaskar-V
Copy link
Copy Markdown
Contributor

@VinayBhaskar-V VinayBhaskar-V commented Aug 21, 2025

This commit introduces the new field complete, of type MirrorGroupSnapshotCompleteState enum,
to the GroupSnapshotNamespaceMirror structure. This change is necessary to align behavior of
mirror group snapshots with that of mirror image snapshots, allowing for a precise differentiation
between a group snapshot that has been created and one that has been fully synced.

1. Handling Old-Style Snapshots

Decoding Old Snapshots: The original GroupSnapshotNamespaceMirror structure lacked the complete field,
which implicitly defaulted to a bool value of false upon initialization.
When an old snapshot (lacking the complete field) is decoded by an upgraded client,
the implicit default value maps to MIRROR_GROUP_SNAPSHOT_COMPLETE_IF_CREATED.

Completion Check: A snapshot is determined old by checking it's complete filed i.e
complete == MIRROR_GROUP_SNAPSHOT_COMPLETE_IF_CREATED and if it's old the sync completion for these group snapshots is determined by checking the state field i.e state == GROUP_SNAPSHOT_STATE_CREATED.

During a upgrade where OSDs have not yet been updated, the new client will be forced to create snapshots using the old style.
These snapshots will be initialized with MIRROR_GROUP_SNAPSHOT_COMPLETE_IF_CREATED and will stay on that
to prevent immediate, incorrect cleanup by the old OSDs and in this case state field is set to GROUP_SNAPSHOT_STATE_CREATED
only after snapshot completed it's sync.

2. Handling New-Style Snapshots

New snapshots are initialized with complete == MIRROR_GROUP_SNAPSHOT_INCOMPLETE, state == GROUP_SNAPSHOT_STATE_CREATING
The group snapshot's state is marked as GROUP_SNAPSHOT_STATE_CREATED as soon as its metadata is fully available and stored.

Completion Check: The snapshot's sync is confirmed only when complete == MIRROR_GROUP_SNAPSHOT_COMPLETE
along with state check (state == GROUP_SNAPSHOT_STATE_CREATED) is satisfied.

This approach ensures seamless transition and compatibility, allowing the system to correctly interpret the
synchronization status of both old and newly created group snapshots.

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands

@VinayBhaskar-V VinayBhaskar-V requested a review from a team as a code owner August 21, 2025 05:57
@github-actions github-actions bot added the rbd label Aug 21, 2025
@VinayBhaskar-V VinayBhaskar-V force-pushed the wip-add-complete-field branch 2 times, most recently from 3f69b1e to 35416e5 Compare September 1, 2025 08:38
@VinayBhaskar-V VinayBhaskar-V changed the title rbd-mirror: integration of the new GroupSnapshotNamespaceMirror::complete field wip:rbd-mirror: integration of the new GroupSnapshotNamespaceMirror::complete field Sep 2, 2025
@VinayBhaskar-V VinayBhaskar-V changed the title wip:rbd-mirror: integration of the new GroupSnapshotNamespaceMirror::complete field rbd-mirror: integration of the new GroupSnapshotNamespaceMirror::complete field Sep 4, 2025
@VinayBhaskar-V
Copy link
Copy Markdown
Contributor Author

Not yet tested with rbd_mirror_group.sh and rbd_mirror_group_simple.sh

@VinayBhaskar-V VinayBhaskar-V force-pushed the wip-add-complete-field branch 2 times, most recently from a5f9f89 to 496493f Compare September 10, 2025 08:35
@VinayBhaskar-V
Copy link
Copy Markdown
Contributor Author

@github-actions
Copy link
Copy Markdown

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@github-actions
Copy link
Copy Markdown

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@github-actions
Copy link
Copy Markdown

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@ajarr
Copy link
Copy Markdown
Contributor

ajarr commented Oct 27, 2025

@ajarr Regarding

I prefer that this be variable either be set to "incomplete" or "complete" string literal for the human readable output. This way it will closely match the machine readable output as well, which currently sets "complete" field to true or false.

Discussion Link

Got it. I see that we're trying to match what's done for mirror image snapshots. Referred do_list_snaps in src/tools/rbd/action/Snap.cc . You can ignore my comment

@ajarr
Copy link
Copy Markdown
Contributor

ajarr commented Oct 28, 2025

Naming suggestions for methods in group_replayer/Replayer.cc

  • try_create_group_snapshot ---> try_creating_group_snapshot
  • create_group_snapshot ---> creating_group_snapshot
  • create_mirror_snapshot/handle_create_mirror_snapshot ---> creating_mirror_snapshot/handle_creating_mirror_snapshot
  • create_user_snapshot/handle_create_user_snapshot ---> creating_user_snapshot/handle_creating_user_snapshot

With the above suggestion, we can subtly get the idea that the function is generating a group snapshot in CREATING state. The current naming just tells me that the group snapshot is created without indicating that the snapshot is in CREATING state.

  • post_mirror_snapshot_complete/handle_post_mirror_snapshot_complete -> created_mirror_snapshot/handle_created_mirror_snapshot
  • post_user_snapshot_complete/handle_post_user_snapshot_complete -> created_user_snapshot_complete/handle_created_user_snapshot

The previous naming of the above methods was misleading. The methods weren't executing steps after (post) a mirror or user snapshot was complete. They were actually trying to set a mirror or user snapshot to a complete if the snapshot met the requirements. With the changes in the PR, we set the state of mirror snapshot to CREATED if the requirements are met.

And then I'd break up the handle_post_mirror_snapshot_complete(). It'd call complete_miror_snapshot() that has a callback handle_complete_mirror_snapshot(), which is currently named handle_mirror_snapshot_complete_field() . So something like,

template <typename I>
void Replayer<I>::handle_created_mirror_snapshot(
    int r, const std::string &group_snap_id,
    const cls::rbd::GroupSnapshot &local_snap,
    Context *on_finish) {
  dout(10) << group_snap_id << ", r=" << r << dendl;

  if (r < 0) {
    on_finish->complete(r);
    return;
  }

  complete_mirror_snapshot(group_snap_id, local_snap, on_finish);
}

template <typename I>
void Replayer<I>::complete_mirror_snapshot(
    const std::string &group_snap_id,
    const cls::rbd::GroupSnapshot &local_snap,
    Context *on_finish) {
  std::unique_lock locker{m_lock};
  dout(10) << group_snap_id << ", r=" << r << dendl;

  uint64_t num_images_sync_pending = 0;
  for (auto snap_spec : local_snap.snaps){
     // count number of member image snaps that have not fully synced
  }
  
  // if num_images_sync_pending is zero, i.e., the snapshot has been fully synced
  // then asynchronously set the mirror snapshot's namespace to COMPLETE
  
  }
  
template <typename I>
void Replayer<I>::handle_complete_mirror_snapshot(
    const std::string &group_snap_id,
    const cls::rbd::GroupSnapshot &local_snap,
    Context *on_finish) {
  // body is same as that of handle_complete_mirror_snapshot_field 
}

@VinayBhaskar-V what do you think?

@pkalever
Copy link
Copy Markdown

Naming suggestions for methods in group_replayer/Replayer.cc

I would keep the naming changes into a separate commit, as they seem a bit out of the context to the subject of the PR here, but if we are touching them anyway and @VinayBhaskar-V is fine applying it here, I have some suggestions:

  • try_create_group_snapshot ---> try_creating_group_snapshot

try_creating_group_snapshot sounds good.

  • create_group_snapshot ---> creating_group_snapshot
  • create_mirror_snapshot/handle_create_mirror_snapshot ---> creating_mirror_snapshot/handle_creating_mirror_snapshot
  • create_user_snapshot/handle_create_user_snapshot ---> creating_user_snapshot/handle_creating_user_snapshot

while these sounds odd to me, and I would stick to the existing.

With the above suggestion, we can subtly get the idea that the function is generating a group snapshot in CREATING state. The current naming just tells me that the group snapshot is created without indicating that the snapshot is in CREATING state.

  • post_mirror_snapshot_complete/handle_post_mirror_snapshot_complete -> created_mirror_snapshot/handle_created_mirror_snapshot
  • post_user_snapshot_complete/handle_post_user_snapshot_complete -> created_user_snapshot_complete/handle_created_user_snapshot

I see Raman's valid point, how about just mirror_snapshot_complete/user_snapshot_complete ?

The previous naming of the above methods was misleading. The methods weren't executing steps after (post) a mirror or user snapshot was complete. They were actually trying to set a mirror or user snapshot to a complete if the snapshot met the requirements. With the changes in the PR, we set the state of mirror snapshot to CREATED if the requirements are met.

And then I'd break up the handle_post_mirror_snapshot_complete(). It'd call complete_miror_snapshot() that has a callback handle_complete_mirror_snapshot(), which is currently named handle_mirror_snapshot_complete_field() . So something like,

.. and if we really want a split for the later routine, we can then name it post_mirror_snapshot_complete or mirror_snapshot_complete_finish and alike for user snaps.

@VinayBhaskar-V
Copy link
Copy Markdown
Contributor Author

  • create_group_snapshot ---> creating_group_snapshot
  • create_mirror_snapshot/handle_create_mirror_snapshot ---> creating_mirror_snapshot/handle_creating_mirror_snapshot
  • create_user_snapshot/handle_create_user_snapshot ---> creating_user_snapshot/handle_creating_user_snapshot

while these sounds odd to me, and I would stick to the existing.

I agree with prasanna here

  • post_mirror_snapshot_complete/handle_post_mirror_snapshot_complete -> created_mirror_snapshot/handle_created_mirror_snapshot
  • post_user_snapshot_complete/handle_post_user_snapshot_complete -> created_user_snapshot_complete/handle_created_user_snapshot

instead of this how about having something like

  • update_mirror_snapshot_state_to_created / handle_update_mirror_snapshot_state_to_created
  • update_user_snapshot_state_to_created / handle_update_user_snapshot_state_to_created

And then I'd break up the handle_post_mirror_snapshot_complete(). It'd call complete_miror_snapshot() that has a callback handle_complete_mirror_snapshot(), which is currently named handle_mirror_snapshot_complete_field() . So something like,

I agree with ramana for this.

@ajarr ajarr self-requested a review November 5, 2025 16:48
@github-actions
Copy link
Copy Markdown

github-actions bot commented Nov 6, 2025

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@VinayBhaskar-V VinayBhaskar-V force-pushed the wip-add-complete-field branch 3 times, most recently from 791e94b to c941c74 Compare November 21, 2025 13:21
Copy link
Copy Markdown
Contributor

@ajarr ajarr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great

…lete field

This commit introduces the new field **complete**, of type **MirrorGroupSnapshotCompleteState** enum,
to the GroupSnapshotNamespaceMirror structure. This change is necessary to align behavior of
mirror group snapshots with that of mirror image snapshots, allowing for a precise differentiation
between a group snapshot that has been created and one that has been fully synced.

**1. Handling Old-Style Snapshots**

Decoding Old Snapshots: The original GroupSnapshotNamespaceMirror structure lacked the complete field,
which implicitly defaulted to a bool value of false upon initialization.
When an old snapshot (lacking the complete field) is decoded by an upgraded client,
the implicit default value maps to MIRROR_GROUP_SNAPSHOT_COMPLETE_IF_CREATED.

Completion Check: A snapshot is determined old by checking it's complete filed i.e
complete == MIRROR_GROUP_SNAPSHOT_COMPLETE_IF_CREATED and if it's old the sync completion
for these group snapshots is determined by checking the state field
i.e state == GROUP_SNAPSHOT_STATE_CREATED.

During a upgrade where **OSDs have not yet been updated**, the new client will be forced to create
snapshots using the old style. These snapshots will be initialized with MIRROR_GROUP_SNAPSHOT_COMPLETE_IF_CREATED
and will stay on that to prevent immediate, incorrect cleanup by the old OSDs and in this case
state field is set to **GROUP_SNAPSHOT_STATE_CREATED** only after snapshot completed it's sync.

**2. Handling New-Style Snapshots**

New snapshots are initialized with complete == **MIRROR_GROUP_SNAPSHOT_INCOMPLETE**,
state == GROUP_SNAPSHOT_STATE_CREATING. The group snapshot's state is marked as GROUP_SNAPSHOT_STATE_CREATED
as soon as its metadata is fully available and stored.

Completion Check: The snapshot's sync is confirmed only when complete == MIRROR_GROUP_SNAPSHOT_COMPLETE
along with state check (state == GROUP_SNAPSHOT_STATE_CREATED) is satisfied.

This approach ensures seamless transition and compatibility, allowing the system to correctly interpret the
synchronization status of both old and newly created group snapshots.

Signed-off-by: VinayBhaskar-V <vvarada@redhat.com>
Copy link
Copy Markdown

@pkalever pkalever left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

It will be nice to validate the pruning of group snapshots manually with the PR as we don't have tests coverage there.

@idryomov
Copy link
Copy Markdown
Contributor

It will be nice to validate the pruning of group snapshots manually with the PR as we don't have tests coverage there.

@pkalever shared offline:

I had tested the basic pruning functioning (focusing on secondary) both for mirror and user snapshots with 2 user snapshots and a bunch of mirror group snapshots with a couple of scenarios and they seem to be working as before.

@idryomov idryomov merged commit a59ef51 into ceph:wip-rbd-cgsm-base Nov 22, 2025
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants