pacific: mirror snapshot schedule and trash purge schedule fixes #46778

idryomov · 2022-06-21T16:49:18Z

backport trackers:

https://tracker.ceph.com/issues/56143
https://tracker.ceph.com/issues/56144

parent trackers:

https://tracker.ceph.com/issues/56090
https://tracker.ceph.com/issues/53914

…e logs

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit bd4af82)

Conflicts:
    src/pybind/mgr/rbd_support/mirror_snapshot_schedule.py [ commit e4a16e2 ("mgr/rbd_support: add type annotation") not in pacific ]
    src/pybind/mgr/rbd_support/trash_purge_schedule.py [ ditto ]

…eduling

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 568345b)

Conflicts:
    src/pybind/mgr/rbd_support/mirror_snapshot_schedule.py [ commit e4a16e2 ("mgr/rbd_support: add type annotation") not in pacific ]
    src/pybind/mgr/rbd_support/trash_purge_schedule.py [ ditto ]

Make refresh_pools() behave the same as refresh_images().

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 7d1e644)

Conflicts:
    src/pybind/mgr/rbd_support/trash_purge_schedule.py [ commit e4a16e2 ("mgr/rbd_support: add type annotation") not in pacific ]

The existing logic often leads to refresh_pools() and refresh_images() being invoked after a 120 second delay instead of after an intended 60 second delay.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit ef3edd3)

Conflicts:
    src/pybind/mgr/rbd_support/mirror_snapshot_schedule.py [ commit e4a16e2 ("mgr/rbd_support: add type annotation") not in pacific ]
    src/pybind/mgr/rbd_support/trash_purge_schedule.py [ ditto ]
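The commit message does not spell out how the 120 second delay arises. As a purely hypothetical illustration (not the rbd_support code), one way a refresh interval can double is when a "skip if less than 60 seconds have elapsed" guard is evaluated only at wakeups that do not line up with the interval. The function name, the wakeup times and the guard below are all assumptions made for this sketch.

```python
def refresh_times(wakeups, min_interval=60):
    """Return the wakeup times (in seconds) at which a refresh would run.

    Hypothetical model: the loop wakes up at the given times and skips
    the refresh whenever fewer than min_interval seconds have passed
    since the previous refresh.
    """
    ran = []
    last = None
    for now in wakeups:
        if last is not None and now - last < min_interval:
            continue  # guard fires: this wakeup does not refresh
        ran.append(now)
        last = now
    return ran


# A wakeup at 59 seconds is skipped, so the next refresh only happens
# at the following wakeup, 119 seconds after the previous refresh.
print(refresh_times([0, 59, 119, 179]))
```

In this model a wakeup that arrives one second early pushes the refresh out by a full extra wait, which is the kind of behaviour the commit describes; the actual cause in the module may differ.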

If load_schedules() (i.e. periodic refresh) races with add_schedule() invoked by the user for a fresh image, that image's schedule may get lost until the next rebuild (not refresh!) of the queue:

1. periodic refresh invokes load_schedules()
2. load_schedules() creates a new Schedules instance and loads schedules from rbd_mirror_snapshot_schedule object
3. add_schedule() is invoked for a new image (an image that isn't present in self.images) by the user
4. before load_schedules() can grab self.lock, add_schedule() commits the new schedule to rbd_mirror_snapshot_schedule object and adds it to self.schedules
5. load_schedules() grabs self.lock and reassigns self.schedules with Schedules instance that is now stale
6. periodic refresh invokes load_pool_images() which discovers the new image; eventually it is added to self.images
7. periodic refresh invokes refresh_queue() which attempts to enqueue() the new image; this fails because a matching schedule isn't present

The next periodic refresh recovers the discarded schedule from rbd_mirror_snapshot_schedule object but no attempt to enqueue() that image is made since it is already "known" at that point. Despite the schedule being in place, no snapshots are created until the queue is rebuilt from scratch or rbd_support module is reloaded.

To fix that, extend self.lock critical sections so that add_schedule() and remove_schedule() can't get stepped on by load_schedules().

Fixes: https://tracker.ceph.com/issues/56090
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 95a0ec7)

Conflicts:
    src/pybind/mgr/rbd_support/mirror_snapshot_schedule.py [ commit e4a16e2 ("mgr/rbd_support: add type annotation") not in pacific ]
    src/pybind/mgr/rbd_support/trash_purge_schedule.py [ ditto ]
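The lost update in steps 2 through 5 and the effect of the wider critical section can be replayed in a toy model. This is a minimal sketch, not the module's code: the ScheduleHandler class, the plain dict standing in for the rbd_mirror_snapshot_schedule object, and the helper functions are assumptions made for illustration. Only the names load_schedules(), add_schedule(), self.lock and self.schedules come from the commit message.

```python
import threading


class ScheduleHandler:
    """Toy stand-in for a schedule handler (hypothetical)."""

    def __init__(self):
        self.lock = threading.Lock()
        self.store = {}      # stands in for the rbd_mirror_snapshot_schedule object
        self.schedules = {}  # in-memory view of the schedules

    def add_schedule(self, image, interval):
        # Commit to the store and update the in-memory view under the lock.
        with self.lock:
            self.store[image] = interval
            self.schedules[image] = interval

    def load_schedules(self):
        # Fixed variant: reading the store and reassigning self.schedules
        # both happen inside the critical section.
        with self.lock:
            self.schedules = dict(self.store)


def replay_lost_update():
    """Replay steps 2-5 by hand, with the read done outside the lock."""
    h = ScheduleHandler()
    stale = dict(h.store)        # step 2: snapshot taken before the add
    h.add_schedule("img", "1h")  # steps 3-4: user adds a schedule
    with h.lock:
        h.schedules = stale      # step 5: the stale snapshot wins
    return "img" in h.schedules


def run_fixed():
    """Race the fixed load_schedules() against add_schedule()."""
    h = ScheduleHandler()
    loader = threading.Thread(target=h.load_schedules)
    adder = threading.Thread(target=h.add_schedule, args=("img", "1h"))
    loader.start()
    adder.start()
    loader.join()
    adder.join()
    return "img" in h.schedules
```

With the read inside the critical section, either ordering of the two threads leaves the new schedule in self.schedules: if the load runs first, the add updates the fresh view; if the add runs first, the load reads a store that already contains it.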

Establishing a watch on rbd_mirroring object and skipping rescanning image mirror snapshots on periodic refresh unless rbd_mirroring object gets notified in the interim is flawed. rbd_mirroring object is notified when mirroring is enabled or disabled on some image (including when the image is removed), but it is not notified when images are promoted or demoted. However, load_pool_images() discards images that are not primary at the time of the scan. If the image is promoted later, no snapshots are created even if the schedule is in place. This happens regardless of whether the schedule is added before or after the promotion.

This effectively reverts commit 69259c8 ("mgr/rbd_support: make mirror_snapshot_schedule rescan only updated pools").

An alternative fix could be to stop discarding non-primary images (i.e. drop if not info['primary']: continue check added in commit d39eb28 ("mgr/rbd_support: mirror snapshot schedule should skip non-primary images")), but that would clutter the queue and therefore "rbd mirror snapshot schedule status" output with bogus entries.

Performing a rescan roughly every 60 seconds should be manageable: currently it amounts to a single mirror_image_status_list request, followed by mirror_image_get, get_snapcontext and snapshot_get requests for each snapshot-based mirroring enabled image and concluded by a single dir_list request. Among these, per-image get_snapcontext and snapshot_get requests are necessary for determining primaryness.

Fixes: https://tracker.ceph.com/issues/53914
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 7fb4fdb)

Conflicts:
    src/pybind/mgr/rbd_support/mirror_snapshot_schedule.py [ commit e4a16e2 ("mgr/rbd_support: add type annotation") not in pacific ]
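The difference between a notification-gated rescan and an unconditional one can be shown with a small model. This is a sketch under stated assumptions: scan_primary_images(), demo_promotion(), the pool dict and the notified flag are hypothetical names, and only the info['primary'] check mirrors the one quoted in the commit message.

```python
def scan_primary_images(mirror_info):
    """Return the names of images that are primary at scan time."""
    images = set()
    for name, info in mirror_info.items():
        if not info['primary']:
            continue  # non-primary images are discarded by the scan
        images.add(name)
    return images


def demo_promotion():
    # Hypothetical pool state: img2 starts out non-primary.
    pool = {"img1": {"primary": True}, "img2": {"primary": False}}
    known = scan_primary_images(pool)

    # Promotion changes the image but, per the commit message, does not
    # notify the rbd_mirroring object.
    pool["img2"]["primary"] = True
    notified = False

    # Gated refresh: reuse the earlier result unless a notification arrived.
    gated = scan_primary_images(pool) if notified else known
    # Unconditional refresh: rescan on every periodic refresh.
    always = scan_primary_images(pool)
    return gated, always
```

In this model the gated refresh never learns about img2 after its promotion, while the unconditional rescan picks it up on the next pass, at the cost of repeating the per-image requests every refresh.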

idryomov added 6 commits June 21, 2022 18:35

idryomov added bug-fix rbd pybind pacific-batch-1 labels Jun 21, 2022

idryomov added this to the pacific milestone Jun 21, 2022

idryomov requested review from trociny and ideepika June 21, 2022 16:49

idryomov requested a review from a team as a code owner June 21, 2022 16:49

github-actions bot added the tests label Jun 21, 2022

ideepika approved these changes Jun 21, 2022

idryomov added the needs-qa label Jun 21, 2022

sunnyku approved these changes Jun 21, 2022

trociny approved these changes Jun 22, 2022

yuriw added the wip-yuri8-testing label Jul 1, 2022

yuriw merged commit 853c888 into ceph:pacific Jul 5, 2022

idryomov deleted the wip-rbd-schedule-backports-pacific branch July 5, 2022 16:01
