pacific: mirror snapshot schedule and trash purge schedule fixes #46778

Merged 6 commits on Jul 5, 2022

Commits on Jun 21, 2022

  1. mgr/rbd_support: disambiguate mirror snapshot and trash purge schedule logs
    
    Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    (cherry picked from commit bd4af82)
    
    Conflicts:
    	src/pybind/mgr/rbd_support/mirror_snapshot_schedule.py [ commit
    	  e4a16e2 ("mgr/rbd_support: add type annotation") not in
    	  pacific ]
    	src/pybind/mgr/rbd_support/trash_purge_schedule.py [ ditto ]
    idryomov committed Jun 21, 2022
    034574a
  2. mgr/rbd_support: add logs for when there is no schedule and for descheduling
    
    Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    (cherry picked from commit 568345b)
    
    Conflicts:
    	src/pybind/mgr/rbd_support/mirror_snapshot_schedule.py [ commit
    	  e4a16e2 ("mgr/rbd_support: add type annotation") not in
    	  pacific ]
    	src/pybind/mgr/rbd_support/trash_purge_schedule.py [ ditto ]
    idryomov committed Jun 21, 2022
    8cbe5cc
  3. mgr/rbd_support: bail from refresh_pools() when there is no schedule

    Make refresh_pools() behave the same as refresh_images().
    
    Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    (cherry picked from commit 7d1e644)
    
    Conflicts:
    	src/pybind/mgr/rbd_support/trash_purge_schedule.py [ commit
    	  e4a16e2 ("mgr/rbd_support: add type annotation") not in
    	  pacific ]
    idryomov committed Jun 21, 2022
    d0e63f4
  4. mgr/rbd_support: refresh schedule queue immediately after delay elapses

    The existing logic often leads to refresh_pools() and refresh_images()
    being invoked after a 120 second delay instead of after an intended 60
    second delay.
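
One way such a doubling can arise is sketched below. This is an illustration under assumptions, not the actual rbd_support code: a worker that always waits a full interval and then re-checks the elapsed time will overshoot whenever it checks slightly before the interval is up, whereas waiting only for the remaining time does not. The function names and the simulated clock are hypothetical.

```python
REFRESH_INTERVAL = 60.0  # seconds; intended delay between refreshes


def next_refresh_fixed_wait(last_refresh, now):
    """Hypothetical 'old' pattern: always wait a full interval, then re-check."""
    while now - last_refresh < REFRESH_INTERVAL:
        now += REFRESH_INTERVAL  # simulated wait of a fixed 60 s
    return now


def next_refresh_remaining_wait(last_refresh, now):
    """Hypothetical 'new' pattern: wait only for the time still remaining."""
    remaining = REFRESH_INTERVAL - (now - last_refresh)
    if remaining > 0:
        now += remaining  # simulated wait of exactly the remaining delay
    return now


# The previous refresh finished at t=0 and the loop next checks at t=59.5,
# for example after 59.5 s were spent waiting on a queue entry.
old = next_refresh_fixed_wait(last_refresh=0.0, now=59.5)
new = next_refresh_remaining_wait(last_refresh=0.0, now=59.5)
print(old)  # 119.5: close to the 120 second delay described above
print(new)  # 60.0: the intended delay
```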
    
    Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    (cherry picked from commit ef3edd3)
    
    Conflicts:
    	src/pybind/mgr/rbd_support/mirror_snapshot_schedule.py [ commit
    	  e4a16e2 ("mgr/rbd_support: add type annotation") not in
    	  pacific ]
    	src/pybind/mgr/rbd_support/trash_purge_schedule.py [ ditto ]
    idryomov committed Jun 21, 2022
    196a3cd
  5. mgr/rbd_support: avoid losing a schedule on load vs add race

    If load_schedules() (i.e. periodic refresh) races with add_schedule()
    invoked by the user for a fresh image, that image's schedule may get
    lost until the next rebuild (not refresh!) of the queue:
    
    1. periodic refresh invokes load_schedules()
    2. load_schedules() creates a new Schedules instance and loads
       schedules from rbd_mirror_snapshot_schedule object
    3. add_schedule() is invoked for a new image (an image that isn't
       present in self.images) by the user
    4. before load_schedules() can grab self.lock, add_schedule() commits
       the new schedule to rbd_mirror_snapshot_schedule object and adds it
       to self.schedules
    5. load_schedules() grabs self.lock and reassigns self.schedules with
       Schedules instance that is now stale
    6. periodic refresh invokes load_pool_images() which discovers the new
       image; eventually it is added to self.images
    7. periodic refresh invokes refresh_queue() which attempts to enqueue()
       the new image; this fails because a matching schedule isn't present
    
    The next periodic refresh recovers the discarded schedule from
    rbd_mirror_snapshot_schedule object but no attempt to enqueue() that
    image is made since it is already "known" at that point.  Despite the
    schedule being in place, no snapshots are created until the queue is
    rebuilt from scratch or rbd_support module is reloaded.
    
    To fix that, extend self.lock critical sections so that add_schedule()
    and remove_schedule() can't get stepped on by load_schedules().
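
The race and the shape of the fix can be illustrated with a minimal, self-contained sketch. The `Handler` class, its `store` dict (standing in for the rbd_mirror_snapshot_schedule object) and the `before_lock` hook are simplified stand-ins introduced for illustration; they are not the actual rbd_support implementation.

```python
import threading


class Handler:
    """Toy model of a schedule handler; `store` stands in for the
    rbd_mirror_snapshot_schedule object (an assumption for illustration)."""

    def __init__(self):
        self.lock = threading.Lock()
        self.store = {}      # persistent copy of the schedules
        self.schedules = {}  # in-memory copy consulted when enqueueing

    def load_schedules_racy(self, before_lock=None):
        # Racy pattern: read the store outside the lock ...
        loaded = dict(self.store)
        if before_lock:
            before_lock()  # window in which add_schedule() can run
        with self.lock:
            # ... then publish a snapshot that may already be stale.
            self.schedules = loaded

    def load_schedules_fixed(self):
        # Fixed pattern: read and publish inside one critical section.
        with self.lock:
            self.schedules = dict(self.store)

    def add_schedule(self, image, interval):
        # Commit to the store and update memory under the same lock.
        with self.lock:
            self.store[image] = interval
            self.schedules[image] = interval


# Steps 2-5 above: add_schedule() runs between the load and the reassignment.
racy = Handler()
racy.load_schedules_racy(before_lock=lambda: racy.add_schedule("pool/img", "1h"))
lost = "pool/img" not in racy.schedules  # the in-memory schedule was discarded

# With the extended critical section the two calls can only run back to back.
fixed = Handler()
fixed.add_schedule("pool/img", "1h")
fixed.load_schedules_fixed()
kept = "pool/img" in fixed.schedules
```

In the racy variant the schedule survives in `store` but not in `schedules`, which matches the symptom described above: the schedule is in place, yet enqueueing fails because no matching in-memory schedule is found.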
    
    Fixes: https://tracker.ceph.com/issues/56090
    Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    (cherry picked from commit 95a0ec7)
    
    Conflicts:
    	src/pybind/mgr/rbd_support/mirror_snapshot_schedule.py [ commit
    	  e4a16e2 ("mgr/rbd_support: add type annotation") not in
    	  pacific ]
    	src/pybind/mgr/rbd_support/trash_purge_schedule.py [ ditto ]
    idryomov committed Jun 21, 2022
    3276330
  6. mgr/rbd_support: always rescan image mirror snapshots on refresh

    Establishing a watch on rbd_mirroring object and skipping rescanning
    image mirror snapshots on periodic refresh unless rbd_mirroring object
    gets notified in the interim is flawed.  rbd_mirroring object is
    notified when mirroring is enabled or disabled on some image (including
    when the image is removed), but it is not notified when images are
    promoted or demoted.  However, load_pool_images() discards images that
    are not primary at the time of the scan.  If the image is promoted
    later, no snapshots are created even if the schedule is in place.  This
    happens regardless of whether the schedule is added before or after the
    promotion.
    
    This effectively reverts commit 69259c8 ("mgr/rbd_support: make
    mirror_snapshot_schedule rescan only updated pools").  An alternative
    fix could be to stop discarding non-primary images (i.e. drop
    
        if not info['primary']:
            continue
    
    check added in commit d39eb28 ("mgr/rbd_support: mirror snapshot
    schedule should skip non-primary images")), but that would clutter the
    queue and therefore "rbd mirror snapshot schedule status" output with
    bogus entries.  Performing a rescan roughly every 60 seconds should be
    manageable: currently it amounts to a single mirror_image_status_list
    request, followed by mirror_image_get, get_snapcontext and snapshot_get
    requests for each snapshot-based mirroring enabled image and concluded
    by a single dir_list request.  Among these, per-image get_snapcontext
    and snapshot_get requests are necessary for determining primaryness.
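
The difference between a notification-gated scan and an unconditional rescan can be sketched with a toy model. Here a plain dict maps image names to a primary flag; `scan_images` and `Scanner` are hypothetical names standing in for load_pool_images() and its caller, and no librbd calls are made.

```python
def scan_images(pool):
    """Return the primary images of `pool`, applying the same
    `if not info['primary']: continue` check quoted above."""
    images = set()
    for name, info in pool.items():
        if not info['primary']:
            continue  # non-primary images are not scheduled
        images.add(name)
    return images


class Scanner:
    def __init__(self, pool, always_rescan):
        self.pool = pool
        self.always_rescan = always_rescan
        self.notified = True  # force the first scan
        self.images = set()

    def refresh(self):
        # Without always_rescan, a scan happens only after a notification,
        # and promotion or demotion does not generate one.
        if self.always_rescan or self.notified:
            self.images = scan_images(self.pool)
            self.notified = False
        return self.images


pool = {'img1': {'primary': False}}
cached = Scanner(pool, always_rescan=False)
rescanning = Scanner(pool, always_rescan=True)
cached.refresh()
rescanning.refresh()

pool['img1']['primary'] = True       # promotion: no notification is sent
after_cached = cached.refresh()      # still empty: the promotion is missed
after_rescan = rescanning.refresh()  # contains 'img1': the promotion is seen
```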
    
    Fixes: https://tracker.ceph.com/issues/53914
    Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    (cherry picked from commit 7fb4fdb)
    
    Conflicts:
    	src/pybind/mgr/rbd_support/mirror_snapshot_schedule.py [ commit
    	  e4a16e2 ("mgr/rbd_support: add type annotation") not in
    	  pacific ]
    idryomov committed Jun 21, 2022
    9a70196