pacific: librbd: don't wait for a watch in send_acquire_lock() if client is blocklisted #50926
During send_acquire_lock(), there is a case where no watch handle is present and the lock request is delayed. If the client is blocklisted, the delayed request never resumes and the call that requested the lock never completes. The lock process now propagates -EBLOCKLISTED to the callback instead of delaying indefinitely. Fixes: https://tracker.ceph.com/issues/59115 Signed-off-by: Christopher Hoffman <choffman@redhat.com> (cherry picked from commit 6a0aead)
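The behavior described in this commit can be illustrated with a minimal sketch. It is a simplified model, not librbd's actual code: `LockAcquirer`, its fields, `handle_watch_registered()` and `kEBlocklisted` are hypothetical names, and `kEBlocklisted` stands in for Ceph's EBLOCKLISTED, which is an alias for ESHUTDOWN.

```cpp
#include <cerrno>
#include <functional>
#include <utility>
#include <vector>

// Stand-in for Ceph's EBLOCKLISTED (an alias for ESHUTDOWN).
constexpr int kEBlocklisted = ESHUTDOWN;

// Hypothetical, simplified model of the acquire path; not librbd's classes.
struct LockAcquirer {
  bool watch_registered = false;
  bool blocklisted = false;
  // Requests parked until a watch handle becomes available.
  std::vector<std::function<void(int)>> delayed;

  void send_acquire_lock(std::function<void(int)> on_finish) {
    if (!watch_registered) {
      if (blocklisted) {
        // A blocklisted client will not get its watch back, so waiting
        // would hang forever: fail the request immediately instead.
        on_finish(-kEBlocklisted);
        return;
      }
      // Not blocklisted: delay until the watch is (re-)registered.
      delayed.push_back(std::move(on_finish));
      return;
    }
    on_finish(0);  // watch present: pretend the lock was acquired
  }

  // Called when the watch is (re-)established; retries parked requests.
  void handle_watch_registered() {
    watch_registered = true;
    auto pending = std::move(delayed);
    delayed.clear();
    for (auto& cb : pending) {
      send_acquire_lock(std::move(cb));
    }
  }
};
```

In this model a blocklisted caller gets its callback invoked with a negative error code right away, while a non-blocklisted caller is still parked and completed once the watch comes back.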
Simulate getting MWatchNotify CEPH_WATCH_EVENT_DISCONNECT message after the client is blocklisted. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> (cherry picked from commit 0dfe87d)
Eliminate a race where a client is able to submit an operation after WatchCtx2::handle_error() is invoked on its watch due to blocklisting. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> (cherry picked from commit a14498e)
needs_exclusive_lock() calls acquire_lock() with owner_lock held. If lock acquisition races with lock shut down, ManagedLock completes the ImageDispatch context directly and dispatch is retried immediately on the same thread (due to DISPATCH_RESULT_RESTART). This results in recursion into needs_exclusive_lock() and, barring locking issues, can lead to unbounded stack growth if lock shut down takes its time. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> (cherry picked from commit 1d943f8)
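The stack growth described here can be reproduced in a toy model. The sketch below is not librbd code: `RestartProbe` and all of its members are hypothetical, and deferring the retry to a queue is shown only as one common way to bound recursion, not necessarily what this commit does.

```cpp
#include <algorithm>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical model of a dispatcher whose requests must be restarted
// while the lock is shutting down.
struct RestartProbe {
  bool defer;               // true: queue the retry; false: retry inline
  int shutdown_polls_left;  // how many attempts still race with shut down
  int depth = 0;            // current nesting of dispatch() calls
  int max_depth = 0;        // deepest nesting observed
  std::deque<std::function<void()>> queue;

  RestartProbe(bool defer_, int polls)
      : defer(defer_), shutdown_polls_left(polls) {}

  void dispatch() {
    ++depth;
    max_depth = std::max(max_depth, depth);
    if (shutdown_polls_left > 0) {
      --shutdown_polls_left;
      // The acquire raced with shut down: the request must be restarted.
      auto restart = [this] { dispatch(); };
      if (defer) {
        queue.push_back(restart);  // retried later from a fresh stack frame
      } else {
        restart();                 // retried inline: recursion grows
      }
    }
    --depth;
  }

  // Runs queued retries one at a time, each starting at depth 0.
  void drain() {
    while (!queue.empty()) {
      auto f = std::move(queue.front());
      queue.pop_front();
      f();
    }
  }
};
```

With inline retries the nesting depth grows by one for every attempt that races with shut down; with deferred retries it stays at one regardless of how long shut down takes.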
EBLOCKLISTED has a very special meaning but happens to be an alias for ESHUTDOWN. If the client gets blocklisted, we always want to propagate the EBLOCKLISTED error code since it's generated by the OSD. For the ManagedLock use case of indicating that an operation on the lock raced with lock shut down, meaning that a higher level request can just be restarted, ERESTART should do. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> (cherry picked from commit 76856d9)
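Why the aliasing matters can be shown with a small sketch. Because EBLOCKLISTED and ESHUTDOWN are the same number, a caller cannot tell "blocklisted" from "raced with shut down" if both use that value; a distinct code such as ERESTART removes the ambiguity. The names `kEBlocklisted`, `Action` and `classify()` are hypothetical, and the fallback value for ERESTART is an assumption for platforms that do not define it.

```cpp
#include <cerrno>

#ifndef ERESTART
#define ERESTART 85  // assumed fallback (Linux value) where it is undefined
#endif

// Stand-in for Ceph's EBLOCKLISTED (an alias for ESHUTDOWN).
constexpr int kEBlocklisted = ESHUTDOWN;

enum class Action { Done, RestartRequest, FailBlocklisted, FailOther };

// Hypothetical caller-side handling of a lock operation result.
Action classify(int r) {
  if (r == 0) {
    return Action::Done;
  }
  if (r == -ERESTART) {
    // The operation raced with lock shut down: safe to restart the request.
    return Action::RestartRequest;
  }
  if (r == -kEBlocklisted) {
    // Generated by the OSD: must be propagated, never silently retried.
    return Action::FailBlocklisted;
  }
  return Action::FailOther;
}
```

With distinct codes, `classify(-ESHUTDOWN)` always means the client was blocklisted, and only `-ERESTART` triggers a retry.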
Rados suite review: http://pulpito.front.sepia.ceph.com/?branch=wip-yuri5-testing-2023-04-25-0837-pacific Failures: unrelated.
backport tracker: https://tracker.ceph.com/issues/59370
backport of #50630
parent tracker: https://tracker.ceph.com/issues/59115
this backport was staged using ceph-backport.sh version 16.0.0.6848
find the latest version at https://github.com/ceph/ceph/blob/main/src/script/ceph-backport.sh