
pacific: librbd: don't wait for a watch in send_acquire_lock() if client is blocklisted #50926

Merged: 5 commits, May 9, 2023

Conversation

chrisphoffman
Contributor

backport tracker: https://tracker.ceph.com/issues/59370


backport of #50630
parent tracker: https://tracker.ceph.com/issues/59115

this backport was staged using ceph-backport.sh version 16.0.0.6848
find the latest version at https://github.com/ceph/ceph/blob/main/src/script/ceph-backport.sh

chrisphoffman and others added 5 commits April 6, 2023 15:44
During send_acquire_lock(), there is a case where no watch handle is
present and the lock request is delayed. If the client is blocklisted,
the delayed request will never continue and the call that requested the
lock will never complete.

The lock process now propagates -EBLOCKLISTED to the callback instead
of delaying indefinitely.

Fixes: https://tracker.ceph.com/issues/59115
Signed-off-by: Christopher Hoffman <choffman@redhat.com>
(cherry picked from commit 6a0aead)
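The fix described above can be sketched as follows. This is a hedged, self-contained simulation, not the actual librbd code; `AcquireLockSim`, `watch_registered`, and `retry_scheduled` are illustrative names, and the `EBLOCKLISTED` alias mirrors how the Ceph tree maps it to `ESHUTDOWN`.

```cpp
// Sketch (assumed names, not the real librbd types): when no watch
// handle is registered, check for blocklisting before delaying the
// acquire-lock request, so the caller's callback always completes.
#include <cassert>
#include <cerrno>
#include <functional>

#define EBLOCKLISTED ESHUTDOWN  // Ceph aliases EBLOCKLISTED to ESHUTDOWN

struct AcquireLockSim {
  bool watch_registered = false;
  bool blocklisted = false;
  bool retry_scheduled = false;

  void send_acquire_lock(std::function<void(int)> on_finish) {
    if (!watch_registered) {
      if (blocklisted) {
        // Previously the request was re-queued forever; now fail fast
        // so the caller sees -EBLOCKLISTED instead of hanging.
        on_finish(-EBLOCKLISTED);
        return;
      }
      // Not blocklisted: wait for the watch to be (re)established.
      retry_scheduled = true;
      return;
    }
    on_finish(0);
  }
};
```

With blocklisting detected, the callback fires immediately with an error; otherwise the request is still parked until the watch comes back, matching the pre-fix behavior for healthy clients.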
Simulate getting MWatchNotify CEPH_WATCH_EVENT_DISCONNECT message after
the client is blocklisted.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 0dfe87d)
Eliminate a race where a client is able to submit an operation after
WatchCtx2::handle_error() is invoked on its watch due to blocklisting.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit a14498e)
needs_exclusive_lock() calls acquire_lock() with owner_lock held.
If lock acquisition races with lock shut down, ManagedLock completes
ImageDispatch context directly and dispatch is retried immediately on
the same thread (due to DISPATCH_RESULT_RESTART).  This results in
recursion into needs_exclusive_lock() and, barring locking issues, can
lead to unbounded stack growth if lock shut down takes its time.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 1d943f8)
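The recursion hazard described in this commit can be illustrated with a small simulation. This is an assumed sketch (the `Dispatcher` type and method names are invented, not Ceph APIs): restarting a dispatch inline on the same thread grows the stack with every retry, while deferring the retry through a work queue keeps the depth constant.

```cpp
// Sketch of the stack-growth hazard: inline restart (analogous to
// DISPATCH_RESULT_RESTART completing on the same thread) vs. a queued
// restart that returns to a dispatch loop between retries.
#include <cassert>
#include <functional>
#include <queue>
#include <utility>

struct Dispatcher {
  int attempts_left;   // how many times the dispatch must be retried
  int max_depth = 0;   // deepest observed call-stack nesting
  int depth = 0;
  std::queue<std::function<void()>> work_queue;

  // Inline restart: each retry re-enters dispatch on the same stack.
  void dispatch_inline() {
    ++depth;
    if (depth > max_depth) max_depth = depth;
    if (attempts_left-- > 0) {
      dispatch_inline();  // direct recursion -> unbounded stack growth
    }
    --depth;
  }

  // Queued restart: each retry is deferred, so frames unwind first.
  void dispatch_queued() {
    work_queue.push([this] {
      ++depth;
      if (depth > max_depth) max_depth = depth;
      if (attempts_left-- > 0) {
        dispatch_queued();  // only enqueues; does not nest
      }
      --depth;
    });
  }

  void drain() {
    while (!work_queue.empty()) {
      auto fn = std::move(work_queue.front());
      work_queue.pop();
      fn();
    }
  }
};
```

With five forced retries, the inline variant nests six frames deep while the queued variant never exceeds one, which is the property the fix relies on.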
EBLOCKLISTED has a very special meaning but happens to be an alias for
ESHUTDOWN.  If the client gets blocklisted, we always want to propagate
the EBLOCKLISTED error code since it's generated by the OSD.

For the ManagedLock use case of indicating that an operation on the
lock raced with lock shut down, meaning that a higher-level request can
just be restarted, ERESTART will do.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 76856d9)
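The error-code distinction drawn in this commit can be sketched like so. The `EBLOCKLISTED`/`ESHUTDOWN` alias is stated in the commit message itself; the two classifier functions are hypothetical helpers, not Ceph APIs, included only to show the intended handling of each code.

```cpp
// Sketch of the error-code split: EBLOCKLISTED (an alias for
// ESHUTDOWN, OSD-generated) must always reach the caller, while
// ERESTART only means "raced with lock shut down, retry the request".
#include <cassert>
#include <cerrno>

#define EBLOCKLISTED ESHUTDOWN  // how the Ceph tree aliases it

// Hypothetical classifiers (illustrative names, not Ceph functions):
inline bool is_fatal_blocklist(int r) {
  return r == -EBLOCKLISTED;  // propagate: the client was blocklisted
}

inline bool should_restart(int r) {
  return r == -ERESTART;      // benign: higher-level request restarts
}
```

Keeping the two codes distinct means a ManagedLock shutdown race no longer masquerades as a blocklist event at the call sites that check for -EBLOCKLISTED.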
@chrisphoffman chrisphoffman requested a review from a team as a code owner April 6, 2023 15:45
@chrisphoffman chrisphoffman added this to the pacific milestone Apr 6, 2023
@github-actions github-actions bot added the tests label Apr 6, 2023
@ljflores
Contributor

ljflores commented May 4, 2023

Rados suite review: http://pulpito.front.sepia.ceph.com/?branch=wip-yuri5-testing-2023-04-25-0837-pacific

Failures, unrelated:
1. https://tracker.ceph.com/issues/59192
2. https://tracker.ceph.com/issues/55347
3. https://tracker.ceph.com/issues/54071
4. https://tracker.ceph.com/issues/57386
5. https://tracker.ceph.com/issues/59529
6. https://tracker.ceph.com/issues/59530
7. https://tracker.ceph.com/issues/57255

Details:
1. cls/test_cls_sdk.sh: Health check failed: 1 pool(s) do not have an application enabled (POOL_APP_NOT_ENABLED) - Ceph - RADOS
2. SELinux Denials during cephadm/workunits/test_cephadm - Ceph - Orchestrator
3. rados/cephadm/osds: Invalid command: missing required parameter hostname() - Ceph - Orchestrator
4. cephadm/test_dashboard_e2e.sh: Expected to find content: '/^foo$/' within the selector: 'cd-modal .badge' but never did - Ceph - Mgr - Dashboard
5. mds_upgrade_sequence: overall HEALTH_ERR 1 filesystem with deprecated feature inline_data; 1 filesystem is offline; 1 filesystem is online with fewer MDS than max_mds - Ceph - CephFS
6. mgr-nfs-upgrade: mds.foofs has 0/2 - Ceph - CephFS
7. rados/cephadm/mds_upgrade_sequence, pacific : cephadm [ERR] Upgrade: Paused due to UPGRADE_NO_STANDBY_MGR: Upgrade: Need standby mgr daemon - Ceph - Orchestrator

@yuriw yuriw merged commit b815eda into ceph:pacific May 9, 2023
8 checks passed