
pacific: librbd: don't wait for a watch in send_acquire_lock() if client is blocklisted #50926

Merged: 5 commits, May 9, 2023

Conversation

chrisphoffman
Contributor

backport tracker: https://tracker.ceph.com/issues/59370


backport of #50630
parent tracker: https://tracker.ceph.com/issues/59115

this backport was staged using ceph-backport.sh version 16.0.0.6848
find the latest version at https://github.com/ceph/ceph/blob/main/src/script/ceph-backport.sh

chrisphoffman and others added 5 commits April 6, 2023 15:44
During send_acquire_lock(), there is a case where no watch handle is
present and the lock request is delayed. If the client is blocklisted,
the delayed request will never continue and the call that requested the
lock will never complete.

The lock process now propagates -EBLOCKLISTED to the callback instead
of delaying indefinitely.

Fixes: https://tracker.ceph.com/issues/59115
Signed-off-by: Christopher Hoffman <choffman@redhat.com>
(cherry picked from commit 6a0aead)
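The fix described above can be sketched as follows. This is a hedged, self-contained simulation, not the actual librbd code; `AcquireLockSim`, `watch_registered`, and `retry_scheduled` are illustrative names, and the `EBLOCKLISTED` alias mirrors how the Ceph tree maps it to `ESHUTDOWN`.

```cpp
// Sketch (assumed names, not the real librbd types): when no watch
// handle is registered, check for blocklisting before delaying the
// acquire-lock request, so the caller's callback always completes.
#include <cassert>
#include <cerrno>
#include <functional>

#define EBLOCKLISTED ESHUTDOWN  // Ceph aliases EBLOCKLISTED to ESHUTDOWN

struct AcquireLockSim {
  bool watch_registered = false;
  bool blocklisted = false;
  bool retry_scheduled = false;

  void send_acquire_lock(std::function<void(int)> on_finish) {
    if (!watch_registered) {
      if (blocklisted) {
        // Previously the request was re-queued forever; now fail fast
        // so the caller sees -EBLOCKLISTED instead of hanging.
        on_finish(-EBLOCKLISTED);
        return;
      }
      // Not blocklisted: wait for the watch to be (re)established.
      retry_scheduled = true;
      return;
    }
    on_finish(0);
  }
};
```

With blocklisting detected, the callback fires immediately with an error; otherwise the request is still parked until the watch comes back, matching the pre-fix behavior for healthy clients.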
Simulate getting MWatchNotify CEPH_WATCH_EVENT_DISCONNECT message after
the client is blocklisted.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 0dfe87d)
Eliminate a race where a client is able to submit an operation after
WatchCtx2::handle_error() is invoked on its watch due to blocklisting.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit a14498e)
needs_exclusive_lock() calls acquire_lock() with owner_lock held.
If lock acquisition races with lock shut down, ManagedLock completes
ImageDispatch context directly and dispatch is retried immediately on
the same thread (due to DISPATCH_RESULT_RESTART).  This results in
recursion into needs_exclusive_lock() and, barring locking issues, can
lead to unbounded stack growth if lock shut down takes its time.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 1d943f8)
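The recursion hazard described in this commit can be illustrated with a small simulation. This is an assumed sketch (the `Dispatcher` type and method names are invented, not Ceph APIs): restarting a dispatch inline on the same thread grows the stack with every retry, while deferring the retry through a work queue keeps the depth constant.

```cpp
// Sketch of the stack-growth hazard: inline restart (analogous to
// DISPATCH_RESULT_RESTART completing on the same thread) vs. a queued
// restart that returns to a dispatch loop between retries.
#include <cassert>
#include <functional>
#include <queue>
#include <utility>

struct Dispatcher {
  int attempts_left;   // how many times the dispatch must be retried
  int max_depth = 0;   // deepest observed call-stack nesting
  int depth = 0;
  std::queue<std::function<void()>> work_queue;

  // Inline restart: each retry re-enters dispatch on the same stack.
  void dispatch_inline() {
    ++depth;
    if (depth > max_depth) max_depth = depth;
    if (attempts_left-- > 0) {
      dispatch_inline();  // direct recursion -> unbounded stack growth
    }
    --depth;
  }

  // Queued restart: each retry is deferred, so frames unwind first.
  void dispatch_queued() {
    work_queue.push([this] {
      ++depth;
      if (depth > max_depth) max_depth = depth;
      if (attempts_left-- > 0) {
        dispatch_queued();  // only enqueues; does not nest
      }
      --depth;
    });
  }

  void drain() {
    while (!work_queue.empty()) {
      auto fn = std::move(work_queue.front());
      work_queue.pop();
      fn();
    }
  }
};
```

With five forced retries, the inline variant nests six frames deep while the queued variant never exceeds one, which is the property the fix relies on.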
EBLOCKLISTED has a very special meaning but happens to be an alias for
ESHUTDOWN.  If the client gets blocklisted, we always want to propagate
the EBLOCKLISTED error code since it's generated by the OSD.

For the ManagedLock use case of indicating that an operation on the
lock raced with lock shut down, meaning that a higher-level request can
just be restarted, ERESTART will do.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 76856d9)
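The error-code distinction drawn in this commit can be sketched like so. The `EBLOCKLISTED`/`ESHUTDOWN` alias is stated in the commit message itself; the two classifier functions are hypothetical helpers, not Ceph APIs, included only to show the intended handling of each code.

```cpp
// Sketch of the error-code split: EBLOCKLISTED (an alias for
// ESHUTDOWN, OSD-generated) must always reach the caller, while
// ERESTART only means "raced with lock shut down, retry the request".
#include <cassert>
#include <cerrno>

#define EBLOCKLISTED ESHUTDOWN  // how the Ceph tree aliases it

// Hypothetical classifiers (illustrative names, not Ceph functions):
inline bool is_fatal_blocklist(int r) {
  return r == -EBLOCKLISTED;  // propagate: the client was blocklisted
}

inline bool should_restart(int r) {
  return r == -ERESTART;      // benign: higher-level request restarts
}
```

Keeping the two codes distinct means a ManagedLock shutdown race no longer masquerades as a blocklist event at the call sites that check for -EBLOCKLISTED.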
@chrisphoffman chrisphoffman requested a review from a team as a code owner April 6, 2023 15:45
@chrisphoffman chrisphoffman added this to the pacific milestone Apr 6, 2023
@github-actions github-actions bot added the tests label Apr 6, 2023
@ljflores
Contributor

ljflores commented May 4, 2023

Rados suite review: http://pulpito.front.sepia.ceph.com/?branch=wip-yuri5-testing-2023-04-25-0837-pacific

Failures, unrelated:
1. https://tracker.ceph.com/issues/59192
2. https://tracker.ceph.com/issues/55347
3. https://tracker.ceph.com/issues/54071
4. https://tracker.ceph.com/issues/57386
5. https://tracker.ceph.com/issues/59529
6. https://tracker.ceph.com/issues/59530
7. https://tracker.ceph.com/issues/57255

Details:
1. cls/test_cls_sdk.sh: Health check failed: 1 pool(s) do not have an application enabled (POOL_APP_NOT_ENABLED) - Ceph - RADOS
2. SELinux Denials during cephadm/workunits/test_cephadm - Ceph - Orchestrator
3. rados/cephadm/osds: Invalid command: missing required parameter hostname() - Ceph - Orchestrator
4. cephadm/test_dashboard_e2e.sh: Expected to find content: '/^foo$/' within the selector: 'cd-modal .badge' but never did - Ceph - Mgr - Dashboard
5. mds_upgrade_sequence: overall HEALTH_ERR 1 filesystem with deprecated feature inline_data; 1 filesystem is offline; 1 filesystem is online with fewer MDS than max_mds - Ceph - CephFS
6. mgr-nfs-upgrade: mds.foofs has 0/2 - Ceph - CephFS
7. rados/cephadm/mds_upgrade_sequence, pacific : cephadm [ERR] Upgrade: Paused due to UPGRADE_NO_STANDBY_MGR: Upgrade: Need standby mgr daemon - Ceph - Orchestrator

@yuriw yuriw merged commit b815eda into ceph:pacific May 9, 2023
8 checks passed