
mgr/rbd_support: recover from rados client blocklisting #49742

Merged
merged 4 commits on May 10, 2023

Conversation

@ajarr ajarr commented Jan 14, 2023

In certain scenarios the OSDs were slow to process RBD requests.
This led to the rbd_support module's RBD client not being able to
gracefully hand over an RBD exclusive lock to another RBD client.
After the condition persisted for some time, the other RBD client
forcefully acquired the lock by blocklisting the rbd_support module's
RBD client, and consequently blocklisted the module's RADOS client. The
rbd_support module stopped working. To recover the module, the entire
mgr service had to be restarted, which reloaded the other mgr modules.

Instead of recovering the rbd_support module from client blocklisting
in a way that is disruptive to other mgr modules, recover the module
automatically without restarting the mgr service. When the client gets
blocklisted, shut down the module's handlers and the blocklisted client,
create a new RADOS client for the module, and start new handlers.

Fixes: https://tracker.ceph.com/issues/56724
Signed-off-by: Ramana Raja rraja@redhat.com
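
A rough way to exercise the recovery flow described above from the command line (a sketch only, not part of this PR; the jq filter over "ceph mgr dump" is an assumption, the PR's qa tests compute CLIENT_ADDR in a similar way):

# Find the address of the rbd_support module's RADOS client (assumes the
# client is listed under "active_clients" in the mgr dump).
CLIENT_ADDR=$(ceph mgr dump |
    jq -r '.active_clients[] | select(.name == "rbd_support") |
           "\(.addrvec[0].addr)/\(.addrvec[0].nonce)"')

# Blocklist it and confirm that the blocklist entry was created.
ceph osd blocklist add "$CLIENT_ADDR"
ceph osd blocklist ls | grep "$CLIENT_ADDR"

# Commands issued right after blocklisting may fail while the module detects
# the blocklisting, shuts down its handlers and blocklisted client, creates a
# new RADOS client, and restarts the handlers; poll until it responds again,
# without restarting the mgr.
for i in $(seq 24); do
    ceph rbd task list && break
    sleep 10
done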

Contribution Guidelines

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Available Jenkins commands:
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows

@ajarr ajarr requested a review from a team as a code owner January 14, 2023 00:42
@ajarr ajarr marked this pull request as draft January 14, 2023 00:42
@ajarr ajarr force-pushed the fix-56724 branch 3 times, most recently from a2531e2 to 0ba629b on January 17, 2023 03:06
@ajarr ajarr changed the title from "mgr/rbd_support: add own RADOS client for MirrorSnapshotScheduleHandler" to "mgr/rbd_support: add own RADOS client for handlers" on Jan 17, 2023
@github-actions

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@ajarr ajarr commented Apr 18, 2023

@ajarr ajarr commented Apr 18, 2023

@ajarr ajarr commented Apr 18, 2023

jenkins test make check

@ajarr ajarr commented Apr 19, 2023

jenkins test make check arm64

1 similar comment
@ajarr ajarr commented Apr 20, 2023

jenkins test make check arm64

Resolved review threads:
  • src/pybind/mgr/rbd_support/schedule.py (outdated)
  • src/pybind/mgr/rbd_support/module.py (outdated)
  • src/pybind/mgr/rbd_support/mirror_snapshot_schedule.py (outdated)
  • src/pybind/mgr/rbd_support/trash_purge_schedule.py (outdated)
  • qa/workunits/rbd/cli_generic.sh (4 threads, outdated)
  • src/pybind/mgr/rbd_support/task.py
@idryomov idryomov commented May 7, 2023

Overall, I think there is room for improvement in the tests (probably OK to defer to another PR though):

  • for TrashPurgeScheduleHandler and MirrorSnapshotScheduleHandler tests, instead of just checking that both pre- and post-blocklisting schedules show up, it would be good to test that both pre- and post-blocklisting scheduled work actually gets done. For trash purge scheduler, I would suggest creating two pools with an image trashed in each, adding a short (1-2m) schedule for one pool before blocklisting and for another after blocklisting and asserting that trash purge runs after enough time passes. And similarly for mirror snapshot scheduler: a single pool with two images would do as per-image schedules can be added there.
  • for TaskHandler test, queue some (let's say 5) flattens on different images before blocklisting and a sixth flatten after blocklisting and assert that all six flattens complete.
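
A rough sketch of the trash purge scheduler test suggested in the first bullet above (hypothetical pool and image names; timings would need tuning, and CLIENT_ADDR would be obtained from "ceph mgr dump" as in the existing recovery tests):

# Two pools, each with a trashed image.
ceph osd pool create tp1 8 && rbd pool init tp1
ceph osd pool create tp2 8 && rbd pool init tp2
rbd create --size 1G tp1/img1 && rbd trash mv tp1/img1
rbd create --size 1G tp2/img1 && rbd trash mv tp2/img1

# Short schedule for the first pool before blocklisting.
rbd trash purge schedule add --pool tp1 1m

# Blocklist the rbd_support module's RADOS client.
ceph osd blocklist add "$CLIENT_ADDR"

# Schedule for the second pool after blocklisting; the first attempt is
# expected to fail, so retry until the module has recreated its client.
for i in $(seq 24); do
    rbd trash purge schedule add --pool tp2 1m && break
    sleep 10
done

# After enough time passes, assert that both trash purges actually ran.
sleep 180
test -z "$(rbd trash ls tp1)"
test -z "$(rbd trash ls tp2)"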

@ajarr ajarr requested a review from idryomov May 7, 2023 16:12
@ajarr ajarr commented May 7, 2023

@idryomov accidentally requested a re-review, please ignore.

@idryomov idryomov commented May 7, 2023

Current tests appear to be stable with the following fixups:

diff --git a/qa/workunits/rbd/cli_generic.sh b/qa/workunits/rbd/cli_generic.sh
index 46160c8059df..57279d26dcee 100755
--- a/qa/workunits/rbd/cli_generic.sh
+++ b/qa/workunits/rbd/cli_generic.sh
@@ -1503,13 +1503,15 @@ test_perf_image_iostat_recovery() {
     echo "testing recovery of perf handler after module's RADOS client is blocklisted..."
     remove_images
 
-    ceph osd pool create rbd1 8
-    rbd pool init rbd1
-    rbd namespace create rbd1/ns
+    ceph osd pool create rbd3 8
+    rbd pool init rbd3
+    rbd namespace create rbd3/ns
 
-    IMAGE_SPECS=("rbd1/test1" "rbd1/ns/test2")
+    IMAGE_SPECS=("rbd3/test1" "rbd3/ns/test2")
     for spec in "${IMAGE_SPECS[@]}"; do
-        rbd create $RBD_CREATE_ARGS --size 10G $spec
+        # ensure all images are created without a separate data pool
+        # as we filter iostat by specific pool specs below
+        rbd create $RBD_CREATE_ARGS --size 10G --rbd-default-data-pool '' $spec
     done
 
     BENCH_PIDS=()
@@ -1519,7 +1521,7 @@ test_perf_image_iostat_recovery() {
         BENCH_PIDS+=($!)
     done
 
-    test "$(rbd perf image iostat --format json rbd1 |
+    test "$(rbd perf image iostat --format json rbd3 |
         jq -r 'map(.image) | sort | join(" ")')" = 'test1'
 
     # Fetch and blocklist the rbd_support module's RADOS client
@@ -1529,10 +1531,10 @@ test_perf_image_iostat_recovery() {
     ceph osd blocklist add $CLIENT_ADDR
     ceph osd blocklist ls | grep $CLIENT_ADDR
 
-    expect_fail rbd perf image iostat --format json rbd1/ns
+    expect_fail rbd perf image iostat --format json rbd3/ns
     sleep 10
     for i in `seq 24`; do
-        test "$(rbd perf image iostat --format json rbd1/ns |
+        test "$(rbd perf image iostat --format json rbd3/ns |
             jq -r 'map(.image) | sort | join(" ")')" = 'test2' && break
 	sleep 10
     done
@@ -1543,7 +1545,7 @@ test_perf_image_iostat_recovery() {
     wait
 
     remove_images
-    ceph osd pool rm rbd1 rbd1 --yes-i-really-really-mean-it
+    ceph osd pool rm rbd3 rbd3 --yes-i-really-really-mean-it
 }
 
 test_mirror_pool_peer_bootstrap_create() {
@@ -1661,7 +1663,6 @@ test_tasks_recovery() {
     ceph osd blocklist add $CLIENT_ADDR
     ceph osd blocklist ls | grep $CLIENT_ADDR
 
-    test "$(ceph rbd task list)" = "[]"
     expect_fail ceph rbd task add flatten rbd2/clone1
     sleep 10
     for i in `seq 24`; do
@@ -1670,10 +1671,6 @@ test_tasks_recovery() {
     done
     test "$(ceph rbd task list)" != "[]"
 
-    # queue flatten and check that it completes
-    rbd info rbd2/clone1 | grep 'parent: '
-    expect_fail rbd snap unprotect rbd2/img1@snap
-    ceph rbd task add flatten rbd2/clone1
     for i in {1..12}; do
         rbd info rbd2/clone1 | grep 'parent: ' || break
         sleep 10

ajarr added 2 commits May 8, 2023 13:39
... requests to be completed.

Signed-off-by: Ramana Raja <rraja@redhat.com>
Signed-off-by: Ramana Raja <rraja@redhat.com>
ajarr added 2 commits May 8, 2023 16:45
In certain scenarios the OSDs were slow to process RBD requests.
This led to the rbd_support module's RBD client not being able to
gracefully hand over an RBD exclusive lock to another RBD client.
After the condition persisted for some time, the other RBD client
forcefully acquired the lock by blocklisting the rbd_support module's
RBD client, and consequently blocklisted the module's RADOS client. The
rbd_support module stopped working. To recover the module, the entire
mgr service had to be restarted, which reloaded the other mgr modules.

Instead of recovering the rbd_support module from client blocklisting
in a way that is disruptive to other mgr modules, recover the module
automatically without restarting the mgr service. When the client gets
blocklisted, shut down the module's handlers and the blocklisted client,
create a new RADOS client for the module, and start new handlers.

Fixes: https://tracker.ceph.com/issues/56724
Signed-off-by: Ramana Raja <rraja@redhat.com>
... after the module's RADOS client is blocklisted.

Signed-off-by: Ramana Raja <rraja@redhat.com>
@ajarr ajarr commented May 8, 2023

> Overall, I think there is room for improvement in the tests (probably OK to defer to another PR though):
>
>   • for TrashPurgeScheduleHandler and MirrorSnapshotScheduleHandler tests, instead of just checking that both pre- and post-blocklisting schedules show up, it would be good to test that both pre- and post-blocklisting scheduled work actually gets done. For trash purge scheduler, I would suggest creating two pools with an image trashed in each, adding a short (1-2m) schedule for one pool before blocklisting and for another after blocklisting and asserting that trash purge runs after enough time passes. And similarly for mirror snapshot scheduler: a single pool with two images would do as per-image schedules can be added there.
>   • for TaskHandler test, queue some (let's say 5) flattens on different images before blocklisting and a sixth flatten after blocklisting and assert that all six flattens complete.

Created a tracker ticket for now: https://tracker.ceph.com/issues/59681

@idryomov

No related failures:

https://pulpito.ceph.com/dis-2023-05-08_22:48:48-rbd-wip-dis-testing-distro-default-smithi/
https://pulpito.ceph.com/dis-2023-05-09_11:02:54-rbd-wip-dis-testing-distro-default-smithi/
https://pulpito.ceph.com/dis-2023-05-09_20:53:52-rbd-wip-dis-testing-distro-default-smithi/

This is with #49975 excluded in the last rerun -- it's causing "Exiting scrub checking -- not all pgs scrubbed." errors. Per @neha-ojha the plan is to introduce a more aggressive QoS profile for teuthology tests.
