quincy: MgrMonitor: batch commit OSDMap and MgrMap mutations #50979

batrick · 2023-04-10T18:47:34Z

backport tracker: https://tracker.ceph.com/issues/59294

backport of #46318, #47834 and #50404
parent tracker: https://tracker.ceph.com/issues/58923

batrick · 2023-04-11T12:32:48Z

jenkins test make check

src/mon/MgrMonitor.cc

idryomov

The three commits from #50404 are missing "cherry picked from" annotations.

Also, if not using ceph-backport.sh script for creating backports, please consider replicating what it does as far as the description of the PR goes manually. In particular, providing a link to the main PR(s) is very useful because then the backport PR gets linked from there automatically as well.

There is a race condition between the `mgr fail` handling and `mgrbeacon`.

```diff
diff --git a/src/mon/MgrMonitor.cc b/src/mon/MgrMonitor.cc
index 8ada44e2628..9000b2e0687 100644
--- a/src/mon/MgrMonitor.cc
+++ b/src/mon/MgrMonitor.cc
@@ -1203,7 +1203,9 @@ bool MgrMonitor::prepare_command(MonOpRequestRef op)
       }
 
       if (changed && pending_map.active_gid == 0) {
+        dout(5) << "========== changed and active_state == 0" << dendl;
         promote_standby();
+        dout(5) << "========== after promote_standby: " << pending_map.active_gid << dendl;
       }
     } else if (prefix == "mgr module enable") {
       string module;
```

```
2022-05-17T00:11:19.602+0200 7f6bd5769700 0 mon.a@0(leader) e1 handle_command mon_command({"prefix": "mgr fail", "who": "x"} v 0) v1
...
2022-05-17T00:11:19.614+0200 7f6bd5769700 5 mon.a@0(leader).mgr e25 ========== changed and active_state == 0
2022-05-17T00:11:19.614+0200 7f6bd5769700 5 mon.a@0(leader).mgr e25 ========== after promote_standby: 0
2022-05-17T00:11:19.614+0200 7f6bd5769700 4 mon.a@0(leader).mgr e25 prepare_command done, r=0
...
2022-05-17T00:11:19.630+0200 7f6bd5769700 4 mon.a@0(leader).mgr e25 selecting new active 4210 x (was 0 )
```

```cpp
bool MgrMonitor::prepare_beacon(MonOpRequestRef op)
  if (pending_map.active_gid == m->get_gid()) {
    // ...
  } else if (pending_map.active_gid == 0) {
    // There is no currently active daemon, select this one.
    if (pending_map.standbys.count(m->get_gid())) {
      drop_standby(m->get_gid(), false);
    }
    dout(4) << "selecting new active " << m->get_gid()
            << " " << m->get_name()
            << " (was " << pending_map.active_gid << " "
            << pending_map.active_name << ")" << dendl;
    pending_map.active_gid = m->get_gid();
    pending_map.active_name = m->get_name();
    pending_map.active_change = ceph_clock_now()
```

The `25` version of `MgrMap`, when handled at `mgr.x`, doesn't trigger the `respawn()` path:

```
2022-05-17T00:10:11.197+0200 7fa3d1e0a700 10 mgr ms_dispatch2 active mgrmap(e 25) v1
2022-05-17T00:10:11.197+0200 7fa3d1e0a700 4 mgr handle_mgr_map received map epoch 25
2022-05-17T00:10:11.197+0200 7fa3d1e0a700 4 mgr handle_mgr_map active in map: 1 active is 4210
2022-05-17T00:10:11.197+0200 7fa3d6613700 10 --2- 127.0.0.1:0/743576734 >> [v2:127.0.0.1:40929/0,v1:127.0.0.1:40930/0] conn(0x5592635ef400 0x5592635f6580 secure :-1 s=THROTTLE_DONE pgs=130 cs=0 l=1 rev1=1 crypto rx=0x55926362e810 tx=0x559263563b60 comp rx=0 tx=0).handle_read_frame_dispatch tag=17
2022-05-17T00:10:11.197+0200 7fa3d6613700 5 --2- 127.0.0.1:0/743576734 >> [v2:127.0.0.1:40929/0,v1:127.0.0.1:40930/0] conn(0x5592635ef400 0x5592635f6580 secure :-1 s=THROTTLE_DONE pgs=130 cs=0 l=1 rev1=1 crypto rx=0x55926362e810 tx=0x559263563b60 comp rx=0 tx=0).handle_message got 43089 + 0 + 0 byte message. envelope type=1796 src mon.0 off 0
2022-05-17T00:10:11.197+0200 7fa3d1e0a700 10 mgr handle_mgr_map I was already active
```

Fixes: https://tracker.ceph.com/issues/55711
Signed-off-by: Radosław Zarzyński <rzarzyns@redhat.com>
(cherry picked from commit 23c3f76)
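The sequence in the logs above can be replayed with a small toy model. This is only a sketch under stated assumptions: `ToyMgrMap`, `ToyMgrMonitor`, `replay()` and the `propose_immediately` switch are hypothetical names invented for illustration and are not Ceph's classes or API. The model keeps just enough state to show why a delayed proposal lets the old active's beacon re-select it before any map with `active_gid == 0` is published.

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Toy stand-in for MgrMap (hypothetical); only the fields the example needs.
struct ToyMgrMap {
  std::uint64_t active_gid = 0;
  std::set<std::uint64_t> standbys;
};

// Hypothetical, heavily simplified monitor. "pending" is staged state and
// "published" records the active gid of every map epoch that was committed.
struct ToyMgrMonitor {
  ToyMgrMap pending;
  std::vector<std::uint64_t> published;
  bool propose_immediately = false;

  void commit() { published.push_back(pending.active_gid); }

  // Models `mgr fail`: drop the active, then try to promote a standby.
  void fail_active() {
    pending.active_gid = 0;
    if (!pending.standbys.empty()) {
      auto it = pending.standbys.begin();
      pending.active_gid = *it;
      pending.standbys.erase(it);
    }
    if (propose_immediately) {
      commit();
    }
  }

  // Models the quoted beacon branch: with no active, select the sender.
  void handle_beacon(std::uint64_t gid) {
    if (pending.active_gid == 0) {
      pending.standbys.erase(gid);
      pending.active_gid = gid;
    }
  }
};

// Replays: mgr 4210 is active, `mgr fail`, beacon from 4210, later commit.
std::vector<std::uint64_t> replay(bool propose_immediately) {
  ToyMgrMonitor mon;
  mon.propose_immediately = propose_immediately;
  mon.pending.active_gid = 4210;
  mon.commit();             // epoch with 4210 active
  mon.fail_active();        // no standbys, so active_gid stays 0
  mon.handle_beacon(4210);  // old active's beacon arrives before the next commit
  mon.commit();             // the delayed proposal finally goes out
  return mon.published;
}
```

In this model `replay(false)` publishes only epochs in which 4210 is active, so the daemon never observes that it was failed, while `replay(true)` publishes an intermediate epoch with no active mgr.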

In 23c3f76, the change to fail the mgr is proposed immediately, but `MgrMonitor::prepare_command()` still returns `true` in this case. Its indirect caller, `PaxosService::dispatch()`, takes this as a sign that it needs to propose the change with `propose_pending()`. But the pending change has already been proposed by `MgrMonitor::prepare_command()`, and `have_pending` is cleared by that call. Because consecutive paxos proposals are not allowed, the second `propose_pending()` call is delayed by a configured latency. When the timer fires, this postponed call finds that it has nothing to propose, because the change to fail the mgr has already been proposed. That is why we see `ceph_assert(have_pending)` assertion failures.

In this change, the second proposal is skipped if the change has already been proposed immediately. This should avoid the assertion failure and address the regression introduced by 23c3f76.

Fixes: https://tracker.ceph.com/issues/56850
Signed-off-by: Kefu Chai <tchaikov@gmail.com>
(cherry picked from commit 5b1c6ad)
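The double proposal described in this commit message can be illustrated with a toy model. It is a sketch, not the actual Ceph code: `ToyPaxosService`, `dispatch()` and both `prepare_fail_*` functions are hypothetical, the proposal timer is left out, and a flag stands in for `ceph_assert(have_pending)`. Returning `false` after an immediate proposal is one assumed way to skip the second proposal; the real patch may express this differently.

```cpp
// Toy model of the pending-proposal bookkeeping (hypothetical names).
struct ToyPaxosService {
  bool have_pending = false;
  int proposals = 0;
  bool assert_tripped = false;  // stands in for ceph_assert(have_pending)

  void stage_change() { have_pending = true; }

  void propose_pending() {
    if (!have_pending) {
      assert_tripped = true;  // nothing left to propose
      return;
    }
    have_pending = false;
    ++proposals;
  }
};

// Behaviour described for 23c3f76: propose immediately, yet still return
// true, which asks the caller to propose once more.
bool prepare_fail_always_true(ToyPaxosService& svc) {
  svc.stage_change();
  svc.propose_pending();
  return true;
}

// Assumed shape of the fix: after an immediate proposal, tell the caller
// that nothing is left to commit.
bool prepare_fail_skip_second(ToyPaxosService& svc) {
  svc.stage_change();
  svc.propose_pending();
  return false;
}

// Simplified caller: proposes whenever prepare reports pending state.
void dispatch(ToyPaxosService& svc, bool (*prepare)(ToyPaxosService&)) {
  if (prepare(svc)) {
    svc.propose_pending();
  }
}
```

With `prepare_fail_always_true` the model makes one real proposal and then trips the stand-in assertion; with `prepare_fail_skip_second` it makes the same single proposal and nothing else.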

The return value is used to indicate whether the pending state should be committed. There is no concept of "handled message" here (unlike preprocess_query).

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 8caa1fd)

The race fixed by 23c3f76 exists wherever we drop the active mgr. Resolve this by forcing immediate proposal (circumventing any delays) whenever the active is dropped.

Fixes: 23c3f76
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 30b20d3)

Plugging PAXOS has the effect of batching map updates into a single PAXOS transaction. Since we're updating the OSDMap several times and the MgrMap, plug PAXOS for efficiency. This also has the nice effect of reducing any delay between the active mgr getting dropped and the blocklisting of its clients.

This doesn't resolve any race condition as the two maps are never processed in one unit. So the former active manager may process the OSDMap blocklists before learning it is dropped from the MgrMap.

Fixes: https://tracker.ceph.com/issues/58923
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 2e057bb)
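The batching effect of plugging can be sketched with a toy queue. Everything here is an assumption made for illustration: `ToyPaxos`, its `plug()`/`unplug()`/`propose()` methods, `drop_active()` and the change strings are hypothetical and do not reflect the signatures or behaviour of Ceph's real Paxos class. The sketch only shows that changes proposed while plugged are folded into one transaction.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy proposal queue (hypothetical): while plugged, changes accumulate;
// unplugging flushes them as a single transaction.
struct ToyPaxos {
  bool plugged = false;
  std::vector<std::string> queued;
  std::vector<std::vector<std::string>> transactions;

  void plug() { plugged = true; }

  void unplug() {
    plugged = false;
    flush();
  }

  void propose(std::string change) {
    queued.push_back(std::move(change));
    if (!plugged) {
      flush();
    }
  }

  void flush() {
    if (!queued.empty()) {
      transactions.push_back(std::move(queued));
      queued.clear();  // reset the moved-from vector to a known empty state
    }
  }
};

// Drops the active mgr: several OSDMap blocklist updates plus one MgrMap
// update. Returns how many transactions were needed.
std::size_t drop_active(bool use_plug) {
  ToyPaxos paxos;
  if (use_plug) {
    paxos.plug();
  }
  paxos.propose("osdmap: blocklist client 1");
  paxos.propose("osdmap: blocklist client 2");
  paxos.propose("mgrmap: drop active");
  if (use_plug) {
    paxos.unplug();
  }
  return paxos.transactions.size();
}
```

In this model `drop_active(false)` needs three transactions and `drop_active(true)` needs one. As the commit message notes, batching the commit does not make the two maps atomic from the point of view of a daemon that processes them separately.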

batrick · 2023-05-04T22:23:20Z

> The three commits from #50404 are missing "cherry picked from" annotations.
>
> Also, if not using ceph-backport.sh script for creating backports, please consider replicating what it does as far as the description of the PR goes manually. In particular, providing a link to the main PR(s) is very useful because then the backport PR gets linked from there automatically as well.

Thanks for the nudge, @idryomov. I need to revisit that script, as the last time I tried it there were dependency issues.

idryomov · 2023-05-08T11:50:45Z

jenkins test api

ljflores · 2023-05-22T22:53:32Z

Rados suite review: https://tracker.ceph.com/projects/rados/wiki/QUINCY#httpstrellocomcllcQOKAa1755-wip-yuri10-testing-2023-05-18-0815-quincy

batrick added the needs-qa label Apr 10, 2023

batrick added this to the quincy milestone Apr 10, 2023

batrick requested a review from a team as a code owner April 10, 2023 18:47

github-actions bot added core mon labels Apr 10, 2023

ajarr approved these changes Apr 11, 2023

ajarr mentioned this pull request Apr 11, 2023

quincy: mon: fix a race between mgr fail and MgrMonitor::prepare_beacon() #50408

Closed

batrick force-pushed the i59294 branch from 7002a3b to a069557 Compare April 11, 2023 18:46

idryomov reviewed Apr 14, 2023

rzarzynski and others added 5 commits May 4, 2023 18:21

batrick force-pushed the i59294 branch from a069557 to cbc7c5e Compare May 4, 2023 22:22

idryomov approved these changes May 8, 2023

idryomov added the quincy-batch-1 label May 11, 2023

yuriw added the wip-yuri10-testing label May 18, 2023

yuriw merged commit 7d929e2 into ceph:quincy May 22, 2023
10 checks passed

batrick deleted the i59294 branch June 7, 2023 18:22
