Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

quincy: MgrMonitor: batch commit OSDMap and MgrMap mutations #50979

Merged
merged 5 commits into from May 22, 2023

Conversation

batrick
Copy link
Member

@batrick batrick commented Apr 10, 2023

@batrick batrick added this to the quincy milestone Apr 10, 2023
@batrick batrick requested a review from a team as a code owner April 10, 2023 18:47
@batrick
Copy link
Member Author

batrick commented Apr 11, 2023

jenkins test make check

src/mon/MgrMonitor.cc Show resolved Hide resolved
Copy link
Contributor

@idryomov idryomov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The three commits from #50404 are missing "cherry picked from" annotations.

Also, if not using ceph-backport.sh script for creating backports, please consider replicating what it does as far as the description of the PR goes manually. In particular, providing a link to the main PR(s) is very useful because then the backport PR gets linked from there automatically as well.

rzarzynski and others added 5 commits May 4, 2023 18:21
There is a race condition between the `mgr fail` handling  and `mgrbeacon`.

```diff
diff --git a/src/mon/MgrMonitor.cc b/src/mon/MgrMonitor.cc
index 8ada44e2628..9000b2e0687 100644
--- a/src/mon/MgrMonitor.cc
+++ b/src/mon/MgrMonitor.cc
@@ -1203,7 +1203,9 @@ bool MgrMonitor::prepare_command(MonOpRequestRef op)
     }

     if (changed && pending_map.active_gid == 0) {
+      dout(5) << "========== changed and active_state == 0" << dendl;
       promote_standby();
+      dout(5) << "========== after promote_standby: " << pending_map.active_gid << dendl;
     }
   } else if (prefix == "mgr module enable") {
     string module;
```

```
2022-05-17T00:11:19.602+0200 7f6bd5769700  0 mon.a@0(leader) e1 handle_command mon_command({"prefix": "mgr fail", "who": "x"} v 0) v1
...
2022-05-17T00:11:19.614+0200 7f6bd5769700  5 mon.a@0(leader).mgr e25 ========== changed and active_state == 0
2022-05-17T00:11:19.614+0200 7f6bd5769700  5 mon.a@0(leader).mgr e25 ========== after promote_standby: 0
2022-05-17T00:11:19.614+0200 7f6bd5769700  4 mon.a@0(leader).mgr e25 prepare_command done, r=0
...
2022-05-17T00:11:19.630+0200 7f6bd5769700  4 mon.a@0(leader).mgr e25 selecting new active 4210 x (was 0 )
```

```cpp
bool MgrMonitor::prepare_beacon(MonOpRequestRef op)
  if (pending_map.active_gid == m->get_gid()) {
    // ...
  } else if (pending_map.active_gid == 0) {
    // There is no currently active daemon, select this one.
    if (pending_map.standbys.count(m->get_gid())) {
      drop_standby(m->get_gid(), false);
    }
    dout(4) << "selecting new active " << m->get_gid()
            << " " << m->get_name()
            << " (was " << pending_map.active_gid << " "
            << pending_map.active_name << ")" << dendl;
    pending_map.active_gid = m->get_gid();
    pending_map.active_name = m->get_name();
    pending_map.active_change = ceph_clock_now()
```

The `25` version of `MgrMap`, when handled at `mgr.x`, doesn't trigger the `respawn()` path:

```
2022-05-17T00:10:11.197+0200 7fa3d1e0a700 10 mgr ms_dispatch2 active mgrmap(e 25) v1
2022-05-17T00:10:11.197+0200 7fa3d1e0a700  4 mgr handle_mgr_map received map epoch 25
2022-05-17T00:10:11.197+0200 7fa3d1e0a700  4 mgr handle_mgr_map active in map: 1 active is 4210
2022-05-17T00:10:11.197+0200 7fa3d6613700 10 --2- 127.0.0.1:0/743576734 >> [v2:127.0.0.1:40929/0,v1:127.0.0.1:40930/0] conn(0x5592635ef400 0x5592635f6580 secure :-1 s=THROTTLE_DONE pgs=130 cs=0 l=1 rev1=1 crypto rx=0x55926362e810 tx=0x559263563b60 comp rx=0 tx=0).handle_read_frame_dispatch tag=17
2022-05-17T00:10:11.197+0200 7fa3d6613700  5 --2- 127.0.0.1:0/743576734 >> [v2:127.0.0.1:40929/0,v1:127.0.0.1:40930/0] conn(0x5592635ef400 0x5592635f6580 secure :-1 s=THROTTLE_DONE pgs=130 cs=0 l=1 rev1=1 crypto rx=0x55926362e810 tx=0x559263563b60 comp rx=0 tx=0).handle_message got 43089 + 0 + 0 byte message. envelope type=1796 src mon.0 off 0
2022-05-17T00:10:11.197+0200 7fa3d1e0a700 10 mgr handle_mgr_map I was already active
```

Fixes: https://tracker.ceph.com/issues/55711
Signed-off-by: Radosław Zarzyński <rzarzyns@redhat.com>
(cherry picked from commit 23c3f76)
in 23c3f76, the change to fail the mgr
is proposed immediately. but `MgrMonitor::prepare_command()` method still
returns `true` in this case. its indirect caller of
`PaxosService::dispatch()` considers this as a sign that it needs to
propose the change with `propose_pending()`. but the pending change has
already been proposed by `MgrMonitor::prepare_command()`, and
`have_pending` is also cleared by this call. as we don't allow
consecutive paxos proposals, the second `propose_pending()` call is
delayed with a configured latency. but when the timer is fired, this
poseponed call would find itself trying to propose nothing. the change
to fail the mgr has been proposed. that's why we have
`ceph_assert(have_pending)` assertion failures.

in this change, the second proposal is not proposed anymore if the
proposal is proposed immediately. this should avoid the assertion
failure.

this change should address the regression introduced by
23c3f76.

Fixes: https://tracker.ceph.com/issues/56850
Signed-off-by: Kefu Chai <tchaikov@gmail.com>
(cherry picked from commit 5b1c6ad)
The return value is used to indicate whether the pending state should be
committed. There is no concept of "handled message" here (unlike
preprocess_query).

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 8caa1fd)
This race fixed by 23c3f7 exists wherever we drop the active mgr.
Resolve this by forcing immediate proposal (circumventing any delays)
whenever the active is dropped.

Fixes: 23c3f76
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 30b20d3)
Plugging PAXOS has the effect of batching map updates into a single
PAXOS transaction. Since we're updating the OSDMap several times and the
MgrMap, plug PAXOS for efficiency. This also has the nice effect of
reducing any delay between the active mgr getting dropped and the
blocklisting of its clients. This doesn't resolve any race condition as
the two maps are never processed in one unit. So the former active
manager may process the OSDMap blocklists before learning it is dropped
from the MgrMap.

Fixes: https://tracker.ceph.com/issues/58923
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 2e057bb)
@batrick
Copy link
Member Author

batrick commented May 4, 2023

The three commits from #50404 are missing "cherry picked from" annotations.

Also, if not using ceph-backport.sh script for creating backports, please consider replicating what it does as far as the description of the PR goes manually. In particular, providing a link to the main PR(s) is very useful because then the backport PR gets linked from there automatically as well.

Thanks for the nudge @idryomov . I need to revisit that script as the last time I tried it there were dependency issues.

@idryomov
Copy link
Contributor

idryomov commented May 8, 2023

jenkins test api

@ljflores
Copy link
Contributor

@yuriw yuriw merged commit 7d929e2 into ceph:quincy May 22, 2023
10 checks passed
@batrick batrick deleted the i59294 branch June 7, 2023 18:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants