New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
quincy: mon/Elector: Change how we handle removed_ranks and notify_rank_removed() #49311
quincy: mon/Elector: Change how we handle removed_ranks and notify_rank_removed() #49311
Conversation
|
jenkins test make check |
|
jenkins test windows |
|
jenkins test make check |
6868ff2
to
4386d18
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM; the right commits are included and the conflicts were just changing unsigned->int. [Please note that in future; it makes checking easier :) ]
|
jenkins test windows |
|
0 untracked failures, RADOS approved Failures jobid: [7123713] jobid: [7123747, 7123902] jobid: [7123774, 7123853, 7123929, 7124006] jobid: [7123797, 7123925] jobid: [7123808, 7123867, 7123962] DEAD |
…ElectionLogic & Elector Problem: Currently there are not ConnectionTracker logs, therefore it is really hard to debug Solution: Enable loggings for most functions in ConnectionTracker.cc Most of the logs are in debug_mon = 30. Also Added some logs in Elector and ElectionLogic so that debugging will be easier in the future. Fixes: https://tracker.ceph.com/issues/58049 Signed-off-by: Kamoltat <ksirivad@redhat.com> (cherry picked from commit 58f2bd4)
PROBLEM: In `ConnectionTracker::receive_peer_report` we loop through ranks which is bad when there is `notify_rank_removed` before this and the ranks are not adjusted yet. When we rely on the rank in certain scenarios, we end up with extra peer_report copy which we don't want. SOLUTION: In `ConnectionTracker::receive_peer_report` instead of passing `report.rank` in the function `ConnectionTracker::reports`, we pass `i.first` instead so that trim old ranks properly. We also added a assert in notify_rank_removed(), comparing expected rank provided by the monmap against the rank that we adjust ourself to as a sanity check. We edited test/mon/test_election.cc to reflect the changes made in notify_rank_removed(). Fixes: https://tracker.ceph.com/issues/58049 Signed-off-by: Kamoltat <ksirivad@redhat.com> (cherry picked from commit 7c52cce) Conflicts: src/mon/Elector.cc - trivial fix src/mon/Elector.h - trivial fix
when a new monitor joins, there is a chance that it will recive a monmap that recently removed a monitor and ``removed_rank`` will have some content in it. A new monitor that joins should never remove rank in peer_tracker but rather call ``notify_clear_peer_state()`` to reset the `peer_report`. In the case when it is a monitor that has joined quorum before and is only 1 epoch behind the newest monmap provided by the probe_replied monitor. We can actually remove and adjust ranks in `peer_report` since we are sure that if there is any content in removed_ranks, then it has to be because in the next epoch we are removing a rank, since every update of an epoch we always clear the removed_ranks. There is no point in keeping the content of ``removed_ranks`` after monmap gets updated to the epoch. Therefore, clear ``removed_ranks`` every update. When there is discontinuity between monmaps for more 1 epoch or the new monitor never joined quorum before, we always reset `peer_tracker`. Moreover, beneficial for monitor log to also log which rank has been removed at the current time of the monmap. So add removed_ranks to `print_summary` and `dump` in MonMap.cc. Fixes: https://tracker.ceph.com/issues/58049 Signed-off-by: Kamoltat <ksirivad@redhat.com> (cherry picked from commit 0440257) Conflicts: src/mon/Monitor.cc - trivial fix
In `notify_clear_peer_state()` we another mechanism in reseting our `peer_tracker.rank` to match our own monitor.rank. This is added so there is a way for us to recover from a scenrio where `peer_tracker.rank` is messed up from adjusting the ranks or removing ranks. `notifiy_clear_peer_state()` can be triggered by using the command: `ceph connection scores reset` Also in `clear_peer_reports`, besides reassigning my_reports to an empty object, we also have to make `my_reports` = `rank` from `peer_tracker`, such that we don't get -1 as a rank in my_reports. Fixes: https://tracker.ceph.com/issues/58049 Signed-off-by: Kamoltat <ksirivad@redhat.com> (cherry picked from commit 55cf717)
…d connection report When upgrading the monitors (include booting up), we check if `peer_tracker` is dirty or not. If so, we clear it. Added some functions in `Elector` and `ConnectionTracker` class to check for clean `peer_tracker`. Moreover, there could be some cases where due to startup weirdness or abnormal circumstances, we might get a report from our own rank. Therefore, it doesn't hurt to add a sanity check in `ConnectionTracker::report_live_connection` and `ConnectionTracker::report_dead_connection`. Fixes: https://tracker.ceph.com/issues/58049 Signed-off-by: Kamoltat <ksirivad@redhat.com> (cherry picked from commit 25ce77c)
Problem: --mon-initial-members does nothing but causes monmap to populate ``removed_ranks`` because the way we start monitors in standalone tests uses ``run_mon $dir $id ..`` on each mon. Regardless of --mon-initial-members=a,b,c, if we set --mon-host=$MONA,$MONB,$MONC (which we do every single tests), everytime we run a monitor (e.g.,run mon.b) it will pre-build our monmap with ``` noname-a=mon.noname-a addrs v2:127.0.0.1:7127/0, b=mon.b addrs v2:127.0.0.1:7128/0, noname-c=mon.noname-c addrs v2:127.0.0.1:7129/0, ``` Now, with --mon-initial-members=a,b,c we are letting monmap know that we should have initial members name: a,b,c, which we only have `b` as a match. So what ``MonMap::set_initial_members`` do is that it will remove noname-a and noname-c which will populate `removed_ranks`. Solution: remove all instances of --mon-initial-members in the standalone test as it has no impact on the nature of the tests themselves. Fixes: https://tracker.ceph.com/issues/58132 Signed-off-by: Kamoltat <ksirivad@redhat.com> (cherry picked from commit e1c095d)
Problem: In RHCS the user can choose to manually remove a monitor rank before shutting the monitor down. Causing inconsistency in monmap. for example we remove mon.a from the monmap, there is a short period where mon.a is still operational and will try to remove itself from monmap but we will run into an assertion in ConnectionTracker::notify_ranks_removed(). Solution: In Monitor::notify_new_monmap() we prevent the func from going into removing our own rank, or ranks that doesn't exists in monmap. FYI: this is an RHCS problem only, in ODF, we never remove a monitor from monmap before shutting it down. Fixes: https://tracker.ceph.com/issues/58049 Signed-off-by: Kamoltat <ksirivad@redhat.com> (cherry picked from commit 924e7ec)
4386d18
to
45bbede
Compare
|
jenkins test make check |
Problem:
Currently, there is an issue when performing a DR test with 2 sites stretch cluster
where removing monitors and adding new ones to the cluster
causes incorrect
rankin ConnectionTracker class.This causes the monitor to think that they are someone else
in the
peer_trackercopy and will never update thecorrect field of itself, causing a deadlock in the election process (Ceph becoming unresponsive)
when using election strategy: 3 (Connectivity mode).
Solution:
It was really hard to debug the issue so the first thing we did was to add additional loggings to ConnectionTracker,
Elector and ElectionLogic Classes.
In
ConnectionTracker::receive_peer_reportwe loop through ranks which is bad when there isnotify_rank_removedbefore this and the ranks are not adjusted yet. When we rely on the rank in certain scenarios, we end up with extra peer_report copy which we don't want. Therefore, instead of passingreport.rankin the functionConnectionTracker::reports, we passi.firstinstead so that we trim old ranks properly. We also added an assert in notify_rank_removed(), comparing the expected rank provided by the monmap against the rank that we adjust ourselves to as a sanity check. We edited test/mon/test_election.cc to reflect the changes made in notify_rank_removed().MonMap::removed_rank does not get cleared every update of the epoch, this was the root cause of the problem. Therefore, we fix this by making MonMap clear removed_ranks every update. Moreover, When there is discontinuity between monmaps for more than 1 epoch or the new monitor never joined the quorum before,
we always reset
peer_tracker.Added a way for us to manually reset
peer_tracker.rankwhen executing the command:ceph connection scores resetfor each monitor. The peer_tracker.rank will match the current rank of the Monitor.When upgrading the monitors (including booting up), we check if
peer_trackeris dirty or not. If so, we clear it. Added some functions in theElectorandConnectionTrackerclasses to check for cleanpeer_tracker. Moreover, there could be some cases where due to startup weirdness or abnormal circumstances, we might get a report from our own rank. Therefore, it doesn't hurt to add a sanity check inConnectionTracker::report_live_connectionandConnectionTracker::report_dead_connection.--mon-initial-members does nothing but cause monmap
to populate
removed_ranks, therefore, remove all instances of --mon-initial-members in the standalone test as it has no impact on the nature of the tests themselves.Monitor::notify_new_monmap()skips removal of non-exist rankIn RHCS the user can choose to manually remove a monitor rank
before shutting the monitor down. Causing inconsistency in monmap.
Therefore, in
Monitor::notify_new_monmap()we prevent the function from going into removing our own rank or ranks that don't exist in monmap.Fixes: https://tracker.ceph.com/issues/58049
Fixes: https://tracker.ceph.com/issues/58132
Backporting relevant commits from the main PR:
#48991
Signed-off-by: Kamoltat ksirivad@redhat.com
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "pacific"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
Checklist
Show available Jenkins commands
jenkins retest this pleasejenkins test classic perfjenkins test crimson perfjenkins test signedjenkins test make checkjenkins test make check arm64jenkins test submodulesjenkins test dashboardjenkins test dashboard cephadmjenkins test apijenkins test docsjenkins render docsjenkins test ceph-volume alljenkins test ceph-volume toxjenkins test windows