pacific:mon/OSDMonitor: Added extra check before mon.go_recovery_stretch_mode() #48803
Added bug reproducer for https://bugzilla.redhat.com/show_bug.cgi?id=2104207 Added more logs in MON. Signed-off-by: Kamoltat <ksirivad@redhat.com> (cherry picked from commit 62fe3cb)
Problem: There are certain scenarios in a degraded stretched cluster where the monitor will try to go into the function ``Monitor::go_recovery_stretch_mode()``, which leads to a `ceph_assert`. Solution: Make sure ``dead_mon_buckets.size() == 0`` in ``OSDMonitor::update_from_paxos()`` before going into ``Monitor::go_recovery_stretch_mode()``. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2104207 Signed-off-by: Kamoltat <ksirivad@redhat.com> (cherry picked from commit d95c41a)
jenkins test make check
@kamoltat I found a failure that looks related. Can you take a look? /a/yuriw-2022-11-30_15:10:52-rados-wip-yuri3-testing-2022-11-28-0750-pacific-distro-default-smithi/7098562
This one also might be related: /a/yuriw-2022-11-29_15:35:32-rados-wip-yuri3-testing-2022-11-28-0750-pacific-distro-default-smithi/7097000/
And this one: /a/yuriw-2022-11-29_15:35:32-rados-wip-yuri3-testing-2022-11-28-0750-pacific-distro-default-smithi/7096933 It seems like there's a problem when the mons restart in some of the jobs.
@yuriw merged this before I had a chance to investigate (my fault). I had seen some failures I thought were related in a previous test batch, and put it through another batch since I thought there were updates. But somehow this run looked fine... Here is the review: https://pulpito.ceph.com/?branch=wip-yuri2-testing-2022-12-07-0821-pacific Failures, unrelated: Details:
This PR: #47340 We are in the process of fixing it
…tch_mode()" This commit belongs to ceph#48803 which introduced https://tracker.ceph.com/issues/58239. Therefore, we are reverting it. This reverts commit 94dc970. Signed-off-by: Kamoltat <ksirivad@redhat.com>
This commit belongs to ceph#48803 which introduced https://tracker.ceph.com/issues/58239. Therefore, we are reverting it. This reverts commit 025d3fa. Signed-off-by: Kamoltat <ksirivad@redhat.com>
…tch_mode()" This commit belongs to ceph#48803 which introduced https://tracker.ceph.com/issues/58239. Therefore, we are reverting it. This reverts commit 94dc970. Fixes: https://tracker.ceph.com/issues/58239 Signed-off-by: Kamoltat <ksirivad@redhat.com>
This commit belongs to ceph#48803 which introduced https://tracker.ceph.com/issues/58239. Therefore, we are reverting it. This reverts commit 025d3fa. Fixes: https://tracker.ceph.com/issues/58239 Signed-off-by: Kamoltat <ksirivad@redhat.com>
This commit belongs to ceph/ceph#48803 which introduced https://tracker.ceph.com/issues/58239. Therefore, we are reverting it. This reverts commit 025d3fa. Fixes: https://tracker.ceph.com/issues/58239 Signed-off-by: Kamoltat <ksirivad@redhat.com>
…tch_mode()" This commit belongs to ceph/ceph#48803 which introduced https://tracker.ceph.com/issues/58239. Therefore, we are reverting it. This reverts commit 94dc970. Fixes: https://tracker.ceph.com/issues/58239 Signed-off-by: Kamoltat <ksirivad@redhat.com>
Problem: There are certain scenarios in a degraded stretched cluster where the monitor will try to go into the function `Monitor::go_recovery_stretch_mode()`, which leads to a `ceph_assert`.
Solution: Make sure `dead_mon_buckets.size() == 0` in `OSDMonitor::update_from_paxos()` before going into `Monitor::go_recovery_stretch_mode()`.
Fixes: https://tracker.ceph.com/issues/57017
Backporting relevant commits from main PR: #47340
Signed-off-by: Kamoltat <ksirivad@redhat.com>
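The guard described in this PR can be illustrated with a small standalone sketch. This is not the Ceph code: `MonitorSketch`, `maybe_enter_recovery()`, the `recovering` flag, and the `other_conditions_met` parameter are hypothetical names introduced only for illustration, and the assertion inside the sketch's `go_recovery_stretch_mode()` is an assumed stand-in for the `ceph_assert` mentioned in the description.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for the monitor state; not the real Ceph Monitor class.
struct MonitorSketch {
  // Bucket name -> monitors in that bucket currently considered dead.
  std::map<std::string, std::set<std::string>> dead_mon_buckets;
  bool recovering = false;

  void go_recovery_stretch_mode() {
    // Assumption: models the ceph_assert from the PR description as a
    // precondition that no monitor bucket is dead.
    assert(dead_mon_buckets.empty());
    recovering = true;
  }
};

// Hypothetical caller modelling the extra check: recovery stretch mode is
// entered only when the other conditions hold AND no monitor bucket is dead.
bool maybe_enter_recovery(MonitorSketch& mon, bool other_conditions_met) {
  if (other_conditions_met && mon.dead_mon_buckets.size() == 0) {
    mon.go_recovery_stretch_mode();
    return true;
  }
  return false;  // guard skipped the call, so the assertion is never reached
}
```

With a non-empty `dead_mon_buckets`, `maybe_enter_recovery()` returns false without calling `go_recovery_stretch_mode()`; once the map is empty, the call goes through and the precondition holds.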