New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
mon/MgrMonitor: do not propose again for "mgr fail" #47834
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
in 23c3f76, the change to fail the mgr is proposed immediately. but `MgrMonitor::prepare_command()` method still returns `true` in this case. its indirect caller of `PaxosService::dispatch()` considers this as a sign that it needs to propose the change with `propose_pending()`. but the pending change has already been proposed by `MgrMonitor::prepare_command()`, and `have_pending` is also cleared by this call. as we don't allow consecutive paxos proposals, the second `propose_pending()` call is delayed with a configured latency. but when the timer is fired, this poseponed call would find itself trying to propose nothing. the change to fail the mgr has been proposed. that's why we have `ceph_assert(have_pending)` assertion failures. in this change, the second proposal is not proposed anymore if the proposal is proposed immediately. this should avoid the assertion failure. this change should address the regression introduced by 23c3f76. Fixes: https://tracker.ceph.com/issues/56850 Signed-off-by: Kefu Chai <tchaikov@gmail.com>
tchaikov
changed the title
mon/MgrMonitor: do not propse again for "mgr fail"
mon/MgrMonitor: do not propose again for "mgr fail"
Aug 27, 2022
tested using
|
jenkins test make check arm64 |
1 similar comment
jenkins test make check arm64 |
rzarzynski
approved these changes
Aug 29, 2022
i cannot take it anymore. too many api test failures recently. will revert this change, if it does more harm than good. in the mean time, i will try to run it against the rados test suite. but the build is way too slow today. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
in 23c3f76, the change to fail the mgr
is proposed immediately. but
MgrMonitor::prepare_command()
method stillreturns
true
in this case. its indirect caller ofPaxosService::dispatch()
considers this as a sign that it needs topropose the change with
propose_pending()
. but the pending change hasalready been proposed by
MgrMonitor::prepare_command()
, andhave_pending
is also cleared by this call. as we don't allowconsecutive paxos proposals, the second
propose_pending()
call isdelayed with a configured latency. but when the timer is fired, this
poseponed call would find itself trying to propose nothing. the change
to fail the mgr has been proposed. that's why we have
ceph_assert(have_pending)
assertion failures.in this change, the second proposal is not proposed anymore if the
proposal is proposed immediately. this should avoid the assertion
failure.
this change should address the regression introduced by
23c3f76.
Fixes: https://tracker.ceph.com/issues/56850
Signed-off-by: Kefu Chai tchaikov@gmail.com
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "pacific"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
Checklist
Show available Jenkins commands
jenkins retest this please
jenkins test classic perf
jenkins test crimson perf
jenkins test signed
jenkins test make check
jenkins test make check arm64
jenkins test submodules
jenkins test dashboard
jenkins test dashboard cephadm
jenkins test api
jenkins test docs
jenkins render docs
jenkins test ceph-volume all
jenkins test ceph-volume tox
jenkins test windows