mon/MgrMonitor: do not propose again for "mgr fail" #47834

Merged
merged 1 commit into from Aug 29, 2022

Conversation

tchaikov
Contributor

in 23c3f76, the change to fail the mgr
is proposed immediately. but the `MgrMonitor::prepare_command()` method still
returns `true` in this case. its indirect caller,
`PaxosService::dispatch()`, takes this as a sign that it needs to
propose the change with `propose_pending()`. but the pending change has
already been proposed by `MgrMonitor::prepare_command()`, and
`have_pending` has been cleared by that call. as we don't allow
consecutive paxos proposals, the second `propose_pending()` call is
delayed by a configured latency. when the timer fires, this
postponed call finds that it has nothing to propose, because the change
to fail the mgr has already been proposed. that's why we have
`ceph_assert(have_pending)` assertion failures.

in this change, the second proposal is skipped if the change has
already been proposed immediately. this should avoid the assertion
failure.

this change should address the regression introduced by
23c3f76.
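
the sequence described above can be sketched with a toy model. this is a simplified illustration, not the actual Ceph code: `ToyPaxosService`, `prepare_fail_command()`, `fire_timers()`, `run_scenario()` and the `fixed` flag are hypothetical names, and the sketch assumes the fix is expressed through the handler's return value (the description only states that the second proposal is skipped). `propose_pending()` here returns `false` where the real code would hit `ceph_assert(have_pending)`, so the failure can be observed instead of aborting.

```cpp
#include <cassert>

// Hypothetical, heavily simplified stand-in for a Paxos-backed service.
struct ToyPaxosService {
  bool have_pending = false;  // true while a staged change awaits proposal
  int proposals = 0;          // number of proposals actually sent
  int delayed = 0;            // proposals postponed until a timer fires

  // Returns false where the real code would fail ceph_assert(have_pending).
  bool propose_pending() {
    if (!have_pending) return false;
    have_pending = false;
    ++proposals;
    return true;
  }

  // Models the "mgr fail" handler: stage the change and propose it at once.
  // The return value tells the caller whether it still has to propose.
  bool prepare_fail_command(bool fixed) {
    have_pending = true;
    propose_pending();
    return !fixed;  // before the fix: true, after the fix: false
  }

  // Models the dispatch path: a second proposal cannot follow immediately,
  // so it is postponed behind a timer.
  void dispatch(bool fixed) {
    if (prepare_fail_command(fixed)) ++delayed;
  }

  // Fires all postponed proposals; counts those that found nothing pending.
  int fire_timers() {
    int empty = 0;
    for (; delayed > 0; --delayed)
      if (!propose_pending()) ++empty;
    return empty;
  }
};

// Runs one "mgr fail" command and returns the number of empty proposals.
int run_scenario(bool fixed) {
  ToyPaxosService svc;
  svc.dispatch(fixed);
  int empty = svc.fire_timers();
  assert(svc.proposals == 1);  // the change itself is proposed exactly once
  return empty;
}
```

in this model, `run_scenario(false)` reports one postponed proposal that finds nothing pending, which corresponds to the assertion failure, while `run_scenario(true)` reports none.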

Fixes: https://tracker.ceph.com/issues/56850
Signed-off-by: Kefu Chai <tchaikov@gmail.com>

@tchaikov tchaikov marked this pull request as ready for review August 27, 2022 16:07
@tchaikov tchaikov requested a review from a team as a code owner August 27, 2022 16:07
@tchaikov tchaikov changed the title mon/MgrMonitor: do not propse again for "mgr fail" mon/MgrMonitor: do not propose again for "mgr fail" Aug 27, 2022
@tchaikov
Contributor Author

tested using run-backend-api-tests.sh.

  • before this change, run-backend-api-tests.sh failed randomly.
  • after this change, run-backend-api-tests.sh passes.

@tchaikov
Contributor Author

jenkins test make check arm64

@tchaikov tchaikov merged commit 28d890e into ceph:main Aug 29, 2022
@tchaikov tchaikov deleted the wip-56850 branch August 29, 2022 16:00
@tchaikov
Contributor Author

tchaikov commented Aug 29, 2022

i cannot take it anymore. there have been too many api test failures recently. i will revert this change if it does more harm than good. in the meantime, i will try to run it against the rados test suite, but the build is way too slow today.
