mon: check is_shutdown() in timer callbacks #14919
retest this please.
@liewegas mind taking a look at this one also?
We have lots of code like:

    void Elector::shutdown() {
      if (expire_event)
        mon->timer.cancel_event(expire_event);
    }

These won't break (cancel_event behaves correctly if the event is already gone), but having these dangling pointers to freed memory makes me nervous. I think this will be safer than maintaining all of the shutdown timer checks, though!
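To make the concern concrete, here is a minimal sketch in simplified C++. ToyTimer, ToyElector, Context and C_Expire are hypothetical stand-ins, not Ceph's SafeTimer or Elector API. The timer frees each event right after it fires, so the owner's raw expire_event pointer stays valid only if the callback itself clears it before returning.

```cpp
#include <cassert>
#include <map>

// Hypothetical stand-in for a timer callback; not Ceph's Context class.
struct Context {
  virtual ~Context() = default;
  virtual void finish(int r) = 0;
  // Run the callback, then free it (the timer hands over ownership here).
  void complete(int r) {
    finish(r);
    delete this;
  }
};

// Toy timer that owns scheduled events until they fire or are cancelled.
class ToyTimer {
  std::multimap<double, Context*> events_;  // fire time -> event

 public:
  ToyTimer() = default;
  ToyTimer(const ToyTimer&) = delete;
  ToyTimer& operator=(const ToyTimer&) = delete;
  ~ToyTimer() {
    for (auto& entry : events_) delete entry.second;
  }

  void add_event_at(double when, Context* c) { events_.emplace(when, c); }

  // Cancelling something that is no longer scheduled is a harmless no-op.
  bool cancel_event(Context* c) {
    for (auto it = events_.begin(); it != events_.end(); ++it) {
      if (it->second == c) {
        delete c;
        events_.erase(it);
        return true;
      }
    }
    return false;
  }

  // Fire, in time order, every event due at or before `now`.
  void advance_to(double now) {
    while (!events_.empty() && events_.begin()->first <= now) {
      Context* c = events_.begin()->second;
      events_.erase(events_.begin());
      c->complete(0);  // runs finish(), then deletes the event
    }
  }
};

// Owner that keeps a raw pointer to its pending event, like expire_event.
struct ToyElector {
  ToyTimer& timer;
  Context* expire_event = nullptr;
  int expirations = 0;

  struct C_Expire : Context {
    ToyElector* elector;
    explicit C_Expire(ToyElector* e) : elector(e) {}
    void finish(int) override {
      elector->expire_event = nullptr;  // clear before the event is freed
      ++elector->expirations;
    }
  };

  void start(double now) {
    expire_event = new C_Expire(this);
    timer.add_event_at(now + 5.0, expire_event);
  }

  void shutdown() {
    if (expire_event) {
      bool cancelled = timer.cancel_event(expire_event);
      assert(cancelled);  // non-null only while still scheduled
      (void)cancelled;
      expire_event = nullptr;
    }
  }
};
```

With that invariant, shutdown() only ever cancels an event that is still scheduled. If a callback skipped the clearing step, expire_event would be left pointing at freed memory, which is exactly the situation described above.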
src/mon/Monitor.cc
@@ -899,6 +899,10 @@ void Monitor::shutdown()
   wait_for_paxos_write();
   // stop timer earlier so the event callbacks won't panic seeing an inactive
s/earlier/early/
@liewegas how about exposing the
Hrm, my other worry is that we haven't stopped the finisher yet, and it's possible one of those events will try to schedule something new.
I'm leaning toward making sure every shutdown properly cancels all of its events... It seems like the bug here is just that Paxos::shutdown() (or whatever method is triggered when is_leader() switches to false) didn't clean up its events like it should have?
Even if Paxos cancels all its events, there is still a chance that the … maybe we can change the mon's state to …
The latest change is different now and needs review.
Adding [DNM] (do not merge); I haven't tested this yet.
Maybe the callback should just return and do nothing if is_leader is false, then?
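A sketch of that idea under simplified assumptions (ToyMon, ToyPaxos and on_timeout() are hypothetical names, not Ceph code): the callback re-checks the leader state at the moment it fires and returns early if leadership was lost in the meantime.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins; not Ceph's Monitor or Paxos classes.
struct ToyMon {
  bool leader = true;
  bool is_leader() const { return leader; }
};

struct ToyPaxos {
  ToyMon& mon;
  int timeouts_handled = 0;

  // Body of a hypothetical timer callback. The state check happens when the
  // event fires, not when it was scheduled, because leadership may have been
  // lost (e.g. after a new election) in between.
  void on_timeout() {
    if (!mon.is_leader())
      return;            // stale event: just do nothing
    ++timeouts_handled;  // placeholder for the real leader-only work
  }
};
```

The drawback, raised later in the thread, is that every callback has to remember its own check.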
@liewegas there are a handful of callbacks checking the monitor state. I think we could check (I just skimmed through the code; it seems safe to do so).
src/mon/Monitor.cc
@@ -899,8 +899,6 @@ void Monitor::shutdown()
   wait_for_paxos_write();
   state = STATE_SHUTDOWN;
I don't think we can do this either :( because we use is_shutdown() in places like ms_dispatch to drop new work on the floor. Without it, we will take in and start new work even after systems are shutting down.
Looking a bit more, it seems like checking for is_shutdown() in the callbacks is the safest thing?
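The ordering constraint can be sketched like this, with a hypothetical ToyMonitor standing in for the real Monitor: one state flag gates both message dispatch and timer work, so removing or delaying the state change would let new work in while shutdown is in progress.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical, simplified monitor; the state names echo the discussion,
// but this is not Ceph's Monitor.
class ToyMonitor {
 public:
  enum State { STATE_PROBING, STATE_LEADER, STATE_SHUTDOWN };

  bool is_shutdown() const { return state_ == STATE_SHUTDOWN; }

  // Incoming messages are dropped on the floor once shutdown has begun.
  bool ms_dispatch(const std::string& msg) {
    if (is_shutdown())
      return false;
    queue_.push_back(msg);
    return true;
  }

  // Timer callbacks consult the same flag, so they become no-ops too.
  bool on_timer_event() {
    if (is_shutdown())
      return false;
    ++timer_work_;
    return true;
  }

  void shutdown() {
    state_ = STATE_SHUTDOWN;  // set first: it gates both paths above
    // ... then stop the timer, finisher, messenger, etc. (omitted here)
  }

  std::size_t queued() const { return queue_.size(); }
  int timer_work() const { return timer_work_; }

 private:
  State state_ = STATE_LEADER;
  std::deque<std::string> queue_;
  int timer_work_ = 0;
};
```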
instead of doing it manually.
Signed-off-by: Kefu Chai <kchai@redhat.com>

introduce a helper class, C_MonContext, and initialize all timer events using it, to ensure that they do check is_shutdown() before doing their work.
Fixes: http://tracker.ceph.com/issues/19825
Signed-off-by: Kefu Chai <kchai@redhat.com>
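A minimal sketch of such a helper, assuming simplified Context and Monitor stand-ins (the real C_MonContext in src/mon may be built differently, for example on top of an existing function-wrapping context class). The wrapper performs the is_shutdown() check once, so individual callbacks no longer need to.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical stand-ins; Ceph's real Context/Monitor types are richer.
struct Context {
  virtual ~Context() = default;
  virtual void finish(int r) = 0;
  void complete(int r) {
    finish(r);
    delete this;
  }
};

struct Monitor {
  bool shutting_down = false;
  bool is_shutdown() const { return shutting_down; }
};

// Sketch of the helper: wrap a callback and skip it once the mon is shut down.
class C_MonContext final : public Context {
  const Monitor* mon_;
  std::function<void(int)> callback_;

 public:
  C_MonContext(const Monitor* mon, std::function<void(int)> callback)
      : mon_(mon), callback_(std::move(callback)) {}

  void finish(int r) override {
    if (mon_->is_shutdown())
      return;  // the single, centralized shutdown check
    callback_(r);
  }
};
```

Each call site then only supplies the work as a lambda, which is the shape of the expire_event and proposal_timer changes reviewed below.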
@liewegas fixed and repushed.
-    void finish(int r) override {
-      ps->proposal_timer = 0;
+  proposal_timer = new C_MonContext(mon, [this](int r) {
+      proposal_timer = 0;
This won't run if is_shutdown(). I think it's okay in this case, though, since the timer tolerates a cancel_event on a non-event, and who cares during shutdown.
-  expire_event = new C_ElectionExpire(this);
+  expire_event = new C_MonContext(mon, [this](int) {
+      expire();
+    });
omg so much better