mon: make MDSMonitor tolerant of slow mon elections #11167
Conversation
```cpp
  }
}

last_tick = now;
```
So this sets last_tick on peon monitors, which means if they get promoted to leader in less time than mds_beacon_interval+mds_beacon_grace they won't go through the beacon reset process...
Hmm, usually peons will have an empty last_beacon when they come up, as it's only populated in prepare_beacon and in the leader-specific later part of tick. That said, there's nothing to reset it when a leader goes back to being a peon, so one could end up with stale state there. Maybe we need to clear out the ephemeral bits like last_beacon explicitly when we go leader->peon.
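The leader-to-peon cleanup being discussed could be shaped roughly like this. This is only an illustrative sketch: the class, method names (`note_beacon`, `on_lose_leadership`), and plain `double`/`uint64_t` types are invented stand-ins for the real MDSMonitor's `utime_t` stamps and `mds_gid_t` keys.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical, simplified stand-in for MDSMonitor's ephemeral
// per-daemon beacon state.
struct BeaconInfo {
  double stamp = 0.0;
  uint64_t seq = 0;
};

class SlimMDSMonitor {
public:
  // While leader: record daemon liveness (analogous to prepare_beacon).
  void note_beacon(uint64_t gid, double now, uint64_t seq) {
    last_beacon[gid] = BeaconInfo{now, seq};
  }

  // One possible leader->peon transition hook: drop all leader-only
  // ephemeral state so a later re-election starts from a clean slate
  // instead of stale stamps from the previous term.
  void on_lose_leadership() {
    last_beacon.clear();
    last_tick = 0.0;
  }

  std::size_t tracked_daemons() const { return last_beacon.size(); }

private:
  std::map<uint64_t, BeaconInfo> last_beacon;
  double last_tick = 0.0;
};
```

The point is simply that `last_beacon` is leader-only state, so it is cleared on demotion rather than left to go stale.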
```diff
  utime_t cutoff = now;
  cutoff -= g_conf->mds_beacon_grace;

  // make sure last_beacon is fully populated
  for (const auto &p : pending_fsmap.mds_roles) {
    auto &gid = p.first;
    if (last_beacon.count(gid) == 0) {
-     last_beacon[gid].stamp = ceph_clock_now(g_ceph_context);
+     last_beacon[gid].stamp = now;
```
...and if we had a (stale) last_beacon map (like from being leader before, because another mon is flapping) then this doesn't get triggered. And down below we will mark the MDS as failed.
```cpp
// When did the mon last call into our tick() method? Used for detecting
// when the mon was not updating us for some period (e.g. during slow
// election) to reset last_beacon timeouts
utime_t last_tick;
```
So this needs to get reset on monitor elections or something. :/
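The tick-gap failsafe under discussion can be modeled as follows. This is a sketch, not the actual MDSMonitor code: the names `last_tick`, `last_beacon`, and `beacon_grace` mirror the PR, but `Tracker`, its `tick()` signature, and the plain `double` timestamps are invented for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Simplified model of the failsafe this patch adds to tick().
struct Tracker {
  std::map<uint64_t, double> last_beacon;  // gid -> last beacon stamp
  double last_tick = 0.0;
  double beacon_grace = 15.0;              // stand-in for mds_beacon_grace

  // Returns the gids that would be considered timed out at `now`.
  std::vector<uint64_t> tick(double now) {
    // If the monitor itself stalled (e.g. during a slow election) for
    // longer than the grace period, push every beacon stamp forward by
    // the stall duration instead of blaming the MDS daemons for the gap.
    if (last_tick != 0.0 && now - last_tick > beacon_grace) {
      const double shift = now - last_tick;
      for (auto &p : last_beacon)
        p.second += shift;
    }
    last_tick = now;

    std::vector<uint64_t> timed_out;
    for (const auto &p : last_beacon)
      if (now - p.second > beacon_grace)
        timed_out.push_back(p.first);
    return timed_out;
  }
};
```

A long gap between two `tick()` calls forgives all daemons for that interval, while a daemon that genuinely stops beaconing across normal ticks still times out.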
...yep! (wrote my earlier reply before reading to the end of your comments)
@jcsp ping
Updated, also made the last_beacon etc private while we're here.
Seems like we should still move the beacon reset code into a helper and call it explicitly when we win an election, so that the leader always has clean state? I think the failsafe tick vs last_tick check should stay, though, to handle the loady slow mon case...
@liewegas the intent of the last_beacon.clear() in on_restart is to reset the beacon state on winning an election -- I haven't followed through the paths that actually invoke on_restart(), so is there somewhere else?
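The election-win helper suggested here might look something like the sketch below. All names (`ElectionAwareMonitor`, `reset_beacon_state`, `win_election`, `known_gids`) are hypothetical; `known_gids` stands in for iterating `pending_fsmap.mds_roles`, and plain `double` stamps replace `utime_t`.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>

struct ElectionAwareMonitor {
  std::set<uint64_t> known_gids;           // stand-in for pending_fsmap.mds_roles
  std::map<uint64_t, double> last_beacon;  // gid -> beacon stamp

  // Reset helper: discard any stale stamps (e.g. from a previous term as
  // leader) and start every known daemon's timeout clock at `now`.
  void reset_beacon_state(double now) {
    last_beacon.clear();
    for (uint64_t gid : known_gids)
      last_beacon[gid] = now;
  }

  // Called when this monitor wins an election, so the leader always
  // begins with clean, fully populated beacon state.
  void win_election(double now) {
    reset_beacon_state(now);
  }
};
```

Calling the helper explicitly on election win makes the clean-state guarantee independent of whatever path happens to invoke on_restart().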
This was almost all public.

Signed-off-by: John Spray <john.spray@redhat.com>
Previously MDS daemons would get failed incorrectly when they appeared to have timed out due to delays in calling into MDSMonitor that were actually caused by e.g. slow leveldb writes leading to slow mon elections.

Fixes: http://tracker.ceph.com/issues/17308
Signed-off-by: John Spray <john.spray@redhat.com>
aha, yeah, lgtm!
Oops, I wrote Reviewed-by myself 😬