
mon: make MDSMonitor tolerant of slow mon elections #11167

Merged: 2 commits merged into ceph:master on Nov 16, 2016

Conversation

@jcsp (Contributor) commented on Sep 21, 2016

Previously, MDS daemons would get failed incorrectly when they appeared to have timed out due to delays in calling into MDSMonitor that were actually caused by e.g. slow leveldb writes leading to slow mon elections.

Fixes: http://tracker.ceph.com/issues/17308
Signed-off-by: John Spray john.spray@redhat.com

@jcsp added the bug-fix and cephfs (Ceph File System) labels on Sep 21, 2016
}
}

last_tick = now;
Member:

So this sets last_tick on peon monitors, which means if they get promoted to leader in less time than mds_beacon_interval+mds_beacon_grace they won't go through the beacon reset process...

jcsp (Contributor, author):

Hmm, usually peons will have an empty last_beacon when they come up, as it's only populated in prepare_beacon and in the leader-specific later part of tick. That said, there's nothing to reset it when a leader goes back to being a peon, so one could end up with stale state there. Maybe we need to clear out the ephemeral bits like last_beacon explicitly when we go leader->peon.

utime_t cutoff = now;
cutoff -= g_conf->mds_beacon_grace;

// make sure last_beacon is fully populated
for (const auto &p : pending_fsmap.mds_roles) {
  auto &gid = p.first;
  if (last_beacon.count(gid) == 0) {
    last_beacon[gid].stamp = now;  // was: ceph_clock_now(g_ceph_context)
Member:

...and if we had a (stale) last_beacon map (like from being leader before, because another mon is flapping) then this doesn't get triggered. And down below we will mark the MDS as failed.

// When did the mon last call into our tick() method? Used for detecting
// when the mon was not updating us for some period (e.g. during slow
// election) to reset last_beacon timeouts
utime_t last_tick;
Member:

So this needs to get reset on monitor elections or something. :/

jcsp (Contributor, author):

...yep! (wrote my earlier reply before reading to the end of your comments)
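For readers following the thread, here is a minimal standalone sketch of the failsafe being discussed. The names (BeaconFailsafe, grace, the std::chrono types) are hypothetical, not the actual MDSMonitor code, which works on pending_fsmap and utime_t; the point is the shape of the check: tick() records when it last ran, and if the gap since then exceeds the grace period the monitor itself was stalled (e.g. by a slow election), so beacon deadlines get pushed forward instead of failing daemons that never had a chance to report.

// Minimal sketch with hypothetical names (not the Ceph code): skip beacon
// timeouts when tick() itself was delayed, e.g. by a slow mon election.
#include <chrono>
#include <cstdint>
#include <map>

using Clock = std::chrono::steady_clock;

struct BeaconFailsafe {
  std::map<uint64_t, Clock::time_point> last_beacon;  // per-daemon beacon times
  Clock::time_point last_tick = Clock::now();         // when tick() last ran
  std::chrono::seconds grace{15};                      // stand-in for mds_beacon_grace

  void tick() {
    const auto now = Clock::now();
    const auto since_last = now - last_tick;
    last_tick = now;

    // If tick() itself was stalled, the daemons had no chance to be heard;
    // push their deadlines forward rather than penalising them for it.
    if (since_last > grace) {
      for (auto &p : last_beacon)
        p.second += since_last;
    }

    // Normal timeout check: anything still older than the grace period
    // genuinely missed its beacons.
    for (const auto &p : last_beacon) {
      if (now - p.second > grace) {
        // ...this is where an MDS would be marked failed...
      }
    }
  }
};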

@liewegas modified the milestone: kraken (Nov 3, 2016)
@liewegas (Member) commented on Nov 4, 2016

@jcsp ping

@jcsp (Contributor, author) commented on Nov 9, 2016

Updated; also made last_beacon etc. private while we're here.

@jcsp removed their assignment on Nov 9, 2016
@liewegas (Member) commented:
Seems like we should still move the beacon reset code into a helper and call it explicitly when we win an election, so that the leader always has clean state? I think the failsafe tick vs last_tick check should stay, though, to handle the loady slow mon case...

@jcsp (Contributor, author) commented on Nov 15, 2016

@liewegas the intent of the last_beacon.clear() in on_restart is to reset the beacon state on winning an election -- I haven't followed through the paths that actually invoke on_restart(), so is there somewhere else?
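To make the reset-on-election point concrete, here is a matching sketch, again with hypothetical names and building on the BeaconFailsafe struct sketched earlier rather than the actual on_restart() implementation: clear the per-daemon map and restart the tick clock whenever an election changes this monitor's role, so a freshly elected leader never acts on timestamps left over from a previous term.

// Sketch continued (hypothetical, not the Ceph code): reset ephemeral beacon
// state when an election changes this monitor's role.
void reset_beacon_state(BeaconFailsafe &f) {
  f.last_beacon.clear();       // forget per-daemon stamps from the previous term
  f.last_tick = Clock::now();  // restart the stall detector
}

// A monitor service would call something like this from its election hook
// (an on_restart()-style callback), both on winning and on being demoted back
// to peon, so the leader->peon->leader case never sees stale state.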

John Spray added 2 commits on November 15, 2016 at 23:09

First commit: This was almost all public.

Signed-off-by: John Spray <john.spray@redhat.com>
Second commit: Previously, MDS daemons would get failed incorrectly when they appeared to have timed out due to delays in calling into MDSMonitor that were actually caused by e.g. slow leveldb writes leading to slow mon elections.

Fixes: http://tracker.ceph.com/issues/17308
Signed-off-by: John Spray <john.spray@redhat.com>
@liewegas (Member) commented via email on Nov 15, 2016

@jcsp merged commit f4fda3f into ceph:master on Nov 16, 2016
@jcsp deleted the wip-17308 branch on November 16, 2016 at 11:59
@jcsp (Contributor, author) commented on Nov 16, 2016

Oops, I wrote Reviewed-by myself 😬
