mon: make MDSMonitor tolerant of slow mon elections #11167
Conversation
```cpp
  }
}

last_tick = now;
```
So this sets last_tick on peon monitors, which means if they get promoted to leader in less time than mds_beacon_interval+mds_beacon_grace they won't go through the beacon reset process...
Hmm, usually peons will have an empty last_beacon when they come up, as it's only populated in prepare_beacon and in the leader-specific later part of tick. That said, there's nothing to reset it when a leader goes back to being a peon, so one could end up with stale state there. Maybe we need to clear out the ephemeral bits like last_beacon explicitly when we go leader->peon.
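The leader-to-peon cleanup being discussed could be shaped roughly like this. This is only an illustrative sketch: the class, method names (`note_beacon`, `on_lose_leadership`), and plain `double`/`uint64_t` types are invented stand-ins for the real MDSMonitor's `utime_t` stamps and `mds_gid_t` keys.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical, simplified stand-in for MDSMonitor's ephemeral
// per-daemon beacon state.
struct BeaconInfo {
  double stamp = 0.0;
  uint64_t seq = 0;
};

class SlimMDSMonitor {
public:
  // While leader: record daemon liveness (analogous to prepare_beacon).
  void note_beacon(uint64_t gid, double now, uint64_t seq) {
    last_beacon[gid] = BeaconInfo{now, seq};
  }

  // One possible leader->peon transition hook: drop all leader-only
  // ephemeral state so a later re-election starts from a clean slate
  // instead of stale stamps from the previous term.
  void on_lose_leadership() {
    last_beacon.clear();
    last_tick = 0.0;
  }

  std::size_t tracked_daemons() const { return last_beacon.size(); }

private:
  std::map<uint64_t, BeaconInfo> last_beacon;
  double last_tick = 0.0;
};
```

The point is simply that `last_beacon` is leader-only state, so it is cleared on demotion rather than left to go stale.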
```diff
  utime_t cutoff = now;
  cutoff -= g_conf->mds_beacon_grace;

  // make sure last_beacon is fully populated
  for (const auto &p : pending_fsmap.mds_roles) {
    auto &gid = p.first;
    if (last_beacon.count(gid) == 0) {
-     last_beacon[gid].stamp = ceph_clock_now(g_ceph_context);
+     last_beacon[gid].stamp = now;
```
...and if we had a (stale) last_beacon map (like from being leader before, because another mon is flapping) then this doesn't get triggered. And down below we will mark the MDS as failed.
```cpp
// When did the mon last call into our tick() method? Used for detecting
// when the mon was not updating us for some period (e.g. during slow
// election) to reset last_beacon timeouts
utime_t last_tick;
```
So this needs to get reset on monitor elections or something. :/
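The tick-gap failsafe under discussion can be modeled as follows. This is a sketch, not the actual MDSMonitor code: the names `last_tick`, `last_beacon`, and `beacon_grace` mirror the PR, but `Tracker`, its `tick()` signature, and the plain `double` timestamps are invented for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Simplified model of the failsafe this patch adds to tick().
struct Tracker {
  std::map<uint64_t, double> last_beacon;  // gid -> last beacon stamp
  double last_tick = 0.0;
  double beacon_grace = 15.0;              // stand-in for mds_beacon_grace

  // Returns the gids that would be considered timed out at `now`.
  std::vector<uint64_t> tick(double now) {
    // If the monitor itself stalled (e.g. during a slow election) for
    // longer than the grace period, push every beacon stamp forward by
    // the stall duration instead of blaming the MDS daemons for the gap.
    if (last_tick != 0.0 && now - last_tick > beacon_grace) {
      const double shift = now - last_tick;
      for (auto &p : last_beacon)
        p.second += shift;
    }
    last_tick = now;

    std::vector<uint64_t> timed_out;
    for (const auto &p : last_beacon)
      if (now - p.second > beacon_grace)
        timed_out.push_back(p.first);
    return timed_out;
  }
};
```

A long gap between two `tick()` calls forgives all daemons for that interval, while a daemon that genuinely stops beaconing across normal ticks still times out.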
...yep! (wrote my earlier reply before reading to the end of your comments)
@jcsp ping
Updated, also made the last_beacon etc private while we're here.
Seems like we should still move the beacon reset code into a helper and call it explicitly when we win an election, so that the leader always has clean state? I think the failsafe tick vs last_tick check should stay, though, to handle the loady slow mon case...
@liewegas the intent of the last_beacon.clear() in on_restart is to reset the beacon state on winning an election -- I haven't followed through the paths that actually invoke on_restart(), so is there somewhere else?
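The election-win helper suggested here might look something like the sketch below. All names (`ElectionAwareMonitor`, `reset_beacon_state`, `win_election`, `known_gids`) are hypothetical; `known_gids` stands in for iterating `pending_fsmap.mds_roles`, and plain `double` stamps replace `utime_t`.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>

struct ElectionAwareMonitor {
  std::set<uint64_t> known_gids;           // stand-in for pending_fsmap.mds_roles
  std::map<uint64_t, double> last_beacon;  // gid -> beacon stamp

  // Reset helper: discard any stale stamps (e.g. from a previous term as
  // leader) and start every known daemon's timeout clock at `now`.
  void reset_beacon_state(double now) {
    last_beacon.clear();
    for (uint64_t gid : known_gids)
      last_beacon[gid] = now;
  }

  // Called when this monitor wins an election, so the leader always
  // begins with clean, fully populated beacon state.
  void win_election(double now) {
    reset_beacon_state(now);
  }
};
```

Calling the helper explicitly on election win makes the clean-state guarantee independent of whatever path happens to invoke on_restart().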
This was almost all public.

Signed-off-by: John Spray <john.spray@redhat.com>
Previously MDS daemons would get failed incorrectly when they appeared to have timed out due to delays in calling into MDSMonitor that were actually caused by e.g. slow leveldb writes leading to slow mon elections.

Fixes: http://tracker.ceph.com/issues/17308
Signed-off-by: John Spray <john.spray@redhat.com>
aha, yeah, lgtm!
Oops, I wrote Reviewed-by myself 😬