osd: restart boot process if waiting for luminous mons #16341

Merged
merged 1 commit into ceph:master on Jul 17, 2017

liewegas

If we start_boot and see that we don't have luminous mons, we will stop.
But we don't currently reliably notice when the luminous upgrade completes.
If we happen to be connected to the last mon we will start_boot() because
of the trigger in ms_handle_connect(), but if we are not connected to the
last mon we'll eventually get a monmap update but not restart booting.

Fix by setting a flag if we are waiting, and restart boot if the flag is
set, we are in preboot, and we see we now have luminous mons.

Fixes: http://tracker.ceph.com/issues/20631
Signed-off-by: Sage Weil <sage@redhat.com>
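
The flag-based fix described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the patch itself: `MiniOsd`, `State`, `waiting_for_luminous_mons`, `handle_monmap`, and the boolean `mons_are_luminous` parameter are hypothetical stand-ins for the real OSD state and monmap feature checks.

```cpp
// Hypothetical, simplified model of the OSD boot logic.
// None of these names are taken from the actual Ceph patch.
enum class State { Preboot, Booting };

struct MiniOsd {
  State state = State::Preboot;
  bool waiting_for_luminous_mons = false;  // the flag the description mentions
  int boot_attempts = 0;

  // Try to boot. If the mons are not all luminous yet, stop and
  // remember that we are waiting for the upgrade to complete.
  void start_boot(bool mons_are_luminous) {
    ++boot_attempts;
    if (!mons_are_luminous) {
      waiting_for_luminous_mons = true;
      return;  // remain in preboot
    }
    waiting_for_luminous_mons = false;
    state = State::Booting;
  }

  // Called on every monmap update, whichever mon we are connected to.
  // Restart boot only if all three conditions hold: the flag is set,
  // we are still in preboot, and the mons are now luminous.
  void handle_monmap(bool mons_are_luminous) {
    if (waiting_for_luminous_mons && state == State::Preboot &&
        mons_are_luminous) {
      start_boot(mons_are_luminous);
    }
  }
};
```

In this model a monmap update that still lacks luminous mons leaves the OSD untouched, while the first update that reports them restarts boot exactly once.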

jecluis commented Jul 14, 2017

OSD asserted on restart. I'll upload the log file to the ticket.

@liewegas

Works with my hacky test, which involves applying

diff --git a/src/boost b/src/boost
--- a/src/boost
+++ b/src/boost
@@ -1 +1 @@
-Subproject commit 1790aff3b34374d2af85f8c16755d101f49d2b6e
+Subproject commit 1790aff3b34374d2af85f8c16755d101f49d2b6e-dirty
diff --git a/src/mon/MonmapMonitor.cc b/src/mon/MonmapMonitor.cc
index 7a0fb68..5090af6 100644
--- a/src/mon/MonmapMonitor.cc
+++ b/src/mon/MonmapMonitor.cc
@@ -191,7 +191,10 @@ void MonmapMonitor::on_active()
   if (mon->is_leader())
     mon->clog->info() << "monmap " << *mon->monmap;
 
-  apply_mon_features(mon->get_quorum_mon_features());
+  utime_t after = mon->monmap->created;
+  after += 60;
+  if (ceph_clock_now() >= after)
+    apply_mon_features(mon->get_quorum_mon_features());
 }
 
 bool MonmapMonitor::preprocess_query(MonOpRequestRef op)
diff --git a/src/mon/OSDMonitor.cc b/src/mon/OSDMonitor.cc
index 4a613ae..71f4f40 100644
--- a/src/mon/OSDMonitor.cc
+++ b/src/mon/OSDMonitor.cc
@@ -1268,7 +1268,7 @@ version_t OSDMonitor::get_trim_to()
     dout(10) << __func__ << ": quorum not formed" << dendl;
     return 0;
   }
-
+  return 0;
   epoch_t floor;
   if (mon->monmap->get_required_features().contains_all(
         ceph::features::mon::FEATURE_LUMINOUS)) {

and running with

MON=3 OSD=1 MDS=1 ../src/vstart.sh -d -n -x -l -o 'mon_debug_no_initial_persistent_features = true' -o 'mon debug no require luminous = true'

verifying the OSD is blocked, waiting a minute, then restarting the mon that the OSD isn't connected to in order to trigger an election.
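
The MonmapMonitor hunk in the test patch amounts to a simple time gate: mon features are applied only once 60 seconds have passed since the monmap was created, which leaves a window in which the OSD can be observed blocked. A minimal standalone sketch of that gate, assuming `std::chrono` in place of Ceph's `utime_t` and a hypothetical helper name `features_may_be_applied`:

```cpp
#include <chrono>

using Clock = std::chrono::system_clock;

// Hypothetical helper mirroring the check in the test patch:
// returns true once `delay` has elapsed since the monmap's creation time.
inline bool features_may_be_applied(
    Clock::time_point created, Clock::time_point now,
    std::chrono::seconds delay = std::chrono::seconds(60)) {
  return now >= created + delay;
}
```

Note that in the patch the check runs only when `on_active()` is called, which is presumably why the test restarts a mon after the wait to trigger an election.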

liewegas merged commit b8737fa into ceph:master Jul 17, 2017
liewegas deleted the wip-20631 branch July 17, 2017 15:04