osd: restart boot process if waiting for luminous mons #16341

Merged
liewegas merged 1 commit into ceph:master from liewegas:wip-20631 on Jul 17, 2017

liewegas commented Jul 14, 2017

If we start_boot and see that we don't have luminous mons, we will stop.
But we don't currently reliably notice when the luminous upgrade completes.
If we happen to be connected to the last mon we will start_boot() because
of the trigger in ms_handle_connect(), but if we are not connected to the
last mon we'll eventually get a monmap update but not restart booting.
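The pre-fix behavior can be sketched as a small C++ model. This is a hypothetical simplification, not Ceph code: `OsdBeforeFix`, `State`, `MonMap::luminous` (standing in for "the mons' required features include luminous"), and `handle_monmap()` are illustrative names. Only `start_boot()` and `ms_handle_connect()` come from this PR, and their signatures here are simplified. The sketch shows that a monmap update alone leaves the OSD in preboot, and only a new mon connection resumes booting.

```cpp
// Hypothetical, simplified model of the pre-fix OSD boot logic.
enum class State { PREBOOT, BOOTING };

struct MonMap {
  bool luminous = false;  // stands in for "required mon features include LUMINOUS"
};

class OsdBeforeFix {
 public:
  State state() const { return state_; }

  void start_boot() {
    if (!monmap_.luminous) {
      return;  // mons are not luminous yet: stop, staying in PREBOOT
    }
    state_ = State::BOOTING;
  }

  // The only retry path: (re)connecting to a mon calls start_boot() again.
  void ms_handle_connect() { start_boot(); }

  // A monmap update is recorded, but nothing restarts booting.
  void handle_monmap(const MonMap& m) { monmap_ = m; }

 private:
  State state_ = State::PREBOOT;
  MonMap monmap_;
};
```

In this model an OSD that stays connected to a mon that was already upgraded never calls `start_boot()` again, which is the hang the description above refers to.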

Fix by setting a flag when we stop to wait, and restarting boot if the flag is
set, we are in preboot, and we see that we now have luminous mons.

Fixes: http://tracker.ceph.com/issues/20631
Signed-off-by: Sage Weil <sage@redhat.com>
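The fix described above can be sketched with the same kind of model. Again this is a hypothetical simplification, not the merged commit: `Osd`, `State`, `MonMap::luminous`, `handle_monmap()`, and the `waiting_for_luminous_mons_` member are illustrative names. What it shows is the three-condition check from the description: the flag is set, the OSD is still in preboot, and the new monmap reports luminous mons.

```cpp
// Hypothetical, simplified model of the fix; names are illustrative.
enum class State { PREBOOT, BOOTING };

struct MonMap {
  bool luminous = false;  // stands in for "required mon features include LUMINOUS"
};

class Osd {
 public:
  State state() const { return state_; }
  bool waiting_for_luminous_mons() const { return waiting_for_luminous_mons_; }

  void start_boot() {
    if (!monmap_.luminous) {
      // Stop, but remember why so a later monmap update can resume booting.
      waiting_for_luminous_mons_ = true;
      return;  // stay in PREBOOT
    }
    waiting_for_luminous_mons_ = false;
    state_ = State::BOOTING;
  }

  // Unchanged trigger: a new mon connection retries the boot.
  void ms_handle_connect() { start_boot(); }

  // New trigger: retry on a monmap update if we were waiting, are still
  // in preboot, and the new map shows luminous mons.
  void handle_monmap(const MonMap& m) {
    monmap_ = m;
    if (waiting_for_luminous_mons_ && state_ == State::PREBOOT &&
        monmap_.luminous) {
      start_boot();
    }
  }

 private:
  State state_ = State::PREBOOT;
  MonMap monmap_;
  bool waiting_for_luminous_mons_ = false;
};
```

Because the retry hangs off monmap updates rather than new connections, it no longer matters which mon the OSD happens to be connected to when the upgrade completes.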

liewegas requested a review from jecluis Jul 14, 2017

jecluis commented Jul 14, 2017

OSD asserted on restart. I'll upload the log file to the ticket.

liewegas commented Jul 14, 2017

Works with my hacky test, which involves this diff:

```diff
diff --git a/src/boost b/src/boost
--- a/src/boost
+++ b/src/boost
@@ -1 +1 @@
-Subproject commit 1790aff3b34374d2af85f8c16755d101f49d2b6e
+Subproject commit 1790aff3b34374d2af85f8c16755d101f49d2b6e-dirty
diff --git a/src/mon/MonmapMonitor.cc b/src/mon/MonmapMonitor.cc
index 7a0fb68..5090af6 100644
--- a/src/mon/MonmapMonitor.cc
+++ b/src/mon/MonmapMonitor.cc
@@ -191,7 +191,10 @@ void MonmapMonitor::on_active()
   if (mon->is_leader())
     mon->clog->info() << "monmap " << *mon->monmap;
 
-  apply_mon_features(mon->get_quorum_mon_features());
+  utime_t after = mon->monmap->created;
+  after += 60;
+  if (ceph_clock_now() >= after)
+    apply_mon_features(mon->get_quorum_mon_features());
 }
 
 bool MonmapMonitor::preprocess_query(MonOpRequestRef op)
diff --git a/src/mon/OSDMonitor.cc b/src/mon/OSDMonitor.cc
index 4a613ae..71f4f40 100644
--- a/src/mon/OSDMonitor.cc
+++ b/src/mon/OSDMonitor.cc
@@ -1268,7 +1268,7 @@ version_t OSDMonitor::get_trim_to()
     dout(10) << __func__ << ": quorum not formed" << dendl;
     return 0;
   }
-
+  return 0;
   epoch_t floor;
   if (mon->monmap->get_required_features().contains_all(
         ceph::features::mon::FEATURE_LUMINOUS)) {
```

and running with:

```shell
MON=3 OSD=1 MDS=1 ../src/vstart.sh -d -n -x -l -o 'mon_debug_no_initial_persistent_features = true' -o 'mon debug no require luminous = true'
```

Then I verify that the OSD is blocked, wait a minute, and restart the mon the OSD isn't connected to in order to trigger an election.
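Why the test needs both the one-minute wait and an election can be sketched as follows. This is a hypothetical model, not Ceph code: `HackedMonmapMonitor`, `features_applied()`, and the use of plain seconds instead of `utime_t`/`ceph_clock_now()` are assumptions, and it assumes `on_active()` does not run again by itself once the quorum is stable. With the hack applied, the quorum's features are withheld until the monmap is 60 seconds old, but that check only happens inside `on_active()`, so waiting alone changes nothing until an election runs it again. Restarting a mon the OSD is not connected to means the OSD learns about the change only through a monmap update, not through `ms_handle_connect()`, which is the path this PR fixes.

```cpp
// Hypothetical model of the test-only hack in MonmapMonitor::on_active().
// Plain seconds replace Ceph's utime_t and ceph_clock_now().
class HackedMonmapMonitor {
 public:
  explicit HackedMonmapMonitor(double monmap_created)
      : monmap_created_(monmap_created) {}

  bool features_applied() const { return features_applied_; }

  // The 60 s gate is only evaluated when on_active() runs, which in this
  // test means after an election.
  void on_active(double now) {
    double after = monmap_created_ + 60.0;
    if (now >= after) {
      features_applied_ = true;  // stands in for apply_mon_features(...)
    }
  }

 private:
  double monmap_created_;
  bool features_applied_ = false;
};
```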

liewegas commented Jul 17, 2017

liewegas merged commit b8737fa into ceph:master Jul 17, 2017

3 of 4 checks passed:

make check (arm64): make check failed
Signed-off-by: all commits in this PR are signed
Unmodified Submodules: submodules for project are unmodified
make check: make check succeeded

liewegas deleted the liewegas:wip-20631 branch Jul 17, 2017
