osd: make the PG's SORTBITWISE assert a more generous shutdown #18047

gregsfortytwo · 2017-09-29T22:36:06Z

We want to stop working if we get activated while sortbitwise is not set
on the cluster, but we might have old maps where it wasn't if the flag
was changed recently. Instead, if SORTBITWISE is not set, further check
if the OSD is active and the map we are now looking at is newer than
when the OSD booted. Do a more polite shutdown with logged error message
instead of asserting.

Fixes: http://tracker.ceph.com/issues/20416

Signed-off-by: Greg Farnum gfarnum@redhat.com

liewegas · 2017-10-02T02:12:33Z

src/osd/PG.cc

-  assert(osdmap->test_flag(CEPH_OSDMAP_SORTBITWISE));
+  // we don't speak non-sortbitwise, so we'd better not go
+  // active if they somehow haven't set it!
+  if (!osdmap->test_flag(CEPH_OSDMAP_SORTBITWISE) ||


I think || should be && ?

Yeah, changing

liewegas · 2017-10-02T02:16:02Z

src/osd/PG.cc

+  if (!osdmap->test_flag(CEPH_OSDMAP_SORTBITWISE) ||
+	 // but we have to be nice in case we're sortbitwise *now*
+	 // and are processing updates from before that happened
+      (osd->osd->is_active() && osdmap->get_epoch() < osd->get_boot_epoch())) {


..but this condition is..weird. osdmap is the PG's osdmap, but that may lag, so maybe we might be processing the PG interval after the osd is active but PG is still catching up?

But perhaps it's simpler to just drop this assert entirely! We have one in OSD::activate_map():

if (!osdmap->test_flag(CEPH_OSDMAP_SORTBITWISE)) { derr << __func__ << " SORTBITWISE flag is not set" << dendl; ceph_abort(); }

so I think any check here is redundant.

So the point here is that we only want to bail out if we're doing work in a PG. So we check for SORTBITWISE missing, AND that we're working in the (global) current interval. Our best local approximation for that involves making sure the OSD is active and that our map is newer than when the OSD went active...
and now I see that this condition is also backwards. No idea what the heck I was doing when I wrote this code. :/ Also changing.

gregsfortytwo · 2017-10-04T17:57:39Z

src/osd/PG.cc

+  if (!osdmap->test_flag(CEPH_OSDMAP_SORTBITWISE) ||
+	 // but we have to be nice in case we're sortbitwise *now*
+	 // and are processing updates from before that happened
+      (osd->osd->is_active() && osdmap->get_epoch() < osd->get_boot_epoch())) {


In my first pass at this I did just drop the PG assert. I ended up not doing that because both this and the OSD::activate_map() one were inserted as part of one PR from you, and they're accomplishing slightly different things — the broader OSD one is called after the OSDMap gets published, which I think means it can leak out to PGs and get used before the main OSD manages to shut everything down. Not sure if there are other invariants they're covering or not.

gregsfortytwo · 2017-10-04T17:58:53Z

@liewegas, updated.

liewegas · 2017-10-04T19:11:15Z

how about moving the assert out of activate_map and instead putting it before the call to consume_map()? then we know we won't feed !sortbitwise maps to pgs while we are active:

diff --git a/src/osd/OSD.cc b/src/osd/OSD.cc
index b53ff58177..1561357a0f 100644
--- a/src/osd/OSD.cc
+++ b/src/osd/OSD.cc
@@ -7758,6 +7758,11 @@ void OSD::_committed_osd_maps(epoch_t first, epoch_t last, MOSDMap *m)
   check_osdmap_features(store);
 
   // yay!
+  if (is_active() && !osdmap->test_flag(CEPH_OSDMAP_SORTBITWISE)) {
+    derr << __func__ << " SORTBITWISE flag is not set" << dendl;
+    ceph_abort();
+  }
+
   consume_map();
 
   if (is_active() || is_waiting_for_healthy())

The pg peering thread asserting basedon the osd active state and epochs just feels very fragile to me...

gregsfortytwo · 2017-10-04T19:59:01Z

Yeah, that works for me if it works for you.

gregsfortytwo · 2017-10-04T21:11:32Z

Updated again; I can squash down and try to figure out some user testing if this looks good.

liewegas · 2017-10-04T21:44:46Z

src/osd/OSD.cc

+   */
+  if (!osdmap->test_flag(CEPH_OSDMAP_SORTBITWISE)) {
+    epoch_t boot = service.get_boot_epoch();
+    if (boot && boot <= osdmap->get_epoch()) {


Can we just condition this on is_active() instead? I think the piece that matters is us communicating iwth other OSDs, which is controlled here:

void OSD::dispatch_context(PG::RecoveryCtx &ctx, PG *pg, OSDMapRef curmap, ThreadPool::TPHandle *handle) { if (service.get_osdmap()->is_up(whoami) && is_active()) { do_notifies(*ctx.notify_list, curmap); do_queries(*ctx.query_map, curmap); do_infos(*ctx.info_map, curmap); }

..and that is conditioned on is_active() (and is_up()... although i'm not quite sure when that diverges from is_active()).

I don't think boot epoch is reset to 0 if we get marked down, which means we would abort trying to chew through maps while we were down if the flag is set then, whereas i think we only should crash if we get marked up and it is set?

I don't think the cases where we are marked down and don't update boot_epoch will matter — if sortbitwise gets turned off we want to complain; and if it gets turned back on we'll happily churn through when restarted — but I can make that change. There was some reason not to when doing it in the PG case but it works here.

(We have the double-check in that condition because we can prepublish maps before doing enough processing on them to make sure we're still supposed to be active.)

yuriw · 2017-10-05T15:48:48Z

test this please

yuriw · 2017-10-05T16:42:28Z

wip-yuri-testing2-2017-10-05-1641

We want to stop working if we get activated while sortbitwise is not set on the cluster, but we might have old maps where it wasn't if the flag was changed recently. And doing it in the PG code was a bit silly anyway. Instead check SORTBITWISE in the main OSDMap handling code prior to prepublishing it. Let it go through if we aren't active at the time. Fixes: http://tracker.ceph.com/issues/20416 Signed-off-by: Greg Farnum <gfarnum@redhat.com>

gregsfortytwo · 2017-10-05T17:59:46Z

Squashed down into one commit for easy backporting.

smithfarm · 2017-11-08T15:33:17Z

luminous backport PR: #18132

gregsfortytwo added bug-fix core needs-review labels Sep 29, 2017

gregsfortytwo requested review from liewegas and jdurgin September 29, 2017 22:36

liewegas requested changes Oct 2, 2017

View reviewed changes

gregsfortytwo commented Oct 4, 2017

View reviewed changes

gregsfortytwo force-pushed the wip-20416-bitwise-assert branch from d7f7c56 to 89bfe65 Compare October 4, 2017 17:58

gregsfortytwo force-pushed the wip-20416-bitwise-assert branch from 89bfe65 to e2a4b37 Compare October 4, 2017 21:10

liewegas reviewed Oct 4, 2017

View reviewed changes

gregsfortytwo force-pushed the wip-20416-bitwise-assert branch from e2a4b37 to 595f98f Compare October 4, 2017 21:59

liewegas approved these changes Oct 5, 2017

View reviewed changes

liewegas added needs-qa and removed needs-review labels Oct 5, 2017

yuriw added the wip-yuri2-testing label Oct 5, 2017

gregsfortytwo force-pushed the wip-20416-bitwise-assert branch from 595f98f to 0a691b2 Compare October 5, 2017 17:59

yuriw merged commit 5dc2dea into ceph:master Oct 6, 2017

gregsfortytwo deleted the wip-20416-bitwise-assert branch November 28, 2023 17:30

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

osd: make the PG's SORTBITWISE assert a more generous shutdown #18047

osd: make the PG's SORTBITWISE assert a more generous shutdown #18047

gregsfortytwo commented Sep 29, 2017

liewegas Oct 2, 2017

gregsfortytwo Oct 4, 2017

liewegas Oct 2, 2017

gregsfortytwo Oct 4, 2017

gregsfortytwo Oct 4, 2017

gregsfortytwo commented Oct 4, 2017

liewegas commented Oct 4, 2017

gregsfortytwo commented Oct 4, 2017

gregsfortytwo commented Oct 4, 2017

liewegas Oct 4, 2017

gregsfortytwo Oct 4, 2017

yuriw commented Oct 5, 2017

yuriw commented Oct 5, 2017

gregsfortytwo commented Oct 5, 2017

smithfarm commented Nov 8, 2017

osd: make the PG's SORTBITWISE assert a more generous shutdown #18047

osd: make the PG's SORTBITWISE assert a more generous shutdown #18047

Conversation

gregsfortytwo commented Sep 29, 2017

liewegas Oct 2, 2017

Choose a reason for hiding this comment

gregsfortytwo Oct 4, 2017

Choose a reason for hiding this comment

liewegas Oct 2, 2017

Choose a reason for hiding this comment

gregsfortytwo Oct 4, 2017

Choose a reason for hiding this comment

gregsfortytwo Oct 4, 2017

Choose a reason for hiding this comment

gregsfortytwo commented Oct 4, 2017

liewegas commented Oct 4, 2017

gregsfortytwo commented Oct 4, 2017

gregsfortytwo commented Oct 4, 2017

liewegas Oct 4, 2017

Choose a reason for hiding this comment

gregsfortytwo Oct 4, 2017

Choose a reason for hiding this comment

yuriw commented Oct 5, 2017

yuriw commented Oct 5, 2017

gregsfortytwo commented Oct 5, 2017

smithfarm commented Nov 8, 2017