Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

osd: make the PG's SORTBITWISE assert a more generous shutdown #18047

Merged
merged 1 commit into from Oct 6, 2017

Conversation

gregsfortytwo
Copy link
Member

We want to stop working if we get activated while sortbitwise is not set
on the cluster, but we might have old maps where it wasn't if the flag
was changed recently. Instead, if SORTBITWISE is not set, further check
if the OSD is active and the map we are now looking at is newer than
when the OSD booted. Do a more polite shutdown with logged error message
instead of asserting.

Fixes: http://tracker.ceph.com/issues/20416

Signed-off-by: Greg Farnum gfarnum@redhat.com

src/osd/PG.cc Outdated
assert(osdmap->test_flag(CEPH_OSDMAP_SORTBITWISE));
// we don't speak non-sortbitwise, so we'd better not go
// active if they somehow haven't set it!
if (!osdmap->test_flag(CEPH_OSDMAP_SORTBITWISE) ||
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think || should be && ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, changing

src/osd/PG.cc Outdated
if (!osdmap->test_flag(CEPH_OSDMAP_SORTBITWISE) ||
// but we have to be nice in case we're sortbitwise *now*
// and are processing updates from before that happened
(osd->osd->is_active() && osdmap->get_epoch() < osd->get_boot_epoch())) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

..but this condition is..weird. osdmap is the PG's osdmap, but that may lag, so maybe we might be processing the PG interval after the osd is active but PG is still catching up?

But perhaps it's simpler to just drop this assert entirely! We have one in OSD::activate_map():

  if (!osdmap->test_flag(CEPH_OSDMAP_SORTBITWISE)) {
    derr << __func__ << " SORTBITWISE flag is not set" << dendl;
    ceph_abort();
  }

so I think any check here is redundant.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the point here is that we only want to bail out if we're doing work in a PG. So we check for SORTBITWISE missing, AND that we're working in the (global) current interval. Our best local approximation for that involves making sure the OSD is active and that our map is newer than when the OSD went active...
and now I see that this condition is also backwards. No idea what the heck I was doing when I wrote this code. :/ Also changing.

src/osd/PG.cc Outdated
if (!osdmap->test_flag(CEPH_OSDMAP_SORTBITWISE) ||
// but we have to be nice in case we're sortbitwise *now*
// and are processing updates from before that happened
(osd->osd->is_active() && osdmap->get_epoch() < osd->get_boot_epoch())) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my first pass at this I did just drop the PG assert. I ended up not doing that because both this and the OSD::activate_map() one were inserted as part of one PR from you, and they're accomplishing slightly different things — the broader OSD one is called after the OSDMap gets published, which I think means it can leak out to PGs and get used before the main OSD manages to shut everything down. Not sure if there are other invariants they're covering or not.

@gregsfortytwo
Copy link
Member Author

@liewegas, updated.

@liewegas
Copy link
Member

liewegas commented Oct 4, 2017

how about moving the assert out of activate_map and instead putting it before the call to consume_map()? then we know we won't feed !sortbitwise maps to pgs while we are active:

diff --git a/src/osd/OSD.cc b/src/osd/OSD.cc
index b53ff58177..1561357a0f 100644
--- a/src/osd/OSD.cc
+++ b/src/osd/OSD.cc
@@ -7758,6 +7758,11 @@ void OSD::_committed_osd_maps(epoch_t first, epoch_t last, MOSDMap *m)
   check_osdmap_features(store);
 
   // yay!
+  if (is_active() && !osdmap->test_flag(CEPH_OSDMAP_SORTBITWISE)) {
+    derr << __func__ << " SORTBITWISE flag is not set" << dendl;
+    ceph_abort();
+  }
+
   consume_map();
 
   if (is_active() || is_waiting_for_healthy())

The pg peering thread asserting basedon the osd active state and epochs just feels very fragile to me...

@gregsfortytwo
Copy link
Member Author

Yeah, that works for me if it works for you.

@gregsfortytwo
Copy link
Member Author

Updated again; I can squash down and try to figure out some user testing if this looks good.

src/osd/OSD.cc Outdated
*/
if (!osdmap->test_flag(CEPH_OSDMAP_SORTBITWISE)) {
epoch_t boot = service.get_boot_epoch();
if (boot && boot <= osdmap->get_epoch()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just condition this on is_active() instead? I think the piece that matters is us communicating iwth other OSDs, which is controlled here:

void OSD::dispatch_context(PG::RecoveryCtx &ctx, PG *pg, OSDMapRef curmap,
                           ThreadPool::TPHandle *handle)
{
  if (service.get_osdmap()->is_up(whoami) &&
      is_active()) {
    do_notifies(*ctx.notify_list, curmap);
    do_queries(*ctx.query_map, curmap);
    do_infos(*ctx.info_map, curmap);
  }

..and that is conditioned on is_active() (and is_up()... although i'm not quite sure when that diverges from is_active()).

I don't think boot epoch is reset to 0 if we get marked down, which means we would abort trying to chew through maps while we were down if the flag is set then, whereas i think we only should crash if we get marked up and it is set?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the cases where we are marked down and don't update boot_epoch will matter — if sortbitwise gets turned off we want to complain; and if it gets turned back on we'll happily churn through when restarted — but I can make that change. There was some reason not to when doing it in the PG case but it works here.

(We have the double-check in that condition because we can prepublish maps before doing enough processing on them to make sure we're still supposed to be active.)

@yuriw
Copy link
Contributor

yuriw commented Oct 5, 2017

test this please

@yuriw
Copy link
Contributor

yuriw commented Oct 5, 2017

We want to stop working if we get activated while sortbitwise is not set
on the cluster, but we might have old maps where it wasn't if the flag
was changed recently. And doing it in the PG code was a bit silly anyway.

Instead check SORTBITWISE in the main OSDMap handling code prior to
prepublishing it. Let it go through if we aren't active at the time.

Fixes: http://tracker.ceph.com/issues/20416

Signed-off-by: Greg Farnum <gfarnum@redhat.com>
@gregsfortytwo
Copy link
Member Author

Squashed down into one commit for easy backporting.

@yuriw yuriw merged commit 5dc2dea into ceph:master Oct 6, 2017
@smithfarm
Copy link
Contributor

luminous backport PR: #18132

@gregsfortytwo gregsfortytwo deleted the wip-20416-bitwise-assert branch November 28, 2023 17:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants