New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
osd: make the PG's SORTBITWISE assert a more generous shutdown #18047
Conversation
src/osd/PG.cc
Outdated
assert(osdmap->test_flag(CEPH_OSDMAP_SORTBITWISE)); | ||
// we don't speak non-sortbitwise, so we'd better not go | ||
// active if they somehow haven't set it! | ||
if (!osdmap->test_flag(CEPH_OSDMAP_SORTBITWISE) || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think || should be && ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, changing
src/osd/PG.cc
Outdated
if (!osdmap->test_flag(CEPH_OSDMAP_SORTBITWISE) || | ||
// but we have to be nice in case we're sortbitwise *now* | ||
// and are processing updates from before that happened | ||
(osd->osd->is_active() && osdmap->get_epoch() < osd->get_boot_epoch())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
..but this condition is..weird. osdmap is the PG's osdmap, but that may lag, so maybe we might be processing the PG interval after the osd is active but PG is still catching up?
But perhaps it's simpler to just drop this assert entirely! We have one in OSD::activate_map():
if (!osdmap->test_flag(CEPH_OSDMAP_SORTBITWISE)) { derr << __func__ << " SORTBITWISE flag is not set" << dendl; ceph_abort(); }
so I think any check here is redundant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the point here is that we only want to bail out if we're doing work in a PG. So we check for SORTBITWISE missing, AND that we're working in the (global) current interval. Our best local approximation for that involves making sure the OSD is active and that our map is newer than when the OSD went active...
and now I see that this condition is also backwards. No idea what the heck I was doing when I wrote this code. :/ Also changing.
src/osd/PG.cc
Outdated
if (!osdmap->test_flag(CEPH_OSDMAP_SORTBITWISE) || | ||
// but we have to be nice in case we're sortbitwise *now* | ||
// and are processing updates from before that happened | ||
(osd->osd->is_active() && osdmap->get_epoch() < osd->get_boot_epoch())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my first pass at this I did just drop the PG assert. I ended up not doing that because both this and the OSD::activate_map() one were inserted as part of one PR from you, and they're accomplishing slightly different things — the broader OSD one is called after the OSDMap gets published, which I think means it can leak out to PGs and get used before the main OSD manages to shut everything down. Not sure if there are other invariants they're covering or not.
d7f7c56
to
89bfe65
Compare
@liewegas, updated. |
how about moving the assert out of activate_map and instead putting it before the call to consume_map()? then we know we won't feed !sortbitwise maps to pgs while we are active: diff --git a/src/osd/OSD.cc b/src/osd/OSD.cc index b53ff58177..1561357a0f 100644 --- a/src/osd/OSD.cc +++ b/src/osd/OSD.cc @@ -7758,6 +7758,11 @@ void OSD::_committed_osd_maps(epoch_t first, epoch_t last, MOSDMap *m) check_osdmap_features(store); // yay! + if (is_active() && !osdmap->test_flag(CEPH_OSDMAP_SORTBITWISE)) { + derr << __func__ << " SORTBITWISE flag is not set" << dendl; + ceph_abort(); + } + consume_map(); if (is_active() || is_waiting_for_healthy()) The pg peering thread asserting basedon the osd active state and epochs just feels very fragile to me... |
Yeah, that works for me if it works for you. |
89bfe65
to
e2a4b37
Compare
Updated again; I can squash down and try to figure out some user testing if this looks good. |
src/osd/OSD.cc
Outdated
*/ | ||
if (!osdmap->test_flag(CEPH_OSDMAP_SORTBITWISE)) { | ||
epoch_t boot = service.get_boot_epoch(); | ||
if (boot && boot <= osdmap->get_epoch()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just condition this on is_active() instead? I think the piece that matters is us communicating iwth other OSDs, which is controlled here:
void OSD::dispatch_context(PG::RecoveryCtx &ctx, PG *pg, OSDMapRef curmap, ThreadPool::TPHandle *handle) { if (service.get_osdmap()->is_up(whoami) && is_active()) { do_notifies(*ctx.notify_list, curmap); do_queries(*ctx.query_map, curmap); do_infos(*ctx.info_map, curmap); }
..and that is conditioned on is_active() (and is_up()... although i'm not quite sure when that diverges from is_active()).
I don't think boot epoch is reset to 0 if we get marked down, which means we would abort trying to chew through maps while we were down if the flag is set then, whereas i think we only should crash if we get marked up and it is set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the cases where we are marked down and don't update boot_epoch will matter — if sortbitwise gets turned off we want to complain; and if it gets turned back on we'll happily churn through when restarted — but I can make that change. There was some reason not to when doing it in the PG case but it works here.
(We have the double-check in that condition because we can prepublish maps before doing enough processing on them to make sure we're still supposed to be active.)
e2a4b37
to
595f98f
Compare
test this please |
We want to stop working if we get activated while sortbitwise is not set on the cluster, but we might have old maps where it wasn't if the flag was changed recently. And doing it in the PG code was a bit silly anyway. Instead check SORTBITWISE in the main OSDMap handling code prior to prepublishing it. Let it go through if we aren't active at the time. Fixes: http://tracker.ceph.com/issues/20416 Signed-off-by: Greg Farnum <gfarnum@redhat.com>
Squashed down into one commit for easy backporting. |
595f98f
to
0a691b2
Compare
luminous backport PR: #18132 |
We want to stop working if we get activated while sortbitwise is not set
on the cluster, but we might have old maps where it wasn't if the flag
was changed recently. Instead, if SORTBITWISE is not set, further check
if the OSD is active and the map we are now looking at is newer than
when the OSD booted. Do a more polite shutdown with logged error message
instead of asserting.
Fixes: http://tracker.ceph.com/issues/20416
Signed-off-by: Greg Farnum gfarnum@redhat.com