osd: Fix a bunch of stretch peering issues #40049
Conversation
…ion works Signed-off-by: Greg Farnum <gfarnum@redhat.com>
Signed-off-by: Greg Farnum <gfarnum@redhat.com>
This makes it easy and cheap to call from non-stretch contexts. Signed-off-by: Greg Farnum <gfarnum@redhat.com>
src/osd/osd_types.h
Outdated
@@ -1517,10 +1517,11 @@ struct pg_pool_t {
     return peering_crush_bucket_count != 0;
   }

-  bool stretch_set_can_peer(const set<int>& want, const OSDMap& osdmap,
+  bool stretch_set_can_peer(const set<int>& want, const OSDMap *osdmap,
Why this change? It appears that osdmap is assumed to be non-null, so normally a ref would be the order of the day?
There was something in a dropped commit that made it easier this way, but that's not true any more and you're right. Changed!
src/osd/PeeringState.cc
Outdated
     << " from oversized want " << want << dendl;
   want.pop_back();
-  if (!pool.info.is_stretch_pool()) {
+  while (want.size() > pool.info.size) {
I don't really understand this. This is here because we choose to leave the extra want set osds in until async recovery makes its selections (22c8cda). Is there something later on that handles this if we calc_replicated_acting_stretch includes extra osds? Can calc_replicated_acting_stretch include extra osds?
Ah, you are right! I misunderstood what was going on here when I made that change and should have looked more closely. I removed it and replaced it with a more verbose comment on the logic.
src/osd/PeeringState.cc
Outdated
@@ -2651,7 +2650,8 @@ void PeeringState::activate(

   if (is_primary()) {
     // only update primary last_epoch_started if we will go active
-    if (acting.size() >= pool.info.min_size) {
+    if ((acting.size() >= pool.info.min_size) &&
+        pool.info.stretch_set_can_peer(acting, get_osdmap().get(), NULL)) {
The repeated acting.size() >= pool.info.min_size was already not great. With the added stretch_set_can_peer condition, I'd like some kind of helper, maybe acting_set_writeable?
I thought about this, but a helper function is still more lines of code than condensing down the three places we check.
Lines of code is not the concern. The concern is that we'll fail to update one of these if we have to modify a condition. I really do want a helper here.
Fair enough.
…vateable When reviewing, I mistakenly thought we needed to skip a size check in choose_acting() in case of mismatches between size and bucket counts, but that is not accurate! Signed-off-by: Greg Farnum <gfarnum@redhat.com>
…very Happily this is pretty simple: we just need to check that the resulting wanted set can peer, which we have a function for. Run it before actually swapping the want and candidate_want sets. If we're not in stretch mode, this is a cheap function call that will always return true, so it's equivalent to what we already have for them. Signed-off-by: Greg Farnum <gfarnum@redhat.com>
There was a major error here! get_ancestor() was type-deduced to return a bucket_candidates_t -- a *copy* of what was in the map, not the reference to it we wanted to actually amend! Fix this by returning a pointer instead. There's a way to coerce things to return a reference instead but the syntax seems clumsier to me and I'm not familiar with it anyway -- this works just fine. Signed-off-by: Greg Farnum <gfarnum@redhat.com>
…tch mode We were adding them once from the acting set, and then once from the all_infos set, and that hit an assert later on. (I think it was otherwise harmless, but I don't want to weaken the assert!) Signed-off-by: Greg Farnum <gfarnum@redhat.com>
@gregsfortytwo I think you still need to push the new branch?
…ctive I misunderstood and there was a pretty serious error here: to prevent accidents, choose_acting() was preventing PGs from *finishing* peering if they didn't satisfy the stretch cluster rules. What we actually want to do is to finish peering, but not go active. Happily, this is easy to fix -- we just add a call to stretch_set_can_peer() alongside existing min_size checks when we choose whether to go PG_STATE_ACTIVE or PG_STATE_PEERED! Signed-off-by: Greg Farnum <gfarnum@redhat.com>
We want to add an OSD from the mandatory member if we DON'T already have one! Signed-off-by: Greg Farnum <gfarnum@redhat.com>
Use it instead of direct checks against min_size and stretch_set_can_peer() when deciding whether to go STATE_ACTIVE/STATE_PEERED or do updates to things like last_epoch_started. Signed-off-by: Greg Farnum <gfarnum@redhat.com>
Force-pushed from 38045b4 to 95bec98
@athanatos pushed now; in addition to your comments, note the extra patch swapping the boolean direction on adding a mandatory member.
https://pulpito.ceph.com/gregf-2021-03-13_08:24:54-rados-wip-stretch-fixes-312-distro-basic-gibba/
I didn't filter anything out; between some messy jobs and a short-lived bug in master there were 34 failures and 4 dead jobs against 484 passes, but they all have a pretty clear cause that isn't related to the changes here. 5961449: Certificate verification failed (SUSE certificate verification failure). Will merge tomorrow once I've figured out how to do proper backports for Pacific and my downstream. 😆
A bug report came in on stretch mode; fixing and testing it surfaced a number of small but very important bug fixes.