
osd: Fix a bunch of stretch peering issues #40049

Merged: 10 commits into ceph:master from wip-stretch-fixes-2 on Mar 15, 2021

Conversation

gregsfortytwo (Member):

A bug report came in on stretch mode; fixing and testing it made a number of
small but very important bugs evident.

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug


…ion works

Signed-off-by: Greg Farnum <gfarnum@redhat.com>
Signed-off-by: Greg Farnum <gfarnum@redhat.com>
This makes it easy and cheap to call from non-stretch contexts.

Signed-off-by: Greg Farnum <gfarnum@redhat.com>
@@ -1517,10 +1517,11 @@ struct pg_pool_t {
return peering_crush_bucket_count != 0;
}

- bool stretch_set_can_peer(const set<int>& want, const OSDMap& osdmap,
+ bool stretch_set_can_peer(const set<int>& want, const OSDMap *osdmap,
Contributor:

Why this change? It appears that osdmap is assumed to be non-null, so normally a ref would be the order of the day?

Member Author:

There was something in a dropped commit that made it easier this way, but that's not true any more and you're right. Changed!
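The convention the review settles on can be sketched as follows. `OSDMap` here is a stand-in struct and the predicate bodies are illustrative, not Ceph's real implementation; the point is only that a parameter which is always non-null reads better as a const reference, while a pointer parameter invites null checks:

```cpp
#include <cassert>
#include <set>

// Stand-in for Ceph's OSDMap; the real class lives in src/osd/OSDMap.h.
struct OSDMap { int epoch = 0; };

// Reference parameter: documents the non-null contract at the call site,
// so the function body needs no null check.
bool can_peer_ref(const std::set<int>& want, const OSDMap& osdmap) {
  return !want.empty() && osdmap.epoch >= 0;
}

// Pointer parameter: every reader (and careful caller) now has to think
// about nullptr, even if the function is never actually called with one.
bool can_peer_ptr(const std::set<int>& want, const OSDMap* osdmap) {
  if (osdmap == nullptr) return false;
  return !want.empty() && osdmap->epoch >= 0;
}
```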

<< " from oversized want " << want << dendl;
want.pop_back();
if (!pool.info.is_stretch_pool()) {
while (want.size() > pool.info.size) {
Contributor:

I don't really understand this. This is here because we choose to leave the extra want set osds in until async recovery makes its selections (22c8cda). Is there something later on that handles this if we calc_replicated_acting_stretch includes extra osds? Can calc_replicated_acting_stretch include extra osds?

Member Author:

Ah, you are right! I misunderstood what was going on here when I made that change and should have looked more closely. I removed it and replaced it with a more verbose comment on the logic.
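The non-stretch trimming visible in the diff context above can be sketched like this. Types and names are illustrative stand-ins (Ceph's `want` is not a plain `std::vector<int>`); the shape is just "drop least-preferred entries from the back until the set fits the pool size":

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of trimming an oversized want set down to the pool's replica
// count, as in the "from oversized want" path shown in the diff.
// Entries are assumed to be ordered most-preferred first, so pop_back()
// removes the least-preferred OSD each iteration.
void trim_oversized_want(std::vector<int>& want, std::size_t pool_size) {
  while (want.size() > pool_size) {
    want.pop_back();
  }
}
```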

@@ -2651,7 +2650,8 @@ void PeeringState::activate(

if (is_primary()) {
// only update primary last_epoch_started if we will go active
-    if (acting.size() >= pool.info.min_size) {
+    if ((acting.size() >= pool.info.min_size) &&
+        pool.info.stretch_set_can_peer(acting, get_osdmap().get(), NULL)) {
Contributor:

The repeated acting.size() >= pool.info.min_size was already not great. With the added stretch_set_can_peer condition, I'd like some kind of helper, maybe acting_set_writeable?

Member Author:

I thought about this, but a helper function is still more lines of code than condensing the three places where we check.

Contributor:

Lines of code is not the concern. The concern is that we'll fail to update one of these if we have to modify a condition. I really do want a helper here.

Member Author:

Fair enough.
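The requested consolidation might look roughly like this. `acting_set_writeable` is the name the reviewer suggested; `pool_info_t` and its members are stand-ins for Ceph's real `pg_pool_t` fields (`min_size`, `stretch_set_can_peer()`), so the actual signature in the merged code may differ:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for the relevant bits of Ceph's pg_pool_t.
struct pool_info_t {
  std::size_t min_size = 2;
  bool stretch_peerable = true;  // stand-in for stretch_set_can_peer()
};

// One shared definition of "this acting set may go writeable", used by
// the ACTIVE-vs-PEERED decision and the last_epoch_started update alike.
// The point of the helper: if the condition ever changes, it changes in
// exactly one place instead of three.
bool acting_set_writeable(const std::vector<int>& acting,
                          const pool_info_t& info) {
  return acting.size() >= info.min_size && info.stretch_peerable;
}
```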

…vateable

When reviewing, I mistakenly thought we needed to skip a size check in
choose_acting() in case of mismatches between size and bucket counts, but that
is not accurate!

Signed-off-by: Greg Farnum <gfarnum@redhat.com>
…very

Happily this is pretty simple: we just need to check that the resulting
wanted set can peer, which we have a function for. Run it before actually
swapping the want and candidate_want sets.

If we're not in stretch mode, this is a cheap function call that
will always return true, so it's equivalent to what we already have for them.

Signed-off-by: Greg Farnum <gfarnum@redhat.com>
There was a major error here! get_ancestor() was type-deduced to return
a bucket_candidates_t -- a *copy* of what was in the map, not the reference
to it we wanted to actually amend!

Fix this by returning a pointer instead. There's a way to coerce things
to return a reference instead but the syntax seems clumsier to me
and I'm not familiar with it anyway -- this works just fine.

Signed-off-by: Greg Farnum <gfarnum@redhat.com>
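The pitfall described in this commit message is easy to reproduce in miniature. `bucket_candidates_t` below is a stand-in struct, not Ceph's real one; the point is that a deduced (`auto`) return type decays to the mapped type *by value*, so mutations through it are silently lost, while returning a pointer into the map amends the real entry:

```cpp
#include <cassert>
#include <map>

struct bucket_candidates_t { int osd_count = 0; };

std::map<int, bucket_candidates_t> ancestors;

// Buggy shape: auto deduces bucket_candidates_t (by value), so the
// caller receives a copy of the map entry.
auto get_ancestor_by_value(int id) { return ancestors[id]; }

// Fixed shape, as in the commit: return a pointer so the caller mutates
// the entry that actually lives in the map.  (auto& would also work,
// but the commit opts for the pointer syntax.)
bucket_candidates_t* get_ancestor(int id) { return &ancestors[id]; }
```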
…tch mode

We were adding them once from the acting set, and then once from the all_infos
set, and that hit an assert later on. (I think it was otherwise harmless, but
I don't want to weaken the assert!)

Signed-off-by: Greg Farnum <gfarnum@redhat.com>
@athanatos (Contributor):

@gregsfortytwo I think you still need to push the new branch?

…ctive

I misunderstood and there was a pretty serious error here: to prevent
accidents, choose_acting() was preventing PGs from *finishing* peering
if they didn't satisfy the stretch cluster rules. What we actually want
to do is to finish peering, but not go active.

Happily, this is easy to fix -- we just add a call to stretch_set_can_peer()
alongside existing min_size checks when we choose whether to go PG_STATE_ACTIVE
or PG_STATE_PEERED!

Signed-off-by: Greg Farnum <gfarnum@redhat.com>
We want to add an OSD from the mandatory member if we DON'T already have one!

Signed-off-by: Greg Farnum <gfarnum@redhat.com>
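The inverted-condition fix in this commit can be sketched as follows. The function name, the `mandatory_osds` membership test, and the candidate-selection are all illustrative stand-ins for the real code; only the shape of the fix is taken from the commit message (add a mandatory-bucket OSD exactly when we do *not* already have one):

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

// Ensure the want set contains at least one OSD from the mandatory
// bucket, adding `candidate` only if none is present.  The bug this
// commit fixes was the boolean running the other way: adding an extra
// OSD when one was already there, and skipping it when it was missing.
bool ensure_mandatory_member(std::vector<int>& want,
                             const std::set<int>& mandatory_osds,
                             int candidate) {
  bool have_mandatory =
      std::any_of(want.begin(), want.end(), [&](int osd) {
        return mandatory_osds.count(osd) > 0;
      });
  if (!have_mandatory) {   // the fix: act when we DON'T already have one
    want.push_back(candidate);
    return true;
  }
  return false;
}
```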
Use it instead of direct checks against min_size and stretch_set_can_peer()
when deciding whether to go STATE_ACTIVE/STATE_PEERED or do updates
to things like last_epoch_started.

Signed-off-by: Greg Farnum <gfarnum@redhat.com>
@gregsfortytwo (Member Author):

@athanatos pushed now; in addition to your comments note the extra patch swapping the boolean direction on adding a mandatory member.

@athanatos athanatos self-requested a review March 12, 2021 23:05
@gregsfortytwo (Member Author):

https://pulpito.ceph.com/gregf-2021-03-13_08:24:54-rados-wip-stretch-fixes-312-distro-basic-gibba/

I didn't filter anything out, and between some messy jobs and a short-lived bug in master there were 34 failures and 4 dead jobs against 484 passes, but they all have a pretty clear fault that isn't related to the changes here.

5961449: Certificate verification failed (Suse certificate verification failure)
5961455: The script only supports CentOS
5961488: iscsi not valid: iscsi not one of 'true', 'false'
5961497: Certificate verification failed
5961504: Certificate verification failed
5961511: https://tracker.ceph.com/issues/49726 FAILED ceph_assert(!version || comp->get_version64() == version)
5961512: This is getting an Aborted signal in the bstore_kv_sync thread
5961548: iscsi not valid: iscsi not one of 'true', 'false'
5961564: https://tracker.ceph.com/issues/49726 FAILED ceph_assert(!version || comp->get_version64() == version)
5961585: The script only supports CentOS
5961587: https://tracker.ceph.com/issues/49726 FAILED ceph_assert(!version || comp->get_version64() == version)
5961602: https://tracker.ceph.com/issues/49726 FAILED ceph_assert(!version || comp->get_version64() == version)
5961613: https://tracker.ceph.com/issues/49726 FAILED ceph_assert(!version || comp->get_version64() == version)
5961615: iscsi not valid: iscsi not one of 'true', 'false'
5961628: This fails on “Scrubbing terminated -- not all pgs were active and clean.”, but the ceph.log shows it was in “1 active+remapped+backfill_wait+backfill_toofull”
5961649: https://tracker.ceph.com/issues/49726 FAILED ceph_assert(!version || comp->get_version64() == version)
5961663: https://tracker.ceph.com/issues/49726 FAILED ceph_assert(!version || comp->get_version64() == version)
5961672: iscsi not valid: iscsi not one of 'true', 'false'
5961681: Failed test_cephfs_mirror
5961718: The script only supports CentOS
5961723: https://tracker.ceph.com/issues/49726 FAILED ceph_assert(!version || comp->get_version64() == version)
5961756: iscsi not valid: iscsi not one of 'true', 'false'
5961777: https://tracker.ceph.com/issues/49726 FAILED ceph_assert(!version || comp->get_version64() == version)
5961806: iscsi not valid: iscsi not one of 'true', 'false'
5961833: The script only supports CentOS
5961873: iscsi not valid: iscsi not one of 'true', 'false'
5961881: https://tracker.ceph.com/issues/49726 FAILED ceph_assert(!version || comp->get_version64() == version)
5961887: mount.nfs: Protocol not supported; ERROR: test_create_and_delete_export (tasks.cephfs.test_nfs.TestNFS)
5961891: This fails on “Scrubbing terminated -- not all pgs were active and clean.” at 00:40:43.341, but looking at ceph.log it shows PGs went from 1 remapped to all active+clean at 00:40:41.668994
5961933: https://tracker.ceph.com/issues/49726 FAILED ceph_assert(!version || comp->get_version64() == version)
5961940: iscsi not valid: iscsi not one of 'true', 'false'
5961945: not sure exactly what this is, but it’s something to do with setup in cephadm
5961949: AssertionError: machine gibba037.front.sepia.ceph.com is locked by scheduled_teuthology@teuthology, not scheduled_gregf@teuthology
Dead jobs:
5961645: Maybe osd.5 crashed, but I can’t tell because we don’t have logs? :( Something went wrong for the supervisor as well so I think it was just bad luck all around
5961670: This just failed during reimage
5961830: Failed on setup; “SSH connection to gibba016 was lost”
5961831: Failed on setup; “raise AnsibleFailedError(failures)”

Will merge tomorrow once I've figured out how to do proper backports for Pacific and my downstream. 😆

@gregsfortytwo gregsfortytwo merged commit b788fc1 into ceph:master Mar 15, 2021
@gregsfortytwo gregsfortytwo deleted the wip-stretch-fixes-2 branch March 15, 2021 18:44
2 participants