
osd: Fix a bunch of stretch peering issues #40049

Merged: 10 commits into ceph:master from wip-stretch-fixes-2 on Mar 15, 2021

Conversation

gregsfortytwo (Member):

A bug report came in on stretch mode; fixing and testing it made a number of
small but very important bugs evident.

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug


…ion works

Signed-off-by: Greg Farnum <gfarnum@redhat.com>
Signed-off-by: Greg Farnum <gfarnum@redhat.com>
This makes it easy and cheap to call from non-stretch contexts.

Signed-off-by: Greg Farnum <gfarnum@redhat.com>
@@ -1517,10 +1517,11 @@ struct pg_pool_t {
return peering_crush_bucket_count != 0;
}

- bool stretch_set_can_peer(const set<int>& want, const OSDMap& osdmap,
+ bool stretch_set_can_peer(const set<int>& want, const OSDMap *osdmap,
Contributor:

Why this change? It appears that osdmap is assumed to be non-null, so normally a ref would be the order of the day?

Member Author:

There was something in a dropped commit that made it easier this way, but that's not true any more and you're right. Changed!
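The convention the review settles on can be sketched as follows. `OSDMap` here is a stand-in struct and the predicate bodies are illustrative, not Ceph's real implementation; the point is only that a parameter which is always non-null reads better as a const reference, while a pointer parameter invites null checks:

```cpp
#include <cassert>
#include <set>

// Stand-in for Ceph's OSDMap; the real class lives in src/osd/OSDMap.h.
struct OSDMap { int epoch = 0; };

// Reference parameter: documents the non-null contract at the call site,
// so the function body needs no null check.
bool can_peer_ref(const std::set<int>& want, const OSDMap& osdmap) {
  return !want.empty() && osdmap.epoch >= 0;
}

// Pointer parameter: every reader (and careful caller) now has to think
// about nullptr, even if the function is never actually called with one.
bool can_peer_ptr(const std::set<int>& want, const OSDMap* osdmap) {
  if (osdmap == nullptr) return false;
  return !want.empty() && osdmap->epoch >= 0;
}
```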

<< " from oversized want " << want << dendl;
want.pop_back();
if (!pool.info.is_stretch_pool()) {
while (want.size() > pool.info.size) {
Contributor:

I don't really understand this. This is here because we choose to leave the extra want set osds in until async recovery makes its selections (22c8cda). Is there something later on that handles this if we calc_replicated_acting_stretch includes extra osds? Can calc_replicated_acting_stretch include extra osds?

Member Author:

Ah, you are right! I misunderstood what was going on here when I made that change and should have looked more closely. I removed it and replaced it with a more verbose comment on the logic.
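The non-stretch trimming visible in the diff context above can be sketched like this. Types and names are illustrative stand-ins (Ceph's `want` is not a plain `std::vector<int>`); the shape is just "drop least-preferred entries from the back until the set fits the pool size":

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of trimming an oversized want set down to the pool's replica
// count, as in the "from oversized want" path shown in the diff.
// Entries are assumed to be ordered most-preferred first, so pop_back()
// removes the least-preferred OSD each iteration.
void trim_oversized_want(std::vector<int>& want, std::size_t pool_size) {
  while (want.size() > pool_size) {
    want.pop_back();
  }
}
```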

@@ -2651,7 +2650,8 @@ void PeeringState::activate(

if (is_primary()) {
// only update primary last_epoch_started if we will go active
-    if (acting.size() >= pool.info.min_size) {
+    if ((acting.size() >= pool.info.min_size) &&
+        pool.info.stretch_set_can_peer(acting, get_osdmap().get(), NULL)) {
Contributor:

The repeated acting.size() >= pool.info.min_size was already not great. With the added stretch_set_can_peer condition, I'd like some kind of helper, maybe acting_set_writeable?

Member Author:

I thought about this, but a helper function is still more lines of code than condensing the three places where we check.

Contributor:

Lines of code is not the concern. The concern is that we'll fail to update one of these if we have to modify a condition. I really do want a helper here.

Member Author:

Fair enough.
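The requested consolidation might look roughly like this. `acting_set_writeable` is the name the reviewer suggested; `pool_info_t` and its members are stand-ins for Ceph's real `pg_pool_t` fields (`min_size`, `stretch_set_can_peer()`), so the actual signature in the merged code may differ:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for the relevant bits of Ceph's pg_pool_t.
struct pool_info_t {
  std::size_t min_size = 2;
  bool stretch_peerable = true;  // stand-in for stretch_set_can_peer()
};

// One shared definition of "this acting set may go writeable", used by
// the ACTIVE-vs-PEERED decision and the last_epoch_started update alike.
// The point of the helper: if the condition ever changes, it changes in
// exactly one place instead of three.
bool acting_set_writeable(const std::vector<int>& acting,
                          const pool_info_t& info) {
  return acting.size() >= info.min_size && info.stretch_peerable;
}
```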

…vateable

When reviewing, I mistakenly thought we needed to skip a size check in
choose_acting() in case of mismatches between size and bucket counts, but that
is not accurate!

Signed-off-by: Greg Farnum <gfarnum@redhat.com>
…very

Happily this is pretty simple: we just need to check that the resulting
wanted set can peer, which we have a function for. Run it before actually
swapping the want and candidate_want sets.

If we're not in stretch mode, this is a cheap function call that
will always return true, so it's equivalent to what we already have for them.

Signed-off-by: Greg Farnum <gfarnum@redhat.com>
There was a major error here! get_ancestor() was type-deduced to return
a bucket_candidates_t -- a *copy* of what was in the map, not the reference
to it we wanted to actually amend!

Fix this by returning a pointer instead. There's a way to coerce things
to return a reference instead but the syntax seems clumsier to me
and I'm not familiar with it anyway -- this works just fine.

Signed-off-by: Greg Farnum <gfarnum@redhat.com>
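The pitfall described in this commit message is easy to reproduce in miniature. `bucket_candidates_t` below is a stand-in struct, not Ceph's real one; the point is that a deduced (`auto`) return type decays to the mapped type *by value*, so mutations through it are silently lost, while returning a pointer into the map amends the real entry:

```cpp
#include <cassert>
#include <map>

struct bucket_candidates_t { int osd_count = 0; };

std::map<int, bucket_candidates_t> ancestors;

// Buggy shape: auto deduces bucket_candidates_t (by value), so the
// caller receives a copy of the map entry.
auto get_ancestor_by_value(int id) { return ancestors[id]; }

// Fixed shape, as in the commit: return a pointer so the caller mutates
// the entry that actually lives in the map.  (auto& would also work,
// but the commit opts for the pointer syntax.)
bucket_candidates_t* get_ancestor(int id) { return &ancestors[id]; }
```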
…tch mode

We were adding them once from the acting set, and then once from the all_infos
set, and that hit an assert later on. (I think it was otherwise harmless, but
I don't want to weaken the assert!)

Signed-off-by: Greg Farnum <gfarnum@redhat.com>
@athanatos (Contributor):

@gregsfortytwo I think you still need to push the new branch?

…ctive

I misunderstood and there was a pretty serious error here: to prevent
accidents, choose_acting() was preventing PGs from *finishing* peering
if they didn't satisfy the stretch cluster rules. What we actually want
to do is to finish peering, but not go active.

Happily, this is easy to fix -- we just add a call to stretch_set_can_peer()
alongside existing min_size checks when we choose whether to go PG_STATE_ACTIVE
or PG_STATE_PEERED!

Signed-off-by: Greg Farnum <gfarnum@redhat.com>
We want to add an OSD from the mandatory member if we DON'T already have one!

Signed-off-by: Greg Farnum <gfarnum@redhat.com>
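The inverted-condition fix in this commit can be sketched as follows. The function name, the `mandatory_osds` membership test, and the candidate-selection are all illustrative stand-ins for the real code; only the shape of the fix is taken from the commit message (add a mandatory-bucket OSD exactly when we do *not* already have one):

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

// Ensure the want set contains at least one OSD from the mandatory
// bucket, adding `candidate` only if none is present.  The bug this
// commit fixes was the boolean running the other way: adding an extra
// OSD when one was already there, and skipping it when it was missing.
bool ensure_mandatory_member(std::vector<int>& want,
                             const std::set<int>& mandatory_osds,
                             int candidate) {
  bool have_mandatory =
      std::any_of(want.begin(), want.end(), [&](int osd) {
        return mandatory_osds.count(osd) > 0;
      });
  if (!have_mandatory) {   // the fix: act when we DON'T already have one
    want.push_back(candidate);
    return true;
  }
  return false;
}
```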
Use it instead of direct checks against min_size and stretch_set_can_peer()
when deciding whether to go STATE_ACTIVE/STATE_PEERED or do updates
to things like last_epoch_started.

Signed-off-by: Greg Farnum <gfarnum@redhat.com>
@gregsfortytwo (Member Author):

@athanatos pushed now; in addition to your comments note the extra patch swapping the boolean direction on adding a mandatory member.

@athanatos athanatos self-requested a review March 12, 2021 23:05
@gregsfortytwo (Member Author):

https://pulpito.ceph.com/gregf-2021-03-13_08:24:54-rados-wip-stretch-fixes-312-distro-basic-gibba/

I didn't filter anything out, and between some messy jobs and a short-lived bug in master there were 34 failures and 4 dead jobs against 484 passes, but they all have a pretty clear fault that isn't related to the changes here.

5961449: Certificate verification failed (Suse certificate verification failure)
5961455: The script only supports CentOS
5961488: iscsi not valid: iscsi not one of 'true', 'false'
5961497: Certificate verification failed
5961504: Certificate verification failed
5961511: https://tracker.ceph.com/issues/49726 FAILED ceph_assert(!version || comp->get_version64() == version)
5961512: This is getting an Aborted signal in the bstore_kv_sync thread
5961548: iscsi not valid: iscsi not one of 'true', 'false'
5961564: https://tracker.ceph.com/issues/49726 FAILED ceph_assert(!version || comp->get_version64() == version)
5961585: The script only supports CentOS
5961587: https://tracker.ceph.com/issues/49726 FAILED ceph_assert(!version || comp->get_version64() == version)
5961602: https://tracker.ceph.com/issues/49726 FAILED ceph_assert(!version || comp->get_version64() == version)
5961613: https://tracker.ceph.com/issues/49726 FAILED ceph_assert(!version || comp->get_version64() == version)
5961615: iscsi not valid: iscsi not one of 'true', 'false'
5961628: This fails on “Scrubbing terminated -- not all pgs were active and clean.”, but the ceph.log shows it was in “1 active+remapped+backfill_wait+backfill_toofull”
5961649: https://tracker.ceph.com/issues/49726 FAILED ceph_assert(!version || comp->get_version64() == version)
5961663: https://tracker.ceph.com/issues/49726 FAILED ceph_assert(!version || comp->get_version64() == version)
5961672: iscsi not valid: iscsi not one of 'true', 'false'
5961681: Failed test_cephfs_mirror
5961718: The script only supports CentOS
5961723: https://tracker.ceph.com/issues/49726 FAILED ceph_assert(!version || comp->get_version64() == version)
5961756: iscsi not valid: iscsi not one of 'true', 'false'
5961777: https://tracker.ceph.com/issues/49726 FAILED ceph_assert(!version || comp->get_version64() == version)
5961806: iscsi not valid: iscsi not one of 'true', 'false'
5961833: The script only supports CentOS
5961873: iscsi not valid: iscsi not one of 'true', 'false'
5961881: https://tracker.ceph.com/issues/49726 FAILED ceph_assert(!version || comp->get_version64() == version)
5961887: mount.nfs: Protocol not supported; ERROR: test_create_and_delete_export (tasks.cephfs.test_nfs.TestNFS)
5961891: This fails on “Scrubbing terminated -- not all pgs were active and clean.” at 00:40:43.341, but looking at ceph.log it shows PGs went from 1 remapped to all active+clean at 00:40:41.668994
5961933: https://tracker.ceph.com/issues/49726 FAILED ceph_assert(!version || comp->get_version64() == version)
5961940: iscsi not valid: iscsi not one of 'true', 'false'
5961945: not sure exactly what this is, but it’s something to do with setup in cephadm
5961949: AssertionError: machine gibba037.front.sepia.ceph.com is locked by scheduled_teuthology@teuthology, not scheduled_gregf@teuthology
Dead jobs:
5961645: Maybe osd.5 crashed, but I can’t tell because we don’t have logs? :( Something went wrong for the supervisor as well so I think it was just bad luck all around
5961670: This just failed during reimage
5961830: Failed on setup; “SSH connection to gibba016 was lost”
5961831: Failed on setup; “raise AnsibleFailedError(failures)”

Will merge tomorrow once I've figured out how to do proper backports for Pacific and my downstream. 😆

@gregsfortytwo gregsfortytwo merged commit b788fc1 into ceph:master Mar 15, 2021
@gregsfortytwo gregsfortytwo deleted the wip-stretch-fixes-2 branch March 15, 2021 18:44
2 participants