Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pacific: osd: Fix a bunch of stretch peering issues #40129

Merged
merged 10 commits into from
Mar 21, 2021

Conversation

gregsfortytwo
Copy link
Member

A bug report came in on stretch mode; fixing and testing that
resulted in a number of small but very important bug fixes becoming evident.

Backported from (merged) master PR #40049

…ion works

Signed-off-by: Greg Farnum <gfarnum@redhat.com>
(cherry picked from commit 3d4db5b)
Signed-off-by: Greg Farnum <gfarnum@redhat.com>
(cherry picked from commit 6489d27)
This makes it easy and cheap to call from non-stretch contexts.

Signed-off-by: Greg Farnum <gfarnum@redhat.com>
(cherry picked from commit e9185b0)
…vateable

When reviewing, I mistakenly thought we needed to skip a size check in
choose_acting() in case of mismatches between size and bucket counts, but that
is not accurate!

Signed-off-by: Greg Farnum <gfarnum@redhat.com>
(cherry picked from commit e19639a)
…very

Happily this is pretty simple: we just need to check that the resulting
wanted set can peer, which we have a function for. Run it before actually
swapping the want and candidate_want sets.

If we're not in stretch mode, this is a cheap function call that
will always return true, so it's equivalent to what we already have for them.

Signed-off-by: Greg Farnum <gfarnum@redhat.com>
(cherry picked from commit c5f3cca)
There was a major error here! get_ancestor() was type-deduced to return
a bucket_candidates_t -- a *copy* of what was in the map, not the reference
to it we wanted to actually amend!

Fix this by returning a pointer instead. There's a way to coerce things
to return a reference instead but the syntax seems clumsier to me
and I'm not familiar with it anyway -- this works just fine.

Signed-off-by: Greg Farnum <gfarnum@redhat.com>
(cherry picked from commit d092530)
…tch mode

We were adding them once from the acting set, and then once from the all_infos
set, and that hit an assert later on. (I think it was otherwise harmless, but
I don't want to weaken the assert!)

Signed-off-by: Greg Farnum <gfarnum@redhat.com>
(cherry picked from commit 3313a8b)
…ctive

I misunderstood and there was a pretty serious error here: to prevent
accidents, choose_acting() was preventing PGs from *finishing* peering
if they didn't satisfy the stretch cluster rules. What we actually want
to do is to finish peering, but not go active.

Happily, this is easy to fix -- we just add a call to stretch_set_can_peer()
alongside existing min_size checks when we choose whether to go PG_STATE_ACTIVE
or PG_STATE_PEERED!

Signed-off-by: Greg Farnum <gfarnum@redhat.com>
(cherry picked from commit e534ca5)
We want to add an OSD from the mandatory member if we DON'T already have one!

Signed-off-by: Greg Farnum <gfarnum@redhat.com>
(cherry picked from commit d7fac05)
Use it instead of direct checks against min_size and stretch_set_can_peer()
when deciding whether to go STATE_ACTIVE/STATE_PEERED or do updates
to things like last_epoch_started.

Signed-off-by: Greg Farnum <gfarnum@redhat.com>
(cherry picked from commit 95bec98)
@liewegas liewegas merged commit 5a8702b into ceph:pacific Mar 21, 2021
@gregsfortytwo gregsfortytwo deleted the pacific-stretch-fixes-2 branch March 23, 2021 16:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants