osd/OSDMap: apply primary affinity when zero affinity was set #49777

Closed
3 changes: 2 additions & 1 deletion src/osd/OSDMap.cc
@@ -2758,7 +2758,8 @@ void OSDMap::_apply_primary_affinity(ps_t seed,
seed, o) >> 16) >= a) {
// we chose not to use this primary. note it anyway as a
// fallback in case we don't pick anyone else, but keep looking.
-if (pos < 0)
+// don't use it if affinity is 0, though.
+if (pos < 0 && a > 0)
Contributor

Seems right to me, but I would also prefer a review from @athanatos or @gregsfortytwo.

Contributor

I don't think we can change this without a feature flag -- it changes the osd mapping for some OSDMaps and during an upgrade OSDs would disagree about the primary.

Contributor

Thanks for catching this! Can we use the reef flag maybe?

Member

It's more complicated than that: this changes how the OSD will check historical mappings. I'm not sure we've actually made this kind of change to how crush mappings work before. We just went to straw2 instead of trying to update straw, for instance.

Contributor

Yeah, @gregsfortytwo is right. It's possible that this won't be a problem here, because I can't think of a case where we care about which osd was primary for previous mappings. Nevertheless, we need to think more carefully about this.

Contributor

Hmm, that's actually different from the situation described in the bug report. https://tracker.ceph.com/issues/44400 seems to be specifically about an osd with primary affinity of 0 being marked out, resulting in a temporary acting set where up is different from acting and the out osd remains primary of the temporary acting set even with a primary affinity of 0. This patch shouldn't affect that case, for the reason I outlined above.

The behavior you are describing may be a second bug!

Contributor

@athanatos athanatos Jan 25, 2023

@NitzanMordhai I'm actually not sure the behavior you note above is worth fixing.

First, this specific fix is problematic because it may result in selecting no position to be the primary if all osds happen to have a primary affinity of 0, which would be invalid.

More generally, _apply_primary_affinity will only select a weight 0 osd if that osd is first and all other osds rejected the mapping. This is possible with your example because all of the osds other than the weight 0 one have weight 0.5. That's a very odd configuration in which every osd is configured to reject some of its primary mappings (100% for osd 1 and 50% each for osds 0 and 2). For ~1/4 of pgs, then, _apply_primary_affinity is going to find that all osds reject being primary (1 * 1/2 * 1/2). It's not really all that much more correct to select 0 or 2 as primary in those cases.
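
The two points above can be illustrated with a small standalone sketch. It is not the Ceph implementation: `pick_primary_pos`, `all_reject_probability`, `mix_hash` and `kMaxAffinity` are hypothetical names, the hash is a splitmix64-style stand-in rather than `crush_hash32_2`, and the `skip_zero_fallback` flag only models the one-line change in this diff. The 0x10000 scale is assumed to match Ceph's maximum primary affinity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed scale: 0x10000 means "always accept being primary".
constexpr uint32_t kMaxAffinity = 0x10000;

// Stand-in mixer (splitmix64 finalizer). NOT crush_hash32_2; illustration only.
inline uint32_t mix_hash(uint32_t seed, uint32_t osd) {
  uint64_t x = ((static_cast<uint64_t>(seed) << 32) | osd) + 0x9e3779b97f4a7c15ULL;
  x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
  x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
  x ^= x >> 31;
  return static_cast<uint32_t>(x >> 32);
}

// Returns the index in `osds` chosen as primary, or -1 if nothing was chosen.
// `affinity` is indexed by OSD id. `skip_zero_fallback` models the proposed
// `a > 0` condition on the fallback assignment.
int pick_primary_pos(uint32_t seed, const std::vector<int>& osds,
                     const std::vector<uint32_t>& affinity,
                     bool skip_zero_fallback) {
  int pos = -1;
  for (std::size_t i = 0; i < osds.size(); ++i) {
    const int o = osds[i];
    assert(o >= 0 && static_cast<std::size_t>(o) < affinity.size());
    const uint32_t a = affinity[o];
    if (a < kMaxAffinity &&
        (mix_hash(seed, static_cast<uint32_t>(o)) >> 16) >= a) {
      // This OSD rejected the primary role. Remember the first such OSD
      // as a fallback, but keep looking for one that accepts.
      if (pos < 0 && (!skip_zero_fallback || a > 0))
        pos = static_cast<int>(i);
    } else {
      pos = static_cast<int>(i);
      break;
    }
  }
  return pos;
}

// Probability that every OSD in the set rejects the primary role,
// assuming independent, uniformly distributed 16-bit hash values.
double all_reject_probability(const std::vector<int>& osds,
                              const std::vector<uint32_t>& affinity) {
  double p = 1.0;
  for (int o : osds) {
    const uint32_t a = affinity[o] < kMaxAffinity ? affinity[o] : kMaxAffinity;
    p *= 1.0 - static_cast<double>(a) / kMaxAffinity;
  }
  return p;
}
```

With affinities `{0x8000, 0, 0x8000}` for osds 0, 1 and 2, `all_reject_probability` gives 1/2 * 1 * 1/2 = 0.25, the ~1/4 figure above. With all affinities set to 0, the sketch returns -1 when `skip_zero_fallback` is true, which is the "no primary selected" case.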

Contributor Author

@athanatos @gregsfortytwo, thanks a lot for the review. My reproduction of the issue deliberately forced the situation, and yes, it looks more complicated to fix now.

Contributor

@idryomov idryomov Jan 30, 2023

> Can we use the reef flag maybe?

Note that with changes like this the kernel client must be kept in mind and consulted before the change is made. This would avoid mistakes like conditioning client-facing behavior on a field that lives in the OSD section in the osdmap, for example.

Contributor

@idryomov's point is accurate: the client definitely cares about the primary affinity behavior, so changing it would be very complicated. For the reasons I outlined above, no change is going to be particularly satisfying in the event that none of the osds for a PG have a primary affinity of 1, so I'm going to close this PR.

pos = i;
} else {
pos = i;