New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
osd/OSDMap: apply primary affinity when zero affinity was set #49777
osd/OSDMap: apply primary affinity when zero affinity was set #49777
Conversation
When zero primary affinity set for one or more of the osds that was primary _apply_primary_affinity can sometimes pick it again as primary, even with zero primary affinity was set and all the other osds in the acting set were not picked as primary. The fix will set the fallback method to use the first osd as primary and was not set as primary affinity zero Fixes: https://tracker.ceph.com/issues/44400 Signed-off-by: Nitzan Mordechai <nmordech@redhat.com>
jenkins test make check |
jenkins test windows |
@NitzanMordhai thanks for the fix! |
@@ -2758,7 +2758,8 @@ void OSDMap::_apply_primary_affinity(ps_t seed, | |||
seed, o) >> 16) >= a) { | |||
// we chose not to use this primary. note it anyway as a | |||
// fallback in case we don't pick anyone else, but keep looking. | |||
if (pos < 0) | |||
// don't use it if affinity is 0, though. | |||
if (pos < 0 && a > 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems right for me but would prefer to have a review also from @athanatos or @gregsfortytwo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we can change this without a feature flag -- it changes the osd mapping for some OSDMaps and during an upgrade OSDs would disagree about the primary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching this! Can we use the reef flag maybe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's more complicated than that — this changes how the OSD will check historical mappings. I'm not sure if we've actually made this kind of change to how crush mappings work before — we just went to straw2 instead of trying to update straw, for instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, @gregsfortytwo is right. It's possible that that won't be a problem here because I can't think of a case where we care about which osd was primary for previous mappings. Nevertheless, need to think more carefully about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, that's actually different than the situation described in the bug report. https://tracker.ceph.com/issues/44400 seems to be specifically about an osd with primary affinity of 0 being marked out resulting in a temporary acting set where up is different from acting and the out osd remains primary of the temporary acting set even with a primary affinity of 0. This patch shouldn't affect that case for the reason I outline above.
The behavior you are describing may be a second bug!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@NitzanMordhai I'm actually not sure the behavior you note above is worth fixing.
First, this specific fix is problematic because it may result in selecting no position to be the primary if all osds happen to have a primary affinity of 0, which would be invalid.
More generally, _apply_primary_affinity will only select a weight 0 osd if that osd is first and all other osds rejected the mapping. This is possible with your example because all of the osds other than the weight 0 one have weight .5 -- that's a very odd configuration where every osd is configured to reject some of its primary mappings (100% for osd 1 and 50% each for 0 and 2). For ~1/4 of pgs then, _apply_primary_affinity is going to find that all osds reject being primary (1 * 1/2 * 1/2). It's not really all that much more correct to select 0 or 2 as primary in those cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@athanatos @gregsfortytwo , thanks a lot for the review. my recreation of the issue was to force the situation, and yes, it looks more complicate to fix now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use the reef flag maybe?
Note that with changes like this the kernel client must be kept in mind and consulted before the change is made. This would avoid mistakes like conditioning client-facing behavior on a field that lives in the OSD section in the osdmap, for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@idryomov 's point is accurate, the client definitely cares about the primary affinity behavior, so changing it would be very complicated. For the reasons I outlined above, no change is going to be particularly satisfying in the event that none of the osds for a PG have a primary affinity of 1, so I'm going to close this PR.
When zero primary affinity set for one or more of the osds that was primary _apply_primary_affinity can sometimes pick it again as primary, even with zero primary affinity was set and all the other osds in the acting set were not picked as primary.
The fix will set the fallback method to use the first osd as primary and was not set as primary affinity zero
Fixes: https://tracker.ceph.com/issues/44400
Signed-off-by: Nitzan Mordechai nmordech@redhat.com
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "pacific"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
Checklist
Show available Jenkins commands
jenkins retest this please
jenkins test classic perf
jenkins test crimson perf
jenkins test signed
jenkins test make check
jenkins test make check arm64
jenkins test submodules
jenkins test dashboard
jenkins test dashboard cephadm
jenkins test api
jenkins test docs
jenkins render docs
jenkins test ceph-volume all
jenkins test ceph-volume tox
jenkins test windows