Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems right for me but would prefer to have a review also from @athanatos or @gregsfortytwo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we can change this without a feature flag -- it changes the osd mapping for some OSDMaps and during an upgrade OSDs would disagree about the primary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching this! Can we use the reef flag maybe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's more complicated than that — this changes how the OSD will check historical mappings. I'm not sure if we've actually made this kind of change to how crush mappings work before — we just went to straw2 instead of trying to update straw, for instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, @gregsfortytwo is right. It's possible that that won't be a problem here because I can't think of a case where we care about which osd was primary for previous mappings. Nevertheless, need to think more carefully about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, that's actually different than the situation described in the bug report. https://tracker.ceph.com/issues/44400 seems to be specifically about an osd with primary affinity of 0 being marked out resulting in a temporary acting set where up is different from acting and the out osd remains primary of the temporary acting set even with a primary affinity of 0. This patch shouldn't affect that case for the reason I outline above.
The behavior you are describing may be a second bug!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@NitzanMordhai I'm actually not sure the behavior you note above is worth fixing.
First, this specific fix is problematic because it may result in selecting no position to be the primary if all osds happen to have a primary affinity of 0, which would be invalid.
More generally, _apply_primary_affinity will only select a weight 0 osd if that osd is first and all other osds rejected the mapping. This is possible with your example because all of the osds other than the weight 0 one have weight .5 -- that's a very odd configuration where every osd is configured to reject some of its primary mappings (100% for osd 1 and 50% each for 0 and 2). For ~1/4 of pgs then, _apply_primary_affinity is going to find that all osds reject being primary (1 * 1/2 * 1/2). It's not really all that much more correct to select 0 or 2 as primary in those cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@athanatos @gregsfortytwo , thanks a lot for the review. my recreation of the issue was to force the situation, and yes, it looks more complicate to fix now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that with changes like this the kernel client must be kept in mind and consulted before the change is made. This would avoid mistakes like conditioning client-facing behavior on a field that lives in the OSD section in the osdmap, for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@idryomov 's point is accurate, the client definitely cares about the primary affinity behavior, so changing it would be very complicated. For the reasons I outlined above, no change is going to be particularly satisfying in the event that none of the osds for a PG have a primary affinity of 1, so I'm going to close this PR.