Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

osd/OSDMap: apply primary affinity when zero affinity was set #49777

Conversation

NitzanMordhai
Copy link
Contributor

@NitzanMordhai NitzanMordhai commented Jan 18, 2023

When zero primary affinity set for one or more of the osds that was primary _apply_primary_affinity can sometimes pick it again as primary, even with zero primary affinity was set and all the other osds in the acting set were not picked as primary.
The fix will set the fallback method to use the first osd as primary and was not set as primary affinity zero

Fixes: https://tracker.ceph.com/issues/44400
Signed-off-by: Nitzan Mordechai nmordech@redhat.com

Contribution Guidelines

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows

When zero primary affinity set for one or more of the osds that was primary
_apply_primary_affinity can sometimes pick it again as primary, even with
zero primary affinity was set and all the other osds in the acting set were not
picked as primary.
The fix will set the fallback method to use the first osd as primary and was not
set as primary affinity zero

Fixes: https://tracker.ceph.com/issues/44400
Signed-off-by: Nitzan Mordechai <nmordech@redhat.com>
@NitzanMordhai NitzanMordhai requested a review from a team as a code owner January 18, 2023 06:06
@github-actions github-actions bot added the core label Jan 18, 2023
@NitzanMordhai
Copy link
Contributor Author

jenkins test make check

@NitzanMordhai
Copy link
Contributor Author

jenkins test windows

@dvanders
Copy link
Contributor

@NitzanMordhai thanks for the fix!
Could you confirm one thing: if a client that does not have this fix connects to a cluster that has this fix, will that client be able to correctly identify the primary?

@@ -2758,7 +2758,8 @@ void OSDMap::_apply_primary_affinity(ps_t seed,
seed, o) >> 16) >= a) {
// we chose not to use this primary. note it anyway as a
// fallback in case we don't pick anyone else, but keep looking.
if (pos < 0)
// don't use it if affinity is 0, though.
if (pos < 0 && a > 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems right for me but would prefer to have a review also from @athanatos or @gregsfortytwo.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can change this without a feature flag -- it changes the osd mapping for some OSDMaps and during an upgrade OSDs would disagree about the primary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this! Can we use the reef flag maybe?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's more complicated than that — this changes how the OSD will check historical mappings. I'm not sure if we've actually made this kind of change to how crush mappings work before — we just went to straw2 instead of trying to update straw, for instance.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, @gregsfortytwo is right. It's possible that that won't be a problem here because I can't think of a case where we care about which osd was primary for previous mappings. Nevertheless, need to think more carefully about this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, that's actually different than the situation described in the bug report. https://tracker.ceph.com/issues/44400 seems to be specifically about an osd with primary affinity of 0 being marked out resulting in a temporary acting set where up is different from acting and the out osd remains primary of the temporary acting set even with a primary affinity of 0. This patch shouldn't affect that case for the reason I outline above.

The behavior you are describing may be a second bug!

Copy link
Contributor

@athanatos athanatos Jan 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@NitzanMordhai I'm actually not sure the behavior you note above is worth fixing.

First, this specific fix is problematic because it may result in selecting no position to be the primary if all osds happen to have a primary affinity of 0, which would be invalid.

More generally, _apply_primary_affinity will only select a weight 0 osd if that osd is first and all other osds rejected the mapping. This is possible with your example because all of the osds other than the weight 0 one have weight .5 -- that's a very odd configuration where every osd is configured to reject some of its primary mappings (100% for osd 1 and 50% each for 0 and 2). For ~1/4 of pgs then, _apply_primary_affinity is going to find that all osds reject being primary (1 * 1/2 * 1/2). It's not really all that much more correct to select 0 or 2 as primary in those cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@athanatos @gregsfortytwo , thanks a lot for the review. my recreation of the issue was to force the situation, and yes, it looks more complicate to fix now.

Copy link
Contributor

@idryomov idryomov Jan 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use the reef flag maybe?

Note that with changes like this the kernel client must be kept in mind and consulted before the change is made. This would avoid mistakes like conditioning client-facing behavior on a field that lives in the OSD section in the osdmap, for example.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@idryomov 's point is accurate, the client definitely cares about the primary affinity behavior, so changing it would be very complicated. For the reasons I outlined above, no change is going to be particularly satisfying in the event that none of the osds for a PG have a primary affinity of 1, so I'm going to close this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
6 participants