osd/PeeringState: do not exclude up from acting_recovery_backfill #31703
Conversation
@@ -1703,9 +1703,6 @@ void PeeringState::calc_replicated_acting(
       acting_backfill->insert(up_cand);
       ss << " osd." << i << " (up) accepted " << cur_info << std::endl;
     }
-    if (want->size() >= size) {
@liewegas it's wrong to break here, as we still want all up peers to go into the acting_backfill set.
retest this please
It might be clearer to show this as a revert of c3e2990 and a commit with the alternate fix.
I tried. The conflicts from reverting are huge (pg.cc -> PeeringState.cc) :-(
The code moved to src/osd/PeeringState.cc. You could do the following:
This reverts commit c3e2990. Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
@dzafman Thanks!
looks good to me and also addresses https://tracker.ceph.com/issues/35924
retest this please
@tchaikov Thanks!
If we choose a primary that does not belong to the current up set,
and all up peers are still recoverable, then we might end up excluding
some up peer from the acting_recovery_backfill set too due to the
"want size <= pool size" constraint (since #24035),
as a result of which all up peers might not get recovered in one go.
Fix by falling through any oversized want set to async recovery, which
should be able to handle it nicely.
Fixes: https://tracker.ceph.com/issues/42577
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
Checklist
Show available Jenkins commands
jenkins retest this please
jenkins test crimson perf
jenkins test signed
jenkins test make check
jenkins test make check arm64
jenkins test submodules
jenkins test dashboard
jenkins test dashboard backend
jenkins test docs
jenkins render docs
jenkins test ceph-volume all
jenkins test ceph-volume tox