osd/PeeringState: do not exclude up from acting_recovery_backfill
If we choose a primary that does not belong to the current up set
while all up peers are still recoverable, the "want size <= pool size"
constraint (introduced in #24035) can exclude an up peer from the
acting_recovery_backfill set. As a result, the up peers might not all
be recovered in one go.

Fix by letting any oversized want set fall through to async recovery,
which should be able to handle it, and by trimming whatever is still
oversized afterwards.

Fixes: https://tracker.ceph.com/issues/42577
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
xiexingguo committed Nov 20, 2019
1 parent 82bb83f commit 22c8cda
Showing 1 changed file with 8 additions and 0 deletions.
8 changes: 8 additions & 0 deletions src/osd/PeeringState.cc
@@ -2077,6 +2077,14 @@ bool PeeringState::choose_acting(pg_shard_t &auth_log_shard_id,
                                        get_osdmap());
     }
   }
+  while (want.size() > pool.info.size) {
+    // async recovery should have taken out as many osds as it can.
+    // if not, then always evict the last peer
+    // (will get synchronously recovered later)
+    psdout(10) << __func__ << " evicting osd." << want.back()
+               << " from oversized want " << want << dendl;
+    want.pop_back();
+  }
   if (want != acting) {
     psdout(10) << __func__ << " want " << want << " != acting " << acting
                << ", requesting pg_temp change" << dendl;
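
The trimming step added in this hunk can be sketched in isolation. The following is a minimal illustration, not the Ceph implementation: `trim_oversized_want` is a hypothetical helper, `want` is assumed to be a plain vector of OSD ids with the primary first, and `pool_size` stands in for `pool.info.size`. Logging and the preceding async recovery selection are left out.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the loop added to choose_acting().
// `want` is assumed to hold OSD ids with the chosen primary first;
// `pool_size` plays the role of pool.info.size.
// Returns the evicted OSD ids in the order they were removed.
std::vector<int> trim_oversized_want(std::vector<int>& want,
                                     std::size_t pool_size) {
  std::vector<int> evicted;
  // Evict from the back until the set fits the pool size. Per the
  // commit, the evicted peers get synchronously recovered later.
  while (want.size() > pool_size) {
    evicted.push_back(want.back());
    want.pop_back();
  }
  return evicted;
}
```

For example, with a pool size of 3, a primary osd.5 outside the up set, and up peers 0, 1 and 2, a `want` of `{5, 0, 1, 2}` is trimmed to `{5, 0, 1}` and osd.2 is evicted. A `want` that already fits the pool size is left unchanged.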