Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

osd: fix recovery reservation bugs, and implement remote reservation preemption #18485

merged 9 commits into from Oct 25, 2017


None yet
3 participants
Copy link

commented Oct 23, 2017


  • re-schedule on recovery->backfill transition so that our priority is lowered
  • respect primary's priority for remote recovery reservation

...and implement remote recovery preemption. With this change we should always be
working on the highest priority recovery, no matter what.

The remaining gap that I see is that there is sometimes work that we could be doing
but aren't because of the ordered primary-then-replicas lock ordering approach.

liewegas added some commits Oct 22, 2017

messages/MRecoveryReserve: pass priority to replica
Signed-off-by: Sage Weil <>
osd/PG: respect primary's priority for remote recovery reservation
This now mirrors the backfill approach (e.g., RequestBackfillPrio).

Signed-off-by: Sage Weil <>
messages/MBackfillReserve: rename CANCEL -> RELEASE
This way me match the terminology used by MRecoveryReserve.  It is also
a bit more suggestive of primary->replica action, whereas "cancel" could
mean replica canceling its grant.

Document the meaning in the headers to clarify meaning.

Signed-off-by: Sage Weil <>
osd/PG: explicit TOOFULL verb for backfill cancellation
We were sending REJECT if the replica filled up, and the primary would set
the BACKFILL_TOOFULL state as a result.  Make it an explicit verb for

Signed-off-by: Sage Weil <>
osd/PG: allow preemption of remote backfill reservations
If we have granted a remote backfill reservation, and a higher priority
request comes in, send a REVOKE message back to the primary and drop the
reservation (allowing the higher-priority reservation to be GRANTed).

We can only do this if the primary is running new code because it must
understand the REVOKE message.

Signed-off-by: Sage Weil <>
osd/PG: move local_reserver recovery cancel to Recovering state trans…

This is easier to follow than canceling the reservation in the next state.

Signed-off-by: Sage Weil <>
osd/PG: on recovery done, requeue for backfill
We were keeping our existing recovery reservation slot (with a high
priority) and going straight to waiting for backfill reservations on
the peers.  This is a problem because the reserver thinks we're doing
high priority work when we're actually doing lower-priority backfill.

Fix by closing out our recovery reservation and going to the
WaitLocalBackfillReserved state, where we'll re-request backfill at the
appropriate priority.

Signed-off-by: Sage Weil <>

@liewegas liewegas requested a review from gregsfortytwo Oct 23, 2017

@liewegas liewegas force-pushed the liewegas:wip-remote-res-preemption branch 2 times, most recently from f475ebb to 3e91fda Oct 23, 2017

@liewegas liewegas requested a review from dzafman Oct 23, 2017


This comment has been minimized.

Copy link
Member Author

commented Oct 23, 2017

Only some of this can be backported because of the protocol changes. See #18498

GRANT = 1,
REQUEST = 0, // primary->replica: please reserve slot
GRANT = 1, // replica->primary: ok, i reserved it

This comment has been minimized.

Copy link

amitkumar50 Oct 24, 2017


NIT: tabs left

This comment has been minimized.

Copy link

liewegas Oct 25, 2017

Author Member

what does "tabs left" mean?

liewegas added some commits Oct 23, 2017

osd/PG: simplify Recovering state change
Signed-off-by: Sage Weil <>
osd/PrimaryLogPG: clear DEGRADED at recovery completion even if more …

We may have log recovery *and* backfill to do, but cease to be degraded
as soon as the log recovery portion is done.  If that's the case, clear
the DEGRADED bit so that the PG state is not misleading.

Signed-off-by: Sage Weil <>

@liewegas liewegas force-pushed the liewegas:wip-remote-res-preemption branch from 3e91fda to 2207607 Oct 24, 2017


This comment has been minimized.

Copy link
Member Author

commented Oct 25, 2017

@@ -11231,6 +11231,9 @@ bool PrimaryLogPG::start_recovery_ops(
if (state_test(PG_STATE_RECOVERING)) {
if (get_osdmap()->get_pg_size(info.pgid.pgid) <= acting.size()) {

This comment has been minimized.

Copy link

dzafman Oct 25, 2017


FYI, In the future I'm hoping to base PG_STATE_DEGRADED on the num_objects_degraded count.

Copy link

left a comment

Looks good.

@liewegas liewegas merged commit 92141dc into ceph:master Oct 25, 2017

4 of 5 checks passed

make check make check failed
Docs: build check OK - docs built
Signed-off-by all commits in this PR are signed
Unmodified Submodules submodules for project are unmodified
make check (arm64) make check succeeded

@liewegas liewegas deleted the liewegas:wip-remote-res-preemption branch Oct 25, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.