
osd: fix recovery reservation bugs, and implement remote reservation preemption #18485

Merged

merged 9 commits into ceph:master from wip-remote-res-preemption on Oct 25, 2017

Conversation

liewegas (Member) commented Oct 23, 2017

Bugs:

  • re-schedule on recovery->backfill transition so that our priority is lowered
  • respect primary's priority for remote recovery reservation

...and implement remote recovery preemption. With this change we should always be
working on the highest-priority recovery, no matter what.

The remaining gap I see is that there is sometimes work we could be doing
but aren't, because reservations are taken in a strict primary-then-replicas order.

Signed-off-by: Sage Weil <sage@redhat.com>
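To make the preemption decision concrete, here is a minimal sketch of the choice a reserver has to make when a new request arrives and all slots are taken. The `Reservation` struct and `pick_preemption_victim` are illustrative names only; the real arbiter in Ceph is the AsyncReserver, whose API looks nothing like this.

```cpp
#include <map>
#include <string>

// Illustrative types; not the actual AsyncReserver API.
struct Reservation {
  std::string pgid;
  unsigned prio;  // higher value = more urgent work
};

// Return the granted reservation to revoke so `incoming` can take its
// slot, or nullptr if nothing should be preempted.
Reservation* pick_preemption_victim(
    std::map<std::string, Reservation>& granted,
    const Reservation& incoming) {
  Reservation* lowest = nullptr;
  for (auto& kv : granted) {
    if (!lowest || kv.second.prio < lowest->prio)
      lowest = &kv.second;
  }
  // Preempt only when the lowest-priority grant is strictly below the
  // incoming request; equal priorities wait rather than thrash.
  return (lowest && lowest->prio < incoming.prio) ? lowest : nullptr;
}
```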
This now mirrors the backfill approach (e.g., RequestBackfillPrio).

Signed-off-by: Sage Weil <sage@redhat.com>
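For readers unfamiliar with the pattern being mirrored: RequestBackfillPrio is a boost::statechart event carrying the requested priority. A sketch of what the recovery-side analogue presumably looks like (the name `RequestRecoveryPrio` and its exact shape are assumptions here):

```cpp
#include <boost/statechart/event.hpp>

// Assumed shape, mirroring RequestBackfillPrio in PG.h.
struct RequestRecoveryPrio
  : boost::statechart::event<RequestRecoveryPrio> {
  unsigned priority;  // priority to request the local reservation at
  explicit RequestRecoveryPrio(unsigned prio) : priority(prio) {}
};
```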
This way we match the terminology used by MRecoveryReserve.  It is also
a bit more suggestive of primary->replica action, whereas "cancel" could
mean the replica canceling its grant.

Document the meaning in the headers.

Signed-off-by: Sage Weil <sage@redhat.com>
We were sending REJECT if the replica filled up, and the primary would set
the BACKFILL_TOOFULL state as a result.  Make it an explicit verb for
clarity.

Signed-off-by: Sage Weil <sage@redhat.com>
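An illustrative sketch of the explicit-verb split this commit describes; the op names and values below are assumptions for illustration, not the actual MBackfillReserve definitions:

```cpp
// Illustrative only; not the real enum.
struct BackfillReserveOp {
  enum {
    REQUEST = 0,  // primary->replica: please reserve a slot
    GRANT   = 1,  // replica->primary: ok, reserved
    REJECT  = 2,  // replica->primary: can't reserve right now
    RELEASE = 3,  // primary->replica: done, release the slot
    TOOFULL = 4,  // replica->primary: explicit "I filled up" verb,
                  // instead of overloading REJECT
  };
};
```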
If we have granted a remote backfill reservation, and a higher priority
request comes in, send a REVOKE message back to the primary and drop the
reservation (allowing the higher-priority reservation to be GRANTed).

We can only do this if the primary is running new code because it must
understand the REVOKE message.

Signed-off-by: Sage Weil <sage@redhat.com>
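A rough, self-contained model of the replica-side control flow described here; all names are illustrative, and the real code sends an actual REVOKE message rather than calling a helper like this:

```cpp
#include <iostream>
#include <map>
#include <string>

struct Grant { unsigned prio; };

// Stub: in Ceph this would be a reservation message sent back to the
// reservation's primary.
static void send_revoke_to_primary(const std::string& pgid) {
  std::cout << "send REVOKE to primary of " << pgid << "\n";
}

// On a higher-priority incoming request, drop the lowest-priority
// grant so the newcomer can be granted, but only if the old primary
// runs new enough code to understand REVOKE.
void maybe_revoke_for(std::map<std::string, Grant>& granted,
                      unsigned incoming_prio,
                      bool primary_understands_revoke) {
  if (!primary_understands_revoke)
    return;  // old peer: fall back to pre-existing behavior
  auto victim = granted.end();
  for (auto it = granted.begin(); it != granted.end(); ++it)
    if (victim == granted.end() || it->second.prio < victim->second.prio)
      victim = it;
  if (victim != granted.end() && victim->second.prio < incoming_prio) {
    send_revoke_to_primary(victim->first);
    granted.erase(victim);  // slot freed; higher-prio request can be GRANTed
  }
}
```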
…ition

This is easier to follow than canceling the reservation in the next state.

Signed-off-by: Sage Weil <sage@redhat.com>
We were keeping our existing recovery reservation slot (with a high
priority) and going straight to waiting for backfill reservations on
the peers.  This is a problem because the reserver thinks we're doing
high priority work when we're actually doing lower-priority backfill.

Fix by closing out our recovery reservation and going to the
WaitLocalBackfillReserved state, where we'll re-request backfill at the
appropriate priority.

Signed-off-by: Sage Weil <sage@redhat.com>
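A toy model of the before/after behavior on this transition; apart from WaitLocalBackfillReserved, the state and member names below are made up for illustration:

```cpp
#include <iostream>

// Hypothetical model of the fix; only WaitLocalBackfillReserved is a
// real PG state machine state.
enum class State { Recovering, WaitLocalBackfillReserved, Backfilling };

struct PGModel {
  State state = State::Recovering;
  bool have_recovery_slot = true;  // local slot held at recovery priority

  // Called when log recovery finishes but backfill is still needed.
  void on_recovery_done_backfill_needed() {
    // Old behavior: keep the high-priority recovery slot and go
    // straight to waiting for remote backfill reservations, so the
    // local reserver thinks we are still doing high-priority work.
    //
    // Fixed behavior: close out the recovery reservation first...
    have_recovery_slot = false;
    // ...then re-request a local slot at backfill priority.
    state = State::WaitLocalBackfillReserved;
    std::cout << "re-requesting local reservation at backfill priority\n";
  }
};

int main() {
  PGModel pg;
  pg.on_recovery_done_backfill_needed();
}
```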
liewegas (Member, Author) commented:

Only some of this can be backported because of the protocol changes. See #18498

-    GRANT = 1,
-    RELEASE = 2,
+    REQUEST = 0,  // primary->replica: please reserve slot
+    GRANT = 1,    // replica->primary: ok, i reserved it

A review comment on this hunk:
NIT: tabs left

liewegas (Member, Author) replied:
what does "tabs left" mean?

Signed-off-by: Sage Weil <sage@redhat.com>
…backfill

We may have log recovery *and* backfill to do, but cease to be degraded
as soon as the log recovery portion is done.  If that's the case, clear
the DEGRADED bit so that the PG state is not misleading.

Signed-off-by: Sage Weil <sage@redhat.com>
liewegas (Member, Author) commented:

http://pulpito.ceph.com/sage-2017-10-24_19:15:36-rados-wip-remote-res-preemption-distro-basic-smithi/

only non-noisy failure in there was the bluefs bug fixed by #18503

@@ -11231,6 +11231,9 @@ bool PrimaryLogPG::start_recovery_ops(
   if (state_test(PG_STATE_RECOVERING)) {
     state_clear(PG_STATE_RECOVERING);
     state_clear(PG_STATE_FORCED_RECOVERY);
+    if (get_osdmap()->get_pg_size(info.pgid.pgid) <= acting.size()) {
+      state_clear(PG_STATE_DEGRADED);
+    }
A contributor commented on this hunk:

FYI, In the future I'm hoping to base PG_STATE_DEGRADED on the num_objects_degraded count.

dzafman (Contributor) left a comment:

Looks good.

liewegas merged commit 92141dc into ceph:master on Oct 25, 2017
liewegas deleted the wip-remote-res-preemption branch on October 25, 2017 19:47