
osd: fix recovery reservation bugs, and implement remote reservation preemption #18485

Merged

merged 9 commits into ceph:master from wip-remote-res-preemption on Oct 25, 2017

Conversation

liewegas (Member) commented Oct 23, 2017

Bugs:

  • re-schedule on recovery->backfill transition so that our priority is lowered
  • respect primary's priority for remote recovery reservation

...and implement remote recovery preemption. With this change we should always be
working on the highest-priority recovery, no matter what.

The remaining gap I see is that there is sometimes work we could be doing
but aren't, because reservations are taken in a strict primary-then-replicas order.

Signed-off-by: Sage Weil <sage@redhat.com>
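To make the preemption decision concrete, here is a minimal sketch of the choice a reserver has to make when a new request arrives and all slots are taken. The `Reservation` struct and `pick_preemption_victim` are illustrative names only; the real arbiter in Ceph is the AsyncReserver, whose API looks nothing like this.

```cpp
#include <map>
#include <string>

// Illustrative types; not the actual AsyncReserver API.
struct Reservation {
  std::string pgid;
  unsigned prio;  // higher value = more urgent work
};

// Return the granted reservation to revoke so `incoming` can take its
// slot, or nullptr if nothing should be preempted.
Reservation* pick_preemption_victim(
    std::map<std::string, Reservation>& granted,
    const Reservation& incoming) {
  Reservation* lowest = nullptr;
  for (auto& kv : granted) {
    if (!lowest || kv.second.prio < lowest->prio)
      lowest = &kv.second;
  }
  // Preempt only when the lowest-priority grant is strictly below the
  // incoming request; equal priorities wait rather than thrash.
  return (lowest && lowest->prio < incoming.prio) ? lowest : nullptr;
}
```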
This now mirrors the backfill approach (e.g., RequestBackfillPrio).

Signed-off-by: Sage Weil <sage@redhat.com>
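For readers unfamiliar with the pattern being mirrored: RequestBackfillPrio is a boost::statechart event carrying the requested priority. A sketch of what the recovery-side analogue presumably looks like (the name `RequestRecoveryPrio` and its exact shape are assumptions here):

```cpp
#include <boost/statechart/event.hpp>

// Assumed shape, mirroring RequestBackfillPrio in PG.h.
struct RequestRecoveryPrio
  : boost::statechart::event<RequestRecoveryPrio> {
  unsigned priority;  // priority to request the local reservation at
  explicit RequestRecoveryPrio(unsigned prio) : priority(prio) {}
};
```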
This way we match the terminology used by MRecoveryReserve.  It is also
a bit more suggestive of primary->replica action, whereas "cancel" could
mean the replica canceling its grant.

Document the meaning in the headers.

Signed-off-by: Sage Weil <sage@redhat.com>
We were sending REJECT if the replica filled up, and the primary would set
the BACKFILL_TOOFULL state as a result.  Make it an explicit verb for
clarity.

Signed-off-by: Sage Weil <sage@redhat.com>
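An illustrative sketch of the explicit-verb split this commit describes; the op names and values below are assumptions for illustration, not the actual MBackfillReserve definitions:

```cpp
// Illustrative only; not the real enum.
struct BackfillReserveOp {
  enum {
    REQUEST = 0,  // primary->replica: please reserve a slot
    GRANT   = 1,  // replica->primary: ok, reserved
    REJECT  = 2,  // replica->primary: can't reserve right now
    RELEASE = 3,  // primary->replica: done, release the slot
    TOOFULL = 4,  // replica->primary: explicit "I filled up" verb,
                  // instead of overloading REJECT
  };
};
```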
If we have granted a remote backfill reservation, and a higher priority
request comes in, send a REVOKE message back to the primary and drop the
reservation (allowing the higher-priority reservation to be GRANTed).

We can only do this if the primary is running new code because it must
understand the REVOKE message.

Signed-off-by: Sage Weil <sage@redhat.com>
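A rough, self-contained model of the replica-side control flow described here; all names are illustrative, and the real code sends an actual REVOKE message rather than calling a helper like this:

```cpp
#include <iostream>
#include <map>
#include <string>

struct Grant { unsigned prio; };

// Stub: in Ceph this would be a reservation message sent back to the
// reservation's primary.
static void send_revoke_to_primary(const std::string& pgid) {
  std::cout << "send REVOKE to primary of " << pgid << "\n";
}

// On a higher-priority incoming request, drop the lowest-priority
// grant so the newcomer can be granted, but only if the old primary
// runs new enough code to understand REVOKE.
void maybe_revoke_for(std::map<std::string, Grant>& granted,
                      unsigned incoming_prio,
                      bool primary_understands_revoke) {
  if (!primary_understands_revoke)
    return;  // old peer: fall back to pre-existing behavior
  auto victim = granted.end();
  for (auto it = granted.begin(); it != granted.end(); ++it)
    if (victim == granted.end() || it->second.prio < victim->second.prio)
      victim = it;
  if (victim != granted.end() && victim->second.prio < incoming_prio) {
    send_revoke_to_primary(victim->first);
    granted.erase(victim);  // slot freed; higher-prio request can be GRANTed
  }
}
```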
…ition

This is easier to follow than canceling the reservation in the next state.

Signed-off-by: Sage Weil <sage@redhat.com>
We were keeping our existing recovery reservation slot (with a high
priority) and going straight to waiting for backfill reservations on
the peers.  This is a problem because the reserver thinks we're doing
high priority work when we're actually doing lower-priority backfill.

Fix by closing out our recovery reservation and going to the
WaitLocalBackfillReserved state, where we'll re-request backfill at the
appropriate priority.

Signed-off-by: Sage Weil <sage@redhat.com>
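A toy model of the before/after behavior on this transition; apart from WaitLocalBackfillReserved, the state and member names below are made up for illustration:

```cpp
#include <iostream>

// Hypothetical model of the fix; only WaitLocalBackfillReserved is a
// real PG state machine state.
enum class State { Recovering, WaitLocalBackfillReserved, Backfilling };

struct PGModel {
  State state = State::Recovering;
  bool have_recovery_slot = true;  // local slot held at recovery priority

  // Called when log recovery finishes but backfill is still needed.
  void on_recovery_done_backfill_needed() {
    // Old behavior: keep the high-priority recovery slot and go
    // straight to waiting for remote backfill reservations, so the
    // local reserver thinks we are still doing high-priority work.
    //
    // Fixed behavior: close out the recovery reservation first...
    have_recovery_slot = false;
    // ...then re-request a local slot at backfill priority.
    state = State::WaitLocalBackfillReserved;
    std::cout << "re-requesting local reservation at backfill priority\n";
  }
};

int main() {
  PGModel pg;
  pg.on_recovery_done_backfill_needed();
}
```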
liewegas (Member, Author) commented:

Only some of this can be backported because of the protocol changes. See #18498

-    GRANT = 1,
-    RELEASE = 2,
+    REQUEST = 0,  // primary->replica: please reserve slot
+    GRANT = 1,    // replica->primary: ok, i reserved it

A review comment on this hunk:
NIT: tabs left

liewegas (Member, Author) replied:
what does "tabs left" mean?

Signed-off-by: Sage Weil <sage@redhat.com>
…backfill

We may have log recovery *and* backfill to do, but cease to be degraded
as soon as the log recovery portion is done.  If that's the case, clear
the DEGRADED bit so that the PG state is not misleading.

Signed-off-by: Sage Weil <sage@redhat.com>
liewegas (Member, Author) commented:

http://pulpito.ceph.com/sage-2017-10-24_19:15:36-rados-wip-remote-res-preemption-distro-basic-smithi/

only non-noisy failure in there was the bluefs bug fixed by #18503

@@ -11231,6 +11231,9 @@ bool PrimaryLogPG::start_recovery_ops(
   if (state_test(PG_STATE_RECOVERING)) {
     state_clear(PG_STATE_RECOVERING);
     state_clear(PG_STATE_FORCED_RECOVERY);
+    if (get_osdmap()->get_pg_size(info.pgid.pgid) <= acting.size()) {
+      state_clear(PG_STATE_DEGRADED);
+    }
A contributor commented on this hunk:

FYI, In the future I'm hoping to base PG_STATE_DEGRADED on the num_objects_degraded count.

dzafman (Contributor) left a comment:

Looks good.

liewegas merged commit 92141dc into ceph:master on Oct 25, 2017
liewegas deleted the wip-remote-res-preemption branch on October 25, 2017 19:47