osd: fix recovery reservation bugs, and implement remote reservation preemption #18485
Signed-off-by: Sage Weil <sage@redhat.com>
This now mirrors the backfill approach (e.g., RequestBackfillPrio). Signed-off-by: Sage Weil <sage@redhat.com>
This way we match the terminology used by MRecoveryReserve. It is also a bit more suggestive of primary->replica action, whereas "cancel" could mean the replica canceling its grant. Document the meaning of each verb in the headers. Signed-off-by: Sage Weil <sage@redhat.com>
We were sending REJECT if the replica filled up, and the primary would set the BACKFILL_TOOFULL state as a result. Make it an explicit verb for clarity. Signed-off-by: Sage Weil <sage@redhat.com>
If we have granted a remote backfill reservation, and a higher priority request comes in, send a REVOKE message back to the primary and drop the reservation (allowing the higher-priority reservation to be GRANTed). We can only do this if the primary is running new code because it must understand the REVOKE message. Signed-off-by: Sage Weil <sage@redhat.com>
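The preemption behaviour described in the commit above can be sketched with a toy replica-side reserver. This is only an illustration under stated assumptions: `ToyReserver`, its methods, and the callback shape are hypothetical names invented for this sketch and are not Ceph's reserver API. The on_revoke callback stands in for sending a REVOKE message back to the primary, and a revoked holder is simply dropped here on the assumption that the primary would re-request later.

```cpp
#include <functional>
#include <iterator>
#include <map>
#include <string>
#include <utility>

// Toy model of a slot reserver with preemption (hypothetical, not Ceph code).
class ToyReserver {
 public:
  using Callback = std::function<void()>;

  explicit ToyReserver(unsigned max_slots) : max_slots_(max_slots) {}

  // Queue a request for `pg` at priority `prio` (a higher value wins).
  void request(const std::string& pg, unsigned prio,
               Callback on_grant, Callback on_revoke) {
    waiting_.emplace(
        prio, Entry{pg, prio, std::move(on_grant), std::move(on_revoke)});
    schedule();
  }

  // The holder gives its slot back voluntarily (the RELEASE case).
  void release(const std::string& pg) {
    granted_.erase(pg);
    schedule();
  }

  bool is_granted(const std::string& pg) const {
    return granted_.count(pg) > 0;
  }

 private:
  struct Entry {
    std::string pg;
    unsigned prio;
    Callback on_grant;
    Callback on_revoke;
  };

  void schedule() {
    while (!waiting_.empty()) {
      // Highest waiting priority; the oldest request wins among equals.
      unsigned top = std::prev(waiting_.end())->first;
      auto best = waiting_.lower_bound(top);

      if (granted_.size() >= max_slots_) {
        // All slots are taken: look for the lowest-priority current holder.
        auto victim = granted_.end();
        for (auto it = granted_.begin(); it != granted_.end(); ++it) {
          if (victim == granted_.end() ||
              it->second.prio < victim->second.prio) {
            victim = it;
          }
        }
        // Preempt only for a strictly higher-priority request.
        if (victim == granted_.end() || victim->second.prio >= top) {
          return;
        }
        Callback revoke = std::move(victim->second.on_revoke);
        granted_.erase(victim);
        if (revoke) revoke();  // stands in for sending REVOKE
      }

      Entry e = std::move(best->second);
      waiting_.erase(best);
      std::string pg = e.pg;
      Callback grant = e.on_grant;
      granted_.emplace(pg, std::move(e));
      if (grant) grant();  // stands in for sending GRANT
    }
  }

  unsigned max_slots_;
  std::multimap<unsigned, Entry> waiting_;  // keyed by priority
  std::map<std::string, Entry> granted_;    // keyed by PG name
};
```

With a single slot, a request at priority 10 is granted first; a later request at priority 20 causes the first holder to be revoked and the second to be granted. A request at equal priority waits rather than preempting.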
…ition This is easier to follow than canceling the reservation in the next state. Signed-off-by: Sage Weil <sage@redhat.com>
We were keeping our existing recovery reservation slot (with a high priority) and going straight to waiting for backfill reservations on the peers. This is a problem because the reserver thinks we're doing high priority work when we're actually doing lower-priority backfill. Fix by closing out our recovery reservation and going to the WaitLocalBackfillReserved state, where we'll re-request backfill at the appropriate priority. Signed-off-by: Sage Weil <sage@redhat.com>
f475ebb to 3e91fda
Only some of this can be backported because of the protocol changes. See #18498
GRANT = 1,
RELEASE = 2,
REQUEST = 0, // primary->replica: please reserve slot
GRANT = 1, // replica->primary: ok, i reserved it
NIT: tabs left
What does "tabs left" mean?
Signed-off-by: Sage Weil <sage@redhat.com>
…backfill We may have log recovery *and* backfill to do, but cease to be degraded as soon as the log recovery portion is done. If that's the case, clear the DEGRADED bit so that the PG state is not misleading. Signed-off-by: Sage Weil <sage@redhat.com>
3e91fda to 2207607
The only non-noisy failure in there was the bluefs bug fixed by #18503
@@ -11231,6 +11231,9 @@ bool PrimaryLogPG::start_recovery_ops(
  if (state_test(PG_STATE_RECOVERING)) {
    state_clear(PG_STATE_RECOVERING);
    state_clear(PG_STATE_FORCED_RECOVERY);
    if (get_osdmap()->get_pg_size(info.pgid.pgid) <= acting.size()) {
      state_clear(PG_STATE_DEGRADED);
FYI, in the future I'm hoping to base PG_STATE_DEGRADED on the num_objects_degraded count.
Looks good.
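For readers following the DEGRADED change in the hunk above, the condition can be modelled in isolation. This is a rough sketch only: the `TOY_*` bit values and `finish_log_recovery` are hypothetical names for illustration and do not match Ceph's actual PG_STATE_* constants or any function in PrimaryLogPG. It assumes, as the diff does, that the PG is no longer degraded once the acting set is at least as large as the pool's configured size.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical state bits for illustration; not Ceph's PG_STATE_* values.
constexpr std::uint32_t TOY_RECOVERING = 1u << 0;
constexpr std::uint32_t TOY_FORCED_RECOVERY = 1u << 1;
constexpr std::uint32_t TOY_DEGRADED = 1u << 2;

// Returns the PG state after log recovery finishes.
// pool_size models get_pg_size(); acting_size models acting.size().
inline std::uint32_t finish_log_recovery(std::uint32_t state,
                                         std::size_t pool_size,
                                         std::size_t acting_size) {
  if (!(state & TOY_RECOVERING)) {
    return state;  // not recovering: nothing to clear
  }
  state &= ~(TOY_RECOVERING | TOY_FORCED_RECOVERY);
  if (pool_size <= acting_size) {
    // Enough OSDs in the acting set: only backfill remains, so the
    // DEGRADED bit would be misleading.
    state &= ~TOY_DEGRADED;
  }
  return state;
}
```

For a pool of size 3, finishing log recovery with 3 acting OSDs clears the degraded bit, while finishing with only 2 acting OSDs leaves it set.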
Bugs:
...and implement remote recovery preemption. With this change we should always be
working on the highest priority recovery, no matter what.
The remaining gap that I see is that there is sometimes work that we could be doing
but aren't because of the primary-then-replicas lock ordering approach.