osd/PG: fix recovery op leak #18524

liewegas · 2017-10-25T03:34:28Z

This fixes a recovery op leak recently introduced by the recovery preemption
changes and the new DeferRecovery stuff. Or possibly a bit earlier? Either way, I think this fixes it.

/a/sage-2017-10-24_22:15:33-rados-wip-sage-testing-2017-10-24-1544-distro-basic-smithi/1771206

liewegas · 2017-10-25T03:37:14Z

@dzafman I think this may have actually come from e708410, which cleared waiting_on_backfill but didn't call finish_recovery_op(max) if it was non-empty. I made the same mistake ("wth is this finish_recovery_op for? i'll leave it off") when I added the unfound and rejected cases recently.

In any case, I think this patch clears it up?

hjwsm1989 · 2017-10-25T05:54:26Z

We are also hitting this problem: the backfilling state is not cleared, with rops > 0.

dzafman · 2017-10-25T06:35:42Z

src/osd/PG.cc

  pg->state_set(PG_STATE_BACKFILL_TOOFULL);
-


I'd feel better if we cleared PG_STATE_BACKFILLING here like the other 2 cases. There is a case where we are able to get reservations, but PrimaryLogPG::do_scan() noticed that we've gone over the backfill_full_ratio while backfilling. We might as well.

If you don't want to include that here, I'll put it in later.

liewegas · 2017-10-25

Updated! Also added more detail to the commit msg (when the bug was introduced and then duplicated).

liewegas · 2017-10-26T02:49:57Z

http://pulpito.ceph.com/sage-2017-10-25_20:20:08-rados-wip-sage2-testing-2017-10-25-1347-distro-basic-smithi/


Previously, there was only one time we would end up in this region of code: when the backfill was rejected by the peer. At that point we apparently always had an outstanding SCAN request, because we would unconditionally cancel the MAX recovery op and clear waiting_on_backfill. See 624aaf2 for when this code appeared.

Now we have several similar paths, and we don't always have an outstanding SCAN (I don't think!). Regardless, move most of these three cases into a common helper and make the finish_recovery_op call conditional on whether there is an outstanding SCAN.

This fixes a leak of a recovery op when we defer while a SCAN is outstanding (this bug was recently introduced by e708410 and then duplicated by 2463c64).

Note that there is still one other time we register MAX ops: when we are finishing backfill. There, we start one per target. But we will always get back our reply and process it in the normal way (that old commit did not change the timing for these).

Signed-off-by: Sage Weil <sage@redhat.com>
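The leak described in this commit message can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the code in src/osd/PG.cc: `ToyPG`, `send_scans`, `cancel_backfill_buggy`, and `cancel_backfill_fixed` are hypothetical names, peers are plain integers, and the MAX object is represented by a string. It shows why clearing `waiting_on_backfill` without a matching `finish_recovery_op` leaves an op counted forever, and why the finish has to be conditional.

```cpp
#include <cassert>
#include <set>
#include <string>

// Toy stand-in for a PG's recovery bookkeeping (hypothetical, illustration only).
struct ToyPG {
  int recovery_ops_active = 0;        // ops started but not yet finished
  std::set<int> waiting_on_backfill;  // peers with an outstanding SCAN

  static std::string max_oid() { return "MAX"; }  // placeholder for the MAX object

  void start_recovery_op(const std::string&) { ++recovery_ops_active; }
  void finish_recovery_op(const std::string&) {
    assert(recovery_ops_active > 0);
    --recovery_ops_active;
  }

  // Sending SCANs registers a single MAX op and remembers every peer asked.
  void send_scans(const std::set<int>& peers) {
    if (peers.empty())
      return;
    start_recovery_op(max_oid());
    waiting_on_backfill.insert(peers.begin(), peers.end());
  }

  // Buggy variant: drops the wait set but forgets the op that went with it.
  void cancel_backfill_buggy() { waiting_on_backfill.clear(); }

  // Fixed variant: finish the MAX op only if a SCAN was actually outstanding.
  void cancel_backfill_fixed() {
    if (!waiting_on_backfill.empty()) {
      waiting_on_backfill.clear();
      finish_recovery_op(max_oid());
    }
  }
};
```

In this model the buggy cancel leaves `recovery_ops_active` at 1 after a deferred backfill, while the fixed cancel returns it to 0 and is a no-op when no SCAN is outstanding.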


liewegas · 2017-10-26T22:48:10Z

http://pulpito.ceph.com/sage-2017-10-26_18:06:33-rados-wip-sage2-testing-2017-10-26-1123-distro-basic-smithi/

liewegas requested a review from dzafman October 25, 2017 03:35

liewegas added bug-fix core labels Oct 25, 2017

liewegas mentioned this pull request Oct 25, 2017

luminous: osd: fix recovery priority and pg state on recovery->backfill transition #18498

Merged

dzafman reviewed Oct 25, 2017


dzafman approved these changes Oct 25, 2017


liewegas force-pushed the wip-backfill-rops branch from 5014783 to b6dfb20 Compare October 25, 2017 15:38

liewegas added needs-qa wip-sage2-testing labels Oct 25, 2017

liewegas added 2 commits October 25, 2017 21:50

osd/PG: make recovering_oids a multiset

ebb4093

For multiple backfill targets, we start MAX multiple times. Signed-off-by: Sage Weil <sage@redhat.com>
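A rough sketch of why a multiset matters when the same object (here MAX) can be started more than once. `OidTracker` and its methods are hypothetical names chosen for illustration; only the `recovering_oids` name comes from the commit title, and the real bookkeeping may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical tracker for in-flight recovery ops, keyed by object name.
struct OidTracker {
  std::multiset<std::string> recovering_oids;

  // Each start is recorded separately, even for a repeated key.
  void start(const std::string& oid) { recovering_oids.insert(oid); }

  // Erase exactly one instance. Note that multiset::erase(key) would
  // remove every instance of the key, so erase by iterator instead.
  void finish(const std::string& oid) {
    auto it = recovering_oids.find(oid);
    assert(it != recovering_oids.end());
    recovering_oids.erase(it);
  }

  std::size_t in_flight(const std::string& oid) const {
    return recovering_oids.count(oid);
  }
};
```

With a plain `std::set`, a second `start("MAX")` would collapse into the first entry, so the first `finish("MAX")` would make the tracker look idle while another op was still outstanding.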

liewegas force-pushed the wip-backfill-rops branch from b6dfb20 to efd1a77 Compare October 26, 2017 02:52

osd/PG: handle remote backfill revocation while waiting for other targets

8afd4ec

If we have multiple targets, we may still be waiting on them when we get a revocation. Signed-off-by: Sage Weil <sage@redhat.com>
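One way to picture the situation in this commit message is a primary that collects reservation grants from several targets and can receive a revocation before all of them have answered. The sketch below is an assumption-laden toy, not the PG state machine: `ToyReserver`, `Phase`, `on_grant`, and `on_revoke` are all hypothetical names, and targets are plain integers.

```cpp
#include <cassert>
#include <set>

enum class Phase { WaitingForGrants, Backfilling, Deferred };

// Hypothetical primary-side view of backfill reservations.
struct ToyReserver {
  Phase phase = Phase::WaitingForGrants;
  std::set<int> pending;  // targets that have not granted yet
  std::set<int> granted;  // targets that already granted

  explicit ToyReserver(const std::set<int>& targets) : pending(targets) {}

  void on_grant(int target) {
    if (phase != Phase::WaitingForGrants || pending.erase(target) == 0)
      return;  // stale or unexpected grant: ignored in this toy model
    granted.insert(target);
    if (pending.empty())
      phase = Phase::Backfilling;
  }

  // A revocation from a target that already granted is accepted in either
  // active phase, including while other targets are still pending.
  void on_revoke(int target) {
    if (phase == Phase::Deferred || granted.count(target) == 0)
      return;
    pending.clear();
    granted.clear();
    phase = Phase::Deferred;
  }
};
```

The point of the sketch is only that the revoke event needs a handler in the waiting phase as well as in the backfilling phase; how the real code releases the remaining reservations is not shown here.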

liewegas merged commit 3f6e0b6 into ceph:master Oct 26, 2017

liewegas deleted the wip-backfill-rops branch October 26, 2017 22:48
