Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

osd/PrimaryLogPG: defer evict if head *or* object intersect scrub interval #21628

Merged
merged 1 commit into from Apr 25, 2018

Conversation

Projects
None yet
2 participants
@liewegas
Copy link
Member

liewegas commented Apr 24, 2018

Consider a scenario like:

  • scrub [3:2525d100:::earlier:head,3:2525d12f:::foo:200]
  • we see 3:2525d12f:::foo:100 and include it in scrub map
  • scrub [3:2525d12f:::foo:200, 3:2525dfff:::later:head]
  • some op(s) that cause scrub to be preempted
  • agent_work wants to evict 3:2525d12f:::foo:100
    • write_blocked_by_scrub sees scrub is preempted, returns false
    • 3:2525d12f:::foo:100 is removed, :head SnapSet is updated
  • scrub rescrubs [3:2525d12f:::foo:200, 3:2525dfff:::later:head]
    • includes (updated) :head SnapSet
    • issues error like "3:2525d12f:::foo:100 is an unexpected clone"

Fix the problem by checking if anything part of the object-to-evict and
its head touch the scrub range; if so, back off. Do not let eviction
preempt scrub; we can come back and do it later.

Fixes: http://tracker.ceph.com/issues/23646
Signed-off-by: Sage Weil sage@redhat.com

@liewegas liewegas requested review from dzafman and jdurgin Apr 24, 2018

@liewegas liewegas added this to the mimic milestone Apr 24, 2018

@liewegas liewegas force-pushed the liewegas:wip-23646 branch from f55cd39 to 42070f6 Apr 24, 2018

@@ -5205,6 +5205,12 @@ bool PG::write_blocked_by_scrub(const hobject_t& soid)
return true;
}

bool PG::range_intersects_scrub(const hobject_t &start, const hobject_t& end)
{
return (start <= scrubber.end &&

This comment has been minimized.

Copy link
@dzafman

dzafman Apr 24, 2018

Member

I think it should be start < scrubber.end because scrubber.end is NOT part of the range.

This comment has been minimized.

Copy link
@liewegas

liewegas Apr 24, 2018

Author Member

yes--updated!

osd/PrimaryLogPG: defer evict if head *or* object intersect scrub int…
…erval

Consider a scenario like:
- scrub [3:2525d100:::earlier:head,3:2525d12f:::foo:200]
 - we see 3:2525d12f:::foo:100 and include it in scrub map
- scrub [3:2525d12f:::foo:200, 3:2525dfff:::later:head]
- some op(s) that cause scrub to be preempted
- agent_work wants to evict 3:2525d12f:::foo:100
  - write_blocked_by_scrub sees scrub is preempted, returns false
  - 3:2525d12f:::foo:100 is removed, :head SnapSet is updated
- scrub rescrubs [3:2525d12f:::foo:200, 3:2525dfff:::later:head]
  - includes (updated) :head SnapSet
  - issues error like "3:2525d12f:::foo:100 is an unexpected clone"

Fix the problem by checking if anything part of the object-to-evict and
its head touch the scrub range; if so, back off.  Do not let eviction
preempt scrub; we can come back and do it later.

Fixes: http://tracker.ceph.com/issues/23646
Signed-off-by: Sage Weil <sage@redhat.com>

@liewegas liewegas force-pushed the liewegas:wip-23646 branch from 42070f6 to c20a95b Apr 24, 2018

@liewegas

This comment has been minimized.

Copy link
Member Author

liewegas commented Apr 24, 2018

This is based on David's theory that it's just the clone removals that are causing the problem, which I think is right! New clones only appear just to the left of head.

Hmm, one exception might be a clone promotion, but IIRC those are already present in the SnapSet, and scrub tolerates them not being there in cache pools, so no scrub impact.

@liewegas liewegas added the needs-qa label Apr 24, 2018

@dzafman

This comment has been minimized.

Copy link
Member

dzafman commented Apr 24, 2018

Actually, my theory is that a head removal caused a false scrub error in a cache pool because the clone was already seen and waiting for head to process in scrub_snapshot_metadata().

@liewegas

This comment has been minimized.

Copy link
Member Author

liewegas commented Apr 25, 2018

Hmm, so there would be two patterns:

  • the head SnapSet is updated to not include a previously scrubbed clone
  • the head is removed entirely

Both would have a similar effect. I think the second is harder to hit because the head won't be removed until the clones are, so you'd have to scrub the clone, then remove it, then also remove the head. Can we even scrub and snap trim at the same time? Seems like we shouldn't.

@liewegas

This comment has been minimized.

Copy link
Member Author

liewegas commented Apr 25, 2018

In any case, the other place we remove objects after checking for scrub is hit_set_remove, and we never have clones of those objects, so I think this fix is good.

http://pulpito.ceph.com/sage-2018-04-25_02:28:01-rados-wip-sage3-testing-2018-04-24-1729-distro-basic-smithi/

@liewegas liewegas merged commit 60958b3 into ceph:master Apr 25, 2018

5 checks passed

Docs: build check OK - docs built
Details
Signed-off-by all commits in this PR are signed
Details
Unmodified Submodules submodules for project are unmodified
Details
make check make check succeeded
Details
make check (arm64) make check succeeded
Details

@liewegas liewegas deleted the liewegas:wip-23646 branch Apr 25, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.