Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os/bluestore: fix SharedBlob unregistration #18805

Merged
merged 1 commit into from Nov 12, 2017
Merged

Conversation

liewegas
Copy link
Member

@liewegas liewegas commented Nov 8, 2017

We use the SharedBlobSet remove() in three cases:

  • from SharedBlob::put(), we try to remove ourselves from the set, but
    have to deal with a racing lookup, so the removal is conditional on
    nref still being 0.
  • from split_cache(), we move the SharedBlob to another collection
  • from make_blob_unshared(), we remove the entry when we clear the sbid.

The problem is that the condtiional remove() (for the first case) was being
used for all three cases, and in the second two cases nref is always != 0,
so it doesn't actually happen. This can lead to a crash during cache
shutdown.

Fix by making two variants: remove() that is unconditional, and
try_remove() that is conditional.

Fixes: http://tracker.ceph.com/issues/22039
Signed-off-by: Sage Weil sage@redhat.com

@@ -1670,7 +1670,7 @@ void BlueStore::SharedBlob::put()
<< " removing self from set " << get_parent()
<< dendl;
if (get_parent()) {
if (get_parent()->remove(this)) {
if (get_parent()->try_remove(this)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Funny thing but you can use NEW remove here instead and get rid off try_remove at all. See if (--nrerf == 0) above...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't take the lock until inside try_remove(), and we're handling the race between one thread dropping the last reference and trying to remove from the man and another thread calling lookup() and bumping the ref from 0 -> 1.

We use the SharedBlobSet remove() in three cases:

- from SharedBlob::put(), we try to remove ourselves from the set, but
  have to deal with a racing lookup, so the removal is conditional on
  nref still being 0.
- from split_cache(), we move the SharedBlob to another collection
- from make_blob_unshared(), we remove the entry when we clear the sbid.

The problem is that the condtiional remove() (for the first case) was being
used for all three cases, and in the second two cases nref is always != 0,
so it doesn't actually happen.  This can lead to a crash during cache
shutdown.

Fix by making two variants: remove() that is unconditional, and
try_remove() that is conditional.

Set the sb->coll pointer after because remove() asserts the parent matches
where we are unregistering.

Fixes: http://tracker.ceph.com/issues/22039
Signed-off-by: Sage Weil <sage@redhat.com>
@liewegas liewegas merged commit ef54a05 into ceph:master Nov 12, 2017
@liewegas liewegas deleted the wip-22039 branch November 12, 2017 22:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants