New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
os/bluestore: fix SharedBlob unregistration #18805
Conversation
@@ -1670,7 +1670,7 @@ void BlueStore::SharedBlob::put() | |||
<< " removing self from set " << get_parent() | |||
<< dendl; | |||
if (get_parent()) { | |||
if (get_parent()->remove(this)) { | |||
if (get_parent()->try_remove(this)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Funny thing but you can use NEW remove here instead and get rid off try_remove at all. See if (--nrerf == 0) above...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't take the lock until inside try_remove(), and we're handling the race between one thread dropping the last reference and trying to remove from the man and another thread calling lookup() and bumping the ref from 0 -> 1.
We use the SharedBlobSet remove() in three cases: - from SharedBlob::put(), we try to remove ourselves from the set, but have to deal with a racing lookup, so the removal is conditional on nref still being 0. - from split_cache(), we move the SharedBlob to another collection - from make_blob_unshared(), we remove the entry when we clear the sbid. The problem is that the condtiional remove() (for the first case) was being used for all three cases, and in the second two cases nref is always != 0, so it doesn't actually happen. This can lead to a crash during cache shutdown. Fix by making two variants: remove() that is unconditional, and try_remove() that is conditional. Set the sb->coll pointer after because remove() asserts the parent matches where we are unregistering. Fixes: http://tracker.ceph.com/issues/22039 Signed-off-by: Sage Weil <sage@redhat.com>
We use the SharedBlobSet remove() in three cases:
have to deal with a racing lookup, so the removal is conditional on
nref still being 0.
The problem is that the condtiional remove() (for the first case) was being
used for all three cases, and in the second two cases nref is always != 0,
so it doesn't actually happen. This can lead to a crash during cache
shutdown.
Fix by making two variants: remove() that is unconditional, and
try_remove() that is conditional.
Fixes: http://tracker.ceph.com/issues/22039
Signed-off-by: Sage Weil sage@redhat.com