os/bluestore: Fix race condition in Onode:put() #48566
The race condition happens when an Onode is unpinned in one "put" thread and is trimmed right away (after the cache lock is released) by another thread.
Fixes: https://tracker.ceph.com/issues/57895
Signed-off-by: dongdong tao <dongdong.tao@canonical.com>
@taodd Thanks a ton for digging into this. You are right, this is fairly core code and this isn't the first time we've hit issues here. Added a couple of the relevant folks as potential reviewers.
@ifedo01 mentioned he has a (bigger) PR that also may fix this issue, but I haven't looked it over yet: #47702
ocs->lock.unlock();
}
auto pn = --put_nref;
if (nref == 0 && pn == 0) {
if (n == 0 && put_nref == 0) {
First of all, it's not mandatory that an onode is cached, so two threads might potentially own an onode independently, with no references from c->onode_map.
So we might have two threads owning the onode with nref == 2. Eventually both threads call put()...
Case 1:
Thread A makes nref == 1 and goes through the if (n == 1) block, but before it reaches --put_nref, thread B falls through and deletes the onode. At this point thread A operates on a released onode.
Case 2:
Thread A makes nref == 1 and reaches n = nref. At this point thread B makes nref == 0 and falls through to the return from put() without releasing the onode, because put_nref != 0. Then thread A continues and also bypasses the delete, because n == 1. Hence the Onode leaks.
Generally, my idea behind the put_nref increment/decrement is that it has to "wrap" all other manipulations on the onode within put(). So its increment has to be the first op in put() and its decrement the last one before the delete. I haven't achieved that completely, but it looks like you're moving even further from the original idea...
So we don't have a good enough fix for now :(
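To make Case 2 concrete, here is a single-threaded replay of that interleaving. It is only a simplified model and not the code from this PR or from BlueStore: `FakeOnode`, `PutCall` and `replay_case2` are hypothetical names, the `if (n == 1)` unpin block is left out, and `delete this` is replaced by a flag. The `use_local_n` switch selects between testing the local `n` (as in the proposed condition) and testing the shared `nref` (as in the existing condition).

```cpp
#include <cassert>

// Hypothetical stand-in for an uncached onode. Only the two counters from
// the discussion are modeled; "delete this" is replaced by a flag.
struct FakeOnode {
  int nref = 2;        // two threads each own one reference
  int put_nref = 0;    // put() calls currently in flight
  bool deleted = false;
};

// One put() call, split into two steps so that a specific interleaving
// can be replayed by hand on a single thread.
struct PutCall {
  FakeOnode& o;
  int n = 0;  // this caller's snapshot of nref after its decrement
  explicit PutCall(FakeOnode& onode) : o(onode) {}

  void enter() {
    ++o.put_nref;
    n = --o.nref;
  }

  // use_local_n == true : test the local snapshot n (assumed PR condition)
  // use_local_n == false: test the shared nref (existing condition)
  void leave(bool use_local_n) {
    int pn = --o.put_nref;
    int refs = use_local_n ? n : o.nref;
    if (refs == 0 && pn == 0) {
      o.deleted = true;
    }
  }
};

// Replays: A enters, B enters, B leaves, A leaves.
// Returns whether the onode ended up released.
bool replay_case2(bool use_local_n) {
  FakeOnode o;
  PutCall a(o), b(o);
  a.enter();             // nref 2 -> 1, A's n == 1
  b.enter();             // nref 1 -> 0, B's n == 0
  b.leave(use_local_n);  // A is still in flight, so pn == 1: no delete
  a.leave(use_local_n);  // pn == 0, but A's local n is still 1
  assert(o.nref == 0 && o.put_nref == 0);
  return o.deleted;
}
```

In this model `replay_case2(true)` returns false (nobody releases the onode) while `replay_case2(false)` returns true. It says nothing about Case 1, which needs a real use-after-free window and is not modeled here.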
Thank you very much for your review.
I really did assume that caching an onode is mandatory, i.e. that an Onode has to be added to the cache before it can be referenced.
Could you please give me some examples where the Onode might be referenced without being added to the cache? (Is it deep-scrub?) Thanks a lot :)
@taodd - I don't have any real-life examples at hand at the moment. But generally:
a) it's allowed by the current onode design.
b) it's bad practice to have a dependency between the Onode use case (e.g. whether we put it into the cache or not) and its life-cycle tracking (aka ref counting). The latter has to be completely use-case agnostic. Even if we don't actively use this mode at the moment, one can start using it in the future...
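As an illustration of what "use-case agnostic" ref counting can look like in its simplest form, here is a minimal sketch of a generic intrusive ref count. It is not BlueStore's Onode: `RefCounted` and `drop_two_owners` are hypothetical names, and the sketch deliberately ignores the cache pin/unpin step that makes the real put() harder. The release decision depends only on the value returned by a single atomic decrement, so it does not matter whether the object is cached or who owns the references.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical ref-counted object, unrelated to any cache.
class RefCounted {
  std::atomic<int> nref{0};

 public:
  static int destroyed;  // counts destructor runs, for the check below

  ~RefCounted() { ++destroyed; }

  void get() { nref.fetch_add(1, std::memory_order_relaxed); }

  void put() {
    // fetch_sub returns the previous value, so exactly one caller sees 1.
    // That caller deletes; nobody touches the object after its decrement.
    if (nref.fetch_sub(1, std::memory_order_acq_rel) == 1) {
      delete this;
    }
  }
};

int RefCounted::destroyed = 0;

// Two owners take a reference each and then drop it.
// Returns how many times the destructor ran.
int drop_two_owners() {
  RefCounted::destroyed = 0;
  RefCounted* obj = new RefCounted;
  obj->get();  // owner A
  obj->get();  // owner B
  obj->put();  // previous value 2: no delete
  obj->put();  // previous value 1: delete
  return RefCounted::destroyed;
}
```

The two put() calls run sequentially here to keep the sketch small. Because the decrement is a single atomic read-modify-write, the same "previous value == 1" test still picks exactly one deleter when the calls come from different threads.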
For example, an onode that is held by two threads but has already been trimmed from the cache.
@taodd I have a question: why doesn't Onode::put() use an onode lock to make the check and the delete a single atomic operation?
This PR can actually be closed. Please see #47702.