New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
librgw: fix librgw fhcache deadlock problem #36001
Conversation
Fixes: https://tracker.ceph.com/issues/46439 Signed-off-by: Xuehan Xu <xxhdx1985126@gmail.com>
@cbodley Please take a look at this, thanks:-) |
i don't think the recursive mutex is a good solution for deadlocks in general, but i'll defer to @mattbenjamin |
@xxhdx1985126 this is worrisome--it's usually not a real fix for an unexpected race. We're discussing whether to accept this fix provisionally in Bug Scrub. We're leaning towards, we should get a discrete reproducer and fix it correctly (with a retry, &c). Oh, you say in the tracker it's related to a recycled entry. There's special logic to deal with that sort of case--it should be possible to extend it? |
@mattbenjamin Hi, I'm a little confused in that: it seems both fh_cache and fh_lru holds exactly the same set of file handlers, is this right? If so, why keep two caches? Thanks:-) |
@mattbenjamin It seems that fh_cache is responsible for efficient lookup, and fh_lru is responsible for keeping refcnt and further deciding which fh to evict, is this right? |
This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days. |
@xxhdx1985126 could you report status after merge of #36942 -- whether it fixes the issue you hit, or whether it could be generalized to do so? (That change shows the intended style of negotiation between locks in fh_cache and fh_lru.) |
@mattbenjamin ok:-) |
This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days. |
This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution! |
Fixes: https://tracker.ceph.com/issues/46439
Signed-off-by: Xuehan Xu xxhdx1985126@gmail.com
Checklist
Show available Jenkins commands
jenkins retest this please
jenkins test classic perf
jenkins test crimson perf
jenkins test signed
jenkins test make check
jenkins test make check arm64
jenkins test submodules
jenkins test dashboard
jenkins test dashboard backend
jenkins test docs
jenkins render docs
jenkins test ceph-volume all
jenkins test ceph-volume tox