Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

librgw: fix librgw fhcache deadlock problem #36001

Closed

Conversation

xxhdx1985126
Copy link
Contributor

Fixes: https://tracker.ceph.com/issues/46439
Signed-off-by: Xuehan Xu xxhdx1985126@gmail.com

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug

Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard backend
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox

Fixes: https://tracker.ceph.com/issues/46439
Signed-off-by: Xuehan Xu <xxhdx1985126@gmail.com>
@xxhdx1985126
Copy link
Contributor Author

@cbodley Please take a look at this, thanks:-)

@cbodley
Copy link
Contributor

cbodley commented Jul 17, 2020

i don't think the recursive mutex is a good solution for deadlocks in general, but i'll defer to @mattbenjamin

@mattbenjamin
Copy link
Contributor

mattbenjamin commented Jul 23, 2020

@xxhdx1985126 this is worrisome--it's usually not a real fix for an unexpected race. We're discussing whether to accept this fix provisionally in Bug Scrub. We're leaning towards, we should get a discrete reproducer and fix it correctly (with a retry, &c). Oh, you say in the tracker it's related to a recycled entry. There's special logic to deal with that sort of case--it should be possible to extend it?

@xxhdx1985126
Copy link
Contributor Author

@mattbenjamin Hi, I'm a little confused in that: it seems both fh_cache and fh_lru holds exactly the same set of file handlers, is this right? If so, why keep two caches? Thanks:-)

@xxhdx1985126
Copy link
Contributor Author

@mattbenjamin It seems that fh_cache is responsible for efficient lookup, and fh_lru is responsible for keeping refcnt and further deciding which fh to evict, is this right?

@stale
Copy link

stale bot commented Nov 9, 2020

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@stale stale bot added the stale label Nov 9, 2020
@mattbenjamin
Copy link
Contributor

@xxhdx1985126 could you report status after merge of #36942 -- whether it fixes the issue you hit, or whether it could be generalized to do so? (That change shows the intended style of negotiation between locks in fh_cache and fh_lru.)

@stale stale bot removed the stale label Nov 9, 2020
@xxhdx1985126
Copy link
Contributor Author

@mattbenjamin ok:-)

@stale
Copy link

stale bot commented Jan 11, 2021

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@stale stale bot added the stale label Jan 11, 2021
@stale
Copy link

stale bot commented Jul 21, 2021

This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution!

@stale stale bot closed this Jul 21, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants