New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rgw/multisite: Fix deadlock by not holding mutex over coroutine #54505
Conversation
We applied this patch on Reef but we're still experiencing the deadlock: https://paste.opendev.org/raw/bPUD6MaEBOfQ5qQa9i6y/. For what is worth, I haven't seen any hanging |
@petrutlucian94 thanks a lot for testing. i don't think we've set up a reproducer for this one yet |
f4d2b0a
to
f8ef9c2
Compare
If the FIFO doesn't exist, let clients race to create it, then stash and use whoever wins. Fixes: https://tracker.ceph.com/issues/63373 Signed-off-by: Adam Emerson <aemerson@redhat.com>
f8ef9c2
to
8fa8443
Compare
passed qa in https://pulpito.ceph.com/cbodley-2024-01-03_21:57:26-rgw-wip-cbodley-testing-distro-default-smithi/ (this fixes one issue, but doesn't entirely resolve https://tracker.ceph.com/issues/63373) |
@cbodley could you elborate what is still broken? Is the remaining part of this issue tracked anywhere but in https://tracker.ceph.com/issues/63373? I'd really love for this to be resolved with 18.2.2, but apparently https://tracker.ceph.com/issues/63373 is not on the roadmap for 18.2.2 at https://tracker.ceph.com/versions/678. |
@frittentheke i don't think that roadmap page is accurate, it usually only tracks backports that already merged in https://tracker.ceph.com/issues/63373#note-20, @mkogan1 bisected this down to a specific commit but nobody has followed up on that yet. if we can't make progress there, we should probably revert all of #49741. cc @adamemerson @mattbenjamin @smanjara we're also nearing the code freeze for the squid release, and i'd love to have it fixed there too |
i opened #55522 which leaves most of #49741 intact, but replaces a single null_yield in RGWDataChangesLog::add_entry() to make it all synchronous |
Race to create the FIFO, stash it under mutex. Never hold the mutex over a yield, null or otherwise.
Fixes: https://tracker.ceph.com/issues/63373
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an
x
between the brackets:[x]
. Spaces and capitalization matter when checking off items this way.Checklist
Show available Jenkins commands
jenkins retest this please
jenkins test classic perf
jenkins test crimson perf
jenkins test signed
jenkins test make check
jenkins test make check arm64
jenkins test submodules
jenkins test dashboard
jenkins test dashboard cephadm
jenkins test api
jenkins test docs
jenkins render docs
jenkins test ceph-volume all
jenkins test ceph-volume tox
jenkins test windows
jenkins test rook e2e