-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cache listeners including writes to caches deadlock when used in ForkJoinPool #1065
Comments
Thank you for the detailed and thoughtful bug report. Your idea of using a non-static thread local sounds like the right solution. I'll try to look into this by the end of the week. Please let me know if you need/want a release once this is fixed. Until then, you might be able to not enable |
also, fwiw, it looks like the JCache spec does seem to warn about this in I like your solution, not sure if they are implying to avoid this or were simply warning of the danger of undefined behavior. |
This sidesteps an edge-case wherein a JCache listener in a primary cache mutates a secondary Caffeine cache. While this is typically not a problem, if the thread that mutates the primary cache is from a ForkJoinPool, the EventDispatcher can deadlock as the calling thread may be re-used. Under this condition, the secondary cache mutation is effectively waiting on its own Future and will deadlock.
This sidesteps an edge-case wherein a JCache listener in a primary cache mutates a secondary Caffeine cache. While this is typically not a problem, if the thread that mutates the primary cache is from a ForkJoinPool, the EventDispatcher can deadlock as the calling thread may be re-used. Under this condition, the secondary cache mutation is effectively waiting on its own Future and will deadlock.
Submitted a PR as the fix was pretty simple. Took a while to build though, the tests in this repo are very thorough! |
thanks! I'd like to incorporate your test case though. You can leave that to me and I'll merge it in with your pr. |
This sidesteps an edge-case wherein a JCache listener in a primary cache mutates a secondary Caffeine cache. While this is typically not a problem, if the thread that mutates the primary cache is from a ForkJoinPool, the EventDispatcher can deadlock as the calling thread may be re-used. Under this condition, the secondary cache mutation is effectively waiting on its own Future and will deadlock.
The unit and integration tests verify that the event dispatcher does not leak the pending acknowledgements across threads if the executor descides to run the listener on the same thread as the publisher. This occurred due to using a static ThreadLocal rather than per cache instance, which was a mistake as typically that is how ThreadLocals are meant to be used. While crafting tests, it was noticed that the CaffeineConfiguration's setters are not fluent like the JCache's basic implementation, MutableConfiguration. This was changed for consistency and convenience.
Thanks again for the fix, merged with tests. That was certainly a tricky one! Let me know if you'd like a release and I'll try to get to that soon. |
Thanks for merging it! A release would be nice, we've worked around the problem for the moment by not using a ForkJoinPool but there are other places in our code that might be at risk through a client library using Caffeine without our knowledge. |
Released in 3.1.7 |
When an application using a cache listener that writes to another Caffeine cache while running from a ForkJoinPool, it will eventually deadlock 100% of the time. This is very much an edge case, but it is very difficult to diagnose when it occurs. If your application includes a listener that uses a library that transitively writes a Caffeine cache, this can manifest as well.
The root cause appears to be that the EventDispatcher.awaitSynchronous() method checks for all pending Futures in a static ThreadLocal. When used outside of a ForkJoinPool, this is fine as the ThreadLocal will isolate any calling thread from others. But when used in a ForkJoinPool, the calling thread may be reused, which causes the listener to inadvertently block on its own Future.
I suspect changing the EventDispatcher.pending field to be non-static will skate around this issue. Here is a sample reproduction that will eventually always fail (though it may take a few hundred iterations to do so):
The text was updated successfully, but these errors were encountered: