sometimes internalDeleteTopicForcefully will block forever #14438
Comments
@leizhiyuan - which version of Pulsar are you seeing this error on? Have you reproduced the problem against all of the versions that @315157973 lists here?
@leizhiyuan - in your thread dump, what is the stack for the thread named pulsar/pulsar-metadata/src/main/java/org/apache/pulsar/metadata/impl/ZKMetadataStore.java Lines 310 to 319 in 0ef7baa
In 2.8, we added that pulsar/pulsar-metadata/src/main/java/org/apache/pulsar/metadata/impl/AbstractMetadataStore.java Lines 78 to 79 in 0ef7baa
If that executor is not handling requests, it will not complete futures, which will lead to the problem you're seeing here. In my case, I can see the following stack for my metadata-store thread:
Based on that thread dump, it looks like the blocking is coming from this code: pulsar/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentTopic.java Lines 2279 to 2284 in 0ef7baa
When the zk thread completes the future returned by The fundamental issue here is deadlock. @merlimat @lhotari @codelipenghui @eolivelli - I think we should seriously consider solving this deadlock scenario for 2.10.0. EDIT: added reference to 2.10.0
Good work @michaeljmarshall. We should not call `join`; instead, we should chain the async function calls to persistentTopicExists. I agree that we should fix this.
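To illustrate the point about `join` versus chaining, here is a minimal, self-contained sketch. It does not use Pulsar code: the executor, the `topicExistsAsync` helper, and the class name are hypothetical stand-ins, and the only assumption carried over from the discussion is that the futures are completed by a single-threaded executor. The first method blocks that thread with `join()` on a future that only the same thread can complete; the second chains the call with `thenCompose` and never blocks.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class JoinDeadlockSketch {

    // Hypothetical stand-in for a single-threaded metadata-store executor.
    static ExecutorService newSingleThread() {
        return Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "metadata-store-sketch");
            t.setDaemon(true); // let the JVM exit even if this thread stays stuck
            return t;
        });
    }

    // Hypothetical async lookup whose result is always produced on `executor`.
    static CompletableFuture<Boolean> topicExistsAsync(ExecutorService executor) {
        return CompletableFuture.supplyAsync(() -> true, executor);
    }

    // Blocking style: a task running on the executor thread calls join() on a
    // future that can only be completed by that same (now blocked) thread.
    static boolean blockingStyleCompletes(long timeoutMs) throws Exception {
        ExecutorService executor = newSingleThread();
        CompletableFuture<Boolean> result = CompletableFuture.supplyAsync(
                () -> topicExistsAsync(executor).join(), executor);
        try {
            return result.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return false; // the executor thread is parked inside join()
        } finally {
            executor.shutdownNow();
        }
    }

    // Chained style: the lookup is composed onto the first stage, so no thread
    // ever waits for a result that it is itself responsible for producing.
    static boolean chainedStyleCompletes(long timeoutMs) throws Exception {
        ExecutorService executor = newSingleThread();
        CompletableFuture<Boolean> result = CompletableFuture
                .supplyAsync(() -> "step-1", executor)
                .thenCompose(ignored -> topicExistsAsync(executor));
        try {
            return result.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return false;
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("blocking style completed: " + blockingStyleCompletes(500));
        System.out.println("chained style completed:  " + chainedStyleCompletes(500));
    }
}
```

The timeout exists only so the sketch terminates; in a broker there is no such guard on `join()`, which is why the thread appears to wait forever in repeated thread dumps.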
@leizhiyuan I will send a PR for this if you don't have time.
Sorry @eolivelli, I missed your message. I have already pushed a PR: #14469
@leizhiyuan - note that the bug I'm seeing is triggered by |
@leizhiyuan - it'd be valuable to know what your |
@leizhiyuan - thanks for sharing that thread dump. I am intrigued that the I might have been wrong in my initial analysis in thinking we had the same issues. |
@leizhiyuan - after looking through your thread dump a bit more, I am wondering if you have a separate issue in your delayed message implementation? I see that thread is blocked on some other thread that is completing a lookup call (that lookup call has a timeout on it, so that thread isn't indefinitely blocked). Is there any chance that the
Thanks for your help, but we analyzed this thread, |
Master issue #14438

### Motivation

Invoking the ``join()`` method inside an async method can cause a deadlock.

### Modifications

- Refactor ``PersistentTopic#tryToDeletePartitionedMetadata`` to be purely async.
pulsar/pulsar-broker/src/main/java/org/apache/pulsar/broker/admin/impl/PersistentTopicsBase.java
Line 312 in d1fb88a
In our scenario, when we take thread dumps repeatedly, we can see that this call waits forever.