[Issue 4070][pulsar-client-cpp] Fix for possible deadlock when closing Pulsar client #6277
@heronr I think there might be a possible issue with this change. I'm getting a segfault when running the tests, along with test failures (for unrelated issues):
where we have:
    ~thread()
    {
        if (joinable())
            std::terminate();
    }
So I think that in this case we have not called |
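For readers less familiar with the destructor quoted above, here is a minimal sketch of when a `std::thread` counts as joinable. The names `JoinableStates` and `observeJoinableStates` are made up for illustration and do not appear in the Pulsar code.

```cpp
#include <cassert>
#include <thread>

// Hypothetical record of what joinable() reports at three points in a thread's life.
struct JoinableStates {
    bool beforeJoin;
    bool afterJoin;
    bool defaultConstructed;
};

JoinableStates observeJoinableStates() {
    JoinableStates s{};
    std::thread worker([] {});
    s.beforeJoin = worker.joinable();  // true: destroying the object now would call std::terminate()
    worker.join();
    s.afterJoin = worker.joinable();   // false: the object can be destroyed safely
    s.defaultConstructed = std::thread().joinable();  // false: no thread is attached
    return s;
}
```

A thread object that has been neither joined nor detached is still joinable when its destructor runs, which is the path that ends in `std::terminate()`.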
This implies that there is a thread leak in certain cases, probably caused by the Executor service being destroyed in the context of the worker thread. I will try to run the tests locally and track this down, but it may prove difficult to resolve. |
@heronr I don't think there's a thread leak. Rather, the error seems to be that the The main issue I see here is that the I believe the only solution is that we should ensure, that |
Agreed on your assessment. I tracked one instance of a non-joined thread to the HTTPLookupService owning the ExecutorServiceProvider that it posts its own work onto. Now the provider it uses is owned by the ClientImpl, but this just exposed a different non-joined thread, caused by the ClientImpl itself being destroyed on the ExecutorService worker thread. Here is the callstack
Ultimately, because shared_ptrs are being used to manage the lifetime of objects that then post work to ExecutorServices owned by those same objects, we can end up with a thread that cannot be joined, since there is no fine-grained control over when the ref count goes to 0.
If we can designate the pulsar::Client as non-copyable, then its destructor can force a shutdown() on the ClientImpl and ensure that all outstanding worker threads are joined at that point.
The latest exception that I linked stems from the fact that if a pulsar::Client is destructed without first calling shutdown, any outstanding work on an ExecutorService thread will likely result in the destruction of the ClientImpl on that thread and make it impossible to join. |
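The lifetime problem described in this comment can be reproduced in isolation. The sketch below is an illustration under assumptions, not Pulsar code: `Owner` is a hypothetical stand-in for an object such as ClientImpl, and `lastReferenceDiesOnWorker` is a made-up name. It shows that when the last `shared_ptr` is released by work running on a worker thread, the destructor runs on that worker thread.

```cpp
#include <cassert>
#include <memory>
#include <thread>
#include <utility>

// Hypothetical stand-in for an object like ClientImpl: records which thread destroyed it.
struct Owner {
    explicit Owner(std::thread::id* out) : destroyedOn(out) {}
    ~Owner() { *destroyedOn = std::this_thread::get_id(); }
    std::thread::id* destroyedOn;
};

// Returns true if the Owner was destroyed on the worker thread, not on the caller's thread.
bool lastReferenceDiesOnWorker() {
    std::thread::id destroyedOn;
    auto owner = std::make_shared<Owner>(&destroyedOn);

    // The task holds the only remaining reference, as a queued callback might.
    std::thread worker([held = std::move(owner)]() mutable {
        held.reset();  // ref count reaches 0 here, so ~Owner() runs on the worker
    });
    const std::thread::id workerId = worker.get_id();
    worker.join();  // after join(), reading destroyedOn is properly synchronized

    return destroyedOn == workerId && destroyedOn != std::this_thread::get_id();
}
```

If `Owner` also owned the worker thread and tried to join it in its destructor, that join would be issued from the worker itself, which is the situation this thread is trying to avoid.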
@heronr I'm not sure that fixes the underlying issue. It will still result in (another) std::thread object being destroyed from its own thread. Actually, I think the solution should be easy: just use
    if (std::this_thread::get_id() != worker_.get_id() && worker_.joinable()) {
        worker_.join();
    } else {
        worker_.detach();
    } |
Yes, I also considered using detach() as it is the path of least resistance. I just can't help but feel like it's a workaround solution that sweeps the underlying problem under the rug. |
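As a sketch of the join-or-detach idea discussed above: `joinOrDetach` and `Release` below are hypothetical names, not part of the Pulsar client. One detail worth noting is that the `else` branch of the quoted snippet is also taken when the thread is no longer joinable, and calling `detach()` on a non-joinable `std::thread` throws `std::system_error`, so this version checks `joinable()` first.

```cpp
#include <cassert>
#include <thread>

// Hypothetical result type describing what the helper did.
enum class Release { Joined, Detached, Nothing };

// Hypothetical helper: leave `worker` in a state where its destructor cannot call std::terminate().
Release joinOrDetach(std::thread& worker) {
    if (!worker.joinable()) {
        return Release::Nothing;  // already joined or detached; detach() here would throw
    }
    if (std::this_thread::get_id() != worker.get_id()) {
        worker.join();  // called from another thread: waiting for the worker is safe
        return Release::Joined;
    }
    worker.detach();  // called from the worker itself: a join would never return
    return Release::Detached;
}
```

Detaching keeps the process from terminating, but it does not address the concern raised above: a detached worker may still touch members of an object that is being destroyed, so the caller has to guarantee that it does not.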
…g Pulsar client (apache#6277)
* Attempt at fixing deadlock during client.close()
* Fixed formatting
* Detach the worker thread in the destructor of ExecutorService if it is still unable to be joined
* Possible formatting fixes
…g Pulsar client (apache#6277)
* Attempt at fixing deadlock during client.close()
* Fixed formatting
* Detach the worker thread in the destructor of ExecutorService if it is still unable to be joined
* Possible formatting fixes
(cherry picked from commit 2e1c74a)
Fixes #4070
Motivation
This change fixes a possible deadlock when closing the Pulsar client, caused by the ExecutorService worker thread attempting to join itself.
Modifications
The close() method on the ExecutorService no longer joins the worker_ thread when it is called from that same thread, that is, when the worker's thread id matches the caller's. The type of worker_ was changed to std::thread to allow this check, since boost::asio::detail::thread does not expose the thread id.
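A minimal sketch of the close() behavior described above, under stated assumptions: `MiniExecutor` is a hypothetical single-worker executor written for illustration, not the actual ExecutorService (which is built on boost::asio). It assumes the object is destroyed from a thread other than its worker and that close() is not called from two threads at once.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical executor with one worker thread; illustration only.
class MiniExecutor {
   public:
    MiniExecutor() : worker_([this] { run(); }) {}
    ~MiniExecutor() { close(); }  // assumption: never destroyed on the worker thread

    void post(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push_back(std::move(task));
        }
        cv_.notify_one();
    }

    // Stops the loop. Joins only when the caller is not the worker thread.
    // Returns true if this call joined the worker.
    bool close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        cv_.notify_one();
        if (worker_.joinable() && std::this_thread::get_id() != worker_.get_id()) {
            worker_.join();
            return true;
        }
        return false;  // called from the worker (or already joined): skip the join
    }

   private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            cv_.wait(lock, [this] { return closed_ || !tasks_.empty(); });
            if (tasks_.empty()) return;  // closed and drained
            std::function<void()> task = std::move(tasks_.front());
            tasks_.pop_front();
            lock.unlock();
            task();  // may call close() on this executor without deadlocking
            lock.lock();
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<std::function<void()>> tasks_;
    bool closed_ = false;
    std::thread worker_;  // declared last so the other members exist before it starts
};
```

With this shape, a task that calls close() from the worker only marks the executor as closed, and a later close() from another thread performs the actual join.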
Verifying this change
This change is already covered by existing tests.
Does this pull request potentially affect one of the following parts:
Documentation