Disconnecting a locality results in segfault using heartbeat example #1589
Comments
I have noticed that there is sometimes a different failure, more frequently when the debugger is not attached, but sometimes when it is: {what}: assertion 'NULL != runtime::runtime_.get()' failed: HPX(assertion_failure). By the looks of things, a late parcel or something of that sort is arriving. It is not clear whether the two failures are directly related or whether one sometimes preempts the other.
|
@biddisco: please check branch fixing_1589 and report back whether your problems persist. |
The heartbeat example is now running fine. No crashes on client or console after initial testing. |
This was fixed by merging #1597 |
It seems the problem still exists. I get the following crash: https://gist.github.com/ericLemanissier/15b176518c202a5498d3 |
@biddisco Very recently we faced the exact same crash. The circumstances might not be the same here, but I wanted to highlight that the problem for us was the ordering of class members: a member of type boost::asio::ip::tcp::socket was declared before the member of type boost::asio::io_service. With this setup the dtor of the io_service gets called before the dtor of the tcp::socket. The critical section that the above code faults on gets destroyed deep inside the dtor of the boost::asio::io_service member. Moving the boost::asio::io_service member before the boost::asio::ip::tcp::socket member fixed this crash. Neelabh |
@n-mam Ohh, perfect! I was hunting this issue for a while but couldn't find it. Would you be able to create a PR fixing this issue? |
@n-mam Good work. Very nice to see people chipping in with catches like that one. |
@hkaiser @biddisco, I think I should have been more specific earlier. The crash which I fixed was in our code; it had nothing to do with this project or library. Prior to the fix, we had a boost::asio::ip::tcp::socket member declared before a boost::asio::io_service member in a simple class (say xyz). Because of this, the dtor of io_service was called first, which destroyed the critical section object that the tcp::socket dtor later tried to access, all while still inside the same dtor call stack for our xyz object. I saw the call stack originally posted with this issue and it looked similar to what happened with our crash. The only difference is that in my case the crash call stack showed only the boost::asio::ip::tcp::socket dtor at the top, whereas the call stack in this issue also shows the dtor of io_service. In my case the io_service dtor showed up only in the non-crashing call stack, i.e. the one which deletes the critical section (I had put a memory access breakpoint on the CS address). Hence the earlier statement "Although the circumstances might not be the same". If there is a more recent crash dump which you could share, then I can have a look. |
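For readers following the member-ordering explanation above, here is a minimal sketch of the underlying C++ rule: non-static data members are destroyed in the reverse order of their declaration. `Service` and `Socket` are hypothetical stand-ins for boost::asio::io_service and boost::asio::ip::tcp::socket (they are not the Boost types), and `BadOrder`/`GoodOrder` are illustrative names for a class like xyz. The sketch only records destructor order; it does not reproduce the crash itself.

```cpp
#include <cassert>
#include <string>

// Global record of the order in which destructors ran.
std::string& destruction_log() {
    static std::string log;
    return log;
}

void record(const char* name) {
    std::string& log = destruction_log();
    if (!log.empty()) log += " -> ";
    log += name;
}

// Hypothetical stand-in for boost::asio::io_service: owns shared state
// (in the real crash, a critical section) that sockets depend on.
struct Service {
    ~Service() { record("Service"); }
};

// Hypothetical stand-in for tcp::socket: a real socket's destructor
// would touch state owned by its service.
struct Socket {
    explicit Socket(Service* s) : service(s) {}
    ~Socket() { record("Socket"); }
    Service* service;  // never dereferenced in this sketch
};

// Socket declared BEFORE the service: the service is destroyed first,
// so a real socket destructor would access already-destroyed state.
struct BadOrder {
    BadOrder() : socket(&service) {}
    Socket socket;
    Service service;
};

// Service declared BEFORE the socket: the socket is destroyed first,
// while the service is still alive.
struct GoodOrder {
    GoodOrder() : socket(&service) {}
    Service service;
    Socket socket;
};

// Creates and destroys one Owner, returning the recorded destructor order.
template <class Owner>
std::string destruction_order() {
    destruction_log().clear();
    { Owner owner; }
    return destruction_log();
}
```

Calling `destruction_order<BadOrder>()` and `destruction_order<GoodOrder>()` shows the two orderings side by side; swapping the two member declarations is the whole fix described in the comment above.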
The attached stacktrace comes from running the heartbeat worker example. When it disconnects from the console process, it shuts down the parcelport, and a mutex deep inside boost::asio is accessed, seemingly after it has been destroyed. The mutex is locked during win_iocp_socket_service_base::destroy, but it appears to have uninitialized memory contents.
On Windows, the console was started using
The worker was started using
This has been reproduced with boost_1_55 and boost_1_58 (the latter was tried in the hope that it was a Boost error that had been fixed in a more recent version).
---stacktrace---