Disconnecting a locality results in segfault using heartbeat example #1589

biddisco · 2015-06-08T09:09:38Z

The attached stacktrace comes from running the heartbeat worker example. When it disconnects from the console process, it shuts down the parcelport, and a mutex deep inside boost::asio is accessed, seemingly after it has been destroyed. The mutex is locked during win_iocp_socket_service_base::destroy, but it appears to have uninitialized memory contents.

On Windows, the console was started using

heartbeat_console.exe --hpx:threads=2 --runfor=100 -Ihpx.parcel.port=7910 \
    --hpx:attach-debugger=exception -Ihpx.agas.port=7911

The worker was started using

heartbeat.exe --hpx:threads=2 --hpx:agas=127.0.0.1:7910 --runfor=2 \
    --hpx:attach-debugger=exception

This has been reproduced with boost_1_55 and boost_1_58 (the latter in the hope that it was a Boost error that had been fixed in a more recent version).

---stacktrace---

ntdll.dll!00007ffa9aceb2de()    Unknown
hpxd.dll!boost::asio::detail::win_mutex::lock() Line 51 C++
    hpxd.dll!boost::asio::detail::scoped_lock<boost::asio::detail::win_mutex>::scoped_lock<boost::asio::detail::win_mutex>(boost::asio::detail::win_mutex & m) Line 47  C++
    hpxd.dll!boost::asio::detail::win_iocp_socket_service_base::destroy(boost::asio::detail::win_iocp_socket_service_base::base_implementation_type & impl) Line 154    C++
    hpxd.dll!boost::asio::stream_socket_service<boost::asio::ip::tcp>::destroy(boost::asio::detail::win_iocp_socket_service<boost::asio::ip::tcp>::implementation_type & impl) Line 139 C++
    hpxd.dll!boost::asio::basic_io_object<boost::asio::stream_socket_service<boost::asio::ip::tcp>,1>::~basic_io_object<boost::asio::stream_socket_service<boost::asio::ip::tcp>,1>() Line 196  C++
    hpxd.dll!boost::asio::basic_socket<boost::asio::ip::tcp,boost::asio::stream_socket_service<boost::asio::ip::tcp> >::~basic_socket<boost::asio::ip::tcp,boost::asio::stream_socket_service<boost::asio::ip::tcp> >() Line 1512   C++
    [External Code] 
    hpxd.dll!hpx::parcelset::policies::tcp::receiver::~receiver() Line 56   C++
    [External Code] 
    hpxd.dll!boost::checked_delete<hpx::parcelset::policies::tcp::receiver>(hpx::parcelset::policies::tcp::receiver * x) Line 34    C++
    hpxd.dll!boost::detail::sp_counted_impl_p<hpx::parcelset::policies::tcp::receiver>::dispose() Line 79   C++
    hpxd.dll!boost::detail::sp_counted_base::release() Line 104 C++
    hpxd.dll!boost::detail::shared_count::~shared_count() Line 447  C++
    [External Code] 
    hpxd.dll!boost::asio::detail::win_iocp_socket_accept_op<boost::asio::basic_socket<boost::asio::ip::tcp,boost::asio::stream_socket_service<boost::asio::ip::tcp> >,boost::asio::ip::tcp,boost::_bi::bind_t<void,boost::_mfi::mf2<void,hpx::parcelset::policies::tcp::connection_handler,boost::system::error_code const & __ptr64,boost::shared_ptr<hpx::parcelset::policies::tcp::receiver> >,boost::_bi::list3<boost::_bi::value<hpx::parcelset::policies::tcp::connection_handler * __ptr64>,boost::arg<1>,boost::_bi::value<boost::shared_ptr<hpx::parcelset::policies::tcp::receiver> > > > >::do_complete(boost::asio::detail::win_iocp_io_service * owner, boost::asio::detail::win_iocp_operation * base, const boost::system::error_code & result_ec, unsigned __int64 __formal) Line 145   C++
    hpxd.dll!boost::asio::detail::win_iocp_operation::destroy() Line 52 C++
    hpxd.dll!boost::asio::detail::win_iocp_io_service::shutdown_service() Line 126  C++
    hpxd.dll!boost::asio::detail::service_registry::~service_registry() Line 38 C++
    [External Code] 
    hpxd.dll!boost::asio::io_service::~io_service() Line 53 C++
    [External Code] 
    hpxd.dll!boost::checked_delete<boost::asio::io_service>(boost::asio::io_service * x) Line 34    C++
    hpxd.dll!boost::detail::sp_counted_impl_p<boost::asio::io_service>::dispose() Line 79   C++
    hpxd.dll!boost::detail::sp_counted_base::release() Line 104 C++
    hpxd.dll!boost::detail::shared_count::~shared_count() Line 447  C++
    [External Code] 
    hpxd.dll!hpx::util::io_service_pool::clear_locked() Line 194    C++
    hpxd.dll!hpx::util::io_service_pool::clear() Line 183   C++
    hpxd.dll!hpx::parcelset::parcelport_impl<hpx::parcelset::policies::tcp::connection_handler>::stop(bool blocking) Line 167   C++
>   hpxd.dll!hpx::parcelset::parcelhandler::stop(bool blocking) Line 291    C++
    hpxd.dll!hpx::runtime_impl<hpx::threads::policies::local_priority_queue_scheduler<boost::mutex,hpx::threads::policies::lockfree_fifo,hpx::threads::policies::lockfree_fifo,hpx::threads::policies::lockfree_lifo>,hpx::threads::policies::callback_notifier>::stop(bool blocking) Line 419  C++
    hpxd.dll!hpx::runtime_impl<hpx::threads::policies::local_priority_queue_scheduler<boost::mutex,hpx::threads::policies::lockfree_fifo,hpx::threads::policies::lockfree_fifo,hpx::threads::policies::lockfree_lifo>,hpx::threads::policies::callback_notifier>::run(const hpx::util::function<int __cdecl(void),void,void> & func) Line 525   C++
    hpxd.dll!hpx::detail::run(hpx::runtime & rt, const hpx::util::function<int __cdecl(boost::program_options::variables_map &),void,void> & f, boost::program_options::variables_map & vm, hpx::runtime_mode mode, const hpx::util::function<void __cdecl(void),void,void> & startup, const hpx::util::function<void __cdecl(void),void,void> & shutdown) Line 523 C++
    hpxd.dll!hpx::detail::run_priority_local(const hpx::util::function<void __cdecl(void),void,void> & startup, const hpx::util::function<void __cdecl(void),void,void> & shutdown, hpx::util::command_line_handling & cfg, bool blocking) Line 857 C++
    hpxd.dll!hpx::detail::run_or_start(const hpx::util::function<int __cdecl(boost::program_options::variables_map &),void,void> & f, const boost::program_options::options_description & desc_cmdline, int argc, char * * argv, const std::vector<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > > & ini_config, const hpx::util::function<void __cdecl(void),void,void> & startup, const hpx::util::function<void __cdecl(void),void,void> & shutdown, hpx::runtime_mode mode, bool blocking) Line 1080   C++
    heartbeat.exe!hpx::init(const hpx::util::function<int __cdecl(boost::program_options::variables_map &),void,void> & f, const boost::program_options::options_description & desc_cmdline, int argc, char * * argv, const std::vector<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > > & cfg, const hpx::util::function<void __cdecl(void),void,void> & startup, const hpx::util::function<void __cdecl(void),void,void> & shutdown, hpx::runtime_mode mode) Line 49  C++
    heartbeat.exe!hpx::init(const boost::program_options::options_description & desc_cmdline, int argc, char * * argv, const std::vector<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > > & cfg, const hpx::util::function<void __cdecl(void),void,void> & startup, const hpx::util::function<void __cdecl(void),void,void> & shutdown, hpx::runtime_mode mode) Line 99 C++
    heartbeat.exe!main(int argc, char * * argv) Line 141    C++
    [External Code]

hkaiser · 2015-06-08T12:19:17Z

This looks to be connected to #866.

@biddisco does this always happen or only when running in the debugger?

biddisco · 2015-06-09T05:46:18Z

I have noticed that there is sometimes a different failure, more frequently when the debugger is not attached, but sometimes when it is.

{what}: assertion 'NULL != runtime::runtime_.get()' failed: HPX(assertion_failure)

By the looks of things, a late parcel or something of that sort is occurring. It is not clear whether the two segfaults are directly related or whether one sometimes preempts the other.

    msvcr120d.dll!00007ffa6fc17642()    Unknown
    msvcr120d.dll!00007ffa6fd42044()    Unknown
>   hpxd.dll!hpx::detail::assertion_failed_msg(const char * msg, const char * expr, const char * function, const char * file, long line) Line 390   C++
    hpxd.dll!hpx::detail::assertion_failed(const char * expr, const char * function, const char * file, long line) Line 345 C++
    hpxd.dll!hpx::assertion_failed(const char * expr, const char * function, const char * file, long line) Line 1588    C++
    hpxd.dll!hpx::get_runtime() Line 798    C++
    hpxd.dll!hpx::naming::get_agas_client() Line 1180   C++
    hpxd.dll!hpx::naming::detail::decrement_refcnt(hpx::naming::detail::id_type_impl * p) Line 121  C++
    hpxd.dll!hpx::naming::detail::gid_managed_deleter(hpx::naming::detail::id_type_impl * p) Line 181   C++
    hpxd.dll!hpx::naming::detail::intrusive_ptr_release(hpx::naming::detail::id_type_impl * p) Line 452 C++
    hpxd.dll!boost::intrusive_ptr<hpx::naming::detail::id_type_impl>::~intrusive_ptr<hpx::naming::detail::id_type_impl>() Line 98   C++
    [External Code] 
    heartbeat.exe!hpx::lcos::detail::promise<void,hpx::util::unused_type>::requires_delete() Line 401   C++
    heartbeat.exe!hpx::lcos::detail::intrusive_ptr_release(hpx::lcos::detail::promise<void,hpx::util::unused_type> * p) Line 412    C++
    heartbeat.exe!hpx::components::detail_adl_barrier::manage_lifetime<hpx::traits::managed_object_is_lifetime_controlled>::release<hpx::lcos::detail::promise<void,hpx::util::unused_type> >(hpx::lcos::detail::promise<void,hpx::util::unused_type> * component) Line 148 C++
    heartbeat.exe!hpx::components::intrusive_ptr_release<hpx::lcos::detail::promise<void,hpx::util::unused_type>,hpx::components::detail::this_type>(hpx::components::managed_component<hpx::lcos::detail::promise<void,hpx::util::unused_type>,hpx::components::detail::this_type> * p) Line 349   C++
    heartbeat.exe!boost::intrusive_ptr<hpx::components::managed_component<hpx::lcos::detail::promise<void,hpx::util::unused_type>,hpx::components::detail::this_type> >::~intrusive_ptr<hpx::components::managed_component<hpx::lcos::detail::promise<void,hpx::util::unused_type>,hpx::components::detail::this_type> >() Line 98  C++
    heartbeat.exe!hpx::lcos::promise<void,hpx::util::unused_type>::~promise<void,hpx::util::unused_type>() Line 688 C++
    [External Code] 
    heartbeat.exe!hpx::util::detail::vtable::delete_<boost::_bi::bind_t<void,void (__cdecl*)(hpx::lcos::promise<void,hpx::util::unused_type>),boost::_bi::list1<boost::_bi::value<hpx::lcos::promise<void,hpx::util::unused_type> > > > >(void * * v) Line 93   C++
    hpxd.dll!hpx::util::detail::function_base<hpx::util::detail::function_vtable_ptr<void __cdecl(void),void,void>,void __cdecl(void)>::~function_base<hpx::util::detail::function_vtable_ptr<void __cdecl(void),void,void>,void __cdecl(void)>() Line 79   C++
    [External Code] 
    hpxd.dll!hpx::components::server::runtime_support::~runtime_support() Line 116  C++
    [External Code] 
    hpxd.dll!boost::checked_delete<hpx::components::server::runtime_support>(hpx::components::server::runtime_support * x) Line 34  C++
    hpxd.dll!boost::scoped_ptr<hpx::components::server::runtime_support>::~scoped_ptr<hpx::components::server::runtime_support>() Line 83   C++
    hpxd.dll!hpx::runtime::~runtime() Line 498  C++
    hpxd.dll!hpx::runtime_impl<hpx::threads::policies::local_priority_queue_scheduler<boost::mutex,hpx::threads::policies::lockfree_fifo,hpx::threads::policies::lockfree_fifo,hpx::threads::policies::lockfree_lifo>,hpx::threads::policies::callback_notifier>::~runtime_impl<hpx::threads::policies::local_priority_queue_scheduler<boost::mutex,hpx::threads::policies::lockfree_fifo,hpx::threads::policies::lockfree_fifo,hpx::threads::policies::lockfree_lifo>,hpx::threads::policies::callback_notifier>() Line 219    C++
    [External Code] 
    hpxd.dll!hpx::detail::run_priority_local(const hpx::util::function<void __cdecl(void),void,void> & startup, const hpx::util::function<void __cdecl(void),void,void> & shutdown, hpx::util::command_line_handling & cfg, bool blocking) Line 857 C++
    hpxd.dll!hpx::detail::run_or_start(const hpx::util::function<int __cdecl(boost::program_options::variables_map &),void,void> & f, const boost::program_options::options_description & desc_cmdline, int argc, char * * argv, const std::vector<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > > & ini_config, const hpx::util::function<void __cdecl(void),void,void> & startup, const hpx::util::function<void __cdecl(void),void,void> & shutdown, hpx::runtime_mode mode, bool blocking) Line 1080   C++
    heartbeat.exe!hpx::init(const hpx::util::function<int __cdecl(boost::program_options::variables_map &),void,void> & f, const boost::program_options::options_description & desc_cmdline, int argc, char * * argv, const std::vector<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > > & cfg, const hpx::util::function<void __cdecl(void),void,void> & startup, const hpx::util::function<void __cdecl(void),void,void> & shutdown, hpx::runtime_mode mode) Line 49  C++
    heartbeat.exe!hpx::init(const boost::program_options::options_description & desc_cmdline, int argc, char * * argv, const std::vector<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > > & cfg, const hpx::util::function<void __cdecl(void),void,void> & startup, const hpx::util::function<void __cdecl(void),void,void> & shutdown, hpx::runtime_mode mode) Line 99 C++
    heartbeat.exe!main(int argc, char * * argv) Line 141    C++
    [External Code]

hkaiser · 2015-06-09T17:44:29Z

@biddisco: please check branch fixing_1589 and report back whether your problems persist.

biddisco · 2015-06-09T18:37:28Z

The heartbeat example is now running fine. No crashes on client or console after initial testing.

hkaiser · 2015-06-13T23:05:14Z

This was fixed by merging #1597.

ericLemanissier · 2015-10-01T17:51:59Z

It seems the problem still exists: I get the following crash: https://gist.github.com/ericLemanissier/15b176518c202a5498d3
It happens for all tests involving distributed.tcp, and the compiler is MinGW (#1773)

n-mam · 2017-08-14T05:09:55Z

@biddisco Very recently we faced the exact same crash. Although the circumstances might not be the same here, I just wanted to highlight that the problem for us was the ordering of class members: a member of type boost::asio::ip::tcp::socket was declared before the member of type boost::asio::io_service.

With this setup, the dtor of the io_service gets called before the dtor of the tcp::socket. The critical section on which the above code faults gets destroyed deep inside the dtor of the boost::asio::io_service member.

Moving boost::asio::io_service member before the boost::asio::ip::tcp::socket member fixed this crash.
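
The ordering issue described above follows from a C++ rule: non-static data members are destroyed in reverse order of declaration. A minimal sketch, assuming hypothetical stand-in types FakeIoService and FakeSocket (these are not Boost.Asio classes and not code from HPX or from the project mentioned here) that only record when their destructors run:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Shared log of destructor calls, in the order they happen.
std::vector<std::string>& events() {
    static std::vector<std::string> log;
    return log;
}

// Hypothetical stand-in for boost::asio::io_service.
struct FakeIoService {
    ~FakeIoService() { events().push_back("io_service"); }
};

// Hypothetical stand-in for boost::asio::ip::tcp::socket. A real socket
// depends on its io_service while being destroyed; this one only logs.
struct FakeSocket {
    ~FakeSocket() { events().push_back("socket"); }
};

// Problematic layout: the socket is declared first, so it is destroyed
// last, after the service it would depend on is already gone.
struct SocketFirst {
    FakeSocket socket;
    FakeIoService service;
};

// Reordered layout: the service is declared first, so it outlives the socket.
struct ServiceFirst {
    FakeIoService service;
    FakeSocket socket;
};

// Constructs and destroys a Holder, then returns the recorded order.
template <typename Holder>
std::vector<std::string> destruction_order() {
    events().clear();
    {
        Holder h;
        (void)h;  // silence unused-variable warnings
    }
    std::vector<std::string> order = events();
    assert(order.size() == 2);  // exactly one entry per member
    return order;
}
```

Under these assumptions, destruction_order<SocketFirst>() yields io_service followed by socket, which is the ordering in which a socket dtor would touch state that the service has already torn down. destruction_order<ServiceFirst>() yields socket followed by io_service.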

Neelabh

hkaiser · 2017-08-14T12:55:13Z

@n-mam Ohh, perfect! I was hunting this issue for a while but couldn't find it. Would you be able to create a PR fixing this issue?

biddisco · 2017-08-14T13:46:29Z

@n-mam Good work. Very nice to see people chipping in with catches like that one.

n-mam · 2017-08-14T17:12:19Z

@hkaiser @biddisco, I think I should have been more specific earlier. The crash which I fixed was in our code; it has nothing to do with this project or library. Prior to the fix, we had one boost::asio::ip::tcp::socket member declared before a boost::asio::io_service member in a simple class (say xyz). Because of this, the dtor of io_service was called first, which destroyed the critical section object that the tcp::socket dtor later tried to access, all while still inside the same dtor call stack for our xyz object.

I saw the call stack originally posted with this issue and it looked similar to what happened with our crash. The only difference was that in my case the crashing call stack showed only the boost::.. ::socket's dtor at the top, whereas the call stack in this issue also shows the dtor of io_service. In my case, the io_service dtor appeared in the non-crashing call stack, i.e. the one which deletes the critical section (I had put a memory access breakpoint on the CS address). Hence the earlier statement "Although the circumstances might not be the same". If there is a more recent crash dump which you could share, then I can have a look.

biddisco added the labels type: defect, category: parcel transport, platform: Windows, compiler: msvc, category: examples, affecting: CSCS on Jun 8, 2015

hkaiser mentioned this issue Jun 9, 2015

Improved the reliability of connecting/disconnecting localities #1597

Merged

hkaiser closed this as completed Jun 13, 2015
