Disconnecting a locality results in segfault using heartbeat example #1589

Closed
biddisco opened this issue Jun 8, 2015 · 10 comments

@biddisco
Contributor

biddisco commented Jun 8, 2015

The attached stacktrace comes from running the heartbeat worker example. When it disconnects from the console process, it shuts down the parcelport, and a mutex deep inside boost::asio is accessed, seemingly after it has been destroyed. The mutex is locked during win_iocp_socket_service_base::destroy, but it appears to have uninitialized memory contents.

On Windows, the console was started using

heartbeat_console.exe --hpx:threads=2 --runfor=100 -Ihpx.parcel.port=7910 \
    --hpx:attach-debugger=exception -Ihpx.agas.port=7911

The worker was started using

heartbeat.exe --hpx:threads=2 --hpx:agas=127.0.0.1:7910 --runfor=2 \
    --hpx:attach-debugger=exception

This has been reproduced with boost_1_55 and boost_1_58 (the latter tried in the hope that it was a boost error that had been fixed in a more recent version).

---stacktrace---

ntdll.dll!00007ffa9aceb2de()    Unknown
hpxd.dll!boost::asio::detail::win_mutex::lock() Line 51 C++
    hpxd.dll!boost::asio::detail::scoped_lock<boost::asio::detail::win_mutex>::scoped_lock<boost::asio::detail::win_mutex>(boost::asio::detail::win_mutex & m) Line 47  C++
    hpxd.dll!boost::asio::detail::win_iocp_socket_service_base::destroy(boost::asio::detail::win_iocp_socket_service_base::base_implementation_type & impl) Line 154    C++
    hpxd.dll!boost::asio::stream_socket_service<boost::asio::ip::tcp>::destroy(boost::asio::detail::win_iocp_socket_service<boost::asio::ip::tcp>::implementation_type & impl) Line 139 C++
    hpxd.dll!boost::asio::basic_io_object<boost::asio::stream_socket_service<boost::asio::ip::tcp>,1>::~basic_io_object<boost::asio::stream_socket_service<boost::asio::ip::tcp>,1>() Line 196  C++
    hpxd.dll!boost::asio::basic_socket<boost::asio::ip::tcp,boost::asio::stream_socket_service<boost::asio::ip::tcp> >::~basic_socket<boost::asio::ip::tcp,boost::asio::stream_socket_service<boost::asio::ip::tcp> >() Line 1512   C++
    [External Code] 
    hpxd.dll!hpx::parcelset::policies::tcp::receiver::~receiver() Line 56   C++
    [External Code] 
    hpxd.dll!boost::checked_delete<hpx::parcelset::policies::tcp::receiver>(hpx::parcelset::policies::tcp::receiver * x) Line 34    C++
    hpxd.dll!boost::detail::sp_counted_impl_p<hpx::parcelset::policies::tcp::receiver>::dispose() Line 79   C++
    hpxd.dll!boost::detail::sp_counted_base::release() Line 104 C++
    hpxd.dll!boost::detail::shared_count::~shared_count() Line 447  C++
    [External Code] 
    hpxd.dll!boost::asio::detail::win_iocp_socket_accept_op<boost::asio::basic_socket<boost::asio::ip::tcp,boost::asio::stream_socket_service<boost::asio::ip::tcp> >,boost::asio::ip::tcp,boost::_bi::bind_t<void,boost::_mfi::mf2<void,hpx::parcelset::policies::tcp::connection_handler,boost::system::error_code const & __ptr64,boost::shared_ptr<hpx::parcelset::policies::tcp::receiver> >,boost::_bi::list3<boost::_bi::value<hpx::parcelset::policies::tcp::connection_handler * __ptr64>,boost::arg<1>,boost::_bi::value<boost::shared_ptr<hpx::parcelset::policies::tcp::receiver> > > > >::do_complete(boost::asio::detail::win_iocp_io_service * owner, boost::asio::detail::win_iocp_operation * base, const boost::system::error_code & result_ec, unsigned __int64 __formal) Line 145   C++
    hpxd.dll!boost::asio::detail::win_iocp_operation::destroy() Line 52 C++
    hpxd.dll!boost::asio::detail::win_iocp_io_service::shutdown_service() Line 126  C++
    hpxd.dll!boost::asio::detail::service_registry::~service_registry() Line 38 C++
    [External Code] 
    hpxd.dll!boost::asio::io_service::~io_service() Line 53 C++
    [External Code] 
    hpxd.dll!boost::checked_delete<boost::asio::io_service>(boost::asio::io_service * x) Line 34    C++
    hpxd.dll!boost::detail::sp_counted_impl_p<boost::asio::io_service>::dispose() Line 79   C++
    hpxd.dll!boost::detail::sp_counted_base::release() Line 104 C++
    hpxd.dll!boost::detail::shared_count::~shared_count() Line 447  C++
    [External Code] 
    hpxd.dll!hpx::util::io_service_pool::clear_locked() Line 194    C++
    hpxd.dll!hpx::util::io_service_pool::clear() Line 183   C++
    hpxd.dll!hpx::parcelset::parcelport_impl<hpx::parcelset::policies::tcp::connection_handler>::stop(bool blocking) Line 167   C++
>   hpxd.dll!hpx::parcelset::parcelhandler::stop(bool blocking) Line 291    C++
    hpxd.dll!hpx::runtime_impl<hpx::threads::policies::local_priority_queue_scheduler<boost::mutex,hpx::threads::policies::lockfree_fifo,hpx::threads::policies::lockfree_fifo,hpx::threads::policies::lockfree_lifo>,hpx::threads::policies::callback_notifier>::stop(bool blocking) Line 419  C++
    hpxd.dll!hpx::runtime_impl<hpx::threads::policies::local_priority_queue_scheduler<boost::mutex,hpx::threads::policies::lockfree_fifo,hpx::threads::policies::lockfree_fifo,hpx::threads::policies::lockfree_lifo>,hpx::threads::policies::callback_notifier>::run(const hpx::util::function<int __cdecl(void),void,void> & func) Line 525   C++
    hpxd.dll!hpx::detail::run(hpx::runtime & rt, const hpx::util::function<int __cdecl(boost::program_options::variables_map &),void,void> & f, boost::program_options::variables_map & vm, hpx::runtime_mode mode, const hpx::util::function<void __cdecl(void),void,void> & startup, const hpx::util::function<void __cdecl(void),void,void> & shutdown) Line 523 C++
    hpxd.dll!hpx::detail::run_priority_local(const hpx::util::function<void __cdecl(void),void,void> & startup, const hpx::util::function<void __cdecl(void),void,void> & shutdown, hpx::util::command_line_handling & cfg, bool blocking) Line 857 C++
    hpxd.dll!hpx::detail::run_or_start(const hpx::util::function<int __cdecl(boost::program_options::variables_map &),void,void> & f, const boost::program_options::options_description & desc_cmdline, int argc, char * * argv, const std::vector<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > > & ini_config, const hpx::util::function<void __cdecl(void),void,void> & startup, const hpx::util::function<void __cdecl(void),void,void> & shutdown, hpx::runtime_mode mode, bool blocking) Line 1080   C++
    heartbeat.exe!hpx::init(const hpx::util::function<int __cdecl(boost::program_options::variables_map &),void,void> & f, const boost::program_options::options_description & desc_cmdline, int argc, char * * argv, const std::vector<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > > & cfg, const hpx::util::function<void __cdecl(void),void,void> & startup, const hpx::util::function<void __cdecl(void),void,void> & shutdown, hpx::runtime_mode mode) Line 49  C++
    heartbeat.exe!hpx::init(const boost::program_options::options_description & desc_cmdline, int argc, char * * argv, const std::vector<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > > & cfg, const hpx::util::function<void __cdecl(void),void,void> & startup, const hpx::util::function<void __cdecl(void),void,void> & shutdown, hpx::runtime_mode mode) Line 99 C++
    heartbeat.exe!main(int argc, char * * argv) Line 141    C++
    [External Code] 
@hkaiser
Member

hkaiser commented Jun 8, 2015

This looks to be connected to #866.

@biddisco does this always happen or only when running in the debugger?

@biddisco
Contributor Author

biddisco commented Jun 9, 2015

I have noticed that there is sometimes a different failure, more frequently when the debugger is not attached, but sometimes when it is.

{what}: assertion 'NULL != runtime::runtime_.get()' failed: HPX(assertion_failure)

By the looks of things, a late parcel or something of that sort is occurring. It is not clear whether the two crashes are directly related or whether one sometimes preempts the other.

    msvcr120d.dll!00007ffa6fc17642()    Unknown
    msvcr120d.dll!00007ffa6fd42044()    Unknown
>   hpxd.dll!hpx::detail::assertion_failed_msg(const char * msg, const char * expr, const char * function, const char * file, long line) Line 390   C++
    hpxd.dll!hpx::detail::assertion_failed(const char * expr, const char * function, const char * file, long line) Line 345 C++
    hpxd.dll!hpx::assertion_failed(const char * expr, const char * function, const char * file, long line) Line 1588    C++
    hpxd.dll!hpx::get_runtime() Line 798    C++
    hpxd.dll!hpx::naming::get_agas_client() Line 1180   C++
    hpxd.dll!hpx::naming::detail::decrement_refcnt(hpx::naming::detail::id_type_impl * p) Line 121  C++
    hpxd.dll!hpx::naming::detail::gid_managed_deleter(hpx::naming::detail::id_type_impl * p) Line 181   C++
    hpxd.dll!hpx::naming::detail::intrusive_ptr_release(hpx::naming::detail::id_type_impl * p) Line 452 C++
    hpxd.dll!boost::intrusive_ptr<hpx::naming::detail::id_type_impl>::~intrusive_ptr<hpx::naming::detail::id_type_impl>() Line 98   C++
    [External Code] 
    heartbeat.exe!hpx::lcos::detail::promise<void,hpx::util::unused_type>::requires_delete() Line 401   C++
    heartbeat.exe!hpx::lcos::detail::intrusive_ptr_release(hpx::lcos::detail::promise<void,hpx::util::unused_type> * p) Line 412    C++
    heartbeat.exe!hpx::components::detail_adl_barrier::manage_lifetime<hpx::traits::managed_object_is_lifetime_controlled>::release<hpx::lcos::detail::promise<void,hpx::util::unused_type> >(hpx::lcos::detail::promise<void,hpx::util::unused_type> * component) Line 148 C++
    heartbeat.exe!hpx::components::intrusive_ptr_release<hpx::lcos::detail::promise<void,hpx::util::unused_type>,hpx::components::detail::this_type>(hpx::components::managed_component<hpx::lcos::detail::promise<void,hpx::util::unused_type>,hpx::components::detail::this_type> * p) Line 349   C++
    heartbeat.exe!boost::intrusive_ptr<hpx::components::managed_component<hpx::lcos::detail::promise<void,hpx::util::unused_type>,hpx::components::detail::this_type> >::~intrusive_ptr<hpx::components::managed_component<hpx::lcos::detail::promise<void,hpx::util::unused_type>,hpx::components::detail::this_type> >() Line 98  C++
    heartbeat.exe!hpx::lcos::promise<void,hpx::util::unused_type>::~promise<void,hpx::util::unused_type>() Line 688 C++
    [External Code] 
    heartbeat.exe!hpx::util::detail::vtable::delete_<boost::_bi::bind_t<void,void (__cdecl*)(hpx::lcos::promise<void,hpx::util::unused_type>),boost::_bi::list1<boost::_bi::value<hpx::lcos::promise<void,hpx::util::unused_type> > > > >(void * * v) Line 93   C++
    hpxd.dll!hpx::util::detail::function_base<hpx::util::detail::function_vtable_ptr<void __cdecl(void),void,void>,void __cdecl(void)>::~function_base<hpx::util::detail::function_vtable_ptr<void __cdecl(void),void,void>,void __cdecl(void)>() Line 79   C++
    [External Code] 
    hpxd.dll!hpx::components::server::runtime_support::~runtime_support() Line 116  C++
    [External Code] 
    hpxd.dll!boost::checked_delete<hpx::components::server::runtime_support>(hpx::components::server::runtime_support * x) Line 34  C++
    hpxd.dll!boost::scoped_ptr<hpx::components::server::runtime_support>::~scoped_ptr<hpx::components::server::runtime_support>() Line 83   C++
    hpxd.dll!hpx::runtime::~runtime() Line 498  C++
    hpxd.dll!hpx::runtime_impl<hpx::threads::policies::local_priority_queue_scheduler<boost::mutex,hpx::threads::policies::lockfree_fifo,hpx::threads::policies::lockfree_fifo,hpx::threads::policies::lockfree_lifo>,hpx::threads::policies::callback_notifier>::~runtime_impl<hpx::threads::policies::local_priority_queue_scheduler<boost::mutex,hpx::threads::policies::lockfree_fifo,hpx::threads::policies::lockfree_fifo,hpx::threads::policies::lockfree_lifo>,hpx::threads::policies::callback_notifier>() Line 219    C++
    [External Code] 
    hpxd.dll!hpx::detail::run_priority_local(const hpx::util::function<void __cdecl(void),void,void> & startup, const hpx::util::function<void __cdecl(void),void,void> & shutdown, hpx::util::command_line_handling & cfg, bool blocking) Line 857 C++
    hpxd.dll!hpx::detail::run_or_start(const hpx::util::function<int __cdecl(boost::program_options::variables_map &),void,void> & f, const boost::program_options::options_description & desc_cmdline, int argc, char * * argv, const std::vector<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > > & ini_config, const hpx::util::function<void __cdecl(void),void,void> & startup, const hpx::util::function<void __cdecl(void),void,void> & shutdown, hpx::runtime_mode mode, bool blocking) Line 1080   C++
    heartbeat.exe!hpx::init(const hpx::util::function<int __cdecl(boost::program_options::variables_map &),void,void> & f, const boost::program_options::options_description & desc_cmdline, int argc, char * * argv, const std::vector<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > > & cfg, const hpx::util::function<void __cdecl(void),void,void> & startup, const hpx::util::function<void __cdecl(void),void,void> & shutdown, hpx::runtime_mode mode) Line 49  C++
    heartbeat.exe!hpx::init(const boost::program_options::options_description & desc_cmdline, int argc, char * * argv, const std::vector<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > > & cfg, const hpx::util::function<void __cdecl(void),void,void> & startup, const hpx::util::function<void __cdecl(void),void,void> & shutdown, hpx::runtime_mode mode) Line 99 C++
    heartbeat.exe!main(int argc, char * * argv) Line 141    C++
    [External Code] 

@hkaiser
Member

hkaiser commented Jun 9, 2015

@biddisco: please check branch fixing_1589 and report back whether your problems persist.

@biddisco
Contributor Author

biddisco commented Jun 9, 2015

The heartbeat example is now running fine. No crashes on client or console after initial testing.

@hkaiser
Member

hkaiser commented Jun 13, 2015

This was fixed by merging #1597

@hkaiser hkaiser closed this as completed Jun 13, 2015
@ericLemanissier
Contributor

It seems the problem still exists: I get the following crash https://gist.github.com/ericLemanissier/15b176518c202a5498d3
It happens for all tests involving distributed.tcp, and the compiler is MinGW (#1773).

@n-mam

n-mam commented Aug 14, 2017

@biddisco Very recently we faced the exact same crash. The circumstances might not be the same here, but I wanted to highlight that the problem for us was the ordering of class members: a member of type boost::asio::ip::tcp::socket was declared before the member of type boost::asio::io_service.

With this setup, the dtor of the io_service gets called before the dtor of the tcp::socket. The critical section that the code above faults on gets destroyed deep inside the dtor of the boost::asio::io_service member.

Moving the boost::asio::io_service member before the boost::asio::ip::tcp::socket member fixed this crash.

Neelabh

@hkaiser
Member

hkaiser commented Aug 14, 2017

@n-mam Ohh, perfect! I was hunting this issue for a while but couldn't find it. Would you be able to create a PR fixing this issue?

@biddisco
Contributor Author

@n-mam Good work. Very nice to see people chipping in with catches like that one.

@n-mam

n-mam commented Aug 14, 2017

@hkaiser @biddisco, I think I should have been more specific earlier. The crash which I fixed was in our code; it had nothing to do with this project or library. Prior to the fix, we had one boost::asio::ip::tcp::socket member declared before a boost::asio::io_service member in a simple class (say xyz). Because of this, the dtor of io_service was called first and destroyed the critical section object, which the tcp::socket dtor later tried to access, all while still inside the same dtor call stack for our xyz object.

I saw the call stack originally posted with this issue, and it looked similar to what happened with our crash. The only difference is that our crashing call stack showed only the boost::.. ::socket's dtor at the top, whereas the call stack in this issue also shows the dtor of io_service. In our case, the io_service dtor showed up in the non-crashing call stack, i.e. the one which deletes the critical section (I had put a memory access breakpoint on the CS address). Hence the earlier statement "Although the circumstances might not be the same". If there is a more recent crash dump which you could share, then I can have a look.
