
Race Condition - 1d_stencil_8 - SuperMIC #1395

Closed
parsa opened this issue Mar 5, 2015 · 5 comments

@parsa
Contributor

parsa commented Mar 5, 2015

Changeset: 95d87a2
Arguments: --nx 100000 --np 200 -t 20
Configuration: Debug, Release, Boost 1.55.0, 10 nodes, 1 locality per node, SuperMIC
Case 1: Runs forever
Stack Trace 1 - MPI Barrier, plugins/parcelport/mpi/parcelport_mpi.cpp:190

#18 main (argc=9, argv=0x7fffffffc818) at /home/parsa/hpx/repo/examples/1d_stencil/1d_stencil_8.cpp:652 (at 0x000000000048caff)
#17 hpx::init(const boost::program_options::options_description &, int, char **, const std::vector<std::string, std::allocator<std::string> > &, const hpx::util::function_nonser &, const hpx::util::function_nonser &, enum hpx::runtime_mode) (desc_cmdline=..., argc=9, argv=0x7fffffffc818, cfg=std::vector of length 1, capacity 1 = {...}, startup=..., shutdown=..., mode=hpx::runtime_mode_default) at /home/parsa/hpx/repo/hpx/hpx_init_impl.hpp:98 (at 0x00000000006a52d4)
#16 hpx::init(const hpx::util::function_nonser &, const boost::program_options::options_description &, int, char **, const std::vector<std::string, std::allocator<std::string> > &, const hpx::util::function_nonser &, const hpx::util::function_nonser &, enum hpx::runtime_mode) (f=..., desc_cmdline=..., argc=9, argv=0x7fffffffc818, cfg=std::vector of length 1, capacity 1 = {...}, startup=..., shutdown=..., mode=hpx::runtime_mode_default) at /home/parsa/hpx/repo/hpx/hpx_init_impl.hpp:47 (at 0x00000000006a5226)
#15 hpx::detail::run_or_start (f=..., desc_cmdline=..., argc=9, argv=0x7fffffffc818, ini_config=std::vector of length 1, capacity 1 = {...}, startup=..., shutdown=..., mode=hpx::runtime_mode_default, blocking=true) at /home/parsa/hpx/repo/src/hpx_init.cpp:1081 (at 0x00002aaaab004a00)
#14 hpx::detail::run_priority_local (startup=..., shutdown=..., cfg=..., blocking=true) at /home/parsa/hpx/repo/src/hpx_init.cpp:857 (at 0x00002aaaab002008)
#13 hpx::detail::run (rt=..., f=..., vm=..., mode=hpx::runtime_mode_worker, startup=..., shutdown=...) at /home/parsa/hpx/repo/src/hpx_init.cpp:524 (at 0x00002aaaaaffa6ac)
#12 hpx::runtime_impl<hpx::threads::policies::local_priority_queue_scheduler<boost::mutex, hpx::threads::policies::lockfree_fifo, hpx::threads::policies::lockfree_fifo, hpx::threads::policies::lockfree_lifo>, hpx::threads::policies::callback_notifier>::run (this=0x2aaab211dc00, func=...) at /home/parsa/hpx/repo/src/runtime_impl.cpp:524 (at 0x00002aaaaaf74b63)
#11 hpx::runtime_impl<hpx::threads::policies::local_priority_queue_scheduler<boost::mutex, hpx::threads::policies::lockfree_fifo, hpx::threads::policies::lockfree_fifo, hpx::threads::policies::lockfree_lifo>, hpx::threads::policies::callback_notifier>::stop (this=0x2aaab211dc00, blocking=true) at /home/parsa/hpx/repo/src/runtime_impl.cpp:419 (at 0x00002aaaaaf743bb)
#10 hpx::parcelset::parcelhandler::stop (this=0x2aaab211e468, blocking=true) at /home/parsa/hpx/repo/src/runtime/parcelset/parcelhandler.cpp:301 (at 0x00002aaaab204614)
#9 hpx::parcelset::policies::mpi::parcelport::stop (this=0x2aaab2081a00, blocking=true) at /home/parsa/hpx/repo/plugins/parcelport/mpi/parcelport_mpi.cpp:190 (at 0x00002aaaabea7495)
#8 PMPI_Barrier () from /usr/local/packages/python/2.7.7-anaconda/lib/libmpich.so.3 (at 0x00002aaab0d99d4b)
#7 MPIR_Barrier_impl () from /usr/local/packages/python/2.7.7-anaconda/lib/libmpich.so.3 (at 0x00002aaab0d999d7)
#6 MPIR_Barrier_or_coll_fn () from /usr/local/packages/python/2.7.7-anaconda/lib/libmpich.so.3 (at 0x00002aaab0d998e7)
#5 MPIR_Barrier_intra () from /usr/local/packages/python/2.7.7-anaconda/lib/libmpich.so.3 (at 0x00002aaab0d99433)
#4 MPIC_Sendrecv_ft () from /usr/local/packages/python/2.7.7-anaconda/lib/libmpich.so.3 (at 0x00002aaab0df4921)
#3 MPIC_Sendrecv () from /usr/local/packages/python/2.7.7-anaconda/lib/libmpich.so.3 (at 0x00002aaab0df4623)
#2 MPIC_Wait () from /usr/local/packages/python/2.7.7-anaconda/lib/libmpich.so.3 (at 0x00002aaab0df3a87)
#1 MPIDI_CH3I_Progress () from /usr/local/packages/python/2.7.7-anaconda/lib/libmpich.so.3 (at 0x00002aaab0dacd7f)
#0 sched_yield () from /lib64/libc.so.6 (at 0x000000303e2cf3a7)

Stack Trace 2 - Other Threads, src/util/io_service_pool.cpp:85:

#13 main (argc=9, argv=0x7fffffffc818) at /home/parsa/hpx/repo/examples/1d_stencil/1d_stencil_8.cpp:652 (at 0x000000000048caff)
#12 hpx::init(const boost::program_options::options_description &, int, char **, const std::vector<std::string, std::allocator<std::string> > &, const hpx::util::function_nonser &, const hpx::util::function_nonser &, enum hpx::runtime_mode) (desc_cmdline=..., argc=9, argv=0x7fffffffc818, cfg=std::vector of length 1, capacity 1 = {...}, startup=..., shutdown=..., mode=hpx::runtime_mode_default) at /home/parsa/hpx/repo/hpx/hpx_init_impl.hpp:98 (at 0x00000000006a52d4)
#11 hpx::init(const hpx::util::function_nonser &, const boost::program_options::options_description &, int, char **, const std::vector<std::string, std::allocator<std::string> > &, const hpx::util::function_nonser &, const hpx::util::function_nonser &, enum hpx::runtime_mode) (f=..., desc_cmdline=..., argc=9, argv=0x7fffffffc818, cfg=std::vector of length 1, capacity 1 = {...}, startup=..., shutdown=..., mode=hpx::runtime_mode_default) at /home/parsa/hpx/repo/hpx/hpx_init_impl.hpp:47 (at 0x00000000006a5226)
#10 hpx::detail::run_or_start (f=..., desc_cmdline=..., argc=9, argv=0x7fffffffc818, ini_config=std::vector of length 1, capacity 1 = {...}, startup=..., shutdown=..., mode=hpx::runtime_mode_default, blocking=true) at /home/parsa/hpx/repo/src/hpx_init.cpp:1081 (at 0x00002aaaab004a00)
#9 hpx::detail::run_priority_local (startup=..., shutdown=..., cfg=..., blocking=true) at /home/parsa/hpx/repo/src/hpx_init.cpp:857 (at 0x00002aaaab002008)
#8 hpx::detail::run (rt=..., f=..., vm=..., mode=hpx::runtime_mode_worker, startup=..., shutdown=...) at /home/parsa/hpx/repo/src/hpx_init.cpp:524 (at 0x00002aaaaaffa6ac)
#7 hpx::runtime_impl<hpx::threads::policies::local_priority_queue_scheduler<boost::mutex, hpx::threads::policies::lockfree_fifo, hpx::threads::policies::lockfree_fifo, hpx::threads::policies::lockfree_lifo>, hpx::threads::policies::callback_notifier>::run (this=0x2aaab211dc00, func=...) at /home/parsa/hpx/repo/src/runtime_impl.cpp:523 (at 0x00002aaaaaf74b3d)
#6 hpx::runtime_impl<hpx::threads::policies::local_priority_queue_scheduler<boost::mutex, hpx::threads::policies::lockfree_fifo, hpx::threads::policies::lockfree_fifo, hpx::threads::policies::lockfree_lifo>, hpx::threads::policies::callback_notifier>::wait (this=0x2aaab211dc00) at /home/parsa/hpx/repo/src/runtime_impl.cpp:378 (at 0x00002aaaaaf73d03)
#5 hpx::util::io_service_pool::thread_run (this=0x2aaab211de30, index=0) at /home/parsa/hpx/repo/src/util/io_service_pool.cpp:85 (at 0x00002aaaabca0a53)
#4 boost::asio::io_service::run (this=0x2aaab20257d0) at /usr/local/packages/boost/1.55.0/INTEL-14.0.2-python-2.7.7-anaconda/include/boost/asio/impl/io_service.ipp:59 (at 0x00002aaaabca6297)
#3 boost::asio::detail::task_io_service::run (this=0x2aaab2116480, ec=...) at /usr/local/packages/boost/1.55.0/INTEL-14.0.2-python-2.7.7-anaconda/include/boost/asio/detail/impl/task_io_service.ipp:153 (at 0x00002aaaabca6d12)
#2 boost::asio::detail::task_io_service::do_run_one (this=0x2aaab2116480, lock=..., this_thread=..., ec=...) at /usr/local/packages/boost/1.55.0/INTEL-14.0.2-python-2.7.7-anaconda/include/boost/asio/detail/impl/task_io_service.ipp:395 (at 0x00002aaaabca729b)
#1 boost::asio::detail::posix_event::wait (this=0x7fffffffb130, lock=...) at /usr/local/packages/boost/1.55.0/INTEL-14.0.2-python-2.7.7-anaconda/include/boost/asio/detail/posix_event.hpp:80 (at 0x00002aaaabca6551)
#0 pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 (at 0x000000303e60b43c)
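Case 1 looks like a shutdown hang. In Stack Trace 1, frame #9 shows parcelport::stop blocked inside MPI_Barrier (parcelport_mpi.cpp:190), with frames #1 and #0 showing MPI's progress engine polling via sched_yield. In Stack Trace 2, the main thread is waiting in runtime_impl::wait. A barrier only returns once every participant has entered it, so one plausible reading is that at least one of the 10 localities never reached parcelport::stop, for example because it aborted as in Case 2. The sketch below is a hypothetical, self-contained C++ analogue that uses plain std::thread "ranks" rather than MPI. The names timed_barrier and count_completed are illustrative only and are not HPX or MPI APIs. The barrier takes a timeout so that a missing participant is detected instead of the caller waiting forever.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical single-use counting barrier with a timeout (not an HPX or MPI
// API). Unlike MPI_Barrier, arrive_and_wait_for() gives up after `timeout`,
// so a participant that never arrives is reported instead of hanging forever.
class timed_barrier {
public:
    explicit timed_barrier(std::size_t expected) : expected_(expected) {
        assert(expected > 0);
    }

    // Returns true if all `expected` participants arrived before the timeout.
    bool arrive_and_wait_for(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mtx_);
        if (++arrived_ == expected_) {
            cv_.notify_all();  // last arrival releases everyone waiting
            return true;
        }
        return cv_.wait_for(lock, timeout,
                            [this] { return arrived_ >= expected_; });
    }

private:
    std::mutex mtx_;
    std::condition_variable cv_;
    std::size_t expected_;
    std::size_t arrived_ = 0;
};

// Starts `arriving` threads ("ranks") on a barrier that expects `expected`
// participants and returns how many of them saw the barrier complete.
std::size_t count_completed(std::size_t expected, std::size_t arriving,
                            std::chrono::milliseconds timeout) {
    timed_barrier barrier(expected);
    std::mutex result_mtx;
    std::size_t completed = 0;
    std::vector<std::thread> ranks;
    for (std::size_t i = 0; i < arriving; ++i) {
        ranks.emplace_back([&] {
            bool ok = barrier.arrive_and_wait_for(timeout);
            std::lock_guard<std::mutex> guard(result_mtx);
            if (ok) ++completed;
        });
    }
    for (auto& t : ranks) t.join();
    return completed;
}
```

With all three ranks arriving, count_completed(3, 3, std::chrono::milliseconds(2000)) should return 3 almost immediately. With one rank missing, count_completed(3, 2, std::chrono::milliseconds(100)) returns 0 once the timeout expires. A real MPI_Barrier has no such timeout and would simply never return in the second case, which is consistent with what Stack Trace 1 shows.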

Case 2: result has already been stored for this promise: HPX(promise_already_satisfied), assertion 'NULL != self_.get()' failed: HPX(assertion_failure)
Stack Trace - src/exception.cpp:391:

#8 ?? () (at 0x0000000000000000)
#7 hpx::util::coroutines::detail::lx::trampoline (fun=0x2aaabe4342a0) at /home/parsa/hpx/repo/hpx/util/coroutine/detail/context_linux_x86.hpp:89 (at 0x00002aaaaafd92a0)
#6 hpx::util::coroutines::detail::coroutine_impl_wrapper<hpx::util::decay<std::remove_reference<hpx::threads::thread_function_type&>::type>::type, hpx::threads::coroutine_type, hpx::util::coroutines::detail::default_context_impl, hpx::threads::detail::coroutine_allocator>::operator() (this=0x2aaabe4342a0) at /home/parsa/hpx/repo/hpx/util/coroutine/detail/coroutine_impl.hpp:324 (at 0x00002aaaaaf9bf9b)
#5 hpx::util::coroutines::detail::coroutine_impl_wrapper<hpx::util::decay<std::remove_reference<hpx::threads::thread_function_type&>::type>::type, hpx::threads::coroutine_type, hpx::util::coroutines::detail::default_context_impl, hpx::threads::detail::coroutine_allocator>::reset_self_on_exit::~reset_self_on_exit (this=0x2aaac7b55f38) at /home/parsa/hpx/repo/hpx/util/coroutine/detail/coroutine_impl.hpp:371 (at 0x00002aaaaaf9c34b)
#4 hpx::util::coroutines::detail::coroutine_impl<hpx::threads::coroutine_type, hpx::util::coroutines::detail::default_context_impl, hpx::threads::detail::coroutine_allocator>::set_self (self=0x0) at /home/parsa/hpx/repo/hpx/util/coroutine/detail/coroutine_impl_impl.hpp:19 (at 0x00002aaaab6b087f)
#3 hpx::assertion_failed (expr=0x2aaaac215e54 "NULL != self_.get()", function=0x2aaaac20caa0 "static void hpx::util::coroutines::detail::coroutine_impl<CoroutineType, ContextImpl, Heap>::set_self(hpx::util::coroutines::detail::coroutine_self<CoroutineType> *) [with CoroutineType = hpx::util::c"..., file=0x2aaaac20ca40 "/home/parsa/hpx/repo/hpx/util/coroutine/detail/coroutine_impl_impl.hpp", line=19) at /home/parsa/hpx/repo/hpx/exception.hpp:1587 (at 0x00002aaaaafdc169)
#2 hpx::detail::assertion_failed (expr=0x2aaaac215e54 "NULL != self_.get()", function=0x2aaaac20caa0 "static void hpx::util::coroutines::detail::coroutine_impl<CoroutineType, ContextImpl, Heap>::set_self(hpx::util::coroutines::detail::coroutine_self<CoroutineType> *) [with CoroutineType = hpx::util::c"..., file=0x2aaaac20ca40 "/home/parsa/hpx/repo/hpx/util/coroutine/detail/coroutine_impl_impl.hpp", line=19) at /home/parsa/hpx/repo/src/exception.cpp:345 (at 0x00002aaaab0b2246)
#1 hpx::detail::assertion_failed_msg (msg=0x2aaaac215e54 "NULL != self_.get()", expr=0x2aaaac215e54 "NULL != self_.get()", function=0x2aaaac20caa0 "static void hpx::util::coroutines::detail::coroutine_impl<CoroutineType, ContextImpl, Heap>::set_self(hpx::util::coroutines::detail::coroutine_self<CoroutineType> *) [with CoroutineType = hpx::util::c"..., file=0x2aaaac20ca40 "/home/parsa/hpx/repo/hpx/util/coroutine/detail/coroutine_impl_impl.hpp", line=19) at /home/parsa/hpx/repo/src/exception.cpp:391 (at 0x00002aaaab0b31df)
#0 abort () from /lib64/libc.so.6 (at 0x00000035fd433f10)

Log:

{stack-trace}: 4 frames:
0x2aaaab09f71d  : ??? + 0x2aaaab09f71d in /work/parsa/build/inteldebug/lib/libhpxd.so.0
0x2aaaab1084eb  : ??? + 0x2aaaab1084eb in /work/parsa/build/inteldebug/lib/libhpxd.so.0
0x2aaaab0b1c6f  : hpx::detail::backtrace(unsigned long) + 0x23 in /work/parsa/build/inteldebug/lib/libhpxd.so.0
0x2aaaab126a22  : boost::exception_ptr hpx::detail::get_exception<hpx::exception>(hpx::exception const&, std::string const&, std::string const&, long, std::string const&) + 0xd2 in /work/parsa/build/inteldebug/lib/libhpxd.so.0
{locality-id}: 5
{hostname}: [ (mpi:5) ]
{process-id}: 7794
{function}: promise_base<R>::set_result
{file}: /home/parsa/hpx/repo/hpx/lcos/local/promise.hpp
{line}: 120
{os-thread}: 12, worker-thread#12
{thread-id}: 00002aaac2022820
{thread-description}: N14stepper_server17from_right_actionE
{state}: running
{auxinfo}: 
{version}: V0.9.10-rc1 (AGAS: V3.0), Git: 95d87a21fc33f5f9c4547e943d8766ad607a76c5
{boost}: V1.55.0
{build-type}: debug
{date}: Mar  5 2015 20:04:09
{platform}: linux
{compiler}: Intel C++ C++0x mode version 1400
{stdlib}: GNU libstdc++ version 20120313
{what}: result has already been stored for this promise: HPX(promise_already_satisfied)

Runtime is not available, reporting error locally. 
{stack-trace}: 4 frames:
0x2aaaab09f71d  : ??? + 0x2aaaab09f71d in /work/parsa/build/inteldebug/lib/libhpxd.so.0
0x2aaaab1084eb  : ??? + 0x2aaaab1084eb in /work/parsa/build/inteldebug/lib/libhpxd.so.0
0x2aaaab0b1c6f  : hpx::detail::backtrace(unsigned long) + 0x23 in /work/parsa/build/inteldebug/lib/libhpxd.so.0
0x2aaaab126a22  : boost::exception_ptr hpx::detail::get_exception<hpx::exception>(hpx::exception const&, std::string const&, std::string const&, long, std::string const&) + 0xd2 in /work/parsa/build/inteldebug/lib/libhpxd.so.0
{locality-id}: 4294967295
{process-id}: 7794
{function}: static void hpx::util::coroutines::detail::coroutine_impl<CoroutineType, ContextImpl, Heap>::set_self(hpx::util::coroutines::detail::coroutine_self<CoroutineType> *) [with CoroutineType = hpx::util::coroutines::coroutine<hpx::threads::thread_state_enum (hpx::threads::thread_state_ex_enum), hpx::threads::detail::coroutine_allocator, hpx::util::coroutines::detail::lx::x86_linux_context_impl>, ContextImpl = hpx::util::coroutines::detail::lx::x86_linux_context_impl, Heap = hpx::threads::detail::coroutine_allocator]
{file}: /home/parsa/hpx/repo/hpx/util/coroutine/detail/coroutine_impl_impl.hpp
{line}: 19
{os-thread}: <unknown>
{state}: not running
{auxinfo}: 
{version}: V0.9.10-rc1 (AGAS: V3.0), Git: 95d87a21fc33f5f9c4547e943d8766ad607a76c5
{boost}: V1.55.0
{build-type}: debug
{date}: Mar  5 2015 20:04:09
{platform}: linux
{compiler}: Intel C++ C++0x mode version 1400
{stdlib}: GNU libstdc++ version 20120313
{what}: assertion 'NULL != self_.get()' failed: HPX(assertion_failure)
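In the Case 2 log, the promise error is reported first. It is raised in promise_base<R>::set_result (promise.hpp:120) on locality 5 while a stepper_server::from_right_action thread is running, that is, while the locality is handling data arriving from its right neighbour. The assertion 'NULL != self_.get()' is reported afterwards, once the runtime is no longer available. promise_already_satisfied means that the same promise was set twice. The C++ standard library reports the same condition with std::future_error carrying std::future_errc::promise_already_satisfied. The sketch below is hypothetical and is not the 1d_stencil_8 code, whose buffer implementation is not shown in this issue. It models a per-time-step receive buffer with one std::promise per step. The names step_buffer, receive, store, and duplicate_store_detected are illustrative only. Delivering the same step twice, for example from two racing handlers, makes the second set_value throw.

```cpp
#include <cstddef>
#include <future>
#include <map>
#include <mutex>
#include <utility>

// Hypothetical per-step receive buffer: one promise per time step, keyed by
// step number. Names are illustrative, not the 1d_stencil_8 implementation.
template <typename T>
class step_buffer {
public:
    // Called by the consumer to obtain the value for `step` (once per step).
    std::future<T> receive(std::size_t step) {
        std::lock_guard<std::mutex> lock(mtx_);
        return slots_[step].get_future();
    }

    // Called by the message handler (e.g. a from_right handler) for `step`.
    // A second store for the same step throws std::future_error with
    // std::future_errc::promise_already_satisfied.
    void store(std::size_t step, T value) {
        std::lock_guard<std::mutex> lock(mtx_);
        slots_[step].set_value(std::move(value));
    }

private:
    std::mutex mtx_;
    std::map<std::size_t, std::promise<T>> slots_;
};

// Returns true if a duplicate store is rejected with promise_already_satisfied
// and the consumer still sees the first value.
bool duplicate_store_detected() {
    step_buffer<double> buf;
    std::future<double> f = buf.receive(7);
    buf.store(7, 1.0);          // first delivery for step 7 succeeds
    bool rejected = false;
    try {
        buf.store(7, 2.0);      // duplicate delivery for the same step
    } catch (std::future_error const& e) {
        rejected = (e.code() == std::future_errc::promise_already_satisfied);
    }
    return rejected && f.get() == 1.0;  // the first value is the one kept
}
```

The mutex only serializes access to the map. It cannot prevent a duplicate delivery, so the second store still fails. The log by itself does not show why two deliveries for the same slot occurred.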
@hkaiser
Member

hkaiser commented Mar 6, 2015

Parsa said that this does not happen when running RelWithDebInfo or Debug.

@parsa
Contributor Author

parsa commented Mar 6, 2015

Update: It does happen with debug.

parsa changed the title from '"result has already been stored for this promise" - 1d_stencil_8 - SuperMIC' to 'Race Condition - 1d_stencil_8 - SuperMIC' on Mar 6, 2015
sithhell added a commit that referenced this issue Mar 6, 2015
sithhell added a commit that referenced this issue Mar 6, 2015
@hkaiser
Member

hkaiser commented Mar 6, 2015

Is this resolved now?

@sithhell
Member

sithhell commented Mar 6, 2015 via email

@parsa
Contributor Author

parsa commented Mar 6, 2015

Fixed.

parsa closed this as completed on Mar 6, 2015