--hpx:nodes=`cat $PBS_NODEFILE` works; --hpx:nodefile=$PBS_NODEFILE does not. #723

Closed
maeneas opened this issue Feb 22, 2013 · 3 comments

maeneas commented Feb 22, 2013

58b0a27
boost 1.53.0
gcc 4.6.3
hdf5 1.8.10
hwloc 1.6.1
gperftools 2.0
libunwind 0.99

When running gtcx in distributed mode, this fails:
pbsdsh -v -u /home/manderson/hpx/bin/gtcx_client --hpx:nodefile=$PBS_NODEFILE --os_factor 16

Gives the error:
{what}: no console locality registered: HPX(network_error)

however, this works:
pbsdsh -v -u /home/manderson/hpx/bin/gtcx_client --hpx:nodes=`cat $PBS_NODEFILE` --os_factor 16

For large-scale runs, we probably don't want to have to cat the nodefile, so this really needs to be fixed at some point.
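To illustrate how the two forms differ at the shell level, here is a small sketch that uses a throwaway file in place of $PBS_NODEFILE (the node names node0 and node1 are made up). With backticks, the shell that launches pbsdsh reads the file and puts the node names on the command line; with the nodefile form, only the path is passed along, so whichever process receives it has to be able to open that file.

```shell
#!/bin/bash
# Illustration only: a throwaway file stands in for $PBS_NODEFILE.
nodefile=$(mktemp)
printf 'node0\nnode1\n' > "$nodefile"

# Backtick form: the launching shell reads the file and word-splits its
# contents, so the node names themselves end up on the command line.
set -- --hpx:nodes=`cat "$nodefile"`
count=$#
first=$1
printf 'backtick form passes %d argument(s): %s\n' "$count" "$*"

# Nodefile form: only the path is passed; the receiver must open the file.
set -- --hpx:nodefile="$nodefile"
printf 'nodefile form passes %d argument(s): %s\n' "$#" "$*"

rm -f "$nodefile"
```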

Here's the stack trace:

{stack-trace}: 9 frames:
0x7f3d3ba683a1 : hpx::detail::backtrace() + 0x61 in /home/manderson/hpx/lib/hpx/libhpx.so.1
0x7f3d3baa3946 : boost::exception_ptr hpx::detail::get_exception<hpx::exception>(hpx::exception const&, std::string const&, std::string const&, long) + 0x46 in /home/manderson/hpx/lib/hpx/libhpx.so.1
0x7f3d3baa3c0a : void hpx::detail::throw_exception<hpx::exception>(hpx::exception const&, std::string const&, std::string const&, long) + 0x1a in /home/manderson/hpx/lib/hpx/libhpx.so.1
0x7f3d3bb01ea0 : hpx::pre_main(hpx::runtime_mode) + 0x1a80 in /home/manderson/hpx/lib/hpx/libhpx.so.1
0x7f3d3babe297 : hpx::runtime_impl<hpx::threads::policies::local_priority_queue_scheduler, hpx::threads::policies::callback_notifier>::run_helper(hpx::util::function_nonser<int ()>, int&) + 0xc7 in /home/manderson/hpx/lib/hpx/libhpx.so.1
0x7f3d3bab2fe2 : hpx::util::detail::vtable::typeboost::_bi::bind_t<hpx::threads::detail::tagged_thread_state<hpx::threads::thread_state_enum, boost::mfi::mf2hpx::threads::detail::tagged_thread_state<hpx::threads::thread_state_enum, hpx::runtime_impl<hpx::threads::policies::local_priority_queue_scheduler, hpx::threads::policies::callback_notifier>, hpx::util::function_nonser<int ()>, int&>, boost::bi::list3<boost::bi::value<hpx::runtime_impl<hpx::threads::policies::local_priority_queue_scheduler, hpx::threads::policies::callback_notifier>>, boost::bi::value<hpx::util::function_nonser<int ()> >, boost::reference_wrapper > >, hpx::threads::thread_state_enum (hpx::threads::thread_state_ex_enum), void, void>::invoke(void, hpx::threads::thread_state_ex_enum) + 0xc2 in /home/manderson/hpx/lib/hpx/libhpx.so.1
0x7f3d3bd066db : hpx::util::coroutines::detail::coroutine_impl_wrapper<hpx::util::function_nonser<hpx::threads::thread_state_enum (hpx::threads::thread_state_ex_enum)>, hpx::util::coroutines::coroutine<hpx::threads::thread_state_enum (hpx::threads::thread_state_ex_enum), hpx::threads::detail::coroutine_allocator, hpx::util::coroutines::detail::lx::x86_linux_context_impl>, hpx::util::coroutines::detail::lx::x86_linux_context_impl, hpx::threads::detail::coroutine_allocator>::operator()() + 0x16b in /home/manderson/hpx/lib/hpx/libhpx.so.1
0x7f3d3bcf26e9 : void hpx::util::coroutines::detail::lx::trampoline<hpx::util::coroutines::detail::coroutine_impl_wrapper<hpx::util::function_nonser<hpx::threads::thread_state_enum (hpx::threads::thread_state_ex_enum)>, hpx::util::coroutines::coroutine<hpx::threads::thread_state_enum (hpx::threads::thread_state_ex_enum), hpx::threads::detail::coroutine_allocator, hpx::util::coroutines::detail::lx::x86_linux_context_impl>, hpx::util::coroutines::detail::lx::x86_linux_context_impl, hpx::threads::detail::coroutine_allocator> >(hpx::util::coroutines::detail::coroutine_impl_wrapper<hpx::util::function_nonser<hpx::threads::thread_state_enum (hpx::threads::thread_state_ex_enum)>, hpx::util::coroutines::coroutine<hpx::threads::thread_state_enum (hpx::threads::thread_state_ex_enum), hpx::threads::detail::coroutine_allocator, hpx::util::coroutines::detail::lx::x86_linux_context_impl>, hpx::util::coroutines::detail::lx::x86_linux_context_impl, hpx::threads::detail::coroutine_allocator>
) + 0x9 in /home/manderson/hpx/lib/hpx/libhpx.so.1
{env}: 59 entries:


hkaiser commented Feb 22, 2013

PBS creates the file referred to by $PBS_NODEFILE on node zero only. Thus --hpx:nodefile=$PBS_NODEFILE will not work on the other nodes. You will need to make this file available to all nodes yourself:

#!/bin/bash
#
#PBS -l nodes=2:ppn=4

APP_PATH=~/packages/hpx/bin/hello_world
APP_OPTIONS=

cp $PBS_NODEFILE /scratch/mynodefile
pbsdsh -u $APP_PATH $APP_OPTIONS --hpx:nodefile=/scratch/mynodefile

which assumes that /scratch/mynodefile is accessible from all nodes.
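
A slightly more defensive variant of the same idea, as a sketch only: it checks that the nodefile is readable before copying it and uses a per-job file name so concurrent jobs do not overwrite each other. The helper stage_nodefile and the SHARED_DIR variable are hypothetical names introduced here, not part of HPX or PBS, and /scratch is still assumed to be mounted on every node.

```shell
#!/bin/bash
# Sketch only: stage the PBS nodefile where every node can read it.
# stage_nodefile and SHARED_DIR are hypothetical names, not HPX or PBS features.

# Copy the nodefile ($1) into a shared directory ($2) and print the new path.
stage_nodefile() {
    local src="$1" shared_dir="$2"
    if [ ! -r "$src" ]; then
        echo "nodefile '$src' is not readable on $(hostname)" >&2
        return 1
    fi
    # Per-job name; falls back to the shell PID outside of PBS.
    local dest="$shared_dir/nodefile.${PBS_JOBID:-$$}"
    cp "$src" "$dest" || return 1
    echo "$dest"
}

# Only launch when running inside a PBS job.
if [ -n "${PBS_NODEFILE:-}" ]; then
    SHARED_DIR=${SHARED_DIR:-/scratch}   # assumed to be visible from all nodes
    APP_PATH=~/packages/hpx/bin/hello_world
    NODEFILE=$(stage_nodefile "$PBS_NODEFILE" "$SHARED_DIR") || exit 1
    pbsdsh -u "$APP_PATH" --hpx:nodefile="$NODEFILE"
fi
```

Outside of a PBS job the launch step is skipped, so the helper can be tried on its own with any small text file.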

I will add some better error handling to give a meaningful message for your use case.

@ghost ghost assigned hkaiser Feb 22, 2013

hkaiser commented Feb 22, 2013

Matt, also, could you give me the full error output, please? The stack alone is not really helpful in this context. I would like to know at least the error message, and the file name and line number of the point of error.

hkaiser added a commit that referenced this issue Feb 22, 2013
… --hpx:nodes=`cat $PBS_NODEFILE` works; --hpx:nodefile=$PBS_NODEFILE does not.)

hkaiser commented Mar 4, 2013

Matt, is this solved now? Can I close this?

@hkaiser hkaiser closed this as completed Mar 4, 2013