OS X: assertion 'pu_offset_ < hardware_concurrency' failed #944

Closed
eschnett opened this issue Oct 10, 2013 · 2 comments

@eschnett
Contributor

commented Oct 10, 2013

I run the command below and receive the error message that follows it. Note that this is OS X, where hwloc is disabled, and I am not using any options to set the PU offset.

openmpirun -x HPX_HAVE_PARCELPORT_TCPIP=0 -np 2 ./bin/block_matrix --hpx:threads=2 --hpx:debug-hpx-log=block_matrix.log
{stack-trace}: 19 frames:
0x110a5b88d     : hpx::util::backtrace::backtrace(unsigned long) + 0x81 in /Users/eschnett/hpx/lib/hpx/libhpxd.0.dylib
0x110a5b966     : hpx::util::trace_on_new_stack() + 0x1e in /Users/eschnett/hpx/lib/hpx/libhpxd.0.dylib
0x110a55715     : hpx::detail::backtrace() + 0x18 in /Users/eschnett/hpx/lib/hpx/libhpxd.0.dylib
0x110a58232     : boost::exception_ptr hpx::detail::get_exception<hpx::exception>(hpx::exception const&, std::string const&, std::string const&, long) + 0x46 in /Users/eschnett/hpx/lib/hpx/libhpxd.0.dylib
0x110a58591     : void hpx::detail::throw_exception<hpx::exception>(hpx::exception const&, std::string const&, std::string const&, long) + 0x38 in /Users/eschnett/hpx/lib/hpx/libhpxd.0.dylib
0x110a55cef     : hpx::detail::assertion_failed_msg(char const*, char const*, char const*, char const*, long) + 0x238 in /Users/eschnett/hpx/lib/hpx/libhpxd.0.dylib
0x110a55ab7     : hpx::detail::assertion_failed_msg(char const*, char const*, char const*, char const*, long) + 0x0 in /Users/eschnett/hpx/lib/hpx/libhpxd.0.dylib
0x10edd702c     : itt_sync_create(void*, char const*, char const*) + 0x0 in /Users/eschnett/src/block-matrix/./bin/block_matrix
0x110f0b419     : hpx::threads::policies::detail::affinity_data::get_pu_num(unsigned long, unsigned long) const + 0x41 in /Users/eschnett/hpx/lib/hpx/libhpxd.0.dylib
0x110f0ad28     : hpx::threads::policies::detail::affinity_data::init_cached_pu_nums(unsigned long) + 0x66 in /Users/eschnett/hpx/lib/hpx/libhpxd.0.dylib
0x110f0af6a     : hpx::threads::policies::detail::affinity_data::init(hpx::threads::policies::init_affinity_data const&) + 0x66 in /Users/eschnett/hpx/lib/hpx/libhpxd.0.dylib
0x110f29864     : hpx::threads::policies::local_priority_queue_scheduler<boost::mutex>::init(hpx::threads::policies::init_affinity_data const&) + 0x2a in /Users/eschnett/hpx/lib/hpx/libhpxd.0.dylib
0x110f1d64c     : hpx::threads::threadmanager_impl<hpx::threads::policies::local_priority_queue_scheduler<boost::mutex>, hpx::threads::policies::callback_notifier>::init(hpx::threads::policies::init_affinity_data const&) + 0x2a in /Users/eschnett/hpx/lib/hpx/libhpxd.0.dylib
0x110af1e8b     : hpx::runtime_impl<hpx::threads::policies::local_priority_queue_scheduler<boost::mutex>, hpx::threads::policies::callback_notifier>::runtime_impl(hpx::util::runtime_configuration const&, hpx::runtime_mode, unsigned long, hpx::threads::policies::local_priority_queue_scheduler<boost::mutex>::init_parameter const&, hpx::threads::policies::init_affinity_data const&) + 0xb65 in /Users/eschnett/hpx/lib/hpx/libhpxd.0.dylib
0x110a9982d     : hpx::detail::run_priority_local(hpx::util::function_nonser<void ()> const&, hpx::util::function_nonser<void ()> const&, hpx::util::command_line_handling&, bool) + 0x26b in /Users/eschnett/hpx/lib/hpx/libhpxd.0.dylib
0x110a99ec2     : hpx::run_or_start(int (*)(boost::program_options::variables_map&), boost::program_options::options_description const&, int, char**, std::vector<std::string, std::allocator<std::string> > const&, hpx::util::function_nonser<void ()> const&, hpx::util::function_nonser<void ()> const&, hpx::runtime_mode, bool) + 0x475 in /Users/eschnett/hpx/lib/hpx/libhpxd.0.dylib
0x110a9a849     : hpx::init(int (*)(boost::program_options::variables_map&), boost::program_options::options_description const&, int, char**, std::vector<std::string, std::allocator<std::string> > const&, hpx::util::function_nonser<void ()> const&, hpx::util::function_nonser<void ()> const&, hpx::runtime_mode) + 0x55 in /Users/eschnett/hpx/lib/hpx/libhpxd.0.dylib
0x10ef705f9     : hpx::init(int (*)(boost::program_options::variables_map&), boost::program_options::options_description const&, int, char**, hpx::util::function_nonser<void ()> const&, hpx::util::function_nonser<void ()> const&, hpx::runtime_mode) + 0x5a in /Users/eschnett/src/block-matrix/./bin/block_matrix
0x10ef7067b     : hpx::init(boost::program_options::options_description const&, int, char**, hpx::util::function_nonser<void ()> const&, hpx::util::function_nonser<void ()> const&, hpx::runtime_mode) + 0x4d in /Users/eschnett/src/block-matrix/./bin/block_matrix
{env}: 75 entries:
  Apple_PubSub_Socket_Render=/tmp/launch-oP3rRo/Render
  Apple_Ubiquity_Message=/tmp/launch-lyBiiK/Apple_Ubiquity_Message
  BERKELEY_UPC_DIR=/Users/eschnett/berkeley_upc-2.10.0
  BOOST_DIR=/Users/eschnett/boost_1_54_0
  COG_INSTALL_PATH=/Users/eschnett/src/cog-jglobus-1.4
  COMMAND_MODE=unix2003
  CVS_RSH=ssh
  DISPLAY=/tmp/launch-EylMwd/org.macosforge.xquartz:0
  DYLD_LIBRARY_PATH=/Users/eschnett/gt4.0.8/lib:/usr/local/cuda/lib::/Users/eschnett/saga-cpp-1.4/lib:/Users/eschnett/src/sac2c-1.00-beta-darwin-i386/sac2c/lib:/Users/eschnett/hpx/lib/hpx:/Users/eschnett/boost_1_54_0/lib:/Users/eschnett/hpx/lib/hpx
  EDITOR=/Applications/Emacs.app/Contents/MacOS/bin/emacsclient
  FORTRESS_HOME=/Users/eschnett/src/PFC
  GLOBUS_LOCATION=/Users/eschnett/gt4.0.8
  GLOBUS_PATH=/Users/eschnett/gt4.0.8
  GPG_AGENT_INFO=/tmp/gpg-gZQXG7/S.gpg-agent:412:1
  HARC_INSTALL_PATH=/Users/eschnett/src/cct-harc-1.9.2
  HISTCONTROL=ignoredups
  HISTSIZE=5000
  HOME=/Users/eschnett
  HPX_DIR=/Users/eschnett/hpx
  HPX_HAVE_PARCELPORT_TCPIP=0
  LANG=en_US.UTF-8
  LD_LIBRARY_PATH=/Users/eschnett/gt4.0.8/lib
  LIBPATH=/Users/eschnett/gt4.0.8/lib:/usr/lib:/lib
  LOGNAME=eschnett
  MANPATH=/Users/eschnett/gt4.0.8/man:/Users/eschnett/man:/opt/local/man:
  OMPI_APP_CTX_NUM_PROCS=2
  OMPI_ARGV=--hpx:threads=2 --hpx:debug-hpx-log=block_matrix.log
  OMPI_COMMAND=block_matrix
  OMPI_COMM_WORLD_LOCAL_RANK=1
  OMPI_COMM_WORLD_LOCAL_SIZE=2
  OMPI_COMM_WORLD_NODE_RANK=1
  OMPI_COMM_WORLD_RANK=1
  OMPI_COMM_WORLD_SIZE=2
  OMPI_FIRST_RANKS=0
  OMPI_MCA_ess=env
  OMPI_MCA_initial_wdir=/Users/eschnett/src/block-matrix
  OMPI_MCA_mpi_yield_when_idle=1
  OMPI_MCA_orte_app_num=0
  OMPI_MCA_orte_ess_jobid=1606287361
  OMPI_MCA_orte_ess_node_rank=1
  OMPI_MCA_orte_ess_num_procs=2
  OMPI_MCA_orte_ess_vpid=1
  OMPI_MCA_orte_hnp_uri=1606287360.0;tcp://10.10.164.189:52824
  OMPI_MCA_orte_local_daemon_uri=1606287360.0;tcp://10.10.164.189:52824
  OMPI_MCA_orte_num_nodes=1
  OMPI_MCA_orte_num_restarts=0
  OMPI_MCA_orte_peer_fini_barrier_id=2
  OMPI_MCA_orte_peer_init_barrier_id=1
  OMPI_MCA_orte_peer_modex_id=0
  OMPI_MCA_orte_precondition_transports=088e4b9dd77a2d7a-a7651fd9a3a25fea
  OMPI_MCA_orte_tmpdir_base=/var/folders/gl/zvl8d6415vsbkd50nnll95k40000gs/T
  OMPI_MCA_shmem_RUNTIME_QUERY_hint=mmap
  OMPI_NUM_APP_CTX=1
  OMPI_UNIVERSE_SIZE=1
  OPAL_OUTPUT_STDERR_FD=21
  PATH=/Users/eschnett/src/scripted/bin:/Library/Frameworks/Python.framework/Versions/2.7/bin:/Users/eschnett/Library/Haskell/bin:/Users/eschnett/gt4.0.8/bin:/Users/eschnett/gt4.0.8/sbin:/Developer/CUDA/bin/darwin/release:/usr/local/cuda/bin:/usr/local/visit/bin:/Users/eschnett/bin:/opt/local/bin:/opt/local/sbin:/usr/X11R6/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/opt/X11/bin:/usr/local/MacGPG2/bin:/Users/eschnett/src/PFC/bin:/Users/eschnett/src/pcommands-2.0/bin:/Users/eschnett/berkeley_upc-2.10.0/bin:/Users/eschnett/src/sac2c-1.00-beta-darwin-i386/sac2c/bin:/Users/eschnett/src/git-hg/bin
  PWD=/Users/eschnett/src/block-matrix
  QTDIR=/opt/local/libexec/qt4-mac
  SAC2CBASE=/Users/eschnett/src/sac2c-1.00-beta-darwin-i386/sac2c
  SACBASE=/Users/eschnett/src/sac2c-1.00-beta-darwin-i386
  SAGA_DIR=/Users/eschnett/saga-cpp-1.4
  SECURITYSESSIONID=186a5
  SHELL=/bin/bash
  SHLIB_PATH=/Users/eschnett/gt4.0.8/lib
  SHLVL=1
  SSH_AGENT_PID=317
  SSH_AUTH_SOCK=/var/folders/gl/zvl8d6415vsbkd50nnll95k40000gs/T//ssh-CvKK8muKoU7c/agent.313
  TERM=xterm
  TERM_PROGRAM=Apple_Terminal
  TERM_PROGRAM_VERSION=309
  TERM_SESSION_ID=79348984-28F2-4F84-BA05-89D54E3D15B5
  TMPDIR=/var/folders/gl/zvl8d6415vsbkd50nnll95k40000gs/T/
  USER=eschnett
  _=/opt/local/bin/openmpirun
  __CF_USER_TEXT_ENCODING=0x1F9:0:82
{locality-id}: 1
{process-id}: 29359
{function}: std::size_t hpx::threads::policies::detail::affinity_data::get_pu_num(std::size_t, std::size_t) const
{file}: /Users/eschnett/src/hpx/src/runtime/threads/policies/affinity_data.cpp
{line}: 135
{os-thread}: <unknown>
{config}:
  HPX_HAVE_NATIVE_TLS=OFF
  HPX_HAVE_STACKTRACES=ON
  HPX_HAVE_COMPRESSION_BZIP2=OFF
  HPX_HAVE_COMPRESSION_SNAPPY=OFF
  HPX_HAVE_COMPRESSION_ZLIB=OFF
  HPX_HAVE_PARCEL_COALESCING=ON
  HPX_HAVE_PARCELPORT_SHMEM=OFF
  HPX_HAVE_PARCELPORT_IBVERBS=OFF
  HPX_HAVE_VERIFY_LOCKS=ON
  HPX_HAVE_HWLOC=OFF
  HPX_HAVE_ITTNOTIFY=OFF
  HPX_LIMIT=5
  HPX_ACTION_ARGUMENT_LIMIT=4
  HPX_COMPONENT_CREATE_ARGUMENT_LIMIT=5
  HPX_FUNCTION_ARGUMENT_LIMIT=7
  HPX_LOCK_LIMIT=5
  HPX_TUPLE_LIMIT=10
  HPX_WAIT_ARGUMENT_LIMIT=5
  HPX_PARCEL_MAX_CONNECTIONS=512
  HPX_PARCEL_MAX_CONNECTIONS_PER_LOCALITY=4
  HPX_INITIAL_AGAS_LOCAL_CACHE_SIZE=256
  HPX_AGAS_LOCAL_CACHE_SIZE_PER_THREAD=32
  HPX_PREFIX=/Users/eschnett/hpx
{version}: V0.9.7-trunk (AGAS: V3.0), Git: abf38bac97ee61bbb09bf3fee5cbba93c966f869
{boost}: V1.54.0
{build-type}: debug
{date}: Oct 10 2013 16:06:38
{platform}: Mac OS
{compiler}: GNU C++ version 4.8.1
{stdlib}: GNU libstdc++ version 20130531
{what}: assertion 'pu_offset_ < hardware_concurrency' failed: HPX(assertion_failure)

[Redshift:29359] *** Process received signal ***
[Redshift:29359] Signal: Abort trap: 6 (6)
[Redshift:29359] Signal code:  (0)
[Redshift:29359] [ 0] 2   libsystem_c.dylib                   0x00007fff8db5c90a _sigtramp + 26
[Redshift:29359] [ 1] 3   ???                                 0x0000000116b52e00 0x0 + 4675939840
[Redshift:29359] [ 2] 4   libsystem_c.dylib                   0x00007fff8dbb3f61 abort + 143
[Redshift:29359] [ 3] 5   libhpxd.0.dylib                     0x0000000110a55f9f _ZN3hpx6detail30report_exception_and_terminateERKN5boost13exception_ptrE + 89
[Redshift:29359] [ 4] 6   libhpxd.0.dylib                     0x0000000110af3d76 _ZN3hpx12runtime_implINS_7threads8policies30local_priority_queue_schedulerIN5boost5mutexEEENS2_17callback_notifierEE12report_errorEmRKNS4_13exception_ptrE + 76
[Redshift:29359] [ 5] 7   libhpxd.0.dylib                     0x0000000110af3eb9 _ZN3hpx12runtime_implINS_7threads8policies30local_priority_queue_schedulerIN5boost5mutexEEENS2_17callback_notifierEE12report_errorERKNS4_13exception_ptrE + 61
[Redshift:29359] [ 6] 8   libhpxd.0.dylib                     0x0000000110a55ea3 _ZN3hpx6detail20assertion_failed_msgEPKcS2_S2_S2_l + 1004
[Redshift:29359] [ 7] 9   libhpxd.0.dylib                     0x0000000110a55ab7 _ZN3hpx6detail20assertion_failed_msgEPKcS2_S2_S2_l + 0
[Redshift:29359] [ 8] 10  block_matrix                        0x000000010edd702c _Z15itt_sync_createPvPKcS1_ + 0
[Redshift:29359] [ 9] 11  libhpxd.0.dylib                     0x0000000110f0b419 _ZNK3hpx7threads8policies6detail13affinity_data10get_pu_numEmm + 65
[Redshift:29359] [10] 12  libhpxd.0.dylib                     0x0000000110f0ad28 _ZN3hpx7threads8policies6detail13affinity_data19init_cached_pu_numsEm + 102
[Redshift:29359] [11] 13  libhpxd.0.dylib                     0x0000000110f0af6a _ZN3hpx7threads8policies6detail13affinity_data4initERKNS1_18init_affinity_dataE + 102
[Redshift:29359] [12] 14  libhpxd.0.dylib                     0x0000000110f29864 _ZN3hpx7threads8policies30local_priority_queue_schedulerIN5boost5mutexEE4initERKNS1_18init_affinity_dataE + 42
[Redshift:29359] [13] 15  libhpxd.0.dylib                     0x0000000110f1d64c _ZN3hpx7threads18threadmanager_implINS0_8policies30local_priority_queue_schedulerIN5boost5mutexEEENS2_17callback_notifierEE4initERKNS2_18init_affinity_dataE + 42
[Redshift:29359] [14] 16  libhpxd.0.dylib                     0x0000000110af1e8b _ZN3hpx12runtime_implINS_7threads8policies30local_priority_queue_schedulerIN5boost5mutexEEENS2_17callback_notifierEEC1ERKNS_4util21runtime_configurationENS_12runtime_modeEmRKNS6_14init_parameterERKNS2_18init_affinity_dataE + 2917
[Redshift:29359] [15] 17  libhpxd.0.dylib                     0x0000000110a9982d _ZN3hpx6detail18run_priority_localERKNS_4util15function_nonserIFvvEEES6_RNS1_21command_line_handlingEb + 619
[Redshift:29359] [16] 18  libhpxd.0.dylib                     0x0000000110a99ec2 _ZN3hpx12run_or_startEPFiRN5boost15program_options13variables_mapEERKNS1_19options_descriptionEiPPcRKSt6vectorISsSaISsEERKNS_4util15function_nonserIFvvEEESL_NS_12runtime_modeEb + 1141
[Redshift:29359] [17] 19  libhpxd.0.dylib                     0x0000000110a9a849 _ZN3hpx4initEPFiRN5boost15program_options13variables_mapEERKNS1_19options_descriptionEiPPcRKSt6vectorISsSaISsEERKNS_4util15function_nonserIFvvEEESL_NS_12runtime_modeE + 85
[Redshift:29359] [18] 20  block_matrix                        0x000000010ef705f9 _ZN3hpx4initEPFiRN5boost15program_options13variables_mapEERKNS1_19options_descriptionEiPPcRKNS_4util15function_nonserIFvvEEESG_NS_12runtime_modeE + 90
[Redshift:29359] [19] 21  block_matrix                        0x000000010ef7067b _ZN3hpx4initERKN5boost15program_options19options_descriptionEiPPcRKNS_4util15function_nonserIFvvEEESC_NS_12runtime_modeE + 77
[Redshift:29359] [20] 22  block_matrix                        0x000000010ef6d556 main + 202
[Redshift:29359] [21] 23  libdyld.dylib                       0x00007fff96e087e1 start + 0
[Redshift:29359] [22] 24  ???                                 0x0000000000000003 0x0 + 3
@hkaiser

Member

commented Oct 10, 2013

This is related to the half-implemented #421: Support multiple HPX instances per node in a batch environment like PBS or SLURM. We changed things so that each locality receives a pu-offset while registering. This offset is determined by the number of localities running on the same node. Without HWLOC we cannot do any sensible detection of the underlying hardware, which causes the current scheme to fail. For now, try adding --hpx:cores=N to the command line, where N is the number of cores each locality should occupy. While this will not influence any affinity settings (as OS X does not support this), it might resolve the issue for the time being.
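
To make the failure mode concrete, here is a minimal sketch of how a per-locality offset could trip the assertion. It is a hypothetical model, not HPX's actual code: the function names `pu_offset_for` and `offset_is_valid`, and the rule "offset = node-local rank × cores per locality", are assumptions made only for illustration. The one thing taken from the report is the asserted condition `pu_offset_ < hardware_concurrency`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model (NOT HPX's implementation): each locality on a node
// is given a contiguous block of `cores_per_locality` processing units, so
// the locality with node-local rank r would start at r * cores_per_locality.
std::size_t pu_offset_for(std::size_t node_rank, std::size_t cores_per_locality)
{
    return node_rank * cores_per_locality;
}

// Mirrors the condition from the failed assertion in the report:
// 'pu_offset_ < hardware_concurrency'.
bool offset_is_valid(std::size_t pu_offset, std::size_t hardware_concurrency)
{
    return pu_offset < hardware_concurrency;
}
```

Under these assumptions, `offset_is_valid(pu_offset_for(1, 4), 4)` is false: the second locality on a node would start at offset 4, which is out of range if the runtime believes only 4 processing units exist. Whether this is the exact arithmetic behind the failure above is not established by the trace.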

@ghost ghost assigned sithhell Oct 10, 2013

@eschnett

Contributor Author

commented Oct 10, 2013

Setting --hpx:cores=4 does not help.

openmpirun -x HPX_HAVE_PARCELPORT_TCPIP=0 -np 2 ./bin/block_matrix --hpx:cores=4 --hpx:threads=2 --hpx:debug-hpx-log=block_matrix.log

@hkaiser hkaiser closed this in db0b46c Oct 11, 2013
