Distributed run of 1d_stencil_8 uses less threads than spec. & sometimes gives errors #1230
Comments
Which error? How to reproduce it? |
I put details of exactly what I did. It may take several tries to get the error, but the incorrect number of threads always occurred. |
Sorry for being dense... but what 'number of threads'? I don't see any thread numbers in the original ticket above. |
Maybe I'm doing something wrong, but I assume that this would give me 128 OS threads, and the output says 120. Then ran: It took 5 or 6 tries to get the error this morning. |
Sorry, I did not specify the number of threads but took the default; however, using -t16 does the same thing. |
You shouldn't need to specify -t16 if you use srun with -c16. |
The inconsistent number of threads reported should be fixed by 7f7ed0c (currently on the branch fixing_1230). I have the suspicion that one (or more) of the marvin nodes have hyper-threading enabled, while others have not. I will ask people to investigate this. |
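One way to test the hyper-threading suspicion is to compare the `siblings` and `cpu cores` fields that Linux reports in `/proc/cpuinfo` on each node. The following is a minimal sketch, assuming x86 Linux nodes that expose both fields; the function name `hyperthreading_enabled` is hypothetical and not part of HPX or SLURM.

```python
def hyperthreading_enabled(cpuinfo_text: str) -> bool:
    """Return True if any processor entry lists more siblings than physical cores.

    Assumes the x86 Linux /proc/cpuinfo layout, where each processor entry
    carries 'siblings' (logical CPUs per package) and 'cpu cores'
    (physical cores per package).
    """
    siblings = cores = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            # A line without ':' (normally a blank line) ends a processor entry.
            siblings = cores = None
            continue
        key = key.strip()
        if key == "siblings":
            siblings = int(value)
        elif key == "cpu cores":
            cores = int(value)
        if siblings is not None and cores is not None and siblings > cores:
            return True
    return False


if __name__ == "__main__":
    # Run once per node and compare the answers across nodes.
    with open("/proc/cpuinfo") as handle:
        print(hyperthreading_enabled(handle.read()))
```

If the nodes disagree, the number of cores each locality detects may differ from node to node, which would be consistent with the thread total falling short of the requested value.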
This is strange: it works for the MPI parcelport, but when I build for the TCP port, the above example gives me this error: hpx::init: std::exception caught: mpi_environment::init: HPX is not compiled for MPI, but 'hpx.parcel.mpi.enable=1'. Please set HPX_HAVE_PARCELPORT_MPI=ON while configuring using cmake. This doesn't make sense; I was trying to run using the default, not MPI. |
{env}: 70 entries: |
Error with the fixing_1230 branch: To reproduce:
Then
|
That latest issue should be solved by 29dd95e (still on the branch). |
Previous one's fixed.
|
This is a separate issue. Please create a new ticket for this. |
This error also occurred using 8 nodes with 2 localities allocated per node.
Allocating 2 nodes works fine.
Allocating 4 nodes with 1 locality per node gives the correct number of threads but also gives an error on termination.
To allocate nodes: srun -p marvin -N 8 -n 8 -c 16 --pty /bin/bash -l
Then ran:
--nx 10000 --np 1000
Localities,OS_Threads,Execution_Time_sec,Points_per_Partition,Partitions,Time_Steps
8, 120, 1.397280098, 10000, 1000, 45
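The mismatch in the summary above can be checked mechanically. This is a small sketch, assuming the expected total is the number of localities multiplied by the `-c 16` value passed to srun; the helper name `check_thread_count` is hypothetical.

```python
import csv
import io


def check_thread_count(summary: str, cpus_per_task: int) -> tuple:
    """Return (expected, reported) OS thread counts for one summary block.

    'summary' is the header line plus one data line as printed by the run.
    The expected value assumes one OS thread per allocated CPU per locality.
    """
    reader = csv.DictReader(io.StringIO(summary), skipinitialspace=True)
    row = next(reader)
    localities = int(row["Localities"])
    reported = int(row["OS_Threads"])
    return localities * cpus_per_task, reported


summary = (
    "Localities,OS_Threads,Execution_Time_sec,"
    "Points_per_Partition,Partitions,Time_Steps\n"
    "8, 120, 1.397280098, 10000, 1000, 45\n"
)
expected, reported = check_thread_count(summary, cpus_per_task=16)
print(f"expected {expected}, reported {reported}, missing {expected - reported}")
# prints: expected 128, reported 120, missing 8
```

The summary only gives the total, so it does not show whether the 8 missing threads come from a single locality or are spread across several.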
ran again
srun bin/1d_stencil_8 --nx 10000 --np 1000
Localities,OS_Threads,Execution_Time_sec,Points_per_Partition,Partitions,Time_Steps
8, 120, 1.401265766, 10000, 1000, 45
{env}: 69 entries:
CPATH=/opt/intel/composer_xe_2013_sp1.2.144/mkl/include:/opt/intel/composer_xe_2013_sp1.2.144/ipp/include:/opt/intel/composer_xe_2013_sp1.2.144/mkl/include:/opt/intel/composer_xe_2013_sp1.2.144/tbb/include:/opt/intel/composer_xe_2013_sp1.2.144/mkl/include:/opt/intel/composer_xe_2013_sp1.2.144/ipp/include:/opt/intel/composer_xe_2013_sp1.2.144/mkl/include:/opt/intel/composer_xe_2013_sp1.2.144/tbb/include
GDBSERVER_MIC=/opt/intel/composer_xe_2013_sp1.2.144/debugger/gdb/target/mic/bin/gdbserver
GDB_CROSS=/opt/intel/composer_xe_2013_sp1.2.144/debugger/gdb/intel64_mic/py27/bin/gdb-mic
GLIBCPP_FORCE_NEW=1
GLIBCXX_FORCE_NEW=1
HOME=/home/pagrubel
IDB_HOME=/opt/intel/composer_xe_2013_sp1.2.144/bin/intel64
INCLUDE=/opt/intel/composer_xe_2013_sp1.2.144/mkl/include:/opt/intel/composer_xe_2013_sp1.2.144/mkl/include:/opt/intel/composer_xe_2013_sp1.2.144/mkl/include:/opt/intel/composer_xe_2013_sp1.2.144/mkl/include
INTEL_LICENSE_FILE=/opt/intel/composer_xe_2013_sp1.2.144/licenses:/opt/intel/licenses:/home/pagrubel/intel/licenses:/opt/intel/composer_xe_2013_sp1.2.144/licenses:/opt/intel/licenses:/home/pagrubel/intel/licenses
IPPROOT=/opt/intel/composer_xe_2013_sp1.2.144/ipp
LANG=en_US.UTF-8
LD_LIBRARY_PATH=/opt/intel/mic/coi/host-linux-release/lib:/opt/intel/mic/myo/lib:/opt/intel/composer_xe_2013_sp1.2.144/compiler/lib/intel64:/opt/intel/composer_xe_2013_sp1.2.144/mkl/lib/intel64:/opt/intel/composer_xe_2013_sp1.2.144/compiler/lib/intel64:/opt/intel/mic/coi/host-linux-release/lib:/opt/intel/mic/myo/lib:/opt/intel/composer_xe_2013_sp1.2.144/mpirt/lib/intel64:/opt/intel/composer_xe_2013_sp1.2.144/ipp/../compiler/lib/intel64:/opt/intel/composer_xe_2013_sp1.2.144/ipp/lib/intel64:/opt/intel/mic/coi/host-linux-release/lib:/opt/intel/mic/myo/lib:/opt/intel/composer_xe_2013_sp1.2.144/compiler/lib/intel64:/opt/intel/composer_xe_2013_sp1.2.144/mkl/lib/intel64:/opt/intel/composer_xe_2013_sp1.2.144/tbb/lib/intel64/gcc4.4:/opt/intel/mic/coi/host-linux-release/lib:/opt/intel/mic/myo/lib:/opt/intel/composer_xe_2013_sp1.2.144/compiler/lib/intel64:/opt/intel/composer_xe_2013_sp1.2.144/mkl/lib/intel64:/opt/intel/composer_xe_2013_sp1.2.144/compiler/lib/intel64:/opt/intel/mic/coi/host-linux-release/lib:/opt/intel/mic/myo/lib:/opt/intel/composer_xe_2013_sp1.2.144/mpirt/lib/intel64:/opt/intel/composer_xe_2013_sp1.2.144/ipp/../compiler/lib/intel64:/opt/intel/composer_xe_2013_sp1.2.144/ipp/lib/intel64:/opt/intel/mic/coi/host-linux-release/lib:/opt/intel/mic/myo/lib:/opt/intel/composer_xe_2013_sp1.2.144/compiler/lib/intel64:/opt/intel/composer_xe_2013_sp1.2.144/mkl/lib/intel64:/opt/intel/composer_xe_2013_sp1.2.144/tbb/lib/intel64/gcc4.4
LIBRARY_PATH=/opt/intel/composer_xe_2013_sp1.2.144/compiler/lib/intel64:/opt/intel/composer_xe_2013_sp1.2.144/mkl/lib/intel64:/opt/intel/composer_xe_2013_sp1.2.144/compiler/lib/intel64:/opt/intel/composer_xe_2013_sp1.2.144/ipp/../compiler/lib/intel64:/opt/intel/composer_xe_2013_sp1.2.144/ipp/lib/intel64:/opt/intel/composer_xe_2013_sp1.2.144/compiler/lib/intel64:/opt/intel/composer_xe_2013_sp1.2.144/mkl/lib/intel64:/opt/intel/composer_xe_2013_sp1.2.144/tbb/lib/intel64/gcc4.4:/opt/intel/composer_xe_2013_sp1.2.144/compiler/lib/intel64:/opt/intel/composer_xe_2013_sp1.2.144/mkl/lib/intel64:/opt/intel/composer_xe_2013_sp1.2.144/compiler/lib/intel64:/opt/intel/composer_xe_2013_sp1.2.144/ipp/../compiler/lib/intel64:/opt/intel/composer_xe_2013_sp1.2.144/ipp/lib/intel64:/opt/intel/composer_xe_2013_sp1.2.144/compiler/lib/intel64:/opt/intel/composer_xe_2013_sp1.2.144/mkl/lib/intel64:/opt/intel/composer_xe_2013_sp1.2.144/tbb/lib/intel64/gcc4.4
LOGNAME=pagrubel
MAIL=/var/mail/pagrubel
MANPATH=/opt/intel/composer_xe_2013_sp1.2.144/man/en_US:/opt/intel/composer_xe_2013_sp1.2.144/man/en_US:/opt/intel/composer_xe_2013_sp1.2.144/man/en_US:/opt/intel/composer_xe_2013_sp1.2.144/man/en_US:/opt/intel/composer_xe_2013_sp1.2.144/man/en_US:/opt/intel/composer_xe_2013_sp1.2.144/man/en_US:/opt/intel/composer_xe_2013_sp1.2.144/debugger/gdb/intel64_mic/py27/share/man:/opt/intel/composer_xe_2013_sp1.2.144/debugger/gdb/intel64/py27/share/man:/usr/local/man:/usr/local/share/man:/usr/share/man::/opt/intel/vtune_amplifier_xe_2013/man::/opt/intel/vtune_amplifier_xe_2013/man
MIC_LD_LIBRARY_PATH=/opt/intel/mic/coi/device-linux-release/lib:/opt/intel/mic/myo/lib:/opt/intel/composer_xe_2013_sp1.2.144/compiler/lib/mic:/opt/intel/composer_xe_2013_sp1.2.144/mkl/lib/mic:/opt/intel/composer_xe_2013_sp1.2.144/compiler/lib/mic:/opt/intel/composer_xe_2013_sp1.2.144/mpirt/lib/mic:/opt/intel/mic/coi/device-linux-release/lib:/opt/intel/mic/myo/lib:/opt/intel/mic/coi/device-linux-release/lib:/opt/intel/mic/myo/lib:/opt/intel/composer_xe_2013_sp1.2.144/compiler/lib/mic:/opt/intel/composer_xe_2013_sp1.2.144/mkl/lib/mic:/opt/intel/composer_xe_2013_sp1.2.144/tbb/lib/mic:/opt/intel/mic/coi/device-linux-release/lib:/opt/intel/mic/myo/lib:/opt/intel/composer_xe_2013_sp1.2.144/compiler/lib/mic:/opt/intel/composer_xe_2013_sp1.2.144/mkl/lib/mic:/opt/intel/composer_xe_2013_sp1.2.144/compiler/lib/mic:/opt/intel/composer_xe_2013_sp1.2.144/mpirt/lib/mic:/opt/intel/mic/coi/device-linux-release/lib:/opt/intel/mic/myo/lib:/opt/intel/mic/coi/device-linux-release/lib:/opt/intel/mic/myo/lib:/opt/intel/composer_xe_2013_sp1.2.144/compiler/lib/mic:/opt/intel/composer_xe_2013_sp1.2.144/mkl/lib/mic:/opt/intel/composer_xe_2013_sp1.2.144/tbb/lib/mic
MIC_LIBRARY_PATH=/opt/intel/composer_xe_2013_sp1.2.144/tbb/lib/mic:/opt/intel/composer_xe_2013_sp1.2.144/tbb/lib/mic
MKLROOT=/opt/intel/composer_xe_2013_sp1.2.144/mkl
NLSPATH=/opt/intel/composer_xe_2013_sp1.2.144/mkl/lib/intel64/locale/%l_%t/%N:/opt/intel/composer_xe_2013_sp1.2.144/compiler/lib/intel64/locale/%l_%t/%N:/opt/intel/composer_xe_2013_sp1.2.144/ipp/lib/intel64/locale/%l_%t/%N:/opt/intel/composer_xe_2013_sp1.2.144/mkl/lib/intel64/locale/%l_%t/%N:/opt/intel/composer_xe_2013_sp1.2.144/debugger/gdb/intel64_mic/py27/share/locale/%l_%t/%N:/opt/intel/composer_xe_2013_sp1.2.144/debugger/gdb/intel64/py27/share/locale/%l_%t/%N:/opt/intel/composer_xe_2013_sp1.2.144/debugger/intel64/locale/%l_%t/%N:/opt/intel/composer_xe_2013_sp1.2.144/mkl/lib/intel64/locale/%l_%t/%N:/opt/intel/composer_xe_2013_sp1.2.144/compiler/lib/intel64/locale/%l_%t/%N:/opt/intel/composer_xe_2013_sp1.2.144/ipp/lib/intel64/locale/%l_%t/%N:/opt/intel/composer_xe_2013_sp1.2.144/mkl/lib/intel64/locale/%l_%t/%N:/opt/intel/composer_xe_2013_sp1.2.144/debugger/gdb/intel64_mic/py27/share/locale/%l_%t/%N:/opt/intel/composer_xe_2013_sp1.2.144/debugger/gdb/intel64/py27/share/locale/%l_%t/%N:/opt/intel/composer_xe_2013_sp1.2.144/debugger/intel64/locale/%l_%t/%N
PATH=/opt/intel/vtune_amplifier_xe_2013/bin64:/opt/intel/composer_xe_2013_sp1.2.144/bin/intel64:/opt/intel/composer_xe_2013_sp1.2.144/mpirt/bin/intel64:/opt/intel/composer_xe_2013_sp1.2.144/debugger/gdb/intel64_mic/py27/bin:/opt/intel/composer_xe_2013_sp1.2.144/debugger/gdb/intel64/py27/bin:/opt/intel/composer_xe_2013_sp1.2.144/bin/intel64:/opt/intel/composer_xe_2013_sp1.2.144/bin/intel64_mic:/opt/intel/composer_xe_2013_sp1.2.144/debugger/gui/intel64:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/home/pagrubel/mic_linux/bin:/opt/papi/5.0.1-release/bin
PWD=/home/pagrubel/build/hpx_buildrelease
SHELL=/bin/bash
SHLVL=2
SLURMD_NODENAME=marvin00
SLURM_CHECKPOINT_IMAGE_DIR=/home/pagrubel/build/hpx_buildrelease
SLURM_CPUS_ON_NODE=16
SLURM_CPUS_PER_TASK=16
SLURM_DISTRIBUTION=cyclic
SLURM_GTIDS=0
SLURM_JOBID=156167
SLURM_JOB_CPUS_PER_NODE=16(x8)
SLURM_JOB_ID=156167
SLURM_JOB_NAME=/bin/bash
SLURM_LAUNCH_NODE_IPADDR=10.1.1.33
SLURM_LOCALID=0
SLURM_NNODES=8
SLURM_NODEID=0
SLURM_NODELIST=marvin[00-07]
SLURM_NPROCS=8
SLURM_NTASKS=8
SLURM_PRIO_PROCESS=0
SLURM_PROCID=0
SLURM_PTY_PORT=57449
SLURM_PTY_WIN_COL=97
SLURM_PTY_WIN_ROW=30
SLURM_SRUN_COMM_HOST=10.1.1.33
SLURM_SRUN_COMM_PORT=35629
SLURM_STEPID=1
SLURM_STEP_ID=1
SLURM_STEP_LAUNCHER_PORT=35629
SLURM_STEP_NODELIST=marvin[00-07]
SLURM_STEP_NUM_NODES=8
SLURM_STEP_NUM_TASKS=8
SLURM_STEP_TASKS_PER_NODE=1(x8)
SLURM_SUBMIT_DIR=/home/pagrubel/build/hpx_buildrelease
SLURM_TASKS_PER_NODE=1(x8)
SLURM_TASK_PID=21805
SLURM_TOPOLOGY_ADDR=marvin00
SLURM_TOPOLOGY_ADDR_PATTERN=node
SSH_CLIENT=128.123.131.182 34859 22
SSH_CONNECTION=128.123.131.182 34859 10.1.1.11 22
SSH_TTY=/dev/pts/0
TBBROOT=/opt/intel/composer_xe_2013_sp1.2.144/tbb
TERM=xterm
TMPDIR=/tmp
USER=pagrubel
VTUNE_AMPLIFIER_XE_2013_DIR=/opt/intel/vtune_amplifier_xe_2013
_=/usr/bin/srun
{locality-id}: 0
{hostname}: 10.1.1.33:7910
{process-id}: 21805
{function}: parcelhandler::default_write_handler
{file}: /home/pagrubel/hpx/src/runtime/parcelset/parcelhandler.cpp
{line}: 668
{os-thread}: parcel-thread-tcp#1
{config}:
HPX_HAVE_NATIVE_TLS=ON
HPX_HAVE_STACKTRACES=ON
HPX_HAVE_COMPRESSION_BZIP2=OFF
HPX_HAVE_COMPRESSION_SNAPPY=OFF
HPX_HAVE_COMPRESSION_ZLIB=OFF
HPX_HAVE_PARCEL_COALESCING=ON
HPX_HAVE_PARCELPORT_IPC=OFF
HPX_HAVE_PARCELPORT_IBVERBS=OFF
HPX_HAVE_VERIFY_LOCKS=OFF
HPX_HAVE_HWLOC=ON
HPX_HAVE_ITTNOTIFY=OFF
HPX_LIMIT=5
HPX_ACTION_ARGUMENT_LIMIT=5
HPX_COMPONENT_CREATE_ARGUMENT_LIMIT=5
HPX_FUNCTION_ARGUMENT_LIMIT=8
HPX_LOCK_LIMIT=5
HPX_TUPLE_LIMIT=8
HPX_WAIT_ARGUMENT_LIMIT=5
HPX_PARCEL_MAX_CONNECTIONS=512
HPX_PARCEL_MAX_CONNECTIONS_PER_LOCALITY=4
HPX_INITIAL_AGAS_LOCAL_CACHE_SIZE=256
HPX_AGAS_LOCAL_CACHE_SIZE_PER_THREAD=32
HPX_PREFIX=/home/pagrubel/build/hpx_buildrelease
{version}: V0.9.9-trunk (AGAS: V3.0), Git: b26cb34
{boost}: V1.55.0
{build-type}: release
{date}: Aug 14 2014 17:05:38
{platform}: linux
{compiler}: GNU C++ version 4.9.0
{stdlib}: GNU libstdc++ version 20140704
{what}: Broken pipe
srun: error: marvin00: task 0: Aborted
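The SLURM variables in the environment dump above use a compressed notation, for example `SLURM_JOB_CPUS_PER_NODE=16(x8)` and `SLURM_TASKS_PER_NODE=1(x8)`. The sketch below expands that notation to confirm what the allocation offers; the helper name `expand_slurm_counts` is hypothetical, and it only handles the `N` and `N(xM)` forms separated by commas.

```python
import re


def expand_slurm_counts(spec: str) -> list:
    """Expand a value such as '16(x8)' or '2(x3),1' into a per-node list."""
    counts = []
    for item in spec.split(","):
        match = re.fullmatch(r"(\d+)(?:\(x(\d+)\))?", item.strip())
        if match is None:
            raise ValueError(f"unrecognized entry: {item!r}")
        value = int(match.group(1))
        # A missing '(xM)' suffix means the value applies to a single node.
        repeat = int(match.group(2) or 1)
        counts.extend([value] * repeat)
    return counts


cpus = expand_slurm_counts("16(x8)")   # value of SLURM_JOB_CPUS_PER_NODE
tasks = expand_slurm_counts("1(x8)")   # value of SLURM_TASKS_PER_NODE
print(sum(cpus), sum(tasks))  # prints: 128 8
```

Under these values the job holds 128 CPUs for 8 tasks, which matches the expectation of 128 OS threads stated in the thread.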