Veloc test fails on perlmutter #56

Open
wspear opened this issue Dec 8, 2022 · 7 comments

wspear commented Dec 8, 2022

@gonsie
@vsoch

The standalone VeloC test defined at https://github.com/E4S-Project/testsuite/tree/master/validation_tests/veloc fails when run against the VeloC installed as part of the E4S 22.11 deployment on Perlmutter, built with these variants:

-- linux-sles15-zen3 / gcc@11.2.0 -------------------------------
44htwoe veloc@1.5~ipo build_system=cmake build_type=RelWithDebInfo

With this console output:

REDSET 0.1.0 ABORT: rank 0 on nid001901: XOR requires at least 2 ranks per set, but found 1 rank(s) in set @ /tmp/lpeyrala/spack-stage/spack-stage-redset-0.2.0-4rss7cokqukwuvzvzlymxazgem3a6gim/spack-src/src/redset_xor.c:157
MPICH ERROR [Rank 0] [job id 3921177.7] [Tue Dec  6 14:52:36 2022] [nid001901] - Abort(-1) (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, -1) - process 0

aborting job:
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 0
srun: error: nid001901: task 0: Exited with exit code 255
srun: launch/slurm: _step_signal: Terminating StepId=3921177.7
slurmstepd: error: *** STEP 3921177.7 ON nid001901 CANCELLED AT 2022-12-06T22:52:36 ***
srun: error: nid001901: task 1: Terminated
srun: Force Terminated StepId=3921177.7
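
The abort message says XOR needs at least two ranks per set. One possible reading, not confirmed in this thread, is that redundancy sets are built from ranks on different nodes, so two ranks on a single node (both tasks above ran on nid001901) leave each set with one member. The sketch below is a simplified, hypothetical model of that grouping, not redset's actual implementation; `build_sets`, `smallest_set`, the round-robin rule, and the node name `nid001902` are assumptions for illustration.

```python
from collections import defaultdict


def build_sets(rank_to_node):
    """Group ranks so that each set holds at most one rank per node.

    Hypothetical model: the i-th rank on every node joins set i.
    """
    per_node = defaultdict(list)
    for rank in sorted(rank_to_node):
        per_node[rank_to_node[rank]].append(rank)

    sets = defaultdict(list)
    for ranks in per_node.values():
        for local_index, rank in enumerate(ranks):
            sets[local_index].append(rank)
    return [sorted(members) for _, members in sorted(sets.items())]


def smallest_set(rank_to_node):
    """Size of the smallest set; XOR would need this to be at least 2."""
    return min(len(members) for members in build_sets(rank_to_node))


# Two ranks on one node, as in the log above: every set has a single member.
single_node = {0: "nid001901", 1: "nid001901"}
print(build_sets(single_node), smallest_set(single_node))

# Two ranks on two nodes ("nid001902" is a made-up name): one set of two.
two_nodes = {0: "nid001901", 1: "nid001902"}
print(build_sets(two_nodes), smallest_set(two_nodes))
```

Under this model, placing the two ranks on two different nodes would satisfy the requirement; whether that applies to this deployment would still need to be checked.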

Updating to the latest heatdis_mem.c included with Veloc 1.5 resulted in this runtime error output:

[FATAL 0] [/tmp/lpeyrala/spack-stage/spack-stage-veloc-1.5-44htwoezm4qmelqjkr52pc3r3e4bqm4j/spack-src/src/lib/client.cpp:57:client_impl_t] MPI threaded mode requested but not available, please use MPI_Init_thread
[FATAL 0] [/tmp/lpeyrala/spack-stage/spack-stage-veloc-1.5-44htwoezm4qmelqjkr52pc3r3e4bqm4j/spack-src/src/lib/client.cpp:57:client_impl_t] MPI threaded mode requested but not available, please use MPI_Init_thread
[FATAL 0] [/tmp/lpeyrala/spack-stage/spack-stage-veloc-1.5-44htwoezm4qmelqjkr52pc3r3e4bqm4j/spack-src/src/lib/client.cpp:57:client_impl_t] MPI threaded mode requested but not available, please use MPI_Init_thread
[FATAL 0] [/tmp/lpeyrala/spack-stage/spack-stage-veloc-1.5-44htwoezm4qmelqjkr52pc3r3e4bqm4j/spack-src/src/lib/client.cpp:57:client_impl_t] MPI threaded mode requested but not available, please use MPI_Init_thread

vsoch commented Dec 8, 2022

Sorry I don't work on this, not sure how you want my help? If you have a specific question I can help with let me know. Maybe ping someone that develops veloc or maintains this repo? I'm actually not sure what it is.

bnicolae commented Dec 8, 2022

Please update to the latest VELOC release, which is 1.6. If you still see any issues, try the master branch.

wspear commented Dec 8, 2022

@bnicolae

VeloC 1.6 fails to build with the Spack package out of the box (1.5 builds fine in the same environment). The build error looks like this:

==> Installing veloc-1.6-2u34oen6ttls4m6vlgi4hwgnteou7q36
==> No binary for veloc-1.6-2u34oen6ttls4m6vlgi4hwgnteou7q36 found: installing from source
==> Fetching https://github.com/ECP-VeloC/VELOC/archive/1.6.tar.gz
==> No patches needed for veloc
==> veloc: Executing phase: 'cmake'
==> veloc: Executing phase: 'build'
==> Error: ProcessError: Command exited with status 2:
    'make' '-j16'

5 errors found in build log:
     81     [ 51%] Linking C executable heatdis_original
     82     cd /tmp/wspear/spack-stage/spack-stage-veloc-1.6-2u34oen6ttls4m6vlgi4hwgnteou7q36/spack-build-2u34oen/test && /home/wspear/bin/SPACK/spack/opt/spack/linux-ubuntu22.04-westmere/gcc-11.2.0/cmake-3.24.
            2-sgm62x4xhlgn7rl3h6rve2yez2bsra6s/bin/cmake -E cmake_link_script CMakeFiles/heatdis_original.dir/link.txt --verbose=1
     83     /home/wspear/bin/SPACK/spack/lib/spack/env/gcc/gcc -O2 -g -DNDEBUG CMakeFiles/heatdis_original.dir/heatdis_original.c.o -o heatdis_original  -Wl,-rpath,/home/wspear/bin/SPACK/spack/opt/spack/linux-u
            buntu22.04-westmere/gcc-11.2.0/intel-oneapi-mpi-2021.7.0-tib45i3vuhw4krn7oiihc4hlndmpbtce/mpi/2021.7.0/lib/release -lm /home/wspear/bin/SPACK/spack/opt/spack/linux-ubuntu22.04-westmere/gcc-11.2.0/in
            tel-oneapi-mpi-2021.7.0-tib45i3vuhw4krn7oiihc4hlndmpbtce/mpi/2021.7.0/lib/release/libmpi.so /usr/lib/x86_64-linux-gnu/librt.a /usr/lib/x86_64-linux-gnu/libpthread.a /usr/lib/x86_64-linux-gnu/libdl.a
     84     make[2]: Leaving directory '/tmp/wspear/spack-stage/spack-stage-veloc-1.6-2u34oen6ttls4m6vlgi4hwgnteou7q36/spack-build-2u34oen'
     85     [ 51%] Built target heatdis_original
     86     /tmp/wspear/spack-stage/spack-stage-veloc-1.6-2u34oen6ttls4m6vlgi4hwgnteou7q36/spack-src/src/storage/axl_module.cpp: In constructor 'axl_module_t::axl_module_t(const string&, const string&, const st
            ring&)':
  >> 87     /tmp/wspear/spack-stage/spack-stage-veloc-1.6-2u34oen6ttls4m6vlgi4hwgnteou7q36/spack-src/src/storage/axl_module.cpp:22:23: error: too few arguments to function 'int AXL_Init(const char*)'
     88        22 |     int ret = AXL_Init();
     89           |               ~~~~~~~~^~
     90     In file included from /tmp/wspear/spack-stage/spack-stage-veloc-1.6-2u34oen6ttls4m6vlgi4hwgnteou7q36/spack-src/src/storage/axl_module.hpp:5,
     91                      from /tmp/wspear/spack-stage/spack-stage-veloc-1.6-2u34oen6ttls4m6vlgi4hwgnteou7q36/spack-src/src/storage/axl_module.cpp:1:
     92     /home/wspear/bin/SPACK/spack/opt/spack/linux-ubuntu22.04-westmere/gcc-11.2.0/axl-0.3.0-jalfn42f5zij76i2hr5xr6f24mfji53r/include/axl.h:35:5: note: declared here
     93        35 | int AXL_Init (const char* state_file);
     94           |     ^~~~~~~~
     95     /tmp/wspear/spack-stage/spack-stage-veloc-1.6-2u34oen6ttls4m6vlgi4hwgnteou7q36/spack-src/src/storage/axl_module.cpp: In member function 'bool axl_module_t::axl_transfer_file(const string&, const str
            ing&)':
  >> 96     /tmp/wspear/spack-stage/spack-stage-veloc-1.6-2u34oen6ttls4m6vlgi4hwgnteou7q36/spack-src/src/storage/axl_module.cpp:28:24: error: too many arguments to function 'int AXL_Create(axl_xfer_t, const cha
            r*)'
     97        28 |     int id = AXL_Create(axl_type, source.c_str(), NULL), result = id;
     98           |              ~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     99     In file included from /tmp/wspear/spack-stage/spack-stage-veloc-1.6-2u34oen6ttls4m6vlgi4hwgnteou7q36/spack-src/src/storage/axl_module.hpp:5,
     100                     from /tmp/wspear/spack-stage/spack-stage-veloc-1.6-2u34oen6ttls4m6vlgi4hwgnteou7q36/spack-src/src/storage/axl_module.cpp:1:
     101    /home/wspear/bin/SPACK/spack/opt/spack/linux-ubuntu22.04-westmere/gcc-11.2.0/axl-0.3.0-jalfn42f5zij76i2hr5xr6f24mfji53r/include/axl.h:45:5: note: declared here
     102       45 | int AXL_Create (axl_xfer_t type, const char* name);
     103          |     ^~~~~~~~~~
  >> 104    make[2]: *** [src/modules/CMakeFiles/veloc-modules.dir/build.make:247: src/modules/CMakeFiles/veloc-modules.dir/__/storage/axl_module.cpp.o] Error 1
     105    make[2]: *** Waiting for unfinished jobs....
     106    make[2]: Leaving directory '/tmp/wspear/spack-stage/spack-stage-veloc-1.6-2u34oen6ttls4m6vlgi4hwgnteou7q36/spack-build-2u34oen'
  >> 107    make[1]: *** [CMakeFiles/Makefile2:209: src/modules/CMakeFiles/veloc-modules.dir/all] Error 2
     108    make[1]: Leaving directory '/tmp/wspear/spack-stage/spack-stage-veloc-1.6-2u34oen6ttls4m6vlgi4hwgnteou7q36/spack-build-2u34oen'
  >> 109    make: *** [Makefile:149: all] Error 2

See build log for details:
  /tmp/wspear/spack-stage/spack-stage-veloc-1.6-2u34oen6ttls4m6vlgi4hwgnteou7q36/spack-build-out.txt

wspear commented Dec 8, 2022

Also, the main branch shows the same error. It looks like the master branch was renamed to main, so spack install veloc@master won't work.

wspear commented Dec 15, 2022

We see the same error on Crusher.

bnicolae commented

@wspear: Most likely you are using old versions of the dependencies in the Spack recipe. Here are the versions of the dependencies you should use (you can find them in auto-install.py, our default non-Spack installation script):

install_dep('https://github.com/ECP-VeloC/KVTree.git', 'v1.2.0')
install_dep('https://github.com/ECP-VeloC/AXL.git', 'v0.5.0')
install_dep('https://github.com/ECP-VeloC/rankstr.git', 'v0.1.0')
install_dep('https://github.com/ECP-VeloC/shuffile.git', 'v0.1.0')
install_dep('https://github.com/ECP-VeloC/redset.git', 'v0.1.0')
install_dep('https://github.com/ECP-VeloC/er.git', 'v0.1.0')
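
To compare an environment against the pins above, a small check along these lines could help. It is only a sketch: `REQUIRED` copies the tags from the list, `parse_version` and `outdated` are hypothetical helpers, and the `installed` mapping would have to be filled in by hand (here it holds only the axl 0.3.0 that appears in the build log earlier in this thread). It flags versions older than the pin and does not check for newer ones.

```python
# Tags copied from the install_dep(...) list above.
REQUIRED = {
    "KVTree": "v1.2.0",
    "AXL": "v0.5.0",
    "rankstr": "v0.1.0",
    "shuffile": "v0.1.0",
    "redset": "v0.1.0",
    "er": "v0.1.0",
}


def parse_version(text):
    """Turn 'v0.5.0' or '0.5.0' into a comparable tuple such as (0, 5, 0)."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def outdated(installed, required=REQUIRED):
    """Return the names whose installed version is older than the pinned tag."""
    return sorted(
        name
        for name, version in installed.items()
        if name in required
        and parse_version(version) < parse_version(required[name])
    )


# Hand-filled example: the failing 1.6 build was compiled against axl-0.3.0.
installed = {"AXL": "0.3.0"}
print(outdated(installed))
```

With the AXL entry taken from the failing build, the helper reports AXL as older than its pin, which is consistent with the AXL_Init and AXL_Create signature errors in the build log.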

wspear commented Dec 27, 2022

@bnicolae I was able to build veloc@1.6 with the changes in this PR: spack/spack#34706, but I'm still investigating the hang on Crusher/Perlmutter. Could you take a look at the PR and confirm it looks sane? I added the dependencies you listed, but I wasn't clear whether they needed configuration options added. (I'm guessing not, since it builds without any.)
