Ginkgo + CUDA Tests Fail on Marianas #540
Relevant failing test logs in PNNL CI:
NlpSparse1_6
test 22
Start 22: NlpSparse1_6
22: Test command: /share/apps/openmpi/4.1.0/gcc/10.2.0/bin/mpirun "-n" "1" "/people/svcexasgd/gitlab/97388/build/src/Drivers/Sparse/NlpSparseEx1.exe" "500" "-ginkgo_cuda" "-selfcheck"
22: Test timeout computed to be: 1800
22: [1659713290.594617] [dlt03:33270:0] ucp_context.c:1470 UCX WARN UCP version is incompatible, required: 1.10, actual: 1.8 (release 0 /usr/lib64/libucp.so.0)
22: [1659713290.621398] [dlt03:33270:0] ib_iface.c:665 UCX ERROR ibv_create_cq(cqe=4096) failed: Cannot allocate memory
22: [dlt03.local:33270] pml_ucx.c:273 Error: Failed to create UCP worker
22: ===============
22: Hiop SOLVER
22: ===============
22: Using 1 MPI ranks.
22: ---------------
22: Problem Summary
22: ---------------
22: Total number of variables: 500
22: lower/upper/lower_and_upper bounds: 499 / 1 / 1
22: Total number of equality constraints: 1
22: Total number of inequality constraints: 498
22: lower/upper/lower_and_upper bounds: 498 / 497 / 497
22: LSQ linear solver --- KKT_SPARSE_XDYcYd linsys: MA57 size 1497 cons 499 nnz 3991 (option 'duals_init_linear_solver_sparse' 'auto')
22: iter objective inf_pr inf_du lg(mu) alpha_du alpha_pr linesrch
22: 0 7.6705009e+00 9.980e+00 1.118e+00 -1.00 0.000e+00 0.000e+00 -(-)
22: Setting up Ginkgo solver ...
22: terminate called after throwing an instance of 'gko::CudaError'
22: what(): /tmp/ruth521/spack-stage/spack-stage-ginkgo-glu_experimental-dbmokiqc3tlyvnwehe546lb25lrnuaod/spack-src/cuda/base/executor.cpp:192: raw_copy_to: cudaErrorInvalidValue: invalid argument
22: [dlt03:33270] *** Process received signal ***
22: [dlt03:33270] Signal: Aborted (6)
22: [dlt03:33270] Signal code: (-6)
22: [dlt03:33270] [ 0] /usr/lib64/libpthread.so.0(+0xf630)[0x7f9f37a82630]
22: [dlt03:33270] [ 1] /usr/lib64/libc.so.6(gsignal+0x37)[0x7f9f27bd8387]
22: [dlt03:33270] [ 2] /usr/lib64/libc.so.6(abort+0x148)[0x7f9f27bd9a78]
22: [dlt03:33270] [ 3] /share/apps/gcc/10.2.0/lib64/libstdc++.so.6(+0x9934c)[0x7f9f2852334c]
22: [dlt03:33270] [ 4] /share/apps/gcc/10.2.0/lib64/libstdc++.so.6(+0xa4656)[0x7f9f2852e656]
22: [dlt03:33270] [ 5] /share/apps/gcc/10.2.0/lib64/libstdc++.so.6(+0xa46c1)[0x7f9f2852e6c1]
22: [dlt03:33270] [ 6] /share/apps/gcc/10.2.0/lib64/libstdc++.so.6(+0xa4955)[0x7f9f2852e955]
22: [dlt03:33270] [ 7] /qfs/projects/exasgd/src/cameron/spack/opt/spack/linux-centos7-zen2/gcc-10.2.0/ginkgo-glu_experimental-dbmokiqc3tlyvnwehe546lb25lrnuaod/lib64/libginkgo_cuda.so.1.5.0(+0x18cb28)[0x7f9f29858b28]
22: [dlt03:33270] [ 8] /people/svcexasgd/gitlab/97388/build/src/Drivers/Sparse/NlpSparseEx1.exe(_ZNK3gko8Executor9copy_fromIdEEvPKS0_mPKT_PS4_+0x150)[0x64d6d6]
22: [dlt03:33270] [ 9] /people/svcexasgd/gitlab/97388/build/src/Drivers/Sparse/NlpSparseEx1.exe(_ZN3gko5arrayIdEaSERKS1_+0x2ca)[0x648daa]
22: [dlt03:33270] [10] /people/svcexasgd/gitlab/97388/build/src/Drivers/Sparse/NlpSparseEx1.exe[0x646a54]
22: [dlt03:33270] [11] /people/svcexasgd/gitlab/97388/build/src/Drivers/Sparse/NlpSparseEx1.exe[0x64353d]
22: [dlt03:33270] [12] /people/svcexasgd/gitlab/97388/build/src/Drivers/Sparse/NlpSparseEx1.exe[0x63ab67]
22: [dlt03:33270] [13] /people/svcexasgd/gitlab/97388/build/src/Drivers/Sparse/NlpSparseEx1.exe[0x635579]
22: [dlt03:33270] [14] /people/svcexasgd/gitlab/97388/build/src/Drivers/Sparse/NlpSparseEx1.exe[0x6261dc]
22: [dlt03:33270] [15] /people/svcexasgd/gitlab/97388/build/src/Drivers/Sparse/NlpSparseEx1.exe[0x6274b4]
22: [dlt03:33270] [16] /people/svcexasgd/gitlab/97388/build/src/Drivers/Sparse/NlpSparseEx1.exe[0x627743]
22: [dlt03:33270] [17] /people/svcexasgd/gitlab/97388/build/src/Drivers/Sparse/NlpSparseEx1.exe[0x5c82b7]
22: [dlt03:33270] [18] /people/svcexasgd/gitlab/97388/build/src/Drivers/Sparse/NlpSparseEx1.exe[0x5c8473]
22: [dlt03:33270] [19] /people/svcexasgd/gitlab/97388/build/src/Drivers/Sparse/NlpSparseEx1.exe[0x5ca579]
22: [dlt03:33270] [20] /people/svcexasgd/gitlab/97388/build/src/Drivers/Sparse/NlpSparseEx1.exe[0x5bc02b]
22: [dlt03:33270] [21] /people/svcexasgd/gitlab/97388/build/src/Drivers/Sparse/NlpSparseEx1.exe[0x4d7c60]
22: [dlt03:33270] [22] /usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f9f27bc4555]
22: [dlt03:33270] [23] /people/svcexasgd/gitlab/97388/build/src/Drivers/Sparse/NlpSparseEx1.exe[0x4d5237]
22: [dlt03:33270] *** End of error message ***
22: --------------------------------------------------------------------------
22: Primary job terminated normally, but 1 process returned
22: a non-zero exit code. Per user-direction, the job has been aborted.
22: --------------------------------------------------------------------------
22: --------------------------------------------------------------------------
22: mpirun noticed that process rank 0 with PID 33270 on node dlt03 exited on signal 6 (Aborted).
22: --------------------------------------------------------------------------
NlpSparse2_5
Start 27: NlpSparse2_5
27: Test command: /share/apps/openmpi/4.1.0/gcc/10.2.0/bin/mpirun "-n" "1" "/people/svcexasgd/gitlab/97388/build/src/Drivers/Sparse/NlpSparseEx2.exe" "500" "-ginkgo_cuda" "-inertiafree" "-selfcheck"
27: Test timeout computed to be: 1800
27: [1659713336.290306] [dlt03:33417:0] ucp_context.c:1470 UCX WARN UCP version is incompatible, required: 1.10, actual: 1.8 (release 0 /usr/lib64/libucp.so.0)
27: [1659713336.318631] [dlt03:33417:0] ib_iface.c:665 UCX ERROR ibv_create_cq(cqe=4096) failed: Cannot allocate memory
27: [dlt03.local:33417] pml_ucx.c:273 Error: Failed to create UCP worker
27: ===============
27: Hiop SOLVER
27: ===============
27: Using 1 MPI ranks.
27: ---------------
27: Problem Summary
27: ---------------
27: Total number of variables: 500
27: lower/upper/lower_and_upper bounds: 499 / 1 / 1
27: Total number of equality constraints: 2
27: Total number of inequality constraints: 499
27: lower/upper/lower_and_upper bounds: 498 / 498 / 497
27: iter objective inf_pr inf_du lg(mu) alpha_du alpha_pr linesrch
27: 0 6.4379656e+01 9.980e+00 1.010e+00 0.00 0.000e+00 0.000e+00 -(-)
27: Setting up Ginkgo solver ...
27: terminate called after throwing an instance of 'gko::CudaError'
27: what(): /tmp/ruth521/spack-stage/spack-stage-ginkgo-glu_experimental-dbmokiqc3tlyvnwehe546lb25lrnuaod/spack-src/cuda/base/executor.cpp:192: raw_copy_to: cudaErrorInvalidValue: invalid argument
27: [dlt03:33417] *** Process received signal ***
27: [dlt03:33417] Signal: Aborted (6)
27: [dlt03:33417] Signal code: (-6)
27: [dlt03:33417] [ 0] /usr/lib64/libpthread.so.0(+0xf630)[0x7f715e9aa630]
27: [dlt03:33417] [ 1] /usr/lib64/libc.so.6(gsignal+0x37)[0x7f714eb00387]
27: [dlt03:33417] [ 2] /usr/lib64/libc.so.6(abort+0x148)[0x7f714eb01a78]
27: [dlt03:33417] [ 3] /share/apps/gcc/10.2.0/lib64/libstdc++.so.6(+0x9934c)[0x7f714f44b34c]
27: [dlt03:33417] [ 4] /share/apps/gcc/10.2.0/lib64/libstdc++.so.6(+0xa4656)[0x7f714f456656]
27: [dlt03:33417] [ 5] /share/apps/gcc/10.2.0/lib64/libstdc++.so.6(+0xa46c1)[0x7f714f4566c1]
27: [dlt03:33417] [ 6] /share/apps/gcc/10.2.0/lib64/libstdc++.so.6(+0xa4955)[0x7f714f456955]
27: [dlt03:33417] [ 7] /qfs/projects/exasgd/src/cameron/spack/opt/spack/linux-centos7-zen2/gcc-10.2.0/ginkgo-glu_experimental-dbmokiqc3tlyvnwehe546lb25lrnuaod/lib64/libginkgo_cuda.so.1.5.0(+0x18cb28)[0x7f7150780b28]
27: [dlt03:33417] [ 8] /people/svcexasgd/gitlab/97388/build/src/Drivers/Sparse/NlpSparseEx2.exe(_ZNK3gko8Executor9copy_fromIdEEvPKS0_mPKT_PS4_+0x150)[0x64de3a]
27: [dlt03:33417] [ 9] /people/svcexasgd/gitlab/97388/build/src/Drivers/Sparse/NlpSparseEx2.exe(_ZN3gko5arrayIdEaSERKS1_+0x2ca)[0x64950e]
27: [dlt03:33417] [10] /people/svcexasgd/gitlab/97388/build/src/Drivers/Sparse/NlpSparseEx2.exe[0x6471b8]
27: [dlt03:33417] [11] /people/svcexasgd/gitlab/97388/build/src/Drivers/Sparse/NlpSparseEx2.exe[0x643ca1]
27: [dlt03:33417] [12] /people/svcexasgd/gitlab/97388/build/src/Drivers/Sparse/NlpSparseEx2.exe[0x63b2cb]
27: [dlt03:33417] [13] /people/svcexasgd/gitlab/97388/build/src/Drivers/Sparse/NlpSparseEx2.exe[0x635cdd]
27: [dlt03:33417] [14] /people/svcexasgd/gitlab/97388/build/src/Drivers/Sparse/NlpSparseEx2.exe[0x626940]
27: [dlt03:33417] [15] /people/svcexasgd/gitlab/97388/build/src/Drivers/Sparse/NlpSparseEx2.exe[0x627c18]
27: [dlt03:33417] [16] /people/svcexasgd/gitlab/97388/build/src/Drivers/Sparse/NlpSparseEx2.exe[0x627ea7]
27: [dlt03:33417] [17] /people/svcexasgd/gitlab/97388/build/src/Drivers/Sparse/NlpSparseEx2.exe[0x5c8a1b]
27: [dlt03:33417] [18] /people/svcexasgd/gitlab/97388/build/src/Drivers/Sparse/NlpSparseEx2.exe[0x5c8bd7]
27: [dlt03:33417] [19] /people/svcexasgd/gitlab/97388/build/src/Drivers/Sparse/NlpSparseEx2.exe[0x5cacdd]
27: [dlt03:33417] [20] /people/svcexasgd/gitlab/97388/build/src/Drivers/Sparse/NlpSparseEx2.exe[0x5bc78f]
27: [dlt03:33417] [21] /people/svcexasgd/gitlab/97388/build/src/Drivers/Sparse/NlpSparseEx2.exe[0x4d7fed]
27: [dlt03:33417] [22] /usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f714eaec555]
27: [dlt03:33417] [23] /people/svcexasgd/gitlab/97388/build/src/Drivers/Sparse/NlpSparseEx2.exe[0x4d51e7]
27: [dlt03:33417] *** End of error message ***
27: --------------------------------------------------------------------------
27: Primary job terminated normally, but 1 process returned
27: a non-zero exit code. Per user-direction, the job has been aborted.
27: --------------------------------------------------------------------------
27: --------------------------------------------------------------------------
27: mpirun noticed that process rank 0 with PID 33417 on node dlt03 exited on signal 6 (Aborted).
27: --------------------------------------------------------------------------
27/43 Test #27: NlpSparse2_5 ......................***Failed 9.74 sec
Tagging relevant developers from offline discussion: @pelesh @nkoukpaizan @cnpetra @fritzgoebel @nychiang
CC @maksud
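Frames [8] and [9] of the backtraces above carry mangled C++ names (for example `_ZN3gko5arrayIdEaSERKS1_`), which are easier to read once demangled. The sketch below is one way to do that, assuming a GCC or Clang toolchain that provides `abi::__cxa_demangle`; the helper name `demangle` is hypothetical and not part of HiOp or Ginkgo.

```cpp
#include <cxxabi.h>  // abi::__cxa_demangle (Itanium C++ ABI, GCC/Clang)
#include <cstdlib>   // std::free
#include <string>

// Hypothetical helper: returns the demangled form of an Itanium-ABI symbol,
// or the input unchanged if the symbol cannot be demangled.
std::string demangle(const std::string& mangled)
{
    int status = 0;
    // With null buffer arguments, __cxa_demangle allocates the result
    // with malloc, so it has to be released with std::free.
    char* raw = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || raw == nullptr) {
        return mangled;
    }
    std::string result(raw);
    std::free(raw);
    return result;
}
```

Passing the frame [9] symbol to `demangle` should show that the crash happens inside the copy assignment of a `gko::array<double>`, which calls `gko::Executor::copy_from` in frame [8]. The `c++filt` command-line tool performs the same decoding.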
In PR #548, tests are still failing on Marianas, but now with different error messages (PNNL Pipeline):
NlpSparse1_6
Start 22: NlpSparse1_6
22: Test command: /share/apps/openmpi/4.1.0/gcc/10.2.0/bin/mpirun "-n" "1" "/people/svcexasgd/gitlab/102551/build/src/Drivers/Sparse/NlpSparseEx1.exe" "500" "-ginkgo_cuda" "-selfcheck"
22: Test timeout computed to be: 10000000
22: [1664307938.160667] [dlt02:239553:0] ib_iface.c:964 UCX ERROR ibv_create_cq(cqe=4096) failed: Cannot allocate memory
22: [dlt02.local:239553] pml_ucx.c:273 Error: Failed to create UCP worker
22: ===============
22: Hiop SOLVER
22: ===============
22: Using 1 MPI ranks.
22: ---------------
22: Problem Summary
22: ---------------
22: Total number of variables: 500
22: lower/upper/lower_and_upper bounds: 499 / 1 / 1
22: Total number of equality constraints: 1
22: Total number of inequality constraints: 498
22: lower/upper/lower_and_upper bounds: 498 / 497 / 497
22: LSQ linear solver --- KKT_SPARSE_XDYcYd linsys: MA57 size 1497 cons 499 nnz 3991 (option 'duals_init_linear_solver_sparse' 'auto')
22: iter objective inf_pr inf_du lg(mu) alpha_du alpha_pr linesrch
22: 0 7.6705009e+00 9.980e+00 1.118e+00 -1.00 0.000e+00 0.000e+00 -(-)
22: Setting up Ginkgo solver ...
22: terminate called after throwing an instance of 'std::length_error'
22: what(): cannot create std::vector larger than max_size()
22: [dlt02:239553] *** Process received signal ***
22: [dlt02:239553] Signal: Aborted (6)
22: [dlt02:239553] Signal code: (-6)
22: [dlt02:239553] [ 0] /usr/lib64/libpthread.so.0(+0xf630)[0x7fed38db6630]
22: [dlt02:239553] [ 1] /usr/lib64/libc.so.6(gsignal+0x37)[0x7fed28f0c387]
22: [dlt02:239553] [ 2] /usr/lib64/libc.so.6(abort+0x148)[0x7fed28f0da78]
22: [dlt02:239553] [ 3] /share/apps/gcc/10.2.0/lib64/libstdc++.so.6(+0x9934c)[0x7fed2985734c]
22: [dlt02:239553] [ 4] /share/apps/gcc/10.2.0/lib64/libstdc++.so.6(+0xa4656)[0x7fed29862656]
22: [dlt02:239553] [ 5] /share/apps/gcc/10.2.0/lib64/libstdc++.so.6(+0xa46c1)[0x7fed298626c1]
22: [dlt02:239553] [ 6] /share/apps/gcc/10.2.0/lib64/libstdc++.so.6(+0xa4955)[0x7fed29862955]
22: [dlt02:239553] [ 7] /share/apps/gcc/10.2.0/lib64/libstdc++.so.6(_ZSt20__throw_length_errorPKc+0x3d)[0x7fed29859bfa]
22: [dlt02:239553] [ 8] /qfs/projects/exasgd/src/cameron/spack/opt/spack/linux-centos7-zen2/gcc-10.2.0/ginkgo-glu_experimental-dbmokiqc3tlyvnwehe546lb25lrnuaod/lib64/libglu.so(_ZN15Symbolic_Matrix7fill_inEPjS0_+0x800)[0x7fed29b91070]
22: [dlt02:239553] [ 9] /qfs/projects/exasgd/src/cameron/spack/opt/spack/linux-centos7-zen2/gcc-10.2.0/ginkgo-glu_experimental-dbmokiqc3tlyvnwehe546lb25lrnuaod/lib64/libginkgo.so.1.5.0(_ZN3gko12experimental13factorization3GluIdiE15ReusableFactory22symbolic_factorizationEPKNS_5LinOpE+0x43c)[0x7fed2f7c321c]
22: [dlt02:239553] [10] /people/svcexasgd/gitlab/102551/build/src/Drivers/Sparse/NlpSparseEx1.exe(_ZN3gko12experimental13factorization3GluIdiE15ReusableFactoryC1ESt10shared_ptrIKNS_8ExecutorEEPKNS_5LinOpEPKNS_7reorder14ReorderingBaseERKNS3_25ReusableFactoryParametersE+0x23b)[0x6407ab]
22: [dlt02:239553] [11] /people/svcexasgd/gitlab/102551/build/src/Drivers/Sparse/NlpSparseEx1.exe(_ZNK3gko12experimental13factorization3GluIdiE25ReusableFactoryParameters2onESt10shared_ptrIKNS_8ExecutorEEPKNS_5LinOpEPKNS_7reorder14ReorderingBaseE+0x6f)[0x63a1c1]
22: [dlt02:239553] [12] /people/svcexasgd/gitlab/102551/build/src/Drivers/Sparse/NlpSparseEx1.exe[0x62be3a]
22: [dlt02:239553] [13] /people/svcexasgd/gitlab/102551/build/src/Drivers/Sparse/NlpSparseEx1.exe[0x62ca01]
22: [dlt02:239553] [14] /people/svcexasgd/gitlab/102551/build/src/Drivers/Sparse/NlpSparseEx1.exe[0x62cc33]
22: [dlt02:239553] [15] /people/svcexasgd/gitlab/102551/build/src/Drivers/Sparse/NlpSparseEx1.exe[0x5ca30f]
22: [dlt02:239553] [16] /people/svcexasgd/gitlab/102551/build/src/Drivers/Sparse/NlpSparseEx1.exe[0x5ca5a2]
22: [dlt02:239553] [17] /people/svcexasgd/gitlab/102551/build/src/Drivers/Sparse/NlpSparseEx1.exe[0x5cc989]
22: [dlt02:239553] [18] /people/svcexasgd/gitlab/102551/build/src/Drivers/Sparse/NlpSparseEx1.exe[0x5bdc49]
22: [dlt02:239553] [19] /people/svcexasgd/gitlab/102551/build/src/Drivers/Sparse/NlpSparseEx1.exe[0x4d7d30]
22: [dlt02:239553] [20] /usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x7fed28ef8555]
22: [dlt02:239553] [21] /people/svcexasgd/gitlab/102551/build/src/Drivers/Sparse/NlpSparseEx1.exe[0x4d5307]
22: [dlt02:239553] *** End of error message ***
22: --------------------------------------------------------------------------
22: Primary job terminated normally, but 1 process returned
22: a non-zero exit code. Per user-direction, the job has been aborted.
22: --------------------------------------------------------------------------
22: --------------------------------------------------------------------------
22: mpirun noticed that process rank 0 with PID 239553 on node dlt02 exited on signal 6 (Aborted).
22: --------------------------------------------------------------------------
22/43 Test #22: NlpSparse1_6 ......................***Failed 15.30 sec
NlpSparse2_5
test 27
Start 27: NlpSparse2_5
27: Test command: /share/apps/openmpi/4.1.0/gcc/10.2.0/bin/mpirun "-n" "1" "/people/svcexasgd/gitlab/102551/build/src/Drivers/Sparse/NlpSparseEx2.exe" "500" "-ginkgo_cuda" "-inertiafree" "-selfcheck"
27: Test timeout computed to be: 10000000
27: [1664307956.157689] [dlt02:239651:0] ib_iface.c:964 UCX ERROR ibv_create_cq(cqe=4096) failed: Cannot allocate memory
27: [dlt02.local:239651] pml_ucx.c:273 Error: Failed to create UCP worker
27: ===============
27: Hiop SOLVER
27: ===============
27: Using 1 MPI ranks.
27: ---------------
27: Problem Summary
27: ---------------
27: Total number of variables: 500
27: lower/upper/lower_and_upper bounds: 499 / 1 / 1
27: Total number of equality constraints: 2
27: Total number of inequality constraints: 499
27: lower/upper/lower_and_upper bounds: 498 / 498 / 497
27: iter objective inf_pr inf_du lg(mu) alpha_du alpha_pr linesrch
27: 0 6.4379656e+01 9.980e+00 1.010e+00 0.00 0.000e+00 0.000e+00 -(-)
27: Setting up Ginkgo solver ...
27: terminate called after throwing an instance of 'std::length_error'
27: what(): cannot create std::vector larger than max_size()
27: [dlt02:239651] *** Process received signal ***
27: [dlt02:239651] Signal: Aborted (6)
27: [dlt02:239651] Signal code: (-6)
27: [dlt02:239651] [ 0] /usr/lib64/libpthread.so.0(+0xf630)[0x7efec26e2630]
27: [dlt02:239651] [ 1] /usr/lib64/libc.so.6(gsignal+0x37)[0x7efeb2838387]
27: [dlt02:239651] [ 2] /usr/lib64/libc.so.6(abort+0x148)[0x7efeb2839a78]
27: [dlt02:239651] [ 3] /share/apps/gcc/10.2.0/lib64/libstdc++.so.6(+0x9934c)[0x7efeb318334c]
27: [dlt02:239651] [ 4] /share/apps/gcc/10.2.0/lib64/libstdc++.so.6(+0xa4656)[0x7efeb318e656]
27: [dlt02:239651] [ 5] /share/apps/gcc/10.2.0/lib64/libstdc++.so.6(+0xa46c1)[0x7efeb318e6c1]
27: [dlt02:239651] [ 6] /share/apps/gcc/10.2.0/lib64/libstdc++.so.6(+0xa4955)[0x7efeb318e955]
27: [dlt02:239651] [ 7] /share/apps/gcc/10.2.0/lib64/libstdc++.so.6(_ZSt20__throw_length_errorPKc+0x3d)[0x7efeb3185bfa]
27: [dlt02:239651] [ 8] /qfs/projects/exasgd/src/cameron/spack/opt/spack/linux-centos7-zen2/gcc-10.2.0/ginkgo-glu_experimental-dbmokiqc3tlyvnwehe546lb25lrnuaod/lib64/libglu.so(_ZN15Symbolic_Matrix7fill_inEPjS0_+0x800)[0x7efeb34bd070]
27: [dlt02:239651] [ 9] /qfs/projects/exasgd/src/cameron/spack/opt/spack/linux-centos7-zen2/gcc-10.2.0/ginkgo-glu_experimental-dbmokiqc3tlyvnwehe546lb25lrnuaod/lib64/libginkgo.so.1.5.0(_ZN3gko12experimental13factorization3GluIdiE15ReusableFactory22symbolic_factorizationEPKNS_5LinOpE+0x43c)[0x7efeb90ef21c]
27: [dlt02:239651] [10] /people/svcexasgd/gitlab/102551/build/src/Drivers/Sparse/NlpSparseEx2.exe(_ZN3gko12experimental13factorization3GluIdiE15ReusableFactoryC1ESt10shared_ptrIKNS_8ExecutorEEPKNS_5LinOpEPKNS_7reorder14ReorderingBaseERKNS3_25ReusableFactoryParametersE+0x23b)[0x640f0f]
27: [dlt02:239651] [11] /people/svcexasgd/gitlab/102551/build/src/Drivers/Sparse/NlpSparseEx2.exe(_ZNK3gko12experimental13factorization3GluIdiE25ReusableFactoryParameters2onESt10shared_ptrIKNS_8ExecutorEEPKNS_5LinOpEPKNS_7reorder14ReorderingBaseE+0x6f)[0x63a925]
27: [dlt02:239651] [12] /people/svcexasgd/gitlab/102551/build/src/Drivers/Sparse/NlpSparseEx2.exe[0x62c59e]
27: [dlt02:239651] [13] /people/svcexasgd/gitlab/102551/build/src/Drivers/Sparse/NlpSparseEx2.exe[0x62d165]
27: [dlt02:239651] [14] /people/svcexasgd/gitlab/102551/build/src/Drivers/Sparse/NlpSparseEx2.exe[0x62d397]
27: [dlt02:239651] [15] /people/svcexasgd/gitlab/102551/build/src/Drivers/Sparse/NlpSparseEx2.exe[0x5caa73]
27: [dlt02:239651] [16] /people/svcexasgd/gitlab/102551/build/src/Drivers/Sparse/NlpSparseEx2.exe[0x5cad06]
27: [dlt02:239651] [17] /people/svcexasgd/gitlab/102551/build/src/Drivers/Sparse/NlpSparseEx2.exe[0x5cd0ed]
27: [dlt02:239651] [18] /people/svcexasgd/gitlab/102551/build/src/Drivers/Sparse/NlpSparseEx2.exe[0x5be3ad]
27: [dlt02:239651] [19] /people/svcexasgd/gitlab/102551/build/src/Drivers/Sparse/NlpSparseEx2.exe[0x4d80bd]
27: [dlt02:239651] [20] /usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x7efeb2824555]
27: [dlt02:239651] [21] /people/svcexasgd/gitlab/102551/build/src/Drivers/Sparse/NlpSparseEx2.exe[0x4d52b7]
27: [dlt02:239651] *** End of error message ***
27: --------------------------------------------------------------------------
27: Primary job terminated normally, but 1 process returned
27: a non-zero exit code. Per user-direction, the job has been aborted.
27: --------------------------------------------------------------------------
27: --------------------------------------------------------------------------
27: mpirun noticed that process rank 0 with PID 239651 on node dlt02 exited on signal 6 (Aborted).
27: --------------------------------------------------------------------------
27/43 Test #27: NlpSparse2_5 ......................***Failed 14.67 sec
Are the tests still failing?
Yes, see #548 for the current state of the work.
Is there a way for me to get access to Marianas? I have not been able to reproduce this so far.
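The `std::length_error` in the logs above is what libstdc++ throws when a `std::vector` is asked for more elements than `max_size()`. The GLU source is not shown in this thread, so the actual cause in `Symbolic_Matrix::fill_in` is unknown. One common way to hit this error, shown here purely as an assumption, is a size computed from unsigned values that wraps around. Both function names below are hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: computes a length as end - begin in size_t arithmetic.
// If begin > end, the subtraction wraps around to a very large value.
std::size_t wrapped_size(unsigned int end, unsigned int begin)
{
    return static_cast<std::size_t>(end) - static_cast<std::size_t>(begin);
}

// Hypothetical helper: tries to build a vector of `count` doubles and
// reports whether the constructor rejected the size with std::length_error.
bool allocation_throws_length_error(std::size_t count)
{
    try {
        std::vector<double> v(count);
        return false;
    } catch (const std::length_error&) {
        return true;
    }
}
```

With these helpers, `allocation_throws_length_error(wrapped_size(3, 5))` reports the exception because the wrapped size exceeds `max_size()`, while `wrapped_size(5, 3)` yields an ordinary two-element vector. Checking the computed size before constructing the vector is one way to turn such a failure into a clearer diagnostic.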
The error message is here:
Marianas runs CentOS 7 and its GPUs have a maximum CUDA compute capability of 60 (that is, 6.0), so the current assumption is that there is a bug with that specific build combination (see #521).
The current test configuration passed to ctest disables the relevant tests, so CI may appear to be passing: https://github.com/LLNL/hiop/blob/develop/.gitlab-ci.yml#L164