AMDGPU tests are failing in allgather #841

Open
vchuravy opened this issue Jun 19, 2024 · 0 comments
@vchuravy

This has been failing at least as far back as #764 (comment)
x-ref: #839 (comment)

The fix might be as simple as updating the Open MPI version we use for the tests.
@luraess, can you take a look?
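
A quick way to check which MPI library the tests actually pick up, and whether it reports ROCm support, could look like the sketch below. It is only a diagnostic aid, not a confirmed diagnosis: it assumes MPI.jl is already configured to use the CI's Open MPI build, and `MPI.has_rocm()` only reflects what the library (or the `JULIA_MPI_HAS_ROCM` environment variable) claims.

```julia
using MPI

MPI.Init()

# Print from rank 0 only, so the output is not interleaved across ranks.
if MPI.Comm_rank(MPI.COMM_WORLD) == 0
    # Version string of the MPI library that MPI.jl is bound to.
    println(MPI.Get_library_version())
    # Whether that library reports ROCm-aware support (assumption: an
    # answer of `false` would be consistent with a segfault when device
    # buffers are passed to MPI, but it does not prove that is the cause).
    println("ROCm-aware MPI: ", MPI.has_rocm())
end

MPI.Finalize()
```

Running this under the same `mpiexec` as the test suite should show whether the Open MPI build in use was compiled with ROCm support.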

[698] signal (11.2): Segmentation fault
in expression starting at /var/lib/buildkite-agent/builds/amdgpu1-luraess-com/julialang/mpi-dot-jl/test/test_allgather.jl:9
unknown function (ip: 0x77ae2d2bb815)
opal_convertor_pack at /var/lib/buildkite-agent/builds/gpuci-2/julialang/mpi-dot-jl/openmpi/lib/libopen-pal.so.40 (unknown line)
mca_btl_vader_sendi at /var/lib/buildkite-agent/builds/gpuci-2/julialang/mpi-dot-jl/openmpi/lib/openmpi/mca_btl_vader.so (unknown line)
mca_pml_ob1_send_inline.constprop.0 at /var/lib/buildkite-agent/builds/gpuci-2/julialang/mpi-dot-jl/openmpi/lib/openmpi/mca_pml_ob1.so (unknown line)
mca_pml_ob1_send at /var/lib/buildkite-agent/builds/gpuci-2/julialang/mpi-dot-jl/openmpi/lib/openmpi/mca_pml_ob1.so (unknown line)
ompi_coll_base_sendrecv_actual at /var/lib/buildkite-agent/builds/gpuci-2/julialang/mpi-dot-jl/openmpi/lib/libmpi.so (unknown line)
ompi_coll_base_allgather_intra_two_procs at /var/lib/buildkite-agent/builds/gpuci-2/julialang/mpi-dot-jl/openmpi/lib/libmpi.so (unknown line)
ompi_coll_tuned_allgather_intra_dec_fixed at /var/lib/buildkite-agent/builds/gpuci-2/julialang/mpi-dot-jl/openmpi/lib/openmpi/mca_coll_tuned.so (unknown line)
MPI_Allgather at /var/lib/buildkite-agent/builds/gpuci-2/julialang/mpi-dot-jl/openmpi/lib/libmpi.so (unknown line)
MPI_Allgather at /var/lib/buildkite-agent/builds/amdgpu1-luraess-com/julialang/mpi-dot-jl/src/api/generated_api.jl:252 [inlined]
Allgather! at /var/lib/buildkite-agent/builds/amdgpu1-luraess-com/julialang/mpi-dot-jl/src/collective.jl:459
Allgather! at /var/lib/buildkite-agent/builds/amdgpu1-luraess-com/julialang/mpi-dot-jl/src/collective.jl:463 [inlined]
Allgather! at /var/lib/buildkite-agent/builds/amdgpu1-luraess-com/julialang/mpi-dot-jl/src/collective.jl:466
Allgather at /var/lib/buildkite-agent/builds/amdgpu1-luraess-com/julialang/mpi-dot-jl/src/collective.jl:490
unknown function (ip: 0x77ae210a5469)
_jl_invoke at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:3077
jl_apply at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
do_call at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/interpreter.c:126
eval_value at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/interpreter.c:223
eval_body at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/interpreter.c:489
jl_interpret_toplevel_thunk at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/interpreter.c:775
jl_toplevel_eval_flex at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/toplevel.c:934
jl_toplevel_eval_flex at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/toplevel.c:877
ijl_toplevel_eval_in at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/toplevel.c:985
[699] signal (11.2): Segmentation fault
in expression starting at /var/lib/buildkite-agent/builds/amdgpu1-luraess-com/julialang/mpi-dot-jl/test/test_allgather.jl:9
unknown function (ip: 0x7f77886b9815)
opal_convertor_pack at /var/lib/buildkite-agent/builds/gpuci-2/julialang/mpi-dot-jl/openmpi/lib/libopen-pal.so.40 (unknown line)
mca_btl_vader_sendi at /var/lib/buildkite-agent/builds/gpuci-2/julialang/mpi-dot-jl/openmpi/lib/openmpi/mca_btl_vader.so (unknown line)
mca_pml_ob1_send_inline.constprop.0 at /var/lib/buildkite-agent/builds/gpuci-2/julialang/mpi-dot-jl/openmpi/lib/openmpi/mca_pml_ob1.so (unknown line)
mca_pml_ob1_send at /var/lib/buildkite-agent/builds/gpuci-2/julialang/mpi-dot-jl/openmpi/lib/openmpi/mca_pml_ob1.so (unknown line)
ompi_coll_base_sendrecv_actual at /var/lib/buildkite-agent/builds/gpuci-2/julialang/mpi-dot-jl/openmpi/lib/libmpi.so (unknown line)
ompi_coll_base_allgather_intra_two_procs at /var/lib/buildkite-agent/builds/gpuci-2/julialang/mpi-dot-jl/openmpi/lib/libmpi.so (unknown line)
ompi_coll_tuned_allgather_intra_dec_fixed at /var/lib/buildkite-agent/builds/gpuci-2/julialang/mpi-dot-jl/openmpi/lib/openmpi/mca_coll_tuned.so (unknown line)
MPI_Allgather at /var/lib/buildkite-agent/builds/gpuci-2/julialang/mpi-dot-jl/openmpi/lib/libmpi.so (unknown line)
MPI_Allgather at /var/lib/buildkite-agent/builds/amdgpu1-luraess-com/julialang/mpi-dot-jl/src/api/generated_api.jl:252 [inlined]
Allgather! at /var/lib/buildkite-agent/builds/amdgpu1-luraess-com/julialang/mpi-dot-jl/src/collective.jl:459
Allgather! at /var/lib/buildkite-agent/builds/amdgpu1-luraess-com/julialang/mpi-dot-jl/src/collective.jl:463 [inlined]
Allgather! at /var/lib/buildkite-agent/builds/amdgpu1-luraess-com/julialang/mpi-dot-jl/src/collective.jl:466
Allgather at /var/lib/buildkite-agent/builds/amdgpu1-luraess-com/julialang/mpi-dot-jl/src/collective.jl:490
unknown function (ip: 0x7f777c4a5469)
_jl_invoke at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:3077
jl_apply at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
do_call at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/interpreter.c:126
eval_value at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/interpreter.c:223
eval_body at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/interpreter.c:489
jl_interpret_toplevel_thunk at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/interpreter.c:775
jl_toplevel_eval_flex at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/toplevel.c:934
jl_toplevel_eval_flex at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/toplevel.c:877
ijl_toplevel_eval_in at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/toplevel.c:985
eval at ./boot.jl:385 [inlined]
include_string at ./loading.jl:2076
_jl_invoke at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:3077
_include at ./loading.jl:2136
include at ./Base.jl:495
eval at ./boot.jl:385 [inlined]
include_string at ./loading.jl:2076
_jl_invoke at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:3077
_include at ./loading.jl:2136
include at ./Base.jl:495
jfptr_include_46393.1 at /root/.cache/julia-buildkite-plugin/julia_installs/bin/linux/x64/1.10/julia-1.10-latest-linux-x86_64/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:3077
exec_options at ./client.jl:318
_start at ./client.jl:552
jfptr_include_46393.1 at /root/.cache/julia-buildkite-plugin/julia_installs/bin/linux/x64/1.10/julia-1.10-latest-linux-x86_64/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:3077
exec_options at ./client.jl:318
jfptr__start_82729.1 at /root/.cache/julia-buildkite-plugin/julia_installs/bin/linux/x64/1.10/julia-1.10-latest-linux-x86_64/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:3077
jl_apply at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
true_main at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/jlapi.c:582
jl_repl_entrypoint at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/jlapi.c:731
main at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/cli/loader_exe.c:58
unknown function (ip: 0x77ae2d144d8f)
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x4010b8)
Allocations: 7672214 (Pool: 7663210; Big: 9004); GC: 12
_start at ./client.jl:552
jfptr__start_82729.1 at /root/.cache/julia-buildkite-plugin/julia_installs/bin/linux/x64/1.10/julia-1.10-latest-linux-x86_64/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:3077
jl_apply at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
true_main at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/jlapi.c:582
jl_repl_entrypoint at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/jlapi.c:731
main at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/cli/loader_exe.c:58
unknown function (ip: 0x7f7788542d8f)
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x4010b8)
Allocations: 7672216 (Pool: 7663212; Big: 9004); GC: 12
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 0 on node amdgpu1 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
test_allgather.jl: Error During Test at /var/lib/buildkite-agent/builds/amdgpu1-luraess-com/julialang/mpi-dot-jl/test/runtests.jl:75
  Got exception outside of a @test
  failed process: Process(`mpiexec -n 2 /root/.cache/julia-buildkite-plugin/julia_installs/bin/linux/x64/1.10/julia-1.10-latest-linux-x86_64/bin/julia -C native -J/root/.cache/julia-buildkite-plugin/julia_installs/bin/linux/x64/1.10/julia-1.10-latest-linux-x86_64/lib/julia/sys.so --depwarn=yes --check-bounds=yes -g1 --color=yes --startup-file=no --startup-file=no /var/lib/buildkite-agent/builds/amdgpu1-luraess-com/julialang/mpi-dot-jl/test/test_allgather.jl`, ProcessExited(139)) [139]