Julia crashes with "free(): invalid pointer" or "double free or corruption (out)" on ubuntu 20.04.1 TPU vm #44242

Open
reidsanders opened this issue Feb 18, 2022 · 22 comments


@reidsanders

reidsanders commented Feb 18, 2022

I'm trying to run Julia on a v3-8 TPU VM using the tpu-vm-pt-1.10 image. It crashes on various operations with "free(): invalid pointer". This happens with the 1.7.2 binary, the 1.6.5 LTS, and the conda-forge version, and a similar error occurs when building from source.
For example:

(@v1.6) pkg> generate Demo
  Generating  project Demo:
free(): invalid pointer

signal (6): Aborted
in expression starting at none:0
gsignal at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x7f5bc79ef3ed)
unknown function (ip: 0x7f5bc79f747b)
unknown function (ip: 0x7f5bc79f8cab)
git_mbedtls_stream_global_init at /home/rs/tools/julia-1.6.5/bin/../lib/julia/libgit2.so (unknown line)
init_once at /home/rs/tools/julia-1.6.5/bin/../lib/julia/libgit2.so (unknown line)
__pthread_once_slow at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
git_libgit2_init at /home/rs/tools/julia-1.6.5/bin/../lib/julia/libgit2.so (unknown line)
macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LibGit2/src/error.jl:108 [inlined]
initialize at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LibGit2/src/LibGit2.jl:986
#164 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LibGit2/src/LibGit2.jl:971
lock at ./lock.jl:187
ensure_initialized at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LibGit2/src/LibGit2.jl:967 [inlined]
GitConfig at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LibGit2/src/config.jl:50
GitConfig at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LibGit2/src/config.jl:50 [inlined]
with at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LibGit2/src/types.jl:1156 [inlined]
getconfig at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LibGit2/src/config.jl:160 [inlined]
project at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/generate.jl:30
#generate#3 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/generate.jl:15
generate at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/generate.jl:10 [inlined]
#generate#2 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/generate.jl:8 [inlined]
generate at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/generate.jl:8 [inlined]
#generate_deprecated#1 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/generate.jl:5 [inlined]
generate_deprecated at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/generate.jl:4
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2237 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2419
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1703 [inlined]
do_apply at /buildworker/worker/package_linux64/build/src/builtins.c:670
do_cmd! at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/REPLMode/REPLMode.jl:405
#do_cmd#21 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/REPLMode/REPLMode.jl:386
do_cmd at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/REPLMode/REPLMode.jl:377 [inlined]
#24 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/REPLMode/REPLMode.jl:550
jfptr_YY.24_45436.clone_1 at /home/rs/tools/julia-1.6.5/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2237 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2419
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1703 [inlined]
jl_f__call_latest at /buildworker/worker/package_linux64/build/src/builtins.c:714
#invokelatest#2 at ./essentials.jl:708 [inlined]
invokelatest at ./essentials.jl:706 [inlined]
run_interface at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/REPL/src/LineEdit.jl:2441
jfptr_run_interface_54737.clone_1 at /home/rs/tools/julia-1.6.5/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2237 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2419
run_frontend at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:1126
#44 at ./task.jl:411
jfptr_YY.44_53285.clone_1 at /home/rs/tools/julia-1.6.5/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2237 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2419
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1703 [inlined]
start_task at /buildworker/worker/package_linux64/build/src/task.c:834
Allocations: 2654 (Pool: 2639; Big: 15); GC: 0
Aborted (core dumped)

Machine info:

$ uname -a
Linux *********** 5.11.0-1021-gcp #23~20.04.1-Ubuntu SMP Fri Oct 1 19:04:32 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

I did notice that running libc.so.6 directly to print its version info segfaulted, which makes me suspect that Google is doing something unusual with their glibc.

$ /lib/x86_64-linux-gnu/libc.so.6 
Segmentation fault (core dumped)
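
As a side check, the glibc version can usually be read without executing libc.so.6 itself. The following is a minimal sketch, assuming a glibc-based system where `getconf` and `ldd` are on the PATH. The function name `glibc_version` is made up for illustration.

```shell
# Hypothetical helper: report the C library version without running libc.so.6.
glibc_version() {
    # On glibc systems, getconf GNU_LIBC_VERSION prints the library name and version.
    v=$(getconf GNU_LIBC_VERSION 2>/dev/null) || v=""
    if [ -z "$v" ]; then
        # Fallback: the first line of `ldd --version` also names the libc.
        v=$(ldd --version 2>/dev/null | head -n 1) || v=""
    fi
    # Print "unknown" if neither query produced anything.
    echo "${v:-unknown}"
}

glibc_version
```

If both queries work while executing libc.so.6 still segfaults, that would at least suggest the library file itself is readable and the problem lies in how it runs in this environment.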

I tried to build Julia from source with make and got a related crash:

Stdlibs: ────  40.897405 seconds 59.2925%
    JULIA usr/lib/julia/sys-o.a
munmap_chunk(): invalid pointer

signal (6): Aborted
in expression starting at none:0
gsignal at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)

signal (11): Segmentation fault
in expression starting at none:0
ERROR: Failed to precompile __PackagePrecompilationStatementModule [top-level] to /tmp/jl_a28ZNv/compiled/v1.9/jl_CXWiQH.
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:35
 [2] compilecache(pkg::Base.PkgId, path::String, internal_stderr::IO, internal_stdout::IO, ignore_loaded_modules::Bool)
   @ Base ./loading.jl:1547
 [3] compilecache(pkg::Base.PkgId, path::String)
   @ Base ./loading.jl:1491
 [4] top-level scope
   @ none:3
free(): invalid size

signal (6): Aborted
in expression starting at none:0
gsignal at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x7fe05715f3ed)
unknown function (ip: 0x7fe05716747b)
unknown function (ip: 0x7fe057168cbb)
close_unit_1 at /workspace/srcdir/gcc-11.1.0/libgfortran/io/unit.c:742
close_units at /workspace/srcdir/gcc-11.1.0/libgfortran/io/unit.c:800

signal (11): Segmentation fault
in expression starting at none:0
ERROR: LoadError: failed process: Process(`/home/rs/tools/julia/usr/bin/julia -O0 --sysimage /home/rs/tools/julia/usr/lib/julia/sys.ji --trace-compile=/tmp/jl_a28ZNv/jl_dSBZAYY4Cb --startup-file=no -Cnative -e 'pushfirst!(DEPOT_PATH, "/tmp/jl_a28ZNv");

I found a very similar issue on Discourse, but it had no replies. GitHub issues did not seem to have anything relevant. https://discourse.julialang.org/t/issues-with-julia-installation-on-google-tpu-vm/65783
I posted there and got advice about various flags to try. I tried several combinations without success, though it's not clear whether each failure was for the same reason.

Output:
default_options.txt
USE_BINARYBUILDER=0.txt
USE_BINARYBUILDER=0-USE_BINARY_BUILDER_LIBGIT2=0.txt
USE_BINARYBUILDER=0-USE_SYSTEM_CURL=1.txt
USE_SYSTEM_CURL=1.txt

Those using USE_BINARYBUILDER=0 failed with:

configure: error: --with-nghttp2 was specified but could not find libnghttp2 pkg-config file.
However, specifying the pkg-config path manually did not help:
PKG_CONFIG_PATH=/home/rs/tools/julia/usr/lib/pkgconfig/

Has anyone had success with Julia on TPU VMs? Thanks!

@mkitti
Contributor

mkitti commented Feb 22, 2022

What's the quickest way to access the tpu-vm-pt-1.10?

@reidsanders
Author

reidsanders commented Feb 22, 2022

> What's the quickest way to access the tpu-vm-pt-1.10?

A command like:

gcloud alpha compute tpus tpu-vm create juliatpu \
  --zone=europe-west4-a \
  --accelerator-type=v2-8 \
  --project=_____ \
  --version=tpu-vm-pt-1.10

I have also tested it on v2-alpha with the same result.

@magicknight

Same error on TPU-VM here.
I think it relates to the memory management library used on it. I am looking for a solution.
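
One way to probe the allocator hypothesis is to look at what the dynamic loader is told to inject into every process. This is a minimal sketch, not a diagnosis: `report_preload` is a hypothetical helper name, and it only inspects the standard `LD_PRELOAD` variable and the `/etc/ld.so.preload` file. Whether either is involved on the TPU VM images is an assumption the thread has not confirmed.

```shell
# Hypothetical helper: show libraries the dynamic loader would inject.
report_preload() {
    # LD_PRELOAD lists shared objects loaded before all others, which is
    # a common way to swap in a replacement malloc/free.
    if [ -n "${LD_PRELOAD:-}" ]; then
        echo "LD_PRELOAD=${LD_PRELOAD}"
    else
        echo "LD_PRELOAD is unset or empty"
    fi
    # /etc/ld.so.preload is the system-wide equivalent.
    if [ -s /etc/ld.so.preload ]; then
        echo "/etc/ld.so.preload contains:"
        cat /etc/ld.so.preload
    else
        echo "/etc/ld.so.preload is absent or empty"
    fi
}

report_preload
```

If an allocator library shows up here, that would be a candidate for the mismatched `free()` calls, but it would still need to be confirmed by rerunning without it.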

@mkitti
Contributor

mkitti commented Mar 8, 2022

@jekbradbury, do you have any insight about what may be happening here?

@JosePereiraUA

Sorry, do we have an update on this? Thanks.

@mkitti
Contributor

mkitti commented Jun 29, 2022

Unfortunately no. Could someone produce stack traces with a Julia nightly and Julia 1.8-rc1 available at https://julialang.org/downloads/ ?

@reidsanders
Author

Same error with 1.8.0-rc1:

(@v1.8) pkg> generate Demo
  Generating  project Demo:
free(): invalid pointer

signal (6): Aborted
in expression starting at none:0
gsignal at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x7ffa2a1f126d)
unknown function (ip: 0x7ffa2a1f92fb)
unknown function (ip: 0x7ffa2a1fab2b)
git_mbedtls_stream_global_init at /home/rs/downloads/julia-1.8.0-rc1/bin/../lib/julia/libgit2.so (unknown line)
git_runtime_init at /home/rs/downloads/julia-1.8.0-rc1/bin/../lib/julia/libgit2.so (unknown line)
macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.8/LibGit2/src/error.jl:108 [inlined]
initialize at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.8/LibGit2/src/LibGit2.jl:986
#162 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.8/LibGit2/src/LibGit2.jl:975
lock at ./lock.jl:185
ensure_initialized at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.8/LibGit2/src/LibGit2.jl:971 [inlined]
GitConfig at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.8/LibGit2/src/config.jl:50
GitConfig at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.8/LibGit2/src/config.jl:50 [inlined]
with at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.8/LibGit2/src/types.jl:1159 [inlined]
getconfig at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.8/LibGit2/src/config.jl:160 [inlined]
project at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.8/Pkg/src/generate.jl:26
#generate#1 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.8/Pkg/src/generate.jl:9
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2358 [inlined]
ijl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2540
generate at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.8/Pkg/src/generate.jl:3
jfptr_generate_73162.clone_1 at /home/rs/downloads/julia-1.8.0-rc1/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2358 [inlined]
ijl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2540
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1838 [inlined]
do_apply at /buildworker/worker/package_linux64/build/src/builtins.c:730
do_cmd! at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.8/Pkg/src/REPLMode/REPLMode.jl:406
#do_cmd#21 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.8/Pkg/src/REPLMode/REPLMode.jl:387
do_cmd at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.8/Pkg/src/REPLMode/REPLMode.jl:377 [inlined]
#24 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.8/Pkg/src/REPLMode/REPLMode.jl:551
jfptr_YY.24_76038.clone_1 at /home/rs/downloads/julia-1.8.0-rc1/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2358 [inlined]
ijl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2540
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1838 [inlined]
jl_f__call_latest at /buildworker/worker/package_linux64/build/src/builtins.c:774
#invokelatest#2 at ./essentials.jl:729 [inlined]
invokelatest at ./essentials.jl:726 [inlined]
run_interface at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.8/REPL/src/LineEdit.jl:2510
run_frontend at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.8/REPL/src/REPL.jl:1248
#49 at ./task.jl:482
jfptr_YY.49_63753.clone_1 at /home/rs/downloads/julia-1.8.0-rc1/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2358 [inlined]
ijl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2540
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1838 [inlined]
start_task at /buildworker/worker/package_linux64/build/src/task.c:931
Allocations: 2903 (Pool: 2890; Big: 13); GC: 0
Aborted (core dumped)

With nightly:

(@v1.9) pkg> generate Demo
  Generating  project Demo:
free(): invalid pointer

signal (6): Aborted
in expression starting at none:0
gsignal at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x7fd85966526d)
unknown function (ip: 0x7fd85966d2fb)
unknown function (ip: 0x7fd85966eb2b)
git_mbedtls_stream_global_init at /home/rs/downloads/julia-51bb96857d/bin/../lib/julia/libgit2.so (unknown line)
git_runtime_init at /home/rs/downloads/julia-51bb96857d/bin/../lib/julia/libgit2.so (unknown line)
macro expansion at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/LibGit2/src/error.jl:109 [inlined]
initialize at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/LibGit2/src/LibGit2.jl:986
#162 at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/LibGit2/src/LibGit2.jl:975
lock at ./lock.jl:229
ensure_initialized at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/LibGit2/src/LibGit2.jl:971 [inlined]
GitConfig at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/LibGit2/src/config.jl:50
GitConfig at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/LibGit2/src/config.jl:50 [inlined]
with at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/LibGit2/src/types.jl:1165 [inlined]
getconfig at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/LibGit2/src/config.jl:160 [inlined]
project at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/Pkg/src/generate.jl:26
#generate#1 at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/Pkg/src/generate.jl:9
_jl_invoke at /cache/build/default-amdci4-5/julialang/julia-master/src/gf.c:2393 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-5/julialang/julia-master/src/gf.c:2575
generate at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/Pkg/src/generate.jl:3
jfptr_generate_62315.clone_1 at /home/rs/downloads/julia-51bb96857d/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci4-5/julialang/julia-master/src/gf.c:2393 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-5/julialang/julia-master/src/gf.c:2575
jl_apply at /cache/build/default-amdci4-5/julialang/julia-master/src/julia.h:1846 [inlined]
do_apply at /cache/build/default-amdci4-5/julialang/julia-master/src/builtins.c:730
do_cmd! at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/Pkg/src/REPLMode/REPLMode.jl:406
#do_cmd#21 at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/Pkg/src/REPLMode/REPLMode.jl:387
do_cmd at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/Pkg/src/REPLMode/REPLMode.jl:377 [inlined]
#24 at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/Pkg/src/REPLMode/REPLMode.jl:551
jfptr_YY.24_57874.clone_1 at /home/rs/downloads/julia-51bb96857d/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci4-5/julialang/julia-master/src/gf.c:2393 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-5/julialang/julia-master/src/gf.c:2575
jl_apply at /cache/build/default-amdci4-5/julialang/julia-master/src/julia.h:1846 [inlined]
jl_f__call_latest at /cache/build/default-amdci4-5/julialang/julia-master/src/builtins.c:774
#invokelatest#2 at ./essentials.jl:801 [inlined]
invokelatest at ./essentials.jl:798 [inlined]
run_interface at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/REPL/src/LineEdit.jl:2623
jfptr_run_interface_56136.clone_1 at /home/rs/downloads/julia-51bb96857d/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci4-5/julialang/julia-master/src/gf.c:2393 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-5/julialang/julia-master/src/gf.c:2575
run_frontend at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/REPL/src/REPL.jl:1289
#62 at ./task.jl:499
jfptr_YY.62_56211.clone_1 at /home/rs/downloads/julia-51bb96857d/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci4-5/julialang/julia-master/src/gf.c:2393 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-5/julialang/julia-master/src/gf.c:2575
jl_apply at /cache/build/default-amdci4-5/julialang/julia-master/src/julia.h:1846 [inlined]
start_task at /cache/build/default-amdci4-5/julialang/julia-master/src/task.c:931
Allocations: 2871 (Pool: 2859; Big: 12); GC: 0
Aborted (core dumped)

If there's a more useful stack trace I can produce, let me know.

@mkitti
Contributor

mkitti commented Jul 8, 2022

Thanks. Can we isolate the error to the following line?

initialize at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/LibGit2/src/LibGit2.jl:986

@check ccall((:git_libgit2_init, :libgit2), Cint, ())

In other words, is executing the following line sufficient to crash Julia?

ccall((:git_libgit2_init, :libgit2), Cint, ())
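
To rule out the REPL and any startup file, the same call can be run in a fresh, non-interactive process. A minimal sketch follows. `try_libgit2_init` is a hypothetical wrapper, the path to the `julia` binary is supplied by the caller, and it is an assumption that `:libgit2` resolves to the bundled library the same way outside the REPL.

```shell
# Hypothetical wrapper: run only the libgit2 initialisation in a fresh Julia.
# Usage: try_libgit2_init /path/to/julia
try_libgit2_init() {
    julia_bin=${1:-julia}
    if ! command -v "$julia_bin" >/dev/null 2>&1; then
        echo "julia binary not found: $julia_bin" >&2
        return 127
    fi
    # --startup-file=no skips the user's startup.jl; -e evaluates one expression.
    status=0
    "$julia_bin" --startup-file=no -e 'ccall((:git_libgit2_init, :libgit2), Cint, ())' || status=$?
    echo "julia exited with status $status"
    return "$status"
}
```

In a POSIX shell, a status of 134 (128 + 6) corresponds to the SIGABRT seen in the traces above.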

@mkitti
Contributor

mkitti commented Jul 8, 2022

The next step would be to verify that git_libgit2_init fails from a C program.

@mkitti
Contributor

mkitti commented Jul 8, 2022

The current binary build of LibGit2 included with Julia is taken from https://github.com/JuliaBinaryWrappers/LibGit2_jll.jl
as per the version indicated here: https://github.com/JuliaLang/julia/blob/master/deps/libgit2.version

The build script is here:
https://github.com/JuliaPackaging/Yggdrasil/blob/master/L/LibGit2/build_tarballs.jl

@mkitti
Contributor

mkitti commented Jul 8, 2022

As far as I can tell, none of the bundled patches that Julia applies touches code in the stack trace.

https://github.com/JuliaPackaging/Yggdrasil/tree/master/L/LibGit2/bundled/patches

@mkitti
Contributor

mkitti commented Jul 8, 2022

Is LD_LIBRARY_PATH set to something?
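
If the variable turns out to be set, a quick experiment is to launch the same command with the loader-related variables removed and see whether the crash persists. The sketch below assumes an `env` that supports `-u` (GNU coreutils does). `run_clean` is a hypothetical helper name, and clearing `LD_PRELOAD` as well is an extra guess, not something requested in this thread.

```shell
# Hypothetical helper: run a command with loader overrides removed.
# Usage: run_clean /path/to/julia
run_clean() {
    # env -u NAME removes NAME from the environment of the child only;
    # the calling shell keeps its own settings.
    env -u LD_LIBRARY_PATH -u LD_PRELOAD "$@"
}
```

Because only the child process is affected, the experiment can be repeated with and without the wrapper in the same shell session.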

@reidsanders
Author

It is empty with tpu-vm-tf-2.8.0:

julia-51bb96857d/bin$ echo $LD_LIBRARY_PATH

With tpu-vm-pt-1.11:

julia-1.8.0-rc1/bin$ echo $LD_LIBRARY_PATH
:/usr/local/lib
julia-1.8.0-rc1/bin$ ll /usr/local/lib
total 805628
drwxr-xr-x  5 root root      4096 Mar 16 00:14 ./
drwxr-xr-x 13 root root      4096 Mar 16 00:14 ../
drwxr-xr-x  4 root root      4096 Mar 16 00:14 bazel/
-rw-r--r--  1 root root   2478686 Mar 16 00:13 libiomp5.so
-rw-r--r--  1 root root    624633 Mar 16 00:13 libiomp5_db.so
-rw-r--r--  1 root root    205094 Mar 16 00:13 libiompstubs5.so
-rwxr-xr-x  1 root root  53034392 Mar 16 00:13 libmkl_avx.so.2*
-rwxr-xr-x  1 root root  50137440 Mar 16 00:13 libmkl_avx2.so.2*
-rwxr-xr-x  1 root root  66658456 Mar 16 00:13 libmkl_avx512.so.2*
-rwxr-xr-x  1 root root    523704 Mar 16 00:13 libmkl_blacs_intelmpi_ilp64.so.2*
-rwxr-xr-x  1 root root    320552 Mar 16 00:13 libmkl_blacs_intelmpi_lp64.so.2*
-rwxr-xr-x  1 root root    532928 Mar 16 00:13 libmkl_blacs_openmpi_ilp64.so.2*
-rwxr-xr-x  1 root root    321552 Mar 16 00:13 libmkl_blacs_openmpi_lp64.so.2*
-rwxr-xr-x  1 root root    168912 Mar 16 00:13 libmkl_cdft_core.so.2*
lrwxrwxrwx  1 root root        31 Mar 16 00:13 libmkl_core.so.1 -> /usr/local/lib/libmkl_core.so.2*
-rwxr-xr-x  1 root root  73999168 Mar 16 00:13 libmkl_core.so.2*
-rwxr-xr-x  1 root root  42416560 Mar 16 00:13 libmkl_def.so.2*
-rwxr-xr-x  1 root root  13272328 Mar 16 00:13 libmkl_gf_ilp64.so.2*
-rwxr-xr-x  1 root root  17047584 Mar 16 00:13 libmkl_gf_lp64.so.2*
-rwxr-xr-x  1 root root  30979016 Mar 16 00:13 libmkl_gnu_thread.so.2*
-rwxr-xr-x  1 root root  13277104 Mar 16 00:13 libmkl_intel_ilp64.so.2*
lrwxrwxrwx  1 root root        37 Mar 16 00:13 libmkl_intel_lp64.so.1 -> /usr/local/lib/libmkl_intel_lp64.so.2*
-rwxr-xr-x  1 root root  17056672 Mar 16 00:13 libmkl_intel_lp64.so.2*
lrwxrwxrwx  1 root root        39 Mar 16 00:13 libmkl_intel_thread.so.1 -> /usr/local/lib/libmkl_intel_thread.so.2*
-rwxr-xr-x  1 root root  64858584 Mar 16 00:13 libmkl_intel_thread.so.2*
-rwxr-xr-x  1 root root  48742776 Mar 16 00:13 libmkl_mc.so.2*
-rwxr-xr-x  1 root root  50321512 Mar 16 00:13 libmkl_mc3.so.2*
-rwxr-xr-x  1 root root  38037904 Mar 16 00:13 libmkl_pgi_thread.so.2*
-rwxr-xr-x  1 root root   8695128 Mar 16 00:13 libmkl_rt.so.2*
-rwxr-xr-x  1 root root   7718648 Mar 16 00:13 libmkl_scalapack_ilp64.so.2*
-rwxr-xr-x  1 root root   7736496 Mar 16 00:13 libmkl_scalapack_lp64.so.2*
-rwxr-xr-x  1 root root  29005200 Mar 16 00:13 libmkl_sequential.so.2*
-rwxr-xr-x  1 root root  40617024 Mar 16 00:13 libmkl_tbb_thread.so.2*
-rwxr-xr-x  1 root root  15887648 Mar 16 00:13 libmkl_vml_avx.so.2*
-rwxr-xr-x  1 root root  15038968 Mar 16 00:13 libmkl_vml_avx2.so.2*
-rwxr-xr-x  1 root root  14364256 Mar 16 00:13 libmkl_vml_avx512.so.2*
-rwxr-xr-x  1 root root   7756240 Mar 16 00:13 libmkl_vml_cmpt.so.2*
-rwxr-xr-x  1 root root   8766704 Mar 16 00:13 libmkl_vml_def.so.2*
-rwxr-xr-x  1 root root  14775632 Mar 16 00:13 libmkl_vml_mc.so.2*
-rwxr-xr-x  1 root root  14619984 Mar 16 00:13 libmkl_vml_mc2.so.2*
-rwxr-xr-x  1 root root  14628344 Mar 16 00:13 libmkl_vml_mc3.so.2*
-rwxr-xr-x  1 root root     17904 Mar 16 00:13 libomp-fallback-cstring.o*
-rwxr-xr-x  1 root root      3900 Mar 16 00:13 libomp-fallback-cstring.spv*
-rwxr-xr-x  1 root root    358864 Mar 16 00:13 libomp-spirvdevicertl-optional.o*
-rwxr-xr-x  1 root root      9120 Mar 16 00:13 libomp-spirvdevicertl-required.o*
-rwxr-xr-x  1 root root    110880 Mar 16 00:13 libomptarget-opencl-optional.bc*
-rwxr-xr-x  1 root root      2420 Mar 16 00:13 libomptarget-opencl-required.bc*
-rwxr-xr-x  1 root root   8690912 Mar 16 00:13 libomptarget.rtl.level0.so*
-rwxr-xr-x  1 root root   8673576 Mar 16 00:13 libomptarget.rtl.opencl.so*
-rwxr-xr-x  1 root root   8328280 Mar 16 00:13 libomptarget.rtl.x86_64.so*
-rwxr-xr-x  1 root root    592776 Mar 16 00:13 libomptarget.so*
-rwxr-xr-x  1 root root   2654200 Mar 16 00:13 libtbb.so*
-rwxr-xr-x  1 root root   2654200 Mar 16 00:13 libtbb.so.12*
-rwxr-xr-x  1 root root   2654200 Mar 16 00:13 libtbb.so.12.5*
-rwxr-xr-x  1 root root    211776 Mar 16 00:13 libtbbbind.so*
-rwxr-xr-x  1 root root    211776 Mar 16 00:13 libtbbbind.so.3*
-rwxr-xr-x  1 root root    211776 Mar 16 00:13 libtbbbind.so.3.5*
-rwxr-xr-x  1 root root    211328 Mar 16 00:13 libtbbbind_2_0.so*
-rwxr-xr-x  1 root root    211328 Mar 16 00:13 libtbbbind_2_0.so.3*
-rwxr-xr-x  1 root root    211328 Mar 16 00:13 libtbbbind_2_0.so.3.5*
-rwxr-xr-x  1 root root    216312 Mar 16 00:13 libtbbbind_2_5.so*
-rwxr-xr-x  1 root root    216312 Mar 16 00:13 libtbbbind_2_5.so.3*
-rwxr-xr-x  1 root root    216312 Mar 16 00:13 libtbbbind_2_5.so.3.5*
-rwxr-xr-x  1 root root   1058496 Mar 16 00:13 libtbbmalloc.so*
-rwxr-xr-x  1 root root   1058496 Mar 16 00:13 libtbbmalloc.so.2*
-rwxr-xr-x  1 root root   1058496 Mar 16 00:13 libtbbmalloc.so.2.5*
-rwxr-xr-x  1 root root     75104 Mar 16 00:13 libtbbmalloc_proxy.so*
-rwxr-xr-x  1 root root     75104 Mar 16 00:13 libtbbmalloc_proxy.so.2*
-rwxr-xr-x  1 root root     75104 Mar 16 00:13 libtbbmalloc_proxy.so.2.5*
-rw-r--r--  1 root root    117438 Mar 16 00:13 mkl_msg.cat
drwxrwsr-x  4 root staff     4096 Mar 16 00:11 python2.7/
drwxrwsr-x  3 root staff     4096 Mar  8 22:16 python3.8/

@mkitti
Contributor

mkitti commented Jul 8, 2022

> In other words, is executing the following line sufficient to crash Julia?
>
> ccall((:git_libgit2_init, :libgit2), Cint, ())

Could you try this?

@giordano
Contributor

giordano commented Jul 8, 2022

For the sake of making the package manager usable, you can set the environment variable JULIA_PKG_USE_CLI_GIT=true, which would avoid calling libgit2 in the first place. Of course this doesn't address the underlying issue, which would be great to understand and solve, but unfortunately when you get errors inside binary libraries there isn't much to do apart from firing up a debugger like gdb or lldb and walking your way inside them.
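
The two suggestions above can be sketched as follows. This is only an illustration: `gdb_backtrace` is a hypothetical helper name, and the `-batch`, `-ex` and `--args` options are standard gdb command-line flags rather than anything specific to Julia.

```shell
# Workaround suggested above: ask Pkg to use the git CLI rather than libgit2.
export JULIA_PKG_USE_CLI_GIT=true

# Hypothetical helper: run a command under gdb and print a backtrace when it stops.
# Usage: gdb_backtrace /path/to/julia -e 'ccall((:git_libgit2_init, :libgit2), Cint, ())'
gdb_backtrace() {
    if ! command -v gdb >/dev/null 2>&1; then
        echo "gdb is not installed" >&2
        return 127
    fi
    # -batch exits once the -ex commands have run; "run" starts the program
    # and "bt" prints the stack of the thread that stopped.
    gdb -batch -ex run -ex bt --args "$@"
}
```

Julia may raise signals internally during normal operation, so the first stop under gdb is not guaranteed to be the abort. An interactive session may be needed to continue past earlier stops.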

Side note: in case you want to try again to compile Julia with USE_BINARYBUILDER=0 (and related options), building the dependencies from source as well should have become more robust in the last few months. This is now even tested daily on CI for x86_64-linux-gnu.

@reidsanders
Author

With the nightly build (julia-51bb96857d), the ccall alone does produce the same stack trace.

julia> ccall((:git_libgit2_init, :libgit2), Cint, ())
free(): invalid pointer

signal (6): Aborted
in expression starting at REPL[1]:1
gsignal at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x7f34e6e2329d)
unknown function (ip: 0x7f34e6e2b32b)
unknown function (ip: 0x7f34e6e2cb5b)
git_mbedtls_stream_global_init at /home/rs/downloads/julia-51bb96857d/bin/../lib/julia/libgit2.so (unknown line)
git_runtime_init at /home/rs/downloads/julia-51bb96857d/bin/../lib/julia/libgit2.so (unknown line)
top-level scope at ./REPL[1]:1
jl_toplevel_eval_flex at /cache/build/default-amdci4-5/julialang/julia-master/src/toplevel.c:903
jl_toplevel_eval_flex at /cache/build/default-amdci4-5/julialang/julia-master/src/toplevel.c:856
ijl_toplevel_eval_in at /cache/build/default-amdci4-5/julialang/julia-master/src/toplevel.c:971
eval at ./boot.jl:370 [inlined]
eval_user_input at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/REPL/src/REPL.jl:152
repl_backend_loop at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/REPL/src/REPL.jl:248
#start_repl_backend#46 at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/REPL/src/REPL.jl:233
start_repl_backend##kw at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/REPL/src/REPL.jl:230 [inlined]
#run_repl#59 at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/REPL/src/REPL.jl:372
run_repl at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/REPL/src/REPL.jl:357
jfptr_run_repl_56967.clone_1 at /home/rs/downloads/julia-51bb96857d/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci4-5/julialang/julia-master/src/gf.c:2393 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-5/julialang/julia-master/src/gf.c:2575
#990 at ./client.jl:413
jfptr_YY.990_45119.clone_1 at /home/rs/downloads/julia-51bb96857d/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci4-5/julialang/julia-master/src/gf.c:2393 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-5/julialang/julia-master/src/gf.c:2575
jl_apply at /cache/build/default-amdci4-5/julialang/julia-master/src/julia.h:1846 [inlined]
jl_f__call_latest at /cache/build/default-amdci4-5/julialang/julia-master/src/builtins.c:774
run_main_repl at ./client.jl:397
exec_options at ./client.jl:314
_start at ./client.jl:514
jfptr__start_26754.clone_1 at /home/rs/downloads/julia-51bb96857d/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci4-5/julialang/julia-master/src/gf.c:2393 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-5/julialang/julia-master/src/gf.c:2575
jl_apply at /cache/build/default-amdci4-5/julialang/julia-master/src/julia.h:1846 [inlined]
true_main at /cache/build/default-amdci4-5/julialang/julia-master/src/jlapi.c:567
jl_repl_entrypoint at /cache/build/default-amdci4-5/julialang/julia-master/src/jlapi.c:711
main at /cache/build/default-amdci4-5/julialang/julia-master/cli/loader_exe.c:59
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x401098)
Allocations: 2872 (Pool: 2858; Big: 14); GC: 0
Aborted (core dumped)

Unfortunately, setting JULIA_PKG_USE_CLI_GIT=true does not seem to change anything:

(base) rs@t1v-n-d1477409-w-0:~/downloads/julia-51bb96857d/bin$ echo $JULIA_PKG_USE_CLI_GIT
true

(@v1.9) pkg> generate Demo2
  Generating  project Demo2:
munmap_chunk(): invalid pointer

signal (6): Aborted
in expression starting at none:0
gsignal at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x7f2d3c6e129d)
unknown function (ip: 0x7f2d3c6e932b)
unknown function (ip: 0x7f2d3c6e957b)
git_mbedtls_stream_global_init at /home/rs/downloads/julia-51bb96857d/bin/../lib/julia/libgit2.so (unknown line)
git_runtime_init at /home/rs/downloads/julia-51bb96857d/bin/../lib/julia/libgit2.so (unknown line)
macro expansion at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/LibGit2/src/error.jl:109 [inlined]
initialize at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/LibGit2/src/LibGit2.jl:986
#162 at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/LibGit2/src/LibGit2.jl:975
lock at ./lock.jl:229
ensure_initialized at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/LibGit2/src/LibGit2.jl:971 [inlined]
GitConfig at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/LibGit2/src/config.jl:50
GitConfig at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/LibGit2/src/config.jl:50 [inlined]
with at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/LibGit2/src/types.jl:1165 [inlined]
getconfig at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/LibGit2/src/config.jl:160 [inlined]
project at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/Pkg/src/generate.jl:26
#generate#1 at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/Pkg/src/generate.jl:9
_jl_invoke at /cache/build/default-amdci4-5/julialang/julia-master/src/gf.c:2393 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-5/julialang/julia-master/src/gf.c:2575
generate at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/Pkg/src/generate.jl:3
jfptr_generate_62315.clone_1 at /home/rs/downloads/julia-51bb96857d/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci4-5/julialang/julia-master/src/gf.c:2393 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-5/julialang/julia-master/src/gf.c:2575
jl_apply at /cache/build/default-amdci4-5/julialang/julia-master/src/julia.h:1846 [inlined]
do_apply at /cache/build/default-amdci4-5/julialang/julia-master/src/builtins.c:730
do_cmd! at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/Pkg/src/REPLMode/REPLMode.jl:406
#do_cmd#21 at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/Pkg/src/REPLMode/REPLMode.jl:387
do_cmd at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/Pkg/src/REPLMode/REPLMode.jl:377 [inlined]
#24 at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/Pkg/src/REPLMode/REPLMode.jl:551
jfptr_YY.24_57874.clone_1 at /home/rs/downloads/julia-51bb96857d/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci4-5/julialang/julia-master/src/gf.c:2393 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-5/julialang/julia-master/src/gf.c:2575
jl_apply at /cache/build/default-amdci4-5/julialang/julia-master/src/julia.h:1846 [inlined]
jl_f__call_latest at /cache/build/default-amdci4-5/julialang/julia-master/src/builtins.c:774
#invokelatest#2 at ./essentials.jl:801 [inlined]
invokelatest at ./essentials.jl:798 [inlined]
run_interface at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/REPL/src/LineEdit.jl:2623
jfptr_run_interface_56136.clone_1 at /home/rs/downloads/julia-51bb96857d/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci4-5/julialang/julia-master/src/gf.c:2393 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-5/julialang/julia-master/src/gf.c:2575
run_frontend at /cache/build/default-amdci4-5/julialang/julia-master/usr/share/julia/stdlib/v1.9/REPL/src/REPL.jl:1289
#62 at ./task.jl:499
jfptr_YY.62_56211.clone_1 at /home/rs/downloads/julia-51bb96857d/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci4-5/julialang/julia-master/src/gf.c:2393 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-5/julialang/julia-master/src/gf.c:2575
jl_apply at /cache/build/default-amdci4-5/julialang/julia-master/src/julia.h:1846 [inlined]
start_task at /cache/build/default-amdci4-5/julialang/julia-master/src/task.c:931
Allocations: 2868 (Pool: 2857; Big: 11); GC: 0
Aborted (core dumped)

@giordano
Contributor

giordano commented Jul 9, 2022

Unfortunately setting JULIA_PKG_USE_CLI_GIT=true does not seem to do anything.

Did you export the variable? What's the value of ENV["JULIA_PKG_USE_CLI_GIT"] inside Julia?
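One way to check the export question from the shell side is to ask a child process what it sees. This is only a sketch: `child_sees` is a hypothetical helper name, and `sh -c` stands in for launching `julia` so the example runs without Julia installed.

```shell
# child_sees NAME: print the value a child process sees for NAME,
# or "<unset>" if the variable is not in the child's environment.
# Hypothetical helper; `sh -c` stands in for starting julia.
child_sees() {
    sh -c 'printenv "$1" || echo "<unset>"' _ "$1"
}

# Run the demo in a subshell so the caller's environment is left untouched.
(
    unset JULIA_PKG_USE_CLI_GIT
    JULIA_PKG_USE_CLI_GIT=true           # plain shell variable, not inherited
    child_sees JULIA_PKG_USE_CLI_GIT     # prints <unset>

    export JULIA_PKG_USE_CLI_GIT         # exported, inherited by children
    child_sees JULIA_PKG_USE_CLI_GIT     # prints true
)
```

If the second form prints the value, Julia started from the same shell should report it in ENV as well.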

@reidsanders
Author

reidsanders commented Jul 9, 2022

Yes, I exported the variable. I also tried both true and 1.

julia> ENV["JULIA_PKG_USE_CLI_GIT"]
"1"

julia> ENV["JULIA_PKG_USE_CLI_GIT"]
"true"

@KristofferC
Sponsor Member

JULIA_PKG_USE_CLI_GIT only controls how things are downloaded. libgit2 is still used for some other things (like creating a git repo), e.g. during the Julia precompile stage.

@mkitti
Contributor

mkitti commented Jul 9, 2022

What does the trace look like for the conda-forge build?

That build uses libgit2 built against openssl rather than mbedtls, so the stack trace should not include git_mbedtls_stream_global_init.

https://github.com/conda-forge/libgit2-feedstock/blob/214137e20348d3132361c7260952b465c9373e71/recipe/meta.yaml#L28
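To confirm which TLS library a given libgit2.so is dynamically linked against, one option is to scan its `ldd` output. The sketch below is an assumption-laden helper, not part of either build: `tls_backend` is a hypothetical name, it only recognizes library names containing `libmbedtls` or `libssl`, and it would miss a statically linked backend.

```shell
# tls_backend: read `ldd` output on stdin and name the TLS library it mentions.
# Hypothetical helper; only sees dynamic dependencies.
tls_backend() {
    case "$(cat)" in
        *libmbedtls*) echo mbedtls ;;
        *libssl*)     echo openssl ;;
        *)            echo unknown ;;
    esac
}

# Usage (the path is a placeholder for wherever your libgit2.so lives):
#   ldd /path/to/lib/julia/libgit2.so | tls_backend
```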

Otherwise, I would try to build libgit2 directly and see if their tests are passing.

Maybe @giordano knows how to set this up with our bundled patches applied?
https://github.com/libgit2/libgit2/blob/main/CMakeLists.txt

@inkydragon
Sponsor Member

@mkitti If you just want to build libgit2 with Julia's bundled patches, here are the steps:

git clone https://github.com/libgit2/libgit2.git

# get patch
wget https://raw.githubusercontent.com/JuliaPackaging/Yggdrasil/master/L/LibGit2/bundled/patches/libgit2-agent-nonfatal.patch
wget https://raw.githubusercontent.com/JuliaPackaging/Yggdrasil/master/L/LibGit2/bundled/patches/libgit2-hostkey.patch
wget https://raw.githubusercontent.com/JuliaPackaging/Yggdrasil/master/L/LibGit2/bundled/patches/libgit2-win32-ownership.patch

# checkout version
cd libgit2/
git checkout v1.4.3

# apply patch
patch -p1 -f < ../libgit2-agent-nonfatal.patch
patch -p1 -f < ../libgit2-hostkey.patch
patch -p1 -f < ../libgit2-win32-ownership.patch

# build flags
LIBGIT2_BUILD_FLAGS="-DCMAKE_BUILD_TYPE=Release -DUSE_THREADS=ON -DUSE_BUNDLED_ZLIB=ON -DUSE_SHA1=CollisionDetection"
# enable tests and examples
LIBGIT2_BUILD_FLAGS="$LIBGIT2_BUILD_FLAGS -DBUILD_TESTS=ON -DBUILD_EXAMPLES=ON"
# use OpenSSL (no embedded quotes, so cmake sees the bare value)
LIBGIT2_BUILD_FLAGS="$LIBGIT2_BUILD_FLAGS -DUSE_HTTPS=OpenSSL"

# build
mkdir build && cd build
# leave $LIBGIT2_BUILD_FLAGS unquoted so sh/bash word splitting passes each flag as its own argument
cmake .. $LIBGIT2_BUILD_FLAGS
make -j "$(nproc)"

Julia enables SSH support when building libgit2, using libssh2 with some patches.
I assume that whether SSH is enabled (-DUSE_SSH=ON) is irrelevant to this issue.

@giordano
Contributor

Mentioning this here as well: various Google environments (Google Cloud Platform, Google Colab, etc.) are known to set LD_PRELOAD to preload tcmalloc, which breaks lots of software, not just Julia. When using GCP, Colab, or other Google-provided virtual machines, make sure LD_PRELOAD and LD_LIBRARY_PATH are not forcing the use of external libraries, which can cause problems.
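A minimal sketch of that check: print the two variables, then start a program with both removed from that program's environment only. `show_loader_env` and `run_clean` are hypothetical helper names, and the usage line assumes `julia` is on PATH.

```shell
# Print what the current environment would hand to the dynamic loader.
show_loader_env() {
    echo "LD_PRELOAD=${LD_PRELOAD-<unset>}"
    echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH-<unset>}"
}

# Run a command with both variables removed from that command's
# environment only; the calling shell keeps its own settings.
run_clean() {
    env -u LD_PRELOAD -u LD_LIBRARY_PATH "$@"
}

show_loader_env

# Usage (assumes julia is on PATH):
#   run_clean julia
```

If the crash disappears under `run_clean`, a preloaded allocator or an overridden library path is a likely culprit, and the variable can then be unset in the shell profile or the VM image configuration.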
