Segfault during trampoline allocation when querying occupancy from multiple threads #707

Closed
norci opened this issue Feb 11, 2021 · 8 comments · Fixed by JuliaLang/julia#39621
Labels
bug Something isn't working

Comments

@norci
Contributor

norci commented Feb 11, 2021

I'm using the master branch.

I always get this error in my project, but I haven't been able to reproduce it with a minimal example.

The operation is something like:

x ./= sum(x; dims = 2)
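
For readers unfamiliar with the expression, here is a minimal CPU-only sketch of what it computes. It uses a plain `Array` as a stand-in (an assumption; the array in the report lives on the GPU) and the values are made up for illustration. On the GPU, the `sum(x; dims = 2)` part is what ends up in `mapreducedim!` and `launch_configuration`, as the trace below shows.

```julia
# CPU stand-in for the operation; values are illustrative only.
x = [1.0 2.0 1.0;
     3.0 5.0 2.0]

# Reduce along the second dimension: one total per row, kept as a 2×1 matrix.
row_totals = sum(x; dims = 2)

# Broadcasting divides each row by its own total, updating `x` in place.
x ./= row_totals
```

After this, every row of `x` sums to 1.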

Update: it fails occasionally when using a few @async tasks.

Log:

signal (11): Segmentation fault
in expression starting at REPL[1]:1
trampoline_alloc at /buildworker/worker/package_linux64/build/src/runtime_ccall.cpp:244 [inlined]
jl_get_cfunction_trampoline at /buildworker/worker/package_linux64/build/src/runtime_ccall.cpp:350
#33 at /julia_depot/dev/CUDA/lib/cudadrv/occupancy.jl:64 [inlined]
lock at ./lock.jl:187
unknown function (ip: 0x7fe4ff04df05)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2238 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2420
#launch_configuration#31 at /julia_depot/dev/CUDA/lib/cudadrv/occupancy.jl:63
launch_configuration##kw at /julia_depot/dev/CUDA/lib/cudadrv/occupancy.jl:56 [inlined]
#mapreducedim!#335 at /julia_depot/dev/CUDA/src/mapreduce.jl:194
mapreducedim!##kw at /julia_depot/dev/CUDA/src/mapreduce.jl:143 [inlined]
#_mapreduce#17 at /julia_depot/packages/GPUArrays/WV76E/src/host/mapreduce.jl:62
_mapreduce##kw at /julia_depot/packages/GPUArrays/WV76E/src/host/mapreduce.jl:34 [inlined]
#mapreduce#15 at /julia_depot/packages/GPUArrays/WV76E/src/host/mapreduce.jl:28 [inlined]
mapreduce at /julia_depot/packages/GPUArrays/WV76E/src/host/mapreduce.jl:28 [inlined]
#_sum#684 at ./reducedim.jl:878 [inlined]
_sum at ./reducedim.jl:878 [inlined]
#_sum#683 at ./reducedim.jl:877 [inlined]
_sum at ./reducedim.jl:877 [inlined]
#sum#681 at ./reducedim.jl:873 [inlined]
sum at ./reducedim.jl:873 [inlined]
unknown function (ip: 0x7fe4ff170e31)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2238 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2420
run at /julia_depot/dev/ReinforcementLearningCore/src/core/experiment.jl:45
unknown function (ip: 0x7fe4fc27e9af)
#63 at /julia_depot/dev/CUDA/src/state.jl:540 [inlined]
task_local_storage at ./task.jl:276
stream! at /julia_depot/dev/CUDA/src/state.jl:537
macro expansion at /julia_depot/dev/CUDA/lib/nvtx/highlevel.jl:73 [inlined]
#31 at ./threadingconstructs.jl:169
unknown function (ip: 0x7fe4fc27ab7c)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2238 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2420
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1703 [inlined]
start_task at /buildworker/worker/package_linux64/build/src/task.c:839
unknown function (ip: (nil))
Allocations: 422072602 (Pool: 421921897; Big: 150705); GC: 145
Segmentation fault (core dumped)
@norci norci added the bug Something isn't working label Feb 11, 2021
@maleadt
Member

maleadt commented Feb 11, 2021

What platform and Julia version is this?

@norci
Contributor Author

norci commented Feb 11, 2021

julia> versioninfo()
Julia Version 1.6.0-rc1
Commit a58bdd9010 (2021-02-06 15:49 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: AMD Ryzen 9 3900X 12-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, znver2)
Environment:
  JULIA_DEPOT_PATH = /julia_depot:$JULIA_DEPOT_PATH
  JULIA_PATH = /usr/local/julia
  JULIA_NUM_THREADS = 24
  JULIA_PKG_SERVER = https://mirrors.sjtug.sjtu.edu.cn/julia

@maleadt
Member

maleadt commented Feb 11, 2021

This looks like JuliaLang/julia#38709; but we already have a lock around @cfunction. Maybe some other code is allocating a trampoline? @norci, do you do @cfunction($...) somewhere?
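
To make the `@cfunction($...)` question concrete, here is a small sketch of the two forms. The callback names are hypothetical. It assumes an x86 or x86_64 machine, since the CUDA.jl code in this thread only takes the closure path on those architectures. Without `$` the macro returns a plain function pointer; with `$` it wraps a closure in a `Base.CFunction`, and creating that object is what goes through `jl_get_cfunction_trampoline` in the trace above.

```julia
# Hypothetical top-level callback, used only for illustration.
add_one(x::Cint) = x + Cint(1)

# Form 1: no `$`. The macro evaluates to a plain pointer.
static_cb = @cfunction(add_one, Cint, (Cint,))

# Form 2: `$` on a closure. The result is a `Base.CFunction` object.
function closure_callback_demo()
    offset = Cint(41)
    add_offset = x -> x + offset          # closure capturing `offset`
    cb = @cfunction($add_offset, Cint, (Cint,))
    # Keep `cb` alive while the raw pointer derived from it is in use.
    result = GC.@preserve cb begin
        fptr = Base.unsafe_convert(Ptr{Cvoid}, cb)
        ccall(fptr, Cint, (Cint,), Cint(1))
    end
    return cb, result
end

cb, result = closure_callback_demo()
```

Any code in the same process that uses the second form, not just CUDA.jl, would be creating trampolines too.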

@maleadt maleadt changed the title from "mapreduce function failed, when using CUDA.stream!" to "Segfault during trampoline allocation when querying occupancy from multiple threads" Feb 11, 2021
@maleadt
Member

maleadt commented Feb 11, 2021

Could you try with the following patch to Julia:

diff --git a/src/runtime_ccall.cpp b/src/runtime_ccall.cpp
index 0dd727749f..e6ba5bce04 100644
--- a/src/runtime_ccall.cpp
+++ b/src/runtime_ccall.cpp
@@ -268,9 +268,10 @@ static void trampoline_deleter(void **f)
         free(nval);
 }
 
+static jl_mutex_t trampoline_lock;
+
 // Use of `cache` is not clobbered in JL_TRY
 JL_GCC_IGNORE_START("-Wclobbered")
-// TODO: need a thread lock around the cache access parts of this function
 extern "C" JL_DLLEXPORT
 jl_value_t *jl_get_cfunction_trampoline(
     // dynamic inputs:
@@ -284,6 +285,7 @@ jl_value_t *jl_get_cfunction_trampoline(
     jl_value_t **vals)
 {
     // lookup (fobj, vals) in cache
+    JL_LOCK(&trampoline_lock);
     if (!cache->table)
         htable_new(cache, 1);
     if (fill != jl_emptysvec) {
@@ -295,6 +297,7 @@ jl_value_t *jl_get_cfunction_trampoline(
         }
     }
     void *tramp = ptrhash_get(cache, (void*)fobj);
+    JL_UNLOCK(&trampoline_lock);
     if (tramp != HT_NOTFOUND) {
         assert((jl_datatype_t*)jl_typeof(tramp) == result_type);
         return (jl_value_t*)tramp;
@@ -347,10 +350,12 @@ jl_value_t *jl_get_cfunction_trampoline(
         free(nval);
         jl_rethrow();
     }
+    JL_LOCK(&trampoline_lock);
     tramp = trampoline_alloc();
     ((void**)result)[0] = tramp;
     tramp = init_trampoline(tramp, nval);
     ptrhash_put(cache, (void*)fobj, result);
+    JL_UNLOCK(&trampoline_lock);
     return result;
 }
 JL_GCC_IGNORE_STOP

And the following for CUDA.jl:

diff --git a/lib/cudadrv/occupancy.jl b/lib/cudadrv/occupancy.jl
index 64097df8..731c4392 100644
--- a/lib/cudadrv/occupancy.jl
+++ b/lib/cudadrv/occupancy.jl
@@ -60,12 +60,10 @@ function launch_configuration(fun::CuFunction; shmem::Union{Integer,Base.Callabl
     elseif Sys.ARCH == :x86 || Sys.ARCH == :x86_64
         shmem_cint = threads -> Cint(shmem(threads))
         # `@cfunction` needs a lock currently, https://github.com/JuliaLang/julia/issues/38709
-        cb = lock(_shmem_cb_lock) do
-            @cfunction($shmem_cint, Cint, (Cint,))
-        end
+        cb = @cfunction($shmem_cint, Cint, (Cint,))
         cuOccupancyMaxPotentialBlockSize(blocks_ref, threads_ref, fun, cb, 0, max_threads)
     else
         lock(_shmem_cb_lock) do 

@maleadt
Member

maleadt commented Feb 11, 2021

The call to trampoline_alloc should be locked too; I've edited the diff.
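
For reference, the Julia-side lock that the CUDA.jl diff above removes follows roughly this pattern. This is a standalone sketch, not CUDA.jl code: `CB_LOCK` and `call_with_locked_callback` are hypothetical names, and it assumes x86 or x86_64 as in the diff.

```julia
# Hypothetical global lock, standing in for CUDA.jl's `_shmem_cb_lock`.
const CB_LOCK = ReentrantLock()

function call_with_locked_callback(offset::Cint, arg::Cint)
    add_offset = x -> x + offset          # closure capturing `offset`
    # Only the creation of the closure callback is serialized.
    cb = lock(CB_LOCK) do
        @cfunction($add_offset, Cint, (Cint,))
    end
    # Keep `cb` alive while calling through its raw pointer.
    GC.@preserve cb begin
        fptr = Base.unsafe_convert(Ptr{Cvoid}, cb)
        ccall(fptr, Cint, (Cint,), arg)
    end
end

call_with_locked_callback(Cint(41), Cint(1))
```

Note that the original trace shows the crash happening inside exactly this kind of `lock(...) do` block, so a lock at this level was not enough on its own.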

@maleadt
Member

maleadt commented Feb 11, 2021

Ah wait, I can reproduce this even with the added locks:

function doit()
    a = rand(Int)
    function f()
        a += 1
        a
    end
    cf = @cfunction $f Int ()
    GC.@preserve cf begin
        fptr = Base.unsafe_convert(Ptr{Cvoid}, cf)
        b = ccall(fptr, Int, ())
        @assert a == b
        c = ccall(fptr, Int, ())
        @assert a == c
        @assert b+1 == c
    end
end

@sync Threads.@threads for i = 1:2000000
    doit()
end
signal (11): Segmentation fault
in expression starting at /home/tim/Julia/src/julia/build/release/wip.jl:19
trampoline_alloc at /home/tim/Julia/src/julia/src/runtime_ccall.cpp:244
jl_get_cfunction_trampoline at /home/tim/Julia/src/julia/src/runtime_ccall.cpp:354
doit at /home/tim/Julia/src/julia/build/release/wip.jl:8
macro expansion at /home/tim/Julia/src/julia/build/release/wip.jl:20 [inlined]
#8#threadsfor_fun at ./threadingconstructs.jl:81
#8#threadsfor_fun at ./threadingconstructs.jl:48
unknown function (ip: 0x7f00ed12aebc)
_jl_invoke at /home/tim/Julia/src/julia/src/gf.c:2238
jl_apply_generic at /home/tim/Julia/src/julia/src/gf.c:2420
jl_apply at /home/tim/Julia/src/julia/src/julia.h:1703
start_task at /home/tim/Julia/src/julia/src/task.c:839
unknown function (ip: (nil))
Allocations: 7249168 (Pool: 7249087; Big: 81); GC: 1
zsh: segmentation fault  ./julia wip.jl

@maleadt
Member

maleadt commented Feb 12, 2021

More extensive version of the above locks implemented in JuliaLang/julia#39621. @norci, can you test this out? You can use the built binary from that PR: https://s3.amazonaws.com/julialangnightlies/assert_pretesting/linux/x64/1.7/julia-01cb47fa8e-linux64.tar.gz

@norci
Contributor Author

norci commented Feb 12, 2021

> More extensive version of the above locks implemented in JuliaLang/julia#39621. @norci, can you test this out? You can use the built binary from that PR: https://s3.amazonaws.com/julialangnightlies/assert_pretesting/linux/x64/1.7/julia-01cb47fa8e-linux64.tar.gz

Yes. This patch fixed this issue.
Great!
