-
Notifications
You must be signed in to change notification settings - Fork 203
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deadlock during OOM #706
Comments
I'll have a look. Note that you don't generally need to |
does "task" mean a task created by I think most Julia functions are synchronized. |
Both. Each task also gets its own NVTX identifier, so you don't need to do the
In the CUDA sense of synchronization? I've been changing that (this also answers your third question), by (1) changing blocking calls to be asynchronous on the task-local stream + an explicit call to |
Can you try JuliaGPU/GPUCompiler.jl#150? I'm not seeing the deadlock, but the initial compilation was very slow due to the runtime being generated multiple times over. |
No. |
Which error? You don't report an error in this issue, but a deadlock. |
It's memory leak error.
and dead lock, after the GPU run out of memory, |
Is this with the modified Julia from #707? I can't reproduce this. |
this happens on |
Yeah, so please test with that modified build. |
The modified build has the same problem. I'm wondering why this error can not be reproduced in your env. In my test env, I only added CUDA. (testcuda) pkg> st --manifest
Status `/code/testcuda/Manifest.toml`
[621f4979] AbstractFFTs v1.0.0
[79e6a3ab] Adapt v3.2.0
[ab4f0b2a] BFloat16s v0.1.0
[fa961155] CEnum v0.4.1
[052768ef] CUDA v2.6.0 `/julia_depot/dev/CUDA`
[d360d2e6] ChainRulesCore v0.9.28
[34da2185] Compat v3.25.0
[864edb3b] DataStructures v0.18.9
[e2ba6199] ExprTools v0.1.3
[0c68f7d7] GPUArrays v6.2.0
[61eb1bfa] GPUCompiler v0.10.0
[929cbde3] LLVM v3.6.0
[1914dd2f] MacroTools v0.5.6
[c03570c3] Memoize v0.4.4
[872c559c] NNlib v0.7.14
[bac558e1] OrderedCollections v1.3.3
[189a3867] Reexport v1.0.0
[ae029012] Requires v1.1.2
[6c6a2e73] Scratch v1.0.3
[a759f4b9] TimerOutputs v0.5.7
... I got different error msg, after I reduced the thread number begin
using CUDA, Random
for _ = 1:100
@info "-"^30
@sync Threads.@threads for i = 1:20
x = CUDA.rand(100, 100, 100)
x ./= sum(x; dims = 2)
end
CUDA.memory_status()
end
end log [ Info: ------------------------------
Effective GPU memory usage: 98.92% (7.678 GiB/7.762 GiB)
CUDA allocator usage: 5.237 GiB
binned usage: 5.237 GiB (5.237 GiB allocated, 0 bytes cached)
[ Info: ------------------------------
ERROR: TaskFailedException
Stacktrace:
[1] wait
@ ./task.jl:317 [inlined]
[2] threading_run(func::Function)
@ Base.Threads ./threadingconstructs.jl:34
[3] macro expansion
@ ./threadingconstructs.jl:93 [inlined]
[4] macro expansion
@ ./task.jl:382 [inlined]
[5] top-level scope
@ ./REPL[2]:5
nested task error: CURANDError: memory allocation failed (code 102, CURAND_STATUS_ALLOCATION_FAILED)
Stacktrace:
[1] throw_api_error(res::CUDA.CURAND.curandStatus)
@ CUDA.CURAND /julia_depot/dev/CUDA/lib/curand/error.jl:53
[2] seed!(rng::CUDA.CURAND.RNG, seed::UInt64, offset::Int64)
@ CUDA.CURAND /julia_depot/dev/CUDA/lib/curand/random.jl:45
[3] seed!(rng::CUDA.CURAND.RNG)
@ CUDA.CURAND /julia_depot/dev/CUDA/lib/curand/random.jl:38
[4] (::CUDA.CURAND.var"#46#50"{CuContext})()
@ CUDA.CURAND /julia_depot/dev/CUDA/lib/curand/CURAND.jl:65
[5] get!
@ ./iddict.jl:163 [inlined]
[6] default_rng()
@ CUDA.CURAND /julia_depot/dev/CUDA/lib/curand/CURAND.jl:43
[7] rand
@ /julia_depot/dev/CUDA/src/random.jl:70 [inlined] Could you try increasing the loop count, in order to make the GPU OOM? So I think the main problem is memory leak. then why dead lock after OOM? |
That's not using the proposed fix? |
sorry I reverted GPUCompiler before. Can we add this code as a test case, and run on the CI system? |
I finally could reproduce this :-) A fix is up on #734, could you verify? |
On the master branch.
It seems a deadlock, when executing the following code.
I guess this is related to GC.
the GPU memory was full.
the CPU usage was 100%.
output:
@maleadt ,
Could you have a look? thanks.
I have some knowledge about CUDA dev. But I have no idea about Julia. I'm sorry to say I'm not able to help.
The text was updated successfully, but these errors were encountered: