Deadlock during OOM #706

Closed
norci opened this issue Feb 11, 2021 · 14 comments · Fixed by #734
Labels: bug Something isn't working

Comments

@norci (Contributor) commented Feb 11, 2021

On the master branch, the following code appears to deadlock.

I guess this is related to GC: the GPU memory was full and the CPU usage was at 100%.

using CUDA, Random

for _ = 1:10
    @info "|"
    @sync Threads.@threads for i = 1:200
        NVTX.@range Random.randstring() CUDA.stream!(CuStream()) do
            x = CUDA.rand(100, 100, 100)
            x ./= sum(x; dims = 2)
        end
    end
    @info "-" CUDA.memory_status()
end

output:

# We can see the allocated memory increases...

[ Info: |
Effective GPU memory usage: 95.61% (7.421 GiB/7.762 GiB)
CUDA allocator usage: 5.649 GiB
binned usage: 5.649 GiB (5.649 GiB allocated, 0 bytes cached)
┌ Info: -
└   CUDA.memory_status() = nothing
[ Info: |
^C^C^C^C^CWARNING: Force throwing a SIGINT
^C^C^C^C^C^C^CSegmentation fault (core dumped)

@maleadt, could you have a look? Thanks.

I have some knowledge of CUDA development, but I have no idea about Julia internals, so I'm sorry to say I'm not able to help.

@norci norci added the bug Something isn't working label Feb 11, 2021
@norci norci changed the title CUDA.stream! is not thread safe CUDA.stream! is not thread safe, or memory leak? Feb 11, 2021
@maleadt (Member) commented Feb 11, 2021

I'll have a look. Note that you don't generally need to CUDA.stream!(CuStream()); every task gets its own stream already.
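
For illustration, a minimal sketch of that simplification (my reading of the advice, not code taken from the issue):

using CUDA

# Each task created by @threads already runs on its own task-local stream,
# so the explicit CUDA.stream!(CuStream()) wrapper can be dropped.
Threads.@threads for i in 1:200
    x = CUDA.rand(100, 100, 100)
    x ./= sum(x; dims = 2)
end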

@maleadt maleadt self-assigned this Feb 11, 2021
@norci (Contributor, Author) commented Feb 11, 2021

I'll have a look. Note that you don't generally need to CUDA.stream!(CuStream()); every task gets its own stream already.

does "task" mean a task created by @async or Threads.@spawn ?

I think most Julia functions are synchronized.
if I use @async to create a few tasks, will the scheduler switch to another one, when a function is blocked by a cuda function call?

@maleadt (Member) commented Feb 11, 2021

does "task" mean a task created by @async or Threads.@spawn ?

Both. Each task also gets its own NVTX identifier, so you don't need to do the NVTX.@range randstring().

I think most Julia functions are synchronized.

In the CUDA sense of synchronization? I've been changing that (this also answers your third question), by (1) changing blocking calls to be asynchronous on the task-local stream + an explicit call to synchronize(stream()), and (2) making synchronize() yield back to the Julia scheduler to allow other tasks to run.
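
A rough sketch of that execution model (an illustration based on the description above, not the actual implementation):

using CUDA

@sync for i in 1:4
    Threads.@spawn begin
        # Library calls and kernels are queued asynchronously on this
        # task's own stream (the one returned by stream()).
        x = CUDA.rand(100, 100, 100)
        x ./= sum(x; dims = 2)
        # Blocking only happens at the explicit synchronization, and
        # synchronize() yields to the Julia scheduler so other tasks
        # can run in the meantime.
        synchronize(stream())
    end
end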

@maleadt maleadt changed the title CUDA.stream! is not thread safe, or memory leak? Initial compilation appears to deadlock Feb 11, 2021
@maleadt (Member) commented Feb 11, 2021

Can you try JuliaGPU/GPUCompiler.jl#150? I'm not seeing the deadlock, but the initial compilation was very slow due to the runtime being generated multiple times over.

@norci (Contributor, Author) commented Feb 12, 2021

Can you try JuliaGPU/GPUCompiler.jl#150? I'm not seeing the deadlock, but the initial compilation was very slow due to the runtime being generated multiple times over.

No, I still got the same error with that patch.

@maleadt (Member) commented Feb 12, 2021

Which error? You don't report an error in this issue, but a deadlock.

@norci (Contributor, Author) commented Feb 12, 2021

It's the memory leak:

Effective GPU memory usage: 95.61% (7.421 GiB/7.762 GiB)
binned usage: 5.649 GiB (5.649 GiB allocated, 0 bytes cached)

and the deadlock after the GPU runs out of memory.

@maleadt (Member) commented Feb 12, 2021

Is this with the modified Julia from #707? I can't reproduce this.

@norci (Contributor, Author) commented Feb 12, 2021

This happens on Julia Version 1.6.0-rc1.

@maleadt (Member) commented Feb 12, 2021

Yeah, so please test with that modified build.

@norci (Contributor, Author) commented Feb 12, 2021

Yeah, so please test with that modified build.

The modified build has the same problem.

I'm wondering why this error cannot be reproduced in your environment.

In my test environment, I only added CUDA:

(testcuda) pkg> st --manifest
      Status `/code/testcuda/Manifest.toml`
  [621f4979] AbstractFFTs v1.0.0
  [79e6a3ab] Adapt v3.2.0
  [ab4f0b2a] BFloat16s v0.1.0
  [fa961155] CEnum v0.4.1
  [052768ef] CUDA v2.6.0 `/julia_depot/dev/CUDA`
  [d360d2e6] ChainRulesCore v0.9.28
  [34da2185] Compat v3.25.0
  [864edb3b] DataStructures v0.18.9
  [e2ba6199] ExprTools v0.1.3
  [0c68f7d7] GPUArrays v6.2.0
  [61eb1bfa] GPUCompiler v0.10.0
  [929cbde3] LLVM v3.6.0
  [1914dd2f] MacroTools v0.5.6
  [c03570c3] Memoize v0.4.4
  [872c559c] NNlib v0.7.14
  [bac558e1] OrderedCollections v1.3.3
  [189a3867] Reexport v1.0.0
  [ae029012] Requires v1.1.2
  [6c6a2e73] Scratch v1.0.3
  [a759f4b9] TimerOutputs v0.5.7
...

I got a different error message after I reduced the thread count:

begin
    using CUDA, Random
    for _ = 1:100
        @info "-"^30
        @sync Threads.@threads for i = 1:20
            x = CUDA.rand(100, 100, 100)
            x ./= sum(x; dims = 2)
        end
        CUDA.memory_status()
    end
end

Log:

[ Info: ------------------------------
Effective GPU memory usage: 98.92% (7.678 GiB/7.762 GiB)
CUDA allocator usage: 5.237 GiB
binned usage: 5.237 GiB (5.237 GiB allocated, 0 bytes cached)
[ Info: ------------------------------
ERROR: TaskFailedException
Stacktrace:
 [1] wait
   @ ./task.jl:317 [inlined]
 [2] threading_run(func::Function)
   @ Base.Threads ./threadingconstructs.jl:34
 [3] macro expansion
   @ ./threadingconstructs.jl:93 [inlined]
 [4] macro expansion
   @ ./task.jl:382 [inlined]
 [5] top-level scope
   @ ./REPL[2]:5

    nested task error: CURANDError: memory allocation failed (code 102, CURAND_STATUS_ALLOCATION_FAILED)
    Stacktrace:
      [1] throw_api_error(res::CUDA.CURAND.curandStatus)
        @ CUDA.CURAND /julia_depot/dev/CUDA/lib/curand/error.jl:53
      [2] seed!(rng::CUDA.CURAND.RNG, seed::UInt64, offset::Int64)
        @ CUDA.CURAND /julia_depot/dev/CUDA/lib/curand/random.jl:45
      [3] seed!(rng::CUDA.CURAND.RNG)
        @ CUDA.CURAND /julia_depot/dev/CUDA/lib/curand/random.jl:38
      [4] (::CUDA.CURAND.var"#46#50"{CuContext})()
        @ CUDA.CURAND /julia_depot/dev/CUDA/lib/curand/CURAND.jl:65
      [5] get!
        @ ./iddict.jl:163 [inlined]
      [6] default_rng()
        @ CUDA.CURAND /julia_depot/dev/CUDA/lib/curand/CURAND.jl:43
      [7] rand
        @ /julia_depot/dev/CUDA/src/random.jl:70 [inlined]

Could you try increasing the loop count to make the GPU OOM?

So I think the main problem is a memory leak. But then why the deadlock after OOM?

@maleadt (Member) commented Feb 12, 2021

[61eb1bfa] GPUCompiler v0.10.0

That's not using the proposed fix?

@norci (Contributor, Author) commented Feb 12, 2021

Sorry, I had reverted GPUCompiler before. With GPUCompiler on the tb/lock_runtime branch, the result is the same.

Can we add this code as a test case and run it on the CI system?

This was referenced Feb 23, 2021
@maleadt (Member) commented Feb 23, 2021

I could finally reproduce this :-) A fix is up in #734; could you verify?

@maleadt maleadt changed the title Initial compilation appears to deadlock Deadlock during OOM Feb 23, 2021