Switching devices causes GC errors #731

Closed
marius311 opened this issue Feb 23, 2021 · 1 comment · Fixed by #732
Labels
bug (Something isn't working), cuda array (Stuff about CuArray.)

Comments

@marius311
Contributor

Allocating an array, switching devices, and then triggering GC seems to cause errors. It's unclear to me to what extent this is supposed to work or whether this is too experimental, but it certainly hampers single-process multi-GPU work quite a bit (which otherwise seems very doable), so if there's an easy fix it'd be great to have one.

Here's a MWE (Julia 1.6, CUDA 2.6.1):

julia> using CUDA

julia> device!(0)

julia> x = CUDA.rand(2,2)
2×2 CuArray{Float32, 2}:
 0.386771  0.448549
 0.419093  0.383297

julia> device!(1)

julia> x = nothing

julia> GC.gc(true)
WARNING: Error while freeing CuPtr{Nothing}(0x00002aab9fe30000):
Base.KeyError(key=CUDA.CuPtr{Nothing}(0x00002aab9fe30000))

The bug is easy enough to understand: this line looks up the pointer in the pool for the current device rather than the one on which it was allocated, so it's not there.

Stacktrace:
  [1] getindex
    @ ./dict.jl:482 [inlined]
  [2] free
    @ ~/.julia/packages/CUDA/Zmd60/src/pool.jl:347 [inlined]
  [3] unsafe_free!(xs::CuArray{Float32, 2})
    @ CUDA ~/.julia/packages/CUDA/Zmd60/src/array.jl:42
  [4] gc(full::Bool)
    @ Base.GC ./gcutils.jl:94
  [5] top-level scope
    @ REPL[6]:1
  [6] eval(m::Module, e::Any)
    @ Core ./boot.jl:360
  [7] eval_user_input(ast::Any, backend::REPL.REPLBackend)
    @ REPL /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:139
  [8] repl_backend_loop(backend::REPL.REPLBackend)
    @ REPL /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:200
  [9] start_repl_backend(backend::REPL.REPLBackend, consumer::Any)
    @ REPL /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:185
 [10] run_repl(repl::REPL.AbstractREPL, consumer::Any; backend_on_current_task::Bool)
    @ REPL /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:317
 [11] run_repl(repl::REPL.AbstractREPL, consumer::Any)
    @ REPL /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:305
 [12] (::Base.var"#875#877"{Bool, Bool, Bool})(REPL::Module)
    @ Base ./client.jl:387
 [13] #invokelatest#2
    @ ./essentials.jl:707 [inlined]
 [14] invokelatest
    @ ./essentials.jl:706 [inlined]
 [15] run_main_repl(interactive::Bool, quiet::Bool, banner::Bool, history_file::Bool, color_set::Bool)
    @ Base ./client.jl:372
 [16] exec_options(opts::Base.JLOptions)
    @ Base ./client.jl:302
 [17] _start()
    @ Base ./client.jl:485
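
To make the failure mode described above concrete, here is a minimal sketch that needs no GPU. It is a toy model, not CUDA.jl's actual pool code: `pools`, `current_device`, `Buffer`, `alloc`, `free_current` and `free_owner` are all hypothetical names made up for illustration. It assumes one pool per device, keyed by pointer, and shows one possible remedy (recording the owning device at allocation time), which is not necessarily what #732 does.

```julia
# Toy model (assumption): one pool per device, mapping pointer => size in bytes.
const pools = Dict{Int,Dict{UInt64,Int}}(
    0 => Dict{UInt64,Int}(),
    1 => Dict{UInt64,Int}(),
)
const current_device = Ref(0)

struct Buffer
    ptr::UInt64
    dev::Int   # device that was current when the buffer was allocated
end

# Register the pointer in the pool of the current device.
function alloc(ptr::UInt64, bytes::Int)
    pools[current_device[]][ptr] = bytes
    return Buffer(ptr, current_device[])
end

# Mirrors the reported behaviour: look in the pool of whichever device is current.
free_current(buf::Buffer) = pop!(pools[current_device[]], buf.ptr)

# Possible remedy: look in the pool of the device recorded at allocation time.
free_owner(buf::Buffer) = pop!(pools[buf.dev], buf.ptr)

buf = alloc(0x00002aab9fe30000, 16)   # allocated while device 0 is current
current_device[] = 1                   # analogous to device!(1)

# Freeing via the current device's pool fails, because the pointer lives in pool 0.
failed = try
    free_current(buf)
    false
catch err
    err isa KeyError
end

# Freeing via the owning device's pool succeeds and returns the stored size.
freed_bytes = free_owner(buf)
```

In this model `free_current` throws a `KeyError` after the device switch, just like the warning in the MWE, while `free_owner` finds the entry regardless of which device is current.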
@maleadt
Member

maleadt commented Feb 23, 2021

Thanks for the clear bug report and MWE!

@maleadt added the "cuda array" label on Feb 23, 2021