Error when following example on using multiple GPUs on multiple processes #468

Closed
theogf opened this issue Oct 4, 2020 · 1 comment · Fixed by #471

theogf commented Oct 4, 2020

As described in this Discourse post: https://discourse.julialang.org/t/pmap-with-multiple-gpus/47698
Following the example in https://juliagpu.gitlab.io/CUDA.jl/usage/multigpu/#Scenario-1:-One-GPU-per-process
produces this error:

[ Info: Worker 6 uses CuDevice(4)                                                                                                                                                                    
[ Info: Worker 2 uses CuDevice(0)                                                                                                                                                                    
[ Info: Worker 3 uses CuDevice(1)                                                                                                                                                                    
[ Info: Worker 8 uses CuDevice(6)                                                                                                                                                                    
[ Info: Worker 7 uses CuDevice(5)            
ERROR: [ Info: Worker 9 uses CuDevice(7)                                                                                                                                                             
[ Info: Worker 5 uses CuDevice(3)                                                                                                                                                                    
[ Info: Worker 4 uses CuDevice(2)                                                                                                                                                                    
LoadError: On worker 2:                                                                                                                                                                              
UndefRefError: access to undefined reference                                                                                                                                                         
getindex at ./array.jl:809 [inlined]                                                                                                                                                                 
context at /home/ubuntu/.julia/packages/CUDA/dZvbp/src/state.jl:242 [inlined]                                                                                                                        
device! at /home/ubuntu/.julia/packages/CUDA/dZvbp/src/state.jl:286                                                                                                                                  
device! at /home/ubuntu/.julia/packages/CUDA/dZvbp/src/state.jl:265 [inlined]                                                                                                                        
#2 at /home/ubuntu/ParticleFlow_Exp/julia/scripts/run_swag.jl:15                                                                                                                                     
#110 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Distributed/src/process_messages.jl:309                                                                                
run_work_thunk at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Distributed/src/process_messages.jl:79                                                                       
run_work_thunk at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Distributed/src/process_messages.jl:88                                                                       
#96 at ./task.jl:356                                                                                                                                                                                 
Stacktrace:                                                                                                                                                                                          
 [1] (::Base.var"#770#772")(::Task) at ./asyncmap.jl:178                                                                                                                                             
 [2] foreach(::Base.var"#770#772", ::Array{Any,1}) at ./abstractarray.jl:2009                                                                                                                        
 [3] maptwice(::Function, ::Channel{Any}, ::Array{Any,1}, ::Base.Iterators.Zip{Tuple{Array{Int64,1},CUDA.DeviceSet}}) at ./asyncmap.jl:178                                                           
 [4] wrap_n_exec_twice at ./asyncmap.jl:154 [inlined]                                                                                                                                                
 [5] async_usemap(::var"#1#3", ::Base.Iterators.Zip{Tuple{Array{Int64,1},CUDA.DeviceSet}}; ntasks::Int64, batch_size::Nothing) at ./asyncmap.jl:103                                                  
 [6] #asyncmap#754 at ./asyncmap.jl:81 [inlined]                                                                                                                                                     
 [7] asyncmap(::Function, ::Base.Iterators.Zip{Tuple{Array{Int64,1},CUDA.DeviceSet}}) at ./asyncmap.jl:81       

This happens with CUDA.jl 1.3.3.
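
For reference, the setup that triggers the trace is roughly the following. It is a sketch reconstructed from the linked documentation section and from the stack trace (an `asyncmap` over `zip(workers(), devices())` whose remote thunk calls `device!`). The actual `run_swag.jl` is not shown in this issue, so its exact contents are an assumption.

```julia
using Distributed, CUDA

# One worker process per visible GPU.
addprocs(length(devices()))
@everywhere using CUDA

# Pin each worker to its own device. `asyncmap` issues the remote calls
# concurrently, which is why the "Worker N uses ..." lines arrive unordered.
asyncmap(zip(workers(), devices())) do (p, d)
    remotecall_wait(p) do
        @info "Worker $p uses $d"
        device!(d)  # the UndefRefError in the trace is raised inside this call
    end
end
```

Running this needs a machine with several GPUs; on the 8-GPU system above it creates workers 2 through 9, one per `CuDevice`.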

theogf added the bug label Oct 4, 2020
maleadt (Member) commented Oct 5, 2020

I cannot reproduce. Does this happen consistently? Could you try with CUDA.jl 2.0?
EDIT: ah no, I can reproduce on one specific system.
