
GPU on cluster: conversion to pointer not defined for CuArray{Float32,2} #286

Closed
viaudg opened this issue Jun 7, 2018 · 2 comments

@viaudg commented Jun 7, 2018

I built julia-0.6.2 from source on a computing cluster with a GPU node, using cmake-3.11.2, git-2.17.0, and gcc-8.1.0. I then installed Flux v"0.5.1", LLVM v"0.5.1" (as specified by the note there), and CuArrays v"0.6.0"; while building the latter, I encountered the following warning:

WARNING: could not find CUDNN, its functionality will be unavailable

But no error. When I then tried to launch the mnist/conv.jl example from model-zoo, I had the following error:

ERROR: LoadError: conversion to pointer not defined for CuArray{Float32,2}
Stacktrace:
 [1] #conv2d!#43(::Tuple{Int64,Int64}, ::Tuple{Int64,Int64}, ::Int64, ::Int64, ::Function, ::CuArray{Float32,4}, ::CuArray{Float32,4}, ::CuArray{Float32,4}) at /home/viaudg/.julia/v0.6/NNlib/src/impl/conv.jl:174
 [2] (::NNlib.#kw##conv2d!)(::Array{Any,1}, ::NNlib.#conv2d!, ::CuArray{Float32,4}, ::CuArray{Float32,4}, ::CuArray{Float32,4}) at ./<missing>:0
 [3] (::NNlib.#kw##conv!)(::Array{Any,1}, ::NNlib.#conv!, ::CuArray{Float32,4}, ::CuArray{Float32,4}, ::CuArray{Float32,4}) at ./<missing>:0
 [4] #conv#53(::Tuple{Int64,Int64}, ::Tuple{Int64,Int64}, ::Function, ::CuArray{Float32,4}, ::CuArray{Float32,4}) at /home/viaudg/.julia/v0.6/NNlib/src/conv.jl:29
 [5] (::NNlib.#kw##conv)(::Array{Any,1}, ::NNlib.#conv, ::CuArray{Float32,4}, ::CuArray{Float32,4}) at ./<missing>:0
 [6] track(::Flux.Tracker.Call{Flux.Tracker.#_conv,Tuple{CuArray{Float32,4},TrackedArray{…,CuArray{Float32,4}},Tuple{Int64,Int64},Tuple{Int64,Int64}}}) at /home/viaudg/.julia/v0.6/Flux/src/tracker/Tracker.jl:41
 [7] #conv#21(::Tuple{Int64,Int64}, ::Tuple{Int64,Int64}, ::Function, ::CuArray{Float32,4}, ::TrackedArray{…,CuArray{Float32,4}}) at /home/viaudg/.julia/v0.6/Flux/src/tracker/array.jl:235
 [8] (::NNlib.#kw##conv)(::Array{Any,1}, ::NNlib.#conv, ::CuArray{Float32,4}, ::TrackedArray{…,CuArray{Float32,4}}) at ./<missing>:0
 [9] (::Flux.Conv{2,NNlib.#relu,TrackedArray{…,CuArray{Float32,4}},TrackedArray{…,CuArray{Float32,1}}})(::CuArray{Float32,4}) at /home/viaudg/.julia/v0.6/Flux/src/layers/conv.jl:39
 [10] mapfoldl_impl(::Base.#identity, ::Flux.##81#82, ::CuArray{Float32,4}, ::Array{Any,1}, ::Int64) at ./reduce.jl:43
 [11] (::Flux.Chain)(::CuArray{Float32,4}) at /home/viaudg/.julia/v0.6/Flux/src/layers/basic.jl:31
 [12] include_from_node1(::String) at ./loading.jl:576
 [13] include(::String) at ./sysimg.jl:14
 [14] process_options(::Base.JLOptions) at ./client.jl:305
 [15] _start() at ./client.jl:371
while loading /workdir/viaudg/model-zoo/mnist/conv.jl, in expression starting on line 30

Does anyone have any idea why? Is this related to the CUDNN warning?
I apologize if this is not the best repository for such an issue; I thought I'd give it a chance here before going to CuArrays.
Many thanks.
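
For reference, the install sequence above corresponds roughly to the following on Julia 0.6 (a sketch; the exact commands and pins I used on the cluster may have differed slightly):

Pkg.add("Flux");     Pkg.pin("Flux", v"0.5.1")
Pkg.add("LLVM");     Pkg.pin("LLVM", v"0.5.1")
Pkg.add("CuArrays")    # the "could not find CUDNN" warning appeared while this built
Pkg.build("CuArrays")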

@MikeInnes (Member) commented Jun 7, 2018

I think what's happening here is that because CUDNN isn't available, we're falling back to CPU BLAS and that's causing this issue. It'd be good to have a better error message.

Either way, you can fix this by just installing CUDNN.

@viaudg (Author) commented Jun 7, 2018

Thank you, that's what I suspected as well. I've installed CuDNN. For those interested, I did so by following this procedure; since I don't have sudo rights on this cluster and could not copy the files to /usr/local/, I also needed to add the path to libcudnn to LD_LIBRARY_PATH. I then rebuilt the CuArrays package and now it seems to work fine: at least, watch -n 1 nvidia-smi shows that things are happening on the GPU.
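
Concretely, the workaround looked roughly like this (the unpack location below is a placeholder for wherever the cuDNN archive ended up):

# In the shell / job script, before launching Julia; $HOME/cudnn is a placeholder:
#   export LD_LIBRARY_PATH=$HOME/cudnn/lib64:$LD_LIBRARY_PATH
# Then, from Julia:
Pkg.build("CuArrays")   # rebuild so the package can pick up the newly visible libcudnn
using CuArrays          # the CUDNN warning should be gone now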

I'm encountering another error now. It doesn't seem GPU-related, yet surprisingly I didn't get it when running on the CPU:

ERROR: LoadError: Broadcast output type Any is not concrete
Stacktrace:
 [1] broadcast_t at /home/viaudg/.julia/v0.6/CuArrays/src/broadcast.jl:34 [inlined]
 [2] broadcast_c at /home/viaudg/.julia/v0.6/CuArrays/src/broadcast.jl:63 [inlined]
 [3] broadcast at ./broadcast.jl:455 [inlined]
 [4] tracked_broadcast(::Function, ::Flux.OneHotMatrix{CuArray{Flux.OneHotVector,1}}, ::TrackedArray{…,CuArray{Float32,2}}, ::Int64) at /home/viaudg/.julia/v0.6/Flux/src/tracker/array.jl:278
 [5] #crossentropy#71(::Int64, ::Function, ::TrackedArray{…,CuArray{Float32,2}}, ::Flux.OneHotMatrix{CuArray{Flux.OneHotVector,1}}) at /home/viaudg/.julia/v0.6/Flux/src/layers/stateless.jl:8
 [6] crossentropy(::TrackedArray{…,CuArray{Float32,2}}, ::Flux.OneHotMatrix{CuArray{Flux.OneHotVector,1}}) at /home/viaudg/.julia/v0.6/Flux/src/layers/stateless.jl:8
 [7] loss(::CuArray{Float32,4}, ::Flux.OneHotMatrix{CuArray{Flux.OneHotVector,1}}) at /gpfs/workdir/viaudg/model-zoo/mnist/conv.jl:32
 [8] macro expansion at /home/viaudg/.julia/v0.6/Flux/src/optimise/train.jl:39 [inlined]
 [9] macro expansion at /home/viaudg/.julia/v0.6/Juno/src/progress.jl:119 [inlined]
 [10] #train!#130(::Flux.#throttled#14, ::Function, ::Function, ::Array{Tuple{CuArray{Float32,4},Flux.OneHotMatrix{CuArray{Flux.OneHotVector,1}}},1}, ::Flux.Optimise.##71#75) at /home/viaudg/.julia/v0.6/Flux/src/optimise/train.jl:38
 [11] (::Flux.Optimise.#kw##train!)(::Array{Any,1}, ::Flux.Optimise.#train!, ::Function, ::Array{Tuple{CuArray{Float32,4},Flux.OneHotMatrix{CuArray{Flux.OneHotVector,1}}},1}, ::Function) at ./<missing>:0
 [12] include_from_node1(::String) at ./loading.jl:576
 [13] include(::String) at ./sysimg.jl:14
 [14] process_options(::Base.JLOptions) at ./client.jl:305
 [15] _start() at ./client.jl:371
while loading /gpfs/workdir/viaudg/model-zoo/mnist/conv.jl, in expression starting on line 39

I will try to find out what is happening, but if in the meantime anyone knows where this comes from, feel free to share :)
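
For context, the failing call in the trace corresponds to something like the following minimal sketch (layer sizes and shapes here are made up for illustration; the actual conv.jl script differs):

using Flux, CuArrays
using Flux: onehotbatch, crossentropy

# A small convolutional model and a one-hot target batch, both moved to the GPU:
m = Chain(
  Conv((2,2), 1=>16, relu),
  x -> reshape(x, :, size(x, 4)),
  Dense(27*27*16, 10), softmax) |> gpu

x = gpu(rand(Float32, 28, 28, 1, 32))      # dummy MNIST-shaped batch
y = gpu(onehotbatch(rand(0:9, 32), 0:9))   # OneHotMatrix backed by a CuArray

loss(x, y) = crossentropy(m(x), y)         # conv.jl line 32 in the trace above
loss(x, y)                                 # raises "Broadcast output type Any is not concrete"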

EDIT: this is #201
