similar(x, (1,2,3)) and similar(x,1,2,3) differ for TrackedArray on GPU #734

Open
aterenin opened this Issue Apr 13, 2019 · 4 comments

@aterenin

commented Apr 13, 2019

MWE below. This approach was suggested to me on Slack as a workaround for image upsampling, since CuArrays does not support repeat.

randn(Float32, (4,4,64,1)) |> gpu |> param |> x -> begin
    r = reshape(x, (4,1,4,1,64,1))
    w = similar(x, (1,2,1,2,1,1))
    fill!(w, 1.0f0)
    r .* w
  end

Replacing it with w = similar(x, 1,2,1,2,1,1) or with w = similar(x |> Flux.data, (1,2,1,2,1,1)) fixes the issue. Removing gpu or param also makes the error go away.
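For reference, a minimal sketch of the working variants plugged into the same pipeline (only the w line changes, everything else as above):

randn(Float32, (4,4,64,1)) |> gpu |> param |> x -> begin
    r = reshape(x, (4,1,4,1,64,1))
    # varargs form of similar: this one works
    w = similar(x, 1,2,1,2,1,1)
    # alternatively, calling similar on the underlying CuArray also works:
    # w = similar(x |> Flux.data, (1,2,1,2,1,1))
    fill!(w, 1.0f0)
    r .* w
  end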

ERROR: GPU compilation of #23(CuArrays.CuKernelState, CUDAnative.CuDeviceArray{Float32,6,CUDAnative.AS.Global}, Base.Broadcast.Broadcasted{Nothing,NTuple{6,Base.OneTo{Int64}},typeof(*),Tuple{Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{Float32,6,CUDAnative.AS.Global},NTuple{6,Bool},NTuple{6,Int64}},Base.Broadcast.Extruded{Array{Float32,6},NTuple{6,Bool},NTuple{6,Int64}}}}) failed
KernelError: passing and using non-bitstype argument

Argument 4 to your kernel function is of type Base.Broadcast.Broadcasted{Nothing,NTuple{6,Base.OneTo{Int64}},typeof(*),Tuple{Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{Float32,6,CUDAnative.AS.Global},NTuple{6,Bool},NTuple{6,Int64}},Base.Broadcast.Extruded{Array{Float32,6},NTuple{6,Bool},NTuple{6,Int64}}}}.
That type is not isbits, and such arguments are only allowed when they are unused by the kernel.

Stacktrace:
 [1] check_invocation(::CUDAnative.CompilerContext, ::LLVM.Function) at /home/at6617/.julia/packages/CUDAnative/PFgO3/src/compiler/validation.jl:35
 [2] compile(::CUDAnative.CompilerContext) at /home/at6617/.julia/packages/CUDAnative/PFgO3/src/compiler/driver.jl:94
 [3] #compile#109(::Bool, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::VersionNumber, ::Any, ::Any) at /home/at6617/.julia/packages/CUDAnative/PFgO3/src/compiler/driver.jl:45
 [4] compile at /home/at6617/.julia/packages/CUDAnative/PFgO3/src/compiler/driver.jl:43 [inlined]
 [5] #compile#108(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::CUDAdrv.CuDevice, ::Function, ::Any) at /home/at6617/.julia/packages/CUDAnative/PFgO3/src/compiler/driver.jl:18
 [6] compile at /home/at6617/.julia/packages/CUDAnative/PFgO3/src/compiler/driver.jl:16 [inlined]
 [7] macro expansion at /home/at6617/.julia/packages/CUDAnative/PFgO3/src/execution.jl:269 [inlined]
 [8] #cufunction#123(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::typeof(CUDAnative.cufunction), ::getfield(GPUArrays, Symbol("##23#24")), ::Type{Tuple{CuArrays.CuKernelState,CUDAnative.CuDeviceArray{Float32,6,CUDAnative.AS.Global},Base.Broadcast.Broadcasted{Nothing,NTuple{6,Base.OneTo{Int64}},typeof(*),Tuple{Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{Float32,6,CUDAnative.AS.Global},NTuple{6,Bool},NTuple{6,Int64}},Base.Broadcast.Extruded{Array{Float32,6},NTuple{6,Bool},NTuple{6,Int64}}}}}}) at /home/at6617/.julia/packages/CUDAnative/PFgO3/src/execution.jl:240
 [9] cufunction(::Function, ::Type) at /home/at6617/.julia/packages/CUDAnative/PFgO3/src/execution.jl:240
 [10] macro expansion at /home/at6617/.julia/packages/CUDAnative/PFgO3/src/execution.jl:208 [inlined]
 [11] macro expansion at ./gcutils.jl:87 [inlined]
 [12] macro expansion at /home/at6617/.julia/packages/CUDAnative/PFgO3/src/execution.jl:205 [inlined]
 [13] _gpu_call(::CuArrays.CuArrayBackend, ::Function, ::CuArray{Float32,6}, ::Tuple{CuArray{Float32,6},Base.Broadcast.Broadcasted{Nothing,NTuple{6,Base.OneTo{Int64}},typeof(*),Tuple{Base.Broadcast.Extruded{CuArray{Float32,6},NTuple{6,Bool},NTuple{6,Int64}},Base.Broadcast.Extruded{Array{Float32,6},NTuple{6,Bool},NTuple{6,Int64}}}}}, ::Tuple{Tuple{Int64},Tuple{Int64}}) at /home/at6617/.julia/packages/CuArrays/qZCAt/src/gpuarray_interface.jl:59
 [14] gpu_call(::Function, ::CuArray{Float32,6}, ::Tuple{CuArray{Float32,6},Base.Broadcast.Broadcasted{Nothing,NTuple{6,Base.OneTo{Int64}},typeof(*),Tuple{Base.Broadcast.Extruded{CuArray{Float32,6},NTuple{6,Bool},NTuple{6,Int64}},Base.Broadcast.Extruded{Array{Float32,6},NTuple{6,Bool},NTuple{6,Int64}}}}}, ::Int64) at /home/at6617/.julia/packages/GPUArrays/t8tJB/src/abstract_gpu_interface.jl:151
 [15] gpu_call at /home/at6617/.julia/packages/GPUArrays/t8tJB/src/abstract_gpu_interface.jl:128 [inlined]
 [16] copyto! at /home/at6617/.julia/packages/GPUArrays/t8tJB/src/broadcast.jl:48 [inlined]
 [17] copyto! at ./broadcast.jl:797 [inlined]
 [18] copy at ./broadcast.jl:773 [inlined]
 [19] materialize at ./broadcast.jl:753 [inlined]
 [20] broadcast(::typeof(*), ::CuArray{Float32,6}, ::Array{Float32,6}) at ./broadcast.jl:707
 [21] ∇broadcast at /home/at6617/.julia/packages/Tracker/6wcYJ/src/lib/array.jl:458 [inlined]
 [22] materialize at /home/at6617/.julia/packages/Tracker/6wcYJ/src/lib/array.jl:489 [inlined]
 [23] #117 at ./none:5 [inlined]
 [24] |>(::TrackedArray{…,CuArray{Float32,4}}, ::getfield(Main, Symbol("##117#118"))) at ./operators.jl:813
 [25] top-level scope at none:0
@aterenin

Author

commented Apr 14, 2019

Update: the culprit here is similar. MWE updated.
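To make the discrepancy concrete, a quick check along these lines (x reused from the MWE; the commented return types are what the kernel error above implies, not something I have re-verified separately):

x = randn(Float32, (4,4,64,1)) |> gpu |> param   # TrackedArray{…,CuArray{Float32,4}}

typeof(similar(x, (1,2,1,2,1,1)))   # tuple form: falls back to a plain (CPU) Array{Float32,6}, hence the kernel error
typeof(similar(x, 1,2,1,2,1,1))     # varargs form: stays a GPU array, which is why that workaround succeeds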

@aterenin aterenin changed the title Reshape with element-wise product fails with TrackedArray on GPU similar(x, (1,2,3)) and similar(x,1,2,3) differ for TrackedArray on GPU Apr 14, 2019

@dhairyagandhi96

Member

commented Apr 19, 2019

I am able to run this just fine; I am on Flux#master and CuArrays#master.

By the way, repeat also works fine in this environment:

julia> randn(Float32, (4,4,64,1)) |> gpu |> param |> x -> begin
           r = reshape(x, (4,1,4,1,64,1))
           w = similar(x, (1,2,1,2,1,1))
           fill!(w, 1.0f0)
           r .* w
         end |> typeof
TrackedArray{…,CuArray{Float32,6}}

julia> repeat(cu(rand(3,3)), inner = (4,5), outer = (1,3))
12×45 CuArray{Float32,2}:
 0.114308  0.114308  0.114308  …  0.363503   0.363503
 ...

(temop) pkg> st
    Status `~/temp/temop/Project.toml`
  [3a865a2d] CuArrays v1.0.2 [`~/.julia/dev/CuArrays`]
  [587475ba] Flux v0.8.2+ [`~/.julia/dev/Flux`]
  [0c68f7d7] GPUArrays v0.7.0 [`~/.julia/dev/GPUArrays`]
  [872c559c] NNlib v0.6.0 [`~/.julia/dev/NNlib`]
  [9f7883ad] Tracker v0.1.0+ [`~/.julia/dev/Tracker`]
@aterenin

Author

commented Apr 19, 2019

I'm unable to get CuArrays#master to precompile; it gives the error ERROR: LoadError: LoadError: LoadError: UndefVarError: DenseConvDims not defined, which seems to come from NNlib. However, I can't update that to NNlib#master due to ERROR: Unsatisfiable requirements detected for package CUDAdrv.

I just tried the MWE again on the release versions, and it indeed does not work on those. It's possible that it has been fixed since then.

For repeat, be sure you run CuArrays.allowscalar(false) first; otherwise it will default to an abstract implementation that is slower than copying to the CPU, running repeat there, and copying back.
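Something like this, for anyone following along (a minimal sketch; CuArrays.allowscalar is the actual switch, and the repeat call is just the one from above):

using CuArrays
CuArrays.allowscalar(false)   # make scalar getindex/setindex! on the GPU throw instead of silently running

# with the flag off, this errors if the implementation falls back to scalar indexing
repeat(cu(rand(3,3)), inner = (4,5), outer = (1,3))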

@dhairyagandhi96

Member

commented Apr 20, 2019

Ah, scalar indexing does in fact creep into repeat.

I usually try to keep an environment that has CUDAdrv, CUDAnative, CuArrays, and NNlib on their respective master branches, since these need to be kept in sync, especially since they have been fairly fast-moving targets of late.
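For anyone wanting to reproduce that setup, one way to do it is to dev all four packages (just a sketch; it matches the dev'd paths in the status output above):

(temop) pkg> dev CUDAdrv CUDAnative CuArrays NNlib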
