
Asymmetric padding fails on gpu models #775

Closed
vandyt opened this issue May 13, 2019 · 3 comments

vandyt commented May 13, 2019

cuDNN does not support asymmetric padding, and CuArrays tries to handle this by forcing symmetric padding; however, the call still fails. For example,

julia> using Flux, CuArrays

julia> gpu_m = gpu(Conv((3,3),1=>1; pad = (1,0,1,0)));

julia> gpu_data = gpu(rand(10,10,1,1));

julia> gpu_m(gpu_data)
┌ Warning: CuDNN does not support asymmetric padding; defaulting to symmetric choice
└ @ CuArrays.CUDNN ~/.julia/packages/CuArrays/PwSdF/src/dnn/helpers.jl:114
ERROR: CUDNNError(code 3, CUDNN_STATUS_BAD_PARAM)
Stacktrace:
 [1] macro expansion at /home/vt5/.julia/packages/CuArrays/PwSdF/src/dnn/error.jl:19 [inlined]
 [2] cudnnGetConvolutionForwardWorkspaceSize(::CuArrays.CUDNN.TensorDesc, ::CuArrays.CUDNN.FilterDesc, ::CuArrays.CUDNN.ConvDesc, ::CuArrays.CUDNN.TensorDesc, ::Int64, ::Base.RefValue{Int32}) at /home/vt5/.julia/packages/CuArrays/PwSdF/src/dnn/libcudnn.jl:276
 [3] #cudnnGetConvolutionForwardWorkspaceSize#9(::Int64, ::Function, ::CuArray{Float32,4}, ::CuArray{Float32,4}, ::CuArray{Float32,4}, ::DenseConvDims{2,(3, 3),1,1,(1, 1),(1, 0, 1, 0),(1, 1),false}) at /home/vt5/.julia/packages/CuArrays/PwSdF/src/dnn/libcudnn.jl:287
 [4] #cudnnGetConvolutionForwardWorkspaceSize at ./none:0 [inlined]
 [5] #conv!#22(::Int64, ::Int64, ::Function, ::CuArray{Float32,4}, ::CuArray{Float32,4}, ::CuArray{Float32,4}, ::DenseConvDims{2,(3, 3),1,1,(1, 1),(1, 0, 1, 0),(1, 1),false}) at /home/vt5/.julia/packages/CuArrays/PwSdF/src/dnn/nnlib.jl:48
 [6] conv!(::CuArray{Float32,4}, ::CuArray{Float32,4}, ::CuArray{Float32,4}, ::DenseConvDims{2,(3, 3),1,1,(1, 1),(1, 0, 1, 0),(1, 1),false}) at /home/vt5/.julia/packages/CuArrays/PwSdF/src/dnn/nnlib.jl:44
 [7] macro expansion at /home/vt5/.julia/packages/NNlib/mxWRT/src/conv.jl:114 [inlined]
 [8] #conv#97(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::CuArray{Float32,4}, ::CuArray{Float32,4}, ::DenseConvDims{2,(3, 3),1,1,(1, 1),(1, 0, 1, 0),(1, 1),false}) at /home/vt5/.julia/packages/TimerOutputs/7zSea/src/TimerOutput.jl:190
 [9] #_forward#515 at /home/vt5/.julia/packages/TimerOutputs/7zSea/src/TimerOutput.jl:198 [inlined]
 [10] _forward(::typeof(conv), ::CuArray{Float32,4}, ::TrackedArray{…,CuArray{Float32,4}}, ::DenseConvDims{2,(3, 3),1,1,(1, 1),(1, 0, 1, 0),(1, 1),false}) at ./none:0
 [11] #track#1(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::typeof(conv), ::CuArray{Float32,4}, ::Vararg{Any,N} where N) at /home/vt5/.julia/packages/Tracker/rQ0eB/src/Tracker.jl:51
 [12] track at /home/vt5/.julia/packages/Tracker/rQ0eB/src/Tracker.jl:51 [inlined]
 [13] #conv#513 at /home/vt5/.julia/packages/Tracker/rQ0eB/src/lib/array.jl:410 [inlined]
 [14] conv at /home/vt5/.julia/packages/Tracker/rQ0eB/src/lib/array.jl:410 [inlined]
 [15] (::Conv{2,4,typeof(identity),TrackedArray{…,CuArray{Float32,4}},TrackedArray{…,CuArray{Float32,1}}})(::CuArray{Float32,4}) at /home/vt5/.julia/packages/Flux/z0cbO/src/layers/conv.jl:55
 [16] top-level scope at none:0

I am currently using the following versions:

  Julia v1.0.1
  CuArrays v1.0.2
  Flux v0.8.3 #master (https://github.com/FluxML/Flux.jl.git)
  NNlib v0.6.0

The problem occurs in NNlib/src/conv.jl at line 112, in the call to output_size(cdims) when y is allocated. The size of y is incorrect because output_size(cdims) assumes the asymmetric padding, but CuArrays forces a symmetric pad in a later call.
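To illustrate the size mismatch, here is a back-of-the-envelope sketch using the standard convolution output-size formula (the exact symmetric pad CuArrays falls back to is an assumption; any single symmetric choice gives a different size than the asymmetric pair):

```julia
# Output length along one dimension, stride 1, no dilation:
#   out = in + pad_lo + pad_hi - kernel + 1

in_len, k = 10, 3

# Asymmetric pad (1, 0): this is what output_size(cdims) uses to allocate y
asym = in_len + 1 + 0 - k + 1   # 9

# Symmetric fallback of 1 on both sides (assumed choice by CuArrays)
sym = in_len + 1 + 1 - k + 1    # 10

# y is allocated 9×9, but cuDNN computes a 10×10 output → CUDNN_STATUS_BAD_PARAM
@assert asym != sym
```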

I am not sure which package (Flux, NNlib, or CuArrays) should be responsible for providing a fix.

@MikeInnes
Member

CuArrays is responsible for the implementation here, so it'd be worth opening an issue there. I'm not sure what we can do, though. Actually padding the array before executing the convolution might be one option.
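A minimal sketch of that workaround: materialize the asymmetric padding with zeros up front, so the convolution itself only sees symmetric (here zero) padding, which cuDNN accepts. The helper name and the assumed WHCN layout / (lo, hi, lo, hi) pad ordering are illustrative, not the library's actual API:

```julia
# Hypothetical helper: copy x into a zero-filled array enlarged by the
# asymmetric pad amounts, so the Conv layer can then run with pad = 0.
# Assumes a 4-D WHCN array and pad = (w_lo, w_hi, h_lo, h_hi).
function pad_input(x::AbstractArray{T,4}, pad::NTuple{4,Int}) where T
    wlo, whi, hlo, hhi = pad
    w, h, c, n = size(x)
    y = fill!(similar(x, w + wlo + whi, h + hlo + hhi, c, n), zero(T))
    y[wlo+1:wlo+w, hlo+1:hlo+h, :, :] .= x
    return y
end
```

With the input pre-padded this way, the layer would be constructed with pad = 0, and the allocated output size and cuDNN's expectation would agree again (at the cost of one extra copy per forward pass).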

@DrChainsaw
Contributor

Hi,

I'm interested in adding this functionality (padding the input manually) wherever it belongs (Flux, NNlib, or CuArrays). NNlib seems to be the lowest level where it can fit, since the CuArrays API appears to require passing a reference to an already allocated output array (or does it?).

Adding it in Flux has the (maybe negligible) advantage that one can decide at layer creation time which approach to use, and therefore avoid checking for asymmetric padding on every evaluation (as done in the example in the reference below). I haven't thought much about gpu <-> cpu movement, but I guess that if the logic lives in the "primary" constructor it should be possible to work out.

I made an example implementation here, and it seems to have acceptable performance (if the benchmark is correct, of course):
https://github.com/JuliaGPU/CuArrays.jl/issues/356

@ToucheSir
Member

This should now work after FluxML/NNlibCUDA.jl#34.
