
Asymmetric padding fails on gpu models #775

Closed
vandyt opened this issue May 13, 2019 · 3 comments

vandyt commented May 13, 2019

cuDNN does not support asymmetric padding, and CuArrays tries to handle this by forcing symmetric padding; however, the call still fails. For example,

julia> using Flux, CuArrays

julia> gpu_m = gpu(Conv((3,3),1=>1; pad = (1,0,1,0)));

julia> gpu_data = gpu(rand(10,10,1,1));

julia> gpu_m(gpu_data)
┌ Warning: CuDNN does not support asymmetric padding; defaulting to symmetric choice
└ @ CuArrays.CUDNN ~/.julia/packages/CuArrays/PwSdF/src/dnn/helpers.jl:114
ERROR: CUDNNError(code 3, CUDNN_STATUS_BAD_PARAM)
Stacktrace:
 [1] macro expansion at /home/vt5/.julia/packages/CuArrays/PwSdF/src/dnn/error.jl:19 [inlined]
 [2] cudnnGetConvolutionForwardWorkspaceSize(::CuArrays.CUDNN.TensorDesc, ::CuArrays.CUDNN.FilterDesc, ::CuArrays.CUDNN.ConvDesc, ::CuArrays.CUDNN.TensorDesc, ::Int64, ::Base.RefValue{Int32}) at /home/vt5/.julia/packages/CuArrays/PwSdF/src/dnn/libcudnn.jl:276
 [3] #cudnnGetConvolutionForwardWorkspaceSize#9(::Int64, ::Function, ::CuArray{Float32,4}, ::CuArray{Float32,4}, ::CuArray{Float32,4}, ::DenseConvDims{2,(3, 3),1,1,(1, 1),(1, 0, 1, 0),(1, 1),false}) at /home/vt5/.julia/packages/CuArrays/PwSdF/src/dnn/libcudnn.jl:287
 [4] #cudnnGetConvolutionForwardWorkspaceSize at ./none:0 [inlined]
 [5] #conv!#22(::Int64, ::Int64, ::Function, ::CuArray{Float32,4}, ::CuArray{Float32,4}, ::CuArray{Float32,4}, ::DenseConvDims{2,(3, 3),1,1,(1, 1),(1, 0, 1, 0),(1, 1),false}) at /home/vt5/.julia/packages/CuArrays/PwSdF/src/dnn/nnlib.jl:48
 [6] conv!(::CuArray{Float32,4}, ::CuArray{Float32,4}, ::CuArray{Float32,4}, ::DenseConvDims{2,(3, 3),1,1,(1, 1),(1, 0, 1, 0),(1, 1),false}) at /home/vt5/.julia/packages/CuArrays/PwSdF/src/dnn/nnlib.jl:44
 [7] macro expansion at /home/vt5/.julia/packages/NNlib/mxWRT/src/conv.jl:114 [inlined]
 [8] #conv#97(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::CuArray{Float32,4}, ::CuArray{Float32,4}, ::DenseConvDims{2,(3, 3),1,1,(1, 1),(1, 0, 1, 0),(1, 1),false}) at /home/vt5/.julia/packages/TimerOutputs/7zSea/src/TimerOutput.jl:190
 [9] #_forward#515 at /home/vt5/.julia/packages/TimerOutputs/7zSea/src/TimerOutput.jl:198 [inlined]
 [10] _forward(::typeof(conv), ::CuArray{Float32,4}, ::TrackedArray{…,CuArray{Float32,4}}, ::DenseConvDims{2,(3, 3),1,1,(1, 1),(1, 0, 1, 0),(1, 1),false}) at ./none:0
 [11] #track#1(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::typeof(conv), ::CuArray{Float32,4}, ::Vararg{Any,N} where N) at /home/vt5/.julia/packages/Tracker/rQ0eB/src/Tracker.jl:51
 [12] track at /home/vt5/.julia/packages/Tracker/rQ0eB/src/Tracker.jl:51 [inlined]
 [13] #conv#513 at /home/vt5/.julia/packages/Tracker/rQ0eB/src/lib/array.jl:410 [inlined]
 [14] conv at /home/vt5/.julia/packages/Tracker/rQ0eB/src/lib/array.jl:410 [inlined]
 [15] (::Conv{2,4,typeof(identity),TrackedArray{…,CuArray{Float32,4}},TrackedArray{…,CuArray{Float32,1}}})(::CuArray{Float32,4}) at /home/vt5/.julia/packages/Flux/z0cbO/src/layers/conv.jl:55
 [16] top-level scope at none:0

I am currently using the following versions:

  Julia v1.0.1
  CuArrays v1.0.2
  Flux v0.8.3 #master (https://github.com/FluxML/Flux.jl.git)
  NNlib v0.6.0

The problem occurs in NNlib/src/conv.jl at line 112, in the call to output_size(cdims) when y is allocated. The size of y is incorrect because output_size(cdims) assumes the asymmetric padding, but CuArrays forces a symmetric pad in a later call.
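To illustrate the size mismatch, here is a back-of-the-envelope sketch using the standard convolution output-size formula (the exact symmetric pad CuArrays falls back to is an assumption; any single symmetric choice gives a different size than the asymmetric pair):

```julia
# Output length along one dimension, stride 1, no dilation:
#   out = in + pad_lo + pad_hi - kernel + 1

in_len, k = 10, 3

# Asymmetric pad (1, 0): this is what output_size(cdims) uses to allocate y
asym = in_len + 1 + 0 - k + 1   # 9

# Symmetric fallback of 1 on both sides (assumed choice by CuArrays)
sym = in_len + 1 + 1 - k + 1    # 10

# y is allocated 9×9, but cuDNN computes a 10×10 output → CUDNN_STATUS_BAD_PARAM
@assert asym != sym
```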

I am not sure which package (Flux, NNlib, or CuArrays) should be responsible for providing a fix.

@MikeInnes
Member

CuArrays is responsible for the implementation here, so it'd be worth opening an issue there. I'm not sure what we can do, though. Actually padding the array before executing the convolution might be one option.
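A minimal sketch of that workaround: materialize the asymmetric padding with zeros up front, so the convolution itself only sees symmetric (here zero) padding, which cuDNN accepts. The helper name and the assumed WHCN layout / (lo, hi, lo, hi) pad ordering are illustrative, not the library's actual API:

```julia
# Hypothetical helper: copy x into a zero-filled array enlarged by the
# asymmetric pad amounts, so the Conv layer can then run with pad = 0.
# Assumes a 4-D WHCN array and pad = (w_lo, w_hi, h_lo, h_hi).
function pad_input(x::AbstractArray{T,4}, pad::NTuple{4,Int}) where T
    wlo, whi, hlo, hhi = pad
    w, h, c, n = size(x)
    y = fill!(similar(x, w + wlo + whi, h + hlo + hhi, c, n), zero(T))
    y[wlo+1:wlo+w, hlo+1:hlo+h, :, :] .= x
    return y
end
```

With the input pre-padded this way, the layer would be constructed with pad = 0, and the allocated output size and cuDNN's expectation would agree again (at the cost of one extra copy per forward pass).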

@DrChainsaw
Contributor

Hi,

I'm interested in adding this functionality (padding the input manually) wherever it belongs (Flux, NNlib, or CuArrays). NNlib seems to be the lowest level where it can fit, since the CuArrays API appears to require passing a reference to an already allocated output array (or does it?).

Adding it in Flux has the (maybe negligible) advantage that one can decide at layer creation time which approach to use, and therefore avoid checking for asymmetric padding on every evaluation (as done in the example in the reference below). I haven't thought much about gpu <-> cpu movement, but I guess that if the logic lives in the "primary" constructor it should be possible to work out.

I made an example implementation here, and it seems to have acceptable performance (if the benchmark is correct, of course):
https://github.com/JuliaGPU/CuArrays.jl/issues/356

@ToucheSir
Member

This should now work after FluxML/NNlibCUDA.jl#34.
