
cudnnConvolutionForward fails memory checking #702

Closed
maleadt opened this issue Feb 10, 2021 · 0 comments · Fixed by #703
Labels
bug Something isn't working


maleadt commented Feb 10, 2021

using CUDA

T = Float32
ax,aw,ab = randn(T,8,8,4,4), randn(T,3,3,4,4), randn(T,1,1,4,1)
cx,cw,cb = CuArray.((ax,aw,ab))
cx2,cw2,cb2 = (x->permutedims(x,(3,1,2,4))).((cx,cw,cb))
cwhn = CUDNN.cudnnConvolutionForward(cw2,cx2;bias=cb2,format=CUDNN.CUDNN_TENSOR_NHWC)
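The `permutedims(x, (3,1,2,4))` calls turn the column-major WHCN arrays into a layout where the channel dimension is fastest, i.e. NHWC. The element strides that the NHWC descriptors in the log below report can be reproduced with a short arithmetic sketch (Python used purely for the calculation; the function name is illustrative):

```python
# Packed NHWC layout: elements ordered N, H, W, C with C fastest.
# cuDNN logs strideA in N, C, H, W order, so return them in that order.
def nhwc_strides(n, c, h, w):
    return [h * w * c, 1, w * c, c]  # strideN, strideC, strideH, strideW

print(nhwc_strides(4, 4, 8, 8))  # input x  -> [256, 1, 32, 4]
print(nhwc_strides(4, 4, 6, 6))  # output y -> [144, 1, 24, 4]
```

Both results match the `strideA` values cuDNN prints for `xDesc` and `yDesc`.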

Running under cuda-memcheck on hydor (RTX2080):

I! CuDNN (v8004) function cudnnCreateConvolutionDescriptor() called:
i!     convDesc: location=host; addr=0x7f5d216796d0;
i! Time: 2021-02-10T09:22:25.659723 (0d+0h+0m+22s since start)
i! Process=3146; Thread=3146; GPU=NULL; Handle=NULL; StreamId=NULL.


I! CuDNN (v8004) function cudnnSetConvolutionNdDescriptor() called:
i!     convDesc: location=host; addr=0x533cb40;
i!     arrayLength: type=int; val=2;
i!     padA: type=int; val=[0,0];
i!     strideA: type=int; val=[1,1];
i!     dilationA: type=int; val=[1,1];
i!     mode: type=cudnnConvolutionMode_t; val=CUDNN_CONVOLUTION (0);
i!     dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0);
i! Time: 2021-02-10T09:22:25.659883 (0d+0h+0m+22s since start)
i! Process=3146; Thread=3146; GPU=NULL; Handle=NULL; StreamId=NULL.


I! CuDNN (v8004) function cudnnSetConvolutionMathType() called:
i!     convDesc: location=host; addr=0x533cb40;
i!     mathType: type=cudnnMathType_t; val=CUDNN_TENSOR_OP_MATH (1);
i! Time: 2021-02-10T09:22:25.659913 (0d+0h+0m+22s since start)
i! Process=3146; Thread=3146; GPU=NULL; Handle=NULL; StreamId=NULL.


I! CuDNN (v8004) function cudnnCreateTensorDescriptor() called:
i! Time: 2021-02-10T09:22:25.668620 (0d+0h+0m+22s since start)
i! Process=3146; Thread=3146; GPU=NULL; Handle=NULL; StreamId=NULL.


I! CuDNN (v8004) function cudnnSetTensorNdDescriptorEx() called:
i!     format: type=cudnnTensorFormat_t; val=CUDNN_TENSOR_NHWC (1);
i!     dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0);
i!     nbDims: type=int; val=4;
i!     dimA: type=int; val=[4,4,8,8];
i! Time: 2021-02-10T09:22:25.668714 (0d+0h+0m+22s since start)
i! Process=3146; Thread=3146; GPU=NULL; Handle=NULL; StreamId=NULL.


I! CuDNN (v8004) function cudnnCreateFilterDescriptor() called:
i! Time: 2021-02-10T09:22:25.668768 (0d+0h+0m+22s since start)
i! Process=3146; Thread=3146; GPU=NULL; Handle=NULL; StreamId=NULL.


I! CuDNN (v8004) function cudnnSetFilterNdDescriptor() called:
i!     dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0);
i!     format: type=cudnnTensorFormat_t; val=CUDNN_TENSOR_NHWC (1);
i!     nbDims: type=int; val=4;
i!     filterDimA: type=int; val=[4,4,3,3];
i! Time: 2021-02-10T09:22:25.668798 (0d+0h+0m+22s since start)
i! Process=3146; Thread=3146; GPU=NULL; Handle=NULL; StreamId=NULL.


I! CuDNN (v8004) function cudnnGetConvolutionNdForwardOutputDim() called:
i!     convDesc: type=cudnnConvolutionDescriptor_t:
i!         mode: type=cudnnConvolutionMode_t; val=CUDNN_CONVOLUTION (0);
i!         dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0);
i!         mathType: type=cudnnMathType_t; val=CUDNN_TENSOR_OP_MATH (1);
i!         reorderType: type=int; val=0;
i!         arrayLength: type=int; val=2;
i!         padA: type=int; val=[0,0];
i!         strideA: type=int; val=[1,1];
i!         dilationA: type=int; val=[1,1];
i!         groupCount: type=int; val=1;
i!     inputTensorDesc: type=cudnnTensorDescriptor_t:
i!         dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0);
i!         nbDims: type=int; val=4;
i!         dimA: type=int; val=[4,4,8,8];
i!         strideA: type=int; val=[256,1,32,4];
i!     filterDesc: type=cudnnFilterDescriptor_t:
i!         dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0);
i!         vect: type=int; val=0;
i!         nbDims: type=int; val=4;
i!         dimA: type=int; val=[4,4,3,3];
i!         format: type=cudnnTensorFormat_t; val=CUDNN_TENSOR_NHWC (1);
i!     nbDims: type=int; val=4;
i!     tensorOuputDimA: location=host; addr=0x7f5d216a4170;
i! Time: 2021-02-10T09:22:25.668847 (0d+0h+0m+22s since start)
i! Process=3146; Thread=3146; GPU=NULL; Handle=NULL; StreamId=NULL.
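The output spatial size cuDNN derives here (`dimA=[4,4,6,6]` in the next descriptor) follows from the standard convolution output formula; a minimal sketch for this case (no padding, unit stride and dilation, 3x3 filter on 8x8 input):

```python
# Standard conv output size: floor((i + 2p - d*(k-1) - 1) / s) + 1
def conv_out(i, k, pad=0, stride=1, dil=1):
    return (i + 2 * pad - dil * (k - 1) - 1) // stride + 1

print(conv_out(8, 3))  # -> 6, matching dimA=[4,4,6,6] below
```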


I! CuDNN (v8004) function cudnnCreateTensorDescriptor() called:
i! Time: 2021-02-10T09:22:25.677434 (0d+0h+0m+22s since start)
i! Process=3146; Thread=3146; GPU=NULL; Handle=NULL; StreamId=NULL.


I! CuDNN (v8004) function cudnnSetTensorNdDescriptorEx() called:
i!     format: type=cudnnTensorFormat_t; val=CUDNN_TENSOR_NHWC (1);
i!     dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0);
i!     nbDims: type=int; val=4;
i!     dimA: type=int; val=[4,4,6,6];
i! Time: 2021-02-10T09:22:25.677520 (0d+0h+0m+22s since start)
i! Process=3146; Thread=3146; GPU=NULL; Handle=NULL; StreamId=NULL.


I! CuDNN (v8004) function cudnnCreateTensorDescriptor() called:
i! Time: 2021-02-10T09:22:25.677560 (0d+0h+0m+22s since start)
i! Process=3146; Thread=3146; GPU=NULL; Handle=NULL; StreamId=NULL.


I! CuDNN (v8004) function cudnnSetTensorNdDescriptorEx() called:
i!     format: type=cudnnTensorFormat_t; val=CUDNN_TENSOR_NHWC (1);
i!     dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0);
i!     nbDims: type=int; val=4;
i!     dimA: type=int; val=[1,4,1,1];
i! Time: 2021-02-10T09:22:25.677578 (0d+0h+0m+22s since start)
i! Process=3146; Thread=3146; GPU=NULL; Handle=NULL; StreamId=NULL.


I! CuDNN (v8004) function cudnnCreate() called:
i!     handle: location=host; addr=0x7ffecf021bc0;
i! Time: 2021-02-10T09:22:27.021262 (0d+0h+0m+24s since start)
i! Process=3146; Thread=3146; GPU=NULL; Handle=NULL; StreamId=NULL.


I! CuDNN (v8004) function cudnnSetStream() called:
i!     handle: type=cudnnHandle_t; streamId=(nil) (defaultStream);
i!     streamId: type=cudaStream_t; streamId=0x3684540;
i! Time: 2021-02-10T09:22:33.929024 (0d+0h+0m+30s since start)
i! Process=3146; Thread=3146; GPU=0; Handle=0x33b4820; StreamId=(nil) (defaultStream).


I! CuDNN (v8004) function cudnnGetConvolutionForwardWorkspaceSize() called:
i!     handle: type=cudnnHandle_t; streamId=0x3684540;
i!     xDesc: type=cudnnTensorDescriptor_t:
i!         dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0);
i!         nbDims: type=int; val=4;
i!         dimA: type=int; val=[4,4,8,8];
i!         strideA: type=int; val=[256,1,32,4];
i!     wDesc: type=cudnnFilterDescriptor_t:
i!         dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0);
i!         vect: type=int; val=0;
i!         nbDims: type=int; val=4;
i!         dimA: type=int; val=[4,4,3,3];
i!         format: type=cudnnTensorFormat_t; val=CUDNN_TENSOR_NHWC (1);
i!     convDesc: type=cudnnConvolutionDescriptor_t:
i!         mode: type=cudnnConvolutionMode_t; val=CUDNN_CONVOLUTION (0);
i!         dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0);
i!         mathType: type=cudnnMathType_t; val=CUDNN_TENSOR_OP_MATH (1);
i!         reorderType: type=int; val=0;
i!         arrayLength: type=int; val=2;
i!         padA: type=int; val=[0,0];
i!         strideA: type=int; val=[1,1];
i!         dilationA: type=int; val=[1,1];
i!         groupCount: type=int; val=1;
i!     yDesc: type=cudnnTensorDescriptor_t:
i!         dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0);
i!         nbDims: type=int; val=4;
i!         dimA: type=int; val=[4,4,6,6];
i!         strideA: type=int; val=[144,1,24,4];
i!     algo: type=cudnnConvolutionFwdAlgo_t; val=CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM (1);
i!     sizeInBytes: location=host; addr=0x7f5c626c01a0;
i! Time: 2021-02-10T09:22:33.937817 (0d+0h+0m+30s since start)
i! Process=3146; Thread=3146; GPU=0; Handle=0x33b4820; StreamId=0x3684540.


I! CuDNN (v8004) function cudnnCreateActivationDescriptor() called:
i! Time: 2021-02-10T09:22:33.963387 (0d+0h+0m+30s since start)
i! Process=3146; Thread=3146; GPU=NULL; Handle=NULL; StreamId=NULL.


I! CuDNN (v8004) function cudnnSetActivationDescriptor() called:
i!     mode: type=cudnnActivationMode_t; val=CUDNN_ACTIVATION_IDENTITY (5);
i!     reluNanOpt: type=cudnnNanPropagation_t; val=CUDNN_NOT_PROPAGATE_NAN (0);
i!     coef: type=double; val=1.000000;
i! Time: 2021-02-10T09:22:33.963429 (0d+0h+0m+30s since start)
i! Process=3146; Thread=3146; GPU=NULL; Handle=NULL; StreamId=NULL.


I! CuDNN (v8004) function cudnnConvolutionBiasActivationForward() called:
i!     handle: type=cudnnHandle_t; streamId=0x3684540;
i!     alpha1: type=CUDNN_DATA_FLOAT; val=1.000000;
i!     xDesc: type=cudnnTensorDescriptor_t:
i!         dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0);
i!         nbDims: type=int; val=4;
i!         dimA: type=int; val=[4,4,8,8];
i!         strideA: type=int; val=[256,1,32,4];
i!     xData: location=dev; addr=0x7f5b53001600;
i!     wDesc: type=cudnnFilterDescriptor_t:
i!         dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0);
i!         vect: type=int; val=0;
i!         nbDims: type=int; val=4;
i!         dimA: type=int; val=[4,4,3,3];
i!         format: type=cudnnTensorFormat_t; val=CUDNN_TENSOR_NHWC (1);
i!     wData: location=dev; addr=0x7f5b53002600;
i!     convDesc: type=cudnnConvolutionDescriptor_t:
i!         mode: type=cudnnConvolutionMode_t; val=CUDNN_CONVOLUTION (0);
i!         dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0);
i!         mathType: type=cudnnMathType_t; val=CUDNN_TENSOR_OP_MATH (1);
i!         reorderType: type=int; val=0;
i!         arrayLength: type=int; val=2;
i!         padA: type=int; val=[0,0];
i!         strideA: type=int; val=[1,1];
i!         dilationA: type=int; val=[1,1];
i!         groupCount: type=int; val=1;
i!     algo: type=cudnnConvolutionFwdAlgo_t; val=CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM (1);
i!     workSpace: location=dev; addr=0x7f5b53031200;
i!     workSpaceSizeInBytes: type=size_t; val=1920;
i!     alpha2: type=CUDNN_DATA_FLOAT; val=0.000000;
i!     zDesc: type=cudnnTensorDescriptor_t:
i!         dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0);
i!         nbDims: type=int; val=4;
i!         dimA: type=int; val=[4,4,6,6];
i!         strideA: type=int; val=[144,1,24,4];
i!     zData: location=dev; addr=0x7f5b53002c00;
i!     biasDesc: type=cudnnTensorDescriptor_t:
i!         dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0);
i!         nbDims: type=int; val=4;
i!         dimA: type=int; val=[1,4,1,1];
i!         strideA: type=int; val=[4,1,4,4];
i!     bias: location=dev; addr=0x7f5b53002a00;
i!     activationDesc: type=cudnnActivationDescriptor_t: 
i!         coef: type=double; val=1.000000;
i!         mode: type=cudnnActivationMode_t; val=CUDNN_ACTIVATION_IDENTITY (5);
i!         reluNanOpt: type=cudnnNanPropagation_t; val=CUDNN_NOT_PROPAGATE_NAN (0);
i!     yDesc: type=cudnnTensorDescriptor_t:
i!         dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0);
i!         nbDims: type=int; val=4;
i!         dimA: type=int; val=[4,4,6,6];
i!         strideA: type=int; val=[144,1,24,4];
i!     yData: location=dev; addr=0x7f5b53002c00;
i! Time: 2021-02-10T09:22:33.963540 (0d+0h+0m+30s since start)
i! Process=3146; Thread=3146; GPU=0; Handle=0x33b4820; StreamId=0x3684540.

I! CuDNN (v8004) function cudnnDestroyActivationDescriptor() called:
i! Time: 2021-02-10T09:22:48.938916 (0d+0h+0m+45s since start)
i! Process=3146; Thread=3146; GPU=NULL; Handle=NULL; StreamId=NULL.

========= CUDA-MEMCHECK
========= Invalid __global__ read of size 4
=========     at 0x00003c20 in volta_scudnn_128x32_sliced1x4_ldg4_relu_exp_small_nhwc_tn_v1
=========     by thread (63,0,0) in block (1,0,0)
=========     Address 0x7f5b53002a7c is out of bounds
=========     Device Frame:volta_scudnn_128x32_sliced1x4_ldg4_relu_exp_small_nhwc_tn_v1 (volta_scudnn_128x32_sliced1x4_ldg4_relu_exp_small_nhwc_tn_v1 : 0x3c20)
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuLaunchKernel + 0x2b8) [0x222dc8]
=========     Host Frame:c1f953962cc13e4a55f7b2333fb212e7f5c08817/lib/libcudnn_cnn_infer.so.8 [0x2acfee1b]
=========     Host Frame:c1f953962cc13e4a55f7b2333fb212e7f5c08817/lib/libcudnn_cnn_infer.so.8 [0x2ad44af5]
=========     Host Frame:c1f953962cc13e4a55f7b2333fb212e7f5c08817/lib/libcudnn_cnn_infer.so.8 [0x2817abda]
=========     Host Frame:c1f953962cc13e4a55f7b2333fb212e7f5c08817/lib/libcudnn_cnn_infer.so.8 (_ZN10cask_cudnn18ImplicitGemmShaderINS_18ImplicitGemmParamsILi8ELi128EEEE3runERNS_7RunInfoEPvPKvS8_S8_S8_S8_S8_P11CUstream_st + 0x2b4) [0x2818d814]
=========     Host Frame:c1f953962cc13e4a55f7b2333fb212e7f5c08817/lib/libcudnn_cnn_infer.so.8 [0x27d240e8]
=========     Host Frame:c1f953962cc13e4a55f7b2333fb212e7f5c08817/lib/libcudnn_cnn_infer.so.8 (_ZN5cudnn3cnn5infer16InferNdSubEngineILb1EL19cudnnTensorFormat_t1ELS3_1ELS3_1EL15cudnnDataType_t0ELb0ELi70ELNS1_9subtree_tE0EE21execute_internal_implERKNS_7backend11VariantPackEP11CUstream_st + 0x125) [0x27d30235]
=========     Host Frame:c1f953962cc13e4a55f7b2333fb212e7f5c08817/lib/libcudnn_cnn_infer.so.8 (_ZN5cudnn3cnn15EngineInterface7executeERKNS_7backend11VariantPackEP11CUstream_st + 0x53) [0x27c06353]
=========     Host Frame:c1f953962cc13e4a55f7b2333fb212e7f5c08817/lib/libcudnn_cnn_infer.so.8 (_ZN5cudnn3cnn15EngineContainerIL24cudnnBackendEngineName_t34ELm113664EE21execute_internal_implERKNS_7backend11VariantPackEP11CUstream_st + 0x10) [0x27c32740]
=========     Host Frame:c1f953962cc13e4a55f7b2333fb212e7f5c08817/lib/libcudnn_cnn_infer.so.8 (_ZN5cudnn3cnn15EngineInterface7executeERKNS_7backend11VariantPackEP11CUstream_st + 0x53) [0x27c06353]
=========     Host Frame:c1f953962cc13e4a55f7b2333fb212e7f5c08817/lib/libcudnn_cnn_infer.so.8 (_ZN5cudnn6fusion32ConvBiasActPatternMatchingEngineINS_3cnn15EngineContainerIL24cudnnBackendEngineName_t34ELm113664EEELS4_4020ESt17integral_constantIiLin1EEE21execute_internal_implERKNS_7backend11VariantPackEP11CUstream_st + 0x3a) [0x27c8ca9a]
=========     Host Frame:c1f953962cc13e4a55f7b2333fb212e7f5c08817/lib/libcudnn_cnn_infer.so.8 (_ZN5cudnn3cnn15EngineInterface7executeERKNS_7backend11VariantPackEP11CUstream_st + 0x53) [0x27c06353]
=========     Host Frame:c1f953962cc13e4a55f7b2333fb212e7f5c08817/lib/libcudnn_cnn_infer.so.8 (_ZN5cudnn7backend7executeEP12cudnnContextRNS0_13ExecutionPlanERNS0_11VariantPackE + 0xe0) [0x27c0da50]
=========     Host Frame:c1f953962cc13e4a55f7b2333fb212e7f5c08817/lib/libcudnn_cnn_infer.so.8 (_ZN5cudnn7backend14EnginesAlgoMapI25cudnnConvolutionFwdAlgo_tLi8EE15execute_wrapperEP12cudnnContextS2_RNS0_13ExecutionPlanERNS0_11VariantPackE + 0x3c) [0x27d0e72c]
=========     Host Frame:c1f953962cc13e4a55f7b2333fb212e7f5c08817/lib/libcudnn_cnn_infer.so.8 (_ZN5cudnn7backend32convolutionBiasActivationForwardEP12cudnnContextPKvP17cudnnTensorStructS4_P17cudnnFilterStructS4_P22cudnnConvolutionStruct25cudnnConvolutionFwdAlgo_tPvmS4_S6_S4_S6_S4_P21cudnnActivationStructS6_SC_ + 0x9bd) [0x27d0b22d]
=========     Host Frame:c1f953962cc13e4a55f7b2333fb212e7f5c08817/lib/libcudnn_cnn_infer.so.8 (cudnnConvolutionBiasActivationForward + 0x221) [0x27e0b1d1]
=========     Host Frame:[0x7f5c723b881b]
=========     Host Frame:[0x7f5c723ba306]
=========     Host Frame:[0x7f5c723ba684]
=========     Host Frame:[0x7f5c723ba78c]
=========     Host Frame:bin/../lib/julia/libjulia-internal.so.1 (jl_apply_generic + 0x1fa) [0xb7e4a]
=========     Host Frame:[0x7f5c723a71b4]
=========     Host Frame:[0x7f5c723a724d]
=========     Host Frame:[0x7f5c723a72ad]
=========     Host Frame:bin/../lib/julia/libjulia-internal.so.1 (jl_apply_generic + 0x1fa) [0xb7e4a]
=========     Host Frame:bin/../lib/julia/libjulia-internal.so.1 [0xd3f76]
=========     Host Frame:bin/../lib/julia/libjulia-internal.so.1 [0xd3bce]
=========     Host Frame:bin/../lib/julia/libjulia-internal.so.1 [0xd4872]
=========     Host Frame:bin/../lib/julia/libjulia-internal.so.1 [0xd52f8]
=========     Host Frame:bin/../lib/julia/libjulia-internal.so.1 [0xf0c62]
=========     Host Frame:bin/../lib/julia/libjulia-internal.so.1 [0xf0ead]
=========     Host Frame:bin/../lib/julia/libjulia-internal.so.1 (jl_toplevel_eval_in + 0xaa) [0xf2a7a]
=========     Host Frame:lib/julia/sys.so [0xbab258]
=========     Host Frame:bin/../lib/julia/libjulia-internal.so.1 (jl_apply_generic + 0x1fa) [0xb7e4a]
=========     Host Frame:lib/julia/sys.so [0xc0a816]
=========     Host Frame:lib/julia/sys.so [0xc0a1f6]
=========     Host Frame:bin/../lib/julia/libjulia-internal.so.1 (jl_apply_generic + 0x1fa) [0xb7e4a]
=========     Host Frame:lib/julia/sys.so [0x8b5913]
=========     Host Frame:lib/julia/sys.so [0x8b753b]
=========     Host Frame:lib/julia/sys.so [0x8b76a6]
=========     Host Frame:bin/../lib/julia/libjulia-internal.so.1 (jl_apply_generic + 0x1fa) [0xb7e4a]
=========     Host Frame:bin/../lib/julia/libjulia-internal.so.1 [0x1135d6]
=========     Host Frame:bin/../lib/julia/libjulia-internal.so.1 (repl_entrypoint + 0x8d) [0x113f7d]
=========     Host Frame:julia (main + 0x9) [0x7a9]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xeb) [0x2409b]
=========     Host Frame:julia [0x7d9]
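A back-of-the-envelope check relates the faulting address to the `bias` pointer in the `cudnnConvolutionBiasActivationForward` log above: the kernel reads element 31 of a bias buffer that, per its own descriptor (`dimA=[1,4,1,1]`, `strideA=[4,1,4,4]`), only spans 4 floats. Sketch of the arithmetic (addresses copied from the logs):

```python
bias_base = 0x7f5b53002a00   # `bias` addr from the cuDNN log
fault     = 0x7f5b53002a7c   # address flagged by cuda-memcheck
elt       = 4                # sizeof(float)

offset_elems = (fault - bias_base) // elt
print(offset_elems)          # -> 31: element index the kernel tried to read

# Largest element offset the bias descriptor itself can address:
dims, strides = [1, 4, 1, 1], [4, 1, 4, 4]
max_offset = sum((d - 1) * s for d, s in zip(dims, strides))
print(max_offset)            # -> 3, i.e. a 4-element buffer
```

So the selected kernel reads well past the end of the 4-element bias, consistent with the invalid read of size 4 reported above.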