```julia
using CUDA

T = Float32
ax, aw, ab = randn(T,8,8,4,4), randn(T,3,3,4,4), randn(T,1,1,4,1)
cx, cw, cb = CuArray.((ax, aw, ab))
cx2, cw2, cb2 = (x -> permutedims(x, (3,1,2,4))).((cx, cw, cb))
cwhn = CUDNN.cudnnConvolutionForward(cw2, cx2; bias=cb2, format=CUDNN.CUDNN_TENSOR_NHWC)
```
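For reference, the shapes this repro produces line up with the descriptor dims in the cuDNN log below. A quick sketch (plain Python, just the standard cuDNN forward output-size formula; the helper name is mine):

```python
# cuDNN reports dimA in NCHW order even for NHWC-formatted tensors, so the
# permuted 8x8x4x4 input shows up as dimA=[4,4,8,8] in the log below.
def conv_out_dim(in_dim, kernel, pad=0, stride=1, dilation=1):
    """Standard cuDNN forward output-size formula."""
    return (in_dim + 2 * pad - dilation * (kernel - 1) - 1) // stride + 1

n, c, h, w = 4, 4, 8, 8           # input dims as cuDNN logs them
k_out, k_in, kh, kw = 4, 4, 3, 3  # filter dims (filterDimA=[4,4,3,3])
out = (n, k_out, conv_out_dim(h, kh), conv_out_dim(w, kw))
print(out)  # (4, 4, 6, 6) -- matches dimA=[4,4,6,6] in the log
```

So the descriptors themselves look consistent; the failure happens inside the NHWC kernel.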
Running under `cuda-memcheck` on hydor (RTX2080):
```
I! CuDNN (v8004) function cudnnCreateConvolutionDescriptor() called: i! convDesc: location=host; addr=0x7f5d216796d0; i! Time: 2021-02-10T09:22:25.659723 (0d+0h+0m+22s since start) i! Process=3146; Thread=3146; GPU=NULL; Handle=NULL; StreamId=NULL.
I! CuDNN (v8004) function cudnnSetConvolutionNdDescriptor() called: i! convDesc: location=host; addr=0x533cb40; i! arrayLength: type=int; val=2; i! padA: type=int; val=[0,0]; i! strideA: type=int; val=[1,1]; i! dilationA: type=int; val=[1,1]; i! mode: type=cudnnConvolutionMode_t; val=CUDNN_CONVOLUTION (0); i! dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0); i! Time: 2021-02-10T09:22:25.659883 (0d+0h+0m+22s since start) i! Process=3146; Thread=3146; GPU=NULL; Handle=NULL; StreamId=NULL.
I! CuDNN (v8004) function cudnnSetConvolutionMathType() called: i! convDesc: location=host; addr=0x533cb40; i! mathType: type=cudnnMathType_t; val=CUDNN_TENSOR_OP_MATH (1); i! Time: 2021-02-10T09:22:25.659913 (0d+0h+0m+22s since start) i! Process=3146; Thread=3146; GPU=NULL; Handle=NULL; StreamId=NULL.
I! CuDNN (v8004) function cudnnCreateTensorDescriptor() called: i! Time: 2021-02-10T09:22:25.668620 (0d+0h+0m+22s since start) i! Process=3146; Thread=3146; GPU=NULL; Handle=NULL; StreamId=NULL.
I! CuDNN (v8004) function cudnnSetTensorNdDescriptorEx() called: i! format: type=cudnnTensorFormat_t; val=CUDNN_TENSOR_NHWC (1); i! dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0); i! nbDims: type=int; val=4; i! dimA: type=int; val=[4,4,8,8]; i! Time: 2021-02-10T09:22:25.668714 (0d+0h+0m+22s since start) i! Process=3146; Thread=3146; GPU=NULL; Handle=NULL; StreamId=NULL.
I! CuDNN (v8004) function cudnnCreateFilterDescriptor() called: i! Time: 2021-02-10T09:22:25.668768 (0d+0h+0m+22s since start) i! Process=3146; Thread=3146; GPU=NULL; Handle=NULL; StreamId=NULL.
I! CuDNN (v8004) function cudnnSetFilterNdDescriptor() called: i! dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0); i! format: type=cudnnTensorFormat_t; val=CUDNN_TENSOR_NHWC (1); i! nbDims: type=int; val=4; i! filterDimA: type=int; val=[4,4,3,3]; i! Time: 2021-02-10T09:22:25.668798 (0d+0h+0m+22s since start) i! Process=3146; Thread=3146; GPU=NULL; Handle=NULL; StreamId=NULL.
I! CuDNN (v8004) function cudnnGetConvolutionNdForwardOutputDim() called: i! convDesc: type=cudnnConvolutionDescriptor_t: i! mode: type=cudnnConvolutionMode_t; val=CUDNN_CONVOLUTION (0); i! dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0); i! mathType: type=cudnnMathType_t; val=CUDNN_TENSOR_OP_MATH (1); i! reorderType: type=int; val=0; i! arrayLength: type=int; val=2; i! padA: type=int; val=[0,0]; i! strideA: type=int; val=[1,1]; i! dilationA: type=int; val=[1,1]; i! groupCount: type=int; val=1; i! inputTensorDesc: type=cudnnTensorDescriptor_t: i! dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0); i! nbDims: type=int; val=4; i! dimA: type=int; val=[4,4,8,8]; i! strideA: type=int; val=[256,1,32,4]; i! filterDesc: type=cudnnFilterDescriptor_t: i! dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0); i! vect: type=int; val=0; i! nbDims: type=int; val=4; i! dimA: type=int; val=[4,4,3,3]; i! format: type=cudnnTensorFormat_t; val=CUDNN_TENSOR_NHWC (1); i! nbDims: type=int; val=4; i! tensorOuputDimA: location=host; addr=0x7f5d216a4170; i! Time: 2021-02-10T09:22:25.668847 (0d+0h+0m+22s since start) i! Process=3146; Thread=3146; GPU=NULL; Handle=NULL; StreamId=NULL.
I! CuDNN (v8004) function cudnnCreateTensorDescriptor() called: i! Time: 2021-02-10T09:22:25.677434 (0d+0h+0m+22s since start) i! Process=3146; Thread=3146; GPU=NULL; Handle=NULL; StreamId=NULL.
I! CuDNN (v8004) function cudnnSetTensorNdDescriptorEx() called: i! format: type=cudnnTensorFormat_t; val=CUDNN_TENSOR_NHWC (1); i! dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0); i! nbDims: type=int; val=4; i! dimA: type=int; val=[4,4,6,6]; i! Time: 2021-02-10T09:22:25.677520 (0d+0h+0m+22s since start) i! Process=3146; Thread=3146; GPU=NULL; Handle=NULL; StreamId=NULL.
I! CuDNN (v8004) function cudnnCreateTensorDescriptor() called: i! Time: 2021-02-10T09:22:25.677560 (0d+0h+0m+22s since start) i! Process=3146; Thread=3146; GPU=NULL; Handle=NULL; StreamId=NULL.
I! CuDNN (v8004) function cudnnSetTensorNdDescriptorEx() called: i! format: type=cudnnTensorFormat_t; val=CUDNN_TENSOR_NHWC (1); i! dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0); i! nbDims: type=int; val=4; i! dimA: type=int; val=[1,4,1,1]; i! Time: 2021-02-10T09:22:25.677578 (0d+0h+0m+22s since start) i! Process=3146; Thread=3146; GPU=NULL; Handle=NULL; StreamId=NULL.
I! CuDNN (v8004) function cudnnCreate() called: i! handle: location=host; addr=0x7ffecf021bc0; i! Time: 2021-02-10T09:22:27.021262 (0d+0h+0m+24s since start) i! Process=3146; Thread=3146; GPU=NULL; Handle=NULL; StreamId=NULL.
I! CuDNN (v8004) function cudnnSetStream() called: i! handle: type=cudnnHandle_t; streamId=(nil) (defaultStream); i! streamId: type=cudaStream_t; streamId=0x3684540; i! Time: 2021-02-10T09:22:33.929024 (0d+0h+0m+30s since start) i! Process=3146; Thread=3146; GPU=0; Handle=0x33b4820; StreamId=(nil) (defaultStream).
I! CuDNN (v8004) function cudnnGetConvolutionForwardWorkspaceSize() called: i! handle: type=cudnnHandle_t; streamId=0x3684540; i! xDesc: type=cudnnTensorDescriptor_t: i! dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0); i! nbDims: type=int; val=4; i! dimA: type=int; val=[4,4,8,8]; i! strideA: type=int; val=[256,1,32,4]; i! wDesc: type=cudnnFilterDescriptor_t: i! dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0); i! vect: type=int; val=0; i! nbDims: type=int; val=4; i! dimA: type=int; val=[4,4,3,3]; i! format: type=cudnnTensorFormat_t; val=CUDNN_TENSOR_NHWC (1); i! convDesc: type=cudnnConvolutionDescriptor_t: i! mode: type=cudnnConvolutionMode_t; val=CUDNN_CONVOLUTION (0); i! dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0); i! mathType: type=cudnnMathType_t; val=CUDNN_TENSOR_OP_MATH (1); i! reorderType: type=int; val=0; i! arrayLength: type=int; val=2; i! padA: type=int; val=[0,0]; i! strideA: type=int; val=[1,1]; i! dilationA: type=int; val=[1,1]; i! groupCount: type=int; val=1; i! yDesc: type=cudnnTensorDescriptor_t: i! dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0); i! nbDims: type=int; val=4; i! dimA: type=int; val=[4,4,6,6]; i! strideA: type=int; val=[144,1,24,4]; i! algo: type=cudnnConvolutionFwdAlgo_t; val=CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM (1); i! sizeInBytes: location=host; addr=0x7f5c626c01a0; i! Time: 2021-02-10T09:22:33.937817 (0d+0h+0m+30s since start) i! Process=3146; Thread=3146; GPU=0; Handle=0x33b4820; StreamId=0x3684540.
I! CuDNN (v8004) function cudnnCreateActivationDescriptor() called: i! Time: 2021-02-10T09:22:33.963387 (0d+0h+0m+30s since start) i! Process=3146; Thread=3146; GPU=NULL; Handle=NULL; StreamId=NULL.
I! CuDNN (v8004) function cudnnSetActivationDescriptor() called: i! mode: type=cudnnActivationMode_t; val=CUDNN_ACTIVATION_IDENTITY (5); i! reluNanOpt: type=cudnnNanPropagation_t; val=CUDNN_NOT_PROPAGATE_NAN (0); i! coef: type=double; val=1.000000; i! Time: 2021-02-10T09:22:33.963429 (0d+0h+0m+30s since start) i! Process=3146; Thread=3146; GPU=NULL; Handle=NULL; StreamId=NULL.
I! CuDNN (v8004) function cudnnConvolutionBiasActivationForward() called: i! handle: type=cudnnHandle_t; streamId=0x3684540; i! alpha1: type=CUDNN_DATA_FLOAT; val=1.000000; i! xDesc: type=cudnnTensorDescriptor_t: i! dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0); i! nbDims: type=int; val=4; i! dimA: type=int; val=[4,4,8,8]; i! strideA: type=int; val=[256,1,32,4]; i! xData: location=dev; addr=0x7f5b53001600; i! wDesc: type=cudnnFilterDescriptor_t: i! dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0); i! vect: type=int; val=0; i! nbDims: type=int; val=4; i! dimA: type=int; val=[4,4,3,3]; i! format: type=cudnnTensorFormat_t; val=CUDNN_TENSOR_NHWC (1); i! wData: location=dev; addr=0x7f5b53002600; i! convDesc: type=cudnnConvolutionDescriptor_t: i! mode: type=cudnnConvolutionMode_t; val=CUDNN_CONVOLUTION (0); i! dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0); i! mathType: type=cudnnMathType_t; val=CUDNN_TENSOR_OP_MATH (1); i! reorderType: type=int; val=0; i! arrayLength: type=int; val=2; i! padA: type=int; val=[0,0]; i! strideA: type=int; val=[1,1]; i! dilationA: type=int; val=[1,1]; i! groupCount: type=int; val=1; i! algo: type=cudnnConvolutionFwdAlgo_t; val=CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM (1); i! workSpace: location=dev; addr=0x7f5b53031200; i! workSpaceSizeInBytes: type=size_t; val=1920; i! alpha2: type=CUDNN_DATA_FLOAT; val=0.000000; i! zDesc: type=cudnnTensorDescriptor_t: i! dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0); i! nbDims: type=int; val=4; i! dimA: type=int; val=[4,4,6,6]; i! strideA: type=int; val=[144,1,24,4]; i! zData: location=dev; addr=0x7f5b53002c00; i! biasDesc: type=cudnnTensorDescriptor_t: i! dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0); i! nbDims: type=int; val=4; i! dimA: type=int; val=[1,4,1,1]; i! strideA: type=int; val=[4,1,4,4]; i! bias: location=dev; addr=0x7f5b53002a00; i! activationDesc: type=cudnnActivationDescriptor_t: i! coef: type=double; val=1.000000; i! mode: type=cudnnActivationMode_t; val=CUDNN_ACTIVATION_IDENTITY (5); i! reluNanOpt: type=cudnnNanPropagation_t; val=CUDNN_NOT_PROPAGATE_NAN (0); i! yDesc: type=cudnnTensorDescriptor_t: i! dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0); i! nbDims: type=int; val=4; i! dimA: type=int; val=[4,4,6,6]; i! strideA: type=int; val=[144,1,24,4]; i! yData: location=dev; addr=0x7f5b53002c00; i! Time: 2021-02-10T09:22:33.963540 (0d+0h+0m+30s since start) i! Process=3146; Thread=3146; GPU=0; Handle=0x33b4820; StreamId=0x3684540.
I! CuDNN (v8004) function cudnnDestroyActivationDescriptor() called: i! Time: 2021-02-10T09:22:48.938916 (0d+0h+0m+45s since start) i! Process=3146; Thread=3146; GPU=NULL; Handle=NULL; StreamId=NULL.
========= CUDA-MEMCHECK
========= Invalid __global__ read of size 4
=========     at 0x00003c20 in volta_scudnn_128x32_sliced1x4_ldg4_relu_exp_small_nhwc_tn_v1
=========     by thread (63,0,0) in block (1,0,0)
=========     Address 0x7f5b53002a7c is out of bounds
=========     Device Frame:volta_scudnn_128x32_sliced1x4_ldg4_relu_exp_small_nhwc_tn_v1 (volta_scudnn_128x32_sliced1x4_ldg4_relu_exp_small_nhwc_tn_v1 : 0x3c20)
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuLaunchKernel + 0x2b8) [0x222dc8]
=========     Host Frame:c1f953962cc13e4a55f7b2333fb212e7f5c08817/lib/libcudnn_cnn_infer.so.8 [0x2acfee1b]
=========     Host Frame:c1f953962cc13e4a55f7b2333fb212e7f5c08817/lib/libcudnn_cnn_infer.so.8 [0x2ad44af5]
=========     Host Frame:c1f953962cc13e4a55f7b2333fb212e7f5c08817/lib/libcudnn_cnn_infer.so.8 [0x2817abda]
=========     Host Frame:c1f953962cc13e4a55f7b2333fb212e7f5c08817/lib/libcudnn_cnn_infer.so.8 (_ZN10cask_cudnn18ImplicitGemmShaderINS_18ImplicitGemmParamsILi8ELi128EEEE3runERNS_7RunInfoEPvPKvS8_S8_S8_S8_S8_P11CUstream_st + 0x2b4) [0x2818d814]
=========     Host Frame:c1f953962cc13e4a55f7b2333fb212e7f5c08817/lib/libcudnn_cnn_infer.so.8 [0x27d240e8]
=========     Host Frame:c1f953962cc13e4a55f7b2333fb212e7f5c08817/lib/libcudnn_cnn_infer.so.8 (_ZN5cudnn3cnn5infer16InferNdSubEngineILb1EL19cudnnTensorFormat_t1ELS3_1ELS3_1EL15cudnnDataType_t0ELb0ELi70ELNS1_9subtree_tE0EE21execute_internal_implERKNS_7backend11VariantPackEP11CUstream_st + 0x125) [0x27d30235]
=========     Host Frame:c1f953962cc13e4a55f7b2333fb212e7f5c08817/lib/libcudnn_cnn_infer.so.8 (_ZN5cudnn3cnn15EngineInterface7executeERKNS_7backend11VariantPackEP11CUstream_st + 0x53) [0x27c06353]
=========     Host Frame:c1f953962cc13e4a55f7b2333fb212e7f5c08817/lib/libcudnn_cnn_infer.so.8 (_ZN5cudnn3cnn15EngineContainerIL24cudnnBackendEngineName_t34ELm113664EE21execute_internal_implERKNS_7backend11VariantPackEP11CUstream_st + 0x10) [0x27c32740]
=========     Host Frame:c1f953962cc13e4a55f7b2333fb212e7f5c08817/lib/libcudnn_cnn_infer.so.8 (_ZN5cudnn3cnn15EngineInterface7executeERKNS_7backend11VariantPackEP11CUstream_st + 0x53) [0x27c06353]
=========     Host Frame:c1f953962cc13e4a55f7b2333fb212e7f5c08817/lib/libcudnn_cnn_infer.so.8 (_ZN5cudnn6fusion32ConvBiasActPatternMatchingEngineINS_3cnn15EngineContainerIL24cudnnBackendEngineName_t34ELm113664EEELS4_4020ESt17integral_constantIiLin1EEE21execute_internal_implERKNS_7backend11VariantPackEP11CUstream_st + 0x3a) [0x27c8ca9a]
=========     Host Frame:c1f953962cc13e4a55f7b2333fb212e7f5c08817/lib/libcudnn_cnn_infer.so.8 (_ZN5cudnn3cnn15EngineInterface7executeERKNS_7backend11VariantPackEP11CUstream_st + 0x53) [0x27c06353]
=========     Host Frame:c1f953962cc13e4a55f7b2333fb212e7f5c08817/lib/libcudnn_cnn_infer.so.8 (_ZN5cudnn7backend7executeEP12cudnnContextRNS0_13ExecutionPlanERNS0_11VariantPackE + 0xe0) [0x27c0da50]
=========     Host Frame:c1f953962cc13e4a55f7b2333fb212e7f5c08817/lib/libcudnn_cnn_infer.so.8 (_ZN5cudnn7backend14EnginesAlgoMapI25cudnnConvolutionFwdAlgo_tLi8EE15execute_wrapperEP12cudnnContextS2_RNS0_13ExecutionPlanERNS0_11VariantPackE + 0x3c) [0x27d0e72c]
=========     Host Frame:c1f953962cc13e4a55f7b2333fb212e7f5c08817/lib/libcudnn_cnn_infer.so.8 (_ZN5cudnn7backend32convolutionBiasActivationForwardEP12cudnnContextPKvP17cudnnTensorStructS4_P17cudnnFilterStructS4_P22cudnnConvolutionStruct25cudnnConvolutionFwdAlgo_tPvmS4_S6_S4_S6_S4_P21cudnnActivationStructS6_SC_ + 0x9bd) [0x27d0b22d]
=========     Host Frame:c1f953962cc13e4a55f7b2333fb212e7f5c08817/lib/libcudnn_cnn_infer.so.8 (cudnnConvolutionBiasActivationForward + 0x221) [0x27e0b1d1]
=========     Host Frame:[0x7f5c723b881b]
=========     Host Frame:[0x7f5c723ba306]
=========     Host Frame:[0x7f5c723ba684]
=========     Host Frame:[0x7f5c723ba78c]
=========     Host Frame:bin/../lib/julia/libjulia-internal.so.1 (jl_apply_generic + 0x1fa) [0xb7e4a]
=========     Host Frame:[0x7f5c723a71b4]
=========     Host Frame:[0x7f5c723a724d]
=========     Host Frame:[0x7f5c723a72ad]
=========     Host Frame:bin/../lib/julia/libjulia-internal.so.1 (jl_apply_generic + 0x1fa) [0xb7e4a]
=========     Host Frame:bin/../lib/julia/libjulia-internal.so.1 [0xd3f76]
=========     Host Frame:bin/../lib/julia/libjulia-internal.so.1 [0xd3bce]
=========     Host Frame:bin/../lib/julia/libjulia-internal.so.1 [0xd4872]
=========     Host Frame:bin/../lib/julia/libjulia-internal.so.1 [0xd52f8]
=========     Host Frame:bin/../lib/julia/libjulia-internal.so.1 [0xf0c62]
=========     Host Frame:bin/../lib/julia/libjulia-internal.so.1 [0xf0ead]
=========     Host Frame:bin/../lib/julia/libjulia-internal.so.1 (jl_toplevel_eval_in + 0xaa) [0xf2a7a]
=========     Host Frame:lib/julia/sys.so [0xbab258]
=========     Host Frame:bin/../lib/julia/libjulia-internal.so.1 (jl_apply_generic + 0x1fa) [0xb7e4a]
=========     Host Frame:lib/julia/sys.so [0xc0a816]
=========     Host Frame:lib/julia/sys.so [0xc0a1f6]
=========     Host Frame:bin/../lib/julia/libjulia-internal.so.1 (jl_apply_generic + 0x1fa) [0xb7e4a]
=========     Host Frame:lib/julia/sys.so [0x8b5913]
=========     Host Frame:lib/julia/sys.so [0x8b753b]
=========     Host Frame:lib/julia/sys.so [0x8b76a6]
=========     Host Frame:bin/../lib/julia/libjulia-internal.so.1 (jl_apply_generic + 0x1fa) [0xb7e4a]
=========     Host Frame:bin/../lib/julia/libjulia-internal.so.1 [0x1135d6]
=========     Host Frame:bin/../lib/julia/libjulia-internal.so.1 (repl_entrypoint + 0x8d) [0x113f7d]
=========     Host Frame:julia (main + 0x9) [0x7a9]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xeb) [0x2409b]
=========     Host Frame:julia [0x7d9]
```
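A quick sanity check on the faulting address (plain Python, just arithmetic on the values printed in the log above) suggests the kernel is reading well past the end of the bias buffer:

```python
# Values copied from the log: "bias: location=dev; addr=..." and the
# memcheck line "Address ... is out of bounds".
bias_addr = 0x7f5b53002a00
oob_addr  = 0x7f5b53002a7c
elem_size = 4                        # CUDNN_DATA_FLOAT

offset_bytes = oob_addr - bias_addr
offset_elems = offset_bytes // elem_size
print(offset_bytes, offset_elems)    # 124 bytes -> float element index 31

bias_elems = 1 * 4 * 1 * 1           # biasDesc dimA=[1,4,1,1]
print(offset_elems >= bias_elems)    # True: 27 elements past the 4-float bias
```

So the invalid read lands 31 float elements into a 4-element bias, which points at the NHWC kernel's bias handling rather than at the data pointers themselves.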
Fixed by denizyuret in commit 15f4405: fix JuliaGPU#702: cudnnConvolutionForward fails memory checking