Change filter type back to NCHW, since it supports more algos. #53

Merged
2 commits merged into danpovey:cudnn on Oct 24, 2018

Conversation

@galv commented on Oct 23, 2018

Convert the assertion to a warning, although I am not sure this works: if an algorithm fails to be found because of an out-of-memory error, it is likely to fail at training time for the same reason.

This is the current error I am getting after these changes, when I run make -j convolution-cudnn-test && CUDNN_LOGINFO_DBG=1 CUDNN_LOGDEST_DBG=stderr ./convolution-cudnn-test 2>&1 | tee:

I! CuDNN (v7102) function cudnnGetConvolutionBackwardDataWorkspaceSize() called:
i!     handle: type=cudnnHandle_t; streamId=0x2;
i!     wDesc: type=cudnnFilterDescriptor_t:
i!         dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0);
i!         vect: type=int; val=0;
i!         nbDims: type=int; val=4;
i!         dimA: type=int; val=[2,9,1,2];
i!         format: type=cudnnTensorFormat_t; val=CUDNN_TENSOR_NCHW (0);
i!     dyDesc: type=cudnnTensorDescriptor_t:
i!         dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0);
i!         nbDims: type=int; val=4;
i!         dimA: type=int; val=[4,2,11,7];
i!         strideA: type=int; val=[154,1,14,2];
i!     convDesc: type=cudnnConvolutionDescriptor_t:
i!         mode: type=cudnnConvolutionMode_t; val=CUDNN_CROSS_CORRELATION (1);
i!         dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0);
i!         mathType: type=cudnnMathType_t; val=CUDNN_DEFAULT_MATH (0);
i!         arrayLength: type=int; val=2;
i!         padA: type=int; val=[1,1];
i!         strideA: type=int; val=[2,2];
i!         dilationA: type=int; val=[1,2];
i!         groupCount: type=int; val=1;
i!     dxDesc: type=cudnnTensorDescriptor_t:
i!         dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0);
i!         nbDims: type=int; val=4;
i!         dimA: type=int; val=[4,9,19,13];
i!         strideA: type=int; val=[2223,1,117,9];
i!     algo: type=cudnnConvolutionBwdDataAlgo_t; val=CUDNN_CONVOLUTION_BWD_DATA_ALGO_0 (0);
i! Time: 2018-10-23T03:18:18.896738 (0d+0h+0m+1s since start)
i! Process=6906; Thread=6906; Handle=0x557409815c20; StreamId=0x2.

ERROR ([5.5.97-b5d202]:ComputeTempSpaceSizes():convolution-cudnn.cc:339) cudnnStatus_t 9 : "CUDNN_STATUS_NOT_SUPPORTED" returned from 'cudnnGetConvolutionBackwardDataWorkspaceSize( CuDevice::Instantiate().GetCudnnHandle(), params_desc_, output_desc_, conv_desc_, input_desc_, bwd_data_algo_, &temp_space_required_backward_data_)'

[ Stack-Trace: ]
kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::FatalMessageLogger::~FatalMessageLogger()
kaldi::nnet3::cudnn_convolution::ConvolutionComputation::ComputeTempSpaceSizes()
kaldi::nnet3::cudnn_convolution::ConvolutionComputation::InitCudnn()
kaldi::nnet3::cudnn_convolution::ConvolutionComputation::ConvolutionComputation(kaldi::nnet3::cudnn_convolution::ConvolutionComputationConfig const&)
kaldi::nnet3::cudnn_convolution::TestConvolutionComputation()
main
__libc_start_main
_start

terminate called after throwing an instance of 'std::runtime_error'
  what():

Commit: Convert assertion to warning, although I am not sure this works, since if an algorithm fails to be found because of an out-of-memory error, it is likely to fail at training time for the same reason.
@galv (author) commented on Oct 23, 2018

I ran this on a K10. A quick fix is probably to try a newer GPU with more memory, like the K80.

@galv (author) commented on Oct 23, 2018

Same problem on a 1080 Ti, so it's not likely to be running out of memory:

I! CuDNN (v7102) function cudnnGetConvolutionBackwardDataWorkspaceSize() called:
i!     handle: type=cudnnHandle_t; streamId=0x2;
i!     wDesc: type=cudnnFilterDescriptor_t:
i!         dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0);
i!         vect: type=int; val=0;
i!         nbDims: type=int; val=4;
i!         dimA: type=int; val=[2,5,1,3];
i!         format: type=cudnnTensorFormat_t; val=CUDNN_TENSOR_NCHW (0);
i!     dyDesc: type=cudnnTensorDescriptor_t:
i!         dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0);
i!         nbDims: type=int; val=4;
i!         dimA: type=int; val=[5,2,19,8];
i!         strideA: type=int; val=[304,1,16,2];
i!     convDesc: type=cudnnConvolutionDescriptor_t:
i!         mode: type=cudnnConvolutionMode_t; val=CUDNN_CROSS_CORRELATION (1);
i!         dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0);
i!         mathType: type=cudnnMathType_t; val=CUDNN_DEFAULT_MATH (0);
i!         arrayLength: type=int; val=2;
i!         padA: type=int; val=[0,0];
i!         strideA: type=int; val=[1,2];
i!         dilationA: type=int; val=[1,2];
i!         groupCount: type=int; val=1;
i!     dxDesc: type=cudnnTensorDescriptor_t:   
i!         dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0);
i!         nbDims: type=int; val=4;
i!         dimA: type=int; val=[5,5,19,20];
i!         strideA: type=int; val=[1900,1,100,5];
i!     algo: type=cudnnConvolutionBwdDataAlgo_t; val=CUDNN_CONVOLUTION_BWD_DATA_ALGO_0 (0);
i! Time: 2018-10-23T12:17:15.081096 (0d+0h+0m+9s since start)
i! Process=44887; Thread=44887; Handle=0x55b36456e920; StreamId=0x2.

ERROR ([5.5.97-b5d202]:ComputeTempSpaceSizes():convolution-cudnn.cc:339) cudnnStatus_t 9 : "CUDNN_STATUS_NOT_SUPPORTED" returned from 'cudnnGetConvolutionBackwardDataWorkspaceSize( CuDevice::Instantiate().GetCudnnHandle(), params_desc_, output_desc_, conv_desc_, input_desc_, bwd_data_algo_, &temp_space_required_backward_data_)'

[ Stack-Trace: ]
kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::FatalMessageLogger::~FatalMessageLogger()
kaldi::nnet3::cudnn_convolution::ConvolutionComputation::ComputeTempSpaceSizes()
kaldi::nnet3::cudnn_convolution::ConvolutionComputation::InitCudnn()
kaldi::nnet3::cudnn_convolution::ConvolutionComputation::ConvolutionComputation(kaldi::nnet3::cudnn_convolution::ConvolutionComputationConfig const&)
kaldi::nnet3::cudnn_convolution::TestConvolutionComputation()
main
__libc_start_main
_start

terminate called after throwing an instance of 'std::runtime_error'
  what():

@galv (author) commented on Oct 23, 2018

There's a bug in CUDNN.

Calling:

  CUDNN_SAFE_CALL(
      cudnnSetTensor4dDescriptor(input_desc_, CUDNN_TENSOR_NHWC,
                                 CUDNN_DATA_BASEFLOAT, c.num_images,
                                 c.num_channels_in, c.input_image_width,
                                 c.input_image_height));

From what I can tell, this sets the strides as if the data format were NCHW. We can work around it by calling cudnnSetTensor4dDescriptorEx directly, so that we can set the strides ourselves. That said, the fact that this bug has been sitting around makes me a little concerned about the accuracy of the documentation on which formats are compatible with which algos.

If I change the two occurrences of CUDNN_TENSOR_NHWC to CUDNN_TENSOR_NCHW in this PR, the test will pass.

cc @danpovey

@danpovey (owner) commented
OK thanks. For nnet3 reasons, CUDNN_TENSOR_NCHW won't work.

Also, it seems to me there must be more than one bug in CUDNN, because apart from setting those strides inappropriately, it also segfaults later on, which surely it shouldn't do.

Regarding checking those things instead of crashing: since we are not specifying the memory limit (memoryLimitInBytes) in the cudnnGet*Algorithm() calls, I don't think they would fail for a memory reason, so I think it's valid to just crash. Initially it was failing to find any backward algorithm at all when I used the other layout for the filters, so that would presumably be considered a code error.
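
For reference, a minimal sketch of the kind of cudnnGet*Algorithm() call in question, using cuDNN 7's cudnnGetConvolutionBackwardDataAlgorithm (the handle and descriptor names here are placeholders, not the Kaldi code); the memory limit is only honored when the SPECIFY_WORKSPACE_LIMIT preference is passed:

  // Sketch only: ask cuDNN for a backward-data algorithm while capping the
  // workspace it may assume, instead of using
  // CUDNN_CONVOLUTION_BWD_DATA_PREFER_FASTEST (which ignores memoryLimitInBytes).
  cudnnConvolutionBwdDataAlgo_t algo;
  CUDNN_SAFE_CALL(cudnnGetConvolutionBackwardDataAlgorithm(
      handle, w_desc, dy_desc, conv_desc, dx_desc,
      CUDNN_CONVOLUTION_BWD_DATA_SPECIFY_WORKSPACE_LIMIT,
      /* memoryLimitInBytes = */ 64 * 1024 * 1024,  // e.g. a 64 MB cap
      &algo));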

@danpovey (owner) commented on Oct 23, 2018 via email

@galv (author) commented on Oct 23, 2018

Yes, my initial expectation about memory was wrong.

I know we can't use CUDNN_TENSOR_NCHW for the input and output images, for nnet3 reasons (but it's okay for filters, right? Please confirm).
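
To be concrete, a minimal sketch of what "NCHW filters with NHWC image tensors" would look like (the descriptor name params_desc_ comes from the error message above; the dimension variables are placeholders, and this is not the PR code):

  // Sketch only: keep the filter descriptor in NCHW while the image tensor
  // descriptors stay NHWC. cudnnSetFilter4dDescriptor takes (k, c, h, w).
  CUDNN_SAFE_CALL(
      cudnnSetFilter4dDescriptor(params_desc_, CUDNN_DATA_BASEFLOAT,
                                 CUDNN_TENSOR_NCHW,
                                 num_filters_out, num_channels_in,
                                 filter_height, filter_width));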

For debugging, my biggest recommendation is this:

CUDNN_LOGINFO_DBG=1 CUDNN_LOGDEST_DBG=stderr ./convolution-cudnn-test

It logs every CUDNN call made (even those made internally by the library itself), including the formal parameters to each call. I then reasoned about what was calling each CUDNN function, to see where the invalid inputs were coming from.

In this case, I saw this excerpt:

I! CuDNN (v7102) function cudnnSetTensor4dDescriptor() called:
i!     format: type=cudnnTensorFormat_t; val=CUDNN_TENSOR_NHWC (1);
i!     dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0);
i!     n: type=int; val=8;
i!     c: type=int; val=20;
i!     h: type=int; val=13;
i!     w: type=int; val=12;
i! Time: 2018-10-23T12:31:54.659314 (0d+0h+0m+2s since start)
i! Process=48634; Thread=48634; Handle=NULL; StreamId=NULL.


I! CuDNN (v7102) function cudnnSetTensor4dDescriptorEx() called:
i!     dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0);
i!     n: type=int; val=8;
i!     c: type=int; val=20;
i!     h: type=int; val=13;
i!     w: type=int; val=12;
i!     nStride: type=int; val=3120;
i!     cStride: type=int; val=1;
i!     hStride: type=int; val=240;
i!     wStride: type=int; val=20;
i! Time: 2018-10-23T12:31:54.659350 (0d+0h+0m+2s since start)
i! Process=48634; Thread=48634; Handle=NULL; StreamId=NULL.


I! CuDNN (v7102) function cudnnSetTensorNdDescriptor() called:
i!     dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0);
i!     nbDims: type=int; val=4;
i!     dimA: type=int; val=[8,20,13,12];
i!     strideA: type=int; val=[3120,1,240,20];
i! Time: 2018-10-23T12:31:54.659393 (0d+0h+0m+2s since start)
i! Process=48634; Thread=48634; Handle=NULL; StreamId=NULL.

where only the first call is a call that we make from Kaldi (everything else is just functions deferring to one another). The strides looked incorrect to me. The bug is that cudnnSetTensor4dDescriptor() seems to think that we specified CUDNN_TENSOR_NCHW, rather than CUDNN_TENSOR_NHWC. As I mentioned, we can work around this by calling cudnnSetTensor4dDescriptorEx() directly, with the appropriate strides for CUDNN_TENSOR_NHWC. I just haven't done it yet since I need to get to work.
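
A minimal sketch of that workaround (placeholder code, not yet in the PR at this point; variable names follow the snippet quoted earlier), computing the NHWC strides by hand and passing them to cudnnSetTensor4dDescriptorEx:

  // Sketch only: for NHWC the channel dimension is the fastest-varying one,
  // so the strides are nStride = h*w*c, hStride = w*c, wStride = c, cStride = 1.
  // Dimensions are passed in the same order as the existing
  // cudnnSetTensor4dDescriptor call.
  int n = c.num_images, chans = c.num_channels_in,
      h = c.input_image_width, w = c.input_image_height;
  CUDNN_SAFE_CALL(
      cudnnSetTensor4dDescriptorEx(input_desc_, CUDNN_DATA_BASEFLOAT,
                                   n, chans, h, w,
                                   /* nStride = */ h * w * chans,
                                   /* cStride = */ 1,
                                   /* hStride = */ w * chans,
                                   /* wStride = */ chans));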

@danpovey (owner) commented on Oct 23, 2018 via email

Commit: Use cudnnSetTensor4dDescriptor with strides we calculate ourselves instead.
@galv (author) commented on Oct 24, 2018

@danpovey Okay, the current code will run without any errors.

@danpovey (owner) commented on Oct 24, 2018 via email

@danpovey (owner) commented on Oct 24, 2018 via email

@danpovey merged commit e4d3383 into danpovey:cudnn on Oct 24, 2018