Change filter type back to NCHW, since it supports more algos. #53

Merged
2 commits merged into danpovey:cudnn on Oct 24, 2018

Conversation

@galv commented on Oct 23, 2018

Convert the assertion to a warning, although I am not sure this works: if an algorithm fails to be found because of an out-of-memory error, it is likely to fail at training time for the same reason.

This is the current error I am getting after these changes, when I run make -j convolution-cudnn-test && CUDNN_LOGINFO_DBG=1 CUDNN_LOGDEST_DBG=stderr ./convolution-cudnn-test 2>&1 | tee:

I! CuDNN (v7102) function cudnnGetConvolutionBackwardDataWorkspaceSize() called:
i!     handle: type=cudnnHandle_t; streamId=0x2;
i!     wDesc: type=cudnnFilterDescriptor_t:
i!         dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0);
i!         vect: type=int; val=0;
i!         nbDims: type=int; val=4;
i!         dimA: type=int; val=[2,9,1,2];
i!         format: type=cudnnTensorFormat_t; val=CUDNN_TENSOR_NCHW (0);
i!     dyDesc: type=cudnnTensorDescriptor_t:
i!         dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0);
i!         nbDims: type=int; val=4;
i!         dimA: type=int; val=[4,2,11,7];
i!         strideA: type=int; val=[154,1,14,2];
i!     convDesc: type=cudnnConvolutionDescriptor_t:
i!         mode: type=cudnnConvolutionMode_t; val=CUDNN_CROSS_CORRELATION (1);
i!         dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0);
i!         mathType: type=cudnnMathType_t; val=CUDNN_DEFAULT_MATH (0);
i!         arrayLength: type=int; val=2;
i!         padA: type=int; val=[1,1];
i!         strideA: type=int; val=[2,2];
i!         dilationA: type=int; val=[1,2];
i!         groupCount: type=int; val=1;
i!     dxDesc: type=cudnnTensorDescriptor_t:
i!         dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0);
i!         nbDims: type=int; val=4;
i!         dimA: type=int; val=[4,9,19,13];
i!         strideA: type=int; val=[2223,1,117,9];
i!     algo: type=cudnnConvolutionBwdDataAlgo_t; val=CUDNN_CONVOLUTION_BWD_DATA_ALGO_0 (0);
i! Time: 2018-10-23T03:18:18.896738 (0d+0h+0m+1s since start)
i! Process=6906; Thread=6906; Handle=0x557409815c20; StreamId=0x2.

ERROR ([5.5.97-b5d202]:ComputeTempSpaceSizes():convolution-cudnn.cc:339) cudnnStatus_t 9 : "CUDNN_STATUS_NOT_SUPPORTED" returned from 'cudnnGetConvolutionBackwardDataWorkspaceSize( CuDevice::Instantiate().GetCudnnHandle(), params_desc_, output_desc_, conv_desc_, input_desc_, bwd_data_algo_, &temp_space_required_backward_data_)'

[ Stack-Trace: ]
kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::FatalMessageLogger::~FatalMessageLogger()
kaldi::nnet3::cudnn_convolution::ConvolutionComputation::ComputeTempSpaceSizes()
kaldi::nnet3::cudnn_convolution::ConvolutionComputation::InitCudnn()
kaldi::nnet3::cudnn_convolution::ConvolutionComputation::ConvolutionComputation(kaldi::nnet3::cudnn_convolution::ConvolutionComputationConfig const&)
kaldi::nnet3::cudnn_convolution::TestConvolutionComputation()
main
__libc_start_main
_start

terminate called after throwing an instance of 'std::runtime_error'
  what():

Commit: Convert assertion to warning, although I am not sure this works, since if an algorithm fails to be found because of an out-of-memory error, it is likely to fail at training time for the same reason.
@galv (author) commented on Oct 23, 2018

I ran this on a K10. A quick fix is probably to try a newer GPU with more memory, like the K80.

@galv (author) commented on Oct 23, 2018

Same problem on a 1080 Ti, so it's not likely to be running out of memory:

I! CuDNN (v7102) function cudnnGetConvolutionBackwardDataWorkspaceSize() called:
i!     handle: type=cudnnHandle_t; streamId=0x2;
i!     wDesc: type=cudnnFilterDescriptor_t:
i!         dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0);
i!         vect: type=int; val=0;
i!         nbDims: type=int; val=4;
i!         dimA: type=int; val=[2,5,1,3];
i!         format: type=cudnnTensorFormat_t; val=CUDNN_TENSOR_NCHW (0);
i!     dyDesc: type=cudnnTensorDescriptor_t:
i!         dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0);
i!         nbDims: type=int; val=4;
i!         dimA: type=int; val=[5,2,19,8];
i!         strideA: type=int; val=[304,1,16,2];
i!     convDesc: type=cudnnConvolutionDescriptor_t:
i!         mode: type=cudnnConvolutionMode_t; val=CUDNN_CROSS_CORRELATION (1);
i!         dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0);
i!         mathType: type=cudnnMathType_t; val=CUDNN_DEFAULT_MATH (0);
i!         arrayLength: type=int; val=2;
i!         padA: type=int; val=[0,0];
i!         strideA: type=int; val=[1,2];
i!         dilationA: type=int; val=[1,2];
i!         groupCount: type=int; val=1;
i!     dxDesc: type=cudnnTensorDescriptor_t:   
i!         dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0);
i!         nbDims: type=int; val=4;
i!         dimA: type=int; val=[5,5,19,20];
i!         strideA: type=int; val=[1900,1,100,5];
i!     algo: type=cudnnConvolutionBwdDataAlgo_t; val=CUDNN_CONVOLUTION_BWD_DATA_ALGO_0 (0);
i! Time: 2018-10-23T12:17:15.081096 (0d+0h+0m+9s since start)
i! Process=44887; Thread=44887; Handle=0x55b36456e920; StreamId=0x2.

ERROR ([5.5.97-b5d202]:ComputeTempSpaceSizes():convolution-cudnn.cc:339) cudnnStatus_t 9 : "CUDNN_STATUS_NOT_SUPPORTED" returned from 'cudnnGetConvolutionBackwardDataWorkspaceSize( CuDevice::Instantiate().GetCudnnHandle(), params_desc_, output_desc_, conv_desc_, input_desc_, bwd_data_algo_, &temp_space_required_backward_data_)'

[ Stack-Trace: ]
kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::FatalMessageLogger::~FatalMessageLogger()
kaldi::nnet3::cudnn_convolution::ConvolutionComputation::ComputeTempSpaceSizes()
kaldi::nnet3::cudnn_convolution::ConvolutionComputation::InitCudnn()
kaldi::nnet3::cudnn_convolution::ConvolutionComputation::ConvolutionComputation(kaldi::nnet3::cudnn_convolution::ConvolutionComputationConfig const&)
kaldi::nnet3::cudnn_convolution::TestConvolutionComputation()
main
__libc_start_main
_start

terminate called after throwing an instance of 'std::runtime_error'
  what():

@galv (author) commented on Oct 23, 2018

There's a bug in CUDNN.

Calling:

  CUDNN_SAFE_CALL(
      cudnnSetTensor4dDescriptor(input_desc_, CUDNN_TENSOR_NHWC,
                                 CUDNN_DATA_BASEFLOAT, c.num_images,
                                 c.num_channels_in, c.input_image_width,
                                 c.input_image_height));

From what I can tell, this sets the strides as if the data format were NCHW. We can work around it by calling cudnnSetTensor4dDescriptorEx directly, so that we can set the strides ourselves. That said, the fact that this bug has been sitting around makes me a little concerned about the accuracy of the documentation on which formats are compatible with which algos.

If I change the two occurrences of CUDNN_TENSOR_NHWC to CUDNN_TENSOR_NCHW in this PR, the test will pass.

cc @danpovey

@danpovey (owner) commented
OK thanks. For nnet3 reasons, CUDNN_TENSOR_NCHW won't work.

Also, it seems to me there must be more than one bug in CUDNN, because apart from setting those strides inappropriately, it also segfaults later on, which surely it shouldn't do.

Regarding checking those things instead of crashing: since we are not specifying the memory limit (memoryLimitInBytes) in the cudnnGet*Algorithm() calls, I don't think they would fail for a memory reason, so I think it's valid to just crash. Initially it was failing to find any backward algorithm at all when I used the other layout for the filters, so that would presumably be considered a code error.
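
For reference, a minimal sketch of the kind of cudnnGet*Algorithm() call in question, using cuDNN 7's cudnnGetConvolutionBackwardDataAlgorithm (the handle and descriptor names here are placeholders, not the Kaldi code); the memory limit is only honored when the SPECIFY_WORKSPACE_LIMIT preference is passed:

  // Sketch only: ask cuDNN for a backward-data algorithm while capping the
  // workspace it may assume, instead of using
  // CUDNN_CONVOLUTION_BWD_DATA_PREFER_FASTEST (which ignores memoryLimitInBytes).
  cudnnConvolutionBwdDataAlgo_t algo;
  CUDNN_SAFE_CALL(cudnnGetConvolutionBackwardDataAlgorithm(
      handle, w_desc, dy_desc, conv_desc, dx_desc,
      CUDNN_CONVOLUTION_BWD_DATA_SPECIFY_WORKSPACE_LIMIT,
      /* memoryLimitInBytes = */ 64 * 1024 * 1024,  // e.g. a 64 MB cap
      &algo));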

@danpovey (owner) commented on Oct 23, 2018 via email

@galv (author) commented on Oct 23, 2018

Yes, my initial expectation about memory was wrong.

I know we can't use CUDNN_TENSOR_NCHW for the input and output images, for nnet3 reasons (but it's okay for filters, right? Please confirm).
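
To be concrete, a minimal sketch of what "NCHW filters with NHWC image tensors" would look like (the descriptor name params_desc_ comes from the error message above; the dimension variables are placeholders, and this is not the PR code):

  // Sketch only: keep the filter descriptor in NCHW while the image tensor
  // descriptors stay NHWC. cudnnSetFilter4dDescriptor takes (k, c, h, w).
  CUDNN_SAFE_CALL(
      cudnnSetFilter4dDescriptor(params_desc_, CUDNN_DATA_BASEFLOAT,
                                 CUDNN_TENSOR_NCHW,
                                 num_filters_out, num_channels_in,
                                 filter_height, filter_width));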

For debugging, my biggest recommendation is this:

CUDNN_LOGINFO_DBG=1 CUDNN_LOGDEST_DBG=stderr ./convolution-cudnn-test

It logs every CUDNN call made (even those made internally by the library itself), including the formal parameters to each call. I then reasoned about what was calling each CUDNN function, to see where the invalid inputs were coming from.

In this case, I saw this excerpt:

I! CuDNN (v7102) function cudnnSetTensor4dDescriptor() called:
i!     format: type=cudnnTensorFormat_t; val=CUDNN_TENSOR_NHWC (1);
i!     dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0);
i!     n: type=int; val=8;
i!     c: type=int; val=20;
i!     h: type=int; val=13;
i!     w: type=int; val=12;
i! Time: 2018-10-23T12:31:54.659314 (0d+0h+0m+2s since start)
i! Process=48634; Thread=48634; Handle=NULL; StreamId=NULL.


I! CuDNN (v7102) function cudnnSetTensor4dDescriptorEx() called:
i!     dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0);
i!     n: type=int; val=8;
i!     c: type=int; val=20;
i!     h: type=int; val=13;
i!     w: type=int; val=12;
i!     nStride: type=int; val=3120;
i!     cStride: type=int; val=1;
i!     hStride: type=int; val=240;
i!     wStride: type=int; val=20;
i! Time: 2018-10-23T12:31:54.659350 (0d+0h+0m+2s since start)
i! Process=48634; Thread=48634; Handle=NULL; StreamId=NULL.


I! CuDNN (v7102) function cudnnSetTensorNdDescriptor() called:
i!     dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0);
i!     nbDims: type=int; val=4;
i!     dimA: type=int; val=[8,20,13,12];
i!     strideA: type=int; val=[3120,1,240,20];
i! Time: 2018-10-23T12:31:54.659393 (0d+0h+0m+2s since start)
i! Process=48634; Thread=48634; Handle=NULL; StreamId=NULL.

where only the first call is a call that we make from Kaldi (everything else is just functions deferring to one another). The strides looked incorrect to me. The bug is that cudnnSetTensor4dDescriptor() seems to think that we specified CUDNN_TENSOR_NCHW, rather than CUDNN_TENSOR_NHWC. As I mentioned, we can work around this by calling cudnnSetTensor4dDescriptorEx() directly, with the appropriate strides for CUDNN_TENSOR_NHWC. I just haven't done it yet since I need to get to work.
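
A minimal sketch of that workaround (placeholder code, not yet in the PR at this point; variable names follow the snippet quoted earlier), computing the NHWC strides by hand and passing them to cudnnSetTensor4dDescriptorEx:

  // Sketch only: for NHWC the channel dimension is the fastest-varying one,
  // so the strides are nStride = h*w*c, hStride = w*c, wStride = c, cStride = 1.
  // Dimensions are passed in the same order as the existing
  // cudnnSetTensor4dDescriptor call.
  int n = c.num_images, chans = c.num_channels_in,
      h = c.input_image_width, w = c.input_image_height;
  CUDNN_SAFE_CALL(
      cudnnSetTensor4dDescriptorEx(input_desc_, CUDNN_DATA_BASEFLOAT,
                                   n, chans, h, w,
                                   /* nStride = */ h * w * chans,
                                   /* cStride = */ 1,
                                   /* hStride = */ w * chans,
                                   /* wStride = */ chans));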

@danpovey (owner) commented on Oct 23, 2018 via email

Commit: Use cudnnSetTensor4dDescriptor with strides we calculate ourselves instead.
@galv (author) commented on Oct 24, 2018

@danpovey Okay, the current code will run without any errors.

@danpovey (owner) commented on Oct 24, 2018 via email

@danpovey (owner) commented on Oct 24, 2018 via email

@danpovey merged commit e4d3383 into danpovey:cudnn on Oct 24, 2018