Make CUDNN tests eagerly invoke at-test for better error reporting. #710

Merged: 1 commit merged into master from tb/cudnn_tests on Feb 12, 2021

Conversation

maleadt (Member) commented on Feb 12, 2021

Test failures are currently pretty vague, with the @test on the outside:

cudnn/convolution: Test Failed at /var/lib/buildkite-agent/builds/rtx2080-hydor-elis-ugent-be/julialang/cuda-dot-jl/test/cudnn/convolution.jl:167
  Expression: convtest(bias = cb, group = 2)
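
For context, here is a minimal, self-contained sketch of the before/after pattern (hypothetical cpu_conv/gpu_conv helpers standing in for the real CUDNN wrappers; the actual helper lives in test/cudnn/convolution.jl):

using Test

# Hypothetical stand-ins for the real routines, just to show the reporting difference.
cpu_conv(x) = x .+ 1f0
gpu_conv(x) = x .+ 1f0          # pretend: cudnnConvolutionForward(...) |> Array

# Before: the helper chains every comparison into a single Bool, so a failure
# is reported only as the outer call, e.g. `Expression: convtest(bias = cb, group = 2)`.
convtest_before(x) = cpu_conv(x) ≈ gpu_conv(x) && cpu_conv(2x) ≈ gpu_conv(2x)
@test convtest_before(rand(Float32, 4))

# After: each comparison carries its own @test, so a failure points at the
# exact expression and source line inside the helper.
function convtest_after(x)
    @test cpu_conv(x)  ≈ gpu_conv(x)
    @test cpu_conv(2x) ≈ gpu_conv(2x)
end
convtest_after(rand(Float32, 4))

With the @test on the inside, a failing comparison also prints its evaluated left- and right-hand sides rather than just the helper call.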

Moving the @test inside should reveal exactly which comparison fails. Meanwhile, I've also seen:

Error in testset cudnn/convolution:
Error During Test at /home/tim/Julia/pkg/CUDA/test/cudnn/convolution.jl:142
  Test threw exception
  Expression: ay1 ≈ cudnnConvolutionForward(cw0, cx; bias, activation, mode, padding, stride, dilation, group, mathType, reorderType, alpha) |> Array
  CUDNNError: CUDNN_STATUS_BAD_PARAM (code 3)
  Stacktrace:
    [3] cudnnConvolutionForward(handle::Ptr{Nothing}, alpha::Base.RefValue{Float32}, xDesc::CUDA.CUDNN.cudnnTensorDescriptor, x::CuArray{Float32, 4}, wDesc::CUDA.CUDNN.cudnnFilterDescriptor, w::CuArray{Float32, 4}, convDesc::cudnnConvolutionDescriptor, algo::cudnnConvolutionFwdAlgo_t, workSpace::CuArray{UInt8, 1}, workSpaceSizeInBytes::Int64, beta::Base.RefValue{Float32}, yDesc::CUDA.CUDNN.cudnnTensorDescriptor, y::CuArray{Float32, 4})
      @ CUDA.CUDNN ~/Julia/pkg/CUDA/lib/utils/call.jl:26
    [4] macro expansion
      @ ~/Julia/pkg/CUDA/lib/cudnn/convolution.jl:105 [inlined]
    [5] macro expansion
      @ ~/Julia/pkg/CUDA/lib/utils/call.jl:144 [inlined]
    [6] cudnnConvolutionForwardAD(w::CuArray{Float32, 4}, x::CuArray{Float32, 4}, bias::Nothing, z::Nothing; y::CuArray{Float32, 4}, activation::cudnnActivationMode_t, convDesc::cudnnConvolutionDescriptor, wDesc::CUDA.CUDNN.cudnnFilterDescriptor, xDesc::CUDA.CUDNN.cudnnTensorDescriptor, yDesc::CUDA.CUDNN.cudnnTensorDescriptor, zDesc::Nothing, biasDesc::Nothing, alpha::Base.RefValue{Float32}, beta::Base.RefValue{Float32}, dw::Base.RefValue{Any}, dx::Base.RefValue{Any}, dz::Base.RefValue{Any}, dbias::Base.RefValue{Any}, dready::Base.RefValue{Bool})
      @ CUDA.CUDNN ~/Julia/pkg/CUDA/lib/cudnn/convolution.jl:103
    [7] cudnnConvolutionForwardWithDefaults(w::CuArray{Float32, 4}, x::CuArray{Float32, 4}; padding::Int64, stride::Int64, dilation::Int64, mode::cudnnConvolutionMode_t, mathType::cudnnMathType_t, reorderType::cudnnReorderType_t, group::Int64, format::cudnnTensorFormat_t, convDesc::cudnnConvolutionDescriptor, xDesc::CUDA.CUDNN.cudnnTensorDescriptor, wDesc::CUDA.CUDNN.cudnnFilterDescriptor, y::CuArray{Float32, 4}, yDesc::CUDA.CUDNN.cudnnTensorDescriptor, alpha::Int64, beta::Int64, bias::Nothing, z::Nothing, biasDesc::Nothing, zDesc::Nothing, activation::cudnnActivationMode_t, dw::Base.RefValue{Any}, dx::Base.RefValue{Any}, dz::Base.RefValue{Any}, dbias::Base.RefValue{Any})
      @ CUDA.CUDNN ~/Julia/pkg/CUDA/lib/cudnn/convolution.jl:96
    [8] #cudnnConvolutionForward#104
      @ ~/Julia/pkg/CUDA/lib/cudnn/convolution.jl:50 [inlined]
    [9] (::var"#convtest#6"{var"#convtest#4#7"{CuArray{Float32, 4}, CuArray{Float32, 4}, Array{Float32, 4}, Array{Float32, 4}, DataType}, CuArray{Float32, 4}})(; blendz::Bool, bias::Nothing, activation::cudnnActivationMode_t, mode::cudnnConvolutionMode_t, padding::Int64, stride::Int64, dilation::Int64, group::Int64, dataType::Type, mathType::cudnnMathType_t, reorderType::cudnnReorderType_t, alpha::Int64, beta::Int64)
      @ Main ~/Julia/pkg/CUDA/test/cudnn/convolution.jl:142
   [10] macro expansion
      @ ~/Julia/pkg/CUDA/test/cudnn/convolution.jl:161 [inlined]

Error in testset cudnn/convolution:
Error During Test at /home/tim/Julia/pkg/CUDA/test/cudnn/convolution.jl:145
  Test threw exception
  Expression: ay1 ≈ cudnnConvolutionForward(cw0, cx, d; bias, activation, alpha) |> Array
  CUDNNError: CUDNN_STATUS_BAD_PARAM (code 3)
  Stacktrace:
    [3] cudnnConvolutionForward(handle::Ptr{Nothing}, alpha::Base.RefValue{Float32}, xDesc::CUDA.CUDNN.cudnnTensorDescriptor, x::CuArray{Float32, 4}, wDesc::CUDA.CUDNN.cudnnFilterDescriptor, w::CuArray{Float32, 4}, convDesc::cudnnConvolutionDescriptor, algo::cudnnConvolutionFwdAlgo_t, workSpace::CuArray{UInt8, 1}, workSpaceSizeInBytes::Int64, beta::Base.RefValue{Float32}, yDesc::CUDA.CUDNN.cudnnTensorDescriptor, y::CuArray{Float32, 4})
      @ CUDA.CUDNN ~/Julia/pkg/CUDA/lib/utils/call.jl:26
    [4] macro expansion
      @ ~/Julia/pkg/CUDA/lib/cudnn/convolution.jl:105 [inlined]
    [5] macro expansion
      @ ~/Julia/pkg/CUDA/lib/utils/call.jl:144 [inlined]
    [6] cudnnConvolutionForwardAD(w::CuArray{Float32, 4}, x::CuArray{Float32, 4}, bias::Nothing, z::Nothing; y::CuArray{Float32, 4}, activation::cudnnActivationMode_t, convDesc::cudnnConvolutionDescriptor, wDesc::CUDA.CUDNN.cudnnFilterDescriptor, xDesc::CUDA.CUDNN.cudnnTensorDescriptor, yDesc::CUDA.CUDNN.cudnnTensorDescriptor, zDesc::Nothing, biasDesc::Nothing, alpha::Base.RefValue{Float32}, beta::Base.RefValue{Float32}, dw::Base.RefValue{Any}, dx::Base.RefValue{Any}, dz::Base.RefValue{Any}, dbias::Base.RefValue{Any}, dready::Base.RefValue{Bool})
      @ CUDA.CUDNN ~/Julia/pkg/CUDA/lib/cudnn/convolution.jl:103
    [7] cudnnConvolutionForwardWithDefaults(w::CuArray{Float32, 4}, x::CuArray{Float32, 4}; padding::Int64, stride::Int64, dilation::Int64, mode::cudnnConvolutionMode_t, mathType::cudnnMathType_t, reorderType::cudnnReorderType_t, group::Int64, format::cudnnTensorFormat_t, convDesc::cudnnConvolutionDescriptor, xDesc::CUDA.CUDNN.cudnnTensorDescriptor, wDesc::CUDA.CUDNN.cudnnFilterDescriptor, y::CuArray{Float32, 4}, yDesc::CUDA.CUDNN.cudnnTensorDescriptor, alpha::Int64, beta::Int64, bias::Nothing, z::Nothing, biasDesc::Nothing, zDesc::Nothing, activation::cudnnActivationMode_t, dw::Base.RefValue{Any}, dx::Base.RefValue{Any}, dz::Base.RefValue{Any}, dbias::Base.RefValue{Any})
      @ CUDA.CUDNN ~/Julia/pkg/CUDA/lib/cudnn/convolution.jl:96
    [8] #cudnnConvolutionForward#106
      @ ~/Julia/pkg/CUDA/lib/cudnn/convolution.jl:52 [inlined]
    [9] (::var"#convtest#6"{var"#convtest#4#7"{CuArray{Float32, 4}, CuArray{Float32, 4}, Array{Float32, 4}, Array{Float32, 4}, DataType}, CuArray{Float32, 4}})(; blendz::Bool, bias::Nothing, activation::cudnnActivationMode_t, mode::cudnnConvolutionMode_t, padding::Int64, stride::Int64, dilation::Int64, group::Int64, dataType::Type, mathType::cudnnMathType_t, reorderType::cudnnReorderType_t, alpha::Int64, beta::Int64)
      @ Main ~/Julia/pkg/CUDA/test/cudnn/convolution.jl:145
   [10] macro expansion
      @ ~/Julia/pkg/CUDA/test/cudnn/convolution.jl:161 [inlined]

Error in testset cudnn/convolution:
Error During Test at /home/tim/Julia/pkg/CUDA/test/cudnn/convolution.jl:146
  Test threw exception
  Expression: ay2 ≈ cudnnConvolutionForward!(cy0, cw0, cx; z = cz0, bias, activation, mode, padding, stride, dilation, group, mathType, reorderType, alpha, beta) |> Array
  CUDNNError: CUDNN_STATUS_BAD_PARAM (code 3)
  Stacktrace:
    [3] cudnnConvolutionForward(handle::Ptr{Nothing}, alpha::Base.RefValue{Float32}, xDesc::CUDA.CUDNN.cudnnTensorDescriptor, x::CuArray{Float32, 4}, wDesc::CUDA.CUDNN.cudnnFilterDescriptor, w::CuArray{Float32, 4}, convDesc::cudnnConvolutionDescriptor, algo::cudnnConvolutionFwdAlgo_t, workSpace::CuArray{UInt8, 1}, workSpaceSizeInBytes::Int64, beta::Base.RefValue{Float32}, yDesc::CUDA.CUDNN.cudnnTensorDescriptor, y::CuArray{Float32, 4})
      @ CUDA.CUDNN ~/Julia/pkg/CUDA/lib/utils/call.jl:26
    [4] macro expansion
      @ ~/Julia/pkg/CUDA/lib/cudnn/convolution.jl:105 [inlined]
    [5] macro expansion
      @ ~/Julia/pkg/CUDA/lib/utils/call.jl:144 [inlined]
    [6] cudnnConvolutionForwardAD(w::CuArray{Float32, 4}, x::CuArray{Float32, 4}, bias::Nothing, z::CuArray{Float32, 4}; y::CuArray{Float32, 4}, activation::cudnnActivationMode_t, convDesc::cudnnConvolutionDescriptor, wDesc::CUDA.CUDNN.cudnnFilterDescriptor, xDesc::CUDA.CUDNN.cudnnTensorDescriptor, yDesc::CUDA.CUDNN.cudnnTensorDescriptor, zDesc::CUDA.CUDNN.cudnnTensorDescriptor, biasDesc::Nothing, alpha::Base.RefValue{Float32}, beta::Base.RefValue{Float32}, dw::Base.RefValue{Any}, dx::Base.RefValue{Any}, dz::Base.RefValue{Any}, dbias::Base.RefValue{Any}, dready::Base.RefValue{Bool})
      @ CUDA.CUDNN ~/Julia/pkg/CUDA/lib/cudnn/convolution.jl:103
    [7] #cudnnConvolutionForwardWithDefaults#108
      @ ~/Julia/pkg/CUDA/lib/cudnn/convolution.jl:96 [inlined]
    [8] #cudnnConvolutionForward!#105
      @ ~/Julia/pkg/CUDA/lib/cudnn/convolution.jl:51 [inlined]
    [9] (::var"#convtest#6"{var"#convtest#4#7"{CuArray{Float32, 4}, CuArray{Float32, 4}, Array{Float32, 4}, Array{Float32, 4}, DataType}, CuArray{Float32, 4}})(; blendz::Bool, bias::Nothing, activation::cudnnActivationMode_t, mode::cudnnConvolutionMode_t, padding::Int64, stride::Int64, dilation::Int64, group::Int64, dataType::Type, mathType::cudnnMathType_t, reorderType::cudnnReorderType_t, alpha::Int64, beta::Int64)
      @ Main ~/Julia/pkg/CUDA/test/cudnn/convolution.jl:146
   [10] macro expansion
      @ ~/Julia/pkg/CUDA/test/cudnn/convolution.jl:161 [inlined]

Error in testset cudnn/convolution:
Error During Test at /home/tim/Julia/pkg/CUDA/test/cudnn/convolution.jl:149
  Test threw exception
  Expression: ay2 ≈ cudnnConvolutionForward!(cy1, cw0, cx, d; z = cz1, bias, activation, alpha, beta) |> Array
  CUDNNError: CUDNN_STATUS_BAD_PARAM (code 3)
  Stacktrace:
    [3] cudnnConvolutionForward(handle::Ptr{Nothing}, alpha::Base.RefValue{Float32}, xDesc::CUDA.CUDNN.cudnnTensorDescriptor, x::CuArray{Float32, 4}, wDesc::CUDA.CUDNN.cudnnFilterDescriptor, w::CuArray{Float32, 4}, convDesc::cudnnConvolutionDescriptor, algo::cudnnConvolutionFwdAlgo_t, workSpace::CuArray{UInt8, 1}, workSpaceSizeInBytes::Int64, beta::Base.RefValue{Float32}, yDesc::CUDA.CUDNN.cudnnTensorDescriptor, y::CuArray{Float32, 4})
      @ CUDA.CUDNN ~/Julia/pkg/CUDA/lib/utils/call.jl:26
    [4] macro expansion
      @ ~/Julia/pkg/CUDA/lib/cudnn/convolution.jl:105 [inlined]
    [5] macro expansion
      @ ~/Julia/pkg/CUDA/lib/utils/call.jl:144 [inlined]
    [6] cudnnConvolutionForwardAD(w::CuArray{Float32, 4}, x::CuArray{Float32, 4}, bias::Nothing, z::CuArray{Float32, 4}; y::CuArray{Float32, 4}, activation::cudnnActivationMode_t, convDesc::cudnnConvolutionDescriptor, wDesc::CUDA.CUDNN.cudnnFilterDescriptor, xDesc::CUDA.CUDNN.cudnnTensorDescriptor, yDesc::CUDA.CUDNN.cudnnTensorDescriptor, zDesc::CUDA.CUDNN.cudnnTensorDescriptor, biasDesc::Nothing, alpha::Base.RefValue{Float32}, beta::Base.RefValue{Float32}, dw::Base.RefValue{Any}, dx::Base.RefValue{Any}, dz::Base.RefValue{Any}, dbias::Base.RefValue{Any}, dready::Base.RefValue{Bool})
      @ CUDA.CUDNN ~/Julia/pkg/CUDA/lib/cudnn/convolution.jl:103
    [7] #cudnnConvolutionForwardWithDefaults#108
      @ ~/Julia/pkg/CUDA/lib/cudnn/convolution.jl:96 [inlined]
    [8] #cudnnConvolutionForward!#107
      @ ~/Julia/pkg/CUDA/lib/cudnn/convolution.jl:53 [inlined]
    [9] (::var"#convtest#6"{var"#convtest#4#7"{CuArray{Float32, 4}, CuArray{Float32, 4}, Array{Float32, 4}, Array{Float32, 4}, DataType}, CuArray{Float32, 4}})(; blendz::Bool, bias::Nothing, activation::cudnnActivationMode_t, mode::cudnnConvolutionMode_t, padding::Int64, stride::Int64, dilation::Int64, group::Int64, dataType::Type, mathType::cudnnMathType_t, reorderType::cudnnReorderType_t, alpha::Int64, beta::Int64)
      @ Main ~/Julia/pkg/CUDA/test/cudnn/convolution.jl:149
   [10] macro expansion
      @ ~/Julia/pkg/CUDA/test/cudnn/convolution.jl:161 [inlined]

cc @denizyuret; looks like there's still something wrong with the CUDNN convolution wrappers.

maleadt added the ci (Everything related to continuous integration.) label on Feb 12, 2021
codecov bot commented on Feb 12, 2021

Codecov Report

Merging #710 (01fd0f7) into master (4f5b790) will decrease coverage by 0.02%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master     #710      +/-   ##
==========================================
- Coverage   79.87%   79.85%   -0.03%     
==========================================
  Files         122      122              
  Lines        7380     7380              
==========================================
- Hits         5895     5893       -2     
- Misses       1485     1487       +2     
Impacted Files            Coverage Δ
lib/cusparse/level3.jl    70.62% <0.00%> (-0.70%) ⬇️
lib/cudadrv/memory.jl     82.06% <0.00%> (-0.45%) ⬇️

Continue to review the full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 4f5b790...01fd0f7.

maleadt merged commit c88bc3e into master on Feb 12, 2021
maleadt deleted the tb/cudnn_tests branch on February 12, 2021 at 12:15
denizyuret (Contributor) commented

The reason I had switched to the previous testing method was so that the error message itself would show which parameter combination caused the failure. In your new setup, how do we find out which keyword argument combination caused the error? From the stack trace?

denizyuret (Contributor) commented

Tests passed for me again :(
Should I try replicating on the CI machine?

maleadt (Member, Author) commented on Feb 13, 2021

Yeah, the stack trace should make that clear. I haven't been able to reproduce this consistently either, even on the CI machine, so the issue seems to be well hidden. But with the additional logging I hope to at least get a clue from the failure logs.
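
As an illustration only (values elided, not from an actual run): a failing inner @test now reports both the expression and what it evaluated to, and the stack trace underneath it still includes the convtest closure frame listing the keyword arguments that were in effect:

cudnn/convolution: Test Failed at ~/Julia/pkg/CUDA/test/cudnn/convolution.jl:142
  Expression: ay1 ≈ cudnnConvolutionForward(cw0, cx; bias, activation, mode, padding, stride, dilation, group, mathType, reorderType, alpha) |> Array
   Evaluated: Float32[…] ≈ Float32[…]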
