
Use the correct CUDNN scaling parameter type. #454

Merged: 1 commit into master from tb/cudnn_scalar_type on Oct 29, 2020

Conversation

maleadt (Member) commented Sep 28, 2020

Fixes #92

cc @DrChainsaw @hgt312

@DhairyaLGandhi Could you test this with Flux? There are very few CUDNN tests here, and we're close to release. FWIW, Flux's tests still pass, or at least throw the same errors as they did before this PR (UndefVarError: ALL_LOSSES not defined).
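
(For context: the cuDNN documentation specifies that the alpha/beta scaling factors must be passed as host pointers to float for half- and single-precision data, and to double for double-precision data; passing Float16 scalars instead is what made the Float16 convolutions in #92 return zeros. A minimal Julia sketch of that rule follows; the helper names are illustrative, not the actual CUDA.jl internals.)

# Minimal sketch of the rule this PR implements; the helper names are
# illustrative, not the actual CUDA.jl internals. cuDNN expects the
# alpha/beta scaling factors as host references to Cdouble for Float64
# data and to Cfloat for everything else, including Float16.
scaling_type(::Type{Float64}) = Cdouble
scaling_type(::Type{<:Union{Float16,Float32}}) = Cfloat

# Wrap a scalar in a Ref of the correct element type, ready to be passed
# as the alpha/beta argument of a cuDNN call.
scaling_param(::Type{T}, val) where {T} = Ref(convert(scaling_type(T), val))

scaling_param(Float16, 1)   # Base.RefValue{Float32}(1.0f0), not a Float16 ref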

maleadt added the labels bugfix (This gets something working again.) and cuda libraries (Stuff about CUDA library wrappers.) on Sep 28, 2020
DhairyaLGandhi (Member) commented:

ALL_LOSSES was removed some time back and that code was moved to a sub-package. Is it called directly when testing CUDA?

maleadt (Member, Author) commented Sep 28, 2020

Ah, I see what's up with that: I'm running the CUDA tests in isolation, but it appears the Flux tests are stateful. Anyway, it's unrelated to this PR.

DhairyaLGandhi (Member) commented:

Started the tests now.

codecov bot commented Sep 28, 2020

Codecov Report

Merging #454 into master will decrease coverage by 0.01%.
The diff coverage is 57.14%.

@@            Coverage Diff             @@
##           master     #454      +/-   ##
==========================================
- Coverage   80.77%   80.75%   -0.02%     
==========================================
  Files         166      166              
  Lines        9086     9090       +4     
==========================================
+ Hits         7339     7341       +2     
- Misses       1747     1749       +2     
Impacted Files Coverage Δ
lib/cudnn/activation.jl 100.00% <ø> (ø)
lib/cudnn/conv.jl 50.00% <ø> (ø)
lib/cudnn/pooling.jl 94.73% <ø> (ø)
lib/cudnn/softmax.jl 100.00% <ø> (ø)
lib/cudnn/tensor.jl 59.52% <ø> (ø)
lib/cudnn/util.jl 50.00% <50.00%> (ø)
lib/cudnn/batchnorm.jl 36.95% <66.66%> (ø)

Continue to review the full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b529985...055c680.

DhairyaLGandhi (Member) commented:

I see failures in the curnn tests as well as in some movement tests:

https://gitlab.com/JuliaGPU/Flux.jl/-/pipelines/195453079

maleadt (Member, Author) commented Sep 28, 2020

https://gitlab.com/JuliaGPU/Flux.jl/-/pipelines/195453079

That's not going to show much: neither Julia 1.3 nor nightly is supported by CUDA.jl.

Strange that you see failures; everything passes locally. Could you post some details?

DhairyaLGandhi (Member) commented Sep 28, 2020

Ah, my bad, I forgot to push the Project.toml and the GitLab config.

It's on this branch https://github.com/FluxML/Flux.jl/tree/test_cudnn

https://gitlab.com/JuliaGPU/Flux.jl/-/pipelines/195461096

maleadt (Member, Author) commented Sep 28, 2020

Using Docker executor with image nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04 ...

You shouldn't be running with an explicit image tag. CUDNN 7 is unsupported now.

maleadt (Member, Author) commented Sep 28, 2020

Also, there are lots of Flux failures on CUDA.jl#master: #455 (comment). Didn't you recently validate the master branch with Flux, or did I misunderstand?

DrChainsaw commented:

Wow, I had kind of given up on this issue when I saw another issue about Float16 support that made it seem like there was some fundamental difference in how Julia handles Float16.

Sorry for the stupid question, but would this be enough to get the equivalent of model.use_half_type() in other frameworks, or is that something completely different?

FWIW, I get similar errors when I test Flux master with CUDA master and with this branch. The tests labeled "Conv GPU grad tests" pass.

"Flux master + CUDA master"
Test Summary:                    | Pass  Fail  Error  Broken  Total
CUDA                             |   77    21      1      34    133
  CUDA                           |    9                           9
  onecold gpu                    |    2                           2
  restructure gpu                |    1                           1
  GPU functors                   |    2                           2
  Losses                         |   29            1             30
    GPU grad tests               |   24            1             25
  Basic GPU Movement             |    2                           2
  Conv GPU grad tests            |    6                    1      7
  Pooling GPU grad tests         |    2                           2
  AdaptivePooling GPU grad tests |    2                           2
  Dropout GPU grad tests         |    1                    1      2
  Normalising GPU grad tests     |    3     1                     4
    LayerNorm GPU grad test      |    1     1                     2
    BatchNorm GPU grad test      |    2                           2
  InstanceNorm GPU grad tests    |                         1      1
  GroupNorm GPU grad tests       |                         1      1
  Stateless GPU grad tests       |    1                           1
  CUDNN BatchNorm                |    8                           8
  R = RNN                        |    1                    2      3
  R = GRU                        |    1                    2      3
  R = LSTM                       |    1                    2      3
  RNN                            |    6    20             24     50
    R = RNN, batch_size = 1      |    1     3              4      8
    R = RNN, batch_size = 5      |    1     3              4      8
    R = GRU, batch_size = 1      |    1     3              4      8
    R = GRU, batch_size = 5      |    1     3              4      8
    R = LSTM, batch_size = 1     |    1     4              4      9
    R = LSTM, batch_size = 5     |    1     4              4      9
"Flux master + CUDA master"
(cutest) pkg> add CUDA#tb/cudnn_scalar_type
   Updating git-repo `https://github.com/JuliaGPU/CUDA.jl.git`
  Resolving package versions...
Updating `E:\Programs\julia\.julia\dev\cutest\Project.toml`
  [052768ef] ~ CUDA v1.3.0 `https://github.com/JuliaGPU/CUDA.jl.git#master` ⇒ v1.3.0 `https://github.com/JuliaGPU/CUDA.jl.git#tb/cudnn_scalar_type`
Updating `E:\Programs\julia\.julia\dev\cutest\Manifest.toml`
  [052768ef] ~ CUDA v1.3.0 `https://github.com/JuliaGPU/CUDA.jl.git#master` ⇒ v1.3.0 `https://github.com/JuliaGPU/CUDA.jl.git#tb/cudnn_scalar_type`

Test Summary:                    | Pass  Fail  Error  Broken  Total
CUDA                             |   76    21      2      34    133
  CUDA                           |    9                           9
  onecold gpu                    |    2                           2
  restructure gpu                |    1                           1
  GPU functors                   |    2                           2
  Losses                         |   29            1             30
    GPU grad tests               |   24            1             25
  Basic GPU Movement             |    2                           2
  Conv GPU grad tests            |    6                    1      7
  Pooling GPU grad tests         |    2                           2
  AdaptivePooling GPU grad tests |    2                           2
  Dropout GPU grad tests         |    1                    1      2
  Normalising GPU grad tests     |    3     1                     4
    LayerNorm GPU grad test      |    1     1                     2
    BatchNorm GPU grad test      |    2                           2
  InstanceNorm GPU grad tests    |                         1      1
  GroupNorm GPU grad tests       |                         1      1
  Stateless GPU grad tests       |    1                           1
  CUDNN BatchNorm                |    8                           8
  R = RNN                        |    1                    2      3
  R = GRU                        |    1                    2      3
  R = LSTM                       |    1                    2      3
  RNN                            |    6    20             24     50
    R = RNN, batch_size = 1      |    1     3              4      8
    R = RNN, batch_size = 5      |    1     3              4      8
    R = GRU, batch_size = 1      |    1     3              4      8
    R = GRU, batch_size = 5      |    1     3              4      8
    R = LSTM, batch_size = 1     |    1     4              4      9
    R = LSTM, batch_size = 5     |    1     4              4      9
ERROR: LoadError: Some tests did not pass: 76 passed, 21 failed, 2 errored, 34 broken.

maleadt (Member, Author) commented Sep 30, 2020

Wow, I had kind of given up on this issue when I saw another issue about Float16 support that made it seem like there was some fundamental difference in how Julia handles Float16.

There is, but we're working to fix that :-) And it's not related to the issue you were seeing here.

Sorry for the stupid question, but would this be enough to get the equivalent of model.use_half_type() in other frameworks, or is that something completely different?

That's up to Flux, but I think that would make sense. On the CUDA.jl side, we first need to expose the necessary functionality, and we're almost there.

DhairyaLGandhi (Member) commented:

In the fp16 PR on Flux, we are introducing the f16 utility, which would achieve that.
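
(For context: the thread below uses Flux.paramtype directly; an f16 utility would presumably be a thin wrapper over it, in the spirit of Flux's existing f32/f64 helpers. A sketch under that assumption, not the actual code from the Flux fp16 PR:)

using Flux, CUDA

# Hypothetical f16 helper, assuming it mirrors Flux's f32/f64 by converting
# all floating-point parameters of a model to Float16 (this is not the
# actual code from the Flux fp16 PR).
f16(m) = Flux.paramtype(Float16, m)

# Usage, following the pattern from the benchmark below: convert a Conv
# layer to half precision and move it to the GPU.
m = f16(Conv((3, 3), 3 => 16, relu)) |> gpu
x = CuArray(rand(Float16, 32, 32, 3, 4))   # 32x32 input, 3 channels, batch of 4 (WHCN)
y = m(x)                                   # forward pass runs with Float16 storage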

DrChainsaw commented:

Great!

I already did a quick benchmark with Flux.paramtype(Float16, Conv(...)) |> gpu and saw a nice roughly 3x speedup on the forward pass (with correct outputs this time :) ).

I'm thinking about things like what @maleadt mentioned in the linked issue, where the cuDNN docs describe a soon-to-be-removed "wronger" way of doing it and a recommended "righter" way:

Note: CUDNN_DATA_HALF in cudnnSetConvolutionNdDescriptor() with HALF_CONVOLUTION_BWD_FILTER is not recommended as it is known to not be useful for any practical use case for training and will be considered to be blocked in a future cuDNN release. The use of CUDNN_DATA_HALF for input tensors in cudnnSetTensorNdDescriptor() and CUDNN_DATA_FLOAT in cudnnSetConvolutionNdDescriptor() with HALF_CONVOLUTION_BWD_FILTER is recommended and is used with the automatic mixed precision (AMP) training in many well known deep learning frameworks.

I guess this has to do with things like FP16 multiplication with FP32 accumulation (is that what automatic mixed precision refers to)?
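
(For context: "automatic mixed precision" generally means exactly that combination: tensors stored in Float16 while the convolution's internal compute and accumulation run in Float32, which is what the CUDNN_DATA_HALF tensor descriptors plus CUDNN_DATA_FLOAT convolution descriptor in the quote above describe. A small CPU-side Julia illustration of why the accumulator type matters; the values are made up for the example:)

# CPU-side illustration (not cuDNN code): accumulate a dot product of
# Float16 values in an accumulator of type A.
function dot_acc(::Type{A}, x, w) where {A}
    acc = zero(A)
    for (xi, wi) in zip(x, w)
        acc += A(xi) * A(wi)    # multiply and accumulate in precision A
    end
    return acc
end

x = fill(Float16(1f-3), 10_000)
w = fill(Float16(1f-3), 10_000)

dot_acc(Float16, x, w)            # Float16 accumulation stalls well below the true ~0.01
Float16(dot_acc(Float32, x, w))   # Float32 accumulation, stored back as Float16: ≈ 0.01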

DhairyaLGandhi (Member) commented:

Interesting that removing the image from the GitLab CI runs does not get CUDNN by default; can I manually set a flag to ensure that it's downloaded?

maleadt (Member, Author) commented Sep 30, 2020

Interesting that removing the image from the GitLab CI runs does not get CUDNN by default; can I manually set a flag to ensure that it's downloaded?

If you remove the image flag, it should use the default image, which always includes CUDNN (as configured on the runners). But I see the issue: some of those runners still use CUDNN 7. I'll fix that.

@maleadt maleadt merged commit 2f4d71a into master Oct 29, 2020
@maleadt maleadt deleted the tb/cudnn_scalar_type branch October 29, 2020 11:58

Successfully merging this pull request may close these issues.

CUDNN convolution with Float16 always returns zeros