Fix TuringMvNormal's rand on GPU #108

Merged: 11 commits merged into master from nh/mvnormal-rand-gpu on Sep 10, 2020

Conversation

@nmheim (Collaborator) commented Aug 13, 2020

My attempt to fix #98 for TuringMvNormal by @require-ing CUDA via Requires.jl. What do you think?

@nmheim marked this pull request as draft August 13, 2020 11:57
@devmotion (Member)

It seems the name randnsimilar is slightly misleading? It doesn't ensure that the element types are preserved but only if an array is created on the GPU or not.

My suggestion would be to

  • make CUDA a proper dependency as recommended in its documentation:
    "Because CUDA.jl always loads, even if the user doesn't have a GPU or CUDA, you should just depend on it like any other package (and not use, e.g., Requires.jl). This ensures that breaking changes to the GPU stack will be taken into account by the package resolver when installing your package."
  • do not define randnsimilar but just add dispatches such as
    function Base.rand(
        rng::Random.AbstractRNG,
        d::TuringDenseMvNormal{<:CuArray},
        n::Int...,
    )
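
For illustration, here is a minimal hedged sketch of what such a dispatch could look like, assuming CUDA is a direct dependency; the field names m (mean) and C (Cholesky factorization), the use of CUDA.randn, and the single sample count n are assumptions for this sketch, not necessarily the PR's actual implementation:

using Random, CUDA, DistributionsAD

function Base.rand(
    rng::Random.AbstractRNG,
    d::DistributionsAD.TuringDenseMvNormal{<:CuArray},
    n::Int,
)
    # Draw the standard-normal noise directly on the GPU so no CPU Array is created.
    # Note that the passed-in CPU rng is ignored here; CUDA.randn uses CUDA's own RNG.
    z = CUDA.randn(eltype(d.m), length(d.m), n)
    # Shift and scale the noise with the distribution's mean and Cholesky factor
    # (field names m and C are assumed for this sketch).
    return d.m .+ d.C.L * z
end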

@nmheim (Collaborator, Author) commented Aug 14, 2020

> It seems the name randnsimilar is slightly misleading? It doesn't ensure that the element types are preserved, only whether the array is created on the GPU or not.

Good point, I'll fix the element types.

> * make CUDA a proper dependency as recommended in [its documentation](https://juliagpu.gitlab.io/CUDA.jl/installation/conditional/):

As this was not an issue here before, I assumed that not many people need GPU support? Adding CUDA increases loading times from about 5s to 8s on my machine. It would of course be nicer to have it as a proper dependency, but I was not sure whether that was desired.

@devmotion (Member)

> As this was not an issue here before, I assumed that not many people need GPU support? Adding CUDA increases loading times from about 5s to 8s on my machine. It would of course be nicer to have it as a proper dependency, but I was not sure whether that was desired.

The increased loading times are unfortunate. In my experience, using Requires tends to increase loading times as well, and, probably even worse, it doesn't allow us to specify any compatibility bounds (which is annoying for the AD backends as well, by the way). An alternative would be a separate package, maybe named DistributionsADCUDA or DistributionsADGPU (or the other way around), that just adds GPU support to DistributionsAD and depends on both DistributionsAD and CUDA.
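
For context, the Requires.jl pattern being weighed here looks roughly like the following (a hedged sketch, not the package's actual code; the UUID must match the one in CUDA's Project.toml, and the included file name is illustrative):

# In the package's top-level module (sketch only):
module DistributionsAD

using Requires

function __init__()
    # Only load the GPU-specific methods if the user also loads CUDA.
    @require CUDA="052768ef-5323-5732-b1bb-66c8b64840ba" include("cuda.jl")
end

end # module

As noted above, this avoids the hard dependency and its load time, but it gives no way to declare compatibility bounds on CUDA in Project.toml.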

@nmheim (Collaborator, Author) commented Aug 14, 2020

OK, I think I have not entirely understood how the CPU/GPU RNG is used (which was the reason for not using randn! and introducing CUDA as a dependency). If I run:

using DistributionsAD
using CUDA
using Random
using Zygote

Random.seed!(0)

μ = rand(Float32,2) |> cu
σ = rand(Float32,2) |> cu
d = DistributionsAD.TuringMvNormal(μ, σ)
display(rand(d))

f(μ,σ) = sum(rand(DistributionsAD.TuringMvNormal(μ,σ)))

for a in Zygote.gradient(f, μ, σ) display(a) end

With the code from the commit above, fixing only the CPU seed fixes the output of rand(d). What am I missing here? The way it is written now would get by without the CUDA dependency.

Fixing only the CPU seed and using CUDA.randn would of course not fix the output of rand(d), but is that a problem?

Distributions.rand(d::TuringDenseMvNormal, n::Int...) = rand(Random.GLOBAL_RNG, d, n...)

seems to use the CPU RNG...?

@devmotion (Member)

I don't remember the original issue exactly - was it caused mainly by rand not preserving the element types (Float32 vs Float64)?

@nmheim (Collaborator, Author) commented Aug 14, 2020

The original issue (#98) was that randn inside rand(::MvNormal) returned Arrays instead of CuArrays when the distributions were on the GPU. Ah, and I think I see the problem: like this, the randn! calls are forwarded to scalar operations. Sorry, I forgot about that.
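
To make that failure mode concrete, here is a simplified sketch (not DistributionsAD's actual code) of the problematic pattern: the noise comes from a plain randn, so it is a CPU Array even when the parameters live on the GPU.

using Random, CUDA

# Illustrative stand-in for a diagonal MvNormal with fields m (mean) and σ (std devs).
struct DiagNormalSketch{T<:AbstractVector}
    m::T
    σ::T
end

# Problematic pattern: randn always allocates a CPU Array, regardless of where m and σ live.
rand_cpu_noise(rng::Random.AbstractRNG, d::DiagNormalSketch) =
    d.m .+ d.σ .* randn(rng, eltype(d.m), length(d.m))

With CuArray parameters this mixes GPU and CPU arrays in a single broadcast, which either errors or falls back to the slow scalar operations mentioned above.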

@nmheim (Collaborator, Author) commented Aug 14, 2020

OK, so like this randnsimilar should create arrays with preserved eltype and array type.
It introduces a dependency on CUDA (and Adapt, which is already part of CUDA), which needs Julia 1.4 though...
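
A hedged sketch of a randnsimilar with those two properties (the method layout and the handling of the GPU RNG are illustrative, not necessarily the PR's exact code):

using Random, CUDA

# CPU path: keep the caller's rng and reuse the eltype and array type via similar.
randnsimilar(rng::Random.AbstractRNG, x::AbstractArray, dims::Int...) =
    randn!(rng, similar(x, dims...))

# GPU path: CUDA's generator produces the noise directly as a CuArray; a CPU rng
# applied element-wise would fall back to slow scalar writes, so it is ignored.
randnsimilar(::Random.AbstractRNG, x::CuArray, dims::Int...) =
    CUDA.randn(eltype(x), (isempty(dims) ? size(x) : dims)...)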

@devmotion (Member)

> It introduces a dependency on CUDA (and Adapt, which is already part of CUDA), which needs Julia 1.4 though...

The latest release of CUDA even requires Julia 1.5. However, IMO we shouldn't drop support for Julia < 1.5 currently.

@nmheim (Collaborator, Author) commented Sep 9, 2020

> The latest release of CUDA even requires Julia 1.5. However, IMO we shouldn't drop support for Julia < 1.5 currently.

Yes, agreed. The commit above restricts CUDA to <1.3.3, which is still compatible with Julia 1.3.

@nmheim (Collaborator, Author) commented Sep 9, 2020

I think the tests are currently failing because of JuliaDiff/ChainRules.jl#262. So I think once JuliaDiff/ChainRules.jl#263 is merged we should be fine.

Edit: actually, it doesn't seem like it. The gradient test tracebacks are a bit tricky to read ^^
The gradients through rand are not tested currently, right?

@devmotion (Member) commented Sep 9, 2020

> The gradients through rand are not tested currently, right?

No, we only test gradients of logpdf and loglikelihood. rand calls are not differentiable in ChainRules anyway, see e.g. https://github.com/JuliaDiff/ChainRules.jl/blob/master/src/rulesets/Random/random.jl.

Edit: Just saw your PR, yes, I guess it would make sense to test gradients there as well if it is supported by some AD backend.
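
If such tests are added, they could look roughly like this Zygote-based sketch (illustrative only, not the repository's actual test suite; it only checks shapes, since the samples themselves are stochastic):

using Test, Random, Zygote, DistributionsAD

Random.seed!(0)
μ = rand(Float32, 2)
σ = rand(Float32, 2)

# Differentiate a scalar function of a sample with respect to the parameters.
f(μ, σ) = sum(rand(DistributionsAD.TuringMvNormal(μ, σ)))

gμ, gσ = Zygote.gradient(f, μ, σ)
@test size(gμ) == size(μ)
@test size(gσ) == size(σ)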

@nmheim (Collaborator, Author) commented Sep 9, 2020

> I guess it would make sense to test gradients there as well if it is supported by some AD backend.

I'll open an issue for this.

@nmheim (Collaborator, Author) commented Sep 9, 2020

I am still a bit puzzled by the failing TuringPoissonBinomial though. Could that also be related to some new @non_differentiable rules?

@devmotion (Member)

Possibly. Tests on master passed with ChainRules v0.7.14 and ChainRulesCore v0.9.6, while this PR fails with ChainRules v0.7.17 and ChainRulesCore v0.9.7. The differences are JuliaDiff/ChainRulesCore.jl@v0.9.6...v0.9.7 and JuliaDiff/ChainRules.jl@v0.7.14...v0.7.17.

@devmotion (Member) commented Sep 9, 2020

I guess the problem might be the call of isapprox in

check_args && @assert all(x -> x >= -ϵ, pb) && isapprox(sum(pb), 1, atol = ϵ)
and the definition in https://github.com/JuliaDiff/ChainRules.jl/blob/0baf7bba2a9a2f235bf2a85edfdc6209f8cf137d/src/rulesets/Base/nondiff.jl#L123.

@nmheim (Collaborator, Author) commented Sep 9, 2020

> I guess the problem might be the call of isapprox in
>
> check_args && @assert all(x -> x >= -ϵ, pb) && isapprox(sum(pb), 1, atol = ϵ)
>
> and the definition in JuliaDiff/ChainRules.jl@v0.7.14...v0.7.17#diff-608aeb35114f6c76d5282c97cba14d6aR123.

Yes, you are right, that's it. The latest commit contains a hacky fix, because @non_differentiable cannot deal with keyword args. What do you think?
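
The kind of workaround being described could look roughly like the following sketch (the helper name _isapprox matches the one mentioned in the review below, but the exact definition here is an assumption):

using ChainRulesCore

# Positional-argument wrapper around isapprox, so that no keyword arguments reach
# the AD machinery; the comparison carries no useful gradient information anyway.
_isapprox(x, y, atol) = isapprox(x, y; atol = atol)
ChainRulesCore.@non_differentiable _isapprox(::Any, ::Any, ::Any)

The assertion quoted above would then call _isapprox(sum(pb), 1, ϵ) instead of the keyword form.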

@nmheim marked this pull request as ready for review September 9, 2020 18:28
@devmotion (Member) left a comment

LGTM, I just have the following comments left:

  • Can you please bump the version number to 0.6.8 so we can make a new release with the bugfix and new feature? (Ideally, they should be separated but IMO it's OK to keep it in one PR now)
  • Can you open an issue over at ChainRulesCore to inform them about the breakage of isapprox and raise awareness for keyword argument support in @non_differentiable?
  • Can you open an issue in DistributionsAD that we should remove _isapprox as soon as the underlying problem is fixed upstream? Otherwise we might just not remember it.

@devmotion mentioned this pull request Sep 10, 2020
@devmotion (Member) left a comment

Great, LGTM! Thanks for the PR!

It seems to work but we should make sure to actually run some GPU tests in the future as well: https://github.com/JuliaGPU/gitlab-ci

@devmotion merged commit bedf0a0 into master Sep 10, 2020
@devmotion deleted the nh/mvnormal-rand-gpu branch September 10, 2020 09:13
Development
Successfully merging this pull request may close these issues: GPU friendly distributions