Fix TuringMvNormal's rand on GPU #108

Merged: 11 commits merged into master from nh/mvnormal-rand-gpu on Sep 10, 2020

Conversation

@nmheim (Collaborator) commented Aug 13, 2020

My attempt to fix #98 for TuringMvNormal by @require-ing CUDA via Requires.jl. What do you think?

@nmheim marked this pull request as draft August 13, 2020 11:57
@devmotion (Member)

It seems the name randnsimilar is slightly misleading? It doesn't ensure that the element types are preserved but only if an array is created on the GPU or not.

My suggestion would be to

  • make CUDA a proper dependency as recommended in its documentation:
    "Because CUDA.jl always loads, even if the user doesn't have a GPU or CUDA, you should just depend on it like any other package (and not use, e.g., Requires.jl). This ensures that breaking changes to the GPU stack will be taken into account by the package resolver when installing your package."
  • do not define randnsimilar but just add dispatches such as
    function Base.rand(
        rng::Random.AbstractRNG,
        d::TuringDenseMvNormal{<:CuArray},
        n::Int...,
    )
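
For illustration, here is a minimal hedged sketch of what such a dispatch could look like, assuming CUDA is a direct dependency; the field names m (mean) and C (Cholesky factorization), the use of CUDA.randn, and the single sample count n are assumptions for this sketch, not necessarily the PR's actual implementation:

using Random, CUDA, DistributionsAD

function Base.rand(
    rng::Random.AbstractRNG,
    d::DistributionsAD.TuringDenseMvNormal{<:CuArray},
    n::Int,
)
    # Draw the standard-normal noise directly on the GPU so no CPU Array is created.
    # Note that the passed-in CPU rng is ignored here; CUDA.randn uses CUDA's own RNG.
    z = CUDA.randn(eltype(d.m), length(d.m), n)
    # Shift and scale the noise with the distribution's mean and Cholesky factor
    # (field names m and C are assumed for this sketch).
    return d.m .+ d.C.L * z
end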

@nmheim (Collaborator, Author) commented Aug 14, 2020

> It seems the name randnsimilar is slightly misleading? It doesn't ensure that the element types are preserved, only whether the array is created on the GPU or not.

Good point, I'll fix the element types.

> * make CUDA a proper dependency as recommended in [its documentation](https://juliagpu.gitlab.io/CUDA.jl/installation/conditional/):

As this was not an issue here before, I assumed that not many people need GPU support? Adding CUDA increases loading times from about 5s to 8s on my machine. It would of course be nicer to have it as a proper dependency, but I was not sure whether that was desired.

@devmotion (Member)

> As this was not an issue here before, I assumed that not many people need GPU support? Adding CUDA increases loading times from about 5s to 8s on my machine. It would of course be nicer to have it as a proper dependency, but I was not sure whether that was desired.

The increased loading times are unfortunate. In my experience, using Requires tends to increase loading times as well, and, probably even worse, it doesn't allow us to specify any compatibility bounds (which is annoying for the AD backends as well, by the way). An alternative would be a separate package, maybe named DistributionsADCUDA or DistributionsADGPU (or the other way around), that just adds GPU support to DistributionsAD and depends on both DistributionsAD and CUDA.
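
For context, the Requires.jl pattern being weighed here looks roughly like the following (a hedged sketch, not the package's actual code; the UUID must match the one in CUDA's Project.toml, and the included file name is illustrative):

# In the package's top-level module (sketch only):
module DistributionsAD

using Requires

function __init__()
    # Only load the GPU-specific methods if the user also loads CUDA.
    @require CUDA="052768ef-5323-5732-b1bb-66c8b64840ba" include("cuda.jl")
end

end # module

As noted above, this avoids the hard dependency and its load time, but it gives no way to declare compatibility bounds on CUDA in Project.toml.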

@nmheim (Collaborator, Author) commented Aug 14, 2020

OK, I think I have not entirely understood how the CPU/GPU RNG is used (which was the reason for not using randn! and introducing CUDA as a dependency). If I run:

using DistributionsAD
using CUDA
using Random
using Zygote

Random.seed!(0)

μ = rand(Float32,2) |> cu
σ = rand(Float32,2) |> cu
d = DistributionsAD.TuringMvNormal(μ, σ)
display(rand(d))

f(μ,σ) = sum(rand(DistributionsAD.TuringMvNormal(μ,σ)))

for a in Zygote.gradient(f, μ, σ) display(a) end

With the code from the commit above, fixing only the CPU seed fixes the output of rand(d). What am I missing here? The way it is written now would get by without the CUDA dependency.

Fixing only the CPU seed and using CUDA.randn would of course not fix the output of rand(d), but is that a problem?

Distributions.rand(d::TuringDenseMvNormal, n::Int...) = rand(Random.GLOBAL_RNG, d, n...)

seems to use the CPU RNG...?

@devmotion (Member)

I don't remember the original issue exactly - was it caused mainly by rand not preserving the element types (Float32 vs Float64)?

@nmheim (Collaborator, Author) commented Aug 14, 2020

The original issue (#98) was that randn inside rand(::MvNormal) returned Arrays instead of CuArrays when the distributions were on the GPU. Ah, and I think I see the problem: like this, the randn! calls are forwarded to scalar operations. Sorry, I forgot about that.
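
To make that failure mode concrete, here is a simplified sketch (not DistributionsAD's actual code) of the problematic pattern: the noise comes from a plain randn, so it is a CPU Array even when the parameters live on the GPU.

using Random, CUDA

# Illustrative stand-in for a diagonal MvNormal with fields m (mean) and σ (std devs).
struct DiagNormalSketch{T<:AbstractVector}
    m::T
    σ::T
end

# Problematic pattern: randn always allocates a CPU Array, regardless of where m and σ live.
rand_cpu_noise(rng::Random.AbstractRNG, d::DiagNormalSketch) =
    d.m .+ d.σ .* randn(rng, eltype(d.m), length(d.m))

With CuArray parameters this mixes GPU and CPU arrays in a single broadcast, which either errors or falls back to the slow scalar operations mentioned above.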

@nmheim (Collaborator, Author) commented Aug 14, 2020

OK, so like this randnsimilar should create arrays with preserved eltype and array type.
It introduces a dependency on CUDA (and Adapt, which is already part of CUDA), which needs Julia 1.4 though...
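
A hedged sketch of a randnsimilar with those two properties (the method layout and the handling of the GPU RNG are illustrative, not necessarily the PR's exact code):

using Random, CUDA

# CPU path: keep the caller's rng and reuse the eltype and array type via similar.
randnsimilar(rng::Random.AbstractRNG, x::AbstractArray, dims::Int...) =
    randn!(rng, similar(x, dims...))

# GPU path: CUDA's generator produces the noise directly as a CuArray; a CPU rng
# applied element-wise would fall back to slow scalar writes, so it is ignored.
randnsimilar(::Random.AbstractRNG, x::CuArray, dims::Int...) =
    CUDA.randn(eltype(x), (isempty(dims) ? size(x) : dims)...)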

@devmotion (Member)

> It introduces a dependency on CUDA (and Adapt, which is already part of CUDA), which needs Julia 1.4 though...

The latest release of CUDA even requires Julia 1.5. However, IMO we shouldn't drop support for Julia < 1.5 currently.

@nmheim (Collaborator, Author) commented Sep 9, 2020

> The latest release of CUDA even requires Julia 1.5. However, IMO we shouldn't drop support for Julia < 1.5 currently.

Yes, agreed. The commit above restricts CUDA to <1.3.3, which is still compatible with Julia 1.3.

@nmheim (Collaborator, Author) commented Sep 9, 2020

I think the tests are currently failing because of JuliaDiff/ChainRules.jl#262. So I think once JuliaDiff/ChainRules.jl#263 is merged we should be fine.

Edit: actually, it doesn't seem like it. The gradient test tracebacks are a bit tricky to read ^^
The gradients through rand are not tested currently, right?

@devmotion (Member) commented Sep 9, 2020

> The gradients through rand are not tested currently, right?

No, we only test gradients of logpdf and loglikelihood. rand calls are not differentiable in ChainRules anyway, see e.g. https://github.com/JuliaDiff/ChainRules.jl/blob/master/src/rulesets/Random/random.jl.

Edit: Just saw your PR, yes, I guess it would make sense to test gradients there as well if it is supported by some AD backend.
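
If such tests are added, they could look roughly like this Zygote-based sketch (illustrative only, not the repository's actual test suite; it only checks shapes, since the samples themselves are stochastic):

using Test, Random, Zygote, DistributionsAD

Random.seed!(0)
μ = rand(Float32, 2)
σ = rand(Float32, 2)

# Differentiate a scalar function of a sample with respect to the parameters.
f(μ, σ) = sum(rand(DistributionsAD.TuringMvNormal(μ, σ)))

gμ, gσ = Zygote.gradient(f, μ, σ)
@test size(gμ) == size(μ)
@test size(gσ) == size(σ)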

@nmheim (Collaborator, Author) commented Sep 9, 2020

> I guess it would make sense to test gradients there as well if it is supported by some AD backend.

I'll open an issue for this.

@nmheim (Collaborator, Author) commented Sep 9, 2020

I am still a bit puzzled by the failing TuringPoissonBinomial though. Could that also be related to some new @non_differentiable rules?

@devmotion (Member)

Possibly. Tests on master passed with ChainRules v0.7.14 and ChainRulesCore v0.9.6, while this PR fails with ChainRules v0.7.17 and ChainRulesCore v0.9.7. The differences are JuliaDiff/ChainRulesCore.jl@v0.9.6...v0.9.7 and JuliaDiff/ChainRules.jl@v0.7.14...v0.7.17.

@devmotion (Member) commented Sep 9, 2020

I guess the problem might be the call of isapprox in

check_args && @assert all(x -> x >= -ϵ, pb) && isapprox(sum(pb), 1, atol = ϵ)
and the definition in https://github.com/JuliaDiff/ChainRules.jl/blob/0baf7bba2a9a2f235bf2a85edfdc6209f8cf137d/src/rulesets/Base/nondiff.jl#L123.

@nmheim (Collaborator, Author) commented Sep 9, 2020

> I guess the problem might be the call of isapprox in
>
> check_args && @assert all(x -> x >= -ϵ, pb) && isapprox(sum(pb), 1, atol = ϵ)
>
> and the definition in JuliaDiff/ChainRules.jl@v0.7.14...v0.7.17#diff-608aeb35114f6c76d5282c97cba14d6aR123.

Yes, you are right, that's it. The latest commit contains a hacky fix, because @non_differentiable cannot deal with keyword args. What do you think?
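
The kind of workaround being described could look roughly like the following sketch (the helper name _isapprox matches the one mentioned in the review below, but the exact definition here is an assumption):

using ChainRulesCore

# Positional-argument wrapper around isapprox, so that no keyword arguments reach
# the AD machinery; the comparison carries no useful gradient information anyway.
_isapprox(x, y, atol) = isapprox(x, y; atol = atol)
ChainRulesCore.@non_differentiable _isapprox(::Any, ::Any, ::Any)

The assertion quoted above would then call _isapprox(sum(pb), 1, ϵ) instead of the keyword form.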

@nmheim marked this pull request as ready for review September 9, 2020 18:28
@devmotion (Member) left a comment

LGTM, I just have the following comments left:

  • Can you please bump the version number to 0.6.8 so we can make a new release with the bugfix and new feature? (Ideally, they should be separated but IMO it's OK to keep it in one PR now)
  • Can you open an issue over at ChainRulesCore to inform them about the breakage of isapprox and raise awareness for keyword argument support in @non_differentiable?
  • Can you open an issue in DistributionsAD that we should remove _isapprox as soon as the underlying problem is fixed upstream? Otherwise we might just not remember it.

@devmotion mentioned this pull request Sep 10, 2020
@devmotion (Member) left a comment

Great, LGTM! Thanks for the PR!

It seems to work but we should make sure to actually run some GPU tests in the future as well: https://github.com/JuliaGPU/gitlab-ci

@devmotion merged commit bedf0a0 into master Sep 10, 2020
@devmotion deleted the nh/mvnormal-rand-gpu branch September 10, 2020 09:13
Development
Successfully merging this pull request may close these issues: GPU friendly distributions