LKJ follow-up #134

Open
devmotion opened this issue Aug 24, 2020 · 5 comments

Comments
@devmotion (Member)

Some things that probably can be improved as a follow-up to the initial implementation (but might require changes in other packages such as AD backends):

  • Avoid dense matrices (currently triangular matrices are converted to dense matrices in a quite hacky way, even though working with dense matrices shouldn't be necessary; see the sketch after this list)
  • Simplify the custom adjoints and potentially improve numerical stability (while the implementation of the link and inverse link seems quite optimized, the custom adjoints can probably be improved)
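
As a minimal sketch of the first point (illustration only, not the actual bijector code), the structured alternative keeps the UpperTriangular wrapper from LinearAlgebra instead of materializing a dense copy:

using LinearAlgebra

A = [4.0 2.0; 2.0 3.0]           # arbitrary positive-definite example
U_dense = Matrix(cholesky(A).U)  # dense conversion: allocates a full K×K matrix
U_tri = cholesky(A).U            # structured alternative: an UpperTriangular wrapper, no copy
U_tri isa UpperTriangular        # true, so downstream code can dispatch on the structure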
@devmotion (Member, Author)

Additionally, we might want to:

  • Define the bijector on the Cholesky factor directly
  • Map it to a parameter vector instead of a matrix (see the sketch below)
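
A rough sketch of the vectorization (the helper triu1_to_vec is hypothetical; the actual representation may differ): for a K×K factor, only the K*(K-1)/2 strictly upper-triangular entries carry information, so they can be flattened into a vector column by column, in the same traversal order that the inverse link uses.

# Hypothetical helper: flatten the strictly upper-triangular entries of a
# K×K matrix into a vector of length K*(K-1)/2, column by column.
function triu1_to_vec(W::AbstractMatrix)
    K = size(W, 1)
    y = similar(W, K * (K - 1) ÷ 2)
    idx = 1
    for j in 2:K, i in 1:(j - 1)
        y[idx] = W[i, j]
        idx += 1
    end
    return y
end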

@torfjelde (Member)

Additionally we might want to:

  • Define the bijector on the Cholesky factor directly
  • Map it to a parameter vector instead of a matrix

This has now been done in #246 👍

@torfjelde (Member)

Worth pointing out that the comments on numerical issues in TuringLang/DynamicPPL.jl#485 are talking about ForwardDiff, so I would be curious to see whether using the rrule is helpful or not.

@harisorgn (Member) commented Jun 20, 2023

Worth pointing out that the comments on numerical issues in TuringLang/DynamicPPL.jl#485 are talking about ForwardDiff

The issue with ForwardDiff was more about correctness there, as I was not expecting a Dual with a single partial when differentiating wrt LKJ samples. We also get numerical issues without ForwardDiff, which probably have to do with the numerical stability of the inverse link

function _inv_link_chol_lkj(y::AbstractVector)
    K = _triu1_dim_from_length(length(y))
    W = similar(y, K, K)
    idx = 1
    @inbounds for j in 1:K
        W[1, j] = 1
        for i in 2:j
            # map the unconstrained parameter to a partial correlation z ∈ (-1, 1);
            # tanh saturates to ±1 for large |y[idx]|
            z = tanh(y[idx])
            idx += 1
            # stick-breaking step: split the remaining length `tmp` of column j
            # between row i - 1 (factor z) and row i (factor sqrt(1 - z^2)),
            # so each column of W keeps unit norm
            tmp = W[i - 1, j]
            W[i - 1, j] = z * tmp
            W[i, j] = tmp * sqrt(1 - z^2)
        end
        # zero out the strictly lower-triangular part of column j
        for i in (j + 1):K
            W[i, j] = 0
        end
    end
    return W
end
as discussed in the same PR.
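
For instance (my own illustration, not from the PR), tanh already saturates in Float64 for moderately large unconstrained values, at which point the sqrt(1 - z^2) factor underflows to zero:

z = tanh(19.1)  # == 1.0 exactly in Float64; tanh saturates for |y| ≳ 19
sqrt(1 - z^2)   # == 0.0, so the corresponding entries of W collapse to zero
sech(19.1)      # ≈ 1.0e-8; mathematically equal to sqrt(1 - tanh(19.1)^2),
                # so a reformulation via sech would retain precision here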

Numerical issues directly related to ForwardDiff were observed in #253 (comment), which we treated by explicitly wrapping the matrix in Hermitian to avoid the symmetry check.
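
The pattern looks roughly like this (the example matrix is mine, not the actual code from #253):

using LinearAlgebra

A = randn(4, 4)
S = A * A' + I           # positive definite by construction
# `cholesky(S)` first runs `ishermitian(S)`, which can fail for Dual-valued
# matrices that are symmetric only up to the last ulp; the wrapper skips the
# check and reads a single triangle instead:
cholesky(Hermitian(S))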

@harisorgn (Member) commented Jun 20, 2023

The issue with ForwardDiff was more about correctness there

Just adding this here as well, to move the discussion from the DynamicPPL PR. The numerical issues when sampling do indeed seem to be ForwardDiff-related. See TuringLang/DynamicPPL.jl#485 (comment) for a simple test case, which gives loads of numerical issues when Duals are passed through during NUTS sampling, but seems to work fine with ReverseDiff.
