Implementation of VecCorrBijector
#246
Conversation
src/bijectors/corr.jl (Outdated)
```julia
with_logabsdet_jacobian(b::VecCorrBijector, x) = transform(b, x), logabsdetjac(b, x)

function transform(::VecCorrBijector, X::AbstractMatrix{<:Real})
    w = upper_triangular(parent(cholesky(X).U))
```
I hate this :(
Could we not have `w = cholesky(X).U` and work with a `w::UpperTriangular` instead of a dense matrix? Tried it locally, no real gain in performance though.
> Could we not have `w = cholesky(X).U` and work with a `w::UpperTriangular` instead of a dense matrix?

Yeah, but this won't work with some of the AD backends (I know, it's super-annoying...). If you have a look at our compat-code for ReverseDiff (I believe), I think you'll see that we have to do some custom stuff to compute the pullback.

> Tried it locally, no real gain in performance though.

I don't think we'd expect it to, because internally we're iterating over the relevant elements of the matrix anyway, i.e. we're not gaining anything by telling the rest of the computational path that we're actually working on a lower-triangular matrix, because it already assumes the given matrix is lower-triangular.
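For illustration, here's a minimal sketch of the wrapper-versus-dense distinction being discussed; `upper_tri_dense` is a hypothetical stand-in for Bijectors' internal `upper_triangular` helper, not the actual implementation:

```julia
using LinearAlgebra

# Hypothetical stand-in for Bijectors' `upper_triangular`: materialize a
# dense matrix with explicit zeros below the diagonal.
upper_tri_dense(A::AbstractMatrix) = convert(typeof(A), UpperTriangular(A))

X = [1.0 0.3; 0.3 1.0]                        # a small correlation matrix
U_wrapped = cholesky(X).U                     # UpperTriangular wrapper
U_dense = upper_tri_dense(parent(U_wrapped))  # plain Matrix, same entries

U_dense == U_wrapped                  # true elementwise, but the types differ;
typeof(U_dense) == typeof(U_wrapped)  # false -- the wrapper is what trips up some AD backends
```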
Completely unrelated, but if it is not 100%-guaranteed that you always end up with an upper-triangular matrix when calling `cholesky` (which I think you can't guarantee if `AbstractMatrix` is supported), it would be better to work with `.UL` instead of `.U` (as we already do in other places in Bijectors and other libraries).
It's 100% guaranteed that it's available though, right? So it's a question of efficiency, not correctness.

The problem here is that:

- We need to work with a lower-triangular matrix (unless we re-implement `_link_chol_lkj` to work with a vector).
- We drop the type information in the construction of `w`, hence we don't actually know whether it was upper- or lower-triangular (in the case where we do `cholesky(...).UL`).

All in all, it seems we need an additional layer of dispatch, e.g.

```julia
link_chol_lkj(x::LowerTriangular) = link_chol_lkj(lower_triangular(parent(x)))
link_chol_lkj(x::UpperTriangular) = link_chol_lkj(transpose(upper_triangular(parent(x))))
link_chol_lkj(x::AbstractMatrix) = _link_chol_lkj(x) # assume it's lower-triangular
```

?
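As a sanity check, a toy version of that dispatch (with a print stub in place of the real `_link_chol_lkj`; all names below are illustrative only) routes the factor types as intended:

```julia
using LinearAlgebra

# Stub standing in for the real `_link_chol_lkj`; just reports what it got.
_link_demo(W::AbstractMatrix) = println("linking a ", typeof(W))

link_demo(x::LowerTriangular) = _link_demo(Matrix(x))
link_demo(x::UpperTriangular) = _link_demo(Matrix(transpose(x)))
link_demo(x::AbstractMatrix)  = _link_demo(x)  # assume it's lower-triangular

X = [1.0 0.2; 0.2 1.0]
link_demo(cholesky(X).L)  # LowerTriangular method: densify and pass through
link_demo(cholesky(X).U)  # UpperTriangular method: transpose to lower-triangular first
```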
I would guess it's mainly for efficiency reasons. But it's difficult to say if there are other implications as well, e.g., regarding AD.

An alternative to `.UL` would be something like what's used in PDMats: https://github.com/JuliaStats/PDMats.jl/blob/fff131e11e23403931a42f5bfb3384f0d2b114c9/src/chol.jl#L6-L11 That should also be quite efficient, and you could continue working with upper-triangular matrices.
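For reference, the PDMats approach linked above amounts to dispatching on the factorization's stored orientation; roughly (a paraphrase, not the verbatim PDMats source):

```julia
using LinearAlgebra

# Pick the factor matching the stored orientation, lazily transposing
# otherwise; this avoids the copy that `C.U`/`C.L` make when the stored
# factor has the other orientation.
chol_lower_sketch(C::Cholesky) = C.uplo === 'L' ? C.L : transpose(C.U)
chol_upper_sketch(C::Cholesky) = C.uplo === 'U' ? C.U : transpose(C.L)

C = cholesky([1.0 0.4; 0.4 1.0])
chol_upper_sketch(C)  # no copy here, since `uplo === 'U'` for this factorization
```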
> I would guess it's mainly for efficiency reasons. But it's difficult to say if there are other implications as well, e.g., regarding AD.

Hmm fair, though IMO something like this seems like it would be a bug in the AD package, no?

> An alternative to `.UL` would be something like what's used in PDMats: https://github.com/JuliaStats/PDMats.jl/blob/fff131e11e23403931a42f5bfb3384f0d2b114c9/src/chol.jl#L6-L11 That should also be quite efficient and you could continue working with upper-triangular matrices.

@harisorgn and I were just having a chat, and we're thinking of replacing `upper_triangular(parent(cholesky(X).U))` with

```julia
cholesky_upper(X) = upper_triangular(parent(cholesky(X).U))
cholesky_lower(X) = lower_triangular(parent(cholesky(X).L))
```

to make it less likely that we forget or mess up somewhere. But we can make it

```julia
cholesky_upper(X) = upper_triangular(parent(PDMats.chol_upper(cholesky(X))))
cholesky_lower(X) = lower_triangular(parent(PDMats.chol_lower(cholesky(X))))
```

But are you sure there's not a good reason why the default is `copy`? Of course it's more memory-intensive, but will stuff like `LowerTriangular(U')` lead to slower computation paths (since you're now working with `adjoint(U)` rather than something that is actually lower-triangular)? E.g. indexing `adjoint(U)` surely involves more computations than indexing `copy(adjoint(U))`.
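One way to probe that last question empirically (a sketch using BenchmarkTools, not something from this PR):

```julia
using LinearAlgebra, BenchmarkTools

U = cholesky([1.0 0.4; 0.4 1.0]).U
L_lazy = LowerTriangular(U')        # indexing goes through the lazy Adjoint wrapper
L_copy = LowerTriangular(copy(U'))  # materialized; plain dense indexing underneath

@btime sum($L_lazy)  # compare timings to see what the lazy wrapper costs
@btime sum($L_copy)
```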
Thanks David! This seems to work for our issue. The interface tests for v1.6 now fail on the …
My initial guess would be that this is caused by some changes in LinearAlgebra rather than AD (ReverseDiff didn't change Julia compat and doesn't use …
I messed up and merged the formatting PR into master before this 🤦 So the result was quite a lot of conflicts. I've resolved them now though 👍
Ah yes, thanks once again! Sorry, misread the stack trace on my phone and thought the issue was the … How should we handle this then? The …
One straightforward (but hacky) solution is of course:

```julia
cholesky_factor(X::ReverseDiff.TrackedArray) = cholesky_factor(cholesky(X))
```

Also, though I don't see another fix, I'm uncertain if …
Was just checking in case I wasn't seeing something less hacky, but let's go with this now 👍. I've added this method in …
Should we have something like an …
```julia
# NOTE: `CorrBijector` technically isn't bijective, and so the default `getjacobian`
# used in the ChangesOfVariables.jl tests will fail as the jacobian will have determinant 0.
# Hence, we disable those tests.
test_bijector(b, x; test_not_identity=d != 1, changes_of_variables_test=false)
```
The `InverseFunctions.test_inverse` that is called in here fails on some calls but passes on others, even with the same sample `x`. CI on previous commits of this PR failed because of it. Seems weird, not sure why.
@harisorgn do we generate a random number somewhere inside the tests? If so, maybe fix the random seed?
There was no RNG between these lines; it was after sampling from `LKJ`. I believe the issue was numerical, as there was no explicit zero-filling in the old Matrix-version link function, which caused mismatches. I've added zero-filling and tested it a bunch locally, should be fine 🤞
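To illustrate the failure mode described above (a hypothetical sketch, not the actual test code):

```julia
using LinearAlgebra

W = randn(3, 3)         # working buffer with junk in the strict lower triangle
U = UpperTriangular(W)  # the wrapper masks the junk ...
parent(U) == Matrix(U)  # ... but it's still in the parent: false until zero-filled

# Explicit zero-filling makes the dense representation exact:
W_clean = copy(W)
for j in 1:3, i in (j + 1):3
    W_clean[i, j] = 0.0
end
W_clean == Matrix(U)    # true
```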
Do we want to keep …
Good question. I'd say keep it for now + make an issue.
Great work -- thanks @harisorgn, @torfjelde and @devmotion!
This PR implements a `VecCorrBijector`, i.e. `CorrBijector` but where the unconstrained space is treated as a `Vector` of the correct size, rather than working with a `Matrix` which is actually embedded in a subspace of matrices. This is particularly useful when doing stuff like MCMC sampling, where the full matrix representation might give the sampler the false indication that the dimension is `d × d` instead of `(d choose 2)` (as is the case here).
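For concreteness, the dimension count behind that `(d choose 2)` claim:

```julia
# A d×d correlation matrix is symmetric with unit diagonal, so only its
# strictly upper-triangular entries are free parameters.
d = 4
d^2             # 16 entries in the full matrix representation
binomial(d, 2)  # 6 unconstrained parameters the sampler actually needs
```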