Implementation of VecCorrBijector
#246
Conversation
src/bijectors/corr.jl (Outdated)
```julia
with_logabsdet_jacobian(b::VecCorrBijector, x) = transform(b, x), logabsdetjac(b, x)

function transform(::VecCorrBijector, X::AbstractMatrix{<:Real})
    w = upper_triangular(parent(cholesky(X).U))
```
I hate this :(
Could we not have `w = cholesky(X).U` and work with a `w::UpperTriangular` instead of a dense matrix? Tried it locally, no real gain in performance though.
> Could we not have `w = cholesky(X).U` and work with a `w::UpperTriangular` instead of a dense matrix?

Yeah, but this won't work with some of the AD backends (I know, it's super-annoying...). If you have a look at our compat-code for ReverseDiff (I believe), I think you'll see that we have to do some custom stuff to compute the pullback.

> Tried it locally, no real gain in performance though.

I don't think we'd expect it to, because internally we're iterating over the relevant elements of the matrix anyway, i.e. we're not gaining anything by telling the rest of the computational path that we're actually working on a lower-triangular matrix, because it already assumes the given matrix is lower-triangular.
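For illustration, here's a minimal sketch of the wrapper-versus-dense distinction being discussed; `upper_tri_dense` is a hypothetical stand-in for Bijectors' internal `upper_triangular` helper, not the actual implementation:

```julia
using LinearAlgebra

# Hypothetical stand-in for Bijectors' `upper_triangular`: materialize a
# dense matrix with explicit zeros below the diagonal.
upper_tri_dense(A::AbstractMatrix) = convert(typeof(A), UpperTriangular(A))

X = [1.0 0.3; 0.3 1.0]                        # a small correlation matrix
U_wrapped = cholesky(X).U                     # UpperTriangular wrapper
U_dense = upper_tri_dense(parent(U_wrapped))  # plain Matrix, same entries

U_dense == U_wrapped                  # true elementwise, but the types differ;
typeof(U_dense) == typeof(U_wrapped)  # false -- the wrapper is what trips up some AD backends
```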
Completely unrelated, but if it is not 100%-guaranteed that you always end up with an upper-triangular matrix when calling `cholesky` (which I think you can't guarantee if `AbstractMatrix` is supported), it would be better to work with `.UL` instead of `.U` (as we already do in other places in Bijectors and other libraries).
It's 100% guaranteed that it's available though, right? So it's a question of efficiency, not correctness.

The problem here is that:

- We need to work with a lower-triangular matrix (unless we re-implement `_link_chol_lkj` to work with a vector).
- We drop the type information in the construction of `w`, hence we don't actually know whether it was upper- or lower-triangular (in the case where we do `cholesky(...).UL`).

All in all, it seems we need an additional layer of dispatch, e.g.

```julia
link_chol_lkj(x::LowerTriangular) = link_chol_lkj(lower_triangular(parent(x)))
link_chol_lkj(x::UpperTriangular) = link_chol_lkj(transpose(upper_triangular(parent(x))))
link_chol_lkj(x::AbstractMatrix) = _link_chol_lkj(x) # assume it's lower-triangular
```

?
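As a sanity check, a toy version of that dispatch (with a print stub in place of the real `_link_chol_lkj`; all names below are illustrative only) routes the factor types as intended:

```julia
using LinearAlgebra

# Stub standing in for the real `_link_chol_lkj`; just reports what it got.
_link_demo(W::AbstractMatrix) = println("linking a ", typeof(W))

link_demo(x::LowerTriangular) = _link_demo(Matrix(x))
link_demo(x::UpperTriangular) = _link_demo(Matrix(transpose(x)))
link_demo(x::AbstractMatrix)  = _link_demo(x)  # assume it's lower-triangular

X = [1.0 0.2; 0.2 1.0]
link_demo(cholesky(X).L)  # LowerTriangular method: densify and pass through
link_demo(cholesky(X).U)  # UpperTriangular method: transpose to lower-triangular first
```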
I would guess it's mainly for efficiency reasons. But it's difficult to say if there are other implications as well, e.g., regarding AD.

An alternative to `.UL` would be something like what's used in PDMats: https://github.com/JuliaStats/PDMats.jl/blob/fff131e11e23403931a42f5bfb3384f0d2b114c9/src/chol.jl#L6-L11 That should also be quite efficient, and you could continue working with upper-triangular matrices.
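For reference, the PDMats approach linked above amounts to dispatching on the factorization's stored orientation; roughly (a paraphrase, not the verbatim PDMats source):

```julia
using LinearAlgebra

# Pick the factor matching the stored orientation, lazily transposing
# otherwise; this avoids the copy that `C.U`/`C.L` make when the stored
# factor has the other orientation.
chol_lower_sketch(C::Cholesky) = C.uplo === 'L' ? C.L : transpose(C.U)
chol_upper_sketch(C::Cholesky) = C.uplo === 'U' ? C.U : transpose(C.L)

C = cholesky([1.0 0.4; 0.4 1.0])
chol_upper_sketch(C)  # no copy here, since `uplo === 'U'` for this factorization
```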
> I would guess it's mainly for efficiency reasons. But it's difficult to say if there are other implications as well, e.g., regarding AD.

Hmm fair, though IMO something like this seems like it would be a bug in the AD package, no?

> An alternative to `.UL` would be something like what's used in PDMats: https://github.com/JuliaStats/PDMats.jl/blob/fff131e11e23403931a42f5bfb3384f0d2b114c9/src/chol.jl#L6-L11 That should also be quite efficient and you could continue working with upper-triangular matrices.

@harisorgn and I were just having a chat, and we're thinking of replacing `upper_triangular(parent(cholesky(X).U))` with

```julia
cholesky_upper(X) = upper_triangular(parent(cholesky(X).U))
cholesky_lower(X) = lower_triangular(parent(cholesky(X).L))
```

to make it less likely that we forget or mess up somewhere. But we can make it

```julia
cholesky_upper(X) = upper_triangular(parent(PDMats.chol_upper(cholesky(X))))
cholesky_lower(X) = lower_triangular(parent(PDMats.chol_lower(cholesky(X))))
```

But are you sure there's not a good reason why the default is `copy`? Of course it's more memory-intensive, but will stuff like `LowerTriangular(U')` lead to slower computation paths (since you're now working with `adjoint(U)` rather than something that is actually lower-triangular)? E.g. indexing `adjoint(U)` surely involves more computations than indexing `copy(adjoint(U))`.
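One way to probe that last question empirically (a sketch using BenchmarkTools, not something from this PR):

```julia
using LinearAlgebra, BenchmarkTools

U = cholesky([1.0 0.4; 0.4 1.0]).U
L_lazy = LowerTriangular(U')        # indexing goes through the lazy Adjoint wrapper
L_copy = LowerTriangular(copy(U'))  # materialized; plain dense indexing underneath

@btime sum($L_lazy)  # compare timings to see what the lazy wrapper costs
@btime sum($L_copy)
```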
Thanks David! This seems to work for our issue. The interface tests for v1.6 now fail on the …
My initial guess would be that this is caused by some changes in LinearAlgebra rather than AD (ReverseDiff didn't change Julia compat and doesn't use …
I messed up and merged the formatting PR into master before this 🤦 So the result was quite a lot of conflicts. I've resolved them now though 👍
Ah yes, thanks once again! Sorry, misread the stack trace on my phone and thought the issue was the … How should we handle this then? The …
One straightforward (but hacky) solution is of course:

```julia
cholesky_factor(X::ReverseDiff.TrackedArray) = cholesky_factor(cholesky(X))
```

Also, though I don't see another fix, I'm uncertain if …
Was just checking in case I wasn't seeing something less hacky, but let's go with this now 👍. I've added this method in …
Should we have something like an …
```julia
# NOTE: `CorrBijector` technically isn't bijective, and so the default `getjacobian`
# used in the ChangesOfVariables.jl tests will fail as the jacobian will have determinant 0.
# Hence, we disable those tests.
test_bijector(b, x; test_not_identity=d != 1, changes_of_variables_test=false)
```
The `InverseFunctions.test_inverse` that is called in here fails on some calls but passes on others, even with the same sample `x`. CI on previous commits of this PR failed because of it. Seems weird, not sure why.
@harisorgn do we generate a random number somewhere inside the tests? If so, maybe fix the random seed?
There was no RNG between these lines; it was after sampling from `LKJ`. I believe the issue was numerical, as there was no explicit zero-filling in the old Matrix-version link function, which caused mismatches. I've added zero-filling and tested it a bunch locally, should be fine 🤞
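To illustrate the failure mode described above (a hypothetical sketch, not the actual test code):

```julia
using LinearAlgebra

W = randn(3, 3)         # working buffer with junk in the strict lower triangle
U = UpperTriangular(W)  # the wrapper masks the junk ...
parent(U) == Matrix(U)  # ... but it's still in the parent: false until zero-filled

# Explicit zero-filling makes the dense representation exact:
W_clean = copy(W)
for j in 1:3, i in (j + 1):3
    W_clean[i, j] = 0.0
end
W_clean == Matrix(U)    # true
```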
Do we want to keep …
Good question. I'd say keep it for now + make an issue.
Great work -- thanks @harisorgn, @torfjelde and @devmotion!
This PR implements a `VecCorrBijector`, i.e. `CorrBijector` but where the unconstrained space is treated as a `Vector` of the correct size, rather than working with a `Matrix` which is actually embedded in a subspace of matrices. This is particularly useful when doing stuff like MCMC sampling, where the full matrix representation might give the sampler the false indication that the dimension is `d × d` instead of `(d choose 2)` (as is the case here).
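For concreteness, the dimension count behind that `(d choose 2)` claim:

```julia
# A d×d correlation matrix is symmetric with unit diagonal, so only its
# strictly upper-triangular entries are free parameters.
d = 4
d^2             # 16 entries in the full matrix representation
binomial(d, 2)  # 6 unconstrained parameters the sampler actually needs
```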