
New Gibbs sampler using condition #2099

Merged: 71 commits merged into master from torfjelde/new-gibbs on Apr 21, 2024

Conversation

torfjelde (Member) commented Oct 6, 2023:

This is an attempt at a new Gibbs sampler which makes use of the condition functionality from DynamicPPL.jl.

For models which are compatible with condition, this should make for a much more flexible approach to Gibbs sampling.

The idea is as follows: instead of a single varinfo shared between all the samplers, we keep a separate varinfo for every sampler involved in the composition. For each "inner" step, we then condition the model on the values held by the other samplers' varinfos.
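
A minimal sketch of the idea, using the standard gdemo demo model and plain NamedTuples standing in for the per-sampler varinfos (the values and names below are illustrative, not the PR's actual internals):

using Turing, DynamicPPL

@model function gdemo(x, y)
    s ~ InverseGamma(2, 3)
    m ~ Normal(0, sqrt(s))
    x ~ Normal(m, sqrt(s))
    y ~ Normal(m, sqrt(s))
end

model = gdemo(1.5, 2.0)

# One value store per Gibbs "block"; plain NamedTuples here for illustration,
# whereas the PR keeps a separate varinfo (AbstractVarInfo) per sampler.
values_s = (s = 1.0,)
values_m = (m = 0.5,)

# The inner step for the sampler responsible for `s` sees the model conditioned
# on everything the *other* samplers hold, and vice versa.
model_for_s = DynamicPPL.condition(model, values_m)  # `m` treated as given
model_for_m = DynamicPPL.condition(model, values_s)  # `s` treated as given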

The current implementation is somewhat rough, but it does seem to work!

Open issues

TODO Issue with MH

using Turing
using AdvancedMH  # `RandomWalkProposal` lives in AdvancedMH

@model demo() = x ~ Normal()

sampler = Turing.Experimental.Gibbs(
  Turing.OrderedDict(
    @varname(x) => MH(AdvancedMH.RandomWalkProposal(Normal(0, 0.003)))
  )
)

This breaks, but if we wrap the proposal in a filldist (or similar) then it works:

sampler = Turing.Experimental.Gibbs(
  Turing.OrderedDict(
    @varname(x) => MH(AdvancedMH.RandomWalkProposal(filldist(Normal(0,0.003),1)))
  )
)

TODO Linking isn't quite there (see last comment below)

TODO

torfjelde (Member, Author):

@yebai

codecov bot commented Oct 6, 2023:

Codecov Report

Attention: Patch coverage is 0% with 145 lines in your changes missing coverage. Please review.

Project coverage is 0.00%. Comparing base (c29d36e) to head (4bde75a).
Report is 1 commit behind head on master.

❗ Current head 4bde75a differs from pull request most recent head 0f30514. Consider uploading reports for the commit 0f30514 to get more accurate results

Files                       Patch %   Lines
src/experimental/gibbs.jl   0.00%     123 Missing ⚠️
src/mcmc/particle_mcmc.jl   0.00%     22 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master   #2099    +/-   ##
=======================================
  Coverage    0.00%   0.00%            
=======================================
  Files          21      22     +1     
  Lines        1367    1502   +135     
=======================================
- Misses       1367    1502   +135     


yebai (Member) commented Oct 8, 2023:

One step closer to Turing v1.0... we only need to fix the docs and particle Gibbs after this PR.

torfjelde (Member, Author):

This PR is not ready yet, though (which is why I put it in draft mode). Some cases are functional, but it needs more work (which I'm currently doing :))

github-actions bot (Contributor) commented Oct 8, 2023:

Pull Request Test Coverage Report for Build 6930099476

  • 0 of 121 (0.0%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 0.0%

Changes Missing Coverage   Covered Lines   Changed/Added Lines   %
src/mcmc/gibbs_new.jl      0               121                   0.0%

Totals Coverage Status
Change from base Build 6890657606: 0.0%
Covered Lines: 0
Relevant Lines: 1542

💛 - Coveralls

Comment on lines 112 to 118
# `GibbsV2` does not work with SMC samplers, e.g. `CSMC`.
# FIXME: Oooor it is (see tests below). Uncertain.
Random.seed!(100)
alg = GibbsV2(@varname(s) => CSMC(15), @varname(m) => ESS(:m))
chain = sample(gdemo(1.5, 2.0), alg, 10_000)
@test_broken mean(chain[:s]) ≈ 49 / 24
@test_broken mean(chain[:m]) ≈ 7 / 6
torfjelde (Member, Author):

@yebai Any idea why this fails? It does work with CSMC in the example below..

torfjelde (Member, Author):

The main difference I see here is that this example uses CSMC on a continuous rather than a discrete variable.

torfjelde (Member, Author):

(I've also tried without the ESS(:m) locally, and it doesn't make a difference)

yebai (Member):

I don't see an obvious reason. CSMC / PG's accuracy can often be improved by increasing the number of particles in each SMC sweep. So, this might go away if you use CSMC(50) or larger values.
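
Concretely, the suggested tweak to the test above would be something like the following (purely illustrative; whether it actually resolves the discrepancy isn't established in this thread):

alg = GibbsV2(@varname(s) => CSMC(50), @varname(m) => ESS(:m))
chain = sample(gdemo(1.5, 2.0), alg, 10_000)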

torfjelde (Member, Author):

As @yebai and I discussed, this issue is likely caused by the fact that SMC samplers will now also treat the conditioned variables as observations, which is not quite what we want.

But at the moment I don't quite see how we can work around this with the existing functionality we have in DynamicPPL 😕

Basically, we have fix and condition:

  • condition makes the variables be treated as observations, which, as seen above, is bad for likelihood-tempered samplers.
  • fix avoids the variables being treated as observations, but it also doesn't include the log-prob of those variables. This then causes issues in models with hierarchical dependencies, e.g. gdemo, where the prior on m depends on the value of s, so changing s also changes the joint through the prior on m.

Effectively, we'd need something similar to fix which also computes the log-prob.
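
To make the distinction concrete, here is a rough sketch using the gdemo model from the earlier sketch. DynamicPPL.condition and DynamicPPL.fix are the real entry points, but the NamedTuple form for fix and the particular values are assumptions for illustration:

model = gdemo(1.5, 2.0)  # the model from the earlier sketch

# `condition`: `s` now behaves like an observation, so log p(s) lands in the same
# bucket as the actual likelihood terms, which is bad for likelihood-tempered samplers.
conditioned = DynamicPPL.condition(model, (s = 1.0,))

# `fix`: `s` is pinned to a value but contributes no log-prob at all; the prior on `m`
# is still evaluated at the fixed `s`, but log p(s) itself is dropped from the joint.
fixed = DynamicPPL.fix(model, (s = 1.0,))

# Evaluating each model accumulates the corresponding log density:
DynamicPPL.getlogp(DynamicPPL.VarInfo(conditioned))  # includes log p(s = 1.0)
DynamicPPL.getlogp(DynamicPPL.VarInfo(fixed))        # excludes log p(s = 1.0)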

devmotion (Member):

I haven't followed the discussion in this thread, but regarding the last point (ESS): this difference (log-likelihood vs log-joint) was one (maybe the) main reason for the current design of the re-evaluation in Gibbs, i.e., for gibbs_rerun:

Turing.jl/src/mcmc/gibbs.jl

Lines 134 to 142 in 6649f10

"""
gibbs_rerun(prev_alg, alg)
Check if the model should be rerun to recompute the log density before sampling with the
Gibbs component `alg` and after sampling from Gibbs component `prev_alg`.
By default, the function returns `true`.
"""
gibbs_rerun(prev_alg, alg) = true
We try to avoid re-evaluating varinfo.logp when we know that it is not needed but to be safe (and e.g. for ESS) the default is to re-evaluate.
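
For concreteness, opting out for a specific pair of components is just a method specialisation of the fallback quoted above; the pair chosen below is purely illustrative, not one of the actual specialisations in src/mcmc/gibbs.jl:

# Hypothetical opt-out, written as it would sit next to the fallback above: skip the
# re-evaluation when an MH component follows another MH component. (Which pairs are
# actually safe is decided case-by-case in src/mcmc/gibbs.jl.)
gibbs_rerun(::MH, ::MH) = false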

torfjelde (Member, Author) commented Nov 19, 2023:

Aaalrighty! Adopting the rerun mechanisms from the current Gibbs seems to have done the trick!

I've also tried an acclogp!!(context, varinfo, logp) implementation, and I think this is something we should adopt: it immediately addresses the issue of SMC samplers not currently working with @addlogprob!, and there are other scenarios where we might want to hook into this, e.g. a DebugContext where we can "record" these hard-coded log-prob increments for debugging and model-checking purposes. I'll make separate PRs for this though. As mentioned above, CSMC should still work even if we're performing resampling for what are technically not observations.
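
A toy sketch of the acclogp!!(context, varinfo, logp) dispatch idea mentioned above, with Ref{Float64} standing in for the varinfos' log-prob fields; none of these types are the actual DynamicPPL ones:

# Toy stand-ins: a "global" context and a particle-sampler-like context that owns a
# task-local accumulator.
struct GlobalContext end
struct TaskLocalContext
    task_logp::Base.RefValue{Float64}
end

# The context decides where an accumulated log-probability ends up.
acclogp!!(::GlobalContext, global_logp::Base.RefValue{Float64}, logp) =
    (global_logp[] += logp; global_logp)
acclogp!!(ctx::TaskLocalContext, global_logp::Base.RefValue{Float64}, logp) =
    (ctx.task_logp[] += logp; global_logp)

# The same increment is routed differently depending on the evaluation context:
g, t = Ref(0.0), Ref(0.0)
acclogp!!(GlobalContext(), g, -1.2)      # g[] == -1.2
acclogp!!(TaskLocalContext(t), g, -3.4)  # t[] == -3.4, g[] untouched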

torfjelde (Member, Author):

Btw @devmotion, I didn't see your comment because I was working on a train without internet 🙃

torfjelde (Member, Author):

One thing that confuses me slightly is that, technically, even if the SMC sampler sees one of the parameters as an observation, AFAIK, it shouldn't break things, no? It might increase the variance, etc. but it should still converge.

To make this absolutely clear, let's look at PG from Andrieu (2010):

[Screenshot: the PG sampler from Andrieu (2010)]

Looking at this, it might seem as if we shouldn't include the "prior" probability of $m$ in gdemo, since $p_{\theta}(x_{1:T} \mid y_{1:T})$ does not include $p(\theta)$.

But this is all done under the assumption that the joint nicely factorizes as
$$p(x_{1:T}, \theta \mid y_{1:T}) \propto p_{\theta}(x_{1:T}, y_{1:T}) p(\theta)$$
In this scenario, $p\big(\theta(i)\big)$ will be the same value for all particles in step (b), and so there's no need to include this in the weights.

But in our scenario, the joint does not nicely factorize as above. We have $\theta = m$ and $x_{1:T} = s$, so
$$p(s, m \mid y_{1:T}) = p(s) p(m \mid s) p(y_{1:T} \mid m, s)$$
To get a similar factorization as above, we can of course just define
$$p_m(s, y_{1:T}) = \frac{p(s, m, y_{1:T})}{p(m)} \propto p(s, m, y_{1:T})$$
where we can just drop the contribution of $p(m)$ since it's constant wrt. $s$ that we're sampling.

With this notation, mapping this to the algorithm above is immediate, in which case it's clear that we're targeting the full joint, and so, in the general scenario, we need to include the log-prob contribution of latent variables that are not being sampled by conditional SMC.

And more generally speaking, PG is just a Gibbs sampler with:

  1. Using some method (in Andrieu (2010) they assume it's available in closed-form)
    $$\theta(i) \sim p\big(\theta \mid y_{1:T}, X_{1:T}\big) \propto p(\theta, x_{1:T}, y_{1:T})$$
  2. Using CSMC
    $$X_{1:T}(i) \sim p \big( X_{1:T} \mid y_{1:T}, \theta \big) \propto p(\theta, x_{1:T}, y_{1:T})$$

And the reason we're not including the contribution from the variable we're targeting with CSMC is that we would otherwise have to adjust the weights later by removing the prior contribution, since it cancels in the importance weight: our proposal for $s$ is $p(s)$ and we're targeting $p(s, m, y_{1:T})$, so our IS weight is
$$w = \frac{p(s, m, y_{1:T})}{p(s)} = \frac{p(s) p(m \mid s) p(y_{1:T} \mid s, m)}{p(s)} = p(m \mid s) p(y_{1:T} \mid s, m)$$

Does that make sense @yebai ?

And finally, whether or not we resample when hitting $\theta$ is not a correctness issue, since we're still targeting the same distribution (though it probably affects the variance of the estimator; not sure whether for better or worse 🤷).

torfjelde (Member, Author):

This also explains why we need to make acclogp!! accumulate log-probs in the task-local varinfo and not the "global" one.

Outdated review thread (resolved): test/mcmc/gibbs_new.jl

torfjelde (Member, Author):

Tests are at least passing now, but I have some local changes I'm currently working on, so no merge-y yet please.

Outdated review threads (resolved): src/Turing.jl, src/experimental/gibbs.jl, src/mcmc/Inference.jl, test/experimental/gibbs.jl (two threads)
torfjelde (Member, Author):

This is starting to take shape @devmotion @yebai @sunxd3

After TuringLang/DynamicPPL.jl#587 there's very little reason to use the "current" Gibbs sampler over this experimental one: it provides much more flexibility, in addition to being much easier to debug. I've also encountered issues with linking, etc. in the current implementation of Gibbs, which then just become a pain to debug.

torfjelde (Member, Author):

Tests are passing here now 👍

I think it's worth merging and making a release with this before we make a breaking release after #2197.

yebai (Member) left a comment:

Thanks @torfjelde -- happy to merge this since it is fairly self-contained.

One thing to consider is renaming experimental to from_future, but that is just a fun thing to do.

torfjelde merged commit a022dc6 into master on Apr 21, 2024 (11 checks passed).
torfjelde deleted the torfjelde/new-gibbs branch on April 21, 2024 at 09:44.