Improve documentation #77

BenjaminJCox · 2023-02-07T09:41:38Z

As requested on Slack, I am posting my perspective on how the docs could be improved.

The list below is not complete, but these are the issues that I came across when trying to implement an adaptive multiple importance sampler along the lines of https://arxiv.org/abs/0907.1254.

All of the below comes from the perspective of someone who designs samplers and parameter inference methods. My education is in statistics only, so some things that are probably obvious may simply be named or laid out differently from what I expect.

Overview of potential improvements to documentation for DynamicPPL (and associated):

• Document expected return values of functions, e.g. what do DynamicPPL.assume and DynamicPPL.observe return, and what do AbstractMCMC.step and DynamicPPL.initialstep return? How are these used in the sampling loop?
• What does DynamicPPL.updategid! do? It is used in both the MH and HMC implementations, yet what a gid is remains completely unexplained.
• I believe that https://turinglang.org/v0.24/docs/for-developers/interface uses an outdated version of the AbstractMCMC API, as AbstractMCMC.step! seems to be deprecated in favour of AbstractMCMC.step, which has a different return structure.
• I do not see any way by which multiple samples can be saved per iteration. Whilst this is never really done for MCMC, it is very common for importance samplers and related methods. An indication as to whether this is possible within the existing Turing framework would be useful.
• https://turinglang.org/v0.24/docs/for-developers/interface seems to be entirely outdated, and is also the only resource for implementing a sampler within Turing.
• A flowchart of what happens at each sampling step with required inputs and outputs would be invaluable.
• It is unclear how to access conditional posterior likelihoods (e.g. to sample the mean and variance separately, as these may require different sampler designs). It seems to be done via DynamicPPL.condition and DynamicPPL.decondition, but the examples for these are bizarre and not representative of real-world usage.
• I think that having a standard example model would be extremely useful, and I believe that this model should take a dataset as an argument, as this is how it would be used in practice.
• It is not clear how to take gradients of the posterior likelihood within DynamicPPL. It may be as simple as calling gradient on DynamicPPL.logjoint or DynamicPPL.getlogp, but it is possible that there may be issues with this. A small tutorial could ease this.
• There is no documentation (or at least I cannot find any) for DynamicPPL.assume or DynamicPPL.observe or derived functions, but these are extremely important for implementing a sampler.
• I cannot find documentation for VarInfo or how it is used, although it seems to be used in all samplers.
• I believe that the importance sampler is considered the example of implementing a sampler using the Turing API. I believe that it could do with extensive documentation as to what everything does, as the tutorial that references it is out of date. Also, I think that the use of push!! is legacy and should be updated, but I am unsure.
• https://turinglang.org/v0.24/docs/for-developers/how_turing_implements_abstractmcmc is no longer correct, as nearly all of the functions called therein have been updated within DynamicPPL. This tutorial represents a valuable resource for potential contributors, and I believe that it could be updated to 'tutorialise' the entire implemented importance sampler over a few hours by someone who knows what they are doing.

Suggestion:

Denote by θ the model parameters and by x the data.
Most samplers can be implemented using (a subset of) the following:
• p(θ|x) the posterior likelihood
• p(θ) the parameter prior
• p(x|θ) the data likelihood
• A way to evaluate the above at arbitrary parameter values
• A way to evaluate conditionals of above for subsets of the sampled variables (e.g. let θ = [μ,σ], get p(μ|σ, x))
• First- and second-order derivatives of the posterior likelihood
• Transforming the parameter space to an unconstrained space
• A place to store samples from each iteration (potentially multiple samples per iteration)
• A place to store weights and other (meta)data associated with samples at each iteration
• A way to accumulate probabilities from each iteration
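
To make the list above concrete, here is a minimal sketch of several of these ingredients for a toy model with θ = [μ, σ]. It is written in plain Python rather than against DynamicPPL, so nothing about the Turing API is assumed. The priors (μ ~ Normal(0, 10), σ ~ Exponential(1)) and every function name are hypothetical choices made only for illustration.

```python
import math


def normal_logpdf(value, mean, std):
    """Log density of Normal(mean, std) evaluated at value."""
    z = (value - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def log_prior(mu, sigma):
    """log p(θ) with illustrative priors mu ~ Normal(0, 10), sigma ~ Exponential(1)."""
    if sigma <= 0.0:
        return -math.inf
    # The Exponential(1) log density at sigma is simply -sigma.
    return normal_logpdf(mu, 0.0, 10.0) - sigma


def log_likelihood(mu, sigma, data):
    """log p(x|θ) for i.i.d. Normal(mu, sigma) observations."""
    return sum(normal_logpdf(x, mu, sigma) for x in data)


def log_joint(mu, sigma, data):
    """log p(θ) + log p(x|θ), i.e. log p(θ|x) up to an additive constant."""
    lp = log_prior(mu, sigma)
    if lp == -math.inf:
        return lp
    return lp + log_likelihood(mu, sigma, data)


def conditional_logdensity_mu(sigma, data):
    """Return mu -> log p(mu | sigma, x) up to a constant, by holding sigma fixed."""
    return lambda mu: log_joint(mu, sigma, data)


def to_unconstrained(mu, sigma):
    """Map (mu, sigma) with sigma > 0 to the unconstrained pair (mu, log sigma)."""
    return mu, math.log(sigma)


def log_joint_unconstrained(mu, log_sigma, data):
    """Log joint in (mu, log sigma) space, including the log-Jacobian term."""
    sigma = math.exp(log_sigma)
    # sigma = exp(u) gives |d sigma / du| = exp(u), so the log-Jacobian is u itself.
    return log_joint(mu, sigma, data) + log_sigma


data = [1.0, 2.0, 3.0]
cond_mu = conditional_logdensity_mu(1.5, data)
print(cond_mu(2.0), log_joint_unconstrained(2.0, math.log(1.5), data))
```

The conditional is obtained here simply by fixing the other variables in the joint, which is enough for any sampler that only needs the conditional up to a normalising constant.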

To this end I think that it would be invaluable to have a tutorial that implements a very basic HMC algorithm (literally just a textbook method) within Turing (i.e. using DynamicPPL and AbstractMCMC), as it will cover the majority of these in a way that allows extension.
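
For reference, the kind of textbook HMC meant here can be sketched in a few lines once a log density and its gradient are available. The sketch below is plain Python, not a Turing, DynamicPPL or AbstractMCMC implementation. The target (a 2D standard normal with a hand-written gradient), the function names and the default step size are all assumptions made for illustration.

```python
import random


def leapfrog(theta, momentum, grad_logp, step_size, n_steps):
    """Simulate Hamiltonian dynamics with the leapfrog integrator."""
    theta = list(theta)
    momentum = list(momentum)
    grad = grad_logp(theta)
    for _ in range(n_steps):
        # Half step for momentum, full step for position, half step for momentum.
        momentum = [p + 0.5 * step_size * g for p, g in zip(momentum, grad)]
        theta = [q + step_size * p for q, p in zip(theta, momentum)]
        grad = grad_logp(theta)
        momentum = [p + 0.5 * step_size * g for p, g in zip(momentum, grad)]
    return theta, momentum


def hamiltonian(theta, momentum, logp):
    """Potential energy (-log density) plus kinetic energy with identity mass."""
    return -logp(theta) + 0.5 * sum(p * p for p in momentum)


def hmc_step(rng, theta, logp, grad_logp, step_size=0.2, n_steps=10):
    """One HMC transition; returns the new state and whether it was accepted."""
    momentum = [rng.gauss(0.0, 1.0) for _ in theta]
    h_current = hamiltonian(theta, momentum, logp)
    proposal, new_momentum = leapfrog(theta, momentum, grad_logp, step_size, n_steps)
    h_proposal = hamiltonian(proposal, new_momentum, logp)
    log_accept = h_current - h_proposal
    # Metropolis correction; a NaN log_accept fails both tests and is rejected.
    accepted = log_accept >= 0.0 or rng.random() < math.exp(log_accept)
    return (proposal if accepted else list(theta)), accepted


def run_hmc(rng, theta0, logp, grad_logp, n_iterations):
    """Run the chain and keep one equally weighted sample per iteration."""
    samples = []
    theta = list(theta0)
    for _ in range(n_iterations):
        theta, _ = hmc_step(rng, theta, logp, grad_logp)
        samples.append(theta)
    return samples


import math


def logp_std_normal(theta):
    """Unnormalised log density of a standard normal (illustrative target)."""
    return -0.5 * sum(q * q for q in theta)


def grad_logp_std_normal(theta):
    """Gradient of logp_std_normal, written by hand."""
    return [-q for q in theta]


samples = run_hmc(random.Random(0), [0.0, 0.0], logp_std_normal, grad_logp_std_normal, 2000)
print(sum(s[0] for s in samples) / len(samples))
```

Note that the sampler only ever touches the model through `logp` and `grad_logp`, which is why a tutorial of this kind would cover evaluation, gradients and the unconstraining transformation in one place.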

Addendum:

I believe that, with some tweaking, the Turing ecosystem has the potential to be an excellent tool for prototyping inference algorithms on complex models; however, it is currently rather opaque how to implement samplers. Furthermore, it seems to be built around the implicit assumptions of MCMC, the two obvious ones being one sample per iteration and equal weighting of samples. Whilst these assumptions can be circumvented (e.g. by writing your method using only the DynamicPPL modelling interface), it would be useful to have native interop, although I appreciate this is a big ask.
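
To illustrate the two assumptions that do not hold outside MCMC, here is a minimal sketch of a simple adaptive importance sampler in plain Python. It is not the AMIS scheme from the paper linked above and it uses no Turing code. The target (an unnormalised Normal(3, 1)), the Gaussian proposal, the moment-matching adaptation and all names are assumptions chosen for illustration. The point is the shape of the output: each iteration yields many samples, each with its own weight.

```python
import math
import random


def normal_logpdf(value, mean, std):
    """Log density of Normal(mean, std) evaluated at value."""
    z = (value - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def normalize(log_weights):
    """Turn log weights into normalised weights, subtracting the max for stability."""
    top = max(log_weights)
    unnormalised = [math.exp(lw - top) for lw in log_weights]
    total = sum(unnormalised)
    return [w / total for w in unnormalised]


def importance_iteration(rng, log_target, mean, std, n_samples):
    """Draw n_samples from the proposal and attach a log importance weight to each."""
    samples = [rng.gauss(mean, std) for _ in range(n_samples)]
    log_weights = [log_target(x) - normal_logpdf(x, mean, std) for x in samples]
    return samples, log_weights


def weighted_mean(samples, log_weights):
    """Self-normalised importance sampling estimate of the mean."""
    return sum(w * x for w, x in zip(normalize(log_weights), samples))


def adaptive_importance_sampler(rng, log_target, n_iterations=5, n_samples=500,
                                mean=0.0, std=5.0):
    """Store every (samples, log_weights) pair and adapt the proposal by moment matching."""
    history = []
    for _ in range(n_iterations):
        samples, log_weights = importance_iteration(rng, log_target, mean, std, n_samples)
        history.append((samples, log_weights))
        weights = normalize(log_weights)
        mean = sum(w * x for w, x in zip(weights, samples))
        variance = sum(w * (x - mean) ** 2 for w, x in zip(weights, samples))
        std = max(math.sqrt(variance), 1e-3)  # floor to avoid a degenerate proposal
    return history


def log_target(x):
    """Unnormalised log density of Normal(3, 1) (illustrative target)."""
    return -0.5 * (x - 3.0) ** 2


history = adaptive_importance_sampler(random.Random(0), log_target)
last_samples, last_log_weights = history[-1]
print(weighted_mean(last_samples, last_log_weights))
```

A storage format that accepts a list of (sample, weight) pairs per iteration, as `history` does here, is the kind of native support that a one-sample-per-step interface does not obviously provide.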

I understand that improving this API is secondary to improving the user-facing part of Turing, as many more people use Turing without implementing their own samplers. However, I believe that improving the documentation will attract more contributors, and allow the use of the Turing ecosystem in researching sampler design in addition to statistical studies.

I am of course happy to help to the best of my ability, but I do not yet understand the design of DynamicPPL or AbstractMCMC well enough to feel that I could do so on my own.

(apologies for the formatting, I copied this over from notepad)

torfjelde · 2023-02-07T13:43:47Z

This is really useful stuff @BenjaminJCox ; thank you so much! The gaps are partly because the documentation used to be separate from the actual code (this has improved now that we have https://turinglang.org/library), and partly because there have been numerous improvements to a lot of this, over the past year in particular. For example, most samplers don't have to interact with Turing.jl at all to be compatible with Turing; you can just work with LogDensityProblems.jl, and then, thanks to DynamicPPL.LogDensityFunction, you'll be compatible with Turing models too.

But all of this stuff is not at all obvious.

> To this end I think that it would be invaluable to have a tutorial that implements a very basic HMC algorithm (literally just a textbook method) within Turing (i.e. using DynamicPPL and AbstractMCMC), as it will cover the majority of these in a way that allows extension.

I think this is the way to go. A lot of the points here could probably be improved significantly by just implementing the same sampler using the different approaches, showing the user the different possible paths one can take.

I'll have a shot at this 👍

torfjelde · 2023-02-07T15:21:28Z

> A flowchart of what happens at each sampling step with required inputs and outputs would be invaluable.

Something like this but more on the sampling?

graphviz

digraph {
    # Nodes.
    tilde_node [shape=box, label="x ~ Normal()", fontname="Courier"];
    base_node [shape=box, label=< left = <FONT COLOR="#3B6EA8">@varname</FONT>(x)<BR/>right = Normal()<BR/>x, vi = ... >, fontname="Courier"];
    ifobs [label=< <FONT COLOR="#3B6EA8">if</FONT> isobservation >, fontname="Courier"];
    tilde_assume_bangbang [shape=box, label="tilde_assume!!(context, left, right, vi)", fontname="Courier"];
    tilde_observe_bangbang [shape=box, label="tilde_observe!!(context, left, right, vi)", fontname="Courier"];

    tilde_assume_bangbang_inner [shape=box, label="x, lp, vi = tilde_assume(context, left, right, vi)\nreturn x, acclogp!!(vi, lp)", fontname="Courier"];
    tilde_observe_bangbang_inner [shape=box, label="lp, vi = tilde_observe(context, left, right, vi)\n return left, acclogp!!(vi, lp)", fontname="Courier"];

    tilde_assume [shape=box, label="tilde_assume(context, left, right, vi)", fontname="Courier"];
    tilde_assume_sampling [shape=box, label="tilde_assume(rng, context, sampler, left, right, vi)", fontname="Courier"];

    assume [shape=box, label="assume(left, right, vi)", style=dashed, fontname="Courier"];
    assume_sampling [shape=box, label="assume(rng, sampler, left, right, vi)", style=dashed, fontname="Courier"];

    tilde_observe [shape=box, label="tilde_observe(context, left, right, vi)", fontname="Courier"];
    observe [shape=box, label="observe(left, right, vi)", style=dashed, fontname="Courier"];

    ifsampling [label=< <FONT COLOR="#3B6EA8">if</FONT> sampling >, fontname="Courier"];
    ifleafcontext1, ifleafcontext2, ifleafcontext3 [label=<<FONT COLOR="#3B6EA8">if</FONT> <FONT COLOR="#9A7500">LeafContext</FONT>>, fontname="Courier"];

    childcontext1, childcontext2, childcontext3 [shape=box, label="context = childcontext(context)", fontname="Courier"];

    # Edges.
    tilde_node -> base_node [style=dashed, label=<  <FONT COLOR="#3B6EA8">@model</FONT>>, fontname="Courier"]

    base_node -> ifobs;
    ifobs -> tilde_assume_bangbang [label=<  <FONT COLOR="#97365B">false</FONT>>, fontname="Courier"];
    ifobs -> tilde_observe_bangbang [label=<  <FONT COLOR="#4F894C">true</FONT>>, fontname="Courier"];

    # Assume
    tilde_assume_bangbang -> tilde_assume_bangbang_inner;
    tilde_assume_bangbang_inner -> tilde_assume;
    tilde_assume -> ifsampling;
    ifsampling -> ifleafcontext1 [label=<  <FONT COLOR="#97365B">false</FONT>>, fontname="Courier"];
    ifsampling -> tilde_assume_sampling [label=<  <FONT COLOR="#4F894C">true</FONT>>, fontname="Courier"];
    ifleafcontext1 -> childcontext1 [label=<  <FONT COLOR="#97365B">false</FONT>>, fontname="Courier"]
    childcontext1 -> tilde_assume;
    ifleafcontext1 -> assume [label=<  <FONT COLOR="#4F894C">true</FONT>>, fontname="Courier"]
    tilde_assume_sampling -> ifleafcontext2;
    ifleafcontext2 -> assume_sampling [label=<  <FONT COLOR="#4F894C">true</FONT>>, fontname="Courier"];
    ifleafcontext2 -> childcontext2 [label=<  <FONT COLOR="#97365B">false</FONT>>, fontname="Courier"];
    childcontext2 -> tilde_assume_sampling;

    # Observe
    tilde_observe_bangbang -> tilde_observe_bangbang_inner;
    tilde_observe_bangbang_inner -> tilde_observe;
    tilde_observe -> ifleafcontext3;
    ifleafcontext3 -> childcontext3 [label=<  <FONT COLOR="#97365B">false</FONT>>, fontname="Courier"];
    ifleafcontext3 -> observe [label=<  <FONT COLOR="#4F894C">true</FONT>>, fontname="Courier"]
    childcontext3 -> tilde_observe;
}

yebai · 2023-02-07T16:49:23Z

@torfjelde excellent diagram!

BenjaminJCox · 2023-02-08T05:16:36Z

@torfjelde This is the type of diagram I had in mind; however, I think that explaining what each node means would be very helpful.

Implementing the same sampler using the various different approaches seems like a very good idea. Having an implementation using LogDensityProblems.jl and the interface would be very useful for someone who is used to implementing samplers based on statistical methodology papers, as I think it would reduce the required knowledge of the internals of the PPL.

yebai mentioned this issue Oct 23, 2023

Add generated documentation #45

Closed
