Standardising APIs between compiler and inference methods #16

Closed · xukai92 opened this issue Dec 20, 2018 · 35 comments

@xukai92 (Member) commented Dec 20, 2018

See https://github.com/TuringLang/Turing.jl/issues/634#issuecomment-471339521 for a summary.

@yebai (Member) commented Dec 22, 2018

@willtebbutt @xukai92
How about we always drop all observe statements in gdemo()(), i.e. we always sample all variables from the prior, even for the following case raised by @willtebbutt (slightly modified):

@model gdemo(y) = begin
   5 ~ Normal(0, 1)   # rv1
   y ~ Normal(0, 1)   # rv2
end

gdemo()()  # sample both `rv1` and `rv2` from the prior, ignoring `rv1 = 5`
gdemo(2)() # sample from the posterior

This leads to consistent behaviour and should be intuitive too.

@xukai92 (Member, Author) commented Dec 22, 2018

I see. But what I mean by this issue is to actually have some concrete functions generated by evaluating gdemo() (I think @willtebbutt also proposed this some time ago).

Current behaviour:

@model gdemo(y) = begin
   5 ~ Normal(0, 1)   # rv1
   y ~ Normal(0, 1)   # rv2
end
mf = gdemo()
mf() # => sampling from the prior

But instead we could have

@model gdemo(y) = begin
   5 ~ Normal(0, 1)   # rv1
   y ~ Normal(0, 1)   # rv2
end
mf = gdemo(???) # this returns mf::UniqueRuntimeModelType
                # and creates some function definitions:
logpdf(::UniqueRuntimeModelType, vi) = ... # (1) return the logpdf by evaluating the model using vi
rand(::UniqueRuntimeModelType) = ... # (2) gdemo()() in your example
sample(::UniqueRuntimeModelType) = ... # (3) gdemo(2)() in your example

I guess it could also open up room for performance gains, because during sampling we would explicitly be using the specialised versions of logpdf and rand.

@willtebbutt (Contributor) commented Dec 23, 2018

@yebai your proposal has the issue that the behaviour of the following programme becomes ambiguous:

@model demo() = begin
    5 ~ Normal(0, 1)
end
demo() # what does this do?

This makes me nervous. On the one hand, this programme is weird, on the other it's a valid Turing programme, so we should ensure that the behaviour is well-defined. My proposal below will mostly resolve this.

@xukai92 I completely agree regarding the format of logpdf. Regarding rand and sample: certainly we could define one to sample from the prior and the other the posterior, but this wouldn't be particularly intuitive (there's nothing in the names to suggest the proposed behaviour). Instead, I propose the following:

  1. Don't define sample, just use rand and logpdf (or maybe logp, since logpdf isn't defined for all models supported by Turing?).
  2. The current proposal regarding gdemo(missing) to indicate that the prior should be used for y seems sensible to me.
  3. We implement gdemo(Prior()) to indicate that all random variables should be sampled from the prior. In this case, both rv1 and rv2 would be sampled from the prior.
  4. rand(mf) produces a VarInfo. rand(mf, N) produces a Vector{<:VarInfo} of length N.
  5. logpdf(mf, vi) works as proposed by @xukai92.
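
For concreteness, a minimal self-contained sketch of what points 4 and 5 could look like. `VarInfo` and `ModelFunction` here are hypothetical stand-ins rather than actual Turing types, and the model is hard-coded to a single `y ~ Normal(0, 1)` variable:

using Distributions, Random

# Hypothetical stand-ins for the proposed surface; nothing here is Turing API.
struct VarInfo
    vals::Dict{Symbol, Any}
    logp::Float64
end

struct ModelFunction   # what `gdemo(...)` would return
    prior::Normal{Float64}
end

# (4) `rand(mf)` produces a VarInfo; `rand(mf, N)` a Vector{<:VarInfo}.
function Base.rand(rng::Random.AbstractRNG, mf::ModelFunction)
    y = rand(rng, mf.prior)
    return VarInfo(Dict{Symbol, Any}(:y => y), logpdf(mf.prior, y))
end
Base.rand(mf::ModelFunction) = rand(Random.default_rng(), mf)
Base.rand(mf::ModelFunction, N::Int) = [rand(mf) for _ in 1:N]

# (5) `logpdf(mf, vi)` evaluates the model's log density at `vi`.
Distributions.logpdf(mf::ModelFunction, vi::VarInfo) = logpdf(mf.prior, vi.vals[:y])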

@mohamed82008 (Member) commented Dec 23, 2018

On the one hand, this programme is weird, on the other it's a valid Turing programme, so we should ensure that the behaviour is well-defined.

We can give an error if the model has no variables.

UniqueRuntimeModelType

This can be just const PriorModel{pvars, F} = Model{pvars, Tuple{}, F, NamedTuple{(), Tuple{}}}, which means no data, all parameters. For this case, no special sampling algorithm is required. More generally, however, we can make use of the existing Sampler type, to which we can add model (and vi). Then we can simply do rand(Sampler(model, alg)), or, for models without parameters, rand(Sampler(model)).

Btw, after the new refactor, if you call gdemo(), it will give you a PriorModel{pvars, F}, with pvars as Tuple{:y}.
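
A minimal sketch of this alias idea with stand-in definitions; the real Model type's internals differ, so this only illustrates the "no data, all parameters" constraint:

# Stand-in definition; the type parameters mirror the proposal:
# parameter vars, data vars, the model function, and the data itself.
struct Model{pvars, dvars, F, TData}
    f::F
    data::TData
end

# "No data, all parameters": empty dvars tuple and empty data NamedTuple.
const PriorModel{pvars, F} = Model{pvars, Tuple{}, F, NamedTuple{(), Tuple{}}}

f() = nothing
m = Model{Tuple{:y}, Tuple{}, typeof(f), NamedTuple{(), Tuple{}}}(f, NamedTuple())
m isa PriorModel   # true: `y` is a parameter, and there is no data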

@willtebbutt (Contributor)

We can give an error if the model has no variables.

Agreed, but this programme definitely has a random variable. It's anonymous, but it's definitely there, and if we define sample-from-prior functionality, then presumably something should happen.

Btw, after the new refactor, if you call gdemo(), it will give you a PriorModel{pvars, F}, with pvars as Tuple{:y}.

This is probably fine for now, but I'm not sure it's appropriate long-term for the same reasons as above.

This can be just const PriorModel{pvars, F} = Model{pvars, Tuple{}, F, NamedTuple{(), Tuple{}}} which means no data, all parameters. For this case, no special sampling alg is required. More generally however, we can make use of the existing Sampler type, which we can add model (and vi) to. Then we can simply do: rand(Sampler(model, alg)) or for models without parameters, it can be rand(Sampler(model)).

I like this. We would have to think carefully about how rand(Sampler(model, alg)) would work for MCMC, though. We should also probably implement a few convenience functions as well, e.g. rand(model) = rand(Sampler(model)) and similar.

@mohamed82008 (Member) commented Dec 23, 2018

It's anonymous, but it's definitely there, and if we define sample-from-prior functionality, then presumably something should happen.

Is there a practical application of sampling this anonymous random variable, or is it just for theoretical appeal? Note that implementation-wise, there is nothing random about a model with no explicit random variables; every time you run the model, you will get exactly the same thing, hence my difficulty digesting this anonymous random variable concept. I can see how 5 plays the role of data. So perhaps more generally this falls under likelihood models, where the only sensible thing to do with it, that I can think of, is to calculate the likelihood.

We should also probably implement a few convenience functions as well, e.g. rand(model) = rand(Sampler(model)) and similar.

Sounds good.

@mohamed82008 (Member) commented Dec 23, 2018

Note that currently, we just ignore any numbers or arrays on the LHS of a ~. If you want, we can assign an anonymous dvar to it instead and assign the value on the LHS of the ~ to this anonymous dvar. This would probably be more theoretically accurate. And it allows us to view the model with 5 ~ Normal(0, 1) only as a LikelihoodModel with 1 dvar.

Edit: by ignore I mean that we don't treat it as an explicit variable, but it still affects the logp calculation.
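
Illustratively, the suggested desugaring could behave like the following runnable fragment, where `anon_dvar` is a made-up generated name and the observe step is shown as a direct logpdf accumulation:

using Distributions

logp = 0.0
# User writes `5 ~ Normal(0, 1)`; the compiler could emit, roughly:
let anon_dvar = 5                  # anonymous dvar bound to the literal LHS
    global logp += logpdf(Normal(0, 1), anon_dvar)  # observe: affects logp only
end
logp   # ≈ -13.4189, the same contribution the literal makes today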

@willtebbutt (Contributor) commented Dec 23, 2018

Is there a practical application of sampling this anonymous random variable, or is it just for theoretical appeal?

I'm just looking for consistency. I don't have a particular application in mind :)

Note that implementation-wise, there is nothing random about a model with no explicit random variables; every time you run the model, you will get exactly the same thing.

Not if you run the model in sample-from-prior mode.

If you want, we can assign an anonymous dvar to it instead and assign the value on the LHS of the ~ to this anonymous dvar. This would probably be more theoretically accurate. And it allows us to view the model with 5 ~ Normal(0, 1) only as a LikelihoodModel with 1 dvar.

This general kind of thing would be a good change. If the model is constructed in the regular way, it should be a dvar as you suggest, but if the model is run in sample-from-prior mode, then it should presumably magically become a pvar.

@mohamed82008 (Member)

Not if you run the model in sample-from-prior mode.

@model demo() = begin
   5 ~ Normal(0,1)
end
model = demo()
logp = model(Turing.VarInfo(), Turing.SampleFromPrior())
all(isequal(logp), (model(Turing.VarInfo(), Turing.SampleFromPrior()) for i in 1:100))

vi is empty in every case, with a logp only.

@mohamed82008 (Member)

then it should presumably magically become a pvar.

This should be totally possible when constructing the Sampler.

@willtebbutt (Contributor)

Sorry, I should have been clearer. I was referring to what I think should be going on rather than what actually happens at the minute. You're totally correct regarding the current behaviour. It's my feeling that if we implement a sample-from-prior mode, then we should take these anonymous random variables more seriously and treat them more like named random variables.

@mohamed82008 (Member)

then we should take these anonymous random variables more seriously and treat them more like named random variables.

I don't foresee a problem handling these. I can try a proof-of-concept after the VarInfo refactor. Perhaps @trappmartin can also help with this after he reads the new compiler code.

@trappmartin (Member)

Thanks for cc'ing me. I'm pretty much in favour of the vi = rand(Sampler(model)) and the logpdf(model, vi) concept. Though, I believe it would be good to keep the sample function. It's semantically clearer to me what it does, and I would expect it to generate samples using an MCMC sampler. I would not expect it to draw from the prior as rand does.

However, I’m not sure how inference with VI would fit into those syntaxes to stay consistent without having too many entry points for the user.

@willtebbutt (Contributor)

However, I’m not sure how inference with VI would fit into those syntaxes to stay consistent without having too many entry points for the user.

My thinking here is that VI works fine with the rand(sampler(model, alg), N) design, where the alg parameter would be a representation of the approximate posterior distribution. Semantically this reads as: draw N samples from the posterior distribution over the random variables in model (each of which should be a VarInfo?) using alg. This works for both VI and MCMC, so provides a single point of entry for both kinds of samplers.

Though, I believe it would be good if we keep the sample function. It’s semantically more clear to me what this does and that I can expect it to generate samples using a MCMC sampler.

Again, my feeling here is that we wouldn't need the sample function, as the creation of the sampler, e.g. rand(sampler(model, HMC(...)), N), makes it quite clear that we're drawing N samples from model using the specified HMC algorithm. Although I could live with the sample function being an alias for rand, I suspect that rand is more consistent with existing stuff.

I’m pretty much in favour of the vi = rand(Sampler(model)) and the logpdf(model, vi) concept.

As am I :)

A minor stylistic thing, @yebai what would your thoughts be on making the "correct" thing to do be sampler(model, alg) rather than Sampler(model, alg)? (At least to me) the latter says make me an object of type Sampler whereas the former says make me a thing that can generate samples from model using alg, which may or may not be an object of type Sampler. The distinction is important because the Sampler option commits us to always producing an object of type Sampler if we want to produce samples, whereas the former allows us to spit out whatever object we like provided that it can be used to generate samples via rand.
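
A toy illustration of the distinction, with made-up algorithm and sampler types: the lowercase factory is free to return whichever object best supports rand, without committing to a single concrete Sampler type:

abstract type InferenceAlgorithm end
struct MH <: InferenceAlgorithm end
struct Exact <: InferenceAlgorithm end   # e.g. a conjugate model needing no MCMC

struct MHSampler{M}      # stateful MCMC sampler
    model::M
end
struct ExactSampler{M}   # samples directly, no chain state
    model::M
end

# The factory picks the representation per algorithm; callers only
# rely on `rand` working on whatever comes back.
sampler(model, ::MH)    = MHSampler(model)
sampler(model, ::Exact) = ExactSampler(model)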

@xukai92 (Member, Author) commented Feb 7, 2019

Reading through your explanation of sampler vs. Sampler, I feel sampler is better, although in most cases (at this moment) sampler will return a Sampler to us. Also, I think using a function rather than a type-constructor syntax is more standard Julia style.

@yebai (Member) commented Mar 10, 2019

Here is a summary of some proposals in this issue (biased towards my personal aesthetics):

  • logp(m::Model, v::VarInfo, s::Selector=undef) computes log density/mass for model m at value v::VarInfo
  • Sampler(m, alg) creates a sampler for model using alg as the sampling method
  • rand(s::Sampler(m, alg), N) produces a Vector{<:VarInfo} of length N.
    • rand(s::Sampler(m, SampleFromPrior()), N) produces a Vector{<:VarInfo} of length N, by sampling from the prior distribution while ignoring all observations (including something like 5 ~ Normal(0, 1)).
    • sample(m, alg; N=N, args...) provides an alias to rand(Sampler(m, alg), N, args...)
    • v = rand(m::Model, N=1) is an alias to v = rand(Sampler(m, SampleFromPrior()), N)

This would provide a relatively well-defined interface between the compiler and inference methods (e.g. variational methods and sampling methods). For variational methods, the most important API is the logp function and its gradients, while for sampling methods both the logp function and the rand function are necessary. This would also allow us to clean up the code in many samplers' implementations.

A long-term goal is to build a clear separation between the compiler and inference methods (see TuringLang/Turing.jl#456), such that inference methods can be implemented in separate packages that depend on Turing.

Anything missing?
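
To make the summary concrete, here is a self-contained sketch of how these signatures could compose; every type below is a hypothetical stub:

# Hypothetical stubs, only to show how the proposed pieces fit together.
struct Model end
struct VarInfo end
struct SampleFromPrior end

struct Sampler{M, A}
    model::M
    alg::A
end

# Core primitive: N draws, each materialised as a VarInfo.
Base.rand(s::Sampler, N::Int) = [VarInfo() for _ in 1:N]

# Aliases from the summary above:
sample(m::Model, alg; N = 1, args...) = rand(Sampler(m, alg), N)
Base.rand(m::Model, N::Int = 1) = rand(Sampler(m, SampleFromPrior()), N)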

yebai referenced this issue in TuringLang/Turing.jl Mar 11, 2019
@xukai92 (Member, Author) commented Mar 19, 2019

v = rand(m::Model, N=1) is an alias to v = rand(Sampler(m, SampleFromPrior()), N)

How about enabling this alias with support for a sampling algorithm as well:

rand(m::Model, N=1; alg=SampleFromPrior()) = rand(Sampler(m, alg), N)

yebai changed the title from "Forbid mf() and implement explicit rand and logpdf functions for mf" to "Standardising APIs between compiler and inference methods" on Mar 20, 2019
@xukai92 (Member, Author) commented Mar 22, 2019

How about AbstractRunner -> AbstractTilde?

@willtebbutt (Contributor)

I don't know that we actually need a supertype here. I mean, is there going to be a fallback that can be applied to Abstract<Thing>? If not, we can just create concrete types as required and avoid the hassle of having to choose a name for an abstract type and forcing things to inherit from it.

@yebai (Member) commented Mar 27, 2019

Let's stick with the AbstractRunner type for now and change it to something else when we have better ideas.

yebai pinned this issue on Mar 27, 2019
@mohamed82008 (Member)

I suggest that during model compilation we generate two functions inside Model:

  1. One for the log likelihood which replaces all the a ~ dist lines with a ~ Fixed() where the assume of Fixed returns the variable value in vi and 0 for the logpdf. This is useful for maximum likelihood estimation.
  2. The second is the logpdf of the joint probability distribution that we have now. This is to be used for MAP and inference.
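
A runnable sketch of the Fixed idea, with a plain Dict standing in for vi and assume as a free function rather than Turing's internal one:

struct Fixed end

# `assume` for Fixed: hand back the stored value and contribute nothing
# to the log density, so only observe statements accumulate — i.e. the
# total is the log likelihood, as needed for maximum likelihood estimation.
assume(::Fixed, vi::Dict, vn::Symbol) = (vi[vn], 0.0)

vi = Dict(:a => 1.2)
val, lp = assume(Fixed(), vi, :a)   # (1.2, 0.0)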

@trappmartin (Member)

@mohamed82008 I think this is a good idea. Thanks! I'll take a closer look into this asap.

@trappmartin (Member) commented Apr 4, 2019

@yebai I had some time to read through your proposal. It sounds quite reasonable to me. However, I have a few questions/remarks.

  • What is the purpose of the Selector in the logp function? Do we need that?

Following what Mohamed suggested and the latest discussions on model criticism, it might be good to have:

  • logjoint(m::Model, v::VarInfo, s::Selector=undef) == logp
  • logpdf(m::Model, v::VarInfo, s::Selector=undef)
  • logppd(m::Model, v::Vector{VarInfo}, s::Selector=undef)

As proposed by Mohamed we could do this by generating respective functions in the model macro.

@willtebbutt (Contributor) commented Apr 4, 2019

Could someone please link the relevant discussions here?

@trappmartin could you please explain the above proposal in a bit more depth? What are the differences between each of the functions? What are logjoin and logppd short for? Thanks!

@mohamed82008 could you please expand on your proposal? It's not obvious to me why we need a separate model construction to perform maximum likelihood / when we would actually be interested in performing maximum likelihood in our context: if the user has gone to the trouble to specify priors over parameters, why would they wish to ignore them?

@trappmartin (Member) commented Apr 4, 2019

logjoint = log p(\theta, x), i.e. the log of the joint probability function of the model.
logppd = \sum_{i=1}^N \log( (1/T) \sum_{t=1}^T p(x_i | \theta_t) ), i.e. the log pointwise predictive density function.
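
For reference, a small runnable version of the logppd formula, assuming T posterior draws of a scalar \theta and a Normal(\theta, 1) likelihood purely for illustration:

using Distributions

# lppd = Σ_i log( (1/T) Σ_t p(x_i | θ_t) )
function logppd(x::Vector{Float64}, θs::Vector{Float64})
    T = length(θs)
    return sum(log(sum(pdf(Normal(θ, 1.0), xi) for θ in θs) / T) for xi in x)
end

logppd([0.1, -0.3, 1.2], [0.0, 0.05, -0.1])   # one number; higher is better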

@trappmartin (Member)

I don't think we want maximum likelihood. But I agree it would be good to provide an interface to compute the log likelihood.

@willtebbutt (Contributor)

Sorry, I'm not familiar with the logppd, could you explain why it's useful?

logjoin = log p(\theta, x), i.e. the log of the joint probability function of the model.

If we're going to rename logp to something, could we please write logjoint? For the cost of an extra character it's quite a bit less ambiguous haha

@trappmartin (Member) commented Apr 4, 2019

Uh, sorry that was a typo. I meant logjoint. :D
The log pointwise predictive density is useful for model criticism and model selection.

@willtebbutt (Contributor)

Uh, sorry that was a typo. I meant logjoint. :D

haha fair enough. Glad we're on the same page with this.

The log pointwise predictive density is useful for model criticism and model selection.

Is there a good reference for this, or is this something that should be obvious?

@trappmartin (Member) commented Apr 4, 2019

See, for example, the paper "Understanding predictive information criteria for Bayesian models" by A. Gelman, J. Hwang, and A. Vehtari.

@mohamed82008 (Member)

@mohamed82008 could you please expand on your proposal? It's not obvious to me why we need a separate model construction to perform maximum likelihood / when we would actually be interested in performing maximum likelihood in our context: if the user has gone to the trouble to specify priors over parameters, why would they wish to ignore them?

I just thought it would be cool to compare ML, MAP, VI and MCMC without changing the model definition! But @trappmartin has a lot more interesting use cases :)

@yebai (Member) commented Apr 4, 2019

What is the purpose of the Selector in the logp function? Do we need that?

@trappmartin

you're right - Selector here is not necessary. This might be taken from some interface discussed in https://github.com/TuringLang/Turing.jl/issues/634#issuecomment-44965726, i.e.

# s::Sampler ==> s::Selector
logp = model(Turing.VarInfo(), s::Sampler) 

Since logp will always compute the log joint probability in the new design, we don't really need s::Selector to select subsets of random variables.

I suggest that during model compilation we generate two functions inside Model:
One for the log likelihood which replaces all the a ~ dist lines with a ~ Fixed() where the assume of Fixed returns the variable value in vi and 0 for the logpdf. This is useful for maximum likelihood estimation.
The second is the logpdf of the joint probability distribution that we have now. This is to be used for MAP and inference.
I just thought it would be cool to compare ML, MAP, VI and MCMC without changing the model definition! But @trappmartin has a lot more interesting use cases :)

@mohamed82008 I'm not sure I understand the point here. For MAP- and VI-based learning, all we need is logp and its gradient. We don't need to modify the compiler.

logjoint = log p(\theta, x), i.e. the log of the joint probability function of the model.
logppd = \sum_{i=1}^N \log( (1/T) \sum_{t=1}^T p(x_i | \theta_t) ), i.e. the log pointwise predictive density function.
See, for example, the paper "Understanding predictive information criteria for Bayesian models" by A. Gelman, J. Hwang, and A. Vehtari.

@trappmartin

It seems logjoint is the same as logp?

I agree it's helpful to have some support for model evaluation and criticism, but I prefer to keep the standard API between compiler and inference minimal. These additional features can be supported in a separate module (e.g. MCMCChains). In fact, it appears that logppd can be implemented in the form of a new Runner or even InferenceAlgorithm.

@trappmartin (Member)

It seems logjoint is the same as logp?

Yes, logjoint seems more telling to me than logp.

I agree it's helpful to have some support for model evaluation and criticism, but I prefer to keep the standard API between compiler and inference minimal. These additional features can be supported in a separate module. In fact, it appears that logppd can be implemented in the form of a new Runner or even InferenceAlgorithm.

Good point. I'll try to do so as it seems relevant for several people.

@phipsgabler (Member)

Is there anything here that has not already been implemented in the meantime, or that couldn't be moved to the discussions at AbstractPPL?

@yebai (Member) commented Feb 3, 2021

Nothing that I know of. Please feel free to close, and/or move some useful bits to AbstractPPL.
