
Refactor Approximate Inference #194

Merged: 27 commits, Aug 20, 2021

Conversation

rossviljoen
Member

@rossviljoen rossviljoen commented Aug 4, 2021

Implements the idea in #193 (see also JuliaGaussianProcesses/ApproximateGPs.jl#13)

Essentially, this replaces the current method for constructing approximate posteriors:
approx_posterior(VFE(), fx, y, fz)
with
posterior(VFE(z), fx, y)

(n.b. dropping the approx_ prefix - should we keep it?)

The reasoning is that this style of approximation generalises more easily to cases that need more complex approximations, such as in SparseGPs, e.g. using a variational distribution q: posterior(SVGP(z, q), fx, y).

Furthermore, it only requires passing the inducing inputs z rather than a full fz isa FiniteGP, which guarantees that fz and fx are constructed from the same underlying process fx.f.

It also moves the elbo and dtc from finite_gp.jl to approx_posterior_gp.jl and redefines them from:

elbo(fx, y, fz)
to
elbo(VFE(z), fx, y)


As @willtebbutt mentioned in #193 - it would be useful to define other functions like logpdf and rand on the approximation:

posterior(approximation, fx, y)
logpdf(approximation, fx, y)
rand(rng, approximation, fx)
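To make the proposed style concrete, here is a hypothetical end-to-end sketch. The kernel, data, and noise value are illustrative assumptions, and VFE(z) reflects the API as proposed at this point in the thread (it is revised later in the conversation):

```julia
using AbstractGPs  # assumed to export GP, posterior, elbo, and the new VFE

f = GP(SqExponentialKernel())
x = rand(50); y = rand(50)
z = x[1:10]                        # inducing inputs, here a subset of x

fx = f(x, 0.1)                     # FiniteGP with observation noise 0.1
approx = VFE(z)                    # the approximation carries only z

f_post = posterior(approx, fx, y)  # approximate posterior GP
elbo(approx, fx, y)                # evidence lower bound at the same inputs
```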

@codecov-commenter

codecov-commenter commented Aug 4, 2021

Codecov Report

Merging #194 (cf23c30) into master (b44dd01) will increase coverage by 0.06%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master     #194      +/-   ##
==========================================
+ Coverage   97.98%   98.05%   +0.06%     
==========================================
  Files          10       10              
  Lines         348      359      +11     
==========================================
+ Hits          341      352      +11     
  Misses          7        7              
Impacted Files                 Coverage Δ
src/finite_gp_projection.jl    100.00% <ø> (ø)
src/sparse_approximations.jl   100.00% <100.00%> (ø)
src/util/test_util.jl          93.90% <100.00%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Member

@willtebbutt willtebbutt left a comment


Broadly looks good -- just a couple of comments.

Unfortunately I think this is going to have to be a breaking change because one can no longer construct VFE without any arguments. I guess it's time for 0.4.

@willtebbutt
Member

We can add logpdf(approximation, fx, y) and rand(rng, approximation, fx) at a later date, because they won't break the API.

(n.b. dropping the approx_ prefix - should we keep it?)

I'm happy with this. If we find that it's confusing / causes problems for some reasons, we can always revert at a later date.

Member

@willtebbutt willtebbutt left a comment


I'm happy with this now.

Unless @st-- has any objections, I propose that we bump the version to 0.4.0-DEV, so that we can remove deprecations / make any other breaking changes we need to make over the next couple of days, and aim to release 0.4.0 on Thursday.

@st--
Member

st-- commented Aug 10, 2021

Ohhh I still want to rename a bunch of things... not sure we'll manage that by Thursday though!

Member

@st-- st-- left a comment


I'm generally happy with this, I think it's a good step forward. I've left a few minor suggestions for clean-up.

One larger question I have (though I'd be happy to discuss this separately or change it in another PR if you think that makes more sense): why posterior(VFE(z), fx, y) and not, say, posterior(VFE(f(z)), x, y)? I'm mainly wondering whether you had thought this through and, if so, what conclusions you came to; if you hadn't, what pros/cons either way do you see? :)

@rossviljoen
Member Author

Cheers for the review :)

The reason I did it that way is that you need to somehow pass both the observation noise for x and the jitter for z, so it seemed a bit cleaner to me to do
posterior(VFE(z; jitter=1e-6), f(x, 0.1), y)
vs something like
posterior(VFE(f(z, 1e-6)), x, y, 0.1).

Also, it's more consistent with the exact posterior(fx, y) in that the meaning of fx is the same in both.

@st--
Member

st-- commented Aug 11, 2021

For non-conjugate inference, would we then have something like
posterior(VFE(z; jitter=1e-6), f(x, likelihood), y)? 🤔

@rossviljoen
Member Author

rossviljoen commented Aug 11, 2021

Yes, I suppose it would currently look like:

f = LatentGP(GP(kernel), likelihood, jitter)
posterior(VFE(z; jitter=1e-6), f(x), y)

Edit: Sorry, that's nonsense

@willtebbutt
Member

For non-conjugate inference, would we then have something like

Could do -- I think we need to have some more discussion about this. We've also got the LatentGP type as Ross points out. I'd imagined approximate inference under non-Gaussian likelihoods would look more like what he suggests, but I don't have especially strong opinions either way.

@st--
Member

st-- commented Aug 11, 2021

Yes, I suppose it would currently look like:

f = LatentGP(GP(kernel), likelihood, jitter)
posterior(VFE(z; jitter=1e-6), f(x), y)

Edit: Sorry, that's nonsense

Awww, that's a shame; I thought that looked really good actually 😄

@st--
Member

st-- commented Aug 11, 2021

As discussed in person, let's just release this as 0.4.0. Ideally, we would wait with that until #190 has been merged & released as 0.3.9. (This is blocked, in turn, by JuliaGaussianProcesses/KernelFunctions.jl#344 and JuliaDiff/ChainRules.jl#496)

@rossviljoen
Member Author

Looking at this again, there is a problem with using posterior(VFE(z), fx, y) instead of posterior(VFE(f(z)), x, y). For the SVGP approximation, this becomes: posterior(SVGP(z, q)) since the data are not actually needed to construct the posterior. Therefore, the process f would need to be somehow passed as well.

I'm actually somewhat leaning towards reverting to the original version of:
posterior(VFE(f(z, jitter)), f(x, σ²), y) and just asserting that fz.f == fx.f? The SVGP version then becomes posterior(SVGP(f(z, jitter), q)). I think this gives the most natural way of passing the observation/jitter noise - it's just a shame about the redundancy of passing f twice...

@willtebbutt
Member

Hmmm, yes, good point.

Thinking about this, I feel like we might be missing something from the API, but I'm not entirely sure what.

I think that, once an approximation has been built and an ApproxPosteriorGP produced, it's clear what the API should look like (just the usual AbstractGPs / FiniteGPs API).

The question, as I understand it, is

  1. what the API should be to construct such an approximate posterior, and
  2. what the API should be for approximate log marginal likelihood computations.

The only issue I have with your proposal is that you wind up with a posterior function that might take three arguments, or might take one, depending upon the type of approximation being employed. I would prefer to have exactly the same API for all approximations if possible (so that one can just drop in one for the other anywhere), but maybe that's not possible in a way that seems reasonable?

For example, maybe we should be making the VFE / SVGP constructors something like

VFE(z, f(x), y; jitter)
SVGP(f(z, jitter), q)

? But then what do we replace elbo with for the VFE approximation? Or would the idea be that

logpdf(VFE(f(z), f(x), y), f(x), y)
logpdf(SVGP(f(z), q), f(x), y; kwargs...)

both yield the appropriate elbo?

Sorry, that's a bit of a brain dump. Do you have any more thoughts about the 1 vs 3 arg problem @rossviljoen ? Or perhaps I'm just being overly fussy, and we shouldn't worry about it for now?

@rossviljoen
Copy link
Member Author

I think that summarises the problem perfectly.

I think that your proposed constructors make the most conceptual sense, since VFE is essentially a special case of SVGP where q is determined completely by fx, y, which leads naturally to:

VFE(fz, fx, y)
SVGP(fz, q)

but then the elbo/logpdf has a huge amount of redundancy as you point out.

I also agree that the entire point of this PR was to get a consistent API across all approximations. To that end, I'm in favour of the API looking like:

posterior(approx, fx, y)
elbo(approx, fx, y)

because that is also the most consistent with the non-approx API.
Then, we'd have

approx = VFE(f(z, jitter))
approx = SVGP(f(z, jitter), q)

and simply define

posterior(SVGP(fz, q), fx, y) = posterior(SVGP(fz, q))

?

You're right though, it still doesn't quite feel 100%
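For the SVGP case, the delegation proposed above could be a one-line method. This is a sketch only; it assumes an SVGP type holding fz and q, which is not part of this PR:

```julia
# Hypothetical: SVGP does not need the data to construct its posterior,
# so the three-argument form simply discards fx and y and forwards on.
posterior(approx::SVGP, fx::AbstractGPs.FiniteGP, y::AbstractVector) = posterior(approx)
```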

@willtebbutt
Copy link
Member

willtebbutt commented Aug 17, 2021

At this point in time, I think it's more important to have something in, so that we can crack on with building the other functionality, even if we refactor later when we realise there's a better way to do things.

but then the elbo/logpdf has a huge amount of redundancy as you point out.

I suppose that one could just delay computing stuff with VFE until either posterior or logpdf / elbo is called?

and simply define

Would it not simply be posterior(approx) in both cases, since VFE would now have enough information to know how to compute the approximate posterior without any additional arguments?

Edit: Sorry, I misread your conclusion as to which thing to do. Let's go with the one you actually proposed, meaning that my second comment above is redundant.

@rossviljoen
Copy link
Member Author

Ok, I've changed it again to use the style

posterior(VFE(f(z)), f(x), y)
elbo(VFE(f(z)), f(x), y)

and added assertions that fz.f == fx.f.
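Assuming an AbstractGPs-style setup, the final style might be used like this (the kernel, data, and noise/jitter values here are illustrative, not from the PR):

```julia
using AbstractGPs

f = GP(Matern52Kernel())
x = rand(100); y = rand(100)
z = x[1:20]

fx = f(x, 0.1)    # observations with noise variance 0.1
fz = f(z, 1e-6)   # inducing points with jitter; same underlying process as fx

approx = VFE(fz)
f_post = posterior(approx, fx, y)  # internally asserts fz.f == fx.f
elbo(approx, fx, y)
```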

Member

@willtebbutt willtebbutt left a comment


I'm happy for this to go in, once a minor version bump has been added and my comment addressed.

@rossviljoen rossviljoen merged commit 5630f89 into JuliaGaussianProcesses:master Aug 20, 2021
@st-- st-- mentioned this pull request Aug 23, 2021