
Refactor Approximate Inference #194

Merged: 27 commits, Aug 20, 2021

Conversation

rossviljoen
Member

@rossviljoen rossviljoen commented Aug 4, 2021

Implements the idea in #193 (see also JuliaGaussianProcesses/ApproximateGPs.jl#13)

Essentially, this replaces the current method for constructing approximate posteriors:
approx_posterior(VFE(), fx, y, fz)
with
posterior(VFE(z), fx, y)

(n.b. dropping the approx_ prefix - should we keep it?)

The reasoning is that this style of approximation generalises more easily to cases that need more complex approximations, such as in SparseGPs, e.g. using a variational distribution q: posterior(SVGP(z, q), fx, y).

Furthermore, it only requires passing the inducing inputs z rather than a full fz isa FiniteGP, which guarantees that fz and fx are constructed from the same underlying process fx.f.

It also moves the elbo and dtc from finite_gp.jl to approx_posterior_gp.jl and redefines them from:

elbo(fx, y, fz)
to
elbo(VFE(z), fx, y)


As @willtebbutt mentioned in #193 - it would be useful to define other functions like logpdf and rand on the approximation:

posterior(approximation, fx, y)
logpdf(approximation, fx, y)
rand(rng, approximation, fx)
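To make the proposed style concrete, here is a hypothetical end-to-end sketch. The kernel, data, and noise value are illustrative assumptions, and VFE(z) reflects the API as proposed at this point in the thread (it is revised later in the conversation):

```julia
using AbstractGPs  # assumed to export GP, posterior, elbo, and the new VFE

f = GP(SqExponentialKernel())
x = rand(50); y = rand(50)
z = x[1:10]                        # inducing inputs, here a subset of x

fx = f(x, 0.1)                     # FiniteGP with observation noise 0.1
approx = VFE(z)                    # the approximation carries only z

f_post = posterior(approx, fx, y)  # approximate posterior GP
elbo(approx, fx, y)                # evidence lower bound at the same inputs
```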

@codecov-commenter

codecov-commenter commented Aug 4, 2021

Codecov Report

Merging #194 (cf23c30) into master (b44dd01) will increase coverage by 0.06%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master     #194      +/-   ##
==========================================
+ Coverage   97.98%   98.05%   +0.06%     
==========================================
  Files          10       10              
  Lines         348      359      +11     
==========================================
+ Hits          341      352      +11     
  Misses          7        7              
Impacted Files                 Coverage Δ
src/finite_gp_projection.jl    100.00% <ø> (ø)
src/sparse_approximations.jl   100.00% <100.00%> (ø)
src/util/test_util.jl          93.90% <100.00%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Member

@willtebbutt willtebbutt left a comment


Broadly looks good -- just a couple of comments.

Unfortunately I think this is going to have to be a breaking change because one can no longer construct VFE without any arguments. I guess it's time for 0.4.

@willtebbutt
Member

We can add logpdf(approximation, fx, y) and rand(rng, approximation, fx) at a later date, because they won't break the API.

(n.b. dropping the approx_ prefix - should we keep it?)

I'm happy with this. If we find that it's confusing / causes problems for some reasons, we can always revert at a later date.

Member

@willtebbutt willtebbutt left a comment


I'm happy with this now.

Unless @st-- has any objections, I propose that we bump the version to 0.4.0-DEV, so that we can remove deprecations / make any other breaking changes we need to make over the next couple of days, and aim to release 0.4.0 on Thursday.

@st--
Member

st-- commented Aug 10, 2021

Ohhh I still want to rename a bunch of things... not sure we'll manage that by Thursday though!

Member

@st-- st-- left a comment


I'm generally happy with this, I think it's a good step forward. I've left a few minor suggestions for clean-up.

One larger question I have (though I'd be happy to discuss this separately or change it in another PR if you think that makes more sense): why posterior(VFE(z), fx, y) and not, say, posterior(VFE(f(z)), x, y)? I'm mainly wondering whether you had thought this through and, if so, what conclusions you came to; if you hadn't, what pros/cons either way do you see? :)

@rossviljoen
Member Author

Cheers for the review :)

The reason I did it that way is that you need to somehow pass both the observation noise for x and the jitter for z, so it seemed a bit cleaner to me to do
posterior(VFE(z; jitter=1e-6), f(x, 0.1), y)
vs something like
posterior(VFE(f(z, 1e-6)), x, y, 0.1).

Also, it's more consistent with the exact posterior(fx, y) in that the meaning of fx is the same in both.

@st--
Member

st-- commented Aug 11, 2021

For non-conjugate inference, would we then have something like
posterior(VFE(z; jitter=1e-6), f(x, likelihood), y)? 🤔

@rossviljoen
Member Author

rossviljoen commented Aug 11, 2021

Yes, I suppose it would currently look like:

f = LatentGP(GP(kernel), likelihood, jitter)
posterior(VFE(z; jitter=1e-6), f(x), y)

Edit: Sorry, that's nonsense

@willtebbutt
Member

For non-conjugate inference, would we then have something like

Could do -- I think we need to have some more discussion about this. We've also got the LatentGP type as Ross points out. I'd imagined approximate inference under non-Gaussian likelihoods would look more like what he suggests, but I don't have especially strong opinions either way.

@st--
Member

st-- commented Aug 11, 2021

Yes, I suppose it would currently look like:

f = LatentGP(GP(kernel), likelihood, jitter)
posterior(VFE(z; jitter=1e-6), f(x), y)

Edit: Sorry, that's nonsense

Awww, that's a shame; I thought that looked really good actually 😄

@st--
Member

st-- commented Aug 11, 2021

As discussed in person, let's just release this as 0.4.0. Ideally, we would wait with that until #190 has been merged & released as 0.3.9. (This is blocked, in turn, by JuliaGaussianProcesses/KernelFunctions.jl#344 and JuliaDiff/ChainRules.jl#496)

@rossviljoen
Member Author

Looking at this again, there is a problem with using posterior(VFE(z), fx, y) instead of posterior(VFE(f(z)), x, y). For the SVGP approximation, this becomes: posterior(SVGP(z, q)) since the data are not actually needed to construct the posterior. Therefore, the process f would need to be somehow passed as well.

I'm actually somewhat leaning towards reverting to the original version of:
posterior(VFE(f(z, jitter)), f(x, σ²), y) and just asserting that fz.f == fx.f? The SVGP version then becomes posterior(SVGP(f(z, jitter), q)). I think this gives the most natural way of passing the observation/jitter noise - it's just a shame about the redundancy of passing f twice...

@willtebbutt
Member

Hmmm, yes, good point.

Thinking about this, I feel like we might be missing something from the API, but I'm not entirely sure what.

I think that, once an approximation has been built and an ApproxPosteriorGP produced, it's clear what the API should look like (just the usual AbstractGPs / FiniteGPs API).

The question, as I understand it, is

  1. what the API should be to construct such an approximate posterior, and
  2. what the API should be for approximate log marginal likelihood computations.

The only issue I have with your proposal is that you wind up with a posterior function that might take three arguments, or might take one, depending upon the type of approximation being employed. I would prefer to have exactly the same API for all approximations if possible (so that one can just drop in one for the other anywhere), but maybe that's not possible in a way that seems reasonable?

For example, maybe we should be making the VFE / SVGP constructors something like

VFE(z, f(x), y; jitter)
SVGP(f(z, jitter), q)

? But then what do we replace elbo with for the VFE approximation? Or would the idea be that

logpdf(VFE(f(z), f(x), y), f(x), y)
logpdf(SVGP(f(z), q), f(x), y; kwargs...)

both yield the appropriate elbo?

Sorry, that's a bit of a brain dump. Do you have any more thoughts about the 1 vs 3 arg problem @rossviljoen ? Or perhaps I'm just being overly fussy, and we shouldn't worry about it for now?

@rossviljoen
Copy link
Member Author

I think that summarises the problem perfectly.

I think that your proposed constructors make the most conceptual sense, since VFE is essentially a special case of SVGP where q is determined completely by fx, y, which leads naturally to:

VFE(fz, fx, y)
SVGP(fz, q)

but then the elbo/logpdf has a huge amount of redundancy as you point out.

I also agree that the entire point of this PR was to get a consistent API across all approximations. To that end, I'm in favour of the API looking like:

posterior(approx, fx, y)
elbo(approx, fx, y)

because that is also the most consistent with the non-approx API.
Then, we'd have

approx = VFE(f(z, jitter))
approx = SVGP(f(z, jitter), q)

and simply define

posterior(SVGP(fz, q), fx, y) = posterior(SVGP(fz, q))

?

You're right though, it still doesn't quite feel 100%
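For the SVGP case, the delegation proposed above could be a one-line method. This is a sketch only; it assumes an SVGP type holding fz and q, which is not part of this PR:

```julia
# Hypothetical: SVGP does not need the data to construct its posterior,
# so the three-argument form simply discards fx and y and forwards on.
posterior(approx::SVGP, fx::AbstractGPs.FiniteGP, y::AbstractVector) = posterior(approx)
```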

@willtebbutt
Copy link
Member

willtebbutt commented Aug 17, 2021

At this point in time, I think it's more important to have something in, so that we can crack on with building the other functionality, even if we refactor later when we realise there's a better way to do things.

but then the elbo/logpdf has a huge amount of redundancy as you point out.

I suppose that one could just delay computing stuff with VFE until either posterior or logpdf / elbo is called?

and simply define

Would it not simply be posterior(approx) in both cases, since VFE would now have enough information to know how to compute the approximate posterior without any additional arguments?

Edit: Sorry, I misread your conclusion as to which thing to do. Let's go with the one you actually proposed, meaning that my second comment above is redundant.

@rossviljoen
Copy link
Member Author

Ok, I've changed it again to use the style

posterior(VFE(f(z)), f(x), y)
elbo(VFE(f(z)), f(x), y)

and added assertions that fz.f == fx.f.
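Assuming an AbstractGPs-style setup, the final style might be used like this (the kernel, data, and noise/jitter values here are illustrative, not from the PR):

```julia
using AbstractGPs

f = GP(Matern52Kernel())
x = rand(100); y = rand(100)
z = x[1:20]

fx = f(x, 0.1)    # observations with noise variance 0.1
fz = f(z, 1e-6)   # inducing points with jitter; same underlying process as fx

approx = VFE(fz)
f_post = posterior(approx, fx, y)  # internally asserts fz.f == fx.f
elbo(approx, fx, y)
```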

Member

@willtebbutt willtebbutt left a comment


I'm happy for this to go in, once a minor version bump has been added and my comment addressed.

@rossviljoen rossviljoen merged commit 5630f89 into JuliaGaussianProcesses:master Aug 20, 2021
@st-- st-- mentioned this pull request Aug 23, 2021