
VI - Variational Inference support #498

Open · ahartikainen opened this issue Jan 5, 2019 · 9 comments

@ahartikainen (Contributor) commented Jan 5, 2019

To support VI, we need to decide which aspects we should implement.

Data structure

We probably need a special data structure for VI results.

It could contain the following (a sketch of such a container follows the list):

  • posterior mean
  • posterior std
  • pseudo-samples
  • VI diagnostics
  • log_likelihood (?)
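
As a concrete starting point, a minimal sketch of such a container; the class and field names (VIResult, diagnostics, etc.) are hypothetical rather than an agreed design:

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

import numpy as np


@dataclass
class VIResult:
    """Hypothetical container mirroring the list above."""

    posterior_mean: Dict[str, np.ndarray]    # per-variable posterior means
    posterior_std: Dict[str, np.ndarray]     # per-variable posterior stds
    pseudo_samples: Dict[str, np.ndarray]    # draws from the fitted approximation
    diagnostics: Dict[str, float] = field(default_factory=dict)  # e.g. final ELBO
    log_likelihood: Optional[np.ndarray] = None  # optional, per the "(?)" above
```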

Functions

Our current functions also need to raise a warning or exception when VI results are passed to MCMC-specific functions; a sketch of such a guard is below.

I don't think we need to support 100% of VI capabilities before starting.
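
A minimal sketch of such a guard, assuming a hypothetical requires_mcmc decorator (not existing ArviZ API) applied to functions that take an InferenceData-like object with a posterior group:

```python
import functools


def requires_mcmc(func):
    """Hypothetical guard: reject inputs that lack MCMC chain/draw structure."""

    @functools.wraps(func)
    def wrapper(data, *args, **kwargs):
        dims = getattr(getattr(data, "posterior", None), "dims", {})
        if "chain" not in dims or "draw" not in dims:
            raise TypeError(
                f"{func.__name__} is MCMC-specific and requires draws "
                "with both 'chain' and 'draw' dimensions"
            )
        return func(data, *args, **kwargs)

    return wrapper
```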

Libraries

Libs we should support:

  • PyMC3 VI
  • Stan VI
  • Tensorflow Probability
  • Pyro
  • ProbTorch

Output from libs:

Stan (ADVI)

  • posterior mean
  • posterior standard deviation
  • pseudo-samples from posterior
  • statistics
@ahartikainen changed the title from "VI datastructure" to "VI - Variational Inference support" on Jan 13, 2019
@ahartikainen (Contributor, Author)

For the first step (I know it is more complicated than this):

One idea is to use InferenceData as is, except that VI results would not have a chain dimension. The mean and other statistics would go under one group (e.g. approximate_posterior) and the pseudo-samples could go under another group (e.g. pseudo_posterior). ELBO and other diagnostics could probably go under sample_stats.

VI-specific plots, diagnostics, and stats would then live under arviz.vi, and MCMC-specific functionality would only need to check that both "chain" and "draw" dimensions exist. A sketch of this layout is below.
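
A minimal sketch of that layout. The group names (approximate_posterior, pseudo_posterior) are placeholders, not part of the InferenceData schema; recent ArviZ versions accept unknown group names but warn about them:

```python
import numpy as np
import xarray as xr
import arviz as az

rng = np.random.default_rng(0)

# Summary statistics of the fitted approximation: no chain/draw dimensions.
approx = xr.Dataset({"mu": ("stat", [0.1, 0.9])}, coords={"stat": ["mean", "sd"]})

# Pseudo-samples drawn from the approximation: a "draw" dimension but no "chain".
pseudo = xr.Dataset({"mu": ("draw", rng.normal(0.1, 0.9, size=1000))})

# Custom group names trigger a UserWarning but are stored as regular groups.
idata = az.InferenceData(approximate_posterior=approx, pseudo_posterior=pseudo)
```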

ps. better names than those, please

cc. @avehtari @dustinvtran @kyleabeauchamp @yaochitc @esennesh

@yaochitc (Contributor)

That's interesting. It would be quite helpful to have VI support.

@avehtari

  • For diagnostics and importance-sampling adjustment, it would be useful to have both the log density of the true posterior and the log density of the approximation.
  • In some cases it can be useful to run VI several times and compute an Rhat-type diagnostic to check that the different runs converge to the same solution, or, in the multimodal case, to stack the different VI runs; both points are sketched below.
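
A hedged sketch of both points using existing ArviZ functions (az.psislw and az.rhat); the arrays here are random stand-ins for quantities a VI backend would need to store:

```python
import numpy as np
import xarray as xr
import arviz as az

rng = np.random.default_rng(0)

# (1) With log p(theta, y) and log q(theta) stored per pseudo-sample,
# Pareto-smoothed importance weights diagnose (and can correct) the fit.
target_logp = rng.normal(-10.0, 1.0, size=2000)  # stand-in for log p(theta, y)
approx_logq = rng.normal(-10.2, 1.0, size=2000)  # stand-in for log q(theta)
log_ratios = xr.DataArray(target_logp - approx_logq, dims=["__sample__"])
smoothed_lw, khat = az.psislw(log_ratios)
print("Pareto k-hat:", float(khat))  # k-hat > 0.7 flags a poor approximation

# (2) Rhat across independent VI runs, treating each run as a "chain".
runs = rng.normal(0.0, 1.0, size=(4, 1000))  # 4 runs x 1000 pseudo-samples each
print(az.rhat(az.from_dict(posterior={"mu": runs})))
```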

@djinnome

What is the current status of ArviZ support for variational inference output?

@gibsramen

Is anybody currently working on this? I have an immediate need for this functionality (specifically for Stan output) that I'm going to write some custom code for, but I'm happy to take a stab at a PR for general use later.

cc @mortonjt

@ColCarroll (Member)

Maybe a weird suggestion here, but my current thinking is that a good "standard data structure" for VI would be an object that supports, say, .sample and .log_prob (and .log_prob_parts, perhaps?). This saves us from having to think about encoding different variational families, and punts on the idea of easy serialization to disk.

Given such an object and the target_log_prob (or target_log_prob_parts), it seems like it would be reasonable to produce an InferenceData object that is able to use arviz functionality, while maintaining a sort of semantic distance that makes it clear that a VI fit is different from an MCMC fit.

That seems like a not very helpful response, so more concretely (a sketch of such an object follows this list):

  1. I would be interested if someone (@gibsramen?) shared an implementation of the above API
  2. I would also be interested if others had strong feelings about the design
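
A minimal sketch of such an object, using illustrative names (VariationalPosterior, MeanFieldNormal) rather than any agreed interface:

```python
from typing import Protocol  # Python 3.8+

import numpy as np


class VariationalPosterior(Protocol):
    """The duck-typed surface a fitted variational family would expose."""

    def sample(self, n: int) -> np.ndarray: ...
    def log_prob(self, value: np.ndarray) -> np.ndarray: ...


class MeanFieldNormal:
    """Toy mean-field Gaussian satisfying the protocol above."""

    def __init__(self, mean, std):
        self.mean, self.std = np.asarray(mean), np.asarray(std)

    def sample(self, n):
        return np.random.default_rng().normal(
            self.mean, self.std, size=(n,) + self.mean.shape
        )

    def log_prob(self, value):
        z = (value - self.mean) / self.std
        return (-0.5 * z**2 - np.log(self.std) - 0.5 * np.log(2 * np.pi)).sum(-1)
```

Given such an object plus the target_log_prob, pseudo-samples and both log densities could then be materialized into an InferenceData while keeping the fit clearly marked as variational.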

@sethaxen (Member)

Is there a reason we need access to the sampling and log-density functionality of a VI model? E.g., when we need samples from a posterior represented by a PPL, we wrap an object from the PPL in a suitable SamplingWrapper to access the necessary functionality. But for statistics, visualization, and serialization, MC samples and sample statistics are a more useful and general representation of the object, and these could be stored in the same InferenceData without any problems.

We implicitly support such MC draws from posteriors right now, where the semantic distance between the methods used to obtain those draws is enforced by the function call (illustrated below). E.g., ess is an MCMC diagnostic, bfmi is specifically an HMC diagnostic, and we also have the reff keyword to loo. But hdi or r2_score just assume MC draws and don't care how those were generated.

I guess what I'm asking is whether we will actually need the log-prob function or sample function of a VI method often enough to warrant creating a new InferenceData object, or whether, as with other posterior objects, it is enough to simply have something like a SamplingWrapper that is used for just the few features that need this.
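
A hedged illustration of that function-level distinction, using existing ArviZ calls on stand-in pseudo-samples:

```python
import numpy as np
import arviz as az

# One "chain" of VI pseudo-samples for a scalar parameter "mu".
draws = np.random.default_rng(0).normal(size=(1, 2000))
idata = az.from_dict(posterior={"mu": draws})

print(az.hdi(idata))  # fine: hdi only assumes MC draws
# az.bfmi(idata)      # would fail: bfmi needs sample_stats.energy from HMC
```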

@ColCarroll (Member)

Thanks, @sethaxen -- well put!

Maybe two issues I'm still stuck on are:

  1. It seems weird to me to throw away answers to some (often) analytic questions ("what are the mean and standard deviation of my approximate posterior?") in favor of approximate ones, and
  2. it makes me uncomfortable to recommend some number of samples from an approximate posterior. In the MCMC regime there's an implicit tradeoff of computation and accuracy, but when you're just drawing samples from Gaussians, I really don't know!

Maybe more globally, I tend to think of using samples to approximate integrals as one approach, and am a little anxious about expanding the scope of this project to (directly) include approximating families of distributions.

I think your answer is more pragmatic, though, and if users of VI libraries would use arviz, I'd be generally in favor of supporting that.

@mortonjt (Contributor)

Hi, I think @gibsramen and I were leaning toward the more practical approach. I agree, @ColCarroll, that the ideal approach is to serialize the actual functional form to disk (and keep track of all of the variational parameters). But from a short-term development perspective, I'm not sure how practical that is, particularly for complex variational distributions.

In the short term, I think treating arviz objects as wrappers around samples from a posterior distribution is already extremely useful, enabling evaluation of many of the metrics arviz provides. I can't immediately comment on the semantic distinction: from a user perspective, does it matter which algorithm generated the posterior samples? If so, then perhaps it is worthwhile to consider abstractions that provide a taxonomy for these use cases.
