
Variational Inference for TuringLang #775

Closed
yebai opened this issue May 7, 2019 · 8 comments

@yebai
Member

yebai commented May 7, 2019

We can consider adding support for the following variational inference methods to Turing:

  1. Automatic differentiation variational inference (ADVI) [1,4]
  2. Variational inference with normalizing flows [2]
  3. Markov Chain Monte Carlo and variational inference: Bridging the gap [3]
  4. Black-box α-divergence minimization [5]

Comment: Project 3 should be interesting and also relatively easy since we have HMC support already.

References:

[1]: Kucukelbir, A., Tran, D., Ranganath, R., Gelman, A., & Blei, D. M. (2017). Automatic differentiation variational inference. The Journal of Machine Learning Research, 18(1), 430-474.
[2]: Rezende, D. J., & Mohamed, S. (2015). Variational inference with normalizing flows. arXiv preprint arXiv:1505.05770.
[3]: Salimans, T., Kingma, D., & Welling, M. (2015, June). Markov chain Monte Carlo and variational inference: Bridging the gap. In International Conference on Machine Learning (pp. 1218-1226).
[4]: Kucukelbir, A., Ranganath, R., Gelman, A., & Blei, D. (2015). Automatic variational inference in Stan. In Advances in neural information processing systems (pp. 568-576).
[5]: Hernández-Lobato, J. M., Li, Y., Rowland, M., Hernández-Lobato, D., Bui, T., & Turner, R. (2016). Black-box α-divergence minimization.

cc @cpfiffer @xukai92 @willtebbutt @mohamed82008

@willtebbutt
Member

Regarding point 2, @rpinsler recently gave a very interesting presentation on developments in this area, which we should perhaps look at when deciding what to target.

@cpfiffer
Member

cpfiffer commented May 7, 2019

I suspect that mean-field VI is probably going to be the easiest to implement first, and from there it'll be easier to build up the infrastructure to support the more complex VI stuff like ADVI and stochastic VI. I think it will be very important to do the infrastructure part in a reasonable way, because there seems to be a fair amount of heterogeneity in VI methods.
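
For concreteness, here's a minimal, self-contained sketch of what the core computation looks like under a mean-field Gaussian family: a reparameterized Monte Carlo ELBO estimate. The `logjoint` function and the toy target below are placeholders, not Turing's actual interface.

```julia
# A minimal sketch (placeholders, not Turing's interface): Monte Carlo ELBO under a
# mean-field Gaussian q(z) = N(μ, diag(exp.(ω).^2)) via the reparameterization trick.
using Distributions

function elbo_estimate(logjoint, μ, ω; nsamples = 10)
    d = length(μ)
    term = 0.0
    for _ in 1:nsamples
        z = μ .+ exp.(ω) .* randn(d)        # z ~ q via reparameterization
        term += logjoint(z) / nsamples      # estimates E_q[log p(x, z)]
    end
    # Entropy of the diagonal Gaussian: d/2 * (1 + log(2π)) + Σᵢ ωᵢ
    term + d * (1 + log(2π)) / 2 + sum(ω)
end

# Toy usage: a standard-normal "posterior"; the ELBO is maximized at μ = 0, ω = 0.
toy_logjoint(z) = sum(logpdf.(Normal(0, 1), z))
elbo_estimate(toy_logjoint, zeros(2), zeros(2))
```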

We should highlight this to the JSoC people: @sharanry and @torfjelde. Welcome to both of you!

I really enjoyed reading Variational Inference: A Review for Statisticians for an overview of the VI-verse. Figured I should throw this paper in here while we're tossing knowledge around.

@torfjelde
Member

torfjelde commented May 7, 2019

Thank you @cpfiffer !

I would also like to add Operator Variational Inference (OPVI) [1] to the bunch. It's basically a more general description of VI of which a lot of standard VI methods are instantiations, e.g. ADVI and Stein variational gradient descent. This seems like a promising design, and it is also the approach taken by PyMC3.
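
For reference, the general objective in [1] takes roughly the following form (written from memory, so treat the details with care), for an operator O^{p,q}, a family of test functions F, and a map t from the reals to the non-negative reals:

```latex
\mathcal{L}(q;\, O^{p,q}, \mathcal{F}, t)
  \;=\; \sup_{f \in \mathcal{F}} \, t\!\left( \mathbb{E}_{z \sim q}\!\left[ (O^{p,q} f)(z) \right] \right)
```

Particular choices of the operator, the test-function family, and t recover the KL/ELBO objective as well as Langevin-Stein-type objectives.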

Worth noting that ADVI and normalizing flows will depend heavily on Bijectors.jl, so there might be some work to be done there too. I already have a working example of ADVI using Turing, but I have to compute the Jacobian of the inverse transform using ForwardDiff. Maybe this functionality should already be exposed in Bijectors.jl? I believe it is available in similar packages, e.g. TransformVariables.jl (see TransformVariables.jl/src/generic.jl#L204).
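
To illustrate the ForwardDiff workaround mentioned above, here is a rough sketch; the softplus transform is just an illustrative stand-in, and none of this uses Bijectors.jl's actual API.

```julia
# Sketch: log |det J| of an inverse transform via a dense ForwardDiff Jacobian.
# The softplus map is a stand-in for the inverse of a constrained-to-unconstrained
# bijection; this is the kind of functionality that could instead live in Bijectors.jl.
using ForwardDiff, LinearAlgebra

inv_transform(y) = log1p.(exp.(y))              # unconstrained y ↦ positive x, elementwise

function logabsdetjac_inv(y)
    J = ForwardDiff.jacobian(inv_transform, y)  # dense Jacobian of the inverse map
    first(logabsdet(J))                         # log |det J|
end

logabsdetjac_inv([0.3, -1.2])
```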

I'll be done with exams on Friday, so then I'll start looking more into this.

  1. Ranganath, R., Altosaar, J., Tran, D., & Blei, D. M. (2016). Operator variational inference. arXiv preprint arXiv:1610.09033.

@sharanry
Collaborator

sharanry commented May 8, 2019

Thanks for the welcome @cpfiffer!

As discussed in last week's meeting, I believe we need a lot of discussion on the design of this project, especially so that we end up with a common framework for these methods that will let us add new methods/techniques in the future with relative ease. @xukai92 suggested Mnih and Rezende (2016) as a good starting point for designing a VI abstraction that would support both the commonly used ELBO and IWAE.
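
For reference, here is a rough self-contained sketch of the K-sample importance-weighted (IWAE-style) bound that such an abstraction would need to cover; the function names are placeholders, not a proposed interface, and the bound reduces to the ordinary ELBO when K = 1.

```julia
# Sketch: Monte Carlo estimate of the K-sample importance-weighted bound
#   E[ log( (1/K) Σₖ p(x, zₖ) / q(zₖ) ) ],   zₖ ~ q,
# which equals the standard ELBO when K = 1. All function arguments are placeholders.
using Statistics

logsumexp(x) = (m = maximum(x); m + log(sum(exp.(x .- m))))

function iwae_bound(logjoint, logq, sample_q; K = 5, nestimates = 100)
    estimates = map(1:nestimates) do _
        logw = [logjoint(z) - logq(z) for z in (sample_q() for _ in 1:K)]
        logsumexp(logw) - log(K)
    end
    mean(estimates)
end
```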

Perhaps mean-field VI, implemented with the abstractions for the other techniques in mind, would be a good starting point?

I have exams until Tuesday the 14th, after which I will start on the suggested literature and hopefully contribute to this discussion.

@Red-Portal
Member

Red-Portal commented Jul 31, 2019

Is there still demand for Stein Variational Gradient Descent (SVGD) variational inference? I currently have a Julia implementation that I used for my research, and I might be able to contribute it to Turing if desired.
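
For context, a minimal self-contained sketch of the SVGD particle update (RBF kernel with a fixed bandwidth; `score(x)` standing in for ∇ log p(x)), independent of how an implementation would eventually hook into Turing's interface:

```julia
# Sketch of one SVGD step: particles xs (a d × n matrix) move along
#   φ(xᵢ) = (1/n) Σⱼ [ k(xⱼ, xᵢ) * score(xⱼ) + ∇_{xⱼ} k(xⱼ, xᵢ) ],
# with an RBF kernel k(x, y) = exp(-||x - y||² / (2h²)).
function svgd_step!(xs, score; h = 1.0, stepsize = 1e-2)
    d, n = size(xs)
    ϕ = zeros(d, n)
    for i in 1:n, j in 1:n
        diff = xs[:, j] .- xs[:, i]
        k = exp(-sum(abs2, diff) / (2h^2))
        ϕ[:, i] .+= (k .* score(xs[:, j]) .- (k / h^2) .* diff) ./ n
    end
    xs .+= stepsize .* ϕ
end

# Toy usage: particles drift toward a standard normal target, whose score is -x.
xs = randn(2, 50) .* 3
for _ in 1:500
    svgd_step!(xs, x -> -x)
end
```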

Also, I think having the kernel Stein discrepancy (KSD) as an MCMC diagnostic would be useful.

@torfjelde
Member

> Is there still demand for Stein Variational Gradient Descent (SVGD) variational inference?

That's great, and there is indeed! Feel free to give it a go and just ask me if there's anything (you can find me on the Julia Slack as torfjelde). As VI is still heavily under development, the interface is not yet documented. If you want, I can point you to a blog post of mine where I go through a rather contrived example of implementing CAVI in this interface. Or you can just check out the implementation of ADVI in src/variational/advi.jl. An immediate difference is of course that you also have to update the parameters of the objective in your implementation of optimize!.

> Also, I think having the kernel Stein discrepancy (KSD) as an MCMC diagnostic would be useful.

Funny you should mention that; I'm fairly familiar with the KSD and have implemented some related things over at https://github.com/torfjelde/KernelGoodnessOfFit.jl. For example, I have an implementation of the KSD as a goodness-of-fit test. Sounds like what you're looking for? I wasn't sure whether people in the Bayesian community were actually using the KSD as a diagnostic, though I recall it being mentioned as a use case. Development of that package has halted for a bit as I've been occupied with Turing for the moment, but my intention is to develop it further at some later point. And if there is interest and it intersects with Turing.jl's interests as an MCMC diagnostic, the "later" quickly becomes "soon" :)

@Red-Portal
Member

Red-Portal commented Jul 31, 2019

> I wasn't sure whether people in the Bayesian community were actually using the KSD as a diagnostic, though I recall it being mentioned as a use case.

> And if there is interest and it intersects with Turing.jl's interests as an MCMC diagnostic, the "later" quickly becomes "soon" :)

I'm personally not a statistician, and indeed KSD methods don't seem to be a popular tool at the moment. However, I think that if the KSD were right at the fingertips of Turing.jl users, we might see it gain in popularity? Especially since there seems to be some distrust of effective sample size (ESS) as a goodness-of-fit metric. It could also be used to quantify the goodness-of-fit of various variational approximations, apart from the good old KL divergence.
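
To make the suggestion concrete, here is a rough, self-contained sketch of the U-statistic estimate of the squared KSD one might compute from MCMC or VI samples; the RBF kernel, fixed bandwidth, and `score` function are all assumptions for illustration.

```julia
# Sketch: U-statistic estimate of the squared kernel Stein discrepancy, using the Stein kernel
#   k_p(x, y) = s(x)'s(y) k + s(x)'∇_y k + s(y)'∇_x k + tr(∇_x ∇_y k),
# with an RBF kernel k(x, y) = exp(-||x - y||² / (2h²)) and score s(x) = ∇ log p(x).
using LinearAlgebra

function ksd_squared(xs, score; h = 1.0)
    d, n = size(xs)
    ss = [score(xs[:, i]) for i in 1:n]
    total = 0.0
    for i in 1:n, j in 1:n
        i == j && continue                  # drop diagonal terms for the U-statistic
        diff = xs[:, i] .- xs[:, j]
        r2 = sum(abs2, diff)
        k = exp(-r2 / (2h^2))
        gx = -(k / h^2) .* diff             # ∇_x k evaluated at (xᵢ, xⱼ)
        gy = (k / h^2) .* diff              # ∇_y k evaluated at (xᵢ, xⱼ)
        tr_xy = k * (d / h^2 - r2 / h^4)    # tr(∇_x ∇_y k)
        total += dot(ss[i], ss[j]) * k + dot(ss[i], gy) + dot(ss[j], gx) + tr_xy
    end
    total / (n * (n - 1))
end

# Toy usage: samples from N(0, I) checked against a standard-normal target (score = -x).
ksd_squared(randn(2, 200), x -> -x)
```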

> That's great, and there is indeed! Feel free to give it a go and just ask me if there's anything (you can find me on the Julia Slack as torfjelde).

Great! I'll get in touch once my current work is done (though I think that will take a few months).

@yebai
Member Author

yebai commented Dec 16, 2021

yebai closed this as completed Dec 16, 2021