Proximal operator for the entropy of location-scale families #168
Benchmark Results
| Benchmark suite | Current: d70a720 (ns) | Previous: 780b850 (ns) | Ratio (current / previous) |
|---|---|---|---|
| normal/RepGradELBO + STL/meanfield/Zygote | 15110617102 | 15940384416 | 0.95 |
| normal/RepGradELBO + STL/meanfield/ForwardDiff | 4646957534.5 | 4640506885.5 | 1.00 |
| normal/RepGradELBO + STL/meanfield/ReverseDiff | 3523940450 | 3656579468 | 0.96 |
| normal/RepGradELBO + STL/meanfield/Mooncake | 2272870111 | 2408256110 | 0.94 |
| normal/RepGradELBO + STL/fullrank/Zygote | 14905216684 | 15910940620 | 0.94 |
| normal/RepGradELBO + STL/fullrank/ForwardDiff | 5014498207 | 4715181430.5 | 1.06 |
| normal/RepGradELBO + STL/fullrank/ReverseDiff | 6735928269 | 6895395404 | 0.98 |
| normal/RepGradELBO + STL/fullrank/Mooncake | 2483745385 | 2577665723 | 0.96 |
| normal/RepGradELBO/meanfield/Zygote | 6648797440 | 7257838530 | 0.92 |
| normal/RepGradELBO/meanfield/ForwardDiff | 2647811104 | 2569067456.5 | 1.03 |
| normal/RepGradELBO/meanfield/ReverseDiff | 1512544613 | 1579425564 | 0.96 |
| normal/RepGradELBO/meanfield/Mooncake | 1948442109 | 2061556711 | 0.95 |
| normal/RepGradELBO/fullrank/Zygote | 6640526936 | 7307338895 | 0.91 |
| normal/RepGradELBO/fullrank/ForwardDiff | 2902638831.5 | 2729437724.5 | 1.06 |
| normal/RepGradELBO/fullrank/ReverseDiff | 2973548534 | 3059733066.5 | 0.97 |
| normal/RepGradELBO/fullrank/Mooncake | 2012759524 | 2151115288 | 0.94 |
| normal + bijector/RepGradELBO + STL/meanfield/Zygote | 22701617844 | 25051818691 | 0.91 |
| normal + bijector/RepGradELBO + STL/meanfield/ForwardDiff | 14022151391 | 14011956162 | 1.00 |
| normal + bijector/RepGradELBO + STL/meanfield/ReverseDiff | 4873726247 | 4959853898.5 | 0.98 |
| normal + bijector/RepGradELBO + STL/meanfield/Mooncake | 6950828279 | 7122368520 | 0.98 |
| normal + bijector/RepGradELBO + STL/fullrank/Zygote | 22738792674 | 24090366516 | 0.94 |
| normal + bijector/RepGradELBO + STL/fullrank/ForwardDiff | 13995483682 | 14120227537 | 0.99 |
| normal + bijector/RepGradELBO + STL/fullrank/ReverseDiff | 8654869292 | 8798444809 | 0.98 |
| normal + bijector/RepGradELBO + STL/fullrank/Mooncake | 7315743178 | 7408969022 | 0.99 |
| normal + bijector/RepGradELBO/meanfield/Zygote | 14635213555 | 14802935407 | 0.99 |
| normal + bijector/RepGradELBO/meanfield/ForwardDiff | 11738807732 | 11654832656 | 1.01 |
| normal + bijector/RepGradELBO/meanfield/ReverseDiff | 2694582297.5 | 2656304668 | 1.01 |
| normal + bijector/RepGradELBO/meanfield/Mooncake | 6589720309 | 6499853350 | 1.01 |
| normal + bijector/RepGradELBO/fullrank/Zygote | 14280821092 | 14809696663 | 0.96 |
| normal + bijector/RepGradELBO/fullrank/ForwardDiff | 11653369767 | 11421759227 | 1.02 |
| normal + bijector/RepGradELBO/fullrank/ReverseDiff | 4490560555.5 | 4414384964 | 1.02 |
| normal + bijector/RepGradELBO/fullrank/Mooncake | 6831675873 | 6972571483 | 0.98 |
This comment was automatically generated by workflow using github-action-benchmark.
Ah, sorry for the slow response on this! I'll take a look as soon as I get some free time (probably Wednesday).

Oops, sorry for forgetting about this. I'll take a look tomorrow morning.
Generally looks good, good to merge from my end.
One minor code issue: `# begin`
Small technical question: am I reading it correctly that AdvancedVI currently uses the linear parametrization?
Yes, the default settings do, hence the involvement of …

Hmmm... seems like …
This adds `ProximalLocationScaleEntropy`, the proximal operator for the entropy of location-scale families, which was proposed by J. Domke [1] and later analyzed theoretically and empirically by J. Domke and myself [2, 3]. The purpose of the proximal operator is to guarantee that the scale matrix never becomes singular, and in doing so it fixes the limitations of projection operators (`ClipScale`). Mainly, `ClipScale` requires an explicit lower bound on the posterior variance, which is arbitrary. Even then, if the lower bound is too loose, the algorithm may be unstable depending on the initialization and the stepsize. In fact, when I experimented with the parameter-free optimization algorithms currently provided by `AdvancedVI`, `DoG` and `DoWG` tended to be very aggressive in terms of stepsize, and `ClipScale` showed instabilities.

In the context of Turing, the combination of `ProximalLocationScaleEntropy` with `DoWG` or `DoG` should provide a robust, tuning-free default setting for variational inference. (This is why I am working on this before the Turing integration.)

Proximal operators depend on the internals of the optimization algorithm in use. This is fairly straightforward for algorithms that reduce everything to a scalar stepsize, like `DoG` and `DoWG`. For algorithms that operate with a vector-valued stepsize, things are less straightforward.

Footnotes
1. Domke, Justin. "Provable smoothness guarantees for black-box variational inference." International Conference on Machine Learning. PMLR, 2020.
2. Domke, Justin, Robert Gower, and Guillaume Garrigos. "Provable convergence guarantees for black-box variational inference." Advances in Neural Information Processing Systems 36 (2023): 66289-66327.
3. Kim, Kyurae, et al. "On the convergence of black-box variational inference." Advances in Neural Information Processing Systems 36 (2023): 44615-44657.
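To make the proximal step concrete, here is a minimal sketch in Python/NumPy rather than Julia. It assumes a lower-triangular scale matrix `C` with a positive diagonal, so the entropy of the location-scale family is `sum(log(diag(C)))` plus a constant, and it uses the standard closed-form solution of the prox subproblem. It is not the code in this PR, and the function names (`prox_entropy_scale`, `clip_scale`, `proximal_gradient_step`, `fit_gaussian_1d`) are hypothetical. The scalar `gamma` stands in for the stepsize that `DoG` or `DoWG` would produce: the gradient step is taken on the energy term only, and the entropy is handled by the prox with that same `gamma`.

```python
import numpy as np


def prox_entropy_scale(C_half, gamma):
    """Prox of h(C) = -sum_i log C[i, i], the negative entropy of a
    location-scale family with triangular scale C (up to a constant).

    Solves argmin_C gamma * h(C) + 0.5 * ||C - C_half||_F^2. The problem is
    separable: off-diagonal entries are returned unchanged, and each diagonal
    entry d becomes the positive root of c^2 - d*c - gamma = 0.
    """
    C = np.array(C_half, dtype=float, copy=True)
    d = np.diag(C).copy()
    # (d + sqrt(d^2 + 4*gamma)) / 2 is strictly positive for any real d when gamma > 0.
    np.fill_diagonal(C, 0.5 * (d + np.sqrt(d * d + 4.0 * gamma)))
    return C


def clip_scale(C_half, eps):
    """Projection-style alternative: clip the diagonal at a hand-picked lower bound eps."""
    C = np.array(C_half, dtype=float, copy=True)
    np.fill_diagonal(C, np.maximum(np.diag(C), eps))
    return C


def proximal_gradient_step(m, C, grad_m, grad_C, gamma):
    """One proximal-gradient step with a scalar stepsize gamma.

    grad_m and grad_C are gradients of the energy term E_q[-log p(z)] only;
    the entropy is not differentiated but handled by the prox.
    """
    m_new = m - gamma * grad_m
    C_half = np.tril(C - gamma * grad_C)  # keep the scale lower triangular
    return m_new, prox_entropy_scale(C_half, gamma)


def fit_gaussian_1d(mu, sigma, gamma, n_steps=100, m0=0.0, c0=1e-2):
    """Toy problem: target N(mu, sigma^2), family q = N(m, c^2).

    Energy term: E_q[(z - mu)^2 / (2 sigma^2)] = ((m - mu)^2 + c^2) / (2 sigma^2),
    so its exact gradients are (m - mu) / sigma^2 and c / sigma^2.
    Returns the final mean, the final scale, and the history of the scale.
    """
    m, C = np.array([m0]), np.array([[c0]])
    history = [C[0, 0]]
    for _ in range(n_steps):
        grad_m = (m - mu) / sigma**2
        grad_C = C / sigma**2
        m, C = proximal_gradient_step(m, C, grad_m, grad_C, gamma)
        history.append(C[0, 0])
    return m[0], C[0, 0], np.array(history)


if __name__ == "__main__":
    m, c, hist = fit_gaussian_1d(mu=2.0, sigma=1.0, gamma=1.5)
    print(f"m = {m:.6f}, c = {c:.6f}, smallest scale seen = {hist.min():.4f}")
```

In the toy problem, `gamma = 1.5` makes the gradient step alone push the scale negative (`c - 1.5 c = -0.5 c`), yet the prox always returns a positive value and the iterates converge to the exact posterior scale `c = 1`. Plain gradient descent on the full objective, including `-log c`, started from the same `c = 0.01` with the same stepsize, makes the scale negative after two steps; `clip_scale` would prevent that only through an explicit `eps`, which is exactly the arbitrary lower bound discussed above.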