
@Red-Portal (Member) commented Mar 14, 2025

This adds the proximal operator for the entropy of location-scale families, ProximalLocationScaleEntropy. It was proposed by J. Domke [1] and later analyzed theoretically and empirically by J. Domke and by myself [2, 3].
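The following is a minimal sketch of what such a proximal step can look like. It assumes the scale is a lower-triangular matrix C with a positive diagonal, so the entropy equals the sum of log C_ii plus a constant. Under that assumption the negative-entropy term only touches the diagonal. For a stepsize gamma, each diagonal entry v is replaced by the minimizer of -gamma log c + (c - v)^2 / 2, whose positive root is c = (v + sqrt(v^2 + 4 gamma)) / 2. The function name prox_neg_entropy and the list-of-lists matrix representation are illustrative assumptions, not AdvancedVI's API.

```python
import math


def prox_neg_entropy(scale, gamma):
    """Hypothetical helper: prox of g(C) = -sum_i log C[i][i] with stepsize gamma.

    `scale` is a lower-triangular matrix stored as a list of lists. Only the
    diagonal changes, because g depends on the diagonal alone.
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    out = [list(row) for row in scale]
    for i in range(len(out)):
        v = out[i][i]
        # Positive root of c**2 - v*c - gamma = 0, the stationarity condition
        # of -gamma*log(c) + (c - v)**2 / 2.
        out[i][i] = (v + math.sqrt(v * v + 4.0 * gamma)) / 2.0
    return out


# A diagonal entry that a gradient step has pushed below zero.
C = [[1.0, 0.0],
     [0.3, -0.2]]
C_prox = prox_neg_entropy(C, 0.1)
```

Even the negative entry -0.2 is mapped to a strictly positive value. The same holds for any positive gamma, so this step can never produce a singular scale.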

Proximal operators guarantee that the scale matrix never becomes singular, and in doing so they fix the limitations of projection operators (ClipScale). Most importantly, ClipScale requires an explicit lower bound on the posterior variance, and the choice of that bound is arbitrary. Even with a bound in place, a bound that is too loose can leave the algorithm unstable, depending on the initialization and the stepsize. I experimented with DoG and DoWG, the parameter-free optimization algorithms currently provided by AdvancedVI. Both tended to take very aggressive stepsizes, and ClipScale showed instabilities with them.
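For contrast, a projection in the spirit of ClipScale can be sketched as clamping each diagonal entry of the scale at a user-chosen floor epsilon. The function names clip_scale_diag and triangular_det and the list-of-lists representation are assumptions for illustration, not the actual ClipScale implementation. The result depends entirely on epsilon, and a loose (small) floor leaves a nearly singular scale.

```python
def clip_scale_diag(scale, epsilon):
    """Hypothetical projection: clamp the diagonal of a lower-triangular scale at epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    out = [list(row) for row in scale]
    for i in range(len(out)):
        # Project onto {C : C[i][i] >= epsilon}; entries above the floor are untouched.
        out[i][i] = max(out[i][i], epsilon)
    return out


def triangular_det(scale):
    # The determinant of a triangular matrix is the product of its diagonal.
    det = 1.0
    for i in range(len(scale)):
        det *= scale[i][i]
    return det


C = [[1.0, 0.0],
     [0.3, -0.2]]
C_clip = clip_scale_diag(C, 1e-8)
```

Here the clipped scale has determinant 1e-8, so the variational distribution is almost degenerate along one direction.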

In the context of Turing, combining ProximalLocationScaleEntropy with DoWG or DoG should provide a robust, tuning-free default setting for variational inference. (This is why I am working on this before the Turing integration.)

Proximal operators depend on the internals of the optimization algorithm in use. This is fairly straightforward for algorithms such as DoG and DoWG, which reduce everything to a scalar stepsize. For algorithms that use a vector-valued stepsize, things are less straightforward.
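The sketch below illustrates why a scalar stepsize keeps things simple. It performs one proximal step driven by a DoG-style stepsize, assuming the standard DoG rule: eta_t = rbar_t / sqrt(sum of squared gradient norms so far), where rbar_t is the largest distance from the initial point, floored at a small r_eps. The class name ProxDoGSketch, the flat parameter vector, and diag_idx (the positions of the scale diagonal) are hypothetical. Because eta is a single number, the same value is passed to the entropy prox for every diagonal entry.

```python
import math


class ProxDoGSketch:
    """Hypothetical proximal SGD step with a DoG-style scalar stepsize.

    `grad` passed to `step` is the gradient of the smooth part of the
    objective only (the negative ELBO without the entropy term); the
    entropy is handled by the prox on the entries listed in `diag_idx`.
    """

    def __init__(self, x0, diag_idx, r_eps=1e-6):
        self.x0 = list(x0)
        self.diag_idx = list(diag_idx)
        self.rbar = r_eps          # running max distance from x0, floored at r_eps
        self.grad_sq_sum = 0.0     # running sum of squared gradient norms

    def step(self, x, grad):
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, self.x0)))
        self.rbar = max(self.rbar, dist)
        self.grad_sq_sum += sum(g * g for g in grad)
        if self.grad_sq_sum == 0.0:
            return list(x), 0.0    # no gradient information yet; stay put
        eta = self.rbar / math.sqrt(self.grad_sq_sum)
        # Forward step on the smooth part with the scalar stepsize eta.
        y = [xi - eta * gi for xi, gi in zip(x, grad)]
        # Backward (prox) step for -sum(log diag) with the same eta.
        for i in self.diag_idx:
            y[i] = (y[i] + math.sqrt(y[i] ** 2 + 4.0 * eta)) / 2.0
        return y, eta


# Parameters: [location, scale diagonal]; start at location 0, scale 1.
opt = ProxDoGSketch(x0=[0.0, 1.0], diag_idx=[1], r_eps=1.0)
x_new, eta = opt.step([0.0, 1.0], [0.0, 10.0])
```

With these numbers, eta = 1 / sqrt(100) = 0.1. A plain gradient step would set the scale to exactly 0, whereas the prox returns sqrt(0.4) / 2, which is about 0.316. With a vector-valued stepsize there is no single eta to hand to the prox, so it is less obvious which value the prox should receive.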

Footnotes

  1. Domke, Justin. "Provable smoothness guarantees for black-box variational inference." International Conference on Machine Learning. PMLR, 2020.

  2. Domke, Justin, Robert Gower, and Guillaume Garrigos. "Provable convergence guarantees for black-box variational inference." Advances in Neural Information Processing Systems 36 (2023): 66289-66327.

  3. Kim, Kyurae, et al. "On the convergence of black-box variational inference." Advances in Neural Information Processing Systems 36 (2023): 44615-44657.

Red-Portal and others added 15 commits March 14, 2025 16:54
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@github-actions github-actions bot (Contributor) left a comment

Benchmark Results

| Benchmark suite | Current: d70a720 (ns) | Previous: 780b850 (ns) | Ratio (current / previous) |
|---|---|---|---|
| normal/RepGradELBO + STL/meanfield/Zygote | 15110617102 | 15940384416 | 0.95 |
| normal/RepGradELBO + STL/meanfield/ForwardDiff | 4646957534.5 | 4640506885.5 | 1.00 |
| normal/RepGradELBO + STL/meanfield/ReverseDiff | 3523940450 | 3656579468 | 0.96 |
| normal/RepGradELBO + STL/meanfield/Mooncake | 2272870111 | 2408256110 | 0.94 |
| normal/RepGradELBO + STL/fullrank/Zygote | 14905216684 | 15910940620 | 0.94 |
| normal/RepGradELBO + STL/fullrank/ForwardDiff | 5014498207 | 4715181430.5 | 1.06 |
| normal/RepGradELBO + STL/fullrank/ReverseDiff | 6735928269 | 6895395404 | 0.98 |
| normal/RepGradELBO + STL/fullrank/Mooncake | 2483745385 | 2577665723 | 0.96 |
| normal/RepGradELBO/meanfield/Zygote | 6648797440 | 7257838530 | 0.92 |
| normal/RepGradELBO/meanfield/ForwardDiff | 2647811104 | 2569067456.5 | 1.03 |
| normal/RepGradELBO/meanfield/ReverseDiff | 1512544613 | 1579425564 | 0.96 |
| normal/RepGradELBO/meanfield/Mooncake | 1948442109 | 2061556711 | 0.95 |
| normal/RepGradELBO/fullrank/Zygote | 6640526936 | 7307338895 | 0.91 |
| normal/RepGradELBO/fullrank/ForwardDiff | 2902638831.5 | 2729437724.5 | 1.06 |
| normal/RepGradELBO/fullrank/ReverseDiff | 2973548534 | 3059733066.5 | 0.97 |
| normal/RepGradELBO/fullrank/Mooncake | 2012759524 | 2151115288 | 0.94 |
| normal + bijector/RepGradELBO + STL/meanfield/Zygote | 22701617844 | 25051818691 | 0.91 |
| normal + bijector/RepGradELBO + STL/meanfield/ForwardDiff | 14022151391 | 14011956162 | 1.00 |
| normal + bijector/RepGradELBO + STL/meanfield/ReverseDiff | 4873726247 | 4959853898.5 | 0.98 |
| normal + bijector/RepGradELBO + STL/meanfield/Mooncake | 6950828279 | 7122368520 | 0.98 |
| normal + bijector/RepGradELBO + STL/fullrank/Zygote | 22738792674 | 24090366516 | 0.94 |
| normal + bijector/RepGradELBO + STL/fullrank/ForwardDiff | 13995483682 | 14120227537 | 0.99 |
| normal + bijector/RepGradELBO + STL/fullrank/ReverseDiff | 8654869292 | 8798444809 | 0.98 |
| normal + bijector/RepGradELBO + STL/fullrank/Mooncake | 7315743178 | 7408969022 | 0.99 |
| normal + bijector/RepGradELBO/meanfield/Zygote | 14635213555 | 14802935407 | 0.99 |
| normal + bijector/RepGradELBO/meanfield/ForwardDiff | 11738807732 | 11654832656 | 1.01 |
| normal + bijector/RepGradELBO/meanfield/ReverseDiff | 2694582297.5 | 2656304668 | 1.01 |
| normal + bijector/RepGradELBO/meanfield/Mooncake | 6589720309 | 6499853350 | 1.01 |
| normal + bijector/RepGradELBO/fullrank/Zygote | 14280821092 | 14809696663 | 0.96 |
| normal + bijector/RepGradELBO/fullrank/ForwardDiff | 11653369767 | 11421759227 | 1.02 |
| normal + bijector/RepGradELBO/fullrank/ReverseDiff | 4490560555.5 | 4414384964 | 1.02 |
| normal + bijector/RepGradELBO/fullrank/Mooncake | 6831675873 | 6972571483 | 0.98 |

This comment was automatically generated by workflow using github-action-benchmark.

Red-Portal and others added 3 commits March 14, 2025 19:07
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@sunxd3 (Member) commented Mar 17, 2025

Ah, sorry for the slow response on this! I'll take a look as soon as I get some free time (probably Wednesday).

@Red-Portal (Member, Author)

@sunxd3 @mhauru @yebai Could we move this forward?

@sunxd3 (Member) commented Mar 25, 2025

Oops, sorry for forgetting about this. I'll take a look tomorrow morning.

@sunxd3 (Member) left a comment

Generally looks good, good to merge from my end.

One minor code issue

@sunxd3 (Member) commented Mar 26, 2025

Small technical question: am I reading it correctly that AdvancedVI right now uses the linear parametrization?

Red-Portal and others added 4 commits March 28, 2025 14:38
…ng/AdvancedVI.jl into proximal_entropy_location_scale
Co-authored-by: Xianda Sun <5433119+sunxd3@users.noreply.github.com>
@Red-Portal (Member, Author) commented Mar 28, 2025

> Small technical question: am I reading it correctly that AdvancedVI right now uses the linear parametrization?

Yes, the default settings do, hence the need for ClipScale or ProximalLocationScaleEntropy. Users can still implement their own nonlinearly parameterized location-scale families if they wish.
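The sketch below shows the distinction for a single diagonal entry of the scale. Under the linear parametrization, the optimizer updates the entry directly, so one large step can make it zero or negative. A nonlinear parametrization instead optimizes an unconstrained value and maps it to a positive scale; softplus is used here purely as an example. All function names are hypothetical and unrelated to AdvancedVI's API.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x)).

    Positive in exact arithmetic; in floating point it only underflows to 0.0
    for very negative x (below roughly -745).
    """
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid(x):
    # Derivative of softplus, written to avoid overflow in math.exp.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def linear_step(scale, grad_scale, eta):
    # Linear parametrization: the diagonal entry itself is the parameter.
    return scale - eta * grad_scale


def nonlinear_step(raw, grad_scale, eta):
    # Nonlinear parametrization: scale = softplus(raw); the chain rule turns
    # the gradient w.r.t. scale into a gradient w.r.t. raw.
    new_raw = raw - eta * grad_scale * sigmoid(raw)
    return new_raw, softplus(new_raw)


raw0 = math.log(math.expm1(0.5))   # softplus(raw0) == 0.5 up to rounding
print(linear_step(0.5, 2.0, 0.5))  # -0.5: an invalid (negative) scale
print(nonlinear_step(raw0, 2.0, 0.5)[1])  # still positive
```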

@Red-Portal Red-Portal requested a review from sunxd3 March 28, 2025 18:53
@Red-Portal (Member, Author)

Hmm... it seems that mapreduce with Zygote is broken again.

@Red-Portal Red-Portal merged commit 8fdff72 into main Apr 29, 2025
12 of 16 checks passed
@Red-Portal Red-Portal deleted the proximal_entropy_location_scale branch April 29, 2025 15:41