-
Notifications
You must be signed in to change notification settings - Fork 19
Remove the type ParamSpaceSGD
#205
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Hi @yebai , could you check to make sure that this is what you asked for? Personally, I feel the |
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Red-Portal -- I left some comments above. In addition, let's simplify the folder structure a bit for clarity:
- move all files in
paramsspacesgdtoalgorithms, eg, "algorithms/paramspacesgd/constructors.jl" to "algorithms/constructors.jl" - keep each algorithm in its own file
Also, I'd suggest we consider renaming paramspacesgd.jl to interface.jl or something along the lines:
- "algorithms/paramspacesgd/paramspacesgd.jl" to "algorithms/interface.jl"
|
Hi Hong, I planned to do the restructuring in a separate PR to keep things simple in this one. Though:
After the release of v0.5, we'll be adding algorithms that don't conform to the original |
|
It is okay to keep all algorithms under
You can add more algorithms to |
I am saying that these new algorithms can't be grouped in |
|
"grouping" refers to grouping interface code together for similar VI algorithms in the proposed |
|
If I understand right, this PR flattens the existing Before we drop it, may I understand what concrete benefits the flattening delivers? In particular, are we planning to add other algorithm families alongside the current At the moment, I haven't quite convince myself that the flattening of the type hierarchy is necessary. |
|
I suggested the removal of the |
|
@yebai @sunxd3 Thanks for chiming in. Actually, I have a new idea. So I believe the main complaint at the moment is that the term In a nutshell, the nice thing about the current
or something along these lines? Would that resolve your concern? |
|
I think I see your point. But, I am not sure that helps.
That probably includes every learning algorithm in ML. |
Yes, that is indeed almost true! But the point is that there are a couple of important algorithms that don't quite conform to this formalism, as they result in a custom update rule. They don't fall out of a gradient estimator, but modify the parameter update step too. So this is the reason I wish to allow for two different abstraction levels. But as you said, most algorithms only require defining a gradient estimator. So the lower-level interface helps unify the code for all those algorithms. |
We are at risk of premature abstraction and introducing heuristic terminology here. It is better to work with concrete algorithms, and define a union type if sharing code is needed (eg, There might be some insights we can learn by taking a unifying view of parameter space gradient descent VI, but that is a discussion we should have offline for a review paper. |
My main beef with using Unions here is the following:
With that said, do you find the solution below still unsatisfying? At least I hope that this resolves your concern that the terminology is non-standard.
If you think we should still go with an implicit interface, then I'll follow for the sake of moving forward. |
|
Hi @yebai , what is your final verdict on what we should do here given my last comment above? |
|
Hi @ Red-Portal, I don't think the proposed interfaces are better. For clarity, I'd suggest we remove Happy to continue the discussion offline and explore whether parameter space can be useful for novel algorithmic insights. |
…VI.jl into remove_paramspacesgd
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
|
Hi @yebai , I applied all the changes you requested. Let me know if you're satisfied with the code now. Updating the docs will be separate since it will be a lot. (And painful on my end 😅 ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many thanks, @Red-Portal, for the contributions. I'm happy with the changes; @penelopeysm / @mhauru might want to review this too.
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Benchmark Results
| Benchmark suite | Current: 7d3ed86 | Previous: 9b2eabb | Ratio |
|---|---|---|---|
normal/RepGradELBO + STL/meanfield/Zygote |
3689036472.5 ns |
3990679915 ns |
0.92 |
normal/RepGradELBO + STL/meanfield/ReverseDiff |
970375750 ns |
1124955886 ns |
0.86 |
normal/RepGradELBO + STL/meanfield/Mooncake |
913048608 ns |
1261092892.5 ns |
0.72 |
normal/RepGradELBO + STL/fullrank/Zygote |
3670088355.5 ns |
3971814683.5 ns |
0.92 |
normal/RepGradELBO + STL/fullrank/ReverseDiff |
1489955135.5 ns |
1621498433.5 ns |
0.92 |
normal/RepGradELBO + STL/fullrank/Mooncake |
972304504 ns |
1285149814.5 ns |
0.76 |
normal/RepGradELBO/meanfield/Zygote |
2365990011 ns |
2859724373.5 ns |
0.83 |
normal/RepGradELBO/meanfield/ReverseDiff |
660842017.5 ns |
777886417 ns |
0.85 |
normal/RepGradELBO/meanfield/Mooncake |
799140056 ns |
1137178738 ns |
0.70 |
normal/RepGradELBO/fullrank/Zygote |
2401497236 ns |
2864869952 ns |
0.84 |
normal/RepGradELBO/fullrank/ReverseDiff |
831429492.5 ns |
954006564 ns |
0.87 |
normal/RepGradELBO/fullrank/Mooncake |
842196960.5 ns |
1144634418 ns |
0.74 |
normal + bijector/RepGradELBO + STL/meanfield/Zygote |
5641882447 ns |
5743469386 ns |
0.98 |
normal + bijector/RepGradELBO + STL/meanfield/ReverseDiff |
1944678817 ns |
2419710523 ns |
0.80 |
normal + bijector/RepGradELBO + STL/meanfield/Mooncake |
3867940233 ns |
4227822021.5 ns |
0.91 |
normal + bijector/RepGradELBO + STL/fullrank/Zygote |
5657272286 ns |
5787110379 ns |
0.98 |
normal + bijector/RepGradELBO + STL/fullrank/ReverseDiff |
2646661599.5 ns |
3116964174 ns |
0.85 |
normal + bijector/RepGradELBO + STL/fullrank/Mooncake |
3997669962.5 ns |
4366946807.5 ns |
0.92 |
normal + bijector/RepGradELBO/meanfield/Zygote |
4058043520 ns |
4464663738.5 ns |
0.91 |
normal + bijector/RepGradELBO/meanfield/ReverseDiff |
1570570758 ns |
2102610294 ns |
0.75 |
normal + bijector/RepGradELBO/meanfield/Mooncake |
3674711961 ns |
4074397467 ns |
0.90 |
normal + bijector/RepGradELBO/fullrank/Zygote |
4150263319.5 ns |
4560198939 ns |
0.91 |
normal + bijector/RepGradELBO/fullrank/ReverseDiff |
1831914975 ns |
2331323017 ns |
0.79 |
normal + bijector/RepGradELBO/fullrank/Mooncake |
3821471618.5 ns |
4175468965.5 ns |
0.92 |
This comment was automatically generated by workflow using github-action-benchmark.
|
hi @yebai @Red-Portal I'm a bit ill now but if you give me a few days I'll be happy to take a look, if it's not a rush |
|
@penelopeysm Sure thing, please take care! |
| return ( | ||
| prob=prob, | ||
| q=q_init, | ||
| iteration=0, | ||
| grad_buf=grad_buf, | ||
| opt_st=opt_st, | ||
| obj_st=obj_st, | ||
| avg_st=avg_st, | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only have one comment, which is that it's probably better to keep this as a struct (although I recognise you may have some difficulty in choosing a name for it). NamedTuples are too flexible, if you return one you don't know what fields are in it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I propose we stick with this for now and improve this later once we can make things more organized. (The new structure in the PR is making a whole lot of things implicit anyway, so I think the fields being implicit doesn't make things much worse)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I'm not that fussed.
|
@yebai @penelopeysm Are we good to go? edit: Penny told me that this is a yes. |
…VI.jl into remove_paramspacesgd
|
AdvancedVI.jl documentation for PR #205 is available at: |
This PR removes the use of the type
ParamSpaceSGD, which provides a unifying implementation of VI algorithms that run SGD in parameter space. Instead, each parameter space SGD-based VI algorithm becomes its ownAbstractVariationalAlgorithm, where the shared code implementingstepis shared by dispatching over theirUnion.This addresses #204