New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unifying trajectories #214
Conversation
As mentioned in #213 (comment):
This ensures that this PR doesn't change the functionality at all. The regression test in this PR is added in 31c6517. |
Codecov Report
@@ Coverage Diff @@
## master #214 +/- ##
==========================================
- Coverage 89.13% 87.86% -1.27%
==========================================
Files 16 16
Lines 672 676 +4
==========================================
- Hits 599 594 -5
- Misses 73 82 +9
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reviewed roughly half of this PR tonight. I'll review the rest tomorrow.
@@ -0,0 +1,63 @@ | |||
using Test, Random, BSON, AdvancedHMC, Distributions, ForwardDiff | |||
|
|||
is_ef6de39 = isdefined(AdvancedHMC, :EndPointTS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A better way of performing this regression test is to check out a pinned version of AHMC, run the tests on CI machines with the pinned version and the current version. Then we can compare results without worrying machine-specific details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I agree. I don't know how to do that though. Can you point me some example CI codes on doing so?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BenchmarkTools
allows users to write tests, and run regression tests automatically. Our use here is a bit special, but the principles are the same I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you point me an example? I only found example where they are looking at regression in time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my view, no single version of the code should be considered the ground truth. Rather, we can only be confident that features covered by the test suite work correctly. Hence instead of checking out a previous version to test against, it's better to add any missing tests, ensuring they pass on both the old version and the new version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For performance regressions, you can use PkgBenchmark (see e.g. https://github.com/JuliaFolds/FLoops.jl/blob/master/.github/workflows/benchmark.yml), but I don't think that checks the outputs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally, we can keep PRs small, i.e. one PR for a change/feature. For big changes like this PR, we should try to do it in multiple steps. It would make reviewing code much easier, which leads to much quicker merging too.
Thanks, @xukai92. I've done one full pass of this PR; it looks overall good to me. I left many comments above. |
Co-authored-by: Hong Ge <hg344@cam.ac.uk>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xukai92 Pls ping me when this is ready for another review.
Co-authored-by: Hong Ge <hg344@cam.ac.uk>
This reverts commit 7c5d344.
@yebai This is ready for another look. Two things I didn't address are:
Also, do you think we should introduce a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an excellent refactor! I had a few minor questions/suggestions. Some of them are not necessarily on changes in this PR and don't need to be handled here, but they came up while I was reading.
end | ||
|
||
const NoUTurn = GeneralisedNoUTurn |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the purpose of this alias?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to make the transition of calling the old ones ClassificNoUTurn
and new ones just NoUTurn
and StrictNoUTurn
, i.e. by default we refer to the generalised ones if not specified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps it would make sense to go the other way around, then? Define NoUTurn
and then make GeneralisedNoUTurn
the alias so that users' legacy code works?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There were some debates between me and Hong on if we want to make this transition, and we ended up with this compromise. So with the PR, the recommended way is still the previous version. And we want to use the shorter version privately for a while before making the transition.
|
||
Detect U turn for two phase points (`zleft` and `zright`) under given Hamiltonian `h` | ||
using the (original) no-U-turn cirterion. | ||
|
||
Ref: https://arxiv.org/abs/1111.4246, https://arxiv.org/abs/1701.02434 | ||
""" | ||
function isterminated(h::Hamiltonian, t::BinaryTree{<:ClassicNoUTurn}) | ||
function isterminated(::ClassicNoUTurn, h::Hamiltonian, t::BinaryTree) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that this should probably be in another PR.
@@ -587,79 +531,67 @@ function isterminated(h::Hamiltonian, t::BinaryTree{<:ClassicNoUTurn}) | |||
end | |||
|
|||
""" | |||
isterminated(h::Hamiltonian, t::BinaryTree{<:GeneralisedNoUTurn}) | |||
$(SIGNATURES) | |||
|
|||
Detect U turn for two phase points (`zleft` and `zright`) under given Hamiltonian `h` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like this docstring is outdated, since zleft
and zright
appear nowhere in the signature or body of either method.
""" | ||
function isterminated(tc::StrictGeneralisedNoUTurn, h, t, tleft, tright) | ||
# Step 0: original generalised U-turn check | ||
s1 = isterminated(tc, h, t) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to return s1 if terminated, else s2 if terminated, else s3? Seems if s1 indicates is terminated, then we potentially do 3 times the work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good idea! Can you do that in your NUTS PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure!
) where {T<:BinaryTree{<:StrictGeneralisedNoUTurn}} | ||
rho = tleft.c.rho + tright.zleft.r | ||
function check_left_subtree(h::Hamiltonian, t::T, tleft::T, tright::T) where {T<:BinaryTree} | ||
rho = tleft.rho + tright.zleft.r |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be nice if we could avoid this allocation. Performing two extra dot
s in generalised_uturn_criterion
should be cheaper than performing this allocation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy to make the proposed change if some benchmark shows it improves the performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's a benchmark of one with the same time complexity:
julia> using LinearAlgebra, BenchmarkTools, Plots
julia> function foo(a, b, c, d)
e = a + b
f = c + d
dot(e, f)
end;
julia> function foo2(a, b, c, d)
dot(a, c) + dot(a, d) + dot(b, c) + dot(b, d)
end;
julia> ns = 10 .^ (0:6);
julia> times = map(ns) do n
a, b, c, d = ntuple(_ -> randn(n), 4)
(@belapsed($foo($a,$b,$c,$d)), @belapsed($foo2($a,$b,$c,$d)))
end;
julia> plot(ns, [first.(times) last.(times)]; xscale=:log10, yscale=:log10, labels=["allocating" "non-allocating"])
I can include it in a more general allocation-reducing PR though.
) where {T<:BinaryTree{<:StrictGeneralisedNoUTurn}} | ||
rho = tleft.zright.r + tright.c.rho | ||
function check_right_subtree(h::Hamiltonian, t::T, tleft::T, tright::T) where {T<:BinaryTree} | ||
rho = tleft.zright.r + tright.rho |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same point here as above.
Co-authored-by: Seth Axen <seth.axen@gmail.com>
Co-authored-by: Seth Axen <seth.axen@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @xukai92, I've done another careful pass of this PR and left some comments. There is now some merge conflicts against the master
branch. Maybe consider fixing that too.
Overall, I find reviewing this PR a bit painful ; ) I hope we can keep PRs small and incremental in the future given that the stake of introducing bugs is high. For a concrete example, the refactoring of Trajectories
, TerminationCriteria
, and the introduction of HMCKernel
each would deserve its own PR...
|
||
struct HMC{TS} end | ||
HMC{TS}(int::AbstractIntegrator, L) where {TS} = HMCKernel(Trajectory(int, FixedNSteps(L)), TS) | ||
HMC(int::AbstractIntegrator, L) = HMC{MetropolisTS}(int, L) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe consider the following for clarity and performance?
HMC(int::AbstractIntegrator, L) = HMC{MetropolisTS}(int, L) | |
HMC(int::AbstractIntegrator, L) = HMCKernel(Trajectory(int, FixedNSteps(L)), MetropolisTS) |
struct HMC{TS} end | ||
HMC{TS}(int::AbstractIntegrator, L) where {TS} = HMCKernel(Trajectory(int, FixedNSteps(L)), TS) | ||
HMC(int::AbstractIntegrator, L) = HMC{MetropolisTS}(int, L) | ||
HMC(ϵ::AbstractScalarOrVec{<:Real}, L) = HMC{MetropolisTS}(Leapfrog(ϵ), L) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HMC(ϵ::AbstractScalarOrVec{<:Real}, L) = HMC{MetropolisTS}(Leapfrog(ϵ), L) | |
HMC(ϵ::AbstractScalarOrVec{<:Real}, L) = HMCKernel(Trajectory(Leapfrog(ϵ), FixedNSteps(L)), MetropolisTS) |
|
||
struct StaticTrajectory{TS} end | ||
@deprecate StaticTrajectory{TS}(args...) where {TS} HMC{TS}(args...) | ||
@deprecate StaticTrajectory(args...) HMC(args...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar here, consider calling HMCKernel
for clarity.
@@ -30,12 +30,48 @@ stat(t::Transition) = t.stat | |||
""" | |||
Abstract Markov chain Monte Carlo proposal. | |||
""" | |||
abstract type AbstractProposal end | |||
abstract type AbstractKernel end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe consider the following for clarity?
abstract type AbstractKernel end | |
abstract type AbstractMCMCKernel end |
H′ = energy(z′) | ||
ΔH = H′ - H0 | ||
α′ = exp(min(0, -ΔH)) | ||
sampler′ = S(sampler, H0, z′) | ||
return BinaryTree(z′, z′, C(z′), α′, 1, ΔH), sampler′, Termination(sampler′, nt, H0, H′) | ||
sampler′ = TS(sampler, H0, z′) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider using reconstruct
here for consistency and clarity.
h::Hamiltonian, | ||
τ::Trajectory{I, C}, | ||
::Type{TS}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally, we should avoid passing this ::Type{TS}
around.
@@ -0,0 +1,63 @@ | |||
using Test, Random, BSON, AdvancedHMC, Distributions, ForwardDiff | |||
|
|||
is_ef6de39 = isdefined(AdvancedHMC, :EndPointTS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally, we can keep PRs small, i.e. one PR for a change/feature. For big changes like this PR, we should try to do it in multiple steps. It would make reviewing code much easier, which leads to much quicker merging too.
Ps. I now feel the introduction of |
Thanks for taking another pass. I will address the comments soon.
Sorry about this. I'm actually learning towards separating it. I will work on extracting out the first PR after cooperating your suggestions.
I completely agree that if we remove
But maybe I could just introduce it later when introducing the modularity of momentum, as we are agreeing on making separate PRs now.
Yes I see that these two steps are coupled for dynamics trajectories; it is exactly the reason we are dispatching on termination and trajectory sampler together, right? Not sure what do you mean by "inaccurate" here. I still think they (trajectory and its sampler) are conceptually two concepts; it's just
|
Co-authored-by: Hong Ge <hg344@cam.ac.uk>
This PR replaces #213.
This PR solves #103 by explicitly separating
Trajectory
fromHMCKernel
,without introducing any functionality change.
I left some comments on how to unify the notation as discussed in #48,
in which I use
τ
to refer a (numerical) Hamiltonian trajectory andκ
to refer a MCMC kernel.The remaining would be done in a separate PR to make the review easier.
As a showcase, below is how to construct NUTS now:
Old syntax is still supported e.g. as below
As suggested by #103 (comment),
fixed simulation steps or length are unified as termination criteria.
A nice benefit of this is that the parameters regarding trajectory simulation are clear:
e.g.
FixedNSteps(10)
,NoUTurn(5, 1_000)
orNoUTurn(max_depth=5)
.As a result, the main
transition
function looks like this.As a side product of this new interface,
it also unifies how integrators are jittered - all before simulating the trajectory in the same place where momentum variables are refreshed.
With this design, this is the only place where
jitter
is called.A list of changes is
GeneralisedNoUTurn
->NoUTurn
andStrictGeneralisedNoUTurn
->StrictNoUTurn
NoUTurn
are in the same role asFixedNSteps
now. To unify the design, they are only intended to store the algorithm parameters, e.g. number of steps, maximum depth, etc.