DynamicHMC is (~5 times) faster than Turing #559

yebai opened this Issue Sep 22, 2018 · 2 comments


yebai commented Sep 22, 2018

Time taken to collect 2 million samples:

  • Turing.NUTS: 3318.293540 seconds (33.35 G allocations: 1.299 TiB, 29.35% gc time)
  • Turing.DynamicNUTS: 848.741643 seconds (7.86 G allocations: 251.076 GiB, 32.13% gc time)

using the following code:

using DynamicHMC, Turing, Test

@model gdemo(x, y) = begin
  s ~ InverseGamma(2, 3)
  m ~ Normal(0, sqrt(s))
  x ~ Normal(m, sqrt(s))
  y ~ Normal(m, sqrt(s))
  return s, m
end

mf = gdemo(1.5, 2.0)

@time chn1 = sample(mf, DynamicNUTS(2000000));

@time chn2 = sample(mf, Turing.NUTS(2000000, 0.65));


skanskan commented Oct 6, 2018

What about Stan and Klara?



tpapp commented Oct 7, 2018

This is how I would code the above in DynamicHMC 1.0:

using DynamicHMC, TransformVariables, LogDensityProblems, Distributions

struct GaussianDemo{T,P}
    x::T
    y::T
    s_prior::P
end

function (p::GaussianDemo)((m, s))
    d = Normal(m, √s)
    logpdf(d, p.x) + logpdf(d, p.y) + logpdf(p.s_prior, s) + logpdf(Normal(0, √s), m)
end

p = GaussianDemo(1.5, 2.0, InverseGamma(2, 3))
P = TransformedLogDensity(as((m = asℝ, s = asℝ₊)), p)
∇P = ForwardDiffLogDensity(P)
@time chain, tuned = NUTS_init_tune_mcmc(∇P, 10_000)

which on my laptop yields

  0.430758 seconds (3.74 M allocations: 211.077 MiB, 18.41% gc time)

Note that I reduced the sample size, since

julia> using MCMCDiagnostics
julia> mapslices(effective_sample_size, get_position_matrix(chain); dims = 1)
1×2 Array{Float64,2}:
 3841.64  4080.44

so I don't think you need 2 million samples. Finally, for real-world problems with higher dimensions, I would have used

∇P = FluxGradientLogDensity(P)

since reverse-mode AD is faster as the dimension increases (YMMV).
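As an aside, the effective-sample-size numbers quoted above come from MCMCDiagnostics; the idea can be sketched in a few lines of plain Julia. This is a minimal single-chain estimate (not MCMCDiagnostics' implementation), which sums empirical autocorrelations until the first non-positive one; the function name `ess_estimate` is made up for this sketch:

```julia
# Hedged sketch: estimate effective sample size of one chain from its
# empirical autocorrelations, truncating the sum at the first
# non-positive autocorrelation.
function ess_estimate(x::AbstractVector)
    n = length(x)
    μ = sum(x) / n
    c0 = sum(abs2, x .- μ) / n          # lag-0 autocovariance (variance)
    τ = 1.0                              # integrated autocorrelation time
    for k in 1:(n - 1)
        ck = sum((x[i] - μ) * (x[i + k] - μ) for i in 1:(n - k)) / n
        ρ = ck / c0
        ρ ≤ 0 && break                   # truncate once correlations die out
        τ += 2ρ
    end
    return n / τ                         # ESS = n / τ
end
```

For nearly independent draws this returns roughly the chain length, while for a strongly autocorrelated chain (e.g. a random walk) it returns far fewer: a 10,000-draw chain whose ESS is already in the thousands gains little from 2 million samples.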
