-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
@benchmark
on ReverseDiff.gradient!
gives a shorter time than the original function
#102
Comments
Are the output results correct, or is ReverseDiff giving the wrong answer in your case? |
Yes I just checked - the answers are the same. I also improve the type stability suggested by Chris, and the first one runs faster than before, though still slower than ReverseDiff.jl one. See below type LP2{T<:Union{Float64,ReverseDiff.TrackedReal}}
lp::T
end
x = rand(1000)
lp = LP2(0.0)
f2(x) = begin
lp.lp = logpdf(Normal(0, 1), x[1])
for i = 2:length(x)
lp.lp += logpdf(Normal(x[i-1], 1), x[i])
end
lp.lp
end
f_tape2 = GradientTape(f2, (x))
compiled_f_tape2 = compile(f_tape2)
inputs = (x)
results = (similar(x))
all_results = DiffResults.GradientResult(results)
cfg = GradientConfig(inputs)
f2(x) # => -1004.707726395706
@benchmark f2(x)
# BenchmarkTools.Trial:
# memory estimate: 62.48 KiB
# allocs estimate: 3999
# --------------
# minimum time: 287.937 μs (0.00% GC)
# median time: 295.351 μs (0.00% GC)
# mean time: 367.426 μs (1.16% GC)
# maximum time: 2.283 ms (76.92% GC)
# --------------
# samples: 10000
# evals/sample: 1
@benchmark ReverseDiff.gradient!(all_results, compiled_f_tape2, x)
# BenchmarkTools.Trial:
# memory estimate: 0 bytes
# allocs estimate: 0
# --------------
# minimum time: 184.500 μs (0.00% GC)
# median time: 187.636 μs (0.00% GC)
# mean time: 188.634 μs (0.00% GC)
# maximum time: 545.807 μs (0.00% GC)
# --------------
# samples: 10000
# evals/sample: 1
all_results.value # => -1004.707726395706 Do you think it might be the case when passing variables as ReverseDiff.TrackedReal it might somehow resolve the type-instability thing in the original program? |
I've only skimmed your results rather than trying it out myself, but compiling a ReverseDiff tape can a) precompute dispatch and b) preallocate/reuse buffers for the execution trace, such that executing the tape (which calculates both the original value and gradients) can be faster than just executing the target code on it's own. I've thought in the past about just making a generic "accelerator" package using this taping mechanism that doesn't do AD, but never got around to it. Closing this since there doesn't seem to be a problem here, but thanks for sharing! |
The text was updated successfully, but these errors were encountered: