
@benchmark on ReverseDiff.gradient! gives a shorter time than the original function #102

Closed
xukai92 opened this issue Feb 22, 2018 · 3 comments

Comments


xukai92 commented Feb 22, 2018

using Distributions, ReverseDiff, BenchmarkTools
using ReverseDiff: GradientTape, compile

mutable struct LP
    lp::Union{Float64,ReverseDiff.TrackedReal}
end

x = rand(1000)
lp = LP(0.0)

f2(x) = begin
    lp.lp = 0.0
    lp.lp = logpdf(Normal(0, 1), x[1])
    for i = 2:length(x)
        lp.lp += logpdf(Normal(x[i-1], 1), x[i])
    end
    lp.lp
end

f_tape2 = GradientTape(f2, x)
compiled_f_tape2 = compile(f_tape2)
inputs = x
results = similar(x)

@benchmark f2(x) # => mean = 1.336 ms
@benchmark ReverseDiff.gradient!(results, compiled_f_tape2, x) # => mean = 208.294 μs
jrevels (Member) commented Feb 22, 2018

Are the output results correct, or is ReverseDiff giving the wrong answer in your case?

xukai92 (Author) commented Feb 24, 2018

Yes, I just checked: the answers are the same. I also improved the type stability as suggested by Chris, and the first version now runs faster than before, though still slower than the ReverseDiff.jl one. See below.

using Distributions, ReverseDiff, BenchmarkTools, DiffResults
using ReverseDiff: GradientTape, GradientConfig, compile

mutable struct LP2{T<:Union{Float64,ReverseDiff.TrackedReal}}
    lp::T
end

x = rand(1000)
lp = LP2(0.0)

f2(x) = begin
    lp.lp = logpdf(Normal(0, 1), x[1])
    for i = 2:length(x)
        lp.lp += logpdf(Normal(x[i-1], 1), x[i])
    end
    lp.lp
end

f_tape2 = GradientTape(f2, x)
compiled_f_tape2 = compile(f_tape2)
inputs = x
results = similar(x)
all_results = DiffResults.GradientResult(results)
cfg = GradientConfig(inputs)

f2(x) # => -1004.707726395706

@benchmark f2(x)

# BenchmarkTools.Trial: 
#   memory estimate:  62.48 KiB
#   allocs estimate:  3999
#   --------------
#   minimum time:     287.937 μs (0.00% GC)
#   median time:      295.351 μs (0.00% GC)
#   mean time:        367.426 μs (1.16% GC)
#   maximum time:     2.283 ms (76.92% GC)
#   --------------
#   samples:          10000
#   evals/sample:     1

@benchmark ReverseDiff.gradient!(all_results, compiled_f_tape2, x)

# BenchmarkTools.Trial: 
#   memory estimate:  0 bytes
#   allocs estimate:  0
#   --------------
#   minimum time:     184.500 μs (0.00% GC)
#   median time:      187.636 μs (0.00% GC)
#   mean time:        188.634 μs (0.00% GC)
#   maximum time:     545.807 μs (0.00% GC)
#   --------------
#   samples:          10000
#   evals/sample:     1

all_results.value # => -1004.707726395706

Do you think it might be the case that passing variables as ReverseDiff.TrackedReal somehow resolves the type instability in the original program?
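As an aside, the effect of a Union-typed field can be reproduced in isolation. This is a minimal sketch in plain Julia with hypothetical `Loose`/`Tight` types (unrelated to ReverseDiff): a field declared as a `Union` forces dynamic dispatch on every access, while a parametric field is concrete once the type parameter is fixed.

```julia
mutable struct Loose           # analogous to the original LP
    lp::Union{Float64,Int}     # Union-typed field: accesses are type-unstable
end

mutable struct Tight{T<:Real}  # analogous to LP2{T}
    lp::T                      # concrete once T is fixed at construction
end

# Same accumulation loop for both; only the field type differs.
function accum!(s, xs)
    for v in xs
        s.lp += v
    end
    return s.lp
end

xs = collect(1.0:1000.0)
accum!(Loose(0.0), xs)  # => 500500.0, but dispatches dynamically in the loop
accum!(Tight(0.0), xs)  # => 500500.0, fully type-stable
```

Running `@code_warntype accum!(Loose(0.0), xs)` highlights the `Union` in red, while the `Tight` version infers cleanly, which is one way to check whether the instability is really gone.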

jrevels (Member) commented Feb 24, 2018

I've only skimmed your results rather than trying it out myself, but compiling a ReverseDiff tape can (a) precompute dispatch and (b) preallocate and reuse buffers for the execution trace, such that executing the tape (which calculates both the original value and the gradient) can be faster than just executing the target code on its own.
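That taping idea can be sketched in miniature. The `Instruction`/`Tape`/`replay!` names below are hypothetical, not ReverseDiff's actual internals: each primitive operation is recorded once, with preallocated slots for its inputs and output, so replaying the tape just re-executes the recorded ops in order with no dispatch decisions or allocation in the hot loop.

```julia
struct Instruction
    op::Function   # the recorded primitive, e.g. + or *
    in1::Int       # index of the first input slot
    in2::Int       # index of the second input slot
    out::Int       # index of the output slot
end

struct Tape
    slots::Vector{Float64}        # preallocated value buffer, reused every replay
    instrs::Vector{Instruction}   # the recorded execution trace
end

# Re-execute the recorded trace in order; no allocation, no control flow
# beyond the loop itself.
function replay!(t::Tape)
    for ins in t.instrs
        t.slots[ins.out] = ins.op(t.slots[ins.in1], t.slots[ins.in2])
    end
    return t.slots[end]
end

# Record (a + b) * b with a = 2.0, b = 3.0; slots are [a, b, a+b, (a+b)*b].
tape = Tape([2.0, 3.0, 0.0, 0.0],
            [Instruction(+, 1, 2, 3), Instruction(*, 3, 2, 4)])
replay!(tape)  # => 15.0
```

In the real package the trace additionally stores adjoints so the reverse pass can run over the same buffers, which is why one tape execution yields both the value and the gradient.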

I've thought in the past about just making a generic "accelerator" package using this taping mechanism that doesn't do AD, but never got around to it.

Closing this since there doesn't seem to be a problem here, but thanks for sharing!
