Replace eval by generated function #119
Conversation
Benchmarked on Turing.jl/test/stdlib/RandomMeasures.jl:

```
julia> @benchmark run_chain() # eval
BenchmarkTools.Trial: 1 sample with 1 evaluation.
 Single result which took 15.231 s (1.52% GC) to evaluate,
 with a memory estimate of 679.26 MiB, over 9452913 allocations.

julia> @benchmark run_chain() # splatnew
BenchmarkTools.Trial: 1 sample with 1 evaluation.
 Single result which took 14.665 s (1.55% GC) to evaluate,
 with a memory estimate of 655.23 MiB, over 8972863 allocations.

julia> @benchmark run_chain() # eval
BenchmarkTools.Trial: 1 sample with 1 evaluation.
 Single result which took 16.465 s (1.41% GC) to evaluate,
 with a memory estimate of 679.21 MiB, over 9452903 allocations.

julia> @benchmark run_chain() # splatnew
BenchmarkTools.Trial: 1 sample with 1 evaluation.
 Single result which took 15.409 s (1.50% GC) to evaluate,
 with a memory estimate of 655.34 MiB, over 8972879 allocations.
```

So about 3.5% runtime reduction.
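For context, a minimal sketch of the two construction paths being compared. This is only a sketch under the assumption that the PR swaps an `eval` of `Expr(:new, ...)` for a generated `__new__` built on `:splatnew`; `construct_via_eval` and `Point` are made-up illustrations, not Libtask code, and the actual definitions may differ.

```
# Sketch only; the actual Libtask definitions may differ.

# Old approach: splice the type and the field values into an
# Expr(:new, ...) and eval it on every call.
construct_via_eval(T, fields) = eval(Expr(:new, T, fields...))

# New approach: a generated function emitting a :splatnew expression, so
# the construction is compiled once per combination of argument types.
@generated function __new__(T, args)
    return Expr(:splatnew, :T, :args)
end

# Made-up example type to exercise both paths.
struct Point
    x::Int
    y::Int
end

construct_via_eval(Point, (1, 2))  # Point(1, 2) via eval
__new__(Point, (1, 2))             # Point(1, 2) via the compiled path
```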
It seems there are two orthogonal changes in this PR:

- Using `splatnew` instead of `new`
- Using `@generated` instead of `eval`

Regarding the first one, a potential issue is that `splatnew` is very low-level, and hence its behaviour and implementation might change at any time.

Regarding the second one, I wonder how this change affects (pre-)compilation times. The `@generated` function has to be compiled for every new combination of types of `T` and `args`, and generally there are more restrictions on `@generated` functions (and workarounds/extensions such as https://github.com/JuliaStaging/GeneralizedGenerated.jl have some major problems).

I also wonder if it is a good idea to add a type annotation to the generated function. If the constructor is type stable it should not be needed, and if it is not, it would cause errors in some cases.
That critique holds for the original implementation as well.
Good point. For the test from RandomMeasures.jl:

```
julia> run_chain();

julia> methodinstances(Libtask.__new__)
MethodInstance for Libtask.__new__(::Type{var"#1#2"}, ::Tuple{Box})
MethodInstance for Libtask.__new__(::Any, ::Tuple{Any})
```

And the number of calls to the …
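A sketch of how this check can be reproduced, assuming `methodinstances` is the helper exported by the MethodAnalysis.jl package and `run_chain` is the test helper from RandomMeasures.jl used above:

```
# Sketch: list the specializations of Libtask.__new__ compiled while
# running the test. Assumes `methodinstances` comes from MethodAnalysis.jl
# and `run_chain` is defined by the RandomMeasures.jl test setup.
using MethodAnalysis
using Libtask

run_chain()
methodinstances(Libtask.__new__)
```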
Okay. Deleted it.
Based on the Action run times, I need to investigate further to double-check this.
It turned out that the slow GitHub Action runs were fake news. This PR is a strict improvement on the running time in all non-trivial cases; only when the number of samples is really low do the compile-time costs outweigh the running-time benefits. For the Turing tests, this PR is a strict improvement, as is shown below. To narrow down what exactly is happening, I've compared the information produced by

```
@time output = let
    expr = Expr(:new, map(val, instr.input)...)
    output = eval(expr)
end
```

to

```
@time output = let
    input = map(val, instr.input)
    T = input[1]
    args = input[2:end]
    output = __new__(T, args)
end
```

For the RandomMeasures.jl test, I only show the first and the last information printed to standard out, to show the difference between compilation time and running time.
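On recent Julia versions `@time` reports the share of the measured time that was spent on compilation, which is the compilation-versus-running-time distinction being made here. A toy sketch with a made-up function `g`:

```
# Toy sketch: the first call to a newly defined function includes
# compilation time; later calls with the same argument types do not.
g(x) = x + 1

@time g(1)   # first call: mostly compilation
@time g(2)   # second call: essentially only the run time
```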
This is an improvement because this part of the code is hit 15 000 times for 500 samples in the RandomMeasures.jl test. The 100 times quicker running time makes sense because a generated function will compile the expression. As a sanity check, this can be used to verify the result from the first post in this PR. The code goes through these steps 15 000 times when the number of iterations is set to 500. So, the total time spent in this part of the code is about 0.000103 * 15 000 = 1.545 seconds for the eval version.

So, for this test, the difference is about 1.5 seconds, which roughly matches the number I reported in the first post of this PR. To verify even more that this PR is an improvement, I've annotated a bunch of Turing test sets which contain sequential sampling as follows:

```
print("inference/AdvancedSMC.jl ")
@time include("inference/AdvancedSMC.jl")
```

and commented out some unrelated tests. The output is as follows:
Very nice improvement, thanks @rikhuijzer!
Benchmarked on Turing.jl/test/stdlib/RandomMeasures.jl
So about 3.5% runtime reduction and 5% allocations reduction on that test.
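A quick sanity check of those percentages against the numbers from the first post (a sketch; the variable names are only for illustration):

```
# Relative reductions computed from the first eval/splatnew pair above.
runtime_reduction = (15.231 - 14.665) / 15.231         # ≈ 0.037, roughly 3.5-4%
allocation_reduction = (9452913 - 8972863) / 9452913   # ≈ 0.051, roughly 5%
```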
Would be good to run some more tests, but so far it looks promising.
ProfileSVG outputs look as follows.
Before PR: (ProfileSVG flame graph)
After PR: (ProfileSVG flame graph)