Replace eval by generated function #119

Merged
merged 2 commits on Feb 18, 2022

Conversation


@rikhuijzer (Contributor) commented on Feb 16, 2022

Benchmarked on Turing.jl/test/stdlib/RandomMeasures.jl

```
julia> @benchmark run_chain() # eval (before PR)
BenchmarkTools.Trial: 1 sample with 1 evaluation.
 Single result which took 15.231 s (1.52% GC) to evaluate,
 with a memory estimate of 679.26 MiB, over 9452913 allocations.

julia> @benchmark run_chain() # splatnew (after PR)
BenchmarkTools.Trial: 1 sample with 1 evaluation.
 Single result which took 14.665 s (1.55% GC) to evaluate,
 with a memory estimate of 655.23 MiB, over 8972863 allocations.

julia> @benchmark run_chain() # eval (before PR)
BenchmarkTools.Trial: 1 sample with 1 evaluation.
 Single result which took 16.465 s (1.41% GC) to evaluate,
 with a memory estimate of 679.21 MiB, over 9452903 allocations.

julia> @benchmark run_chain() # splatnew (after PR)
BenchmarkTools.Trial: 1 sample with 1 evaluation.
 Single result which took 15.409 s (1.50% GC) to evaluate,
 with a memory estimate of 655.34 MiB, over 8972879 allocations.
```

So about 3.5% runtime reduction and 5% allocations reduction on that test.

Would be good to run some more tests, but so far it looks promising.

ProfileSVG outputs look as follows.

Before PR: [ProfileSVG flame graph]

After PR: [ProfileSVG flame graph]


@devmotion (Member) left a comment:


It seems there are two orthogonal changes in this PR:

  • Using splatnew instead of new
  • Using @generated instead of eval

Regarding the first one, a potential issue is that splatnew is very low-level and hence its behaviour and implementation might change at any time.

Regarding the second one, I wonder how this change affects (pre-)compilation times. The @generated function has to be compiled for every new combination of types of T and args, and generally there are more restrictions on @generated functions (and workarounds/extensions such as https://github.com/JuliaStaging/GeneralizedGenerated.jl have some major problems).

I also wonder if it is a good idea to add a type annotation to the generated function. If the constructor is type stable it should not be needed, and if it is not, it would cause errors in some cases.

@rikhuijzer (Contributor, Author)

> Regarding the first one, a potential issue is that splatnew is very low-level and hence its behaviour and implementation might change at any time.

That critique holds for the original :new too, right? The problem here is that we need to be able to create objects even when there is no suitable inner constructor available, which is why the code resorts to eval.
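
For context, here is a minimal sketch of the two construction strategies being compared. It is illustrative only, not the exact Libtask code: the `new_via_eval` helper and the `Point` struct are made up for the example, only the `__new__` name and its tuple-taking signature are taken from this PR.

```
# eval-based construction: builds an Expr(:new, ...) and evaluates it on every
# call, so each object construction goes through eval.
new_via_eval(T, args...) = eval(Expr(:new, T, args...))

# @generated construction: the returned :splatnew expression is compiled once
# per combination of argument types; later calls construct the object directly.
@generated function __new__(T, args)
    return Expr(:splatnew, :T, :args)
end

struct Point
    x::Int
    y::Float64
end

new_via_eval(Point, 1, 2.0)  # Point(1, 2.0), pays eval overhead on every call
__new__(Point, (1, 2.0))     # Point(1, 2.0), compiled on the first call only
```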

> Regarding the second one, I wonder how this change affects (pre-)compilation times. The @generated function has to be compiled for every new combination of types of T and args, and generally there are more restrictions on @generated functions (and workarounds/extensions such as https://github.com/JuliaStaging/GeneralizedGenerated.jl have some major problems).

Good point. For the test from RandomMeasures.jl:

```
julia> run_chain();

julia> methodinstances(Libtask.__new__)
MethodInstance for Libtask.__new__(::Type{var"#1#2"}, ::Tuple{Box})
MethodInstance for Libtask.__new__(::Any, ::Tuple{Any})
```
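
(`methodinstances` here presumably comes from MethodAnalysis.jl; it lists the compiled specializations of a function, so it shows how many type combinations the @generated function actually had to be compiled for.)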

And the number of calls to the __new__ function is 15k for those 500 iterations. A very reasonable trade-off, right?

> I also wonder if it is a good idea to add a type annotation to the generated function. If the constructor is type stable it should not be needed, and if it is not, it would cause errors in some cases.

Okay. Deleted it.

@rikhuijzer (Contributor, Author)

> And the number of calls to the __new__ function is 15k for those 500 iterations. A very reasonable trade-off, right?

Based on the Action run times, I need to investigate further to double-check this.

@rikhuijzer (Contributor, Author)

rikhuijzer commented Feb 17, 2022

It turned out that the slow GitHub Actions runs were fake news. This PR is a strict improvement in running time in all non-trivial cases; only when the number of samples is very low do the compile-time costs outweigh the running-time benefits. For the Turing tests, this PR is a strict improvement, as shown below.

To narrow down what exactly is happening, I've compared the timing information produced by

```
@time output = let
    expr = Expr(:new, map(val, instr.input)...)
    output = eval(expr)
end
```

to

```
@time output = let
    input = map(val, instr.input)
    T = input[1]
    args = input[2:end]
    output = __new__(T, args)
end
```

For the RandomMeasures.jl test, this showed that the eval-based implementation has running times of

```
0.000139 seconds (36 allocations: 1.734 KiB)
[...]
0.000103 seconds (36 allocations: 1.734 KiB)
```

where I only show the first and last lines printed to standard output, to show the difference between compilation time and steady-state running time.

The @generated-based implementation has running times of

```
0.135699 seconds (297.76 k allocations: 15.477 MiB, 99.95% compilation time)
[...]
0.000001 seconds (4 allocations: 112 bytes)
```

This is an improvement because this part of the code is hit 15 000 times for 500 samples in the RandomMeasures.jl test. The roughly 100-times-quicker steady-state running time makes sense because a generated function compiles the expression.

As a sanity check, this can be used to verify the result from the first post in this PR. The test goes through these steps 15 000 times when the number of iterations is set to 500. So, the total time spent in this part of the code is about 0.000103 * 15 000 = 1.545 seconds for the eval-based implementation, and about 0.135699 + (0.000001 * 14 999) ≈ 0.15 seconds for the @generated-based implementation.
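
Spelled out as a quick back-of-the-envelope computation (using the per-call timings quoted above):

```
# Rough totals for the hot path over one 500-sample run of the test:
eval_total      = 0.000103 * 15_000               # ≈ 1.545 s (eval pays ~0.1 ms per hit)
generated_total = 0.135699 + 0.000001 * 14_999    # ≈ 0.151 s (one compilation, then ~1 µs per hit)
eval_total - generated_total                      # ≈ 1.4 s saved
```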

So, for this test, the difference is about 1.4 seconds, which roughly matches the numbers I reported in the first post of this PR.

To verify even more that this PR is an improvement, I've annotated a bunch of Turing testsets which contained sequential sampling as follows:

print("inference/AdvancedSMC.jl ")
@time include("inference/AdvancedSMC.jl")

and commented out some unrelated tests. The output is as follows:

Libtask#master:

```
inference/AdvancedSMC.jl  63.203509 seconds (124.00 M allocations: 7.722 GiB, 3.30% gc time, 96.98% compilation time)
inference/gibbs.jl 470.506836 seconds (1.43 G allocations: 87.395 GiB, 3.59% gc time, 17.77% compilation time)
contrib/inference/sghmc.jl 14.013553 seconds (39.25 M allocations: 2.370 GiB, 4.81% gc time, 98.35% compilation time)
inference/gibbs.jl 464.889812 seconds (1.37 G allocations: 81.550 GiB, 3.65% gc time, 7.38% compilation time)
contrib/inference/sghmc.jl 6.578214 seconds (36.67 M allocations: 1.669 GiB, 7.47% gc time, 74.07% compilation time)
inference/gibbs.jl 482.300446 seconds (1.32 G allocations: 79.904 GiB, 3.32% gc time, 7.16% compilation time)
contrib/inference/sghmc.jl 6.576698 seconds (21.35 M allocations: 1.156 GiB, 5.21% gc time, 79.33% compilation time)
stdlib/RandomMeasures.jl 13.991359 seconds (24.55 M allocations: 1.475 GiB, 3.78% gc time, 63.37% compilation time)
```

Libtask.jl#this PR:

```
inference/AdvancedSMC.jl  61.216612 seconds (123.97 M allocations: 7.719 GiB, 3.37% gc time, 96.79% compilation time)
inference/gibbs.jl 450.156926 seconds (1.44 G allocations: 87.710 GiB, 3.63% gc time, 18.94% compilation time)
contrib/inference/sghmc.jl 13.233059 seconds (39.25 M allocations: 2.370 GiB, 4.88% gc time, 98.14% compilation time)
inference/gibbs.jl 448.414789 seconds (1.37 G allocations: 81.548 GiB, 3.64% gc time, 7.42% compilation time)
contrib/inference/sghmc.jl 6.587861 seconds (36.67 M allocations: 1.669 GiB, 7.35% gc time, 73.75% compilation time)
inference/gibbs.jl 470.303001 seconds (1.32 G allocations: 79.921 GiB, 3.33% gc time, 7.07% compilation time)
contrib/inference/sghmc.jl 6.462751 seconds (21.34 M allocations: 1.156 GiB, 4.66% gc time, 77.33% compilation time)
stdlib/RandomMeasures.jl  11.844663 seconds (23.81 M allocations: 1.434 GiB, 4.63% gc time, 69.74% compilation time)
```

@rikhuijzer requested a review from devmotion on February 18, 2022, 20:04

@yebai (Member) left a comment:


Very nice improvement, thanks @rikhuijzer!
