no-op context overdubbing overhead #91

Open

jrevels opened this issue Dec 8, 2018 · 7 comments

@jrevels (Collaborator) commented Dec 8, 2018

From @oxinabox on Slack:

using Cassette, BenchmarkTools

Cassette.@context NoOpCtx

demo(n) = sum(rand(100) for _ in 1:n)

f(n) = Cassette.overdub(NoOpCtx(), demo, n)

julia> @btime demo($10);
  2.262 μs (20 allocations: 16.64 KiB)

julia> @btime f($10);
  6.599 μs (95 allocations: 18.63 KiB)

julia> versioninfo()
Julia Version 1.1.0-DEV.772
Commit 3dd678f646 (2018-11-30 22:20 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin17.5.0)
  CPU: Intel(R) Core(TM) i7-7920HQ CPU @ 3.10GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
@oxinabox (Contributor) commented Dec 8, 2018

using:

using Cassette
using Cassette: @context, @overdub

function demo(n)
    sum((rand(1000) for _ in 1:n))
end

@context NilCtx

function timing_data()
    rounds = 1.5 .^ (0:25)
    normal_time = Float64[]
    overdubbed_time = Float64[]
    
    for n in rounds
        push!(normal_time, @elapsed(demo(n)))
        push!(overdubbed_time, @elapsed(@overdub NilCtx() demo(n)))
    end
    
    (rounds, normal_time, overdubbed_time)
end
        
td = timing_data()

on:

Julia Version 1.0.2
Commit d789231e99 (2018-11-08 20:11 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin14.5.0)
  CPU: Intel(R) Core(TM) i7-8559U CPU @ 2.70GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, skylake)

Here is the timing info:

n          normal time (s)    overdubbed time (s)
1          0.0223821          9.3072E-05
1.5        1.3269E-05         2.375E-05
2.25       0.000710964        0.00196382
3.375      0.000487698        0.00279839
5.0625     1.169E-05          0.00551266
7.59375    1.6974E-05         0.0087366
11.3906    2.6293E-05         0.0216547
17.0859    0.000106537        0.0226156
25.6289    7.8579E-05         0.0349099
38.4434    8.4388E-05         0.0588086
57.665     0.00010424         0.0851472
86.4976    0.000143804        0.116825
129.746    0.000203693        0.172862
194.62     0.000395338        0.260294
291.929    0.00043216         0.41117
437.894    0.0042815          0.627703
656.841    0.00506973         0.9334
985.261    0.00551512         1.33161
1477.89    0.0105832          2.08623
2216.84    0.0141181          3.15861
3325.26    0.0102354          4.83013
4987.89    0.0176491          6.90362
7481.83    0.0252264          10.5003
11222.7    0.0373232          16.2054
16834.1    0.0549923          23.9963
25251.2    0.0843035          35.5056

Here is a plot of the output:

  • Y is in seconds
  • X is the value of n
  • overdubbed is blue
  • normal is green

[plot: normal vs. overdubbed timings against n, linear scale]

Here is the same, but with a log scale on Y.

[plot: normal vs. overdubbed timings against n, log scale on Y]

I really should have put a GC.gc() in there, which is probably what the bumpiness on the normal curve is.
Or I could have used BenchmarkTools, but I find that too slow when you want to time a lot of things and don't care too much about precision, like now where the trend is what is interesting.
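
For reference, here is a minimal sketch of that variant (the timing_data_gc name is just for illustration), assuming the demo and NilCtx definitions from the snippet above:

using Cassette: @overdub

function timing_data_gc()
    rounds = 1.5 .^ (0:25)
    normal_time = Float64[]
    overdubbed_time = Float64[]

    for n in rounds
        GC.gc()  # flush pending garbage so a collection doesn't land inside the timing
        push!(normal_time, @elapsed(demo(n)))
        GC.gc()
        push!(overdubbed_time, @elapsed(@overdub NilCtx() demo(n)))
    end

    (rounds, normal_time, overdubbed_time)
end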

@oxinabox (Contributor)

Do we have any traction on this?
Possibly a pointer to an upstream issue with Julia itself?

Fixing/improving this would be a big help for oxinabox/MagneticReadHead.jl#27.

@vchuravy (Member)

I don't think anyone has worked on this. It would be good to know where the overhead is coming from, e.g. what the difference between the executed code is. Is it an inlining failure?
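
One possible starting point, just as a sketch reusing the demo and NilCtx definitions from the earlier comment, is to compare the typed IR of the plain call and the overdubbed call and see which calls survive un-inlined:

using Cassette
using InteractiveUtils: @code_typed

# Typed, optimized IR for the plain call.
@code_typed demo(10)

# Typed IR for the same call under the no-op context; comparing the two
# listings shows which overdub/recurse calls remain and whether they inline.
@code_typed Cassette.overdub(NilCtx(), demo, 10)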

@oxinabox (Contributor)

A strange thing I have noticed: the number of allocations is not constant.

using Cassette
using Cassette: @context, @overdub

function inner()
    sum((rand(1000) for _ in 1:100))
end

@context NilCtx

const ctx = NilCtx()
function outer()
    Cassette.recurse(ctx, inner)
end

Timing (after I have run it a few times to get the compilation done):

julia> GC.enable(false)
true

julia> @time outer();
  0.000892 seconds (890 allocations: 1.562 MiB)

julia> @time outer();
  0.000826 seconds (804 allocations: 1.561 MiB)

julia> @time outer();
  0.000818 seconds (1.25 k allocations: 1.568 MiB)
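
A quick way to see whether the allocation volume itself drifts between calls (a sketch assuming the outer and ctx definitions above, and that BenchmarkTools is installed) is to record @allocated over repeated calls and let BenchmarkTools pick the best-case sample:

using BenchmarkTools

# Bytes allocated by ten successive calls; after warm-up these would all
# agree if the overdub machinery were allocation-stable.
bytes_per_call = [@allocated(outer()) for _ in 1:10]

# BenchmarkTools reports the minimum over many samples, which filters out
# one-off effects such as compilation or GC bookkeeping.
@btime outer()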

@vchuravy (Member)

I can't reproduce the original behaviour on Julia 1.1 (which landed many important performance enhancements for Cassette). I am still seeing the weird oscillation in allocation counts.

@oxinabox (Contributor)

In Julia 1.1 it is a lot better.

Normal scale:
[plot: normal vs. overdubbed timings, linear scale]

Log scale:
[plot: normal vs. overdubbed timings, log scale]

@oxinabox (Contributor)

OK, this is weird: my 1.0.2 results are also a ton better, basically the same as the 1.1 results above.

This is with Cassette 0.2.2. The original tests were on Cassette master, which would have been roughly 0.1.4, so it could be that.

I guess until we have alternate evidence we should assume it is the Cassette version.
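
(To pin down which Cassette version actually produced a given run, a trivial check is to print the resolved versions in the active environment; nothing Cassette-specific here:)

using Pkg

# Lists every package in the active environment together with its resolved
# version, so the Cassette entry shows exactly what was benchmarked.
Pkg.status()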
