
Make quadgk type stable #18928

Closed · wants to merge 4 commits

Conversation

@ti-s commented Oct 14, 2016

The version in master is always inferred to return Tuple{Any, Any}, which can affect the performance of the calling code. This pull request makes the return type inferable when all limits are of the same subtype of AbstractFloat. Otherwise, the return type still cannot be inferred.
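A quick way to see the behaviour being described (an illustrative check, not taken from the PR; the full @code_warntype output is omitted):

julia> @code_warntype Base.quadgk(sin, 0.0, 1.0)
# on master the body is inferred as ::Tuple{Any,Any};
# with this PR it is inferred as ::Tuple{Float64,Float64}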

@ti-s (Author) commented Oct 14, 2016

It would be nice if one could find a way to ensure type stability in all cases, but I don't know how to achieve this.

@ViralBShah (Member) commented:

Cc @stevengj

Review comment by a member on this code:

immutable Segment{T1, T2, T3}
    b::Number
    I
    E::Real

T1<:Number and T3<:Real?
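The comment seems to suggest constraining the type parameters themselves rather than (or in addition to) annotating the fields with abstract types. A minimal sketch of that shape (the field names other than b, I, and E are guesses based on the visible diff lines):

immutable Segment{T1<:Number, T2, T3<:Real}
    a::T1
    b::T1
    I::T2
    E::T3
end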

@stevengj (Member) commented:

Looking back at this code, it seems like quadgk(f, a, b, c...; kws...) could instead be simplified to

quadgk(f, a, b, c...; kws...) = quadgk(f, promote(float(a),float(b),c...)...; kws...)

In addition to shortening the code, wouldn't this help type-stability in the case where the limits are different types, since promote(args...) should be resolved at compile-time?
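As a quick sanity check of the idea (assuming Julia 0.5 semantics; this snippet is not taken from the PR): promote unifies mixed numeric limits to a single concrete type, and that type is visible to the compiler.

julia> typeof(promote(float(0), 1.0, big(2)))
Tuple{BigFloat,BigFloat,BigFloat}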

@stevengj stevengj added the maths Mathematical functions label Oct 14, 2016
@ti-s (Author) commented Oct 14, 2016

Sorry, it seems that I was misled by @code_warntype. The version in master can actually be faster than my version:

julia> function test()
       @time a = [Base.QuadGK.quadgk(x->2x, 0., i)[1] for i in 1.:1e6]
       @time b = [QuadGK.quadgk(x->2x, 0., i)[1] for i in 1.:1e6]
       @time sum(a)
       @time sum(b)
       @time sum(a)
       c = Any[a...]
       @time sum(c)
       end

julia> test()
  0.719052 seconds (21.76 M allocations: 615.058 MB, 9.73% gc time)
  1.622262 seconds (28.95 M allocations: 882.229 MB, 8.99% gc time)
  0.000641 seconds (1 allocation: 16 bytes)
  0.000550 seconds
  0.000485 seconds (1 allocation: 16 bytes)
  0.029850 seconds (1.00 M allocations: 15.290 MB, 39.25% gc time)

On the other hand:

julia> function test2()
       @time a = [Base.QuadGK.quadgk(x->exp(-x^2), -Inf, i)[1] for i in 1.:1e6]
       @time b = [QuadGK.quadgk(x->exp(-x^2), -Inf, i)[1] for i in 1.:1e6]
       @time sum(a)
       @time sum(b)
       @time sum(a)
       c = Any[a...]
       @time sum(c)
    end

julia> test2()
  4.675536 seconds (229.34 M allocations: 3.783 GB, 4.56% gc time)
  1.794645 seconds (32.03 M allocations: 1.021 GB, 10.37% gc time)

Is there any real benefit to a return type that @code_warntype can infer correctly? The reason for changing quadgk was that I didn't know what to do with all the Anys in my functions that directly or indirectly call quadgk.

@ti-s (Author) commented Oct 14, 2016

The simplified code for the conversion seems to work, but it does not improve the output of @code_warntype.

After further investigation, it seems that the parameters of Segment and the additional function call are the reasons for the slowdown if none of the limits are Inf. Is it worth slowing down the non-Inf case by about 50% for a 3x speedup of the Inf case (both timings are with the parameters of Segment removed and with f(x) = exp(-x^2))?

Unfortunately, without the type parameters I can't avoid the Anys in @code_warntype. So one could also just leave everything as is in master and handle Inf at the call site if one needs the speed, as sketched below.
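For concreteness, "handling Inf at the call site" could look like the following change of variables (a sketch, not part of this PR; quadgk_minf is a made-up helper and assumes f decays fast enough towards -Inf):

# map (-Inf, b] onto the finite interval [0, 1) with x = b - u/(1 - u); the Jacobian is 1/(1 - u)^2
quadgk_minf(f, b) = Base.quadgk(u -> f(b - u/(1 - u)) / (1 - u)^2, 0.0, 1.0)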

@kshyatt (Contributor) commented Oct 14, 2016

Could we perhaps get some tests for the type stability?

@stevengj (Member) commented Oct 14, 2016

I don't see how the additional function call could slow things down that much, and using parameterized types normally makes things faster, not slower. Is the segs array concretely typed?

Review comment by a member on this code:

segs = [evalrule(f, s[1],s[2], x,w,gw, nrm)]
for i in 2:length(s) - 1
    heappush!(segs, evalrule(f, s[i], s[i+1], x,w,gw, nrm), Reverse)

I'm worried that this will fail if f is not type-stable, e.g. if it sometimes returns an Int and sometimes a Float64. We want that case to work (even if it is slower).
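For example (purely illustrative), a function like

h(x) = x < 0.5 ? 0 : x    # returns an Int for x < 0.5 and a Float64 otherwise

should still integrate correctly when passed to quadgk, just without the fast path.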

@ti-s (Author) replied Oct 14, 2016:

I guess this is not necessary anymore anyway, as the parameters for Segment slowed everything down significantly when the evaluation of f is fast. The only useful change would be to introduce the do_quadgk_no_inf function to avoid the recursion, if one wants a fast version for Inf at the cost of the non-Inf case.

@ti-s (Author) replied:

Sorry, I take that back. The real reason for the slowdown in the Inf case is the reuse of s, s1, and s2. Just changing these names and annotating s0::eltype(s) gives a 3x speedup and no slowdown for the finite case. I think this is what should be done.
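For readers unfamiliar with the issue: rebinding one variable name to values of different types forces inference to widen the type of every use of that name, which is exactly what the rename avoids. A minimal, hypothetical sketch (not the actual quadgk code):

function widened(b)
    s = 0.0            # s starts out as a Float64
    if isinf(b)
        s = big(s)     # rebound as a BigFloat: s is no longer concretely typed
    end
    return s + 1       # this addition can no longer be specialized on Float64
end

Giving the second binding its own name (or annotating it, as with s0::eltype(s) above) keeps each variable concretely typed.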

@ti-s (Author) commented Oct 14, 2016

As I said above, the only change that introduced a speedup was the rename of the segment variables. There seems to be no way to get fast code and an inferred type in @code_warntype. Should I delete my two commits, do only the rename, and force-push the new commit?

@stevengj (Member) commented:

@ti-s, a PR with only the rename (and maybe my suggestion for the shortened promotion code) seems like a good idea. The ::Vector{Tw} declarations for the local variables seem like a good idea as well, assuming they don't hurt.

Commit: Due to the reuse of some variables, there was a slowdown in the case of infinite limits.
@ti-s (Author) commented Oct 14, 2016

Sorry, there is still something going on. This looks nice:

               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.5.0 (2016-09-19 18:14 UTC)
 _/ |\__'_|_|_|\__'_|  |  
|__/                   |  x86_64-pc-linux-gnu

julia> include("quadgk.jl")
QuadGK

julia> function test2()
           f(x) = exp(-x^2)
           @time [Base.quadgk(f, 0, 1) for i in 1:1e5]
           @time [QuadGK.quadgk(f, 0, 1) for i in 1:1e5]
           @time [QuadGK.quadgk(f, -Inf, 0) for i in 1:1e5]
           @time [Base.QuadGK.quadgk(f, -Inf, 0) for i in 1:1e5]
       end
test2 (generic function with 1 method)

julia> test2()
  1.637960 seconds (8.05 M allocations: 273.430 MB, 4.21% gc time)
  0.774465 seconds (3.98 M allocations: 132.059 MB, 4.47% gc time)
  1.444773 seconds (12.32 M allocations: 274.707 MB, 3.93% gc time) # Edit: this is slower, too
  3.201985 seconds (179.78 M allocations: 2.762 GB, 10.07% gc time)
100000-element Array{Tuple{Float64,Float64},1}:
...

julia> function test()
           f(x) = exp(-x^2)
           @time [Base.quadgk(f, 0, i) for i in 1:1e6]
           @time [QuadGK.quadgk(f, 0, i) for i in 1:1e6]
           @time [QuadGK.quadgk(f, -Inf, i) for i in 1:1e6]
           @time [Base.QuadGK.quadgk(f, -Inf, i) for i in 1:1e6]
       end
test (generic function with 1 method)

julia> test()
  1.241135 seconds (26.56 M allocations: 823.276 MB, 13.32% gc time)
  1.272577 seconds (22.54 M allocations: 761.351 MB, 13.10% gc time)
  1.284076 seconds (26.13 M allocations: 813.265 MB, 8.02% gc time)
  4.751337 seconds (228.41 M allocations: 3.778 GB, 7.18% gc time)
1000000-element Array{Tuple{Float64,Float64},1}:
...

But this does not:

               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.5.0 (2016-09-19 18:14 UTC)
 _/ |\__'_|_|_|\__'_|  |  
|__/                   |  x86_64-pc-linux-gnu

julia> include("quadgk.jl")
QuadGK

julia> function test()
           f(x) = exp(-x^2)
           @time [Base.quadgk(f, 0, i) for i in 1:1e6]
           @time [QuadGK.quadgk(f, 0, i) for i in 1:1e6]
           @time [QuadGK.quadgk(f, -Inf, i) for i in 1:1e6]
           @time [Base.QuadGK.quadgk(f, -Inf, i) for i in 1:1e6]
       end
test (generic function with 1 method)

julia> test()
  2.485640 seconds (31.77 M allocations: 1011.251 MB, 8.55% gc time)
  1.745853 seconds (24.10 M allocations: 814.830 MB, 12.69% gc time)
  6.727599 seconds (26.14 M allocations: 813.545 MB, 1.48% gc time)    # <-- this is bad
  4.767841 seconds (228.51 M allocations: 3.782 GB, 6.95% gc time)
1000000-element Array{Tuple{Float64,Float64},1}:
...           

julia> function test2()
           f(x) = exp(-x^2)
           @time [Base.quadgk(f, 0, 1) for i in 1:1e5]
           @time [QuadGK.quadgk(f, 0, 1) for i in 1:1e5]
           @time [QuadGK.quadgk(f, -Inf, 0) for i in 1:1e5]
           @time [Base.QuadGK.quadgk(f, -Inf, 0) for i in 1:1e5]
       end
test2 (generic function with 1 method)

julia> test2()
  0.268109 seconds (2.83 M allocations: 85.180 MB, 4.55% gc time)
  0.284496 seconds (2.41 M allocations: 78.406 MB, 3.87% gc time)
  0.765523 seconds (12.32 M allocations: 274.398 MB, 3.37% gc time)
  3.370256 seconds (179.69 M allocations: 2.758 GB, 6.73% gc time)
100000-element Array{Tuple{Float64,Float64},1}:
...

The runtime does not go below 6s even after several calls to test. How can the order of the function calls make such a large difference?

Edit: Likewise, in the first case the time of the third comprehension of test2 is slower than the same comprehension in the second case. The difference of 1.4s vs. 0.7s does not go away if I run test2 several times.

@ti-s (Author) commented Oct 17, 2016

Introducing a new type parameter instead of the call to eltype(s) fixed the severe runtime regression that happened for certain compilation orders.

Moving the second part of the body of do_quadgk into a separate function significantly reduced the compilation time.
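Both changes follow the function-barrier pattern: compute the type once, then hand the rest of the work to an inner function that can specialize on that type. A minimal sketch with hypothetical names (not the actual do_quadgk code):

function do_work(s)
    T = eltype(s)
    return _do_work(s, T)    # T becomes a method type parameter below
end

# splitting the body into a second, parameterized function lets the compiler
# compile _do_work once per concrete T and keeps the outer method cheap to compile
_do_work{T}(s, ::Type{T}) = sum(x -> convert(T, x)^2, s)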

Now the timings look like this:

   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.5.0 (2016-09-19 18:14 UTC)
 _/ |\__'_|_|_|\__'_|  |  
|__/                   |  x86_64-pc-linux-gnu

julia> include("quadgk.jl")
QuadGK

julia> f(x) = exp(-x^2)
f (generic function with 1 method)

julia> @time QuadGK.quadgk(f, 0, 1)
  0.714522 seconds (1.21 M allocations: 52.767 MB, 13.25% gc time)
(0.746824132812427,7.887024366937112e-13)

julia> @time Base.QuadGK.quadgk(f, 0, 1)
  2.939289 seconds (11.27 M allocations: 404.753 MB, 4.98% gc time)
(0.746824132812427,7.887024366937112e-13)

julia> function test()
              f(x) = exp(-x^2)
              @time [Base.quadgk(f, 0, i) for i in 1:1e6]
              @time [QuadGK.quadgk(f, 0, i) for i in 1:1e6]
              @time [QuadGK.quadgk(f, -Inf, i) for i in 1:1e6]
              @time [Base.QuadGK.quadgk(f, -Inf, i) for i in 1:1e6]
        end
test (generic function with 1 method)

julia> test()
  1.211794 seconds (26.66 M allocations: 827.582 MB, 11.70% gc time)
  1.412750 seconds (27.75 M allocations: 846.891 MB, 5.71% gc time)
  1.203066 seconds (27.05 M allocations: 795.028 MB, 7.51% gc time)
  4.264940 seconds (228.51 M allocations: 3.782 GB, 6.15% gc time)

There is a 20% runtime regression for the case of finite limits, but I think it is worth the more than 3x performance increase for Inf limits and the compilation time improvement.

It might still be interesting to find the reason for the strange behaviour where the runtime depends on the order of compilation, but that is outside the scope of this PR.

Edit: I also removed the type annotations of x, w, and gw, because they caused another 10% performance regression.

Commit: This version can throw an error for type unstable functions.
@ti-s (Author) commented Oct 31, 2016

I was finally able to get a speedup in all cases after reading #19137.

julia> test()
  1.107995 seconds (26.53 M allocations: 822.068 MB, 9.06% gc time)
  0.989808 seconds (18.03 M allocations: 693.213 MB, 9.29% gc time)
  1.054888 seconds (21.00 M allocations: 701.982 MB, 16.51% gc time)
  4.220679 seconds (228.33 M allocations: 3.775 GB, 6.50% gc time)

Additionally, @code_warntype can now infer the result type for floating-point limits:

julia> @code_warntype QuadGK.quadgk(x->x^2, 0., 1.)
Variables:
  #self#::QuadGK.#quadgk
  f::##10#11
  a::Float64
  b::Float64
  c::Tuple{}

Body:
  begin 
      SSAValue(0) = $(Expr(:invoke, LambdaInfo for power_by_squaring(::Int64, ::Int64), :(Base.power_by_squaring), 10, 7))
      return $(Expr(:invoke, LambdaInfo for do_quadgk(::##10#11, ::Array{Float64,1}, ::Int64, ::Type{Float64}, ::Float64, ::Float64, ::Int64, ::Function), :(QuadGK.do_quadgk), :(f), :($(Expr(:invoke, LambdaInfo for vect(::Float64, ::Vararg{Float64,N}), :(Base.vect), :(a), :(b)))), 7, Float64, :((Base.box)(Float64,(Base.sitofp)(Float64,0))), :((Base.Math.box)(Base.Math.Float64,(Base.Math.sqrt_llvm)($(QuoteNode(2.220446049250313e-16))))::Float64), SSAValue(0), :(QuadGK.vecnorm)))
  end::Tuple{Float64,Float64}

Compilation now takes a bit longer, but it is still about 3x faster than with the version in master.

The downside is that I don't know how to make this line work with type-unstable functions f:

segs = [Segment(evalrule(f, s[1],s[2], x,w,gw, nrm))]

But without that, I can't improve on the timings of the previous version. Any idea?

@stevengj (Member) commented Oct 31, 2016

Would segs = Segment{...something with promote_op(f, eltype(s))...}[Segment(evalrule(f, s[1],s[2], x,w,gw, nrm))] work?

@ti-s (Author) commented Oct 31, 2016

Thanks! Didn't know about promote_op. Did you mean something like

segs = Segment{Tw, Base.promote_op(f, Tw), Base.promote_op(nrm, Base.promote_op(f, Tw))}[]

In that case, the first problem is:

julia> QuadGK.quadgk(g, 0, 1)
ERROR: TypeError: Segment: in T3, expected T3<:Real, got Type{Any}

And if I remove the Real constraint, I get:

julia> QuadGK.quadgk(g, 0, 1)
ERROR: MethodError: Cannot `convert` an object of type QuadGK.Segment{Float64,BigFloat,BigFloat} to an object of type QuadGK.Segment{Float64,Any,Any}

This seems to work, but the timings are suboptimal again:

segs = [Segment(evalrule(f, s[1],s[2], x,w,gw, nrm))]
if !isleaftype(Base.promote_op(f, Tw))
    segs = Segment[segs...]
end

At least with the simple example g(x) = x < 0 ? 0 : BigFloat(x) the result is correct. Specialising on Base.promote_op(f, Tw) seems to help with the runtime, but maybe you see a simpler solution?

@stevengj (Member) commented:

The problem with your last solution is that segs is not type-stable.

Maybe define:

segment_type{Tx,Tf<:Number}(::Type{Tx}, ::Type{Tf}) = Segment{Tx, Tf, real(Tf)}
segment_type{Tx,Tf<:Number}(::Type{Tx}, ::Type{Array{Tf}}) = Segment{Tx, Array{Tf}, real(Tf)}
segment_type(::Type, ::Type) = Segment

and then do seg = segment_type(Tw, promote_op(f, Tw))[ Segment(...) ]?

@stevengj (Member) commented:

Or maybe have do_quadgk_no_inf call a second function _do_quadgk_no_inf, passing the initial segs as an argument. That way, you can do the isleaftype check outside of _do_quadgk_no_inf.
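A rough sketch of where the isleaftype check could live under that structure (the argument list is hypothetical, not the real signature, and the initial segments would be built here too):

function do_quadgk_no_inf(f, s, n, abstol, reltol, maxevals, nrm)
    Tf = Base.promote_op(f, eltype(s))
    # decide the element type of segs here, outside the inner function...
    segs = isleaftype(Tf) ? segment_type(eltype(s), Tf)[] : Segment[]
    # ...so that _do_quadgk_no_inf can specialize on a concretely typed segs
    return _do_quadgk_no_inf(f, s, segs, n, abstol, reltol, maxevals, nrm)
end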

@ti-s (Author) commented Oct 31, 2016

The solution with segment_type works really well, thank you! Additionally, specialising on the result of segment_type seems to reduce the runtime even further, but only if I do not introduce another function call. I will investigate the remaining possibilities and add a new commit tomorrow.

@ti-s (Author) commented Nov 1, 2016

As I mentioned above, defining

segment_type{Ts, Tf<:Number}(::Type{Ts}, ::Type{Tf}) = Segment{Ts, Tf, real(Tf)}
segment_type{Ts, Tx<:Number, N}(::Type{Ts}, ::Type{Array{Tx, N}}) = Segment{Ts, Array{Tx, N}, real(Tx)}
segment_type(::Type, ::Type) = Segment

works well for my test cases. It is fast and also gives (note the mixed types of the limits)

julia> @code_warntype QuadGK.quadgk(x->x^2, 0, 1.)
Variables:
  #self#::QuadGK.#quadgk
  f::##1#2
  a::Int64
  b::Float64
  c::Tuple{}

Body:
  begin 
      return $(Expr(:invoke, LambdaInfo for #quadgk#15(::Array{Any,1}, ::Function, ::##1#2, ::Int64, ::Float64), :(QuadGK.#quadgk#15), :((Core.ccall)(:jl_alloc_array_1d,(Core.apply_type)(Core.Array,Any,1)::Type{Array{Any,1}},(Core.svec)(Core.Any,Core.Int)::SimpleVector,Array{Any,1},0,0,0)::Array{Any,1}), :(#self#), :(f), :(a), :(b)))
  end::Tuple{Float64,Float64}

But unfortunately that fails if a user-supplied norm is not type-stable or does not return real(Tf). Therefore I pushed a slightly different version, which is also fast but does not work as well with @code_warntype.
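Two illustrative examples of norms that would violate that assumption (made-up names, not from the PR):

norm32(x) = Float32(abs(x))        # type-stable, but returns Float32 rather than real(Tf)
normany(x) = x > 0 ? abs(x) : 0    # not type-stable: Float64 or Int depending on x

Base.quadgk(sin, 0.0, 1.0; norm=norm32)    # E is then not inferred as a Float64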

Additionally, this runtime regression could be a result of the specialization on f:

julia> g(f) = QuadGK.quadgk(f, 1, 2)[1]
g (generic function with 1 method)

julia> function test()
           f(x, i) = i*x
           s = 0.0
           for i = 1:1000
              s+= g(x->f(x, i))
           end
       end

julia> @benchmark test()
BenchmarkTools.Trial:
  samples:          922
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  2.01 mb
  allocs estimate:  106966
  minimum time:     4.43 ms (0.00% GC)
  median time:      4.96 ms (0.00% GC)
  mean time:        5.42 ms (3.18% GC)
  maximum time:     11.44 ms (50.05% GC)

With the version on master:

julia> @benchmark test2()
BenchmarkTools.Trial:
  samples:          2083
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  2.06 mb
  allocs estimate:  108966
  minimum time:     1.98 ms (0.00% GC)
  median time:      2.12 ms (0.00% GC)
  mean time:        2.40 ms (8.69% GC)
  maximum time:     8.28 ms (72.88% GC)

With commit 108c8fd

julia> @benchmark test2()
BenchmarkTools.Trial:
  samples:          1947
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  2.07 mb
  allocs estimate:  109966
  minimum time:     2.08 ms (0.00% GC)
  median time:      2.28 ms (0.00% GC)
  mean time:        2.57 ms (7.08% GC)
  maximum time:     8.97 ms (48.96% GC)

Also, on 0.5, the runtime of this PR depends again on the compilation order, but that seems to be fixed on master.

To summarize:

  • The version on master is fast for all cases except infinite limits, where it suffers from a 4x slowdown
  • Commit 108c8fd removes the slowdown for infinite limits, but comes with a 20% runtime cost in general. It also reduces the compilation time by a factor of 3
  • With commit 76d11d6, quadgk is 10% faster for known function types, but is 2x slower when called with many different function types

Finally, I should probably write some tests if you decide to merge this PR.

@ti-s (Author) commented Nov 2, 2016

The reason for the slowdown in my previous comment is that promote_op cannot infer the return type:

g(f) = @show Base.promote_op(f, Float64)
function test()
    for i = 1:1000
        g(x->x*i); break
    end
end
test() # Base.promote_op(f,Float64) = Any

If I fix that, the new version is not slower than the version on master:

julia> function test()
           f(x, i) = i*x
           s = 0.0
           for i = 1:1000
              s+= g(x->f(x, i)::Float64)
           end
       end

New version:

julia> @benchmark test()
BenchmarkTools.Trial: 
  samples:          3473
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  1.15 mb
  allocs estimate:  51489
  minimum time:     903.75 μs (0.00% GC)
  median time:      1.10 ms (0.00% GC)
  mean time:        1.44 ms (14.48% GC)
  maximum time:     13.13 ms (80.00% GC)

Version on master:

julia> @benchmark test2()
BenchmarkTools.Trial: 
  samples:          3390
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  1.31 mb
  allocs estimate:  59966
  minimum time:     1.03 ms (0.00% GC)
  median time:      1.21 ms (0.00% GC)
  mean time:        1.47 ms (14.85% GC)
  maximum time:     80.99 ms (97.97% GC)

@tkelman (Contributor) commented Jan 16, 2017

Could you reopen this at https://github.com/JuliaMath/QuadGK.jl?

@tkelman tkelman closed this Jan 16, 2017