
anonymous function calls have a huge overhead #1864

Closed
stevengj opened this issue Jan 2, 2013 · 45 comments

@stevengj (Member) commented Jan 2, 2013

A surprising factor of 100 slowdown in this simple example:

julia> A = rand(100000);

julia> foo(X) = for i = 1:numel(X); X[i] *= 2; end

julia> bar(s) = X -> for i = 1:numel(X); X[i] *= s; end

julia> @time foo(A)
elapsed time: 0.008594989776611328 seconds

julia> @time foo(A)
elapsed time: 0.0002551078796386719 seconds

julia> baz = bar(2)
#<function>

julia> @time baz(A)
elapsed time: 0.048172950744628906 seconds

julia> @time baz(A)
elapsed time: 0.0285341739654541 seconds

I'm guessing there is some lack of JIT-ing going on? This affects #1856.

@johnmyleswhite (Member) commented Jan 2, 2013

I believe this was also one of the major sources of slowdowns in a previous discussion about the use of closures for MCMC. Very much worth getting right.

@ghost assigned JeffBezanson Jan 2, 2013

@stevengj (Member) commented Feb 28, 2013

See also the discussion on how this penalizes FFT performance.

@milktrader (Member) commented Mar 1, 2013

Ouch. I really love anonymous function syntax. +1 for getting it right.

@crisluengo commented May 19, 2014

Is this the same issue I'm seeing? Loops written directly at the REPL are much slower than inside a function:

julia> tic(); s=0.; for ii=0.:1e8; s+=ii*ii; end; s; toc() 
elapsed time: 9.243699199 seconds
9.243699199

julia> sumsquare(n)=(s=0.; for ii=0.:n; s+=ii*ii; end; s)
sumsquare (generic function with 1 method)

julia> tic(); sumsquare(1e8); toc()
elapsed time: 0.357053878 seconds
0.357053878

It takes no time at all to compile the function, so this behaviour is surprising!

@StefanKarpinski (Member) commented May 19, 2014

No, that's a different issue due to the difference between global and local variables:

http://docs.julialang.org/en/latest/manual/performance-tips/#avoid-global-variables.

@quinnj (Member) commented Jun 4, 2014

Is the priority here really the 1.0 milestone? Just bumping to check, as this came up in a user thread again: https://groups.google.com/forum/#!topic/julia-users/BhQjHGUx2lA

@shashi (Contributor) commented Aug 30, 2014

Bump! This and #4428 have been favorites of mine. I'd love very much to see this fixed! It would be a boon for packages like Lazy and Reactive, which use a functional style.

Seconding @quinnj about not having to wait until 1.0.

@ViralBShah (Member) commented Aug 30, 2014

Does this one deserve a 0.4-projects milestone?

@shashi (Contributor) commented Aug 30, 2014

I'd think yes! This is worth fixing sooner rather than later.

@tknopp (Contributor) commented Aug 30, 2014

It only deserves a 0.4-projects milestone if one of the people who are actually able to resolve this plans to tackle it during the 0.4-dev period.

@timholy (Member) commented Aug 30, 2014

I almost have this fixed over here. There's still one bit of sorcery left (needed to make them work properly when created inside a function), and unfortunately the proper metaprogramming incantations are proving to be surprisingly tricky. It will probably boil down to about 5 lines of code, but I've spent about a day on those 5 lines so far to no avail 😦. I'm annoyed at myself because I thought my metaprogramming-fu was well beyond having trouble with this kind of stuff.

@shashi (Contributor) commented Aug 30, 2014

@tknopp of course. I'm just creating some noise and hoping it will happen. ;) This is beyond me to take on.

@tknopp (Contributor) commented Aug 30, 2014

@shashi: Yes, bumping issues is absolutely OK. But the 0.4 tag should IMHO only be applied if someone plans to work on it.

@shashi (Contributor) commented Aug 30, 2014

@timholy very cool. Patched to allow for functions with no name ;) timholy/FastAnonymous.jl#1

@jakebolewski (Member) commented Aug 30, 2014

@timholy your implementation generates the "anon" function at top-level scope, so the resulting anon function won't have the correct scoping semantics when generated inside a function. I don't think any amount of macro foo can get around that limitation.

@timholy (Member) commented Aug 30, 2014

No, it can. Think I almost have it now. Crossing fingers.

@timholy (Member) commented Aug 30, 2014

Ta-dah! Done.

It's amazing what a catalytic effect complaining has 😄.

@timholy (Member) commented Aug 30, 2014

We still probably want to leave this issue open, however, because aside from needing the @anon macro, you also need to write functions like this:

function myfunction{f}(::Type{f}, args...)

rather than

function myfunction(f::Function, args...)

Still, it seems to finally be a workable, general solution to the problem.
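
To make the distinction concrete, here is a minimal sketch of the pattern (hypothetical names, not FastAnonymous's actual internals). Dispatching on a type parameter forces a specialization per "function", so the call can be inlined; a plain f::Function argument in 0.3-era Julia is not specialized on:

immutable Doubler end                    # stands in for an @anon-generated type
op(::Type{Doubler}, x) = 2x              # hypothetical helper holding the "body"

function apply_all{f}(::Type{f}, X)      # specializes on f, so op(f, x) can inline
    for i = 1:length(X)
        X[i] = op(f, X[i])
    end
    X
end

apply_all(Doubler, rand(5))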

@jakebolewski (Member) commented Aug 30, 2014

I don't want to be a downer but:

julia> function test()
           y = 0
           fn = @anon x -> x * y
           while y < 50
               y += 10; @show fn(10)
               sleep(1)
           end
       end
test (generic function with 1 method)

julia> test()
fn(10) => 0
fn(10) => 0
fn(10) => 0
fn(10) => 0
fn(10) => 0

julia> function test()
           y = 0
           fn = (x) -> x * y
           while y < 50
               y += 10; @show fn(10)
               sleep(1)
           end
       end
test (generic function with 1 method)

julia> test()
fn(10) => 100
fn(10) => 200
fn(10) => 300
fn(10) => 400
fn(10) => 500

@timholy (Member) commented Aug 30, 2014

Fair enough. But that's what I'd call a serious corner case. 99% of the time (for me at least), once I define an anonymous function I don't want it interacting further with local variables.

@timholy (Member) commented Aug 30, 2014

In other words, @anon makes it work more like I personally would expect than real anonymous functions do.

@timholy (Member) commented Aug 30, 2014

Though that could be my Matlab upbringing (where anonymous functions work like @anon).

@jakebolewski (Member) commented Aug 30, 2014

Your last few statements made me cry a bit inside as a Lisper :-)

@timholy (Member) commented Aug 30, 2014

Understood. But if that's what you want to happen, then I agree that metaprogramming fu can't solve that particular problem. Sorry if I misunderstood your original meaning.

@timholy (Member) commented Aug 30, 2014

However, if you want the parameters to update, it's not obvious to me that full inlining is even possible (or at least, const-folding could not be as complete). So there are some choices that have to be made.

@shashi (Contributor) commented Aug 30, 2014

Huh, for the same test @timholy is using, I get different relative run times in this script: https://gist.github.com/shashi/b7a147f630d744c2b04f. The anonymous function is the fastest?! :o I defined an @anon macro which transforms x->y+x into <rand symbol>(x) = y+x, because I'd been thinking the issue is with the function being "anonymous", but that doesn't make any difference.

@timholy (Member) commented Aug 31, 2014

Yes, the "anonymous" through @anon will be considerably faster than passing a generic function as an argument. The reason is that, with the proper syntax, the @anonized version gets inlined, but it doesn't for generic functions.

@rfourquet (Contributor) commented Aug 31, 2014

@timholy that's great! It covers a lot of use cases already. I guess real closures could be implemented once objects become callable (this is how lambdas are implemented in C++, as stateful functors). But I wonder: wouldn't it be possible to simulate closures via @anon by manually maintaining a hidden dict holding the referenced variables?
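
For concreteness, a rough sketch of the dict idea (hypothetical names; as noted in the reply below, the lookup would be slow):

const env = Dict{Symbol,Any}()           # hidden store for "captured" variables
mul_by_s(x) = env[:s] * x                # reads its captured variable through the dict

env[:s] = 2
mul_by_s(3)                              # => 6; rebinding env[:s] changes the result,
                                         # but every call pays an untyped Dict lookup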

@stevengj (Member) commented Aug 31, 2014

For me, almost the whole point of an anonymous (or inner) function is to have a closure that encapsulates local variables. And it would be insanely ugly and confusing to have two kinds of anonymous functions in Julia: slow real closures and fast fake closures. So I don't see this as something that can be merged into Base.

@stevengj (Member) commented Aug 31, 2014

Note also that this implementation probably defeats garbage collection: the local values that get spliced into your top-level type definition by eval will hang around forever.

@timholy (Member) commented Aug 31, 2014

@stevengj, if I were proposing to merge this into base, it would have been a PR, not a package. I'm not personally worried about the GC issue, because my anonymous functions use small parameters. May not serve your needs, but it will serve mine.

@rfourquet, yes, that would work, although the dict lookup would be slow. You're right that for a fast implementation, basically what seems to be missing is the ability to define obj(x) for an instance obj.

Once we also get stagedfunctions (hint, hint), I think you can do the rest of it via metaprogramming, although of course it might be best just to build it all into the compile chain. You'd want

@makeclosures function mc(arg1, arg2)
    ...
    for offset = 1:20
        ...
        f = x -> (x+offset)^2
        somefunction(f, ...)
        ...
    end
end

to get translated into something like this:

stagedfunction register_localvar_##56789(arg1, arg2, offset)
    eval(quote
        type ##12345
            offset::$offset
        end
        call(obj::##12345, x) = (x+obj.offset)^2
        function localvar_constructor_##56789(::Type{$arg1}, ::Type{$arg2})
            ccall(:jl_new_struct_uninit, Any, (Any,), ##12345)
        end
    end)
    :(nothing)
end

function mc(arg1, arg2)
    localvars = localvar_constructor_##56789(typeof(arg1), typeof(arg2))
    ...
    for offset = 1:20
        ...
        register_localvar_##56789(arg1, arg2, offset)
        localvars.offset = offset
        somefunction(localvars, ...)
        ...
    end
end

That would also solve @stevengj's concern about GC.

@jakebolewski (Member) commented Sep 1, 2014

@timholy that would not work either, as the @makeclosures macro could be called in some outer scoping context it does not have access to, such as a function or a let binding. This needs to be handled by the compiler and, for many reasons, some of which you hinted at, it is a hard problem.

@timholy (Member) commented Sep 1, 2014

If the user puts it in the wrong place, that's his/her problem. It was only intended to work when it contains the entire scope (that's why I put it outside the function statement). Of course, implementing it for a let block would make it more complicated.

But one likely problem with the scheme outlined above is if the compiler insists on compiling localvar_constructor_##56789 before register_localvar_##56789. The former won't even exist until the latter gets compiled. I don't know if this is already handled in the compiler, but one could have a two-pass system; compile what you can first, and defer any errors until it fails the second time. But indeed such things make it feel more like "clever ways to hack around the compiler," and it would be better to just do all this in the compiler directly.

@shashi (Contributor) commented Sep 2, 2014

@timholy I meant that anonymous functions created with -> syntax actually appear to be faster in that gist.

This holds for simpler forms of @stevengj's original cases as well.

julia> a(x) = (x, 2)
a (generic function with 1 method)

julia> b(x) = y -> (x, y)
b (generic function with 1 method)

julia> c = b(2)

julia> @time c(1)
elapsed time: 0.001894234 seconds (15064 bytes allocated)
(2,1)

julia> @time c(1)
elapsed time: 3.484e-6 seconds (112 bytes allocated)
(2,1)

julia> @time a(1)
elapsed time: 0.002761751 seconds (7144 bytes allocated)
(1,2)

julia> @time a(1)
elapsed time: 4.714e-6 seconds (112 bytes allocated)

Running @code_llvm on foo and baz from @stevengj's example gives an error saying there are no methods for the given argument type, which is weird.
My bad: the code is available neither for the call to c nor for baz.

@JeffBezanson (Member) commented Sep 2, 2014

Obviously normal anonymous functions should be fast. @rfourquet is right that callable objects enable some new more efficient approaches.
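
For reference, a sketch of the callable-object approach (0.4-style call overloading; the type and names here are hypothetical). The captured variable becomes a concretely typed field, so the compiler can specialize and inline just as it does for ordinary methods:

immutable MulBy{T}
    s::T                                 # the "captured" variable, concretely typed
end
Base.call(m::MulBy, x) = m.s * x         # makes instances callable (0.4 syntax)

baz = MulBy(2)
baz(21)                                  # => 42; specializes on MulBy{Int}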

@carnaval (Contributor) commented Sep 2, 2014

On a related note, I tried a bit to implement function types for anonymous functions a while ago. I've got some Julia time right now that I'd prefer to spend on the GC stuff, but I pushed the branch to my clone if anybody wants to look at it or work on top of it: https://github.com/carnaval/julia/tree/fty. (The test suite doesn't pass with > 1 worker, I suspect due to some lambda serialization issue.)
Some spoilers:

julia> typeof((x::Int,y::Int)->(x+y)::Int)
Function{(Int64,Int64),Int64}

but perhaps more importantly:

julia> map2{A,R}(f :: Function{(A,), R}, a :: Vector{A}) = R[f(x) for x in a];
julia> map2(x::Int->x::Number,Int[1])
1-element Array{Number,1}:
 1
julia> @code_typed map2(x::Int->x::Number,Int[1])
[...] :: Array{Number, 1}

so if I'm not mistaken, this could be a step forward for the infamous "first element type of map" issue.

The function type is determined by declaration (it only looks at the last statement for the return signature for now, so it can be fooled) and not by inference. However, inference understands and (should?) correctly applies function types.
This is admittedly the easy part, since we should also be able to generate a C-ABI function pointer for the lambda. This probably implies codegen at declaration time and bloating each Function object with a second, optional C-callable fptr.
Of course this does not touch at all the hairy question of typing generic functions.
I'm not saying that any final implementation should look like that, but maybe it can get some discussion started :-)

@timholy (Member) commented Sep 8, 2014

Since no one responded here to @carnaval, I'll just say: excited to hear you have some Julia time coming! As always you are willing & able to tackle hard problems. We need more of that, and I'll look forward to whatever comes from your efforts.

@shashi (Contributor) commented Oct 24, 2014

@carnaval This is very cool! I wonder if efficient closures can be implemented in a simpler way after #8712

Could someone explain why the following is the case?

julia> x = rand(int(1e8));

julia> @time map(y->y, x);
elapsed time: 2.734845271 seconds (2400001232 bytes allocated, 24.31% gc time)

julia> @time broadcast(y->y, x);
elapsed time: 5.599463778 seconds (4000311128 bytes allocated, 25.05% gc time)

julia> id(a) = a

julia> @time map(id, x);
elapsed time: 5.109159967 seconds (4000002344 bytes allocated, 26.13% gc time)

julia> @time broadcast(id, x);
elapsed time: 0.545501696 seconds (800329264 bytes allocated, 19.08% gc time)

@timholy (Member) commented Oct 26, 2014

@shashi, it's because broadcast generates an inlined version for each function you pass; map doesn't. See these three pieces of julia/base/broadcast.jl (at ffebd6a):

Lines 69 to 81:

function gen_broadcast_body_cartesian(nd::Int, narrays::Int, f::Function)
    F = Expr(:quote, f)
    quote
        @assert ndims(B) == $nd
        @ncall $narrays check_broadcast_shape size(B) k->A_k
        @nloops($nd, i, B,
            d->(@nexprs $narrays k->(j_d_k = size(A_k, d) == 1 ? 1 : i_d)), # pre
            begin # body
                @nexprs $narrays k->(@inbounds v_k = @nref $nd A_k d->j_d_k)
                @inbounds (@nref $nd B i) = (@ncall $narrays $F v)
            end)
    end
end

Lines 183 to 193:

function gen_broadcast_function(genbody::Function, nd::Int, narrays::Int, f::Function)
    As = [symbol("A_"*string(i)) for i = 1:narrays]
    body = genbody(nd, narrays, f)
    @eval begin
        local _F_
        function _F_(B, $(As...))
            $body
        end
        _F_
    end
end

and Lines 217 to 229:

@eval let cache = Dict{Function,Dict{Int,Dict{Int,Function}}}()
    global broadcast!
    function broadcast!(f::Function, B::$Bsig, As::$Asig...)
        nd = ndims(B)
        narrays = length(As)
        cache_f = @get! cache f Dict{Int,Dict{Int,Function}}()
        cache_f_na = @get! cache_f narrays Dict{Int,Function}()
        func = @get! cache_f_na nd $gbf($gbb, nd, narrays, f)
        func(B, As...)
        B
    end
end

@mauro3 (Contributor) commented Feb 19, 2015

@carnaval any progress on the typed anonymous functions? I would appreciate having those a lot! Of course, having typed generic functions will be nice too. However, typed anonymous functions would cover 90% of my use cases already and would be nice to have in themselves.

@carnaval (Contributor) commented Feb 20, 2015

Hum. That branch must have bitrotted since then. I remember there were (unsurprisingly) problems rebasing on top of the call-overload thing. I'll try to get it back into proof-of-concept state, but I won't have time to actively develop it for the coming month. Maybe opening a PR will bait someone into taking up the work :)

@JeffBezanson (Member) commented Jan 29, 2016

Fixed by #13412.
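
For anyone landing here later: #13412 gave each closure its own callable type whose fields hold the captured variables, essentially the functor approach discussed above, generated automatically by the compiler. Roughly (in current syntax; illustrative, not the exact lowering), the bar/baz example from the top of the thread becomes:

# bar(s) = X -> (for i = 1:length(X); X[i] *= s; end)  lowers to roughly:
struct Bar{T} <: Function
    s::T                                 # captured variable, concretely typed
end
(c::Bar)(X) = (for i = 1:length(X); X[i] *= c.s; end; X)

baz = Bar(2)                             # what bar(2) now effectively returns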

@shashi (Contributor) commented Jan 29, 2016

🍹

@rfourquet (Contributor) commented Jan 29, 2016

:D
