Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fusion of nested f.(args) calls into a single broadcast call #17300

Merged
merged 12 commits into from
Jul 12, 2016

Conversation

stevengj
Copy link
Member

@stevengj stevengj commented Jul 6, 2016

As discussed in #16285 and suggested by @yuyichao, this PR implements fusion of nested f.(args) calls into a single broadcast loop. For example, sin.(cos.(x)) is transformed by the parser into broadcast(x -> sin(cos(x)), x).

Because this is a syntactic guarantee, there is no need to worry about side effects or function purity, so it is very different from loop fusion viewed as a compiler optimization.

To do:

Note that operators like .+ are still handled separately, but the long-term plan in #16285 is to treat dot operators as .( calls.

@stevengj
Copy link
Member Author

stevengj commented Jul 6, 2016

Note that this is still an early preview. I literally committed the instant sin.(cos.(x)) seemed to work. All of my other testing so far has been at the Scheme level (calling individual sub-functions of expand-fuse-broadcast on various symbolic expressions to make sure they are sensible). That being said, I think it is feature-complete, modulo bugs.

@stevengj stevengj mentioned this pull request Jul 6, 2016
5 tasks
@stevengj
Copy link
Member Author

stevengj commented Jul 6, 2016

There are also some potential optimizations that I haven't implemented. For example, if you do sin.(atan.(x,1)), that is currently transformed into the equivalent of broadcast((x,y) -> sin(atan(x,y)), x, 1). In principle, the parser can detect the literal constant 1 and omit the second broadcast argument, giving broadcast(x -> sin(atan(x,1)), x) instead.

Are there any other arguments besides numeric literals than can be "inlined" in this way?

@stevengj
Copy link
Member Author

stevengj commented Jul 6, 2016

Performance is looking good:

f(x) = x + 1
f1(x) = broadcast(f, broadcast(f, broadcast(f, x)))
f2(x) = [f(f(f(x))) for x in x]
f3(x) = f.(f.(f.(x)))
@time f1(x)
@time f2(x)
@time f3(x)

gives (after the usual repetitions):

  0.145307 seconds (66 allocations: 228.884 MB, 55.97% gc time)
  0.031480 seconds (2 allocations: 76.294 MB, 21.96% gc time)
  0.028810 seconds (22 allocations: 76.295 MB, 21.56% gc time)

i.e. it is doing about as well as manual loop fusion.

@nalimilan
Copy link
Member

Wait... If you add that now, what feature are we going to introduce next to justify another release after 0.5? ;-)

@Keno
Copy link
Member

Keno commented Jul 6, 2016

It's not clear that we should do this now. 0.5 is \epsilon close to release, this is untested and mostly a performance question (people can write this syntax, it'll get faster in the next release).

@stevengj
Copy link
Member Author

stevengj commented Jul 6, 2016

@Keno, that's true. On the other hand, this feature is relatively non-disruptive because the f.(args...) syntax itself is new in 0.5, and there is some argument in favor of settling on the semantics now.

@stevengj
Copy link
Member Author

stevengj commented Jul 7, 2016

Will need a rebase and some fixes after #17307.

@martinholters
Copy link
Member

people can write this syntax, it'll get faster in the next release

...and it will change the result when applied to non-pure functions. So either merge this PR for 0.5, or put a big warning in the docs.

@ivarne
Copy link
Member

ivarne commented Jul 7, 2016

Are there any other arguments besides numeric literals than can be "inlined" in this way?

Calls to @pure functions? non-numeric immutable literals (eg. Strings)

@martinholters
Copy link
Member

Not meaning to ruin anyones day, but with the promote_op fallback, there can be nasty surprises:

julia> log.(log.(log.([1000]))) # if this
1-element Array{Float64,1}:
 0.658889

julia> broadcast(x -> log(log(log(x))), [1000]) # is rewritten to this
ERROR: DomainError:
 in nan_dom_err at ./math.jl:0 [inlined]
 in log(::Float64) at ./math.jl:140
 in (::##1#2)(::Int64) at ./REPL[15]:1
 in promote_op at ./number.jl:0 [inlined]
 in promote_eltype_op at ./abstractarray.jl:0 [inlined]
 in broadcast(::Function, ::Array{Int64,1}) at ./broadcast.jl:197
 in eval(::Module, ::Any) at ./boot.jl:234
 in macro expansion at ./REPL.jl:92 [inlined]
 in (::Base.REPL.##1#2{Base.REPL.REPLBackend})() at ./event.jl:46

This is not meant as an argument against this PR though, but rather as motivation to rethink that fallback...

@nalimilan
Copy link
Member

This is not meant as an argument against this PR though, but rather as motivation to rethink that fallback...

Damn fallback. We really need to use type inference instead of actually calling the function on one(T)...

@andyferris
Copy link
Member

+100 for this PR

Note that operators like .+ are still handled separately, but the long-term plan in #16285 is to treat dot operators as .( calls.

@stevengj I'm not sure of their current status (I only checked a couple weeks ago), but can we at minimum have a default (fallback) definition of .+ etc in terms of broadcast in v0.5? This would significantly reduce the number of functions needed for new container types.

It's not clear that we should do this now. 0.5 is \epsilon close to release, this is untested and mostly a performance question (people can write this syntax, it'll get faster in the next release).

@Keno - it may be a performance question, but this change is totally kick-ass. I was disappointed when I first switched to Julia reading the improvements over MATLAB in speed to find out that the same memory mistakes MATLAB made for the kind of work I was doing at the time (large, memory-limited linear algebra problems) also existed in Julia. Of course, there was Devectorize.jl and other tricks, but for a newbie it was difficult to know what to do.

My personal opinion was always that reducing the memory overhead for Julia's core, ex-MATLAB audience should be one of the priorities. On the other hand, you definitely make valid points about rushed features! (In fact, I think changing the parsing of .+ etc for 0.5 is more important in terms of presenting a uniform interface with the performance improvements here coming later).

@andreasnoack
Copy link
Member

@martinholters Maybe a reason not to throw so much

julia> import NaNMath

julia> broadcast(x -> NaNMath.log(NaNMath.log(NaNMath.log(x))), [1000])
1-element Array{Float64,1}:
 0.658889

@stevengj
Copy link
Member Author

stevengj commented Jul 7, 2016

@andreasnoack, please continue that discussion at #17314, since it is orthogonal to this PR.

Moreover, *nested* ``f.(args...)`` calls are *fused* into a single ``broadcast``
loop. For example, ``sin.(cos.(X))`` is equivalent to ``broadcast(x -> sin(cos(x)), X)``,
similar to ``[sin(cos(x)) for x in X]``: there is only a single loop over ``X``,
and a single array is allocated for the result. [In contrast, ``sin(cos(X))``
Copy link
Contributor

@tkelman tkelman Jul 7, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope brackets don't have special meaning in rst

I don't think we use square brackets for parenthetical comments elsewhere in the docs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The usual typographical convention is to use square brackets for parenthetical comments that have nested parentheses.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, I haven't come across that convention. Do you have a citation for it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In math, the usual style is to nest parens as {[(...)]}. In text, it is common to recommend square inside round parens if you must nest, but that isn't possible here because the nested parens are code and hence are constrained by Julia syntax.

http://blog.apastyle.org/apastyle/2013/05/punctuation-junction-parentheses-and-brackets.html
http://www.chicagomanualofstyle.org/16/ch12/ch12_sec026.html
http://www.chicagomanualofstyle.org/16/ch06/ch06_sec099.html

@martinholters
Copy link
Member

Another subtlety: If f and g define their own promote_ops, they won't be used in f.(g.(x)), potentially leading to a surprising return type. Silly example:

julia> Number.(sin.([0 pi/2]))
1×2 Array{Number,2}:
 0.0  1.0

julia> broadcast(x -> Number(sin(x)), [0.0 pi/2])
1×2 Array{Float64,2}:
 0.0  1.0

One may argue whether the promote_op for Number is flawed, but the point here is that fusing the operation gives a different return type. The solution here probably would be to implement a promote_op for the composed function invoking promote_op for the fused functions.

@stevengj
Copy link
Member Author

stevengj commented Jul 8, 2016

As @JeffBezanson remarked, promote_op inherently doesn't scale. @martinholters, what you're describing seems essentially equivalent to type inference, possibly combined with return-type declarations for a few functions.

@stevengj
Copy link
Member Author

stevengj commented Jul 10, 2016

(Test failure should be fixed by #17318.)

@martinholters
Copy link
Member

what you're describing seems essentially equivalent to type inference

Yes, well, but without any dependencies on the compiler's mood, so more predictable across Julia versions. But if we could get rid of promote_op completely, that would be all the better...

@StefanKarpinski
Copy link
Member

StefanKarpinski commented Jul 11, 2016

I would mention that this is not just a performance issue: if we release f.(x...) syntax in 0.5 and dot fusion in 0.6, we will potentially break people's code precisely because of all of the above gotchas. If we do this now, the fused dot behavior and the unfused non-dot behavior may be surprisingly different, but code won't break.

@StefanKarpinski
Copy link
Member

So if this can be made ready this week, I think we should consider doing it now.

@stevengj
Copy link
Member Author

stevengj commented Jul 12, 2016

Okay, dot calls with keyword arguments are now handled. (They are not handled correctly on master even for non-fused dot calls.)

If the tests pass, this should be ready to merge as far as I can tell right now.

@JeffBezanson
Copy link
Member

Some impressive sexpr-slinging in here. :)

@stevengj
Copy link
Member Author

It's always a questionable sign if you find yourself wishing that cadaddr and cddaddr were defined. 😉

@JeffBezanson JeffBezanson changed the title WIP: fusion of nested f.(args) calls into a single broadcast call fusion of nested f.(args) calls into a single broadcast call Jul 12, 2016
@JeffBezanson JeffBezanson merged commit 8fdaf91 into master Jul 12, 2016
@stevengj stevengj deleted the dot-fusion branch July 12, 2016 19:32
@davidanthoff
Copy link
Contributor

Thanks for finishing this on time, very, very cool that this made it for 0.5!!!

@gabrielgellner
Copy link
Contributor

Does it makes sense that:

x = rand(10000)
@time sin(cos(x)); #  0.000243 seconds (12 allocations: 156.625 KB)
@time sin.(cos.(x)); # 0.012385 seconds (6.85 k allocations: 386.261 KB)
@time broadcast(xval->sin(cos(xval)), x); #  0.012997 seconds (6.77 k allocations: 382.771 KB)

@time sin.(cos.(x) .+ 0) #  0.000325 seconds (50 allocations: 235.953 KB)

seems to be a strange speed regression on my machine using the dotted syntax if the fusion kicks in (what I imagine is caused by this, hence the last example trying to use the .+ to get different code...). But it is consistent with the broadcast call, so maybe this is how the code is meant to work? Just feels strange that adding the .+ gives such a huge speedup!

Is there a way to see what the lowered code looks like from the REPL?

@stevengj
Copy link
Member Author

The .+ prevents fusion, because dot operators are not handled as f.(args...) calls yet. Maybe this is a global-scope thing? Try f(x) = sin.(cos.(x)).

@yuyichao
Copy link
Contributor

Note that each line that uses the dot function call syntax generates a new anonymous function. You need to wrap it in a function to avoid benchmarking compiler

@stevengj
Copy link
Member Author

stevengj commented Jul 12, 2016

Timings look good when wrapped in a function:

x = rand(10^4)
f1(x) = sin(cos(x))
f2(x) = sin.(cos.(x))
@time f1(x); @time f2(x);
...repeat a few times...
@time f1(x); @time f2(x);

  0.000407 seconds (12 allocations: 156.625 KB)
  0.000370 seconds (15 allocations: 78.500 KB)

In this case, the sin and cos functions are expensive enough that there is not a huge performance advantage to fusing the two loops, but the memory advantage is clear.

@andyferris
Copy link
Member

Nice! I have to say I'm impressed by all of this work 👍

Out of curiosity, will .+ etc. be dealt with for 0.5?

@stevengj
Copy link
Member Author

No, I think dot operators will be left as-is for 0.5.

(In consequence, this loop fusion probably won't have a big practical impact until 0.6. But it is good to get at least a preview in 0.5.)

@gabrielgellner
Copy link
Contributor

Ohhhh I see. It is the anonymous function call in the global scope, versus having broadcast(sin, broadcast(cos, x)) not creating the anonymous function. Kind of funny that the more elegant way of doing this will make using it in the natural way in the REPL slower ;)

Thanks so much for the clarification. I am so excited for this feature (the .( syntax, not the fusion specifically).

@stevengj
Copy link
Member Author

Yeah, it would be nice if the compiler were smart enough to re-use identical anonymous functions.

@andyferris
Copy link
Member

andyferris commented Jul 12, 2016

No, I think dot operators will be left as-is for 0.5.

That leads me to predict you'll see some packages using code like +.(A, *.(B, C)), instead of A .+ B .* C, and broadcast!((a,b,c) -> a + b*c, A, A, B, C) instead of A .+= B.*C.

It will be a super-nice syntax change, whenever it lands. :)

@gabrielgellner
Copy link
Contributor

@andyferris as @StefanKarpinski has mentioned, in another issue, it would be really great if we could find a way for Julia to slap you if you have a tendency to write too much code that way... I just hope this is all that happens and they don't also use 0-based arrays on top of such abominations.

@andyferris
Copy link
Member

as @StefanKarpinski has mentioned, in another issue, it would be really great if we could find a way for Julia to slap you if you have a tendency to write too much code that way

lol :)

@stevengj stevengj added the broadcast Applying a function over a collection label Aug 2, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
broadcast Applying a function over a collection
Projects
None yet
Development

Successfully merging this pull request may close these issues.