Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alternative syntax for `map(func, x)` #8450

Closed
kmsquire opened this issue Sep 23, 2014 · 283 comments
Closed

Alternative syntax for `map(func, x)` #8450

kmsquire opened this issue Sep 23, 2014 · 283 comments
Milestone

Comments

@kmsquire
Copy link
Member

@kmsquire kmsquire commented Sep 23, 2014

This was discussed in some detail here. I was having trouble finding it, and thought it deserved its own issue.

@quinnj
Copy link
Member

@quinnj quinnj commented Sep 23, 2014

+1

@toivoh
Copy link
Contributor

@toivoh toivoh commented Sep 23, 2014

Or func.(args...) as syntactic sugar for

broadcast(func, args...)

But maybe I'm the only one who would prefer that?
Either way, +1.

@ihnorton
Copy link
Member

@ihnorton ihnorton commented Sep 23, 2014

👎 If anything, I think Stefan's other suggestion of f[...] has a nice similarity to comprehensions.

@johnmyleswhite
Copy link
Member

@johnmyleswhite johnmyleswhite commented Sep 23, 2014

Like @ihnorton, I'm also not super fond of this idea. In particular, I dislike the asymmetry of having both a .+ b and sin.(a).

@quinnj
Copy link
Member

@quinnj quinnj commented Sep 23, 2014

Maybe we don't need special syntax. With #1470, we could do something like

call(f::Callable,x::AbstractArray) = applicable(f,x) ? apply(f,x) : map(f,x)

right? Perhaps this would be too magical though, to get auto-map on any function.

@JeffBezanson
Copy link
Member

@JeffBezanson JeffBezanson commented Sep 23, 2014

@quinnj That one line summarizes my greatest fears about allowing call overloading. I won't be able to sleep for days.

@JeffBezanson
Copy link
Member

@JeffBezanson JeffBezanson commented Sep 23, 2014

Not yet sure it's syntactically possible, but what about .sin(x)? Is that more similar to a .+ b?

I think [] is getting way too overloaded and will not work for this purpose. For example, we'll probably be able to write Int(x), but Int[x] constructs an array and so cannot mean map.

@johnmyleswhite
Copy link
Member

@johnmyleswhite johnmyleswhite commented Sep 23, 2014

I would be onboard with .sin(x).

@StefanKarpinski
Copy link
Member

@StefanKarpinski StefanKarpinski commented Sep 23, 2014

We'd have to claw back some syntax for that, but if Int(x) is the scalar version then Int[x] is reasonable by analogy to construct a vector of element type Int. In my mind the opportunity to make that syntax more coherent is actually one of the most appealing aspects of the f[v] proposal.

@JeffBezanson
Copy link
Member

@JeffBezanson JeffBezanson commented Sep 23, 2014

How does making f[v] syntax for map make the syntax more coherent? I don't understand. map has a different "shape" than the current T[...] array constructor syntax. What about Vector{Int}[...]? Wouldn't that not work?

@quinnj
Copy link
Member

@quinnj quinnj commented Sep 23, 2014

lol, sorry for the scare @JeffBezanson! Haha, the call overloading is definitely a little scary, every once in a while, I think about the kinds of code obfuscation you can do in julia and with call, you could do some gnarly stuff.

I think .sin(x) sounds like a good idea too. Was there consensus on what to do with multi-args?

@jakebolewski
Copy link
Member

@jakebolewski jakebolewski commented Sep 23, 2014

👎. Saving a couple of characters compared to using higher order functions I don't think is worth the cost in readability. Can you imagine a file with .func() / func.() and func() interspersed everywhere?

@JeffBezanson
Copy link
Member

@JeffBezanson JeffBezanson commented Sep 23, 2014

It seems likely we'll remove the a.(b) syntax anyway, at least.

@kmsquire kmsquire changed the title Use `func.(x)` as syntactic sugar for `map(func, x)` Alternative syntax for `map(func, x)` Sep 23, 2014
@kmsquire
Copy link
Member Author

@kmsquire kmsquire commented Sep 23, 2014

Wow, talk about stirring up a bees nest! I changed the name to better reflect the discussion.

@JeffBezanson
Copy link
Member

@JeffBezanson JeffBezanson commented Sep 23, 2014

We could also rename 2-argument map to zipWith :)

@ihnorton
Copy link
Member

@ihnorton ihnorton commented Sep 23, 2014

If some syntax is really necessary, how about [f <- b] or another pun on comprehensions inside the brackets?

(@JeffBezanson you're just afraid that someone is going to write CJOS or Moose.jl :) ... if we get that feature, just put it in the Don't do stupid stuff: I won't optimize that section of the manual)

@StefanKarpinski
Copy link
Member

@StefanKarpinski StefanKarpinski commented Sep 23, 2014

Currently writing Int[...] indicates that you are constructing an array of element type Int. But if Int(x) means converting x to Int by applying Int as a function then you could also consider Int[...] to mean "apply Int to each thing in ...", oh which by the way happens to produce values of type Int. So writing Int[v] would be equivalent to [ Int(x) for x in v ] and Int[ f(x) for x in v ] would be equivalent to [ Int(f(x)) for x in v ]. Of course, then you've lost some of the utility of writing Int[ f(x) for x in v ] in the first place – i.e. that we can statically know that the element type is Int – but if enforce that Int(x) must produce a value of type Int (not an unreasonable constraint), then we could recover that property.

@JeffBezanson
Copy link
Member

@JeffBezanson JeffBezanson commented Sep 23, 2014

Strikes me as more vectorization/implicit-cat hell. What would Int[x, y] do? Or worse, Vector{Int}[x]?

@StefanKarpinski
Copy link
Member

@StefanKarpinski StefanKarpinski commented Sep 23, 2014

I'm not saying it's the best idea ever or even advocating for it – I'm just pointing out that it doesn't completely clash with the existing usage, which is itself a bit of a hack. If we could make the existing usage part of a more coherent pattern, that would be a win. I'm not sure what f[v,w] would mean – the obvious choices are [ f(x,y) for x in v, y in w ] or map(f,v,w) but there are still more choices.

@jakebolewski
Copy link
Member

@jakebolewski jakebolewski commented Sep 23, 2014

I feel that a.(b) is hardly used. Ran a quick test and it is used in only in 54 of the ~4,000 julia source files in the wild: https://gist.github.com/jakebolewski/104458397f2e97a3d57d.

@JeffBezanson
Copy link
Member

@JeffBezanson JeffBezanson commented Sep 23, 2014

I think it does completely clash. T[x] has the "shape" T --> Array{T}, while map has the shape Array{T} --> Array{S}. Those are pretty much incompatible.

To do this I think we'd have to give up T[x,y,z] as a constructor for Vector{T}. Plain old array indexing, A[I] where I is a vector, can be seen as map(i->A[i], I). "Applying" an array is like applying a function (of course matlab even uses the same syntax for them). In that sense the syntax really works, but we would lose typed-vector syntax in the process.

@johnmyleswhite
Copy link
Member

@johnmyleswhite johnmyleswhite commented Sep 23, 2014

I kind of feel like debating syntax here distracts from the more important change: making map fast.

@JeffBezanson
Copy link
Member

@JeffBezanson JeffBezanson commented Sep 23, 2014

Obviously making map fast (which, by the way, needs to be part of a fairly thorough redesign of the notion of functions in julia) is more important. However going from sin(x) to map(sin, x) is very significant from a usability perspective, so to really kill vectorization the syntax is quite important.

@nalimilan
Copy link
Contributor

@nalimilan nalimilan commented Sep 23, 2014

However going from sin(x) to map(sin, x) is very significant from a usability perspective, so to really kill vectorization the syntax is quite important.

Fully agreed.

@toivoh
Copy link
Contributor

@toivoh toivoh commented Sep 23, 2014

I agree with @JeffBezanson that f[x] is pretty much irreconcilable with the current typed array constructions Int[x, y] etc.

@toivoh
Copy link
Contributor

@toivoh toivoh commented Sep 23, 2014

Another reason to prefer .sin over sin. is to finally allow to use e.g. Base.(+) to access the + function in Base (once a.(b) is removed).

@yuyichao
Copy link
Contributor

@yuyichao yuyichao commented May 8, 2016

It also needs to be easily composable for expressions like sin(A .+ cos(B[:,1])).

Is it possible to parse f1.(x, f2.(y .+ z)) as broadcast((a, b, c)->(f1(a, f2(b + c))), x, y, z)?

Edit: I see it is already mentioned above... in the comment hidden by default by @github..

@stevengj
Copy link
Member

@stevengj stevengj commented May 8, 2016

@yuyichao, loop fusion seems like it should be possible if the functions are marked as @pure (at least if the eltypes are immutable), as I commented in #15032, but this is a task for the compiler, not the parser. (But vectorized syntax like this is more for convenience than for squeezing the last cycle out of critical inner loops.)

@stevengj
Copy link
Member

@stevengj stevengj commented May 8, 2016

Remember that the key goal here is to eliminate the need for @vectorized functions; this requires a syntax at least as general, nearly as convenient, and at least as fast. It doesn't require automated loop fusion, though it is nice to expose the user's broadcast intention to the compiler to open the possibility of loop fusion at some future date.

@yuyichao
Copy link
Contributor

@yuyichao yuyichao commented May 8, 2016

Is there any drawback if it does loop fusion too?

@stevengj
Copy link
Member

@stevengj stevengj commented May 8, 2016

@yuyichao, loop fusion is a much harder problem, and it's not always possible even putting aside non-pure functions (e.g. see @StefanKarpinski's exp(log(A) .- sum(A,1)) example above). Holding out for that to be implemented will probably result in it never being implemented, in my opinion — we have to do this incrementally. Start by exposing the user's intention. If we can further optimize in the future, great. If not, we still have a generalized replacement for the handful of "vectorized" functions available now.

@stevengj
Copy link
Member

@stevengj stevengj commented May 8, 2016

Another obstacle is that .+ etc. is not currently exposed to the parser as a broadcast operation; .+ is just another function. My plan is to change that (make .+ sugar for broadcast(+, ...)), as noted above. But again, it is much easier to make progress if the changes are incremental.

@yuyichao
Copy link
Contributor

@yuyichao yuyichao commented May 8, 2016

What I mean is that doing loop fusion by proving doing so is valid is hard, so we can let the parser do the transformation as part of the schematics. In the example above, it can be written as. exp.(log.(A) .- sum(A,1)) and be parsed as broadcast((x, y)->exp(log(x) - y), A, sum(A, 1)).

It's also fine if .+ doesn't belong to the same category yet (just like any non boardcasted function call will be put into the argument) and it's even fine if we will only do this (loop fusion) in a later version. I'm mainly asking if having such a schematics is possible (i.e. non-ambiguous) in the parser and if there's any draw back by allowing written vectorized and fuzed loop this way..

@yuyichao
Copy link
Contributor

@yuyichao yuyichao commented May 8, 2016

doing loop fusion by proving doing so is valid is hard

I mean doing that in the compiler is hard (maybe not impossible), especially since the compiler needs to look into the complicated implementation of broadcast, unless we special case broadcast in the compiler, which is likely a bad idea and we should avoid it if possible...

@stevengj
Copy link
Member

@stevengj stevengj commented May 9, 2016

Maybe? It's an interesting idea, and doesn't seem impossible to define the .( syntax as "fusing" in this way, and leave it up to the caller to not use it for impure functions. The best thing would be to try it and see if there are any hard cases (I'm not seeing any obvious problems right now), but I'm inclined to do this after the "non-fusing" PR.

@yuyichao
Copy link
Contributor

@yuyichao yuyichao commented May 9, 2016

I'm inclined to do this after the "non-fusing" PR.

Totally agree, especially since .+ isn't handled anyway.

@StefanKarpinski
Copy link
Member

@StefanKarpinski StefanKarpinski commented May 9, 2016

I don't want to derail this, but @yuyichao's suggestion gave me some ideas. The emphasis here is on which functions are vectorized, but that's always seems a bit misplaced to me – the real question is which variables to vectorize over, which completely determines the shape of the result. This is why I've been inclined to mark arguments for vectorization, rather than marking functions for vectorization. Marking arguments also allows for functions which vectorize over one argument but not another. That said, we can have both and this PR serves the immediate purpose of replacing the built-in vectorized functions.

@stevengj
Copy link
Member

@stevengj stevengj commented May 9, 2016

@StefanKarpinski, when you call f.(args...) or broadcast(f, args...), it vectorizes over all the arguments. (For this purpose, recall that scalars are treated as 0-dimensional arrays.) In @yuyichao's suggestion of f.(args...) = fused broadcast syntax (which I am liking more and more), I think the fusion would "stop" at any expression that is not func.(args...) (to include .+ etc. in the future).

So, for example, sin.(x .+ cos.(x .^ sum(x.^2))) would turn (in julia-syntax.scm) into broadcast((x, _s_) -> sin(x + cos(x^_s_)), x, sum(broacast(^, x, 2))). Notice that the sum function would be a "fusion boundary." The caller would be responsible for not using f.(args...) in cases where fusion would screw up side effects.

Do you have an example in mind where this would not be enough?

@yuyichao
Copy link
Contributor

@yuyichao yuyichao commented May 9, 2016

which I am liking more and more

I'm glad you like it. =)

Just yet another extension that probably doesn't belong to the same round, it might be possible to use .=, .*= or similar to solve the in place assignment issue (by making it distinct from the normal assignment)

@stevengj
Copy link
Member

@stevengj stevengj commented May 9, 2016

Yes, the lack of fusion for other operations was my main objection to .+= etcetera in #7052, but I think that would be resolved by having .= fuse with other func.(args...) calls. Or just fuse x[:] = ....

@mschauer
Copy link
Contributor

@mschauer mschauer commented May 9, 2016

👍 There are two concept huddled together in this discussion which are in fact quite orthogonal:
the matlab'y "fused broadcasting operations" or x .* y .+ z and apl'y "maps on products and zips" likef[product(I,J)...]and f[zip(I,J)...]. The talking past each other might have to do with that as well.

@stevengj
Copy link
Member

@stevengj stevengj commented May 9, 2016

@mschauer, f.(I, J) is already (in #15032) equivalent to map(x -> f(x...), zip(I, J) if I and J have the same shape. And if I is a row vector and J is a column vector or vice versa, then broadcast indeed maps over the product set (or you can do f.(I, J') if they are both 1d arrays). So I don't understand why you think the concepts are "quite orthogonal."

@mschauer
Copy link
Contributor

@mschauer mschauer commented May 9, 2016

Orthogonal was not the right word, they are just different enough to coexist.

@stevengj
Copy link
Member

@stevengj stevengj commented May 9, 2016

The point is, though, that we don't need separate syntaxes for the two cases. func.(args...) can support both.

@stevengj
Copy link
Member

@stevengj stevengj commented May 9, 2016

Once a member of the triumvirate (Stefan, Jeff, Viral) merges #15032 (which I think is merge-ready), I'll close this and file a roadmap issue to outline the remaining proposed changes: fix broadcast type-computation, deprecate @vectorize, turn .op into broadcast sugar, add syntax-level "broadcast-fusion", and finally fuse with in-place assignment. The last two probably won't make it into 0.5.

@mschauer
Copy link
Contributor

@mschauer mschauer commented May 9, 2016

Hey, I am very happy and grateful about 15032. I would not be dismissive of the discussion though. For example vectors of vectors and similar objects are still very awkward to use in julia but can sprout like weed as results of comprehensions. Good implicit notation not based on encoding iteration into singleton dimensions has the potential to ease this a lot, for example with the flexed out iterators and new generator expressions.

@stevengj stevengj mentioned this issue May 9, 2016
5 of 5 tasks complete
@stevengj
Copy link
Member

@stevengj stevengj commented May 9, 2016

I think this can be closed now in favor of #16285.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
You can’t perform that action at this time.