
Towards typed lambdas #10269

Closed · wants to merge 1 commit

Conversation

@carnaval (Contributor)

As promised in #1864, here is something somewhat working on top of current master.
There are roughly five parts to this:

  • Make the Function type parametric (and fatter by 1 pointer, to hold the specialized fptr)
    This was easier than expected. I did not take care of the serializer, so any A->B going through the system image will come out as (Any...,)->Any for now.
    Also, the function types are standard parametric DataTypes, so there is no co/contravariance on return/argument types.
  • Annotate function AST with return/arg types
    Currently this is done in a dirty way: for the return type, I just look at the last instruction of the function to see whether it has the form return blah::X. This is not even correct.
    To integrate this properly, the syntax should probably be discussed and the frontend modified to insert the necessary information into a proper field of the lambda AST node (see jl_lam_(arg|ret)type in ast.c). We should also decide whether to let the user access inferred types here, or to force explicit annotations.
  • Teach inference the reduction rules
    Straightforward; see the changes in inference.jl. I'm not doing anything when the Function type is not a leaf type, so this could be smarter and look at TypeVar upper/lower bounds.
  • Codegen early (on definition) for lambdas annotated with leaf return & arg types.
  • Generate specialized calls through Function objects with tightly inferred type.
    This is the part where I'm least confident that I didn't break anything. In particular, are there cases where a specialized function requires boxing even when we can't tell by looking at the Julia signature? Here we need to generate a specialized C call with only this information, whereas the current code relies on the known LLVM signature. (Not sure if I'm being clear, but looking at the changes in codegen.cpp should explain my point better.)

Examples:

julia> typeof(x::Int -> (x+1)::Int) # Typed lambda
Function{(Int64,),Int64}
julia> function mymap{A,R}(f::Function{(A,), R}, v::Vector{A})
         n = length(v)
         r = Array(R, n)
         for i=1:n
           r[i] = f(v[i])
         end
         r
       end
mymap (generic function with 1 method)
julia> mymap(x::Int->(x+1)::Int, [1,2,3]) # Correct return type
3-element Array{Int64,1}:
 2
 3
 4
julia> mycall{A,R}(f::Function{(A,),R}, x::A) = f(x)
mycall (generic function with 1 method)
julia> @code_llvm(mycall(x::Int->(x+1)::Int, 0)) # Fast "C" call
define i64 @julia_mycall_42900(%jl_value_t*, i64) {
top:
  call void @llvm.dbg.value(metadata i64 %1, i64 0, metadata !14, metadata !16)
  %2 = getelementptr inbounds %jl_value_t* %0, i64 4, i32 0, !dbg !17
  %3 = bitcast %jl_value_t** %2 to i64 (i64)**, !dbg !17
  %4 = load i64 (i64)** %3, align 8, !dbg !17, !tbaa !19
  %5 = call i64 %4(i64 %1), !dbg !17
  ret i64 %5, !dbg !17
}

About generic functions, there are some big issues I can see, namely: mutability of the "generic signature", the difficulty of avoiding a separate type for each function, and the difficulty of dispatching on them.

As I said before, this is a very early POC, probably broken in several ways, and I won't be able to spend much time on it right now. It might be a good basis for discussion, however.

@carnaval (Contributor, author)

Oh, by the way: it is highly buggy right now, since we incorrectly generate a C call when there are captured variables. The fast way to handle this is probably some kind of trampoline (I believe LLVM has support for this), so that we can ignore the difference between a closure and a simple function pointer at call sites.

Add Arg and Ret type parameters to the Function type. Those are determined by annotation at the lambda site, e.g. (a::Arg -> (a+1)::Ret) :: Function{(Arg,),Ret}.
Untyped functions are ::Function{Tuple,Any}.
Inference should type the declarations and call-sites correctly.
@JeffBezanson (Member)

Fortunately I've already written down most of my thoughts on this topic here:

https://github.com/JeffBezanson/phdthesis/blob/master/chap4.tex#L604

@JeffBezanson (Member)

On inferred return types: the body of a function is an Expr with head :body, and the typ field of that expression has the overall inferred return type of the function. Not that we should necessarily use it for this purpose.
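
For illustration, here is a minimal sketch of reading that field through the 0.3/0.4-era code_typed API (the args[3]/typ layout is an internal detail of that era, not a stable interface):

f(x) = x + 1
ast = code_typed(f, (Int,))[1]   # type-inferred :lambda expression for f(::Int)
body = ast.args[3]               # the Expr with head :body
body.typ                         # overall inferred return type, here Int64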

@JeffBezanson (Member)

Given the existence of things like AddFun, it's likely we should give all generic functions their own type, so that + and AddFun() are just the same thing.

From there, we will probably have a hierarchy of function types, using the call generic function. One reason call makes sense is that, at a very low level, function calling actually is an overloaded operation: C function pointers with different types need different code generated in order to call them.

I'll sketch 3 nominal function types that we probably want.

(1) C functions

immutable CFunction{R,A} <: Function
    p::Ptr{Void}
end

call{R,A}(f::CFunction{R,A}, x) = ccall(f.p, R, (A,), x)
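
A hedged usage sketch (0.3/0.4-era cfunction API, later replaced by @cfunction): obtain a C-callable pointer to a Julia method and wrap it in the CFunction above:

plus1(x::Int) = x + 1
ptr = cfunction(plus1, Int, (Int,))   # C function pointer taking and returning Int
fc = CFunction{Int,Int}(ptr)          # R = Int, A = Int
call(fc, 41)                          # => 42, via ccall through the stored pointer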

(2) Nominal arrows (similar to this PR):

immutable Arrow{A,R} <: Function
    f
end

call{A,R}(a::Arrow{A,R}, x::A) = a.f(x)::R
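
A hedged usage sketch of the Arrow above: wrap an untyped callable to get a declared Int -> Int contract, enforced by dispatch on A and the ::R assertion:

inc = Arrow{Int,Int}(x -> x + 1)
call(inc, 3)     # => 4
# call(inc, 3.0) is a MethodError: the argument must match A = Int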

(3) Closures

immutable Closure{F<:Function,E<:Tuple} <: Function
    f::F
    e::E
end

call(c::Closure, args...) = c.f(c, args...)

Then we transform this:

function f(x)
    g(a::Int) = a+x
    g(a::Any) = a*x
    g
end

into this:

_g(env::Closure, a::Int) = a+env.e[1]
_g(env::Closure, a::Any) = a*env.e[1]

function f(x)
    g = Closure(_g, (x,))
    g
end
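
A hedged end-to-end sketch of that transformation, using the Closure and _g definitions above (illustrative only; assumes call overloading behaves as sketched):

c = Closure(_g, (10,))   # what f(10) would construct for g
call(c, 5)               # dispatches to _g(::Closure, ::Int) => 5 + 10 = 15
call(c, 2.5)             # dispatches to _g(::Closure, ::Any) => 2.5 * 10 = 25.0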

Of course, that is just one possible formulation. Here generic functions are considered inherently top-level, as closures are implemented on top of them, not the other way around. I'm starting to feel this is the right approach, as we have already been doing quite well with mostly top-level generic functions. The problem is that inner functions are slow, and this formulation would solve that.

The tradeoff is that this design sacrifices generic functions with methods from multiple scopes, for example the rather useful idiom of wrapping a method definition in a let block. There's also this hardly-ever-used pattern:

function addmethod(f, t)
    f(x::t) = ...
end

which I would be just as happy to disallow. We probably want generic functions to be mutated only at well-defined points, for example for #265.

In any case I think the crucial design decision is, what are the things inside generic functions that get dispatched to? Typed lambdas as in this PR are one candidate. Currently we use the same nearly-useless Function type that exists outside generic functions, which is probably not the right thing. One question is whether you can pull out a particular method and call it independently, and if so how that will work. This could be used for example to hoist method lookups.

I'd like to get a simpler internal representation of functions out of this. Currently we have this overly elaborate chain Function->MethodTable->Method->Function->LambdaStaticData. Surely that can be collapsed a bit. For example GenericFunction->Method would be nice.

@carnaval (Contributor, author)

Interesting. Let me see if I understand this correctly: the only builtin notion of function in the compiler would be "untyped" toplevel generic functions with a blob of (untyped) methods. Their signatures would never leak into the type system, and the inference process would continue (as it does today) to guess types by looking only at constant function invocations.
On top of that we add a hierarchy like the one you described, but defined on the Julia side. We can then wrap generic functions in, e.g., Arrow types to provide typed guarantees about them.
Don't we still need a bit of magic to avoid checking at every call that Arrow(gf, A, R) held up its contract? By which I mean: if I create an Arrow wrapping a specific, compile-time-known generic function, I may know by looking at the signatures that the checks are not needed.
Or is there something I'm missing?

@JeffBezanson (Member)

Yes, you have that right.

Since we already generate specialized signatures for functions internally, it's possible we could expose those as CFunction objects. Then when constructing an Arrow{Int,Int}(gf), we could do the method lookup in advance and hence have an Arrow that would be very efficient to call. The f inside the Arrow could get a type parameter.

It would be ideal to be able to decide whether to inline a function argument. For example map(sin,x) should be fully inlined, but something like integrate(x->exp(-x), ...) we might want to just specialize on Float64->Float64 functions. I'm not yet sure how to do that.

@carnaval (Contributor, author)

Maybe there is value in keeping several kinds of function as the same Julia type, however. Having multiple types will lead to dynamic dispatch on call when we are not sure whether we have, e.g., a closure or not.
For example, merging CFunction and a "CClosure" using LLVM trampolines would simplify code generation at call sites (but it requires codegen cooperation; I don't think we could implement that in Julia alone).
In the same vein, having both function pointers, C-style and jlcall, in every Function object may be a good thing.
It would be weird if, given two methods of a generic function F, one returning an Arrow and the other a Closure, there were a performance advantage in introducing a useless captured variable to trick the compiler into producing a concretely typed Closure output for F(::Any).

@JeffBezanson (Member)

I'm not sure. It might not make sense to try to optimize the case where we don't know what kind of function is being called. There will be user-defined types that define call anyway.

@timholy (Member) commented Feb 23, 2015

<usual_conversation_about_mutability_vs_immutability>If closure construction is not fast, there might be some advantages in making them mutable. See examples like JuliaNLSolvers/Optim.jl#102 (comment). This is why in my recent FastAnonymous experiments I gave each environment variable its own field, using the same name. Of course, if there is no overhead to speak of, then immutable seems better.</usual_conversation_about_mutability_vs_immutability>

@johnmyleswhite (Member)

After reading the content in your thesis, @JeffBezanson, I'm a little confused about your plan for a higher-order function like map. In your thesis you seem to advocate for writing a specialized implementation of map that does speculative typing, rather than trying to rely on arrow types + the types of all other args to map. Would we end up doing the same sort of thing for every higher-order function as well?

@StefanKarpinski (Member)

I think it would be totally ok to have some explicit way to ask for specialization on function arguments – you generally know when you need it and when you don't. Of course, doing the specialization completely automatically would be much slicker, but being able to make map, sort (with custom comparison), filter, etc. fast without too much fuss is the important part.

@johnmyleswhite (Member)

I'm confused, @StefanKarpinski: don't you basically always need specialization if we're going to use map to replace vectorized functions?

@StefanKarpinski (Member)

Yes, but the implementation of map can request specialization explicitly, instead of Julia figuring out automatically that map needs to be specialized on its function argument. That's what I meant.

@JeffBezanson (Member)

@johnmyleswhite higher-order functions vary. map is almost the worst case, since it needs to come up with a single type for the many values it produces. filter is much easier and doesn't have these problems. reduce is also easier since the types naturally get "reduced" along with the values.

You can also push the complexity into the data structure (the "storage strategies" approach), and have an array that changes its representation as values are stored to it. However I doubt this can be made as fast as the hand-crafted map.

Of course one has the option of writing higher-order functions that only accept Arrow types, and callers might have to wrap their function arguments. To be really fancy, I could imagine the compiler inserting Arrow wrappers for functions in some cases, if the function's return type can be inferred exactly.
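
To make the map difficulty concrete, here is a hedged sketch (not Base's actual implementation) of the speculate-and-widen strategy: type the output array after the first result, and reallocate with a widened element type only when a later element doesn't fit:

function map_widen(f, a::Vector)     # assumes a is nonempty
    v = f(a[1])                      # speculate on the first result's type
    r = Array(typeof(v), length(a)); r[1] = v
    for i = 2:length(a)
        v = f(a[i])
        if isa(v, eltype(r))
            r[i] = v
        else                         # speculation failed: widen and copy
            w = Array(typejoin(eltype(r), typeof(v)), length(a))
            copy!(w, 1, r, 1, i - 1)
            w[i] = v
            r = w
        end
    end
    r
end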

@mdcfrancis

@JeffBezanson
I was naively assuming that the relationship would be inverted?

For example:

abstract Arrow{A,R}

immutable Function <: Arrow{Any,Any}
    expr::Expr
end

immutable Closure{A,R} <: Arrow{A,R}
    expr::Expr
    env::Env
end

Base.env{A,R}(op::Arrow{A,R}) = Env()
Base.env{A,R}(op::Closure{A,R}) = op.env
Base.call{A,R}(op::Arrow{A,R}, arg::A) = eval(op.expr, arg, env(op))::R

So in user code we would have

f = (x::String)::String -> x + " world"
typeof(f) == Arrow{String,String}

leading to

f = Function(x)

being identical to

f = Arrow{Any,Any}(x)

Higher-order functions such as map would fall out as something like the following:

map = Arrow{(Arrow{A,R}, Array{A}...), Array{R}}
    ret = R[]
    ...
end

where inlining should be possible for all but "unconstrained closures"; see below.

General question: for closures (since they follow the mutable Scheme model), how does the compiler evaluate the return type if one can modify the type of the closed-over variable from other scopes? I assume there would have to be some bookkeeping around references to the closed-over variable; is this done today?

@JeffBezanson (Member) commented Feb 23, 2015 via email

@mdcfrancis

To check: the argument you are making is that since you can (in many cases) compute the return type of the function being called, it is better to leave typing to the last instant? I'm not sure I get the 'all kinds of functions' reference.

Perhaps it is parametric types which lead me to think this way? Once I have parametric types available, I dislike having to throw away the type information for the return type of the function. It feels more natural to carry the typing through to the very end. Perhaps I am misguided in this? For a higher-order function (e.g. map), having the types nailed down would appear to let me avoid a largish class of performance issues.

My mental model has every function classified by argument and return type, where in most cases the compiler fills in the blanks. If I pass a function from Float64 -> Float64 to map, I'd assume that the map expansion would be as I indicated above, with no further type inference required. If I don't pass an array of Float64 to map, I would expect it to fail. If I don't specify types for my function, then there would be two outcomes: I get an array of Any returned (not ideal, but not unreasonable), or the compiler is in a position to figure out the types of A and R and returns an array of type R, where R is a function of the supplied array and the supplied function.

@JeffBezanson (Member)

You're pretty much right, but you're only paying attention to the easiest case. With my implementation of map (which is already in Base), you can certainly make an Arrow{Float64,Float64} and pass it to map, and it will have all the properties you describe.

The problem is that in general, (1) we care about generic functions most of all, and (2) you currently never need to specify the return type of a function manually. It actually is unreasonable for us to require a type declaration for map to work. It is also unacceptable for the type of array returned by map to depend on types inferred by the compiler, since that means we can't improve the compiler without changing the behavior of people's programs.

For example, sin has 11 methods (if you want a bigger challenge, + has 139). Its type is currently just Function. Despite this, map(sin, x) returns an Array{Float64} given an Array{Float64} argument, and is only 3-4x slower than a custom manually-inlined version:

julia> x=rand(1000,1000);

julia> @time sin(x);   # special-case code
elapsed time: 0.01372651 seconds (7 MB allocated)

julia> @time map(sin,x);
elapsed time: 0.042547229 seconds (38 MB allocated, 2.64% gc time in 2 pauses with 0 full sweep)

julia> typeof(map(sin,x))
Array{Float64,2}

If the compiler gets better, everything will be the same except this will get faster. That's what we want.

@JeffBezanson (Member)

> how does the compiler evaluate the return type if one can modify the type of the closed-over variable from other scopes?

Indeed, it's difficult, and we often can't get sharp type information in those cases. However, with lexical scope you can see all assignments to each variable, so if all those assignments assign the same type, things should be ok.
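
A small illustration of the difficulty: here the captured variable s is assigned two different types, so inference cannot give the inner function a sharp type:

function make_counter()
    s = 0                 # s starts as Int...
    step() = (s += 0.5)   # ...but this assignment widens it to Int-or-Float64
    step
end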

@JeffBezanson (Member)

A couple other points:

As a matter of syntax, we could decide to make (x::T)->( ... )::S give an Arrow{T,S}. I think that's up in the air at this point.

You can also choose to write a method map{T,S}(f::Arrow{T,S}, a::Array{T})::Array{S} and it will work just fine. The only downside is you have to manually construct Arrows to pass to it. Some code bases might want to adopt this style though.
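
A hedged sketch of that style (mymap2 is a hypothetical name; Arrow and call as defined earlier in the thread):

function mymap2{T,S}(f::Arrow{T,S}, a::Array{T})
    r = Array(S, size(a))   # element type known from the Arrow, no inference needed
    for i = 1:length(a)
        r[i] = call(f, a[i])
    end
    r
end

mymap2(Arrow{Int,Int}(x -> x + 1), [1, 2, 3])   # => [2, 3, 4] with eltype Int64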

@vtjnash (Member) commented Feb 9, 2016

obsoleted by #13412 (and the disappearance of anonymous lambdas from the system)

@vtjnash closed this on Feb 9, 2016
@KristofferC deleted the ob/fty branch on June 4, 2018
@KristofferC restored the ob/fty branch on June 4, 2018
@DilumAluthge deleted the ob/fty branch on March 25, 2021