New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Nullable lifting infrastructure #18758

Closed
wants to merge 1 commit into
base: master
from

Conversation

Projects
None yet
@davidagold
Contributor

davidagold commented Oct 1, 2016

This PR introduces a generic lifting framework based on the "higher-order lifting" approach. Once this is rebased on top of #18484 it should allow for the following:

julia> g(x::Int) = x
g (generic function with 1 method)

julia> _g = lift(g)
Lifted{#g}(g,Dict{Tuple{Vararg{DataType,N}},DataType}())

julia> _g(Nullable(1))
Nullable{Int64}(1)

julia> _g(Nullable{Int}())
Nullable{Int64}()

julia> lift(+, Int, Nullable(1), 2)
Nullable{Int64}(3)

julia> lift(+, Int, Nullable(), 2)
Nullable{Int64}()

That is, lift(f::F) returns a Lifted{F}, which, when called on arguments xs..., lowers to lift(f, U, xs...), where U is chosen by type inference. We include the return type parameter U as an argument to lift for cases when U is invariant over many applications of lift, e.g. when mapping some f over a tightly typed NullableArray. Having a Lifted type will allow us to dispatch such higher-order functions on whether or not an f is lifted. This in turn will allow us to define, say,

map{F}(f::Lifted{F}, X::NullableArray)

in a way that takes advantage of the aforementioned invariance.

This PR also implements three-valued logic semantics for lifted & and |:

julia> _and = lift(&)
Lifted{Base.#&}(&,Dict{Tuple{Vararg{DataType,N}},DataType}())

julia> _and(Nullable{Bool}(), Nullable(false))
Nullable{Bool}(false)

This PR could perhaps use some fine-tuning with respect to the use of splatting and whether or not to @inline the lift(f, U, xs...) definitions.

cc: @johnmyleswhite @nalimilan @quinnj @davidanthoff @JeffBezanson @TotalVerb @vchuravy

EDIT: Tests should pass once this is rebased.

@TotalVerb

This comment has been minimized.

Contributor

TotalVerb commented Oct 1, 2016

I would honestly rather this be a method of broadcast. Conceptually, it makes a lot of sense.

@davidagold

This comment has been minimized.

Contributor

davidagold commented Oct 1, 2016

I would honestly rather this be a method of broadcast.

@TotalVerb I'm not opposed to this, as long as there is then a method broadcast(f, U::DataType, x), since having the return type parameter argument U in lift(f, U, x) is very handy when mapping over a NullableArray (really, any array with eltype Nullable{T}). I don't think we want to end up iterating over some such container and calculating Core.Inference.return_type(f, Tuple{T}) at each iteration. Having a Lifted wrapper type would still allow for handy dispatch.

@TotalVerb

This comment has been minimized.

Contributor

TotalVerb commented Oct 1, 2016

I suppose the issue is that there are two senses of broadcasting here: broadcasting over the nullables themselves (the lifting) and broadcasting the resulting operation over the array. It may be difficult to combine those into one operation, especially one as semantically dense as broadcast (which is quite difficult to understand already).

I really don't like the type "annotations" here. Couldn't inference be applied once, with Core.Inference.return_type(f, Tuple{eltype(xs).parameters[1]})? And even if applied to each element of the array, that should be a compile-time cost, and thus in reality only applied once for the program's duration.

@davidagold

This comment has been minimized.

Contributor

davidagold commented Oct 1, 2016

That's where the Lifted wrapper may come in handy:

function broadcast{F}(_f::Lifted{F}, X, Y)
    ...
    for (x, y) in broadcasted_indices
        res[i] = broadcast(_f.f, U, x, y)
    end
    res
end
@davidagold

This comment has been minimized.

Contributor

davidagold commented Oct 1, 2016

I really don't like the type "annotations" here. Couldn't inference be applied once, with Core.Inference.return_type(f, Tuple{eltype(xs).parameters[1]})?

Would you please elaborate? I'm not sure I understand.

And even if applied to each element of the array, that should be a compile-time cost, and thus in reality only applied once for the program's duration.

We'll need to get somebody who's familiar with these things on the line. My naive thinking was that calling Core.Inference.return_type would run type inference each time it's called. I suppose this all depends on whether or not the results of type inference are cached for each argument signature along with the compiled method.

@TotalVerb

This comment has been minimized.

Contributor

TotalVerb commented Oct 1, 2016

For lifted(foo), I dislike the implementation using a cache. Consider this:


A naive implementation of lifted(foo) could be like this:

lifted(foo) = (x, y) -> isnull(x) || isnull(y) ? NULL : unsafe_lift(foo, x, y)

where NULL is Nullable{Union{}}() and unsafe_lift is a function that gets unsafely the values of x and y and foos them.

The issue with this function is that it is type-unstable, as it could either return Nullable{Union{}} or Nullable{T}, where T is the actual type of foo(x, y). Let's assume for a minute that it's safe to compute unsafe_lift(foo, x, y) always, again for simplicity. This is obviously not true, as many useful Nullable types have values that are not safe to simply get. But for sake of argument, I would argue the following is a simple and good implementation:

stable_ifelse{T,U}(b::Bool, x::Nullable{T}, y::Nullable{U}) =
    ifelse(b, Nullable{Union{T,U}}(x), Nullable{Union{T,U}}(y))
lifted(foo) = (x, y) -> stable_ifelse(isnull(x) || isnull(y), NULL, unsafe_lift(foo, x, y))

This works for certain values, and has good generated code as far as I can tell. It's based on a combination of promoting distinct nullable types to a stable one. Even better, there is no inference-dependent behaviour at all here.


To me, it's clear that for operations and types that will support it (null_safe_op as it's called), the above code is superior. Now to deal with the tricky case.

Now, for safety, we need to turn this select into a real branch. But we can't make a stable_if like we could with stable_ifelse, without using Core.Inference.return_type. Let's consider two distinct options:

  • Give up type stability of the function in this case. We must try to minimize inference-dependent behaviour, and Nullable{Union{}} really isn't so bad.
  • Use Core.Inference.return_type, but in a sensible way.

For the first option, the implementation is obvious.

For the second, I prefer a solution similar to what map and comprehensions do:

  • if there is a value, use the actual type, and do not try to infer anything. The standard example:
julia> f(x) = rand(Bool) ? 1 : 1.0
f (generic function with 1 method)

julia> map(f, [1])
1-element Array{Int64,1}:
 1

julia> map(f, [1])
1-element Array{Float64,1}:
 1.0
  • if there is no value, try to infer the type and use that.
julia> map(x -> 1, Int[])
0-element Array{Int64,1}

julia> map(f, Int[])
0-element Array{Union{Float64,Int64},1}

So to be consistent, we could do:

lifted2(foo) = (x, y) -> isnull(x) || isnull(y) ?
    Nullable{Core.Inference.return_type(foo, Tuple{typeof(x).parameters[1], typeof(y).parameters[1]})}() :
    unsafe_lift(foo, x, y)

But I would actually argue against consistency here, because if we are going to use the actual type in the real case, then we should avoid non-concrete parameters in Nullable. By that I mean that Nullable{Union{Float64,Int64}}() is of little use, since the inferred type cannot be stable anyway. Instead, I think it would be best to use the following rule:

  • if there is no value, try to infer the type, and use that if and only if it's concrete; otherwise use Union{}

Which we can implement as:

Base.@pure inferred_concrete_return_type{T,U}(f,::Type{T},::Type{U}) =
    let X = Core.Inference.return_type(f,Tuple{T,U})
        isleaftype(X) ? X : Union{}
    end

lifted2(foo) = function{T,U}(x::Nullable{T}, y::Nullable{U}) -> isnull(x) || isnull(y) ?
    Nullable{inferred_concrete_return_type(foo, Tuple{T,U})}() :
    unsafe_lift(foo, x, y)

As-is, this will not be type stable. However, I think it could be manually made type stable through inference special-casing. This function has some degree of utility outside of nullables, so a case could be made for such a special case.

@TotalVerb

This comment has been minimized.

Contributor

TotalVerb commented Oct 1, 2016

@davidagold Inference being run twice is not a concern. However, the result of inference not actually making the function type stable is indeed a concern. If we can't make the function type stable, it's obviously better just to return NULL. I think that can be made faster with some optimizations in the compiler.

@davidanthoff

This comment has been minimized.

Contributor

davidanthoff commented Oct 2, 2016

Why does this have to be in base, couldn't this stay in a package?

@TotalVerb

This comment has been minimized.

Contributor

TotalVerb commented Oct 2, 2016

You mean the entire nullable infrastructure, or just the lifting part?

@davidanthoff

This comment has been minimized.

Contributor

davidanthoff commented Oct 2, 2016

Just the stuff in this PR. It is not really clear to me where this would be used. I guess in NullableArrays, but couldn't it be in that package in that case?

isnull(x),
ifelse(
isnull(y),
Nullable{Bool}(),

This comment has been minimized.

@tkelman

tkelman Oct 2, 2016

Contributor

this uses a lot of vertical space, would be better with more than a single token per line

This comment has been minimized.

@nalimilan

nalimilan Oct 3, 2016

Contributor

Since Bool is safe to evaluate even when missing, you could also use the two-argument form of Nullable to make this more compact.

@andyferris

This comment has been minimized.

Contributor

andyferris commented Oct 3, 2016

Is lift too generic of a name to put into base like this? I see the meaning is clear in the context of nullables, but why not use a name which is clear without that context? (e.g. liftnull, and LiftNulls, which I admit is slightly longer to write...)

@kshyatt kshyatt added the missing data label Oct 3, 2016

@nalimilan

This comment has been minimized.

Contributor

nalimilan commented Oct 3, 2016

Unless we have another potential use for the term lift, I would keep it that way. Though we could probably start in a package, together with lifted operators. Anyway we'll need that for 0.5 support.

if isnull(x)
return Nullable{U}()
else
return Nullable{U}(f(unsafe_get(x)))

This comment has been minimized.

@nalimilan

nalimilan Oct 3, 2016

Contributor

Not really related, but I wonder whether the compiler isn't actually smart enough to generate the same code when using get. Is unsafe_get really needed except in very specific cases?

This comment has been minimized.

@davidagold

davidagold Oct 4, 2016

Contributor

I use unsafe_get here because it is generic over Nullable and non-Nullable types, whereas get(x) for unqualified x is not defined.

This comment has been minimized.

@nalimilan

nalimilan Oct 4, 2016

Contributor

Got it. That difference is a bit weird though. Maybe we should move to unwrap and have it work for any scalar.

@ViralBShah

This comment has been minimized.

Member

ViralBShah commented Oct 4, 2016

lift is used in Reactive.jl too. Cc @shashi

@ViralBShah

This comment has been minimized.

Member

ViralBShah commented Oct 4, 2016

I still think this is a good use of the word lift.

@andyferris

This comment has been minimized.

Contributor

andyferris commented Oct 4, 2016

Some time ago I heard of talk of having "automatic" lifting for elementary operations like + and so on such that most generic functions would work correctly with Nullables - is this also still planned?

If so, is this PR is to allow us to address dispatch issues mostly? Or is it instead of automatic lifting entirely?

@shashi

This comment has been minimized.

Contributor

shashi commented Oct 4, 2016

@ViralBShah in Reactive lift got renamed to map a while ago.

@nalimilan

This comment has been minimized.

Contributor

nalimilan commented Oct 4, 2016

Some time ago I heard of talk of having "automatic" lifting for elementary operations like + and so on such that most generic functions would work correctly with Nullables - is this also still planned?

If so, is this PR is to allow us to address dispatch issues mostly? Or is it instead of automatic lifting entirely?

@andyferris AFAIK there are no plans for automatic lifting of all functions. But NullableArrays.jl currently includes lifted operators, and I plan to open a PR again to move them into Base (following up #16988).

@martinholters

This comment has been minimized.

Member

martinholters commented Oct 4, 2016

I know close to nothing of category theory, but it's still enough that I would expect lift(f) to yield a function that can be applied to whatever container-like thing, be it a Nullable or an Array. (And lift(lift(f)) to an Array of Nullables.) From that it would follow that Lifted{f}(xs...) should forward to broadcast(f, xs...) and broadcast should take care of unwrapping Nullables as suitable. (And then one can ask oneself whether lift is needed at all.)

BUT this gets extremely fishy if different container-likes get merged, like broadcast(+, Nullable(3), [1, 2]). Should this be Nullable([4, 5]) or [Nullable(4), Nullable(5)]? Or forbidden? The pragmatic way would probably be to have the infrastructure for Nullables stay separate and to reflect it in the name - e.g. liftnull. More theoretically pleasing, but not necessarily more useful, one could require the above to be written as broadcast(lift(+), Nullable(3), Nullable([1, 2])) or broadcast(lift(+), [Nullable(3)], [1, 2]), depending on the desired result.

@ViralBShah

This comment has been minimized.

Member

ViralBShah commented Oct 4, 2016

I would think that for the case of arrays, what I would want is always Nullable([4, 5]).

@nalimilan

This comment has been minimized.

Contributor

nalimilan commented Oct 4, 2016

Should this be Nullable([4, 5]) or [Nullable(4), Nullable(5)]? Or forbidden? The pragmatic way would probably be to have the infrastructure for Nullables stay separate and to reflect it in the name - e.g. liftnull.

I think that's why @davidagold proposes lift here instead of doing lifting using broadcast as in #16961. The bikeshedding on the name really is a detail.

@davidagold

This comment has been minimized.

Contributor

davidagold commented Oct 4, 2016

@TotalVerb Thank you for your thorough explanation. I really appreciate your taking the time to go through those thoughts, and I'd be interested to hear more about the compiler/inference work that would complement the present proposal.

@andyferris I'm not exactly sure anymore what precisely "automatic lifting" means. In this case it sounds like it means that f(x::Nullable{T}) "just works" given a method f(x::T). This PR won't provide for that behavior. Short of work in the compiler, I think the way to provide such behavior is actually to implement f(x::Nullable{T}).

But we could also consider "automatic lifting" restricted to a particular context, such as a macro invocation: @blah f(x) for x::Nullable{T}. Using such a macro it is possible to replace f(x) with lift(f, x). Now, if the entire point of the macro is to replace f(x) with lift(f, x), then one would hardly call this "automatic lifting". But if the macro has another purpose -- say, it is a querying macro that does other syntactic transformations, such as interpreting attributes (e.g. sepal_length) as column references (e.g. df[:sepal_length]) -- then one can throw in the replacement of f(x) with lift(f, x) for free. In that sense, we could think about "automatic lifting" as being a feature of querying macros, but not applicable to unadorned expressions f(x).

Given the ambiguity of the phrase "automatic lifting", I've found it more helpful to think in terms of "lifting by some method or other". So, the first approach, in which we define the method f(x::Nullable{T}), might be called something like "method extension lifting". The second approach -- the one implemented in this PR -- might be called something like "higher-order lifting". Then we could note that method extension lifting gives "proper" behavior to f(x::Nullable{T}) in all contexts, whereas higher-order lifting gives "proper" behavior to f(x::Nullable{T}) only in the context of a supporting macro.

@nalimilan

This comment has been minimized.

Contributor

nalimilan commented Oct 4, 2016

@TotalVerb's solution sounds good. Though, if returning Union{} in case of type instability requires some work in inference code, I'd rather start with the easier solution of returning the actual inferred union, and improve things later.

@davidanthoff

This comment has been minimized.

Contributor

davidanthoff commented Oct 5, 2016

I'd like to raise the question again whether this has to be in base at this point, or whether this could live in a package. Given that this a) doesn't add additional methods to base functions for base types and b) is not used by anything else in base, it seems to me that this could well live in a package, mature there and once we have a better sense whether this is a good strategy or not it could still be put into base at a later point.

@davidagold

This comment has been minimized.

Contributor

davidagold commented Oct 5, 2016

@davidanthoff I think you're probably right that this can mature in a package for now. But having this PR against Base has induced some very helpful discussion, so maybe it's worth keeping open and updating periodically.

@TotalVerb I'm sorry I didn't believe you re: inference being run twice:

julia> function lift(f, x)
           U = Core.Inference.return_type(f, Tuple{eltype(typeof(x))})
           if isnull(x)
               return Nullable{U}()
           else
               return Nullable(f(unsafe_get(x)))
           end
       end
lift (generic function with 1 method)

julia> f(x) = x
f (generic function with 1 method)

julia> @code_warntype lift(f, 1)
Variables:
  #self#::#lift
  f::#f
  x::Int64
  U::Type{Int64}

Body:
  begin 
      U::Type{Int64} = $(QuoteNode(Int64)) # line 3:
      unless $(QuoteNode(false)) goto 6 # line 4:
      return $(Expr(:new, Nullable{Int64}, false))
      6:  # line 6:
      return $(Expr(:new, Nullable{Int64}, true, :(x)))
  end::Nullable{Int64}

That's really neat. Does this mean that Core.Inference.return_type is lowered differently than a standard function call?

"""
immutable Lifted{F}
f::F
cache::Dict{Tuple{Vararg{DataType}}, DataType}

This comment has been minimized.

@JeffBezanson

JeffBezanson Oct 20, 2016

Member

This cache should be removed. The compiler already implements it.

NOTE: There are two exceptions to the above: `lift(|, Bool, x, y)` and
`lift(&, Bool, x, y)`. These methods both follow three-valued logic semantics.
"""
function lift(f, U::DataType, x)

This comment has been minimized.

@JeffBezanson

JeffBezanson Oct 20, 2016

Member

Should probably be Type instead of DataType.

@JeffBezanson

This comment has been minimized.

Member

JeffBezanson commented Oct 20, 2016

Does this mean that Core.Inference.return_type is lowered differently than a standard function call?

No, it just means that inference can infer its return value.

@davidagold

This comment has been minimized.

Contributor

davidagold commented Oct 20, 2016

Given the comments above, it seemed appropriate to do return typing inside lift. So, the method signature is now just, e.g. lift(f, x).

Should lift(f) return a lambda or a Lifted wrapper?

@KristofferC

This comment has been minimized.

Contributor

KristofferC commented Jan 7, 2018

Nullable is no longer in Base

@KristofferC KristofferC closed this Jan 7, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment