WIP: Nullable lifting infrastructure #18758

davidagold · 2016-10-01T22:12:34Z

This PR introduces a generic lifting framework based on the "higher-order lifting" approach. Once this is rebased on top of #18484 it should allow for the following:

julia> g(x::Int) = x
g (generic function with 1 method)

julia> _g = lift(g)
Lifted{#g}(g,Dict{Tuple{Vararg{DataType,N}},DataType}())

julia> _g(Nullable(1))
Nullable{Int64}(1)

julia> _g(Nullable{Int}())
Nullable{Int64}()

julia> lift(+, Int, Nullable(1), 2)
Nullable{Int64}(3)

julia> lift(+, Int, Nullable(), 2)
Nullable{Int64}()

That is, lift(f::F) returns a Lifted{F}, which, when called on arguments xs..., lowers to lift(f, U, xs...), where U is chosen by type inference. We include the return type parameter U as an argument to lift for cases when U is invariant over many applications of lift, e.g. when mapping some f over a tightly typed NullableArray. Having a Lifted type will allow us to dispatch such higher-order functions on whether or not an f is lifted. This in turn will allow us to define, say,

map{F}(f::Lifted{F}, X::NullableArray)

in a way that takes advantage of the aforementioned invariance.
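A minimal sketch of this design in current Julia syntax may help make the lowering concrete. Since Nullable has since left Base, the sketch assumes a hypothetical `Maybe{T}` stand-in; the helper names `isnull`, `unwrap`, and `payload` are illustrative, and `Base.promote_op` is used here only as an assumed way to ask inference for `U`. It is not the code from the PR.

```julia
# Hypothetical stand-in for Nullable{T}; not the Base type from the PR.
struct Maybe{T}
    hasvalue::Bool
    value::T
    Maybe{T}() where {T} = new{T}(false)      # null: value left uninitialized
    Maybe{T}(v) where {T} = new{T}(true, v)   # wrapped value
end
Maybe(v::T) where {T} = Maybe{T}(v)

isnull(x::Maybe) = !x.hasvalue
isnull(x) = false               # plain values are never null
unwrap(x::Maybe) = x.value      # only meaningful when !isnull(x)
unwrap(x) = x

# lift(f, U, xs...): the caller supplies the return type parameter U.
function lift(f, ::Type{U}, xs...) where {U}
    any(isnull, xs) && return Maybe{U}()
    return Maybe{U}(f(map(unwrap, xs)...))
end

# Wrapper type, so higher-order functions can dispatch on "f is lifted".
struct Lifted{F}
    f::F
end
lift(f) = Lifted(f)

# Element type seen by f for each argument.
payload(::Maybe{T}) where {T} = T
payload(x) = typeof(x)

# Calling a Lifted asks inference for U, then lowers to lift(f, U, xs...).
function (l::Lifted)(xs...)
    U = Base.promote_op(l.f, map(payload, xs)...)
    return lift(l.f, U, xs...)
end

g(x::Int) = x
_g = lift(g)
r = _g(Maybe(1))         # wraps g(1)
n = _g(Maybe{Int}())     # stays null, but keeps the Int parameter
s = lift(+, Int, Maybe(1), 2)
```

A `map` method specialized on `Lifted{F}` could compute `U` once and then call `lift(f, U, x)` per element, which is the invariance the description refers to.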

This PR also implements three-valued logic semantics for lifted & and |:

julia> _and = lift(&)
Lifted{Base.#&}(&,Dict{Tuple{Vararg{DataType,N}},DataType}())

julia> _and(Nullable{Bool}(), Nullable(false))
Nullable{Bool}(false)
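The three-valued semantics can be sketched as follows, again assuming a hypothetical `Maybe{T}` stand-in for Nullable and illustrative function names `and3` and `or3` (neither is from the PR). The point is that a known `false` decides `&` and a known `true` decides `|`, even when the other operand is null.

```julia
# Hypothetical stand-in for Nullable{T}; not the Base type from the PR.
struct Maybe{T}
    hasvalue::Bool
    value::T
    Maybe{T}() where {T} = new{T}(false)
    Maybe{T}(v) where {T} = new{T}(true, v)
end
Maybe(v::T) where {T} = Maybe{T}(v)
isnull(x::Maybe) = !x.hasvalue

# A known false on either side decides the result; otherwise null propagates.
function and3(x::Maybe{Bool}, y::Maybe{Bool})
    (!isnull(x) && !x.value) && return Maybe(false)
    (!isnull(y) && !y.value) && return Maybe(false)
    (isnull(x) || isnull(y)) && return Maybe{Bool}()
    return Maybe(true)
end

# A known true on either side decides the result; otherwise null propagates.
function or3(x::Maybe{Bool}, y::Maybe{Bool})
    (!isnull(x) && x.value) && return Maybe(true)
    (!isnull(y) && y.value) && return Maybe(true)
    (isnull(x) || isnull(y)) && return Maybe{Bool}()
    return Maybe(false)
end

a = and3(Maybe{Bool}(), Maybe(false))   # non-null false
b = and3(Maybe{Bool}(), Maybe(true))    # null
c = or3(Maybe{Bool}(), Maybe(true))     # non-null true
d = or3(Maybe{Bool}(), Maybe(false))    # null
```

Ordinary lifting would return null in all four cases, which is why & and | are treated as exceptions.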

This PR could perhaps use some fine-tuning with respect to the use of splatting and whether or not to @inline the lift(f, U, xs...) definitions.

cc: @johnmyleswhite @nalimilan @quinnj @davidanthoff @JeffBezanson @TotalVerb @vchuravy

EDIT: Tests should pass once this is rebased.

TotalVerb · 2016-10-01T22:13:45Z

I would honestly rather this be a method of broadcast. Conceptually, it makes a lot of sense.

davidagold · 2016-10-01T22:21:01Z

> I would honestly rather this be a method of broadcast.

@TotalVerb I'm not opposed to this, as long as there is then a method broadcast(f, U::DataType, x), since having the return type parameter argument U in lift(f, U, x) is very handy when mapping over a NullableArray (really, any array with eltype Nullable{T}). I don't think we want to end up iterating over some such container and calculating Core.Inference.return_type(f, Tuple{T}) at each iteration. Having a Lifted wrapper type would still allow for handy dispatch.

TotalVerb · 2016-10-01T22:27:31Z

I suppose the issue is that there are two senses of broadcasting here: broadcasting over the nullables themselves (the lifting) and broadcasting the resulting operation over the array. It may be difficult to combine those into one operation, especially one as semantically dense as broadcast (which is quite difficult to understand already).

I really don't like the type "annotations" here. Couldn't inference be applied once, with Core.Inference.return_type(f, Tuple{eltype(xs).parameters[1]})? And even if applied to each element of the array, that should be a compile-time cost, and thus in reality only applied once for the program's duration.

davidagold · 2016-10-01T22:31:31Z

That's where the Lifted wrapper may come in handy:

function broadcast{F}(_f::Lifted{F}, X, Y)
    ...  # res and the return type U are set up once, before the loop
    for (i, x, y) in broadcasted_indices
        res[i] = broadcast(_f.f, U, x, y)
    end
    res
end

davidagold · 2016-10-01T22:59:54Z

> I really don't like the type "annotations" here. Couldn't inference be applied once, with Core.Inference.return_type(f, Tuple{eltype(xs).parameters[1]})?

Would you please elaborate? I'm not sure I understand.

> And even if applied to each element of the array, that should be a compile-time cost, and thus in reality only applied once for the program's duration.

We'll need to get somebody who's familiar with these things on the line. My naive thinking was that calling Core.Inference.return_type would run type inference each time it's called. I suppose this all depends on whether or not the results of type inference are cached for each argument signature along with the compiled method.

TotalVerb · 2016-10-01T23:40:36Z

For lifted(foo), I dislike the implementation using a cache. Consider this:

A naive implementation of lifted(foo) could be like this:

lifted(foo) = (x, y) -> isnull(x) || isnull(y) ? NULL : unsafe_lift(foo, x, y)

where NULL is Nullable{Union{}}() and unsafe_lift is a function that unsafely gets the values of x and y and applies foo to them.

The issue with this function is that it is type-unstable, as it could either return Nullable{Union{}} or Nullable{T}, where T is the actual type of foo(x, y). Let's assume for a minute that it's safe to compute unsafe_lift(foo, x, y) always, again for simplicity. This is obviously not true, as many useful Nullable types have values that are not safe to simply get. But for sake of argument, I would argue the following is a simple and good implementation:

stable_ifelse{T,U}(b::Bool, x::Nullable{T}, y::Nullable{U}) =
    ifelse(b, Nullable{Union{T,U}}(x), Nullable{Union{T,U}}(y))
lifted(foo) = (x, y) -> stable_ifelse(isnull(x) || isnull(y), NULL, unsafe_lift(foo, x, y))

This works for certain values, and has good generated code as far as I can tell. It works by promoting distinct nullable types to a single stable one. Even better, there is no inference-dependent behaviour at all here.

To me, it's clear that for operations and types that will support it (null_safe_op as it's called), the above code is superior. Now to deal with the tricky case.

Now, for safety, we need to turn this select into a real branch. But we can't make a stable_if like we could with stable_ifelse, without using Core.Inference.return_type. Let's consider two distinct options:

1. Give up type stability of the function in this case. We must try to minimize inference-dependent behaviour, and Nullable{Union{}} really isn't so bad.
2. Use Core.Inference.return_type, but in a sensible way.

For the first option, the implementation is obvious.

For the second, I prefer a solution similar to what map and comprehensions do:

if there is a value, use the actual type, and do not try to infer anything. The standard example:

julia> f(x) = rand(Bool) ? 1 : 1.0
f (generic function with 1 method)

julia> map(f, [1])
1-element Array{Int64,1}:
 1

julia> map(f, [1])
1-element Array{Float64,1}:
 1.0

if there is no value, try to infer the type and use that.

julia> map(x -> 1, Int[])
0-element Array{Int64,1}

julia> map(f, Int[])
0-element Array{Union{Float64,Int64},1}

So to be consistent, we could do:

lifted2(foo) = (x, y) -> isnull(x) || isnull(y) ?
    Nullable{Core.Inference.return_type(foo, Tuple{typeof(x).parameters[1], typeof(y).parameters[1]})}() :
    unsafe_lift(foo, x, y)

But I would actually argue against consistency here, because if we are going to use the actual type in the real case, then we should avoid non-concrete parameters in Nullable. By that I mean that Nullable{Union{Float64,Int64}}() is of little use, since the inferred type cannot be stable anyway. Instead, I think it would be best to use the following rule:

if there is no value, try to infer the type, and use that if and only if it's concrete; otherwise use Union{}

Which we can implement as:

# Inferred return type of f(::T, ::U) if it is concrete, otherwise Union{}.
Base.@pure inferred_concrete_return_type{T,U}(f,::Type{T},::Type{U}) =
    let X = Core.Inference.return_type(f,Tuple{T,U})
        isleaftype(X) ? X : Union{}
    end

function lifted2(foo)
    function _lifted{T,U}(x::Nullable{T}, y::Nullable{U})
        isnull(x) || isnull(y) ?
            Nullable{inferred_concrete_return_type(foo, T, U)}() :
            unsafe_lift(foo, x, y)
    end
    return _lifted
end

As-is, this will not be type stable. However, I think it could be manually made type stable through inference special-casing. This function has some degree of utility outside of nullables, so a case could be made for such a special case.
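For reference, the "concrete or Union{}" rule can be sketched in current Julia, where `isleaftype` has become `isconcretetype` and `Core.Inference` no longer exists under that name. This sketch assumes `Base.promote_op` as the way to query inference; that function is inference-based and its result is not guaranteed to be stable across Julia versions. The helper `unstable` is purely illustrative.

```julia
# Inferred result type of f(::T, ::U) when it is concrete, otherwise Union{}.
# Assumption: Base.promote_op stands in for Core.Inference.return_type.
function inferred_concrete_return_type(f, ::Type{T}, ::Type{U}) where {T,U}
    X = Base.promote_op(f, T, U)
    return isconcretetype(X) ? X : Union{}
end

# Illustrative function whose return type depends on a coin flip.
unstable(x, y) = rand(Bool) ? 1 : 1.0

stable_type = inferred_concrete_return_type(+, Int, Int)
unstable_type = inferred_concrete_return_type(unstable, Int, Int)
```

Here `stable_type` is `Int`, while `unstable_type` collapses to `Union{}` because no concrete type can be inferred for `unstable`.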

TotalVerb · 2016-10-01T23:50:10Z

@davidagold Inference being run twice is not a concern. However, the result of inference not actually making the function type stable is indeed a concern. If we can't make the function type stable, it's obviously better just to return NULL. I think that can be made faster with some optimizations in the compiler.

davidanthoff · 2016-10-02T05:04:01Z

Why does this have to be in base? Couldn't this stay in a package?

TotalVerb · 2016-10-02T05:05:22Z

You mean the entire nullable infrastructure, or just the lifting part?

davidanthoff · 2016-10-02T05:15:43Z

Just the stuff in this PR. It is not really clear to me where this would be used. I guess in NullableArrays, but couldn't it be in that package in that case?

tkelman · 2016-10-02T06:31:28Z

base/nullable.jl

+        isnull(x),
+        ifelse(
+            isnull(y),
+            Nullable{Bool}(),


this uses a lot of vertical space, would be better with more than a single token per line

nalimilan · 2016-10-03

Since Bool is safe to evaluate even when missing, you could also use the two-argument form of Nullable to make this more compact.

andyferris · 2016-10-03T04:07:16Z

Is lift too generic of a name to put into base like this? I see the meaning is clear in the context of nullables, but why not use a name which is clear without that context? (e.g. liftnull, and LiftNulls, which I admit is slightly longer to write...)

nalimilan · 2016-10-03T20:32:15Z

Unless we have another potential use for the term lift, I would keep it that way. Though we could probably start in a package, together with lifted operators. Anyway we'll need that for 0.5 support.

nalimilan · 2016-10-03T20:36:26Z

base/nullable.jl

+    if isnull(x)
+        return Nullable{U}()
+    else
+        return Nullable{U}(f(unsafe_get(x)))


Not really related, but I wonder whether the compiler isn't actually smart enough to generate the same code when using get. Is unsafe_get really needed except in very specific cases?

davidagold · 2016-10-04

I use unsafe_get here because it is generic over Nullable and non-Nullable types, whereas get(x) for unqualified x is not defined.

nalimilan · 2016-10-04

Got it. That difference is a bit weird though. Maybe we should move to unwrap and have it work for any scalar.

ViralBShah · 2016-10-04T03:31:13Z

lift is used in Reactive.jl too. Cc @shashi

ViralBShah · 2016-10-04T03:31:47Z

I still think this is a good use of the word lift.

andyferris · 2016-10-04T03:54:38Z

Some time ago I heard talk of having "automatic" lifting for elementary operations like + and so on, such that most generic functions would work correctly with Nullables - is this also still planned?

If so, is this PR mostly meant to let us address dispatch issues? Or is it instead of automatic lifting entirely?

shashi · 2016-10-04T05:17:26Z

@ViralBShah in Reactive lift got renamed to map a while ago.

nalimilan · 2016-10-04T09:43:25Z

> Some time ago I heard talk of having "automatic" lifting for elementary operations like + and so on, such that most generic functions would work correctly with Nullables - is this also still planned?
>
> If so, is this PR mostly meant to let us address dispatch issues? Or is it instead of automatic lifting entirely?

@andyferris AFAIK there are no plans for automatic lifting of all functions. But NullableArrays.jl currently includes lifted operators, and I plan to open a PR again to move them into Base (following up #16988).

martinholters · 2016-10-04T09:44:05Z

I know close to nothing of category theory, but it's still enough that I would expect lift(f) to yield a function that can be applied to whatever container-like thing, be it a Nullable or an Array. (And lift(lift(f)) to an Array of Nullables.) From that it would follow that Lifted{f}(xs...) should forward to broadcast(f, xs...) and broadcast should take care of unwrapping Nullables as suitable. (And then one can ask oneself whether lift is needed at all.)

BUT this gets extremely fishy if different container-likes get merged, like broadcast(+, Nullable(3), [1, 2]). Should this be Nullable([4, 5]) or [Nullable(4), Nullable(5)]? Or forbidden? The pragmatic way would probably be to have the infrastructure for Nullables stay separate and to reflect it in the name - e.g. liftnull. More theoretically pleasing, but not necessarily more useful, one could require the above to be written as broadcast(lift(+), Nullable(3), Nullable([1, 2])) or broadcast(lift(+), [Nullable(3)], [1, 2]), depending on the desired result.

ViralBShah · 2016-10-04T10:28:44Z

I would think that for the case of arrays, what I would want is always Nullable([4, 5]).

nalimilan · 2016-10-04T11:49:21Z

> Should this be Nullable([4, 5]) or [Nullable(4), Nullable(5)]? Or forbidden? The pragmatic way would probably be to have the infrastructure for Nullables stay separate and to reflect it in the name - e.g. liftnull.

I think that's why @davidagold proposes lift here instead of doing lifting using broadcast as in #16961. The bikeshedding on the name really is a detail.

davidagold · 2016-10-04T17:14:31Z

@TotalVerb Thank you for your thorough explanation. I really appreciate your taking the time to go through those thoughts, and I'd be interested to hear more about the compiler/inference work that would complement the present proposal.

@andyferris I'm not exactly sure anymore what precisely "automatic lifting" means. In this case it sounds like it means that f(x::Nullable{T}) "just works" given a method f(x::T). This PR won't provide for that behavior. Short of work in the compiler, I think the way to provide such behavior is actually to implement f(x::Nullable{T}).

But we could also consider "automatic lifting" restricted to a particular context, such as a macro invocation: @blah f(x) for x::Nullable{T}. Using such a macro it is possible to replace f(x) with lift(f, x). Now, if the entire point of the macro is to replace f(x) with lift(f, x), then one would hardly call this "automatic lifting". But if the macro has another purpose -- say, it is a querying macro that does other syntactic transformations, such as interpreting attributes (e.g. sepal_length) as column references (e.g. df[:sepal_length]) -- then one can throw in the replacement of f(x) with lift(f, x) for free. In that sense, we could think about "automatic lifting" as being a feature of querying macros, but not applicable to unadorned expressions f(x).

Given the ambiguity of the phrase "automatic lifting", I've found it more helpful to think in terms of "lifting by some method or other". So, the first approach, in which we define the method f(x::Nullable{T}), might be called something like "method extension lifting". The second approach -- the one implemented in this PR -- might be called something like "higher-order lifting". Then we could note that method extension lifting gives "proper" behavior to f(x::Nullable{T}) in all contexts, whereas higher-order lifting gives "proper" behavior to f(x::Nullable{T}) only in the context of a supporting macro.
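The contrast between the two approaches can be sketched as follows. This assumes a hypothetical `Maybe{T}` stand-in for Nullable, an illustrative higher-order helper named `liftcall`, and `Base.promote_op` as an assumed way to pick the type parameter of a null result; none of these names come from the PR.

```julia
# Hypothetical stand-in for Nullable{T}; not the Base type from the PR.
struct Maybe{T}
    hasvalue::Bool
    value::T
    Maybe{T}() where {T} = new{T}(false)
    Maybe{T}(v) where {T} = new{T}(true, v)
end
Maybe(v::T) where {T} = Maybe{T}(v)
isnull(x::Maybe) = !x.hasvalue

# Method extension lifting: each function gets its own Maybe method,
# so double(x) works on a Maybe in every context.
double(x::Int) = 2x
double(x::Maybe{Int}) = isnull(x) ? Maybe{Int}() : Maybe(double(x.value))

# Higher-order lifting: one generic helper, applied explicitly at the call
# site (or inserted by a macro that rewrites f(x) into liftcall(f, x)).
function liftcall(f, x::Maybe{T}) where {T}
    isnull(x) && return Maybe{Base.promote_op(f, T)}()
    return Maybe(f(x.value))
end

half(x::Int) = x / 2    # no Maybe method is defined for half

d = double(Maybe(2))            # works directly
h = liftcall(half, Maybe(4))    # works only through the helper
```

Calling `half(Maybe(4))` directly would raise a `MethodError`, which is the sense in which higher-order lifting needs a supporting context such as a macro.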

nalimilan · 2016-10-04T18:02:38Z

@TotalVerb's solution sounds good. Though, if returning Union{} in case of type instability requires some work in inference code, I'd rather start with the easier solution of returning the actual inferred union, and improve things later.

davidanthoff · 2016-10-05T05:01:03Z

I'd like to raise the question again whether this has to be in base at this point, or whether this could live in a package. Given that this a) doesn't add additional methods to base functions for base types and b) is not used by anything else in base, it seems to me that this could well live in a package, mature there and once we have a better sense whether this is a good strategy or not it could still be put into base at a later point.

davidagold · 2016-10-05T15:15:14Z

@davidanthoff I think you're probably right that this can mature in a package for now. But having this PR against Base has induced some very helpful discussion, so maybe it's worth keeping open and updating periodically.

@TotalVerb I'm sorry I didn't believe you re: inference being run twice:

julia> function lift(f, x)
           U = Core.Inference.return_type(f, Tuple{eltype(typeof(x))})
           if isnull(x)
               return Nullable{U}()
           else
               return Nullable(f(unsafe_get(x)))
           end
       end
lift (generic function with 1 method)

julia> f(x) = x
f (generic function with 1 method)

julia> @code_warntype lift(f, 1)
Variables:
  #self#::#lift
  f::#f
  x::Int64
  U::Type{Int64}

Body:
  begin 
      U::Type{Int64} = $(QuoteNode(Int64)) # line 3:
      unless $(QuoteNode(false)) goto 6 # line 4:
      return $(Expr(:new, Nullable{Int64}, false))
      6:  # line 6:
      return $(Expr(:new, Nullable{Int64}, true, :(x)))
  end::Nullable{Int64}

That's really neat. Does this mean that Core.Inference.return_type is lowered differently than a standard function call?

JeffBezanson · 2016-10-20T15:01:02Z

base/nullable.jl

+"""
+immutable Lifted{F}
+    f::F
+    cache::Dict{Tuple{Vararg{DataType}}, DataType}


This cache should be removed. The compiler already implements it.

JeffBezanson · 2016-10-20T15:01:21Z

base/nullable.jl

+NOTE: There are two exceptions to the above: `lift(|, Bool, x, y)` and
+`lift(&, Bool, x, y)`. These methods both follow three-valued logic semantics.
+"""
+function lift(f, U::DataType, x)


Should probably be Type instead of DataType.

JeffBezanson · 2016-10-20T15:03:59Z

> Does this mean that Core.Inference.return_type is lowered differently than a standard function call?

No, it just means that inference can infer its return value.

davidagold · 2016-10-20T16:56:34Z

Given the comments above, it seemed appropriate to do return typing inside lift. So, the method signature is now just, e.g. lift(f, x).

Should lift(f) return a lambda or a Lifted wrapper?

KristofferC · 2018-01-07T20:04:59Z

Nullable is no longer in Base

davidagold force-pushed the dg/lift branch from afc63cd to 469fa74 Compare October 1, 2016 22:14

davidagold force-pushed the dg/lift branch from 469fa74 to 47b0a97 Compare October 1, 2016 22:24

tkelman reviewed Oct 2, 2016

kshyatt added the "domain:missing data" label Oct 3, 2016

nalimilan reviewed Oct 3, 2016

nalimilan mentioned this pull request Oct 7, 2016

Port to NullableArrays and CategoricalArrays JuliaData/DataFrames.jl#1008

Merged

nalimilan mentioned this pull request Oct 19, 2016

Implement more operators on Nullable with lifting semantics #19034

Closed

JeffBezanson requested changes Oct 20, 2016

davidagold force-pushed the dg/lift branch from 47b0a97 to beaf29c Compare October 20, 2016 16:50

Implement higher-order lifting

beaf29c

nalimilan mentioned this pull request Nov 11, 2016

Rewrite broadcast() and map() based on lift() JuliaStats/NullableArrays.jl#166

Merged

felipenoris mentioned this pull request Feb 9, 2017

isnull.(x) JuliaStats/NullableArrays.jl#180

Open

KristofferC closed this Jan 7, 2018
