New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Nullables as collections #16961

Merged
merged 10 commits into from Dec 29, 2016

Conversation

@TotalVerb
Contributor

TotalVerb commented Jun 16, 2016

This implements map, filter, and broadcast for Nullable arguments.

cc: @nalimilan @johnmyleswhite @hayd @vchuravy

elseif !any(isnull, xs)
Nullable(map(f, map(unsafe_getindex, xs)...))
else
throw(DimensionMismatch("expected all null or all nonnull"))

This comment has been minimized.

@hayd

hayd Jun 16, 2016

Member

Is this a standard definition of map on options? I expected any(isnull, xs) && return Nullable().

This comment has been minimized.

@TotalVerb

TotalVerb Jun 16, 2016

Contributor

Among languages I know of (or could find on Google) with multi-argument map...

OCaml: option type does not support multi-argument map
(most) lisps: no real option type, but cons used in its place often. behaves like current implementation
Python: no option type
Haskell: zipWith does not apply to options

The intersection of multi-argument map and option types is quite small. This makes some degree of sense, as a lot of languages with option types are either OO (where multi-argument map often doesn't make sense) or curried.

In absence of a real precedent, I think it's a better idea to copy the behaviour of actual collections, rather than introduce special cases:

julia> map(+, Int[], [1])
ERROR: DimensionMismatch("dimensions must match")

This comment has been minimized.

@nalimilan

nalimilan Jun 16, 2016

Contributor

Your argument about consistency convinced me, but I think the error message should also say something like "use broadcast instead". Basically, map shouldn't be used for lifting Nullable, it's mostly implemented for completeness (and maybe as a way to ensure fail-fast when you don't want a null to propagate).

This comment has been minimized.

@rfourquet

rfourquet Jun 16, 2016

Contributor

I would tend to think like @hayd on this one. In Haskell, zipWith on lists has no problem with one list shorter than the other, it makes a lot of sense and I don't see why our map don't act similarly, both for arrays and nullables. Also, when lifting via map an operation on two or more nullables, I would naturally interpret getting a isnull back as an "error": the computation couldn't be done because of missing values. Having an error thrown when the inputs are not all isnull is redundant, as you have now to different ways to be notified of a failed computation, and you have to check both for an isnull result and for a possible thrown exception. Maybe in Haskell can be used to encode an error at the type level, and lifting f on Maybe would be naturally implemented as

map2 f mx my = do
    x <- mx
    y <- my
    return $ f x y

which returns Nothing if one or more inputs are Nothing (that said, applying map2 on lists doesn't correspond to our multiple-list-arguments map).

This comment has been minimized.

@nalimilan

nalimilan Jun 16, 2016

Contributor

We have broadcast for what you describe. I think there's been discussion somewhere about replacing map with it. But let's not make this PR even more complex by opening this debate: the goal here is to be consistent with arrays, whether that behavior is good or not.

This comment has been minimized.

@rfourquet

rfourquet Jun 17, 2016

Contributor

Ah indeed I hadn't notice the broadcast behavior. But still, I don't see why map should be defined for Nullable with the view that they are containers. For example NaN+1 doesn't throw an error. I'd then rather not define map until real cases has shown what should be the semantics (or has it happened already?) Anyway, I'm not competent so won't argue more. I would just love one day some docs on map vs broadcast (the concept of broadcast is still relatively alien to me).

This comment has been minimized.

@TotalVerb

TotalVerb Jun 17, 2016

Contributor

There is really no harm in making mixed map an error for now. If needed, it can always be relaxed... though I don't like broadcasting in map.

# indexing is either without index, or with 1 as index
# generalized linear indexing is not supported
linearindexing{T}(::Nullable{T}) = LinearFast()
function getindex(x::Nullable)

This comment has been minimized.

@hayd

hayd Jun 16, 2016

Member

I wonder if get (without default) could be deprecated in favor of this.

This comment has been minimized.

@TotalVerb

TotalVerb Jun 16, 2016

Contributor

I would like that too.

This comment has been minimized.

@nalimilan

nalimilan Jun 16, 2016

Contributor

As noted elsewhere, we need the two-argument version anyway, so why not have the one-argument one too? It's probably easier to discover and to look for in help, while series of [] in the code might be a bit cryptic for newcomers.

This comment has been minimized.

@hayd

hayd Jun 16, 2016

Member

get(x::Nullable) is the only get method without a default.

This comment has been minimized.

@nalimilan

nalimilan Jun 17, 2016

Contributor

Let's discuss this elsewhere if we really want to change it. This PR has enough controversies in it.

# additional definitions for Nullable
# fast one & two-argument implementations
function broadcast{T}(f, x::Nullable{T})
restype = promote_op(f, T)

This comment has been minimized.

@nalimilan

nalimilan Jun 16, 2016

Contributor

Types are usually written using one-letter capitals, for example S.

function broadcast(f, x::Union{Nullable, Number}, y::Union{Nullable, Number})
if isa(x, Number) && isa(y, Number)
throw(ArgumentError("broadcast is not supported on numbers"))

This comment has been minimized.

@nalimilan

nalimilan Jun 16, 2016

Contributor

I'd rather define a specialized method for two numbers, which could (and should) work. Currently it doesn't, so that would fix a bug with e.g. broadcast(+, 1, 1) (please add a test for that too).

This comment has been minimized.

@TotalVerb

TotalVerb Jun 17, 2016

Contributor

What should that return? A number or an array?

This comment has been minimized.

@nalimilan

nalimilan Jun 17, 2016

Contributor

I guess a number? That's what 1 .+ 1 does.

This comment has been minimized.

@TotalVerb

TotalVerb Jun 18, 2016

Contributor
julia> broadcast(+, 1)
0-dimensional Array{Int64,0}:
1

What about this? I'm leaning towards 0-dimensional array, as in the current behaviour. When would it be useful to have a number as the result instead?

This comment has been minimized.

@nalimilan

nalimilan Nov 18, 2016

Contributor

I have no idea, but let's preserve the current behavior where broadcast(+, 1) === 1, and change this in another PR if needed. Mixing multiple issues is the recipe for never merging a PR.

if isnullvalue(x) | isnullvalue(y)
Nullable{restype}()
else
Nullable{restype}(f(unsafe_getindex(x), unsafe_getindex(y)))

This comment has been minimized.

@nalimilan

nalimilan Jun 16, 2016

Contributor

Maybe better use @inbounds instead of the unsafe_ function.

(:.//, ://), (:.==, :(==)), (:.<, :<), (:.!=, :!=),
(:.<=, :<=), (:.÷, :÷), (:.%, :%), (:.<<, :<<), (:.>>, :>>),
(:.^, :^))
@eval $eop(x::Nullable, y::Union{Nullable, Number}) = broadcast($op, x, y)

This comment has been minimized.

@nalimilan

nalimilan Jun 16, 2016

Contributor

I guess x::Union{Nullable, Number}, y::Union{Nullable, Number} doesn't work?

This comment has been minimized.

@TotalVerb

TotalVerb Jun 16, 2016

Contributor

It might. I was just paranoid about ambiguity.

end
# to maintain broadcast(fn, Number) behaviour
broadcast(f, x::Number) = broadcast(f, collect(x))

This comment has been minimized.

@nalimilan

nalimilan Jun 16, 2016

Contributor

How is it affected by your changes?

x.value
end
# convenience method for getindex without bounds checking

This comment has been minimized.

@nalimilan

nalimilan Jun 16, 2016

Contributor

You don't need these anymore with the @boundscheck declaration: use @inbounds as I noted above.

unsafe_getindex(x::Nullable) = x.value
unsafe_getindex(x::Number) = x
# convenience method for detecting null value

This comment has been minimized.

@nalimilan

nalimilan Jun 16, 2016

Contributor

We should either add this definition to isnull or not add it at all. Cf. JuliaData/DataFrames.jl#994 (comment) and following comments.

This comment has been minimized.

@TotalVerb

TotalVerb Jun 17, 2016

Contributor

I'll inline isa(x, Nullable) && isnull(x) for now. Not sure if compiler is smart enough to optimize this out; I hope it is.

This comment has been minimized.

@nalimilan

nalimilan Jun 17, 2016

Contributor

Yes, I think it is.

# iteration protocol
start(x::Nullable) = 1
next(x::Nullable, i::Integer) = x.value, 0

This comment has been minimized.

@nalimilan

nalimilan Jun 16, 2016

Contributor

While it works, it would be more logical to return 2 as a state, which is more clearly out of bounds. Then below you can check x.isnull | i != 1.

This comment has been minimized.

@eschnett

eschnett Jun 16, 2016

Contributor

I'd use a Bool for the state: either we are done or we are not done, there's no other case for Nullable. (Technically, this might even lead to better code if the hardware can handle Bool more efficiently than Int.)

This comment has been minimized.

@tkelman

tkelman Jun 17, 2016

Contributor

this line of code might go away for now, but x-ref #16753 and #16687

function filter{T}(p, x::Nullable{T})
if x.isnull
x
elseif p(x.value)

This comment has been minimized.

@nalimilan

nalimilan Jun 16, 2016

Contributor

Write this as a single condition using the short-circuit operators: if isnull(x) || p(x.value).

end
function map(f, xs::Nullable...)
if all(isnull, xs)

This comment has been minimized.

@nalimilan

nalimilan Jun 16, 2016

Contributor

I guess it would be faster to to nulls = sum(isnull, xs), and then if nulls == length(xs) and elseif nulls > 0 (length(xs) is a compile-time constant).

This comment has been minimized.

@TotalVerb

TotalVerb Jun 17, 2016

Contributor

This is more consistent in speed, but seems to be always slower from my tests. I think it's because normally one of all and any will short-circuit quickly.

This comment has been minimized.

@nalimilan

nalimilan Jun 17, 2016

Contributor

OK, maybe for very few elements like here this isn't worth it.

This comment has been minimized.

@TotalVerb

TotalVerb Jun 18, 2016

Contributor

I think the idea is that all short-circuits on false and any on true, so at least one of the branches is basically a no-op. And I suppose the sum is a bit more expensive than a series of branches.

@@ -1,5 +1,13 @@
# This file is a part of Julia. License is MIT: http://julialang.org/license
# "is a null with type T", curried on 2nd argument
isnull_typed(x::Nullable, T::Type) = typeof(x).parameters[1] == T && isnull(x)

This comment has been minimized.

@nalimilan

nalimilan Jun 16, 2016

Contributor

Use eltype. Also, I'd call that function isnull_oftype.

isnull_typed(t::Type) = x -> isnull_typed(x, t)
# "is a nullable with value egal to x", curried on 2nd argument
isnullableof(y::Nullable, x) = !isnull(y) && y.value === x

This comment has been minimized.

@nalimilan

nalimilan Jun 16, 2016

Contributor

AFAIK this is covered by y === Nullable(x), so no need for a function.

@test Nullable(0)[1] === 0
# collect
@test isempty(collect(Nullable())) && eltype(collect(Nullable())) == Union{}

This comment has been minimized.

@nalimilan

nalimilan Jun 16, 2016

Contributor

Split these && into two lines, this is more readable and more precise about what fails.

This comment has been minimized.

@nalimilan

nalimilan Jun 16, 2016

Contributor

BTW, it's also stricter do write isa(collect(Nullable()), Vector{Union{}}.

base/int.jl Outdated
@@ -305,6 +305,9 @@ promote_rule{T<:BitSigned64}(::Type{UInt64}, ::Type{T}) = UInt64
promote_rule{T<:Union{UInt32, UInt64}}(::Type{T}, ::Type{Int128}) = Int128
promote_rule{T<:BitSigned}(::Type{UInt128}, ::Type{T}) = UInt128
# the result on one is a good heuristic for promotion type
promote_op{T,R<:Integer}(::Type{T}, ::Type{R}) = T # to avoid ambiguity

This comment has been minimized.

@nalimilan

nalimilan Jun 16, 2016

Contributor

If it's really needed to silence warnings, make it error() since it's not supposed to be used.

EDIT: but I thought in 0.5 the warning wouldn't be printed, and you would only get an error when trying to use that function method, which is OK.

This comment has been minimized.

@TotalVerb

TotalVerb Jun 16, 2016

Contributor

Why isn't it supposed to be used? broadcast(Float64, [1, 2, 3]) works today.

This comment has been minimized.

@nalimilan

nalimilan Jun 16, 2016

Contributor

OK, I didn't consider types as callables, but indeed they are. Use (@_pure_meta; T) as for the current definition, though.

base/int.jl Outdated
@@ -305,6 +305,9 @@ promote_rule{T<:BitSigned64}(::Type{UInt64}, ::Type{T}) = UInt64
promote_rule{T<:Union{UInt32, UInt64}}(::Type{T}, ::Type{Int128}) = Int128
promote_rule{T<:BitSigned}(::Type{UInt128}, ::Type{T}) = UInt128
# the result on one is a good heuristic for promotion type
promote_op{T,R<:Integer}(::Type{T}, ::Type{R}) = T # to avoid ambiguity
promote_op{R<:Integer}(op, ::Type{R}) = typeof(op(one(R)))

This comment has been minimized.

@nalimilan

nalimilan Jun 16, 2016

Contributor

That sounds like a possible partial workaround to #16164 until we get a more general solution, but I think we should discuss this choice on that issue, and keep this PR focused on nullables. Please remove this commit for now to increase the changes of merging the PR soon.

@nalimilan

This comment has been minimized.

Contributor

nalimilan commented Jun 16, 2016

Thanks. I agree it's OK to start with an unoptimized version, and add a fast path for safe isbits types like Base's number types ` (for which we can avoid a branch) later. Cf. JuliaStats/NullableArrays.jl#116 (comment).

I'll leave it to people more familiar with the broadcast machinery to say whether the general approach is good. One thing I wonder is how broadcast works on arguments mixing nullables, numbers and arrays. Could you add tests for that too?

s = 0
for x in Nullable{Int}()
s += x

This comment has been minimized.

@tkelman

tkelman Jun 16, 2016

Contributor

this does not seem desirable to me at all

@tkelman

This comment has been minimized.

Contributor

tkelman commented Jun 16, 2016

Why should nullables be iterable? This seems like going further down making scalars iterable, but with more conflating of nullability with container emptiness. I think it's clearer if the concepts are kept separate.

@nalimilan

This comment has been minimized.

Contributor

nalimilan commented Jun 16, 2016

The main feature is to give f.(x) lifting semantics via broadcast. Other methods seem less useful to me, but can be interesting to have for consistency. This also seems consistent to how other languages treat nullable/option types.

@tkelman

This comment has been minimized.

Contributor

tkelman commented Jun 16, 2016

Should be able to opt in to the syntactic sugar without making them iterable. This would lead to far more methods which aren't expecting to deal with nullables trying to operate on them, which I imagine is the intent here but I see that going wrong often and would rather prefer being explicit. Iteration would suddenly discard the possibility of null values, or silently do nothing instead of visibly flagging that a null was present. Doing nothing on a null value isn't always appropriate, encoding it as part of the behavior strikes me as wrong.

@nalimilan

This comment has been minimized.

Contributor

nalimilan commented Jun 16, 2016

f.(x) needs to always call broadcast since that's just syntactic sugar. So we need at least to implement broadcast on nullables.

I have no strong opinion on implementing the more general interface, but I figure e.g. Scala and Rust had good reasons to do that. @johnmyleswhite?

@TotalVerb

This comment has been minimized.

Contributor

TotalVerb commented Jun 16, 2016

I intentionally left broadcast between arrays and nullables. This operation is ambiguous because the return type could be Nullable{Array} or Array{Nullable} (although I admit the latter is almost always more useful). The original reason for leaving out this operation was a consequence of thinking of nullables as infinite-dimensional arrays in the original PR: arrays must be promoted in dimension to broadcast to nullable, and nullable must be promoted in size to broadcast to array.

Now that that interpretation has been dispensed with, it could be worth reconsidering whether Array{Nullable} is an acceptable return type. Though, it is a bit unfortunate that the return type would not be a NullableArray.

@TotalVerb

This comment has been minimized.

Contributor

TotalVerb commented Jun 16, 2016

@tkelman The Julia documentation is pretty explicit that Nullable types are container-like:

In many settings, you need to interact with a value of type T that may or may not exist. To handle these settings, Julia provides a parametric type called Nullable{T}, which can be thought of as a specialized container type that can contain either zero or one values.

This was why I filed #16889—being thought of as a container type is useless if they don't behave as containers. The container property is important, and is what (functionally) distinguishes Nullable{T} from Union{T, Void} (though the second one is much slower with the current compiler).

@TotalVerb

This comment has been minimized.

Contributor

TotalVerb commented Jun 16, 2016

There are effectively four interpretations of a Nullable{T} out there:

  • Interpretation 1: Nullable{T} is Union{T, Void}. This is how many other dynamic languages do it: Python, JavaScript, etc.
  • Interpretation 2: Nullable{T} is a "dumb option type" that isn't much more powerful than Union{T, Void}. This is really only used in static languages, where the compiler provides type checking.
  • Interpretation 3: Nullable{T} is an option type that behaves as a collection. This is how some functional languages and most OO languages do it.
  • Interpretation 4: Nullable{T} is a monad. This is only done in functional languages, and only by few of them.

The current behaviour is interpretation 2. I would argue that interpretation 3 makes more sense for Julia.

@tkelman

This comment has been minimized.

Contributor

tkelman commented Jun 16, 2016

The container property is important

I'm not convinced. Why? We have plenty of "container" wrapper types that don't act like iterable collections.

@mschauer

This comment has been minimized.

Contributor

mschauer commented Jun 16, 2016

Nullable started as a very minimal type "with a very minimal interface with the hope that this will encourage you to resolve the uncertainty of whether a value is missing as soon as possible" to cite @johnmyleswhite . As some point it was even discussed that Nullable{Nullable{T}} could be collapsed to Nullable{T} (compare Union{Union{T,Void},Void} == Union{T,Void}). That indicates to me that there is reason to be cautious with a commitment to the "container with one or zero elements" interpretation.

@eschnett

This comment has been minimized.

Contributor

eschnett commented Jun 16, 2016

There are a few more interpretation of Nullable{T} that are interesting in certain cases:

  • encode missing values (see NA)
  • encode defaults (adding a null value corresponds to adding zero)
  • encode errors; adding a null value leads to an error (i.e. a null value) [that's the monadic interpretation]

At one point, Julia could begin to offer different types with different behaviour. There could be an Optional type with defaulting behaviour, a Maybe type that is monadic and container-like, etc. They would all be built onto the same low-level (and bare-bones) implementation.

Here, instead of adding more container-ness to Nullable (which seems to be counter-productive for some uses), we could instead add a Maybe type that is container-like, and which supports arithmetic operations, treating null values as errors, as is suggested here.

An Optional type (all names to be bikeshedded, of course) could also implement arithmetic, but would treat missing values as neutral, not as error.

What hasn't happened so far is to make a strong case for these types, and to prototype them in a package. I'd expect some of these to become widely used, others to remain obscure.

@nalimilan

This comment has been minimized.

Contributor

nalimilan commented Jun 16, 2016

Here, instead of adding more container-ness to Nullable (which seems to be counter-productive for some uses),

"Counter-productive" sounds quite strong. Do you have any evidence of a case where different interpretations really conflict?

What hasn't happened so far is to make a strong case for these types, and to prototype them in a package. I'd expect some of these to become widely used, others to remain obscure.

Precisely, a lot of work has gone into NullableArrays, and we're seeing the current limitations of Nullable support in Base while porting DataFrames to it (i.e. with the NA interpretation): JuliaData/DataFrames.jl#994. The use case is well defined now, and we know we need at least a good support for lifting semantics. So far the best solution I can see is to use f.(x) (hence broadcast) for that. Whether we need to treat Nullable as a container in all cases remains to be seen, but that could be a logical extension.

@tkelman

This comment has been minimized.

Contributor

tkelman commented Dec 29, 2016

(pending a comment from Sacha on future cleanups he has in mind)

@johnmyleswhite

This comment has been minimized.

Member

johnmyleswhite commented Dec 29, 2016

Thanks again for making this happen, @TotalVerb!

@tkelman

This comment has been minimized.

Contributor

tkelman commented Dec 30, 2016

We should have checked the appveyor (edit: travis too!) log more carefully. This somehow didn't cause the tests to register as failing, but something strange is up here:

From worker 1:		From worker 1:	Worker 1 failed running test broadcast:
UndefVarError: TestMain_broadcast not defined
 in deserialize(::Base.ClusterSerializer{TCPSocket}, ::Type{Module}) at .\serialize.jl:601
 in handle_deserialize(::Base.ClusterSerializer{TCPSocket}, ::Int32) at .\serialize.jl:580
 in deserialize(::Base.ClusterSerializer{TCPSocket}) at .\serialize.jl:540
 in deserialize_datatype(::Base.ClusterSerializer{TCPSocket}) at .\serialize.jl:835
 in handle_deserialize(::Base.ClusterSerializer{TCPSocket}, ::Int32) at .\serialize.jl:570
 in deserialize(::Base.ClusterSerializer{TCPSocket}) at .\serialize.jl:540
 in ntuple(::Base.Serializer.##1#2{Base.ClusterSerializer{TCPSocket}}, ::Int64) at .\tuple.jl:79
 in handle_deserialize(::Base.ClusterSerializer{TCPSocket}, ::Int32) at .\serialize.jl:561
 in deserialize(::Base.ClusterSerializer{TCPSocket}, ::DataType) at .\serialize.jl:894
 in deserialize_datatype(::Base.ClusterSerializer{TCPSocket}) at .\serialize.jl:848
 in handle_deserialize(::Base.ClusterSerializer{TCPSocket}, ::Int32) at .\serialize.jl:570
 in deserialize(::Base.ClusterSerializer{TCPSocket}, ::DataType) at .\serialize.jl:894
 in deserialize_datatype(::Base.ClusterSerializer{TCPSocket}) at .\serialize.jl:848
 in handle_deserialize(::Base.ClusterSerializer{TCPSocket}, ::Int32) at .\serialize.jl:570
 in deserialize_array(::Base.ClusterSerializer{TCPSocket}) at .\serialize.jl:718
 in handle_deserialize(::Base.ClusterSerializer{TCPSocket}, ::Int32) at .\serialize.jl:568
 in deserialize(::Base.ClusterSerializer{TCPSocket}, ::DataType) at .\serialize.jl:894
 in deserialize_datatype(::Base.ClusterSerializer{TCPSocket}) at .\serialize.jl:848
 in handle_deserialize(::Base.ClusterSerializer{TCPSocket}, ::Int32) at .\serialize.jl:570
 in deserialize(::Base.ClusterSerializer{TCPSocket}, ::DataType) at .\serialize.jl:894
 in deserialize_datatype(::Base.ClusterSerializer{TCPSocket}) at .\serialize.jl:848
 in handle_deserialize(::Base.ClusterSerializer{TCPSocket}, ::Int32) at .\serialize.jl:570
 in deserialize(::Base.ClusterSerializer{TCPSocket}, ::DataType) at .\serialize.jl:894
 in deserialize_datatype(::Base.ClusterSerializer{TCPSocket}) at .\serialize.jl:848
 in handle_deserialize(::Base.ClusterSerializer{TCPSocket}, ::Int32) at .\serialize.jl:570
 in deserialize_msg(::Base.ClusterSerializer{TCPSocket}, ::Type{Base.ResultMsg}) at .\multi.jl:120
 in deserialize_msg(::Base.ClusterSerializer{TCPSocket}) at .\multi.jl:130
 in message_handler_loop(::TCPSocket, ::TCPSocket, ::Bool) at .\multi.jl:1371
 in process_tcp_streams(::TCPSocket, ::TCPSocket, ::Bool) at .\multi.jl:1328
 in (::Base.##551#552{TCPSocket,TCPSocket,Bool})() at .\event.jl:66
@TotalVerb

This comment has been minimized.

Contributor

TotalVerb commented Dec 30, 2016

Oh dear. Let me see if I can reproduce this locally.

@tkelman

This comment has been minimized.

Contributor

tkelman commented Dec 30, 2016

I get a failure that bisects to here both on Windows and Linux from just running make test-broadcast - the issue above might be something wrong with the test system's handling of that failure?

@TotalVerb

This comment has been minimized.

Contributor

TotalVerb commented Dec 30, 2016

The failure isn't in the travis logs for #19745, so it could be the known typo in this PR substituting S for f, which was corrected in that one.

@tkelman

This comment has been minimized.

Contributor

tkelman commented Dec 30, 2016

That seems like a good explanation to me. There's a test system bug that it was able to continue and be marked as a success here then. Manifests only when tests are run in parallel. @yuyichao any idea?

@JeffBezanson

This comment has been minimized.

Member

JeffBezanson commented Dec 30, 2016

@johnmyleswhite I recall a discussion where we agreed that nullables are not containers. However, that aspect of this --- e.g. defining map on Nullable --- is probably ok.

But I'm very much against the broadcast behavior. Making broadcast also implicitly lift the function to nullables has nothing to do with treating nullables as containers. For example

julia> println.([[1],[2]])
[1]
[2]

julia> println.([Nullable(1), Nullable(2)])
1
2

So, at the very least I find the title of this PR misleading.

I think it's unacceptable for f.(xs) to "take apart" more than one layer of container (differing from map). Making 1 .+ Nullable(2) work might be fine; that is what I expected "Nullables as collections" to mean. Can we please take out the extra, horribly confusing behavior?

@ararslan

This comment has been minimized.

Member

ararslan commented Dec 30, 2016

we agreed that nullables are not containers

If I understand the current thinking correctly, I believe treating Nullables as containers is more of a stopgap until we can get better Unionss so that nullables can be a simple union rather than a container.

cf. JuliaLang/Juleps#21

_broadcast_type(f, T::Type, As...) = Base._return_type(f, typestuple(T, As...))
_broadcast_type(f, A, Bs...) = Base._default_eltype(Base.Generator{ziptype(A, Bs...), ftype(f, A, Bs...)})
# nullables need to be treated like scalars sometimes and like containers

This comment has been minimized.

@JeffBezanson

JeffBezanson Dec 30, 2016

Member

The occurrence of "sometimes" here definitely raises a red flag.

@pabloferz

This comment has been minimized.

Contributor

pabloferz commented Dec 30, 2016

I have a PR to address @JeffBezanson's concerns. I can submit it within the next hour.

@TotalVerb

This comment has been minimized.

Contributor

TotalVerb commented Dec 30, 2016

Thanks @JeffBezanson for the comments. Indeed your concerns were noted and discussed previously around #16961 (comment) and the (weak) consensus seemed to be that "nullables go in arrays, and not the other way around" as a justification for this special case, but I admit that it is not the prettiest solution. Let's see if @pabloferz has a better one.

@johnmyleswhite

This comment has been minimized.

Member

johnmyleswhite commented Dec 31, 2016

I think it's unacceptable for f.(xs) to "take apart" more than one layer of container (differing from map).

FWIW, I'm ok with either approach. This "taking apart" is the behavior that Erik Meijer and Eric Lippert were encouraging us to adopt, but that's because their vision of a dot operator for C# always involved flattening for all containers-of-containers and not just containers-of-nullables.

For now I think we can remove this behavior since it's not very important for most use cases.

@nalimilan

This comment has been minimized.

Contributor

nalimilan commented Dec 31, 2016

We can get rid of this behavior for now, but that seems to imply we'll need to keep a special array type like NullableArray even after moving to a Union type to represent nullables. Else we would lose the ability to use element-wise operators for lifting over arrays.

@TotalVerb

This comment has been minimized.

Contributor

TotalVerb commented Dec 31, 2016

@nalimilan I agree that we will need to keep NullableArray.

Related to this point, @ScottPJones had the following comment to make;

I just wanted to point out that having some sort of NullableArray type will still be important for performance reasons, because (IIUC) for arrays of immutable types and bitstypes, the Union type would need at least to take a pointer (or more, to preserve alignment) size of extra memory per element, while a NullableArray only takes 1 byte extra per element.

Note: I think that it would be good to actually have a parameterized NullableArray type, where anything that is an abstract bit array could be used (because in many cases, a BitVector or BitArray would be more performant (both space and timewise) than using an Array{Bool}, and would allow for using a sparse bit array for even further space savings, if the number of Null elements is small compared to the size of the array.

I think future progress on this will be on the package side.

@Sacha0

This comment has been minimized.

Member

Sacha0 commented Dec 31, 2016

First off, much thanks for your great work in / persistence with this pull request @TotalVerb! :)

(pending a comment from Sacha on future cleanups he has in mind)

In brief: For better extensibility and maintainability of broadcast, we should consider a different approach to evolving the Base.Broadcast module than that taken in this pull request.

Specifically, this pull request introduced additional type-specific functionality into Base.Broadcast, and tightly coupled that functionality with existing generic infrastructure, making broadcast more difficult to extend and maintain. Heading the opposite direction seems advantageous in the long run: Focusing Base.Broadcast on generic, readily extensible infrastructure, allowing (and encouraging) type-specific functionality to live outside of Base.Broadcast (and outside of Base altogether as appropriate).

Not certain whether I will have time to comment in detail prior to travel tomorrow morning. Most specific concerns I have are happily addressed by #19745 and #19787.

Cross refs: @andreasnoack expresses similar sentiments in #19787 (comment), @TotalVerb in #19745 (review), and I in #19745 (review) and #19723 (comment).

Best!

@nalimilan

This comment has been minimized.

Contributor

nalimilan commented Jan 1, 2017

I just wanted to point out that having some sort of NullableArray type will still be important for performance reasons, because (IIUC) for arrays of immutable types and bitstypes, the Union type would need at least to take a pointer (or more, to preserve alignment) size of extra memory per element, while a NullableArray only takes 1 byte extra per element.

No, AFAIK @vtjnash's plan is to optimize Union{T, Void} so that the memory representation is as efficient as a NullableArray. See this paragraph in @johnmyleswhite's Julep.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment