RFC: Nullables as collections #16961
    elseif !any(isnull, xs)
        Nullable(map(f, map(unsafe_getindex, xs)...))
    else
        throw(DimensionMismatch("expected all null or all nonnull"))
hayd
Jun 16, 2016
Member
Is this a standard definition of `map` on options? I expected `any(isnull, xs) && return Nullable()`.
TotalVerb
Jun 16, 2016
Author
Contributor
Among languages I know of (or could find on Google) with multi-argument map...
OCaml: option type does not support multi-argument map
(most) lisps: no real option type, but `cons` is often used in its place; it behaves like the current implementation
Python: no option type
Haskell: `zipWith` does not apply to options
The intersection of multi-argument map and option types is quite small. This makes some degree of sense, as a lot of languages with option types are either OO (where multi-argument map often doesn't make sense) or curried.
In the absence of a real precedent, I think it's a better idea to copy the behaviour of actual collections, rather than introduce special cases:
julia> map(+, Int[], [1])
ERROR: DimensionMismatch("dimensions must match")
nalimilan
Jun 16, 2016
Contributor
Your argument about consistency convinced me, but I think the error message should also say something like "use broadcast instead". Basically, `map` shouldn't be used for lifting `Nullable`; it's mostly implemented for completeness (and maybe as a way to fail fast when you don't want a null to propagate).
rfourquet
Jun 16, 2016
Contributor
I would tend to think like @hayd on this one. In Haskell, `zipWith` on lists has no problem with one list being shorter than the other; that makes a lot of sense, and I don't see why our `map` doesn't act similarly, both for arrays and nullables. Also, when lifting an operation on two or more nullables via `map`, I would naturally interpret getting an `isnull` value back as an "error": the computation couldn't be done because of missing values. Throwing an error when the inputs are not all `isnull` is redundant, as you now have two different ways to be notified of a failed computation, and you have to check both for an `isnull` result and for a possible thrown exception. `Maybe` in Haskell can be used to encode an error at the type level, and lifting `f` on `Maybe` would be naturally implemented as
    map2 f mx my = do
      x <- mx
      y <- my
      return $ f x y
which returns `Nothing` if one or more inputs are `Nothing` (that said, applying `map2` on lists doesn't correspond to our multiple-list-arguments `map`).
nalimilan
Jun 16, 2016
Contributor
We have `broadcast` for what you describe. I think there's been discussion somewhere about replacing `map` with it. But let's not make this PR even more complex by opening this debate: the goal here is to be consistent with arrays, whether that behavior is good or not.
rfourquet
Jun 17, 2016
Contributor
Ah indeed, I hadn't noticed the `broadcast` behavior. But still, I don't see why `map` should be defined for `Nullable` with the view that they are containers. For example, `NaN+1` doesn't throw an error. I'd then rather not define `map` until real cases have shown what the semantics should be (or has that happened already?). Anyway, I'm not competent, so I won't argue more. I would just love to see some docs one day on `map` vs `broadcast` (the concept of broadcast is still relatively alien to me).
TotalVerb
Jun 17, 2016
Author
Contributor
There is really no harm in making mixed map an error for now. If needed, it can always be relaxed... though I don't like broadcasting in map.
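To make the two candidate semantics in this thread concrete, here is a minimal sketch of both. It assumes a current Julia release, where `Nullable` is not in Base, and uses `Union{Some, Nothing}` as a stand-in; `Opt`, `strict_map`, and `lifting_map` are hypothetical names chosen for illustration, not functions from this PR.

```julia
# Stand-in for Nullable (an assumption of this sketch):
# Some(x) holds a value, nothing is the null case.
const Opt = Union{Some, Nothing}

# Strict semantics, mirroring the branch under review: all-null gives null,
# all-present applies f, and a mix of the two is an error.
function strict_map(f, xs::Opt...)
    if all(isnothing, xs)
        return nothing
    elseif !any(isnothing, xs)
        return Some(f(map(something, xs)...))
    else
        throw(DimensionMismatch("expected all null or all nonnull"))
    end
end

# Lifting semantics, as expected by hayd and rfourquet:
# any null input gives a null result, and nothing is thrown.
function lifting_map(f, xs::Opt...)
    return any(isnothing, xs) ? nothing : Some(f(map(something, xs)...))
end
```

With these definitions, `strict_map(+, Some(1), nothing)` throws a `DimensionMismatch`, while `lifting_map(+, Some(1), nothing)` returns `nothing`, which is the behavior the thread assigns to `broadcast`.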
# indexing is either without index, or with 1 as index
# generalized linear indexing is not supported
linearindexing{T}(::Nullable{T}) = LinearFast()
function getindex(x::Nullable)
hayd
Jun 16, 2016
Member
I wonder if `get` (without default) could be deprecated in favor of this.
TotalVerb
Jun 16, 2016
Author
Contributor
I would like that too.
nalimilan
Jun 16, 2016
Contributor
As noted elsewhere, we need the two-argument version anyway, so why not have the one-argument one too? It's probably easier to discover and to look for in help, while a series of `[]` in the code might be a bit cryptic for newcomers.
hayd
Jun 16, 2016
Member
`get(x::Nullable)` is the only `get` method without a default.
nalimilan
Jun 17, 2016
Contributor
Let's discuss this elsewhere if we really want to change it. This PR has enough controversies in it.
# additional definitions for Nullable
# fast one & two-argument implementations
function broadcast{T}(f, x::Nullable{T})
    restype = promote_op(f, T)
nalimilan
Jun 16, 2016
Contributor
Types are usually written using one-letter capitals, for example `S`.
function broadcast(f, x::Union{Nullable, Number}, y::Union{Nullable, Number})
    if isa(x, Number) && isa(y, Number)
        throw(ArgumentError("broadcast is not supported on numbers"))
nalimilan
Jun 16, 2016
Contributor
I'd rather define a specialized method for two numbers, which could (and should) work. Currently it doesn't, so that would fix a bug with e.g. `broadcast(+, 1, 1)` (please add a test for that too).
TotalVerb
Jun 17, 2016
Author
Contributor
What should that return? A number or an array?
nalimilan
Jun 17, 2016
Contributor
I guess a number? That's what `1 .+ 1` does.
TotalVerb
Jun 18, 2016
Author
Contributor
julia> broadcast(+, 1)
0-dimensional Array{Int64,0}:
1
What about this? I'm leaning towards 0-dimensional array, as in the current behaviour. When would it be useful to have a number as the result instead?
nalimilan
Nov 18, 2016
Contributor
I have no idea, but let's preserve the current behavior where `broadcast(+, 1) === 1`, and change this in another PR if needed. Mixing multiple issues is the recipe for never merging a PR.
    if isnullvalue(x) | isnullvalue(y)
        Nullable{restype}()
    else
        Nullable{restype}(f(unsafe_getindex(x), unsafe_getindex(y)))
nalimilan
Jun 16, 2016
Contributor
Maybe better to use `@inbounds` instead of the `unsafe_` function.
        (:.//, ://), (:.==, :(==)), (:.<, :<), (:.!=, :!=),
        (:.<=, :<=), (:.÷, :÷), (:.%, :%), (:.<<, :<<), (:.>>, :>>),
        (:.^, :^))
    @eval $eop(x::Nullable, y::Union{Nullable, Number}) = broadcast($op, x, y)
nalimilan
Jun 16, 2016
Contributor
I guess `x::Union{Nullable, Number}, y::Union{Nullable, Number}` doesn't work?
TotalVerb
Jun 16, 2016
Author
Contributor
It might. I was just paranoid about ambiguity.
end

# to maintain broadcast(fn, Number) behaviour
broadcast(f, x::Number) = broadcast(f, collect(x))
nalimilan
Jun 16, 2016
Contributor
How is it affected by your changes?
    x.value
end

# convenience method for getindex without bounds checking
nalimilan
Jun 16, 2016
Contributor
You don't need these anymore with the `@boundscheck` declaration: use `@inbounds` as I noted above.
unsafe_getindex(x::Nullable) = x.value
unsafe_getindex(x::Number) = x

# convenience method for detecting null value
nalimilan
Jun 16, 2016
Contributor
We should either add this definition to `isnull` or not add it at all. Cf. JuliaData/DataFrames.jl#994 (comment) and following comments.
TotalVerb
Jun 17, 2016
Author
Contributor
I'll inline `isa(x, Nullable) && isnull(x)` for now. Not sure if the compiler is smart enough to optimize this out; I hope it is.
nalimilan
Jun 17, 2016
Contributor
Yes, I think it is.
# iteration protocol
start(x::Nullable) = 1
next(x::Nullable, i::Integer) = x.value, 0
nalimilan
Jun 16, 2016
Contributor
While it works, it would be more logical to return `2` as a state, which is more clearly out of bounds. Then below you can check `x.isnull | i != 1`.
eschnett
Jun 16, 2016
Contributor
I'd use a `Bool` for the state: either we are done or we are not done, there's no other case for `Nullable`. (Technically, this might even lead to better code if the hardware can handle `Bool` more efficiently than `Int`.)
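As an illustration of the `Bool`-state idea, here is a minimal sketch. It assumes a current Julia release, where the `start`/`next`/`done` protocol shown in the diff has been replaced by a single `iterate` function; `MaybeOne` is a hypothetical wrapper standing in for `Nullable`, not a type from this PR.

```julia
# Hypothetical stand-in for Nullable: `data` is `nothing` in the null case.
struct MaybeOne{T}
    data::Union{Some{T}, Nothing}
end

Base.eltype(::Type{MaybeOne{T}}) where {T} = T
Base.length(x::MaybeOne) = x.data === nothing ? 0 : 1

# The iteration state is a Bool meaning "the single element was already
# produced", so there are only two possible states.
function Base.iterate(x::MaybeOne, done::Bool = false)
    (done || x.data === nothing) && return nothing
    return (something(x.data), true)
end
```

Under this sketch, `collect(MaybeOne{Int}(Some(7)))` yields a one-element vector and `collect(MaybeOne{Int}(nothing))` yields an empty `Vector{Int}`, which matches the "zero or one element collection" reading of nullables discussed in this PR.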
function filter{T}(p, x::Nullable{T})
    if x.isnull
        x
    elseif p(x.value)
nalimilan
Jun 16, 2016
Contributor
Write this as a single condition using the short-circuit operators: `if isnull(x) || p(x.value)`.
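The suggested single-condition form could look roughly like the sketch below. It assumes a current Julia release and uses `Union{Some, Nothing}` as a stand-in for `Nullable`; `Opt` and `filter_opt` are hypothetical names, not part of this PR.

```julia
# Stand-in for Nullable (an assumption of this sketch).
const Opt = Union{Some, Nothing}

# A null input and a value that passes the predicate both return the input
# unchanged, so the two branches collapse into one short-circuit condition.
# The predicate is only evaluated when a value is present.
function filter_opt(p, x::Opt)
    if isnothing(x) || p(something(x))
        return x
    else
        return nothing
    end
end
```

Because `||` short-circuits, `p` is never called on a null input, which preserves the behavior of the original `if`/`elseif` chain.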
end

function map(f, xs::Nullable...)
    if all(isnull, xs)
nalimilan
Jun 16, 2016
Contributor
I guess it would be faster to do `nulls = sum(isnull, xs)`, and then `if nulls == length(xs)` and `elseif nulls > 0` (`length(xs)` is a compile-time constant).
TotalVerb
Jun 17, 2016
Author
Contributor
This is more consistent in speed, but seems to be always slower in my tests. I think it's because normally one of `all` and `any` will short-circuit quickly.
nalimilan
Jun 17, 2016
Contributor
OK, maybe for very few elements like here this isn't worth it.
TotalVerb
Jun 18, 2016
Author
Contributor
I think the idea is that `all` short-circuits on `false` and `any` on `true`, so at least one of the branches is basically a no-op. And I suppose the sum is a bit more expensive than a series of branches.
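For comparison, the counting variant proposed above might be sketched as follows. It assumes a current Julia release with `Union{Some, Nothing}` standing in for `Nullable`, and it uses `count` where the comment suggested `sum`; `Opt` and `counting_map` are hypothetical names. The sketch makes no claim about which variant is faster.

```julia
# Stand-in for Nullable (an assumption of this sketch).
const Opt = Union{Some, Nothing}

# Count the nulls once, then branch on the count. Unlike the all/any
# version, this always visits every argument.
function counting_map(f, xs::Opt...)
    nulls = count(isnothing, xs)
    if nulls == length(xs)
        return nothing
    elseif nulls > 0
        throw(DimensionMismatch("expected all null or all nonnull"))
    else
        return Some(f(map(something, xs)...))
    end
end
```

The result is the same as with the `all`/`any` formulation; the only difference is that no early exit is possible while counting.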
@@ -1,5 +1,13 @@
# This file is a part of Julia. License is MIT: http://julialang.org/license

# "is a null with type T", curried on 2nd argument
isnull_typed(x::Nullable, T::Type) = typeof(x).parameters[1] == T && isnull(x)
nalimilan
Jun 16, 2016
Contributor
Use `eltype`. Also, I'd call that function `isnull_oftype`.
isnull_typed(t::Type) = x -> isnull_typed(x, t)

# "is a nullable with value egal to x", curried on 2nd argument
isnullableof(y::Nullable, x) = !isnull(y) && y.value === x
nalimilan
Jun 16, 2016
Contributor
AFAIK this is covered by `y === Nullable(x)`, so no need for a function.
@test Nullable(0)[1] === 0

# collect
@test isempty(collect(Nullable())) && eltype(collect(Nullable())) == Union{}
nalimilan
Jun 16, 2016
Contributor
Split these `&&` into two lines; this is more readable and more precise about what fails.
nalimilan
Jun 16, 2016
Contributor
BTW, it's also stricter to write `isa(collect(Nullable()), Vector{Union{}})`.
@@ -305,6 +305,9 @@ promote_rule{T<:BitSigned64}(::Type{UInt64}, ::Type{T}) = UInt64
promote_rule{T<:Union{UInt32, UInt64}}(::Type{T}, ::Type{Int128}) = Int128
promote_rule{T<:BitSigned}(::Type{UInt128}, ::Type{T}) = UInt128

# the result on one is a good heuristic for promotion type
promote_op{T,R<:Integer}(::Type{T}, ::Type{R}) = T # to avoid ambiguity
nalimilan
Jun 16, 2016
Contributor
If it's really needed to silence warnings, make it `error()` since it's not supposed to be used.
EDIT: but I thought in 0.5 the warning wouldn't be printed, and you would only get an error when trying to use that method, which is OK.
TotalVerb
Jun 16, 2016
Author
Contributor
Why isn't it supposed to be used? `broadcast(Float64, [1, 2, 3])` works today.
nalimilan
Jun 16, 2016
Contributor
OK, I didn't consider types as callables, but indeed they are. Use `(@_pure_meta; T)` as for the current definition, though.
@@ -305,6 +305,9 @@ promote_rule{T<:BitSigned64}(::Type{UInt64}, ::Type{T}) = UInt64
promote_rule{T<:Union{UInt32, UInt64}}(::Type{T}, ::Type{Int128}) = Int128
promote_rule{T<:BitSigned}(::Type{UInt128}, ::Type{T}) = UInt128

# the result on one is a good heuristic for promotion type
promote_op{T,R<:Integer}(::Type{T}, ::Type{R}) = T # to avoid ambiguity
promote_op{R<:Integer}(op, ::Type{R}) = typeof(op(one(R)))
nalimilan
Jun 16, 2016
Contributor
That sounds like a possible partial workaround to #16164 until we get a more general solution, but I think we should discuss this choice on that issue, and keep this PR focused on nullables. Please remove this commit for now to increase the chances of merging the PR soon.
Thanks. I agree it's OK to start with an unoptimized version, and add a fast path for safe I'll leave it to people more familiar with the |
s = 0
for x in Nullable{Int}()
    s += x
tkelman
Jun 16, 2016
Contributor
this does not seem desirable to me at all
Why should nullables be iterable? This seems like going further down making scalars iterable, but with more conflating of nullability with container emptiness. I think it's clearer if the concepts are kept separate. |
The main feature is to give |
Should be able to opt in to the syntactic sugar without making them iterable. This would lead to far more methods which aren't expecting to deal with nullables trying to operate on them, which I imagine is the intent here but I see that going wrong often and would rather prefer being explicit. Iteration would suddenly discard the possibility of null values, or silently do nothing instead of visibly flagging that a null was present. Doing nothing on a null value isn't always appropriate, encoding it as part of the behavior strikes me as wrong. |
I have no strong opinion on implementing the more general interface, but I figure e.g. Scala and Rust had good reasons to do that. @johnmyleswhite? |
I intentionally left Now that that interpretation has been dispensed with, it could be worth reconsidering whether |
@tkelman The Julia documentation is pretty explicit that Nullable types are container-like:
This was why I filed #16889—being thought of as a container type is useless if they don't behave as containers. The container property is important, and is what (functionally) distinguishes |
There are effectively four interpretations of a
The current behaviour is interpretation 2. I would argue that interpretation 3 makes more sense for Julia. |
I'm not convinced. Why? We have plenty of "container" wrapper types that don't act like iterable collections. |
Nullable started as a very minimal type "with a very minimal interface with the hope that this will encourage you to resolve the uncertainty of whether a value is missing as soon as possible" to cite @johnmyleswhite. At some point it was even discussed that |
There are a few more interpretation of
At one point, Julia could begin to offer different types with different behaviour. There could be an Here, instead of adding more container-ness to An What hasn't happened so far is to make a strong case for these types, and to prototype them in a package. I'd expect some of these to become widely used, others to remain obscure. |
"Counter-productive" sounds quite strong. Do you have any evidence of a case where different interpretations really conflict?
Precisely, a lot of work has gone into NullableArrays, and we're seeing the current limitations of |
(pending a comment from Sacha on future cleanups he has in mind) |
Thanks again for making this happen, @TotalVerb! |
We should have checked the appveyor (edit: travis too!) log more carefully. This somehow didn't cause the tests to register as failing, but something strange is up here:
|
Oh dear. Let me see if I can reproduce this locally. |
I get a failure that bisects to here both on Windows and Linux from just running |
The failure isn't in the travis logs for #19745, so it could be the known typo in this PR substituting |
That seems like a good explanation to me. There's a test system bug that it was able to continue and be marked as a success here then. Manifests only when tests are run in parallel. @yuyichao any idea? |
@johnmyleswhite I recall a discussion where we agreed that nullables are not containers. However, that aspect of this --- e.g. defining But I'm very much against the
So, at the very least I find the title of this PR misleading. I think it's unacceptable for |
If I understand the current thinking correctly, I believe treating |
|
||
_broadcast_type(f, T::Type, As...) = Base._return_type(f, typestuple(T, As...)) | ||
_broadcast_type(f, A, Bs...) = Base._default_eltype(Base.Generator{ziptype(A, Bs...), ftype(f, A, Bs...)}) | ||
# nullables need to be treated like scalars sometimes and like containers |
JeffBezanson
Dec 30, 2016
Member
The occurrence of "sometimes" here definitely raises a red flag.
I have a PR to address @JeffBezanson's concerns. I can submit it within the next hour. |
Thanks @JeffBezanson for the comments. Indeed your concerns were noted and discussed previously around #16961 (comment) and the (weak) consensus seemed to be that "nullables go in arrays, and not the other way around" as a justification for this special case, but I admit that it is not the prettiest solution. Let's see if @pabloferz has a better one. |
FWIW, I'm ok with either approach. This "taking apart" is the behavior that Erik Meijer and Eric Lippert were encouraging us to adopt, but that's because their vision of a dot operator for C# always involved flattening for all containers-of-containers and not just containers-of-nullables. For now I think we can remove this behavior since it's not very important for most use cases. |
We can get rid of this behavior for now, but that seems to imply we'll need to keep a special array type like |
@nalimilan I agree that we will need to keep Related to this point, @ScottPJones had the following comment to make;
I think future progress on this will be on the package side. |
First off, much thanks for your great work in / persistence with this pull request @TotalVerb! :)
In brief: For better extensibility and maintainability of Specifically, this pull request introduced additional type-specific functionality into Not certain whether I will have time to comment in detail prior to travel tomorrow morning. Most specific concerns I have are happily addressed by #19745 and #19787. Cross refs: Best! |
No, AFAIK @vtjnash's plan is to optimize |
This implements `map`, `filter`, and `broadcast` for `Nullable` arguments.
cc: @nalimilan @johnmyleswhite @hayd @vchuravy