-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement more operators on Nullable with lifting semantics #19034
Conversation
This defines all arithmetic operators, plus a few others, for Nullable, with lifting semantics.
Fantastic! Can't wait to see these in NullableOps.jl as well, so that I can remove my type piracy from Query.jl (which I'm also guilty off). Maybe not the right place to discuss this, but just the follow up question: we don't have agreement on methods for mixed Nullable and standard values, right? I would love to also see method definitions that handle things like |
Jeff's points from #16988 (comment) still stand, don't they? Wouldn't using a single higher-order generalizable strategy in all cases be more consistent than having a hard-coded non-extensible set of privileged operators that have these methods defined? |
I still side toward using |
I, too, don't see how this is terribly different from the vectorization problem, and we know how that resolved. |
Maybe |
@JeffBezanson That is a similar argument as was used for vectorized functions, and I agree. But we came to the conclusion that I don't have a problem with
with
where the readability difference is night and day. |
I am also of the opinion that |
But not everybody has Nullable values. Nor are operations on Nullables uniquely defined. I also believe it's better to eliminate null values as early as possible rather than threading them through every operation. For example
rather than relying on You can also use tricks like |
That is not the proposal here. The proposal is to add a select few methods for the most common arithmetic operators to work with
That simply doesn't square with the requirements of folks that do data analysis work... In some situations that can be done, but there are tons of situations where that is just not what a data analyst need to do, and we need to provide a solution for those cases if want julia to play in the data science space. The analogy to the vectorized function story has come up a couple of times. Here is how I see that: back when vectorized functions were added, no one had come up with the great idea that is now the |
This really resonates with me. It's just providing
Right. For example, in future versions of Julia with traits, we might be able to transfer the traits of
Similarly, if my mental model of a In conclusion, Julia has allowed us to abstract away so much better than any other language I have used, which means generic functions generally work very well (i.e. for a large range of input types). Requiring the user to explicitly implement higher-order lifting (even with a single character like |
@JeffBezanson my opinion is that your example is a great demonstration of an optimization a user might additionally provide. But Julia is usually very expressive in the sense that I can achieve much a the REPL with just a few lines of code. I don't want to be forced to type all that boilerplate at the REPL when |
@tkelman My position with regards to the argument that we're repeating the vectorization mistake: the plan is to provide lifted versions only for standard operators, i.e. those for which we deemed having a short element-wise form ( Just like for vectorization (
@JeffBezanson Maybe, but those who have nullable values are them intensively and cannot be satisfied with a verbose syntax. Also lifted operators on Operations are "uniquely defined" when you accept null propagation: lifting semantics are well-defined in several languages. |
Why are |
|
Could somebody point me to a compelling real-world example where this helps? It still seems marginal to me. I would expect, for example, to call a function to give me just the non-null values from a vector, at which point I have a normal numeric vector and can do anything. This strikes me as a minefield of special cases. maximum and minimum work, but std, var, and mean do not. Yet to provide this limited set of functionality, this PR adds over 130 methods. The C# precedent is a bit compelling; if everybody thinks that worked out well then it's certainly a good data point. |
A typical example came up today in a question to the mailing list: subsetting a data frame. With the current design, We want to provide user-friendly macros like DataFramesMeta, Query and StructuredQueries to make working with data it easy, efficient and flexible. These could perform automatic lifting but I think supporting these basic operations is still very useful in many cases (and I think @davidanthoff doesn't want to do automatic lifting with Query to remain close to standard Julia syntax). I'm afraid people are going to try this and find Julia frustrating if they cannot do that: I wouldn't feel confident switching data frames to |
Nullable{R}($op(x.value, y.value)) | ||
end | ||
end | ||
$op(x::Nullable{Union{}}, y::Nullable{Union{}}) = Nullable() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the definitions with Union{}
really necessary? I thought promote_op
would handle that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that Union{} <: NullSafeTypes
, so it would be considered as safe and we would try applying the operation even if the value
field isn't defined. One alternative is to define null_safe_op
for Union{}
for each operation (to avoid ambiguities), which isn't cleaner. Not sure whether there's a better way of doing this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That could potentially be fixed by defining NullSafeTypes as Union{Type{Int8}, Type{Int16}, ...}
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, interesting. I will have a look at this solution.
I was hoping for something more real-world than |
@JeffBezanson It is even more disappointing to a new user that
is fine when
would throw an error unless there is a method for
is implemented as
I feel that this design decision in the |
+100 to that. If you're lucky enough to have no missing data, I don't see why Nullable should be forced on you. CSV.jl moved to making Nullable optional, and AFAICT everything got better and everybody is happy. Random thought: maybe we can use the C# "safe navigation" operator |
Operations on Re: the C# analogy: Does C# defined mixed-signature lifted operators, e.g. People seem to believe this PR is necessary, so I won't oppose it beyond what I've said here and elsewhere. But I do predict that merging this PR commits us to an undetermined amount of maintenance for this strategy. I don't intend to rely on it. |
I actually agree with @JeffBezanson here that it feels much more sensible and long-term to come up with a way to be able to apply lifting semantics to any function call, a la vectorization. x +? y # => gets lowered to lift(+)(x, y) |
We can certainly stop converting the
Or maybe just EDIT: maybe lifted operators |
@davidagold C# actually allows mixing nullables and scalar: https://msdn.microsoft.com/en-us/library/2cf62fcy(v=vs.140).aspx#Anchor_4 |
Is there a consensus that the existing broadcasting operators will not be appropriate for this? Before introducing new syntax we should consider whether |
@TotalVerb But then how do you apply a lifted element-wise operation like |
I think we should just make And lifted elementwise will be a problem anyway, right? Even if we have |
Just kind of thinking aloud, if I may... I can kind of see both sides here, though I think I'm leaning toward agreeing with Jeff. It seems good to know what you're doing if your data are nullable, in which case having to more explicitly deal with null values via I don't think that functions, especially those in Base, should have to be able to handle nulls by default. And treating a certain group specially seems somewhat arbitrary and is perhaps a slippery slope. While introducing a convenient syntax that can be applied anywhere is tempting, something like f?(x) ?+ g?(x) ?> 0 ? x ?+ 1 : y At the same time, reusing the broadcast syntax feels like a weird pun if you aren't used to thinking of Personally I think things like this could be dealt with in packages that provide abstractions and conveniences like automatic lifting by default, and Base shouldn't have to be concerned with it. People who want to deal with raw |
That's a good point. I personally am ok with adding these specific operations given the C# precedent, although I continue to think the mixed-type case is a good bit weirder than either the pure no-nullable-arguments case or the pure all-nullable-arguments case. I'd want to know when a reduction operation on an array like |
Having chimed in already, let me explain my major concern with these kinds of explicit method definitions for nullable types: they encourage people to define functions on nullables rather than employing a generic lifting strategy like dot-broadcast notation. My main fear with such ad hoc method definitions is that we'll see examples in the wild in which I believe all of the operations in this PR will match the standard lifting semantics one would expect (and so I'm ok with merging this PR), but I do worry about end-users deciding that their code would be cleaner if they only had access to a lifted version of |
I fully agree with @johnmyleswhite's concerns. I think the best solution to discourage people from defining specialized methods on It could still be interesting to get input from somebody familiar with the design of C#. If anybody knows such a person... |
I might be able to get Erik Meijer to chime in if we had specific questions to answer that I could e-mail him. |
I would ask if they were overall happy with allowing operators on nullables. Did it hide bugs? Were people annoyed or confused about the specific set of operations supported? Do they wish they could have applied the idea more widely? If they could do it again, would there be more kinds of nullable? |
For Query.jl I've now completely changed strategy around this: I'm no longer using Having said that, I have to say that I'm not sold at all by the general lifting approach, mainly because I don't see any workable proposal on the table for that. I've written my main objections about the current approaches up here queryverse/Query.jl#71, in case anyone is interested ;) The TLDR version is that I believe that unlike in the vectorization story, there is no one lifting semantics that should be applied to all functions. There are more points in the linked issue, but that is the main one. Of course, this still leaves the question of the API for |
@johnmyleswhite If you are in touch with Erik it would actually be fantastic if you could also simply ask him what he would have done differently in the design of LINQ in hindsight? It is a broad question, but Query.jl is very, very close to LINQ and getting any insight into what he thinks they messed up in the original design would be unbelievably helpful at this point in Query.jl's life. |
I don't consider defining a new null type and then adding methods to "every" function for it materially different from defining those same methods for |
Another Eric (Lippert, formerly on the C# team and ECMAScript committee), has some interesting comments about C#'s nullable and lifting -- design constraints, optimizations, and some issues encountered: http://stackoverflow.com/a/9013171 In particular: https://blogs.msdn.microsoft.com/ericlippert/2007/06/27/what-exactly-does-lifted-mean/ Of note there, with respect to C#'s lifting as a positive precedent for this PR:
(of course, IIUC, some of these issues were indeed cleaned up in later standard revisions) |
These are interesting references. Note though that the "confusion" he apologizes for only regards the use of the term "lifted" for operations which are actually not lifted in the strict sense: he's talking about comparison operators ( Nowhere does he actually say that lifting arithmetic operators by default was a mistake. We can even interpret this post as the affirmation of the contrary, as he apologizes for bad terminology, which would have been the occasion of saying the design was inconsistent too if he thought so. |
I had a meeting with Erik Meijer and Eric Lippert today about these issues. Before the meeting, Erik Meijer encouraged reading his paper on Cω and the power of the dot syntax, which is very close to our new broadcast syntax. The main distinction is that Erik thinks of things as The distinction from what we offer or plan to offer (aside from lifting field access) is that this lifting operation also automatically flattens the results. This is linked to a comment Jameson recently made in an offline discussion about how our dot syntax should apply to These ideas did not generally make it into C# and stayed only in the Cω prototype. In regards to C#'s development, Eric Lippert first said the following in an e-mail before we met in person to discuss the same issues:
Building off of these points in person, we discussed our broadcast syntax and how it relates to Erik Meijer's ideas for Cω. Both Erik and Eric were in favor of using our dot notation to automatically lift functions on Nullables. In addition, they thought that most binary operators like They also suggested implementing automatic flattening. We'll need to think through those details for how they'd apply in Julia. In general, they note that our dot syntax should continue down the road @nalimilan has already headed by transforming In that regard, the functions that deserve special semantics were Important things we should do differently than C#:
More broadly, they suggested we should choose one semantic interpretation of Finally, they suggested following C#'s lead and defining three-valued logic for non-short circuiting Boolean functions, but not allowing nullable values at all with short-circuiting Boolean operators. Eric's blog posts linked above detail the arguments for this distinction and I find his position very compelling. I believe that summarizes all of my communications with them on the topic. Happy to try translating all of this content and some recent offline conversations I've had with other committers into a formal design for where we want |
@johnmyleswhite Cool, thanks, this is fantastic! And will take a while to digest (at least for me). One quick question on something specific:
Did they give a reason for that recommendation? |
I'm not sure, except insofar as returning a nullable prevents one from using that idiom in condition clauses like |
Thanks for writing this detailed summary. Sounds like a great plan in general, and it's reassuring that they agree with our previous decisions, while giving insights on the ones we were unsure about. I just have a small reservation regarding this:
Eric's blog post only gives one argument in favor of not supporting Anyway, that's really a detail and it can be changed later without breaking backward compatibility, so I don't think we need to decide this right now. But I just wanted to mention it after reading the blog post. |
Short circuiting operators are control flow. Allowing nullables with "if necessary" semantics would be equivalent to truthiness in Would it achieve the same goals of avoiding misuse if non-lifted boolean operations were not-comparable-errors on nullables, and opt in lifting could give you 3VL when requested? Default 3VL seems like it might put us in an odd place where some generic code might do a sane thing on nullable inputs, but anything that used comparison results in control flow would error until explicit null handling was added. Do we want to require explicit null handling everywhere, somewhere (control flow only?), or nowhere? |
Is this right? Since 3VL logic generates |
Code that uses control flow would not work unmodified if it gets a nullable in the condition (unless we changed the above behavior). It's code that doesn't use any comparison results in control flow (not the biggest subset, but "some") that would sometimes give you a useful Nullable answer, without needing to change anything if you have 3VL auto lifting. |
I'm still confused. 3VL would apply to |
My point isn't that 3VL changes the behavior of code that has nullable comparisons in control flow. It's that it changes the behavior of code that doesn't - you would start getting nullable from comparisons as the output of some functions, if they managed not to error. You couldn't use the output in logical indexing or other places that expect a boolean though. So what advantage does auto lifting give, how often will the 3VL result actually be what you wanted? Generic code is probably written assuming comparisons return boolean, and that seems like an important part of the contract of what those operators should mean generically. |
I agree it doesn't bring a very clear advantage if nullable is still not supported in control flow. Though it doesn't hurt either, since code would have failed even earlier without it. Anyway, my point was that short-circuiting 3VL can be given clear and quite natural semantics, so that's not an argument for not implementing them. There may well be other arguments not to implement it, like the fact that it's not very useful in isolation. |
Closing since we now have #16961. |
This defines all arithmetic operators (plus a few others) for Nullable, with lifting semantics.
This is the next step after #18304: it implements most operators from #16988. I left out comparison operators for which it's not yet clear whether we want to return
Bool
orNullable{Bool}
; they will constitute the next step. The semantics for the present PR are quite clear (identical to those oflift
in #18758): returnNullable{T}(op(x, y))
if neither argument is null, andNullable{T}()
if one of them is.T
is chosen viapromote_op
to ensure type stability.These are currently shipped by NullableArrays.jl, but this is not ideal as it's really type piracy, and other packages like Query.jl need them even without using NullableArrays. The idea is to include them in Base, and then use the NullableOps.jl package for Julia 0.4 and 0.5 (just like Compat).
Cc: cc: @johnmyleswhite @davidagold @quinnj @davidanthoff @TotalVerb @vchuravy