WIP: Deprecate undefined == comparisons #18856

Closed
wants to merge 6 commits into from

Conversation

nalimilan
Member

This PR is a first stab at deprecating cross-type comparisons for which we previously fell back on === (#15983).

The first commit uses === where it makes sense and can be merged separately (#18853).

The second commit is more controversial, and could also be discussed in a separate PR: it changes all in and find* methods to use isequal consistently (until now only Dict did this for in). Relevant discussions are #9381 (about in), and #16269 and #18668 (about find). This change allows making == an error for arbitrary comparisons without breaking too much code. Indeed, it is relatively common to store heterogeneous non-comparable types in arrays or non-standard dicts (which use ==, contrary to Dict): in particular, strings and chars are stored together in the LineEdit/REPL code, and the same is true of other high-level code like Base.Multimedia.display. Note that this change is not strictly needed to make == throw an error for non-comparable types: the code could be made more careful instead. But I started with this minimally invasive strategy (in terms of lines of code to change) to be able to go to the next step without wasting time.

The third commit changes more comparisons to be robust against the change in behavior of ==. It doesn't make much sense in isolation, but it doesn't break the tests either.

The fourth commit is the most interesting and challenging. Indeed, isequal falls back to ==, yet we don't want the former to print deprecation warnings nor throw errors (after the next release). So we need a way for isequal to change the behavior of == for its needs. So far I've found two possible approaches:

  • Use a try... catch block to catch the NotComparableError thrown by ==. This is unacceptable for performance unless the compiler is able to determine statically that the exception is thrown reliably by a method. Also, it doesn't work to silence deprecation warnings (for the first phase).
  • Use a mechanism similar to @inbounds/@boundscheck to tell == that no exception should be thrown. isequal would set this flag when calling ==. This is the one I've retained here, temporarily abusing @inbounds; if we choose it, a separate macro would be warranted (e.g. @comparable).
    Actually, this second approach is quite powerful, and it would offer an alternative behavior: instead of having == throw and isequal return false, both methods could throw by default, unless they are marked with @comparable. in and find* would use it by default to allow comparing arbitrary types. The advantage of this solution is that we wouldn't have to conflate two meanings into isequal: IEEE floating point semantics and arbitrary comparisons don't necessarily go together. The downside is that it could be more complex for users.
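
The two approaches listed above can be sketched roughly as follows in current Julia. This is only an illustration under assumptions: NotComparableError is the name used in the description, but is_comparable, strict_eq, lenient_try, lenient_flag and the throws keyword are hypothetical stand-ins, and a plain Bool flag replaces the @inbounds-like mechanism that the PR actually uses.

```julia
# Hypothetical sketch; none of these names exist in Base.
struct NotComparableError <: Exception
    x
    y
end

# Assumption for this sketch: two values are "comparable" when they have the
# same type or are both numbers.
is_comparable(x, y) = typeof(x) === typeof(y) || (x isa Number && y isa Number)

# Stand-in for a strict ==. The `throws` keyword plays the role of the
# @inbounds-like flag from the second approach.
function strict_eq(x, y; throws::Bool=true)
    is_comparable(x, y) && return x == y
    throws && throw(NotComparableError(x, y))
    return false
end

# Approach 1: call the strict comparison and catch its error.
function lenient_try(x, y)
    try
        return strict_eq(x, y)
    catch err
        err isa NotComparableError || rethrow()
        return false
    end
end

# Approach 2: tell the strict comparison not to throw in the first place.
lenient_flag(x, y) = strict_eq(x, y; throws=false)
```

With these definitions, strict_eq("foo", 1.2) throws, while lenient_try("foo", 1.2) and lenient_flag("foo", 1.2) both return false. The flag version avoids the cost of exception handling, which is the performance concern raised for the first approach.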

Anyway, this is obviously quite rough. Comments and advice welcome.

@@ -9,6 +9,9 @@ abstract Enum

Base.convert{T<:Integer}(::Type{T}, x::Enum) = convert(T, box(Int32, x))

+Base.:(==)(x::Enum, y::Integer) = Int32(x) == y
+Base.:(==)(x::Integer, y::Enum) = x == Int32(y)
Sponsor Member

I don't think these should exist. Enum and Integer aren't compatible types. Why was this needed?

Member Author

I think I added this because of comparisons done with integer codes returned by C, e.g. this one. But we can tell people to call Cint before ==, like in most other places (e.g. here). Will fix.

Member Author

Done. These lines were broken from the start, so that's the first bug uncovered by the new strict rules.
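
For illustration, converting explicitly before comparing looks like this in current Julia. The Status enum and the code variable are hypothetical, standing in for an enum and an integer code returned by a C function.

```julia
# Hypothetical enum, for illustration only.
@enum Status ok=0 failed=1

# Stand-in for an integer code returned by a C function.
code = Cint(1)

# Convert explicitly before comparing, as suggested above.
Cint(failed) == code      # true

# Alternatively, convert the integer code to the enum type.
Status(code) === failed   # true

# Without a conversion, == falls back to === and returns false.
failed == code            # false
```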

@@ -992,7 +993,7 @@ julia> findnext(A,3)
"""
 function findnext(A, start::Integer)
     for i = start:length(A)
-        if A[i] != 0
+        if !isequal(A[i], 0)
Sponsor Member

There was a recent proposal to make this 'iszero(x)' or zero(T), since == 0 doesn't really work right for some custom types

Member Author

Yes, that would fit well with this PR. Then we would need to have the fallback iszero call isequal (rather than ==). And since isequal(0, -0.0) == false, we should define iszero on floats so that iszero(-0.0) == true.
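
The difference matters for negative zero. A small illustration, assuming current Julia, where iszero exists and findnext accepts a predicate:

```julia
# -0.0 and 0 are == but not isequal.
-0.0 == 0            # true
isequal(0, -0.0)     # false

# iszero treats negative zero as zero.
iszero(-0.0)         # true

A = [0.0, -0.0, 3.0]

# Index of the first nonzero element, searching from index 1.
findnext(!iszero, A, 1)                # 3

# With isequal, -0.0 is not considered equal to 0, so it is found first.
findnext(x -> !isequal(x, 0), A, 1)    # 2
```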

@TotalVerb
Contributor

So the new semantics will be: == numeric equality, isequal general equality, and === egality?

@nalimilan
Member Author

So the new semantics will be: == numeric equality, isequal general equality, and === egality?

== will not be only for numeric values. I'd say it should be used to compare "comparable" values, i.e. it offers the additional safety that it will error if you inadvertently try to compare values of two types which can never be equal.

@TotalVerb
Contributor

That makes sense. Could we preserve a subset of useful comparisons between noncomparable types?

  • Expr and anything... useful for metaprogramming. === won't do here.
  • Void and anything. === works but not in all cases e.g. Union{String,Void}

I would prefer not to use isequal in these cases.

@nalimilan
Member Author

Yeah, these exceptions should definitely be considered. Void is very special already in that it's not really useful to compare nothing only with values of the same type. Expressions are also frequently used in contexts where other types can appear, but I can't tell what's the best behaviour for them.

@JeffBezanson
Sponsor Member

I really want to avoid having a long complicated list saying when to use isequal vs. ==. With only three functions (isequal, ==, ===) trying to implement many distinctions, you get possibly-unwanted behaviors piggybacking on others. For example, if I have x == 0.0 and I want it to be type-permissive, I'll switch to isequal(x, 0.0), but then the behavior on -0.0 also changes.

+1 to the first commit though.

@TotalVerb
Contributor

TotalVerb commented Oct 10, 2016

Maybe we should add some more equality predicates. We have a countable list of operators ====, =====, ======, etc. available. Looking forward to writing x ============== y in the near future...

Who could've thought that equality was such a hard problem?

@StefanKarpinski
Sponsor Member

Who could've thought that equality was such a hard problem?

Anyone who paid attention to decades of Lisp debates on the subject :)

@StefanKarpinski
Sponsor Member

StefanKarpinski commented Oct 11, 2016

@JeffBezanson: I think there's something very telling about the fact that the rest of this change forces the part that you like – and that without the errors that a stricter == imposes, it seems hard to have the discipline to use === and == as appropriate. I still like that this change gives the three equalities three very clear roles:

  • ===: can compare any kinds of values, captures programmatic distinguishability (egal)
  • isequal: can compare any kinds of values, captures equivalence classes of values
  • ==: can only make comparisons that "make sense", captures intuitive equality

We've tried to "sweep isequal under the carpet", but it doesn't really work that well – you end up explaining it anyway and the fact that == and isequal are almost the same makes the explanation really confusing and makes it really hard to know which one to use. This distinction makes it easier. Are you comparing two completely arbitrary values? Then use === or isequal. Are you comparing two things that should "make sense to compare" or give an error? Then use ==.
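
For reference, here is how the three predicates behave in current Julia, where == still falls back to === for unrelated types. Under this proposal only the last comparison would change, from returning false to raising an error.

```julia
# Numbers of different types
1 == 1.0              # true
isequal(1, 1.0)       # true
1 === 1.0             # false

# NaN
NaN == NaN            # false
isequal(NaN, NaN)     # true
NaN === NaN           # true

# Signed zeros
0.0 == -0.0           # true
isequal(0.0, -0.0)    # false
0.0 === -0.0          # false

# Distinct mutable arrays with equal contents
[1, 2] == [1, 2]      # true
[1, 2] === [1, 2]     # false

# Unrelated types: false today, an error under the proposal
"foo" == 1.2          # false
```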

@TotalVerb
Contributor

Jeff's point about negative zeroes still worries me. Why is it that type-permissive equality comes with additional stuff, like NaN and zero behaviour? It seems that two orthogonal concepts are being combined.

@nalimilan
Member Author

Jeff's point about negative zeroes still worries me. Why is it that type-permissive equality comes with additional stuff, like NaN and zero behaviour? It seems that two orthogonal concepts are being combined.

They are theoretically orthogonal, but in practice there's one strong reason to have them linked: the fact that one wants to be able to use NaN as a dictionary key. I would be inclined to make isequal(0.0, -0.0) true, since that's the source of all pains around in (cf. #9381). Then the meaning of isequal would be much clearer.

@JeffBezanson
Sponsor Member

I think there's something very telling about the fact that the rest of this change forces the part that you like

Yes, but it also forces changes that I don't like, namely randomly changing several comparisons from == to isequal. The problem is that it's not so easy to know when you're comparing two arbitrary values in generic code. In general find and in might be looking for an arbitrary value in an arbitrary container, but somebody who writes 0.0 in A likely expects floating-point equality. There's an entire (inconclusive) issue on that topic, so it would be a bit odd to casually flip that switch now just to work around some newly-added error cases.

I agree that if == and isequal behaved the same on -0.0 and NaN then this change would be easier to accept, since type-permissiveness would be the only distinction to think about. I don't find it all that clear to say that == is for comparisons that "make sense", since then the debate is about which comparisons make sense. The list of exceptions discussed above, e.g. Union{T,Void} or Expr == Symbol, is exhibit A.

isequal(::Void, ::Void) = true
isequal(::Void, ::Any) = false
isequal(::Any, ::Void) = false
isequal(x::Union{Method, TypeName}, y::Union{Method, TypeName}) = x === y
Sponsor Member

Where did this come up? Would it be possible to address by changing some more calls to use ===?

Member Author

Unfortunately, it's hard to track since the failure happens during bootstrap, and the backtrace doesn't mention the function name (even with julia-debug). Is there anything I can do to improve that?

@StefanKarpinski
Sponsor Member

The problem is that it's not so easy to know when you're comparing two arbitrary values in generic code. In general find and in might be looking for an arbitrary value in an arbitrary container, but somebody who writes 0.0 in A likely expects floating-point equality.

I opened #9381 specifically because it's probably better to use isequal when computing x in A rather than == (and you seemed to agree, at least initially). I think that NaN in [NaN] being false is worse than 0.0 in [-0.0] being false. Of course, the latter could be fixed if we make isequal(-0.0, 0.0) true; another argument to consider in #9381.
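
The two cases being weighed can be reproduced in current Julia, where in on an array compares with == while Set membership is based on isequal and hashing:

```julia
A = [NaN, -0.0]

# `in` on an array uses ==
NaN in A                  # false
0.0 in A                  # true

# An isequal-based membership test gives the opposite answers
any(isequal(NaN), A)      # true
any(isequal(0.0), A)      # false

# Set membership already behaves like the isequal version
NaN in Set(A)             # true
0.0 in Set(A)             # false
```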

@StefanKarpinski
Sponsor Member

StefanKarpinski commented Oct 11, 2016

I don't find it all that clear to say that == is for comparisons that "make sense", since then the debate is about which comparisons make sense. The list of exceptions discussed above, e.g. Union{T,Void} or Expr == Symbol, is exhibit A.

There is some room for debate here, but it's clear that some comparisons don't make any sense like "foo" == 1.2. I'm not sure what the exact criterion to use should be but for dict keys we used isequal(x, convert(T, x)) to check if x was a sane value of type T for a while. This one is harder to define because of the desire for symmetry.
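
The isequal(x, convert(T, x)) criterion mentioned above can be sketched as a small predicate. The function name is_sane_value is hypothetical, and treating a failed conversion as "not sane" is an assumption of this sketch rather than something stated in the thread.

```julia
# Hypothetical helper: is x a "sane" value of type T, in the sense that
# converting it to T succeeds and yields an isequal value?
function is_sane_value(::Type{T}, x) where {T}
    try
        return isequal(x, convert(T, x))
    catch err
        # Assumption: no conversion method, or an inexact conversion,
        # means x is not a sane value of type T.
        err isa Union{MethodError, InexactError} || rethrow()
        return false
    end
end

is_sane_value(Int, 1.0)      # true
is_sane_value(Int, 1.5)      # false
is_sane_value(Int, "foo")    # false
```

The check is one-sided: it asks whether x fits into T, which is why it does not directly give a symmetric rule for x == y.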

@JeffBezanson
Sponsor Member

I thought of a more formal way to express what I'm thinking. This change breaks a type embedding property: when you move from domain A to a super-domain B (e.g. moving from real to complex), elements in A should behave the same. So moving from a comparison that works on Float64 to one that works on Union{Float64, X} should not compare Float64s differently.

@StefanKarpinski
Sponsor Member

Are you talking about collections? If we make "foo" == 1.2 an error, that doesn't necessarily affect ["foo"] == [1.2] (although perhaps that's what this PR does – I didn't notice that).

@JeffBezanson
Sponsor Member

I'm mostly talking about moving to wider types. For example I start with x::Float64 == y::Float64, and then change something such that x and y now might be strings, or nothings: x::Union{Float64,String} == y::Union{Float64,String}. Under this change that no longer works, but nor can I use isequal since it changes how Float64s compare, so I have to write my own equality function at that point.

Having a function that both expands the domain within which equality works, and also changes how some previously-comparable elements compare is a gotcha.
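
To make the gotcha concrete, here is a sketch of the kind of hand-written equality function this would require, compared with isequal. The name permissive_eq and the alias FS are hypothetical; the code runs in current Julia.

```julia
# Hypothetical helper: type-permissive, but keeps == semantics for floats.
permissive_eq(x::Float64, y::Float64) = x == y
permissive_eq(x::String, y::String) = x == y
permissive_eq(x, y) = false   # values of different types are never equal

const FS = Union{Float64,String}
xs = FS[0.0, NaN, "a", 1.0]
ys = FS[-0.0, NaN, "a", "a"]

# Floats still compare as with ==; mixed pairs are simply unequal.
map(permissive_eq, xs, ys)   # [true, false, true, false]

# isequal also accepts mixed pairs, but compares the floats differently.
map(isequal, xs, ys)         # [false, true, true, false]
```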

Of course, one solution is to use isequal as much as possible, e.g. for in and other places. Hey, I'd be fine with that --- even better, we could just define const (==) = isequal :) But the point is, further use of isequal further undermines the argument that == should be strict. I would expect somebody to be puzzled that 1 == "x" is an error, but 1 in ["x"] is fine.

@davidanthoff
Contributor

I'm not sure this is relevant here, so please feel free to ignore if this is off base. But for Query.jl it would be really key that x::T == x::Nullable{T} works.

@TotalVerb
Contributor

@davidanthoff But as far as I can tell, that doesn't do what is expected.

julia> 1 == Nullable(1)
false

@davidanthoff
Contributor

@TotalVerb Yes, my hope is that the definition will be changed in base so that 1 == Nullable(1) would return true.

Right now I overwrite that definition in the Query package, but that of course is terrible (type piracy and all of that).

@nalimilan
Member Author

The current fallback on === is indeed a bit surprising, and that's one of the cases where it could be better to raise an error rather than return false. Indeed, for nullable-nullable comparisons, we currently have a == method which throws just to avoid the fallback on === (which can give misleading results for null values, cf. #16923).

But let's not add Nullable to this discussion, it's mostly unrelated. We just need to choose what == method we want to define. Experience shows that this question is controversial, so let's not let it derail this PR.


I'm mostly talking about moving to wider types. For example I start with x::Float64 == y::Float64, and then change something such that x and y now might be strings, or nothings: x::Union{Float64,String} == y::Union{Float64,String}. Under this change that no longer works, but nor can I use isequal since it changes how Float64s compare, so I have to write my own equality function at that point.

Having a function that both expands the domain within which equality works, and also changes how some previously-comparable elements compare is a gotcha.

@JeffBezanson I guess one can see it that way, but it can also indicate that we need == methods to allow comparing nothing with any other type. Indeed, your example wouldn't make a lot of sense if you replaced Void with String or Char. If you did that, then the fact that == throws an error can be seen as a feature: if code was written expecting numbers, writing x == '1' or x == "1" can only give useless (at best) or confusing (at worst) results. That's just the same as what would happen if you passed a type for which + isn't defined.

Of course, one solution is to use isequal as much as possible, e.g. for in and other places. Hey, I'd be fine with that --- even better, we could just define const (==) = isequal :) But the point is, further use of isequal further undermines the argument that == should be strict. I would expect somebody to be puzzled that 1 == "x" is an error, but 1 in ["x"] is fine.

I wonder whether the core of the issue is that we don't have an easy to type operator for isequal. Maybe we need another operator? (=== would have been a perfect fit. :-)

FWIW, Go panics when comparing "not comparable" values. Of course it's a static language so it's a bit different.

@StefanKarpinski
Sponsor Member

StefanKarpinski commented Oct 14, 2016

I have to say that I don't find the x::Union{Float64,String} == y::Union{Float64,String} particularly convincing. That seems like bizarre code that should probably be refactored. I'd be interested in an example of this happening in the wild to see if it is actually as unnatural as it seems.

@nalimilan
Member Author

nalimilan commented Oct 14, 2016

@StefanKarpinski Concrete cases of Union{T, Void} in Base are actually with String, Symbol, or Function. If you want to have a look, several changes in the first and fourth commits above are there to deal with that.

EDIT: perhaps more to the point, cases of Union{S, T} in Base other than Void were the mixing of Char and String in dictionaries in LineEdit/REPL code, which isn't a good idea AFAICT because the gain due to using a simpler type (Char) is most likely cancelled by the type instability. There are also comparisons between Symbol, Expr, QuoteNode and GlobalRef, but these are arguably special types.

@JeffBezanson
Sponsor Member

Sure, you would not use Union{Float64,String}, but currently you don't even need to think about it, whereas here an ad-hoc list of special cases (e.g. Void) might be introduced. The real issues are

  1. Coupling type-permissive comparison to -0.0 and NaN comparison is a gotcha. I can easily imagine switching between == and isequal for one of those reasons, while forgetting that you're also pulling in the other one.
  2. I'm not convinced that it can simultaneously be important for 1 == "1" to give an error, yet allow 1 in ["1"].

I agree it would be great to have some sort of infix for isequal, but that doesn't fully address the in issue, since what the obvious operators == and in do out of the box is the most important thing.

@JeffBezanson
Sponsor Member

Heh, I just noticed that x in (y,) is shorter than isequal(x,y), making it a passable infix for isequal. We could even add a specialization for 1-tuples so it has no cost over calling isequal directly. Of course x in [y] is even shorter but harder to optimize.
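
A rough sketch of that idea, written as a separate hypothetical function (isequal_in) rather than by changing in itself, since in current Julia x in (y,) compares with ==:

```julia
# Hypothetical function, not part of Base: membership based on isequal.
isequal_in(x, itr) = any(isequal(x), itr)

# Specialization for 1-tuples: a direct isequal call, with no iteration.
isequal_in(x, t::Tuple{Any}) = isequal(x, t[1])

isequal_in(NaN, (NaN,))     # true
isequal_in(1, ("x",))       # false
isequal_in(NaN, [1.0, NaN]) # true

# For comparison, the built-in `in` uses == here in current Julia.
NaN in (NaN,)               # false
```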

kshyatt added the deprecation label Jan 8, 2017
@nalimilan
Member Author

Doesn't look like this PR has a chance of being merged.

nalimilan closed this May 20, 2017
nalimilan deleted the nl/notcomparable branch May 20, 2017 16:55