[SPARK-14051][SQL] Implement Double.NaN==Float.NaN for consistency.
#11868
Conversation
Double.NaN==Float.NaN in row.equals for consistency.
Test build #53726 has finished for PR 11868 at commit
What do we do for hash code?
Oh, thank you for pointing that out. I missed that part. Let me check that again. I guess we can change
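(A hedged aside on why the hash-code question matters: if Row.equals treats Double.NaN and Float.NaN as equal, the corresponding hash codes also have to agree, and the default boxed hash codes do not. The snippet below is an illustration, not code from this PR.)

// Default boxed hash codes of NaN differ between Float and Double (illustration only, not PR code).
object NaNHashCodes extends App {
  val dHash = java.lang.Double.valueOf(Double.NaN).hashCode   // derived from doubleToLongBits(NaN)
  val fHash = java.lang.Float.valueOf(Float.NaN).hashCode     // derived from floatToIntBits(NaN)
  println(dHash == fHash)                                     // false: the defaults disagree
}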
Title changed from "Double.NaN==Float.NaN in row.equals for consistency." to "Double.NaN==Float.NaN for consistency."
Hi, @rxin. Thank you again! I made a big mistake in this PR, and I have now fixed it based on your advice. Now, the following are true. Also, I applied this to
I spent a bit of time on this -- I'm actually not sure we want to change this anymore, because Scala itself doesn't do this and users can always screw up if they do field comparison themselves.
That's because Scala follows the standard Java and IEEE floating-point semantics. I also know that NaN == NaN is always false, even between two NaNs, in Java/Scala.
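(For reference, a minimal Scala sketch of those language-level semantics; the expected results in the comments follow directly from Java/IEEE 754 behavior, not from Spark code.)

// Minimal sketch of standard Java/Scala NaN semantics (not code from this PR).
object NaNLanguageSemantics extends App {
  // In the language itself, NaN never equals anything, not even itself.
  println(Double.NaN == Double.NaN)                // false
  println(Float.NaN == Double.NaN)                 // false (the Float is widened, still NaN)

  // The reliable way to detect NaN is isNaN.
  println(java.lang.Double.isNaN(Double.NaN))      // true
  println(java.lang.Float.isNaN(Float.NaN))        // true

  // Boxed equality, however, does treat NaN as equal to itself.
  println(java.lang.Double.valueOf(Double.NaN).equals(java.lang.Double.valueOf(Double.NaN)))  // true
}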
For example, Oracle treats NaN as greater than all other values and evaluates NaN as equal to NaN.
Yeah, but if they do row1.getFloat(1) == row2.getDouble(2), it'd ...
IBM DB2 also says "From an SQL perspective, infinity = infinity, NaN = NaN, and sNaN = sNaN."
Oh, I see the point here now. @rxin, may I explain a little bit more? Mathematically,
However, Spark
As you can easily guess, the following are still false.
I think this PR makes Spark users less confused by completing the missing part of
Test build #53799 has finished for PR 11868 at commit
Test build #54055 has finished for PR 11868 at commit
Test build #54124 has finished for PR 11868 at commit
}
case f1: Float if java.lang.Float.isNaN(f1) =>
  if (!o2.isInstanceOf[Float] || !java.lang.Float.isNaN(o2.asInstanceOf[Float])) {
  if (!(o2.isInstanceOf[Float] && java.lang.Float.isNaN(o2.asInstanceOf[Float]) ||
Hm, in general NaN never equals NaN. There might be some reason to treat it differently here. On the one hand I tend to agree with this change anyway, on the grounds that it implements something like automatic promotion in Scala/Java. But clearly we're already not implementing the language's semantics, and trying to achieve something more like bitwise-equal semantics. In that case this wouldn't quite be right. Ask the author of the original change why it was made?
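(A small sketch contrasting the flavors of equality discussed here; it is an illustration of the semantics, not the code under review.)

// Sketch contrasting the equality semantics discussed above (illustration only, not PR code).
object NaNEqualityFlavors extends App {
  val fNaN = Float.NaN
  val dNaN = Double.NaN

  // 1. Language/IEEE semantics: NaN never equals NaN.
  println(fNaN == dNaN)                                        // false

  // 2. Bitwise-equal semantics within one type (what boxed equals/hashCode build on).
  println(java.lang.Double.doubleToLongBits(dNaN) ==
          java.lang.Double.doubleToLongBits(dNaN))             // true

  // 3. The cross-type behavior this PR proposes for Row.equals:
  //    widen the Float to Double, then treat NaN as equal to NaN.
  println(java.lang.Double.isNaN(fNaN.toDouble) && java.lang.Double.isNaN(dNaN))  // true
}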
Thank you for your attention, @srowen. I agree with you and @rxin from the viewpoint of the current Spark status.
Today, while reading the Spark SQL, DataFrames and Datasets Guide: NaN Semantics, I suddenly wanted to update this PR.
NaN Semantics
There is special handling for not-a-number (NaN) when dealing with float or double types that does not exactly match standard floating-point semantics. Specifically:
- NaN = NaN returns true.
I'm still digging to find some useful cases for this PR outside the SQL layer.
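(A hedged SQL-layer sketch of the behavior quoted above; it assumes an already-created SparkSession named spark and is an illustration, not a test from this PR.)

// Illustration of the documented SQL-layer NaN semantics (assumes `spark` exists).
val nanEquality = spark.sql(
  """SELECT CAST('NaN' AS DOUBLE) = CAST('NaN' AS DOUBLE) AS d_eq,
    |       CAST('NaN' AS FLOAT)  = CAST('NaN' AS FLOAT)  AS f_eq""".stripMargin)
nanEquality.show()
// Per the guide's NaN semantics, both columns are expected to be true,
// even though the same comparisons are false in plain Scala/Java.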
Test build #55494 has finished for PR 11868 at commit
Hi, @JoshRosen. According to the initial commit in
Yeah, unfortunately the behavior is already pretty weird, and I'm not sure adding this would actually make it less weird, so I'm in favor of just not doing anything here.
Sure. As I mentioned today, I'm going to close this PR since Spark doesn't want this.
It's just for the record.
What changes were proposed in this pull request?
Since SPARK-9079 and SPARK-9145, NaN = NaN returns true and works well. The only exception is a direct comparison between Row(Float.NaN) and Row(Double.NaN). The following is an example: the last two expressions should be true and List([NaN]) for consistency. Please note the following background truths as of today (before this PR).
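(The sketch below reproduces the case described above, using the public org.apache.spark.sql.Row factory; the expected results in the comments follow the description in this PR rather than a fresh run.)

import org.apache.spark.sql.Row

// Same-type NaN comparisons already work (SPARK-9079 / SPARK-9145).
println(Row(Double.NaN) == Row(Double.NaN))   // true
println(Row(Float.NaN)  == Row(Float.NaN))    // true

// The remaining inconsistency this PR targets: cross-type NaN comparison.
println(Row(Float.NaN)  == Row(Double.NaN))   // false before this PR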
How was this patch tested?
Pass the Jenkins tests, including new test cases.