[SPARK-32688][SQL][TEST] Add special values to LiteralGenerator for float and double #29515
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/LiteralGenerator.scala
cc @cloud-fan, this should be relevant to you
ok to test
    f <- Gen.oneOf(
      Gen.oneOf(
        Float.NaN, Float.PositiveInfinity, Float.NegativeInfinity, Float.MinPositiveValue,
        0.0f, -0.0f, 1.0f, -1.0f),
Are `1.0f` and `-1.0f` also special values?
They aren't special in the sense that `Arbitrary.arbFloat.arbitrary` can already generate them, but they are in the sense that a function is more likely to behave unexpectedly at these values. `log1p` is one example.
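To illustrate why `-1.0` is worth generating explicitly, here is a minimal sketch using only `scala.math.log1p` from the standard library. The object and value names are hypothetical and are not part of the PR.

```scala
// Illustrative probe, not code from the PR: log1p(x) computes ln(1 + x),
// so x = -1.0 sits exactly on the boundary of its domain.
object SpecialValueProbe {
  // ln(0): the result is negative infinity rather than a finite number.
  val atMinusOne: Double = math.log1p(-1.0)

  // ln(-1): outside the domain, so the result is NaN.
  val belowMinusOne: Double = math.log1p(-2.0)

  // An ordinary input for comparison: ln(2).
  val atOne: Double = math.log1p(1.0)
}
```

A purely random generator over the full float range is unlikely to land on exactly `-1.0`, which is why adding it to the list of special values makes boundary behavior like this much more likely to be exercised.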
Could you leave some comments in the code?
Sure
also cc: @srowen
Test build #127791 has finished for PR 29515 at commit
This failure was discovered by this change, not caused by it. Should it be fixed in a separate pull request? I'm not sure which of the two behaviors is correct.
We can't commit a change that causes tests to fail, of course. The test fix would have to go together with the fix in the underlying code, as needed.
retest this please
I meant that it could be fixed in another PR before this one is merged.
Test build #127794 has finished for PR 29515 at commit
Could you file a JIRA and fix it?
There is a related issue with -0.0: https://issues.apache.org/jira/browse/SPARK-32110
@@ -793,7 +793,9 @@ case class EqualTo(left: Expression, right: Expression)
   // | FALSE | FALSE | TRUE | UNKNOWN |
   // | UNKNOWN | UNKNOWN | UNKNOWN | UNKNOWN |
   // +---------+---------+---------+---------+
-  protected override def nullSafeEval(left: Any, right: Any): Any = ordering.equiv(left, right)
+  protected override def nullSafeEval(left: Any, right: Any): Any = {
+    left == right || ordering.equiv(left, right)
At the very least, we should fix the existing test failures that we found in this PR. But this fix looks improper, so could we use `NormalizeNaNAndZero` instead? cc: @cloud-fan @viirya
`NormalizeNaNAndZero` can't help here, because `checkConsistencyBetweenInterpretedAndCodegen` runs without the optimizer.
It could also introduce new correctness issues with `atan2(-0.0, x)` and `1.0 / -0.0`.
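The concern about `atan2(-0.0, x)` and `1.0 / -0.0` can be sketched with plain Scala doubles. The `normalizeZero` helper below is a hypothetical stand-in for any rewrite that collapses `-0.0` into `0.0`; it is not Spark's `NormalizeNaNAndZero`, whose exact behavior is not shown in this thread.

```scala
// Illustrative only: shows that the sign of zero is observable,
// so collapsing -0.0 into 0.0 before these operations changes results.
object SignedZeroEffects {
  // Hypothetical stand-in for a normalization step (NOT Spark's implementation).
  // -0.0 == 0.0 is true under IEEE 754, so both zeros map to +0.0.
  def normalizeZero(d: Double): Double = if (d == 0.0) 0.0 else d

  // atan2 distinguishes the two zeros when the second argument is negative.
  val atan2PosZero: Double = math.atan2(0.0, -1.0)   // pi
  val atan2NegZero: Double = math.atan2(-0.0, -1.0)  // -pi

  // Division by zero yields an infinity whose sign follows the zero.
  val divPosZero: Double = 1.0 / 0.0
  val divNegZero: Double = 1.0 / -0.0

  // After normalization the negative-zero inputs behave like positive zero.
  val atan2Normalized: Double = math.atan2(normalizeZero(-0.0), -1.0)
  val divNormalized: Double = 1.0 / normalizeZero(-0.0)
}
```

Under these assumptions, normalizing inputs would flip `atan2` from `-pi` to `pi` and the division from negative to positive infinity, which is the kind of new correctness issue the comment warns about.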
I assumed that `0.0 == -0.0` is the expected behavior, but if it is not, then we could leave this as it was and change the codegen path instead.
But how about the case `array(-0.0) == array(0.0)`?
You are 100% correct.
It is an interesting problem: the same comparator is used for both sorting and the equality check.
For sorting, `-0.0` should be smaller than `0.0`, but in an equality check they should be equal.
Just for reference, it seems that both Hive and MySQL consider them equal in the equality check:
https://issues.apache.org/jira/browse/HIVE-11174
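The tension between sorting and equality can be reproduced on primitive doubles. This sketch assumes an ordering that behaves like `java.lang.Double.compare`; whether Spark's internal ordering matches it exactly is an assumption. `equalForSql` and `arraysEqual` are hypothetical names that mirror the shape of the diff above rather than reproduce Spark's code.

```scala
// Illustrative only: one comparator cannot serve both sorting and equality
// for signed zeros without an extra check.
object ZeroComparison {
  // IEEE 754 equality treats the two zeros as equal.
  val primitiveEq: Boolean = 0.0 == -0.0

  // A total order suitable for sorting places -0.0 strictly before 0.0.
  val totalOrder: Int = java.lang.Double.compare(-0.0, 0.0)

  // Same shape as `left == right || ordering.equiv(left, right)`:
  // the first clause makes the zeros equal, the second keeps NaN equal to NaN.
  def equalForSql(l: Double, r: Double): Boolean =
    l == r || java.lang.Double.compare(l, r) == 0

  // The scalar check does not automatically cover nested values such as
  // arrays; those need the same rule applied element by element.
  def arraysEqual(a: Array[Double], b: Array[Double]): Boolean =
    a.length == b.length && a.zip(b).forall { case (x, y) => equalForSql(x, y) }
}
```

The array helper shows why the `array(-0.0) == array(0.0)` question matters: a fix applied only at the top-level scalar comparison would leave nested comparisons on the sorting comparator.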
Test build #127808 has finished for PR 29515 at commit
Test build #127807 has finished for PR 29515 at commit
Test build #127816 has finished for PR 29515 at commit
There is a
This is a good catch! Let's fix the found bugs one by one and merge this PR at the end. @tanelk is this the only bug we found so far? https://issues.apache.org/jira/browse/SPARK-32110
Currently yes, the
Test build #127840 has finished for PR 29515 at commit
This is a new finding. But also related to the
Test build #128299 has finished for PR 29515 at commit
@cloud-fan, now that #29647 is merged, can this be merged as well?
Have all the bugs that this PR found been fixed now?
retest this please
I believe that they were the manifestation of the
Test build #128688 has finished for PR 29515 at commit
retest this please
cc: @cloud-fan
Test build #128707 has finished for PR 29515 at commit
Hmm, does this PR catch https://issues.apache.org/jira/browse/SPARK-32110?
That's more of an umbrella JIRA that mainly covers inconsistencies between operators. Just to be safe, we could trigger some retests on this.
retest this please
Test build #128717 has finished for PR 29515 at commit
…loat and double

### What changes were proposed in this pull request?

The `LiteralGenerator` for float and double datatypes was supposed to yield special values (NaN, +-inf) among others, but the `Gen.chooseNum` method does not yield values that are outside the defined range. The `Gen.chooseNum` for a wide range of floats and doubles does not yield values in the "everyday" range, as stated in typelevel/scalacheck#113.

There is a similar class, `RandomDataGenerator`, that is used in some other tests. Added `-0.0` and `-0.0f` as special values there too.

These changes revealed an inconsistency with the equality check between `-0.0` and `0.0`.

### Why are the changes needed?

The `LiteralGenerator` is mostly used in the `checkConsistencyBetweenInterpretedAndCodegen` method in `MathExpressionsSuite`. This change would have caught the bug fixed in #29495.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Locally reverted #29495 and verified that the existing test cases caught the bug.

Closes #29515 from tanelk/SPARK-32688.

Authored-by: Tanel Kiis <tanel.kiis@gmail.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
(cherry picked from commit 6051755)
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
Merged to master/3.0. I will keep watching the Jenkins jobs to check that no flaky tests appear. Anyway, thanks, @tanelk!
### What changes were proposed in this pull request?

The `LiteralGenerator` for float and double datatypes was supposed to yield special values (NaN, +-inf) among others, but the `Gen.chooseNum` method does not yield values that are outside the defined range. The `Gen.chooseNum` for a wide range of floats and doubles does not yield values in the "everyday" range, as stated in typelevel/scalacheck#113.

There is a similar class, `RandomDataGenerator`, that is used in some other tests. Added `-0.0` and `-0.0f` as special values there too.

These changes revealed an inconsistency with the equality check between `-0.0` and `0.0`.

### Why are the changes needed?

The `LiteralGenerator` is mostly used in the `checkConsistencyBetweenInterpretedAndCodegen` method in `MathExpressionsSuite`. This change would have caught the bug fixed in #29495.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Locally reverted #29495 and verified that the existing test cases caught the bug.
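
The idea of mixing explicit special values into a random generator can be sketched without ScalaCheck, using only `scala.util.Random`. Everything here is an assumption made for illustration: the object name, the `specialProbability` parameter, and the choice of an "everyday" range of ±1000 are hypothetical, and this is not how `LiteralGenerator` is implemented.

```scala
import scala.util.Random

// Illustrative sketch only: a generator that returns a hand-picked special
// value some of the time and an "everyday" value otherwise.
object SpecialDoubleGen {
  // Same kinds of values as the list discussed in the review, for Double.
  val specialDoubles: Vector[Double] = Vector(
    Double.NaN, Double.PositiveInfinity, Double.NegativeInfinity,
    Double.MinPositiveValue, 0.0, -0.0, 1.0, -1.0)

  // specialProbability is a hypothetical knob; rng.nextDouble() lies in [0, 1),
  // so 1.0 always picks a special value and 0.0 never does.
  def nextDouble(rng: Random, specialProbability: Double = 0.2): Double =
    if (rng.nextDouble() < specialProbability) {
      specialDoubles(rng.nextInt(specialDoubles.length))
    } else {
      // Assumed "everyday" range: uniformly distributed in [-1000, 1000).
      (rng.nextDouble() * 2.0 - 1.0) * 1000.0
    }
}
```

For example, `SpecialDoubleGen.nextDouble(new Random(42))` draws one value. Keeping the special values in an explicit list guarantees they appear regularly, instead of relying on a range-based generator to hit them by chance.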