[SPARK-32688][SQL][TEST] Add special values to LiteralGenerator for float and double #29515

tanelk · 2020-08-22T17:01:51Z

What changes were proposed in this pull request?

The LiteralGenerator for float and double datatypes was supposed to yield special values (NaN, +-inf) among others, but the Gen.chooseNum method does not yield values that are outside the defined range. The Gen.chooseNum for a wide range of floats and doubles does not yield values in the "everyday" range as stated in typelevel/scalacheck#113 .

There is an similar class RandomDataGenerator that is used in some other tests. Added -0.0 and -0.0f as special values to there too.

These changes revealed an inconsistency with the equality check between -0.0 and 0.0.

Why are the changes needed?

The LiteralGenerator is mostly used in the checkConsistencyBetweenInterpretedAndCodegen method in MathExpressionsSuite. This change would have caught the bug fixed in #29495 .

Does this PR introduce any user-facing change?

No

How was this patch tested?

Locally reverted #29495 and verified that the existing test cases caught the bug.

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/LiteralGenerator.scala

tanelk · 2020-08-22T17:08:42Z

cc @cloud-fan , this should be relevant to you

maropu · 2020-08-22T23:14:21Z

ok to test

maropu · 2020-08-22T23:15:04Z

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/LiteralGenerator.scala

+      f <- Gen.oneOf(
+        Gen.oneOf(
+          Float.NaN, Float.PositiveInfinity, Float.NegativeInfinity, Float.MinPositiveValue,
+          0.0f, -0.0f, 1.0f, -1.0f),


Are 1.0f and -1.0f also special values?

They aren't in the sense, that Arbitrary.arbFloat.arbitrary can generate them, but they are in the sense, that it is more likely, that a function could act weirdly at these values. For example log1p.

Could you leave some comments in the code?

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/LiteralGenerator.scala

maropu · 2020-08-22T23:17:29Z

also cc: @srowen

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/LiteralGenerator.scala

SparkQA · 2020-08-23T01:18:45Z

Test build #127791 has finished for PR 29515 at commit 8c8313c.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

tanelk · 2020-08-23T01:50:32Z

Test build #127791 has finished for PR 29515 at commit 8c8313c.

This patch fails Spark unit tests.

This patch merges cleanly.

This patch adds no public classes.

This failure is discovered by this change, not caused. Should that be fixed by a separate pull request? Not sure which of the two is the correct behavior.
There could be more like this, but due to the random nature of generated values it might not show every build.

srowen · 2020-08-23T01:54:47Z

I think we can't commit a change that causes tests to fail of course. The fix of the tests would have to go with the fix in underlying code as needed.

tanelk · 2020-08-23T02:02:27Z

retest this please

tanelk · 2020-08-23T02:04:01Z

I think we can't commit a change that causes tests to fail of course. The fix of the tests would have to go with the fix in underlying code as needed.

I meant, that it could be fixed in another PR, before this PR is merged

SparkQA · 2020-08-23T06:26:58Z

Test build #127794 has finished for PR 29515 at commit 77d63ac.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

maropu · 2020-08-23T06:31:44Z

I meant, that it could be fixed in another PR, before this PR is merged

Could you file jira and fix it?

tanelk · 2020-08-23T09:26:14Z

There is a related issue with -0.0: https://issues.apache.org/jira/browse/SPARK-32110

maropu · 2020-08-23T10:12:02Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala

@@ -793,7 +793,9 @@ case class EqualTo(left: Expression, right: Expression)
  // | FALSE   | FALSE   | TRUE    | UNKNOWN |
  // | UNKNOWN | UNKNOWN | UNKNOWN | UNKNOWN |
  // +---------+---------+---------+---------+
-  protected override def nullSafeEval(left: Any, right: Any): Any = ordering.equiv(left, right)
+  protected override def nullSafeEval(left: Any, right: Any): Any = {
+    left == right || ordering.equiv(left, right)


At least, we shoud fix the existing test failures that we found in this PR. But, this fix looks improper, so could we use NormalizeNaNAndZero instead? cc: @cloud-fan @viirya

NormalizeNaNAndZero can't help here, because checkConsistencyBetweenInterpretedAndCodegen is done without optimizers.

Also it could introduce new correctness issues with atan2(-0.0, x) and 1.0 / -0.0.

I assumed that 0.0 == -0.0 is the expected behavior, but if it is not, then we could leave this as it was change the code gen path.

But, how about the case array(-0.0) == array(0.0)?

You are 100% correct.

It is an interesting problem, where the same comparator is used for both sorting and equality check.
For sorting -0.0 should be smaller than 0.0, but in equality check they should be equal.

Just for reference, it seems that both hive and mysql consider them equal in the equality check:
https://issues.apache.org/jira/browse/HIVE-11174

SparkQA · 2020-08-23T13:32:19Z

Test build #127808 has finished for PR 29515 at commit 1116d3d.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-08-23T14:41:06Z

Test build #127807 has finished for PR 29515 at commit 818abe2.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-08-24T00:06:16Z

Test build #127816 has finished for PR 29515 at commit 2dce36e.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

tanelk · 2020-08-24T05:16:58Z

There is a org.apache.spark.sql.RandomDataGenerator, that does pretty much the same thing as the LiteralGenerator. Perhaps they should be unified?

cloud-fan · 2020-08-24T09:22:08Z

This is a good catch! Let's fix the found bugs one by one and merge this PR at the end.

@tanelk is this the only bug we found so far? https://issues.apache.org/jira/browse/SPARK-32110

tanelk · 2020-08-24T09:32:06Z

This is a good catch! Let's fix the found bugs one by one and merge this PR at the end.

@tanelk is this the only bug we found so far? https://issues.apache.org/jira/browse/SPARK-32110

Currently yes, the -0.0 inconsistency is the only bug.
I would add to the jira that there is also inconsistency in the code gen and the interpreted code paths: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127791/testReport/org.apache.spark.sql.catalyst.expressions/PredicateSuite/AND__OR__EqualTo__EqualNullSafe_consistency_check/

SparkQA · 2020-08-24T10:28:44Z

Test build #127840 has finished for PR 29515 at commit 9a3b983.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

tanelk · 2020-08-24T12:54:36Z

Test build #127840 has finished for PR 29515 at commit 9a3b983.

This patch fails Spark unit tests.

This patch merges cleanly.

This patch adds no public classes.

org.scalatest.exceptions.TestFailedException: Incorrect evaluation (codegen off): [false,0.0] IN ([true,Infinity],[true,7.328233268583497E27],[false,-0.0],[true,2.28620591956394208E17],[true,-1.7155423374193937E-160],[null,7.853276721068246E-75],[null,Infinity],[true,null],[false,null]), actual: false, expected: true

This is a new finding. But also related to the 0.0 == -0.0.

SparkQA · 2020-09-04T16:39:55Z

Test build #128299 has finished for PR 29515 at commit 529aba8.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

tanelk · 2020-09-11T16:04:26Z

@cloud-fan, now that #29647 is merged, can this be merged also?

maropu · 2020-09-15T01:44:09Z

@cloud-fan, now that #29647 is merged, can this be merged also?

Are all the bugs that this PR found already fixed now?

maropu · 2020-09-15T01:45:18Z

retest this please

tanelk · 2020-09-15T02:57:57Z

@cloud-fan, now that #29647 is merged, can this be merged also?

Are all the bugs that this PR found already fixed now?

I believe, that they were the manifestation of the 0.0 != -0.0

SparkQA · 2020-09-15T07:05:03Z

Test build #128688 has finished for PR 29515 at commit 529aba8.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

maropu · 2020-09-15T07:50:17Z

retest this please

maropu

cc: @cloud-fan

SparkQA · 2020-09-15T12:57:25Z

Test build #128707 has finished for PR 29515 at commit 529aba8.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2020-09-15T13:48:51Z

hmm, does this PR catch https://issues.apache.org/jira/browse/SPARK-32110 ?

tanelk · 2020-09-15T14:23:11Z

hmm, does this PR catch https://issues.apache.org/jira/browse/SPARK-32110 ?

That's more of an umbrella jira, that mainly covers inconsistency between operators.
Here we uncovered an inconsistency within the equality operators codegen and non-codegen paths.

Just to be safe, we could trigger some retests on this.

maropu · 2020-09-15T14:32:52Z

retest this please

SparkQA · 2020-09-15T19:31:59Z

Test build #128717 has finished for PR 29515 at commit 529aba8.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

…loat and double ### What changes were proposed in this pull request? The `LiteralGenerator` for float and double datatypes was supposed to yield special values (NaN, +-inf) among others, but the `Gen.chooseNum` method does not yield values that are outside the defined range. The `Gen.chooseNum` for a wide range of floats and doubles does not yield values in the "everyday" range as stated in typelevel/scalacheck#113 . There is an similar class `RandomDataGenerator` that is used in some other tests. Added `-0.0` and `-0.0f` as special values to there too. These changes revealed an inconsistency with the equality check between `-0.0` and `0.0`. ### Why are the changes needed? The `LiteralGenerator` is mostly used in the `checkConsistencyBetweenInterpretedAndCodegen` method in `MathExpressionsSuite`. This change would have caught the bug fixed in #29495 . ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Locally reverted #29495 and verified that the existing test cases caught the bug. Closes #29515 from tanelk/SPARK-32688. Authored-by: Tanel Kiis <tanel.kiis@gmail.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org> (cherry picked from commit 6051755) Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>

maropu · 2020-09-16T03:15:21Z

Merged to master/3.0. To check if no flaky tests happens, I will keep watching Jenkins jobs. Anyway, thanks, @tanelk !

…loat and double ### What changes were proposed in this pull request? The `LiteralGenerator` for float and double datatypes was supposed to yield special values (NaN, +-inf) among others, but the `Gen.chooseNum` method does not yield values that are outside the defined range. The `Gen.chooseNum` for a wide range of floats and doubles does not yield values in the "everyday" range as stated in typelevel/scalacheck#113 . There is an similar class `RandomDataGenerator` that is used in some other tests. Added `-0.0` and `-0.0f` as special values to there too. These changes revealed an inconsistency with the equality check between `-0.0` and `0.0`. ### Why are the changes needed? The `LiteralGenerator` is mostly used in the `checkConsistencyBetweenInterpretedAndCodegen` method in `MathExpressionsSuite`. This change would have caught the bug fixed in apache#29495 . ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Locally reverted apache#29495 and verified that the existing test cases caught the bug. Closes apache#29515 from tanelk/SPARK-32688. Authored-by: Tanel Kiis <tanel.kiis@gmail.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org> (cherry picked from commit 6051755) Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>

Add special values to LiteralGenerator for float and double

8c8313c

probot-autolabeler bot added the SQL label Aug 22, 2020

tanelk commented Aug 22, 2020

View reviewed changes

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/LiteralGenerator.scala Outdated Show resolved Hide resolved

maropu changed the title ~~[SPARK-32688][TESTS] Add special values to LiteralGenerator for float and double~~ [SPARK-32688][SQL][TESTS] Add special values to LiteralGenerator for float and double Aug 22, 2020

maropu reviewed Aug 22, 2020

View reviewed changes

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/LiteralGenerator.scala Show resolved Hide resolved

maropu reviewed Aug 22, 2020

View reviewed changes

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/LiteralGenerator.scala Show resolved Hide resolved

srowen reviewed Aug 22, 2020

View reviewed changes

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/LiteralGenerator.scala Show resolved Hide resolved

Add 'MaxValue' to LiteralGenerator for float and double

77d63ac

Ensure -0.0 and 0.0 comparison consistency

818abe2

maropu reviewed Aug 23, 2020

View reviewed changes

Revert equality check

1116d3d

tanelk changed the title ~~[SPARK-32688][SQL][TESTS] Add special values to LiteralGenerator for float and double~~ [WIP][SPARK-32688][SQL][TESTS] Add special values to LiteralGenerator for float and double Aug 23, 2020

test -0.0 equality for complex types

2dce36e

tanelk changed the title ~~[WIP][SPARK-32688][SQL][TESTS] Add special values to LiteralGenerator for float and double~~ [SPARK-32688][SQL][TESTS] Add special values to LiteralGenerator for float and double Aug 24, 2020

Add -0.0 as special value to RandomDataGenerator

9a3b983

tanelk added 2 commits September 4, 2020 13:35

revert test -0.0 equality for complex types

d20a2ed

revert test -0.0 equality for complex types

529aba8

maropu approved these changes Sep 15, 2020

View reviewed changes

maropu changed the title ~~[SPARK-32688][SQL][TESTS] Add special values to LiteralGenerator for float and double~~ [SPARK-32688][SQL][TEST] Add special values to LiteralGenerator for float and double Sep 15, 2020

cloud-fan approved these changes Sep 15, 2020

View reviewed changes

maropu closed this in 6051755 Sep 16, 2020

[SPARK-32688][SQL][TEST] Add special values to LiteralGenerator for float and double #29515

[SPARK-32688][SQL][TEST] Add special values to LiteralGenerator for float and double #29515

Conversation

tanelk commented Aug 22, 2020 • edited Loading

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

tanelk commented Aug 22, 2020

maropu commented Aug 22, 2020

maropu Aug 22, 2020

Choose a reason for hiding this comment

tanelk Aug 23, 2020

Choose a reason for hiding this comment

maropu Aug 23, 2020

Choose a reason for hiding this comment

tanelk Aug 23, 2020

Choose a reason for hiding this comment

maropu commented Aug 22, 2020

SparkQA commented Aug 23, 2020

tanelk commented Aug 23, 2020

srowen commented Aug 23, 2020

tanelk commented Aug 23, 2020

tanelk commented Aug 23, 2020 • edited Loading

SparkQA commented Aug 23, 2020

maropu commented Aug 23, 2020

tanelk commented Aug 23, 2020

maropu Aug 23, 2020

Choose a reason for hiding this comment

tanelk Aug 23, 2020 • edited Loading

Choose a reason for hiding this comment

tanelk Aug 23, 2020

Choose a reason for hiding this comment

maropu Aug 23, 2020 • edited Loading

Choose a reason for hiding this comment

tanelk Aug 23, 2020

Choose a reason for hiding this comment

SparkQA commented Aug 23, 2020

SparkQA commented Aug 23, 2020

SparkQA commented Aug 24, 2020

tanelk commented Aug 24, 2020

cloud-fan commented Aug 24, 2020

tanelk commented Aug 24, 2020

SparkQA commented Aug 24, 2020

tanelk commented Aug 24, 2020

SparkQA commented Sep 4, 2020

tanelk commented Sep 11, 2020

maropu commented Sep 15, 2020

maropu commented Sep 15, 2020

tanelk commented Sep 15, 2020

SparkQA commented Sep 15, 2020

maropu commented Sep 15, 2020

maropu left a comment

Choose a reason for hiding this comment

SparkQA commented Sep 15, 2020

cloud-fan commented Sep 15, 2020

tanelk commented Sep 15, 2020 • edited Loading

maropu commented Sep 15, 2020

SparkQA commented Sep 15, 2020

maropu commented Sep 16, 2020

tanelk commented Aug 22, 2020 •

edited

Loading

tanelk commented Aug 23, 2020 •

edited

Loading

tanelk Aug 23, 2020 •

edited

Loading

maropu Aug 23, 2020 •

edited

Loading

tanelk commented Sep 15, 2020 •

edited

Loading