Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-32688][SQL][TEST] Add special values to LiteralGenerator for float and double #29515

Closed
wants to merge 8 commits into from

Conversation

tanelk
Copy link
Contributor

@tanelk tanelk commented Aug 22, 2020

What changes were proposed in this pull request?

The LiteralGenerator for float and double datatypes was supposed to yield special values (NaN, +-inf) among others, but the Gen.chooseNum method does not yield values that are outside the defined range. The Gen.chooseNum for a wide range of floats and doubles does not yield values in the "everyday" range as stated in typelevel/scalacheck#113 .

There is an similar class RandomDataGenerator that is used in some other tests. Added -0.0 and -0.0f as special values to there too.

These changes revealed an inconsistency with the equality check between -0.0 and 0.0.

Why are the changes needed?

The LiteralGenerator is mostly used in the checkConsistencyBetweenInterpretedAndCodegen method in MathExpressionsSuite. This change would have caught the bug fixed in #29495 .

Does this PR introduce any user-facing change?

No

How was this patch tested?

Locally reverted #29495 and verified that the existing test cases caught the bug.

@tanelk
Copy link
Contributor Author

tanelk commented Aug 22, 2020

cc @cloud-fan , this should be relevant to you

@maropu maropu changed the title [SPARK-32688][TESTS] Add special values to LiteralGenerator for float and double [SPARK-32688][SQL][TESTS] Add special values to LiteralGenerator for float and double Aug 22, 2020
@maropu
Copy link
Member

maropu commented Aug 22, 2020

ok to test

f <- Gen.oneOf(
Gen.oneOf(
Float.NaN, Float.PositiveInfinity, Float.NegativeInfinity, Float.MinPositiveValue,
0.0f, -0.0f, 1.0f, -1.0f),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are 1.0f and -1.0f also special values?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They aren't in the sense, that Arbitrary.arbFloat.arbitrary can generate them, but they are in the sense, that it is more likely, that a function could act weirdly at these values. For example log1p.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you leave some comments in the code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

@maropu
Copy link
Member

maropu commented Aug 22, 2020

also cc: @srowen

@SparkQA
Copy link

SparkQA commented Aug 23, 2020

Test build #127791 has finished for PR 29515 at commit 8c8313c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tanelk
Copy link
Contributor Author

tanelk commented Aug 23, 2020

Test build #127791 has finished for PR 29515 at commit 8c8313c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

This failure is discovered by this change, not caused. Should that be fixed by a separate pull request? Not sure which of the two is the correct behavior.
There could be more like this, but due to the random nature of generated values it might not show every build.

@srowen
Copy link
Member

srowen commented Aug 23, 2020

I think we can't commit a change that causes tests to fail of course. The fix of the tests would have to go with the fix in underlying code as needed.

@tanelk
Copy link
Contributor Author

tanelk commented Aug 23, 2020

retest this please

@tanelk
Copy link
Contributor Author

tanelk commented Aug 23, 2020

I think we can't commit a change that causes tests to fail of course. The fix of the tests would have to go with the fix in underlying code as needed.

I meant, that it could be fixed in another PR, before this PR is merged

@SparkQA
Copy link

SparkQA commented Aug 23, 2020

Test build #127794 has finished for PR 29515 at commit 77d63ac.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member

maropu commented Aug 23, 2020

I meant, that it could be fixed in another PR, before this PR is merged

Could you file jira and fix it?

@tanelk
Copy link
Contributor Author

tanelk commented Aug 23, 2020

There is a related issue with -0.0: https://issues.apache.org/jira/browse/SPARK-32110

@@ -793,7 +793,9 @@ case class EqualTo(left: Expression, right: Expression)
// | FALSE | FALSE | TRUE | UNKNOWN |
// | UNKNOWN | UNKNOWN | UNKNOWN | UNKNOWN |
// +---------+---------+---------+---------+
protected override def nullSafeEval(left: Any, right: Any): Any = ordering.equiv(left, right)
protected override def nullSafeEval(left: Any, right: Any): Any = {
left == right || ordering.equiv(left, right)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least, we shoud fix the existing test failures that we found in this PR. But, this fix looks improper, so could we use NormalizeNaNAndZero instead? cc: @cloud-fan @viirya

Copy link
Contributor Author

@tanelk tanelk Aug 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NormalizeNaNAndZero can't help here, because checkConsistencyBetweenInterpretedAndCodegen is done without optimizers.

Also it could introduce new correctness issues with atan2(-0.0, x) and 1.0 / -0.0.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assumed that 0.0 == -0.0 is the expected behavior, but if it is not, then we could leave this as it was change the code gen path.

Copy link
Member

@maropu maropu Aug 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But, how about the case array(-0.0) == array(0.0)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are 100% correct.

It is an interesting problem, where the same comparator is used for both sorting and equality check.
For sorting -0.0 should be smaller than 0.0, but in equality check they should be equal.

Just for reference, it seems that both hive and mysql consider them equal in the equality check:
https://issues.apache.org/jira/browse/HIVE-11174

@tanelk tanelk changed the title [SPARK-32688][SQL][TESTS] Add special values to LiteralGenerator for float and double [WIP][SPARK-32688][SQL][TESTS] Add special values to LiteralGenerator for float and double Aug 23, 2020
@SparkQA
Copy link

SparkQA commented Aug 23, 2020

Test build #127808 has finished for PR 29515 at commit 1116d3d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 23, 2020

Test build #127807 has finished for PR 29515 at commit 818abe2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 24, 2020

Test build #127816 has finished for PR 29515 at commit 2dce36e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tanelk
Copy link
Contributor Author

tanelk commented Aug 24, 2020

There is a org.apache.spark.sql.RandomDataGenerator, that does pretty much the same thing as the LiteralGenerator. Perhaps they should be unified?

@cloud-fan
Copy link
Contributor

This is a good catch! Let's fix the found bugs one by one and merge this PR at the end.

@tanelk is this the only bug we found so far? https://issues.apache.org/jira/browse/SPARK-32110

@tanelk
Copy link
Contributor Author

tanelk commented Aug 24, 2020

This is a good catch! Let's fix the found bugs one by one and merge this PR at the end.

@tanelk is this the only bug we found so far? https://issues.apache.org/jira/browse/SPARK-32110

Currently yes, the -0.0 inconsistency is the only bug.
I would add to the jira that there is also inconsistency in the code gen and the interpreted code paths: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127791/testReport/org.apache.spark.sql.catalyst.expressions/PredicateSuite/AND__OR__EqualTo__EqualNullSafe_consistency_check/

@tanelk tanelk changed the title [WIP][SPARK-32688][SQL][TESTS] Add special values to LiteralGenerator for float and double [SPARK-32688][SQL][TESTS] Add special values to LiteralGenerator for float and double Aug 24, 2020
@SparkQA
Copy link

SparkQA commented Aug 24, 2020

Test build #127840 has finished for PR 29515 at commit 9a3b983.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tanelk
Copy link
Contributor Author

tanelk commented Aug 24, 2020

Test build #127840 has finished for PR 29515 at commit 9a3b983.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

org.scalatest.exceptions.TestFailedException: Incorrect evaluation (codegen off): [false,0.0] IN ([true,Infinity],[true,7.328233268583497E27],[false,-0.0],[true,2.28620591956394208E17],[true,-1.7155423374193937E-160],[null,7.853276721068246E-75],[null,Infinity],[true,null],[false,null]), actual: false, expected: true

This is a new finding. But also related to the 0.0 == -0.0.

@SparkQA
Copy link

SparkQA commented Sep 4, 2020

Test build #128299 has finished for PR 29515 at commit 529aba8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tanelk
Copy link
Contributor Author

tanelk commented Sep 11, 2020

@cloud-fan, now that #29647 is merged, can this be merged also?

@maropu
Copy link
Member

maropu commented Sep 15, 2020

@cloud-fan, now that #29647 is merged, can this be merged also?

Are all the bugs that this PR found already fixed now?

@maropu
Copy link
Member

maropu commented Sep 15, 2020

retest this please

@tanelk
Copy link
Contributor Author

tanelk commented Sep 15, 2020

@cloud-fan, now that #29647 is merged, can this be merged also?

Are all the bugs that this PR found already fixed now?

I believe, that they were the manifestation of the 0.0 != -0.0

@SparkQA
Copy link

SparkQA commented Sep 15, 2020

Test build #128688 has finished for PR 29515 at commit 529aba8.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member

maropu commented Sep 15, 2020

retest this please

Copy link
Member

@maropu maropu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maropu maropu changed the title [SPARK-32688][SQL][TESTS] Add special values to LiteralGenerator for float and double [SPARK-32688][SQL][TEST] Add special values to LiteralGenerator for float and double Sep 15, 2020
@SparkQA
Copy link

SparkQA commented Sep 15, 2020

Test build #128707 has finished for PR 29515 at commit 529aba8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

hmm, does this PR catch https://issues.apache.org/jira/browse/SPARK-32110 ?

@tanelk
Copy link
Contributor Author

tanelk commented Sep 15, 2020

hmm, does this PR catch https://issues.apache.org/jira/browse/SPARK-32110 ?

That's more of an umbrella jira, that mainly covers inconsistency between operators.
Here we uncovered an inconsistency within the equality operators codegen and non-codegen paths.

Just to be safe, we could trigger some retests on this.

@maropu
Copy link
Member

maropu commented Sep 15, 2020

retest this please

@SparkQA
Copy link

SparkQA commented Sep 15, 2020

Test build #128717 has finished for PR 29515 at commit 529aba8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu maropu closed this in 6051755 Sep 16, 2020
maropu pushed a commit that referenced this pull request Sep 16, 2020
…loat and double

### What changes were proposed in this pull request?

The `LiteralGenerator` for float and double datatypes was supposed to yield special values (NaN, +-inf) among others, but the `Gen.chooseNum` method does not yield values that are outside the defined range. The `Gen.chooseNum` for a wide range of floats and doubles does not yield values in the "everyday" range as stated in typelevel/scalacheck#113 .

There is an similar class `RandomDataGenerator` that is used in some other tests. Added `-0.0` and `-0.0f` as special values to there too.

These changes revealed an inconsistency with the equality check between `-0.0` and `0.0`.

### Why are the changes needed?

The `LiteralGenerator` is mostly used in the `checkConsistencyBetweenInterpretedAndCodegen` method in `MathExpressionsSuite`. This change would have caught the bug fixed in #29495 .

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Locally reverted #29495 and verified that the existing test cases caught the bug.

Closes #29515 from tanelk/SPARK-32688.

Authored-by: Tanel Kiis <tanel.kiis@gmail.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
(cherry picked from commit 6051755)
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
@maropu
Copy link
Member

maropu commented Sep 16, 2020

Merged to master/3.0. To check if no flaky tests happens, I will keep watching Jenkins jobs. Anyway, thanks, @tanelk !

holdenk pushed a commit to holdenk/spark that referenced this pull request Oct 27, 2020
…loat and double

### What changes were proposed in this pull request?

The `LiteralGenerator` for float and double datatypes was supposed to yield special values (NaN, +-inf) among others, but the `Gen.chooseNum` method does not yield values that are outside the defined range. The `Gen.chooseNum` for a wide range of floats and doubles does not yield values in the "everyday" range as stated in typelevel/scalacheck#113 .

There is an similar class `RandomDataGenerator` that is used in some other tests. Added `-0.0` and `-0.0f` as special values to there too.

These changes revealed an inconsistency with the equality check between `-0.0` and `0.0`.

### Why are the changes needed?

The `LiteralGenerator` is mostly used in the `checkConsistencyBetweenInterpretedAndCodegen` method in `MathExpressionsSuite`. This change would have caught the bug fixed in apache#29495 .

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Locally reverted apache#29495 and verified that the existing test cases caught the bug.

Closes apache#29515 from tanelk/SPARK-32688.

Authored-by: Tanel Kiis <tanel.kiis@gmail.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
(cherry picked from commit 6051755)
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
6 participants