Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-32640][SQL] Downgrade Janino to fix a correctness bug #29495

Closed
wants to merge 2 commits into from

Conversation

cloud-fan
Copy link
Contributor

What changes were proposed in this pull request?

This PR reverts #27860 to downgrade Janino, as the new version has a bug.

Why are the changes needed?

The symptom is about NaN comparison. For code below

if (double_value <= 0.0) {
  ...
} else {
  ...
}

If double_value is NaN, NaN <= 0.0 is false and we should go to the else branch. However, current Spark goes to the if branch and causes correctness issues like SPARK-32640.

One way to fix it is:

boolean cond = double_value <= 0.0;
if (cond) {
  ...
} else {
  ...
}

I'm not familiar with Janino so I don't know what's going on there.

Does this PR introduce any user-facing change?

Yes, fix correctness bugs.

How was this patch tested?

a new test

@cloud-fan
Copy link
Contributor Author

cloud-fan commented Aug 20, 2020

Copy link
Member

@gengliangwang gengliangwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for the revert

Copy link
Contributor

@tgravescs tgravescs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for tracking this down. +1 to revert.
Do we have a followup to look into it more or follow up with Janino?

Copy link
Member

@viirya viirya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason of upgrading to Janino 3.1.2 is also for fixing bug, but it looks not for correctness. +1 to revert for correctness.

Copy link
Member

@gatorsmile gatorsmile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

The 3.1 version is not stable yet, I think. We should be more cautious to upgrade the version.

@rednaxelafx also found the other bug in Janino 3.1.

@rednaxelafx
Copy link
Contributor

Right, I had hit another bug in Janino 3.1.2 elsewhere with this code pattern:

int x;
if (cond) {
  x = ...
} else {
  long y = ...
  x = ...
}

(the actual code to trigger the problem is slightly more complicate but this example shows the gist)
This Janino bug is specific to the new stack map calculation logic, which is new in Janino 3.1.x. Going back to the 3..0.x releases gets around the problem.

With that issue and the issue mentioned in this PR, it looks like the conditional branch handling in Janino 3.1.x has degraded quite a bit. Let's avoid Janino 3.1.x until we can have more confidence in it.

I'd like to also suggest adding some codegen test cases specifically to test the underlying Java compiler we use at runtime, be it Janino or any other alternatives (e.g. javac). WDYT?


test("SPARK-32640: ln(NaN) should return NaN") {
val df = Seq(Double.NaN).toDF("d")
checkAnswer(df.selectExpr("ln(d)"), Row(Double.NaN))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: does this actually trigger codegen? It gets converted to LocalRelation during logical optimization, right?
(If this test fails before reverting Janino and passes after the revert, then I guess it's still valid to have it here, otherwise I'm a bit confused)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, it does; ConvertToLocalRelation disabled in tests:

// Disable ConvertToLocalRelation for better test coverage. Test cases built on
// LocalRelation will exercise the optimization rules better by disabling it as
// this rule may potentially block testing of other optimization rules such as
// ConstantPropagation etc.
.set(SQLConf.OPTIMIZER_EXCLUDED_RULES.key, ConvertToLocalRelation.ruleName)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, thanks for confirming!

@SparkQA
Copy link

SparkQA commented Aug 20, 2020

Test build #127700 has finished for PR 29495 at commit 0e2faf6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much, @cloud-fan .
Merged to master for Apache Spark 3.1.0 on December 2020.

@maropu
Copy link
Member

maropu commented Aug 20, 2020

late LGTM

@maropu
Copy link
Member

maropu commented Aug 20, 2020

(the actual code to trigger the problem is slightly more complicate but this example shows the gist)
This Janino bug is specific to the new stack map calculation logic, which is new in Janino 3.1.x. Going back to the 3..0.x releases gets around the problem.

@rednaxelafx Thanks for the explanation, Kris! Have you already filed this issue in the Janino community? If you've done it, I think the link to it helps for better traceability. Also, is it difficult to add tests for the issue you describe above? I think its better to add tests for that, too, if we can run into it the Spark side.

@rednaxelafx
Copy link
Contributor

Hi @maropu san, I haven't gotten around to file the issue with Janino yet (I'd probably have to build the latest Janino to make sure the issue still exists in Janino master before filing the issue...).

I was suggesting to add a new set of tests in Spark specifically to test the underlying Java compiler, because there are certain code patterns that may hit compiler bugs and we should use this set of tests to:

  1. demonstrate valid Java that's not used by Spark yet, and it fails compilation (for documentation purpose, i.e. don't use the specified code pattern in the test in Spark codegen)
  2. demonstrate valid Java that's used by Spark codegen, and it should pass compilation (for compatibility verification)

For the bug I mentioned above, it is not used by Spark codegen yet and I had hit it elsewhere, so I hadn't thought about downgrading Janino in Spark just for that. But now thinking about it again, I think the (1) case here is pretty important too. As soon as the underlying compiler fixes the issue, the "expect failure" test cases will get notified and we'll know those patterns can be used in Spark.

WDYT?

@HeartSaVioR
Copy link
Contributor

Nice finding and thanks for fixing the bug. Late LGTM. I agree that downgrading the version is good resolution in current situation.

Downgrading this to 3.0.x re-brings the question though, as 3.0.x is deprecated and author won't care about it. Is migrating Janino to 3.1.x our eventual goal, or we'd like to stick with deprecated version line and deal with the version by ourselves once we hit some other issue?

@maropu
Copy link
Member

maropu commented Aug 20, 2020

For the bug I mentioned above, it is not used by Spark codegen yet and I had hit it elsewhere, so I hadn't thought about downgrading Janino in Spark just for that.

okay, I see. Thanks!

I was suggesting to add a new set of tests in Spark specifically to test the underlying Java compiler, because there are certain code patterns that may hit compiler bugs and we should use this set of tests to:

That sounds good to me because Spark heavily depends on Janino... and I think these kinds of tests always make Spark more robust. WDYT, the others? Out of curiosity; how do we extract the common code patterns and any idea?

@rednaxelafx
Copy link
Contributor

Out of curiosity; how do we extract the common code patterns and any idea?

I was thinking of starting out from two sources:

  • known issues that Spark had hit in the past (extracting from JIRA tickets)
  • some known problematic code patterns with Janino manually discovered

I can help with the latter.

@kiszk
Copy link
Member

kiszk commented Aug 21, 2020

late LGTM for downgrading Janino.

I agree with adding test cases for focusing on validation of any Janino for changing its version.

@cloud-fan
Copy link
Contributor Author

@HeartSaVioR I think eventually we should move to the officially supported Janino versions, after they fix bugs we report.

@HyukjinKwon
Copy link
Member

Nice, LGTM

@tgravescs
Copy link
Contributor

@cloud-fan did we file a jira to track this? It would be nice to link to any Janino bugs we filed.

if we have a set of code patterns that we know can cause problems I think it would be great to add more tests for them.

@cloud-fan
Copy link
Contributor Author

cloud-fan commented Aug 21, 2020

@rednaxelafx are you working on the bug report?

@kiszk
Copy link
Member

kiszk commented Aug 31, 2020

I found an interesting regression in Janino 3.1.2.

We may need to check the compilation speed, too.

@dongjoon-hyun
Copy link
Member

Thank you for reporting that too, @kiszk .

@HeartSaVioR
Copy link
Contributor

HeartSaVioR commented Sep 1, 2020

By Janino author, the performance regression is introduced in 3.0.13 - that said, we are still affected by the regression.
janino-compiler/janino#131 (comment)

@kiszk
Copy link
Member

kiszk commented Sep 1, 2020

Oh, good catch!

maropu pushed a commit that referenced this pull request Sep 16, 2020
…loat and double

### What changes were proposed in this pull request?

The `LiteralGenerator` for float and double datatypes was supposed to yield special values (NaN, +-inf) among others, but the `Gen.chooseNum` method does not yield values that are outside the defined range. The `Gen.chooseNum` for a wide range of floats and doubles does not yield values in the "everyday" range as stated in typelevel/scalacheck#113 .

There is an similar class `RandomDataGenerator` that is used in some other tests. Added `-0.0` and `-0.0f` as special values to there too.

These changes revealed an inconsistency with the equality check between `-0.0` and `0.0`.

### Why are the changes needed?

The `LiteralGenerator` is mostly used in the `checkConsistencyBetweenInterpretedAndCodegen` method in `MathExpressionsSuite`. This change would have caught the bug fixed in #29495 .

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Locally reverted #29495 and verified that the existing test cases caught the bug.

Closes #29515 from tanelk/SPARK-32688.

Authored-by: Tanel Kiis <tanel.kiis@gmail.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
maropu pushed a commit that referenced this pull request Sep 16, 2020
…loat and double

### What changes were proposed in this pull request?

The `LiteralGenerator` for float and double datatypes was supposed to yield special values (NaN, +-inf) among others, but the `Gen.chooseNum` method does not yield values that are outside the defined range. The `Gen.chooseNum` for a wide range of floats and doubles does not yield values in the "everyday" range as stated in typelevel/scalacheck#113 .

There is an similar class `RandomDataGenerator` that is used in some other tests. Added `-0.0` and `-0.0f` as special values to there too.

These changes revealed an inconsistency with the equality check between `-0.0` and `0.0`.

### Why are the changes needed?

The `LiteralGenerator` is mostly used in the `checkConsistencyBetweenInterpretedAndCodegen` method in `MathExpressionsSuite`. This change would have caught the bug fixed in #29495 .

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Locally reverted #29495 and verified that the existing test cases caught the bug.

Closes #29515 from tanelk/SPARK-32688.

Authored-by: Tanel Kiis <tanel.kiis@gmail.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
(cherry picked from commit 6051755)
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
holdenk pushed a commit to holdenk/spark that referenced this pull request Oct 27, 2020
…loat and double

### What changes were proposed in this pull request?

The `LiteralGenerator` for float and double datatypes was supposed to yield special values (NaN, +-inf) among others, but the `Gen.chooseNum` method does not yield values that are outside the defined range. The `Gen.chooseNum` for a wide range of floats and doubles does not yield values in the "everyday" range as stated in typelevel/scalacheck#113 .

There is an similar class `RandomDataGenerator` that is used in some other tests. Added `-0.0` and `-0.0f` as special values to there too.

These changes revealed an inconsistency with the equality check between `-0.0` and `0.0`.

### Why are the changes needed?

The `LiteralGenerator` is mostly used in the `checkConsistencyBetweenInterpretedAndCodegen` method in `MathExpressionsSuite`. This change would have caught the bug fixed in apache#29495 .

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Locally reverted apache#29495 and verified that the existing test cases caught the bug.

Closes apache#29515 from tanelk/SPARK-32688.

Authored-by: Tanel Kiis <tanel.kiis@gmail.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
(cherry picked from commit 6051755)
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
srowen pushed a commit that referenced this pull request May 12, 2021
### What changes were proposed in this pull request?

This PR proposes to bump up the janino version from 3.0.16 to v3.1.4.
The major changes of this upgrade are as follows:
 - Fixed issue #131: Janino 3.1.2 is 10x slower than 3.0.11: The Compiler's IClassLoader was initialized way too eagerly, thus lots of classes were loaded from the class path, which is very slow.
 - Improved the encoding of stack map frames according to JVMS11 4.7.4: Previously, only "full_frame"s were generated.
 - Fixed issue #107: Janino requires "org.codehaus.commons.compiler.io", but commons-compiler does not export this package
 - Fixed the promotion of the array access index expression (see JLS7 15.13 Array Access Expressions).

For all the changes, please see the change log: http://janino-compiler.github.io/janino/changelog.html

NOTE1: I've checked that there is no obvious performance regression. For all the data, see a link: https://docs.google.com/spreadsheets/d/1srxT9CioGQg1fLKM3Uo8z1sTzgCsMj4pg6JzpdcG6VU/edit?usp=sharing

NOTE2: We upgraded janino to 3.1.2 (#27860) once before, but the commit had been reverted in #29495 because of the correctness issue. Recently, #32374 had checked if Spark could land on v3.1.3 or not, but a new bug was found there. These known issues has been fixed in v3.1.4 by following PRs:
 - janino-compiler/janino#145
 - janino-compiler/janino#146

### Why are the changes needed?

janino v3.0.X  is no longer maintained.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

GA passed.

Closes #32455 from maropu/janino_v3.1.4.

Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Sean Owen <srowen@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.