[SPARK-32640][SQL] Downgrade Janino to fix a correctness bug #29495

cloud-fan · 2020-08-20T17:12:17Z

What changes were proposed in this pull request?

This PR reverts #27860 to downgrade Janino, as the new version has a bug.

Why are the changes needed?

The symptom is about NaN comparison. For code below

if (double_value <= 0.0) {
  ...
} else {
  ...
}

If double_value is NaN, NaN <= 0.0 is false and we should go to the else branch. However, current Spark goes to the if branch and causes correctness issues like SPARK-32640.

One way to fix it is:

boolean cond = double_value <= 0.0;
if (cond) {
  ...
} else {
  ...
}

I'm not familiar with Janino so I don't know what's going on there.

Does this PR introduce any user-facing change?

Yes, fix correctness bugs.

How was this patch tested?

a new test

This reverts commit fe1d1e2.

cloud-fan · 2020-08-20T17:13:33Z

cc @rednaxelafx @kiszk @maropu @viirya @tgravescs @HeartSaVioR

gengliangwang

+1 for the revert

tgravescs

thanks for tracking this down. +1 to revert.
Do we have a followup to look into it more or follow up with Janino?

viirya

The reason of upgrading to Janino 3.1.2 is also for fixing bug, but it looks not for correctness. +1 to revert for correctness.

gatorsmile

+1

The 3.1 version is not stable yet, I think. We should be more cautious to upgrade the version.

@rednaxelafx also found the other bug in Janino 3.1.

rednaxelafx · 2020-08-20T19:43:00Z

Right, I had hit another bug in Janino 3.1.2 elsewhere with this code pattern:

int x;
if (cond) {
  x = ...
} else {
  long y = ...
  x = ...
}

(the actual code to trigger the problem is slightly more complicate but this example shows the gist)
This Janino bug is specific to the new stack map calculation logic, which is new in Janino 3.1.x. Going back to the 3..0.x releases gets around the problem.

With that issue and the issue mentioned in this PR, it looks like the conditional branch handling in Janino 3.1.x has degraded quite a bit. Let's avoid Janino 3.1.x until we can have more confidence in it.

I'd like to also suggest adding some codegen test cases specifically to test the underlying Java compiler we use at runtime, be it Janino or any other alternatives (e.g. javac). WDYT?

rednaxelafx · 2020-08-20T19:56:56Z

sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala

+
+  test("SPARK-32640: ln(NaN) should return NaN") {
+    val df = Seq(Double.NaN).toDF("d")
+    checkAnswer(df.selectExpr("ln(d)"), Row(Double.NaN))


Question: does this actually trigger codegen? It gets converted to LocalRelation during logical optimization, right?
(If this test fails before reverting Janino and passes after the revert, then I guess it's still valid to have it here, otherwise I'm a bit confused)

Yea, it does; ConvertToLocalRelation disabled in tests:

spark/sql/core/src/test/scala/org/apache/spark/sql/test/SharedSparkSession.scala

Lines 70 to 74 in 8b119f1

// Disable ConvertToLocalRelation for better test coverage. Test cases built on

// LocalRelation will exercise the optimization rules better by disabling it as

// this rule may potentially block testing of other optimization rules such as

// ConstantPropagation etc.

.set(SQLConf.OPTIMIZER_EXCLUDED_RULES.key, ConvertToLocalRelation.ruleName)

Cool, thanks for confirming!

SparkQA · 2020-08-20T20:11:40Z

Test build #127700 has finished for PR 29495 at commit 0e2faf6.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun

Thank you so much, @cloud-fan .
Merged to master for Apache Spark 3.1.0 on December 2020.

maropu · 2020-08-20T21:02:28Z

late LGTM

maropu · 2020-08-20T21:39:58Z

(the actual code to trigger the problem is slightly more complicate but this example shows the gist)
This Janino bug is specific to the new stack map calculation logic, which is new in Janino 3.1.x. Going back to the 3..0.x releases gets around the problem.

@rednaxelafx Thanks for the explanation, Kris! Have you already filed this issue in the Janino community? If you've done it, I think the link to it helps for better traceability. Also, is it difficult to add tests for the issue you describe above? I think its better to add tests for that, too, if we can run into it the Spark side.

rednaxelafx · 2020-08-20T23:20:02Z

Hi @maropu san, I haven't gotten around to file the issue with Janino yet (I'd probably have to build the latest Janino to make sure the issue still exists in Janino master before filing the issue...).

I was suggesting to add a new set of tests in Spark specifically to test the underlying Java compiler, because there are certain code patterns that may hit compiler bugs and we should use this set of tests to:

demonstrate valid Java that's not used by Spark yet, and it fails compilation (for documentation purpose, i.e. don't use the specified code pattern in the test in Spark codegen)
demonstrate valid Java that's used by Spark codegen, and it should pass compilation (for compatibility verification)

For the bug I mentioned above, it is not used by Spark codegen yet and I had hit it elsewhere, so I hadn't thought about downgrading Janino in Spark just for that. But now thinking about it again, I think the (1) case here is pretty important too. As soon as the underlying compiler fixes the issue, the "expect failure" test cases will get notified and we'll know those patterns can be used in Spark.

WDYT?

HeartSaVioR · 2020-08-20T23:23:38Z

Nice finding and thanks for fixing the bug. Late LGTM. I agree that downgrading the version is good resolution in current situation.

Downgrading this to 3.0.x re-brings the question though, as 3.0.x is deprecated and author won't care about it. Is migrating Janino to 3.1.x our eventual goal, or we'd like to stick with deprecated version line and deal with the version by ourselves once we hit some other issue?

maropu · 2020-08-20T23:36:24Z

For the bug I mentioned above, it is not used by Spark codegen yet and I had hit it elsewhere, so I hadn't thought about downgrading Janino in Spark just for that.

okay, I see. Thanks!

I was suggesting to add a new set of tests in Spark specifically to test the underlying Java compiler, because there are certain code patterns that may hit compiler bugs and we should use this set of tests to:

That sounds good to me because Spark heavily depends on Janino... and I think these kinds of tests always make Spark more robust. WDYT, the others? Out of curiosity; how do we extract the common code patterns and any idea?

rednaxelafx · 2020-08-21T00:08:21Z

Out of curiosity; how do we extract the common code patterns and any idea?

I was thinking of starting out from two sources:

known issues that Spark had hit in the past (extracting from JIRA tickets)
some known problematic code patterns with Janino manually discovered

I can help with the latter.

kiszk · 2020-08-21T02:39:25Z

late LGTM for downgrading Janino.

I agree with adding test cases for focusing on validation of any Janino for changing its version.

cloud-fan · 2020-08-21T03:17:44Z

@HeartSaVioR I think eventually we should move to the officially supported Janino versions, after they fix bugs we report.

HyukjinKwon · 2020-08-21T05:28:19Z

Nice, LGTM

tgravescs · 2020-08-21T13:26:54Z

@cloud-fan did we file a jira to track this? It would be nice to link to any Janino bugs we filed.

if we have a set of code patterns that we know can cause problems I think it would be great to add more tests for them.

cloud-fan · 2020-08-21T14:46:34Z

@rednaxelafx are you working on the bug report?

kiszk · 2020-08-31T18:46:59Z

I found an interesting regression in Janino 3.1.2.

We may need to check the compilation speed, too.

dongjoon-hyun · 2020-08-31T21:17:01Z

Thank you for reporting that too, @kiszk .

HeartSaVioR · 2020-09-01T00:13:51Z

By Janino author, the performance regression is introduced in 3.0.13 - that said, we are still affected by the regression.
janino-compiler/janino#131 (comment)

kiszk · 2020-09-01T00:56:15Z

Oh, good catch!

…loat and double ### What changes were proposed in this pull request? The `LiteralGenerator` for float and double datatypes was supposed to yield special values (NaN, +-inf) among others, but the `Gen.chooseNum` method does not yield values that are outside the defined range. The `Gen.chooseNum` for a wide range of floats and doubles does not yield values in the "everyday" range as stated in typelevel/scalacheck#113 . There is an similar class `RandomDataGenerator` that is used in some other tests. Added `-0.0` and `-0.0f` as special values to there too. These changes revealed an inconsistency with the equality check between `-0.0` and `0.0`. ### Why are the changes needed? The `LiteralGenerator` is mostly used in the `checkConsistencyBetweenInterpretedAndCodegen` method in `MathExpressionsSuite`. This change would have caught the bug fixed in #29495 . ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Locally reverted #29495 and verified that the existing test cases caught the bug. Closes #29515 from tanelk/SPARK-32688. Authored-by: Tanel Kiis <tanel.kiis@gmail.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>

…loat and double ### What changes were proposed in this pull request? The `LiteralGenerator` for float and double datatypes was supposed to yield special values (NaN, +-inf) among others, but the `Gen.chooseNum` method does not yield values that are outside the defined range. The `Gen.chooseNum` for a wide range of floats and doubles does not yield values in the "everyday" range as stated in typelevel/scalacheck#113 . There is an similar class `RandomDataGenerator` that is used in some other tests. Added `-0.0` and `-0.0f` as special values to there too. These changes revealed an inconsistency with the equality check between `-0.0` and `0.0`. ### Why are the changes needed? The `LiteralGenerator` is mostly used in the `checkConsistencyBetweenInterpretedAndCodegen` method in `MathExpressionsSuite`. This change would have caught the bug fixed in #29495 . ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Locally reverted #29495 and verified that the existing test cases caught the bug. Closes #29515 from tanelk/SPARK-32688. Authored-by: Tanel Kiis <tanel.kiis@gmail.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org> (cherry picked from commit 6051755) Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>

…loat and double ### What changes were proposed in this pull request? The `LiteralGenerator` for float and double datatypes was supposed to yield special values (NaN, +-inf) among others, but the `Gen.chooseNum` method does not yield values that are outside the defined range. The `Gen.chooseNum` for a wide range of floats and doubles does not yield values in the "everyday" range as stated in typelevel/scalacheck#113 . There is an similar class `RandomDataGenerator` that is used in some other tests. Added `-0.0` and `-0.0f` as special values to there too. These changes revealed an inconsistency with the equality check between `-0.0` and `0.0`. ### Why are the changes needed? The `LiteralGenerator` is mostly used in the `checkConsistencyBetweenInterpretedAndCodegen` method in `MathExpressionsSuite`. This change would have caught the bug fixed in apache#29495 . ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Locally reverted apache#29495 and verified that the existing test cases caught the bug. Closes apache#29515 from tanelk/SPARK-32688. Authored-by: Tanel Kiis <tanel.kiis@gmail.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org> (cherry picked from commit 6051755) Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>

### What changes were proposed in this pull request? This PR proposes to bump up the janino version from 3.0.16 to v3.1.4. The major changes of this upgrade are as follows: - Fixed issue #131: Janino 3.1.2 is 10x slower than 3.0.11: The Compiler's IClassLoader was initialized way too eagerly, thus lots of classes were loaded from the class path, which is very slow. - Improved the encoding of stack map frames according to JVMS11 4.7.4: Previously, only "full_frame"s were generated. - Fixed issue #107: Janino requires "org.codehaus.commons.compiler.io", but commons-compiler does not export this package - Fixed the promotion of the array access index expression (see JLS7 15.13 Array Access Expressions). For all the changes, please see the change log: http://janino-compiler.github.io/janino/changelog.html NOTE1: I've checked that there is no obvious performance regression. For all the data, see a link: https://docs.google.com/spreadsheets/d/1srxT9CioGQg1fLKM3Uo8z1sTzgCsMj4pg6JzpdcG6VU/edit?usp=sharing NOTE2: We upgraded janino to 3.1.2 (#27860) once before, but the commit had been reverted in #29495 because of the correctness issue. Recently, #32374 had checked if Spark could land on v3.1.3 or not, but a new bug was found there. These known issues has been fixed in v3.1.4 by following PRs: - janino-compiler/janino#145 - janino-compiler/janino#146 ### Why are the changes needed? janino v3.0.X is no longer maintained. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? GA passed. Closes #32455 from maropu/janino_v3.1.4. Authored-by: Takeshi Yamamuro <yamamuro@apache.org> Signed-off-by: Sean Owen <srowen@gmail.com>

cloud-fan added 2 commits August 21, 2020 01:04

add test

092431c

Revert "[SPARK-31214][BUILD] Upgrade Janino to 3.1.2"

0e2faf6

This reverts commit fe1d1e2.

probot-autolabeler bot added BUILD SQL labels Aug 20, 2020

gengliangwang approved these changes Aug 20, 2020

View reviewed changes

tgravescs approved these changes Aug 20, 2020

View reviewed changes

viirya approved these changes Aug 20, 2020

View reviewed changes

gatorsmile approved these changes Aug 20, 2020

View reviewed changes

rednaxelafx reviewed Aug 20, 2020

View reviewed changes

dongjoon-hyun approved these changes Aug 20, 2020

View reviewed changes

dongjoon-hyun closed this in 8b119f1 Aug 20, 2020

tanelk mentioned this pull request Aug 22, 2020

[SPARK-32688][SQL][TEST] Add special values to LiteralGenerator for float and double #29515

Closed

maropu mentioned this pull request Apr 22, 2021

Fix the bug that handles NaN comparisons incorrectly janino-compiler/janino#145

Merged

kiszk mentioned this pull request May 6, 2021

[SPARK-35253][SQL][BUILD] Bump up the janino version to v3.1.4 #32455

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[SPARK-32640][SQL] Downgrade Janino to fix a correctness bug #29495

[SPARK-32640][SQL] Downgrade Janino to fix a correctness bug #29495

cloud-fan commented Aug 20, 2020

cloud-fan commented Aug 20, 2020 •

edited

Loading

gengliangwang left a comment

tgravescs left a comment

viirya left a comment •

edited

Loading

gatorsmile left a comment

rednaxelafx commented Aug 20, 2020

rednaxelafx Aug 20, 2020

maropu Aug 20, 2020

rednaxelafx Aug 20, 2020

SparkQA commented Aug 20, 2020

dongjoon-hyun left a comment

maropu commented Aug 20, 2020

maropu commented Aug 20, 2020

rednaxelafx commented Aug 20, 2020

HeartSaVioR commented Aug 20, 2020

maropu commented Aug 20, 2020

rednaxelafx commented Aug 21, 2020

kiszk commented Aug 21, 2020

cloud-fan commented Aug 21, 2020

HyukjinKwon commented Aug 21, 2020

tgravescs commented Aug 21, 2020

cloud-fan commented Aug 21, 2020 •

edited

Loading

kiszk commented Aug 31, 2020

dongjoon-hyun commented Aug 31, 2020

HeartSaVioR commented Sep 1, 2020 •

edited

Loading

kiszk commented Sep 1, 2020

	// Disable ConvertToLocalRelation for better test coverage. Test cases built on
	// LocalRelation will exercise the optimization rules better by disabling it as
	// this rule may potentially block testing of other optimization rules such as
	// ConstantPropagation etc.
	.set(SQLConf.OPTIMIZER_EXCLUDED_RULES.key, ConvertToLocalRelation.ruleName)

[SPARK-32640][SQL] Downgrade Janino to fix a correctness bug #29495

[SPARK-32640][SQL] Downgrade Janino to fix a correctness bug #29495

Conversation

cloud-fan commented Aug 20, 2020

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

cloud-fan commented Aug 20, 2020 • edited Loading

gengliangwang left a comment

Choose a reason for hiding this comment

tgravescs left a comment

Choose a reason for hiding this comment

viirya left a comment • edited Loading

Choose a reason for hiding this comment

gatorsmile left a comment

Choose a reason for hiding this comment

rednaxelafx commented Aug 20, 2020

rednaxelafx Aug 20, 2020

Choose a reason for hiding this comment

maropu Aug 20, 2020

Choose a reason for hiding this comment

rednaxelafx Aug 20, 2020

Choose a reason for hiding this comment

SparkQA commented Aug 20, 2020

dongjoon-hyun left a comment

Choose a reason for hiding this comment

maropu commented Aug 20, 2020

maropu commented Aug 20, 2020

rednaxelafx commented Aug 20, 2020

HeartSaVioR commented Aug 20, 2020

maropu commented Aug 20, 2020

rednaxelafx commented Aug 21, 2020

kiszk commented Aug 21, 2020

cloud-fan commented Aug 21, 2020

HyukjinKwon commented Aug 21, 2020

tgravescs commented Aug 21, 2020

cloud-fan commented Aug 21, 2020 • edited Loading

kiszk commented Aug 31, 2020

dongjoon-hyun commented Aug 31, 2020

HeartSaVioR commented Sep 1, 2020 • edited Loading

kiszk commented Sep 1, 2020

cloud-fan commented Aug 20, 2020 •

edited

Loading

viirya left a comment •

edited

Loading

cloud-fan commented Aug 21, 2020 •

edited

Loading

HeartSaVioR commented Sep 1, 2020 •

edited

Loading