[SPARK-22499][SQL] Fix 64KB JVM bytecode limit problem with least and greatest #19729

kiszk · 2017-11-12T18:02:56Z

What changes were proposed in this pull request?

This PR changes least and greatest code generation to place generated code for expression for arguments into separated methods if these size could be large.
This PR resolved two cases:

least with a lot of argument
greatest with a lot of argument

How was this patch tested?

Added a new test case into ArithmeticExpressionsSuite

mgaido91 · 2017-11-12T18:49:38Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala

@@ -604,6 +604,8 @@ case class Least(children: Seq[Expression]) extends Expression {
    val evalChildren = children.map(_.genCode(ctx))
    val first = evalChildren(0)


what about avoiding splitting first and rest and treat them all the same? Since the declaration of ev.isNull and ev.value is moved, they can all be treated the same way. In this way the code would be cleaner, wouldn't it?

Do you want to assign evalChildren(0).value into ev.value and then compare all of evalChildren? It can avoid rest for cleaner code. However, it increases one comparison.
Or, do you have another idea?

I am thinking about doing what I did in #19720: here we might follow the same approach. Also for consistency across the code. Indeed, there would be an if more, but I think that the overhead would be negligible.
Or if we state that the right approach is this one to avoid one more if, I am best to change my approach in the PR I mentioned.

I know what you did. This PR cannot follow the approach straightforward due to initial value. We can easily know min/max value for primitive type for an initial value. How can we know the initial value for complicated type?
IMHO, the first element would be good for an initial value.

What do you think?

I don't see any difference on the initial value. IMHO the initial value in my PR as well as here should be "null", which means ev. isNull to true and the initial value for ev.value is irrelevant and it could be set to what is returned by the defaultValue function of the CodegenContext. Do you have any concern with this?

the value is used only when isNull is false. And in that context a proper value has been set for value, thus the problem you are describing can't happen IMHO.

I am sorry that I cannot understand your opinion.
Could you please explain how your opinion works in the case that all of childrens with ev.value = -2 and ev.isNull = false for Greatest by using ctx.defaultValue?

before using ev.value, ev.isNull is checked (https://github.com/kiszk/spark/blob/bc3f0a43956a1e46928f0269d1d8d0aad9ee02b1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala#L612). So in case of all children with -2, the first one will see that ev.isNull is true and then it will set it to false and assign -2 to ev.value. Thus the initial value assigned to ev.value is irrelevant, since ev.isNull is true and as far as ev.isNull is true, what ev.value contains doesn't matter.

Thank you for your explanation. Got it.
Your idea is to set true into ev.isNull rather than using ctx.defaultValue as an initial value for ev.value.

yes exactly, it doesn't matter how ev.value is initialized.

mgaido91 · 2017-11-12T18:49:54Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala

@@ -670,6 +673,8 @@ case class Greatest(children: Seq[Expression]) extends Expression {
    val evalChildren = children.map(_.genCode(ctx))
    val first = evalChildren(0)


SparkQA · 2017-11-12T21:02:16Z

Test build #83749 has finished for PR 19729 at commit bc3f0a4.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

mgaido91 · 2017-11-13T16:11:54Z

LGTM

SparkQA · 2017-11-13T18:53:02Z

Test build #83793 has finished for PR 19729 at commit 5e87935.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

… greatest ## What changes were proposed in this pull request? This PR changes `least` and `greatest` code generation to place generated code for expression for arguments into separated methods if these size could be large. This PR resolved two cases: * `least` with a lot of argument * `greatest` with a lot of argument ## How was this patch tested? Added a new test case into `ArithmeticExpressionsSuite` Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com> Closes #19729 from kiszk/SPARK-22499. (cherry picked from commit ed885e7) Signed-off-by: Wenchen Fan <wenchen@databricks.com>

cloud-fan · 2017-11-16T16:57:22Z

thanks, merging to master/2.2

… greatest ## What changes were proposed in this pull request? This PR changes `least` and `greatest` code generation to place generated code for expression for arguments into separated methods if these size could be large. This PR resolved two cases: * `least` with a lot of argument * `greatest` with a lot of argument ## How was this patch tested? Added a new test case into `ArithmeticExpressionsSuite` Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com> Closes apache#19729 from kiszk/SPARK-22499. (cherry picked from commit ed885e7) Signed-off-by: Wenchen Fan <wenchen@databricks.com>

initial commit

bc3f0a4

mgaido91 reviewed Nov 12, 2017

View reviewed changes

address review comment

5e87935

asfgit closed this in ed885e7 Nov 16, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[SPARK-22499][SQL] Fix 64KB JVM bytecode limit problem with least and greatest #19729

[SPARK-22499][SQL] Fix 64KB JVM bytecode limit problem with least and greatest #19729

kiszk commented Nov 12, 2017

mgaido91 Nov 12, 2017

kiszk Nov 12, 2017

mgaido91 Nov 12, 2017

kiszk Nov 12, 2017

mgaido91 Nov 12, 2017

mgaido91 Nov 13, 2017

kiszk Nov 13, 2017

mgaido91 Nov 13, 2017

kiszk Nov 13, 2017

mgaido91 Nov 13, 2017

mgaido91 Nov 12, 2017

SparkQA commented Nov 12, 2017

mgaido91 commented Nov 13, 2017

SparkQA commented Nov 13, 2017

cloud-fan commented Nov 16, 2017

		@@ -604,6 +604,8 @@ case class Least(children: Seq[Expression]) extends Expression {
		val evalChildren = children.map(_.genCode(ctx))
		val first = evalChildren(0)

		@@ -670,6 +673,8 @@ case class Greatest(children: Seq[Expression]) extends Expression {
		val evalChildren = children.map(_.genCode(ctx))
		val first = evalChildren(0)

[SPARK-22499][SQL] Fix 64KB JVM bytecode limit problem with least and greatest #19729

[SPARK-22499][SQL] Fix 64KB JVM bytecode limit problem with least and greatest #19729

Conversation

kiszk commented Nov 12, 2017

What changes were proposed in this pull request?

How was this patch tested?

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

SparkQA commented Nov 12, 2017

mgaido91 commented Nov 13, 2017

SparkQA commented Nov 13, 2017

cloud-fan commented Nov 16, 2017