Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-22499][SQL] Fix 64KB JVM bytecode limit problem with least and greatest #19729

Closed
wants to merge 2 commits into from

Conversation

Projects
None yet
4 participants
@kiszk
Copy link
Contributor

commented Nov 12, 2017

What changes were proposed in this pull request?

This PR changes least and greatest code generation to place generated code for expression for arguments into separated methods if these size could be large.
This PR resolved two cases:

  • least with a lot of argument
  • greatest with a lot of argument

How was this patch tested?

Added a new test case into ArithmeticExpressionsSuite

@@ -604,6 +604,8 @@ case class Least(children: Seq[Expression]) extends Expression {
val evalChildren = children.map(_.genCode(ctx))
val first = evalChildren(0)

This comment has been minimized.

Copy link
@mgaido91

mgaido91 Nov 12, 2017

Contributor

what about avoiding splitting first and rest and treat them all the same? Since the declaration of ev.isNull and ev.value is moved, they can all be treated the same way. In this way the code would be cleaner, wouldn't it?

This comment has been minimized.

Copy link
@kiszk

kiszk Nov 12, 2017

Author Contributor

Do you want to assign evalChildren(0).value into ev.value and then compare all of evalChildren? It can avoid rest for cleaner code. However, it increases one comparison.
Or, do you have another idea?

This comment has been minimized.

Copy link
@mgaido91

mgaido91 Nov 12, 2017

Contributor

I am thinking about doing what I did in #19720: here we might follow the same approach. Also for consistency across the code. Indeed, there would be an if more, but I think that the overhead would be negligible.
Or if we state that the right approach is this one to avoid one more if, I am best to change my approach in the PR I mentioned.

This comment has been minimized.

Copy link
@kiszk

kiszk Nov 12, 2017

Author Contributor

I know what you did. This PR cannot follow the approach straightforward due to initial value. We can easily know min/max value for primitive type for an initial value. How can we know the initial value for complicated type?
IMHO, the first element would be good for an initial value.

What do you think?

This comment has been minimized.

Copy link
@mgaido91

mgaido91 Nov 12, 2017

Contributor

I don't see any difference on the initial value. IMHO the initial value in my PR as well as here should be "null", which means ev. isNull to true and the initial value for ev.value is irrelevant and it could be set to what is returned by the defaultValue function of the CodegenContext. Do you have any concern with this?

This comment has been minimized.

Copy link
@mgaido91

mgaido91 Nov 13, 2017

Contributor

the value is used only when isNull is false. And in that context a proper value has been set for value, thus the problem you are describing can't happen IMHO.

This comment has been minimized.

Copy link
@kiszk

kiszk Nov 13, 2017

Author Contributor

I am sorry that I cannot understand your opinion.
Could you please explain how your opinion works in the case that all of childrens with ev.value = -2 and ev.isNull = false for Greatest by using ctx.defaultValue?

This comment has been minimized.

Copy link
@mgaido91

mgaido91 Nov 13, 2017

Contributor

before using ev.value, ev.isNull is checked (https://github.com/kiszk/spark/blob/bc3f0a43956a1e46928f0269d1d8d0aad9ee02b1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala#L612). So in case of all children with -2, the first one will see that ev.isNull is true and then it will set it to false and assign -2 to ev.value. Thus the initial value assigned to ev.value is irrelevant, since ev.isNull is true and as far as ev.isNull is true, what ev.value contains doesn't matter.

This comment has been minimized.

Copy link
@kiszk

kiszk Nov 13, 2017

Author Contributor

Thank you for your explanation. Got it.
Your idea is to set true into ev.isNull rather than using ctx.defaultValue as an initial value for ev.value.

This comment has been minimized.

Copy link
@mgaido91

mgaido91 Nov 13, 2017

Contributor

yes exactly, it doesn't matter how ev.value is initialized.

@@ -670,6 +673,8 @@ case class Greatest(children: Seq[Expression]) extends Expression {
val evalChildren = children.map(_.genCode(ctx))
val first = evalChildren(0)

This comment has been minimized.

Copy link
@mgaido91

mgaido91 Nov 12, 2017

Contributor

ditto

@SparkQA

This comment has been minimized.

Copy link

commented Nov 12, 2017

Test build #83749 has finished for PR 19729 at commit bc3f0a4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@mgaido91

This comment has been minimized.

Copy link
Contributor

commented Nov 13, 2017

LGTM

@SparkQA

This comment has been minimized.

Copy link

commented Nov 13, 2017

Test build #83793 has finished for PR 19729 at commit 5e87935.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request Nov 16, 2017

[SPARK-22499][SQL] Fix 64KB JVM bytecode limit problem with least and…
… greatest

## What changes were proposed in this pull request?

This PR changes `least` and `greatest` code generation to place generated code for expression for arguments into separated methods if these size could be large.
This PR resolved two cases:

* `least` with a lot of argument
* `greatest` with a lot of argument

## How was this patch tested?

Added a new test case into `ArithmeticExpressionsSuite`

Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com>

Closes #19729 from kiszk/SPARK-22499.

(cherry picked from commit ed885e7)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@cloud-fan

This comment has been minimized.

Copy link
Contributor

commented Nov 16, 2017

thanks, merging to master/2.2

@asfgit asfgit closed this in ed885e7 Nov 16, 2017

MatthewRBruce added a commit to Shopify/spark that referenced this pull request Jul 31, 2018

[SPARK-22499][SQL] Fix 64KB JVM bytecode limit problem with least and…
… greatest

## What changes were proposed in this pull request?

This PR changes `least` and `greatest` code generation to place generated code for expression for arguments into separated methods if these size could be large.
This PR resolved two cases:

* `least` with a lot of argument
* `greatest` with a lot of argument

## How was this patch tested?

Added a new test case into `ArithmeticExpressionsSuite`

Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com>

Closes apache#19729 from kiszk/SPARK-22499.

(cherry picked from commit ed885e7)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.