Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-22499][SQL] Fix 64KB JVM bytecode limit problem with least and greatest #19729

Closed
wants to merge 2 commits into from

Conversation

kiszk
Copy link
Member

@kiszk kiszk commented Nov 12, 2017

What changes were proposed in this pull request?

This PR changes least and greatest code generation to place generated code for expression for arguments into separated methods if these size could be large.
This PR resolved two cases:

  • least with a lot of argument
  • greatest with a lot of argument

How was this patch tested?

Added a new test case into ArithmeticExpressionsSuite

@@ -604,6 +604,8 @@ case class Least(children: Seq[Expression]) extends Expression {
val evalChildren = children.map(_.genCode(ctx))
val first = evalChildren(0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about avoiding splitting first and rest and treat them all the same? Since the declaration of ev.isNull and ev.value is moved, they can all be treated the same way. In this way the code would be cleaner, wouldn't it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to assign evalChildren(0).value into ev.value and then compare all of evalChildren? It can avoid rest for cleaner code. However, it increases one comparison.
Or, do you have another idea?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking about doing what I did in #19720: here we might follow the same approach. Also for consistency across the code. Indeed, there would be an if more, but I think that the overhead would be negligible.
Or if we state that the right approach is this one to avoid one more if, I am best to change my approach in the PR I mentioned.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know what you did. This PR cannot follow the approach straightforward due to initial value. We can easily know min/max value for primitive type for an initial value. How can we know the initial value for complicated type?
IMHO, the first element would be good for an initial value.

What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any difference on the initial value. IMHO the initial value in my PR as well as here should be "null", which means ev. isNull to true and the initial value for ev.value is irrelevant and it could be set to what is returned by the defaultValue function of the CodegenContext. Do you have any concern with this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the value is used only when isNull is false. And in that context a proper value has been set for value, thus the problem you are describing can't happen IMHO.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am sorry that I cannot understand your opinion.
Could you please explain how your opinion works in the case that all of childrens with ev.value = -2 and ev.isNull = false for Greatest by using ctx.defaultValue?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

before using ev.value, ev.isNull is checked (https://github.com/kiszk/spark/blob/bc3f0a43956a1e46928f0269d1d8d0aad9ee02b1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala#L612). So in case of all children with -2, the first one will see that ev.isNull is true and then it will set it to false and assign -2 to ev.value. Thus the initial value assigned to ev.value is irrelevant, since ev.isNull is true and as far as ev.isNull is true, what ev.value contains doesn't matter.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your explanation. Got it.
Your idea is to set true into ev.isNull rather than using ctx.defaultValue as an initial value for ev.value.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes exactly, it doesn't matter how ev.value is initialized.

@@ -670,6 +673,8 @@ case class Greatest(children: Seq[Expression]) extends Expression {
val evalChildren = children.map(_.genCode(ctx))
val first = evalChildren(0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@SparkQA
Copy link

SparkQA commented Nov 12, 2017

Test build #83749 has finished for PR 19729 at commit bc3f0a4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mgaido91
Copy link
Contributor

LGTM

@SparkQA
Copy link

SparkQA commented Nov 13, 2017

Test build #83793 has finished for PR 19729 at commit 5e87935.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request Nov 16, 2017
… greatest

## What changes were proposed in this pull request?

This PR changes `least` and `greatest` code generation to place generated code for expression for arguments into separated methods if these size could be large.
This PR resolved two cases:

* `least` with a lot of argument
* `greatest` with a lot of argument

## How was this patch tested?

Added a new test case into `ArithmeticExpressionsSuite`

Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com>

Closes #19729 from kiszk/SPARK-22499.

(cherry picked from commit ed885e7)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@cloud-fan
Copy link
Contributor

thanks, merging to master/2.2

@asfgit asfgit closed this in ed885e7 Nov 16, 2017
MatthewRBruce pushed a commit to Shopify/spark that referenced this pull request Jul 31, 2018
… greatest

## What changes were proposed in this pull request?

This PR changes `least` and `greatest` code generation to place generated code for expression for arguments into separated methods if these size could be large.
This PR resolved two cases:

* `least` with a lot of argument
* `greatest` with a lot of argument

## How was this patch tested?

Added a new test case into `ArithmeticExpressionsSuite`

Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com>

Closes apache#19729 from kiszk/SPARK-22499.

(cherry picked from commit ed885e7)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants