-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-22499][SQL] Fix 64KB JVM bytecode limit problem with least and greatest #19729
Conversation
@@ -604,6 +604,8 @@ case class Least(children: Seq[Expression]) extends Expression { | |||
val evalChildren = children.map(_.genCode(ctx)) | |||
val first = evalChildren(0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about avoiding splitting first
and rest
and treat them all the same? Since the declaration of ev.isNull
and ev.value
is moved, they can all be treated the same way. In this way the code would be cleaner, wouldn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want to assign evalChildren(0).value
into ev.value
and then compare all of evalChildren
? It can avoid rest
for cleaner code. However, it increases one comparison.
Or, do you have another idea?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am thinking about doing what I did in #19720: here we might follow the same approach. Also for consistency across the code. Indeed, there would be an if
more, but I think that the overhead would be negligible.
Or if we state that the right approach is this one to avoid one more if
, I am best to change my approach in the PR I mentioned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know what you did. This PR cannot follow the approach straightforward due to initial value. We can easily know min
/max
value for primitive type for an initial value. How can we know the initial value for complicated type?
IMHO, the first element would be good for an initial value.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any difference on the initial value. IMHO the initial value in my PR as well as here should be "null", which means ev. isNull
to true
and the initial value for ev.value
is irrelevant and it could be set to what is returned by the defaultValue
function of the CodegenContext
. Do you have any concern with this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the value
is used only when isNull
is false
. And in that context a proper value has been set for value
, thus the problem you are describing can't happen IMHO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am sorry that I cannot understand your opinion.
Could you please explain how your opinion works in the case that all of childrens with ev.value = -2
and ev.isNull = false
for Greatest
by using ctx.defaultValue
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
before using ev.value
, ev.isNull
is checked (https://github.com/kiszk/spark/blob/bc3f0a43956a1e46928f0269d1d8d0aad9ee02b1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala#L612). So in case of all children with -2
, the first one will see that ev.isNull
is true
and then it will set it to false
and assign -2
to ev.value
. Thus the initial value assigned to ev.value
is irrelevant, since ev.isNull
is true
and as far as ev.isNull
is true
, what ev.value
contains doesn't matter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your explanation. Got it.
Your idea is to set true
into ev.isNull
rather than using ctx.defaultValue
as an initial value for ev.value
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes exactly, it doesn't matter how ev.value
is initialized.
@@ -670,6 +673,8 @@ case class Greatest(children: Seq[Expression]) extends Expression { | |||
val evalChildren = children.map(_.genCode(ctx)) | |||
val first = evalChildren(0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
Test build #83749 has finished for PR 19729 at commit
|
LGTM |
Test build #83793 has finished for PR 19729 at commit
|
… greatest ## What changes were proposed in this pull request? This PR changes `least` and `greatest` code generation to place generated code for expression for arguments into separated methods if these size could be large. This PR resolved two cases: * `least` with a lot of argument * `greatest` with a lot of argument ## How was this patch tested? Added a new test case into `ArithmeticExpressionsSuite` Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com> Closes #19729 from kiszk/SPARK-22499. (cherry picked from commit ed885e7) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
thanks, merging to master/2.2 |
… greatest ## What changes were proposed in this pull request? This PR changes `least` and `greatest` code generation to place generated code for expression for arguments into separated methods if these size could be large. This PR resolved two cases: * `least` with a lot of argument * `greatest` with a lot of argument ## How was this patch tested? Added a new test case into `ArithmeticExpressionsSuite` Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com> Closes apache#19729 from kiszk/SPARK-22499. (cherry picked from commit ed885e7) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?
This PR changes
least
andgreatest
code generation to place generated code for expression for arguments into separated methods if these size could be large.This PR resolved two cases:
least
with a lot of argumentgreatest
with a lot of argumentHow was this patch tested?
Added a new test case into
ArithmeticExpressionsSuite