[SPARK-22494][SQL] Fix 64KB limit exception with Coalesce and AtleastNNonNulls #19720
Test build #83707 has finished for PR 19720 at commit
Test build #83719 has finished for PR 19720 at commit
ev.copy(code = s"""
${ev.isNull} = true;
${ev.value} = ${ctx.defaultValue(dataType)};
${ctx.splitExpressions(ctx.INPUT_ROW, evals)}""")
We can only split the expressions when they are created from a row object.
Sorry, I don't see what the problem is here. I see that the row object is null here, but the goal is to set `ev.isNull` and `ev.value`, which is done. Could you explain what I am missing? Thanks.
`splitExpressions` puts the expressions' code into individual functions. We can only do this if the expressions' input comes from a row object. If their input comes from `currentVars`, the splitting doesn't work.
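A minimal Scala sketch of the rule above: splitting is only done when the expressions read from a row variable, because that variable can be passed to each helper method as a parameter; inputs bound to `currentVars` are locals of the enclosing method, which a separate method cannot see, so that code stays inline. `SketchContext`, `SplitGuard`, and the `apply_N` helper names are hypothetical and are not Spark's `CodegenContext` API; for simplicity each snippet gets its own helper.

```scala
// Hypothetical stand-in for Spark's CodegenContext, reduced to the two
// pieces of state that decide whether splitting is safe.
final case class SketchContext(inputRow: Option[String], currentVars: Option[Seq[String]])

object SplitGuard {
  /** Returns (helper methods for the class body, code to emit inline). */
  def generate(ctx: SketchContext, evals: Seq[String]): (Seq[String], String) =
    ctx match {
      // Input comes from a row variable: it can be passed to each helper as a
      // parameter, so every snippet can move into its own method.
      case SketchContext(Some(row), None) =>
        val helpers = evals.zipWithIndex.map { case (code, i) =>
          s"private void apply_$i(InternalRow $row) {\n  $code\n}"
        }
        val calls = evals.indices.map(i => s"apply_$i($row);").mkString("\n")
        (helpers, calls)
      // No row, or inputs bound to currentVars (locals of the enclosing
      // method that a separate method cannot see): keep everything inline.
      case _ =>
        (Nil, evals.mkString("\n"))
    }
}
```

With a row and no `currentVars`, two snippets become two helpers plus the calls `apply_0(i);` and `apply_1(i);`; with `currentVars` set, the snippets are simply joined, as before the change.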
In fact, it won't cause a problem. In that case, it works as before, i.e., `expressions.mkString("\n")`.
This is why I made `ev.isNull` and `ev.value` fields of the generated class: that way they can be used in the individual functions just as before. If you want, I can try the other overload of `splitExpressions` instead, which does not take the input row.
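The approach described above for `Coalesce` can be sketched as follows: the result flag and value are declared as fields of the generated class, so every helper method can read and assign them, and each helper only evaluates its child while the result is still null. This is a simplified model, not the PR's code: `CoalesceSketch`, `ChildEval`, and the `coalesce_N` helper names are hypothetical, and each child gets its own helper instead of being grouped by size.

```scala
object CoalesceSketch {
  // Hypothetical: one child's generated Java, which computes `isNull`/`value`.
  final case class ChildEval(code: String, isNull: String, value: String)

  /** Returns (field declarations, helper methods, code to emit inline). */
  def generate(
      isNull: String,
      value: String,
      javaType: String,
      defaultValue: String,
      row: String,
      children: Seq[ChildEval]): (Seq[String], Seq[String], String) = {
    // The result is stored in fields of the generated class rather than in
    // locals, so every helper method can read and assign it.
    val fields = Seq(s"private boolean $isNull;", s"private $javaType $value;")
    // Each helper evaluates its child only while no non-null value was found.
    val helpers = children.zipWithIndex.map { case (c, i) =>
      s"""private void coalesce_$i(InternalRow $row) {
         |  if ($isNull) {
         |    ${c.code}
         |    if (!${c.isNull}) {
         |      $isNull = false;
         |      $value = ${c.value};
         |    }
         |  }
         |}""".stripMargin
    }
    // Reset the result, then call the helpers in argument order.
    val callSite = (Seq(s"$isNull = true;", s"$value = $defaultValue;") ++
      children.indices.map(i => s"coalesce_$i($row);")).mkString("\n")
    (fields, helpers, callSite)
  }
}
```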
Ok. nvm. looks good.
}.mkString("\n")
}
val code = ctx.splitExpressions(ctx.INPUT_ROW, evals)
ditto.
${ctx.javaType(dataType)} ${ev.value} = ${firstEval.value};""" +
rest.map { e =>
ctx.addMutableState("boolean", ev.isNull, s"")
ctx.addMutableState(ctx.javaType(dataType), ev.value, s"")
s"" -> ""
@@ -357,7 +358,8 @@ case class AtLeastNNonNulls(n: Int, children: Seq[Expression]) extends Predicate
override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
val nonnull = ctx.freshName("nonnull")
val code = children.map { e =>
ctx.addMutableState("int", nonnull, s"")
s"" -> "".
Thanks! Anyway, I am refactoring this, since I figured out a way to avoid declaring a global field. Unfortunately, I can't do the same for `Coalesce`, because there I'd need to return two values (the null flag and the value) from the split methods.
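The refactoring mentioned above for `AtLeastNNonNulls` can be sketched by threading the counter through the helpers: each one takes the running count as an `int` argument and returns the updated count, and the caller folds the calls as `nonnull = f(row, nonnull);`, so the counter stays a local variable. A single `int` return value is what makes this work; `Coalesce` would need two. The object name and the `atLeastNNonNulls_N` helper names below are hypothetical; only the shape of the generated Java is the point.

```scala
object AtLeastNNonNullsSketch {
  /** Returns (helper methods, code to emit inline); `counter` stays a local. */
  def generate(counter: String, row: String, childChecks: Seq[String]): (Seq[String], String) = {
    // Each helper receives the running count as an argument and returns the
    // updated count, so no field of the generated class is needed.
    val helpers = childChecks.zipWithIndex.map { case (check, i) =>
      s"""private int atLeastNNonNulls_$i(InternalRow $row, int $counter) {
         |  $check
         |  return $counter;
         |}""".stripMargin
    }
    // The caller declares the local and folds the helper calls into it.
    val folded = childChecks.indices.map(i => s"$counter = atLeastNNonNulls_$i($row, $counter);")
    (helpers, (s"int $counter = 0;" +: folded).mkString("\n"))
  }
}
```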
LGTM
Test build #83742 has finished for PR 19720 at commit
This reverts commit 911e172.
Test build #83743 has finished for PR 19720 at commit
LGTM
Test build #83754 has finished for PR 19720 at commit
Test build #83781 has finished for PR 19720 at commit
Test build #83797 has finished for PR 19720 at commit
"""
},
foldFunctions = { funcCalls =>
funcCalls.map { funcCall => s"$nonnull = $funcCall;" }.mkString("\n")
funcCalls.map(funcCall => s"$nonnull = $funcCall;").mkString("\n")
val code = if (ctx.INPUT_ROW == null || ctx.currentVars != null) {
evals.mkString("\n")
} else {
ctx.splitExpressions(evals, "atLeastNNonNull",
nit: atLeastNNonNulls
@@ -357,7 +358,7 @@ case class AtLeastNNonNulls(n: Int, children: Seq[Expression]) extends Predicate
override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
val nonnull = ctx.freshName("nonnull")
Does it really matter that `nonnull` is not a global variable? The code is simpler if it is global.
I think it does, because having it as a global variable may put more pressure on the constant pool (see SPARK-18016). Thus, whenever feasible, I do think that we should keep it local.
Test build #83843 has finished for PR 19720 at commit
LGTM
boolean ${ev.isNull} = ${firstEval.isNull};
${ctx.javaType(dataType)} ${ev.value} = ${firstEval.value};""" +
rest.map { e =>
ctx.addMutableState("boolean", ev.isNull, "")
Can we ensure `ev.isNull` always has a variable name? In other words, `ev.isNull` is never `true` or `false`.
The previous code makes the same assumption, so yes.
This is guaranteed by `Expression.genCode`.
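Why the guarantee matters: `ctx.addMutableState("boolean", ev.isNull, "")` turns `ev.isNull` into a field declaration of the generated class, so a literal such as `false` would yield uncompilable Java along the lines of `boolean false;`. A hypothetical check of that assumption, simplified to reject only the literals `true`, `false`, and `null` (not other Java reserved words):

```scala
object FieldNameCheck {
  // Literal values an isNull string could hold if it were not a variable name.
  private val javaLiterals = Set("true", "false", "null")

  /**
   * True if `name` can be used as a field name in generated Java, e.g. in a
   * declaration like `private boolean <name>;`. Simplified: it rejects the
   * literals above but not other Java reserved words such as `int`.
   */
  def isUsableAsFieldName(name: String): Boolean =
    name.nonEmpty &&
      Character.isJavaIdentifierStart(name.head) &&
      name.tail.forall(c => Character.isJavaIdentifierPart(c)) &&
      !javaLiterals.contains(name)
}
```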
…NNonNulls

## What changes were proposed in this pull request?

Both `Coalesce` and `AtLeastNNonNulls` can cause the 64KB limit exception when used with a lot of arguments and/or complex expressions. This PR splits their expressions in order to avoid the issue.

## How was this patch tested?

Added UTs

Author: Marco Gaido <marcogaido91@gmail.com>
Author: Marco Gaido <mgaido@hortonworks.com>

Closes #19720 from mgaido91/SPARK-22494.

(cherry picked from commit 4e7f07e)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Thanks, merging to master/2.2!
@cloud-fan please do not backport this to 2.2. In 2.2 we don't have SPARK-18016, and this PR adds new global variables in the case of `Coalesce`. Thus it can put higher pressure on the constant pool, which may even cause a regression IMHO.
Hmm, isn't running slower better than not being able to run at all?
It's not about running slower. This PR solves the problem that makes the user face an exception if there are a lot of arguments in … The same thing is true for all the other PRs similar to this one submitted by @kiszk. So we should keep all these changes on master only, where part of SPARK-18016 is landing and will hopefully be completely solved soon.
If a query has a lot of coalesce functions, wouldn't it hit the 64KB issue?
No, a query with a …
I don't have a strong preference, but there were many 64KB compile-error fixes for 2.2 or earlier (e.g. …). @kiszk, what do you think?
## What changes were proposed in this pull request?
Both `Coalesce` and `AtLeastNNonNulls` can cause the 64KB limit exception when used with a lot of arguments and/or complex expressions. This PR splits their expressions in order to avoid the issue.
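The core idea, moving generated code into several smaller methods so that no single method exceeds the JVM's 64KB bytecode limit, can be sketched as greedy grouping by code size. This is an illustrative model under assumptions: the 1024-character threshold, the `apply_N` helper names, and the `MethodSplitter` object are hypothetical, and source length is only a rough proxy for bytecode size.

```scala
object MethodSplitter {
  /**
   * Greedily groups snippets so each group's total length stays within
   * `threshold` characters; a single oversized snippet gets its own group.
   */
  def group(snippets: Seq[String], threshold: Int): Vector[Vector[String]] =
    snippets.foldLeft(Vector.empty[Vector[String]]) { (groups, s) =>
      groups.lastOption match {
        case Some(current) if current.map(_.length).sum + s.length <= threshold =>
          groups.init :+ (current :+ s)
        case _ =>
          groups :+ Vector(s)
      }
    }

  /** Returns (helper methods, calls that replace the original inline code). */
  def split(snippets: Seq[String], row: String, threshold: Int = 1024): (Seq[String], String) = {
    val groups = group(snippets, threshold)
    // One helper per group; each receives the input row as a parameter.
    val helpers = groups.zipWithIndex.map { case (body, i) =>
      s"private void apply_$i(InternalRow $row) {\n${body.mkString("\n")}\n}"
    }
    val calls = groups.indices.map(i => s"apply_$i($row);").mkString("\n")
    (helpers, calls)
  }
}
```

For example, with a threshold of 8, the snippets `"aaaa"`, `"bbbb"`, `"cc"` fall into two groups, so the original method shrinks to two helper calls.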
## How was this patch tested?
Added UTs