[SPARK-21871][SQL] Check actual bytecode size when compiling generated code #19083
Conversation
@@ -30,8 +30,15 @@ SELECT a + 2, COUNT(b) FROM testData GROUP BY a + 1;
SELECT a + 1 + 1, COUNT(b) FROM testData GROUP BY a + 1;

-- Aggregate with nulls.
--
-- In SPARK-21871, we added code to check the bytecode size of gen'd methods. If the size
Test build #81241 has finished for PR 19083 at commit
Test build #81245 has finished for PR 19083 at commit
I like this approach of checking the exact bytecode size instead of the source-code-string-size-based heuristics. Some inline comments:
// Error: VM option 'HugeMethodLimit' is develop and is available only in debug version of VM.
// Error: Could not create the Java Virtual Machine.
// Error: A fatal exception has occurred. Program will exit.
private val hugeMethodLimit = 8000
Actually, this threshold is only meaningful on the HotSpot VM and some HotSpot-derived JVMs. Other JVMs, for instance IBM J9, don't use the same threshold. It'd be a bit too strict and biased to make this non-configurable behavior for all JVMs.
I'd suggest that if we are to do this, we at least centralize the JVM detection logic somewhere (e.g. unify with the JVM detection logic in org.apache.spark.util.SizeEstimator) and only set this kind of threshold based on the detected JVM. That way we're much less likely to regress on other JVMs, and/or even on future versions of the HotSpot VM, where it'll get a new JIT compiler (Graal) with JIT compilation heuristics that could differ from the current version.
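The centralized-detection idea could be sketched roughly as below. This is a hypothetical helper, not Spark's actual code; the class and method names are made up, and the VM-name substrings checked are illustrative of how detection via the `java.vm.name` system property tends to work:

```java
// Sketch (hypothetical, not Spark's actual code): pick a huge-method
// threshold based on which JVM we detect from the "java.vm.name" property.
final class HugeMethodLimits {
    // HotSpot and HotSpot-derived VMs typically report names containing
    // "HotSpot" or "OpenJDK"; J9-family VMs report "J9" or "OpenJ9".
    static boolean isHotSpotLike(String vmName) {
        return vmName.contains("HotSpot") || vmName.contains("OpenJDK");
    }

    // HotSpot refuses to JIT-compile methods larger than 8000 bytes of
    // bytecode by default (-XX:HugeMethodLimit). For VMs whose threshold
    // we don't know, fall back to the JVM spec's 64KB method-size cap,
    // which effectively disables the check.
    static int defaultLimit(String vmName) {
        return isHotSpotLike(vmName) ? 8000 : 65535;
    }
}
```

A caller would then seed the configurable threshold with `HugeMethodLimits.defaultLimit(System.getProperty("java.vm.name"))` and let a conf flag override it.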
Thanks for your comment! Yea, I agree. I'll rethink this PR based on your suggestion.
BTW, does IBM J9 have the same threshold for too-long functions? (I'm not familiar with the IBM JVM...)
oh, you know @kiszk, haha.
While the IBM JDK has a similar option, -Xjit:acceptHugeMethods, it unfortunately does not disclose its threshold.
My suggestion is to make this threshold configurable via SQLConf. Is it possible to do this?
I see. I feel detecting specific JVM implementations might go too far for this purpose. Yea, one option is to add an internal option for this. WDYT? @rednaxelafx
Sure, I'm fine with having a SQLConf flag for the threshold for now, with the option to move to a centralized JVM detection implementation later, where we could still have a SQLConf flag to override the heuristic based on detection.
Setting such a threshold to 64KB will effectively turn the restriction off (i.e. the same behavior as -XX:-DontCompileHugeMethods), so having a simple int conf should be good enough.
yea, ok! Thanks!
@@ -1091,6 +1111,7 @@ object CodeGenerator extends Logging {
}

// Then walk the classes to get at the method bytecode.
val methodsToByteCodeSize = mutable.ArrayBuffer[(String, Int)]()
Since we know the number of methods right below (for each generated class, cf.methodInfos would give us the number of methods in that class, of which we expect almost all to have a Code attribute), making pre-sized arrays for each class and then concatenating those arrays into a Seq as the result would avoid the potential resizing waste of a big default-sized ArrayBuffer.
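The pre-sizing suggestion, sketched in Java (Spark's actual code is Scala; the shape of the data here is hypothetical, with each inner list standing for one generated class's method bytecode sizes):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the reviewer's suggestion: since each class's method count is
// known up front (cf.methodInfos in the real code), the result list can be
// allocated at its exact final size instead of growing a default-sized buffer.
final class MethodSizes {
    static List<Integer> concat(List<List<Integer>> perClass) {
        int total = 0;
        for (List<Integer> methods : perClass) {
            total += methods.size(); // known before any copying happens
        }
        List<Integer> all = new ArrayList<>(total); // pre-sized: no resize waste
        for (List<Integer> methods : perClass) {
            all.addAll(methods);
        }
        return all;
    }
}
```

The win is avoiding the doubling-and-copying a dynamically grown buffer does when the final size is already computable.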
val errMsg = intercept[IllegalArgumentException] {
  CodeGenerator.compile(code)
}.getMessage
assert(errMsg.contains("the size of GeneratedClass.agg_doAggregateWithKeys is 9182 and " +
The hard-coded method name and size are a bit too strict. Can we relax them a little so the test is more resilient to the naming and size of the generated classes/methods?
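One way to relax the assertion, sketched as a hypothetical helper (the regex below matches the shape of the error message in this PR rather than an exact generated name and size):

```java
import java.util.regex.Pattern;

// Sketch: assert on the message's shape, not on the exact generated
// method name ("agg_doAggregateWithKeys") or the exact byte count (9182),
// so the test survives future changes to codegen naming and code size.
final class ErrMsgCheck {
    static final Pattern HUGE_METHOD_MSG =
        Pattern.compile("the size of \\S+ is \\d+ and this value goes over");

    static boolean looksLikeHugeMethodError(String msg) {
        return HUGE_METHOD_MSG.matcher(msg).find();
    }
}
```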
@@ -1079,7 +1099,7 @@ object CodeGenerator extends Logging {
/**
 * Records the generated class and method bytecode sizes by inspecting janino private fields.
Better to update the method comment if we change the method's purpose.
oh, yea! Thanks! I'll update
throw new IllegalArgumentException(
  s"the size of $clazzName.$methodName is $byteCodeSize and this value goes over " +
  s"the HugeMethodLimit $hugeMethodLimit (JVM doesn't compile methods " +
  "larger than this limit)")
We designed it to show exception messages (see the handling of JaninoRuntimeException below) before falling back to non-whole-stage codegen execution. I think we should follow the same behavior. This exception will be hidden from the user for now in WholeStageCodegenExec.
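The fallback behavior the reviewer describes can be sketched like this (hypothetical names; in Spark the catch lives inside the whole-stage codegen path, which reverts to interpreted execution rather than failing the query):

```java
import java.util.function.Supplier;

// Sketch of the log-and-fall-back pattern: try the compiled path; if the
// generated code is rejected (e.g. an oversized method), run the
// interpreted path instead of surfacing the exception to the user.
final class CodegenFallback {
    static <T> T compileOrFallback(Supplier<T> compiled, Supplier<T> interpreted) {
        try {
            return compiled.get(); // may throw on oversized/invalid generated code
        } catch (RuntimeException e) {
            // In the real code this is where the failure is logged before
            // falling back, mirroring the JaninoRuntimeException handling.
            return interpreted.get();
        }
    }
}
```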
ok
6100734 to 9c58237
Test build #81309 has finished for PR 19083 at commit
Test build #81313 has finished for PR 19083 at commit
Test build #81312 has finished for PR 19083 at commit
Test build #81314 has finished for PR 19083 at commit
Test build #81327 has finished for PR 19083 at commit
@viirya @rednaxelafx @kiszk Could you check again?
strExpr = Decode(Encode(strExpr, "utf-8"), "utf-8")
}
// Set the max value at `WHOLESTAGE_HUGE_METHOD_LIMIT` to compile gen'd code by janino
withSQLConf(SQLConf.WHOLESTAGE_HUGE_METHOD_LIMIT.key -> Int.MaxValue.toString) {
Why do we need to change the value of WHOLESTAGE_HUGE_METHOD_LIMIT while this test is not for whole-stage codegen? This parameter does not seem to be related to whole-stage codegen. Could we select a better name for SQLConf.WHOLESTAGE_HUGE_METHOD_LIMIT?
Yea, you're right. I think CODEGEN_HUGE_METHOD_LIMIT is better?
Test build #81498 has finished for PR 19083 at commit
retest this please
Test build #81507 has finished for PR 19083 at commit
@gatorsmile if you get time, could you check this? Thanks.
logError(msg)
val maxLines = SQLConf.get.loggingMaxLinesForCodegen
logInfo(s"\n${CodeFormatter.format(code, maxLines)}")
throw new IllegalArgumentException(msg)
Is this the best kind of exception?
CompileException is better?
LGTM except one comment
Test build #82348 has finished for PR 19083 at commit
Test build #82353 has finished for PR 19083 at commit
val CODEGEN_HUGE_METHOD_LIMIT = buildConf("spark.sql.codegen.hugeMethodLimit")
  .internal()
  .doc("The bytecode size of a single compiled Java function generated by whole-stage codegen." +
The bytecode -> The maximum bytecode
  .internal()
  .doc("The bytecode size of a single compiled Java function generated by whole-stage codegen." +
    "When the compiled function exceeds this threshold, " +
    "the whole-stage codegen is deactivated for this subtree of the current query plan. " +
This threshold is not for whole-stage only, right? It is misleading.
ya, you're right. I'll brush up.
updated
retest this please.
@maropu Thanks for working on it. LGTM except two minor comments.
if (maxCodeSize > sqlContext.conf.hugeMethodLimit) {
  logWarning(s"Found too long generated codes: the bytecode size was $maxCodeSize and " +
    s"this value went over the limit ${sqlContext.conf.hugeMethodLimit}. To avoid this, " +
    s"you can the limit ${SQLConf.WHOLESTAGE_HUGE_METHOD_LIMIT.key} higher:\n$treeString")
you can set the limit ... higher...
// Check if compiled code has a too large function
if (maxCodeSize > sqlContext.conf.hugeMethodLimit) {
  logWarning(s"Found too long generated codes: the bytecode size was $maxCodeSize and " +
We'd better explain why too-long code is a problem, as before: Found too long generated codes and JIT optimization might not work: ....
// Check if compiled code has a too large function
if (maxCodeSize > sqlContext.conf.hugeMethodLimit) {
  logWarning(s"Found too long generated codes: the bytecode size was $maxCodeSize and " +
    s"this value went over the limit ${sqlContext.conf.hugeMethodLimit}. To avoid this, " +
this value went over.... Whole-stage codegen disabled for this plan. To avoid this ...
@@ -151,7 +151,7 @@ class WholeStageCodegenSuite extends SparkPlanTest with SharedSQLContext {
}
}

def genGroupByCodeGenContext(caseNum: Int): CodegenContext = {
def genGroupByCodeGenContext(caseNum: Int): (CodegenContext, CodeAndComment) = {
The returned CodegenContext is not used. We don't need to return it.
Few minor comments, otherwise LGTM.
Thanks, I'll update soon
Test build #82435 has finished for PR 19083 at commit
fixed @gatorsmile
max function bytecode size: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
hugeMethodLimit = 8000 1043 / 1159 0.6 1591.5 1.0X
The original codegen = F case is removed? I think it is reasonable to compare with it.
yea, you're right; I'll update and thanks
import java.util.concurrent.ExecutionException

import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.expressions.codegen.{CodeAndComment, CodegenContext, CodeGenerator}
We don't use CodegenContext anymore and can remove it.
@@ -151,7 +151,7 @@ class WholeStageCodegenSuite extends SparkPlanTest with SharedSQLContext {
}
}

def genGroupByCodeGenContext(caseNum: Int): CodegenContext = {
def genGroupByCodeGenContext(caseNum: Int): CodeAndComment = {
genGroupByCodeGenContext -> genGroupByCode
Test build #82437 has finished for PR 19083 at commit
if (maxCodeSize > sqlContext.conf.hugeMethodLimit) {
  logWarning(s"Found too long generated codes and JIT optimization might not work: " +
    s"the bytecode size was $maxCodeSize, this value went over the limit " +
    s"${sqlContext.conf.hugeMethodLimit}, and the whole-stage codegen was disable " +
disable -> disabled
logWarning(s"Found too long generated codes and JIT optimization might not work: " +
  s"the bytecode size was $maxCodeSize, this value went over the limit " +
  s"${sqlContext.conf.hugeMethodLimit}, and the whole-stage codegen was disable " +
  s"for this plan. To avoid this, you can set the limit " +
set -> raise. Then, remove higher.
Test build #82442 has finished for PR 19083 at commit
Test build #82443 has finished for PR 19083 at commit
retest this please.
Test build #82445 has finished for PR 19083 at commit
Test build #82446 has finished for PR 19083 at commit
Thanks! Merged to master.
Author: Takeshi Yamamuro <yamamuro@apache.org>
Closes apache#19083 from maropu/SPARK-21871.
What changes were proposed in this pull request?

This PR added code to check the actual bytecode size when compiling generated code. In #18810, we added code to give up code compilation and use interpreter execution in SparkPlan if the line count of generated functions goes over maxLinesPerFunction. But we already have code to collect metrics for compiled bytecode size in the CodeGenerator object, so we could easily reuse that code for this purpose.

How was this patch tested?

Added tests in WholeStageCodegenSuite.