-
Notifications
You must be signed in to change notification settings - Fork 28.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-21720][SQL] Fix 64KB JVM bytecode limit problem with AND or OR #18972
Conversation
Test build #80793 has finished for PR 18972 at commit
|
Jenkins, retest this please |
Test build #80817 has finished for PR 18972 at commit
|
@@ -789,6 +789,36 @@ class CodegenContext { | |||
} | |||
|
|||
/** | |||
* Wrap the generated code of expression by a function. ev,isNull and ev.value are passed | |||
* by global variables |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not arbitrary codes all can be wrapped by this. The codes must be only created from a row object. We should note this in the comment.
Test build #80845 has finished for PR 18972 at commit
|
retest this please |
Test build #80853 has finished for PR 18972 at commit
|
ping @cloud-fan |
Jenkins, retest this please |
Test build #82675 has finished for PR 18972 at commit
|
Test build #82695 has finished for PR 18972 at commit
|
Jenkins, retest this please |
Test build #83164 has finished for PR 18972 at commit
|
Hi @kiszk , I have a question on this PR. |
@bali0019 It depends on the progress of the review. |
@cloud-fan could you review this if you have time? |
@@ -809,6 +809,36 @@ class CodegenContext { | |||
} | |||
|
|||
/** | |||
* Wrap the generated code of expression, which was created from a row object in INPUT_ROW, | |||
* by a function. ev,isNull and ev.value are passed by global variables |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: ev.isNull
val funcName = freshName(baseFuncName) | ||
val funcBody = | ||
s""" | ||
|private void $funcName(InternalRow ${INPUT_ROW}) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it work with whole stage codegen? the input is not InternalRow
but some variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it works only if ctx.currentVars == null
.
We will follow to support the whole stage codegen as follow-up in other PRs.
df.filter(filter).count() | ||
}.getMessage | ||
assert(e.contains("grows beyond 64 KB")) | ||
// SPARK-21720 avoids an exception due to JVM code size limit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should create a config for the threshold instead of hardcoding 1024
, then we can keep the test case here, by setting the threshold to Long.max.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, I agree with you that we should create a config.
Although I create a PR to add a config for a constant in CodeGenerator
, it revealed that we need additional (large) work to fix active session management.
Can we introduce a config after fixing active session management?
@@ -2067,7 +2067,7 @@ class DataFrameSuite extends QueryTest with SharedSQLContext { | |||
.count | |||
} | |||
|
|||
testQuietly("SPARK-19372: Filter can be executed w/o generated code due to JVM code size limit") { | |||
test("SPARK-19372: Filter can be executed w/o generated code due to JVM code size limit") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this test case becomes invalid as we won't trigger the codegen fallback branch now. Can we just ignore this test and add a TODO to say something about the config?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I will do it on Sunday.
Test build #83693 has finished for PR 18972 at commit
|
Test build #83740 has finished for PR 18972 at commit
|
This PR changes `AND` or `OR` code generation to place condition and then expressions' generated code into separated methods if these size could be large. When the method is newly generated, variables for `isNull` and `value` are declared as an instance variable to pass these values (e.g. `isNull1409` and `value1409`) to the callers of the generated method. This PR resolved two cases: * large code size of left expression * large code size of right expression Added a new test case into `CodeGenerationSuite` Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com> Closes #18972 from kiszk/SPARK-21720. (cherry picked from commit 9bf696d) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
thanks, merging to master/2.2! |
…essions ## What changes were proposed in this pull request? A frequently reported issue of Spark is the Java 64kb compile error. This is because Spark generates a very big method and it's usually caused by 3 reasons: 1. a deep expression tree, e.g. a very complex filter condition 2. many individual expressions, e.g. expressions can have many children, operators can have many expressions. 3. a deep query plan tree (with whole stage codegen) This PR focuses on 1. There are already several patches(apache#15620 apache#18972 apache#18641) trying to fix this issue and some of them are already merged. However this is an endless job as every non-leaf expression has this issue. This PR proposes to fix this issue in `Expression.genCode`, to make sure the code for a single expression won't grow too big. According to maropu 's benchmark, no regression is found with TPCDS (thanks maropu !): https://docs.google.com/spreadsheets/d/1K3_7lX05-ZgxDXi9X_GleNnDjcnJIfoSlSCDZcL4gdg/edit?usp=sharing ## How was this patch tested? existing test Author: Wenchen Fan <wenchen@databricks.com> Author: Wenchen Fan <cloud0fan@gmail.com> Closes apache#19767 from cloud-fan/codegen.
This PR changes `AND` or `OR` code generation to place condition and then expressions' generated code into separated methods if these size could be large. When the method is newly generated, variables for `isNull` and `value` are declared as an instance variable to pass these values (e.g. `isNull1409` and `value1409`) to the callers of the generated method. This PR resolved two cases: * large code size of left expression * large code size of right expression Added a new test case into `CodeGenerationSuite` Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com> Closes apache#18972 from kiszk/SPARK-21720. (cherry picked from commit 9bf696d) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
This PR changes `AND` or `OR` code generation to place condition and then expressions' generated code into separated methods if these size could be large. When the method is newly generated, variables for `isNull` and `value` are declared as an instance variable to pass these values (e.g. `isNull1409` and `value1409`) to the callers of the generated method. This PR resolved two cases: * large code size of left expression * large code size of right expression Added a new test case into `CodeGenerationSuite` Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com> Closes apache#18972 from kiszk/SPARK-21720. (cherry picked from commit 9bf696d) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?
This PR changes
AND
orOR
code generation to place condition and then expressions' generated code into separated methods if these size could be large. When the method is newly generated, variables forisNull
andvalue
are declared as an instance variable to pass these values (e.g.isNull1409
andvalue1409
) to the callers of the generated method.This PR resolved two cases:
How was this patch tested?
Added a new test case into
CodeGenerationSuite