[SPARK-21441][SQL]Incorrect Codegen in SortMergeJoinExec results failures in some cases #18656
Hi, @cloud-fan, @vanzin, could you help take a look?
Will CodegenFallback be used in whole-stage codegen? I think it's not supported.
Yeah, CodegenFallback just provides a fallback mode.
No. I meant that if there's a CodegenFallback expression, whole-stage codegen should not be enabled.
That's interesting. I will take a look at why codegen is enabled.
I notice that the CollapseCodegenStages rule will still enable codegen for SortMergeJoinExec without checking CodegenFallback expressions. The logic in Actually, I'm not familiar with this part, so please correct me if I get something wrong.
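As a rough illustration of the check under discussion, the toy Scala sketch below walks an expression tree and reports whether any node is a fallback expression. The types `Expr`, `Leaf`, `Compiled`, and `Fallback`, and the helper `containsFallback`, are hypothetical stand-ins and not Spark classes. The tree in `condition` is only an assumed shape for the filter in the reproduction from the PR description.

```scala
// Toy model only: these types are hypothetical stand-ins, not Spark's Expression classes.
sealed trait Expr { def children: Seq[Expr] }
final case class Leaf(name: String) extends Expr {
  val children: Seq[Expr] = Seq.empty
}
final case class Compiled(name: String, children: Seq[Expr]) extends Expr
// Stands in for an expression that can only be evaluated in interpreted (fallback) mode.
final case class Fallback(name: String, children: Seq[Expr]) extends Expr

object FallbackCheck {
  // True if the node itself or any descendant is a Fallback node.
  def containsFallback(e: Expr): Boolean = e match {
    case _: Fallback => true
    case other       => other.children.exists(containsFallback)
  }

  // Assumed shape of the filter from the reproduction:
  //   int = 2 or reflect(...) = 1
  val condition: Expr = Compiled("or", Seq(
    Compiled("=", Seq(Leaf("int"), Leaf("2"))),
    Compiled("=", Seq(Fallback("reflect", Seq(Leaf("str"))), Leaf("1")))
  ))
}
```

In this model a single fallback node anywhere in the condition is enough to make the whole operator ineligible, which is the behaviour the comment above expects from the rule.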
I think the check for
Can you try it? Thanks.
Great! I'm also considering disabling codegen for Moreover, I just wonder whether the current pattern order in Could you give any ideas? @davies
I have validated both cases with and without CodegenFallback expressions for
@@ -489,13 +489,13 @@ case class CollapseCodegenStages(conf: SQLConf) extends Rule[SparkPlan] {
   * Inserts an InputAdapter on top of those that do not support codegen.
   */
  private def insertInputAdapter(plan: SparkPlan): SparkPlan = plan match {
    case p if !supportCodegen(p) =>
      // collapse them recursively
      InputAdapter(insertWholeStageCodegen(p))
    case j @ SortMergeJoinExec(_, _, _, _, left, right) if j.supportCodegen =>
The previous pattern case already validates j.supportCodegen, so we don't need to verify it again.
SortMergeJoinExec.supportCodegen checks whether joinType.isInstanceOf[InnerLike]
Therefore, I think we should still verify it.
supportCodegen will call CodegenSupport.supportCodegen, so SortMergeJoinExec.supportCodegen is called then.
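The ordering point in this review thread can be sketched with a toy pattern match. Everything here is hypothetical and simplified: `Node`, `Join`, `Other`, `ruleSupports`, `joinFirst`, and `checkFirst` are illustrative names, not Spark's API. The sketch assumes, as the last comment says, that the rule-level check includes the operator's own check, so a guard on the join case after it would be redundant.

```scala
// Toy model only: none of these types or methods are Spark's.
sealed trait Node {
  def selfSupports: Boolean    // the operator's own codegen check
  def hasFallbackExpr: Boolean // whether any of its expressions needs fallback
}
final case class Join(selfSupports: Boolean, hasFallbackExpr: Boolean) extends Node
final case class Other(selfSupports: Boolean, hasFallbackExpr: Boolean) extends Node

object PatternOrder {
  // Stand-in for the rule-level check: operator check AND no fallback expression.
  def ruleSupports(n: Node): Boolean = n.selfSupports && !n.hasFallbackExpr

  // Join case first: only the operator's own check is consulted for joins,
  // so a fallback expression in the join goes unnoticed.
  def joinFirst(n: Node): String = n match {
    case j: Join if j.selfSupports => "codegen"
    case p if !ruleSupports(p)     => "adapter"
    case _                         => "codegen"
  }

  // Rule-level check first: any join reaching the second case has already
  // passed ruleSupports, so it needs no guard of its own.
  def checkFirst(n: Node): String = n match {
    case p if !ruleSupports(p) => "adapter"
    case _: Join               => "codegen"
    case _                     => "codegen"
  }
}
```

With a join whose own check passes but whose condition has a fallback expression, `joinFirst` still selects codegen while `checkFirst` falls back to the adapter path.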
Btw, can you also add a test for this? Thanks.
And please also add the SQL tag to the PR title, e.g., [SPARK-21441][SQL]. Thanks.
Thanks for reviewing. I will add a test later.
@cloud-fan Can you help trigger the Jenkins test for this? Thanks.
ok to test
LGTM, can you update the PR description?
Test build #79738 has finished for PR 18656 at commit
LGTM for the code change, but I think we'd better have a test for this.
Test build #79749 has finished for PR 18656 at commit
Test build #79751 has finished for PR 18656 at commit
…lures in some cases

## What changes were proposed in this pull request?

https://issues.apache.org/jira/projects/SPARK/issues/SPARK-21441

This issue can be reproduced by the following example:

```
val spark = SparkSession
  .builder()
  .appName("smj-codegen")
  .master("local")
  .config("spark.sql.autoBroadcastJoinThreshold", "1")
  .getOrCreate()
val df1 = spark.createDataFrame(Seq((1, 1), (2, 2), (3, 3))).toDF("key", "int")
val df2 = spark.createDataFrame(Seq((1, "1"), (2, "2"), (3, "3"))).toDF("key", "str")
val df = df1.join(df2, df1("key") === df2("key"))
  .filter("int = 2 or reflect('java.lang.Integer', 'valueOf', str) = 1")
  .select("int")
df.show()
```

To conclude, the issue happens when:
(1) SortMergeJoin condition contains CodegenFallback expressions.
(2) In PhysicalPlan tree, SortMergeJoin node is the child of root node, e.g., the Project in above example.

This patch fixes the logic in `CollapseCodegenStages` rule.

## How was this patch tested?

Unit test and manual verification in our cluster.

Author: donnyzone <wellfengzhu@gmail.com>

Closes #18656 from DonnyZone/Fix_SortMergeJoinExec.

(cherry picked from commit 6b6dd68)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
thanks, merging to master/2.2/2.1!
## What changes were proposed in this pull request?

https://issues.apache.org/jira/projects/SPARK/issues/SPARK-21441

This issue can be reproduced by the following example:

To conclude, the issue happens when:
(1) SortMergeJoin condition contains CodegenFallback expressions.
(2) In PhysicalPlan tree, SortMergeJoin node is the child of root node, e.g., the Project in above example.

This patch fixes the logic in `CollapseCodegenStages` rule.

## How was this patch tested?

Unit test and manual verification in our cluster.