New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-43113][SQL][FOLLOWUP] Add comment about copying steam-side variables #40881
Closed
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
HyukjinKwon
approved these changes
Apr 21, 2023
will leave it to @cloud-fan though. |
cloud-fan
approved these changes
Apr 21, 2023
thanks, merging to master/3.4! |
cloud-fan
pushed a commit
that referenced
this pull request
Apr 21, 2023
…iables ### What changes were proposed in this pull request? Add a comment explaining a tricky situation involving the evaluation of stream-side variables. This is a follow-up to #40766. ### Why are the changes needed? Make the code more clear. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? N/A Closes #40881 from bersprockets/SPARK-43113_followup. Authored-by: Bruce Robbins <bersprockets@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit 6d4ed13) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
bersprockets
added a commit
to bersprockets/spark
that referenced
this pull request
Apr 23, 2023
…iables ### What changes were proposed in this pull request? Add a comment explaining a tricky situation involving the evaluation of stream-side variables. This is a follow-up to apache#40766. ### Why are the changes needed? Make the code more clear. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? N/A Closes apache#40881 from bersprockets/SPARK-43113_followup. Authored-by: Bruce Robbins <bersprockets@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
HyukjinKwon
pushed a commit
that referenced
this pull request
Apr 24, 2023
…g code for a bound condition ### What changes were proposed in this pull request? This is a back-port of #40766 and #40881. In `JoinCodegenSupport#getJoinCondition`, evaluate any referenced stream-side variables before using them in the generated code. This patch doesn't evaluate the passed stream-side variables directly, but instead evaluates a copy (`streamVars2`). This is because `SortMergeJoin#codegenFullOuter` will want to evaluate the stream-side vars within a different scope than the condition check, so we mustn't delete the initialization code from the original `ExprCode` instances. ### Why are the changes needed? When a bound condition of a full outer join references the same stream-side column more than once, wholestage codegen generates bad code. For example, the following query fails with a compilation error: ``` create or replace temp view v1 as select * from values (1, 1), (2, 2), (3, 1) as v1(key, value); create or replace temp view v2 as select * from values (1, 22, 22), (3, -1, -1), (7, null, null) as v2(a, b, c); select * from v1 full outer join v2 on key = a and value > b and value > c; ``` The error is: ``` org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 277, Column 9: Redefinition of local variable "smj_isNull_7" ``` The same error occurs with code generated from ShuffleHashJoinExec: ``` select /*+ SHUFFLE_HASH(v2) */ * from v1 full outer join v2 on key = a and value > b and value > c; ``` In this case, the error is: ``` org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 174, Column 5: Redefinition of local variable "shj_value_1" ``` Neither `SortMergeJoin#codegenFullOuter` nor `ShuffledHashJoinExec#doProduce` evaluate the stream-side variables before calling `consumeFullOuterJoinRow#getJoinCondition`. As a result, `getJoinCondition` generates definition/initialization code for each referenced stream-side variable at the point of use. If a stream-side variable is used more than once in the bound condition, the definition/initialization code is generated more than once, resulting in the "Redefinition of local variable" error. In the end, the query succeeds, since Spark disables wholestage codegen and tries again. (In the case other join-type/strategy pairs, either the implementations don't call `JoinCodegenSupport#getJoinCondition`, or the stream-side variables are pre-evaluated before the call is made, so no error happens in those cases). ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? New unit tests. Closes #40917 from bersprockets/full_join_codegen_issue_br33. Authored-by: Bruce Robbins <bersprockets@gmail.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
snmvaughan
pushed a commit
to snmvaughan/spark
that referenced
this pull request
Jun 20, 2023
…iables ### What changes were proposed in this pull request? Add a comment explaining a tricky situation involving the evaluation of stream-side variables. This is a follow-up to apache#40766. ### Why are the changes needed? Make the code more clear. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? N/A Closes apache#40881 from bersprockets/SPARK-43113_followup. Authored-by: Bruce Robbins <bersprockets@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit 6d4ed13) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
GladwinLee
pushed a commit
to lyft/spark
that referenced
this pull request
Oct 10, 2023
…iables ### What changes were proposed in this pull request? Add a comment explaining a tricky situation involving the evaluation of stream-side variables. This is a follow-up to apache#40766. ### Why are the changes needed? Make the code more clear. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? N/A Closes apache#40881 from bersprockets/SPARK-43113_followup. Authored-by: Bruce Robbins <bersprockets@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit 6d4ed13) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
catalinii
pushed a commit
to lyft/spark
that referenced
this pull request
Oct 10, 2023
…iables ### What changes were proposed in this pull request? Add a comment explaining a tricky situation involving the evaluation of stream-side variables. This is a follow-up to apache#40766. ### Why are the changes needed? Make the code more clear. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? N/A Closes apache#40881 from bersprockets/SPARK-43113_followup. Authored-by: Bruce Robbins <bersprockets@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit 6d4ed13) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
What changes were proposed in this pull request?
Add a comment explaining a tricky situation involving the evaluation of stream-side variables.
This is a follow-up to #40766.
Why are the changes needed?
Make the code more clear.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
N/A