[SPARK-20534][SQL] Make outer generate exec return empty rows #17810
Test build #76309 has finished for PR 17810 at commit
Test build #76310 has finished for PR 17810 at commit
Test build #76311 has finished for PR 17810 at commit
@@ -91,7 +91,7 @@ class GeneratorFunctionSuite extends QueryTest with SharedSQLContext {
     val df = Seq((1, Seq(1, 2, 3)), (2, Seq())).toDF("a", "intList")
     checkAnswer(
       df.select(explode_outer('intList)),
-      Row(1) :: Row(2) :: Row(3) :: Nil)
+      Row(1) :: Row(2) :: Row(3) :: Row(null) :: Nil)
Based on the definition of `outer`, `outer` has no effect when `join` is false. That means `df.select(explode_outer('intList))` should have the same outcome as `df.select(explode('intList))`. Neither should generate `null`, right?
@param outer when true, each input row will be output at least once, even if the output of the given `generator` is empty. `outer` has no effect when `join` is false.
So I think we should move away from that definition, and make `outer` independent from `join`. The reason for this is that I think it is super confusing that the following queries produce different results:
df.select($"k", outer_explode($"v"))
df.select(outer_explode($"v"))
If you do not want the outer semantic, then just use a regular generator.
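
To make the proposed semantics concrete, here is a minimal sketch in plain Scala. It is a toy model and not Spark's `GenerateExec`: the names `InputRow`, `generate`, `withJoin`, and `withoutJoin` are hypothetical, and `None` stands in for SQL `null`. It only illustrates the idea that `outer` should behave the same way whether or not the input columns are kept.

```scala
// Toy model of the outer/join semantics discussed above.
// All names here are hypothetical; this is not Spark's implementation.
object OuterGenerateSketch {
  // One input row: a key `k` and a list `v` to explode.
  final case class InputRow(k: Int, v: Seq[Int])

  // Explode `v`. With outer = true, an empty list still yields one
  // element, None, which plays the role of a null output row.
  def generate(row: InputRow, outer: Boolean): Seq[Option[Int]] = {
    val out = row.v.map(Option(_))
    if (out.isEmpty && outer) Seq(None) else out
  }

  // join = true: keep the key next to each generated value.
  def withJoin(rows: Seq[InputRow], outer: Boolean): Seq[(Int, Option[Int])] =
    rows.flatMap(r => generate(r, outer).map(g => (r.k, g)))

  // join = false: only the generated values are returned, but `outer`
  // is honoured in exactly the same way as in withJoin.
  def withoutJoin(rows: Seq[InputRow], outer: Boolean): Seq[Option[Int]] =
    rows.flatMap(r => generate(r, outer))
}
```

In this model, dropping the key column from the `withJoin` result always gives the `withoutJoin` result, which is the consistency the comment above asks for.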
I see. Could we also update the document to explain the new behaviors?
Done.
LGTM
Test build #76325 has finished for PR 17810 at commit
## What changes were proposed in this pull request?

Generate exec does not produce `null` values if the generator output for the input row is empty and the generate operates in outer mode without join. This is caused by the fact that the `join=false` code path is different from the `join=true` code path, and that the `join=false` code path did not deal with outer properly. This PR addresses this issue.

## How was this patch tested?

Updated `outer*` tests in `GeneratorFunctionSuite`.

Author: Herman van Hovell <hvanhovell@databricks.com>

Closes #17810 from hvanhovell/SPARK-20534.

(cherry picked from commit 6b44c4d)
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
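
For readers who want to try the behavior described in this commit message, the following is a hedged sketch that assumes Spark 2.2 or later on the classpath and a local master. The object name `ExplodeOuterExample` and the helper `explodedValues` are made up for illustration; `explode` and `explode_outer` are the functions from `org.apache.spark.sql.functions`. The data mirrors the test in `GeneratorFunctionSuite`.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{explode, explode_outer}

// Illustrative only: object and helper names are hypothetical.
object ExplodeOuterExample {
  // Runs a single-column select with the given generator (no other
  // columns kept) and collects the values, mapping SQL null to None.
  def explodedValues(spark: SparkSession, gen: Column => Column): Seq[Option[Int]] = {
    import spark.implicits._
    val df = Seq((1, Seq(1, 2, 3)), (2, Seq.empty[Int])).toDF("a", "intList")
    df.select(gen($"intList"))
      .collect()
      .map(r => if (r.isNullAt(0)) None else Some(r.getInt(0)))
      .toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("explode-outer-sketch")
      .getOrCreate()
    try {
      // Regular generator: the row with the empty list contributes nothing.
      println(explodedValues(spark, explode))
      // Outer generator: with this fix, the empty list contributes one null.
      println(explodedValues(spark, explode_outer))
    } finally {
      spark.stop()
    }
  }
}
```

With the fix in place, the `explode_outer` call should return four values including one null, matching the updated `checkAnswer` expectation; without it, both calls return the same three values.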
Thanks! Merging to master/2.2. Since https://issues.apache.org/jira/browse/SPARK-13721 was only merged to the 2.2 branch, I did not backport it to earlier versions.