[SPARK-22206][SQL][SparkR] gapply in R can't work on empty grouping columns #19436
Test build #82468 has finished for PR 19436.
OK. The added test confirms that this is an issue. See the test result at https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82468/testReport.
e0af5d6 to 6710141
LGTM. Mind opening a JIRA?
@HyukjinKwon Yeah, give me a few minutes. Thanks.
@@ -394,7 +394,11 @@ case class FlatMapGroupsInRExec(
   override def producedAttributes: AttributeSet = AttributeSet(outputObjAttr)

   override def requiredChildDistribution: Seq[Distribution] =
-    ClusteredDistribution(groupingAttributes) :: Nil
+    if (groupingAttributes.isEmpty) {
+      AllTuples :: Nil
should empty == all?
You mean empty grouping attributes == all tuples? Yeah, I think having no grouping attributes means all tuples are in one group.
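To make the "empty == all" point concrete, here is a minimal, self-contained sketch of the branching in the diff. The `Distribution`, `AllTuples`, and `ClusteredDistribution` definitions below are simplified stand-ins written for this illustration (grouping attributes are plain strings); they are not Spark's actual classes, and `RequiredDistributionSketch` is a hypothetical name.

```scala
// Simplified stand-ins for illustration only; NOT Spark's real Distribution classes.
sealed trait Distribution
case object AllTuples extends Distribution
final case class ClusteredDistribution(clustering: Seq[String]) extends Distribution

object RequiredDistributionSketch {
  // Mirrors the shape of the change in the diff: with no grouping
  // attributes, every row belongs to the same single group, so the
  // operator asks for all tuples together instead of clustering on
  // an empty key list.
  def requiredChildDistribution(groupingAttributes: Seq[String]): Seq[Distribution] =
    if (groupingAttributes.isEmpty) {
      AllTuples :: Nil
    } else {
      ClusteredDistribution(groupingAttributes) :: Nil
    }

  def main(args: Array[String]): Unit = {
    // Empty grouping columns, as in gapply(df, c(), ...).
    println(requiredChildDistribution(Seq.empty))
    // Non-empty grouping columns keep the original clustered requirement.
    println(requiredChildDistribution(Seq("a", "b")))
  }
}
```

Under these assumptions, the non-empty case is unchanged from the old behavior, and only the empty case takes the new branch.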
Test build #82465 has finished for PR 19436.
6710141 to 0e111a8
Test build #82470 has finished for PR 19436.
Test build #82472 has finished for PR 19436.
retest this please.
Test build #82473 has finished for PR 19436.
Let me set up an R environment to test it locally...
# gapply on empty grouping columns.
dfTwoPartition <- repartition(df, 2L)
df1TwoPartition <- gapply(dfTwoPartition, c(), function(key, x) { x }, schema(dfTwoPartition))
expect_identical(sort(collect(df1TwoPartition)), sort(expected))
Actually, I tested that this simpler case verifies the change, comparing the behavior before and after this PR:
df1 <- gapply(df, c(), function(key, x) { x }, schema(df))
actual <- collect(df1)
expect_identical(actual, expected)
Hmm, I think it should work. `repartition` is not necessary. I'm just wondering how to test this in R...
Ok. Let me use your test code. I don't want to block this PR. :) Thanks.
0e111a8 to 71bf813
LGTM
Test build #82474 has finished for PR 19436.
…olumns

## What changes were proposed in this pull request?

Looks like `FlatMapGroupsInRExec.requiredChildDistribution` didn't consider empty grouping attributes. It should be a problem when running `EnsureRequirements` and `gapply` in R can't work on empty grouping columns.

## How was this patch tested?

Added test.

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes #19436 from viirya/fix-flatmapinr-distribution.

(cherry picked from commit ae61f18)
Signed-off-by: hyukjinkwon <gurwls223@gmail.com>
Merged to master, branch-2.2 and branch-2.1.
Thanks @HyukjinKwon @felixcheung
## What changes were proposed in this pull request?

Looks like `FlatMapGroupsInRExec.requiredChildDistribution` didn't consider empty grouping attributes. This is a problem when running `EnsureRequirements`, and as a result `gapply` in R can't work on empty grouping columns.

## How was this patch tested?

Added test.