Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-22206][SQL][SparkR] gapply in R can't work on empty grouping columns #19436

Closed
wants to merge 1 commit into from

Conversation

viirya
Copy link
Member

@viirya viirya commented Oct 5, 2017

What changes were proposed in this pull request?

Looks like FlatMapGroupsInRExec.requiredChildDistribution didn't consider empty grouping attributes. It should be a problem when running EnsureRequirements and gapply in R can't work on empty grouping columns.

How was this patch tested?

Added test.

@SparkQA
Copy link

SparkQA commented Oct 5, 2017

Test build #82468 has finished for PR 19436 at commit e0af5d6.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member Author

viirya commented Oct 5, 2017

Ok. The added test works to verify this is an issue. See the test result of https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82468/testReport.

@viirya viirya force-pushed the fix-flatmapinr-distribution branch from e0af5d6 to 6710141 Compare October 5, 2017 04:30
@HyukjinKwon
Copy link
Member

LGTM mind opening a JIRA?

@viirya
Copy link
Member Author

viirya commented Oct 5, 2017

@HyukjinKwon Yeah, wait me few minutes. Thanks.

@viirya viirya changed the title [SQL][WIP] Fix FlatMapGroupsInR's child distribution when grouping attributes are empty [SPARK-22206][SQL][SparkR] gapply in R can't work on empty grouping columns Oct 5, 2017
@@ -394,7 +394,11 @@ case class FlatMapGroupsInRExec(
override def producedAttributes: AttributeSet = AttributeSet(outputObjAttr)

override def requiredChildDistribution: Seq[Distribution] =
ClusteredDistribution(groupingAttributes) :: Nil
if (groupingAttributes.isEmpty) {
AllTuples :: Nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should empty == all?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean empty grouping attributes == all tuples? Yeah, I think no grouping attributes means all tuples are in the one group.

@SparkQA
Copy link

SparkQA commented Oct 5, 2017

Test build #82465 has finished for PR 19436 at commit 6710141.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya viirya force-pushed the fix-flatmapinr-distribution branch from 6710141 to 0e111a8 Compare October 5, 2017 06:22
@SparkQA
Copy link

SparkQA commented Oct 5, 2017

Test build #82470 has finished for PR 19436 at commit 6710141.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 5, 2017

Test build #82472 has finished for PR 19436 at commit 0e111a8.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member Author

viirya commented Oct 5, 2017

retest this please.

@SparkQA
Copy link

SparkQA commented Oct 5, 2017

Test build #82473 has finished for PR 19436 at commit 0e111a8.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member Author

viirya commented Oct 5, 2017

Let me install R environment to test it locally...

# gapply on empty grouping columns.
dfTwoPartition <- repartition(df, 2L)
df1TwoPartition <- gapply(dfTwoPartition, c(), function(key, x) { x }, schema(dfTwoPartition))
expect_identical(sort(collect(df1TwoPartition)), sort(expected))
Copy link
Member

@HyukjinKwon HyukjinKwon Oct 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I tested that this case verifies the changes before and after this PR:

df1 <- gapply(df, c(), function(key, x) { x }, schema(df))
actual <- collect(df1)
expect_identical(actual, expected)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, I think it should work. repartition is not necessary. I'm just wondering how to test this in R...

Copy link
Member Author

@viirya viirya Oct 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Let me use your test code. I don't want to block this PR. :) Thanks.

@viirya viirya force-pushed the fix-flatmapinr-distribution branch from 0e111a8 to 71bf813 Compare October 5, 2017 10:35
Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@SparkQA
Copy link

SparkQA commented Oct 5, 2017

Test build #82474 has finished for PR 19436 at commit 71bf813.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request Oct 5, 2017
…olumns

## What changes were proposed in this pull request?

Looks like `FlatMapGroupsInRExec.requiredChildDistribution` didn't consider empty grouping attributes. It should be a problem when running `EnsureRequirements` and `gapply` in R can't work on empty grouping columns.

## How was this patch tested?

Added test.

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes #19436 from viirya/fix-flatmapinr-distribution.

(cherry picked from commit ae61f18)
Signed-off-by: hyukjinkwon <gurwls223@gmail.com>
asfgit pushed a commit that referenced this pull request Oct 5, 2017
…olumns

## What changes were proposed in this pull request?

Looks like `FlatMapGroupsInRExec.requiredChildDistribution` didn't consider empty grouping attributes. It should be a problem when running `EnsureRequirements` and `gapply` in R can't work on empty grouping columns.

## How was this patch tested?

Added test.

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes #19436 from viirya/fix-flatmapinr-distribution.

(cherry picked from commit ae61f18)
Signed-off-by: hyukjinkwon <gurwls223@gmail.com>
@HyukjinKwon
Copy link
Member

Merged to master, branch-2.2 and branch-2.1.

@asfgit asfgit closed this in ae61f18 Oct 5, 2017
@viirya
Copy link
Member Author

viirya commented Oct 5, 2017

Thanks @HyukjinKwon @felixcheung

MatthewRBruce pushed a commit to Shopify/spark that referenced this pull request Jul 31, 2018
…olumns

## What changes were proposed in this pull request?

Looks like `FlatMapGroupsInRExec.requiredChildDistribution` didn't consider empty grouping attributes. It should be a problem when running `EnsureRequirements` and `gapply` in R can't work on empty grouping columns.

## How was this patch tested?

Added test.

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes apache#19436 from viirya/fix-flatmapinr-distribution.

(cherry picked from commit ae61f18)
Signed-off-by: hyukjinkwon <gurwls223@gmail.com>
@viirya viirya deleted the fix-flatmapinr-distribution branch December 27, 2023 18:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants