Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-7462] By default retain group by columns in aggregate #5996

Closed
wants to merge 7 commits into from

Conversation

rxin
Copy link
Contributor

@rxin rxin commented May 8, 2015

Updated Java, Scala, Python, and R.

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@SparkQA
Copy link

SparkQA commented May 8, 2015

Test build #32181 has started for PR 5996 at commit 1e6e666.

@SparkQA
Copy link

SparkQA commented May 8, 2015

Test build #32181 has finished for PR 5996 at commit 1e6e666.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Merged build finished. Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32181/
Test FAILed.

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@SparkQA
Copy link

SparkQA commented May 8, 2015

Test build #32192 has started for PR 5996 at commit d910141.

@SparkQA
Copy link

SparkQA commented May 8, 2015

Test build #32192 has finished for PR 5996 at commit d910141.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class ElementwiseProduct extends UnaryTransformer[Vector, Vector, ElementwiseProduct]
    • class ElementwiseProduct(val scalingVector: Vector) extends VectorTransformer
    • trait Star extends NamedExpression with trees.LeafNode[Expression]
    • trait CaseWhenLike extends Expression
    • case class CaseWhen(branches: Seq[Expression]) extends CaseWhenLike
    • case class CaseKeyWhen(key: Expression, branches: Seq[Expression]) extends CaseWhenLike

@AmplabJenkins
Copy link

Merged build finished. Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32192/
Test FAILed.

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@SparkQA
Copy link

SparkQA commented May 8, 2015

Test build #32206 has started for PR 5996 at commit b8b87e1.

@shivaram
Copy link
Contributor

shivaram commented May 8, 2015

We can also remove the workaround we used in SparkR

# the GroupedData.agg(col, cols*) API does not contain grouping Column
with this change

Do you want to try this in this PR ? I can send a pull request to your branch too.

shivaram and others added 2 commits May 8, 2015 01:08
Revert workaround in SparkR to retain grouped cols
@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@SparkQA
Copy link

SparkQA commented May 8, 2015

Test build #32219 has started for PR 5996 at commit 5f923c0.

@SparkQA
Copy link

SparkQA commented May 8, 2015

Test build #32206 has finished for PR 5996 at commit b8b87e1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Merged build finished. Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32206/
Test PASSed.

@SparkQA
Copy link

SparkQA commented May 8, 2015

Test build #32219 has finished for PR 5996 at commit 5f923c0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Merged build finished. Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32219/
Test PASSed.

@@ -233,6 +236,9 @@ private[sql] class SQLConf extends Serializable {

private[spark] def dataFrameSelfJoinAutoResolveAmbiguity: Boolean =
getConf(DATAFRAME_SELF_JOIN_AUTO_RESOLVE_AMBIGUITY, "true").toBoolean

private[spark] def dataFrameRetainGroupColumns: Boolean =
getConf(DATAFRAME_RETAIN_GROUP_COLUMNS, "true").toBoolean
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Increasingly wondering if dataframe flags should be scoped (eager analysis affects sql(...) too and not just dataframe DSL functions).,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's talk more about this. if we want to do it, we should do it in 1.4.

rxin added 2 commits May 8, 2015 22:10
Conflicts:
	sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@SparkQA
Copy link

SparkQA commented May 9, 2015

Test build #32297 has started for PR 5996 at commit aac7119.

@SparkQA
Copy link

SparkQA commented May 9, 2015

Test build #32297 timed out for PR 5996 at commit aac7119 after a configured wait of 150m.

@AmplabJenkins
Copy link

Merged build finished. Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32297/
Test FAILed.

@rxin
Copy link
Contributor Author

rxin commented May 11, 2015

Jenkins, retest this please.

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@SparkQA
Copy link

SparkQA commented May 11, 2015

Test build #32375 has started for PR 5996 at commit aac7119.

@SparkQA
Copy link

SparkQA commented May 11, 2015

Test build #32375 has finished for PR 5996 at commit aac7119.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Merged build finished. Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32375/
Test PASSed.

@rxin
Copy link
Contributor Author

rxin commented May 11, 2015

I'm merging this in branch-1.4. I will submit a followup PR for documentation.

asfgit pushed a commit that referenced this pull request May 11, 2015
Updated Java, Scala, Python, and R.

Author: Reynold Xin <rxin@databricks.com>
Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

Closes #5996 from rxin/groupby-retain and squashes the following commits:

aac7119 [Reynold Xin] Merge branch 'groupby-retain' of github.com:rxin/spark into groupby-retain
f6858f6 [Reynold Xin] Merge branch 'master' into groupby-retain
5f923c0 [Reynold Xin] Merge pull request #15 from shivaram/sparkr-groupby-retrain
c1de670 [Shivaram Venkataraman] Revert workaround in SparkR to retain grouped cols Based on reverting code added in commit amplab-extras@9a6be74
b8b87e1 [Reynold Xin] Fixed DataFrameJoinSuite.
d910141 [Reynold Xin] Updated rest of the files
1e6e666 [Reynold Xin] [SPARK-7462] By default retain group by columns in aggregate

(cherry picked from commit 0a4844f)
Signed-off-by: Reynold Xin <rxin@databricks.com>
@asfgit asfgit closed this in 0a4844f May 11, 2015
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request May 28, 2015
Updated Java, Scala, Python, and R.

Author: Reynold Xin <rxin@databricks.com>
Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

Closes apache#5996 from rxin/groupby-retain and squashes the following commits:

aac7119 [Reynold Xin] Merge branch 'groupby-retain' of github.com:rxin/spark into groupby-retain
f6858f6 [Reynold Xin] Merge branch 'master' into groupby-retain
5f923c0 [Reynold Xin] Merge pull request apache#15 from shivaram/sparkr-groupby-retrain
c1de670 [Shivaram Venkataraman] Revert workaround in SparkR to retain grouped cols Based on reverting code added in commit amplab-extras@9a6be74
b8b87e1 [Reynold Xin] Fixed DataFrameJoinSuite.
d910141 [Reynold Xin] Updated rest of the files
1e6e666 [Reynold Xin] [SPARK-7462] By default retain group by columns in aggregate
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request Jun 12, 2015
Updated Java, Scala, Python, and R.

Author: Reynold Xin <rxin@databricks.com>
Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

Closes apache#5996 from rxin/groupby-retain and squashes the following commits:

aac7119 [Reynold Xin] Merge branch 'groupby-retain' of github.com:rxin/spark into groupby-retain
f6858f6 [Reynold Xin] Merge branch 'master' into groupby-retain
5f923c0 [Reynold Xin] Merge pull request apache#15 from shivaram/sparkr-groupby-retrain
c1de670 [Shivaram Venkataraman] Revert workaround in SparkR to retain grouped cols Based on reverting code added in commit amplab-extras@9a6be74
b8b87e1 [Reynold Xin] Fixed DataFrameJoinSuite.
d910141 [Reynold Xin] Updated rest of the files
1e6e666 [Reynold Xin] [SPARK-7462] By default retain group by columns in aggregate
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
Updated Java, Scala, Python, and R.

Author: Reynold Xin <rxin@databricks.com>
Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

Closes apache#5996 from rxin/groupby-retain and squashes the following commits:

aac7119 [Reynold Xin] Merge branch 'groupby-retain' of github.com:rxin/spark into groupby-retain
f6858f6 [Reynold Xin] Merge branch 'master' into groupby-retain
5f923c0 [Reynold Xin] Merge pull request apache#15 from shivaram/sparkr-groupby-retrain
c1de670 [Shivaram Venkataraman] Revert workaround in SparkR to retain grouped cols Based on reverting code added in commit amplab-extras@9a6be74
b8b87e1 [Reynold Xin] Fixed DataFrameJoinSuite.
d910141 [Reynold Xin] Updated rest of the files
1e6e666 [Reynold Xin] [SPARK-7462] By default retain group by columns in aggregate
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants