[SPARK-7462] By default retain group by columns in aggregate #5996

rxin · 2015-05-08T02:51:49Z

Updated Java, Scala, Python, and R.
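
For readers skimming the thread, the behavior named in the title can be illustrated with a small plain-Python sketch. This is not Spark code: `sum_by_group` and its `retain_group_columns` parameter are hypothetical names invented for this illustration, and the `sum(salary)` column label is only an assumed naming style. The sketch models an aggregate whose output rows either include or omit the grouping column.

```python
def sum_by_group(rows, group_col, value_col, retain_group_columns=True):
    """Sum value_col per distinct group_col value.

    Hypothetical helper for illustration only; it is not part of any Spark API.
    """
    totals = {}
    for row in rows:
        key = row[group_col]
        totals[key] = totals.get(key, 0) + row[value_col]

    sum_name = "sum(%s)" % value_col  # assumed output column label
    result = []
    for key, total in totals.items():
        out = {}
        if retain_group_columns:
            # Behavior described in the PR title: keep the grouping column.
            out[group_col] = key
        out[sum_name] = total
        result.append(out)
    return result


rows = [
    {"dept": "a", "salary": 10},
    {"dept": "b", "salary": 5},
    {"dept": "a", "salary": 7},
]

# Grouping column retained (the default in this sketch).
with_groups = sum_by_group(rows, "dept", "salary")
# Grouping column dropped: only the aggregate value remains in each row.
without_groups = sum_by_group(rows, "dept", "salary", retain_group_columns=False)
```

Without the grouping column, each output row carries a sum but no indication of which group it belongs to, which is presumably why retaining it is the more convenient default.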

AmplabJenkins · 2015-05-08T02:52:11Z

Merged build triggered.

AmplabJenkins · 2015-05-08T02:52:20Z

Merged build started.

SparkQA · 2015-05-08T02:53:25Z

Test build #32181 has started for PR 5996 at commit 1e6e666.

SparkQA · 2015-05-08T03:11:49Z

Test build #32181 has finished for PR 5996 at commit 1e6e666.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-05-08T03:11:52Z

Merged build finished. Test FAILed.

AmplabJenkins · 2015-05-08T03:11:53Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32181/
Test FAILed.

AmplabJenkins · 2015-05-08T04:02:13Z

Merged build triggered.

AmplabJenkins · 2015-05-08T04:02:20Z

Merged build started.

SparkQA · 2015-05-08T04:02:38Z

Test build #32192 has started for PR 5996 at commit d910141.

SparkQA · 2015-05-08T05:51:49Z

Test build #32192 has finished for PR 5996 at commit d910141.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class ElementwiseProduct extends UnaryTransformer[Vector, Vector, ElementwiseProduct]
- class ElementwiseProduct(val scalingVector: Vector) extends VectorTransformer
- trait Star extends NamedExpression with trees.LeafNode[Expression]
- trait CaseWhenLike extends Expression
- case class CaseWhen(branches: Seq[Expression]) extends CaseWhenLike
- case class CaseKeyWhen(key: Expression, branches: Seq[Expression]) extends CaseWhenLike

AmplabJenkins · 2015-05-08T05:51:54Z

Merged build finished. Test FAILed.

AmplabJenkins · 2015-05-08T05:51:55Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32192/
Test FAILed.

AmplabJenkins · 2015-05-08T06:07:11Z

Merged build triggered.

AmplabJenkins · 2015-05-08T06:07:16Z

Merged build started.

SparkQA · 2015-05-08T06:08:56Z

Test build #32206 has started for PR 5996 at commit b8b87e1.

shivaram · 2015-05-08T07:44:06Z

We can also remove the workaround we used in SparkR with this change:

spark/R/pkg/R/group.R, line 106 in f496bf3:

  # the GroupedData.agg(col, cols*) API does not contain grouping Column

Do you want to try this in this PR? I can send a pull request to your branch too.

AmplabJenkins · 2015-05-08T08:22:13Z

Merged build triggered.

AmplabJenkins · 2015-05-08T08:22:22Z

Merged build started.

SparkQA · 2015-05-08T08:24:08Z

Test build #32219 has started for PR 5996 at commit 5f923c0.

SparkQA · 2015-05-08T08:32:44Z

Test build #32206 has finished for PR 5996 at commit b8b87e1.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-05-08T08:32:49Z

Merged build finished. Test PASSed.

AmplabJenkins · 2015-05-08T08:32:50Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32206/
Test PASSed.

SparkQA · 2015-05-08T10:48:58Z

Test build #32219 has finished for PR 5996 at commit 5f923c0.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-05-08T10:49:03Z

Merged build finished. Test PASSed.

AmplabJenkins · 2015-05-08T10:49:03Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32219/
Test PASSed.

marmbrus · 2015-05-08T18:59:06Z

sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala

@@ -233,6 +236,9 @@ private[sql] class SQLConf extends Serializable {

  private[spark] def dataFrameSelfJoinAutoResolveAmbiguity: Boolean =
    getConf(DATAFRAME_SELF_JOIN_AUTO_RESOLVE_AMBIGUITY, "true").toBoolean
+
+  private[spark] def dataFrameRetainGroupColumns: Boolean =
+    getConf(DATAFRAME_RETAIN_GROUP_COLUMNS, "true").toBoolean


Increasingly wondering if DataFrame flags should be scoped (eager analysis affects sql(...) too, not just the DataFrame DSL functions).

rxin · 2015-05-09

Let's talk more about this. If we want to do it, we should do it in 1.4.
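
As a rough illustration of the accessor in the diff above, here is a plain-Python sketch of a string-valued setting that defaults to "true" and is parsed into a boolean on read. The `Conf` class and `parse_bool` helper are hypothetical, and the key string is an assumption, since the diff shows only the constant name `DATAFRAME_RETAIN_GROUP_COLUMNS`. The parsing is meant to resemble Scala's `toBoolean` (case-insensitive "true"/"false", an error otherwise), but treat that as an approximation rather than a statement about Spark's implementation.

```python
# Assumed key name; the diff shows only the constant, not its string value.
RETAIN_GROUP_COLUMNS_KEY = "spark.sql.retainGroupColumns"


def parse_bool(text):
    """Parse 'true'/'false' case-insensitively; reject anything else."""
    lowered = text.lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    raise ValueError("not a boolean string: %r" % text)


class Conf:
    """Hypothetical stand-in for a string-keyed, string-valued config store."""

    def __init__(self):
        self._settings = {}

    def set_conf(self, key, value):
        self._settings[key] = value

    def get_conf(self, key, default):
        # Fall back to the supplied default when the key was never set.
        return self._settings.get(key, default)

    @property
    def retain_group_columns(self):
        # Unset means "true", so retaining group columns is the default.
        return parse_bool(self.get_conf(RETAIN_GROUP_COLUMNS_KEY, "true"))


conf = Conf()
default_value = conf.retain_group_columns   # key unset, default applies
conf.set_conf(RETAIN_GROUP_COLUMNS_KEY, "false")
overridden_value = conf.retain_group_columns  # explicit opt-out
```

Because the default lives in the accessor rather than in the stored settings, a user who never touches the key gets the new behavior, while setting it to "false" restores the old one.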

AmplabJenkins · 2015-05-09T05:17:09Z

Merged build triggered.

AmplabJenkins · 2015-05-09T05:17:16Z

Merged build started.

SparkQA · 2015-05-09T05:19:11Z

Test build #32297 has started for PR 5996 at commit aac7119.

SparkQA · 2015-05-09T07:49:12Z

Test build #32297 timed out for PR 5996 at commit aac7119 after a configured wait of 150m.

AmplabJenkins · 2015-05-09T07:49:18Z

Merged build finished. Test FAILed.

AmplabJenkins · 2015-05-09T07:49:18Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32297/
Test FAILed.

rxin · 2015-05-11T06:12:53Z

Jenkins, retest this please.

AmplabJenkins · 2015-05-11T06:17:10Z

Merged build triggered.

AmplabJenkins · 2015-05-11T06:17:20Z

Merged build started.

SparkQA · 2015-05-11T06:18:00Z

Test build #32375 has started for PR 5996 at commit aac7119.

SparkQA · 2015-05-11T08:43:42Z

Test build #32375 has finished for PR 5996 at commit aac7119.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-05-11T08:43:48Z

Merged build finished. Test PASSed.

AmplabJenkins · 2015-05-11T08:43:48Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32375/
Test PASSed.

rxin · 2015-05-11T18:35:32Z

I'm merging this in branch-1.4. I will submit a followup PR for documentation.

Updated Java, Scala, Python, and R.

Author: Reynold Xin <rxin@databricks.com>
Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

Closes #5996 from rxin/groupby-retain and squashes the following commits:

aac7119 [Reynold Xin] Merge branch 'groupby-retain' of github.com:rxin/spark into groupby-retain
f6858f6 [Reynold Xin] Merge branch 'master' into groupby-retain
5f923c0 [Reynold Xin] Merge pull request #15 from shivaram/sparkr-groupby-retrain
c1de670 [Shivaram Venkataraman] Revert workaround in SparkR to retain grouped cols Based on reverting code added in commit amplab-extras@9a6be74
b8b87e1 [Reynold Xin] Fixed DataFrameJoinSuite.
d910141 [Reynold Xin] Updated rest of the files
1e6e666 [Reynold Xin] [SPARK-7462] By default retain group by columns in aggregate

(cherry picked from commit 0a4844f)
Signed-off-by: Reynold Xin <rxin@databricks.com>

[SPARK-7462] By default retain group by columns in aggregate

1e6e666

Updated rest of the files

d910141

Fixed DataFrameJoinSuite.

b8b87e1

shivaram and others added 2 commits May 8, 2015 01:08

Revert workaround in SparkR to retain grouped cols

c1de670

Based on reverting code added in commit amplab-extras@9a6be74

Merge pull request #15 from shivaram/sparkr-groupby-retrain

5f923c0

Revert workaround in SparkR to retain grouped cols

marmbrus reviewed May 8, 2015

rxin added 2 commits May 8, 2015 22:10

Merge branch 'master' into groupby-retain

f6858f6

Conflicts: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala

Merge branch 'groupby-retain' of github.com:rxin/spark into groupby-r…

aac7119

…etain

asfgit closed this in 0a4844f May 11, 2015
