[SPARK-11946][SQL] Audit pivot API for 1.6. #9929

rxin · 2015-11-24T08:03:58Z

Currently, pivot's signatures look like this:

@scala.annotation.varargs
def pivot(pivotColumn: Column, values: Column*): GroupedData

@scala.annotation.varargs
def pivot(pivotColumn: String, values: Any*): GroupedData

I think we can remove the one that takes "Column" types, since callers should always be passing in literals. It would also be clearer if the values were passed as a Seq or java.util.List rather than as varargs.

I also made similar changes for Python.
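
To illustrate what passing the values as one explicit list means, here is a minimal plain-Python model of a pivot followed by a sum. It is a sketch and not Spark code: the function name pivot_sum, the sample rows, and the column names are all assumptions made up for illustration, and it says nothing about how GroupedData actually implements pivot.

```python
def pivot_sum(rows, group_col, pivot_col, values, value_col):
    """Toy model: one output row per group, one output column per entry in `values`."""
    out = {}
    for row in rows:
        # Every group starts with one empty (None) cell per requested pivot value,
        # in the order the caller listed them.
        cells = out.setdefault(row[group_col], {v: None for v in values})
        key = row[pivot_col]
        if key in cells:  # rows whose pivot value is not listed are skipped
            cells[key] = (cells[key] or 0) + row[value_col]
    return out


# Hypothetical sample data.
rows = [
    {"year": 2012, "course": "dotNET", "earnings": 10000},
    {"year": 2012, "course": "Java", "earnings": 20000},
    {"year": 2012, "course": "dotNET", "earnings": 5000},
    {"year": 2013, "course": "dotNET", "earnings": 48000},
    {"year": 2013, "course": "Java", "earnings": 30000},
]

# The pivot values are a single list argument, clearly separate from the column name.
result = pivot_sum(rows, "year", "course", ["dotNET", "Java"], "earnings")
print(result)
```

With the signature proposed in this PR, the corresponding Scala call would look roughly like df.groupBy("year").pivot("course", Seq("dotNET", "Java")).sum("earnings"), with the values grouped in one sequence instead of trailing varargs.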

rxin · 2015-11-24T08:04:03Z

cc @aray

rxin · 2015-11-24T08:04:16Z

sql/core/src/main/scala/org/apache/spark/sql/GroupedData.scala

+    // Get the distinct values of the column and sort them so its consistent
+    val values = df.select(pivotColumn)
+      .distinct()
+      .sort(pivotColumn)


@aray do you know why we have a "sort" in here?

aray: The sort is there to ensure that the output columns are in a consistent logical order.

rxin: OK, thanks. I'm going to add a comment there to explain.
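
The reason for the sort can be shown with a small plain-Python sketch. It only models the idea of "distinct, then sort"; pivot_values and the sample rows are hypothetical names, and it assumes the pivot values are mutually comparable.

```python
def pivot_values(rows, pivot_col):
    """Return the distinct values of pivot_col in sorted order."""
    distinct = {row[pivot_col] for row in rows}  # a set has no guaranteed order
    return sorted(distinct)  # sorting makes the output column order deterministic


# Hypothetical sample data.
rows = [
    {"course": "dotNET"},
    {"course": "Java"},
    {"course": "dotNET"},
]

columns = pivot_values(rows, "course")
columns_reversed = pivot_values(list(reversed(rows)), "course")
print(columns == columns_reversed)  # the input row order does not affect the result
```

Without the sort, the order of the pivoted columns would depend on whatever order the distinct values happen to arrive in.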

rxin · 2015-11-24T08:13:24Z

core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala

@@ -1574,7 +1574,6 @@ class DAGScheduler(
  }

  def stop() {
-    logInfo("Stopping DAGScheduler")


This is done in response to a comment on #9603, but it is way too small to deserve its own PR.

SparkQA · 2015-11-24T11:03:38Z

Test build #46591 has finished for PR 9929 at commit 73d37fc.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds no public classes.

davies · 2015-11-24T19:49:13Z

python/pyspark/sql/group.py

-        jgd = self._jdf.pivot(_to_java_column(pivot_col),
-                              _to_seq(self.sql_ctx._sc, values, _create_column_from_literal))
+        if values is None:
+            jgd = self._jdf.pivot(pivot_col)


Should we use _to_java_column(pivot_col) and _to_seq() here? Otherwise df.pivot(df.a) may fail.

SparkQA · 2015-11-24T20:39:54Z

Test build #2104 has finished for PR 9929 at commit 73d37fc.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

davies · 2015-11-24T20:50:51Z

LGTM

rxin · 2015-11-24T20:54:33Z

Thanks - merging this in.

Currently pivot's signature looks like

```scala
@scala.annotation.varargs
def pivot(pivotColumn: Column, values: Column*): GroupedData

@scala.annotation.varargs
def pivot(pivotColumn: String, values: Any*): GroupedData
```

I think we can remove the one that takes "Column" types, since callers should always be passing in literals. It'd also be more clear if the values are not varargs, but rather Seq or java.util.List.

I also made similar changes for Python.

Author: Reynold Xin <rxin@databricks.com>

Closes #9929 from rxin/SPARK-11946.

(cherry picked from commit f315272)
Signed-off-by: Reynold Xin <rxin@databricks.com>

[SPARK-11946][SQL] Audit pivot API for 1.6.

6e18604

rxin reviewed Nov 24, 2015

Remove dag scheduler logging.

73d37fc

rxin reviewed Nov 24, 2015

davies reviewed Nov 24, 2015

asfgit closed this in f315272 Nov 24, 2015

aray mentioned this pull request Jul 3, 2018

[SPARK-24722][SQL] pivot() with Column type argument #21699

Closed
