
[SPARK-11946][SQL] Audit pivot API for 1.6. #9929

Status: Closed (2 commits)

Conversation

@rxin (Contributor) commented Nov 24, 2015

Currently pivot's signature looks like

```scala
@scala.annotation.varargs
def pivot(pivotColumn: Column, values: Column*): GroupedData

@scala.annotation.varargs
def pivot(pivotColumn: String, values: Any*): GroupedData
```

I think we can remove the one that takes `Column` types, since callers should always pass in literals. It would also be clearer if the values were not varargs but a `Seq` or `java.util.List`.

I also made similar changes for Python.
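A Spark-free sketch of the overload shape being proposed, using hypothetical stand-in types (`GroupedData` here is a plain case class, not the real Spark 1.6 API): explicit values arrive as a `Seq` rather than varargs, and a no-values overload leaves the values to be computed from the data.

```scala
// Hypothetical stand-ins to illustrate the proposed overloads; not Spark code.
object PivotSketch {
  final case class GroupedData(pivotColumn: String, values: Seq[Any])

  // Explicit values as a Seq instead of varargs (a java.util.List
  // overload would mirror this for Java callers).
  def pivot(pivotColumn: String, values: Seq[Any]): GroupedData =
    GroupedData(pivotColumn, values)

  // No values supplied: they would be derived from the data.
  def pivot(pivotColumn: String): GroupedData =
    GroupedData(pivotColumn, Seq.empty)
}
```

The `Seq` form avoids the ambiguity of overloaded varargs methods from Java, which was part of the motivation for the change.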

@rxin (Contributor, Author) commented Nov 24, 2015

cc @aray

Review comment on:

```scala
// Get the distinct values of the column and sort them so its consistent
val values = df.select(pivotColumn)
  .distinct()
  .sort(pivotColumn)
```

@rxin (Author): @aray do you know why we have a "sort" in here?

@aray (Contributor): The sort is there to ensure that the output columns are in a consistent logical order.

@rxin (Author): OK, thanks. I'm going to add a comment there to explain.
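The rationale above can be shown with a plain-collections sketch (hypothetical data, not Spark): `distinct` alone preserves encounter order, so the derived pivot columns would depend on row order; sorting makes them stable across inputs.

```scala
// Two hypothetical inputs with the same values in different row orders.
val rowsA = Seq("2013", "2011", "2012", "2011")
val rowsB = Seq("2012", "2013", "2011", "2013")

// distinct alone would give ("2013","2011","2012") vs ("2012","2013","2011");
// sorting yields the same stable column order for both.
val columnsA = rowsA.distinct.sorted
val columnsB = rowsB.distinct.sorted
```

Without the sort, the pivoted output schema could differ from run to run whenever the underlying row order differs.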

Review comment on:

```scala
@@ -1574,7 +1574,6 @@ class DAGScheduler(
  }

  def stop() {
    logInfo("Stopping DAGScheduler")
```

@rxin (Author): This is done as part of #9603 (comment), but it is way too small to deserve its own PR.

@SparkQA commented Nov 24, 2015

Test build #46591 has finished for PR 9929 at commit 73d37fc.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Review comment on:

```python
jgd = self._jdf.pivot(_to_java_column(pivot_col),
                      _to_seq(self.sql_ctx._sc, values, _create_column_from_literal))
if values is None:
    jgd = self._jdf.pivot(pivot_col)
```

(Contributor): Should we use `_to_java_column(pivot_col)` and `_to_seq()` here? Otherwise `df.pivot(df.a)` may fail.

@SparkQA commented Nov 24, 2015

Test build #2104 has finished for PR 9929 at commit 73d37fc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@davies (Contributor) commented Nov 24, 2015

LGTM

@rxin (Contributor, Author) commented Nov 24, 2015

Thanks - merging this in.

@asfgit closed this in f315272 on Nov 24, 2015
asfgit pushed a commit that referenced this pull request Nov 24, 2015
Currently pivot's signature looks like

```scala
@scala.annotation.varargs
def pivot(pivotColumn: Column, values: Column*): GroupedData

@scala.annotation.varargs
def pivot(pivotColumn: String, values: Any*): GroupedData
```

I think we can remove the one that takes `Column` types, since callers should always pass in literals. It would also be clearer if the values were not varargs but a `Seq` or `java.util.List`.

I also made similar changes for Python.

Author: Reynold Xin <rxin@databricks.com>

Closes #9929 from rxin/SPARK-11946.

(cherry picked from commit f315272)
Signed-off-by: Reynold Xin <rxin@databricks.com>

4 participants