[SPARK-11946][SQL] Audit pivot API for 1.6. #9929
cc @aray
// Get the distinct values of the column and sort them so its consistent
val values = df.select(pivotColumn)
  .distinct()
  .sort(pivotColumn)
@aray do you know why we have a "sort" in here?
The sort is there to ensure that the output columns are in a consistent logical order.
ok thanks - i'm going to add a comment there to explain.
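The reason for the sort can be sketched in plain Python. This is only a stand-in for the Scala snippet under review, not Spark code; the helper name `pivot_output_columns` and the sample rows are hypothetical. It shows that collecting distinct values alone gives no ordering guarantee, while sorting them makes the resulting column order reproducible regardless of the order in which rows arrive.

```python
# Plain-Python stand-in for the Scala snippet under review; not Spark code.
def pivot_output_columns(rows, pivot_key):
    """Return the distinct values of pivot_key in sorted order (hypothetical helper)."""
    distinct_values = {row[pivot_key] for row in rows}  # a set has no guaranteed order
    return sorted(distinct_values)  # sorting makes the column order reproducible


# Made-up sample rows, used only for illustration.
sales = [
    {"year": 2013, "earnings": 48000},
    {"year": 2012, "earnings": 15000},
    {"year": 2013, "earnings": 30000},
]

# The same data in a different arrival order yields the same column order.
print(pivot_output_columns(sales, "year"))
print(pivot_output_columns(list(reversed(sales)), "year"))
```

Both calls print the same list, which is the "consistent logical order" the comment refers to.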
@@ -1574,7 +1574,6 @@ class DAGScheduler(
  }

  def stop() {
    logInfo("Stopping DAGScheduler")
This is done as part of #9603 (comment), but it is way too small to deserve its own PR.
Test build #46591 has finished for PR 9929 at commit
jgd = self._jdf.pivot(_to_java_column(pivot_col),
                      _to_seq(self.sql_ctx._sc, values, _create_column_from_literal))
if values is None:
    jgd = self._jdf.pivot(pivot_col)
Should we use `_to_java_column(pivot_col)` and `_to_seq()` here? Otherwise `df.pivot(df.a)` may fail.
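The concern in this comment can be modeled with a small plain-Python sketch. It is not the PySpark implementation: `pivot_call_args` is a hypothetical helper that only models which arguments a wrapper taking a column name (rather than a Column) might forward, and how it could reject a non-string argument such as `df.a` explicitly instead of failing later.

```python
def pivot_call_args(pivot_col, values=None):
    """Hypothetical model of the arguments a string-based pivot wrapper forwards."""
    if not isinstance(pivot_col, str):
        # A Column-like object such as df.a is not a column name.
        raise TypeError(
            "pivot_col must be a column name (str), got %s" % type(pivot_col).__name__
        )
    if values is None:
        # No explicit values: forward only the name and let the backend compute them.
        return (pivot_col,)
    # Explicit values: forward them as one list rather than as varargs.
    return (pivot_col, list(values))


print(pivot_call_args("year"))
print(pivot_call_args("year", [2012, 2013]))
```

Whether the real wrapper should convert a Column or reject it is exactly the question raised above; the sketch only makes the two call shapes and the failure case concrete.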
Test build #2104 has finished for PR 9929 at commit
LGTM
Thanks - merging this in.
Currently pivot's signature looks like

```scala
@scala.annotation.varargs
def pivot(pivotColumn: Column, values: Column*): GroupedData

@scala.annotation.varargs
def pivot(pivotColumn: String, values: Any*): GroupedData
```

I think we can remove the one that takes "Column" types, since callers should always be passing in literals. It'd also be clearer if the values are not varargs, but rather Seq or java.util.List.

I also made similar changes for Python.

Author: Reynold Xin <rxin@databricks.com>

Closes #9929 from rxin/SPARK-11946.

(cherry picked from commit f315272)
Signed-off-by: Reynold Xin <rxin@databricks.com>