[SPARK-16429][SQL] Include StringType columns in describe() #14095
@@ -228,6 +228,15 @@ class Dataset[T] private[sql](
    }
  }

  private[sql] def aggregatableColumns: Seq[Expression] = {
private rather than private[sql]?
That would be better.
Thank you for the fast review, @rxin. I updated it.
Test build #61929 has finished for PR 14095 at commit
Oh, it's a documented behavior.
Test build #61930 has finished for PR 14095 at commit
Can you fix Python?
Oh, sure!
And also update the documentation.
Of course!
I fixed Python/R and the docs accordingly, and tested locally.
  .filter(f => f.dataType.isInstanceOf[NumericType] || f.dataType.isInstanceOf[StringType])
  .map { n =>
    queryExecution.analyzed.resolveQuoted(n.name, sparkSession.sessionState.analyzer.resolver)
      .get
Is it possible that this would fail?
Er, this is a direct extension of line 225 of the existing numericColumns.
https://github.com/apache/spark/pull/14095/files#diff-7a46f10c3cedbf013cf255564d9483cdR225
You mean the failure of resolveQuoted?
It will not fail because the names come from schema.fields.
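For readers following this thread, here is a minimal, Spark-free sketch of the reasoning: the columns are selected by filtering the schema's own fields, so looking each name up again in that same schema always finds something. The types below (DataType, StructField, and so on) are simplified hypothetical stand-ins for illustration, not the real org.apache.spark.sql.types classes, and resolve is a toy lookup rather than resolveQuoted.

```scala
// Hypothetical stand-ins for Spark's type hierarchy (assumptions for this sketch only).
sealed trait DataType
sealed trait NumericType extends DataType
case object IntegerType extends NumericType
case object DoubleType extends NumericType
case object StringType extends DataType
case object BooleanType extends DataType

final case class StructField(name: String, dataType: DataType)

object AggregatableColumnsSketch {
  // Keep only numeric or string fields, mirroring the filter in the diff above.
  def aggregatable(fields: Seq[StructField]): Seq[StructField] =
    fields.filter(f => f.dataType.isInstanceOf[NumericType] || f.dataType == StringType)

  // Toy resolver: look a name up in the same schema it was taken from.
  def resolve(fields: Seq[StructField], name: String): Option[StructField] =
    fields.find(_.name == name)
}
```

Because every name passed to resolve was taken from the schema itself, the Option is always defined in this model, which is the argument made above for calling .get.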
Test build #61962 has finished for PR 14095 at commit
Hi, @rxin.
Test build #61965 has finished for PR 14095 at commit
Thanks - merging in master.
Thank you for merging, @rxin.
What changes were proposed in this pull request?
Currently, Spark's describe supports StringType. However, describe() (without arguments) returns a dataset covering only the numeric columns. This PR aims to include StringType columns in describe(), i.e. describe without arguments.
Background
Before
After
How was this patch tested?
Pass the Jenkins tests with an updated test case.
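As a rough illustration of what including a StringType column in a summary could look like, here is a small Spark-free sketch. It assumes that a string column reports a non-null count plus lexicographic min and max, and that mean and stddev are left undefined. The Summary record and describeStrings helper are hypothetical names for this sketch, not part of the patch or of Spark's API.

```scala
object DescribeStringSketch {
  // Hypothetical summary record; field names are illustrative only.
  final case class Summary(
      count: Long,
      mean: Option[Double],
      stddev: Option[Double],
      min: Option[String],
      max: Option[String])

  // Summarize a nullable string column (None models a null value).
  def describeStrings(values: Seq[Option[String]]): Summary = {
    val present = values.flatten // drop nulls before aggregating
    Summary(
      count = present.size.toLong,
      mean = None,   // assumed undefined for strings in this sketch
      stddev = None, // assumed undefined for strings in this sketch
      min = if (present.isEmpty) None else Some(present.min),
      max = if (present.isEmpty) None else Some(present.max)
    )
  }
}
```

For example, describeStrings(Seq(Some("Bob"), None, Some("Alice"))) counts two non-null values and orders them lexicographically, so "Alice" is the minimum and "Bob" the maximum.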