[SPARK-14127][SQL] "DESC <table>": Extracts schema information from table properties for data source tables #13025
2eccb6e to 54e2618
Test build #58232 has finished for PR 13025 at commit
Test build #58231 has finished for PR 13025 at commit
    } else {
      Some(DataType.fromJson(schemaParts.mkString).asInstanceOf[StructType])
    }
  }
There was a time when we only used `spark.sql.sources.schema`. Let's check this first and then check `spark.sql.sources.schema.numParts`. You can take a look at `HiveMetastoreCatalog`'s `cachedDataSourceTables`.
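The lookup order suggested here can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in `DDLUtils` or `HiveMetastoreCatalog`: the helper name `schemaJsonFromProps` is hypothetical, and the `spark.sql.sources.schema.part.<i>` key layout is an assumption. The result is left as a JSON string, which would then go through `DataType.fromJson` as in the snippet under review.

```scala
object SchemaProps {
  // Hypothetical helper: returns the schema JSON stored in table properties, if any.
  def schemaJsonFromProps(props: Map[String, String]): Option[String] =
    // Legacy layout: the whole schema lives under a single key, so check it first.
    props.get("spark.sql.sources.schema").orElse {
      // Split layout: numParts says how many chunks have to be concatenated.
      props.get("spark.sql.sources.schema.numParts").map { n =>
        val parts = (0 until n.toInt).map { i =>
          // Assumed key pattern for each chunk of the schema JSON.
          props.getOrElse(
            s"spark.sql.sources.schema.part.$i",
            throw new IllegalStateException(s"missing schema part $i"))
        }
        parts.mkString
      }
    }
}
```

Checking the single-key form first means tables written in the older layout resolve without touching `numParts` at all, and a table with neither key simply yields `None`.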
I will make a PR to handle `spark.sql.sources.schema`.
LGTM. Merging to master and branch 2.0.
…able properties for data source tables

## What changes were proposed in this pull request?

This is a follow-up of #12934 and #12844. This PR adds a set of utility methods in `DDLUtils` to help extract schema information (user-defined schema, partition columns, and bucketing information) from data source table properties. These utility methods are then used in `DescribeTableCommand` to refine output for data source tables. Before this PR, the aforementioned schema information was only shown as table properties, which are hard to read.

Sample output:

```
+----------------------------+---------------------------------------------------------+-------+
|col_name                    |data_type                                                |comment|
+----------------------------+---------------------------------------------------------+-------+
|a                           |bigint                                                   |       |
|b                           |bigint                                                   |       |
|c                           |bigint                                                   |       |
|d                           |bigint                                                   |       |
|# Partition Information     |                                                         |       |
|# col_name                  |                                                         |       |
|d                           |                                                         |       |
|                            |                                                         |       |
|# Detailed Table Information|                                                         |       |
|Database:                   |default                                                  |       |
|Owner:                      |lian                                                     |       |
|Create Time:                |Tue May 10 03:20:34 PDT 2016                             |       |
|Last Access Time:           |Wed Dec 31 16:00:00 PST 1969                             |       |
|Location:                   |file:/Users/lian/local/src/spark/workspace-a/target/...  |       |
|Table Type:                 |MANAGED                                                  |       |
|Table Parameters:           |                                                         |       |
|  rawDataSize               |-1                                                       |       |
|  numFiles                  |1                                                        |       |
|  transient_lastDdlTime     |1462875634                                               |       |
|  totalSize                 |684                                                      |       |
|  spark.sql.sources.provider|parquet                                                  |       |
|  EXTERNAL                  |FALSE                                                    |       |
|  COLUMN_STATS_ACCURATE     |false                                                    |       |
|  numRows                   |-1                                                       |       |
|                            |                                                         |       |
|# Storage Information       |                                                         |       |
|SerDe Library:              |org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe       |       |
|InputFormat:                |org.apache.hadoop.mapred.SequenceFileInputFormat         |       |
|OutputFormat:               |org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat|       |
|Compressed:                 |No                                                       |       |
|Num Buckets:                |2                                                        |       |
|Bucket Columns:             |[b]                                                      |       |
|Sort Columns:               |[c]                                                      |       |
|Storage Desc Parameters:    |                                                         |       |
|  path                      |file:/Users/lian/local/src/spark/workspace-a/target/...  |       |
|  serialization.format      |1                                                        |       |
+----------------------------+---------------------------------------------------------+-------+
```

## How was this patch tested?

Test cases are added in `HiveDDLSuite` to check command output.

Author: Cheng Lian <lian@databricks.com>

Closes #13025 from liancheng/spark-14127-extract-schema-info.

(cherry picked from commit 8a12580)
Signed-off-by: Yin Huai <yhuai@databricks.com>
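To make the idea of reading partition and bucketing columns out of table properties concrete, here is a small sketch. It is not the `DDLUtils` code from this PR: the helper `columnNames` is hypothetical, and the property keys (`spark.sql.sources.schema.numPartCols`, `spark.sql.sources.schema.partCol.<i>`, and the analogous `bucket` and `sort` keys) are assumptions about how the column lists are stored.

```scala
object ColumnProps {
  // Assumed common prefix for the schema-related table properties.
  private val prefix = "spark.sql.sources.schema."

  // Hypothetical helper: reads `<prefix>num<Kind>Cols`, then `<prefix><kind>Col.<i>`
  // for each index. `kind` is expected to be "part", "bucket", or "sort".
  def columnNames(props: Map[String, String], kind: String): Seq[String] = {
    val countKey = s"${prefix}num${kind.capitalize}Cols"
    // A missing count key means the table has no columns of this kind.
    props.get(countKey).toSeq.flatMap { n =>
      (0 until n.toInt).map(i => props(s"$prefix${kind}Col.$i"))
    }
  }
}

object ColumnPropsDemo extends App {
  // Properties shaped to match the sample output: partitioned by d,
  // bucketed by b, sorted by c.
  val props = Map(
    "spark.sql.sources.schema.numPartCols" -> "1",
    "spark.sql.sources.schema.partCol.0" -> "d",
    "spark.sql.sources.schema.numBucketCols" -> "1",
    "spark.sql.sources.schema.bucketCol.0" -> "b",
    "spark.sql.sources.schema.numSortCols" -> "1",
    "spark.sql.sources.schema.sortCol.0" -> "c")

  println(ColumnProps.columnNames(props, "part"))
  println(ColumnProps.columnNames(props, "bucket"))
  println(ColumnProps.columnNames(props, "sort"))
}
```

Storing each column under its own indexed key keeps individual property values short, at the cost of needing a count key to know how many entries to read back.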