
[SPARK-14127][SQL] "DESC <table>": Extracts schema information from table properties for data source tables #13025

Closed

Conversation

liancheng
Contributor

What changes were proposed in this pull request?

This is a follow-up of #12934 and #12844. This PR adds a set of utility methods in `DDLUtils` to help extract schema information (user-defined schema, partition columns, and bucketing information) from data source table properties. These utility methods are then used in `DescribeTableCommand` to refine the output for data source tables. Before this PR, this schema information was shown only as raw table properties, which are hard to read.
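As background, the schema JSON of a data source table is split across numbered table properties (the review thread below mentions `spark.sql.sources.schema.numParts`). A minimal sketch of the reassembly, assuming the `spark.sql.sources.schema.part.N` naming; the helper name is illustrative, not the PR's exact code:

```scala
import org.apache.spark.sql.types.{DataType, StructType}

// Illustrative sketch: the schema JSON is split across numbered table
// properties (metastore property values have length limits) and is
// reassembled by concatenating the parts in order.
def getSchemaFromTableProperties(
    props: Map[String, String]): Option[StructType] = {
  props.get("spark.sql.sources.schema.numParts").map { numParts =>
    val json = (0 until numParts.toInt)
      .map(i => props(s"spark.sql.sources.schema.part.$i"))
      .mkString
    DataType.fromJson(json).asInstanceOf[StructType]
  }
}
```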

Sample output:

```
+----------------------------+---------------------------------------------------------+-------+
|col_name                    |data_type                                                |comment|
+----------------------------+---------------------------------------------------------+-------+
|a                           |bigint                                                   |       |
|b                           |bigint                                                   |       |
|c                           |bigint                                                   |       |
|d                           |bigint                                                   |       |
|# Partition Information     |                                                         |       |
|# col_name                  |                                                         |       |
|d                           |                                                         |       |
|                            |                                                         |       |
|# Detailed Table Information|                                                         |       |
|Database:                   |default                                                  |       |
|Owner:                      |lian                                                     |       |
|Create Time:                |Tue May 10 03:20:34 PDT 2016                             |       |
|Last Access Time:           |Wed Dec 31 16:00:00 PST 1969                             |       |
|Location:                   |file:/Users/lian/local/src/spark/workspace-a/target/...  |       |
|Table Type:                 |MANAGED                                                  |       |
|Table Parameters:           |                                                         |       |
|  rawDataSize               |-1                                                       |       |
|  numFiles                  |1                                                        |       |
|  transient_lastDdlTime     |1462875634                                               |       |
|  totalSize                 |684                                                      |       |
|  spark.sql.sources.provider|parquet                                                  |       |
|  EXTERNAL                  |FALSE                                                    |       |
|  COLUMN_STATS_ACCURATE     |false                                                    |       |
|  numRows                   |-1                                                       |       |
|                            |                                                         |       |
|# Storage Information       |                                                         |       |
|SerDe Library:              |org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe       |       |
|InputFormat:                |org.apache.hadoop.mapred.SequenceFileInputFormat         |       |
|OutputFormat:               |org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat|       |
|Compressed:                 |No                                                       |       |
|Num Buckets:                |2                                                        |       |
|Bucket Columns:             |[b]                                                      |       |
|Sort Columns:               |[c]                                                      |       |
|Storage Desc Parameters:    |                                                         |       |
|  path                      |file:/Users/lian/local/src/spark/workspace-a/target/...  |       |
|  serialization.format      |1                                                        |       |
+----------------------------+---------------------------------------------------------+-------+
```
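For reference, a table matching this output could be created through the `DataFrameWriter` API. This is a hedged sketch (the table name `t` and the DESC variant are assumptions; the columns, partitioning, and bucketing mirror the sample above):

```scala
// bucketBy/sortBy are only supported with saveAsTable (Spark 2.0+).
spark.range(10)
  .selectExpr("id AS a", "id AS b", "id AS c", "id AS d")
  .write
  .format("parquet")
  .partitionBy("d")
  .bucketBy(2, "b")
  .sortBy("c")
  .saveAsTable("t")

// The "# Detailed Table Information" section appears with the
// extended/formatted variant of the command.
spark.sql("DESC FORMATTED t").show(100, truncate = false)
```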

How was this patch tested?

Test cases are added in `HiveDDLSuite` to check command output.
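Those tests are not shown on this page; as a hypothetical illustration, such a check could look like this (assumes a suite mixing in Spark's `SQLTestUtils` helpers, which provide `withTable` and `sql`):

```scala
// Hypothetical test shape; not the exact code added in this PR.
test("DESC lists partition columns for data source tables") {
  withTable("t") {
    spark.range(10).selectExpr("id AS a", "id AS d")
      .write.format("parquet").partitionBy("d").saveAsTable("t")
    // Column 0 of the DESC output is col_name (see the sample output above).
    val colNames = sql("DESC t").collect().map(_.getString(0))
    assert(colNames.contains("# Partition Information"))
    assert(colNames.contains("d"))
  }
}
```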

@liancheng liancheng force-pushed the spark-14127-extract-schema-info branch from 2eccb6e to 54e2618 on May 10, 2016 10:53
SparkQA commented May 10, 2016

Test build #58232 has finished for PR 13025 at commit 54e2618.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented May 10, 2016

Test build #58231 has finished for PR 13025 at commit 2eccb6e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

```scala
  } else {
    Some(DataType.fromJson(schemaParts.mkString).asInstanceOf[StructType])
  }
}
```
There was a time when we only used `spark.sql.sources.schema`. Let's check this first and then check `spark.sql.sources.schema.numParts`. You can take a look at `HiveMetastoreCatalog`'s `cachedDataSourceTables`.
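A sketch of the suggested lookup order, with assumed helper and key names:

```scala
import org.apache.spark.sql.types.{DataType, StructType}

// Sketch of the suggested ordering (the legacy key name comes from the
// comment above; the part.N pattern is an assumption): try the old
// single-property schema first, then the numParts/part.N encoding.
def schemaWithLegacyFallback(props: Map[String, String]): Option[StructType] = {
  val json = props.get("spark.sql.sources.schema").orElse {
    props.get("spark.sql.sources.schema.numParts").map { n =>
      (0 until n.toInt).map(i => props(s"spark.sql.sources.schema.part.$i")).mkString
    }
  }
  json.map(DataType.fromJson(_).asInstanceOf[StructType])
}
```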

yhuai commented May 10, 2016
I will make a PR to handle `spark.sql.sources.schema`.


yhuai commented May 10, 2016

LGTM. Merging to master and branch 2.0

asfgit pushed a commit that referenced this pull request May 10, 2016
[SPARK-14127][SQL] "DESC <table>": Extracts schema information from table properties for data source tables

Author: Cheng Lian <lian@databricks.com>

Closes #13025 from liancheng/spark-14127-extract-schema-info.

(cherry picked from commit 8a12580)
Signed-off-by: Yin Huai <yhuai@databricks.com>
@asfgit asfgit closed this in 8a12580 May 10, 2016
@liancheng liancheng deleted the spark-14127-extract-schema-info branch May 12, 2016 15:25