-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-35415][SQL] Change information to map type for SHOW TABLE EXTENDED command
#32563
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hive output:
hive> SHOW TABLE EXTENDED LIKE '*';
OK
tableName:spark_32976
owner:yumwang
location:file:/tmp/spark/spark_32976
inputformat:org.apache.hadoop.mapred.TextInputFormat
outputformat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
columns:struct columns { i32 id, string name}
partitioned:true
partitionColumns:struct partition_columns { string part}
totalNumberFiles:unknown
totalFileSize:unknown
maxFileSize:unknown
minFileSize:unknown
lastAccessTime:unknown
lastUpdateTime:unknown
tableName:t1
owner:yumwang
location:file:/tmp/hive/t1
inputformat:org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
outputformat:org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
columns:struct columns { string id}
partitioned:true
partitionColumns:struct partition_columns { date part}
totalNumberFiles:unknown
totalFileSize:unknown
maxFileSize:unknown
minFileSize:unknown
lastAccessTime:unknown
lastUpdateTime:unknown
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Test build #138589 has finished for PR 32563 at commit
|
information to map type for SHOW TABLE EXTENDED command
dongjoon-hyun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the change in JDBCSuite, we need to add this change into the SQL migration guide. Could you add some, please, @wangyum ?
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Test build #138615 has finished for PR 32563 at commit
|
|
Test build #138622 has finished for PR 32563 at commit
|
|
Kubernetes integration test unable to build dist. exiting with code: 1 |
|
Test build #138686 has finished for PR 32563 at commit
|
|
Kubernetes integration test unable to build dist. exiting with code: 1 |
|
Test build #138687 has finished for PR 32563 at commit
|
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move the spark.sql.legacy.keepCommandOutputSchema logic from ResolveSessionCatalog to v2Commands.
|
Test build #138698 has finished for PR 32563 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't do this, as the output expr ID becomes unstable and it changes after the plan is copied.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the reason why we created object DescribeNamespace and others. We shouldn't revert it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which is the information column?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The output only contains information column.
This is the change of HiveResult.
Example of Hive output :
hive> SHOW TABLE EXTENDED LIKE '*';
OK
tableName:spark_32976
owner:yumwang
location:file:/tmp/spark/spark_32976
inputformat:org.apache.hadoop.mapred.TextInputFormat
outputformat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
columns:struct columns { i32 id, string name}
partitioned:true
partitionColumns:struct partition_columns { string part}
totalNumberFiles:unknown
totalFileSize:unknown
maxFileSize:unknown
minFileSize:unknown
lastAccessTime:unknown
lastUpdateTime:unknown
tableName:t1
owner:yumwang
location:file:/tmp/hive/t1
inputformat:org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
outputformat:org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
columns:struct columns { string id}
partitioned:true
partitionColumns:struct partition_columns { date part}
totalNumberFiles:unknown
totalFileSize:unknown
maxFileSize:unknown
minFileSize:unknown
lastAccessTime:unknown
lastUpdateTime:unknown
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The result doesn't match the command output schema?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Hive only contains one column:
https://github.com/apache/hive/blob/45b48d5fdc3527e56347b297d1a41902d1313220/ql/src/java/org/apache/hadoop/hive/ql/metadata/formatting/TextMetaDataFormatter.java#L248-L263.
The output similar to Spark's CatalogTable.toLinkedHashMap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I'm confused. How can we allow a mismatch between the schema and data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's only for thrift-server and the SQL shell, but not sql(...).show?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is only used in the SQL shell, not by the thrift server. Otherwise, the JDBC ResultSet will get wrong for mapping the metadata and column results.
If we turn on the spark.sql.cli.print.header=true for SQL shell
- w/o this PR, the information column matches the first element of the map, then EOL. The rest of the map will print line by line, with wrong/no indentations.
- w/ this PR, I guess the schema header and the result more inappropriate matched
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spark-sql before:
spark-sql> set spark.sql.legacy.keepCommandOutputSchema=true;
spark.sql.legacy.keepCommandOutputSchema true
Time taken: 0.012 seconds, Fetched 1 row(s)
spark-sql> SHOW TABLE EXTENDED LIKE '*';
default test_parquet false CatalogTable(
Database: default
Table: test_parquet
Owner: yumwang
Created Time: Mon May 24 11:16:33 CST 2021
Last Access: UNKNOWN
Created By: Spark 3.2.0-SNAPSHOT
Type: MANAGED
Provider: hive
Table Properties: [transient_lastDdlTime=1621826201]
Statistics: 290 bytes
Location: file:/Users/yumwang/tmp/xxxx/spark/spark-warehouse/test_parquet
Serde Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Storage Properties: [serialization.format=1]
Partition Provider: Catalog
Schema: root
|-- id: long (nullable = false)
)
Time taken: 0.031 seconds, Fetched 1 row(s)
spark-sql after:
spark-sql> SHOW TABLE EXTENDED LIKE '*';
default test_parquet false {"Created By":"Spark 3.2.0-SNAPSHOT","Created Time":"Mon May 24 11:16:33 CST 2021","Database":"default","InputFormat":"org.apache.hadoop.mapred.TextInputFormat","Last Access":"UNKNOWN","Location":"file:/Users/yumwang/tmp/xxxx/spark/spark-warehouse/test_parquet","OutputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat","Owner":"yumwang","Partition Provider":"Catalog","Provider":"hive","Schema":"root
|-- id: long (nullable = false)
","Serde Library":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe","Statistics":"290 bytes","Storage Properties":"[serialization.format=1]","Table Properties":"[transient_lastDdlTime=1621826201]","Table":"test_parquet","Type":"MANAGED"}
Time taken: 0.043 seconds, Fetched 1 row(s)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
beeline before:
0: jdbc:hive2://localhost:10000> set spark.sql.legacy.keepCommandOutputSchema=true;
+-------------------------------------------+--------+
| key | value |
+-------------------------------------------+--------+
| spark.sql.legacy.keepCommandOutputSchema | true |
+-------------------------------------------+--------+
1 row selected (0.055 seconds)
0: jdbc:hive2://localhost:10000> SHOW TABLE EXTENDED LIKE '*';
+-----------+---------------+--------------+----------------------------------------------------+
| database | tableName | isTemporary | information |
+-----------+---------------+--------------+----------------------------------------------------+
| default | test_parquet | false | CatalogTable(
Database: default
Table: test_parquet
Owner: yumwang
Created Time: Mon May 24 11:16:33 CST 2021
Last Access: UNKNOWN
Created By: Spark 3.2.0-SNAPSHOT
Type: MANAGED
Provider: hive
Table Properties: [transient_lastDdlTime=1621826201]
Statistics: 290 bytes
Location: file:/Users/yumwang/tmp/xxxx/spark/spark-warehouse/test_parquet
Serde Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Storage Properties: [serialization.format=1]
Partition Provider: Catalog
Schema: root
|-- id: long (nullable = false)
)
|
+-----------+---------------+--------------+----------------------------------------------------+
1 row selected (0.086 seconds)
beeline after:
0: jdbc:hive2://localhost:10000> SHOW TABLE EXTENDED LIKE '*';
+------------+---------------+--------------+----------------------------------------------------+
| namespace | tableName | isTemporary | information |
+------------+---------------+--------------+----------------------------------------------------+
| default | test_parquet | false | {"Created By":"Spark 3.2.0-SNAPSHOT","Created Time":"Mon May 24 11:16:33 CST 2021","Database":"default","InputFormat":"org.apache.hadoop.mapred.TextInputFormat","Last Access":"UNKNOWN","Location":"file:/Users/yumwang/tmp/xxxx/spark/spark-warehouse/test_parquet","OutputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat","Owner":"yumwang","Partition Provider":"Catalog","Provider":"hive","Schema":"root
|-- id: long (nullable = false)
","Serde Library":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe","Statistics":"290 bytes","Storage Properties":"[serialization.format=1]","Table Properties":"[transient_lastDdlTime=1621826201]","Table":"test_parquet","Type":"MANAGED"} |
+------------+---------------+--------------+----------------------------------------------------+
1 row selected (0.903 seconds)
|
Kubernetes integration test unable to build dist. exiting with code: 1 |
|
Test build #138966 has finished for PR 32563 at commit
|
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
Test build #138980 has finished for PR 32563 at commit
|
| command.executeCollect().map(_.getString(1)) | ||
| // SHOW TABLE EXTENDED in Hive only output the information column. | ||
| case command @ ExecutedCommandExec(s: ShowTablesCommand) if s.isExtended => | ||
| if (s.conf.getConf(SQLConf.LEGACY_KEEP_COMMAND_OUTPUT_SCHEMA)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think it's more robust to check the data type of s.output(3)
| isExtended: Boolean = false, | ||
| partitionSpec: Option[TablePartitionSpec] = None) extends LeafRunnableCommand { | ||
|
|
||
| private val keepLegacySchema = conf.getConf(SQLConf.LEGACY_KEEP_COMMAND_OUTPUT_SCHEMA) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
| Provider: parquet | ||
| Location [not included in comparison]/{warehouse_dir}/showdb.db/show_t2 | ||
| Schema: root | ||
| |-- b: string (nullable = true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we check other databases and see how they indicate nullable columns?
|
Kubernetes integration test starting |
|
Test build #139023 has finished for PR 32563 at commit
|
|
Kubernetes integration test status failure |
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
Test build #139027 has finished for PR 32563 at commit
|
| -- key: integer (nullable = true) | ||
| -- val: integer (nullable = true) | ||
| ), false | ||
| Schema: struct<key:int,val:int>), false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My last concern is the loss of the nullable info. How does other databases do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hive> SHOW TABLE EXTENDED LIKE '*';
OK
tableName:spark_32976
owner:yumwang
location:file:/tmp/spark/spark_32976
inputformat:org.apache.hadoop.mapred.TextInputFormat
outputformat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
columns:struct columns { i32 id, string name}
partitioned:true
partitionColumns:struct partition_columns { string part}
totalNumberFiles:unknown
totalFileSize:unknown
maxFileSize:unknown
minFileSize:unknown
lastAccessTime:unknown
lastUpdateTime:unknown
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
Test build #139234 has finished for PR 32563 at commit
|
| if (tracksPartitionsInCatalog) map.put("Partition Provider", "Catalog") | ||
| if (partitionColumnNames.nonEmpty) map.put("Partition Columns", partitionColumns) | ||
| if (schema.nonEmpty) map.put("Schema", schema.treeString) | ||
| if (schema.nonEmpty) map.put("Schema", schema.catalogString) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about removing this line. It seems other DBs does not contains Schema.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's no harm to keep it
| @@ -419,7 +419,7 @@ case class CatalogTable( | |||
| map ++= storage.toLinkedHashMap | |||
| if (tracksPartitionsInCatalog) map.put("Partition Provider", "Catalog") | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CatalogTable.toLinkedHashMap was used for display only, not lookup. Shall we revisit all the key names here?
| AttributeReference("tableName", StringType, nullable = false)(), | ||
| AttributeReference("isTemporary", BooleanType, nullable = false)(), | ||
| AttributeReference("information", StringType, nullable = false)()) | ||
| AttributeReference("information", MapType(StringType, StringType, false), nullable = false)()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel it's better to output a struct type here, so that users can easily know what fields they can query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's a general problem that most command outputs are not query-able. ESCRIBE EXTENDED tbl is also for display only.
|
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. |


What changes were proposed in this pull request?
Change
informationcolumn to map type forSHOW TABLE EXTENDEDcommand.Why are the changes needed?
Usually not all information is what we need and it has poor readability. After SPARK-35283 and this PR. We can get the need information by key:
Does this PR introduce any user-facing change?
Yes. The
informationcolumn type changed.How was this patch tested?
Unit test.