Skip to content

Conversation

@wangyum
Copy link
Member

@wangyum wangyum commented May 16, 2021

What changes were proposed in this pull request?

Change information column to map type for SHOW TABLE EXTENDED command.

Why are the changes needed?

Usually not all information is what we need and it has poor readability. After SPARK-35283 and this PR. We can get the need information by key:

WITH s AS (SHOW TABLE EXTENDED LIKE '*') SELECT tableName, information['Provider'] FROM s
+------------+---------------------+
|tableName   |information[Provider]|
+------------+---------------------+
|test_delta  |delta                |
|test_parquet|parquet              |
+------------+---------------------+

Does this PR introduce any user-facing change?

Yes. The information column type changed.

How was this patch tested?

Unit test.

@github-actions github-actions bot added the SQL label May 16, 2021
Comment on lines 68 to 72
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hive output:

hive> SHOW TABLE EXTENDED LIKE '*';
OK
tableName:spark_32976
owner:yumwang
location:file:/tmp/spark/spark_32976
inputformat:org.apache.hadoop.mapred.TextInputFormat
outputformat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
columns:struct columns { i32 id, string name}
partitioned:true
partitionColumns:struct partition_columns { string part}
totalNumberFiles:unknown
totalFileSize:unknown
maxFileSize:unknown
minFileSize:unknown
lastAccessTime:unknown
lastUpdateTime:unknown

tableName:t1
owner:yumwang
location:file:/tmp/hive/t1
inputformat:org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
outputformat:org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
columns:struct columns { string id}
partitioned:true
partitionColumns:struct partition_columns { date part}
totalNumberFiles:unknown
totalFileSize:unknown
maxFileSize:unknown
minFileSize:unknown
lastAccessTime:unknown
lastUpdateTime:unknown

@SparkQA
Copy link

SparkQA commented May 16, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43110/

@SparkQA
Copy link

SparkQA commented May 16, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43110/

@SparkQA
Copy link

SparkQA commented May 16, 2021

Test build #138589 has finished for PR 32563 at commit d2e87b8.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-35415][SQL] Change information to map type for SHOW TABLE EXTENDED command [SPARK-35415][SQL] Change information to map type for SHOW TABLE EXTENDED command May 16, 2021
Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the change in JDBCSuite, we need to add this change into the SQL migration guide. Could you add some, please, @wangyum ?

@github-actions github-actions bot added the DOCS label May 17, 2021
@SparkQA
Copy link

SparkQA commented May 17, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43136/

@SparkQA
Copy link

SparkQA commented May 17, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43136/

@SparkQA
Copy link

SparkQA commented May 17, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43142/

@SparkQA
Copy link

SparkQA commented May 17, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43142/

@SparkQA
Copy link

SparkQA commented May 17, 2021

Test build #138615 has finished for PR 32563 at commit f936eff.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 17, 2021

Test build #138622 has finished for PR 32563 at commit c8c4853.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 18, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43207/

@SparkQA
Copy link

SparkQA commented May 18, 2021

Test build #138686 has finished for PR 32563 at commit 0ffbce4.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 18, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43208/

@SparkQA
Copy link

SparkQA commented May 18, 2021

Test build #138687 has finished for PR 32563 at commit ee70e0b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 19, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43219/

@SparkQA
Copy link

SparkQA commented May 19, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43219/

Comment on lines 616 to 628
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move the spark.sql.legacy.keepCommandOutputSchema logic from ResolveSessionCatalog to v2Commands.

@SparkQA
Copy link

SparkQA commented May 19, 2021

Test build #138698 has finished for PR 32563 at commit 8c1e5ce.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't do this, as the output expr ID becomes unstable and it changes after the plan is copied.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the reason why we created object DescribeNamespace and others. We shouldn't revert it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which is the information column?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The output only contains information column.

This is the change of HiveResult.

Example of Hive output :

hive> SHOW TABLE EXTENDED LIKE '*';
OK
tableName:spark_32976
owner:yumwang
location:file:/tmp/spark/spark_32976
inputformat:org.apache.hadoop.mapred.TextInputFormat
outputformat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
columns:struct columns { i32 id, string name}
partitioned:true
partitionColumns:struct partition_columns { string part}
totalNumberFiles:unknown
totalFileSize:unknown
maxFileSize:unknown
minFileSize:unknown
lastAccessTime:unknown
lastUpdateTime:unknown

tableName:t1
owner:yumwang
location:file:/tmp/hive/t1
inputformat:org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
outputformat:org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
columns:struct columns { string id}
partitioned:true
partitionColumns:struct partition_columns { date part}
totalNumberFiles:unknown
totalFileSize:unknown
maxFileSize:unknown
minFileSize:unknown
lastAccessTime:unknown
lastUpdateTime:unknown

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The result doesn't match the command output schema?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I'm confused. How can we allow a mismatch between the schema and data?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's only for thrift-server and the SQL shell, but not sql(...).show?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is only used in the SQL shell, not by the thrift server. Otherwise, the JDBC ResultSet will get wrong for mapping the metadata and column results.

If we turn on the spark.sql.cli.print.header=true for SQL shell

  1. w/o this PR, the information column matches the first element of the map, then EOL. The rest of the map will print line by line, with wrong/no indentations.
  2. w/ this PR, I guess the schema header and the result more inappropriate matched

Copy link
Member Author

@wangyum wangyum May 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spark-sql before:

spark-sql> set spark.sql.legacy.keepCommandOutputSchema=true;
spark.sql.legacy.keepCommandOutputSchema	true
Time taken: 0.012 seconds, Fetched 1 row(s)
spark-sql> SHOW TABLE EXTENDED LIKE '*';
default	test_parquet	false	CatalogTable(
Database: default
Table: test_parquet
Owner: yumwang
Created Time: Mon May 24 11:16:33 CST 2021
Last Access: UNKNOWN
Created By: Spark 3.2.0-SNAPSHOT
Type: MANAGED
Provider: hive
Table Properties: [transient_lastDdlTime=1621826201]
Statistics: 290 bytes
Location: file:/Users/yumwang/tmp/xxxx/spark/spark-warehouse/test_parquet
Serde Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Storage Properties: [serialization.format=1]
Partition Provider: Catalog
Schema: root
 |-- id: long (nullable = false)
)

Time taken: 0.031 seconds, Fetched 1 row(s)

spark-sql after:

spark-sql> SHOW TABLE EXTENDED LIKE '*';
default	test_parquet	false	{"Created By":"Spark 3.2.0-SNAPSHOT","Created Time":"Mon May 24 11:16:33 CST 2021","Database":"default","InputFormat":"org.apache.hadoop.mapred.TextInputFormat","Last Access":"UNKNOWN","Location":"file:/Users/yumwang/tmp/xxxx/spark/spark-warehouse/test_parquet","OutputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat","Owner":"yumwang","Partition Provider":"Catalog","Provider":"hive","Schema":"root
 |-- id: long (nullable = false)
","Serde Library":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe","Statistics":"290 bytes","Storage Properties":"[serialization.format=1]","Table Properties":"[transient_lastDdlTime=1621826201]","Table":"test_parquet","Type":"MANAGED"}
Time taken: 0.043 seconds, Fetched 1 row(s)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

beeline before:

0: jdbc:hive2://localhost:10000> set spark.sql.legacy.keepCommandOutputSchema=true;
+-------------------------------------------+--------+
|                    key                    | value  |
+-------------------------------------------+--------+
| spark.sql.legacy.keepCommandOutputSchema  | true   |
+-------------------------------------------+--------+
1 row selected (0.055 seconds)
0: jdbc:hive2://localhost:10000> SHOW TABLE EXTENDED LIKE '*';
+-----------+---------------+--------------+----------------------------------------------------+
| database  |   tableName   | isTemporary  |                    information                     |
+-----------+---------------+--------------+----------------------------------------------------+
| default   | test_parquet  | false        | CatalogTable(
Database: default
Table: test_parquet
Owner: yumwang
Created Time: Mon May 24 11:16:33 CST 2021
Last Access: UNKNOWN
Created By: Spark 3.2.0-SNAPSHOT
Type: MANAGED
Provider: hive
Table Properties: [transient_lastDdlTime=1621826201]
Statistics: 290 bytes
Location: file:/Users/yumwang/tmp/xxxx/spark/spark-warehouse/test_parquet
Serde Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Storage Properties: [serialization.format=1]
Partition Provider: Catalog
Schema: root
 |-- id: long (nullable = false)
)
 |
+-----------+---------------+--------------+----------------------------------------------------+
1 row selected (0.086 seconds)

beeline after:

0: jdbc:hive2://localhost:10000> SHOW TABLE EXTENDED LIKE '*';
+------------+---------------+--------------+----------------------------------------------------+
| namespace  |   tableName   | isTemporary  |                    information                     |
+------------+---------------+--------------+----------------------------------------------------+
| default    | test_parquet  | false        | {"Created By":"Spark 3.2.0-SNAPSHOT","Created Time":"Mon May 24 11:16:33 CST 2021","Database":"default","InputFormat":"org.apache.hadoop.mapred.TextInputFormat","Last Access":"UNKNOWN","Location":"file:/Users/yumwang/tmp/xxxx/spark/spark-warehouse/test_parquet","OutputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat","Owner":"yumwang","Partition Provider":"Catalog","Provider":"hive","Schema":"root
 |-- id: long (nullable = false)
","Serde Library":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe","Statistics":"290 bytes","Storage Properties":"[serialization.format=1]","Table Properties":"[transient_lastDdlTime=1621826201]","Table":"test_parquet","Type":"MANAGED"} |
+------------+---------------+--------------+----------------------------------------------------+
1 row selected (0.903 seconds)

@SparkQA
Copy link

SparkQA commented May 26, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43485/

@SparkQA
Copy link

SparkQA commented May 26, 2021

Test build #138966 has finished for PR 32563 at commit 44265e7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 26, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43499/

@SparkQA
Copy link

SparkQA commented May 26, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43499/

@SparkQA
Copy link

SparkQA commented May 26, 2021

Test build #138980 has finished for PR 32563 at commit ede029c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

command.executeCollect().map(_.getString(1))
// SHOW TABLE EXTENDED in Hive only output the information column.
case command @ ExecutedCommandExec(s: ShowTablesCommand) if s.isExtended =>
if (s.conf.getConf(SQLConf.LEGACY_KEEP_COMMAND_OUTPUT_SCHEMA)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think it's more robust to check the data type of s.output(3)

isExtended: Boolean = false,
partitionSpec: Option[TablePartitionSpec] = None) extends LeafRunnableCommand {

private val keepLegacySchema = conf.getConf(SQLConf.LEGACY_KEEP_COMMAND_OUTPUT_SCHEMA)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Provider: parquet
Location [not included in comparison]/{warehouse_dir}/showdb.db/show_t2
Schema: root
|-- b: string (nullable = true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we check other databases and see how they indicate nullable columns?

@SparkQA
Copy link

SparkQA commented May 27, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43540/

@SparkQA
Copy link

SparkQA commented May 27, 2021

Test build #139023 has finished for PR 32563 at commit 521396e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 27, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43540/

@SparkQA
Copy link

SparkQA commented May 27, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43542/

@SparkQA
Copy link

SparkQA commented May 27, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43542/

@SparkQA
Copy link

SparkQA commented May 27, 2021

Test build #139027 has finished for PR 32563 at commit 320e72b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

-- key: integer (nullable = true)
-- val: integer (nullable = true)
), false
Schema: struct<key:int,val:int>), false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My last concern is the loss of the nullable info. How does other databases do?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems other databases do not include Schema when SHOW EXTENDED TABLES:
image
image

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hive> SHOW TABLE EXTENDED LIKE '*';
OK
tableName:spark_32976
owner:yumwang
location:file:/tmp/spark/spark_32976
inputformat:org.apache.hadoop.mapred.TextInputFormat
outputformat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
columns:struct columns { i32 id, string name}
partitioned:true
partitionColumns:struct partition_columns { string part}
totalNumberFiles:unknown
totalFileSize:unknown
maxFileSize:unknown
minFileSize:unknown
lastAccessTime:unknown
lastUpdateTime:unknown

@SparkQA
Copy link

SparkQA commented Jun 2, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43757/

@SparkQA
Copy link

SparkQA commented Jun 2, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43757/

@SparkQA
Copy link

SparkQA commented Jun 2, 2021

Test build #139234 has finished for PR 32563 at commit 7b1520e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

if (tracksPartitionsInCatalog) map.put("Partition Provider", "Catalog")
if (partitionColumnNames.nonEmpty) map.put("Partition Columns", partitionColumns)
if (schema.nonEmpty) map.put("Schema", schema.treeString)
if (schema.nonEmpty) map.put("Schema", schema.catalogString)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about removing this line. It seems other DBs does not contains Schema.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's no harm to keep it

@@ -419,7 +419,7 @@ case class CatalogTable(
map ++= storage.toLinkedHashMap
if (tracksPartitionsInCatalog) map.put("Partition Provider", "Catalog")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CatalogTable.toLinkedHashMap was used for display only, not lookup. Shall we revisit all the key names here?

AttributeReference("tableName", StringType, nullable = false)(),
AttributeReference("isTemporary", BooleanType, nullable = false)(),
AttributeReference("information", StringType, nullable = false)())
AttributeReference("information", MapType(StringType, StringType, false), nullable = false)())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel it's better to output a struct type here, so that users can easily know what fields they can query.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a general problem that most command outputs are not query-able. ESCRIBE EXTENDED tbl is also for display only.

@github-actions
Copy link

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Sep 16, 2021
@github-actions github-actions bot closed this Sep 17, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants