Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-37878][SQL] Migrate SHOW CREATE TABLE to use v2 command by default #35204

Closed
wants to merge 1 commit into from

Conversation

Peng-Lei
Copy link
Contributor

@Peng-Lei Peng-Lei commented Jan 14, 2022

What changes were proposed in this pull request?

  1. Add quoted(identifier: TableIdentifier) to quoted the table name of V1 command(SHOW CREATE TABLE[AS SERDE]) to match V2 behavior. It just work when quoteIfNeeded
  2. Change addV2TableProperties of V1Table. Just when external == true, we will add location property.
  3. Change V1Table.Schema, re-construct the original schema from the string.
  4. Use V2 command as default for SHOW CRATE TABLE
  5. Change V2 behavior ShowTablePartitions to match V1 behavior.

Why are the changes needed?

It's been a while since we introduced the v2 commands, and it seems reasonable to use v2 commands by default even for the session catalog, with a legacy config to fall back to the v1 commands.

Does this PR introduce any user-facing change?

use V2 command as default for show create table
if LEGACY_USE_V1_COMMAND == true
will use V1 command

How was this patch tested?

build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *ShowCreateTableSuite"

@Peng-Lei
Copy link
Contributor Author

@cloud-fan @imback82 Could you take a look? Thank you very much.

@@ -58,7 +59,7 @@ private[sql] case class V1Table(v1Table: CatalogTable) extends Table {

override lazy val properties: util.Map[String, String] = addV2TableProperties(v1Table).asJava

override lazy val schema: StructType = v1Table.schema
override lazy val schema: StructType = CharVarcharUtils.getRawSchema(v1Table.schema, SQLConf.get)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when v2 command work with v1 table. when we resolve the table. we load the table by V2SessionCatalog, then create V1Table(sessionCatalog.getTableMetadata), here when getTableMetadata, we will replace char/varchar with string in schema. So V1Table(t) t is alway with string type although type is char/varchar. So re-construct the schema of V1Table.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we can remove #L496

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should restore the raw table schema in SHOW CREATE TABLE, not here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

other v2 command works v1 table. maybe encounter this problem. but I will update as you said. thank you very much.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@@ -1054,6 +1054,15 @@ trait ShowCreateTableCommandBase {
private def showViewText(metadata: CatalogTable, builder: StringBuilder): Unit = {
builder ++= metadata.viewText.mkString("AS ", "", "\n")
}

def quoted(identifier: TableIdentifier): String = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it simply identifier.asMultiPartIdentifier.quoted?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TableIdentifier could not asMultiPartIdentifier, Identifier can asMultiPartIdentifier

table.partitioning.foreach(t => transforms += t.describe())
builder ++= s"PARTITIONED BY ${transforms.mkString("(", ", ", ")")}\n"
table match {
case V1Table(t) =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to special-case v1 table. We should recognize these special partition transforms (bucket and sorted_bucket) and generate the CLUSTERED BY syntax for them.

@huaxingao have we unified this part? We discussed this before and the parser should generate partition transforms for bucket spec already, instead of BucketSpec

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't changed this yet. Remember that
c.partitioning ++ c.tableSpec.bucketSpec.map(_.asTransform) didn't work inside AstBuilder because sortColumnNames needed to be empty. Now it should work. I can have a follow up to fix this.

@Peng-Lei Peng-Lei force-pushed the SPARK-37878 branch 3 times, most recently from 643673e to 3b8bde2 Compare January 14, 2022 12:19
@Peng-Lei Peng-Lei force-pushed the SPARK-37878 branch 3 times, most recently from 55fd5e9 to 7b52ca9 Compare January 14, 2022 16:50

// compatible with v1
bucketSpec.map { bucket =>
if (bucket.bucketColumnNames.nonEmpty) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we can turn this if into an assert, as this can never happen

val showProps = table.properties.asScala
.filterKeys(key => !CatalogV2Util.TABLE_RESERVED_PROPERTIES.contains(key)
&& !key.startsWith(TableCatalog.OPTION_PREFIX)
&& !tableOptions.contains(key))
&& !tableOptions.contains(key)
&& !key.equals(TableCatalog.PROP_EXTERNAL))
Copy link
Contributor

@cloud-fan cloud-fan Jan 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't it convered by TABLE_RESERVED_PROPERTIES?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As PROP_EXTERNA is not include by TABLE_RESERVED_PROPERTIES. do you think it should be included by TABLE_RESERVED_PROPERTIES ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea it should, it's indeed a reserved property

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@@ -57,7 +60,7 @@ case class ShowCreateTableExec(
}

private def showTableDataColumns(table: Table, builder: StringBuilder): Unit = {
val columns = table.schema().fields.map(_.toDDL)
val columns = CharVarcharUtils.getRawSchema(table.schema(), SQLConf.get).fields.map(_.toDDL)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
val columns = CharVarcharUtils.getRawSchema(table.schema(), SQLConf.get).fields.map(_.toDDL)
val columns = CharVarcharUtils.getRawSchema(table.schema(), conf).fields.map(_.toDDL)

We are within a physical plan, we can access conf here.

@@ -71,8 +74,9 @@ case class ShowCreateTableExec(
builder: StringBuilder,
tableOptions: Map[String, String]): Unit = {
if (tableOptions.nonEmpty) {
val props = tableOptions.toSeq.sortBy(_._1).map { case (key, value) =>
s"'${escapeSingleQuotedString(key)}' = '${escapeSingleQuotedString(value)}'"
val props = SQLConf.get.redactOptions(tableOptions).toSeq.sortBy(_._1).map {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@@ -257,7 +257,7 @@ SHOW CREATE TABLE tbl
-- !query schema
struct<createtab_stmt:string>
-- !query output
CREATE TABLE `default`.`tbl` (
CREATE TABLE default.tbl (
`a` FLOAT,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to be consistent, shall we also only quote the column name if needed? This is can be done in a new PR, as it's not related to v2 command migration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. I will do it.

@Peng-Lei
Copy link
Contributor Author

@cloud-fan Update the PR. As the testcase failed #testcase. Maybe "Add the external to TABLE_RESERVED_PROPERTIES" in separate PR is more suitable?

@cloud-fan
Copy link
Contributor

do you know why the test failed?

@Peng-Lei
Copy link
Contributor Author

Peng-Lei commented Jan 17, 2022

do you know why the test failed?

without filterNot(_ == PROP_EXTERNAL)

@cloud-fan
Copy link
Contributor

@Peng-Lei can you be more specific? What's the expectation of the tests and how do we break it?

I see that we exclude COMMENT there, as we do want to allow users to specify the COMMENT property directly. Do we have the same expectation for EXTERNAL? If yes, where is the code?

@Peng-Lei
Copy link
Contributor Author

@Peng-Lei can you be more specific? What's the expectation of the tests and how do we break it?

I see that we exclude COMMENT there, as we do want to allow users to specify the COMMENT property directly. Do we have the same expectation for EXTERNAL? If yes, where is the code?

expectation of the tests:

  • if LEGACY_PROPERTY_NON_RESERVED == false. That means we reserved table properties. so we could not use key of reserved properties for create/replace table options/tblproperties ... and set/unset tblproperties. so result is throw a ParseException.
  • if LEGACY_PROPERTY_NON_RESERVED == true. That means we can use reserved key for create/replace table options/tblproperties ... and set/unset tblproperties. but have no effect.

At first I was really confused why filter COMMENT and plan to do a new PR to fix it so that COMMENT and EXTERNAL are same with other reserved properties key. so modify the code of ASTBuilder.scala first. anthor reason I think EXTERNAL is a little special, as is should be consistent with whether the location is defined. But at now, I am wrong. Sorry. I should look more code to keep the same expectation with COMMENT. @cloud-fan

@cloud-fan
Copy link
Contributor

@Peng-Lei We should update AstBuilder.cleanTableProperties, to make EXTERNAL a truly reserved property. Let's open a new PR for it as it's a breaking change.

@Peng-Lei
Copy link
Contributor Author

@Peng-Lei We should update AstBuilder.cleanTableProperties, to make EXTERNAL a truly reserved property. Let's open a new PR for it as it's a breaking change.

The issue is created. I'm so sorry for spending a lot of your time on review this PR. Thank you very much for your patient code review.

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 2c825d1 Jan 18, 2022
cloud-fan pushed a commit that referenced this pull request Jan 20, 2022
…me if needed

### What changes were proposed in this pull request?
Quote the column name just needed instead of anyway.

### Why are the changes needed?
[#comments](#35204 (comment))

### Does this PR introduce _any_ user-facing change?
Yes,It will change the result that users get the schema
eg:
```
"STRUCT<`_c0`: STRING, `_c1`: INT>"  => "STRUCT<_c0: STRING, _c1: INT>"
```
At now. for end-user. I learn about the 3 way to get schema directly
1. the function: eg
 ```
schema_of_json
schema_of_csv
```
2. table schema
    df.schema or show create table
3. call `toDDL` for StructType or StructField.

### How was this patch tested?
existed testcase.

Closes #35227 from Peng-Lei/Quote-Column.

Authored-by: PengLei <peng.8lei@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
cloud-fan pushed a commit that referenced this pull request Jan 21, 2022
### What changes were proposed in this pull request?

Take `external` as a reserved table property. and do not allow use `external` for end-user when `spark.sql.legacy.notReserveProperties` == `false`.

### Why are the changes needed?

[#disscuss](#35204 (comment)).
keep it consistent with other properties like `location` `owner` `provider` and so on.

### Does this PR introduce _any_ user-facing change?
Yes. end-user could not use `external` as property key when create table with tblproperties and alter table set tblproperties.

### How was this patch tested?
existed testcase.

Closes #35268 from Peng-Lei/SPARK-37950.

Authored-by: PengLei <peng.8lei@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
cloud-fan added a commit that referenced this pull request May 11, 2022
…n" property

### What changes were proposed in this pull request?

This is a followup of #35204 . #35204 introduced a potential regression: it removes the "location" table property from `V1Table` if the table is not external. The intention was to avoid putting the LOCATION clause for managed tables in `ShowCreateTableExec`. However, if we use the v2 DESCRIBE TABLE command by default in the future, this will bring a behavior change and v2 DESCRIBE TABLE command won't print the table location for managed tables.

This PR fixes this regression by using a different idea to fix the SHOW CREATE TABLE issue:
1. introduce a new reserved table property `is_managed_location`, to indicate that the location is managed by the catalog, not user given.
2. `ShowCreateTableExec` only generates the LOCATION clause if the "location" property is present and is not managed.

### Why are the changes needed?

avoid a potential regression

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

existing tests. We can add a test when we use v2 DESCRIBE TABLE command by default.

Closes #36498 from cloud-fan/regression.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
cloud-fan added a commit that referenced this pull request May 11, 2022
…n" property

### What changes were proposed in this pull request?

This is a followup of #35204 . #35204 introduced a potential regression: it removes the "location" table property from `V1Table` if the table is not external. The intention was to avoid putting the LOCATION clause for managed tables in `ShowCreateTableExec`. However, if we use the v2 DESCRIBE TABLE command by default in the future, this will bring a behavior change and v2 DESCRIBE TABLE command won't print the table location for managed tables.

This PR fixes this regression by using a different idea to fix the SHOW CREATE TABLE issue:
1. introduce a new reserved table property `is_managed_location`, to indicate that the location is managed by the catalog, not user given.
2. `ShowCreateTableExec` only generates the LOCATION clause if the "location" property is present and is not managed.

### Why are the changes needed?

avoid a potential regression

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

existing tests. We can add a test when we use v2 DESCRIBE TABLE command by default.

Closes #36498 from cloud-fan/regression.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit fa2bda5)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants