[SPARK-37878][SQL] Migrate SHOW CREATE TABLE to use v2 command by default #35204
Conversation
@cloud-fan @imback82 Could you take a look? Thank you very much.
@@ -58,7 +59,7 @@ private[sql] case class V1Table(v1Table: CatalogTable) extends Table {

   override lazy val properties: util.Map[String, String] = addV2TableProperties(v1Table).asJava

-  override lazy val schema: StructType = v1Table.schema
+  override lazy val schema: StructType = CharVarcharUtils.getRawSchema(v1Table.schema, SQLConf.get)
Why is this change needed?
When a v2 command works with a v1 table, we resolve the table by loading it through `V2SessionCatalog` and creating `V1Table(sessionCatalog.getTableMetadata)`. `getTableMetadata` replaces char/varchar with string in the schema, so the schema of `V1Table(t)` always has string type even when the original type is char/varchar. Hence we re-construct the raw schema of `V1Table` here.
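For context, the round trip described above can be sketched like this (a standalone sketch, not Spark's actual `CharVarcharUtils`; the `Field` type and the metadata key are simplified assumptions):

```scala
// Sketch of the char/varchar round trip: the catalog stores char/varchar
// columns as string plus a metadata entry holding the raw type, and
// getRawSchema restores the original type from that metadata.
case class Field(name: String, dataType: String, metadata: Map[String, String])

object CharVarcharSketch {
  // Assumed key under which the raw type string is preserved.
  val RawTypeKey = "__CHAR_VARCHAR_TYPE_STRING"

  // What the catalog does on read: char/varchar becomes string,
  // the raw type is stashed in the field metadata.
  def replaceWithString(f: Field): Field =
    if (f.dataType.startsWith("char") || f.dataType.startsWith("varchar"))
      f.copy(dataType = "string", metadata = f.metadata + (RawTypeKey -> f.dataType))
    else f

  // What getRawSchema does per field: restore the original type from metadata.
  def getRawField(f: Field): Field =
    f.metadata.get(RawTypeKey)
      .map(raw => f.copy(dataType = raw, metadata = f.metadata - RawTypeKey))
      .getOrElse(f)
}
```

Without the restore step, SHOW CREATE TABLE would print `string` for a column the user declared as `char(5)`.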
maybe we can remove #L496
I think we should restore the raw table schema in SHOW CREATE TABLE, not here.
Other v2 commands working with v1 tables may hit this problem too, but I will update as you said. Thank you very much.
done.
@@ -1054,6 +1054,15 @@ trait ShowCreateTableCommandBase {

   private def showViewText(metadata: CatalogTable, builder: StringBuilder): Unit = {
     builder ++= metadata.viewText.mkString("AS ", "", "\n")
   }

+  def quoted(identifier: TableIdentifier): String = {
Is it simply `identifier.asMultiPartIdentifier.quoted`?
`TableIdentifier` does not have `asMultiPartIdentifier`; `Identifier` does.
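The quote-only-if-needed rendering under discussion can be sketched standalone like this (the regex mirrors Spark's `quoteIfNeeded` rule; the two-part identifier is a simplified stand-in for `TableIdentifier`):

```scala
// Sketch: quote a name part with backticks only when it is not a plain
// identifier (alphanumeric/underscore and not digits-only), escaping
// embedded backticks by doubling them.
object QuotingSketch {
  def quoteIfNeeded(part: String): String =
    if (part.matches("[a-zA-Z0-9_]+") && !part.matches("\\d+")) part
    else s"`${part.replace("`", "``")}`"

  // Hypothetical stand-in for TableIdentifier: optional database + table name.
  def quoted(database: Option[String], table: String): String =
    (database.toSeq :+ table).map(quoteIfNeeded).mkString(".")
}
```

So `default.tbl` stays unquoted, while a name like `my-tbl` is rendered as `` `my-tbl` ``, matching the V2 output shown later in the golden files.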
sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala
   table.partitioning.foreach(t => transforms += t.describe())
   builder ++= s"PARTITIONED BY ${transforms.mkString("(", ", ", ")")}\n"
+  table match {
+    case V1Table(t) =>
We don't need to special-case v1 tables. We should recognize these special partition transforms (`bucket` and `sorted_bucket`) and generate the CLUSTERED BY syntax for them.
@huaxingao have we unified this part? We discussed this before, and the parser should already generate partition transforms for the bucket spec instead of `BucketSpec`.
Haven't changed this yet. Remember that `c.partitioning ++ c.tableSpec.bucketSpec.map(_.asTransform)` didn't work inside `AstBuilder` because `sortColumnNames` needed to be empty. Now it should work. I can do a follow-up to fix this.
+  // compatible with v1
+  bucketSpec.map { bucket =>
+    if (bucket.bucketColumnNames.nonEmpty) {
nit: we can turn this `if` into an assert, as this can never happen.
   val showProps = table.properties.asScala
     .filterKeys(key => !CatalogV2Util.TABLE_RESERVED_PROPERTIES.contains(key)
       && !key.startsWith(TableCatalog.OPTION_PREFIX)
-      && !tableOptions.contains(key))
+      && !tableOptions.contains(key)
+      && !key.equals(TableCatalog.PROP_EXTERNAL))
Isn't it covered by `TABLE_RESERVED_PROPERTIES`?
`PROP_EXTERNAL` is not included in `TABLE_RESERVED_PROPERTIES`. Do you think it should be included there?
yea it should, it's indeed a reserved property
done.
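The property filtering being discussed can be sketched standalone like this (the reserved-property set and option prefix are illustrative stand-ins for the `CatalogV2Util` / `TableCatalog` constants, with `external` now treated as reserved):

```scala
// Sketch: drop reserved properties, option-prefixed keys, and keys already
// rendered as OPTIONS from the TBLPROPERTIES output of SHOW CREATE TABLE.
object PropsFilterSketch {
  // Illustrative subset; the real set lives in CatalogV2Util.
  val TableReservedProperties = Set("provider", "location", "owner", "comment", "external")
  val OptionPrefix = "option."

  def showableProps(props: Map[String, String],
                    tableOptions: Map[String, String]): Map[String, String] =
    props.filter { case (key, _) =>
      !TableReservedProperties.contains(key) &&
      !key.startsWith(OptionPrefix) &&
      !tableOptions.contains(key)
    }
}
```

Once `external` is in the reserved set, the extra `!key.equals(...)` check in the diff above becomes unnecessary.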
@@ -57,7 +60,7 @@ case class ShowCreateTableExec(
   }

   private def showTableDataColumns(table: Table, builder: StringBuilder): Unit = {
-    val columns = table.schema().fields.map(_.toDDL)
+    val columns = CharVarcharUtils.getRawSchema(table.schema(), SQLConf.get).fields.map(_.toDDL)
Suggested change:
-    val columns = CharVarcharUtils.getRawSchema(table.schema(), SQLConf.get).fields.map(_.toDDL)
+    val columns = CharVarcharUtils.getRawSchema(table.schema(), conf).fields.map(_.toDDL)

We are within a physical plan, so we can access `conf` here.
@@ -71,8 +74,9 @@ case class ShowCreateTableExec(
       builder: StringBuilder,
       tableOptions: Map[String, String]): Unit = {
     if (tableOptions.nonEmpty) {
-      val props = tableOptions.toSeq.sortBy(_._1).map { case (key, value) =>
-        s"'${escapeSingleQuotedString(key)}' = '${escapeSingleQuotedString(value)}'"
+      val props = SQLConf.get.redactOptions(tableOptions).toSeq.sortBy(_._1).map {
ditto
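The redaction applied in the diff above works in the spirit of this standalone sketch (the sensitive-key pattern and mask string are assumptions; the real behavior is driven by `spark.sql.redaction.string.regex` and friends):

```scala
// Sketch: mask the values of sensitive-looking option keys before the
// options are rendered into the SHOW CREATE TABLE output.
object RedactSketch {
  val sensitiveKey = "(?i)secret|password|token".r

  def redactOptions(opts: Map[String, String]): Map[String, String] =
    opts.map { case (k, v) =>
      if (sensitiveKey.findFirstIn(k).isDefined) k -> "*********(redacted)" else k -> v
    }
}
```

Without this, credentials passed as table options would be echoed verbatim in the generated DDL.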
@@ -257,7 +257,7 @@ SHOW CREATE TABLE tbl
 -- !query schema
 struct<createtab_stmt:string>
 -- !query output
-CREATE TABLE `default`.`tbl` (
+CREATE TABLE default.tbl (
   `a` FLOAT,
To be consistent, shall we also quote the column name only if needed? This can be done in a new PR, as it's not related to the v2 command migration.
ok. I will do it.
@cloud-fan Updated the PR. As the testcase failed (#testcase), maybe adding `external` to `TABLE_RESERVED_PROPERTIES` in a separate PR is more suitable?
do you know why the test failed?
without
@Peng-Lei can you be more specific? What's the expectation of the tests and how do we break it? I see that we exclude
expectation of the tests:
At first I was really confused why filter
@Peng-Lei We should update
thanks, merging to master!
…me if needed

### What changes were proposed in this pull request?
Quote the column name only when needed, instead of always.

### Why are the changes needed?
[#comments](#35204 (comment))

### Does this PR introduce _any_ user-facing change?
Yes, it changes the schema string that users get, e.g.:
```
"STRUCT<`_c0`: STRING, `_c1`: INT>" => "STRUCT<_c0: STRING, _c1: INT>"
```
As of now, there are three ways an end-user can get a schema directly:
1. the functions `schema_of_json` and `schema_of_csv`
2. the table schema, via `df.schema` or SHOW CREATE TABLE
3. calling `toDDL` on a StructType or StructField

### How was this patch tested?
Existing testcases.

Closes #35227 from Peng-Lei/Quote-Column.

Authored-by: PengLei <peng.8lei@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
Take `external` as a reserved table property, and do not allow end-users to use `external` when `spark.sql.legacy.notReserveProperties` == `false`.

### Why are the changes needed?
[#discuss](#35204 (comment)). Keep it consistent with other properties like `location`, `owner`, `provider` and so on.

### Does this PR introduce _any_ user-facing change?
Yes. End-users can no longer use `external` as a property key when creating a table with TBLPROPERTIES or altering a table to set TBLPROPERTIES.

### How was this patch tested?
Existing testcases.

Closes #35268 from Peng-Lei/SPARK-37950.

Authored-by: PengLei <peng.8lei@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…n" property

### What changes were proposed in this pull request?
This is a followup of #35204, which introduced a potential regression: it removes the "location" table property from `V1Table` if the table is not external. The intention was to avoid putting the LOCATION clause for managed tables in `ShowCreateTableExec`. However, if we use the v2 DESCRIBE TABLE command by default in the future, this will bring a behavior change, and the v2 DESCRIBE TABLE command won't print the table location for managed tables.

This PR fixes the regression with a different approach to the SHOW CREATE TABLE issue:
1. Introduce a new reserved table property `is_managed_location`, to indicate that the location is managed by the catalog, not user-given.
2. `ShowCreateTableExec` only generates the LOCATION clause if the "location" property is present and is not managed.

### Why are the changes needed?
Avoid a potential regression.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing tests. We can add a test when we use the v2 DESCRIBE TABLE command by default.

Closes #36498 from cloud-fan/regression.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(The followup above was also cherry-picked from commit fa2bda5 and signed off by Wenchen Fan <wenchen@databricks.com>.)
What changes were proposed in this pull request?
1. Add `quoted(identifier: TableIdentifier)` to quote the table name of the V1 commands (SHOW CREATE TABLE [AS SERDE]) to match V2 behavior; it quotes only when needed (`quoteIfNeeded`).
2. In `addV2TableProperties` of `V1Table`, add the `location` property only when `external == true`.
3. In the `schema` of `V1Table`, re-construct the original schema from the type string for SHOW CREATE TABLE.
4. In SHOW CREATE TABLE, show table partitions (`ShowTablePartitions`) to match V1 behavior.

Why are the changes needed?
It's been a while since we introduced the v2 commands, and it seems reasonable to use v2 commands by default even for the session catalog, with a legacy config to fall back to the v1 commands.

Does this PR introduce any user-facing change?
The V2 command is used by default for `show create table`; if `LEGACY_USE_V1_COMMAND == true`, the V1 command is used.

How was this patch tested?
build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *ShowCreateTableSuite"