[SPARK-27946][SQL] Hive DDL to Spark DDL conversion USING "show create table" #24938
Conversation
Test build #106805 has finished for PR 24938 at commit
Test build #106809 has finished for PR 24938 at commit
Thanks for doing this!
test("simple hive table as spark") {
  withTable("t1") {
    sql(
      s"""CREATE TABLE t1 (
nit: format issue? #25204 (comment)
yes, fixed.
s"'${escapeSingleQuotedString(key)}' = '${escapeSingleQuotedString(value)}'"
val stmt = if (DDLUtils.isDatasourceTable(tableMetadata)) {
  throw new AnalysisException(
    s"$table is already a Spark data source table. Using `SHOW CREATE TABLE` instead.")
Nit: "Using" -> "Use" or "Please use"?
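For context, the property rendering quoted above can be sketched standalone. This is a hedged, simplified stand-in: Spark's actual `escapeSingleQuotedString` lives in a catalyst util object, and this local version only mirrors the single-quote case for illustration.

```scala
// Simplified stand-in for Spark's escapeSingleQuotedString (illustrative only).
def escapeSingleQuotedString(str: String): String =
  str.flatMap {
    case '\'' => "\\'"       // escape embedded single quotes
    case ch   => ch.toString
  }

// Render one TBLPROPERTIES entry the way the quoted snippet does.
def propertyEntry(key: String, value: String): String =
  s"'${escapeSingleQuotedString(key)}' = '${escapeSingleQuotedString(value)}'"
```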
@viirya One question: if it's a Hive transactional table, should we error out?
@dilipbiswal Good question! Currently, it looks like we can tell whether it is a Hive transactional table by looking into its table properties.
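The property-based detection being discussed can be sketched as follows; a hedged sketch where the plain `Map` stands in for `CatalogTable.properties`:

```scala
// Detect a Hive transactional table from its table properties.
// The Map parameter stands in for CatalogTable.properties (assumption).
def isHiveTransactional(properties: Map[String, String]): Boolean =
  properties.getOrElse("transactional", "false").equalsIgnoreCase("true")
```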
thanks @viirya
Test build #108550 has finished for PR 24938 at commit
retest this please
Test build #108555 has finished for PR 24938 at commit
private def showDataSourceTableOptions(metadata: CatalogTable, builder: StringBuilder): Unit = {
  builder ++= s"USING ${metadata.provider.get}\n"
// scalastyle:off caselocale
if (tableMetadata.properties.getOrElse("transactional", "false").toLowerCase.equals("true")) {
Should we just move the property to `unsupportedFeatures` of `CatalogTable`?
IMHO it's a good idea. I am not sure what happens today when we try to select from a Hive transactional table. If we add it to unsupported features, will we get an error during select?
I haven't tried it, but it seems you can read such a table and no results are returned, as SPARK-15348 and SPARK-16996 track. Currently I don't see us enforcing limits based on `unsupportedFeatures` when reading a table. Anyway, it is still fine to leave it as is; I just raised this as a question.
In SQL, the keyword "as" is usually used for aliasing. How about changing the syntax to …?
I'm fine with it. WDYT? @gatorsmile
retest this please
Test build #111313 has finished for PR 24938 at commit
cc @gatorsmile
@viirya I like the idea, but the keyword feels a bit confusing to me.
Test build #113951 has finished for PR 24938 at commit
@@ -196,7 +196,7 @@ statement
     | SHOW PARTITIONS multipartIdentifier partitionSpec?           #showPartitions
     | SHOW identifier? FUNCTIONS
         (LIKE? (multipartIdentifier | pattern=STRING))?            #showFunctions
-    | SHOW CREATE TABLE multipartIdentifier                        #showCreateTable
+    | SHOW CREATE TABLE multipartIdentifier (AS SPARK)?            #showCreateTable
After rethinking it, let us make it more aggressive here. Instead of creating Spark native tables for the existing Hive serde tables, we can try to always show how to create Spark native tables if possible. This will further simplify the migration from Hive to Spark.
For the existing Spark users who prefer keeping Hive serde formats, we can introduce a new option `AS SERDE`, which will keep the behavior of Spark 2.4 or prior.
+1. The new proposal makes more sense!
A bit confusing; let me confirm. So you mean let SHOW CREATE TABLE work as `AS SPARK` would by default (so no new `AS SPARK` option is added), and only fall back to the current behavior (show how to create a Hive serde table) when given `AS SERDE`?
Yes!
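The dispatch agreed above can be sketched as a tiny decision function; a hedged sketch where the types are illustrative, not Spark's actual command classes:

```scala
// Illustrative types: plain SHOW CREATE TABLE emits Spark DDL by default,
// and the AS SERDE suffix falls back to Hive DDL (per the discussion above).
sealed trait DdlStyle
case object SparkDdl extends DdlStyle
case object HiveDdl extends DdlStyle

def ddlStyle(asSerde: Boolean): DdlStyle =
  if (asSerde) HiveDdl else SparkDdl
```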
}
}

protected def showDataSourceTableDataColumns(
The methods `showDataSourceTableDataColumns` / `showDataSourceTableOptions` / `showDataSourceTableNonDataColumns` / `showCreateDataSourceTable` are only used in `ShowCreateTableCommand`. Shall we move them into `ShowCreateTableCommand`?
Oh, yeah, actually they were put there because for the previous `AS SPARK` option they were used in both commands. Forgot to move them. Thanks.
}

protected def showDataSourceTableOptions(metadata: CatalogTable, builder: StringBuilder): Unit = {
  builder ++= s"USING ${metadata.provider.get}\n"
Is `metadata.provider` always defined here?
This is for a data source table. For such a table, I think the provider is always available.
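The invariant being discussed can be made explicit with an assertion rather than a bare `.get`; a hedged sketch where `Option[String]` stands in for `metadata.provider`:

```scala
// metadata.provider is an Option[String]; for a data source table it is
// expected to be defined, so an assertion documents the invariant.
def usingClause(provider: Option[String]): String = {
  assert(provider.isDefined, "a data source table must define a provider")
  s"USING ${provider.get}\n"
}
```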
builder ++= s"  OUTPUTFORMAT: $format"
}
throw new AnalysisException(
  "Failed to execute SHOW CREATE TABLE AS SPARK against table " +
We should remove the "AS SPARK" here
Oops. Thanks for finding this.
}

private def showDataSourceTableOptions(metadata: CatalogTable, builder: StringBuilder): Unit = {
  builder ++= s"USING ${metadata.provider.get}\n"
Nit: it would be better to add comments or an assertion here to explain that the provider is always defined.
Ok. Added comments for that.
}
}

test("hive table with STORED AS clause in Spark DDL") {
nit: a test case with nested fields would be great.
@viirya Thanks so much for the work! I think we still need to update the migration guide in this PR or another follow-up PR (as Spark 3.0 code freezes today).
Overall LGTM!
@gengliangwang Thanks for the review! I just added a test case and updated the migration guide too.
I am going to merge this one once the tests pass. 👍
Test build #117690 has finished for PR 24938 at commit
Thanks! Merged to master.
Test build #117692 has finished for PR 24938 at commit
Test build #117694 has finished for PR 24938 at commit
Test build #117697 has finished for PR 24938 at commit
throw new AnalysisException(
  "Failed to execute SHOW CREATE TABLE against table " +
  s"${tableMetadata.identifier}, which is created by Hive and uses the " +
  "following unsupported feature(s)\n" +
Can we improve the exception message? Let end users know the new syntax for creating a Hive serde table.
ok. Let me create a follow-up.
}

if (tableMetadata.tableType == VIEW) {
  throw new AnalysisException("Hive view isn't supported by SHOW CREATE TABLE")
Can we just create a Spark view?
This requires more change than a simple message/doc change. Is this required to be in 3.0.0 too? If so, how much time do we have?
if ("true".equalsIgnoreCase(tableMetadata.properties.getOrElse("transactional", "false"))) {
  throw new AnalysisException(
    "SHOW CREATE TABLE doesn't support transactional Hive table")
The same here. Let end users know the workaround, i.e., the new syntax.
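An error message that names the fallback could look like the following; a hedged sketch in the spirit of this review comment, with illustrative wording rather than Spark's final message text:

```scala
// Build an error message that points users at the AS SERDE fallback
// (the exact wording here is illustrative, not Spark's actual message).
def transactionalTableError(table: String): String =
  s"SHOW CREATE TABLE doesn't support transactional Hive table $table. " +
  s"Please use `SHOW CREATE TABLE $table AS SERDE` to show Hive DDL instead."
```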
@@ -328,6 +328,8 @@ license: |

- Since Spark 3.0, `SHOW TBLPROPERTIES` will cause `AnalysisException` if the table does not exist. In Spark version 2.4 and earlier, this scenario caused `NoSuchTableException`. Also, `SHOW TBLPROPERTIES` on a temporary view will cause `AnalysisException`. In Spark version 2.4 and earlier, it returned an empty result.

+ - Since Spark 3.0, `SHOW CREATE TABLE` will always return Spark DDL, even when the given table is a Hive serde table. For Hive DDL, please use `SHOW CREATE TABLE AS SERDE` command instead.
`For Hive DDL` -> `For generating Hive DDL`?
…REATE TABLE

### What changes were proposed in this pull request?
This is a follow-up for #24938 to tweak the error message and migration doc.

### Why are the changes needed?
Making users aware of the workaround if SHOW CREATE TABLE doesn't work for some Hive tables.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Existing unit tests.

Closes #27505 from viirya/SPARK-27946-followup.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Liang-Chi Hsieh <liangchi@uber.com>
// For a Hive serde table, we try to convert it to Spark DDL.
if (tableMetadata.unsupportedFeatures.nonEmpty) {
  throw new AnalysisException(
    "Failed to execute SHOW CREATE TABLE against table " +
This error message is not useful to users, as they don't know what to do to make their query work again in 3.0. Can we follow https://github.com/apache/spark/pull/24938/files#diff-a53c8b7022d13417a2ef33372464f9b5R1210 and ask users to run `SHOW CREATE TABLE` with `AS SERDE`?
OK. It is too late now in my timezone. I will create a follow-up tomorrow.
cool, thanks!
  )
}

if (tableMetadata.tableType == VIEW) {
ditto
What changes were proposed in this pull request?
This patch adds a DDL command `SHOW CREATE TABLE AS SERDE`, which generates Hive DDL for a Hive table.

The original `SHOW CREATE TABLE` now always shows Spark DDL. If given a Hive table, it tries to generate Spark DDL.

For Hive serde to data source conversion, this uses the existing mapping inside `HiveSerDe`. If no mapping is found there, it throws an analysis exception for the unsupported serde configuration.

It is arguable that some Hive fileformat + row serde combinations could be mapped to a Spark data source, e.g., CSV. That is not included in this PR; to be conservative, it may not be supported.
For Hive serde properties, this doesn't save them to Spark DDL for now, because it may not be useful to keep Hive serde properties in a Spark table.
How was this patch tested?
Added test.