
[SPARK-27946][SQL] Hive DDL to Spark DDL conversion USING "show create table" #24938

Closed
wants to merge 21 commits into from

Conversation

viirya
Member

@viirya viirya commented Jun 23, 2019

What changes were proposed in this pull request?

This patch adds a DDL command SHOW CREATE TABLE AS SERDE. It is used to generate Hive DDL for a Hive table.

The original SHOW CREATE TABLE now always shows Spark DDL: given a Hive table, it tries to generate Spark DDL.

For Hive serde to data source conversion, this uses the existing mapping inside HiveSerDe. If no mapping is found there, it throws an AnalysisException for the unsupported serde configuration.

Arguably, some Hive file format + row serde combinations might be mappable to a Spark data source, e.g., CSV. That is not included in this PR; to be conservative, it is left unsupported.

For Hive serde properties, for now this doesn't save them to Spark DDL, because it may not be useful to keep Hive serde properties in a Spark table.
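The serde-to-data-source conversion described above can be sketched roughly as follows. This is a minimal illustration assuming a hard-coded map; the object name `SerdeToProvider` and the method `lookup` are hypothetical — the real lookup goes through Spark's `HiveSerDe` mapping.

```scala
// Hypothetical sketch of the Hive-serde-to-Spark-provider lookup described
// above. The real mapping lives in Spark's HiveSerDe object; this Map and
// the object name are illustrative assumptions, not Spark's actual code.
object SerdeToProvider {
  private val mapping: Map[String, String] = Map(
    "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe" -> "parquet",
    "org.apache.hadoop.hive.ql.io.orc.OrcSerde" -> "orc"
  )

  // Some(provider) when the serde maps to a Spark data source; None means
  // the command should raise an AnalysisException for the unsupported serde.
  def lookup(serdeClass: String): Option[String] = mapping.get(serdeClass)
}
```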

How was this patch tested?

Added test.

@SparkQA

SparkQA commented Jun 23, 2019

Test build #106805 has finished for PR 24938 at commit e86c8ad.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • trait ShowCreateTableCommandBase
  • case class ShowCreateTableCommand(table: TableIdentifier)
  • case class ShowCreateTableAsSparkCommand(table: TableIdentifier)

@SparkQA

SparkQA commented Jun 23, 2019

Test build #106809 has finished for PR 24938 at commit 7795355.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • trait ShowCreateTableCommandBase
  • case class ShowCreateTableCommand(table: TableIdentifier)
  • case class ShowCreateTableAsSparkCommand(table: TableIdentifier)

@gatorsmile
Member

Thanks for doing this!

cc @cloud-fan @gengliangwang


test("simple hive table as spark") {
withTable("t1") {
sql(
s"""CREATE TABLE t1 (
Member

nit: format issue? #25204 (comment)

Member Author

yes, fixed.

s"'${escapeSingleQuotedString(key)}' = '${escapeSingleQuotedString(value)}'"
val stmt = if (DDLUtils.isDatasourceTable(tableMetadata)) {
throw new AnalysisException(
s"$table is already a Spark data source table. Using `SHOW CREATE TABLE` instead.")
Contributor

Nit: Using -> Use or Please use ?

@dilipbiswal
Contributor

@viirya One question: if it's a Hive transactional table, should we error out?

@viirya
Member Author

viirya commented Aug 2, 2019

@dilipbiswal Good question! Currently it looks like we can tell whether it is a Hive transactional table by looking at its properties (transactional). No idea why it's not in unsupportedFeatures. I made it throw an exception when it finds a transactional table.
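The transactional check discussed here boils down to a single property lookup. A minimal sketch, assuming the `transactional` table property mentioned above; the object and method names are hypothetical stand-ins for the inline check in the command:

```scala
// Sketch of the transactional-table detection discussed above. Hive marks
// ACID tables with a "transactional" table property; the helper here is a
// hypothetical stand-in, not Spark's actual code.
object TransactionalCheck {
  def isTransactional(properties: Map[String, String]): Boolean =
    properties.getOrElse("transactional", "false").equalsIgnoreCase("true")
}
```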

@dilipbiswal
Contributor

thanks @viirya

@SparkQA

SparkQA commented Aug 2, 2019

Test build #108550 has finished for PR 24938 at commit b9dacc5.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dilipbiswal
Contributor

retest this please

@SparkQA

SparkQA commented Aug 2, 2019

Test build #108555 has finished for PR 24938 at commit b9dacc5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

private def showDataSourceTableOptions(metadata: CatalogTable, builder: StringBuilder): Unit = {
builder ++= s"USING ${metadata.provider.get}\n"
// scalastyle:off caselocale
if (tableMetadata.properties.getOrElse("transactional", "false").toLowerCase.equals("true")) {
Member Author

Should we just move the property to unsupportedFeatures of CatalogTable?

Contributor

IMHO, it's a good idea. I am not sure what happens today when we try to select from a Hive transactional table. If we add it to unsupportedFeatures, will we get an error during select?

Member Author

I haven't tried it, but it seems you can read such a table and get no results back, as SPARK-15348 and SPARK-16996 track.

Currently I don't see us setting any limits based on unsupportedFeatures when reading a table. Anyway, it is still fine to leave it as is; I just raised this as a question.

@gengliangwang
Member

In SQL, the keyword "as" is usually used for aliasing. How about changing the syntax to

show create table [table_name] using spark

?

@viirya
Member Author

viirya commented Aug 7, 2019

In SQL, the keyword "as" is usually used for aliasing. How about changing the syntax as
show create table [table_name] using spark

I'm fine with it. WDYT? @gatorsmile


@viirya
Member Author

viirya commented Sep 24, 2019

retest this please

@SparkQA

SparkQA commented Sep 25, 2019

Test build #111313 has finished for PR 24938 at commit 021e48b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gengliangwang
Member

cc @gatorsmile

@viirya
Member Author

viirya commented Nov 17, 2019

ping @gengliangwang @gatorsmile

@gengliangwang
Member

@viirya I like the idea, but I find the keyword as a bit confusing.
cc @gatorsmile again

@SparkQA

SparkQA commented Nov 17, 2019

Test build #113951 has finished for PR 24938 at commit a909790.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class ShowCreateTableStatement(

@@ -196,7 +196,7 @@ statement
| SHOW PARTITIONS multipartIdentifier partitionSpec? #showPartitions
| SHOW identifier? FUNCTIONS
(LIKE? (multipartIdentifier | pattern=STRING))? #showFunctions
| SHOW CREATE TABLE multipartIdentifier #showCreateTable
| SHOW CREATE TABLE multipartIdentifier (AS SPARK)? #showCreateTable
Member

After rethinking it, let us make it more aggressive here. Instead of creating Spark native tables for the existing Hive serde tables, we can try to always show how to create Spark native tables if possible. This will further simplify the migration from Hive to Spark.

For existing Spark users who prefer to keep Hive serde formats, we can introduce a new option AS SERDE, which will keep the behavior of Spark 2.4 and prior.

Member

+1. The new proposal makes more sense!

Member Author

@viirya viirya Jan 27, 2020

A bit confusing, so let me confirm. You mean SHOW CREATE TABLE should behave like AS SPARK by default (so no new AS SPARK option is added), and only fall back to the current behavior (showing how to create a Hive serde table) when given AS SERDE?

Member

Yes!
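The behavior agreed on in this exchange can be sketched as a tiny dispatch: plain SHOW CREATE TABLE emits Spark DDL, and only the AS SERDE variant falls back to Hive DDL. The names below (`ShowCreateTableDispatch`, `DdlTarget`, `resolveTarget`) are illustrative stand-ins, not Spark's parser or command classes.

```scala
// Sketch of the final dispatch agreed on in this thread: SHOW CREATE TABLE
// emits Spark DDL by default; only the AS SERDE suffix selects Hive DDL
// (the Spark 2.4 behavior). Purely illustrative string matching — the real
// implementation goes through the SQL grammar.
object ShowCreateTableDispatch {
  sealed trait DdlTarget
  case object SparkDdl extends DdlTarget // default since Spark 3.0
  case object HiveDdl extends DdlTarget  // AS SERDE: behavior of 2.4 and prior

  def resolveTarget(statement: String): DdlTarget =
    if (statement.trim.toUpperCase.endsWith("AS SERDE")) HiveDdl else SparkDdl
}
```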

}
}

protected def showDataSourceTableDataColumns(
Member

The methods showDataSourceTableDataColumns / showDataSourceTableOptions / showDataSourceTableNonDataColumns / showCreateDataSourceTable are only used in ShowCreateTableCommand. Shall we move them into ShowCreateTableCommand?

Member Author

Oh, yeah, they were actually put there because, with the previous AS SPARK option, they were used in both commands. Forgot to move them. Thanks.

}

protected def showDataSourceTableOptions(metadata: CatalogTable, builder: StringBuilder): Unit = {
builder ++= s"USING ${metadata.provider.get}\n"
Member

Is metadata.provider always defined here?

Member Author

This is for a data source table. For such tables, I think provider is always available.
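One way to make the invariant in question explicit is to assert it before calling `.get`. A hedged sketch: `CatalogTable` below is a minimal stand-in case class, not Spark's, and the assertion is just one option (the PR ultimately added a comment instead).

```scala
object ProviderInvariant {
  // Minimal stand-in for Spark's CatalogTable, reduced to the field at issue.
  case class CatalogTable(provider: Option[String])

  def showUsing(metadata: CatalogTable): String = {
    // For a data source table the provider is expected to always be defined;
    // asserting makes a violated assumption fail loudly instead of with a
    // bare NoSuchElementException from .get.
    assert(metadata.provider.isDefined, "data source table must define a provider")
    s"USING ${metadata.provider.get}\n"
  }
}
```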

builder ++= s" OUTPUTFORMAT: $format"
}
throw new AnalysisException(
"Failed to execute SHOW CREATE TABLE AS SPARK against table " +
Member

We should remove the "AS SPARK" here

Member Author

Oops. Thanks for finding this.

}

private def showDataSourceTableOptions(metadata: CatalogTable, builder: StringBuilder): Unit = {
builder ++= s"USING ${metadata.provider.get}\n"
Member

@gengliangwang gengliangwang Jan 31, 2020

Nit: it would be better to add comments or an assertion here to explain that the provider is always defined.

Member Author

Ok. Added comments for that.

}
}

test("hive table with STORED AS clause in Spark DDL") {
Member

nit: a test case with nested fields would be great.

Member

@gengliangwang gengliangwang left a comment

@viirya Thanks so much for the work! I think we still need to update the migration guide in this PR or another follow-up PR (as Spark 3.0 code freezes today).
Overall LGTM!

@viirya
Member Author

viirya commented Jan 31, 2020

@gengliangwang Thanks for review! I just added a test case and updated migration guide too.

@gengliangwang
Member

I am going to merge this one once the tests are passed. 👍

@SparkQA

SparkQA commented Feb 1, 2020

Test build #117690 has finished for PR 24938 at commit 4311955.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile gatorsmile closed this in 8eecc20 Feb 1, 2020
@gatorsmile
Member

Thanks! Merged to master.

@SparkQA

SparkQA commented Feb 1, 2020

Test build #117692 has finished for PR 24938 at commit f886f04.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 1, 2020

Test build #117694 has finished for PR 24938 at commit 9882932.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 1, 2020

Test build #117697 has finished for PR 24938 at commit 27c76b3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

throw new AnalysisException(
"Failed to execute SHOW CREATE TABLE against table " +
s"${tableMetadata.identifier}, which is created by Hive and uses the " +
"following unsupported feature(s)\n" +
Member

Can we improve the exception message? Let end users know the new syntax for creating a Hive serde table.

Member Author

ok. Let me create a follow-up.

}

if (tableMetadata.tableType == VIEW) {
throw new AnalysisException("Hive view isn't supported by SHOW CREATE TABLE")
Member

Can we just create Spark View?

Member Author

This requires more than a simple message/doc change. Is this required to be in 3.0.0 too? If so, how much time do we have?


if ("true".equalsIgnoreCase(tableMetadata.properties.getOrElse("transactional", "false"))) {
throw new AnalysisException(
"SHOW CREATE TABLE doesn't support transactional Hive table")
Member

Same here. Let end users know the workaround, i.e., the new syntax.

@@ -328,6 +328,8 @@ license: |

- Since Spark 3.0, `SHOW TBLPROPERTIES` will cause `AnalysisException` if the table does not exist. In Spark version 2.4 and earlier, this scenario caused `NoSuchTableException`. Also, `SHOW TBLPROPERTIES` on a temporary view will cause `AnalysisException`. In Spark version 2.4 and earlier, it returned an empty result.

- Since Spark 3.0, `SHOW CREATE TABLE` will always return Spark DDL, even when the given table is a Hive serde table. For Hive DDL, please use `SHOW CREATE TABLE AS SERDE` command instead.
Member

For Hive DDL -> For generating Hive DDL ?

viirya added a commit that referenced this pull request Feb 10, 2020
…REATE TABLE

### What changes were proposed in this pull request?

This is a follow-up for #24938 to tweak error message and migration doc.

### Why are the changes needed?

Making user know workaround if SHOW CREATE TABLE doesn't work for some Hive tables.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Existing unit tests.

Closes #27505 from viirya/SPARK-27946-followup.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Liang-Chi Hsieh <liangchi@uber.com>
viirya added a commit that referenced this pull request Feb 10, 2020
…REATE TABLE

(cherry picked from commit acfdb46)
Signed-off-by: Liang-Chi Hsieh <liangchi@uber.com>
// For a Hive serde table, we try to convert it to Spark DDL.
if (tableMetadata.unsupportedFeatures.nonEmpty) {
throw new AnalysisException(
"Failed to execute SHOW CREATE TABLE against table " +
Contributor

This error message is not useful to users as they don't know what to do to make their query work again in 3.0. Can we follow https://github.com/apache/spark/pull/24938/files#diff-a53c8b7022d13417a2ef33372464f9b5R1210 and ask users to run SHOW CREATE TABLE with AS SERDE?

Member Author

OK. It is too late now in my timezone. I will create a follow-up tomorrow.

Contributor

cool, thanks!

)
}

if (tableMetadata.tableType == VIEW) {
Contributor

ditto

sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
…REATE TABLE
@viirya viirya deleted the SPARK-27946 branch December 27, 2023 18:38