
[SPARK-27946][SQL] Hive DDL to Spark DDL conversion USING "show create table" #24938

Closed
wants to merge 21 commits into from

Conversation

viirya
Member

@viirya viirya commented Jun 23, 2019

What changes were proposed in this pull request?

This patch adds a DDL command SHOW CREATE TABLE AS SERDE. It is used to generate Hive DDL for a Hive table.

The original SHOW CREATE TABLE now always shows Spark DDL: given a Hive table, it tries to generate Spark DDL.

For Hive serde to data source conversion, this uses the existing mapping inside HiveSerDe. If no mapping is found there, it throws an AnalysisException for the unsupported serde configuration.

Arguably, some Hive file format + row serde combinations might be mappable to a Spark data source, e.g., CSV. That is not included in this PR; to be conservative, it is left unsupported.

For Hive serde properties, for now this doesn't save them to Spark DDL, because it may not be useful to keep Hive serde properties in a Spark table.
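The serde-to-data-source conversion described above can be sketched roughly as follows. This is a minimal illustration assuming a hard-coded map; the object name `SerdeToProvider` and the method `lookup` are hypothetical — the real lookup goes through Spark's `HiveSerDe` mapping.

```scala
// Hypothetical sketch of the Hive-serde-to-Spark-provider lookup described
// above. The real mapping lives in Spark's HiveSerDe object; this Map and
// the object name are illustrative assumptions, not Spark's actual code.
object SerdeToProvider {
  private val mapping: Map[String, String] = Map(
    "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe" -> "parquet",
    "org.apache.hadoop.hive.ql.io.orc.OrcSerde" -> "orc"
  )

  // Some(provider) when the serde maps to a Spark data source; None means
  // the command should raise an AnalysisException for the unsupported serde.
  def lookup(serdeClass: String): Option[String] = mapping.get(serdeClass)
}
```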

How was this patch tested?

Added test.

@SparkQA

SparkQA commented Jun 23, 2019

Test build #106805 has finished for PR 24938 at commit e86c8ad.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • trait ShowCreateTableCommandBase
  • case class ShowCreateTableCommand(table: TableIdentifier)
  • case class ShowCreateTableAsSparkCommand(table: TableIdentifier)

@SparkQA

SparkQA commented Jun 23, 2019

Test build #106809 has finished for PR 24938 at commit 7795355.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • trait ShowCreateTableCommandBase
  • case class ShowCreateTableCommand(table: TableIdentifier)
  • case class ShowCreateTableAsSparkCommand(table: TableIdentifier)

@gatorsmile
Member

Thanks for doing this!

cc @cloud-fan @gengliangwang


test("simple hive table as spark") {
withTable("t1") {
sql(
s"""CREATE TABLE t1 (
Member

nit: format issue? #25204 (comment)

Member Author

yes, fixed.

s"'${escapeSingleQuotedString(key)}' = '${escapeSingleQuotedString(value)}'"
val stmt = if (DDLUtils.isDatasourceTable(tableMetadata)) {
throw new AnalysisException(
s"$table is already a Spark data source table. Using `SHOW CREATE TABLE` instead.")
Contributor

Nit: Using -> Use or Please use ?

@dilipbiswal
Contributor

@viirya One question: if it's a Hive transactional table, should we error out?

@viirya
Member Author

viirya commented Aug 2, 2019

@dilipbiswal Good question! Currently it looks like we can tell whether it is a Hive transactional table by looking at its properties (transactional). No idea why it's not in unsupportedFeatures. I made it throw an exception when it finds a transactional table.
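The transactional check discussed here boils down to a single property lookup. A minimal sketch, assuming the `transactional` table property mentioned above; the object and method names are hypothetical stand-ins for the inline check in the command:

```scala
// Sketch of the transactional-table detection discussed above. Hive marks
// ACID tables with a "transactional" table property; the helper here is a
// hypothetical stand-in, not Spark's actual code.
object TransactionalCheck {
  def isTransactional(properties: Map[String, String]): Boolean =
    properties.getOrElse("transactional", "false").equalsIgnoreCase("true")
}
```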

@dilipbiswal
Contributor

thanks @viirya

@SparkQA

SparkQA commented Aug 2, 2019

Test build #108550 has finished for PR 24938 at commit b9dacc5.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dilipbiswal
Contributor

retest this please

@SparkQA

SparkQA commented Aug 2, 2019

Test build #108555 has finished for PR 24938 at commit b9dacc5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

private def showDataSourceTableOptions(metadata: CatalogTable, builder: StringBuilder): Unit = {
builder ++= s"USING ${metadata.provider.get}\n"
// scalastyle:off caselocale
if (tableMetadata.properties.getOrElse("transactional", "false").toLowerCase.equals("true")) {
Member Author

Should we just move the property to unsupportedFeatures of CatalogTable?

Contributor

IMHO, it's a good idea. I am not sure what happens today when we try to select from a Hive transactional table. If we add it to unsupportedFeatures, will we get an error during select?

Member Author

I haven't tried it, but it seems you can read such a table and get no results back, as SPARK-15348 and SPARK-16996 track.

Currently I don't see us setting any limits based on unsupportedFeatures when reading a table. Anyway, it is still fine to leave it as is; I just raised this as a question.

@gengliangwang
Member

In SQL, the keyword "as" is usually used for aliasing. How about changing the syntax to

show create table [table_name] using spark

?

@viirya
Member Author

viirya commented Aug 7, 2019

In SQL, the keyword "as" is usually used for aliasing. How about changing the syntax as
show create table [table_name] using spark

I'm fine with it. WDYT? @gatorsmile


@viirya
Member Author

viirya commented Sep 24, 2019

retest this please

@SparkQA

SparkQA commented Sep 25, 2019

Test build #111313 has finished for PR 24938 at commit 021e48b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gengliangwang
Member

cc @gatorsmile

@viirya
Member Author

viirya commented Nov 17, 2019

ping @gengliangwang @gatorsmile

@gengliangwang
Member

@viirya I like the idea, but I find the keyword as a bit confusing.
cc @gatorsmile again

@SparkQA

SparkQA commented Nov 17, 2019

Test build #113951 has finished for PR 24938 at commit a909790.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class ShowCreateTableStatement(

@@ -196,7 +196,7 @@ statement
| SHOW PARTITIONS multipartIdentifier partitionSpec? #showPartitions
| SHOW identifier? FUNCTIONS
(LIKE? (multipartIdentifier | pattern=STRING))? #showFunctions
| SHOW CREATE TABLE multipartIdentifier #showCreateTable
| SHOW CREATE TABLE multipartIdentifier (AS SPARK)? #showCreateTable
Member

After rethinking it, let us make it more aggressive here. Instead of creating Spark native tables for the existing Hive serde tables, we can try to always show how to create Spark native tables if possible. This will further simplify the migration from Hive to Spark.

For existing Spark users who prefer to keep Hive serde formats, we can introduce a new option AS SERDE, which will keep the behavior of Spark 2.4 and prior.

Member

+1. The new proposal makes more sense!

Member Author

@viirya viirya Jan 27, 2020

A bit confusing, so let me confirm. You mean SHOW CREATE TABLE should behave like AS SPARK by default (so no new AS SPARK option is added), and only fall back to the current behavior (showing how to create a Hive serde table) when given AS SERDE?

Member

Yes!
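The behavior agreed on in this exchange can be sketched as a tiny dispatch: plain SHOW CREATE TABLE emits Spark DDL, and only the AS SERDE variant falls back to Hive DDL. The names below (`ShowCreateTableDispatch`, `DdlTarget`, `resolveTarget`) are illustrative stand-ins, not Spark's parser or command classes.

```scala
// Sketch of the final dispatch agreed on in this thread: SHOW CREATE TABLE
// emits Spark DDL by default; only the AS SERDE suffix selects Hive DDL
// (the Spark 2.4 behavior). Purely illustrative string matching — the real
// implementation goes through the SQL grammar.
object ShowCreateTableDispatch {
  sealed trait DdlTarget
  case object SparkDdl extends DdlTarget // default since Spark 3.0
  case object HiveDdl extends DdlTarget  // AS SERDE: behavior of 2.4 and prior

  def resolveTarget(statement: String): DdlTarget =
    if (statement.trim.toUpperCase.endsWith("AS SERDE")) HiveDdl else SparkDdl
}
```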

}
}

protected def showDataSourceTableDataColumns(
Member

The methods showDataSourceTableDataColumns / showDataSourceTableOptions / showDataSourceTableNonDataColumns / showCreateDataSourceTable are only used in ShowCreateTableCommand. Shall we move them into ShowCreateTableCommand?

Member Author

Oh, yeah, they were actually put there because, with the previous AS SPARK option, they were used in both commands. Forgot to move them. Thanks.

}

protected def showDataSourceTableOptions(metadata: CatalogTable, builder: StringBuilder): Unit = {
builder ++= s"USING ${metadata.provider.get}\n"
Member

Is metadata.provider always defined here?

Member Author

This is for a data source table. For such tables, I think provider is always available.
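One way to make the invariant in question explicit is to assert it before calling `.get`. A hedged sketch: `CatalogTable` below is a minimal stand-in case class, not Spark's, and the assertion is just one option (the PR ultimately added a comment instead).

```scala
object ProviderInvariant {
  // Minimal stand-in for Spark's CatalogTable, reduced to the field at issue.
  case class CatalogTable(provider: Option[String])

  def showUsing(metadata: CatalogTable): String = {
    // For a data source table the provider is expected to always be defined;
    // asserting makes a violated assumption fail loudly instead of with a
    // bare NoSuchElementException from .get.
    assert(metadata.provider.isDefined, "data source table must define a provider")
    s"USING ${metadata.provider.get}\n"
  }
}
```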

builder ++= s" OUTPUTFORMAT: $format"
}
throw new AnalysisException(
"Failed to execute SHOW CREATE TABLE AS SPARK against table " +
Member

We should remove the "AS SPARK" here

Member Author

Oops. Thanks for finding this.

}

private def showDataSourceTableOptions(metadata: CatalogTable, builder: StringBuilder): Unit = {
builder ++= s"USING ${metadata.provider.get}\n"
Member

@gengliangwang gengliangwang Jan 31, 2020

Nit: it would be better to add comments or an assertion here to explain that the provider is always defined.

Member Author

Ok. Added comments for that.

}
}

test("hive table with STORED AS clause in Spark DDL") {
Member

nit: a test case with nested fields would be great.

Member

@gengliangwang gengliangwang left a comment

@viirya Thanks so much for the work! I think we still need to update the migration guide in this PR or another follow-up PR (as Spark 3.0 code freezes today).
Overall LGTM!

@viirya
Member Author

viirya commented Jan 31, 2020

@gengliangwang Thanks for review! I just added a test case and updated migration guide too.

@gengliangwang
Member

I am going to merge this one once the tests are passed. 👍

@SparkQA

SparkQA commented Feb 1, 2020

Test build #117690 has finished for PR 24938 at commit 4311955.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile gatorsmile closed this in 8eecc20 Feb 1, 2020
@gatorsmile
Member

Thanks! Merged to master.

@SparkQA

SparkQA commented Feb 1, 2020

Test build #117692 has finished for PR 24938 at commit f886f04.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 1, 2020

Test build #117694 has finished for PR 24938 at commit 9882932.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 1, 2020

Test build #117697 has finished for PR 24938 at commit 27c76b3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

throw new AnalysisException(
"Failed to execute SHOW CREATE TABLE against table " +
s"${tableMetadata.identifier}, which is created by Hive and uses the " +
"following unsupported feature(s)\n" +
Member

Can we improve the exception message? Let end users know the new syntax for creating a Hive serde table.

Member Author

ok. Let me create a follow-up.

}

if (tableMetadata.tableType == VIEW) {
throw new AnalysisException("Hive view isn't supported by SHOW CREATE TABLE")
Member

Can we just create Spark View?

Member Author

This requires more than a simple message/doc change. Is this required to be in 3.0.0 too? If so, how much time do we have?


if ("true".equalsIgnoreCase(tableMetadata.properties.getOrElse("transactional", "false"))) {
throw new AnalysisException(
"SHOW CREATE TABLE doesn't support transactional Hive table")
Member

Same here. Let end users know the workaround, i.e., the new syntax.

@@ -328,6 +328,8 @@ license: |

- Since Spark 3.0, `SHOW TBLPROPERTIES` will cause `AnalysisException` if the table does not exist. In Spark version 2.4 and earlier, this scenario caused `NoSuchTableException`. Also, `SHOW TBLPROPERTIES` on a temporary view will cause `AnalysisException`. In Spark version 2.4 and earlier, it returned an empty result.

- Since Spark 3.0, `SHOW CREATE TABLE` will always return Spark DDL, even when the given table is a Hive serde table. For Hive DDL, please use `SHOW CREATE TABLE AS SERDE` command instead.
Member

For Hive DDL -> For generating Hive DDL ?

viirya added a commit that referenced this pull request Feb 10, 2020
…REATE TABLE

### What changes were proposed in this pull request?

This is a follow-up for #24938 to tweak error message and migration doc.

### Why are the changes needed?

Making user know workaround if SHOW CREATE TABLE doesn't work for some Hive tables.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Existing unit tests.

Closes #27505 from viirya/SPARK-27946-followup.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Liang-Chi Hsieh <liangchi@uber.com>
viirya added a commit that referenced this pull request Feb 10, 2020
…REATE TABLE

(cherry picked from commit acfdb46)
Signed-off-by: Liang-Chi Hsieh <liangchi@uber.com>
// For a Hive serde table, we try to convert it to Spark DDL.
if (tableMetadata.unsupportedFeatures.nonEmpty) {
throw new AnalysisException(
"Failed to execute SHOW CREATE TABLE against table " +
Contributor

This error message is not useful to users as they don't know what to do to make their query work again in 3.0. Can we follow https://github.com/apache/spark/pull/24938/files#diff-a53c8b7022d13417a2ef33372464f9b5R1210 and ask users to run SHOW CREATE TABLE with AS SERDE?

Member Author

OK. It is too late now in my timezone. I will create a follow-up tomorrow.

Contributor

cool, thanks!

)
}

if (tableMetadata.tableType == VIEW) {
Contributor

ditto

sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
…REATE TABLE
@viirya viirya deleted the SPARK-27946 branch December 27, 2023 18:38