
[SPARK-19484][SQL]continue work to create hive table with an empty schema #16828

Closed

Conversation

windpiger
Contributor

@windpiger windpiger commented Feb 7, 2017

What changes were proposed in this pull request?

After SPARK-19279, we can no longer create a Hive table with an empty schema, so we should tighten the condition used when creating a Hive table in
https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala#L835

That is, if a CatalogTable t has an empty schema and its properties contain no spark.sql.sources.schema.numParts key (or its value is 0), we should not add a default column schema. If we did, a table with an empty schema would be created, which is not what we expect.

An additional reason for this PR: while deduplicating functions in MetastoreRelation in #16787, there is a toHiveTable function to merge between HiveClientImpl and HiveUtils:

https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala#L818
https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala#L494

The problem with this merge is that HiveClientImpl's toHiveTable adds a default column schema for a table with an empty schema, which affects the inferSchema check that determines whether the user wants to create a table with an empty schema:
https://github.com/apache/spark/pull/16636/files#diff-842e3447fc453de26c706db1cac8f2c4R586
https://github.com/apache/spark/pull/16636/files#diff-c4ed9859978dd6ac271b6a40ee945e4bR95
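The tightened guard this PR proposes can be sketched as a standalone predicate. This is a minimal sketch, not the actual Spark code: the key name mirrors HiveExternalCatalog.DATASOURCE_SCHEMA_NUMPARTS, while the object and helper names are made up for illustration.

```scala
// Hypothetical standalone sketch of the tightened guard in toHiveTable.
object DefaultSchemaGuard {
  // Mirrors HiveExternalCatalog.DATASOURCE_SCHEMA_NUMPARTS.
  val DATASOURCE_SCHEMA_NUMPARTS = "spark.sql.sources.schema.numParts"

  // Add the placeholder column only when the schema is empty AND the table
  // properties indicate a real schema is serialized there (numParts != 0).
  def shouldAddDefaultCol(schemaIsEmpty: Boolean, props: Map[String, String]): Boolean =
    schemaIsEmpty && props.getOrElse(DATASOURCE_SCHEMA_NUMPARTS, "0").toInt != 0
}
```

Under this sketch, an empty-schema table with no numParts property (the case SPARK-19279 forbids) no longer gets a default column.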

How was this patch tested?

N/A

@windpiger
Contributor Author

cc @gatorsmile @cloud-fan

@SparkQA

SparkQA commented Feb 7, 2017

Test build #72488 has finished for PR 16828 at commit 40f2896.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -832,7 +832,8 @@ private[hive] class HiveClientImpl(
     val (partCols, schema) = table.schema.map(toHiveColumn).partition { c =>
       table.partitionColumnNames.contains(c.getName)
     }
-    if (schema.isEmpty) {
+    if (schema.isEmpty && table.properties.getOrElse(
+        HiveExternalCatalog.DATASOURCE_SCHEMA_NUMPARTS, "0").toInt != 0) {
Contributor

Why do we need this condition? I think we store the schema in table properties for all kinds of tables.

@@ -187,15 +187,6 @@ class HiveExternalCatalogBackwardCompatibilitySuite extends QueryTest
       "spark.sql.sources.schema.numParts" -> "1",
       "spark.sql.sources.schema.part.0" -> simpleSchemaJson))

-  lazy val dataSourceTableWithoutSchema = CatalogTable(
Contributor

This is still a valid case we need to support
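For context on where spark.sql.sources.schema.numParts comes from, here is a hedged sketch of the splitting scheme those properties encode: the schema JSON is chunked into numbered parts plus a part count. The helper names and the part-size threshold are illustrative, not Spark's actual implementation.

```scala
// Illustrative re-implementation of how a schema JSON string could be split
// across table properties (spark.sql.sources.schema.numParts / part.N).
object SchemaProps {
  def splitSchemaJson(json: String, partSize: Int): Map[String, String] = {
    val parts = json.grouped(partSize).toSeq
    Map("spark.sql.sources.schema.numParts" -> parts.size.toString) ++
      parts.zipWithIndex.map { case (p, i) =>
        s"spark.sql.sources.schema.part.$i" -> p
      }
  }

  // Reassemble the schema JSON from the properties, if the count key is present.
  def readSchemaJson(props: Map[String, String]): Option[String] =
    props.get("spark.sql.sources.schema.numParts").map { n =>
      (0 until n.toInt).map(i => props(s"spark.sql.sources.schema.part.$i")).mkString
    }
}
```

With this encoding, a table whose schema lives in the properties always carries a non-zero numParts key, which is what the guard in the diff above tests for.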

@SparkQA

SparkQA commented Feb 7, 2017

Test build #72495 has finished for PR 16828 at commit 56a01e7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 7, 2017

Test build #72502 has finished for PR 16828 at commit 02f3147.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -1328,7 +1330,7 @@ class MetastoreDataSourcesSuite extends QueryTest with SQLTestUtils with TestHiv
       storage = CatalogStorageFormat.empty.copy(
         properties = Map("path" -> path.getAbsolutePath)
       ),
-      schema = new StructType(),
+      schema = simpleSchema,
@cloud-fan cloud-fan commented Feb 7, 2017

We were testing an empty schema here.

// spark table. the SPARK_TEST_OLD_SOURCE_TABLE_CREATE property is used to resolve this.
if (schema.isEmpty && (table.properties.getOrElse(
HiveExternalCatalog.DATASOURCE_SCHEMA_NUMPARTS, "0").toInt != 0) || table.properties
.getOrElse(HiveExternalCatalog.SPARK_TEST_OLD_SOURCE_TABLE_CREATE, "false").toBoolean) {
Contributor

When will we hit this branch? Only when we create an empty-schema table in a test?

@windpiger windpiger commented Feb 7, 2017

If we create a datasource table whose schema is empty while its table properties contain DATASOURCE_SCHEMA_NUMPARTS, it will also hit this branch.

Contributor

But the table properties should always contain DATASOURCE_SCHEMA_NUMPARTS, shouldn't they?

Contributor Author

If we inferSchema here:
https://github.com/apache/spark/pull/16636/files#diff-c4ed9859978dd6ac271b6a40ee945e4bR95

the catalog table does not have DATASOURCE_SCHEMA_NUMPARTS in its properties.

Contributor

So can we just check DDLUtils.isHiveTable?

Contributor Author

Yes, let me modify this, thanks!

@SparkQA

SparkQA commented Feb 7, 2017

Test build #72517 has finished for PR 16828 at commit c4e54c1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 7, 2017

Test build #72519 has finished for PR 16828 at commit 8fba58d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -832,7 +833,10 @@ private[hive] class HiveClientImpl(
     val (partCols, schema) = table.schema.map(toHiveColumn).partition { c =>
       table.partitionColumnNames.contains(c.getName)
     }
-    if (schema.isEmpty) {
+
+    // after SPARK-19279, it is not allowed to create a hive table with an empty schema,
Member

Uh... I found a bug in my original code for inferSchema. Let me fix it now: the schema could be non-empty when it is a partitioned table.

Contributor Author

Remove the || table.schema.nonEmpty condition?

Member

We just need to check whether the dataSchema is not empty, right?

@windpiger windpiger commented Feb 8, 2017

If we have a partition schema, the || table.schema.nonEmpty condition will be hit and the table returned directly. Maybe the Hive table's data schema can be determined by the Hive serde?

Member

See the PR: #16848

Contributor Author

ok~ thanks~
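The bug being discussed, that a partitioned table's full schema can be non-empty even when its data schema is empty, can be illustrated with a simplified stand-in. The case class and dataSchema helper below are illustrative, not CatalogTable's real definition.

```scala
// Simplified stand-in: a table whose columns are all partition columns has a
// non-empty full schema but an empty data schema.
case class SimpleTable(schema: Seq[String], partitionColumnNames: Seq[String]) {
  // Data schema = full schema minus partition columns.
  def dataSchema: Seq[String] = schema.filterNot(partitionColumnNames.contains)
}

val t = SimpleTable(schema = Seq("p1", "p2"), partitionColumnNames = Seq("p1", "p2"))
// t.schema.nonEmpty is true, yet t.dataSchema is empty, so a guard on
// table.schema.nonEmpty does not prove a data schema was specified.
```

This is why the thread concludes the check should be on the data schema rather than on the full schema.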


// after SPARK-19279, it is not allowed to create a hive table with an empty schema,
// so here we should not add a default col schema
if (schema.isEmpty && !DDLUtils.isHiveTable(table)) {
Member

change !DDLUtils.isHiveTable(table) to DDLUtils.isDatasourceTable(table)?

Contributor Author

ok~ thanks~
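The suggested change from !DDLUtils.isHiveTable(table) to DDLUtils.isDatasourceTable(table) can be sketched with simplified stand-ins. The assumption here is that tables are distinguished by an optional provider string, "hive" versus anything else; these helpers are illustrative, not Spark's actual API.

```scala
// Illustrative stand-ins for the two provider checks discussed above.
object ProviderChecks {
  def isHiveTable(provider: Option[String]): Boolean =
    provider.exists(_.equalsIgnoreCase("hive"))

  def isDatasourceTable(provider: Option[String]): Boolean =
    provider.exists(!_.equalsIgnoreCase("hive"))
}
```

Under this sketch the two checks are not complements when no provider is set: !isHiveTable(None) is true while isDatasourceTable(None) is false, so the suggested form is the stricter guard.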

@SparkQA

SparkQA commented Feb 8, 2017

Test build #72551 has finished for PR 16828 at commit cf2c382.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

It looks like this PR is not very self-contained. Can you fold it into your PR about unifying toHiveTable?

@windpiger
Contributor Author

I think this is also an improvement that avoids some potential problems later. Can we keep it?

@gatorsmile
Member

Which potential problems? If we do not unify toHiveTable, can you write a test case that hits this issue?

@windpiger
Contributor Author

ok, I will close it. Thanks! @cloud-fan @gatorsmile

@windpiger windpiger closed this Feb 8, 2017
@SparkQA
Copy link

SparkQA commented Feb 8, 2017

Test build #72557 has finished for PR 16828 at commit cc5141a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
