
[SPARK-23348][SQL] append data using saveAsTable should adjust the data types #20527

Closed

Conversation

cloud-fan
Contributor

What changes were proposed in this pull request?

For inserting/appending data into an existing table, Spark should adjust the data types of the input query to match the table schema, or fail fast if they cannot be cast.

There are several ways to insert/append data: the SQL API, `DataFrameWriter.insertInto`, and `DataFrameWriter.saveAsTable`. The first two create an `InsertIntoTable` plan, while the last creates a `CreateTable` plan. However, the input query's data types are only adjusted for `InsertIntoTable`, so users may hit confusing errors when appending data with `saveAsTable`. See the JIRA for the error case.

This PR fixes the bug by adjusting data types for `CreateTable` as well.
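For illustration, a minimal sketch of the scenario (table name, column names, and values follow the new test added in this PR; the setup is assumed, not a verbatim excerpt):

```scala
import spark.implicits._  // assumes a SparkSession named `spark`, as in spark-shell

// Create a data source table t with schema (i int, j string).
Seq(1 -> "a", 2 -> "b").toDF("i", "j").write.saveAsTable("t")

// Append a DataFrame whose column types do not match the table schema.
// Before this fix, the saveAsTable append path did not adjust the input
// query's types, which could surface a confusing error. With the fix, the
// input is cast to the table schema: "c" becomes a null int, 3 becomes "3".
Seq("c" -> 3).toDF("i", "j").write.mode("append").saveAsTable("t")
```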

How was this patch tested?

Added a new test.

@cloud-fan
Contributor Author

cc @gatorsmile @dongjoon-hyun

@SparkQA

SparkQA commented Feb 7, 2018

Test build #87151 has finished for PR 20527 at commit ad19125.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class PreprocessTableInsertion(conf: SQLConf) extends Rule[LogicalPlan]


Seq("c" -> 3).toDF("i", "j").write.mode("append").saveAsTable("t")
checkAnswer(spark.table("t"), Row(1, "a") :: Row(2, "b") :: Row(3, "c")
:: Row(null, "3") :: Nil)
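A brief note on the expected answer above (an assumed reading of the test, matching Spark's default non-ANSI cast behavior): the append is resolved by position, and each input column is cast to the corresponding table column type, so the string "c" becomes a null int and the integer 3 becomes the string "3". The same conversions can be seen directly:

```scala
// Illustration only: the casts implied by the checkAnswer expectation above.
// CAST('c' AS INT) yields null; CAST(3 AS STRING) yields "3".
spark.sql("SELECT CAST('c' AS INT) AS i, CAST(3 AS STRING) AS j").show()
```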
Member
Thank you for pinging me, @cloud-fan. +1 for the patch, LGTM.

@@ -346,37 +349,11 @@ case class PreprocessTableInsertion(conf: SQLConf) extends Rule[LogicalPlan] wit
         """.stripMargin)
     }
 
-    castAndRenameChildOutput(insert.copy(partition = normalizedPartSpec), expectedColumns)
+    insert.copy(query = newQuery, partition = normalizedPartSpec)
Contributor
Nit: we don't need to copy `newQuery` if it is the same as `query`.

Contributor Author
It's also OK to always copy it, and the code is neater.
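For readers without the surrounding source, the pattern under discussion is roughly the following (a simplified sketch of the cast-and-rename idea; the real `castAndRenameChildOutput` helper in Spark is more involved):

```scala
import org.apache.spark.sql.catalyst.expressions.{Alias, Attribute, Cast, NamedExpression}
import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Project}

// Simplified sketch (an assumption, not Spark's exact implementation): cast
// and rename each output column of the input query to match the expected
// attribute, and only wrap the query in a Project (a new plan node) when
// something actually changed, which is the nit raised above.
def castAndRenameSketch(query: LogicalPlan, expected: Seq[Attribute]): LogicalPlan = {
  val projectList: Seq[NamedExpression] = query.output.zip(expected).map {
    case (actual, exp) if actual.dataType == exp.dataType && actual.name == exp.name =>
      actual
    case (actual, exp) =>
      Alias(Cast(actual, exp.dataType), exp.name)()
  }
  if (projectList == query.output) query else Project(projectList, query)
}
```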

@@ -132,6 +134,32 @@ class InMemoryCatalogedDDLSuite extends DDLSuite with SharedSQLContext with Befo
       checkAnswer(spark.table("t"), Row(Row("a", 1)) :: Nil)
     }
   }
 
+  // TODO: This test is copied from HiveDDLSuite, unify it later.
+  test("SPARK-23348: append data to data source table with saveAsTable") {
Contributor

@jiangxb1987 commented Feb 8, 2018
Do we also want to cover the following case: 2) Target tables have column metadata?

Contributor Author

Maybe we can add this when unifying the test cases?

@jiangxb1987
Contributor

LGTM, only some nits.

@gatorsmile
Member

LGTM

Thanks! Merged to master/2.3

asfgit pushed a commit that referenced this pull request Feb 8, 2018
[SPARK-23348][SQL] append data using saveAsTable should adjust the data types

Author: Wenchen Fan <wenchen@databricks.com>

Closes #20527 from cloud-fan/saveAsTable.

(cherry picked from commit 7f5f5fb)
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
asfgit closed this in 7f5f5fb on Feb 8, 2018
robert3005 pushed a commit to palantir/spark that referenced this pull request Feb 12, 2018
[SPARK-23348][SQL] append data using saveAsTable should adjust the data types

Author: Wenchen Fan <wenchen@databricks.com>

Closes apache#20527 from cloud-fan/saveAsTable.