
[SPARK-22002][SQL] Read JDBC table use custom schema support specify partial fields. #19231

Closed
wants to merge 4 commits

Conversation

wangyum (Member) commented Sep 14, 2017

What changes were proposed in this pull request?

#18266 added a new feature to support reading a JDBC table with a custom schema, but it requires specifying all of the fields. For simplicity, this PR adds support for specifying only a subset of the fields.
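
For illustration, a minimal sketch of what this change enables, assuming a SparkSession named spark; the connection URL and table name below are hypothetical:

    // Override the type of only the "id" column; all other columns keep
    // the type mapping inferred from the JDBC metadata.
    val df = spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://localhost/test") // hypothetical URL
      .option("dbtable", "people")                       // hypothetical table
      .option("customSchema", "id DECIMAL(38, 0)")       // partial custom schema
      .load()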

How was this patch tested?

unit tests

@@ -1333,7 +1333,7 @@ the following case-insensitive options:
<tr>
<td><code>customSchema</code></td>
<td>
-    The custom schema to use for reading data from JDBC connectors. For example, "id DECIMAL(38, 0), name STRING"). The column names should be identical to the corresponding column names of JDBC table. Users can specify the corresponding data types of Spark SQL instead of using the defaults. This option applies only to reading.
+    The custom schema to use for reading data from JDBC connectors. For example, <code>"id DECIMAL(38, 0), name STRING"</code>. You can also specify partial fields, others use default values. For example, <code>"id DECIMAL(38, 0)"</code>. The column names should be identical to the corresponding column names of JDBC table. Users can specify the corresponding data types of Spark SQL instead of using the defaults. This option applies only to reading.
Review comment (Member):

others -> and the others use the default type mapping

@@ -993,7 +996,10 @@ class JDBCSuite extends SparkFunSuite
Seq(StructField("NAME", StringType, true), StructField("THEID", IntegerType, true)))
val df = sql("select * from people_view")
assert(df.schema.size === 2)
assert(df.schema === schema)
Review comment (Member):

revert it back.

Change the following line https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L309
to

fields(i) = StructField(columnName, columnType, nullable)

You also need to update some test cases due to the above change, I think.


SparkQA commented Sep 14, 2017

Test build #81758 has finished for PR 19231 at commit 9e7a8a4.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

// This is resolved by names; use the custom field dataType to replace the default dataType.
val newSchema = tableSchema.map { col =>
  userSchema.find(f => nameEquality(f.name, col.name)) match {
    case Some(c) => col.copy(dataType = c.dataType, metadata = Metadata.empty)
wangyum (Member Author) commented Sep 14, 2017:

Reset metadata to empty; otherwise it is not equal to the schema generated by CatalystSqlParser.parseTableSchema.
Anyway, the type is fixed, so we don't need the metadata to infer column types.
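
A standalone sketch of the equality issue described above (not code from this PR; it uses Spark's internal CatalystSqlParser):

    import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
    import org.apache.spark.sql.types._

    // The parser produces fields with empty metadata.
    val parsed = CatalystSqlParser.parseTableSchema("id DECIMAL(38, 0)")

    // A schema built from JDBC metadata carries a "scale" entry.
    val fromJdbc = StructType(Seq(StructField("id", DecimalType(38, 0), nullable = true,
      new MetadataBuilder().putLong("scale", 0).build())))

    // StructField equality includes metadata, so the two schemas are not
    // equal unless the metadata is reset to Metadata.empty.
    assert(parsed != fromJdbc)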

Review comment (Member):

Why not change the following line https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L309
to

fields(i) = StructField(columnName, columnType, nullable)


gatorsmile (Member) commented Sep 14, 2017

      val metadata = new MetadataBuilder()
        .putLong("scale", fieldScale)
      val columnType =
        dialect.getCatalystType(dataType, typeName, fieldSize, metadata).getOrElse(
          getCatalystType(dataType, fieldSize, fieldScale, isSigned))
      fields(i) = StructField(columnName, columnType, nullable, metadata.build())

->

      val metadata = new MetadataBuilder()
        .putLong("scale", fieldScale)
      val columnType =
        dialect.getCatalystType(dataType, typeName, fieldSize, metadata).getOrElse(
          getCatalystType(dataType, fieldSize, fieldScale, isSigned))
      fields(i) = StructField(columnName, columnType, nullable)

Calling dialect.getCatalystType happens before we set the metadata of the StructField. Why do we still need scale in the final schema? Do we use it in any other place?
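
For context, a standalone sketch showing how a field carries the scale entry in its metadata (not code from this PR):

    import org.apache.spark.sql.types._

    val metadata = new MetadataBuilder().putLong("scale", 0).build()
    val field = StructField("id", DecimalType(38, 0), nullable = true, metadata)
    println(field.metadata.getLong("scale")) // 0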

SparkQA commented Sep 14, 2017

Test build #81790 has finished for PR 19231 at commit c0edad2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Sep 15, 2017

Test build #81798 has finished for PR 19231 at commit 1ee4ea0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Sep 15, 2017

Test build #81800 has finished for PR 19231 at commit 06095f5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

// This is resolved by names; use the custom field dataType to replace the default dataType.
val newSchema = tableSchema.map { col =>
  userSchema.find(f => nameEquality(f.name, col.name)) match {
    case Some(c) => col.copy(dataType = c.dataType)
Review comment (Member):

Yes, we should keep the original nullability.
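
A standalone sketch of why copy keeps the original nullability (not code from this PR):

    import org.apache.spark.sql.types._

    // copy replaces only dataType; the nullability read from the JDBC
    // metadata is left untouched.
    val original = StructField("id", IntegerType, nullable = false)
    val updated = original.copy(dataType = DecimalType(38, 0))
    assert(!updated.nullable)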

gatorsmile (Member):

LGTM

gatorsmile (Member):

Thanks! Merged to master.

@asfgit asfgit closed this in 4decedf Sep 15, 2017
ghost pushed a commit to dbtsai/spark that referenced this pull request Feb 12, 2018
…l schema doesn't have metadata.

## What changes were proposed in this pull request?

This is a follow-up pr of apache#19231 which modified the behavior to remove metadata from JDBC table schema.
This pr adds a test to check if the schema doesn't have metadata.

## How was this patch tested?

Added a test and existing tests.

Author: Takuya UESHIN <ueshin@databricks.com>

Closes apache#20585 from ueshin/issues/SPARK-22002/fup1.
asfgit pushed a commit that referenced this pull request Feb 12, 2018
…l schema doesn't have metadata.


(cherry picked from commit 0c66fe4)
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
wangyum deleted the SPARK-22002 branch October 8, 2019 04:26