
[SPARK-22002][SQL] Read JDBC table use custom schema support specify partial fields. #19231

Closed
wants to merge 4 commits

Conversation

wangyum (Member) commented Sep 14, 2017

What changes were proposed in this pull request?

#18266 added a new feature to support reading a JDBC table with a custom schema, but it requires specifying all of the fields. For simplicity, this PR adds support for specifying only a subset of the fields.
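
For illustration, a minimal sketch of what this change enables, assuming a SparkSession named spark; the connection URL and table name below are hypothetical:

    // Override the type of only the "id" column; all other columns keep
    // the type mapping inferred from the JDBC metadata.
    val df = spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://localhost/test") // hypothetical URL
      .option("dbtable", "people")                       // hypothetical table
      .option("customSchema", "id DECIMAL(38, 0)")       // partial custom schema
      .load()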

How was this patch tested?

unit tests

@@ -1333,7 +1333,7 @@ the following case-insensitive options:
<tr>
<td><code>customSchema</code></td>
<td>
-    The custom schema to use for reading data from JDBC connectors. For example, "id DECIMAL(38, 0), name STRING"). The column names should be identical to the corresponding column names of JDBC table. Users can specify the corresponding data types of Spark SQL instead of using the defaults. This option applies only to reading.
+    The custom schema to use for reading data from JDBC connectors. For example, <code>"id DECIMAL(38, 0), name STRING"</code>. You can also specify partial fields, others use default values. For example, <code>"id DECIMAL(38, 0)"</code>. The column names should be identical to the corresponding column names of JDBC table. Users can specify the corresponding data types of Spark SQL instead of using the defaults. This option applies only to reading.
Review comment (Member):

others -> and the others use the default type mapping

@@ -993,7 +996,10 @@ class JDBCSuite extends SparkFunSuite
Seq(StructField("NAME", StringType, true), StructField("THEID", IntegerType, true)))
val df = sql("select * from people_view")
assert(df.schema.size === 2)
assert(df.schema === schema)
Review comment (Member):

revert it back.

Change the following line https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L309
to

fields(i) = StructField(columnName, columnType, nullable)

You also need to update some test cases due to the above change, I think.


SparkQA commented Sep 14, 2017

Test build #81758 has finished for PR 19231 at commit 9e7a8a4.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

// This is resolved by names; use the custom field dataType to replace the default dataType.
val newSchema = tableSchema.map { col =>
  userSchema.find(f => nameEquality(f.name, col.name)) match {
    case Some(c) => col.copy(dataType = c.dataType, metadata = Metadata.empty)
wangyum (Member Author) commented Sep 14, 2017:

Reset metadata to empty; otherwise it is not equal to the schema generated by CatalystSqlParser.parseTableSchema.
Anyway, the type is fixed, so we don't need the metadata to infer column types.
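
A standalone sketch of the equality issue described above (not code from this PR; it uses Spark's internal CatalystSqlParser):

    import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
    import org.apache.spark.sql.types._

    // The parser produces fields with empty metadata.
    val parsed = CatalystSqlParser.parseTableSchema("id DECIMAL(38, 0)")

    // A schema built from JDBC metadata carries a "scale" entry.
    val fromJdbc = StructType(Seq(StructField("id", DecimalType(38, 0), nullable = true,
      new MetadataBuilder().putLong("scale", 0).build())))

    // StructField equality includes metadata, so the two schemas are not
    // equal unless the metadata is reset to Metadata.empty.
    assert(parsed != fromJdbc)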

Review comment (Member):

Why not change the following line https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L309
to

fields(i) = StructField(columnName, columnType, nullable)


gatorsmile (Member) commented Sep 14, 2017

      val metadata = new MetadataBuilder()
        .putLong("scale", fieldScale)
      val columnType =
        dialect.getCatalystType(dataType, typeName, fieldSize, metadata).getOrElse(
          getCatalystType(dataType, fieldSize, fieldScale, isSigned))
      fields(i) = StructField(columnName, columnType, nullable, metadata.build())

->

      val metadata = new MetadataBuilder()
        .putLong("scale", fieldScale)
      val columnType =
        dialect.getCatalystType(dataType, typeName, fieldSize, metadata).getOrElse(
          getCatalystType(dataType, fieldSize, fieldScale, isSigned))
      fields(i) = StructField(columnName, columnType, nullable)

Calling dialect.getCatalystType happens before we set the metadata of the StructField. Why do we still need scale in the final schema? Do we use it in any other place?
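
For context, a standalone sketch showing how a field carries the scale entry in its metadata (not code from this PR):

    import org.apache.spark.sql.types._

    val metadata = new MetadataBuilder().putLong("scale", 0).build()
    val field = StructField("id", DecimalType(38, 0), nullable = true, metadata)
    println(field.metadata.getLong("scale")) // 0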

SparkQA commented Sep 14, 2017

Test build #81790 has finished for PR 19231 at commit c0edad2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Sep 15, 2017

Test build #81798 has finished for PR 19231 at commit 1ee4ea0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Sep 15, 2017

Test build #81800 has finished for PR 19231 at commit 06095f5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

// This is resolved by names; use the custom field dataType to replace the default dataType.
val newSchema = tableSchema.map { col =>
  userSchema.find(f => nameEquality(f.name, col.name)) match {
    case Some(c) => col.copy(dataType = c.dataType)
Review comment (Member):

Yes, we should keep the original nullability.
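
A standalone sketch of why copy keeps the original nullability (not code from this PR):

    import org.apache.spark.sql.types._

    // copy replaces only dataType; the nullability read from the JDBC
    // metadata is left untouched.
    val original = StructField("id", IntegerType, nullable = false)
    val updated = original.copy(dataType = DecimalType(38, 0))
    assert(!updated.nullable)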

gatorsmile (Member):

LGTM

gatorsmile (Member):

Thanks! Merged to master.

@asfgit asfgit closed this in 4decedf Sep 15, 2017
ghost pushed a commit to dbtsai/spark that referenced this pull request Feb 12, 2018
…l schema doesn't have metadata.

## What changes were proposed in this pull request?

This is a follow-up pr of apache#19231 which modified the behavior to remove metadata from JDBC table schema.
This pr adds a test to check if the schema doesn't have metadata.

## How was this patch tested?

Added a test and existing tests.

Author: Takuya UESHIN <ueshin@databricks.com>

Closes apache#20585 from ueshin/issues/SPARK-22002/fup1.
asfgit pushed a commit that referenced this pull request Feb 12, 2018
…l schema doesn't have metadata.


(cherry picked from commit 0c66fe4)
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
wangyum deleted the SPARK-22002 branch October 8, 2019 04:26