[SPARK-42296][SQL] Apply spark.sql.inferTimestampNTZInDataSources.enabled on JDBC data source #39868

gengliangwang · 2023-02-03T05:01:15Z

What changes were proposed in this pull request?

Similar to #39777 and #39812, this PR proposes to use spark.sql.inferTimestampNTZInDataSources.enabled to control the behavior of timestamp type inference on JDBC data sources.

Why are the changes needed?

Unify the TimestampNTZ type inference behavior over data sources. In JDBC/JSON/CSV data sources, a column can be Timestamp type or TimestampNTZ type. We need a lightweight configuration to control the behavior.
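
For illustration, a minimal sketch of how a session-level flag like this might be toggled before a JDBC read. It assumes the configuration can be set at runtime through `spark.conf.set`. The object name, the helper `readWithInference`, and its parameters are hypothetical and not part of this PR. The caller supplies the JDBC URL and table name.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object JdbcTimestampNtzInference {
  // Configuration name as used in this PR.
  val InferNtzConf = "spark.sql.inferTimestampNTZInDataSources.enabled"

  /** Hypothetical helper: reads `table` over JDBC with the inference flag set to `inferNtz`. */
  def readWithInference(
      spark: SparkSession,
      url: String,
      table: String,
      inferNtz: Boolean): DataFrame = {
    // Assumption: the flag is a runtime SQL conf, so it can be changed per session.
    spark.conf.set(InferNtzConf, inferNtz.toString)
    spark.read
      .format("jdbc")
      .option("url", url)
      .option("dbtable", table)
      .load()
  }
}
```

Calling `printSchema()` on the returned DataFrame with the flag on and off would show which timestamp type was inferred for each column.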

Does this PR introduce any user-facing change?

No, TimestampNTZ is not released yet.

How was this patch tested?

UTs

gengliangwang · 2023-02-03T05:01:28Z

cc @sadikovi

gengliangwang · 2023-02-03T05:02:45Z

sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala

@@ -1961,16 +1975,23 @@ class JDBCSuite extends QueryTest with SharedSparkSession {
            .option("url", urlWithUserAndPass)
            .option("dbtable", tableName)
            .save()
-
-          DateTimeTestUtils.outstandingZoneIds.foreach { zoneId =>


This test case takes 17 seconds on my M1 Max MBP, and it can take longer in the GitHub Actions tests. I suggest running it with one random time zone instead of iterating over all of them, which reduces the execution time to 4 seconds.
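
A minimal sketch of the random-time-zone idea, not necessarily the exact change made in this PR. `candidateZoneIds` is a hypothetical stand-in for `DateTimeTestUtils.outstandingZoneIds`, and `pickZone` is an illustrative helper.

```scala
import java.time.ZoneId
import scala.util.Random

object RandomZoneSketch {
  // Hypothetical stand-in for DateTimeTestUtils.outstandingZoneIds.
  val candidateZoneIds: Seq[ZoneId] =
    Seq("UTC", "America/Los_Angeles", "Asia/Hong_Kong", "Europe/Brussels").map(ZoneId.of)

  /** Picks one zone at random; pass a seeded Random to make the choice reproducible. */
  def pickZone(zones: Seq[ZoneId], rng: Random = new Random()): ZoneId = {
    require(zones.nonEmpty, "need at least one candidate time zone")
    zones(rng.nextInt(zones.length))
  }
}
```

The trade-off is that each run covers a single zone, so logging the chosen zone (or the seed) helps reproduce a failure.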

dongjoon-hyun

+1, LGTM. Thank you, @gengliangwang and @cloud-fan .
Merged to master/3.4

…bled on JDBC data source

### What changes were proposed in this pull request?
Similar to #39777 and #39812, this PR proposes to use `spark.sql.inferTimestampNTZInDataSources.enabled` to control the behavior of timestamp type inference on JDBC data sources.

### Why are the changes needed?
Unify the TimestampNTZ type inference behavior over data sources. In JDBC/JSON/CSV data sources, a column can be Timestamp type or TimestampNTZ type. We need a lightweight configuration to control the behavior.

### Does this PR introduce _any_ user-facing change?
No, TimestampNTZ is not released yet.

### How was this patch tested?
UTs

Closes #39868 from gengliangwang/jdbcNTZ.

Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 4760a8b)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

…urces.timestampNTZTypeInference.enabled

### What changes were proposed in this pull request?
Rename TimestampNTZ data source inference configuration from `spark.sql.inferTimestampNTZInDataSources.enabled` to `spark.sql.sources.timestampNTZTypeInference.enabled`.

For more context on this configuration: #39777 #39812 #39868

### Why are the changes needed?
Since the configuration is for data source, we can put it under the prefix `spark.sql.sources`. The new naming is consistent with another configuration `spark.sql.sources.partitionColumnTypeInference.enabled`.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Closes #39885 from gengliangwang/renameConf.

Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(cherry picked from commit c5c1927)
Signed-off-by: Max Gekk <max.gekk@gmail.com>

infer TimestampNTZ in JDBC

ecf0e3b

gengliangwang requested a review from cloud-fan February 3, 2023 05:01

github-actions bot added the SQL label Feb 3, 2023

gengliangwang commented Feb 3, 2023

cloud-fan approved these changes Feb 3, 2023

dongjoon-hyun approved these changes Feb 3, 2023

dongjoon-hyun closed this in 4760a8b Feb 3, 2023

gengliangwang mentioned this pull request Feb 4, 2023

[SPARK-42345][SQL] Rename TimestampNTZ inference conf as spark.sql.sources.timestampNTZTypeInference.enabled #39885

Closed
