[SPARK-42243][SQL] Use spark.sql.inferTimestampNTZInDataSources.enabled to infer timestamp type on partition columns #39812

Closed
wants to merge 1 commit into from

Conversation

gengliangwang
Member

What changes were proposed in this pull request?

Use spark.sql.inferTimestampNTZInDataSources.enabled to infer timestamp type on partition columns, instead of spark.sql.timestampType.
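As a rough illustration of the behavior described above, the sketch below models the decision in plain Python. It is not Spark's implementation: `infer_partition_type` and its `infer_ntz_enabled` parameter are hypothetical stand-ins for the partition inference step and for `spark.sql.inferTimestampNTZInDataSources.enabled`, and the single accepted timestamp pattern is an assumption made to keep the example short.

```python
from datetime import datetime

# Type names as Spark prints them in a schema.
TIMESTAMP_LTZ = "timestamp"
TIMESTAMP_NTZ = "timestamp_ntz"


def infer_partition_type(raw: str, infer_ntz_enabled: bool = False) -> str:
    """Hypothetical model of inferring a type for one partition value.

    Only one timestamp pattern is tried here; anything else is treated
    as a string. The flag defaults to False, like the configuration.
    """
    try:
        datetime.strptime(raw, "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return "string"
    # The flag, not the session-wide default timestamp type, picks the result.
    return TIMESTAMP_NTZ if infer_ntz_enabled else TIMESTAMP_LTZ


print(infer_partition_type("2023-01-31 12:00:00"))
print(infer_partition_type("2023-01-31 12:00:00", infer_ntz_enabled=True))
```

With the flag left at its default, the same directory value is inferred as the time-zone-aware type, which matches the backward-compatibility note in the configuration doc string quoted later in this conversation.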

Why are the changes needed?

Similar to #39777:

  • make the schema inference in data sources consistent
  • use a light-weight configuration for data source schema inference.

Does this PR introduce any user-facing change?

No, TimestampNTZ is not released yet.

How was this patch tested?

UT

"columns, the inference results will still be of TimestampLTZ types.")
"backward compatibility. As a result, for JSON/CSV files and partition directories " +
"written with TimestampNTZ columns, the inference results will still be of TimestampLTZ " +
"types.")
.version("3.4.0")
.booleanConf
.createWithDefault(false)
Contributor

Does it mean users can't do an NTZ round trip (write and read) in 3.4 by default?

Member Author

Yup. Take partition directory naming as an example: the outputs for TimestampNTZ and TimestampLTZ values are exactly the same.
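A small sketch of why the directory name alone cannot tell the two types apart. It uses plain Python datetimes as stand-ins: a naive `datetime` plays the role of a TimestampNTZ value and a zone-aware one plays the role of a TimestampLTZ value. The `partition_dir` helper, the format pattern, and the fixed UTC-8 session zone are all assumptions for illustration, not Spark's actual path-writing code.

```python
from datetime import datetime, timedelta, timezone

# Assumed pattern for rendering a timestamp into a directory name.
DIR_PATTERN = "%Y-%m-%d %H:%M:%S"


def partition_dir(column: str, value: datetime, session_tz: timezone) -> str:
    """Render `column=value` the way this sketch assumes a writer would."""
    if value.tzinfo is not None:
        # LTZ-like value: an instant, shown as wall-clock time in the session zone.
        value = value.astimezone(session_tz)
    # NTZ-like value: already a wall-clock time, formatted as-is.
    return f"{column}={value.strftime(DIR_PATTERN)}"


session_tz = timezone(timedelta(hours=-8))
ntz = datetime(2023, 1, 31, 12, 0, 0)                       # no zone attached
ltz = datetime(2023, 1, 31, 20, 0, 0, tzinfo=timezone.utc)  # 12:00 at UTC-8

ntz_dir = partition_dir("ts", ntz, session_tz)
ltz_dir = partition_dir("ts", ltz, session_tz)
print(ntz_dir)
print(ltz_dir)
print(ntz_dir == ltz_dir)
```

Both values produce the same directory string, so a reader that sees only the path has to rely on a configuration to choose between the two types.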

@gengliangwang
Member Author

Merging to master/3.4. cc @xinrong-meng

gengliangwang added a commit that referenced this pull request Jan 31, 2023
…led` to infer timestamp type on partition columns

### What changes were proposed in this pull request?

Use `spark.sql.inferTimestampNTZInDataSources.enabled` to infer timestamp type on partition columns, instead of `spark.sql.timestampType`.

### Why are the changes needed?

Similar to #39777:
* make the schema inference in data sources consistent
* use a light-weight configuration for data source schema inference.

### Does this PR introduce _any_ user-facing change?

No, TimestampNTZ is not released yet.

### How was this patch tested?

UT

Closes #39812 from gengliangwang/partitionNTZ.

Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
(cherry picked from commit b509ad1)
Signed-off-by: Gengliang Wang <gengliang@apache.org>
dongjoon-hyun pushed a commit that referenced this pull request Feb 3, 2023
…bled on JDBC data source

### What changes were proposed in this pull request?

Similar to #39777 and #39812, this PR proposes to use `spark.sql.inferTimestampNTZInDataSources.enabled` to control the behavior of timestamp type inference on JDBC data sources.

### Why are the changes needed?

Unify the TimestampNTZ type inference behavior over data sources. In JDBC/JSON/CSV data sources, a column can be Timestamp type or TimestampNTZ type. We need a lightweight configuration to control the behavior.

### Does this PR introduce _any_ user-facing change?

No, TimestampNTZ is not released yet.

### How was this patch tested?

UTs

Closes #39868 from gengliangwang/jdbcNTZ.

Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
dongjoon-hyun pushed a commit that referenced this pull request Feb 3, 2023
…bled on JDBC data source

Closes #39868 from gengliangwang/jdbcNTZ.

Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 4760a8b)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
a0x8o added a commit to a0x8o/spark that referenced this pull request Feb 4, 2023
…bled on JDBC data source

Closes #39868 from gengliangwang/jdbcNTZ.

Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
MaxGekk pushed a commit that referenced this pull request Feb 5, 2023
…urces.timestampNTZTypeInference.enabled

### What changes were proposed in this pull request?

Rename the TimestampNTZ data source inference configuration from `spark.sql.inferTimestampNTZInDataSources.enabled` to `spark.sql.sources.timestampNTZTypeInference.enabled`.

For more context on this configuration, see:
* #39777
* #39812
* #39868

### Why are the changes needed?

Since the configuration applies to data sources, it belongs under the prefix `spark.sql.sources`. The new name is consistent with another configuration, `spark.sql.sources.partitionColumnTypeInference.enabled`.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Closes #39885 from gengliangwang/renameConf.

Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
MaxGekk pushed a commit that referenced this pull request Feb 5, 2023
…urces.timestampNTZTypeInference.enabled

Closes #39885 from gengliangwang/renameConf.

Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(cherry picked from commit c5c1927)
Signed-off-by: Max Gekk <max.gekk@gmail.com>
snmvaughan pushed a commit to snmvaughan/spark that referenced this pull request Jun 20, 2023
…led` to infer timestamp type on partition columns

Closes apache#39812 from gengliangwang/partitionNTZ.

Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
(cherry picked from commit b509ad1)
Signed-off-by: Gengliang Wang <gengliang@apache.org>
snmvaughan pushed a commit to snmvaughan/spark that referenced this pull request Jun 20, 2023
…bled on JDBC data source

Closes apache#39868 from gengliangwang/jdbcNTZ.

Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 4760a8b)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
snmvaughan pushed a commit to snmvaughan/spark that referenced this pull request Jun 20, 2023
…urces.timestampNTZTypeInference.enabled

Closes apache#39885 from gengliangwang/renameConf.

Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(cherry picked from commit c5c1927)
Signed-off-by: Max Gekk <max.gekk@gmail.com>