
[SPARK-35975][SQL] New configuration spark.sql.timestampType for the default timestamp type #33176

Status: Closed (merged; 3 commits)

Conversation

gengliangwang (Member) commented Jul 1, 2021

What changes were proposed in this pull request?

Add a new configuration spark.sql.timestampType, which configures the default timestamp type of Spark SQL, including SQL DDL and Cast clause. Setting the configuration as TIMESTAMP_NTZ will use TIMESTAMP WITHOUT TIME ZONE as the default type while putting it as TIMESTAMP_LTZ will use TIMESTAMP WITH LOCAL TIME ZONE.

The default value of the new configuration is TIMESTAMP_LTZ, which is consistent with previous Spark releases.
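A brief usage sketch of the new configuration (the config name and values are from this PR; the exact query results are not shown here and table/column names are illustrative):

```sql
-- Make TIMESTAMP WITHOUT TIME ZONE the default for this session
-- (the default value of the config is TIMESTAMP_LTZ).
SET spark.sql.timestampType = TIMESTAMP_NTZ;

-- The TIMESTAMP keyword in DDL and CAST now resolves to
-- TIMESTAMP WITHOUT TIME ZONE.
CREATE TABLE events (ts TIMESTAMP);
SELECT CAST('2021-07-01 01:02:03' AS TIMESTAMP);
```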

Why are the changes needed?

It provides a configuration for switching the default timestamp type to timestamp without time zone.

Does this PR introduce any user-facing change?

No. The configuration is new, and its default value (TIMESTAMP_LTZ) preserves the existing behavior.

How was this patch tested?

Unit test


Review comment on:

    val TIMESTAMP_TYPE =
      buildConf("spark.sql.timestampType")
        .doc("Configures the default timestamp type of Spark SQL, including SQL DDL and Cast " +

gengliangwang (Member, Author):
The scope is bigger than this: it will also affect timestamp literals, the to_timestamp function, data source I/O, etc. We will get to that later.


Review comment on:

    val TIMESTAMP_TYPE =
      buildConf("spark.sql.timestampType")
        .doc("Configures the default timestamp type of Spark SQL, including SQL DDL and Cast " +

MaxGekk (Member):
I wonder about type literals, for instance timestamp'2021-07-01 01:02:03'. Should the config influence them too?

gengliangwang (Member, Author):
Yes, I mentioned it in #33176 (comment). The type literal logic with TIMESTAMP_NTZ:

  1. if there is no time zone part, return timestamp without time zone literal
  2. otherwise, return timestamp with local time zone.
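The two-branch rule above could be sketched as follows. This is a hypothetical illustration, not Spark's actual parser code; the names and the zone-detection heuristic are simplified for clarity:

```scala
// Sketch of the literal rule: with TIMESTAMP_NTZ as the default, a literal
// without a zone part becomes TIMESTAMP WITHOUT TIME ZONE; otherwise it
// becomes TIMESTAMP WITH LOCAL TIME ZONE. All names here are illustrative.
sealed trait TsLiteralType
case object TimestampNTZ extends TsLiteralType
case object TimestampLTZ extends TsLiteralType

// Simplified zone detection: looks for an offset (+hh:mm / -hh:mm) or 'Z'
// in the time portion of "yyyy-MM-dd HH:mm:ss[zone]".
def hasZonePart(literal: String): Boolean = {
  val timePart = literal.split(' ').lastOption.getOrElse("")
  timePart.exists(c => c == '+' || c == '-' || c == 'Z')
}

def resolveLiteralType(literal: String): TsLiteralType =
  if (hasZonePart(literal)) TimestampLTZ else TimestampNTZ
```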

Review comment on:

    val TIMESTAMP_TYPE =
      buildConf("spark.sql.timestampType")
        .doc("Configures the default timestamp type of Spark SQL, including SQL DDL and Cast " +
          "clause. Setting the configuration as TIMESTAMP_NTZ will use TIMESTAMP WITHOUT TIME " +

MaxGekk (Member) commented Jul 1, 2021:
Can't you get the name from the enum, as you did in .createWithDefault(TimestampTypes.TIMESTAMP_LTZ.toString)? Maybe we will rename it again soon, who knows ;-)

gengliangwang (Member, Author):
Done.

Review comment on:

    import org.apache.spark.util.ResetSystemProperties

    // Test suite for setting the default timestamp type of Spark SQL
    class TimestampTypeSuite extends QueryTest with SharedSparkSession with ResetSystemProperties {

MaxGekk (Member):
We already have DataTypeSuite, so we could put the tests there. Or do you plan that many tests for TimestampType?

cloud-fan (Contributor):
I think we can simply write the tests in DataTypeParserSuite.

cloud-fan (Contributor):
If we apply the config to timestamp literals later, we should add tests in ExpressionParserSuite.

gengliangwang (Member, Author):
This is an end-to-end test with actual queries, which can't be done in DataTypeSuite.
I just renamed the suite to SetDefaultTimestampTypeSuite to avoid confusion.

gengliangwang (Member, Author):
@cloud-fan Oh, I missed your comments. Let me write the tests in DataTypeParserSuite.

MaxGekk (Member) commented Jul 1, 2021:
@gengliangwang Could you mention the default value of spark.sql.timestampType in the PR description?
In general it LGTM.

SparkQA commented Jul 1, 2021:

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45039/

@@ -2502,7 +2502,7 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg
     case ("float" | "real", Nil) => FloatType
     case ("double", Nil) => DoubleType
     case ("date", Nil) => DateType
-    case ("timestamp", Nil) => TimestampType
+    case ("timestamp", Nil) => SQLConf.get.timestampType
cloud-fan (Contributor) commented Jul 1, 2021:
We need to support parsing the type strings for LTZ and NTZ explicitly, so that people can avoid relying on the config in their queries.

gengliangwang (Member, Author):
I would prefer doing that in separate PRs to keep this one clean. We should support the new keywords in both data type parsing and type literals:
https://issues.apache.org/jira/browse/SPARK-35977
https://issues.apache.org/jira/browse/SPARK-35978
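The mapping the diff above relies on could look roughly like this. The stand-ins below (DataType, TimestampType, TimestampNTZType, timestampTypeFor) are simplified for illustration and are not Spark's actual classes or methods:

```scala
// Sketch of how the "timestamp" keyword can resolve through the config value.
sealed trait DataType
case object TimestampType extends DataType      // TIMESTAMP WITH LOCAL TIME ZONE
case object TimestampNTZType extends DataType   // TIMESTAMP WITHOUT TIME ZONE

// Resolve the configured default; invalid values are rejected eagerly.
def timestampTypeFor(confValue: String): DataType = confValue match {
  case "TIMESTAMP_LTZ" => TimestampType
  case "TIMESTAMP_NTZ" => TimestampNTZType
  case other =>
    throw new IllegalArgumentException(
      s"Invalid value for spark.sql.timestampType: $other")
}
```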

Review comment on:

    val TIMESTAMP_TYPE =
      buildConf("spark.sql.timestampType")
        .doc("Configures the default timestamp type of Spark SQL, including SQL DDL and Cast " +
          s"clause. Setting the configuration as ${TimestampTypes.TIMESTAMP_NTZ.toString} will " +

MaxGekk (Member):
I thought the compiler automatically calls the toString method in string interpolations, so you can skip it. Or am I wrong?
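The reviewer is right for this case: Scala's s-interpolator calls toString on any non-String argument, so the explicit call is redundant. A quick check with a stand-in enumeration (the object below mimics the shape of the TimestampTypes enum in the snippet, but is not Spark's definition):

```scala
// Stand-in for the TimestampTypes enumeration used in the snippet above.
object TimestampTypes extends Enumeration {
  val TIMESTAMP_LTZ, TIMESTAMP_NTZ = Value
}

// Both interpolations produce the same string: s"..." calls toString
// on any non-String argument automatically.
val withToString = s"default is ${TimestampTypes.TIMESTAMP_LTZ.toString}"
val withoutToString = s"default is ${TimestampTypes.TIMESTAMP_LTZ}"
```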

SparkQA commented Jul 1, 2021:

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45043/

dongjoon-hyun (Member) left a review comment:
+1, LGTM (Pending CIs)

SparkQA commented Jul 1, 2021:

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45043/

SparkQA commented Jul 1, 2021:

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45045/

SparkQA commented Jul 1, 2021:

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45045/

MaxGekk (Member) commented Jul 1, 2021:

+1, LGTM. Merging to master.
Thank you, @gengliangwang and @cloud-fan @dongjoon-hyun for reviews.

@MaxGekk MaxGekk closed this in a643076 Jul 1, 2021
SparkQA commented Jul 1, 2021:

Test build #140527 has finished for PR 33176 at commit ad97b52.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

dongjoon-hyun (Member):

Thank you, @gengliangwang and @MaxGekk !

SparkQA commented Jul 1, 2021:

Test build #140529 has finished for PR 33176 at commit a3c64e8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jul 1, 2021:

Test build #140532 has finished for PR 33176 at commit 6c6bf3a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class DataTypeParserSuite extends SparkFunSuite with SQLHelper
