[SPARK-35975][SQL] New configuration spark.sql.timestampType for the default timestamp type
#33176
Conversation
val TIMESTAMP_TYPE =
  buildConf("spark.sql.timestampType")
    .doc("Configures the default timestamp type of Spark SQL, including SQL DDL and Cast " +
The scope is bigger than this. It will affect timestamp literals, the to_timestamp function, data source IO, etc. We will get to that later.
val TIMESTAMP_TYPE =
  buildConf("spark.sql.timestampType")
    .doc("Configures the default timestamp type of Spark SQL, including SQL DDL and Cast " +
I wonder about type literals, for instance timestamp'2021-07-01 01:02:03'. Should the config influence them too?
Yes, as I mentioned in #33176 (comment). The type literal logic under TIMESTAMP_NTZ:
- if there is no time zone part, return timestamp without time zone literal
- otherwise, return timestamp with local time zone.
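The rule above can be sketched in plain Scala. This is a hypothetical illustration, not the actual Spark parser code, and the zone-detection regex is a deliberate simplification I'm assuming for the demo (a trailing UTC offset, "Z", "UTC…", or region/city id counts as a time zone part):

```scala
// Hypothetical sketch of the literal rule above, NOT the actual Spark
// parser code. The zone regex is a simplification for the demo.
object TimestampLiteralSketch {
  private val zonePattern =
    ".*(?:[+-]\\d{2}:?\\d{2}|UTC.*|Z|[A-Za-z]+/[A-Za-z_]+)".r

  def literalType(text: String): String = text.trim match {
    case zonePattern() => "TIMESTAMP_LTZ" // zone present: with local time zone
    case _             => "TIMESTAMP_NTZ" // no zone part: without time zone
  }
}
```

With this sketch, `timestamp'2021-07-01 01:02:03'` would resolve to the NTZ type, while the same literal with a `+08:00` suffix would stay LTZ.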
val TIMESTAMP_TYPE =
  buildConf("spark.sql.timestampType")
    .doc("Configures the default timestamp type of Spark SQL, including SQL DDL and Cast " +
      "clause. Setting the configuration as TIMESTAMP_NTZ will use TIMESTAMP WITHOUT TIME " +
Can't you get the name from the enum, as you did in .createWithDefault(TimestampTypes.TIMESTAMP_LTZ.toString)? Maybe we will rename it again soon, who knows ;-)
Done.
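For illustration, here is a minimal sketch of the suggestion, using plain scala.Enumeration rather than Spark's SQLConf builder: both the doc string and the default value are derived from the same enum, so a future rename only touches one place.

```scala
// Minimal sketch (plain scala.Enumeration, not Spark's SQLConf builder)
// of deriving both the doc string and the default from the same enum.
object TimestampTypes extends Enumeration {
  val TIMESTAMP_NTZ, TIMESTAMP_LTZ = Value
}

// String interpolation picks up the enum value's name automatically.
val doc =
  s"Setting the configuration as ${TimestampTypes.TIMESTAMP_NTZ} will use " +
    "TIMESTAMP WITHOUT TIME ZONE as the default type."
val default = TimestampTypes.TIMESTAMP_LTZ.toString
```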
import org.apache.spark.util.ResetSystemProperties

// Test suite for setting the default timestamp type of Spark SQL
class TimestampTypeSuite extends QueryTest with SharedSparkSession with ResetSystemProperties {
We already have DataTypeSuite, so could you put the tests there, or do you plan so many tests for TimestampType?
I think we can simply write tests in DataTypeParserSuite
If we apply the config to timestamp literals later, we should add tests in ExpressionParserSuite.
This is an end-to-end test with actual queries, which can't be done in DataTypeSuite. I just renamed the file to SetDefaultTimestampTypeSuite to avoid confusion.
@cloud-fan oh I missed your comments. Let me write tests in DataTypeParserSuite
@gengliangwang Could you mention what the default value is?
Kubernetes integration test unable to build dist. exiting with code: 1
@@ -2502,7 +2502,7 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg
     case ("float" | "real", Nil) => FloatType
     case ("double", Nil) => DoubleType
     case ("date", Nil) => DateType
-    case ("timestamp", Nil) => TimestampType
+    case ("timestamp", Nil) => SQLConf.get.timestampType
We need to support parsing type string for LTZ and NTZ specifically so that people can avoid relying on config in their queries.
I would prefer doing that in separate PRs to keep this one clean. We should support the new keywords in both data type parsing and type literals.
https://issues.apache.org/jira/browse/SPARK-35977
https://issues.apache.org/jira/browse/SPARK-35978
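As a hypothetical sketch of the dispatch being discussed (not the real AstBuilder code), plain `timestamp` would follow the session config while the explicit keywords from the follow-up JIRAs would bypass it:

```scala
// Hypothetical sketch of the type-keyword dispatch, not real AstBuilder
// code: "timestamp" follows the session config; the explicit keywords
// (subject of the follow-up JIRAs) would always pick a fixed type.
sealed trait TsType
case object TimestampLtz extends TsType
case object TimestampNtz extends TsType

def resolveTimestampKeyword(keyword: String, configured: TsType): TsType =
  keyword.toLowerCase match {
    case "timestamp"     => configured   // depends on spark.sql.timestampType
    case "timestamp_ltz" => TimestampLtz // always with local time zone
    case "timestamp_ntz" => TimestampNtz // always without time zone
    case other           => sys.error(s"unknown type keyword: $other")
  }
```

With explicit keywords available, queries would no longer need to rely on the session config for a deterministic type.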
val TIMESTAMP_TYPE =
  buildConf("spark.sql.timestampType")
    .doc("Configures the default timestamp type of Spark SQL, including SQL DDL and Cast " +
      s"clause. Setting the configuration as ${TimestampTypes.TIMESTAMP_NTZ.toString} will " +
I thought the compiler automatically calls the toString method in string interpolations, so you can skip it. Or am I wrong?
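A quick self-contained check of that point: the s-interpolator does call toString on any non-String argument, so the explicit call is redundant.

```scala
// Self-contained check: string interpolation calls toString on non-String
// arguments, so the explicit .toString can indeed be skipped.
object Suffix extends Enumeration {
  val TIMESTAMP_NTZ = Value
}

val withToString    = s"type: ${Suffix.TIMESTAMP_NTZ.toString}"
val withoutToString = s"type: ${Suffix.TIMESTAMP_NTZ}"
```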
Kubernetes integration test starting
+1, LGTM (Pending CIs)
Kubernetes integration test status success
Kubernetes integration test starting
Kubernetes integration test status success
+1, LGTM. Merging to master.
Test build #140527 has finished for PR 33176 at commit
Thank you, @gengliangwang and @MaxGekk !
Test build #140529 has finished for PR 33176 at commit
Test build #140532 has finished for PR 33176 at commit
What changes were proposed in this pull request?
Add a new configuration spark.sql.timestampType, which configures the default timestamp type of Spark SQL, including SQL DDL and Cast clause. Setting the configuration as TIMESTAMP_NTZ will use TIMESTAMP WITHOUT TIME ZONE as the default type, while setting it as TIMESTAMP_LTZ will use TIMESTAMP WITH LOCAL TIME ZONE.
The default value of the new configuration is TIMESTAMP_LTZ, which is consistent with previous Spark releases.
Why are the changes needed?
A new configuration for switching the default timestamp type to timestamp without time zone.
Does this PR introduce any user-facing change?
No, it's a new feature.
How was this patch tested?
Unit test