[SPARK-35663][SQL] Add DataType class for timestamp without time zone type #32802
Conversation
import org.apache.spark.annotation.Unstable

/**
 * The timestamp without time zone type represents a a local time in microsecond precision.
`a a` -> `a`
address comment
Test build #139404 has finished for PR 32802 at commit
retest this please
 * @since 3.2.0
 */
@Unstable
class TimestampNTZType private() extends AtomicType {
Since it will be visible to users, let's discuss other names for the type:
- TimestampWithoutTZType - like in the standard
- DatetimeType - BigQuery: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#datetime_type
- TimestampNoTzType
- LocalTimestampType
Could you elaborate on why the name you selected is better?
If we want to fully match the SQL standard, we should use TimestampWithoutTimeZoneType.
It's visible to non-SQL users, so the name does matter. But TimestampWithoutTimeZoneType
is really a bit too long to type. Maybe TimestampNoTzType?
@MaxGekk I wanted to avoid the new type name being too long.
TimestampWithoutTZType sounds great.
Kubernetes integration test starting
sql/catalyst/src/test/scala/org/apache/spark/sql/types/DataTypeSuite.scala
sql/catalyst/src/main/scala/org/apache/spark/sql/types/TimestampWithoutTZType.scala
Thank you for adding this, @gengliangwang. In general, the naming and the PR look good to me.
BTW, there are 4 milestones in the JIRA. Are you targeting everything into Apache Spark 3.2?
I will create sub-tasks for the community developers after the fundamental work is done. I think we can at least target Milestones 1 & 2 for 3.2 for now. Many of the code paths are similar to the default Timestamp type.
Kubernetes integration test status success
import org.apache.spark.annotation.Unstable

/**
 * The timestamp without time zone type represents a local time in microsecond precision.
... represents a local time or a "wallclock" time in microsecond precision, independent of time zone.
Its valid range ...
To represent an absolute point in time, use `TimestampType` instead.
"represents a local time" is from SQL standard. I will keep it.
The other parts are updated.
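As a side note, the "independent of time zone" wording discussed above can be illustrated with plain java.time (a hand-rolled sketch, not Spark code): the same wall-clock value maps to different absolute instants depending on which zone it is later interpreted in.

```scala
import java.time.{Instant, LocalDateTime, ZoneId}

object WallclockDemo {
  // A wall-clock value: it carries no zone information of its own.
  val wallclock: LocalDateTime = LocalDateTime.of(2021, 6, 7, 12, 0, 0)

  // Interpreting the same wall-clock time in a given zone yields an
  // absolute point in time; different zones give different instants.
  def toInstant(zone: String): Instant =
    wallclock.atZone(ZoneId.of(zone)).toInstant

  def main(args: Array[String]): Unit = {
    println(toInstant("UTC"))        // 2021-06-07T12:00:00Z
    println(toInstant("Asia/Tokyo")) // 2021-06-07T03:00:00Z (Tokyo is UTC+9)
  }
}
```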
Kubernetes integration test starting
Kubernetes integration test status failure
Kubernetes integration test status success
Kubernetes integration test unable to build dist. exiting with code: 1
Kubernetes integration test starting
Kubernetes integration test status success
Test build #139409 has finished for PR 32802 at commit
Test build #139412 has finished for PR 32802 at commit
Test build #139415 has finished for PR 32802 at commit
thanks, merging to master!
Test build #139416 has finished for PR 32802 at commit
@cloud-fan @MaxGekk @dongjoon-hyun Thanks all for the review!
What changes were proposed in this pull request?
Extend Catalyst's type system with a new type that conforms to the SQL standard (see SQL:2016, section 4.6.2): TimestampWithoutTZType represents the timestamp without time zone type.
Why are the changes needed?
Spark SQL today supports the TIMESTAMP data type. However, the semantics it provides actually match TIMESTAMP WITH LOCAL TIME ZONE as defined by Oracle. Timestamps embedded in a SQL query or passed through JDBC are presumed to be in the session-local time zone and are cast to UTC before being processed.
These are desirable semantics in many cases, such as when dealing with calendars.
In many other cases, such as when dealing with log files, it is desirable that the provided timestamps not be altered.
SQL users expect that they can model either behavior, using TIMESTAMP WITHOUT TIME ZONE for time-zone-insensitive data and TIMESTAMP WITH LOCAL TIME ZONE for time-zone-sensitive data.
Most traditional RDBMSs map TIMESTAMP to TIMESTAMP WITHOUT TIME ZONE, so their users will be surprised to see TIMESTAMP WITH LOCAL TIME ZONE, a feature that does not exist in the standard.
With this new feature, we will introduce TIMESTAMP WITH LOCAL TIME ZONE to describe the existing timestamp type and add TIMESTAMP WITHOUT TIME ZONE for the standard semantics.
Using these two types will provide clarity.
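To make the contrast concrete, here is a minimal java.time sketch (not Spark code; the session-zone value is an illustrative stand-in for what Spark derives from its session configuration) showing how the two semantics treat the same input value:

```scala
import java.time.{Instant, LocalDateTime, ZoneId}

object TimestampSemantics {
  // Illustrative stand-in for the session time zone (in Spark this would
  // come from the session configuration, e.g. spark.sql.session.timeZone).
  val sessionZone: ZoneId = ZoneId.of("America/Los_Angeles")

  val input: LocalDateTime = LocalDateTime.parse("2021-06-07T10:00:00")

  // TIMESTAMP WITH LOCAL TIME ZONE semantics: interpret the input in the
  // session zone and normalize it to an absolute instant (UTC internally).
  def withLocalTz: Instant = input.atZone(sessionZone).toInstant

  // TIMESTAMP WITHOUT TIME ZONE semantics: keep the literal wall-clock
  // value; no session-zone adjustment is applied.
  def withoutTz: LocalDateTime = input

  def main(args: Array[String]): Unit = {
    println(withLocalTz) // 2021-06-07T17:00:00Z (PDT is UTC-7 in June)
    println(withoutTz)   // 2021-06-07T10:00
  }
}
```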
This is a starting PR. See more details in https://issues.apache.org/jira/browse/SPARK-35662
Does this PR introduce any user-facing change?
Yes, a new data type for the timestamp without time zone type. It is still in development.
How was this patch tested?
Unit test