
[SPARK-35663][SQL] Add DataType class for timestamp without time zone type #32802

Closed
wants to merge 7 commits into from


gengliangwang
Member

@gengliangwang gengliangwang commented Jun 7, 2021

What changes were proposed in this pull request?

Extend Catalyst's type system with a new type that conforms to the SQL standard (see SQL:2016, section 4.6.2): TimestampWithoutTZType represents the timestamp without time zone type.

Why are the changes needed?

Spark SQL today supports the TIMESTAMP data type. However, the semantics it provides actually match TIMESTAMP WITH LOCAL TIME ZONE as defined by Oracle. Timestamps embedded in a SQL query or passed through JDBC are presumed to be in the session-local time zone and converted to UTC before being processed.
These are desirable semantics in many cases, such as when dealing with calendars.
In many (more) other cases, such as when dealing with log files, it is desirable that the provided timestamps not be altered.
SQL users expect to be able to model either behavior, using TIMESTAMP WITHOUT TIME ZONE for time-zone-insensitive data and TIMESTAMP WITH LOCAL TIME ZONE for time-zone-sensitive data.
Most traditional RDBMSs map TIMESTAMP to TIMESTAMP WITHOUT TIME ZONE, so their users will be surprised to see TIMESTAMP WITH LOCAL TIME ZONE, a feature that does not exist in the standard.
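To make the difference between the two semantics concrete, here is a minimal Scala sketch that uses only `java.time`. It assumes both types are encoded as a `Long` count of microseconds: the existing `TimestampType` stores microseconds since the epoch in UTC, while the NTZ encoding here (wall-clock fields counted from 1970-01-01T00:00 with no zone applied) is an assumption for illustration. The object and method names are hypothetical, not Spark APIs.

```scala
import java.time.{Instant, LocalDateTime, ZoneId}
import java.time.temporal.ChronoUnit

// Hypothetical helpers for illustration only; these are not Spark APIs.
object TimestampSemanticsSketch {
  // Assumed reference point for the NTZ encoding: wall-clock fields are
  // counted from 1970-01-01T00:00 without applying any time zone.
  private val LocalEpoch: LocalDateTime = LocalDateTime.of(1970, 1, 1, 0, 0)

  // TIMESTAMP WITH LOCAL TIME ZONE (existing TimestampType semantics):
  // the literal is interpreted in the session time zone and normalized to UTC.
  def ltzMicros(wallClock: LocalDateTime, sessionZone: ZoneId): Long =
    ChronoUnit.MICROS.between(Instant.EPOCH, wallClock.atZone(sessionZone).toInstant)

  // Reading an LTZ value renders the stored instant in the reader's session zone.
  def ltzToWallClock(micros: Long, sessionZone: ZoneId): LocalDateTime =
    Instant.EPOCH.plus(micros, ChronoUnit.MICROS).atZone(sessionZone).toLocalDateTime

  // TIMESTAMP WITHOUT TIME ZONE: the wall-clock fields are stored unaltered.
  def ntzMicros(wallClock: LocalDateTime): Long =
    ChronoUnit.MICROS.between(LocalEpoch, wallClock)

  // Reading an NTZ value needs no time zone at all.
  def ntzToWallClock(micros: Long): LocalDateTime =
    LocalEpoch.plus(micros, ChronoUnit.MICROS)

  def main(args: Array[String]): Unit = {
    val literal = LocalDateTime.parse("2021-06-07T10:00:00")
    val la = ZoneId.of("America/Los_Angeles")
    val tokyo = ZoneId.of("Asia/Tokyo")

    // Written by a Los Angeles session, read by a Tokyo session: the LTZ value shifts.
    println(ltzToWallClock(ltzMicros(literal, la), tokyo)) // 2021-06-08T02:00
    // The NTZ value reads back exactly as written, whatever the session zone.
    println(ntzToWallClock(ntzMicros(literal)))            // 2021-06-07T10:00
  }
}
```

The log-file case from the description corresponds to the second line: the stored value never depends on which session zone wrote or reads it.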

With this feature, we will use TIMESTAMP WITH LOCAL TIME ZONE to describe the existing timestamp type and add TIMESTAMP WITHOUT TIME ZONE for the standard semantics.
Using these two types will provide clarity.
This is a starting PR. See more details in https://issues.apache.org/jira/browse/SPARK-35662

Does this PR introduce any user-facing change?

Yes, a new data type for timestamps without time zone. It is still under development.

How was this patch tested?

Unit test

import org.apache.spark.annotation.Unstable

/**
* The timestamp without time zone type represents a a local time in microsecond precision.
Contributor

a a -> a

@cloud-fan
Contributor

cc @viirya @maropu @dongjoon-hyun

@SparkQA

SparkQA commented Jun 7, 2021

Test build #139404 has finished for PR 32802 at commit f6fedd0.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gengliangwang
Member Author

retest this please

* @since 3.2.0
*/
@Unstable
class TimestampNTZType private() extends AtomicType {
Member

@MaxGekk MaxGekk Jun 7, 2021

Since it will be visible to users, let's discuss other names for the type:

Could you elaborate on why the name you selected is better?

Contributor

If we want to fully match the SQL standard, we should use TimestampWithoutTimeZoneType.

Contributor

It's visible to non-SQL users, so the name does matter. But TimestampWithoutTimeZoneType is really a bit too long to type. Maybe TimestampNoTzType?

Member Author

@MaxGekk I wanted to avoid the new type name being too long.
TimestampWithoutTZType sounds great.

@SparkQA

SparkQA commented Jun 7, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43926/

@SparkQA

SparkQA commented Jun 7, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43931/

Member

@dongjoon-hyun dongjoon-hyun left a comment

Thank you for adding this, @gengliangwang. In general, the naming and the PR look good to me.

BTW, there are 4 milestones in the JIRA. Are you targeting everything into Apache Spark 3.2?

@gengliangwang
Member Author

gengliangwang commented Jun 7, 2021

BTW, there are 4 milestones in the JIRA. Are you targeting everything into Apache Spark 3.2?

I will create sub-tasks for the community developers after the fundamental work is done. I think we can at least target Milestones 1 & 2 for 3.2 for now. Many of the code paths are similar to those of the default Timestamp type.
For JDBC/Python/R support, we can target Spark 3.2.1.

@SparkQA

SparkQA commented Jun 7, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43926/

import org.apache.spark.annotation.Unstable

/**
* The timestamp without time zone type represents a local time in microsecond precision.
Contributor

... represents a local time or a "wallclock" time in microsecond precision, independent of time zone.
Its valid range ...
To represent an absolute point in time, use `TimestampType` instead.

Member Author

"represents a local time" is from SQL standard. I will keep it.
The other parts are updated.
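The reviewed declaration gives the class a private constructor. In Spark's existing atomic types (for example `TimestampType`), that is paired with a companion `case object` of the same name, which becomes the single instance users reference. The sketch below reproduces that pattern so it compiles without Spark: `SketchAtomicType` is a hypothetical stand-in for Spark's `AtomicType`, `TimestampNTZSketchType` is a hypothetical name, and `defaultSize = 8` assumes one `Long` of microseconds per value.

```scala
// Hypothetical stand-in for Spark's AtomicType so the sketch compiles
// without Spark on the classpath; it is not the real Spark API.
abstract class SketchAtomicType {
  // Estimated size in bytes of one value of this type.
  def defaultSize: Int
}

/**
 * Sketch of the reviewed declaration: a local ("wall clock") date-time in
 * microsecond precision, independent of time zone. To represent an absolute
 * point in time, a TimestampType-like type would be used instead.
 */
class TimestampNTZSketchType private () extends SketchAtomicType {
  // Assumption: one Long of microseconds per value.
  override def defaultSize: Int = 8
}

// The companion case object may call the private constructor, so it is the
// only instance; user code cannot write `new TimestampNTZSketchType()`.
case object TimestampNTZSketchType extends TimestampNTZSketchType

object SketchTypeUsage {
  // Because the type is a singleton, it can be matched as a stable identifier.
  def describe(t: SketchAtomicType): String = t match {
    case TimestampNTZSketchType => "timestamp without time zone"
    case _                      => "other type"
  }
}
```

Keeping the constructor private means every reference to the type is the same object, so equality checks and pattern matches stay cheap and unambiguous.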

@SparkQA

SparkQA commented Jun 7, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43934/

@SparkQA

SparkQA commented Jun 7, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43931/

@SparkQA

SparkQA commented Jun 7, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43934/

@SparkQA

SparkQA commented Jun 7, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43938/

@SparkQA

SparkQA commented Jun 7, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43937/

@SparkQA

SparkQA commented Jun 7, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43937/

@SparkQA

SparkQA commented Jun 7, 2021

Test build #139409 has finished for PR 32802 at commit f6fedd0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 7, 2021

Test build #139412 has finished for PR 32802 at commit 54cd45d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 7, 2021

Test build #139415 has finished for PR 32802 at commit 5c204ca.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 33f2627 Jun 7, 2021
@cloud-fan cloud-fan changed the title [SPARK-35663][SQL] Add Timestamp without time zone type [SPARK-35663][SQL] Add DataType class for timestamp without time zone type Jun 7, 2021
@SparkQA

SparkQA commented Jun 7, 2021

Test build #139416 has finished for PR 32802 at commit 8fc30b7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gengliangwang
Member Author

@cloud-fan @MaxGekk @dongjoon-hyun Thanks all for the review!

5 participants