[SPARK-33500][SQL] Support field "EPOCH" in datetime function extract/date_part#30445
gengliangwang wants to merge 2 commits into apache:master from
Conversation
Hmm, I added EPOCH in PR #25410 but it was removed by #28284. What is the reason to add it back again? cc @yaooqinn @cloud-fan @dongjoon-hyun
@MaxGekk This is my major motivation. Actually, I didn't know we had it before and that it got removed.
Because we thought it was not commonly used. But @gengliangwang made a good point that we need a replacement for casting datetime values to numbers.
I checked the chat history with @cloud-fan; the reasons we agreed on for deleting EPOCH were: 1) EPOCH is non-ANSI; 2) EPOCH cannot express the meaning of
Ah yeah, it's not really extract. Can we follow BigQuery and add 3 functions:
```scala
/**
 * Returns the number of seconds since 1970-01-01 00:00:00-00 (can be negative).
 */
def getSecondsAfterEpoch(micros: Long, zoneId: ZoneId): Double = {
  micros.toDouble / MICROS_PER_SECOND
}
```
@gengliangwang If you would like to be compatible with PostgreSQL, you need to take the removed implementation:
```scala
/**
 * Returns the number of seconds with fractional part in microsecond precision
 * since 1970-01-01 00:00:00 local time.
 */
def getEpoch(timestamp: SQLTimestamp, zoneId: ZoneId): Decimal = {
  val offset = SECONDS.toMicros(
    zoneId.getRules.getOffset(microsToInstant(timestamp)).getTotalSeconds)
  val sinceEpoch = timestamp + offset
  Decimal(sinceEpoch, 20, 6)
}
```

PostgreSQL takes seconds since the local epoch 1970-01-01 00:00:00 (in the session time zone), but your implementation calculates seconds since 1970-01-01 00:00:00Z (in the UTC time zone).
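The difference between the two computations can be sketched in Python (the function names here are illustrative, and a fixed-offset zone stands in for a full time-zone rule lookup):

```python
from datetime import timedelta, timezone

MICROS_PER_SECOND = 1_000_000

def seconds_after_epoch_utc(micros: int) -> float:
    # Plain division: seconds since 1970-01-01 00:00:00Z, regardless of zone.
    return micros / MICROS_PER_SECOND

def seconds_after_epoch_local(micros: int, tz: timezone) -> float:
    # PostgreSQL-style: shift by the zone's UTC offset first, so the result
    # counts seconds since the *local* 1970-01-01 00:00:00.
    offset_micros = int(tz.utcoffset(None).total_seconds()) * MICROS_PER_SECOND
    return (micros + offset_micros) / MICROS_PER_SECOND

seoul = timezone(timedelta(hours=9))  # fixed UTC+9, for illustration only
print(seconds_after_epoch_utc(0))           # 0.0
print(seconds_after_epoch_local(0, seoul))  # 32400.0 (differs by the offset)
```

A real implementation has to resolve the offset per instant via the zone's rules (as `zoneId.getRules.getOffset` does in the snippet above), since offsets change with DST.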
+1, thanks for pointing it out 👍
Test build #131424 has finished for PR 30445 at commit
Closing this since we already have #30566.
What changes were proposed in this pull request?
Support field EPOCH in the functions `extract` and `date_part`, which returns the number of seconds since 1970-01-01 00:00:00-00.
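As an illustration of the expected value, here is a Python stand-in for the SQL expression `extract(EPOCH FROM ts)` (the exact Spark output formatting is not reproduced here):

```python
from datetime import datetime, timezone

# For a UTC timestamp, EPOCH is the number of seconds (with fractional
# microseconds) since 1970-01-01 00:00:00-00.
ts = datetime(2020, 11, 19, 0, 0, 0, 500000, tzinfo=timezone.utc)
print(ts.timestamp())  # 1605744000.5
```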
Note that this field only works with Date and Timestamp input; it doesn't work with the Interval type. In Spark, the number of seconds in a month/day is considered uncertain (a month contains 28~31 days; a day contains 23~25 hours).
Why are the changes needed?
This is useful for getting the number of seconds since 1970-01-01 00:00:00-00. PostgreSQL also supports the same field:
https://www.postgresql.org/docs/9.1/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT
The other reason is that, since casting from TimestampType to numeric types is disallowed in ANSI mode, we need to provide a proper alternative if a user has to do the casting.
Does this PR introduce any user-facing change?
Yes, a new field "EPOCH" for the datetime functions extract/date_part.
How was this patch tested?
Unit test