[SPARK-31221][SQL] Rebase any date-times in conversions to/from Java types #27980
Conversation
Test build #120166 has finished for PR 27980 at commit
```scala
Timestamp.valueOf("2018-10-28 03:00:00"),
Timestamp.valueOf("2018-10-28 03:30:00")))
ts("2018-10-28 01:30:00"),
ts("2018-10-28 02:00:00", noDST = true),
```
I had to change the test because of these 2 lines. I wasn't able to build 2 timestamps via `Timestamp.valueOf` after rebasing. The `Timestamp` class does normalization underneath and replaces the milliseconds since the epoch, and as a consequence checking of the results fails, even though the textual representation and the year, month, ... nanos fields are the same.
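The normalization described above can be sketched as follows. This is an illustration added for context, not code from the PR; the zone and timestamp (`Europe/Berlin`, the 2018-10-28 DST fall-back) are my own example, and `parseWithZone` is a hypothetical helper:

```java
import java.sql.Timestamp;
import java.util.TimeZone;

public class TimestampNormalization {
    // Parses an ambiguous local time with the given default time zone in
    // effect and returns the resulting textual form.
    static String parseWithZone(String text, String zoneId) {
        TimeZone saved = TimeZone.getDefault();
        try {
            TimeZone.setDefault(TimeZone.getTimeZone(zoneId));
            // Timestamp.valueOf normalizes via java.util.GregorianCalendar:
            // it picks one of the two candidate instants itself, so the
            // caller cannot control the resulting epoch milliseconds.
            return Timestamp.valueOf(text).toString();
        } finally {
            TimeZone.setDefault(saved);
        }
    }

    public static void main(String[] args) {
        // On 2018-10-28 the wall-clock time 02:30:00 occurs twice in
        // Europe/Berlin. The printed fields survive normalization even
        // though the underlying instant is chosen by the JDK.
        System.out.println(parseWithZone("2018-10-28 02:30:00", "Europe/Berlin"));
    }
}
```

Two distinct instants can thus print identically, which is why comparing results built via `Timestamp.valueOf` fails after rebasing.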
```diff
@@ -508,8 +504,8 @@ class DateTimeUtilsSuite extends SparkFunSuite with Matchers with SQLHelper {
   test("2016-03-13 03:00:00", "PST", "2016-03-13 10:00:00.0")
   test("2016-11-06 00:59:59", "PST", "2016-11-06 07:59:59.0")
   // 2016-11-06 01:00:00 PST could be 2016-11-06 08:00:00 UTC or 2016-11-06 09:00:00 UTC
-  test("2016-11-06 01:00:00", "PST", "2016-11-06 09:00:00.0")
-  test("2016-11-06 01:59:59", "PST", "2016-11-06 09:59:59.0")
+  test("2016-11-06 01:00:00", "PST", "2016-11-06 08:00:00.0")
```
See the comment above: `08` or `09` depends on the implementation.
This is my only concern. From what I know, Pacific Standard Time (PST) is UTC-08:00 and Pacific Daylight Time (PDT) is UTC-07:00, so `2016-11-06 09:00:00 UTC` is the only correct answer.
I have probably found the reason for this behavior. While resolving PST to zone offsets, the short name is converted to America/Los_Angeles by this map:
https://github.com/frohoff/jdk8u-jdk/blob/master/src/share/classes/java/time/ZoneId.java#L248
As a consequence, `PST` is not a constant offset:
```scala
val pst = getZoneId("PST")
println(pst.getRules.getOffset(LocalDateTime.of(2016, 11, 5, 23, 59, 59)))
println(pst.getRules.getOffset(LocalDateTime.of(2016, 11, 6, 0, 0, 0)))
println(pst.getRules.getOffset(LocalDateTime.of(2016, 11, 6, 1, 0, 0)))
println(pst.getRules.getOffset(LocalDateTime.of(2016, 11, 6, 2, 0, 0)))
println(pst.getRules.getOffset(LocalDateTime.of(2016, 11, 6, 3, 0, 0)))
```
which prints:
```
-07:00
-07:00
-07:00
-08:00
-08:00
```
At the test's local timestamp `2016-11-06 01:00:00`, the offset is `-07:00`.
Is it possible to work around this JDK bug in the test? e.g. not use `PST`.
@cloud-fan I got `2016-11-06 09:00:00.0` in the test by modifying the `SHORT_IDS` map:

```scala
def getZoneId(timeZoneId: String): ZoneId = {
  import scala.collection.JavaConverters._
  val m = Map("PST" -> "-08:00")
  ZoneId.of(timeZoneId, m.asJava)
}

def getDefaultZoneId(): ZoneId = {
  getZoneId(defaultTimeZone().getID)
}
```
The following works:

```scala
withDefaultTimeZone(TimeZone.getTimeZone("-08:00")) {
  test("2016-11-06 01:00:00", "-08:00", "2016-11-06 09:00:00.0")
  test("2016-11-06 01:59:59", "-08:00", "2016-11-06 09:59:59.0")
}
```

@cloud-fan Should I replace `PST` by `-08:00` in the test?
yes please
done
```scala
def checkFromToJavaDate(d1: Date): Unit = {
  val d2 = toJavaDate(fromJavaDate(d1))
  assert(format(d2) === format(d1))
  assert(d2.toString === d1.toString)
}
```
Here, I restored the code of Spark 2.4: https://github.com/apache/spark/blob/branch-2.4/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala#L88
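As a sanity check on why the restored assertions hold, here is a standalone sketch (my own illustration, not Spark's actual `DateTimeUtils`; the helper names `fromJavaDateSketch`/`toJavaDateSketch` are hypothetical) of a field-based round trip for `java.sql.Date`:

```java
import java.sql.Date;
import java.time.LocalDate;

public class DateRoundTrip {
    // java.sql.Date -> days since the epoch in the Proleptic Gregorian
    // calendar. Date.toLocalDate is field-based (year/month/day), not
    // millisecond-based, so the printed form is preserved even for dates
    // before the Gregorian cutover.
    static long fromJavaDateSketch(Date d) {
        return d.toLocalDate().toEpochDay();
    }

    // Days since the epoch -> java.sql.Date, again through calendar fields.
    static Date toJavaDateSketch(long days) {
        return Date.valueOf(LocalDate.ofEpochDay(days));
    }

    public static void main(String[] args) {
        // 1582-10-04 is the last Julian day before the cutover; the
        // round trip still preserves its textual representation.
        Date d1 = Date.valueOf("1582-10-04");
        Date d2 = toJavaDateSketch(fromJavaDateSketch(d1));
        System.out.println(d2.toString().equals(d1.toString()));
    }
}
```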
Test build #120169 has finished for PR 27980 at commit
Test build #120170 has finished for PR 27980 at commit
@cloud-fan @HyukjinKwon Please review this PR.
Test build #120197 has finished for PR 27980 at commit
Test build #120251 has finished for PR 27980 at commit
The last commit updated a Scala test, which is not related to the pyspark tests. The pip failure is a known issue. Thanks, merging to master/3.0!
Closes #27980 from MaxGekk/reuse-rebase-funcs-in-java-funcs.
Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 1fd4607)
### What changes were proposed in this pull request?

In the PR, I propose to apply rebasing for all dates/timestamps in the conversion functions `fromJavaDate()`, `toJavaDate()`, `toJavaTimestamp()` and `fromJavaTimestamp()`. The rebasing is performed by building a local date-time in the original calendar, extracting date-time fields from the result, and creating a new local date-time in the target calendar.

### Why are the changes needed?

The changes are needed to be compatible with previous Spark versions (2.4.5 and earlier) not only before the Gregorian cutover date `1582-10-15` but also for dates after it. For instance, the Gregorian calendar implementation in Java 7, `java.util.GregorianCalendar`, is not as accurate in resolving time zone offsets as the Gregorian calendar introduced in Java 8.

### Does this PR introduce any user-facing change?

Yes, this PR can introduce behavior changes for dates after `1582-10-15`; in particular, conversions of zone ids to zone offsets will be much more accurate.

### How was this patch tested?

By the existing test suites `DateTimeUtilsSuite`, `DateFunctionsSuite`, `DateExpressionsSuite`, `CollectionExpressionsSuite`, `HiveOrcHadoopFsRelationSuite`, `ParquetIOSuite`.
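The field-extraction rebasing described in the first section can be sketched roughly as follows. This is a hedged illustration of the idea, not the actual Spark implementation, and `rebaseJulianToGregorianDays` is a name made up for this example:

```java
import java.sql.Date;
import java.time.LocalDate;
import java.util.Calendar;
import java.util.GregorianCalendar;

public class RebaseSketch {
    // Read the calendar fields of a java.sql.Date through
    // java.util.GregorianCalendar (hybrid Julian/Gregorian calendar), then
    // re-create the same fields as a java.time.LocalDate (Proleptic
    // Gregorian calendar). The fields are kept; the resulting day count
    // since the epoch may differ for ancient dates.
    static long rebaseJulianToGregorianDays(Date d) {
        GregorianCalendar cal = new GregorianCalendar();
        cal.setTime(d);
        LocalDate local = LocalDate.of(
            cal.get(Calendar.YEAR),
            cal.get(Calendar.MONTH) + 1,     // Calendar months are 0-based
            cal.get(Calendar.DAY_OF_MONTH));
        return local.toEpochDay();
    }

    public static void main(String[] args) {
        // For modern dates the two calendars agree, so rebasing is a no-op.
        long days = rebaseJulianToGregorianDays(Date.valueOf("2020-03-22"));
        System.out.println(days == LocalDate.of(2020, 3, 22).toEpochDay());
    }
}
```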