-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-16774][SQL] Fix use of deprecated timestamp constructor & improve timezone handling #14398
[SPARK-16774][SQL] Fix use of deprecated timestamp constructor & improve timezone handling #14398
Conversation
…itly on local system timezone - lets avoid that
…ays since it was always falling back to system timezone in the fall through case
Test build #62997 has finished for PR 14398 at commit
|
@@ -24,6 +24,8 @@ import javax.xml.bind.DatatypeConverter | |||
|
|||
import scala.annotation.tailrec | |||
|
|||
import sun.util.calendar.CalendarSystem |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be using java.util.Calendar
right -- I presume any sun.
classes are not for use by applications.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems reasonable - will also allow us to hide the julian calendar usage for old dates.
Also http://infiniteundo.com/post/25509354022/more-falsehoods-programmers-believe-about-time - I really think we should stop handling time as much as possibly and leave it to people who spend enough time thinking about it to get it mostly right. :p |
@holdenk @davies The change proposed in this PR makes for cleaner code but it also regresses the fix for SPARK-15613. With this fix there are 4 incorrect test("15613 Incorrect days to millis conversion Europe/Moscow") {
DateTimeTestUtils.withDefaultTimeZone(TimeZone.getTimeZone("Europe/Moscow")) {
val badDays = (0 to 20000).filterNot{ d =>
d === millisToDays(daysToMillis(d))
}
val df = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss z")
badDays.foreach { d =>
println(s"day: ${d}, date: ${df.format(new Date(daysToMillis(d)))}")
}
assert(badDays.isEmpty)
}
} TestFailedException: Vector(4108, 4473, 4838, 5204) was not empty
day: 4108, date: 1981-03-31 23:00:00 MSK
day: 4473, date: 1982-03-31 23:00:00 MSK
day: 4838, date: 1983-03-31 23:00:00 MSK
day: 5204, date: 1984-03-31 23:00:00 MSK ...and it increases the number of incorrectly converted days from 52 to 4036 when testing all time zones and dates between 1900 and 2020 (see test code snippet here). |
// create a Timestamp to get the unix timestamp (in UTC) | ||
val timestamp = new Timestamp(year - 1900, month - 1, day, hh, mm, ss, nano) | ||
guess = (millisLocal - timestamp.getTime).toInt | ||
calendar.set(year, month, day, hh, mm, ss) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
calendar.set(year, month - 1, day, hh, mm, ss)
and
calendar.clear(Calendar.MILLISECOND)
and
guess = (millisLocal - calendar.getTimeInMillis).toInt
makes the existing test cases work
val calendar = Calendar.getInstance(tz)
calendar.set(year, month - 1, day, hh, mm, ss)
calendar.clear(Calendar.MILLISECOND)
guess = (millisLocal - calendar.getTimeInMillis).toInt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
huh one would think that the tests with the timestamps close to DST looking up offsets would also need this if that was the intended constructor but ok let me give it a shot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no worries :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok so my (previous) comments don't make sense in response to the comment edit but I'll leave as is.
As for the newly updated comment, the approach does avoid the deprecation warning while keeping the existing functionality - but:
- Its going to throw away any subsecond information in the timestamp by including it in the difference from millis local to the time in miliseconds
- It doesn't seem to resolve the timezone offset correctly (it seems to give back non-DST times in DST transitions)*
That being said - it is a strict improvement over the previous code so we could go for this approach instead since it is only a stop-gap solution and its really about choosing which way we want this code to be more broken in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So DST is probably ok and my tests are just borked - so lets go with the new calendar approach and I'll fix it to preserve ms information.
Test build #63030 has finished for PR 14398 at commit
|
val calendar = Calendar.getInstance(tz) | ||
calendar.set(year, month - 1, day, hh, mm, ss) | ||
calendar.set(Calendar.MILLISECOND, ms) | ||
val date = calendar.getTime() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems unnecessary ... if the intend is to make sure Calendar.computeTime()
will be invoked then that will happen when calling calendar.getTimeInMillis()
on the next line as well
Test build #63028 has finished for PR 14398 at commit
|
Test build #63050 has finished for PR 14398 at commit
|
Just for good measure lets re-run the jenkins tests. |
Jenkins retest this please |
The approach LGTM |
Test build #63078 has finished for PR 14398 at commit
|
Yay - I still think we want to replace this entire file with a thin wrapper around specialized library at some point soon but incremental progress (avoid warning - don't depend on system timezone randomly near DST boundaries during timestamp conversion - which I think might be related to the issue Reynold added some logging for recently). |
LGTM |
Merged to master/2.0 |
…ove timezone handling ## What changes were proposed in this pull request? Removes the deprecated timestamp constructor and incidentally fixes the use which was using system timezone rather than the one specified when working near DST. This change also causes the roundtrip tests to fail since it now actually uses all the timezones near DST boundaries where it didn't before. Note: this is only a partial the solution, longer term we should follow up with https://issues.apache.org/jira/browse/SPARK-16788 to avoid this problem & simplify our timezone handling code. ## How was this patch tested? New tests for two timezones added so even if user timezone happens to coincided with one, the other tests should still fail. Important note: this (temporarily) disables the round trip tests until we can fix the issue more thoroughly. Author: Holden Karau <holden@us.ibm.com> Closes #14398 from holdenk/SPARK-16774-fix-use-of-deprecated-timestamp-constructor. (cherry picked from commit ab1e761) Signed-off-by: Sean Owen <sowen@cloudera.com>
What changes were proposed in this pull request?
Removes the deprecated timestamp constructor and incidentally fixes the use which was using system timezone rather than the one specified when working near DST.
This change also causes the roundtrip tests to fail since it now actually uses all the timezones near DST boundaries where it didn't before.
Note: this is only a partial the solution, longer term we should follow up with https://issues.apache.org/jira/browse/SPARK-16788 to avoid this problem & simplify our timezone handling code.
How was this patch tested?
New tests for two timezones added so even if user timezone happens to coincided with one, the other tests should still fail. Important note: this (temporarily) disables the round trip tests until we can fix the issue more thoroughly.