Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-16774][SQL] Fix use of deprecated timestamp constructor & improve timezone handling #14398

Conversation

holdenk
Copy link
Contributor

@holdenk holdenk commented Jul 29, 2016

What changes were proposed in this pull request?

Removes the deprecated timestamp constructor and incidentally fixes the use which was using system timezone rather than the one specified when working near DST.

This change also causes the roundtrip tests to fail since it now actually uses all the timezones near DST boundaries where it didn't before.

Note: this is only a partial the solution, longer term we should follow up with https://issues.apache.org/jira/browse/SPARK-16788 to avoid this problem & simplify our timezone handling code.

How was this patch tested?

New tests for two timezones added so even if user timezone happens to coincided with one, the other tests should still fail. Important note: this (temporarily) disables the round trip tests until we can fix the issue more thoroughly.

@holdenk
Copy link
Contributor Author

holdenk commented Jul 29, 2016

cc @ckadner & @davies - still think the longterm solution is a larger rewrite of this into a thin wrapper around something like JSR-310, but in the meantime this should be some progress.

@SparkQA
Copy link

SparkQA commented Jul 29, 2016

Test build #62997 has finished for PR 14398 at commit 0385d8b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -24,6 +24,8 @@ import javax.xml.bind.DatatypeConverter

import scala.annotation.tailrec

import sun.util.calendar.CalendarSystem
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be using java.util.Calendar right -- I presume any sun. classes are not for use by applications.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable - will also allow us to hide the julian calendar usage for old dates.

@holdenk
Copy link
Contributor Author

holdenk commented Jul 29, 2016

Also http://infiniteundo.com/post/25509354022/more-falsehoods-programmers-believe-about-time - I really think we should stop handling time as much as possibly and leave it to people who spend enough time thinking about it to get it mostly right. :p

@ckadner
Copy link
Member

ckadner commented Jul 29, 2016

@holdenk @davies
I agree that date, time, and timezone handling is best left to the experts and leveraging JSR-310 (SPARK-16788) in the long run would probably solve most of the problems that have been painstakingly fixed or attempted to be fixed in DateTimeUtils.scala.

The change proposed in this PR makes for cleaner code but it also regresses the fix for SPARK-15613. With this fix there are 4 incorrect day -> millis -> day conversions since 1970/01/01 just for Europe/Moscow:

  test("15613 Incorrect days to millis conversion Europe/Moscow") {
    DateTimeTestUtils.withDefaultTimeZone(TimeZone.getTimeZone("Europe/Moscow")) {
      val badDays = (0 to 20000).filterNot{ d =>
        d === millisToDays(daysToMillis(d))
      }
      val df = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss z")
      badDays.foreach { d =>
        println(s"day: ${d}, date: ${df.format(new Date(daysToMillis(d)))}")
      }
      assert(badDays.isEmpty)
    }
  }
TestFailedException: Vector(4108, 4473, 4838, 5204) was not empty
day: 4108, date: 1981-03-31 23:00:00 MSK
day: 4473, date: 1982-03-31 23:00:00 MSK
day: 4838, date: 1983-03-31 23:00:00 MSK
day: 5204, date: 1984-03-31 23:00:00 MSK

...and it increases the number of incorrectly converted days from 52 to 4036 when testing all time zones and dates between 1900 and 2020 (see test code snippet here).

// create a Timestamp to get the unix timestamp (in UTC)
val timestamp = new Timestamp(year - 1900, month - 1, day, hh, mm, ss, nano)
guess = (millisLocal - timestamp.getTime).toInt
calendar.set(year, month, day, hh, mm, ss)
Copy link
Member

@ckadner ckadner Jul 30, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

calendar.set(year, month - 1, day, hh, mm, ss) and
calendar.clear(Calendar.MILLISECOND) and
guess = (millisLocal - calendar.getTimeInMillis).toInt

makes the existing test cases work

        val calendar = Calendar.getInstance(tz)
        calendar.set(year, month - 1, day, hh, mm, ss)
        calendar.clear(Calendar.MILLISECOND)
        guess = (millisLocal - calendar.getTimeInMillis).toInt

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

huh one would think that the tests with the timestamps close to DST looking up offsets would also need this if that was the intended constructor but ok let me give it a shot.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no worries :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok so my (previous) comments don't make sense in response to the comment edit but I'll leave as is.

As for the newly updated comment, the approach does avoid the deprecation warning while keeping the existing functionality - but:

  1. Its going to throw away any subsecond information in the timestamp by including it in the difference from millis local to the time in miliseconds
  2. It doesn't seem to resolve the timezone offset correctly (it seems to give back non-DST times in DST transitions)*

That being said - it is a strict improvement over the previous code so we could go for this approach instead since it is only a stop-gap solution and its really about choosing which way we want this code to be more broken in.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So DST is probably ok and my tests are just borked - so lets go with the new calendar approach and I'll fix it to preserve ms information.

@holdenk holdenk changed the title [SPARK-16774][SQL] Fix use of deprecated timestamp constructor & improve timezone handling [SPARK-16774][SQL][WIP] Fix use of deprecated timestamp constructor & improve timezone handling Jul 30, 2016
@SparkQA
Copy link

SparkQA commented Jul 30, 2016

Test build #63030 has finished for PR 14398 at commit 1109275.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val calendar = Calendar.getInstance(tz)
calendar.set(year, month - 1, day, hh, mm, ss)
calendar.set(Calendar.MILLISECOND, ms)
val date = calendar.getTime()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems unnecessary ... if the intend is to make sure Calendar.computeTime() will be invoked then that will happen when calling calendar.getTimeInMillis() on the next line as well

@SparkQA
Copy link

SparkQA commented Jul 30, 2016

Test build #63028 has finished for PR 14398 at commit 8670bb6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 31, 2016

Test build #63050 has finished for PR 14398 at commit 671f7be.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@holdenk
Copy link
Contributor Author

holdenk commented Jul 31, 2016

Just for good measure lets re-run the jenkins tests.
Jenkins retest this please.

@srowen
Copy link
Member

srowen commented Aug 1, 2016

Jenkins retest this please

@srowen
Copy link
Member

srowen commented Aug 1, 2016

The approach LGTM

@SparkQA
Copy link

SparkQA commented Aug 1, 2016

Test build #63078 has finished for PR 14398 at commit 671f7be.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@holdenk
Copy link
Contributor Author

holdenk commented Aug 1, 2016

Yay - I still think we want to replace this entire file with a thin wrapper around specialized library at some point soon but incremental progress (avoid warning - don't depend on system timezone randomly near DST boundaries during timestamp conversion - which I think might be related to the issue Reynold added some logging for recently).

@holdenk holdenk changed the title [SPARK-16774][SQL][WIP] Fix use of deprecated timestamp constructor & improve timezone handling [SPARK-16774][SQL] Fix use of deprecated timestamp constructor & improve timezone handling Aug 1, 2016
@ckadner
Copy link
Member

ckadner commented Aug 1, 2016

LGTM

@srowen
Copy link
Member

srowen commented Aug 1, 2016

Merged to master/2.0

@asfgit asfgit closed this in ab1e761 Aug 1, 2016
asfgit pushed a commit that referenced this pull request Aug 1, 2016
…ove timezone handling

## What changes were proposed in this pull request?

Removes the deprecated timestamp constructor and incidentally fixes the use which was using system timezone rather than the one specified when working near DST.

This change also causes the roundtrip tests to fail since it now actually uses all the timezones near DST boundaries where it didn't before.

Note: this is only a partial the solution, longer term we should follow up with https://issues.apache.org/jira/browse/SPARK-16788 to avoid this problem & simplify our timezone handling code.

## How was this patch tested?

New tests for two timezones added so even if user timezone happens to coincided with one, the other tests should still fail. Important note: this (temporarily) disables the round trip tests until we can fix the issue more thoroughly.

Author: Holden Karau <holden@us.ibm.com>

Closes #14398 from holdenk/SPARK-16774-fix-use-of-deprecated-timestamp-constructor.

(cherry picked from commit ab1e761)
Signed-off-by: Sean Owen <sowen@cloudera.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants