Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-27405][SQL][TEST] Restrict the range of generated random timestamps #24316

Closed
wants to merge 1 commit into from

Conversation

MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented Apr 7, 2019

What changes were proposed in this pull request?

In the PR, I propose to restrict the range of random timestamp literals generated in LiteralGenerator. timestampLiteralGen. The generator creates instances of java.sql.Timestamp by passing milliseconds since epoch as Long type. Converting the milliseconds to microseconds can cause arithmetic overflow of Long type because Catalyst's Timestamp type stores microseconds since epoch in Long type internally as well. Proposed interval of random milliseconds is [Long.MinValue / 1000, Long.MaxValue / 1000].

For example, generated timestamp new java.sql.Timestamp(-3948373668011580000) causes Long overflow at the method:

  def fromJavaTimestamp(t: Timestamp): SQLTimestamp = {
  ...
      MILLISECONDS.toMicros(t.getTime()) + NANOSECONDS.toMicros(t.getNanos()) % NANOS_PER_MICROS
  ...
  }

because t.getTime() returns -3948373668011580000 which is multiplied by 1000 at MILLISECONDS.toMicros, and the result -3948373668011580000000 is less than Long.MinValue.

How was this patch tested?

By DateExpressionsSuite in the PR #24311

@MaxGekk
Copy link
Member Author

MaxGekk commented Apr 7, 2019

@dongjoon-hyun @srowen Please, take a look at the PR.

@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Apr 7, 2019

Thank you for splitting this, @MaxGekk !

BTW, this PR is not tested by both the existing tests and the next suite because this reduces the range of generated data.

By DateExpressionsSuite in the PR #24311

@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Apr 7, 2019

Could you add a concrete example in the PR description for the following, too?

Converting the milliseconds to microseconds can cause arithmetic overflow of Long type because Catalyst's Timestamp type stores microseconds since epoch in Long type internally as well.

You may find some instance from the following failure.

[info] ArithmeticExpressionSuite:
...
[info] - function least *** FAILED *** (690 milliseconds)
[info]   ArithmeticException was thrown during property evaluation.
[info]     Message: long overflow
[info]     Occurred when passed generated values (
[info]   
[info]     )

@SparkQA
Copy link

SparkQA commented Apr 8, 2019

Test build #104366 has finished for PR 24316 at commit 990b5b5.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk
Copy link
Member Author

MaxGekk commented Apr 8, 2019

jenkins, retest this, please

@MaxGekk
Copy link
Member Author

MaxGekk commented Apr 8, 2019

BTW, this PR is not tested ... because this reduces the range of generated data.
By DateExpressionsSuite in the PR #24311

I would agree the PR is not tested by an explicit test but what would you like to test? I think we can trust to ScalaCheck's Gen.choose that it returns longs in the range [Long.MinValue / 1000, Long.MaxValue / 1000]. There is theoretical foundation that generated milliseconds out of the range cause Long arithmetic overflow, so, the range cannot be wider. What should the test proof?

Copy link
Member

@srowen srowen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems OK to me. I think the point is that Arbitrary is going to pick values that aren't valid for this test? The range changes then as a fix, and we can see that tests pass before and after. What else?

@SparkQA
Copy link

SparkQA commented Apr 8, 2019

Test build #104384 has finished for PR 24316 at commit 990b5b5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Merged to master.
The updated document is correct and enough. What I want is to make it explicit in the PR description, @MaxGekk .

Thank you, @MaxGekk and @srowen .

@MaxGekk MaxGekk deleted the random-timestamps-gen branch September 18, 2019 15:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants