[SPARK-27401][SQL] Refactoring conversion of Timestamp to/from java.sql.Timestamp #24311
Conversation
Test build #104344 has finished for PR 24311 at commit
Test build #104345 has finished for PR 24311 at commit
An ORC-related test which generates random dates failed -
@dongjoon-hyun Am I right? It is strange that only this ORC test failed.
Oh really? I mean, this looks fine pending tests. If they're transient failures, fine. If they're not, obviously they need to be fixed. Here I assume they aren't significant failures, just tests that need to be adjusted.
Oh, @MaxGekk. According to the last reverting commit, you are not aiming
Before the last revert, I got the following result, @MaxGekk. If there was an issue, it might be an old Hive 1.2.1 issue which is fixed in Apache ORC. If needed, you can file a JIRA issue. For an old Hive 1.2.1 issue, I can dig into it for you, but it might be difficult to fix because it seems to lie outside both Apache Spark and Apache ORC.
Test build #104347 has finished for PR 24311 at commit
lazy val timestampLiteralGen: Gen[Literal] = {
  for { millis <- Gen.choose(yearToMillis(-9999), yearToMillis(10000)) }
    yield Literal.create(new Timestamp(millis), TimestampType)
}
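yearToMillis is a helper defined elsewhere in the test suite and isn't shown in this diff; a minimal sketch of what it plausibly does, assuming it returns milliseconds since the epoch at the start of the given year in UTC:

import java.time.{LocalDate, ZoneOffset}

// Hypothetical reconstruction of the test helper: milliseconds since
// 1970-01-01T00:00:00Z for January 1 of the given proleptic year.
def yearToMillis(year: Int): Long =
  LocalDate.of(year, 1, 1).atStartOfDay(ZoneOffset.UTC).toInstant.toEpochMilli

With such a range, Gen.choose(yearToMillis(-9999), yearToMillis(10000)) keeps every generated millisecond value safely convertible to microseconds within a Long.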
@MaxGekk. In general, we don't change the test cases during refactoring, in order to preserve the test coverage. I'm wondering if this is a required change due to a known limitation of the new functions. Although this looks minor, I'd make it a separate PR.
I changed the random generator for timestamps and limited the max/min values for milliseconds since the epoch, because the long random generator can produce milliseconds that cause a Long overflow in the conversion of milliseconds to microseconds. Internally, as you know, we store microseconds since the epoch in TimestampType. For example, the old (current) generator can create an instance like java.sql.Timestamp(-3948373668011580000, 570000000). The new function fromJavaTimestamp calls instantToMicros, and inside it we use multiplyExact, which can detect Long overflow on multiplication:
def instantToMicros(instant: Instant): Long = {
  // -3948373668011580 * 1000000 exceeds the Long range, so multiplyExact
  // throws ArithmeticException instead of silently wrapping
  val us = Math.multiplyExact(-3948373668011580L, 1000000L)
  ...
}
The previous (current) implementation doesn't detect the overflow at all.
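For illustration, a self-contained sketch (plain JDK arithmetic, not Spark code; the seconds value is taken from the example above) contrasting unchecked multiplication, which wraps silently, with Math.multiplyExact, which fails fast:

object OverflowDemo extends App {
  // Epoch seconds derived from the problematic java.sql.Timestamp above.
  val seconds = -3948373668011580L

  // Unchecked multiplication silently wraps around the Long range:
  println(seconds * 1000000L) // prints a meaningless wrapped value, no error

  // multiplyExact detects the overflow and throws ArithmeticException:
  try Math.multiplyExact(seconds, 1000000L)
  catch { case e: ArithmeticException => println(s"detected: ${e.getMessage}") }
}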
The Long overflow exception caused failures in the call to checkConsistencyBetweenInterpretedAndCodegen (line 202 in commit 1d20d13).
That's the exact reason you should not put that in this PR, @MaxGekk. Please do that improvement in another PR first, and do the refactoring PR afterwards.
@dongjoon-hyun ok. I will open another PR. Thanks.
Yeah, it seems there could be a calendar incompatibility issue somewhere inside Hive + ORC which could cause the difference. In any case, I think my date-related (not timestamp-related) changes are potentially more expensive compared to the current implementation because of the conversion of
Test build #104360 has finished for PR 24311 at commit
Test build #104365 has finished for PR 24311 at commit
Retest this please.
Test build #104399 has finished for PR 24311 at commit
To/from java.sql.Timestamp:              Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
From java.sql.Timestamp                            309            316           8        16.2          61.9       1.0X
Collect longs                                     1410           2747        1158         3.5         282.0       0.2X
Oh, could you rerun this in a more stable environment? Stdev looks too high.
Test build #104421 has finished for PR 24311 at commit
jenkins, retest this, please
Test build #104425 has finished for PR 24311 at commit
jenkins, retest this, please
Test build #104431 has finished for PR 24311 at commit
I changed only
jenkins, retest this, please
Test build #104449 has finished for PR 24311 at commit
What changes were proposed in this pull request?
In the PR, I propose a simpler implementation of toJavaTimestamp()/fromJavaTimestamp() by reusing existing functions of DateTimeUtils (see the sketch after this list). This will allow to:
- simplify toJavaTimestamp(), and handle negative inputs properly;
- detect Long overflow in the conversion of milliseconds (java.sql.Timestamp) to microseconds (Catalyst's Timestamp).
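As a rough illustration of that reuse, a minimal self-contained sketch; the MICROS_PER_SECOND constant and the simplified instantToMicros body are assumptions mirroring the checked conversion discussed in the review, not the verbatim DateTimeUtils code:

import java.sql.Timestamp
import java.time.Instant

object ConversionSketch {
  val MICROS_PER_SECOND = 1000000L // assumed constant

  // Simplified stand-in for DateTimeUtils.instantToMicros: checked arithmetic
  // surfaces Long overflow as ArithmeticException instead of wrapping.
  def instantToMicros(instant: Instant): Long = {
    val us = Math.multiplyExact(instant.getEpochSecond, MICROS_PER_SECOND)
    Math.addExact(us, instant.getNano / 1000L)
  }

  // fromJavaTimestamp then reduces to a one-liner over java.time:
  def fromJavaTimestamp(t: Timestamp): Long = instantToMicros(t.toInstant)
}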
How was this patch tested?
By the existing test suites DateTimeUtilsSuite, DateFunctionsSuite, DateExpressionsSuite and CastSuite, and by a new benchmark for export/import of timestamps added to DateTimeBenchmark:

Before:
After:
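The Before/After benchmark outputs are not captured in this excerpt. For reference, a minimal sketch, assuming a local SparkSession and an arbitrary row count, of how the "From java.sql.Timestamp" path could be exercised (not the actual DateTimeBenchmark code, which uses Spark's benchmark harness):

import java.sql.Timestamp
import org.apache.spark.sql.SparkSession

object TimestampImportDemo extends App {
  val spark = SparkSession.builder().master("local[*]").appName("ts-import").getOrCreate()
  import spark.implicits._

  val rowsNum = 5000000L // assumed row count

  // Building java.sql.Timestamp values and encoding them back into Catalyst's
  // microsecond longs exercises fromJavaTimestamp for every row.
  spark.time {
    spark.range(rowsNum).map(ms => new Timestamp(ms)).foreach(_ => ())
  }

  spark.stop()
}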