[SPARK-31710][SQL]Add compatibility flag to cast long to timestamp #28570

Closed

Conversation

GuoPhilipse
Member

What changes were proposed in this pull request?
In Hive, a long value is interpreted as milliseconds when cast to a timestamp, while in Spark it is interpreted as seconds. We have been seeing incorrect data while migrating Hive SQL to Spark SQL; a compatibility flag lets us fix this.
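
For illustration, here is the behavioral difference, assuming a spark-shell session where spark is the active SparkSession and the session time zone is UTC (the values and output below are an example, not code from this PR):

// Spark interprets the long as seconds since the epoch:
//   1500000000 s  -> 2017-07-14 02:40:00 UTC
// Hive interprets the same value as milliseconds since the epoch:
//   1500000000 ms -> 1970-01-18 08:40:00 UTC
spark.sql("SELECT CAST(1500000000 AS TIMESTAMP) AS ts").show(truncate = false)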

Why are the changes needed?
We have many SQL queries running in production, so we need a compatibility flag that lets them migrate smoothly while not changing the default behavior for existing Spark users.

Does this PR introduce any user-facing change?
Yes, but only for users who opt in: to get the Hive-compatible behavior they must set the new parameter; otherwise no action is needed and the behavior is unchanged.
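
For illustration, opting in would look roughly like the following, assuming a SparkSession named spark; the string key of the new configuration entry is not shown in this excerpt, so this sketch refers to it through its SQLConf field:

import org.apache.spark.sql.internal.SQLConf

// Hypothetical opt-in sketch: interpret long values as milliseconds
// (Hive-compatible) instead of seconds (Spark default) when casting to timestamp.
spark.conf.set(SQLConf.LONG_TIMESTAMP_CONVERSION_IN_SECONDS.key, "false")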

How was this patch tested?
A unit test was added.

…710-3

[SPARK-31710][SQL]Add compatibility flag to cast long to timestamp
…710-2

[SPARK-31710][SQL]Add compatibility flag to cast long to timestamp
…710-1

[SPARK-31710][SQL]Add compatibility flag to cast long to timestamp
@GuoPhilipse changed the title from Guo philipse patch spark 31710 to [SPARK-31710][SQL]Add compatibility flag to cast long to timestamp on May 18, 2020
@AmplabJenkins

Can one of the admins verify this patch?

private[this] def longToTimestamp(t: Long): Long = SECONDS.toMicros(t)
// SPARK-31710: converting seconds to us, add compatibility flag
private[this] def longToTimestamp(t: Long): Long = {
  if (SQLConf.get.getConf(SQLConf.LONG_TIMESTAMP_CONVERSION_IN_SECONDS)) t * 1000000L
Member


I suspect that checking the flag for every value causes a significant performance drop. Could you show with a benchmark that performance is not sacrificed by your changes?
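
For illustration only, one way to avoid the per-value lookup would be to read the flag once, outside the per-row path; this is a hypothetical sketch, not code from this PR:

// Hypothetical sketch: capture the flag once instead of reading SQLConf per value.
private[this] lazy val longAsSeconds: Boolean =
  SQLConf.get.getConf(SQLConf.LONG_TIMESTAMP_CONVERSION_IN_SECONDS)

private[this] def longToTimestamp(t: Long): Long =
  if (longAsSeconds) t * 1000000L else t * 1000L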

Member Author


Hi MaxGekk,
I will add a new benchmark later.

// SPARK-31710: converting seconds to us, add compatibility flag
private[this] def longToTimestamp(t: Long): Long = {
  if (SQLConf.get.getConf(SQLConf.LONG_TIMESTAMP_CONVERSION_IN_SECONDS)) t * 1000000L
  else t * 1000L
Member


There are constants in DateTimeConstants. Please use them.
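
For reference, a sketch of what that might look like, assuming the MICROS_PER_SECOND and MICROS_PER_MILLIS constants from org.apache.spark.sql.catalyst.util.DateTimeConstants (a hypothetical rewrite, not code from this PR):

import org.apache.spark.sql.catalyst.util.DateTimeConstants.{MICROS_PER_MILLIS, MICROS_PER_SECOND}

// Sketch only: same logic, with named constants instead of the 1000000L / 1000L literals.
private[this] def longToTimestamp(t: Long): Long =
  if (SQLConf.get.getConf(SQLConf.LONG_TIMESTAMP_CONVERSION_IN_SECONDS)) {
    t * MICROS_PER_SECOND  // seconds -> microseconds (Spark default)
  } else {
    t * MICROS_PER_MILLIS  // milliseconds -> microseconds (Hive-compatible)
  }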

Member Author


Nice suggestion, I will recheck my code.

val negativeTs = Timestamp.valueOf("1900-05-05 18:34:56.1")
assert(negativeTs.getTime < 0)
val expectedSecs = Math.floorDiv(negativeTs.getTime, MILLIS_PER_SECOND)
checkExceptionInExpression[ArithmeticException](cast(negativeTs, ByteType), errMsg("byte"))
Member


No errors while casting to byte anymore in ANSI mode?

Member Author


Sorry, it was a mistake; I will revert this commit.

@@ -1353,17 +1353,33 @@ class AnsiCastSuite extends CastSuiteBase {
cast("abc.com", dataType), "invalid input")
}
}

test("cast a timestamp before the epoch 1970-01-01 00:00:00Z") {
Member


Why did you delete the test?

}
}
}
}
Member


new line

test("SPARK-31710:Add compatibility flag to cast long to timestamp") {
withSQLConf(
SQLConf.LONG_TIMESTAMP_CONVERSION_IN_SECONDS.key -> "false") {
for (tz <- ALL_TIMEZONES) {
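
The diff excerpt cuts off at the loop header. Purely for illustration, and not the PR's actual test body, an assertion under this flag might check that a long is read as milliseconds, i.e. the cast result holds the input times 1000 microseconds:

import org.apache.spark.sql.catalyst.expressions.{Cast, Literal}
import org.apache.spark.sql.types.TimestampType

// Hypothetical assertion: with the flag set to "false", the long is treated
// as milliseconds, so the internal microsecond value is the input * 1000.
val millis = 1234567890L
assert(Cast(Literal(millis), TimestampType).eval() == millis * 1000L)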
Member


What is the reason to test all time zones?

@bart-samwel

There have now been 4 PRs with the same name and description. @GuoPhilipse is there something wrong with your workflow? I'm losing track of all the different PRs, and the comments are spread out across them.
