
[SPARK-32424][SQL][3.0] Fix silent data change for timestamp parsing if overflow happens #29267

Closed
wants to merge 1 commit into branch-3.0 from yaooqinn:SPARK-32424-30

Conversation

@yaooqinn (Member) commented Jul 28, 2020

This PR backports d315ebf to branch-3.0

What changes were proposed in this pull request?

When the `Seconds.toMicros` API is used to convert epoch seconds to microseconds, overflow is silently clamped to the `Long` bounds, as its Javadoc states:

```java
/**
 * Equivalent to
 * {@link #convert(long, TimeUnit) MICROSECONDS.convert(duration, this)}.
 * @param duration the duration
 * @return the converted duration,
 * or {@code Long.MIN_VALUE} if conversion would negatively
 * overflow, or {@code Long.MAX_VALUE} if it would positively overflow.
 */
```

This PR changes it to `Math.multiplyExact(epochSeconds, MICROS_PER_SECOND)`, which throws an `ArithmeticException` on overflow instead of saturating.
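For illustration, a minimal standalone Scala sketch of the behavioral difference (not the Spark source; the input value and the inlined `1000000L`, standing in for `MICROS_PER_SECOND`, are contrived for the example):

```scala
import java.util.concurrent.TimeUnit.SECONDS

// One more than the largest epoch-seconds value whose microsecond
// equivalent still fits in a signed 64-bit long.
val epochSeconds = Long.MaxValue / 1000000L + 1L

// Old behavior: TimeUnit silently saturates at Long.MaxValue, yielding a
// plausible-looking but wrong timestamp.
val saturated = SECONDS.toMicros(epochSeconds) // == Long.MaxValue, no error

// New behavior: the overflow is surfaced to the caller.
val exact = Math.multiplyExact(epochSeconds, 1000000L) // throws ArithmeticException
```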

Why are the changes needed?

Fix a silent data change between 3.x and 2.x: when timestamp parsing overflows, both branches return a wrong answer without raising any error, and the two answers differ:

```
~/Downloads/spark/spark-3.1.0-SNAPSHOT-bin-20200722 $ bin/spark-sql -S -e "select to_timestamp('300000', 'y');"
+294247-01-10 12:00:54.775807

~/Downloads/spark/spark-2.4.5-bin-hadoop2.7 $ bin/spark-sql -S -e "select to_timestamp('300000', 'y');"
284550-10-19 15:58:1010.448384
```
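A rough back-of-the-envelope check (approximate figures, not Spark internals) shows why year 300000 cannot be represented in 64-bit microseconds:

```scala
// Approximate arithmetic only; shows the order of magnitude of the overflow.
val secondsPerYear     = 31557600L                            // 365.25 days
val approxEpochSeconds = (300000L - 1970L) * secondsPerYear   // ~9.4e12 seconds
val microsNeeded       = BigInt(approxEpochSeconds) * 1000000 // ~9.4e18 microseconds

// Long.MaxValue is ~9.22e18, so the exact multiplication cannot fit.
assert(microsNeeded > BigInt(Long.MaxValue))
```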

Does this PR introduce any user-facing change?

Yes, we will raise an `ArithmeticException` instead of giving a wrong answer when overflow occurs.

How was this patch tested?

Added a unit test.
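As a sketch of the coverage involved (hypothetical, not the patch's actual test code), both overflow directions described in the Javadoc above can be asserted:

```scala
import java.util.concurrent.TimeUnit.SECONDS

// Hypothetical standalone checks, not the patch's actual unit test.
def throwsOnOverflow(seconds: Long): Boolean =
  try { Math.multiplyExact(seconds, 1000000L); false }
  catch { case _: ArithmeticException => true }

// Positive overflow: toMicros clamps to Long.MaxValue, multiplyExact throws.
assert(SECONDS.toMicros(Long.MaxValue) == Long.MaxValue)
assert(throwsOnOverflow(Long.MaxValue))

// Negative overflow: toMicros clamps to Long.MinValue, multiplyExact throws.
assert(SECONDS.toMicros(Long.MinValue) == Long.MinValue)
assert(throwsOnOverflow(Long.MinValue))
```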

```
@@ -146,7 +146,3 @@ select from_json('{"time":"26/October/2015"}', 'time Timestamp', map('timestampF
select from_json('{"date":"26/October/2015"}', 'date Date', map('dateFormat', 'dd/MMMMM/yyyy'));
select from_csv('26/October/2015', 'time Timestamp', map('timestampFormat', 'dd/MMMMM/yyyy'));
select from_csv('26/October/2015', 'date Date', map('dateFormat', 'dd/MMMMM/yyyy'));

select from_unixtime(1, 'yyyyyyyyyyy-MM-dd');
```
@yaooqinn (Member, Author) commented Jul 28, 2020

I checked these tests; they were added to cover patterns with more than ten 'y' letters. It's safe to remove them now because such patterns already fail starting from seven 'y' letters, and we already cover these cases in both datetime-parsing.sql and datetime-formatting.sql.

@yaooqinn (Member, Author)

cc @cloud-fan

@SparkQA commented Jul 28, 2020

Test build #126686 has finished for PR 29267 at commit b612f88.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA commented Jul 28, 2020

Test build #126690 has finished for PR 29267 at commit 4f6059c.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yaooqinn (Member, Author)

retest this please

@cloud-fan (Contributor)

GitHub Actions passes, thanks, merging to 3.0!

cloud-fan pushed a commit that referenced this pull request Jul 28, 2020
[SPARK-32424][SQL][3.0] Fix silent data change for timestamp parsing if overflow happens

Closes #29267 from yaooqinn/SPARK-32424-30.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@cloud-fan closed this Jul 28, 2020
@SparkQA commented Jul 28, 2020

Test build #126703 has finished for PR 29267 at commit 4f6059c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
