
[SPARK-32424][SQL][3.0] Fix silent data change for timestamp parsing if overflow happens #29267

Closed
wants to merge 1 commit into branch-3.0 from yaooqinn:SPARK-32424-30

Conversation

@yaooqinn (Member) commented Jul 28, 2020

This PR backports d315ebf to branch-3.0

What changes were proposed in this pull request?

When the `Seconds.toMicros` API is used to convert epoch seconds to microseconds, overflow is silently clamped to the `Long` bounds, as its Javadoc states:

```java
/**
 * Equivalent to
 * {@link #convert(long, TimeUnit) MICROSECONDS.convert(duration, this)}.
 * @param duration the duration
 * @return the converted duration,
 * or {@code Long.MIN_VALUE} if conversion would negatively
 * overflow, or {@code Long.MAX_VALUE} if it would positively overflow.
 */
```

This PR changes it to `Math.multiplyExact(epochSeconds, MICROS_PER_SECOND)`, which throws an `ArithmeticException` on overflow instead of saturating.
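For illustration, a minimal standalone Scala sketch of the behavioral difference (not the Spark source; the input value and the inlined `1000000L`, standing in for `MICROS_PER_SECOND`, are contrived for the example):

```scala
import java.util.concurrent.TimeUnit.SECONDS

// One more than the largest epoch-seconds value whose microsecond
// equivalent still fits in a signed 64-bit long.
val epochSeconds = Long.MaxValue / 1000000L + 1L

// Old behavior: TimeUnit silently saturates at Long.MaxValue, yielding a
// plausible-looking but wrong timestamp.
val saturated = SECONDS.toMicros(epochSeconds) // == Long.MaxValue, no error

// New behavior: the overflow is surfaced to the caller.
val exact = Math.multiplyExact(epochSeconds, 1000000L) // throws ArithmeticException
```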

Why are the changes needed?

Fix a silent data change between 3.x and 2.x: when timestamp parsing overflows, both branches return a wrong answer without raising any error, and the two answers differ:

```
~/Downloads/spark/spark-3.1.0-SNAPSHOT-bin-20200722 $ bin/spark-sql -S -e "select to_timestamp('300000', 'y');"
+294247-01-10 12:00:54.775807

~/Downloads/spark/spark-2.4.5-bin-hadoop2.7 $ bin/spark-sql -S -e "select to_timestamp('300000', 'y');"
284550-10-19 15:58:1010.448384
```
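A rough back-of-the-envelope check (approximate figures, not Spark internals) shows why year 300000 cannot be represented in 64-bit microseconds:

```scala
// Approximate arithmetic only; shows the order of magnitude of the overflow.
val secondsPerYear     = 31557600L                            // 365.25 days
val approxEpochSeconds = (300000L - 1970L) * secondsPerYear   // ~9.4e12 seconds
val microsNeeded       = BigInt(approxEpochSeconds) * 1000000 // ~9.4e18 microseconds

// Long.MaxValue is ~9.22e18, so the exact multiplication cannot fit.
assert(microsNeeded > BigInt(Long.MaxValue))
```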

Does this PR introduce any user-facing change?

Yes, we will raise an `ArithmeticException` instead of giving a wrong answer when overflow occurs.

How was this patch tested?

Added a unit test.
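As a sketch of the coverage involved (hypothetical, not the patch's actual test code), both overflow directions described in the Javadoc above can be asserted:

```scala
import java.util.concurrent.TimeUnit.SECONDS

// Hypothetical standalone checks, not the patch's actual unit test.
def throwsOnOverflow(seconds: Long): Boolean =
  try { Math.multiplyExact(seconds, 1000000L); false }
  catch { case _: ArithmeticException => true }

// Positive overflow: toMicros clamps to Long.MaxValue, multiplyExact throws.
assert(SECONDS.toMicros(Long.MaxValue) == Long.MaxValue)
assert(throwsOnOverflow(Long.MaxValue))

// Negative overflow: toMicros clamps to Long.MinValue, multiplyExact throws.
assert(SECONDS.toMicros(Long.MinValue) == Long.MinValue)
assert(throwsOnOverflow(Long.MinValue))
```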

```
@@ -146,7 +146,3 @@ select from_json('{"time":"26/October/2015"}', 'time Timestamp', map('timestampF
select from_json('{"date":"26/October/2015"}', 'date Date', map('dateFormat', 'dd/MMMMM/yyyy'));
select from_csv('26/October/2015', 'time Timestamp', map('timestampFormat', 'dd/MMMMM/yyyy'));
select from_csv('26/October/2015', 'date Date', map('dateFormat', 'dd/MMMMM/yyyy'));

select from_unixtime(1, 'yyyyyyyyyyy-MM-dd');
```
@yaooqinn (Member, Author) commented Jul 28, 2020

I checked these tests; they were added to cover patterns with more than ten 'y' letters. It's safe to remove them now because such patterns already fail starting from seven 'y' letters, and we already cover these cases in both datetime-parsing.sql and datetime-formatting.sql.

@yaooqinn (Member, Author)

cc @cloud-fan

@SparkQA commented Jul 28, 2020

Test build #126686 has finished for PR 29267 at commit b612f88.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA commented Jul 28, 2020

Test build #126690 has finished for PR 29267 at commit 4f6059c.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yaooqinn (Member, Author)

retest this please

@cloud-fan (Contributor)

GitHub Actions passes, thanks, merging to 3.0!

cloud-fan pushed a commit that referenced this pull request Jul 28, 2020
[SPARK-32424][SQL][3.0] Fix silent data change for timestamp parsing if overflow happens

Closes #29267 from yaooqinn/SPARK-32424-30.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@cloud-fan closed this Jul 28, 2020
@SparkQA commented Jul 28, 2020

Test build #126703 has finished for PR 29267 at commit 4f6059c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
