[SPARK-26218][SQL][Follow up] Fix the corner case when casting float to Integer. #27151
Conversation
Force-pushed from 3a12066 to 5e7b1ff.
OK to test
Two review threads on sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala (outdated, resolved).
Hm, but:
Those values do correctly cast to an int. The cast does lose precision, of course, but according to Scala/Java the result is correct, no?

Yes, the behavior is consistent with Scala/Java: it seems that if the value exceeds Int.MaxValue, casting it to Int yields Int.MaxValue.

Is this code path only used for ANSI mode? And is that defined by ANSI? I wouldn't expect the result of the cast to retain that much accuracy. You're not, in general, going to get the same int out when the int is large, after the round trip - right?
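For reference, the saturating behavior being discussed here can be reproduced directly on the JVM; a minimal Scala sketch (the object name is illustrative, not from the PR):

```scala
object SaturatingCast {
  def main(args: Array[String]): Unit = {
    // JVM float-to-int narrowing (the f2i instruction) saturates at the Int
    // bounds instead of wrapping or raising an error.
    println(2147483648f.toInt)               // 2147483647, clamped to Int.MaxValue
    println(Float.PositiveInfinity.toInt)    // 2147483647 as well
    println((-3e9f).toInt)                   // -2147483648, clamped to Int.MinValue
  }
}
```

This is exactly why the non-ANSI cast silently returns Int.MaxValue rather than failing.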
@@ -121,8 +121,8 @@ object FloatExactNumeric extends FloatIsFractional {
   private def overflowException(x: Float, dataType: String) =
     throw new ArithmeticException(s"Casting $x to $dataType causes overflow")

-  private val intUpperBound = Int.MaxValue.toFloat
-  private val intLowerBound = Int.MinValue.toFloat
+  private val intUpperBound = Int.MaxValue
Hm, I'm also not clear how this helps - won't it just promote to a float in the comparison below anyway?
Do we want floorDiv, etc, instead?
Math.floor returns double, so it's promoted to double.
It's true:

scala> Int.MaxValue.toDouble
res2: Double = 2.147483647E9

scala> Int.MaxValue.toFloat.toDouble
res3: Double = 2.147483648E9
-  private val intUpperBound = Int.MaxValue.toFloat
-  private val intLowerBound = Int.MinValue.toFloat
+  private val intUpperBound = Int.MaxValue
+  private val intLowerBound = Int.MinValue
   private val longUpperBound = Long.MaxValue.toFloat
   private val longLowerBound = Long.MinValue.toFloat
Seems we can remove toFloat here too? Also the toDouble in DoubleExactNumeric. They will be promoted anyway.
Agree. It looks more graceful with a consistent style.
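The promotion being discussed can be checked directly; a small sketch (names illustrative, assuming Scala's standard numeric widening):

```scala
object Promotion {
  def main(args: Array[String]): Unit = {
    // Comparing a Float with a Long widens the Long to Float, so an explicit
    // .toFloat on the bound changes nothing; removing it is purely stylistic.
    val x: Float = 1.0f
    println((x <= Long.MaxValue) == (x <= Long.MaxValue.toFloat))  // true
    // Math.floor returns Double, so that comparison happens in Double, where
    // Int.MaxValue is exactly representable; this is what makes the fix work.
    println(Math.floor(2147483648f.toDouble))                      // 2.147483648E9
  }
}
```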
Yes, it would only be invoked for ANSI mode.

ok to test
Test build #4990 has finished for PR 27151 at commit
retest this please

Test build #116483 has finished for PR 27151 at commit
Test build #116476 has finished for PR 27151 at commit
Thanks for your review. Maybe we need to close this PR. Thanks for your review again.

We are talking about SQL semantics, not the IEEE floating-point definition. For pgsql

I think the fix makes sense.

(OK, I'm into the idea, yes)

ok to test

@turboFei can you fix the conflicts?

If you look at the pgsql result, the new result is actually correct: https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/expected/float4.out#L299 Can you just re-generate the answer files? You can look at the doc of

I will do it later, thanks.
Force-pushed from 477408d to 5ee6ba1.
Test build #117851 has finished for PR 27151 at commit

Test build #117853 has finished for PR 27151 at commit
Unfortunately it conflicts. Can you fix it with a git rebase? Thanks!
Force-pushed from 34ddb1e to 0534eb6 (commits: "revert code style", "code style save").
Except for Int.MaxValue, the max integer i which satisfies i.toFloat.toInt == i is 2147483520, and it has been added to the UT (query-34/float4.sql). So I just removed query-35.
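That claim is easy to verify; a quick check (object name is illustrative, assuming standard IEEE-754 Float):

```scala
object RoundTrip {
  def main(args: Array[String]): Unit = {
    // Float has a 24-bit significand, so near 2^31 only multiples of 128 are
    // representable; 2147483520 == 2^31 - 128 is the largest such Int below
    // Int.MaxValue.
    val i = 2147483520
    println(i.toFloat.toInt == i)            // true: exact round trip
    println((i + 1).toFloat.toInt)           // 2147483520: 2147483521 rounds back down
    println(Int.MaxValue.toFloat.toInt)      // 2147483647: rounds up to 2^31, then saturates
  }
}
```

Int.MaxValue itself only survives the round trip because the saturating cast clamps 2^31 back down to it.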
@@ -106,7 +106,6 @@ SELECT smallint(float('32767.6'));
   SELECT smallint(float('-32768.4'));
   SELECT smallint(float('-32768.6'));
   SELECT int(float('2147483520'));
-  SELECT int(float('2147483647'));
These tests are copied from pgsql and we shouldn't change them. We just need to re-generate the answer file and keep the actual result as it is.
@@ -106,7 +106,7 @@ SELECT smallint(float('32767.6'));
   SELECT smallint(float('-32768.4'));
   SELECT smallint(float('-32768.6'));
   SELECT int(float('2147483520'));
-  SELECT int(float('2147483647'));
+  SELECT int(float('2147483392'));
Let's NOT change the pgsql tests. They are used to verify the difference between Spark and pgsql. We should respect the test result, whatever it is.
Test build #117885 has finished for PR 27151 at commit

Test build #117898 has finished for PR 27151 at commit

Test build #117911 has finished for PR 27151 at commit
The last 2 commits are empty and were just to trigger Jenkins. The last effective commit has passed tests. I'm merging it to master/3.0, thanks!
…to Integer

### What changes were proposed in this pull request?

When spark.sql.ansi.enabled is true, for the statement:

```
select cast(cast(2147483648 as Float) as Integer) -- result is 2147483647
```

Its result is 2147483647 and does not throw `ArithmeticException`. The root cause is that the code below does not work for some corner cases:

https://github.com/apache/spark/blob/94fc0e3235162afc6038019eed6ec546e3d1983e/sql/catalyst/src/main/scala/org/apache/spark/sql/types/numerics.scala#L129-L141

For example:

![image](https://user-images.githubusercontent.com/6757692/72074911-badfde80-332d-11ea-963e-2db0e43c33e8.png)

In this PR, I fix it by comparing Math.floor(x) with Int.MaxValue directly.

### Why are the changes needed?

Result corruption.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Added unit test.

Closes #27151 from turboFei/SPARK-26218-follow-up-int-overflow.

Authored-by: turbofei <fwang12@ebay.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 6d507b4)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Test build #117918 has finished for PR 27151 at commit
…ting float to Integer

### What changes were proposed in this pull request?

This is a followup of #27151. It fixes the same issue for the codegen path.

### Why are the changes needed?

Result corruption.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added unit test.

Closes #30585 from luluorta/SPARK-26218.

Authored-by: luluorta <luluorta@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?

When spark.sql.ansi.enabled is true, the statement `select cast(cast(2147483648 as Float) as Integer)` returns 2147483647 and does not throw `ArithmeticException`. The root cause is that the code below does not work for some corner cases:

spark/sql/catalyst/src/main/scala/org/apache/spark/sql/types/numerics.scala, lines 129 to 141 at 94fc0e3

For example: (spark-shell screenshot omitted)

In this PR, I fix it by comparing Math.floor(x) with Int.MaxValue directly.

Why are the changes needed?

Result corruption.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added unit test.
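The check the description refers to can be sketched in isolation. The following is a minimal illustration (a hypothetical helper modeled on FloatExactNumeric, not copied from Spark): the bounds comparison is done via Math.floor/Math.ceil, which return Double, so Int.MaxValue and Int.MinValue are compared exactly instead of being rounded to Float.

```scala
object AnsiFloatToInt {
  // Hypothetical helper: overflow-checked Float -> Int cast in the style of
  // Spark's ANSI mode. Math.floor(x) is a Double, so Int.MaxValue promotes to
  // an exact Double bound (2.147483647E9) rather than a rounded Float one.
  def toIntExact(x: Float): Int = {
    if (Math.floor(x) <= Int.MaxValue && Math.ceil(x) >= Int.MinValue) x.toInt
    else throw new ArithmeticException(s"Casting $x to int causes overflow")
  }

  def main(args: Array[String]): Unit = {
    println(toIntExact(2147483520f))        // in range: prints 2147483520
    try toIntExact(2147483648f)             // Int.MaxValue + 1: now throws
    catch { case e: ArithmeticException => println(e.getMessage) }
  }
}
```

Note the NaN case also falls through to the exception, since `Math.floor(Float.NaN) <= Int.MaxValue` is false.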