
[SPARK-26218][SQL][Follow up] Fix the corner case when casting float to Integer. #27151

Closed

Conversation

turboFei
Member

@turboFei turboFei commented Jan 9, 2020

What changes were proposed in this pull request?

When spark.sql.ansi.enabled is true, for the statement:

```
select cast(cast(2147483648 as Float) as Integer) // result is 2147483647
```

Its result is 2147483647 and does not throw ArithmeticException.

The root cause is that the code below does not work for some corner cases:

```
override def toInt(x: Float): Int = {
  // When casting floating values to integral types, Spark uses the method `Numeric.toInt`
  // or `Numeric.toLong` directly. For positive floating values, it is equivalent to `Math.floor`;
  // for negative floating values, it is equivalent to `Math.ceil`.
  // So, we can use the condition `Math.floor(x) <= upperBound && Math.ceil(x) >= lowerBound`
  // to check if the floating value x is in the range of an integral type after rounding.
  // This condition applies to converting a Float/Double value to any integral type.
  if (Math.floor(x) <= intUpperBound && Math.ceil(x) >= intLowerBound) {
    x.toInt
  } else {
    overflowException(x, "int")
  }
}
```

For example:

[screenshot]

In this PR, I fix it by comparing Math.floor(x) with Int.MaxValue directly.
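To make the corner case concrete, here is a minimal standalone Java sketch (not the actual Spark source; the class and method names are mine) contrasting the buggy float-typed bound with the fixed int-typed bound. Java and Scala share the same IEEE-754 numerics here.

```java
public class FloatToIntBound {
    // Buggy check: Integer.MAX_VALUE is first rounded to a float. The nearest
    // float to 2147483647 is 2147483648.0f (it rounds UP), so x = 2147483648f
    // satisfies Math.floor(x) <= upper and slips through.
    static boolean fitsInIntBuggy(float x) {
        float upper = (float) Integer.MAX_VALUE; // 2.14748365E9f, i.e. 2147483648
        float lower = (float) Integer.MIN_VALUE;
        return Math.floor(x) <= upper && Math.ceil(x) >= lower;
    }

    // Fixed check: Integer.MAX_VALUE is promoted to double exactly
    // (2147483647.0), so 2147483648f is correctly rejected as overflow.
    static boolean fitsInIntFixed(float x) {
        return Math.floor(x) <= Integer.MAX_VALUE && Math.ceil(x) >= Integer.MIN_VALUE;
    }

    public static void main(String[] args) {
        float x = 2147483648f;                  // cast(2147483648 as Float)
        System.out.println(fitsInIntBuggy(x));  // true  -> no exception raised
        System.out.println(fitsInIntFixed(x));  // false -> overflow detected
        System.out.println((int) x);            // 2147483647: narrowing saturates
    }
}
```

Note that the lower bound was never the problem: Int.MinValue is -2^31, which is exactly representable as a float.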

Why are the changes needed?

Incorrect results.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added Unit test.

@turboFei
Member Author

turboFei commented Jan 9, 2020

For long:
[screenshot]

@turboFei
Member Author

turboFei commented Jan 9, 2020

For pgsql:
[screenshot]
For teradata:
[screenshot]

@turboFei
Member Author

turboFei commented Jan 9, 2020

@turboFei turboFei changed the title SPARK-26218: [Follow up] throw exception on overflow for integers [SPARK-26218][Follow up] Fix the conner case when cast float to Integer. Jan 9, 2020
@turboFei turboFei changed the title [SPARK-26218][Follow up] Fix the conner case when cast float to Integer. [SPARK-26218][Follow up] Fix the conner case when casting float to Integer. Jan 9, 2020
@turboFei turboFei changed the title [SPARK-26218][Follow up] Fix the conner case when casting float to Integer. [SPARK-26218][Follow up] Fix the corner case when casting float to Integer. Jan 9, 2020
@turboFei turboFei force-pushed the SPARK-26218-follow-up-int-overflow branch from 3a12066 to 5e7b1ff Compare January 9, 2020 14:37
@cloud-fan
Contributor

OK to test

@srowen
Member

srowen commented Jan 9, 2020

Hm, but:

```
scala> (Int.MaxValue.toFloat+1).toInt
res13: Int = 2147483647

scala> (Int.MaxValue.toFloat+1).toInt == Int.MaxValue
res14: Boolean = true
```

Those values do correctly cast to an int. The cast does lose precision of course, but according to Scala/Java, the result is correct, no?

@turboFei
Member Author

turboFei commented Jan 9, 2020

> Hm, but:
>
> ```
> scala> (Int.MaxValue.toFloat+1).toInt
> res13: Int = 2147483647
>
> scala> (Int.MaxValue.toFloat+1).toInt == Int.MaxValue
> res14: Boolean = true
> ```
>
> Those values do correctly cast to an int. The cast does lose precision of course, but according to Scala/Java, the result is correct, no?

Yes, the behavior is consistent with Scala/Java: when the value exceeds Int.MaxValue, casting it to Int yields Int.MaxValue.
But when spark.sql.ansi.enabled is true, we should throw an exception to stay consistent with the ANSI standard.
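That saturating behavior can be verified in plain Java, which follows the same JLS narrowing rules as Scala (the class name here is mine, for illustration only):

```java
public class SaturationDemo {
    public static void main(String[] args) {
        // One above Int.MaxValue, as a float.
        float overMax = (float) Integer.MAX_VALUE + 1;
        // Java's float -> int narrowing conversion saturates at the bounds
        // instead of throwing, which is exactly what ANSI mode must not do.
        System.out.println((int) overMax);              // 2147483647

        // Large ints generally do not survive the int -> float -> int
        // round trip either: floats near 2^31 are 128 apart.
        int big = Integer.MAX_VALUE - 100;              // 2147483547
        System.out.println((int) (float) big == big);   // false
    }
}
```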

@srowen
Member

srowen commented Jan 9, 2020

Is this code path only used for ANSI mode? and is that defined by ANSI? I wouldn't expect the result of the cast to retain that much accuracy. You're not in general going to get the same int out when the int is large, after the round-trip - right?

```diff
@@ -121,8 +121,8 @@ object FloatExactNumeric extends FloatIsFractional {
   private def overflowException(x: Float, dataType: String) =
     throw new ArithmeticException(s"Casting $x to $dataType causes overflow")

-  private val intUpperBound = Int.MaxValue.toFloat
-  private val intLowerBound = Int.MinValue.toFloat
+  private val intUpperBound = Int.MaxValue
```
Member
Hm, I'm also not clear how this helps - won't it just promote to a float in the comparison below anyway?
Do we want floorDiv, etc, instead?

Contributor
Math.floor returns double, so it's promoted to double

Member Author

@turboFei turboFei Jan 9, 2020
As mentioned by cloud-fan, it seems that casting an Int to Float and then to Double is not the same as casting it to Double directly.
[screenshot]

Contributor
it's true

```
scala> Int.MaxValue.toDouble
res2: Double = 2.147483647E9

scala> Int.MaxValue.toFloat.toDouble
res3: Double = 2.147483648E9
```

```diff
-  private val intUpperBound = Int.MaxValue.toFloat
-  private val intLowerBound = Int.MinValue.toFloat
+  private val intUpperBound = Int.MaxValue
+  private val intLowerBound = Int.MinValue
   private val longUpperBound = Long.MaxValue.toFloat
   private val longLowerBound = Long.MinValue.toFloat
```
Contributor
seems we can remove toFloat here too? also the toDouble in DoubleExactNumeric. They will be promoted anyway.

Member Author
Agreed. It looks cleaner with a consistent style.

@turboFei
Member Author

turboFei commented Jan 9, 2020

> Is this code path only used for ANSI mode? and is that defined by ANSI? I wouldn't expect the result of the cast to retain that much accuracy. You're not in general going to get the same int out when the int is large, after the round-trip - right?

Yes, this code path is only invoked in ANSI mode.
ANSI requires throwing an exception on overflow.
I have attached the corresponding behaviors of pgsql and teradata above.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-26218][Follow up] Fix the corner case when casting float to Integer. [SPARK-26218][SQL][Follow up] Fix the corner case when casting float to Integer. Jan 9, 2020
@cloud-fan
Contributor

ok to test

@SparkQA

SparkQA commented Jan 10, 2020

Test build #4990 has finished for PR 27151 at commit 477408d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

retest this please

@SparkQA

SparkQA commented Jan 10, 2020

Test build #116483 has finished for PR 27151 at commit 477408d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 10, 2020

Test build #116476 has finished for PR 27151 at commit 477408d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@turboFei
Member Author

turboFei commented Jan 10, 2020

Thanks for your review. Maybe we should close this PR.
Float is not an exact type: its precision is only about 7-8 significant decimal digits (a 24-bit significand), which means that 21234567890f == 21234567800f.
And the Math.floor(floatValue) operation also widens the float value to a double, so it seems reasonable to compare Math.floor(floatValue) with Int.MaxValue.toFloat.

Thanks for your review again.
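A quick sanity check of the precision claim above, in plain Java (illustrative only; `Math.ulp` reports the spacing between a float and the next larger one):

```java
public class FloatPrecision {
    public static void main(String[] args) {
        // Both literals round to the same nearest float, so they compare equal.
        System.out.println(21234567890f == 21234567800f); // true

        // Near 2^31 adjacent floats are already 128 apart, so the last few
        // decimal digits of a large int cannot be represented.
        System.out.println(Math.ulp(2147483520f));         // 128.0
    }
}
```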

@cloud-fan
Contributor

We are talking about SQL semantics, not the IEEE floating-point definition.

For pgsql

```
cloud0fan=# SELECT CAST(CAST(2147483648 as FLOAT) as Int);
ERROR:  integer out of range
```

I think the fix makes sense.

@srowen
Member

srowen commented Jan 12, 2020

(OK I'm into the idea, yes)

@turboFei turboFei closed this Jan 12, 2020
@turboFei turboFei reopened this Jan 12, 2020
@cloud-fan
Contributor

ok to test

@cloud-fan
Contributor

@turboFei can you fix the conflicts?

@cloud-fan
Contributor

If you look at the pgsql result, the new result is actually correct: https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/expected/float4.out#L299

Can you just re-generate the answer files? You can look at the doc of SQLQueryTestSuite to see how to do it.

@turboFei
Member Author

turboFei commented Feb 4, 2020

I will do it later, thanks.

@turboFei turboFei force-pushed the SPARK-26218-follow-up-int-overflow branch from 477408d to 5ee6ba1 Compare February 4, 2020 17:09
@SparkQA

SparkQA commented Feb 4, 2020

Test build #117851 has finished for PR 27151 at commit 5ee6ba1.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 4, 2020

Test build #117853 has finished for PR 27151 at commit 34ddb1e.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

unfortunately it conflicts. Can you fix it by git rebase? thanks!

@turboFei turboFei force-pushed the SPARK-26218-follow-up-int-overflow branch from 34ddb1e to 0534eb6 Compare February 5, 2020 03:58
@turboFei
Member Author

turboFei commented Feb 5, 2020

Apart from Int.MaxValue itself, the largest Integer i that satisfies i.toFloat.toInt == i is 2147483520.

It has already been added to the UT (query-34/float4.sql), so I just removed query-35.
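The claim above can be checked directly in plain Java (same float semantics as Scala; the class name is mine): 2147483520 is Integer.MAX_VALUE - 127, and floats near 2^31 are 128 apart, so it is the last int below Int.MaxValue whose round trip is exact. Int.MaxValue itself only "round-trips" because the narrowing cast saturates.

```java
public class RoundTripMax {
    public static void main(String[] args) {
        int i = 2147483520;                                 // Integer.MAX_VALUE - 127
        System.out.println((int) (float) i == i);           // true: exact round trip
        // i + 1 rounds back down to i when converted to float,
        // so the round trip is no longer exact.
        System.out.println((int) (float) (i + 1) == i + 1); // false
    }
}
```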

```diff
@@ -106,7 +106,6 @@ SELECT smallint(float('32767.6'));
 SELECT smallint(float('-32768.4'));
 SELECT smallint(float('-32768.6'));
 SELECT int(float('2147483520'));
-SELECT int(float('2147483647'));
```
Contributor
These tests are copied from pgsql and we shouldn't change them. We just need to re-generate the answer file and keep the actual result as it is.

```diff
@@ -106,7 +106,7 @@ SELECT smallint(float('32767.6'));
 SELECT smallint(float('-32768.4'));
 SELECT smallint(float('-32768.6'));
 SELECT int(float('2147483520'));
-SELECT int(float('2147483647'));
+SELECT int(float('2147483392'));
```
Contributor
let's NOT change the pgsql tests. They are used to verify the difference between Spark and pgsql. We should respect the test result, whatever it is.

@SparkQA

SparkQA commented Feb 5, 2020

Test build #117885 has finished for PR 27151 at commit 0534eb6.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 5, 2020

Test build #117898 has finished for PR 27151 at commit ccad7e6.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 5, 2020

Test build #117911 has finished for PR 27151 at commit 0ac6e47.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

The last 2 commits are empty and were just to trigger Jenkins. The last effective commit has passed tests. I'm merging it to master/3.0, thanks!

@cloud-fan cloud-fan closed this in 6d507b4 Feb 5, 2020
cloud-fan pushed a commit that referenced this pull request Feb 5, 2020
…to Integer

### What changes were proposed in this pull request?
When spark.sql.ansi.enabled is true, for the statement:
```
select cast(cast(2147483648 as Float) as Integer) //result is 2147483647
```
Its result is 2147483647 and does not throw `ArithmeticException`.

The root cause is that, the below code does not work for some corner cases.
https://github.com/apache/spark/blob/94fc0e3235162afc6038019eed6ec546e3d1983e/sql/catalyst/src/main/scala/org/apache/spark/sql/types/numerics.scala#L129-L141

For example:

![image](https://user-images.githubusercontent.com/6757692/72074911-badfde80-332d-11ea-963e-2db0e43c33e8.png)

In this PR, I fix it by comparing Math.floor(x) with Int.MaxValue directly.

### Why are the changes needed?
Result corrupt.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?

Added Unit test.

Closes #27151 from turboFei/SPARK-26218-follow-up-int-overflow.

Authored-by: turbofei <fwang12@ebay.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 6d507b4)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@SparkQA

SparkQA commented Feb 5, 2020

Test build #117918 has finished for PR 27151 at commit 4d47a49.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

cloud-fan pushed a commit that referenced this pull request Dec 3, 2020
…ting float to Integer

### What changes were proposed in this pull request?
This is a followup of [#27151](#27151). It fixes the same issue for the codegen path.

### Why are the changes needed?
Result corrupt.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added Unit test.

Closes #30585 from luluorta/SPARK-26218.

Authored-by: luluorta <luluorta@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>