[SPARK-28018][SQL] Allow upcasting decimal to double/float#24849
gengliangwang wants to merge 1 commit into apache:master
Conversation
(precision - scale) >= (dt.precision - dt.scale) && scale >= dt.scale
case dt: IntegralType =>
  isWiderThan(DecimalType.forType(dt))
// For DoubleType/FloatType, the value can be NaN, PositiveInfinity or NegativeInfinity.
This PR also added a comment to explain why we can't upcast double/float to decimal.
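For reference, here is a small standalone illustration (my own sketch, not code from the PR) of the asymmetry that comment describes: NaN and the infinities have no decimal representation at all, so double/float can never be safely upcast to decimal, while a finite decimal value always has a floating-point counterpart.

```scala
import scala.util.Try

object SpecialFloatingPointValues {
  def main(args: Array[String]): Unit = {
    val special = Seq(Double.NaN, Double.PositiveInfinity, Double.NegativeInfinity)
    special.foreach { d =>
      // Constructing a BigDecimal from these values fails, which is exactly
      // why a decimal column cannot hold them.
      println(s"$d -> ${Try(BigDecimal(d))}")
    }
  }
}
```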
|
@cloud-fan @ueshin @maropu @dongjoon-hyun @rxin @rdblue @gatorsmile @HyukjinKwon |
|
We shouldn't do this. Decimal guarantees no loss of precision (which is important for financial calculations). This type of cast is very dangerous. |
|
@rxin Thanks, +1. |
|
Test build #106411 has finished for PR 24849 at commit
|
|
At least PostgreSQL, Vertica, SQL Server and DB2 allow casting decimal to double/float.
PostgreSQL:
postgres=# select version();
version
----------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 11.3 (Debian 11.3-1.pgdg90+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit
(1 row)
postgres=# create table test_spark_27856(c1 float8);
CREATE TABLE
postgres=# insert into test_spark_27856 values(cast(1123456789.01234567890123456 as numeric(38, 20)));
INSERT 0 1
postgres=# select c1, cast(1123456789.01234567890123456 as numeric(38, 20)) from test_spark_27856;
c1 | numeric
------------------+---------------------------------
1123456789.01235 | 1123456789.01234567890123456000
(1 row)
Vertica:
dbadmin=> select version();
version
------------------------------------
Vertica Analytic Database v9.1.1-0
(1 row)
dbadmin=> create table test_spark_27856(c1 float8);
CREATE TABLE
dbadmin=> insert into test_spark_27856 values(cast(1123456789.01234567890123456 as numeric(38, 20)));
OUTPUT
--------
1
(1 row)
dbadmin=> select c1, cast(1123456789.01234567890123456 as numeric(38, 20)) from test_spark_27856;
c1 | ?column?
------------------+---------------------------------
1123456789.01235 | 1123456789.01234567890123456000
(1 row)
SQL Server:
1> select @@version
2> go
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Microsoft SQL Server 2019 (CTP3.0) - 15.0.1600.8 (X64)
May 17 2019 00:56:19
Copyright (C) 2019 Microsoft Corporation
Developer Edition (64-bit) on Linux (Ubuntu 16.04.6 LTS) <X64>
(1 rows affected)
1> create table test_spark_27856(c1 real)
2> go
1> insert into test_spark_27856 values(cast(112345678901234567890.123456 as decimal(38,10)));
2> go
(1 rows affected)
1> select c1, cast(112345678901234567890.123456 as decimal(38,10)) from test_spark_27856
2> go
c1
-------------- ----------------------------------------
1.1234568E+20 112345678901234567890.1234560000
(1 rows affected)
DB2:
[db2inst1@2f3c821d36b7 ~]$ db2 "select service_level from table (sysproc.env_get_inst_info())"
SERVICE_LEVEL
--------------------------------------------------------------------------------------------------------------------------------
DB2 v10.5.0.5
1 record(s) selected.
[db2inst1@2f3c821d36b7 ~]$ db2 "create table test_spark_27856(c1 double)"
DB20000I The SQL command completed successfully.
[db2inst1@2f3c821d36b7 ~]$ db2 "insert into test_spark_27856 values(cast(112345678901234567890.123456 as decimal(31,10)))"
DB20000I The SQL command completed successfully.
[db2inst1@2f3c821d36b7 ~]$ db2 "select c1, cast(112345678901234567890.123456 as decimal(31,10)) from test_spark_27856"
C1 2
------------------------ ---------------------------------
+1.12345678901235E+020 112345678901234567890.1234560000
1 record(s) selected. |
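For comparison, below is a hedged sketch of the analogous experiment in Spark (the table name is reused from the examples above; whether the analyzer accepts the INSERT under the strict upcast rules proposed for 3.0 is exactly the question this PR addresses):

```scala
import org.apache.spark.sql.SparkSession

object DecimalToDoubleInsertDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("spark-28018-demo").getOrCreate()
    spark.sql("CREATE TABLE test_spark_27856 (c1 DOUBLE) USING parquet")
    // The inserted value is a DECIMAL(38, 20) literal; with the strict upcast
    // checks it would be rejected unless decimal -> double upcasting is allowed.
    spark.sql("INSERT INTO test_spark_27856 VALUES (CAST(1123456789.01234567890123456 AS DECIMAL(38, 20)))")
    spark.sql("SELECT c1 FROM test_spark_27856").show(false)
    spark.stop()
  }
}
```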
|
I think this makes sense for table insertion. In Postgres, we can even insert float/double into an int/long column. It seems truncating is not that dangerous for table insertion. |
|
@wangyum @cloud-fan Thanks a lot for the investigation. The safe casting we are proposing in Spark 3.0 is a totally new behavior; it checks the data types statically in the analyzer. However, considering that most of the popular DBMSs use implicit casting, are we making it too strict for this specific case? There is no precision loss in the casting itself. I think we should reconsider the proposal in this PR unless there is a better solution. BTW, I have been thinking about other solutions to this problem, e.g. only allowing such casting for |
|
Regarding the point that we can insert float/double into int/long columns: float/double can be Infinity, so I don't think allowing float/double -> int/long is a good idea.
I have tried creating a patch to allow such a conversion for LocalRelation: https://github.com/apache/spark/compare/master...gengliangwang:allowUpcastDecimalLocalRelation?expand=1
I still suggest that we simply allow upcasting a tighter decimal to double/float. |
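To make "tighter decimal" concrete, here is a rough sketch (a hypothetical helper of mine, not the PR's actual code), using the number of decimal digits that are guaranteed to round-trip through each floating-point type (15 for double, 6 for float):

```scala
import org.apache.spark.sql.types.{DataType, DecimalType, DoubleType, FloatType}

// Hypothetical check: a decimal type is "tight enough" for a floating-point
// target when all of its significant digits survive the conversion.
def fitsInFloatingPoint(from: DecimalType, to: DataType): Boolean = to match {
  case DoubleType => from.precision <= 15 // DBL_DIG: digits that round-trip through double
  case FloatType  => from.precision <= 6  // FLT_DIG: digits that round-trip through float
  case _          => false
}
```

Under this sketch, fitsInFloatingPoint(DecimalType(3, 1), DoubleType) would hold, while a DecimalType(38, 20) value would not fit and would lose digits.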
|
@wangyum, the problem is with implicit casts inserted by Spark, not explicit casts included in the query. |
|
@cloud-fan, |
This is not a safe cast. In the DSv2 sync we discussed options that would work. Why not go with those? |
@rdblue it seems you misread the query.
|
I think it is explainable: there is no precision loss in the casting itself.
As I remember, we didn't reach a solution in the DSv2 sync, and we decided to come up with solutions offline.
|
@cloud-fan, thanks for pointing that out. I see that it is inserting the cast. |
|
@gengliangwang, there were two suggestions in the DSv2 sync:
Both of those options are still valid. I'm surprised to see that decimals can be implicitly cast to float columns in most databases. Is this part of the SQL spec? If so, then I think we can go with that. Otherwise, I'd prefer one of the other options. |
Sorry, I meant that there was no conclusion in the sync. I came up with the proposals in the sync, but I don't think they are good enough.
|
@rdblue I have checked Information technology — Database languages — SQL — Part 2: Foundation (SQL/Foundation), 2011 edition. Section 15.10, "Effect of inserting tables into base tables", refers to the rules in Section 9.2, "Store assignment". From the ISO SQL standard: "If a value of the declared type of T can be obtained from V by rounding or truncation, then the value of T is set to that value".
The new approach would be configurable. |
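To make the standard's "rounding or truncation" wording concrete, here is a small, self-contained illustration (my own, unrelated to Spark's implementation) of what happens when a wide decimal is stored into a double-precision slot:

```scala
object StoreAssignmentRounding {
  def main(args: Array[String]): Unit = {
    val wide = BigDecimal("1123456789.01234567890123456")
    // Converting to double rounds to the nearest representable value;
    // only roughly 15-17 significant decimal digits survive.
    val rounded: Double = wide.toDouble
    println(rounded)                     // the rounded double value
    println(BigDecimal(rounded) - wide)  // the rounding error that was introduced
  }
}
```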
What changes were proposed in this pull request?
Currently, Spark only allows upcasting DecimalType to IntegralType or DecimalType.
This PR proposes that if the DecimalType is tighter than DoubleType or FloatType, we can upcast it to double/float as well.
The upcasting matters because the current restriction blocks #24806. E.g., if there is a table "t" with only one column of double type, inserting the value 10.0 fails: the data type of 10.0 is DecimalType(3, 1), and in the current code Spark can't upcast the decimal to the column of double type.
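A quick way to see the literal type mentioned above (a sketch using Catalyst's internal Literal API, so treat the exact calls as illustrative):

```scala
import org.apache.spark.sql.catalyst.expressions.Literal
import org.apache.spark.sql.types.Decimal

// The SQL literal 10.0 is parsed as a decimal with precision 3 and scale 1.
val lit = Literal(Decimal("10.0"))
println(lit.dataType) // DecimalType(3,1), as stated in the description
```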
How was this patch tested?
Unit test