Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make modulos with negative float zero compat with other engines #11051

Open
comphead opened this issue Jun 21, 2024 · 8 comments
Open

Make modulos with negative float zero compat with other engines #11051

comphead opened this issue Jun 21, 2024 · 8 comments
Assignees
Labels
bug Something isn't working

Comments

@comphead
Copy link
Contributor

comphead commented Jun 21, 2024

Describe the bug

DF returns NaN in modulo query below and is not compliant with major engines:

> select 1 % -0.0;
+------------------------+
| Int64(1) % Float64(-0) |
+------------------------+
| NaN                    |
+------------------------+

PG, Trino fails on such query,
DuckDB, Spark returns NULL

@alamb @viirya @andygrove what would be expected behavior from DataFusion?
We can also introduce a config to be PG compliant or PG/Trino compliant?

To Reproduce

No response

Expected behavior

No response

Additional context

No response

@parthchandra
Copy link

Surprising that Postgres fails. According to the docs, Infinity, -Infinity, and NaN are legitimate floating point values. This is also the expected behavior per the ANSI SQL standard, afaik.
NULL is not the right thing to return here.

@alamb
Copy link
Contributor

alamb commented Jun 21, 2024

Postgres fails:

andrewlamb@Andrews-MacBook-Pro-2:~/Software/arrow-rs$ psql -h localhost -U postgres
psql (14.12 (Homebrew), server 16.1 (Debian 16.1-1.pgdg120+1))
WARNING: psql major version 14, server major version 16.
         Some psql features might not work.
Type "help" for help.

postgres=# select 1 % -0.0;
ERROR:  division by zero

@alamb
Copy link
Contributor

alamb commented Jun 21, 2024

I don't have a strong preference one way or the other, personally

@comphead
Copy link
Contributor Author

I don't have a strong preference one way or the other, personally

I would go with Spark/DuckDB way :) if there is no other strong opinions, I'll make a fix soon

@comphead comphead self-assigned this Jun 21, 2024
@parthchandra
Copy link

ERROR:  division by zero

I wasn't doubting it. :) But I am surprised.

@comphead I'll take back my 'NULL is not the right thing to return here' comment. It's better than failing for most users.
Just one thing to keep in mind - Spark sql has a isnan function which should be consistent with this.

@comphead
Copy link
Contributor Author

Thanks @parthchandra for the suggestion, I just checked we should be fine.

scala> spark.sql("select 1 % -0.0").show(false)
+---------+
|(1 % 0.0)|
+---------+
|NULL     |
+---------+

scala> spark.sql("select isnan(1 % -0.0)").show(false)
+----------------+
|isnan((1 % 0.0))|
+----------------+
|false           |
+----------------+

@jonahgao
Copy link
Member

arrow-rs follows the rules of IEEE 754. If we intend to be compatible with other engines, many cases will also need to be modified, such as division by zero.

DataFusion CLI v39.0.0
> select 1.0/0.0;
+-------------------------+
| Float64(1) / Float64(0) |
+-------------------------+
| inf                     |
+-------------------------+


> select 0.0/0.0;
+-------------------------+
| Float64(0) / Float64(0) |
+-------------------------+
| NaN                     |
+-------------------------+

In PostgreSQL:

postgres=# select 1.0/0.0;
ERROR:  division by zero
postgres=# select 0.0/0.0;
ERROR:  division by zero

@comphead
Copy link
Contributor Author

This is a good point @jonahgao so this gives me even more confidence to introduce a new DF param which controls should DF go with IEEE 754

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants