Closed
Labels
bug (Something isn't working)
Description
Describe the bug
Comparisons involving NaN values return incorrect results.
When a table contains a = NaN and b = 1.0, expressions such as (a < b) and (a > b) produce results that differ from IEEE 754 semantics.
To Reproduce
Python (DataFusion API)
import numpy as np
import pyarrow as pa
import datafusion as dfn
from datafusion import col
table = pa.table({"a": [np.nan], "b": [1.0]})
ctx = dfn.SessionContext()
df = ctx.from_arrow(table)
result = df.select(
    (col("a") < col("b")).alias("less"),
    (col("a") > col("b")).alias("greater"),
)
print(result.to_pandas())
datafusion-cli (SQL)
Use the CLI and create a NaN value (one way is using sqrt(-1.0)):
-- start the CLI
-- datafusion-cli
WITH t AS (
SELECT sqrt(CAST(-1.0 AS DOUBLE)) AS a, CAST(1.0 AS DOUBLE) AS b
)
SELECT
(a < b) AS less,
(a > b) AS greater
FROM t;
Observed output:
+-------+---------+
| less  | greater |
+-------+---------+
| false | true    |
+-------+---------+
1 row(s) fetched.
Alternatively (VALUES form):
SELECT
(a < b) AS less,
(a > b) AS greater
FROM (
VALUES (sqrt(CAST(-1.0 AS DOUBLE)), CAST(1.0 AS DOUBLE))
) AS t(a, b);
Expected behavior
Both the Python API and datafusion-cli should yield identical results and follow IEEE 754 semantics for comparisons with NaN:
(NaN < 1.0) → false
(NaN > 1.0) → false
Results should be deterministic and consistent across interfaces.
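For reference, the expected row can be computed with plain Python floats and compared against the row datafusion-cli printed above. This is only a sketch: `expected_row` is a hypothetical helper, not part of DataFusion, and it assumes the interpreter's float comparisons follow IEEE 754 (true for CPython on mainstream platforms).

```python
import math


def expected_row(a: float, b: float) -> dict:
    """Reference result for the reproduction query, using Python float comparisons."""
    return {"less": a < b, "greater": a > b}


# Row reported by datafusion-cli in the reproduction above.
observed = {"less": False, "greater": True}

expected = expected_row(math.nan, 1.0)

# Columns where the observed row disagrees with the reference result.
mismatches = sorted(k for k in expected if expected[k] != observed[k])

print(expected)
print(mismatches)
```

Under that assumption, only the `greater` column disagrees with the reference result.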
Additional context
IEEE 754 (the floating-point standard) specifies that NaN is unordered: it is not less than, greater than, or equal to any value, including itself. Implementations should treat comparisons with NaN as unordered.
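The unordered behavior described here can be observed directly with Python floats. The sketch below is independent of DataFusion; `is_unordered` is a hypothetical helper introduced only for illustration.

```python
import math

nan = float("nan")


def is_unordered(a: float, b: float) -> bool:
    """True when at least one operand is NaN, so <, >, and == all evaluate to false."""
    return math.isnan(a) or math.isnan(b)


# Each ordered or equality comparison with NaN is false; only != is true.
checks = {
    "nan < 1.0": nan < 1.0,
    "nan > 1.0": nan > 1.0,
    "nan == 1.0": nan == 1.0,
    "nan == nan": nan == nan,
    "nan != nan": nan != nan,
}

for expr, value in checks.items():
    print(f"{expr}: {value}")
```

Because `nan == nan` is false, NaN has to be detected with a dedicated predicate such as `math.isnan` rather than an equality test.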