Incorrect NaN comparison #17557

@kosiew

Describe the bug

Comparisons with NaN values behave incorrectly.

When a table contains a = NaN and b = 1.0, the expressions (a < b) and (a > b) produce results that differ from IEEE 754 semantics, under which both comparisons evaluate to false.

To Reproduce

Python (DataFusion API)

import numpy as np
import pyarrow as pa
import datafusion as dfn
from datafusion import col

table = pa.table({"a": [np.nan], "b": [1.0]})
ctx = dfn.SessionContext()
df = ctx.from_arrow(table)

result = df.select(
    (col("a") < col("b")).alias("less"),
    (col("a") > col("b")).alias("greater"),
)
print(result.to_pandas())

datafusion-cli (SQL)

Use the CLI and create a NaN value (one way is using sqrt(-1.0)):

-- start the CLI
-- datafusion-cli

WITH t AS (
  SELECT sqrt(CAST(-1.0 AS DOUBLE)) AS a, CAST(1.0 AS DOUBLE) AS b
)
SELECT
  (a < b)  AS less,
  (a > b)  AS greater
FROM t;

+-------+---------+
| less  | greater |
+-------+---------+
| false | true    |
+-------+---------+
1 row(s) fetched.

Alternatively (VALUES form):

SELECT
  (a < b) AS less,
  (a > b) AS greater
FROM (
  VALUES (sqrt(CAST(-1.0 AS DOUBLE)), CAST(1.0 AS DOUBLE))
) AS t(a, b);

Expected behavior

Both the Python API and the datafusion-cli should yield identical results and follow IEEE 754 semantics for comparisons with NaN:

  • (NaN < 1.0) → false
  • (NaN > 1.0) → false

Results should be deterministic and consistent across interfaces.

Additional context

apache/datafusion-python#1233

IEEE 754 (the floating-point standard) specifies that NaN is unordered: it is not less than, greater than, or equal to any value, including itself. Implementations should therefore treat every ordered comparison involving NaN as false.
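
For reference, here is a minimal sketch of the unordered behavior described above, using plain Python floats rather than DataFusion. The variable names are illustrative only and are not part of any DataFusion API.

```python
import math

nan = math.nan

# Under IEEE 754, every ordered comparison involving NaN is false.
less = nan < 1.0
greater = nan > 1.0

# NaN is not equal to anything, including itself, so != is the only
# comparison that evaluates to true.
equal = nan == nan
not_equal = nan != nan

print(less, greater, equal, not_equal)  # False False False True
```

These are the values the `less` and `greater` columns would be expected to hold if the engine followed the same semantics.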


Labels: bug
