Describe the bug
SELECT DISTINCT v FROM t0 ORDER BY v ASC NULLS FIRST LIMIT 1 should return
the first row of the same ordered DISTINCT query without LIMIT. On
DataFusion 53.0.0, the full ordered query puts NULL first, but adding
LIMIT 1 returns the first non-null value for string, integer, and float
columns.
To Reproduce
Environment
OS: Ubuntu 24.04.1 x86_64
Python: 3.12.3
datafusion: 53.0.0
pyarrow: 24.0.0
Reproduce
import pyarrow as pa
from datafusion import SessionContext
ctx = SessionContext()
batch = pa.RecordBatch.from_pylist(
[{"v": None}, {"v": ""}, {"v": "a"}],
schema=pa.schema([pa.field("v", pa.string(), nullable=True)]),
)
ctx.register_record_batches("t0", [[batch]])
full_sql = "SELECT DISTINCT v FROM t0 ORDER BY v ASC NULLS FIRST"
top1_sql = full_sql + " LIMIT 1"
full = ctx.sql(full_sql).collect()[0].to_pydict()["v"]
top1 = ctx.sql(top1_sql).collect()[0].to_pydict()["v"]
print("full:", full)
print("top1:", top1)
assert top1 == full[:1]
Output
full: [None, '', 'a']
top1: [None]
Expected behavior
full: [None, '', 'a']
top1: ['']
Additional context
No response
Describe the bug
SELECT DISTINCT v FROM t0 ORDER BY v ASC NULLS FIRST LIMIT 1should returnthe first row of the same ordered
DISTINCTquery withoutLIMIT. OnDataFusion
53.0.0, the full ordered query putsNULLfirst, but addingLIMIT 1returns the first non-null value for string, integer, and floatcolumns.
To Reproduce
Environment
Reproduce
Output
Expected behavior
Additional context
No response