DataFusion drops grouped MIN/MAX rows with NULL sort keys under ORDER BY + LIMIT

### Describe the bug

## Version

- Python: `3.12.3`
- Python package: `datafusion==53.0.0`
- Arrow input via `pyarrow==24.0.0`
- Platform used for reproduction: Ubuntu 24.04 x86_64, Linux `6.17.0-23-generic`

## Minimal Reproducer

```python
import pyarrow as pa
from datafusion import SessionContext

ctx = SessionContext()
batch = pa.RecordBatch.from_arrays(
    [
        pa.array(["a"], type=pa.string()),
        pa.array([None], type=pa.int64()),
    ],
    schema=pa.schema(
        [
            pa.field("g", pa.string(), nullable=True),
            pa.field("x", pa.int64(), nullable=True),
        ]
    ),
)
ctx.register_record_batches("t0", [[batch]])

base = "SELECT g, MIN(x) AS min_x FROM t0 GROUP BY g"
print(ctx.sql(f"SELECT min_x FROM ({base}) q LIMIT 20").to_pandas())
print(ctx.sql(f"SELECT min_x FROM ({base}) q ORDER BY min_x ASC NULLS LAST LIMIT 20").to_pandas())
```

## Actual Result

Standalone reproducer output:

```text
datafusion=53.0.0
pyarrow=24.0.0
control:
   min_x
0    NaN
top-k:
Empty DataFrame
Columns: [min_x]
Index: []
top-k record-batch rows=0
AssertionError: DataFusion dropped the group whose aggregate sort key is NULL
```

The control query returns the grouped row with a NULL aggregate result:

```text
   min_x
0    NaN
```

The query with `ORDER BY min_x ASC NULLS LAST LIMIT 20` returns no rows:

```text
Empty DataFrame
Columns: [min_x]
Index: []
```

With two groups, one having `MIN(x) = NULL` and one having `MIN(x) = 5`, the `ORDER BY ... LIMIT 20` query returns only the non-NULL sort-key group. The limit is larger than the number of groups, so no grouped row should be removed.



## Expected Result

Both single-group queries should return one row. `GROUP BY g` forms a group for `g = 'a'`; `MIN(x)` is NULL because all values in that group are NULL. Ordering with `NULLS LAST` and a limit larger than the result cardinality should preserve the row.



## Additional Context

 I found this with a cross-backend differential test. In the generated case, pandas, Polars eager/lazy, DuckDB, SQLite, and an independent DSL reference all preserved the grouped row whose aggregate sort key is NULL. DataFusion was the only backend that returned an empty result.

 I also ran targeted follow-up checks for the same pattern. The issue consistently reproduced for NULL `MIN(x)` ordered ascending and NULL `MAX(x)` ordered descending, while nearby controls such as plain NULL column top-k, NULL group key with non-NULL aggregate, and NULL `SUM(x)` / `AVG(x)` aggregate top-k preserved the row.

 I can share the standalone reproducer, boundary-check script, and full generated artifacts if useful.

### To Reproduce

_No response_

### Expected behavior

_No response_

### Additional context

_No response_

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DataFusion drops grouped MIN/MAX rows with NULL sort keys under ORDER BY + LIMIT #22190

Describe the bug

Version

Minimal Reproducer

Actual Result

Expected Result

Additional Context

To Reproduce

Expected behavior

Additional context

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

DataFusion drops grouped MIN/MAX rows with NULL sort keys under ORDER BY + LIMIT #22190

Description

Describe the bug

Version

Minimal Reproducer

Actual Result

Expected Result

Additional Context

To Reproduce

Expected behavior

Additional context

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions