Follow-up enhancements to the row-group / file stats-based reorder landed in #21956. Listed here so they can be picked up independently.
Today `PreparedAccessPlan::reorder_by_statistics` and `ParquetSource::reorder_files` both:
- only look at `sort_order.first()` — secondary sort keys are ignored
- only fire when the leading expression is a plain `Column` — function-wrapped sorts (`date_trunc('day', ts) DESC`, `CAST(x AS Date) DESC`, `ceil(value) DESC`, …) skip the stats-based reorder
Four follow-ups, by priority:
P0 — Multi-column lexicographic reorder by per-column min
For `ORDER BY a ASC, b DESC LIMIT N` where both `a` and `b` are plain columns in the file schema, lex-sort row groups (and files) by `(min(a) ASC, min(b) DESC)` instead of just `min(a) ASC`. Parquet stores per-column stats, so the data is already there — switch from `arrow::compute::sort_to_indices` over a single `stat_mins` to `lexsort_to_indices` over per-column min arrays, with per-column `SortOptions`.
Most useful when the leading sort column has low cardinality (e.g. `ORDER BY country, ts DESC LIMIT 100` — row groups that tie on `country.min` should be broken by `ts.min`).
Estimated cost: ~30-50 lines + a few unit tests.
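The core of the change is small. Here is a standalone, std-only sketch of the lexicographic reorder over per-column min stats (`SortDir` and `lex_reorder` are hypothetical names standing in for arrow's `SortOptions` and `lexsort_to_indices`, which operate on real arrays instead of `Vec<i64>`):

```rust
use std::cmp::Ordering;

/// Per-column sort direction (hypothetical simplification of arrow's SortOptions).
#[derive(Clone, Copy)]
struct SortDir {
    descending: bool,
}

/// Lexicographically order row-group indices by their per-column min stats.
/// `mins[c][rg]` is the min of sort column `c` in row group `rg`.
fn lex_reorder(mins: &[Vec<i64>], dirs: &[SortDir]) -> Vec<usize> {
    let n = mins.first().map_or(0, |c| c.len());
    let mut idx: Vec<usize> = (0..n).collect();
    idx.sort_by(|&a, &b| {
        // Compare column by column, falling through on ties.
        for (col, dir) in mins.iter().zip(dirs) {
            let ord = col[a].cmp(&col[b]);
            let ord = if dir.descending { ord.reverse() } else { ord };
            if ord != Ordering::Equal {
                return ord;
            }
        }
        Ordering::Equal
    });
    idx
}

fn main() {
    // ORDER BY a ASC, b DESC over four row groups:
    // a.min = [1, 1, 0, 0], b.min = [5, 9, 2, 7]
    let mins = vec![vec![1, 1, 0, 0], vec![5, 9, 2, 7]];
    let dirs = [SortDir { descending: false }, SortDir { descending: true }];
    let order = lex_reorder(&mins, &dirs);
    println!("{:?}", order); // [3, 2, 1, 0]
}
```

In the real patch this collapses to building a `Vec<SortColumn>` (min array + options per sort key) and calling `lexsort_to_indices` once, so ties on the leading column are broken by the later columns exactly as above.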
P1 — Function-wrapped reorder via leaf-column extraction
For sort expressions like `date_trunc('day', ts) DESC` or `CAST(x AS Date) DESC` — a single leaf `Column` wrapped in a monotonic function chain. Parquet has no stats keyed by the function expression, but for a monotonically increasing `f`, `min(f(col)) = f(min(col))`, so `min(col)`'s order is a valid proxy (for a decreasing `f`, `min(f(col)) = f(max(col))`, so the proxy comes from `max(col)` with the order flipped).
Approach:
- Walk the sort expression's `PhysicalExpr::children()` to find the unique leaf `Column`.
- Verify monotonicity through the wrapper chain — `EquivalenceProperties` already does this in `ordering_satisfy`; the same machinery should be reusable here.
- Drive `reorder_by_statistics` off the leaf column's stats.
DataFusion's existing `evaluate_bounds` / monotonicity inference is the right tool — happy to discuss the cleanest hook point.
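The stat-selection step can be sketched in isolation (std-only; the `Monotonicity` enum and `proxy_stat` helper are hypothetical names — DataFusion's monotonicity/interval machinery would supply the classification for real expressions):

```rust
/// Direction a scalar function preserves over its input
/// (hypothetical; stands in for DataFusion's monotonicity inference).
#[derive(Clone, Copy)]
enum Monotonicity {
    Increasing,
    Decreasing,
}

/// For ORDER BY f(col) ASC: pick the per-row-group stat whose order
/// matches min(f(col)) across row groups.
fn proxy_stat(mins: &[i64], maxs: &[i64], m: Monotonicity) -> Vec<i64> {
    match m {
        // min(f(col)) = f(min(col)) when f is increasing
        Monotonicity::Increasing => mins.to_vec(),
        // min(f(col)) = f(max(col)) when f is decreasing
        Monotonicity::Decreasing => maxs.to_vec(),
    }
}

fn main() {
    // date_trunc('day', ts) is increasing in ts: reorder by ts.min directly,
    // never evaluating the function against the stats at all.
    let mins = [300, 100, 200];
    let maxs = [399, 199, 299];
    let keys = proxy_stat(&mins, &maxs, Monotonicity::Increasing);
    let mut order: Vec<usize> = (0..keys.len()).collect();
    order.sort_by_key(|&i| keys[i]);
    println!("{:?}", order); // [1, 2, 0]
}
```

The point of the sketch: once the leaf column and the chain's net monotonicity are known, the existing reorder path runs unchanged on the leaf's stats — no evaluation of `f` is needed.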
P1 — Detect "already-sorted" sources to skip redundant reorder
When the source's declared `output_ordering` already matches the request modulo NULL placement (and the column is either non-nullable or known null-free), `try_pushdown_sort` currently returns `Inexact` instead of `Exact` because `EquivalenceProperties::ordering_satisfy` is strict about `nulls_first` / `nulls_last`.
Concrete example surfaced while diagnosing CI snapshot drift on `topk.slt`'s `partial_sorted` case in #21956:
- Source declares `output_ordering=[number DESC]` — DataFusion's default makes that `number DESC NULLS FIRST`.
- Query asks for `ORDER BY number DESC NULLS LAST LIMIT 3`.
- `ordering_satisfy` returns `false` on the NULL position mismatch, so the Phase 3 column-in-schema branch fires, setting `sort_order_for_reorder=[number DESC NULLS LAST]` and `reverse_row_groups=true`.
- At runtime the opener sorts row groups ASC-by-min, then reverses iteration — two operations that very nearly cancel out, returning the row groups to (approximately) their on-disk DESC order. Correct, but wasteful, and the surrounding `SortExec` is kept even though the data is already in the requested order on disk.
A softer "Exact-equivalent" check would help: when the source ordering matches the request on `(expr, descending)` and either the column is non-nullable or null-count statistics confirm no NULLs in the sort column, `try_pushdown_sort` could short-circuit to `Exact` so the optimizer drops the `SortExec` outright.
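The check itself is a few lines once the inputs are in hand. A standalone sketch (`SortKey` and `exact_equivalent` are hypothetical simplifications of `PhysicalSortExpr` plus a null-count statistics lookup):

```rust
/// Minimal model of one sort requirement
/// (hypothetical; stands in for PhysicalSortExpr + SortOptions).
struct SortKey {
    column: String,
    descending: bool,
    nulls_first: bool,
}

/// Softer equivalence: same (column, direction), and any NULL-placement
/// mismatch is provably irrelevant because the column contains no NULLs.
/// `null_count` is None when statistics are unavailable.
fn exact_equivalent(declared: &SortKey, requested: &SortKey, null_count: Option<u64>) -> bool {
    declared.column == requested.column
        && declared.descending == requested.descending
        && (declared.nulls_first == requested.nulls_first || null_count == Some(0))
}

fn main() {
    // The topk.slt case: declared NULLS FIRST, requested NULLS LAST.
    let declared = SortKey { column: "number".into(), descending: true, nulls_first: true };
    let requested = SortKey { column: "number".into(), descending: true, nulls_first: false };
    // Stats confirm zero NULLs -> the two orders coincide, Exact is safe.
    println!("{}", exact_equivalent(&declared, &requested, Some(0))); // true
    // No stats -> must stay conservative (Inexact).
    println!("{}", exact_equivalent(&declared, &requested, None)); // false
}
```

Note the conservative fallback: without a null count (or a non-nullable schema), the mismatch must still downgrade to `Inexact`, which is exactly today's behavior.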
P2 — Compound-expression reorder via interval propagation (probably defer)
For sorts on expressions involving more than one column (`a + b DESC`, `a * 2 - b DESC`, …) parquet stats give per-column intervals; combining them needs operator-specific interval arithmetic (`[min(a)+min(b), max(a)+max(b)]` for `+`, sign-aware for `*`, etc.). DataFusion has the building blocks (`evaluate_bounds`), but real-world workloads rarely sort by compound expressions and the integration cost is non-trivial. Listed for completeness; probably not worth doing unless a concrete benchmark motivates it.
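For reference, the interval arithmetic involved is small in isolation — the cost is in the integration, not the math. A std-only sketch (`Interval`, `add`, `mul` are hypothetical names; DataFusion's `evaluate_bounds` does this over real expression trees):

```rust
/// Closed interval [lo, hi] derived from a column's min/max stats
/// (hypothetical; DataFusion's interval type plays this role).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Interval {
    lo: i64,
    hi: i64,
}

/// a + b: endpoints add directly.
fn add(a: Interval, b: Interval) -> Interval {
    Interval { lo: a.lo + b.lo, hi: a.hi + b.hi }
}

/// a * b: sign-aware — take min/max over all four endpoint products.
fn mul(a: Interval, b: Interval) -> Interval {
    let ps = [a.lo * b.lo, a.lo * b.hi, a.hi * b.lo, a.hi * b.hi];
    Interval { lo: *ps.iter().min().unwrap(), hi: *ps.iter().max().unwrap() }
}

fn main() {
    // Per-column intervals from row-group stats for columns a and b.
    let a = Interval { lo: -2, hi: 3 };
    let b = Interval { lo: 4, hi: 5 };
    println!("{:?}", add(a, b)); // Interval { lo: 2, hi: 8 }
    println!("{:?}", mul(a, b)); // Interval { lo: -10, hi: 15 }
}
```

The reorder would then sort row groups by the derived interval's lower bound — but as noted above, wiring each operator through this machinery is what makes the feature not worth it without a motivating benchmark.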
cc @adriangb @alamb — wanted to keep the parent PR focused on the core mechanism + the review feedback you raised; these were good directions that came out of the design discussion but seemed better as separate issues.