Eliminating multi-column sort when major column is a one-to-one and monotonic expression #8838
Comments
Thank you @suremarc -- I agree DataFusion should be able to remove the sort in this case. I marked this as an enhancement rather than a bug because I think it would be new functionality rather than something that used to work (though the distinction may be a bit arbitrary).
I want to try this.
I suggest looking for |
Just to make sure we're all on the same page, I think the first case (single order by expression) already works due to monotonicity, and I think EXPLAIN
SELECT
CAST(c_customer_sk AS BIGINT) AS c_customer_sk_big
FROM delta_encoding_required_column
ORDER BY c_customer_sk_big DESC;
The second case (multiple order by expressions) is the troublesome one, because I think monotonicity alone is not sufficient (that's where one-to-one-ness becomes relevant).
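To illustrate why monotonicity alone is not sufficient for the multi-column case, here is a small sketch in plain Python. It does not use DataFusion; the helper `is_sorted`, the sample rows, and the `widen` stand-in for an Int32 to Int64 cast are assumptions made up for this illustration.

```python
import ctypes
import math

def is_sorted(rows, key):
    """Return True if rows are in non-decreasing order under the given key."""
    keys = [key(r) for r in rows]
    return all(a <= b for a, b in zip(keys, keys[1:]))

# Hypothetical input, already sorted by (x, y).
rows = [(1.2, 9), (1.7, 3), (2.0, 5), (2.0, 8)]
assert is_sorted(rows, key=lambda r: (r[0], r[1]))

# Single sort key: floor is monotonic, so the existing order still holds.
floor_single_ok = is_sorted(rows, key=lambda r: math.floor(r[0]))

# Two sort keys: floor maps 1.2 and 1.7 to the same value, so y has to
# break the tie, and y is not ordered across those two rows.
floor_multi_ok = is_sorted(rows, key=lambda r: (math.floor(r[0]), r[1]))

def widen(v):
    """Stand-in for an Int32 -> Int64 cast (assumption for illustration)."""
    return ctypes.c_int64(ctypes.c_int32(v).value).value

# A one-to-one, monotonic expression creates no new ties, so the
# lexicographic order on (x, y) carries over to (widen(x), y).
int_rows = [(1, 9), (1, 12), (2, 3), (7, 0)]
assert is_sorted(int_rows, key=lambda r: (r[0], r[1]))
cast_multi_ok = is_sorted(int_rows, key=lambda r: (widen(r[0]), r[1]))

print(floor_single_ok, floor_multi_ok, cast_multi_ok)
```

The point of the sketch is that a non-injective expression can introduce ties in the major column, and the minor column is only guaranteed to be ordered within the original ties, not the new ones.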
fix: issue apache#8838 discard extra sort when sorted element is wrapped
…9127)
* fix: issue #8838 discard extra sort when sorted element is wrapped
* fix bugs
* fix bugs
* fix bugs
* fix:bugs
* adding tests
* adding cast UTF8 type and diable scalarfunction situation
* fix typo
* Simplifications, add new test
* Make resulting order deterministic after projection
* Add comment to explain ratioanale of using IndexMap, and IndexSet
* Add comment
* Add negative tests
---------
Co-authored-by: Yanxin Xiang <yanxinxiang0917@outlook.com>
Describe the bug
DataFusion is unable to eliminate multi-column sorts when the major column is a one-to-one and monotonic expression of a sorted input column. Monotonicity alone is not enough (sorting by `floor(x), y` is not equivalent to sorting by `x, y`, even though `floor` is monotonic). A cast from `Int32` to `Int64` is one-to-one and monotonic, so DataFusion should be able to avoid sorting in such a case. See below.
To Reproduce
Data acquired from `parquet_testing/data/delta_encoding_required_column_expect.csv` (I couldn't get `datafusion-cli` to work with the parquet file for some reason).
Resulting physical plan has a `SortExec`:
Resulting physical plan has no `SortExec`:
Expected behavior
The first query provided above should not require a sort.
Additional context
I encountered this bug when trying to re-cast the timezone of a table ordered by both `timestamp` and a secondary `ticker` column.