Open
Labels
enhancement: Any new improvement worthy of an entry in the changelog
Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
There are two use cases for this feature:
- Some storage providers or engines can guarantee that dictionaries are already sorted (key order matches the order of the corresponding strings), so sorting could be made more efficient by comparing the keys instead of looking up the corresponding strings.
- For the PARTITION BY part of window functions, the data does not have to be sorted by the strings; sorting by the keys also yields a valid partitioning.
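To illustrate the second use case, here is a minimal sketch in plain Rust (standard library only, not the arrow API). The function names `sort_indices_by_key` and `partition_ranges` are hypothetical, and the sketch assumes that the dictionary contains no duplicate values and that there are no null keys; if two keys mapped to the same string, sorting by key could split one partition into two.

```rust
/// Sort row indices by dictionary key only, without looking at the strings.
/// `sort_by_key` is stable, so rows with equal keys keep their input order.
fn sort_indices_by_key(keys: &[u32]) -> Vec<usize> {
    let mut indices: Vec<usize> = (0..keys.len()).collect();
    indices.sort_by_key(|&i| keys[i]);
    indices
}

/// Split key-sorted row indices into runs of equal keys.
/// Each run is one partition, provided dictionary values are unique.
fn partition_ranges(keys: &[u32], sorted: &[usize]) -> Vec<Vec<usize>> {
    let mut partitions: Vec<Vec<usize>> = Vec::new();
    let mut previous: Option<u32> = None;
    for &row in sorted {
        if previous != Some(keys[row]) {
            // The key changed, so a new partition starts here.
            partitions.push(Vec::new());
            previous = Some(keys[row]);
        }
        partitions.last_mut().unwrap().push(row);
    }
    partitions
}
```

With the unsorted dictionary `["pear", "apple", "fig"]` and keys `[2, 0, 1, 0, 2]`, the rows end up grouped as `fig`, `pear`, `apple` partitions in key order rather than alphabetical order, which is still a valid partitioning.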
Describe the solution you'd like
Add a flag assume_sorted_dictionary to SortOptions. In sort_to_indices, this flag would be used in the branch for dictionary types: if it is set, the keys are sorted as a primitive array. The same distinction also needs to be implemented in build_compare for the lexsort_to_indices kernel.
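The proposed branching could look roughly like the sketch below. It is a standard-library model, not the arrow implementation: `SortOptions` here is a simplified stand-in for arrow's struct (null ordering is omitted), `assume_sorted_dictionary` is the flag proposed in this issue and does not exist yet, and `DictColumn` is a hypothetical type standing in for a dictionary array of strings without nulls.

```rust
/// Simplified stand-in for arrow's `SortOptions`, plus the proposed flag.
#[derive(Clone, Copy, Default)]
struct SortOptions {
    descending: bool,
    /// Proposed in this issue; not part of arrow today.
    assume_sorted_dictionary: bool,
}

/// Hypothetical model of a dictionary-encoded string column (no nulls).
struct DictColumn<'a> {
    keys: Vec<u32>,
    values: Vec<&'a str>,
}

/// Return the row indices that would sort the column.
fn sort_to_indices(col: &DictColumn, opts: SortOptions) -> Vec<usize> {
    let mut indices: Vec<usize> = (0..col.keys.len()).collect();
    indices.sort_by(|&a, &b| {
        let (ka, kb) = (col.keys[a], col.keys[b]);
        let ord = if opts.assume_sorted_dictionary {
            // Fast path: compare the integer keys directly.
            ka.cmp(&kb)
        } else {
            // Default path: look up and compare the strings.
            let (va, vb) = (col.values[ka as usize], col.values[kb as usize]);
            va.cmp(vb)
        };
        if opts.descending { ord.reverse() } else { ord }
    });
    indices
}
```

When the dictionary really is sorted, both paths return the same indices. When it is not, the fast path returns key order, which is why the flag has to be an explicit opt-in by the caller rather than something the kernel assumes.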
Additional context
Once this is implemented, the window function logic in DataFusion could be adjusted to take advantage of it.
svenwb and alamb