Add option to sort by dictionary keys in sort kernels #980

@jhorstmann

Description

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

There are two use cases for this feature:

  • Some storage providers or engines can guarantee that dictionary keys are already sorted (key order matches the order of the corresponding strings), so sorting could be more efficient by comparing the keys instead of looking up the corresponding strings.
  • For the PARTITION BY part of window functions, the data does not have to be sorted by the strings; sorting by the keys is enough to group equal values together into partitions.
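
To illustrate the second use case, here is a minimal std-only sketch. It does not use the arrow crate: `DictColumn`, `sort_indices_by_key` and `is_partitioned` are hypothetical names invented for this example. It also assumes the dictionary values are unique, since two keys that map to the same string could otherwise split a partition.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a dictionary-encoded string column:
/// `keys[i]` is an index into `values`. Assumes `values` has no duplicates.
struct DictColumn {
    keys: Vec<u32>,
    values: Vec<&'static str>,
}

/// Returns row indices ordered by dictionary key only, with no string lookups.
fn sort_indices_by_key(col: &DictColumn) -> Vec<usize> {
    let mut indices: Vec<usize> = (0..col.keys.len()).collect();
    // `sort_by_key` is stable, so rows with equal keys keep their input order.
    indices.sort_by_key(|&i| col.keys[i]);
    indices
}

/// Checks that, in the given row order, each distinct value forms exactly
/// one contiguous run. This is the property PARTITION BY needs.
fn is_partitioned(col: &DictColumn, order: &[usize]) -> bool {
    let mut seen: HashSet<&str> = HashSet::new();
    let mut prev: Option<&str> = None;
    for &i in order {
        let value = col.values[col.keys[i] as usize];
        if prev != Some(value) {
            // A new run starts; the value must not have appeared before.
            if !seen.insert(value) {
                return false;
            }
            prev = Some(value);
        }
    }
    true
}
```

With an unsorted dictionary such as `["pear", "apple", "fig"]`, the key-sorted order is not lexicographic, but equal strings still end up adjacent, which is all the partitioning step requires.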

Describe the solution you'd like

Add a flag assume_sorted_dictionary to SortOptions. In sort_to_indices, this flag would be used in the branch for dictionary types: if it is set, the keys are sorted as a primitive array. The same distinction also needs to be implemented in build_compare for the lexsort_to_indices kernel.
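
A rough sketch of how the proposed branch might look, written against plain std types rather than the real kernel. `SortOptionsSketch`, `DictColumn` and `sort_to_indices_sketch` are hypothetical stand-ins; only the flag name `assume_sorted_dictionary` comes from the proposal above, and the null handling here is a simplification.

```rust
/// Hypothetical mirror of the sort options, plus the proposed flag.
#[derive(Clone, Copy, Default)]
struct SortOptionsSketch {
    descending: bool,
    nulls_first: bool,
    /// Proposed: trust that key order matches value order.
    assume_sorted_dictionary: bool,
}

/// Hypothetical dictionary column; a `None` key models a null row.
struct DictColumn {
    keys: Vec<Option<u32>>,
    values: Vec<String>,
}

fn sort_to_indices_sketch(col: &DictColumn, options: SortOptionsSketch) -> Vec<usize> {
    // Separate valid rows from null rows up front.
    let (mut valid, nulls): (Vec<usize>, Vec<usize>) =
        (0..col.keys.len()).partition(|&i| col.keys[i].is_some());

    if options.assume_sorted_dictionary {
        // Fast path: compare the integer keys, as for a primitive array.
        valid.sort_by_key(|&i| col.keys[i]);
    } else {
        // Default path: look up and compare the dictionary values.
        valid.sort_by(|&a, &b| {
            let va = &col.values[col.keys[a].unwrap() as usize];
            let vb = &col.values[col.keys[b].unwrap() as usize];
            va.cmp(vb)
        });
    }
    if options.descending {
        // Reversing also reverses the relative order of equal rows.
        valid.reverse();
    }
    if options.nulls_first {
        nulls.into_iter().chain(valid).collect()
    } else {
        valid.into_iter().chain(nulls).collect()
    }
}
```

When the dictionary really is sorted, both paths return the same indices; the flag only changes how much work each comparison does. If the guarantee does not hold, the fast path yields an order that is grouped by key but not sorted by value.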

Additional context
Once this is implemented, the window function logic in DataFusion could be adjusted to take advantage of it.

Labels: enhancement (any new improvement worthy of an entry in the changelog)
