Skip to content

Push down limit to SortPreservingMergeExec and SortPreservingMergeStream #6000

@jychen7

Description

@jychen7

Is your feature request related to a problem or challenge?

This is separated from #3516 (comment).

On a high level, #3516 implements the green path.

  1. Push down limit to sort #3530 implements sort_batch -> lexsort_to_indices
  2. Use fetch limit in get_sorted_iter #3545 implements get_sorted_iter -> lexsort_to_indices
graph LR;
    SPME[SortPreservingMergeExec]-->spwn_handler-->SE[SortExec];
    SPME[SortPreservingMergeExec]-->streaming_merge-->SPMS[SortPreservingMergeStream];
    SE-->execute-->do_sort-->ExternalSort-->insert_batch-->sort_batch-->lexsort_to_indices;
    insert_batch-->spill-->in_mem_partial_sort-->get_sorted_iter-->lexsort_to_indices;
    ExternalSort-->sort-->streaming_merge;
    sort-->in_mem_partial_sort;

    style SPME color:red
    style streaming_merge color:red
    style SPMS color:red
    style spwn_handler color:green
    style SE color:green
    style execute color:green
    style do_sort color:green
    style ExternalSort color:green
    style insert_batch color:green
    style sort_batch color:green
    style lexsort_to_indices color:green
    style sort color:green
    style in_mem_partial_sort color:green
    style get_sorted_iter color:green
    style spill color:green
Loading

This should reduce memory usage of sort and hope this could help for #5969

Describe the solution you'd like

Push down the limit in red path above, i.e. SortPreservingMergeExec and SortPreservingMergeStream
(don't forget the path sort -> streaming_merge as well)

Describe alternatives you've considered

No response

Additional context

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or requestperformanceMake DataFusion faster

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions