-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Closed
Labels
enhancementNew feature or requestNew feature or requestperformanceMake DataFusion fasterMake DataFusion faster
Description
Is your feature request related to a problem or challenge?
This is separated from #3516 (comment).
On a high level, #3516 implements the green path.
- Push down limit to sort #3530 implements
sort_batch -> lexsort_to_indices - Use
fetchlimit in get_sorted_iter #3545 implementsget_sorted_iter -> lexsort_to_indices
graph LR;
SPME[SortPreservingMergeExec]-->spwn_handler-->SE[SortExec];
SPME[SortPreservingMergeExec]-->streaming_merge-->SPMS[SortPreservingMergeStream];
SE-->execute-->do_sort-->ExternalSort-->insert_batch-->sort_batch-->lexsort_to_indices;
insert_batch-->spill-->in_mem_partial_sort-->get_sorted_iter-->lexsort_to_indices;
ExternalSort-->sort-->streaming_merge;
sort-->in_mem_partial_sort;
style SPME color:red
style streaming_merge color:red
style SPMS color:red
style spwn_handler color:green
style SE color:green
style execute color:green
style do_sort color:green
style ExternalSort color:green
style insert_batch color:green
style sort_batch color:green
style lexsort_to_indices color:green
style sort color:green
style in_mem_partial_sort color:green
style get_sorted_iter color:green
style spill color:green
This should reduce memory usage of sort and hope this could help for #5969
Describe the solution you'd like
Push down the limit in red path above, i.e. SortPreservingMergeExec and SortPreservingMergeStream
(don't forget the path sort -> streaming_merge as well)
Describe alternatives you've considered
No response
Additional context
No response
alamb
Metadata
Metadata
Assignees
Labels
enhancementNew feature or requestNew feature or requestperformanceMake DataFusion fasterMake DataFusion faster