-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
SortPreservingMergeStream currently uses a binary heap for k-way merging, which gets O(NlogS) time complexity (S=num_streams). however, when a top record is taken from the heap, we need to perform a pop/push operation and are likely to take 2*logS comparisons.
an improved way is to use a Tournament Tree (aka Loser Tree) to do the selection. when the top record is taken, the tree structure is not modified, and only the path from bottom to top is visited. the number of comparisons is always logS.
reference: https://en.wikipedia.org/wiki/K-way_merge_algorithm#Tournament_Tree
Describe the solution you'd like
implement the loser tree algorithm in SortPreservingMergeStream.
Describe alternatives you've considered
Additional context
the benchmark shows the merging time is ~50% shorter after applying the tournament tree.