Pbtree: MNode iterating with merge sort upon disk and buffer #12077

Merged (21 commits) on Feb 28, 2024

Conversation

linxt20
Contributor

@linxt20 linxt20 commented Feb 23, 2024

This PR is part of the work on cooperative concurrency control between memory and disk in Pbtree. It changes ChildrenIterator from traversing children in no particular order and deduplicating along the way to merging sorted sequences, which deduplicates during the merge and yields children in order. This simplifies the deduplication logic and gives callers an ordered sequence that is easier to process, which is also expected to improve performance.

Implementation idea: ChildrenIterator iterates over the union of the in-memory node sequence and the on-disk node sequence. In the original implementation, the sequence returned from disk is already in order, so if the memory sequence is also in order, an ordered and duplicate-free result can be produced by merging the two. The memory sequence is itself made up of two sequences, newbuffer and updatebuffer, which must likewise be merged and deduplicated once each of them is in order. Newbuffer and updatebuffer inherit from the same base class, MNodeChildBuffer, which contains a receivingbuffer and a flushingbuffer; after sorting, these two also need to be merged and deduplicated.

For these three merge-and-deduplicate steps, I abstracted a MergeSortIterator class that implements the overall merge logic. The three classes CachedMNodeMergeIterator, BufferIterator, and MNodeChildBufferIterator inherit from this abstract class and override the post-processing functions and the deduplication logic for the different merge cases as needed.
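To make the merge-and-deduplicate idea concrete, here is a minimal sketch of a two-way merge over already-sorted inputs with a hook for equal keys. It is not the MergeSortIterator from this PR: the names SortedMergeSketch, ChildSketch, PreferBufferMerge, and MergeSketchDemo are hypothetical, elements are assumed to be non-null, each input is assumed to be sorted and free of duplicates, and the rule that the buffer entry wins over the disk entry on a tie is an assumption made only for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch: merges two sorted, duplicate-free iterators into one
// sorted, duplicate-free stream. Null elements are not supported.
abstract class SortedMergeSketch<T> implements Iterator<T> {
  private final Iterator<T> left;
  private final Iterator<T> right;
  private T leftHead;   // next unconsumed element of left, or null if exhausted
  private T rightHead;  // next unconsumed element of right, or null if exhausted

  SortedMergeSketch(Iterator<T> left, Iterator<T> right) {
    this.left = left;
    this.right = right;
    this.leftHead = left.hasNext() ? left.next() : null;
    this.rightHead = right.hasNext() ? right.next() : null;
  }

  // Ordering shared by both inputs.
  protected abstract int compare(T l, T r);

  // Deduplication hook: decides what to emit when both heads compare equal.
  protected abstract T onEqual(T l, T r);

  @Override
  public boolean hasNext() {
    return leftHead != null || rightHead != null;
  }

  @Override
  public T next() {
    if (!hasNext()) throw new NoSuchElementException();
    if (rightHead == null) return takeLeft();
    if (leftHead == null) return takeRight();
    int c = compare(leftHead, rightHead);
    if (c < 0) return takeLeft();
    if (c > 0) return takeRight();
    // Equal keys: consume both sides and emit a single element.
    T l = takeLeft();
    T r = takeRight();
    return onEqual(l, r);
  }

  private T takeLeft() {
    T v = leftHead;
    leftHead = left.hasNext() ? left.next() : null;
    return v;
  }

  private T takeRight() {
    T v = rightHead;
    rightHead = right.hasNext() ? right.next() : null;
    return v;
  }
}

// Hypothetical child entry: a name plus the place it came from.
final class ChildSketch {
  final String name;
  final String source;

  ChildSketch(String name, String source) {
    this.name = name;
    this.source = source;
  }
}

// Assumed policy for illustration only: on equal names, keep the left (buffer) entry.
final class PreferBufferMerge extends SortedMergeSketch<ChildSketch> {
  PreferBufferMerge(Iterator<ChildSketch> buffer, Iterator<ChildSketch> disk) {
    super(buffer, disk);
  }

  @Override
  protected int compare(ChildSketch l, ChildSketch r) {
    return l.name.compareTo(r.name);
  }

  @Override
  protected ChildSketch onEqual(ChildSketch l, ChildSketch r) {
    return l;
  }
}

final class MergeSketchDemo {
  // Merges two sorted name lists and reports each result as "name@source".
  static List<String> merge(List<String> bufferNames, List<String> diskNames) {
    Iterator<ChildSketch> it =
        new PreferBufferMerge(tag(bufferNames, "buffer"), tag(diskNames, "disk"));
    List<String> out = new ArrayList<>();
    while (it.hasNext()) {
      ChildSketch c = it.next();
      out.add(c.name + "@" + c.source);
    }
    return out;
  }

  private static Iterator<ChildSketch> tag(List<String> names, String source) {
    List<ChildSketch> tagged = new ArrayList<>();
    for (String n : names) tagged.add(new ChildSketch(n, source));
    return tagged.iterator();
  }
}
```

Calling MergeSketchDemo.merge with buffer names a, c, d and disk names b, c, e yields five entries in name order, with the shared name c emitted once from the buffer side. Because the output of one merge is again a sorted iterator, the same class can be nested, which is one way the three levels described above could be composed.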

MarcosZyk changed the title from "Pbtree work merge sort for disk and memory" to "Pbtree: MNode iterating with merge sort upon disk and buffer" on Feb 25, 2024
MarcosZyk merged commit ba86684 into apache:master on Feb 28, 2024
35 of 36 checks passed
2 participants