New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature] Optimize SortMergeReader: use loser tree to reduce comparisons #833
Conversation
Thanks @liming30 ! |
@JingsongLi hi, I changed the key in the benchmark to a
|
95f4117
to
4b4d8bd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @liming30 !
Thanks for the contribution. It's a clever implementation with state transitions which utilizes several properties of the loser tree. I've gone through the algorithm several times and can't find an error.
However to help maintain the code in the future, I'd like to see a detailed Java docs about this algorithm. The document should include the following:
- What each state represents. The most confusing part to me is that
WINNER_WITH_SAME_KEY
indicates that the leaf wins with the same key as the previous global winner, howeverLOSER_WITH_SAME_KEY
only indicates that the leaf loses in its last local comparison with the same key. - The meaning of
firstSameKeyIndex
, and why can we iterate through allKeyValue
s with the same key by following this pointer. At first I was thinkingfirstSameKeyIndex
formed a linked list but it's not. - A detailed explanation of each step of your implementation. It seems to me that your implementation consists of three steps: initializing, moving all
KeyValue
s with the same smallest key to the top through severaladjust()
, and move them to the top through severaladjust()
again to advance their readers. - The proof of this implementation. The most important part is that why
adjust()
works, and why do we only need to check several, but not all combinations of states in the threeadjustWithXXLoserKey
.
Of course, if you have any reference link for the implementation or proof, you can add them in the Java docs to help for explanation.
paimon-core/src/main/java/org/apache/paimon/mergetree/compact/SortMergeReader.java
Outdated
Show resolved
Hide resolved
This result is exciting! |
@tsreaper Thanks for your help reviewing the code. I created a google doc for the design and implementation of this code, please help review this document. If there is no problem with the content of the document, I can attach the core idea of the document and the link to the document to the comments of the code. Thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add some tests for LoserTree
? There can be some unit cases.
paimon-core/src/main/java/org/apache/paimon/mergetree/compact/SortMergeReader.java
Outdated
Show resolved
Hide resolved
Thanks for your detailed document. I've left some comments in the document, please resolve them. |
@tsreaper @JingsongLi Thanks for your review, related comments have been replied in doc. At the same time, I modified the implementation of part of the code in the new commit to perform state transition from the perspective of the local winner, so that the code is more concise and easy to understand. |
7aa7026
to
c10f440
Compare
paimon-core/src/main/java/org/apache/paimon/mergetree/compact/LoserTree.java
Show resolved
Hide resolved
paimon-core/src/main/java/org/apache/paimon/mergetree/compact/LoserTree.java
Outdated
Show resolved
Hide resolved
paimon-core/src/main/java/org/apache/paimon/mergetree/compact/LoserTree.java
Outdated
Show resolved
Hide resolved
paimon-core/src/main/java/org/apache/paimon/mergetree/compact/LoserTree.java
Outdated
Show resolved
Hide resolved
3dc8302
to
b100f0c
Compare
paimon-core/src/main/java/org/apache/paimon/mergetree/compact/LoserTree.java
Outdated
Show resolved
Hide resolved
paimon-core/src/main/java/org/apache/paimon/mergetree/compact/LoserTree.java
Outdated
Show resolved
Hide resolved
paimon-core/src/main/java/org/apache/paimon/mergetree/compact/LoserTree.java
Show resolved
Hide resolved
@tsreaper @JingsongLi Hi, I have addressed your comments, is there any other problem? |
dc2b30d
to
e3d0cab
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also add an option in CoreOptions
for using loser tree or heap. We can use loser tree by default and users also have the ability to fall back to heap if they encounter any problem.
paimon-core/src/main/java/org/apache/paimon/mergetree/compact/SortMergeReader.java
Outdated
Show resolved
Hide resolved
1efef9e
to
b2a1b84
Compare
@tsreaper added |
paimon-core/src/main/java/org/apache/paimon/mergetree/MergeTreeReaders.java
Outdated
Show resolved
Hide resolved
paimon-core/src/main/java/org/apache/paimon/mergetree/compact/SortMergeReader.java
Outdated
Show resolved
Hide resolved
...on-core/src/test/java/org/apache/paimon/mergetree/compact/CombiningRecordReaderTestBase.java
Outdated
Show resolved
Hide resolved
paimon-core/src/test/java/org/apache/paimon/mergetree/MergeTreeTest.java
Outdated
Show resolved
Hide resolved
e75ecf3
to
f8d6612
Compare
f8d6612
to
e3e9f10
Compare
Purpose
[Feature] Optimize SortMergeReader: use loser tree to reduce comparisons (#741)
Tests
SortMergeReaderTestBase
already exists to verify the correctness ofSortMergeReader
.From the benchmark results, there is almost no difference between
loser-tree
andmin-heap
whenRecordReader
/data volume
is small. With the increase ofRecordReader
/data volume
, there can be about 10% performance improvement.API and Format
No
Documentation
A variant implementation of loser tree is introduced to optimize the comparison times of
SortMergeReader
.Different from the traditional loser tree, since
RecordReader
andMergeFunction
may reuse objects, we can iterate data forward forRecordReader
only after all the same keys in the entire tree are processed.