HBASE-27329 Introduce prefix tree index block encoding use less space #4782
base: master
💔 -1 overall
This message was automatically generated.
🎊 +1 overall
This message was automatically generated.
The trie implementation is not very memory efficient. Do we use it only when serializing and deserializing, or also when reading?
throw new IllegalStateException("Unexpected unable to find index=" + index);
}

public static class TokenizerNode {
I think the memory overhead is a bit large here. Would it be possible to use a double-array trie instead?
This is only needed when serializing; it is not used when reading.
Then what is the IndexEncodedSeeker used for?
IndexEncodedSeeker is used for reading, and it reads the ByteBuffer data directly.
OK, so TokenizerNode is used only to construct the trie and serialize it, and when reading we use the compact format and read the ByteBuffer directly instead of deserializing it into TokenizerNode objects? Please add more comments so that others who are not familiar with the code can understand this.
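To illustrate the split discussed above, here is a minimal sketch of a write-side trie that is serialized once and then searched directly in its serialized bytes. It is not the PR's code: `TrieIndexSketch`, `Node`, the byte layout, and the exact-match `lookup` are all assumptions made for illustration, and the sketch uses a plain uncompressed trie rather than the PR's tokenized format.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.TreeMap;

// Hypothetical sketch, not the PR's classes.
class TrieIndexSketch {

  /** Write-side only: heap-heavy node, discarded once the trie is serialized. */
  static final class Node {
    final TreeMap<Byte, Node> children = new TreeMap<>();
    int index = -1; // position of the key in the sorted input, -1 if no key ends here
  }

  static Node build(String[] sortedKeys) {
    Node root = new Node();
    for (int i = 0; i < sortedKeys.length; i++) {
      Node cur = root;
      for (byte b : sortedKeys[i].getBytes(StandardCharsets.UTF_8)) {
        cur = cur.children.computeIfAbsent(b, k -> new Node());
      }
      cur.index = i;
    }
    return root;
  }

  /** Assumed layout per node: int index, int childCount, then (byte label, int childOffset) per child. */
  static ByteBuffer serialize(Node root) {
    ByteBuffer buf = ByteBuffer.allocate(4 + size(root));
    buf.position(4); // first 4 bytes are reserved for the root offset
    int rootOffset = write(root, buf);
    buf.putInt(0, rootOffset);
    buf.flip();
    return buf.asReadOnlyBuffer();
  }

  private static int size(Node n) {
    int s = 8 + 5 * n.children.size();
    for (Node c : n.children.values()) {
      s += size(c);
    }
    return s;
  }

  // Post-order write: children first, so their offsets are known when the parent is written.
  private static int write(Node n, ByteBuffer buf) {
    int[] childOffsets = new int[n.children.size()];
    int i = 0;
    for (Node c : n.children.values()) {
      childOffsets[i++] = write(c, buf);
    }
    int offset = buf.position();
    buf.putInt(n.index).putInt(n.children.size());
    i = 0;
    for (Byte label : n.children.keySet()) {
      buf.put(label.byteValue()).putInt(childOffsets[i++]);
    }
    return offset;
  }

  /** Read side: walks the serialized bytes with absolute gets; no Node objects are created. */
  static int lookup(ByteBuffer buf, String key) {
    int node = buf.getInt(0);
    for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
      int count = buf.getInt(node + 4);
      int next = -1;
      for (int i = 0; i < count; i++) {
        int entry = node + 8 + 5 * i;
        if (buf.get(entry) == b) {
          next = buf.getInt(entry + 1);
          break;
        }
      }
      if (next < 0) {
        return -1; // no child with this label
      }
      node = next;
    }
    return buf.getInt(node);
  }

  public static void main(String[] args) {
    String[] keys = {"00c7-202206201521", "00c7-202206201521-wx08", "00c7-202206201521-wx0c"};
    ByteBuffer buf = serialize(build(keys));
    System.out.println(lookup(buf, "00c7-202206201521-wx08"));
  }
}
```

A real index seeker would need a floor search (the last entry whose key is less than or equal to the search key) rather than the exact match shown here; the sketch only demonstrates that the read path can stay on the ByteBuffer.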
ok.
The first version is simple: the index block contains only each cell's row, so all cells of a row must be stored in one HFileBlock. Version 2 supports row + qualifier + timestamp + type, which is consistent with the current behavior, so the cells of one row can be stored across multiple HFileBlocks.
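The difference between the two versions can be sketched with a hypothetical comparison of block first keys. `IndexKeySketch`, `FirstKey`, and both comparators are illustrative names, not classes from the PR; the ordering (newer timestamps first, higher type codes first) is an assumption modeled on HBase's usual cell ordering, and `String` comparison stands in for byte-wise comparison.

```java
import java.util.Comparator;

// Hypothetical sketch, not the PR's classes.
class IndexKeySketch {

  /** Stand-in for the first cell of a data block. */
  static final class FirstKey {
    final String row;
    final String qualifier;
    final long timestamp;
    final byte type;

    FirstKey(String row, String qualifier, long timestamp, byte type) {
      this.row = row;
      this.qualifier = qualifier;
      this.timestamp = timestamp;
      this.type = type;
    }
  }

  // Version-1 style: only the row takes part in the index key.
  static final Comparator<FirstKey> ROW_ONLY =
      Comparator.comparing((FirstKey k) -> k.row);

  // Version-2 style: row + qualifier + timestamp + type.
  // Assumption: newer timestamps and higher type codes sort first.
  static final Comparator<FirstKey> FULL_KEY =
      Comparator.comparing((FirstKey k) -> k.row)
          .thenComparing((FirstKey k) -> k.qualifier)
          .thenComparing(Comparator.comparingLong((FirstKey k) -> k.timestamp).reversed())
          .thenComparing(Comparator.comparingInt((FirstKey k) -> k.type & 0xff).reversed());

  public static void main(String[] args) {
    // Two blocks whose first cells belong to the same row.
    FirstKey blockA = new FirstKey("row1", "a", 100L, (byte) 4);
    FirstKey blockB = new FirstKey("row1", "m", 100L, (byte) 4);
    System.out.println("row only: " + ROW_ONLY.compare(blockA, blockB));
    System.out.println("full key: " + Integer.signum(FULL_KEY.compare(blockA, blockB)));
  }
}
```

With the row-only key the two blocks compare as equal, so the index cannot tell which block holds a given cell of that row; this is why version 1 requires all cells of a row to sit in one HFileBlock.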
import org.apache.yetus.audience.InterfaceAudience;

@InterfaceAudience.Private
public class PrefixTreeIndexBlockEncoderV2 implements IndexBlockEncoder {
Why are there 2 versions of this encoder? What's the difference? When to use one vs the other?
A simple implementation that considers only the cell row, not the cell qualifier.
00c7-202206201519-wx0t
00c7-202206201519-wx0zcldi7lnsiyas-N
00c7-202206201520-wx0re
00c7-202206201520-wx0ulgrwi7d542tm-N
00c7-202206201520-wx0x7
00c7-202206201521
00c7-202206201521-wx05xfbtw2mopyhs-C
00c7-202206201521-wx08
00c7-202206201521-wx0c
00c7-202206201521-wx0go
00c7-202206201522-wx0t
00c8-202206200751-wx0ah4gnbwptdyna-F
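As a rough way to see how much a prefix tree can save on row keys like those listed above, the following sketch measures how many bytes each key shares with the previous key in sorted order. `SharedPrefixSketch` and its methods are hypothetical helpers, not part of the PR. For sorted keys in a plain byte-level trie, the shared bytes are exactly the bytes that do not need a new node; the sketch ignores per-node overhead, so it is only an estimate.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not part of the PR.
class SharedPrefixSketch {

  static int commonPrefixLength(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    int i = 0;
    while (i < n && a[i] == b[i]) {
      i++;
    }
    return i;
  }

  /** Returns {totalBytes, bytesSharedWithPreviousKey} for keys given in sorted order. */
  static long[] measure(String[] sortedKeys) {
    long total = 0;
    long shared = 0;
    byte[] prev = new byte[0];
    for (String k : sortedKeys) {
      byte[] cur = k.getBytes(StandardCharsets.UTF_8);
      total += cur.length;
      shared += commonPrefixLength(prev, cur);
      prev = cur;
    }
    return new long[] {total, shared};
  }

  public static void main(String[] args) {
    String[] keys = {
      "00c7-202206201519-wx0t",
      "00c7-202206201519-wx0zcldi7lnsiyas-N",
      "00c7-202206201520-wx0re",
      "00c7-202206201520-wx0ulgrwi7d542tm-N",
      "00c7-202206201520-wx0x7",
      "00c7-202206201521",
      "00c7-202206201521-wx05xfbtw2mopyhs-C",
      "00c7-202206201521-wx08",
      "00c7-202206201521-wx0c",
      "00c7-202206201521-wx0go",
      "00c7-202206201522-wx0t",
      "00c8-202206200751-wx0ah4gnbwptdyna-F"
    };
    long[] m = measure(keys);
    System.out.println("total bytes = " + m[0] + ", shared with previous key = " + m[1]);
  }
}
```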
The prefix tree node is like this: