New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enhance VarByteChunkSVForwardIndexReader to directly read from data buffer for uncompressed data #5816
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we cover both compressed/uncompressed cases in the unit test?
LGTM otherwise.
Yes, |
int nextRowOffset; | ||
private int getValueEndOffset(int rowId, ByteBuffer chunkBuffer) { | ||
if (rowId == _numDocsPerChunk - 1) { | ||
// Last row in the trunk |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo; trunk -> chunk
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch
/** | ||
* Helper method to compute the end offset of the value in the data buffer. | ||
*/ | ||
private long getValueEndOffset(int chunkId, int chunkRowId, long chunkStartOffset) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the algorithm for getting endoffset or startOffset for next row is different for uncompressed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Algorithm is slightly different because with the chunkBuffer
we can directly get the chunkEndOffset
via chunkBuffer.limit()
, which is not the case for the uncompressed one. That is why we have a branch on the last chunk.
…uffer for uncompressed data
d6deb16
to
2d12ff6
Compare
Description
Currently for var-byte raw index, we always pre-allocate the chunk buffer even for uncompressed data (the buffer could be huge if the index contains large size value). When reading values, we first copy the data into the chunk buffer, then read from the buffer. This could cause unnecessary overhead on copying the data as well as allocating the direct memory, and can even cause OOM if the buffer size is too big. The chunk buffer is needed for compressed data in order to decompress it, but not necessary for uncompressed data as we can directly read from the data buffer.
This PR enhances the VarByteChunkSVForwardIndexReader to directly read from the data buffer for uncompressed data, and avoid the overhead of the chunk buffer.