Integrate vectorized bit-unpacking#5548
Integrate vectorized bit-unpacking#5548siddharthteotia wants to merge 1 commit intoapache:masterfrom
Conversation
| * @param length length | ||
| * @param out out array to store the unpacked integers | ||
| */ | ||
| void readInt(long startIndex, int length, int[] out); |
There was a problem hiding this comment.
(nit) Rename the third argument to buffer?
| * @param length length | ||
| * @param buffer array of integers to encode | ||
| */ | ||
| void writeInt(int startIndex, int length, int[] buffer); |
There was a problem hiding this comment.
(nit) Rename the third argument to values as it is used as the input?
| } | ||
| } | ||
|
|
||
| // Helper functions used by multi-value reader and writer |
There was a problem hiding this comment.
Let's move these util methods into a separate util class.
| private final PinotDataBitSet _bitmapReader; | ||
| private final FixedBitIntReaderWriter _rawDataReader; | ||
| private final PinotDataBuffer _headerBitmapBuffer; | ||
| private final FixedBitIntReaderWriterV2 _rawDataReader; |
There was a problem hiding this comment.
I would suggest removing FixedBitIntReaderWriter and FixedBitIntReaderWriterV2 and directly use PinotBitSet. There is no value added to this extra layer.
|
|
||
| public final class FixedBitSingleValueReader extends BaseSingleColumnSingleValueReader { | ||
| private final FixedBitIntReaderWriter _reader; | ||
| private final FixedBitIntReaderWriterV2 _reader; |
There was a problem hiding this comment.
Directly use PinotBitSet, and move the logic of bulk read in FixedBitIntReaderWriterV2 to this class.
| } | ||
|
|
||
| public static class Bit1Encoded extends PinotDataBitSetV2 { | ||
| // grab a final local reference to avoid |
There was a problem hiding this comment.
Do you observe performance degradation on this? I think hotspot can definitely handle this. Keeping an extra reference seems like over-optimization to me.
| @Override | ||
| public void close() | ||
| throws IOException { | ||
| public void close() { |
There was a problem hiding this comment.
Move this to BasePinotBitSet?
| // NOTE: DO NOT close the PinotDataBuffer here because it is tracked by the caller and might be reused later. The | ||
| // caller is responsible of closing the PinotDataBuffer. | ||
| } | ||
| } No newline at end of file |
| import org.apache.pinot.core.segment.memory.PinotDataBuffer; | ||
|
|
||
|
|
||
| public abstract class BasePinotBitSet implements PinotBitSet { |
|
@siddharthteotia : I was going through some encoding formats and had run into |
Description
In PR #5409, we implemented efficient vectorized bit unpacking reader (no format change, just new unpack algorithms) for dictionary encoded bit-compressed forward index.
This PR starts the integration process which is divided into multiple parts
Refer to PR #5409 for performance numbers when vectorized bit unpacking was implemented. Here is the latest after wiring:
Upgrade Notes
Does this PR prevent a zero down-time upgrade? (Assume upgrade order: Controller, Broker, Server, Minion)
No
Does this PR fix a zero-downtime upgrade introduced earlier?
No
Does this PR otherwise need attention when creating release notes? Things to consider:
No
Release Notes
None
Documentation
If you have introduced a new feature or configuration, please add it to the documentation as well.
See https://docs.pinot.apache.org/developers/developers-and-contributors/update-document