
Integrate vectorized bit-unpacking #5548

Open

siddharthteotia wants to merge 1 commit into apache:master from siddharthteotia:vectorized-part1
Conversation


@siddharthteotia siddharthteotia commented Jun 12, 2020

Description

In PR #5409, we implemented an efficient vectorized bit-unpacking reader (no format change, just new unpacking algorithms) for the dictionary-encoded bit-compressed forward index.
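For readers unfamiliar with the encoding, here is a minimal scalar sketch of what fixed-bit unpacking means: each dictionary id is stored in exactly `numBits` bits, packed MSB-first into a byte array, and reads must extract values that may straddle byte boundaries. The class and method names below are illustrative only, not Pinot's actual implementation (which operates on a `PinotDataBuffer` and adds the vectorized bulk paths):

```java
// Illustrative sketch of scalar fixed-bit unpacking (not Pinot's actual code).
// Values are packed MSB-first; a value may span multiple bytes.
public final class BitUnpackSketch {

  // Returns the index-th numBits-wide unsigned value from the packed array.
  public static int readInt(byte[] data, int numBits, int index) {
    long bitOffset = (long) index * numBits;
    int byteOffset = (int) (bitOffset >>> 3);
    int bitInByte = (int) (bitOffset & 7);
    int value = 0;
    int bitsRemaining = numBits;
    while (bitsRemaining > 0) {
      // How many of the remaining bits live in the current byte.
      int bitsInThisByte = Math.min(8 - bitInByte, bitsRemaining);
      int shift = 8 - bitInByte - bitsInThisByte;
      int mask = (1 << bitsInThisByte) - 1;
      value = (value << bitsInThisByte) | (((data[byteOffset] & 0xFF) >>> shift) & mask);
      bitsRemaining -= bitsInThisByte;
      byteOffset++;
      bitInByte = 0;
    }
    return value;
  }

  public static void main(String[] args) {
    // 2-bit values 0,1,2,3 packed MSB-first into one byte: 0b00_01_10_11 = 0x1B
    byte[] packed = {0x1B};
    for (int i = 0; i < 4; i++) {
      System.out.print(readInt(packed, 2, i) + " ");
    }
    // prints: 0 1 2 3
  }
}
```

The per-value shift/mask work above is exactly the overhead that the vectorized bulk-unpack algorithms amortize across a batch of values.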

This PR starts the integration process, which is divided into multiple parts:

  • Part 1: Use the new reader for the SV and MV readers for power-of-2 bit encodings, and fall back to the older reader for non-power-of-2 encodings. Since the format hasn't changed, we can leverage the new reader on existing formats/segments. This PR is Part 1.
  • Part 2: Part 2 can be divided as follows:
    • Part 2.1: Create a new writer that going forward uses LITTLE ENDIAN format. We need to move all our storage structures to native byte order (LE on pretty much all systems). See PR WIP: Support LITTLE ENDIAN indexes #5511. This new writer can be a good starting point for moving to the LE format.
    • Part 2.2: The new writer can choose to support only power-of-2 bit encodings, so for new segments going forward we would round up to the nearest power of 2. The alternative is to support non-power-of-2 widths up to 16 bits and round up to 32 from there on. Either way, there is no impact on existing segments, due to the fallback implemented in Part 1 (this PR).
  • Part 3: Change the scan code to start using the bulk API to fetch dictIds for predicate evaluation. The scan code will already benefit from the new efficient single-read API, but we can leverage the bulk contiguous API too.
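The Part 1 fallback and the Part 2.2 rounding option boil down to two small bit tricks. The sketch below uses hypothetical names (`BitWidthDispatch`, `chooseReader`); it is not Pinot's actual dispatch code, just the shape of the idea:

```java
// Hypothetical sketch of the Part 1 dispatch and the Part 2.2 rounding option.
// Class and method names are illustrative, not Pinot's actual classes.
public final class BitWidthDispatch {

  // A power of 2 has exactly one bit set, so n & (n - 1) clears it to zero.
  public static boolean isPowerOfTwo(int numBits) {
    return numBits > 0 && (numBits & (numBits - 1)) == 0;
  }

  // Part 2.2 option: round a required bit width up to the nearest power of 2.
  public static int roundUpToPowerOfTwo(int numBits) {
    int highest = Integer.highestOneBit(numBits);
    return highest == numBits ? numBits : highest << 1;
  }

  // Part 1: vectorized reader only for power-of-2 widths, legacy otherwise.
  public static String chooseReader(int numBits) {
    return isPowerOfTwo(numBits) ? "vectorized" : "legacy";
  }

  public static void main(String[] args) {
    System.out.println(chooseReader(4));        // vectorized
    System.out.println(chooseReader(3));        // legacy (fallback)
    System.out.println(roundUpToPowerOfTwo(5)); // 8
  }
}
```

Because the fallback is purely a read-time decision, existing segments with, say, 3-bit or 5-bit encodings keep working unchanged.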

Refer to PR #5409 for the performance numbers gathered when vectorized bit unpacking was implemented. Here are the latest numbers after wiring (the Fast variants use the new vectorized reader; note that the WithGaps benchmarks report ops/ms, where higher is better, while the rest report ms/op, where lower is better):

| Benchmark | Score | Unit |
| --- | --- | --- |
| BenchmarkPinotDataBitSet.bit2Contiguous | 33.166 | ms/op |
| BenchmarkPinotDataBitSet.bit2ContiguousFast | 18.205 | ms/op |
| BenchmarkPinotDataBitSet.bit4Contiguous | 29 | ms/op |
| BenchmarkPinotDataBitSet.bit4ContiguousFast | 15.898 | ms/op |
| BenchmarkPinotDataBitSet.bit8Contiguous | 30.236 | ms/op |
| BenchmarkPinotDataBitSet.bit8ContiguousFast | 16.702 | ms/op |
| BenchmarkPinotDataBitSet.bit16Contiguous | 52.347 | ms/op |
| BenchmarkPinotDataBitSet.bit16ContiguousFast | 19.708 | ms/op |
| BenchmarkPinotDataBitSet.bit32Contiguous | 122.751 | ms/op |
| BenchmarkPinotDataBitSet.bit32ContiguousFast | 16.287 | ms/op |
| BenchmarkPinotDataBitSet.bit2BulkContiguous | 30.333 | ms/op |
| BenchmarkPinotDataBitSet.bit2BulkContiguousFast | 14.306 | ms/op |
| BenchmarkPinotDataBitSet.bit4BulkContiguous | 31.881 | ms/op |
| BenchmarkPinotDataBitSet.bit4BulkContiguousFast | 15.687 | ms/op |
| BenchmarkPinotDataBitSet.bit8BulkContiguous | 38.648 | ms/op |
| BenchmarkPinotDataBitSet.bit8BulkContiguousFast | 17.252 | ms/op |
| BenchmarkPinotDataBitSet.bit16BulkContiguous | 63.883 | ms/op |
| BenchmarkPinotDataBitSet.bit16BulkContiguousFast | 18.757 | ms/op |
| BenchmarkPinotDataBitSet.bit32BulkContiguous | 137.234 | ms/op |
| BenchmarkPinotDataBitSet.bit32BulkContiguousFast | 25.108 | ms/op |
| BenchmarkPinotDataBitSet.bit2BulkContiguousUnaligned | 37.952 | ms/op |
| BenchmarkPinotDataBitSet.bit2BulkContiguousUnalignedFast | 14.289 | ms/op |
| BenchmarkPinotDataBitSet.bit4BulkContiguousUnaligned | 37.921 | ms/op |
| BenchmarkPinotDataBitSet.bit4BulkContiguousUnalignedFast | 16.385 | ms/op |
| BenchmarkPinotDataBitSet.bit8BulkContiguousUnaligned | 38.333 | ms/op |
| BenchmarkPinotDataBitSet.bit8BulkContiguousUnalignedFast | 14.872 | ms/op |
| BenchmarkPinotDataBitSet.bit16BulkContiguousUnaligned | 58.655 | ms/op |
| BenchmarkPinotDataBitSet.bit16BulkContiguousUnalignedFast | 18.367 | ms/op |
| BenchmarkPinotDataBitSet.bit2BulkWithGaps | 39.505 | ops/ms |
| BenchmarkPinotDataBitSet.bit2BulkWithGapsFast | 46.261 | ops/ms |
| BenchmarkPinotDataBitSet.bit4BulkWithGaps | 42.074 | ops/ms |
| BenchmarkPinotDataBitSet.bit4BulkWithGapsFast | 48.871 | ops/ms |
| BenchmarkPinotDataBitSet.bit8BulkWithGaps | 41.416 | ops/ms |
| BenchmarkPinotDataBitSet.bit8BulkWithGapsFast | 52.018 | ops/ms |
| BenchmarkPinotDataBitSet.bit16BulkWithGaps | 31.613 | ops/ms |
| BenchmarkPinotDataBitSet.bit16BulkWithGapsFast | 47.611 | ops/ms |
| BenchmarkPinotDataBitSet.bit32BulkWithGaps | 18.545 | ops/ms |
| BenchmarkPinotDataBitSet.bit32BulkWithGapsFast | 38.503 | ops/ms |
| BenchmarkPinotDataBitSet.bit2BulkWithSparseGaps | 44.544 | ops/ms |
| BenchmarkPinotDataBitSet.bit2BulkWithSparseGapsFast | 53.478 | ops/ms |
| BenchmarkPinotDataBitSet.bit4BulkWithSparseGaps | 44.629 | ops/ms |
| BenchmarkPinotDataBitSet.bit4BulkWithSparseGapsFast | 64.09 | ops/ms |
| BenchmarkPinotDataBitSet.bit8BulkWithSparseGaps | 41.044 | ops/ms |
| BenchmarkPinotDataBitSet.bit8BulkWithSparseGapsFast | 65.554 | ops/ms |
| BenchmarkPinotDataBitSet.bit16BulkWithSparseGaps | 31.628 | ops/ms |
| BenchmarkPinotDataBitSet.bit16BulkWithSparseGapsFast | 56.263 | ops/ms |
| BenchmarkPinotDataBitSet.bit32BulkWithSparseGaps | 31.04 | ops/ms |
| BenchmarkPinotDataBitSet.bit32BulkWithSparseGapsFast | 57.947 | ops/ms |

Upgrade Notes

Does this PR prevent a zero-downtime upgrade? (Assume upgrade order: Controller, Broker, Server, Minion)
No

Does this PR fix a zero-downtime upgrade problem introduced earlier?
No

Does this PR otherwise need attention when creating release notes?
No

Release Notes

None

Documentation

If you have introduced a new feature or configuration, please add it to the documentation as well.
See https://docs.pinot.apache.org/developers/developers-and-contributors/update-document

```java
 * @param length length
 * @param out out array to store the unpacked integers
 */
void readInt(long startIndex, int length, int[] out);
```
(nit) Rename the third argument to buffer?

```java
 * @param length length
 * @param buffer array of integers to encode
 */
void writeInt(int startIndex, int length, int[] buffer);
```
(nit) Rename the third argument to values as it is used as the input?

```java
  }
}

// Helper functions used by multi-value reader and writer
```
Let's move these util methods into a separate util class.

```diff
  private final PinotDataBitSet _bitmapReader;
- private final FixedBitIntReaderWriter _rawDataReader;
  private final PinotDataBuffer _headerBitmapBuffer;
+ private final FixedBitIntReaderWriterV2 _rawDataReader;
```
I would suggest removing FixedBitIntReaderWriter and FixedBitIntReaderWriterV2 and directly using PinotBitSet. This extra layer adds no value.


```diff
  public final class FixedBitSingleValueReader extends BaseSingleColumnSingleValueReader {
-   private final FixedBitIntReaderWriter _reader;
+   private final FixedBitIntReaderWriterV2 _reader;
```
Directly use PinotBitSet, and move the logic of bulk read in FixedBitIntReaderWriterV2 to this class.

```java
}

public static class Bit1Encoded extends PinotDataBitSetV2 {
  // grab a final local reference to avoid
```
Do you observe performance degradation on this? I think hotspot can definitely handle this. Keeping an extra reference seems like over-optimization to me.
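For context, the pattern under discussion copies a field into a final local variable before a hot loop, on the theory that the JIT then need not re-prove the field is stable across iterations. The sketch below uses made-up names (`LocalRefSketch`, `Unpacker`) purely to show the shape; as the reviewer notes, HotSpot typically hoists such loads itself, so the benefit should be measured rather than assumed.

```java
// Sketch of the "final local reference" pattern (hypothetical names).
public final class LocalRefSketch {
  interface Unpacker {
    int unpack(int index);
  }

  private final Unpacker _unpacker;

  LocalRefSketch(Unpacker unpacker) {
    _unpacker = unpacker;
  }

  public long sum(int count) {
    // Copy the field into a final local before the hot loop.
    final Unpacker unpacker = _unpacker;
    long total = 0;
    for (int i = 0; i < count; i++) {
      total += unpacker.unpack(i);
    }
    return total;
  }

  public static void main(String[] args) {
    LocalRefSketch s = new LocalRefSketch(i -> i);
    System.out.println(s.sum(5)); // 0+1+2+3+4 = 10
  }
}
```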

```diff
  @Override
- public void close()
-     throws IOException {
+ public void close() {
```
Move this to BasePinotBitSet?

```java
  // NOTE: DO NOT close the PinotDataBuffer here because it is tracked by the caller and might be reused later. The
  // caller is responsible of closing the PinotDataBuffer.
  }
}
// (no newline at end of file)
```
(nit) new line

```java
import org.apache.pinot.core.segment.memory.PinotDataBuffer;


public abstract class BasePinotBitSet implements PinotBitSet {
```
Javadoc?

@ankitsultana

@siddharthteotia : I was going through some encoding formats and had run into FixedBitIntReaderWriterV2. Looks like it's not being used yet. Was curious: did we not see any perf improvements with this?
