Add MV raw forward index and MV BYTES data type #7595
Codecov Report
@@ Coverage Diff @@
## master #7595 +/- ##
=============================================
- Coverage 71.59% 31.01% -40.59%
=============================================
Files 1559 1553 -6
Lines 79025 79022 -3
Branches 11702 11710 +8
=============================================
- Hits 56579 24508 -32071
- Misses 18639 52417 +33778
+ Partials 3807 2097 -1710
This PR has a conflict with #7604. We need to figure out the sequencing of these two (duplicate commits for the FWD index).
Can the other PR wait for this and then rebase? The work done here is intended to prevent OOM, and the common commits can't be merged without the rest of this PR.
I had to force derivation of
When there is a very large row (> 1MB) we end up with 1 doc per chunk in the segment. The only good solution is to evolve the forward index format to allow variable numbers of docs per chunk for variable-length data, but we can do that later if this becomes a problem.
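To make the sizing trade-off concrete, here is a minimal sketch of how the number of docs per chunk could be derived from the largest row. The class name, the method name, and the 1MB target are assumptions for illustration, not the code in this PR.

```java
// Hypothetical sketch: names and the 1MB target are assumptions, not the PR's code.
final class ChunkSizing {
  // Assumed target chunk size; the comment above only implies a threshold of about 1MB.
  static final int TARGET_MAX_CHUNK_SIZE = 1024 * 1024;

  /** Returns a docs-per-chunk count such that a chunk of the largest rows stays near the target size. */
  public static int deriveNumDocsPerChunk(int maxRowLengthInBytes) {
    // Guard against zero-length rows, and never go below one doc per chunk.
    int safeRowLength = Math.max(maxRowLengthInBytes, 1);
    return Math.max(TARGET_MAX_CHUNK_SIZE / safeRowLength, 1);
  }

  public static void main(String[] args) {
    System.out.println(deriveNumDocsPerChunk(1024));            // 1024
    System.out.println(deriveNumDocsPerChunk(2 * 1024 * 1024)); // 1
  }
}
```

With a fixed number of docs per chunk, a single row above the target forces every chunk in the column down to one doc, which is the degenerate case described above.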
byte[] bytes = new byte[Integer.BYTES
    + values.length * Integer.BYTES]; //numValues, bytes required to store the content
ByteBuffer byteBuffer = ByteBuffer.wrap(bytes);
//write the length
byteBuffer.putInt(values.length);
//write the content of each element
for (final int value : values) {
  byteBuffer.putInt(value);
}
_indexWriter.putBytes(bytes);
This should not require allocation of a temporary buffer; it could just be implemented as an MV pattern on _indexWriter, just as was done to eliminate the much larger buffers for byte[][] and String[].
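As a rough illustration of that suggestion, the sketch below writes the value count and the values straight into a chunk buffer instead of building a per-row byte[] first. MvIntChunkWriter and putIntMV are hypothetical names, and a real writer would also need to track offsets and flush and compress full chunks, which is omitted here.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: not the actual _indexWriter API in this PR.
final class MvIntChunkWriter {
  private final ByteBuffer _chunkBuffer;

  MvIntChunkWriter(int chunkSizeInBytes) {
    _chunkBuffer = ByteBuffer.allocate(chunkSizeInBytes);
  }

  /** Writes [numValues][value0][value1]... directly into the chunk buffer, with no temporary byte[]. */
  public void putIntMV(int[] values) {
    // Write the number of values first.
    _chunkBuffer.putInt(values.length);
    // Then write each element in order.
    for (int value : values) {
      _chunkBuffer.putInt(value);
    }
  }

  /** Number of bytes written to the chunk so far. */
  public int bytesWritten() {
    return _chunkBuffer.position();
  }
}
```

The bytes written are the same as in the snippet under review; only the intermediate allocation per row goes away.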
Minor comments, lgtm otherwise.
The reader for fixed-length MV is not implemented
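For reference, a reader for the fixed-length layout written by the creator snippet earlier in this thread (a value count followed by the values) might look roughly like the following. The class and method names are hypothetical, and this sketch decodes one already-extracted row rather than reading from a chunked index file.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: decodes a single row laid out as [numValues][value0][value1]...
final class FixedByteMvRowDecoder {
  /** Fills valueBuffer with the row's values and returns how many were read. */
  public static int readIntMV(byte[] row, int[] valueBuffer) {
    ByteBuffer buffer = ByteBuffer.wrap(row);
    // The first int is the number of values in the row.
    int numValues = buffer.getInt();
    for (int i = 0; i < numValues; i++) {
      valueBuffer[i] = buffer.getInt();
    }
    return numValues;
  }
}
```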
    throws IOException {
  File file = new File(baseIndexDir,
      column + Indexes.RAW_MV_FORWARD_INDEX_FILE_EXTENSION);
  FileUtils.deleteQuietly(file);
(nit) unnecessary?
@kishoreg can you explain why you included this?
int length = value.length();
_minLength = Math.min(_minLength, length);
_maxLength = Math.max(_maxLength, length);
rowLength += length;
Should we count the actual encoded bytes length? We need to add (1 + length) integers to this. Same for STRING type.
That seems wrong to me. This is the length of the data, and it's not known here whether it would be length-prefixed (+4) or null-terminated (+1); adding either overhead would rule out the other.
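A small sketch of the two overheads mentioned, to show why the statistic is easier to reuse when it records only the raw data length. The class and helper names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: shows the per-value overhead of the two encodings discussed.
final class EncodedLength {
  /** Size when the value is stored with a 4-byte length prefix. */
  public static int lengthPrefixed(byte[] data) {
    return Integer.BYTES + data.length;
  }

  /** Size when the value is stored with a single null terminator byte. */
  public static int nullTerminated(byte[] data) {
    return data.length + 1;
  }

  public static void main(String[] args) {
    byte[] data = "pinot".getBytes(StandardCharsets.UTF_8);
    System.out.println(data.length);          // 5: raw data length, independent of encoding
    System.out.println(lengthPrefixed(data)); // 9
    System.out.println(nullTerminated(data)); // 6
  }
}
```

Whichever encoding the writer ends up using can add its own overhead on top of the raw length at the point where the encoding is known.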
  throw new UnsupportedOperationException();
}

default int getFloatMV(int docId, float[] valueBuffer, T context, int[] parentIndices) {
Remove this?
Yes, I hadn't noticed this in the initial commits @kishoreg made.
I will remove it in a follow-up.
* Initial code for MultiValue forward Index
* Wiring in the segment creation driver Impl
* cleanup
* finish off adding BYTES_ARRAY type
* use less memory and fewer passes during encoding
* reduce memory requirement for forwardindexwriter
* track size in bytes of largest row so chunks can be sized to accommodate it
* remove TODOs
* force derivation of number of docs for raw MV columns
* specify character encoding
* leave changes to integration tests to MV TEXT index implementation
* fix javadoc
* don't use StringUtils
* fix formatting after rebase
* fix javadoc formatting again
* use zstd's compress bound

Co-authored-by: kishoreg <g.kishore@gmail.com>
Description
Co-authored with @kishoreg
BYTES_ARRAY type
Upgrade Notes
Does this PR prevent a zero down-time upgrade? (Assume upgrade order: Controller, Broker, Server, Minion)
(If yes, label as backward-incompat, and complete the section below on Release Notes)
Does this PR fix a zero-downtime upgrade introduced earlier?
(If yes, label as backward-incompat, and complete the section below on Release Notes)
Does this PR otherwise need attention when creating release notes?
(If yes, label as release-notes and complete the section on Release Notes)
Release Notes
Documentation