-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Supporting range queries using indexes #5240
Conversation
@@ -27,6 +27,7 @@ | |||
|
|||
public class IndexingConfig extends BaseJsonConfig { | |||
private List<String> _invertedIndexColumns; | |||
private List<String> _rangeIndexColumns; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not make a change to IndexingConfig. Let's start using FieldConfig.
IndexLoadingConfig can then be derived from FieldConfig
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to how we do it for text index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. will have to do this as part of another PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's please add a TODO
_buffer.close(); | ||
} | ||
|
||
public int findRangeId(int value) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a TODO here to indicate that for now we are doing a linear scan to find the corresponding range for a given dictId. If the number of ranges is high, we may want to do binary search?
public ImmutableRoaringBitmap getDocIds(Object value) { | ||
// This should not be called from anywhere. If it happens, there is a bug | ||
// and that's why we throw illegal state exception | ||
throw new IllegalStateException("bitmap inverted index reader supports lookup only on dictionary id"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change the exception
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please change the exception
import org.slf4j.LoggerFactory; | ||
|
||
|
||
public class RangeIndexReader implements InvertedIndexReader<ImmutableRoaringBitmap> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add javadoc
import org.slf4j.LoggerFactory; | ||
|
||
|
||
public class RangeIndexHandler { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add javadoc
@@ -86,6 +87,11 @@ private void extractFromTableConfig(@Nonnull TableConfig tableConfig) { | |||
_invertedIndexColumns.addAll(invertedIndexColumns); | |||
} | |||
|
|||
List<String> rangeIndexColumns = indexingConfig.getRangeIndexColumns(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should use FieldConfig -- this is another index type
@@ -124,14 +129,21 @@ public PhysicalColumnIndexContainer(SegmentDirectory.Reader segmentReader, Colum | |||
_invertedIndex = | |||
new BitmapInvertedIndexReader(segmentReader.getIndexFor(columnName, ColumnIndexType.INVERTED_INDEX), | |||
metadata.getCardinality()); | |||
_rangeIndex = null; | |||
} else if (loadRangeIndex) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be if not else-if
@@ -124,14 +129,21 @@ public PhysicalColumnIndexContainer(SegmentDirectory.Reader segmentReader, Colum | |||
_invertedIndex = | |||
new BitmapInvertedIndexReader(segmentReader.getIndexFor(columnName, ColumnIndexType.INVERTED_INDEX), | |||
metadata.getCardinality()); | |||
_rangeIndex = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if the if-else blocks here can be simplified
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hope so but dint want to do that as part of this PR.
@Override | ||
public void put(int position, Number value) { | ||
_buffer.putInt(position << 2, value.intValue()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using constants for << 2, << 3 would be better for readability. Something like INT_SIZE_BYTES, DOUBLE_SIZE_BYTES etc
// Bitmap for range 2 | ||
// ...... | ||
// Bitmap for range R - 1 | ||
long bytesWritten = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see _forwardIndexValueBuffer being used
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we dont need that right?
return _valueBuffer.compare(val1, val2); | ||
}; | ||
Swapper swapper = (i, j) -> { | ||
Number temp = _docIdBuffer.get(i).intValue(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we are doing in-place sorting on _docIdBuffer. Why do we need _docIdValueBuffer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to sort the forward index, instead of sorting the actual forward index, I sort the position array
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Thanks.
throws IOException { | ||
//sort the forward index copy | ||
//go over the sorted value to compute ranges | ||
IntComparator comparator = (i, j) -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is either going to sort raw values or dictIds. Correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The actual sort happens on docIdBuffer but uses forward index for sorting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not call it forward index. Let's call it keys?
|
||
@Override | ||
public void addDoc(Object document, int docIdCounter) { | ||
throw new IllegalStateException("Bitmap inverted index creator does not support Object type currently"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change the exception
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please fix this exception
// Forward index buffers (from docId to dictId) | ||
private int _nextDocId; | ||
private PinotDataBuffer _forwardIndexValueBuffer; | ||
private NumberValueBuffer _valueBuffer; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This buffer will either store dictIds or raw values. Correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's please rename it indexKeyBuffer?
|
||
// Forward index buffers (from docId to dictId) | ||
private int _nextDocId; | ||
private PinotDataBuffer _forwardIndexValueBuffer; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add some short comments (one line is enough) about each buffer type?
private final int _numValues; | ||
private final boolean _useMMapBuffer; | ||
|
||
// Forward index buffers (from docId to dictId) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we creating the forward index for the column as part of creating range index? The loop in SegmentColumnarIndexCreator that goes over each GenericRow takes care of that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its a temp buffer, will renaming it to valueBuffer help?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not use the term forward index inside range index creator. Can we please call it keyBuffer. The key that are stored could be dictIds or raw values.
* </ul> | ||
* <p>Based on the number of values we need to store, we use direct memory or MMap file to allocate the buffer. | ||
*/ | ||
public final class RangeIndexCreator implements InvertedIndexCreator { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the problem with treating this as another inverted index by overriding InvertedIndexCreator and InvertedIndexReader interfaces, is that we are inherently making this work only on dictionary encoded columns because these interfaces and their APIs are written to work only on dictIds
However, for text index (which also implements the same interface), we added a docId based API but that takes an object value. May be we can expand that API for each primitive type and that way we can support raw value based inverted index as well. I think we should decide on this sooner because this question will pop up for every new index type that we may add.
Another alternative would be to change the inheritance hierarchy by introducing a mid layer of interfaces -- DictionaryBasedInvertedIndexCreator, RawValueBasedInvertedIndexCreator. Move all the dictId based APIs to former and raw value based to latter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, Jackie and I discussed about it and concluded the same. Jackie will clean this up at some point.
* In the second pass (processing values phase), when seal() method is called, all the dictIds should already been | ||
* added. We first reorder the values into the inverted index buffers by going over the dictIds in forward index | ||
* value buffer (for multi-valued column we also need forward index length buffer to get the docId for each dictId). | ||
* <p>Once we have the inverted index buffers, we simply go over them and create the bitmap for each dictId and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't sound right. We don't create a bitmap for each dictId. I believe there is a bitmap for a contiguous range of dictIds
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, this is copy paste error from bitmap index
* A | ||
* </li> | ||
* <li> | ||
* In the first pass (adding values phase), when add() method is called, store the raw values into the forward index |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean store the dictionary Ids?
@@ -22,6 +22,8 @@ | |||
|
|||
|
|||
public interface PredicateEvaluator { | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove?
@Override | ||
protected FilterBlock getNextBlock() { | ||
|
||
//only dictionary based is supported for now |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add a TODO for supporting this on raw columns
import org.roaringbitmap.buffer.MutableRoaringBitmap; | ||
|
||
|
||
public class RangeFilterOperator extends BaseFilterOperator { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add javadoc
@@ -54,6 +54,12 @@ public static BaseFilterOperator getLeafFilterOperator(PredicateEvaluator predic | |||
|
|||
Predicate.Type predicateType = predicateEvaluator.getPredicateType(); | |||
|
|||
//Only for dictionary encoded columns and offline data sources | |||
if (predicateType == Predicate.Type.RANGE && dataSource.getDictionary() != null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why only dictionary encoded? For raw columns, each range could be a range of start and end raw values instead of dictIds. The data structure could be sorted on start raw values (just like it is on start dictIds)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add a TODO
|
||
|
||
@SuppressWarnings("unchecked") | ||
public final class DictionaryBasedSingleValueIterator extends BlockSingleValIterator { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add javadoc
|
||
|
||
@SuppressWarnings("unchecked") | ||
public final class DictionaryBasedMultiValueIterator extends BlockMultiValIterator { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add javadoc
|
||
private int _nextDocId; | ||
|
||
public DictionaryBasedSingleValueIterator(SingleColumnSingleValueReader reader, Dictionary dictionary, int numDocs) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, why do we have to implement two new iterators?
int startRangeId = rangeIndexReader.findRangeId(evaluator.getStartDictId()); | ||
int endRangeId = rangeIndexReader.findRangeId(evaluator.getEndDictId()); | ||
//Handle Matching Ranges - some ranges match fully but some partially | ||
//below code assumes first and last range always match partially which may not be the case always //todo: optimize it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not about optimization. Won't this impact correctness?
For a given startDictId and endDictId, we find the matching ranges represented by startRangeId and endRangeId.
Some/all ranges between these could be a full match and some/all could be a partial match. Right?
I understand that all full matches can be ORed but how can we assume that everything from startRangeId + 1 to endRangeId - 1 will be a full match?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not?
pinot-core/src/main/java/org/apache/pinot/core/segment/creator/impl/inv/RangeIndexCreator.java
Show resolved
Hide resolved
private NumberValueBuffer _numberValueBuffer; | ||
|
||
private final File _tempDocIdBufferFile; | ||
private PinotDataBuffer _docIdValueBuffer; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's rename _docIdValueBuffer to _docIdBuffer and _docIdBuffer to _docIdBufferWrapperForSorting?
|
||
private final File _rangeIndexFile; | ||
|
||
private final File _tempValueBufferFile; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let'a add comments explaining purpose of each buffer.
dump(); | ||
} | ||
|
||
//sort the value buffer, change the docId buffer to maintain the mapping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add a diagram here showing the contents of _numberValueBuffer and _docIdBuffer before and after sorting to explain the process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for working on this and addressing the comments. There are some follow-ups that I can take up:
(1) UseFieldConfig instead of IndexingConfig for specifying range index in table config
(2) Cleanup the if-else logic in PhysicalColumnIndexContainer that decides what indexes we should load.
(3) Refactor the inverted index interface to allow creating both raw and dictionary based (I already have this in progress).
(4) After (3), add support for creating raw values based range indexes (the index creator/reader need not change but there are assumptions elsewhere in the code)
1. Fix the compilation error introduced because of the merge of apache#4597 and apache#5240 2. Fix the bug of not loading the range index if both inverted index and range index exist TODO: The range index triggeres another severe issue of accessing closed DataBuffer which can cause JVM crash. Will address in a separate PR
1. Fix the compilation error introduced because of the merge of #4597 and #5240 2. Fix the bug of not loading the range index if both inverted index and range index exist TODO: The range index triggeres another severe issue of accessing closed DataBuffer which can cause JVM crash. Will address in a separate PR
Currently, range queries result in a full table scan and cannot be solved using inverted indices. This PR introduces the concept of Range index.
This will be useful in evaluating range predicates on time and metric columns
e.g. select count(*) from T where latency > 3sec
Design doc: https://docs.google.com/document/d/1eisu7L-ERLs1OZCASOz3qSpzZfoipplKrYgmBXaFobw/edit#