Support FST Index using Native FST Library#7684
Support FST Index using Native FST Library#7684atris wants to merge 13 commits intoapache:masterfrom
Conversation
Codecov Report
@@ Coverage Diff @@
## master #7684 +/- ##
==========================================
- Coverage 30.79% 27.66% -3.14%
==========================================
Files 1570 1572 +2
Lines 80024 80153 +129
Branches 11904 11923 +19
==========================================
- Hits 24644 22174 -2470
- Misses 53274 55969 +2695
+ Partials 2106 2010 -96
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
.../src/main/java/org/apache/pinot/core/data/manager/realtime/HLRealtimeSegmentDataManager.java
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/operator/filter/FilterOperatorUtils.java
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/operator/filter/FilterOperatorUtils.java
Show resolved
Hide resolved
Jackie-Jiang
left a comment
There was a problem hiding this comment.
Both native FST and Lucene FST are FST, and share the same interface. We should not treat them as different index, but different version of the same index. This way, most of the existing code can be shared, and we don't need to handle FST and NATIVE_FST separately
| * Returns the Native FST index for the column if exists, or {@code null} if not. | ||
| */ | ||
| @Nullable | ||
| TextIndexReader getNativeFSTIndex(); |
There was a problem hiding this comment.
Since native FST and lucene FST are both FST, just different implementation, and I don't think they should co-exist, let's reuse the same getFSTIndex() for both of them. We shouldn't need to change any code on the query execution side. The loader should be able to tell which version of FST exists and load it as the FST index.
There was a problem hiding this comment.
The problem is during creation -- there is no way to specify at a per field level as to which FST type needs to be used. Hence the need for the extra index
| private final List<String> _invertedIndexCreationColumns = new ArrayList<>(); | ||
| private final List<String> _textIndexCreationColumns = new ArrayList<>(); | ||
| private final List<String> _fstIndexCreationColumns = new ArrayList<>(); | ||
| private final List<String> _nativeFSTIndexCreationColumns = new ArrayList<>(); |
There was a problem hiding this comment.
I'd suggest adding an enum for the FST type (can be LUCENE or NATIVE instead of keeping separate lists
There was a problem hiding this comment.
Would that not limit the ability to create per column FST indices within same table?
| // If null, there won't be any index | ||
| public enum IndexType { | ||
| INVERTED, SORTED, TEXT, FST, H3, JSON, RANGE | ||
| INVERTED, SORTED, TEXT, FST, NATIVE_FST, H3, JSON, RANGE |
There was a problem hiding this comment.
Suggest treating native FST as FST, and add an enum in IndexingConfig for the FST version (LUCENE or NATIVE)
There was a problem hiding this comment.
I do not think that would work, since we do not allow specifying sub config per field in IndexingConfig?
|
Superseded by #7729 |
This PR introduces a new index type which is built using the native FST library. The new index is used for serving regexp queries (using REGEXP_LIKE) and LIKE operator.