Support FST Index using Native FST Library #7684

Closed
atris wants to merge 13 commits into apache:master from atris:new_wiring_native_index

@atris atris commented Nov 3, 2021

This PR introduces a new index type built using the native FST library. The new index serves regexp queries (via REGEXP_LIKE) and the LIKE operator.

@codecov-commenter

Codecov Report

Merging #7684 (726f33a) into master (2107b2c) will decrease coverage by 3.13%.
The diff coverage is 3.90%.

@@            Coverage Diff             @@
##           master    #7684      +/-   ##
==========================================
- Coverage   30.79%   27.66%   -3.14%     
==========================================
  Files        1570     1572       +2     
  Lines       80024    80153     +129     
  Branches    11904    11923      +19     
==========================================
- Hits        24644    22174    -2470     
- Misses      53274    55969    +2695     
+ Partials     2106     2010      -96     
| Flag | Coverage Δ |
|---|---|
| integration1 | ? |
| integration2 | 27.66% <3.90%> (-0.05%) ⬇️ |

| Impacted Files | Coverage Δ |
|---|---|
| ...inot/core/operator/filter/FilterOperatorUtils.java | 74.32% <0.00%> (-3.76%) ⬇️ |
| ...local/indexsegment/mutable/MutableSegmentImpl.java | 0.00% <0.00%> (ø) |
| ...l/realtime/converter/RealtimeSegmentConverter.java | 0.00% <0.00%> (ø) |
| ...ent/local/realtime/impl/RealtimeSegmentConfig.java | 0.00% <0.00%> (ø) |
| ...ment/creator/impl/SegmentColumnarIndexCreator.java | 0.00% <0.00%> (ø) |
| ...gment/index/column/IntermediateIndexContainer.java | 0.00% <ø> (ø) |
| ...ent/index/column/PhysicalColumnIndexContainer.java | 0.00% <0.00%> (ø) |
| ...ndex/converter/SegmentV1V2ToV3FormatConverter.java | 0.00% <0.00%> (ø) |
| ...local/segment/index/datasource/BaseDataSource.java | 0.00% <0.00%> (ø) |
| ...ocal/segment/index/datasource/EmptyDataSource.java | 0.00% <0.00%> (ø) |

... and 179 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2107b2c...726f33a.

@Jackie-Jiang Jackie-Jiang left a comment

Native FST and Lucene FST are both FSTs and share the same interface. We should not treat them as different indexes, but as different versions of the same index. This way, most of the existing code can be shared, and we don't need to handle FST and NATIVE_FST separately.

* Returns the Native FST index for the column if exists, or {@code null} if not.
*/
@Nullable
TextIndexReader getNativeFSTIndex();

Contributor

Native FST and Lucene FST are both FSTs, just different implementations, and I don't think they should co-exist, so let's reuse the same getFSTIndex() for both of them. We shouldn't need to change any code on the query execution side. The loader should be able to tell which version of FST exists and load it as the FST index.
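
To make the loader suggestion concrete, here is a minimal sketch of how a loader might tell the two formats apart before handing back a single FST reader. Everything in it is an assumption for illustration: `FstFormat`, `FstFormatDetector`, and the 4-byte `NATIVE_MAGIC` marker are hypothetical names, not Pinot classes or its actual on-disk format.

```java
import java.nio.ByteBuffer;

// Hypothetical names throughout; this is not Pinot's actual API or file layout.
enum FstFormat { LUCENE, NATIVE }

final class FstFormatDetector {
  // Assumed marker written at the start of a native FST buffer (ASCII "NFST").
  static final int NATIVE_MAGIC = 0x4E465354;

  private FstFormatDetector() { }

  // Peeks at the first four bytes without moving the buffer's position.
  // Anything that does not start with the assumed marker is treated as Lucene.
  static FstFormat detect(ByteBuffer indexData) {
    if (indexData.remaining() >= Integer.BYTES
        && indexData.getInt(indexData.position()) == NATIVE_MAGIC) {
      return FstFormat.NATIVE;
    }
    return FstFormat.LUCENE;
  }
}
```

With a check like this, the query side would keep calling one accessor and only the loader would branch on the detected format.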

Contributor Author

The problem is at creation time: there is no way to specify, per field, which FST type to use. Hence the need for the extra index type.

  private final List<String> _invertedIndexCreationColumns = new ArrayList<>();
  private final List<String> _textIndexCreationColumns = new ArrayList<>();
  private final List<String> _fstIndexCreationColumns = new ArrayList<>();
+ private final List<String> _nativeFSTIndexCreationColumns = new ArrayList<>();

Contributor

I'd suggest adding an enum for the FST type (can be LUCENE or NATIVE) instead of keeping separate lists.

Contributor Author

Would that not limit the ability to create per-column FST indices within the same table?

// If null, there won't be any index
public enum IndexType {
-   INVERTED, SORTED, TEXT, FST, H3, JSON, RANGE
+   INVERTED, SORTED, TEXT, FST, NATIVE_FST, H3, JSON, RANGE

Contributor

I suggest treating native FST as FST and adding an enum in IndexingConfig for the FST version (LUCENE or NATIVE).

Contributor Author

I do not think that would work, since IndexingConfig does not allow specifying a sub-config per field.
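
One way to picture the trade-off discussed in this thread is an FST type enum with a table-level default plus optional per-column overrides. The sketch below is purely illustrative: `FstType` and `FstIndexSettings` are hypothetical names, and it does not describe how IndexingConfig works or how the follow-up PR resolved the question.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical names; not Pinot's IndexingConfig or its real enum.
enum FstType { LUCENE, NATIVE }

final class FstIndexSettings {
  private final FstType defaultType;
  // Per-column overrides; columns without an entry use the table-level default.
  private final Map<String, FstType> perColumn = new HashMap<>();

  FstIndexSettings(FstType defaultType) {
    this.defaultType = defaultType;
  }

  // Registers an override for one column and returns this for chaining.
  FstIndexSettings withColumn(String column, FstType type) {
    perColumn.put(column, type);
    return this;
  }

  // Resolves the FST type a creator would use for the given column.
  FstType typeFor(String column) {
    return perColumn.getOrDefault(column, defaultType);
  }
}
```

For example, `new FstIndexSettings(FstType.LUCENE).withColumn("url", FstType.NATIVE)` would resolve `url` to NATIVE and every other column to LUCENE, so a single list of FST columns could still carry mixed types.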

atris commented Nov 9, 2021

Superseded by #7729

@atris atris closed this Nov 9, 2021