Optimize ExpressionFilterOperator #5132
1. Add BYTES type and multi-value support
2. Directly construct DocIdSet to save the overhead of filtering
3. Remove the redundant isMatch() for all scan-based iterators

Also changed the numEntriesScannedInFilter for MV columns to count the actual values fetched instead of the values evaluated, and added some TODOs for future filter optimization
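To make the numEntriesScannedInFilter change concrete, here is a minimal sketch of a multi-value (MV) column scan that counts values fetched rather than values evaluated. It is not the code from this PR: `MvScanSketch`, `Result`, and `scan` are hypothetical names, the column is modeled as a plain `int[][]`, and `java.util.BitSet` stands in for the real DocIdSet.

```java
import java.util.BitSet;
import java.util.function.IntPredicate;

public class MvScanSketch {
  /** Hypothetical result holder: matching docIds plus the scan counter. */
  public static final class Result {
    public final BitSet matchingDocIds;
    public final long numEntriesScannedInFilter;

    Result(BitSet matchingDocIds, long numEntriesScannedInFilter) {
      this.matchingDocIds = matchingDocIds;
      this.numEntriesScannedInFilter = numEntriesScannedInFilter;
    }
  }

  /**
   * Scans an MV column (one int[] per document). A document matches if any
   * of its values satisfies the predicate.
   */
  public static Result scan(int[][] mvColumn, IntPredicate predicate) {
    BitSet matching = new BitSet(mvColumn.length);
    long numEntriesScanned = 0;
    for (int docId = 0; docId < mvColumn.length; docId++) {
      int[] values = mvColumn[docId];
      // Count every value fetched for the document, even though the
      // loop below may stop evaluating after the first match.
      numEntriesScanned += values.length;
      for (int value : values) {
        if (predicate.test(value)) {
          matching.set(docId);
          break;
        }
      }
    }
    return new Result(matching, numEntriesScanned);
  }

  public static void main(String[] args) {
    int[][] column = {{1, 2, 3}, {4}, {5, 1}};
    Result result = scan(column, v -> v == 1);
    System.out.println(result.matchingDocIds);            // {0, 2}
    System.out.println(result.numEntriesScannedInFilter); // 6
  }
}
```

In this example only 4 values are evaluated before the early exits (1 + 1 + 2), but 6 are fetched, and the counter reports 6.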
Codecov Report
@@ Coverage Diff @@
## master #5132 +/- ##
============================================
- Coverage 65.90% 65.87% -0.04%
Complexity 12 12
============================================
Files 1052 1056 +4
Lines 54170 54027 -143
Branches 8078 8045 -33
============================================
- Hits 35702 35591 -111
+ Misses 15819 15802 -17
+ Partials 2649 2634 -15
LGTM. Please explain why we don't need to create the docId array anymore.
public DocIdSetOperator(@Nonnull BaseFilterOperator filterOperator, int maxSizeOfDocIdSet, boolean threadLocal) {
public DocIdSetOperator(BaseFilterOperator filterOperator, int maxSizeOfDocIdSet) {
This was needed, right? @fx19880617
This is good.
The DocIdSet creation logic is moved outside.
See https://github.com/apache/incubator-pinot/pull/5132/files#diff-48213157488bdb6aa43c2f6da8d2a940R131
…Match() is still required

ScanBasedDocIdIterator.isMatch() is still required when the query is in the following format:
SELECT ... WHERE (filterA OR filterB) AND filterC
And filterC is working on a scan-based column (column without inverted index)

Enhance the HybridClusterIntegrationTest to catch this issue
…) is still required (#5328)

ScanBasedDocIdIterator.isMatch() is still required when the query is in the following format:
SELECT ... WHERE (filterA OR filterB) AND filterC
And filterC is working on a scan-based column (column without inverted index)

Enhance the HybridClusterIntegrationTest to catch this issue
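A rough sketch of why a per-document isMatch() check is useful for this query shape, under simplifying assumptions: filterA and filterB each yield a sorted docId array (as an inverted index might), and filterC can only test one document at a time against the raw column. `IsMatchSketch`, `ScanBasedFilter`, `union`, and `orThenAnd` are hypothetical names for illustration, not Pinot classes.

```java
import java.util.ArrayList;
import java.util.List;

public class IsMatchSketch {
  /** Hypothetical stand-in for a scan-based iterator's point lookup. */
  public interface ScanBasedFilter {
    boolean isMatch(int docId);
  }

  /** Merges two sorted docId arrays into a sorted, de-duplicated list. */
  public static List<Integer> union(int[] a, int[] b) {
    List<Integer> out = new ArrayList<>();
    int i = 0;
    int j = 0;
    while (i < a.length || j < b.length) {
      int next;
      if (j >= b.length || (i < a.length && a[i] <= b[j])) {
        next = a[i++];
      } else {
        next = b[j++];
      }
      // Skip a docId that was already emitted from the other side.
      if (out.isEmpty() || out.get(out.size() - 1) != next) {
        out.add(next);
      }
    }
    return out;
  }

  /**
   * Evaluates (filterA OR filterB) AND filterC. Instead of scanning the
   * whole column for filterC, only the docIds produced by the OR are
   * checked through isMatch().
   */
  public static List<Integer> orThenAnd(int[] docIdsA, int[] docIdsB, ScanBasedFilter filterC) {
    List<Integer> result = new ArrayList<>();
    for (int docId : union(docIdsA, docIdsB)) {
      if (filterC.isMatch(docId)) {
        result.add(docId);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    int[] columnC = {10, 20, 10, 30, 10, 20};
    ScanBasedFilter cEquals10 = docId -> columnC[docId] == 10;
    System.out.println(orThenAnd(new int[] {0, 1}, new int[] {1, 4, 5}, cEquals10)); // [0, 4]
  }
}
```

Here the OR side narrows the candidates to docIds 0, 1, 4, and 5, so filterC is tested on four documents rather than all six.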