Optimize ExpressionFilterOperator #5132

Jackie-Jiang · 2020-03-10T00:35:13Z

Add BYTES type and multi-value support
Directly construct DocIdSet to save the overhead of filtering
Remove the redundant isMatch() for all scan based iterators

Also changed numEntriesScannedInFilter for MV columns to count the actual values fetched instead of the values evaluated, and added some TODOs for future filter optimization.
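
To make the "values fetched" versus "values evaluated" distinction concrete, here is a minimal sketch. It assumes a multi-value scan that reads the whole entry for a document and then stops testing at the first matching value. The class `MvScanCounter` and its fields are hypothetical names used only for illustration, not Pinot's actual classes.

```java
import java.util.function.IntPredicate;

// Hypothetical sketch; names are illustrative and not taken from the Pinot code base.
final class MvScanCounter {
  long valuesFetched;   // every value read for the document
  long valuesEvaluated; // values actually tested before short-circuiting

  /** Returns true if any value of the multi-value entry matches the predicate. */
  boolean matches(int[] docValues, IntPredicate predicate) {
    // Assumption: the whole multi-value entry is read up front.
    valuesFetched += docValues.length;
    for (int value : docValues) {
      valuesEvaluated++;
      if (predicate.test(value)) {
        return true; // short-circuit: the remaining values are never evaluated
      }
    }
    return false;
  }
}
```

Under these assumptions, a document with values {5, 7, 9, 11} and the predicate `value == 7` contributes 4 to the fetched count but only 2 to the evaluated count, so counting fetched values reflects the data actually read.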

codecov-io · 2020-03-30T00:49:42Z

Codecov Report

Merging #5132 into master will decrease coverage by 0.03%.
The diff coverage is 68.57%.

@@             Coverage Diff              @@
##             master    #5132      +/-   ##
============================================
- Coverage     65.90%   65.87%   -0.04%     
  Complexity       12       12              
============================================
  Files          1052     1056       +4     
  Lines         54170    54027     -143     
  Branches       8078     8045      -33     
============================================
- Hits          35702    35591     -111     
+ Misses        15819    15802      -17     
+ Partials       2649     2634      -15

Impacted Files	Coverage Δ	Complexity Δ
...e/pinot/broker/api/resources/PinotBrokerDebug.java	`76.66% <ø> (ø)`	`0.00 <0.00> (ø)`
.../BrokerResourceOnlineOfflineStateModelFactory.java	`55.81% <ø> (ø)`	`0.00 <0.00> (ø)`
.../pinot/broker/broker/helix/HelixBrokerStarter.java	`71.97% <ø> (ø)`	`0.00 <0.00> (ø)`
...thandler/SingleConnectionBrokerRequestHandler.java	`92.68% <ø> (ø)`	`0.00 <0.00> (ø)`
...rg/apache/pinot/broker/routing/RoutingManager.java	`78.84% <ø> (-2.31%)`	`0.00 <0.00> (ø)`
...ting/instanceselector/InstanceSelectorFactory.java	`71.42% <ø> (ø)`	`0.00 <0.00> (ø)`
...er/routing/segmentpruner/SegmentPrunerFactory.java	`83.33% <ø> (ø)`	`0.00 <0.00> (ø)`
...outing/segmentselector/SegmentSelectorFactory.java	`60.00% <ø> (ø)`	`0.00 <0.00> (ø)`
...oker/routing/timeboundary/TimeBoundaryManager.java	`87.50% <ø> (ø)`	`0.00 <0.00> (ø)`
...mmon/assignment/InstanceAssignmentConfigUtils.java	`67.50% <ø> (ø)`	`0.00 <0.00> (?)`
... and 182 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

kishoreg

LGTM. Please explain why we don't need to create the docId array anymore.

kishoreg · 2020-04-03T20:25:37Z

pinot-core/src/main/java/org/apache/pinot/core/operator/DocIdSetOperator.java


-  public DocIdSetOperator(@Nonnull BaseFilterOperator filterOperator, int maxSizeOfDocIdSet, boolean threadLocal) {
+  public DocIdSetOperator(BaseFilterOperator filterOperator, int maxSizeOfDocIdSet) {


This was needed, right? @fx19880617

This is good.
The DocIdSet creation logic is moved outside.
See https://github.com/apache/incubator-pinot/pull/5132/files#diff-48213157488bdb6aa43c2f6da8d2a940R131

…) is still required (#5328)

ScanBasedDocIdIterator.isMatch() is still required when the query is in the following format:
SELECT ... WHERE (filterA OR filterB) AND filterC
And filterC is working on a scan-based column (column without inverted index)

Enhance the HybridClusterIntegrationTest to catch this issue
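
The query shape above can be illustrated with a small sketch. It assumes that the AND step takes the docIds already produced by (filterA OR filterB) and probes the scan-based filterC one candidate at a time, which is where a per-document `isMatch(docId)` check is needed. `ScanIsMatchSketch`, `ScanIterator` and `and` are hypothetical stand-ins, not Pinot's actual `ScanBasedDocIdIterator` or AND operator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical sketch; these types only mimic the idea and are not Pinot classes.
final class ScanIsMatchSketch {

  /** Stand-in for a scan-based iterator over a column without an inverted index. */
  static final class ScanIterator {
    private final int[] columnValues; // column value for each docId
    private final IntPredicate predicate;
    int entriesScanned;

    ScanIterator(int[] columnValues, IntPredicate predicate) {
      this.columnValues = columnValues;
      this.predicate = predicate;
    }

    /** Checks a single document instead of scanning the whole column. */
    boolean isMatch(int docId) {
      entriesScanned++;
      return predicate.test(columnValues[docId]);
    }
  }

  /** Evaluates (A OR B) AND C, where candidates holds the docIds matching (A OR B). */
  static List<Integer> and(int[] candidates, ScanIterator filterC) {
    List<Integer> result = new ArrayList<>();
    for (int docId : candidates) {
      if (filterC.isMatch(docId)) {
        result.add(docId);
      }
    }
    return result;
  }
}
```

In this sketch, probing only the candidates means filterC reads one entry per candidate docId rather than every document in the segment.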

Jackie-Jiang force-pushed the expression_doc_id_set branch from 129be8e to 16e1e55 Compare March 10, 2020 05:20

Jackie-Jiang requested a review from xiangfu0 March 10, 2020 05:22

Jackie-Jiang force-pushed the expression_doc_id_set branch from 16e1e55 to 7359aa3 Compare March 18, 2020 22:56

Jackie-Jiang force-pushed the expression_doc_id_set branch from 7359aa3 to aaa35fa Compare March 30, 2020 00:05

Jackie-Jiang requested a review from siddharthteotia March 30, 2020 01:01

kishoreg approved these changes Apr 3, 2020

xiangfu0 approved these changes Apr 3, 2020

xiangfu0 merged commit ac327bb into apache:master Apr 3, 2020

Jackie-Jiang deleted the expression_doc_id_set branch April 3, 2020 23:04

mcvsubbu mentioned this pull request May 7, 2020

Cleanup integration tests to get more coverage, less randomness #5346

Closed
