Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize ExpressionFilterOperator #5132

Merged
merged 1 commit into from Apr 3, 2020

Conversation

Jackie-Jiang
Copy link
Contributor

@Jackie-Jiang Jackie-Jiang commented Mar 10, 2020

  1. Add BYTES type and multi-value support
  2. Directly construct DocIdSet to save the overhead of filtering
  3. Remove the redundant isMatch() for all scan based iterators

Also changed the numEntriesScannedInFilter for MV column to actual values fetched instead of values evaluated and added some TODOs for future filter optimization

1. Add BYTES type and multi-value support
2. Directly consturct DocIdSet to save the overhead of filtering
3. Remove the redundant isMatch() for all scan based iterators

Also changed the numEntriesScannedInFilter for MV column to actual values fetched instead of values evaluated and added some TODOs for future filter optimization
@codecov-io
Copy link

codecov-io commented Mar 30, 2020

Codecov Report

Merging #5132 into master will decrease coverage by 0.03%.
The diff coverage is 68.57%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master    #5132      +/-   ##
============================================
- Coverage     65.90%   65.87%   -0.04%     
  Complexity       12       12              
============================================
  Files          1052     1056       +4     
  Lines         54170    54027     -143     
  Branches       8078     8045      -33     
============================================
- Hits          35702    35591     -111     
+ Misses        15819    15802      -17     
+ Partials       2649     2634      -15     
Impacted Files Coverage Δ Complexity Δ
...e/pinot/broker/api/resources/PinotBrokerDebug.java 76.66% <ø> (ø) 0.00 <0.00> (ø)
.../BrokerResourceOnlineOfflineStateModelFactory.java 55.81% <ø> (ø) 0.00 <0.00> (ø)
.../pinot/broker/broker/helix/HelixBrokerStarter.java 71.97% <ø> (ø) 0.00 <0.00> (ø)
...thandler/SingleConnectionBrokerRequestHandler.java 92.68% <ø> (ø) 0.00 <0.00> (ø)
...rg/apache/pinot/broker/routing/RoutingManager.java 78.84% <ø> (-2.31%) 0.00 <0.00> (ø)
...ting/instanceselector/InstanceSelectorFactory.java 71.42% <ø> (ø) 0.00 <0.00> (ø)
...er/routing/segmentpruner/SegmentPrunerFactory.java 83.33% <ø> (ø) 0.00 <0.00> (ø)
...outing/segmentselector/SegmentSelectorFactory.java 60.00% <ø> (ø) 0.00 <0.00> (ø)
...oker/routing/timeboundary/TimeBoundaryManager.java 87.50% <ø> (ø) 0.00 <0.00> (ø)
...mmon/assignment/InstanceAssignmentConfigUtils.java 67.50% <ø> (ø) 0.00 <0.00> (?)
... and 182 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 00fcb1d...aaa35fa. Read the comment docs.

Copy link
Member

@kishoreg kishoreg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Please explain why we dont need to create the docId array anymore


public DocIdSetOperator(@Nonnull BaseFilterOperator filterOperator, int maxSizeOfDocIdSet, boolean threadLocal) {
public DocIdSetOperator(BaseFilterOperator filterOperator, int maxSizeOfDocIdSet) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was needed rt. @fx19880617

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good.
The DocIdSet creation logic is moved outside.
See https://github.com/apache/incubator-pinot/pull/5132/files#diff-48213157488bdb6aa43c2f6da8d2a940R131

@xiangfu0 xiangfu0 merged commit ac327bb into apache:master Apr 3, 2020
@Jackie-Jiang Jackie-Jiang deleted the expression_doc_id_set branch April 3, 2020 23:04
Jackie-Jiang added a commit to Jackie-Jiang/pinot that referenced this pull request May 4, 2020
…Match() is still required

ScanBasedDocIdIterator.isMatch() is still required when the query is in the following format:
SELECT ... WHERE (filterA OR filterB) AND filterC
And filterC is working on a scan-based column (column without inverted index)

Enhance the HybridClusterIntegrationTest to catch this issue
Jackie-Jiang added a commit that referenced this pull request May 5, 2020
…) is still required (#5328)

ScanBasedDocIdIterator.isMatch() is still required when the query is in the following format:
SELECT ... WHERE (filterA OR filterB) AND filterC
And filterC is working on a scan-based column (column without inverted index)

Enhance the HybridClusterIntegrationTest to catch this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants