Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize RealtimeDictionaryBasedRangePredicateEvaluator by not scanning the dictionary when cardinality is high #5331

Merged
merged 1 commit into from May 9, 2020

Conversation

Jackie-Jiang
Copy link
Contributor

For real-time range predicate, because the dictionary is not sorted, in order to get the matching dictionary ids, we have to scan the whole dictionary.
This will cause performance issue when the cardinality is high for the column.
Optimize it by adding a cardinality threshold (1000 for now) to decide whether to pre-calculate all the matching dictionary ids.

Copy link
Member

@kishoreg kishoreg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Not sure about the Threshold but it's ok for now

} else {
_dictIdSetBased = false;
_matchingDictIdSet = null;
switch (dataType) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related to this PR, just shall we start thinking of how to simplify those switch cases code blocks?

Copy link
Contributor

@xiangfu0 xiangfu0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

…ng the dictionary when cardinality is high

For real-time range predicate, because the dictionary is not sorted, in order to get the matching dictionary ids, we have to scan the whole dictionary.
This will cause performance issue when the cardinality is high for the column.
Optimize it by adding a cardinality threshold (1000 for now) to decide whether to pre-calculate all the matching dictionary ids.
@Jackie-Jiang Jackie-Jiang force-pushed the optimize_realtime_range_evaluator branch from ec03154 to 021e062 Compare May 5, 2020 18:39
@codecov-io
Copy link

Codecov Report

Merging #5331 into master will decrease coverage by 9.24%.
The diff coverage is 2.45%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #5331      +/-   ##
==========================================
- Coverage   66.08%   56.83%   -9.25%     
==========================================
  Files        1072     1072              
  Lines       54668    54723      +55     
  Branches     8152     8160       +8     
==========================================
- Hits        36125    31104    -5021     
- Misses      15895    21174    +5279     
+ Partials     2648     2445     -203     
Impacted Files Coverage Δ
...roker/requesthandler/BaseBrokerRequestHandler.java 20.16% <0.00%> (-59.20%) ⬇️
...core/operator/dociditerators/AndDocIdIterator.java 54.38% <0.00%> (-10.32%) ⬇️
...or/dociditerators/ExpressionScanDocIdIterator.java 0.00% <0.00%> (-49.70%) ⬇️
...e/operator/dociditerators/SVScanDocIdIterator.java 49.42% <0.00%> (-20.09%) ⬇️
...edicate/BaseDictionaryBasedPredicateEvaluator.java 54.16% <ø> (ø)
...predicate/BaseRawValueBasedPredicateEvaluator.java 86.36% <ø> (-1.52%) ⬇️
...lter/predicate/RangePredicateEvaluatorFactory.java 68.34% <0.00%> (-21.66%) ⬇️
...e/operator/dociditerators/MVScanDocIdIterator.java 40.62% <14.28%> (-25.48%) ⬇️
...r/filter/predicate/PredicateEvaluatorProvider.java 33.33% <100.00%> (-24.36%) ⬇️
...a/org/apache/pinot/minion/metrics/MinionMeter.java 0.00% <0.00%> (-100.00%) ⬇️
... and 321 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0ccb389...021e062. Read the comment docs.

@kishoreg kishoreg merged commit af5a901 into master May 9, 2020
@kishoreg kishoreg deleted the optimize_realtime_range_evaluator branch May 9, 2020 06:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants