Optimize RealtimeDictionaryBasedRangePredicateEvaluator by not scanning the dictionary when cardinality is high #5331

Jackie-Jiang · 2020-05-05T04:44:47Z

For real-time range predicates, the dictionary is not sorted, so finding the matching dictionary ids requires scanning the whole dictionary.
This causes a performance issue when the column's cardinality is high.
Optimize it by adding a cardinality threshold (1000 for now) that decides whether to pre-calculate all the matching dictionary ids.
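
To make the idea concrete, here is a minimal, self-contained sketch of the threshold decision for an int column. It is not the code in this PR: the class `RangeEvaluatorSketch`, its method names, the inclusive bounds, and the `<=` comparison against the threshold are all assumptions made for illustration. Only the threshold value (1000) comes from the description above.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch only; this class and its members are hypothetical,
// not the actual Pinot evaluator.
final class RangeEvaluatorSketch {
  // Threshold taken from the PR description ("1000 for now").
  static final int CARDINALITY_THRESHOLD = 1000;

  private final int[] dictionary;             // unsorted: dictId -> value
  private final int lowerInclusive;
  private final int upperInclusive;
  private final Set<Integer> matchingDictIds; // null when not pre-calculated

  RangeEvaluatorSketch(int[] dictionary, int lowerInclusive, int upperInclusive) {
    this.dictionary = dictionary;
    this.lowerInclusive = lowerInclusive;
    this.upperInclusive = upperInclusive;
    if (dictionary.length <= CARDINALITY_THRESHOLD) {
      // Low cardinality: scan the dictionary once and remember the matching ids.
      Set<Integer> ids = new HashSet<>();
      for (int dictId = 0; dictId < dictionary.length; dictId++) {
        if (inRange(dictionary[dictId])) {
          ids.add(dictId);
        }
      }
      this.matchingDictIds = ids;
    } else {
      // High cardinality: skip the scan and compare values lazily per lookup.
      this.matchingDictIds = null;
    }
  }

  boolean isDictIdSetBased() {
    return matchingDictIds != null;
  }

  // Returns true if the value behind the given dictionary id is in range.
  boolean applySV(int dictId) {
    if (matchingDictIds != null) {
      return matchingDictIds.contains(dictId);
    }
    return inRange(dictionary[dictId]);
  }

  private boolean inRange(int value) {
    return value >= lowerInclusive && value <= upperInclusive;
  }
}
```

In this sketch, a low-cardinality column pays one dictionary scan up front and answers each lookup from the set, while a high-cardinality column pays nothing up front and does one dictionary read plus a comparison per lookup.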

kishoreg

LGTM.

Not sure about the threshold, but it's OK for now.

xiangfu0 · 2020-05-05T06:41:32Z

...ain/java/org/apache/pinot/core/operator/filter/predicate/RangePredicateEvaluatorFactory.java

+      } else {
+        _dictIdSetBased = false;
+        _matchingDictIdSet = null;
+        switch (dataType) {


Not related to this PR, but should we start thinking about how to simplify these switch-case code blocks?

xiangfu0

lgtm

codecov-io · 2020-05-05T19:53:26Z

Codecov Report

Merging #5331 into master will decrease coverage by 9.24%.
The diff coverage is 2.45%.

@@            Coverage Diff             @@
##           master    #5331      +/-   ##
==========================================
- Coverage   66.08%   56.83%   -9.25%     
==========================================
  Files        1072     1072              
  Lines       54668    54723      +55     
  Branches     8152     8160       +8     
==========================================
- Hits        36125    31104    -5021     
- Misses      15895    21174    +5279     
+ Partials     2648     2445     -203

Impacted Files	Coverage Δ
...roker/requesthandler/BaseBrokerRequestHandler.java	`20.16% <0.00%> (-59.20%)`	⬇️
...core/operator/dociditerators/AndDocIdIterator.java	`54.38% <0.00%> (-10.32%)`	⬇️
...or/dociditerators/ExpressionScanDocIdIterator.java	`0.00% <0.00%> (-49.70%)`	⬇️
...e/operator/dociditerators/SVScanDocIdIterator.java	`49.42% <0.00%> (-20.09%)`	⬇️
...edicate/BaseDictionaryBasedPredicateEvaluator.java	`54.16% <ø> (ø)`
...predicate/BaseRawValueBasedPredicateEvaluator.java	`86.36% <ø> (-1.52%)`	⬇️
...lter/predicate/RangePredicateEvaluatorFactory.java	`68.34% <0.00%> (-21.66%)`	⬇️
...e/operator/dociditerators/MVScanDocIdIterator.java	`40.62% <14.28%> (-25.48%)`	⬇️
...r/filter/predicate/PredicateEvaluatorProvider.java	`33.33% <100.00%> (-24.36%)`	⬇️
...a/org/apache/pinot/minion/metrics/MinionMeter.java	`0.00% <0.00%> (-100.00%)`	⬇️
... and 321 more

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0ccb389...021e062. Read the comment docs.

kishoreg approved these changes May 5, 2020

xiangfu0 reviewed May 5, 2020

xiangfu0 approved these changes May 5, 2020

Jackie-Jiang force-pushed the optimize_realtime_range_evaluator branch from ec03154 to 021e062 on May 5, 2020 18:39

kishoreg merged commit af5a901 into master May 9, 2020

kishoreg deleted the optimize_realtime_range_evaluator branch May 9, 2020 06:44

Jackie-Jiang mentioned this pull request Aug 4, 2022

Fix the race condition of reflection scanning classes #9167

Merged
