Optimize RealtimeDictionaryBasedRangePredicateEvaluator by not scanning the dictionary when cardinality is high #5331
LGTM.
Not sure about the Threshold but it's ok for now
} else {
  _dictIdSetBased = false;
  _matchingDictIdSet = null;
  switch (dataType) {
Not related to this PR, but should we start thinking about how to simplify these switch-case code blocks?
lgtm
…ng the dictionary when cardinality is high
ec03154 to 021e062
Codecov Report
@@            Coverage Diff             @@
##           master    #5331      +/-   ##
==========================================
- Coverage   66.08%   56.83%    -9.25%
==========================================
  Files        1072     1072
  Lines       54668    54723      +55
  Branches     8152     8160       +8
==========================================
- Hits        36125    31104    -5021
- Misses      15895    21174    +5279
+ Partials     2648     2445     -203
For real-time range predicates, the dictionary is not sorted, so we have to scan the whole dictionary to get the matching dictionary ids.
This causes performance issues when the column's cardinality is high.
Optimize it by adding a cardinality threshold (1000 for now) that decides whether to pre-calculate all the matching dictionary ids.
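
For readers skimming the thread, here is a minimal sketch of the idea, not the actual Pinot code. The class name `RangeEvaluatorSketch`, the constant name, the `int[]` stand-in for the unsorted real-time dictionary, the inclusive bounds, and the `<=` comparison against the threshold are all assumptions made for illustration. Only the `_dictIdSetBased` / `_matchingDictIdSet` field names and the value 1000 come from this PR.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified stand-in for the real evaluator (int values only).
final class RangeEvaluatorSketch {
  // Assumed constant name; the value 1000 is the threshold mentioned in the PR.
  static final int DICT_ID_SET_BASED_CARDINALITY_THRESHOLD = 1000;

  private final int[] _dictionary;          // unsorted: index is dictId, element is value
  private final int _lower;                 // inclusive lower bound (assumption)
  private final int _upper;                 // inclusive upper bound (assumption)
  private final boolean _dictIdSetBased;
  private final Set<Integer> _matchingDictIdSet;

  RangeEvaluatorSketch(int[] dictionary, int lower, int upper) {
    _dictionary = dictionary;
    _lower = lower;
    _upper = upper;
    if (dictionary.length <= DICT_ID_SET_BASED_CARDINALITY_THRESHOLD) {
      // Low cardinality: one full scan up front, then set lookups per document.
      _dictIdSetBased = true;
      _matchingDictIdSet = new HashSet<>();
      for (int dictId = 0; dictId < dictionary.length; dictId++) {
        if (inRange(dictionary[dictId])) {
          _matchingDictIdSet.add(dictId);
        }
      }
    } else {
      // High cardinality: skip the scan and compare values lazily per dictId.
      _dictIdSetBased = false;
      _matchingDictIdSet = null;
    }
  }

  boolean isDictIdSetBased() {
    return _dictIdSetBased;
  }

  // Returns true if the value behind the given dictId falls inside the range.
  boolean applySV(int dictId) {
    if (_dictIdSetBased) {
      return _matchingDictIdSet.contains(dictId);
    }
    return inRange(_dictionary[dictId]);
  }

  private boolean inRange(int value) {
    return value >= _lower && value <= _upper;
  }
}
```

Both paths return the same answer for any dictId. The threshold only moves the cost from a one-time dictionary scan to a per-lookup value comparison, which is why the exact value (questioned in the review above) mainly affects performance rather than correctness.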