Adding support to configure log2m value for hyperloglog #5564
Codecov Report
@@ Coverage Diff @@
## master #5564 +/- ##
==========================================
- Coverage 66.44% 66.33% -0.11%
==========================================
Files 1075 1122 +47
Lines 54773 57559 +2786
Branches 8168 8623 +455
==========================================
+ Hits 36396 38184 +1788
- Misses 15700 16544 +844
- Partials 2677 2831 +154
pinot-broker/src/main/java/org/apache/pinot/broker/broker/helix/HelixBrokerStarter.java
pinot-broker/src/main/java/org/apache/pinot/broker/requesthandler/BaseBrokerRequestHandler.java
.checkArgument(numExpressions <= 2 && numExpressions >= 1, "DistinctCountHLL expects 1 or 2 arguments, got: ",
    numExpressions);
if (arguments.size() == 2) {
  _log2M = Integer.valueOf(arguments.get(1).replace("'", ""));
No need to replace single quote here
Integer.valueOf(...) will throw an exception on the string "'1'"
But why would someone put '1' here? In order to put '1' as the literal, you need to explicitly escape ' (i.e. DistinctCountHLL(column, '''1''')). Also, in that case we should fail the query because it is invalid.
From the SQL side, we are still doing DistinctCountHLL(column, 1). This expression is parsed to a long literal, then converted to a string literal with single quotes in BrokerRequest.
This doesn't seem right.. Can we add a TODO and fix it later?
Added comments.
Currently PinotQuery2BrokerRequestConverter enforces single-quoted non-string literals in ParserUtils.standardizeExpression(...).
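To make the quoting question in this thread concrete, here is a minimal sketch of one way the second argument could be handled. It is not the code in this PR: the class name Log2mArgumentSketch, the method parseLog2m, and the choice to strip exactly one pair of surrounding quotes (instead of replace("'", "")) are assumptions for illustration. The default of 8 is the previously hard-coded value mentioned in the description.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of Pinot.
final class Log2mArgumentSketch {
  // Assumed default: the value that was hard-coded before this PR.
  static final int DEFAULT_LOG2M = 8;

  // Resolves log2m from the function arguments: (column) or (column, log2m).
  static int parseLog2m(List<String> arguments) {
    int numArguments = arguments.size();
    if (numArguments < 1 || numArguments > 2) {
      throw new IllegalArgumentException(
          "DistinctCountHLL expects 1 or 2 arguments, got: " + numArguments);
    }
    if (numArguments == 1) {
      return DEFAULT_LOG2M;
    }
    String literal = arguments.get(1);
    // Strip at most one pair of surrounding single quotes, which is what the
    // converter is described as adding around non-string literals.
    if (literal.length() >= 2 && literal.startsWith("'") && literal.endsWith("'")) {
      literal = literal.substring(1, literal.length() - 1);
    }
    // Anything still non-numeric (e.g. an explicitly escaped '1') fails here.
    return Integer.parseInt(literal);
  }

  public static void main(String[] args) {
    System.out.println(parseLog2m(Arrays.asList("column")));
    System.out.println(parseLog2m(Arrays.asList("column", "'12'")));
    try {
      Integer.valueOf("'12'");
    } catch (NumberFormatException e) {
      System.out.println("unstripped literal rejected");
    }
  }
}
```

Under this assumption, a literal that arrives as '12' parses to 12, while an explicitly escaped literal such as '''1''' still raises NumberFormatException after one pair of quotes is removed, so the invalid query fails as the reviewer suggests.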
...va/org/apache/pinot/core/query/aggregation/function/DistinctCountHLLAggregationFunction.java
.../org/apache/pinot/core/query/aggregation/function/DistinctCountHLLMVAggregationFunction.java
LGTM other than the argument handling of '
Description
Currently, distinctCountHLL hard-codes the HyperLogLog object's log2m value to 8, which prevents users from tuning result accuracy against query speed. This PR extends distinctCountHLL to take the log2m value as the second argument.
It also adds a cluster config that lets users customize the default log2m value for queries.
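As a rough illustration of the accuracy side of that trade-off, the sketch below uses the textbook HyperLogLog estimate that the relative standard error is about 1.04 / sqrt(m), where m = 2^log2m is the number of registers. The class and method names are hypothetical, and the figures are the theoretical approximation, not measurements from Pinot.

```java
// Hypothetical helper for illustration only; not part of Pinot.
final class HllErrorSketch {
  // Textbook approximation: relative standard error ~ 1.04 / sqrt(2^log2m).
  static double relativeStandardError(int log2m) {
    int registers = 1 << log2m;
    return 1.04 / Math.sqrt(registers);
  }

  public static void main(String[] args) {
    // Larger log2m means more registers: lower error, more memory and work.
    for (int log2m : new int[] {8, 10, 12, 14}) {
      System.out.printf("log2m=%d registers=%d error~%.4f%n",
          log2m, 1 << log2m, relativeStandardError(log2m));
    }
  }
}
```

With the old hard-coded value of 8 this approximation gives about 6.5% error, and each increase of log2m by 2 halves the error while quadrupling the register count.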
Release Notes
Extended the distinctCountHLL and distinctCountHLLMV functions by adding the log2m value as the second parameter of the function.
Added the cluster config default.hyperloglog.log2m to allow users to set the default log2m value.