Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding support to configure log2m value for hyperloglog #5564

Merged
merged 2 commits into from Jun 16, 2020

Conversation

xiangfu0
Copy link
Contributor

Description

Current distinctCountHLL hard coded Hyperloglo object's log2m value to 8, which prevents users from tuning the results accuracy vs query speed.

This PR extends distinctCountHLL to take log2m value as the second argument.
Adding a cluster config to allow users to customize default log2m value for query.

Release Notes

  • Support customized accuracy for distinctCountHLL, distinctCountHLLMV functions by adding log2m value as the second parameter in the function.
  • Adding cluster config: default.hyperloglog.log2m to allow user set default log2m value.

@xiangfu0 xiangfu0 added the release-notes Referenced by PRs that need attention when compiling the next release notes label Jun 14, 2020
@xiangfu0 xiangfu0 linked an issue Jun 14, 2020 that may be closed by this pull request
3 tasks
@codecov-commenter
Copy link

codecov-commenter commented Jun 14, 2020

Codecov Report

Merging #5564 into master will decrease coverage by 0.10%.
The diff coverage is 70.51%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #5564      +/-   ##
==========================================
- Coverage   66.44%   66.33%   -0.11%     
==========================================
  Files        1075     1122      +47     
  Lines       54773    57559    +2786     
  Branches     8168     8623     +455     
==========================================
+ Hits        36396    38184    +1788     
- Misses      15700    16544     +844     
- Partials     2677     2831     +154     
Flag Coverage Δ
#integrationtests 45.05% <51.40%> (?)
#unittests 56.97% <62.52%> (?)
Impacted Files Coverage Δ
...quota/HelixExternalViewBasedQueryQuotaManager.java 67.87% <0.00%> (ø)
...org/apache/pinot/common/function/FunctionInfo.java 73.33% <ø> (ø)
...java/org/apache/pinot/common/segment/ReadMode.java 66.66% <ø> (ø)
...org/apache/pinot/common/utils/CommonConstants.java 39.02% <0.00%> (+0.92%) ⬆️
...troller/helix/core/retention/RetentionManager.java 76.05% <0.00%> (-3.12%) ⬇️
...he/pinot/controller/util/AutoAddInvertedIndex.java 0.00% <0.00%> (ø)
.../org/apache/pinot/core/common/BaseBlockValSet.java 3.03% <0.00%> (-1.32%) ⬇️
.../java/org/apache/pinot/core/common/DataSource.java 100.00% <ø> (ø)
...data/manager/realtime/DefaultSegmentCommitter.java 80.00% <ø> (ø)
...e/data/manager/realtime/SplitSegmentCommitter.java 0.00% <ø> (-63.64%) ⬇️
... and 463 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 33578ac...535d8c0. Read the comment docs.

@xiangfu0 xiangfu0 requested review from Jackie-Jiang, kishoreg and mayankshriv and removed request for Jackie-Jiang June 14, 2020 20:19
.checkArgument(numExpressions <= 2 && numExpressions >= 1, "DistinctCountHLL expects 1 or 2 arguments, got: ",
numExpressions);
if (arguments.size() == 2) {
_log2M = Integer.valueOf(arguments.get(1).replace("'", ""));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to replace single quote here

Copy link
Contributor Author

@xiangfu0 xiangfu0 Jun 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Integer.valueOf(...) will throw exception on string "'1'"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But why would someone put '1' here? In order to put '1' as the literal, you need to explicitly escape ' (i.e. DistinctCountHLL(column, '''1''')). Also, in that case we should fail the query because it is invalid.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from sql side, we are still doing DistinctCountHLL(column, 1), this expression is parsed to long literal, then converted to string literal in BrokerRequest with single quotes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem right.. Can we add a TODO and fix it later?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comments.

Currently PinotQuery2BrokerRequestConverter enforces single quoted non-string literal in ParserUtils.standardizeExpression(...).

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM other than the argument handling of '

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-notes Referenced by PRs that need attention when compiling the next release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Tunable Hyperloglog object size
3 participants