[enhancement](histogram) optimise aggregate function histogram #15317
…statistics (#15490)

Histogram statistics are more expensive to collect, so we collect and persist them separately. This PR does the following:

1. Add histogram syntax and the keyword `TABLE`
2. Add the task of collecting histogram statistics
3. Persist histogram statistics
4. Replace fastjson with gson
5. Add unit tests...

Relevant syntax examples:

> Following other databases such as MySQL, add the keyword `TABLE`.

```SQL
-- collect column statistics
ANALYZE TABLE statistics_test;

-- collect histogram statistics
ANALYZE TABLE statistics_test UPDATE HISTOGRAM ON col1,col2;
```

Based on #15317.
Proposed changes
This PR mainly optimizes the `histogram` aggregate function (👉🏻 #14910), including the following:
`sample_rate` and `max_bucket_num` parameter description:

- `sample_rate`: Optional. The proportion of the data sampled to generate the histogram. Defaults to 0.2.
- `max_bucket_num`: Optional. Limits the number of histogram buckets. Defaults to 128.

Example:
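How the two parameters interact can be illustrated with a simplified equi-depth histogram builder. This is a hypothetical Python sketch, not the Doris implementation: it assumes each row is kept independently with probability `sample_rate`, and that the sorted sample is split into at most `max_bucket_num` buckets of roughly equal row count. The function name `build_histogram` and the bucket fields (`lower`, `upper`, `count`, `pre_sum`, `ndv`) are illustrative assumptions.

```python
import math
import random


def build_histogram(values, sample_rate=0.2, max_bucket_num=128, seed=None):
    """Hypothetical equi-depth histogram builder (illustration only).

    Assumptions, not taken from the Doris source:
    - rows are sampled independently with probability `sample_rate`;
    - the sorted sample is cut into at most `max_bucket_num` buckets
      holding roughly the same number of rows.
    """
    if not 0 < sample_rate <= 1:
        raise ValueError("sample_rate must be in (0, 1]")
    if max_bucket_num < 1:
        raise ValueError("max_bucket_num must be >= 1")

    rng = random.Random(seed)
    # Bernoulli sampling: random() is in [0, 1), so sample_rate=1.0 keeps every row.
    sample = sorted(v for v in values if rng.random() < sample_rate)
    if not sample:
        return []

    # Rows per bucket, chosen so that the bucket count never exceeds max_bucket_num.
    per_bucket = math.ceil(len(sample) / max_bucket_num)

    buckets = []
    pre_sum = 0  # number of sampled rows in all earlier buckets
    for start in range(0, len(sample), per_bucket):
        chunk = sample[start:start + per_bucket]
        buckets.append({
            "lower": chunk[0],          # smallest value in the bucket
            "upper": chunk[-1],         # largest value in the bucket
            "count": len(chunk),        # sampled rows in the bucket
            "pre_sum": pre_sum,         # sampled rows before this bucket
            "ndv": len(set(chunk)),     # distinct values in the bucket
        })
        pre_sum += len(chunk)
    return buckets


if __name__ == "__main__":
    hist = build_histogram(list(range(1000)), sample_rate=1.0, max_bucket_num=4)
    print([(b["lower"], b["upper"], b["count"]) for b in hist])
    # prints [(0, 249, 250), (250, 499, 250), (500, 749, 250), (750, 999, 250)]
```

In this sketch the bucket size is derived from the sample size, so a larger `sample_rate` gives more accurate bucket boundaries at a higher collection cost, while `max_bucket_num` bounds the size of the result regardless of how much data is sampled. A production equi-depth histogram would typically also avoid splitting a run of equal values across two buckets, which this sketch does not do.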
Query result description:
Field description:
Issue Number: close #xxx
Problem summary
Describe your changes.
Checklist (Required)
Further comments
If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...