DOCAPI-7460: histogram function description #6235
@@ -2,6 +2,44 @@

Some aggregate functions can accept not only argument columns (used for compression), but a set of parameters – constants for initialization. The syntax is two pairs of brackets instead of one. The first is for parameters, and the second is for arguments.

## histogram

Calculates a histogram.
It's an adaptive histogram, which is quite unusual and worth mentioning at the very beginning. Also, afaik it's not guaranteed to be 100% accurate.
```
histogram(number_of_bins)(values)
```

The function uses [A Streaming Parallel Decision Tree Algorithm](http://jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf). It calculates the borders of histogram bins automatically, and in the general case the widths of bins are not equal.
Borders are not just calculated, they are constantly adjusted along the way
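
To make the "adjusted along the way" point concrete, here is a minimal Python sketch of the bin-merging idea from the cited paper. It is an illustration only, not ClickHouse's implementation: the class name `StreamingHistogram` and its methods are hypothetical, and it tracks (centroid, count) pairs rather than the borders and heights that `histogram` returns. Each new value becomes its own bin, and whenever the bin count exceeds the limit, the two closest neighbouring bins are merged.

```python
import bisect


class StreamingHistogram:
    """Toy adaptive histogram holding at most `max_bins` (centroid, count) bins."""

    def __init__(self, max_bins):
        self.max_bins = max_bins
        self.bins = []  # list of [centroid, count], kept sorted by centroid

    def add(self, value):
        # Locate the insertion point among the current centroids.
        centroids = [c for c, _ in self.bins]
        i = bisect.bisect_left(centroids, value)
        if i < len(self.bins) and self.bins[i][0] == value:
            # An identical centroid already exists: just bump its count.
            self.bins[i][1] += 1
            return
        # Otherwise the value starts out as a bin of its own.
        self.bins.insert(i, [float(value), 1])
        if len(self.bins) > self.max_bins:
            self._merge_closest()

    def _merge_closest(self):
        # Pick the adjacent pair of bins with the smallest gap between centroids.
        j = min(
            range(len(self.bins) - 1),
            key=lambda k: self.bins[k + 1][0] - self.bins[k][0],
        )
        (c1, n1), (c2, n2) = self.bins[j], self.bins[j + 1]
        # Replace the pair with one bin at the count-weighted mean centroid.
        merged = [(c1 * n1 + c2 * n2) / (n1 + n2), n1 + n2]
        self.bins[j:j + 2] = [merged]


hist = StreamingHistogram(max_bins=5)
for v in range(1, 21):
    hist.add(v)
```

In this sketch the bin positions depend on the order in which values arrive, and feeding fewer distinct values than `max_bins` leaves fewer bins than requested, so the parameter acts as an upper limit.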
**Parameters**

`number_of_bins` — Number of bins for the histogram.
`values` — [Expression](../syntax.md#syntax-expressions) resulting in a data sample.
"data sample" sounds out of place, maybe "input values" or something?

**Parameters**

`number_of_bins` — Number of bins for the histogram.
Have you checked what happens when you run this on a table with fewer rows than number_of_bins? Probably it's the upper limit for the number of bins, not the guaranteed number.
```
SELECT histogram(5)(number + 1) FROM (SELECT * FROM system.numbers LIMIT 20)
```
```text
┌─histogram(5)(plus(number, 1))───────────────────────────────────────────┐
```
This can probably be visualised using the bar(...) function; that would be a useful example.
@BayoNet ping
I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
Category (leave one):