DOCAPI-7460: histogram function description #6235

Merged: 4 commits, Aug 21, 2019

@BayoNet BayoNet commented Jul 31, 2019

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Category (leave one):

@BayoNet BayoNet added comp-documentation Documentation pr-documentation Documentation PRs for the specific code PR labels Jul 31, 2019
@@ -2,6 +2,44 @@

Some aggregate functions can accept not only argument columns (used for compression), but a set of parameters – constants for initialization. The syntax is two pairs of brackets instead of one. The first is for parameters, and the second is for arguments.

## histogram

Calculates a histogram.
Contributor commented:

It's an adaptive histogram, which is quite unusual and worth mentioning at the very beginning. Also, AFAIK, it's not guaranteed to be 100% accurate.

```
histogram(number_of_bins)(values)
```

The function uses [A Streaming Parallel Decision Tree Algorithm](http://jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf). It calculates the borders of histogram bins automatically, and in the general case the bin widths are not equal.
Contributor commented:

Borders are not just calculated, they are constantly adjusted along the way
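
To make the "adjusted along the way" point concrete, here is a minimal Python sketch of the update step described in the linked paper: each new value becomes its own centroid, and when the number of centroids exceeds the limit, the two closest neighbours are merged into their weighted mean. This is an illustration of the paper's idea only, not ClickHouse's implementation; the names `update`, `streaming_histogram`, and `max_bins` are hypothetical.

```python
from bisect import bisect_left


def update(centroids, value, max_bins):
    """Add one value to a sorted list of [position, count] centroids, in place."""
    positions = [p for p, _ in centroids]
    i = bisect_left(positions, value)
    if i < len(centroids) and centroids[i][0] == value:
        # The value coincides with an existing centroid: just bump its count.
        centroids[i][1] += 1
        return
    centroids.insert(i, [value, 1])
    if len(centroids) <= max_bins:
        return
    # Over the limit: merge the two adjacent centroids that are closest together.
    j = min(range(len(centroids) - 1),
            key=lambda k: centroids[k + 1][0] - centroids[k][0])
    (p1, m1), (p2, m2) = centroids[j], centroids[j + 1]
    centroids[j:j + 2] = [[(p1 * m1 + p2 * m2) / (m1 + m2), m1 + m2]]


def streaming_histogram(values, max_bins):
    """Feed values one at a time; return the final [position, count] centroids."""
    centroids = []
    for v in values:
        update(centroids, float(v), max_bins)
    return centroids


print(streaming_histogram(range(1, 21), 5))
print(streaming_histogram([1, 2, 3], 5))
```

In this sketch the centroid positions shift every time a merge happens, and `max_bins` acts as an upper limit: an input with fewer distinct values than `max_bins` yields fewer centroids. Whether the ClickHouse function behaves the same way would need to be checked against the actual implementation.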

**Parameters**

`number_of_bins` — Number of bins for the histogram.
`values` — [Expression](../syntax.md#syntax-expressions) resulting in a data sample.
Contributor commented:

"data sample" sounds out of place, maybe "input values" or something?


**Parameters**

`number_of_bins` — Number of bins for the histogram.
Contributor commented:

Have you checked what happens when you run this on a table with fewer rows than number_of_bins? It's probably the upper limit for the number of bins, not the guaranteed number.

```sql
SELECT histogram(5)(number + 1) FROM (SELECT * FROM system.numbers LIMIT 20)
```
```text
┌─histogram(5)(plus(number, 1))───────────────────────────────────────────┐
```
Contributor commented:

This can probably be visualised using the bar(...) function, which would make a useful example.
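
As a rough illustration of what such a visualisation could look like outside ClickHouse, the Python sketch below draws one text bar per bin, scaled to the tallest bin. It assumes each bin is a `(lower, upper, height)` tuple. The function name `render_bars` and the sample bins are hypothetical and invented for illustration; they are not output from the query above.

```python
def render_bars(bins, width=20, fill="█"):
    """Return one text line per (lower, upper, height) bin, scaled to `width`."""
    if not bins:
        return []
    tallest = max(height for _, _, height in bins)
    lines = []
    for lower, upper, height in bins:
        # Scale each bar relative to the tallest bin; guard against all-zero heights.
        length = round(width * height / tallest) if tallest > 0 else 0
        lines.append(f"[{lower:g}, {upper:g}) {fill * length} {height:g}")
    return lines


# Hypothetical bins, made up for this example (not real ClickHouse output).
example_bins = [(0, 10, 2), (10, 15, 4), (15, 30, 1)]
for line in render_bars(example_bins, width=8):
    print(line)
```

Because the bins are generally of unequal width, printing the borders next to each bar keeps the picture honest: bar length alone reflects height, not density.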

blinkov commented Aug 12, 2019

@BayoNet ping

@BayoNet BayoNet changed the title En docs/docapi 7460 DOCAPI-7460: histogram function description Aug 21, 2019
@BayoNet BayoNet merged commit 1536d42 into ClickHouse:master Aug 21, 2019
@BayoNet BayoNet deleted the en-docs/DOCAPI-7460 branch August 21, 2019 08:12