CLICKHOUSE-3547 streaming histogram aggregation #2521
alexey-milovidov merged 22 commits into ClickHouse:master
242371c to a9333b3
The macro is used in only one place. Why not write bins_count = applyVisitor(FieldVisitorConvertToNumber<UInt32>(), params[0]);?
It would be better to add an ErrorCode.
It's not obvious that the bins count is a parameter (not an argument) and that we must specify it. For example:
:) select histogram(number) from (select * from system.numbers limit 20);
SELECT histogram(number)
FROM
(
SELECT *
FROM system.numbers
LIMIT 20
)
Received exception from server (version 1.1.54386):
Code: 0. DB::Exception: Received from localhost:9004, ::1. DB::Exception: Function histogram requires only bins count.
Why do we use int here? max_bins and size are UInt32.
We will access memory outside our own allocation if max_bins == 0.
max_bins = 0 is a strange case. I think the function should accept only a positive bin count.
Also, we need to check that the argument is numeric.
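The checks requested in the comments above (numeric parameter, positive bin count, a dedicated error instead of a generic one) could look roughly like the sketch below. It is not the code from this PR: Param is a hypothetical std::variant stand-in for ClickHouse's Field, parseBinsCount and kMaxBinsLimit are made-up names, std::invalid_argument stands in for a proper ErrorCode, and the cap of 256 is only the value suggested later in this thread.

```cpp
#include <cmath>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <variant>

// Hypothetical stand-in for a dynamically typed parameter value.
// This is NOT ClickHouse's Field class.
using Param = std::variant<std::uint64_t, std::int64_t, double, std::string>;

// Hypothetical cap; 256 is the value suggested in this thread.
constexpr std::uint32_t kMaxBinsLimit = 256;

// Converts the parameter to a bin count. Throws std::invalid_argument for
// non-numeric, non-positive, too large, or fractional values.
std::uint32_t parseBinsCount(const Param & param)
{
    return std::visit([](const auto & value) -> std::uint32_t
    {
        using T = std::decay_t<decltype(value)>;
        if constexpr (std::is_same_v<T, std::string>)
        {
            throw std::invalid_argument("bins count must be numeric");
        }
        else
        {
            // Written as !(value >= 1) so that NaN is rejected as well.
            if (!(value >= 1))
                throw std::invalid_argument("bins count must be positive");
            if (value > kMaxBinsLimit)
                throw std::invalid_argument("bins count is too large");
            if constexpr (std::is_floating_point_v<T>)
            {
                if (value != std::floor(value))
                    throw std::invalid_argument("bins count must be an integer");
            }
            return static_cast<std::uint32_t>(value);
        }
    }, param);
}
```

Doing the conversion and all range checks in one place keeps max_bins == 0 from ever reaching the code that preallocates the histogram state.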
…into ssmike-CLICKHOUSE-3547
Simple test for performance and precision: it's about 8.5 million rows/sec on a single core.
Looks like the results are wrong:
The values don't look real on this query:
TODO: allocate all state in place, get rid of
The type of the parameter is not checked:
Since we preallocate memory for the maximum size of the histogram, it would be better to limit its size to 256 (or maybe 1000).
Incorrect error message:
This should throw an exception:
SELECT topK(0.2)(number) FROM (SELECT * FROM system.numbers LIMIT 50)
Received exception from server (version 1.1.54387): ¯\_(ツ)_/¯
OK, I've fixed the limits. But I don't understand what you expect to see in a histogram for one value.
efbe03f to 877acef
I am confused about how to use it for the following scenario. I have a table with two columns: the route name and the response time as a duration. I want to see a histogram of response times with a maximum duration of 5 seconds and 10 buckets. Is this possible?
Here we applied
@alexey-milovidov Awesome. That makes sense. Let's say I don't use the
Here I can understand that 10 ranges will be returned, but what is the upper limit?
The histogram will span from the actual minimum to the actual maximum of the data values.
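To illustrate why the range follows the data rather than a fixed ceiling, here is a simplified sketch of a bounded streaming histogram in the spirit of the streaming parallel decision tree approach that the ClickHouse documentation cites for histogram. It is not the implementation merged in this PR: the class StreamingHistogram and all of its members are hypothetical names, and the linear scan for the closest pair is chosen for brevity, not speed.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <vector>

// Simplified sketch of a bounded streaming histogram (hypothetical class).
class StreamingHistogram
{
public:
    struct Bin
    {
        double center;  // weighted mean of the values merged into this bin
        double weight;  // number of values merged into this bin
    };

    explicit StreamingHistogram(std::size_t max_bins_) : max_bins(max_bins_)
    {
        if (max_bins == 0)
            throw std::invalid_argument("bins count must be positive");
    }

    void add(double value)
    {
        // The reported range is simply the observed minimum and maximum.
        lower = std::min(lower, value);
        upper = std::max(upper, value);

        // Insert the value as a unit-weight bin, keeping bins sorted by center.
        auto pos = std::lower_bound(bins.begin(), bins.end(), value,
            [](const Bin & bin, double v) { return bin.center < v; });
        bins.insert(pos, Bin{value, 1.0});

        if (bins.size() > max_bins)
            mergeClosestPair();
    }

    const std::vector<Bin> & getBins() const { return bins; }
    double lowerBound() const { return lower; }
    double upperBound() const { return upper; }

private:
    // Replaces the two adjacent bins with the smallest gap by their weighted mean.
    void mergeClosestPair()
    {
        std::size_t best = 0;
        double best_gap = std::numeric_limits<double>::infinity();
        for (std::size_t i = 0; i + 1 < bins.size(); ++i)
        {
            const double gap = bins[i + 1].center - bins[i].center;
            if (gap < best_gap)
            {
                best_gap = gap;
                best = i;
            }
        }

        Bin & left = bins[best];
        const Bin & right = bins[best + 1];
        const double total = left.weight + right.weight;
        left.center = (left.center * left.weight + right.center * right.weight) / total;
        left.weight = total;
        bins.erase(bins.begin() + static_cast<std::ptrdiff_t>(best + 1));
    }

    std::size_t max_bins;
    std::vector<Bin> bins;
    double lower = std::numeric_limits<double>::infinity();
    double upper = -std::numeric_limits<double>::infinity();
};
```

Because bin centers are adjusted as values arrive, the bin edges are approximate and depend on the data; only the total weight and the observed minimum and maximum are preserved exactly in this sketch.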
@alexey-milovidov Great. Thanks.
Hi, is it possible for histogram to return fixed-size ranges for the specified number of buckets, for example 0-5, 5-10, 10-15, ...? That would be very helpful. Thanks.
@alexey-milovidov We have the following table
We would like something like
We would like to get both the sum of count and the histogram together.
I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en