
'-Resample' combinator limitations #13493

Open
RikeVoltz opened this issue Aug 7, 2020 · 16 comments

@RikeVoltz

Describe the issue
Aggregate functions with the -Resample combinator don't accept parameter expressions that could be pre-evaluated and folded into constants.

How to reproduce

CREATE TABLE IF NOT EXISTS Test
(
    `time` DateTime, 
    `id` Int32
)
ENGINE = Memory();
INSERT INTO Test VALUES ('2020-08-07 01:03:05',1), ('2020-08-07 01:03:06',2), 
                        ('2020-08-07 04:08:12',3), ('2020-08-07 15:06:13',4), 
                        ('2020-08-07 15:06:14',5), ('2020-08-07 15:06:15',6);

The following queries all fail with the same exception:

SELECT countResample(toDateTime('2020-08-07 00:00:00'), toDateTime('2020-08-08 00:00:00'), INTERVAL 1 HOUR)(id, time) FROM Test;
SELECT countResample(toUnixTimestamp('2020-08-07 00:00:00'), toUnixTimestamp('2020-08-08 00:00:00'), INTERVAL 1 HOUR)(id, time) FROM Test;
SELECT countResample(toUnixTimestamp('2020-08-07 00:00:00'), toUnixTimestamp('2020-08-08 00:00:00'), 3600)(id, time) FROM Test;
SELECT countResample(1596747600, 1596834000, INTERVAL 1 HOUR)(id, time) FROM Test;

Expected behavior

SELECT countResample(1596747600, 1596834000, 3600)(id, time) FROM Test;

┌─countResample(1596747600, 1596834000, 3600)(id, time)─┐
│ [0,2,0,0,1,0,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0,0,0,0]     │
└───────────────────────────────────────────────────────┘

Error message and/or stacktrace
Code: 134. DB::Exception: Received from localhost:19019. DB::Exception: Parameters to aggregate functions must be literals.
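Until the combinator accepts such expressions, one workaround is to precompute the boundary timestamps client-side and splice them into the query as integer literals. A minimal sketch in plain Python (the `resample_query` helper is hypothetical, not part of any ClickHouse client API; timestamps are interpreted as UTC here, so adjust to your server's timezone):

```python
from datetime import datetime, timezone

def resample_query(start: str, end: str, step_seconds: int) -> str:
    """Build a countResample query with precomputed literal parameters."""
    def to_unix(s: str) -> int:
        # Parse "yyyy-MM-dd hh:mm:ss" and convert to a Unix timestamp (UTC).
        return int(datetime.strptime(s, "%Y-%m-%d %H:%M:%S")
                   .replace(tzinfo=timezone.utc).timestamp())
    return (f"SELECT countResample({to_unix(start)}, {to_unix(end)}, "
            f"{step_seconds})(id, time) FROM Test")

query = resample_query("2020-08-07 00:00:00", "2020-08-08 00:00:00", 3600)
# The generated query contains only integer literals, which -Resample accepts.
```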

@RikeVoltz

Also, it should be investigated whether the -Resample combinator works with the DateTime type at all.

@somewheve

Can you provide me an example on how to resample n min bar?

Like:

CREATE TABLE IF NOT EXISTS Test
(
    `datetime` DateTime, 
    `last_price` Float64,
    `volume`  Int32,
    `amount`  Int32
)

@RikeVoltz

RikeVoltz commented Aug 12, 2020

Can you provide me an example on how to resample n min bar?

It depends on what you want to get in the answer, which is why it's called a "combinator" :)

For example, if you want to get the total (sum) amount of "something" for every 7 minutes between 2020-01-02 00:00:00 and 2020-01-02 01:10:00 (a 70-minute interval), you need to:

  1. Convert DateTime to Unix timestamp (2020-01-02 00:00:00=1577923200, 2020-01-02 01:10:00=1577927400);
  2. Convert bucket size to seconds (7 min = 420 sec)
  3. Use -Resample combinator with sum aggregate function:
SELECT sumResample(1577923200, 1577927400, 420)(amount, datetime) FROM Test;
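The first two conversion steps can be double-checked with a quick sketch in plain Python (timestamps taken in UTC, which matches the numbers above):

```python
from datetime import datetime, timezone

def to_unix(s: str) -> int:
    # Parse "yyyy-MM-dd hh:mm:ss" and convert to a Unix timestamp (UTC).
    return int(datetime.strptime(s, "%Y-%m-%d %H:%M:%S")
               .replace(tzinfo=timezone.utc).timestamp())

start = to_unix("2020-01-02 00:00:00")  # 1577923200
end = to_unix("2020-01-02 01:10:00")    # 1577927400
bucket = 7 * 60                         # 420 seconds
```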

So, for a dataset like this:

INSERT INTO Test VALUES ('2020-01-02 00:00:00', 123.11, 12, 10), ('2020-01-02 00:01:10', 245.12, 6, 5), ('2020-01-02 00:02:20', 789.13, 7, 11), ('2020-01-02 00:56:13', 1001.1, 31, 156), ('2020-01-02 00:57:11', 1.23, 57, 13), ('2020-01-03 00:00:00', 111.22, 12, 11);

We'll get:

SELECT sumResample(1577923200, 1577927400, 420)(amount, datetime)
FROM Test

┌─sumResample(1577923200, 1577927400, 420)(amount, datetime)─┐
│ [26,0,0,0,0,0,0,0,169,0]                                   │
└────────────────────────────────────────────────────────────┘

From my point of view, the issue is with the first two steps: ClickHouse could easily precalculate these constants for us, and it would be great if we could write queries like:

SELECT sumResample('2020-01-02 00:00:00', '2020-01-02 01:10:00', INTERVAL 7 MINUTE)(amount, datetime) FROM Test;

@somewheve

Does it work now?

@RikeVoltz

The first one, yes:

SELECT sumResample(1577923200, 1577927400, 420)(amount, datetime) FROM Test

The second one, no; that's why the issue is open:

SELECT sumResample('2020-01-02 00:00:00', '2020-01-02 01:10:00', INTERVAL 7 MINUTE)(amount, datetime) FROM Test;

@somewheve

Got it, thanks for your reply. I will try the first one.

@somewheve

When I increase the interval between start and end, I get:

infi.clickhouse_orm.database.ServerError: The range given in function sumResample contains too many elements (version 20.5.4.40 (official build)) (69)

Also, can the resample combinator only operate on one element? I'd like to get the max price, min price, open price, and last price in n-minute bars, and I also need to sum the amount and volume.

here is the 1min resample code link

If you can, could you give me some suggestions? I am new to ClickHouse.

@RikeVoltz RikeVoltz reopened this Aug 12, 2020
@RikeVoltz

infi.clickhouse_orm.database.ServerError: The range given in function sumResample contains too many elements (version 20.5.4.40 (official build)) (69)

This exception occurs when we try to split our interval into more than 4096 buckets (probably this and this lines are responsible). If your case needs more than 4096 buckets, the -Resample combinator probably isn't suitable for you, and you can use one of the other approaches.
One of them is to use a function from the toStartOf family, as suggested in the SO solution you provided (in our example, toStartOfInterval(datetime, INTERVAL 7 MINUTE), a function from the ClickHouse docs, is probably the best fit). But IMHO that's not a fully correct solution, because we can't control the StartTimestamp of the first bucket: toStartOfInterval returns a DateTime that is divisible by the interval in seconds, but the start of our interval may not be.
So a less elegant, but more correct, solution is to split the timestamps into buckets ourselves.
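Before issuing a -Resample query, one can check client-side whether the range fits under that limit. A quick sketch in plain Python (the 4096 constant is the limit mentioned above):

```python
MAX_BUCKETS = 4096  # ClickHouse's limit for -Resample ranges

def bucket_count(start_ts: int, end_ts: int, step: int) -> int:
    # Number of buckets a -Resample over [start_ts, end_ts) would create.
    return -(-(end_ts - start_ts) // step)  # ceiling division

# The 70-minute / 7-minute example fits comfortably:
n = bucket_count(1577923200, 1577927400, 420)  # 10 buckets
assert n <= MAX_BUCKETS
# But, for example, one year of 1-second buckets does not:
assert bucket_count(0, 365 * 86400, 1) > MAX_BUCKETS
```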

    WITH 
        toDateTime('2020-01-02 00:00:00') AS StartTimestamp, 
        toDateTime('2020-01-02 01:10:00') AS EndTimestamp, 
        7 * 60 AS BucketSize, 
        datetime - StartTimestamp AS RelativeDatetime, 
        intDiv(RelativeDatetime, BucketSize) AS slot
    SELECT 
        StartTimestamp + slot * BucketSize as StartOfBucket, 
        StartTimestamp + (slot+1) * BucketSize as EndOfBucket, 
        max(last_price) AS HighestPrice, 
        min(last_price) AS LowestPrice, 
        argMin(last_price, datetime) AS OpenPrice, 
        argMax(last_price, datetime) AS LatestPrice
    FROM Test
    WHERE (datetime >= StartTimestamp) AND (datetime < EndTimestamp)
    GROUP BY slot
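The slot arithmetic in the query above can be sketched in plain Python (same constants as in the query; the timestamp at 00:56:13 is one of the inserted test rows):

```python
START = 1577923200          # 2020-01-02 00:00:00 UTC
BUCKET = 7 * 60             # 420 seconds

def slot_of(ts: int) -> int:
    # Mirrors intDiv(datetime - StartTimestamp, BucketSize)
    return (ts - START) // BUCKET

def bucket_bounds(slot: int) -> tuple[int, int]:
    # Mirrors StartTimestamp + slot * BucketSize and the next boundary
    return START + slot * BUCKET, START + (slot + 1) * BUCKET

ts = START + 56 * 60 + 13   # 2020-01-02 00:56:13
s = slot_of(ts)             # slot 8
lo, hi = bucket_bounds(s)   # 00:56:00 .. 01:03:00, as in the result below
```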

The result is:

┌───────StartOfBucket─┬─────────EndOfBucket─┬─HighestPrice─┬─LowestPrice─┬─OpenPrice─┬─LatestPrice─┐
│ 2020-01-02 00:00:00 │ 2020-01-02 00:07:00 │       789.13 │      123.11 │    123.11 │      789.13 │
│ 2020-01-02 00:56:00 │ 2020-01-02 01:03:00 │       1001.1 │        1.23 │    1001.1 │        1.23 │
└─────────────────────┴─────────────────────┴──────────────┴─────────────┴───────────┴─────────────┘

Compare with the usage of toStartOfInterval function:

WITH 
    toDateTime('2020-01-02 00:00:00') AS StartTimestamp, 
    toDateTime('2020-01-02 01:10:00') AS EndTimestamp, 
    7 * 60 AS BucketSize, 
    toStartOfInterval(datetime, toIntervalMinute(7)) AS StartOfBucket
SELECT 
    StartOfBucket, 
    StartOfBucket + BucketSize AS EndOfBucket, 
    max(last_price) AS HighestPrice, 
    min(last_price) AS LowestPrice, 
    argMin(last_price, datetime) AS OpenPrice, 
    argMax(last_price, datetime) AS LatestPrice
FROM Test
WHERE (datetime >= StartTimestamp) AND (datetime < EndTimestamp)
GROUP BY StartOfBucket
ORDER BY StartOfBucket ASC

┌───────StartOfBucket─┬─────────EndOfBucket─┬─HighestPrice─┬─LowestPrice─┬─OpenPrice─┬─LatestPrice─┐
│ 2020-01-01 23:58:00 │ 2020-01-02 00:05:00 │       789.13 │      123.11 │    123.11 │      789.13 │
│ 2020-01-02 00:54:00 │ 2020-01-02 01:01:00 │       1001.1 │        1.23 │    1001.1 │        1.23 │
└─────────────────────┴─────────────────────┴──────────────┴─────────────┴───────────┴─────────────┘

P.S. The above queries may contain mistakes (and can certainly be rewritten in a shorter, cleaner way)!

@somewheve

somewheve commented Aug 12, 2020

really thanks for your reply, you are so handsome and cool ~!!

ps: Have a good day,

@RikeVoltz

RikeVoltz commented Aug 12, 2020

really thanks for your reply, you are so handsome and cool ~!!

ps: Have a good day,

You're welcome :)
P.S.: If you want the empty buckets too, you can use a trick with a JOIN:

WITH 
    toDateTime('2020-01-02 00:00:00') AS StartTimestamp, 
    toDateTime('2020-01-02 01:10:00') AS EndTimestamp, 
    7 * 60 AS BucketSize
SELECT 
    StartTimestamp + (slot * BucketSize) AS StartOfBucket, 
    StartTimestamp + ((slot + 1) * BucketSize) AS EndOfBucket, 
    HighestPrice, 
    LowestPrice, 
    OpenPrice, 
    LatestPrice
FROM 
(
    WITH 
        toDateTime('2020-01-02 00:00:00') AS StartTimestamp, 
        toDateTime('2020-01-02 01:10:00') AS EndTimestamp, 
        7 * 60 AS BucketSize, 
        datetime - StartTimestamp AS RelativeDatetime, 
        intDiv(RelativeDatetime, BucketSize) AS slot
    SELECT 
        slot, 
        max(last_price) AS HighestPrice, 
        min(last_price) AS LowestPrice, 
        argMin(last_price, datetime) AS OpenPrice, 
        argMax(last_price, datetime) AS LatestPrice
    FROM Test
    WHERE (datetime >= StartTimestamp) AND (datetime < EndTimestamp)
    GROUP BY slot
)
ANY RIGHT JOIN 
(
    WITH 
        toDateTime('2020-01-02 00:00:00') AS StartTimestamp, 
        toDateTime('2020-01-02 01:10:00') AS EndTimestamp, 
        7 * 60 AS BucketSize
    SELECT toInt32(number) AS slot
    FROM system.numbers
    LIMIT intDiv(EndTimestamp - StartTimestamp, BucketSize)
) USING (slot)
ORDER BY slot ASC
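The effect of that RIGHT JOIN against system.numbers can be sketched in plain Python: generate every slot up front, then overlay the aggregated slots, letting the rest default to zeros (the `aggregated` dict mimics the two non-empty buckets in this example):

```python
START, END, BUCKET = 1577923200, 1577927400, 420

# All slots in [START, END), mirroring the system.numbers subquery
all_slots = list(range((END - START) // BUCKET))

# Aggregates for non-empty slots only, as the inner SELECT would return them
aggregated = {0: {"high": 789.13, "low": 123.11},
              8: {"high": 1001.1, "low": 1.23}}

# RIGHT JOIN effect: every slot appears; missing ones get default zeros
rows = [aggregated.get(slot, {"high": 0, "low": 0}) for slot in all_slots]
```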

The result will be:

┌───────StartOfBucket─┬─────────EndOfBucket─┬─HighestPrice─┬─LowestPrice─┬─OpenPrice─┬─LatestPrice─┐
│ 2020-01-02 00:00:00 │ 2020-01-02 00:07:00 │       789.13 │      123.11 │    123.11 │      789.13 │
│ 2020-01-02 00:07:00 │ 2020-01-02 00:14:00 │            0 │           0 │         0 │           0 │
│ 2020-01-02 00:14:00 │ 2020-01-02 00:21:00 │            0 │           0 │         0 │           0 │
│ 2020-01-02 00:21:00 │ 2020-01-02 00:28:00 │            0 │           0 │         0 │           0 │
│ 2020-01-02 00:28:00 │ 2020-01-02 00:35:00 │            0 │           0 │         0 │           0 │
│ 2020-01-02 00:35:00 │ 2020-01-02 00:42:00 │            0 │           0 │         0 │           0 │
│ 2020-01-02 00:42:00 │ 2020-01-02 00:49:00 │            0 │           0 │         0 │           0 │
│ 2020-01-02 00:49:00 │ 2020-01-02 00:56:00 │            0 │           0 │         0 │           0 │
│ 2020-01-02 00:56:00 │ 2020-01-02 01:03:00 │       1001.1 │        1.23 │    1001.1 │        1.23 │
│ 2020-01-02 01:03:00 │ 2020-01-02 01:10:00 │            0 │           0 │         0 │           0 │
└─────────────────────┴─────────────────────┴──────────────┴─────────────┴───────────┴─────────────┘

@somewheve

ok got it ~

@somewheve

And can I use the same approach to resample by price, i.e. whenever the price moves up or down by 4?

@RikeVoltz

Probably I didn't quite understand what you meant, could you please describe it?

@somewheve

the price is

[ 3001, 3002, 3004,3006, 3007, 3008, 3009, 3001, 3002, 3002, 3003, 3004,3000] 
# I want to resample the bar to  
[3001, 3002,3004, 3006] , [ 3007, 3008, 3009, 3001] , [ 3002, 3002, 3003, 3000] 

When the difference between the lowest price and the highest price is more than 4, it should be regarded as one bar. I'd also like to calculate the max, min, open, close, sum(volume), and sum(amount).

@RikeVoltz

RikeVoltz commented Aug 12, 2020

Do you want to resample by DateTime, by another column (price or something else), or by both?

@somewheve

Not both, just a single one, maybe price or volume; it's important to analyze them.
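As an aside, a greedy segmentation along those lines can be sketched in plain Python. This assumes one particular rule (a bar closes as soon as its high-low range exceeds the threshold, keeping the breaching tick); the thread never settles on an exact rule, so the last bar differs slightly from the example above:

```python
def split_by_range(prices: list[float], threshold: float = 4) -> list[list[float]]:
    """Close the current bar once (max - min) of its ticks exceeds threshold."""
    bars, current = [], []
    for p in prices:
        current.append(p)
        if max(current) - min(current) > threshold:
            bars.append(current)
            current = []
    if current:
        bars.append(current)  # keep the trailing, still-open bar
    return bars

prices = [3001, 3002, 3004, 3006, 3007, 3008, 3009, 3001, 3002, 3002, 3003, 3004, 3000]
bars = split_by_range(prices)
# → [[3001, 3002, 3004, 3006], [3007, 3008, 3009, 3001], [3002, 3002, 3003, 3004, 3000]]
```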
