'-Resample' combinator limitations #13493
Comments
Also, it needs to be investigated whether the -Resample combinator works with the DateTime type at all.
Can you provide an example of how to resample into n-minute bars?
It depends on what you want to get in the answer; that's why it's called a "combinator" :) For example, if you want to get the total (summed) amount of "something" for every 7 minutes between 2020-01-02 00:00:00 and 2020-01-02 01:10:00 (yyyy-MM-dd HH:mm:ss, a 70-minute interval), you need to: 1) convert the start boundary to a Unix timestamp, 2) convert the end boundary to a Unix timestamp, and 3) run:
SELECT sumResample(1577923200, 1577927400, 420)(amount, datetime) FROM Test;

So, for a dataset like this:

INSERT INTO Test VALUES
    ('2020-01-02 00:00:00', 123.11, 12, 10),
    ('2020-01-02 00:01:10', 245.12, 6, 5),
    ('2020-01-02 00:02:20', 789.13, 7, 11),
    ('2020-01-02 00:56:13', 1001.1, 31, 156),
    ('2020-01-02 00:57:11', 1.23, 57, 13),
    ('2020-01-03 00:00:00', 111.22, 12, 11);

we'll get:

SELECT sumResample(1577923200, 1577927400, 420)(amount, datetime)
FROM Test

┌─sumResample(1577912400, 1577916600, 420)(amount, datetime)─┐
│ [26,0,0,0,0,0,0,0,169,0]                                   │
└────────────────────────────────────────────────────────────┘

From my point of view, the issue is in the first two steps: ClickHouse could easily precalculate those timestamps for us, and it would be great if we could write queries like:

SELECT sumResample('2020-01-02 00:00:00', '2020-01-02 01:10:00', INTERVAL 7 MINUTE)(amount, datetime) FROM Test;
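For reference, the three parameters (start, end, step) can be computed outside ClickHouse. A small sketch, assuming the DateTime values are treated as UTC (the shifted values in the result header above suggest a non-UTC server timezone):

```python
from datetime import datetime, timezone

# Derive the three sumResample parameters by hand
# (assumption: DateTime values are interpreted as UTC).
start = int(datetime(2020, 1, 2, 0, 0, tzinfo=timezone.utc).timestamp())
end = int(datetime(2020, 1, 2, 1, 10, tzinfo=timezone.utc).timestamp())
step = 7 * 60  # bucket width in seconds

print(start, end, step)       # 1577923200 1577927400 420
print((end - start) // step)  # 10, matching the 10-element result array
```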
Does it work now?
The first one (with precomputed Unix timestamps) works; the second one (with DateTime strings and INTERVAL) doesn't, which is why this issue is open.
Got it, thanks for your reply. I will try the first one.
When I try to add the interval between the start and end, I get an exception. Also, can -Resample only operate on one element? I want to get the max price, min price, open price, and last price in n-minute bars, and I also need to sum the amount and volume (here is the 1-min resample code link). Could you give me some suggestions? I am new to ClickHouse.
This exception occurs when we try to split the interval into more than 4096 buckets (probably this and this line are responsible for it). If there are more than 4096 buckets in your case, the -Resample combinator probably isn't suitable for you, and you can use one of the other approaches instead:

WITH
toDateTime('2020-01-02 00:00:00') AS StartTimestamp,
toDateTime('2020-01-02 01:10:00') AS EndTimestamp,
7 * 60 AS BucketSize,
datetime - StartTimestamp AS RelativeDatetime,
intDiv(RelativeDatetime, BucketSize) AS slot
SELECT
StartTimestamp + slot * BucketSize as StartOfBucket,
StartTimestamp + (slot+1) * BucketSize as EndOfBucket,
max(last_price) AS HighestPrice,
min(last_price) AS LowestPrice,
argMin(last_price, datetime) AS OpenPrice,
argMax(last_price, datetime) AS LatestPrice
FROM Test
WHERE (datetime >= StartTimestamp) AND (datetime < EndTimestamp)
GROUP BY slot

The result is:

┌───────StartOfBucket─┬─────────EndOfBucket─┬─HighestPrice─┬─LowestPrice─┬─OpenPrice─┬─LatestPrice─┐
│ 2020-01-02 00:00:00 │ 2020-01-02 00:07:00 │ 789.13 │ 123.11 │ 123.11 │ 789.13 │
│ 2020-01-02 00:56:00 │ 2020-01-02 01:03:00 │ 1001.1 │ 1.23 │ 1001.1 │ 1.23 │
└─────────────────────┴─────────────────────┴──────────────┴─────────────┴───────────┴─────────────┘

Compare with the usage of toStartOfInterval:

WITH
toDateTime('2020-01-02 00:00:00') AS StartTimestamp,
toDateTime('2020-01-02 01:10:00') AS EndTimestamp,
7 * 60 AS BucketSize,
toStartOfInterval(datetime, toIntervalMinute(7)) AS StartOfBucket
SELECT
StartOfBucket,
StartOfBucket + BucketSize AS EndOfBucket,
max(last_price) AS HighestPrice,
min(last_price) AS LowestPrice,
argMin(last_price, datetime) AS OpenPrice,
argMax(last_price, datetime) AS LatestPrice
FROM Test
WHERE (datetime >= StartTimestamp) AND (datetime < EndTimestamp)
GROUP BY StartOfBucket
ORDER BY StartOfBucket ASC
The result is:

┌───────StartOfBucket─┬─────────EndOfBucket─┬─HighestPrice─┬─LowestPrice─┬─OpenPrice─┬─LatestPrice─┐
│ 2020-01-01 23:58:00 │ 2020-01-02 00:05:00 │ 789.13 │ 123.11 │ 123.11 │ 789.13 │
│ 2020-01-02 00:54:00 │ 2020-01-02 01:01:00 │ 1001.1 │ 1.23 │ 1001.1 │ 1.23 │
└─────────────────────┴─────────────────────┴──────────────┴─────────────┴───────────┴─────────────┘
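Note that the two results above bucket the data differently: the intDiv variant aligns bucket boundaries to StartTimestamp, while toStartOfInterval snaps them to its own fixed grid (influenced by the server timezone, which is why the second result starts at 23:58:00). A simplified sketch of the two alignments (a plain epoch-based grid here is an approximation of ClickHouse's timezone-aware behavior):

```python
BUCKET = 7 * 60  # 420 seconds

def start_aligned_slot(ts, start):
    # intDiv(datetime - StartTimestamp, BucketSize): buckets begin at `start`
    return (ts - start) // BUCKET

def grid_aligned_bucket_start(ts):
    # toStartOfInterval-style: buckets begin on a fixed grid, ignoring `start`
    return ts - ts % BUCKET

start = 1577923200 + 120  # a start timestamp 2 minutes off the fixed grid
ts = start + 60           # an event one minute after `start`

print(start_aligned_slot(ts, start))          # 0
print(grid_aligned_bucket_start(ts) - start)  # -120: grid bucket opens before `start`
```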
P.S. The queries above may be imperfect (and, of course, could be rewritten in a shorter and more correct way)!
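Before reaching for either workaround, it may help to check the bucket count against the 4096 limit mentioned above. A trivial sketch (the limit value is taken from this discussion; the exact boundary condition in ClickHouse may differ):

```python
# Check whether an interval fits the -Resample bucket limit before querying.
MAX_BUCKETS = 4096  # limit per the discussion above

def resample_fits(start, end, step, limit=MAX_BUCKETS):
    """Return True if splitting [start, end) by step stays within the limit."""
    return (end - start) // step <= limit

print(resample_fits(1577923200, 1577927400, 420))  # True: only 10 buckets
print(resample_fits(0, 4097 * 420, 420))           # False: 4097 buckets
```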
Really, thanks for your reply! P.S. Have a good day.
You're welcome :) If you also want rows for the empty buckets, you can right-join the aggregation against a generated list of slot numbers:

WITH
toDateTime('2020-01-02 00:00:00') AS StartTimestamp,
toDateTime('2020-01-02 01:10:00') AS EndTimestamp,
7 * 60 AS BucketSize
SELECT
StartTimestamp + (slot * BucketSize) AS StartOfBucket,
StartTimestamp + ((slot + 1) * BucketSize) AS EndOfBucket,
HighestPrice,
LowestPrice,
OpenPrice,
LatestPrice
FROM
(
WITH
toDateTime('2020-01-02 00:00:00') AS StartTimestamp,
toDateTime('2020-01-02 01:10:00') AS EndTimestamp,
7 * 60 AS BucketSize,
datetime - StartTimestamp AS RelativeDatetime,
intDiv(RelativeDatetime, BucketSize) AS slot
SELECT
slot,
max(last_price) AS HighestPrice,
min(last_price) AS LowestPrice,
argMin(last_price, datetime) AS OpenPrice,
argMax(last_price, datetime) AS LatestPrice
FROM Test
WHERE (datetime >= StartTimestamp) AND (datetime < EndTimestamp)
GROUP BY slot
)
ANY RIGHT JOIN
(
WITH
toDateTime('2020-01-02 00:00:00') AS StartTimestamp,
toDateTime('2020-01-02 01:10:00') AS EndTimestamp,
7 * 60 AS BucketSize
SELECT toInt32(number) AS slot
FROM system.numbers
LIMIT intDiv(EndTimestamp - StartTimestamp, BucketSize)
) USING (slot)
ORDER BY slot ASC

The result will be:

┌───────StartOfBucket─┬─────────EndOfBucket─┬─HighestPrice─┬─LowestPrice─┬─OpenPrice─┬─LatestPrice─┐
│ 2020-01-02 00:00:00 │ 2020-01-02 00:07:00 │ 789.13 │ 123.11 │ 123.11 │ 789.13 │
│ 2020-01-02 00:07:00 │ 2020-01-02 00:14:00 │ 0 │ 0 │ 0 │ 0 │
│ 2020-01-02 00:14:00 │ 2020-01-02 00:21:00 │ 0 │ 0 │ 0 │ 0 │
│ 2020-01-02 00:21:00 │ 2020-01-02 00:28:00 │ 0 │ 0 │ 0 │ 0 │
│ 2020-01-02 00:28:00 │ 2020-01-02 00:35:00 │ 0 │ 0 │ 0 │ 0 │
│ 2020-01-02 00:35:00 │ 2020-01-02 00:42:00 │ 0 │ 0 │ 0 │ 0 │
│ 2020-01-02 00:42:00 │ 2020-01-02 00:49:00 │ 0 │ 0 │ 0 │ 0 │
│ 2020-01-02 00:49:00 │ 2020-01-02 00:56:00 │ 0 │ 0 │ 0 │ 0 │
│ 2020-01-02 00:56:00 │ 2020-01-02 01:03:00 │ 1001.1 │ 1.23 │ 1001.1 │ 1.23 │
│ 2020-01-02 01:03:00 │ 2020-01-02 01:10:00 │ 0 │ 0 │ 0 │ 0 │
└─────────────────────┴─────────────────────┴──────────────┴─────────────┴───────────┴─────────────┘
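The RIGHT JOIN with system.numbers is what produces the zero rows for the empty buckets. The same idea, stripped to its essentials (slot values taken from the HighestPrice column of the result above):

```python
# Aggregate into sparse slots, then emit every slot in the range,
# defaulting missing ones to 0 -- the gap-filling done by the JOIN above.
start, end, bucket = 1577923200, 1577927400, 420
sparse = {0: 789.13, 8: 1001.1}  # slot -> HighestPrice, from the result above

n_slots = (end - start) // bucket  # 10
filled = [(start + s * bucket, sparse.get(s, 0)) for s in range(n_slots)]
print(len(filled))  # 10 rows, two of them non-zero
```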
OK, got it!
And can I use the same approach to resample by price, e.g. whenever the price moves up or down by 4, right?
I probably didn't quite understand what you meant; could you please describe it?
About the price: when the difference between the lowest price and the highest price is more than 4, it should be regarded as a bar.
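What is described here is usually called a "range bar": close the current bar whenever high minus low exceeds a threshold. A minimal sketch of that logic (names and sample prices are illustrative, not from this thread):

```python
def range_bars(prices, threshold=4):
    """Split a price series into bars whenever (high - low) exceeds threshold."""
    bars = []
    lo = hi = open_ = None
    for p in prices:
        if lo is None:           # start a new bar on the first price
            lo = hi = open_ = p
            continue
        lo, hi = min(lo, p), max(hi, p)
        if hi - lo > threshold:  # range exceeded: close this bar
            bars.append({"open": open_, "high": hi, "low": lo, "close": p})
            lo = hi = open_ = None
    return bars

print(range_bars([10, 11, 9, 15, 14, 20]))  # two bars
```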
Do you want to resample by the DateTime column, by another column (price, something else), or by both?
Not both, just a single one, maybe price or volume; it's important to analyze them.
Describe the issue
Aggregate functions with -Resample combinator don't work with some expressions that can be pre-evaluated and converted to constants.
How to reproduce
The following queries give the same exception:
Expected behavior
Error message and/or stacktrace
Code: 134. DB::Exception: Received from localhost:19019. DB::Exception: Parameters to aggregate functions must be literals.