Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ClickHouse] 按时间做累加的SQL怎么写? #11

Open
GarinZ opened this issue Jun 16, 2022 · 0 comments
Open

[ClickHouse] 按时间做累加的SQL怎么写? #11

GarinZ opened this issue Jun 16, 2022 · 0 comments

Comments

@GarinZ
Copy link
Owner

GarinZ commented Jun 16, 2022

点开文章的朋友们可以先看看讲的是不是你要的效果
初始数据:假设我们的表如下所示,时间戳(ts)我简化成了1到5的数字。

ts amount
1 10
2 5
3 3
4 10
5 5

处理结果:SQL执行完想变成这样

ts amount
1 10
2 15
3 18
4 22
5 27

我先放最终的完整SQL,然后再解释里面让人迷茫的Clickhouse函数。

SELECT ts, arraySum(arraySlice(total, 1, i)) as total_num
FROM (
  SELECT
    groupArray(ts) as t,
    groupArray(amount) as total
  FROM (
    SELECT ts, amount
    FROM [表名]
    WHERE ts >= [开始时间]
    AND ts <= [结束时间]
  )
)
ARRAY JOIN
  t as ts,
  arrayEnumerate(total) as i

STEP1:groupArray

里面除了SQL的基本语法之外,主要有4个看着就蛋疼的Clickhouse函数。我们从最内层往外看。

SELECT
  groupArray(ts) as t,
  groupArray(amount) as total
FROM (
  SELECT ts, amount
  FROM [表名]
  WHERE ts >= [开始时间]
  AND ts <= [结束时间]
)

这个SQL中groupArray比较陌生。这函数简单来说就是把列转成行,执行后的效果就是

t total
[1,2,3,4,5] [10,5,3,10,5]

STEP2:ARRAY JOIN

接下来只看ARRAY JOINarrayEnumerate,我把SQL中只保留ARRAY JOIN

SELECT t, total
FROM (
  SELECT
    groupArray(ts) as t,
    groupArray(amount) as total
  FROM (
    SELECT ts, amount
    FROM [表名]
    WHERE ts >= [开始时间]
    AND ts <= [结束时间]
  )
) ARRAY JOIN t

ARRAY JOIN的主要作用是把数组展开

t total
1 [10,5,3,10,5]
2 [10,5,3,10,5]
3 [10,5,3,10,5]
4 [10,5,3,10,5]
5 [10,5,3,10,5]

STEP3:带alias的ARRAY JOIN

原始SQL中还带了alias有啥用?当我们ARRAY JOIN t的时候,t在SQL中就代表了数组中的元素,而不是数组本身。那如果在查询结果中既包含元素、也包含数组本身怎么办?就得再加上alias了。s

SELECT t, total, ts
FROM (
  SELECT
    groupArray(ts) as t,
    groupArray(amount) as total
  FROM (
    SELECT ts, amount
    FROM [表名]
    WHERE ts >= [开始时间]
    AND ts <= [结束时间]
  )
) ARRAY JOIN t as ts

使用了alias的话,t就变成了数组的引用,ts就变成了数组元素的引用

t total ts
[1,2,3,4,5] [10,5,3,10,5] 1
[1,2,3,4,5] [10,5,3,10,5] 2
[1,2,3,4,5] [10,5,3,10,5] 3
[1,2,3,4,5] [10,5,3,10,5] 4
[1,2,3,4,5] [10,5,3,10,5] 5

STEP4:arrayEnumerate

arrayEnumerate这个方法就和row_number()一样,简单理解就是返回行号。

SELECT t, total, ts, i
FROM (
  SELECT
    groupArray(ts) as t,
    groupArray(amount) as total
  FROM (
    SELECT ts, amount
    FROM [表名]
    WHERE ts >= [开始时间]
    AND ts <= [结束时间]
  )
) ARRAY JOIN 
  t as ts,
  arrayEnumerate(total) as i

查询结果如下:

t total ts i
[1,2,3,4,5] [10,5,3,10,5] 1 1
[1,2,3,4,5] [10,5,3,10,5] 2 2
[1,2,3,4,5] [10,5,3,10,5] 3 3
[1,2,3,4,5] [10,5,3,10,5] 4 4
[1,2,3,4,5] [10,5,3,10,5] 5 5

STEP5:arraySlice和arraySum

那我们绕这么一圈对咱们的目标有啥帮助呢?憋着急,这时候就要用到arraySlicearraySum。我们再看看我们现在已经查询出来的结果。

t total ts i
[1,2,3,4,5] [10,5,3,10,5] 1 1
[1,2,3,4,5] [10,5,3,10,5] 2 2
[1,2,3,4,5] [10,5,3,10,5] 3 3
[1,2,3,4,5] [10,5,3,10,5] 4 4
[1,2,3,4,5] [10,5,3,10,5] 5 5

大家可能已经发现了,根据我们现有的数据,已经能从单行的数据中计算目标数据:

  • 第一行:10 = total[0]
  • 第二行:15 = total[0] + total[1]
  • 以此类推

咱们利用Clickhouse的函数就可以做这种处理:

  • arraySlice:截断数组,arraySlice(数组变量名称, 开始下标, 结束下标),注意开始下标从1开始,并且区间左闭右开[开始下标, 结束下标)。
  • arraySum:将数组中所有元素做加和。

所以咱们再看看这个SQL,截断每个total数组,截取到行号i的前一位。然后把数组加起来。

SELECT ts, arraySum(arraySlice(total, 1, i)) as total_num
FROM (
  SELECT
    groupArray(ts) as t,
    groupArray(amount) as total
  FROM (
    SELECT ts, amount
    FROM [表名]
    WHERE ts >= [开始时间]
    AND ts <= [结束时间]
  )
)
ARRAY JOIN
  t as ts,
  arrayEnumerate(total) as i

最终获得结果就是

ts total_num
1 10
2 15
3 18
4 22
5 27

结尾

按时间累加这个SQL第一次写的话我觉得还是挺绕的,大家可以把Clickhouse函数和细节放一边,先记忆查询结果的处理过程。

accumulation-by-time-range-query-view-steps

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant