PostgreSQL 14 preview - date_bin 任意起点, 任意bucket(interval) split bucket align 统计 - 时序场景(iot, 金融等), 关联timescaledb
digoal
2021-03-25
PostgreSQL , date_bin , 截断 , 统计
常用于BI系统, 从指定时间点开始, 按指定interval分割bucket, 输入一个ts返回它对应的bucket(这个bucket的开始时间), 通常用于group聚合统计. 如果能返回这是第几个bucket就更好了.
Add date_bin function
author Peter Eisentraut <peter@eisentraut.org>
Wed, 24 Mar 2021 15:16:14 +0000 (16:16 +0100)
committer Peter Eisentraut <peter@eisentraut.org>
Wed, 24 Mar 2021 15:18:24 +0000 (16:18 +0100)
commit 49ab61f0bdc93984a8d36b602f6f2a15f09ebcc7
tree 9002c71f5585965e1b3a685fe1edbd07d3e84d8e tree
parent 1509c6fc29c07d13c9a590fbd6f37c7576f58ba6 commit | diff
Add date_bin function
Similar to date_trunc, but allows binning by an arbitrary interval
rather than just full units.
Author: John Naylor <john.naylor@enterprisedb.com>
Reviewed-by: David Fetter <david@fetter.org>
Reviewed-by: Isaac Morland <isaac.morland@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Artur Zakirov <zaartur@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CACPNZCt4buQFRgy6DyjuZS-2aPDpccRkrJBmgUfwYc1KiaXYxg@mail.gmail.com
https://www.postgresql.org/docs/devel/functions-datetime.html#FUNCTIONS-DATETIME-BIN
9.9.3. date_bin
The function date_bin “bins” the input timestamp into the specified interval (the stride) aligned with a specified origin.
Examples:
SELECT date_bin('15 minutes', TIMESTAMP '2020-02-11 15:44:17', TIMESTAMP '2001-01-01');
Result: 2020-02-11 15:30:00
SELECT date_bin('15 minutes', TIMESTAMP '2020-02-11 15:44:17', TIMESTAMP '2001-01-01 00:02:30');
Result: 2020-02-11 15:32:30
In the case of full units (1 minute, 1 hour, etc.), it gives the same result as the analogous date_trunc call, but the difference is that date_bin can truncate to an arbitrary interval.
The stride interval cannot contain units of month or larger.
+-- verify date_bin behaves the same as date_trunc for relevant intervals
+-- case 1: AD dates, origin < input
+SELECT
+ str,
+ interval,
+ date_trunc(str, ts) = date_bin(interval::interval, ts, timestamp '2001-01-01') AS equal
+FROM (
+ VALUES
+ ('week', '7 d'),
+ ('day', '1 d'),
+ ('hour', '1 h'),
+ ('minute', '1 m'),
+ ('second', '1 s'),
+ ('millisecond', '1 ms'),
+ ('microsecond', '1 us')
+) intervals (str, interval),
+(VALUES (timestamp '2020-02-29 15:44:17.71393')) ts (ts);
+ str | interval | equal
+-------------+----------+-------
+ week | 7 d | t
+ day | 1 d | t
+ hour | 1 h | t
+ minute | 1 m | t
+ second | 1 s | t
+ millisecond | 1 ms | t
+ microsecond | 1 us | t
+(7 rows)
+
+-- case 2: BC dates, origin < input
+SELECT
+ str,
+ interval,
+ date_trunc(str, ts) = date_bin(interval::interval, ts, timestamp '2000-01-01 BC') AS equal
+FROM (
+ VALUES
+ ('week', '7 d'),
+ ('day', '1 d'),
+ ('hour', '1 h'),
+ ('minute', '1 m'),
+ ('second', '1 s'),
+ ('millisecond', '1 ms'),
+ ('microsecond', '1 us')
+) intervals (str, interval),
+(VALUES (timestamp '0055-6-10 15:44:17.71393 BC')) ts (ts);
+ str | interval | equal
+-------------+----------+-------
+ week | 7 d | t
+ day | 1 d | t
+ hour | 1 h | t
+ minute | 1 m | t
+ second | 1 s | t
+ millisecond | 1 ms | t
+ microsecond | 1 us | t
+(7 rows)
+
+-- bin timestamps into arbitrary intervals
+SELECT
+ interval,
+ ts,
+ origin,
+ date_bin(interval::interval, ts, origin)
+FROM (
+ VALUES
+ ('15 days'),
+ ('2 hours'),
+ ('1 hour 30 minutes'),
+ ('15 minutes'),
+ ('10 seconds'),
+ ('100 milliseconds'),
+ ('250 microseconds')
+) intervals (interval),
+(VALUES (timestamp '2020-02-11 15:44:17.71393')) ts (ts),
+(VALUES (timestamp '2001-01-01')) origin (origin);
+ interval | ts | origin | date_bin
+-------------------+--------------------------------+--------------------------+--------------------------------
+ 15 days | Tue Feb 11 15:44:17.71393 2020 | Mon Jan 01 00:00:00 2001 | Thu Feb 06 00:00:00 2020
+ 2 hours | Tue Feb 11 15:44:17.71393 2020 | Mon Jan 01 00:00:00 2001 | Tue Feb 11 14:00:00 2020
+ 1 hour 30 minutes | Tue Feb 11 15:44:17.71393 2020 | Mon Jan 01 00:00:00 2001 | Tue Feb 11 15:00:00 2020
+ 15 minutes | Tue Feb 11 15:44:17.71393 2020 | Mon Jan 01 00:00:00 2001 | Tue Feb 11 15:30:00 2020
+ 10 seconds | Tue Feb 11 15:44:17.71393 2020 | Mon Jan 01 00:00:00 2001 | Tue Feb 11 15:44:10 2020
+ 100 milliseconds | Tue Feb 11 15:44:17.71393 2020 | Mon Jan 01 00:00:00 2001 | Tue Feb 11 15:44:17.7 2020
+ 250 microseconds | Tue Feb 11 15:44:17.71393 2020 | Mon Jan 01 00:00:00 2001 | Tue Feb 11 15:44:17.71375 2020
+(7 rows)
+
+-- shift bins using the origin parameter:
+SELECT date_bin('5 min'::interval, timestamp '2020-02-01 01:01:01', timestamp '2020-02-01 00:02:30');
+ date_bin
+--------------------------
+ Sat Feb 01 00:57:30 2020
+(1 row)
+
+-- disallow intervals with months or years
+SELECT date_bin('5 months'::interval, timestamp '2020-02-01 01:01:01', timestamp '2001-01-01');
+ERROR: timestamps cannot be binned into intervals containing months or years
+SELECT date_bin('5 years'::interval, timestamp '2020-02-01 01:01:01', timestamp '2001-01-01');
+ERROR: timestamps cannot be binned into intervals containing months or years
timescaledb例子, 使用PG 14新增函数就可以做group by得到类似time_bucket的效果
-- For a specific machine, what are its avg, min, and max temperature
-- readings over time to ensure it's in proper operating range?
SELECT time_bucket('10 seconds', time) AS period,
min(temperature) AS min_temp,
avg(temperature) AS avg_temp,
max(temperature) AS max_temp,
FROM measurements
WHERE machine_id = 'C931baF7'
AND time > NOW() - interval '150s'
GROUP BY period
ORDER BY period DESC;
| period | min_temp | avg_temp | max_temp |
|------------|----------|----------|----------|
| 1499792590 | 52 | 50 | 54 |
| 1499792580 | 52 | 50 | 54 |
| 1499792570 | 51 | 50 | 54 |
| 1499792560 | 50 | 48 | 52 |
| 1499792550 | 50 | 48 | 51 |
| 1499792540 | 49 | 48 | 51 |
| 1499792530 | 50 | 47 | 52 |
| 1499792520 | 50 | 48 | 51 |
| 1499792510 | 51 | 48 | 53 |
| 1499792500 | 51 | 50 | 54 |
您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议,奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等,奖品丰富,快来许愿。开不开森.