Skip to content

Latest commit

 

History

History
202 lines (170 loc) · 7.96 KB

20210325_01.md

File metadata and controls

202 lines (170 loc) · 7.96 KB

PostgreSQL 14 preview - date_bin 任意起点, 任意bucket(interval) split bucket align 统计 - 时序场景(iot, 金融等), 关联timescaledb

作者

digoal

日期

2021-03-25

标签

PostgreSQL , date_bin , 截断 , 统计


背景

常用于BI系统, 从指定时间点开始, 按指定interval分割bucket, 输入一个ts返回它对应的bucket(这个bucket的开始时间), 通常用于group聚合统计. 如果能返回这是第几个bucket就更好了.

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=49ab61f0bdc93984a8d36b602f6f2a15f09ebcc7

Add date_bin function  
author	Peter Eisentraut <peter@eisentraut.org>	  
Wed, 24 Mar 2021 15:16:14 +0000 (16:16 +0100)  
committer	Peter Eisentraut <peter@eisentraut.org>	  
Wed, 24 Mar 2021 15:18:24 +0000 (16:18 +0100)  
commit	49ab61f0bdc93984a8d36b602f6f2a15f09ebcc7  
tree	9002c71f5585965e1b3a685fe1edbd07d3e84d8e	tree  
parent	1509c6fc29c07d13c9a590fbd6f37c7576f58ba6	commit | diff  
Add date_bin function  
  
Similar to date_trunc, but allows binning by an arbitrary interval  
rather than just full units.  
  
Author: John Naylor <john.naylor@enterprisedb.com>  
Reviewed-by: David Fetter <david@fetter.org>  
Reviewed-by: Isaac Morland <isaac.morland@gmail.com>  
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>  
Reviewed-by: Artur Zakirov <zaartur@gmail.com>  
Discussion: https://www.postgresql.org/message-id/flat/CACPNZCt4buQFRgy6DyjuZS-2aPDpccRkrJBmgUfwYc1KiaXYxg@mail.gmail.com  

https://www.postgresql.org/docs/devel/functions-datetime.html#FUNCTIONS-DATETIME-BIN

9.9.3. date_bin
The function date_bin “bins” the input timestamp into the specified interval (the stride) aligned with a specified origin.

Examples:

SELECT date_bin('15 minutes', TIMESTAMP '2020-02-11 15:44:17', TIMESTAMP '2001-01-01');
Result: 2020-02-11 15:30:00

SELECT date_bin('15 minutes', TIMESTAMP '2020-02-11 15:44:17', TIMESTAMP '2001-01-01 00:02:30');
Result: 2020-02-11 15:32:30

In the case of full units (1 minute, 1 hour, etc.), it gives the same result as the analogous date_trunc call, but the difference is that date_bin can truncate to an arbitrary interval.

The stride interval cannot contain units of month or larger.

+-- verify date_bin behaves the same as date_trunc for relevant intervals  
+-- case 1: AD dates, origin < input  
+SELECT  
+  str,  
+  interval,  
+  date_trunc(str, ts) = date_bin(interval::interval, ts, timestamp '2001-01-01') AS equal  
+FROM (  
+  VALUES  
+  ('week', '7 d'),  
+  ('day', '1 d'),  
+  ('hour', '1 h'),  
+  ('minute', '1 m'),  
+  ('second', '1 s'),  
+  ('millisecond', '1 ms'),  
+  ('microsecond', '1 us')  
+) intervals (str, interval),  
+(VALUES (timestamp '2020-02-29 15:44:17.71393')) ts (ts);  
+     str     | interval | equal   
+-------------+----------+-------  
+ week        | 7 d      | t  
+ day         | 1 d      | t  
+ hour        | 1 h      | t  
+ minute      | 1 m      | t  
+ second      | 1 s      | t  
+ millisecond | 1 ms     | t  
+ microsecond | 1 us     | t  
+(7 rows)  
+  
+-- case 2: BC dates, origin < input  
+SELECT  
+  str,  
+  interval,  
+  date_trunc(str, ts) = date_bin(interval::interval, ts, timestamp '2000-01-01 BC') AS equal  
+FROM (  
+  VALUES  
+  ('week', '7 d'),  
+  ('day', '1 d'),  
+  ('hour', '1 h'),  
+  ('minute', '1 m'),  
+  ('second', '1 s'),  
+  ('millisecond', '1 ms'),  
+  ('microsecond', '1 us')  
+) intervals (str, interval),  
+(VALUES (timestamp '0055-6-10 15:44:17.71393 BC')) ts (ts);  
+     str     | interval | equal   
+-------------+----------+-------  
+ week        | 7 d      | t  
+ day         | 1 d      | t  
+ hour        | 1 h      | t  
+ minute      | 1 m      | t  
+ second      | 1 s      | t  
+ millisecond | 1 ms     | t  
+ microsecond | 1 us     | t  
+(7 rows)  
+  
+-- bin timestamps into arbitrary intervals  
+SELECT  
+  interval,  
+  ts,  
+  origin,  
+  date_bin(interval::interval, ts, origin)  
+FROM (  
+  VALUES  
+  ('15 days'),  
+  ('2 hours'),  
+  ('1 hour 30 minutes'),  
+  ('15 minutes'),  
+  ('10 seconds'),  
+  ('100 milliseconds'),  
+  ('250 microseconds')  
+) intervals (interval),  
+(VALUES (timestamp '2020-02-11 15:44:17.71393')) ts (ts),  
+(VALUES (timestamp '2001-01-01')) origin (origin);  
+     interval      |               ts               |          origin          |            date_bin              
+-------------------+--------------------------------+--------------------------+--------------------------------  
+ 15 days           | Tue Feb 11 15:44:17.71393 2020 | Mon Jan 01 00:00:00 2001 | Thu Feb 06 00:00:00 2020  
+ 2 hours           | Tue Feb 11 15:44:17.71393 2020 | Mon Jan 01 00:00:00 2001 | Tue Feb 11 14:00:00 2020  
+ 1 hour 30 minutes | Tue Feb 11 15:44:17.71393 2020 | Mon Jan 01 00:00:00 2001 | Tue Feb 11 15:00:00 2020  
+ 15 minutes        | Tue Feb 11 15:44:17.71393 2020 | Mon Jan 01 00:00:00 2001 | Tue Feb 11 15:30:00 2020  
+ 10 seconds        | Tue Feb 11 15:44:17.71393 2020 | Mon Jan 01 00:00:00 2001 | Tue Feb 11 15:44:10 2020  
+ 100 milliseconds  | Tue Feb 11 15:44:17.71393 2020 | Mon Jan 01 00:00:00 2001 | Tue Feb 11 15:44:17.7 2020  
+ 250 microseconds  | Tue Feb 11 15:44:17.71393 2020 | Mon Jan 01 00:00:00 2001 | Tue Feb 11 15:44:17.71375 2020  
+(7 rows)  
+  
+-- shift bins using the origin parameter:  
+SELECT date_bin('5 min'::interval, timestamp '2020-02-01 01:01:01', timestamp '2020-02-01 00:02:30');  
+         date_bin           
+--------------------------  
+ Sat Feb 01 00:57:30 2020  
+(1 row)  
+  
+-- disallow intervals with months or years  
+SELECT date_bin('5 months'::interval, timestamp '2020-02-01 01:01:01', timestamp '2001-01-01');  
+ERROR:  timestamps cannot be binned into intervals containing months or years  
+SELECT date_bin('5 years'::interval,  timestamp '2020-02-01 01:01:01', timestamp '2001-01-01');  
+ERROR:  timestamps cannot be binned into intervals containing months or years  

timescaledb例子, 使用PG 14新增函数就可以做group by得到类似time_bucket的效果

-- For a specific machine, what are its avg, min, and max temperature
-- readings over time to ensure it's in proper operating range?
SELECT time_bucket('10 seconds', time) AS period,
  min(temperature) AS min_temp,
  avg(temperature) AS avg_temp,
  max(temperature) AS max_temp,
FROM measurements
WHERE machine_id = 'C931baF7'
  AND time > NOW() - interval '150s'
GROUP BY period
ORDER BY period DESC;


| period     | min_temp | avg_temp | max_temp |
|------------|----------|----------|----------|
| 1499792590 | 52       | 50       | 54       |
| 1499792580 | 52       | 50       | 54       |
| 1499792570 | 51       | 50       | 54       |
| 1499792560 | 50       | 48       | 52       |
| 1499792550 | 50       | 48       | 51       |
| 1499792540 | 49       | 48       | 51       |
| 1499792530 | 50       | 47       | 52       |
| 1499792520 | 50       | 48       | 51       |
| 1499792510 | 51       | 48       | 53       |
| 1499792500 | 51       | 50       | 54       |

您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议,奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等,奖品丰富,快来许愿。开不开森.

digoal's wechat