PostgreSQL - 时序、IoT类场景 - first_value , last_value , agg , cte , window , recursive

作者

digoal

日期

2021-04-29

背景

知识点:

cte
window
recursive
agg
插件

TSDB的两个分析函数, 例如求分组的first_value和last_value

https://help.aliyun.com/document_detail/116041.html#h2-time_bucket-

示例：

first

根据device_id分组获得第一个温度值  
SELECT device_id, first(temp, time)  
FROM metrics  
GROUP BY device_id;  
警告⚠️: first 函数通过线性扫描计算结果，不使用索引。

last

根据device_id分组获得最后一个温度值  
SELECT device_id, last(temp, time)  
FROM metrics  
GROUP BY device_id;  
警告⚠️:last 函数通过线性扫描计算结果，不使用索引。

PostgreSQL 内置没有包含只取第一或最后一条的聚合函数.

https://www.postgresql.org/docs/current/functions-aggregate.html

《PostgreSQL aggregate function 1 : General-Purpose Aggregate Functions》
《PostgreSQL aggregate function 2 : Aggregate Functions for Statistics》
《PostgreSQL aggregate function 3 : Aggregate Functions for Ordered-Set》
《PostgreSQL aggregate function 4 : Hypothetical-Set Aggregate Functions》

但是有PostgreSQL插件可以支持, 也可以自定义, 如下:

https://wiki.postgresql.org/wiki/First/last_(aggregate)
http://pgxn.org/dist/first_last_agg/

还有更多统计和科学计算的函数可以在pgxn中找到

接下来展示2种内置方法:

创建测试表, 写入测试数据100万条

create table test (devid int, val float4, crt_time timestamp);   
insert into test select random()*20, random()*100, now()+((random()*2000::numeric)||' hour')::interval from generate_series(1,1000000);  
create index idx_test on test (devid, crt_time);

方法1、窗口实现

select devid,   
  min(val) filter (where rn1=1) as first_value,   
  min(val) filter (where rn2=1) as last_value  
from (  
  select   
    *,   
    row_number() over w1 as rn1,  
    row_number() over w2 as rn2   
  from  
  test   
    window   
      w1 as (partition by devid order by crt_time),  
      w2 as (partition by devid order by crt_time desc)  
) t  
group by devid;   
  
                                         QUERY PLAN                                            
---------------------------------------------------------------------------------------------  
 GroupAggregate  (cost=272322.19..314822.40 rows=21 width=12)  
   Group Key: test.devid  
   ->  WindowAgg  (cost=272322.19..292322.19 rows=1000000 width=32)  
         ->  Sort  (cost=272322.19..274822.19 rows=1000000 width=24)  
               Sort Key: test.devid, test.crt_time  
               ->  WindowAgg  (cost=132154.34..152154.34 rows=1000000 width=24)  
                     ->  Sort  (cost=132154.34..134654.34 rows=1000000 width=16)  
                           Sort Key: test.devid, test.crt_time DESC  
                           ->  Seq Scan on test  (cost=0.00..15406.00 rows=1000000 width=16)  
 JIT:  
   Functions: 13  
   Options: Inlining false, Optimization false, Expressions true, Deforming true  
(12 rows)  
  
 devid | first_value | last_value    
-------+-------------+-------------  
     0 |    69.20287 |     92.5132  
     1 |    40.00488 |   79.331635  
     2 |    81.52082 |   24.342728  
     3 |    56.46295 | 0.083137356  
     4 |   53.477367 |    36.13544  
     5 |     80.3527 |   72.615295  
     6 |   42.685253 |    24.84936  
     7 |   44.414867 |   92.358284  
     8 |    46.09131 |    93.73747  
     9 |    69.82049 |    70.78088  
    10 |   15.770509 |    7.430693  
    11 |    23.28746 |   17.332998  
    12 |   53.020107 |   7.4527717  
    13 |    95.81737 |   87.950645  
    14 |   16.533426 |   41.366306  
    15 |   20.856157 |   51.702602  
    16 |   35.354103 |    91.17102  
    17 |   14.365077 |    75.35907  
    18 |   81.229645 |    88.41152  
    19 |   3.7511733 |    75.90414  
    20 |   6.8691735 |   69.208405  
(21 rows)  
Time: 1456.356 ms (00:01.456)  
  
postgres=# select * from test where devid=1 order by crt_time limit 1;  
 devid |   val    |          crt_time            
-------+----------+----------------------------  
     1 | 40.00488 | 2021-04-29 17:14:29.754692  
(1 row)  
  
postgres=# select * from test where devid=1 order by crt_time desc limit 1;  
 devid |    val    |          crt_time            
-------+-----------+----------------------------  
     1 | 79.331635 | 2021-07-22 01:12:35.705869  
(1 row)

方法2、递归极速实现, 1.5秒优化到0.x毫秒, 千倍以上提升.

《时序数据合并场景加速分析和实现 - 复合索引，窗口分组查询加速，变态递归加速》

1、first_value

postgres=# with recursive skip as (    
  (    
    select test tv from test order by devid,crt_time limit 1   
  )    
  union all    
  (    
    select (  
      select test tv from test where devid > (s.tv).devid order by devid,crt_time limit 1  
    ) from skip s where (s.tv).devid is not null limit 1    
  )     
)     
select * from skip t where t.* is not null;   
  
  
                                                QUERY PLAN                                                  
----------------------------------------------------------------------------------------------------------  
 CTE Scan on skip t  (cost=6.18..6.40 rows=11 width=32)  
   Filter: (t.* IS NOT NULL)  
   CTE skip  
     ->  Recursive Union  (cost=0.42..6.18 rows=11 width=40)  
           ->  Subquery Scan on "*SELECT* 1"  (cost=0.42..0.49 rows=1 width=40)  
                 ->  Limit  (cost=0.42..0.48 rows=1 width=52)  
                       ->  Index Scan using idx_test on test  (cost=0.42..57043.47 rows=1000000 width=52)  
           ->  Limit  (cost=0.00..0.55 rows=1 width=32)  
                 ->  WorkTable Scan on skip s  (cost=0.00..5.48 rows=10 width=32)  
                       Filter: ((tv).devid IS NOT NULL)  
(10 rows)  
  
                     tv                        
---------------------------------------------  
 (0,69.20287,"2021-04-29 17:29:03.513984")  
 (1,40.00488,"2021-04-29 17:14:29.754692")  
 (2,81.52082,"2021-04-29 17:13:38.138922")  
 (3,56.46295,"2021-04-29 17:13:41.576724")  
 (4,53.477367,"2021-04-29 17:13:38.799418")  
 (5,80.3527,"2021-04-29 17:20:07.61652")  
 (6,42.685253,"2021-04-29 17:12:43.131935")  
 (7,44.414867,"2021-04-29 17:14:24.41725")  
 (8,46.09131,"2021-04-29 17:13:21.088454")  
 (9,69.82049,"2021-04-29 17:16:56.557335")  
 (10,15.770509,"2021-04-29 17:12:43.72394")  
 (11,23.28746,"2021-04-29 17:14:51.66777")  
 (12,53.020107,"2021-04-29 17:13:26.826184")  
 (13,95.81737,"2021-04-29 17:17:23.924546")  
 (14,16.533426,"2021-04-29 17:16:12.762293")  
 (15,20.856157,"2021-04-29 17:15:59.386621")  
 (16,35.354103,"2021-04-29 17:13:39.879304")  
 (17,14.365077,"2021-04-29 17:12:40.277861")  
 (18,81.229645,"2021-04-29 17:15:47.412289")  
 (19,3.7511733,"2021-04-29 17:13:15.481478")  
 (20,6.8691735,"2021-04-29 17:16:08.605133")  
(21 rows)  
  
Time: 0.591 ms

2、last_value

postgres=# with recursive skip as (    
  (    
    select test tv from test order by devid,crt_time desc limit 1   
  )    
  union all    
  (    
    select (  
      select test tv from test where devid > (s.tv).devid order by devid,crt_time desc limit 1  
    ) from skip s where (s.tv).devid is not null limit 1    
  )     
)     
select * from skip t where t.* is not null;   
    
  
                                                   QUERY PLAN                                                     
----------------------------------------------------------------------------------------------------------------  
 CTE Scan on skip t  (cost=20578.75..20578.97 rows=11 width=32)  
   Filter: (t.* IS NOT NULL)  
   CTE skip  
     ->  Recursive Union  (cost=3073.90..20578.75 rows=11 width=40)  
           ->  Subquery Scan on "*SELECT* 1"  (cost=3073.90..3073.99 rows=1 width=40)  
                 ->  Limit  (cost=3073.90..3073.98 rows=1 width=52)  
                       ->  Incremental Sort  (cost=3073.90..78293.89 rows=1000000 width=52)  
                             Sort Key: test.devid, test.crt_time DESC  
                             Presorted Key: test.devid  
                             ->  Index Scan using idx_test on test  (cost=0.42..57043.47 rows=1000000 width=52)  
           ->  Limit  (cost=0.00..1750.46 rows=1 width=32)  
                 ->  WorkTable Scan on skip s  (cost=0.00..17504.55 rows=10 width=32)  
                       Filter: ((tv).devid IS NOT NULL)  
(13 rows)  
  
  
                      tv                        
----------------------------------------------  
 (0,92.5132,"2021-07-22 00:50:09.418747")  
 (1,79.331635,"2021-07-22 01:12:35.705869")  
 (2,24.342728,"2021-07-22 01:11:43.214143")  
 (3,0.083137356,"2021-07-22 01:10:48.842551")  
 (4,36.13544,"2021-07-22 01:03:19.314456")  
 (5,72.615295,"2021-07-22 01:09:47.155842")  
 (6,24.84936,"2021-07-22 01:11:14.875089")  
 (7,92.358284,"2021-07-22 01:07:30.652387")  
 (8,93.73747,"2021-07-22 01:09:03.046665")  
 (9,70.78088,"2021-07-22 01:11:31.421899")  
 (10,7.430693,"2021-07-22 01:10:48.575366")  
 (11,17.332998,"2021-07-22 01:10:28.107436")  
 (12,7.4527717,"2021-07-22 01:10:52.440714")  
 (13,87.950645,"2021-07-22 01:08:22.256052")  
 (14,41.366306,"2021-07-22 01:04:35.893593")  
 (15,51.702602,"2021-07-22 01:10:41.037534")  
 (16,91.17102,"2021-07-22 01:08:18.668903")  
 (17,75.35907,"2021-07-22 01:07:49.376161")  
 (18,88.41152,"2021-07-22 01:10:30.431861")  
 (19,75.90414,"2021-07-22 01:11:18.230903")  
 (20,69.208405,"2021-07-22 01:12:20.86822")  
(21 rows)  
  
Time: 574.792 ms  
  
  
last_value 加个desc索引会更快  
  
postgres=# create index idx_test1 on test (devid, crt_time desc);  
CREATE INDEX  
Time: 963.675 ms  
postgres=# with recursive skip as (    
  (    
    select test tv from test order by devid,crt_time desc limit 1   
  )    
  union all    
  (    
    select (  
      select test tv from test where devid > (s.tv).devid order by devid,crt_time desc limit 1  
    ) from skip s where (s.tv).devid is not null limit 1    
  )     
)     
select * from skip t where t.* is not null;   
                      tv                        
----------------------------------------------  
 (0,92.5132,"2021-07-22 00:50:09.418747")  
 (1,79.331635,"2021-07-22 01:12:35.705869")  
 (2,24.342728,"2021-07-22 01:11:43.214143")  
 (3,0.083137356,"2021-07-22 01:10:48.842551")  
 (4,36.13544,"2021-07-22 01:03:19.314456")  
 (5,72.615295,"2021-07-22 01:09:47.155842")  
 (6,24.84936,"2021-07-22 01:11:14.875089")  
 (7,92.358284,"2021-07-22 01:07:30.652387")  
 (8,93.73747,"2021-07-22 01:09:03.046665")  
 (9,70.78088,"2021-07-22 01:11:31.421899")  
 (10,7.430693,"2021-07-22 01:10:48.575366")  
 (11,17.332998,"2021-07-22 01:10:28.107436")  
 (12,7.4527717,"2021-07-22 01:10:52.440714")  
 (13,87.950645,"2021-07-22 01:08:22.256052")  
 (14,41.366306,"2021-07-22 01:04:35.893593")  
 (15,51.702602,"2021-07-22 01:10:41.037534")  
 (16,91.17102,"2021-07-22 01:08:18.668903")  
 (17,75.35907,"2021-07-22 01:07:49.376161")  
 (18,88.41152,"2021-07-22 01:10:30.431861")  
 (19,75.90414,"2021-07-22 01:11:18.230903")  
 (20,69.208405,"2021-07-22 01:12:20.86822")  
(21 rows)  
  
Time: 0.975 ms

参考

《递归+排序字段加权 skip scan 解决窗口查询多列分组去重的性能问题》

《PostgreSQL Oracle 兼容性之 - INDEX SKIP SCAN (递归查询变态优化) 非驱动列索引扫描优化》

《distinct xx和count(distinct xx)的变态递归优化方法 - 索引收敛(skip scan)扫描》

PostgreSQL 许愿链接

您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议，奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等，奖品丰富，快来许愿。开不开森.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

20210429_02.md

20210429_02.md

PostgreSQL - 时序、IoT类场景 - first_value , last_value , agg , cte , window , recursive

作者

日期

标签

背景

参考

PostgreSQL 许愿链接

9.9元购买3个月阿里云RDS PostgreSQL实例

PostgreSQL 解决方案集合

德哥 / digoal's github - 公益是一辈子的事.

PolarDB 学习图谱: 训练营、培训认证、在线互动实验、解决方案、生态合作、写心得拿奖品

购买PolarDB云服务折扣活动进行中, 55元起

About 德哥

Files

20210429_02.md

Latest commit

History

20210429_02.md

File metadata and controls

PostgreSQL - 时序、IoT类场景 - first_value , last_value , agg , cte , window , recursive

作者

日期

标签

背景

参考

PostgreSQL 许愿链接

9.9元购买3个月阿里云RDS PostgreSQL实例

PostgreSQL 解决方案集合

德哥 / digoal's github - 公益是一辈子的事.

PolarDB 学习图谱: 训练营、培训认证、在线互动实验、解决方案、生态合作、写心得拿奖品

购买PolarDB云服务折扣活动进行中, 55元起

About 德哥