Skip to content

Latest commit

 

History

History
178 lines (132 loc) · 6.29 KB

20220610_05.md

File metadata and controls

178 lines (132 loc) · 6.29 KB

rule ELT 入库行存变列存 - 时序数据行存与点存实时转换

作者

digoal

日期

2022-06-10

标签

PostgreSQL , 点 , 行 , 时序


背景

时序数据可以行存, 也可以点存, 到底怎么选择?

  • 行存的优势, 查询行(或者行中的大量点)比较高效, 因为这些value密集存储在集中的block中.
  • 当需要查询大量连续的单点时, 行存就比较吃亏, 因为在同一行中没有被查到的点也会被扫描到, 浪费IO和buffer. 这种情况使用点存(列存)就比较好.
  • 当然在OLAP场景也比较适合列存, 因为OLAP场景通常也是少量列的大量数据扫描和计算, 列存省IO, 同时能使用到向量化技术加速计算.

列存如何拼出同一行记录?

  • 可以使用PK(固定或可变长度列通用)或offset(仅适用于固定长度列)把同一行的各列串起来.

PostgreSQL默认是HEAP表, 怎么把宽表转成点表?

  • 使用rule即可.

例子

创建宽表(表示传感器采集的所有维度(点), 例如这里有8个维度加时间字段).

create table t (ts timestamp primary key, c1 int, c2 int, c3 int, c4 text, c5 int, c6 int, c7 int, c8 int);  

创建点表

create table t1 (ts timestamp primary key, c1 int);  
create table t2 (ts timestamp primary key, c2 int);  
create table t3 (ts timestamp primary key, c3 int);  
create table t4 (ts timestamp primary key, c4 text);  
create table t5 (ts timestamp primary key, c5 int);  
create table t6 (ts timestamp primary key, c6 int);  
create table t7 (ts timestamp primary key, c7 int);  
create table t8 (ts timestamp primary key, c8 int);  

创建规则, 写入宽表时, 将数据分散到点表, 带上TS字段把同一行的各列串起来.

create rule r1 as on insert to t do instead (  
insert into t1 values (NEW.ts, NEW.c1);  
insert into t2 values (NEW.ts, NEW.c2);  
insert into t3 values (NEW.ts, NEW.c3);  
insert into t4 values (NEW.ts, NEW.c4);  
insert into t5 values (NEW.ts, NEW.c5);  
insert into t6 values (NEW.ts, NEW.c6);  
insert into t7 values (NEW.ts, NEW.c7);  
insert into t8 values (NEW.ts, NEW.c8)  
);  

写入测试, 写入宽表, 数据会自动转换到点表里面:

postgres=# insert into t values (now(),1,1,1,'test',1,1,1,1);  
INSERT 0 1  
  
  
postgres=# select * from t;  
 ts | c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8   
----+----+----+----+----+----+----+----+----  
(0 rows)  
  
postgres=# select * from t1;  
             ts             | c1   
----------------------------+----  
 2022-06-10 15:22:41.916042 |  1  
(1 row)  
  
postgres=# select * from t2;  
             ts             | c2   
----------------------------+----  
 2022-06-10 15:22:41.916042 |  1  
(1 row)  
  
postgres=# select * from t3;  
             ts             | c3   
----------------------------+----  
 2022-06-10 15:22:41.916042 |  1  
(1 row)  
  
postgres=# select * from t4;  
             ts             |  c4    
----------------------------+------  
 2022-06-10 15:22:41.916042 | test  
(1 row)  
  
postgres=# select * from t5;  
             ts             | c5   
----------------------------+----  
 2022-06-10 15:22:41.916042 |  1  
(1 row)  
  
postgres=# select * from t6;  
             ts             | c6   
----------------------------+----  
 2022-06-10 15:22:41.916042 |  1  
(1 row)  
  
postgres=# select * from t7;  
             ts             | c7   
----------------------------+----  
 2022-06-10 15:22:41.916042 |  1  
(1 row)  
  
postgres=# select * from t8;  
             ts             | c8   
----------------------------+----  
 2022-06-10 15:22:41.916042 |  1  
(1 row)  

性能如何?

postgres=# \timing  
Timing is on.  
postgres=# insert into t select clock_timestamp(),1,1,1,'test',1,1,1,1 from generate_series(1,100000);  
INSERT 0 100000  
Time: 791.200 ms  
postgres=# insert into t select clock_timestamp(),1,1,1,'test',1,1,1,1 from generate_series(1,300000);  
INSERT 0 300000  
Time: 2233.316 ms (00:02.233)  
  
  
 public | t                | table | postgres | permanent   | heap          | 8192 bytes |   
 public | t1               | table | postgres | permanent   | heap          | 17 MB      |   
 public | t2               | table | postgres | permanent   | heap          | 17 MB      |   
 public | t3               | table | postgres | permanent   | heap          | 17 MB      |   
 public | t4               | table | postgres | permanent   | heap          | 17 MB      |   
 public | t5               | table | postgres | permanent   | heap          | 17 MB      |   
 public | t6               | table | postgres | permanent   | heap          | 17 MB      |   
 public | t7               | table | postgres | permanent   | heap          | 17 MB      |   
 public | t8               | table | postgres | permanent   | heap          | 17 MB      |   

按点来换算的话, 每秒101万个点.

硬件如下

苹果笔记本   
MacBook Pro (13-inch, 2019, Four Thunderbolt 3 ports)   
2.4 GHz 四核Intel Core i5   
16 GB 2133 MHz LPDDR3   
256G disk   

digoal's wechat