digoal
2022-06-10
PostgreSQL , 点 , 行 , 时序
时序数据可以行存, 也可以点存, 到底怎么选择?
- 行存的优势, 查询行(或者行中的大量点)比较高效, 因为这些value密集存储在集中的block中.
- 当需要查询大量连续的单点时, 行存就比较吃亏, 因为在同一行中没有被查到的点也会被扫描到, 浪费IO和buffer. 这种情况使用点存(列存)就比较好.
- 当然在OLAP场景也比较适合列存, 因为OLAP场景通常也是少量列的大量数据扫描和计算, 列存省IO, 同时能使用到向量化技术加速计算.
列存如何拼出同一行记录?
- 可以使用PK(固定或可变长度列通用)或offset(仅适用于固定长度列)把同一行的各列串起来.
PostgreSQL默认是HEAP表, 怎么把宽表转成点表?
- 使用rule即可.
例子
创建宽表(表示传感器采集的所有维度(点), 例如这里有8个维度加时间字段).
create table t (ts timestamp primary key, c1 int, c2 int, c3 int, c4 text, c5 int, c6 int, c7 int, c8 int);
创建点表
create table t1 (ts timestamp primary key, c1 int);
create table t2 (ts timestamp primary key, c2 int);
create table t3 (ts timestamp primary key, c3 int);
create table t4 (ts timestamp primary key, c4 text);
create table t5 (ts timestamp primary key, c5 int);
create table t6 (ts timestamp primary key, c6 int);
create table t7 (ts timestamp primary key, c7 int);
create table t8 (ts timestamp primary key, c8 int);
创建规则, 写入宽表时, 将数据分散到点表, 带上TS字段把同一行的各列串起来.
create rule r1 as on insert to t do instead (
insert into t1 values (NEW.ts, NEW.c1);
insert into t2 values (NEW.ts, NEW.c2);
insert into t3 values (NEW.ts, NEW.c3);
insert into t4 values (NEW.ts, NEW.c4);
insert into t5 values (NEW.ts, NEW.c5);
insert into t6 values (NEW.ts, NEW.c6);
insert into t7 values (NEW.ts, NEW.c7);
insert into t8 values (NEW.ts, NEW.c8)
);
写入测试, 写入宽表, 数据会自动转换到点表里面:
postgres=# insert into t values (now(),1,1,1,'test',1,1,1,1);
INSERT 0 1
postgres=# select * from t;
ts | c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
----+----+----+----+----+----+----+----+----
(0 rows)
postgres=# select * from t1;
ts | c1
----------------------------+----
2022-06-10 15:22:41.916042 | 1
(1 row)
postgres=# select * from t2;
ts | c2
----------------------------+----
2022-06-10 15:22:41.916042 | 1
(1 row)
postgres=# select * from t3;
ts | c3
----------------------------+----
2022-06-10 15:22:41.916042 | 1
(1 row)
postgres=# select * from t4;
ts | c4
----------------------------+------
2022-06-10 15:22:41.916042 | test
(1 row)
postgres=# select * from t5;
ts | c5
----------------------------+----
2022-06-10 15:22:41.916042 | 1
(1 row)
postgres=# select * from t6;
ts | c6
----------------------------+----
2022-06-10 15:22:41.916042 | 1
(1 row)
postgres=# select * from t7;
ts | c7
----------------------------+----
2022-06-10 15:22:41.916042 | 1
(1 row)
postgres=# select * from t8;
ts | c8
----------------------------+----
2022-06-10 15:22:41.916042 | 1
(1 row)
性能如何?
postgres=# \timing
Timing is on.
postgres=# insert into t select clock_timestamp(),1,1,1,'test',1,1,1,1 from generate_series(1,100000);
INSERT 0 100000
Time: 791.200 ms
postgres=# insert into t select clock_timestamp(),1,1,1,'test',1,1,1,1 from generate_series(1,300000);
INSERT 0 300000
Time: 2233.316 ms (00:02.233)
public | t | table | postgres | permanent | heap | 8192 bytes |
public | t1 | table | postgres | permanent | heap | 17 MB |
public | t2 | table | postgres | permanent | heap | 17 MB |
public | t3 | table | postgres | permanent | heap | 17 MB |
public | t4 | table | postgres | permanent | heap | 17 MB |
public | t5 | table | postgres | permanent | heap | 17 MB |
public | t6 | table | postgres | permanent | heap | 17 MB |
public | t7 | table | postgres | permanent | heap | 17 MB |
public | t8 | table | postgres | permanent | heap | 17 MB |
按点来换算的话, 每秒101万个点.
硬件如下
苹果笔记本
MacBook Pro (13-inch, 2019, Four Thunderbolt 3 ports)
2.4 GHz 四核Intel Core i5
16 GB 2133 MHz LPDDR3
256G disk