digoal
2022-12-21
PostgreSQL , PolarDB , 实时物化增量视图 , pg_ivm , materialized view , pipelinedb , 流计算
PG当前支持的物化视图没有物化视图日志, 刷新时需要JOIN物化视图和基表比对所有记录找到diff进行增量刷新. 所以PG当前版本的物化视图增量刷新可能比较慢, 特别是表比较大的时候.
Oracle 物化视图支持materialized view log, 记录了基表的更新|新增|删除的日志, 可以通过log来进行增量刷新, 刷新物化视图比较高效.
为了解决这个问题, 一方面等待PG内核的更新迭代, 另一方面也可以关注外围插件的推出.
- 《PostgreSQL 增量物化视图插件 - pg_ivm incremental materialized view maintenance》
- 《PostgreSQL Incremental View Maintenance - ivm (增量物化视图MATERIALIZED VIEW)进化》
- 《DB吐槽大会,第39期 - PG 物化视图不支持基于log的增量刷新》
- 《PostgreSQL 15 wait - 增量物化视图》
本文将要介绍的是pg_ivm插件, 实施统计和过滤, 生成增量物化视图, 有点类似pipelinedb流计算, 不一样的地方是pg_ivm是实时的, pipelinedb是异步的.
- 《PostgreSQL 流计算插件 - pipelinedb 1.x 参数配置介绍》
- 《PostgreSQL pipelinedb 流计算插件 - IoT应用 - 实时轨迹聚合》
- 《PostgreSQL 流计算插件pipelinedb sharding 集群版原理介绍 - 一个全功能的分布式流计算引擎》
- 《[未完待续] 流式机器学习(online machine learning) - pipelineDB with plR and plPython》
- 《PostgreSQL 流计算指 pipelinedb 实时处理Kafka消息流》
git clone --depth 1 https://github.com/sraoss/pg_ivm
cd pg_ivm
USE_PGXS=1 make
USE_PGXS=1 make install
psql
create extension pg_ivm;
测试
postgres=# create table test (gid int, v int);
CREATE TABLE
postgres=# insert into test select random()*100, random()*1000 from generate_series(1,10000);
INSERT 0 10000
postgres=# SELECT create_immv('immv','select gid,avg(v) from test group by gid');
NOTICE: created index "immv_index" on immv "immv"
create_immv
-------------
101
(1 row)
postgres=# select count(*) from immv ;
count
-------
101
(1 row)
postgres=# select * from immv limit 10;
gid | avg | __ivm_count_avg__ | __ivm_sum_avg__ | __ivm_count__
-----+----------------------+-------------------+-----------------+---------------
5 | 469.0769230769230769 | 117 | 54882 | 117
18 | 515.3894736842105263 | 95 | 48962 | 95
64 | 470.6224489795918367 | 98 | 46121 | 98
55 | 559.9687500000000000 | 96 | 53757 | 96
27 | 536.5566037735849057 | 106 | 56875 | 106
23 | 543.5520833333333333 | 96 | 52181 | 96
56 | 536.0434782608695652 | 92 | 49316 | 92
58 | 554.8555555555555556 | 90 | 49937 | 90
91 | 482.6632653061224490 | 98 | 47301 | 98
8 | 487.0480769230769231 | 104 | 50653 | 104
(10 rows)
postgres=# select * from immv order by gid limit 10;
gid | avg | __ivm_count_avg__ | __ivm_sum_avg__ | __ivm_count__
-----+----------------------+-------------------+-----------------+---------------
0 | 427.5333333333333333 | 45 | 19239 | 45
1 | 434.6982758620689655 | 116 | 50425 | 116
2 | 468.5056179775280899 | 89 | 41697 | 89
3 | 506.8977272727272727 | 88 | 44607 | 88
4 | 500.3369565217391304 | 92 | 46031 | 92
5 | 469.0769230769230769 | 117 | 54882 | 117
6 | 517.0300000000000000 | 100 | 51703 | 100
7 | 547.3125000000000000 | 96 | 52542 | 96
8 | 487.0480769230769231 | 104 | 50653 | 104
9 | 431.1616161616161616 | 99 | 42685 | 99
(10 rows)
postgres=# insert into test values (0,100);
INSERT 0 1
postgres=# select * from immv order by gid limit 10;
gid | avg | __ivm_count_avg__ | __ivm_sum_avg__ | __ivm_count__
-----+----------------------+-------------------+-----------------+---------------
0 | 420.4130434782608696 | 46 | 19339 | 46
1 | 434.6982758620689655 | 116 | 50425 | 116
2 | 468.5056179775280899 | 89 | 41697 | 89
3 | 506.8977272727272727 | 88 | 44607 | 88
4 | 500.3369565217391304 | 92 | 46031 | 92
5 | 469.0769230769230769 | 117 | 54882 | 117
6 | 517.0300000000000000 | 100 | 51703 | 100
7 | 547.3125000000000000 | 96 | 52542 | 96
8 | 487.0480769230769231 | 104 | 50653 | 104
9 | 431.1616161616161616 | 99 | 42685 | 99
(10 rows)
postgres=# insert into test values (-1,100);
INSERT 0 1
postgres=# select * from immv order by gid limit 10;
gid | avg | __ivm_count_avg__ | __ivm_sum_avg__ | __ivm_count__
-----+----------------------+-------------------+-----------------+---------------
-1 | 100.0000000000000000 | 1 | 100 | 1
0 | 420.4130434782608696 | 46 | 19339 | 46
1 | 434.6982758620689655 | 116 | 50425 | 116
2 | 468.5056179775280899 | 89 | 41697 | 89
3 | 506.8977272727272727 | 88 | 44607 | 88
4 | 500.3369565217391304 | 92 | 46031 | 92
5 | 469.0769230769230769 | 117 | 54882 | 117
6 | 517.0300000000000000 | 100 | 51703 | 100
7 | 547.3125000000000000 | 96 | 52542 | 96
8 | 487.0480769230769231 | 104 | 50653 | 104
(10 rows)
postgres=# \dt immv
List of relations
Schema | Name | Type | Owner
--------+------+-------+----------
public | immv | table | postgres
(1 row)
postgres=# refresh materialized view immv;
ERROR: "immv" is not a materialized view
postgres=# \d+ immv
Table "public.immv"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
-------------------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
gid | integer | | | | plain | | |
avg | numeric | | | | main | | |
__ivm_count_avg__ | bigint | | | | plain | | |
__ivm_sum_avg__ | bigint | | | | plain | | |
__ivm_count__ | bigint | | | | plain | | |
Indexes:
"immv_index" UNIQUE, btree (gid)
Access method: heap
在物化视图被创建后, pg_ivm会自动生成依赖的数据字段, 例如自动创建count_group,sum_group,count等, 被用于DML时合并增量更新物化视图的相应tuple.
https://github.com/sraoss/pg_ivm
《PostgreSQL 增量物化视图插件 - pg_ivm incremental materialized view maintenance》