digoal
2023-11-11
PostgreSQL , PolarDB , 分页 , limit offset , fetch with ties
fetch with ties 代替 limit offset 解决分页性能优化gap问题
https://www.postgresql.org/docs/current/sql-select.html
《PostgreSQL 13 offset fetch first with ties - 返回ordered peer行S》
当使用limit offset翻页翻到很后面时, 性能会非常差, 原因是offset需要扫描大量的记录并进行过滤. 所以出现了很多优化分页的方法. 其中增加位置偏移条件最为常用, 但是掌握不好容易出现gap.
create table t_off (id serial primary key, info text, c int, ts timestamp);
create index on t_off (c, ts);
insert into t_off (info,c,ts) select md5(random()::text), random()*100, now() from generate_series(1,1000);
insert into t_off (info,c,ts) select md5(random()::text), random()*100, now() from generate_series(1,1000);
insert into t_off (info,c,ts) select md5(random()::text), random()*100, now() from generate_series(1,1000);
insert into t_off (info,c,ts) select md5(random()::text), random()*100, now() from generate_series(1,1000);
老标准: limit , 可能出现gap
db1=> explain select * from t_off where c=1 order by ts limit 10;
QUERY PLAN
-------------------------------------------------------------------------------------
Limit (cost=0.28..7.64 rows=10 width=49)
-> Index Scan using t_off_c_ts_idx on t_off (cost=0.28..36.34 rows=49 width=49)
Index Cond: (c = 1)
(3 rows)
用于优化的ts条件, 查询完上次的记录得到的最大ts:
db1=> explain select * from t_off where c=1 and ts>'2023-11-11 00:31:57.304588' order by ts limit 10;
QUERY PLAN
----------------------------------------------------------------------------------------------------
Limit (cost=0.28..8.51 rows=10 width=49)
-> Index Scan using t_off_c_ts_idx on t_off (cost=0.28..30.72 rows=37 width=49)
Index Cond: ((c = 1) AND (ts > '2023-11-11 00:31:57.304588'::timestamp without time zone))
(3 rows)
问题: 用于优化的ts条件有重复值, 会出现gap
db1=> select * from t_off where c=1 order by ts limit 5;
id | info | c | ts
-----+----------------------------------+---+----------------------------
16 | ce70259b8c0b5e83dca5f20cef93d1ee | 1 | 2023-11-11 00:31:57.304588
79 | 3a08bcf762a062e86318f516fddea793 | 1 | 2023-11-11 00:31:57.304588
230 | 01bc9f9c59b70311369aaa33c3cca3a7 | 1 | 2023-11-11 00:31:57.304588
257 | bb22ec8f00de73c19017479fdf981116 | 1 | 2023-11-11 00:31:57.304588
286 | 70bfb092ab95cc8c8f5aef9e2762af0c | 1 | 2023-11-11 00:31:57.304588
(5 rows)
db1=> select * from t_off where c=1 order by ts limit 10;
id | info | c | ts
-----+----------------------------------+---+----------------------------
16 | ce70259b8c0b5e83dca5f20cef93d1ee | 1 | 2023-11-11 00:31:57.304588
79 | 3a08bcf762a062e86318f516fddea793 | 1 | 2023-11-11 00:31:57.304588
230 | 01bc9f9c59b70311369aaa33c3cca3a7 | 1 | 2023-11-11 00:31:57.304588
257 | bb22ec8f00de73c19017479fdf981116 | 1 | 2023-11-11 00:31:57.304588
286 | 70bfb092ab95cc8c8f5aef9e2762af0c | 1 | 2023-11-11 00:31:57.304588
311 | c3b8f1eab66fe88cfd59e92d039fbaa6 | 1 | 2023-11-11 00:31:57.304588
452 | f1c000b8a200207c52b6c3f9a8babe72 | 1 | 2023-11-11 00:31:57.304588
470 | 7b4702f5a4ff75ef237cbf7af5b8d082 | 1 | 2023-11-11 00:31:57.304588
642 | 82d641bcf235eba92a9368f0ac553dca | 1 | 2023-11-11 00:31:57.304588
749 | 669488263cfcec928ac2693995309eea | 1 | 2023-11-11 00:31:57.304588
(10 rows)
db1=> select * from t_off where c=1 and ts>'2023-11-11 00:31:57.304588' order by ts limit 5;
id | info | c | ts
------+----------------------------------+---+----------------------------
1017 | e12d1088e80136e869cf82816b28ab51 | 1 | 2023-11-11 00:31:57.313518
1058 | 39dc148d9223bcaddcdb2096750e513b | 1 | 2023-11-11 00:31:57.313518
1070 | 31c83c561119140721f94e8b28669914 | 1 | 2023-11-11 00:31:57.313518
1102 | ee8f076e2e29299ace4c5fb71ddf9dda | 1 | 2023-11-11 00:31:57.313518
1148 | 37232358c5995513c19545b8232aced1 | 1 | 2023-11-11 00:31:57.313518
(5 rows)
新标准: limit 改成 fetch , 返回可以根据需要超过Limit数, 把最后一条相同的ts都返回. 避免了翻页优化带来的gap. 老标准(limit)比较麻烦, 需要引入pk或uk来解决.
限制10条, 但是实际上有14条ts一样的都返回了.
db1=> select * from t_off where c=1 order by ts fetch first 10 row with ties;
id | info | c | ts
-----+----------------------------------+---+----------------------------
16 | ce70259b8c0b5e83dca5f20cef93d1ee | 1 | 2023-11-11 00:31:57.304588
79 | 3a08bcf762a062e86318f516fddea793 | 1 | 2023-11-11 00:31:57.304588
230 | 01bc9f9c59b70311369aaa33c3cca3a7 | 1 | 2023-11-11 00:31:57.304588
257 | bb22ec8f00de73c19017479fdf981116 | 1 | 2023-11-11 00:31:57.304588
286 | 70bfb092ab95cc8c8f5aef9e2762af0c | 1 | 2023-11-11 00:31:57.304588
311 | c3b8f1eab66fe88cfd59e92d039fbaa6 | 1 | 2023-11-11 00:31:57.304588
452 | f1c000b8a200207c52b6c3f9a8babe72 | 1 | 2023-11-11 00:31:57.304588
470 | 7b4702f5a4ff75ef237cbf7af5b8d082 | 1 | 2023-11-11 00:31:57.304588
642 | 82d641bcf235eba92a9368f0ac553dca | 1 | 2023-11-11 00:31:57.304588
749 | 669488263cfcec928ac2693995309eea | 1 | 2023-11-11 00:31:57.304588
816 | 9a92daa3c8ceb90e023d127485c4a053 | 1 | 2023-11-11 00:31:57.304588
827 | 82abd37368d66eb03347429ff4b28e9e | 1 | 2023-11-11 00:31:57.304588
846 | 48f9737b18bf3e8dae7531a1b7fa4d45 | 1 | 2023-11-11 00:31:57.304588
990 | f7885a8e585ae7fa0c4b73dd356f260b | 1 | 2023-11-11 00:31:57.304588
(14 rows)
此时gap就没有了.
db1=> select * from t_off where c=1 and ts>'2023-11-11 00:31:57.304588' order by ts fetch first 10 row with ties;
id | info | c | ts
------+----------------------------------+---+----------------------------
1017 | e12d1088e80136e869cf82816b28ab51 | 1 | 2023-11-11 00:31:57.313518
1058 | 39dc148d9223bcaddcdb2096750e513b | 1 | 2023-11-11 00:31:57.313518
1070 | 31c83c561119140721f94e8b28669914 | 1 | 2023-11-11 00:31:57.313518
1102 | ee8f076e2e29299ace4c5fb71ddf9dda | 1 | 2023-11-11 00:31:57.313518
1148 | 37232358c5995513c19545b8232aced1 | 1 | 2023-11-11 00:31:57.313518
1425 | df4ff8d17227012f240b5ecee69d5758 | 1 | 2023-11-11 00:31:57.313518
1654 | c0bb3ebf9eda3a3a0228dd7784eccbfe | 1 | 2023-11-11 00:31:57.313518
1704 | 70283bbc87968f25aed18524030e1a21 | 1 | 2023-11-11 00:31:57.313518
1737 | 7c1b782104144f3cbc97e247e5572165 | 1 | 2023-11-11 00:31:57.313518
1779 | 8af082c81acc99fa2f8afdffd58f4d11 | 1 | 2023-11-11 00:31:57.313518
1861 | 0c9ecc39655bf834d1cdc07cf09ed362 | 1 | 2023-11-11 00:31:57.313518
(11 rows)
新标准非常适合翻页优化
db1=> explain select * from t_off where c=1 order by ts fetch first 10 row with ties;
QUERY PLAN
-------------------------------------------------------------------------------------
Limit (cost=0.28..7.64 rows=10 width=49)
-> Index Scan using t_off_c_ts_idx on t_off (cost=0.28..36.34 rows=49 width=49)
Index Cond: (c = 1)
(3 rows)
db1=> explain select * from t_off where c=1 and ts>'2023-11-11 00:31:57.304588' order by ts fetch first 10 row with ties;
QUERY PLAN
----------------------------------------------------------------------------------------------------
Limit (cost=0.28..8.51 rows=10 width=49)
-> Index Scan using t_off_c_ts_idx on t_off (cost=0.28..30.72 rows=37 width=49)
Index Cond: ((c = 1) AND (ts > '2023-11-11 00:31:57.304588'::timestamp without time zone))
(3 rows)
如果担心数据倾斜(例如某个ts重复值非常非常多), 怕把程序内存打爆, 可以考虑2种解决方案.
1、使用游标返回结果. 很简单, 这里就不细说了.
declare cur cursor with hold for select * from t_off where c=1 order by ts fetch first 10 row with ties;
fetch 10 from cur;
-- 这个是铁定只会返回10行的.
...
close cur;
2、把ts字段改成表达式, 也就是让一个ties里面的值变成不一样, 但是不同ties依旧按原来的顺序排序, 这个表达式可以这么来定义:
这个方法不要求排序字段唯一, 只是减少重复值的个数, 所以还是要继续使用fetch first with ties的语法, 为了防止出现gap请不要使用limit offset.
这里用了row的Hashvalue作为rpads, 相同ts值的不同行可以得到不一样的value. 如果这个表有pK就更简单了, 直接pad Pk.
db1=> create or replace function ts_pads (timestamp, t_off) returns text as $$
select lpad(extract(epoch from $1)::text, 32, '0')||'00000000'||hash_record($2);
$$ language sql strict immutable;
CREATE FUNCTION
这得益于PostgreSQL支持表达式索引功能, 以及丰富的hash函数, 还有丰富的函数语法, 数据类型.
sum(hash_record(row))
也常常被用于数据同步的row级别正确性校验.
db1=> select ts_pads(ts, t_off), * from t_off where c=1 order by ts limit 10;
ts_pads | id | info | c | ts
----------------------------------------------------+------+----------------------------------+---+----------------------------
0000000000000001699699981.971758000000001507273372 | 81 | 19105b1e02029bcbcfb49cf2f141a733 | 1 | 2023-11-11 10:53:01.971758
0000000000000001699699981.97175800000000-806584326 | 282 | 2728526bc1ef81a09ee1a8bb1b06fd0a | 1 | 2023-11-11 10:53:01.971758
0000000000000001699699981.97175800000000-805238007 | 9 | cc19da84df6272c1cc5646a9b9155bc3 | 1 | 2023-11-11 10:53:01.971758
0000000000000001699699981.97175800000000577266532 | 482 | d9a3991ea826ed9f1dfff7b139961754 | 1 | 2023-11-11 10:53:01.971758
0000000000000001699699981.971758000000001568192286 | 706 | 3efa6d0787186fd20bd8a786f0413ee0 | 1 | 2023-11-11 10:53:01.971758
0000000000000001699699981.971758000000001970342205 | 887 | 792c41f97517e8fdd973c5c5256f8ef2 | 1 | 2023-11-11 10:53:01.971758
0000000000000001699699981.97175800000000351589404 | 326 | b604c76a84c8b83fbe8ac18b228d4a3f | 1 | 2023-11-11 10:53:01.971758
0000000000000001699699981.980919000000001959836742 | 1163 | 91eb04e59d10ebcc70a62c42b3c4b0b8 | 1 | 2023-11-11 10:53:01.980919
0000000000000001699699981.98091900000000308502439 | 1142 | b6beee97559f091c104e82abf17ce2d0 | 1 | 2023-11-11 10:53:01.980919
0000000000000001699699981.980919000000001189140041 | 1017 | 49fb4b01a2c8fc1aac222a0223f058b3 | 1 | 2023-11-11 10:53:01.980919
(10 rows)
索引改成表达式, 排序也改成表达式.
db1=> create index on t_off (c, ts_pads(ts,t_off));
CREATE INDEX
查询改成如下, 完美解决排序字段重复值过多的问题:
db1=> explain select * from t_off where c=1 order by ts_pads(ts,t_off) fetch first 10 row with ties;
QUERY PLAN
------------------------------------------------------------------------------------------
Limit (cost=0.28..11.29 rows=10 width=81)
-> Index Scan using t_off_c_ts_pads_idx on t_off (cost=0.28..36.60 rows=33 width=81)
Index Cond: (c = 1)
(3 rows)
db1=> select * from t_off where c=1 order by ts_pads(ts,t_off) fetch first 10 row with ties;
id | info | c | ts
------+----------------------------------+---+----------------------------
770 | cd9f6d4e894cc6cad73b71906c4f4eb6 | 1 | 2023-11-11 10:53:01.971758
9 | cc19da84df6272c1cc5646a9b9155bc3 | 1 | 2023-11-11 10:53:01.971758
282 | 2728526bc1ef81a09ee1a8bb1b06fd0a | 1 | 2023-11-11 10:53:01.971758
81 | 19105b1e02029bcbcfb49cf2f141a733 | 1 | 2023-11-11 10:53:01.971758
706 | 3efa6d0787186fd20bd8a786f0413ee0 | 1 | 2023-11-11 10:53:01.971758
887 | 792c41f97517e8fdd973c5c5256f8ef2 | 1 | 2023-11-11 10:53:01.971758
326 | b604c76a84c8b83fbe8ac18b228d4a3f | 1 | 2023-11-11 10:53:01.971758
482 | d9a3991ea826ed9f1dfff7b139961754 | 1 | 2023-11-11 10:53:01.971758
1864 | 4e6e51ecaa3a1e74865be0650784049f | 1 | 2023-11-11 10:53:01.980919
1751 | 5cdcc4c258370fa119b77f7f05884386 | 1 | 2023-11-11 10:53:01.980919
(10 rows)
db1=> update t_off set ts=now() where id=770;
UPDATE 1
db1=> select * from t_off where c=1 order by ts_pads(ts,t_off) fetch first 10 row with ties;
id | info | c | ts
------+----------------------------------+---+----------------------------
9 | cc19da84df6272c1cc5646a9b9155bc3 | 1 | 2023-11-11 10:53:01.971758
282 | 2728526bc1ef81a09ee1a8bb1b06fd0a | 1 | 2023-11-11 10:53:01.971758
81 | 19105b1e02029bcbcfb49cf2f141a733 | 1 | 2023-11-11 10:53:01.971758
706 | 3efa6d0787186fd20bd8a786f0413ee0 | 1 | 2023-11-11 10:53:01.971758
887 | 792c41f97517e8fdd973c5c5256f8ef2 | 1 | 2023-11-11 10:53:01.971758
326 | b604c76a84c8b83fbe8ac18b228d4a3f | 1 | 2023-11-11 10:53:01.971758
482 | d9a3991ea826ed9f1dfff7b139961754 | 1 | 2023-11-11 10:53:01.971758
1864 | 4e6e51ecaa3a1e74865be0650784049f | 1 | 2023-11-11 10:53:01.980919
1751 | 5cdcc4c258370fa119b77f7f05884386 | 1 | 2023-11-11 10:53:01.980919
1648 | 826f0a3a9ca5efe6e0a13ed64af65b5e | 1 | 2023-11-11 10:53:01.980919
(10 rows)
db1=> select * from t_off where c=1 order by ts fetch first 10 row with ties;
id | info | c | ts
------+----------------------------------+---+----------------------------
9 | cc19da84df6272c1cc5646a9b9155bc3 | 1 | 2023-11-11 10:53:01.971758
81 | 19105b1e02029bcbcfb49cf2f141a733 | 1 | 2023-11-11 10:53:01.971758
282 | 2728526bc1ef81a09ee1a8bb1b06fd0a | 1 | 2023-11-11 10:53:01.971758
326 | b604c76a84c8b83fbe8ac18b228d4a3f | 1 | 2023-11-11 10:53:01.971758
482 | d9a3991ea826ed9f1dfff7b139961754 | 1 | 2023-11-11 10:53:01.971758
706 | 3efa6d0787186fd20bd8a786f0413ee0 | 1 | 2023-11-11 10:53:01.971758
887 | 792c41f97517e8fdd973c5c5256f8ef2 | 1 | 2023-11-11 10:53:01.971758
1017 | 49fb4b01a2c8fc1aac222a0223f058b3 | 1 | 2023-11-11 10:53:01.980919
1142 | b6beee97559f091c104e82abf17ce2d0 | 1 | 2023-11-11 10:53:01.980919
1163 | 91eb04e59d10ebcc70a62c42b3c4b0b8 | 1 | 2023-11-11 10:53:01.980919
1331 | 4256ed33ed688fbf744f597d72be6eb3 | 1 | 2023-11-11 10:53:01.980919
1523 | 60a6d5cdfd6750f474dce5069b6db200 | 1 | 2023-11-11 10:53:01.980919
1627 | c0ab21f437c1dc16ecefd1ebff262efc | 1 | 2023-11-11 10:53:01.980919
1648 | 826f0a3a9ca5efe6e0a13ed64af65b5e | 1 | 2023-11-11 10:53:01.980919
1674 | 537a002e6e8f1c68dee05a2b140ea736 | 1 | 2023-11-11 10:53:01.980919
1702 | d0f805e3306484e7835bd9fbeadcc8b9 | 1 | 2023-11-11 10:53:01.980919
1751 | 5cdcc4c258370fa119b77f7f05884386 | 1 | 2023-11-11 10:53:01.980919
1790 | 9696a0e5356806fc1c72a876616915ad | 1 | 2023-11-11 10:53:01.980919
1857 | b9f78f7d1ea5f8d65fe04e9318bd1849 | 1 | 2023-11-11 10:53:01.980919
1864 | 4e6e51ecaa3a1e74865be0650784049f | 1 | 2023-11-11 10:53:01.980919
1961 | f6d5624a8f26f1ad91dedd134192b694 | 1 | 2023-11-11 10:53:01.980919
(21 rows)
select * from t_off where c=1 and ts_pads(ts,t_off) >
ts_pads('2023-11-11 10:53:01.980919', row(1648,'826f0a3a9ca5efe6e0a13ed64af65b5e',1,'2023-11-11 10:53:01.980919')::t_off)
order by ts_pads(ts,t_off) fetch first 10 row with ties;
db1=> select * from t_off where c=1 and ts_pads(ts,t_off) >
ts_pads('2023-11-11 10:53:01.980919', row(1648,'826f0a3a9ca5efe6e0a13ed64af65b5e',1,'2023-11-11 10:53:01.980919')::t_off)
order by ts_pads(ts,t_off) fetch first 10 row with ties;
id | info | c | ts
------+----------------------------------+---+----------------------------
1523 | 60a6d5cdfd6750f474dce5069b6db200 | 1 | 2023-11-11 10:53:01.980919
1017 | 49fb4b01a2c8fc1aac222a0223f058b3 | 1 | 2023-11-11 10:53:01.980919
1331 | 4256ed33ed688fbf744f597d72be6eb3 | 1 | 2023-11-11 10:53:01.980919
1961 | f6d5624a8f26f1ad91dedd134192b694 | 1 | 2023-11-11 10:53:01.980919
1163 | 91eb04e59d10ebcc70a62c42b3c4b0b8 | 1 | 2023-11-11 10:53:01.980919
1627 | c0ab21f437c1dc16ecefd1ebff262efc | 1 | 2023-11-11 10:53:01.980919
1857 | b9f78f7d1ea5f8d65fe04e9318bd1849 | 1 | 2023-11-11 10:53:01.980919
1142 | b6beee97559f091c104e82abf17ce2d0 | 1 | 2023-11-11 10:53:01.980919
1790 | 9696a0e5356806fc1c72a876616915ad | 1 | 2023-11-11 10:53:01.980919
1702 | d0f805e3306484e7835bd9fbeadcc8b9 | 1 | 2023-11-11 10:53:01.980919
(10 rows)
db1=> explain select * from t_off where c=1 and ts_pads(ts,t_off) >
ts_pads('2023-11-11 10:53:01.980919', row(1648,'826f0a3a9ca5efe6e0a13ed64af65b5e',1,'2023-11-11 10:53:01.980919')::t_off)
order by ts_pads(ts,t_off) fetch first 10 row with ties;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Limit (cost=0.28..13.98 rows=10 width=81)
-> Index Scan using t_off_c_ts_pads_idx on t_off (cost=0.28..15.35 rows=11 width=81)
Index Cond: ((c = 1) AND (ts_pads(ts, t_off.*) > '0000000000000001699699981.98091900000000-721814744'::text))
(3 rows)
可惜系统列不支持index, 否则使用行号(ctid)会更简单.
create or replace function ts_pads1 (timestamp, tid) returns text as $$
select lpad(extract(epoch from $1)::text, 32, '0')||'00000000'||hashtid($2);
$$ language sql strict immutable;
db1=> \set VERBOSITY verbose
db1=> create index on t_off (c, ts_pads1(ts, ctid));
ERROR: 0A000: index creation on system columns is not supported
LOCATION: DefineIndex, indexcmds.c:1083