Skip to content

Latest commit

 

History

History
354 lines (289 loc) · 19.9 KB

20231111_02.md

File metadata and controls

354 lines (289 loc) · 19.9 KB

PostgreSQL fetch with ties 代替 limit offset 解决分页性能优化gap问题

作者

digoal

日期

2023-11-11

标签

PostgreSQL , PolarDB , 分页 , limit offset , fetch with ties


背景

fetch with ties 代替 limit offset 解决分页性能优化gap问题

https://www.postgresql.org/docs/current/sql-select.html

《PostgreSQL 13 offset fetch first with ties - 返回ordered peer行S》

当使用limit offset翻页翻到很后面时, 性能会非常差, 原因是offset需要扫描大量的记录并进行过滤. 所以出现了很多优化分页的方法. 其中增加位置偏移条件最为常用, 但是掌握不好容易出现gap.

使用位置偏移条件优化分页的例子

create table t_off (id serial primary key, info text, c int, ts timestamp);  
create index on t_off (c, ts);  
  
insert into t_off (info,c,ts) select md5(random()::text), random()*100, now() from generate_series(1,1000);  
insert into t_off (info,c,ts) select md5(random()::text), random()*100, now() from generate_series(1,1000);  
insert into t_off (info,c,ts) select md5(random()::text), random()*100, now() from generate_series(1,1000);  
insert into t_off (info,c,ts) select md5(random()::text), random()*100, now() from generate_series(1,1000);  

limit offset

老标准: limit , 可能出现gap

db1=> explain select * from t_off where c=1 order by ts limit 10;  
                                     QUERY PLAN                                        
-------------------------------------------------------------------------------------  
 Limit  (cost=0.28..7.64 rows=10 width=49)  
   ->  Index Scan using t_off_c_ts_idx on t_off  (cost=0.28..36.34 rows=49 width=49)  
         Index Cond: (c = 1)  
(3 rows)  

用于优化的ts条件, 查询完上次的记录得到的最大ts:

db1=> explain select * from t_off where c=1 and ts>'2023-11-11 00:31:57.304588' order by ts limit 10;  
                                             QUERY PLAN                                               
----------------------------------------------------------------------------------------------------  
 Limit  (cost=0.28..8.51 rows=10 width=49)  
   ->  Index Scan using t_off_c_ts_idx on t_off  (cost=0.28..30.72 rows=37 width=49)  
         Index Cond: ((c = 1) AND (ts > '2023-11-11 00:31:57.304588'::timestamp without time zone))  
(3 rows)  

问题: 用于优化的ts条件有重复值, 会出现gap

db1=> select * from t_off where c=1 order by ts limit 5;  
 id  |               info               | c |             ts               
-----+----------------------------------+---+----------------------------  
  16 | ce70259b8c0b5e83dca5f20cef93d1ee | 1 | 2023-11-11 00:31:57.304588  
  79 | 3a08bcf762a062e86318f516fddea793 | 1 | 2023-11-11 00:31:57.304588  
 230 | 01bc9f9c59b70311369aaa33c3cca3a7 | 1 | 2023-11-11 00:31:57.304588  
 257 | bb22ec8f00de73c19017479fdf981116 | 1 | 2023-11-11 00:31:57.304588  
 286 | 70bfb092ab95cc8c8f5aef9e2762af0c | 1 | 2023-11-11 00:31:57.304588  
(5 rows)  
  
db1=> select * from t_off where c=1 order by ts limit 10;  
 id  |               info               | c |             ts               
-----+----------------------------------+---+----------------------------  
  16 | ce70259b8c0b5e83dca5f20cef93d1ee | 1 | 2023-11-11 00:31:57.304588  
  79 | 3a08bcf762a062e86318f516fddea793 | 1 | 2023-11-11 00:31:57.304588  
 230 | 01bc9f9c59b70311369aaa33c3cca3a7 | 1 | 2023-11-11 00:31:57.304588  
 257 | bb22ec8f00de73c19017479fdf981116 | 1 | 2023-11-11 00:31:57.304588  
 286 | 70bfb092ab95cc8c8f5aef9e2762af0c | 1 | 2023-11-11 00:31:57.304588  
 311 | c3b8f1eab66fe88cfd59e92d039fbaa6 | 1 | 2023-11-11 00:31:57.304588  
 452 | f1c000b8a200207c52b6c3f9a8babe72 | 1 | 2023-11-11 00:31:57.304588  
 470 | 7b4702f5a4ff75ef237cbf7af5b8d082 | 1 | 2023-11-11 00:31:57.304588  
 642 | 82d641bcf235eba92a9368f0ac553dca | 1 | 2023-11-11 00:31:57.304588  
 749 | 669488263cfcec928ac2693995309eea | 1 | 2023-11-11 00:31:57.304588  
(10 rows)  
  
  
db1=> select * from t_off where c=1 and ts>'2023-11-11 00:31:57.304588' order by ts limit 5;  
  id  |               info               | c |             ts               
------+----------------------------------+---+----------------------------  
 1017 | e12d1088e80136e869cf82816b28ab51 | 1 | 2023-11-11 00:31:57.313518  
 1058 | 39dc148d9223bcaddcdb2096750e513b | 1 | 2023-11-11 00:31:57.313518  
 1070 | 31c83c561119140721f94e8b28669914 | 1 | 2023-11-11 00:31:57.313518  
 1102 | ee8f076e2e29299ace4c5fb71ddf9dda | 1 | 2023-11-11 00:31:57.313518  
 1148 | 37232358c5995513c19545b8232aced1 | 1 | 2023-11-11 00:31:57.313518  
(5 rows)  

fetch with ties

新标准: limit 改成 fetch , 返回可以根据需要超过Limit数, 把最后一条相同的ts都返回. 避免了翻页优化带来的gap. 老标准(limit)比较麻烦, 需要引入pk或uk来解决.

限制10条, 但是实际上有14条ts一样的都返回了.

db1=> select * from t_off where c=1 order by ts fetch first 10 row with ties;    
 id  |               info               | c |             ts               
-----+----------------------------------+---+----------------------------  
  16 | ce70259b8c0b5e83dca5f20cef93d1ee | 1 | 2023-11-11 00:31:57.304588  
  79 | 3a08bcf762a062e86318f516fddea793 | 1 | 2023-11-11 00:31:57.304588  
 230 | 01bc9f9c59b70311369aaa33c3cca3a7 | 1 | 2023-11-11 00:31:57.304588  
 257 | bb22ec8f00de73c19017479fdf981116 | 1 | 2023-11-11 00:31:57.304588  
 286 | 70bfb092ab95cc8c8f5aef9e2762af0c | 1 | 2023-11-11 00:31:57.304588  
 311 | c3b8f1eab66fe88cfd59e92d039fbaa6 | 1 | 2023-11-11 00:31:57.304588  
 452 | f1c000b8a200207c52b6c3f9a8babe72 | 1 | 2023-11-11 00:31:57.304588  
 470 | 7b4702f5a4ff75ef237cbf7af5b8d082 | 1 | 2023-11-11 00:31:57.304588  
 642 | 82d641bcf235eba92a9368f0ac553dca | 1 | 2023-11-11 00:31:57.304588  
 749 | 669488263cfcec928ac2693995309eea | 1 | 2023-11-11 00:31:57.304588  
 816 | 9a92daa3c8ceb90e023d127485c4a053 | 1 | 2023-11-11 00:31:57.304588  
 827 | 82abd37368d66eb03347429ff4b28e9e | 1 | 2023-11-11 00:31:57.304588  
 846 | 48f9737b18bf3e8dae7531a1b7fa4d45 | 1 | 2023-11-11 00:31:57.304588  
 990 | f7885a8e585ae7fa0c4b73dd356f260b | 1 | 2023-11-11 00:31:57.304588  
(14 rows)  

此时gap就没有了.

db1=> select * from t_off where c=1 and ts>'2023-11-11 00:31:57.304588' order by ts fetch first 10 row with ties;    
  id  |               info               | c |             ts               
------+----------------------------------+---+----------------------------  
 1017 | e12d1088e80136e869cf82816b28ab51 | 1 | 2023-11-11 00:31:57.313518  
 1058 | 39dc148d9223bcaddcdb2096750e513b | 1 | 2023-11-11 00:31:57.313518  
 1070 | 31c83c561119140721f94e8b28669914 | 1 | 2023-11-11 00:31:57.313518  
 1102 | ee8f076e2e29299ace4c5fb71ddf9dda | 1 | 2023-11-11 00:31:57.313518  
 1148 | 37232358c5995513c19545b8232aced1 | 1 | 2023-11-11 00:31:57.313518  
 1425 | df4ff8d17227012f240b5ecee69d5758 | 1 | 2023-11-11 00:31:57.313518  
 1654 | c0bb3ebf9eda3a3a0228dd7784eccbfe | 1 | 2023-11-11 00:31:57.313518  
 1704 | 70283bbc87968f25aed18524030e1a21 | 1 | 2023-11-11 00:31:57.313518  
 1737 | 7c1b782104144f3cbc97e247e5572165 | 1 | 2023-11-11 00:31:57.313518  
 1779 | 8af082c81acc99fa2f8afdffd58f4d11 | 1 | 2023-11-11 00:31:57.313518  
 1861 | 0c9ecc39655bf834d1cdc07cf09ed362 | 1 | 2023-11-11 00:31:57.313518  
(11 rows)  

新标准非常适合翻页优化

db1=> explain select * from t_off where c=1 order by ts fetch first 10 row with ties;    
                                     QUERY PLAN                                        
-------------------------------------------------------------------------------------  
 Limit  (cost=0.28..7.64 rows=10 width=49)  
   ->  Index Scan using t_off_c_ts_idx on t_off  (cost=0.28..36.34 rows=49 width=49)  
         Index Cond: (c = 1)  
(3 rows)  
  
db1=> explain select * from t_off where c=1 and ts>'2023-11-11 00:31:57.304588' order by ts fetch first 10 row with ties;    
                                             QUERY PLAN                                               
----------------------------------------------------------------------------------------------------  
 Limit  (cost=0.28..8.51 rows=10 width=49)  
   ->  Index Scan using t_off_c_ts_idx on t_off  (cost=0.28..30.72 rows=37 width=49)  
         Index Cond: ((c = 1) AND (ts > '2023-11-11 00:31:57.304588'::timestamp without time zone))  
(3 rows)  

排序字段重复值过多, 数据倾斜可能导致一次fetch行数很多, 可能打卦程序内存, 怎么办?

如果担心数据倾斜(例如某个ts重复值非常非常多), 怕把程序内存打爆, 可以考虑2种解决方案.

1、使用游标返回结果. 很简单, 这里就不细说了.

declare cur cursor with hold for select * from t_off where c=1 order by ts fetch first 10 row with ties;   
fetch 10 from cur;  
-- 这个是铁定只会返回10行的.
...
close cur;

2、把ts字段改成表达式, 也就是让一个ties里面的值变成不一样, 但是不同ties依旧按原来的顺序排序, 这个表达式可以这么来定义:

这个方法不要求排序字段唯一, 只是减少重复值的个数, 所以还是要继续使用fetch first with ties的语法, 为了防止出现gap请不要使用limit offset.

这里用了row的Hashvalue作为rpads, 相同ts值的不同行可以得到不一样的value. 如果这个表有pK就更简单了, 直接pad Pk.

db1=> create or replace function ts_pads (timestamp, t_off) returns text as $$  
  select lpad(extract(epoch from $1)::text, 32, '0')||'00000000'||hash_record($2);  
$$ language sql strict immutable;  
CREATE FUNCTION  

这得益于PostgreSQL支持表达式索引功能, 以及丰富的hash函数, 还有丰富的函数语法, 数据类型.

sum(hash_record(row)) 也常常被用于数据同步的row级别正确性校验.

db1=> select ts_pads(ts, t_off), * from t_off where c=1 order by ts limit 10;  
                      ts_pads                       |  id  |               info               | c |             ts               
----------------------------------------------------+------+----------------------------------+---+----------------------------  
 0000000000000001699699981.971758000000001507273372 |   81 | 19105b1e02029bcbcfb49cf2f141a733 | 1 | 2023-11-11 10:53:01.971758  
 0000000000000001699699981.97175800000000-806584326 |  282 | 2728526bc1ef81a09ee1a8bb1b06fd0a | 1 | 2023-11-11 10:53:01.971758  
 0000000000000001699699981.97175800000000-805238007 |    9 | cc19da84df6272c1cc5646a9b9155bc3 | 1 | 2023-11-11 10:53:01.971758  
 0000000000000001699699981.97175800000000577266532  |  482 | d9a3991ea826ed9f1dfff7b139961754 | 1 | 2023-11-11 10:53:01.971758  
 0000000000000001699699981.971758000000001568192286 |  706 | 3efa6d0787186fd20bd8a786f0413ee0 | 1 | 2023-11-11 10:53:01.971758  
 0000000000000001699699981.971758000000001970342205 |  887 | 792c41f97517e8fdd973c5c5256f8ef2 | 1 | 2023-11-11 10:53:01.971758  
 0000000000000001699699981.97175800000000351589404  |  326 | b604c76a84c8b83fbe8ac18b228d4a3f | 1 | 2023-11-11 10:53:01.971758  
 0000000000000001699699981.980919000000001959836742 | 1163 | 91eb04e59d10ebcc70a62c42b3c4b0b8 | 1 | 2023-11-11 10:53:01.980919  
 0000000000000001699699981.98091900000000308502439  | 1142 | b6beee97559f091c104e82abf17ce2d0 | 1 | 2023-11-11 10:53:01.980919  
 0000000000000001699699981.980919000000001189140041 | 1017 | 49fb4b01a2c8fc1aac222a0223f058b3 | 1 | 2023-11-11 10:53:01.980919  
(10 rows)  

索引改成表达式, 排序也改成表达式.

db1=> create index on t_off (c, ts_pads(ts,t_off));  
CREATE INDEX  

查询改成如下, 完美解决排序字段重复值过多的问题:

db1=> explain select * from t_off where c=1 order by ts_pads(ts,t_off) fetch first 10 row with ties;   
                                        QUERY PLAN                                          
------------------------------------------------------------------------------------------  
 Limit  (cost=0.28..11.29 rows=10 width=81)  
   ->  Index Scan using t_off_c_ts_pads_idx on t_off  (cost=0.28..36.60 rows=33 width=81)  
         Index Cond: (c = 1)  
(3 rows)  
  
db1=> select * from t_off where c=1 order by ts_pads(ts,t_off) fetch first 10 row with ties;   
  id  |               info               | c |             ts               
------+----------------------------------+---+----------------------------  
  770 | cd9f6d4e894cc6cad73b71906c4f4eb6 | 1 | 2023-11-11 10:53:01.971758  
    9 | cc19da84df6272c1cc5646a9b9155bc3 | 1 | 2023-11-11 10:53:01.971758  
  282 | 2728526bc1ef81a09ee1a8bb1b06fd0a | 1 | 2023-11-11 10:53:01.971758  
   81 | 19105b1e02029bcbcfb49cf2f141a733 | 1 | 2023-11-11 10:53:01.971758  
  706 | 3efa6d0787186fd20bd8a786f0413ee0 | 1 | 2023-11-11 10:53:01.971758  
  887 | 792c41f97517e8fdd973c5c5256f8ef2 | 1 | 2023-11-11 10:53:01.971758  
  326 | b604c76a84c8b83fbe8ac18b228d4a3f | 1 | 2023-11-11 10:53:01.971758  
  482 | d9a3991ea826ed9f1dfff7b139961754 | 1 | 2023-11-11 10:53:01.971758  
 1864 | 4e6e51ecaa3a1e74865be0650784049f | 1 | 2023-11-11 10:53:01.980919  
 1751 | 5cdcc4c258370fa119b77f7f05884386 | 1 | 2023-11-11 10:53:01.980919  
(10 rows)  
  
db1=> update t_off set ts=now() where id=770;  
UPDATE 1  
  
db1=> select * from t_off where c=1 order by ts_pads(ts,t_off) fetch first 10 row with ties;   
  id  |               info               | c |             ts               
------+----------------------------------+---+----------------------------  
    9 | cc19da84df6272c1cc5646a9b9155bc3 | 1 | 2023-11-11 10:53:01.971758  
  282 | 2728526bc1ef81a09ee1a8bb1b06fd0a | 1 | 2023-11-11 10:53:01.971758  
   81 | 19105b1e02029bcbcfb49cf2f141a733 | 1 | 2023-11-11 10:53:01.971758  
  706 | 3efa6d0787186fd20bd8a786f0413ee0 | 1 | 2023-11-11 10:53:01.971758  
  887 | 792c41f97517e8fdd973c5c5256f8ef2 | 1 | 2023-11-11 10:53:01.971758  
  326 | b604c76a84c8b83fbe8ac18b228d4a3f | 1 | 2023-11-11 10:53:01.971758  
  482 | d9a3991ea826ed9f1dfff7b139961754 | 1 | 2023-11-11 10:53:01.971758  
 1864 | 4e6e51ecaa3a1e74865be0650784049f | 1 | 2023-11-11 10:53:01.980919  
 1751 | 5cdcc4c258370fa119b77f7f05884386 | 1 | 2023-11-11 10:53:01.980919  
 1648 | 826f0a3a9ca5efe6e0a13ed64af65b5e | 1 | 2023-11-11 10:53:01.980919  
(10 rows)  
  
db1=> select * from t_off where c=1 order by ts fetch first 10 row with ties;   
  id  |               info               | c |             ts               
------+----------------------------------+---+----------------------------  
    9 | cc19da84df6272c1cc5646a9b9155bc3 | 1 | 2023-11-11 10:53:01.971758  
   81 | 19105b1e02029bcbcfb49cf2f141a733 | 1 | 2023-11-11 10:53:01.971758  
  282 | 2728526bc1ef81a09ee1a8bb1b06fd0a | 1 | 2023-11-11 10:53:01.971758  
  326 | b604c76a84c8b83fbe8ac18b228d4a3f | 1 | 2023-11-11 10:53:01.971758  
  482 | d9a3991ea826ed9f1dfff7b139961754 | 1 | 2023-11-11 10:53:01.971758  
  706 | 3efa6d0787186fd20bd8a786f0413ee0 | 1 | 2023-11-11 10:53:01.971758  
  887 | 792c41f97517e8fdd973c5c5256f8ef2 | 1 | 2023-11-11 10:53:01.971758  
 1017 | 49fb4b01a2c8fc1aac222a0223f058b3 | 1 | 2023-11-11 10:53:01.980919  
 1142 | b6beee97559f091c104e82abf17ce2d0 | 1 | 2023-11-11 10:53:01.980919  
 1163 | 91eb04e59d10ebcc70a62c42b3c4b0b8 | 1 | 2023-11-11 10:53:01.980919  
 1331 | 4256ed33ed688fbf744f597d72be6eb3 | 1 | 2023-11-11 10:53:01.980919  
 1523 | 60a6d5cdfd6750f474dce5069b6db200 | 1 | 2023-11-11 10:53:01.980919  
 1627 | c0ab21f437c1dc16ecefd1ebff262efc | 1 | 2023-11-11 10:53:01.980919  
 1648 | 826f0a3a9ca5efe6e0a13ed64af65b5e | 1 | 2023-11-11 10:53:01.980919  
 1674 | 537a002e6e8f1c68dee05a2b140ea736 | 1 | 2023-11-11 10:53:01.980919  
 1702 | d0f805e3306484e7835bd9fbeadcc8b9 | 1 | 2023-11-11 10:53:01.980919  
 1751 | 5cdcc4c258370fa119b77f7f05884386 | 1 | 2023-11-11 10:53:01.980919  
 1790 | 9696a0e5356806fc1c72a876616915ad | 1 | 2023-11-11 10:53:01.980919  
 1857 | b9f78f7d1ea5f8d65fe04e9318bd1849 | 1 | 2023-11-11 10:53:01.980919  
 1864 | 4e6e51ecaa3a1e74865be0650784049f | 1 | 2023-11-11 10:53:01.980919  
 1961 | f6d5624a8f26f1ad91dedd134192b694 | 1 | 2023-11-11 10:53:01.980919  
(21 rows)  
  
select * from t_off where c=1 and ts_pads(ts,t_off) >   
  ts_pads('2023-11-11 10:53:01.980919', row(1648,'826f0a3a9ca5efe6e0a13ed64af65b5e',1,'2023-11-11 10:53:01.980919')::t_off)  
order by ts_pads(ts,t_off) fetch first 10 row with ties;   
  
db1=> select * from t_off where c=1 and ts_pads(ts,t_off) >   
  ts_pads('2023-11-11 10:53:01.980919', row(1648,'826f0a3a9ca5efe6e0a13ed64af65b5e',1,'2023-11-11 10:53:01.980919')::t_off)  
order by ts_pads(ts,t_off) fetch first 10 row with ties;   
  id  |               info               | c |             ts               
------+----------------------------------+---+----------------------------  
 1523 | 60a6d5cdfd6750f474dce5069b6db200 | 1 | 2023-11-11 10:53:01.980919  
 1017 | 49fb4b01a2c8fc1aac222a0223f058b3 | 1 | 2023-11-11 10:53:01.980919  
 1331 | 4256ed33ed688fbf744f597d72be6eb3 | 1 | 2023-11-11 10:53:01.980919  
 1961 | f6d5624a8f26f1ad91dedd134192b694 | 1 | 2023-11-11 10:53:01.980919  
 1163 | 91eb04e59d10ebcc70a62c42b3c4b0b8 | 1 | 2023-11-11 10:53:01.980919  
 1627 | c0ab21f437c1dc16ecefd1ebff262efc | 1 | 2023-11-11 10:53:01.980919  
 1857 | b9f78f7d1ea5f8d65fe04e9318bd1849 | 1 | 2023-11-11 10:53:01.980919  
 1142 | b6beee97559f091c104e82abf17ce2d0 | 1 | 2023-11-11 10:53:01.980919  
 1790 | 9696a0e5356806fc1c72a876616915ad | 1 | 2023-11-11 10:53:01.980919  
 1702 | d0f805e3306484e7835bd9fbeadcc8b9 | 1 | 2023-11-11 10:53:01.980919  
(10 rows)  
  
db1=> explain select * from t_off where c=1 and ts_pads(ts,t_off) >   
  ts_pads('2023-11-11 10:53:01.980919', row(1648,'826f0a3a9ca5efe6e0a13ed64af65b5e',1,'2023-11-11 10:53:01.980919')::t_off)  
order by ts_pads(ts,t_off) fetch first 10 row with ties;   
                                                      QUERY PLAN                                                         
-----------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=0.28..13.98 rows=10 width=81)  
   ->  Index Scan using t_off_c_ts_pads_idx on t_off  (cost=0.28..15.35 rows=11 width=81)  
         Index Cond: ((c = 1) AND (ts_pads(ts, t_off.*) > '0000000000000001699699981.98091900000000-721814744'::text))  
(3 rows)  

可惜系统列不支持index, 否则使用行号(ctid)会更简单.

create or replace function ts_pads1 (timestamp, tid) returns text as $$  
  select lpad(extract(epoch from $1)::text, 32, '0')||'00000000'||hashtid($2);  
$$ language sql strict immutable;  
  
db1=> \set VERBOSITY verbose

db1=> create index on t_off (c, ts_pads1(ts, ctid));  
ERROR:  0A000: index creation on system columns is not supported
LOCATION:  DefineIndex, indexcmds.c:1083

digoal's wechat