Skip to content

Latest commit

 

History

History
165 lines (116 loc) · 8.09 KB

20230613_02.md

File metadata and controls

165 lines (116 loc) · 8.09 KB

DB吐槽大会,第86期 - PG 优化器 incremental sort , merge sort , query rewrite 支持场景有限

作者

digoal

日期

2023-06-13

标签

PostgreSQL , PolarDB , merge sort , incremental sort , 优化器 , query rewrite


背景

视频回放

《PostgreSQL 14 preview - 支持window function的incremental sort》

《PostgreSQL 11 preview - Incremental Sort(排序优化)》

《PostgreSQL 16 preview - 优化器支持Incremental Sort for DISTINCT》

incremental sort主要是减少大批量数据排序带来的CPU开销, 如果数据扫描过程中已经保证了数据返回是有序的, 在下一个节点则可以使用增量排序.

但是支持比较有限, 例如下面这些场景都没有办法用到incremental sort, 也没有使用merge sort.

postgres=# create table test (c1 int, c2 int, c3 timestamp);  
CREATE TABLE  
postgres=# insert into test select random()*100, random()*10, clock_timestamp() from generate_series(1,1000000);  
INSERT 0 1000000  
postgres=# create index on test (c1,c2,c3);  
CREATE INDEX  
postgres=# analyze test;  
ANALYZE  
postgres=# insert into test select random()*100, random()*10, clock_timestamp() from generate_series(1,1000000);  
INSERT 0 1000000  
postgres=# analyze test;  
ANALYZE  
postgres=# explain analyze select * from (select * from test where c1=1 and c2=1 union all select * from test where c1=1 and c2=2 ) as t order by c3;  
                                                                       QUERY PLAN                                                                         
--------------------------------------------------------------------------------------------------------------------------------------------------------  
 Sort  (cost=392.97..403.68 rows=4286 width=16) (actual time=3.509..3.883 rows=4019 loops=1)  
   Sort Key: test.c3  
   Sort Method: quicksort  Memory: 253kB  
   ->  Append  (cost=0.43..134.41 rows=4286 width=16) (actual time=0.117..2.032 rows=4019 loops=1)  
         ->  Index Only Scan using test_c1_c2_c3_idx on test  (cost=0.43..56.41 rows=2139 width=16) (actual time=0.115..0.704 rows=2003 loops=1)  
               Index Cond: ((c1 = 1) AND (c2 = 1))  
               Heap Fetches: 0  
         ->  Index Only Scan using test_c1_c2_c3_idx on test test_1  (cost=0.43..56.57 rows=2147 width=16) (actual time=0.024..0.534 rows=2016 loops=1)  
               Index Cond: ((c1 = 1) AND (c2 = 2))  
               Heap Fetches: 0  
 Planning Time: 0.396 ms  
 Execution Time: 4.250 ms  
(12 rows)  
  
  
  
postgres=# explain analyze select * from (select * from test where c1=1  union all select * from test where c1=2 ) as t order by c3;  
                                                                        QUERY PLAN                                                                           
-----------------------------------------------------------------------------------------------------------------------------------------------------------  
 Sort  (cost=4093.03..4190.86 rows=39133 width=16) (actual time=20.905..23.855 rows=40070 loops=1)  
   Sort Key: test.c3  
   Sort Method: quicksort  Memory: 3102kB  
   ->  Append  (cost=0.43..1107.95 rows=39133 width=16) (actual time=0.023..11.305 rows=40070 loops=1)  
         ->  Index Only Scan using test_c1_c2_c3_idx on test  (cost=0.43..470.53 rows=20200 width=16) (actual time=0.023..3.537 rows=20042 loops=1)  
               Index Cond: (c1 = 1)  
               Heap Fetches: 0  
         ->  Index Only Scan using test_c1_c2_c3_idx on test test_1  (cost=0.43..441.75 rows=18933 width=16) (actual time=0.030..2.616 rows=20028 loops=1)  
               Index Cond: (c1 = 2)  
               Heap Fetches: 0  
 Planning Time: 0.255 ms  
 Execution Time: 26.104 ms  
(12 rows)  
  
  
postgres=# explain analyze select * from test where c1 in (1,2)  order by c3;  
                                                                  QUERY PLAN                                                                     
-----------------------------------------------------------------------------------------------------------------------------------------------  
 Sort  (cost=3895.17..3993.01 rows=39133 width=16) (actual time=21.573..24.208 rows=40070 loops=1)  
   Sort Key: c3  
   Sort Method: quicksort  Memory: 3102kB  
   ->  Index Only Scan using test_c1_c2_c3_idx on test  (cost=0.43..910.09 rows=39133 width=16) (actual time=0.032..11.376 rows=40070 loops=1)  
         Index Cond: (c1 = ANY ('{1,2}'::integer[]))  
         Heap Fetches: 0  
 Planning Time: 0.140 ms  
 Execution Time: 26.950 ms  
(8 rows)  
  
  
  
postgres=# set work_mem ='64kB';  
SET  
postgres=# explain analyze select * from (select * from test where c1=1  union all select * from test where c1=2 ) as t order by c3;  
                                                                        QUERY PLAN                                                                           
-----------------------------------------------------------------------------------------------------------------------------------------------------------  
 Sort  (cost=4880.23..4978.06 rows=39133 width=16) (actual time=29.418..32.977 rows=40070 loops=1)  
   Sort Key: test.c3  
   Sort Method: external merge  Disk: 1048kB  
   ->  Append  (cost=0.43..1107.95 rows=39133 width=16) (actual time=0.030..11.332 rows=40070 loops=1)  
         ->  Index Only Scan using test_c1_c2_c3_idx on test  (cost=0.43..470.53 rows=20200 width=16) (actual time=0.029..3.559 rows=20042 loops=1)  
               Index Cond: (c1 = 1)  
               Heap Fetches: 0  
         ->  Index Only Scan using test_c1_c2_c3_idx on test test_1  (cost=0.43..441.75 rows=18933 width=16) (actual time=0.015..2.645 rows=20028 loops=1)  
               Index Cond: (c1 = 2)  
               Heap Fetches: 0  
 Planning Time: 0.212 ms  
 Execution Time: 36.555 ms  
(12 rows)  

以上都没有用到merge sort, incremental sort.

下面就更加不行了, 因为还涉及query rewrite.

create table test(id int, c1 text, c2 date, c3 text, c4 float);  
  
create index idx on test (c1,c2,c4 desc)  
  
select * from test  
where  
c1 in ('1','2','3')  
and c2 between current_date-1 and current_date  
order by c4 desc limit 10;  

c1, c2 都是离散值.

c1,c2在输入条件c1 in ('1','2','3') and c2 between current_date-1 and current_date中的所有可能值可以组成一个有限的排列组合.

idx索引的每个c1,c2组合下的所有index tuplec4 desc上是有序的, 所以使用idx索引, 按每个c1,c2组合跳跃查询, 并持续使用merge sort(c4), 即可最高效的返回数据.

每个c1,c2组合跳跃查询可以使用并行进行. 例如每个worker扫描一组c1,c2条件.

digoal's wechat