digoal
2023-06-13
PostgreSQL , PolarDB , merge sort , incremental sort , 优化器 , query rewrite
《PostgreSQL 14 preview - 支持window function的incremental sort》
《PostgreSQL 11 preview - Incremental Sort(排序优化)》
《PostgreSQL 16 preview - 优化器支持Incremental Sort for DISTINCT》
incremental sort主要是减少大批量数据排序带来的CPU开销, 如果数据扫描过程中已经保证了数据返回是有序的, 在下一个节点则可以使用增量排序.
但是支持比较有限, 例如下面这些场景都没有办法用到incremental sort, 也没有使用merge sort.
postgres=# create table test (c1 int, c2 int, c3 timestamp);
CREATE TABLE
postgres=# insert into test select random()*100, random()*10, clock_timestamp() from generate_series(1,1000000);
INSERT 0 1000000
postgres=# create index on test (c1,c2,c3);
CREATE INDEX
postgres=# analyze test;
ANALYZE
postgres=# insert into test select random()*100, random()*10, clock_timestamp() from generate_series(1,1000000);
INSERT 0 1000000
postgres=# analyze test;
ANALYZE
postgres=# explain analyze select * from (select * from test where c1=1 and c2=1 union all select * from test where c1=1 and c2=2 ) as t order by c3;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=392.97..403.68 rows=4286 width=16) (actual time=3.509..3.883 rows=4019 loops=1)
Sort Key: test.c3
Sort Method: quicksort Memory: 253kB
-> Append (cost=0.43..134.41 rows=4286 width=16) (actual time=0.117..2.032 rows=4019 loops=1)
-> Index Only Scan using test_c1_c2_c3_idx on test (cost=0.43..56.41 rows=2139 width=16) (actual time=0.115..0.704 rows=2003 loops=1)
Index Cond: ((c1 = 1) AND (c2 = 1))
Heap Fetches: 0
-> Index Only Scan using test_c1_c2_c3_idx on test test_1 (cost=0.43..56.57 rows=2147 width=16) (actual time=0.024..0.534 rows=2016 loops=1)
Index Cond: ((c1 = 1) AND (c2 = 2))
Heap Fetches: 0
Planning Time: 0.396 ms
Execution Time: 4.250 ms
(12 rows)
postgres=# explain analyze select * from (select * from test where c1=1 union all select * from test where c1=2 ) as t order by c3;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=4093.03..4190.86 rows=39133 width=16) (actual time=20.905..23.855 rows=40070 loops=1)
Sort Key: test.c3
Sort Method: quicksort Memory: 3102kB
-> Append (cost=0.43..1107.95 rows=39133 width=16) (actual time=0.023..11.305 rows=40070 loops=1)
-> Index Only Scan using test_c1_c2_c3_idx on test (cost=0.43..470.53 rows=20200 width=16) (actual time=0.023..3.537 rows=20042 loops=1)
Index Cond: (c1 = 1)
Heap Fetches: 0
-> Index Only Scan using test_c1_c2_c3_idx on test test_1 (cost=0.43..441.75 rows=18933 width=16) (actual time=0.030..2.616 rows=20028 loops=1)
Index Cond: (c1 = 2)
Heap Fetches: 0
Planning Time: 0.255 ms
Execution Time: 26.104 ms
(12 rows)
postgres=# explain analyze select * from test where c1 in (1,2) order by c3;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=3895.17..3993.01 rows=39133 width=16) (actual time=21.573..24.208 rows=40070 loops=1)
Sort Key: c3
Sort Method: quicksort Memory: 3102kB
-> Index Only Scan using test_c1_c2_c3_idx on test (cost=0.43..910.09 rows=39133 width=16) (actual time=0.032..11.376 rows=40070 loops=1)
Index Cond: (c1 = ANY ('{1,2}'::integer[]))
Heap Fetches: 0
Planning Time: 0.140 ms
Execution Time: 26.950 ms
(8 rows)
postgres=# set work_mem ='64kB';
SET
postgres=# explain analyze select * from (select * from test where c1=1 union all select * from test where c1=2 ) as t order by c3;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=4880.23..4978.06 rows=39133 width=16) (actual time=29.418..32.977 rows=40070 loops=1)
Sort Key: test.c3
Sort Method: external merge Disk: 1048kB
-> Append (cost=0.43..1107.95 rows=39133 width=16) (actual time=0.030..11.332 rows=40070 loops=1)
-> Index Only Scan using test_c1_c2_c3_idx on test (cost=0.43..470.53 rows=20200 width=16) (actual time=0.029..3.559 rows=20042 loops=1)
Index Cond: (c1 = 1)
Heap Fetches: 0
-> Index Only Scan using test_c1_c2_c3_idx on test test_1 (cost=0.43..441.75 rows=18933 width=16) (actual time=0.015..2.645 rows=20028 loops=1)
Index Cond: (c1 = 2)
Heap Fetches: 0
Planning Time: 0.212 ms
Execution Time: 36.555 ms
(12 rows)
以上都没有用到merge sort, incremental sort.
下面就更加不行了, 因为还涉及query rewrite.
create table test(id int, c1 text, c2 date, c3 text, c4 float);
create index idx on test (c1,c2,c4 desc)
select * from test
where
c1 in ('1','2','3')
and c2 between current_date-1 and current_date
order by c4 desc limit 10;
c1, c2
都是离散值.
c1,c2
在输入条件c1 in ('1','2','3') and c2 between current_date-1 and current_date
中的所有可能值可以组成一个有限的排列组合.
idx
索引的每个c1,c2
组合下的所有index tuple
在c4 desc
上是有序的, 所以使用idx
索引, 按每个c1,c2
组合跳跃查询, 并持续使用merge sort(c4)
, 即可最高效的返回数据.
每个c1,c2
组合跳跃查询可以使用并行进行. 例如每个worker扫描一组c1,c2
条件.