Skip to content

Latest commit

 

History

History
193 lines (132 loc) · 7.99 KB

20230705_03.md

File metadata and controls

193 lines (132 loc) · 7.99 KB

PolarDB和PostgreSQL开源通过pg_qualstats 优化数据库性能: 推荐索引

作者

digoal

日期

2023-07-05

标签

PostgreSQL , PolarDB , 索引推荐 , 性能优化


背景

索引是数据库常见优化手段(快速定位目标数据, 减少数据扫码带来的IO消耗和过滤条件带来的计算消耗, 从而提升性能.), 但是很多时候开发者可能并不知道要怎么创建索引, 所以通常是上线后才发现性能不行.

《PostgreSQL 索引推荐 - HypoPG , pg_qualstats》

《PostgreSQL SQL自动优化案例 - 极简,自动推荐索引》

https://github.com/powa-team/pg_qualstats

PolarDB和PostgreSQL开源通过 pg_qualstats 优化数据库性能: 推荐索引

不过经过测试发现pg_qualstats功能比较弱, 例如推荐btree索引比较理想, 其他索引接口(例如gin, gist, brin, bloom, sp-gist等)不太理想. 但是它另一个核心价值是采样并存储过滤性较差的SQL. 即使有的SQL没自动推荐索引, 作为DBA则可以用自己的知识来解决, pg_qualstats充当发现过滤性差的SQL的角色.

如果你对索引不太了解, 可以阅读:

《自动选择正确索引访问接口(btree,hash,gin,gist,sp-gist,brin,bitmap...)的方法》

《PostgreSQL 9种索引的原理和应用场景》

例子

postgres=# create table a (id int, info text, ts timestamp);  
CREATE TABLE  
postgres=# insert into a select generate_series(1,1000000), md5(random()::Text), clock_timestamp();  
INSERT 0 1000000  
  
把采样设置为1, 全采  
postgres=# set pg_qualstats.sample_rate=1;  
SET  
  
  
postgres=# select count(*) from a where id=1;  
 count    
-------  
     1  
(1 row)  
  
postgres=# select count(*) from a where info ~ 'abcd';    
  count    
-------  
   428  
(1 row)  
  
  
自动推荐了1个索引, 模糊查询没有自动推荐  
  
postgres=# SELECT v       FROM json_array_elements( pg_qualstats_index_advisor(min_filter => 50)->'indexes')       ORDER BY v::text COLLATE "C";  
                                   v    
------------------------------------------------------------------------  
 {"ddl" : "CREATE INDEX ON public.a USING btree (id)", "queryids" : []}  
(1 row)  
  
但是, 可以从下面的信息中看到模糊查询被采样了, 过滤性较差.  
  
postgres=# select * from pg_qualstats_pretty;  
 left_schema | left_table | left_column |   operator   | right_schema | right_table | right_column | occurences | execution_count | nbfiltered    
-------------+------------+-------------+--------------+--------------+-------------+--------------+------------+-----------------+------------  
 public      | a          | id          | pg_catalog.= |              |             |              |          1 |         1000000 |     999999  
 public      | a          | info        | pg_catalog.~ |              |             |              |          1 |         1000000 |     999572  
(2 rows)  
  
没有被优化的SQL如下:  
  
postgres=# SELECT v       FROM json_array_elements( pg_qualstats_index_advisor(min_filter => 50)->'unoptimised')   ORDER BY v::text COLLATE "C";  
                    v    
------------------------------------------  
 {"qual" : "a.info ~ ?", "queryids" : []}  
(1 row)  
  
postgres=# select count(*) from a where info like 'abcd%';    
 count    
-------  
     6  
(1 row)  
  
postgres=# SELECT v       FROM json_array_elements( pg_qualstats_index_advisor(min_filter => 50)->'unoptimised')   ORDER BY v::text COLLATE "C";  
                     v    
-------------------------------------------  
 {"qual" : "a.info ~ ?", "queryids" : []}  
 {"qual" : "a.info ~~ ?", "queryids" : []}  
(2 rows)  
  
postgres=# select * from pg_qualstats_pretty;  
  
left_schema | left_table | left_column |   operator    | right_schema | right_table | right_column | occurences | execution_count | nbfiltered    
-------------+------------+-------------+---------------+--------------+-------------+--------------+------------+-----------------+------------  
 public      | a          | id          | pg_catalog.=  |              |             |              |          1 |         1000000 |     999999  
 public      | a          | info        | pg_catalog.~  |              |             |              |          1 |         1000000 |     999572  
 public      | a          | info        | pg_catalog.~~ |              |             |              |          1 |         1000000 |     999994  
(3 rows)  
  
postgres=# SELECT v       FROM json_array_elements( pg_qualstats_index_advisor(min_filter => 50)->'indexes')       ORDER BY v::text COLLATE "C";  
                                   v    
------------------------------------------------------------------------  
 {"ddl" : "CREATE INDEX ON public.a USING btree (id)", "queryids" : []}  
(1 row)  

通过以上测试, 对数据库索引掌握比较好的同学就知道该怎么优化那些没有被pg_qualstats优化的SQL:

create extension pg_trgm;  
create index on a using gin (info gin_trgm_ops);  

再次查询就变快了.

postgres=# explain select count(*) from a where info like 'abcd%';    
                                    QUERY PLAN                                      
----------------------------------------------------------------------------------  
 Aggregate  (cost=137.64..137.65 rows=1 width=8)  
   ->  Bitmap Heap Scan on a  (cost=27.18..137.39 rows=100 width=0)  
         Recheck Cond: (info ~~ 'abcd%'::text)  
         ->  Bitmap Index Scan on a_info_idx  (cost=0.00..27.15 rows=100 width=0)  
               Index Cond: (info ~~ 'abcd%'::text)  
(5 rows)  
  
postgres=# select count(*) from a where info like 'abcd%';    
 count   
-------  
     6  
(1 row)  
  
postgres=# \timing  
Timing is on.  
postgres=# select count(*) from a where info ~ 'abcd';    
 count   
-------  
   428  
(1 row)  
  
Time: 70.469 ms  
  
  
这里pg_qualstats推荐错了, 方法用gin, 但是ops缺用gist.  
  
postgres=# SELECT v  FROM json_array_elements(  pg_qualstats_index_advisor(min_filter => 50)->'indexes') ORDER BY v::text COLLATE "C";  
                                          v                                             
--------------------------------------------------------------------------------------  
 {"ddl" : "CREATE INDEX ON public.a USING btree (id)", "queryids" : []}  
 {"ddl" : "CREATE INDEX ON public.a USING gin (info gist_trgm_ops)", "queryids" : []}  
 {"ddl" : "CREATE INDEX ON public.a USING gin (info gist_trgm_ops)", "queryids" : []}  
(3 rows)  
  
Time: 34.414 ms  
postgres=# SELECT v   FROM json_array_elements( pg_qualstats_index_advisor(min_filter => 50)->'unoptimised') ORDER BY v::text COLLATE "C";  
 v   
---  
(0 rows)  
  
Time: 14.443 ms  

digoal's wechat