digoal
2023-07-05
PostgreSQL , PolarDB , 索引推荐 , 性能优化
索引是数据库常见优化手段(快速定位目标数据, 减少数据扫码带来的IO消耗和过滤条件带来的计算消耗, 从而提升性能.), 但是很多时候开发者可能并不知道要怎么创建索引, 所以通常是上线后才发现性能不行.
《PostgreSQL 索引推荐 - HypoPG , pg_qualstats》
《PostgreSQL SQL自动优化案例 - 极简,自动推荐索引》
https://github.com/powa-team/pg_qualstats
PolarDB和PostgreSQL开源通过 pg_qualstats 优化数据库性能: 推荐索引
不过经过测试发现pg_qualstats功能比较弱, 例如推荐btree索引比较理想, 其他索引接口(例如gin, gist, brin, bloom, sp-gist等)不太理想. 但是它另一个核心价值是采样并存储过滤性较差的SQL. 即使有的SQL没自动推荐索引, 作为DBA则可以用自己的知识来解决, pg_qualstats充当发现过滤性差的SQL的角色.
如果你对索引不太了解, 可以阅读:
《自动选择正确索引访问接口(btree,hash,gin,gist,sp-gist,brin,bitmap...)的方法》
postgres=# create table a (id int, info text, ts timestamp);
CREATE TABLE
postgres=# insert into a select generate_series(1,1000000), md5(random()::Text), clock_timestamp();
INSERT 0 1000000
把采样设置为1, 全采
postgres=# set pg_qualstats.sample_rate=1;
SET
postgres=# select count(*) from a where id=1;
count
-------
1
(1 row)
postgres=# select count(*) from a where info ~ 'abcd';
count
-------
428
(1 row)
自动推荐了1个索引, 模糊查询没有自动推荐
postgres=# SELECT v FROM json_array_elements( pg_qualstats_index_advisor(min_filter => 50)->'indexes') ORDER BY v::text COLLATE "C";
v
------------------------------------------------------------------------
{"ddl" : "CREATE INDEX ON public.a USING btree (id)", "queryids" : []}
(1 row)
但是, 可以从下面的信息中看到模糊查询被采样了, 过滤性较差.
postgres=# select * from pg_qualstats_pretty;
left_schema | left_table | left_column | operator | right_schema | right_table | right_column | occurences | execution_count | nbfiltered
-------------+------------+-------------+--------------+--------------+-------------+--------------+------------+-----------------+------------
public | a | id | pg_catalog.= | | | | 1 | 1000000 | 999999
public | a | info | pg_catalog.~ | | | | 1 | 1000000 | 999572
(2 rows)
没有被优化的SQL如下:
postgres=# SELECT v FROM json_array_elements( pg_qualstats_index_advisor(min_filter => 50)->'unoptimised') ORDER BY v::text COLLATE "C";
v
------------------------------------------
{"qual" : "a.info ~ ?", "queryids" : []}
(1 row)
postgres=# select count(*) from a where info like 'abcd%';
count
-------
6
(1 row)
postgres=# SELECT v FROM json_array_elements( pg_qualstats_index_advisor(min_filter => 50)->'unoptimised') ORDER BY v::text COLLATE "C";
v
-------------------------------------------
{"qual" : "a.info ~ ?", "queryids" : []}
{"qual" : "a.info ~~ ?", "queryids" : []}
(2 rows)
postgres=# select * from pg_qualstats_pretty;
left_schema | left_table | left_column | operator | right_schema | right_table | right_column | occurences | execution_count | nbfiltered
-------------+------------+-------------+---------------+--------------+-------------+--------------+------------+-----------------+------------
public | a | id | pg_catalog.= | | | | 1 | 1000000 | 999999
public | a | info | pg_catalog.~ | | | | 1 | 1000000 | 999572
public | a | info | pg_catalog.~~ | | | | 1 | 1000000 | 999994
(3 rows)
postgres=# SELECT v FROM json_array_elements( pg_qualstats_index_advisor(min_filter => 50)->'indexes') ORDER BY v::text COLLATE "C";
v
------------------------------------------------------------------------
{"ddl" : "CREATE INDEX ON public.a USING btree (id)", "queryids" : []}
(1 row)
通过以上测试, 对数据库索引掌握比较好的同学就知道该怎么优化那些没有被pg_qualstats优化的SQL:
create extension pg_trgm;
create index on a using gin (info gin_trgm_ops);
再次查询就变快了.
postgres=# explain select count(*) from a where info like 'abcd%';
QUERY PLAN
----------------------------------------------------------------------------------
Aggregate (cost=137.64..137.65 rows=1 width=8)
-> Bitmap Heap Scan on a (cost=27.18..137.39 rows=100 width=0)
Recheck Cond: (info ~~ 'abcd%'::text)
-> Bitmap Index Scan on a_info_idx (cost=0.00..27.15 rows=100 width=0)
Index Cond: (info ~~ 'abcd%'::text)
(5 rows)
postgres=# select count(*) from a where info like 'abcd%';
count
-------
6
(1 row)
postgres=# \timing
Timing is on.
postgres=# select count(*) from a where info ~ 'abcd';
count
-------
428
(1 row)
Time: 70.469 ms
这里pg_qualstats推荐错了, 方法用gin, 但是ops缺用gist.
postgres=# SELECT v FROM json_array_elements( pg_qualstats_index_advisor(min_filter => 50)->'indexes') ORDER BY v::text COLLATE "C";
v
--------------------------------------------------------------------------------------
{"ddl" : "CREATE INDEX ON public.a USING btree (id)", "queryids" : []}
{"ddl" : "CREATE INDEX ON public.a USING gin (info gist_trgm_ops)", "queryids" : []}
{"ddl" : "CREATE INDEX ON public.a USING gin (info gist_trgm_ops)", "queryids" : []}
(3 rows)
Time: 34.414 ms
postgres=# SELECT v FROM json_array_elements( pg_qualstats_index_advisor(min_filter => 50)->'unoptimised') ORDER BY v::text COLLATE "C";
v
---
(0 rows)
Time: 14.443 ms