Skip to content

Latest commit

 

History

History
326 lines (205 loc) · 12.4 KB

20190330_05.md

File metadata and controls

326 lines (205 loc) · 12.4 KB

PostgreSQL 12 preview - 自定义统计信息 - 支持mcv 多列高频词

作者

digoal

日期

2019-03-30

标签

PostgreSQL , mcv , 自定义统计信息


背景

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=7300a699502fe5432b05fbc75baca534b080bebb

数据库为了得到一条SQL的最优执行计划,通常需要通过优化器来解决。

PostgreSQL 支持很多种优化器,例如cbo, geqo, aqo等。

《数据库优化器原理(含动态规划、机器学习建模优化器aqo) - 如何治疗选择综合症》

以CBO为例,基于成本的优化。成本计算的好与坏直接决定了执行计划的好坏。成本计算相关因素:公式本身,因子,以及统计信息的准确性。

统计信息又包括:相关性,唯一值个数,高频值,高频值比例,柱状图等等。

postgres=# \d pg_stats  
                     View "pg_catalog.pg_stats"  
         Column         |   Type   | Collation | Nullable | Default   
------------------------+----------+-----------+----------+---------  
 schemaname             | name     |           |          |   
 tablename              | name     |           |          |   
 attname                | name     |           |          |   
 inherited              | boolean  |           |          |   
 null_frac              | real     |           |          |   
 avg_width              | integer  |           |          |   
 n_distinct             | real     |           |          |   
 most_common_vals       | anyarray |           |          |   
 most_common_freqs      | real[]   |           |          |   
 histogram_bounds       | anyarray |           |          |   
 correlation            | real     |           |          |   
 most_common_elems      | anyarray |           |          |   
 most_common_elem_freqs | real[]   |           |          |   
 elem_count_histogram   | real[]   |           |          |   

默认情况下PG只统计每一列的统计信息,对于复杂条件(多个字段的AND OR或者包括字段、表达式的AND OR组合条件,根据统计信息得到的选择性的评估可能不准确,导致SQL优化得到的执行计划不准确),为了提高复杂条件的SQL优化准确性,PG 10开始支持自定义统计信息。

《PostgreSQL 10 黑科技 - 自定义统计信息》

用户可以根据SQL的条件,自己定义与条件相符的多列组合统计信息。

PG 10的版本支持的较为简陋,自定义统计信息仅包括多少个唯一值、相关性。

PG 12,将支持多字段组合高频词统计等内容,提高评估准确性。

Add support for multivariate MCV lists   
  
Introduce a third extended statistic type, supported by the CREATE  
STATISTICS command - MCV lists, a generalization of the statistic  
already built and used for individual columns.  
  
Compared to the already supported types (n-distinct coefficients and  
functional dependencies), MCV lists are more complex, include column  
values and allow estimation of much wider range of common clauses  
(equality and inequality conditions, IS NULL, IS NOT NULL etc.).  
Similarly to the other types, a new pseudo-type (pg_mcv_list) is used.  
  
Author: Tomas Vondra  
Reviewed-by: Dean Rasheed, David Rowley, Mark Dilger, Alvaro Herrera  
Discussion: https://postgr.es/m/dfdac334-9cf2-2597-fb27-f0fb3753f435@2ndquadrant.com  
postgres=# \d pg_statistic_ext  
                Table "pg_catalog.pg_statistic_ext"  
     Column      |      Type       | Collation | Nullable | Default   
-----------------+-----------------+-----------+----------+---------  
 oid             | oid             |           | not null |   
 stxrelid        | oid             |           | not null |   
 stxname         | name            |           | not null |   
 stxnamespace    | oid             |           | not null |   
 stxowner        | oid             |           | not null |   
 stxkeys         | int2vector      |           | not null |   
 stxkind         | "char"[]        |           | not null |   
 stxndistinct    | pg_ndistinct    | C         |          |   
 stxdependencies | pg_dependencies | C         |          |   
 stxmcv          | pg_mcv_list     | C         |          |   
Indexes:  
    "pg_statistic_ext_name_index" UNIQUE, btree (stxname, stxnamespace)  
    "pg_statistic_ext_oid_index" UNIQUE, btree (oid)  
    "pg_statistic_ext_relid_index" btree (stxrelid)  

例子

postgres=# \h create stati  
Command:     CREATE STATISTICS  
Description: define extended statistics  
Syntax:  
CREATE STATISTICS [ IF NOT EXISTS ] statistics_name  
    [ ( statistics_kind [, ... ] ) ]  
    ON column_name, column_name [, ...]  
    FROM table_name  
  
URL: https://www.postgresql.org/docs/devel/sql-createstatistics.html  
postgres=# create table abc(c1 int, c2 int, c3 int);  
CREATE TABLE  
  
postgres=# insert into abc select random()*10, random()*20, random()*30 from generate_series(1,100000);  
INSERT 0 100000  
  
postgres=# insert into abc select random()*10, random()*20, 1 from generate_series(1,100000);
INSERT 0 100000

postgres=# create statistics st1 (ndistinct,dependencies,mcv) on c1,c2,c3 from abc;  
CREATE STATISTICS  
postgres=# vacuum analyze abc;  
VACUUM  
postgres=# select * from pg_statistic_ext ;  
  oid  | stxrelid | stxname | stxnamespace | stxowner | stxkeys | stxkind |                       stxndistinct                       | stxdependencies | stxmcv   
-------+----------+---------+--------------+----------+---------+---------+----------------------------------------------------------+-----------------+--------  
 16395 |    16391 | st1     |         2200 |       10 | 1 2 3   | {d,f,m} | {"1, 2": 231, "1, 3": 341, "2, 3": 651, "1, 2, 3": 6996} |                 |   
(1 row)  
  
postgres=# \x  
Expanded display is on.  
postgres=# select * from pg_statistic_ext ;  
-[ RECORD 1 ]---+---------------------------------------------------------  
oid             | 16395  
stxrelid        | 16391  
stxname         | st1  
stxnamespace    | 2200  
stxowner        | 10  
stxkeys         | 1 2 3  
stxkind         | {d,f,m}  
stxndistinct    | {"1, 2": 231, "1, 3": 341, "2, 3": 651, "1, 2, 3": 6996}  
stxdependencies |   
stxmcv          |   
postgres=# select pg_mcv_list_items(stxmcv) from pg_statistic_ext ;
               pg_mcv_list_items               
-----------------------------------------------
 (0,"{2, 13, 1}","{f,f,f}",0.0034,0.002646)
 (1,"{6, 13, 1}","{f,f,f}",0.003367,0.002641)
 (2,"{3, 7, 1}","{f,f,f}",0.003267,0.002628)
 (3,"{2, 2, 1}","{f,f,f}",0.003167,0.002703)
 (4,"{5, 19, 1}","{f,f,f}",0.003133,0.002647)
 (5,"{8, 15, 1}","{f,f,f}",0.003133,0.002524)
 (6,"{9, 12, 1}","{f,f,f}",0.0031,0.002598)
 (7,"{8, 6, 1}","{f,f,f}",0.0031,0.002588)
 (8,"{3, 9, 1}","{f,f,f}",0.003067,0.002641)
 (9,"{3, 6, 1}","{f,f,f}",0.003067,0.002564)
 (10,"{7, 4, 1}","{f,f,f}",0.003067,0.002543)
 (11,"{6, 17, 1}","{f,f,f}",0.003033,0.002544)
 (12,"{5, 6, 1}","{f,f,f}",0.003,0.002604)
 (13,"{6, 2, 1}","{f,f,f}",0.003,0.002698)
 (14,"{8, 8, 1}","{f,f,f}",0.003,0.002709)
 (15,"{6, 4, 1}","{f,f,f}",0.002967,0.002564)
 (16,"{2, 5, 1}","{f,f,f}",0.002967,0.002622)
 (17,"{7, 9, 1}","{f,f,f}",0.002967,0.002633)
 (18,"{3, 13, 1}","{f,f,f}",0.002967,0.002628)
 (19,"{2, 14, 1}","{f,f,f}",0.002967,0.002658)
 (20,"{5, 14, 1}","{f,f,f}",0.002933,0.002681)
 (21,"{2, 3, 1}","{f,f,f}",0.002933,0.002533)
 (22,"{5, 8, 1}","{f,f,f}",0.002933,0.002725)
 (23,"{3, 3, 1}","{f,f,f}",0.0029,0.002515)
 (24,"{7, 3, 1}","{f,f,f}",0.0029,0.002507)
 (25,"{1, 2, 1}","{f,f,f}",0.0029,0.002687)
 (26,"{2, 19, 1}","{f,f,f}",0.002867,0.002625)
 (27,"{1, 16, 1}","{f,f,f}",0.002867,0.002589)
 (28,"{8, 7, 1}","{f,f,f}",0.002867,0.002653)
 (29,"{1, 12, 1}","{f,f,f}",0.002867,0.002615)
 (30,"{3, 8, 1}","{f,f,f}",0.002867,0.002683)
 (31,"{2, 7, 1}","{f,f,f}",0.002867,0.002646)
 (32,"{3, 17, 1}","{f,f,f}",0.002833,0.002531)
 (33,"{2, 8, 1}","{f,f,f}",0.002833,0.002701)
 (34,"{7, 7, 1}","{f,f,f}",0.002833,0.002619)
 (35,"{1, 5, 1}","{f,f,f}",0.002833,0.002606)
 (36,"{4, 2, 1}","{f,f,f}",0.002833,0.002594)
 (37,"{9, 14, 1}","{f,f,f}",0.002833,0.002626)
 (38,"{7, 19, 1}","{f,f,f}",0.002833,0.002598)
 (39,"{1, 13, 1}","{f,f,f}",0.0028,0.00263)
 (40,"{4, 8, 1}","{f,f,f}",0.0028,0.002592)
 (41,"{5, 3, 1}","{f,f,f}",0.0028,0.002555)
 (42,"{4, 13, 1}","{f,f,f}",0.0028,0.002539)
 (43,"{5, 2, 1}","{f,f,f}",0.0028,0.002726)
 (44,"{9, 15, 1}","{f,f,f}",0.0028,0.002487)
 (45,"{8, 10, 1}","{f,f,f}",0.0028,0.002597)
 (46,"{5, 4, 1}","{f,f,f}",0.0028,0.002591)
 (47,"{2, 15, 1}","{f,f,f}",0.002767,0.002517)
 (48,"{1, 18, 1}","{f,f,f}",0.002767,0.002646)
 (49,"{7, 8, 1}","{f,f,f}",0.002767,0.002674)
 (50,"{7, 5, 1}","{f,f,f}",0.002767,0.002595)
 (51,"{3, 18, 1}","{f,f,f}",0.002767,0.002643)
 (52,"{1, 7, 1}","{f,f,f}",0.002767,0.00263)
postgres=# \df *.*pg_dep*
                                 List of functions
   Schema   |         Name         | Result data type | Argument data types | Type 
------------+----------------------+------------------+---------------------+------
 pg_catalog | pg_dependencies_in   | pg_dependencies  | cstring             | func
 pg_catalog | pg_dependencies_out  | cstring          | pg_dependencies     | func
 pg_catalog | pg_dependencies_recv | pg_dependencies  | internal            | func
 pg_catalog | pg_dependencies_send | bytea            | pg_dependencies     | func
(4 rows)

postgres=# select pg_dependencies_out(stxdependencies) from pg_statistic_ext ;
 pg_dependencies_out 
---------------------
 
(1 row)

参考

https://www.postgresql.org/docs/devel/sql-createstatistics.html

《PostgreSQL 10 黑科技 - 自定义统计信息》

《数据库优化器原理(含动态规划、机器学习建模优化器aqo) - 如何治疗选择综合症》

《优化器成本因子校对(disk,ssd,memory IO开销精算) - PostgreSQL real seq_page_cost & random_page_cost in disks,ssd,memory》

《PostgreSQL 统计信息(dbms_stats)导出,导入,锁定,替换》

《PostgreSQL 快速返回表上某列的唯一值(枚举值) - pg_stats.most_common_vals》

《PostgreSQL 统计信息pg_statistic格式及导入导出dump_stat - 兼容Oracle》

《PostgreSQL pg_stat_ pg_statio_ 统计信息(scan,read,fetch,hit)源码解读》

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=7300a699502fe5432b05fbc75baca534b080bebb

https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/statistics/README.mcv;h=c18878f5d2b1966ffe0a73ac1ca2312f7e5aa456;hb=7300a699502fe5432b05fbc75baca534b080bebb

您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议,奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等,奖品丰富,快来许愿。开不开森.

digoal's wechat