digoal
2019-03-30
PostgreSQL , mcv , 自定义统计信息
数据库为了得到一条SQL的最优执行计划,通常需要通过优化器来解决。
PostgreSQL 支持很多种优化器,例如cbo, geqo, aqo等。
《数据库优化器原理(含动态规划、机器学习建模优化器aqo) - 如何治疗选择综合症》
以CBO为例,基于成本的优化。成本计算的好与坏直接决定了执行计划的好坏。成本计算相关因素:公式本身,因子,以及统计信息的准确性。
统计信息又包括:相关性,唯一值个数,高频值,高频值比例,柱状图等等。
postgres=# \d pg_stats
View "pg_catalog.pg_stats"
Column | Type | Collation | Nullable | Default
------------------------+----------+-----------+----------+---------
schemaname | name | | |
tablename | name | | |
attname | name | | |
inherited | boolean | | |
null_frac | real | | |
avg_width | integer | | |
n_distinct | real | | |
most_common_vals | anyarray | | |
most_common_freqs | real[] | | |
histogram_bounds | anyarray | | |
correlation | real | | |
most_common_elems | anyarray | | |
most_common_elem_freqs | real[] | | |
elem_count_histogram | real[] | | |
默认情况下PG只统计每一列的统计信息,对于复杂条件(多个字段的AND OR或者包括字段、表达式的AND OR组合条件,根据统计信息得到的选择性的评估可能不准确,导致SQL优化得到的执行计划不准确),为了提高复杂条件的SQL优化准确性,PG 10开始支持自定义统计信息。
用户可以根据SQL的条件,自己定义与条件相符的多列组合统计信息。
PG 10的版本支持的较为简陋,自定义统计信息仅包括多少个唯一值、相关性。
PG 12,将支持多字段组合高频词统计等内容,提高评估准确性。
Add support for multivariate MCV lists
Introduce a third extended statistic type, supported by the CREATE
STATISTICS command - MCV lists, a generalization of the statistic
already built and used for individual columns.
Compared to the already supported types (n-distinct coefficients and
functional dependencies), MCV lists are more complex, include column
values and allow estimation of much wider range of common clauses
(equality and inequality conditions, IS NULL, IS NOT NULL etc.).
Similarly to the other types, a new pseudo-type (pg_mcv_list) is used.
Author: Tomas Vondra
Reviewed-by: Dean Rasheed, David Rowley, Mark Dilger, Alvaro Herrera
Discussion: https://postgr.es/m/dfdac334-9cf2-2597-fb27-f0fb3753f435@2ndquadrant.com
postgres=# \d pg_statistic_ext
Table "pg_catalog.pg_statistic_ext"
Column | Type | Collation | Nullable | Default
-----------------+-----------------+-----------+----------+---------
oid | oid | | not null |
stxrelid | oid | | not null |
stxname | name | | not null |
stxnamespace | oid | | not null |
stxowner | oid | | not null |
stxkeys | int2vector | | not null |
stxkind | "char"[] | | not null |
stxndistinct | pg_ndistinct | C | |
stxdependencies | pg_dependencies | C | |
stxmcv | pg_mcv_list | C | |
Indexes:
"pg_statistic_ext_name_index" UNIQUE, btree (stxname, stxnamespace)
"pg_statistic_ext_oid_index" UNIQUE, btree (oid)
"pg_statistic_ext_relid_index" btree (stxrelid)
postgres=# \h create stati
Command: CREATE STATISTICS
Description: define extended statistics
Syntax:
CREATE STATISTICS [ IF NOT EXISTS ] statistics_name
[ ( statistics_kind [, ... ] ) ]
ON column_name, column_name [, ...]
FROM table_name
URL: https://www.postgresql.org/docs/devel/sql-createstatistics.html
postgres=# create table abc(c1 int, c2 int, c3 int);
CREATE TABLE
postgres=# insert into abc select random()*10, random()*20, random()*30 from generate_series(1,100000);
INSERT 0 100000
postgres=# insert into abc select random()*10, random()*20, 1 from generate_series(1,100000);
INSERT 0 100000
postgres=# create statistics st1 (ndistinct,dependencies,mcv) on c1,c2,c3 from abc;
CREATE STATISTICS
postgres=# vacuum analyze abc;
VACUUM
postgres=# select * from pg_statistic_ext ;
oid | stxrelid | stxname | stxnamespace | stxowner | stxkeys | stxkind | stxndistinct | stxdependencies | stxmcv
-------+----------+---------+--------------+----------+---------+---------+----------------------------------------------------------+-----------------+--------
16395 | 16391 | st1 | 2200 | 10 | 1 2 3 | {d,f,m} | {"1, 2": 231, "1, 3": 341, "2, 3": 651, "1, 2, 3": 6996} | |
(1 row)
postgres=# \x
Expanded display is on.
postgres=# select * from pg_statistic_ext ;
-[ RECORD 1 ]---+---------------------------------------------------------
oid | 16395
stxrelid | 16391
stxname | st1
stxnamespace | 2200
stxowner | 10
stxkeys | 1 2 3
stxkind | {d,f,m}
stxndistinct | {"1, 2": 231, "1, 3": 341, "2, 3": 651, "1, 2, 3": 6996}
stxdependencies |
stxmcv |
postgres=# select pg_mcv_list_items(stxmcv) from pg_statistic_ext ;
pg_mcv_list_items
-----------------------------------------------
(0,"{2, 13, 1}","{f,f,f}",0.0034,0.002646)
(1,"{6, 13, 1}","{f,f,f}",0.003367,0.002641)
(2,"{3, 7, 1}","{f,f,f}",0.003267,0.002628)
(3,"{2, 2, 1}","{f,f,f}",0.003167,0.002703)
(4,"{5, 19, 1}","{f,f,f}",0.003133,0.002647)
(5,"{8, 15, 1}","{f,f,f}",0.003133,0.002524)
(6,"{9, 12, 1}","{f,f,f}",0.0031,0.002598)
(7,"{8, 6, 1}","{f,f,f}",0.0031,0.002588)
(8,"{3, 9, 1}","{f,f,f}",0.003067,0.002641)
(9,"{3, 6, 1}","{f,f,f}",0.003067,0.002564)
(10,"{7, 4, 1}","{f,f,f}",0.003067,0.002543)
(11,"{6, 17, 1}","{f,f,f}",0.003033,0.002544)
(12,"{5, 6, 1}","{f,f,f}",0.003,0.002604)
(13,"{6, 2, 1}","{f,f,f}",0.003,0.002698)
(14,"{8, 8, 1}","{f,f,f}",0.003,0.002709)
(15,"{6, 4, 1}","{f,f,f}",0.002967,0.002564)
(16,"{2, 5, 1}","{f,f,f}",0.002967,0.002622)
(17,"{7, 9, 1}","{f,f,f}",0.002967,0.002633)
(18,"{3, 13, 1}","{f,f,f}",0.002967,0.002628)
(19,"{2, 14, 1}","{f,f,f}",0.002967,0.002658)
(20,"{5, 14, 1}","{f,f,f}",0.002933,0.002681)
(21,"{2, 3, 1}","{f,f,f}",0.002933,0.002533)
(22,"{5, 8, 1}","{f,f,f}",0.002933,0.002725)
(23,"{3, 3, 1}","{f,f,f}",0.0029,0.002515)
(24,"{7, 3, 1}","{f,f,f}",0.0029,0.002507)
(25,"{1, 2, 1}","{f,f,f}",0.0029,0.002687)
(26,"{2, 19, 1}","{f,f,f}",0.002867,0.002625)
(27,"{1, 16, 1}","{f,f,f}",0.002867,0.002589)
(28,"{8, 7, 1}","{f,f,f}",0.002867,0.002653)
(29,"{1, 12, 1}","{f,f,f}",0.002867,0.002615)
(30,"{3, 8, 1}","{f,f,f}",0.002867,0.002683)
(31,"{2, 7, 1}","{f,f,f}",0.002867,0.002646)
(32,"{3, 17, 1}","{f,f,f}",0.002833,0.002531)
(33,"{2, 8, 1}","{f,f,f}",0.002833,0.002701)
(34,"{7, 7, 1}","{f,f,f}",0.002833,0.002619)
(35,"{1, 5, 1}","{f,f,f}",0.002833,0.002606)
(36,"{4, 2, 1}","{f,f,f}",0.002833,0.002594)
(37,"{9, 14, 1}","{f,f,f}",0.002833,0.002626)
(38,"{7, 19, 1}","{f,f,f}",0.002833,0.002598)
(39,"{1, 13, 1}","{f,f,f}",0.0028,0.00263)
(40,"{4, 8, 1}","{f,f,f}",0.0028,0.002592)
(41,"{5, 3, 1}","{f,f,f}",0.0028,0.002555)
(42,"{4, 13, 1}","{f,f,f}",0.0028,0.002539)
(43,"{5, 2, 1}","{f,f,f}",0.0028,0.002726)
(44,"{9, 15, 1}","{f,f,f}",0.0028,0.002487)
(45,"{8, 10, 1}","{f,f,f}",0.0028,0.002597)
(46,"{5, 4, 1}","{f,f,f}",0.0028,0.002591)
(47,"{2, 15, 1}","{f,f,f}",0.002767,0.002517)
(48,"{1, 18, 1}","{f,f,f}",0.002767,0.002646)
(49,"{7, 8, 1}","{f,f,f}",0.002767,0.002674)
(50,"{7, 5, 1}","{f,f,f}",0.002767,0.002595)
(51,"{3, 18, 1}","{f,f,f}",0.002767,0.002643)
(52,"{1, 7, 1}","{f,f,f}",0.002767,0.00263)
postgres=# \df *.*pg_dep*
List of functions
Schema | Name | Result data type | Argument data types | Type
------------+----------------------+------------------+---------------------+------
pg_catalog | pg_dependencies_in | pg_dependencies | cstring | func
pg_catalog | pg_dependencies_out | cstring | pg_dependencies | func
pg_catalog | pg_dependencies_recv | pg_dependencies | internal | func
pg_catalog | pg_dependencies_send | bytea | pg_dependencies | func
(4 rows)
postgres=# select pg_dependencies_out(stxdependencies) from pg_statistic_ext ;
pg_dependencies_out
---------------------
(1 row)
https://www.postgresql.org/docs/devel/sql-createstatistics.html
《数据库优化器原理(含动态规划、机器学习建模优化器aqo) - 如何治疗选择综合症》
《PostgreSQL 统计信息(dbms_stats)导出,导入,锁定,替换》
《PostgreSQL 快速返回表上某列的唯一值(枚举值) - pg_stats.most_common_vals》
《PostgreSQL 统计信息pg_statistic格式及导入导出dump_stat - 兼容Oracle》
《PostgreSQL pg_stat_ pg_statio_ 统计信息(scan,read,fetch,hit)源码解读》
您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议,奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等,奖品丰富,快来许愿。开不开森.