digoal
2024-05-10
CBO(基于代价的优化)是目前数据库优化器用来选择最优执行路径的常用方法.
方法是有了, 但是要得到正确(最优)的执行路径, 取决于代价计算的准确性.
例如一条SQL既能使用索引扫描, 又能使用位图扫描, 还能使用全表扫描. 这三种扫描方法的代价是怎么估算的? 公式.
- 公式设计影响了代价的计算结果,
- 公式里面用到的统计信息准不准(决定了选择性),
- 公式里面的系数(成本因子, 如离散扫描代价, 顺序扫描代价等),
# - Planner Cost Constants - #seq_page_cost = 1.0 # measured on an arbitrary scale #random_page_cost = 4.0 # same scale as above #cpu_tuple_cost = 0.01 # same scale as above #cpu_index_tuple_cost = 0.005 # same scale as above #cpu_operator_cost = 0.0025 # same scale as above #parallel_setup_cost = 1000.0 # same scale as above #parallel_tuple_cost = 0.1 # same scale as above #min_parallel_table_scan_size = 8MB #min_parallel_index_scan_size = 512kB #effective_cache_size = 4GB #jit_above_cost = 100000 # perform JIT compilation if available # and query more expensive than this; # -1 disables #jit_inline_above_cost = 500000 # inline small functions if query is # more expensive than this; -1 disables #jit_optimize_above_cost = 500000 # use expensive JIT optimizations if # query is more expensive than this; # -1 disables
- 公式里面的常数(如
hash join
的死成本等).
每个操作都会涉及到代价的计算, 例如扫描有代价, 计算有代价, 索引有选择代价, 关联有代价等等. 每个操作的代价计算都有不同的公式. 公式可以简单的表达为 cost = startup cost + func(成本因子 * 统计信息->选择性, 常数)
代价计算公式是代码写死的, 记录选择性计算公式是写死的, 成本常数是代码写死的. 如果要影响执行计划的选择, 用户能干什么?
- 1 让统计信息更准确, 通常能做的也比较少, 可以修改
default_statistics_target
增加柱状图bucket个数. - 2 让代价因子更准确(严格来说让这些因子更适合你的硬件环境, 例如机械盘的random_page_cost比seq_page_cost大很多, 而ssd的random_page_cost比seq_page_cost差不多)
- 3 直接给HINT, 例如使用pg_hint_plan插件
- 4 修改GUC开关, 禁止某些路径, 例如
enable_hashjoin=off enable_indexscan=off ...
本文主要讨论2, 之前写过一些文章来教大家校对因子.
- 《优化器成本因子校对 - PostgreSQL explain cost constants alignment to timestamp》
- 《优化器成本因子校对(disk,ssd,memory IO开销精算) - PostgreSQL real seq_page_cost & random_page_cost in disks,ssd,memory》
- 《DB吐槽大会,第12期 - 没有自动成本校准器》
但是我这一期想吐槽的是PG没有内置的profiling功能, 才导致了因子校对比较麻烦. 槽点:
- 1、explain 不能打印出所有可能的执行计划, 以及对应的代价、估算的行数等.
- 2、没有打印出每个NODE的: 代价计算公式, 成本因子等值, 选择性计算公式, 统计信息及选择性计算结果, 代码内置常量等
- 3、explain analyze后, 对于有多个执行路径时, 如果真实情况和代价估算有出入, 导致选择了错误的执行计划时. 不能给出原因: 统计信息不准确导致选择性计算偏差、算子值不正确导致代价计算不准确.
我希望PG能做到这些, 对于优化者体验会好很多.
在explain时有开关来控制, 增加以下输出:
- 1、打印出每个NODE的: 代价计算公式, 成本因子等值, 选择性计算公式, 统计信息及选择性计算结果, 代码内置常量等
- 2、打印不同可选路径(例如每一种join)的: 代价, 选择性, 和实际耗时
- 3、分析代价和实际耗时的匹配度, 给出原因: 统计信息不准确导致选择性计算偏差、算子值不正确导致代价计算不准确. 给出建议: 加大
default_statistics_target
? 修改xxx代价因子到多少?
一些代价计算相关的代码:
https://doxygen.postgresql.org/structJoinCostWorkspace.html
Data Fields
Cost startup_cost
Cost total_cost
Cost run_cost
Cost inner_run_cost
Cost inner_rescan_run_cost
Cardinality outer_rows
Cardinality inner_rows
Cardinality outer_skip_rows
Cardinality inner_skip_rows
int numbuckets
int numbatches
Cardinality inner_rows_total
https://doxygen.postgresql.org/costsize_8c_source.html
/*
* initial_cost_hashjoin
* Preliminary estimate of the cost of a hashjoin path.
*
* This must quickly produce lower-bound estimates of the path's startup and
* total costs. If we are unable to eliminate the proposed path from
* consideration using the lower bounds, final_cost_hashjoin will be called
* to obtain the final estimates.
*
* The exact division of labor between this function and final_cost_hashjoin
* is private to them, and represents a tradeoff between speed of the initial
* estimate and getting a tight lower bound. We choose to not examine the
* join quals here (other than by counting the number of hash clauses),
* so we can't do much with CPU costs. We do assume that
* ExecChooseHashTableSize is cheap enough to use here.
*
* 'workspace' is to be filled with startup_cost, total_cost, and perhaps
* other data to be used by final_cost_hashjoin
* 'jointype' is the type of join to be performed
* 'hashclauses' is the list of joinclauses to be used as hash clauses
* 'outer_path' is the outer input to the join
* 'inner_path' is the inner input to the join
* 'extra' contains miscellaneous information about the join
* 'parallel_hash' indicates that inner_path is partial and that a shared
* hash table will be built in parallel
*/
void
initial_cost_hashjoin(PlannerInfo *root, JoinCostWorkspace *workspace,
JoinType jointype,
List *hashclauses,
Path *outer_path, Path *inner_path,
JoinPathExtraData *extra,
bool parallel_hash)
{
Cost startup_cost = 0;
Cost run_cost = 0;
double outer_path_rows = outer_path->rows;
double inner_path_rows = inner_path->rows;
double inner_path_rows_total = inner_path_rows;
int num_hashclauses = list_length(hashclauses);
int numbuckets;
int numbatches;
int num_skew_mcvs;
size_t space_allowed; /* unused */
/* cost of source data */
startup_cost += outer_path->startup_cost;
run_cost += outer_path->total_cost - outer_path->startup_cost;
startup_cost += inner_path->total_cost;
/*
* Cost of computing hash function: must do it once per input tuple. We
* charge one cpu_operator_cost for each column's hash function. Also,
* tack on one cpu_tuple_cost per inner row, to model the costs of
* inserting the row into the hashtable.
*
* XXX when a hashclause is more complex than a single operator, we really
* should charge the extra eval costs of the left or right side, as
* appropriate, here. This seems more work than it's worth at the moment.
*/
startup_cost += (cpu_operator_cost * num_hashclauses + cpu_tuple_cost)
* inner_path_rows;
run_cost += cpu_operator_cost * num_hashclauses * outer_path_rows;
/*
* If this is a parallel hash build, then the value we have for
* inner_rows_total currently refers only to the rows returned by each
* participant. For shared hash table size estimation, we need the total
* number, so we need to undo the division.
*/
if (parallel_hash)
inner_path_rows_total *= get_parallel_divisor(inner_path);
/*
* Get hash table size that executor would use for inner relation.
*
* XXX for the moment, always assume that skew optimization will be
* performed. As long as SKEW_HASH_MEM_PERCENT is small, it's not worth
* trying to determine that for sure.
*
* XXX at some point it might be interesting to try to account for skew
* optimization in the cost estimate, but for now, we don't.
*/
ExecChooseHashTableSize(inner_path_rows_total,
inner_path->pathtarget->width,
true, /* useskew */
parallel_hash, /* try_combined_hash_mem */
outer_path->parallel_workers,
&space_allowed,
&numbuckets,
&numbatches,
&num_skew_mcvs);
/*
* If inner relation is too big then we will need to "batch" the join,
* which implies writing and reading most of the tuples to disk an extra
* time. Charge seq_page_cost per page, since the I/O should be nice and
* sequential. Writing the inner rel counts as startup cost, all the rest
* as run cost.
*/
if (numbatches > 1)
{
double outerpages = page_size(outer_path_rows,
outer_path->pathtarget->width);
double innerpages = page_size(inner_path_rows,
inner_path->pathtarget->width);
startup_cost += seq_page_cost * innerpages;
run_cost += seq_page_cost * (innerpages + 2 * outerpages);
}
/* CPU costs left for later */
/* Public result fields */
workspace->startup_cost = startup_cost;
workspace->total_cost = startup_cost + run_cost;
/* Save private data for final_cost_hashjoin */
workspace->run_cost = run_cost;
workspace->numbuckets = numbuckets;
workspace->numbatches = numbatches;
workspace->inner_rows_total = inner_path_rows_total;
}