DB Gripe Session, Episode 96 - PG's optimizer is far too weak at explain debugging, cost calculation | selectivity | optimization paths, and tuning recommendations

Author

digoal

Date

2024-05-10

Tags

PostgreSQL , PolarDB , DuckDB , explain , node , path , possibility , cost factor , selectivity , statistics , cost formula , cost constant

Background

Video replay

CBO (cost-based optimization) is the method most database optimizers use today to choose the optimal execution path.

The method exists, but whether it actually produces the correct (optimal) execution path depends on the accuracy of the cost calculation.

For example, a single SQL statement might be able to use an index scan, a bitmap scan, or a full table scan. How is the cost of each of these three scan methods estimated? Via formulas.

  • The design of the formula determines the computed cost, which depends on:
    • whether the statistics used in the formula are accurate (they determine the selectivity),
    • the coefficients in the formula (cost factors such as the random page access cost, the sequential page access cost, etc.),
    # - Planner Cost Constants -  
    #seq_page_cost = 1.0                    # measured on an arbitrary scale  
    #random_page_cost = 4.0                 # same scale as above  
    #cpu_tuple_cost = 0.01                  # same scale as above  
    #cpu_index_tuple_cost = 0.005           # same scale as above  
    #cpu_operator_cost = 0.0025             # same scale as above  
    #parallel_setup_cost = 1000.0   # same scale as above  
    #parallel_tuple_cost = 0.1              # same scale as above  
    #min_parallel_table_scan_size = 8MB  
    #min_parallel_index_scan_size = 512kB  
    #effective_cache_size = 4GB  
    #jit_above_cost = 100000                # perform JIT compilation if available  
                                            # and query more expensive than this;  
                                            # -1 disables  
    #jit_inline_above_cost = 500000         # inline small functions if query is  
                                            # more expensive than this; -1 disables  
    #jit_optimize_above_cost = 500000       # use expensive JIT optimizations if  
                                            # query is more expensive than this;  
                                            # -1 disables  
    
    • the constants hard-coded into the formula (such as the fixed built-in cost of a hash join, etc.).

Every operation involves a cost calculation: scans have a cost, expression evaluation has a cost, index access has a selection cost, joins have a cost, and so on. Each operation has its own cost formula, which can be expressed roughly as: cost = startup cost + func(cost factor * statistics -> selectivity, constants)
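For a plain sequential scan, for example, the planner charges seq_page_cost per heap page plus a per-row CPU charge, so with the default constants above the total cost is roughly relpages * seq_page_cost + reltuples * (cpu_tuple_cost + cpu_operator_cost) when there is a single operator qual. A minimal sketch to check this yourself (the table name demo is hypothetical):

    -- build a small table and refresh its statistics
    CREATE TABLE demo AS SELECT generate_series(1, 100000) AS id;
    ANALYZE demo;

    -- the inputs to the formula
    SELECT relpages, reltuples FROM pg_class WHERE relname = 'demo';

    -- expected total cost ≈ relpages * seq_page_cost
    --                     + reltuples * (cpu_tuple_cost + cpu_operator_cost)
    EXPLAIN SELECT * FROM demo WHERE id = 1;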

The cost formulas are hard-coded, the row-selectivity formulas are hard-coded, and the cost constants are hard-coded. So what can a user actually do to influence which execution plan gets chosen? A few options (a minimal SQL sketch of each follows the list):

  • 1. Make the statistics more accurate. There is usually not much you can do here beyond increasing default_statistics_target to raise the number of histogram buckets.
  • 2. Make the cost factors more accurate (strictly speaking, make them better match your hardware; e.g., on spinning disks random_page_cost is much larger than seq_page_cost, while on SSDs random_page_cost is about the same as seq_page_cost).
  • 3. Give hints directly, e.g., using the pg_hint_plan extension.
  • 4. Flip GUC switches to disable certain paths, e.g., enable_hashjoin=off, enable_indexscan=off, ...
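A minimal SQL sketch of the four options above (the table t, column c, and index idx_t_c are hypothetical, and pg_hint_plan must be installed for the hint to take effect):

    -- 1. more histogram buckets for one column, then refresh statistics
    ALTER TABLE t ALTER COLUMN c SET STATISTICS 1000;
    ANALYZE t;

    -- 2. cost factors closer to SSD behavior (session level)
    SET random_page_cost = 1.1;          -- close to seq_page_cost on SSD
    SET effective_cache_size = '32GB';

    -- 3. hint the planner directly via pg_hint_plan
    /*+ IndexScan(t idx_t_c) */ SELECT * FROM t WHERE c = 1;

    -- 4. disable specific paths via GUC switches
    SET enable_hashjoin = off;
    SET enable_indexscan = off;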

This article focuses mainly on option 2; I have previously written several articles on how to calibrate these factors.

But what I want to complain about in this episode is that PG has no built-in profiling capability for this, which is what makes factor calibration so cumbersome. The gripes:

  • 1. explain cannot print all candidate execution plans along with their costs, estimated row counts, and so on.
  • 2. It does not print, for each NODE: the cost formula used, the values of the cost factors, the selectivity formula, the statistics and the resulting selectivity, the constants built into the code, etc.
  • 3. After explain analyze, when there are multiple candidate paths and the actual behavior diverges from the cost estimate so that the wrong plan is chosen, it cannot tell you why: inaccurate statistics skewing the selectivity calculation, or incorrect cost-factor values making the cost calculation inaccurate.

I hope PG can provide the following; it would greatly improve the experience for anyone doing optimization.

Add switches to explain that control the following additional output:

  • 1. Print, for each NODE: the cost formula, the cost-factor values, the selectivity formula, the statistics and the resulting selectivity, the constants built into the code, etc.
  • 2. Print, for each alternative path (e.g., each join method): its cost, selectivity, and actual elapsed time.
  • 3. Analyze how well the cost matches the actual elapsed time, report the cause (inaccurate statistics skewing the selectivity calculation, or incorrect cost-factor values skewing the cost), and give recommendations: increase default_statistics_target? Change cost factor xxx to what value?
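Until something like this exists, the closest approximation today is to compare the estimated and actual row counts that EXPLAIN (ANALYZE) already prints per node, and to surface alternative paths by hand with the enable_* GUCs. A minimal sketch (tables t and s are hypothetical):

    -- large gaps between the estimated rows=... and the actual rows
    -- at a node usually point to a selectivity / statistics problem
    EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM t JOIN s USING (c);

    -- push the planner off its first choice to see how the
    -- alternative path really performs, then compare the two runs
    SET enable_hashjoin = off;
    EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM t JOIN s USING (c);
    RESET enable_hashjoin;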

Some cost-calculation-related code:

https://doxygen.postgresql.org/structJoinCostWorkspace.html

Data Fields  
Cost 	startup_cost  
Cost 	total_cost  
Cost 	run_cost  
Cost 	inner_run_cost  
Cost 	inner_rescan_run_cost  
Cardinality 	outer_rows  
Cardinality 	inner_rows  
Cardinality 	outer_skip_rows  
Cardinality 	inner_skip_rows  
int 	numbuckets  
int 	numbatches  
Cardinality 	inner_rows_total  

https://doxygen.postgresql.org/costsize_8c_source.html

 /*  
  * initial_cost_hashjoin  
  *    Preliminary estimate of the cost of a hashjoin path.  
  *  
  * This must quickly produce lower-bound estimates of the path's startup and  
  * total costs.  If we are unable to eliminate the proposed path from  
  * consideration using the lower bounds, final_cost_hashjoin will be called  
  * to obtain the final estimates.  
  *  
  * The exact division of labor between this function and final_cost_hashjoin  
  * is private to them, and represents a tradeoff between speed of the initial  
  * estimate and getting a tight lower bound.  We choose to not examine the  
  * join quals here (other than by counting the number of hash clauses),  
  * so we can't do much with CPU costs.  We do assume that  
  * ExecChooseHashTableSize is cheap enough to use here.  
  *  
  * 'workspace' is to be filled with startup_cost, total_cost, and perhaps  
  *      other data to be used by final_cost_hashjoin  
  * 'jointype' is the type of join to be performed  
  * 'hashclauses' is the list of joinclauses to be used as hash clauses  
  * 'outer_path' is the outer input to the join  
  * 'inner_path' is the inner input to the join  
  * 'extra' contains miscellaneous information about the join  
  * 'parallel_hash' indicates that inner_path is partial and that a shared  
  *      hash table will be built in parallel  
  */  
 void  
 initial_cost_hashjoin(PlannerInfo *root, JoinCostWorkspace *workspace,  
                       JoinType jointype,  
                       List *hashclauses,  
                       Path *outer_path, Path *inner_path,  
                       JoinPathExtraData *extra,  
                       bool parallel_hash)  
 {  
     Cost        startup_cost = 0;  
     Cost        run_cost = 0;  
     double      outer_path_rows = outer_path->rows;  
     double      inner_path_rows = inner_path->rows;  
     double      inner_path_rows_total = inner_path_rows;  
     int         num_hashclauses = list_length(hashclauses);  
     int         numbuckets;  
     int         numbatches;  
     int         num_skew_mcvs;  
     size_t      space_allowed;  /* unused */  
    
     /* cost of source data */  
     startup_cost += outer_path->startup_cost;  
     run_cost += outer_path->total_cost - outer_path->startup_cost;  
     startup_cost += inner_path->total_cost;  
    
     /*  
      * Cost of computing hash function: must do it once per input tuple. We  
      * charge one cpu_operator_cost for each column's hash function.  Also,  
      * tack on one cpu_tuple_cost per inner row, to model the costs of  
      * inserting the row into the hashtable.  
      *  
      * XXX when a hashclause is more complex than a single operator, we really  
      * should charge the extra eval costs of the left or right side, as  
      * appropriate, here.  This seems more work than it's worth at the moment.  
      */  
     startup_cost += (cpu_operator_cost * num_hashclauses + cpu_tuple_cost)  
         * inner_path_rows;  
     run_cost += cpu_operator_cost * num_hashclauses * outer_path_rows;  
    
     /*  
      * If this is a parallel hash build, then the value we have for  
      * inner_rows_total currently refers only to the rows returned by each  
      * participant.  For shared hash table size estimation, we need the total  
      * number, so we need to undo the division.  
      */  
     if (parallel_hash)  
         inner_path_rows_total *= get_parallel_divisor(inner_path);  
    
     /*  
      * Get hash table size that executor would use for inner relation.  
      *  
      * XXX for the moment, always assume that skew optimization will be  
      * performed.  As long as SKEW_HASH_MEM_PERCENT is small, it's not worth  
      * trying to determine that for sure.  
      *  
      * XXX at some point it might be interesting to try to account for skew  
      * optimization in the cost estimate, but for now, we don't.  
      */  
     ExecChooseHashTableSize(inner_path_rows_total,  
                             inner_path->pathtarget->width,  
                             true,   /* useskew */  
                             parallel_hash,  /* try_combined_hash_mem */  
                             outer_path->parallel_workers,  
                             &space_allowed,  
                             &numbuckets,  
                             &numbatches,  
                             &num_skew_mcvs);  
    
     /*  
      * If inner relation is too big then we will need to "batch" the join,  
      * which implies writing and reading most of the tuples to disk an extra  
      * time.  Charge seq_page_cost per page, since the I/O should be nice and  
      * sequential.  Writing the inner rel counts as startup cost, all the rest  
      * as run cost.  
      */  
     if (numbatches > 1)  
     {  
         double      outerpages = page_size(outer_path_rows,  
                                            outer_path->pathtarget->width);  
         double      innerpages = page_size(inner_path_rows,  
                                            inner_path->pathtarget->width);  
    
         startup_cost += seq_page_cost * innerpages;  
         run_cost += seq_page_cost * (innerpages + 2 * outerpages);  
     }  
    
     /* CPU costs left for later */  
    
     /* Public result fields */  
     workspace->startup_cost = startup_cost;  
     workspace->total_cost = startup_cost + run_cost;  
     /* Save private data for final_cost_hashjoin */  
     workspace->run_cost = run_cost;  
     workspace->numbuckets = numbuckets;  
     workspace->numbatches = numbatches;  
     workspace->inner_rows_total = inner_path_rows_total;  
 }  
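Reading initial_cost_hashjoin above: the lower-bound startup cost it reports is outer_path->startup_cost + inner_path->total_cost + inner_path_rows * (cpu_operator_cost * num_hashclauses + cpu_tuple_cost), and when the inner relation will not fit in memory (numbatches > 1) an extra seq_page_cost charge per spilled page is added. This is exactly the "cost factor * statistics, plus constants" structure described earlier, and none of it is visible in explain output today.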

digoal's wechat