Skip to content

Latest commit

 

History

History
290 lines (165 loc) · 13 KB

20190331_14.md

File metadata and controls

290 lines (165 loc) · 13 KB

PostgreSQL 12 preview - query rewrite API 增强 - Allow extensions to generate lossy index conditions - SupportRequestIndexCondition

作者

digoal

日期

2019-03-31

标签

PostgreSQL , query rewrite , API , extension , lossy index 重写 , 放大范围 , 改写where条件走索引 , lossy 化 , recheck , filter


背景

在数据库中QUERY REWRITE是一种常用的SQL优化方法。

我们看一个QUERY REWRITE的例子。

   1 /*-------------------------------------------------------------------------  
   2  *  
   3  * like_support.c  
   4  *    Planner support functions for LIKE, regex, and related operators.  
   5  *  
   6  * These routines handle special optimization of operators that can be  
   7  * used with index scans even though they are not known to the executor's  
   8  * indexscan machinery.  The key idea is that these operators allow us  
   9  * to derive approximate indexscan qual clauses, such that any tuples  
  10  * that pass the operator clause itself must also satisfy the simpler  
  11  * indexscan condition(s).  Then we can use the indexscan machinery  
  12  * to avoid scanning as much of the table as we'd otherwise have to,  
  13  * while applying the original operator as a qpqual condition to ensure  
  14  * we deliver only the tuples we want.  (In essence, we're using a regular  
  15  * index as if it were a lossy index.)  
  16  *  
  17  * An example of what we're doing is  
  18  *          textfield LIKE 'abc%def'  
  19  * from which we can generate the indexscanable conditions  
  20  *          textfield >= 'abc' AND textfield < 'abd'  
  21  * which allow efficient scanning of an index on textfield.  
  22  * (In reality, character set and collation issues make the transformation  
  23  * from LIKE to indexscan limits rather harder than one might think ...  
  24  * but that's the basic idea.)  
  25  *  
  26  * Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group  
  27  * Portions Copyright (c) 1994, Regents of the University of California  
  28  *  
  29  *  
  30  * IDENTIFICATION  
  31  *    src/backend/utils/adt/like_support.c  
  32  *  
  33  *-------------------------------------------------------------------------  
  34  */  

例如,like可以改写成范围查询,走索引。如下:

postgres=# explain select * from t1 where info like 'abc%def';  
                               QUERY PLAN                                  
-------------------------------------------------------------------------  
 Index Scan using idx_info_t1_1 on t1  (cost=0.15..6.89 rows=1 width=36)  
   Index Cond: ((info >= 'abc'::text) AND (info < 'abd'::text))  
   改写成 ((info >= 'abc'::text) AND (info < 'abd'::text)),使得这个LIKE可以走索引。  
   Filter: (info ~~ 'abc%def'::text)  
   Filter重新检查,保证原查询条件的正确性。  
(3 rows)  

实际上~~就是like操作符,这个操作符支持了SupportRequestIndexCondition。

postgres=# \do+ ~~  
                                            List of operators  
   Schema   | Name | Left arg type | Right arg type | Result type |  Function  |       Description         
------------+------+---------------+----------------+-------------+------------+-------------------------  
 pg_catalog | ~~   | bytea         | bytea          | boolean     | bytealike  | matches LIKE expression  
 pg_catalog | ~~   | character     | text           | boolean     | bpcharlike | matches LIKE expression  
 pg_catalog | ~~   | name          | text           | boolean     | namelike   | matches LIKE expression  
 pg_catalog | ~~   | text          | text           | boolean     | textlike   | matches LIKE expression  
(4 rows)  
  
postgres=# \df+ textlike  
                                                                                        List of functions  
   Schema   |   Name   | Result data type | Argument data types | Type | Volatility | Parallel |  Owner   | Security | Access privileges | Language | Source code |          Description            
------------+----------+------------------+---------------------+------+------------+----------+----------+----------+-------------------+----------+-------------+-------------------------------  
 pg_catalog | textlike | boolean          | text, text          | func | immutable  | safe     | postgres | invoker  |                   | internal | textlike    | implementation of ~~ operator  
(1 row)  

以前这是一个内置的功能,PostgreSQL 12将这个功能API化,当用户创建的函数支持了SupportRequestIndexCondition时,可以将这个函数调用进行转换,转换为支持索引扫描的QUERY(重写where条件),同时在index 中增加Filter,check每一条返回的记录,确保返回的记录符合原来的where条件。

实际上这个功能是lossy化,也就是说只要能用上索引,放宽一下查询条件也可以,在每条返回的记录上,再增加recheck(Filter)来保证返回记录符合原来的where条件。

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=74dfe58a5927b22c744b29534e67bfdd203ac028

Allow extensions to generate lossy index conditions.  
  
For a long time, indxpath.c has had the ability to extract derived (lossy)  
index conditions from certain operators such as LIKE.  For just as long,  
it's been obvious that we really ought to make that capability available  
to extensions.  This commit finally accomplishes that, by adding another  
API for planner support functions that lets them create derived index  
conditions for their functions.  As proof of concept, the hardwired  
"special index operator" code formerly present in indxpath.c is pushed  
out to planner support functions attached to LIKE and other relevant  
operators.  
  
A weak spot in this design is that an extension needs to know OIDs for  
the operators, datatypes, and opfamilies involved in the transformation  
it wants to make.  The core-code prototypes use hard-wired OID references  
but extensions don't have that option for their own operators etc.  It's  
usually possible to look up the required info, but that may be slow and  
inconvenient.  However, improving that situation is a separate task.  
  
I want to do some additional refactorization around selfuncs.c, but  
that also seems like a separate task.  
  
Discussion: https://postgr.es/m/15193.1548028093@sss.pgh.pa.us  
diff --git a/doc/src/sgml/xfunc.sgml b/doc/src/sgml/xfunc.sgml  
index b486ef3..3403269 100644 (file)  
--- a/doc/src/sgml/xfunc.sgml  
+++ b/doc/src/sgml/xfunc.sgml  
@@ -3460,4 +3460,18 @@ supportfn(internal) returns internal  
     This can be done by a support function that implements  
     the <literal>SupportRequestRows</literal> request type.  
    </para>  
+  
+   <para>  
+    For target functions that return boolean, it may be possible to  
+    convert a function call appearing in WHERE into an indexable operator  
+    clause or clauses.  The converted clauses might be exactly equivalent  
+    to the function's condition, or they could be somewhat weaker (that is,  
+    they might accept some values that the function condition does not).  
+    In the latter case the index condition is said to  
+    be <firstterm>lossy</firstterm>; it can still be used to scan an index,  
+    but the function call will have to be executed for each row returned by  
+    the index to see if it really passes the WHERE condition or not.  
+    To create such conditions, the support function must implement  
+    the <literal>SupportRequestIndexCondition</literal> request type.  
+   </para>  
   </sect1>  

小结

query rewrite API 增强 - Allow extensions to generate lossy index conditions - SupportRequestIndexCondition ,实际上就是让SQL改写成可以走索引(lossy化,通常是放大WHERE 条件,同时采用二次check来过滤放大的记录。)

同样具有lossy思想的还包括:

1、bloom index,放大搜索范围,再进行二次check。

2、bitmap scan,放大搜索范围,再进行二次check。

使用索引扫描降低了IO开销,但是二次check(Filter)需要一定的CPU开销,总体成本还是下降的。

如果要同时消除IO与check,可以使用精准索引(或者partial index)。

源码请参考

src/include/nodes/supportnodes.h:typedef struct SupportRequestIndexCondition  
src/include/nodes/supportnodes.h:} SupportRequestIndexCondition;  
src/include/nodes/nodes.h:      T_SupportRequestIndexCondition  /* in nodes/supportnodes.h */  
src/backend/optimizer/path/indxpath.c:  SupportRequestIndexCondition req;  
src/backend/optimizer/path/indxpath.c:  req.type = T_SupportRequestIndexCondition;  
src/backend/utils/adt/like_support.c:   else if (IsA(rawreq, SupportRequestIndexCondition))  
src/backend/utils/adt/like_support.c:           SupportRequestIndexCondition *req = (SupportRequestIndexCondition *) rawreq;  
src/backend/utils/adt/network.c:        if (IsA(rawreq, SupportRequestIndexCondition))  
src/backend/utils/adt/network.c:                SupportRequestIndexCondition *req = (SupportRequestIndexCondition *) rawreq;  

参考

https://www.postgresql.org/docs/devel/xfunc-optimization.html

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=74dfe58a5927b22c744b29534e67bfdd203ac028

https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/utils/adt/like_support.c;h=b001dde5fc7d73b24853296ebc27e20044fae2c7;hb=74dfe58a5927b22c744b29534e67bfdd203ac028

《PostgreSQL bitmap scan的IO放大的原理解释和优化》

《Recheck Cond filter IO\CPU放大 原理与优化CASE - 含 超级大表 不包含(反选) SQL优化》

《PostgreSQL multipolygon 空间索引查询过滤精简优化 - IO,CPU放大优化》

《索引顺序扫描引发的堆扫描IO放大背后的统计学原理与解决办法 - PostgreSQL index scan enlarge heap page scans when index and column correlation small.》

《PostgreSQL bloom filter index 扩展 for bigint》

《PostgreSQL 11 preview - BRIN索引接口功能扩展(BLOOM FILTER、min max分段)》

《PostgreSQL 9.6 黑科技 bloom 算法索引,一个索引支撑任意列组合查询》

您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议,奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等,奖品丰富,快来许愿。开不开森.

digoal's wechat