Skip to content

Latest commit

 

History

History
131 lines (92 loc) · 6.64 KB

20220808_04.md

File metadata and controls

131 lines (92 loc) · 6.64 KB

PostgreSQL 16 devel preview - 分区缓存, 提升批量写入特定分区的场景性能

作者

digoal

日期

2022-08-08

标签

PostgreSQL , 分区表 , 缓存 , 写入优化 , list , range , 16 , PARTITION_CACHED_FIND_THRESHOLD


背景

PostgreSQL 16, 分区查找优化:

  • 在同一个会话里面, 插入某个分区表, 如果连续16条记录的数据都写入到了某一个分区, 那么会把这个分区number缓存入会话的rel cache中. 接下来写入这个分区表时, 根据记录的值直接去判断是否属于缓存的分区number, 如果是, 就直接写入这个分区, 如果不是再走原有的分区查找流程(二分法查找).

这个优化方法适合在同一个会话中, 有连续写入分区表的场景, 而且连续写入的属于同一分区的情况.

例如:

  • 按list分区的SaaS类业务、IOT业务数据. 某个会话写入的就是某个传感器的批量数据(每个传感器必须是连续的一批进来, 不能错开)、某些客户(saas的客户ID区分)的批量数据(某个APPID的数据必须连续进来, 不能错开).
  • 按range分区的时序类数据. 例如按天按月等分区, 写入的数据基本上都是连续进入某一个特定分区.

这个优化方法减少了二分法查找分区的开销, 提升分区表的写入性能.

不适用的场景:

  • default分区不缓存
  • list null分区值不缓存
  • hash分区不加速(可能由于hash查找本身就比较快?)

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=3592e0ff98b130c428c1d8091fc61767f7608732

+/*  
+ * The number of times the same partition must be found in a row before we  
+ * switch from a binary search for the given values to just checking if the  
+ * values belong to the last found partition.  This must be above 0.  
+ */  
+#define PARTITION_CACHED_FIND_THRESHOLD            16  
Have ExecFindPartition cache the last found partition  
author	David Rowley <drowley@postgresql.org>	  
Mon, 1 Aug 2022 21:55:27 +0000 (09:55 +1200)  
committer	David Rowley <drowley@postgresql.org>	  
Mon, 1 Aug 2022 21:55:27 +0000 (09:55 +1200)  
commit	3592e0ff98b130c428c1d8091fc61767f7608732  
tree	ac997f9570f773793a25062559cfee98ac0a6528	tree  
parent	83f1793d6096deb419e16cc38201484a6f6e9f48	commit | diff  
Have ExecFindPartition cache the last found partition  
  
Here we add code which detects when ExecFindPartition() continually finds  
the same partition and add a caching layer to improve partition lookup  
performance for such cases.  
  
Both RANGE and LIST partitioned tables traditionally require a binary  
search for the set of Datums that a partition needs to be found for. This  
binary search is commonly visible in profiles when bulk loading into a  
partitioned table.  Here we aim to reduce the overhead of bulk-loading  
into partitioned tables for cases where many consecutive tuples belong to  
the same partition and make the performance of this operation closer to  
what it is with a traditional non-partitioned table.  
  
When we find the same partition 16 times in a row, the next search will  
result in us simply just checking if the current set of values belongs to  
the last found partition.  For LIST partitioning we record the index into  
the PartitionBoundInfo's datum array.  This allows us to check if the  
current Datum is the same as the Datum that was last looked up.  This  
means if any given LIST partition supports storing multiple different  
Datum values, then the caching only works when we find the same value as  
we did the last time.  For RANGE partitioning we simply check if the given  
Datums are in the same range as the previously found partition.  
  
We store the details of the cached partition in PartitionDesc (i.e.  
relcache) so that the cached values are maintained over multiple  
statements.  
  
No caching is done for HASH partitions.  The majority of the cost in HASH  
partition lookups are in the hashing function(s), which would also have to  
be executed if we were to try to do caching for HASH partitioned tables.  
Since most of the cost is already incurred, we just don't bother.  We also  
don't do any caching for LIST partitions when we continually find the  
values being looked up belong to the DEFAULT partition.  We've no  
corresponding index in the PartitionBoundInfo's datum array for this case.  
We also don't cache when we find the given values match to a LIST  
partitioned table's NULL partition.  This is so cheap that there's no  
point in doing any caching for this.  We also don't cache for a RANGE  
partitioned table's DEFAULT partition.  
  
There have been a number of different patches submitted to improve  
partition lookups. Hou, Zhijie submitted a patch to detect when the value  
belonging to the partition key column(s) were constant and added code to  
cache the partition in that case.  Amit Langote then implemented an idea  
suggested by me to remember the last found partition and start to check if  
the current values work for that partition.  The final patch here was  
written by me and was done by taking many of the ideas I liked from the  
patches in the thread and redesigning other aspects.  
  
Discussion: https://postgr.es/m/OS0PR01MB571649B27E912EA6CC4EEF03942D9%40OS0PR01MB5716.jpnprd01.prod.outlook.com  
Author: Amit Langote, Hou Zhijie, David Rowley  
Reviewed-by: Amit Langote, Hou Zhijie  

digoal's wechat