Skip to content

Latest commit

 

History

History
96 lines (67 loc) · 4.57 KB

20230331_01.md

File metadata and controls

96 lines (67 loc) · 4.57 KB

PostgreSQL 16 preview - support Parallel Hash Full Join

作者

digoal

日期

2023-03-31

标签

PostgreSQL , PolarDB , Parallel Hash Full Join


背景

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=11c2d6fdf5af1aacec9ca2005543f1b0fc4cc364

之前不支持parallel full outer join的原因是有死锁风险, 16增加了PHJ phase PHJ_BATCH_SCAN处理死锁问题.

Parallel Hash Full Join.  
author	Thomas Munro <tmunro@postgresql.org>	  
Thu, 30 Mar 2023 22:01:51 +0000 (11:01 +1300)  
committer	Thomas Munro <tmunro@postgresql.org>	  
Thu, 30 Mar 2023 22:34:03 +0000 (11:34 +1300)  
commit	11c2d6fdf5af1aacec9ca2005543f1b0fc4cc364  
tree	24f7dcba5bd58fbf207e8e58b3f97291b2a873b4	tree  
parent	ca7b3c4c00042038ba9c282c4807e05c0a527e42	commit | diff  
Parallel Hash Full Join.  
  
Full and right outer joins were not supported in the initial  
implementation of Parallel Hash Join because of deadlock hazards (see  
discussion).  Therefore FULL JOIN inhibited parallelism, as the other  
join strategies can't do that in parallel either.  
  
Add a new PHJ phase PHJ_BATCH_SCAN that scans for unmatched tuples on  
the inner side of one batch's hash table.  For now, sidestep the  
deadlock problem by terminating parallelism there.  The last process to  
arrive at that phase emits the unmatched tuples, while others detach and  
are free to go and work on other batches, if there are any, but  
otherwise they finish the join early.  
  
That unfairness is considered acceptable for now, because it's better  
than no parallelism at all.  The build and probe phases are run in  
parallel, and the new scan-for-unmatched phase, while serial, is usually  
applied to the smaller of the two relations and is either limited by  
some multiple of work_mem, or it's too big and is partitioned into  
batches and then the situation is improved by batch-level parallelism.  
  
Author: Melanie Plageman <melanieplageman@gmail.com>  
Author: Thomas Munro <thomas.munro@gmail.com>  
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>  
Discussion: https://postgr.es/m/CA%2BhUKG%2BA6ftXPz4oe92%2Bx8Er%2BxpGZqto70-Q_ERwRaSyA%3DafNg%40mail.gmail.com  
+set local max_parallel_workers_per_gather = 2;  
+explain (costs off)  
+     select  count(*) from simple r full outer join simple s using (id);  
+                         QUERY PLAN                            
+-------------------------------------------------------------  
+ Finalize Aggregate  
+   ->  Gather  
+         Workers Planned: 2  
+         ->  Partial Aggregate  
+               ->  Parallel Hash Full Join  
+                     Hash Cond: (r.id = s.id)  
+                     ->  Parallel Seq Scan on simple r  
+                     ->  Parallel Hash  
+                           ->  Parallel Seq Scan on simple s  
+(9 rows)  

digoal's wechat