Skip to content

Latest commit

 

History

History
81 lines (52 loc) · 3.79 KB

20221014_01.md

File metadata and controls

81 lines (52 loc) · 3.79 KB

PostgreSQL 16 preview - COPY into foreign table 加速 - batch insert mode

作者

digoal

日期

2022-10-14

标签

PostgreSQL , foreign table , copy into , batch


背景

这个特性主要用于提升copy into到外部表的速率. 实际是duckdb在postgres scanner中也用到了copy协议, 原因就是比insert 高效. 甚至用binary来绕过数据类型的in out转换再次提速.

《duckdb postgres_scan 插件 - 不落地数据, 加速PostgreSQL数据分析》

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=97da48246d34807196b404626f019c767b7af0df

Allow batch insertion during COPY into a foreign table.  
author	Etsuro Fujita <efujita@postgresql.org>	  
Thu, 13 Oct 2022 09:45:00 +0000 (18:45 +0900)  
committer	Etsuro Fujita <efujita@postgresql.org>	  
Thu, 13 Oct 2022 09:45:00 +0000 (18:45 +0900)  
commit	97da48246d34807196b404626f019c767b7af0df  
tree	dd31454c1d3efeac856866aca66e5f92fd1904e4	tree  
parent	56c19fee2db43f236eb8f4cbef5235701873fceb	commit | diff  
Allow batch insertion during COPY into a foreign table.  
  
Commit 3d956d956 allowed the COPY, but it's done by inserting individual  
rows to the foreign table, so it can be inefficient due to the overhead  
caused by each round-trip to the foreign server.  To improve performance  
of the COPY in such a case, this patch allows batch insertion, by  
extending the multi-insert machinery in CopyFrom() to the foreign-table  
case so that we insert multiple rows to the foreign table at once using  
the FDW callback routine added by commit b663a4136.  This patch also  
allows this for postgres_fdw.  It is enabled by the "batch_size" option  
added by commit b663a4136, which is disabled by default.  
  
When doing batch insertion, we update progress of the COPY command after  
performing the FDW callback routine, to count rows not suppressed by the  
FDW as well as a BEFORE ROW INSERT trigger.  For consistency, this patch  
changes the timing of updating it for plain tables: previously, we  
updated it immediately after adding each row to the multi-insert buffer,  
but we do so only after writing the rows stored in the buffer out to the  
table using table_multi_insert(), which I think would be consistent even  
with non-batching mode, because in that mode we update it after writing  
each row out to the table using table_tuple_insert().  
  
Andrey Lepikhov, heavily revised by me, with review from Ian Barwick,  
Andrey Lepikhov, and Zhihong Yu.  
  
Discussion: https://postgr.es/m/bc489202-9855-7550-d64c-ad2d83c24867%40postgrespro.ru  

digoal's wechat