PostgreSQL 17 preview - New "builtin" collation provider

Author

digoal

Date

2024-03-14

Tags

PostgreSQL, PolarDB, DuckDB, "builtin" collation provider


Background

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=2d819a08a1cbc11364e36f816b02e33e8dcc030b

Introduce "builtin" collation provider.  
  
author: Jeff Davis <jdavis@postgresql.org>, Thu, 14 Mar 2024 06:33:44 +0000 (23:33 -0700)  
committer: Jeff Davis <jdavis@postgresql.org>, Thu, 14 Mar 2024 06:33:44 +0000 (23:33 -0700)  
commit: 2d819a08a1cbc11364e36f816b02e33e8dcc030b  
tree: 1a8d3b459866d7df936faffa0e64f5e339e6a6c2  
parent: 6ab2e8385d55e0b73bb8bbc41d9c286f5f7f357f  
Introduce "builtin" collation provider.  
  
New provider for collations, like "libc" or "icu", but without any  
external dependency.  
  
Initially, the only locale supported by the builtin provider is "C",  
which is identical to the libc provider's "C" locale. The libc  
provider's "C" locale has always been treated as a special case that  
uses an internal implementation, without using libc at all -- so the  
new builtin provider uses the same implementation.  
  
The builtin provider's locale is independent of the server environment  
variables LC_COLLATE and LC_CTYPE. Using the builtin provider, the  
database collation locale can be "C" while LC_COLLATE and LC_CTYPE are  
set to "en_US", which is impossible with the libc provider.  
  
By offering a new builtin provider, it clarifies that the semantics of  
a collation using this provider will never depend on libc, and makes  
it easier to document the behavior.  
  
Discussion: https://postgr.es/m/ab925f69-5f9d-f85e-b87c-bd2a44798659@joeconway.com  
Discussion: https://postgr.es/m/dd9261f4-7a98-4565-93ec-336c1c110d90@manitou-mail.org  
Discussion: https://postgr.es/m/ff4c2f2f9c8fc7ca27c1c24ae37ecaeaeaff6b53.camel%40j-davis.com  
Reviewed-by: Daniel Vérité, Peter Eisentraut, Jeremy Schneider  
+   <para>  
+    The available locale providers are listed below:  
+   </para>  
+  
+   <variablelist>  
+    <varlistentry>  
+     <term><literal>builtin</literal></term>  
+     <listitem>  
+      <para>  
+       The <literal>builtin</literal> provider uses built-in operations. Only  
+       the <literal>C</literal> locale is supported for this provider.  
+      </para>  
+      <para>  
+       The <literal>C</literal> locale behavior is identical to the  
+       <literal>C</literal> locale in the libc provider. When using this  
+       locale, the behavior may depend on the database encoding.  
+      </para>  
+     </listitem>  
+    </varlistentry>  
+  
+    <varlistentry>  
+     <term><literal>icu</literal></term>  
+     <listitem>  
+      <para>  
+       The <literal>icu</literal> provider uses the external  
+       ICU<indexterm><primary>ICU</primary></indexterm>  
+       library. <productname>PostgreSQL</productname> must have been  
+       configured with support.  
+      </para>  
+      <para>  
+       ICU provides collation and character classification behavior that is  
+       independent of the operating system and database encoding, which is  
+       preferable if you expect to transition to other platforms without any  
+       change in results. <literal>LC_COLLATE</literal> and  
+       <literal>LC_CTYPE</literal> can be set independently of the ICU  
+       locale.  
+      </para>  
+      <note>  
+       <para>  
+        For the ICU provider, results may depend on the version of the ICU  
+        library used, as it is updated to reflect changes in natural language  
+        over time.  
+       </para>  
+      </note>  
+     </listitem>  
+    </varlistentry>  
+  
+    <varlistentry>  
+     <term><literal>libc</literal></term>  
+     <listitem>  
+      <para>  
+       The <literal>libc</literal> provider uses the operating system's C  
+       library. The collation and character classification behavior is  
+       controlled by the settings <literal>LC_COLLATE</literal> and  
+       <literal>LC_CTYPE</literal>, so they cannot be set independently.  
+      </para>  
+      <note>  
+       <para>  
+        The same locale name may have different behavior on different  
+        platforms when using the libc provider.  
+       </para>  
+      </note>  
+     </listitem>  
+    </varlistentry>  
+   </variablelist>  
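
The documentation above notes that the `C` locale's behavior "may depend on the database encoding". One way to see why, as a rough sketch rather than PostgreSQL's actual code: a `C` collation orders strings by their stored byte values, so the same letters can sort differently depending on how the encoding lays them out. The Python snippet below emulates this by sorting on encoded bytes. The helper name `c_locale_sort`, the sample letters, and the choice of KOI8-R as the second encoding are assumptions made purely for illustration.

```python
# Illustration only: emulate "C"-locale ordering by sorting on encoded bytes.
# The helper name, sample letters, and KOI8-R are assumptions for this sketch.

def c_locale_sort(strings, encoding):
    """Sort strings by the raw bytes they occupy in the given encoding."""
    return sorted(strings, key=lambda s: s.encode(encoding))

# Cyrillic letters а, б, в, г, listed in reverse alphabetical order.
letters = ["г", "в", "б", "а"]

# In UTF-8, byte order follows Unicode code point order.
utf8_order = c_locale_sort(letters, "utf-8")

# In KOI8-R, the byte values of Cyrillic letters are not in alphabetical order.
koi8_order = c_locale_sort(letters, "koi8_r")

print("UTF-8 :", utf8_order)
print("KOI8-R:", koi8_order)
```

In this sketch the UTF-8 ordering comes out alphabetical, while the KOI8-R ordering places "г" before "в", because KOI8-R assigns "г" a smaller byte value. A database using the `C` locale would be expected to show the same kind of encoding-dependent difference.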
