digoal
2024-03-14
PostgreSQL , PolarDB , DuckDB , "builtin" collation provider
Introduce "builtin" collation provider.
author Jeff Davis <jdavis@postgresql.org>
Thu, 14 Mar 2024 06:33:44 +0000 (23:33 -0700)
committer Jeff Davis <jdavis@postgresql.org>
Thu, 14 Mar 2024 06:33:44 +0000 (23:33 -0700)
commit 2d819a08a1cbc11364e36f816b02e33e8dcc030b
tree 1a8d3b459866d7df936faffa0e64f5e339e6a6c2 tree
parent 6ab2e8385d55e0b73bb8bbc41d9c286f5f7f357f commit | diff
Introduce "builtin" collation provider.
New provider for collations, like "libc" or "icu", but without any
external dependency.
Initially, the only locale supported by the builtin provider is "C",
which is identical to the libc provider's "C" locale. The libc
provider's "C" locale has always been treated as a special case that
uses an internal implementation, without using libc at all -- so the
new builtin provider uses the same implementation.
The builtin provider's locale is independent of the server environment
variables LC_COLLATE and LC_CTYPE. Using the builtin provider, the
database collation locale can be "C" while LC_COLLATE and LC_CTYPE are
set to "en_US", which is impossible with the libc provider.
By offering a new builtin provider, it clarifies that the semantics of
a collation using this provider will never depend on libc, and makes
it easier to document the behavior.
Discussion: https://postgr.es/m/ab925f69-5f9d-f85e-b87c-bd2a44798659@joeconway.com
Discussion: https://postgr.es/m/dd9261f4-7a98-4565-93ec-336c1c110d90@manitou-mail.org
Discussion: https://postgr.es/m/ff4c2f2f9c8fc7ca27c1c24ae37ecaeaeaff6b53.camel%40j-davis.com
Reviewed-by: Daniel Vérité, Peter Eisentraut, Jeremy Schneider
+ <para>
+ The available locale providers are listed below:
+ </para>
+
+ <variablelist>
+ <varlistentry>
+ <term><literal>builtin</literal></term>
+ <listitem>
+ <para>
+ The <literal>builtin</literal> provider uses built-in operations. Only
+ the <literal>C</literal> locale is supported for this provider.
+ </para>
+ <para>
+ The <literal>C</literal> locale behavior is identical to the
+ <literal>C</literal> locale in the libc provider. When using this
+ locale, the behavior may depend on the database encoding.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>icu</literal></term>
+ <listitem>
+ <para>
+ The <literal>icu</literal> provider uses the external
+ ICU<indexterm><primary>ICU</primary></indexterm>
+ library. <productname>PostgreSQL</productname> must have been
+ configured with support.
+ </para>
+ <para>
+ ICU provides collation and character classification behavior that is
+ independent of the operating system and database encoding, which is
+ preferable if you expect to transition to other platforms without any
+ change in results. <literal>LC_COLLATE</literal> and
+ <literal>LC_CTYPE</literal> can be set independently of the ICU
+ locale.
+ </para>
+ <note>
+ <para>
+ For the ICU provider, results may depend on the version of the ICU
+ library used, as it is updated to reflect changes in natural language
+ over time.
+ </para>
+ </note>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>libc</literal></term>
+ <listitem>
+ <para>
+ The <literal>libc</literal> provider uses the operating system's C
+ library. The collation and character classification behavior is
+ controlled by the settings <literal>LC_COLLATE</literal> and
+ <literal>LC_CTYPE</literal>, so they cannot be set independently.
+ </para>
+ <note>
+ <para>
+ The same locale name may have different behavior on different
+ platforms when using the libc provider.
+ </para>
+ </note>
+ </listitem>
+ </varlistentry>
+ </variablelist>