sql: Support multiple collation versions #63738
Labels
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-sql-foundations
SQL Foundations Team (formerly SQL Schema + SQL Sessions)
X-anchored-telemetry
The issue number is anchored by telemetry references.
Collated strings depend on the Unicode CLDR database, which is embedded in the golang.org/x/text package. This database changes several times per year (current version is 39), although the copy in golang.org/x/text has remained pinned at version 23 for some time since Go does not yet implement the "fractional weights" feature that is required in newer versions of the data.

We persist strings derived from the collation tables in indexes containing collated strings. If we attempt to query these indexes using a different version of the collation table than was used when they were built, we may get incorrect results. Therefore, unless we want to remain at CLDR version 23 forever, we must store (for each index containing collated strings) the CLDR version that was used to build the index, and ensure that we use the same version for all queries and updates to that index. We'll also want some way to rebuild an index with a newer CLDR version so that old versions can eventually be phased out. (For comparison, collation data upgrades are painful in Postgres too.)

The x/text/collate package currently treats the CLDR version as a singleton. In order to implement this migration we would need to be able to load multiple CLDR versions at once, which would require either substantial changes to the x/text/collate APIs or moving to a different scheme in which we load collation tables from a data file.

Note that other parts of x/text use more recent CLDR versions, but these do not have (known) compatibility issues since we only use the collate package when generating strings to be persisted. However, when we implement full-text search we will depend on more parts of CLDR data, so we will likely need similar multi-version support for full-text search as well.

Regarding priority: I'm not aware of anyone asking for updated collation data (the changes tend to be fairly esoteric, although there was a change to the collation of emoji in version 32 that looks consequential), and the Go team does not appear to be actively working towards any updates in this area. However, when and if the upstream x/text package updates to a newer version of CLDR, this will become fairly important. We can stay pinned on an older version for a while, but we will need to stay on top of security updates (such as #63559) which may be burdensome if we're not up to date.

Jira issue: CRDB-6752
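
To make the per-index version pinning proposed above more concrete, here is a minimal sketch of one possible shape for it. Everything in it is hypothetical: Collator, Registry, IndexDescriptor, and toyCollator are illustrative names, not existing CockroachDB or x/text APIs, and the toy collator only tags keys with its version rather than consulting real CLDR tables. The idea shown is that each index records the CLDR version it was built with, and key generation refuses to proceed if that version is not loaded.

```go
package main

import "fmt"

// Collator is a hypothetical stand-in for one loaded version of the
// collation tables. It is not the x/text/collate API.
type Collator interface {
	Key(s string) []byte
}

// toyCollator fakes version-dependent sort keys by prefixing the key with
// the version number. Real keys would come from CLDR data.
type toyCollator struct{ version int }

func (c toyCollator) Key(s string) []byte {
	return append([]byte{byte(c.version)}, s...)
}

// Registry holds several CLDR versions at once (assumption: the real
// implementation would need some way to load more than one table set).
type Registry struct {
	byVersion map[int]Collator
}

func NewRegistry() *Registry {
	return &Registry{byVersion: map[int]Collator{}}
}

func (r *Registry) Register(version int, c Collator) {
	r.byVersion[version] = c
}

// IndexDescriptor is a hypothetical index record that pins the CLDR
// version used when the index was built.
type IndexDescriptor struct {
	Name        string
	CLDRVersion int
}

// CollatorFor returns the collator matching the index's pinned version,
// or an error rather than silently falling back to another version.
func (r *Registry) CollatorFor(idx IndexDescriptor) (Collator, error) {
	c, ok := r.byVersion[idx.CLDRVersion]
	if !ok {
		return nil, fmt.Errorf("index %q needs CLDR version %d, which is not loaded",
			idx.Name, idx.CLDRVersion)
	}
	return c, nil
}

func main() {
	r := NewRegistry()
	r.Register(23, toyCollator{version: 23})

	oldIdx := IndexDescriptor{Name: "users_name_idx", CLDRVersion: 23}
	newIdx := IndexDescriptor{Name: "orders_note_idx", CLDRVersion: 39}

	if c, err := r.CollatorFor(oldIdx); err == nil {
		fmt.Printf("%s: key for %q is %v\n", oldIdx.Name, "abc", c.Key("abc"))
	}
	if _, err := r.CollatorFor(newIdx); err != nil {
		fmt.Println("error:", err)
	}
}
```

Under this sketch, rebuilding an index with newer data would amount to regenerating its keys with a different registered collator and then updating the descriptor's pinned version, after which the old version could be dropped from the registry once no index refers to it.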