Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: Support multiple collation versions #63738

Open
bdarnell opened this issue Apr 15, 2021 · 0 comments
Open

sql: Support multiple collation versions #63738

bdarnell opened this issue Apr 15, 2021 · 0 comments
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) X-anchored-telemetry The issue number is anchored by telemetry references.

Comments

@bdarnell
Copy link
Member

bdarnell commented Apr 15, 2021

Collated strings depend on the unicode CLDR database, which is embedded in the golang.org/x/text package. This database changes several times per year (current version is 39), although the copy in golang.org/x/text has remained pinned at version 23 for some time since Go does not yet implement the "fractional weights" feature that is required in newer versions of the data.

We persist strings derived from the collation tables in indexes containing collated strings. If we attempt to query these indexes using a different version of the collation table than was used when they were built, we may get incorrect results. Therefore, unless we want to remain at CLDR version 23 forever, we must store (for each index containing collated strings) the CLDR version that was used to build the index, and ensure that we use the same version for all queries and updates to that index. We'll also want some way to rebuild an index with a newer CLDR version so that old versions can eventually be phased out. (for comparison, collation data upgrades are painful in postgres too)

The x/text/collate package currently treats the CLDR version as a singleton. In order to implement this migration we would need to be able to load multiple CLDR versions at once, which would require either substantial changes to the x/text/collate APIs or moving to a different scheme in which we load collation tables from a data file.

Note that other parts of x/text use more recent CLDR versions, but these do not have (known) compatibility issues since we only use the collate package when generating strings to be persisted. However, when we implement full-text search we will depend on more parts of CLDR data, so we will likely need similar multi-version support for full-text search as well.

Regarding priority: I'm not aware of anyone asking for updated collation data (the changes tend to be fairly esoteric, although there was a change to the collation of emoji in version 32 that looks consequential), and the Go team does not appear to be actively working towards any updates in this area. However, when and if the upstream x/text package updates to a newer version of CLDR, this will become fairly important. We can stay pinned on an older version for a while, but we will need to stay on top of security updates (such as #63559) which may be burdensome if we're not up to date.

Jira issue: CRDB-6752

@bdarnell bdarnell added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Apr 15, 2021
bdarnell added a commit to bdarnell/cockroach that referenced this issue Apr 15, 2021
Upgrading golang.org/x/text to a newere collate.CLDRVersion
will require a lot of work for compatibility (described in
issue cockroachdb#63738). Add an assertion to make sure such an upgrade
is not made accidentally.

Release note: None
@knz knz added the X-anchored-telemetry The issue number is anchored by telemetry references. label Apr 15, 2021
@knz knz added this to Triage in SQL Sessions - Deprecated via automation Apr 15, 2021
@knz knz added this to Triage in SQL Foundations via automation Apr 15, 2021
craig bot pushed a commit that referenced this issue Apr 15, 2021
63196: rowexec: optimize the inverted filterer r=yuzefovich a=yuzefovich

Previously, we were ignoring the first return parameter of
`prepareAddIndexRow` which indicates whether we should be adding a row
for inverted expr evaluation which is suboptimal (essentially we were
ignoring the result of prefiltering). This is now fixed.

Additionally, this commit adds some comments and a sanity check around
the usage of `batchedInvertedExprEvaluator`.

Release note: None

63739: sql/sem/tree: Add assertion against upgrading collation package r=knz a=bdarnell

Upgrading golang.org/x/text to a newere collate.CLDRVersion
will require a lot of work for compatibility (described in
issue #63738). Add an assertion to make sure such an upgrade
is not made accidentally.

Release note: None

Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Co-authored-by: Ben Darnell <ben@cockroachlabs.com>
@ajwerner ajwerner removed this from Triage in SQL Foundations Apr 20, 2021
@rafiss rafiss added the T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) label May 12, 2021
@rafiss rafiss moved this from Triage to Longer term backlog in SQL Sessions - Deprecated Jun 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) X-anchored-telemetry The issue number is anchored by telemetry references.
Projects
No open projects
SQL Sessions - Deprecated
Longer term backlog
Development

No branches or pull requests

3 participants